* Windows slow boot: contractor wanted
  2012-08-16 10:47  Richard Davies  (101+ messages in thread)
From: Richard Davies
To: qemu-devel, kvm

Hi,

We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
contractor to track down and fix problems we have with large-memory Windows
guests booting very slowly - they can take several hours.

We previously reported these problems in July (copied below) and they are
still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.

This is a serious issue for us, causing significant pain to our larger
Windows VM customers when their servers are offline for many hours during
boot.

If anyone knowledgeable in the area would be interested in being paid to
work on this, or if you know someone who might be, I would be delighted to
hear from you.

Cheers,

Richard.

===== Previous bug report: http://marc.info/?l=qemu-devel&m=134304194329745

We have been experiencing this problem for a while now too, using qemu-kvm
(currently at 1.1.1). Unfortunately, hv_relaxed doesn't seem to fix it.

The following command line produces the issue:

qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus \
    -usbdevice tablet -vnc :99 -monitor stdio -hda test.img

The hardware consists of dual AMD Opteron 6128 processors (16 cores in
total) and 64GB of memory. This command line was tested on kernel 3.1.4.
I've also tested with -no-hpet.

What I have seen is much as described: the memory fills out slowly, and top
on the host shows the process using 100% on all allocated CPU cores. The
most extreme case was a machine which took between 6 and 8 hours to boot.
This seems to be related to the assigned memory, as described, but also to
the number of processor cores (which makes sense if we believe it's a
timing issue). I have seen slow-booting guests improve after switching down
to two cores or even a single core.

Matthew, I agree that this seems to be linked to the number of VMs running -
in fact, shutting down other VMs on a dedicated test host caused the machine
to start booting at normal speed (with no reboot required). However, the
level of contention is never such that this could be explained by the host
simply being overcommitted.

If it helps anyone, there's an image of the hard drive I've been using to
test at: http://46.20.114.253/

It's a 5G gzip file containing a fairly standard Windows 2008 trial
installation. Since it's in the trial period, anyone who wants to use it may
have to re-arm the trial: http://support.microsoft.com/kb/948472

Please let me know if I can provide any more information, or test anything.

Best wishes,

Owen Tuz
* Re: Windows slow boot: contractor wanted
  2012-08-16 11:39  Avi Kivity
From: Avi Kivity
To: Richard Davies
Cc: qemu-devel, kvm

On 08/16/2012 01:47 PM, Richard Davies wrote:
> Hi,
>
> We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
> contractor to track down and fix problems we have with large memory Windows
> guests booting very slowly - they can take several hours.
>
> We previously reported these problems in July (copied below) and they are
> still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
>
> This is a serious issue for us which is causing significant pain to our
> larger Windows VM customers when their servers are offline for many hours
> during boot.
>
> If anyone knowledgeable in the area would be interested in being paid to
> work on this, or if you know someone who might be, I would be delighted to
> hear from you.

I happen to be gainfully employed, but maybe I can help. Can you collect a
trace during the slow boot period and post it somewhere? See
http://www.linux-kvm.org/page/Tracing for instructions.

4G/8-way is not a particularly large guest. What is the host configuration
(memory, core count)?

--
error compiling committee.c: too many arguments to function
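[For readers following along: the wiki page Avi links describes host-side tracing with trace-cmd. A minimal sketch of the workflow - the buffer size and run duration here are illustrative assumptions, not values from the thread:]

```shell
# Record KVM tracepoint events on the host while the guest boots slowly.
# Run as root; requires trace-cmd and a kernel with tracing enabled.
# -b sets a large per-CPU buffer (in KB) so events are not dropped.
trace-cmd record -b 20000 -e kvm &
TRACE_PID=$!

# Let the slow boot run for a minute or two, then stop recording
# (trace-cmd record finalizes trace.dat when it receives SIGINT):
sleep 120
kill -INT "$TRACE_PID"
wait "$TRACE_PID"

# Convert the binary trace.dat into a human-readable report:
trace-cmd report > trace-report.txt
```

This matches the files Richard later posts (a trace.dat data file and a trace-report.txt report).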
* Re: Windows slow boot: contractor wanted
  2012-08-17 12:36  Richard Davies
From: Richard Davies
To: Avi Kivity
Cc: qemu-devel, kvm

Hi Avi,

Thanks to you and several others for offering help. We will work with Avi
at first, but are grateful for all the other offers of help. We have a
number of other qemu-related projects which we'd be interested in getting
done, and will get in touch with these names (and anyone else who comes
forward) to see if any are of interest to you.

This slow boot problem is intermittent and varies in how slow the boots
are, but I managed to trigger it this morning with medium-slow booting
(5-10 minutes) and link to the requested traces below.

The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.

In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
and 8 cores each (we have seen small VMs go slow as I originally said, but
it is easier to trigger with big VMs):

pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
    -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
    -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
    -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw

We are running with hv_relaxed since this was suggested in the previous
thread, but we see intermittent slow boots with and without this flag.

All 3 VMs are booting slowly for most of the attached capture, which I
started after confirming the slow boots and stopped as soon as the first of
them (15665) had booted. In terms of visible symptoms, the VMs are showing
the Windows boot progress bar, which is moving very slowly. In top, the VMs
are at 400% CPU and their resident set size (RES) memory is slowly counting
up until it reaches the full VM size, at which point they finish booting.

Here are the trace files:

http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)

Please let me know if there is anything else which I can provide.

Thank you,

Richard.
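[A 4G text report is too large to read by hand; a common first pass - this is an illustrative sketch, not something from the thread, and the exact field layout of kvm_exit lines can vary by kernel version - is to tally the most frequent exit reasons:]

```shell
# Count the most common "reason <X>" strings in kvm_exit trace lines.
# A flood of one exit reason (e.g. page-fault exits while memory is
# being faulted in) usually points at where the vcpus are spinning.
grep kvm_exit trace-report.txt \
    | grep -o 'reason [A-Za-z0-9_]*' \
    | sort | uniq -c | sort -rn | head
```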
* Re: Windows slow boot: contractor wanted
  2012-08-17 13:02  Robert Vineyard
From: Robert Vineyard
To: Richard Davies
Cc: Avi Kivity, qemu-devel, kvm

Richard,

Not sure if you've tried this, but I noticed massive performance gains
(easily booting 2-3 times as fast) by converting from RAW disk images to
direct-mapped raw partitions and making sure that IOMMU support was enabled
in the BIOS and in the kernel at boot time. The obvious downside to using
raw partitions is a loss of flexibility and portability across physical
machines, but in some cases the trade-offs may be worth it.

I never ran any formal benchmarks, but it "felt" like about a 50%
performance boost going from RAW disk images to raw partitions (don't even
think about using QCOW2 disk images for Windows - your VMs will still be
booting next week...). The real gains, which I can't yet fully explain,
came from passing "iommu=on intel_iommu=on" to the host kernel on bootup. I
believe the boot option to enable IOMMU support may be different on AMD
hardware.

Granted, this is on a much smaller VM than you're using (Windows 7 x64 with
two vCPUs and 4GB of vRAM), but might be worth investigating. Good luck!

--
Robert Vineyard

On 08/17/2012 08:36 AM, Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. [...]
>
> This slow boot problem is intermittent and varies in how slow the boots
> are, but I managed to trigger it this morning with medium-slow booting
> (5-10 minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> [...]
* Re: Windows slow boot: contractor wanted
  2012-08-18 14:44  Richard Davies
From: Richard Davies
To: Robert Vineyard
Cc: Avi Kivity, kvm, qemu-devel

Hi Robert,

Robert Vineyard wrote:
> Not sure if you've tried this, but I noticed massive performance
> gains (easily booting 2-3 times as fast) by converting from RAW disk
> images to direct-mapped raw partitions and making sure that IOMMU
> support was enabled in the BIOS and in the kernel at boot time.

Thanks for the suggestions, but unfortunately we do have IOMMU support
enabled, and in production (rather than this test case) we run from LVM
LVs, which are effectively direct raw partitions, and still have this slow
boot problem.

Thanks anyway,

Richard.
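[For reference, running a guest from an LVM LV as Richard describes typically looks like the following - the LV path is a made-up example, and cache=none is a common (not thread-confirmed) companion setting that bypasses the host page cache for block devices:]

```shell
# Attach a logical volume directly as a raw block device instead of a
# file-backed image. /dev/vg0/winguest is a hypothetical LV path.
qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
    -drive file=/dev/vg0/winguest,if=virtio,format=raw,cache=none \
    -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio
```

Since Richard still sees the slow boots with this setup, the disk path is effectively ruled out as the cause, consistent with the memory fault-in symptom seen in top.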
* Re: Windows slow boot: contractor wanted
  2012-08-19 5:02  Brian Jackson
From: Brian Jackson
To: Richard Davies
Cc: Avi Kivity, kvm, qemu-devel

On Friday 17 August 2012 07:36:42 Richard Davies wrote:
> Hi Avi,
> [...]
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB
> RAM and 8 cores each (we have seen small VMs go slow as I originally
> said, but it is easier to trigger with big VMs):
> [...]
> All 3 VMs are booting slowly for most of the attached capture, which I
> started after confirming the slow boots and stopped as soon as the first
> of them (15665) had booted. In terms of visible symptoms, the VMs are
> showing the Windows boot progress bar, which is moving very slowly. In
> top, the VMs are at 400% CPU and their resident set size (RES) memory is
> slowly counting up until it reaches the full VM size, at which point they
> finish booting.

What memory options have you tried (KSM, hugepages, -mem-prealloc)?

Is this only with 2008? (Is that regular, or R2?)

Have you tried any of the hyperv features/hints?
* Re: Windows slow boot: contractor wanted
  2012-08-20 8:16  Richard Davies
From: Richard Davies
To: Brian Jackson
Cc: Avi Kivity, kvm, qemu-devel

Brian Jackson wrote:
> Richard Davies wrote:
> > The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> > total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > In this morning's test, we have 3 guests, all booting Windows with 40GB
> > RAM and 8 cores each (we have seen small VMs go slow as I originally
> > said, but it is easier to trigger with big VMs):
> >
> > pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> >     -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> > pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> >     -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> > pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> >     -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
> What memory options have you tried? (KSM, hugepages, -mem-prealloc)?

The host kernel has KSM, and CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y. Our qemu-kvm command lines are as
above, so we aren't using -mem-prealloc. We'll try that.

> Is this only with 2008? (is that regular? R2?)

It is intermittent. We definitely see it with 2008 R2, and I believe with
2008 as well. We don't have many customers running earlier versions of
Windows.

> Have you tried any of the hyperv features/hints?

We have tried "-cpu host" and "-cpu host,hv_relaxed" as above, which both
exhibit the bug. What other hyperv options do you think we should try?

Richard.
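[For readers unfamiliar with the options Brian raises: -mem-prealloc faults all guest memory in before the guest starts, and is typically combined with hugetlbfs-backed memory via -mem-path. A hedged sketch of a 40GB-guest setup - the mount point and page counts are illustrative assumptions, not from the thread:]

```shell
# Reserve explicit 2MB huge pages for a 40GB guest and mount hugetlbfs
# (20480 x 2MB = 40GB; adjust for your page size and guest size).
echo 20480 > /proc/sys/vm/nr_hugepages
mkdir -p /dev/hugepages
mount -t hugetlbfs none /dev/hugepages

# Back guest RAM with the huge pages and pre-fault it all at startup,
# so the slow page-in happens before Windows begins booting:
qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
    -mem-path /dev/hugepages -mem-prealloc \
    -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
```

The trade-off is that startup itself takes longer and the reserved pages are unavailable to other host workloads.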
* Re: Windows slow boot: contractor wanted
  2012-08-17 12:36 ` [Qemu-devel] " Richard Davies
@ 2012-08-19  8:40 ` Avi Kivity
  -1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-19 8:40 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm

On 08/17/2012 03:36 PM, Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. We will work with Avi
> at first, but are grateful for all the other offers of help. We have a
> number of other qemu-related projects which we'd be interested in getting
> done, and will get in touch with these names (and anyone else who comes
> forward) to see if any are of interest to you.
>
> This slow boot problem is intermittent and varies in how slow the boots
> are, but I managed to trigger it this morning with medium slow booting
> (5-10 minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB
> RAM and 8 cores each (we have seen small VMs go slow as I originally
> said, but it is easier to trigger with big VMs):
>
> pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
>   -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
>   -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
>   -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw

40+40+40=120, pretty close to your server specs. Are you swapping?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
  2012-08-19  8:40 ` [Qemu-devel] " Avi Kivity
@ 2012-08-19  8:51 ` Richard Davies
  -1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-19 8:51 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm

Avi Kivity wrote:
> Richard Davies wrote:
> > The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> > total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > In this morning's test, we have 3 guests, all booting Windows with 40GB
> > RAM and 8 cores each (we have seen small VMs go slow as I originally
> > said, but it is easier to trigger with big VMs):
>
> 40+40+40=120, pretty close to your server specs. Are you swapping?

No - you can see on the "top" screenshot that there's no swap in use.

Richard.

^ permalink raw reply	[flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
  2012-08-17 12:36 ` [Qemu-devel] " Richard Davies
@ 2012-08-19 14:04 ` Avi Kivity
  -1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-19 14:04 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm

On 08/17/2012 03:36 PM, Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. We will work with Avi
> at first, but are grateful for all the other offers of help. We have a
> number of other qemu-related projects which we'd be interested in getting
> done, and will get in touch with these names (and anyone else who comes
> forward) to see if any are of interest to you.
>
> This slow boot problem is intermittent and varies in how slow the boots
> are, but I managed to trigger it this morning with medium slow booting
> (5-10 minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB
> RAM and 8 cores each (we have seen small VMs go slow as I originally
> said, but it is easier to trigger with big VMs):
>
> pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
>   -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
>   -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
>   -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
> We are running with hv_relaxed since this was suggested in the previous
> thread, but we see intermittent slow boots with and without this flag.
>
> All 3 VMs are booting slowly for most of the attached capture, which I
> started after confirming the slow boots and stopped as soon as the first
> of them (15665) had booted. In terms of visible symptoms, the VMs are
> showing the Windows boot progress bar, which is moving very slowly. In
> top, the VMs are at 400% CPU and their resident set size (RES) memory is
> slowly counting up until it reaches the full VM size, at which point they
> finish booting.
>
> Here are the trace files:
>
> http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
> http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
> http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
> http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
> http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
>
> Please let me know if there is anything else which I can provide?

There are tons of PAUSE exits indicating cpu overcommit (and indeed you
are overcommitted by about 50%).

What host kernel version are you running?

Does this reproduce without overcommit?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 101+ messages in thread
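[Editorial sketch: Avi's PAUSE-exit observation can be quantified from the trace report Richard already uploaded. This assumes the report was produced by `trace-cmd report` with the `kvm:kvm_exit` event enabled and that it prints a textual `reason <NAME>` field - the exact field format varies by kernel version, and `trace-report.txt` is the file linked above.]

```shell
# Tally VM-exit reasons in the trace report; a flood of PAUSE exits means
# vcpus are spinning on locks whose holders the host has descheduled.
grep -o 'reason [A-Za-z_]*' trace-report.txt | sort | uniq -c | sort -rn | head
```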
* Re: Windows slow boot: contractor wanted
  2012-08-19 14:04 ` [Qemu-devel] " Avi Kivity
@ 2012-08-20 13:56 ` Richard Davies
  -1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-20 13:56 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm

Avi Kivity wrote:
> Richard Davies wrote:
> > Hi Avi,
> >
> > Thanks to you and several others for offering help. We will work with
> > Avi at first, but are grateful for all the other offers of help. We
> > have a number of other qemu-related projects which we'd be interested
> > in getting done, and will get in touch with these names (and anyone
> > else who comes forward) to see if any are of interest to you.
> >
> > This slow boot problem is intermittent and varies in how slow the boots
> > are, but I managed to trigger it this morning with medium slow booting
> > (5-10 minutes) and link to the requested traces below.
> >
> > The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> > total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > In this morning's test, we have 3 guests, all booting Windows with 40GB
> > RAM and 8 cores each (we have seen small VMs go slow as I originally
> > said, but it is easier to trigger with big VMs):
> >
> > pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> >   -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> > pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> >   -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> > pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> >   -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
> >
> > We are running with hv_relaxed since this was suggested in the previous
> > thread, but we see intermittent slow boots with and without this flag.
> >
> > All 3 VMs are booting slowly for most of the attached capture, which I
> > started after confirming the slow boots and stopped as soon as the
> > first of them (15665) had booted. In terms of visible symptoms, the VMs
> > are showing the Windows boot progress bar, which is moving very slowly.
> > In top, the VMs are at 400% CPU and their resident set size (RES)
> > memory is slowly counting up until it reaches the full VM size, at
> > which point they finish booting.
> >
> > Here are the trace files:
> >
> > http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
> > http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
> > http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
> > http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
> > http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
> >
> > Please let me know if there is anything else which I can provide?
>
> There are tons of PAUSE exits indicating cpu overcommit (and indeed you
> are overcommitted by about 50%).
>
> What host kernel version are you running?
>
> Does this reproduce without overcommit?

We're running host kernel 3.5.1 and qemu-kvm 1.1.1.

I hadn't thought about it, but I agree this is related to cpu overcommit.
The slow boots are intermittent (and infrequent) with cpu overcommit,
whereas I don't think it occurs without cpu overcommit.

In addition, if there is a slow boot ongoing, and you kill some other VMs
to reduce cpu overcommit, then this will sometimes speed it up.

I guess the question is why even with overcommit most boots are fine, but
some small fraction then go slow?

Richard.

^ permalink raw reply	[flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
  2012-08-20 13:56 ` [Qemu-devel] " Richard Davies
@ 2012-08-21  9:00 ` Avi Kivity
  -1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-21 9:00 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel

On 08/20/2012 04:56 PM, Richard Davies wrote:
> We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
>
> I hadn't thought about it, but I agree this is related to cpu overcommit.
> The slow boots are intermittent (and infrequent) with cpu overcommit,
> whereas I don't think it occurs without cpu overcommit.
>
> In addition, if there is a slow boot ongoing, and you kill some other VMs
> to reduce cpu overcommit, then this will sometimes speed it up.
>
> I guess the question is why even with overcommit most boots are fine, but
> some small fraction then go slow?

Could be a bug. The scheduler and the spin-loop handling code fight each
other instead of working well together.

Please provide snapshots of 'perf top' while a slow boot is in progress.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 101+ messages in thread
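[Editorial sketch: one way to capture what Avi asks for non-interactively, using the standard `perf record`/`perf report` pair rather than watching `perf top` live. The file names are hypothetical, and the 30-second window should be started while a guest is visibly boot-looping.]

```shell
# Whole-host profile with call graphs for 30 seconds during a slow boot:
perf record -a -g -o slow-boot.perf.data sleep 30
# Summarise the hottest symbols, comparable to a 'perf top' snapshot:
perf report -i slow-boot.perf.data --stdio --sort symbol | head -40 > perf-top-snapshot.txt
```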
* Re: Windows slow boot: contractor wanted
  2012-08-21  9:00 ` [Qemu-devel] " Avi Kivity
@ 2012-08-21 15:21 ` Richard Davies
  -1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-21 15:21 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel

Avi Kivity wrote:
> Richard Davies wrote:
> > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > I hadn't thought about it, but I agree this is related to cpu
> > overcommit. The slow boots are intermittent (and infrequent) with cpu
> > overcommit, whereas I don't think it occurs without cpu overcommit.
> >
> > In addition, if there is a slow boot ongoing, and you kill some other
> > VMs to reduce cpu overcommit, then this will sometimes speed it up.
> >
> > I guess the question is why even with overcommit most boots are fine,
> > but some small fraction then go slow?
>
> Could be a bug. The scheduler and the spin-loop handling code fight
> each other instead of working well.
>
> Please provide snapshots of 'perf top' while a slow boot is in progress.

Below are two 'perf top' snapshots during a slow boot, which appear to me
to support your idea of a spin-lock problem.

There are a lot more "unprocessable samples recorded" messages at the end
of each snapshot which I haven't included. I think these may be from the
guest OS - the kernel is listed, and qemu-kvm itself is listed on some
other traces which I did, although not these.

Richard.
   PerfTop: 62249 irqs/sec  kernel:96.9%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    35.80%  [kernel]        [k] _raw_spin_lock_irqsave
    21.64%  [kernel]        [k] isolate_freepages_block
     5.91%  [kernel]        [k] yield_to
     4.95%  [kernel]        [k] _raw_spin_lock
     3.37%  [kernel]        [k] kvm_vcpu_on_spin
     2.74%  [kernel]        [k] add_preempt_count
     2.45%  [kernel]        [k] _raw_spin_unlock
     2.33%  [kernel]        [k] sub_preempt_count
     2.18%  [kernel]        [k] svm_vcpu_run
     2.17%  [kernel]        [k] kvm_vcpu_yield_to
     1.89%  [kernel]        [k] memcmp
     1.50%  [kernel]        [k] get_pid_task
     1.26%  [kernel]        [k] kvm_arch_vcpu_ioctl_run
     1.16%  [kernel]        [k] pid_task
     0.70%  [kernel]        [k] rcu_note_context_switch
     0.70%  [kernel]        [k] trace_hardirqs_on
     0.52%  [kernel]        [k] __rcu_read_unlock
     0.51%  [kernel]        [k] trace_preempt_on
     0.47%  [kernel]        [k] __srcu_read_lock
     0.43%  [kernel]        [k] get_parent_ip
     0.42%  [kernel]        [k] get_pageblock_flags_group
     0.38%  [kernel]        [k] in_lock_functions
     0.34%  [kernel]        [k] trace_preempt_off
     0.34%  [kernel]        [k] trace_hardirqs_off
     0.29%  [kernel]        [k] clear_page_c
     0.23%  [kernel]        [k] __srcu_read_unlock
     0.20%  [kernel]        [k] __rcu_read_lock
     0.14%  [kernel]        [k] handle_exit
     0.11%  libc-2.10.1.so  [.] strcmp
     0.11%  [kernel]        [k] _raw_spin_unlock_irqrestore
     0.11%  [kernel]        [k] _raw_spin_lock_irq
     0.11%  [kernel]        [k] find_highest_vector
     0.09%  [kernel]        [k] ktime_get
     0.08%  [kernel]        [k] copy_page_c
     0.08%  [kernel]        [k] pause_interception
     0.08%  [kernel]        [k] kmem_cache_alloc
     0.08%  [kernel]        [k] resched_task
     0.08%  perf            [.] dso__find_symbol
     0.06%  [kernel]        [k] compaction_alloc
     0.06%  libc-2.10.1.so  [.] 0x0000000000076dab
     0.06%  [kernel]        [k] read_tsc
     0.06%  perf            [.] add_hist_entry
     0.05%  [kernel]        [k] svm_read_l1_tsc
     0.05%  [kernel]        [k] native_read_tsc
     0.05%  perf            [.] sort__dso_cmp
     0.05%  [kernel]        [k] copy_user_generic_string
     0.05%  [kernel]        [k] ktime_get_update_offsets
     0.04%  [kernel]        [k] kvm_check_async_pf_completion
     0.04%  [kernel]        [k] __schedule
     0.04%  [kernel]        [k] __rcu_pending
     0.04%  [kernel]        [k] svm_complete_interrupts
     0.04%  [kernel]        [k] perf_pmu_disable
     0.04%  [kernel]        [k] isolate_migratepages_range
     0.04%  [kernel]        [k] sched_clock_cpu
     0.04%  [kernel]        [k] kvm_cpu_has_pending_timer
     0.04%  [kernel]        [k] apic_timer_interrupt
     0.04%  [vdso]          [.] 0x00007fff2e1ff607
     0.04%  [kernel]        [k] apic_update_ppr
     0.04%  [kernel]        [k] do_select
     0.04%  [kernel]        [k] svm_scale_tsc
     0.04%  [kernel]        [k] system_call_after_swapgs
     0.03%  [kernel]        [k] kvm_lapic_get_cr8
     0.03%  perf            [.] sort__sym_cmp
     0.03%  [kernel]        [k] find_next_bit
     0.03%  [kernel]        [k] kvm_set_cr8
     0.03%  [kernel]        [k] rcu_check_callbacks

9750 unprocessable samples recorded.9751 unprocessable samples recorded.9752 unprocessable samples recorded.9753 unprocessable samples recorded.9754 unprocessable samples recorded.9755 unprocessable samples recorded.9756 unprocessable samples recorded.9757 unprocessable samples recorded.9758 unprocessable samples recorded.9759 unprocessable samples recorded.9760 unprocessable samples recorded.9761 unprocessable samples recorded.9762 unprocessable samples recorded.9763 unprocessable samples recorded.
   PerfTop: 61584 irqs/sec  kernel:97.4%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    36.73%  [kernel]        [k] _raw_spin_lock_irqsave
    19.00%  [kernel]        [k] isolate_freepages_block
     5.80%  [kernel]        [k] yield_to
     5.23%  [kernel]        [k] _raw_spin_lock
     3.97%  [kernel]        [k] kvm_vcpu_on_spin
     2.98%  [kernel]        [k] add_preempt_count
     2.45%  [kernel]        [k] sub_preempt_count
     2.37%  [kernel]        [k] _raw_spin_unlock
     2.22%  [kernel]        [k] svm_vcpu_run
     2.19%  [kernel]        [k] kvm_vcpu_yield_to
     1.90%  [kernel]        [k] memcmp
     1.54%  [kernel]        [k] get_pid_task
     1.39%  [kernel]        [k] kvm_arch_vcpu_ioctl_run
     1.30%  [kernel]        [k] pid_task
     0.75%  [kernel]        [k] rcu_note_context_switch
     0.74%  [kernel]        [k] trace_hardirqs_on
     0.58%  [kernel]        [k] __rcu_read_unlock
     0.55%  [kernel]        [k] trace_preempt_on
     0.47%  [kernel]        [k] __srcu_read_lock
     0.44%  [kernel]        [k] get_parent_ip
     0.41%  [kernel]        [k] clear_page_c
     0.40%  [kernel]        [k] get_pageblock_flags_group
     0.39%  [kernel]        [k] in_lock_functions
     0.36%  [kernel]        [k] trace_preempt_off
     0.35%  [kernel]        [k] trace_hardirqs_off
     0.23%  [kernel]        [k] __srcu_read_unlock
     0.20%  [kernel]        [k] __rcu_read_lock
     0.15%  [kernel]        [k] _raw_spin_lock_irq
     0.14%  [kernel]        [k] handle_exit
     0.12%  [kernel]        [k] find_highest_vector
     0.11%  [kernel]        [k] resched_task
     0.10%  libc-2.10.1.so  [.] strcmp
     0.09%  [kernel]        [k] _raw_spin_unlock_irqrestore
     0.09%  [kernel]        [k] ktime_get
     0.08%  [kernel]        [k] pause_interception
     0.08%  [kernel]        [k] copy_page_c
     0.07%  [kernel]        [k] __schedule
     0.07%  [kernel]        [k] compact_zone
     0.07%  perf            [.] dso__find_symbol
     0.06%  perf            [.] add_hist_entry
     0.06%  [kernel]        [k] read_tsc
     0.06%  [kernel]        [k] svm_read_l1_tsc
     0.05%  [kernel]        [k] native_read_tsc
     0.05%  [kernel]        [k] ktime_get_update_offsets
     0.05%  [kernel]        [k] compaction_alloc
     0.05%  libc-2.10.1.so  [.] 0x0000000000073ae0
     0.05%  [kernel]        [k] kmem_cache_alloc
     0.05%  [kernel]        [k] svm_complete_interrupts
     0.05%  [kernel]        [k] kvm_check_async_pf_completion
     0.05%  [kernel]        [k] apic_timer_interrupt
     0.05%  perf            [.] sort__dso_cmp
     0.05%  [kernel]        [k] kvm_cpu_has_pending_timer
     0.04%  [kernel]        [k] svm_scale_tsc
     0.04%  [kernel]        [k] isolate_migratepages_range
     0.04%  [kernel]        [k] sched_clock_cpu
     0.04%  [kernel]        [k] __rcu_pending
     0.04%  [kernel]        [k] apic_update_ppr
     0.04%  [kernel]        [k] do_select
     0.04%  [kernel]        [k] perf_pmu_disable
     0.04%  [kernel]        [k] kvm_set_cr8
     0.04%  [kernel]        [k] update_curr
     0.04%  [kernel]        [k] reschedule_interrupt
     0.03%  [kernel]        [k] kvm_lapic_get_cr8
     0.03%  libc-2.10.1.so  [.] strstr
     0.03%  [kernel]        [k] apic_has_pending_timer
     0.03%  perf            [.] sort__sym_cmp

4963 unprocessable samples recorded.4964 unprocessable samples recorded.4965 unprocessable samples recorded.4966 unprocessable samples recorded.4967 unprocessable samples recorded.4968 unprocessable samples recorded.4969 unprocessable samples recorded.4970 unprocessable samples recorded.4971 unprocessable samples recorded.4972 unprocessable samples recorded.4973 unprocessable samples recorded.4974 unprocessable samples recorded.4975 unprocessable samples recorded.

^ permalink raw reply	[flat|nested] 101+ messages in thread
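[Editorial sketch: the two snapshots above are dominated by `_raw_spin_lock_irqsave` together with `isolate_freepages_block`, `compaction_alloc` and `isolate_migratepages_range` - the memory-compaction path used when transparent hugepages are allocated. The shares can be summed mechanically; the snapshot file name is hypothetical, and the symbol lists are taken from the output above. On the first snapshot this attributes roughly half the samples to spin/yield paths and about a fifth to compaction.]

```shell
# Sum the CPU share spent spinning on locks vs. doing memory compaction,
# given a saved snapshot with one "PCT%  [kernel]  [k] symbol" entry per line.
# awk coerces "35.80%" to the number 35.80 when adding.
awk '/_raw_spin_lock|kvm_vcpu_on_spin|yield_to/ { spin += $1 }
     /isolate_freepages|isolate_migratepages|compaction_alloc|compact_zone/ { compact += $1 }
     END { printf "spinning: %.1f%%  compaction: %.1f%%\n", spin, compact }' \
    perf-top-snapshot.txt
```

If compaction really is the contended path, a further (hedged) experiment is to relax THP defragmentation via the standard sysfs knob, e.g. `echo madvise > /sys/kernel/mm/transparent_hugepage/defrag`, and see whether slow boots still reproduce.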
* Re: [Qemu-devel] Windows slow boot: contractor wanted @ 2012-08-21 15:21 ` Richard Davies 0 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-08-21 15:21 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm Avi Kivity wrote: > Richard Davies wrote: > > We're running host kernel 3.5.1 and qemu-kvm 1.1.1. > > > > I hadn't though about it, but I agree this is related to cpu overcommit. The > > slow boots are intermittent (and infrequent) with cpu overcommit whereas I > > don't think it occurs without cpu overcommit. > > > > In addition, if there is a slow boot ongoing, and you kill some other VMs to > > reduce cpu overcommit then this will sometimes speed it up. > > > > I guess the question is why even with overcommit most boots are fine, but > > some small fraction then go slow? > > Could be a bug. The scheduler and the spin-loop handling code fight > each other instead of working well. > > Please provide snapshots of 'perf top' while a slow boot is in progress. Below are two 'perf top' snapshots during a slow boot, which appear to me to support your idea of a spin-lock problem. There are a lot more "unprocessable samples recorded" messages at the end of each snapshot which I haven't included. I think these may be from the guest OS - the kernel is listed, and qemu-kvm itself is listed on some other traces which I did, although not these. Richard. 
   PerfTop: 62249 irqs/sec  kernel:96.9%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    35.80%  [kernel]         [k] _raw_spin_lock_irqsave
    21.64%  [kernel]         [k] isolate_freepages_block
     5.91%  [kernel]         [k] yield_to
     4.95%  [kernel]         [k] _raw_spin_lock
     3.37%  [kernel]         [k] kvm_vcpu_on_spin
     2.74%  [kernel]         [k] add_preempt_count
     2.45%  [kernel]         [k] _raw_spin_unlock
     2.33%  [kernel]         [k] sub_preempt_count
     2.18%  [kernel]         [k] svm_vcpu_run
     2.17%  [kernel]         [k] kvm_vcpu_yield_to
     1.89%  [kernel]         [k] memcmp
     1.50%  [kernel]         [k] get_pid_task
     1.26%  [kernel]         [k] kvm_arch_vcpu_ioctl_run
     1.16%  [kernel]         [k] pid_task
     0.70%  [kernel]         [k] rcu_note_context_switch
     0.70%  [kernel]         [k] trace_hardirqs_on
     0.52%  [kernel]         [k] __rcu_read_unlock
     0.51%  [kernel]         [k] trace_preempt_on
     0.47%  [kernel]         [k] __srcu_read_lock
     0.43%  [kernel]         [k] get_parent_ip
     0.42%  [kernel]         [k] get_pageblock_flags_group
     0.38%  [kernel]         [k] in_lock_functions
     0.34%  [kernel]         [k] trace_preempt_off
     0.34%  [kernel]         [k] trace_hardirqs_off
     0.29%  [kernel]         [k] clear_page_c
     0.23%  [kernel]         [k] __srcu_read_unlock
     0.20%  [kernel]         [k] __rcu_read_lock
     0.14%  [kernel]         [k] handle_exit
     0.11%  libc-2.10.1.so   [.] strcmp
     0.11%  [kernel]         [k] _raw_spin_unlock_irqrestore
     0.11%  [kernel]         [k] _raw_spin_lock_irq
     0.11%  [kernel]         [k] find_highest_vector
     0.09%  [kernel]         [k] ktime_get
     0.08%  [kernel]         [k] copy_page_c
     0.08%  [kernel]         [k] pause_interception
     0.08%  [kernel]         [k] kmem_cache_alloc
     0.08%  [kernel]         [k] resched_task
     0.08%  perf             [.] dso__find_symbol
     0.06%  [kernel]         [k] compaction_alloc
     0.06%  libc-2.10.1.so   [.] 0x0000000000076dab
     0.06%  [kernel]         [k] read_tsc
     0.06%  perf             [.] add_hist_entry
     0.05%  [kernel]         [k] svm_read_l1_tsc
     0.05%  [kernel]         [k] native_read_tsc
     0.05%  perf             [.] sort__dso_cmp
     0.05%  [kernel]         [k] copy_user_generic_string
     0.05%  [kernel]         [k] ktime_get_update_offsets
     0.04%  [kernel]         [k] kvm_check_async_pf_completion
     0.04%  [kernel]         [k] __schedule
     0.04%  [kernel]         [k] __rcu_pending
     0.04%  [kernel]         [k] svm_complete_interrupts
     0.04%  [kernel]         [k] perf_pmu_disable
     0.04%  [kernel]         [k] isolate_migratepages_range
     0.04%  [kernel]         [k] sched_clock_cpu
     0.04%  [kernel]         [k] kvm_cpu_has_pending_timer
     0.04%  [kernel]         [k] apic_timer_interrupt
     0.04%  [vdso]           [.] 0x00007fff2e1ff607
     0.04%  [kernel]         [k] apic_update_ppr
     0.04%  [kernel]         [k] do_select
     0.04%  [kernel]         [k] svm_scale_tsc
     0.04%  [kernel]         [k] system_call_after_swapgs
     0.03%  [kernel]         [k] kvm_lapic_get_cr8
     0.03%  perf             [.] sort__sym_cmp
     0.03%  [kernel]         [k] find_next_bit
     0.03%  [kernel]         [k] kvm_set_cr8
     0.03%  [kernel]         [k] rcu_check_callbacks

9750 unprocessable samples recorded.9751 unprocessable samples recorded.9752 unprocessable samples recorded.9753 unprocessable samples recorded.9754 unprocessable samples recorded.9755 unprocessable samples recorded.9756 unprocessable samples recorded.9757 unprocessable samples recorded.9758 unprocessable samples recorded.9759 unprocessable samples recorded.9760 unprocessable samples recorded.9761 unprocessable samples recorded.9762 unprocessable samples recorded.9763 unprocessable samples recorded.
   PerfTop: 61584 irqs/sec  kernel:97.4%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    36.73%  [kernel]         [k] _raw_spin_lock_irqsave
    19.00%  [kernel]         [k] isolate_freepages_block
     5.80%  [kernel]         [k] yield_to
     5.23%  [kernel]         [k] _raw_spin_lock
     3.97%  [kernel]         [k] kvm_vcpu_on_spin
     2.98%  [kernel]         [k] add_preempt_count
     2.45%  [kernel]         [k] sub_preempt_count
     2.37%  [kernel]         [k] _raw_spin_unlock
     2.22%  [kernel]         [k] svm_vcpu_run
     2.19%  [kernel]         [k] kvm_vcpu_yield_to
     1.90%  [kernel]         [k] memcmp
     1.54%  [kernel]         [k] get_pid_task
     1.39%  [kernel]         [k] kvm_arch_vcpu_ioctl_run
     1.30%  [kernel]         [k] pid_task
     0.75%  [kernel]         [k] rcu_note_context_switch
     0.74%  [kernel]         [k] trace_hardirqs_on
     0.58%  [kernel]         [k] __rcu_read_unlock
     0.55%  [kernel]         [k] trace_preempt_on
     0.47%  [kernel]         [k] __srcu_read_lock
     0.44%  [kernel]         [k] get_parent_ip
     0.41%  [kernel]         [k] clear_page_c
     0.40%  [kernel]         [k] get_pageblock_flags_group
     0.39%  [kernel]         [k] in_lock_functions
     0.36%  [kernel]         [k] trace_preempt_off
     0.35%  [kernel]         [k] trace_hardirqs_off
     0.23%  [kernel]         [k] __srcu_read_unlock
     0.20%  [kernel]         [k] __rcu_read_lock
     0.15%  [kernel]         [k] _raw_spin_lock_irq
     0.14%  [kernel]         [k] handle_exit
     0.12%  [kernel]         [k] find_highest_vector
     0.11%  [kernel]         [k] resched_task
     0.10%  libc-2.10.1.so   [.] strcmp
     0.09%  [kernel]         [k] _raw_spin_unlock_irqrestore
     0.09%  [kernel]         [k] ktime_get
     0.08%  [kernel]         [k] pause_interception
     0.08%  [kernel]         [k] copy_page_c
     0.07%  [kernel]         [k] __schedule
     0.07%  [kernel]         [k] compact_zone
     0.07%  perf             [.] dso__find_symbol
     0.06%  perf             [.] add_hist_entry
     0.06%  [kernel]         [k] read_tsc
     0.06%  [kernel]         [k] svm_read_l1_tsc
     0.05%  [kernel]         [k] native_read_tsc
     0.05%  [kernel]         [k] ktime_get_update_offsets
     0.05%  [kernel]         [k] compaction_alloc
     0.05%  libc-2.10.1.so   [.] 0x0000000000073ae0
     0.05%  [kernel]         [k] kmem_cache_alloc
     0.05%  [kernel]         [k] svm_complete_interrupts
     0.05%  [kernel]         [k] kvm_check_async_pf_completion
     0.05%  [kernel]         [k] apic_timer_interrupt
     0.05%  perf             [.] sort__dso_cmp
     0.05%  [kernel]         [k] kvm_cpu_has_pending_timer
     0.04%  [kernel]         [k] svm_scale_tsc
     0.04%  [kernel]         [k] isolate_migratepages_range
     0.04%  [kernel]         [k] sched_clock_cpu
     0.04%  [kernel]         [k] __rcu_pending
     0.04%  [kernel]         [k] apic_update_ppr
     0.04%  [kernel]         [k] do_select
     0.04%  [kernel]         [k] perf_pmu_disable
     0.04%  [kernel]         [k] kvm_set_cr8
     0.04%  [kernel]         [k] update_curr
     0.04%  [kernel]         [k] reschedule_interrupt
     0.03%  [kernel]         [k] kvm_lapic_get_cr8
     0.03%  libc-2.10.1.so   [.] strstr
     0.03%  [kernel]         [k] apic_has_pending_timer
     0.03%  perf             [.] sort__sym_cmp

4963 unprocessable samples recorded.4964 unprocessable samples recorded.4965 unprocessable samples recorded.4966 unprocessable samples recorded.4967 unprocessable samples recorded.4968 unprocessable samples recorded.4969 unprocessable samples recorded.4970 unprocessable samples recorded.4971 unprocessable samples recorded.4972 unprocessable samples recorded.4973 unprocessable samples recorded.4974 unprocessable samples recorded.4975 unprocessable samples recorded.

^ permalink raw reply [flat|nested] 101+ messages in thread
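[Editorial note: for anyone trying to reproduce snapshots like the two above, here is one non-interactive way to capture them. This is a sketch, not part of the original thread; the sample duration and output length are arbitrary choices.]

```shell
# Record 10 seconds of system-wide cycle samples while a slow boot is
# in progress, then print the top symbols sorted by DSO and symbol.
perf record -a -e cycles -F 4000 -- sleep 10
perf report --stdio --sort dso,symbol | head -n 50

# Alternatively, run 'perf top' itself in plain-text mode so its output
# can be redirected into a log file.
perf top --stdio -e cycles -F 4000
```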
* Re: [Qemu-devel] Windows slow boot: contractor wanted 2012-08-21 15:21 ` [Qemu-devel] " Richard Davies @ 2012-08-21 15:39 ` Troy Benjegerdes
-1 siblings, 0 replies; 101+ messages in thread
From: Troy Benjegerdes @ 2012-08-21 15:39 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm

Do you have any way to determine which CPU groups the different VMs are
running on? If you end up in an overcommit situation where half the
'virtual' cpus are on one AMD socket, and the other half are on a different
AMD socket, then you'll be thrashing the HyperTransport link.

At Cray we were very careful never to overcommit runnable processes to
CPUs, and generally locked processes to a single cpu. Have a read of
http://berrange.com/posts/2010/02/12/controlling-guest-cpu-numa-affinity-in-libvirt-with-qemu-kvm-xen/

I'm going to speculate that when things don't work very well, you end up with
memory from a booting guest scattered across many different NUMA nodes/cpus,
and then it really won't matter how good the spin loop/scheduler code is,
because you are bound by the additional latency and bandwidth limitations of
running on one socket and accessing half the memory that's resident on a
different socket.

On Tue, Aug 21, 2012 at 04:21:07PM +0100, Richard Davies wrote:
> Avi Kivity wrote:
> > Richard Davies wrote:
> > > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
> > >
> > > I hadn't thought about it, but I agree this is related to cpu overcommit. The
> > > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
> > > don't think it occurs without cpu overcommit.
> > >
> > > In addition, if there is a slow boot ongoing, and you kill some other VMs to
> > > reduce cpu overcommit then this will sometimes speed it up.
> > >
> > > I guess the question is why even with overcommit most boots are fine, but
> > > some small fraction then go slow?
> >
> > Could be a bug. The scheduler and the spin-loop handling code fight
> > each other instead of working well.
> >
> > Please provide snapshots of 'perf top' while a slow boot is in progress.
>
> Below are two 'perf top' snapshots during a slow boot, which appear to me to
> support your idea of a spin-lock problem.
[...]

^ permalink raw reply [flat|nested] 101+ messages in thread
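[Editorial note: Troy's NUMA-affinity suggestion can be tested without libvirt. This is a sketch; the node numbers, CPU list, and PID below are placeholders - check `numactl --hardware` for the real topology of the host.]

```shell
# Show the host's NUMA topology: nodes, their CPUs and memory sizes.
numactl --hardware

# Launch a guest with both its vCPU threads and its memory bound to
# node 0, so all vCPUs share one socket and no cross-socket
# HyperTransport traffic is needed for guest memory.
numactl --cpunodebind=0 --membind=0 \
    qemu-kvm -nodefaults -m 4096 -smp 8 -hda test.img

# Or restrict an already-running VM (PID 1234 is a placeholder) and
# all of its threads to CPUs 0-7.
taskset -a -c -p 0-7 1234
```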
* Re: Windows slow boot: contractor wanted 2012-08-21 15:21 ` [Qemu-devel] " Richard Davies @ 2012-08-22 9:08 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-22 9:08 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel

On 08/21/2012 06:21 PM, Richard Davies wrote:
> Avi Kivity wrote:
>> Richard Davies wrote:
>> > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
>> >
>> > I hadn't thought about it, but I agree this is related to cpu overcommit. The
>> > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
>> > don't think it occurs without cpu overcommit.
>> >
>> > In addition, if there is a slow boot ongoing, and you kill some other VMs to
>> > reduce cpu overcommit then this will sometimes speed it up.
>> >
>> > I guess the question is why even with overcommit most boots are fine, but
>> > some small fraction then go slow?
>>
>> Could be a bug. The scheduler and the spin-loop handling code fight
>> each other instead of working well.
>>
>> Please provide snapshots of 'perf top' while a slow boot is in progress.
>
> Below are two 'perf top' snapshots during a slow boot, which appear to me to
> support your idea of a spin-lock problem.
>
> There are a lot more "unprocessable samples recorded" messages at the end of
> each snapshot which I haven't included. I think these may be from the guest
> OS - the kernel is listed, and qemu-kvm itself is listed on some other
> traces which I did, although not these.
>
> Richard.
>
> PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------
>
> 35.80% [kernel] [k] _raw_spin_lock_irqsave
> 21.64% [kernel] [k] isolate_freepages_block

Please disable ksm, and if this function persists in the profile, reduce
some memory from the guests.

> 5.91% [kernel] [k] yield_to
> 4.95% [kernel] [k] _raw_spin_lock
> 3.37% [kernel] [k] kvm_vcpu_on_spin

Except for isolate_freepages_block, all functions up to here have to do
with dealing with cpu overcommit. But let's deal with them after we see
a profile with isolate_freepages_block removed.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 101+ messages in thread
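[Editorial note: a sketch of how Avi's "disable ksm" suggestion is applied on a typical host. The sysfs paths are the standard KSM interface; the commands need root.]

```shell
# Stop the KSM scanning thread and unmerge pages it has already shared.
# Values: 0 = stop scanning, 1 = run, 2 = stop and unmerge everything.
echo 2 > /sys/kernel/mm/ksm/run

# Confirm KSM is off, and see how much memory had been shared.
cat /sys/kernel/mm/ksm/run
grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing
```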
* Re: Windows slow boot: contractor wanted 2012-08-22 9:08 ` [Qemu-devel] " Avi Kivity @ 2012-08-22 12:40 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-22 12:40 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel

Avi Kivity wrote:
> Richard Davies wrote:
> > Below are two 'perf top' snapshots during a slow boot, which appear to
> > me to support your idea of a spin-lock problem.
...
> > PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> > --------------------------------------------------------------------------------
> >
> > 35.80% [kernel] [k] _raw_spin_lock_irqsave
> > 21.64% [kernel] [k] isolate_freepages_block
>
> Please disable ksm, and if this function persists in the profile, reduce
> some memory from the guests.
>
> > 5.91% [kernel] [k] yield_to
> > 4.95% [kernel] [k] _raw_spin_lock
> > 3.37% [kernel] [k] kvm_vcpu_on_spin
>
> Except for isolate_freepages_block, all functions up to here have to do
> with dealing with cpu overcommit. But let's deal with them after we see
> a profile with isolate_freepages_block removed.

I can trigger the slow boots without KSM and they have the same profile,
with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
post again when I get one.
In the slowest boot that I have so far (1-2 minutes), this is the perf top
output:

   PerfTop: 26741 irqs/sec  kernel:97.5%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    53.94%  [kernel]         [k] clear_page_c
     2.77%  [kernel]         [k] svm_vcpu_put
     2.60%  [kernel]         [k] svm_vcpu_run
     1.79%  [kernel]         [k] sub_preempt_count
     1.56%  [kernel]         [k] svm_vcpu_load
     1.44%  [kernel]         [k] __schedule
     1.36%  [kernel]         [k] kvm_arch_vcpu_ioctl_run
     1.34%  [kernel]         [k] resched_task
     1.32%  [kernel]         [k] _raw_spin_lock
     0.98%  [kernel]         [k] trace_preempt_on
     0.95%  [kernel]         [k] get_parent_ip
     0.94%  [kernel]         [k] yield_to
     0.88%  [kernel]         [k] __switch_to
     0.87%  [kernel]         [k] get_page_from_freelist
     0.81%  [kernel]         [k] in_lock_functions
     0.76%  [kernel]         [k] add_preempt_count
     0.72%  [kernel]         [k] kvm_vcpu_on_spin
     0.69%  [kernel]         [k] free_pages_prepare
     0.59%  [kernel]         [k] find_highest_vector
     0.57%  [kernel]         [k] rcu_note_context_switch
     0.55%  [kernel]         [k] paging64_walk_addr_generic
     0.54%  [kernel]         [k] __srcu_read_lock
     0.49%  [kernel]         [k] trace_preempt_off
     0.47%  [kernel]         [k] reschedule_interrupt
     0.45%  [kernel]         [k] sched_clock_cpu
     0.40%  [kernel]         [k] trace_hardirqs_on
     0.38%  [kernel]         [k] clear_huge_page
     0.37%  [kernel]         [k] prep_compound_page
     0.32%  [kernel]         [k] x86_emulate_instruction
     0.32%  [kernel]         [k] _raw_spin_lock_irq
     0.31%  [kernel]         [k] __srcu_read_unlock
     0.31%  [kernel]         [k] trace_hardirqs_off
     0.30%  [kernel]         [k] pick_next_task_fair
     0.29%  [kernel]         [k] kvm_find_cpuid_entry
     0.28%  [kernel]         [k] x86_decode_insn
     0.26%  [kernel]         [k] kvm_cpu_has_pending_timer
     0.26%  [kernel]         [k] init_emulate_ctxt
     0.25%  [kernel]         [k] kvm_vcpu_yield_to
     0.24%  [kernel]         [k] clear_buddies
     0.24%  [kernel]         [k] gs_change
     0.23%  [kernel]         [k] handle_exit
     0.22%  qemu-kvm         [.] vnc_refresh_server_surface
     0.22%  [kernel]         [k] update_min_vruntime
     0.22%  [kernel]         [k] gfn_to_memslot
     0.22%  [kernel]         [k] x86_emulate_insn
     0.19%  [kernel]         [k] kvm_sched_out
     0.19%  [kernel]         [k] pid_task
     0.18%  [kernel]         [k] _raw_spin_unlock
     0.18%  libc-2.10.1.so   [.] strcmp
     0.17%  [kernel]         [k] get_pid_task
     0.17%  [kernel]         [k] yield_task_fair
     0.17%  [kernel]         [k] default_send_IPI_mask_sequence_phys
     0.16%  [kernel]         [k] __rcu_read_unlock
     0.16%  [kernel]         [k] kvm_get_cr8
     0.16%  [kernel]         [k] native_sched_clock
     0.16%  [kernel]         [k] do_insn_fetch
     0.15%  [kernel]         [k] set_next_entity
     0.14%  [kernel]         [k] update_rq_clock
     0.14%  [kernel]         [k] __enqueue_entity
     0.14%  [kernel]         [k] kvm_read_guest
     0.13%  qemu-kvm         [.] g_hash_table_lookup
     0.13%  [kernel]         [k] rb_erase
     0.12%  [kernel]         [k] decode_operand
     0.12%  libz.so.1.2.3    [.] 0x0000000000006451
     0.12%  [kernel]         [k] update_curr
     0.12%  [kernel]         [k] apic_update_ppr
     0.12%  [kernel]         [k] ktime_get

5207 unprocessable samples recorded.5208 unprocessable samples recorded.5209 unprocessable samples recorded.5210 unprocessable samples recorded.5211 unprocessable samples recorded.5212 unprocessable samples recorded.5213 unprocessable samples recorded.5214 unprocessable samples recorded.5215 unprocessable samples recorded.5216 unprocessable samples recorded.5217 unprocessable samples recorded.5218 unprocessable samples recorded.5219 unprocessable samples recorded.5220 unprocessable samples recorded.5221 unprocessable samples recorded.5222 unprocessable samples recorded.5223 unprocessable samples recorded.5224

Thanks,

Richard.

^ permalink raw reply [flat|nested] 101+ messages in thread
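[Editorial note: isolate_freepages_block, prominent in the earlier profiles, is the free-page scanner of the kernel's memory compaction code, which at the time was driven mainly by transparent hugepage (THP) allocation during the guest's initial memory touch. A hedged way to check whether THP-driven compaction is implicated - sysfs paths are the standard ones, but the available values vary by kernel version:]

```shell
# Inspect the current THP and defrag policy ([brackets] mark the
# active setting).
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Temporarily stop synchronous compaction on hugepage allocation,
# then retry the slow boot and compare.
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Watch compaction activity counters while a guest boots.
grep compact /proc/vmstat
```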
* Re: [Qemu-devel] Windows slow boot: contractor wanted
From: Richard Davies @ 2012-08-22 12:40 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm

Avi Kivity wrote:
> Richard Davies wrote:
> > Below are two 'perf top' snapshots during a slow boot, which appear to
> > me to support your idea of a spin-lock problem.
...
> >    PerfTop: 62249 irqs/sec  kernel:96.9%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> > --------------------------------------------------------------------------------
> >
> >     35.80%  [kernel]  [k] _raw_spin_lock_irqsave
> >     21.64%  [kernel]  [k] isolate_freepages_block
>
> Please disable ksm, and if this function persists in the profile, reduce
> some memory from the guests.
>
> >      5.91%  [kernel]  [k] yield_to
> >      4.95%  [kernel]  [k] _raw_spin_lock
> >      3.37%  [kernel]  [k] kvm_vcpu_on_spin
>
> Except for isolate_freepages_block, all functions up to here have to do
> with dealing with cpu overcommit. But let's deal with them after we see
> a profile with isolate_freepages_block removed.

I can trigger the slow boots without KSM and they have the same profile,
with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
post again when I get one.

In the slowest boot that I have so far (1-2 minutes), this is the perf top
output:

   PerfTop: 26741 irqs/sec  kernel:97.5%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    53.94%  [kernel]        [k] clear_page_c
     2.77%  [kernel]        [k] svm_vcpu_put
     2.60%  [kernel]        [k] svm_vcpu_run
     1.79%  [kernel]        [k] sub_preempt_count
     1.56%  [kernel]        [k] svm_vcpu_load
     1.44%  [kernel]        [k] __schedule
     1.36%  [kernel]        [k] kvm_arch_vcpu_ioctl_run
     1.34%  [kernel]        [k] resched_task
     1.32%  [kernel]        [k] _raw_spin_lock
     0.98%  [kernel]        [k] trace_preempt_on
     0.95%  [kernel]        [k] get_parent_ip
     0.94%  [kernel]        [k] yield_to
     0.88%  [kernel]        [k] __switch_to
     0.87%  [kernel]        [k] get_page_from_freelist
     0.81%  [kernel]        [k] in_lock_functions
     0.76%  [kernel]        [k] add_preempt_count
     0.72%  [kernel]        [k] kvm_vcpu_on_spin
     0.69%  [kernel]        [k] free_pages_prepare
     0.59%  [kernel]        [k] find_highest_vector
     0.57%  [kernel]        [k] rcu_note_context_switch
     0.55%  [kernel]        [k] paging64_walk_addr_generic
     0.54%  [kernel]        [k] __srcu_read_lock
     0.49%  [kernel]        [k] trace_preempt_off
     0.47%  [kernel]        [k] reschedule_interrupt
     0.45%  [kernel]        [k] sched_clock_cpu
     0.40%  [kernel]        [k] trace_hardirqs_on
     0.38%  [kernel]        [k] clear_huge_page
     0.37%  [kernel]        [k] prep_compound_page
     0.32%  [kernel]        [k] x86_emulate_instruction
     0.32%  [kernel]        [k] _raw_spin_lock_irq
     0.31%  [kernel]        [k] __srcu_read_unlock
     0.31%  [kernel]        [k] trace_hardirqs_off
     0.30%  [kernel]        [k] pick_next_task_fair
     0.29%  [kernel]        [k] kvm_find_cpuid_entry
     0.28%  [kernel]        [k] x86_decode_insn
     0.26%  [kernel]        [k] kvm_cpu_has_pending_timer
     0.26%  [kernel]        [k] init_emulate_ctxt
     0.25%  [kernel]        [k] kvm_vcpu_yield_to
     0.24%  [kernel]        [k] clear_buddies
     0.24%  [kernel]        [k] gs_change
     0.23%  [kernel]        [k] handle_exit
     0.22%  qemu-kvm        [.] vnc_refresh_server_surface
     0.22%  [kernel]        [k] update_min_vruntime
     0.22%  [kernel]        [k] gfn_to_memslot
     0.22%  [kernel]        [k] x86_emulate_insn
     0.19%  [kernel]        [k] kvm_sched_out
     0.19%  [kernel]        [k] pid_task
     0.18%  [kernel]        [k] _raw_spin_unlock
     0.18%  libc-2.10.1.so  [.] strcmp
     0.17%  [kernel]        [k] get_pid_task
     0.17%  [kernel]        [k] yield_task_fair
     0.17%  [kernel]        [k] default_send_IPI_mask_sequence_phys
     0.16%  [kernel]        [k] __rcu_read_unlock
     0.16%  [kernel]        [k] kvm_get_cr8
     0.16%  [kernel]        [k] native_sched_clock
     0.16%  [kernel]        [k] do_insn_fetch
     0.15%  [kernel]        [k] set_next_entity
     0.14%  [kernel]        [k] update_rq_clock
     0.14%  [kernel]        [k] __enqueue_entity
     0.14%  [kernel]        [k] kvm_read_guest
     0.13%  qemu-kvm        [.] g_hash_table_lookup
     0.13%  [kernel]        [k] rb_erase
     0.12%  [kernel]        [k] decode_operand
     0.12%  libz.so.1.2.3   [.] 0x0000000000006451
     0.12%  [kernel]        [k] update_curr
     0.12%  [kernel]        [k] apic_update_ppr
     0.12%  [kernel]        [k] ktime_get

Thanks,

Richard.
* Re: Windows slow boot: contractor wanted
From: Avi Kivity @ 2012-08-22 12:44 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel

On 08/22/2012 03:40 PM, Richard Davies wrote:
> I can trigger the slow boots without KSM and they have the same profile,
> with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
>
> I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
> VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
> post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

> In the slowest boot that I have so far (1-2 minutes), this is the perf top
> output:
>
>    PerfTop: 26741 irqs/sec  kernel:97.5%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------
>
>     53.94%  [kernel]  [k] clear_page_c
>      2.77%  [kernel]  [k] svm_vcpu_put
>      2.60%  [kernel]  [k] svm_vcpu_run
>      1.79%  [kernel]  [k] sub_preempt_count
>      1.56%  [kernel]  [k] svm_vcpu_load
>      1.44%  [kernel]  [k] __schedule
>      1.36%  [kernel]  [k] kvm_arch_vcpu_ioctl_run
>      1.34%  [kernel]  [k] resched_task
>      1.32%  [kernel]  [k] _raw_spin_lock
>      0.98%  [kernel]  [k] trace_preempt_on
>      0.95%  [kernel]  [k] get_parent_ip
>      0.94%  [kernel]  [k] yield_to

This is pretty normal: Windows is touching memory, so clear_page_c() is
called to scrub it.

--
error compiling committee.c: too many arguments to function
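A rough back-of-the-envelope check supports Avi's point that clear_page_c scrubbing is normal but also shows it cannot by itself explain multi-hour boots. The zeroing bandwidth below is an illustrative assumption, not a measured figure for this Opteron host:

```python
# Sketch: how long should first-touch page zeroing take for one guest?
# The 8 GB/s single-stream zeroing rate is an ASSUMED figure for illustration.
GIB = 1024 ** 3

def zeroing_seconds(guest_mem_gib: float, bandwidth_bytes_per_s: float) -> float:
    """Time to zero-fill guest_mem_gib of RAM at the given bandwidth."""
    return guest_mem_gib * GIB / bandwidth_bytes_per_s

# A 36 GiB guest at an assumed 8 GB/s:
print(round(zeroing_seconds(36, 8e9), 1))  # -> 4.8
```

Even at a pessimistic fraction of that bandwidth this is seconds, not hours, which is why the interesting profiles are the ones dominated by _raw_spin_lock_irqsave and isolate_freepages_block rather than this one.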
* Re: Windows slow boot: contractor wanted
From: Richard Davies @ 2012-08-22 14:41 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel

Avi Kivity wrote:
> Richard Davies wrote:
> > I can trigger the slow boots without KSM and they have the same profile,
> > with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
> >
> > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
> > VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
> > post again when I get one.
>
> I think you can go higher than that. But 120GB on a 128GB host is
> pushing it.

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
(i.e. 108GB on a 128GB host).

It has the same profile with _raw_spin_lock_irqsave and
isolate_freepages_block at the top.

Richard.
* Re: Windows slow boot: contractor wanted
From: Avi Kivity @ 2012-08-22 14:53 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel

On 08/22/2012 05:41 PM, Richard Davies wrote:
> Avi Kivity wrote:
>> Richard Davies wrote:
>> > I can trigger the slow boots without KSM and they have the same profile,
>> > with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
>> >
>> > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
>> > VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
>> > post again when I get one.
>>
>> I think you can go higher than that. But 120GB on a 128GB host is
>> pushing it.
>
> I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> (i.e. 108GB on a 128GB host).
>
> It has the same profile with _raw_spin_lock_irqsave and
> isolate_freepages_block at the top.

Then it's still memory starved.

Please provide /proc/zoneinfo while this is happening.

--
error compiling committee.c: too many arguments to function
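For readers following along, the relevant numbers in /proc/zoneinfo are each zone's free page count against its min/low/high watermarks. A minimal, illustrative parser (the regex is written to tolerate the whitespace-mangled dumps in this thread, not the canonical one-field-per-line format) might look like:

```python
import re

# A fragment in the style of the zoneinfo dumps posted later in this thread.
SAMPLE = (
    "Node 0, zone DMA pages free 3968 min 3 low 3 high 4 scanned 0\n"
    "Node 2, zone Normal pages free 11632 min 8448 low 10560 high 12672 scanned 3\n"
)

def zone_watermarks(text):
    """Map (node, zone) -> free page count and min/low/high watermarks."""
    pat = re.compile(
        r"Node (\d+), zone\s+(\w+).*?pages free\s+(\d+)"
        r"\s+min\s+(\d+)\s+low\s+(\d+)\s+high\s+(\d+)",
        re.S)
    out = {}
    for node, zone, free, mn, lo, hi in pat.findall(text):
        out[(int(node), zone)] = {"free": int(free), "min": int(mn),
                                  "low": int(lo), "high": int(hi)}
    return out

wm = zone_watermarks(SAMPLE)
print(wm[(2, "Normal")]["free"], wm[(2, "Normal")]["low"])  # -> 11632 10560
```

A zone whose free count hovers at or below its low watermark keeps kswapd (and, with transparent hugepages, compaction) busy, which is what the isolate_freepages_block samples in the earlier profiles suggest.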
* Re: Windows slow boot: contractor wanted
From: Richard Davies @ 2012-08-22 15:26 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel

Avi Kivity wrote:
> Richard Davies wrote:
> > Avi Kivity wrote:
> > > Richard Davies wrote:
> > > > I can trigger the slow boots without KSM and they have the same
> > > > profile, with _raw_spin_lock_irqsave and isolate_freepages_block at
> > > > the top.
> > > >
> > > > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB
> > > > 8-core VMs), and haven't managed to get a really slow boot yet (>5
> > > > minutes). I'll post again when I get one.
> > >
> > > I think you can go higher than that. But 120GB on a 128GB host is
> > > pushing it.
> >
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB
> > host (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> Then it's still memory starved.
>
> Please provide /proc/zoneinfo while this is happening.

Here are two copies of /proc/zoneinfo, taken a minute or so apart, while 3x
36GB 8-core VMs were running on a 128GB host and two of the three VMs were
slow booting.
Node 0, zone DMA
  pages free 3968  min 3  low 3  high 4  scanned 0  spanned 4080  present 3904
  nr_free_pages 3968  nr_inactive_anon 0  nr_active_anon 0  nr_inactive_file 0  nr_active_file 0  nr_unevictable 0  nr_mlock 0  nr_anon_pages 0  nr_mapped 0  nr_file_pages 0  nr_dirty 0  nr_writeback 0  nr_slab_reclaimable 0  nr_slab_unreclaimable 0  nr_page_table_pages 0  nr_kernel_stack 0  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 0  nr_dirtied 0  nr_written 0
  numa_hit 0  numa_miss 0  numa_foreign 0  numa_interleave 0  numa_local 0  numa_other 0  nr_anon_transparent_hugepages 0
  protection: (0, 3502, 32230, 32230)
  pagesets (all cpus high: 0  batch: 1  vm stats threshold: 10): count 0 on all 16 cpus
  all_unreclaimable: 1  start_pfn: 16  inactive_ratio: 1

Node 0, zone DMA32
  pages free 29798  min 917  low 1146  high 1375  scanned 0  spanned 1044480  present 896720
  nr_free_pages 29798  nr_inactive_anon 0  nr_active_anon 817152  nr_inactive_file 29243  nr_active_file 574  nr_unevictable 0  nr_mlock 0  nr_anon_pages 0  nr_mapped 1  nr_file_pages 29817  nr_dirty 0  nr_writeback 0  nr_slab_reclaimable 26  nr_slab_unreclaimable 2  nr_page_table_pages 244  nr_kernel_stack 0  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 0  nr_dirtied 30546  nr_written 30546
  numa_hit 42617  numa_miss 124755  numa_foreign 0  numa_interleave 0  numa_local 42023  numa_other 125349  nr_anon_transparent_hugepages 1596
  protection: (0, 0, 28728, 28728)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 60): count 0 on all 16 cpus
  all_unreclaimable: 0  start_pfn: 4096  inactive_ratio: 5

Node 0, zone Normal
  pages free 292707  min 7524  low 9405  high 11286  scanned 0  spanned 7471104  present 7354368
  nr_free_pages 292707  nr_inactive_anon 281  nr_active_anon 3024092  nr_inactive_file 1824853  nr_active_file 2050217  nr_unevictable 22  nr_mlock 22  nr_anon_pages 5103  nr_mapped 570  nr_file_pages 3875107  nr_dirty 1  nr_writeback 0  nr_slab_reclaimable 99328  nr_slab_unreclaimable 2701  nr_page_table_pages 8153  nr_kernel_stack 127  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 8  nr_dirtied 4910752  nr_written 4910735
  numa_hit 11010852  numa_miss 973848  numa_foreign 6137099  numa_interleave 14102  numa_local 11003048  numa_other 981652  nr_anon_transparent_hugepages 5898
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 0: 29, cpu 1: 2, cpu 2: 46, cpu 3: 26, cpu 9: 18, cpu 14: 1
  all_unreclaimable: 0  start_pfn: 1048576  inactive_ratio: 16

Node 1, zone Normal
  pages free 23288  min 8448  low 10560  high 12672  scanned 0  spanned 8388608  present 8257536
  nr_free_pages 23288  nr_inactive_anon 361430  nr_active_anon 5925377  nr_inactive_file 1779378  nr_active_file 76158  nr_unevictable 444  nr_mlock 444  nr_anon_pages 603  nr_mapped 990  nr_file_pages 1855911  nr_dirty 3  nr_writeback 0  nr_slab_reclaimable 60961  nr_slab_unreclaimable 1404  nr_page_table_pages 10197  nr_kernel_stack 22  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 97  nr_shmem 5  nr_dirtied 5000958  nr_written 5000955
  numa_hit 4879358  numa_miss 4315336  numa_foreign 1710349  numa_interleave 14052  numa_local 4860081  numa_other 4334613  nr_anon_transparent_hugepages 12277
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 4: 30, cpu 5: 88, cpu 6: 176, cpu 9: 179
  all_unreclaimable: 0  start_pfn: 8519680  inactive_ratio: 17

Node 2, zone Normal
  pages free 11632  min 8448  low 10560  high 12672  scanned 3  spanned 8388608  present 8257536
  nr_free_pages 11632  nr_inactive_anon 368719  nr_active_anon 6009871  nr_inactive_file 1721022  nr_active_file 47969  nr_unevictable 74  nr_mlock 74  nr_anon_pages 6741  nr_mapped 1678  nr_file_pages 1769100  nr_dirty 3  nr_writeback 0  nr_slab_reclaimable 31690  nr_slab_unreclaimable 1547  nr_page_table_pages 13178  nr_kernel_stack 52  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 5  nr_dirtied 3264512  nr_written 3264506
  numa_hit 3701723  numa_miss 3141775  numa_foreign 768925  numa_interleave 14093  numa_local 3685078  numa_other 3158420  nr_anon_transparent_hugepages 12446
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 2: 2, cpu 9: 172, cpu 13: 30, cpu 14: 47, cpu 15: 30
  all_unreclaimable: 0  start_pfn: 16908288  inactive_ratio: 17

Node 3, zone Normal
  pages free 42611  min 8448  low 10560  high 12672  scanned 0  spanned 8388608  present 8257536
  nr_free_pages 42611  nr_inactive_anon 273  nr_active_anon 5728983  nr_inactive_file 1787163  nr_active_file 638839  nr_unevictable 79  nr_mlock 79  nr_anon_pages 2091  nr_mapped 670  nr_file_pages 2426028  nr_dirty 0  nr_writeback 0  nr_slab_reclaimable 27949  nr_slab_unreclaimable 1417  nr_page_table_pages 12372  nr_kernel_stack 28  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 1  nr_dirtied 2734460  nr_written 2734448
  numa_hit 5026640  numa_miss 1501721  numa_foreign 1441062  numa_interleave 14050  numa_local 5005951  numa_other 1522410  nr_anon_transparent_hugepages 11186
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 2: 14, cpu 8: 31, cpu 9: 38, cpu 10: 30
  all_unreclaimable: 0  start_pfn: 25296896  inactive_ratio: 17

==========================================================================

Node 0, zone DMA
  pages free 3968  min 3  low 3  high 4  scanned 0  spanned 4080  present 3904
  nr_free_pages 3968  nr_inactive_anon 0  nr_active_anon 0  nr_inactive_file 0  nr_active_file 0  nr_unevictable 0  nr_mlock 0  nr_anon_pages 0  nr_mapped 0  nr_file_pages 0  nr_dirty 0  nr_writeback 0  nr_slab_reclaimable 0  nr_slab_unreclaimable 0  nr_page_table_pages 0  nr_kernel_stack 0  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 0  nr_dirtied 0  nr_written 0
  numa_hit 0  numa_miss 0  numa_foreign 0  numa_interleave 0  numa_local 0  numa_other 0  nr_anon_transparent_hugepages 0
  protection: (0, 3502, 32230, 32230)
  pagesets (all cpus high: 0  batch: 1  vm stats threshold: 10): count 0 on all 16 cpus
  all_unreclaimable: 1  start_pfn: 16  inactive_ratio: 1

Node 0, zone DMA32
  pages free 29798  min 917  low 1146  high 1375  scanned 0  spanned 1044480  present 896720
  nr_free_pages 29798  nr_inactive_anon 0  nr_active_anon 817152  nr_inactive_file 29243  nr_active_file 574  nr_unevictable 0  nr_mlock 0  nr_anon_pages 0  nr_mapped 1  nr_file_pages 29817  nr_dirty 0  nr_writeback 0  nr_slab_reclaimable 26  nr_slab_unreclaimable 2  nr_page_table_pages 244  nr_kernel_stack 0  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 0  nr_dirtied 30546  nr_written 30546
  numa_hit 42617  numa_miss 124755  numa_foreign 0  numa_interleave 0  numa_local 42023  numa_other 125349  nr_anon_transparent_hugepages 1596
  protection: (0, 0, 28728, 28728)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 60): count 0 on all 16 cpus
  all_unreclaimable: 0  start_pfn: 4096  inactive_ratio: 5

Node 0, zone Normal
  pages free 140658  min 7524  low 9405  high 11286  scanned 0  spanned 7471104  present 7354368
  nr_free_pages 140658  nr_inactive_anon 281  nr_active_anon 3178381  nr_inactive_file 1824810  nr_active_file 2050331  nr_unevictable 22  nr_mlock 22  nr_anon_pages 5790  nr_mapped 570  nr_file_pages 3875179  nr_dirty 1  nr_writeback 0  nr_slab_reclaimable 97265  nr_slab_unreclaimable 2756  nr_page_table_pages 8369  nr_kernel_stack 127  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 5  nr_shmem 8  nr_dirtied 4911092  nr_written 4911074
  numa_hit 11018781  numa_miss 975761  numa_foreign 6137358  numa_interleave 14102  numa_local 11009945  numa_other 984597  nr_anon_transparent_hugepages 6197
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 0: 31, cpu 1: 30, cpu 2: 48, cpu 4: 1, cpu 5: 17, cpu 6: 3, cpu 8: 1, cpu 9: 1, cpu 10: 11, cpu 13: 30, cpu 14: 1
  all_unreclaimable: 0  start_pfn: 1048576  inactive_ratio: 16

Node 1, zone Normal
  pages free 25982  min 8448  low 10560  high 12672  scanned 0  spanned 8388608  present 8257536
  nr_free_pages 25982  nr_inactive_anon 361430  nr_active_anon 5948303  nr_inactive_file 1757767  nr_active_file 76240  nr_unevictable 444  nr_mlock 444  nr_anon_pages 1001  nr_mapped 990  nr_file_pages 1834319  nr_dirty 2  nr_writeback 0  nr_slab_reclaimable 56778  nr_slab_unreclaimable 1404  nr_page_table_pages 10464  nr_kernel_stack 22  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 5  nr_dirtied 5001855  nr_written 5001853
  numa_hit 4882365  numa_miss 4315400  numa_foreign 1711246  numa_interleave 14052  numa_local 4861540  numa_other 4336225  nr_anon_transparent_hugepages 12322
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 4: 29, cpu 5: 74, cpu 6: 120, cpu 10: 27
  all_unreclaimable: 0  start_pfn: 8519680  inactive_ratio: 17

Node 2, zone Normal
  pages free 8514  min 8448  low 10560  high 12672  scanned 0  spanned 8388608  present 8257536
  nr_free_pages 8514  nr_inactive_anon 385103  nr_active_anon 6307975  nr_inactive_file 1409493  nr_active_file 48031  nr_unevictable 74  nr_mlock 74  nr_anon_pages 6866  nr_mapped 1678  nr_file_pages 1457589  nr_dirty 3  nr_writeback 0  nr_slab_reclaimable 31690  nr_slab_unreclaimable 1537  nr_page_table_pages 13296  nr_kernel_stack 52  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 5  nr_dirtied 3264794  nr_written 3264788
  numa_hit 3704905  numa_miss 3143298  numa_foreign 774847  numa_interleave 14093  numa_local 3688103  numa_other 3160100  nr_anon_transparent_hugepages 13051
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 5: 175, cpu 10: 170, cpu 12: 8, cpu 13: 30, cpu 14: 4
  all_unreclaimable: 0  start_pfn: 16908288  inactive_ratio: 17

Node 3, zone Normal
  pages free 42068  min 8448  low 10560  high 12672  scanned 0  spanned 8388608  present 8257536
  nr_free_pages 42068  nr_inactive_anon 273  nr_active_anon 5729807  nr_inactive_file 1787193  nr_active_file 638901  nr_unevictable 79  nr_mlock 79  nr_anon_pages 2930  nr_mapped 670  nr_file_pages 2426099  nr_dirty 1  nr_writeback 0  nr_slab_reclaimable 27153  nr_slab_unreclaimable 1453  nr_page_table_pages 12710  nr_kernel_stack 27  nr_unstable 0  nr_bounce 0  nr_vmscan_write 0  nr_vmscan_immediate_reclaim 0  nr_writeback_temp 0  nr_isolated_anon 0  nr_isolated_file 0  nr_shmem 1  nr_dirtied 2734473  nr_written 2734460
  numa_hit 5030446  numa_miss 1506319  numa_foreign 1442082  numa_interleave 14050  numa_local 5008209  numa_other 1528556  nr_anon_transparent_hugepages 11186
  protection: (0, 0, 0, 0)
  pagesets (all cpus high: 186  batch: 31  vm stats threshold: 90): count 0 except cpu 5: 9, cpu 8: 29, cpu 9: 31, cpu 10: 33, cpu 12: 31, cpu 13: 25, cpu 14: 50
  all_unreclaimable: 0  start_pfn: 25296896  inactive_ratio: 17
* Re: [Qemu-devel] Windows slow boot: contractor wanted
@ 2012-08-22 15:26 ` Richard Davies
  0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-22 15:26 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm

Avi Kivity wrote:
> Richard Davies wrote:
> > Avi Kivity wrote:
> > > Richard Davies wrote:
> > > > I can trigger the slow boots without KSM and they have the same
> > > > profile, with _raw_spin_lock_irqsave and isolate_freepages_block at
> > > > the top.
> > > >
> > > > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB
> > > > 8-core VMs), and haven't managed to get a really slow boot yet (>5
> > > > minutes). I'll post again when I get one.
> > >
> > > I think you can go higher than that. But 120GB on a 128GB host is
> > > pushing it.
> >
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB
> > host (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> Then it's still memory starved.
>
> Please provide /proc/zoneinfo while this is happening.

Here are two copies of /proc/zoneinfo taken a minute or so apart while
3x 36GB 8-core VMs were running on a 128GB host, with two of the three
VMs slow-booting.
Node 0, zone DMA pages free 3968 min 3 low 3 high 4 scanned 0 spanned 4080 present 3904 nr_free_pages 3968 nr_inactive_anon 0 nr_active_anon 0 nr_inactive_file 0 nr_active_file 0 nr_unevictable 0 nr_mlock 0 nr_anon_pages 0 nr_mapped 0 nr_file_pages 0 nr_dirty 0 nr_writeback 0 nr_slab_reclaimable 0 nr_slab_unreclaimable 0 nr_page_table_pages 0 nr_kernel_stack 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 0 nr_dirtied 0 nr_written 0 numa_hit 0 numa_miss 0 numa_foreign 0 numa_interleave 0 numa_local 0 numa_other 0 nr_anon_transparent_hugepages 0 protection: (0, 3502, 32230, 32230) pagesets cpu: 0 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 1 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 2 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 3 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 4 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 5 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 6 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 7 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 8 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 9 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 10 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 11 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 12 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 13 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 14 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 15 count: 0 high: 0 batch: 1 vm stats threshold: 10 all_unreclaimable: 1 start_pfn: 16 inactive_ratio: 1 Node 0, zone DMA32 pages free 29798 min 917 low 1146 high 1375 scanned 0 spanned 1044480 present 896720 nr_free_pages 29798 nr_inactive_anon 0 nr_active_anon 817152 nr_inactive_file 29243 nr_active_file 574 nr_unevictable 0 nr_mlock 0 nr_anon_pages 0 nr_mapped 1 nr_file_pages 29817 nr_dirty 0 nr_writeback 0 nr_slab_reclaimable 26 nr_slab_unreclaimable 
2 nr_page_table_pages 244 nr_kernel_stack 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 0 nr_dirtied 30546 nr_written 30546 numa_hit 42617 numa_miss 124755 numa_foreign 0 numa_interleave 0 numa_local 42023 numa_other 125349 nr_anon_transparent_hugepages 1596 protection: (0, 0, 28728, 28728) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 2 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 5 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 9 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 10 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 13 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 14 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 60 all_unreclaimable: 0 start_pfn: 4096 inactive_ratio: 5 Node 0, zone Normal pages free 292707 min 7524 low 9405 high 11286 scanned 0 spanned 7471104 present 7354368 nr_free_pages 292707 nr_inactive_anon 281 nr_active_anon 3024092 nr_inactive_file 1824853 nr_active_file 2050217 nr_unevictable 22 nr_mlock 22 nr_anon_pages 5103 nr_mapped 570 nr_file_pages 3875107 nr_dirty 1 nr_writeback 0 nr_slab_reclaimable 99328 nr_slab_unreclaimable 2701 nr_page_table_pages 8153 nr_kernel_stack 127 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 8 nr_dirtied 4910752 
nr_written 4910735 numa_hit 11010852 numa_miss 973848 numa_foreign 6137099 numa_interleave 14102 numa_local 11003048 numa_other 981652 nr_anon_transparent_hugepages 5898 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 29 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 2 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 46 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 26 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 18 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 1 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 1048576 inactive_ratio: 16 Node 1, zone Normal pages free 23288 min 8448 low 10560 high 12672 scanned 0 spanned 8388608 present 8257536 nr_free_pages 23288 nr_inactive_anon 361430 nr_active_anon 5925377 nr_inactive_file 1779378 nr_active_file 76158 nr_unevictable 444 nr_mlock 444 nr_anon_pages 603 nr_mapped 990 nr_file_pages 1855911 nr_dirty 3 nr_writeback 0 nr_slab_reclaimable 60961 nr_slab_unreclaimable 1404 nr_page_table_pages 10197 nr_kernel_stack 22 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 97 nr_shmem 5 nr_dirtied 5000958 nr_written 5000955 numa_hit 4879358 numa_miss 4315336 numa_foreign 1710349 numa_interleave 14052 numa_local 4860081 numa_other 4334613 nr_anon_transparent_hugepages 12277 protection: (0, 
0, 0, 0) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 30 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 88 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 176 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 179 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 8519680 inactive_ratio: 17 Node 2, zone Normal pages free 11632 min 8448 low 10560 high 12672 scanned 3 spanned 8388608 present 8257536 nr_free_pages 11632 nr_inactive_anon 368719 nr_active_anon 6009871 nr_inactive_file 1721022 nr_active_file 47969 nr_unevictable 74 nr_mlock 74 nr_anon_pages 6741 nr_mapped 1678 nr_file_pages 1769100 nr_dirty 3 nr_writeback 0 nr_slab_reclaimable 31690 nr_slab_unreclaimable 1547 nr_page_table_pages 13178 nr_kernel_stack 52 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 5 nr_dirtied 3264512 nr_written 3264506 numa_hit 3701723 numa_miss 3141775 numa_foreign 768925 numa_interleave 14093 numa_local 3685078 numa_other 3158420 nr_anon_transparent_hugepages 12446 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 2 high: 186 batch: 31 vm stats 
threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 172 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 30 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 47 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 30 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 16908288 inactive_ratio: 17 Node 3, zone Normal pages free 42611 min 8448 low 10560 high 12672 scanned 0 spanned 8388608 present 8257536 nr_free_pages 42611 nr_inactive_anon 273 nr_active_anon 5728983 nr_inactive_file 1787163 nr_active_file 638839 nr_unevictable 79 nr_mlock 79 nr_anon_pages 2091 nr_mapped 670 nr_file_pages 2426028 nr_dirty 0 nr_writeback 0 nr_slab_reclaimable 27949 nr_slab_unreclaimable 1417 nr_page_table_pages 12372 nr_kernel_stack 28 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 1 nr_dirtied 2734460 nr_written 2734448 numa_hit 5026640 numa_miss 1501721 numa_foreign 1441062 numa_interleave 14050 numa_local 5005951 numa_other 1522410 nr_anon_transparent_hugepages 11186 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 14 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 0 high: 186 batch: 31 vm stats threshold: 
90 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 31 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 38 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 30 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 25296896 inactive_ratio: 17 ========================================================================== Node 0, zone DMA pages free 3968 min 3 low 3 high 4 scanned 0 spanned 4080 present 3904 nr_free_pages 3968 nr_inactive_anon 0 nr_active_anon 0 nr_inactive_file 0 nr_active_file 0 nr_unevictable 0 nr_mlock 0 nr_anon_pages 0 nr_mapped 0 nr_file_pages 0 nr_dirty 0 nr_writeback 0 nr_slab_reclaimable 0 nr_slab_unreclaimable 0 nr_page_table_pages 0 nr_kernel_stack 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 0 nr_dirtied 0 nr_written 0 numa_hit 0 numa_miss 0 numa_foreign 0 numa_interleave 0 numa_local 0 numa_other 0 nr_anon_transparent_hugepages 0 protection: (0, 3502, 32230, 32230) pagesets cpu: 0 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 1 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 2 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 3 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 4 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 5 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 6 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 7 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 8 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 9 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 10 
count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 11 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 12 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 13 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 14 count: 0 high: 0 batch: 1 vm stats threshold: 10 cpu: 15 count: 0 high: 0 batch: 1 vm stats threshold: 10 all_unreclaimable: 1 start_pfn: 16 inactive_ratio: 1 Node 0, zone DMA32 pages free 29798 min 917 low 1146 high 1375 scanned 0 spanned 1044480 present 896720 nr_free_pages 29798 nr_inactive_anon 0 nr_active_anon 817152 nr_inactive_file 29243 nr_active_file 574 nr_unevictable 0 nr_mlock 0 nr_anon_pages 0 nr_mapped 1 nr_file_pages 29817 nr_dirty 0 nr_writeback 0 nr_slab_reclaimable 26 nr_slab_unreclaimable 2 nr_page_table_pages 244 nr_kernel_stack 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 0 nr_dirtied 30546 nr_written 30546 numa_hit 42617 numa_miss 124755 numa_foreign 0 numa_interleave 0 numa_local 42023 numa_other 125349 nr_anon_transparent_hugepages 1596 protection: (0, 0, 28728, 28728) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 2 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 5 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 9 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 10 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 13 count: 0 high: 186 batch: 31 vm stats threshold: 60 cpu: 14 count: 0 high: 186 
batch: 31 vm stats threshold: 60 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 60 all_unreclaimable: 0 start_pfn: 4096 inactive_ratio: 5 Node 0, zone Normal pages free 140658 min 7524 low 9405 high 11286 scanned 0 spanned 7471104 present 7354368 nr_free_pages 140658 nr_inactive_anon 281 nr_active_anon 3178381 nr_inactive_file 1824810 nr_active_file 2050331 nr_unevictable 22 nr_mlock 22 nr_anon_pages 5790 nr_mapped 570 nr_file_pages 3875179 nr_dirty 1 nr_writeback 0 nr_slab_reclaimable 97265 nr_slab_unreclaimable 2756 nr_page_table_pages 8369 nr_kernel_stack 127 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 5 nr_shmem 8 nr_dirtied 4911092 nr_written 4911074 numa_hit 11018781 numa_miss 975761 numa_foreign 6137358 numa_interleave 14102 numa_local 11009945 numa_other 984597 nr_anon_transparent_hugepages 6197 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 31 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 30 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 48 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 1 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 17 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 3 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 1 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 1 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 11 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 30 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 1 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 1048576 inactive_ratio: 16 Node 1, zone Normal pages free 25982 
min 8448 low 10560 high 12672 scanned 0 spanned 8388608 present 8257536 nr_free_pages 25982 nr_inactive_anon 361430 nr_active_anon 5948303 nr_inactive_file 1757767 nr_active_file 76240 nr_unevictable 444 nr_mlock 444 nr_anon_pages 1001 nr_mapped 990 nr_file_pages 1834319 nr_dirty 2 nr_writeback 0 nr_slab_reclaimable 56778 nr_slab_unreclaimable 1404 nr_page_table_pages 10464 nr_kernel_stack 22 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 5 nr_dirtied 5001855 nr_written 5001853 numa_hit 4882365 numa_miss 4315400 numa_foreign 1711246 numa_interleave 14052 numa_local 4861540 numa_other 4336225 nr_anon_transparent_hugepages 12322 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 29 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 74 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 120 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 27 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 8519680 inactive_ratio: 17 Node 2, zone Normal pages free 8514 min 8448 low 10560 high 12672 scanned 0 spanned 8388608 present 8257536 nr_free_pages 8514 nr_inactive_anon 385103 nr_active_anon 6307975 nr_inactive_file 1409493 nr_active_file 48031 
nr_unevictable 74 nr_mlock 74 nr_anon_pages 6866 nr_mapped 1678 nr_file_pages 1457589 nr_dirty 3 nr_writeback 0 nr_slab_reclaimable 31690 nr_slab_unreclaimable 1537 nr_page_table_pages 13296 nr_kernel_stack 52 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 5 nr_dirtied 3264794 nr_written 3264788 numa_hit 3704905 numa_miss 3143298 numa_foreign 774847 numa_interleave 14093 numa_local 3688103 numa_other 3160100 nr_anon_transparent_hugepages 13051 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 175 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 170 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 8 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 30 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 4 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 16908288 inactive_ratio: 17 Node 3, zone Normal pages free 42068 min 8448 low 10560 high 12672 scanned 0 spanned 8388608 present 8257536 nr_free_pages 42068 nr_inactive_anon 273 nr_active_anon 5729807 nr_inactive_file 1787193 nr_active_file 638901 nr_unevictable 79 nr_mlock 79 nr_anon_pages 2930 nr_mapped 670 nr_file_pages 2426099 nr_dirty 1 nr_writeback 0 nr_slab_reclaimable 27153 nr_slab_unreclaimable 1453 nr_page_table_pages 
12710 nr_kernel_stack 27 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 1 nr_dirtied 2734473 nr_written 2734460 numa_hit 5030446 numa_miss 1506319 numa_foreign 1442082 numa_interleave 14050 numa_local 5008209 numa_other 1528556 nr_anon_transparent_hugepages 11186 protection: (0, 0, 0, 0) pagesets cpu: 0 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 1 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 2 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 3 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 4 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 5 count: 9 high: 186 batch: 31 vm stats threshold: 90 cpu: 6 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 7 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 8 count: 29 high: 186 batch: 31 vm stats threshold: 90 cpu: 9 count: 31 high: 186 batch: 31 vm stats threshold: 90 cpu: 10 count: 33 high: 186 batch: 31 vm stats threshold: 90 cpu: 11 count: 0 high: 186 batch: 31 vm stats threshold: 90 cpu: 12 count: 31 high: 186 batch: 31 vm stats threshold: 90 cpu: 13 count: 25 high: 186 batch: 31 vm stats threshold: 90 cpu: 14 count: 50 high: 186 batch: 31 vm stats threshold: 90 cpu: 15 count: 0 high: 186 batch: 31 vm stats threshold: 90 all_unreclaimable: 0 start_pfn: 25296896 inactive_ratio: 17 ^ permalink raw reply [flat|nested] 101+ messages in thread
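The zoneinfo dumps above are dense, and the archive has collapsed their original line breaks. A throwaway parser (a sketch, not part of the original thread, assuming the 3.x-era zoneinfo field order shown here) can pull out the per-zone watermarks that Avi's "memory starved" reading hinges on:

```python
import re

# Sketch: extract free/min/low/high watermarks per zone from /proc/zoneinfo
# text. Whitespace-insensitive, so it also works on dumps whose original
# line breaks were lost, like the ones quoted in this thread.
ZONE_RE = re.compile(
    r"Node (\d+), zone\s+(\w+)\s+pages free\s+(\d+)"
    r"\s+min\s+(\d+)\s+low\s+(\d+)\s+high\s+(\d+)"
)

def watermarks(text):
    """Return {(node, zone): {'free', 'min', 'low', 'high'}} per zone."""
    zones = {}
    for node, zone, free, wmin, wlow, whigh in ZONE_RE.findall(text):
        zones[(int(node), zone)] = {
            "free": int(free), "min": int(wmin),
            "low": int(wlow), "high": int(whigh),
        }
    return zones

# Numbers taken from the Node 2 dump above: free pages are barely above
# the min watermark, consistent with the memory-starvation diagnosis.
sample = ("Node 2, zone Normal pages free 8514 min 8448 "
          "low 10560 high 12672 scanned 0 spanned 8388608")
print(watermarks(sample))
```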
* Re: [Qemu-devel] Windows slow boot: contractor wanted
  2012-08-22 14:53 ` [Qemu-devel] " Avi Kivity
@ 2012-08-22 17:22 ` Troy Benjegerdes
  -1 siblings, 0 replies; 101+ messages in thread
From: Troy Benjegerdes @ 2012-08-22 17:22 UTC (permalink / raw)
To: Avi Kivity; +Cc: Richard Davies, qemu-devel, kvm

> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> > (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> Then it's still memory starved.
>
> Please provide /proc/zoneinfo while this is happening.

Is there a way to capture/reproduce this 'slow boot' behavior with a
simple regression test? I'd like to know whether it happens on a
single-socket machine, or only on dual-socket machines.

I'm also observing an interesting phenomenon here: kernel development
can move so fast as to make regression testing pointless. ;)

^ permalink raw reply [flat|nested] 101+ messages in thread
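One low-tech way to turn the symptom into a regression test is to measure wall-clock time from guest launch until something inside the guest answers on the network (for a Windows guest, e.g. RDP on port 3389). This is only a sketch; the host address, port, and time budget below are assumptions, not details from the thread:

```python
import socket
import time

def seconds_until_port_open(host, port, timeout_s, poll_s=1.0):
    """Poll (host, port) until a TCP connect succeeds; return the elapsed
    seconds, or None if timeout_s expires first. Intended use: launch the
    guest, then time how long until e.g. RDP (3389) starts answering."""
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return time.monotonic() - start
        except OSError:
            time.sleep(poll_s)
    return None

# Hypothetical harness: start the guest as in the qemu-kvm command lines
# earlier in the thread, then flag a regression if boot blows a budget:
#   boot = seconds_until_port_open("192.168.0.10", 3389, timeout_s=4 * 3600)
#   assert boot is not None and boot < 600, "slow boot reproduced"
```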
* Re: [Qemu-devel] Windows slow boot: contractor wanted
  2012-08-22 17:22 ` Troy Benjegerdes
@ 2012-08-25 17:51 ` Richard Davies
  -1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-25 17:51 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: Avi Kivity, qemu-devel, kvm

Troy Benjegerdes wrote:
> Is there a way to capture/reproduce this 'slow boot' behavior with
> a simple regression test? I'd like to know if it happens on a
> single-physical CPU socket machine, or just on dual-sockets.

Yes, definitely. These two emails earlier in the thread give a fairly
complete description of what I am doing - please do ask if you have any
further questions.

http://marc.info/?l=qemu-devel&m=134511429415347
http://marc.info/?l=qemu-devel&m=134520701317153

Richard.

^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
  2012-08-22 14:41 ` [Qemu-devel] " Richard Davies
@ 2012-08-22 15:21 ` Rik van Riel
  -1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-08-22 15:21 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm

On 08/22/2012 10:41 AM, Richard Davies wrote:
> Avi Kivity wrote:
>> Richard Davies wrote:
>>> I can trigger the slow boots without KSM and they have the same profile,
>>> with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
>>>
>>> I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
>>> VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
>>> post again when I get one.
>>
>> I think you can go higher than that. But 120GB on a 128GB host is
>> pushing it.
>
> I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> (i.e. 108GB on a 128GB host).
>
> It has the same profile with _raw_spin_lock_irqsave and
> isolate_freepages_block at the top.

That's the page compaction code. Mel Gorman and I have been working to
fix that; the latest fixes and improvements are already in the -mm
kernel.

^ permalink raw reply [flat|nested] 101+ messages in thread
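Rik's diagnosis can be cross-checked from userspace: when compaction is the bottleneck, the compact_* counters in /proc/vmstat climb while the guest boots. A minimal reader (a sketch; the exact counter names vary across kernel versions, so treat the specific fields as assumptions):

```python
def compaction_counters(vmstat_text):
    """Parse the compact_* counters out of /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("compact_"):
            counters[parts[0]] = int(parts[1])
    return counters

if __name__ == "__main__":
    # On a live host, sample twice while a guest is boot-storming; growing
    # deltas (especially stalls) point at compaction, as Rik describes.
    try:
        with open("/proc/vmstat") as f:
            print(compaction_counters(f.read()))
    except OSError:  # not on Linux, or /proc unavailable
        pass
```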
* Re: Windows slow boot: contractor wanted 2012-08-22 15:21 ` [Qemu-devel] " Rik van Riel @ 2012-08-22 15:34 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-08-22 15:34 UTC (permalink / raw) To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm Rik van Riel wrote: > Richard Davies wrote: > > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB > > host (i.e. 108GB on a 128GB host). > > > > It has the same profile with _raw_spin_lock_irqsave and > > isolate_freepages_block at the top. > > That's the page compaction code. > > Mel Gorman and I have been working to fix that, the latest fixes and > improvements are in the -mm kernel already. Hi Rik, That's good news. Can you point me to specific patches which we can backport to a 3.5.2 kernel to test whether they fix our problem? Thanks, Richard. ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted 2012-08-22 15:21 ` [Qemu-devel] " Rik van Riel @ 2012-08-25 17:45 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-08-25 17:45 UTC (permalink / raw) To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm Rik van Riel wrote: > Richard Davies wrote: > > Avi Kivity wrote: > > > Richard Davies wrote: > > > > I can trigger the slow boots without KSM and they have the same > > > > profile, with _raw_spin_lock_irqsave and isolate_freepages_block at > > > > the top. > > > > > > > > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB > > > > 8-core VMs), and haven't managed to get a really slow boot yet (>5 > > > > minutes). I'll post again when I get one. > > > > > > I think you can go higher than that. But 120GB on a 128GB host is > > > pushing it. > > > > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host > > (i.e. 108GB on a 128GB host). > > > > It has the same profile with _raw_spin_lock_irqsave and > > isolate_freepages_block at the top. > > That's the page compaction code. > > Mel Gorman and I have been working to fix that, > the latest fixes and improvements are in the -mm > kernel already. Hi Rik, Are you talking about these patches? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84 http://marc.info/?l=linux-mm&m=134521289221259 If so, I believe those are in 3.6.0-rc3, so I tested with that. Unfortunately, I can still get the slow boots and perf top showing _raw_spin_lock_irqsave. Here are two perf top traces on 3.6.0-rc3.
They do look a bit different from 3.5.2, but _raw_spin_lock_irqsave is still at the top:

   PerfTop:   35272 irqs/sec  kernel:98.1%  exact:  0.0% [4000Hz cycles],  (all, 16 CPUs)
------------------------------------------------------------------------------------------------------------------

    61.85%  [kernel]        [k] _raw_spin_lock_irqsave
     7.18%  [kernel]        [k] sub_preempt_count
     5.03%  [kernel]        [k] isolate_freepages_block
     2.49%  [kernel]        [k] yield_to
     2.05%  [kernel]        [k] memcmp
     2.01%  [kernel]        [k] compact_zone
     1.76%  [kernel]        [k] add_preempt_count
     1.52%  [kernel]        [k] _raw_spin_lock
     1.31%  [kernel]        [k] kvm_vcpu_on_spin
     0.92%  [kernel]        [k] svm_vcpu_run
     0.78%  [kernel]        [k] __rcu_read_unlock
     0.76%  [kernel]        [k] migrate_pages
     0.68%  [kernel]        [k] kvm_vcpu_yield_to
     0.46%  [kernel]        [k] pid_task
     0.42%  [kernel]        [k] isolate_migratepages_range
     0.41%  [kernel]        [k] kvm_arch_vcpu_ioctl_run
     0.40%  [kernel]        [k] clear_page_c
     0.40%  [kernel]        [k] get_pid_task
     0.40%  [kernel]        [k] get_parent_ip
     0.39%  [kernel]        [k] __zone_watermark_ok
     0.34%  [kernel]        [k] trace_hardirqs_off
     0.34%  [kernel]        [k] trace_hardirqs_on
     0.32%  [kernel]        [k] _raw_spin_unlock_irqrestore
     0.27%  [kernel]        [k] _raw_spin_unlock
     0.22%  [kernel]        [k] mod_zone_page_state
     0.21%  [kernel]        [k] rcu_note_context_switch
     0.21%  [kernel]        [k] trace_preempt_on
     0.21%  [kernel]        [k] trace_preempt_off
     0.19%  [kernel]        [k] in_lock_functions
     0.16%  [kernel]        [k] __srcu_read_lock
     0.14%  [kernel]        [k] ktime_get
     0.11%  [kernel]        [k] get_pageblock_flags_group
     0.11%  [kernel]        [k] compact_checklock_irqsave
     0.11%  [kernel]        [k] find_busiest_group
     0.10%  [kernel]        [k] __srcu_read_unlock
     0.09%  [kernel]        [k] __rcu_read_lock
     0.09%  libc-2.10.1.so  [.] 0x0000000000072c9d
     0.09%  [kernel]        [k] cpumask_next_and
     0.08%  [kernel]        [k] smp_call_function_many
     0.08%  [kernel]        [k] read_tsc
     0.08%  [kernel]        [k] kmem_cache_alloc
     0.08%  libc-2.10.1.so  [.] strcmp
     0.08%  [kernel]        [k] generic_smp_call_function_interrupt
     0.07%  [kernel]        [k] __schedule
     0.07%  qemu-kvm        [.] main_loop_wait
     0.07%  [kernel]        [k] __hrtimer_start_range_ns
     0.06%  qemu-kvm        [.] qemu_iohandler_poll
     0.06%  [kernel]        [k] ktime_get_update_offsets
     0.06%  [kernel]        [k] ktime_add_safe
     0.06%  [kernel]        [k] find_next_bit
     0.06%  [kernel]        [k] irq_exit
     0.06%  [kernel]        [k] select_task_rq_fair
     0.06%  [kernel]        [k] handle_exit
     0.05%  [kernel]        [k] update_curr
     0.05%  [kernel]        [k] flush_tlb_func
     0.05%  perf            [.] dso__find_symbol
     0.05%  [kernel]        [k] kvm_check_async_pf_completion
     0.05%  [kernel]        [k] rcu_check_callbacks
     0.05%  [kernel]        [k] apic_update_ppr
     0.05%  [kernel]        [k] irq_enter
     0.04%  [kernel]        [k] copy_user_generic_string
     0.04%  [kernel]        [k] copy_page_c
     0.04%  [kernel]        [k] rcu_idle_exit_common.isra.34
     0.04%  [kernel]        [k] load_balance
     0.04%  [kernel]        [k] rb_erase
     0.04%  libc-2.10.1.so  [.] __select

1904 unprocessable samples recorded.
1905 unprocessable samples recorded.
...

   PerfTop:   49639 irqs/sec  kernel:98.8%  exact:  0.0% [4000Hz cycles],  (all, 16 CPUs)
------------------------------------------------------------------------------------------------------------------

    81.43%  [kernel]        [k] _raw_spin_lock_irqsave
     6.19%  [kernel]        [k] sub_preempt_count
     1.21%  [kernel]        [k] memcmp
     1.03%  [kernel]        [k] compact_zone
     0.72%  [kernel]        [k] smp_call_function_many
     0.50%  [kernel]        [k] yield_to
     0.49%  [kernel]        [k] add_preempt_count
     0.43%  [kernel]        [k] svm_vcpu_run
     0.41%  [kernel]        [k] _raw_spin_unlock_irqrestore
     0.40%  [kernel]        [k] clear_page_c
     0.40%  [kernel]        [k] migrate_pages
     0.38%  [kernel]        [k] __zone_watermark_ok
     0.34%  [kernel]        [k] isolate_migratepages_range
     0.34%  [kernel]        [k] isolate_freepages_block
     0.27%  [kernel]        [k] kvm_vcpu_on_spin
     0.23%  [kernel]        [k] trace_hardirqs_off
     0.21%  [kernel]        [k] mod_zone_page_state
     0.20%  [kernel]        [k] __rcu_read_unlock
     0.18%  [kernel]        [k] get_parent_ip
     0.17%  [kernel]        [k] _raw_spin_lock
     0.14%  [kernel]        [k] flush_tlb_func
     0.14%  [kernel]        [k] trace_preempt_on
     0.14%  [kernel]        [k] trace_preempt_off
     0.14%  [kernel]        [k] kvm_arch_vcpu_ioctl_run
     0.14%  [kernel]        [k] trace_hardirqs_on
     0.10%  [kernel]        [k] compact_checklock_irqsave
     0.09%  [kernel]        [k] _raw_spin_lock_irq
     0.09%  [kernel]        [k] __srcu_read_lock
     0.07%  [kernel]        [k] in_lock_functions
     0.07%  [kernel]        [k] copy_page_c
     0.07%  [kernel]        [k] kmem_cache_alloc
     0.07%  libc-2.10.1.so  [.] strcmp
     0.06%  [kernel]        [k] _raw_spin_unlock
     0.06%  [kernel]        [k] kvm_vcpu_yield_to
     0.06%  [kernel]        [k] get_pid_task
     0.06%  [kernel]        [k] ktime_get
     0.06%  [kernel]        [k] call_function_interrupt
     0.05%  [kernel]        [k] generic_smp_call_function_interrupt
     0.05%  [kernel]        [k] ktime_get_update_offsets
     0.05%  [kernel]        [k] pid_task
     0.05%  [kernel]        [k] copy_user_generic_string
     0.04%  [kernel]        [k] __srcu_read_unlock
     0.04%  [kernel]        [k] get_pageblock_flags_group
     0.04%  [kernel]        [k] rcu_note_context_switch
     0.04%  libc-2.10.1.so  [.] 0x00000000000743ee
     0.04%  perf            [.] dso__find_symbol
     0.04%  [kernel]        [k] zone_watermark_ok
     0.04%  [vdso]          [.] 0x00007fff9afff85d
     0.03%  [kernel]        [k] __mod_zone_page_state
     0.03%  [kernel]        [k] smp_call_function_interrupt
     0.03%  [kernel]        [k] _cond_resched
     0.03%  [kernel]        [k] read_tsc
     0.03%  [kernel]        [k] sysret_check
     0.03%  [kernel]        [k] system_call_after_swapgs
     0.03%  [kernel]        [k] default_send_IPI_mask_sequence_phys
     0.03%  perf            [.] add_hist_entry
     0.03%  [kernel]        [k] __schedule
     0.03%  perf            [.] sort__dso_cmp
     0.02%  [kernel]        [k] mutex_spin_on_owner
     0.02%  [kernel]        [k] do_select
     0.02%  [kernel]        [k] __rcu_read_lock
     0.02%  [kernel]        [k] rcu_check_callbacks
     0.02%  [kernel]        [k] handle_exit
     0.02%  [kernel]        [k] apic_timer_interrupt
     0.02%  [kernel]        [k] perf_pmu_disable
     0.02%  [kernel]        [k] find_busiest_group

3665 unprocessable samples recorded.
3666 unprocessable samples recorded.
...

^ permalink raw reply [flat|nested] 101+ messages in thread
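[Editorial note: Richard's check that the two linked commits had landed in 3.6.0-rc3 can be done mechanically. A sketch, assuming a torvalds/linux.git checkout; the commit id is the one linked in the message above, and the path filter simply picks out mm/compaction.c changes as candidate backports:

```shell
# Which release tags already contain the compaction fix?
git tag --contains c67fe3752abe6ab47639e2f9b836900c3dc3da84

# What else changed in the compaction code between the kernels under test?
git log --oneline v3.5..v3.6-rc3 -- mm/compaction.c
```

If the first command lists v3.6-rc3, the fix is indeed in the kernel being tested.]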
* Re: Windows slow boot: contractor wanted 2012-08-25 17:45 ` [Qemu-devel] " Richard Davies @ 2012-08-25 18:11 ` Rik van Riel -1 siblings, 0 replies; 101+ messages in thread From: Rik van Riel @ 2012-08-25 18:11 UTC (permalink / raw) To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm On 08/25/2012 01:45 PM, Richard Davies wrote: > Are you talking about these patches? > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84 > http://marc.info/?l=linux-mm&m=134521289221259 > > If so, I believe those are in 3.6.0-rc3, so I tested with that. > > Unfortunately, I can still get the slow boots and perf top showing > _raw_spin_lock_irqsave. > > > Here are two perf top traces on 3.6.0-rc3. They do look a bit different from > 3.5.2, but _raw_spin_lock_irqsave is still at the top: > > PerfTop: 35272 irqs/sec kernel:98.1% exact: 0.0% [4000Hz cycles], (all, 16 CPUs) > ------------------------------------------------------------------------------------------------------------------ > > 61.85% [kernel] [k] _raw_spin_lock_irqsave > 7.18% [kernel] [k] sub_preempt_count > 5.03% [kernel] [k] isolate_freepages_block > 2.49% [kernel] [k] yield_to > 2.05% [kernel] [k] memcmp > 2.01% [kernel] [k] compact_zone > 1.76% [kernel] [k] add_preempt_count > 1.52% [kernel] [k] _raw_spin_lock > 1.31% [kernel] [k] kvm_vcpu_on_spin > 0.92% [kernel] [k] svm_vcpu_run However, the compaction code is not as prominent as before. Can you get a backtrace to that _raw_spin_lock_irqsave, to see from where it is running into lock contention? It would be good to know whether it is isolate_freepages_block, yield_to, kvm_vcpu_on_spin or something else... -- All rights reversed ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted 2012-08-25 18:11 ` [Qemu-devel] " Rik van Riel @ 2012-08-26 10:58 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-08-26 10:58 UTC (permalink / raw) To: Rik van Riel; +Cc: Avi Kivity, kvm, qemu-devel Rik van Riel wrote: > Can you get a backtrace to that _raw_spin_lock_irqsave, to see > from where it is running into lock contention? > > It would be good to know whether it is isolate_freepages_block, > yield_to, kvm_vcpu_on_spin or something else... Hi Rik, I got into a slow boot situation on 3.6.0-rc3, ran "perf record -g -a" for a while, then ran perf report with the output below. This trace looks more like the second perf top trace that I sent on Saturday (there were two in my email and they were different from each other as well as different from on 3.5.2). The symptoms were a bit different too - the VM boots appeared to be completely locked up rather than just slow, and I couldn't quit qemu-kvm at the monitor - I had to restart the host. So perhaps this one is actually a deadlock rather than just slow? Cheers, Richard. # ======== # captured on: Sun Aug 26 10:08:28 2012 # os release : 3.6.0-rc3-elastic # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131971760 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 2M of event 'cycles' # Event count (approx.): 1040676441385 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. 
# 90.01% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--99.99%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.91%-- 0x10100000002 | | | --45.09%-- 0x10100000006 --0.01%-- [...] 4.66% qemu-kvm [kernel.kallsyms] [k] sub_preempt_count | --- sub_preempt_count | |--99.77%-- _raw_spin_unlock_irqrestore | | | |--99.99%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.33 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--51.94%-- 0x10100000002 | | | | | --48.06%-- 0x10100000006 | --0.01%-- [...] --0.23%-- [...] 1.23% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.83%-- memcmp_pages | | | |--78.46%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.54%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.17%-- [...] 
0.91% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.98%-- native_flush_tlb_others | | | |--99.86%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.14%-- [...] --0.02%-- [...] 0.34% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore | --- _raw_spin_unlock_irqrestore | |--96.08%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--65.19%-- 0x10100000006 | | | --34.81%-- 0x10100000002 | |--2.68%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.08%-- 0x10100000002 | | | --47.92%-- 0x10100000006 | |--0.56%-- ntp_tick_length | do_timer | tick_do_update_jiffies64 | tick_sched_timer | __run_hrtimer | hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | 
get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000002 --0.68%-- [...] 0.30% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.95%-- cpu_idle | start_secondary --0.05%-- [...] 0.15% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range | --- isolate_migratepages_range | |--97.41%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.02%-- 0x10100000002 | | | --45.98%-- 0x10100000006 | --2.59%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--56.10%-- 0x10100000002 | --43.90%-- 0x10100000006 0.12% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | 
|--52.09%-- 0x10100000002 | --47.91%-- 0x10100000006 0.11% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func | --- flush_tlb_func | |--99.58%-- generic_smp_call_function_interrupt | smp_call_function_interrupt | call_function_interrupt | | | |--94.65%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.33 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.04%-- 0x10100000006 | | | | | --21.96%-- 0x10100000002 | | | |--4.67%-- sub_preempt_count | | _raw_spin_unlock_irqrestore | | compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.33 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.18%-- 0x10100000006 | | | | | --21.82%-- 0x10100000002 | --0.68%-- [...] --0.42%-- [...] 
0.09% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state | --- mod_zone_page_state | |--80.84%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--53.90%-- 0x10100000002 | | | --46.10%-- 0x10100000006 | --19.16%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--55.04%-- 0x10100000002 | --44.96%-- 0x10100000006 0.09% qemu-kvm [kernel.kallsyms] [k] migrate_pages | --- migrate_pages | |--96.21%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.94%-- 0x10100000002 | | | --47.06%-- 0x10100000006 | --3.79%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async 
try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--50.72%-- 0x10100000002 | --49.28%-- 0x10100000006 0.09% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok | --- __zone_watermark_ok | |--95.81%-- zone_watermark_ok | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--51.21%-- 0x10100000002 | | | --48.79%-- 0x10100000006 | --4.19%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--50.00%-- 0x10100000006 | --50.00%-- 0x10100000002 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string | --- copy_user_generic_string generic_file_buffered_write __generic_file_aio_write generic_file_aio_write ext4_file_write do_sync_write vfs_write sys_write system_call_fastpath write run_builtin main __libc_start_main ^ permalink raw reply [flat|nested] 101+ messages in thread
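[Editorial note: every hot call chain in the report above enters compaction via do_huge_pmd_anonymous_page, i.e. transparent hugepage faults while the guest touches its memory are what trigger the direct compaction. A hedged way to confirm and experiment on the host; the /proc and /sys paths are as on mainline kernels of this era and may differ per distro, and disabling THP defrag is only a diagnostic experiment with a possible performance cost:

```shell
# How often is the host entering direct compaction?
grep '^compact_' /proc/vmstat

# Is a THP fault currently allowed to compact memory synchronously?
cat /sys/kernel/mm/transparent_hugepage/defrag

# Experiment (as root): stop THP faults from triggering synchronous
# compaction, then retry the slow-boot reproduction.
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```
]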
* Re: [Qemu-devel] Windows slow boot: contractor wanted @ 2012-08-26 10:58 ` Richard Davies 0 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-08-26 10:58 UTC (permalink / raw) To: Rik van Riel; +Cc: Avi Kivity, kvm, qemu-devel Rik van Riel wrote: > Can you get a backtrace to that _raw_spin_lock_irqsave, to see > from where it is running into lock contention? > > It would be good to know whether it is isolate_freepages_block, > yield_to, kvm_vcpu_on_spin or something else... Hi Rik, I got into a slow boot situation on 3.6.0-rc3, ran "perf record -g -a" for a while, then ran perf report with the output below. This trace looks more like the second perf top trace that I sent on Saturday (there were two in my email and they were different from each other as well as different from on 3.5.2). The symptoms were a bit different too - the VM boots appeared to be completely locked up rather than just slow, and I couldn't quit qemu-kvm at the monitor - I had to restart the host. So perhaps this one is actually a deadlock rather than just slow? Cheers, Richard. # ======== # captured on: Sun Aug 26 10:08:28 2012 # os release : 3.6.0-rc3-elastic # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131971760 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 2M of event 'cycles' # Event count (approx.): 1040676441385 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. 
# 90.01% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--99.99%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.91%-- 0x10100000002 | | | --45.09%-- 0x10100000006 --0.01%-- [...] 4.66% qemu-kvm [kernel.kallsyms] [k] sub_preempt_count | --- sub_preempt_count | |--99.77%-- _raw_spin_unlock_irqrestore | | | |--99.99%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.33 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--51.94%-- 0x10100000002 | | | | | --48.06%-- 0x10100000006 | --0.01%-- [...] --0.23%-- [...] 1.23% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.83%-- memcmp_pages | | | |--78.46%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.54%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.17%-- [...] 
0.91% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.98%-- native_flush_tlb_others | | | |--99.86%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.14%-- [...] --0.02%-- [...] 0.34% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore | --- _raw_spin_unlock_irqrestore | |--96.08%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--65.19%-- 0x10100000006 | | | --34.81%-- 0x10100000002 | |--2.68%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.08%-- 0x10100000002 | | | --47.92%-- 0x10100000006 | |--0.56%-- ntp_tick_length | do_timer | tick_do_update_jiffies64 | tick_sched_timer | __run_hrtimer | hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | 
get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000002 --0.68%-- [...] 0.30% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.95%-- cpu_idle | start_secondary --0.05%-- [...] 0.15% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range | --- isolate_migratepages_range | |--97.41%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.02%-- 0x10100000002 | | | --45.98%-- 0x10100000006 | --2.59%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--56.10%-- 0x10100000002 | --43.90%-- 0x10100000006 0.12% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | 
|--52.09%-- 0x10100000002 | --47.91%-- 0x10100000006 0.11% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func | --- flush_tlb_func | |--99.58%-- generic_smp_call_function_interrupt | smp_call_function_interrupt | call_function_interrupt | | | |--94.65%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.33 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.04%-- 0x10100000006 | | | | | --21.96%-- 0x10100000002 | | | |--4.67%-- sub_preempt_count | | _raw_spin_unlock_irqrestore | | compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.33 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.18%-- 0x10100000006 | | | | | --21.82%-- 0x10100000002 | --0.68%-- [...] --0.42%-- [...] 
0.09% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state | --- mod_zone_page_state | |--80.84%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--53.90%-- 0x10100000002 | | | --46.10%-- 0x10100000006 | --19.16%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--55.04%-- 0x10100000002 | --44.96%-- 0x10100000006 0.09% qemu-kvm [kernel.kallsyms] [k] migrate_pages | --- migrate_pages | |--96.21%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.94%-- 0x10100000002 | | | --47.06%-- 0x10100000006 | --3.79%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async 
try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--50.72%-- 0x10100000002 | --49.28%-- 0x10100000006 0.09% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok | --- __zone_watermark_ok | |--95.81%-- zone_watermark_ok | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.33 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--51.21%-- 0x10100000002 | | | --48.79%-- 0x10100000006 | --4.19%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.33 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--50.00%-- 0x10100000006 | --50.00%-- 0x10100000002 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string | --- copy_user_generic_string generic_file_buffered_write __generic_file_aio_write generic_file_aio_write ext4_file_write do_sync_write vfs_write sys_write system_call_fastpath write run_builtin main __libc_start_main ^ permalink raw reply [flat|nested] 101+ messages in thread
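[Editorial note] The capture workflow described in the message above can be sketched as follows. The `perf record -g -a` invocation is taken from the report header; the 30-second capture window and the `--stdio` flag are assumptions, since the original simply ran the capture "for a while":

```shell
# System-wide sampled capture with call graphs, as in the message above.
# Assumption: 30 s is long enough to catch the slow-boot behaviour.
perf record -g -a -o perf.data -- sleep 30

# Print the call-graph report to the terminal, as quoted in the thread.
perf report -i perf.data --stdio
```

The `-a` flag samples every CPU (not just one process), which is why the report above also contains `ksmd`, `swapper` and `perf` entries alongside `qemu-kvm`.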
* Re: Windows slow boot: contractor wanted 2012-08-26 10:58 ` [Qemu-devel] " Richard Davies @ 2012-09-06 9:20 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-09-06 9:20 UTC (permalink / raw) To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm Hi Rik, Are there any more tests which I can usefully do for you? I notice that 3.6.0-rc4 is out - are there changes from rc3 which are worth me retesting? Cheers, Richard. Richard Davies wrote: > Rik van Riel wrote: > > Can you get a backtrace to that _raw_spin_lock_irqsave, to see > > from where it is running into lock contention? > > > > It would be good to know whether it is isolate_freepages_block, > > yield_to, kvm_vcpu_on_spin or something else... > > Hi Rik, > > I got into a slow boot situation on 3.6.0-rc3, ran "perf record -g -a" for a > while, then ran perf report with the output below. > > This trace looks more like the second perf top trace that I sent on Saturday > (there were two in my email and they were different from each other as well > as different from on 3.5.2). > > The symptoms were a bit different too - the VM boots appeared to be > completely locked up rather than just slow, and I couldn't quit qemu-kvm at > the monitor - I had to restart the host. > > So perhaps this one is actually a deadlock rather than just slow? > > Cheers, > > Richard. 
> > > # ======== > # captured on: Sun Aug 26 10:08:28 2012 > # os release : 3.6.0-rc3-elastic > # perf version : 3.5.2 > # arch : x86_64 > # nrcpus online : 16 > # nrcpus avail : 16 > # cpudesc : AMD Opteron(tm) Processor 6128 > # cpuid : AuthenticAMD,16,9,1 > # total memory : 131971760 kB > # cmdline : /home/root/bin/perf record -g -a > # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 } > # HEADER_CPU_TOPOLOGY info available, use -I to display > # HEADER_NUMA_TOPOLOGY info available, use -I to display > # ======== > # > # Samples: 2M of event 'cycles' > # Event count (approx.): 1040676441385 > # > # Overhead Command Shared Object Symbol > # ........ ............... .................... .............................................. > # > 90.01% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave > | > --- _raw_spin_lock_irqsave > | > |--99.99%-- isolate_migratepages_range > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--54.91%-- 0x10100000002 > | | > | --45.09%-- 0x10100000006 > --0.01%-- [...] 
> 4.66% qemu-kvm [kernel.kallsyms] [k] sub_preempt_count > | > --- sub_preempt_count > | > |--99.77%-- _raw_spin_unlock_irqrestore > | | > | |--99.99%-- compact_checklock_irqsave > | | isolate_migratepages_range > | | compact_zone > | | compact_zone_order > | | try_to_compact_pages > | | __alloc_pages_direct_compact > | | __alloc_pages_nodemask > | | alloc_pages_vma > | | do_huge_pmd_anonymous_page > | | handle_mm_fault > | | __get_user_pages > | | get_user_page_nowait > | | hva_to_pfn.isra.33 > | | __gfn_to_pfn > | | gfn_to_pfn_async > | | try_async_pf > | | tdp_page_fault > | | kvm_mmu_page_fault > | | pf_interception > | | handle_exit > | | kvm_arch_vcpu_ioctl_run > | | kvm_vcpu_ioctl > | | do_vfs_ioctl > | | sys_ioctl > | | system_call_fastpath > | | ioctl > | | | > | | |--51.94%-- 0x10100000002 > | | | > | | --48.06%-- 0x10100000006 > | --0.01%-- [...] > --0.23%-- [...] > 1.23% ksmd [kernel.kallsyms] [k] memcmp > | > --- memcmp > | > |--99.83%-- memcmp_pages > | | > | |--78.46%-- ksm_scan_thread > | | kthread > | | kernel_thread_helper > | | > | --21.54%-- try_to_merge_with_ksm_page > | ksm_scan_thread > | kthread > | kernel_thread_helper > --0.17%-- [...] > 0.91% ksmd [kernel.kallsyms] [k] smp_call_function_many > | > --- smp_call_function_many > | > |--99.98%-- native_flush_tlb_others > | | > | |--99.86%-- flush_tlb_page > | | ptep_clear_flush > | | try_to_merge_with_ksm_page > | | ksm_scan_thread > | | kthread > | | kernel_thread_helper > | --0.14%-- [...] > --0.02%-- [...] 
> 0.34% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore > | > --- _raw_spin_unlock_irqrestore > | > |--96.08%-- compact_checklock_irqsave > | isolate_migratepages_range > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--65.19%-- 0x10100000006 > | | > | --34.81%-- 0x10100000002 > | > |--2.68%-- isolate_migratepages_range > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--52.08%-- 0x10100000002 > | | > | --47.92%-- 0x10100000006 > | > |--0.56%-- ntp_tick_length > | do_timer > | tick_do_update_jiffies64 > | tick_sched_timer > | __run_hrtimer > | hrtimer_interrupt > | smp_apic_timer_interrupt > | apic_timer_interrupt > | compact_checklock_irqsave > | isolate_migratepages_range > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | 
kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | 0x10100000002 > --0.68%-- [...] > 0.30% swapper [kernel.kallsyms] [k] default_idle > | > --- default_idle > | > |--99.95%-- cpu_idle > | start_secondary > --0.05%-- [...] > 0.15% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range > | > --- isolate_migratepages_range > | > |--97.41%-- compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--54.02%-- 0x10100000002 > | | > | --45.98%-- 0x10100000006 > | > --2.59%-- compact_zone_order > try_to_compact_pages > __alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--56.10%-- 0x10100000002 > | > --43.90%-- 0x10100000006 > 0.12% qemu-kvm [kernel.kallsyms] [k] compact_zone > | > --- compact_zone > compact_zone_order > try_to_compact_pages > __alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > 
kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--52.09%-- 0x10100000002 > | > --47.91%-- 0x10100000006 > 0.11% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func > | > --- flush_tlb_func > | > |--99.58%-- generic_smp_call_function_interrupt > | smp_call_function_interrupt > | call_function_interrupt > | | > | |--94.65%-- compact_checklock_irqsave > | | isolate_migratepages_range > | | compact_zone > | | compact_zone_order > | | try_to_compact_pages > | | __alloc_pages_direct_compact > | | __alloc_pages_nodemask > | | alloc_pages_vma > | | do_huge_pmd_anonymous_page > | | handle_mm_fault > | | __get_user_pages > | | get_user_page_nowait > | | hva_to_pfn.isra.33 > | | __gfn_to_pfn > | | gfn_to_pfn_async > | | try_async_pf > | | tdp_page_fault > | | kvm_mmu_page_fault > | | pf_interception > | | handle_exit > | | kvm_arch_vcpu_ioctl_run > | | kvm_vcpu_ioctl > | | do_vfs_ioctl > | | sys_ioctl > | | system_call_fastpath > | | ioctl > | | | > | | |--78.04%-- 0x10100000006 > | | | > | | --21.96%-- 0x10100000002 > | | > | |--4.67%-- sub_preempt_count > | | _raw_spin_unlock_irqrestore > | | compact_checklock_irqsave > | | isolate_migratepages_range > | | compact_zone > | | compact_zone_order > | | try_to_compact_pages > | | __alloc_pages_direct_compact > | | __alloc_pages_nodemask > | | alloc_pages_vma > | | do_huge_pmd_anonymous_page > | | handle_mm_fault > | | __get_user_pages > | | get_user_page_nowait > | | hva_to_pfn.isra.33 > | | __gfn_to_pfn > | | gfn_to_pfn_async > | | try_async_pf > | | tdp_page_fault > | | kvm_mmu_page_fault > | | pf_interception > | | handle_exit > | | kvm_arch_vcpu_ioctl_run > | | kvm_vcpu_ioctl > | | do_vfs_ioctl > | | sys_ioctl > | | system_call_fastpath > | | ioctl > | | | > | | |--78.18%-- 0x10100000006 > | | | > | | --21.82%-- 0x10100000002 > | --0.68%-- [...] > --0.42%-- [...] 
> 0.09% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state > | > --- mod_zone_page_state > | > |--80.84%-- isolate_migratepages_range > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--53.90%-- 0x10100000002 > | | > | --46.10%-- 0x10100000006 > | > --19.16%-- compact_zone > compact_zone_order > try_to_compact_pages > __alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--55.04%-- 0x10100000002 > | > --44.96%-- 0x10100000006 > 0.09% qemu-kvm [kernel.kallsyms] [k] migrate_pages > | > --- migrate_pages > | > |--96.21%-- compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--52.94%-- 0x10100000002 > | | > | --47.06%-- 0x10100000006 > | > --3.79%-- compact_zone_order > try_to_compact_pages > 
__alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--50.72%-- 0x10100000002 > | > --49.28%-- 0x10100000006 > 0.09% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok > | > --- __zone_watermark_ok > | > |--95.81%-- zone_watermark_ok > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--51.21%-- 0x10100000002 > | | > | --48.79%-- 0x10100000006 > | > --4.19%-- compact_zone > compact_zone_order > try_to_compact_pages > __alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--50.00%-- 0x10100000006 > | > --50.00%-- 0x10100000002 > 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string > | > --- copy_user_generic_string > generic_file_buffered_write > __generic_file_aio_write > generic_file_aio_write > ext4_file_write > do_sync_write > vfs_write > sys_write > system_call_fastpath > write > run_builtin > main 
> __libc_start_main ^ permalink raw reply [flat|nested] 101+ messages in thread
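[Editorial note, not from the thread] The call chains above consistently show vCPU page faults entering memory compaction via transparent hugepage allocation (do_huge_pmd_anonymous_page -> try_to_compact_pages -> isolate_migratepages_range), so one hypothesis worth testing is that THP defrag is the trigger. A minimal diagnostic sketch, assuming a host kernel with the standard THP sysfs interface (requires root):

```shell
# Record the current defrag policy before changing anything.
cat /sys/kernel/mm/transparent_hugepage/defrag

# Disable synchronous compaction on THP faults, then re-run the slow boot.
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

If guest boots return to normal speed with defrag disabled, that points at compaction lock contention on the host rather than at KVM itself.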
> 0.09% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state > | > --- mod_zone_page_state > | > |--80.84%-- isolate_migratepages_range > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--53.90%-- 0x10100000002 > | | > | --46.10%-- 0x10100000006 > | > --19.16%-- compact_zone > compact_zone_order > try_to_compact_pages > __alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--55.04%-- 0x10100000002 > | > --44.96%-- 0x10100000006 > 0.09% qemu-kvm [kernel.kallsyms] [k] migrate_pages > | > --- migrate_pages > | > |--96.21%-- compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--52.94%-- 0x10100000002 > | | > | --47.06%-- 0x10100000006 > | > --3.79%-- compact_zone_order > try_to_compact_pages > 
__alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--50.72%-- 0x10100000002 > | > --49.28%-- 0x10100000006 > 0.09% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok > | > --- __zone_watermark_ok > | > |--95.81%-- zone_watermark_ok > | compact_zone > | compact_zone_order > | try_to_compact_pages > | __alloc_pages_direct_compact > | __alloc_pages_nodemask > | alloc_pages_vma > | do_huge_pmd_anonymous_page > | handle_mm_fault > | __get_user_pages > | get_user_page_nowait > | hva_to_pfn.isra.33 > | __gfn_to_pfn > | gfn_to_pfn_async > | try_async_pf > | tdp_page_fault > | kvm_mmu_page_fault > | pf_interception > | handle_exit > | kvm_arch_vcpu_ioctl_run > | kvm_vcpu_ioctl > | do_vfs_ioctl > | sys_ioctl > | system_call_fastpath > | ioctl > | | > | |--51.21%-- 0x10100000002 > | | > | --48.79%-- 0x10100000006 > | > --4.19%-- compact_zone > compact_zone_order > try_to_compact_pages > __alloc_pages_direct_compact > __alloc_pages_nodemask > alloc_pages_vma > do_huge_pmd_anonymous_page > handle_mm_fault > __get_user_pages > get_user_page_nowait > hva_to_pfn.isra.33 > __gfn_to_pfn > gfn_to_pfn_async > try_async_pf > tdp_page_fault > kvm_mmu_page_fault > pf_interception > handle_exit > kvm_arch_vcpu_ioctl_run > kvm_vcpu_ioctl > do_vfs_ioctl > sys_ioctl > system_call_fastpath > ioctl > | > |--50.00%-- 0x10100000006 > | > --50.00%-- 0x10100000002 > 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string > | > --- copy_user_generic_string > generic_file_buffered_write > __generic_file_aio_write > generic_file_aio_write > ext4_file_write > do_sync_write > vfs_write > sys_write > system_call_fastpath > write > run_builtin > main 
> __libc_start_main
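The stacks above attribute the qemu-kvm time to transparent hugepage allocation falling into memory compaction (do_huge_pmd_anonymous_page -> try_to_compact_pages -> isolate_migratepages_range) and the ksmd time to KSM page merging and its TLB-flush IPIs. As a diagnostic sketch only (not a fix; assumes root and the standard sysfs paths on these 3.x kernels), both mechanisms can be switched off at runtime to test whether they are responsible for the stalls:

```shell
# Stop THP from compacting memory on fault, then disable THP entirely,
# so guest page faults fall back to 4K pages without compaction stalls.
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Stop the ksmd scanner and unmerge pages it has already shared.
echo 2 > /sys/kernel/mm/ksm/run
```

If boot times recover with these settings, that points at compaction/KSM contention rather than qemu-kvm itself; both can be re-enabled afterwards by echoing "always" (or "madvise") and "1" back into the same files.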
* Re: Windows VM slow boot 2012-09-06 9:20 ` [Qemu-devel] " Richard Davies (?) @ 2012-09-12 10:56 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-09-12 10:56 UTC (permalink / raw) To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm, linux-mm [ adding linux-mm - previously at http://marc.info/?t=134511509400003 ] Hi Rik, Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would retest with these. The typical symptom now appears to be that the Windows VMs boot reasonably fast, but then there is high CPU use and load for many minutes afterwards - the high CPU use is both for the qemu-kvm processes themselves and also for % sys. I attach a perf report which seems to show that the high CPU use is in the memory manager. Cheers, Richard. # ======== # captured on: Wed Sep 12 10:25:43 2012 # os release : 3.6.0-rc5-elastic # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 870K of event 'cycles' # Event count (approx.): 432968175910 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. 
# 89.14% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--95.47%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.64%-- 0x10100000002 | | | --44.36%-- 0x10100000006 | |--4.53%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.36%-- 0x10100000002 | | | --44.64%-- 0x10100000006 --0.00%-- [...] 4.92% qemu-kvm [kernel.kallsyms] [k] migrate_pages | --- migrate_pages | |--99.74%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.80%-- 0x10100000002 | | | --44.20%-- 0x10100000006 --0.26%-- [...] 
1.59% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.69%-- memcmp_pages | | | |--78.86%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.14%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.31%-- [...] 0.85% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many native_flush_tlb_others | |--99.81%-- flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.19%-- [...] 0.38% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.80%-- cpu_idle | | | |--90.53%-- start_secondary | | | --9.47%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.20%-- [...] 0.38% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore | --- _raw_spin_unlock_irqrestore | |--94.31%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.74%-- 0x10100000006 | | | --40.26%-- 0x10100000002 | |--3.41%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--53.57%-- 0x10100000006 | | | --46.43%-- 
0x10100000002 | |--0.82%-- ntp_tick_length | do_timer | tick_do_update_jiffies64 | tick_sched_timer | __run_hrtimer.isra.28 | hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000002 | |--0.76%-- __page_cache_release.part.11 | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.70%-- [...] 
0.26% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range | --- isolate_migratepages_range | |--95.44%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.46%-- 0x10100000002 | | | --47.54%-- 0x10100000006 | --4.56%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--53.84%-- 0x10100000006 | --46.16%-- 0x10100000002 0.21% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--53.46%-- 0x10100000002 | --46.54%-- 0x10100000006 0.14% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state | --- mod_zone_page_state | |--70.21%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | 
__gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.97%-- 0x10100000002 | | | --44.03%-- 0x10100000006 | |--29.71%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--61.19%-- 0x10100000002 | | | --38.81%-- 0x10100000006 --0.08%-- [...] 0.13% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func | --- flush_tlb_func | |--99.47%-- generic_smp_call_function_interrupt | smp_call_function_interrupt | call_function_interrupt | | | |--91.76%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--76.39%-- 0x10100000006 | | | | | --23.61%-- 0x10100000002 | | | |--7.61%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | 
try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--70.59%-- 0x10100000006 | | | | | --29.41%-- 0x10100000002 | --0.63%-- [...] | --0.53%-- smp_call_function_interrupt call_function_interrupt | |--83.32%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--79.99%-- 0x10100000006 | | | --20.01%-- 0x10100000002 | --16.68%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000002 0.09% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.75%-- __free_pages_ok | | | |--99.84%-- free_compound_page | | __put_compound_page | | put_compound_page | | release_pages | | free_pages_and_swap_cache | | tlb_flush_mmu | | tlb_finish_mmu | | exit_mmap | | mmput | | exit_mm | | do_exit | | do_group_exit | | get_signal_to_deliver | | do_signal | | do_notify_resume | | int_signal | --0.16%-- [...] --0.25%-- [...] 
0.08% :2585 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.47%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --0.53%-- free_hot_cold_page __free_pages | |--50.65%-- zap_huge_pmd | unmap_single_vma | unmap_vmas | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --49.35%-- __vunmap vfree kvm_free_physmem_slot kvm_free_physmem kvm_put_kvm kvm_vcpu_release __fput ____fput task_work_run do_exit do_group_exit get_signal_to_deliver do_signal do_notify_resume int_signal 0.07% :2561 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.55%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.45%-- [...] 
0.07% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok | --- __zone_watermark_ok | |--56.52%-- zone_watermark_ok | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.67%-- 0x10100000002 | | | --40.33%-- 0x10100000006 | --43.48%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--58.50%-- 0x10100000002 | --41.50%-- 0x10100000006 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string | --- copy_user_generic_string | |--99.82%-- generic_file_buffered_write | __generic_file_aio_write | generic_file_aio_write | ext4_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | write | run_builtin | main | __libc_start_main --0.18%-- [...] 
0.05% qemu-kvm [kernel.kallsyms] [k] compact_checklock_irqsave | --- compact_checklock_irqsave | |--82.09%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.69%-- 0x10100000002 | | | --45.31%-- 0x10100000006 | --17.91%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--59.49%-- 0x10100000002 | --40.51%-- 0x10100000006 0.04% qemu-kvm [kernel.kallsyms] [k] call_function_interrupt | --- call_function_interrupt | |--91.95%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--72.81%-- 0x10100000006 | | | --27.19%-- 0x10100000002 | |--7.50%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | 
do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.56%-- 0x10100000006 | | | --44.44%-- 0x10100000002 --0.56%-- [...] 0.04% ksmd [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys | --- default_send_IPI_mask_sequence_phys | |--99.44%-- physflat_send_IPI_mask | native_send_call_func_ipi | smp_call_function_many | native_flush_tlb_others | flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper | --0.56%-- native_send_call_func_ipi smp_call_function_many native_flush_tlb_others flush_tlb_page ptep_clear_flush try_to_merge_with_ksm_page ksm_scan_thread kthread kernel_thread_helper 0.03% qemu-kvm [kernel.kallsyms] [k] generic_smp_call_function_interrupt | --- generic_smp_call_function_interrupt | |--96.97%-- smp_call_function_interrupt | call_function_interrupt | | | |--97.39%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.65%-- 0x10100000006 | | | | | --21.35%-- 0x10100000002 | | | |--2.43%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | 
__get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--57.14%-- 0x10100000002 | | | | | --42.86%-- 0x10100000006 | --0.19%-- [...] | --3.03%-- call_function_interrupt | |--77.79%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--71.42%-- 0x10100000006 | | | --28.58%-- 0x10100000002 | --22.21%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async
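For anyone wanting to reproduce a capture like the one above: the report header records the original command line as `perf record -g -a` (system-wide, with call graphs). A minimal equivalent session, assuming perf is installed and run as root while the guest is booting, would be:

```shell
# Sample all CPUs with call-graph recording (-g) for 60 seconds while
# the guest boots; adjust the sleep to cover the slow period.
perf record -g -a -o perf.data sleep 60

# Print the same kind of text call-graph report quoted in this thread.
perf report -i perf.data --stdio
```

The 0x10100000002 / 0x10100000006 leaves in the reports are unresolved userspace return addresses; resolving them would need qemu-kvm debug symbols on the recording host.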
__gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.97%-- 0x10100000002 | | | --44.03%-- 0x10100000006 | |--29.71%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--61.19%-- 0x10100000002 | | | --38.81%-- 0x10100000006 --0.08%-- [...] 0.13% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func | --- flush_tlb_func | |--99.47%-- generic_smp_call_function_interrupt | smp_call_function_interrupt | call_function_interrupt | | | |--91.76%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--76.39%-- 0x10100000006 | | | | | --23.61%-- 0x10100000002 | | | |--7.61%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | 
try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--70.59%-- 0x10100000006 | | | | | --29.41%-- 0x10100000002 | --0.63%-- [...] | --0.53%-- smp_call_function_interrupt call_function_interrupt | |--83.32%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--79.99%-- 0x10100000006 | | | --20.01%-- 0x10100000002 | --16.68%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000002 0.09% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.75%-- __free_pages_ok | | | |--99.84%-- free_compound_page | | __put_compound_page | | put_compound_page | | release_pages | | free_pages_and_swap_cache | | tlb_flush_mmu | | tlb_finish_mmu | | exit_mmap | | mmput | | exit_mm | | do_exit | | do_group_exit | | get_signal_to_deliver | | do_signal | | do_notify_resume | | int_signal | --0.16%-- [...] --0.25%-- [...] 
0.08% :2585 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.47%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --0.53%-- free_hot_cold_page __free_pages | |--50.65%-- zap_huge_pmd | unmap_single_vma | unmap_vmas | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --49.35%-- __vunmap vfree kvm_free_physmem_slot kvm_free_physmem kvm_put_kvm kvm_vcpu_release __fput ____fput task_work_run do_exit do_group_exit get_signal_to_deliver do_signal do_notify_resume int_signal 0.07% :2561 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.55%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.45%-- [...] 
0.07% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok | --- __zone_watermark_ok | |--56.52%-- zone_watermark_ok | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.67%-- 0x10100000002 | | | --40.33%-- 0x10100000006 | --43.48%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--58.50%-- 0x10100000002 | --41.50%-- 0x10100000006 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string | --- copy_user_generic_string | |--99.82%-- generic_file_buffered_write | __generic_file_aio_write | generic_file_aio_write | ext4_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | write | run_builtin | main | __libc_start_main --0.18%-- [...] 
0.05% qemu-kvm [kernel.kallsyms] [k] compact_checklock_irqsave | --- compact_checklock_irqsave | |--82.09%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.69%-- 0x10100000002 | | | --45.31%-- 0x10100000006 | --17.91%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--59.49%-- 0x10100000002 | --40.51%-- 0x10100000006 0.04% qemu-kvm [kernel.kallsyms] [k] call_function_interrupt | --- call_function_interrupt | |--91.95%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--72.81%-- 0x10100000006 | | | --27.19%-- 0x10100000002 | |--7.50%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | 
do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.56%-- 0x10100000006 | | | --44.44%-- 0x10100000002 --0.56%-- [...] 0.04% ksmd [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys | --- default_send_IPI_mask_sequence_phys | |--99.44%-- physflat_send_IPI_mask | native_send_call_func_ipi | smp_call_function_many | native_flush_tlb_others | flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper | --0.56%-- native_send_call_func_ipi smp_call_function_many native_flush_tlb_others flush_tlb_page ptep_clear_flush try_to_merge_with_ksm_page ksm_scan_thread kthread kernel_thread_helper 0.03% qemu-kvm [kernel.kallsyms] [k] generic_smp_call_function_interrupt | --- generic_smp_call_function_interrupt | |--96.97%-- smp_call_function_interrupt | call_function_interrupt | | | |--97.39%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.65%-- 0x10100000006 | | | | | --21.35%-- 0x10100000002 | | | |--2.43%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | 
__get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--57.14%-- 0x10100000002 | | | | | --42.86%-- 0x10100000006 | --0.19%-- [...] | --3.03%-- call_function_interrupt | |--77.79%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--71.42%-- 0x10100000006 | | | --28.58%-- 0x10100000002 | --22.21%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
@ 2012-09-12 10:56 ` Richard Davies
  0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-12 10:56 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm, linux-mm

[ adding linux-mm - previously at http://marc.info/?t=134511509400003 ]

Hi Rik,

Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
retest with these.

The typical symptom now appears to be that the Windows VMs boot reasonably
fast, but then there is high CPU use and load for many minutes afterwards -
the high CPU use is both for the qemu-kvm processes themselves and also for
%sys.

I attach a perf report which seems to show that the high CPU use is in the
memory manager.

Cheers,

Richard.

# ========
# captured on: Wed Sep 12 10:25:43 2012
# os release : 3.6.0-rc5-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 870K of event 'cycles'
# Event count (approx.): 432968175910
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
    89.14%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
            |
            --- _raw_spin_lock_irqsave
               |
               |--95.47%-- isolate_migratepages_range
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--55.64%-- 0x10100000002
               |          |
               |           --44.36%-- 0x10100000006
               |
               |--4.53%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--55.36%-- 0x10100000002
               |          |
               |           --44.64%-- 0x10100000006
                --0.00%-- [...]

     4.92%  qemu-kvm  [kernel.kallsyms]  [k] migrate_pages
            |
            --- migrate_pages
               |
               |--99.74%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--55.80%-- 0x10100000002
               |          |
               |           --44.20%-- 0x10100000006
                --0.26%-- [...]
1.59% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.69%-- memcmp_pages | | | |--78.86%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.14%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.31%-- [...] 0.85% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many native_flush_tlb_others | |--99.81%-- flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.19%-- [...] 0.38% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.80%-- cpu_idle | | | |--90.53%-- start_secondary | | | --9.47%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.20%-- [...] 0.38% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore | --- _raw_spin_unlock_irqrestore | |--94.31%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.74%-- 0x10100000006 | | | --40.26%-- 0x10100000002 | |--3.41%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--53.57%-- 0x10100000006 | | | --46.43%-- 
0x10100000002 | |--0.82%-- ntp_tick_length | do_timer | tick_do_update_jiffies64 | tick_sched_timer | __run_hrtimer.isra.28 | hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000002 | |--0.76%-- __page_cache_release.part.11 | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.70%-- [...] 
0.26% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range | --- isolate_migratepages_range | |--95.44%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.46%-- 0x10100000002 | | | --47.54%-- 0x10100000006 | --4.56%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--53.84%-- 0x10100000006 | --46.16%-- 0x10100000002 0.21% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--53.46%-- 0x10100000002 | --46.54%-- 0x10100000006 0.14% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state | --- mod_zone_page_state | |--70.21%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | 
__gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.97%-- 0x10100000002 | | | --44.03%-- 0x10100000006 | |--29.71%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--61.19%-- 0x10100000002 | | | --38.81%-- 0x10100000006 --0.08%-- [...] 0.13% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func | --- flush_tlb_func | |--99.47%-- generic_smp_call_function_interrupt | smp_call_function_interrupt | call_function_interrupt | | | |--91.76%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--76.39%-- 0x10100000006 | | | | | --23.61%-- 0x10100000002 | | | |--7.61%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | 
try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--70.59%-- 0x10100000006 | | | | | --29.41%-- 0x10100000002 | --0.63%-- [...] | --0.53%-- smp_call_function_interrupt call_function_interrupt | |--83.32%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--79.99%-- 0x10100000006 | | | --20.01%-- 0x10100000002 | --16.68%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000002 0.09% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.75%-- __free_pages_ok | | | |--99.84%-- free_compound_page | | __put_compound_page | | put_compound_page | | release_pages | | free_pages_and_swap_cache | | tlb_flush_mmu | | tlb_finish_mmu | | exit_mmap | | mmput | | exit_mm | | do_exit | | do_group_exit | | get_signal_to_deliver | | do_signal | | do_notify_resume | | int_signal | --0.16%-- [...] --0.25%-- [...] 
0.08% :2585 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.47%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --0.53%-- free_hot_cold_page __free_pages | |--50.65%-- zap_huge_pmd | unmap_single_vma | unmap_vmas | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --49.35%-- __vunmap vfree kvm_free_physmem_slot kvm_free_physmem kvm_put_kvm kvm_vcpu_release __fput ____fput task_work_run do_exit do_group_exit get_signal_to_deliver do_signal do_notify_resume int_signal 0.07% :2561 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.55%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.45%-- [...] 
0.07% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok | --- __zone_watermark_ok | |--56.52%-- zone_watermark_ok | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.67%-- 0x10100000002 | | | --40.33%-- 0x10100000006 | --43.48%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--58.50%-- 0x10100000002 | --41.50%-- 0x10100000006 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string | --- copy_user_generic_string | |--99.82%-- generic_file_buffered_write | __generic_file_aio_write | generic_file_aio_write | ext4_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | write | run_builtin | main | __libc_start_main --0.18%-- [...] 
0.05% qemu-kvm [kernel.kallsyms] [k] compact_checklock_irqsave | --- compact_checklock_irqsave | |--82.09%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.69%-- 0x10100000002 | | | --45.31%-- 0x10100000006 | --17.91%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--59.49%-- 0x10100000002 | --40.51%-- 0x10100000006 0.04% qemu-kvm [kernel.kallsyms] [k] call_function_interrupt | --- call_function_interrupt | |--91.95%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--72.81%-- 0x10100000006 | | | --27.19%-- 0x10100000002 | |--7.50%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | 
do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.56%-- 0x10100000006 | | | --44.44%-- 0x10100000002 --0.56%-- [...] 0.04% ksmd [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys | --- default_send_IPI_mask_sequence_phys | |--99.44%-- physflat_send_IPI_mask | native_send_call_func_ipi | smp_call_function_many | native_flush_tlb_others | flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper | --0.56%-- native_send_call_func_ipi smp_call_function_many native_flush_tlb_others flush_tlb_page ptep_clear_flush try_to_merge_with_ksm_page ksm_scan_thread kthread kernel_thread_helper 0.03% qemu-kvm [kernel.kallsyms] [k] generic_smp_call_function_interrupt | --- generic_smp_call_function_interrupt | |--96.97%-- smp_call_function_interrupt | call_function_interrupt | | | |--97.39%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.65%-- 0x10100000006 | | | | | --21.35%-- 0x10100000002 | | | |--2.43%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | 
__get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--57.14%-- 0x10100000002 | | | | | --42.86%-- 0x10100000006 | --0.19%-- [...] | --3.03%-- call_function_interrupt | |--77.79%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--71.42%-- 0x10100000006 | | | --28.58%-- 0x10100000002 | --22.21%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
  2012-09-12 10:56 ` Richard Davies
  (?)
@ 2012-09-12 12:25 ` Mel Gorman
  -1 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-12 12:25 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Wed, Sep 12, 2012 at 11:56:59AM +0100, Richard Davies wrote:
> [ adding linux-mm - previously at http://marc.info/?t=134511509400003 ]
>
> Hi Rik,
>

I'm not Rik but hi anyway.

> Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
> retest with these.
>

Ok. 3.6.0-rc5 contains [c67fe375: mm: compaction: Abort async compaction
if locks are contended or taking too long] which should have helped
mitigate some of the lock contention problem, but not all of it, as we'll
see later.

> The typical symptom now appears to be that the Windows VMs boot reasonably
> fast,

I see that this is an old-ish bug, but I have not read the full history.
Is it now booting faster than 3.5.0 was? I'm asking because I'm
interested to see whether commit c67fe375 helped your particular case.

> but then there is high CPU use and load for many minutes afterwards -
> the high CPU use is both for the qemu-kvm processes themselves and also for
> % sys.
>

Ok, I cannot comment on the userspace portion of things, but the kernel
portion still indicates that a high percentage of time is spent on what
appears to be lock contention.

> I attach a perf report which seems to show that the high CPU use is in the
> memory manager.
>

A follow-on from commit c67fe375 was the following patch (author cc'd),
which addresses lock contention in isolate_migratepages_range, where your
perf report indicates that we're spending 95% of the time. Would you be
willing to test it, please?

---8<---
From: Shaohua Li <shli@kernel.org>
Subject: mm: compaction: check lock contention first before taking lock

isolate_migratepages_range takes zone->lru_lock first and then checks
whether the lock is contended; if it is, it releases the lock. This isn't
efficient: if the lock is truly contended, a lock/unlock pair only
increases the contention. We'd better check whether the lock is contended
first. compact_trylock_irqsave perfectly meets the requirement.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/compaction.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
--- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
+++ a/mm/compaction.c
@@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *

 	/* Time to isolate some pages for migration */
 	cond_resched();
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	locked = true;
+	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+	if (!locked)
+		return 0;

 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;

^ permalink raw reply	[flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
  2012-09-12 12:25 ` Mel Gorman
  (?)
@ 2012-09-12 16:46 ` Richard Davies
  -1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-12 16:46 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

Hi Mel - thanks for replying to my underhand bcc!

Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.

Yes, I think 3.6.0-rc5 is already better than 3.5.x, but it can still be
improved, as discussed.

> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li <shli@kernel.org>
> Subject: mm: compaction: check lock contention first before taking lock
>
> isolate_migratepages_range takes zone->lru_lock first and then checks
> whether the lock is contended; if it is, it releases the lock. This isn't
> efficient: if the lock is truly contended, a lock/unlock pair only
> increases the contention. We'd better check whether the lock is contended
> first. compact_trylock_irqsave perfectly meets the requirement.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/compaction.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
> --- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
>
>  	/* Time to isolate some pages for migration */
>  	cond_resched();
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	locked = true;
> +	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> +	if (!locked)
> +		return 0;
>
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  		struct page *page;

I have applied and tested again - perf results below.

isolate_migratepages_range is indeed much reduced. There is now a lot of
time in isolate_freepages_block, and still quite a lot of lock contention,
although in a different place.

# ========
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead         Command        Shared Object                                   Symbol
# ........  ..............  ...................  ..............................................
# 43.95% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.17%-- 0x10100000006 | | | --4.83%-- 0x10100000002 --0.01%-- [...] 15.98% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--97.18%-- compact_checklock_irqsave | | | |--98.61%-- compaction_alloc | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--94.94%-- 0x10100000006 | | | | | --5.06%-- 0x10100000002 | | | --1.39%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.04%-- 0x10100000006 | | | --4.96%-- 
0x10100000002 | |--1.94%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.19%-- 0x10100000006 | | | --4.81%-- 0x10100000002 --0.88%-- [...] 5.73% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.79%-- memcmp_pages | | | |--81.64%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --18.36%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.21%-- [...] 5.52% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.51%-- cpu_idle | | | |--86.19%-- start_secondary | | | --13.81%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.49%-- [...] 2.90% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.70%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.09%-- 0x10100000006 | | | --3.91%-- 0x10100000002 --0.30%-- [...] 
1.86% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.15%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.03%-- 0x10100000006 | | | --3.97%-- 0x10100000002 | --0.85%-- __alloc_pages_nodemask | |--78.22%-- alloc_pages_vma | handle_pte_fault | | | |--99.76%-- handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--91.60%-- 0x10100000006 | | | | | --8.40%-- 0x10100000002 | --0.24%-- [...] | --21.78%-- alloc_pages_current pte_alloc_one | |--97.40%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--93.12%-- 0x10100000006 | | | --6.88%-- 0x10100000002 | --2.60%-- __pte_alloc do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000006 1.83% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group | --- get_pageblock_flags_group | |--51.38%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | 
__alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.32%-- 0x10100000006 | | | --4.68%-- 0x10100000002 | |--43.05%-- suitable_migration_target | compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.52%-- 0x10100000006 | | | --4.48%-- 0x10100000002 | |--3.62%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.78%-- 0x10100000006 | | | --3.22%-- 0x10100000002 | |--1.20%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | 
handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.33%-- 0x10100000006 | | | --3.67%-- 0x10100000002 | |--0.61%-- free_hot_cold_page | | | |--77.99%-- free_hot_cold_page_list | | | | | |--95.93%-- release_pages | | | pagevec_lru_move_fn | | | __pagevec_lru_add | | | | | | | |--98.44%-- __lru_cache_add | | | | lru_cache_add_lru | | | | putback_lru_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--96.77%-- 0x10100000006 | | | | | | | | | --3.23%-- 0x10100000002 | | | | | | | --1.56%-- lru_add_drain_cpu | | | lru_add_drain | | | migrate_prep_local | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --4.07%-- shrink_page_list | | shrink_inactive_list | | shrink_lruvec | | try_to_free_pages | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | 
handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | |--19.40%-- __free_pages | | | | | |--85.71%-- release_freepages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--90.47%-- 0x10100000006 | | | | | | | --9.53%-- 0x10100000002 | | | | | |--10.21%-- do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --4.08%-- __free_slab | | discard_slab | | __slab_free | | kmem_cache_free | | free_buffer_head | | try_to_free_buffers | | jbd2_journal_try_to_free_buffers | | bdev_try_to_free_page | | blkdev_releasepage | | try_to_release_page | | move_to_new_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | 
get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --2.61%-- __put_single_page | put_page | | | |--91.27%-- putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --8.73%-- skb_free_head.part.34 | skb_release_data | __kfree_skb | tcp_recvmsg | inet_recvmsg | sock_recvmsg | sys_recvfrom | system_call_fastpath | recv | 0x0 --0.14%-- [...] 1.54% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.52%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--94.70%-- 0x10100000006 | | | --5.30%-- 0x10100000002 --0.48%-- [...] 1.30% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.45%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.06%-- 0x10100000006 | | | --3.94%-- 0x10100000002 | --0.55%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--97.59%-- 0x10100000006 | --2.41%-- 0x10100000002 1.00% qemu-kvm qemu-kvm [.] 
0x0000000000254bc2 | |--1.63%-- 0x4eec20 | | | |--47.60%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--26.98%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.63%-- 0x4eec6e | | | |--52.41%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--38.99%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --8.60%-- 0x309c280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.44%-- 0x5b4cb4 | 0x0 | | | --100.00%-- 0x822ee8fff96873e9 | |--1.32%-- 0x503457 | 0x0 | |--1.30%-- 0x65a186 | 0x0 | |--1.22%-- 0x541422 | 0x0 | |--1.08%-- 0x568f04 | | | |--93.81%-- 0x0 | | | |--6.01%-- 0x10100000006 | --0.19%-- [...] | |--1.06%-- 0x56a08e | | | |--55.97%-- 0x2fa1410 | | 0x0 | | | |--24.12%-- 0x2179410 | | 0x0 | | | --19.92%-- 0x15ba410 | 0x0 | |--1.05%-- 0x4eeeac | | | |--66.23%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--19.06%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --14.71%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.01%-- 0x6578d7 | | | --100.00%-- 0x0 | |--0.96%-- 0x52fb44 | | | |--91.88%-- 0x0 | | | --8.12%-- 0x10100000006 | |--0.95%-- 0x65a102 | |--0.94%-- 0x541aac | 0x0 | |--0.93%-- 0x525261 | 0x0 | | | --100.00%-- 0x822ee8fff96873e9 | |--0.89%-- 0x540e24 | |--0.88%-- 0x477a32 | 0x0 | |--0.87%-- 0x4eee03 | | | |--47.23%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.15%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --20.62%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.84%-- 0x530421 | | | --100.00%-- 0x0 | |--0.83%-- 0x4eeb52 | |--0.82%-- 0x40a6a9 | |--0.79%-- 0x672601 | 0x1 | |--0.78%-- 0x564e00 | | | --100.00%-- 0x0 | |--0.78%-- 0x568e38 | | | |--95.83%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--2.15%-- 0x10100000006 | | | --2.02%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.74%-- 0x56e704 | | | |--47.84%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--38.61%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--10.72%-- 
0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --2.83%-- 0x10100000006 | |--0.73%-- 0x5308c3 | |--0.72%-- 0x654b22 | 0x0 | |--0.71%-- 0x530094 | |--0.71%-- 0x564e04 | | | |--87.21%-- 0x0 | | | |--12.59%-- 0x46b47b | | 0xdffebc0000a88169 | --0.20%-- [...] | |--0.71%-- 0x568e5f | | | |--98.58%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --1.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.70%-- 0x4ef092 | |--0.70%-- 0x52fac2 | | | |--99.12%-- 0x0 | | | --0.88%-- 0x10100000006 | |--0.68%-- 0x541ac1 | |--0.66%-- 0x4eec22 | | | |--44.90%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--30.11%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.00%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.65%-- 0x5afab4 | | | |--48.10%-- 0x2179410 | | 0x0 | | | |--41.94%-- 0x15ba410 | | 0x0 | | | |--5.05%-- 0x0 | | | | | |--39.43%-- 0x3099550 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | |--35.76%-- 0x23c0e90 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | --24.81%-- 0x16b2130 | | 0x5699c0 | | 0x24448948004b4154 | | | |--4.00%-- 0x2fa1410 | | 0x0 | | | --0.92%-- 0x6 | |--0.63%-- 0x65a3f6 | 0x1 | |--0.63%-- 0x659d12 | 0x0 | |--0.62%-- 0x530764 | 0x0 | |--0.62%-- 0x46e803 | 0x46b47b | | | |--72.15%-- 0xdffebc0000a88169 | | | |--16.88%-- 0xdffebec000a08169 | | | --10.97%-- 0xdffeb1d000a88169 | |--0.61%-- 0x4eeba0 | | | |--45.41%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--36.19%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --18.40%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.60%-- 0x659d61 | |--0.60%-- 0x4ff496 | |--0.59%-- 0x5030db | |--0.58%-- 0x477822 | ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] Windows VM slow boot @ 2012-09-12 16:46 ` Richard Davies 0 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-09-12 16:46 UTC (permalink / raw) To: Mel Gorman; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li Hi Mel - thanks for replying to my underhand bcc! Mel Gorman wrote: > I see that this is an old-ish bug but I did not read the full history. > Is it now booting faster than 3.5.0 was? I'm asking because I'm > interested to see if commit c67fe375 helped your particular case. Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be improved, as discussed. > A follow-on from commit c67fe375 was the following patch (author cc'd) > which addresses lock contention in isolate_migratepages_range where your > perf report indicates that we're spending 95% of the time. Would you be > willing to test it please? > > ---8<--- > From: Shaohua Li <shli@kernel.org> > Subject: mm: compaction: check lock contention first before taking lock > > isolate_migratepages_range will take zone->lru_lock first and check if the > lock is contented, if yes, it will release the lock. This isn't > efficient. If the lock is truly contented, a lock/unlock pair will > increase the lock contention. We'd better check if the lock is contended > first. compact_trylock_irqsave perfectly meets the requirement. 
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/compaction.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
> --- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
>
>  	/* Time to isolate some pages for migration */
>  	cond_resched();
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	locked = true;
> +	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> +	if (!locked)
> +		return 0;
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  		struct page *page;

I have applied and tested again - perf results below.

isolate_migratepages_range is indeed much reduced. There is now a lot of
time in isolate_freepages_block and still quite a lot of lock contention,
although in a different place.

# ========
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead  Command  Shared Object  Symbol
# ........ ............... .................... ..............................................
# 43.95% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.17%-- 0x10100000006 | | | --4.83%-- 0x10100000002 --0.01%-- [...] 15.98% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--97.18%-- compact_checklock_irqsave | | | |--98.61%-- compaction_alloc | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--94.94%-- 0x10100000006 | | | | | --5.06%-- 0x10100000002 | | | --1.39%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.04%-- 0x10100000006 | | | --4.96%-- 
0x10100000002 | |--1.94%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.19%-- 0x10100000006 | | | --4.81%-- 0x10100000002 --0.88%-- [...] 5.73% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.79%-- memcmp_pages | | | |--81.64%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --18.36%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.21%-- [...] 5.52% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.51%-- cpu_idle | | | |--86.19%-- start_secondary | | | --13.81%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.49%-- [...] 2.90% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.70%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.09%-- 0x10100000006 | | | --3.91%-- 0x10100000002 --0.30%-- [...] 
1.86% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.15%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.03%-- 0x10100000006 | | | --3.97%-- 0x10100000002 | --0.85%-- __alloc_pages_nodemask | |--78.22%-- alloc_pages_vma | handle_pte_fault | | | |--99.76%-- handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--91.60%-- 0x10100000006 | | | | | --8.40%-- 0x10100000002 | --0.24%-- [...] | --21.78%-- alloc_pages_current pte_alloc_one | |--97.40%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--93.12%-- 0x10100000006 | | | --6.88%-- 0x10100000002 | --2.60%-- __pte_alloc do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000006 1.83% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group | --- get_pageblock_flags_group | |--51.38%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | 
__alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.32%-- 0x10100000006 | | | --4.68%-- 0x10100000002 | |--43.05%-- suitable_migration_target | compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.52%-- 0x10100000006 | | | --4.48%-- 0x10100000002 | |--3.62%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.78%-- 0x10100000006 | | | --3.22%-- 0x10100000002 | |--1.20%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | 
handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.33%-- 0x10100000006 | | | --3.67%-- 0x10100000002 | |--0.61%-- free_hot_cold_page | | | |--77.99%-- free_hot_cold_page_list | | | | | |--95.93%-- release_pages | | | pagevec_lru_move_fn | | | __pagevec_lru_add | | | | | | | |--98.44%-- __lru_cache_add | | | | lru_cache_add_lru | | | | putback_lru_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--96.77%-- 0x10100000006 | | | | | | | | | --3.23%-- 0x10100000002 | | | | | | | --1.56%-- lru_add_drain_cpu | | | lru_add_drain | | | migrate_prep_local | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --4.07%-- shrink_page_list | | shrink_inactive_list | | shrink_lruvec | | try_to_free_pages | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | 
handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | |--19.40%-- __free_pages | | | | | |--85.71%-- release_freepages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--90.47%-- 0x10100000006 | | | | | | | --9.53%-- 0x10100000002 | | | | | |--10.21%-- do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --4.08%-- __free_slab | | discard_slab | | __slab_free | | kmem_cache_free | | free_buffer_head | | try_to_free_buffers | | jbd2_journal_try_to_free_buffers | | bdev_try_to_free_page | | blkdev_releasepage | | try_to_release_page | | move_to_new_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | 
get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --2.61%-- __put_single_page | put_page | | | |--91.27%-- putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --8.73%-- skb_free_head.part.34 | skb_release_data | __kfree_skb | tcp_recvmsg | inet_recvmsg | sock_recvmsg | sys_recvfrom | system_call_fastpath | recv | 0x0 --0.14%-- [...] 1.54% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.52%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--94.70%-- 0x10100000006 | | | --5.30%-- 0x10100000002 --0.48%-- [...] 1.30% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.45%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.06%-- 0x10100000006 | | | --3.94%-- 0x10100000002 | --0.55%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--97.59%-- 0x10100000006 | --2.41%-- 0x10100000002 1.00% qemu-kvm qemu-kvm [.] 
0x0000000000254bc2 | |--1.63%-- 0x4eec20 | | | |--47.60%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--26.98%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.63%-- 0x4eec6e | | | |--52.41%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--38.99%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --8.60%-- 0x309c280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.44%-- 0x5b4cb4 | 0x0 | | | --100.00%-- 0x822ee8fff96873e9 | |--1.32%-- 0x503457 | 0x0 | |--1.30%-- 0x65a186 | 0x0 | |--1.22%-- 0x541422 | 0x0 | |--1.08%-- 0x568f04 | | | |--93.81%-- 0x0 | | | |--6.01%-- 0x10100000006 | --0.19%-- [...] | |--1.06%-- 0x56a08e | | | |--55.97%-- 0x2fa1410 | | 0x0 | | | |--24.12%-- 0x2179410 | | 0x0 | | | --19.92%-- 0x15ba410 | 0x0 | |--1.05%-- 0x4eeeac | | | |--66.23%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--19.06%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --14.71%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.01%-- 0x6578d7 | | | --100.00%-- 0x0 | |--0.96%-- 0x52fb44 | | | |--91.88%-- 0x0 | | | --8.12%-- 0x10100000006 | |--0.95%-- 0x65a102 | |--0.94%-- 0x541aac | 0x0 | |--0.93%-- 0x525261 | 0x0 | | | --100.00%-- 0x822ee8fff96873e9 | |--0.89%-- 0x540e24 | |--0.88%-- 0x477a32 | 0x0 | |--0.87%-- 0x4eee03 | | | |--47.23%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.15%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --20.62%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.84%-- 0x530421 | | | --100.00%-- 0x0 | |--0.83%-- 0x4eeb52 | |--0.82%-- 0x40a6a9 | |--0.79%-- 0x672601 | 0x1 | |--0.78%-- 0x564e00 | | | --100.00%-- 0x0 | |--0.78%-- 0x568e38 | | | |--95.83%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--2.15%-- 0x10100000006 | | | --2.02%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.74%-- 0x56e704 | | | |--47.84%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--38.61%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--10.72%-- 
0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --2.83%-- 0x10100000006 | |--0.73%-- 0x5308c3 | |--0.72%-- 0x654b22 | 0x0 | |--0.71%-- 0x530094 | |--0.71%-- 0x564e04 | | | |--87.21%-- 0x0 | | | |--12.59%-- 0x46b47b | | 0xdffebc0000a88169 | --0.20%-- [...] | |--0.71%-- 0x568e5f | | | |--98.58%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --1.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.70%-- 0x4ef092 | |--0.70%-- 0x52fac2 | | | |--99.12%-- 0x0 | | | --0.88%-- 0x10100000006 | |--0.68%-- 0x541ac1 | |--0.66%-- 0x4eec22 | | | |--44.90%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--30.11%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.00%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.65%-- 0x5afab4 | | | |--48.10%-- 0x2179410 | | 0x0 | | | |--41.94%-- 0x15ba410 | | 0x0 | | | |--5.05%-- 0x0 | | | | | |--39.43%-- 0x3099550 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | |--35.76%-- 0x23c0e90 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | --24.81%-- 0x16b2130 | | 0x5699c0 | | 0x24448948004b4154 | | | |--4.00%-- 0x2fa1410 | | 0x0 | | | --0.92%-- 0x6 | |--0.63%-- 0x65a3f6 | 0x1 | |--0.63%-- 0x659d12 | 0x0 | |--0.62%-- 0x530764 | 0x0 | |--0.62%-- 0x46e803 | 0x46b47b | | | |--72.15%-- 0xdffebc0000a88169 | | | |--16.88%-- 0xdffebec000a08169 | | | --10.97%-- 0xdffeb1d000a88169 | |--0.61%-- 0x4eeba0 | | | |--45.41%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--36.19%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --18.40%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.60%-- 0x659d61 | |--0.60%-- 0x4ff496 | |--0.59%-- 0x5030db | |--0.58%-- 0x477822 | ^ permalink raw reply [flat|nested] 101+ messages in thread
0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --2.83%-- 0x10100000006 | |--0.73%-- 0x5308c3 | |--0.72%-- 0x654b22 | 0x0 | |--0.71%-- 0x530094 | |--0.71%-- 0x564e04 | | | |--87.21%-- 0x0 | | | |--12.59%-- 0x46b47b | | 0xdffebc0000a88169 | --0.20%-- [...] | |--0.71%-- 0x568e5f | | | |--98.58%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --1.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.70%-- 0x4ef092 | |--0.70%-- 0x52fac2 | | | |--99.12%-- 0x0 | | | --0.88%-- 0x10100000006 | |--0.68%-- 0x541ac1 | |--0.66%-- 0x4eec22 | | | |--44.90%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--30.11%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.00%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.65%-- 0x5afab4 | | | |--48.10%-- 0x2179410 | | 0x0 | | | |--41.94%-- 0x15ba410 | | 0x0 | | | |--5.05%-- 0x0 | | | | | |--39.43%-- 0x3099550 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | |--35.76%-- 0x23c0e90 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | --24.81%-- 0x16b2130 | | 0x5699c0 | | 0x24448948004b4154 | | | |--4.00%-- 0x2fa1410 | | 0x0 | | | --0.92%-- 0x6 | |--0.63%-- 0x65a3f6 | 0x1 | |--0.63%-- 0x659d12 | 0x0 | |--0.62%-- 0x530764 | 0x0 | |--0.62%-- 0x46e803 | 0x46b47b | | | |--72.15%-- 0xdffebc0000a88169 | | | |--16.88%-- 0xdffebec000a08169 | | | --10.97%-- 0xdffeb1d000a88169 | |--0.61%-- 0x4eeba0 | | | |--45.41%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--36.19%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --18.40%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.60%-- 0x659d61 | |--0.60%-- 0x4ff496 | |--0.59%-- 0x5030db | |--0.58%-- 0x477822 | -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
  2012-09-12 16:46 ` Richard Davies
@ 2012-09-13  9:50 ` Mel Gorman
  0 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-13 9:50 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Wed, Sep 12, 2012 at 05:46:15PM +0100, Richard Davies wrote:
> Hi Mel - thanks for replying to my underhand bcc!
>
> Mel Gorman wrote:
> > I see that this is an old-ish bug but I did not read the full history.
> > Is it now booting faster than 3.5.0 was? I'm asking because I'm
> > interested to see if commit c67fe375 helped your particular case.
>
> Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
> improved, as discussed.

What are the boot times for each kernel?

> <PATCH SNIPPED>
>
> I have applied and tested again - perf results below.
>
> isolate_migratepages_range is indeed much reduced.
>
> There is now a lot of time in isolate_freepages_block and still quite a lot
> of lock contention, although in a different place.

This on top please.

---8<---
From: Shaohua Li <shli@fusionio.com>

compaction: abort compaction loop if lock is contended or run too long

isolate_migratepages_range() might isolate no pages, for example when
zone->lru_lock is contended and compaction is async. In this case we
should abort compaction, otherwise compact_zone will run a useless loop
and make zone->lru_lock even more contended.

V2: only abort the compaction if lock is contended or run too long
Rearranged the code by Andrea Arcangeli.

[minchan@kernel.org: Putback pages isolated for migration if aborting]
[akpm@linux-foundation.org: Fixup one contended usage site]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c | 17 ++++++++++++-----
 mm/internal.h   |  2 +-
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..a8de20d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,

 	/* async aborts if taking too long or contended */
 	if (!cc->sync) {
-		if (cc->contended)
-			*cc->contended = true;
+		cc->contended = true;
 		return false;
 	}

@@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,

 	/* Perform the isolation */
 	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
-	if (!low_pfn)
+	if (!low_pfn || cc->contended)
 		return ISOLATE_ABORT;

 	cc->migrate_pfn = low_pfn;
@@ -787,6 +786,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_PARTIAL;
+			putback_lru_pages(&cc->migratepages);
+			cc->nr_migratepages = 0;
 			goto out;
 		case ISOLATE_NONE:
 			continue;
@@ -831,6 +832,7 @@ static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
 				 bool sync, bool *contended)
 {
+	unsigned long ret;
 	struct compact_control cc = {
 		.nr_freepages = 0,
 		.nr_migratepages = 0,
@@ -838,12 +840,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
 		.sync = sync,
-		.contended = contended,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);

-	return compact_zone(zone, &cc);
+	ret = compact_zone(zone, &cc);
+
+	VM_BUG_ON(!list_empty(&cc.freepages));
+	VM_BUG_ON(!list_empty(&cc.migratepages));
+
+	*contended = cc.contended;
+	return ret;
 }

 int sysctl_extfrag_threshold = 500;

diff --git a/mm/internal.h b/mm/internal.h
index b8c91b3..4bd7c0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,7 +130,7 @@ struct compact_control {
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
-	bool *contended;		/* True if a lock was contended */
+	bool contended;			/* True if a lock was contended */
 };

 unsigned long

^ permalink raw reply related [flat|nested] 101+ messages in thread
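The shape of Shaohua's fix, moving the contention flag from a `bool *` pointer inside `struct compact_control` to a plain `bool` that the outer caller copies out exactly once, can be sketched in isolation. The following is a hypothetical userspace miniature, not the kernel code: the struct members and function names are invented stand-ins for `compact_checklock_irqsave()` and `compact_zone_order()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented miniature of struct compact_control: after the patch, the
 * contention flag lives in the control struct by value rather than as a
 * pointer into the caller's stack frame. */
struct mini_compact_control {
    bool sync;      /* synchronous compaction? */
    bool contended; /* set when a lock was contended */
};

/* Stand-in for compact_checklock_irqsave(): an async compactor records
 * contention in its own control struct and bails out. */
static bool check_lock(struct mini_compact_control *cc, bool lock_contended)
{
    if (lock_contended && !cc->sync) {
        cc->contended = true;
        return false;
    }
    return true;
}

/* Stand-in for compact_zone_order(): run the inner work, then copy the
 * flag out to the caller once, as the patch does with
 * "*contended = cc.contended". Returns 0 on success, -1 on abort. */
static int compact(bool sync, bool lock_contended, bool *contended)
{
    struct mini_compact_control cc = { .sync = sync, .contended = false };
    int ret = check_lock(&cc, lock_contended) ? 0 : -1;

    *contended = cc.contended;
    return ret;
}
```

Because the flag is now owned by the control struct, the inner loop in compact_zone() can also test `cc->contended` itself as an abort condition, which is exactly what the `!low_pfn || cc->contended` hunk does.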
* [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages"
  2012-09-12 16:46 ` Richard Davies
@ 2012-09-13 19:47 ` Rik van Riel
  0 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-13 19:47 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Wed, 12 Sep 2012 17:46:15 +0100
Richard Davies <richard@arachsys.com> wrote:

> Mel Gorman wrote:
> > I see that this is an old-ish bug but I did not read the full history.
> > Is it now booting faster than 3.5.0 was? I'm asking because I'm
> > interested to see if commit c67fe375 helped your particular case.
>
> Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
> improved, as discussed.

Re-reading Mel's commit de74f1cc3b1e9730d9b58580cd11361d30cd182d, I believe
it re-introduces the quadratic behaviour that the code was suffering from
before, by not moving zone->compact_cached_free_pfn down when no more free
pfns are found in a page block.

This mail reverts that changeset; the next introduces what I hope to be the
proper fix. Richard, would you be willing to give these patches a try,
since your system seems to reproduce this bug easily?

---8<---
Revert "mm: have order > 0 compaction start near a pageblock with free pages"

This reverts commit de74f1cc3b1e9730d9b58580cd11361d30cd182d.

Mel found a real issue with my "skip ahead" logic in the compaction code,
but unfortunately his approach appears to have re-introduced quadratic
behaviour, in that the value of zone->compact_cached_free_pfn is never
advanced until the compaction run wraps around the start of the zone.

This merely moved the starting point for the quadratic behaviour further
into the zone, but the behaviour has still been observed. It looks like
another fix is required.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Richard Davies <richard@daviesmail.org>

diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..771775d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,20 +431,6 @@ static bool suitable_migration_target(struct page *page)
 }

 /*
- * Returns the start pfn of the last page block in a zone. This is the starting
- * point for full compaction of a zone. Compaction searches for free pages from
- * the end of each zone, while isolate_freepages_block scans forward inside each
- * page block.
- */
-static unsigned long start_free_pfn(struct zone *zone)
-{
-	unsigned long free_pfn;
-	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	free_pfn &= ~(pageblock_nr_pages-1);
-	return free_pfn;
-}
-
-/*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
  */
@@ -483,6 +469,17 @@ static void isolate_freepages(struct zone *zone,
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;

+		/*
+		 * Skip ahead if another thread is compacting in the area
+		 * simultaneously. If we wrapped around, we can only skip
+		 * ahead if zone->compact_cached_free_pfn also wrapped to
+		 * above our starting point.
+		 */
+		if (cc->order > 0 && (!cc->wrapped ||
+				      zone->compact_cached_free_pfn >
+				      cc->start_free_pfn))
+			pfn = min(pfn, zone->compact_cached_free_pfn);
+
 		if (!pfn_valid(pfn))
 			continue;

@@ -533,15 +530,7 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		if (isolated) {
 			high_pfn = max(high_pfn, pfn);
-
-			/*
-			 * If the free scanner has wrapped, update
-			 * compact_cached_free_pfn to point to the highest
-			 * pageblock with free pages. This reduces excessive
-			 * scanning of full pageblocks near the end of the
-			 * zone
-			 */
-			if (cc->order > 0 && cc->wrapped)
+			if (cc->order > 0)
 				zone->compact_cached_free_pfn = high_pfn;
 		}
 	}
@@ -551,11 +540,6 @@ static void isolate_freepages(struct zone *zone,

 	cc->free_pfn = high_pfn;
 	cc->nr_freepages = nr_freepages;
-
-	/* If compact_cached_free_pfn is reset then set it now */
-	if (cc->order > 0 && !cc->wrapped &&
-	    zone->compact_cached_free_pfn == start_free_pfn(zone))
-		zone->compact_cached_free_pfn = high_pfn;
 }

 /*
@@ -642,6 +626,20 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	return ISOLATE_SUCCESS;
 }

+/*
+ * Returns the start pfn of the last page block in a zone. This is the starting
+ * point for full compaction of a zone. Compaction searches for free pages from
+ * the end of each zone, while isolate_freepages_block scans forward inside each
+ * page block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+	unsigned long free_pfn;
+	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	free_pfn &= ~(pageblock_nr_pages-1);
+	return free_pfn;
+}
+
 static int compact_finished(struct zone *zone,
 			    struct compact_control *cc)
 {

^ permalink raw reply related [flat|nested] 101+ messages in thread
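The start_free_pfn() helper that this revert shuffles around simply rounds the top of the zone down to a pageblock boundary, so the free scanner starts on a whole block. A standalone sketch of the same arithmetic, assuming a hypothetical pageblock size of 512 pages (one 2 MiB huge page of 4 KiB pages, as on x86-64; the real kernel value comes from pageblock_order):

```c
#include <assert.h>

/* Assumed pageblock size: must be a power of two for the mask trick below. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Userspace mirror of start_free_pfn(): take the first pfn past the end
 * of the zone and round it down to a pageblock boundary. */
static unsigned long start_free_pfn(unsigned long zone_start_pfn,
                                    unsigned long spanned_pages)
{
    unsigned long free_pfn = zone_start_pfn + spanned_pages;

    /* Clearing the low bits rounds down to a multiple of the block size. */
    free_pfn &= ~(PAGEBLOCK_NR_PAGES - 1);
    return free_pfn;
}
```

The `&= ~(pageblock_nr_pages - 1)` idiom only works because the block size is a power of two; it is the usual branch-free alternative to `free_pfn - (free_pfn % pageblock_nr_pages)`.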
* [PATCH 2/2] make the compaction "skip ahead" logic robust 2012-09-12 16:46 ` Richard Davies (?) @ 2012-09-13 19:48 ` Rik van Riel -1 siblings, 0 replies; 101+ messages in thread From: Rik van Riel @ 2012-09-13 19:48 UTC (permalink / raw) To: Richard Davies Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Make the "skip ahead" logic in compaction resistant to compaction wrapping around to the end of the zone. This can lead to less efficient compaction when one thread has wrapped around to the end of the zone, and another simultaneous compactor has not done so yet. However, it should ensure that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Richard Davies <richard@daviesmail.org> diff --git a/mm/compaction.c b/mm/compaction.c index 771775d..0656759 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page) } /* + * We scan the zone in a circular fashion, starting at + * zone->compact_cached_free_pfn. Be careful not to skip if + * one compacting thread has just wrapped back to the end of the + * zone, but another thread has not. + */ +static bool compaction_may_skip(struct zone *zone, + struct compact_control *cc) +{ + if (!cc->wrapped && zone->compact_free_pfn < cc->start_pfn) + return true; + + if (cc->wrapped && zone_compact_free_pfn > cc->start_pfn) + return true; + + return false; +} + +/* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. */ @@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone, /* * Skip ahead if another thread is compacting in the area - * simultaneously. If we wrapped around, we can only skip - * ahead if zone->compact_cached_free_pfn also wrapped to - * above our starting point. + * simultaneously, and has finished with this page block. 
*/ - if (cc->order > 0 && (!cc->wrapped || - zone->compact_cached_free_pfn > - cc->start_free_pfn)) + if (cc->order > 0 && compaction_may_skip(zone, cc)) pfn = min(pfn, zone->compact_cached_free_pfn); if (!pfn_valid(pfn)) ^ permalink raw reply related [flat|nested] 101+ messages in thread
* [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-13 19:48 ` Rik van Riel (?) @ 2012-09-13 19:54 ` Rik van Riel -1 siblings, 0 replies; 101+ messages in thread From: Rik van Riel @ 2012-09-13 19:54 UTC (permalink / raw) To: Richard Davies Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Argh. And of course I send out the version from _before_ the compile test, instead of the one after! I am not used to caffeine any more and have had way too much tea... ---8<--- Make the "skip ahead" logic in compaction resistant to compaction wrapping around to the end of the zone. This can lead to less efficient compaction when one thread has wrapped around to the end of the zone, and another simultaneous compactor has not done so yet. However, it should ensure that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Richard Davies <richard@daviesmail.org> diff --git a/mm/compaction.c b/mm/compaction.c index 771775d..0656759 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page) } /* + * We scan the zone in a circular fashion, starting at + * zone->compact_cached_free_pfn. Be careful not to skip if + * one compacting thread has just wrapped back to the end of the + * zone, but another thread has not. + */ +static bool compaction_may_skip(struct zone *zone, + struct compact_control *cc) +{ + if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn) + return true; + + if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn) + return true; + + return false; +} + +/* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. */ @@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone, /* * Skip ahead if another thread is compacting in the area - * simultaneously. 
If we wrapped around, we can only skip - * ahead if zone->compact_cached_free_pfn also wrapped to - * above our starting point. + * simultaneously, and has finished with this page block. */ - if (cc->order > 0 && (!cc->wrapped || - zone->compact_cached_free_pfn > - cc->start_free_pfn)) + if (cc->order > 0 && compaction_may_skip(zone, cc)) pfn = min(pfn, zone->compact_cached_free_pfn); if (!pfn_valid(pfn)) ^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-13 19:54 ` Rik van Riel (?) @ 2012-09-15 15:55 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-09-15 15:55 UTC (permalink / raw) To: Rik van Riel Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Hi Rik, Mel and Shaohua, Thank you for your latest patches. I attach my latest perf report for a slow boot with all of these applied. Mel asked for timings of the slow boots. It's very hard to give anything useful here! A normal boot would be a minute or so, and many are like that, but the slowest that I have seen (on 3.5.x) was several hours. Basically, I just test many times until I get one which is noticeably slower than normal, and then run perf record on that one. The latest perf report for a slow boot is below. For the fast boots, most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow one there is a lot of lock contention above that. Thanks, Richard.
# ========
# captured on: Sat Sep 15 15:40:54 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1457256240581
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
# 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--95.07%-- compact_checklock_irqsave | | | |--70.03%-- isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--92.76%-- 0x10100000006 | | | | | --7.24%-- 0x10100000002 | | | --29.97%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.69%-- 0x10100000006 | | | --9.31%-- 0x10100000002 | |--4.53%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--92.22%-- 0x10100000006 | | | --7.78%-- 0x10100000002 --0.40%-- [...] 
13.14% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.38%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--51.86%-- 0x10100000006 | | | |--48.14%-- 0x10100000002 | --0.01%-- [...] | --0.62%-- __alloc_pages_nodemask | |--76.27%-- alloc_pages_vma | handle_pte_fault | | | |--99.57%-- handle_mm_fault | | | | | |--99.65%-- __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--91.77%-- 0x10100000006 | | | | | | | --8.23%-- 0x10100000002 | | --0.35%-- [...] | --0.43%-- [...] 
| --23.73%-- alloc_pages_current | |--99.20%-- pte_alloc_one | | | |--98.68%-- do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--58.61%-- 0x10100000002 | | | | | --41.39%-- 0x10100000006 | | | --1.32%-- __pte_alloc | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000006 | |--0.69%-- __vmalloc_node_range | __vmalloc_node | vzalloc | __kvm_set_memory_region | kvm_set_memory_region | kvm_vm_ioctl_set_memory_region | kvm_vm_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl --0.12%-- [...] 6.31% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.98%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.13%-- 0x10100000006 | | | --8.87%-- 0x10100000002 --0.02%-- [...] 
1.68% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.65%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--88.78%-- 0x10100000006 | | | --11.22%-- 0x10100000002 --0.35%-- [...] 1.24% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.78%-- memcmp_pages | | | |--77.17%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --22.83%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.22%-- [...] 1.09% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.44%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--82.15%-- 0x10100000006 | | | |--17.85%-- 0x10100000002 | --0.00%-- [...] | --0.56%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--75.21%-- 0x10100000006 | --24.79%-- 0x10100000002 1.09% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.74%-- cpu_idle | | | |--76.31%-- start_secondary | | | --23.69%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.26%-- [...] 1.08% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.97%-- native_flush_tlb_others | | | |--99.78%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.22%-- [...] --0.03%-- [...] 0.77% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.36%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.08%-- 0x10100000006 | | | |--9.92%-- 0x10100000002 | --0.00%-- [...] 
| --0.64%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--87.37%-- 0x10100000006 | --12.63%-- 0x10100000002 0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone | |--99.98%-- compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.29%-- 0x10100000006 | | | --8.71%-- 0x10100000002 --0.02%-- [...] 0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--39.71%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.52%-- 0x10100000006 | | | --9.48%-- 0x10100000002 | |--15.63%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.96%-- 0x10100000006 | | | --9.04%-- 0x10100000002 | |--6.55%-- tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--78.78%-- 0x10100000006 | | | --21.22%-- 0x10100000002 | |--4.87%-- free_pcppages_bulk | | | |--51.10%-- free_hot_cold_page | | | | | |--83.60%-- free_hot_cold_page_list | | | | | | | |--62.17%-- release_pages | | | | pagevec_lru_move_fn | | | | __pagevec_lru_add | | | | | | | | | |--99.22%-- __lru_cache_add | | | | | lru_cache_add_lru | | | | | putback_lru_page | | | | | | | | | | | |--99.61%-- migrate_pages | | | | | | compact_zone | | | | | | 
compact_zone_order | | | | | | try_to_compact_pages | | | | | | __alloc_pages_direct_compact | | | | | | __alloc_pages_nodemask | | | | | | alloc_pages_vma | | | | | | do_huge_pmd_anonymous_page | | | | | | handle_mm_fault | | | | | | __get_user_pages | | | | | | get_user_page_nowait | | | | | | hva_to_pfn.isra.17 | | | | | | __gfn_to_pfn | | | | | | gfn_to_pfn_async | | | | | | try_async_pf | | | | | | tdp_page_fault | | | | | | kvm_mmu_page_fault | | | | | | pf_interception | | | | | | handle_exit | | | | | | kvm_arch_vcpu_ioctl_run | | | | | | kvm_vcpu_ioctl | | | | | | do_vfs_ioctl | | | | | | sys_ioctl | | | | | | system_call_fastpath | | | | | | ioctl | | | | | | | | | | | | | |--88.98%-- 0x10100000006 | | | | | | | | | | | | | --11.02%-- 0x10100000002 | | | | | --0.39%-- [...] | | | | | | | | | --0.78%-- lru_add_drain_cpu | | | | lru_add_drain | | | | migrate_prep_local | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | --37.83%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | 
kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--86.38%-- 0x10100000006 | | | | | | | --13.62%-- 0x10100000002 | | | | | |--12.96%-- __free_pages | | | | | | | |--98.43%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--90.49%-- 0x10100000006 | | | | | | | | | --9.51%-- 0x10100000002 | | | | | | | --1.57%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | bdev_try_to_free_page | | | blkdev_releasepage | | | try_to_release_page | | | move_to_new_page | | | migrate_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --3.44%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order 
| | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--88.25%-- 0x10100000006 | | | | | --11.75%-- 0x10100000002 | | | --48.90%-- drain_pages | | | |--88.65%-- drain_local_pages | | | | | |--96.33%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--23.46%-- __remove_mapping | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.81%-- 0x10100000006 | | | | | | | | | --6.19%-- 0x10100000002 | | | | | | | |--19.93%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.65%-- 0x10100000006 | | | | | | | | | --6.35%-- 0x10100000002 | | | | | | | |--14.19%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | 
get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--89.88%-- 0x10100000006 | | | | | | | | | --10.12%-- 0x10100000002 | | | | | | | |--8.57%-- isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.14%-- 0x10100000006 | | | | | | | | | --7.86%-- 0x10100000002 | | | | | | | |--5.05%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.53%-- 0x10100000006 | | | | | | | | | --7.47%-- 0x10100000002 | | | | | | | |--4.49%-- shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | 
hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--94.61%-- 0x10100000006 | | | | | | | | | --5.39%-- 0x10100000002 | | | | | | | |--2.80%-- free_hot_cold_page_list | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--91.24%-- 0x10100000006 | | | | | | | | | --8.76%-- 0x10100000002 | | | | | | | |--1.96%-- buffer_migrate_page | | | | move_to_new_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--63.14%-- 0x10100000006 | | | | | | | | | --36.86%-- 0x10100000002 | | | | | | | |--1.62%-- try_to_free_buffers | | | | 
jbd2_journal_try_to_free_buffers | | | | ext4_releasepage | | | | try_to_release_page | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.49%-- compact_checklock_irqsave | | | | isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.46%-- __mutex_lock_slowpath | | | | mutex_lock | | | | page_lock_anon_vma | | | | page_referenced | | | | shrink_active_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | 
| | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.41%-- native_flush_tlb_others | | | | flush_tlb_page | | | | | | | | | |--67.10%-- ptep_clear_flush | | | | | try_to_unmap_one | | | | | try_to_unmap_anon | | | | | try_to_unmap | | | | | migrate_pages | | | | | compact_zone | | | | | compact_zone_order | | | | | try_to_compact_pages | | | | | __alloc_pages_direct_compact | | | | | __alloc_pages_nodemask ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-15 15:55 ` Richard Davies
  0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-15 15:55 UTC (permalink / raw)
To: Rik van Riel
Cc: kvm, qemu-devel, linux-mm, Mel Gorman, Shaohua Li, Avi Kivity

Hi Rik, Mel and Shaohua,

Thank you for your latest patches. I attach my latest perf report for a slow boot with all of these applied.

Mel asked for timings of the slow boots. It's very hard to give anything useful here! A normal boot would be a minute or so, and many are like that, but the slowest that I have seen (on 3.5.x) was several hours. Basically, I just test many times until I get one which is noticeably slower than normal, and then run perf record on that one.

The latest perf report for a slow boot is below. For the fast boots, most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow one there is a lot of lock contention above that.

Thanks,

Richard.

# ========
# captured on: Sat Sep 15 15:40:54 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1457256240581
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
# 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--95.07%-- compact_checklock_irqsave | | | |--70.03%-- isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--92.76%-- 0x10100000006 | | | | | --7.24%-- 0x10100000002 | | | --29.97%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.69%-- 0x10100000006 | | | --9.31%-- 0x10100000002 | |--4.53%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--92.22%-- 0x10100000006 | | | --7.78%-- 0x10100000002 --0.40%-- [...] 
13.14% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.38%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--51.86%-- 0x10100000006 | | | |--48.14%-- 0x10100000002 | --0.01%-- [...] | --0.62%-- __alloc_pages_nodemask | |--76.27%-- alloc_pages_vma | handle_pte_fault | | | |--99.57%-- handle_mm_fault | | | | | |--99.65%-- __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--91.77%-- 0x10100000006 | | | | | | | --8.23%-- 0x10100000002 | | --0.35%-- [...] | --0.43%-- [...] 
| --23.73%-- alloc_pages_current | |--99.20%-- pte_alloc_one | | | |--98.68%-- do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--58.61%-- 0x10100000002 | | | | | --41.39%-- 0x10100000006 | | | --1.32%-- __pte_alloc | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000006 | |--0.69%-- __vmalloc_node_range | __vmalloc_node | vzalloc | __kvm_set_memory_region | kvm_set_memory_region | kvm_vm_ioctl_set_memory_region | kvm_vm_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl --0.12%-- [...] 6.31% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.98%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.13%-- 0x10100000006 | | | --8.87%-- 0x10100000002 --0.02%-- [...] 
1.68% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.65%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--88.78%-- 0x10100000006 | | | --11.22%-- 0x10100000002 --0.35%-- [...] 1.24% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.78%-- memcmp_pages | | | |--77.17%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --22.83%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.22%-- [...] 1.09% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.44%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--82.15%-- 0x10100000006 | | | |--17.85%-- 0x10100000002 | --0.00%-- [...] | --0.56%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--75.21%-- 0x10100000006 | --24.79%-- 0x10100000002 1.09% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.74%-- cpu_idle | | | |--76.31%-- start_secondary | | | --23.69%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.26%-- [...] 1.08% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.97%-- native_flush_tlb_others | | | |--99.78%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.22%-- [...] --0.03%-- [...] 0.77% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.36%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.08%-- 0x10100000006 | | | |--9.92%-- 0x10100000002 | --0.00%-- [...] 
| --0.64%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--87.37%-- 0x10100000006 | --12.63%-- 0x10100000002 0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone | |--99.98%-- compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.29%-- 0x10100000006 | | | --8.71%-- 0x10100000002 --0.02%-- [...] 0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--39.71%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.52%-- 0x10100000006 | | | --9.48%-- 0x10100000002 | |--15.63%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.96%-- 0x10100000006 | | | --9.04%-- 0x10100000002 | |--6.55%-- tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--78.78%-- 0x10100000006 | | | --21.22%-- 0x10100000002 | |--4.87%-- free_pcppages_bulk | | | |--51.10%-- free_hot_cold_page | | | | | |--83.60%-- free_hot_cold_page_list | | | | | | | |--62.17%-- release_pages | | | | pagevec_lru_move_fn | | | | __pagevec_lru_add | | | | | | | | | |--99.22%-- __lru_cache_add | | | | | lru_cache_add_lru | | | | | putback_lru_page | | | | | | | | | | | |--99.61%-- migrate_pages | | | | | | compact_zone | | | | | | 
compact_zone_order | | | | | | try_to_compact_pages | | | | | | __alloc_pages_direct_compact | | | | | | __alloc_pages_nodemask | | | | | | alloc_pages_vma | | | | | | do_huge_pmd_anonymous_page | | | | | | handle_mm_fault | | | | | | __get_user_pages | | | | | | get_user_page_nowait | | | | | | hva_to_pfn.isra.17 | | | | | | __gfn_to_pfn | | | | | | gfn_to_pfn_async | | | | | | try_async_pf | | | | | | tdp_page_fault | | | | | | kvm_mmu_page_fault | | | | | | pf_interception | | | | | | handle_exit | | | | | | kvm_arch_vcpu_ioctl_run | | | | | | kvm_vcpu_ioctl | | | | | | do_vfs_ioctl | | | | | | sys_ioctl | | | | | | system_call_fastpath | | | | | | ioctl | | | | | | | | | | | | | |--88.98%-- 0x10100000006 | | | | | | | | | | | | | --11.02%-- 0x10100000002 | | | | | --0.39%-- [...] | | | | | | | | | --0.78%-- lru_add_drain_cpu | | | | lru_add_drain | | | | migrate_prep_local | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | --37.83%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | 
kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--86.38%-- 0x10100000006 | | | | | | | --13.62%-- 0x10100000002 | | | | | |--12.96%-- __free_pages | | | | | | | |--98.43%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--90.49%-- 0x10100000006 | | | | | | | | | --9.51%-- 0x10100000002 | | | | | | | --1.57%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | bdev_try_to_free_page | | | blkdev_releasepage | | | try_to_release_page | | | move_to_new_page | | | migrate_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --3.44%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order 
| | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--88.25%-- 0x10100000006 | | | | | --11.75%-- 0x10100000002 | | | --48.90%-- drain_pages | | | |--88.65%-- drain_local_pages | | | | | |--96.33%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--23.46%-- __remove_mapping | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.81%-- 0x10100000006 | | | | | | | | | --6.19%-- 0x10100000002 | | | | | | | |--19.93%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.65%-- 0x10100000006 | | | | | | | | | --6.35%-- 0x10100000002 | | | | | | | |--14.19%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | 
get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--89.88%-- 0x10100000006 | | | | | | | | | --10.12%-- 0x10100000002 | | | | | | | |--8.57%-- isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.14%-- 0x10100000006 | | | | | | | | | --7.86%-- 0x10100000002 | | | | | | | |--5.05%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.53%-- 0x10100000006 | | | | | | | | | --7.47%-- 0x10100000002 | | | | | | | |--4.49%-- shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | 
hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--94.61%-- 0x10100000006 | | | | | | | | | --5.39%-- 0x10100000002 | | | | | | | |--2.80%-- free_hot_cold_page_list | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--91.24%-- 0x10100000006 | | | | | | | | | --8.76%-- 0x10100000002 | | | | | | | |--1.96%-- buffer_migrate_page | | | | move_to_new_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--63.14%-- 0x10100000006 | | | | | | | | | --36.86%-- 0x10100000002 | | | | | | | |--1.62%-- try_to_free_buffers | | | | 
jbd2_journal_try_to_free_buffers | | | | ext4_releasepage | | | | try_to_release_page | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.49%-- compact_checklock_irqsave | | | | isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.46%-- __mutex_lock_slowpath | | | | mutex_lock | | | | page_lock_anon_vma | | | | page_referenced | | | | shrink_active_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | 
| | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.41%-- native_flush_tlb_others | | | | flush_tlb_page | | | | | | | | | |--67.10%-- ptep_clear_flush | | | | | try_to_unmap_one | | | | | try_to_unmap_anon | | | | | try_to_unmap | | | | | migrate_pages | | | | | compact_zone | | | | | compact_zone_order | | | | | try_to_compact_pages | | | | | __alloc_pages_direct_compact | | | | | __alloc_pages_nodemask ^ permalink raw reply [flat|nested] 101+ messages in thread
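For anyone wanting to reproduce this kind of capture, the methodology Richard describes (boot repeatedly, profile the one that is noticeably slower) boils down to the sketch below; the report option shown is illustrative, not taken from the thread:

```shell
# System-wide profiling with call graphs, matching the "perf record -g -a"
# cmdline in the report header above. Run it while the slow guest boots:
#   perf record -g -a          # stop with Ctrl-C once the guest is up
#   perf report --stdio        # text call-graph listing, as quoted above
# Sanity check that perf is available before starting a multi-hour hunt:
pv="$(perf --version 2>/dev/null || echo 'perf not installed')"
echo "$pv"
```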
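Every hot call chain in the report bottoms out in do_huge_pmd_anonymous_page, i.e. transparent hugepage faults driving page compaction. A cheap way to test that hypothesis is to toggle THP at runtime; this is a sketch using the standard sysfs knobs, not commands suggested in the thread:

```shell
# Show the current THP mode; the bracketed value is the active one.
thp="$(cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || echo unavailable)"
echo "THP enabled: $thp"
# To rule THP out (as root), either disable it entirely, or keep THP but
# skip synchronous compaction on fault:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
#   echo never > /sys/kernel/mm/transparent_hugepage/defrag
```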
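ksmd also appears in the profile (memcmp_pages, plus remote TLB flushes under smp_call_function_many). Pausing KSM is another quick experiment; this uses the standard /sys/kernel/mm/ksm interface and is not something proposed in the thread itself:

```shell
# KSM run state: 0 = stopped, 1 = running, 2 = unmerge all and stop.
ksm="$(cat /sys/kernel/mm/ksm/run 2>/dev/null || echo unavailable)"
echo "KSM run state: $ksm"
# Pause ksmd while testing (as root):
#   echo 0 > /sys/kernel/mm/ksm/run
```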
| --0.64%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--87.37%-- 0x10100000006 | --12.63%-- 0x10100000002 0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone | |--99.98%-- compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.29%-- 0x10100000006 | | | --8.71%-- 0x10100000002 --0.02%-- [...] 0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--39.71%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.52%-- 0x10100000006 | | | --9.48%-- 0x10100000002 | |--15.63%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.96%-- 0x10100000006 | | | --9.04%-- 0x10100000002 | |--6.55%-- tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--78.78%-- 0x10100000006 | | | --21.22%-- 0x10100000002 | |--4.87%-- free_pcppages_bulk | | | |--51.10%-- free_hot_cold_page | | | | | |--83.60%-- free_hot_cold_page_list | | | | | | | |--62.17%-- release_pages | | | | pagevec_lru_move_fn | | | | __pagevec_lru_add | | | | | | | | | |--99.22%-- __lru_cache_add | | | | | lru_cache_add_lru | | | | | putback_lru_page | | | | | | | | | | | |--99.61%-- migrate_pages | | | | | | compact_zone | | | | | | 
compact_zone_order | | | | | | try_to_compact_pages | | | | | | __alloc_pages_direct_compact | | | | | | __alloc_pages_nodemask | | | | | | alloc_pages_vma | | | | | | do_huge_pmd_anonymous_page | | | | | | handle_mm_fault | | | | | | __get_user_pages | | | | | | get_user_page_nowait | | | | | | hva_to_pfn.isra.17 | | | | | | __gfn_to_pfn | | | | | | gfn_to_pfn_async | | | | | | try_async_pf | | | | | | tdp_page_fault | | | | | | kvm_mmu_page_fault | | | | | | pf_interception | | | | | | handle_exit | | | | | | kvm_arch_vcpu_ioctl_run | | | | | | kvm_vcpu_ioctl | | | | | | do_vfs_ioctl | | | | | | sys_ioctl | | | | | | system_call_fastpath | | | | | | ioctl | | | | | | | | | | | | | |--88.98%-- 0x10100000006 | | | | | | | | | | | | | --11.02%-- 0x10100000002 | | | | | --0.39%-- [...] | | | | | | | | | --0.78%-- lru_add_drain_cpu | | | | lru_add_drain | | | | migrate_prep_local | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | --37.83%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | 
kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--86.38%-- 0x10100000006 | | | | | | | --13.62%-- 0x10100000002 | | | | | |--12.96%-- __free_pages | | | | | | | |--98.43%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--90.49%-- 0x10100000006 | | | | | | | | | --9.51%-- 0x10100000002 | | | | | | | --1.57%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | bdev_try_to_free_page | | | blkdev_releasepage | | | try_to_release_page | | | move_to_new_page | | | migrate_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --3.44%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order 
| | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--88.25%-- 0x10100000006 | | | | | --11.75%-- 0x10100000002 | | | --48.90%-- drain_pages | | | |--88.65%-- drain_local_pages | | | | | |--96.33%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--23.46%-- __remove_mapping | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.81%-- 0x10100000006 | | | | | | | | | --6.19%-- 0x10100000002 | | | | | | | |--19.93%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.65%-- 0x10100000006 | | | | | | | | | --6.35%-- 0x10100000002 | | | | | | | |--14.19%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | 
get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--89.88%-- 0x10100000006 | | | | | | | | | --10.12%-- 0x10100000002 | | | | | | | |--8.57%-- isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.14%-- 0x10100000006 | | | | | | | | | --7.86%-- 0x10100000002 | | | | | | | |--5.05%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.53%-- 0x10100000006 | | | | | | | | | --7.47%-- 0x10100000002 | | | | | | | |--4.49%-- shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | 
hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--94.61%-- 0x10100000006 | | | | | | | | | --5.39%-- 0x10100000002 | | | | | | | |--2.80%-- free_hot_cold_page_list | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--91.24%-- 0x10100000006 | | | | | | | | | --8.76%-- 0x10100000002 | | | | | | | |--1.96%-- buffer_migrate_page | | | | move_to_new_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--63.14%-- 0x10100000006 | | | | | | | | | --36.86%-- 0x10100000002 | | | | | | | |--1.62%-- try_to_free_buffers | | | | 
jbd2_journal_try_to_free_buffers | | | | ext4_releasepage | | | | try_to_release_page | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.49%-- compact_checklock_irqsave | | | | isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.46%-- __mutex_lock_slowpath | | | | mutex_lock | | | | page_lock_anon_vma | | | | page_referenced | | | | shrink_active_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | 
| | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.41%-- native_flush_tlb_others | | | | flush_tlb_page | | | | | | | | | |--67.10%-- ptep_clear_flush | | | | | try_to_unmap_one | | | | | try_to_unmap_anon | | | | | try_to_unmap | | | | | migrate_pages | | | | | compact_zone | | | | | compact_zone_order | | | | | try_to_compact_pages | | | | | __alloc_pages_direct_compact | | | | | __alloc_pages_nodemask -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 101+ messages in thread
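Taken together, the headline symbols in the report above say most of the CPU time is spent in host memory compaction rather than in the guest. A quick tally of the overhead percentages quoted in this report (a rough illustration only: it counts just the entries whose call chains sit squarely in the compaction path, and treats _raw_spin_lock_irqsave as all-compaction since 95% of its samples are under compact_checklock_irqsave):

```python
# Overhead percentages copied from the perf report above.
samples = {
    "_raw_spin_lock_irqsave": 58.49,   # ~95% under compact_checklock_irqsave
    "clear_page_c": 13.14,             # zeroing newly allocated huge pages
    "isolate_freepages_block": 6.31,   # compaction free-page scanner
    "yield_to": 1.68,                  # PLE-driven vcpu yielding
    "compact_zone": 0.75,
    "_raw_spin_lock": 0.68,
}

# Entries whose call chains are entirely (or almost entirely) compaction.
compaction_related = ("_raw_spin_lock_irqsave", "isolate_freepages_block",
                      "compact_zone")
total = sum(samples[s] for s in compaction_related)
print(f"compaction-related overhead: {total:.2f}%")
```

So roughly two thirds of the profiled samples are compaction and its lock contention, with another 13% being legitimate page zeroing for the THP faults that trigger that compaction in the first place.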
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-15 15:55 ` Richard Davies @ 2012-09-16 19:12 ` Richard Davies -1 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-09-16 19:12 UTC (permalink / raw) To: Rik van Riel Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Richard Davies wrote: > Thank you for your latest patches. I attach my latest perf report for a slow > boot with all of these applied. For avoidance of any doubt, there is the combined diff versus 3.6.0-rc5 which I tested: diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 38b42e7..090405d 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1383,10 +1383,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, qgroup_dirty(fs_info, srcgroup); } - if (!inherit) { - ret = -EINVAL; + if (!inherit) goto unlock; - } i_qgroups = (u64 *)(inherit + 1); for (i = 0; i < inherit->num_qgroups; ++i) { diff --git a/mm/compaction.c b/mm/compaction.c index 7fcd3a5..92bae88 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags, /* async aborts if taking too long or contended */ if (!cc->sync) { - if (cc->contended) - *cc->contended = true; + cc->contended = true; return false; } @@ -296,8 +295,9 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, /* Time to isolate some pages for migration */ cond_resched(); - spin_lock_irqsave(&zone->lru_lock, flags); - locked = true; + locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc); + if (!locked) + return 0; for (; low_pfn < end_pfn; low_pfn++) { struct page *page; @@ -431,17 +431,21 @@ static bool suitable_migration_target(struct page *page) } /* - * Returns the start pfn of the last page block in a zone. This is the starting - * point for full compaction of a zone. 
Compaction searches for free pages from - * the end of each zone, while isolate_freepages_block scans forward inside each - * page block. + * We scan the zone in a circular fashion, starting at + * zone->compact_cached_free_pfn. Be careful not to skip if + * one compacting thread has just wrapped back to the end of the + * zone, but another thread has not. */ -static unsigned long start_free_pfn(struct zone *zone) +static bool compaction_may_skip(struct zone *zone, + struct compact_control *cc) { - unsigned long free_pfn; - free_pfn = zone->zone_start_pfn + zone->spanned_pages; - free_pfn &= ~(pageblock_nr_pages-1); - return free_pfn; + if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn) + return true; + + if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn) + return true; + + return false; } /* @@ -483,6 +487,13 @@ static void isolate_freepages(struct zone *zone, pfn -= pageblock_nr_pages) { unsigned long isolated; + /* + * Skip ahead if another thread is compacting in the area + * simultaneously, and has finished with this page block. + */ + if (cc->order > 0 && compaction_may_skip(zone, cc)) + pfn = min(pfn, zone->compact_cached_free_pfn); + if (!pfn_valid(pfn)) continue; @@ -533,15 +544,7 @@ static void isolate_freepages(struct zone *zone, */ if (isolated) { high_pfn = max(high_pfn, pfn); - - /* - * If the free scanner has wrapped, update - * compact_cached_free_pfn to point to the highest - * pageblock with free pages. 
This reduces excessive - * scanning of full pageblocks near the end of the - * zone - */ - if (cc->order > 0 && cc->wrapped) + if (cc->order > 0) zone->compact_cached_free_pfn = high_pfn; } } @@ -551,11 +554,6 @@ static void isolate_freepages(struct zone *zone, cc->free_pfn = high_pfn; cc->nr_freepages = nr_freepages; - - /* If compact_cached_free_pfn is reset then set it now */ - if (cc->order > 0 && !cc->wrapped && - zone->compact_cached_free_pfn == start_free_pfn(zone)) - zone->compact_cached_free_pfn = high_pfn; } /* @@ -634,7 +632,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, /* Perform the isolation */ low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); - if (!low_pfn) + if (!low_pfn || cc->contended) return ISOLATE_ABORT; cc->migrate_pfn = low_pfn; @@ -642,6 +640,20 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, return ISOLATE_SUCCESS; } +/* + * Returns the start pfn of the last page block in a zone. This is the starting + * point for full compaction of a zone. Compaction searches for free pages from + * the end of each zone, while isolate_freepages_block scans forward inside each + * page block. 
+ */ +static unsigned long start_free_pfn(struct zone *zone) +{ + unsigned long free_pfn; + free_pfn = zone->zone_start_pfn + zone->spanned_pages; + free_pfn &= ~(pageblock_nr_pages-1); + return free_pfn; +} + static int compact_finished(struct zone *zone, struct compact_control *cc) { @@ -787,6 +799,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc) switch (isolate_migratepages(zone, cc)) { case ISOLATE_ABORT: ret = COMPACT_PARTIAL; + putback_lru_pages(&cc->migratepages); + cc->nr_migratepages = 0; goto out; case ISOLATE_NONE: continue; @@ -831,6 +845,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, bool *contended) { + unsigned long ret; struct compact_control cc = { .nr_freepages = 0, .nr_migratepages = 0, @@ -838,12 +853,17 @@ static unsigned long compact_zone_order(struct zone *zone, .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .contended = contended, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - return compact_zone(zone, &cc); + ret = compact_zone(zone, &cc); + + VM_BUG_ON(!list_empty(&cc.freepages)); + VM_BUG_ON(!list_empty(&cc.migratepages)); + + *contended = cc.contended; + return ret; } int sysctl_extfrag_threshold = 500; diff --git a/mm/internal.h b/mm/internal.h index b8c91b3..4bd7c0e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -130,7 +130,7 @@ struct compact_control { int order; /* order a direct compactor needs */ int migratetype; /* MOVABLE, RECLAIMABLE etc */ struct zone *zone; - bool *contended; /* True if a lock was contended */ + bool contended; /* True if a lock was contended */ }; unsigned long -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 101+ messages in thread
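The heart of the "skip ahead" change above is the circular-scan bookkeeping in compaction_may_skip(). As a rough model (plain Python, not the kernel code; PAGEBLOCK_NR_PAGES = 512 is an assumed x86-64 value for 2MB pageblocks of 4KB pages), the two functions in the diff behave like this:

```python
# Sketch of the patch's logic, not the kernel implementation itself.
PAGEBLOCK_NR_PAGES = 512  # assumption: x86-64, 2MB pageblocks of 4KB pages

def start_free_pfn(zone_start_pfn, spanned_pages):
    """Last pageblock-aligned pfn in the zone: where the free scanner
    starts, mirroring start_free_pfn() in the diff above."""
    free_pfn = zone_start_pfn + spanned_pages
    return free_pfn & ~(PAGEBLOCK_NR_PAGES - 1)

def compaction_may_skip(cached_free_pfn, my_start_free_pfn, wrapped):
    """True if another compacting thread has already scanned past us, so
    we may jump ahead to its cached position. The wrapped flag keeps the
    comparison correct across the circular scan boundary."""
    if not wrapped and cached_free_pfn < my_start_free_pfn:
        return True
    if wrapped and cached_free_pfn > my_start_free_pfn:
        return True
    return False
```

A thread that has not wrapped may skip ahead only when the shared cached pfn has fallen below its own starting pfn; once it wraps, the inequality flips, which is exactly the "one compacting thread has just wrapped back ... but another thread has not" case the patch comment warns about.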
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-15 15:55 ` Richard Davies @ 2012-09-17 12:26 ` Mel Gorman -1 siblings, 0 replies; 101+ messages in thread From: Mel Gorman @ 2012-09-17 12:26 UTC (permalink / raw) To: Richard Davies Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm On Sat, Sep 15, 2012 at 04:55:24PM +0100, Richard Davies wrote: > Hi Rik, Mel and Shaohua, > > Thank you for your latest patches. I attach my latest perf report for a slow > boot with all of these applied. > Thanks for testing. > Mel asked for timings of the slow boots. It's very hard to give anything > useful here! A normal boot would be a minute or so, and many are like that, > but the slowest that I have seen (on 3.5.x) was several hours. Basically, I > just test many times until I get one which is noticeably slower than normal > and then run perf record on that one. > Ok. > The latest perf report for a slow boot is below. For the fast boots, most of > the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow > one there is a lot of lock contention above that. > > <SNIP> > 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave > | > --- _raw_spin_lock_irqsave > | > |--95.07%-- compact_checklock_irqsave > | | > | |--70.03%-- isolate_migratepages_range > <SNIP> > | --29.97%-- compaction_alloc > | > |--4.53%-- isolate_migratepages_range > <SNIP> This is going the right direction but usage due to contention is still obviously stupidly high. Compaction features throughout the profile, but I'm staying focused on the lock contention for the moment. Can you try the following patch? So far I'm not having much luck reproducing this locally. 
---8<---
mm: compaction: Only release lru_lock every SWAP_CLUSTER_MAX pages if necessary

Commit b2eef8c0 (mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration) releases the lru_lock every
SWAP_CLUSTER_MAX pages that are scanned, as it was found at the time that
compaction could contend badly with page reclaim. This can lead to a
situation where compaction contends heavily with itself as it releases
and reacquires the LRU lock.

This patch makes two changes to how the migrate scanner acquires the LRU
lock. First, it only releases the LRU lock every SWAP_CLUSTER_MAX pages
if the lock is contended. This reduces the number of times it
unnecessarily disables and re-enables IRQs. The second is that it defers
acquiring the LRU lock for as long as possible. In cases where
transparent hugepages are encountered, the LRU lock will not be acquired
at all.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c | 65 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 39342ee..1874f23 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +55,11 @@ static inline bool migrate_async_suitable(int migratetype)
	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }

+static inline bool should_release_lock(spinlock_t *lock)
+{
+	return need_resched() || spin_is_contended(lock);
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended. Check if the process needs to be scheduled or
@@ -62,7 +67,7 @@ static inline bool migrate_async_suitable(int migratetype)
 static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
				      bool locked, struct compact_control *cc)
 {
-	if (need_resched() || spin_is_contended(lock)) {
+	if (should_release_lock(lock)) {
		if (locked) {
			spin_unlock_irqrestore(lock, *flags);
			locked = false;
@@ -275,7 +280,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
	isolate_mode_t mode = 0;
	struct lruvec *lruvec;
	unsigned long flags;
-	bool locked;
+	bool locked = false;

	/*
	 * Ensure that there are not too many pages isolated from the LRU
@@ -295,24 +300,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,

	/* Time to isolate some pages for migration */
	cond_resched();
-	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
-	if (!locked)
-		return 0;
	for (; low_pfn < end_pfn; low_pfn++) {
		struct page *page;

		/* give a chance to irqs before checking need_resched() */
-		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
-			spin_unlock_irqrestore(&zone->lru_lock, flags);
-			locked = false;
+		if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+			if (should_release_lock(&zone->lru_lock)) {
+				spin_unlock_irqrestore(&zone->lru_lock, flags);
+				locked = false;
+			}
		}

-		/* Check if it is ok to still hold the lock */
-		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
-							locked, cc);
-		if (!locked)
-			break;
-
		/*
		 * migrate_pfn does not necessarily start aligned to a
		 * pageblock. Ensure that pfn_valid is called when moving
@@ -352,21 +350,38 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
		pageblock_nr = low_pfn >> pageblock_order;
		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
-			low_pfn += pageblock_nr_pages;
-			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
-			last_pageblock_nr = pageblock_nr;
-			continue;
+			goto next_pageblock;
		}

+		/* Check may be lockless but that's ok as we recheck later */
		if (!PageLRU(page))
			continue;

		/*
-		 * PageLRU is set, and lru_lock excludes isolation,
-		 * splitting and collapsing (collapsing has already
-		 * happened if PageLRU is set).
+		 * PageLRU is set. lru_lock normally excludes isolation
+		 * splitting and collapsing (collapsing has already happened
+		 * if PageLRU is set) but the lock is not necessarily taken
+		 * here and it is wasteful to take it just to check transhuge.
+		 * Check transhuge without lock and skip if it's either a
+		 * transhuge or hugetlbfs page.
		 */
		if (PageTransHuge(page)) {
+			if (!locked)
+				goto next_pageblock;
			low_pfn += (1 << compound_order(page)) - 1;
			continue;
		}
+
+		/* Check if it is ok to still hold the lock */
+		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
+							locked, cc);
+		if (!locked)
+			break;
+
+		/* Recheck PageLRU and PageTransHuge under lock */
+		if (!PageLRU(page))
+			continue;
+		if (PageTransHuge(page)) {
+			low_pfn += (1 << compound_order(page)) - 1;
+			continue;
+		}
@@ -393,6 +408,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
			++low_pfn;
			break;
		}
+
+		continue;
+
+next_pageblock:
+		low_pfn += pageblock_nr_pages;
+		low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+		last_pageblock_nr = pageblock_nr;
+
	}

	acct_isolated(zone, locked, cc);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href="mailto:dont@kvack.org">email@kvack.org</a>

^ permalink raw reply related	[flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
  2012-09-17 12:26   ` [Qemu-devel] " Mel Gorman
@ 2012-09-18  8:14     ` Richard Davies
  -1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 8:14 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

Hi Mel,

Thanks for your latest patch, I attach a perf report below with this on top
of all previous patches. There is still lock contention, though in a
different place.

Regarding Rik's question:

> > Mel asked for timings of the slow boots. It's very hard to give anything
> > useful here! A normal boot would be a minute or so, and many are like
> > that, but the slowest that I have seen (on 3.5.x) was several hours.
> > Basically, I just test many times until I get one which is noticeably
> > slower than normal and then run perf record on that one.
> >
> > The latest perf report for a slow boot is below. For the fast boots,
> > most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but
> > for this slow one there is a lot of lock contention above that.
>
> How often do you run into slow boots, vs. fast ones?

It is about 1/3rd slow boots, some of which are slower than others. I do
about ten and send you the trace of the worst.

Experimentally, copying large files (the VM image files) immediately before
booting the VM seems to make a slow boot more likely.

Thanks,

Richard.
# ======== # captured on: Mon Sep 17 20:09:33 2012 # os release : 3.6.0-rc5-elastic+ # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 4M of event 'cycles' # Event count (approx.): 1616311320818 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. # 59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--99.30%-- compact_checklock_irqsave | | | |--99.98%-- compaction_alloc | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--84.28%-- 0x10100000006 | | | | | --15.72%-- 0x10100000002 | --0.02%-- [...] 
| |--0.65%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.37%-- 0x10100000006 | | | --16.63%-- 0x10100000002 --0.05%-- [...] 12.27% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--82.90%-- 0x10100000006 | | | --17.10%-- 0x10100000002 --0.01%-- [...] 
7.90% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.19%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--64.93%-- 0x10100000006 | | | --35.07%-- 0x10100000002 | --0.81%-- __alloc_pages_nodemask | |--84.23%-- alloc_pages_vma | handle_pte_fault | | | |--99.62%-- handle_mm_fault | | | | | |--99.74%-- __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--76.24%-- 0x10100000006 | | | | | | | --23.76%-- 0x10100000002 | | --0.26%-- [...] | --0.38%-- [...] 
| --15.77%-- alloc_pages_current pte_alloc_one | |--97.49%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--57.31%-- 0x10100000006 | | | --42.69%-- 0x10100000002 | --2.51%-- __pte_alloc do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--61.90%-- 0x10100000006 | --38.10%-- 0x10100000002 2.66% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.99%-- native_flush_tlb_others | | | |--99.79%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.21%-- [...] --0.01%-- [...] 1.62% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.58%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.42%-- 0x10100000006 | | | --22.58%-- 0x10100000002 --0.42%-- [...] 1.17% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.65%-- memcmp_pages | | | |--78.67%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.33%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.35%-- [...] 
1.16% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.47%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--74.69%-- 0x10100000006 | | | --25.31%-- 0x10100000002 | --0.53%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--72.19%-- 0x10100000006 | --27.81%-- 0x10100000002 1.09% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.73%-- cpu_idle | | | |--84.39%-- start_secondary | | | --15.61%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.27%-- [...] 0.85% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.40%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.92%-- 0x10100000006 | | | --23.08%-- 0x10100000002 | --0.60%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--75.02%-- 0x10100000006 | --24.98%-- 0x10100000002 0.60% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock | --- __srcu_read_lock | |--92.87%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.37%-- 0x10100000006 | | | --23.63%-- 0x10100000002 | |--6.18%-- kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--74.92%-- 0x10100000006 | | | --25.08%-- 0x10100000002 --0.95%-- [...] 
0.60% qemu-kvm [kernel.kallsyms] [k] __rcu_read_unlock | --- __rcu_read_unlock | |--79.70%-- get_pid_task | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--75.95%-- 0x10100000006 | | | --24.05%-- 0x10100000002 | |--11.44%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--75.32%-- 0x10100000006 | | | --24.68%-- 0x10100000002 | |--3.51%-- kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.56%-- 0x10100000006 | | | --23.44%-- 0x10100000002 | |--1.88%-- do_select | core_sys_select | sys_select | system_call_fastpath | __select | 0x0 | |--1.30%-- fget_light | | | |--71.87%-- do_select | | core_sys_select | | sys_select | | system_call_fastpath | | __select | | 0x0 | | | |--15.50%-- sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--50.94%-- 0x10100000002 | | | | | |--17.13%-- 0x2740310 | | | 0x0 | | | | | |--13.07%-- 0x225c310 | | | 0x0 | | | | | |--9.95%-- 0x2792310 | | | 0x0 | | | | | |--3.64%-- 0x75ed8548202c4b83 | | | | | |--1.87%-- 0x8800000 | | | 0x26433c0 | | | | | |--1.79%-- 0x10100000006 | | | | | |--0.95%-- 0x19800000 | | | 0x26953c0 | | | | | --0.67%-- 0x24bc8b4400000098 | | | |--7.32%-- sys_read | | system_call_fastpath | | read | | | | | --100.00%-- pthread_mutex_lock@plt | | | |--4.03%-- sys_write | | system_call_fastpath | | write | | | | | --100.00%-- 0x0 | | | |--0.69%-- sys_pread64 | | system_call_fastpath | | pread64 | | 0x269d260 | | 0x80 | | 0x480050b9e1058b48 | --0.59%-- [...] --2.18%-- [...] 
0.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--50.00%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.93%-- 0x10100000006 | | | --22.07%-- 0x10100000002 | |--11.97%-- free_pcppages_bulk | | | |--67.09%-- free_hot_cold_page | | | | | |--87.14%-- free_hot_cold_page_list | | | | | | | |--62.82%-- shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--77.85%-- 0x10100000006 | | | | | | | | | --22.15%-- 0x10100000002 | | | | | | | --37.18%-- release_pages | | | pagevec_lru_move_fn | | | __pagevec_lru_add | | | | | | | |--99.76%-- __lru_cache_add | | | | lru_cache_add_lru | | | | putback_lru_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--80.37%-- 
0x10100000006 | | | | | | | | | --19.63%-- 0x10100000002 | | | --0.24%-- [...] | | | | | |--10.98%-- __free_pages | | | | | | | |--98.77%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--80.81%-- 0x10100000006 | | | | | | | | | --19.19%-- 0x10100000002 | | | | | | | --1.23%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | ext4_releasepage | | | try_to_release_page | | | shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--57.92%-- 0x10100000006 | | | | | | | --42.08%-- 0x10100000002 | | | | | --1.88%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | 
do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--62.44%-- 0x10100000006 | | | | | --37.56%-- 0x10100000002 | | | --32.91%-- drain_pages | | | |--75.89%-- drain_local_pages | | | | | |--89.98%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--44.57%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--79.27%-- 0x10100000006 | | | | | | | | | --20.73%-- 0x10100000002 | | | | | | | |--16.92%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--86.24%-- 0x10100000006 | | | | | | | | | --13.76%-- 0x10100000002 | | | | | | | |--5.39%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | 
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-18  8:14 ` Richard Davies
  0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 8:14 UTC (permalink / raw)
To: Mel Gorman; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li

Hi Mel,

Thanks for your latest patch; I attach a perf report below with it applied
on top of all previous patches. There is still lock contention, though now
in a different place.

Regarding Rik's question:

> > Mel asked for timings of the slow boots. It's very hard to give anything
> > useful here! A normal boot would be a minute or so, and many are like
> > that, but the slowest that I have seen (on 3.5.x) was several hours.
> > Basically, I just test many times until I get one which is noticeably
> > slower than normal and then run perf record on that one.
> >
> > The latest perf report for a slow boot is below. For the fast boots,
> > most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but
> > for this slow one there is a lot of lock contention above that.
>
> How often do you run into slow boots, vs. fast ones?

About a third of boots are slow, some more so than others. I do about ten
and send you the trace of the worst.

Experimentally, copying large files (the VM image files) immediately before
booting the VM seems to make a slow boot more likely.

Thanks,

Richard.
# ======== # captured on: Mon Sep 17 20:09:33 2012 # os release : 3.6.0-rc5-elastic+ # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 4M of event 'cycles' # Event count (approx.): 1616311320818 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. # 59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--99.30%-- compact_checklock_irqsave | | | |--99.98%-- compaction_alloc | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--84.28%-- 0x10100000006 | | | | | --15.72%-- 0x10100000002 | --0.02%-- [...] 
| |--0.65%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.37%-- 0x10100000006 | | | --16.63%-- 0x10100000002 --0.05%-- [...] 12.27% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--82.90%-- 0x10100000006 | | | --17.10%-- 0x10100000002 --0.01%-- [...] 
7.90% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.19%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--64.93%-- 0x10100000006 | | | --35.07%-- 0x10100000002 | --0.81%-- __alloc_pages_nodemask | |--84.23%-- alloc_pages_vma | handle_pte_fault | | | |--99.62%-- handle_mm_fault | | | | | |--99.74%-- __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--76.24%-- 0x10100000006 | | | | | | | --23.76%-- 0x10100000002 | | --0.26%-- [...] | --0.38%-- [...] 
| --15.77%-- alloc_pages_current pte_alloc_one | |--97.49%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--57.31%-- 0x10100000006 | | | --42.69%-- 0x10100000002 | --2.51%-- __pte_alloc do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--61.90%-- 0x10100000006 | --38.10%-- 0x10100000002 2.66% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.99%-- native_flush_tlb_others | | | |--99.79%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.21%-- [...] --0.01%-- [...] 1.62% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.58%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.42%-- 0x10100000006 | | | --22.58%-- 0x10100000002 --0.42%-- [...] 1.17% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.65%-- memcmp_pages | | | |--78.67%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.33%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.35%-- [...] 
1.16% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.47%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--74.69%-- 0x10100000006 | | | --25.31%-- 0x10100000002 | --0.53%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--72.19%-- 0x10100000006 | --27.81%-- 0x10100000002 1.09% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.73%-- cpu_idle | | | |--84.39%-- start_secondary | | | --15.61%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.27%-- [...] 0.85% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.40%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.92%-- 0x10100000006 | | | --23.08%-- 0x10100000002 | --0.60%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--75.02%-- 0x10100000006 | --24.98%-- 0x10100000002 0.60% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock | --- __srcu_read_lock | |--92.87%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.37%-- 0x10100000006 | | | --23.63%-- 0x10100000002 | |--6.18%-- kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--74.92%-- 0x10100000006 | | | --25.08%-- 0x10100000002 --0.95%-- [...] 
0.60% qemu-kvm [kernel.kallsyms] [k] __rcu_read_unlock | --- __rcu_read_unlock | |--79.70%-- get_pid_task | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--75.95%-- 0x10100000006 | | | --24.05%-- 0x10100000002 | |--11.44%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--75.32%-- 0x10100000006 | | | --24.68%-- 0x10100000002 | |--3.51%-- kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.56%-- 0x10100000006 | | | --23.44%-- 0x10100000002 | |--1.88%-- do_select | core_sys_select | sys_select | system_call_fastpath | __select | 0x0 | |--1.30%-- fget_light | | | |--71.87%-- do_select | | core_sys_select | | sys_select | | system_call_fastpath | | __select | | 0x0 | | | |--15.50%-- sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--50.94%-- 0x10100000002 | | | | | |--17.13%-- 0x2740310 | | | 0x0 | | | | | |--13.07%-- 0x225c310 | | | 0x0 | | | | | |--9.95%-- 0x2792310 | | | 0x0 | | | | | |--3.64%-- 0x75ed8548202c4b83 | | | | | |--1.87%-- 0x8800000 | | | 0x26433c0 | | | | | |--1.79%-- 0x10100000006 | | | | | |--0.95%-- 0x19800000 | | | 0x26953c0 | | | | | --0.67%-- 0x24bc8b4400000098 | | | |--7.32%-- sys_read | | system_call_fastpath | | read | | | | | --100.00%-- pthread_mutex_lock@plt | | | |--4.03%-- sys_write | | system_call_fastpath | | write | | | | | --100.00%-- 0x0 | | | |--0.69%-- sys_pread64 | | system_call_fastpath | | pread64 | | 0x269d260 | | 0x80 | | 0x480050b9e1058b48 | --0.59%-- [...] --2.18%-- [...] 
0.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--50.00%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.93%-- 0x10100000006 | | | --22.07%-- 0x10100000002 | |--11.97%-- free_pcppages_bulk | | | |--67.09%-- free_hot_cold_page | | | | | |--87.14%-- free_hot_cold_page_list | | | | | | | |--62.82%-- shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--77.85%-- 0x10100000006 | | | | | | | | | --22.15%-- 0x10100000002 | | | | | | | --37.18%-- release_pages | | | pagevec_lru_move_fn | | | __pagevec_lru_add | | | | | | | |--99.76%-- __lru_cache_add | | | | lru_cache_add_lru | | | | putback_lru_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--80.37%-- 
0x10100000006 | | | | | | | | | --19.63%-- 0x10100000002 | | | --0.24%-- [...] | | | | | |--10.98%-- __free_pages | | | | | | | |--98.77%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--80.81%-- 0x10100000006 | | | | | | | | | --19.19%-- 0x10100000002 | | | | | | | --1.23%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | ext4_releasepage | | | try_to_release_page | | | shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--57.92%-- 0x10100000006 | | | | | | | --42.08%-- 0x10100000002 | | | | | --1.88%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | 
do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--62.44%-- 0x10100000006 | | | | | --37.56%-- 0x10100000002 | | | --32.91%-- drain_pages | | | |--75.89%-- drain_local_pages | | | | | |--89.98%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--44.57%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--79.27%-- 0x10100000006 | | | | | | | | | --20.73%-- 0x10100000002 | | | | | | | |--16.92%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--86.24%-- 0x10100000006 | | | | | | | | | --13.76%-- 0x10100000002 | | | | | | | |--5.39%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | 
sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--75.62%-- 0x10100000006 | | | | | | | | | --24.38%-- 0x10100000002 | | | | | | | |--3.26%-- buffer_migrate_page | | | | move_to_new_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--85.62%-- 0x10100000006 | | | | | | | | | --14.38%-- 0x10100000002 | | | | | | | |--3.21%-- __remove_mapping | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--78.75%-- 0x10100000006 | | | | | | | | | --21.25%-- 0x10100000002 | | | | | | | |--3.01%-- free_hot_cold_page_list | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | 
| | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--84.48%-- 0x10100000006 | | | | | | | | | --15.52%-- 0x10100000002 | | | | | | | |--2.25%-- try_to_free_buffers | | | | jbd2_journal_try_to_free_buffers | | | | ext4_releasepage | | | | try_to_release_page | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--58.91%-- 0x10100000006 | | | | | | | | | --41.09%-- 0x10100000002 | | | | | | | |--2.07%-- compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--67.59%-- 0x10100000006 | | | | | | | | | --32.41%-- 0x10100000002 | | | | | | | |--1.80%-- native_flush_tlb_others | | | | | | | | | |--75.08%-- 
flush_tlb_page | | | | | | | | | | | |--82.69%-- ptep_clear_flush_young | | | | | | page_referenced_one | | | | | | page_referenced | | | | | | shrink_active_list | | | | | | shrink_lruvec | | | | | | try_to_free_pages | | | | | | __alloc_pages_nodemask | | | | | | alloc_pages_vma | | | | | | do_huge_pmd_anonymous_page | | | | | | handle_mm_fault | | | | | | __get_user_pages | | | | | | get_user_page_nowait | | | | | | hva_to_pfn.isra.17 | | | | | | __gfn_to_pfn | | | | | | gfn_to_pfn_async | | | | | | try_async_pf | | | | | | tdp_page_fault | | | | | | kvm_mmu_page_fault | | | | | | pf_interception | | | | | | handle_exit | | | | | | kvm_arch_vcpu_ioctl_run | | | | | | kvm_vcpu_ioctl | | | | | | do_vfs_ioctl | | | | | | sys_ioctl | | | | | | system_call_fastpath | | | | | | ioctl | | | | | | | | | | | | | |--78.99%-- 0x10100000006 | | | | | | | | | | | | | --21.01%-- 0x10100000002 | | | | | | | | | | | --17.31%-- ptep_clear_flush | | | | | try_to_unmap_one | | | | | try_to_unmap_anon | | | | | try_to_unmap | | | | | migrate_pages | | | | | compact_zone | | | | | compact_zone_order | | | | | try_to_compact_pages | | | | | __alloc_pages_direct_compact | | | | | __alloc_pages_nodemask | | | | | alloc_pages_vma | | | | | do_huge_pmd_anonymous_page | | | | | handle_mm_fault | | | | | __get_user_pages | | | | | get_user_page_nowait | | | | | hva_to_pfn.isra.17 | | | | | __gfn_to_pfn | | | | | gfn_to_pfn_async | | | | | try_async_pf | | | | | tdp_page_fault | | | | | kvm_mmu_page_fault | | | | | pf_interception | | | | | handle_exit | | | | | kvm_arch_vcpu_ioctl_run | | | | | kvm_vcpu_ioctl | | | | | do_vfs_ioctl | | | | | sys_ioctl | | | | | system_call_fastpath | | | | | ioctl | | | | | 0x10100000006 | | | | | | | | | --24.92%-- flush_tlb_mm_range | | | | pmdp_clear_flush_young | | | | page_referenced_one | | | | page_referenced | | | | shrink_active_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | 
do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-18 11:21 ` Mel Gorman
  0 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-18 11:21 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Tue, Sep 18, 2012 at 09:14:55AM +0100, Richard Davies wrote:
> Hi Mel,
>
> Thanks for your latest patch, I attach a perf report below with this on top
> of all previous patches. There is still lock contention, though in a
> different place.
>
> 59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--99.30%-- compact_checklock_irqsave
> | |
> | |--99.98%-- compaction_alloc

Ok, this just means the focus has moved to the zone->lock instead of the
zone->lru_lock. This was expected to some extent. This is an additional
patch that defers acquisition of the zone->lock for as long as possible.

Incidentally, I checked the efficiency of compaction - i.e. how many pages
are scanned versus how many pages are isolated - and the efficiency
completely sucks. It must be addressed, but addressing the lock contention
should happen first.

---8<---
mm: compaction: Acquire the zone->lock as late as possible

The zone lock is required when isolating pages to allocate and for
checking PageBuddy. It is a coarse-grained lock, but the current
implementation acquires it when examining each pageblock, before it is
known whether there are free pages to isolate. This patch defers
acquiring the zone lock for as long as possible. If there are no free
pages in the pageblock, the lock is not acquired at all.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c | 80 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a5d698f..57ff9ef 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,19 +89,14 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
-static inline bool compact_trylock_irqsave(spinlock_t *lock,
-			unsigned long *flags, struct compact_control *cc)
-{
-	return compact_checklock_irqsave(lock, flags, false, cc);
-}
-
 /*
  * Isolate free pages onto a private freelist. Caller must hold zone->lock.
  * If @strict is true, will abort returning 0 on any invalid PFNs or non-free
  * pages inside of the pageblock (even though it may still end up isolating
  * some pages).
  */
-static unsigned long isolate_freepages_block(unsigned long start_pfn,
+static unsigned long isolate_freepages_block(struct compact_control *cc,
+				unsigned long start_pfn,
 				unsigned long end_pfn,
 				struct list_head *freelist,
 				bool strict)
@@ -109,6 +104,8 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
 	int nr_scanned = 0, total_isolated = 0;
 	unsigned long blockpfn = start_pfn;
 	struct page *cursor;
+	unsigned long flags;
+	bool locked = false;
 
 	cursor = pfn_to_page(blockpfn);
 
@@ -117,18 +114,29 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
 		int isolated, i;
 		struct page *page = cursor;
 
-		if (!pfn_valid_within(blockpfn)) {
-			if (strict)
-				return 0;
-			continue;
-		}
+		if (!pfn_valid_within(blockpfn))
+			goto strict_check;
 		nr_scanned++;
 
-		if (!PageBuddy(page)) {
-			if (strict)
-				return 0;
-			continue;
-		}
+		if (!PageBuddy(page))
+			goto strict_check;
+
+		/*
+		 * The zone lock must be held to isolate freepages. This
+		 * unfortunately this is a very coarse lock and can be
+		 * heavily contended if there are parallel allocations
+		 * or parallel compactions. For async compaction do not
+		 * spin on the lock and we acquire the lock as late as
+		 * possible.
+		 */
+		locked = compact_checklock_irqsave(&cc->zone->lock, &flags,
+								locked, cc);
+		if (!locked)
+			break;
+
+		/* Recheck this is a buddy page under lock */
+		if (!PageBuddy(page))
+			goto strict_check;
 
 		/* Found a free page, break it into order-0 pages */
 		isolated = split_free_page(page);
@@ -145,10 +153,24 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
 			blockpfn += isolated - 1;
 			cursor += isolated - 1;
 		}
+
+		continue;
+
+strict_check:
+		/* Abort isolation if the caller requested strict isolation */
+		if (strict) {
+			total_isolated = 0;
+			goto out;
+		}
 	}
 
 	trace_mm_compaction_isolate_freepages(start_pfn, nr_scanned, total_isolated);
+
+out:
+	if (locked)
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+
 	return total_isolated;
 }
 
@@ -168,13 +190,18 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
 unsigned long
 isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
 {
-	unsigned long isolated, pfn, block_end_pfn, flags;
+	unsigned long isolated, pfn, block_end_pfn;
 	struct zone *zone = NULL;
 	LIST_HEAD(freelist);
+	struct compact_control cc;
 
 	if (pfn_valid(start_pfn))
 		zone = page_zone(pfn_to_page(start_pfn));
 
+	/* cc needed for isolate_freepages_block to acquire zone->lock */
+	cc.zone = zone;
+	cc.sync = true;
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
 		if (!pfn_valid(pfn) || zone != page_zone(pfn_to_page(pfn)))
 			break;
@@ -186,10 +213,8 @@ isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
 		block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
 		block_end_pfn = min(block_end_pfn, end_pfn);
 
-		spin_lock_irqsave(&zone->lock, flags);
-		isolated = isolate_freepages_block(pfn, block_end_pfn,
+		isolated = isolate_freepages_block(&cc, pfn, block_end_pfn,
 						   &freelist, true);
-		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
 		 * In strict mode, isolate_freepages_block() returns 0 if
@@ -480,7 +505,6 @@ static void isolate_freepages(struct zone *zone,
 {
 	struct page *page;
 	unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
-	unsigned long flags;
 	int nr_freepages = cc->nr_freepages;
 	struct list_head *freelist = &cc->freepages;
@@ -536,22 +560,12 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		isolated = 0;
 
-		/*
-		 * The zone lock must be held to isolate freepages. This
-		 * unfortunately this is a very coarse lock and can be
-		 * heavily contended if there are parallel allocations
-		 * or parallel compactions. For async compaction do not
-		 * spin on the lock
-		 */
-		if (!compact_trylock_irqsave(&zone->lock, &flags, cc))
-			break;
 		if (suitable_migration_target(page)) {
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
-			isolated = isolate_freepages_block(pfn, end_pfn,
+			isolated = isolate_freepages_block(cc, pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
 		}
-		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
 		 * Record the highest PFN we isolated pages from. When next

^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
  2012-09-18 11:21 ` Mel Gorman
@ 2012-09-18 17:58 ` Richard Davies
  0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 17:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

Mel Gorman wrote:
> Ok, this just means the focus has moved to the zone->lock instead of the
> zone->lru_lock. This was expected to some extent. This is an additional
> patch that defers acquisition of the zone->lock for as long as possible.

And I believe you have now beaten the lock contention - congratulations!

> Incidentally, I checked the efficiency of compaction - i.e. how many
> pages scanned versus how many pages isolated and the efficiency
> completely sucks. It must be addressed but addressing the lock
> contention should happen first.

Yes, compaction is now definitely top.

Interestingly, some boots still seem "slow" and some "fast", even without
any lock contention issues. Here are traces from a few different runs, and
I attach the detailed report for the first of these, which was one of the
slow ones.
# grep -F '[k]' report.1 | head -8
    55.86%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
    14.98%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
     2.18%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
     1.67%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
     1.66%  qemu-kvm  [kernel.kallsyms]  [k] compact_zone
     1.56%  ksmd      [kernel.kallsyms]  [k] memcmp
     1.48%  swapper   [kernel.kallsyms]  [k] default_idle
     1.33%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
#
# grep -F '[k]' report.2 | head -8
    38.28%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
     7.58%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
     7.03%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
     4.72%  qemu-kvm  [kernel.kallsyms]  [k] isolate_migratepages_range
     4.31%  qemu-kvm  [kernel.kallsyms]  [k] copy_page_c
     4.15%  qemu-kvm  [kernel.kallsyms]  [k] compact_zone
     2.68%  qemu-kvm  [kernel.kallsyms]  [k] __zone_watermark_ok
     2.65%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
#
# grep -F '[k]' report.3 | head -8
    75.18%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
     1.82%  swapper   [kernel.kallsyms]  [k] default_idle
     1.29%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
     1.27%  qemu-kvm  [kernel.kallsyms]  [k] get_page_from_freelist
     1.20%  ksmd      [kernel.kallsyms]  [k] memcmp
     0.83%  qemu-kvm  [kernel.kallsyms]  [k] free_pages_prepare
     0.78%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
     0.59%  qemu-kvm  [kernel.kallsyms]  [k] prep_compound_page
#
# grep -F '[k]' report.4 | head -8
    41.02%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
    32.20%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
     1.76%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
     1.37%  swapper   [kernel.kallsyms]  [k] default_idle
     1.35%  ksmd      [kernel.kallsyms]  [k] memcmp
     1.27%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
     1.23%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
     0.88%  qemu-kvm  [kernel.kallsyms]  [k] kvm_vcpu_on_spin
#
# grep -F '[k]' report.5 | head -8
    61.18%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
    14.55%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
     1.75%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
     1.31%  ksmd      [kernel.kallsyms]  [k] memcmp
     1.21%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
     1.20%  swapper   [kernel.kallsyms]  [k] default_idle
     1.14%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
     0.94%  qemu-kvm  [kernel.kallsyms]  [k] kvm_vcpu_on_spin

Here is the detailed report for the first of these:

# ========
# captured on: Tue Sep 18 17:03:40 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1184064513533
#
# Overhead   Command        Shared Object      Symbol
# ........  ........  .................  ..............................................
#
    55.86%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
            |
            --- isolate_freepages_block
               |
               |--99.99%-- compaction_alloc
               |          migrate_pages
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--88.73%-- 0x10100000006
               |          |
               |           --11.27%-- 0x10100000002
                --0.01%-- [...]
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.84%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.15%-- 0x10100000006 | | | --44.85%-- 0x10100000002 --0.16%-- [...] 2.18% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.62%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.34%-- 0x10100000006 | | | --16.66%-- 0x10100000002 --0.38%-- [...] 1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group | --- get_pageblock_flags_group | |--57.67%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--86.10%-- 0x10100000006 | | | --13.90%-- 0x10100000002 | |--38.10%-- suitable_migration_target | compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | 
system_call_fastpath | ioctl | | | |--88.50%-- 0x10100000006 | | | --11.50%-- 0x10100000002 | |--2.23%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--85.85%-- 0x10100000006 | | | --14.15%-- 0x10100000002 | |--0.88%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--87.75%-- 0x10100000006 | | | --12.25%-- 0x10100000002 | |--0.75%-- free_hot_cold_page | | | |--74.93%-- free_hot_cold_page_list | | | | | |--53.13%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--82.85%-- 0x10100000006 | | | | | | | --17.15%-- 0x10100000002 | | | | | --46.87%-- release_pages | | pagevec_lru_move_fn | | __pagevec_lru_add | 
| | | | |--98.13%-- __lru_cache_add | | | lru_cache_add_lru | | | putback_lru_page | | | | | | | |--99.02%-- migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--88.56%-- 0x10100000006 | | | | | | | | | --11.44%-- 0x10100000002 | | | | | | | --0.98%-- putback_lru_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000002 | | | | | --1.87%-- lru_add_drain_cpu | | lru_add_drain | | | | | |--51.26%-- shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | 
| system_call_fastpath | | | ioctl | | | 0x10100000002 | | | | | --48.74%-- migrate_prep_local | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | |--23.04%-- __free_pages | | | | | |--59.57%-- release_freepages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--89.08%-- 0x10100000006 | | | | | | | --10.92%-- 0x10100000002 | | | | | |--30.57%-- do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--60.91%-- 0x10100000006 | | | | | | | --39.09%-- 0x10100000002 | | | | | --9.86%-- __free_slab | | discard_slab | | | | | |--55.43%-- unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | 
free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | ext4_releasepage | | | try_to_release_page | | | shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --44.57%-- __slab_free | | kmem_cache_free | | free_buffer_head | | try_to_free_buffers | | jbd2_journal_try_to_free_buffers | | ext4_releasepage | | try_to_release_page | | shrink_page_list | | shrink_inactive_list | | shrink_lruvec | | try_to_free_pages | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --2.02%-- __put_single_page | put_page | putback_lru_page | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.36%-- 0x10100000006 | | | --16.64%-- 
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust @ 2012-09-18 17:58 ` Richard Davies 0 siblings, 0 replies; 101+ messages in thread From: Richard Davies @ 2012-09-18 17:58 UTC (permalink / raw) To: Mel Gorman; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li Mel Gorman wrote: > Ok, this just means the focus has moved to the zone->lock instead of the > zone->lru_lock. This was expected to some extent. This is an additional > patch that defers acquisition of the zone->lock for as long as possible. And I believe you have now beaten the lock contention - congratulations! > Incidentally, I checked the efficiency of compaction - i.e. how many > pages scanned versus how many pages isolated and the efficiency > completely sucks. It must be addressed but addressing the lock > contention should happen first. Yes, compaction is now definitely top. Interestingly, some boots still seem "slow" and some "fast", even without any lock contention issues. Here are traces from a few different runs, and I attach the detailed report for the first of these which was one of the slow ones. 
# grep -F '[k]' report.1 | head -8 55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block 14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c 2.18% qemu-kvm [kernel.kallsyms] [k] yield_to 1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group 1.66% qemu-kvm [kernel.kallsyms] [k] compact_zone 1.56% ksmd [kernel.kallsyms] [k] memcmp 1.48% swapper [kernel.kallsyms] [k] default_idle 1.33% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run # # grep -F '[k]' report.2 | head -8 38.28% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block 7.58% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group 7.03% qemu-kvm [kernel.kallsyms] [k] clear_page_c 4.72% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range 4.31% qemu-kvm [kernel.kallsyms] [k] copy_page_c 4.15% qemu-kvm [kernel.kallsyms] [k] compact_zone 2.68% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok 2.65% qemu-kvm [kernel.kallsyms] [k] yield_to # # grep -F '[k]' report.3 | head -8 75.18% qemu-kvm [kernel.kallsyms] [k] clear_page_c 1.82% swapper [kernel.kallsyms] [k] default_idle 1.29% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block 1.27% qemu-kvm [kernel.kallsyms] [k] get_page_from_freelist 1.20% ksmd [kernel.kallsyms] [k] memcmp 0.83% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare 0.78% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run 0.59% qemu-kvm [kernel.kallsyms] [k] prep_compound_page # # grep -F '[k]' report.4 | head -8 41.02% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block 32.20% qemu-kvm [kernel.kallsyms] [k] clear_page_c 1.76% qemu-kvm [kernel.kallsyms] [k] yield_to 1.37% swapper [kernel.kallsyms] [k] default_idle 1.35% ksmd [kernel.kallsyms] [k] memcmp 1.27% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run 1.23% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group 0.88% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin # # grep -F '[k]' report.5 | head -8 61.18% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block 14.55% qemu-kvm [kernel.kallsyms] [k] clear_page_c 1.75% 
qemu-kvm [kernel.kallsyms] [k] yield_to 1.31% ksmd [kernel.kallsyms] [k] memcmp 1.21% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run 1.20% swapper [kernel.kallsyms] [k] default_idle 1.14% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group 0.94% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin Here is the detailed report for the first of these: # ======== # captured on: Tue Sep 18 17:03:40 2012 # os release : 3.6.0-rc5-elastic+ # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 3M of event 'cycles' # Event count (approx.): 1184064513533 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. # 55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--88.73%-- 0x10100000006 | | | --11.27%-- 0x10100000002 --0.01%-- [...] 
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.84%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.15%-- 0x10100000006 | | | --44.85%-- 0x10100000002 --0.16%-- [...] 2.18% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.62%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.34%-- 0x10100000006 | | | --16.66%-- 0x10100000002 --0.38%-- [...] 1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group | --- get_pageblock_flags_group | |--57.67%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--86.10%-- 0x10100000006 | | | --13.90%-- 0x10100000002 | |--38.10%-- suitable_migration_target | compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | 
system_call_fastpath | ioctl | | | |--88.50%-- 0x10100000006 | | | --11.50%-- 0x10100000002 | |--2.23%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--85.85%-- 0x10100000006 | | | --14.15%-- 0x10100000002 | |--0.88%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--87.75%-- 0x10100000006 | | | --12.25%-- 0x10100000002 | |--0.75%-- free_hot_cold_page | | | |--74.93%-- free_hot_cold_page_list | | | | | |--53.13%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--82.85%-- 0x10100000006 | | | | | | | --17.15%-- 0x10100000002 | | | | | --46.87%-- release_pages | | pagevec_lru_move_fn | | __pagevec_lru_add | 
| | | | |--98.13%-- __lru_cache_add | | | lru_cache_add_lru | | | putback_lru_page | | | | | | | |--99.02%-- migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--88.56%-- 0x10100000006 | | | | | | | | | --11.44%-- 0x10100000002 | | | | | | | --0.98%-- putback_lru_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000002 | | | | | --1.87%-- lru_add_drain_cpu | | lru_add_drain | | | | | |--51.26%-- shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | 
| system_call_fastpath | | | ioctl | | | 0x10100000002 | | | | | --48.74%-- migrate_prep_local | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | |--23.04%-- __free_pages | | | | | |--59.57%-- release_freepages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--89.08%-- 0x10100000006 | | | | | | | --10.92%-- 0x10100000002 | | | | | |--30.57%-- do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--60.91%-- 0x10100000006 | | | | | | | --39.09%-- 0x10100000002 | | | | | --9.86%-- __free_slab | | discard_slab | | | | | |--55.43%-- unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | 
free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | ext4_releasepage | | | try_to_release_page | | | shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --44.57%-- __slab_free | | kmem_cache_free | | free_buffer_head | | try_to_free_buffers | | jbd2_journal_try_to_free_buffers | | ext4_releasepage | | try_to_release_page | | shrink_page_list | | shrink_inactive_list | | shrink_lruvec | | try_to_free_pages | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --2.02%-- __put_single_page | put_page | putback_lru_page | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.36%-- 0x10100000006 | | | --16.64%-- 
0x10100000002 --0.37%-- [...] 1.66% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone | |--99.99%-- compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--85.25%-- 0x10100000006 | | | --14.75%-- 0x10100000002 --0.01%-- [...] 1.56% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.67%-- memcmp_pages | | | |--77.39%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --22.61%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.33%-- [...] 1.48% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.55%-- cpu_idle | | | |--92.95%-- start_secondary | | | --7.05%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.45%-- [...] 1.33% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.34%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.65%-- 0x10100000006 | | | --22.35%-- 0x10100000002 | --0.66%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--73.97%-- 0x10100000006 | --26.03%-- 0x10100000002 1.08% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.27%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.21%-- 0x10100000006 | | | --16.79%-- 0x10100000002 | --0.73%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--80.89%-- 0x10100000006 | --19.11%-- 0x10100000002 0.79% qemu-kvm qemu-kvm [.] 
0x00000000000ae282 | |--1.27%-- 0x4eec6e | | | |--38.48%-- 0x1491280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.35%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --29.16%-- 0x200c280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.24%-- 0x503457 | 0x0 | |--1.02%-- 0x4eec20 | | | |--46.48%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--28.52%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --24.99%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.00%-- 0x4eec2a | | | |--77.52%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--12.67%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --9.80%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.99%-- 0x4ef092 | |--0.94%-- 0x568f04 | | | |--89.85%-- 0x0 | | | |--7.89%-- 0x10100000002 | | | --2.26%-- 0x10100000006 | |--0.93%-- 0x5afab4 | | | |--40.39%-- 0x309a410 | | 0x0 | | | |--31.80%-- 0x1f11410 | | 0x0 | | | |--20.88%-- 0x1396410 | | 0x0 | | | |--4.58%-- 0x0 | | | | | |--52.36%-- 0x148ea00 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | |--31.49%-- 0x2009a00 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | --16.15%-- 0x3192a00 | | 0x5699c0 | | 0x24448948004b4154 | | | |--1.31%-- 0x1000 | | | --1.03%-- 0x6 | |--0.92%-- 0x4eeba0 | | | |--35.54%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.33%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --32.12%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.91%-- 0x652b11 | |--0.83%-- 0x65a102 | |--0.82%-- 0x40a6a9 | |--0.81%-- 0x530421 | | | |--94.43%-- 0x0 | | | --5.57%-- 0x46b47b | | | |--51.32%-- 0xdffec96000a08169 | | | --48.68%-- 0xdffec90000a08169 | |--0.80%-- 0x569fc4 | | | |--41.34%-- 0x1396410 | | 0x0 | | | |--29.46%-- 0x1f11410 | | 0x0 | | | --29.21%-- 0x309a410 | 0x0 | |--0.73%-- 0x541422 | 0x0 | |--0.70%-- 0x56b990 | | | |--72.77%-- 0x100000008 | | | |--26.00%-- 0xfed00000 | | | | | --100.00%-- 0x0 | | | |--0.73%-- 0x100000004 | --0.50%-- [...] 
| |--0.69%-- 0x525261 | 0x0 | 0x822ee8fff96873e9 | |--0.69%-- 0x6578d7 | | | --100.00%-- 0x0 | |--0.67%-- 0x52fb44 | | | |--75.44%-- 0x0 | | | |--17.16%-- 0x10100000002 | | | --7.41%-- 0x10100000006 | |--0.66%-- 0x568e29 | | | |--50.87%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--33.04%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--13.60%-- 0x1491280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--1.40%-- 0x1000 | | | |--0.65%-- 0x3000 | --0.43%-- [...] | |--0.65%-- 0x5b4cb4 | 0x0 | 0x822ee8fff96873e9 | |--0.62%-- 0x55b9ba | | | |--50.14%-- 0x0 | | | --49.86%-- 0x2000000 | |--0.61%-- 0x4ff496 | |--0.60%-- 0x672601 | 0x1 | |--0.58%-- 0x4eec06 | | | |--75.93%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--15.91%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --8.15%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.58%-- 0x477a32 | 0x0 | |--0.56%-- 0x477b27 | 0x0 | |--0.56%-- 0x540e24 | |--0.56%-- 0x40a4f4 | |--0.55%-- 0x659d12 | 0x0 | |--0.55%-- 0x4eec22 | | | |--44.24%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.08%-- 0x1491280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --23.68%-- 0x3195280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.53%-- 0x564394 | | | |--69.75%-- 0x0 | | | |--23.87%-- 0x10100000002 | | | --6.38%-- 0x10100000006 | |--0.52%-- 0x4eeb52 | |--0.51%-- 0x530094 | |--0.50%-- 0x477a9e | 0x0 --74.90%-- [...] 
0.77% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock | --- __srcu_read_lock | |--91.98%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--81.72%-- 0x10100000006 | | | --18.28%-- 0x10100000002 | |--5.81%-- kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--78.63%-- 0x10100000006 | | | --21.37%-- 0x10100000002 | |--1.06%-- fsnotify | vfs_write | | | |--98.29%-- sys_write | | system_call_fastpath | | write | | | | | --100.00%-- 0x0 | | | --1.71%-- sys_pwrite64 | system_call_fastpath | pwrite64 | | | |--55.68%-- 0x1f12260 | | 0x80 | | 0x480050b9e1058b48 | | | --44.32%-- 0x309b260 | 0x80 | 0x480050b9e1058b48 | |--0.91%-- kvm_mmu_notifier_invalidate_page | __mmu_notifier_invalidate_page | try_to_unmap_one | | | |--98.79%-- try_to_unmap_anon | | try_to_unmap | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | ^ permalink raw reply [flat|nested] 101+ messages in thread
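[Editorial aside: the `grep -F '[k]' report.N | head -8` comparison used earlier in this message can be scripted. The sketch below is not from the thread; it assumes the per-symbol line shape of perf's text output (which can vary between perf versions), and the sample lines are copied from report.1 above.]

```python
import re

# Matches perf report overhead lines of the form:
#   <overhead>%  <command>  <shared object>  [k] <kernel symbol>
KSYM_LINE = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+\S+\s+\[k\]\s+(\S+)")

def top_kernel_symbols(report_text, n=8):
    """Return the n highest-overhead kernel symbols as (percent, command, symbol)."""
    rows = []
    for line in report_text.splitlines():
        m = KSYM_LINE.match(line)
        if m:
            rows.append((float(m.group(1)), m.group(2), m.group(3)))
    rows.sort(reverse=True)
    return rows[:n]

# Sample lines taken from report.1 above.
sample = """\
 55.86%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
 14.98%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
  2.18%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
  1.56%  ksmd      [kernel.kallsyms]  [k] memcmp
"""

for pct, cmd, sym in top_kernel_symbols(sample):
    print(f"{pct:6.2f}%  {cmd:<8}  {sym}")
```

This makes it easy to diff the symbol ranking between a "slow" and a "fast" boot profile instead of eyeballing the grep output.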
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-15 15:55 ` Richard Davies @ 2012-09-17 13:50 ` Rik van Riel -1 siblings, 0 replies; 101+ messages in thread From: Rik van Riel @ 2012-09-17 13:50 UTC (permalink / raw) To: Richard Davies Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm On 09/15/2012 11:55 AM, Richard Davies wrote: > Hi Rik, Mel and Shaohua, > > Thank you for your latest patches. I attach my latest perf report for a slow > boot with all of these applied. > > Mel asked for timings of the slow boots. It's very hard to give anything > useful here! A normal boot would be a minute or so, and many are like that, > but the slowest that I have seen (on 3.5.x) was several hours. Basically, I > just test many times until I get one which is noticeably slower than normal > and then run perf record on that one. > > The latest perf report for a slow boot is below. For the fast boots, most of > the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow > one there is a lot of lock contention above that. How often do you run into slow boots, vs. fast ones? > # Overhead Command Shared Object Symbol > # ........ ............... .................... .............................................. > # > 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave > | > --- _raw_spin_lock_irqsave > | > |--95.07%-- compact_checklock_irqsave > | | > | |--70.03%-- isolate_migratepages_range > | | compact_zone > | | compact_zone_order > | | try_to_compact_pages > | | __alloc_pages_direct_compact > | | __alloc_pages_nodemask Looks like it moved from isolate_freepages_block in your last trace to isolate_migratepages_range? Mel, I wonder if we have any quadratic complexity problems in this part of the code, too? The isolate_freepages_block CPU use can be fixed by simply restarting where the last invocation left off, instead of always starting at the end of the zone. 
Could we need something similar for isolate_migratepages_range? After all, Richard has a 128GB system, and runs 108GB worth of KVM guests on it... -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . ^ permalink raw reply [flat|nested] 101+ messages in thread
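[Editorial aside: Rik's suggestion - have the free page scanner resume from where its previous invocation stopped instead of restarting at the end of the zone every time - can be illustrated with a toy model. This is not kernel code; all names and numbers below are invented. It only shows why restarting from the zone end makes total scanning cost grow quadratically with the number of invocations, while caching the position keeps it linear.]

```python
def isolate_free(zone, want, start):
    """Scan backwards from index `start`, isolating up to `want` free pages.
    Isolated pages are marked used. Returns (pages_scanned, stop_index)."""
    scanned, got, i = 0, 0, start
    while i >= 0 and got < want:
        scanned += 1
        if zone[i]:
            zone[i] = False  # page handed over as a migration target
            got += 1
        i -= 1
    return scanned, i

def total_scanned(cache_position, rounds=20, size=2000, want=4):
    zone = [i % 10 == 0 for i in range(size)]  # every 10th page starts free
    total, pos = 0, size - 1
    for _ in range(rounds):
        # Naive scanner restarts at the zone end; cached scanner resumes.
        start = pos if cache_position else size - 1
        scanned, stop = isolate_free(zone, want, start)
        total += scanned
        pos = stop
    return total

print("restart from zone end each time:", total_scanned(False))
print("resume from cached position:   ", total_scanned(True))
```

In the naive variant, every invocation re-walks the stretch of already-consumed pages near the zone end before reaching anything free, so the r-th call scans about r times as much as the first; the cached variant scans each page at most once across all invocations.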
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-17 13:50 ` [Qemu-devel] " Rik van Riel (?) @ 2012-09-17 14:07 ` Mel Gorman -1 siblings, 0 replies; 101+ messages in thread From: Mel Gorman @ 2012-09-17 14:07 UTC (permalink / raw) To: Rik van Riel Cc: Richard Davies, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm On Mon, Sep 17, 2012 at 09:50:08AM -0400, Rik van Riel wrote: > On 09/15/2012 11:55 AM, Richard Davies wrote: > >Hi Rik, Mel and Shaohua, > > > >Thank you for your latest patches. I attach my latest perf report for a slow > >boot with all of these applied. > > > >Mel asked for timings of the slow boots. It's very hard to give anything > >useful here! A normal boot would be a minute or so, and many are like that, > >but the slowest that I have seen (on 3.5.x) was several hours. Basically, I > >just test many times until I get one which is noticeably slower than normal > >and then run perf record on that one. > > > >The latest perf report for a slow boot is below. For the fast boots, most of > >the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow > >one there is a lot of lock contention above that. > > How often do you run into slow boots, vs. fast ones? > > ># Overhead Command Shared Object Symbol > ># ........ ............... .................... .............................................. > ># > > 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave > > | > > --- _raw_spin_lock_irqsave > > | > > |--95.07%-- compact_checklock_irqsave > > | | > > | |--70.03%-- isolate_migratepages_range > > | | compact_zone > > | | compact_zone_order > > | | try_to_compact_pages > > | | __alloc_pages_direct_compact > > | | __alloc_pages_nodemask > > Looks like it moved from isolate_freepages_block in your last > trace to isolate_migratepages_range? > > Mel, I wonder if we have any quadratic complexity problems > in this part of the code, too? 
> Possibly but right now I'm focusing on the contention even though I recognise that reducing the amount of scanning implicitly reduces the amount of contention. I'm running a test at the moment with an additional patch to record the pageblock being scanned by either the free or migrate page scanner. This should be enough to both calculate the scanning efficiency and how many useless blocks are scanned to determine if your "skip" patches are behaving as expected and from there decide if the migrate scanner needs similar logic. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 101+ messages in thread
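[Editorial aside: the scanning-efficiency figure Mel describes - pages isolated versus pages scanned - is easy to state precisely. A small sketch; the counter values are invented, and in practice they would come from the per-scanner instrumentation patch he mentions.]

```python
def scan_efficiency(pages_scanned, pages_isolated):
    """Fraction of scanned pages that were actually isolated.
    1.0 means every page examined was useful; values near 0 mean the
    scanner is walking long stretches of unusable pages."""
    return pages_isolated / pages_scanned if pages_scanned else 0.0

# Invented example counters for the two compaction scanners.
counters = {
    "migrate scanner": {"scanned": 512_000, "isolated": 16_000},
    "free scanner":    {"scanned": 2_048_000, "isolated": 16_000},
}

for name, c in counters.items():
    print(f"{name}: {scan_efficiency(c['scanned'], c['isolated']):.2%} efficient")
```

Comparing this ratio before and after the "skip" patches, per scanner, directly answers whether the skip logic is avoiding the useless pageblocks.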
* Re: Windows slow boot: contractor wanted 2012-08-16 10:47 ` [Qemu-devel] " Richard Davies @ 2012-08-16 14:10 ` Benoît Canet -1 siblings, 0 replies; 101+ messages in thread From: Benoît Canet @ 2012-08-16 14:10 UTC (permalink / raw) To: Richard Davies; +Cc: qemu-devel, kvm Le Thursday 16 Aug 2012 à 11:47:27 (+0100), Richard Davies a écrit : > Hi, > > We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a > contractor to track down and fix problems we have with large memory Windows > guests booting very slowly - they can take several hours. > > We previously reported these problems in July (copied below) and they are > still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1. > > This is a serious issue for us which is causing significant pain to our > larger Windows VM customers when their servers are offline for many hours > during boot. > > If anyone knowledgeable in the area would be interested in being paid to > work on this, or if you know someone who might be, I would be delighted to > hear from you. > > Cheers, > > Richard. > > > ===== Previous bug report > > http://marc.info/?l=qemu-devel&m=134304194329745 > > > We have been experiencing this problem for a while now too, using qemu-kvm > (currently at 1.1.1). > > Unfortunately, hv_relaxed doesn't seem to fix it. The following command line > produces the issue: > > qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img > > The hardware consists of dual AMD Opteron 6128 processors (16 cores in > total) and 64GB of memory. This command line was tested on kernel 3.1.4. > > I've also tested with -no-hpet. > > What I have seen is much as described: the memory fills out slowly, and top > on the host will show the process using 100% on all allocated CPU cores. The > most extreme case was a machine which took something between 6 and 8 hours > to boot. 
> > This seems to be related to the assigned memory, as described, but also the > number of processor cores (which makes sense if we believe it's a timing > issue?). I have seen slow-booting guests improved by switching down to a > single or even two cores. > > Matthew, I agree that this seems to be linked to the number of VMs running - > in fact, shutting down other VMs on a dedicated test host caused the machine > to start booting at a normal speed (with no reboot required). > > However, the level of contention is never such that this could be explained > by the host simply being overcommitted. > > If it helps anyone, there's an image of the hard drive I've been using to > test at: > > http://46.20.114.253/ > > It's 5G of gzip file containing a fairly standard Windows 2008 trial > installation. Since it's in the trial period, anyone who wants to use it may > have to re-arm the trial: http://support.microsoft.com/kb/948472 > > Please let me know if I can provide any more information, or test anything. For info the image boot pretty fast with qemu-kvm 1.1.1 and a 3.2.0-29 ubuntu kernel on a core i7 with these parameters. Benoît > > Best wishes, > > Owen Tuz > ^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] Windows slow boot: contractor wanted 2012-08-16 10:47 ` [Qemu-devel] " Richard Davies ` (2 preceding siblings ...) (?) @ 2012-08-16 15:53 ` Troy Benjegerdes -1 siblings, 0 replies; 101+ messages in thread From: Troy Benjegerdes @ 2012-08-16 15:53 UTC (permalink / raw) To: Richard Davies; +Cc: qemu-devel, kvm I'd be interested in working on this.. What I'd like to propose is to write an automated regression test harness that will reboot the host hardware, and start booting up guest VMs and report the time-to-boot, as well as relative performance of the running VMs. For best results, I'd need access to the specific hardware you are using. I'd also like to release the test harness back to the community, so I would like some feedback from the mailing list on what kinds of tests should be written that would provide the best information for the KVM developers. What do you want to know, and what is the most usefull data to record to debug this and future performance regressions? On Thu, Aug 16, 2012 at 11:47:27AM +0100, Richard Davies wrote: > Hi, > > We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a > contractor to track down and fix problems we have with large memory Windows > guests booting very slowly - they can take several hours. > > We previously reported these problems in July (copied below) and they are > still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1. > > This is a serious issue for us which is causing significant pain to our > larger Windows VM customers when their servers are offline for many hours > during boot. > > If anyone knowledgeable in the area would be interested in being paid to > work on this, or if you know someone who might be, I would be delighted to > hear from you. > > Cheers, > > Richard. > > > ===== Previous bug report > > http://marc.info/?l=qemu-devel&m=134304194329745 > > > We have been experiencing this problem for a while now too, using qemu-kvm > (currently at 1.1.1). 
> > Unfortunately, hv_relaxed doesn't seem to fix it. The following command line > produces the issue: > > qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img > > The hardware consists of dual AMD Opteron 6128 processors (16 cores in > total) and 64GB of memory. This command line was tested on kernel 3.1.4. > > I've also tested with -no-hpet. > > What I have seen is much as described: the memory fills out slowly, and top > on the host will show the process using 100% on all allocated CPU cores. The > most extreme case was a machine which took something between 6 and 8 hours > to boot. > > This seems to be related to the assigned memory, as described, but also the > number of processor cores (which makes sense if we believe it's a timing > issue?). I have seen slow-booting guests improved by switching down to a > single or even two cores. > > Matthew, I agree that this seems to be linked to the number of VMs running - > in fact, shutting down other VMs on a dedicated test host caused the machine > to start booting at a normal speed (with no reboot required). > > However, the level of contention is never such that this could be explained > by the host simply being overcommitted. > > If it helps anyone, there's an image of the hard drive I've been using to > test at: > > http://46.20.114.253/ > > It's 5G of gzip file containing a fairly standard Windows 2008 trial > installation. Since it's in the trial period, anyone who wants to use it may > have to re-arm the trial: http://support.microsoft.com/kb/948472 > > Please let me know if I can provide any more information, or test anything. > > Best wishes, > > Owen Tuz > ^ permalink raw reply [flat|nested] 101+ messages in thread
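A minimal sketch of the time-to-boot measurement proposed above might look like the following. The qemu command line, guest address, and the "guest is up" probe are illustrative assumptions, not details from the thread; a real harness would likely poll a guest agent or a known service port.

```shell
# Minimal boot-timing sketch for the regression harness proposed above.
# The qemu invocation and readiness probe are illustrative assumptions.
boot_seconds() {
    start=$(date +%s)
    # qemu-kvm -nodefaults -m 4096 -smp 8 -hda test.img -vnc :99 &
    # until nc -z "$1" 3389; do sleep 5; done   # e.g. wait for RDP to answer
    end=$(date +%s)
    echo $(( end - start ))
}
t=$(boot_seconds 192.0.2.10)   # TEST-NET placeholder address
echo "boot took $t seconds"
```

Repeating this across reboots and VM counts would give the distribution of boot times (fast boots of about a minute versus the multi-hour outliers described in the thread) rather than a single anecdotal number.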
* Re: Windows slow boot 2012-08-16 10:47 ` [Qemu-devel] " Richard Davies @ 2012-09-18 15:12 ` Michael Tokarev -1 siblings, 0 replies; 101+ messages in thread From: Michael Tokarev @ 2012-09-18 15:12 UTC (permalink / raw) To: Richard Davies; +Cc: qemu-devel, kvm On 16.08.2012 14:47, Richard Davies wrote: > http://marc.info/?l=qemu-devel&m=134304194329745 > > > We have been experiencing this problem for a while now too, using qemu-kvm > (currently at 1.1.1). > > Unfortunately, hv_relaxed doesn't seem to fix it. The following command line > produces the issue: > > qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img Just one question: did you try explicitly using hugepages? For that, - reserve some amount of hugepages (echo something > /proc/sys/vm/nr_hugepages), - mount hugetlbfs to somewhere, like, /dev/hugetlbfs - use -mem-path=/dev/hugetlbfs qemu option This may also reduce your lock contention. Sure, hugepages have some minus sides too, but I think it is worth to try anyway - for a single VM or for whole lot of VMs (for that you'll have to reserve much more memory after host boot). /mjt ^ permalink raw reply [flat|nested] 101+ messages in thread
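The steps above can be sketched as follows. The guest size, hugepage size, mount point, and qemu flags are illustrative assumptions (note that `-mem-path` takes the path as a separate argument); the reservation must be large enough to cover the guest's `-m` size.

```shell
# Sketch of the hugepage setup described above; sizes and paths are examples.
GUEST_MB=4096
HUGEPAGE_KB=2048     # typical x86-64 hugepage size; check Hugepagesize in /proc/meminfo
NR_PAGES=$(( GUEST_MB * 1024 / HUGEPAGE_KB ))
echo "$NR_PAGES"     # hugepages to reserve for this guest
# Then, as root:
#   echo "$NR_PAGES" > /proc/sys/vm/nr_hugepages
#   mount -t hugetlbfs none /hugepages
#   qemu-kvm -m "$GUEST_MB" -mem-path /hugepages -smp 8 -hda test.img
```

Reserving hugepages early after host boot matters here: once memory is fragmented, the reservation may fail, which is the same fragmentation/compaction behaviour the rest of this thread is chasing.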
end of thread, other threads:[~2012-09-18 17:59 UTC | newest] Thread overview: 101+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page --
2012-08-16 10:47 Windows slow boot: contractor wanted Richard Davies
2012-08-16 11:39 ` Avi Kivity
2012-08-17 12:36 `   Richard Davies
2012-08-17 13:02 `     Robert Vineyard
2012-08-18 14:44 `       Richard Davies
2012-08-19  5:02 `         Brian Jackson
2012-08-20  8:16 `           Richard Davies
2012-08-19  8:40 `       Avi Kivity
2012-08-19  8:51 `         Richard Davies
2012-08-19 14:04 `           Avi Kivity
2012-08-20 13:56 `             Richard Davies
2012-08-21  9:00 `               Avi Kivity
2012-08-21 15:21 `                 Richard Davies
2012-08-21 15:39 `                   Troy Benjegerdes
2012-08-22  9:08 `                   Avi Kivity
2012-08-22 12:40 `                     Richard Davies
2012-08-22 12:44 `                       Avi Kivity
2012-08-22 14:41 `                         Richard Davies
2012-08-22 14:53 `                           Avi Kivity
2012-08-22 15:26 `                             Richard Davies
2012-08-22 17:22 `                               Troy Benjegerdes
2012-08-25 17:51 `                                 Richard Davies
2012-08-22 15:21 `                             Rik van Riel
2012-08-22 15:34 `                               Richard Davies
2012-08-25 17:45 `                                 Richard Davies
2012-08-25 18:11 `                                   Rik van Riel
2012-08-26 10:58 `                                     Richard Davies
2012-09-06  9:20 `                                       Richard Davies
2012-09-12 10:56 `                                         Windows VM slow boot Richard Davies
2012-09-12 12:25 `                                           Mel Gorman
2012-09-12 16:46 `                                             Richard Davies
2012-09-13  9:50 `                                               Mel Gorman
2012-09-13 19:47 `                                                 [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages" Rik van Riel
2012-09-13 19:48 `                                                 [PATCH 2/2] make the compaction "skip ahead" logic robust Rik van Riel
2012-09-13 19:54 `                                                   [PATCH -v2 2/2] make the compaction "skip ahead" logic robust Rik van Riel
2012-09-15 15:55 `                                                     Richard Davies
2012-09-16 19:12 `                                                       Richard Davies
2012-09-17 12:26 `                                                         Mel Gorman
2012-09-18  8:14 `                                                           Richard Davies
2012-09-18 11:21 `                                                             Mel Gorman
2012-09-18 17:58 `                                                               Richard Davies
2012-09-17 13:50 `                                                     Rik van Riel
2012-09-17 14:07 `                                                       Mel Gorman
2012-08-16 14:10 ` Windows slow boot: contractor wanted Benoît Canet
2012-08-16 15:53 ` Troy Benjegerdes
2012-09-18 15:12 ` Windows slow boot Michael Tokarev