From: Minchan Kim <minchan@kernel.org>
To: Jim Schutt <jaschut@sandia.gov>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@csn.ul.ie>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: excessive CPU utilization by isolate_freepages?
Date: Thu, 28 Jun 2012 08:59:47 +0900	[thread overview]
Message-ID: <4FEB9E73.5040709@kernel.org> (raw)
In-Reply-To: <4FEB8237.6030402@sandia.gov>

On 06/28/2012 06:59 AM, Jim Schutt wrote:

> Hi,
> 
> I'm running into trouble with systems going unresponsive,
> and perf suggests it's excessive CPU usage by isolate_freepages().
> I'm currently testing 3.5-rc4, but I think this problem may have
> first shown up in 3.4.  I'm only just learning how to use perf,
> so for now I have results to report only for 3.5-rc4.
> 
> (FWIW I'm running my distro's version of perf; please let me know
> if I need to compile the tools/perf version to match my kernel.)
> 
> The systems in question have 24 SAS drives spread across 3 HBAs,
> running 24 Ceph OSD instances, one per drive.  FWIW these servers
> are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
> Ceph Linux clients doing dd simultaneously to a Ceph file system
> backed by 12 of these servers.
> 
> In the early phase of such a test, when things are running well,
> here's what vmstat reports for the state of one of these servers:
> 
> 2012-06-27 13:56:58.356-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 31 15          0     287216        576   38606628    0    0     2 1158    2   14   1  3  95  0  0
> 27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
> 28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
>  6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
> 22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
> 30 12          0     230844        576   38461648    0    0    14 2298537 216580 166661  12 45  23 20  0
> 29 19          0     237856        576   38422884    0    0     5 2332741 209408 157138  12 42  25 22  0
> 17 11          0     222156        576   38483992    0    0     4 2380495 210312 173121  12 39  28 22  0
> 11 13          0     216152        576   38463872    0    0    44 2362186 215236 176454  12 39  27 22  0
> 12 14          0     223704        576   38546720    0    0     2 2395530 214684 177518  12 39  28 22  0
> 39 17          0     219932        576   38598184    0    0     4 2428231 223284 179095  12 42  24 21  0
> 11 10          0     219956        576   38521048    0    0     4 2323853 207472 166665  12 39  28 21  0
> 23 15          0     216580        576   38451904    0    0     3 2241800 201049 163496  11 37  31 21  0
>  9 13          0     225792        576   38451916    0    0    13 2281900 204869 171814  11 36  30 23  0
> 14 12          0     233820        576   38492728    0    0     4 2293423 207686 173019  11 37  31 21  0
> 11 20          0     213796        576   38533208    0    0     3 2288636 205605 168118  11 37  31 21  0
> 
> 
> The system begins to struggle over the next several
> minutes; here's what vmstat has to say:
> 
> 2012-06-27 13:57:58.831-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 21 16          0     224628        576   38526872    0    0     2 1233    9   19   1  4  95  0  0
> 12 19          0     232060        576   38501020    0    0     4 2366769 221418 159890  12 48  20 19  0
> 124  8          0     218548        576   38379656    0    0    13 2103075 199660 108618  11 65  13 12  0
> 24 10          0     300476        576   38230288    0    0    31 1966282 177472 84572  10 76   7  7  0
> 20 16          0     217584        576   38296700    0    0     9 2062571 195936 128810  10 55  20 15  0
> 53 12          0     235720        576   38247968    0    0    30 2035407 196973 133921  10 52  23 15  0
> 20 16          0     360340        576   38067992    0    0     6 2192179 208692 136784  11 54  19 15  0
> 26 10          0     310800        576   38093884    0    0    43 2138772 207105 118718  11 64  12 13  0
> 24 15          0     261108        576   38030828    0    0    68 2174015 205793 135302  11 56  18 15  0
>  9 17          0     241816        576   37982072    0    0    20 2076145 194971 120285  10 60  16 14  0
> 37 15          0     255972        576   37892868    0    0    14 2225076 205694 126954  11 59  16 13  0
> 27 16          0     243212        576   37872704    0    0     6 2249476 210885 134684  12 60  15 14  0
> 30 10          0     217572        576   37795388    0    0     3 2128688 205027 118319  11 66  12 11  0
> 26 11          0     236420        576   37740164    0    0    23 2109709 205105 133925  10 56  19 14  0
> 45 15          0     330056        576   37619896    0    0    15 1948311 196188 119330  10 62  15 13  0
> 54 15          0     242696        576   37631500    0    0     4 2159530 202595 132588  11 59  16 15  0
> 
> 2012-06-27 13:58:59.569-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 14 15          0     274932        576   37621548    0    0     2 1301   15    0   1  4  95  0  0
> 44 16          0     278748        576   37509516    0    0     6 2097643 196722 112697  10 66  12 11  0
> 88 14          0     228088        576   37412008    0    0     6 2089559 202206 116146  10 66  13 11  0
> 94 11          0     348348        576   37270624    0    0     7 1906390 181488 87333   9 76   9  6  0
> 42 11          0     215996        576   37288556    0    0     6 1782459 184375 95901   9 74   9  7  0
> 45 11          0     323112        576   37146500    0    0    12 1868376 187814 103947   9 71  11  9  0
> 51  7          0     244560        576   37100124    0    0    35 1767496 181383 107259   9 68  12 11  0
> 74 12          0     221584        576   37016420    0    0    18 1884986 183376 93425   9 75   9  7  0
> 45 10          0     275564        576   36985324    0    0    23 1683688 167223 97036   8 75   9  8  0
> 19 10          0     322176        576   36813176    0    0    14 1747378 177594 97218   8 72  12  8  0
> 122  7          0     225256        576   36838084    0    0    26 1730643 177915 92621   8 75   9  8  0
> 243 10          0     223464        576   36765460    0    0    18 1730158 173059 79373   8 80   6  5  0
> 100 10          0     307528        576   36598456    0    0     4 1738567 174077 79585   9 82   6  4  0
> 243  6          0     370064        576   36358576    0    0     9 1586528 174680 85353   8 81   6  5  0
> 267  2          0     322640        576   36254044    0    0    40 1011650 129389 42277   5 93   1  1  0
> 210  4          0     505092        576   35865460    0    0    25 720825 116356 32422   3 96   1  0  0
> 
> 2012-06-27 14:00:03.219-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 75  1          0     566988        576   35664800    0    0     2 1355   21    3   1  4  95  0  0
> 433  1          0     964052        576   35069112    0    0     7 456359 102256 20901   2 98   0  0  0
> 547  3          0     820116        576   34893932    0    0    57 560507 114878 28115   3 96   0  0  0
> 806  2          0     606992        576   34848180    0    0   339 309668 101230 21056   2 98   0  0  0
> 708  1          0     529624        576   34708000    0    0   248 370886 101327 20062   2 97   0  0  0
> 231  5          0     504772        576   34663880    0    0   305 334824 95045 20407   2 97   1  1  0
> 158  6          0    1063088        576   33518536    0    0   531 847435 130696 47140   4 92   1  2  0
> 193  0          0    1449156        576   33035572    0    0   363 371279 94470 18955   2 96   1  1  0
> 266  6          0    1623512        576   32728164    0    0    77 241114 95730 15483   2 98   0  0  0
> 243  8          0    1629504        576   32653080    0    0    81 471018 100223 20920   3 96   0  1  0
> 70 11          0    1342140        576   33084020    0    0   100 925869 139876 56599   6 88   3  3  0
> 211  7          0    1130316        576   33470432    0    0   290 1008984 150699 74320   6 83   6  5  0
> 365  3          0     776736        576   34072772    0    0   182 747167 139436 67135   5 88   4  3  0
> 29  1          0    1528412        576   34110640    0    0    50 612181 137403 77609   4 87   6  3  0
> 266  5          0    1657688        576   34105696    0    0     3 258307 62879 38508   2 93   3  2  0
> 1159  2          0    2002256        576   33775476    0    0    19 88554 42112 14230   1 98   0  0  0
> 
> 
> Right around 14:00 I was able to capture a "perf record -a -g" profile; here's the
> beginning of what "perf report --sort symbol --call-graph fractal,5"
> had to say:
> 
> #
>     64.86%  [k] _raw_spin_lock_irqsave
>             |
>             |--97.94%-- isolate_freepages
>             |          compaction_alloc
>             |          unmap_and_move
>             |          migrate_pages
>             |          compact_zone
>             |          |
>             |          |--99.56%-- try_to_compact_pages
>             |          |          __alloc_pages_direct_compact
>             |          |          __alloc_pages_slowpath
>             |          |          __alloc_pages_nodemask
>             |          |          alloc_pages_vma
>             |          |          do_huge_pmd_anonymous_page
>             |          |          handle_mm_fault
>             |          |          do_page_fault
>             |          |          page_fault
>             |          |          |
>             |          |          |--53.53%-- skb_copy_datagram_iovec
>             |          |          |          tcp_recvmsg
>             |          |          |          inet_recvmsg
>             |          |          |          sock_recvmsg
>             |          |          |          sys_recvfrom
>             |          |          |          system_call_fastpath
>             |          |          |          __recv
>             |          |          |          |
>             |          |          |           --100.00%-- (nil)
>             |          |          |
>             |          |          |--27.80%-- __pthread_create_2_1
>             |          |          |          (nil)
>             |          |          |
>             |          |           --18.67%-- memcpy
>             |          |                     |
>             |          |                     |--57.38%-- 0x50d000005
>             |          |                     |
>             |          |                     |--34.52%-- 0x3b300bf271940a35
>             |          |                     |
>             |          |                      --8.10%-- 0x1500000000000009
>             |           --0.44%-- [...]
>              --2.06%-- [...]
> 
>      6.15%  [k] isolate_freepages_block
>             |
>             |--99.95%-- isolate_freepages
>             |          compaction_alloc
>             |          unmap_and_move
>             |          migrate_pages
>             |          compact_zone
>             |          |
>             |          |--99.54%-- try_to_compact_pages
>             |          |          __alloc_pages_direct_compact
>             |          |          __alloc_pages_slowpath
>             |          |          __alloc_pages_nodemask
>             |          |          alloc_pages_vma
>             |          |          do_huge_pmd_anonymous_page
>             |          |          handle_mm_fault
>             |          |          do_page_fault
>             |          |          page_fault
>             |          |          |
>             |          |          |--54.40%-- skb_copy_datagram_iovec
>             |          |          |          tcp_recvmsg
>             |          |          |          inet_recvmsg
>             |          |          |          sock_recvmsg
>             |          |          |          sys_recvfrom
>             |          |          |          system_call_fastpath
>             |          |          |          __recv
>             |          |          |          |
>             |          |          |           --100.00%-- (nil)
>             |          |          |
>             |          |          |--25.19%-- __pthread_create_2_1
>             |          |          |          (nil)
>             |          |          |
>             |          |           --20.41%-- memcpy
>             |          |                     |
>             |          |                     |--40.24%-- 0x3b300bf271940a35
>             |          |                     |
>             |          |                     |--38.29%-- 0x1500000000000009
>             |          |                     |
>             |          |                      --21.47%-- 0x50d000005
>             |           --0.46%-- [...]
>              --0.05%-- [...]
> 
>      3.96%  [.] ceph_crc32c_le
>             |
>             |--99.99%-- 0xb8057558d0065990
>              --0.01%-- [...]
> 
> A different system in the same test had a slightly different
> call tree, but isolate_freepages() still seems to show up
> prominently:
> 
> #
>     32.32%  [k] _raw_spin_lock_irqsave
>             |
>             |--97.64%-- isolate_freepages
>             |          compaction_alloc
>             |          unmap_and_move
>             |          migrate_pages
>             |          compact_zone
>             |          try_to_compact_pages
>             |          __alloc_pages_direct_compact
>             |          __alloc_pages_slowpath
>             |          __alloc_pages_nodemask
>             |          alloc_pages_vma
>             |          do_huge_pmd_anonymous_page
>             |          handle_mm_fault
>             |          do_page_fault
>             |          page_fault
>             |          |
>             |          |--65.31%-- skb_copy_datagram_iovec
>             |          |          tcp_recvmsg
>             |          |          inet_recvmsg
>             |          |          sock_recvmsg
>             |          |          sys_recvfrom
>             |          |          system_call_fastpath
>             |          |          __recv
>             |          |          |
>             |          |           --100.00%-- (nil)
>             |          |
>             |          |--30.98%-- memcpy
>             |          |          |
>             |          |          |--50.60%-- 0x50d0000
>             |          |          |
>             |          |           --49.40%-- 0x50d000005
>             |           --3.70%-- [...]
>              --2.36%-- [...]
> 
>     17.10%  [k] _raw_spin_lock_irq
>             |
>             |--98.27%-- isolate_migratepages_range
>             |          compact_zone
>             |          try_to_compact_pages
>             |          __alloc_pages_direct_compact
>             |          __alloc_pages_slowpath
>             |          __alloc_pages_nodemask
>             |          alloc_pages_vma
>             |          do_huge_pmd_anonymous_page
>             |          handle_mm_fault
>             |          do_page_fault
>             |          page_fault
>             |          |
>             |          |--99.85%-- __pthread_create_2_1
>             |          |          (nil)
>             |           --0.15%-- [...]
>              --1.73%-- [...]
> 
>      4.59%  [k] mutex_spin_on_owner
>             |
>             --- __mutex_lock_slowpath
>                 mutex_lock
>                |
>                |--50.14%-- page_lock_anon_vma
>                |          |
>                |          |--99.99%-- try_to_unmap_anon
>                |          |          try_to_unmap
>                |          |          __unmap_and_move
>                |          |          unmap_and_move
>                |          |          migrate_pages
>                |          |          compact_zone
>                |          |          try_to_compact_pages
>                |          |          __alloc_pages_direct_compact
>                |          |          __alloc_pages_slowpath
>                |          |          __alloc_pages_nodemask
>                |          |          alloc_pages_vma
>                |          |          do_huge_pmd_anonymous_page
>                |          |          handle_mm_fault
>                |          |          do_page_fault
>                |          |          page_fault
>                |          |          |
>                |          |          |--99.91%-- __pthread_create_2_1
>                |          |          |          (nil)
>                |          |           --0.09%-- [...]
>                |           --0.01%-- [...]
>                |
>                |--49.67%-- rmap_walk
>                |          move_to_new_page
>                |          __unmap_and_move
>                |          unmap_and_move
>                |          migrate_pages
>                |          compact_zone
>                |          try_to_compact_pages
>                |          __alloc_pages_direct_compact
>                |          __alloc_pages_slowpath
>                |          __alloc_pages_nodemask
>                |          alloc_pages_vma
>                |          do_huge_pmd_anonymous_page
>                |          handle_mm_fault
>                |          do_page_fault
>                |          page_fault
>                |          |
>                |          |--99.69%-- __pthread_create_2_1
>                |          |          (nil)
>                |           --0.31%-- [...]
>                 --0.20%-- [...]
> 
>      4.10%  [k] isolate_freepages_block
>             |
>             |--99.95%-- isolate_freepages
>             |          compaction_alloc
>             |          unmap_and_move
>             |          migrate_pages
>             |          compact_zone
>             |          try_to_compact_pages
>             |          __alloc_pages_direct_compact
>             |          __alloc_pages_slowpath
>             |          __alloc_pages_nodemask
>             |          alloc_pages_vma
>             |          do_huge_pmd_anonymous_page
>             |          handle_mm_fault
>             |          do_page_fault
>             |          page_fault
>             |          |
>             |          |--46.97%-- skb_copy_datagram_iovec
>             |          |          tcp_recvmsg
>             |          |          inet_recvmsg
>             |          |          sock_recvmsg
>             |          |          sys_recvfrom
>             |          |          system_call_fastpath
>             |          |          __recv
>             |          |          |
>             |          |           --100.00%-- (nil)
>             |          |
>             |          |--31.79%-- __pthread_create_2_1
>             |          |          (nil)
>             |          |
>             |           --21.24%-- memcpy
>             |                     |
>             |                     |--61.90%-- 0x50d000005
>             |                     |
>             |                      --38.10%-- 0x50d0000
>              --0.05%-- [...]
> 
>      3.65%  [.] ceph_crc32c_le
>             |
>             |--99.86%-- 0xb8057558d0065990
>              --0.14%-- [...]
> 
> 
> 
> I seem to be able to recreate this issue at will, so please
> let me know what I can do to help figure out what is going on.
> 
> Thanks -- Jim



I suspect compaction keeps trying to migrate pages even though we have no free memory.
Could you apply this patch and retest?

https://lkml.org/lkml/2012/6/21/30
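
While retesting, it would also help to confirm whether direct compaction really
is looping without making progress.  A rough way to check (just a sketch; the
exact counter names vary a bit between kernel versions) is to sample the
compaction counters in /proc/vmstat alongside your timestamped vmstat runs:

    # Sample the compaction counters every 5 seconds during the dd test.
    # If compact_stall and compact_fail keep climbing while compact_success
    # barely moves, direct compaction is being entered over and over
    # without accomplishing anything.
    while sleep 5; do
        date '+%F %T'
        grep '^compact' /proc/vmstat
    done | tee compaction-counters.log

And since every call chain in your profiles goes through
do_huge_pmd_anonymous_page, another useful cross-check is a run with THP defrag
disabled (echo never > /sys/kernel/mm/transparent_hugepage/defrag); if the time
spent in isolate_freepages drops away, the cost is coming from compaction done
for huge page faults.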

Thanks.

-- 
Kind regards,
Minchan Kim

Thread overview: 24+ messages

2012-06-27 21:59 excessive CPU utilization by isolate_freepages? Jim Schutt
2012-06-27 22:35 ` Mark Nelson
2012-06-27 23:59 ` Minchan Kim [this message]
2012-06-28  0:28   ` Rik van Riel
2012-06-28  0:52     ` David Rientjes
2012-06-28  0:58       ` Minchan Kim
2012-06-28  1:06         ` David Rientjes
2012-06-28  1:18           ` Minchan Kim
2012-06-28  1:13       ` Rik van Riel
2012-06-28  0:55     ` Minchan Kim
2012-06-28 11:36 ` Mel Gorman
2012-06-28 15:30   ` Jim Schutt
