* [BUG] infinite loop in find_get_pages()
@ 2011-09-13 19:23 Eric Dumazet
2011-09-13 23:53 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Eric Dumazet @ 2011-09-13 19:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
Linus,
It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
expect too much from them.
On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
have a cpu locked in
find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
Problem is: a bisection will be very hard, since a lot of kernels
simply destroy my disk (the PCI MRRS horror stuff).
Messages at console:
INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
11 t=60002 jiffies)
perf top -C 1
Events: 3K cycles
+ 43,08% bash [kernel.kallsyms] [k] __lookup
+ 41,51% bash [kernel.kallsyms] [k] find_get_pages
+ 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
43.08% bash [kernel.kallsyms] [k] __lookup
|
--- __lookup
|
|--97.09%-- radix_tree_gang_lookup_slot
| find_get_pages
| pagevec_lookup
| invalidate_mapping_pages
| drop_pagecache_sb
| iterate_supers
| drop_caches_sysctl_handler
| proc_sys_call_handler.isra.3
| proc_sys_write
| vfs_write
| sys_write
| system_call_fastpath
| __write
|
Steps to reproduce:
In one terminal, kernel builds in a loop (defconfig + hpsa driver)
cd /usr/src/linux
while :
do
make clean
make -j128
done
In another terminal:
while :
do
echo 3 >/proc/sys/vm/drop_caches
sleep 20
done
Before the lock, I can see in another terminal some swapping activity.
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 2 17728 3443924 11520 328020 0 0 256 12076 16250 554 0 6 82 12
1 1 17728 3444776 11584 328072 0 0 100 2868 16223 267 0 6 86 7
1 1 17728 3442200 12100 328348 0 0 868 0 16600 1778 0 7 88 6
1 1 17728 3438032 13036 329048 0 0 1628 0 16862 2480 0 7 87 5
1 1 17728 3546864 13988 220256 0 0 1000 0 16313 931 0 7 87 6
1 1 17728 3544260 16024 220256 0 0 2036 0 16513 1531 0 6 88 6
1 1 17728 3542896 17196 220256 0 0 1160 556 16324 893 0 6 88 6
1 1 17728 3540748 18756 220256 0 0 1560 0 16398 1172 0 6 88 6
1 1 17728 3538692 20168 220256 0 0 1412 0 16544 1088 0 6 88 6
2 0 17728 3536676 21816 220248 0 0 1648 0 16447 1246 0 6 88 6
1 1 17728 3535052 22544 220256 0 0 728 0 16215 605 1 6 87 5
1 1 17728 3533672 23404 220244 0 0 860 4240 16264 705 0 6 88 6
1 1 17728 3532688 24232 220244 0 0 828 0 16272 685 0 6 87 6
1 1 17728 3531552 25080 220244 0 0 848 0 16294 700 0 6 88 6
1 1 17728 3529584 26532 220256 0 0 1452 0 16376 1104 0 6 87 6
1 2 17728 3545232 27848 199176 0 0 1312 52 16392 911 0 7 85 8
1 2 17728 3659060 29576 84420 0 0 1736 40 16570 959 0 7 81 12
38 3 17728 3640652 29984 69976 0 0 688 0 16885 2987 3 8 80 9
5 2 17728 3601716 30208 75628 0 0 4676 4 18080 5727 11 10 66 12
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
103 27 17728 2286372 30376 78952 0 0 3044 8 17772 6803 49 16 34 1
128 1 17728 1337588 30416 79952 0 0 732 4080 16389 4874 91 9 0 0
122 7 17728 730264 30472 81056 0 0 540 1300 16535 5451 91 9 0 0
99 16 17728 996308 30544 83136 0 0 492 452 16951 6629 92 8 0 0
89 23 17728 1150640 30592 88288 0 0 3232 224 17286 7312 91 9 0 0
114 7 17728 1344768 30660 92104 0 0 1668 228 17395 7297 89 11 0 0
99 3 17728 848716 30696 93684 0 0 688 2072 16947 6368 92 8 0 0
112 9 17728 609908 30748 96036 0 0 620 272 17221 7640 90 10 0 0
111 8 17728 480244 30808 98268 0 0 788 320 17227 7391 92 8 0 0
115 7 17728 549564 30852 100552 0 0 656 232 17583 7807 92 9 0 0
107 9 17728 666776 30888 102904 0 0 716 0 17406 7781 91 9 0 0
124 5 17728 685368 30960 105544 0 0 1056 944 17281 7713 90 10 0 0
130 1 17728 538832 31000 108080 0 0 776 0 16943 7347 91 9 0 0
130 0 17728 364476 31032 110252 0 0 676 0 16767 6948 91 9 0 0
129 0 17728 149332 31064 111848 0 0 540 32 16673 6272 92 8 0 0
129 0 17728 274664 31096 114052 0 0 628 0 17207 7694 92 8 0 0
128 3 17728 589736 31160 117420 0 0 816 996 17381 8443 90 10 0 0
126 5 17728 485300 31172 119544 0 0 416 0 17024 7186 91 9 0 0
130 0 17728 349500 31216 122344 0 0 492 0 17046 7358 91 9 0 0
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
130 2 17728 416972 31248 125404 0 0 496 120 17112 7124 91 9 0 0
125 5 17308 188608 29444 106888 0 576 1436 612 40020 9430 91 8 0 0
113 16 17308 218700 29528 110336 32 0 1908 0 17210 7214 92 8 0 0
1 145 20292 15688 26884 108200 40 3020 188 4660 27003 3664 30 7 0 63
1 145 21128 15920 24212 107420 0 836 0 3824 16813 430 1 6 0 93
2 144 22904 16020 20780 106780 0 1776 0 6548 16611 505 1 6 0 93
1 146 23496 15788 17476 106160 32 596 60 3620 16610 308 1 6 0 93
1 147 23924 16216 16028 105852 32 432 32 5012 16477 156 0 6 0 93
1 145 24428 15904 14744 103452 20 504 20 3112 16776 125 1 6 0 93
1 146 25304 16184 14688 97712 0 876 16 3352 16759 447 2 6 0 92
1 147 26984 15908 14588 88348 96 1680 96 6352 17006 235 1 6 0 93
1 146 28724 16112 14152 77132 32 1740 44 3536 16739 375 2 6 0 92
1 151 29900 15896 12072 68484 156 1184 192 2068 16860 576 2 6 0 91
2 152 33724 33908 9536 58616 184 3856 512 6764 16536 492 2 6 3 88
1 142 33276 427352 8964 58988 1096 120 2624 120 16730 1129 6 7 8 79
2 142 33000 421512 8988 60944 1560 0 3512 0 16771 1220 1 6 9 84
2 143 32604 392952 9012 62308 1176 0 2436 0 16690 1173 2 7 10 82
8 134 32400 255348 9044 64696 688 0 2584 0 17105 2181 16 8 14 62
6 136 31796 142068 9092 66024 1060 0 1828 0 17040 2226 37 10 12 41
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 143 31664 15844 9152 67452 580 64 1324 292 16973 2066 37 10 11 43
4 141 31876 56160 9052 67528 48 328 140 1724 16476 696 6 7 0 87
4 141 32420 176260 8896 68808 108 732 760 2280 17449 3081 24 9 0 67
11 134 32540 119868 8484 70568 108 852 1140 1408 17436 3788 45 12 1 43
17 129 32880 57044 8256 73008 0 364 1212 364 17489 4000 59 13 0 28
11 135 33468 107128 7660 73124 200 1044 888 2076 17043 1956 23 9 0 69
1 144 34788 16076 6948 71572 180 1524 276 1908 16787 967 13 7 0 79
1 145 35472 16188 5868 70348 112 768 120 1284 16696 561 1 6 0 93
1 145 36056 16696 5492 68240 16 596 16 3356 16456 202 0 6 0 93
1 143 38200 15952 3168 63968 32 2168 52 6460 16834 423 1 7 0 92
9 131 40128 139084 3064 61060 172 2144 644 2192 17701 2250 19 9 0 72
9 133 40548 110308 3092 60492 468 620 900 1852 17516 1983 35 9 0 55
10 132 40448 79476 3132 61808 1020 0 1480 0 17505 3254 35 10 0 55
12 132 40532 139396 3156 63204 776 260 1272 892 17457 3179 44 11 0 45
11 132 40392 66336 3256 65264 788 0 1536 0 17551 3860 46 11 0 43
1 142 41112 15796 3296 65680 1176 812 1636 2568 17026 1798 28 9 0 63
1 140 41500 15960 3244 64828 92 472 116 4008 16445 443 4 7 0 90
1 140 42252 16740 3232 64356 0 764 0 1500 16403 185 0 6 0 94
1 139 49636 16024 2928 60652 52 7376 52 7376 17507 1236 0 7 0 93
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 140 55780 16444 2548 55948 176 6200 332 6260 17160 592 1 7 0 92
3 145 59800 358088 2404 55468 100 4108 1092 4132 18514 3864 17 8 0 75
3 143 60712 27028 2416 57184 816 964 2392 1288 18089 3476 43 10 0 47
4 141 61296 154136 2516 58024 424 980 1312 980 17298 2489 28 9 0 62
21 122 62544 83120 2528 58372 100 1456 788 1456 17717 2738 64 12 0 24
24 120 62780 53328 2580 62216 16 292 2528 292 17163 4076 85 14 0 1
1 143 65088 16096 2492 61524 152 2708 764 2712 16734 1474 16 8 0 76
3 141 65672 34232 2476 60536 56 672 240 3208 16726 661 4 7 0 89
1 144 65584 16044 2488 60440 808 68 948 1532 17187 1353 10 8 0 82
4 141 70836 17216 2444 58024 64 5272 64 6968 16957 437 0 6 0 93
6 134 73728 31940 2424 56880 436 3092 748 3188 16950 1269 8 7 0 85
2 139 76036 107996 2408 56404 92 2420 476 2784 16869 690 6 7 0 87
6 135 76112 82792 2436 57884 1108 476 1632 724 16999 1711 18 8 0 73
1 139 77184 17872 2444 57860 996 1084 1168 2320 16644 748 11 8 0 81
1 141 91136 15952 2300 51868 100 14088 128 14152 17494 1284 1 7 5 87
1 143 98356 204144 2256 48168 640 7496 1148 7580 17471 1840 6 7 12 74
3 139 97344 174272 2276 48968 2636 0 3216 0 16962 1499 13 8 11 69
9 133 97220 123464 2352 50584 1348 0 2320 500 17100 2255 27 9 8 56
9 134 97092 33672 2396 51780 1292 108 2028 108 16821 1547 27 8 8 57
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
11 134 95068 75744 2448 53444 852 0 1696 0 17318 2630 34 10 2 54
1 143 95104 15972 2504 54544 116 44 696 44 16545 1209 20 8 5 67
^C
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] infinite loop in find_get_pages()
From: Andrew Morton @ 2011-09-13 23:53 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linus Torvalds, linux-kernel, Andrew Morton, Toshiyuki Okajima,
Dave Chinner, Hugh Dickins
On Tue, 13 Sep 2011 21:23:21 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Linus,
>
> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
> expect too much from them.
>
> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
> have a cpu locked in
>
> find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
>
>
> Problem is : A bisection will be very hard, since a lot of kernels
> simply destroy my disk (the PCI MRRS horror stuff).
Yes, that's hard. Quite often my bisection efforts involve moving to a
new bisection point then hand-applying a few patches to make the
thing compile and/or work.
There have only been three commits to radix-tree.c this year, so a bit
of manual searching through those would be practical?
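Concretely, that manual search is a one-liner in a kernel checkout. A sketch (the scratch repository built below is purely hypothetical, standing in for a real kernel tree so the command has something to print; in an actual checkout you would just run the `git log` line from the first comment):

```shell
# In a real kernel checkout, the manual search would be roughly:
#     git log --oneline --since=2011-01-01 -- lib/radix-tree.c
# Below, a throwaway repo stands in for the kernel tree so the
# command is runnable anywhere (its contents are invented).
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/lib"
echo '/* stub */' > "$repo/lib/radix-tree.c"
git -C "$repo" add lib/radix-tree.c
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'radix-tree: demo commit touching the file'
# List only commits that touched this one file:
git -C "$repo" log --oneline -- lib/radix-tree.c
```

With only a handful of commits in the real file, each candidate can then be inspected with `git show <commit>` or reverted individually.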
> Messages at console :
>
> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
> 11 t=60002 jiffies)
>
> perf top -C 1
>
> Events: 3K cycles
> + 43,08% bash [kernel.kallsyms] [k] __lookup
> + 41,51% bash [kernel.kallsyms] [k] find_get_pages
> + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
>
> 43.08% bash [kernel.kallsyms] [k] __lookup
> |
> --- __lookup
> |
> |--97.09%-- radix_tree_gang_lookup_slot
> | find_get_pages
> | pagevec_lookup
> | invalidate_mapping_pages
> | drop_pagecache_sb
> | iterate_supers
> | drop_caches_sysctl_handler
> | proc_sys_call_handler.isra.3
> | proc_sys_write
> | vfs_write
> | sys_write
> | system_call_fastpath
> | __write
> |
>
>
> Steps to reproduce :
>
> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
>
> cd /usr/src/linux
> while :
> do
> make clean
> make -j128
> done
>
>
> In another term :
>
> while :
> do
> echo 3 >/proc/sys/vm/drop_caches
> sleep 20
> done
>
This is a regression? 3.0 is OK?
Also, do you know that the hang is happening at the radix-tree level?
It might be at the filemap.c level or at the superblock level and we
just end up spending most cycles at the lower levels because they're
called so often? The iterate_supers/drop_pagecache_sb code is fairly
recent.
* Re: [BUG] infinite loop in find_get_pages()
From: Eric Dumazet @ 2011-09-14 0:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, linux-kernel, Andrew Morton, Toshiyuki Okajima,
Dave Chinner, Hugh Dickins
On Tuesday 13 September 2011 at 16:53 -0700, Andrew Morton wrote:
> On Tue, 13 Sep 2011 21:23:21 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > Linus,
> >
> > It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
> > expect too much from them.
> >
> > On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
> > have a cpu locked in
> >
> > find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
> >
> >
> > Problem is : A bisection will be very hard, since a lot of kernels
> > simply destroy my disk (the PCI MRRS horror stuff).
>
> Yes, that's hard. Quite often my bisection efforts involve moving to a
> new bisection point then hand-applying a few patches to make the
> thing compile and/or work.
>
> There have only been three commits to radix-tree.c this year, so a bit
> of manual searching through those would be practical?
>
> > Messages at console :
> >
> > INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
> > 11 t=60002 jiffies)
> >
> > perf top -C 1
> >
> > Events: 3K cycles
> > + 43,08% bash [kernel.kallsyms] [k] __lookup
> > + 41,51% bash [kernel.kallsyms] [k] find_get_pages
> > + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
> >
> > 43.08% bash [kernel.kallsyms] [k] __lookup
> > |
> > --- __lookup
> > |
> > |--97.09%-- radix_tree_gang_lookup_slot
> > | find_get_pages
> > | pagevec_lookup
> > | invalidate_mapping_pages
> > | drop_pagecache_sb
> > | iterate_supers
> > | drop_caches_sysctl_handler
> > | proc_sys_call_handler.isra.3
> > | proc_sys_write
> > | vfs_write
> > | sys_write
> > | system_call_fastpath
> > | __write
> > |
> >
> >
> > Steps to reproduce :
> >
> > In one terminal, kernel builds in a loop (defconfig + hpsa driver)
> >
> > cd /usr/src/linux
> > while :
> > do
> > make clean
> > make -j128
> > done
> >
> >
> > In another term :
> >
> > while :
> > do
> > echo 3 >/proc/sys/vm/drop_caches
> > sleep 20
> > done
> >
>
> This is a regression? 3.0 is OK?
>
3.0 seems OK, and the first bisection point seems OK too.
# git bisect log
git bisect start
# bad: [003f6c9df54970d8b19578d195b3e2b398cdbde2] lib/sha1.c: quiet
sparse noise about symbol not declared
git bisect bad 003f6c9df54970d8b19578d195b3e2b398cdbde2
# good: [02f8c6aee8df3cdc935e9bdd4f2d020306035dbe] Linux 3.0
git bisect good 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe
(I let the machine run for an hour or so before concluding it's a good/bad
point)
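A bisect log like the one above can be saved and replayed, which helps when a bad kernel wrecks the machine and the bisection has to resume after a reinstall. A self-contained sketch (the four-commit scratch repo is invented, standing in for the kernel tree):

```shell
# Demonstrate saving and replaying a bisect session.
# A scratch repo with 4 commits stands in for the kernel tree.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
for n in 1 2 3 4; do
    echo "$n" > "$repo/file"
    git -C "$repo" add file
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $n"
done
git -C "$repo" bisect start
git -C "$repo" bisect bad HEAD          # newest commit misbehaves
git -C "$repo" bisect good HEAD~3       # oldest commit is fine
git -C "$repo" bisect log > /tmp/bisect.log   # save progress
git -C "$repo" bisect reset                   # e.g. machine died here
git -C "$repo" bisect replay /tmp/bisect.log  # pick up where we left off
```

`git bisect log` emits the same `git bisect start/good/bad` lines shown in the message above, and `git bisect replay` re-executes them verbatim.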
> Also, do you know that the hang is happening at the radix-tree level?
> It might be at the filemap.c level or at the superblock level and we
> just end up spending most cycles at the lower levels because they're
> called so often? The iterate_supers/drop_pagecache_sb code is fairly
> recent.
>
>
No idea yet, but I'll take a look after a bit of sleep ;)
Thanks !
* Re: [BUG] infinite loop in find_get_pages()
From: Lin Ming @ 2011-09-14 0:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Dumazet, Linus Torvalds, linux-kernel, Andrew Morton,
Toshiyuki Okajima, Dave Chinner, Hugh Dickins, Pawel Sikora,
Justin Piszcz
On Wed, Sep 14, 2011 at 7:53 AM, Andrew Morton <akpm@google.com> wrote:
> On Tue, 13 Sep 2011 21:23:21 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> Linus,
>>
>> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
>> expect too much from them.
>>
>> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
>> have a cpu locked in
>>
>> find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
>>
>>
>> Problem is : A bisection will be very hard, since a lot of kernels
>> simply destroy my disk (the PCI MRRS horror stuff).
>
> Yes, that's hard. Quite often my bisection efforts involve moving to a
> new bisection point then hand-applying a few patches to make the
> thing compile and/or work.
>
> There have only been three commits to radix-tree.c this year, so a bit
> of manual searching through those would be practical?
>
>> Messages at console :
>>
>> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
>> 11 t=60002 jiffies)
>>
>> perf top -C 1
>>
>> Events: 3K cycles
>> + 43,08% bash [kernel.kallsyms] [k] __lookup
>> + 41,51% bash [kernel.kallsyms] [k] find_get_pages
>> + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
>>
>> 43.08% bash [kernel.kallsyms] [k] __lookup
>> |
>> --- __lookup
>> |
>> |--97.09%-- radix_tree_gang_lookup_slot
>> | find_get_pages
>> | pagevec_lookup
>> | invalidate_mapping_pages
>> | drop_pagecache_sb
>> | iterate_supers
>> | drop_caches_sysctl_handler
>> | proc_sys_call_handler.isra.3
>> | proc_sys_write
>> | vfs_write
>> | sys_write
>> | system_call_fastpath
>> | __write
>> |
>>
>>
>> Steps to reproduce :
>>
>> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
>>
>> cd /usr/src/linux
>> while :
>> do
>> make clean
>> make -j128
>> done
>>
>>
>> In another term :
>>
>> while :
>> do
>> echo 3 >/proc/sys/vm/drop_caches
>> sleep 20
>> done
>>
>
> This is a regression? 3.0 is OK?
FYI, other guys have reported similar bugs for 3.0.
kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110
http://marc.info/?l=linux-kernel&m=131342662028153&w=2
[3.0.2-stable] BUG: soft lockup - CPU#13 stuck for 22s! [kswapd2:1092]
http://marc.info/?l=linux-kernel&m=131469584117857&w=2
kernel 3.1-rc4: BUG soft lockup (w/frame pointers enabled)
http://marc.info/?l=linux-kernel&m=131566383719422&w=2
Lin Ming
>
> Also, do you know that the hang is happening at the radix-tree level?
> It might be at the filemap.c level or at the superblock level and we
> just end up spending most cycles at the lower levels because they're
> called so often? The iterate_supers/drop_pagecache_sb code is fairly
> recent.
>
>
* Re: [BUG] infinite loop in find_get_pages()
From: Linus Torvalds @ 2011-09-14 6:48 UTC (permalink / raw)
To: Eric Dumazet, Hugh Dickins, Andrew Morton; +Cc: linux-kernel, Rik van Riel
Re-sending, because apparently none of my emails in the last few days
have actually gone out due to LF problems..
Linus
On Tue, Sep 13, 2011 at 12:48 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Sep 13, 2011 at 12:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
>> expect too much from them.
>
> No, by now, they should be damn well reliable.
>
>> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
>> have a cpu locked in
>>
>> find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
>
> Hmm. There haven't been many changes in this area, so the few changes
> that *do* exist are obviously very suspicious.
>
> In particular, the only real changes to that whole setup are the ones
> by Hugh to make the swap entries use the radix tree. So I'm bringing
> Hugh and Andrew to the discussion (and Rik, since he acked a few of
> those changes).
>
> The fact that some light swapping activity seems to accompany the
> problem just makes me more certain it's Hugh's swap/radix tree work.
>
> We're talking only a handful of patches, so maybe Hugh could create a
> revert patch just to confirm that yes, that's the problem.
>
> Hugh?
>
> Linus
>
> --- quoting the rest of the email for Hugh/Andrew ---
>> Problem is : A bisection will be very hard, since a lot of kernels
>> simply destroy my disk (the PCI MRRS horror stuff).
>>
>> Messages at console :
>>
>> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
>> 11 t=60002 jiffies)
>>
>> perf top -C 1
>>
>> Events: 3K cycles
>> + 43,08% bash [kernel.kallsyms] [k] __lookup
>> + 41,51% bash [kernel.kallsyms] [k] find_get_pages
>> + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
>>
>> 43.08% bash [kernel.kallsyms] [k] __lookup
>> |
>> --- __lookup
>> |
>> |--97.09%-- radix_tree_gang_lookup_slot
>> | find_get_pages
>> | pagevec_lookup
>> | invalidate_mapping_pages
>> | drop_pagecache_sb
>> | iterate_supers
>> | drop_caches_sysctl_handler
>> | proc_sys_call_handler.isra.3
>> | proc_sys_write
>> | vfs_write
>> | sys_write
>> | system_call_fastpath
>> | __write
>> |
>>
>>
>> Steps to reproduce :
>>
>> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
>>
>> cd /usr/src/linux
>> while :
>> do
>> make clean
>> make -j128
>> done
>>
>>
>> In another term :
>>
>> while :
>> do
>> echo 3 >/proc/sys/vm/drop_caches
>> sleep 20
>> done
>>
>>
>> Before the lock, I can see in another terminal some swapping activity.
>>
>> $ vmstat 1
>> [vmstat output snipped; identical to the table in the original message]
* Re: [BUG] infinite loop in find_get_pages()
From: Linus Torvalds @ 2011-09-14 6:48 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel
Again, sorry for the possible duplicates - but it looks like my email
hasn't been going out.
Linus
On Tue, Sep 13, 2011 at 6:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Sep 13, 2011 at 12:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> while :
>> do
>> echo 3 >/proc/sys/vm/drop_caches
>> sleep 20
>> done
>
> Btw, do you actually have problems without this?
>
> The drop_caches thing could potentially result in a livelock, where
> we're dropping stuff as we are reading it in, and the reader just
> never makes progress (because dropping things is always faster than
> reading).
>
> So it may not be a "true lockup", it may just be really *really* slow,
> and wasting tons of CPU.
>
> It is possible that we should look at modifying the drop_caches code
> so that it always makes forward progress..
>
> Linus
>
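The livelock Linus describes above can be illustrated with a toy model (all numbers are invented; this is a sketch of the race, not of the kernel code): a reader needs to consume N pages, but on every k-th step the dropper invalidates the page the reader just faulted in, wasting that step. If the dropper always wins, the reader never makes forward progress even though the CPU stays busy.

```shell
# Toy model of the hypothesized drop_caches livelock.
# read_file NUM_PAGES DROP_EVERY: the reader faults in one page per
# step; on every DROP_EVERY-th step the dropper invalidates that page
# before the reader can consume it, so the step is wasted.
read_file() {
    num_pages=$1 drop_every=$2 max_steps=10000
    pages_done=0 step=0
    while [ "$pages_done" -lt "$num_pages" ] && [ "$step" -lt "$max_steps" ]; do
        step=$((step + 1))
        cached=1                                # page faulted in
        if [ $((step % drop_every)) -eq 0 ]; then
            cached=0                            # dropper got there first
        fi
        if [ "$cached" -eq 1 ]; then
            pages_done=$((pages_done + 1))      # reader made progress
        fi
    done
    if [ "$pages_done" -eq "$num_pages" ]; then
        echo "finished in $step steps"
    else
        echo "starved: no forward progress after $step steps"
    fi
}
read_file 100 3   # dropper slower than the reader: the read completes
read_file 100 1   # dropper always wins: pure CPU burn, no progress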
* Re: [BUG] infinite loop in find_get_pages()
From: Eric Dumazet @ 2011-09-14 6:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-kernel, Rik van Riel
On Tuesday 13 September 2011 at 23:48 -0700, Linus Torvalds wrote:
> Re-sending, because apparently none of my emails in the last few days
> have actually gone out due to LF problems..
>
> Linus
>
> On Tue, Sep 13, 2011 at 12:48 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, Sep 13, 2011 at 12:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
> >> expect too much from them.
> >
> > No, by now, they should be damn well reliable.
> >
> >> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
> >> have a cpu locked in
> >>
> >> find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
> >
> > Hmm. There haven't been many changes in this area, so the few changes
> > that *do* exist are obviously very suspicious.
> >
> > In particular, the only real changes to that whole setup are the ones
> > by Hugh to make the swap entries use the radix tree. So I'm bringing
> > Hugh and Andrew to the discussion (and Rik, since he acked a few of
> > those changes).
> >
> > The fact that some light swapping activity seems to accompany the
> > problem just makes me more certain it's Hugh's swap/radix tree work.
> >
> > We're talking only a handful of patches, so maybe Hugh could create a
> > revert patch just to confirm that yes, that's the problem.
> >
> > Hugh?
> >
> > Linus
> >
> > --- quoting the rest of the email for Hugh/Andrew ---
> >> Problem is : A bisection will be very hard, since a lot of kernels
> >> simply destroy my disk (the PCI MRRS horror stuff).
> >>
> >> Messages at console :
> >>
> >> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
> >> 11 t=60002 jiffies)
> >>
> >> perf top -C 1
> >>
> >> Events: 3K cycles
> >> + 43,08% bash [kernel.kallsyms] [k] __lookup
> >> + 41,51% bash [kernel.kallsyms] [k] find_get_pages
> >> + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
> >>
> >> 43.08% bash [kernel.kallsyms] [k] __lookup
> >> |
> >> --- __lookup
> >> |
> >> |--97.09%-- radix_tree_gang_lookup_slot
> >> | find_get_pages
> >> | pagevec_lookup
> >> | invalidate_mapping_pages
> >> | drop_pagecache_sb
> >> | iterate_supers
> >> | drop_caches_sysctl_handler
> >> | proc_sys_call_handler.isra.3
> >> | proc_sys_write
> >> | vfs_write
> >> | sys_write
> >> | system_call_fastpath
> >> | __write
> >> |
> >>
> >>
> >> Steps to reproduce :
> >>
> >> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
> >>
> >> cd /usr/src/linux
> >> while :
> >> do
> >> make clean
> >> make -j128
> >> done
> >>
> >>
> >> In another term :
> >>
> >> while :
> >> do
> >> echo 3 >/proc/sys/vm/drop_caches
> >> sleep 20
> >> done
> >>
> >>
> >> Before the lockup, I can see some swapping activity in another terminal.
> >>
> >> $ vmstat 1
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >> r b swpd free buff cache si so bi bo in cs us sy id wa
> >> 2 2 17728 3443924 11520 328020 0 0 256 12076 16250 554 0 6 82 12
> >> 1 1 17728 3444776 11584 328072 0 0 100 2868 16223 267 0 6 86 7
> >> 1 1 17728 3442200 12100 328348 0 0 868 0 16600 1778 0 7 88 6
> >> 1 1 17728 3438032 13036 329048 0 0 1628 0 16862 2480 0 7 87 5
> >> 1 1 17728 3546864 13988 220256 0 0 1000 0 16313 931 0 7 87 6
> >> 1 1 17728 3544260 16024 220256 0 0 2036 0 16513 1531 0 6 88 6
> >> 1 1 17728 3542896 17196 220256 0 0 1160 556 16324 893 0 6 88 6
> >> 1 1 17728 3540748 18756 220256 0 0 1560 0 16398 1172 0 6 88 6
> >> 1 1 17728 3538692 20168 220256 0 0 1412 0 16544 1088 0 6 88 6
> >> 2 0 17728 3536676 21816 220248 0 0 1648 0 16447 1246 0 6 88 6
> >> 1 1 17728 3535052 22544 220256 0 0 728 0 16215 605 1 6 87 5
> >> 1 1 17728 3533672 23404 220244 0 0 860 4240 16264 705 0 6 88 6
> >> 1 1 17728 3532688 24232 220244 0 0 828 0 16272 685 0 6 87 6
> >> 1 1 17728 3531552 25080 220244 0 0 848 0 16294 700 0 6 88 6
> >> 1 1 17728 3529584 26532 220256 0 0 1452 0 16376 1104 0 6 87 6
> >> 1 2 17728 3545232 27848 199176 0 0 1312 52 16392 911 0 7 85 8
> >> 1 2 17728 3659060 29576 84420 0 0 1736 40 16570 959 0 7 81 12
> >> 38 3 17728 3640652 29984 69976 0 0 688 0 16885 2987 3 8 80 9
> >> 5 2 17728 3601716 30208 75628 0 0 4676 4 18080 5727 11 10 66 12
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >> r b swpd free buff cache si so bi bo in cs us sy id wa
> >> 103 27 17728 2286372 30376 78952 0 0 3044 8 17772 6803 49 16 34 1
> >> 128 1 17728 1337588 30416 79952 0 0 732 4080 16389 4874 91 9 0 0
> >> 122 7 17728 730264 30472 81056 0 0 540 1300 16535 5451 91 9 0 0
> >> 99 16 17728 996308 30544 83136 0 0 492 452 16951 6629 92 8 0 0
> >> 89 23 17728 1150640 30592 88288 0 0 3232 224 17286 7312 91 9 0 0
> >> 114 7 17728 1344768 30660 92104 0 0 1668 228 17395 7297 89 11 0 0
> >> 99 3 17728 848716 30696 93684 0 0 688 2072 16947 6368 92 8 0 0
> >> 112 9 17728 609908 30748 96036 0 0 620 272 17221 7640 90 10 0 0
> >> 111 8 17728 480244 30808 98268 0 0 788 320 17227 7391 92 8 0 0
> >> 115 7 17728 549564 30852 100552 0 0 656 232 17583 7807 92 9 0 0
> >> 107 9 17728 666776 30888 102904 0 0 716 0 17406 7781 91 9 0 0
> >> 124 5 17728 685368 30960 105544 0 0 1056 944 17281 7713 90 10 0 0
> >> 130 1 17728 538832 31000 108080 0 0 776 0 16943 7347 91 9 0 0
> >> 130 0 17728 364476 31032 110252 0 0 676 0 16767 6948 91 9 0 0
> >> 129 0 17728 149332 31064 111848 0 0 540 32 16673 6272 92 8 0 0
> >> 129 0 17728 274664 31096 114052 0 0 628 0 17207 7694 92 8 0 0
> >> 128 3 17728 589736 31160 117420 0 0 816 996 17381 8443 90 10 0 0
> >> 126 5 17728 485300 31172 119544 0 0 416 0 17024 7186 91 9 0 0
> >> 130 0 17728 349500 31216 122344 0 0 492 0 17046 7358 91 9 0 0
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >> r b swpd free buff cache si so bi bo in cs us sy id wa
> >> 130 2 17728 416972 31248 125404 0 0 496 120 17112 7124 91 9 0 0
> >> 125 5 17308 188608 29444 106888 0 576 1436 612 40020 9430 91 8 0 0
> >> 113 16 17308 218700 29528 110336 32 0 1908 0 17210 7214 92 8 0 0
> >> 1 145 20292 15688 26884 108200 40 3020 188 4660 27003 3664 30 7 0 63
> >> 1 145 21128 15920 24212 107420 0 836 0 3824 16813 430 1 6 0 93
> >> 2 144 22904 16020 20780 106780 0 1776 0 6548 16611 505 1 6 0 93
> >> 1 146 23496 15788 17476 106160 32 596 60 3620 16610 308 1 6 0 93
> >> 1 147 23924 16216 16028 105852 32 432 32 5012 16477 156 0 6 0 93
> >> 1 145 24428 15904 14744 103452 20 504 20 3112 16776 125 1 6 0 93
> >> 1 146 25304 16184 14688 97712 0 876 16 3352 16759 447 2 6 0 92
> >> 1 147 26984 15908 14588 88348 96 1680 96 6352 17006 235 1 6 0 93
> >> 1 146 28724 16112 14152 77132 32 1740 44 3536 16739 375 2 6 0 92
> >> 1 151 29900 15896 12072 68484 156 1184 192 2068 16860 576 2 6 0 91
> >> 2 152 33724 33908 9536 58616 184 3856 512 6764 16536 492 2 6 3 88
> >> 1 142 33276 427352 8964 58988 1096 120 2624 120 16730 1129 6 7 8 79
> >> 2 142 33000 421512 8988 60944 1560 0 3512 0 16771 1220 1 6 9 84
> >> 2 143 32604 392952 9012 62308 1176 0 2436 0 16690 1173 2 7 10 82
> >> 8 134 32400 255348 9044 64696 688 0 2584 0 17105 2181 16 8 14 62
> >> 6 136 31796 142068 9092 66024 1060 0 1828 0 17040 2226 37 10 12 41
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >> r b swpd free buff cache si so bi bo in cs us sy id wa
> >> 2 143 31664 15844 9152 67452 580 64 1324 292 16973 2066 37 10 11 43
> >> 4 141 31876 56160 9052 67528 48 328 140 1724 16476 696 6 7 0 87
> >> 4 141 32420 176260 8896 68808 108 732 760 2280 17449 3081 24 9 0 67
> >> 11 134 32540 119868 8484 70568 108 852 1140 1408 17436 3788 45 12 1 43
> >> 17 129 32880 57044 8256 73008 0 364 1212 364 17489 4000 59 13 0 28
> >> 11 135 33468 107128 7660 73124 200 1044 888 2076 17043 1956 23 9 0 69
> >> 1 144 34788 16076 6948 71572 180 1524 276 1908 16787 967 13 7 0 79
> >> 1 145 35472 16188 5868 70348 112 768 120 1284 16696 561 1 6 0 93
> >> 1 145 36056 16696 5492 68240 16 596 16 3356 16456 202 0 6 0 93
> >> 1 143 38200 15952 3168 63968 32 2168 52 6460 16834 423 1 7 0 92
> >> 9 131 40128 139084 3064 61060 172 2144 644 2192 17701 2250 19 9 0 72
> >> 9 133 40548 110308 3092 60492 468 620 900 1852 17516 1983 35 9 0 55
> >> 10 132 40448 79476 3132 61808 1020 0 1480 0 17505 3254 35 10 0 55
> >> 12 132 40532 139396 3156 63204 776 260 1272 892 17457 3179 44 11 0 45
> >> 11 132 40392 66336 3256 65264 788 0 1536 0 17551 3860 46 11 0 43
> >> 1 142 41112 15796 3296 65680 1176 812 1636 2568 17026 1798 28 9 0 63
> >> 1 140 41500 15960 3244 64828 92 472 116 4008 16445 443 4 7 0 90
> >> 1 140 42252 16740 3232 64356 0 764 0 1500 16403 185 0 6 0 94
> >> 1 139 49636 16024 2928 60652 52 7376 52 7376 17507 1236 0 7 0 93
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >> r b swpd free buff cache si so bi bo in cs us sy id wa
> >> 2 140 55780 16444 2548 55948 176 6200 332 6260 17160 592 1 7 0 92
> >> 3 145 59800 358088 2404 55468 100 4108 1092 4132 18514 3864 17 8 0 75
> >> 3 143 60712 27028 2416 57184 816 964 2392 1288 18089 3476 43 10 0 47
> >> 4 141 61296 154136 2516 58024 424 980 1312 980 17298 2489 28 9 0 62
> >> 21 122 62544 83120 2528 58372 100 1456 788 1456 17717 2738 64 12 0 24
> >> 24 120 62780 53328 2580 62216 16 292 2528 292 17163 4076 85 14 0 1
> >> 1 143 65088 16096 2492 61524 152 2708 764 2712 16734 1474 16 8 0 76
> >> 3 141 65672 34232 2476 60536 56 672 240 3208 16726 661 4 7 0 89
> >> 1 144 65584 16044 2488 60440 808 68 948 1532 17187 1353 10 8 0 82
> >> 4 141 70836 17216 2444 58024 64 5272 64 6968 16957 437 0 6 0 93
> >> 6 134 73728 31940 2424 56880 436 3092 748 3188 16950 1269 8 7 0 85
> >> 2 139 76036 107996 2408 56404 92 2420 476 2784 16869 690 6 7 0 87
> >> 6 135 76112 82792 2436 57884 1108 476 1632 724 16999 1711 18 8 0 73
> >> 1 139 77184 17872 2444 57860 996 1084 1168 2320 16644 748 11 8 0 81
> >> 1 141 91136 15952 2300 51868 100 14088 128 14152 17494 1284 1 7 5 87
> >> 1 143 98356 204144 2256 48168 640 7496 1148 7580 17471 1840 6 7 12 74
> >> 3 139 97344 174272 2276 48968 2636 0 3216 0 16962 1499 13 8 11 69
> >> 9 133 97220 123464 2352 50584 1348 0 2320 500 17100 2255 27 9 8 56
> >> 9 134 97092 33672 2396 51780 1292 108 2028 108 16821 1547 27 8 8 57
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >> r b swpd free buff cache si so bi bo in cs us sy id wa
> >> 11 134 95068 75744 2448 53444 852 0 1696 0 17318 2630 34 10 2 54
> >> 1 143 95104 15972 2504 54544 116 44 696 44 16545 1209 20 8 5 67
> >> ^C
> >>
> >>
> >>
> >>
> >
It appears the bisection was not so horrific (I was outside the PCI/MRRS bug
window); it will complete shortly:
# git bisect bad
Bisecting: 33 revisions left to test after this (roughly 5 steps)
[c299eba3c5a801657f275d33be588b34831cd30e] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
# git bisect log
git bisect start
# bad: [003f6c9df54970d8b19578d195b3e2b398cdbde2] lib/sha1.c: quiet sparse noise about symbol not declared
git bisect bad 003f6c9df54970d8b19578d195b3e2b398cdbde2
# good: [02f8c6aee8df3cdc935e9bdd4f2d020306035dbe] Linux 3.0
git bisect good 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe
# good: [d5ef642355bdd9b383ff5c18cbc6102a06eecbaf] Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
git bisect good d5ef642355bdd9b383ff5c18cbc6102a06eecbaf
# good: [664a41b8a91bf78a01a751e15175e0008977685a] Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect good 664a41b8a91bf78a01a751e15175e0008977685a
# bad: [585df1d90cb07a02ca6c7a7d339e56e46d50dafb] xhci: Remove TDs from TD lists when URBs are canceled.
git bisect bad 585df1d90cb07a02ca6c7a7d339e56e46d50dafb
# good: [60ad4466821a96913a9b567115e194ed1087c2d7] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect good 60ad4466821a96913a9b567115e194ed1087c2d7
# bad: [7f3bf7cd348cead84f8027b32aa30ea49fa64df5] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
git bisect bad 7f3bf7cd348cead84f8027b32aa30ea49fa64df5
# good: [9e8ed3ae924b65ab5f088fe63ee6f4326f04590f] [S390] signal: use set_restore_sigmask() helper
git bisect good 9e8ed3ae924b65ab5f088fe63ee6f4326f04590f
# bad: [31475dd611209413bace21651a400afb91d0bd9d] mm: a few small updates for radix-swap
git bisect bad 31475dd611209413bace21651a400afb91d0bd9d
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 6:53 ` Eric Dumazet
@ 2011-09-14 7:32 ` Shaohua Li
2011-09-14 8:20 ` Shaohua Li
0 siblings, 1 reply; 24+ messages in thread
From: Shaohua Li @ 2011-09-14 7:32 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linus Torvalds, Hugh Dickins, Andrew Morton, linux-kernel, Rik van Riel
[-- Attachment #1: Type: text/plain, Size: 15399 bytes --]
It appears we didn't account for skipped swap entries in find_get_pages().
Does the attached patch help?
Thanks,
Shaohua
2011/9/14 Eric Dumazet <eric.dumazet@gmail.com>:
> [full quote of the thread and vmstat output above trimmed]
[-- Attachment #2: filemap-dbg.patch --]
[-- Type: text/x-patch, Size: 1023 bytes --]
diff --git a/mm/filemap.c b/mm/filemap.c
index 645a080..f177e96 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
{
unsigned int i;
unsigned int ret;
- unsigned int nr_found;
+ unsigned int nr_found, nr_skip;
rcu_read_lock();
restart:
nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
(void ***)pages, NULL, start, nr_pages);
ret = 0;
+ nr_skip = 0;
for (i = 0; i < nr_found; i++) {
struct page *page;
repeat:
@@ -856,6 +857,7 @@ repeat:
* here as an exceptional entry: so skip over it -
* we only reach this from invalidate_mapping_pages().
*/
+ nr_skip++;
continue;
}
@@ -876,7 +878,7 @@ repeat:
* If all entries were removed before we could secure them,
* try again, because callers stop trying once 0 is returned.
*/
- if (unlikely(!ret && nr_found))
+ if (unlikely(!ret && nr_found && (nr_found != nr_skip)))
goto restart;
rcu_read_unlock();
return ret;
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 7:32 ` Shaohua Li
@ 2011-09-14 8:20 ` Shaohua Li
2011-09-14 8:43 ` Eric Dumazet
0 siblings, 1 reply; 24+ messages in thread
From: Shaohua Li @ 2011-09-14 8:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linus Torvalds, Hugh Dickins, Andrew Morton, linux-kernel, Rik van Riel
2011/9/14 Shaohua Li <shli@kernel.org>:
> It appears we didn't account for skipped swap entries in find_get_pages().
> Does the attached patch help?
I can easily reproduce the issue: just cp files in tmpfs, trigger swap, and
drop caches. The debug patch fixes it on my side.
Eric, please try it.
Thanks,
Shaohua
> 2011/9/14 Eric Dumazet <eric.dumazet@gmail.com>:
>> [full quote of the thread and vmstat output above trimmed]
>>> >> 9 131 40128 139084 3064 61060 172 2144 644 2192 17701 2250 19 9 0 72
>>> >> 9 133 40548 110308 3092 60492 468 620 900 1852 17516 1983 35 9 0 55
>>> >> 10 132 40448 79476 3132 61808 1020 0 1480 0 17505 3254 35 10 0 55
>>> >> 12 132 40532 139396 3156 63204 776 260 1272 892 17457 3179 44 11 0 45
>>> >> 11 132 40392 66336 3256 65264 788 0 1536 0 17551 3860 46 11 0 43
>>> >> 1 142 41112 15796 3296 65680 1176 812 1636 2568 17026 1798 28 9 0 63
>>> >> 1 140 41500 15960 3244 64828 92 472 116 4008 16445 443 4 7 0 90
>>> >> 1 140 42252 16740 3232 64356 0 764 0 1500 16403 185 0 6 0 94
>>> >> 1 139 49636 16024 2928 60652 52 7376 52 7376 17507 1236 0 7 0 93
>>> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> >> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> >> 2 140 55780 16444 2548 55948 176 6200 332 6260 17160 592 1 7 0 92
>>> >> 3 145 59800 358088 2404 55468 100 4108 1092 4132 18514 3864 17 8 0 75
>>> >> 3 143 60712 27028 2416 57184 816 964 2392 1288 18089 3476 43 10 0 47
>>> >> 4 141 61296 154136 2516 58024 424 980 1312 980 17298 2489 28 9 0 62
>>> >> 21 122 62544 83120 2528 58372 100 1456 788 1456 17717 2738 64 12 0 24
>>> >> 24 120 62780 53328 2580 62216 16 292 2528 292 17163 4076 85 14 0 1
>>> >> 1 143 65088 16096 2492 61524 152 2708 764 2712 16734 1474 16 8 0 76
>>> >> 3 141 65672 34232 2476 60536 56 672 240 3208 16726 661 4 7 0 89
>>> >> 1 144 65584 16044 2488 60440 808 68 948 1532 17187 1353 10 8 0 82
>>> >> 4 141 70836 17216 2444 58024 64 5272 64 6968 16957 437 0 6 0 93
>>> >> 6 134 73728 31940 2424 56880 436 3092 748 3188 16950 1269 8 7 0 85
>>> >> 2 139 76036 107996 2408 56404 92 2420 476 2784 16869 690 6 7 0 87
>>> >> 6 135 76112 82792 2436 57884 1108 476 1632 724 16999 1711 18 8 0 73
>>> >> 1 139 77184 17872 2444 57860 996 1084 1168 2320 16644 748 11 8 0 81
>>> >> 1 141 91136 15952 2300 51868 100 14088 128 14152 17494 1284 1 7 5 87
>>> >> 1 143 98356 204144 2256 48168 640 7496 1148 7580 17471 1840 6 7 12 74
>>> >> 3 139 97344 174272 2276 48968 2636 0 3216 0 16962 1499 13 8 11 69
>>> >> 9 133 97220 123464 2352 50584 1348 0 2320 500 17100 2255 27 9 8 56
>>> >> 9 134 97092 33672 2396 51780 1292 108 2028 108 16821 1547 27 8 8 57
>>> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> >> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> >> 11 134 95068 75744 2448 53444 852 0 1696 0 17318 2630 34 10 2 54
>>> >> 1 143 95104 15972 2504 54544 116 44 696 44 16545 1209 20 8 5 67
>>> >> ^C
>> It appears bisection was not so horrific (I was outside the PCI/MRRS bug
>> window); it will complete shortly:
>>
>> # git bisect bad
>> Bisecting: 33 revisions left to test after this (roughly 5 steps)
>> [c299eba3c5a801657f275d33be588b34831cd30e] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
>> # git bisect log
>> git bisect start
>> # bad: [003f6c9df54970d8b19578d195b3e2b398cdbde2] lib/sha1.c: quiet sparse noise about symbol not declared
>> git bisect bad 003f6c9df54970d8b19578d195b3e2b398cdbde2
>> # good: [02f8c6aee8df3cdc935e9bdd4f2d020306035dbe] Linux 3.0
>> git bisect good 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe
>> # good: [d5ef642355bdd9b383ff5c18cbc6102a06eecbaf] Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
>> git bisect good d5ef642355bdd9b383ff5c18cbc6102a06eecbaf
>> # good: [664a41b8a91bf78a01a751e15175e0008977685a] Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
>> git bisect good 664a41b8a91bf78a01a751e15175e0008977685a
>> # bad: [585df1d90cb07a02ca6c7a7d339e56e46d50dafb] xhci: Remove TDs from TD lists when URBs are canceled.
>> git bisect bad 585df1d90cb07a02ca6c7a7d339e56e46d50dafb
>> # good: [60ad4466821a96913a9b567115e194ed1087c2d7] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
>> git bisect good 60ad4466821a96913a9b567115e194ed1087c2d7
>> # bad: [7f3bf7cd348cead84f8027b32aa30ea49fa64df5] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
>> git bisect bad 7f3bf7cd348cead84f8027b32aa30ea49fa64df5
>> # good: [9e8ed3ae924b65ab5f088fe63ee6f4326f04590f] [S390] signal: use set_restore_sigmask() helper
>> git bisect good 9e8ed3ae924b65ab5f088fe63ee6f4326f04590f
>> # bad: [31475dd611209413bace21651a400afb91d0bd9d] mm: a few small updates for radix-swap
>> git bisect bad 31475dd611209413bace21651a400afb91d0bd9d
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 8:20 ` Shaohua Li
@ 2011-09-14 8:43 ` Eric Dumazet
2011-09-14 8:55 ` Shaohua Li
0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2011-09-14 8:43 UTC (permalink / raw)
To: Shaohua Li
Cc: Linus Torvalds, Hugh Dickins, Andrew Morton, linux-kernel, Rik van Riel
On Wednesday, 14 September 2011 at 16:20 +0800, Shaohua Li wrote:
> 2011/9/14 Shaohua Li <shli@kernel.org>:
> > it appears we didn't account skipped swap entry in find_get_pages().
> > does the attached patch help?
> I can easily reproduce the issue. Just cp files in tmpfs, trigger swap and
> drop caches. The debug patch fixes it at my side.
> Eric, please try it.
>
Hello Shaohua
I tried it with added traces :
[ 277.077855] mv used greatest stack depth: 3336 bytes left
[ 310.558012] nr_found=2 nr_skip=2
[ 310.558139] nr_found=14 nr_skip=14
[ 332.195162] nr_found=2 nr_skip=2
[ 332.195274] nr_found=14 nr_skip=14
[ 352.315273] nr_found=14 nr_skip=14
[ 372.673575] nr_found=14 nr_skip=14
[ 397.115463] nr_found=14 nr_skip=14
[ 403.391694] cc1 used greatest stack depth: 3184 bytes left
[ 404.761194] cc1 used greatest stack depth: 2640 bytes left
[ 417.306510] nr_found=14 nr_skip=14
[ 440.198051] nr_found=14 nr_skip=14
I also used :
- if (unlikely(!ret && nr_found))
+ if (unlikely(!ret && nr_found > nr_skip))
goto restart;
It seems to fix the bug. I suspect it also aborts
invalidate_mapping_pages() if we skip 14 pages, but the existing comment
states it's OK:
/*
* Note: this function may get called on a shmem/tmpfs mapping:
* pagevec_lookup() might then return 0 prematurely (because it
* got a gangful of swap entries); but it's hardly worth worrying
* about - it can rarely have anything to free from such a mapping
* (most pages are dirty), and already skips over any difficulties.
*/
Thanks !
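The failure mode and the fix can be sketched outside the kernel. Below is a hypothetical, simplified user-space model of the find_get_pages() restart logic — the names and structure are illustrative stand-ins, not real kernel code — showing why `!ret && nr_found` livelocks when every found slot is a skipped swap entry, and why `nr_found > nr_skip` terminates:

```c
#include <assert.h>

/* Slot contents: an "exceptional" entry stands in for a shmem swap
 * entry that the scan must skip; a PAGE is a normal gotten page. */
enum { EXCEPTIONAL = 1, PAGE = 2 };

/* One gang lookup over a fixed window of slots.  'fixed' selects the
 * corrected restart condition; 'restarts' reports loop iterations.
 * The 100-iteration cap stands in for the observed livelock. */
static unsigned scan(const int *slots, unsigned nr_found, int fixed,
                     unsigned *restarts)
{
    unsigned ret, nr_skip, attempts = 0;

    do {
        ret = 0;
        nr_skip = 0;
        for (unsigned i = 0; i < nr_found; i++) {
            if (slots[i] == EXCEPTIONAL) {
                nr_skip++;      /* the fix: account skipped entries */
                continue;
            }
            ret++;              /* page secured */
        }
        attempts++;
        if (attempts > 100)     /* cap the demonstration */
            break;
        /* buggy condition restarts forever when all entries are skipped */
    } while (fixed ? (!ret && nr_found > nr_skip) : (!ret && nr_found));

    *restarts = attempts;
    return ret;
}
```

With all 14 slots exceptional (matching the nr_found=14 nr_skip=14 traces), the fixed condition exits after a single pass, while the original condition keeps restarting until the demonstration cap stops it.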
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 8:43 ` Eric Dumazet
@ 2011-09-14 8:55 ` Shaohua Li
2011-09-14 20:38 ` Hugh Dickins
0 siblings, 1 reply; 24+ messages in thread
From: Shaohua Li @ 2011-09-14 8:55 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linus Torvalds, Hugh Dickins, Andrew Morton, linux-kernel, Rik van Riel
On Wed, 2011-09-14 at 16:43 +0800, Eric Dumazet wrote:
> Le mercredi 14 septembre 2011 à 16:20 +0800, Shaohua Li a écrit :
> > 2011/9/14 Shaohua Li <shli@kernel.org>:
> > > it appears we didn't account skipped swap entry in find_get_pages().
> > > does the attached patch help?
> > I can easily reproduce the issue. Just cp files in tmpfs, trigger swap and
> > drop caches. The debug patch fixes it at my side.
> > Eric, please try it.
> >
>
> Hello Shaohua
>
> I tried it with added traces :
>
>
> [ 277.077855] mv used greatest stack depth: 3336 bytes left
> [ 310.558012] nr_found=2 nr_skip=2
> [ 310.558139] nr_found=14 nr_skip=14
> [ 332.195162] nr_found=2 nr_skip=2
> [ 332.195274] nr_found=14 nr_skip=14
> [ 352.315273] nr_found=14 nr_skip=14
> [ 372.673575] nr_found=14 nr_skip=14
> [ 397.115463] nr_found=14 nr_skip=14
> [ 403.391694] cc1 used greatest stack depth: 3184 bytes left
> [ 404.761194] cc1 used greatest stack depth: 2640 bytes left
> [ 417.306510] nr_found=14 nr_skip=14
> [ 440.198051] nr_found=14 nr_skip=14
>
> I also used :
>
> - if (unlikely(!ret && nr_found))
> + if (unlikely(!ret && nr_found > nr_skip))
> goto restart;
nr_found > nr_skip is better
> It seems to fix the bug. I suspect it also aborts
> invalidate_mapping_pages() if we skip 14 pages, but existing comment
> states its OK :
>
> /*
> * Note: this function may get called on a shmem/tmpfs mapping:
> * pagevec_lookup() might then return 0 prematurely (because it
> * got a gangful of swap entries); but it's hardly worth worrying
> * about - it can rarely have anything to free from such a mapping
> * (most pages are dirty), and already skips over any difficulties.
> */
That might be a problem; let Hugh answer whether it is.
Thanks,
Shaohua
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 8:55 ` Shaohua Li
@ 2011-09-14 20:38 ` Hugh Dickins
2011-09-14 20:55 ` Eric Dumazet
0 siblings, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2011-09-14 20:38 UTC (permalink / raw)
To: Shaohua Li
Cc: Eric Dumazet, Linus Torvalds, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Wed, 14 Sep 2011, Shaohua Li wrote:
> On Wed, 2011-09-14 at 16:43 +0800, Eric Dumazet wrote:
> > Le mercredi 14 septembre 2011 à 16:20 +0800, Shaohua Li a écrit :
> > > 2011/9/14 Shaohua Li <shli@kernel.org>:
> > > > it appears we didn't account skipped swap entry in find_get_pages().
> > > > does the attached patch help?
> > > I can easily reproduce the issue. Just cp files in tmpfs, trigger swap and
> > > drop caches. The debug patch fixes it at my side.
> > > Eric, please try it.
> > >
> >
> > Hello Shaohua
> >
> > I tried it with added traces :
> >
> >
> > [ 277.077855] mv used greatest stack depth: 3336 bytes left
> > [ 310.558012] nr_found=2 nr_skip=2
> > [ 310.558139] nr_found=14 nr_skip=14
> > [ 332.195162] nr_found=2 nr_skip=2
> > [ 332.195274] nr_found=14 nr_skip=14
> > [ 352.315273] nr_found=14 nr_skip=14
> > [ 372.673575] nr_found=14 nr_skip=14
> > [ 397.115463] nr_found=14 nr_skip=14
> > [ 403.391694] cc1 used greatest stack depth: 3184 bytes left
> > [ 404.761194] cc1 used greatest stack depth: 2640 bytes left
> > [ 417.306510] nr_found=14 nr_skip=14
> > [ 440.198051] nr_found=14 nr_skip=14
> >
> > I also used :
> >
> > - if (unlikely(!ret && nr_found))
> > + if (unlikely(!ret && nr_found > nr_skip))
> > goto restart;
> nr_found > nr_skip is better
>
> > It seems to fix the bug. I suspect it also aborts
> > invalidate_mapping_pages() if we skip 14 pages, but existing comment
> > states its OK :
> >
> > /*
> > * Note: this function may get called on a shmem/tmpfs mapping:
> > * pagevec_lookup() might then return 0 prematurely (because it
> > * got a gangful of swap entries); but it's hardly worth worrying
> > * about - it can rarely have anything to free from such a mapping
> > * (most pages are dirty), and already skips over any difficulties.
> > */
> that might be a problem, let Hugh answer if it is.
Thanks to you all for suffering, reporting and investigating this.
Yes, in 3.1-rc I have converted an extremely rare try-again-once
into a too-easily stumbled-upon endless loop.
Would it be a problem to give up early on a shmem/tmpfs mapping in
invalidate_mapping_pages()? No, not really: it's rare for it to find
anything it can throw away from tmpfs, because it cannot recognize the
clean swapcache pages (getting it to work on those would be nice, and
something I did look into once, but it's not a job for today), and
entirely clean pages (readonly mmap'ed zeroes never touched) are uncommon.
However, I did independently run across scan_mapping_unevictable_pages()
a few days ago: that uses pagevec_lookup() on shmem when doing SHM_UNLOCK,
and although the normal case would be that everything then is in memory,
I think it's not impossible for some to be swapped out (already swapped
out at SHM_LOCK time, and not touched since), which should not stop it
from doing its work on unswapped pages beyond.
My preferred patch is below: but it does add a cond_resched() into
find_get_pages(), which is really below the level at which we usually
do cond_resched(). All callers appear fine with it, and in practice
it would be very^14 rare on anything other than shmem/tmpfs: so this
being rc6 I'm reluctant to make matters worse with a might_sleep().
But I'm not signing this off yet, because I'm still mystified by the
several reports of seemingly the same problem on 3.0.1 and 3.0.2,
which I fear the patch below (even if adjusted to apply) will do
nothing to help - there are no swap entries in radix_tree in 3.0.
My suspicion is that there's some path by which a page gets trapped
in the radix_tree with page count 0. While it's easy to imagine that
THP's use of compaction and compaction's use of migration could have
made a bug there more common, I do not see it.
I'd like to think about that a little more before finalizing the
patch below - does it work, and does it look acceptable so far?
Of course, the mods to truncate.c and vmscan.c are not essential
parts of this fix, just things to tidy up while on the subject.
Right now I must attend to some other stuff, will return tomorrow.
Hugh
---
mm/filemap.c | 14 ++++++++++----
mm/truncate.c | 8 --------
mm/vmscan.c | 2 +-
3 files changed, 11 insertions(+), 13 deletions(-)
--- 3.1-rc6/mm/filemap.c 2011-08-07 23:44:41.231928061 -0700
+++ linux/mm/filemap.c 2011-09-14 12:24:26.431242155 -0700
@@ -829,8 +829,8 @@ unsigned find_get_pages(struct address_s
unsigned int ret;
unsigned int nr_found;
- rcu_read_lock();
restart:
+ rcu_read_lock();
nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
(void ***)pages, NULL, start, nr_pages);
ret = 0;
@@ -849,12 +849,15 @@ repeat:
* to root: none yet gotten, safe to restart.
*/
WARN_ON(start | i);
+ rcu_read_unlock();
goto restart;
}
/*
* Otherwise, shmem/tmpfs must be storing a swap entry
* here as an exceptional entry: so skip over it -
- * we only reach this from invalidate_mapping_pages().
+ * we only reach this from invalidate_mapping_pages(),
+ * or SHM_UNLOCK's scan_mapping_unevictable_pages() -
+ * in each case it's correct to skip a swapped entry.
*/
continue;
}
@@ -871,14 +874,17 @@ repeat:
pages[ret] = page;
ret++;
}
+ rcu_read_unlock();
/*
* If all entries were removed before we could secure them,
* try again, because callers stop trying once 0 is returned.
*/
- if (unlikely(!ret && nr_found))
+ if (unlikely(!ret && nr_found)) {
+ cond_resched();
+ start += nr_found;
goto restart;
- rcu_read_unlock();
+ }
return ret;
}
--- 3.1-rc6/mm/truncate.c 2011-08-07 23:44:41.299928402 -0700
+++ linux/mm/truncate.c 2011-09-14 11:23:19.513059010 -0700
@@ -336,14 +336,6 @@ unsigned long invalidate_mapping_pages(s
unsigned long count = 0;
int i;
- /*
- * Note: this function may get called on a shmem/tmpfs mapping:
- * pagevec_lookup() might then return 0 prematurely (because it
- * got a gangful of swap entries); but it's hardly worth worrying
- * about - it can rarely have anything to free from such a mapping
- * (most pages are dirty), and already skips over any difficulties.
- */
-
pagevec_init(&pvec, 0);
while (index <= end && pagevec_lookup(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
--- 3.1-rc6/mm/vmscan.c 2011-08-28 22:10:26.516791859 -0700
+++ linux/mm/vmscan.c 2011-09-14 11:25:27.701694661 -0700
@@ -3375,8 +3375,8 @@ void scan_mapping_unevictable_pages(stru
pagevec_release(&pvec);
count_vm_events(UNEVICTABLE_PGSCANNED, pg_scanned);
+ cond_resched();
}
-
}
/**
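The locking reshuffle in the filemap.c hunks above — taking rcu_read_lock() per restart, so the unlock happens before any cond_resched() — can be sketched with a user-space mock. The macros below are stand-ins for the real RCU primitives, used only to check that the control flow keeps lock/unlock balanced and never "sleeps" inside a read-side critical section:

```c
#include <assert.h>

/* Mock RCU: track read-side nesting depth instead of real locking. */
static int rcu_depth;
#define rcu_read_lock()    (rcu_depth++)
#define rcu_read_unlock()  (rcu_depth--)
/* cond_resched() may sleep, so it must never run under rcu_read_lock() */
#define cond_resched()     assert(rcu_depth == 0)

/* Shape of the patched find_get_pages() loop: lock and unlock inside
 * each restart, resched between restarts.  'rounds' simulates how many
 * times the lookup comes back with nothing securable. */
static int scan_with_restarts(int rounds)
{
    int restarts = 0;
restart:
    rcu_read_lock();
    /* ... radix_tree_gang_lookup_slot() and the per-slot loop ... */
    rcu_read_unlock();
    if (restarts < rounds) {
        restarts++;
        cond_resched();   /* safe here: outside the RCU section */
        goto restart;
    }
    return restarts;
}
```

In the pre-patch shape, with a single rcu_read_lock() taken before the restart label, the cond_resched() would fire the depth assertion — which is exactly why the patch moves the lock inside the restart.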
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 20:38 ` Hugh Dickins
@ 2011-09-14 20:55 ` Eric Dumazet
2011-09-14 21:53 ` Hugh Dickins
0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2011-09-14 20:55 UTC (permalink / raw)
To: Hugh Dickins
Cc: Shaohua Li, Linus Torvalds, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Wednesday, 14 September 2011 at 13:38 -0700, Hugh Dickins wrote:
> On Wed, 14 Sep 2011, Shaohua Li wrote:
> > On Wed, 2011-09-14 at 16:43 +0800, Eric Dumazet wrote:
> > > Le mercredi 14 septembre 2011 à 16:20 +0800, Shaohua Li a écrit :
> > > > 2011/9/14 Shaohua Li <shli@kernel.org>:
> > > > > it appears we didn't account skipped swap entry in find_get_pages().
> > > > > does the attached patch help?
> > > > I can easily reproduce the issue. Just cp files in tmpfs, trigger swap and
> > > > drop caches. The debug patch fixes it at my side.
> > > > Eric, please try it.
> > > >
> > >
> > > Hello Shaohua
> > >
> > > I tried it with added traces :
> > >
> > >
> > > [ 277.077855] mv used greatest stack depth: 3336 bytes left
> > > [ 310.558012] nr_found=2 nr_skip=2
> > > [ 310.558139] nr_found=14 nr_skip=14
> > > [ 332.195162] nr_found=2 nr_skip=2
> > > [ 332.195274] nr_found=14 nr_skip=14
> > > [ 352.315273] nr_found=14 nr_skip=14
> > > [ 372.673575] nr_found=14 nr_skip=14
> > > [ 397.115463] nr_found=14 nr_skip=14
> > > [ 403.391694] cc1 used greatest stack depth: 3184 bytes left
> > > [ 404.761194] cc1 used greatest stack depth: 2640 bytes left
> > > [ 417.306510] nr_found=14 nr_skip=14
> > > [ 440.198051] nr_found=14 nr_skip=14
> > >
> > > I also used :
> > >
> > > - if (unlikely(!ret && nr_found))
> > > + if (unlikely(!ret && nr_found > nr_skip))
> > > goto restart;
> > nr_found > nr_skip is better
> >
> > > It seems to fix the bug. I suspect it also aborts
> > > invalidate_mapping_pages() if we skip 14 pages, but existing comment
> > > states its OK :
> > >
> > > /*
> > > * Note: this function may get called on a shmem/tmpfs mapping:
> > > * pagevec_lookup() might then return 0 prematurely (because it
> > > * got a gangful of swap entries); but it's hardly worth worrying
> > > * about - it can rarely have anything to free from such a mapping
> > > * (most pages are dirty), and already skips over any difficulties.
> > > */
> > that might be a problem, let Hugh answer if it is.
>
> Thanks to you all for suffering, reporting and investigating this.
> Yes, in 3.1-rc I have converted an extremely rare try-again-once
> into a too-easily stumbled-upon endless loop.
>
> Would it be a problem to give up early on a shmem/tmpfs mapping in
> invalidate_mapping_pages()? No, not really: it's rare for it to find
> anything it can throw away from tmpfs, because it cannot recognize the
> clean swapcache pages (getting it to work on those would be nice, and
> something I did look into once, but it's not a job for today), and
> entirely clean pages (readonly mmap'ed zeroes never touched) are uncommon.
>
> However, I did independently run across scan_mapping_unevictable_pages()
> a few days ago: that uses pagevec_lookup() on shmem when doing SHM_UNLOCK,
> and although the normal case would be that everything then is in memory,
> I think it's not impossible for some to be swapped out (already swapped
> out at SHM_LOCK time, and not touched since), which should not stop it
> from doing its work on unswapped pages beyond.
>
> My preferred patch is below: but it does add a cond_resched() into
> find_get_pages(), which is really below the level at which we usually
> do cond_resched(). All callers appear fine with it, and in practice
> it would be very^14 rare on anything other than shmem/tmpfs: so this
> being rc6 I'm reluctant to make matters worse with a might_sleep().
>
> But I'm not signing this off yet, because I'm still mystified by the
> several reports of seemingly the same problem on 3.0.1 and 3.0.2,
> which I fear the patch below (even if adjusted to apply) will do
> nothing to help - there are no swap entries in radix_tree in 3.0.
>
> My suspicion is that there's some path by which a page gets trapped
> in the radix_tree with page count 0. While it's easy to imagine that
> THP's use of compaction and compaction's use of migration could have
> made a bug there more common, I do not see it.
>
> I'd like to think about that a little more before finalizing the
> patch below - does it work, and does it look acceptable so far?
> Of course, the mods to truncate.c and vmscan.c are not essential
> parts of this fix, just things to tidy up while on the subject.
> Right now I must attend to some other stuff, will return tomorrow.
>
> Hugh
>
Hello Hugh
I am going to test this ASAP, but I have one question below:
> ---
>
> mm/filemap.c | 14 ++++++++++----
> mm/truncate.c | 8 --------
> mm/vmscan.c | 2 +-
> 3 files changed, 11 insertions(+), 13 deletions(-)
>
> --- 3.1-rc6/mm/filemap.c 2011-08-07 23:44:41.231928061 -0700
> +++ linux/mm/filemap.c 2011-09-14 12:24:26.431242155 -0700
> @@ -829,8 +829,8 @@ unsigned find_get_pages(struct address_s
> unsigned int ret;
> unsigned int nr_found;
>
> - rcu_read_lock();
> restart:
> + rcu_read_lock();
> nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
> (void ***)pages, NULL, start, nr_pages);
> ret = 0;
> @@ -849,12 +849,15 @@ repeat:
> * to root: none yet gotten, safe to restart.
> */
> WARN_ON(start | i);
> + rcu_read_unlock();
> goto restart;
> }
> /*
> * Otherwise, shmem/tmpfs must be storing a swap entry
> * here as an exceptional entry: so skip over it -
> - * we only reach this from invalidate_mapping_pages().
> + * we only reach this from invalidate_mapping_pages(),
> + * or SHM_UNLOCK's scan_mapping_unevictable_pages() -
> + * in each case it's correct to skip a swapped entry.
> */
> continue;
> }
> @@ -871,14 +874,17 @@ repeat:
> pages[ret] = page;
> ret++;
> }
> + rcu_read_unlock();
>
> /*
> * If all entries were removed before we could secure them,
> * try again, because callers stop trying once 0 is returned.
> */
> - if (unlikely(!ret && nr_found))
> + if (unlikely(!ret && nr_found)) {
> + cond_resched();
> + start += nr_found;
Isn't it possible to go outside the initial window?
start could be greater than 'end'?
invalidate_mapping_pages() does some capping with (end - index):
> pagevec_init(&pvec, 0);
> while (index <= end && pagevec_lookup(&pvec, mapping, index,
> min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 20:55 ` Eric Dumazet
@ 2011-09-14 21:53 ` Hugh Dickins
2011-09-14 22:08 ` Eric Dumazet
2011-09-14 22:37 ` Linus Torvalds
0 siblings, 2 replies; 24+ messages in thread
From: Hugh Dickins @ 2011-09-14 21:53 UTC (permalink / raw)
To: Eric Dumazet
Cc: Shaohua Li, Linus Torvalds, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Wed, 14 Sep 2011, Eric Dumazet wrote:
> Le mercredi 14 septembre 2011 à 13:38 -0700, Hugh Dickins a écrit :
> >
> > I'd like to think about that a little more before finalizing the
> > patch below - does it work, and does it look acceptable so far?
> > Of course, the mods to truncate.c and vmscan.c are not essential
> > parts of this fix, just things to tidy up while on the subject.
> > Right now I must attend to some other stuff, will return tomorrow.
>
> Hello Hugh
>
> I am going to test this ASAP,
Thanks, Eric, though it may not be worth spending your time on it.
It occurred to me over lunch that it may take painfully longer than
expected to invalidate_mapping_pages() on a single-swapped-out-page
1TB sparse tmpfs file - all those "start += 1" restarts until it
reaches the end.
I might decide to leave invalidate_mapping_pages() giving up early
(unsatisfying, but no worse than before), and convert
scan_mapping_unevictable_pages() (which is used on nothing but shmem)
to pass an index vector to radix_tree_gang_whatever().
Dunno, I'll think about it more later.
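As a rough sketch of the cost being worried about here (assuming 4KB pages; the helper is illustrative arithmetic, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case number of page-cache indices a single-index-at-a-time
 * restart would have to walk before reaching the end of the file. */
static uint64_t worst_case_restarts(uint64_t file_bytes, uint64_t page_size)
{
    return file_bytes / page_size;
}
```

worst_case_restarts(1ULL << 40, 4096) gives 268,435,456 indices — hundreds of millions of single-step restarts for a 1TB sparse file in the worst case.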
> but have one question below :
>
> > /*
> > * If all entries were removed before we could secure them,
> > * try again, because callers stop trying once 0 is returned.
> > */
> > - if (unlikely(!ret && nr_found))
> > + if (unlikely(!ret && nr_found)) {
> > + cond_resched();
> > + start += nr_found;
>
> Isnt it possible to go out of initial window ?
> start could be greater than 'end' ?
>
> invalidate_mapping_pages()
>
> does some capping (end - index)
Good question, but even before the change (or any of my changes here)
it's perfectly possible to go out of the initial window - the radix_tree
gang interfaces allow you to specify the maximum you want back (i.e. size
of buffer), but they do not actually allow you to specify end of range.
There are a few places where we trim the maximum to match our end of range,
but that's just a slight optimization in the face of an arguably incomplete
interface. But the radix_tree is not too inefficient this way, because of
how empty nodes get removed immediately - there's a limit to the number
of nodes it will have to look through before it fills the buffer.
>
>
> > pagevec_init(&pvec, 0);
> > while (index <= end && pagevec_lookup(&pvec, mapping, index,
> > min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
It does cap by "end - index", but it has already checked "index <= end",
and it is only this minor optimization, nothing essential.
Hugh
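The capping expression discussed above can be sketched in isolation. This is a simplified user-space model (pgoff_t modelled as uint64_t; PAGEVEC_SIZE was 14 in kernels of that era):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pgoff_t;     /* stand-in for the kernel type */
#define PAGEVEC_SIZE 14UL     /* value at the time */

/* Number of indices in [index, end], capped at PAGEVEC_SIZE.
 * Mirrors: min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1
 * The caller has already checked index <= end. */
static pgoff_t cap(pgoff_t index, pgoff_t end)
{
    pgoff_t span = end - index;
    return (span < PAGEVEC_SIZE - 1 ? span : PAGEVEC_SIZE - 1) + 1;
}
```

Writing min(end - index, PAGEVEC_SIZE - 1) + 1 rather than min(end - index + 1, PAGEVEC_SIZE) presumably avoids overflow in the degenerate case where end - index + 1 would wrap at the top of the pgoff_t range.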
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 21:53 ` Hugh Dickins
@ 2011-09-14 22:08 ` Eric Dumazet
2011-09-14 22:37 ` Linus Torvalds
1 sibling, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2011-09-14 22:08 UTC (permalink / raw)
To: Hugh Dickins
Cc: Shaohua Li, Linus Torvalds, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Wednesday, 14 September 2011 at 14:53 -0700, Hugh Dickins wrote:
> On Wed, 14 Sep 2011, Eric Dumazet wrote:
> > Le mercredi 14 septembre 2011 à 13:38 -0700, Hugh Dickins a écrit :
> > >
> > > I'd like to think about that a little more before finalizing the
> > > patch below - does it work, and does it look acceptable so far?
> > > Of course, the mods to truncate.c and vmscan.c are not essential
> > > parts of this fix, just things to tidy up while on the subject.
> > > Right now I must attend to some other stuff, will return tomorrow.
> >
> > Hello Hugh
> >
> > I am going to test this ASAP,
>
> Thanks, Eric, though it may not be worth spending your time on it.
> It occurred to me over lunch that it may take painfully longer than
> expected to invalidate_mapping_pages() on a single-swapped-out-page
> 1TB sparse tmpfs file - all those "start += 1" restarts until it
> reaches the end.
>
> I might decide to leave invalidate_mapping_pages() giving up early
> (unsatisfying, but no worse than before), and convert scan_mapping_
> unevictable_pages() (which is used on nothing but shmem) to pass
> index vector to radix_tree_gang_whatever().
>
> Dunno, I'll think about it more later.
>
I tested your patch as-is on my machine, and everything seems fine.
I'll let the stress test continue while I sleep :)
See you
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 21:53 ` Hugh Dickins
2011-09-14 22:08 ` Eric Dumazet
@ 2011-09-14 22:37 ` Linus Torvalds
2011-09-15 0:45 ` Shaohua Li
1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2011-09-14 22:37 UTC (permalink / raw)
To: Hugh Dickins
Cc: Eric Dumazet, Shaohua Li, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Wed, Sep 14, 2011 at 2:53 PM, Hugh Dickins <hughd@google.com> wrote:
>
> Thanks, Eric, though it may not be worth spending your time on it.
> It occurred to me over lunch that it may take painfully longer than
> expected to invalidate_mapping_pages() on a single-swapped-out-page
> 1TB sparse tmpfs file - all those "start += 1" restarts until it
> reaches the end.
So can we have a stop-gap patch that just fixes it for now? I assume
that would be Shaohua's patch with the "nr_found > nr_skip" change?
Can you guys send whatever patch is appropriate for now with a nice
changelog and the appropriate sign-offs, please? So that we can at
least close the issue...
Linus
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 22:37 ` Linus Torvalds
@ 2011-09-15 0:45 ` Shaohua Li
2011-09-15 2:00 ` Hugh Dickins
2011-09-15 4:02 ` Eric Dumazet
0 siblings, 2 replies; 24+ messages in thread
From: Shaohua Li @ 2011-09-15 0:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Hugh Dickins, Eric Dumazet, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Thu, 2011-09-15 at 06:37 +0800, Linus Torvalds wrote:
> On Wed, Sep 14, 2011 at 2:53 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > Thanks, Eric, though it may not be worth spending your time on it.
> > It occurred to me over lunch that it may take painfully longer than
> > expected to invalidate_mapping_pages() on a single-swapped-out-page
> > 1TB sparse tmpfs file - all those "start += 1" restarts until it
> > reaches the end.
>
> So can we have a stop-gap patch to just fixes it for now? I assume
> that would be Shaohua's patch with the "nr_found > nr_skip" change?
>
> Can you guys send whatever patch is appropriate for now with a nice
> changelog and the appropriate sign-offs, please? So that we can at
> least close the issue...
Here is my patch, if you want to close the issue at hand.
Subject: mm: account skipped entries to avoid looping in find_get_pages
The entries found by find_get_pages() could all be swap entries. In
that case we skip them, but make sure the skipped entries are
accounted for, so we don't keep looping.
Use nr_found > nr_skip to simplify the code, as suggested by Eric.
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
diff --git a/mm/filemap.c b/mm/filemap.c
index 645a080..7771871 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
{
unsigned int i;
unsigned int ret;
- unsigned int nr_found;
+ unsigned int nr_found, nr_skip;
rcu_read_lock();
restart:
nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
(void ***)pages, NULL, start, nr_pages);
ret = 0;
+ nr_skip = 0;
for (i = 0; i < nr_found; i++) {
struct page *page;
repeat:
@@ -856,6 +857,7 @@ repeat:
* here as an exceptional entry: so skip over it -
* we only reach this from invalidate_mapping_pages().
*/
+ nr_skip++;
continue;
}
@@ -876,7 +878,7 @@ repeat:
* If all entries were removed before we could secure them,
* try again, because callers stop trying once 0 is returned.
*/
- if (unlikely(!ret && nr_found))
+ if (unlikely(!ret && nr_found > nr_skip))
goto restart;
rcu_read_unlock();
return ret;
* Re: [BUG] infinite loop in find_get_pages()
2011-09-15 0:45 ` Shaohua Li
@ 2011-09-15 2:00 ` Hugh Dickins
2011-09-15 4:02 ` Eric Dumazet
1 sibling, 0 replies; 24+ messages in thread
From: Hugh Dickins @ 2011-09-15 2:00 UTC (permalink / raw)
To: Shaohua Li
Cc: Linus Torvalds, Eric Dumazet, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Thu, 15 Sep 2011, Shaohua Li wrote:
> On Thu, 2011-09-15 at 06:37 +0800, Linus Torvalds wrote:
> > On Wed, Sep 14, 2011 at 2:53 PM, Hugh Dickins <hughd@google.com> wrote:
> > >
> > > Thanks, Eric, though it may not be worth spending your time on it.
> > > It occurred to me over lunch that it may take painfully longer than
> > > expected to invalidate_mapping_pages() on a single-swapped-out-page
> > > 1TB sparse tmpfs file - all those "start += 1" restarts until it
> > > reaches the end.
> >
> > So can we have a stop-gap patch that just fixes it for now? I assume
> > that would be Shaohua's patch with the "nr_found > nr_skip" change?
> >
> > Can you guys send whatever patch is appropriate for now with a nice
> > changelog and the appropriate sign-offs, please? So that we can at
> > least close the issue...
> here is my patch if you want to close the issue at hand.
Right, it closes one of the hangs, but not whatever the 3.0 hang is,
and not the unlikely SHM_UNLOCK issue I factored in. I cannot consider
those issues closed, but I am happy to be let off the hook of providing
another fix tomorrow - thanks!
>
> Subject: mm: account skipped entries to avoid looping in find_get_pages
>
> The entries found by find_get_pages() could all be swap entries. In
> that case we skip the entries, but make sure the skipped entries are
> accounted for, so we don't keep looping.
> Use nr_found > nr_skip to simplify the code, as suggested by Eric.
>
> Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 645a080..7771871 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
> {
> unsigned int i;
> unsigned int ret;
> - unsigned int nr_found;
> + unsigned int nr_found, nr_skip;
>
> rcu_read_lock();
> restart:
> nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
> (void ***)pages, NULL, start, nr_pages);
> ret = 0;
> + nr_skip = 0;
> for (i = 0; i < nr_found; i++) {
> struct page *page;
> repeat:
> @@ -856,6 +857,7 @@ repeat:
> * here as an exceptional entry: so skip over it -
> * we only reach this from invalidate_mapping_pages().
> */
> + nr_skip++;
> continue;
> }
>
> @@ -876,7 +878,7 @@ repeat:
> * If all entries were removed before we could secure them,
> * try again, because callers stop trying once 0 is returned.
> */
> - if (unlikely(!ret && nr_found))
> + if (unlikely(!ret && nr_found > nr_skip))
> goto restart;
> rcu_read_unlock();
> return ret;
* Re: [BUG] infinite loop in find_get_pages()
2011-09-15 0:45 ` Shaohua Li
2011-09-15 2:00 ` Hugh Dickins
@ 2011-09-15 4:02 ` Eric Dumazet
1 sibling, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2011-09-15 4:02 UTC (permalink / raw)
To: Shaohua Li
Cc: Linus Torvalds, Hugh Dickins, Andrew Morton, linux-kernel,
Rik van Riel, Lin Ming, Justin Piszcz, Pawel Sikora
On Thursday, 15 September 2011 at 08:45 +0800, Shaohua Li wrote:
> here is my patch if you want to close the issue at hand.
>
> Subject: mm: account skipped entries to avoid looping in find_get_pages
>
> The entries found by find_get_pages() could all be swap entries. In
> that case we skip the entries, but make sure the skipped entries are
> accounted for, so we don't keep looping.
> Use nr_found > nr_skip to simplify the code, as suggested by Eric.
>
> Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>
Yep, I guess Hugh can refine it later.
I'm pulling the latest Linus tree (including this patch) and will redo a
stress session, including transparent-hugepage games.
Thanks!
* Re: [BUG] infinite loop in find_get_pages()
2011-09-14 0:34 ` Lin Ming
@ 2011-09-15 10:47 ` Pawel Sikora
2011-09-15 11:11 ` Justin Piszcz
0 siblings, 1 reply; 24+ messages in thread
From: Pawel Sikora @ 2011-09-15 10:47 UTC (permalink / raw)
To: Lin Ming
Cc: Andrew Morton, Eric Dumazet, Linus Torvalds, linux-kernel,
Andrew Morton, Toshiyuki Okajima, Dave Chinner, Hugh Dickins,
Justin Piszcz
On Wednesday 14 of September 2011 08:34:21 Lin Ming wrote:
> [3.0.2-stable] BUG: soft lockup - CPU#13 stuck for 22s! [kswapd2:1092]
> http://marc.info/?l=linux-kernel&m=131469584117857&w=2
Hi,
I'm not sure this is fully related to this thread, but I've found
new warnings about memory pages in dmesg today:
[650697.716481] ------------[ cut here ]------------
[650697.716498] WARNING: at mm/page-writeback.c:1176 __set_page_dirty_nobuffers+0x10a/0x140()
[650697.716501] Hardware name: H8DGU
[650697.716502] Modules linked in: nfs fscache binfmt_misc nfsd lockd nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_devintf ipmi_msghandler sch_sfq iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_filter xt_TCPMSS xt_tcpudp iptable_mangle ip_tables ip6table_filter ip6_tables x_tables ext4 jbd2 crc16 raid10 raid0 dm_mod uvesafb autofs4
dummy aoe joydev usbhid hid ide_cd_mod cdrom ata_generic pata_acpi pata_atiixp sp5100_tco ohci_hcd ide_pci_generic ssb ehci_hcd pcmcia igb pcmcia_core psmouse mmc_core evdev
i2c_piix4 atiixp ide_core k10temp usbcore amd64_edac_mod edac_core i2c_core dca hwmon edac_mce_amd ghes serio_raw button hed processor pcspkr sg sd_mod crc_t10dif raid1 md_mod ext3
jbd mbcache ahci libahci libata scsi_mod [last unloaded: scsi_wait_scan]
[650697.716569] Pid: 16806, comm: m_xilinx Not tainted 3.0.4 #5
[650697.716572] Call Trace:
[650697.716582] [<ffffffff810470da>] warn_slowpath_common+0x7a/0xb0
[650697.716586] [<ffffffff81047125>] warn_slowpath_null+0x15/0x20
[650697.716590] [<ffffffff810e71ba>] __set_page_dirty_nobuffers+0x10a/0x140
[650697.716596] [<ffffffff81127eb8>] migrate_page_copy+0x1c8/0x1d0
[650697.716600] [<ffffffff81127ef5>] migrate_page+0x35/0x50
[650697.716623] [<ffffffffa04b6f19>] nfs_migrate_page+0x59/0xf0 [nfs]
[650697.716627] [<ffffffff81127fb9>] move_to_new_page+0xa9/0x260
[650697.716630] [<ffffffff811286bd>] migrate_pages+0x3fd/0x4c0
[650697.716635] [<ffffffff8142988e>] ? apic_timer_interrupt+0xe/0x20
[650697.716641] [<ffffffff8111cbf0>] ? ftrace_define_fields_mm_compaction_isolate_template+0x70/0x70
[650697.716645] [<ffffffff8111d5da>] compact_zone+0x52a/0x8c0
[650697.716649] [<ffffffff8111dade>] compact_zone_order+0x7e/0xb0
[650697.716653] [<ffffffff8111dbcd>] try_to_compact_pages+0xbd/0xf0
[650697.716657] [<ffffffff810e5148>] __alloc_pages_direct_compact+0xa8/0x180
[650697.716661] [<ffffffff810e588d>] __alloc_pages_nodemask+0x66d/0x7f0
[650697.716667] [<ffffffff8110a92d>] ? page_add_new_anon_rmap+0x9d/0xb0
[650697.716671] [<ffffffff8111b865>] alloc_pages_vma+0x95/0x180
[650697.716676] [<ffffffff8112c2f8>] do_huge_pmd_anonymous_page+0x138/0x310
[650697.716680] [<ffffffff81102ace>] handle_mm_fault+0x21e/0x310
[650697.716685] [<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[650697.716688] [<ffffffff811077a7>] ? do_mmap_pgoff+0x357/0x370
[650697.716692] [<ffffffff8110790d>] ? sys_mmap_pgoff+0x14d/0x220
[650697.716697] [<ffffffff811371b8>] ? do_sys_open+0x168/0x1d0
[650697.716701] [<ffffffff81421d5f>] page_fault+0x1f/0x30
[650697.716704] ---[ end trace 4255de435c6def21 ]---
BR,
Paweł.
* Re: [BUG] infinite loop in find_get_pages()
2011-09-15 10:47 ` Pawel Sikora
@ 2011-09-15 11:11 ` Justin Piszcz
2011-09-15 12:04 ` Eric Dumazet
0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2011-09-15 11:11 UTC (permalink / raw)
To: Pawel Sikora
Cc: Lin Ming, Andrew Morton, Eric Dumazet, Linus Torvalds,
linux-kernel, Andrew Morton, Toshiyuki Okajima, Dave Chinner,
Hugh Dickins, Alan Piszcz
[-- Attachment #1: Type: TEXT/PLAIN, Size: 4020 bytes --]
On Thu, 15 Sep 2011, Pawel Sikora wrote:
> On Wednesday 14 of September 2011 08:34:21 Lin Ming wrote:
>
>> [3.0.2-stable] BUG: soft lockup - CPU#13 stuck for 22s! [kswapd2:1092]
>> http://marc.info/?l=linux-kernel&m=131469584117857&w=2
>
> Hi,
>
> I'm not sure this is fully related to this thread, but I've found
> new warnings about memory pages in dmesg today:
>
> [650697.716481] ------------[ cut here ]------------
> [650697.716498] WARNING: at mm/page-writeback.c:1176 __set_page_dirty_nobuffers+0x10a/0x140()
> [650697.716501] Hardware name: H8DGU
> [650697.716502] Modules linked in: nfs fscache binfmt_misc nfsd lockd nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_devintf ipmi_msghandler sch_sfq iptable_nat nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 iptable_filter xt_TCPMSS xt_tcpudp iptable_mangle ip_tables ip6table_filter ip6_tables x_tables ext4 jbd2 crc16 raid10 raid0 dm_mod uvesafb autofs4
> dummy aoe joydev usbhid hid ide_cd_mod cdrom ata_generic pata_acpi pata_atiixp sp5100_tco ohci_hcd ide_pci_generic ssb ehci_hcd pcmcia igb pcmcia_core psmouse mmc_core evdev
> i2c_piix4 atiixp ide_core k10temp usbcore amd64_edac_mod edac_core i2c_core dca hwmon edac_mce_amd ghes serio_raw button hed processor pcspkr sg sd_mod crc_t10dif raid1 md_mod ext3
> jbd mbcache ahci libahci libata scsi_mod [last unloaded: scsi_wait_scan]
> [650697.716569] Pid: 16806, comm: m_xilinx Not tainted 3.0.4 #5
> [650697.716572] Call Trace:
> [650697.716582] [<ffffffff810470da>] warn_slowpath_common+0x7a/0xb0
> [650697.716586] [<ffffffff81047125>] warn_slowpath_null+0x15/0x20
> [650697.716590] [<ffffffff810e71ba>] __set_page_dirty_nobuffers+0x10a/0x140
> [650697.716596] [<ffffffff81127eb8>] migrate_page_copy+0x1c8/0x1d0
> [650697.716600] [<ffffffff81127ef5>] migrate_page+0x35/0x50
> [650697.716623] [<ffffffffa04b6f19>] nfs_migrate_page+0x59/0xf0 [nfs]
> [650697.716627] [<ffffffff81127fb9>] move_to_new_page+0xa9/0x260
> [650697.716630] [<ffffffff811286bd>] migrate_pages+0x3fd/0x4c0
> [650697.716635] [<ffffffff8142988e>] ? apic_timer_interrupt+0xe/0x20
> [650697.716641] [<ffffffff8111cbf0>] ? ftrace_define_fields_mm_compaction_isolate_template+0x70/0x70
> [650697.716645] [<ffffffff8111d5da>] compact_zone+0x52a/0x8c0
> [650697.716649] [<ffffffff8111dade>] compact_zone_order+0x7e/0xb0
> [650697.716653] [<ffffffff8111dbcd>] try_to_compact_pages+0xbd/0xf0
> [650697.716657] [<ffffffff810e5148>] __alloc_pages_direct_compact+0xa8/0x180
> [650697.716661] [<ffffffff810e588d>] __alloc_pages_nodemask+0x66d/0x7f0
> [650697.716667] [<ffffffff8110a92d>] ? page_add_new_anon_rmap+0x9d/0xb0
> [650697.716671] [<ffffffff8111b865>] alloc_pages_vma+0x95/0x180
> [650697.716676] [<ffffffff8112c2f8>] do_huge_pmd_anonymous_page+0x138/0x310
> [650697.716680] [<ffffffff81102ace>] handle_mm_fault+0x21e/0x310
> [650697.716685] [<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
> [650697.716688] [<ffffffff811077a7>] ? do_mmap_pgoff+0x357/0x370
> [650697.716692] [<ffffffff8110790d>] ? sys_mmap_pgoff+0x14d/0x220
> [650697.716697] [<ffffffff811371b8>] ? do_sys_open+0x168/0x1d0
> [650697.716701] [<ffffffff81421d5f>] page_fault+0x1f/0x30
> [650697.716704] ---[ end trace 4255de435c6def21 ]---
>
> BR,
> Paweł.
>
Hi Paweł,
I had the same issues. Either try the latest patch that was recommended,
or try the older ones (I am using these three and have not had a memory
error/oops/etc. in 24 hours).
Before patches:
Aug 30 05:00:48 p34 kernel: [122150.720173] [<ffffffff8103798a>] warn_slowpath_common+0x7a/0xb0
Sep 10 20:59:39 p34 kernel: [531189.671424] [<ffffffff810379ba>] warn_slowpath_common+0x7a/0xb0
After patches:
(no errors)
Patches you need (against 3.1-rc4):
(for the igb problem/memory allocation issue)
0001-Fix-pointer-dereference-before-call-to-pcie_bus_conf.patch
0002-PCI-Remove-MRRS-modification-from-MPS-setting-code.patch
(for the RCU/memory errors)
0003-filemap.patch
I've attached them to this e-mail; they seem to have fixed all of my
problems so far.
Justin.
[-- Attachment #2: Type: TEXT/x-diff; name=0003-filemap.patch, Size: 2108 bytes --]
From eric.dumazet@gmail.com Wed Sep 14 06:20:11 2011
Date: Wed, 14 Sep 2011 06:20:08
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: Lin Ming <mlin@ss.pku.edu.cn>, linux-kernel@vger.kernel.org, Alan Piszcz <ap@solarrain.com>, "Li, Shaohua" <shaohua.li@intel.com>, Andrew Morton <akpm@google.com>
Subject: Re: 3.0.1: pagevec_lookup+0x1d/0x30, SLAB issues?
On Wednesday, 14 September 2011 at 05:47 -0400, Justin Piszcz wrote:
>
> On Wed, 14 Sep 2011, Lin Ming wrote:
>
> > On Mon, Sep 12, 2011 at 6:44 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Hi, Justin
> >
> > There is a similar bug report at:
> > http://marc.info/?t=131594190600005&r=1&w=2
> >
> > The attached patch from Shaohua fixed the bug.
> >
> > Could you give it a try?
> >
>
> Hi Lin/LKML,
>
> Can you please provide text patch files for what you want me to apply?
> I did read that e-mail thread and that could be the culprit; I will patch
> and apply as soon as someone points me to the patch locations :)
diff --git a/mm/filemap.c b/mm/filemap.c
index 645a080..7771871 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
{
unsigned int i;
unsigned int ret;
- unsigned int nr_found;
+ unsigned int nr_found, nr_skip;
rcu_read_lock();
restart:
nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
(void ***)pages, NULL, start, nr_pages);
ret = 0;
+ nr_skip = 0;
for (i = 0; i < nr_found; i++) {
struct page *page;
repeat:
@@ -856,6 +857,7 @@ repeat:
* here as an exceptional entry: so skip over it -
* we only reach this from invalidate_mapping_pages().
*/
+ nr_skip++;
continue;
}
@@ -876,7 +878,7 @@ repeat:
* If all entries were removed before we could secure them,
* try again, because callers stop trying once 0 is returned.
*/
- if (unlikely(!ret && nr_found))
+ if (unlikely(!ret && nr_found > nr_skip))
goto restart;
rcu_read_unlock();
return ret;
[-- Attachment #3: Type: TEXT/x-diff; name=0002-PCI-Remove-MRRS-modification-from-MPS-setting-code.patch, Size: 4518 bytes --]
From 74d81235f8e4bd60859d539a27e51d3a09d183cf Mon Sep 17 00:00:00 2001
From: Jon Mason <mason@myri.com>
Date: Thu, 8 Sep 2011 12:59:00 -0500
Subject: [PATCH 2/2] PCI: Remove MRRS modification from MPS setting code
Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has
massive negative ramifications on some devices. Without knowing which
devices have this issue, do not modify from the default value when
walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe
the default procedure.
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Simon Kirby <sim@hostway.ca>
Tested-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
Signed-off-by: Jon Mason <mason@myri.com>
---
drivers/pci/pci.c | 2 +-
drivers/pci/probe.c | 41 ++++++++++++++++++++++-------------------
2 files changed, 23 insertions(+), 20 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0ce6742..4e84fd4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -77,7 +77,7 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;
unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE;
unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE;
-enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PERFORMANCE;
+enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_SAFE;
/*
* The default CLS is used if arch didn't set CLS explicitly and not
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0820fc1..b1187ff 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1396,34 +1396,37 @@ static void pcie_write_mps(struct pci_dev *dev, int mps)
static void pcie_write_mrrs(struct pci_dev *dev, int mps)
{
- int rc, mrrs;
+ int rc, mrrs, dev_mpss;
- if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
- int dev_mpss = 128 << dev->pcie_mpss;
+ /* In the "safe" case, do not configure the MRRS. There appear to be
+ * issues with setting MRRS to 0 on a number of devices.
+ */
- /* For Max performance, the MRRS must be set to the largest
- * supported value. However, it cannot be configured larger
- * than the MPS the device or the bus can support. This assumes
- * that the largest MRRS available on the device cannot be
- * smaller than the device MPSS.
- */
- mrrs = mps < dev_mpss ? mps : dev_mpss;
- } else
- /* In the "safe" case, configure the MRRS for fairness on the
- * bus by making all devices have the same size
- */
- mrrs = mps;
+ if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
+ return;
+
+ dev_mpss = 128 << dev->pcie_mpss;
+ /* For Max performance, the MRRS must be set to the largest supported
+ * value. However, it cannot be configured larger than the MPS the
+ * device or the bus can support. This assumes that the largest MRRS
+ * available on the device cannot be smaller than the device MPSS.
+ */
+ mrrs = min(mps, dev_mpss);
/* MRRS is a R/W register. Invalid values can be written, but a
- * subsiquent read will verify if the value is acceptable or not.
+ * subsequent read will verify if the value is acceptable or not.
* If the MRRS value provided is not acceptable (e.g., too large),
* shrink the value until it is acceptable to the HW.
*/
while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) {
+ dev_warn(&dev->dev, "Attempting to modify the PCI-E MRRS value"
+ " to %d. If any issues are encountered, please try "
+ "running with pci=pcie_bus_safe\n", mrrs);
rc = pcie_set_readrq(dev, mrrs);
if (rc)
- dev_err(&dev->dev, "Failed attempting to set the MRRS\n");
+ dev_err(&dev->dev,
+ "Failed attempting to set the MRRS\n");
mrrs /= 2;
}
@@ -1436,13 +1439,13 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
if (!pci_is_pcie(dev))
return 0;
- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));
pcie_write_mps(dev, mps);
pcie_write_mrrs(dev, mps);
- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));
return 0;
--
1.7.6
[-- Attachment #4: Type: TEXT/x-diff; name=0001-Fix-pointer-dereference-before-call-to-pcie_bus_conf.patch, Size: 2416 bytes --]
From cf822aed99fd8851d82ae5f2df11c29b79e316c8 Mon Sep 17 00:00:00 2001
From: Shyam Iyer <shyam.iyer.t@gmail.com>
Date: Wed, 31 Aug 2011 12:21:42 -0400
Subject: [PATCH 1/2] Fix pointer dereference before call to
pcie_bus_configure_settings
There is a potential NULL pointer dereference in calls to
pcie_bus_configure_settings due to attempts to access pci_bus self
variables when the self pointer is NULL. To correct this, verify that
the self pointer in pci_bus is non-NULL before dereferencing it.
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Jon Mason <mason@myri.com>
---
arch/x86/pci/acpi.c | 9 +++++++--
drivers/pci/hotplug/pcihp_slot.c | 4 +++-
drivers/pci/probe.c | 3 ---
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index c953302..039d913 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -365,8 +365,13 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
*/
if (bus) {
struct pci_bus *child;
- list_for_each_entry(child, &bus->children, node)
- pcie_bus_configure_settings(child, child->self->pcie_mpss);
+ list_for_each_entry(child, &bus->children, node) {
+ struct pci_dev *self = child->self;
+ if (!self)
+ continue;
+
+ pcie_bus_configure_settings(child, self->pcie_mpss);
+ }
}
if (!bus)
diff --git a/drivers/pci/hotplug/pcihp_slot.c b/drivers/pci/hotplug/pcihp_slot.c
index 753b21a..3ffd9c1 100644
--- a/drivers/pci/hotplug/pcihp_slot.c
+++ b/drivers/pci/hotplug/pcihp_slot.c
@@ -169,7 +169,9 @@ void pci_configure_slot(struct pci_dev *dev)
(dev->class >> 8) == PCI_CLASS_BRIDGE_PCI)))
return;
- pcie_bus_configure_settings(dev->bus, dev->bus->self->pcie_mpss);
+ if (dev->bus && dev->bus->self)
+ pcie_bus_configure_settings(dev->bus,
+ dev->bus->self->pcie_mpss);
memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8473727..0820fc1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1456,9 +1456,6 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
{
u8 smpss = mpss;
- if (!bus->self)
- return;
-
if (!pci_is_pcie(bus->self))
return;
--
1.7.6
* Re: [BUG] infinite loop in find_get_pages()
2011-09-15 11:11 ` Justin Piszcz
@ 2011-09-15 12:04 ` Eric Dumazet
2011-09-15 15:00 ` Paweł Sikora
0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2011-09-15 12:04 UTC (permalink / raw)
To: Justin Piszcz
Cc: Pawel Sikora, Lin Ming, Andrew Morton, Linus Torvalds,
linux-kernel, Andrew Morton, Toshiyuki Okajima, Dave Chinner,
Hugh Dickins, Alan Piszcz
On Thursday, 15 September 2011 at 07:11 -0400, Justin Piszcz wrote:
>
> Before patches:
> Aug 30 05:00:48 p34 kernel: [122150.720173] [<ffffffff8103798a>] warn_slowpath_common+0x7a/0xb0
> Sep 10 20:59:39 p34 kernel: [531189.671424] [<ffffffff810379ba>] warn_slowpath_common+0x7a/0xb0
>
> After patches:
> (no errors)
>
> Patches you need (against 3.1-rc4):
>
> (for the igb problem/memory allocation issue)
> 0001-Fix-pointer-dereference-before-call-to-pcie_bus_conf.patch
> 0002-PCI-Remove-MRRS-modification-from-MPS-setting-code.patch
>
> (for the RCU/memory errors)
> 0003-filemap.patch
>
> I've attached them to this e-mail, they seem to have fixed all of my
> problems so far.
>
Or just pull the latest Linus tree. No need to repost those patches over and
over ;)
From your local copy, do:
git pull https://github.com/torvalds/linux.git
* Re: [BUG] infinite loop in find_get_pages()
2011-09-15 12:04 ` Eric Dumazet
@ 2011-09-15 15:00 ` Paweł Sikora
2011-09-15 15:15 ` Eric Dumazet
0 siblings, 1 reply; 24+ messages in thread
From: Paweł Sikora @ 2011-09-15 15:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: Justin Piszcz, Lin Ming, Andrew Morton, Linus Torvalds,
linux-kernel, Andrew Morton, Toshiyuki Okajima, Dave Chinner,
Hugh Dickins, Alan Piszcz
On Thu, 15 Sep 2011 14:04:15 +0200, Eric Dumazet wrote:
> Or just pull the latest Linus tree. No need to repost those patches over
> and over ;)
>
> From your local copy, do:
>
> git pull https://github.com/torvalds/linux.git
I'm using the 3.0.x line and the mentioned patch won't be helpful
(https://lkml.org/lkml/2011/9/14/271).
* Re: [BUG] infinite loop in find_get_pages()
2011-09-15 15:00 ` Paweł Sikora
@ 2011-09-15 15:15 ` Eric Dumazet
0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2011-09-15 15:15 UTC (permalink / raw)
To: Paweł Sikora
Cc: Justin Piszcz, Lin Ming, Andrew Morton, Linus Torvalds,
linux-kernel, Andrew Morton, Toshiyuki Okajima, Dave Chinner,
Hugh Dickins, Alan Piszcz
On Thursday, 15 September 2011 at 17:00 +0200, Paweł Sikora wrote:
> On Thu, 15 Sep 2011 14:04:15 +0200, Eric Dumazet wrote:
>
> > Or just pull the latest Linus tree. No need to repost those patches over
> > and over ;)
> >
> > From your local copy, do:
> >
> > git pull https://github.com/torvalds/linux.git
>
> I'm using the 3.0.x line and the mentioned patch won't be helpful
> (https://lkml.org/lkml/2011/9/14/271).
>
All mentioned patches are for 3.1 only.
end of thread
Thread overview: 24+ messages
2011-09-13 19:23 [BUG] infinite loop in find_get_pages() Eric Dumazet
2011-09-13 23:53 ` Andrew Morton
2011-09-14 0:21 ` Eric Dumazet
2011-09-14 0:34 ` Lin Ming
2011-09-15 10:47 ` Pawel Sikora
2011-09-15 11:11 ` Justin Piszcz
2011-09-15 12:04 ` Eric Dumazet
2011-09-15 15:00 ` Paweł Sikora
2011-09-15 15:15 ` Eric Dumazet
[not found] ` <CA+55aFyG3-3_gqGjqUmsTAHWfmNLMdQVf4XqUZrDAGMBxgur=Q@mail.gmail.com>
2011-09-14 6:48 ` Linus Torvalds
2011-09-14 6:53 ` Eric Dumazet
2011-09-14 7:32 ` Shaohua Li
2011-09-14 8:20 ` Shaohua Li
2011-09-14 8:43 ` Eric Dumazet
2011-09-14 8:55 ` Shaohua Li
2011-09-14 20:38 ` Hugh Dickins
2011-09-14 20:55 ` Eric Dumazet
2011-09-14 21:53 ` Hugh Dickins
2011-09-14 22:08 ` Eric Dumazet
2011-09-14 22:37 ` Linus Torvalds
2011-09-15 0:45 ` Shaohua Li
2011-09-15 2:00 ` Hugh Dickins
2011-09-15 4:02 ` Eric Dumazet
[not found] ` <CA+55aFx41_Z4TjjJwPuE21Q8oD3aGWtQwh45DUiCjPVD-wCJXw@mail.gmail.com>
2011-09-14 6:48 ` Linus Torvalds