* linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-04-30 22:37 ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-04-30 22:37 UTC
To: Paul E. McKenney; +Cc: Benjamin Herrenschmidt, linux-kernel, linuxppc-dev

Hi Paul,

On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
"BUG: sleeping function called from invalid context" messages from
__might_sleep().

Just once I saw such a message during startup.  Once I saw such a
message when rebuilding the machine's kernel.  Usually I see them
when I'm running a swapping load of kernel builds under memory
pressure (but that's what I'm habitually running there): perhaps
after a few minutes a flurry comes, then goes away, comes back
again later, and after perhaps a couple of hours of that I see
"INFO: rcu_preempt detected stalls" messages too, and soon it
freezes (or perhaps it's still running, but I'm so flooded by
messages that I reboot anyway).

Rather like before you fixed schedule_tail() for your per-cpu
RCU mods, but not so easy to reproduce.  I did a bisection and indeed
it converged as expected on the RCU changes.  No such problem seen on
x86: it looks as if there's some further tweak required on PowerPC.

Here are my RCU config options (I don't usually have the TORTURE_TEST
in, but tried that for half an hour this morning, in the hope that it
would generate the issue: but it did not).

# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
# CONFIG_RCU_BOOST is not set
CONFIG_HAVE_RCU_TABLE_FREE=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_CPU_STALL_VERBOSE is not set
# CONFIG_RCU_CPU_STALL_INFO is not set
CONFIG_RCU_TRACE=y

Here's the message when I was rebuilding the G5's kernel:

BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
Call Trace:
[c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
[c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
[c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
[c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
[c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
[c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30

I've plenty more examples, most of them from page faults or from kswapd;
but I don't think there's any more useful information in them.

Anything I can try later on?

Thanks!
Hugh
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-04-30 22:37 ` Hugh Dickins
@ 2012-04-30 23:14 ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-04-30 23:14 UTC
To: Hugh Dickins
Cc: Paul E. McKenney, Benjamin Herrenschmidt, linux-kernel, linuxppc-dev

On Mon, Apr 30, 2012 at 03:37:10PM -0700, Hugh Dickins wrote:
> Hi Paul,
>
> On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
> rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
> CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
> "BUG: sleeping function called from invalid context" messages from
> __might_sleep().
>
> Just once I saw such a message during startup.  Once I saw such a
> message when rebuilding the machine's kernel.  Usually I see them
> when I'm running a swapping load of kernel builds under memory
> pressure (but that's what I'm habitually running there): perhaps
> after a few minutes a flurry comes, then goes away, comes back
> again later, and after perhaps a couple of hours of that I see
> "INFO: rcu_preempt detected stalls" messages too, and soon it
> freezes (or perhaps it's still running, but I'm so flooded by
> messages that I reboot anyway).
>
> Rather like from before you fixed schedule_tail() for your per-cpu
> RCU mods, but not so easy to reproduce.  I did a bisection and indeed
> it converged as expected on the RCU changes.  No such problem seen on
> x86: it looks as if there's some further tweak required on PowerPC.
>
> Here are my RCU config options (I don't usually have the TORTURE_TEST
> in, but tried that for half an hour this morning, in the hope that it
> would generate the issue: but it did not).
>
> # RCU Subsystem
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_FANOUT=64
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_TREE_RCU_TRACE=y
> # CONFIG_RCU_BOOST is not set
> CONFIG_HAVE_RCU_TABLE_FREE=y
> # CONFIG_SPARSE_RCU_POINTER is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> # CONFIG_RCU_CPU_STALL_VERBOSE is not set
> # CONFIG_RCU_CPU_STALL_INFO is not set
> CONFIG_RCU_TRACE=y
>
> Here's the message when I was rebuilding the G5's kernel:
>
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
>
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
>
> Anything I can try later on?

Interesting...  As you say, I saw this sort of thing before applying the
changes to schedule_tail(), and it is all too possible that there is some
other "sneak path" for context switches.

Have you tried running with CONFIG_PROVE_RCU?  This enables some
additional debugging in rcu_switch_from() and rcu_switch_to() that
helped track down the schedule_tail() problem.

Thanx, Paul
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-04-30 22:37 ` Hugh Dickins
@ 2012-05-01  0:33 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-01 0:33 UTC
To: Hugh Dickins; +Cc: Paul E. McKenney, linux-kernel, linuxppc-dev

On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
>
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1

Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah, it smells
like a preempt count problem ... odd.

Did you get a specific bisect target yet?

Cheers,
Ben.

> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
>
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
>
> Anything I can try later on?
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01  0:33 ` Benjamin Herrenschmidt
@ 2012-05-01  5:10 ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-01 5:10 UTC
To: Benjamin Herrenschmidt; +Cc: Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> >
> > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
>
> Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> like a preempt count problem... odd.
>
> Did you get a specific bisect target yet ?

Oh, I went as far as we need, I think, but I didn't bother quite to
complete it because, once in that area, we know the schedule_tail()
omission would muddy the waters: the tail of my bisect log was

# bad: [e798cf3385d3aa7c84afa65677eb92e0c0876dfd] rcu: Add exports for per-CPU variables used for inlining
git bisect bad e798cf3385d3aa7c84afa65677eb92e0c0876dfd
# good: [90aec3b06194393c909e3e5a47b6ed99bb8caba5] rcu: Make exit_rcu() more precise and consolidate
git bisect good 90aec3b06194393c909e3e5a47b6ed99bb8caba5

from which I concluded that the patch responsible is

commit ab8fc41a8545d40a4b58d745876c125af72a8a5c
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Fri Apr 13 14:32:01 2012 -0700

    rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables

    This commit is another step towards inlinable __rcu_read_lock() and
    __rcu_read_unlock() functions for preemptible RCU.  This keeps these
    two functions out of line, but switches them to use the per-CPU
    variables that are required to export their definitions without
    requiring that all RCU users include sched.h.  These per-CPU variables
    are saved and restored at context-switch time.

>
> Cheers,
> Ben.

> > Call Trace:
> > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> >
> > I've plenty more examples, most of them from page faults or from kswapd;
> > but I don't think there's any more useful information in them.
> >
> > Anything I can try later on?

I'd forgotten about CONFIG_PROVE_RCU (and hadn't been using PROVE_LOCKING
on that machine), but following Paul's suggestion have now turned them on.

But not much light shed, I'm afraid.  Within minutes it showed a trace
exactly like the one above, but the only thing PROVE_LOCKING and PROVE_RCU
had to say was that we're holding mmap_sem at that point, which is no
surprise and not a problem, just something lockdep is right to note.

That was an isolated occurrence, it continued quietly for maybe 20 minutes,
then output lots to the console screen - but garbled in a way I've not
seen before - the 0s came out just right (or perhaps all the hex digits
were being shown as 0s), but most everything else was grayly unreadable.
Then after a few minutes, spontaneously rebooted.

Perhaps I should remind myself of netdump; but getting the trace above
without complaint from PROVE_RCU tells me that it is not helping.

Hugh
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01  5:10 ` Hugh Dickins
@ 2012-05-01 14:22 ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 14:22 UTC
To: Hugh Dickins
Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > >
> > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> >
> > Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> > like a preempt count problem... odd.
> >
> > Did you get a specific bisect target yet ?
>
> Oh, I went as far as we need, I think, but I didn't bother quite to
> complete it because, once in that area, we know the schedule_tail()
> omission would muddy the waters: the tail of my bisect log was

Agreed, your bisect is close enough for our purposes.

> # bad: [e798cf3385d3aa7c84afa65677eb92e0c0876dfd] rcu: Add exports for per-CPU variables used for inlining
> git bisect bad e798cf3385d3aa7c84afa65677eb92e0c0876dfd
> # good: [90aec3b06194393c909e3e5a47b6ed99bb8caba5] rcu: Make exit_rcu() more precise and consolidate
> git bisect good 90aec3b06194393c909e3e5a47b6ed99bb8caba5
>
> from which I concluded that the patch responsible is
>
> commit ab8fc41a8545d40a4b58d745876c125af72a8a5c
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Fri Apr 13 14:32:01 2012 -0700
>
>     rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables
>
>     This commit is another step towards inlinable __rcu_read_lock() and
>     __rcu_read_unlock() functions for preemptible RCU.  This keeps these
>     two functions out of line, but switches them to use the per-CPU
>     variables that are required to export their definitions without
>     requiring that all RCU users include sched.h.  These per-CPU
>     variables are saved and restored at context-switch time.
>
> > Cheers,
> > Ben.
>
> > > Call Trace:
> > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > >
> > > I've plenty more examples, most of them from page faults or from kswapd;
> > > but I don't think there's any more useful information in them.
> > >
> > > Anything I can try later on?
>
> I'd forgotten about CONFIG_PROVE_RCU (and hadn't been using PROVE_LOCKING
> on that machine), but following Paul's suggestion have now turned them on.
>
> But not much light shed, I'm afraid.  Within minutes it showed a trace
> exactly like the one above, but the only thing PROVE_LOCKING and PROVE_RCU
> had to say was that we're holding mmap_sem at that point, which is no
> surprise and not a problem, just something lockdep is right to note.
>
> That was an isolated occurrence, it continued quietly for maybe 20 minutes,
> then output lots to the console screen - but garbled in a way I've not
> seen before - the 0s came out just right (or perhaps all the hex digits
> were being shown as 0s), but most everything else was grayly unreadable.
> Then after a few minutes, spontaneously rebooted.
>
> Perhaps I should remind myself of netdump; but getting the trace above
> without complaint from PROVE_RCU tells me that it is not helping.

My guess is that the following happened:

1.	Task A is running, with its state in RCU's per-CPU variables.

2.	Task A creates Task B and switches to it, but without invoking
	schedule_tail() or schedule().  Task B is now running, with
	Task A's state in RCU's per-CPU variables.

3.	Task B switches context, saving Task A's per-CPU RCU variables
	(with modifications by Task B, just for fun).

4.	Task A starts running again, and loads obsolete versions of its
	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
	to be invoked at inappropriate times, which could cause pretty
	arbitrary misbehavior.

5.	Mismatched values for the RCU read-side nesting could cause the
	read-side critical section to complete prematurely, which could
	cause all manner of mischief.  However, I would expect this to
	trigger the WARN_ON_ONCE() in __rcu_read_unlock().

Hmmm...

Thanx, Paul
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01 14:22 ` Paul E. McKenney
@ 2012-05-01 21:42 ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-01 21:42 UTC
To: Paul E. McKenney
Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, 1 May 2012, Paul E. McKenney wrote:
> On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> > On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > >
> > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > Call Trace:
> > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
>
> My guess is that the following happened:
>
> 1.	Task A is running, with its state in RCU's per-CPU variables.
>
> 2.	Task A creates Task B and switches to it, but without invoking
> 	schedule_tail() or schedule().  Task B is now running, with
> 	Task A's state in RCU's per-CPU variables.
>
> 3.	Task B switches context, saving Task A's per-CPU RCU variables
> 	(with modifications by Task B, just for fun).
>
> 4.	Task A starts running again, and loads obsolete versions of its
> 	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
> 	to be invoked at inappropriate times, which could cause
> 	pretty arbitrary misbehavior.
>
> 5.	Mismatched values for the RCU read-side nesting could cause
> 	the read-side critical section to complete prematurely, which
> 	could cause all manner of mischief.  However, I would expect
> 	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().
>
> Hmmm...

I didn't find anything corresponding to that under arch/powerpc.

There is something I found, that I had high hopes for, but sadly no, it
does not fix it.

I may be wrong, it's a long time since I thought about what happens in
fork().  But I believe the rcu_switch_from(prev) you added to
schedule_tail() is bogus: doesn't schedule_tail() more or less amount to
a jump into schedule(), for the child to be as if it's emerging from the
switch_to() in schedule()?

So I think the rcu_switch_from(prev) has already been done by the prev
task on the cpu, as it goes into switch_to() in schedule().  So at best
you're duplicating that in schedule_tail(), and at worst (I don't know
if the prev task can get far enough away for this to matter) you're
messing with its state.  Probably difficult to do any harm (those fields
don't matter while it's on another cpu, and it has to get on another cpu
for them to change), but it does look wrong to me.

But commenting out that line did not fix my issues.  And if you agree,
you'll probably prefer, not to comment out the line, but revert back to
rcu_switch_from(void) and just add the rcu_switch_to() to schedule_tail().

Something else that I noticed in comments on rcu_switch_from() in sched.h
(er, is sched.h really the right place for this code?): it says "if
rcu_read_unlock_special is zero, then rcu_read_lock_nesting must also
be zero" - shouldn't that say "non-zero" in each case?

I must turn away from this right now, and come back to it later.

Hugh
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-01 21:42 ` Hugh Dickins @ 2012-05-01 23:25 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-01 23:25 UTC (permalink / raw) To: Hugh Dickins Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev On Tue, May 01, 2012 at 02:42:02PM -0700, Hugh Dickins wrote: > On Tue, 1 May 2012, Paul E. McKenney wrote: > > On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote: > > > On Tue, 1 May 2012, Benjamin Herrenschmidt wrote: > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote: > > > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354 > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1 > > > > > Call Trace: > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable) > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134 > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494 > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684 > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664 > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30 > > > > > > > > My guess is that the following happened: > > > > 1. Task A is running, with its state in RCU's per-CPU variables. > > > > 2. Task A creates Task B and switches to it, but without invoking > > schedule_tail() or schedule(). Task B is now running, with > > Task A's state in RCU's per-CPU variables. > > > > 3. Task B switches context, saving Task A's per-CPU RCU variables > > (with modifications by Task B, just for fun). > > > > 4. Task A starts running again, and loads obsolete versions of its > > per-CPU RCU variables. This can cause rcu_read_unlock_do_special() > > to be invoked at inappropriate times, which could cause > > pretty arbitrary misbehavior. > > > > 5. 
Mismatched values for the RCU read-side nesting could cause > > the read-side critical section to complete prematurely, which > > could cause all manner of mischief. However, I would expect > > this to trigger the WARN_ON_ONCE() in __rcu_read_unlock(). > > > > Hmmm... > > I didn't find anything corresponding to that under arch/powerpc. Nor did I. :-( > There is something I found, that I had high hopes for, but sadly no, > it does not fix it. I may be wrong, it's a long time since I thought > about what happens in fork(). But I believe the rcu_switch_from(prev) > you added to schedule_tail() is bogus: doesn't schedule_tail() more or > less amount to a jump into schedule(), for the child to be as if it's > emerging from the switch_to() in schedule()? > > So I think the rcu_switch_from(prev) has already been done by the prev > task on the cpu, as it goes into switch_to() in schedule(). So at best > you're duplicating that in schedule_tail(), and at worst (I don't know > if the prev task can get far enough away for this to matter) you're > messing with its state. Probably difficult to do any harm (those fields > don't matter while it's on another cpu, and it has to get on another cpu > for them to change), but it does look wrong to me. I do believe that you are correct. /me wonders if it was really such a good idea to tie RCU this closely to the scheduler... I also agree that the chance of error is small. The parent would have to be blocked for the child to have any probability of doing harm, but we need the probability to be zero, which it does not appear to be. I will semi-revert this change as you suggest. > But commenting out that line did not fix my issues. And if you agree, > you'll probably prefer, not to comment out the line, but revert back to > rcu_switch_from(void) and just add the rcu_switch_to() to schedule_tail(). 
> > Something else that I noticed in comments on rcu_switch_from() in > sched.h (er, is sched.h really the right place for this code?): it says > "if rcu_read_unlock_special is zero, then rcu_read_lock_nesting must > also be zero" - shouldn't that say "non-zero" in each case? No, but the variables should be reversed. It should read "Both cases rely on the fact that if rcu_read_lock_nesting is zero, then rcu_read_unlock_special must also be zero." Here are the two cases: 1. rcu_read_lock_nesting is zero: Then rcu_read_unlock_special must also be zero, so there is no way to get to the rcu_read_unlock_do_special() function. A scheduling-clock interrupt might later set one of the rcu_read_unlock_special bits, but only RCU_READ_UNLOCK_BLOCKED, which is OK because rcu_read_unlock_do_special() does not mess with the scheduler in this case. 2. rcu_read_lock_nesting is non-zero: Then the task is blocking within an RCU read-side critical section, so none of the scheduler's or architecture's read-side critical sections can have the outermost rcu_read_unlock(), so the rcu_read_unlock_do_special() function will not be invoked in the first place. Thank you for catching this! > I must turn away from this right now, and come back to it later. Thank you very much for all your help with this! Thanx, Paul ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-01 23:25 ` Paul E. McKenney @ 2012-05-02 20:25 ` Hugh Dickins -1 siblings, 0 replies; 40+ messages in thread From: Hugh Dickins @ 2012-05-02 20:25 UTC (permalink / raw) To: Paul E. McKenney Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev On Tue, 1 May 2012, Paul E. McKenney wrote: > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote: > > > > > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354 > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1 > > > > > > Call Trace: > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable) > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134 > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494 > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684 > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664 > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30 Got it at last. Embarrassingly obvious. __rcu_read_lock() and __rcu_read_unlock() are not safe to be using __this_cpu operations, the cpu may change in between the rmw's read and write: they should be using this_cpu operations (or, I put preempt_disable/enable in the __rcu_read_unlock below). __this_cpus there work out fine on x86, which was given good instructions to use; but not so well on PowerPC. I've been running successfully for an hour now with the patch below; but I expect you'll want to consider the tradeoffs, and may choose a different solution. 
Hugh

--- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
+++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
@@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
  */
 static inline void __rcu_read_lock(void)
 {
-	__this_cpu_inc(rcu_read_lock_nesting);
+	this_cpu_inc(rcu_read_lock_nesting);
 	barrier(); /* Keep code within RCU read-side critical section. */
 }

--- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
+++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
@@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
  */
 void __rcu_read_unlock(void)
 {
+	preempt_disable();
 	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
 		__this_cpu_dec(rcu_read_lock_nesting);
 	else {
@@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
 		barrier(); /* ->rcu_read_unlock_special load before assign */
 		__this_cpu_write(rcu_read_lock_nesting, 0);
 	}
-#ifdef CONFIG_PROVE_LOCKING
+#if 1 /* CONFIG_PROVE_LOCKING */
 	{
 		int rln = __this_cpu_read(rcu_read_lock_nesting);

-		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
+		BUG_ON(rln < 0 && rln > INT_MIN / 2);
 	}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
+	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);

--- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
+++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
@@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
 {
 	struct rq *rq = this_rq();

-	rcu_switch_from(prev);
+	/* rcu_switch_from(prev); */
 	rcu_switch_to();
 	finish_task_switch(rq, prev);

@@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
+		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
+		raw_smp_processor_id(), preempt_count(), preempt_offset,
+		rcu_preempt_depth(), current->rcu_read_lock_nesting_save);
+	printk(KERN_ERR
 		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
 		in_atomic(), irqs_disabled(),
 		current->pid, current->comm);
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 20:25 ` Hugh Dickins @ 2012-05-02 20:49 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-02 20:49 UTC (permalink / raw) To: Hugh Dickins Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote: > On Tue, 1 May 2012, Paul E. McKenney wrote: > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote: > > > > > > > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354 > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1 > > > > > > > Call Trace: > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable) > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134 > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494 > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684 > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664 > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30 > > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > __rcu_read_unlock() are not safe to be using __this_cpu operations, > the cpu may change in between the rmw's read and write: they should > be using this_cpu operations (or, I put preempt_disable/enable in the > __rcu_read_unlock below). __this_cpus there work out fine on x86, > which was given good instructions to use; but not so well on PowerPC. Thank you very much for tracking this down!!! > I've been running successfully for an hour now with the patch below; > but I expect you'll want to consider the tradeoffs, and may choose a > different solution. The thing that puzzles me about this is that the normal path through the scheduler would save and restore these per-CPU variables to and from the task structure. 
There must be a sneak path through the scheduler that I failed to account for. But given your good work, this should be easy for me to chase down even on my x86-based laptop -- just convert from __this_cpu_inc() to a read-inc-delay-write sequence. And check that the underlying variable didn't change in the meantime. And dump an ftrace if it did. ;-) Thank you again, Hugh! Thanx, Paul > Hugh > > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h 2012-04-28 09:26:38.000000000 -0700 > +++ testing/include/linux/rcupdate.h 2012-05-02 11:46:06.000000000 -0700 > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc > */ > static inline void __rcu_read_lock(void) > { > - __this_cpu_inc(rcu_read_lock_nesting); > + this_cpu_inc(rcu_read_lock_nesting); > barrier(); /* Keep code within RCU read-side critical section. */ > } > > --- 3.4-rc4-next-20120427/kernel/rcupdate.c 2012-04-28 09:26:40.000000000 -0700 > +++ testing/kernel/rcupdate.c 2012-05-02 11:44:13.000000000 -0700 > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu > */ > void __rcu_read_unlock(void) > { > + preempt_disable(); > if (__this_cpu_read(rcu_read_lock_nesting) != 1) > __this_cpu_dec(rcu_read_lock_nesting); > else { > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void) > barrier(); /* ->rcu_read_unlock_special load before assign */ > __this_cpu_write(rcu_read_lock_nesting, 0); > } > -#ifdef CONFIG_PROVE_LOCKING > +#if 1 /* CONFIG_PROVE_LOCKING */ > { > int rln = __this_cpu_read(rcu_read_lock_nesting); > > - WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2); > + BUG_ON(rln < 0 && rln > INT_MIN / 2); > } > #endif /* #ifdef CONFIG_PROVE_LOCKING */ > + preempt_enable(); > } > EXPORT_SYMBOL_GPL(__rcu_read_unlock); > > --- 3.4-rc4-next-20120427/kernel/sched/core.c 2012-04-28 09:26:40.000000000 -0700 > +++ testing/kernel/sched/core.c 2012-05-01 22:40:46.000000000 -0700 > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas > { > struct rq *rq = this_rq(); > > - rcu_switch_from(prev); > + /* 
rcu_switch_from(prev); */ > rcu_switch_to(); > finish_task_switch(rq, prev); > > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int > "BUG: sleeping function called from invalid context at %s:%d\n", > file, line); > printk(KERN_ERR > + "cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n", > + raw_smp_processor_id(), preempt_count(), preempt_offset, > + rcu_preempt_depth(), current->rcu_read_lock_nesting_save); > + printk(KERN_ERR > "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n", > in_atomic(), irqs_disabled(), > current->pid, current->comm); > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 20:49 ` Paul E. McKenney @ 2012-05-02 21:32 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-02 21:32 UTC (permalink / raw) To: Hugh Dickins Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev On Wed, May 02, 2012 at 01:49:54PM -0700, Paul E. McKenney wrote: > On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote: > > On Tue, 1 May 2012, Paul E. McKenney wrote: > > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote: > > > > > > > > > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354 > > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1 > > > > > > > > Call Trace: > > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable) > > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134 > > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494 > > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684 > > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664 > > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30 > > > > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > > __rcu_read_unlock() are not safe to be using __this_cpu operations, > > the cpu may change in between the rmw's read and write: they should > > be using this_cpu operations (or, I put preempt_disable/enable in the > > __rcu_read_unlock below). __this_cpus there work out fine on x86, > > which was given good instructions to use; but not so well on PowerPC. > > Thank you very much for tracking this down!!! > > > I've been running successfully for an hour now with the patch below; > > but I expect you'll want to consider the tradeoffs, and may choose a > > different solution. 
> > The thing that puzzles me about this is that the normal path through > the scheduler would save and restore these per-CPU variables to and > from the task structure. There must be a sneak path through the > scheduler that I failed to account for. Sigh... I am slow today, I guess. The preemption could of course happen between the time that the task calculated the address of the per-CPU variable and the time that it modified it. If this happens, we are modifying some other CPU's per-CPU variable. Given that Linus saw no performance benefit from this patchset, I am strongly tempted to just drop this inlinable-__rcu_read_lock() series at this point. I suppose that the other option is to move preempt_count() to a per-CPU variable, then use the space in the task_info struct. But that didn't generate anywhere near as good of code... Thanx, Paul > But given your good work, this should be easy for me to chase down > even on my x86-based laptop -- just convert from __this_cpu_inc() to a > read-inc-delay-write sequence. And check that the underlying variable > didn't change in the meantime. And dump an ftrace if it did. ;-) > > Thank you again, Hugh! > > Thanx, Paul > > > Hugh > > > > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h 2012-04-28 09:26:38.000000000 -0700 > > +++ testing/include/linux/rcupdate.h 2012-05-02 11:46:06.000000000 -0700 > > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc > > */ > > static inline void __rcu_read_lock(void) > > { > > - __this_cpu_inc(rcu_read_lock_nesting); > > + this_cpu_inc(rcu_read_lock_nesting); > > barrier(); /* Keep code within RCU read-side critical section. 
*/ > > } > > > > --- 3.4-rc4-next-20120427/kernel/rcupdate.c 2012-04-28 09:26:40.000000000 -0700 > > +++ testing/kernel/rcupdate.c 2012-05-02 11:44:13.000000000 -0700 > > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu > > */ > > void __rcu_read_unlock(void) > > { > > + preempt_disable(); > > if (__this_cpu_read(rcu_read_lock_nesting) != 1) > > __this_cpu_dec(rcu_read_lock_nesting); > > else { > > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void) > > barrier(); /* ->rcu_read_unlock_special load before assign */ > > __this_cpu_write(rcu_read_lock_nesting, 0); > > } > > -#ifdef CONFIG_PROVE_LOCKING > > +#if 1 /* CONFIG_PROVE_LOCKING */ > > { > > int rln = __this_cpu_read(rcu_read_lock_nesting); > > > > - WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2); > > + BUG_ON(rln < 0 && rln > INT_MIN / 2); > > } > > #endif /* #ifdef CONFIG_PROVE_LOCKING */ > > + preempt_enable(); > > } > > EXPORT_SYMBOL_GPL(__rcu_read_unlock); > > > > --- 3.4-rc4-next-20120427/kernel/sched/core.c 2012-04-28 09:26:40.000000000 -0700 > > +++ testing/kernel/sched/core.c 2012-05-01 22:40:46.000000000 -0700 > > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas > > { > > struct rq *rq = this_rq(); > > > > - rcu_switch_from(prev); > > + /* rcu_switch_from(prev); */ > > rcu_switch_to(); > > finish_task_switch(rq, prev); > > > > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int > > "BUG: sleeping function called from invalid context at %s:%d\n", > > file, line); > > printk(KERN_ERR > > + "cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n", > > + raw_smp_processor_id(), preempt_count(), preempt_offset, > > + rcu_preempt_depth(), current->rcu_read_lock_nesting_save); > > + printk(KERN_ERR > > "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n", > > in_atomic(), irqs_disabled(), > > current->pid, current->comm); > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to 
majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 21:32 ` Paul E. McKenney @ 2012-05-02 21:36 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-02 21:36 UTC (permalink / raw) To: Hugh Dickins Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev On Wed, May 02, 2012 at 02:32:38PM -0700, Paul E. McKenney wrote: > On Wed, May 02, 2012 at 01:49:54PM -0700, Paul E. McKenney wrote: > > On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote: > > > On Tue, 1 May 2012, Paul E. McKenney wrote: > > > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote: > > > > > > > > > > > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354 > > > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1 > > > > > > > > > Call Trace: > > > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable) > > > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134 > > > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494 > > > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684 > > > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664 > > > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30 > > > > > > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > > > __rcu_read_unlock() are not safe to be using __this_cpu operations, > > > the cpu may change in between the rmw's read and write: they should > > > be using this_cpu operations (or, I put preempt_disable/enable in the > > > __rcu_read_unlock below). __this_cpus there work out fine on x86, > > > which was given good instructions to use; but not so well on PowerPC. > > > > Thank you very much for tracking this down!!! 
> > > > > I've been running successfully for an hour now with the patch below; > > > but I expect you'll want to consider the tradeoffs, and may choose a > > > different solution. > > > > The thing that puzzles me about this is that the normal path through > > the scheduler would save and restore these per-CPU variables to and > > from the task structure. There must be a sneak path through the > > scheduler that I failed to account for. > > Sigh... I am slow today, I guess. The preemption could of course > happen between the time that the task calculated the address of the > per-CPU variable and the time that it modified it. If this happens, > we are modifying some other CPU's per-CPU variable. > > Given that Linus saw no performance benefit from this patchset, I am > strongly tempted to just drop this inlinable-__rcu_read_lock() series > at this point. > > I suppose that the other option is to move preempt_count() to a > per-CPU variable, then use the space in the task_info struct. > But that didn't generate anywhere near as good of code... But preempt_count() would suffer exactly the same problem. The address is calculated, the task moves to some other CPU, and then the task is messing with some other CPU's preemption counter. Blech. Thanx, Paul > > But given your good work, this should be easy for me to chase down > > even on my x86-based laptop -- just convert from __this_cpu_inc() to a > > read-inc-delay-write sequence. And check that the underlying variable > > didn't change in the meantime. And dump an ftrace if it did. ;-) > > > > Thank you again, Hugh! 
> > > > Thanx, Paul > > > > > Hugh > > > > > > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h 2012-04-28 09:26:38.000000000 -0700 > > > +++ testing/include/linux/rcupdate.h 2012-05-02 11:46:06.000000000 -0700 > > > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc > > > */ > > > static inline void __rcu_read_lock(void) > > > { > > > - __this_cpu_inc(rcu_read_lock_nesting); > > > + this_cpu_inc(rcu_read_lock_nesting); > > > barrier(); /* Keep code within RCU read-side critical section. */ > > > } > > > > > > --- 3.4-rc4-next-20120427/kernel/rcupdate.c 2012-04-28 09:26:40.000000000 -0700 > > > +++ testing/kernel/rcupdate.c 2012-05-02 11:44:13.000000000 -0700 > > > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu > > > */ > > > void __rcu_read_unlock(void) > > > { > > > + preempt_disable(); > > > if (__this_cpu_read(rcu_read_lock_nesting) != 1) > > > __this_cpu_dec(rcu_read_lock_nesting); > > > else { > > > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void) > > > barrier(); /* ->rcu_read_unlock_special load before assign */ > > > __this_cpu_write(rcu_read_lock_nesting, 0); > > > } > > > -#ifdef CONFIG_PROVE_LOCKING > > > +#if 1 /* CONFIG_PROVE_LOCKING */ > > > { > > > int rln = __this_cpu_read(rcu_read_lock_nesting); > > > > > > - WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2); > > > + BUG_ON(rln < 0 && rln > INT_MIN / 2); > > > } > > > #endif /* #ifdef CONFIG_PROVE_LOCKING */ > > > + preempt_enable(); > > > } > > > EXPORT_SYMBOL_GPL(__rcu_read_unlock); > > > > > > --- 3.4-rc4-next-20120427/kernel/sched/core.c 2012-04-28 09:26:40.000000000 -0700 > > > +++ testing/kernel/sched/core.c 2012-05-01 22:40:46.000000000 -0700 > > > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas > > > { > > > struct rq *rq = this_rq(); > > > > > > - rcu_switch_from(prev); > > > + /* rcu_switch_from(prev); */ > > > rcu_switch_to(); > > > finish_task_switch(rq, prev); > > > > > > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int > > > "BUG: 
sleeping function called from invalid context at %s:%d\n", > > > file, line); > > > printk(KERN_ERR > > > + "cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n", > > > + raw_smp_processor_id(), preempt_count(), preempt_offset, > > > + rcu_preempt_depth(), current->rcu_read_lock_nesting_save); > > > + printk(KERN_ERR > > > "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n", > > > in_atomic(), irqs_disabled(), > > > current->pid, current->comm); > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Please read the FAQ at http://www.tux.org/lkml/ > > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 20:25 ` Hugh Dickins @ 2012-05-02 21:20 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 40+ messages in thread From: Benjamin Herrenschmidt @ 2012-05-02 21:20 UTC (permalink / raw) To: Hugh Dickins Cc: Paul E. McKenney, Paul E. McKenney, linux-kernel, linuxppc-dev On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote: > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > __rcu_read_unlock() are not safe to be using __this_cpu operations, > the cpu may change in between the rmw's read and write: they should > be using this_cpu operations (or, I put preempt_disable/enable in the > __rcu_read_unlock below). __this_cpus there work out fine on x86, > which was given good instructions to use; but not so well on PowerPC. > > I've been running successfully for an hour now with the patch below; > but I expect you'll want to consider the tradeoffs, and may choose a > different solution. Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ? I thought that was going out.. Cheers, Ben. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 21:20 ` Benjamin Herrenschmidt @ 2012-05-02 21:54 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-02 21:54 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Hugh Dickins, Paul E. McKenney, linux-kernel, linuxppc-dev On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote: > > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > > __rcu_read_unlock() are not safe to be using __this_cpu operations, > > the cpu may change in between the rmw's read and write: they should > > be using this_cpu operations (or, I put preempt_disable/enable in the > > __rcu_read_unlock below). __this_cpus there work out fine on x86, > > which was given good instructions to use; but not so well on PowerPC. > > > > I've been running successfully for an hour now with the patch below; > > but I expect you'll want to consider the tradeoffs, and may choose a > > different solution. > > Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ? > > I thought that was going out.. Linus did rant about __raw_get_cpu_var() because it cannot use the x86 %fs segment overrides a bit more than a month ago. The __this_cpu stuff is useful if you have preemption disabled -- avoids the extra layer of preempt_disable(). Or was this a different rant? Thanx, Paul ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 21:54 ` Paul E. McKenney @ 2012-05-02 22:54 ` Hugh Dickins -1 siblings, 0 replies; 40+ messages in thread From: Hugh Dickins @ 2012-05-02 22:54 UTC (permalink / raw) To: Paul E. McKenney Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter, linux-kernel, linuxppc-dev On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote: >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote: >> > Got it at last. Embarrassingly obvious. __rcu_read_lock() and >> > __rcu_read_unlock() are not safe to be using __this_cpu operations, >> > the cpu may change in between the rmw's read and write: they should >> > be using this_cpu operations (or, I put preempt_disable/enable in the >> > __rcu_read_unlock below). __this_cpus there work out fine on x86, >> > which was given good instructions to use; but not so well on PowerPC. >> > >> > I've been running successfully for an hour now with the patch below; >> > but I expect you'll want to consider the tradeoffs, and may choose a >> > different solution. >> >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ? >> >> I thought that was going out.. > > Linus did rant about __raw_get_cpu_var() because it cannot use the x86 > %fs segement overrides a bit more than a month ago. The __this_cpu > stuff is useful if you have preemption disabled -- avoids the extra > layer of preempt_disable(). > > Or was this a different rant? https://lkml.org/lkml/2011/11/29/321 I think it ended up with Christoph removing the more egregious variants, but this_cpu_that and __this_cpu_the_other remaining. Hugh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-02 22:54 ` Hugh Dickins @ 2012-05-03 0:14 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-03 0:14 UTC (permalink / raw) To: Hugh Dickins Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter, linux-kernel, linuxppc-dev On Wed, May 02, 2012 at 03:54:24PM -0700, Hugh Dickins wrote: > On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote: > >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote: > >> > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > >> > __rcu_read_unlock() are not safe to be using __this_cpu operations, > >> > the cpu may change in between the rmw's read and write: they should > >> > be using this_cpu operations (or, I put preempt_disable/enable in the > >> > __rcu_read_unlock below). __this_cpus there work out fine on x86, > >> > which was given good instructions to use; but not so well on PowerPC. > >> > > >> > I've been running successfully for an hour now with the patch below; > >> > but I expect you'll want to consider the tradeoffs, and may choose a > >> > different solution. > >> > >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ? > >> > >> I thought that was going out.. > > > > Linus did rant about __raw_get_cpu_var() because it cannot use the x86 > > %fs segement overrides a bit more than a month ago. The __this_cpu > > stuff is useful if you have preemption disabled -- avoids the extra > > layer of preempt_disable(). > > > > Or was this a different rant? > > https://lkml.org/lkml/2011/11/29/321 > > I think it ended up with Christoph removing the more egregious > variants, but this_cpu_that and __this_cpu_the_other remaining. Ah, thank you for the pointer. 
It would be nice to have the CPU transparency of x86 on other architectures, but from what I can see, that would require dedicating a register to this purpose -- and even then requires that the arch have indexed addressing modes. There are some other approaches, for example, having __this_cpu_that() be located at a special address that the scheduler treated as implicitly preempt_disable(). Or I suppose that the arch-specific trap-handling code could fake it. A little bit messy, but the ability to access a given CPU's per-CPU variable while running on that CPU does appear to have at least a couple of uses -- inlining RCU and also making preempt_disable() use per-CPU variables. In any case, I must confess that I feel quite silly about my series of patches. I have reverted them aside from a couple that did useful optimizations, and they should show up in -next shortly. Thanx, Paul ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-03 0:14 ` Paul E. McKenney @ 2012-05-03 0:24 ` Hugh Dickins -1 siblings, 0 replies; 40+ messages in thread From: Hugh Dickins @ 2012-05-03 0:24 UTC (permalink / raw) To: Paul E. McKenney Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter, linux-kernel, linuxppc-dev [-- Attachment #1: Type: TEXT/PLAIN, Size: 2760 bytes --] On Wed, 2 May 2012, Paul E. McKenney wrote: > On Wed, May 02, 2012 at 03:54:24PM -0700, Hugh Dickins wrote: > > On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney > > <paulmck@linux.vnet.ibm.com> wrote: > > > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote: > > >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote: > > >> > Got it at last. Embarrassingly obvious. __rcu_read_lock() and > > >> > __rcu_read_unlock() are not safe to be using __this_cpu operations, > > >> > the cpu may change in between the rmw's read and write: they should > > >> > be using this_cpu operations (or, I put preempt_disable/enable in the > > >> > __rcu_read_unlock below). __this_cpus there work out fine on x86, > > >> > which was given good instructions to use; but not so well on PowerPC. > > >> > > > >> > I've been running successfully for an hour now with the patch below; > > >> > but I expect you'll want to consider the tradeoffs, and may choose a > > >> > different solution. > > >> > > >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ? > > >> > > >> I thought that was going out.. > > > > > > Linus did rant about __raw_get_cpu_var() because it cannot use the x86 > > > %fs segement overrides a bit more than a month ago. The __this_cpu > > > stuff is useful if you have preemption disabled -- avoids the extra > > > layer of preempt_disable(). > > > > > > Or was this a different rant? 
> > > > https://lkml.org/lkml/2011/11/29/321 > > > > I think it ended up with Christoph removing the more egregious > > variants, but this_cpu_that and __this_cpu_the_other remaining. > > Ah, thank you for the pointer. > > It would be nice to have the CPU transparency of x86 on other > architectures, but from what I can see, that would require dedicating > a register to this purpose -- and even then requires that the arch > have indexed addressing modes. There are some other approaches, for > example, having __this_cpu_that() be located at a special address that > the scheduler treated as implicitly preempt_disable(). Or I suppose > that the arch-specific trap-handling code could fake it. A little > bit messy, but the ability to access a given CPU's per-CPU variable > while running on that CPU does appear to have at least a couple of > uses -- inlining RCU and also making preempt_disable() use per-CPU > variables. > > In any case, I must confess that I feel quite silly about my series > of patches. I have reverted them aside from a couple that did useful > optimizations, and they should show up in -next shortly. A wee bit sad, but thank you - it was an experiment worth trying, and perhaps there will be reason to come back to it in future. Hugh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-03 0:24 ` Hugh Dickins @ 2012-05-07 16:21 ` Hugh Dickins -1 siblings, 0 replies; 40+ messages in thread From: Hugh Dickins @ 2012-05-07 16:21 UTC (permalink / raw) To: Paul E. McKenney Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter, linux-kernel, linuxppc-dev On Wed, 2 May 2012, Hugh Dickins wrote: > On Wed, 2 May 2012, Paul E. McKenney wrote: > > > > In any case, I must confess that I feel quite silly about my series > > of patches. I have reverted them aside from a couple that did useful > > optimizations, and they should show up in -next shortly. > > A wee bit sad, but thank you - it was an experiment worth trying, > and perhaps there will be reason to come back to it future. The revert indeed showed up in next-20120504: thanks, no problem now. But although it's just history, and not worth anyone's time to investigate, I shouldn't let this thread die without an epilogue. Although the patch I posted (this_cpu_inc in __rcu_read_lock, preempt_disable and enable in __rcu_read_unlock) ran well until I killed the test after 70 hours, it did not _entirely_ eliminate the sleeping function BUG messages. In 70 hours I got six isolated messages like the below (but from different __might_sleep callsites) - where before I'd have flurries of hundreds(?) and freeze within the hour. And the "rcu_nesting" debug line I'd added to the message was different: where before it was showing ffffffff on some tasks and 1 on others i.e. increment or decrement had been applied to the wrong task, these messages now all showed 0s throughout i.e. by the time the message was printed, there was no longer any justification for the message. As if a memory barrier were missing somewhere, perhaps. 
BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305 cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0 in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1 Call Trace: [c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable) [c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170 [c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664 [c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30 Hugh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-07 16:21 ` Hugh Dickins @ 2012-05-07 18:50 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-07 18:50 UTC (permalink / raw) To: Hugh Dickins Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter, linux-kernel, linuxppc-dev On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote: > On Wed, 2 May 2012, Hugh Dickins wrote: > > On Wed, 2 May 2012, Paul E. McKenney wrote: > > > > > > In any case, I must confess that I feel quite silly about my series > > > of patches. I have reverted them aside from a couple that did useful > > > optimizations, and they should show up in -next shortly. > > > > A wee bit sad, but thank you - it was an experiment worth trying, > > and perhaps there will be reason to come back to it future. > > The revert indeed showed up in next-20120504: thanks, no problem now. > > But although it's just history, and not worth anyone's time to > investigate, I shouldn't let this thread die without an epilogue. > > Although the patch I posted (this_cpu_inc in __rcu_read_lock, > preempt_disable and enable in __rcu_read_unlock) ran well until > I killed the test after 70 hours, it did not _entirely_ eliminate > the sleeping function BUG messages. > > In 70 hours I got six isolated messages like the below (but from > different __might_sleep callsites) - where before I'd have flurries > of hundreds(?) and freeze within the hour. > > And the "rcu_nesting" debug line I'd added to the message was different: > where before it was showing ffffffff on some tasks and 1 on others i.e. > increment or decrement had been applied to the wrong task, these messages > now all showed 0s throughout i.e. by the time the message was printed, > there was no longer any justification for the message. > > As if a memory barrier were missing somewhere, perhaps. 
These fields should be updated only by the corresponding CPU, so if memory barriers are needed, it seems to me that the cross-CPU access is the bug, not the lack of a memory barrier. Ah... Is preemption disabled across the access to RCU's nesting level when printing out the message? If not, a preemption at that point could result in the value printed being inaccurate. Thanx, Paul > BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305 > cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0 > in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1 > Call Trace: > [c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable) > [c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170 > [c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664 > [c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30 > > Hugh > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-07 18:50 ` Paul E. McKenney @ 2012-05-07 21:38 ` Hugh Dickins -1 siblings, 0 replies; 40+ messages in thread From: Hugh Dickins @ 2012-05-07 21:38 UTC (permalink / raw) To: Paul E. McKenney Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter, linux-kernel, linuxppc-dev On Mon, 7 May 2012, Paul E. McKenney wrote: > On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote: > > > > In 70 hours I got six isolated messages like the below (but from > > different __might_sleep callsites) - where before I'd have flurries > > of hundreds(?) and freeze within the hour. > > > > And the "rcu_nesting" debug line I'd added to the message was different: > > where before it was showing ffffffff on some tasks and 1 on others i.e. > > increment or decrement had been applied to the wrong task, these messages > > now all showed 0s throughout i.e. by the time the message was printed, > > there was no longer any justification for the message. > > > > As if a memory barrier were missing somewhere, perhaps. > > These fields should be updated only by the corresponding CPU, so > if memory barriers are needed, it seems to me that the cross-CPU > access is the bug, not the lack of a memory barrier. Yes: the code you added appeared to be using local CPU accesses only (very much intentionally), and the context switch should already have provided all the memory barriers needed there. > > Ah... Is preemption disabled across the access to RCU's nesting level > when printing out the message? If not, a preeemption at that point > could result in the value printed being inaccurate. Preemption was enabled in the cases I saw. So you're pointing out that #define rcu_preempt_depth() (__this_cpu_read(rcu_read_lock_nesting)) should have been #define rcu_preempt_depth() (this_cpu_read(rcu_read_lock_nesting)) to avoid the danger of spurious __might_sleep() warnings. Yes, I believe you've got it - thanks. 
Hugh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs 2012-05-01 0:33 ` Benjamin Herrenschmidt @ 2012-05-01 13:39 ` Paul E. McKenney -1 siblings, 0 replies; 40+ messages in thread From: Paul E. McKenney @ 2012-05-01 13:39 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Hugh Dickins, Paul E. McKenney, linux-kernel, linuxppc-dev On Tue, May 01, 2012 at 10:33:38AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote: > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354 > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1 > > Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells > like a preempt count problem... odd. All of the preempt-count patches are now in mainline. :-( The CONFIG_PROVE_RCU checks verify that either the task is new or it is the same task that was last context-switched to on this CPU. So the most likely suspect is a newly created task that starts running without schedule_tail() being invoked on the path from parent task to child task. If so, the fix would be to invoke rcu_switch_from() and rcu_switch_to() on that code path. So, does Power have a way of switching to a new task without involving schedule_tail()? I convinced myself that my old bugbear, usermode helpers, aren't causing this problem, but I could easily be missing something. > Did you get a specific bisect target yet ? On this one, Hugh is close enough. ;-) Thanx, Paul > Cheers, > Ben. 
> > > Call Trace: > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable) > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134 > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494 > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684 > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664 > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30 > > > > I've plenty more examples, most of them from page faults or from kswapd; > > but I don't think there's any more useful information in them. > > > > Anything I can try later on? > ^ permalink raw reply [flat|nested] 40+ messages in thread