All of lore.kernel.org
 help / color / mirror / Atom feed
* linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-04-30 22:37 ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-04-30 22:37 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Benjamin Herrenschmidt, linux-kernel, linuxppc-dev

Hi Paul,

On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
"BUG: sleeping function called from invalid context" messages from
__might_sleep().

Just once I saw such a message during startup.  Once I saw such a
message when rebuilding the machine's kernel.  Usually I see them
when I'm running a swapping load of kernel builds under memory
pressure (but that's what I'm habitually running there): perhaps
after a few minutes a flurry comes, then goes away, comes back
again later, and after perhaps a couple of hours of that I see
"INFO: rcu_preempt detected stalls" messages too, and soon it
freezes (or perhaps it's still running, but I'm so flooded by
messages that I reboot anyway).

Rather like from before you fixed schedule_tail() for your per-cpu
RCU mods, but not so easy to reproduce.  I did a bisection and indeed
it converged as expected on the RCU changes.  No such problem seen on
x86: it looks as if there's some further tweak required on PowerPC.

Here are my RCU config options (I don't usually have the TORTURE_TEST
in, but tried that for half an hour this morning, in the hope that it
would generate the issue: but it did not).

# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
# CONFIG_RCU_BOOST is not set
CONFIG_HAVE_RCU_TABLE_FREE=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_CPU_STALL_VERBOSE is not set
# CONFIG_RCU_CPU_STALL_INFO is not set
CONFIG_RCU_TRACE=y

Here's the message when I was rebuilding the G5's kernel:

BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
Call Trace:
[c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
[c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
[c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
[c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
[c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
[c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30

I've plenty more examples, most of them from page faults or from kswapd;
but I don't think there's any more useful information in them.

Anything I can try later on?

Thanks!
Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-04-30 22:37 ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-04-30 22:37 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linuxppc-dev, linux-kernel

Hi Paul,

On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
"BUG: sleeping function called from invalid context" messages from
__might_sleep().

Just once I saw such a message during startup.  Once I saw such a
message when rebuilding the machine's kernel.  Usually I see them
when I'm running a swapping load of kernel builds under memory
pressure (but that's what I'm habitually running there): perhaps
after a few minutes a flurry comes, then goes away, comes back
again later, and after perhaps a couple of hours of that I see
"INFO: rcu_preempt detected stalls" messages too, and soon it
freezes (or perhaps it's still running, but I'm so flooded by
messages that I reboot anyway).

Rather like from before you fixed schedule_tail() for your per-cpu
RCU mods, but not so easy to reproduce.  I did a bisection and indeed
it converged as expected on the RCU changes.  No such problem seen on
x86: it looks as if there's some further tweak required on PowerPC.

Here are my RCU config options (I don't usually have the TORTURE_TEST
in, but tried that for half an hour this morning, in the hope that it
would generate the issue: but it did not).

# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
# CONFIG_RCU_BOOST is not set
CONFIG_HAVE_RCU_TABLE_FREE=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_CPU_STALL_VERBOSE is not set
# CONFIG_RCU_CPU_STALL_INFO is not set
CONFIG_RCU_TRACE=y

Here's the message when I was rebuilding the G5's kernel:

BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
Call Trace:
[c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
[c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
[c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
[c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
[c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
[c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30

I've plenty more examples, most of them from page faults or from kswapd;
but I don't think there's any more useful information in them.

Anything I can try later on?

Thanks!
Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-04-30 22:37 ` Hugh Dickins
@ 2012-04-30 23:14   ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-04-30 23:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Paul E. McKenney, Benjamin Herrenschmidt, linux-kernel, linuxppc-dev

On Mon, Apr 30, 2012 at 03:37:10PM -0700, Hugh Dickins wrote:
> Hi Paul,
> 
> On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
> rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
> CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
> "BUG: sleeping function called from invalid context" messages from
> __might_sleep().
> 
> Just once I saw such a message during startup.  Once I saw such a
> message when rebuilding the machine's kernel.  Usually I see them
> when I'm running a swapping load of kernel builds under memory
> pressure (but that's what I'm habitually running there): perhaps
> after a few minutes a flurry comes, then goes away, comes back
> again later, and after perhaps a couple of hours of that I see
> "INFO: rcu_preempt detected stalls" messages too, and soon it
> freezes (or perhaps it's still running, but I'm so flooded by
> messages that I reboot anyway).
> 
> Rather like from before you fixed schedule_tail() for your per-cpu
> RCU mods, but not so easy to reproduce.  I did a bisection and indeed
> it converged as expected on the RCU changes.  No such problem seen on
> x86: it looks as if there's some further tweak required on PowerPC.
> 
> Here are my RCU config options (I don't usually have the TORTURE_TEST
> in, but tried that for half an hour this morning, in the hope that it
> would generate the issue: but it did not).
> 
> # RCU Subsystem
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_FANOUT=64
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_TREE_RCU_TRACE=y
> # CONFIG_RCU_BOOST is not set
> CONFIG_HAVE_RCU_TABLE_FREE=y
> # CONFIG_SPARSE_RCU_POINTER is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> # CONFIG_RCU_CPU_STALL_VERBOSE is not set
> # CONFIG_RCU_CPU_STALL_INFO is not set
> CONFIG_RCU_TRACE=y
> 
> Here's the message when I was rebuilding the G5's kernel:
> 
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
> 
> Anything I can try later on?

Interesting...  As you say, I saw this sort of thing before applying
the changes to schedule_tail(), and it is all too possible that there
is some other "sneak path" for context switches.

Have you tried running with CONFIG_PROVE_RCU?  This enables some
additional debugging in rcu_switch_from() and rcu_switch_to() that
helped track down the schedule_tail() problem.

						Thanx, Paul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-04-30 23:14   ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-04-30 23:14 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Paul E. McKenney, linuxppc-dev, linux-kernel

On Mon, Apr 30, 2012 at 03:37:10PM -0700, Hugh Dickins wrote:
> Hi Paul,
> 
> On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
> rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
> CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
> "BUG: sleeping function called from invalid context" messages from
> __might_sleep().
> 
> Just once I saw such a message during startup.  Once I saw such a
> message when rebuilding the machine's kernel.  Usually I see them
> when I'm running a swapping load of kernel builds under memory
> pressure (but that's what I'm habitually running there): perhaps
> after a few minutes a flurry comes, then goes away, comes back
> again later, and after perhaps a couple of hours of that I see
> "INFO: rcu_preempt detected stalls" messages too, and soon it
> freezes (or perhaps it's still running, but I'm so flooded by
> messages that I reboot anyway).
> 
> Rather like from before you fixed schedule_tail() for your per-cpu
> RCU mods, but not so easy to reproduce.  I did a bisection and indeed
> it converged as expected on the RCU changes.  No such problem seen on
> x86: it looks as if there's some further tweak required on PowerPC.
> 
> Here are my RCU config options (I don't usually have the TORTURE_TEST
> in, but tried that for half an hour this morning, in the hope that it
> would generate the issue: but it did not).
> 
> # RCU Subsystem
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_FANOUT=64
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_TREE_RCU_TRACE=y
> # CONFIG_RCU_BOOST is not set
> CONFIG_HAVE_RCU_TABLE_FREE=y
> # CONFIG_SPARSE_RCU_POINTER is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> # CONFIG_RCU_CPU_STALL_VERBOSE is not set
> # CONFIG_RCU_CPU_STALL_INFO is not set
> CONFIG_RCU_TRACE=y
> 
> Here's the message when I was rebuilding the G5's kernel:
> 
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
> 
> Anything I can try later on?

Interesting...  As you say, I saw this sort of thing before applying
the changes to schedule_tail(), and it is all too possible that there
is some other "sneak path" for context switches.

Have you tried running with CONFIG_PROVE_RCU?  This enables some
additional debugging in rcu_switch_from() and rcu_switch_to() that
helped track down the schedule_tail() problem.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-04-30 22:37 ` Hugh Dickins
@ 2012-05-01  0:33   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-01  0:33 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Paul E. McKenney, linux-kernel, linuxppc-dev

On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> 
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1

Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
like a preempt count problem... odd.

Did you get a specific bisect target yet ?

Cheers,
Ben.

> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
> 
> Anything I can try later on? 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-01  0:33   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-01  0:33 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Paul E. McKenney, linuxppc-dev, linux-kernel

On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> 
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1

Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
like a preempt count problem... odd.

Did you get a specific bisect target yet ?

Cheers,
Ben.

> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
> 
> Anything I can try later on? 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01  0:33   ` Benjamin Herrenschmidt
@ 2012-05-01  5:10     ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-01  5:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > 
> > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> 
> Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> like a preempt count problem... odd.
> 
> Did you get a specific bisect target yet ?

Oh, I went as far as we need, I think, but I didn't bother quite to
complete it because, once in that area, we know the schedule_tail()
omission would muddy the waters: the tail of my bisect log was

# bad: [e798cf3385d3aa7c84afa65677eb92e0c0876dfd] rcu: Add exports for per-CPU variables used for inlining
git bisect bad e798cf3385d3aa7c84afa65677eb92e0c0876dfd
# good: [90aec3b06194393c909e3e5a47b6ed99bb8caba5] rcu: Make exit_rcu() more precise and consolidate
git bisect good 90aec3b06194393c909e3e5a47b6ed99bb8caba5

from which I concluded that the patch responsible is

commit ab8fc41a8545d40a4b58d745876c125af72a8a5c
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Fri Apr 13 14:32:01 2012 -0700

    rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables
    
    This commit is another step towards inlinable __rcu_read_lock() and
    __rcu_read_unlock() functions for preemptible RCU.  This keeps these two
    functions out of line, but switches them to use the per-CPU variables
    that are required to export their definitions without requiring that
    all RCU users include sched.h.  These per-CPU variables are saved and
    restored at context-switch time.

> 
> Cheers,
> Ben.
> 
> > Call Trace:
> > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > 
> > I've plenty more examples, most of them from page faults or from kswapd;
> > but I don't think there's any more useful information in them.
> > 
> > Anything I can try later on? 

I'd forgotten about CONFIG_PROVE_RCU (and hadn't been using PROVE_LOCKING
on that machine), but following Paul's suggestion have now turned them on.

But not much light shed, I'm afraid.  Within minutes it showed a trace
exactly like the one above, but the only thing PROVE_LOCKING and PROVE_RCU
had to say was that we're holding mmap_sem at that point, which is no
surprise and not a problem, just something lockdep is right to note.

That was an isolated occurrence, it continued quietly for maybe 20 minutes,
then output lots to the console screen - but garbled in a way I've not
seen before - the 0s came out just right (or perhaps all the hex digits
were being shown as 0s), but most everything else was grayly unreadable.
Then after a few minutes, spontaneously rebooted.

Perhaps I should remind myself of netdump; but getting the trace above
without complaint from PROVE_RCU tells me that it is not helping.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-01  5:10     ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-01  5:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Paul E. McKenney, linuxppc-dev, linux-kernel

On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > 
> > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> 
> Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> like a preempt count problem... odd.
> 
> Did you get a specific bisect target yet ?

Oh, I went as far as we need, I think, but I didn't bother quite to
complete it because, once in that area, we know the schedule_tail()
omission would muddy the waters: the tail of my bisect log was

# bad: [e798cf3385d3aa7c84afa65677eb92e0c0876dfd] rcu: Add exports for per-CPU variables used for inlining
git bisect bad e798cf3385d3aa7c84afa65677eb92e0c0876dfd
# good: [90aec3b06194393c909e3e5a47b6ed99bb8caba5] rcu: Make exit_rcu() more precise and consolidate
git bisect good 90aec3b06194393c909e3e5a47b6ed99bb8caba5

from which I concluded that the patch responsible is

commit ab8fc41a8545d40a4b58d745876c125af72a8a5c
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Fri Apr 13 14:32:01 2012 -0700

    rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables
    
    This commit is another step towards inlinable __rcu_read_lock() and
    __rcu_read_unlock() functions for preemptible RCU.  This keeps these two
    functions out of line, but switches them to use the per-CPU variables
    that are required to export their definitions without requiring that
    all RCU users include sched.h.  These per-CPU variables are saved and
    restored at context-switch time.

> 
> Cheers,
> Ben.
> 
> > Call Trace:
> > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > 
> > I've plenty more examples, most of them from page faults or from kswapd;
> > but I don't think there's any more useful information in them.
> > 
> > Anything I can try later on? 

I'd forgotten about CONFIG_PROVE_RCU (and hadn't been using PROVE_LOCKING
on that machine), but following Paul's suggestion have now turned them on.

But not much light shed, I'm afraid.  Within minutes it showed a trace
exactly like the one above, but the only thing PROVE_LOCKING and PROVE_RCU
had to say was that we're holding mmap_sem at that point, which is no
surprise and not a problem, just something lockdep is right to note.

That was an isolated occurrence, it continued quietly for maybe 20 minutes,
then output lots to the console screen - but garbled in a way I've not
seen before - the 0s came out just right (or perhaps all the hex digits
were being shown as 0s), but most everything else was grayly unreadable.
Then after a few minutes, spontaneously rebooted.

Perhaps I should remind myself of netdump; but getting the trace above
without complaint from PROVE_RCU tells me that it is not helping.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01  0:33   ` Benjamin Herrenschmidt
@ 2012-05-01 13:39     ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 13:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Hugh Dickins, Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, May 01, 2012 at 10:33:38AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > 
> > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> 
> Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> like a preempt count problem... odd.

All of the preempt-count patches are now in mainline.  :-(

The CONFIG_PROVE_RCU checks verify that either the task is new or it
is the same task that was last context-switched to on this CPU.  So
the most likely suspect is a newly created task that starts running
without schedule_tail() being invoked on the path from parent task
to child task.  If so, the fix would be to invoke rcu_switch_from()
and rcu_switch_to() on that code path.

So, does Power have a way of switching to a new task without involving
schedule_tail()?  I convinced myself that my old bugbear, usermode helpers,
aren't causing this problem, but I could easily be missing something.

> Did you get a specific bisect target yet ?

On this one, Hugh is close enough.  ;-)

							Thanx, Paul

> Cheers,
> Ben.
> 
> > Call Trace:
> > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > 
> > I've plenty more examples, most of them from page faults or from kswapd;
> > but I don't think there's any more useful information in them.
> > 
> > Anything I can try later on? 
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-01 13:39     ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 13:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul E. McKenney, linuxppc-dev, Hugh Dickins, linux-kernel

On Tue, May 01, 2012 at 10:33:38AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > 
> > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> 
> Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> like a preempt count problem... odd.

All of the preempt-count patches are now in mainline.  :-(

The CONFIG_PROVE_RCU checks verify that either the task is new or it
is the same task that was last context-switched to on this CPU.  So
the most likely suspect is a newly created task that starts running
without schedule_tail() being invoked on the path from parent task
to child task.  If so, the fix would be to invoke rcu_switch_from()
and rcu_switch_to() on that code path.

So, does Power have a way of switching to a new task without involving
schedule_tail()?  I convinced myself that my old bugbear, usermode helpers,
aren't causing this problem, but I could easily be missing something.

> Did you get a specific bisect target yet ?

On this one, Hugh is close enough.  ;-)

							Thanx, Paul

> Cheers,
> Ben.
> 
> > Call Trace:
> > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > 
> > I've plenty more examples, most of them from page faults or from kswapd;
> > but I don't think there's any more useful information in them.
> > 
> > Anything I can try later on? 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01  5:10     ` Hugh Dickins
@ 2012-05-01 14:22       ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 14:22 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > 
> > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > 
> > Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> > like a preempt count problem... odd.
> > 
> > Did you get a specific bisect target yet ?
> 
> Oh, I went as far as we need, I think, but I didn't bother quite to
> complete it because, once in that area, we know the schedule_tail()
> omission would muddy the waters: the tail of my bisect log was

Agreed, your bisect is close enough for our purposes.

> # bad: [e798cf3385d3aa7c84afa65677eb92e0c0876dfd] rcu: Add exports for per-CPU variables used for inlining
> git bisect bad e798cf3385d3aa7c84afa65677eb92e0c0876dfd
> # good: [90aec3b06194393c909e3e5a47b6ed99bb8caba5] rcu: Make exit_rcu() more precise and consolidate
> git bisect good 90aec3b06194393c909e3e5a47b6ed99bb8caba5
> 
> from which I concluded that the patch responsible is
> 
> commit ab8fc41a8545d40a4b58d745876c125af72a8a5c
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Fri Apr 13 14:32:01 2012 -0700
> 
>     rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables
>     
>     This commit is another step towards inlinable __rcu_read_lock() and
>     __rcu_read_unlock() functions for preemptible RCU.  This keeps these two
>     functions out of line, but switches them to use the per-CPU variables
>     that are required to export their definitions without requiring that
>     all RCU users include sched.h.  These per-CPU variables are saved and
>     restored at context-switch time.
> 
> > 
> > Cheers,
> > Ben.
> > 
> > > Call Trace:
> > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > 
> > > I've plenty more examples, most of them from page faults or from kswapd;
> > > but I don't think there's any more useful information in them.
> > > 
> > > Anything I can try later on? 
> 
> I'd forgotten about CONFIG_PROVE_RCU (and hadn't been using PROVE_LOCKING
> on that machine), but following Paul's suggestion have now turned them on.
> 
> But not much light shed, I'm afraid.  Within minutes it showed a trace
> exactly like the one above, but the only thing PROVE_LOCKING and PROVE_RCU
> had to say was that we're holding mmap_sem at that point, which is no
> surprise and not a problem, just something lockdep is right to note.
> 
> That was an isolated occurrence, it continued quietly for maybe 20 minutes,
> then output lots to the console screen - but garbled in a way I've not
> seen before - the 0s came out just right (or perhaps all the hex digits
> were being shown as 0s), but most everything else was grayly unreadable.
> Then after a few minutes, spontaneously rebooted.
> 
> Perhaps I should remind myself of netdump; but getting the trace above
> without complaint from PROVE_RCU tells me that it is not helping.

My guess is that the following happened:

1.	Task A is running, with its state in RCU's per-CPU variables.

2.	Task A creates Task B and switches to it, but without invoking
	schedule_tail() or schedule().  Task B is now running, with
	Task A's state in RCU's per-CPU variables.

3.	Task B switches context, saving Task A's per-CPU RCU variables
	(with modifications by Task B, just for fun).

4.	Task A starts running again, and loads obsolete versions of its
	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
	to be invoked at inappropriate times, which could cause
	pretty arbitrary misbehavior.

5.	Mismatched values for the RCU read-side nesting could cause
	the read-side critical section to complete prematurely, which
	could cause all manner of mischief.  However, I would expect
	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().

Hmmm...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-01 14:22       ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 14:22 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > 
> > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > 
> > Hrm ... in_atomic and irqs_disabled are both 0 ... so yeah it smells
> > like a preempt count problem... odd.
> > 
> > Did you get a specific bisect target yet ?
> 
> Oh, I went as far as we need, I think, but I didn't bother quite to
> complete it because, once in that area, we know the schedule_tail()
> omission would muddy the waters: the tail of my bisect log was

Agreed, your bisect is close enough for our purposes.

> # bad: [e798cf3385d3aa7c84afa65677eb92e0c0876dfd] rcu: Add exports for per-CPU variables used for inlining
> git bisect bad e798cf3385d3aa7c84afa65677eb92e0c0876dfd
> # good: [90aec3b06194393c909e3e5a47b6ed99bb8caba5] rcu: Make exit_rcu() more precise and consolidate
> git bisect good 90aec3b06194393c909e3e5a47b6ed99bb8caba5
> 
> from which I concluded that the patch responsible is
> 
> commit ab8fc41a8545d40a4b58d745876c125af72a8a5c
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Fri Apr 13 14:32:01 2012 -0700
> 
>     rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables
>     
>     This commit is another step towards inlinable __rcu_read_lock() and
>     __rcu_read_unlock() functions for preemptible RCU.  This keeps these two
>     functions out of line, but switches them to use the per-CPU variables
>     that are required to export their definitions without requiring that
>     all RCU users include sched.h.  These per-CPU variables are saved and
>     restored at context-switch time.
> 
> > 
> > Cheers,
> > Ben.
> > 
> > > Call Trace:
> > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > 
> > > I've plenty more examples, most of them from page faults or from kswapd;
> > > but I don't think there's any more useful information in them.
> > > 
> > > Anything I can try later on? 
> 
> I'd forgotten about CONFIG_PROVE_RCU (and hadn't been using PROVE_LOCKING
> on that machine), but following Paul's suggestion have now turned them on.
> 
> But not much light shed, I'm afraid.  Within minutes it showed a trace
> exactly like the one above, but the only thing PROVE_LOCKING and PROVE_RCU
> had to say was that we're holding mmap_sem at that point, which is no
> surprise and not a problem, just something lockdep is right to note.
> 
> That was an isolated occurrence, it continued quietly for maybe 20 minutes,
> then output lots to the console screen - but garbled in a way I've not
> seen before - the 0s came out just right (or perhaps all the hex digits
> were being shown as 0s), but most everything else was grayly unreadable.
> Then after a few minutes, spontaneously rebooted.
> 
> Perhaps I should remind myself of netdump; but getting the trace above
> without complaint from PROVE_RCU tells me that it is not helping.

My guess is that the following happened:

1.	Task A is running, with its state in RCU's per-CPU variables.

2.	Task A creates Task B and switches to it, but without invoking
	schedule_tail() or schedule().  Task B is now running, with
	Task A's state in RCU's per-CPU variables.

3.	Task B switches context, saving Task A's per-CPU RCU variables
	(with modifications by Task B, just for fun).

4.	Task A starts running again, and loads obsolete versions of its
	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
	to be invoked at inappropriate times, which could cause
	pretty arbitrary misbehavior.

5.	Mismatched values for the RCU read-side nesting could cause
	the read-side critical section to complete prematurely, which
	could cause all manner of mischief.  However, I would expect
	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().

Hmmm...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01 14:22       ` Paul E. McKenney
@ 2012-05-01 21:42         ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-01 21:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, 1 May 2012, Paul E. McKenney wrote:
> On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> > On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > 
> > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > Call Trace:
> > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > > 

> My guess is that the following happened:
> 
> 1.	Task A is running, with its state in RCU's per-CPU variables.
> 
> 2.	Task A creates Task B and switches to it, but without invoking
> 	schedule_tail() or schedule().  Task B is now running, with
> 	Task A's state in RCU's per-CPU variables.
> 
> 3.	Task B switches context, saving Task A's per-CPU RCU variables
> 	(with modifications by Task B, just for fun).
> 
> 4.	Task A starts running again, and loads obsolete versions of its
> 	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
> 	to be invoked at inappropriate times, which could cause
> 	pretty arbitrary misbehavior.
> 
> 5.	Mismatched values for the RCU read-side nesting could cause
> 	the read-side critical section to complete prematurely, which
> 	could cause all manner of mischief.  However, I would expect
> 	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().
> 
> Hmmm...

I didn't find anything corresponding to that under arch/powerpc.

There is something I found, that I had high hopes for, but sadly no,
it does not fix it.  I may be wrong, it's a long time since I thought
about what happens in fork().  But I believe the rcu_switch_from(prev)
you added to schedule_tail() is bogus: doesn't schedule_tail() more or
less amount to a jump into schedule(), for the child to be as if it's
emerging from the switch_to() in schedule()?

So I think the rcu_switch_from(prev) has already been done by the prev
task on the cpu, as it goes into switch_to() in schedule().  So at best
you're duplicating that in schedule_tail(), and at worst (I don't know
if the prev task can get far enough away for this to matter) you're
messing with its state.  Probably difficult to do any harm (those fields
don't matter while it's on another cpu, and it has to get on another cpu
for them to change), but it does look wrong to me.

But commenting out that line did not fix my issues.  And if you agree,
you'll probably prefer, not to comment out the line, but revert back to
rcu_switch_from(void) and just add the rcu_switch_to() to schedule_tail().

Something else that I noticed in comments on rcu_switch_from() in
sched.h (er, is sched.h really the right place for this code?): it says
"if rcu_read_unlock_special is zero, then rcu_read_lock_nesting must
also be zero" - shouldn't that say "non-zero" in each case?

I must turn away from this right now, and come back to it later.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-01 21:42         ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-01 21:42 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Tue, 1 May 2012, Paul E. McKenney wrote:
> On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> > On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > 
> > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > Call Trace:
> > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > > 

> My guess is that the following happened:
> 
> 1.	Task A is running, with its state in RCU's per-CPU variables.
> 
> 2.	Task A creates Task B and switches to it, but without invoking
> 	schedule_tail() or schedule().  Task B is now running, with
> 	Task A's state in RCU's per-CPU variables.
> 
> 3.	Task B switches context, saving Task A's per-CPU RCU variables
> 	(with modifications by Task B, just for fun).
> 
> 4.	Task A starts running again, and loads obsolete versions of its
> 	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
> 	to be invoked at inappropriate times, which could cause
> 	pretty arbitrary misbehavior.
> 
> 5.	Mismatched values for the RCU read-side nesting could cause
> 	the read-side critical section to complete prematurely, which
> 	could cause all manner of mischief.  However, I would expect
> 	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().
> 
> Hmmm...

I didn't find anything corresponding to that under arch/powerpc.

There is something I found, that I had high hopes for, but sadly no,
it does not fix it.  I may be wrong, it's a long time since I thought
about what happens in fork().  But I believe the rcu_switch_from(prev)
you added to schedule_tail() is bogus: doesn't schedule_tail() more or
less amount to a jump into schedule(), for the child to be as if it's
emerging from the switch_to() in schedule()?

So I think the rcu_switch_from(prev) has already been done by the prev
task on the cpu, as it goes into switch_to() in schedule().  So at best
you're duplicating that in schedule_tail(), and at worst (I don't know
if the prev task can get far enough away for this to matter) you're
messing with its state.  Probably difficult to do any harm (those fields
don't matter while it's on another cpu, and it has to get on another cpu
for them to change), but it does look wrong to me.

But commenting out that line did not fix my issues.  And if you agree,
you'll probably prefer, not to comment out the line, but revert back to
rcu_switch_from(void) and just add the rcu_switch_to() to schedule_tail().

Something else that I noticed in comments on rcu_switch_from() in
sched.h (er, is sched.h really the right place for this code?): it says
"if rcu_read_unlock_special is zero, then rcu_read_lock_nesting must
also be zero" - shouldn't that say "non-zero" in each case?

I must turn away from this right now, and come back to it later.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01 21:42         ` Hugh Dickins
@ 2012-05-01 23:25           ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 23:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, May 01, 2012 at 02:42:02PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Paul E. McKenney wrote:
> > On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> > > On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > 
> > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > Call Trace:
> > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > > > 
> 
> > My guess is that the following happened:
> > 
> > 1.	Task A is running, with its state in RCU's per-CPU variables.
> > 
> > 2.	Task A creates Task B and switches to it, but without invoking
> > 	schedule_tail() or schedule().  Task B is now running, with
> > 	Task A's state in RCU's per-CPU variables.
> > 
> > 3.	Task B switches context, saving Task A's per-CPU RCU variables
> > 	(with modifications by Task B, just for fun).
> > 
> > 4.	Task A starts running again, and loads obsolete versions of its
> > 	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
> > 	to be invoked at inappropriate times, which could cause
> > 	pretty arbitrary misbehavior.
> > 
> > 5.	Mismatched values for the RCU read-side nesting could cause
> > 	the read-side critical section to complete prematurely, which
> > 	could cause all manner of mischief.  However, I would expect
> > 	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().
> > 
> > Hmmm...
> 
> I didn't find anything corresponding to that under arch/powerpc.

Nor did I.  :-(

> There is something I found, that I had high hopes for, but sadly no,
> it does not fix it.  I may be wrong, it's a long time since I thought
> about what happens in fork().  But I believe the rcu_switch_from(prev)
> you added to schedule_tail() is bogus: doesn't schedule_tail() more or
> less amount to a jump into schedule(), for the child to be as if it's
> emerging from the switch_to() in schedule()?
> 
> So I think the rcu_switch_from(prev) has already been done by the prev
> task on the cpu, as it goes into switch_to() in schedule().  So at best
> you're duplicating that in schedule_tail(), and at worst (I don't know
> if the prev task can get far enough away for this to matter) you're
> messing with its state.  Probably difficult to do any harm (those fields
> don't matter while it's on another cpu, and it has to get on another cpu
> for them to change), but it does look wrong to me.

I do believe that you are correct.  /me wonders if it was really such
a good idea to tie RCU this closely to the scheduler...

I also agree that the chance of error is small.  The parent would have
to be blocked for the child to have any probability of doing harm,
but we need the probability to be zero, which it does not appear to be.

I will semi-revert this change as you suggest.

> But commenting out that line did not fix my issues.  And if you agree,
> you'll probably prefer, not to comment out the line, but revert back to
> rcu_switch_from(void) and just add the rcu_switch_to() to schedule_tail().
> 
> Something else that I noticed in comments on rcu_switch_from() in
> sched.h (er, is sched.h really the right place for this code?): it says
> "if rcu_read_unlock_special is zero, then rcu_read_lock_nesting must
> also be zero" - shouldn't that say "non-zero" in each case?

No, but the variables should be reversed.  It should read "Both
cases rely on the fact that if rcu_read_lock_nesting is zero, then
rcu_read_unlock_special must also be zero."  Here are the two cases:

1.	rcu_read_lock_nesting is zero:  Then rcu_read_unlock_special
	must also be zero, so there is no way to get to the
	rcu_read_unlock_do_special() function.  A scheduling-clock
	interrupt might later set one of the rcu_read_unlock_special
	bits, but only RCU_READ_UNLOCK_BLOCKED, which is OK because
	rcu_read_unlock_do_special() does not mess with the scheduler
	in this case.

2.	rcu_read_lock_nesting is non-zero: Then the task is blocking
	within an RCU read-side critical section, so none of the
	scheduler's or architecture's read-side critical sections
	can have the outermost rcu_read_unlock(), so the
	rcu_read_unlock_do_special() function will not be invoked
	in the first place.

Thank you for catching this!

> I must turn away from this right now, and come back to it later.

Thank you very much for all your help with this!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-01 23:25           ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-01 23:25 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Tue, May 01, 2012 at 02:42:02PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Paul E. McKenney wrote:
> > On Mon, Apr 30, 2012 at 10:10:06PM -0700, Hugh Dickins wrote:
> > > On Tue, 1 May 2012, Benjamin Herrenschmidt wrote:
> > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > 
> > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > Call Trace:
> > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > > > 
> 
> > My guess is that the following happened:
> > 
> > 1.	Task A is running, with its state in RCU's per-CPU variables.
> > 
> > 2.	Task A creates Task B and switches to it, but without invoking
> > 	schedule_tail() or schedule().  Task B is now running, with
> > 	Task A's state in RCU's per-CPU variables.
> > 
> > 3.	Task B switches context, saving Task A's per-CPU RCU variables
> > 	(with modifications by Task B, just for fun).
> > 
> > 4.	Task A starts running again, and loads obsolete versions of its
> > 	per-CPU RCU variables.  This can cause rcu_read_unlock_do_special()
> > 	to be invoked at inappropriate times, which could cause
> > 	pretty arbitrary misbehavior.
> > 
> > 5.	Mismatched values for the RCU read-side nesting could cause
> > 	the read-side critical section to complete prematurely, which
> > 	could cause all manner of mischief.  However, I would expect
> > 	this to trigger the WARN_ON_ONCE() in __rcu_read_unlock().
> > 
> > Hmmm...
> 
> I didn't find anything corresponding to that under arch/powerpc.

Nor did I.  :-(

> There is something I found, that I had high hopes for, but sadly no,
> it does not fix it.  I may be wrong, it's a long time since I thought
> about what happens in fork().  But I believe the rcu_switch_from(prev)
> you added to schedule_tail() is bogus: doesn't schedule_tail() more or
> less amount to a jump into schedule(), for the child to be as if it's
> emerging from the switch_to() in schedule()?
> 
> So I think the rcu_switch_from(prev) has already been done by the prev
> task on the cpu, as it goes into switch_to() in schedule().  So at best
> you're duplicating that in schedule_tail(), and at worst (I don't know
> if the prev task can get far enough away for this to matter) you're
> messing with its state.  Probably difficult to do any harm (those fields
> don't matter while it's on another cpu, and it has to get on another cpu
> for them to change), but it does look wrong to me.

I do believe that you are correct.  /me wonders if it was really such
a good idea to tie RCU this closely to the scheduler...

I also agree that the chance of error is small.  The parent would have
to be blocked for the child to have any probability of doing harm,
but we need the probability to be zero, which it does not appear to be.

I will semi-revert this change as you suggest.

> But commenting out that line did not fix my issues.  And if you agree,
> you'll probably prefer, not to comment out the line, but revert back to
> rcu_switch_from(void) and just add the rcu_switch_to() to schedule_tail().
> 
> Something else that I noticed in comments on rcu_switch_from() in
> sched.h (er, is sched.h really the right place for this code?): it says
> "if rcu_read_unlock_special is zero, then rcu_read_lock_nesting must
> also be zero" - shouldn't that say "non-zero" in each case?

No, but the variables should be reversed.  It should read "Both
cases rely on the fact that if rcu_read_lock_nesting is zero, then
rcu_read_unlock_special must also be zero."  Here are the two cases:

1.	rcu_read_lock_nesting is zero:  Then rcu_read_unlock_special
	must also be zero, so there is no way to get to the
	rcu_read_unlock_do_special() function.  A scheduling-clock
	interrupt might later set one of the rcu_read_unlock_special
	bits, but only RCU_READ_UNLOCK_BLOCKED, which is OK because
	rcu_read_unlock_do_special() does not mess with the scheduler
	in this case.

2.	rcu_read_lock_nesting is non-zero: Then the task is blocking
	within an RCU read-side critical section, so none of the
	scheduler's or architecture's read-side critical sections
	can have the outermost rcu_read_unlock(), so the
	rcu_read_unlock_do_special() function will not be invoked
	in the first place.

Thank you for catching this!

> I must turn away from this right now, and come back to it later.

Thank you very much for all your help with this!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-01 23:25           ` Paul E. McKenney
@ 2012-05-02 20:25             ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-02 20:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > 
> > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > Call Trace:
> > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30

Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
__rcu_read_unlock() are not safe to be using __this_cpu operations,
the cpu may change in between the rmw's read and write: they should
be using this_cpu operations (or, I put preempt_disable/enable in the
__rcu_read_unlock below).  __this_cpus there work out fine on x86,
which was given good instructions to use; but not so well on PowerPC.

I've been running successfully for an hour now with the patch below;
but I expect you'll want to consider the tradeoffs, and may choose a
different solution.

Hugh

--- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
+++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
@@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
  */
 static inline void __rcu_read_lock(void)
 {
-	__this_cpu_inc(rcu_read_lock_nesting);
+	this_cpu_inc(rcu_read_lock_nesting);
 	barrier(); /* Keep code within RCU read-side critical section. */
 }
 
--- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
+++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
@@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
  */
 void __rcu_read_unlock(void)
 {
+	preempt_disable();
 	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
 		__this_cpu_dec(rcu_read_lock_nesting);
 	else {
@@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
 		barrier();  /* ->rcu_read_unlock_special load before assign */
 		__this_cpu_write(rcu_read_lock_nesting, 0);
 	}
-#ifdef CONFIG_PROVE_LOCKING
+#if 1 /* CONFIG_PROVE_LOCKING */
 	{
 		int rln = __this_cpu_read(rcu_read_lock_nesting);
 
-		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
+		BUG_ON(rln < 0 && rln > INT_MIN / 2);
 	}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
+	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
 
--- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
+++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
@@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
 {
 	struct rq *rq = this_rq();
 
-	rcu_switch_from(prev);
+	/* rcu_switch_from(prev); */
 	rcu_switch_to();
 	finish_task_switch(rq, prev);
 
@@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
+		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
+		raw_smp_processor_id(), preempt_count(), preempt_offset,
+		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
+	printk(KERN_ERR
 		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
 			in_atomic(), irqs_disabled(),
 			current->pid, current->comm);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 20:25             ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-02 20:25 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > 
> > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > Call Trace:
> > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30

Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
__rcu_read_unlock() are not safe to be using __this_cpu operations,
the cpu may change in between the rmw's read and write: they should
be using this_cpu operations (or, I put preempt_disable/enable in the
__rcu_read_unlock below).  __this_cpus there work out fine on x86,
which was given good instructions to use; but not so well on PowerPC.

I've been running successfully for an hour now with the patch below;
but I expect you'll want to consider the tradeoffs, and may choose a
different solution.

Hugh

--- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
+++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
@@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
  */
 static inline void __rcu_read_lock(void)
 {
-	__this_cpu_inc(rcu_read_lock_nesting);
+	this_cpu_inc(rcu_read_lock_nesting);
 	barrier(); /* Keep code within RCU read-side critical section. */
 }
 
--- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
+++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
@@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
  */
 void __rcu_read_unlock(void)
 {
+	preempt_disable();
 	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
 		__this_cpu_dec(rcu_read_lock_nesting);
 	else {
@@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
 		barrier();  /* ->rcu_read_unlock_special load before assign */
 		__this_cpu_write(rcu_read_lock_nesting, 0);
 	}
-#ifdef CONFIG_PROVE_LOCKING
+#if 1 /* CONFIG_PROVE_LOCKING */
 	{
 		int rln = __this_cpu_read(rcu_read_lock_nesting);
 
-		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
+		BUG_ON(rln < 0 && rln > INT_MIN / 2);
 	}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
+	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
 
--- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
+++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
@@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
 {
 	struct rq *rq = this_rq();
 
-	rcu_switch_from(prev);
+	/* rcu_switch_from(prev); */
 	rcu_switch_to();
 	finish_task_switch(rq, prev);
 
@@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
+		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
+		raw_smp_processor_id(), preempt_count(), preempt_offset,
+		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
+	printk(KERN_ERR
 		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
 			in_atomic(), irqs_disabled(),
 			current->pid, current->comm);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 20:25             ` Hugh Dickins
@ 2012-05-02 20:49               ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 20:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > > 
> > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > > Call Trace:
> > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> __rcu_read_unlock() are not safe to be using __this_cpu operations,
> the cpu may change in between the rmw's read and write: they should
> be using this_cpu operations (or, I put preempt_disable/enable in the
> __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> which was given good instructions to use; but not so well on PowerPC.

Thank you very much for tracking this down!!!

> I've been running successfully for an hour now with the patch below;
> but I expect you'll want to consider the tradeoffs, and may choose a
> different solution.

The thing that puzzles me about this is that the normal path through
the scheduler would save and restore these per-CPU variables to and
from the task structure.  There must be a sneak path through the
scheduler that I failed to account for.

But given your good work, this should be easy for me to chase down
even on my x86-based laptop -- just convert from __this_cpu_inc() to a
read-inc-delay-write sequence.  And check that the underlying variable
didn't change in the meantime.  And dump an ftrace if it did.  ;-)

Thank you again, Hugh!

							Thanx, Paul

> Hugh
> 
> --- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
> +++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
> @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
>   */
>  static inline void __rcu_read_lock(void)
>  {
> -	__this_cpu_inc(rcu_read_lock_nesting);
> +	this_cpu_inc(rcu_read_lock_nesting);
>  	barrier(); /* Keep code within RCU read-side critical section. */
>  }
> 
> --- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
> +++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
> @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
>   */
>  void __rcu_read_unlock(void)
>  {
> +	preempt_disable();
>  	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
>  		__this_cpu_dec(rcu_read_lock_nesting);
>  	else {
> @@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
>  		barrier();  /* ->rcu_read_unlock_special load before assign */
>  		__this_cpu_write(rcu_read_lock_nesting, 0);
>  	}
> -#ifdef CONFIG_PROVE_LOCKING
> +#if 1 /* CONFIG_PROVE_LOCKING */
>  	{
>  		int rln = __this_cpu_read(rcu_read_lock_nesting);
> 
> -		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
> +		BUG_ON(rln < 0 && rln > INT_MIN / 2);
>  	}
>  #endif /* #ifdef CONFIG_PROVE_LOCKING */
> +	preempt_enable();
>  }
>  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> 
> --- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
> +++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
> @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
>  {
>  	struct rq *rq = this_rq();
> 
> -	rcu_switch_from(prev);
> +	/* rcu_switch_from(prev); */
>  	rcu_switch_to();
>  	finish_task_switch(rq, prev);
> 
> @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
>  		"BUG: sleeping function called from invalid context at %s:%d\n",
>  			file, line);
>  	printk(KERN_ERR
> +		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
> +		raw_smp_processor_id(), preempt_count(), preempt_offset,
> +		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
> +	printk(KERN_ERR
>  		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
>  			in_atomic(), irqs_disabled(),
>  			current->pid, current->comm);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 20:49               ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 20:49 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > > 
> > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > > Call Trace:
> > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> __rcu_read_unlock() are not safe to be using __this_cpu operations,
> the cpu may change in between the rmw's read and write: they should
> be using this_cpu operations (or, I put preempt_disable/enable in the
> __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> which was given good instructions to use; but not so well on PowerPC.

Thank you very much for tracking this down!!!

> I've been running successfully for an hour now with the patch below;
> but I expect you'll want to consider the tradeoffs, and may choose a
> different solution.

The thing that puzzles me about this is that the normal path through
the scheduler would save and restore these per-CPU variables to and
from the task structure.  There must be a sneak path through the
scheduler that I failed to account for.

But given your good work, this should be easy for me to chase down
even on my x86-based laptop -- just convert from __this_cpu_inc() to a
read-inc-delay-write sequence.  And check that the underlying variable
didn't change in the meantime.  And dump an ftrace if it did.  ;-)

Thank you again, Hugh!

							Thanx, Paul

> Hugh
> 
> --- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
> +++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
> @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
>   */
>  static inline void __rcu_read_lock(void)
>  {
> -	__this_cpu_inc(rcu_read_lock_nesting);
> +	this_cpu_inc(rcu_read_lock_nesting);
>  	barrier(); /* Keep code within RCU read-side critical section. */
>  }
> 
> --- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
> +++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
> @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
>   */
>  void __rcu_read_unlock(void)
>  {
> +	preempt_disable();
>  	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
>  		__this_cpu_dec(rcu_read_lock_nesting);
>  	else {
> @@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
>  		barrier();  /* ->rcu_read_unlock_special load before assign */
>  		__this_cpu_write(rcu_read_lock_nesting, 0);
>  	}
> -#ifdef CONFIG_PROVE_LOCKING
> +#if 1 /* CONFIG_PROVE_LOCKING */
>  	{
>  		int rln = __this_cpu_read(rcu_read_lock_nesting);
> 
> -		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
> +		BUG_ON(rln < 0 && rln > INT_MIN / 2);
>  	}
>  #endif /* #ifdef CONFIG_PROVE_LOCKING */
> +	preempt_enable();
>  }
>  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> 
> --- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
> +++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
> @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
>  {
>  	struct rq *rq = this_rq();
> 
> -	rcu_switch_from(prev);
> +	/* rcu_switch_from(prev); */
>  	rcu_switch_to();
>  	finish_task_switch(rq, prev);
> 
> @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
>  		"BUG: sleeping function called from invalid context at %s:%d\n",
>  			file, line);
>  	printk(KERN_ERR
> +		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
> +		raw_smp_processor_id(), preempt_count(), preempt_offset,
> +		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
> +	printk(KERN_ERR
>  		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
>  			in_atomic(), irqs_disabled(),
>  			current->pid, current->comm);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 20:25             ` Hugh Dickins
@ 2012-05-02 21:20               ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-02 21:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Paul E. McKenney, Paul E. McKenney, linux-kernel, linuxppc-dev

On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> __rcu_read_unlock() are not safe to be using __this_cpu operations,
> the cpu may change in between the rmw's read and write: they should
> be using this_cpu operations (or, I put preempt_disable/enable in the
> __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> which was given good instructions to use; but not so well on PowerPC.
> 
> I've been running successfully for an hour now with the patch below;
> but I expect you'll want to consider the tradeoffs, and may choose a
> different solution.

Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?

I thought that was going out..

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 21:20               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-02 21:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Paul E. McKenney, Paul E. McKenney, linuxppc-dev, linux-kernel

On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> __rcu_read_unlock() are not safe to be using __this_cpu operations,
> the cpu may change in between the rmw's read and write: they should
> be using this_cpu operations (or, I put preempt_disable/enable in the
> __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> which was given good instructions to use; but not so well on PowerPC.
> 
> I've been running successfully for an hour now with the patch below;
> but I expect you'll want to consider the tradeoffs, and may choose a
> different solution.

Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?

I thought that was going out..

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 20:49               ` Paul E. McKenney
@ 2012-05-02 21:32                 ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 21:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Wed, May 02, 2012 at 01:49:54PM -0700, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote:
> > On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > > > 
> > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > > > Call Trace:
> > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > 
> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > the cpu may change in between the rmw's read and write: they should
> > be using this_cpu operations (or, I put preempt_disable/enable in the
> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > which was given good instructions to use; but not so well on PowerPC.
> 
> Thank you very much for tracking this down!!!
> 
> > I've been running successfully for an hour now with the patch below;
> > but I expect you'll want to consider the tradeoffs, and may choose a
> > different solution.
> 
> The thing that puzzles me about this is that the normal path through
> the scheduler would save and restore these per-CPU variables to and
> from the task structure.  There must be a sneak path through the
> scheduler that I failed to account for.

Sigh...  I am slow today, I guess.  The preemption could of course
happen between the time that the task calculated the address of the
per-CPU variable and the time that it modified it.  If this happens,
we are modifying some other CPU's per-CPU variable.

Given that Linus saw no performance benefit from this patchset, I am
strongly tempted to just drop this inlinable-__rcu_read_lock() series
at this point.

I suppose that the other option is to move preempt_count() to a
per-CPU variable, then use the space in the task_info struct.
But that didn't generate anywhere near as good of code...

							Thanx, Paul

> But given your good work, this should be easy for me to chase down
> even on my x86-based laptop -- just convert from __this_cpu_inc() to a
> read-inc-delay-write sequence.  And check that the underlying variable
> didn't change in the meantime.  And dump an ftrace if it did.  ;-)
> 
> Thank you again, Hugh!
> 
> 							Thanx, Paul
> 
> > Hugh
> > 
> > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
> > +++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
> > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
> >   */
> >  static inline void __rcu_read_lock(void)
> >  {
> > -	__this_cpu_inc(rcu_read_lock_nesting);
> > +	this_cpu_inc(rcu_read_lock_nesting);
> >  	barrier(); /* Keep code within RCU read-side critical section. */
> >  }
> > 
> > --- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
> > +++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
> > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
> >   */
> >  void __rcu_read_unlock(void)
> >  {
> > +	preempt_disable();
> >  	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
> >  		__this_cpu_dec(rcu_read_lock_nesting);
> >  	else {
> > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
> >  		barrier();  /* ->rcu_read_unlock_special load before assign */
> >  		__this_cpu_write(rcu_read_lock_nesting, 0);
> >  	}
> > -#ifdef CONFIG_PROVE_LOCKING
> > +#if 1 /* CONFIG_PROVE_LOCKING */
> >  	{
> >  		int rln = __this_cpu_read(rcu_read_lock_nesting);
> > 
> > -		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
> > +		BUG_ON(rln < 0 && rln > INT_MIN / 2);
> >  	}
> >  #endif /* #ifdef CONFIG_PROVE_LOCKING */
> > +	preempt_enable();
> >  }
> >  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> > 
> > --- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
> > +++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
> > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
> >  {
> >  	struct rq *rq = this_rq();
> > 
> > -	rcu_switch_from(prev);
> > +	/* rcu_switch_from(prev); */
> >  	rcu_switch_to();
> >  	finish_task_switch(rq, prev);
> > 
> > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
> >  		"BUG: sleeping function called from invalid context at %s:%d\n",
> >  			file, line);
> >  	printk(KERN_ERR
> > +		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
> > +		raw_smp_processor_id(), preempt_count(), preempt_offset,
> > +		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
> > +	printk(KERN_ERR
> >  		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> >  			in_atomic(), irqs_disabled(),
> >  			current->pid, current->comm);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 21:32                 ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 21:32 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Wed, May 02, 2012 at 01:49:54PM -0700, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote:
> > On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > > > 
> > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > > > Call Trace:
> > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > 
> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > the cpu may change in between the rmw's read and write: they should
> > be using this_cpu operations (or, I put preempt_disable/enable in the
> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > which was given good instructions to use; but not so well on PowerPC.
> 
> Thank you very much for tracking this down!!!
> 
> > I've been running successfully for an hour now with the patch below;
> > but I expect you'll want to consider the tradeoffs, and may choose a
> > different solution.
> 
> The thing that puzzles me about this is that the normal path through
> the scheduler would save and restore these per-CPU variables to and
> from the task structure.  There must be a sneak path through the
> scheduler that I failed to account for.

Sigh...  I am slow today, I guess.  The preemption could of course
happen between the time that the task calculated the address of the
per-CPU variable and the time that it modified it.  If this happens,
we are modifying some other CPU's per-CPU variable.

Given that Linus saw no performance benefit from this patchset, I am
strongly tempted to just drop this inlinable-__rcu_read_lock() series
at this point.

I suppose that the other option is to move preempt_count() to a
per-CPU variable, then use the space in the task_info struct.
But that didn't generate anywhere near as good of code...

							Thanx, Paul

> But given your good work, this should be easy for me to chase down
> even on my x86-based laptop -- just convert from __this_cpu_inc() to a
> read-inc-delay-write sequence.  And check that the underlying variable
> didn't change in the meantime.  And dump an ftrace if it did.  ;-)
> 
> Thank you again, Hugh!
> 
> 							Thanx, Paul
> 
> > Hugh
> > 
> > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
> > +++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
> > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
> >   */
> >  static inline void __rcu_read_lock(void)
> >  {
> > -	__this_cpu_inc(rcu_read_lock_nesting);
> > +	this_cpu_inc(rcu_read_lock_nesting);
> >  	barrier(); /* Keep code within RCU read-side critical section. */
> >  }
> > 
> > --- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
> > +++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
> > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
> >   */
> >  void __rcu_read_unlock(void)
> >  {
> > +	preempt_disable();
> >  	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
> >  		__this_cpu_dec(rcu_read_lock_nesting);
> >  	else {
> > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
> >  		barrier();  /* ->rcu_read_unlock_special load before assign */
> >  		__this_cpu_write(rcu_read_lock_nesting, 0);
> >  	}
> > -#ifdef CONFIG_PROVE_LOCKING
> > +#if 1 /* CONFIG_PROVE_LOCKING */
> >  	{
> >  		int rln = __this_cpu_read(rcu_read_lock_nesting);
> > 
> > -		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
> > +		BUG_ON(rln < 0 && rln > INT_MIN / 2);
> >  	}
> >  #endif /* #ifdef CONFIG_PROVE_LOCKING */
> > +	preempt_enable();
> >  }
> >  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> > 
> > --- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
> > +++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
> > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
> >  {
> >  	struct rq *rq = this_rq();
> > 
> > -	rcu_switch_from(prev);
> > +	/* rcu_switch_from(prev); */
> >  	rcu_switch_to();
> >  	finish_task_switch(rq, prev);
> > 
> > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
> >  		"BUG: sleeping function called from invalid context at %s:%d\n",
> >  			file, line);
> >  	printk(KERN_ERR
> > +		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
> > +		raw_smp_processor_id(), preempt_count(), preempt_offset,
> > +		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
> > +	printk(KERN_ERR
> >  		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> >  			in_atomic(), irqs_disabled(),
> >  			current->pid, current->comm);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 21:32                 ` Paul E. McKenney
@ 2012-05-02 21:36                   ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 21:36 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, linux-kernel, linuxppc-dev

On Wed, May 02, 2012 at 02:32:38PM -0700, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 01:49:54PM -0700, Paul E. McKenney wrote:
> > On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote:
> > > On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > > > > 
> > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > > > > Call Trace:
> > > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > 
> > > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > > the cpu may change in between the rmw's read and write: they should
> > > be using this_cpu operations (or, I put preempt_disable/enable in the
> > > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > > which was given good instructions to use; but not so well on PowerPC.
> > 
> > Thank you very much for tracking this down!!!
> > 
> > > I've been running successfully for an hour now with the patch below;
> > > but I expect you'll want to consider the tradeoffs, and may choose a
> > > different solution.
> > 
> > The thing that puzzles me about this is that the normal path through
> > the scheduler would save and restore these per-CPU variables to and
> > from the task structure.  There must be a sneak path through the
> > scheduler that I failed to account for.
> 
> Sigh...  I am slow today, I guess.  The preemption could of course
> happen between the time that the task calculated the address of the
> per-CPU variable and the time that it modified it.  If this happens,
> we are modifying some other CPU's per-CPU variable.
> 
> Given that Linus saw no performance benefit from this patchset, I am
> strongly tempted to just drop this inlinable-__rcu_read_lock() series
> at this point.
> 
> I suppose that the other option is to move preempt_count() to a
> per-CPU variable, then use the space in the task_info struct.
> But that didn't generate anywhere near as good of code...

But preempt_count() would suffer exactly the same problem.  The address
is calculated, the task moves to some other CPU, and then the task
is messing with some other CPU's preemption counter.  Blech.

							Thanx, Paul

> > But given your good work, this should be easy for me to chase down
> > even on my x86-based laptop -- just convert from __this_cpu_inc() to a
> > read-inc-delay-write sequence.  And check that the underlying variable
> > didn't change in the meantime.  And dump an ftrace if it did.  ;-)
> > 
> > Thank you again, Hugh!
> > 
> > 							Thanx, Paul
> > 
> > > Hugh
> > > 
> > > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
> > > +++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
> > > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
> > >   */
> > >  static inline void __rcu_read_lock(void)
> > >  {
> > > -	__this_cpu_inc(rcu_read_lock_nesting);
> > > +	this_cpu_inc(rcu_read_lock_nesting);
> > >  	barrier(); /* Keep code within RCU read-side critical section. */
> > >  }
> > > 
> > > --- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
> > > +++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
> > > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
> > >   */
> > >  void __rcu_read_unlock(void)
> > >  {
> > > +	preempt_disable();
> > >  	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
> > >  		__this_cpu_dec(rcu_read_lock_nesting);
> > >  	else {
> > > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
> > >  		barrier();  /* ->rcu_read_unlock_special load before assign */
> > >  		__this_cpu_write(rcu_read_lock_nesting, 0);
> > >  	}
> > > -#ifdef CONFIG_PROVE_LOCKING
> > > +#if 1 /* CONFIG_PROVE_LOCKING */
> > >  	{
> > >  		int rln = __this_cpu_read(rcu_read_lock_nesting);
> > > 
> > > -		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
> > > +		BUG_ON(rln < 0 && rln > INT_MIN / 2);
> > >  	}
> > >  #endif /* #ifdef CONFIG_PROVE_LOCKING */
> > > +	preempt_enable();
> > >  }
> > >  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> > > 
> > > --- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
> > > +++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
> > > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
> > >  {
> > >  	struct rq *rq = this_rq();
> > > 
> > > -	rcu_switch_from(prev);
> > > +	/* rcu_switch_from(prev); */
> > >  	rcu_switch_to();
> > >  	finish_task_switch(rq, prev);
> > > 
> > > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
> > >  		"BUG: sleeping function called from invalid context at %s:%d\n",
> > >  			file, line);
> > >  	printk(KERN_ERR
> > > +		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
> > > +		raw_smp_processor_id(), preempt_count(), preempt_offset,
> > > +		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
> > > +	printk(KERN_ERR
> > >  		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> > >  			in_atomic(), irqs_disabled(),
> > >  			current->pid, current->comm);
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 21:36                   ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 21:36 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linuxppc-dev, linux-kernel, Paul E. McKenney

On Wed, May 02, 2012 at 02:32:38PM -0700, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 01:49:54PM -0700, Paul E. McKenney wrote:
> > On Wed, May 02, 2012 at 01:25:30PM -0700, Hugh Dickins wrote:
> > > On Tue, 1 May 2012, Paul E. McKenney wrote:
> > > > > > > > On Mon, 2012-04-30 at 15:37 -0700, Hugh Dickins wrote:
> > > > > > > > > 
> > > > > > > > > BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> > > > > > > > > in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> > > > > > > > > Call Trace:
> > > > > > > > > [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> > > > > > > > > [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> > > > > > > > > [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> > > > > > > > > [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> > > > > > > > > [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> > > > > > > > > [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> > > 
> > > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > > the cpu may change in between the rmw's read and write: they should
> > > be using this_cpu operations (or, I put preempt_disable/enable in the
> > > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > > which was given good instructions to use; but not so well on PowerPC.
> > 
> > Thank you very much for tracking this down!!!
> > 
> > > I've been running successfully for an hour now with the patch below;
> > > but I expect you'll want to consider the tradeoffs, and may choose a
> > > different solution.
> > 
> > The thing that puzzles me about this is that the normal path through
> > the scheduler would save and restore these per-CPU variables to and
> > from the task structure.  There must be a sneak path through the
> > scheduler that I failed to account for.
> 
> Sigh...  I am slow today, I guess.  The preemption could of course
> happen between the time that the task calculated the address of the
> per-CPU variable and the time that it modified it.  If this happens,
> we are modifying some other CPU's per-CPU variable.
> 
> Given that Linus saw no performance benefit from this patchset, I am
> strongly tempted to just drop this inlinable-__rcu_read_lock() series
> at this point.
> 
> I suppose that the other option is to move preempt_count() to a
> per-CPU variable, then use the space in the task_info struct.
> But that didn't generate anywhere near as good of code...

But preempt_count() would suffer exactly the same problem.  The address
is calculated, the task moves to some other CPU, and then the task
is messing with some other CPU's preemption counter.  Blech.

							Thanx, Paul

> > But given your good work, this should be easy for me to chase down
> > even on my x86-based laptop -- just convert from __this_cpu_inc() to a
> > read-inc-delay-write sequence.  And check that the underlying variable
> > didn't change in the meantime.  And dump an ftrace if it did.  ;-)
> > 
> > Thank you again, Hugh!
> > 
> > 							Thanx, Paul
> > 
> > > Hugh
> > > 
> > > --- 3.4-rc4-next-20120427/include/linux/rcupdate.h	2012-04-28 09:26:38.000000000 -0700
> > > +++ testing/include/linux/rcupdate.h	2012-05-02 11:46:06.000000000 -0700
> > > @@ -159,7 +159,7 @@ DECLARE_PER_CPU(struct task_struct *, rc
> > >   */
> > >  static inline void __rcu_read_lock(void)
> > >  {
> > > -	__this_cpu_inc(rcu_read_lock_nesting);
> > > +	this_cpu_inc(rcu_read_lock_nesting);
> > >  	barrier(); /* Keep code within RCU read-side critical section. */
> > >  }
> > > 
> > > --- 3.4-rc4-next-20120427/kernel/rcupdate.c	2012-04-28 09:26:40.000000000 -0700
> > > +++ testing/kernel/rcupdate.c	2012-05-02 11:44:13.000000000 -0700
> > > @@ -72,6 +72,7 @@ DEFINE_PER_CPU(struct task_struct *, rcu
> > >   */
> > >  void __rcu_read_unlock(void)
> > >  {
> > > +	preempt_disable();
> > >  	if (__this_cpu_read(rcu_read_lock_nesting) != 1)
> > >  		__this_cpu_dec(rcu_read_lock_nesting);
> > >  	else {
> > > @@ -83,13 +84,14 @@ void __rcu_read_unlock(void)
> > >  		barrier();  /* ->rcu_read_unlock_special load before assign */
> > >  		__this_cpu_write(rcu_read_lock_nesting, 0);
> > >  	}
> > > -#ifdef CONFIG_PROVE_LOCKING
> > > +#if 1 /* CONFIG_PROVE_LOCKING */
> > >  	{
> > >  		int rln = __this_cpu_read(rcu_read_lock_nesting);
> > > 
> > > -		WARN_ON_ONCE(rln < 0 && rln > INT_MIN / 2);
> > > +		BUG_ON(rln < 0 && rln > INT_MIN / 2);
> > >  	}
> > >  #endif /* #ifdef CONFIG_PROVE_LOCKING */
> > > +	preempt_enable();
> > >  }
> > >  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> > > 
> > > --- 3.4-rc4-next-20120427/kernel/sched/core.c	2012-04-28 09:26:40.000000000 -0700
> > > +++ testing/kernel/sched/core.c	2012-05-01 22:40:46.000000000 -0700
> > > @@ -2024,7 +2024,7 @@ asmlinkage void schedule_tail(struct tas
> > >  {
> > >  	struct rq *rq = this_rq();
> > > 
> > > -	rcu_switch_from(prev);
> > > +	/* rcu_switch_from(prev); */
> > >  	rcu_switch_to();
> > >  	finish_task_switch(rq, prev);
> > > 
> > > @@ -7093,6 +7093,10 @@ void __might_sleep(const char *file, int
> > >  		"BUG: sleeping function called from invalid context at %s:%d\n",
> > >  			file, line);
> > >  	printk(KERN_ERR
> > > +		"cpu=%d preempt_count=%x preempt_offset=%x rcu_nesting=%x nesting_save=%x\n",
> > > +		raw_smp_processor_id(), preempt_count(), preempt_offset,
> > > +		rcu_preempt_depth(), current->rcu_read_lock_nesting_save); 
> > > +	printk(KERN_ERR
> > >  		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> > >  			in_atomic(), irqs_disabled(),
> > >  			current->pid, current->comm);
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 21:20               ` Benjamin Herrenschmidt
@ 2012-05-02 21:54                 ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 21:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Hugh Dickins, Paul E. McKenney, linux-kernel, linuxppc-dev

On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > the cpu may change in between the rmw's read and write: they should
> > be using this_cpu operations (or, I put preempt_disable/enable in the
> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > which was given good instructions to use; but not so well on PowerPC.
> > 
> > I've been running successfully for an hour now with the patch below;
> > but I expect you'll want to consider the tradeoffs, and may choose a
> > different solution.
> 
> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
> 
> I thought that was going out..

Linus did rant about __raw_get_cpu_var() because it cannot use the x86
%fs segement overrides a bit more than a month ago.  The __this_cpu
stuff is useful if you have preemption disabled -- avoids the extra
layer of preempt_disable().

Or was this a different rant?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 21:54                 ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-02 21:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul E. McKenney, linuxppc-dev, Hugh Dickins, linux-kernel

On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > the cpu may change in between the rmw's read and write: they should
> > be using this_cpu operations (or, I put preempt_disable/enable in the
> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > which was given good instructions to use; but not so well on PowerPC.
> > 
> > I've been running successfully for an hour now with the patch below;
> > but I expect you'll want to consider the tradeoffs, and may choose a
> > different solution.
> 
> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
> 
> I thought that was going out..

Linus did rant about __raw_get_cpu_var() because it cannot use the x86
%fs segement overrides a bit more than a month ago.  The __this_cpu
stuff is useful if you have preemption disabled -- avoids the extra
layer of preempt_disable().

Or was this a different rant?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 21:54                 ` Paul E. McKenney
@ 2012-05-02 22:54                   ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-02 22:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter,
	linux-kernel, linuxppc-dev

On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
>> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
>> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
>> > the cpu may change in between the rmw's read and write: they should
>> > be using this_cpu operations (or, I put preempt_disable/enable in the
>> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
>> > which was given good instructions to use; but not so well on PowerPC.
>> >
>> > I've been running successfully for an hour now with the patch below;
>> > but I expect you'll want to consider the tradeoffs, and may choose a
>> > different solution.
>>
>> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
>>
>> I thought that was going out..
>
> Linus did rant about __raw_get_cpu_var() because it cannot use the x86
> %fs segement overrides a bit more than a month ago.  The __this_cpu
> stuff is useful if you have preemption disabled -- avoids the extra
> layer of preempt_disable().
>
> Or was this a different rant?

https://lkml.org/lkml/2011/11/29/321

I think it ended up with Christoph removing the more egregious
variants, but this_cpu_that and __this_cpu_the_other remaining.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-02 22:54                   ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-02 22:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linuxppc-dev, Christoph Lameter, linux-kernel, Paul E. McKenney

On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
>> > Got it at last. =C2=A0Embarrassingly obvious. =C2=A0__rcu_read_lock() =
and
>> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
>> > the cpu may change in between the rmw's read and write: they should
>> > be using this_cpu operations (or, I put preempt_disable/enable in the
>> > __rcu_read_unlock below). =C2=A0__this_cpus there work out fine on x86=
,
>> > which was given good instructions to use; but not so well on PowerPC.
>> >
>> > I've been running successfully for an hour now with the patch below;
>> > but I expect you'll want to consider the tradeoffs, and may choose a
>> > different solution.
>>
>> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
>>
>> I thought that was going out..
>
> Linus did rant about __raw_get_cpu_var() because it cannot use the x86
> %fs segement overrides a bit more than a month ago. =C2=A0The __this_cpu
> stuff is useful if you have preemption disabled -- avoids the extra
> layer of preempt_disable().
>
> Or was this a different rant?

https://lkml.org/lkml/2011/11/29/321

I think it ended up with Christoph removing the more egregious
variants, but this_cpu_that and __this_cpu_the_other remaining.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-02 22:54                   ` Hugh Dickins
@ 2012-05-03  0:14                     ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-03  0:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter,
	linux-kernel, linuxppc-dev

On Wed, May 02, 2012 at 03:54:24PM -0700, Hugh Dickins wrote:
> On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
> >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> >> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> >> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> >> > the cpu may change in between the rmw's read and write: they should
> >> > be using this_cpu operations (or, I put preempt_disable/enable in the
> >> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> >> > which was given good instructions to use; but not so well on PowerPC.
> >> >
> >> > I've been running successfully for an hour now with the patch below;
> >> > but I expect you'll want to consider the tradeoffs, and may choose a
> >> > different solution.
> >>
> >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
> >>
> >> I thought that was going out..
> >
> > Linus did rant about __raw_get_cpu_var() because it cannot use the x86
> > %fs segement overrides a bit more than a month ago.  The __this_cpu
> > stuff is useful if you have preemption disabled -- avoids the extra
> > layer of preempt_disable().
> >
> > Or was this a different rant?
> 
> https://lkml.org/lkml/2011/11/29/321
> 
> I think it ended up with Christoph removing the more egregious
> variants, but this_cpu_that and __this_cpu_the_other remaining.

Ah, thank you for the pointer.

It would be nice to have the CPU transparency of x86 on other
architectures, but from what I can see, that would require dedicating
a register to this purpose -- and even then requires that the arch
have indexed addressing modes.  There are some other approaches, for
example, having __this_cpu_that() be located at a special address that
the scheduler treated as implicitly preempt_disable().  Or I suppose
that the arch-specific trap-handling code could fake it.  A little
bit messy, but the ability to access a given CPU's per-CPU variable
while running on that CPU does appear to have at least a couple of
uses -- inlining RCU and also making preempt_disable() use per-CPU
variables.

In any case, I must confess that I feel quite silly about my series
of patches.  I have reverted them aside from a couple that did useful
optimizations, and they should show up in -next shortly.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-03  0:14                     ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-03  0:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linuxppc-dev, Christoph Lameter, linux-kernel, Paul E. McKenney

On Wed, May 02, 2012 at 03:54:24PM -0700, Hugh Dickins wrote:
> On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
> >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> >> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> >> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> >> > the cpu may change in between the rmw's read and write: they should
> >> > be using this_cpu operations (or, I put preempt_disable/enable in the
> >> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> >> > which was given good instructions to use; but not so well on PowerPC.
> >> >
> >> > I've been running successfully for an hour now with the patch below;
> >> > but I expect you'll want to consider the tradeoffs, and may choose a
> >> > different solution.
> >>
> >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
> >>
> >> I thought that was going out..
> >
> > Linus did rant about __raw_get_cpu_var() because it cannot use the x86
> > %fs segement overrides a bit more than a month ago.  The __this_cpu
> > stuff is useful if you have preemption disabled -- avoids the extra
> > layer of preempt_disable().
> >
> > Or was this a different rant?
> 
> https://lkml.org/lkml/2011/11/29/321
> 
> I think it ended up with Christoph removing the more egregious
> variants, but this_cpu_that and __this_cpu_the_other remaining.

Ah, thank you for the pointer.

It would be nice to have the CPU transparency of x86 on other
architectures, but from what I can see, that would require dedicating
a register to this purpose -- and even then requires that the arch
have indexed addressing modes.  There are some other approaches, for
example, having __this_cpu_that() be located at a special address that
the scheduler treated as implicitly preempt_disable().  Or I suppose
that the arch-specific trap-handling code could fake it.  A little
bit messy, but the ability to access a given CPU's per-CPU variable
while running on that CPU does appear to have at least a couple of
uses -- inlining RCU and also making preempt_disable() use per-CPU
variables.

In any case, I must confess that I feel quite silly about my series
of patches.  I have reverted them aside from a couple that did useful
optimizations, and they should show up in -next shortly.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-03  0:14                     ` Paul E. McKenney
@ 2012-05-03  0:24                       ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-03  0:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter,
	linux-kernel, linuxppc-dev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2760 bytes --]

On Wed, 2 May 2012, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 03:54:24PM -0700, Hugh Dickins wrote:
> > On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
> > >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> > >> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > >> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > >> > the cpu may change in between the rmw's read and write: they should
> > >> > be using this_cpu operations (or, I put preempt_disable/enable in the
> > >> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > >> > which was given good instructions to use; but not so well on PowerPC.
> > >> >
> > >> > I've been running successfully for an hour now with the patch below;
> > >> > but I expect you'll want to consider the tradeoffs, and may choose a
> > >> > different solution.
> > >>
> > >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
> > >>
> > >> I thought that was going out..
> > >
> > > Linus did rant about __raw_get_cpu_var() because it cannot use the x86
> > > %fs segement overrides a bit more than a month ago.  The __this_cpu
> > > stuff is useful if you have preemption disabled -- avoids the extra
> > > layer of preempt_disable().
> > >
> > > Or was this a different rant?
> > 
> > https://lkml.org/lkml/2011/11/29/321
> > 
> > I think it ended up with Christoph removing the more egregious
> > variants, but this_cpu_that and __this_cpu_the_other remaining.
> 
> Ah, thank you for the pointer.
> 
> It would be nice to have the CPU transparency of x86 on other
> architectures, but from what I can see, that would require dedicating
> a register to this purpose -- and even then requires that the arch
> have indexed addressing modes.  There are some other approaches, for
> example, having __this_cpu_that() be located at a special address that
> the scheduler treated as implicitly preempt_disable().  Or I suppose
> that the arch-specific trap-handling code could fake it.  A little
> bit messy, but the ability to access a given CPU's per-CPU variable
> while running on that CPU does appear to have at least a couple of
> uses -- inlining RCU and also making preempt_disable() use per-CPU
> variables.
> 
> In any case, I must confess that I feel quite silly about my series
> of patches.  I have reverted them aside from a couple that did useful
> optimizations, and they should show up in -next shortly.

A wee bit sad, but thank you - it was an experiment worth trying,
and perhaps there will be reason to come back to it future.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-03  0:24                       ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-03  0:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linuxppc-dev, Christoph Lameter, linux-kernel, Paul E. McKenney

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2760 bytes --]

On Wed, 2 May 2012, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 03:54:24PM -0700, Hugh Dickins wrote:
> > On Wed, May 2, 2012 at 2:54 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Thu, May 03, 2012 at 07:20:15AM +1000, Benjamin Herrenschmidt wrote:
> > >> On Wed, 2012-05-02 at 13:25 -0700, Hugh Dickins wrote:
> > >> > Got it at last.  Embarrassingly obvious.  __rcu_read_lock() and
> > >> > __rcu_read_unlock() are not safe to be using __this_cpu operations,
> > >> > the cpu may change in between the rmw's read and write: they should
> > >> > be using this_cpu operations (or, I put preempt_disable/enable in the
> > >> > __rcu_read_unlock below).  __this_cpus there work out fine on x86,
> > >> > which was given good instructions to use; but not so well on PowerPC.
> > >> >
> > >> > I've been running successfully for an hour now with the patch below;
> > >> > but I expect you'll want to consider the tradeoffs, and may choose a
> > >> > different solution.
> > >>
> > >> Didn't Linus recently rant about these __this_cpu vs this_cpu nonsense ?
> > >>
> > >> I thought that was going out..
> > >
> > > Linus did rant about __raw_get_cpu_var() because it cannot use the x86
> > > %fs segement overrides a bit more than a month ago.  The __this_cpu
> > > stuff is useful if you have preemption disabled -- avoids the extra
> > > layer of preempt_disable().
> > >
> > > Or was this a different rant?
> > 
> > https://lkml.org/lkml/2011/11/29/321
> > 
> > I think it ended up with Christoph removing the more egregious
> > variants, but this_cpu_that and __this_cpu_the_other remaining.
> 
> Ah, thank you for the pointer.
> 
> It would be nice to have the CPU transparency of x86 on other
> architectures, but from what I can see, that would require dedicating
> a register to this purpose -- and even then requires that the arch
> have indexed addressing modes.  There are some other approaches, for
> example, having __this_cpu_that() be located at a special address that
> the scheduler treated as implicitly preempt_disable().  Or I suppose
> that the arch-specific trap-handling code could fake it.  A little
> bit messy, but the ability to access a given CPU's per-CPU variable
> while running on that CPU does appear to have at least a couple of
> uses -- inlining RCU and also making preempt_disable() use per-CPU
> variables.
> 
> In any case, I must confess that I feel quite silly about my series
> of patches.  I have reverted them aside from a couple that did useful
> optimizations, and they should show up in -next shortly.

A wee bit sad, but thank you - it was an experiment worth trying,
and perhaps there will be reason to come back to it future.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-03  0:24                       ` Hugh Dickins
@ 2012-05-07 16:21                         ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-07 16:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter,
	linux-kernel, linuxppc-dev

On Wed, 2 May 2012, Hugh Dickins wrote:
> On Wed, 2 May 2012, Paul E. McKenney wrote:
> > 
> > In any case, I must confess that I feel quite silly about my series
> > of patches.  I have reverted them aside from a couple that did useful
> > optimizations, and they should show up in -next shortly.
> 
> A wee bit sad, but thank you - it was an experiment worth trying,
> and perhaps there will be reason to come back to it future.

The revert indeed showed up in next-20120504: thanks, no problem now.

But although it's just history, and not worth anyone's time to
investigate, I shouldn't let this thread die without an epilogue.

Although the patch I posted (this_cpu_inc in __rcu_read_lock,
preempt_disable and enable in __rcu_read_unlock) ran well until
I killed the test after 70 hours, it did not _entirely_ eliminate
the sleeping function BUG messages.

In 70 hours I got six isolated messages like the below (but from
different __might_sleep callsites) - where before I'd have flurries
of hundreds(?) and freeze within the hour.

And the "rcu_nesting" debug line I'd added to the message was different:
where before it was showing ffffffff on some tasks and 1 on others i.e.
increment or decrement had been applied to the wrong task, these messages
now all showed 0s throughout i.e. by the time the message was printed,
there was no longer any justification for the message.

As if a memory barrier were missing somewhere, perhaps.

BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305
cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0
in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1
Call Trace:
[c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable)
[c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170
[c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664
[c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-07 16:21                         ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-07 16:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linuxppc-dev, Christoph Lameter, linux-kernel, Paul E. McKenney

On Wed, 2 May 2012, Hugh Dickins wrote:
> On Wed, 2 May 2012, Paul E. McKenney wrote:
> > 
> > In any case, I must confess that I feel quite silly about my series
> > of patches.  I have reverted them aside from a couple that did useful
> > optimizations, and they should show up in -next shortly.
> 
> A wee bit sad, but thank you - it was an experiment worth trying,
> and perhaps there will be reason to come back to it future.

The revert indeed showed up in next-20120504: thanks, no problem now.

But although it's just history, and not worth anyone's time to
investigate, I shouldn't let this thread die without an epilogue.

Although the patch I posted (this_cpu_inc in __rcu_read_lock,
preempt_disable and enable in __rcu_read_unlock) ran well until
I killed the test after 70 hours, it did not _entirely_ eliminate
the sleeping function BUG messages.

In 70 hours I got six isolated messages like the below (but from
different __might_sleep callsites) - where before I'd have flurries
of hundreds(?) and freeze within the hour.

And the "rcu_nesting" debug line I'd added to the message was different:
where before it was showing ffffffff on some tasks and 1 on others i.e.
increment or decrement had been applied to the wrong task, these messages
now all showed 0s throughout i.e. by the time the message was printed,
there was no longer any justification for the message.

As if a memory barrier were missing somewhere, perhaps.

BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305
cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0
in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1
Call Trace:
[c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable)
[c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170
[c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664
[c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-07 16:21                         ` Hugh Dickins
@ 2012-05-07 18:50                           ` Paul E. McKenney
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-07 18:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter,
	linux-kernel, linuxppc-dev

On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> On Wed, 2 May 2012, Hugh Dickins wrote:
> > On Wed, 2 May 2012, Paul E. McKenney wrote:
> > > 
> > > In any case, I must confess that I feel quite silly about my series
> > > of patches.  I have reverted them aside from a couple that did useful
> > > optimizations, and they should show up in -next shortly.
> > 
> > A wee bit sad, but thank you - it was an experiment worth trying,
> > and perhaps there will be reason to come back to it future.
> 
> The revert indeed showed up in next-20120504: thanks, no problem now.
> 
> But although it's just history, and not worth anyone's time to
> investigate, I shouldn't let this thread die without an epilogue.
> 
> Although the patch I posted (this_cpu_inc in __rcu_read_lock,
> preempt_disable and enable in __rcu_read_unlock) ran well until
> I killed the test after 70 hours, it did not _entirely_ eliminate
> the sleeping function BUG messages.
> 
> In 70 hours I got six isolated messages like the below (but from
> different __might_sleep callsites) - where before I'd have flurries
> of hundreds(?) and freeze within the hour.
> 
> And the "rcu_nesting" debug line I'd added to the message was different:
> where before it was showing ffffffff on some tasks and 1 on others i.e.
> increment or decrement had been applied to the wrong task, these messages
> now all showed 0s throughout i.e. by the time the message was printed,
> there was no longer any justification for the message.
> 
> As if a memory barrier were missing somewhere, perhaps.

These fields should be updated only by the corresponding CPU, so
if memory barriers are needed, it seems to me that the cross-CPU
access is the bug, not the lack of a memory barrier.

Ah...  Is preemption disabled across the access to RCU's nesting level
when printing out the message?  If not, a preeemption at that point
could result in the value printed being inaccurate.

							Thanx, Paul

> BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305
> cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0
> in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1
> Call Trace:
> [c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable)
> [c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170
> [c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664
> [c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> Hugh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-07 18:50                           ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2012-05-07 18:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linuxppc-dev, Christoph Lameter, linux-kernel, Paul E. McKenney

On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> On Wed, 2 May 2012, Hugh Dickins wrote:
> > On Wed, 2 May 2012, Paul E. McKenney wrote:
> > > 
> > > In any case, I must confess that I feel quite silly about my series
> > > of patches.  I have reverted them aside from a couple that did useful
> > > optimizations, and they should show up in -next shortly.
> > 
> > A wee bit sad, but thank you - it was an experiment worth trying,
> > and perhaps there will be reason to come back to it future.
> 
> The revert indeed showed up in next-20120504: thanks, no problem now.
> 
> But although it's just history, and not worth anyone's time to
> investigate, I shouldn't let this thread die without an epilogue.
> 
> Although the patch I posted (this_cpu_inc in __rcu_read_lock,
> preempt_disable and enable in __rcu_read_unlock) ran well until
> I killed the test after 70 hours, it did not _entirely_ eliminate
> the sleeping function BUG messages.
> 
> In 70 hours I got six isolated messages like the below (but from
> different __might_sleep callsites) - where before I'd have flurries
> of hundreds(?) and freeze within the hour.
> 
> And the "rcu_nesting" debug line I'd added to the message was different:
> where before it was showing ffffffff on some tasks and 1 on others i.e.
> increment or decrement had been applied to the wrong task, these messages
> now all showed 0s throughout i.e. by the time the message was printed,
> there was no longer any justification for the message.
> 
> As if a memory barrier were missing somewhere, perhaps.

These fields should be updated only by the corresponding CPU, so
if memory barriers are needed, it seems to me that the cross-CPU
access is the bug, not the lack of a memory barrier.

Ah...  Is preemption disabled across the access to RCU's nesting level
when printing out the message?  If not, a preeemption at that point
could result in the value printed being inaccurate.

							Thanx, Paul

> BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305
> cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0
> in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1
> Call Trace:
> [c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable)
> [c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170
> [c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664
> [c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> Hugh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
  2012-05-07 18:50                           ` Paul E. McKenney
@ 2012-05-07 21:38                             ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-07 21:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Benjamin Herrenschmidt, Paul E. McKenney, Christoph Lameter,
	linux-kernel, linuxppc-dev

On Mon, 7 May 2012, Paul E. McKenney wrote:
> On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> > 
> > In 70 hours I got six isolated messages like the below (but from
> > different __might_sleep callsites) - where before I'd have flurries
> > of hundreds(?) and freeze within the hour.
> > 
> > And the "rcu_nesting" debug line I'd added to the message was different:
> > where before it was showing ffffffff on some tasks and 1 on others i.e.
> > increment or decrement had been applied to the wrong task, these messages
> > now all showed 0s throughout i.e. by the time the message was printed,
> > there was no longer any justification for the message.
> > 
> > As if a memory barrier were missing somewhere, perhaps.
> 
> These fields should be updated only by the corresponding CPU, so
> if memory barriers are needed, it seems to me that the cross-CPU
> access is the bug, not the lack of a memory barrier.

Yes: the code you added appeared to be using local CPU accesses only
(very much intentionally), and the context switch should already have
provided all the memory barriers needed there.

> 
> Ah...  Is preemption disabled across the access to RCU's nesting level
> when printing out the message?  If not, a preeemption at that point
> could result in the value printed being inaccurate.

Preemption was enabled in the cases I saw.  So you're pointing out that
#define rcu_preempt_depth() (__this_cpu_read(rcu_read_lock_nesting))
should have been
#define rcu_preempt_depth() (this_cpu_read(rcu_read_lock_nesting))
to avoid the danger of spurious __might_sleep() warnings.

Yes, I believe you've got it - thanks.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
@ 2012-05-07 21:38                             ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2012-05-07 21:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linuxppc-dev, Christoph Lameter, linux-kernel, Paul E. McKenney

On Mon, 7 May 2012, Paul E. McKenney wrote:
> On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> > 
> > In 70 hours I got six isolated messages like the below (but from
> > different __might_sleep callsites) - where before I'd have flurries
> > of hundreds(?) and freeze within the hour.
> > 
> > And the "rcu_nesting" debug line I'd added to the message was different:
> > where before it was showing ffffffff on some tasks and 1 on others i.e.
> > increment or decrement had been applied to the wrong task, these messages
> > now all showed 0s throughout i.e. by the time the message was printed,
> > there was no longer any justification for the message.
> > 
> > As if a memory barrier were missing somewhere, perhaps.
> 
> These fields should be updated only by the corresponding CPU, so
> if memory barriers are needed, it seems to me that the cross-CPU
> access is the bug, not the lack of a memory barrier.

Yes: the code you added appeared to be using local CPU accesses only
(very much intentionally), and the context switch should already have
provided all the memory barriers needed there.

> 
> Ah...  Is preemption disabled across the access to RCU's nesting level
> when printing out the message?  If not, a preeemption at that point
> could result in the value printed being inaccurate.

Preemption was enabled in the cases I saw.  So you're pointing out that
#define rcu_preempt_depth() (__this_cpu_read(rcu_read_lock_nesting))
should have been
#define rcu_preempt_depth() (this_cpu_read(rcu_read_lock_nesting))
to avoid the danger of spurious __might_sleep() warnings.

Yes, I believe you've got it - thanks.

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2012-05-07 21:38 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-30 22:37 linux-next ppc64: RCU mods cause __might_sleep BUGs Hugh Dickins
2012-04-30 22:37 ` Hugh Dickins
2012-04-30 23:14 ` Paul E. McKenney
2012-04-30 23:14   ` Paul E. McKenney
2012-05-01  0:33 ` Benjamin Herrenschmidt
2012-05-01  0:33   ` Benjamin Herrenschmidt
2012-05-01  5:10   ` Hugh Dickins
2012-05-01  5:10     ` Hugh Dickins
2012-05-01 14:22     ` Paul E. McKenney
2012-05-01 14:22       ` Paul E. McKenney
2012-05-01 21:42       ` Hugh Dickins
2012-05-01 21:42         ` Hugh Dickins
2012-05-01 23:25         ` Paul E. McKenney
2012-05-01 23:25           ` Paul E. McKenney
2012-05-02 20:25           ` Hugh Dickins
2012-05-02 20:25             ` Hugh Dickins
2012-05-02 20:49             ` Paul E. McKenney
2012-05-02 20:49               ` Paul E. McKenney
2012-05-02 21:32               ` Paul E. McKenney
2012-05-02 21:32                 ` Paul E. McKenney
2012-05-02 21:36                 ` Paul E. McKenney
2012-05-02 21:36                   ` Paul E. McKenney
2012-05-02 21:20             ` Benjamin Herrenschmidt
2012-05-02 21:20               ` Benjamin Herrenschmidt
2012-05-02 21:54               ` Paul E. McKenney
2012-05-02 21:54                 ` Paul E. McKenney
2012-05-02 22:54                 ` Hugh Dickins
2012-05-02 22:54                   ` Hugh Dickins
2012-05-03  0:14                   ` Paul E. McKenney
2012-05-03  0:14                     ` Paul E. McKenney
2012-05-03  0:24                     ` Hugh Dickins
2012-05-03  0:24                       ` Hugh Dickins
2012-05-07 16:21                       ` Hugh Dickins
2012-05-07 16:21                         ` Hugh Dickins
2012-05-07 18:50                         ` Paul E. McKenney
2012-05-07 18:50                           ` Paul E. McKenney
2012-05-07 21:38                           ` Hugh Dickins
2012-05-07 21:38                             ` Hugh Dickins
2012-05-01 13:39   ` Paul E. McKenney
2012-05-01 13:39     ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.