From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Hugh Dickins <hughd@google.com>
Cc: "Paul E. McKenney" <paul.mckenney@linaro.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
Date: Mon, 30 Apr 2012 16:14:33 -0700
Message-ID: <20120430231433.GO2429@linux.vnet.ibm.com>
In-Reply-To: <alpine.LSU.2.00.1204301451270.2986@eggly.anvils>

On Mon, Apr 30, 2012 at 03:37:10PM -0700, Hugh Dickins wrote:
> Hi Paul,
> 
> On 3.4.0-rc4-next-20120427 and preceding linux-nexts (I've not tried
> rc5-next-20120430 but expect it's the same), on PowerPC G5 quad with
> CONFIG_PREEMPT=y and CONFIG_DEBUG_ATOMIC_SLEEP=y, I'm getting spurious
> "BUG: sleeping function called from invalid context" messages from
> __might_sleep().
> 
> Just once I saw such a message during startup.  Once I saw such a
> message when rebuilding the machine's kernel.  Usually I see them
> when I'm running a swapping load of kernel builds under memory
> pressure (but that's what I'm habitually running there): perhaps
> after a few minutes a flurry comes, then goes away, comes back
> again later, and after perhaps a couple of hours of that I see
> "INFO: rcu_preempt detected stalls" messages too, and soon it
> freezes (or perhaps it's still running, but I'm so flooded by
> messages that I reboot anyway).
> 
> Rather like from before you fixed schedule_tail() for your per-cpu
> RCU mods, but not so easy to reproduce.  I did a bisection and indeed
> it converged as expected on the RCU changes.  No such problem seen on
> x86: it looks as if there's some further tweak required on PowerPC.
> 
> Here are my RCU config options (I don't usually have the TORTURE_TEST
> in, but tried that for half an hour this morning, in the hope that it
> would generate the issue: but it did not).
> 
> # RCU Subsystem
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_FANOUT=64
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_TREE_RCU_TRACE=y
> # CONFIG_RCU_BOOST is not set
> CONFIG_HAVE_RCU_TABLE_FREE=y
> # CONFIG_SPARSE_RCU_POINTER is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> # CONFIG_RCU_CPU_STALL_VERBOSE is not set
> # CONFIG_RCU_CPU_STALL_INFO is not set
> CONFIG_RCU_TRACE=y
> 
> Here's the message when I was rebuilding the G5's kernel:
> 
> BUG: sleeping function called from invalid context at include/linux/pagemap.h:354
> in_atomic(): 0, irqs_disabled(): 0, pid: 6886, name: cc1
> Call Trace:
> [c0000001a99f78e0] [c00000000000f34c] .show_stack+0x6c/0x16c (unreliable)
> [c0000001a99f7990] [c000000000077b40] .__might_sleep+0x11c/0x134
> [c0000001a99f7a10] [c0000000000c6228] .filemap_fault+0x1fc/0x494
> [c0000001a99f7af0] [c0000000000e7c9c] .__do_fault+0x120/0x684
> [c0000001a99f7c00] [c000000000025790] .do_page_fault+0x458/0x664
> [c0000001a99f7e30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> I've plenty more examples, most of them from page faults or from kswapd;
> but I don't think there's any more useful information in them.
> 
> Anything I can try later on?

Interesting...  As you say, I saw this sort of thing before applying
the changes to schedule_tail(), and it is all too possible that there
is some other "sneak path" for context switches.

Have you tried running with CONFIG_PROVE_RCU?  This enables some
additional debugging in rcu_switch_from() and rcu_switch_to() that
helped track down the schedule_tail() problem.

						Thanx, Paul

