All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
Date: Mon, 7 May 2012 14:38:24 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.00.1205071421130.1920@eggly.anvils> (raw)
In-Reply-To: <20120507185017.GA21152@linux.vnet.ibm.com>

On Mon, 7 May 2012, Paul E. McKenney wrote:
> On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> > 
> > In 70 hours I got six isolated messages like the below (but from
> > different __might_sleep callsites) - where before I'd have flurries
> > of hundreds(?) and freeze within the hour.
> > 
> > And the "rcu_nesting" debug line I'd added to the message was different:
> > where before it was showing ffffffff on some tasks and 1 on others i.e.
> > increment or decrement had been applied to the wrong task, these messages
> > now all showed 0s throughout i.e. by the time the message was printed,
> > there was no longer any justification for the message.
> > 
> > As if a memory barrier were missing somewhere, perhaps.
> 
> These fields should be updated only by the corresponding CPU, so
> if memory barriers are needed, it seems to me that the cross-CPU
> access is the bug, not the lack of a memory barrier.

Yes: the code you added appeared to be using local CPU accesses only
(very much intentionally), and the context switch should already have
provided all the memory barriers needed there.

> 
> Ah...  Is preemption disabled across the access to RCU's nesting level
> when printing out the message?  If not, a preeemption at that point
> could result in the value printed being inaccurate.

Preemption was enabled in the cases I saw.  So you're pointing out that
#define rcu_preempt_depth() (__this_cpu_read(rcu_read_lock_nesting))
should have been
#define rcu_preempt_depth() (this_cpu_read(rcu_read_lock_nesting))
to avoid the danger of spurious __might_sleep() warnings.

Yes, I believe you've got it - thanks.

Hugh

WARNING: multiple messages have this Message-ID (diff)
From: Hugh Dickins <hughd@google.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	"Paul E. McKenney" <paul.mckenney@linaro.org>
Subject: Re: linux-next ppc64: RCU mods cause __might_sleep BUGs
Date: Mon, 7 May 2012 14:38:24 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.00.1205071421130.1920@eggly.anvils> (raw)
In-Reply-To: <20120507185017.GA21152@linux.vnet.ibm.com>

On Mon, 7 May 2012, Paul E. McKenney wrote:
> On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> > 
> > In 70 hours I got six isolated messages like the below (but from
> > different __might_sleep callsites) - where before I'd have flurries
> > of hundreds(?) and freeze within the hour.
> > 
> > And the "rcu_nesting" debug line I'd added to the message was different:
> > where before it was showing ffffffff on some tasks and 1 on others i.e.
> > increment or decrement had been applied to the wrong task, these messages
> > now all showed 0s throughout i.e. by the time the message was printed,
> > there was no longer any justification for the message.
> > 
> > As if a memory barrier were missing somewhere, perhaps.
> 
> These fields should be updated only by the corresponding CPU, so
> if memory barriers are needed, it seems to me that the cross-CPU
> access is the bug, not the lack of a memory barrier.

Yes: the code you added appeared to be using local CPU accesses only
(very much intentionally), and the context switch should already have
provided all the memory barriers needed there.

> 
> Ah...  Is preemption disabled across the access to RCU's nesting level
> when printing out the message?  If not, a preeemption at that point
> could result in the value printed being inaccurate.

Preemption was enabled in the cases I saw.  So you're pointing out that
#define rcu_preempt_depth() (__this_cpu_read(rcu_read_lock_nesting))
should have been
#define rcu_preempt_depth() (this_cpu_read(rcu_read_lock_nesting))
to avoid the danger of spurious __might_sleep() warnings.

Yes, I believe you've got it - thanks.

Hugh

  reply	other threads:[~2012-05-07 21:38 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-30 22:37 linux-next ppc64: RCU mods cause __might_sleep BUGs Hugh Dickins
2012-04-30 22:37 ` Hugh Dickins
2012-04-30 23:14 ` Paul E. McKenney
2012-04-30 23:14   ` Paul E. McKenney
2012-05-01  0:33 ` Benjamin Herrenschmidt
2012-05-01  0:33   ` Benjamin Herrenschmidt
2012-05-01  5:10   ` Hugh Dickins
2012-05-01  5:10     ` Hugh Dickins
2012-05-01 14:22     ` Paul E. McKenney
2012-05-01 14:22       ` Paul E. McKenney
2012-05-01 21:42       ` Hugh Dickins
2012-05-01 21:42         ` Hugh Dickins
2012-05-01 23:25         ` Paul E. McKenney
2012-05-01 23:25           ` Paul E. McKenney
2012-05-02 20:25           ` Hugh Dickins
2012-05-02 20:25             ` Hugh Dickins
2012-05-02 20:49             ` Paul E. McKenney
2012-05-02 20:49               ` Paul E. McKenney
2012-05-02 21:32               ` Paul E. McKenney
2012-05-02 21:32                 ` Paul E. McKenney
2012-05-02 21:36                 ` Paul E. McKenney
2012-05-02 21:36                   ` Paul E. McKenney
2012-05-02 21:20             ` Benjamin Herrenschmidt
2012-05-02 21:20               ` Benjamin Herrenschmidt
2012-05-02 21:54               ` Paul E. McKenney
2012-05-02 21:54                 ` Paul E. McKenney
2012-05-02 22:54                 ` Hugh Dickins
2012-05-02 22:54                   ` Hugh Dickins
2012-05-03  0:14                   ` Paul E. McKenney
2012-05-03  0:14                     ` Paul E. McKenney
2012-05-03  0:24                     ` Hugh Dickins
2012-05-03  0:24                       ` Hugh Dickins
2012-05-07 16:21                       ` Hugh Dickins
2012-05-07 16:21                         ` Hugh Dickins
2012-05-07 18:50                         ` Paul E. McKenney
2012-05-07 18:50                           ` Paul E. McKenney
2012-05-07 21:38                           ` Hugh Dickins [this message]
2012-05-07 21:38                             ` Hugh Dickins
2012-05-01 13:39   ` Paul E. McKenney
2012-05-01 13:39     ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.00.1205071421130.1920@eggly.anvils \
    --to=hughd@google.com \
    --cc=benh@kernel.crashing.org \
    --cc=cl@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paul.mckenney@linaro.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.