From: Nicholas Mc Guire <der.herr@hofr.at>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
waiman.long@hp.com, peterz@infradead.org,
raghavendra.kt@linux.vnet.ibm.com
Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9
Date: Thu, 12 Feb 2015 20:10:09 +0100
Message-ID: <20150212191009.GA26275@opentech.at>
In-Reply-To: <20150212174144.GA21714@redhat.com>
On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> On 02/12, Oleg Nesterov wrote:
> > On 02/11, Davidlohr Bueso wrote:
> > >
> > > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > > hit the following on x86_64. This happened after about 15 minutes of
> > > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > > on v3.19. I will check reproducibility and try to narrow it down.
> > > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > > >
> > > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > > >
> > > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > > [ 837.287013] Call Trace:
> > > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> > >
> > > We did have some recent changes in completions:
> > >
> > > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> > >
> > > The second one being more related (although both appear to make sense).
> > > Perhaps some subtle implication in the completion_done side that
> > > disappeared with the spinlock?
> >
> > At first glance both changes look suspicious.
>
> No, sorry, only the 2nd one.
>
> > Unless at least document how
> > you can use these helpers.
> >
> > Consider this code:
> >
> > void xxx(void)
> > {
> > 	struct completion c;
> >
> > 	init_completion(&c);
> >
> > 	expose_this_completion(&c);
> >
> > 	while (!completion_done(&c))
> > 		schedule_timeout_uninterruptible(1);
But would that really break due to the change? Even if completion_done() had a
problem, it does not consume x->done; it only checks it.
> > }
> >
> > Before that change this code was correct, now it is not. Hmm and note that
> > this is what stop_machine_from_inactive_cpu() does although I do not know
> > if this is related or not.
> >
> > Because completion_done() can now race with complete(), the final
> > spin_unlock() can write to the memory after it was freed/reused. In this
> > case it can write to the stack after return.
> >
> > Add CC's.
>
> Nicholas, don't we need something like below?
>
> Oleg.
>
>
> --- x/kernel/sched/completion.c
> +++ x/kernel/sched/completion.c
> @@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
>  	 * first without taking the lock so we can
>  	 * return early in the blocking case.
>  	 */
> -	if (!ACCESS_ONCE(x->done))
> +	if (!READ_ONCE(x->done))
>  		return 0;
>
>  	spin_lock_irqsave(&x->wait.lock, flags);
> @@ -297,6 +297,11 @@ EXPORT_SYMBOL(try_wait_for_completion);
>   */
>  bool completion_done(struct completion *x)
>  {
> -	return !!ACCESS_ONCE(x->done);
> +	if (!READ_ONCE(x->done))
> +		return false;
> +
> +	smp_rmb();
> +	spin_unlock_wait(&x->wait.lock);
> +	return true;
What would be the sense of the spin_unlock_wait() here? All you are
interested in here is the state of x->done.
Regarding the smp_rmb(): where would the counterpart to that be?
I am looking at it and trying to see how the changes could cause your oops,
but I don't see it yet.
thx!
hofrat
Thread overview: 23+ messages
2015-02-12 0:34 BUG: spinlock bad magic on CPU#0, migration/0/9 Paul E. McKenney
2015-02-12 3:15 ` Davidlohr Bueso
2015-02-12 3:43 ` Paul E. McKenney
2015-02-12 17:28 ` Oleg Nesterov
2015-02-12 17:41 ` Oleg Nesterov
2015-02-12 17:58 ` Davidlohr Bueso
2015-02-12 19:10 ` Nicholas Mc Guire [this message]
2015-02-12 19:37 ` Oleg Nesterov
2015-02-12 21:27 ` Oleg Nesterov
2015-02-13 18:17 ` Nicholas Mc Guire
2015-02-13 18:53 ` Oleg Nesterov
2015-02-14 8:35 ` Nicholas Mc Guire
2015-02-14 14:00 ` Oleg Nesterov
2015-02-12 19:59 ` Davidlohr Bueso
2015-02-12 19:32 ` Nicholas Mc Guire
2015-02-12 19:39 ` Oleg Nesterov
2015-02-12 19:59 ` [PATCH] sched/completion: completion_done() should serialize with complete() Oleg Nesterov
2015-02-13 21:09 ` Paul E. McKenney
2015-02-13 21:56 ` Davidlohr Bueso
2015-02-13 22:02 ` Davidlohr Bueso
2015-02-16 8:21 ` Peter Zijlstra
2015-02-16 16:51 ` Oleg Nesterov
2015-02-18 17:06 ` [tip:sched/core] sched/completion: Serialize completion_done() " tip-bot for Oleg Nesterov