linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Michael Neuling <mikey@neuling.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Darren Hart <dvhltc@us.ibm.com>,
	Gautham R Shenoy <ego@in.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	linuxppc-dev@ozlabs.org, Will Schmidt <willschm@us.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Ankita Garg <ankita@in.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
Date: Fri, 03 Sep 2010 09:04:53 +1000	[thread overview]
Message-ID: <30721.1283468693@neuling.org> (raw)
In-Reply-To: <1283400367.2356.69.camel@gandalf.stny.rr.com>



In message <1283400367.2356.69.camel@gandalf.stny.rr.com> you wrote:
> On Thu, 2010-09-02 at 11:02 +1000, Michael Neuling wrote:
> 
> > We need to call smp_startup_cpu on boot when we the cpus are still in
> > FW.  smp_startup_cpu does this for us on boot.
> > 
> > I'm wondering if we just need to move the test down a bit to make sure
> > the preempt_count is set.  I've not been following this thread, but
> > maybe this might work?
> 
> Egad no! Setting the preempt_count to zero _is_ the bug. I think Darren
> even said that adding the exit prevented the bug (although now he's
> hitting a hard lockup someplace else). The original code he was using
> did not have the condition to return for kexec. It was just a
> coincidence that this code helped in bringing a CPU back online.
> 
> > 
> > Untested patch below...
> > 
> > Mikey
> > 
> > diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/
pseries/smp.c
> > index 0317cce..3afaba4 100644
> > --- a/arch/powerpc/platforms/pseries/smp.c
> > +++ b/arch/powerpc/platforms/pseries/smp.c
> > @@ -104,18 +104,18 @@ static inline int __devinit smp_startup_cpu(unsigned 
int lcpu)
> >  
> >  	pcpu = get_hard_smp_processor_id(lcpu);
> >  
> > -	/* Check to see if the CPU out of FW already for kexec */
> > -	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> > -		cpumask_set_cpu(lcpu, of_spin_mask);
> > -		return 1;
> > -	}
> > -
> >  	/* Fixup atomic count: it exited inside IRQ handler. */
> >  	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
> 
> We DON'T want to do the above. It's nasty! This is one CPU's task
> touching an intimate part of another CPU's task. It's equivalent of me
> putting my hand down you wife's blouse. It's offensive, and rude.
> 
> OK, if the CPU was never online, then you can do what you want. But what
> we see is that this fails on CPU hotplug.  You stop a CPU, and it goes
> into this cede_processor() call. When you wake it up, suddenly the task
> on that woken CPU has its preempt count fscked up.  This was really
> really hard to debug. We thought it was stack corruption or something.
> But it ended up being that this code has one CPU touching the breasts of
> another CPU. This code is a pervert!
> 
> What the trace clearly showed, was that we take down a CPU, and in doing
> so, the code on that CPU set the preempt count to 1, and it expected to
> have it as 1 when it returned. But the code that kicked started this CPU
> back to life (bring the CPU back online), set the preempt count on the
> task of that CPU to 0, and screwed everything up.

/me goes to checks where this came from...

It's been in the kernel since hotplug CPU support was added to ppc64
back in 2004, so I guess we are all at fault for letting this pervert
get away with this stuff for so long in plain sight. :-)

So I guess we should remove this but we need to audit all the different
paths that go through here to make sure they are OK with preempt.
Normal boot, kexec boot, hotplug with FW stop and hotplug with
extended_cede all hit this.

Mikey

  parent reply	other threads:[~2010-09-02 23:04 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-22 18:24 [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries Darren Hart
2010-07-22 18:36 ` Darren Hart
2010-07-22 18:38 ` Thomas Gleixner
2010-08-10 22:36   ` Darren Hart
2010-07-22 22:25 ` Benjamin Herrenschmidt
2010-07-22 23:57   ` Darren Hart
2010-07-23  4:44     ` Darren Hart
2010-07-23  5:08       ` Vaidyanathan Srinivasan
2010-07-23  5:11         ` Benjamin Herrenschmidt
2010-07-23  7:07           ` Vaidyanathan Srinivasan
2010-08-05  4:45             ` Darren Hart
2010-08-05 11:06               ` Vaidyanathan Srinivasan
2010-08-05 12:26                 ` Thomas Gleixner
2010-07-23  5:09       ` Benjamin Herrenschmidt
2010-08-06  2:19         ` Darren Hart
2010-08-06  5:09           ` Vaidyanathan Srinivasan
2010-08-06  7:13             ` Darren Hart
2010-07-23 14:39     ` Will Schmidt
2010-08-04 13:44   ` Darren Hart
2010-08-19 15:58 ` Ankita Garg
2010-08-19 18:58   ` Will Schmidt
2010-08-23 22:20   ` Darren Hart
2010-08-31  7:12   ` Darren Hart
2010-09-01  5:54     ` Michael Ellerman
2010-09-01 15:10       ` Darren Hart
2010-09-01 18:47         ` Darren Hart
2010-09-01 19:59           ` Steven Rostedt
2010-09-01 20:42             ` Darren Hart
2010-09-02  1:02               ` Michael Neuling
2010-09-02  4:06                 ` Steven Rostedt
2010-09-02  6:04                   ` Darren Hart
2010-09-03 20:10                     ` Will Schmidt
2010-09-02 23:04                   ` Michael Neuling [this message]
2010-09-03  0:08                     ` Darren Hart
2010-09-02  3:46           ` Michael Neuling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=30721.1283468693@neuling.org \
    --to=mikey@neuling.org \
    --cc=ankita@in.ibm.com \
    --cc=dvhltc@us.ibm.com \
    --cc=ego@in.ibm.com \
    --cc=josh@joshtriplett.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=paulus@samba.org \
    --cc=rostedt@goodmis.org \
    --cc=sfr@canb.auug.org.au \
    --cc=tglx@linutronix.de \
    --cc=willschm@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).