LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Dave Hansen <dave@sr71.net>, Borislav Petkov <bp@alien8.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Jones <davej@redhat.com>,
	dhillf@gmail.com, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu
Date: Thu, 11 Apr 2013 14:54:13 +0200 (CEST)
Message-ID: <alpine.LFD.2.02.1304111452140.21884@ionos> (raw)
In-Reply-To: <5166A1D0.1080102@linux.vnet.ibm.com>

On Thu, 11 Apr 2013, Srivatsa S. Bhat wrote:

> On 04/11/2013 04:18 PM, Srivatsa S. Bhat wrote:
> > On 04/11/2013 03:49 PM, Thomas Gleixner wrote:
> >> Dave,
> >>
> >> On Wed, 10 Apr 2013, Dave Hansen wrote:
> >>
> >>> I think I got a full trace this time:
> >>>
> >>> 	http://sr71.net/~dave/linux/bigbox-trace.1365621899.txt.gz
> >>>
> >>> The last timestamp is pretty close to the timestamp on the console:
> >>>
> >>> [ 2071.033434] smpboot_thread_fn():
> >>> [ 2071.033455] smpboot_thread_fn() cpu: 22 159
> >>> [ 2071.033470] td->cpu: 22
> >>> [ 2071.033475] smp_processor_id(): 21
> >>> [ 2071.033486] comm: migration/%u
> >>
> >> Yes, that's helpful. Though it makes my mind boggle:
> >>
> >>     migration/22-4335  [021] d.h.  2071.020530: sched_wakeup: comm=migration/21 pid=4323 prio=0 success=1 target_cpu=021^M
> >>     migration/22-4335  [021] d...  2071.020541: sched_switch: prev_comm=migration/22 prev_pid=4335 prev_prio=0 prev_state=0x200 ==> next_comm=migration/21 next_pid=4323 next_prio=0^M
> >>     migration/21-4323  [021] d...  2071.026110: sched_switch: prev_comm=migration/21 prev_pid=4323 prev_prio=0 prev_state=S ==> next_comm=migration/22 next_pid=4335 next_prio=0^M
> >>     migration/22-4335  [021] ....  2071.033422: smpboot_thread_fn <-kthread^M
> >>
> >> So the migration thread schedules out with state TASK_PARKED and gets
> >> scheduled back in right away without a wakeup. Srivatsa was about
> >> right, that this might be related to the sched_set_stop_task() call,
> >> but the changelog led completely down the wrong road.
> >>
> >> So the issue is:
> >>
> >> CPU14				CPU21
> >> create_thread(for CPU22)
> >>  park_thread()	
> >>    wait_for_completion()	park_me()
> >> 				 complete()
> >>    sched_set_stop_task()
> >> 				 schedule(TASK_PARKED)
> >>
> >> The sched_set_stop_task() call is issued while the task is on the
> >> runqueue of CPU21 and that confuses the hell out of the stop_task
> >> class on that cpu. So as we have to synchronize the state for the
> >> bind call (the issue observed by Borislav) we need to do the same
> >> before issuing sched_set_stop_task(). Delta patch below.
> >>
> > 
> > In that case, why not just apply this 2 line patch on mainline?
> > The patch I sent yesterday was more elaborate because I wrongly assumed
> > that kthread_bind() can cause a wakeup. But now, I feel the patch shown
> > below should work just fine too. Yeah, it binds the task during creation
> > as well as during unpark, but that should be ok (see below).
> > 
> > Somehow, I believe we can handle this issue without introducing that
> > whole TASK_PARKED thing..

We need it as Borislav showed.

Thread is created and parked. Now after online we unpark the thread,
but something else wakes it before we reach the unpark code and boom
it's on the wrong cpu. We want that PARKED thing for
robustness. Anything else is just voodoo programming.

Thanks,

	tglx

  parent reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-05 21:43 kernel BUG at kernel/smpboot.c:134! Dave Hansen
2013-04-06  7:12 ` Srivatsa S. Bhat
2013-04-06  8:31   ` Thomas Gleixner
2013-04-07  9:20     ` Thomas Gleixner
2013-04-07  9:50       ` Borislav Petkov
2013-04-08  9:24         ` Thomas Gleixner
2013-04-08 11:55           ` Borislav Petkov
2013-04-08 12:17             ` Thomas Gleixner
2013-04-09 14:38               ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Thomas Gleixner
2013-04-09 15:55                 ` Dave Hansen
2013-04-09 18:43                   ` Thomas Gleixner
2013-04-09 19:30                     ` Thomas Gleixner
2013-04-09 20:38                       ` Dave Hansen
2013-04-09 20:54                         ` Dave Hansen
2013-04-10  8:29                         ` Thomas Gleixner
2013-04-10 10:51                           ` Thomas Gleixner
2013-04-10 19:41                             ` Dave Hansen
2013-04-11 10:19                               ` Thomas Gleixner
2013-04-11 10:48                                 ` Srivatsa S. Bhat
2013-04-11 11:43                                   ` Srivatsa S. Bhat
2013-04-11 11:59                                     ` Srivatsa S. Bhat
2013-04-11 12:51                                     ` Thomas Gleixner
2013-04-11 12:54                                     ` Thomas Gleixner [this message]
2013-04-11 13:46                                   ` Thomas Gleixner
2013-04-11 18:07                                 ` Dave Hansen
2013-04-11 19:48                                   ` Thomas Gleixner
2013-04-10 14:03                   ` [PATCH] CPU hotplug, smpboot: Fix crash in smpboot_thread_fn() Srivatsa S. Bhat
2013-04-11  8:10                     ` Thomas Gleixner
2013-04-11 10:19                       ` Srivatsa S. Bhat
2013-04-11 19:16                 ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Srivatsa S. Bhat
2013-04-11 20:47                   ` Thomas Gleixner
2013-04-11 21:19                     ` Srivatsa S. Bhat
2013-04-12 10:59                       ` Thomas Gleixner
2013-04-12 11:26                         ` Srivatsa S. Bhat
2013-04-15 19:49                         ` Dave Hansen
2013-04-12 10:41                 ` Peter Zijlstra
2013-04-12 12:32                 ` [tip:core/urgent] " tip-bot for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.02.1304111452140.21884@ionos \
    --to=tglx@linutronix.de \
    --cc=bp@alien8.de \
    --cc=dave@sr71.net \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git