LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>, Dave Hansen <dave@sr71.net>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Jones <davej@redhat.com>,
	dhillf@gmail.com, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu
Date: Thu, 11 Apr 2013 22:47:09 +0200 (CEST)
Message-ID: <alpine.LFD.2.02.1304112158040.21884@ionos> (raw)
In-Reply-To: <51670C17.8070608@linux.vnet.ibm.com>

Srivatsa,

On Fri, 12 Apr 2013, Srivatsa S. Bhat wrote:
> On 04/09/2013 08:08 PM, Thomas Gleixner wrote:
> > Add a new task state (TASK_PARKED) which prevents other wakeups and
> > use this state explicitely for the unpark wakeup.
> >
> 
> Again, I think this is unnecessary. We are good as long as no one other
> than the unpark code can kick the kthreads out of the loop in the park
> code. Now that I understand the race you explained above, why not just
> fix that race itself by reversing the ordering of clear(SHOULD_PARK)
> and bind_to(CPU2)? That way, even if someone else wakes up the per-cpu
> kthread, it will just remain confined to the park code, as intended.

In theory.
 
> A patch like below should do it IMHO. I guess I'm being a little too
> persistent, sorry!

No it's not about being persistent, you're JUST too much into voodoo
programming instead of going for the straight forward and robust
solutions.

Darn, I hate it as much as everyone else to introduce a new task
state, but that state allows us to make guarantees and gives us
semantical clarity. A parked thread is parked and can only be woken up
by the unpark code. That's clear semantics and not a magic construct
which will work in most cases and for the remaining ones (See below)
it will give us problems which are way harder to decode than the ones
we tried to fix with that magic.

> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 691dc2e..9512fc5 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -308,6 +308,15 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
>  	to_kthread(p)->cpu = cpu;
>  	/* Park the thread to get it out of TASK_UNINTERRUPTIBLE state */
>  	kthread_park(p);
> +
> +	/*
> +	 * Wait for p->on_rq to be reset to 0, to ensure that the per-cpu
> +	 * migration thread (which belongs to the stop_task sched class)
> +	 * doesn't run until the cpu is actually onlined and the thread is
> +	 * unparked.
> +	 */
> +	if (!wait_task_inactive(p, TASK_INTERRUPTIBLE))
> +		WARN_ON(1);

Yay, we rely on TASK_INTERRUPTIBLE state with a task which already has
references outside the creation code. And then we _HOPE_ that nothing
wakes it up _BEFORE_ we do something else.

Aside of that, you are still insisting to enforce that for every per
cpu thread even if the only one which needs that at this point are
thos which have a create() callback (i.e. the migration thread). And
next week you figure out that this is a performance impact on bringing
up large machines....

>  /**
>   * kthread_unpark - unpark a thread created by kthread_create().
>   * @k:		thread created by kthread_create().
> @@ -337,18 +357,29 @@ void kthread_unpark(struct task_struct *k)
>  	struct kthread *kthread = task_get_live_kthread(k);
>  
>  	if (kthread) {
> +		/*
> +		 * Per-cpu kthreads such as ksoftirqd can get woken up by
> +		 * other events. So after binding the thread, ensure that
> +		 * it goes off the CPU atleast once, by parking it again.
> +		 * This way, we can ensure that it will run on the correct
> +		 * CPU on subsequent wakeup.
> +		 */
> +		if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags)) {
> +			__kthread_bind(k, kthread->cpu);
> +			clear_bit(KTHREAD_IS_PARKED, &kthread->flags);

And how is that f*cking different from the previous code?

CPU0	   		CPU1		       CPU2
				       	       wakeup(T) -> run on CPU1 (last cpu)

			switch_to(T)

__kthread_bind(T, CPU2)

clear(KTHREAD_IS_PARKED)

			leave loop due to !KTHREAD_IS_PARKED

			BUG(wrong cpu)  <--- VOODOO FAILURE

kthread_park(T) <-- VOODOO TOO LATE

You can turn around the order of clearing/setting the flags as much as
you want, I'm going to punch an hole in it.

TASK_PARKED is the very obvious and robust solution which fixes _ALL_
of the corner cases, at least as far as I can imagine them. And
robustness rules at least in my world.

Thanks,

	tglx

  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-05 21:43 kernel BUG at kernel/smpboot.c:134! Dave Hansen
2013-04-06  7:12 ` Srivatsa S. Bhat
2013-04-06  8:31   ` Thomas Gleixner
2013-04-07  9:20     ` Thomas Gleixner
2013-04-07  9:50       ` Borislav Petkov
2013-04-08  9:24         ` Thomas Gleixner
2013-04-08 11:55           ` Borislav Petkov
2013-04-08 12:17             ` Thomas Gleixner
2013-04-09 14:38               ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Thomas Gleixner
2013-04-09 15:55                 ` Dave Hansen
2013-04-09 18:43                   ` Thomas Gleixner
2013-04-09 19:30                     ` Thomas Gleixner
2013-04-09 20:38                       ` Dave Hansen
2013-04-09 20:54                         ` Dave Hansen
2013-04-10  8:29                         ` Thomas Gleixner
2013-04-10 10:51                           ` Thomas Gleixner
2013-04-10 19:41                             ` Dave Hansen
2013-04-11 10:19                               ` Thomas Gleixner
2013-04-11 10:48                                 ` Srivatsa S. Bhat
2013-04-11 11:43                                   ` Srivatsa S. Bhat
2013-04-11 11:59                                     ` Srivatsa S. Bhat
2013-04-11 12:51                                     ` Thomas Gleixner
2013-04-11 12:54                                     ` Thomas Gleixner
2013-04-11 13:46                                   ` Thomas Gleixner
2013-04-11 18:07                                 ` Dave Hansen
2013-04-11 19:48                                   ` Thomas Gleixner
2013-04-10 14:03                   ` [PATCH] CPU hotplug, smpboot: Fix crash in smpboot_thread_fn() Srivatsa S. Bhat
2013-04-11  8:10                     ` Thomas Gleixner
2013-04-11 10:19                       ` Srivatsa S. Bhat
2013-04-11 19:16                 ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Srivatsa S. Bhat
2013-04-11 20:47                   ` Thomas Gleixner [this message]
2013-04-11 21:19                     ` Srivatsa S. Bhat
2013-04-12 10:59                       ` Thomas Gleixner
2013-04-12 11:26                         ` Srivatsa S. Bhat
2013-04-15 19:49                         ` Dave Hansen
2013-04-12 10:41                 ` Peter Zijlstra
2013-04-12 12:32                 ` [tip:core/urgent] " tip-bot for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.02.1304112158040.21884@ionos \
    --to=tglx@linutronix.de \
    --cc=bp@alien8.de \
    --cc=dave@sr71.net \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git