From: Tejun Heo <tj@kernel.org>
To: Hillf Danton <dhillf@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] stop_machine: dequeue work before signal completion
Date: Sat, 9 Feb 2013 11:17:37 -0800
Message-ID: <20130209191737.GD2875@htj.dyndns.org>
In-Reply-To: <CAJd=RBBvE-5f1UyJGu32EHGVCXe6rVYY0010SKa56j7OvtPG7g@mail.gmail.com>

Hello, again.

On Fri, Feb 08, 2013 at 11:42:43AM +0800, Hillf Danton wrote:
> As checked with BUG_ON in the case of CPU_UP_PREPARE, we have to dequeue
> work first for further actions, then stopper reaches sane and clear state.

When a CPU is finally put down in either CPU_UP_CANCELED or
CPU_POST_DEAD, cpu_stop_cpu_callback() signals immediate completion on
all cpu_stop_works still queued on the dead CPU; unfortunately, this
code is buggy in that it doesn't remove the canceled work items from
stopper->works, leaving the list corrupted, which will trigger BUG_ON()
during CPU_UP_PREPARE if the CPU is brought back online.

This bug isn't easily triggered because CPU_DOWN has to race against
cpu_stop calls and most, if not all, cpu stop users pin target CPUs.

Fix it by popping each work item off stopper->works.
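
To make the difference concrete, here is a minimal userspace sketch of
the two drain strategies. It is not the kernel code: struct list_node,
struct demo_work, the helper names, and the "signalled" flag (standing
in for cpu_stop_signal_done()) are all hypothetical stand-ins, and
locking is omitted entirely. It only illustrates that signalling
without unlinking leaves entries on the list, while popping each item
first leaves the list empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and reinitialize it, in the spirit of list_del_init(). */
static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Simplified work item; 'signalled' models the completion signal. */
struct demo_work {
	struct list_node list;
	int signalled;
};

#define demo_entry(ptr) \
	((struct demo_work *)((char *)(ptr) - offsetof(struct demo_work, list)))

/* Old behaviour: signal every item but leave all of them queued. */
static int drain_without_dequeue(struct list_node *works)
{
	int n = 0;

	for (struct list_node *p = works->next; p != works; p = p->next) {
		demo_entry(p)->signalled = 1;
		n++;
	}
	return n;
}

/* Fixed behaviour: pop each item off the list, then signal it. */
static int drain_with_dequeue(struct list_node *works)
{
	int n = 0;

	while (!list_is_empty(works)) {
		struct demo_work *work = demo_entry(works->next);

		list_del_init_node(&work->list);
		assert(work->list.next == &work->list);
		work->signalled = 1;
		n++;
	}
	return n;
}
```

After drain_without_dequeue() the list head still points at items whose
owners have already been told they are done; in the kernel those owners
may go on to release the storage, which is presumably how the list ends
up corrupted. After drain_with_dequeue() the head is empty again, which
is the state a later bring-up expects.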

> Signed-off-by: Hillf Danton <dhillf@gmail.com>

Maybe

Cc: stable@vger.kernel.org

> --- a/kernel/stop_machine.c	Fri Feb  8 11:22:44 2013
> +++ b/kernel/stop_machine.c	Fri Feb  8 11:29:40 2013
> @@ -342,8 +342,12 @@ static int __cpuinit cpu_stop_cpu_callba
>  		kthread_stop(stopper->thread);
>  		/* drain remaining works */
>  		spin_lock_irq(&stopper->lock);
> -		list_for_each_entry(work, &stopper->works, list)
> +		while (!list_empty(&stopper->works)) {
> +			work = list_first_entry(&stopper->works,
> +					struct cpu_stop_work, list);
> +			list_del_init(&work->list);
>  			cpu_stop_signal_done(work->done, false);
> +		}

I think your previous version was better, with the @work declaration
moved inside the while() loop.

Thanks.

-- 
tejun

