linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: "Kohli, Gaurav" <gkohli@codeaurora.org>
Cc: tglx@linutronix.de, mpe@ellerman.id.au, mingo@kernel.org,
	bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
	linux-arm-msm@vger.kernel.org,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Will Deacon <will.deacon@arm.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup
Date: Tue, 1 May 2018 12:18:45 +0200	[thread overview]
Message-ID: <20180501101845.GE12217@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <3af3365b-4e3f-e388-8e90-45a3bd4120fd@codeaurora.org>

On Tue, May 01, 2018 at 01:20:26PM +0530, Kohli, Gaurav wrote:
> But In our older case, where we have seen failure below is the wake up path
> and ftraces, Wakeup occured  and completed before schedule call only.
> 
> So final state of CPUHP is running not parked. I have also pasted debug
> ftraces that we got during issue reproduction.
> 
> Here wakeup for cpuhp is below:
> 
> takedown_cpu-> kthread_park-> wake_up_process
> 
> 
>  39,034,311,742,395  apps (10240)        Trace Printk cpuhp/0  (16)  [000]
> 39015.625000: <debug> __kthread_parkme state=512 task=ffffffcc7458e680
> flags: 0x5 -> state 5 -> state is parked inside parkme function
> 
> 39,034,311,846,510  apps (10240)        Trace Printk cpuhp/0  (16)  [000]
> 39015.625000: <debug> before schedule __kthread_parkme state=0
> task=ffffffcc7458e680 flags: 0xd  ->  just before schedule call, state is
> running
> 
> tatic void __kthread_parkme(struct kthread *self)
> 
> {
> 
>         __set_current_state(TASK_PARKED);
> 
>         while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
> 
>                 if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
> 
>                         complete(&self->parked);
> 
>                 schedule();
> 
>                 __set_current_state(TASK_PARKED);
> 
>         }
> 
>         clear_bit(KTHREAD_IS_PARKED, &self->flags);
> 
>         __set_current_state(TASK_RUNNING);
> 
> }
> 
> So my point is here also, if it is reschedule then it can set TASK_PARKED,
> but it seems after takedown_cpu call this thread never get a chance to run,
> So final state is TASK_RUNNING.
> 
> In our current fix also can't we observe same scenario where final state is
> TASK_RUNNING.

I'm not sure I understand your concern. Loosing the TASK_PARKED store
with the above code is obviously bad. But with the loop as proposed I
don't see a problem.

takedown_cpu() can proceed beyond smpboot_park_threads() and kill the
CPU before any of the threads are parked -- per having the complete()
before hitting schedule().

And, afaict, that is harmless. When we go offline, sched_cpu_dying() ->
migrate_tasks() will migrate any still runnable threads off the cpu.
But because at this point the thread must be in the PARKED wait-loop, it
will hit schedule() and go to sleep eventually.

Also note that kthread_unpark() does __kthread_bind() to rebind the
threads.

Aaaah... I think I've spotted a problem there. We clear SHOULD_PARK
before we rebind, so if the thread lost the first PARKED store,
does the completion, gets migrated, cycles through the loop and now
observes !SHOULD_PARK and bails the wait-loop, then __kthread_bind()
will forever wait.

Is that what you had in mind?

  reply	other threads:[~2018-05-01 10:19 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25  8:33 [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup Gaurav Kohli
2018-04-25 20:09 ` Peter Zijlstra
2018-04-26  4:04   ` Kohli, Gaurav
2018-04-26  9:14     ` Peter Zijlstra
2018-04-26  8:41   ` Peter Zijlstra
2018-04-26  8:57     ` Peter Zijlstra
2018-04-26 15:53       ` Kohli, Gaurav
2018-04-30 11:17         ` Peter Zijlstra
2018-05-01  7:50           ` Kohli, Gaurav
2018-05-01 10:18             ` Peter Zijlstra [this message]
2018-05-01 10:40               ` Peter Zijlstra
2018-05-01 10:40               ` Kohli, Gaurav
2018-05-01 11:31                 ` Peter Zijlstra
2018-05-01 11:46                   ` Kohli, Gaurav
2018-05-01 13:19                     ` Peter Zijlstra
2018-05-02  5:15                       ` Kohli, Gaurav
2018-05-02  8:20                         ` Peter Zijlstra
2018-05-02 10:13                           ` Kohli, Gaurav
2018-05-07 11:09                             ` Kohli, Gaurav
2018-05-07 11:23                               ` Kohli, Gaurav
2018-06-05 11:13                                 ` Kohli, Gaurav
2018-06-05 15:08                                   ` Oleg Nesterov
2018-06-05 15:22                                     ` Peter Zijlstra
2018-06-05 15:40                                       ` Peter Zijlstra
2018-06-05 16:35                                         ` Oleg Nesterov
2018-06-05 18:21                                           ` Kohli, Gaurav
2018-06-05 20:13                                           ` Peter Zijlstra
2018-06-06 13:51                                             ` Oleg Nesterov
2018-06-06 15:03                                               ` Peter Zijlstra
2018-06-06 15:04                                               ` Peter Zijlstra
2018-06-06 15:22                                               ` Peter Zijlstra
2018-06-06 18:59                                               ` Peter Zijlstra
2018-06-07  8:30                                                 ` Kohli, Gaurav
2018-05-01 10:44               ` Peter Zijlstra
2018-04-26 16:02     ` Andrea Parri
2018-04-26 16:18     ` Oleg Nesterov
2018-04-30 11:20       ` Peter Zijlstra
2018-04-30 11:56         ` Peter Zijlstra
2018-04-28  6:43 ` [lkp-robot] [kthread/smpboot] cad8e99675: inconsistent{IN-HARDIRQ-W}->{HARDIRQ-ON-W}usage kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180501101845.GE12217@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=gkohli@codeaurora.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=neeraju@codeaurora.org \
    --cc=oleg@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).