linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
       [not found] <OF73BCE1E4.D2224FEC-ON48258025.0022CFA7-48258025.0022E3DC@kedacom.com>
@ 2016-09-09  8:13 ` Cheng Chao
  2016-09-09  8:19   ` chengchao
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-09  8:13 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel, Cheng Chao

For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
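
For reference, this is the path in question; a trimmed sketch of
sched_exec() from the v4.8-era kernel/sched/core.c:

void sched_exec(void)
{
	struct task_struct *p = current;
	unsigned long flags;
	int dest_cpu;

	raw_spin_lock_irqsave(&p->pi_lock, flags);
	dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p),
						  SD_BALANCE_EXEC, 0);
	if (dest_cpu == smp_processor_id())
		goto unlock;

	if (likely(cpu_active(dest_cpu))) {
		struct migration_arg arg = { p, dest_cpu };

		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
		/* the caller then sleeps in wait_for_completion() */
		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
		return;
	}
unlock:
	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
}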

If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
executes migration_cpu_stop(), and then the stopper thread wakes the task up.

But in fact, almost all of the above work is useless (wasteful), because
migration_cpu_stop() cannot migrate. Why? migration_cpu_stop() requires the
task to be TASK_ON_RQ_QUEUED before it calls __migrate_task().
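
The guard in question, trimmed from the v4.8-era migration_cpu_stop():

	/*
	 * Because the caller sleeps in wait_for_completion(), it is
	 * already dequeued when the stopper runs, so this test fails
	 * and nothing is migrated:
	 */
	if (task_rq(p) == rq && task_on_rq_queued(p))
		rq = __migrate_task(rq, p, arg->dest_cpu);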

This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
so migration_cpu_stop() can do useful work.

Signed-off-by: Cheng Chao <chengchao@kedacom.com>
---
 kernel/stop_machine.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..41aea5e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+
+#if defined(CONFIG_PREEMPT_NONE)
+	/*
+	 * Make the stopper thread run as soon as possible, and if the
+	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
+	 * This is especially useful for callers which are expected to
+	 * be TASK_ON_RQ_QUEUED.
+	 * sched_exec() benefits from this improvement.
+	 */
+	schedule();
+#endif
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
@ 2016-09-09  8:19   ` chengchao
  2016-09-09 13:14   ` Oleg Nesterov
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: chengchao @ 2016-09-09  8:19 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel


--in-reply-to doesn't work?

v1 is : 
https://lkml.org/lkml/2016/9/7/819


on 09/09/2016 04:13 PM, Cheng Chao wrote:
> For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
> calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
> 
> If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
> It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task up.
> 
> But in fact, almost all of the above work is useless (wasteful), because
> migration_cpu_stop() cannot migrate. Why? migration_cpu_stop() requires the
> task to be TASK_ON_RQ_QUEUED before it calls __migrate_task().
> 
> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.
> 
> Signed-off-by: Cheng Chao <chengchao@kedacom.com>
> ---
>  kernel/stop_machine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..41aea5e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to
> +	 * be TASK_ON_RQ_QUEUED.
> +	 * sched_exec() benefits from this improvement.
> +	 */
> +	schedule();
> +#endif
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  2016-09-09  8:19   ` chengchao
@ 2016-09-09 13:14   ` Oleg Nesterov
  2016-09-09 16:24     ` Peter Zijlstra
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-09 13:14 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel

On 09/09, Cheng Chao wrote:
>
> If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
> It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task up.

and this finally migrates the target, ttwu() does another
select_task_rq().
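
Roughly, in the v4.8-era try_to_wake_up():

	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
	if (task_cpu(p) != cpu) {
		wake_flags |= WF_MIGRATED;
		set_task_cpu(p, cpu);
	}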

> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.

Yes, this avoids the extra wakeup/select_task_rq(). So this is a minor
optimization.

> Signed-off-by: Cheng Chao <chengchao@kedacom.com>
> ---
>  kernel/stop_machine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..41aea5e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to
> +	 * be TASK_ON_RQ_QUEUED.
> +	 * sched_exec() benefits from this improvement.
> +	 */
> +	schedule();

Well. This can help in general, wait_for_completion() won't block, but
only if we queue the work on the same CPU. Otherwise this schedule() adds
unnecessary pessimization.

That is why I suggested to use _cond_resched() instead of schedule().
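
For reference, _cond_resched() (kernel/sched/core.c in this era) only
enters the scheduler when a reschedule is actually pending; a trimmed
sketch:

int __sched _cond_resched(void)
{
	if (should_resched(0)) {	/* TIF_NEED_RESCHED set and not preempt-disabled? */
		preempt_schedule_common();
		return 1;
	}
	return 0;
}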

But let me repeat, I leave this to maintainers.

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09 13:14   ` Oleg Nesterov
@ 2016-09-09 16:24     ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-09 16:24 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On Fri, Sep 09, 2016 at 03:14:49PM +0200, Oleg Nesterov wrote:
> On 09/09, Cheng Chao wrote:
> >
> > If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
> > It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> > executes migration_cpu_stop(), and then the stopper thread wakes the task up.
> 

Still not seeing any mail from Cheng Chao. I think he's mailing from a
domain that's not accepting email, and then infradead makes it go away;
why accept email from people who will not accept bounces?

Cheng, please fix your email setup.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  2016-09-09  8:19   ` chengchao
  2016-09-09 13:14   ` Oleg Nesterov
@ 2016-09-10  8:52   ` Cheng Chao
  2016-09-10  9:51     ` Cheng Chao
                       ` (3 more replies)
  2 siblings, 4 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-10  8:52 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel, Cheng Chao

For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).

If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
executes migration_cpu_stop(), and then the stopper thread wakes the task up.

But in fact, almost all of the above work is useless (wasteful), because
migration_cpu_stop() cannot migrate. Why? migration_cpu_stop() requires the
task to be TASK_ON_RQ_QUEUED before it calls __migrate_task().

This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
so migration_cpu_stop() can do useful work.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
---
 kernel/stop_machine.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..41aea5e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+
+#if defined(CONFIG_PREEMPT_NONE)
+	/*
+	 * Make the stopper thread run as soon as possible, and if the
+	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
+	 * This is especially useful for callers which are expected to
+	 * be TASK_ON_RQ_QUEUED.
+	 * sched_exec() benefits from this improvement.
+	 */
+	schedule();
+#endif
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
@ 2016-09-10  9:51     ` Cheng Chao
  2016-09-10 16:33       ` Peter Zijlstra
  2016-09-12 11:03     ` Oleg Nesterov
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Cheng Chao @ 2016-09-10  9:51 UTC (permalink / raw)
  To: peterz; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

Hi Peter, I guess you can receive mail from me now;
I have changed my mailbox to gmail.

Oleg has already done much work on this patch, and I am really obliged.
Please review this patch, thanks.

on 09/10/2016 04:52 PM, Cheng Chao wrote:
> For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
> calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
> 
> If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
> It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task up.
> 
> But in fact, almost all of the above work is useless (wasteful), because
> migration_cpu_stop() cannot migrate. Why? migration_cpu_stop() requires the
> task to be TASK_ON_RQ_QUEUED before it calls __migrate_task().
> 
> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.
> 
> Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
> ---
>  kernel/stop_machine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..41aea5e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to
> +	 * be TASK_ON_RQ_QUEUED.
> +	 * sched_exec() benefits from this improvement.
> +	 */
> +	schedule();
> +#endif
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  9:51     ` Cheng Chao
@ 2016-09-10 16:33       ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-10 16:33 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Sat, Sep 10, 2016 at 05:51:02PM +0800, Cheng Chao wrote:
> Hi Peter, I guess you can receive mail from me now;
> I have changed my mailbox to gmail.

Indeed, I'll have a look on Monday.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2016-09-10  9:51     ` Cheng Chao
@ 2016-09-12 11:03     ` Oleg Nesterov
  2016-09-13  2:45       ` Cheng Chao
  2016-09-12 11:37     ` Peter Zijlstra
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  3 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-12 11:03 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel

On 09/10, Cheng Chao wrote:
>
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to
> +	 * be TASK_ON_RQ_QUEUED.
> +	 * sched_exec() benefits from this improvement.
> +	 */
> +	schedule();
> +#endif
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }

Cheng, I have already tried twice to suggest conditionalizing this schedule(),
because it can only help if cpu == smp_processor_id(), and you didn't reply.
I still think _cond_resched() makes more sense.

I won't really argue if you prefer it this way. But did you see my emails?

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2016-09-10  9:51     ` Cheng Chao
  2016-09-12 11:03     ` Oleg Nesterov
@ 2016-09-12 11:37     ` Peter Zijlstra
  2016-09-12 11:41       ` Peter Zijlstra
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  3 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-12 11:37 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Sat, Sep 10, 2016 at 04:52:12PM +0800, Cheng Chao wrote:
> For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
> calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
> 
> If migration_cpu_stop() cannot migrate, why do we call stop_one_cpu() at all?
> It just makes the task TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task up.
> 
> But in fact, almost all of the above work is useless (wasteful), because
> migration_cpu_stop() cannot migrate. Why? migration_cpu_stop() requires the
> task to be TASK_ON_RQ_QUEUED before it calls __migrate_task().
> 
> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.

OK, completely confused. What!?

/me ponders....

So what you're saying is that migration_cpu_stop() doesn't work because
wait_for_completion() dequeues the task.

True I suppose. Not sure I like your solution, nor your implementation
of the solution much though.

I would much prefer an unconditional cond_resched() there, but also, I
think we should do what __migrate_swap_task() does, and set wake_cpu.
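
For reference, the else branch of __migrate_swap_task() (v4.8-era
kernel/sched/core.c, trimmed) records the destination CPU so that the
next wakeup targets it:

static void __migrate_swap_task(struct task_struct *p, int cpu)
{
	if (task_on_rq_queued(p)) {
		/* deactivate, set_task_cpu(), activate ... */
	} else {
		/*
		 * Task isn't running anymore; make it appear like we
		 * migrated it before it went to sleep. On wakeup we then
		 * make the previous cpu our target instead of where it
		 * really is.
		 */
		p->wake_cpu = cpu;
	}
}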

So something like so..

---
 kernel/sched/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ddd5f48551f1..ade772aa9610 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq && task_on_rq_queued(p))
-		rq = __migrate_task(rq, p, arg->dest_cpu);
+	if (task_rq(p) == rq) {
+		if (task_on_rq_queued(p))
+			rq = __migrate_task(rq, p, arg->dest_cpu);
+		else
+			p->wake_cpu = arg->dest_cpu;
+	}
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock(&p->pi_lock);
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:37     ` Peter Zijlstra
@ 2016-09-12 11:41       ` Peter Zijlstra
  2016-09-12 13:05         ` Oleg Nesterov
  2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-12 11:41 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Mon, Sep 12, 2016 at 01:37:27PM +0200, Peter Zijlstra wrote:
> So what you're saying is that migration_cpu_stop() doesn't work because
> wait_for_completion() dequeues the task.
> 
> True I suppose. Not sure I like your solution, nor your implementation
> of the solution much though.
> 
> I would much prefer an unconditional cond_resched() there, but also, I
> think we should do what __migrate_swap_task() does, and set wake_cpu.
> 
> So something like so..
> 
> ---
>  kernel/sched/core.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ddd5f48551f1..ade772aa9610 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq && task_on_rq_queued(p))
> -		rq = __migrate_task(rq, p, arg->dest_cpu);
> +	if (task_rq(p) == rq) {
> +		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, p, arg->dest_cpu);
> +		else
> +			p->wake_cpu = arg->dest_cpu;
> +	}
>  	raw_spin_unlock(&rq->lock);
>  	raw_spin_unlock(&p->pi_lock);
>  

And this too; a too-narrow constraint to git diff made it go away.

---
 kernel/stop_machine.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ae6f41fb9cba..637798d6b554 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+	/*
+	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+	 * by doing a preemption.
+	 */
+	cond_resched();
 	wait_for_completion(&done.completion);
 	return done.ret;
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:41       ` Peter Zijlstra
@ 2016-09-12 13:05         ` Oleg Nesterov
  2016-09-12 15:01           ` Peter Zijlstra
  2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-12 13:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On 09/12, Peter Zijlstra wrote:
>
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +	/*
> +	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
> +	 * by doing a preemption.
> +	 */
> +	cond_resched();

Yes, this is what I tried to suggest too.

But this leads to a question I have wanted to ask many times.

Why is cond_resched() not a NOP if CONFIG_PREEMPT=y?

Perhaps we have some users like, just for example,

	preempt_enable_no_resched();
	cond_resched();

which actually want the should_resched() check even if CONFIG_PREEMPT,
but most callers do not?

Oleg.
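
For context, cond_resched() at the time compiled to the same check under
every preemption model; a trimmed sketch from the v4.8-era
include/linux/sched.h:

#define cond_resched() ({			\
	___might_sleep(__FILE__, __LINE__, 0);	\
	_cond_resched();			\
})

So the should_resched() test in _cond_resched() runs even with
CONFIG_PREEMPT=y, where any pending reschedule would normally have been
taken at preempt_enable() already.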

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 13:05         ` Oleg Nesterov
@ 2016-09-12 15:01           ` Peter Zijlstra
  2016-09-13 16:14             ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-12 15:01 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On Mon, Sep 12, 2016 at 03:05:38PM +0200, Oleg Nesterov wrote:

> But this leads to a question I have wanted to ask many times.
> 
> Why is cond_resched() not a NOP if CONFIG_PREEMPT=y?

Dunno, nobody bothered to do it? We should keep the might_sleep() of
course, but the preemption check is pointless.

> Perhaps we have some users like, just for example,
> 
> 	preempt_enable_no_resched();
> 	cond_resched();
> 
> which actually want the should_resched() check even if CONFIG_PREEMPT,
> but most callers do not?

I would hope not, the few preempt_enable_no_resched() users _should_
have an actual schedule() call in the _very_ near future.
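
One such pattern is schedule_preempt_disabled() (v4.8-era
kernel/sched/core.c), where the no-resched enable is immediately followed
by a schedule() call:

void __sched schedule_preempt_disabled(void)
{
	sched_preempt_enable_no_resched();
	schedule();
	preempt_disable();
}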

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:03     ` Oleg Nesterov
@ 2016-09-13  2:45       ` Cheng Chao
  0 siblings, 0 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-13  2:45 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel



on 09/12/2016 07:03 PM, Oleg Nesterov wrote:
> On 09/10, Cheng Chao wrote:
>>
>> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>>  	cpu_stop_init_done(&done, 1);
>>  	if (!cpu_stop_queue_work(cpu, &work))
>>  		return -ENOENT;
>> +
>> +#if defined(CONFIG_PREEMPT_NONE)
>> +	/*
>> +	 * Make the stopper thread run as soon as possible, and if the
>> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
>> +	 * This is especially useful for callers which are expected to
>> +	 * be TASK_ON_RQ_QUEUED.
>> +	 * sched_exec() benefits from this improvement.
>> +	 */
>> +	schedule();
>> +#endif
>>  	wait_for_completion(&done.completion);
>>  	return done.ret;
>>  }
> 
> Cheng, I have already tried twice to suggest conditionalizing this schedule(),
> because it can only help if cpu == smp_processor_id(), and you didn't reply.
> I still think _cond_resched() makes more sense.
> 
> I won't really argue if you prefer it this way. But did you see my emails?
>

I read them, thanks. Because Peter didn't receive my mails before, it took
me a lot of time to fix my mailbox, so I didn't reply on time.

OK. Even if cpu != smp_processor_id(), calling schedule() instead of
_cond_resched() can give the caller a chance not to sleep: when the caller
runs on the CPU again, it will likely find the completion already done, so
the stopper thread's cpu_stop_signal_done() and the caller's
wait_for_completion() will both run very soon.

I think it is a trivial improvement. Using cond_resched()/_cond_resched()
is better for readability, so I choose cond_resched().

thanks again.

> Oleg.
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:41       ` Peter Zijlstra
  2016-09-12 13:05         ` Oleg Nesterov
@ 2016-09-13  4:03         ` Cheng Chao
  2016-09-13  8:45           ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Cheng Chao @ 2016-09-13  4:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

Peter, thank you.

on 09/12/2016 07:41 PM, Peter Zijlstra wrote:
> On Mon, Sep 12, 2016 at 01:37:27PM +0200, Peter Zijlstra wrote:
>> So what you're saying is that migration_cpu_stop() doesn't work because
>> wait_for_completion() dequeues the task.
>>
>> True I suppose. Not sure I like your solution, nor your implementation
>> of the solution much though.
>>
>> I would much prefer an unconditional cond_resched() there, but also, I
>> think we should do what __migrate_swap_task() does, and set wake_cpu.
>>
>> So something like so..
>>
>> ---
>>  kernel/sched/core.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index ddd5f48551f1..ade772aa9610 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>>  	 * we're holding p->pi_lock.
>>  	 */
>> -	if (task_rq(p) == rq && task_on_rq_queued(p))
>> -		rq = __migrate_task(rq, p, arg->dest_cpu);
>> +	if (task_rq(p) == rq) {
>> +		if (task_on_rq_queued(p))
>> +			rq = __migrate_task(rq, p, arg->dest_cpu);
>> +		else
>> +			p->wake_cpu = arg->dest_cpu;
>> +	}
>>  	raw_spin_unlock(&rq->lock);
>>  	raw_spin_unlock(&p->pi_lock);
>>  
> 
> And this too; a too-narrow constraint to git diff made it go away.
> 

Yes, setting wake_cpu is better for try_to_wake_up().
Peter, should this be a new patch?

> ---
>  kernel/stop_machine.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index ae6f41fb9cba..637798d6b554 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +	/*
> +	 * In case @cpu == smp_proccessor_id() we can avoid a sleep+wakeup
> +	 * by doing a preemption.
> +	 */
> +	cond_resched();
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 

I agree to use cond_resched(). https://lkml.org/lkml/2016/9/12/1228

For CONFIG_PREEMPT=y and CONFIG_PREEMPT_VOLUNTARY=y, this patch seems unnecessary:
1. when CONFIG_PREEMPT=y
  stop_one_cpu()
  ->cpu_stop_queue_work()
    ->spin_unlock_irqrestore() (preempt_enable() calls __preempt_schedule())

2. when CONFIG_PREEMPT_VOLUNTARY=y
  stop_one_cpu()
  ->wait_for_completion()
    ->...
      ->might_sleep() (calls _cond_resched())


So we really don't need the "#if defined(CONFIG_PREEMPT_NONE)", right?

I also think that without the "#if defined(CONFIG_PREEMPT_NONE)"
the code is cleaner, and the logic is still OK.
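
For reference, the CONFIG_PREEMPT_VOLUNTARY path above works because
might_sleep() expands to a might_resched() hook; a trimmed sketch from the
v4.8-era include/linux/kernel.h (assuming CONFIG_DEBUG_ATOMIC_SLEEP=n):

#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int _cond_resched(void);
# define might_resched() _cond_resched()
#else
# define might_resched() do { } while (0)
#endif

# define might_sleep() do { might_resched(); } while (0)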



diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..87464a2 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,15 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
        cpu_stop_init_done(&done, 1);
        if (!cpu_stop_queue_work(cpu, &work))
                return -ENOENT;
+
+#if defined(CONFIG_PREEMPT_NONE)
+       /*
+        * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+        * by doing a preemption.
+        */
+       cond_resched();
+#endif
+
        wait_for_completion(&done.completion);
        return done.ret;
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
@ 2016-09-13  8:45           ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-13  8:45 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Tue, Sep 13, 2016 at 12:03:05PM +0800, Cheng Chao wrote:
> 
> Peter, should this be a new patch?

I wanted both changes in one patch, but I fudged my git-diff.

> > ---
> >  kernel/stop_machine.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index ae6f41fb9cba..637798d6b554 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
> >  	cpu_stop_init_done(&done, 1);
> >  	if (!cpu_stop_queue_work(cpu, &work))
> >  		return -ENOENT;
> > +	/*
> > +	 * In case @cpu == smp_proccessor_id() we can avoid a sleep+wakeup
> > +	 * by doing a preemption.
> > +	 */
> > +	cond_resched();
> >  	wait_for_completion(&done.completion);
> >  	return done.ret;
> >  }
> > 
> 
> I agree to use cond_resched(). https://lkml.org/lkml/2016/9/12/1228
> 
> So we really don't need the "#if defined(CONFIG_PREEMPT_NONE)", right?

> I also think that without the "#if defined(CONFIG_PREEMPT_NONE)"
> the code is cleaner, and the logic is still OK.

You really don't need the #ifdef, look at the actual code, not the
Kconfig language.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 15:01           ` Peter Zijlstra
@ 2016-09-13 16:14             ` Oleg Nesterov
  2016-09-13 16:37               ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-13 16:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On 09/12, Peter Zijlstra wrote:
>
> On Mon, Sep 12, 2016 at 03:05:38PM +0200, Oleg Nesterov wrote:
>
> > But this leads to a question I have wanted to ask many times.
> >
> > Why is cond_resched() not a NOP if CONFIG_PREEMPT=y?
>
> Dunno, nobody bothered to do it? We should keep the might_sleep() of
> course, but the preemption check is pointless.

Yes, agreed, I actually meant _cond_resched().

> > Perhaps we have some users like, just for example,
> > 
> > 	preempt_enable_no_resched();
> > 	cond_resched();
> > 
> > which actually want the should_resched() check even if CONFIG_PREEMPT,
> > but most callers do not?
>
> I would hope not, the few preempt_enable_no_resched() users _should_
> have an actual schedule() call in the _very_ near future.

Me too, and I failed to find anything that could be broken... So
perhaps we should make it a nop and investigate the new bug reports
after that.


Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
__schedule() should be removed, it seems; do_exit() can call __schedule()
directly.

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13 16:14             ` Oleg Nesterov
@ 2016-09-13 16:37               ` Peter Zijlstra
  2016-09-14  2:07                 ` Cheng Chao
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-13 16:37 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On Tue, Sep 13, 2016 at 06:14:27PM +0200, Oleg Nesterov wrote:

> Me too, and I failed to find anything that could be broken... So
> perhaps we should make it a nop and investigate the new bug reports
> after that.

Works for me :-)

> 
> Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
> __schedule() should be removed, it seems; do_exit() can call __schedule()
> directly.


something like so?

---

 include/linux/kernel.h |  2 +-
 include/linux/sched.h  |  2 ++
 kernel/exit.c          | 11 ++---------
 kernel/sched/core.c    | 23 ++++++++++++-----------
 4 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d96a6118d26a..e5bd9cdd2e24 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -266,7 +266,7 @@ extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
 extern int oops_may_print(void);
-void do_exit(long error_code)
+void __noreturn do_exit(long error_code)
 	__noreturn;
 void complete_and_exit(struct completion *, long)
 	__noreturn;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index eb64fcd89e68..b0c818a05b2e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -448,6 +448,8 @@ static inline void io_schedule(void)
 	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
 }
 
+void __noreturn do_task_dead(void);
+
 struct nsproxy;
 struct user_namespace;
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 091a78be3b09..d4c12692f766 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -725,7 +725,7 @@ static void check_stack_usage(void)
 static inline void check_stack_usage(void) {}
 #endif
 
-void do_exit(long code)
+void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
@@ -897,14 +897,7 @@ void do_exit(long code)
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
-	/* causes final put_task_struct in finish_task_switch(). */
-	tsk->state = TASK_DEAD;
-	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
-	schedule();
-	BUG();
-	/* Avoid "noreturn function does return".  */
-	for (;;)
-		cpu_relax();	/* For when BUG is null */
+	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a0086a5fc008..6034f269000f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3327,17 +3327,6 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	/*
-	 * do_exit() calls schedule() with preemption disabled as an exception;
-	 * however we must fix that up, otherwise the next task will see an
-	 * inconsistent (higher) preempt count.
-	 *
-	 * It also avoids the below schedule_debug() test from complaining
-	 * about this.
-	 */
-	if (unlikely(prev->state == TASK_DEAD))
-		preempt_enable_no_resched_notrace();
-
 	schedule_debug(prev);
 
 	if (sched_feat(HRTICK))
@@ -3404,6 +3393,18 @@ static void __sched notrace __schedule(bool preempt)
 	balance_callback(rq);
 }
 
+void __noreturn do_task_dead(void)
+{
+	/* causes final put_task_struct in finish_task_switch(). */
+	__set_current_state(TASK_DEAD);
+	current->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
+	__schedule(false);
+	BUG();
+	/* Avoid "noreturn function does return".  */
+	for (;;)
+		cpu_relax();	/* For when BUG is null */
+}
+
 static inline void sched_submit_work(struct task_struct *tsk)
 {
 	if (!tsk->state || tsk_is_pi_blocked(tsk))

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
                       ` (2 preceding siblings ...)
  2016-09-12 11:37     ` Peter Zijlstra
@ 2016-09-14  2:01     ` Cheng Chao
  2016-09-14 15:53       ` Oleg Nesterov
                         ` (2 more replies)
  3 siblings, 3 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-14  2:01 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel, Cheng Chao

In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
by doing a preemption.

A caller such as sched_exec() can benefit from this change.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/core.c   | 8 ++++++--
 kernel/stop_machine.c | 5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a0086a5..283b662 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq && task_on_rq_queued(p))
-		rq = __migrate_task(rq, p, arg->dest_cpu);
+	if (task_rq(p) == rq) {
+		if (task_on_rq_queued(p))
+			rq = __migrate_task(rq, p, arg->dest_cpu);
+		else
+			p->wake_cpu = arg->dest_cpu;
+	}
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock(&p->pi_lock);
 
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..1a24890 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+	/*
+	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+	 * by doing a preemption.
+	 */
+	cond_resched();
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13 16:37               ` Peter Zijlstra
@ 2016-09-14  2:07                 ` Cheng Chao
  2016-09-14  7:50                   ` Peter Zijlstra
  2016-09-14 15:45                 ` Oleg Nesterov
  2016-09-22 13:59                 ` [tip:sched/core] sched/core: Optimize __schedule() tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 25+ messages in thread
From: Cheng Chao @ 2016-09-14  2:07 UTC (permalink / raw)
  To: Peter Zijlstra, Oleg Nesterov; +Cc: mingo, tj, akpm, chris, linux-kernel


Great, __schedule() doesn't need to pay any attention to TASK_DEAD now.

on 09/14/2016 12:37 AM, Peter Zijlstra wrote:
> On Tue, Sep 13, 2016 at 06:14:27PM +0200, Oleg Nesterov wrote:
> 
>> Me too, and I failed to find anything that could be broken... So
>> perhaps we should make it a nop and investigate the new bug reports
>> after that.
> 
> Works for me :-)
> 
>>
>> Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
>> __schedule() should be removed, it seems; do_exit() can call __schedule()
>> directly.
> 
> 
> something like so?
> 
> ---
> 
>  include/linux/kernel.h |  2 +-
>  include/linux/sched.h  |  2 ++
>  kernel/exit.c          | 11 ++---------
>  kernel/sched/core.c    | 23 ++++++++++++-----------
>  4 files changed, 17 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d96a6118d26a..e5bd9cdd2e24 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -266,7 +266,7 @@ extern void oops_enter(void);
>  extern void oops_exit(void);
>  void print_oops_end_marker(void);
>  extern int oops_may_print(void);
> -void do_exit(long error_code)
> +void __noreturn do_exit(long error_code)
>  	__noreturn;
>  void complete_and_exit(struct completion *, long)
>  	__noreturn;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index eb64fcd89e68..b0c818a05b2e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -448,6 +448,8 @@ static inline void io_schedule(void)
>  	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
>  }
>  
> +void __noreturn do_task_dead(void);
> +
>  struct nsproxy;
>  struct user_namespace;
>  
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 091a78be3b09..d4c12692f766 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -725,7 +725,7 @@ static void check_stack_usage(void)
>  static inline void check_stack_usage(void) {}
>  #endif
>  
> -void do_exit(long code)
> +void __noreturn do_exit(long code)
>  {
>  	struct task_struct *tsk = current;
>  	int group_dead;
> @@ -897,14 +897,7 @@ void do_exit(long code)
>  	smp_mb();
>  	raw_spin_unlock_wait(&tsk->pi_lock);
>  
> -	/* causes final put_task_struct in finish_task_switch(). */
> -	tsk->state = TASK_DEAD;
> -	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> -	schedule();
> -	BUG();
> -	/* Avoid "noreturn function does return".  */
> -	for (;;)
> -		cpu_relax();	/* For when BUG is null */
> +	do_task_dead();
>  }
>  EXPORT_SYMBOL_GPL(do_exit);
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a0086a5fc008..6034f269000f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3327,17 +3327,6 @@ static void __sched notrace __schedule(bool preempt)
>  	rq = cpu_rq(cpu);
>  	prev = rq->curr;
>  
> -	/*
> -	 * do_exit() calls schedule() with preemption disabled as an exception;
> -	 * however we must fix that up, otherwise the next task will see an
> -	 * inconsistent (higher) preempt count.
> -	 *
> -	 * It also avoids the below schedule_debug() test from complaining
> -	 * about this.
> -	 */
> -	if (unlikely(prev->state == TASK_DEAD))
> -		preempt_enable_no_resched_notrace();
> -
>  	schedule_debug(prev);
>  
>  	if (sched_feat(HRTICK))
> @@ -3404,6 +3393,18 @@ static void __sched notrace __schedule(bool preempt)
>  	balance_callback(rq);
>  }
>  
> +void __noreturn do_task_dead(void)
> +{
> +	/* causes final put_task_struct in finish_task_switch(). */
> +	__set_current_state(TASK_DEAD);
> +	current->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> +	__schedule(false);
> +	BUG();
> +	/* Avoid "noreturn function does return".  */
> +	for (;;)
> +		cpu_relax();	/* For when BUG is null */
> +}
> +
>  static inline void sched_submit_work(struct task_struct *tsk)
>  {
>  	if (!tsk->state || tsk_is_pi_blocked(tsk))
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-14  2:07                 ` Cheng Chao
@ 2016-09-14  7:50                   ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-14  7:50 UTC (permalink / raw)
  To: Cheng Chao; +Cc: Oleg Nesterov, mingo, tj, akpm, chris, linux-kernel

On Wed, Sep 14, 2016 at 10:07:14AM +0800, Cheng Chao wrote:
> 
> Great, __schedule() doesn't need to pay any attention to TASK_DEAD now.

There's still __schedule()->context_switch()->finish_task_switch().
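
Roughly, in the v4.8-era finish_task_switch(), which performs the final
cleanup for the dead task:

	if (unlikely(prev_state == TASK_DEAD)) {
		if (prev->sched_class->task_dead)
			prev->sched_class->task_dead(prev);

		/* ... kprobe cleanup elided; then the final reference drop: */
		put_task_struct(prev);
	}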

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13 16:37               ` Peter Zijlstra
  2016-09-14  2:07                 ` Cheng Chao
@ 2016-09-14 15:45                 ` Oleg Nesterov
  2016-09-22 13:59                 ` [tip:sched/core] sched/core: Optimize __schedule() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-14 15:45 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On 09/13, Peter Zijlstra wrote:
>
> On Tue, Sep 13, 2016 at 06:14:27PM +0200, Oleg Nesterov wrote:
>
> > Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
> > __schedule() should be removed, it seems; do_exit() can call __schedule()
> > directly.
>
> something like so?

Yes, exactly.

But I think that raw_spin_unlock_wait(&tsk->pi_lock) and its huge
comment should be moved into the new helper too; it connects to
set_current_state(TASK_DEAD).

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
@ 2016-09-14 15:53       ` Oleg Nesterov
  2016-09-18  2:07       ` Cheng Chao
  2016-09-22 13:59       ` [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() tip-bot for Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-14 15:53 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel

On 09/14, Cheng Chao wrote:
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq && task_on_rq_queued(p))
> -		rq = __migrate_task(rq, p, arg->dest_cpu);
> +	if (task_rq(p) == rq) {
> +		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, p, arg->dest_cpu);
> +		else
> +			p->wake_cpu = arg->dest_cpu;
> +	}

Cough ;) again, I leave this to Peter...

But imo this change should be documented or perhaps even separated.
It looks fine to me, but this has nothing to do with "we can avoid
a sleep+wakeup by doing a preemption" from the changelog.

This is another improvement, and a small note in the changelog can
unconfuse the reader of git blame/log.

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  2016-09-14 15:53       ` Oleg Nesterov
@ 2016-09-18  2:07       ` Cheng Chao
  2016-09-22 13:59       ` [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() tip-bot for Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-18  2:07 UTC (permalink / raw)
  To: peterz; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

Hi Peter,

  What should I do next? Thanks.

Cheng

on 09/14/2016 10:01 AM, Cheng Chao wrote:
> In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
> by doing a preemption.
> 
> A caller such as sched_exec() can benefit from this change.
> 
> Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/core.c   | 8 ++++++--
>  kernel/stop_machine.c | 5 +++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a0086a5..283b662 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq && task_on_rq_queued(p))
> -		rq = __migrate_task(rq, p, arg->dest_cpu);
> +	if (task_rq(p) == rq) {
> +		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, p, arg->dest_cpu);
> +		else
> +			p->wake_cpu = arg->dest_cpu;
> +	}
>  	raw_spin_unlock(&rq->lock);
>  	raw_spin_unlock(&p->pi_lock);
>  
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..1a24890 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +	/*
> +	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
> +	 * by doing a preemption.
> +	 */
> +	cond_resched();
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  2016-09-14 15:53       ` Oleg Nesterov
  2016-09-18  2:07       ` Cheng Chao
@ 2016-09-22 13:59       ` tip-bot for Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Cheng Chao @ 2016-09-22 13:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, oleg, tglx, mingo, hpa, torvalds, cs.os.kernel, peterz

Commit-ID:  bf89a304722f6904009499a31dc68ab9a5c9742e
Gitweb:     http://git.kernel.org/tip/bf89a304722f6904009499a31dc68ab9a5c9742e
Author:     Cheng Chao <cs.os.kernel@gmail.com>
AuthorDate: Wed, 14 Sep 2016 10:01:50 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 14:53:45 +0200

stop_machine: Avoid a sleep and wakeup in stop_one_cpu()

In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
cycle by doing a preemption.

Callers such as sched_exec() can benefit from this change.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1473818510-6779-1-git-send-email-cs.os.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c   | 8 ++++++--
 kernel/stop_machine.c | 5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c5f020c..ff4e3c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq && task_on_rq_queued(p))
-		rq = __migrate_task(rq, p, arg->dest_cpu);
+	if (task_rq(p) == rq) {
+		if (task_on_rq_queued(p))
+			rq = __migrate_task(rq, p, arg->dest_cpu);
+		else
+			p->wake_cpu = arg->dest_cpu;
+	}
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock(&p->pi_lock);
 
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..082e71f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+	/*
+	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+	 * cycle by doing a preemption:
+	 */
+	cond_resched();
 	wait_for_completion(&done.completion);
 	return done.ret;
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:sched/core] sched/core: Optimize __schedule()
  2016-09-13 16:37               ` Peter Zijlstra
  2016-09-14  2:07                 ` Cheng Chao
  2016-09-14 15:45                 ` Oleg Nesterov
@ 2016-09-22 13:59                 ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-09-22 13:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, peterz, linux-kernel, cs.os.kernel, hpa, tglx, oleg, mingo

Commit-ID:  9af6528ee9b682df7f29dbee86fbba0b67eab944
Gitweb:     http://git.kernel.org/tip/9af6528ee9b682df7f29dbee86fbba0b67eab944
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 13 Sep 2016 18:37:29 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 14:53:45 +0200

sched/core: Optimize __schedule()

Oleg noted that by making do_exit() use __schedule() for the TASK_DEAD
context switch, we can avoid the TASK_DEAD special case currently in
__schedule() because that avoids the extra preempt_disable() from
schedule().

In order to facilitate this, create a do_task_dead() helper which we
place in the scheduler code, such that it can access __schedule().

Also add some __noreturn annotations to the functions, there's no
coming back from do_exit().
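
For reference, schedule() wraps __schedule() in a preempt_disable() /
sched_preempt_enable_no_resched() pair (v4.8-era kernel/sched/core.c);
do_task_dead() now calls __schedule(false) directly and skips that:

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);
	do {
		preempt_disable();
		__schedule(false);
		sched_preempt_enable_no_resched();
	} while (need_resched());
}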

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Cheng Chao <cs.os.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20160913163729.GB5012@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/kernel.h |  9 +++------
 include/linux/sched.h  |  2 ++
 kernel/exit.c          | 26 ++------------------------
 kernel/sched/core.c    | 38 +++++++++++++++++++++++++++-----------
 4 files changed, 34 insertions(+), 41 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d96a611..74fd6f0 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -259,17 +259,14 @@ static inline void might_fault(void) { }
 extern struct atomic_notifier_head panic_notifier_list;
 extern long (*panic_blink)(int state);
 __printf(1, 2)
-void panic(const char *fmt, ...)
-	__noreturn __cold;
+void panic(const char *fmt, ...) __noreturn __cold;
 void nmi_panic(struct pt_regs *regs, const char *msg);
 extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
 extern int oops_may_print(void);
-void do_exit(long error_code)
-	__noreturn;
-void complete_and_exit(struct completion *, long)
-	__noreturn;
+void do_exit(long error_code) __noreturn;
+void complete_and_exit(struct completion *, long) __noreturn;
 
 /* Internal, do not use. */
 int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d750240..f00ee8e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -448,6 +448,8 @@ static inline void io_schedule(void)
 	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
 }
 
+void __noreturn do_task_dead(void);
+
 struct nsproxy;
 struct user_namespace;
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 091a78b..1e1d913 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -725,7 +725,7 @@ static void check_stack_usage(void)
 static inline void check_stack_usage(void) {}
 #endif
 
-void do_exit(long code)
+void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
@@ -882,29 +882,7 @@ void do_exit(long code)
 	exit_rcu();
 	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
 
-	/*
-	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
-	 * when the following two conditions become true.
-	 *   - There is race condition of mmap_sem (It is acquired by
-	 *     exit_mm()), and
-	 *   - SMI occurs before setting TASK_RUNNING.
-	 *     (or hypervisor of virtual machine switches to other guest)
-	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
-	 *
-	 * To avoid it, we have to wait for releasing tsk->pi_lock which
-	 * is held by try_to_wake_up()
-	 */
-	smp_mb();
-	raw_spin_unlock_wait(&tsk->pi_lock);
-
-	/* causes final put_task_struct in finish_task_switch(). */
-	tsk->state = TASK_DEAD;
-	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
-	schedule();
-	BUG();
-	/* Avoid "noreturn function does return".  */
-	for (;;)
-		cpu_relax();	/* For when BUG is null */
+	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ff4e3c0..b2ec53c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3331,17 +3331,6 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	/*
-	 * do_exit() calls schedule() with preemption disabled as an exception;
-	 * however we must fix that up, otherwise the next task will see an
-	 * inconsistent (higher) preempt count.
-	 *
-	 * It also avoids the below schedule_debug() test from complaining
-	 * about this.
-	 */
-	if (unlikely(prev->state == TASK_DEAD))
-		preempt_enable_no_resched_notrace();
-
 	schedule_debug(prev);
 
 	if (sched_feat(HRTICK))
@@ -3409,6 +3398,33 @@ static void __sched notrace __schedule(bool preempt)
 }
 STACK_FRAME_NON_STANDARD(__schedule); /* switch_to() */
 
+void __noreturn do_task_dead(void)
+{
+	/*
+	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
+	 * when the following two conditions become true.
+	 *   - There is race condition of mmap_sem (It is acquired by
+	 *     exit_mm()), and
+	 *   - SMI occurs before setting TASK_RUNNING.
+	 *     (or hypervisor of virtual machine switches to other guest)
+	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
+	 *
+	 * To avoid it, we have to wait for releasing tsk->pi_lock which
+	 * is held by try_to_wake_up()
+	 */
+	smp_mb();
+	raw_spin_unlock_wait(&current->pi_lock);
+
+	/* causes final put_task_struct in finish_task_switch(). */
+	__set_current_state(TASK_DEAD);
+	current->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
+	__schedule(false);
+	BUG();
+	/* Avoid "noreturn function does return".  */
+	for (;;)
+		cpu_relax();	/* For when BUG is null */
+}
+
 static inline void sched_submit_work(struct task_struct *tsk)
 {
 	if (!tsk->state || tsk_is_pi_blocked(tsk))

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
     [not found] <OF73BCE1E4.D2224FEC-ON48258025.0022CFA7-48258025.0022E3DC@kedacom.com>
2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
2016-09-09  8:19   ` chengchao
2016-09-09 13:14   ` Oleg Nesterov
2016-09-09 16:24     ` Peter Zijlstra
2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
2016-09-10  9:51     ` Cheng Chao
2016-09-10 16:33       ` Peter Zijlstra
2016-09-12 11:03     ` Oleg Nesterov
2016-09-13  2:45       ` Cheng Chao
2016-09-12 11:37     ` Peter Zijlstra
2016-09-12 11:41       ` Peter Zijlstra
2016-09-12 13:05         ` Oleg Nesterov
2016-09-12 15:01           ` Peter Zijlstra
2016-09-13 16:14             ` Oleg Nesterov
2016-09-13 16:37               ` Peter Zijlstra
2016-09-14  2:07                 ` Cheng Chao
2016-09-14  7:50                   ` Peter Zijlstra
2016-09-14 15:45                 ` Oleg Nesterov
2016-09-22 13:59                 ` [tip:sched/core] sched/core: Optimize __schedule() tip-bot for Peter Zijlstra
2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
2016-09-13  8:45           ` Peter Zijlstra
2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
2016-09-14 15:53       ` Oleg Nesterov
2016-09-18  2:07       ` Cheng Chao
2016-09-22 13:59       ` [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() tip-bot for Cheng Chao
