* [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
       [not found] <OF73BCE1E4.D2224FEC-ON48258025.0022CFA7-48258025.0022E3DC@kedacom.com>
@ 2016-09-09  8:13 ` Cheng Chao
  2016-09-09  8:19   ` chengchao
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-09  8:13 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel, Cheng Chao

For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).

If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
executes migration_cpu_stop(), and then the stopper thread wakes the task
up again.

In fact, all of the above work is almost useless (wasteful), because
migration_cpu_stop() cannot migrate. Why not? migration_cpu_stop() requires
the task to be TASK_ON_RQ_QUEUED before it calls __migrate_task(), but
wait_for_completion() has already dequeued the caller.

This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
so migration_cpu_stop() can do useful work.

Signed-off-by: Cheng Chao <chengchao@kedacom.com>
---
 kernel/stop_machine.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..41aea5e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+
+#if defined(CONFIG_PREEMPT_NONE)
+	/*
+	 * Make the stopper thread run as soon as possible, and if the
+	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
+	 * This is especially useful for callers which are expected to be
+	 * TASK_ON_RQ_QUEUED.
+	 * sched_exec() does benefit from this improvement.
+	 */
+	schedule();
+#endif
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
-- 
2.4.11

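To make the criticized path concrete, here is a rough sketch of the flow
under CONFIG_PREEMPT_NONE when sched_exec() migrates the current task off
its CPU (simplified pseudocode reconstructed from the description above,
not the exact kernel source):

	/* caller (the current task) runs on the CPU it wants to leave */
	stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
		cpu_stop_queue_work(cpu, &work);   /* wakes the stopper thread */
		wait_for_completion(&done.completion);
			/* caller goes TASK_UNINTERRUPTIBLE and is dequeued */

	/* the stopper thread then runs on the same CPU */
	migration_cpu_stop(&arg);
		/* task_on_rq_queued(p) is false -> no __migrate_task() */
		/* completing &done wakes the caller back up */

	/* the wakeup (ttwu) runs select_task_rq() again, and only that
	   finally places the task on another CPU */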

* Re: [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
@ 2016-09-09  8:19   ` chengchao
  2016-09-09 13:14   ` Oleg Nesterov
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: chengchao @ 2016-09-09  8:19 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel


--in-reply-to doesn't work?

v1 is:
https://lkml.org/lkml/2016/9/7/819


on 09/09/2016 04:13 PM, Cheng Chao wrote:
> For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
> calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
> 
> If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
> It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task
> up again.
> 
> In fact, all of the above work is almost useless (wasteful), because
> migration_cpu_stop() cannot migrate. Why not? migration_cpu_stop() requires
> the task to be TASK_ON_RQ_QUEUED before it calls __migrate_task(), but
> wait_for_completion() has already dequeued the caller.
> 
> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.
> 
> Signed-off-by: Cheng Chao <chengchao@kedacom.com>
> ---
>  kernel/stop_machine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..41aea5e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to be
> +	 * TASK_ON_RQ_QUEUED.
> +	 * sched_exec() does benefit from this improvement.
> +	 */
> +	schedule();
> +#endif
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 


* Re: [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  2016-09-09  8:19   ` chengchao
@ 2016-09-09 13:14   ` Oleg Nesterov
  2016-09-09 16:24     ` Peter Zijlstra
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-09 13:14 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel

On 09/09, Cheng Chao wrote:
>
> If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
> It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task
> up again.

and this wakeup finally migrates the target, because ttwu() does another
select_task_rq().

> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.

Yes, this avoids the extra wakeup/select_task_rq(), so this is a minor
optimization.

> Signed-off-by: Cheng Chao <chengchao@kedacom.com>
> ---
>  kernel/stop_machine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..41aea5e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Makes the stopper thread run as soon as possible.
> +	 * And if the caller is TASK_RUNNING, keeps the caller TASK_RUNNING.
> +	 * It's special useful for some callers which are expected to be
> +	 * TASK_ON_RQ_QUEUED.
> +	 * sched_exec does benefit from this improvement.
> +	 */
> +	schedule();

Well. This can help in general (wait_for_completion() won't block), but
only if we queue the work on the same CPU. Otherwise this schedule() adds
an unnecessary pessimization.

That is why I suggested using _cond_resched() instead of schedule().

But let me repeat, I leave this to maintainers.

Oleg.

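The distinction, roughly: schedule() always enters the scheduler, while
_cond_resched() only does so when a reschedule is actually pending. A
sketch, approximating kernel/sched/core.c of that era:

	int __sched _cond_resched(void)
	{
		if (should_resched(0)) {	/* TIF_NEED_RESCHED set? */
			preempt_schedule_common();
			return 1;
		}
		return 0;
	}

When the stop work was queued on the local CPU, waking the stopper (a
stop-class thread that beats everything else) leaves TIF_NEED_RESCHED set
on the caller, so _cond_resched() yields; when the work went to a remote
CPU, nothing is pending locally and the call is nearly free, which is
exactly the conditional behaviour suggested above.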

* Re: [PATCH v2] stop_machine: Make migration_cpu_stop() does useful works for CONFIG_PREEMPT_NONE
  2016-09-09 13:14   ` Oleg Nesterov
@ 2016-09-09 16:24     ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-09 16:24 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On Fri, Sep 09, 2016 at 03:14:49PM +0200, Oleg Nesterov wrote:
> On 09/09, Cheng Chao wrote:
> >
> > If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
> > It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> > executes migration_cpu_stop(), and then the stopper thread wakes the task
> > up again.
> 

Still not seeing any mail from Cheng Chao. I think he's mailing from a
domain that's not accepting email, and then infradead makes it go away:
why accept email from people who will not accept bounces?

Cheng, please fix your email setup.


* [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-09  8:13 ` [PATCH v2] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  2016-09-09  8:19   ` chengchao
  2016-09-09 13:14   ` Oleg Nesterov
@ 2016-09-10  8:52   ` Cheng Chao
  2016-09-10  9:51     ` Cheng Chao
                       ` (3 more replies)
  2 siblings, 4 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-10  8:52 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel, Cheng Chao

For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).

If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
executes migration_cpu_stop(), and then the stopper thread wakes the task
up again.

In fact, all of the above work is almost useless (wasteful), because
migration_cpu_stop() cannot migrate. Why not? migration_cpu_stop() requires
the task to be TASK_ON_RQ_QUEUED before it calls __migrate_task(), but
wait_for_completion() has already dequeued the caller.

This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
so migration_cpu_stop() can do useful work.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
---
 kernel/stop_machine.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..41aea5e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+
+#if defined(CONFIG_PREEMPT_NONE)
+	/*
+	 * Make the stopper thread run as soon as possible, and if the
+	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
+	 * This is especially useful for callers which are expected to be
+	 * TASK_ON_RQ_QUEUED.
+	 * sched_exec() does benefit from this improvement.
+	 */
+	schedule();
+#endif
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
-- 
2.4.11


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
@ 2016-09-10  9:51     ` Cheng Chao
  2016-09-10 16:33       ` Peter Zijlstra
  2016-09-12 11:03     ` Oleg Nesterov
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Cheng Chao @ 2016-09-10  9:51 UTC (permalink / raw)
  To: peterz; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

Hi Peter, I guess you can receive mail from me now;
I have changed my mailbox to gmail.

Oleg has already done much work for this patch, and I am really obliged.
Please review this patch, thanks.

on 09/10/2016 04:52 PM, Cheng Chao wrote:
> For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
> calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
> 
> If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
> It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task
> up again.
> 
> In fact, all of the above work is almost useless (wasteful), because
> migration_cpu_stop() cannot migrate. Why not? migration_cpu_stop() requires
> the task to be TASK_ON_RQ_QUEUED before it calls __migrate_task(), but
> wait_for_completion() has already dequeued the caller.
> 
> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.
> 
> Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
> ---
>  kernel/stop_machine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..41aea5e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to be
> +	 * TASK_ON_RQ_QUEUED.
> +	 * sched_exec() does benefit from this improvement.
> +	 */
> +	schedule();
> +#endif
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  9:51     ` Cheng Chao
@ 2016-09-10 16:33       ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-10 16:33 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Sat, Sep 10, 2016 at 05:51:02PM +0800, Cheng Chao wrote:
> Hi Peter, I guess you can receive mail from me now;
> I have changed my mailbox to gmail.

Indeed, I'll have a look on Monday.


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2016-09-10  9:51     ` Cheng Chao
@ 2016-09-12 11:03     ` Oleg Nesterov
  2016-09-13  2:45       ` Cheng Chao
  2016-09-12 11:37     ` Peter Zijlstra
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  3 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-12 11:03 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel

On 09/10, Cheng Chao wrote:
>
> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +
> +#if defined(CONFIG_PREEMPT_NONE)
> +	/*
> +	 * Make the stopper thread run as soon as possible, and if the
> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
> +	 * This is especially useful for callers which are expected to be
> +	 * TASK_ON_RQ_QUEUED.
> +	 * sched_exec() does benefit from this improvement.
> +	 */
> +	schedule();
> +#endif
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }

Cheng, I have already tried twice to suggest conditionalizing this
schedule(), because it can only help if cpu == smp_processor_id(), and you
didn't reply.
I still think _cond_resched() makes more sense.

I won't really argue if you prefer it this way. But did you see my emails?

Oleg.


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
  2016-09-10  9:51     ` Cheng Chao
  2016-09-12 11:03     ` Oleg Nesterov
@ 2016-09-12 11:37     ` Peter Zijlstra
  2016-09-12 11:41       ` Peter Zijlstra
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  3 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-12 11:37 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Sat, Sep 10, 2016 at 04:52:12PM +0800, Cheng Chao wrote:
> For CONFIG_PREEMPT_NONE=y, when sched_exec() needs migration, sched_exec()
> calls stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg).
> 
> If migration_cpu_stop() cannot migrate, why call stop_one_cpu() at all?
> It just puts the task in TASK_UNINTERRUPTIBLE, wakes up the stopper thread,
> executes migration_cpu_stop(), and then the stopper thread wakes the task
> up again.
> 
> In fact, all of the above work is almost useless (wasteful), because
> migration_cpu_stop() cannot migrate. Why not? migration_cpu_stop() requires
> the task to be TASK_ON_RQ_QUEUED before it calls __migrate_task(), but
> wait_for_completion() has already dequeued the caller.
> 
> This patch keeps the task TASK_RUNNING instead of TASK_UNINTERRUPTIBLE,
> so migration_cpu_stop() can do useful work.

OK, completely confused. What!?

/me ponders....

So what you're saying is that migration_cpu_stop() doesn't work because
wait_for_completion() dequeues the task.

True I suppose. Not sure I like your solution, nor your implementation
of the solution much though.

I would much prefer an unconditional cond_resched() there, but also, I
think we should do what __migrate_swap_task() does, and set wake_cpu.

So something like so..

---
 kernel/sched/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ddd5f48551f1..ade772aa9610 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq && task_on_rq_queued(p))
-		rq = __migrate_task(rq, p, arg->dest_cpu);
+	if (task_rq(p) == rq) {
+		if (task_on_rq_queued(p))
+			rq = __migrate_task(rq, p, arg->dest_cpu);
+		else
+			p->wake_cpu = arg->dest_cpu;
+	}
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock(&p->pi_lock);
 

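For reference, p->wake_cpu is the hint the eventual wakeup starts from
when it picks a runqueue, which is why setting it covers the
already-dequeued case. A heavily abbreviated sketch of try_to_wake_up()
from kernel/sched/core.c of that era:

	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
	if (task_cpu(p) != cpu) {
		wake_flags |= WF_MIGRATED;
		set_task_cpu(p, cpu);
	}

So even when migration_cpu_stop() finds the task already dequeued, storing
arg->dest_cpu in p->wake_cpu steers the subsequent select_task_rq() toward
the CPU that sched_exec() wanted.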

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:37     ` Peter Zijlstra
@ 2016-09-12 11:41       ` Peter Zijlstra
  2016-09-12 13:05         ` Oleg Nesterov
  2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-12 11:41 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Mon, Sep 12, 2016 at 01:37:27PM +0200, Peter Zijlstra wrote:
> So what you're saying is that migration_cpu_stop() doesn't work because
> wait_for_completion() dequeues the task.
> 
> True I suppose. Not sure I like your solution, nor your implementation
> of the solution much though.
> 
> I would much prefer an unconditional cond_resched() there, but also, I
> think we should do what __migrate_swap_task() does, and set wake_cpu.
> 
> So something like so..
> 
> ---
>  kernel/sched/core.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ddd5f48551f1..ade772aa9610 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq && task_on_rq_queued(p))
> -		rq = __migrate_task(rq, p, arg->dest_cpu);
> +	if (task_rq(p) == rq) {
> +		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, p, arg->dest_cpu);
> +		else
> +			p->wake_cpu = arg->dest_cpu;
> +	}
>  	raw_spin_unlock(&rq->lock);
>  	raw_spin_unlock(&p->pi_lock);
>  

And this too; a too-narrow constraint to git diff made it go away.

---
 kernel/stop_machine.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ae6f41fb9cba..637798d6b554 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+	/*
+	 * In case @cpu == smp_proccessor_id() we can avoid a sleep+wakeup
+	 * by doing a preemption.
+	 */
+	cond_resched();
 	wait_for_completion(&done.completion);
 	return done.ret;
 }


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:41       ` Peter Zijlstra
@ 2016-09-12 13:05         ` Oleg Nesterov
  2016-09-12 15:01           ` Peter Zijlstra
  2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-12 13:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On 09/12, Peter Zijlstra wrote:
>
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +	/*
> +	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
> +	 * by doing a preemption.
> +	 */
> +	cond_resched();

Yes, this is what I tried to suggest too.

But this leads to a question which I have wanted to ask many times.

Why is cond_resched() not a NOP if CONFIG_PREEMPT=y?

Perhaps we have some users like, just for example,

	preempt_enable_no_resched();
	cond_resched();

which actually want the should_resched() check even if CONFIG_PREEMPT,
but most callers do not?

Oleg.


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 13:05         ` Oleg Nesterov
@ 2016-09-12 15:01           ` Peter Zijlstra
  2016-09-13 16:14             ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-12 15:01 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On Mon, Sep 12, 2016 at 03:05:38PM +0200, Oleg Nesterov wrote:

> But this leads to a question which I have wanted to ask many times.
> 
> Why is cond_resched() not a NOP if CONFIG_PREEMPT=y?

Dunno, nobody bothered to do it? We should keep the might_sleep() of
course, but the preemption check is pointless.

> Perhaps we have some users like, just for example,
> 
> 	preempt_enable_no_resched();
> 	cond_resched();
> 
> which actually want the should_resched() check even if CONFIG_PREEMPT,
> but most callers do not?

I would hope not, the few preempt_enable_no_resched() users _should_
have an actual schedule() call in the _very_ near future.

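What the suggestion amounts to, sketched against the headers of that era
(keeping the might_sleep() debug check in the cond_resched() wrapper and
stubbing out only the reschedule attempt; a sketch, not a tested patch):

	/* include/linux/sched.h */
	#ifndef CONFIG_PREEMPT
	extern int _cond_resched(void);
	#else
	static inline int _cond_resched(void) { return 0; }
	#endif

	#define cond_resched() ({			\
		___might_sleep(__FILE__, __LINE__, 0);	\
		_cond_resched();			\
	})

Under full preemption any needed reschedule already happened at the last
preempt_enable(), so the explicit check is redundant there.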

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:03     ` Oleg Nesterov
@ 2016-09-13  2:45       ` Cheng Chao
  0 siblings, 0 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-13  2:45 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel



on 09/12/2016 07:03 PM, Oleg Nesterov wrote:
> On 09/10, Cheng Chao wrote:
>>
>> @@ -126,6 +126,17 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>>  	cpu_stop_init_done(&done, 1);
>>  	if (!cpu_stop_queue_work(cpu, &work))
>>  		return -ENOENT;
>> +
>> +#if defined(CONFIG_PREEMPT_NONE)
>> +	/*
>> +	 * Make the stopper thread run as soon as possible, and if the
>> +	 * caller is TASK_RUNNING, keep the caller TASK_RUNNING.
>> +	 * This is especially useful for callers which are expected to be
>> +	 * TASK_ON_RQ_QUEUED.
>> +	 * sched_exec() does benefit from this improvement.
>> +	 */
>> +	schedule();
>> +#endif
>>  	wait_for_completion(&done.completion);
>>  	return done.ret;
>>  }
> 
> Cheng, I have already tried twice to suggest conditionalizing this
> schedule(), because it can only help if cpu == smp_processor_id(), and you
> didn't reply.
> I still think _cond_resched() makes more sense.
> 
> I won't really argue if you prefer it this way. But did you see my emails?
>

I read them, thanks. Because Peter didn't receive my mails before, it took
me much time to fix my mailbox, so I didn't reply in time.

OK, even if cpu != smp_processor_id(), calling schedule() instead of
_cond_resched() gives the caller a chance not to sleep: when the caller
runs on the CPU again, it will quite likely find the completion already
done, so the stopper thread's cpu_stop_signal_done() and the caller's
wait_for_completion() will effectively run back to back.

I think it is a trivial improvement. Using cond_resched()/_cond_resched()
is better for readability, so I choose cond_resched().

thanks again.

> Oleg.
> 


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 11:41       ` Peter Zijlstra
  2016-09-12 13:05         ` Oleg Nesterov
@ 2016-09-13  4:03         ` Cheng Chao
  2016-09-13  8:45           ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Cheng Chao @ 2016-09-13  4:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

Peter, thank you.

on 09/12/2016 07:41 PM, Peter Zijlstra wrote:
> On Mon, Sep 12, 2016 at 01:37:27PM +0200, Peter Zijlstra wrote:
>> So what you're saying is that migration_cpu_stop() doesn't work because
>> wait_for_completion() dequeues the task.
>>
>> True I suppose. Not sure I like your solution, nor your implementation
>> of the solution much though.
>>
>> I would much prefer an unconditional cond_resched() there, but also, I
>> think we should do what __migrate_swap_task() does, and set wake_cpu.
>>
>> So something like so..
>>
>> ---
>>  kernel/sched/core.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index ddd5f48551f1..ade772aa9610 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>>  	 * we're holding p->pi_lock.
>>  	 */
>> -	if (task_rq(p) == rq && task_on_rq_queued(p))
>> -		rq = __migrate_task(rq, p, arg->dest_cpu);
>> +	if (task_rq(p) == rq) {
>> +		if (task_on_rq_queued(p))
>> +			rq = __migrate_task(rq, p, arg->dest_cpu);
>> +		else
>> +			p->wake_cpu = arg->dest_cpu;
>> +	}
>>  	raw_spin_unlock(&rq->lock);
>>  	raw_spin_unlock(&p->pi_lock);
>>  
> 
> And this too; a too-narrow constraint to git diff made it go away.
> 

Yes, setting wake_cpu is better for the later try_to_wake_up().
Peter, should that go in as a new patch?

> ---
>  kernel/stop_machine.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index ae6f41fb9cba..637798d6b554 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +	/*
> +	 * In case @cpu == smp_proccessor_id() we can avoid a sleep+wakeup
> +	 * by doing a preemption.
> +	 */
> +	cond_resched();
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 

I agree to use cond_resched(). https://lkml.org/lkml/2016/9/12/1228

For CONFIG_PREEMPT=y and CONFIG_PREEMPT_VOLUNTARY=y, this patch seems
unnecessary, because an equivalent reschedule already happens on those
configs (see the might_resched() sketch after this message):
1. when CONFIG_PREEMPT=y
  stop_one_cpu()
  ->cpu_stop_queue_work()
    ->spin_unlock_irqrestore() (preempt_enable() calls __preempt_schedule())

2. when CONFIG_PREEMPT_VOLUNTARY=y
  stop_one_cpu()
  ->wait_for_completion()
    ->...
      ->might_sleep() (calls _cond_resched())

So we really don't need the "#if defined(CONFIG_PREEMPT_NONE)"?

I also think that without the "#if defined(CONFIG_PREEMPT_NONE)"
the code is cleaner, and the logic is still OK.



diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..87464a2 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,15 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
        cpu_stop_init_done(&done, 1);
        if (!cpu_stop_queue_work(cpu, &work))
                return -ENOENT;
+
+#if defined(CONFIG_PREEMPT_NONE)
+       /*
+        * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+        * by doing a preemption.
+        */
+       cond_resched();
+#endif
+
        wait_for_completion(&done.completion);
        return done.ret;
 }

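The CONFIG_PREEMPT_VOLUNTARY chain mentioned earlier in this message works
because might_sleep() expands to might_resched(), which is _cond_resched()
on that config. Roughly, from include/linux/kernel.h of that era (a
sketch):

	#ifdef CONFIG_PREEMPT_VOLUNTARY
	extern int _cond_resched(void);
	# define might_resched() _cond_resched()
	#else
	# define might_resched() do { } while (0)
	#endif

(With CONFIG_DEBUG_ATOMIC_SLEEP, might_sleep() also performs the
__might_sleep() debugging check before calling might_resched().)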

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13  4:03         ` [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE Cheng Chao
@ 2016-09-13  8:45           ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-13  8:45 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

On Tue, Sep 13, 2016 at 12:03:05PM +0800, Cheng Chao wrote:
> 
> Peter, should that go in as a new patch?

I wanted both changes in one patch, but I fudged my git-diff.

> > ---
> >  kernel/stop_machine.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index ae6f41fb9cba..637798d6b554 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -121,6 +121,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
> >  	cpu_stop_init_done(&done, 1);
> >  	if (!cpu_stop_queue_work(cpu, &work))
> >  		return -ENOENT;
> > +	/*
> > +	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
> > +	 * by doing a preemption.
> > +	 */
> > +	cond_resched();
> >  	wait_for_completion(&done.completion);
> >  	return done.ret;
> >  }
> > 
> 
> I agree to use cond_resched(). https://lkml.org/lkml/2016/9/12/1228
> 
> So we really don't need the "#if defined(CONFIG_PREEMPT_NONE)"?

> I also think that without the "#if defined(CONFIG_PREEMPT_NONE)"
> the code is cleaner, and the logic is still OK.

You really don't need the #ifdef, look at the actual code, not the
Kconfig language.


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-12 15:01           ` Peter Zijlstra
@ 2016-09-13 16:14             ` Oleg Nesterov
  2016-09-13 16:37               ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-13 16:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On 09/12, Peter Zijlstra wrote:
>
> On Mon, Sep 12, 2016 at 03:05:38PM +0200, Oleg Nesterov wrote:
>
> > But this leads to a question which I have wanted to ask many times.
> > 
> > Why is cond_resched() not a NOP if CONFIG_PREEMPT=y?
>
> Dunno, nobody bothered to do it? We should keep the might_sleep() of
> course, but the preemption check is pointless.

Yes, agreed, I actually meant _cond_resched().

> > Perhaps we have some users like, just for example,
> > 
> > 	preempt_enable_no_resched();
> > 	cond_resched();
> > 
> > which actually want the should_resched() check even if CONFIG_PREEMPT,
> > but most callers do not?
>
> I would hope not, the few preempt_enable_no_resched() users _should_
> have an actual schedule() call in the _very_ near future.

Me too, and I failed to find something which could be broken... So
perhaps we should make it a nop and investigate the new bug reports after
that.


Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
__schedule() should be removed, it seems; do_exit() can call __schedule()
directly.

Oleg.


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13 16:14             ` Oleg Nesterov
@ 2016-09-13 16:37               ` Peter Zijlstra
  2016-09-14  2:07                 ` Cheng Chao
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-13 16:37 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On Tue, Sep 13, 2016 at 06:14:27PM +0200, Oleg Nesterov wrote:

> Me too, and I failed to find something which could be broken... So
> perhaps we should make it a nop and investigate the new bug reports after
> that.

Works for me :-)

> 
> Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
> __schedule() should be removed, it seems; do_exit() can call __schedule()
> directly.


something like so?

---

 include/linux/kernel.h |  2 +-
 include/linux/sched.h  |  2 ++
 kernel/exit.c          | 11 ++---------
 kernel/sched/core.c    | 23 ++++++++++++-----------
 4 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d96a6118d26a..e5bd9cdd2e24 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -266,7 +266,7 @@ extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
 extern int oops_may_print(void);
-void do_exit(long error_code)
+void __noreturn do_exit(long error_code)
 	__noreturn;
 void complete_and_exit(struct completion *, long)
 	__noreturn;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index eb64fcd89e68..b0c818a05b2e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -448,6 +448,8 @@ static inline void io_schedule(void)
 	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
 }
 
+void __noreturn do_task_dead(void);
+
 struct nsproxy;
 struct user_namespace;
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 091a78be3b09..d4c12692f766 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -725,7 +725,7 @@ static void check_stack_usage(void)
 static inline void check_stack_usage(void) {}
 #endif
 
-void do_exit(long code)
+void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
@@ -897,14 +897,7 @@ void do_exit(long code)
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
-	/* causes final put_task_struct in finish_task_switch(). */
-	tsk->state = TASK_DEAD;
-	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
-	schedule();
-	BUG();
-	/* Avoid "noreturn function does return".  */
-	for (;;)
-		cpu_relax();	/* For when BUG is null */
+	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a0086a5fc008..6034f269000f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3327,17 +3327,6 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	/*
-	 * do_exit() calls schedule() with preemption disabled as an exception;
-	 * however we must fix that up, otherwise the next task will see an
-	 * inconsistent (higher) preempt count.
-	 *
-	 * It also avoids the below schedule_debug() test from complaining
-	 * about this.
-	 */
-	if (unlikely(prev->state == TASK_DEAD))
-		preempt_enable_no_resched_notrace();
-
 	schedule_debug(prev);
 
 	if (sched_feat(HRTICK))
@@ -3404,6 +3393,18 @@ static void __sched notrace __schedule(bool preempt)
 	balance_callback(rq);
 }
 
+void __noreturn do_task_dead(void)
+{
+	/* causes final put_task_struct in finish_task_switch(). */
+	__set_current_state(TASK_DEAD);
+	current->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
+	__schedule(false);
+	BUG();
+	/* Avoid "noreturn function does return".  */
+	for (;;)
+		cpu_relax();	/* For when BUG is null */
+}
+
 static inline void sched_submit_work(struct task_struct *tsk)
 {
 	if (!tsk->state || tsk_is_pi_blocked(tsk))


* [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-10  8:52   ` [PATCH v3] " Cheng Chao
                       ` (2 preceding siblings ...)
  2016-09-12 11:37     ` Peter Zijlstra
@ 2016-09-14  2:01     ` Cheng Chao
  2016-09-14 15:53       ` Oleg Nesterov
                         ` (2 more replies)
  3 siblings, 3 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-14  2:01 UTC (permalink / raw)
  To: mingo, oleg, peterz, tj, akpm, chris; +Cc: linux-kernel, Cheng Chao

In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
cycle by doing a preemption.

Callers such as sched_exec() can benefit from this change.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/core.c   | 8 ++++++--
 kernel/stop_machine.c | 5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a0086a5..283b662 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq && task_on_rq_queued(p))
-		rq = __migrate_task(rq, p, arg->dest_cpu);
+	if (task_rq(p) == rq) {
+		if (task_on_rq_queued(p))
+			rq = __migrate_task(rq, p, arg->dest_cpu);
+		else
+			p->wake_cpu = arg->dest_cpu;
+	}
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock(&p->pi_lock);
 
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..1a24890 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+	/*
+	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+	 * by doing a preemption.
+	 */
+	cond_resched();
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
-- 
2.4.11

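Reading the two hunks together, the patch now covers both orderings (a
summary sketch based on the discussion above):

	/* Local case: cond_resched() lets the just-woken stopper preempt
	 * the caller while it is still TASK_RUNNING and queued, so
	 * migration_cpu_stop() migrates it directly via __migrate_task().
	 *
	 * Dequeued case: the caller has already blocked in
	 * wait_for_completion() (or @cpu was a remote CPU), so it is not
	 * queued; recording p->wake_cpu = arg->dest_cpu makes the eventual
	 * try_to_wake_up() start select_task_rq() from the desired CPU.
	 */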

* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13 16:37               ` Peter Zijlstra
@ 2016-09-14  2:07                 ` Cheng Chao
  2016-09-14  7:50                   ` Peter Zijlstra
  2016-09-14 15:45                 ` Oleg Nesterov
  2016-09-22 13:59                 ` [tip:sched/core] sched/core: Optimize __schedule() tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 25+ messages in thread
From: Cheng Chao @ 2016-09-14  2:07 UTC (permalink / raw)
  To: Peter Zijlstra, Oleg Nesterov; +Cc: mingo, tj, akpm, chris, linux-kernel


Great, __schedule() doesn't need to pay any attention to TASK_DEAD now.

on 09/14/2016 12:37 AM, Peter Zijlstra wrote:
> On Tue, Sep 13, 2016 at 06:14:27PM +0200, Oleg Nesterov wrote:
> 
>> Me too, and I failed to find something which could be broken... So
>> perhaps we should make it a nop and investigate the new bug reports after
>> that.
> 
> Works for me :-)
> 
>>
>> Hmm. And preempt_enable_no_resched_notrace() under TASK_DEAD in
>> __schedule() should be removed, it seems; do_exit() can call __schedule()
>> directly.
> 
> 
> something like so?
> 
> ---
> 
>  include/linux/kernel.h |  2 +-
>  include/linux/sched.h  |  2 ++
>  kernel/exit.c          | 11 ++---------
>  kernel/sched/core.c    | 23 ++++++++++++-----------
>  4 files changed, 17 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d96a6118d26a..e5bd9cdd2e24 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -266,7 +266,7 @@ extern void oops_enter(void);
>  extern void oops_exit(void);
>  void print_oops_end_marker(void);
>  extern int oops_may_print(void);
> -void do_exit(long error_code)
> +void __noreturn do_exit(long error_code)
>  	__noreturn;
>  void complete_and_exit(struct completion *, long)
>  	__noreturn;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index eb64fcd89e68..b0c818a05b2e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -448,6 +448,8 @@ static inline void io_schedule(void)
>  	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
>  }
>  
> +void __noreturn do_task_dead(void);
> +
>  struct nsproxy;
>  struct user_namespace;
>  
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 091a78be3b09..d4c12692f766 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -725,7 +725,7 @@ static void check_stack_usage(void)
>  static inline void check_stack_usage(void) {}
>  #endif
>  
> -void do_exit(long code)
> +void __noreturn do_exit(long code)
>  {
>  	struct task_struct *tsk = current;
>  	int group_dead;
> @@ -897,14 +897,7 @@ void do_exit(long code)
>  	smp_mb();
>  	raw_spin_unlock_wait(&tsk->pi_lock);
>  
> -	/* causes final put_task_struct in finish_task_switch(). */
> -	tsk->state = TASK_DEAD;
> -	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> -	schedule();
> -	BUG();
> -	/* Avoid "noreturn function does return".  */
> -	for (;;)
> -		cpu_relax();	/* For when BUG is null */
> +	do_task_dead();
>  }
>  EXPORT_SYMBOL_GPL(do_exit);
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a0086a5fc008..6034f269000f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3327,17 +3327,6 @@ static void __sched notrace __schedule(bool preempt)
>  	rq = cpu_rq(cpu);
>  	prev = rq->curr;
>  
> -	/*
> -	 * do_exit() calls schedule() with preemption disabled as an exception;
> -	 * however we must fix that up, otherwise the next task will see an
> -	 * inconsistent (higher) preempt count.
> -	 *
> -	 * It also avoids the below schedule_debug() test from complaining
> -	 * about this.
> -	 */
> -	if (unlikely(prev->state == TASK_DEAD))
> -		preempt_enable_no_resched_notrace();
> -
>  	schedule_debug(prev);
>  
>  	if (sched_feat(HRTICK))
> @@ -3404,6 +3393,18 @@ static void __sched notrace __schedule(bool preempt)
>  	balance_callback(rq);
>  }
>  
> +void __noreturn do_task_dead(void)
> +{
> +	/* causes final put_task_struct in finish_task_switch(). */
> +	__set_current_state(TASK_DEAD);
> +	current->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> +	__schedule(false);
> +	BUG();
> +	/* Avoid "noreturn function does return".  */
> +	for (;;)
> +		cpu_relax();	/* For when BUG is null */
> +}
> +
>  static inline void sched_submit_work(struct task_struct *tsk)
>  {
>  	if (!tsk->state || tsk_is_pi_blocked(tsk))
> 


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-14  2:07                 ` Cheng Chao
@ 2016-09-14  7:50                   ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-14  7:50 UTC (permalink / raw)
  To: Cheng Chao; +Cc: Oleg Nesterov, mingo, tj, akpm, chris, linux-kernel

On Wed, Sep 14, 2016 at 10:07:14AM +0800, Cheng Chao wrote:
> 
> Great, __schedule() doesn't need to pay any attention to TASK_DEAD now.

There's still __schedule()->context_switch()->finish_task_switch().


* Re: [PATCH v3] stop_machine: Make migration_cpu_stop() do useful work for CONFIG_PREEMPT_NONE
  2016-09-13 16:37               ` Peter Zijlstra
  2016-09-14  2:07                 ` Cheng Chao
@ 2016-09-14 15:45                 ` Oleg Nesterov
  2016-09-22 13:59                 ` [tip:sched/core] sched/core: Optimize __schedule() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-14 15:45 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cheng Chao, mingo, tj, akpm, chris, linux-kernel

On 09/13, Peter Zijlstra wrote:
>
> On Tue, Sep 13, 2016 at 06:14:27PM +0200, Oleg Nesterov wrote:
>
> > Hmm. And  preempt_enable_no_resched_notrace() under TASK_DEAD in
> > __schedule() should be removed it seems, do_exit() can call __schedule()
> > directly.
>
> something like so?

Yes, exactly.

But I think that raw_spin_unlock_wait(&tsk->pi_lock) and its huge
comment should be moved into the new helper too; this connects to
set_current_state(TASK_DEAD).

Oleg.


* Re: [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
@ 2016-09-14 15:53       ` Oleg Nesterov
  2016-09-18  2:07       ` Cheng Chao
  2016-09-22 13:59       ` [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() tip-bot for Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2016-09-14 15:53 UTC (permalink / raw)
  To: Cheng Chao; +Cc: mingo, peterz, tj, akpm, chris, linux-kernel

On 09/14, Cheng Chao wrote:
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq && task_on_rq_queued(p))
> -		rq = __migrate_task(rq, p, arg->dest_cpu);
> +	if (task_rq(p) == rq) {
> +		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, p, arg->dest_cpu);
> +		else
> +			p->wake_cpu = arg->dest_cpu;
> +	}

Cough ;) again, I leave this to Peter...

But imo this change should be documented or perhaps even separated.
It looks fine to me, but this has nothing to do with "we can avoid
a sleep+wakeup by doing a preemption" from the changelog.

This is another improvement, and a small note in the changelog can
unconfuse the reader of git blame/log.

Oleg.


* Re: [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  2016-09-14 15:53       ` Oleg Nesterov
@ 2016-09-18  2:07       ` Cheng Chao
  2016-09-22 13:59       ` [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() tip-bot for Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: Cheng Chao @ 2016-09-18  2:07 UTC (permalink / raw)
  To: peterz; +Cc: mingo, oleg, tj, akpm, chris, linux-kernel

Hi Peter,

  What should I do next? Thanks.

Cheng

on 09/14/2016 10:01 AM, Cheng Chao wrote:
> In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
> cycle by doing a preemption.
> 
> Callers such as sched_exec() can benefit from this change.
> 
> Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/core.c   | 8 ++++++--
>  kernel/stop_machine.c | 5 +++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a0086a5..283b662 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq && task_on_rq_queued(p))
> -		rq = __migrate_task(rq, p, arg->dest_cpu);
> +	if (task_rq(p) == rq) {
> +		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, p, arg->dest_cpu);
> +		else
> +			p->wake_cpu = arg->dest_cpu;
> +	}
>  	raw_spin_unlock(&rq->lock);
>  	raw_spin_unlock(&p->pi_lock);
>  
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 4a1ca5f..1a24890 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -126,6 +126,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>  	cpu_stop_init_done(&done, 1);
>  	if (!cpu_stop_queue_work(cpu, &work))
>  		return -ENOENT;
> +	/*
> +	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
> +	 * by doing a preemption.
> +	 */
> +	cond_resched();
>  	wait_for_completion(&done.completion);
>  	return done.ret;
>  }
> 


* [tip:sched/core] stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
  2016-09-14  2:01     ` [PATCH v4] stop_machine: Avoid a sleep and wakeup in stop_one_cpu() Cheng Chao
  2016-09-14 15:53       ` Oleg Nesterov
  2016-09-18  2:07       ` Cheng Chao
@ 2016-09-22 13:59       ` tip-bot for Cheng Chao
  2 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Cheng Chao @ 2016-09-22 13:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, oleg, tglx, mingo, hpa, torvalds, cs.os.kernel, peterz

Commit-ID:  bf89a304722f6904009499a31dc68ab9a5c9742e
Gitweb:     http://git.kernel.org/tip/bf89a304722f6904009499a31dc68ab9a5c9742e
Author:     Cheng Chao <cs.os.kernel@gmail.com>
AuthorDate: Wed, 14 Sep 2016 10:01:50 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 14:53:45 +0200

stop_machine: Avoid a sleep and wakeup in stop_one_cpu()

In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
cycle by doing a preemption.

Callers such as sched_exec() can benefit from this change.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1473818510-6779-1-git-send-email-cs.os.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c   | 8 ++++++--
 kernel/stop_machine.c | 5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c5f020c..ff4e3c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1063,8 +1063,12 @@ static int migration_cpu_stop(void *data)
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq && task_on_rq_queued(p))
-		rq = __migrate_task(rq, p, arg->dest_cpu);
+	if (task_rq(p) == rq) {
+		if (task_on_rq_queued(p))
+			rq = __migrate_task(rq, p, arg->dest_cpu);
+		else
+			p->wake_cpu = arg->dest_cpu;
+	}
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock(&p->pi_lock);
 
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f..082e71f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -126,6 +126,11 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 1);
 	if (!cpu_stop_queue_work(cpu, &work))
 		return -ENOENT;
+	/*
+	 * In case @cpu == smp_processor_id() we can avoid a sleep+wakeup
+	 * cycle by doing a preemption:
+	 */
+	cond_resched();
 	wait_for_completion(&done.completion);
 	return done.ret;
 }


* [tip:sched/core] sched/core: Optimize __schedule()
  2016-09-13 16:37               ` Peter Zijlstra
  2016-09-14  2:07                 ` Cheng Chao
  2016-09-14 15:45                 ` Oleg Nesterov
@ 2016-09-22 13:59                 ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-09-22 13:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, peterz, linux-kernel, cs.os.kernel, hpa, tglx, oleg, mingo

Commit-ID:  9af6528ee9b682df7f29dbee86fbba0b67eab944
Gitweb:     http://git.kernel.org/tip/9af6528ee9b682df7f29dbee86fbba0b67eab944
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 13 Sep 2016 18:37:29 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 14:53:45 +0200

sched/core: Optimize __schedule()

Oleg noted that by making do_exit() use __schedule() for the TASK_DEAD
context switch, we can avoid the TASK_DEAD special case currently in
__schedule() because that avoids the extra preempt_disable() from
schedule().

In order to facilitate this, create a do_task_dead() helper which we
place in the scheduler code, such that it can access __schedule().

Also add some __noreturn annotations to the functions, there's no
coming back from do_exit().

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Cheng Chao <cs.os.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20160913163729.GB5012@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/kernel.h |  9 +++------
 include/linux/sched.h  |  2 ++
 kernel/exit.c          | 26 ++------------------------
 kernel/sched/core.c    | 38 +++++++++++++++++++++++++++-----------
 4 files changed, 34 insertions(+), 41 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d96a611..74fd6f0 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -259,17 +259,14 @@ static inline void might_fault(void) { }
 extern struct atomic_notifier_head panic_notifier_list;
 extern long (*panic_blink)(int state);
 __printf(1, 2)
-void panic(const char *fmt, ...)
-	__noreturn __cold;
+void panic(const char *fmt, ...) __noreturn __cold;
 void nmi_panic(struct pt_regs *regs, const char *msg);
 extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
 extern int oops_may_print(void);
-void do_exit(long error_code)
-	__noreturn;
-void complete_and_exit(struct completion *, long)
-	__noreturn;
+void do_exit(long error_code) __noreturn;
+void complete_and_exit(struct completion *, long) __noreturn;
 
 /* Internal, do not use. */
 int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d750240..f00ee8e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -448,6 +448,8 @@ static inline void io_schedule(void)
 	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
 }
 
+void __noreturn do_task_dead(void);
+
 struct nsproxy;
 struct user_namespace;
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 091a78b..1e1d913 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -725,7 +725,7 @@ static void check_stack_usage(void)
 static inline void check_stack_usage(void) {}
 #endif
 
-void do_exit(long code)
+void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
@@ -882,29 +882,7 @@ void do_exit(long code)
 	exit_rcu();
 	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
 
-	/*
-	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
-	 * when the following two conditions become true.
-	 *   - There is race condition of mmap_sem (It is acquired by
-	 *     exit_mm()), and
-	 *   - SMI occurs before setting TASK_RUNINNG.
-	 *     (or hypervisor of virtual machine switches to other guest)
-	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
-	 *
-	 * To avoid it, we have to wait for releasing tsk->pi_lock which
-	 * is held by try_to_wake_up()
-	 */
-	smp_mb();
-	raw_spin_unlock_wait(&tsk->pi_lock);
-
-	/* causes final put_task_struct in finish_task_switch(). */
-	tsk->state = TASK_DEAD;
-	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
-	schedule();
-	BUG();
-	/* Avoid "noreturn function does return".  */
-	for (;;)
-		cpu_relax();	/* For when BUG is null */
+	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ff4e3c0..b2ec53c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3331,17 +3331,6 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	/*
-	 * do_exit() calls schedule() with preemption disabled as an exception;
-	 * however we must fix that up, otherwise the next task will see an
-	 * inconsistent (higher) preempt count.
-	 *
-	 * It also avoids the below schedule_debug() test from complaining
-	 * about this.
-	 */
-	if (unlikely(prev->state == TASK_DEAD))
-		preempt_enable_no_resched_notrace();
-
 	schedule_debug(prev);
 
 	if (sched_feat(HRTICK))
@@ -3409,6 +3398,33 @@ static void __sched notrace __schedule(bool preempt)
 }
 STACK_FRAME_NON_STANDARD(__schedule); /* switch_to() */
 
+void __noreturn do_task_dead(void)
+{
+	/*
+	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
+	 * when the following two conditions become true.
+	 *   - There is race condition of mmap_sem (It is acquired by
+	 *     exit_mm()), and
+	 *   - SMI occurs before setting TASK_RUNNING.
+	 *     (or hypervisor of virtual machine switches to other guest)
+	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
+	 *
+	 * To avoid it, we have to wait for releasing tsk->pi_lock which
+	 * is held by try_to_wake_up()
+	 */
+	smp_mb();
+	raw_spin_unlock_wait(&current->pi_lock);
+
+	/* causes final put_task_struct in finish_task_switch(). */
+	__set_current_state(TASK_DEAD);
+	current->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
+	__schedule(false);
+	BUG();
+	/* Avoid "noreturn function does return".  */
+	for (;;)
+		cpu_relax();	/* For when BUG is null */
+}
+
 static inline void sched_submit_work(struct task_struct *tsk)
 {
 	if (!tsk->state || tsk_is_pi_blocked(tsk))

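For context, the schedule() wrapper that do_task_dead() now bypasses looks
roughly like this (a sketch of kernel/sched/core.c from that era):

	asmlinkage __visible void __sched schedule(void)
	{
		struct task_struct *tsk = current;

		sched_submit_work(tsk);
		do {
			preempt_disable();
			__schedule(false);
			sched_preempt_enable_no_resched();
		} while (need_resched());
	}

A task that has just set TASK_DEAD never returns from its final
__schedule(false), so the wrapper's loop and preempt bookkeeping are dead
weight there; calling __schedule() directly is what allows the TASK_DEAD
preempt-count fixup to be dropped from __schedule() itself.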
