linux-kernel.vger.kernel.org archive mirror
* [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2)
@ 2020-10-07 18:01 Marcelo Tosatti
  2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
  2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
  0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-07 18:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal, Peter Xu

When enabling per-CPU POSIX timers, an IPI may be sent to nohz_full CPUs
(so that they re-read the tick dependencies and possibly avoid re-entering
nohz_full mode on a given CPU).

A common case is for applications that run on nohz_full= CPUs
not to use POSIX timers (e.g. DPDK). This patchset changes the notification
to IPI only the CPUs where the task(s) whose tick dependencies
are being updated are executing.

This reduces interruptions to nohz_full= CPUs.




^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-07 18:01 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2) Marcelo Tosatti
@ 2020-10-07 18:01 ` Marcelo Tosatti
  2020-10-08 12:22   ` Peter Zijlstra
  2020-10-08 14:59   ` Peter Xu
  2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
  1 sibling, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-07 18:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal,
	Peter Xu, Marcelo Tosatti

When adding a tick dependency to a task, it's necessary to
wake up the CPU where the task resides to re-evaluate tick
dependencies on that CPU.

However, the current code wakes up all nohz_full CPUs, which
is unnecessary.

Switch to waking up a single CPU, relying on the ordering of writes
to task->cpu and task->tick_dep_mask.

From: Frederic Weisbecker <frederic@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
+static void tick_nohz_kick_task(struct task_struct *tsk)
+{
+	int cpu = task_cpu(tsk);
+
+	/*
+	 * If the task concurrently migrates to another cpu,
+	 * we guarantee it sees the new tick dependency upon
+	 * schedule.
+	 *
+	 *
+	 * set_task_cpu(p, cpu);
+	 *   STORE p->cpu = @cpu
+	 * __schedule() (switch to task 'p')
+	 *   LOCK rq->lock
+	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
+	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
+	 *      LOAD p->tick_dep_mask           LOAD p->cpu
+	 */
+
+	preempt_disable();
+	if (cpu_online(cpu))
+		tick_nohz_full_kick_cpu(cpu);
+	preempt_enable();
+}
+
 /*
  * Kick all full dynticks CPUs in order to force these to re-evaluate
  * their dependency on the tick and restart it if necessary.
@@ -356,19 +381,8 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cp
  */
 void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
 {
-	if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
-		if (tsk == current) {
-			preempt_disable();
-			tick_nohz_full_kick();
-			preempt_enable();
-		} else {
-			/*
-			 * Some future tick_nohz_full_kick_task()
-			 * should optimize this.
-			 */
-			tick_nohz_full_kick_all();
-		}
-	}
+	if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask))
+		tick_nohz_kick_task(tsk);
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);
 




* [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-07 18:01 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2) Marcelo Tosatti
  2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
@ 2020-10-07 18:01 ` Marcelo Tosatti
  2020-10-08 12:35   ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-07 18:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal,
	Peter Xu, Marcelo Tosatti

Rather than waking up all nohz_full CPUs on the system, wake up only
the CPUs where the member threads of the signal_struct are running.

This reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -398,7 +398,15 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
  */
 void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
 {
-	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
+	int prev;
+
+	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
+	if (!prev) {
+		rcu_read_lock();
+		for_each_thread(sig, t)
+			tick_nohz_kick_task(t);
+		rcu_read_unlock();
+	}
 }
 
 void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)




* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
@ 2020-10-08 12:22   ` Peter Zijlstra
  2020-10-08 17:54     ` Marcelo Tosatti
  2020-10-08 14:59   ` Peter Xu
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2020-10-08 12:22 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu

On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> When adding a tick dependency to a task, its necessary to
> wakeup the CPU where the task resides to reevaluate tick
> dependencies on that CPU.
> 
> However the current code wakes up all nohz_full CPUs, which 
> is unnecessary.
> 
> Switch to waking up a single CPU, by using ordering of writes
> to task->cpu and task->tick_dep_mask.
> 
> From: Frederic Weisbecker <frederic@kernel.org>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Index: linux-2.6/kernel/time/tick-sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/tick-sched.c
> +++ linux-2.6/kernel/time/tick-sched.c
> @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
>  	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
>  }
>  
> +static void tick_nohz_kick_task(struct task_struct *tsk)
> +{
> +	int cpu = task_cpu(tsk);
> +
> +	/*
> +	 * If the task concurrently migrates to another cpu,
> +	 * we guarantee it sees the new tick dependency upon
> +	 * schedule.
> +	 *
> +	 *
> +	 * set_task_cpu(p, cpu);
> +	 *   STORE p->cpu = @cpu
> +	 * __schedule() (switch to task 'p')
> +	 *   LOCK rq->lock
> +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> +	 */
> +
> +	preempt_disable();
> +	if (cpu_online(cpu))
> +		tick_nohz_full_kick_cpu(cpu);
> +	preempt_enable();
> +}

So do we need to kick the CPU unconditionally, or only when the task is
actually running? AFAICT we only care about current->tick_dep_mask.


* Re: [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
@ 2020-10-08 12:35   ` Peter Zijlstra
  2020-10-08 18:04     ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2020-10-08 12:35 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu,
	Oleg Nesterov

On Wed, Oct 07, 2020 at 03:01:53PM -0300, Marcelo Tosatti wrote:
> Rather than waking up all nohz_full CPUs on the system, only wakeup 
> the target CPUs of member threads of the signal.
> 
> Reduces interruptions to nohz_full CPUs.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Index: linux-2.6/kernel/time/tick-sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/tick-sched.c
> +++ linux-2.6/kernel/time/tick-sched.c
> @@ -398,7 +398,15 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
>   */
>  void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
>  {
> -	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
> +	int prev;
> +
> +	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
> +	if (!prev) {
> +		rcu_read_lock();
> +		for_each_thread(sig, t)
> +			tick_nohz_kick_task(t);
> +		rcu_read_unlock();
> +	}
>  }

AFAICT, and this makes perfect sense, this function is only ever used
while holding sighand->siglock, which makes the RCU read lock
superfluous.

Would it make sense to change the signal_struct argument to task_struct,
such that we can write:

	lockdep_assert_held(&p->sighand->siglock);
	for_each_thread(p->signal, t)
		tick_nohz_kick_task(t);

?



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
  2020-10-08 12:22   ` Peter Zijlstra
@ 2020-10-08 14:59   ` Peter Xu
  2020-10-08 15:28     ` Peter Zijlstra
  2020-10-08 17:43     ` Marcelo Tosatti
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Xu @ 2020-10-08 14:59 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal

On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> +static void tick_nohz_kick_task(struct task_struct *tsk)
> +{
> +	int cpu = task_cpu(tsk);
> +
> +	/*
> +	 * If the task concurrently migrates to another cpu,
> +	 * we guarantee it sees the new tick dependency upon
> +	 * schedule.
> +	 *
> +	 *
> +	 * set_task_cpu(p, cpu);
> +	 *   STORE p->cpu = @cpu
> +	 * __schedule() (switch to task 'p')
> +	 *   LOCK rq->lock
> +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> +	 */
> +
> +	preempt_disable();

Pure question: is preempt_disable() required here?  Same question to
tick_nohz_full_kick_all().

> +	if (cpu_online(cpu))
> +		tick_nohz_full_kick_cpu(cpu);
> +	preempt_enable();
> +}

-- 
Peter Xu



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 14:59   ` Peter Xu
@ 2020-10-08 15:28     ` Peter Zijlstra
  2020-10-08 19:16       ` Peter Xu
  2020-10-08 19:48       ` Frederic Weisbecker
  2020-10-08 17:43     ` Marcelo Tosatti
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-10-08 15:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Marcelo Tosatti, linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > +{
> > +	int cpu = task_cpu(tsk);
> > +
> > +	/*
> > +	 * If the task concurrently migrates to another cpu,
> > +	 * we guarantee it sees the new tick dependency upon
> > +	 * schedule.
> > +	 *
> > +	 *
> > +	 * set_task_cpu(p, cpu);
> > +	 *   STORE p->cpu = @cpu
> > +	 * __schedule() (switch to task 'p')
> > +	 *   LOCK rq->lock
> > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > +	 */
> > +
> > +	preempt_disable();
> 
> Pure question: is preempt_disable() required here?  Same question to
> tick_nohz_full_kick_all().

I think it serializes against hotplug.

> 
> > +	if (cpu_online(cpu))
> > +		tick_nohz_full_kick_cpu(cpu);
> > +	preempt_enable();
> > +}


* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 14:59   ` Peter Xu
  2020-10-08 15:28     ` Peter Zijlstra
@ 2020-10-08 17:43     ` Marcelo Tosatti
  1 sibling, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 17:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > +{
> > +	int cpu = task_cpu(tsk);
> > +
> > +	/*
> > +	 * If the task concurrently migrates to another cpu,
> > +	 * we guarantee it sees the new tick dependency upon
> > +	 * schedule.
> > +	 *
> > +	 *
> > +	 * set_task_cpu(p, cpu);
> > +	 *   STORE p->cpu = @cpu
> > +	 * __schedule() (switch to task 'p')
> > +	 *   LOCK rq->lock
> > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > +	 */
> > +
> > +	preempt_disable();
> 
> Pure question: is preempt_disable() required here?  Same question to
> tick_nohz_full_kick_all().

Hi Peter,

I don't see why: irq_work_queue_on() disables preemption if necessary.

> 
> > +	if (cpu_online(cpu))
> > +		tick_nohz_full_kick_cpu(cpu);
> > +	preempt_enable();
> > +}
> 
> -- 
> Peter Xu



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 12:22   ` Peter Zijlstra
@ 2020-10-08 17:54     ` Marcelo Tosatti
  2020-10-08 19:54       ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 17:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu

On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > When adding a tick dependency to a task, its necessary to
> > wakeup the CPU where the task resides to reevaluate tick
> > dependencies on that CPU.
> > 
> > However the current code wakes up all nohz_full CPUs, which 
> > is unnecessary.
> > 
> > Switch to waking up a single CPU, by using ordering of writes
> > to task->cpu and task->tick_dep_mask.
> > 
> > From: Frederic Weisbecker <frederic@kernel.org>
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Index: linux-2.6/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
> >  	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> >  }
> >  
> > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > +{
> > +	int cpu = task_cpu(tsk);
> > +
> > +	/*
> > +	 * If the task concurrently migrates to another cpu,
> > +	 * we guarantee it sees the new tick dependency upon
> > +	 * schedule.
> > +	 *
> > +	 *
> > +	 * set_task_cpu(p, cpu);
> > +	 *   STORE p->cpu = @cpu
> > +	 * __schedule() (switch to task 'p')
> > +	 *   LOCK rq->lock
> > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > +	 */
> > +
> > +	preempt_disable();
> > +	if (cpu_online(cpu))
> > +		tick_nohz_full_kick_cpu(cpu);
> > +	preempt_enable();
> > +}
> 
> So we need to kick the CPU unconditionally, or only when the task is
> actually running? AFAICT we only care about current->tick_dep_mask.

The tick is necessary to execute run_posix_cpu_timers(), from the tick
interrupt, even if the task is not running.



* Re: [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-08 12:35   ` Peter Zijlstra
@ 2020-10-08 18:04     ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 18:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu,
	Oleg Nesterov

On Thu, Oct 08, 2020 at 02:35:44PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 03:01:53PM -0300, Marcelo Tosatti wrote:
> > Rather than waking up all nohz_full CPUs on the system, only wakeup 
> > the target CPUs of member threads of the signal.
> > 
> > Reduces interruptions to nohz_full CPUs.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Index: linux-2.6/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -398,7 +398,15 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
> >   */
> >  void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
> >  {
> > -	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
> > +	int prev;
> > +
> > +	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
> > +	if (!prev) {
> > +		rcu_read_lock();
> > +		for_each_thread(sig, t)
> > +			tick_nohz_kick_task(t);
> > +		rcu_read_unlock();
> > +	}
> >  }
> 
> AFAICT, and this makes perfect sense, this function is only ever used
> while holding sighand->siglock, which makes the RCU read lock
> superfluous.
> 
> Would it make sense to change the signal_struct argument to task_struct,
> such that we can write:
> 
> 	lockdep_assert_held(&p->sighand->siglock);
> 	for_each_thread(p->signal, t)
> 		tick_nohz_kick_task(t);
> 
> ?

Makes sense, resending -v3.



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 15:28     ` Peter Zijlstra
@ 2020-10-08 19:16       ` Peter Xu
  2020-10-08 19:48       ` Frederic Weisbecker
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Xu @ 2020-10-08 19:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 05:28:44PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > +{
> > > +	int cpu = task_cpu(tsk);
> > > +
> > > +	/*
> > > +	 * If the task concurrently migrates to another cpu,
> > > +	 * we guarantee it sees the new tick dependency upon
> > > +	 * schedule.
> > > +	 *
> > > +	 *
> > > +	 * set_task_cpu(p, cpu);
> > > +	 *   STORE p->cpu = @cpu
> > > +	 * __schedule() (switch to task 'p')
> > > +	 *   LOCK rq->lock
> > > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > > +	 */
> > > +
> > > +	preempt_disable();
> > 
> > Pure question: is preempt_disable() required here?  Same question to
> > tick_nohz_full_kick_all().
> 
> I think it serializes against hotplug.

Thanks Peter.  So is that a lighter but trickier version of get_online_cpus(),
one that is even allowed while holding a spinlock?

I noticed that this method was actually mentioned in the old cpu-hotplug.txt,
but it was removed during the conversion to rst:

  ff58fa7f556c ("Documentation: Update CPU hotplug and move it to core-api")

Not sure whether that was intended; just raising it.  If the method is still
valid, maybe it's worth documenting somewhere.

-- 
Peter Xu



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 15:28     ` Peter Zijlstra
  2020-10-08 19:16       ` Peter Xu
@ 2020-10-08 19:48       ` Frederic Weisbecker
  1 sibling, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2020-10-08 19:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Peter Xu, Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 05:28:44PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > +{
> > > +	int cpu = task_cpu(tsk);
> > > +
> > > +	/*
> > > +	 * If the task concurrently migrates to another cpu,
> > > +	 * we guarantee it sees the new tick dependency upon
> > > +	 * schedule.
> > > +	 *
> > > +	 *
> > > +	 * set_task_cpu(p, cpu);
> > > +	 *   STORE p->cpu = @cpu
> > > +	 * __schedule() (switch to task 'p')
> > > +	 *   LOCK rq->lock
> > > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > > +	 */
> > > +
> > > +	preempt_disable();
> > 
> > Pure question: is preempt_disable() required here?  Same question to
> > tick_nohz_full_kick_all().
> 
> I think it serializes against hotplug.

Exactly!

Thanks.


* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 17:54     ` Marcelo Tosatti
@ 2020-10-08 19:54       ` Frederic Weisbecker
  2020-10-13 17:13         ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2020-10-08 19:54 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote:
> On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > When adding a tick dependency to a task, its necessary to
> > > wakeup the CPU where the task resides to reevaluate tick
> > > dependencies on that CPU.
> > > 
> > > However the current code wakes up all nohz_full CPUs, which 
> > > is unnecessary.
> > > 
> > > Switch to waking up a single CPU, by using ordering of writes
> > > to task->cpu and task->tick_dep_mask.
> > > 
> > > From: Frederic Weisbecker <frederic@kernel.org>
> > > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > 
> > > Index: linux-2.6/kernel/time/tick-sched.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/time/tick-sched.c
> > > +++ linux-2.6/kernel/time/tick-sched.c
> > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
> > >  	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> > >  }
> > >  
> > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > +{
> > > +	int cpu = task_cpu(tsk);
> > > +
> > > +	/*
> > > +	 * If the task concurrently migrates to another cpu,
> > > +	 * we guarantee it sees the new tick dependency upon
> > > +	 * schedule.
> > > +	 *
> > > +	 *
> > > +	 * set_task_cpu(p, cpu);
> > > +	 *   STORE p->cpu = @cpu
> > > +	 * __schedule() (switch to task 'p')
> > > +	 *   LOCK rq->lock
> > > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > > +	 */
> > > +
> > > +	preempt_disable();
> > > +	if (cpu_online(cpu))
> > > +		tick_nohz_full_kick_cpu(cpu);
> > > +	preempt_enable();
> > > +}
> > 
> > So we need to kick the CPU unconditionally, or only when the task is
> > actually running? AFAICT we only care about current->tick_dep_mask.
> 
> tick is necessary to execute run_posix_cpu_timers, from tick interrupt, 
> even if task is not running.

Yes but if the task isn't running, run_posix_cpu_timers() doesn't have
anything to elapse. So indeed we can spare the IPI if the task is not
running. Provided ordering makes sure that the task sees the new dependency
when it schedules in of course.


* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 19:54       ` Frederic Weisbecker
@ 2020-10-13 17:13         ` Marcelo Tosatti
  2020-10-14  8:33           ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-13 17:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Thu, Oct 08, 2020 at 09:54:44PM +0200, Frederic Weisbecker wrote:
> On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote:
> > On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > > When adding a tick dependency to a task, its necessary to
> > > > wakeup the CPU where the task resides to reevaluate tick
> > > > dependencies on that CPU.
> > > > 
> > > > However the current code wakes up all nohz_full CPUs, which 
> > > > is unnecessary.
> > > > 
> > > > Switch to waking up a single CPU, by using ordering of writes
> > > > to task->cpu and task->tick_dep_mask.
> > > > 
> > > > From: Frederic Weisbecker <frederic@kernel.org>
> > > > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > > 
> > > > Index: linux-2.6/kernel/time/tick-sched.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/kernel/time/tick-sched.c
> > > > +++ linux-2.6/kernel/time/tick-sched.c
> > > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
> > > >  	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> > > >  }
> > > >  
> > > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > > +{
> > > > +	int cpu = task_cpu(tsk);
> > > > +
> > > > +	/*
> > > > +	 * If the task concurrently migrates to another cpu,
> > > > +	 * we guarantee it sees the new tick dependency upon
> > > > +	 * schedule.
> > > > +	 *
> > > > +	 *
> > > > +	 * set_task_cpu(p, cpu);
> > > > +	 *   STORE p->cpu = @cpu
> > > > +	 * __schedule() (switch to task 'p')
> > > > +	 *   LOCK rq->lock
> > > > +	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
> > > > +	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
> > > > +	 *      LOAD p->tick_dep_mask           LOAD p->cpu
> > > > +	 */
> > > > +
> > > > +	preempt_disable();
> > > > +	if (cpu_online(cpu))
> > > > +		tick_nohz_full_kick_cpu(cpu);
> > > > +	preempt_enable();
> > > > +}
> > > 
> > > So we need to kick the CPU unconditionally, or only when the task is
> > > actually running? AFAICT we only care about current->tick_dep_mask.
> > 
> > tick is necessary to execute run_posix_cpu_timers, from tick interrupt, 
> > even if task is not running.
> 
> Yes but if the task isn't running, run_posix_cpu_timers() doesn't have
> anything to elapse. So indeed we can spare the IPI if the task is not
> running. Provided ordering makes sure that the task sees the new dependency
> when it schedules in of course.

True.

 * p->on_cpu <- { 0, 1 }:
 *
 *   is set by prepare_task() and cleared by finish_task() such that it will be
 *   set before p is scheduled-in and cleared after p is scheduled-out, both
 *   under rq->lock. Non-zero indicates the task is running on its CPU.


CPU-0 (tick_set_dep)            CPU-1 (task switch)

STORE p->tick_dep_mask
smp_mb() (atomic_fetch_or())
LOAD p->on_cpu


                                context_switch(prev, next)
                                STORE next->on_cpu = 1
                                ...                             [*]

                                LOAD current->tick_dep_mask


I don't see any explicit memory barrier in the [*] section?



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-13 17:13         ` Marcelo Tosatti
@ 2020-10-14  8:33           ` Peter Zijlstra
  2020-10-14 23:40             ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2020-10-14  8:33 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Frederic Weisbecker, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote:

> > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have
> > anything to elapse. So indeed we can spare the IPI if the task is not
> > running. Provided ordering makes sure that the task sees the new dependency
> > when it schedules in of course.
> 
> True.
> 
>  * p->on_cpu <- { 0, 1 }:
>  *
>  *   is set by prepare_task() and cleared by finish_task() such that it will be
>  *   set before p is scheduled-in and cleared after p is scheduled-out, both
>  *   under rq->lock. Non-zero indicates the task is running on its CPU.
> 
> 
> CPU-0 (tick_set_dep)            CPU-1 (task switch)
> 
> STORE p->tick_dep_mask
> smp_mb() (atomic_fetch_or())
> LOAD p->on_cpu
> 
> 
>                                 context_switch(prev, next)
>                                 STORE next->on_cpu = 1
>                                 ...                             [*]
> 
>                                 LOAD current->tick_dep_mask
> 

That load is in tick_nohz_task_switch(), right? (Which, BTW, is placed
completely wrong.) You could easily do something like the below, I
suppose.

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 81632cd5e3b7..2a5fafe66bb0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void)
 	ts = this_cpu_ptr(&tick_cpu_sched);
 
 	if (ts->tick_stopped) {
+		/*
+		 * tick_set_dep()		(this)
+		 *
+		 * STORE p->tick_dep_mask	STORE p->on_cpu
+		 * smp_mb()			smp_mb()
+		 * LOAD p->on_cpu		LOAD p->tick_dep_mask
+		 */
+		smp_mb();
 		if (atomic_read(&current->tick_dep_mask) ||
 		    atomic_read(&current->signal->tick_dep_mask))
 			tick_nohz_full_kick();



Re tick_nohz_task_switch() being placed wrong: it should probably be
placed before finish_lock_switch(). Something like so.


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cf044580683c..5c92c959824f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
 	finish_task(prev);
+	tick_nohz_task_switch();
 	finish_lock_switch(rq);
 	finish_arch_post_lock_switch();
 	kcov_finish_switch(current);
@@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		put_task_struct_rcu_user(prev);
 	}
 
-	tick_nohz_task_switch();
 	return rq;
 }
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 81632cd5e3b7..c8bddc9fb792 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -402,10 +402,8 @@ void __tick_nohz_task_switch(void)
 	unsigned long flags;
 	struct tick_sched *ts;
 
-	local_irq_save(flags);
-
 	if (!tick_nohz_full_cpu(smp_processor_id()))
-		goto out;
+		return;
 
 	ts = this_cpu_ptr(&tick_cpu_sched);
 
@@ -414,8 +412,6 @@ void __tick_nohz_task_switch(void)
 		    atomic_read(&current->signal->tick_dep_mask))
 			tick_nohz_full_kick();
 	}
-out:
-	local_irq_restore(flags);
 }
 
 /* Get the boot-time nohz CPU list from the kernel parameters. */

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-14  8:33           ` Peter Zijlstra
@ 2020-10-14 23:40             ` Frederic Weisbecker
  2020-10-15 10:12               ` Peter Zijlstra
  2020-10-20 18:52               ` Marcelo Tosatti
  0 siblings, 2 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2020-10-14 23:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Wed, Oct 14, 2020 at 10:33:21AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote:
> 
> > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have
> > > anything to elapse. So indeed we can spare the IPI if the task is not
> > > running. Provided ordering makes sure that the task sees the new dependency
> > > when it schedules in of course.
> > 
> > True.
> > 
> >  * p->on_cpu <- { 0, 1 }:
> >  *
> >  *   is set by prepare_task() and cleared by finish_task() such that it will be
> >  *   set before p is scheduled-in and cleared after p is scheduled-out, both
> >  *   under rq->lock. Non-zero indicates the task is running on its CPU.
> > 
> > 
> > CPU-0 (tick_set_dep)            CPU-1 (task switch)
> > 
> > STORE p->tick_dep_mask
> > smp_mb() (atomic_fetch_or())
> > LOAD p->on_cpu
> > 
> > 
> >                                 context_switch(prev, next)
> >                                 STORE next->on_cpu = 1
> >                                 ...                             [*]
> > 
> >                                 LOAD current->tick_dep_mask
> > 
> 
> That load is in tick_nohz_task_switch() right? (which BTW is placed
> completely wrong) You could easily do something like the below I
> suppose.
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 81632cd5e3b7..2a5fafe66bb0 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void)
>  	ts = this_cpu_ptr(&tick_cpu_sched);
>  
>  	if (ts->tick_stopped) {
> +		/*
> +		 * tick_set_dep()		(this)
> +		 *
> +		 * STORE p->tick_dep_mask	STORE p->on_cpu
> +		 * smp_mb()			smp_mb()
> +		 * LOAD p->on_cpu		LOAD p->tick_dep_mask
> +		 */
> +		smp_mb();
>  		if (atomic_read(&current->tick_dep_mask) ||
>  		    atomic_read(&current->signal->tick_dep_mask))
>  			tick_nohz_full_kick();

It would then need to be unconditional (whatever the value of ts->tick_stopped).
Assuming the tick isn't stopped, we may well have an interrupt firing right
after schedule() which doesn't see the new value of tick_dep_mask.

Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED
at wake up time, prior to the schedule() full barrier. Of course that doesn't
mean that the task is actually the one running on the CPU but it's a good sign,
considering that we are running in nohz_full mode and it's usually optimized
for single task mode.

Also, setting a remote task's tick dependency is only used by the posix cpu
timer, in case the user has the bad taste to enqueue a timer on a task running
in nohz_full mode. That case doesn't deserve an unconditional full barrier in
the schedule path.

If the target is current, as is used by RCU, I guess we can keep a special
treatment.

> re tick_nohz_task_switch() being placed wrong, it should probably be
> placed before finish_lock_switch(). Something like so.
> 
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cf044580683c..5c92c959824f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	vtime_task_switch(prev);
>  	perf_event_task_sched_in(prev, current);
>  	finish_task(prev);
> +	tick_nohz_task_switch();
>  	finish_lock_switch(rq);
>  	finish_arch_post_lock_switch();
>  	kcov_finish_switch(current);
> @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  		put_task_struct_rcu_user(prev);
>  	}
>  
> -	tick_nohz_task_switch();

IIRC, we wanted to keep it outside the rq lock because it shouldn't need it...


* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-14 23:40             ` Frederic Weisbecker
@ 2020-10-15 10:12               ` Peter Zijlstra
  2020-10-26 14:42                 ` Frederic Weisbecker
  2020-10-20 18:52               ` Marcelo Tosatti
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2020-10-15 10:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote:
> > re tick_nohz_task_switch() being placed wrong, it should probably be
> > placed before finish_lock_switch(). Something like so.
> > 
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index cf044580683c..5c92c959824f 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >  	vtime_task_switch(prev);
> >  	perf_event_task_sched_in(prev, current);
> >  	finish_task(prev);
> > +	tick_nohz_task_switch();
> >  	finish_lock_switch(rq);
> >  	finish_arch_post_lock_switch();
> >  	kcov_finish_switch(current);
> > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >  		put_task_struct_rcu_user(prev);
> >  	}
> >  
> > -	tick_nohz_task_switch();
> 
> IIRC, we wanted to keep it outside the rq lock because it shouldn't need it...

But now you've created a window with IRQs on and cause additional IRQ
state changes.

If you're really worried about rq->lock, I suppose we can do:

	rq_unlock(rq->lock);
	tick_nohz_task_switch();
	local_irq_enable();

(much like we do at the beginning of __schedule for RCU)


* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-14 23:40             ` Frederic Weisbecker
  2020-10-15 10:12               ` Peter Zijlstra
@ 2020-10-20 18:52               ` Marcelo Tosatti
  2020-10-22 12:53                 ` Frederic Weisbecker
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2020-10-20 18:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 14, 2020 at 10:33:21AM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote:
> > 
> > > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have
> > > > anything to elapse. So indeed we can spare the IPI if the task is not
> > > > running. Provided ordering makes sure that the task sees the new dependency
> > > > when it schedules in of course.
> > > 
> > > True.
> > > 
> > >  * p->on_cpu <- { 0, 1 }:
> > >  *
> > >  *   is set by prepare_task() and cleared by finish_task() such that it will be
> > >  *   set before p is scheduled-in and cleared after p is scheduled-out, both
> > >  *   under rq->lock. Non-zero indicates the task is running on its CPU.
> > > 
> > > 
> > > CPU-0 (tick_set_dep)            CPU-1 (task switch)
> > > 
> > > STORE p->tick_dep_mask
> > > smp_mb() (atomic_fetch_or())
> > > LOAD p->on_cpu
> > > 
> > > 
> > >                                 context_switch(prev, next)
> > >                                 STORE next->on_cpu = 1
> > >                                 ...                             [*]
> > > 
> > >                                 LOAD current->tick_dep_mask
> > > 
> > 
> > That load is in tick_nohz_task_switch() right? (which BTW is placed
> > completely wrong) You could easily do something like the below I
> > suppose.
> > 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 81632cd5e3b7..2a5fafe66bb0 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void)
> >  	ts = this_cpu_ptr(&tick_cpu_sched);
> >  
> >  	if (ts->tick_stopped) {
> > +		/*
> > +		 * tick_set_dep()		(this)
> > +		 *
> > +		 * STORE p->tick_dep_mask	STORE p->on_cpu
> > +		 * smp_mb()			smp_mb()
> > +		 * LOAD p->on_cpu		LOAD p->tick_dep_mask
> > +		 */
> > +		smp_mb();
> >  		if (atomic_read(&current->tick_dep_mask) ||
> >  		    atomic_read(&current->signal->tick_dep_mask))
> >  			tick_nohz_full_kick();
> 
> It would then need to be unconditional (whatever value of ts->tick_stopped).
> Assuming the tick isn't stopped, we may well have an interrupt firing right
> after schedule() which doesn't see the new value of tick_dep_mask.
> 
> Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED
> at wake up time, prior to the schedule() full barrier. Of course that doesn't
> mean that the task is actually the one running on the CPU but it's a good sign,
> considering that we are running in nohz_full mode and it's usually optimized
> for single task mode.

Unfortunately that would require exporting p->on_rq, which is internal to
the scheduler, and reading it locklessly.

(can surely do that if you prefer!)

> 
> Also setting a remote task's tick dependency is only used by posix cpu timer
> in case the user has the bad taste to enqueue on a task running in nohz_full
> mode. It shouldn't deserve an unconditional full barrier in the schedule path.
> 
> If the target is current, as is used by RCU, I guess we can keep a special
> treatment.

To answer PeterZ's original question:

"So we need to kick the CPU unconditionally, or only when the task is
actually running? AFAICT we only care about current->tick_dep_mask."

If there is a task sharing signals executing on a remote CPU, then yes, that
remote CPU should be awakened.

We could skip the IPI if the remote process is not running; however, for
the purposes of low-latency isolated processes, this optimization is
not necessary.

So the last posted patchset is good enough for isolated low latency processes.

Do you guys want me to do something or can you take the series as it is?

> > re tick_nohz_task_switch() being placed wrong, it should probably be
> > placed before finish_lock_switch(). Something like so.
> > 
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index cf044580683c..5c92c959824f 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >  	vtime_task_switch(prev);
> >  	perf_event_task_sched_in(prev, current);
> >  	finish_task(prev);
> > +	tick_nohz_task_switch();
> >  	finish_lock_switch(rq);
> >  	finish_arch_post_lock_switch();
> >  	kcov_finish_switch(current);
> > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >  		put_task_struct_rcu_user(prev);
> >  	}
> >  
> > -	tick_nohz_task_switch();
> 
> IIRC, we wanted to keep it outside the rq lock because it shouldn't need it...



* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-20 18:52               ` Marcelo Tosatti
@ 2020-10-22 12:53                 ` Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2020-10-22 12:53 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Tue, Oct 20, 2020 at 03:52:45PM -0300, Marcelo Tosatti wrote:
> On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote:
> > Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED
> > at wake up time, prior to the schedule() full barrier. Of course that doesn't
> > mean that the task is actually the one running on the CPU but it's a good sign,
> > considering that we are running in nohz_full mode and it's usually optimized
> > for single task mode.
> 
> Unfortunately that would require exporting p->on_rq, which is internal to
> the scheduler, and reading it locklessly.
> 
> (can surely do that if you prefer!)

May be:

    bool task_on_rq(struct task_struct *p) ?

Thanks.


* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-15 10:12               ` Peter Zijlstra
@ 2020-10-26 14:42                 ` Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2020-10-26 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal, Peter Xu

On Thu, Oct 15, 2020 at 12:12:35PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote:
> > > re tick_nohz_task_switch() being placed wrong, it should probably be
> > > placed before finish_lock_switch(). Something like so.
> > > 
> > > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index cf044580683c..5c92c959824f 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > >  	vtime_task_switch(prev);
> > >  	perf_event_task_sched_in(prev, current);
> > >  	finish_task(prev);
> > > +	tick_nohz_task_switch();
> > >  	finish_lock_switch(rq);
> > >  	finish_arch_post_lock_switch();
> > >  	kcov_finish_switch(current);
> > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > >  		put_task_struct_rcu_user(prev);
> > >  	}
> > >  
> > > -	tick_nohz_task_switch();
> > 
> > IIRC, we wanted to keep it outside the rq lock because it shouldn't need it...
> 
> But now you've created a window with IRQs on and cause additional IRQ
> state changes.
> 
> If you're really worried about rq->lock, I suppose we can do:
> 
> 	rq_unlock(rq->lock);
> 	tick_nohz_task_switch();
> 	local_irq_enable();
> 
> (much like we do at the beginning of __schedule for RCU)

Right. Well I'm not that worried about rq->lock though. If you're ok
with it I can just move it before finish_lock_switch().

Thanks.


Thread overview: 20+ messages (newest: 2020-10-26 14:42 UTC)
2020-10-07 18:01 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2) Marcelo Tosatti
2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
2020-10-08 12:22   ` Peter Zijlstra
2020-10-08 17:54     ` Marcelo Tosatti
2020-10-08 19:54       ` Frederic Weisbecker
2020-10-13 17:13         ` Marcelo Tosatti
2020-10-14  8:33           ` Peter Zijlstra
2020-10-14 23:40             ` Frederic Weisbecker
2020-10-15 10:12               ` Peter Zijlstra
2020-10-26 14:42                 ` Frederic Weisbecker
2020-10-20 18:52               ` Marcelo Tosatti
2020-10-22 12:53                 ` Frederic Weisbecker
2020-10-08 14:59   ` Peter Xu
2020-10-08 15:28     ` Peter Zijlstra
2020-10-08 19:16       ` Peter Xu
2020-10-08 19:48       ` Frederic Weisbecker
2020-10-08 17:43     ` Marcelo Tosatti
2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
2020-10-08 12:35   ` Peter Zijlstra
2020-10-08 18:04     ` Marcelo Tosatti
