* [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2)
@ 2020-10-07 18:01 Marcelo Tosatti
  2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
  2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
  0 siblings, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-07 18:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal, Peter Xu

When enabling per-CPU posix timers, an IPI to nohz_full CPUs might be
performed (to re-read the dependencies and possibly not re-enter
nohz_full on a given CPU).

A common case is for applications that run on nohz_full= CPUs to not
use POSIX timers (eg DPDK). This patch changes the notification to only
IPI the target CPUs where the task(s) whose tick dependencies are being
updated are executing.

This reduces interruptions to nohz_full= CPUs.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-07 18:01 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2) Marcelo Tosatti
@ 2020-10-07 18:01 ` Marcelo Tosatti
  2020-10-08 12:22   ` Peter Zijlstra
  2020-10-08 14:59   ` Peter Xu
  2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
  1 sibling, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-07 18:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal, Peter Xu,
      Marcelo Tosatti

When adding a tick dependency to a task, it's necessary to
wake up the CPU where the task resides to reevaluate tick
dependencies on that CPU.

However the current code wakes up all nohz_full CPUs, which
is unnecessary.

Switch to waking up a single CPU, by using ordering of writes
to task->cpu and task->tick_dep_mask.

From: Frederic Weisbecker <frederic@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
+static void tick_nohz_kick_task(struct task_struct *tsk)
+{
+	int cpu = task_cpu(tsk);
+
+	/*
+	 * If the task concurrently migrates to another cpu,
+	 * we guarantee it sees the new tick dependency upon
+	 * schedule.
+	 *
+	 *
+	 * set_task_cpu(p, cpu);
+	 *   STORE p->cpu = @cpu
+	 * __schedule() (switch to task 'p')
+	 *   LOCK rq->lock
+	 *   smp_mb__after_spin_lock()		STORE p->tick_dep_mask
+	 *   tick_nohz_task_switch()		smp_mb() (atomic_fetch_or())
+	 *      LOAD p->tick_dep_mask		LOAD p->cpu
+	 */
+
+	preempt_disable();
+	if (cpu_online(cpu))
+		tick_nohz_full_kick_cpu(cpu);
+	preempt_enable();
+}
+
 /*
  * Kick all full dynticks CPUs in order to force these to re-evaluate
  * their dependency on the tick and restart it if necessary.
@@ -356,19 +381,8 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cp
  */
 void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
 {
-	if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
-		if (tsk == current) {
-			preempt_disable();
-			tick_nohz_full_kick();
-			preempt_enable();
-		} else {
-			/*
-			 * Some future tick_nohz_full_kick_task()
-			 * should optimize this.
-			 */
-			tick_nohz_full_kick_all();
-		}
-	}
+	if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask))
+		tick_nohz_kick_task(tsk);
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti @ 2020-10-08 12:22 ` Peter Zijlstra 2020-10-08 17:54 ` Marcelo Tosatti 2020-10-08 14:59 ` Peter Xu 1 sibling, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2020-10-08 12:22 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > When adding a tick dependency to a task, its necessary to > wakeup the CPU where the task resides to reevaluate tick > dependencies on that CPU. > > However the current code wakes up all nohz_full CPUs, which > is unnecessary. > > Switch to waking up a single CPU, by using ordering of writes > to task->cpu and task->tick_dep_mask. > > From: Frederic Weisbecker <frederic@kernel.org> > Suggested-by: Peter Zijlstra <peterz@infradead.org> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > Index: linux-2.6/kernel/time/tick-sched.c > =================================================================== > --- linux-2.6.orig/kernel/time/tick-sched.c > +++ linux-2.6/kernel/time/tick-sched.c > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > } > > +static void tick_nohz_kick_task(struct task_struct *tsk) > +{ > + int cpu = task_cpu(tsk); > + > + /* > + * If the task concurrently migrates to another cpu, > + * we guarantee it sees the new tick dependency upon > + * schedule. 
> + * > + * > + * set_task_cpu(p, cpu); > + * STORE p->cpu = @cpu > + * __schedule() (switch to task 'p') > + * LOCK rq->lock > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or()) > + * LOAD p->tick_dep_mask LOAD p->cpu > + */ > + > + preempt_disable(); > + if (cpu_online(cpu)) > + tick_nohz_full_kick_cpu(cpu); > + preempt_enable(); > +} So we need to kick the CPU unconditionally, or only when the task is actually running? AFAICT we only care about current->tick_dep_mask. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-08 12:22 ` Peter Zijlstra @ 2020-10-08 17:54 ` Marcelo Tosatti 2020-10-08 19:54 ` Frederic Weisbecker 0 siblings, 1 reply; 21+ messages in thread From: Marcelo Tosatti @ 2020-10-08 17:54 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote: > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > When adding a tick dependency to a task, its necessary to > > wakeup the CPU where the task resides to reevaluate tick > > dependencies on that CPU. > > > > However the current code wakes up all nohz_full CPUs, which > > is unnecessary. > > > > Switch to waking up a single CPU, by using ordering of writes > > to task->cpu and task->tick_dep_mask. > > > > From: Frederic Weisbecker <frederic@kernel.org> > > Suggested-by: Peter Zijlstra <peterz@infradead.org> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > > > Index: linux-2.6/kernel/time/tick-sched.c > > =================================================================== > > --- linux-2.6.orig/kernel/time/tick-sched.c > > +++ linux-2.6/kernel/time/tick-sched.c > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > > } > > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > +{ > > + int cpu = task_cpu(tsk); > > + > > + /* > > + * If the task concurrently migrates to another cpu, > > + * we guarantee it sees the new tick dependency upon > > + * schedule. 
> > + * > > + * > > + * set_task_cpu(p, cpu); > > + * STORE p->cpu = @cpu > > + * __schedule() (switch to task 'p') > > + * LOCK rq->lock > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > > + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or()) > > + * LOAD p->tick_dep_mask LOAD p->cpu > > + */ > > + > > + preempt_disable(); > > + if (cpu_online(cpu)) > > + tick_nohz_full_kick_cpu(cpu); > > + preempt_enable(); > > +} > > So we need to kick the CPU unconditionally, or only when the task is > actually running? AFAICT we only care about current->tick_dep_mask. tick is necessary to execute run_posix_cpu_timers, from tick interrupt, even if task is not running. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-08 17:54 ` Marcelo Tosatti @ 2020-10-08 19:54 ` Frederic Weisbecker 2020-10-13 17:13 ` Marcelo Tosatti 0 siblings, 1 reply; 21+ messages in thread From: Frederic Weisbecker @ 2020-10-08 19:54 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote: > On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote: > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > > When adding a tick dependency to a task, its necessary to > > > wakeup the CPU where the task resides to reevaluate tick > > > dependencies on that CPU. > > > > > > However the current code wakes up all nohz_full CPUs, which > > > is unnecessary. > > > > > > Switch to waking up a single CPU, by using ordering of writes > > > to task->cpu and task->tick_dep_mask. > > > > > > From: Frederic Weisbecker <frederic@kernel.org> > > > Suggested-by: Peter Zijlstra <peterz@infradead.org> > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > > > > > Index: linux-2.6/kernel/time/tick-sched.c > > > =================================================================== > > > --- linux-2.6.orig/kernel/time/tick-sched.c > > > +++ linux-2.6/kernel/time/tick-sched.c > > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > > > } > > > > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > > +{ > > > + int cpu = task_cpu(tsk); > > > + > > > + /* > > > + * If the task concurrently migrates to another cpu, > > > + * we guarantee it sees the new tick dependency upon > > > + * schedule. 
> > > + * > > > + * > > > + * set_task_cpu(p, cpu); > > > + * STORE p->cpu = @cpu > > > + * __schedule() (switch to task 'p') > > > + * LOCK rq->lock > > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > > > + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or()) > > > + * LOAD p->tick_dep_mask LOAD p->cpu > > > + */ > > > + > > > + preempt_disable(); > > > + if (cpu_online(cpu)) > > > + tick_nohz_full_kick_cpu(cpu); > > > + preempt_enable(); > > > +} > > > > So we need to kick the CPU unconditionally, or only when the task is > > actually running? AFAICT we only care about current->tick_dep_mask. > > tick is necessary to execute run_posix_cpu_timers, from tick interrupt, > even if task is not running. Yes but if the task isn't running, run_posix_cpu_timers() doesn't have anything to elapse. So indeed we can spare the IPI if the task is not running. Provided ordering makes sure that the task sees the new dependency when it schedules in of course. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-08 19:54 ` Frederic Weisbecker @ 2020-10-13 17:13 ` Marcelo Tosatti 2020-10-14 8:33 ` Peter Zijlstra 0 siblings, 1 reply; 21+ messages in thread From: Marcelo Tosatti @ 2020-10-13 17:13 UTC (permalink / raw) To: Frederic Weisbecker Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu On Thu, Oct 08, 2020 at 09:54:44PM +0200, Frederic Weisbecker wrote: > On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote: > > On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > > > When adding a tick dependency to a task, its necessary to > > > > wakeup the CPU where the task resides to reevaluate tick > > > > dependencies on that CPU. > > > > > > > > However the current code wakes up all nohz_full CPUs, which > > > > is unnecessary. > > > > > > > > Switch to waking up a single CPU, by using ordering of writes > > > > to task->cpu and task->tick_dep_mask. > > > > > > > > From: Frederic Weisbecker <frederic@kernel.org> > > > > Suggested-by: Peter Zijlstra <peterz@infradead.org> > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > > > > > > > Index: linux-2.6/kernel/time/tick-sched.c > > > > =================================================================== > > > > --- linux-2.6.orig/kernel/time/tick-sched.c > > > > +++ linux-2.6/kernel/time/tick-sched.c > > > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > > > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > > > > } > > > > > > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > > > +{ > > > > + int cpu = task_cpu(tsk); > > > > + > > > > + /* > > > > + * If the task concurrently migrates to another cpu, > > > > + * we guarantee it sees the new tick dependency upon > > > > + * schedule. 
> > > > + * > > > > + * > > > > + * set_task_cpu(p, cpu); > > > > + * STORE p->cpu = @cpu > > > > + * __schedule() (switch to task 'p') > > > > + * LOCK rq->lock > > > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > > > > + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or()) > > > > + * LOAD p->tick_dep_mask LOAD p->cpu > > > > + */ > > > > + > > > > + preempt_disable(); > > > > + if (cpu_online(cpu)) > > > > + tick_nohz_full_kick_cpu(cpu); > > > > + preempt_enable(); > > > > +} > > > > > > So we need to kick the CPU unconditionally, or only when the task is > > > actually running? AFAICT we only care about current->tick_dep_mask. > > > > tick is necessary to execute run_posix_cpu_timers, from tick interrupt, > > even if task is not running. > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > anything to elapse. So indeed we can spare the IPI if the task is not > running. Provided ordering makes sure that the task sees the new dependency > when it schedules in of course. True. * p->on_cpu <- { 0, 1 }: * * is set by prepare_task() and cleared by finish_task() such that it will be * set before p is scheduled-in and cleared after p is scheduled-out, both * under rq->lock. Non-zero indicates the task is running on its CPU. CPU-0 (tick_set_dep) CPU-1 (task switch) STORE p->tick_dep_mask smp_mb() (atomic_fetch_or()) LOAD p->on_cpu context_switch(prev, next) STORE next->on_cpu = 1 ... [*] LOAD current->tick_dep_mask Don't see any explicit memory barrier in the [*] section? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-13 17:13 ` Marcelo Tosatti @ 2020-10-14 8:33 ` Peter Zijlstra 2020-10-14 23:40 ` Frederic Weisbecker 0 siblings, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2020-10-14 8:33 UTC (permalink / raw) To: Marcelo Tosatti Cc: Frederic Weisbecker, linux-kernel, Nitesh Narayan Lal, Peter Xu On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote: > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > > anything to elapse. So indeed we can spare the IPI if the task is not > > running. Provided ordering makes sure that the task sees the new dependency > > when it schedules in of course. > > True. > > * p->on_cpu <- { 0, 1 }: > * > * is set by prepare_task() and cleared by finish_task() such that it will be > * set before p is scheduled-in and cleared after p is scheduled-out, both > * under rq->lock. Non-zero indicates the task is running on its CPU. > > > CPU-0 (tick_set_dep) CPU-1 (task switch) > > STORE p->tick_dep_mask > smp_mb() (atomic_fetch_or()) > LOAD p->on_cpu > > > context_switch(prev, next) > STORE next->on_cpu = 1 > ... [*] > > LOAD current->tick_dep_mask > That load is in tick_nohz_task_switch() right? (which BTW is placed completely wrong) You could easily do something like the below I suppose. diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 81632cd5e3b7..2a5fafe66bb0 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void) ts = this_cpu_ptr(&tick_cpu_sched); if (ts->tick_stopped) { + /* + * tick_set_dep() (this) + * + * STORE p->tick_dep_mask STORE p->on_cpu + * smp_mb() smp_mb() + * LOAD p->on_cpu LOAD p->tick_dep_mask + */ + smp_mb(); if (atomic_read(¤t->tick_dep_mask) || atomic_read(¤t->signal->tick_dep_mask)) tick_nohz_full_kick(); re tick_nohz_task_switch() being placed wrong, it should probably be placed before finish_lock_switch(). 
Something like so. diff --git a/kernel/sched/core.c b/kernel/sched/core.c index cf044580683c..5c92c959824f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) vtime_task_switch(prev); perf_event_task_sched_in(prev, current); finish_task(prev); + tick_nohz_task_switch(); finish_lock_switch(rq); finish_arch_post_lock_switch(); kcov_finish_switch(current); @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) put_task_struct_rcu_user(prev); } - tick_nohz_task_switch(); return rq; } diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 81632cd5e3b7..c8bddc9fb792 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -402,10 +402,8 @@ void __tick_nohz_task_switch(void) unsigned long flags; struct tick_sched *ts; - local_irq_save(flags); - if (!tick_nohz_full_cpu(smp_processor_id())) - goto out; + return; ts = this_cpu_ptr(&tick_cpu_sched); @@ -414,8 +412,6 @@ void __tick_nohz_task_switch(void) atomic_read(¤t->signal->tick_dep_mask)) tick_nohz_full_kick(); } -out: - local_irq_restore(flags); } /* Get the boot-time nohz CPU list from the kernel parameters. */ ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-14 8:33 ` Peter Zijlstra @ 2020-10-14 23:40 ` Frederic Weisbecker 2020-10-15 10:12 ` Peter Zijlstra 2020-10-20 18:52 ` Marcelo Tosatti 0 siblings, 2 replies; 21+ messages in thread From: Frederic Weisbecker @ 2020-10-14 23:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal, Peter Xu On Wed, Oct 14, 2020 at 10:33:21AM +0200, Peter Zijlstra wrote: > On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote: > > > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > > > anything to elapse. So indeed we can spare the IPI if the task is not > > > running. Provided ordering makes sure that the task sees the new dependency > > > when it schedules in of course. > > > > True. > > > > * p->on_cpu <- { 0, 1 }: > > * > > * is set by prepare_task() and cleared by finish_task() such that it will be > > * set before p is scheduled-in and cleared after p is scheduled-out, both > > * under rq->lock. Non-zero indicates the task is running on its CPU. > > > > > > CPU-0 (tick_set_dep) CPU-1 (task switch) > > > > STORE p->tick_dep_mask > > smp_mb() (atomic_fetch_or()) > > LOAD p->on_cpu > > > > > > context_switch(prev, next) > > STORE next->on_cpu = 1 > > ... [*] > > > > LOAD current->tick_dep_mask > > > > That load is in tick_nohz_task_switch() right? (which BTW is placed > completely wrong) You could easily do something like the below I > suppose. 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index 81632cd5e3b7..2a5fafe66bb0 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void) > ts = this_cpu_ptr(&tick_cpu_sched); > > if (ts->tick_stopped) { > + /* > + * tick_set_dep() (this) > + * > + * STORE p->tick_dep_mask STORE p->on_cpu > + * smp_mb() smp_mb() > + * LOAD p->on_cpu LOAD p->tick_dep_mask > + */ > + smp_mb(); > if (atomic_read(¤t->tick_dep_mask) || > atomic_read(¤t->signal->tick_dep_mask)) > tick_nohz_full_kick(); It would then need to be unconditional (whatever value of ts->tick_stopped). Assuming the tick isn't stopped, we may well have an interrupt firing right after schedule() which doesn't see the new value of tick_dep_map. Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED at wake up time, prior to the schedule() full barrier. Of course that doesn't mean that the task is actually the one running on the CPU but it's a good sign, considering that we are running in nohz_full mode and it's usually optimized for single task mode. Also setting a remote task's tick dependency is only used by posix cpu timer in case the user has the bad taste to enqueue on a task running in nohz_full mode. It shouldn't deserve an unconditional full barrier in the schedule path. If the target is current, as is used by RCU, I guess we can keep a special treatment. > re tick_nohz_task_switch() being placed wrong, it should probably be > placed before finish_lock_switch(). Something like so. 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index cf044580683c..5c92c959824f 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) > vtime_task_switch(prev); > perf_event_task_sched_in(prev, current); > finish_task(prev); > + tick_nohz_task_switch(); > finish_lock_switch(rq); > finish_arch_post_lock_switch(); > kcov_finish_switch(current); > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) > put_task_struct_rcu_user(prev); > } > > - tick_nohz_task_switch(); IIRC, we wanted to keep it outside rq lock because it shouldn't it... ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-14 23:40 ` Frederic Weisbecker @ 2020-10-15 10:12 ` Peter Zijlstra 2020-10-26 14:42 ` Frederic Weisbecker 2020-10-20 18:52 ` Marcelo Tosatti 1 sibling, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2020-10-15 10:12 UTC (permalink / raw) To: Frederic Weisbecker Cc: Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal, Peter Xu On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > > re tick_nohz_task_switch() being placed wrong, it should probably be > > placed before finish_lock_switch(). Something like so. > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index cf044580683c..5c92c959824f 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) > > vtime_task_switch(prev); > > perf_event_task_sched_in(prev, current); > > finish_task(prev); > > + tick_nohz_task_switch(); > > finish_lock_switch(rq); > > finish_arch_post_lock_switch(); > > kcov_finish_switch(current); > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) > > put_task_struct_rcu_user(prev); > > } > > > > - tick_nohz_task_switch(); > > IIRC, we wanted to keep it outside rq lock because it shouldn't it... But now you've created a window with IRQs on and cause additional IRQ state changes. If you're really worried about rq->lock, I suppose we can do: rq_unlock(rq->lock); tick_nohz_task_switch(); local_irq_enable(); (much like we do at the beginning of __schedule for RCU) ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-15 10:12 ` Peter Zijlstra @ 2020-10-26 14:42 ` Frederic Weisbecker 0 siblings, 0 replies; 21+ messages in thread From: Frederic Weisbecker @ 2020-10-26 14:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal, Peter Xu On Thu, Oct 15, 2020 at 12:12:35PM +0200, Peter Zijlstra wrote: > On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > > > re tick_nohz_task_switch() being placed wrong, it should probably be > > > placed before finish_lock_switch(). Something like so. > > > > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index cf044580683c..5c92c959824f 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) > > > vtime_task_switch(prev); > > > perf_event_task_sched_in(prev, current); > > > finish_task(prev); > > > + tick_nohz_task_switch(); > > > finish_lock_switch(rq); > > > finish_arch_post_lock_switch(); > > > kcov_finish_switch(current); > > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) > > > put_task_struct_rcu_user(prev); > > > } > > > > > > - tick_nohz_task_switch(); > > > > IIRC, we wanted to keep it outside rq lock because it shouldn't it... > > But now you've created a window with IRQs on and cause additional IRQ > state changes. > > If you're really worried about rq->lock, I suppose we can do: > > rq_unlock(rq->lock); > tick_nohz_task_switch(); > local_irq_enable(); > > (much like we do at the beginning of __schedule for RCU) Right. Well I'm not that worried about rq->lock though. If you're ok with it I can just move it before finish_lock_switch(). Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-14 23:40 ` Frederic Weisbecker 2020-10-15 10:12 ` Peter Zijlstra @ 2020-10-20 18:52 ` Marcelo Tosatti 2020-10-22 12:53 ` Frederic Weisbecker 1 sibling, 1 reply; 21+ messages in thread From: Marcelo Tosatti @ 2020-10-20 18:52 UTC (permalink / raw) To: Frederic Weisbecker Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > On Wed, Oct 14, 2020 at 10:33:21AM +0200, Peter Zijlstra wrote: > > On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote: > > > > > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > > > > anything to elapse. So indeed we can spare the IPI if the task is not > > > > running. Provided ordering makes sure that the task sees the new dependency > > > > when it schedules in of course. > > > > > > True. > > > > > > * p->on_cpu <- { 0, 1 }: > > > * > > > * is set by prepare_task() and cleared by finish_task() such that it will be > > > * set before p is scheduled-in and cleared after p is scheduled-out, both > > > * under rq->lock. Non-zero indicates the task is running on its CPU. > > > > > > > > > CPU-0 (tick_set_dep) CPU-1 (task switch) > > > > > > STORE p->tick_dep_mask > > > smp_mb() (atomic_fetch_or()) > > > LOAD p->on_cpu > > > > > > > > > context_switch(prev, next) > > > STORE next->on_cpu = 1 > > > ... [*] > > > > > > LOAD current->tick_dep_mask > > > > > > > That load is in tick_nohz_task_switch() right? (which BTW is placed > > completely wrong) You could easily do something like the below I > > suppose. 
> > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index 81632cd5e3b7..2a5fafe66bb0 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void) > > ts = this_cpu_ptr(&tick_cpu_sched); > > > > if (ts->tick_stopped) { > > + /* > > + * tick_set_dep() (this) > > + * > > + * STORE p->tick_dep_mask STORE p->on_cpu > > + * smp_mb() smp_mb() > > + * LOAD p->on_cpu LOAD p->tick_dep_mask > > + */ > > + smp_mb(); > > if (atomic_read(¤t->tick_dep_mask) || > > atomic_read(¤t->signal->tick_dep_mask)) > > tick_nohz_full_kick(); > > It would then need to be unconditional (whatever value of ts->tick_stopped). > Assuming the tick isn't stopped, we may well have an interrupt firing right > after schedule() which doesn't see the new value of tick_dep_map. > > Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED > at wake up time, prior to the schedule() full barrier. Of course that doesn't > mean that the task is actually the one running on the CPU but it's a good sign, > considering that we are running in nohz_full mode and it's usually optimized > for single task mode. Unfortunately that would require exporting p->on_rq which is internal to the scheduler, locklessly. (can surely do that if you prefer!) > > Also setting a remote task's tick dependency is only used by posix cpu timer > in case the user has the bad taste to enqueue on a task running in nohz_full > mode. It shouldn't deserve an unconditional full barrier in the schedule path. > > If the target is current, as is used by RCU, I guess we can keep a special > treatment. To answer PeterZ's original question: "So we need to kick the CPU unconditionally, or only when the task is actually running? AFAICT we only care about current->tick_dep_mask." If there is a task sharing signals, executing on a remote CPU, yes that remote CPU should be awakened. 
Could skip the IPI if the remote process is not running, however for the purposes of low latency isolated processes, this optimization is not necessary. So the last posted patchset is good enough for isolated low latency processes. Do you guys want me to do something or can you take the series as it is? > > re tick_nohz_task_switch() being placed wrong, it should probably be > > placed before finish_lock_switch(). Something like so. > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index cf044580683c..5c92c959824f 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) > > vtime_task_switch(prev); > > perf_event_task_sched_in(prev, current); > > finish_task(prev); > > + tick_nohz_task_switch(); > > finish_lock_switch(rq); > > finish_arch_post_lock_switch(); > > kcov_finish_switch(current); > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) > > put_task_struct_rcu_user(prev); > > } > > > > - tick_nohz_task_switch(); > > IIRC, we wanted to keep it outside rq lock because it shouldn't it... ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-20 18:52 ` Marcelo Tosatti @ 2020-10-22 12:53 ` Frederic Weisbecker 0 siblings, 0 replies; 21+ messages in thread From: Frederic Weisbecker @ 2020-10-22 12:53 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, linux-kernel, Nitesh Narayan Lal, Peter Xu On Tue, Oct 20, 2020 at 03:52:45PM -0300, Marcelo Tosatti wrote: > On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > > Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED > > at wake up time, prior to the schedule() full barrier. Of course that doesn't > > mean that the task is actually the one running on the CPU but it's a good sign, > > considering that we are running in nohz_full mode and it's usually optimized > > for single task mode. > > Unfortunately that would require exporting p->on_rq which is internal to > the scheduler, locklessly. > > (can surely do that if you prefer!) May be: bool task_on_rq(struct task_struct *p) ? Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task 2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti 2020-10-08 12:22 ` Peter Zijlstra @ 2020-10-08 14:59 ` Peter Xu 2020-10-08 15:28 ` Peter Zijlstra 2020-10-08 17:43 ` Marcelo Tosatti 1 sibling, 2 replies; 21+ messages in thread From: Peter Xu @ 2020-10-08 14:59 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > +static void tick_nohz_kick_task(struct task_struct *tsk) > +{ > + int cpu = task_cpu(tsk); > + > + /* > + * If the task concurrently migrates to another cpu, > + * we guarantee it sees the new tick dependency upon > + * schedule. > + * > + * > + * set_task_cpu(p, cpu); > + * STORE p->cpu = @cpu > + * __schedule() (switch to task 'p') > + * LOCK rq->lock > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or()) > + * LOAD p->tick_dep_mask LOAD p->cpu > + */ > + > + preempt_disable(); Pure question: is preempt_disable() required here? Same question to tick_nohz_full_kick_all(). > + if (cpu_online(cpu)) > + tick_nohz_full_kick_cpu(cpu); > + preempt_enable(); > +} -- Peter Xu ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 14:59   ` Peter Xu
@ 2020-10-08 15:28     ` Peter Zijlstra
  2020-10-08 19:16       ` Peter Xu
  2020-10-08 19:48       ` Frederic Weisbecker
  1 sibling, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-08 15:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Marcelo Tosatti, linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > +{
> > +	int cpu = task_cpu(tsk);
> > +
> > +	/*
> > +	 * If the task concurrently migrates to another cpu,
> > +	 * we guarantee it sees the new tick dependency upon
> > +	 * schedule.
> > +	 *
> > +	 * set_task_cpu(p, cpu);
> > +	 *   STORE p->cpu = @cpu
> > +	 * __schedule() (switch to task 'p')
> > +	 *   LOCK rq->lock
> > +	 *   smp_mb__after_spin_lock()		STORE p->tick_dep_mask
> > +	 *   tick_nohz_task_switch()		smp_mb() (atomic_fetch_or())
> > +	 *     LOAD p->tick_dep_mask		LOAD p->cpu
> > +	 */
> > +
> > +	preempt_disable();
>
> Pure question: is preempt_disable() required here?  Same question to
> tick_nohz_full_kick_all().

I think it serializes against hotplug.

> > +	if (cpu_online(cpu))
> > +		tick_nohz_full_kick_cpu(cpu);
> > +	preempt_enable();
> > +}
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 15:28     ` Peter Zijlstra
@ 2020-10-08 19:16       ` Peter Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Xu @ 2020-10-08 19:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 05:28:44PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > +{
> > > +	int cpu = task_cpu(tsk);
> > > +
> > > +	/*
> > > +	 * If the task concurrently migrates to another cpu,
> > > +	 * we guarantee it sees the new tick dependency upon
> > > +	 * schedule.
> > > +	 *
> > > +	 * set_task_cpu(p, cpu);
> > > +	 *   STORE p->cpu = @cpu
> > > +	 * __schedule() (switch to task 'p')
> > > +	 *   LOCK rq->lock
> > > +	 *   smp_mb__after_spin_lock()		STORE p->tick_dep_mask
> > > +	 *   tick_nohz_task_switch()		smp_mb() (atomic_fetch_or())
> > > +	 *     LOAD p->tick_dep_mask		LOAD p->cpu
> > > +	 */
> > > +
> > > +	preempt_disable();
> >
> > Pure question: is preempt_disable() required here?  Same question to
> > tick_nohz_full_kick_all().
>
> I think it serializes against hotplug.

Thanks Peter.  So is that a lighter but trickier version of
get_online_cpus(), which is even allowed with a spinlock held?

I noticed that this method was actually mentioned in the old
cpu-hotplug.txt, but it was removed during the conversion to rst:

  ff58fa7f556c ("Documentation: Update CPU hotplug and move it to core-api")

Not sure whether that was intended, just raising it.  If the method is
still valid, maybe it's still worth documenting somewhere.

-- 
Peter Xu
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 15:28     ` Peter Zijlstra
  2020-10-08 19:16       ` Peter Xu
@ 2020-10-08 19:48       ` Frederic Weisbecker
  1 sibling, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-10-08 19:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Peter Xu, Marcelo Tosatti, linux-kernel, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 05:28:44PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > +{
> > > +	int cpu = task_cpu(tsk);
> > > +
> > > +	/*
> > > +	 * If the task concurrently migrates to another cpu,
> > > +	 * we guarantee it sees the new tick dependency upon
> > > +	 * schedule.
> > > +	 *
> > > +	 * set_task_cpu(p, cpu);
> > > +	 *   STORE p->cpu = @cpu
> > > +	 * __schedule() (switch to task 'p')
> > > +	 *   LOCK rq->lock
> > > +	 *   smp_mb__after_spin_lock()		STORE p->tick_dep_mask
> > > +	 *   tick_nohz_task_switch()		smp_mb() (atomic_fetch_or())
> > > +	 *     LOAD p->tick_dep_mask		LOAD p->cpu
> > > +	 */
> > > +
> > > +	preempt_disable();
> >
> > Pure question: is preempt_disable() required here?  Same question to
> > tick_nohz_full_kick_all().
>
> I think it serializes against hotplug.

Exactly!

Thanks.
* Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
  2020-10-08 14:59   ` Peter Xu
  2020-10-08 15:28     ` Peter Zijlstra
@ 2020-10-08 17:43     ` Marcelo Tosatti
  1 sibling, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 17:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal

On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote:
> On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > +{
> > +	int cpu = task_cpu(tsk);
> > +
> > +	/*
> > +	 * If the task concurrently migrates to another cpu,
> > +	 * we guarantee it sees the new tick dependency upon
> > +	 * schedule.
> > +	 *
> > +	 * set_task_cpu(p, cpu);
> > +	 *   STORE p->cpu = @cpu
> > +	 * __schedule() (switch to task 'p')
> > +	 *   LOCK rq->lock
> > +	 *   smp_mb__after_spin_lock()		STORE p->tick_dep_mask
> > +	 *   tick_nohz_task_switch()		smp_mb() (atomic_fetch_or())
> > +	 *     LOAD p->tick_dep_mask		LOAD p->cpu
> > +	 */
> > +
> > +	preempt_disable();
>
> Pure question: is preempt_disable() required here?  Same question to
> tick_nohz_full_kick_all().

Hi Peter,

Don't see why: irq_queue_work_on() disables preemption if necessary.

> > +	if (cpu_online(cpu))
> > +		tick_nohz_full_kick_cpu(cpu);
> > +	preempt_enable();
> > +}
>
> -- 
> Peter Xu
* [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-07 18:01 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2) Marcelo Tosatti
  2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
@ 2020-10-07 18:01 ` Marcelo Tosatti
  2020-10-08 12:35   ` Peter Zijlstra
  1 sibling, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-07 18:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal, Peter Xu,
	Marcelo Tosatti

Rather than waking up all nohz_full CPUs on the system, only wakeup
the target CPUs of member threads of the signal.

Reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -398,7 +398,15 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
  */
 void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
 {
-	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
+	int prev;
+
+	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
+	if (!prev) {
+		rcu_read_lock();
+		for_each_thread(sig, t)
+			tick_nohz_kick_task(t);
+		rcu_read_unlock();
+	}
 }
 
 void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)
* Re: [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
@ 2020-10-08 12:35   ` Peter Zijlstra
  2020-10-08 18:04     ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-08 12:35 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu,
	Oleg Nesterov

On Wed, Oct 07, 2020 at 03:01:53PM -0300, Marcelo Tosatti wrote:
> Rather than waking up all nohz_full CPUs on the system, only wakeup
> the target CPUs of member threads of the signal.
>
> Reduces interruptions to nohz_full CPUs.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: linux-2.6/kernel/time/tick-sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/tick-sched.c
> +++ linux-2.6/kernel/time/tick-sched.c
> @@ -398,7 +398,15 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
>   */
>  void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
>  {
> -	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
> +	int prev;
> +
> +	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
> +	if (!prev) {
> +		rcu_read_lock();
> +		for_each_thread(sig, t)
> +			tick_nohz_kick_task(t);
> +		rcu_read_unlock();
> +	}
>  }

AFAICT, and this makes perfect sense, this function is only ever used
while holding sighand->siglock, which makes the RCU read lock
superfluous.

Would it make sense to change the signal_struct argument to task_struct,
such that we can write:

	lockdep_assert_held(&p->sighand->siglock);
	for_each_thread(p->signal, t)
		tick_nohz_kick_task(t);

?
* Re: [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-08 12:35   ` Peter Zijlstra
@ 2020-10-08 18:04     ` Marcelo Tosatti
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 18:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Frederic Weisbecker, Nitesh Narayan Lal, Peter Xu,
	Oleg Nesterov

On Thu, Oct 08, 2020 at 02:35:44PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 03:01:53PM -0300, Marcelo Tosatti wrote:
> > Rather than waking up all nohz_full CPUs on the system, only wakeup
> > the target CPUs of member threads of the signal.
> >
> > Reduces interruptions to nohz_full CPUs.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >
> > Index: linux-2.6/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -398,7 +398,15 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
> >   */
> >  void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
> >  {
> > -	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
> > +	int prev;
> > +
> > +	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
> > +	if (!prev) {
> > +		rcu_read_lock();
> > +		for_each_thread(sig, t)
> > +			tick_nohz_kick_task(t);
> > +		rcu_read_unlock();
> > +	}
> >  }
>
> AFAICT, and this makes perfect sense, this function is only ever used
> while holding sighand->siglock, which makes the RCU read lock
> superfluous.
>
> Would it make sense to change the signal_struct argument to task_struct,
> such that we can write:
>
> 	lockdep_assert_held(&p->sighand->siglock);
> 	for_each_thread(p->signal, t)
> 		tick_nohz_kick_task(t);
>
> ?

Makes sense, resending -v3.
* [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v3)
@ 2020-10-08 19:11 Marcelo Tosatti
  2020-10-08 19:11 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 19:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal, Peter Xu

When enabling per-CPU posix timers, an IPI to nohz_full CPUs might be
performed (to re-read the dependencies and possibly not re-enter
nohz_full on a given CPU).

A common case is for applications that run on nohz_full= CPUs to not
use POSIX timers (eg DPDK).

This patch changes the notification to only IPI the target CPUs where
the task(s) whose tick dependencies are being updated are executing.

This reduces interruptions to nohz_full= CPUs.

---

v3: replace superfluous rcu_read_lock with lockdep_assert (PeterZ)
* [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks
  2020-10-08 19:11 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v3) Marcelo Tosatti
@ 2020-10-08 19:11 ` Marcelo Tosatti
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2020-10-08 19:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Peter Zijlstra, Nitesh Narayan Lal, Peter Xu,
	Marcelo Tosatti

Rather than waking up all nohz_full CPUs on the system, only wakeup
the target CPUs of member threads of the signal.

Reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -396,9 +396,17 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_ta
  * Set a per-taskgroup tick dependency. Posix CPU timers need this in order to elapse
  * per process timers.
  */
-void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
+void tick_nohz_dep_set_signal(struct task_struct *tsk, enum tick_dep_bits bit)
 {
-	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
+	int prev;
+	struct task_struct *p;
+
+	prev = atomic_fetch_or(BIT(bit), &tsk->signal->tick_dep_mask);
+	if (!prev) {
+		lockdep_assert_held(&tsk->sighand->siglock);
+		for_each_thread(tsk, p)
+			tick_nohz_kick_task(p);
+	}
 }
 
 void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)
Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -207,7 +207,7 @@ extern void tick_nohz_dep_set_task(struc
 				   enum tick_dep_bits bit);
 extern void tick_nohz_dep_clear_task(struct task_struct *tsk,
 				     enum tick_dep_bits bit);
-extern void tick_nohz_dep_set_signal(struct signal_struct *signal,
+extern void tick_nohz_dep_set_signal(struct task_struct *tsk,
 				     enum tick_dep_bits bit);
 extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
 				       enum tick_dep_bits bit);
@@ -252,11 +252,11 @@ static inline void tick_dep_clear_task(s
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_clear_task(tsk, bit);
 }
-static inline void tick_dep_set_signal(struct signal_struct *signal,
+static inline void tick_dep_set_signal(struct task_struct *tsk,
 				       enum tick_dep_bits bit)
 {
 	if (tick_nohz_full_enabled())
-		tick_nohz_dep_set_signal(signal, bit);
+		tick_nohz_dep_set_signal(tsk, bit);
 }
 static inline void tick_dep_clear_signal(struct signal_struct *signal,
 					 enum tick_dep_bits bit)
@@ -284,7 +284,7 @@ static inline void tick_dep_set_task(str
 				     enum tick_dep_bits bit) { }
 static inline void tick_dep_clear_task(struct task_struct *tsk,
 				       enum tick_dep_bits bit) { }
-static inline void tick_dep_set_signal(struct signal_struct *signal,
+static inline void tick_dep_set_signal(struct task_struct *tsk,
 				       enum tick_dep_bits bit) { }
 static inline void tick_dep_clear_signal(struct signal_struct *signal,
 					 enum tick_dep_bits bit) { }
Index: linux-2.6/kernel/time/posix-cpu-timers.c
===================================================================
--- linux-2.6.orig/kernel/time/posix-cpu-timers.c
+++ linux-2.6/kernel/time/posix-cpu-timers.c
@@ -523,7 +523,7 @@ static void arm_timer(struct k_itimer *t
 	if (CPUCLOCK_PERTHREAD(timer->it_clock))
 		tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
 	else
-		tick_dep_set_signal(p->signal, TICK_DEP_BIT_POSIX_TIMER);
+		tick_dep_set_signal(p, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 /*
@@ -1358,7 +1358,7 @@ void set_process_cpu_timer(struct task_s
 	if (*newval < *nextevt)
 		*nextevt = *newval;
 
-	tick_dep_set_signal(tsk->signal, TICK_DEP_BIT_POSIX_TIMER);
+	tick_dep_set_signal(tsk, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
end of thread, other threads:[~2020-10-26 14:42 UTC | newest]

Thread overview: 21+ messages
2020-10-07 18:01 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v2) Marcelo Tosatti
2020-10-07 18:01 ` [patch 1/2] nohz: only wakeup a single target cpu when kicking a task Marcelo Tosatti
2020-10-08 12:22   ` Peter Zijlstra
2020-10-08 17:54     ` Marcelo Tosatti
2020-10-08 19:54       ` Frederic Weisbecker
2020-10-13 17:13         ` Marcelo Tosatti
2020-10-14  8:33           ` Peter Zijlstra
2020-10-14 23:40           ` Frederic Weisbecker
2020-10-15 10:12             ` Peter Zijlstra
2020-10-26 14:42               ` Frederic Weisbecker
2020-10-20 18:52           ` Marcelo Tosatti
2020-10-22 12:53             ` Frederic Weisbecker
2020-10-08 14:59   ` Peter Xu
2020-10-08 15:28     ` Peter Zijlstra
2020-10-08 19:16       ` Peter Xu
2020-10-08 19:48       ` Frederic Weisbecker
2020-10-08 17:43     ` Marcelo Tosatti
2020-10-07 18:01 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti
2020-10-08 12:35   ` Peter Zijlstra
2020-10-08 18:04     ` Marcelo Tosatti
2020-10-08 19:11 [patch 0/2] nohz_full: only wakeup target CPUs when notifying new tick dependency (v3) Marcelo Tosatti
2020-10-08 19:11 ` [patch 2/2] nohz: change signal tick dependency to wakeup CPUs of member tasks Marcelo Tosatti