* [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-01-29 22:22 riel
  2016-01-29 22:22 ` [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals riel
  2016-01-29 22:23 ` [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy riel
  0 siblings, 2 replies; 12+ messages in thread
From: riel @ 2016-01-29 22:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, peterz, luto, fweisbec

Running with nohz_full introduces a fair amount of overhead. Specifically,
various things that are usually done from the timer interrupt are now
done at syscall, irq, and guest entry and exit times. However, some of
the code that is called every single time has only ever worked at jiffy
resolution. The code in __acct_update_integrals was also doing some
unnecessary calculations.

Getting rid of the unnecessary calculations, without changing any of the
functionality in __acct_update_integrals gets us about a 10% win.

Not calling __acct_update_integrals and related code unless jiffies
changed (__acct_update_integrals does not do anything with smaller time
intervals anyway) shaves off a further 30%.

Run times for the microbenchmark:

4.4                     3.8 seconds
4.5-rc1                 3.7 seconds
4.5-rc1 + first patch   3.3 seconds
4.5-rc1 + both patches  2.3 seconds

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals
  2016-01-29 22:22 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
@ 2016-01-29 22:22 ` riel
  2016-01-29 23:10   ` Peter Zijlstra
  2016-01-29 22:23 ` [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy riel
  1 sibling, 1 reply; 12+ messages in thread
From: riel @ 2016-01-29 22:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, peterz, luto, fweisbec

From: Rik van Riel <riel@redhat.com>

When running a microbenchmark calling an invalid syscall number
in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
time in __acct_update_integrals.

This function converts cputime_t to jiffies, to a timeval, only to
convert the timeval back to microseconds before discarding it.

This patch leaves __acct_update_integrals functionally equivalent,
but speeds things up by about 11%, with 10 million calls to an
invalid syscall number dropping from 3.7 to 3.3 seconds.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/tsacct.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 975cb49e32bf..afb5cf8ecc5f 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -125,22 +125,22 @@ static void __acct_update_integrals(struct task_struct *tsk,
 {
 	if (likely(tsk->mm)) {
 		cputime_t time, dtime;
-		struct timeval value;
 		unsigned long flags;
-		u64 delta;
+		u64 delta, usecs;
 
 		local_irq_save(flags);
 		time = stime + utime;
 		dtime = time - tsk->acct_timexpd;
-		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
-		delta = value.tv_sec;
-		delta = delta * USEC_PER_SEC + value.tv_usec;
+		delta = cputime_to_jiffies(dtime);
 
 		if (delta == 0)
 			goto out;
+
+		usecs = jiffies_to_usecs(delta);
+
 		tsk->acct_timexpd = time;
-		tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm);
-		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm;
+		tsk->acct_rss_mem1 += usecs * get_mm_rss(tsk->mm);
+		tsk->acct_vm_mem1 += usecs * tsk->mm->total_vm;
 out:
 		local_irq_restore(flags);
 	}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread
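The round trip the patch removes can be modeled in userspace. Assuming HZ=1000 (so that jiffies_to_usecs() reduces to a multiply by 1000), the old path's detour through a timeval costs a divide and a modulo only to produce the very same number the new path gets with a single multiply:

```c
#define MODEL_HZ 1000UL			/* assumed tick rate */
#define MODEL_USEC_PER_SEC 1000000UL

/* Old path: jiffies -> timeval (one divide, one modulo), then the
 * timeval back to microseconds (a multiply and an add). */
static unsigned long long usecs_via_timeval(unsigned long jiffies_delta)
{
	unsigned long tv_sec = jiffies_delta / MODEL_HZ;
	unsigned long tv_usec = (jiffies_delta % MODEL_HZ) *
				(MODEL_USEC_PER_SEC / MODEL_HZ);

	return (unsigned long long)tv_sec * MODEL_USEC_PER_SEC + tv_usec;
}

/* New path: jiffies -> microseconds directly, one multiply. */
static unsigned long long usecs_direct(unsigned long jiffies_delta)
{
	return (unsigned long long)jiffies_delta *
	       (MODEL_USEC_PER_SEC / MODEL_HZ);
}
```

Both functions agree for every input, which is why the patch can claim functional equivalence while dropping the divides from a path that runs on every syscall, irq, and guest transition.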
* Re: [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals
  2016-01-29 22:22 ` [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals riel
@ 2016-01-29 23:10   ` Peter Zijlstra
  2016-01-30  3:36     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-01-29 23:10 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, tglx, mingo, luto, fweisbec

On Fri, Jan 29, 2016 at 05:22:59PM -0500, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
>
> When running a microbenchmark calling an invalid syscall number
> in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
> time in __acct_update_integrals.
>
> This function converts cputime_t to jiffies, to a timeval, only to
> convert the timeval back to microseconds before discarding it.
>
> This patch leaves __acct_update_integrals functionally equivalent,
> but speeds things up by about 11%, with 10 million calls to an
> invalid syscall number dropping from 3.7 to 3.3 seconds.

WTH is this taskstat crap anyway? Who uses it and can't we kill it?

There seems to be an endless reserve of accounting crap.

> +++ b/kernel/tsacct.c
> @@ -125,22 +125,22 @@ static void __acct_update_integrals(struct task_struct *tsk,
>  {
>  	if (likely(tsk->mm)) {
>  		cputime_t time, dtime;
>  		unsigned long flags;
> +		u64 delta, usecs;
>  
>  		local_irq_save(flags);
>  		time = stime + utime;
>  		dtime = time - tsk->acct_timexpd;
> +		delta = cputime_to_jiffies(dtime);
>  
>  		if (delta == 0)
>  			goto out;
> +
> +		usecs = jiffies_to_usecs(delta);
> +
>  		tsk->acct_timexpd = time;
> +		tsk->acct_rss_mem1 += usecs * get_mm_rss(tsk->mm);
> +		tsk->acct_vm_mem1 += usecs * tsk->mm->total_vm;
>  out:
>  		local_irq_restore(flags);
>  	}

Hurch, what horrible code. Can't you at least drop the pointless indent
level while you're there anyway?

Also, it looks like the callpath through acct_account_cputime() should
already guarantee IRQs are disabled, so if you move the irq_save /
irq_restore business into acct_update_integrals() you can remove them
too from the fast path.

And I think you can kill that last divide too, if you do something like:

	nsecs = cputime_to_nsecs(dtime);
	if (nsecs < TICK_NSEC)
		goto out;

	usecs = nsecs >> 10;

	...

And if people really care, you can correct the 1024 != 1000 thing in
xacct_add_tsk() which seems to be the consumer side and thus should be
exceedingly rare.

^ permalink raw reply	[flat|nested] 12+ messages in thread
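The `nsecs >> 10` in Peter's sketch trades a divide by 1000 for a shift, i.e. a divide by 1024, so the accumulated "microseconds" under-count by a constant factor of 1000/1024 (about 2.4%) until the rarely-run consumer side corrects for it. A tiny userspace model of that trade-off (an illustration, not kernel code):

```c
/* Exact nanoseconds -> microseconds: one integer divide per call. */
static unsigned long long ns_to_us_exact(unsigned long long nsecs)
{
	return nsecs / 1000;
}

/* Divide-free approximation: a right shift by 10 divides by 1024
 * instead of 1000, so the result is low by ~2.4%. The consumer side
 * (xacct_add_tsk() in this thread) could scale by 1024/1000 once,
 * instead of paying a divide on every accounting event. */
static unsigned long long ns_to_us_shift(unsigned long long nsecs)
{
	return nsecs >> 10;
}
```

The shift is attractive precisely because the fast path runs on every syscall and irq, while the correction only needs to happen when taskstats are actually read out.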
* Re: [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals
  2016-01-29 23:10 ` Peter Zijlstra
@ 2016-01-30  3:36   ` Frederic Weisbecker
  2016-01-30  3:47     ` Rik van Riel
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-01-30  3:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: riel, linux-kernel, tglx, mingo, luto

On Sat, Jan 30, 2016 at 12:10:18AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 29, 2016 at 05:22:59PM -0500, riel@redhat.com wrote:
> > From: Rik van Riel <riel@redhat.com>
> >
> > When running a microbenchmark calling an invalid syscall number
> > in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
> > time in __acct_update_integrals.
> >
> > This function converts cputime_t to jiffies, to a timeval, only to
> > convert the timeval back to microseconds before discarding it.
> >
> > This patch leaves __acct_update_integrals functionally equivalent,
> > but speeds things up by about 11%, with 10 million calls to an
> > invalid syscall number dropping from 3.7 to 3.3 seconds.
>
> WTH is this taskstat crap anyway? Who uses it and can't we kill it?

I have no idea what it's used for; it seems to be related to taskstats
over netlink. I'm not even sure if it's actually used. There doesn't
seem to be a runtime off-switch, and I bet distros enable it. So that
stuff does some work every millisecond on millions of machines while it
probably has very few users.

SGI introduced it in 2006 and it seems that their last contribution
there is in 2008. The rest is kernel maintenance and fixes.

If there are still users of it, then at least we should disable it at
runtime by default.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals
  2016-01-30  3:36 ` Frederic Weisbecker
@ 2016-01-30  3:47   ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2016-01-30  3:47 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra; +Cc: linux-kernel, tglx, mingo, luto

On 01/29/2016 10:36 PM, Frederic Weisbecker wrote:
> On Sat, Jan 30, 2016 at 12:10:18AM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 29, 2016 at 05:22:59PM -0500, riel@redhat.com wrote:
>>> From: Rik van Riel <riel@redhat.com>
>>>
>>> When running a microbenchmark calling an invalid syscall number
>>> in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
>>> time in __acct_update_integrals.
>>>
>>> This function converts cputime_t to jiffies, to a timeval, only to
>>> convert the timeval back to microseconds before discarding it.
>>>
>>> This patch leaves __acct_update_integrals functionally equivalent,
>>> but speeds things up by about 11%, with 10 million calls to an
>>> invalid syscall number dropping from 3.7 to 3.3 seconds.
>>
>> WTH is this taskstat crap anyway? Who uses it and can't we kill it?
>
> I have no idea what it's used for, it seems to be related to taskstats
> over netlink. I'm not even sure if it's actually used. There don't seem
> to be a runtime offcase and I bet distros enable it. So that stuff does
> some work every millisecond on millions of machines while it probably
> has very few users.
>
> SGI introduced it in 2006 and it seems that their last contribution there is
> in 2008. The rest is kernel maintainance and fixes.
>
> If there are still users of it, then at least we should disable it on runtime by
> default.

With all the non-power-of-2 divides removed, __acct_update_integrals
disappears from the profile, even running many times per millisecond.

At that point, native_sched_clock takes over the top of the profile,
and the only way I can think of getting rid of that one is making sure
it is not called twice for every syscall, irq, and kvm guest entry/exit.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 12+ messages in thread
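The "called twice" cost Rik describes can be sketched in userspace. This is only an illustration of the idea (no such patch appears in this thread): when two accounting hooks fire back to back on the same entry/exit path, reading the clock once and passing the value down halves the number of clock-source reads.

```c
static unsigned long clock_reads;	/* counts reads of the stand-in clock */

/* Monotonic stand-in for a relatively costly clock source such as
 * native_sched_clock(). */
static unsigned long long fake_sched_clock(void)
{
	clock_reads++;
	return (unsigned long long)clock_reads * 1000;
}

/* Naive transition: the user-time hook and the system-time hook each
 * read the clock themselves, so every kernel entry/exit costs two reads. */
static void transition_naive(void)
{
	unsigned long long user_now = fake_sched_clock();
	unsigned long long sys_now = fake_sched_clock();
	(void)user_now; (void)sys_now;
}

/* Cached transition: read the clock once, let both hooks consume the
 * same timestamp. */
static void transition_cached(void)
{
	unsigned long long now = fake_sched_clock();
	unsigned long long user_now = now, sys_now = now;
	(void)user_now; (void)sys_now;
}
```

The accounting result is identical either way (both hooks want "the time of this transition"), which is what makes sharing the single read safe in principle.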
* [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-29 22:22 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
  2016-01-29 22:22 ` [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals riel
@ 2016-01-29 22:23 ` riel
  2016-01-29 22:43   ` Rik van Riel
  1 sibling, 1 reply; 12+ messages in thread
From: riel @ 2016-01-29 22:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, peterz, luto, fweisbec

From: Rik van Riel <riel@redhat.com>

Because __acct_update_integrals does nothing unless the time interval
in question exceeds a jiffy, there is no real reason to call it more
than once a jiffy from the syscall, irq, and guest entry & exit paths.

If tasks get rescheduled frequently, the scheduler will still update
their time statistics normally. This patch only impacts longer running
tasks.

This speeds up

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/sched.h  |  1 +
 kernel/sched/cputime.c | 35 +++++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a94cc3..019c3af98503 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1532,6 +1532,7 @@ struct task_struct {
 	struct prev_cputime prev_cputime;
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 	seqcount_t vtime_seqcount;
+	unsigned long vtime_jiffies;
 	unsigned long long vtime_snap;
 	enum {
 		/* Task is sleeping or running in a CPU with VTIME inactive */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index b2ab2ffb1adc..923c110319b1 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -668,6 +668,15 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static bool vtime_jiffies_changed(struct task_struct *tsk, unsigned long now)
+{
+	if (tsk->vtime_jiffies == jiffies)
+		return false;
+
+	tsk->vtime_jiffies = jiffies;
+	return true;
+}
+
 static unsigned long long vtime_delta(struct task_struct *tsk)
 {
 	unsigned long long clock;
@@ -699,6 +708,9 @@ static void __vtime_account_system(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
+	if (!vtime_jiffies_changed(tsk, jiffies))
+		return;
+
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	__vtime_account_system(tsk);
 	write_seqcount_end(&tsk->vtime_seqcount);
@@ -707,7 +719,8 @@ void vtime_account_system(struct task_struct *tsk)
 void vtime_gen_account_irq_exit(struct task_struct *tsk)
 {
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_jiffies_changed(tsk, jiffies))
+		__vtime_account_system(tsk);
 	if (context_tracking_in_user())
 		tsk->vtime_snap_whence = VTIME_USER;
 	write_seqcount_end(&tsk->vtime_seqcount);
@@ -718,16 +731,19 @@ void vtime_account_user(struct task_struct *tsk)
 	cputime_t delta_cpu;
 
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	delta_cpu = get_vtime_delta(tsk);
 	tsk->vtime_snap_whence = VTIME_SYS;
-	account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+	if (vtime_jiffies_changed(tsk, jiffies)) {
+		delta_cpu = get_vtime_delta(tsk);
+		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+	}
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
 
 void vtime_user_enter(struct task_struct *tsk)
 {
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_jiffies_changed(tsk, jiffies))
+		__vtime_account_system(tsk);
 	tsk->vtime_snap_whence = VTIME_USER;
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
@@ -742,7 +758,8 @@ void vtime_guest_enter(struct task_struct *tsk)
 	 * that can thus safely catch up with a tickless delta.
 	 */
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_jiffies_changed(tsk, jiffies))
+		__vtime_account_system(tsk);
 	current->flags |= PF_VCPU;
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
@@ -759,8 +776,12 @@ EXPORT_SYMBOL_GPL(vtime_guest_exit);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta(tsk);
+	cputime_t delta_cpu;
+
+	if (!vtime_jiffies_changed(tsk, jiffies))
+		return;
 
+	delta_cpu = get_vtime_delta(tsk);
 	account_idle_time(delta_cpu);
 }
 
@@ -773,6 +794,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 	write_seqcount_begin(&current->vtime_seqcount);
 	current->vtime_snap_whence = VTIME_SYS;
 	current->vtime_snap = sched_clock_cpu(smp_processor_id());
+	current->vtime_jiffies = jiffies;
 	write_seqcount_end(&current->vtime_seqcount);
 }
 
@@ -784,6 +806,7 @@ void vtime_init_idle(struct task_struct *t, int cpu)
 	write_seqcount_begin(&t->vtime_seqcount);
 	t->vtime_snap_whence = VTIME_SYS;
 	t->vtime_snap = sched_clock_cpu(cpu);
+	t->vtime_jiffies = jiffies;
 	write_seqcount_end(&t->vtime_seqcount);
 	local_irq_restore(flags);
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread
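The heart of the patch is a cheap early-out keyed on the jiffies counter. A self-contained userspace model of that pattern (simplified: no seqcount, no task_struct, and the counter is passed in explicitly) shows how an arbitrarily hot call site collapses to one real update per tick:

```c
struct acct_state {
	unsigned long last_jiffy;	/* jiffy at which accounting last ran */
	unsigned long work_done;	/* stands in for the expensive update */
};

/* Return 1 (and remember `now`) only when the jiffy has advanced since
 * the last accounted update; otherwise return 0 so the caller can skip
 * the expensive work entirely. */
static int jiffy_changed(struct acct_state *s, unsigned long now)
{
	if (s->last_jiffy == now)
		return 0;

	s->last_jiffy = now;
	return 1;
}

static void account(struct acct_state *s, unsigned long now)
{
	if (!jiffy_changed(s, now))
		return;

	s->work_done++;	/* the real code updates time statistics here */
}
```

No matter how many syscalls or irqs land within one tick, only the first call per tick pays for the accounting, which is exactly why the paths that fire millions of times per second get so much cheaper while per-jiffy statistics stay correct.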
* Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-29 22:23 ` [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy riel
@ 2016-01-29 22:43   ` Rik van Riel
  2016-01-30 14:20     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2016-01-29 22:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, mingo, peterz, luto, fweisbec, Clark Williams

On 01/29/2016 05:23 PM, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
>
> This speeds up

... ok, that changelog got truncated :(

Here is the full version:

Because __acct_update_integrals does nothing unless the time
interval in question exceeds a jiffy, there is no real reason
to call it more than once a jiffy from the syscall, irq, and
guest entry & exit paths.

If tasks get rescheduled frequently, the scheduler will still
update their time statistics normally.

However, longer running tasks with frequent syscall, irq,
or guest entry & exit see a difference with this patch.

A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patch, for a total speedup of about 40%
over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

4.4                     3.8 seconds
4.5-rc1                 3.7 seconds
4.5-rc1 + first patch   3.3 seconds
4.5-rc1 + both patches  2.3 seconds

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-29 22:43 ` Rik van Riel
@ 2016-01-30 14:20   ` Frederic Weisbecker
  2016-01-30 17:53     ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-01-30 14:20 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, tglx, mingo, peterz, luto, Clark Williams

On Fri, Jan 29, 2016 at 05:43:28PM -0500, Rik van Riel wrote:
> On 01/29/2016 05:23 PM, riel@redhat.com wrote:
> > From: Rik van Riel <riel@redhat.com>
> >
> > This speeds up
>
> ... ok, that changelog got truncated :(
>
> Here is the full version:
>
> Because __acct_update_integrals does nothing unless the time
> interval in question exceeds a jiffy, there is no real reason
> to call it more than once a jiffy from the syscall, irq, and
> guest entry & exit paths.
>
> If tasks get rescheduled frequently, the scheduler will still
> update their time statistics normally.
>
> However, longer running tasks with frequent syscall, irq,
> or guest entry & exit see a difference with this patch.
>
> A microbenchmark calling an invalid syscall number 10 million
> times in a row speeds up an additional 30% over the numbers
> with just the previous patch, for a total speedup of about 40%
> over 4.4 and 4.5-rc1.
>
> Run times for the microbenchmark:
>
> 4.4                     3.8 seconds
> 4.5-rc1                 3.7 seconds
> 4.5-rc1 + first patch   3.3 seconds
> 4.5-rc1 + both patches  2.3 seconds

Very nice improvement!

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-30 14:20 ` Frederic Weisbecker
@ 2016-01-30 17:53   ` Mike Galbraith
  2016-01-30 20:36     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2016-01-30 17:53 UTC (permalink / raw)
  To: Frederic Weisbecker, Rik van Riel
  Cc: linux-kernel, tglx, mingo, peterz, luto, Clark Williams

On Sat, 2016-01-30 at 15:20 +0100, Frederic Weisbecker wrote:
> On Fri, Jan 29, 2016 at 05:43:28PM -0500, Rik van Riel wrote:
> >
> > Run times for the microbenchmark:
> >
> > 4.4                     3.8 seconds
> > 4.5-rc1                 3.7 seconds
> > 4.5-rc1 + first patch   3.3 seconds
> > 4.5-rc1 + both patches  2.3 seconds
>
> Very nice improvement!

Tasty indeed.

When nohz_full CPUs are not isolated, ie are being used as generic
CPUs, get_nohz_timer_target() is a problem with things like tbench.

tbench 8 with Rik's patches applied:
nohz_full=empty
Throughput 3204.69 MB/sec  1.000
nohz_full=1-3,5-7
Throughput 1354.99 MB/sec   .422  1.000
nohz_full=1-3,5-7 + club below
Throughput 2762.22 MB/sec   .861  2.038

With Rik's patches and a club, tbench becomes nearly acceptable.
---
 include/linux/tick.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -184,7 +184,7 @@ static inline const struct cpumask *hous
 static inline bool is_housekeeping_cpu(int cpu)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	if (tick_nohz_full_enabled())
+	if (tick_nohz_full_enabled() && runqueue_is_isolated(cpu))
 		return cpumask_test_cpu(cpu, housekeeping_mask);
 #endif
 	return true;

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-30 17:53 ` Mike Galbraith
@ 2016-01-30 20:36   ` Frederic Weisbecker
  2016-01-31  2:52     ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-01-30 20:36 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Rik van Riel, linux-kernel, tglx, mingo, peterz, luto, Clark Williams

On Sat, Jan 30, 2016 at 06:53:05PM +0100, Mike Galbraith wrote:
> On Sat, 2016-01-30 at 15:20 +0100, Frederic Weisbecker wrote:
> > On Fri, Jan 29, 2016 at 05:43:28PM -0500, Rik van Riel wrote:
> > >
> > > Run times for the microbenchmark:
> > >
> > > 4.4                     3.8 seconds
> > > 4.5-rc1                 3.7 seconds
> > > 4.5-rc1 + first patch   3.3 seconds
> > > 4.5-rc1 + both patches  2.3 seconds
> >
> > Very nice improvement!
>
> Tasty indeed.
>
> When nohz_full CPUs are not isolated, ie are being used as generic
> CPUs, get_nohz_timer_target() is a problem with things like tbench.

So by isolated CPU you mean those part of isolcpus= boot option, right?

> tbench 8 with Rik's patches applied:
> nohz_full=empty
> Throughput 3204.69 MB/sec  1.000
> nohz_full=1-3,5-7
> Throughput 1354.99 MB/sec   .422  1.000
> nohz_full=1-3,5-7 + club below
> Throughput 2762.22 MB/sec   .861  2.038
>
> With Rik's patches and a club, tbench becomes nearly acceptable.
> ---
>  include/linux/tick.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -184,7 +184,7 @@ static inline const struct cpumask *hous
>  static inline bool is_housekeeping_cpu(int cpu)
>  {
>  #ifdef CONFIG_NO_HZ_FULL
> -	if (tick_nohz_full_enabled())
> +	if (tick_nohz_full_enabled() && runqueue_is_isolated(cpu))
>  		return cpumask_test_cpu(cpu, housekeeping_mask);

This makes me confused. How forcing timers to CPUs in isolcpus is making
better results?

>  #endif
>  	return true;

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-30 20:36 ` Frederic Weisbecker
@ 2016-01-31  2:52   ` Mike Galbraith
  2016-01-31  5:37     ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2016-01-31  2:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Rik van Riel, linux-kernel, tglx, mingo, peterz, luto, Clark Williams

On Sat, 2016-01-30 at 21:36 +0100, Frederic Weisbecker wrote:
> On Sat, Jan 30, 2016 at 06:53:05PM +0100, Mike Galbraith wrote:
> > On Sat, 2016-01-30 at 15:20 +0100, Frederic Weisbecker wrote:
> > > On Fri, Jan 29, 2016 at 05:43:28PM -0500, Rik van Riel wrote:
> > > >
> > > > Run times for the microbenchmark:
> > > >
> > > > 4.4                     3.8 seconds
> > > > 4.5-rc1                 3.7 seconds
> > > > 4.5-rc1 + first patch   3.3 seconds
> > > > 4.5-rc1 + both patches  2.3 seconds
> > >
> > > Very nice improvement!
> >
> > Tasty indeed.
> >
> > When nohz_full CPUs are not isolated, ie are being used as generic
> > CPUs, get_nohz_timer_target() is a problem with things like tbench.
>
> So by isolated CPU you mean those part of isolcpus= boot option, right?

Yes, isolated in the scheduler sense, either via isolcpus= or cpusets.
If the CPU is part of a scheduler domain, it is by definition part of
the generic work crew.

> > tbench 8 with Rik's patches applied:
> > nohz_full=empty
> > Throughput 3204.69 MB/sec  1.000
> > nohz_full=1-3,5-7
> > Throughput 1354.99 MB/sec   .422  1.000
> > nohz_full=1-3,5-7 + club below
> > Throughput 2762.22 MB/sec   .861  2.038
> >
> > With Rik's patches and a club, tbench becomes nearly acceptable.
> > ---
> >  include/linux/tick.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -184,7 +184,7 @@ static inline const struct cpumask *hous
> >  static inline bool is_housekeeping_cpu(int cpu)
> >  {
> >  #ifdef CONFIG_NO_HZ_FULL
> > -	if (tick_nohz_full_enabled())
> > +	if (tick_nohz_full_enabled() && runqueue_is_isolated(cpu))
> >  		return cpumask_test_cpu(cpu, housekeeping_mask);
>
> This makes me confused. How forcing timers to CPUs in isolcpus is making
> better results?

It doesn't, it's shutting get_nohz_timer_target() down for those
nohz_full CPUs that are NOT currently isolated, are thus generic CPUs
with the capability to _become_ elite solo artists on demand.

	-Mike

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy
  2016-01-31  2:52 ` Mike Galbraith
@ 2016-01-31  5:37   ` Mike Galbraith
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Galbraith @ 2016-01-31  5:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Rik van Riel, linux-kernel, tglx, mingo, peterz, luto, Clark Williams

On Sun, 2016-01-31 at 03:52 +0100, Mike Galbraith wrote:
> On Sat, 2016-01-30 at 21:36 +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 30, 2016 at 06:53:05PM +0100, Mike Galbraith wrote:
> > > On Sat, 2016-01-30 at 15:20 +0100, Frederic Weisbecker wrote:
> > > > On Fri, Jan 29, 2016 at 05:43:28PM -0500, Rik van Riel wrote:
> > > > >
> > > > > Run times for the microbenchmark:
> > > > >
> > > > > 4.4                     3.8 seconds
> > > > > 4.5-rc1                 3.7 seconds
> > > > > 4.5-rc1 + first patch   3.3 seconds
> > > > > 4.5-rc1 + both patches  2.3 seconds
> > > >
> > > > Very nice improvement!
> > >
> > > Tasty indeed.
> > >
> > > When nohz_full CPUs are not isolated, ie are being used as generic
> > > CPUs, get_nohz_timer_target() is a problem with things like tbench.
> >
> > So by isolated CPU you mean those part of isolcpus= boot option, right?
>
> Yes, isolated in the scheduler sense, either via isolcpus= or cpusets.
> If the CPU is part of a scheduler domain, it is by definition part of
> the generic work crew.
>
> > > tbench 8 with Rik's patches applied:
> > > nohz_full=empty
> > > Throughput 3204.69 MB/sec  1.000
> > > nohz_full=1-3,5-7
> > > Throughput 1354.99 MB/sec   .422  1.000
> > > nohz_full=1-3,5-7 + club below
> > > Throughput 2762.22 MB/sec   .861  2.038
> > >
> > > With Rik's patches and a club, tbench becomes nearly acceptable.
> > > ---
> > >  include/linux/tick.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -184,7 +184,7 @@ static inline const struct cpumask *hous
> > >  static inline bool is_housekeeping_cpu(int cpu)
> > >  {
> > >  #ifdef CONFIG_NO_HZ_FULL
> > > -	if (tick_nohz_full_enabled())
> > > +	if (tick_nohz_full_enabled() && runqueue_is_isolated(cpu))
> > >  		return cpumask_test_cpu(cpu, housekeeping_mask);
> >
> > This makes me confused. How forcing timers to CPUs in isolcpus is making
> > better results?
>
> It doesn't, it's shutting get_nohz_timer_target() down for those
> nohz_full CPUs that are NOT currently isolated, are thus generic CPUs
> with the capability to _become_ elite solo artists on demand.

Damn, I'm gonna have to ask... If an isolated elite task sets the
alarm, why would it want it to go off somewhere else, maybe in the
middle of a death metal concert?  Seems it must induce jitter.  Why is
this a good thing?

	-Mike

^ permalink raw reply	[flat|nested] 12+ messages in thread