linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-01-30  3:36 riel
  2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: riel @ 2016-01-30  3:36 UTC
  To: linux-kernel; +Cc: tglx, mingo, luto, fweisbec, peterz, clark

Running with nohz_full introduces a fair amount of overhead.
Specifically, various things that are usually done from the
timer interrupt are now done at syscall, irq, and guest
entry and exit times.

However, some of the code that is called every single time
has only ever worked at jiffy resolution. The code in
__acct_update_integrals was also doing some unnecessary
calculations.

Getting rid of the unnecessary calculations, without
changing any of the functionality in __acct_update_integrals,
gets us about an 11% win.

Not calling the time statistics updating code more than
once per jiffy, as is already done on housekeeping CPUs and on
all the CPUs of a non-nohz_full system, shaves off a
further 30%.

I tested this series with a microbenchmark calling
an invalid syscall number ten million times in a row,
on a nohz_full cpu.

    Run times for the microbenchmark:
    
4.4				3.8 seconds
4.5-rc1				3.7 seconds
4.5-rc1 + first patch		3.3 seconds
4.5-rc1 + first 3 patches	3.1 seconds
4.5-rc1 + all patches		2.3 seconds

* [PATCH 0/4 v3] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-02-01  2:12 riel
  2016-02-01  2:12 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
  0 siblings, 1 reply; 21+ messages in thread
From: riel @ 2016-02-01  2:12 UTC
  To: linux-kernel; +Cc: fweisbec, tglx, mingo, luto, peterz, clark

(v3: address comments raised by Frederic)

Running with nohz_full introduces a fair amount of overhead.
Specifically, various things that are usually done from the
timer interrupt are now done at syscall, irq, and guest
entry and exit times.

However, some of the code that is called every single time
has only ever worked at jiffy resolution. The code in
__acct_update_integrals was also doing some unnecessary
calculations.

Getting rid of the unnecessary calculations, without
changing any of the functionality in __acct_update_integrals,
gets us about an 11% win.

Not calling the time statistics updating code more than
once per jiffy, as is already done on housekeeping CPUs and on
all the CPUs of a non-nohz_full system, shaves off a
further 30%.

I tested this series with a microbenchmark calling
an invalid syscall number ten million times in a row,
on a nohz_full cpu.

    Run times for the microbenchmark:
    
4.4				3.8 seconds
4.5-rc1				3.7 seconds
4.5-rc1 + first patch		3.3 seconds
4.5-rc1 + first 3 patches	3.1 seconds
4.5-rc1 + all patches		2.3 seconds

* [PATCH 0/4 v4] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-02-01 19:21 riel
  2016-02-01 19:21 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
  0 siblings, 1 reply; 21+ messages in thread
From: riel @ 2016-02-01 19:21 UTC
  To: linux-kernel; +Cc: tglx, peterz, fweisbec, clark, luto, mingo

(v4: address comments by Peter and Frederic)

Running with nohz_full introduces a fair amount of overhead.
Specifically, various things that are usually done from the
timer interrupt are now done at syscall, irq, and guest
entry and exit times.

However, some of the code that is called every single time
has only ever worked at jiffy resolution. The code in
__acct_update_integrals was also doing some unnecessary
calculations.

Getting rid of the unnecessary calculations, without
changing any of the functionality in __acct_update_integrals,
gets us about an 11% win.

Not calling the time statistics updating code more than
once per jiffy, as is already done on housekeeping CPUs and on
all the CPUs of a non-nohz_full system, shaves off a
further 30%.

I tested this series with a microbenchmark calling
an invalid syscall number ten million times in a row,
on a nohz_full cpu.

    Run times for the microbenchmark:
    
4.4				3.8 seconds
4.5-rc1				3.7 seconds
4.5-rc1 + first patch		3.3 seconds
4.5-rc1 + first 3 patches	3.1 seconds
4.5-rc1 + all patches		2.3 seconds

* [PATCH 0/4 v5] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-02-02 17:19 riel
  2016-02-02 17:19 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
  0 siblings, 1 reply; 21+ messages in thread
From: riel @ 2016-02-02 17:19 UTC
  To: linux-kernel; +Cc: fweisbec, tglx, mingo, luto, peterz, clark, eric.dumazet

(v5: address comments by Frederic & Peter, fix bug found by Eric)

Running with nohz_full introduces a fair amount of overhead.
Specifically, various things that are usually done from the
timer interrupt are now done at syscall, irq, and guest
entry and exit times.

However, some of the code that is called every single time
has only ever worked at jiffy resolution. The code in
__acct_update_integrals was also doing some unnecessary
calculations.

Getting rid of the unnecessary calculations, without
changing any of the functionality in __acct_update_integrals,
gets us about an 11% win.

Not calling the time statistics updating code more than
once per jiffy, as is already done on housekeeping CPUs and on
all the CPUs of a non-nohz_full system, shaves off a
further 30%.

I tested this series with a microbenchmark calling
an invalid syscall number ten million times in a row,
on a nohz_full cpu.

    Run times for the microbenchmark:
    
4.4				3.8 seconds
4.5-rc1				3.7 seconds
4.5-rc1 + first patch		3.3 seconds
4.5-rc1 + first 3 patches	3.1 seconds
4.5-rc1 + all patches		2.3 seconds

   Same test on a non-NOHZ_FULL, non-housekeeping CPU:
all kernels			1.86 seconds

* [PATCH 0/4 v6] sched,time: reduce nohz_full syscall overhead 40%
@ 2016-02-11  1:08 riel
  2016-02-11  1:08 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
  0 siblings, 1 reply; 21+ messages in thread
From: riel @ 2016-02-11  1:08 UTC
  To: linux-kernel; +Cc: fweisbec, tglx, mingo, luto, peterz, clark, eric.dumazet

(v6: make VIRT_CPU_ACCOUNTING_GEN jiffy granularity)

Running with nohz_full introduces a fair amount of overhead.
Specifically, various things that are usually done from the
timer interrupt are now done at syscall, irq, and guest
entry and exit times.

However, some of the code that is called every single time
has only ever worked at jiffy resolution. The code in
__acct_update_integrals was also doing some unnecessary
calculations.

Getting rid of the unnecessary calculations, without
changing any of the functionality in __acct_update_integrals,
gets us about an 11% win.

Not calling the time statistics updating code more than
once per jiffy, as is already done on housekeeping CPUs and on
all the CPUs of a non-nohz_full system, shaves off a
further 30%.

I tested this series with a microbenchmark calling
an invalid syscall number ten million times in a row,
on a nohz_full cpu.

    Run times for the microbenchmark:
    
4.4                             3.8 seconds
4.5-rc1                         3.7 seconds
4.5-rc1 + first patch           3.3 seconds
4.5-rc1 + first 3 patches       3.1 seconds
4.5-rc1 + all patches           2.3 seconds

   Same test on a non-NOHZ_FULL, non-housekeeping CPU:
all kernels                     1.86 seconds


Thread overview: 21+ messages
2016-01-30  3:36 [PATCH 0/2] sched,time: reduce nohz_full syscall overhead 40% riel
2016-01-30  3:36 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
2016-01-30  4:56   ` kbuild test robot
2016-01-30 14:44   ` Frederic Weisbecker
2016-01-30 17:53     ` Rik van Riel
2016-02-01 11:30       ` Peter Zijlstra
2016-01-30  3:36 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-01-30 16:15   ` Frederic Weisbecker
2016-01-30  3:36 ` [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals riel
2016-01-30 16:24   ` Frederic Weisbecker
2016-01-30  3:36 ` [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy riel
2016-02-01  2:12 [PATCH 0/4 v3] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-01  2:12 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
2016-02-01  4:46   ` kbuild test robot
2016-02-01  8:37   ` Thomas Gleixner
2016-02-01  9:22     ` Peter Zijlstra
2016-02-01  9:31       ` Thomas Gleixner
2016-02-01 13:44       ` Rik van Riel
2016-02-01 13:51         ` Peter Zijlstra
2016-02-01 19:21 [PATCH 0/4 v4] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-01 19:21 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
2016-02-02 17:19 [PATCH 0/4 v5] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-02 17:19 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
2016-02-11  1:08 [PATCH 0/4 v6] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-11  1:08 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel