* [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
@ 2014-07-17 23:26 Yuyang Du
  2014-07-17 23:26 ` [PATCH 1/2 v4] sched: Remove update_rq_runnable_avg Yuyang Du
From: Yuyang Du @ 2014-07-17 23:26 UTC
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, arjan.van.de.ven, len.brown, rafael.j.wysocki,
	alan.cox, mark.gross, fengguang.wu, Yuyang Du

Thanks to Morten, Ben, and Fengguang.

v4 changes:

- Insert memory barrier before writing cfs_rq->load_last_update_copy.
- Fix typos.

We carried out some performance tests (thanks to Fengguang and his LKP). The results are shown
below. The patchset (two patches) is on top of mainline v3.16-rc3. To make a fair and clear
comparison, the results are presented in two parts:

(1) v3.16-rc3 vs. PATCH 1/2 + 2/2
(2) PATCH 1/2 vs. PATCH 1/2 + 2/2

Overall, this rewrite gives better performance and reduces the net overhead of load average tracking.

--------------------------------------------------------------------------------------

host: lkp-snb01
model: Sandy Bridge-EP
memory: 32G

host: lkp-sb03
model: Sandy Bridge-EP
memory: 64G

host: lkp-nex04
model: Nehalem-EX
memory: 256G

host: xps2
model: Nehalem
memory: 4G

host: lkp-a0x
model: Atom
memory: 8G

Legend:
	[+-]XX% - change percent
	~XX%    - stddev percent

(1) v3.16-rc3         PATCH 1/2 + 2/2
---------------  -------------------------  
      0.03 ~ 0%      +0.0%       0.03 ~ 0%  snb-drag/fileio/600s-100%-1HDD-ext4-64G-1024f-seqwr-sync
     51.72 ~ 1%      +0.5%      51.99 ~ 1%  snb-drag/fileio/600s-100%-1HDD-xfs-64G-1024f-rndrd-sync
     53.24 ~ 0%      +0.9%      53.72 ~ 0%  snb-drag/fileio/600s-100%-1HDD-xfs-64G-1024f-rndrw-sync
      0.01 ~ 0%      +0.0%       0.01 ~ 0%  snb-drag/fileio/600s-100%-1HDD-xfs-64G-1024f-rndwr-sync
      3.27 ~ 0%      -0.1%       3.27 ~ 0%  snb-drag/fileio/600s-100%-1HDD-xfs-64G-1024f-seqrd-sync
      0.02 ~ 0%      +0.0%       0.02 ~ 0%  snb-drag/fileio/600s-100%-1HDD-xfs-64G-1024f-seqrewr-sync
      0.02 ~ 0%      +0.0%       0.02 ~ 0%  snb-drag/fileio/600s-100%-1HDD-xfs-64G-1024f-seqwr-sync
    108.31 ~ 1%      +0.7%     109.06 ~ 0%  TOTAL fileio.request_latency_95%_ms

---------------  -------------------------  
    155810 ~ 3%     +62.6%     253355 ~ 0%  lkp-snb01/hackbench/1600%-process-pipe
    146931 ~ 1%      +5.5%     154948 ~ 0%  lkp-snb01/hackbench/1600%-process-socket
    172780 ~ 1%     +23.0%     212579 ~ 2%  lkp-snb01/hackbench/1600%-threads-pipe
    152966 ~ 0%      +3.6%     158433 ~ 0%  lkp-snb01/hackbench/1600%-threads-socket
     95943 ~ 0%      +2.7%      98501 ~ 0%  lkp-snb01/hackbench/50%-process-pipe
     86759 ~ 0%     +79.4%     155606 ~ 0%  lkp-snb01/hackbench/50%-process-socket
     90232 ~ 0%      +3.3%      93205 ~ 0%  lkp-snb01/hackbench/50%-threads-pipe
     79416 ~ 0%     +85.6%     147379 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
    980841 ~ 1%     +29.9%    1274010 ~ 0%  TOTAL hackbench.throughput

---------------  -------------------------  
  3.02e+08 ~ 5%      -2.5%  2.944e+08 ~ 3%  lkp-a06/qperf/600s
  3.02e+08 ~ 5%      -2.5%  2.944e+08 ~ 3%  TOTAL qperf.sctp.bw

---------------  -------------------------  
 6.578e+08 ~ 1%      +1.1%  6.651e+08 ~ 1%  lkp-a06/qperf/600s
 6.578e+08 ~ 1%      +1.1%  6.651e+08 ~ 1%  TOTAL qperf.tcp.bw

---------------  -------------------------  
 6.678e+08 ~ 0%      +0.7%  6.728e+08 ~ 0%  lkp-a06/qperf/600s
 6.678e+08 ~ 0%      +0.7%  6.728e+08 ~ 0%  TOTAL qperf.udp.recv_bw

---------------  -------------------------  
 6.721e+08 ~ 0%      +1.1%  6.797e+08 ~ 0%  lkp-a06/qperf/600s
 6.721e+08 ~ 0%      +1.1%  6.797e+08 ~ 0%  TOTAL qperf.udp.send_bw

---------------  -------------------------  
     55388 ~ 2%      -1.9%      54324 ~ 0%  lkp-a06/qperf/600s
     55388 ~ 2%      -1.9%      54324 ~ 0%  TOTAL qperf.sctp.latency

---------------  -------------------------  
     39988 ~ 1%      -1.0%      39581 ~ 0%  lkp-a06/qperf/600s
     39988 ~ 1%      -1.0%      39581 ~ 0%  TOTAL qperf.tcp.latency

---------------  -------------------------  
     33022 ~ 2%      -1.6%      32484 ~ 0%  lkp-a06/qperf/600s
     33022 ~ 2%      -1.6%      32484 ~ 0%  TOTAL qperf.udp.latency

---------------  -------------------------  
   1048360 ~ 0%      +0.0%    1048360 ~ 0%  lkp-a05/iperf/300s-udp
   1048360 ~ 0%      +0.0%    1048360 ~ 0%  TOTAL iperf.udp.bps

---------------  -------------------------  
 4.801e+09 ~ 2%      -2.4%  4.688e+09 ~ 0%  lkp-a05/iperf/300s-tcp
 4.801e+09 ~ 2%      -2.4%  4.688e+09 ~ 0%  TOTAL iperf.tcp.receiver.bps

---------------  -------------------------  
 4.801e+09 ~ 2%      -2.4%  4.688e+09 ~ 0%  lkp-a05/iperf/300s-tcp
 4.801e+09 ~ 2%      -2.4%  4.688e+09 ~ 0%  TOTAL iperf.tcp.sender.bps

---------------  -------------------------  
    140261 ~ 1%      +2.6%     143971 ~ 0%  lkp-sb03/nepim/300s-100%-udp
    126862 ~ 1%      +4.4%     132471 ~ 4%  lkp-sb03/nepim/300s-100%-udp6
    577494 ~ 3%      -2.7%     561810 ~ 2%  lkp-sb03/nepim/300s-25%-udp
    515120 ~ 2%      +3.3%     532350 ~ 2%  lkp-sb03/nepim/300s-25%-udp6
   1359739 ~ 3%      +0.8%    1370604 ~ 2%  TOTAL nepim.udp.avg.kbps_in

---------------  -------------------------  
    160888 ~ 2%      +3.2%     165964 ~ 2%  lkp-sb03/nepim/300s-100%-udp
    127159 ~ 1%      +4.4%     132798 ~ 4%  lkp-sb03/nepim/300s-100%-udp6
    653177 ~ 3%      -1.0%     646770 ~ 3%  lkp-sb03/nepim/300s-25%-udp
    515540 ~ 2%      +4.1%     536440 ~ 2%  lkp-sb03/nepim/300s-25%-udp6
   1456766 ~ 3%      +1.7%    1481974 ~ 3%  TOTAL nepim.udp.avg.kbps_out

---------------  -------------------------  
    680285 ~ 1%      +1.7%     691663 ~ 1%  lkp-sb03/nepim/300s-100%-tcp
    645357 ~ 1%      +1.2%     653140 ~ 1%  lkp-sb03/nepim/300s-100%-tcp6
   2850752 ~ 1%      +0.0%    2851577 ~ 0%  lkp-sb03/nepim/300s-25%-tcp
   2588447 ~ 1%      +0.2%    2593352 ~ 0%  lkp-sb03/nepim/300s-25%-tcp6
   6764842 ~ 1%      +0.4%    6789733 ~ 0%  TOTAL nepim.tcp.avg.kbps_in

---------------  -------------------------  
    680449 ~ 1%      +1.7%     691824 ~ 1%  lkp-sb03/nepim/300s-100%-tcp
    645502 ~ 1%      +1.2%     653247 ~ 1%  lkp-sb03/nepim/300s-100%-tcp6
   2850934 ~ 1%      +0.0%    2851776 ~ 0%  lkp-sb03/nepim/300s-25%-tcp
   2588647 ~ 1%      +0.2%    2593553 ~ 0%  lkp-sb03/nepim/300s-25%-tcp6
   6765533 ~ 1%      +0.4%    6790402 ~ 0%  TOTAL nepim.tcp.avg.kbps_out

---------------  -------------------------  
     45789 ~ 1%      +1.9%      46658 ~ 0%  lkp-sb03/nuttcp/300s
     45789 ~ 1%      +1.9%      46658 ~ 0%  TOTAL nuttcp.throughput_Mbps

---------------  -------------------------  
     47139 ~ 4%      +3.6%      48854 ~ 3%  lkp-sb03/thrulay/300s
     47139 ~ 4%      +3.6%      48854 ~ 3%  TOTAL thrulay.throughput

---------------  -------------------------  
      0.02 ~11%     -10.1%       0.02 ~12%  lkp-sb03/thrulay/300s
      0.02 ~11%     -10.1%       0.02 ~12%  TOTAL thrulay.jitter

---------------  -------------------------  
      0.10 ~ 5%      -3.3%       0.10 ~ 4%  lkp-sb03/thrulay/300s
      0.10 ~ 5%      -3.3%       0.10 ~ 4%  TOTAL thrulay.RTT

---------------  -------------------------  
  75644346 ~ 0%      +0.5%   76029397 ~ 0%  xps2/pigz/100%-128K
  77167258 ~ 0%      +0.5%   77522343 ~ 0%  xps2/pigz/100%-512K
 152811604 ~ 0%      +0.5%  153551740 ~ 0%  TOTAL pigz.throughput

---------------  -------------------------  
     12773 ~ 0%      -1.2%      12615 ~ 0%  lkp-nex04/ebizzy/200%-100x-10s
     12773 ~ 0%      -1.2%      12615 ~ 0%  TOTAL ebizzy.throughput

---------------  -------------------------  
      6.87 ~ 2%     -83.6%       1.12 ~ 3%  lkp-snb01/hackbench/50%-process-socket
      6.43 ~ 2%     -79.8%       1.30 ~ 1%  lkp-snb01/hackbench/50%-threads-socket
     13.30 ~ 2%     -81.8%       2.42 ~ 2%  TOTAL perf-profile.cpu-cycles._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_aio_write

---------------  -------------------------  
      0.90 ~42%     -77.3%       0.20 ~16%  lkp-snb01/hackbench/1600%-process-pipe
      0.90 ~42%     -77.3%       0.20 ~16%  TOTAL perf-profile.cpu-cycles.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common.__wake_up_sync_key

---------------  -------------------------  
      1.76 ~ 2%     -83.7%       0.29 ~ 8%  lkp-snb01/hackbench/50%-process-socket
      1.08 ~ 1%     -71.8%       0.30 ~ 3%  lkp-snb01/hackbench/50%-threads-socket
      2.84 ~ 2%     -79.2%       0.59 ~ 5%  TOTAL perf-profile.cpu-cycles.__schedule.schedule.schedule_timeout.unix_stream_recvmsg.sock_aio_read

---------------  -------------------------  
      1.78 ~33%     -63.6%       0.65 ~28%  lkp-snb01/hackbench/1600%-process-pipe
      0.92 ~31%     -59.9%       0.37 ~30%  lkp-snb01/hackbench/1600%-threads-pipe
      1.55 ~10%    -100.0%       0.00 ~ 0%  lkp-snb01/hackbench/50%-process-socket
      1.84 ~ 5%     +14.9%       2.11 ~ 2%  lkp-snb01/hackbench/50%-threads-pipe
      1.43 ~ 9%     -79.7%       0.29 ~ 2%  lkp-snb01/hackbench/50%-threads-socket
      7.51 ~17%     -54.5%       3.42 ~10%  TOTAL perf-profile.cpu-cycles._raw_spin_lock.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common

---------------  -------------------------  
      0.89 ~20%     -88.0%       0.11 ~19%  lkp-snb01/hackbench/1600%-process-pipe
      0.47 ~ 5%    +110.0%       0.98 ~13%  lkp-snb01/hackbench/50%-process-pipe
      1.35 ~14%     -19.7%       1.09 ~13%  TOTAL perf-profile.cpu-cycles.__schedule.schedule_user.sysret_careful.__write_nocancel

---------------  -------------------------  
      2.81 ~ 2%     +40.3%       3.94 ~ 5%  lkp-snb01/hackbench/50%-process-pipe
      1.37 ~ 7%     -82.5%       0.24 ~ 5%  lkp-snb01/hackbench/50%-process-socket
      2.84 ~ 1%     +42.8%       4.06 ~ 1%  lkp-snb01/hackbench/50%-threads-pipe
      1.56 ~ 3%     -75.2%       0.39 ~ 4%  lkp-snb01/hackbench/50%-threads-socket
      8.58 ~ 3%      +0.5%       8.63 ~ 3%  TOTAL perf-profile.cpu-cycles.idle_cpu.select_task_rq_fair.try_to_wake_up.default_wake_function.autoremove_wake_function

---------------  -------------------------  
      2.60 ~33%     -72.5%       0.72 ~16%  lkp-snb01/hackbench/1600%-process-pipe
      0.97 ~15%     -52.8%       0.46 ~17%  lkp-snb01/hackbench/1600%-threads-pipe
      2.85 ~ 1%     +26.9%       3.62 ~ 3%  lkp-snb01/hackbench/50%-process-pipe
      6.42 ~16%     -25.3%       4.80 ~ 6%  TOTAL perf-profile.cpu-cycles.__schedule.schedule.pipe_wait.pipe_read.new_sync_read

---------------  -------------------------  
      1.14 ~22%     -75.2%       0.28 ~16%  lkp-snb01/hackbench/1600%-process-pipe
      0.91 ~14%     -56.9%       0.39 ~16%  lkp-snb01/hackbench/1600%-threads-pipe
      0.88 ~ 2%     +36.5%       1.20 ~ 6%  lkp-snb01/hackbench/50%-process-pipe
      0.88 ~ 2%     +41.6%       1.25 ~ 2%  lkp-snb01/hackbench/50%-threads-pipe
      3.82 ~11%     -18.0%       3.13 ~ 6%  TOTAL perf-profile.cpu-cycles.select_task_rq_fair.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common


(2) PATCH 1/2         PATCH 1/2 + 2/2 
---------------  -------------------------  
      6.73 ~ 2%     -83.3%       1.12 ~ 3%  lkp-snb01/hackbench/50%-process-socket
      6.63 ~ 0%     -80.4%       1.30 ~ 1%  lkp-snb01/hackbench/50%-threads-socket
     13.36 ~ 1%     -81.9%       2.42 ~ 2%  TOTAL perf-profile.cpu-cycles._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_aio_write

---------------  -------------------------  
      1.10 ~46%     -81.5%       0.20 ~16%  lkp-snb01/hackbench/1600%-process-pipe
      1.10 ~46%     -81.5%       0.20 ~16%  TOTAL perf-profile.cpu-cycles.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common.__wake_up_sync_key

---------------  -------------------------  
      1.80 ~ 1%     -84.0%       0.29 ~ 8%  lkp-snb01/hackbench/50%-process-socket
      1.09 ~ 1%     -72.2%       0.30 ~ 3%  lkp-snb01/hackbench/50%-threads-socket
      2.89 ~ 1%     -79.6%       0.59 ~ 5%  TOTAL perf-profile.cpu-cycles.__schedule.schedule.schedule_timeout.unix_stream_recvmsg.sock_aio_read

---------------  -------------------------  
      1.29 ~29%     -49.7%       0.65 ~28%  lkp-snb01/hackbench/1600%-process-pipe
      0.83 ~47%     -55.8%       0.37 ~30%  lkp-snb01/hackbench/1600%-threads-pipe
      1.38 ~ 7%    -100.0%       0.00 ~ 0%  lkp-snb01/hackbench/50%-process-socket
      1.61 ~ 4%     -82.0%       0.29 ~ 2%  lkp-snb01/hackbench/50%-threads-socket
      5.11 ~18%     -74.5%       1.30 ~23%  TOTAL perf-profile.cpu-cycles._raw_spin_lock.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common

---------------  -------------------------  
      0.83 ~14%     -87.1%       0.11 ~19%  lkp-snb01/hackbench/1600%-process-pipe
      0.50 ~ 3%     +97.3%       0.98 ~13%  lkp-snb01/hackbench/50%-process-pipe
      1.33 ~10%     -18.1%       1.09 ~13%  TOTAL perf-profile.cpu-cycles.__schedule.schedule_user.sysret_careful.__write_nocancel

---------------  -------------------------  
      1.19 ~21%     -52.1%       0.57 ~30%  lkp-snb01/hackbench/1600%-threads-pipe
      2.95 ~ 0%     +33.6%       3.94 ~ 5%  lkp-snb01/hackbench/50%-process-pipe
      1.52 ~ 6%     -84.2%       0.24 ~ 5%  lkp-snb01/hackbench/50%-process-socket
      2.98 ~ 1%     +36.4%       4.06 ~ 1%  lkp-snb01/hackbench/50%-threads-pipe
      1.50 ~ 3%     -74.2%       0.39 ~ 4%  lkp-snb01/hackbench/50%-threads-socket
     10.13 ~ 4%      -9.2%       9.20 ~ 5%  TOTAL perf-profile.cpu-cycles.idle_cpu.select_task_rq_fair.try_to_wake_up.default_wake_function.autoremove_wake_function

---------------  -------------------------  
      2.85 ~35%     -74.9%       0.72 ~16%  lkp-snb01/hackbench/1600%-process-pipe
      0.92 ~13%     -50.2%       0.46 ~17%  lkp-snb01/hackbench/1600%-threads-pipe
      2.92 ~ 1%     +23.9%       3.62 ~ 3%  lkp-snb01/hackbench/50%-process-pipe
      6.69 ~17%     -28.3%       4.80 ~ 6%  TOTAL perf-profile.cpu-cycles.__schedule.schedule.pipe_wait.pipe_read.new_sync_read

---------------  -------------------------  
    153533 ~ 2%     +65.0%     253355 ~ 0%  lkp-snb01/hackbench/1600%-process-pipe
    152059 ~ 0%      +1.9%     154948 ~ 0%  lkp-snb01/hackbench/1600%-process-socket
    174164 ~ 2%     +22.1%     212579 ~ 2%  lkp-snb01/hackbench/1600%-threads-pipe
    158193 ~ 0%      +0.2%     158433 ~ 0%  lkp-snb01/hackbench/1600%-threads-socket
     94656 ~ 0%      +4.1%      98501 ~ 0%  lkp-snb01/hackbench/50%-process-pipe
     87638 ~ 0%     +77.6%     155606 ~ 0%  lkp-snb01/hackbench/50%-process-socket
     89973 ~ 0%      +3.6%      93205 ~ 0%  lkp-snb01/hackbench/50%-threads-pipe
     80210 ~ 0%     +83.7%     147379 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
    990430 ~ 1%     +28.6%    1274010 ~ 0%  TOTAL hackbench.throughput

---------------  -------------------------  
    702188 ~ 0%      -1.5%     691663 ~ 1%  lkp-sb03/nepim/300s-100%-tcp
    655502 ~ 0%      -0.4%     653140 ~ 1%  lkp-sb03/nepim/300s-100%-tcp6
   2860533 ~ 0%      -0.3%    2851577 ~ 0%  lkp-sb03/nepim/300s-25%-tcp
   2609335 ~ 0%      -0.6%    2593352 ~ 0%  lkp-sb03/nepim/300s-25%-tcp6
   6827559 ~ 0%      -0.6%    6789733 ~ 0%  TOTAL nepim.tcp.avg.kbps_in

---------------  -------------------------  
    702354 ~ 0%      -1.5%     691824 ~ 1%  lkp-sb03/nepim/300s-100%-tcp
    655502 ~ 0%      -0.3%     653247 ~ 1%  lkp-sb03/nepim/300s-100%-tcp6
   2860734 ~ 0%      -0.3%    2851776 ~ 0%  lkp-sb03/nepim/300s-25%-tcp
   2609536 ~ 0%      -0.6%    2593553 ~ 0%  lkp-sb03/nepim/300s-25%-tcp6
   6828128 ~ 0%      -0.6%    6790402 ~ 0%  TOTAL nepim.tcp.avg.kbps_out

---------------  -------------------------  
    140076 ~ 0%      +2.8%     143971 ~ 0%  lkp-sb03/nepim/300s-100%-udp
    126302 ~ 0%      +4.9%     132471 ~ 4%  lkp-sb03/nepim/300s-100%-udp6
    557984 ~ 0%      +0.7%     561810 ~ 2%  lkp-sb03/nepim/300s-25%-udp
    501648 ~ 1%      +6.1%     532350 ~ 2%  lkp-sb03/nepim/300s-25%-udp6
   1326011 ~ 0%      +3.4%    1370604 ~ 2%  TOTAL nepim.udp.avg.kbps_in

---------------  -------------------------  
    162279 ~ 1%      +2.3%     165964 ~ 2%  lkp-sb03/nepim/300s-100%-udp
    127240 ~ 1%      +4.4%     132798 ~ 4%  lkp-sb03/nepim/300s-100%-udp6
    649372 ~ 1%      -0.4%     646770 ~ 3%  lkp-sb03/nepim/300s-25%-udp
    502056 ~ 1%      +6.8%     536440 ~ 2%  lkp-sb03/nepim/300s-25%-udp6
   1440949 ~ 1%      +2.8%    1481974 ~ 3%  TOTAL nepim.udp.avg.kbps_out

---------------  -------------------------  
     49149 ~ 1%      -0.6%      48854 ~ 3%  lkp-sb03/thrulay/300s
     49149 ~ 1%      -0.6%      48854 ~ 3%  TOTAL thrulay.throughput

---------------  -------------------------  
      0.02 ~ 9%      +3.6%       0.02 ~12%  lkp-sb03/thrulay/300s
      0.02 ~ 9%      +3.6%       0.02 ~12%  TOTAL thrulay.jitter

---------------  -------------------------  
      0.10 ~ 1%      +2.1%       0.10 ~ 4%  lkp-sb03/thrulay/300s
      0.10 ~ 1%      +2.1%       0.10 ~ 4%  TOTAL thrulay.RTT

---------------  -------------------------  
 4.817e+09 ~ 1%      -2.7%  4.688e+09 ~ 0%  lkp-a05/iperf/300s-tcp
 4.817e+09 ~ 1%      -2.7%  4.688e+09 ~ 0%  TOTAL iperf.tcp.receiver.bps

---------------  -------------------------  
 4.817e+09 ~ 1%      -2.7%  4.688e+09 ~ 0%  lkp-a05/iperf/300s-tcp
 4.817e+09 ~ 1%      -2.7%  4.688e+09 ~ 0%  TOTAL iperf.tcp.sender.bps

---------------  -------------------------  
 3.036e+08 ~ 7%      -3.0%  2.944e+08 ~ 3%  lkp-a06/qperf/600s
 3.036e+08 ~ 7%      -3.0%  2.944e+08 ~ 3%  TOTAL qperf.sctp.bw

---------------  -------------------------  
 6.678e+08 ~ 0%      -0.4%  6.651e+08 ~ 1%  lkp-a06/qperf/600s
 6.678e+08 ~ 0%      -0.4%  6.651e+08 ~ 1%  TOTAL qperf.tcp.bw

---------------  -------------------------  
  6.73e+08 ~ 0%      -0.0%  6.728e+08 ~ 0%  lkp-a06/qperf/600s
  6.73e+08 ~ 0%      -0.0%  6.728e+08 ~ 0%  TOTAL qperf.udp.recv_bw

---------------  -------------------------  
 6.773e+08 ~ 0%      +0.4%  6.797e+08 ~ 0%  lkp-a06/qperf/600s
 6.773e+08 ~ 0%      +0.4%  6.797e+08 ~ 0%  TOTAL qperf.udp.send_bw

---------------  -------------------------  
     54508 ~ 2%      -0.3%      54324 ~ 0%  lkp-a06/qperf/600s
     54508 ~ 2%      -0.3%      54324 ~ 0%  TOTAL qperf.sctp.latency

---------------  -------------------------  
     39293 ~ 1%      +0.7%      39581 ~ 0%  lkp-a06/qperf/600s
     39293 ~ 1%      +0.7%      39581 ~ 0%  TOTAL qperf.tcp.latency

---------------  -------------------------  
     31924 ~ 0%      +1.8%      32484 ~ 0%  lkp-a06/qperf/600s
     31924 ~ 0%      +1.8%      32484 ~ 0%  TOTAL qperf.udp.latency

---------------  -------------------------  
   1048360 ~ 0%      +0.0%    1048360 ~ 0%  lkp-a05/iperf/300s-udp
   1048360 ~ 0%      +0.0%    1048360 ~ 0%  TOTAL iperf.udp.bps

---------------  -------------------------  
     45897 ~ 0%      +1.7%      46658 ~ 0%  lkp-sb03/nuttcp/300s
     45897 ~ 0%      +1.7%      46658 ~ 0%  TOTAL nuttcp.throughput_Mbps

---------------  -------------------------  
  75801537 ~ 0%      +0.3%   76029397 ~ 0%  xps2/pigz/100%-128K
  77314567 ~ 0%      +0.3%   77522343 ~ 0%  xps2/pigz/100%-512K
 153116104 ~ 0%      +0.3%  153551740 ~ 0%  TOTAL pigz.throughput

---------------  -------------------------  
     12763 ~ 0%      -1.2%      12615 ~ 0%  lkp-nex04/ebizzy/200%-100x-10s
     12763 ~ 0%      -1.2%      12615 ~ 0%  TOTAL ebizzy.throughput

--------------------------------------------------------------------------------------

Regarding the overflow issue, we now have for both entity and cfs_rq:

struct sched_avg {
    .....
    u64 load_sum;
    unsigned long load_avg;
    .....
};

Given the weight for both entity and cfs_rq is:

struct load_weight {
    unsigned long weight;
    .....
};

load_sum's max is therefore 47742 * load.weight (which is unsigned long), so on 32-bit
it is absolutely safe. On 64-bit, with unsigned long being 64-bit, we can afford about
4353082796 (= 2^64/47742/88761) entities with the highest weight (= 88761) always
runnable; even considering that decay_load() may multiply by up to 1<<15 for large
values, we can still support 132845 (= 4353082796/2^15) always-runnable entities,
which should be acceptable.

load_avg = load_sum / 47742 = load.weight (which is unsigned long), so it should be
perfectly safe for both entity (even with arbitrary user group shares) and cfs_rq, on
both 32-bit and 64-bit. Originally we saved this division, but we have to bring it back
because of the overflow issue on 32-bit (the load average itself is safe from overflow,
but the rest of the code referencing it consistently uses long, such as cpu_load, etc.,
which prevents us from omitting the division).
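
To make the headroom arithmetic above easy to check, here is a small standalone
user-space sketch (not kernel code; it only relies on the constants already quoted
above: the maximum decayed sum 47742 and the highest task weight 88761, i.e. nice -20):

#include <stdio.h>
#include <stdint.h>

/* LOAD_AVG_MAX from the scheduler and the load.weight of a nice -20 task. */
#define LOAD_AVG_MAX	47742ULL
#define MAX_NICE_WEIGHT	88761ULL

int main(void)
{
	/* Worst-case load_sum of one always-runnable, highest-weight entity. */
	uint64_t entity_max_sum = LOAD_AVG_MAX * MAX_NICE_WEIGHT;

	/* How many such entities a 64-bit cfs_rq load_sum can hold. */
	uint64_t max_entities = UINT64_MAX / entity_max_sum;

	/* Headroom left when decay_load() scales large values by 1<<15. */
	uint64_t with_decay_scaling = max_entities >> 15;

	printf("per-entity max load_sum : %llu\n",
	       (unsigned long long)entity_max_sum);
	printf("entities before overflow: %llu\n",	/* ~4353082796 */
	       (unsigned long long)max_entities);
	printf("with 1<<15 scaling      : %llu\n",	/* ~132845 */
	       (unsigned long long)with_decay_scaling);
	return 0;
}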

v3 changes:

Many thanks to Ben for the v3 revision.

- Fix overflow issue both for entity and cfs_rq on both 32bit and 64bit.
- Track all entities (both task and group entity) due to group entity's clock issue.
  This actually improves code simplicity.
- Make a copy of cfs_rq sched_avg's last_update_time, so that an intact 64-bit
  value can be read on a 32-bit machine despite concurrent updates (hope I did it right).
- Minor fixes and code improvement.

v2 changes:

Thanks to PeterZ and Ben for their help in fixing the issues and improving the
quality, and to Fengguang and his 0Day for finding compile errors in different
configurations for version 2.

- Batch update the tg->load_avg, making sure it is up-to-date before update_cfs_shares
- Remove migrating task from the old CPU/cfs_rq, and do so with atomic operations


Yuyang Du (2):
  sched: Remove update_rq_runnable_avg
  sched: Rewrite per entity runnable load average tracking

 include/linux/sched.h |   21 +-
 kernel/sched/debug.c  |   30 +--
 kernel/sched/fair.c   |  566 ++++++++++++++++---------------------------------
 kernel/sched/proc.c   |    2 +-
 kernel/sched/sched.h  |   22 +-
 5 files changed, 207 insertions(+), 434 deletions(-)

-- 
1.7.9.5



* [PATCH 1/2 v4] sched: Remove update_rq_runnable_avg
  2014-07-17 23:26 [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
@ 2014-07-17 23:26 ` Yuyang Du
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
From: Yuyang Du @ 2014-07-17 23:26 UTC
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, arjan.van.de.ven, len.brown, rafael.j.wysocki,
	alan.cox, mark.gross, fengguang.wu, Yuyang Du

The current rq->avg is not used anywhere, and the code is in the fair
scheduler's critical path, so remove it.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/debug.c |    8 --------
 kernel/sched/fair.c  |   24 ++++--------------------
 kernel/sched/sched.h |    2 --
 3 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 695f977..4b864c7 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -68,14 +68,6 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 #define PN(F) \
 	SEQ_printf(m, "  .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)F))
 
-	if (!se) {
-		struct sched_avg *avg = &cpu_rq(cpu)->avg;
-		P(avg->runnable_avg_sum);
-		P(avg->runnable_avg_period);
-		return;
-	}
-
-
 	PN(se->exec_start);
 	PN(se->vruntime);
 	PN(se->sum_exec_runtime);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fea7d33..1a2d04f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2430,18 +2430,12 @@ static inline void __update_group_entity_contrib(struct sched_entity *se)
 	}
 }
 
-static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
-{
-	__update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable);
-	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
-}
 #else /* CONFIG_FAIR_GROUP_SCHED */
 static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 						 int force_update) {}
 static inline void __update_tg_runnable_avg(struct sched_avg *sa,
 						  struct cfs_rq *cfs_rq) {}
 static inline void __update_group_entity_contrib(struct sched_entity *se) {}
-static inline void update_rq_runnable_avg(struct rq *rq, int runnable) {}
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 static inline void __update_task_entity_contrib(struct sched_entity *se)
@@ -2614,7 +2608,6 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
  */
 void idle_enter_fair(struct rq *this_rq)
 {
-	update_rq_runnable_avg(this_rq, 1);
 }
 
 /*
@@ -2624,7 +2617,6 @@ void idle_enter_fair(struct rq *this_rq)
  */
 void idle_exit_fair(struct rq *this_rq)
 {
-	update_rq_runnable_avg(this_rq, 0);
 }
 
 static int idle_balance(struct rq *this_rq);
@@ -2633,7 +2625,6 @@ static int idle_balance(struct rq *this_rq);
 
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
-static inline void update_rq_runnable_avg(struct rq *rq, int runnable) {}
 static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 					   struct sched_entity *se,
 					   int wakeup) {}
@@ -3936,10 +3927,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_entity_load_avg(se, 1);
 	}
 
-	if (!se) {
-		update_rq_runnable_avg(rq, rq->nr_running);
+	if (!se)
 		add_nr_running(rq, 1);
-	}
+
 	hrtick_update(rq);
 }
 
@@ -3997,10 +3987,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_entity_load_avg(se, 1);
 	}
 
-	if (!se) {
+	if (!se)
 		sub_nr_running(rq, 1);
-		update_rq_runnable_avg(rq, 1);
-	}
+
 	hrtick_update(rq);
 }
 
@@ -5437,9 +5426,6 @@ static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
 		 */
 		if (!se->avg.runnable_avg_sum && !cfs_rq->nr_running)
 			list_del_leaf_cfs_rq(cfs_rq);
-	} else {
-		struct rq *rq = rq_of(cfs_rq);
-		update_rq_runnable_avg(rq, rq->nr_running);
 	}
 }
 
@@ -7352,8 +7338,6 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 
 	if (numabalancing_enabled)
 		task_tick_numa(rq, curr);
-
-	update_rq_runnable_avg(rq, 1);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 31cc02e..a147571 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -542,8 +542,6 @@ struct rq {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* list of leaf cfs_rq on this cpu: */
 	struct list_head leaf_cfs_rq_list;
-
-	struct sched_avg avg;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 	/*
-- 
1.7.9.5



* [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
  2014-07-17 23:26 ` [PATCH 1/2 v4] sched: Remove update_rq_runnable_avg Yuyang Du
@ 2014-07-17 23:26 ` Yuyang Du
  2014-07-18  9:43   ` Vincent Guittot
  2014-07-18 15:39 ` [PATCH 0/2 " Morten Rasmussen
  2014-07-20  5:46 ` Mike Galbraith
From: Yuyang Du @ 2014-07-17 23:26 UTC
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, arjan.van.de.ven, len.brown, rafael.j.wysocki,
	alan.cox, mark.gross, fengguang.wu, Yuyang Du

The idea of per entity runnable load average (letting runnable time contribute to load
weight) was proposed by Paul Turner, and this rewrite still follows it. The rewrite is
done for the following reasons:

1. cfs_rq's load average (namely runnable_load_avg and blocked_load_avg) is updated
   at the granularity of one entity at a time, which results in the cfs_rq load
   average being only partially updated, or asynchronous across its entities: at any
   time, only one entity is up to date and contributes to the cfs_rq; all other
   entities are effectively lagging behind.

2. cfs_rq load average differs between the top rq->cfs_rq and the other task_groups'
   per CPU cfs_rqs in whether or not blocked_load_avg contributes to the load.

3. How task_group's load is calculated is complex.

This rewrite tackles these by:

1. Combine the runnable and blocked load averages for cfs_rq, and track cfs_rq's load
   average as a whole and use it as such.

2. Track the task entity load average so it can be carried between CPUs on migration;
   group cfs_rq and group entity load averages are tracked for update_cfs_shares and
   task_h_load calculation. task_group's load_avg is aggregated from its per CPU
   cfs_rqs' load_avg, which in turn is aggregated from its sched_entities (both task
   and group entities). A group entity's weight is proportional to its own cfs_rq's
   load_avg / task_group's load_avg.

3. All tasks, cfs_rqs/group entities, and task_groups have a simple, consistent,
   up-to-date, and synchronized load_avg.

This rewrite is in principle equivalent to the previous code in functionality, but it
significantly reduces code complexity and hence increases efficiency and clarity. In
addition, the new load_avg is much smoother/more continuous (no abrupt jumps up and
down) and is decayed/updated more quickly and synchronously to reflect the load
dynamics. As a result, we have less load tracking overhead and better performance.
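
For illustration only, here is a minimal user-space model of the scheme described
above: every 1024us period the old sum is decayed by y (y^32 == 0.5) and the runnable
weight is added; load_avg is the sum normalized by LOAD_AVG_MAX, so an always-runnable
entity converges to its load.weight. It deliberately skips the sub-period
(period_contrib) accounting and the fixed-point tables that __update_load_avg() and
decay_load() use; all names in the sketch are hypothetical.

#include <stdio.h>
#include <math.h>

#define LOAD_AVG_MAX	47742.0		/* ~ 1024 / (1 - y), truncated in the kernel */

struct toy_sched_avg {
	double load_sum;
	double load_avg;
};

/* Advance the average by 'periods' whole 1024us periods. */
static void toy_update(struct toy_sched_avg *sa, unsigned long weight,
		       int runnable, unsigned int periods)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* so that y^32 == 0.5 */

	while (periods--) {
		sa->load_sum *= y;			/* decay the history */
		if (runnable)
			sa->load_sum += weight * 1024.0; /* this period's contribution */
	}
	sa->load_avg = sa->load_sum / LOAD_AVG_MAX;
}

/* Group entity weight is proportional to its cfs_rq's share of the tg load. */
static unsigned long toy_group_shares(unsigned long tg_shares,
				      unsigned long cfs_rq_load_avg,
				      unsigned long tg_load_avg)
{
	return tg_load_avg ? tg_shares * cfs_rq_load_avg / tg_load_avg : tg_shares;
}

int main(void)
{
	struct toy_sched_avg sa = { 0.0, 0.0 };

	toy_update(&sa, 1024, 1, 100);	/* ~100ms always runnable at nice-0 weight */
	printf("after 100 runnable periods: load_avg ~ %.0f\n", sa.load_avg);

	toy_update(&sa, 1024, 0, 32);	/* blocked for one half-life (~32ms) */
	printf("after 32 blocked periods  : load_avg ~ %.0f\n", sa.load_avg);

	printf("group shares for 1/4 of tg load: %lu\n",
	       toy_group_shares(1024, 256, 1024));
	return 0;
}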

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 include/linux/sched.h |   21 +-
 kernel/sched/debug.c  |   22 +-
 kernel/sched/fair.c   |  542 ++++++++++++++++---------------------------------
 kernel/sched/proc.c   |    2 +-
 kernel/sched/sched.h  |   20 +-
 5 files changed, 203 insertions(+), 404 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 306f4f0..c981f26 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1067,16 +1067,21 @@ struct load_weight {
 	u32 inv_weight;
 };
 
+/*
+ * The load_avg represents an infinite geometric series. The 64 bit
+ * load_sum can:
+ * 1) for cfs_rq, afford 4353082796 (=2^64/47742/88761) entities with
+ * the highest weight (=88761) always runnable, we should not overflow
+ * 2) for entity, support any load.weight always runnable
+ */
 struct sched_avg {
 	/*
-	 * These sums represent an infinite geometric series and so are bound
-	 * above by 1024/(1-y).  Thus we only need a u32 to store them for all
-	 * choices of y < 1-2^(-32)*1024.
+	 * The load_avg represents an infinite geometric series.
 	 */
-	u32 runnable_avg_sum, runnable_avg_period;
-	u64 last_runnable_update;
-	s64 decay_count;
-	unsigned long load_avg_contrib;
+	u64 last_update_time;
+	u64 load_sum;
+	unsigned long load_avg;
+	u32 period_contrib;
 };
 
 #ifdef CONFIG_SCHEDSTATS
@@ -1142,7 +1147,7 @@ struct sched_entity {
 #endif
 
 #ifdef CONFIG_SMP
-	/* Per-entity load-tracking */
+	/* Per entity load average tracking */
 	struct sched_avg	avg;
 #endif
 };
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 4b864c7..34a3f26 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -85,10 +85,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 #endif
 	P(se->load.weight);
 #ifdef CONFIG_SMP
-	P(se->avg.runnable_avg_sum);
-	P(se->avg.runnable_avg_period);
-	P(se->avg.load_avg_contrib);
-	P(se->avg.decay_count);
+	P(se->my_q->avg.load_avg);
 #endif
 #undef PN
 #undef P
@@ -205,19 +202,11 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 	SEQ_printf(m, "  .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
 	SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
 #ifdef CONFIG_SMP
-	SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg",
-			cfs_rq->runnable_load_avg);
-	SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
-			cfs_rq->blocked_load_avg);
+	SEQ_printf(m, "  .%-30s: %lu\n", "load_avg",
+			cfs_rq->avg.load_avg);
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib",
-			cfs_rq->tg_load_contrib);
-	SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
-			cfs_rq->tg_runnable_contrib);
 	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
 			atomic_long_read(&cfs_rq->tg->load_avg));
-	SEQ_printf(m, "  .%-30s: %d\n", "tg->runnable_avg",
-			atomic_read(&cfs_rq->tg->runnable_avg));
 #endif
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
@@ -624,10 +613,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 
 	P(se.load.weight);
 #ifdef CONFIG_SMP
-	P(se.avg.runnable_avg_sum);
-	P(se.avg.runnable_avg_period);
-	P(se.avg.load_avg_contrib);
-	P(se.avg.decay_count);
+	P(se.avg.load_avg);
 #endif
 	P(policy);
 	P(prio);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a2d04f..3055b9b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -282,9 +282,6 @@ static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
 	return grp->my_q;
 }
 
-static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq,
-				       int force_update);
-
 static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	if (!cfs_rq->on_list) {
@@ -304,8 +301,6 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 		}
 
 		cfs_rq->on_list = 1;
-		/* We should have no load, but we need to update last_decay. */
-		update_cfs_rq_blocked_load(cfs_rq, 0);
 	}
 }
 
@@ -665,20 +660,27 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 }
 
 #ifdef CONFIG_SMP
-static unsigned long task_h_load(struct task_struct *p);
 
-static inline void __update_task_entity_contrib(struct sched_entity *se);
+/* dependent on LOAD_AVG_PERIOD, see below */
+#define LOAD_AVG_MAX 47742 /* maximum possible load avg */
+
+static unsigned long task_h_load(struct task_struct *p);
 
 /* Give new task start runnable values to heavy its load in infant time */
 void init_task_runnable_average(struct task_struct *p)
 {
-	u32 slice;
+	struct sched_avg *sa = &p->se.avg;
 
-	p->se.avg.decay_count = 0;
-	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
-	p->se.avg.runnable_avg_sum = slice;
-	p->se.avg.runnable_avg_period = slice;
-	__update_task_entity_contrib(&p->se);
+	sa->last_update_time = 0;
+	/*
+	 * sched_avg's period_contrib should be strictly less than 1024, so
+	 * we give it 1023 to make sure it is almost a full period (1024us), and
+	 * will definitely be updated (after enqueue).
+	 */
+	sa->period_contrib = 1023;
+	sa->load_avg = p->se.load.weight;
+	sa->load_sum = p->se.load.weight * LOAD_AVG_MAX;
+	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 #else
 void init_task_runnable_average(struct task_struct *p)
@@ -1504,8 +1506,8 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
 	} else {
-		delta = p->se.avg.runnable_avg_sum;
-		*period = p->se.avg.runnable_avg_period;
+		delta = p->se.avg.load_avg / p->se.load.weight;
+		*period = LOAD_AVG_MAX;
 	}
 
 	p->last_sum_exec_runtime = runtime;
@@ -2071,13 +2073,9 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	long tg_weight;
 
 	/*
-	 * Use this CPU's actual weight instead of the last load_contribution
-	 * to gain a more accurate current total weight. See
-	 * update_cfs_rq_load_contribution().
+	 * Use this CPU's load average instead of actual weight
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
-	tg_weight -= cfs_rq->tg_load_contrib;
-	tg_weight += cfs_rq->load.weight;
 
 	return tg_weight;
 }
@@ -2087,7 +2085,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq->load.weight;
+	load = cfs_rq->avg.load_avg;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
@@ -2154,7 +2152,6 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
  * Note: The tables below are dependent on this value.
  */
 #define LOAD_AVG_PERIOD 32
-#define LOAD_AVG_MAX 47742 /* maximum possible load avg */
 #define LOAD_AVG_MAX_N 345 /* number of full periods to produce LOAD_MAX_AVG */
 
 /* Precomputed fixed inverse multiplies for multiplication by y^n */
@@ -2181,7 +2178,7 @@ static const u32 runnable_avg_yN_sum[] = {
  * Approximate:
  *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
  */
-static __always_inline u64 decay_load(u64 val, u64 n)
+static __always_inline u64 decay_load32(u64 val, u64 n)
 {
 	unsigned int local_n;
 
@@ -2210,6 +2207,18 @@ static __always_inline u64 decay_load(u64 val, u64 n)
 	return val >> 32;
 }
 
+static __always_inline u64 decay_load(u64 val, u64 n)
+{
+	if (likely(val <= UINT_MAX))
+		val = decay_load32(val, n);
+	else {
+		val *= (u32)decay_load32(1 << 15, n);
+		val >>= 15;
+	}
+
+	return val;
+}
+
 /*
  * For updates fully spanning n periods, the contribution to runnable
  * average will be: \Sum 1024*y^n
@@ -2234,7 +2243,7 @@ static u32 __compute_runnable_contrib(u64 n)
 		n -= LOAD_AVG_PERIOD;
 	} while (n > LOAD_AVG_PERIOD);
 
-	contrib = decay_load(contrib, n);
+	contrib = decay_load32(contrib, n);
 	return contrib + runnable_avg_yN_sum[n];
 }
 
@@ -2266,21 +2275,20 @@ static u32 __compute_runnable_contrib(u64 n)
  *   load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
  *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
  */
-static __always_inline int __update_entity_runnable_avg(u64 now,
-							struct sched_avg *sa,
-							int runnable)
+static __always_inline int
+__update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
 {
 	u64 delta, periods;
-	u32 runnable_contrib;
+	u32 contrib;
 	int delta_w, decayed = 0;
 
-	delta = now - sa->last_runnable_update;
+	delta = now - sa->last_update_time;
 	/*
 	 * This should only happen when time goes backwards, which it
 	 * unfortunately does during sched clock init when we swap over to TSC.
 	 */
 	if ((s64)delta < 0) {
-		sa->last_runnable_update = now;
+		sa->last_update_time = now;
 		return 0;
 	}
 
@@ -2291,23 +2299,24 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 	delta >>= 10;
 	if (!delta)
 		return 0;
-	sa->last_runnable_update = now;
+	sa->last_update_time = now;
 
 	/* delta_w is the amount already accumulated against our next period */
-	delta_w = sa->runnable_avg_period % 1024;
+	delta_w = sa->period_contrib;
 	if (delta + delta_w >= 1024) {
-		/* period roll-over */
 		decayed = 1;
 
+		/* how much left for next period will start over, we don't know yet */
+		sa->period_contrib = 0;
+
 		/*
 		 * Now that we know we're crossing a period boundary, figure
 		 * out how much from delta we need to complete the current
 		 * period and accrue it.
 		 */
 		delta_w = 1024 - delta_w;
-		if (runnable)
-			sa->runnable_avg_sum += delta_w;
-		sa->runnable_avg_period += delta_w;
+		if (w)
+			sa->load_sum += w * delta_w;
 
 		delta -= delta_w;
 
@@ -2315,290 +2324,120 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 		periods = delta / 1024;
 		delta %= 1024;
 
-		sa->runnable_avg_sum = decay_load(sa->runnable_avg_sum,
-						  periods + 1);
-		sa->runnable_avg_period = decay_load(sa->runnable_avg_period,
-						     periods + 1);
+		sa->load_sum = decay_load(sa->load_sum, periods + 1);
 
 		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
-		runnable_contrib = __compute_runnable_contrib(periods);
-		if (runnable)
-			sa->runnable_avg_sum += runnable_contrib;
-		sa->runnable_avg_period += runnable_contrib;
+		contrib = __compute_runnable_contrib(periods);
+		if (w)
+			sa->load_sum += w * contrib;
 	}
 
 	/* Remainder of delta accrued against u_0` */
-	if (runnable)
-		sa->runnable_avg_sum += delta;
-	sa->runnable_avg_period += delta;
+	if (w)
+		sa->load_sum += w * delta;
 
-	return decayed;
-}
+	sa->period_contrib += delta;
 
-/* Synchronize an entity's decay with its parenting cfs_rq.*/
-static inline u64 __synchronize_entity_decay(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	u64 decays = atomic64_read(&cfs_rq->decay_counter);
+	if (decayed)
+		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
 
-	decays -= se->avg.decay_count;
-	if (!decays)
-		return 0;
-
-	se->avg.load_avg_contrib = decay_load(se->avg.load_avg_contrib, decays);
-	se->avg.decay_count = 0;
-
-	return decays;
+	return decayed;
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
-						 int force_update)
-{
-	struct task_group *tg = cfs_rq->tg;
-	long tg_contrib;
-
-	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
-	tg_contrib -= cfs_rq->tg_load_contrib;
-
-	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
-		atomic_long_add(tg_contrib, &tg->load_avg);
-		cfs_rq->tg_load_contrib += tg_contrib;
-	}
-}
-
 /*
- * Aggregate cfs_rq runnable averages into an equivalent task_group
- * representation for computing load contributions.
+ * Updating tg's load_avg is only necessary before it is used in
+ * update_cfs_share (which is done) and effective_load (which is
+ * not done because it is too costly).
  */
-static inline void __update_tg_runnable_avg(struct sched_avg *sa,
-						  struct cfs_rq *cfs_rq)
-{
-	struct task_group *tg = cfs_rq->tg;
-	long contrib;
-
-	/* The fraction of a cpu used by this cfs_rq */
-	contrib = div_u64((u64)sa->runnable_avg_sum << NICE_0_SHIFT,
-			  sa->runnable_avg_period + 1);
-	contrib -= cfs_rq->tg_runnable_contrib;
-
-	if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) {
-		atomic_add(contrib, &tg->runnable_avg);
-		cfs_rq->tg_runnable_contrib += contrib;
-	}
-}
-
-static inline void __update_group_entity_contrib(struct sched_entity *se)
+static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
 {
-	struct cfs_rq *cfs_rq = group_cfs_rq(se);
-	struct task_group *tg = cfs_rq->tg;
-	int runnable_avg;
-
-	u64 contrib;
-
-	contrib = cfs_rq->tg_load_contrib * tg->shares;
-	se->avg.load_avg_contrib = div_u64(contrib,
-				     atomic_long_read(&tg->load_avg) + 1);
+	long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
 
-	/*
-	 * For group entities we need to compute a correction term in the case
-	 * that they are consuming <1 cpu so that we would contribute the same
-	 * load as a task of equal weight.
-	 *
-	 * Explicitly co-ordinating this measurement would be expensive, but
-	 * fortunately the sum of each cpus contribution forms a usable
-	 * lower-bound on the true value.
-	 *
-	 * Consider the aggregate of 2 contributions.  Either they are disjoint
-	 * (and the sum represents true value) or they are disjoint and we are
-	 * understating by the aggregate of their overlap.
-	 *
-	 * Extending this to N cpus, for a given overlap, the maximum amount we
-	 * understand is then n_i(n_i+1)/2 * w_i where n_i is the number of
-	 * cpus that overlap for this interval and w_i is the interval width.
-	 *
-	 * On a small machine; the first term is well-bounded which bounds the
-	 * total error since w_i is a subset of the period.  Whereas on a
-	 * larger machine, while this first term can be larger, if w_i is the
-	 * of consequential size guaranteed to see n_i*w_i quickly converge to
-	 * our upper bound of 1-cpu.
-	 */
-	runnable_avg = atomic_read(&tg->runnable_avg);
-	if (runnable_avg < NICE_0_LOAD) {
-		se->avg.load_avg_contrib *= runnable_avg;
-		se->avg.load_avg_contrib >>= NICE_0_SHIFT;
+	if (delta) {
+		atomic_long_add(delta, &cfs_rq->tg->load_avg);
+		cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
 	}
 }
 
 #else /* CONFIG_FAIR_GROUP_SCHED */
-static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
-						 int force_update) {}
-static inline void __update_tg_runnable_avg(struct sched_avg *sa,
-						  struct cfs_rq *cfs_rq) {}
-static inline void __update_group_entity_contrib(struct sched_entity *se) {}
+static inline void update_tg_load_avg(struct cfs_rq *cfs_rq) {}
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
-static inline void __update_task_entity_contrib(struct sched_entity *se)
-{
-	u32 contrib;
+static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 
-	/* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */
-	contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
-	contrib /= (se->avg.runnable_avg_period + 1);
-	se->avg.load_avg_contrib = scale_load(contrib);
-}
+#define subtract_until_zero(minuend, subtrahend)	\
+	(subtrahend < minuend ? minuend - subtrahend : 0)
 
-/* Compute the current contribution to load_avg by se, return any delta */
-static long __update_entity_load_avg_contrib(struct sched_entity *se)
+/*
+ * Group cfs_rq's load_avg is used for task_h_load and update_cfs_share
+ * calc.
+ */
+static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
-	long old_contrib = se->avg.load_avg_contrib;
+	int decayed;
 
-	if (entity_is_task(se)) {
-		__update_task_entity_contrib(se);
-	} else {
-		__update_tg_runnable_avg(&se->avg, group_cfs_rq(se));
-		__update_group_entity_contrib(se);
+	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
+		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
+		cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
+		r *= LOAD_AVG_MAX;
+		cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
 	}
 
-	return se->avg.load_avg_contrib - old_contrib;
-}
+	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
 
-static inline void subtract_blocked_load_contrib(struct cfs_rq *cfs_rq,
-						 long load_contrib)
-{
-	if (likely(load_contrib < cfs_rq->blocked_load_avg))
-		cfs_rq->blocked_load_avg -= load_contrib;
-	else
-		cfs_rq->blocked_load_avg = 0;
-}
+#ifndef CONFIG_64BIT
+	if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
+		smp_wmb();
+		cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
+	}
+#endif
 
-static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
+	return decayed;
+}
 
-/* Update a sched_entity's runnable average */
-static inline void update_entity_load_avg(struct sched_entity *se,
-					  int update_cfs_rq)
+/* Update task and its cfs_rq load average */
+static inline void update_load_avg(struct sched_entity *se, int update_tg)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	long contrib_delta;
-	u64 now;
+	u64 now = cfs_rq_clock_task(cfs_rq);
 
 	/*
-	 * For a group entity we need to use their owned cfs_rq_clock_task() in
-	 * case they are the parent of a throttled hierarchy.
+	 * Track task load average for carrying it to new CPU after migrated,
+	 * and group sched_entity for task_h_load calc in migration
 	 */
-	if (entity_is_task(se))
-		now = cfs_rq_clock_task(cfs_rq);
-	else
-		now = cfs_rq_clock_task(group_cfs_rq(se));
-
-	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq))
-		return;
-
-	contrib_delta = __update_entity_load_avg_contrib(se);
+	__update_load_avg(now, &se->avg, se->on_rq * se->load.weight);
 
-	if (!update_cfs_rq)
-		return;
-
-	if (se->on_rq)
-		cfs_rq->runnable_load_avg += contrib_delta;
-	else
-		subtract_blocked_load_contrib(cfs_rq, -contrib_delta);
+	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
+		update_tg_load_avg(cfs_rq);
 }
 
-/*
- * Decay the load contributed by all blocked children and account this so that
- * their contribution may appropriately discounted when they wake up.
- */
-static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
+/* Add the load generated by se into cfs_rq's load average */
+static inline void enqueue_entity_load_avg(struct sched_entity *se)
 {
-	u64 now = cfs_rq_clock_task(cfs_rq) >> 20;
-	u64 decays;
-
-	decays = now - cfs_rq->last_decay;
-	if (!decays && !force_update)
-		return;
+	struct sched_avg *sa = &se->avg;
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 now = cfs_rq_clock_task(cfs_rq);
+	int migrated = 0, decayed;
 
-	if (atomic_long_read(&cfs_rq->removed_load)) {
-		unsigned long removed_load;
-		removed_load = atomic_long_xchg(&cfs_rq->removed_load, 0);
-		subtract_blocked_load_contrib(cfs_rq, removed_load);
-	}
+	if (sa->last_update_time == 0) {
+		sa->last_update_time = now;
 
-	if (decays) {
-		cfs_rq->blocked_load_avg = decay_load(cfs_rq->blocked_load_avg,
-						      decays);
-		atomic64_add(decays, &cfs_rq->decay_counter);
-		cfs_rq->last_decay = now;
+		if (entity_is_task(se))
+			migrated = 1;
 	}
+	else
+		__update_load_avg(now, sa, se->on_rq * se->load.weight);
 
-	__update_cfs_rq_tg_load_contrib(cfs_rq, force_update);
-}
-
-/* Add the load generated by se into cfs_rq's child load-average */
-static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
-						  struct sched_entity *se,
-						  int wakeup)
-{
-	/*
-	 * We track migrations using entity decay_count <= 0, on a wake-up
-	 * migration we use a negative decay count to track the remote decays
-	 * accumulated while sleeping.
-	 *
-	 * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
-	 * are seen by enqueue_entity_load_avg() as a migration with an already
-	 * constructed load_avg_contrib.
-	 */
-	if (unlikely(se->avg.decay_count <= 0)) {
-		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
-		if (se->avg.decay_count) {
-			/*
-			 * In a wake-up migration we have to approximate the
-			 * time sleeping.  This is because we can't synchronize
-			 * clock_task between the two cpus, and it is not
-			 * guaranteed to be read-safe.  Instead, we can
-			 * approximate this using our carried decays, which are
-			 * explicitly atomically readable.
-			 */
-			se->avg.last_runnable_update -= (-se->avg.decay_count)
-							<< 20;
-			update_entity_load_avg(se, 0);
-			/* Indicate that we're now synchronized and on-rq */
-			se->avg.decay_count = 0;
-		}
-		wakeup = 0;
-	} else {
-		__synchronize_entity_decay(se);
-	}
+	decayed = update_cfs_rq_load_avg(now, cfs_rq);
 
-	/* migrated tasks did not contribute to our blocked load */
-	if (wakeup) {
-		subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
-		update_entity_load_avg(se, 0);
+	if (migrated) {
+		cfs_rq->avg.load_avg += sa->load_avg;
+		cfs_rq->avg.load_sum += sa->load_sum;
 	}
 
-	cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
-	/* we force update consideration on load-balancer moves */
-	update_cfs_rq_blocked_load(cfs_rq, !wakeup);
-}
-
-/*
- * Remove se's load from this cfs_rq child load-average, if the entity is
- * transitioning to a blocked state we track its projected decay using
- * blocked_load_avg.
- */
-static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
-						  struct sched_entity *se,
-						  int sleep)
-{
-	update_entity_load_avg(se, 1);
-	/* we force update consideration on load-balancer moves */
-	update_cfs_rq_blocked_load(cfs_rq, !sleep);
-
-	cfs_rq->runnable_load_avg -= se->avg.load_avg_contrib;
-	if (sleep) {
-		cfs_rq->blocked_load_avg += se->avg.load_avg_contrib;
-		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
-	} /* migrations, e.g. sleep=0 leave decay_count == 0 */
+	if (decayed || migrated)
+		update_tg_load_avg(cfs_rq);
 }
 
 /*
@@ -2623,16 +2462,8 @@ static int idle_balance(struct rq *this_rq);
 
 #else /* CONFIG_SMP */
 
-static inline void update_entity_load_avg(struct sched_entity *se,
-					  int update_cfs_rq) {}
-static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
-					   struct sched_entity *se,
-					   int wakeup) {}
-static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
-					   struct sched_entity *se,
-					   int sleep) {}
-static inline void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq,
-					      int force_update) {}
+static inline void update_load_avg(struct sched_entity *se, int update_tg) {}
+static inline void enqueue_entity_load_avg(struct sched_entity *se) {}
 
 static inline int idle_balance(struct rq *rq)
 {
@@ -2764,7 +2595,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
-	enqueue_entity_load_avg(cfs_rq, se, flags & ENQUEUE_WAKEUP);
+	enqueue_entity_load_avg(se);
 	account_entity_enqueue(cfs_rq, se);
 	update_cfs_shares(cfs_rq);
 
@@ -2839,7 +2670,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
-	dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);
+
+	update_load_avg(se, 1);
 
 	update_stats_dequeue(cfs_rq, se);
 	if (flags & DEQUEUE_SLEEP) {
@@ -3028,7 +2860,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		/* Put 'current' back into the tree. */
 		__enqueue_entity(cfs_rq, prev);
 		/* in !on_rq case, update occurred at dequeue */
-		update_entity_load_avg(prev, 1);
+		update_load_avg(prev, 0);
 	}
 	cfs_rq->curr = NULL;
 }
@@ -3044,8 +2876,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 	/*
 	 * Ensure that runnable average is periodically updated.
 	 */
-	update_entity_load_avg(curr, 1);
-	update_cfs_rq_blocked_load(cfs_rq, 1);
+	update_load_avg(curr, 1);
 	update_cfs_shares(cfs_rq);
 
 #ifdef CONFIG_SCHED_HRTICK
@@ -3923,8 +3754,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 
+		update_load_avg(se, 1);
 		update_cfs_shares(cfs_rq);
-		update_entity_load_avg(se, 1);
 	}
 
 	if (!se)
@@ -3983,8 +3814,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 
+		update_load_avg(se, 1);
 		update_cfs_shares(cfs_rq);
-		update_entity_load_avg(se, 1);
 	}
 
 	if (!se)
@@ -3997,7 +3828,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->cfs.runnable_load_avg;
+	return cpu_rq(cpu)->cfs.avg.load_avg;
 }
 
 /*
@@ -4042,7 +3873,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
-	unsigned long load_avg = rq->cfs.runnable_load_avg;
+	unsigned long load_avg = rq->cfs.avg.load_avg;
 
 	if (nr_running)
 		return load_avg / nr_running;
@@ -4161,7 +3992,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 		/*
 		 * w = rw_i + @wl
 		 */
-		w = se->my_q->load.weight + wl;
+		w = se->my_q->avg.load_avg + wl;
 
 		/*
 		 * wl = S * s'_i; see (2)
@@ -4182,7 +4013,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 		/*
 		 * wl = dw_i = S * (s'_i - s_i); see (3)
 		 */
-		wl -= se->load.weight;
+		wl -= se->avg.load_avg;
 
 		/*
 		 * Recursively apply this logic to all parent groups to compute
@@ -4256,14 +4087,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	 */
 	if (sync) {
 		tg = task_group(current);
-		weight = current->se.load.weight;
+		weight = current->se.avg.load_avg;
 
 		this_load += effective_load(tg, this_cpu, -weight, -weight);
 		load += effective_load(tg, prev_cpu, 0, -weight);
 	}
 
 	tg = task_group(p);
-	weight = p->se.load.weight;
+	weight = p->se.avg.load_avg;
 
 	/*
 	 * In low-load situations, where prev_cpu is idle and this_cpu is idle
@@ -4551,18 +4382,34 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 last_update_time;
 
 	/*
-	 * Load tracking: accumulate removed load so that it can be processed
-	 * when we next update owning cfs_rq under rq->lock.  Tasks contribute
-	 * to blocked load iff they have a positive decay-count.  It can never
-	 * be negative here since on-rq tasks have decay-count == 0.
+	 * Task on old CPU catches up with its old cfs_rq, and subtract itself from
+	 * the cfs_rq (task must be off the queue now).
 	 */
-	if (se->avg.decay_count) {
-		se->avg.decay_count = -__synchronize_entity_decay(se);
-		atomic_long_add(se->avg.load_avg_contrib,
-						&cfs_rq->removed_load);
-	}
+#ifndef CONFIG_64BIT
+	u64 last_update_time_copy;
+
+	do {
+		last_update_time_copy = cfs_rq->load_last_update_time_copy;
+		smp_rmb();
+		last_update_time = cfs_rq->avg.last_update_time;
+	} while (last_update_time != last_update_time_copy);
+#else
+	last_update_time = cfs_rq->avg.last_update_time;
+#endif
+	__update_load_avg(last_update_time, &se->avg, 0);
+	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
+
+	/*
+	 * We are supposed to update the task to "current" time, so that it is up to
+	 * date and ready to go to the new CPU/cfs_rq. But we have difficulty in
+	 * getting what the current time is, so simply throw away the out-of-date
+	 * time. This will result in the wakee task being less decayed, but giving
+	 * the wakee more load does not sound bad.
+	 */
+	se->avg.last_update_time = 0;
 
 	/* We have migrated, no longer consider this task hot */
 	se->exec_start = 0;
@@ -5399,36 +5246,6 @@ next:
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-/*
- * update tg->load_weight by folding this cpu's load_avg
- */
-static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
-{
-	struct sched_entity *se = tg->se[cpu];
-	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
-
-	/* throttled entities do not contribute to load */
-	if (throttled_hierarchy(cfs_rq))
-		return;
-
-	update_cfs_rq_blocked_load(cfs_rq, 1);
-
-	if (se) {
-		update_entity_load_avg(se, 1);
-		/*
-		 * We pivot on our runnable average having decayed to zero for
-		 * list removal.  This generally implies that all our children
-		 * have also been removed (modulo rounding error or bandwidth
-		 * control); however, such cases are rare and we can fix these
-		 * at enqueue.
-		 *
-		 * TODO: fix up out-of-order children on enqueue.
-		 */
-		if (!se->avg.runnable_avg_sum && !cfs_rq->nr_running)
-			list_del_leaf_cfs_rq(cfs_rq);
-	}
-}
-
 static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -5437,17 +5254,17 @@ static void update_blocked_averages(int cpu)
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
 	update_rq_clock(rq);
+
 	/*
 	 * Iterates the task_group tree in a bottom up fashion, see
 	 * list_add_leaf_cfs_rq() for details.
 	 */
 	for_each_leaf_cfs_rq(rq, cfs_rq) {
-		/*
-		 * Note: We may want to consider periodically releasing
-		 * rq->lock about these updates so that creating many task
-		 * groups does not result in continually extending hold time.
-		 */
-		__update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
+		/* throttled entities do not contribute to load */
+		if (throttled_hierarchy(cfs_rq))
+			continue;
+
+		update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 	}
 
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
@@ -5477,14 +5294,14 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 	}
 
 	if (!se) {
-		cfs_rq->h_load = cfs_rq->runnable_load_avg;
+		cfs_rq->h_load = cfs_rq->avg.load_avg;
 		cfs_rq->last_h_load_update = now;
 	}
 
 	while ((se = cfs_rq->h_load_next) != NULL) {
 		load = cfs_rq->h_load;
-		load = div64_ul(load * se->avg.load_avg_contrib,
-				cfs_rq->runnable_load_avg + 1);
+		load = div64_ul(load * se->avg.load_avg,
+				cfs_rq->avg.load_avg + 1);
 		cfs_rq = group_cfs_rq(se);
 		cfs_rq->h_load = load;
 		cfs_rq->last_h_load_update = now;
@@ -5496,8 +5313,8 @@ static unsigned long task_h_load(struct task_struct *p)
 	struct cfs_rq *cfs_rq = task_cfs_rq(p);
 
 	update_cfs_rq_h_load(cfs_rq);
-	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
-			cfs_rq->runnable_load_avg + 1);
+	return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
+			cfs_rq->avg.load_avg + 1);
 }
 #else
 static inline void update_blocked_averages(int cpu)
@@ -5506,7 +5323,7 @@ static inline void update_blocked_averages(int cpu)
 
 static unsigned long task_h_load(struct task_struct *p)
 {
-	return p->se.avg.load_avg_contrib;
+	return p->se.avg.load_avg;
 }
 #endif
 
@@ -7437,14 +7254,14 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 #ifdef CONFIG_SMP
 	/*
-	* Remove our load from contribution when we leave sched_fair
-	* and ensure we don't carry in an old decay_count if we
-	* switch back.
+	* Remove our load from contribution when we leave cfs_rq.
 	*/
-	if (se->avg.decay_count) {
-		__synchronize_entity_decay(se);
-		subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
-	}
+	__update_load_avg(cfs_rq->avg.last_update_time, &se->avg,
+		se->on_rq * se->load.weight);
+	cfs_rq->avg.load_avg =
+		subtract_until_zero(cfs_rq->avg.load_avg, se->avg.load_avg);
+	cfs_rq->avg.load_sum =
+		subtract_until_zero(cfs_rq->avg.load_sum, se->avg.load_sum);
 #endif
 }
 
@@ -7501,8 +7318,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
 #ifdef CONFIG_SMP
-	atomic64_set(&cfs_rq->decay_counter, 1);
-	atomic_long_set(&cfs_rq->removed_load, 0);
+	atomic_long_set(&cfs_rq->removed_load_avg, 0);
 #endif
 }
 
@@ -7547,14 +7363,12 @@ static void task_move_group_fair(struct task_struct *p, int on_rq)
 	if (!on_rq) {
 		cfs_rq = cfs_rq_of(se);
 		se->vruntime += cfs_rq->min_vruntime;
+
 #ifdef CONFIG_SMP
-		/*
-		 * migrate_task_rq_fair() will have removed our previous
-		 * contribution, but we must synchronize for ongoing future
-		 * decay.
-		 */
-		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
-		cfs_rq->blocked_load_avg += se->avg.load_avg_contrib;
+		/* Virtually synchronize task with its new cfs_rq */
+		p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
+		cfs_rq->avg.load_avg += p->se.avg.load_avg;
+		cfs_rq->avg.load_sum += p->se.avg.load_sum;
 #endif
 	}
 }
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index 16f5a30..8f547fe 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -504,7 +504,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 #ifdef CONFIG_SMP
 static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
-	return rq->cfs.runnable_load_avg;
+	return rq->cfs.avg.load_avg;
 }
 #else
 static inline unsigned long get_rq_runnable_load(struct rq *rq)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a147571..7c8c2a9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -210,7 +210,6 @@ struct task_group {
 
 #ifdef	CONFIG_SMP
 	atomic_long_t load_avg;
-	atomic_t runnable_avg;
 #endif
 #endif
 
@@ -331,21 +330,16 @@ struct cfs_rq {
 
 #ifdef CONFIG_SMP
 	/*
-	 * CFS Load tracking
-	 * Under CFS, load is tracked on a per-entity basis and aggregated up.
-	 * This allows for the description of both thread and group usage (in
-	 * the FAIR_GROUP_SCHED case).
+	 * CFS load tracking
 	 */
-	unsigned long runnable_load_avg, blocked_load_avg;
-	atomic64_t decay_counter;
-	u64 last_decay;
-	atomic_long_t removed_load;
+	struct sched_avg avg;
+	unsigned long tg_load_avg_contrib;
+	atomic_long_t removed_load_avg;
+#ifndef CONFIG_64BIT
+	u64 load_last_update_time_copy;
+#endif
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	/* Required to track per-cpu representation of a task_group */
-	u32 tg_runnable_contrib;
-	unsigned long tg_load_contrib;
-
 	/*
 	 *   h_load = weight * f(tg)
 	 *
-- 
1.7.9.5


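The 32-bit last_update_time handling in migrate_task_rq_fair() above mirrors the
min_vruntime/min_vruntime_copy scheme visible in init_cfs_rq(): the writer stores the
value, issues a write barrier, then stores the copy; a lockless reader loads the copy,
issues a read barrier, loads the value, and retries until the two agree. Below is a
schematic user-space rendering of that idea (illustrative names, C11 fences standing in
for smp_wmb()/smp_rmb(); real code would also want READ_ONCE()/WRITE_ONCE() or atomics),
not the patch itself:

#include <stdatomic.h>
#include <stdint.h>

struct avg_sketch {
	uint64_t last_update_time;	/* written with the queue lock held */
	uint64_t last_update_time_copy;	/* consulted by lockless readers */
};

static void publish_time(struct avg_sketch *a, uint64_t now)
{
	a->last_update_time = now;
	atomic_thread_fence(memory_order_release);	/* plays the role of smp_wmb() */
	a->last_update_time_copy = now;
}

static uint64_t read_time(struct avg_sketch *a)
{
	uint64_t t, copy;

	do {
		copy = a->last_update_time_copy;
		atomic_thread_fence(memory_order_acquire);	/* plays the role of smp_rmb() */
		t = a->last_update_time;
	} while (t != copy);		/* torn or racing read: try again */

	return t;
}

On 64-bit kernels the copy and the retry loop are unnecessary, which is why both sides
of the pairing are under #ifndef CONFIG_64BIT.
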
^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
@ 2014-07-18  9:43   ` Vincent Guittot
  2014-07-27 17:36     ` Yuyang Du
  2014-07-28 10:48   ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Peter Zijlstra
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-07-18  9:43 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On 18 July 2014 01:26, Yuyang Du <yuyang.du@intel.com> wrote:
> The idea of per entity runnable load average (letting runnable time contribute to
> load weight) was proposed by Paul Turner, and it is still followed by this rewrite.
> The rewrite is done to address the following issues:
>
> 1. cfs_rq's load average (namely runnable_load_avg and blocked_load_avg) is updated
>    at the granularity of one entity at a time, which results in the cfs_rq load
>    average being partially updated or asynchronous across its entities: at any time,
>    only one entity is up to date and contributing to the cfs_rq; all other entities
>    are effectively lagging behind.
>
> 2. cfs_rq load average differs between the top rq->cfs_rq and other task_groups'
>    per-CPU cfs_rqs in whether or not blocked_load_avg contributes to the load.
>
> 3. How task_group's load is calculated is complex.
>
> This rewrite tackles these by:
>
> 1. Combine the runnable and blocked load averages of a cfs_rq, and track the cfs_rq's
>    load average as a whole so that it is used as such.
>
> 2. Track the task entity's load average for carrying it between CPUs in migration;
>    group cfs_rq and group entity load averages are tracked for the update_cfs_shares
>    and task_h_load calculations. A task_group's load_avg is aggregated from its per
>    CPU cfs_rqs' load_avg, each of which is aggregated from its sched_entities (both
>    task and group entities). A group entity's weight is proportional to its own
>    cfs_rq's load_avg / task_group's load_avg.
>
> 3. All tasks, cfs_rqs/group entities, and task_groups have a simple, consistent,
>    up-to-date, and synchronized load_avg.
>
> This rewrite is in principle equivalent to the previous code in functionality, but
> significantly reduces code complexity and hence increases efficiency and clarity.
> In addition, the new load_avg is much smoother/more continuous (no abrupt jumps up
> and down) and is decayed/updated more quickly and synchronously to reflect the load
> dynamics. As a result, we have less load tracking overhead and better performance.
>
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  include/linux/sched.h |   21 +-
>  kernel/sched/debug.c  |   22 +-
>  kernel/sched/fair.c   |  542 ++++++++++++++++---------------------------------
>  kernel/sched/proc.c   |    2 +-
>  kernel/sched/sched.h  |   20 +-
>  5 files changed, 203 insertions(+), 404 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 306f4f0..c981f26 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1067,16 +1067,21 @@ struct load_weight {
>         u32 inv_weight;
>  };
>
> +/*
> + * The load_avg represents an infinite geometric series. The 64 bit
> + * load_sum can:
> + * 1) for a cfs_rq, accommodate 4353082796 (=2^64/47742/88761) entities with
> + * the highest weight (=88761) always runnable without overflowing
> + * 2) for an entity, support any load.weight always runnable
> + */
>  struct sched_avg {
>         /*
> -        * These sums represent an infinite geometric series and so are bound
> -        * above by 1024/(1-y).  Thus we only need a u32 to store them for all
> -        * choices of y < 1-2^(-32)*1024.
> +        * The load_avg represents an infinite geometric series.
>          */
> -       u32 runnable_avg_sum, runnable_avg_period;
> -       u64 last_runnable_update;
> -       s64 decay_count;
> -       unsigned long load_avg_contrib;
> +       u64 last_update_time;
> +       u64 load_sum;
> +       unsigned long load_avg;
> +       u32 period_contrib;
>  };
>
>  #ifdef CONFIG_SCHEDSTATS
> @@ -1142,7 +1147,7 @@ struct sched_entity {
>  #endif
>
>  #ifdef CONFIG_SMP
> -       /* Per-entity load-tracking */
> +       /* Per entity load average tracking */
>         struct sched_avg        avg;
>  #endif
>  };
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 4b864c7..34a3f26 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -85,10 +85,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
>  #endif
>         P(se->load.weight);
>  #ifdef CONFIG_SMP
> -       P(se->avg.runnable_avg_sum);
> -       P(se->avg.runnable_avg_period);
> -       P(se->avg.load_avg_contrib);
> -       P(se->avg.decay_count);
> +       P(se->my_q->avg.load_avg);
>  #endif
>  #undef PN
>  #undef P
> @@ -205,19 +202,11 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
>         SEQ_printf(m, "  .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
>         SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
>  #ifdef CONFIG_SMP
> -       SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg",
> -                       cfs_rq->runnable_load_avg);
> -       SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
> -                       cfs_rq->blocked_load_avg);
> +       SEQ_printf(m, "  .%-30s: %lu\n", "load_avg",
> +                       cfs_rq->avg.load_avg);
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -       SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib",
> -                       cfs_rq->tg_load_contrib);
> -       SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
> -                       cfs_rq->tg_runnable_contrib);
>         SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
>                         atomic_long_read(&cfs_rq->tg->load_avg));
> -       SEQ_printf(m, "  .%-30s: %d\n", "tg->runnable_avg",
> -                       atomic_read(&cfs_rq->tg->runnable_avg));
>  #endif
>  #endif
>  #ifdef CONFIG_CFS_BANDWIDTH
> @@ -624,10 +613,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
>
>         P(se.load.weight);
>  #ifdef CONFIG_SMP
> -       P(se.avg.runnable_avg_sum);
> -       P(se.avg.runnable_avg_period);
> -       P(se.avg.load_avg_contrib);
> -       P(se.avg.decay_count);
> +       P(se.avg.load_avg);
>  #endif
>         P(policy);
>         P(prio);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1a2d04f..3055b9b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -282,9 +282,6 @@ static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
>         return grp->my_q;
>  }
>
> -static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq,
> -                                      int force_update);
> -
>  static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>  {
>         if (!cfs_rq->on_list) {
> @@ -304,8 +301,6 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>                 }
>
>                 cfs_rq->on_list = 1;
> -               /* We should have no load, but we need to update last_decay. */
> -               update_cfs_rq_blocked_load(cfs_rq, 0);
>         }
>  }
>
> @@ -665,20 +660,27 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  }
>
>  #ifdef CONFIG_SMP
> -static unsigned long task_h_load(struct task_struct *p);
>
> -static inline void __update_task_entity_contrib(struct sched_entity *se);
> +/* dependent on LOAD_AVG_PERIOD, see below */
> +#define LOAD_AVG_MAX 47742 /* maximum possible load avg */
> +
> +static unsigned long task_h_load(struct task_struct *p);
>
>  /* Give new task start runnable values to heavy its load in infant time */
>  void init_task_runnable_average(struct task_struct *p)
>  {
> -       u32 slice;
> +       struct sched_avg *sa = &p->se.avg;
>
> -       p->se.avg.decay_count = 0;
> -       slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
> -       p->se.avg.runnable_avg_sum = slice;
> -       p->se.avg.runnable_avg_period = slice;
> -       __update_task_entity_contrib(&p->se);
> +       sa->last_update_time = 0;
> +       /*
> +        * sched_avg's period_contrib should be strictly less than 1024, so
> +        * we give it 1023 to make sure it is almost a full period (1024us),
> +        * and will definitely be updated (after enqueue).
> +        */
> +       sa->period_contrib = 1023;
> +       sa->load_avg = p->se.load.weight;
> +       sa->load_sum = p->se.load.weight * LOAD_AVG_MAX;
> +       /* when this task is enqueued, it will contribute to its cfs_rq's load_avg */
>  }
>  #else
>  void init_task_runnable_average(struct task_struct *p)
> @@ -1504,8 +1506,8 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
>                 delta = runtime - p->last_sum_exec_runtime;
>                 *period = now - p->last_task_numa_placement;
>         } else {
> -               delta = p->se.avg.runnable_avg_sum;
> -               *period = p->se.avg.runnable_avg_period;
> +               delta = p->se.avg.load_avg / p->se.load.weight;
> +               *period = LOAD_AVG_MAX;
>         }
>
>         p->last_sum_exec_runtime = runtime;
> @@ -2071,13 +2073,9 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>         long tg_weight;
>
>         /*
> -        * Use this CPU's actual weight instead of the last load_contribution
> -        * to gain a more accurate current total weight. See
> -        * update_cfs_rq_load_contribution().
> +        * Use this CPU's load average instead of actual weight
>          */
>         tg_weight = atomic_long_read(&tg->load_avg);
> -       tg_weight -= cfs_rq->tg_load_contrib;
> -       tg_weight += cfs_rq->load.weight;
>
>         return tg_weight;
>  }
> @@ -2087,7 +2085,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>         long tg_weight, load, shares;
>
>         tg_weight = calc_tg_weight(tg, cfs_rq);
> -       load = cfs_rq->load.weight;
> +       load = cfs_rq->avg.load_avg;
>
>         shares = (tg->shares * load);
>         if (tg_weight)
> @@ -2154,7 +2152,6 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
>   * Note: The tables below are dependent on this value.
>   */
>  #define LOAD_AVG_PERIOD 32
> -#define LOAD_AVG_MAX 47742 /* maximum possible load avg */
>  #define LOAD_AVG_MAX_N 345 /* number of full periods to produce LOAD_MAX_AVG */
>
>  /* Precomputed fixed inverse multiplies for multiplication by y^n */
> @@ -2181,7 +2178,7 @@ static const u32 runnable_avg_yN_sum[] = {
>   * Approximate:
>   *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
>   */
> -static __always_inline u64 decay_load(u64 val, u64 n)
> +static __always_inline u64 decay_load32(u64 val, u64 n)
>  {
>         unsigned int local_n;
>
> @@ -2210,6 +2207,18 @@ static __always_inline u64 decay_load(u64 val, u64 n)
>         return val >> 32;
>  }
>
> +static __always_inline u64 decay_load(u64 val, u64 n)
> +{
> +       if (likely(val <= UINT_MAX))
> +               val = decay_load32(val, n);
> +       else {
> +               val *= (u32)decay_load32(1 << 15, n);
> +               val >>= 15;
> +       }
> +
> +       return val;
> +}
> +
>  /*
>   * For updates fully spanning n periods, the contribution to runnable
>   * average will be: \Sum 1024*y^n
> @@ -2234,7 +2243,7 @@ static u32 __compute_runnable_contrib(u64 n)
>                 n -= LOAD_AVG_PERIOD;
>         } while (n > LOAD_AVG_PERIOD);
>
> -       contrib = decay_load(contrib, n);
> +       contrib = decay_load32(contrib, n);
>         return contrib + runnable_avg_yN_sum[n];
>  }
>
> @@ -2266,21 +2275,20 @@ static u32 __compute_runnable_contrib(u64 n)
>   *   load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
>   *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
>   */
> -static __always_inline int __update_entity_runnable_avg(u64 now,
> -                                                       struct sched_avg *sa,
> -                                                       int runnable)
> +static __always_inline int
> +__update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
>  {
>         u64 delta, periods;
> -       u32 runnable_contrib;
> +       u32 contrib;
>         int delta_w, decayed = 0;
>
> -       delta = now - sa->last_runnable_update;
> +       delta = now - sa->last_update_time;
>         /*
>          * This should only happen when time goes backwards, which it
>          * unfortunately does during sched clock init when we swap over to TSC.
>          */
>         if ((s64)delta < 0) {
> -               sa->last_runnable_update = now;
> +               sa->last_update_time = now;
>                 return 0;
>         }
>
> @@ -2291,23 +2299,24 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
>         delta >>= 10;
>         if (!delta)
>                 return 0;
> -       sa->last_runnable_update = now;
> +       sa->last_update_time = now;
>
>         /* delta_w is the amount already accumulated against our next period */
> -       delta_w = sa->runnable_avg_period % 1024;
> +       delta_w = sa->period_contrib;
>         if (delta + delta_w >= 1024) {
> -               /* period roll-over */
>                 decayed = 1;
>
> +               /* we don't yet know how much will be left for the next period, so start over */
> +               sa->period_contrib = 0;
> +
>                 /*
>                  * Now that we know we're crossing a period boundary, figure
>                  * out how much from delta we need to complete the current
>                  * period and accrue it.
>                  */
>                 delta_w = 1024 - delta_w;
> -               if (runnable)
> -                       sa->runnable_avg_sum += delta_w;
> -               sa->runnable_avg_period += delta_w;
> +               if (w)
> +                       sa->load_sum += w * delta_w;

Do you really need to have *w for computing the load_sum ? can't you
only use it when computing the load_avg ?

sa->load_avg = div_u64(sa->load_sum * w , LOAD_AVG_MAX)

>
>                 delta -= delta_w;
>
> @@ -2315,290 +2324,120 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
>                 periods = delta / 1024;
>                 delta %= 1024;
>
> -               sa->runnable_avg_sum = decay_load(sa->runnable_avg_sum,
> -                                                 periods + 1);
> -               sa->runnable_avg_period = decay_load(sa->runnable_avg_period,
> -                                                    periods + 1);
> +               sa->load_sum = decay_load(sa->load_sum, periods + 1);
>
>                 /* Efficiently calculate \sum (1..n_period) 1024*y^i */
> -               runnable_contrib = __compute_runnable_contrib(periods);
> -               if (runnable)
> -                       sa->runnable_avg_sum += runnable_contrib;
> -               sa->runnable_avg_period += runnable_contrib;
> +               contrib = __compute_runnable_contrib(periods);
> +               if (w)
> +                       sa->load_sum += w * contrib;
>         }
>
>         /* Remainder of delta accrued against u_0` */
> -       if (runnable)
> -               sa->runnable_avg_sum += delta;
> -       sa->runnable_avg_period += delta;
> +       if (w)
> +               sa->load_sum += w * delta;
>
> -       return decayed;
> -}
> +       sa->period_contrib += delta;
>
> -/* Synchronize an entity's decay with its parenting cfs_rq.*/
> -static inline u64 __synchronize_entity_decay(struct sched_entity *se)
> -{
> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> -       u64 decays = atomic64_read(&cfs_rq->decay_counter);
> +       if (decayed)
> +               sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>
> -       decays -= se->avg.decay_count;
> -       if (!decays)
> -               return 0;
> -
> -       se->avg.load_avg_contrib = decay_load(se->avg.load_avg_contrib, decays);
> -       se->avg.decay_count = 0;
> -
> -       return decays;
> +       return decayed;
>  }
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> -                                                int force_update)
> -{
> -       struct task_group *tg = cfs_rq->tg;
> -       long tg_contrib;
> -
> -       tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> -       tg_contrib -= cfs_rq->tg_load_contrib;
> -
> -       if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> -               atomic_long_add(tg_contrib, &tg->load_avg);
> -               cfs_rq->tg_load_contrib += tg_contrib;
> -       }
> -}
> -
>  /*
> - * Aggregate cfs_rq runnable averages into an equivalent task_group
> - * representation for computing load contributions.
> + * Updating tg's load_avg is only necessary before it is used in
> + * update_cfs_share (which is done) and effective_load (which is
> + * not done because it is too costly).
>   */
> -static inline void __update_tg_runnable_avg(struct sched_avg *sa,
> -                                                 struct cfs_rq *cfs_rq)
> -{
> -       struct task_group *tg = cfs_rq->tg;
> -       long contrib;
> -
> -       /* The fraction of a cpu used by this cfs_rq */
> -       contrib = div_u64((u64)sa->runnable_avg_sum << NICE_0_SHIFT,
> -                         sa->runnable_avg_period + 1);
> -       contrib -= cfs_rq->tg_runnable_contrib;
> -
> -       if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) {
> -               atomic_add(contrib, &tg->runnable_avg);
> -               cfs_rq->tg_runnable_contrib += contrib;
> -       }
> -}
> -
> -static inline void __update_group_entity_contrib(struct sched_entity *se)
> +static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
>  {
> -       struct cfs_rq *cfs_rq = group_cfs_rq(se);
> -       struct task_group *tg = cfs_rq->tg;
> -       int runnable_avg;
> -
> -       u64 contrib;
> -
> -       contrib = cfs_rq->tg_load_contrib * tg->shares;
> -       se->avg.load_avg_contrib = div_u64(contrib,
> -                                    atomic_long_read(&tg->load_avg) + 1);
> +       long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
>
> -       /*
> -        * For group entities we need to compute a correction term in the case
> -        * that they are consuming <1 cpu so that we would contribute the same
> -        * load as a task of equal weight.
> -        *
> -        * Explicitly co-ordinating this measurement would be expensive, but
> -        * fortunately the sum of each cpus contribution forms a usable
> -        * lower-bound on the true value.
> -        *
> -        * Consider the aggregate of 2 contributions.  Either they are disjoint
> -        * (and the sum represents true value) or they are disjoint and we are
> -        * understating by the aggregate of their overlap.
> -        *
> -        * Extending this to N cpus, for a given overlap, the maximum amount we
> -        * understand is then n_i(n_i+1)/2 * w_i where n_i is the number of
> -        * cpus that overlap for this interval and w_i is the interval width.
> -        *
> -        * On a small machine; the first term is well-bounded which bounds the
> -        * total error since w_i is a subset of the period.  Whereas on a
> -        * larger machine, while this first term can be larger, if w_i is the
> -        * of consequential size guaranteed to see n_i*w_i quickly converge to
> -        * our upper bound of 1-cpu.
> -        */
> -       runnable_avg = atomic_read(&tg->runnable_avg);
> -       if (runnable_avg < NICE_0_LOAD) {
> -               se->avg.load_avg_contrib *= runnable_avg;
> -               se->avg.load_avg_contrib >>= NICE_0_SHIFT;
> +       if (delta) {
> +               atomic_long_add(delta, &cfs_rq->tg->load_avg);
> +               cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
>         }
>  }
>
>  #else /* CONFIG_FAIR_GROUP_SCHED */
> -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> -                                                int force_update) {}
> -static inline void __update_tg_runnable_avg(struct sched_avg *sa,
> -                                                 struct cfs_rq *cfs_rq) {}
> -static inline void __update_group_entity_contrib(struct sched_entity *se) {}
> +static inline void update_tg_load_avg(struct cfs_rq *cfs_rq) {}
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>
> -static inline void __update_task_entity_contrib(struct sched_entity *se)
> -{
> -       u32 contrib;
> +static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
>
> -       /* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */
> -       contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
> -       contrib /= (se->avg.runnable_avg_period + 1);
> -       se->avg.load_avg_contrib = scale_load(contrib);
> -}
> +#define subtract_until_zero(minuend, subtrahend)       \
> +       (subtrahend < minuend ? minuend - subtrahend : 0)
>
> -/* Compute the current contribution to load_avg by se, return any delta */
> -static long __update_entity_load_avg_contrib(struct sched_entity *se)
> +/*
> + * Group cfs_rq's load_avg is used for task_h_load and update_cfs_share
> + * calc.
> + */
> +static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
> -       long old_contrib = se->avg.load_avg_contrib;
> +       int decayed;
>
> -       if (entity_is_task(se)) {
> -               __update_task_entity_contrib(se);
> -       } else {
> -               __update_tg_runnable_avg(&se->avg, group_cfs_rq(se));
> -               __update_group_entity_contrib(se);
> +       if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> +               long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> +               cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
> +               r *= LOAD_AVG_MAX;
> +               cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
>         }
>
> -       return se->avg.load_avg_contrib - old_contrib;
> -}
> +       decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>
> -static inline void subtract_blocked_load_contrib(struct cfs_rq *cfs_rq,
> -                                                long load_contrib)
> -{
> -       if (likely(load_contrib < cfs_rq->blocked_load_avg))
> -               cfs_rq->blocked_load_avg -= load_contrib;
> -       else
> -               cfs_rq->blocked_load_avg = 0;
> -}
> +#ifndef CONFIG_64BIT
> +       if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
> +               smp_wmb();
> +               cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
> +       }
> +#endif
>
> -static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> +       return decayed;
> +}
>
> -/* Update a sched_entity's runnable average */
> -static inline void update_entity_load_avg(struct sched_entity *se,
> -                                         int update_cfs_rq)
> +/* Update task and its cfs_rq load average */
> +static inline void update_load_avg(struct sched_entity *se, int update_tg)
>  {
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
> -       long contrib_delta;
> -       u64 now;
> +       u64 now = cfs_rq_clock_task(cfs_rq);
>
>         /*
> -        * For a group entity we need to use their owned cfs_rq_clock_task() in
> -        * case they are the parent of a throttled hierarchy.
> +        * Track the task's load average for carrying it to its new CPU after
> +        * migration, and the group sched_entity's for the task_h_load calc
> +        * used in migration
>          */
> -       if (entity_is_task(se))
> -               now = cfs_rq_clock_task(cfs_rq);
> -       else
> -               now = cfs_rq_clock_task(group_cfs_rq(se));
> -
> -       if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq))
> -               return;
> -
> -       contrib_delta = __update_entity_load_avg_contrib(se);
> +       __update_load_avg(now, &se->avg, se->on_rq * se->load.weight);
>
> -       if (!update_cfs_rq)
> -               return;
> -
> -       if (se->on_rq)
> -               cfs_rq->runnable_load_avg += contrib_delta;
> -       else
> -               subtract_blocked_load_contrib(cfs_rq, -contrib_delta);
> +       if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
> +               update_tg_load_avg(cfs_rq);
>  }
>
> -/*
> - * Decay the load contributed by all blocked children and account this so that
> - * their contribution may appropriately discounted when they wake up.
> - */
> -static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
> +/* Add the load generated by se into cfs_rq's load average */
> +static inline void enqueue_entity_load_avg(struct sched_entity *se)
>  {
> -       u64 now = cfs_rq_clock_task(cfs_rq) >> 20;
> -       u64 decays;
> -
> -       decays = now - cfs_rq->last_decay;
> -       if (!decays && !force_update)
> -               return;
> +       struct sched_avg *sa = &se->avg;
> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +       u64 now = cfs_rq_clock_task(cfs_rq);
> +       int migrated = 0, decayed;
>
> -       if (atomic_long_read(&cfs_rq->removed_load)) {
> -               unsigned long removed_load;
> -               removed_load = atomic_long_xchg(&cfs_rq->removed_load, 0);
> -               subtract_blocked_load_contrib(cfs_rq, removed_load);
> -       }
> +       if (sa->last_update_time == 0) {
> +               sa->last_update_time = now;
>
> -       if (decays) {
> -               cfs_rq->blocked_load_avg = decay_load(cfs_rq->blocked_load_avg,
> -                                                     decays);
> -               atomic64_add(decays, &cfs_rq->decay_counter);
> -               cfs_rq->last_decay = now;
> +               if (entity_is_task(se))
> +                       migrated = 1;
>         }
> +       else
> +               __update_load_avg(now, sa, se->on_rq * se->load.weight);
>
> -       __update_cfs_rq_tg_load_contrib(cfs_rq, force_update);
> -}
> -
> -/* Add the load generated by se into cfs_rq's child load-average */
> -static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> -                                                 struct sched_entity *se,
> -                                                 int wakeup)
> -{
> -       /*
> -        * We track migrations using entity decay_count <= 0, on a wake-up
> -        * migration we use a negative decay count to track the remote decays
> -        * accumulated while sleeping.
> -        *
> -        * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
> -        * are seen by enqueue_entity_load_avg() as a migration with an already
> -        * constructed load_avg_contrib.
> -        */
> -       if (unlikely(se->avg.decay_count <= 0)) {
> -               se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
> -               if (se->avg.decay_count) {
> -                       /*
> -                        * In a wake-up migration we have to approximate the
> -                        * time sleeping.  This is because we can't synchronize
> -                        * clock_task between the two cpus, and it is not
> -                        * guaranteed to be read-safe.  Instead, we can
> -                        * approximate this using our carried decays, which are
> -                        * explicitly atomically readable.
> -                        */
> -                       se->avg.last_runnable_update -= (-se->avg.decay_count)
> -                                                       << 20;
> -                       update_entity_load_avg(se, 0);
> -                       /* Indicate that we're now synchronized and on-rq */
> -                       se->avg.decay_count = 0;
> -               }
> -               wakeup = 0;
> -       } else {
> -               __synchronize_entity_decay(se);
> -       }
> +       decayed = update_cfs_rq_load_avg(now, cfs_rq);
>
> -       /* migrated tasks did not contribute to our blocked load */
> -       if (wakeup) {
> -               subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
> -               update_entity_load_avg(se, 0);
> +       if (migrated) {
> +               cfs_rq->avg.load_avg += sa->load_avg;
> +               cfs_rq->avg.load_sum += sa->load_sum;
>         }
>
> -       cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
> -       /* we force update consideration on load-balancer moves */
> -       update_cfs_rq_blocked_load(cfs_rq, !wakeup);
> -}
> -
> -/*
> - * Remove se's load from this cfs_rq child load-average, if the entity is
> - * transitioning to a blocked state we track its projected decay using
> - * blocked_load_avg.
> - */
> -static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
> -                                                 struct sched_entity *se,
> -                                                 int sleep)
> -{
> -       update_entity_load_avg(se, 1);
> -       /* we force update consideration on load-balancer moves */
> -       update_cfs_rq_blocked_load(cfs_rq, !sleep);
> -
> -       cfs_rq->runnable_load_avg -= se->avg.load_avg_contrib;
> -       if (sleep) {
> -               cfs_rq->blocked_load_avg += se->avg.load_avg_contrib;
> -               se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
> -       } /* migrations, e.g. sleep=0 leave decay_count == 0 */
> +       if (decayed || migrated)
> +               update_tg_load_avg(cfs_rq);
>  }
>
>  /*
> @@ -2623,16 +2462,8 @@ static int idle_balance(struct rq *this_rq);
>
>  #else /* CONFIG_SMP */
>
> -static inline void update_entity_load_avg(struct sched_entity *se,
> -                                         int update_cfs_rq) {}
> -static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> -                                          struct sched_entity *se,
> -                                          int wakeup) {}
> -static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
> -                                          struct sched_entity *se,
> -                                          int sleep) {}
> -static inline void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq,
> -                                             int force_update) {}
> +static inline void update_load_avg(struct sched_entity *se, int update_tg) {}
> +static inline void enqueue_entity_load_avg(struct sched_entity *se) {}
>
>  static inline int idle_balance(struct rq *rq)
>  {
> @@ -2764,7 +2595,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>          * Update run-time statistics of the 'current'.
>          */
>         update_curr(cfs_rq);
> -       enqueue_entity_load_avg(cfs_rq, se, flags & ENQUEUE_WAKEUP);
> +       enqueue_entity_load_avg(se);
>         account_entity_enqueue(cfs_rq, se);
>         update_cfs_shares(cfs_rq);
>
> @@ -2839,7 +2670,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>          * Update run-time statistics of the 'current'.
>          */
>         update_curr(cfs_rq);
> -       dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);
> +
> +       update_load_avg(se, 1);
>
>         update_stats_dequeue(cfs_rq, se);
>         if (flags & DEQUEUE_SLEEP) {
> @@ -3028,7 +2860,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
>                 /* Put 'current' back into the tree. */
>                 __enqueue_entity(cfs_rq, prev);
>                 /* in !on_rq case, update occurred at dequeue */
> -               update_entity_load_avg(prev, 1);
> +               update_load_avg(prev, 0);
>         }
>         cfs_rq->curr = NULL;
>  }
> @@ -3044,8 +2876,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>         /*
>          * Ensure that runnable average is periodically updated.
>          */
> -       update_entity_load_avg(curr, 1);
> -       update_cfs_rq_blocked_load(cfs_rq, 1);
> +       update_load_avg(curr, 1);
>         update_cfs_shares(cfs_rq);
>
>  #ifdef CONFIG_SCHED_HRTICK
> @@ -3923,8 +3754,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 if (cfs_rq_throttled(cfs_rq))
>                         break;
>
> +               update_load_avg(se, 1);
>                 update_cfs_shares(cfs_rq);
> -               update_entity_load_avg(se, 1);
>         }
>
>         if (!se)
> @@ -3983,8 +3814,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 if (cfs_rq_throttled(cfs_rq))
>                         break;
>
> +               update_load_avg(se, 1);
>                 update_cfs_shares(cfs_rq);
> -               update_entity_load_avg(se, 1);
>         }
>
>         if (!se)
> @@ -3997,7 +3828,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  /* Used instead of source_load when we know the type == 0 */
>  static unsigned long weighted_cpuload(const int cpu)
>  {
> -       return cpu_rq(cpu)->cfs.runnable_load_avg;
> +       return cpu_rq(cpu)->cfs.avg.load_avg;
>  }
>
>  /*
> @@ -4042,7 +3873,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
>  {
>         struct rq *rq = cpu_rq(cpu);
>         unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
> -       unsigned long load_avg = rq->cfs.runnable_load_avg;
> +       unsigned long load_avg = rq->cfs.avg.load_avg;
>
>         if (nr_running)
>                 return load_avg / nr_running;
> @@ -4161,7 +3992,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>                 /*
>                  * w = rw_i + @wl
>                  */
> -               w = se->my_q->load.weight + wl;
> +               w = se->my_q->avg.load_avg + wl;
>
>                 /*
>                  * wl = S * s'_i; see (2)
> @@ -4182,7 +4013,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>                 /*
>                  * wl = dw_i = S * (s'_i - s_i); see (3)
>                  */
> -               wl -= se->load.weight;
> +               wl -= se->avg.load_avg;
>
>                 /*
>                  * Recursively apply this logic to all parent groups to compute
> @@ -4256,14 +4087,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>          */
>         if (sync) {
>                 tg = task_group(current);
> -               weight = current->se.load.weight;
> +               weight = current->se.avg.load_avg;
>
>                 this_load += effective_load(tg, this_cpu, -weight, -weight);
>                 load += effective_load(tg, prev_cpu, 0, -weight);
>         }
>
>         tg = task_group(p);
> -       weight = p->se.load.weight;
> +       weight = p->se.avg.load_avg;
>
>         /*
>          * In low-load situations, where prev_cpu is idle and this_cpu is idle
> @@ -4551,18 +4382,34 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>  {
>         struct sched_entity *se = &p->se;
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +       u64 last_update_time;
>
>         /*
> -        * Load tracking: accumulate removed load so that it can be processed
> -        * when we next update owning cfs_rq under rq->lock.  Tasks contribute
> -        * to blocked load iff they have a positive decay-count.  It can never
> -        * be negative here since on-rq tasks have decay-count == 0.
> +        * The task on its old CPU catches up with its old cfs_rq, and subtracts
> +        * itself from that cfs_rq (the task must be off the queue by now).
>          */
> -       if (se->avg.decay_count) {
> -               se->avg.decay_count = -__synchronize_entity_decay(se);
> -               atomic_long_add(se->avg.load_avg_contrib,
> -                                               &cfs_rq->removed_load);
> -       }
> +#ifndef CONFIG_64BIT
> +       u64 last_update_time_copy;
> +
> +       do {
> +               last_update_time_copy = cfs_rq->load_last_update_time_copy;
> +               smp_rmb();
> +               last_update_time = cfs_rq->avg.last_update_time;
> +       } while (last_update_time != last_update_time_copy);
> +#else
> +       last_update_time = cfs_rq->avg.last_update_time;
> +#endif
> +       __update_load_avg(last_update_time, &se->avg, 0);
> +       atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
> +
> +       /*
> +        * We are supposed to update the task to "current" time, so that it is up to
> +        * date and ready to go to the new CPU/cfs_rq. But we have difficulty getting
> +        * at what the current time is, so simply throw away the out-of-date time.
> +        * This leaves the wakee task less decayed; giving the wakee a bit more load
> +        * is not a bad thing.
> +        */
> +       se->avg.last_update_time = 0;
>
>         /* We have migrated, no longer consider this task hot */
>         se->exec_start = 0;
> @@ -5399,36 +5246,6 @@ next:
>  }
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -/*
> - * update tg->load_weight by folding this cpu's load_avg
> - */
> -static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
> -{
> -       struct sched_entity *se = tg->se[cpu];
> -       struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
> -
> -       /* throttled entities do not contribute to load */
> -       if (throttled_hierarchy(cfs_rq))
> -               return;
> -
> -       update_cfs_rq_blocked_load(cfs_rq, 1);
> -
> -       if (se) {
> -               update_entity_load_avg(se, 1);
> -               /*
> -                * We pivot on our runnable average having decayed to zero for
> -                * list removal.  This generally implies that all our children
> -                * have also been removed (modulo rounding error or bandwidth
> -                * control); however, such cases are rare and we can fix these
> -                * at enqueue.
> -                *
> -                * TODO: fix up out-of-order children on enqueue.
> -                */
> -               if (!se->avg.runnable_avg_sum && !cfs_rq->nr_running)
> -                       list_del_leaf_cfs_rq(cfs_rq);
> -       }
> -}
> -
>  static void update_blocked_averages(int cpu)
>  {
>         struct rq *rq = cpu_rq(cpu);
> @@ -5437,17 +5254,17 @@ static void update_blocked_averages(int cpu)
>
>         raw_spin_lock_irqsave(&rq->lock, flags);
>         update_rq_clock(rq);
> +
>         /*
>          * Iterates the task_group tree in a bottom up fashion, see
>          * list_add_leaf_cfs_rq() for details.
>          */
>         for_each_leaf_cfs_rq(rq, cfs_rq) {
> -               /*
> -                * Note: We may want to consider periodically releasing
> -                * rq->lock about these updates so that creating many task
> -                * groups does not result in continually extending hold time.
> -                */
> -               __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
> +               /* throttled entities do not contribute to load */
> +               if (throttled_hierarchy(cfs_rq))
> +                       continue;
> +
> +               update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
>         }
>
>         raw_spin_unlock_irqrestore(&rq->lock, flags);
> @@ -5477,14 +5294,14 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
>         }
>
>         if (!se) {
> -               cfs_rq->h_load = cfs_rq->runnable_load_avg;
> +               cfs_rq->h_load = cfs_rq->avg.load_avg;
>                 cfs_rq->last_h_load_update = now;
>         }
>
>         while ((se = cfs_rq->h_load_next) != NULL) {
>                 load = cfs_rq->h_load;
> -               load = div64_ul(load * se->avg.load_avg_contrib,
> -                               cfs_rq->runnable_load_avg + 1);
> +               load = div64_ul(load * se->avg.load_avg,
> +                               cfs_rq->avg.load_avg + 1);
>                 cfs_rq = group_cfs_rq(se);
>                 cfs_rq->h_load = load;
>                 cfs_rq->last_h_load_update = now;
> @@ -5496,8 +5313,8 @@ static unsigned long task_h_load(struct task_struct *p)
>         struct cfs_rq *cfs_rq = task_cfs_rq(p);
>
>         update_cfs_rq_h_load(cfs_rq);
> -       return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
> -                       cfs_rq->runnable_load_avg + 1);
> +       return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
> +                       cfs_rq->avg.load_avg + 1);
>  }
>  #else
>  static inline void update_blocked_averages(int cpu)
> @@ -5506,7 +5323,7 @@ static inline void update_blocked_averages(int cpu)
>
>  static unsigned long task_h_load(struct task_struct *p)
>  {
> -       return p->se.avg.load_avg_contrib;
> +       return p->se.avg.load_avg;
>  }
>  #endif
>
> @@ -7437,14 +7254,14 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>
>  #ifdef CONFIG_SMP
>         /*
> -       * Remove our load from contribution when we leave sched_fair
> -       * and ensure we don't carry in an old decay_count if we
> -       * switch back.
> +       * Remove our load from contribution when we leave cfs_rq.
>         */
> -       if (se->avg.decay_count) {
> -               __synchronize_entity_decay(se);
> -               subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
> -       }
> +       __update_load_avg(cfs_rq->avg.last_update_time, &se->avg,
> +               se->on_rq * se->load.weight);
> +       cfs_rq->avg.load_avg =
> +               subtract_until_zero(cfs_rq->avg.load_avg, se->avg.load_avg);
> +       cfs_rq->avg.load_sum =
> +               subtract_until_zero(cfs_rq->avg.load_sum, se->avg.load_sum);
>  #endif
>  }
>
> @@ -7501,8 +7318,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>         cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
>  #endif
>  #ifdef CONFIG_SMP
> -       atomic64_set(&cfs_rq->decay_counter, 1);
> -       atomic_long_set(&cfs_rq->removed_load, 0);
> +       atomic_long_set(&cfs_rq->removed_load_avg, 0);
>  #endif
>  }
>
> @@ -7547,14 +7363,12 @@ static void task_move_group_fair(struct task_struct *p, int on_rq)
>         if (!on_rq) {
>                 cfs_rq = cfs_rq_of(se);
>                 se->vruntime += cfs_rq->min_vruntime;
> +
>  #ifdef CONFIG_SMP
> -               /*
> -                * migrate_task_rq_fair() will have removed our previous
> -                * contribution, but we must synchronize for ongoing future
> -                * decay.
> -                */
> -               se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
> -               cfs_rq->blocked_load_avg += se->avg.load_avg_contrib;
> +               /* Virtually synchronize task with its new cfs_rq */
> +               p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> +               cfs_rq->avg.load_avg += p->se.avg.load_avg;
> +               cfs_rq->avg.load_sum += p->se.avg.load_sum;
>  #endif
>         }
>  }
> diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
> index 16f5a30..8f547fe 100644
> --- a/kernel/sched/proc.c
> +++ b/kernel/sched/proc.c
> @@ -504,7 +504,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>  #ifdef CONFIG_SMP
>  static inline unsigned long get_rq_runnable_load(struct rq *rq)
>  {
> -       return rq->cfs.runnable_load_avg;
> +       return rq->cfs.avg.load_avg;
>  }
>  #else
>  static inline unsigned long get_rq_runnable_load(struct rq *rq)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index a147571..7c8c2a9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -210,7 +210,6 @@ struct task_group {
>
>  #ifdef CONFIG_SMP
>         atomic_long_t load_avg;
> -       atomic_t runnable_avg;
>  #endif
>  #endif
>
> @@ -331,21 +330,16 @@ struct cfs_rq {
>
>  #ifdef CONFIG_SMP
>         /*
> -        * CFS Load tracking
> -        * Under CFS, load is tracked on a per-entity basis and aggregated up.
> -        * This allows for the description of both thread and group usage (in
> -        * the FAIR_GROUP_SCHED case).
> +        * CFS load tracking
>          */
> -       unsigned long runnable_load_avg, blocked_load_avg;
> -       atomic64_t decay_counter;
> -       u64 last_decay;
> -       atomic_long_t removed_load;
> +       struct sched_avg avg;
> +       unsigned long tg_load_avg_contrib;
> +       atomic_long_t removed_load_avg;
> +#ifndef CONFIG_64BIT
> +       u64 load_last_update_time_copy;
> +#endif
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -       /* Required to track per-cpu representation of a task_group */
> -       u32 tg_runnable_contrib;
> -       unsigned long tg_load_contrib;
> -
>         /*
>          *   h_load = weight * f(tg)
>          *
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
  2014-07-17 23:26 ` [PATCH 1/2 v4] sched: Remove update_rq_runnable_avg Yuyang Du
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
@ 2014-07-18 15:39 ` Morten Rasmussen
  2014-07-27 19:02   ` Yuyang Du
  2014-07-20  5:46 ` Mike Galbraith
  3 siblings, 1 reply; 47+ messages in thread
From: Morten Rasmussen @ 2014-07-18 15:39 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Fri, Jul 18, 2014 at 12:26:04AM +0100, Yuyang Du wrote:
> Thanks to Morten, Ben, and Fengguang.
> 
> v4 changes:
> 
> - Insert memory barrier before writing cfs_rq->load_last_update_copy.
> - Fix typos.

It is quite a challenge keeping up with your revisions :) Three
revisions in five days. It takes time to go through all the changes to
understand the implications of your proposed changes.

I still haven't gotten to the bottom of everything, but this is my view
so far.

1. runnable_avg_period is removed

load_avg_contrib used to be runnable_avg_sum/runnable_avg_period scaled
by the task load weight (priority). The runnable_avg_period is replaced
by a constant in this patch set. The effect of that change is that task
load tracking is no longer more sensitive in the early life of the task,
before it has built up some history. Tasks are now initialized to start
out as if they had been runnable forever (>345ms). If this assumption
about the task's behavior is wrong, it will take longer to converge to
the true average than it did before. The upside is that it is more stable.
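
The ">345ms" figure can be sanity-checked from the constants quoted in the patch,
namely 1024us periods, y^32 = 0.5 and LOAD_AVG_MAX_N = 345, with a few lines of
floating-point C; the kernel's fixed-point tables arrive at LOAD_AVG_MAX = 47742
for the same quantity:

#include <stdio.h>
#include <math.h>

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);	/* y^32 == 0.5 */
	double sum = 0.0;
	int n;

	for (n = 0; n < 345; n++)		/* 345 periods of 1024us ~= 353ms */
		sum += 1024.0 * pow(y, n);

	printf("sum after 345 full periods:  %.0f\n", sum);
	printf("asymptotic limit 1024/(1-y): %.0f\n", 1024.0 / (1.0 - y));
	return 0;
}

Both prints land within about 0.1% of 47742; the small difference is fixed-point
truncation in the kernel's tables. So a new task is seeded with load_sum =
weight * LOAD_AVG_MAX, i.e. the value it would have reached after roughly 345
periods, about a third of a second, of being continuously runnable.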

2. runnable_load_avg and blocked_load_avg are combined

runnable_load_avg currently represents the sum of the load_avg_contrib of
all tasks on the rq, while blocked_load_avg is the sum for those tasks
not on a runqueue. It makes perfect sense to consider the sum of both
when calculating the load of a cpu, but we currently don't include
blocked_load_avg. The reason for that is that the priority scaling of the
task load_avg_contrib may lead to under-utilization of cpus that
occasionally have a tiny high priority task running. You can easily have a
task that takes 5% of cpu time but has a load_avg_contrib several times
larger than that of a default priority task runnable 100% of the time.

Another thing that might be an issue is that the blocked load of a
terminated task lives on for quite a while until it has decayed away.

I'm all for taking the blocked load into consideration, but this issue
has to be resolved first. Which leads me on to the next thing.

Most of the work going on around energy awareness is based on the load
tracking to estimate task and cpu utilization. It seems that most of the
involved parties think that we need an unweighted variant of the tracked
load as well as tracking the running time of a task. The latter was part
of the original proposal by pjt and Ben, but wasn't used. It seems that
unweighted runnable tracking should be fairly easy to add to your
proposal, but I don't have an overview of whether it is possible to add
running tracking. Do you think that is possible?

Morten

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
                   ` (2 preceding siblings ...)
  2014-07-18 15:39 ` [PATCH 0/2 " Morten Rasmussen
@ 2014-07-20  5:46 ` Mike Galbraith
  2014-07-27 19:34   ` Yuyang Du
  3 siblings, 1 reply; 47+ messages in thread
From: Mike Galbraith @ 2014-07-20  5:46 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Fri, 2014-07-18 at 07:26 +0800, Yuyang Du wrote: 
> Thanks to Morten, Ben, and Fengguang.
> 
> v4 changes:
> 
> - Insert memory barrier before writing cfs_rq->load_last_update_copy.
> - Fix typos.

My little desktop box says lovely minus signs have had their usual
effect on the general case (cgroups enabled but not in use). 

pipe-test scheduling cross core - full fastpath
3.0.101-default        3.753363 usecs/loop -- avg 3.770737 530.4 KHz   1.000
3.1.10-default         3.723843 usecs/loop -- avg 3.716058 538.2 KHz   1.014
3.2.51-default         3.728060 usecs/loop -- avg 3.710372 539.0 KHz   1.016
3.3.8-default          3.906174 usecs/loop -- avg 3.900399 512.8 KHz    .966
3.4.97-default         3.864158 usecs/loop -- avg 3.865281 517.4 KHz    .975
3.5.7-default          3.967481 usecs/loop -- avg 3.962757 504.7 KHz    .951
3.6.11-default         3.851186 usecs/loop -- avg 3.845321 520.1 KHz    .980
3.7.10-default         3.777869 usecs/loop -- avg 3.776913 529.5 KHz    .998
3.8.13-default         4.049927 usecs/loop -- avg 4.041905 494.8 KHz    .932
3.9.11-default         3.973046 usecs/loop -- avg 3.974208 503.2 KHz    .948
3.10.27-default        4.189598 usecs/loop -- avg 4.189298 477.4 KHz    .900
3.11.10-default        4.293870 usecs/loop -- avg 4.297979 465.3 KHz    .877
3.12.24-default        4.321570 usecs/loop -- avg 4.321961 462.8 KHz    .872
3.13.11-default        4.137845 usecs/loop -- avg 4.134863 483.7 KHz    .911
3.14.10-default        4.145348 usecs/loop -- avg 4.139987 483.1 KHz    .910            
3.15.4-default         4.355594 usecs/loop -- avg 4.351961 459.6 KHz    .866             
3.16.0-default         4.537279 usecs/loop -- avg 4.543532 440.2 KHz    .829     1.000   
3.16.0-default+v4      4.343542 usecs/loop -- avg 4.318803 463.1 KHz    .873     1.052

Extending max depth to 5, cost of depth++ seemingly did not change
despite repeatable dip at depth = 3 (gremlins at play).

mount -t cgroup -o cpu none /cgroups
mkdir -p /cgroups/a/b/c/d/e

cgexec -g cpu:a pipe-test 1
3.16.0-default         5.016373 usecs/loop -- avg 5.021115 398.3 KHz   1.000
3.16.0-default+v4      4.978625 usecs/loop -- avg 4.977381 401.8 KHz   1.008

cgexec -g cpu:a/b pipe-test 1
3.16.0-default         5.543566 usecs/loop -- avg 5.565475 359.4 KHz   1.000
3.16.0-default+v4      5.597399 usecs/loop -- avg 5.570444 359.0 KHz    .998

cgexec -g cpu:a/b/c pipe-test 1
3.16.0-default         6.092256 usecs/loop -- avg 6.094186 328.2 KHz   1.000
3.16.0-default+v4      6.294858 usecs/loop -- avg 6.338453 315.5 KHz    .961

cgexec -g cpu:a/b/c/d pipe-test 1
3.16.0-default         6.719044 usecs/loop -- avg 6.717118 297.7 KHz   1.000
3.16.0-default+v4      6.788559 usecs/loop -- avg 6.710102 298.1 KHz   1.001

cgexec -g cpu:a/b/c/d/e pipe-test 1
3.16.0-default         7.186431 usecs/loop -- avg 7.194884 278.0 KHz   1.000
3.16.0-default+v4      7.368443 usecs/loop -- avg 7.250371 275.8 KHz    .992



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-18  9:43   ` Vincent Guittot
@ 2014-07-27 17:36     ` Yuyang Du
  2014-07-29  9:12       ` Vincent Guittot
  0 siblings, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-27 17:36 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

Hi Vincent,

On Fri, Jul 18, 2014 at 11:43:00AM +0200, Vincent Guittot wrote:
> > @@ -2291,23 +2299,24 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
> >         delta >>= 10;
> >         if (!delta)
> >                 return 0;
> > -       sa->last_runnable_update = now;
> > +       sa->last_update_time = now;
> >
> >         /* delta_w is the amount already accumulated against our next period */
> > -       delta_w = sa->runnable_avg_period % 1024;
> > +       delta_w = sa->period_contrib;
> >         if (delta + delta_w >= 1024) {
> > -               /* period roll-over */
> >                 decayed = 1;
> >
> > +               /* how much left for next period will start over, we don't know yet */
> > +               sa->period_contrib = 0;
> > +
> >                 /*
> >                  * Now that we know we're crossing a period boundary, figure
> >                  * out how much from delta we need to complete the current
> >                  * period and accrue it.
> >                  */
> >                 delta_w = 1024 - delta_w;
> > -               if (runnable)
> > -                       sa->runnable_avg_sum += delta_w;
> > -               sa->runnable_avg_period += delta_w;
> > +               if (w)
> > +                       sa->load_sum += w * delta_w;
> 
> Do you really need to have *w for computing the load_sum ? can't you
> only use it when computing the load_avg ?
> 
> sa->load_avg = div_u64(sa->load_sum * w , LOAD_AVG_MAX)
> 

For a task, assuming its load.weight does not change much, yes, we can. But in theory, a task's
load.weight can change, and *w in load_sum can take that change into account. For group entities
and cfs_rq, load.weight changes all the time, and I don't know how to do it without *w
in load_sum.
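
To make the trade-off concrete, the difference is just where the weight is
applied (a sketch reusing the names from the hunk above; the load_avg line for
the patch's variant is inferred, not copied from the patch):

	/* weight folded into the sum, as in this patch (weight changes are averaged in): */
	sa->load_sum += w * delta_w;
	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);

	/* weight applied only when forming the average, as Vincent suggests (re-weights instantly): */
	sa->load_sum += delta_w;
	sa->load_avg = div_u64(sa->load_sum * w, LOAD_AVG_MAX);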

Sorry for my unresponsiveness last week. I was on vacation and unfortunately failed to
connect to the VPN from where I was.

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-18 15:39 ` [PATCH 0/2 " Morten Rasmussen
@ 2014-07-27 19:02   ` Yuyang Du
  2014-07-28 10:38     ` Peter Zijlstra
  2014-07-30 10:13     ` Morten Rasmussen
  0 siblings, 2 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-27 19:02 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

Hi Morten,

On Fri, Jul 18, 2014 at 04:39:31PM +0100, Morten Rasmussen wrote:
> 1. runnable_avg_period is removed
> 
> load_avg_contrib used to be runnable_avg_sum/runnable_avg_period scaled
> by the task load weight (priority). The runnable_avg_period is replaced
> by a constant in this patch set. The effect of that change is that task
> load tracking is no longer more sensitive early in the life of the task until
> it has built up some history. Tasks are now initialized to start out as
> if they have been runnable forever (>345ms). If this assumption about
> the task behavior is wrong it will take longer to converge to the true
> average than it did before. The upside is that it is more stable.

I think "Give new task start runnable values to heavy its load in infant time"
in general is good, with an emphasis on infant. Or from the opposite, making it
zero to let it gain runnable weight looks worse than full weight.
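
For illustration, "full" initial values under this scheme would presumably
amount to something like the following (a sketch; the function name is made up):

	/* hypothetical init: new task starts as if it had been runnable forever */
	static void init_task_load_avg(struct sched_entity *se)
	{
		struct sched_avg *sa = &se->avg;

		sa->last_update_time = 0;	/* set at first enqueue */
		sa->period_contrib = 0;
		sa->load_avg = se->load.weight;
		sa->load_sum = (u64)se->load.weight * LOAD_AVG_MAX;
	}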

> 2. runnable_load_avg and blocked_load_avg are combined
> 
> runnable_load_avg currently represents the sum of load_avg_contrib of
> all tasks on the rq, while blocked_load_avg is the sum of those tasks
> not on a runqueue. It makes perfect sense to consider the sum of both
> when calculating the load of a cpu, but we currently don't include
> blocked_load_avg. The reason for that is the priority scaling of the
> task load_avg_contrib may lead to under-utilization of cpus that
> occasionally have tiny high priority task running. You can easily have a
> task that takes 5% of cpu time but has a load_avg_contrib several times
> larger than a default priority task runnable 100% of the time.

So this is the effect of historical averaging and weight scaling, both of which
are just generally good, but may have bad cases.

> Another thing that might be an issue is that the blocked of a terminated
> task lives on for quite a while until has decayed away.

Good point. To do so, if I read correctly, we need to hook do_exit(), but we will
probably run into rq->lock issues.

What is the opinion/guidance from the maintainers/others?
 
> I'm all for taking the blocked load into consideration, but this issue
> has to be resolved first. Which leads me on to the next thing.
> 
> Most of the work going on around energy awareness is based on the load
> tracking to estimate task and cpu utilization. It seems that most of the
> involved parties think that we need an unweighted variant of the tracked
> load as well as tracking the running time of a task. The latter was part
> of the original proposal by pjt and Ben, but wasn't used. It seems that
> unweighted runnable tracking should be fairly easy to add to your
> proposal, but I don't have an overview of whether it is possible to add
> running tracking. Do you think that is possible?
> 

Running tracking is absolutely possible; it is just a matter of minimizing overhead
(how to do it along with runnable tracking for tasks and maybe for CPUs, but not for
cfs_rq) from an execution and code-cleanliness point of view. We can do it as soon as
it is needed.

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-20  5:46 ` Mike Galbraith
@ 2014-07-27 19:34   ` Yuyang Du
  2014-07-28  7:49     ` Mike Galbraith
  2014-07-28  8:55     ` Peter Zijlstra
  0 siblings, 2 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-27 19:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

Thanks a lot, Mike.

Ben asked for this test, but actually I don't know how to get pipe-test, still
not even after googling it.

On Sun, Jul 20, 2014 at 07:46:23AM +0200, Mike Galbraith wrote:
> On Fri, 2014-07-18 at 07:26 +0800, Yuyang Du wrote: 
> > Thanks to Morten, Ben, and Fengguang.
> > 
> > v4 changes:
> > 
> > - Insert memory barrier before writing cfs_rq->load_last_update_copy.
> > - Fix typos.
> 
> My little desktop box says lovely minus signs have had their usual
> effect on the general case (cgroups enabled but not in use). 
> 
> pipe-test scheduling cross core - full fastpath
> 3.0.101-default        3.753363 usecs/loop -- avg 3.770737 530.4 KHz   1.000
> 3.1.10-default         3.723843 usecs/loop -- avg 3.716058 538.2 KHz   1.014
> 3.2.51-default         3.728060 usecs/loop -- avg 3.710372 539.0 KHz   1.016
> 3.3.8-default          3.906174 usecs/loop -- avg 3.900399 512.8 KHz    .966
> 3.4.97-default         3.864158 usecs/loop -- avg 3.865281 517.4 KHz    .975
> 3.5.7-default          3.967481 usecs/loop -- avg 3.962757 504.7 KHz    .951
> 3.6.11-default         3.851186 usecs/loop -- avg 3.845321 520.1 KHz    .980
> 3.7.10-default         3.777869 usecs/loop -- avg 3.776913 529.5 KHz    .998
> 3.8.13-default         4.049927 usecs/loop -- avg 4.041905 494.8 KHz    .932
> 3.9.11-default         3.973046 usecs/loop -- avg 3.974208 503.2 KHz    .948
> 3.10.27-default        4.189598 usecs/loop -- avg 4.189298 477.4 KHz    .900
> 3.11.10-default        4.293870 usecs/loop -- avg 4.297979 465.3 KHz    .877
> 3.12.24-default        4.321570 usecs/loop -- avg 4.321961 462.8 KHz    .872
> 3.13.11-default        4.137845 usecs/loop -- avg 4.134863 483.7 KHz    .911
> 3.14.10-default        4.145348 usecs/loop -- avg 4.139987 483.1 KHz    .910            
> 3.15.4-default         4.355594 usecs/loop -- avg 4.351961 459.6 KHz    .866             
> 3.16.0-default         4.537279 usecs/loop -- avg 4.543532 440.2 KHz    .829     1.000   
> 3.16.0-default+v4      4.343542 usecs/loop -- avg 4.318803 463.1 KHz    .873     1.052
> 
> Extending max depth to 5, cost of depth++ seemingly did not change
> despite repeatable dip at depth = 3 (gremlins at play).
> 
> mount -t cgroup o cpu none /cgroups
> mkdir -p /cgroups/a/b/c/d/e
> 
> cgexec -g cpu:a pipe-test 1
> 3.16.0-default         5.016373 usecs/loop -- avg 5.021115 398.3 KHz   1.000
> 3.16.0-default+v4      4.978625 usecs/loop -- avg 4.977381 401.8 KHz   1.008
> 
> cgexec -g cpu:a/b pipe-test 1
> 3.16.0-default         5.543566 usecs/loop -- avg 5.565475 359.4 KHz   1.000
> 3.16.0-default+v4      5.597399 usecs/loop -- avg 5.570444 359.0 KHz    .998
> 
> cgexec -g cpu:a/b/c pipe-test 1
> 3.16.0-default         6.092256 usecs/loop -- avg 6.094186 328.2 KHz   1.000
> 3.16.0-default+v4      6.294858 usecs/loop -- avg 6.338453 315.5 KHz    .961
> 
> cgexec -g cpu:a/b/c/d pipe-test 1
> 3.16.0-default         6.719044 usecs/loop -- avg 6.717118 297.7 KHz   1.000
> 3.16.0-default+v4      6.788559 usecs/loop -- avg 6.710102 298.1 KHz   1.001
> 
> cgexec -g cpu:a/b/c/d/e pipe-test 1
> 3.16.0-default         7.186431 usecs/loop -- avg 7.194884 278.0 KHz   1.000
> 3.16.0-default+v4      7.368443 usecs/loop -- avg 7.250371 275.8 KHz    .992
> 

So is the result flat compared to before, or a pass?

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28  7:49     ` Mike Galbraith
@ 2014-07-28  0:01       ` Yuyang Du
  0 siblings, 0 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-28  0:01 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Mon, Jul 28, 2014 at 09:49:18AM +0200, Mike Galbraith wrote:
> On Mon, 2014-07-28 at 03:34 +0800, Yuyang Du wrote: 
> > Thanks a lot, Mike.
> > 
> > Ben asked for this test, but actually I don't know how to get pipe-test, still
> > not even after google it.
> 
> You could write it trivially, it's just pipe ping-pong.  I can send you
> a copy of the one I use if you like.  Ingo wrote it way back in the dark
> ages, I use it to watch out for fastpath lard accumulation.

Yes, please, drop me a copy.

> > So the result is flat compared to before or a pass?
> 
> Yeah, my little box said it was a deep cgroup price noop, but a modest
> win for root.  I love minus signs, they work great, though sometimes in
> highly mysterious ways :)
> 

Minus signs are great, and I hope this patchset is not mysterious :)

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-27 19:34   ` Yuyang Du
@ 2014-07-28  7:49     ` Mike Galbraith
  2014-07-28  0:01       ` Yuyang Du
  2014-07-28  8:55     ` Peter Zijlstra
  1 sibling, 1 reply; 47+ messages in thread
From: Mike Galbraith @ 2014-07-28  7:49 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Mon, 2014-07-28 at 03:34 +0800, Yuyang Du wrote: 
> Thanks a lot, Mike.
> 
> Ben asked for this test, but actually I don't know how to get pipe-test, still
> not even after google it.

You could write it trivially, it's just pipe ping-pong.  I can send you
a copy of the one I use if you like.  Ingo wrote it way back in the dark
ages, I use it to watch out for fastpath lard accumulation.
> So the result is flat compared to before or a pass?

Yeah, my little box said it was a deep cgroup price noop, but a modest
win for root.  I love minus signs, they work great, though sometimes in
highly mysterious ways :)

-Mike


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-27 19:34   ` Yuyang Du
  2014-07-28  7:49     ` Mike Galbraith
@ 2014-07-28  8:55     ` Peter Zijlstra
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28  8:55 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Mike Galbraith, mingo, linux-kernel, pjt, bsegall,
	arjan.van.de.ven, len.brown, rafael.j.wysocki, alan.cox,
	mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 274 bytes --]

On Mon, Jul 28, 2014 at 03:34:53AM +0800, Yuyang Du wrote:
> Thanks a lot, Mike.
> 
> Ben asked for this test, but actually I don't know how to get pipe-test, still
> not even after google it.

perf bench sched pipe

I realize that's a mouthful, but there it is.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-27 19:02   ` Yuyang Du
@ 2014-07-28 10:38     ` Peter Zijlstra
  2014-07-29  1:17       ` Yuyang Du
  2014-07-30 10:13     ` Morten Rasmussen
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28 10:38 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Morten Rasmussen, mingo, linux-kernel, pjt, bsegall,
	arjan.van.de.ven, len.brown, rafael.j.wysocki, alan.cox,
	mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

On Mon, Jul 28, 2014 at 03:02:37AM +0800, Yuyang Du wrote:
> > Another thing that might be an issue is that the blocked of a terminated
> > task lives on for quite a while until has decayed away.
> 
> Good point. To do so, if I read correctly, we need to hook do_exit(), but probably
> we are gonna encounter rq->lock issue.
> 
> What is the opinion/guidance from the maintainers/others?

So the entire point of this per entity tracking was to make sure load
numbers reflect reality. We account migrations etc., it would be weird
to then throw all that out the window and let task exit accumulate crap.



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
  2014-07-18  9:43   ` Vincent Guittot
@ 2014-07-28 10:48   ` Peter Zijlstra
  2014-07-29  0:56     ` Yuyang Du
  2014-07-28 11:39   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28 10:48 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 2049 bytes --]

On Fri, Jul 18, 2014 at 07:26:06AM +0800, Yuyang Du wrote:
> @@ -665,20 +660,27 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  }
>  
>  #ifdef CONFIG_SMP
> -static unsigned long task_h_load(struct task_struct *p);
>  
> -static inline void __update_task_entity_contrib(struct sched_entity *se);
> +/* dependent on LOAD_AVG_PERIOD, see below */
> +#define LOAD_AVG_MAX 47742 /* maximum possible load avg */

Please don't separate this from the rest of the values it belongs to. If
you really have to, move the entire block.

> @@ -2071,13 +2073,9 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>  	long tg_weight;
>  
>  	/*
> -	 * Use this CPU's actual weight instead of the last load_contribution
> -	 * to gain a more accurate current total weight. See
> -	 * update_cfs_rq_load_contribution().
> +	 * Use this CPU's load average instead of actual weight
>  	 */
>  	tg_weight = atomic_long_read(&tg->load_avg);
> -	tg_weight -= cfs_rq->tg_load_contrib;
> -	tg_weight += cfs_rq->load.weight;

I don't think that comment makes any sense after this. The comment was
there to explain the -=, += things, but that's all gone so it's pretty
trivial now, and i++ /* inc by one */ comments are not useful.

> @@ -2181,7 +2178,7 @@ static const u32 runnable_avg_yN_sum[] = {
>   * Approximate:
>   *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
>   */
> -static __always_inline u64 decay_load(u64 val, u64 n)
> +static __always_inline u64 decay_load32(u64 val, u64 n)
>  {
>  	unsigned int local_n;
>  
> @@ -2210,6 +2207,18 @@ static __always_inline u64 decay_load(u64 val, u64 n)
>  	return val >> 32;
>  }
>  
> +static __always_inline u64 decay_load(u64 val, u64 n)
> +{
> +	if (likely(val <= UINT_MAX))
> +		val = decay_load32(val, n);
> +	else {
> +		val *= (u32)decay_load32(1 << 15, n);
> +		val >>= 15;
> +	}
> +
> +	return val;
> +}

Please just use mul_u64_u32_shr().

/me continues reading the rest of it..

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
  2014-07-18  9:43   ` Vincent Guittot
  2014-07-28 10:48   ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Peter Zijlstra
@ 2014-07-28 11:39   ` Peter Zijlstra
  2014-07-29  1:09     ` Yuyang Du
  2014-07-28 12:01   ` Peter Zijlstra
  2014-07-28 13:51   ` Peter Zijlstra
  4 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28 11:39 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 2584 bytes --]

On Fri, Jul 18, 2014 at 07:26:06AM +0800, Yuyang Du wrote:

> -static inline void __update_tg_runnable_avg(struct sched_avg *sa,
> -						  struct cfs_rq *cfs_rq)
> -{
> -	struct task_group *tg = cfs_rq->tg;
> -	long contrib;
> -
> -	/* The fraction of a cpu used by this cfs_rq */
> -	contrib = div_u64((u64)sa->runnable_avg_sum << NICE_0_SHIFT,
> -			  sa->runnable_avg_period + 1);
> -	contrib -= cfs_rq->tg_runnable_contrib;
> -
> -	if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) {
> -		atomic_add(contrib, &tg->runnable_avg);
> -		cfs_rq->tg_runnable_contrib += contrib;
> -	}
> -}


> -static inline void __update_group_entity_contrib(struct sched_entity *se)
> +static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
>  {
> +	long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
>  
> +	if (delta) {
> +		atomic_long_add(delta, &cfs_rq->tg->load_avg);
> +		cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
>  	}
>  }

We talked about this before, you made that an unconditional atomic op on
an already hot line.

You need some words on why this isn't a problem. Either in a comment or
in the Changelog. You cannot leave such changes undocumented.

> +#define subtract_until_zero(minuend, subtrahend)	\
> +	(subtrahend < minuend ? minuend - subtrahend : 0)

WTH is a minuend or subtrahend? Are you a wordsmith in your spare time
and like to make up your own words?

Also, isn't writing: x = max(0, x-y), far more readable to begin with?

> +/*
> + * Group cfs_rq's load_avg is used for task_h_load and update_cfs_share
> + * calc.
> + */
> +static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
> +	int decayed;
>  
> +	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> +		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> +		cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
> +		r *= LOAD_AVG_MAX;
> +		cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
>  	}
>  
> +	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>  
> +#ifndef CONFIG_64BIT
> +	if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
> +		smp_wmb();
> +		cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
> +	}
> +#endif
>  
> +	return decayed;
> +}

It's a bit unfortunate that we update the copy in a different function than
the original, but I think I see why you did that. But is it at all
likely that we do not need to update? That is, does that compare make
any sense?



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
                     ` (2 preceding siblings ...)
  2014-07-28 11:39   ` Peter Zijlstra
@ 2014-07-28 12:01   ` Peter Zijlstra
  2014-07-28 13:51   ` Peter Zijlstra
  4 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28 12:01 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 1435 bytes --]

On Fri, Jul 18, 2014 at 07:26:06AM +0800, Yuyang Du wrote:
> -static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
> +/* Add the load generated by se into cfs_rq's load average */
> +static inline void enqueue_entity_load_avg(struct sched_entity *se)
>  {
> +	struct sched_avg *sa = &se->avg;
> +	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	u64 now = cfs_rq_clock_task(cfs_rq);
> +	int migrated = 0, decayed;
>  
> +	if (sa->last_update_time == 0) {
> +		sa->last_update_time = now;
>  
> +		if (entity_is_task(se))
> +			migrated = 1;
>  	}
> +	else
> +		__update_load_avg(now, sa, se->on_rq * se->load.weight);

That's a coding style fail, that should look like:

	if () {
	} else {
	}

>  
> +	decayed = update_cfs_rq_load_avg(now, cfs_rq);
>  
> +	if (migrated) {
> +		cfs_rq->avg.load_avg += sa->load_avg;
> +		cfs_rq->avg.load_sum += sa->load_sum;
>  	}
>  
> +	if (decayed || migrated)
> +		update_tg_load_avg(cfs_rq);
>  }


> @@ -2764,7 +2595,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 * Update run-time statistics of the 'current'.
>  	 */
>  	update_curr(cfs_rq);
> -	enqueue_entity_load_avg(cfs_rq, se, flags & ENQUEUE_WAKEUP);
> +	enqueue_entity_load_avg(se);
>  	account_entity_enqueue(cfs_rq, se);
>  	update_cfs_shares(cfs_rq);
>  

Why did you remove the cfs_rq argument only to have to re-compute it?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
                     ` (3 preceding siblings ...)
  2014-07-28 12:01   ` Peter Zijlstra
@ 2014-07-28 13:51   ` Peter Zijlstra
  2014-07-28 16:58     ` bsegall
  4 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28 13:51 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 3234 bytes --]



> +static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
> +	int decayed;
>  
> +	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> +		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> +		cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
> +		r *= LOAD_AVG_MAX;
> +		cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
>  	}
>  
> +	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>  
> +#ifndef CONFIG_64BIT
> +	if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
> +		smp_wmb();
> +		cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
> +	}
> +#endif
>  
> +	return decayed;
> +}

So on every cfs_rq update we first process the 'pending' removals, then
decay and then store the current timestamp.

> +static inline void enqueue_entity_load_avg(struct sched_entity *se)
>  {
> +	struct sched_avg *sa = &se->avg;
> +	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	u64 now = cfs_rq_clock_task(cfs_rq);
> +	int migrated = 0, decayed;
>  
> +	if (sa->last_update_time == 0) {
> +		sa->last_update_time = now;
>  
> +		if (entity_is_task(se))
> +			migrated = 1;
>  	}
> +	else
> +		__update_load_avg(now, sa, se->on_rq * se->load.weight);
>  
> +	decayed = update_cfs_rq_load_avg(now, cfs_rq);
>  
> +	if (migrated) {
> +		cfs_rq->avg.load_avg += sa->load_avg;
> +		cfs_rq->avg.load_sum += sa->load_sum;
>  	}
>  
> +	if (decayed || migrated)
> +		update_tg_load_avg(cfs_rq);
>  }

On enqueue we add ourselves to the cfs_rq.. and assume the entity is
'current' wrt updates since we did that when we just pulled it from the
old rq.

> @@ -4551,18 +4382,34 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>  {
>  	struct sched_entity *se = &p->se;
>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	u64 last_update_time;
>  
>  	/*
> +	 * Task on old CPU catches up with its old cfs_rq, and subtract itself from
> +	 * the cfs_rq (task must be off the queue now).
>  	 */
> +#ifndef CONFIG_64BIT
> +	u64 last_update_time_copy;
> +
> +	do {
> +		last_update_time_copy = cfs_rq->load_last_update_time_copy;
> +		smp_rmb();
> +		last_update_time = cfs_rq->avg.last_update_time;
> +	} while (last_update_time != last_update_time_copy);
> +#else
> +	last_update_time = cfs_rq->avg.last_update_time;
> +#endif
> +	__update_load_avg(last_update_time, &se->avg, 0);
> +	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
> +
> +	/*
> +	 * We are supposed to update the task to "current" time, then its up to date
> +	 * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
> +	 * what current time is, so simply throw away the out-of-date time. This
> +	 * will result in the wakee task is less decayed, but giving the wakee more
> +	 * load sounds not bad.
> +	 */
> +	se->avg.last_update_time = 0;
>  
>  	/* We have migrated, no longer consider this task hot */
>  	se->exec_start = 0;


And here we try and make good on that assumption. The thing I worry
about is what happens if the machine is entirely idle...

What guarantees a semi up-to-date cfs_rq->avg.last_update_time?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28 13:51   ` Peter Zijlstra
@ 2014-07-28 16:58     ` bsegall
  2014-07-28 17:19       ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: bsegall @ 2014-07-28 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, mingo, linux-kernel, pjt, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

Peter Zijlstra <peterz@infradead.org> writes:

>> @@ -4551,18 +4382,34 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>>  {
>>  	struct sched_entity *se = &p->se;
>>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +	u64 last_update_time;
>>  
>>  	/*
>> +	 * Task on old CPU catches up with its old cfs_rq, and subtract itself from
>> +	 * the cfs_rq (task must be off the queue now).
>>  	 */
>> +#ifndef CONFIG_64BIT
>> +	u64 last_update_time_copy;
>> +
>> +	do {
>> +		last_update_time_copy = cfs_rq->load_last_update_time_copy;
>> +		smp_rmb();
>> +		last_update_time = cfs_rq->avg.last_update_time;
>> +	} while (last_update_time != last_update_time_copy);
>> +#else
>> +	last_update_time = cfs_rq->avg.last_update_time;
>> +#endif
>> +	__update_load_avg(last_update_time, &se->avg, 0);
>> +	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
>> +
>> +	/*
>> +	 * We are supposed to update the task to "current" time, then its up to date
>> +	 * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
>> +	 * what current time is, so simply throw away the out-of-date time. This
>> +	 * will result in the wakee task is less decayed, but giving the wakee more
>> +	 * load sounds not bad.
>> +	 */
>> +	se->avg.last_update_time = 0;
>>  
>>  	/* We have migrated, no longer consider this task hot */
>>  	se->exec_start = 0;
>
>
> And here we try and make good on that assumption. The thing I worry
> about is what happens if the machine is entirely idle...
>
> What guarantees an semi up-to-date cfs_rq->avg.last_update_time.

update_blocked_averages I think should do just as good a job as the old
code, which isn't perfect but is about as good as you can get worst case.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28 16:58     ` bsegall
@ 2014-07-28 17:19       ` Peter Zijlstra
  2014-07-29  1:13         ` Yuyang Du
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-28 17:19 UTC (permalink / raw)
  To: bsegall
  Cc: Yuyang Du, mingo, linux-kernel, pjt, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 2359 bytes --]

On Mon, Jul 28, 2014 at 09:58:19AM -0700, bsegall@google.com wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> >> @@ -4551,18 +4382,34 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
> >>  {
> >>  	struct sched_entity *se = &p->se;
> >>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> +	u64 last_update_time;
> >>  
> >>  	/*
> >> +	 * Task on old CPU catches up with its old cfs_rq, and subtract itself from
> >> +	 * the cfs_rq (task must be off the queue now).
> >>  	 */
> >> +#ifndef CONFIG_64BIT
> >> +	u64 last_update_time_copy;
> >> +
> >> +	do {
> >> +		last_update_time_copy = cfs_rq->load_last_update_time_copy;
> >> +		smp_rmb();
> >> +		last_update_time = cfs_rq->avg.last_update_time;
> >> +	} while (last_update_time != last_update_time_copy);
> >> +#else
> >> +	last_update_time = cfs_rq->avg.last_update_time;
> >> +#endif
> >> +	__update_load_avg(last_update_time, &se->avg, 0);
> >> +	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
> >> +
> >> +	/*
> >> +	 * We are supposed to update the task to "current" time, then its up to date
> >> +	 * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
> >> +	 * what current time is, so simply throw away the out-of-date time. This
> >> +	 * will result in the wakee task is less decayed, but giving the wakee more
> >> +	 * load sounds not bad.
> >> +	 */
> >> +	se->avg.last_update_time = 0;
> >>  
> >>  	/* We have migrated, no longer consider this task hot */
> >>  	se->exec_start = 0;
> >
> >
> > And here we try and make good on that assumption. The thing I worry
> > about is what happens if the machine is entirely idle...
> >
> > What guarantees an semi up-to-date cfs_rq->avg.last_update_time.
> 
> update_blocked_averages I think should do just as good a job as the old
> code, which isn't perfect but is about as good as you can get worst case.

Right, that's called from rebalance_domains() which should more or less
update this value on tick boundaries or thereabouts for most 'active'
cpus.

But if the entire machine is idle, the first wakeup (if it's a cross-cpu one)
might see a very stale timestamp.

If we can fix that, that would be good I suppose, but I'm not
immediately seeing something pretty there, but you're right, it'd not be
worse than the current situation.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28 10:48   ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Peter Zijlstra
@ 2014-07-29  0:56     ` Yuyang Du
  2014-07-29 13:15       ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-29  0:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Mon, Jul 28, 2014 at 12:48:37PM +0200, Peter Zijlstra wrote:
> > +static __always_inline u64 decay_load(u64 val, u64 n)
> > +{
> > +	if (likely(val <= UINT_MAX))
> > +		val = decay_load32(val, n);
> > +	else {
> > +		val *= (u32)decay_load32(1 << 15, n);
> > +		val >>= 15;
> > +	}
> > +
> > +	return val;
> > +}
> 
> Please just use mul_u64_u32_shr().
> 
> /me continues reading the rest of it..

Good. Since 128-bit arithmetic is considered in mul_u64_u32_shr(), load_sum can
afford more tasks :)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28 11:39   ` Peter Zijlstra
@ 2014-07-29  1:09     ` Yuyang Du
  2014-07-29 13:19       ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-29  1:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Mon, Jul 28, 2014 at 01:39:39PM +0200, Peter Zijlstra wrote:
> > -static inline void __update_group_entity_contrib(struct sched_entity *se)
> > +static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
> >  {
> > +	long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
> >  
> > +	if (delta) {
> > +		atomic_long_add(delta, &cfs_rq->tg->load_avg);
> > +		cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
> >  	}
> >  }
> 
> We talked about this before, you made that an unconditional atomic op on
> an already hot line.
> 
> You need some words on why this isn't a problem. Either in a comment or
> in the Changelog. You cannot leave such changes undocumented.
 
I am all for not updating a trivial delta, e.g., 1 or 2. I just had no theory
for selecting a "good" threshold.

The current code uses 1/8 or 1/64 of the contribution. Though it is not a fair comparison,
because how the current tg load is calculated is a big story (no offense), I choose
1/64 as the threshold.
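
Concretely, with a 1/64 threshold (mirroring the removed
__update_tg_runnable_avg() pattern above) the update would be roughly:

	static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
	{
		long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;

		/* only touch the shared tg->load_avg cacheline for a big enough change */
		if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
			atomic_long_add(delta, &cfs_rq->tg->load_avg);
			cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
		}
	}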

> > +#define subtract_until_zero(minuend, subtrahend)	\
> > +	(subtrahend < minuend ? minuend - subtrahend : 0)
> 
> WTH is a minuend or subtrahend? Are you a wordsmith in your spare time
> and like to make up your own words?
> 
> Also, isn't writing: x = max(0, x-y), far more readable to begin with?
> 

Ok. IIUC, max() does not handle negative numbers very well, and we don't need the type
overhead in max(), so I will still use my macro, but I won't be a wordsmith again :)

> > +/*
> > + * Group cfs_rq's load_avg is used for task_h_load and update_cfs_share
> > + * calc.
> > + */
> > +static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >  {
> > +	int decayed;
> >  
> > +	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> > +		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> > +		cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
> > +		r *= LOAD_AVG_MAX;
> > +		cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
> >  	}
> >  
> > +	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
> >  
> > +#ifndef CONFIG_64BIT
> > +	if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
> > +		smp_wmb();
> > +		cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
> > +	}
> > +#endif
> >  
> > +	return decayed;
> > +}
> 
> Its a bit unfortunate that we update the copy in another function than
> the original, but I think I see why you did that. But is it at all
> likely that we do not need to update? That is, does that compare make
> any sense?

I think we can assume last_update_time will mostly have changed, because it stays
unchanged in only two cases: 1) a negative delta time, 2) within one period (~1ms); these
two cases are seemingly a minority. So yes, we can save the compare.
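
That is, the copy update could simply be made unconditional, something like:

#ifndef CONFIG_64BIT
	smp_wmb();
	cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
#endif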


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28 17:19       ` Peter Zijlstra
@ 2014-07-29  1:13         ` Yuyang Du
  0 siblings, 0 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-29  1:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bsegall, mingo, linux-kernel, pjt, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Mon, Jul 28, 2014 at 07:19:09PM +0200, Peter Zijlstra wrote:
> > > And here we try and make good on that assumption. The thing I worry
> > > about is what happens if the machine is entirely idle...
> > >
> > > What guarantees an semi up-to-date cfs_rq->avg.last_update_time.
> > 
> > update_blocked_averages I think should do just as good a job as the old
> > code, which isn't perfect but is about as good as you can get worst case.
> 
> Right, that's called from rebalance_domains() which should more or less
> update this value on tick boundaries or thereabouts for most 'active'
> cpus.
> 
> But if the entire machine is idle, the first wakeup (if its a x-cpu one)
> might see a very stale timestamp.
> 
> If we can fix that, that would be good I suppose, but I'm not
> immediately seeing something pretty there, but you're right, it'd not be
> worse than the current situation.

What matters is that the time is up-to-date before load_avg is actually used. So yes, we should
have already achieved that.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-28 10:38     ` Peter Zijlstra
@ 2014-07-29  1:17       ` Yuyang Du
  2014-07-29 13:06         ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-29  1:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Morten Rasmussen, mingo, linux-kernel, pjt, bsegall,
	arjan.van.de.ven, len.brown, rafael.j.wysocki, alan.cox,
	mark.gross, fengguang.wu

On Mon, Jul 28, 2014 at 12:38:12PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 03:02:37AM +0800, Yuyang Du wrote:
> > > Another thing that might be an issue is that the blocked of a terminated
> > > task lives on for quite a while until has decayed away.
> > 
> > Good point. To do so, if I read correctly, we need to hook do_exit(), but probably
> > we are gonna encounter rq->lock issue.
> > 
> > What is the opinion/guidance from the maintainers/others?
> 
> So the entire point of this per entity tracking was to make sure load
> numbers reflect reality. We account migrations etc., it would be weird
> to then throw all that out the window and let task exit accumulate crap.
> 

Yes. So I will hook into do_exit(). Hope I did it right. Likewise, should group entities also
be handled in group destroy and group offline?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  9:12       ` Vincent Guittot
@ 2014-07-29  1:43         ` Yuyang Du
  2014-07-29 13:17           ` Vincent Guittot
  2014-07-29  9:39         ` Peter Zijlstra
  1 sibling, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-29  1:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Tue, Jul 29, 2014 at 11:12:37AM +0200, Vincent Guittot wrote:
> >>
> >> Do you really need to have *w for computing the load_sum ? can't you
> >> only use it when computing the load_avg ?
> >>
> >> sa->load_avg = div_u64(sa->load_sum * w , LOAD_AVG_MAX)
> >>
> >
> > For task, assuming its load.weight does not change much, yes, we can. But in theory, task's
> 
> I would even say that the load_avg of a task should not be impacted by
> an old priority value. Once, the priority of a task is changed, we
> should only take into account this new priority to weight the load_avg
> of the task
> 
> > load.weight can change, and *w in load_sum can take into that change. For group entity
> > and cfs_rq, its load.weight changes all the time, I don't know how to do it without *w
> > for load_sum.
> 
> IMHO, we should apply the same policy than the one i mentioned for
> task. So the load_avg of an entity or a cfs_rq will not be disturbed
> by an old but no more valid weight
> 

Well, I see your point. But the problem is that what matters is load_avg vs. load_avg, not a
load_avg by itself. So if load_avg1 discards the old weight when its weight changes, but load_avg2
has had no weight change (or has had one too), the comparison of load_avg1 vs. load_avg2 is not
fair; it is too impacted by the new weight. The point is, we count history, so count the
real history, which is the whole point of why we count the history. Make sense?

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  9:39         ` Peter Zijlstra
@ 2014-07-29  1:53           ` Yuyang Du
  2014-07-29 13:35             ` Peter Zijlstra
                               ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-29  1:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Tue, Jul 29, 2014 at 11:39:11AM +0200, Peter Zijlstra wrote:
> > > For task, assuming its load.weight does not change much, yes, we can. But in theory, task's
> > 
> > I would even say that the load_avg of a task should not be impacted by
> > an old priority value. Once, the priority of a task is changed, we
> > should only take into account this new priority to weight the load_avg
> > of the task
> 
> So for tasks I would immediately agree, and I think for groups too,
> seeing how the group weight is based off of this avg, if you then
> include the old weight we'll get a feedback loop. This might not be
> desired as it would counteract the SMP movement of tasks.

Only by including the old weight do we get the *right* feedback. Say that until the
weight is changed we are balanced; the changed weight then leads to imbalance. Without the
old weight, the imbalance is multiplied by the history, as if we had never been
balanced.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-27 17:36     ` Yuyang Du
@ 2014-07-29  9:12       ` Vincent Guittot
  2014-07-29  1:43         ` Yuyang Du
  2014-07-29  9:39         ` Peter Zijlstra
  0 siblings, 2 replies; 47+ messages in thread
From: Vincent Guittot @ 2014-07-29  9:12 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On 27 July 2014 19:36, Yuyang Du <yuyang.du@intel.com> wrote:
> Hi Vincent,
>
> On Fri, Jul 18, 2014 at 11:43:00AM +0200, Vincent Guittot wrote:
>> > @@ -2291,23 +2299,24 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
>> >         delta >>= 10;
>> >         if (!delta)
>> >                 return 0;
>> > -       sa->last_runnable_update = now;
>> > +       sa->last_update_time = now;
>> >
>> >         /* delta_w is the amount already accumulated against our next period */
>> > -       delta_w = sa->runnable_avg_period % 1024;
>> > +       delta_w = sa->period_contrib;
>> >         if (delta + delta_w >= 1024) {
>> > -               /* period roll-over */
>> >                 decayed = 1;
>> >
>> > +               /* how much left for next period will start over, we don't know yet */
>> > +               sa->period_contrib = 0;
>> > +
>> >                 /*
>> >                  * Now that we know we're crossing a period boundary, figure
>> >                  * out how much from delta we need to complete the current
>> >                  * period and accrue it.
>> >                  */
>> >                 delta_w = 1024 - delta_w;
>> > -               if (runnable)
>> > -                       sa->runnable_avg_sum += delta_w;
>> > -               sa->runnable_avg_period += delta_w;
>> > +               if (w)
>> > +                       sa->load_sum += w * delta_w;
>>
>> Do you really need to have *w for computing the load_sum ? can't you
>> only use it when computing the load_avg ?
>>
>> sa->load_avg = div_u64(sa->load_sum * w , LOAD_AVG_MAX)
>>
>
> For task, assuming its load.weight does not change much, yes, we can. But in theory, task's

I would even say that the load_avg of a task should not be impacted by
an old priority value. Once the priority of a task is changed, we
should take into account only the new priority when weighting the load_avg
of the task

> load.weight can change, and *w in load_sum can take into that change. For group entity
> and cfs_rq, its load.weight changes all the time, I don't know how to do it without *w
> for load_sum.

IMHO, we should apply the same policy as the one I mentioned for
tasks. So the load_avg of an entity or a cfs_rq will not be disturbed
by an old but no longer valid weight

Vincent
>
> Sorry for my irresponsiveness for last week. I was on vacation and unfortunately failed to
> connect VPN from where I was.
>
> Thanks,
> Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  9:12       ` Vincent Guittot
  2014-07-29  1:43         ` Yuyang Du
@ 2014-07-29  9:39         ` Peter Zijlstra
  2014-07-29  1:53           ` Yuyang Du
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-29  9:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Yuyang Du, mingo, linux-kernel, Paul Turner, Benjamin Segall,
	arjan.van.de.ven, Len Brown, rafael.j.wysocki, alan.cox, Gross,
	Mark, fengguang.wu

On Tue, Jul 29, 2014 at 11:12:37AM +0200, Vincent Guittot wrote:
> On 27 July 2014 19:36, Yuyang Du <yuyang.du@intel.com> wrote:
> > Hi Vincent,
> >
> > On Fri, Jul 18, 2014 at 11:43:00AM +0200, Vincent Guittot wrote:
> >> > @@ -2291,23 +2299,24 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
> >> >         delta >>= 10;
> >> >         if (!delta)
> >> >                 return 0;
> >> > -       sa->last_runnable_update = now;
> >> > +       sa->last_update_time = now;
> >> >
> >> >         /* delta_w is the amount already accumulated against our next period */
> >> > -       delta_w = sa->runnable_avg_period % 1024;
> >> > +       delta_w = sa->period_contrib;
> >> >         if (delta + delta_w >= 1024) {
> >> > -               /* period roll-over */
> >> >                 decayed = 1;
> >> >
> >> > +               /* how much left for next period will start over, we don't know yet */
> >> > +               sa->period_contrib = 0;
> >> > +
> >> >                 /*
> >> >                  * Now that we know we're crossing a period boundary, figure
> >> >                  * out how much from delta we need to complete the current
> >> >                  * period and accrue it.
> >> >                  */
> >> >                 delta_w = 1024 - delta_w;
> >> > -               if (runnable)
> >> > -                       sa->runnable_avg_sum += delta_w;
> >> > -               sa->runnable_avg_period += delta_w;
> >> > +               if (w)
> >> > +                       sa->load_sum += w * delta_w;
> >>
> >> Do you really need to have *w for computing the load_sum ? can't you
> >> only use it when computing the load_avg ?
> >>
> >> sa->load_avg = div_u64(sa->load_sum * w , LOAD_AVG_MAX)
> >>
> >
> > For task, assuming its load.weight does not change much, yes, we can. But in theory, task's
> 
> I would even say that the load_avg of a task should not be impacted by
> an old priority value. Once, the priority of a task is changed, we
> should only take into account this new priority to weight the load_avg
> of the task

So for tasks I would immediately agree, and I think for groups too,
seeing how the group weight is based off of this avg, if you then
include the old weight we'll get a feedback loop. This might not be
desired as it would counteract the SMP movement of tasks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  1:17       ` Yuyang Du
@ 2014-07-29 13:06         ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-29 13:06 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Morten Rasmussen, mingo, linux-kernel, pjt, bsegall,
	arjan.van.de.ven, len.brown, rafael.j.wysocki, alan.cox,
	mark.gross, fengguang.wu

On Tue, Jul 29, 2014 at 09:17:13AM +0800, Yuyang Du wrote:
> On Mon, Jul 28, 2014 at 12:38:12PM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 28, 2014 at 03:02:37AM +0800, Yuyang Du wrote:
> > > > Another thing that might be an issue is that the blocked of a terminated
> > > > task lives on for quite a while until has decayed away.
> > > 
> > > Good point. To do so, if I read correctly, we need to hook do_exit(), but probably
> > > we are gonna encounter rq->lock issue.
> > > 
> > > What is the opinion/guidance from the maintainers/others?
> > 
> > So the entire point of this per entity tracking was to make sure load
> > numbers reflect reality. We account migrations etc., it would be weird
> > to then throw all that out the window and let task exit accumulate crap.
> > 
> 
> Yes. So I will hook up do_exit. 

There is sched_class::task_dead(), pjt wanted to use that for this.
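
A minimal sketch of such a hook, reusing the removed_load_avg machinery from
this series (the catch-up __update_load_avg() against the cfs_rq clock, as done
in migrate_task_rq_fair(), is left out here):

	static void task_dead_fair(struct task_struct *p)
	{
		struct sched_entity *se = &p->se;
		struct cfs_rq *cfs_rq = cfs_rq_of(se);

		/* fold the dead task's remaining load into the pending removals */
		atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
	}

	/* hooked up via .task_dead = task_dead_fair in fair_sched_class */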

> Hope I did it right. Likewise, also do group entity
> in group destroy and group offline? 

group destroy, yes. Offline should mostly fix itself already due to it
actively migrating the actual load around I think.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  0:56     ` Yuyang Du
@ 2014-07-29 13:15       ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-29 13:15 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Tue, Jul 29, 2014 at 08:56:41AM +0800, Yuyang Du wrote:
> On Mon, Jul 28, 2014 at 12:48:37PM +0200, Peter Zijlstra wrote:
> > > +static __always_inline u64 decay_load(u64 val, u64 n)
> > > +{
> > > +	if (likely(val <= UINT_MAX))
> > > +		val = decay_load32(val, n);
> > > +	else {
> > > +		val *= (u32)decay_load32(1 << 15, n);
> > > +		val >>= 15;
> > > +	}
> > > +
> > > +	return val;
> > > +}
> > 
> > Please just use mul_u64_u32_shr().
> > 
> > /me continues reading the rest of it..
> 
> Good. Since 128bit is considered in mul_u64_u32_shr, load_sum can
> afford more tasks :)

96bit actually. While for 64bit platforms it uses the 64x64->128 mult, it
only uses 2 32x32->64 mults for 32bit, which isn't sufficient for 128 as
that would require 4.

It also reduces to 1 32x32->64 mult (on 32bit) in case val fits in
32bit.

Therefore it's as efficient as your code, but more accurate, for not
losing bits in the full (val is bigger than 32bit) case.
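
For reference, a decay_load() built directly on mul_u64_u32_shr() could look
roughly like this (keeping the existing runnable_avg_yN_inv[] table and
LOAD_AVG_PERIOD handling; a sketch, not the final patch):

	static __always_inline u64 decay_load(u64 val, u64 n)
	{
		unsigned int local_n;

		if (!n)
			return val;
		else if (unlikely(n > LOAD_AVG_PERIOD * 63))
			return 0;

		/* after the bounds check we can safely collapse to 32-bit */
		local_n = n;

		/*
		 * As y^LOAD_AVG_PERIOD = 1/2, fold the n/PERIOD halvings into a
		 * shift and look up y^(n % PERIOD) in runnable_avg_yN_inv[].
		 */
		if (unlikely(local_n >= LOAD_AVG_PERIOD)) {
			val >>= local_n / LOAD_AVG_PERIOD;
			local_n %= LOAD_AVG_PERIOD;
		}

		/* val * y^local_n without losing the high bits of val */
		return mul_u64_u32_shr(val, runnable_avg_yN_inv[local_n], 32);
	}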

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  1:43         ` Yuyang Du
@ 2014-07-29 13:17           ` Vincent Guittot
  2014-07-29 22:27             ` Yuyang Du
  0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-07-29 13:17 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On 29 July 2014 03:43, Yuyang Du <yuyang.du@intel.com> wrote:
> On Tue, Jul 29, 2014 at 11:12:37AM +0200, Vincent Guittot wrote:
>> >>
>> >> Do you really need to have *w for computing the load_sum ? can't you
>> >> only use it when computing the load_avg ?
>> >>
>> >> sa->load_avg = div_u64(sa->load_sum * w , LOAD_AVG_MAX)
>> >>
>> >
>> > For task, assuming its load.weight does not change much, yes, we can. But in theory, task's
>>
>> I would even say that the load_avg of a task should not be impacted by
>> an old priority value. Once, the priority of a task is changed, we
>> should only take into account this new priority to weight the load_avg
>> of the task
>>
>> > load.weight can change, and *w in load_sum can take into that change. For group entity
>> > and cfs_rq, its load.weight changes all the time, I don't know how to do it without *w
>> > for load_sum.
>>
>> IMHO, we should apply the same policy than the one i mentioned for
>> task. So the load_avg of an entity or a cfs_rq will not be disturbed
>> by an old but no more valid weight
>>
>
> Well, I see your point. But the problem is what matters is load_avg vs. load_avg, not a
> load_avg itself. So, if load_avg1 discards old weight if weight is changed, but load_avg2
> has no weight changed or has weight changed, the comparison load_avg1 vs. load_avg2 is not
> fair, but too impacted by the new weight. The point is, we count in history, so connt in the
> real history, which is the whole point of why we count the history. Make sense?

IIUC, you want to soften the impact of a weight change on cfs_rq->load_avg?

>
> Thanks,
> Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  1:09     ` Yuyang Du
@ 2014-07-29 13:19       ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-29 13:19 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Tue, Jul 29, 2014 at 09:09:45AM +0800, Yuyang Du wrote:
> > > +#define subtract_until_zero(minuend, subtrahend)	\
> > > +	(subtrahend < minuend ? minuend - subtrahend : 0)
> > 
> > WTH is a minuend or subtrahend? Are you a wordsmith in your spare time
> > and like to make up your own words?
> > 
> > Also, isn't writing: x = max(0, x-y), far more readable to begin with?
> > 
> 
> Ok. IIUC, max() does not handle minus number super good, and we don't need the type
> overhead in max(), so still use my macro, but won't be wordsmith again, :)

The 'type' muck is compile time, it doesn't generate any code.

And max() deals just fine with negative numbers assuming you use signed
types, which you could force with max_t().
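
For instance, the removal path could then read (the field types here are
assumptions):

	long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);

	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - r, 0);
	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - (s64)r * LOAD_AVG_MAX, 0);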

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29  1:53           ` Yuyang Du
@ 2014-07-29 13:35             ` Peter Zijlstra
  2014-07-29 15:55               ` Peter Zijlstra
  2014-07-29 23:08               ` Yuyang Du
  2014-07-31  9:40             ` Vincent Guittot
  2014-07-31  9:56             ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average Vincent Guittot
  2 siblings, 2 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-29 13:35 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Vincent Guittot, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Tue, Jul 29, 2014 at 09:53:44AM +0800, Yuyang Du wrote:
> On Tue, Jul 29, 2014 at 11:39:11AM +0200, Peter Zijlstra wrote:
> > > > For task, assuming its load.weight does not change much, yes, we can. But in theory, task's
> > > 
> > > I would even say that the load_avg of a task should not be impacted by
> > > an old priority value. Once, the priority of a task is changed, we
> > > should only take into account this new priority to weight the load_avg
> > > of the task
> > 
> > So for tasks I would immediately agree, and I think for groups too,
> > seeing how the group weight is based off of this avg, if you then
> > include the old weight we'll get a feedback loop. This might not be
> > desired as it would counteract the SMP movement of tasks.
> 
> Including the old weight is how we get the *right* feedback. Because, say, until
> the weight is changed we are balanced; the changed weight leads to imbalance. Without
> the old weight, the imbalance is multiplied by the history, as if we had never been
> balanced.

Does not compute, sorry. How would delaying the effect of migrations
help?

Suppose we have 2 cpus and 6 tasks. cpu0 has 2 tasks, cpu1 has 4 tasks.
the group weights are resp. 341 and 682. We compute we have an imbalance
of 341 and need to migrate 170 to equalize. We achieve this by moving
the 1 task, such that both cpus end up with 4 tasks.

After that we want to find weights of 512 and 512. But if we were to
consider old weights, we'd find 426 and 597 making it appear there is
still an imbalance. We could end up migrating more, only to later find
we overshot and now need to go back.

This is the classical ringing problem.

I also don't see any up-sides from doing this.
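
For reference, a back-of-the-envelope sketch (plain C, illustration only) that
reproduces the numbers in this example. Assumptions: tg->shares = 1024, all six
tasks carry equal load, and "considering old weights" is modelled as a
50%-decayed blend of the pre- and post-migration values:

#include <stdio.h>

int main(void)
{
        const int shares = 1024, tasks = 6;
        int before[2] = { 2, 4 };       /* tasks per cpu before balancing */
        int after[2]  = { 3, 3 };       /* tasks per cpu after moving one task */

        for (int cpu = 0; cpu < 2; cpu++) {
                int w_old = shares * before[cpu] / tasks;       /* 341, 682 */
                int w_new = shares * after[cpu] / tasks;        /* 512, 512 */
                int w_mix = (w_old + w_new) / 2;                /* 426, 597 */

                printf("cpu%d: old=%d new=%d half-decayed=%d\n",
                       cpu, w_old, w_new, w_mix);
        }
        return 0;
}

The half-decayed column is what still looks like an imbalance after the
migration, which is the ringing effect described above.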

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29 13:35             ` Peter Zijlstra
@ 2014-07-29 15:55               ` Peter Zijlstra
  2014-07-29 23:08               ` Yuyang Du
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-29 15:55 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Vincent Guittot, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Tue, Jul 29, 2014 at 03:35:10PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 09:53:44AM +0800, Yuyang Du wrote:
> > On Tue, Jul 29, 2014 at 11:39:11AM +0200, Peter Zijlstra wrote:
> > > > > For task, assuming its load.weight does not change much, yes, we can. But in theory, task's
> > > > 
> > > > I would even say that the load_avg of a task should not be impacted by
> > > > an old priority value. Once, the priority of a task is changed, we
> > > > should only take into account this new priority to weight the load_avg
> > > > of the task
> > > 
> > > So for tasks I would immediately agree, and I think for groups too,
> > > seeing how the group weight is based off of this avg, if you then
> > > include the old weight we'll get a feedback loop. This might not be
> > > desired as it would counteract the SMP movement of tasks.
> > 
> > Including the old weight is how we get the *right* feedback. Because, say, until
> > the weight is changed we are balanced; the changed weight leads to imbalance. Without
> > the old weight, the imbalance is multiplied by the history, as if we had never been
> > balanced.
> 
> Does not compute, sorry. How would delaying the effect of migrations
> help?
> 
> Suppose we have 2 cpus and 6 tasks. cpu0 has 2 tasks, cpu1 has 4 tasks.
> the group weights are resp. 341 and 682. We compute we have an imbalance
> of 341 and need to migrate 170 to equalize. We achieve this by moving
> the 1 task, such that both cpus end up with 4 tasks.

3 of course.

> After that we want to find weights of 512 and 512. But if we were to
> consider old weights, we'd find 426 and 597 making it appear there is
> still an imbalance. We could end up migrating more, only to later find
> we overshot and now need to go back.
> 
> This is the classical ringing problem.
> 
> I also don't see any up-sides from doing this.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29 13:17           ` Vincent Guittot
@ 2014-07-29 22:27             ` Yuyang Du
  2014-07-30  8:30               ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-29 22:27 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Tue, Jul 29, 2014 at 03:17:29PM +0200, Vincent Guittot wrote:
> >>
> >> IMHO, we should apply the same policy than the one i mentioned for
> >> task. So the load_avg of an entity or a cfs_rq will not be disturbed
> >> by an old but no more valid weight
> >>
> >
> > Well, I see your point. But the problem is what matters is load_avg vs. load_avg, not a
> > load_avg itself. So, if load_avg1 discards old weight if weight is changed, but load_avg2
> > has no weight changed or has weight changed, the comparison load_avg1 vs. load_avg2 is not
> > fair, but too impacted by the new weight. The point is, we count in history, so count in the
> > real history, which is the whole point of why we count the history. Make sense?
> 
> IIUC, you want to soften the impact of weight change on cfs_rq-> load_avg ?
> 

Yes, that would be the effect.

Isn't the entire effort starting from PJT and Ben up to now to soften the extremely
dynamic changes (runnable or not, weight change, etc)? Assume task does not change
weight much, but group entity does as Peter mentioned.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29 13:35             ` Peter Zijlstra
  2014-07-29 15:55               ` Peter Zijlstra
@ 2014-07-29 23:08               ` Yuyang Du
  1 sibling, 0 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-29 23:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Tue, Jul 29, 2014 at 03:35:10PM +0200, Peter Zijlstra wrote:
> 
> Does not compute, sorry. How would delaying the effect of migrations
> help?
> 
> Suppose we have 2 cpus and 6 tasks. cpu0 has 2 tasks, cpu1 has 4 tasks.
> the group weights are resp. 341 and 682. We compute we have an imbalance
> of 341 and need to migrate 170 to equalize. We achieve this by moving
> the 1 task, such that both cpus end up with 4 tasks.
> 
> After that we want to find weights of 512 and 512. But if we were to
> consider old weights, we'd find 426 and 597 making it appear there is
> still an imbalance. We could end up migrating more, only to later find
> we overshot and now need to go back.
> 
> This is the classical ringing problem.
> 
> I also don't see any up-sides from doing this.

I am not sure I understand your example, but it seems to be about group weight
distribution. Since in migration we migrate the load with the task, the 170 will be
moved outright. So the new weight/share distribution would immediately be
512 and 512.

But for the group entity's parent cfs_rq, in terms of how this group entity
contributes to load in the infinite decaying series, it would be,
e.g., 341 before the migration and then 512 thereafter.

Hope this graph helps:

CPU1 <--> cfs_rq1                         CPU2 <--> cfs_rq2
             |                                         |
      |------------|                            |------------|
tsk_entity1   tg_entity1 <--> tg_cfs_rq1  tsk_entity2   tg_entity2 <--> tg_cfs_rq2
                                  |                                         |
                            tsk_entity3                               tsk_entity4

Then On CPU1:

cfs_rq1->avg.load_avg = tsk_entity1->avg.load_avg + tg_entity1->avg.load_avg

tg_cfs_rq1->avg.load_avg = tsk_entity3->avg.load_avg

tg_entity1's weight = tg_cfs_rq1->avg.load_avg / (tg_cfs_rq1->avg.load_avg + tg_cfs_rq2->avg.load_avg)

Same for things on CPU2.
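
A rough numeric reading of these relations, with made-up values (illustration
only, not kernel code; assumptions: tg->shares = 1024 and the group entities
are runnable nearly all the time, so a group entity's load_avg is close to its
weight):

#include <stdio.h>

int main(void)
{
        unsigned long shares = 1024;
        unsigned long tsk1 = 400, tsk3 = 300;        /* hypothetical task load_avg values */
        unsigned long tg_rq1 = tsk3, tg_rq2 = 700;   /* tg_cfs_rq1/2->avg.load_avg */

        /* tg_entity weight follows its cfs_rq's share of the group's load */
        unsigned long tg_e1 = shares * tg_rq1 / (tg_rq1 + tg_rq2);   /* ~307 */
        unsigned long tg_e2 = shares * tg_rq2 / (tg_rq1 + tg_rq2);   /* ~716 */

        printf("tg_entity1 weight ~ %lu, tg_entity2 weight ~ %lu\n", tg_e1, tg_e2);
        printf("cfs_rq1 load_avg  ~ %lu\n", tsk1 + tg_e1);   /* tsk_entity1 + tg_entity1 */
        return 0;
}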


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-30  8:30               ` Peter Zijlstra
@ 2014-07-30  0:40                 ` Yuyang Du
  0 siblings, 0 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-30  0:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

On Wed, Jul 30, 2014 at 10:30:08AM +0200, Peter Zijlstra wrote:
> > 
> > Isn't the entire effort starting from PJT and Ben up to now to soften the extremely
> > dynamic changes (runnable or not, weight change, etc)? Assume task does not change
> > weight much, but group entity does as Peter mentioned.
> 
> No, softening isn't the point at all. But an integrator is the only
> means of predicting the future given the erratic past.
> 
> The whole point we got into this game is to better compute per cpu group
> weights, not to soften stuff, that's just a necessary evil to more
> accurately predict erratic/unknown behaviour.
> 
> 
Yes, I totally agree. I think what I meant by "soften" is the *effect* of the integrator
that takes/averages the infinite history to predict the future.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-29 22:27             ` Yuyang Du
@ 2014-07-30  8:30               ` Peter Zijlstra
  2014-07-30  0:40                 ` Yuyang Du
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-30  8:30 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Vincent Guittot, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

On Wed, Jul 30, 2014 at 06:27:52AM +0800, Yuyang Du wrote:
> On Tue, Jul 29, 2014 at 03:17:29PM +0200, Vincent Guittot wrote:
> > >>
> > >> IMHO, we should apply the same policy than the one i mentioned for
> > >> task. So the load_avg of an entity or a cfs_rq will not be disturbed
> > >> by an old but no more valid weight
> > >>
> > >
> > > Well, I see your point. But the problem is what matters is load_avg vs. load_avg, not a
> > > load_avg itself. So, if load_avg1 discards old weight if weight is changed, but load_avg2
> > > has no weight changed or has weight changed, the comparison load_avg1 vs. load_avg2 is not
> > > fair, but too impacted by the new weight. The point is, we count in history, so count in the
> > > real history, which is the whole point of why we count the history. Make sense?
> > 
> > IIUC, you want to soften the impact of weight change on cfs_rq-> load_avg ?
> > 
> 
> Yes, that would be the effect.
> 
> Isn't the entire effort starting from PJT and Ben up to now to soften the extremely
> dynamic changes (runnable or not, weight change, etc)? Assume task does not change
> weight much, but group entity does as Peter mentioned.

No, softening isn't the point at all. But an integrator is the only
means of predicting the future given the erratic past.

The whole point we got into this game is to better compute per cpu group
weights, not to soften stuff, that's just a necessary evil to more
accurately predict erratic/unknown behaviour.



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-27 19:02   ` Yuyang Du
  2014-07-28 10:38     ` Peter Zijlstra
@ 2014-07-30 10:13     ` Morten Rasmussen
  2014-07-30 10:21       ` Peter Zijlstra
  2014-07-30 19:17       ` Yuyang Du
  1 sibling, 2 replies; 47+ messages in thread
From: Morten Rasmussen @ 2014-07-30 10:13 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Sun, Jul 27, 2014 at 08:02:37PM +0100, Yuyang Du wrote:
> Hi Morten,
> 
> On Fri, Jul 18, 2014 at 04:39:31PM +0100, Morten Rasmussen wrote:
> > 1. runnable_avg_period is removed
> > 
> > load_avg_contrib used to be runnable_avg_sum/runnable_avg_period scaled
> > by the task load weight (priority). The runnable_avg_period is replaced
> > by a constant in this patch set. The effect of that change is that task
> > load tracking is no longer more sensitive early in the life of the task until
> > it has built up some history. Tasks are now initialized to start out as
> > if they have been runnable forever (>345ms). If this assumption about
> > the task behavior is wrong it will take longer to converge to the true
> > average than it did before. The upside is that it is more stable.
> 
> I think "Give new task start runnable values to heavy its load in infant time"
> in general is good, with an emphasis on infant. Or, from the opposite direction, making it
> zero and letting it gain runnable weight looks worse than giving it full weight.

Initializing tasks to have full weight is current behaviour, which I
agree with. However, with your changes (dropping runnable_avg_period) it
may take longer for the tracked load of new tasks to converge to the
true load of the task. I don't think it is a big deal, but it is a
change compared to the current implementation.
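
A rough userspace simulation of this convergence point (plain C, illustration
only; assumptions: 1024us periods, y^32 = 1/2, a new task that is really
runnable 25% of the time, and the duty cycle modelled as a fractional
contribution per period):

#include <stdio.h>
#include <math.h>

int main(void)
{
        const double y = pow(0.5, 1.0 / 32.0);
        const double max = 1024.0 / (1.0 - y);  /* the "runnable forever" sum, ~47k */
        const double duty = 0.25;               /* the task's true runnable fraction */
        double full = max, empty = 0.0;         /* seeded with full vs. no history */

        for (int ms = 1; ms <= 512; ms++) {
                full  = full  * y + duty * 1024.0;
                empty = empty * y + duty * 1024.0;
                if ((ms % 64) == 0)
                        printf("%4d ms: full-init=%3.0f%%  empty-init=%3.0f%%  (true=25%%)\n",
                               ms, 100.0 * full / max, 100.0 * empty / max);
        }
        return 0;
}

The full-history seed overestimates a lightly loaded new task for a couple of
hundred ms before settling, which is the slower convergence (and the extra
stability) being traded off here.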

> 
> > 2. runnable_load_avg and blocked_load_avg are combined
> > 
> > runnable_load_avg currently represents the sum of load_avg_contrib of
> > all tasks on the rq, while blocked_load_avg is the sum of those tasks
> > not on a runqueue. It makes perfect sense to consider the sum of both
> > when calculating the load of a cpu, but we currently don't include
> > blocked_load_avg. The reason for that is the priority scaling of the
> > task load_avg_contrib may lead to under-utilization of cpus that
> > occasionally have tiny high priority task running. You can easily have a
> > task that takes 5% of cpu time but has a load_avg_contrib several times
> > larger than a default priority task runnable 100% of the time.
> 
> So this is the effect of historical averaging and weight scaling, both of which
> are just generally good, but may have bad cases.

I don't agree that weight scaling is generally good. There have been
several threads discussing that topic over the last half year or so. It
is there to ensure smp niceness, but it makes load-balancing on systems
which are not fully utilized sub-optimal. You may end up with some cpus
not being fully utilized while others are over-utilized when you have
multiple tasks running at different priorities.

It is a very real problem when user-space uses priorities extensively
like Android does. Tasks related to audio run at very high priorities
but only for a very short amount of time, but due to the priority
scaling their load ends up being several times higher than tasks running
all the time at normal priority. Hence task load is a very poor
indicator of utilization.

> > Another thing that might be an issue is that the blocked load of a terminated
> > task lives on for quite a while until it has decayed away.
> 
> Good point. To do so, if I read correctly, we need to hook do_exit(), but probably
> we are gonna encounter an rq->lock issue.
> 
> What is the opinion/guidance from the maintainers/others?
>  
> > I'm all for taking the blocked load into consideration, but this issue
> > has to be resolved first. Which leads me on to the next thing.
> > 
> > Most of the work going on around energy awareness is based on the load
> > tracking to estimate task and cpu utilization. It seems that most of the
> > involved parties think that we need an unweighted variant of the tracked
> > load as well as tracking the running time of a task. The latter was part
> > of the original proposal by pjt and Ben, but wasn't used. It seems that
> > unweighted runnable tracking should be fairly easy to add to your
> > proposal, but I don't have an overview of whether it is possible to add
> > running tracking. Do you think that is possible?
> > 
> 
> Running tracking is absolutely possible, just a matter of minimizing the overhead
> (how to do it along with runnable tracking for the task and maybe for the CPU, but not for
> the cfs_rq) from an execution and code cleanliness point of view. We can do it as soon as
> it is needed.

From a coding point of view it is very easy to add to the current
load-tracking. We have already discussed putting it back in to enable
better tracking of utilization. It is quite likely needed for the
energy-awareness improvements and also to fix the priority scaling
problem described above. 

IMHO, the above things need to be considered as part of a rewrite of the
load-tracking implementation otherwise we risk having to change it again
soon.

Morten


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-30 10:13     ` Morten Rasmussen
@ 2014-07-30 10:21       ` Peter Zijlstra
  2014-07-30 10:57         ` Morten Rasmussen
  2014-07-30 19:17       ` Yuyang Du
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-07-30 10:21 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Yuyang Du, mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

[-- Attachment #1: Type: text/plain, Size: 570 bytes --]

On Wed, Jul 30, 2014 at 11:13:31AM +0100, Morten Rasmussen wrote:
> It is a very real problem when user-space uses priorities extensively
> like Android does. Tasks related to audio run at very high priorities
> but only for a very short amount of time, but due to the priority
> scaling their load ends up being several times higher than tasks running
> all the time at normal priority. Hence task load is a very poor
> indicator of utilization.

FWIW Android (which I think does this through binder) is utterly insane
in this regard. But yes, we have to live with it.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-30 10:21       ` Peter Zijlstra
@ 2014-07-30 10:57         ` Morten Rasmussen
  0 siblings, 0 replies; 47+ messages in thread
From: Morten Rasmussen @ 2014-07-30 10:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Wed, Jul 30, 2014 at 11:21:28AM +0100, Peter Zijlstra wrote:
> On Wed, Jul 30, 2014 at 11:13:31AM +0100, Morten Rasmussen wrote:
> > It is a very real problem when user-space uses priorities extensively
> > like Android does. Tasks related to audio run at very high priorities
> > but only for a very short amount of time, but due to the priority
> > scaling their load ends up being several times higher than tasks running
> > all the time at normal priority. Hence task load is a very poor
> > indicator of utilization.
> 
> FWIW Android (which I think does this through binder) is utterly insane
> in this regard. But yes, we have to live with it.

I agree that its use of priorities seems a bit extreme. I'm not trying
to defend it in any way :-) 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-30 10:13     ` Morten Rasmussen
  2014-07-30 10:21       ` Peter Zijlstra
@ 2014-07-30 19:17       ` Yuyang Du
  2014-07-31  8:54         ` Morten Rasmussen
  1 sibling, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-30 19:17 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

Hi Morten,

On Wed, Jul 30, 2014 at 11:13:31AM +0100, Morten Rasmussen wrote:
> > > 2. runnable_load_avg and blocked_load_avg are combined
> > > 
> > > runnable_load_avg currently represents the sum of load_avg_contrib of
> > > all tasks on the rq, while blocked_load_avg is the sum of those tasks
> > > not on a runqueue. It makes perfect sense to consider the sum of both
> > > when calculating the load of a cpu, but we currently don't include
> > > blocked_load_avg. The reason for that is the priority scaling of the
> > > task load_avg_contrib may lead to under-utilization of cpus that
> > > occasionally have tiny high priority task running. You can easily have a
> > > task that takes 5% of cpu time but has a load_avg_contrib several times
> > > larger than a default priority task runnable 100% of the time.
> > 
> > So this is the effect of historical averaging and weight scaling, both of which
> > are just generally good, but may have bad cases.
> 
> I don't agree that weight scaling is generally good. There have been
> several threads discussing that topic over the last half year or so. It
> is there to ensure smp niceness, but it makes load-balancing on systems
> which are not fully utilized sub-optimal. You may end up with some cpus
> not being fully utilized while others are over-utilized when you have
> multiple tasks running at different priorities.
> 
> It is a very real problem when user-space uses priorities extensively
> like Android does. Tasks related to audio run at very high priorities
> but only for a very short amount of time, but due to the priority
> scaling their load ends up being several times higher than tasks running
> all the time at normal priority. Hence task load is a very poor
> indicator of utilization.
 
I understand the problem you raised, but the problem is not described crystal clearly.

You are saying tasks with big weight contribute too much, even when they are running
for a short time. But is it unfair or does it lead to imbalance? It is hard to say yes,
if the answer is not simply no. They have big weight, so they are supposed to be "unfair"
vs. small-weight tasks for the sake of fairness. In addition, since they run for a short
time, their runnable weight/load is offset by that factor.

I think I am speaking from a pure fairness point of view, which is just generally good
in the sense that we can't think of a more "generally good" thing to replace it.

And you are saying that when a big-weight task is not runnable but already contributes
"too much" load, that leads to under-utilization. So this is a matter of our
prediction algorithm. I am afraid I will say again that the prediction is generally
good. For the audio example, which is strictly periodic, it just can't be better.

FWIW, I am really not sure how serious this under-utilization problem is in the real
world.

I am not saying your argument does not make sense. It makes every sense from a
specific-case point of view. I do think there absolutely can be sub-optimal cases. But as
I said, I just don't think the problem description is clear enough so that we know
it is worth solving (by a pros and cons comparison) and how to solve it, either
generally or specifically.

Plus, as Peter said, we have to live with user space using big weights, and handle
weight as it is supposed to be handled.

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-31  8:54         ` Morten Rasmussen
@ 2014-07-31  2:15           ` Yuyang Du
  0 siblings, 0 replies; 47+ messages in thread
From: Yuyang Du @ 2014-07-31  2:15 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Thu, Jul 31, 2014 at 09:54:21AM +0100, Morten Rasmussen wrote:
> 
> Overall, it is not clear to me why it is necessary to rewrite the
> per-entity load-tracking. The code is somewhat simpler, but I don't see
> any functional additions/improvements. If we have to go through a long
> review and testing process, why not address some of the most obvious
> issues with the existing implementation while we are at it? I don't see
> the point in replacing something sub-optimal with equally sub-optimal
> (or worse).
> 

This is absolutely nonsense. First, we do have improvements. Second, even
with no functional additions, do you really understand what has been
changed besides becoming simpler? Even just simpler, simpler means a lot of things...

> > I do think there absolutely can be sub-optimal cases.

I said there absolutely can be sub-optimal cases, which referred exactly to
the example you gave (one 10% 88761 task vs. eight 100% 1024 tasks). Still, the links
do not say anything about how serious it is. Does it exist? Yes. Is it serious? I don't know.

> > But as I said, I just don't think the problem description is clear enough.

I said your description is not clear enough, and at the time I was not
clear either. Arguably and sadly, none of what you said in this response
made even a tiny bit of progress. About blocked load, prediction, ..., can you
be more wrong?

The problem is not weight scaling. The problem is how weight is accumulated
when not runnable. Why? Consider this: if all tasks are always runnable,
weight scaling can't be more right.

WRT runnable weight, currently it is runnable% * weight (simplified).
Since weight has such a big range, it dwarfs the runnable time ratio. So maybe what
can be done is (what I have in mind):

1) runnable%^2 * weight
2) bigger weight does faster decay

Still, if you can prove the issue is serious, we can try something..., but
nothing is perfect.
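
For what it is worth, a quick illustration (hypothetical, plain C, not a
concrete patch) of what idea 1) would do to the nice=-20 example discussed
elsewhere in this thread, a task runnable 10% of the time with weight 88761
versus a nice=0 task (weight 1024) runnable 100% of the time:

#include <stdio.h>

int main(void)
{
        const double w_hi = 88761.0, w_lo = 1024.0;
        const double r_hi = 0.10, r_lo = 1.0;   /* runnable fractions */

        printf("current  (r * w)  : hi=%6.0f  lo=%6.0f\n",
               r_hi * w_hi, r_lo * w_lo);               /* ~8876 vs 1024 */
        printf("idea 1) (r^2 * w) : hi=%6.0f  lo=%6.0f\n",
               r_hi * r_hi * w_hi, r_lo * r_lo * w_lo); /* ~888 vs 1024 */
        return 0;
}

Squaring the runnable fraction would pull the rarely-running heavy task below
the always-running nice=0 task, i.e. it damps exactly the case Morten is
worried about, though it also changes what load means for every task.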

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking
  2014-07-30 19:17       ` Yuyang Du
@ 2014-07-31  8:54         ` Morten Rasmussen
  2014-07-31  2:15           ` Yuyang Du
  0 siblings, 1 reply; 47+ messages in thread
From: Morten Rasmussen @ 2014-07-31  8:54 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, peterz, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu

On Wed, Jul 30, 2014 at 08:17:39PM +0100, Yuyang Du wrote:
> Hi Morten,
> 
> On Wed, Jul 30, 2014 at 11:13:31AM +0100, Morten Rasmussen wrote:
> > > > 2. runnable_load_avg and blocked_load_avg are combined
> > > > 
> > > > runnable_load_avg currently represents the sum of load_avg_contrib of
> > > > all tasks on the rq, while blocked_load_avg is the sum of those tasks
> > > > not on a runqueue. It makes perfect sense to consider the sum of both
> > > > when calculating the load of a cpu, but we currently don't include
> > > > blocked_load_avg. The reason for that is the priority scaling of the
> > > > task load_avg_contrib may lead to under-utilization of cpus that
> > > > occasionally have tiny high priority task running. You can easily have a
> > > > task that takes 5% of cpu time but has a load_avg_contrib several times
> > > > larger than a default priority task runnable 100% of the time.
> > > 
> > > So this is the effect of historical averaging and weight scaling, both of which
> > > are just generally good, but may have bad cases.
> > 
> > I don't agree that weight scaling is generally good. There have been
> > several threads discussing that topic over the last half year or so. It
> > is there to ensure smp niceness, but it makes load-balancing on systems
> > which are not fully utilized sub-optimal. You may end up with some cpus
> > not being fully utilized while others are over-utilized when you have
> > multiple tasks running at different priorities.
> > 
> > It is a very real problem when user-space uses priorities extensively
> > like Android does. Tasks related to audio run at very high priorities
> > but only for a very short amount of time, but due to the priority
> > scaling their load ends up being several times higher than tasks running
> > all the time at normal priority. Hence task load is a very poor
> > indicator of utilization.
>  
> I understand the problem you raised, but the problem is not described crystal clearly.
>
> You are saying tasks with big weight contribute too much, even when they are running
> for a short time. But is it unfair or does it lead to imbalance? It is hard to say yes,
> if the answer is not simply no. They have big weight, so they are supposed to be "unfair"
> vs. small-weight tasks for the sake of fairness. In addition, since they run for a short
> time, their runnable weight/load is offset by that factor.

It does lead to imbalance and the problem is indeed very real as I
already said. It has been discussed numerous times before:

https://lkml.org/lkml/2014/5/28/264
https://lkml.org/lkml/2014/1/8/251

Summary:

Default priority (nice=0) has a weight of 1024. nice=-20 has a weight of
88761. So a nice=-20 task that runs ~10% of the time has a load contribution
of ~8876, which is >8x the weight of a nice=0 task that runs 100% of the
time. Load contribution is used for load-balancing, which means that you
will put at least eight 100% nice=0 tasks on a cpu before you start
putting any additional tasks on the cpu with the nice=-20 task. So you
over-subscribe one cpu by 700% while another is idle 90% of the time.
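
Spelled out as a tiny illustration (plain C, not kernel code), the arithmetic
in the summary above:

#include <stdio.h>

int main(void)
{
        const int w_nice_m20 = 88761;   /* nice=-20 weight */
        const int w_nice_0   = 1024;    /* nice=0 (default) weight */

        int contrib_m20 = w_nice_m20 / 10;      /* runnable ~10% of the time */
        int contrib_0   = w_nice_0;             /* runnable 100% of the time */

        printf("nice=-20 @ 10%% : %d\n", contrib_m20);  /* ~8876 */
        printf("nice=0   @ 100%%: %d\n", contrib_0);    /*  1024 */
        printf("nice=0 tasks needed to match: %d\n",
               contrib_m20 / contrib_0);                /* 8 */
        return 0;
}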

You may argue that this is 'fair', but it is very much waste of
resources. Putting nice=0 tasks on the same cpu as the nice=-20 task
will have nearly no effect on the cpu time allocated to the nice=-20 task
due to the vruntime scaling. Hence there is virtually no downside in
terms of giving priority and a lot to be gained in terms of throughput.

Generally, we don't have to care about priority as long as no cpu is
fully utilized. All tasks get the cpu time they need.

The problem with considering blocked priority-scaled load is that the
load doesn't disappear while the task is blocked, so it effectively
reserves too much cpu time for high priority tasks.

A real work use-case where this happens is described here:

https://lkml.org/lkml/2014/1/7/358

> I think I am speaking from a pure fairness point of view, which is just generally good
> in the sense that we can't think of a more "generally good" thing to replace it.

Unweighted utilization. As said above, we only need to care about
priority when cpus are fully utilized. It doesn't break any fairness.

> And you are saying that when a big-weight task is not runnable but already contributes
> "too much" load, that leads to under-utilization. So this is a matter of our
> prediction algorithm. I am afraid I will say again that the prediction is generally
> good. For the audio example, which is strictly periodic, it just can't be better.

I disagree. The priority-scaled prediction is generally bad. Why reserve
up to 88x more cpu time for a task than is actually needed, when
the unweighted load tracking (utilization) is readily available?

> FWIW, I am really not sure how serious this under-utilization problem is in the real
> world.

Again, it is indeed a real world problem. We have experienced it first
hand and have been experimenting with this over the last 2-3 years. I'm
not making this up.

We have included unweighted load (utilization) in our RFC patch set for
the same reason. And the out-of-tree big.LITTLE solution carries similar
patches too.

> I am not saying your argument does not make sense. It makes every sense from a
> specific-case point of view. I do think there absolutely can be sub-optimal cases. But as
> I said, I just don't think the problem description is clear enough so that we know
> it is worth solving (by a pros and cons comparison) and how to solve it, either
> generally or specifically.

I didn't repeat the whole history in my first response as I thought this
had already been debated several times and we had reached agreement that
it is indeed a problem. You are not the first one to propose including
priority scaled blocked load in the load estimation.

> Plus, as Peter said, we have to live with user space using big weights, and handle
> weight as it is supposed to be handled.

I don't follow. Are you saying it is fine to intentionally make
load-balancing worse for any user-space that uses task priorities other
than default?

You can't just ignore users of task priority. You may have the point of
view that you don't care about under-utilization, but there are lots of
users who do. Optimizing for energy consumption is a primary goal for
the mobile space (and servers seem to be moving that way too). This
requires more accurate estimates of cpu utilization to manage how many
cpus are needed. Ignoring priority scaling is moving in the exact
opposite direction and conflicts with other ongoing efforts.

Overall, it is not clear to me why it is necessary to rewrite the
per-entity load-tracking. The code is somewhat simpler, but I don't see
any functional additions/improvements. If we have to go through a long
review and testing process, why not address some of the most obvious
issues with the existing implementation while we are at it? I don't see
the point in replacing something sub-optimal with equally sub-optimal
(or worse).

Morten

^ permalink raw reply	[flat|nested] 47+ messages in thread

* (no subject)
  2014-07-29  1:53           ` Yuyang Du
  2014-07-29 13:35             ` Peter Zijlstra
@ 2014-07-31  9:40             ` Vincent Guittot
  2014-07-31  9:56             ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average Vincent Guittot
  2 siblings, 0 replies; 47+ messages in thread
From: Vincent Guittot @ 2014-07-31  9:40 UTC (permalink / raw)
  To: yuyang.du, peterz
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, engguang.wu,
	morten.rasmussen, Vincent Guittot

Hi Yuyang,

Does something like the patch below, to be applied on top of your patchset, seem
like a reasonable add-on?

It adds 1 new usage_sum statistic, which is something that I use to detect the
overload of a rq in my patchset that reworks cpu_power and removes
capacity_factor.

And I think that the change I made on load_sum should match some of Morten's
concerns

Regards,
Vincent

---
Subject: [PATCH] sched: add usage_sum statistic

Add a new statistic that reflects the average time a task is running on a CPU.

load_sum is now the average runnable time before being weighted

The sum of the usage_sum of the tasks that are on a rq is used to detect
the overload of a rq.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 47 +++++++++++++++++++++++++++++++++++------------
 kernel/sched/sched.h  |  2 ++
 3 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b6617a1..3296e76 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1080,6 +1080,7 @@ struct sched_avg {
 	 */
 	u64 last_update_time;
 	u64 load_sum;
+	unsigned long usage_sum;
 	unsigned long load_avg;
 	u32 period_contrib;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a3a3168..78408a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -679,7 +679,8 @@ void init_task_runnable_average(struct task_struct *p)
 	 */
 	sa->period_contrib = 1023;
 	sa->load_avg = p->se.load.weight;
-	sa->load_sum = p->se.load.weight * LOAD_AVG_MAX;
+	sa->load_sum = sa->usage_sum = LOAD_AVG_MAX;
+	;
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 #else
@@ -2300,7 +2301,7 @@ static u32 __compute_runnable_contrib(u64 n)
  *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
  */
 static __always_inline int
-__update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
+__update_load_avg(u64 now, struct sched_avg *sa, unsigned long w, int running)
 {
 	u64 delta, periods;
 	u32 contrib;
@@ -2340,7 +2341,9 @@ __update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
 		 */
 		delta_w = 1024 - delta_w;
 		if (w)
-			sa->load_sum += w * delta_w;
+			sa->load_sum += delta_w;
+		if (running)
+			sa->usage_sum += delta_w;
 
 		delta -= delta_w;
 
@@ -2349,21 +2352,26 @@ __update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
 		delta %= 1024;
 
 		sa->load_sum = decay_load(sa->load_sum, periods + 1);
+		sa->usage_sum = decay_load(sa->usage_sum, periods + 1);
 
 		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
 		contrib = __compute_runnable_contrib(periods);
 		if (w)
-			sa->load_sum += w * contrib;
+			sa->load_sum += contrib;
+		if (running)
+			sa->usage_sum += contrib;
 	}
 
 	/* Remainder of delta accrued against u_0` */
 	if (w)
-		sa->load_sum += w * delta;
+		sa->load_sum +=  delta;
+	if (running)
+		sa->usage_sum += delta;
 
 	sa->period_contrib += delta;
 
 	if (decayed)
-		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+		sa->load_avg = div_u64(sa->load_sum * w, LOAD_AVG_MAX);
 
 	return decayed;
 }
@@ -2404,11 +2412,17 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
 		cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
-		r *= LOAD_AVG_MAX;
+	}
+	if (atomic_long_read(&cfs_rq->removed_load_sum)) {
+		long r = atomic_long_xchg(&cfs_rq->removed_load_sum, 0);
 		cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
 	}
+	if (atomic_long_read(&cfs_rq->removed_usage_sum)) {
+		long r = atomic_long_xchg(&cfs_rq->removed_usage_sum, 0);
+		cfs_rq->avg.usage_sum = subtract_until_zero(cfs_rq->avg.usage_sum, r);
+	}
 
-	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
+	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight, cfs_rq->curr != NULL);
 
 #ifndef CONFIG_64BIT
 	if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
@@ -2430,7 +2444,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 	 * Track task load average for carrying it to new CPU after migrated,
 	 * and group sched_entity for task_h_load calc in migration
 	 */
-	__update_load_avg(now, &se->avg, se->on_rq * se->load.weight);
+	__update_load_avg(now, &se->avg, se->on_rq * se->load.weight,
+			entity_is_task(se) ? task_of(se)->on_cpu : 0);
 
 	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
 		update_tg_load_avg(cfs_rq);
@@ -2451,13 +2466,14 @@ static inline void enqueue_entity_load_avg(struct sched_entity *se)
 			migrated = 1;
 	}
 	else
-		__update_load_avg(now, sa, se->on_rq * se->load.weight);
+		__update_load_avg(now, sa, se->on_rq * se->load.weight, entity_is_task(se) ? task_of(se)->on_cpu : 0);
 
 	decayed = update_cfs_rq_load_avg(now, cfs_rq);
 
 	if (migrated) {
 		cfs_rq->avg.load_avg += sa->load_avg;
 		cfs_rq->avg.load_sum += sa->load_sum;
+		cfs_rq->avg.usage_sum += sa->usage_sum;
 	}
 
 	if (decayed || migrated)
@@ -4442,8 +4458,10 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 #else
 	last_update_time = cfs_rq->avg.last_update_time;
 #endif
-	__update_load_avg(last_update_time, &se->avg, 0);
+	__update_load_avg(last_update_time, &se->avg, 0, p->on_cpu);
 	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
+	atomic_long_add(se->avg.load_sum, &cfs_rq->removed_load_sum);
+	atomic_long_add(se->avg.usage_sum, &cfs_rq->removed_usage_sum);
 
 	/*
 	 * We are supposed to update the task to "current" time, then its up to date
@@ -7316,11 +7334,13 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 	* Remove our load from contribution when we leave cfs_rq.
 	*/
 	__update_load_avg(cfs_rq->avg.last_update_time, &se->avg,
-		se->on_rq * se->load.weight);
+		se->on_rq * se->load.weight, p->on_cpu);
 	cfs_rq->avg.load_avg =
 		subtract_until_zero(cfs_rq->avg.load_avg, se->avg.load_avg);
 	cfs_rq->avg.load_sum =
 		subtract_until_zero(cfs_rq->avg.load_sum, se->avg.load_sum);
+	cfs_rq->avg.usage_sum =
+		subtract_until_zero(cfs_rq->avg.usage_sum, se->avg.usage_sum);
 #endif
 }
 
@@ -7378,6 +7398,8 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #endif
 #ifdef CONFIG_SMP
 	atomic_long_set(&cfs_rq->removed_load_avg, 0);
+	atomic_long_set(&cfs_rq->removed_load_sum, 0);
+	atomic_long_set(&cfs_rq->removed_usage_sum, 0);
 #endif
 }
 
@@ -7428,6 +7450,7 @@ static void task_move_group_fair(struct task_struct *p, int on_rq)
 		p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
 		cfs_rq->avg.load_avg += p->se.avg.load_avg;
 		cfs_rq->avg.load_sum += p->se.avg.load_sum;
+		cfs_rq->avg.usage_sum += p->se.avg.usage_sum;
 #endif
 	}
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f21ddde..1bdd878 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -335,6 +335,8 @@ struct cfs_rq {
 	struct sched_avg avg;
 	unsigned long tg_load_avg_contrib;
 	atomic_long_t removed_load_avg;
+	atomic_long_t removed_load_sum;
+	atomic_long_t removed_usage_sum;
 #ifndef CONFIG_64BIT
 	u64 load_last_update_time_copy;
 #endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average
  2014-07-29  1:53           ` Yuyang Du
  2014-07-29 13:35             ` Peter Zijlstra
  2014-07-31  9:40             ` Vincent Guittot
@ 2014-07-31  9:56             ` Vincent Guittot
  2014-07-31 19:16               ` Yuyang Du
  2 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-07-31  9:56 UTC (permalink / raw)
  To: yuyang.du, peterz
  Cc: mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven, len.brown,
	rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu,
	morten.rasmussen, Vincent Guittot

Resend with a correct subject

Hi Yuyang,

Does something like the patch below, to be applied on top of your patchset, seem
like a reasonable add-on?

It adds 1 new usage_sum statistic, which is something that I use to detect the
overload of a rq in my patchset that reworks cpu_power and removes
capacity_factor.

And I think that the change I made on load_sum should match some of Morten's
concerns

Regards,
Vincent

---
Subject: [PATCH] sched: add usage_sum statistic

Add a new statistic that reflects the average time a task is running on a CPU.

load_sum is now the average runnable time before being weighted

The sum of the usage_sum of the tasks that are on a rq is used to detect
the overload of a rq.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 47 +++++++++++++++++++++++++++++++++++------------
 kernel/sched/sched.h  |  2 ++
 3 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b6617a1..3296e76 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1080,6 +1080,7 @@ struct sched_avg {
 	 */
 	u64 last_update_time;
 	u64 load_sum;
+	unsigned long usage_sum;
 	unsigned long load_avg;
 	u32 period_contrib;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a3a3168..78408a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -679,7 +679,8 @@ void init_task_runnable_average(struct task_struct *p)
 	 */
 	sa->period_contrib = 1023;
 	sa->load_avg = p->se.load.weight;
-	sa->load_sum = p->se.load.weight * LOAD_AVG_MAX;
+	sa->load_sum = sa->usage_sum = LOAD_AVG_MAX;
+	;
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 #else
@@ -2300,7 +2301,7 @@ static u32 __compute_runnable_contrib(u64 n)
  *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
  */
 static __always_inline int
-__update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
+__update_load_avg(u64 now, struct sched_avg *sa, unsigned long w, int running)
 {
 	u64 delta, periods;
 	u32 contrib;
@@ -2340,7 +2341,9 @@ __update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
 		 */
 		delta_w = 1024 - delta_w;
 		if (w)
-			sa->load_sum += w * delta_w;
+			sa->load_sum += delta_w;
+		if (running)
+			sa->usage_sum += delta_w;
 
 		delta -= delta_w;
 
@@ -2349,21 +2352,26 @@ __update_load_avg(u64 now, struct sched_avg *sa, unsigned long w)
 		delta %= 1024;
 
 		sa->load_sum = decay_load(sa->load_sum, periods + 1);
+		sa->usage_sum = decay_load(sa->usage_sum, periods + 1);
 
 		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
 		contrib = __compute_runnable_contrib(periods);
 		if (w)
-			sa->load_sum += w * contrib;
+			sa->load_sum += contrib;
+		if (running)
+			sa->usage_sum += contrib;
 	}
 
 	/* Remainder of delta accrued against u_0` */
 	if (w)
-		sa->load_sum += w * delta;
+		sa->load_sum +=  delta;
+	if (running)
+		sa->usage_sum += delta;
 
 	sa->period_contrib += delta;
 
 	if (decayed)
-		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+		sa->load_avg = div_u64(sa->load_sum * w, LOAD_AVG_MAX);
 
 	return decayed;
 }
@@ -2404,11 +2412,17 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
 		cfs_rq->avg.load_avg = subtract_until_zero(cfs_rq->avg.load_avg, r);
-		r *= LOAD_AVG_MAX;
+	}
+	if (atomic_long_read(&cfs_rq->removed_load_sum)) {
+		long r = atomic_long_xchg(&cfs_rq->removed_load_sum, 0);
 		cfs_rq->avg.load_sum = subtract_until_zero(cfs_rq->avg.load_sum, r);
 	}
+	if (atomic_long_read(&cfs_rq->removed_usage_sum)) {
+		long r = atomic_long_xchg(&cfs_rq->removed_usage_sum, 0);
+		cfs_rq->avg.usage_sum = subtract_until_zero(cfs_rq->avg.usage_sum, r);
+	}
 
-	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
+	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight, cfs_rq->curr != NULL);
 
 #ifndef CONFIG_64BIT
 	if (cfs_rq->avg.last_update_time != cfs_rq->load_last_update_time_copy) {
@@ -2430,7 +2444,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 	 * Track task load average for carrying it to new CPU after migrated,
 	 * and group sched_entity for task_h_load calc in migration
 	 */
-	__update_load_avg(now, &se->avg, se->on_rq * se->load.weight);
+	__update_load_avg(now, &se->avg, se->on_rq * se->load.weight,
+			entity_is_task(se) ? task_of(se)->on_cpu : 0);
 
 	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
 		update_tg_load_avg(cfs_rq);
@@ -2451,13 +2466,14 @@ static inline void enqueue_entity_load_avg(struct sched_entity *se)
 			migrated = 1;
 	}
 	else
-		__update_load_avg(now, sa, se->on_rq * se->load.weight);
+		__update_load_avg(now, sa, se->on_rq * se->load.weight, entity_is_task(se) ? task_of(se)->on_cpu : 0);
 
 	decayed = update_cfs_rq_load_avg(now, cfs_rq);
 
 	if (migrated) {
 		cfs_rq->avg.load_avg += sa->load_avg;
 		cfs_rq->avg.load_sum += sa->load_sum;
+		cfs_rq->avg.usage_sum += sa->usage_sum;
 	}
 
 	if (decayed || migrated)
@@ -4442,8 +4458,10 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 #else
 	last_update_time = cfs_rq->avg.last_update_time;
 #endif
-	__update_load_avg(last_update_time, &se->avg, 0);
+	__update_load_avg(last_update_time, &se->avg, 0, p->on_cpu);
 	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
+	atomic_long_add(se->avg.load_sum, &cfs_rq->removed_load_sum);
+	atomic_long_add(se->avg.usage_sum, &cfs_rq->removed_usage_sum);
 
 	/*
 	 * We are supposed to update the task to "current" time, then its up to date
@@ -7316,11 +7334,13 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 	* Remove our load from contribution when we leave cfs_rq.
 	*/
 	__update_load_avg(cfs_rq->avg.last_update_time, &se->avg,
-		se->on_rq * se->load.weight);
+		se->on_rq * se->load.weight, p->on_cpu);
 	cfs_rq->avg.load_avg =
 		subtract_until_zero(cfs_rq->avg.load_avg, se->avg.load_avg);
 	cfs_rq->avg.load_sum =
 		subtract_until_zero(cfs_rq->avg.load_sum, se->avg.load_sum);
+	cfs_rq->avg.usage_sum =
+		subtract_until_zero(cfs_rq->avg.usage_sum, se->avg.usage_sum);
 #endif
 }
 
@@ -7378,6 +7398,8 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #endif
 #ifdef CONFIG_SMP
 	atomic_long_set(&cfs_rq->removed_load_avg, 0);
+	atomic_long_set(&cfs_rq->removed_load_sum, 0);
+	atomic_long_set(&cfs_rq->removed_usage_sum, 0);
 #endif
 }
 
@@ -7428,6 +7450,7 @@ static void task_move_group_fair(struct task_struct *p, int on_rq)
 		p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
 		cfs_rq->avg.load_avg += p->se.avg.load_avg;
 		cfs_rq->avg.load_sum += p->se.avg.load_sum;
+		cfs_rq->avg.usage_sum += p->se.avg.usage_sum;
 #endif
 	}
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f21ddde..1bdd878 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -335,6 +335,8 @@ struct cfs_rq {
 	struct sched_avg avg;
 	unsigned long tg_load_avg_contrib;
 	atomic_long_t removed_load_avg;
+	atomic_long_t removed_load_sum;
+	atomic_long_t removed_usage_sum;
 #ifndef CONFIG_64BIT
 	u64 load_last_update_time_copy;
 #endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average
  2014-07-31  9:56             ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average Vincent Guittot
@ 2014-07-31 19:16               ` Yuyang Du
  2014-08-01  9:28                 ` Vincent Guittot
  0 siblings, 1 reply; 47+ messages in thread
From: Yuyang Du @ 2014-07-31 19:16 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, pjt, bsegall, arjan.van.de.ven,
	len.brown, rafael.j.wysocki, alan.cox, mark.gross, fengguang.wu,
	morten.rasmussen

Hi Vincent,

On Thu, Jul 31, 2014 at 11:56:13AM +0200, Vincent Guittot wrote:
> 
> load_sum is now the average runnable time before being weighted
 
So when weight changes, load_avg will completely use new weight. I have
some cents:

1) Task does not change weight much, so it is practically ok

2) Group entity does change weight much, and very likely back and forth,
   so I really think keeping the intact history will make everything
   more predictable/stable, prevent thrashing, etc.

3) If you do the same for cfs_rq->load.weight, then we simply abandoned
   blocked entities, and all states won't compute. So we then need to
   maintain blocked load average again, and we just can't do cfs_rq load
   average as a whole anymore, but must update at the granularity of an
   entity...

Anyway, it does not seem to me you really need to change load_sum, no? So
could you please not change it?

> The sum of usage_sum of the tasks that are on a rq, is used to detect
> the overload of a rq.

I think you only need usage_sum for task and rq, but not cfs_rq. Others
are ok.
 
> Does something like the patch below to be applied of top of your patchset, seem
> reasonable add-on?
> 

If you only add running statistics, I am all good, and indeed reasonable if
you can make good use of it. I am not at all against adding anything or
adding running average or unweighted anything...

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average
  2014-07-31 19:16               ` Yuyang Du
@ 2014-08-01  9:28                 ` Vincent Guittot
  0 siblings, 0 replies; 47+ messages in thread
From: Vincent Guittot @ 2014-08-01  9:28 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, mingo, linux-kernel, Paul Turner,
	Benjamin Segall, arjan.van.de.ven, Len Brown, rafael.j.wysocki,
	alan.cox, Gross, Mark, fengguang.wu, Morten Rasmussen

On 31 July 2014 21:16, Yuyang Du <yuyang.du@intel.com> wrote:
> Hi Vincent,
>
> On Thu, Jul 31, 2014 at 11:56:13AM +0200, Vincent Guittot wrote:
>>
>> load_sum is now the average runnable time before being weighted
>
> So when weight changes, load_avg will completely use new weight. I have
> some cents:
>
> 1) Task does not change weight much, so it is practically ok
>
> 2) Group entity does change weight much, and very likely back and forth,
>    so I really think keeping the intact history will make everything
>    more predictable/stable, prevent thrashing, etc.
>
> 3) If you do the same for cfs_rq->load.weight, then we simply abandoned
>    blocked entities, and all states won't compute. So we then need to
>    maintain blocked load average again, and we just can't do cfs_rq load
>    average as a whole anymore, but must update at the granularity of an
>    entity...
>
> Anyway, it does not seem to me you really need to change load_sum, no? So
> could you please not change it?
>
>> The sum of usage_sum of the tasks that are on a rq, is used to detect
>> the overload of a rq.
>
> I think you only need usage_sum for task and rq, but not cfs_rq. Others
> are ok.

yes, only usage_sum is useful for my rework of cpu_power

>
>> Does something like the patch below to be applied of top of your patchset, seem
>> reasonable add-on?
>>
>
> If you only add running statistics, I am all good, and indeed reasonable if
> you can make good use of it. I am not at all against adding anything or
> adding running average or unweighted anything...

ok. Thanks

Vincent

>
> Thanks,
> Yuyang

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2014-08-01  9:28 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-17 23:26 [PATCH 0/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
2014-07-17 23:26 ` [PATCH 1/2 v4] sched: Remove update_rq_runnable_avg Yuyang Du
2014-07-17 23:26 ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Yuyang Du
2014-07-18  9:43   ` Vincent Guittot
2014-07-27 17:36     ` Yuyang Du
2014-07-29  9:12       ` Vincent Guittot
2014-07-29  1:43         ` Yuyang Du
2014-07-29 13:17           ` Vincent Guittot
2014-07-29 22:27             ` Yuyang Du
2014-07-30  8:30               ` Peter Zijlstra
2014-07-30  0:40                 ` Yuyang Du
2014-07-29  9:39         ` Peter Zijlstra
2014-07-29  1:53           ` Yuyang Du
2014-07-29 13:35             ` Peter Zijlstra
2014-07-29 15:55               ` Peter Zijlstra
2014-07-29 23:08               ` Yuyang Du
2014-07-31  9:40             ` Vincent Guittot
2014-07-31  9:56             ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average Vincent Guittot
2014-07-31 19:16               ` Yuyang Du
2014-08-01  9:28                 ` Vincent Guittot
2014-07-28 10:48   ` [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking Peter Zijlstra
2014-07-29  0:56     ` Yuyang Du
2014-07-29 13:15       ` Peter Zijlstra
2014-07-28 11:39   ` Peter Zijlstra
2014-07-29  1:09     ` Yuyang Du
2014-07-29 13:19       ` Peter Zijlstra
2014-07-28 12:01   ` Peter Zijlstra
2014-07-28 13:51   ` Peter Zijlstra
2014-07-28 16:58     ` bsegall
2014-07-28 17:19       ` Peter Zijlstra
2014-07-29  1:13         ` Yuyang Du
2014-07-18 15:39 ` [PATCH 0/2 " Morten Rasmussen
2014-07-27 19:02   ` Yuyang Du
2014-07-28 10:38     ` Peter Zijlstra
2014-07-29  1:17       ` Yuyang Du
2014-07-29 13:06         ` Peter Zijlstra
2014-07-30 10:13     ` Morten Rasmussen
2014-07-30 10:21       ` Peter Zijlstra
2014-07-30 10:57         ` Morten Rasmussen
2014-07-30 19:17       ` Yuyang Du
2014-07-31  8:54         ` Morten Rasmussen
2014-07-31  2:15           ` Yuyang Du
2014-07-20  5:46 ` Mike Galbraith
2014-07-27 19:34   ` Yuyang Du
2014-07-28  7:49     ` Mike Galbraith
2014-07-28  0:01       ` Yuyang Du
2014-07-28  8:55     ` Peter Zijlstra
