* Confusing Performance Numbers
@ 2013-11-12 15:31 Richard Purdie
  2013-11-12 23:33 ` Phil Blundell
  2013-11-13 14:02 ` Enrico Scholz
From: Richard Purdie @ 2013-11-12 15:31 UTC
  To: openembedded-core

I had some ideas I decided to test out. Specifically:

a) stripping -native binaries before putting them in the sysroot
b) compiling -natives with -march=native
c) using hardlinks for sysroot_stage_dir
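
For reference, the rough shape of the changes is something like the
sketch below (illustrative only, not the exact patches behind the
commits listed further down):

  # (a) strip ELF binaries before they are staged; the real change lives
  #     in the staging code, the path here is only an example:
  SYSROOT_DESTDIR=tmp/work/example-native/sysroot-destdir
  find "$SYSROOT_DESTDIR" -type f -perm -u+x \
      -exec strip --strip-unneeded {} + 2>/dev/null || true

  # (b) let -native code target the build machine's own CPU; assuming
  #     BUILD_OPTIMIZATION still feeds BUILD_CFLAGS in bitbake.conf,
  #     local.conf is one way to try the flag:
  echo 'BUILD_OPTIMIZATION += "-march=native"' >> conf/local.conf

  # (c) hardlink rather than copy when staging into the sysroot, i.e.
  #     something along the lines of replacing the copy in
  #     sysroot_stage_dir with:
  #         cp -afl "$from"/. "$to"/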

I was wondering whether any of these could give a speedup to the build
and, if so, whether it was a useful one. I ran each change through our
standard performance script; the results were:

# Baseline
#./perfscript -c d8baa6c89b2f29e8ccc814c61dfd3b106af4fd65
#dax,(nobranch):d8baa6c89b2f29e8ccc814c61dfd3b106af4fd65,poky-10.0.0.final-391-gd8baa6c,38:41.96,11:01.73,5:02.05,4:34.08,0:11.63,0:08.04,0:01.76,25582020,4853400

# staging: Strip native binaries
#./perfscript -c 3e7cab95423c688d7db7355d86b461c1408381d0
#dax,(nobranch):3e7cab95423c688d7db7355d86b461c1408381d0,poky-10.0.0.final-392-g3e7cab9,39:07.56,11:02.05,4:46.99,4:33.99,0:11.63,0:09.40,0:01.73,25398400,4669464

# bitbake.conf: Add -march=native
#./perfscript -c d30b212c8975ad96a35ee6bace731b9823216570    
#dax,(nobranch):d30b212c8975ad96a35ee6bace731b9823216570,poky-10.0.0.final-393-gd30b212,39:16.88,11:41.67,5:05.63,4:31.88,0:11.61,0:07.94,0:01.70,25401000,4670192

# staging: Experiment with hardlinking for sysroot_stage_dir
#./perfscript -c 2feafbbd6338acd52577074db993da4e6d2de03f
#dax,master2:2feafbbd6338acd52577074db993da4e6d2de03f,poky-10.0.0.final-394-g2feafbb,39:49.88,11:11.63,4:41.16,4:38.85,0:11.40,0:07.90,0:01.73,23438704,4669984

So each time the overall build time seems to rise slightly (first
number). The time to bitbake the kernel stays roughly the same (second
number); perhaps -march=native makes it take fractionally longer.
Stripping the binaries saves about 200MB of disk usage (both with and
without rm_work). The hardlink patch saves about 2GB of disk usage
(without rm_work).

I also totalled the time spent in each task type; the output is below.
Those numbers are interesting: the strip patch seems to add around 100
seconds to do_populate_sysroot, while there doesn't seem to be a big
difference in compile time.
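
(The buildstats-tasktotals.py script used below isn't attached; as a
rough idea of what it does, a shell equivalent could look something
like this, assuming each buildstats task file carries an
"Elapsed time: N seconds" line:)

  buildstats_dir=$1
  find "$buildstats_dir" -type f -name 'do_*' | while read -r f; do
      awk -v task="$(basename "$f")" \
          '/^Elapsed time:/ { print task, $3 }' "$f"
  done | awk '{ total[$1] += $2 }
              END { for (t in total) printf "%-26s %.2fs\n", t, total[t] }' \
       | sort

Pointed at one of the results-*/buildstats-* directories shown below,
something like that produces per-task totals in the same shape as the
lists that follow.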

It's very hard to figure out which of these results might be noise and
which, if any, represent a real gain (other than the disk usage
improvements). I suspect the hardlinking patch may be worthwhile; the
other two are perhaps not useful.

I often get asked what things we've tried, so this should go some way
towards documenting that and give people an insight into how we can
test changes.

Cheers,

Richard

baseline (master):
do_compile                 4062.83s
do_compile_kernelmodules   69.14s
do_compile_ptest_base      1.09s
do_configure               5455.18s
do_configure_ptest_base    1.11s
do_deploy                  13.34s
do_fetch                   20.4s
do_generate_toolchain_file 0.09s
do_install                 672.76s
do_install_locale          0.24s
do_install_ptest_base      0.72s
do_kernel_checkout         109.76s
do_kernel_configcheck      4.37s
do_kernel_configme         69.8s
do_kernel_link_vmlinux     0.09s
do_multilib_install        0.0s
do_package                 1136.39s
do_package_write_rpm       964.58s
do_packagedata             132.65s
do_patch                   96.29s
do_populate_lic            160.33s
do_populate_sysroot        490.19s
do_rootfs                  178.94s
do_sizecheck               0.08s
do_strip                   0.08s
do_uboot_mkimage           0.09s
do_unpack                  258.55s
do_validate_branches       18.69s

# staging: strip native binaries (3e7cab9):
richard@dax:/media/build1/pokyperf/build-perf-test$ /home/richard/buildstats-tasktotals.py results-3e7cab9-20131112082932/buildstats-test1/core-image-sato-qemux86/201311120837/
do_compile                 4108.84s
do_compile_kernelmodules   69.34s
do_compile_ptest_base      0.99s
do_configure               5419.83s
do_configure_ptest_base    1.02s
do_deploy                  13.88s
do_fetch                   37.07s
do_generate_toolchain_file 0.1s
do_install                 628.89s
do_install_locale          0.53s
do_install_ptest_base      0.85s
do_kernel_checkout         47.62s
do_kernel_configcheck      4.25s
do_kernel_configme         94.91s
do_kernel_link_vmlinux     0.15s
do_multilib_install        0.0s
do_package                 1176.35s
do_package_write_rpm       947.94s
do_packagedata             182.74s
do_patch                   101.38s
do_populate_lic            152.46s
do_populate_sysroot        593.62s
do_rootfs                  180.68s
do_sizecheck               0.09s
do_strip                   0.14s
do_uboot_mkimage           0.08s
do_unpack                  290.88s
do_validate_branches       73.12s

# bitbake.conf: add -march=native (d30b212):
richard@dax:/media/build1/pokyperf/build-perf-test$ /home/richard/buildstats-tasktotals.py results-d30b212-20131112093824/buildstats-test1/core-image-sato-qemux86/201311120946/
do_bundle_initramfs        0.09s
do_compile                 4183.91s
do_compile_kernelmodules   69.55s
do_compile_ptest_base      1.1s
do_configure               5535.51s
do_configure_ptest_base    0.98s
do_deploy                  13.37s
do_evacuate_scripts        4.32s
do_fetch                   47.25s
do_generate_toolchain_file 0.09s
do_install                 667.07s
do_install_locale          0.23s
do_install_ptest_base      0.81s
do_kernel_checkout         39.4s
do_kernel_configcheck      4.39s
do_kernel_configme         11.32s
do_kernel_link_vmlinux     0.09s
do_multilib_install        0.0s
do_package                 1176.05s
do_package_write_rpm       987.45s
do_packagedata             162.89s
do_patch                   97.33s
do_populate_lic            127.9s
do_populate_sysroot        594.76s
do_rootfs                  180.98s
do_sizecheck               0.1s
do_strip                   0.12s
do_uboot_mkimage           0.08s
do_unpack                  319.22s
do_validate_branches       12.08s

# staging: hardlinking for sysroot_stage_dir (2feafbb):
richard@dax:/media/build1/pokyperf/build-perf-test$ /home/richard/buildstats-tasktotals.py results-2feafbb-20131112002904/buildstats-test1/core-image-sato-qemux86/201311120037/
do_bundle_initramfs        0.09s
do_compile                 4195.68s
do_compile_kernelmodules   71.07s
do_compile_ptest_base      1.09s
do_configure               5741.49s
do_configure_ptest_base    1.12s
do_deploy                  13.33s
do_evacuate_scripts        0.18s
do_fetch                   18.1s
do_generate_toolchain_file 0.09s
do_install                 697.21s
do_install_locale          0.38s
do_install_ptest_base      0.84s
do_kernel_checkout         143.08s
do_kernel_configcheck      3.85s
do_kernel_configme         60.15s
do_kernel_link_vmlinux     0.09s
do_multilib_install        0.0s
do_package                 1118.97s
do_package_write_rpm       957.41s
do_packagedata             134.04s
do_patch                   101.13s
do_populate_lic            123.11s
do_populate_sysroot        598.03s
do_rootfs                  182.04s
do_sizecheck               0.11s
do_strip                   0.15s
do_uboot_mkimage           0.1s
do_unpack                  286.99s
do_validate_branches       11.91s





* Re: Confusing Performance Numbers
  2013-11-12 15:31 Confusing Performance Numbers Richard Purdie
@ 2013-11-12 23:33 ` Phil Blundell
  2013-11-13 14:02 ` Enrico Scholz
From: Phil Blundell @ 2013-11-12 23:33 UTC
  To: Richard Purdie; +Cc: openembedded-core

On Tue, 2013-11-12 at 15:31 +0000, Richard Purdie wrote:
> I also totalled the time spent in each task type; the output is below.
> Those numbers are interesting: the strip patch seems to add around 100
> seconds to do_populate_sysroot, while there doesn't seem to be a big
> difference in compile time.

That's probably to be expected: the amount of disk bandwidth you save by
not copying the symbols into the sysroot (and into sstate) is almost
certainly going to be outweighed by the time it takes to strip the files
in the first place.  So that set of numbers at least does seem
believable.

In theory at least, adding -Wl,-s to BUILD_LDFLAGS could give you the
stripping behaviour at lower cost since the linker could avoid writing
out the symbol table in the first place.  But I'm not entirely sure how
this option is actually implemented within the linker (or whether ld.bfd
and ld.gold do the same thing in that respect) and it's entirely
possible that it might be no better in practice.  
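
One low-effort way to try that out (assuming BUILD_LDFLAGS appended from
local.conf actually reaches the native link steps, which I haven't
verified) would be something along these lines:

  echo 'BUILD_LDFLAGS += "-Wl,-s"' >> conf/local.conf

  # after rebuilding a native tool, "file" on a staged binary shows
  # whether the symbol table really went away (path is only an example):
  file tmp/sysroots/x86_64-linux/usr/bin/m4   # "stripped" vs "not stripped"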

Using -march=native for natives is an interesting idea, though even if
it were a win on a single machine it could obviously become a net loss
in the presence of a shared sstate cache.  Of course, the vast majority
of -native binaries are run so seldom that their impact on the overall
build performance is going to be negligible: the ones that dominate are
going to be the toolchain.  One idea that I've seen suggested from time
to time (but never actually tried out) is that it might be a win under
some circumstances to do a 2-stage bootstrap of gcc-native and then use
that gcc-native to build gcc-cross as well as all the other native
packages.  If you were building everything from scratch every time then
it's hard to imagine this being a real benefit unless your
distribution's bundled compiler was spectacularly rubbish, but if you
assume that these gccs will come out of sstate for most builds then it
starts to seem more plausible that it might be worth doing.
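
For what it's worth, what -march=native actually resolves to on a given
builder (and therefore what a shared sstate object built with it would
be tuned for) can be inspected with gcc itself:

  gcc -march=native -Q --help=target | grep -- '-march='
  # prints the concrete CPU name the flag expands to on this machine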

p.





* Re: Confusing Performance Numbers
  2013-11-12 15:31 Confusing Performance Numbers Richard Purdie
  2013-11-12 23:33 ` Phil Blundell
@ 2013-11-13 14:02 ` Enrico Scholz
From: Enrico Scholz @ 2013-11-13 14:02 UTC
  To: openembedded-core

Richard Purdie writes:

> I had some ideas I decided to test out. Specifically:
> [...]
> It's very hard to figure out which of these results might be noise and
> which, if any, represent a real gain (other than the disk usage
> improvements). I suspect the hardlinking patch may be worthwhile; the
> other two are perhaps not useful.

fwiw, we record performance data as part of our normal build process,
and it should be possible to compare these statistics.  Results are,
e.g.:
             https://www.cvg.de/people/ensc/metrics.txt.gz
             https://www.cvg.de/people/ensc/metrics.html

(there is much room to improve the visualization; a new colleague has
just started to work on it).  Relevant changes/classes are:

  https://www.cvg.de/people/ensc/metrics.py
  https://www.cvg.de/people/ensc/elito-metrics.bbclass
  https://www.cvg.de/people/ensc/0008-build.py-fire-TaskFailed-with-same-environment-as-Ta.patch




Enrico


