On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote: > On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote: > > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote: > > > Comparing that build to a corresponding do-nothing build with Thud, > > > the time difference matches those three minutes where I have no idea > > > what bitbake is doing now that it didn’t need to do before… > > > > > > Hopefully these time degradations can be solved, because the current > > > state of bitbake is barely usable. We also need to look into possible > > > ways to improve the cooker output when it is running the setscene > > > tasks so it makes some kind of sense again. > > > > We talked on irc and you pointed at the commit things started to go > > wrong. Just to summarise things for the benefit of the list, this is > > some quick testing I did: > > > > "bitbake -p; time bitbake core-image-minimal -n" > > > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > > 40.3s 7df31ff36892c2f9c65326b06b4c70 > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > > 76.9s master-next > > I know I'm late to the party, but it took really long for the test to finish.. > > I'm not using poky, so reproduce this testing on our builds I've used > bitbake revisions matching with poky revision RP was testing: > > + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because current master isn't compatible with old bitbake (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], expression was ${@oe_import(d)} which triggered exception AttributeError: module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn') > and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to possibly eliminate impact of metadata changes) > > tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM > > nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server configured > > on a build with few more layers: > Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 1927 skipped, 54 masked, 0 errors. > > but first doing just core-image-minimal like RP did: > > time poky bitbake oe-core > 2m8.191s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > N/A a0d941c787cf3ef030d190903279d311bc05d752 doesn't exist in poky/poky-contrib 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > 2m17.053s 7df31ff36892c2f9c65326b06b4c70 1f630fdf0260db08541d3ca9f25f852931c19905 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e f43778c2d19e70d4befd483b06aaf247fc65c799 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae > 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c 8c26b451f22193ef1c544e2017cc84515566c1b8 > 2m38.185s master-next fbcb89dc3dbabfc80310e9a4ebe72d919300a32e > cache.py:446: ResourceWarning: unclosed > value = pickled.load() > started showing with this revision > 2m17.991s master-next + fixups f7cd14f90463957b3e4be6d3876def98b78f1424 > 2m9.651s master-next + "Holding off tasks %s" out > > now world > 253m36.637s 7df31ff36892c2f9c65326b06b4c70 1f630fdf0260db08541d3ca9f25f852931c19905 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > 174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e f43778c2d19e70d4befd483b06aaf247fc65c799 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae > this is time when killed at "NOTE: Executing Tasks" without showing any progress on the tasks (unlike other tests) and only one bitbake process is running > 633m27.475s master-next fbcb89dc3dbabfc80310e9a4ebe72d919300a32e > this is time when killed at (1417 from 71749) - running very slowly unlike other bitbake revisions where the task number changes relatively quickly - about 10 tasks per second > 89m13.992s master-next + "Holding off tasks %s" out > 89m59.201s master-next updated today 85fe627fdb6510f0942917964386fad9d8c479c8 > > Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times > slower than recent master-next. > > I'll send -P output next. So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because this commit introduced that logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks)) Here is -P output from it: 5min (til "Executing Tasks" message): profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.return.at.Executing.Tasks http://paste.ubuntu.com/p/Nw6n5hTPmF/ 5hours (whole dry run): profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.whole.dry.run http://paste.ubuntu.com/p/sydFGvYTxQ/ profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.worker.log.processed http://paste.ubuntu.com/p/RMGbm9DyZz/ So for good old baseline I need to go few commits back, to e.g. 18d4a31f RP was testing before: 1f630fdf runqueue: Enable dynamic task adjustment to hash equivalency a0d941c7 siggen: Convert to use self.unitaskhashes 1f326f2c siggen: Add new unitaskhashes data variable which is cached b707d0cb runqueue: Improve scenequeue processing logic 19a6e356 cache: Add SimpleCache class 7bb79099 siggen: Import unihash code from OE-Core 7aec8632 hashserv: SQL Optimizations 18d4a31f fetch2/wget: avoid 'maximum recursion depth' RuntimeErrors when handling 403 codes 94m19.081s profile.log.processed.18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.whole.dry.run http://paste.ubuntu.com/p/Jnvv8GJNKB/ And here is profile with latest master from yesterday: 90mins: profile.log.processed.a5e084a266f63c2fd370122327615e49beaeb94e.whole.dry.run http://paste.ubuntu.com/p/YhMCHPmg56/ So good news is that current master is a bit faster than "good old" 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7. But still significantly slower on my fast builder compared to world tests on poky + meta-oe, not sure if it's all just because the size of metadata in my builds (with 71K tasks). Even the dry running the tasks was about 10 tasks/second while RP was seeing around 200 tasks/second. Will try to bump BB_NUMBER_THREADS from 8 to 72. I've already disabled rm_work and PRserv, but that doesn't seem to have any significant impact on performance here. Other than resolving the PRserv terminating and then warnings about lost connection. Cheers, > > > > > So basically the original changes showed a 25% hit but the performance > > of -next is dire. > > > > This is with no hash equiv server configured. > > > > It will vary depending on the target used (numbers with -sato for the > > above would be interesting for comparision) and how much was or is in > > sstate, they type of sstate mirror configured and so on. > > > > I really need to focus on getting the new code functioning correctly > > before we attempt to optimise but if nobody tests the new code due to > > performance problems we have a different issue. We also have a scaling > > problem with the hash server itself I need to fix to stop the > > autobuilder throwing weird errors. I'm therefore a bit challenged on > > where to start with it all :/. > > > > Cheers, > > > > Richard > > > > > > > > -- > > _______________________________________________ > > Openembedded-core mailing list > > Openembedded-core@lists.openembedded.org > > http://lists.openembedded.org/mailman/listinfo/openembedded-core > > -- > Martin 'JaMa' Jansa jabber: Martin.Jansa@gmail.com -- Martin 'JaMa' Jansa jabber: Martin.Jansa@gmail.com