All of lore.kernel.org
 help / color / mirror / Atom feed
* Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
@ 2019-08-12 20:26 Peter Kjellerstedt
  2019-08-13  9:04 ` Richard Purdie
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Kjellerstedt @ 2019-08-12 20:26 UTC (permalink / raw)
  To: Alexander Kanavin, Khem Raj; +Cc: OE-core

[-- Attachment #1: Type: text/plain, Size: 3433 bytes --]

I’m seeing similar results with our builds (Poky + some layers from OpenEmbedded + our own layers; no hash equivalency). Here are some examples of builds from today (with approximate timings added manually by me):

Initialising tasks: 100% |#######################################| Time: 0:00:08
<1 minute without any output>
Checking sstate mirror object availability: 100% |###############| Time: 0:00:23
Sstate summary: Wanted 4151 Found 0 Missed 4151 Current 943 (0% match, 18% complete)
NOTE: Executing Tasks
<3 minutes without any output>
NOTE: Setscene tasks completed
<9 minutes without any output>
No currently running tasks (512 of 12540)   4% |#                              |

After that I aborted the build and removed the tmp directory. The next build started without any of the above delays. However, the display of the then executed setscene tasks by knotty is anything but useful now after the latest changes to bitbake (mostly showing nothing, and occasionally showing a few lines saying that it is running task 0 of 0 which are immediately removed again). Anyway, after that build had run for a couple of thousand tasks it failed with a build error. When I had fixed that and started the build again, I got these timings:

Initialising tasks: 100% |#######################################| Time: 0:00:08
Checking sstate mirror object availability: 100% |###############| Time: 0:00:18
<1.5 minutes without any output>
Sstate summary: Wanted 3870 Found 1 Missed 3869 Current 1224 (0% match, 24% complete)
NOTE: Executing Tasks
<2 minutes without any output>
NOTE: Setscene tasks completed

That build eventually completed normally. Finally, when I started the build again to basically do nothing, I got this:

Initialising tasks: 100% |#######################################| Time: 0:00:08
<3 minutes without any output>
NOTE: Executing Tasks
NOTE: Setscene tasks completed

Comparing that build to a corresponding do-nothing build with Thud, the time difference matches those three minutes where I have no idea what bitbake is doing now that it didn’t need to do before…

Hopefully these time degradations can be solved, because the current state of bitbake is barely usable. We also need to look into possible ways to improve the cooker output when it is running the setscene tasks so it makes some kind of sense again.

//Peter

From: openembedded-core-bounces@lists.openembedded.org <openembedded-core-bounces@lists.openembedded.org> On Behalf Of Alexander Kanavin
Sent: den 12 augusti 2019 21:50
To: Khem Raj <raj.khem@gmail.com>
Cc: OE-core <openembedded-core@lists.openembedded.org>
Subject: Re: [OE-core] [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively

On Mon, 12 Aug 2019 at 21:01, Khem Raj <raj.khem@gmail.com<mailto:raj.khem@gmail.com>> wrote:

Richard, another issue I saw was slowness in world builds when I had
this enabled
I was on master-next for both core and bitbake. It was spending a lot
of time in ensuring
the build is not needed so much that the build took almost same time
than the build with
out it where the build without it will build everything.

world builds seem to have regressed (preparation time wise) even without having hash equivalency. Just oe-core is bearable, but adding meta-oe layers makes it spent many minutes after initializing tasks but before any tasks actually begin.

Alex

[-- Attachment #2: Type: text/html, Size: 10929 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-12 20:26 Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively) Peter Kjellerstedt
@ 2019-08-13  9:04 ` Richard Purdie
  2019-08-13 13:20   ` Alexander Kanavin
                     ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Richard Purdie @ 2019-08-13  9:04 UTC (permalink / raw)
  To: Peter Kjellerstedt, Alexander Kanavin, Khem Raj; +Cc: OE-core

On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> Comparing that build to a corresponding do-nothing build with Thud,
> the time difference matches those three minutes where I have no idea
> what bitbake is doing now that it didn’t need to do before…
>  
> Hopefully these time degradations can be solved, because the current
> state of bitbake is barely usable. We also need to look into possible
> ways to improve the cooker output when it is running the setscene
> tasks so it makes some kind of sense again.

We talked on irc and you pointed at the commit things started to go
wrong. Just to summarise things for the benefit of the list, this is
some quick testing I did:

"bitbake -p; time bitbake core-image-minimal -n"

30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
30.6s a0d941c787cf3ef030d190903279d311bc05d752
40.3s 7df31ff36892c2f9c65326b06b4c70 
42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
76.9s master-next

So basically the original changes showed a 25% hit but the performance
of -next is dire.

This is with no hash equiv server configured.

It will vary depending on the target used (numbers with -sato for the
above would be interesting for comparision) and how much was or is in
sstate, they type of sstate mirror configured and so on.

I really need to focus on getting the new code functioning correctly
before we attempt to optimise but if nobody tests the new code due to
performance problems we have a different issue. We also have a scaling
problem with the hash server itself I need to fix to stop the
autobuilder throwing weird errors. I'm therefore a bit challenged on
where to start with it all :/.

Cheers,

Richard





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-13  9:04 ` Richard Purdie
@ 2019-08-13 13:20   ` Alexander Kanavin
  2019-08-13 19:18   ` Richard Purdie
  2019-08-15 15:05   ` Martin Jansa
  2 siblings, 0 replies; 24+ messages in thread
From: Alexander Kanavin @ 2019-08-13 13:20 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

On Tue, 13 Aug 2019 at 11:04, Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> We talked on irc and you pointed at the commit things started to go
> wrong. Just to summarise things for the benefit of the list, this is
> some quick testing I did:
>
> "bitbake -p; time bitbake core-image-minimal -n"
>
> 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> 40.3s 7df31ff36892c2f9c65326b06b4c70
> 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> 76.9s master-next
>
> So basically the original changes showed a 25% hit but the performance
> of -next is dire.
>

Sadly with larger sets the performance hit is much worse. I enabled all of
the layers in meta-oe, and ran this with empty sstate and build directories:
bitbake -p; time bitbake -n -k world

The outcomes are:

6c7c0cefd34067311144a1d4c01986fe0a4aef26
12m51,848s
(this is the baseline; the bulk of the time goes towards going through the
task list without executing the tasks - the 'spinning task counter" part)

9983b07fffd19082abded7c3f15cc77d306dd69c
66m29,563s

So there could be something quadratically-growing going on with these new
commits :(

I didn't dare to try master-next after seeing above how it hits
core-image-minimal and master already regressing to over an hour :-(

Alex

[-- Attachment #2: Type: text/html, Size: 1982 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-13  9:04 ` Richard Purdie
  2019-08-13 13:20   ` Alexander Kanavin
@ 2019-08-13 19:18   ` Richard Purdie
  2019-08-14  4:06     ` Khem Raj
  2019-08-14 11:25     ` Alexander Kanavin
  2019-08-15 15:05   ` Martin Jansa
  2 siblings, 2 replies; 24+ messages in thread
From: Richard Purdie @ 2019-08-13 19:18 UTC (permalink / raw)
  To: Peter Kjellerstedt, Alexander Kanavin, Khem Raj; +Cc: OE-core

On Tue, 2019-08-13 at 10:04 +0100, Richard Purdie wrote:
> On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > Comparing that build to a corresponding do-nothing build with Thud,
> > the time difference matches those three minutes where I have no
> > idea
> > what bitbake is doing now that it didn’t need to do before…
> >  
> > Hopefully these time degradations can be solved, because the
> > current
> > state of bitbake is barely usable. We also need to look into
> > possible
> > ways to improve the cooker output when it is running the setscene
> > tasks so it makes some kind of sense again.
> 
> We talked on irc and you pointed at the commit things started to go
> wrong. Just to summarise things for the benefit of the list, this is
> some quick testing I did:
> 
> "bitbake -p; time bitbake core-image-minimal -n"
> 
> 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> 40.3s 7df31ff36892c2f9c65326b06b4c70 
> 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
> 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
> 76.9s master-next
> 
> So basically the original changes showed a 25% hit but the
> performance
> of -next is dire.
> 
> This is with no hash equiv server configured.
> 
> It will vary depending on the target used (numbers with -sato for the
> above would be interesting for comparision) and how much was or is in
> sstate, they type of sstate mirror configured and so on.
> 
> I really need to focus on getting the new code functioning correctly
> before we attempt to optimise but if nobody tests the new code due to
> performance problems we have a different issue. We also have a
> scaling
> problem with the hash server itself I need to fix to stop the
> autobuilder throwing weird errors. I'm therefore a bit challenged on
> where to start with it all :/.

I had a glance at the profile output from master-next and the problem
wasn't where I thought it would be, it was in the scheduler code. That
is good as those classes are effectively independent of the other
changes and hence are a separate fix.

I've put a patch in -next which takes the above test to 36s which is
close to the older bitbake.

Could be interesting to see how it looks for others and different
workloads.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-13 19:18   ` Richard Purdie
@ 2019-08-14  4:06     ` Khem Raj
  2019-08-14 11:25     ` Alexander Kanavin
  1 sibling, 0 replies; 24+ messages in thread
From: Khem Raj @ 2019-08-14  4:06 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

On Tue, Aug 13, 2019 at 12:18 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> On Tue, 2019-08-13 at 10:04 +0100, Richard Purdie wrote:
> > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > > Comparing that build to a corresponding do-nothing build with Thud,
> > > the time difference matches those three minutes where I have no
> > > idea
> > > what bitbake is doing now that it didn’t need to do before…
> > >
> > > Hopefully these time degradations can be solved, because the
> > > current
> > > state of bitbake is barely usable. We also need to look into
> > > possible
> > > ways to improve the cooker output when it is running the setscene
> > > tasks so it makes some kind of sense again.
> >
> > We talked on irc and you pointed at the commit things started to go
> > wrong. Just to summarise things for the benefit of the list, this is
> > some quick testing I did:
> >
> > "bitbake -p; time bitbake core-image-minimal -n"
> >
> > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > 76.9s master-next
> >
> > So basically the original changes showed a 25% hit but the
> > performance
> > of -next is dire.
> >
> > This is with no hash equiv server configured.
> >
> > It will vary depending on the target used (numbers with -sato for the
> > above would be interesting for comparision) and how much was or is in
> > sstate, they type of sstate mirror configured and so on.
> >
> > I really need to focus on getting the new code functioning correctly
> > before we attempt to optimise but if nobody tests the new code due to
> > performance problems we have a different issue. We also have a
> > scaling
> > problem with the hash server itself I need to fix to stop the
> > autobuilder throwing weird errors. I'm therefore a bit challenged on
> > where to start with it all :/.
>
> I had a glance at the profile output from master-next and the problem
> wasn't where I thought it would be, it was in the scheduler code. That
> is good as those classes are effectively independent of the other
> changes and hence are a separate fix.
>
> I've put a patch in -next which takes the above test to 36s which is
> close to the older bitbake.
>
> Could be interesting to see how it looks for others and different
> workloads.
>

will try it tonight

> Cheers,
>
> Richard
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-13 19:18   ` Richard Purdie
  2019-08-14  4:06     ` Khem Raj
@ 2019-08-14 11:25     ` Alexander Kanavin
  2019-08-14 11:36       ` richard.purdie
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Kanavin @ 2019-08-14 11:25 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

[-- Attachment #1: Type: text/plain, Size: 768 bytes --]

On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> I had a glance at the profile output from master-next and the problem
> wasn't where I thought it would be, it was in the scheduler code. That
> is good as those classes are effectively independent of the other
> changes and hence are a separate fix.
>
> I've put a patch in -next which takes the above test to 36s which is
> close to the older bitbake.
>
> Could be interesting to see how it looks for others and different
> workloads.
>

I just tried the same test I did yesterday with
ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't seem to
improve things much: bitbake is stuck at "NOTE: Executing Tasks" for 15
minutes now.

Alex

[-- Attachment #2: Type: text/html, Size: 1122 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 11:25     ` Alexander Kanavin
@ 2019-08-14 11:36       ` richard.purdie
  2019-08-14 12:08         ` Alexander Kanavin
  0 siblings, 1 reply; 24+ messages in thread
From: richard.purdie @ 2019-08-14 11:36 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: Peter Kjellerstedt, OE-core

On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> richard.purdie@linuxfoundation.org> wrote:
> > I had a glance at the profile output from master-next and the
> > problem
> > wasn't where I thought it would be, it was in the scheduler code.
> > That
> > is good as those classes are effectively independent of the other
> > changes and hence are a separate fix.
> > 
> > I've put a patch in -next which takes the above test to 36s which
> > is
> > close to the older bitbake.
> > 
> > Could be interesting to see how it looks for others and different
> > workloads.
> 
> I just tried the same test I did yesterday with
> ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't
> seem to improve things much: bitbake is stuck at "NOTE: Executing
> Tasks" for 15 minutes now.

This might sound slightly crazy but can you try commenting out this
line in runqueue.py:

logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))

?

This was with the hash server disabled, right?

Cheers,

Richard



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 11:36       ` richard.purdie
@ 2019-08-14 12:08         ` Alexander Kanavin
  2019-08-14 12:43           ` Alexander Kanavin
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Alexander Kanavin @ 2019-08-14 12:08 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

[-- Attachment #1: Type: text/plain, Size: 1444 bytes --]

On Wed, 14 Aug 2019 at 13:36, <richard.purdie@linuxfoundation.org> wrote:

> On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > richard.purdie@linuxfoundation.org> wrote:
> > > I had a glance at the profile output from master-next and the
> > > problem
> > > wasn't where I thought it would be, it was in the scheduler code.
> > > That
> > > is good as those classes are effectively independent of the other
> > > changes and hence are a separate fix.
> > >
> > > I've put a patch in -next which takes the above test to 36s which
> > > is
> > > close to the older bitbake.
> > >
> > > Could be interesting to see how it looks for others and different
> > > workloads.
> >
> > I just tried the same test I did yesterday with
> > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't
> > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > Tasks" for 15 minutes now.
>
> This might sound slightly crazy but can you try commenting out this
> line in runqueue.py:
>
> logger.debug(2, "Holding off tasks %s" %
> pprint.pformat(self.holdoff_tasks))
>
> ?
>

Even crazier is the outcome: it helped! The whole thing completed after
15m49secons (with much of the time going to the empty task spin), that's
some 3 minutes slower, but certainly it's usable again.

I have not enabled the hash server at any point.

Alex

[-- Attachment #2: Type: text/html, Size: 2054 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 12:08         ` Alexander Kanavin
@ 2019-08-14 12:43           ` Alexander Kanavin
  2019-08-14 12:50           ` Mikko.Rapeli
  2019-08-14 12:55           ` richard.purdie
  2 siblings, 0 replies; 24+ messages in thread
From: Alexander Kanavin @ 2019-08-14 12:43 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

[-- Attachment #1: Type: text/plain, Size: 678 bytes --]

On Wed, 14 Aug 2019 at 14:08, Alexander Kanavin <alex.kanavin@gmail.com>
wrote:

>
> This might sound slightly crazy but can you try commenting out this
>> line in runqueue.py:
>>
>> logger.debug(2, "Holding off tasks %s" %
>> pprint.pformat(self.holdoff_tasks))
>>
>> ?
>>
>
> Even crazier is the outcome: it helped! The whole thing completed after
> 15m49secons (with much of the time going to the empty task spin), that's
> some 3 minutes slower, but certainly it's usable again.
>
> I have not enabled the hash server at any point.
>

I need to clarify: this was with current master
(18b8e2e10494d3b0e18743a1bf9e99038da4229c), not master-next.

Alex

[-- Attachment #2: Type: text/html, Size: 1219 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 12:08         ` Alexander Kanavin
  2019-08-14 12:43           ` Alexander Kanavin
@ 2019-08-14 12:50           ` Mikko.Rapeli
  2019-08-14 12:55           ` richard.purdie
  2 siblings, 0 replies; 24+ messages in thread
From: Mikko.Rapeli @ 2019-08-14 12:50 UTC (permalink / raw)
  To: alex.kanavin; +Cc: peter.kjellerstedt, openembedded-core

On Wed, Aug 14, 2019 at 02:08:01PM +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 13:36, <richard.purdie@linuxfoundation.org> wrote:
> 
> > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > richard.purdie@linuxfoundation.org> wrote:
> > > > I had a glance at the profile output from master-next and the
> > > > problem
> > > > wasn't where I thought it would be, it was in the scheduler code.
> > > > That
> > > > is good as those classes are effectively independent of the other
> > > > changes and hence are a separate fix.
> > > >
> > > > I've put a patch in -next which takes the above test to 36s which
> > > > is
> > > > close to the older bitbake.
> > > >
> > > > Could be interesting to see how it looks for others and different
> > > > workloads.
> > >
> > > I just tried the same test I did yesterday with
> > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't
> > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > Tasks" for 15 minutes now.
> >
> > This might sound slightly crazy but can you try commenting out this
> > line in runqueue.py:
> >
> > logger.debug(2, "Holding off tasks %s" %
> > pprint.pformat(self.holdoff_tasks))
> >
> > ?
> >
> 
> Even crazier is the outcome: it helped! The whole thing completed after
> 15m49secons (with much of the time going to the empty task spin), that's
> some 3 minutes slower, but certainly it's usable again.
> 
> I have not enabled the hash server at any point.

Indeed, commenting that debug statement out removes few minutes of the
bitbake preparation times.

Here is a time stamped output from master branch:

2019-08-14_14:29:24  
2019-08-14_14:29:35  Initialising tasks...done.
2019-08-14_14:32:41  Checking sstate mirror object availability...done.
2019-08-14_14:32:41  Sstate summary: Wanted 436 Found 426 Missed 10 Current 2407 (97% match, 99% complete)
2019-08-14_14:32:41  NOTE: Executing Tasks
2019-08-14_14:32:52  NOTE: Setscene tasks completed

And with the comment removed:

2019-08-14_14:35:10  
2019-08-14_14:35:21  Initialising tasks...done.
2019-08-14_14:35:30  Checking sstate mirror object availability...done.
2019-08-14_14:35:30  Sstate summary: Wanted 436 Found 426 Missed 10 Current 2407 (97% match, 99% complete)
2019-08-14_14:35:30  NOTE: Executing Tasks

And just in case back to master branch state:

2019-08-14_14:41:32  
2019-08-14_14:41:43  Initialising tasks...done.
2019-08-14_14:45:02  Checking sstate mirror object availability...done.
2019-08-14_14:45:02  Sstate summary: Wanted 436 Found 426 Missed 10 Current 2407 (97% match, 99% complete)
2019-08-14_14:45:02  NOTE: Executing Tasks

I did not flush caches in between and I stopped the builds once bitbake started doing stuff, so I think sstate cache was completely buffered in memory
from file system and IO delays did not affect the timings.

Change to poky was exactly:

--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2216,7 +2216,7 @@ class RunQueueExecute:
             for dep in self.sqdata.sq_covered_tasks[tid]:
                 if dep not in self.runq_complete:
                     self.holdoff_tasks.add(dep)
-        logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))
+        # logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))
 
     def process_possible_migrations(self):
         changes = False

-Mikko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 12:08         ` Alexander Kanavin
  2019-08-14 12:43           ` Alexander Kanavin
  2019-08-14 12:50           ` Mikko.Rapeli
@ 2019-08-14 12:55           ` richard.purdie
  2019-08-14 14:57             ` Peter Kjellerstedt
  2019-08-14 20:27             ` Alexander Kanavin
  2 siblings, 2 replies; 24+ messages in thread
From: richard.purdie @ 2019-08-14 12:55 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: Peter Kjellerstedt, OE-core

On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 13:36, <richard.purdie@linuxfoundation.org>
> wrote:
> > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > richard.purdie@linuxfoundation.org> wrote:
> > > > I had a glance at the profile output from master-next and the
> > > > problem
> > > > wasn't where I thought it would be, it was in the scheduler
> > code.
> > > > That
> > > > is good as those classes are effectively independent of the
> > other
> > > > changes and hence are a separate fix.
> > > > 
> > > > I've put a patch in -next which takes the above test to 36s
> > which
> > > > is
> > > > close to the older bitbake.
> > > > 
> > > > Could be interesting to see how it looks for others and
> > different
> > > > workloads.
> > > 
> > > I just tried the same test I did yesterday with
> > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > doesn't
> > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > Tasks" for 15 minutes now.
> > 
> > This might sound slightly crazy but can you try commenting out this
> > line in runqueue.py:
> > 
> > logger.debug(2, "Holding off tasks %s" %
> > pprint.pformat(self.holdoff_tasks))
> > 
> > ?
> 
> Even crazier is the outcome: it helped! 

Cool, I think I can explain it.

The holdoff_tasks list can contain a list of nearly all the tasks at
some points in execution. Even though the debug messages aren't being
printed on the console, they are being sent over the internal IPC bus
between the cooker, UI and other event handlers. Obviously for small
task lists its not a problem, for large ones its multiple 4k chunks
over pipes which isn't going to be fast.

We have done a lot of optimisation in the past but its all too easy to
trend on something like this and upset things :/.

> The whole thing completed after 15m49secons (with much of the time
> going to the empty task spin), that's some 3 minutes slower, but
> certainly it's usable again.

You followed up mentioning this wasn't with master-next. I think there
is a patch in -next which will help with the empty task spin so both
together might get us back to more normal numbers.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 12:55           ` richard.purdie
@ 2019-08-14 14:57             ` Peter Kjellerstedt
  2019-08-14 15:19               ` Khem Raj
  2019-08-14 20:27             ` Alexander Kanavin
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Kjellerstedt @ 2019-08-14 14:57 UTC (permalink / raw)
  To: richard.purdie, Alexander Kanavin; +Cc: OE-core

> -----Original Message-----
> From: richard.purdie@linuxfoundation.org
> <richard.purdie@linuxfoundation.org>
> Sent: den 14 augusti 2019 14:56
> To: Alexander Kanavin <alex.kanavin@gmail.com>
> Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Khem Raj
> <raj.khem@gmail.com>; OE-core <openembedded-
> core@lists.openembedded.org>
> Subject: Re: [OE-core] Long delays with latest bitbake (was: [PATCH
> 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS
> recursively)
> 
> On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> > On Wed, 14 Aug 2019 at 13:36, <richard.purdie@linuxfoundation.org>
> > wrote:
> > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > > richard.purdie@linuxfoundation.org> wrote:
> > > > > I had a glance at the profile output from master-next and the
> > > > > problem
> > > > > wasn't where I thought it would be, it was in the scheduler
> > > code.
> > > > > That
> > > > > is good as those classes are effectively independent of the
> > > other
> > > > > changes and hence are a separate fix.
> > > > >
> > > > > I've put a patch in -next which takes the above test to 36s
> > > which
> > > > > is
> > > > > close to the older bitbake.
> > > > >
> > > > > Could be interesting to see how it looks for others and
> > > different
> > > > > workloads.
> > > >
> > > > I just tried the same test I did yesterday with
> > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > > doesn't
> > > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > > Tasks" for 15 minutes now.
> > >
> > > This might sound slightly crazy but can you try commenting out this
> > > line in runqueue.py:
> > >
> > > logger.debug(2, "Holding off tasks %s" %
> > > pprint.pformat(self.holdoff_tasks))
> > >
> > > ?
> >
> > Even crazier is the outcome: it helped!
> 
> Cool, I think I can explain it.
> 
> The holdoff_tasks list can contain a list of nearly all the tasks at
> some points in execution. Even though the debug messages aren't being
> printed on the console, they are being sent over the internal IPC bus
> between the cooker, UI and other event handlers. Obviously for small
> task lists its not a problem, for large ones its multiple 4k chunks
> over pipes which isn't going to be fast.
> 
> We have done a lot of optimisation in the past but its all too easy to
> trend on something like this and upset things :/.
> 
> > The whole thing completed after 15m49secons (with much of the time
> > going to the empty task spin), that's some 3 minutes slower, but
> > certainly it's usable again.
> 
> You followed up mentioning this wasn't with master-next. I think there
> is a patch in -next which will help with the empty task spin so both
> together might get us back to more normal numbers.
> 
> Cheers,
> 
> Richard

I can just confirm that removing the debug line removes almost all of 
the delays I was seeing. Here are some time statistics from my builds:

6c7c0cef: 02:50
7df31ff3: 06:13  (first attempt)
a0542ed3: 06:32  (master)
19a88c68: 43:19* (~yesterday's master-next)
b0a0e4a6: 06:55  (master-next)
no debug: 03:04  (master-next + removal of "Holding off tasks" debug)

* I aborted this build after about half of the 12540 tasks were done...

One thing I have noticed while I was doing the timings is that I do 
not seem to be able to kill the bitbake server with bitbake -m after 
the builds have completed, but have to resort to using kill -9 most 
of the time... 

//Peter


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 14:57             ` Peter Kjellerstedt
@ 2019-08-14 15:19               ` Khem Raj
  0 siblings, 0 replies; 24+ messages in thread
From: Khem Raj @ 2019-08-14 15:19 UTC (permalink / raw)
  To: Peter Kjellerstedt; +Cc: OE-core

On Wed, Aug 14, 2019 at 7:57 AM Peter Kjellerstedt
<peter.kjellerstedt@axis.com> wrote:
>
> > -----Original Message-----
> > From: richard.purdie@linuxfoundation.org
> > <richard.purdie@linuxfoundation.org>
> > Sent: den 14 augusti 2019 14:56
> > To: Alexander Kanavin <alex.kanavin@gmail.com>
> > Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Khem Raj
> > <raj.khem@gmail.com>; OE-core <openembedded-
> > core@lists.openembedded.org>
> > Subject: Re: [OE-core] Long delays with latest bitbake (was: [PATCH
> > 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS
> > recursively)
> >
> > On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> > > On Wed, 14 Aug 2019 at 13:36, <richard.purdie@linuxfoundation.org>
> > > wrote:
> > > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > > > richard.purdie@linuxfoundation.org> wrote:
> > > > > > I had a glance at the profile output from master-next and the
> > > > > > problem
> > > > > > wasn't where I thought it would be, it was in the scheduler
> > > > code.
> > > > > > That
> > > > > > is good as those classes are effectively independent of the
> > > > other
> > > > > > changes and hence are a separate fix.
> > > > > >
> > > > > > I've put a patch in -next which takes the above test to 36s
> > > > which
> > > > > > is
> > > > > > close to the older bitbake.
> > > > > >
> > > > > > Could be interesting to see how it looks for others and
> > > > different
> > > > > > workloads.
> > > > >
> > > > > I just tried the same test I did yesterday with
> > > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > > > doesn't
> > > > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > > > Tasks" for 15 minutes now.
> > > >
> > > > This might sound slightly crazy but can you try commenting out this
> > > > line in runqueue.py:
> > > >
> > > > logger.debug(2, "Holding off tasks %s" %
> > > > pprint.pformat(self.holdoff_tasks))
> > > >
> > > > ?
> > >
> > > Even crazier is the outcome: it helped!
> >
> > Cool, I think I can explain it.
> >
> > The holdoff_tasks list can contain a list of nearly all the tasks at
> > some points in execution. Even though the debug messages aren't being
> > printed on the console, they are being sent over the internal IPC bus
> > between the cooker, UI and other event handlers. Obviously for small
> > task lists its not a problem, for large ones its multiple 4k chunks
> > over pipes which isn't going to be fast.
> >
> > We have done a lot of optimisation in the past but its all too easy to
> > trend on something like this and upset things :/.
> >
> > > The whole thing completed after 15m49secons (with much of the time
> > > going to the empty task spin), that's some 3 minutes slower, but
> > > certainly it's usable again.
> >
> > You followed up mentioning this wasn't with master-next. I think there
> > is a patch in -next which will help with the empty task spin so both
> > together might get us back to more normal numbers.
> >
> > Cheers,
> >
> > Richard
>
> I can just confirm that removing the debug line removes almost all of
> the delays I was seeing. Here are some time statistics from my builds:
>
> 6c7c0cef: 02:50
> 7df31ff3: 06:13  (first attempt)
> a0542ed3: 06:32  (master)
> 19a88c68: 43:19* (~yesterday's master-next)
> b0a0e4a6: 06:55  (master-next)
> no debug: 03:04  (master-next + removal of "Holding off tasks" debug)
>
> * I aborted this build after about half of the 12540 tasks were done...
>
> One thing I have noticed while I was doing the timings is that I do
> not seem to be able to kill the bitbake server with bitbake -m after
> the builds have completed, but have to resort to using kill -9 most
> of the time...

me too.  My world build time came down to < 15 mins as well, Last
night it was at 45 mins and still parsing before I stopped it.

>
> //Peter
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 12:55           ` richard.purdie
  2019-08-14 14:57             ` Peter Kjellerstedt
@ 2019-08-14 20:27             ` Alexander Kanavin
  2019-08-14 21:25               ` richard.purdie
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Kanavin @ 2019-08-14 20:27 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

On Wed, 14 Aug 2019 at 14:55, <richard.purdie@linuxfoundation.org> wrote:

> You followed up mentioning this wasn't with master-next. I think there
> is a patch in -next which will help with the empty task spin so both
> together might get us back to more normal numbers.
>

As all of these patches are now in master, I re-ran the test with that
(209f89ab8ed51ac2867ca8f749336af1ee24ab25), but without including the
spinning task part, pressing ctrl-c just as it starts. The outcome is
9m42s, compared to 2m9s for the baseline
(6c7c0cefd34067311144a1d4c01986fe0a4aef26).
So the worst is fixed, but the slowdown is still noticeable.

Alex

[-- Attachment #2: Type: text/html, Size: 1039 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 20:27             ` Alexander Kanavin
@ 2019-08-14 21:25               ` richard.purdie
  2019-08-15 12:56                 ` Alexander Kanavin
  0 siblings, 1 reply; 24+ messages in thread
From: richard.purdie @ 2019-08-14 21:25 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: Peter Kjellerstedt, OE-core

On Wed, 2019-08-14 at 22:27 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 14:55, <richard.purdie@linuxfoundation.org>
> wrote:
> > You followed up mentioning this wasn't with master-next. I think
> > there
> > is a patch in -next which will help with the empty task spin so
> > both
> > together might get us back to more normal numbers.
> > 
> 
> As all of these patches are now in master, I re-ran the test with
> that (209f89ab8ed51ac2867ca8f749336af1ee24ab25), but without
> including the spinning task part, pressing ctrl-c just as it starts.
> The outcome is 9m42s, compared to 2m9s for the baseline
> (6c7c0cefd34067311144a1d4c01986fe0a4aef26). So the worst is fixed,
> but the slowdown is still noticeable.

Right, it still definitely needs work. Its a balancing act between
sorting out the execution bugs in the code and figuring out the
performance problem.

If anyone wants to experiment, the way I'd debug this is to run the
before and after cases with the -P option to bitbake. If you want to
exit early just make the code return where it prints "Executing tasks"
or whatever makes sense as Ctrl+C won't write the profile data.

You want the profile.log.processed file.

So save that file with the "before" commit, then save it afterwards and
look at those files and see where its spending more time.

If someone generates those two files I'll happily take a look, I'm kind
of used to reading them. There are four sets of output in there, same
data but different sorting/types, each has its uses.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-14 21:25               ` richard.purdie
@ 2019-08-15 12:56                 ` Alexander Kanavin
  2019-08-15 13:56                   ` richard.purdie
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Kanavin @ 2019-08-15 12:56 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, OE-core

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

On Wed, 14 Aug 2019 at 23:25, <richard.purdie@linuxfoundation.org> wrote:

> Right, it still definitely needs work. Its a balancing act between
> sorting out the execution bugs in the code and figuring out the
> performance problem.
>
> If anyone wants to experiment, the way I'd debug this is to run the
> before and after cases with the -P option to bitbake. If you want to
> exit early just make the code return where it prints "Executing tasks"
> or whatever makes sense as Ctrl+C won't write the profile data.
>
> You want the profile.log.processed file.
>
> So save that file with the "before" commit, then save it afterwards and
> look at those files and see where its spending more time.
>
> If someone generates those two files I'll happily take a look, I'm kind
> of used to reading them. There are four sets of output in there, same
> data but different sorting/types, each has its uses.
>

And here they are (with task spinning included this time):
http://sensi.org/~ak/tmp/profile.log.processed.before
http://sensi.org/~ak/tmp/profile.log.processed.after

Hope you find it useful!

Alex

[-- Attachment #2: Type: text/html, Size: 1694 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-15 12:56                 ` Alexander Kanavin
@ 2019-08-15 13:56                   ` richard.purdie
  2019-08-15 22:44                     ` Richard Purdie
  0 siblings, 1 reply; 24+ messages in thread
From: richard.purdie @ 2019-08-15 13:56 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: Peter Kjellerstedt, OE-core

On Thu, 2019-08-15 at 14:56 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 23:25, <richard.purdie@linuxfoundation.org>
> wrote:
> > Right, it still definitely needs work. Its a balancing act between
> > sorting out the execution bugs in the code and figuring out the
> > performance problem.
> > 
> > If anyone wants to experiment, the way I'd debug this is to run the
> > before and after cases with the -P option to bitbake. If you want
> > to
> > exit early just make the code return where it prints "Executing
> > tasks"
> > or whatever makes sense as Ctrl+C won't write the profile data.
> > 
> > You want the profile.log.processed file.
> > 
> > So save that file with the "before" commit, then save it afterwards
> > and
> > look at those files and see where its spending more time.
> > 
> > If someone generates those two files I'll happily take a look, I'm
> > kind
> > of used to reading them. There are four sets of output in there,
> > same
> > data but different sorting/types, each has its uses.
> 
> And here they are (with task spinning included this time):
> http://sensi.org/~ak/tmp/profile.log.processed.before
> http://sensi.org/~ak/tmp/profile.log.processed.after
> 
> Hope you find it useful!

How many tasks are in that build?

What is really odd is this on both traces:

   524436  405.102    0.001  405.266    0.001 /home/alexander/development/poky/bitbake/lib/bb/cooker.py:264(notifications)

What that means is that cooker recieved notification of 524436 file
changes whilst it was parsing/preparing runqueue.

I would be very interested to understand which files its being notified
of changes to as I suspect there is something that shouldn't be there.

If we fixed that, it would take 406s off the 1800s for starters in both
cases. I don't see that locally in my trace, my build layout may be
different.

Other things of note:

Time in next_buildable_task went up 338 -> 440s
(probably through the set manipulation e.g. method 'difference' of
'set' objects)
New function update_holdofftasks() took 173s

build_taskdepdata is also eating lots of time suddenly.

So yes, useful, thanks. You may want to add some debug to that
nofitications() function in cooker and see which files are changing
though.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-13  9:04 ` Richard Purdie
  2019-08-13 13:20   ` Alexander Kanavin
  2019-08-13 19:18   ` Richard Purdie
@ 2019-08-15 15:05   ` Martin Jansa
  2019-08-16 10:24     ` Martin Jansa
  2 siblings, 1 reply; 24+ messages in thread
From: Martin Jansa @ 2019-08-15 15:05 UTC (permalink / raw)
  To: Richard Purdie; +Cc: OE-core, Peter Kjellerstedt

[-- Attachment #1: Type: text/plain, Size: 5268 bytes --]

On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > Comparing that build to a corresponding do-nothing build with Thud,
> > the time difference matches those three minutes where I have no idea
> > what bitbake is doing now that it didn’t need to do before…
> >  
> > Hopefully these time degradations can be solved, because the current
> > state of bitbake is barely usable. We also need to look into possible
> > ways to improve the cooker output when it is running the setscene
> > tasks so it makes some kind of sense again.
> 
> We talked on irc and you pointed at the commit things started to go
> wrong. Just to summarise things for the benefit of the list, this is
> some quick testing I did:
> 
> "bitbake -p; time bitbake core-image-minimal -n"
> 
> 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> 40.3s 7df31ff36892c2f9c65326b06b4c70 
> 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
> 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
> 76.9s master-next

I know I'm late to the party, but it took really long for the test to finish..

I'm not using poky, so reproduce this testing on our builds I've used
bitbake revisions matching with poky revision RP was testing:

+ oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because current master isn't compatible with old bitbake (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], expression was ${@oe_import(d)} which triggered exception AttributeError: module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to possibly eliminate impact of metadata changes)

tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM

nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server configured

on a build with few more layers:
Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 1927 skipped, 54 masked, 0 errors.

but first doing just core-image-minimal like RP did:

time		poky						bitbake						oe-core
2m8.191s	6c7c0cefd34067311144a1d4c01986fe0a4aef26	18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7	18cdc087fd5da30e2b31f3d4e81b153cd36ca844
N/A		a0d941c787cf3ef030d190903279d311bc05d752	doesn't exist in poky/poky-contrib		18cdc087fd5da30e2b31f3d4e81b153cd36ca844
2m17.053s	7df31ff36892c2f9c65326b06b4c70			1f630fdf0260db08541d3ca9f25f852931c19905	18cdc087fd5da30e2b31f3d4e81b153cd36ca844
2m18.515s	a0542ed3ff700eca35f9195f743c9e28bcd50f3e	f43778c2d19e70d4befd483b06aaf247fc65c799	23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
2m22.220s	9983b07fffd19082abded7c3f15cc77d306dd69c	8c26b451f22193ef1c544e2017cc84515566c1b8
2m38.185s	master-next					fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
cache.py:446: ResourceWarning: unclosed <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 41117)>
  value = pickled.load()
started showing with this revision
2m17.991s	master-next + fixups				f7cd14f90463957b3e4be6d3876def98b78f1424
2m9.651s	master-next + "Holding off tasks %s" out

now world
253m36.637s	7df31ff36892c2f9c65326b06b4c70			1f630fdf0260db08541d3ca9f25f852931c19905	18cdc087fd5da30e2b31f3d4e81b153cd36ca844
174m13.324s	a0542ed3ff700eca35f9195f743c9e28bcd50f3e	f43778c2d19e70d4befd483b06aaf247fc65c799	23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
   this is time when killed at "NOTE: Executing Tasks" without showing any progress on the tasks (unlike other tests) and only one bitbake process is running
633m27.475s	master-next					fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
   this is time when killed at (1417 from 71749) - running very slowly unlike other bitbake revisions where the task number changes relatively quickly - about 10 tasks per second
89m13.992s	master-next + "Holding off tasks %s" out
89m59.201s	master-next updated today			85fe627fdb6510f0942917964386fad9d8c479c8

Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
slower than recent master-next.

I'll send -P output next.

> 
> So basically the original changes showed a 25% hit but the performance
> of -next is dire.
> 
> This is with no hash equiv server configured.
> 
> It will vary depending on the target used (numbers with -sato for the
> above would be interesting for comparision) and how much was or is in
> sstate, they type of sstate mirror configured and so on.
> 
> I really need to focus on getting the new code functioning correctly
> before we attempt to optimise but if nobody tests the new code due to
> performance problems we have a different issue. We also have a scaling
> problem with the hash server itself I need to fix to stop the
> autobuilder throwing weird errors. I'm therefore a bit challenged on
> where to start with it all :/.
> 
> Cheers,
> 
> Richard
> 
> 
> 
> -- 
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core

-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 201 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-15 13:56                   ` richard.purdie
@ 2019-08-15 22:44                     ` Richard Purdie
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Purdie @ 2019-08-15 22:44 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: Peter Kjellerstedt, OE-core

On Thu, 2019-08-15 at 14:56 +0100, richard.purdie@linuxfoundation.org
wrote:
> On Thu, 2019-08-15 at 14:56 +0200, Alexander Kanavin wrote:
> What is really odd is this on both traces:
> 
>    524436  405.102    0.001  405.266    0.001
> /home/alexander/development/poky/bitbake/lib/bb/cooker.py:264(notific
> ations)
> 
> What that means is that cooker recieved notification of 524436 file
> changes whilst it was parsing/preparing runqueue.
> 
> I would be very interested to understand which files its being
> notified
> of changes to as I suspect there is something that shouldn't be
> there.

This is a bit of a false lead. At least locally I found them to be
profiles of the individual tasks being written out which is only
triggered on profiled builds.

We should write those files to a different directory to avoid this
confusion of the profiles...

Cheers,

Richard



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-15 15:05   ` Martin Jansa
@ 2019-08-16 10:24     ` Martin Jansa
  2019-08-16 15:00       ` Martin Jansa
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Jansa @ 2019-08-16 10:24 UTC (permalink / raw)
  To: Richard Purdie; +Cc: OE-core, Peter Kjellerstedt

[-- Attachment #1: Type: text/plain, Size: 7801 bytes --]

On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > > Comparing that build to a corresponding do-nothing build with Thud,
> > > the time difference matches those three minutes where I have no idea
> > > what bitbake is doing now that it didn’t need to do before…
> > >  
> > > Hopefully these time degradations can be solved, because the current
> > > state of bitbake is barely usable. We also need to look into possible
> > > ways to improve the cooker output when it is running the setscene
> > > tasks so it makes some kind of sense again.
> > 
> > We talked on irc and you pointed at the commit things started to go
> > wrong. Just to summarise things for the benefit of the list, this is
> > some quick testing I did:
> > 
> > "bitbake -p; time bitbake core-image-minimal -n"
> > 
> > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > 40.3s 7df31ff36892c2f9c65326b06b4c70 
> > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
> > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
> > 76.9s master-next
> 
> I know I'm late to the party, but it took really long for the test to finish..
> 
> I'm not using poky, so reproduce this testing on our builds I've used
> bitbake revisions matching with poky revision RP was testing:
> 
> + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because current master isn't compatible with old bitbake (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], expression was ${@oe_import(d)} which triggered exception AttributeError: module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to possibly eliminate impact of metadata changes)
> 
> tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM
> 
> nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server configured
> 
> on a build with few more layers:
> Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 1927 skipped, 54 masked, 0 errors.
> 
> but first doing just core-image-minimal like RP did:
> 
> time		poky						bitbake						oe-core
> 2m8.191s	6c7c0cefd34067311144a1d4c01986fe0a4aef26	18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7	18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> N/A		a0d941c787cf3ef030d190903279d311bc05d752	doesn't exist in poky/poky-contrib		18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m17.053s	7df31ff36892c2f9c65326b06b4c70			1f630fdf0260db08541d3ca9f25f852931c19905	18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m18.515s	a0542ed3ff700eca35f9195f743c9e28bcd50f3e	f43778c2d19e70d4befd483b06aaf247fc65c799	23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> 2m22.220s	9983b07fffd19082abded7c3f15cc77d306dd69c	8c26b451f22193ef1c544e2017cc84515566c1b8
> 2m38.185s	master-next					fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> cache.py:446: ResourceWarning: unclosed <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 41117)>
>   value = pickled.load()
> started showing with this revision
> 2m17.991s	master-next + fixups				f7cd14f90463957b3e4be6d3876def98b78f1424
> 2m9.651s	master-next + "Holding off tasks %s" out
> 
> now world
> 253m36.637s	7df31ff36892c2f9c65326b06b4c70			1f630fdf0260db08541d3ca9f25f852931c19905	18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 174m13.324s	a0542ed3ff700eca35f9195f743c9e28bcd50f3e	f43778c2d19e70d4befd483b06aaf247fc65c799	23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
>    this is time when killed at "NOTE: Executing Tasks" without showing any progress on the tasks (unlike other tests) and only one bitbake process is running
> 633m27.475s	master-next					fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
>    this is time when killed at (1417 from 71749) - running very slowly unlike other bitbake revisions where the task number changes relatively quickly - about 10 tasks per second
> 89m13.992s	master-next + "Holding off tasks %s" out
> 89m59.201s	master-next updated today			85fe627fdb6510f0942917964386fad9d8c479c8
> 
> Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
> slower than recent master-next.
> 
> I'll send -P output next.


So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because
this commit introduced that
logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))

Here is -P output from it:

5min (til "Executing Tasks" message):
profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.return.at.Executing.Tasks
http://paste.ubuntu.com/p/Nw6n5hTPmF/

5hours (whole dry run):
profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.whole.dry.run
http://paste.ubuntu.com/p/sydFGvYTxQ/
profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.worker.log.processed
http://paste.ubuntu.com/p/RMGbm9DyZz/

So for good old baseline I need to go few commits back, to e.g. 18d4a31f RP was testing before:

1f630fdf runqueue: Enable dynamic task adjustment to hash equivalency
a0d941c7 siggen: Convert to use self.unitaskhashes
1f326f2c siggen: Add new unitaskhashes data variable which is cached
b707d0cb runqueue: Improve scenequeue processing logic
19a6e356 cache: Add SimpleCache class
7bb79099 siggen: Import unihash code from OE-Core
7aec8632 hashserv: SQL Optimizations
18d4a31f fetch2/wget: avoid 'maximum recursion depth' RuntimeErrors when handling 403 codes

94m19.081s
profile.log.processed.18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.whole.dry.run
http://paste.ubuntu.com/p/Jnvv8GJNKB/

And here is profile with latest master from yesterday:
90mins:
profile.log.processed.a5e084a266f63c2fd370122327615e49beaeb94e.whole.dry.run
http://paste.ubuntu.com/p/YhMCHPmg56/

So good news is that current master is a bit faster than "good old" 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.

But still significantly slower on my fast builder compared to world tests on poky + meta-oe,
not sure if it's all just because the size of metadata in my builds (with 71K tasks).

Even the dry running the tasks was about 10 tasks/second while RP was seeing around 200 tasks/second.

Will try to bump BB_NUMBER_THREADS from 8 to 72.

I've already disabled rm_work and PRserv, but that doesn't seem to have any significant impact on performance here. Other than resolving the PRserv terminating and then warnings about lost connection.

Cheers,
> 
> > 
> > So basically the original changes showed a 25% hit but the performance
> > of -next is dire.
> > 
> > This is with no hash equiv server configured.
> > 
> > It will vary depending on the target used (numbers with -sato for the
> > above would be interesting for comparision) and how much was or is in
> > sstate, they type of sstate mirror configured and so on.
> > 
> > I really need to focus on getting the new code functioning correctly
> > before we attempt to optimise but if nobody tests the new code due to
> > performance problems we have a different issue. We also have a scaling
> > problem with the hash server itself I need to fix to stop the
> > autobuilder throwing weird errors. I'm therefore a bit challenged on
> > where to start with it all :/.
> > 
> > Cheers,
> > 
> > Richard
> > 
> > 
> > 
> > -- 
> > _______________________________________________
> > Openembedded-core mailing list
> > Openembedded-core@lists.openembedded.org
> > http://lists.openembedded.org/mailman/listinfo/openembedded-core
> 
> -- 
> Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com



-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 201 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-16 10:24     ` Martin Jansa
@ 2019-08-16 15:00       ` Martin Jansa
  2019-08-16 15:54         ` richard.purdie
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Jansa @ 2019-08-16 15:00 UTC (permalink / raw)
  To: Richard Purdie; +Cc: OE-core, Peter Kjellerstedt

[-- Attachment #1: Type: text/plain, Size: 9001 bytes --]

> Will try to bump BB_NUMBER_THREADS from 8 to 72.

I've tried to remove icecc.bbclass inherit (because it does things while
parsing and RP probably doesn't have it inherited), but that didn't save
much time.

All 3 tests were with bitbake 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
94m19.081s  8 BB_NUMBER_THREADS and icecc
82m59.574s  8 BB_NUMBER_THREADS no icecc
68m3.556s    72 BB_NUMBER_THREADS no icecc

Still don't know how to get to sub 10min world runs RP was seeing, but at
least it's as slow as it was before runqeueu changes (or even a bit faster
in lastest master).

On Fri, Aug 16, 2019 at 12:24 PM Martin Jansa <martin.jansa@gmail.com>
wrote:

> On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> > On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > > > Comparing that build to a corresponding do-nothing build with Thud,
> > > > the time difference matches those three minutes where I have no idea
> > > > what bitbake is doing now that it didn’t need to do before…
> > > >
> > > > Hopefully these time degradations can be solved, because the current
> > > > state of bitbake is barely usable. We also need to look into possible
> > > > ways to improve the cooker output when it is running the setscene
> > > > tasks so it makes some kind of sense again.
> > >
> > > We talked on irc and you pointed at the commit things started to go
> > > wrong. Just to summarise things for the benefit of the list, this is
> > > some quick testing I did:
> > >
> > > "bitbake -p; time bitbake core-image-minimal -n"
> > >
> > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > > 76.9s master-next
> >
> > I know I'm late to the party, but it took really long for the test to
> finish..
> >
> > I'm not using poky, so reproduce this testing on our builds I've used
> > bitbake revisions matching with poky revision RP was testing:
> >
> > + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake,
> because current master isn't compatible with old bitbake
> (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=],
> expression was ${@oe_import(d)} which triggered exception AttributeError:
> module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> > and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake
> (to possibly eliminate impact of metadata changes)
> >
> > tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB
> RAM
> >
> > nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence
> Server configured
> >
> > on a build with few more layers:
> > Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632
> targets, 1927 skipped, 54 masked, 0 errors.
> >
> > but first doing just core-image-minimal like RP did:
> >
> > time          poky                                            bitbake
>                                      oe-core
> > 2m8.191s      6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > N/A           a0d941c787cf3ef030d190903279d311bc05d752        doesn't
> exist in poky/poky-contrib
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 2m17.053s     7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 2m18.515s     a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> > 2m22.220s     9983b07fffd19082abded7c3f15cc77d306dd69c
> 8c26b451f22193ef1c544e2017cc84515566c1b8
> > 2m38.185s     master-next
>  fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> > cache.py:446: ResourceWarning: unclosed <socket.socket fd=10,
> family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0,
> laddr=('127.0.0.1', 41117)>
> >   value = pickled.load()
> > started showing with this revision
> > 2m17.991s     master-next + fixups
> f7cd14f90463957b3e4be6d3876def98b78f1424
> > 2m9.651s      master-next + "Holding off tasks %s" out
> >
> > now world
> > 253m36.637s   7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 174m13.324s   a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> >    this is time when killed at "NOTE: Executing Tasks" without showing
> any progress on the tasks (unlike other tests) and only one bitbake process
> is running
> > 633m27.475s   master-next
>  fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> >    this is time when killed at (1417 from 71749) - running very slowly
> unlike other bitbake revisions where the task number changes relatively
> quickly - about 10 tasks per second
> > 89m13.992s    master-next + "Holding off tasks %s" out
> > 89m59.201s    master-next updated today
>  85fe627fdb6510f0942917964386fad9d8c479c8
> >
> > Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
> > slower than recent master-next.
> >
> > I'll send -P output next.
>
>
> So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because
> this commit introduced that
> logger.debug(2, "Holding off tasks %s" %
> pprint.pformat(self.holdoff_tasks))
>
> Here is -P output from it:
>
> 5min (til "Executing Tasks" message):
> profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.return.at
> .Executing.Tasks
> http://paste.ubuntu.com/p/Nw6n5hTPmF/
>
> 5hours (whole dry run):
>
> profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.whole.dry.run
> http://paste.ubuntu.com/p/sydFGvYTxQ/
>
> profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.worker.log.processed
> http://paste.ubuntu.com/p/RMGbm9DyZz/
>
> So for good old baseline I need to go few commits back, to e.g. 18d4a31f
> RP was testing before:
>
> 1f630fdf runqueue: Enable dynamic task adjustment to hash equivalency
> a0d941c7 siggen: Convert to use self.unitaskhashes
> 1f326f2c siggen: Add new unitaskhashes data variable which is cached
> b707d0cb runqueue: Improve scenequeue processing logic
> 19a6e356 cache: Add SimpleCache class
> 7bb79099 siggen: Import unihash code from OE-Core
> 7aec8632 hashserv: SQL Optimizations
> 18d4a31f fetch2/wget: avoid 'maximum recursion depth' RuntimeErrors when
> handling 403 codes
>
> 94m19.081s
>
> profile.log.processed.18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.whole.dry.run
> http://paste.ubuntu.com/p/Jnvv8GJNKB/
>
> And here is profile with latest master from yesterday:
> 90mins:
>
> profile.log.processed.a5e084a266f63c2fd370122327615e49beaeb94e.whole.dry.run
> http://paste.ubuntu.com/p/YhMCHPmg56/
>
> So good news is that current master is a bit faster than "good old"
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.
>
> But still significantly slower on my fast builder compared to world tests
> on poky + meta-oe,
> not sure if it's all just because the size of metadata in my builds (with
> 71K tasks).
>
> Even the dry running the tasks was about 10 tasks/second while RP was
> seeing around 200 tasks/second.
>
> Will try to bump BB_NUMBER_THREADS from 8 to 72.
>
> I've already disabled rm_work and PRserv, but that doesn't seem to have
> any significant impact on performance here. Other than resolving the PRserv
> terminating and then warnings about lost connection.
>
> Cheers,
> >
> > >
> > > So basically the original changes showed a 25% hit but the performance
> > > of -next is dire.
> > >
> > > This is with no hash equiv server configured.
> > >
> > > It will vary depending on the target used (numbers with -sato for the
> > > above would be interesting for comparision) and how much was or is in
> > > sstate, they type of sstate mirror configured and so on.
> > >
> > > I really need to focus on getting the new code functioning correctly
> > > before we attempt to optimise but if nobody tests the new code due to
> > > performance problems we have a different issue. We also have a scaling
> > > problem with the hash server itself I need to fix to stop the
> > > autobuilder throwing weird errors. I'm therefore a bit challenged on
> > > where to start with it all :/.
> > >
> > > Cheers,
> > >
> > > Richard
> > >
> > >
> > >
> > > --
> > > _______________________________________________
> > > Openembedded-core mailing list
> > > Openembedded-core@lists.openembedded.org
> > > http://lists.openembedded.org/mailman/listinfo/openembedded-core
> >
> > --
> > Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com
>
>
>
> --
> Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com
>

[-- Attachment #2: Type: text/html, Size: 11514 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-16 15:00       ` Martin Jansa
@ 2019-08-16 15:54         ` richard.purdie
  2019-08-16 17:22           ` Martin Jansa
  0 siblings, 1 reply; 24+ messages in thread
From: richard.purdie @ 2019-08-16 15:54 UTC (permalink / raw)
  To: Martin Jansa; +Cc: OE-core, Peter Kjellerstedt

On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote:
> > Will try to bump BB_NUMBER_THREADS from 8 to 72.
> 
> I've tried to remove icecc.bbclass inherit (because it does things
> while parsing and RP probably doesn't have it inherited), but that
> didn't save much time.
> 
> All 3 tests were with bitbake
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 94m19.081s  8 BB_NUMBER_THREADS and icecc
> 82m59.574s  8 BB_NUMBER_THREADS no icecc
> 68m3.556s    72 BB_NUMBER_THREADS no icecc
> 
> Still don't know how to get to sub 10min world runs RP was seeing,
> but at least it's as slow as it was before runqeueu changes (or even
> a bit faster in lastest master).

Just thinking out loud, other things which can influence timings:

Is SSTATE_DIR on NFS or local disk?
Are sstate mirrors configured?
Is there an existing build or not, if so, how much is valid?
Underlying filesystem of the build?

Your build seems especially slow at executing through the tasks which
is effectively a test on how fast the system can fork() and return in
some ways. It would be interesting to block dry-run on the server side,
skip the fork and see how the numbers compare.

My build manages some parts of the tasklist faster than others, perhaps
because there are more "free" tasks to execute at some points in the
task graph than others.

Also, I have some patches which improve performance a bit which I'm
still testing.

Cheers,

Richard




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-16 15:54         ` richard.purdie
@ 2019-08-16 17:22           ` Martin Jansa
  2019-08-19 21:23             ` Martin Jansa
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Jansa @ 2019-08-16 17:22 UTC (permalink / raw)
  To: richard.purdie; +Cc: OE-core, Peter Kjellerstedt

[-- Attachment #1: Type: text/plain, Size: 2393 bytes --]

On Fri, Aug 16, 2019 at 04:54:48PM +0100, richard.purdie@linuxfoundation.org wrote:
> On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote:
> > > Will try to bump BB_NUMBER_THREADS from 8 to 72.
> > 
> > I've tried to remove icecc.bbclass inherit (because it does things
> > while parsing and RP probably doesn't have it inherited), but that
> > didn't save much time.
> > 
> > All 3 tests were with bitbake
> > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> > 94m19.081s  8 BB_NUMBER_THREADS and icecc
> > 82m59.574s  8 BB_NUMBER_THREADS no icecc
> > 68m3.556s    72 BB_NUMBER_THREADS no icecc
> > 
> > Still don't know how to get to sub 10min world runs RP was seeing,
> > but at least it's as slow as it was before runqeueu changes (or even
> > a bit faster in lastest master).
> 
> Just thinking out loud, other things which can influence timings:
> 
> Is SSTATE_DIR on NFS or local disk?

SSTATE_DIR is empty directory on local disk
/dev/mapper/vg00-jenkins on /jenkins type ext4 (rw,noatime,nobarrier,commit=6000,stripe=128,data=ordered)

> Are sstate mirrors configured?

None, normally I have SSTATE_MIRRORS over sshfs in this case, but I've
removed it before any performance testing.

> Is there an existing build or not, if so, how much is valid?

Nothing, I remove whole TMPDIR and cache before each run. Then let it
recreate the cache before starting the profile:
bitbake -p; time bitbake world -P -n

> Underlying filesystem of the build?

ext4, everything is pretty much generic Ubuntu 18.04

There is plenty of ram, I'll try to test this from tmpfs as well.

> Your build seems especially slow at executing through the tasks which
> is effectively a test on how fast the system can fork() and return in
> some ways. It would be interesting to block dry-run on the server side,
> skip the fork and see how the numbers compare.

As discussed on IRC, it's slower than yours (8 minutes from 68), but the
most significant chunk of time is lost somewhere else.

> My build manages some parts of the tasklist faster than others, perhaps
> because there are more "free" tasks to execute at some points in the
> task graph than others.
> 
> Also, I have some patches which improve performance a bit which I'm
> still testing.

Thanks for all the work on this!

Cheers,
-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 201 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
  2019-08-16 17:22           ` Martin Jansa
@ 2019-08-19 21:23             ` Martin Jansa
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Jansa @ 2019-08-19 21:23 UTC (permalink / raw)
  To: richard.purdie; +Cc: OE-core, Peter Kjellerstedt

[-- Attachment #1: Type: text/plain, Size: 3977 bytes --]

On Fri, Aug 16, 2019 at 07:22:55PM +0200, Martin Jansa wrote:
> On Fri, Aug 16, 2019 at 04:54:48PM +0100, richard.purdie@linuxfoundation.org wrote:
> > On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote:
> > > > Will try to bump BB_NUMBER_THREADS from 8 to 72.
> > > 
> > > I've tried to remove icecc.bbclass inherit (because it does things
> > > while parsing and RP probably doesn't have it inherited), but that
> > > didn't save much time.
> > > 
> > > All 3 tests were with bitbake
> > > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> > > 94m19.081s  8 BB_NUMBER_THREADS and icecc
> > > 82m59.574s  8 BB_NUMBER_THREADS no icecc
> > > 68m3.556s    72 BB_NUMBER_THREADS no icecc
> > > 
> > > Still don't know how to get to sub 10min world runs RP was seeing,
> > > but at least it's as slow as it was before runqeueu changes (or even
> > > a bit faster in lastest master).
> > 
> > Just thinking out loud, other things which can influence timings:
> > 
> > Is SSTATE_DIR on NFS or local disk?
> 
> SSTATE_DIR is empty directory on local disk
> /dev/mapper/vg00-jenkins on /jenkins type ext4 (rw,noatime,nobarrier,commit=6000,stripe=128,data=ordered)
> 
> > Are sstate mirrors configured?
> 
> None, normally I have SSTATE_MIRRORS over sshfs in this case, but I've
> removed it before any performance testing.
> 
> > Is there an existing build or not, if so, how much is valid?
> 
> Nothing, I remove whole TMPDIR and cache before each run. Then let it
> recreate the cache before starting the profile:
> bitbake -p; time bitbake world -P -n
> 
> > Underlying filesystem of the build?
> 
> ext4, everything is pretty much generic Ubuntu 18.04
> 
> There is plenty of ram, I'll try to test this from tmpfs as well.

I did the same "bitbake -p; time bitbake world -P -n" on the same build with thud, warrior and zeus:

thud/profile.log.processed:         2803894450 function calls (2803191143 primitive calls) in 3340.784 seconds
thud/profile-worker.log.processed:         27005515 function calls (26944030 primitive calls) in 3243.139 seconds
warrior/profile.log.processed:         2804294486 function calls (2803446296 primitive calls) in 3604.226 seconds
warrior/profile-worker.log.processed:         27649679 function calls (27591220 primitive calls) in 3503.772 seconds
zeus/profile.log.processed:         2327031298 function calls (2326396186 primitive calls) in 4433.851 seconds
zeus/profile-worker.log.processed:         28829453 function calls (28771730 primitive calls) in 4331.257 seconds

thud/time:real    55m41.595s
warrior/time:real    60m4.954s
zeus/time:real    72m23.053s

with multilib disabled I got
zeus-no-multilib/profile.log.processed:         1232798107 function calls (1232447521 primitive calls) in 1773.164 seconds
zeus-no-multilib/profile-worker.log.processed:         15561565 function calls (15555044 primitive calls) in 1716.234 seconds
real    29m33.658s

Which seems reasonable as the number of tasks was cut in half.

Let me know if it's worth uploading these profiles somewhere.

Cheers,

> > Your build seems especially slow at executing through the tasks which
> > is effectively a test on how fast the system can fork() and return in
> > some ways. It would be interesting to block dry-run on the server side,
> > skip the fork and see how the numbers compare.
> 
> As discussed on IRC, it's slower than yours (8 minutes from 68), but the
> most significant chunk of time is lost somewhere else.
> 
> > My build manages some parts of the tasklist faster than others, perhaps
> > because there are more "free" tasks to execute at some points in the
> > task graph than others.
> > 
> > Also, I have some patches which improve performance a bit which I'm
> > still testing.
> 
> Thanks for all the work on this!
> 
> Cheers,
> -- 
> Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com



-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 201 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-08-19 21:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-12 20:26 Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively) Peter Kjellerstedt
2019-08-13  9:04 ` Richard Purdie
2019-08-13 13:20   ` Alexander Kanavin
2019-08-13 19:18   ` Richard Purdie
2019-08-14  4:06     ` Khem Raj
2019-08-14 11:25     ` Alexander Kanavin
2019-08-14 11:36       ` richard.purdie
2019-08-14 12:08         ` Alexander Kanavin
2019-08-14 12:43           ` Alexander Kanavin
2019-08-14 12:50           ` Mikko.Rapeli
2019-08-14 12:55           ` richard.purdie
2019-08-14 14:57             ` Peter Kjellerstedt
2019-08-14 15:19               ` Khem Raj
2019-08-14 20:27             ` Alexander Kanavin
2019-08-14 21:25               ` richard.purdie
2019-08-15 12:56                 ` Alexander Kanavin
2019-08-15 13:56                   ` richard.purdie
2019-08-15 22:44                     ` Richard Purdie
2019-08-15 15:05   ` Martin Jansa
2019-08-16 10:24     ` Martin Jansa
2019-08-16 15:00       ` Martin Jansa
2019-08-16 15:54         ` richard.purdie
2019-08-16 17:22           ` Martin Jansa
2019-08-19 21:23             ` Martin Jansa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.