bitbake-devel.lists.openembedded.org archive mirror
From: Randy MacLeod <randy.macleod@windriver.com>
To: bitbake-devel@lists.openembedded.org,
	Richard Purdie <richard.purdie@linuxfoundation.org>,
	ola.x.nilsson@axis.com
Cc: Zheng.qiu@uwaterloo.ca
Subject: Re: [bitbake-devel] Bitbake PSI checker
Date: Mon, 12 Dec 2022 15:48:10 -0500
Message-ID: <49ffc1db-9b43-e570-d726-dba12d560a30@windriver.com>
In-Reply-To: <jwq5yegq5h5.fsf@axis.com>

CCing Richard

On 2022-12-12 05:07, Ola x Nilsson via lists.openembedded.org wrote:
> Hi,
>
> I've been looking into using the pressure stall information awareness of
> bitbake
That's good to hear, Ola.
>   but I have some problems getting it to work.  Actually I think
> it just doesn't work at all.

Doesn't work at all?

Well that would be surprising. See below.

>
> Reading the code I find that
> runqueue.RunQueueScheduler.exceeds_max_pressure claims to "Monitor the
> difference in pressure at least once per second".

That comment isn't accurate. I'll fix it.

Currently, the pressure is only checked when
bitbake is looking for the next_buildable_task.

This can happen hundreds of times per second at some points in a build,
and later, when larger recipes are compiling, the function may not be called
for tens or hundreds of seconds, depending on what is being built.


>   But using some
> debug prints added to that method I see output like
>
> 1670840023.757171 cpu_pressure 0.0 io_pressure 0.0 mem_pressure 0.0
> 1670840023.758697 cpu_pressure 0.0 io_pressure 0.0 mem_pressure 0.0
> 1670840023.760158 cpu_pressure 0.0 io_pressure 0.0 mem_pressure 0.0
> 1670840023.761733 cpu_pressure 0.0 io_pressure 0.0 mem_pressure 0.0
> 1670840023.959357 cpu_pressure 969.0 io_pressure 16135.0 mem_pressure 0.0
   19 second gap
> 1670840042.384582 cpu_pressure 8978077.0 io_pressure 1353882.0 mem_pressure 20922.0
> 1670840042.384582 cpu io  pressure exceeded over 18.677629 seconds
> 1670840042.486946 cpu_pressure 466.0 io_pressure 30792.0 mem_pressure 0.0
> 1670840042.490340 cpu_pressure 466.0 io_pressure 30792.0 mem_pressure 0.0
>
> where the first column is the value of 'now', and the pressure values
> are the calculated deltas.  The 0-pressure values are probably because
> this is very early in the run and the time delta is less than 0.01
> seconds.
>
> But there is a time delta of almost 19 seconds between lines 5 and 6, and
> unsurprisingly the pressure exceeds my max settings of CPU:600000 and
> IO:200000.
>
> But the very next check is only 0.1 second later and while the
> prev-values won't be updated, the calculated pressure will be used.  This
> pressure will be below my settings and a new task will be started.

Yes, that's a bug and I need to fix it. See below.
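To make the failure mode concrete, here is a simplified, hypothetical model of the check as described in this thread. It is not the actual runqueue.py code: `NaivePressureCheck`, `read_totals` and the injectable `clock` are made up for illustration, and `read_totals` is assumed to return the cumulative PSI `total` counters (microseconds). The reference sample is only refreshed once it is more than 1 second old, and the raw delta is compared to the threshold:

```python
import time
from typing import Callable, Dict


class NaivePressureCheck:
    """Hypothetical model of the behaviour described above (not bitbake code)."""

    def __init__(self, limits: Dict[str, float],
                 read_totals: Callable[[], Dict[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits            # e.g. {"cpu": 600000, "io": 200000}
        self.read_totals = read_totals  # cumulative PSI "total" counters
        self.clock = clock
        self.prev = read_totals()
        self.prev_time = clock()

    def exceeds(self) -> bool:
        now = self.clock()
        curr = self.read_totals()
        # Raw delta against the reference sample, not scaled by elapsed time.
        exceeded = any(curr[k] - self.prev[k] > limit
                       for k, limit in self.limits.items())
        # The reference is refreshed only when it is more than 1 s old, so a
        # call made 0.1 s after a refresh sees a tiny delta.
        if now - self.prev_time > 1.0:
            self.prev, self.prev_time = curr, now
        return exceeded


if __name__ == "__main__":
    # Replay the two calls from Ola's trace with a fake clock and counters.
    state = {"t": 0.0, "totals": {"cpu": 0.0, "io": 0.0}}
    check = NaivePressureCheck({"cpu": 600000, "io": 200000},
                               lambda: state["totals"], lambda: state["t"])
    state.update(t=18.68, totals={"cpu": 8978077.0, "io": 1353882.0})
    print(check.exceeds())  # True: the delta covers the whole 18.68 s gap
    state.update(t=18.78, totals={"cpu": 8978543.0, "io": 1384674.0})
    print(check.exceeds())  # False: the reference was refreshed 0.1 s earlier
```

The second call is the one that lets a new task start even though the system was heavily loaded a moment ago.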

>
> Am I missing something here?

You aren't missing anything.

The code has "limitations" but it has still proven useful to some people
and on the Yocto Autobuilder system. Note the lack of 'interval' errors starting
around Aug 18th, 2022, when we enabled this feature for the YP Autobuilder:

    https://autobuilder.yocto.io/pub/non-release/


> If the pressure should be monitored each
> second, isn't it reasonable to have some sort of tick to update the
> prev-values?  And using the pressure delta of intervals of less than a
> second also seems to give too low pressure values.

That would be a better implementation in some ways, but
what we've done so far is only check the pressure when
bitbake is checking for a new task to run. This is less
intrusive, and people do worry about the efficiency of bitbake.
Adding a 1 second timer may not be where we want to go.

It's a little tricky to provide short-term averaging regardless
of how often the function is called. Here are the improvements
that I'm considering:

1. Rather than just keep track of the previous pressure values
seen more than 1 second ago as done currently:

       if now - self.prev_pressure_time > 1.0:

and always using those as the reference, we could
store, say, 10 samples per second and use those as the reference.

There are some challenges with that approach since we don't control
how often the function is called. Averaging over the last 10 calls
is tempting but likely has some edge cases such as when there are
lots of tasks starting/ending.
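A rough sketch of option 1, assuming a hypothetical `read_totals()` callable that returns the cumulative PSI `total` counters per resource. `WindowedPressureCheck`, `window` and `samples_per_window` are illustrative names, not bitbake code. Samples are stored at most about 10 times per window regardless of call frequency, and each check compares against the newest sample that is at least `window` seconds old:

```python
import collections
import time
from typing import Callable, Deque, Dict, Tuple


class WindowedPressureCheck:
    """Hypothetical sketch of option 1 (not bitbake code)."""

    def __init__(self, limits: Dict[str, float],
                 read_totals: Callable[[], Dict[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 window: float = 1.0, samples_per_window: int = 10):
        self.limits = limits
        self.read_totals = read_totals
        self.clock = clock
        self.window = window
        # Record at most ~samples_per_window samples per window, however
        # often exceeds() happens to be called.
        self.min_spacing = window / samples_per_window
        self.samples: Deque[Tuple[float, Dict[str, float]]] = collections.deque(
            [(clock(), read_totals())])

    def exceeds(self) -> bool:
        now = self.clock()
        curr = self.read_totals()
        # Drop old samples until samples[0] is the newest one that is at
        # least `window` old (or the oldest we have, if none is old enough).
        while len(self.samples) > 1 and now - self.samples[1][0] >= self.window:
            self.samples.popleft()
        _, ref = self.samples[0]
        exceeded = any(curr[k] - ref[k] > limit
                       for k, limit in self.limits.items())
        if now - self.samples[-1][0] >= self.min_spacing:
            self.samples.append((now, curr))
        return exceeded
```

Replayed against the trace above, the call made 0.1 s after the 19 s gap still compares against the pre-gap sample, so new tasks stay held back until a sample taken after the gap is at least 1 s old. After a long gap the reference can be much older than 1 s, so the raw delta still covers an uneven amount of time.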


2. If there has been a long delay since the function was last called,
we could check the pressure, sleep for a short period of time and check it
again. Some people would not like this since it would needlessly delay the
build, so we'd have to keep the delay under 1 second. Too short a delay will
reduce the accuracy of the result, but I suspect that 0.1 seconds is
sufficient for most users. We could also look at the avg10 value in this
case, or even some combination of both the current contention and avg10.
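A rough sketch of option 2, assuming the standard Linux PSI file format (`some avg10=... avg60=... avg300=... total=...`, where `total` is cumulative stall time in microseconds and `avg10` is a percentage of wall time). The helper names (`parse_psi`, `sampled_rate`, `pressure_after_gap`) are hypothetical, and taking the maximum of the sampled rate and avg10 is only one possible way to combine them:

```python
import time
from typing import Callable, Dict


def parse_psi(text: str, kind: str = "some") -> Dict[str, float]:
    """Parse the 'some' (or 'full') line of a /proc/pressure/<resource> file,
    e.g. 'some avg10=1.23 avg60=0.50 avg300=0.10 total=123456'."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            return {key: float(value)
                    for key, value in (f.split("=", 1) for f in fields[1:])}
    raise ValueError(f"no {kind!r} line in PSI data")


def read_psi(resource: str) -> Dict[str, float]:
    """Read /proc/pressure/cpu, /proc/pressure/io or /proc/pressure/memory."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())


def sampled_rate(read: Callable[[], Dict[str, float]], delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> float:
    """Stall time in microseconds per second, measured over a short sleep."""
    t0, first = clock(), read()
    sleep(delay)
    t1, second = clock(), read()
    return (second["total"] - first["total"]) / (t1 - t0)


def avg10_rate(psi: Dict[str, float]) -> float:
    """avg10 is a percentage; 1% of one second is 10,000 microseconds."""
    return psi["avg10"] * 10_000


def pressure_after_gap(resource: str, delay: float = 0.1) -> float:
    # Hypothetical combination: the larger of the short-term sample and the
    # kernel's 10 second average, both in microseconds per second.
    rate = sampled_rate(lambda: read_psi(resource), delay)
    return max(rate, avg10_rate(read_psi(resource)))
```

Only `pressure_after_gap()` actually sleeps, so it would be called only when the previous check is older than some threshold.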


3. Just calculate the pressure per second by:

    (current_pressure - last_pressure) / (now - last_time)

This would handle short time differences, such as milliseconds,
and would be a 'cheap' way to deal with long delays. In your case,
the pressure would be:

   cpu_pressure 8978077.0 io_pressure 1353882.0 mem_pressure 20922.0

divided by ~19 since the initial values were close to zero.

Then for the next time, just 0.1 seconds later:

1670840042.384582 cpu_pressure 8978077.0 io_pressure 1353882.0 mem_pressure 20922.0
1670840042.384582 cpu io  pressure exceeded over 18.677629 seconds
1670840042.486946 cpu_pressure 466.0 io_pressure 30792.0 mem_pressure 0.0

Multiplying by 10 for easy calculation, the pressure would be:

cpu: 4660, io: 307920, mem: 0.
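A minimal sketch of option 3, assuming a hypothetical `read_totals()` callable that returns the cumulative PSI `total` counters (microseconds) per resource. `RatePressureCheck` is an illustrative name, not bitbake code. With this scheme the max pressure thresholds would effectively mean stall microseconds per second rather than a raw delta:

```python
import time
from typing import Callable, Dict


class RatePressureCheck:
    """Hypothetical sketch of option 3 (not bitbake code): scale the PSI
    'total' delta by the time elapsed since the previous call."""

    def __init__(self, limits: Dict[str, float],
                 read_totals: Callable[[], Dict[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits            # stall microseconds per second
        self.read_totals = read_totals
        self.clock = clock
        self.prev = read_totals()
        self.prev_time = clock()

    def rates(self) -> Dict[str, float]:
        now = self.clock()
        curr = self.read_totals()
        elapsed = now - self.prev_time
        if elapsed <= 0:
            # Two calls with the same timestamp: nothing meaningful to report.
            return {k: 0.0 for k in self.limits}
        result = {k: (curr[k] - self.prev[k]) / elapsed for k in self.limits}
        # Unlike the current code, the reference moves on every call.
        self.prev, self.prev_time = curr, now
        return result

    def exceeds(self) -> bool:
        return any(rate > self.limits[k] for k, rate in self.rates().items())
```

Fed the two intervals from the trace above, this gives roughly 480,700 (cpu), 72,500 (io) and 1,120 (mem) microseconds of stall per second over the 18.68 s gap, and about 4,550 (cpu) and 300,800 (io) over the following 0.102 s, slightly below the ×10 estimate because the interval is a little longer than 0.1 s.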


Do you have another idea or a preference as to which approach we take?


../Randy


>
> /Ola Nilsson
>

-- 
# Randy MacLeod
# Wind River Linux




Thread overview: 9+ messages
2022-12-12 10:07 Bitbake PSI checker Ola x Nilsson
2022-12-12 20:48 ` Randy MacLeod [this message]
2022-12-19 12:50   ` [bitbake-devel] " Ola x Nilsson
2022-12-19 19:49     ` contrib
2023-05-20 19:58       ` Randy MacLeod
2023-05-22  2:17         ` ChenQi
2023-05-22  9:36           ` Ola x Nilsson
2023-05-22 14:41             ` Randy MacLeod
2023-05-23  2:08               ` Chen, Qi
