From: ChenQi <Qi.Chen@windriver.com>
To: Randy MacLeod <randy.macleod@windriver.com>,
contrib@zhengqiu.net, Ola x Nilsson <ola.x.nilsson@axis.com>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>,
bitbake-devel@lists.openembedded.org
Subject: Re: [bitbake-devel] Bitbake PSI checker
Date: Mon, 22 May 2023 10:17:06 +0800
Message-ID: <5fb8eecc-135f-c34a-3d1a-1d4b9ae62509@windriver.com>
In-Reply-To: <732553c1-870e-c794-c245-d664afa14343@windriver.com>
Hi Ola & Randy,
I just checked the code and I think Ola is right. The current PSI check
cannot block spawning of new tasks if the time interval between the current
check and the last check is small. I'll send out a patch to fix this issue.
Also, I don't think calculating the value too often is a good idea, so
I'll change the check to be >1s.
Please help review the patch.
Regards,
Qi
On 5/21/23 03:58, Randy MacLeod wrote:
> On 2022-12-19 14:49, Zheng Qiu via lists.openembedded.org wrote:
>>
>>
>>> On Dec 19, 2022, at 7:50 AM, Ola x Nilsson <ola.x.nilsson@axis.com>
>>> wrote:
>>>
>>>
>>> On Mon, Dec 12 2022, Randy MacLeod wrote:
>>>
>>>> CCing Richard
>>>>
>>>> On 2022-12-12 05:07, Ola x Nilsson via lists.openembedded.org wrote:
>>>>> Hi,
>>>>>
>>>>> I've been looking into using the pressure stall information (PSI)
>>>>> awareness of bitbake
>>>> That's good to hear, Ola.
>>>>> but I have some problems getting it to work. Actually, I think
>>>>> it just doesn't work at all.
>>>>
>>>> Doesn't work at all?
>>>>
>>>> Well that would be surprising. See below.
>>>
>>> OK, it will occasionally block a task. But since the next attempt will
>>> always come after a very short time interval, it will almost always start
>>> a new task even if the pressure is high.
>>> At least this is what I observe on my system.
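To make Ola's observation concrete, here is a minimal Python sketch of the pre-fix logic for a single resource, reduced from the removed lines of the patch quoted further down in this thread. `PreFixChecker` is a hypothetical name, not bitbake code; timestamps and the cumulative PSI `total` (microseconds of stall) are passed in so the behaviour can be replayed deterministically.

```python
class PreFixChecker:
    """Hypothetical single-resource replay of the pre-fix bitbake check."""

    def __init__(self, max_pressure, prev_total=0.0, prev_time=0.0):
        self.max_pressure = max_pressure  # compared against a raw 'total' delta
        self.prev_total = prev_total      # cumulative stall time, microseconds
        self.prev_time = prev_time

    def exceeds(self, total, now):
        # Raw growth of 'total' since the reference sample, not divided by time
        exceeded = (total - self.prev_total) > self.max_pressure
        # The reference is only refreshed once more than 1 s has passed
        if now - self.prev_time > 1.0:
            self.prev_total = total
            self.prev_time = now
        return exceeded


c = PreFixChecker(max_pressure=100000)
first = c.exceeds(1_000_000, 2.0)   # large delta: blocks, reference refreshed
second = c.exceeds(1_005_000, 2.01) # 10 ms later: tiny delta, task starts
```

With stall time growing by 500000 us per second and a threshold of 100000, the first check blocks and refreshes the reference sample, but a check 10 ms later sees only a 5000 us delta and lets a new task start even though the pressure has not changed.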
>>>
>>> <snip>
>>>
>>>> 1. Rather than just keeping track of the previous pressure values
>>>> sampled more than 1 second ago, as is done currently:
>>>>
>>>> if now - self.prev_pressure_time > 1.0:
>>>>
>>>> and always using that as the reference, we can
>>>> store, say, 10 values per second and use those as the reference.
>>>>
>>>> There are some challenges in that approach in that we don't control
>>>> how often the function is called. Averaging over the last 10 calls
>>>> is tempting but likely has some edge cases such as when there are
>>>> lots of tasks starting/ending.
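Approach 1 could look roughly like the Python sketch below, assuming the caller passes in timestamps and cumulative PSI `total` values. `PressureHistory` and its `window`/`per_window` parameters are hypothetical. It stores at most about `per_window` samples per window and measures against the newest sample that is at least `window` seconds old (or the oldest one kept), so a long gap between calls degrades gracefully into a plain rate over that gap.

```python
from collections import deque


class PressureHistory:
    """Hypothetical sketch of approach 1: remember a few recent samples of a
    cumulative PSI 'total' and measure against one about `window` seconds old."""

    def __init__(self, window=1.0, per_window=10):
        self.window = window
        self.min_spacing = window / per_window
        self.samples = deque()  # (timestamp, total) pairs, oldest first

    def update(self, now, total):
        """Record a sample; return stall microseconds per second measured
        against the reference sample, or None if there is no reference yet."""
        # Drop old samples while the next one is also at least `window` old,
        # so samples[0] is the newest sample that old, else the oldest kept
        while len(self.samples) > 1 and now - self.samples[1][0] >= self.window:
            self.samples.popleft()
        ref = self.samples[0] if self.samples else None
        # Store at most about `per_window` samples per window
        if not self.samples or now - self.samples[-1][0] >= self.min_spacing:
            self.samples.append((now, total))
        if ref is None or now <= ref[0]:
            return None
        return (total - ref[1]) / (now - ref[0])
```

The minimum spacing between stored samples is what keeps bursts of calls (many tasks starting or ending at once) from crowding the history with near-identical samples.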
>>>>
>>>>
>>>> 2. If there has been a long delay since the function was last called,
>>>> we could check the pressure, sleep for a short period of time and check
>>>> it again. Some people would not like this since it will needlessly delay
>>>> the build, so we'd have to keep the delay to < 1 second. Too short a
>>>> delay will reduce the accuracy of the result, but I suspect that 0.1
>>>> seconds is sufficient for most users. We could also look at the avg10
>>>> value in this case, or even some combination of both the current
>>>> contention and avg10.
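A minimal sketch of approach 2 in Python, assuming the standard /proc/pressure layout whose first line reads `some avg10=... avg60=... avg300=... total=N`, with `total` the cumulative stall time in microseconds. It parses that line with the same `split()[4].split("=")[1]` expression used in bitbake's runqueue code. The function names are hypothetical, and the reader, sleep and clock are injectable so the logic can be exercised without real load.

```python
import time


def parse_psi_total(line):
    """Extract the cumulative stall time (microseconds) from a PSI line such as
    'some avg10=0.00 avg60=0.00 avg300=0.00 total=12345'."""
    return int(line.split()[4].split("=")[1])


def read_psi_total(path="/proc/pressure/cpu"):
    """Read the 'some' line (the first line) of a PSI file."""
    with open(path) as f:
        return parse_psi_total(f.readline())


def sampled_rate(read_total=read_psi_total, delay=0.1,
                 sleep=time.sleep, clock=time.monotonic):
    """Approach 2 (sketch): sample, wait `delay` seconds, sample again, and
    return the stall rate in microseconds of stall per second."""
    t0, v0 = clock(), read_total()
    sleep(delay)
    t1, v1 = clock(), read_total()
    return (v1 - v0) / (t1 - t0)
```

The 0.1 s default matches the delay suggested above; it is also the cost paid on every such check, which is why the extra sample would only be taken after a long gap. `time.monotonic` is used here, rather than the `time.time` that bitbake uses, so that wall-clock adjustments cannot yield a negative interval.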
>>>>
>>>>
>>>> 3. Just calculate the pressure per second by:
>>>>
>>>> (current pressure - last pressure) / (now - last_time)
>>>>
>>>> This could handle short time differences, such as milliseconds, and
>>>> would also be a 'cheap' way to deal with long delays. In your case,
>>>> the pressure would be:
>>>>
>>>> 978077.0 io_pressure 1353882.0 mem_pressure 20922.0
>>>>
>>>> divided by ~19 (the ~18.7-second interval), since the initial values
>>>> were close to zero.
>>>>
>>>> Then for the next time, just 0.1 seconds later:
>>>>
>>>> 1670840042.384582 cpu_pressure 8978077.0 io_pressure 1353882.0 mem_pressure 20922.0
>>>> 1670840042.384582 cpu io pressure exceeded over 18.677629 seconds
>>>> 1670840042.486946 cpu_pressure 466.0 io_pressure 30792.0 mem_pressure 0.0
>>>>
>>>> Multiplying by 10 for an easy calculation, the pressure would be:
>>>>
>>>> cpu: 4660, io: 307920, mem: 0.
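Approach 3 is a one-line normalisation. A sketch with a hypothetical helper name, fed the numbers from the log above and taking the logged `cpu_pressure 466.0` as the growth of the cpu `total` since the previous sample, as the estimate above does:

```python
def pressure_rate(curr_total, prev_total, now, prev_time):
    """Approach 3 (sketch): stall time accumulated per second of wall-clock
    time since the previous check. Works for both short and long gaps."""
    elapsed = now - prev_time
    if elapsed <= 0:
        raise ValueError("now must be later than prev_time")
    return (curr_total - prev_total) / elapsed


# cpu delta of 466.0 between 1670840042.384582 and 1670840042.486946
rate = pressure_rate(466.0, 0.0, 1670840042.486946, 1670840042.384582)
```

Dividing by the measured interval (about 0.102 s) gives roughly 4552 rather than the 4660 obtained by multiplying by 10; the difference comes only from rounding the interval to 0.1 s.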
>>>>
>>>>
>>>> Do you have another idea or a preference as to which approach we take?
>>>
>>> I think 3 is a good first step. Using multiple samples could improve
>>> our calculated "avg1", but let's do that later if needed.
>>
>> I agree; Randy and I have been working on patching make and have
>> taken a similar approach:
>> ZhengQ2/make at cpu-pressure: https://github.com/ZhengQ2/make/tree/cpu-pressure
>> Additionally, we found that when the pressure is read too frequently,
>> we may get the same cpu pressure as a result, even if the pressure has
>> actually changed. This is likely due to the per-cpu variables used in
>> the kernel. So, in addition to the algorithm Randy described above, we
>> also check whether the cpu pressure has changed; if not, we return the
>> last result that was produced.
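The "reuse the last result if the counter has not moved" idea can be sketched in Python on top of the per-second rate. This is not the code from the make branch (which is C and not shown here); `CachedPressureCheck` and `max_rate` are hypothetical names, and the threshold is assumed to be in stall microseconds per second.

```python
class CachedPressureCheck:
    """Hypothetical sketch: rate-based check that reuses its previous verdict
    when the cumulative 'total' has not moved since the last read."""

    def __init__(self, max_rate):
        self.max_rate = max_rate      # threshold in stall microseconds per second
        self.prev_total = None
        self.prev_time = None
        self.last_result = False

    def exceeds(self, total, now):
        if self.prev_total is None:
            # First read: nothing to compare against yet
            self.prev_total, self.prev_time = total, now
            return False
        if total == self.prev_total or now <= self.prev_time:
            # Counter unchanged (possibly a stale read): keep the last verdict
            # and the old reference, so the next delta spans a real interval
            return self.last_result
        rate = (total - self.prev_total) / (now - self.prev_time)
        self.last_result = rate > self.max_rate
        self.prev_total, self.prev_time = total, now
        return self.last_result
```

Keeping the old reference on a stale read matters: the next real change is then measured over the whole interval since the last counter movement instead of over a few milliseconds.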
>>
>> I will CC you when I have a patch, and you can try it out before the
>> commit gets merged if you like.
>
>
> Ola,
>
> Does Qi's patch below help in your situation?
>
> I still want/intend to add a bitbake PSI test case that uses stress-ng
> to induce load and a lightweight sleep task, but there are never enough
> hours in the day/week/...
>
> The basic idea is to:
>
> 1. Run a task that just sleeps for, say, 10 seconds and confirm that the
> actual execution time is < 11 seconds or so.
>
> 2. Use stress to put the system under CPU pressure above the current
> threshold for, say, 30 seconds and, simultaneously or shortly thereafter,
> launch the same sleep task and confirm that this time the actual
> execution time, from launch to completion, is 40+ seconds.
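The two-step plan could be scripted roughly as below. This is a sketch under assumptions: `run_task` is a caller-supplied callable that builds a recipe whose task just sleeps about 10 seconds (no such recipe is named in the thread), and the `stress-ng --cpu 0 --timeout 30s` load (`--cpu 0` starts one worker per online CPU) is assumed to push CPU pressure above the configured BB_PRESSURE_MAX_CPU. The function names are hypothetical.

```python
import subprocess
import time


def timed(run):
    """Return the wall-clock seconds taken by the callable `run`."""
    start = time.monotonic()
    run()
    return time.monotonic() - start


def psi_blocking_test(run_task, stress_seconds=30, idle_limit=11.0, loaded_min=40.0):
    """Sketch of the two-step check: an unloaded run should be quick, and a run
    launched while stress-ng holds CPU pressure should be held back."""
    # Step 1: quiet system, the ~10 s sleep task should finish in < ~11 s
    idle = timed(run_task)
    assert idle < idle_limit, f"unloaded run took {idle:.1f}s"
    # Step 2: load every online CPU for stress_seconds, then relaunch the task
    stress = subprocess.Popen(["stress-ng", "--cpu", "0",
                               "--timeout", f"{stress_seconds}s"])
    try:
        loaded = timed(run_task)
    finally:
        stress.wait()
    # The task should only start once pressure drops: ~30 s wait + ~10 s run
    assert loaded >= loaded_min, f"loaded run took only {loaded:.1f}s"
    return idle, loaded
```

It might be called as `psi_blocking_test(lambda: subprocess.run(["bitbake", "psi-sleep-test"], check=True))`, where `psi-sleep-test` is a hypothetical recipe name.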
>
> ../Randy 'getting caught up on email on the weekend' MacLeod
>
>
> ❯ git show ba94f9a3b1960cc0fdc831c20a9d2f8ad289f307
> commit ba94f9a3b1960cc0fdc831c20a9d2f8ad289f307
> Author: Chen Qi <Qi.Chen@windriver.com>
> Date: Thu Apr 6 23:07:14 2023
>
> bitbake: runqueue: fix PSI check calculation
>
> The current PSI check calculation does not take into consideration
> the possibility of the time interval between last check and current
> check being much larger than 1s. In fact, the current behavior does
> not match what the manual says about BB_PRESSURE_MAX_XXX, even if
> the value is set to upper limit, 1000000, we still get many blocks
> on new task launch. The difference between 'total' should be divided
> by the time interval if it's larger than 1s.
>
> (Bitbake rev: b4763c2c93e7494e0a27f5970c19c1aac66c228b)
>
> Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>
>
> bitbake/lib/bb/runqueue.py, class RunQueueScheduler(object), around line 198:
>                  curr_cpu_pressure = cpu_pressure_fds.readline().split()[4].split("=")[1]
>                  curr_io_pressure = io_pressure_fds.readline().split()[4].split("=")[1]
>                  curr_memory_pressure = memory_pressure_fds.readline().split()[4].split("=")[1]
> -                exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) > self.rq.max_cpu_pressure
> -                exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) > self.rq.max_io_pressure
> -                exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) > self.rq.max_memory_pressure
>                  now = time.time()
> -                if now - self.prev_pressure_time > 1.0:
> +                tdiff = now - self.prev_pressure_time
> +                if tdiff > 1.0:
> +                    exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) / tdiff > self.rq.max_cpu_pressure
> +                    exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) / tdiff > self.rq.max_io_pressure
> +                    exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) / tdiff > self.rq.max_memory_pressure
>                      self.prev_cpu_pressure = curr_cpu_pressure
>                      self.prev_io_pressure = curr_io_pressure
>                      self.prev_memory_pressure = curr_memory_pressure
>                      self.prev_pressure_time = now
> +                else:
> +                    exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) > self.rq.max_cpu_pressure
> +                    exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) > self.rq.max_io_pressure
> +                    exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) > self.rq.max_memory_pressure
>              return (exceeds_cpu_pressure or exceeds_io_pressure or exceeds_memory_pressure)
>          return False
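The patched decision for one resource can be replayed outside bitbake with a small pure function. This is a sketch that mirrors the hunk above, not code from bitbake; `exceeds_after_fix` is a hypothetical name, and it returns whether the caller should refresh its reference sample instead of mutating scheduler state.

```python
def exceeds_after_fix(curr, prev, now, prev_time, max_pressure):
    """Single-resource sketch of the patched check. Returns (exceeded, refresh),
    where `refresh` says whether the caller should store (curr, now) as its
    new reference sample, as the patch does inside the `tdiff > 1.0` branch."""
    tdiff = now - prev_time
    if tdiff > 1.0:
        # Long gap: normalise the growth of 'total' to a per-second rate
        return bool(max_pressure) and (curr - prev) / tdiff > max_pressure, True
    # Short gap: the raw growth is compared and the reference is kept
    return bool(max_pressure) and (curr - prev) > max_pressure, False
```

Fed the earlier numbers from this thread (a cpu `total` growth of 8978077.0 over 18.677629 seconds) with the threshold at its upper limit of 1000000, it reports no excess (roughly 480,000 per second), whereas the unnormalised comparison would have blocked. The short-gap branch still compares a raw delta, which is the case Qi's reply at the top of this message says still cannot block new tasks.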
>
>
>>
>> ZQ
>>
>>>
>>> /Ola
>>>
>>>>
>>>> ../Randy
>>>>
>>>>
>>>>>
>>>>> /Ola Nilsson
>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>>
>>
>
> --
> # Randy MacLeod
> # Wind River Linux
Thread overview: 9+ messages
2022-12-12 10:07 Bitbake PSI checker Ola x Nilsson
2022-12-12 20:48 ` [bitbake-devel] " Randy MacLeod
2022-12-19 12:50 ` Ola x Nilsson
2022-12-19 19:49 ` contrib
2023-05-20 19:58 ` Randy MacLeod
2023-05-22 2:17 ` ChenQi [this message]
2023-05-22 9:36 ` Ola x Nilsson
2023-05-22 14:41 ` Randy MacLeod
2023-05-23 2:08 ` Chen, Qi