Hi Ola & Randy,

I just checked the code and I think Ola is right. The current PSI check
cannot block spawning of new tasks if the time interval between the
current check and the last check is small. I'll send out a patch to fix
this issue.

Also, I don't think calculating the value too often is a good idea, so
I'll change the check to be >1s.

Please help review the patch.

Regards,
Qi

On 5/21/23 03:58, Randy MacLeod wrote:
> On 2022-12-19 14:49, Zheng Qiu via lists.openembedded.org wrote:
>>
>>> On Dec 19, 2022, at 7:50 AM, Ola x Nilsson wrote:
>>>
>>> On Mon, Dec 12 2022, Randy MacLeod wrote:
>>>
>>>> CCing Richard
>>>>
>>>> On 2022-12-12 05:07, Ola x Nilsson via lists.openembedded.org wrote:
>>>>> Hi,
>>>>>
>>>>> I've been looking into using the pressure stall information
>>>>> awareness of bitbake
>>>> That's good to hear Ola.
>>>>> but I have some problems getting it to work.  Actually I think
>>>>> it just doesn't work at all.
>>>>
>>>> Doesn't work at all?
>>>>
>>>> Well that would be surprising. See below.
>>>
>>> OK, it will occasionally block a task. But since the next attempt will
>>> always come after a very short time interval, it will almost always
>>> start a new task even if the pressure is high.
>>> At least this is what I observe on my system.
>>>
>>>> 1. Rather than just keep track of the previous pressure values
>>>> seen more than 1 second ago as done currently:
>>>>
>>>>       if now - self.prev_pressure_time > 1.0:
>>>>
>>>> and always using that as a reference, we can
>>>> store say 10 values per second and use that as a reference.
>>>>
>>>> There are some challenges in that approach in that we don't control
>>>> how often the function is called. Averaging over the last 10 calls
>>>> is tempting but likely has some edge cases such as when there are
>>>> lots of tasks starting/ending.
>>>>
>>>> 2. If there has been a long delay since the function was last called,
>>>> we could check the pressure, sleep for a short period of time and
>>>> check it again. Some people would not like this since it will
>>>> needlessly delay the build, so we'd have to keep the delay to
>>>> < 1 second. Too short a delay will reduce the accuracy of the result
>>>> but I suspect that 0.1 seconds is sufficient for most users.
>>>> We could also look at the avg10 value in this case or even some
>>>> combination of both the current contention and avg10.
>>>>
>>>> 3. Just calculate the pressure per second by:
>>>>
>>>>    ( current pressure - last pressure ) / (now - last_time)
>>>>
>>>> This could handle short time differences such as milliseconds
>>>> and would be a 'cheap' way to deal with long delays. In your case,
>>>> the pressure would be:
>>>>
>>>>   978077.0 io_pressure 1353882.0 mem_pressure 20922.0
>>>>
>>>> divided by ~19 since the initial values were close to zero.
>>>>
>>>> Then for the next time, just 0.1 seconds later:
>>>>
>>>> 1670840042.384582 cpu_pressure 8978077.0 io_pressure 1353882.0 mem_pressure 20922.0
>>>> 1670840042.384582 cpu io  pressure exceeded over 18.677629 seconds
>>>> 1670840042.486946 cpu_pressure 466.0 io_pressure 30792.0 mem_pressure 0.0
>>>>
>>>> Multiplying by 10 for easy calculation, that would be a pressure of:
>>>>
>>>> cpu: 4660, io: 307920, mem: 0.
>>>>
>>>> Do you have another idea or a preference as to which approach we take?
>>>
>>> I think 3 is a good first step.  Using multiple samples could improve
>>> our calculated "avg1", but let's do that later if needed.
>>
>> I agree; Randy and I have been working on patching make and have
>> taken a similar approach:
>> ZhengQ2/make at cpu-pressure (github.com)
>>
>> Additionally, we found that when the pressure is read too frequently,
>> we may get the same cpu pressure as a result,
>> even if the pressure has actually changed.
>> This is likely due to the per-cpu variables used in the kernel.
>> So, in addition to the algorithm Randy described above, we also
>> compare whether the cpu pressure has changed; if not,
>> we return the last result that was produced.
>>
>> I will CC you when I have a patch, and you can try it out before the
>> commit gets merged if you like.
>
> Ola,
>
> Does Qi's patch below help in your situation?
>
> I still want/intend to add a bitbake PSI test case that uses stress-ng
> to induce load and a lightweight sleep task, but there are never enough
> hours in the day/week/...
>
> The basic idea is to:
>
> 1. Run a task that just sleeps for say 10 seconds and confirm that the
>    actual execution time is < 11 seconds or so.
>
> 2. Use stress to get the system into a CPU pressure environment above
>    the current threshold for say 30 seconds and simultaneously / shortly
>    thereafter, launch the same sleep task and confirm that this time,
>    the actual execution time from launch to completion is 40+ seconds.
>
> ../Randy 'getting caught up on email on the weekend' MacLeod
>
> ❯ git show ba94f9a3b1960cc0fdc831c20a9d2f8ad289f307
> commit ba94f9a3b1960cc0fdc831c20a9d2f8ad289f307
> Author: Chen Qi
> Date:   Thu Apr 6 23:07:14 2023
>
>     bitbake: runqueue: fix PSI check calculation
>
>     The current PSI check calculation does not take into consideration
>     the possibility of the time interval between last check and current
>     check being much larger than 1s. In fact, the current behavior does
>     not match what the manual says about BB_PRESSURE_MAX_XXX, even if
>     the value is set to upper limit, 1000000, we still get many blocks
>     on new task launch. The difference between 'total' should be divided
>     by the time interval if it's larger than 1s.
>
>     (Bitbake rev: b4763c2c93e7494e0a27f5970c19c1aac66c228b)
>
>     Signed-off-by: Chen Qi
>     Signed-off-by: Richard Purdie
>
> Δ bitbake/lib/bb/runqueue.py
>
> • 198: class RunQueueScheduler(object):
>                  curr_cpu_pressure = cpu_pressure_fds.readline().split()[4].split("=")[1]
>                  curr_io_pressure = io_pressure_fds.readline().split()[4].split("=")[1]
>                  curr_memory_pressure = memory_pressure_fds.readline().split()[4].split("=")[1]
> -                exceeds_cpu_pressure =  self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) > self.rq.max_cpu_pressure
> -                exceeds_io_pressure =  self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) > self.rq.max_io_pressure
> -                exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) > self.rq.max_memory_pressure
>                  now = time.time()
> -                if now - self.prev_pressure_time > 1.0:
> +                tdiff = now - self.prev_pressure_time
> +                if tdiff > 1.0:
> +                    exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) / tdiff > self.rq.max_cpu_pressure
> +                    exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) / tdiff > self.rq.max_io_pressure
> +                    exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) / tdiff > self.rq.max_memory_pressure
>                      self.prev_cpu_pressure = curr_cpu_pressure
>                      self.prev_io_pressure = curr_io_pressure
>                      self.prev_memory_pressure = curr_memory_pressure
>                      self.prev_pressure_time = now
> +                else:
> +                    exceeds_cpu_pressure = self.rq.max_cpu_pressure and (float(curr_cpu_pressure) - float(self.prev_cpu_pressure)) > self.rq.max_cpu_pressure
> +                    exceeds_io_pressure = self.rq.max_io_pressure and (float(curr_io_pressure) - float(self.prev_io_pressure)) > self.rq.max_io_pressure
> +                    exceeds_memory_pressure = self.rq.max_memory_pressure and (float(curr_memory_pressure) - float(self.prev_memory_pressure)) > self.rq.max_memory_pressure
>              return (exceeds_cpu_pressure or exceeds_io_pressure or exceeds_memory_pressure)
>          return False
>
>>
>> ZQ
>>
>>>
>>> /Ola
>>>
>>>>
>>>> ../Randy
>>>>
>>>>>
>>>>> /Ola Nilsson
>>
>> -=-=-=-=-=-=-=-=-=-=-=-
>> Links: You receive all messages sent to this group.
>> View/Reply Online (#14206): https://lists.openembedded.org/g/bitbake-devel/message/14206
>> Mute This Topic: https://lists.openembedded.org/mt/95618299/3616765
>> Group Owner: bitbake-devel+owner@lists.openembedded.org
>> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [randy.macleod@windriver.com]
>> -=-=-=-=-=-=-=-=-=-=-=-
>
> --
> # Randy MacLeod
> # Wind River Linux
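
For anyone following the thread, here is a minimal standalone sketch of
the per-second calculation discussed above (approach 3, combined with the
>1s guard). It is not the bitbake code: PressureRate, parse_total and
min_interval are hypothetical names used only for illustration. The one
part taken from runqueue.py is how the 'total' field is extracted from
the first line of a /proc/pressure file.

```python
import time


def parse_total(line):
    """Extract the 'total' counter from a PSI line such as
    'some avg10=0.00 avg60=0.00 avg300=0.00 total=8978077'."""
    return float(line.split()[4].split("=")[1])


class PressureRate:
    """Hypothetical helper (not bitbake code) tracking one PSI 'total' counter."""

    def __init__(self, total, now=None, min_interval=1.0):
        self.prev_total = float(total)
        self.prev_time = time.time() if now is None else now
        self.min_interval = min_interval  # assumed 1s, as in the thread

    def exceeds(self, total, limit, now=None):
        """Return True if pressure since the reference sample is above limit."""
        now = time.time() if now is None else now
        tdiff = now - self.prev_time
        delta = float(total) - self.prev_total
        if tdiff > self.min_interval:
            # Long gap: normalise the delta to a per-second rate,
            # then move the reference sample forward.
            result = delta / tdiff > limit
            self.prev_total = float(total)
            self.prev_time = now
        else:
            # Short gap: compare the raw delta and keep the old reference.
            result = delta > limit
        return result
```

With the numbers from the log quoted above, a cpu total of 8978077
accumulated over 18.677629 seconds is roughly 480000 per second, so it
would not exceed a limit of 1000000, whereas the raw difference would.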