All of lore.kernel.org
 help / color / mirror / Atom feed
* Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: June 17, 2021
@ 2021-06-17 14:05 Randy MacLeod
  2021-06-18  8:30 ` Alexandre Belloni
  0 siblings, 1 reply; 2+ messages in thread
From: Randy MacLeod @ 2021-06-17 14:05 UTC (permalink / raw)
  To: Sakib Sajal, alexandre.belloni, richard.purdie, Wold, Saul
  Cc: Yocto discussion list, Tascioglu, Tony, Trevor Gamblin


Join Zoom Meeting - 9 AM ET
https://windriver.zoom.us/j/3696693975

Attendees:  Alex, Richard, Saul, Randy, Tony, Trevor, Sakib


Summary: Things are improving somewhat on the autobuilder,
          RCU stalls are the top problem now.


1.
LTP kernel BUG:
Many thanks to Paul Gortmaker for his work on this!


2. The most common problem now is the qemu RCU hang.

For example these builds:

https://autobuilder.yoctoproject.org/typhoon/#/builders/73/builds/3541/steps/13/logs/stdio

https://autobuilder.yoctoproject.org/typhoon/#/builders/73/builds/3541/steps/13/logs/stdio

Richards links on RCU stall detection, and tuning parameters:
    https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
    https://lwn.net/Articles/777214/

Next:
  - Ask around for advice on qemu debugging.
  - RP thinks that the underlying system has a problem:
      CPU or other overload.
    We do see that there are two qemus that are using lots of CPU in
    the links above.
    Richard says that the likely activity is:
      - core-image-sato-sdk, compiler tests
      - core-image-sato  lighter general tests
    Alex thinks that the particular workload is not significant.
  - run two qemu in a controlled env, with stress-ng.
  - iostat will help - Sakib.

3. Valgrind ptest results are getting better.

4. ptest issues are coming along, with util-linux being the next
thing to be merged today likely.

5. On the ubuntu-18.04 builders, we seem to see issues there,
    we dont' know why, maybe only that we have more of those workers...
    Alex, could you possibly get failures per worker statistics?


6. discussed Sakib's summary script. It's coming along.
    TO DO:
    - special activities: rm (of trash), tar, qemu*
    - report all zombies
      (The current hoard are due to Paul Barker's patch)
7.
make: job server
  - the fifo was being re-created by the wrapper on each call
    so Trevor will fix that.



8. From last week, I don't think we've increased the timeouts:

  - qemu-runner? timeout increase 120 -> 240
  - ptest timeouts 300 -> 450?




8.
   Plans for the week:

   Richard: RCU stall
   Alex:
   Sakib: task summary
   Trevor: make job server
   Tony: ptests and work with upstream valgrind on fixing bugs.
   Saul: (1 week) have QMP deal with sigusr1 to close the QMP socket
   Randy: coffee, herd cats!!


../Randy

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: June 17, 2021
  2021-06-17 14:05 Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: June 17, 2021 Randy MacLeod
@ 2021-06-18  8:30 ` Alexandre Belloni
  0 siblings, 0 replies; 2+ messages in thread
From: Alexandre Belloni @ 2021-06-18  8:30 UTC (permalink / raw)
  To: Randy MacLeod
  Cc: Sakib Sajal, richard.purdie, Wold, Saul, Yocto discussion list,
	Tascioglu, Tony, Trevor Gamblin

On 17/06/2021 10:05:48-0400, Randy MacLeod wrote:
> 5. On the ubuntu-18.04 builders, we seem to see issues there,
>    we dont' know why, maybe only that we have more of those workers...
>    Alex, could you possibly get failures per worker statistics?
> 

Specifically for bug 14273 (the rcu stall is seen on the logs):

ubuntu1804-ty-1    6   23.1%
ubuntu1804-ty-2    5   19.2%
ubuntu1804-ty-3   12   38.5%
fedora31-ty-1      2    7.7%
debian8-ty-1       3   11.5%

> 
> 6. discussed Sakib's summary script. It's coming along.
>    TO DO:
>    - special activities: rm (of trash), tar, qemu*
>    - report all zombies
>      (The current hoard are due to Paul Barker's patch)
> 7.
> make: job server
>  - the fifo was being re-created by the wrapper on each call
>    so Trevor will fix that.
> 
> 
> 
> 8. From last week, I don't think we've increased the timeouts:
> 
>  - qemu-runner? timeout increase 120 -> 240
>  - ptest timeouts 300 -> 450?
> 
> 
> 
> 
> 8.
>   Plans for the week:
> 
>   Richard: RCU stall
>   Alex:
>   Sakib: task summary
>   Trevor: make job server
>   Tony: ptests and work with upstream valgrind on fixing bugs.
>   Saul: (1 week) have QMP deal with sigusr1 to close the QMP socket
>   Randy: coffee, herd cats!!
> 
> 
> ../Randy

-- 
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-06-18  8:30 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-17 14:05 Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: June 17, 2021 Randy MacLeod
2021-06-18  8:30 ` Alexandre Belloni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.