From: "Randy MacLeod" <randy.macleod@windriver.com>
To: Sakib Sajal <sakib.sajal@windriver.com>,
alexandre.belloni@bootlin.com,
richard.purdie@linuxfoundation.org, "Wold,
Saul" <saul.wold@windriver.com>,
Trevor Gamblin <trevor.gamblin@windriver.com>
Cc: Yocto discussion list <yocto@yoctoproject.org>,
"Tascioglu, Tony" <Tony.Tascioglu@windriver.com>,
Michael Halstead <mhalstead@linuxfoundation.org>
Subject: Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: July 8, 2021
Date: Thu, 8 Jul 2021 09:53:18 -0400
Message-ID: <15248ca8-1257-4ed9-b405-149070fde4a1@windriver.com>
YP AB Intermittent failures meeting
===================================
July 1, 2021, 9 AM ET
https://windriver.zoom.us/j/3696693975
Attendees: Tony, Richard, Trevor, Randy
Summary:
========
The autobuilder RCU hang is still fixed ;-),
master branch builds greener.
ptest failures are the top problem now.
Added Michael Halstead; see the questions for him below in item 4
of the previous meeting notes.
If anyone wants to help, we could use more eyes on the logs,
particularly the summary logs, and help understanding the iostat
numbers when the dd test times out.
Plans for the week:
===================
Richard: glibc upgrade, etc.
Alex: ?
Sakib: pub/non-release link upgrade, script clean-up.
Trevor: make job server test. Try it on YP AB!!! What type of build?
Tony: fix/work-around valgrind ptest bug:
none/tests/amd64/fb_test_amd64
Saul: nothing this week for YP.
Randy: vacation, then email catch-up!
Meeting Notes:
==============
1. runqemu
Tony is having trouble with runqemu on some Wind River machines.
Richard has a fix for a race in runqemu in master-next.
These might be related, but if not, Tony should debug the
issue and collect logs.
2. job server
- Trevor has conclusive evidence that the 'make' job server is useful.
Email summary to come. Need to fix some assumptions in the code that
parses PARALLEL_MAKE, then send a patch to yocto-autobuilder-helper.
- ninja will have to be done next.
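For illustration only, here is a minimal sketch of tolerant PARALLEL_MAKE parsing. It is not the yocto-autobuilder-helper code; the function name parse_parallel_make is hypothetical, and the assumption is that the value may look like "-j16", "-j 16", or "--jobs=16", possibly followed by other options such as "-l 32".

```python
import re

def parse_parallel_make(value):
    """Return the job count from a PARALLEL_MAKE-style string, or None.

    Hypothetical helper: accepts "-j16", "-j 16", "--jobs=16" and
    "--jobs 16", and ignores unrelated options such as "-l 32".
    """
    match = re.search(
        r"(?:^|\s)(?:-j\s*|--jobs(?:=|\s+))(\d+)(?=\s|$)", value
    )
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for sample in ("-j 16", "-j16 -l 32", "--jobs=8", "-l 32"):
        print(repr(sample), "->", parse_parallel_make(sample))
```

Returning None rather than a default leaves the decision about a missing or unlimited "-j" to the caller.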
3. AB status
generally better but...
ptests are having some recurring problems.
- parted - only on arm?
- valgrind - none/tests/amd64/fb_test_amd64
- gdb test failing again. - Randy!
4. Richard reported
- something really flaky going on with serial ports.
- particularly bad on qemuppc but also x86.
- related to Saul's QMP data dump?
5. Sakib needs to send patch to make testimage failures
generate summary logs.
6. Richard says that we may need to redesign the data collection system
that Sakib's AB INT tests are based on.
Still relevant parts of
Previous Meeting Notes:
=======================
1. The qemu RCU hang has been fixed to not deadlock anymore!
It still hangs at times but this dramatically reduces the
AB failures.
4. bitbake server timeout.
"Timeout while waiting for a reply from the bitbake server (60s)"
Randy mentioned that the bitbake server timeouts seen in the
Wind River build cluster have gone away after upgrading to
a newer version of docker.
Old: Docker Version: Docker version 18.09.4, build d14af54266
New: Docker Version: Docker version 20.10.7, build f0df350
Clearly the YP ABs aren't running in docker, but what
about firmware and kernel tunings?
Michael,
Is the BIOS/firmware kept up to date on most nodes?
It seems that we are running stock kernels which makes sense but
given that we don't have concerns about privacy since system access
is controlled and the nodes are being used to test open
source software, we might consider optimizing for performance
rather than security.
Alex pointed at: https://make-linux-fast-again.com/
Which just lists a set of kernel boot options:
noibrs noibpb nopti nospectre_v2 nospectre_v1 \
l1tf=off nospec_store_bypass_disable no_stf_barrier \
mds=off tsx=on tsx_async_abort=off mitigations=off
Can we enable some or all of these on a node to see what the
performance difference is?
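Before and after changing boot options on a node, it may help to record which mitigations the kernel reports as active. A rough sketch, assuming a Linux kernel that exposes /sys/devices/system/cpu/vulnerabilities; the function name read_mitigation_status is hypothetical:

```python
import os

def read_mitigation_status(base="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability_name: status_string} as reported by the kernel.

    Hypothetical helper: 'base' is a parameter so the function can be
    pointed at a copy of the directory collected from another node.
    """
    status = {}
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        if os.path.isfile(path):
            with open(path) as handle:
                status[name] = handle.read().strip()
    return status


if __name__ == "__main__":
    try:
        for name, state in read_mitigation_status().items():
            print(f"{name}: {state}")
    except FileNotFoundError:
        print("vulnerabilities directory not available on this system")
```

Comparing this output between a stock node and a tuned node would confirm that the options actually took effect before any build timings are compared.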
5. io stalls
Richard said that it would make sense to write an ftrace utility
/ script to monitor io latency, and we could install it with sudo.
Ch^W mentioned ftrace on IRC.
Sakib and Randy will work on that but not for a week or two.
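Until an ftrace-based tool exists, a much cruder user-space probe can at least show when synchronous writes stall. The sketch below is not ftrace and not the existing dd test; it simply times a small write plus fsync, and the function name timed_sync_write is hypothetical:

```python
import os
import tempfile
import time

def timed_sync_write(directory, size=4096):
    """Write 'size' bytes to a temporary file in 'directory', fsync it,
    and return the elapsed time in seconds.

    Hypothetical helper: measures end-to-end write latency as seen by a
    user process, not per-request block-layer latency.
    """
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    latency = timed_sync_write(tempfile.gettempdir())
    print(f"sync write latency: {latency:.6f} s")
```

Run periodically against the build directory's filesystem, this would give a timestamped latency series to line up with the iostat data.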
6. Switch the pub/non-release links from full log to summary.
The host data links on:
https://autobuilder.yocto.io/pub/non-release/
should include links to the summary data. I think we have room to
include both links like this:
0 1 2 3 (Full: 0 1 2 3 )
../Randy