From: "Randy MacLeod" <randy.macleod@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
Sakib Sajal <sakib.sajal@windriver.com>,
yocto@lists.yoctoproject.org, "Wold, Saul" <saul.wold@windriver.com>
Subject: Re: [yocto] [PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template -- Build: 20210420-1
Date: Tue, 20 Apr 2021 17:50:47 -0400 [thread overview]
Message-ID: <ae8693af-6b2d-0a0e-155b-2e4468a340b5@windriver.com> (raw)
In-Reply-To: <ee652d79-8c00-0606-bf6c-a3fc793b4a8b@windriver.com>
[-- Attachment #1: Type: text/plain, Size: 7798 bytes --]
Summary:
Build: 20210420-1 had 23 triggers and isn't really worth
analyzing. Stay tuned for what is hopefully better data
with a higher timeout threshold.
There is a nice-looking pair of graphs attached though! :)
../Randy
On 2021-04-15 4:48 p.m., Randy MacLeod wrote:
> On 2021-04-15 1:55 p.m., Randy MacLeod wrote:
> Should we increase the interval to 30, 60, or more seconds?
I've bumped the interval to 60 seconds.
I spent some time looking at the first bit of data along with
Sakib and Saul from time to time.
> General conclusions:
>
> 1. It seems like ALL triggers involve oe-selftest being active.
Still true.
>
> 2. xz might be a problem but we're not sure yet.
Added to graph attached.
>
> 3. We need more data and tools and time to think about it.
> To Do:
>
> 1. increase top cmdline length from 512 to 16K
Still to do, maybe. It's not clear that it's useful yet.
>
> 2. sometimes we see:
>
> Command '['oe-time-dd-test.sh', '100']' timed out after 10.0 seconds
>
> That should not happen so we should understand why and either increase
> the time between runs or fix the tooling. This seems to happen under load
> so it's hiding the interesting data that we are looking for!
With a 60 second interval, this does NOT happen.
>
> 3. tail the cooker console in addition to top. Present that before top.
>
> It would be nice to have a top equivalent for bitbake.
Still to do.
This can be done via the cooker log.
We can parse the file and do:
1. list recent tasks regardless of whether they have completed:
tail -20?
2. list tasks that have started but not completed.
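As a sketch of item 2, the started-but-not-completed tasks could be pulled out of the console log with a little awk. The log line format below is simplified and the sample file is synthetic, so the match patterns would need adjusting against a real cooker console:

```shell
# Sketch only: list tasks that have started but not completed.
# Assumes (simplified) cooker console lines ending in ": Started" /
# ": Succeeded"; the sample log below is synthetic.
cat > cooker-sample.log <<'EOF'
NOTE: recipe xz-5.2.5-r0: task do_compile: Started
NOTE: recipe m4-1.4.18-r0: task do_configure: Started
NOTE: recipe xz-5.2.5-r0: task do_compile: Succeeded
EOF

pending=$(awk '
  /: Started$/   { sub(/: Started$/, "");   started[$0]++ }
  /: Succeeded$/ { sub(/: Succeeded$/, ""); finished[$0]++ }
  END { for (t in started) if (started[t] > finished[t]) print t }
' cooker-sample.log)
echo "$pending"
```

For the sample above this prints only the m4 do_configure line, since its Started has no matching Succeeded.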
This email is about:
https://autobuilder.yocto.io/pub/non-release/20210420-1/
It was a 'quick' build; there was only 1 log file produced.
It used the master branch, so the timeout was still 5 seconds.
There were 23 (!!) times that the dd time exceeded the
15 (ACTUALLY 5) second limit out of a total of 1504 invocations
and those triggers were captured by 1 log file:
testresults/qemux86-world/2021-04-20--03-43/host_stats_0_top.txt
Splitting each top output into a separate file as before,
now we have 23 log files:
How big are these files, i.e. how many process/kernel threads were
running when top ran?
$ wc -l host-stats-0* | sort -n
699 host-stats-03:42:40--23.log
704 host-stats-02:20:41--5.log
720 host-stats-03:33:41--21.log
733 host-stats-02:19:40--4.log
743 host-stats-03:32:41--20.log
752 host-stats-02:25:45--6.log
753 host-stats-02:02:46--3.log
760 host-stats-03:36:07--22.log
784 host-stats-02:01:48--2.log
784 host-stats-03:30:43--18.log
802 host-stats-02:45:55--10.log
807 host-stats-03:31:42--19.log
816 host-stats-03:29:49--17.log
829 host-stats-03:19:50--15.log
845 host-stats-02:41:40--8.log
851 host-stats-02:00:57--1.log
899 host-stats-02:32:47--7.log
906 host-stats-03:18:41--14.log
925 host-stats-03:28:24--16.log
947 host-stats-02:42:41--9.log
1084 host-stats-03:16:50--13.log
1204 host-stats-02:54:43--11.log
1314 host-stats-03:03:41--12.log
19661 total
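For reference, the splitting step itself only takes a few lines of awk, keyed on the "top - HH:MM:SS" header line that starts each snapshot. The input here is a synthetic two-snapshot sample standing in for the real host_stats_0_top.txt:

```shell
# Sketch: split a combined host_stats top capture into one file per
# snapshot. Each snapshot is assumed to start with a "top - HH:MM:SS"
# line; the input below is a synthetic two-snapshot sample.
cat > host_stats_0_top.txt <<'EOF'
top - 02:00:57 up 10 days,  load average: 80.10, 75.30, 60.20
Tasks: 851 total
top - 02:01:48 up 10 days,  load average: 82.40, 76.00, 61.10
Tasks: 784 total
EOF

awk '/^top - / { n++; file = "host-stats-" $3 "--" n ".log" }
     file      { print > file }' host_stats_0_top.txt

wc -l host-stats-0*
```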
I noticed that several but not all log files were running xz:
$ for i in host-stats-*; do echo -n $i ": "; grep "xz " $i | wc -l; done
host-stats-02:00:57--1.log : 0
host-stats-02:01:48--2.log : 0
host-stats-02:02:46--3.log : 0
host-stats-02:19:40--4.log : 2
host-stats-02:20:41--5.log : 0
host-stats-02:25:45--6.log : 2
host-stats-02:32:47--7.log : 0
host-stats-02:41:40--8.log : 1
host-stats-02:42:41--9.log : 2
host-stats-02:45:55--10.log : 1
host-stats-02:54:43--11.log : 0
host-stats-03:03:41--12.log : 0
host-stats-03:16:50--13.log : 47
host-stats-03:18:41--14.log : 0
host-stats-03:19:50--15.log : 9
host-stats-03:28:24--16.log : 15
host-stats-03:29:49--17.log : 0
host-stats-03:30:43--18.log : 0
host-stats-03:31:42--19.log : 7
host-stats-03:32:41--20.log : 7
host-stats-03:33:41--21.log : 11
host-stats-03:36:07--22.log : 15
host-stats-03:42:40--23.log : 9
I've plotted the number of xz processes along with the load average
and the dd time in the attached graph.
All of the top output logs seem to be running oe-selftest:
for i in host-stats-0*; do grep -H -c "DISPLAY.*oe-selftest " $i ; done
host-stats-02:00:57--1.log:1
host-stats-02:01:48--2.log:1
host-stats-02:02:46--3.log:1
host-stats-02:19:40--4.log:1
host-stats-02:20:41--5.log:1
host-stats-02:25:45--6.log:1
host-stats-02:32:47--7.log:1
host-stats-02:41:40--8.log:2
host-stats-02:42:41--9.log:2
host-stats-02:45:55--10.log:2
host-stats-02:54:43--11.log:2
host-stats-03:03:41--12.log:2
host-stats-03:16:50--13.log:2
host-stats-03:18:41--14.log:2
host-stats-03:19:50--15.log:2
host-stats-03:28:24--16.log:2
host-stats-03:29:49--17.log:2
host-stats-03:30:43--18.log:2
host-stats-03:31:42--19.log:2
host-stats-03:32:41--20.log:2
host-stats-03:33:41--21.log:2
host-stats-03:36:07--22.log:2
host-stats-03:42:40--23.log:2
The logs DO seem to be clustered, in that there are several adjacent
1-minute intervals where the dd time exceeded the threshold.
The data is below, but you can also see this in the attached graph.
$ for i in host-stats-*; do echo -n $i ": "; head -1 $i | cut -c -15; done
host-stats-02:00:57--1.log : top - 02:00:57
host-stats-02:01:48--2.log : top - 02:01:48
host-stats-02:02:46--3.log : top - 02:02:46
host-stats-02:19:40--4.log : top - 02:19:40
host-stats-02:20:41--5.log : top - 02:20:41
host-stats-02:25:45--6.log : top - 02:25:45
host-stats-02:32:47--7.log : top - 02:32:47
host-stats-02:41:40--8.log : top - 02:41:40
host-stats-02:42:41--9.log : top - 02:42:41
host-stats-02:45:55--10.log : top - 02:45:55
host-stats-02:54:43--11.log : top - 02:54:43
host-stats-03:03:41--12.log : top - 03:03:41
host-stats-03:16:50--13.log : top - 03:16:50
host-stats-03:18:41--14.log : top - 03:18:41
host-stats-03:19:50--15.log : top - 03:19:50
host-stats-03:28:24--16.log : top - 03:28:24
host-stats-03:29:49--17.log : top - 03:29:49
host-stats-03:30:43--18.log : top - 03:30:43
host-stats-03:31:42--19.log : top - 03:31:42
host-stats-03:32:41--20.log : top - 03:32:41
host-stats-03:33:41--21.log : top - 03:33:41
host-stats-03:36:07--22.log : top - 03:36:07
host-stats-03:42:40--23.log : top - 03:42:40
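The clustering is easy to quantify from those timestamps: a throwaway awk sketch over the first five of them prints the gap between consecutive triggers, showing back-to-back ~1 minute intervals separated by long quiet stretches:

```shell
# Gap in seconds between consecutive triggers, for the first five
# timestamps listed above.
gaps=$(printf '%s\n' 02:00:57 02:01:48 02:02:46 02:19:40 02:20:41 |
  awk -F: '{ t = $1*3600 + $2*60 + $3
             if (NR > 1) print t - prev
             prev = t }')
echo "$gaps"
```

For these five triggers the gaps come out as 51, 58, 1014 and 61 seconds: two clusters of roughly one-minute spacing with a 17-minute quiet gap between them.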
*** Add time distribution of dd.
Oddly enough, the output of dd does not seem to be consistent:
102400 bytes (102 kB) copied, 0.0163975 s, 6.2 MB/s
102400 bytes (102 kB) copied, 0.0054732 s, 18.7 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.0984687 s, 1.0 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.304233 s, 337 kB/s
I'm not sure what's going on there, yet. Sakib?
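One quick check (mine, not from the build logs): the rate dd reports in each of those lines does equal bytes divided by elapsed time, so the numbers are at least internally consistent and the real variation is in the elapsed time itself. The "(102 kB)" vs "(102 kB, 100 KiB)" difference looks like nothing more than different coreutils versions formatting the size differently.

```shell
# Recompute the throughput for the four elapsed times quoted above:
# 102400 bytes / t seconds, in decimal MB/s as dd reports it.
rates=$(for t in 0.0163975 0.0054732 0.0984687 0.304233; do
  LC_ALL=C awk -v t="$t" 'BEGIN { printf "%.1f MB/s\n", 102400 / t / 1e6 }'
done)
echo "$rates"
```

This reproduces 6.2, 18.7, 1.0 and 0.3 MB/s, matching the four quoted lines (337 kB/s rounds to 0.3 MB/s).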
I think the cooker output will be quite useful.
I'm not going to put any more time into this data set
since the time threshold is still 5 seconds.
All for now.
../Randy
>
>>
>> ../Randy
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Richard
>>>>
>>>> The template is not used anywhere yet; the initial patchset
>>>> enables the data collection by default.
>>>>
>>>> I have left the template in case the data collection is removed from
>>>> the defaults and needs to be used on a case-by-case basis.
>>>>
>>>> I am not entirely sure whether nesting templates works. I have not
>>>> seen any examples of it, nor did I try it myself. If nesting does
>>>> work, the template should be useful.
>>>
>>> I had a quick look at the code and sadly, it doesn't appear I
>>> implemented
>>> nesting so this wouldn't be that useful as things stand.
>>>
>>> Cheers,
>>>
>>> Richard
--
# Randy MacLeod
# Wind River Linux
[-- Attachment #2: time-dd-latency-distribution-20210420-1.png --]
[-- Type: image/png, Size: 8344 bytes --]
[-- Attachment #3: load-average-dd-time-2021-04-20--1.png --]
[-- Type: image/png, Size: 17155 bytes --]
Thread overview: 13+ messages
2021-04-13 17:02 [PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template sakib.sajal
2021-04-13 17:02 ` [PATCH yocto-autobuilder-helper 2/4] config.json: collect data by default sakib.sajal
2021-04-16 8:28 ` [yocto] " Richard Purdie
[not found] ` <1676495329CC3E3C.31742@lists.yoctoproject.org>
2021-04-16 9:05 ` Richard Purdie
2021-04-13 17:02 ` [PATCH yocto-autobuilder-helper 3/4] collect-results: collect "host_stats" files generated by collect-data template sakib.sajal
2021-04-13 17:02 ` [PATCH yocto-autobuilder-helper 4/4] generate-testresult-index.py: publish host data collected " sakib.sajal
2021-04-15 13:52 ` [yocto] [PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template Richard Purdie
2021-04-15 15:31 ` sakib.sajal
2021-04-15 15:55 ` Richard Purdie
2021-04-15 17:55 ` Randy MacLeod
2021-04-15 20:48 ` Randy MacLeod
2021-04-16 21:35 ` Randy MacLeod
2021-04-20 21:50 ` Randy MacLeod [this message]