All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Randy MacLeod" <randy.macleod@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
	Sakib Sajal <sakib.sajal@windriver.com>,
	yocto@lists.yoctoproject.org, "Wold,
	Saul" <saul.wold@windriver.com>
Subject: Re: [yocto] [PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template -- Build: 20210420-1
Date: Tue, 20 Apr 2021 17:50:47 -0400	[thread overview]
Message-ID: <ae8693af-6b2d-0a0e-155b-2e4468a340b5@windriver.com> (raw)
In-Reply-To: <ee652d79-8c00-0606-bf6c-a3fc793b4a8b@windriver.com>

[-- Attachment #1: Type: text/plain, Size: 7798 bytes --]

Summary:

Build: 20210420-1 had 23 triggers and isn't really worth
analyzing. Stay tuned for what is hopefully better data
with a higher timeout threshold.

There is a nice looking pair of graph attached though! :)

../Randy

On 2021-04-15 4:48 p.m., Randy MacLeod wrote:
> On 2021-04-15 1:55 p.m., Randy MacLeod wrote:


 > Should we  increase the interval to 30, 60, ore more seconds?

I've bumped the interval to 60 seconds.


I spent some time looking at the first bit of data along with
Sakib and Saul from time to time.
> General conclusions:
> 
> 1. It seems like ALL triggers involve oe-selftest being active.

Still true.
> 
> 2. xz might be a problem but we're not sure yet. 

Added to graph attached.

> 
> 3. We need more data and tools and time to think about it.



> To Do:
> 
> 1. increase top cmdline length from 512 to  16K

Still to do, maybe. It's not clear that it's useful yet.

> 
> 2. sometimes we see:
> 
>      Command '['oe-time-dd-test.sh', '100']' timed out after 10.0 seconds
> 
> That should not happen so we should understand why and either increase
> the time between runs or fix the tooling. This seems to happen under load
> so it's hiding the interesting data that we are looking for!

With a 60 second interval, this does NOT happen.

> 
> 3. tail the cooker console in addition to top. Present that before top.
> 
>     It would be nice to have a top equivalent for bitbake.

Still to do.
This can be done via the cooker log.
We can parse the file and do:
1. list recent tasks regardless of whether they have completed:
       tail -20?
2. list tasks that have started but not completed.



This email is about:
    https://autobuilder.yocto.io/pub/non-release/20210420-1/
It was a 'quick' build. There was only 1 log file produced.
It used the master so the timeout was still 5 seconds.

There were 23 (!!) times that the dd time exceeded the
15 (ACTUALLY 5) second limit out of a total of 1504 invocations
and those triggers were captured by 1 log file:
    testresults/qemux86-world/2021-04-20--03-43/host_stats_0_top.txt


Splitting each top output into a separate file as before,
now we have 23 log files:

How big are these files, ie how many process/kernel threads were running
when top ran?

$ wc -l host-stats-0* | sort -n
     699 host-stats-03:42:40--23.log
     704 host-stats-02:20:41--5.log
     720 host-stats-03:33:41--21.log
     733 host-stats-02:19:40--4.log
     743 host-stats-03:32:41--20.log
     752 host-stats-02:25:45--6.log
     753 host-stats-02:02:46--3.log
     760 host-stats-03:36:07--22.log
     784 host-stats-02:01:48--2.log
     784 host-stats-03:30:43--18.log
     802 host-stats-02:45:55--10.log
     807 host-stats-03:31:42--19.log
     816 host-stats-03:29:49--17.log
     829 host-stats-03:19:50--15.log
     845 host-stats-02:41:40--8.log
     851 host-stats-02:00:57--1.log
     899 host-stats-02:32:47--7.log
     906 host-stats-03:18:41--14.log
     925 host-stats-03:28:24--16.log
     947 host-stats-02:42:41--9.log
    1084 host-stats-03:16:50--13.log
    1204 host-stats-02:54:43--11.log
    1314 host-stats-03:03:41--12.log
   19661 total



I noticed that several but not all log files were running xz:

$ for i in `ls host-stats-*`; do echo -n $i ": "; grep "xz " $i | wc -l; 
done
host-stats-02:00:57--1.log : 0
host-stats-02:01:48--2.log : 0
host-stats-02:02:46--3.log : 0
host-stats-02:19:40--4.log : 2
host-stats-02:20:41--5.log : 0
host-stats-02:25:45--6.log : 2
host-stats-02:32:47--7.log : 0
host-stats-02:41:40--8.log : 1
host-stats-02:42:41--9.log : 2
host-stats-02:45:55--10.log : 1
host-stats-02:54:43--11.log : 0
host-stats-03:03:41--12.log : 0
host-stats-03:16:50--13.log : 47
host-stats-03:18:41--14.log : 0
host-stats-03:19:50--15.log : 9
host-stats-03:28:24--16.log : 15
host-stats-03:29:49--17.log : 0
host-stats-03:30:43--18.log : 0
host-stats-03:31:42--19.log : 7
host-stats-03:32:41--20.log : 7
host-stats-03:33:41--21.log : 11
host-stats-03:36:07--22.log : 15
host-stats-03:42:40--23.log : 9


I've plotted the number of xz processes along with the load average
and the dd time  in the attached graph.


All of the top output logs seems to be running oe-selftest:

for i in host-stats-0*; do grep -H -c "DISPLAY.*oe-selftest " $i ; done
host-stats-02:00:57--1.log:1
host-stats-02:01:48--2.log:1
host-stats-02:02:46--3.log:1
host-stats-02:19:40--4.log:1
host-stats-02:20:41--5.log:1
host-stats-02:25:45--6.log:1
host-stats-02:32:47--7.log:1
host-stats-02:41:40--8.log:2
host-stats-02:42:41--9.log:2
host-stats-02:45:55--10.log:2
host-stats-02:54:43--11.log:2
host-stats-03:03:41--12.log:2
host-stats-03:16:50--13.log:2
host-stats-03:18:41--14.log:2
host-stats-03:19:50--15.log:2
host-stats-03:28:24--16.log:2
host-stats-03:29:49--17.log:2
host-stats-03:30:43--18.log:2
host-stats-03:31:42--19.log:2
host-stats-03:32:41--20.log:2
host-stats-03:33:41--21.log:2
host-stats-03:36:07--22.log:2
host-stats-03:42:40--23.log:2



The logs DO seem to be clustered in that there are several 1 minute 
adjacent intervals where the dd time exceeded the threshold.

Data here but you can see this in the attached graph.

$ for i in host-stats-*; do echo -n $i ": "; head -1 $i | cut -c -15; done
host-stats-02:00:57--1.log : top - 02:00:57
host-stats-02:01:48--2.log : top - 02:01:48
host-stats-02:02:46--3.log : top - 02:02:46
host-stats-02:19:40--4.log : top - 02:19:40
host-stats-02:20:41--5.log : top - 02:20:41
host-stats-02:25:45--6.log : top - 02:25:45
host-stats-02:32:47--7.log : top - 02:32:47
host-stats-02:41:40--8.log : top - 02:41:40
host-stats-02:42:41--9.log : top - 02:42:41
host-stats-02:45:55--10.log : top - 02:45:55
host-stats-02:54:43--11.log : top - 02:54:43
host-stats-03:03:41--12.log : top - 03:03:41
host-stats-03:16:50--13.log : top - 03:16:50
host-stats-03:18:41--14.log : top - 03:18:41
host-stats-03:19:50--15.log : top - 03:19:50
host-stats-03:28:24--16.log : top - 03:28:24
host-stats-03:29:49--17.log : top - 03:29:49
host-stats-03:30:43--18.log : top - 03:30:43
host-stats-03:31:42--19.log : top - 03:31:42
host-stats-03:32:41--20.log : top - 03:32:41
host-stats-03:33:41--21.log : top - 03:33:41
host-stats-03:36:07--22.log : top - 03:36:07
host-stats-03:42:40--23.log : top - 03:42:40



*** Add time distribution of dd.
Oddly enough, the output of dd does not seem to be consistent:

102400 bytes (102 kB) copied, 0.0163975 s, 6.2 MB/s
102400 bytes (102 kB) copied, 0.0054732 s, 18.7 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.0984687 s, 1.0 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.304233 s, 337 kB/s

I'm not sure what's going on there, yet. Sakib?


I think the cooker output will be quite useful.
I'm not going to put any more time into this data set
since the time threshold is still 5 seconds.

All for now.

../Randy


>
>>
>> ../Randy
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Richard
>>>>
>>>> The template is not used anywhere, yet, the initial patchset 
>>>> enables the
>>>> data collection by default.
>>>>
>>>> I have left the template in case the data collection is removed from
>>>> defaults and need to be used on a case by case basis.
>>>>
>>>> I am not entirely sure if nesting templates work. I have not seen any
>>>> examples of it, neither did i try it myself. If nesting does work, the
>>>> template should be useful.
>>>
>>> I had a quick look at the code and sadly, it doesn't appear I 
>>> implemented
>>> nesting so this wouldn't be that useful as things stand.
>>>
>>> Cheers,
>>>
>>> Richard
>>>
>>>
>>>
>>>
>>> 
>>>
>>
>>
>
>

-- 
# Randy MacLeod
# Wind River Linux


[-- Attachment #2: time-dd-latency-distribution-20210420-1.png --]
[-- Type: image/png, Size: 8344 bytes --]

[-- Attachment #3: load-average-dd-time-2021-04-20--1.png --]
[-- Type: image/png, Size: 17155 bytes --]

      parent reply	other threads:[~2021-04-20 21:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-13 17:02 [PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template sakib.sajal
2021-04-13 17:02 ` [PATCH yocto-autobuilder-helper 2/4] config.json: collect data by default sakib.sajal
2021-04-16  8:28   ` [yocto] " Richard Purdie
     [not found]   ` <1676495329CC3E3C.31742@lists.yoctoproject.org>
2021-04-16  9:05     ` Richard Purdie
2021-04-13 17:02 ` [PATCH yocto-autobuilder-helper 3/4] collect-results: collect "host_stats" files generated by collect-data template sakib.sajal
2021-04-13 17:02 ` [PATCH yocto-autobuilder-helper 4/4] generate-testresult-index.py: publish host data collected " sakib.sajal
2021-04-15 13:52 ` [yocto] [PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template Richard Purdie
2021-04-15 15:31   ` sakib.sajal
2021-04-15 15:55     ` Richard Purdie
2021-04-15 17:55       ` Randy MacLeod
2021-04-15 20:48         ` Randy MacLeod
2021-04-16 21:35           ` Randy MacLeod
2021-04-20 21:50           ` Randy MacLeod [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae8693af-6b2d-0a0e-155b-2e4468a340b5@windriver.com \
    --to=randy.macleod@windriver.com \
    --cc=richard.purdie@linuxfoundation.org \
    --cc=sakib.sajal@windriver.com \
    --cc=saul.wold@windriver.com \
    --cc=yocto@lists.yoctoproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.