u-boot.lists.denx.de archive mirror
* Strange gitlab idea
@ 2023-08-13  3:14 Simon Glass
  2023-08-13 15:52 ` Tom Rini
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Glass @ 2023-08-13  3:14 UTC (permalink / raw)
  To: U-Boot Mailing List; +Cc: Tom Rini

Hi Tom,

I notice that the runners are not utilised much by the QEMU jobs,
since we only run one at a time.

I wonder if we could improve this, perhaps by using a different tag
for the QEMU ones and then having a machine that only runs those (and
runs 40 in parallel)?
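
For what it's worth, a sketch of how the tag idea might look in
.gitlab-ci.yml (job names, tag names and commands here are invented for
illustration, not the actual CI config):

```yaml
# Hypothetical sketch only: tag and job names are invented, not the real
# U-Boot .gitlab-ci.yml. QEMU jobs get their own tag, so a dedicated
# machine (registered with that tag and a high concurrency) runs them.
qemu_arm test.py:
  stage: test.py
  tags:
    - qemu-farm            # only the QEMU machine carries this tag
  script:
    - ./test/py/test.py --bd qemu_arm

world build arm:
  stage: world build
  tags:
    - world                # general-purpose build runners keep this tag
  script:
    - ./tools/buildman/buildman -o /tmp/world arm
```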

In general our use of the runners seems a bit primitive, since the
main use of parallelism is in the world builds.

Regards,
Simon

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange gitlab idea
  2023-08-13  3:14 Strange gitlab idea Simon Glass
@ 2023-08-13 15:52 ` Tom Rini
  2023-08-15 14:44   ` Simon Glass
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Rini @ 2023-08-13 15:52 UTC (permalink / raw)
  To: Simon Glass; +Cc: U-Boot Mailing List


On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:

> Hi Tom,
> 
> I notice that the runners are not utilised much by the QEMU jobs,
> since we only run one at a time.
> 
> I wonder if we could improve this, perhaps by using a different tag
> for the QEMU ones and then having a machine that only runs those (and
> runs 40 in parallel)?
> 
> In general our use of the runners seems a bit primitive, since the
> main use of parallelism is in the world builds.

I'm honestly not sure. I think there are a few tweaks that we should
make, like putting the opensbi and coreboot files into the Dockerfile
logic instead. And maybe, just as we have a docker registry cache, we
could see if we can set up a local pypi cache too? I'm not otherwise
sure what's taking the 23 seconds or so of
https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
and run parts aren't much.

My first big worry about running 2 or 3 qemu jobs at the same time on a
host is that any wins we get from a shorter queue will be lost to
buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
slower.
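
One possible mitigation, purely as a sketch (nothing in buildman does
this automatically, as far as this thread goes), would be to divide the
host's cores among the concurrent jobs rather than letting each one use
the full -j$(nproc):

```shell
#!/bin/sh
# Sketch: split the host's cores across N concurrent jobs instead of
# letting each job run "make -j$(nproc)". N=3 is an assumed job count.
N=3
CORES=$(nproc)
JOBS=$(( (CORES + N - 1) / N ))   # ceiling division, so JOBS is at least 1
echo "each of the $N jobs would build with make -j$JOBS"
```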

My second big worry is that getting the right tags on runners will be a
little tricky.

My third big worry (but this is something you can test easily enough at
least) is that running the big sandbox tests 2 or 3 times at once on
the same host will get much slower. I think, but profiling would be
helpful, that those get slow due to I/O and not CPU.

-- 
Tom



* Re: Strange gitlab idea
  2023-08-13 15:52 ` Tom Rini
@ 2023-08-15 14:44   ` Simon Glass
  2023-08-15 14:56     ` Tom Rini
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Glass @ 2023-08-15 14:44 UTC (permalink / raw)
  To: Tom Rini; +Cc: U-Boot Mailing List

Hi Tom,

On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
>
> On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
>
> > Hi Tom,
> >
> > I notice that the runners are not utilised much by the QEMU jobs,
> > since we only run one at a time.
> >
> > I wonder if we could improve this, perhaps by using a different tag
> > for the QEMU ones and then having a machine that only runs those (and
> > runs 40 in parallel)?
> >
> > In general our use of the runners seems a bit primitive, since the
> > main use of parallelism is in the world builds.
>
> I'm honestly not sure. I think there are a few tweaks that we should
> make, like putting the opensbi and coreboot files into the Dockerfile
> logic instead. And maybe, just as we have a docker registry cache, we
> could see if we can set up a local pypi cache too? I'm not otherwise
> sure what's taking the 23 seconds or so of
> https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> and run parts aren't much.
>
> My first big worry about running 2 or 3 qemu jobs at the same time on a
> host is that any wins we get from a shorter queue will be lost to
> buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> slower.

Yes, perhaps.

>
> My second big worry is that getting the right tags on runners will be a
> little tricky.

Yes, and error-prone. Also it makes it harder to deal with broken machines.

>
> My third big worry (but this is something you can test easily enough at
> least) is that running the big sandbox tests 2 or 3 times at once on
> the same host will get much slower. I think, but profiling would be
> helpful, that those get slow due to I/O and not CPU.

I suspect it would be fast enough.

But actually the other problem is that I am not sure whether the jobs
would have their own filesystem?

Regards,
Simon


* Re: Strange gitlab idea
  2023-08-15 14:44   ` Simon Glass
@ 2023-08-15 14:56     ` Tom Rini
  2023-08-17 13:41       ` Simon Glass
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Rini @ 2023-08-15 14:56 UTC (permalink / raw)
  To: Simon Glass; +Cc: U-Boot Mailing List


On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> >
> > > Hi Tom,
> > >
> > > I notice that the runners are not utilised much by the QEMU jobs,
> > > since we only run one at a time.
> > >
> > > I wonder if we could improve this, perhaps by using a different tag
> > > for the QEMU ones and then having a machine that only runs those (and
> > > runs 40 in parallel)?
> > >
> > > In general our use of the runners seems a bit primitive, since the
> > > main use of parallelism is in the world builds.
> >
> > I'm honestly not sure. I think there are a few tweaks that we should
> > make, like putting the opensbi and coreboot files into the Dockerfile
> > logic instead. And maybe, just as we have a docker registry cache, we
> > could see if we can set up a local pypi cache too? I'm not otherwise
> > sure what's taking the 23 seconds or so of
> > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > and run parts aren't much.
> >
> > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > host is that any wins we get from a shorter queue will be lost to
> > buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> > slower.
> 
> Yes, perhaps.
> 
> >
> > My second big worry is that getting the right tags on runners will be a
> > little tricky.
> 
> Yes, and error-prone. Also it makes it harder to deal with broken machines.
> 
> >
> > My third big worry (but this is something you can test easily enough at
> > least) is that running the big sandbox tests 2 or 3 times at once on
> > the same host will get much slower. I think, but profiling would be
> > helpful, that those get slow due to I/O and not CPU.
> 
> I suspect it would be fast enough.
> 
> But actually the other problem is that I am not sure whether the jobs
> would have their own filesystem?

Yes, they should be properly sandboxed. If you want to test some of
these ideas, I think the best path is to temporarily un-register some
of your runners (comment out the token in config.toml) and then
register them with just the DM tree and experiment.
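
For reference, the change being described lives in the runner's
config.toml; a sketch with illustrative names and values (the real file
also carries more executor detail):

```toml
# /etc/gitlab-runner/config.toml (sketch; names and values illustrative)
concurrent = 10            # allow this host to run up to 10 jobs at once

[[runners]]
  name = "tui"
  url = "https://source.denx.de/"
  # token = "..."          # commenting the token out takes the runner
  #                        # out of service until it is re-registered
  executor = "docker"
```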

-- 
Tom



* Re: Strange gitlab idea
  2023-08-15 14:56     ` Tom Rini
@ 2023-08-17 13:41       ` Simon Glass
  2023-08-17 14:28         ` Simon Glass
  2023-08-17 15:10         ` Tom Rini
  0 siblings, 2 replies; 10+ messages in thread
From: Simon Glass @ 2023-08-17 13:41 UTC (permalink / raw)
  To: Tom Rini; +Cc: U-Boot Mailing List

Hi Tom,

On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini@konsulko.com> wrote:
>
> On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> > >
> > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > since we only run one at a time.
> > > >
> > > > I wonder if we could improve this, perhaps by using a different tag
> > > > for the QEMU ones and then having a machine that only runs those (and
> > > > runs 40 in parallel)?
> > > >
> > > > In general our use of the runners seems a bit primitive, since the
> > > > main use of parallelism is in the world builds.
> > >
> > > I'm honestly not sure. I think there are a few tweaks that we should
> > > make, like putting the opensbi and coreboot files into the Dockerfile
> > > logic instead. And maybe, just as we have a docker registry cache, we
> > > could see if we can set up a local pypi cache too? I'm not otherwise
> > > sure what's taking the 23 seconds or so of
> > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > and run parts aren't much.
> > >
> > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > host is that any wins we get from a shorter queue will be lost to
> > > buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> > > slower.
> >
> > Yes, perhaps.
> >
> > >
> > > My second big worry is that getting the right tags on runners will be a
> > > little tricky.
> >
> > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> >
> > >
> > > My third big worry (but this is something you can test easily enough at
> > > least) is that running the big sandbox tests 2 or 3 times at once on
> > > the same host will get much slower. I think, but profiling would be
> > > helpful, that those get slow due to I/O and not CPU.
> >
> > I suspect it would be fast enough.
> >
> > But actually the other problem is that I am not sure whether the jobs
> > would have their own filesystem?
>
> > Yes, they should be properly sandboxed. If you want to test some of
> > these ideas, I think the best path is to temporarily un-register some
> > of your runners (comment out the token in config.toml) and then
> > register them with just the DM tree and experiment.

OK, thanks for the idea. I tried this on tui.

I used 'concurrent = 10' and it got up to a load of 70 or so every
now and then, but mostly it was much less.

The whole run (of just the test.py stage) took 8 minutes, with
'sandbox with clang test' taking the longest.

I'm not too sure what that tells us...

Regards,
Simon


* Re: Strange gitlab idea
  2023-08-17 13:41       ` Simon Glass
@ 2023-08-17 14:28         ` Simon Glass
  2023-08-17 15:10         ` Tom Rini
  1 sibling, 0 replies; 10+ messages in thread
From: Simon Glass @ 2023-08-17 14:28 UTC (permalink / raw)
  To: Tom Rini; +Cc: U-Boot Mailing List

Hi Tom,

On Thu, 17 Aug 2023 at 07:41, Simon Glass <sjg@chromium.org> wrote:
>
> Hi Tom,
>
> On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> > > >
> > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > >
> > > > > Hi Tom,
> > > > >
> > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > since we only run one at a time.
> > > > >
> > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > runs 40 in parallel)?
> > > > >
> > > > > In general our use of the runners seems a bit primitive, since the
> > > > > main use of parallelism is in the world builds.
> > > >
> > > > I'm honestly not sure. I think there are a few tweaks that we should
> > > > make, like putting the opensbi and coreboot files into the Dockerfile
> > > > logic instead. And maybe, just as we have a docker registry cache, we
> > > > could see if we can set up a local pypi cache too? I'm not otherwise
> > > > sure what's taking the 23 seconds or so of
> > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > and run parts aren't much.
> > > >
> > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > host is that any wins we get from a shorter queue will be lost to
> > > > buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> > > > slower.
> > >
> > > Yes, perhaps.
> > >
> > > >
> > > > My second big worry is that getting the right tags on runners will be a
> > > > little tricky.
> > >
> > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > >
> > > >
> > > > My third big worry (but this is something you can test easily enough at
> > > > least) is that running the big sandbox tests 2 or 3 times at once on
> > > > the same host will get much slower. I think, but profiling would be
> > > > helpful, that those get slow due to I/O and not CPU.
> > >
> > > I suspect it would be fast enough.
> > >
> > > But actually the other problem is that I am not sure whether the jobs
> > > would have their own filesystem?
> >
> > > Yes, they should be properly sandboxed. If you want to test some of
> > > these ideas, I think the best path is to temporarily un-register some
> > > of your runners (comment out the token in config.toml) and then
> > > register them with just the DM tree and experiment.
>
> OK, thanks for the idea. I tried this on tui.
>
> I used 'concurrent = 10' and it got up to a load of 70 or so every
> now and then, but mostly it was much less.
>
> The whole run (of just the test.py stage) took 8 minutes, with
> 'sandbox with clang test' taking the longest.
>
> I'm not too sure what that tells us...

After a bit more thought, perhaps we should:

- Give everything except the world builds a special tag like 'single',
meaning it is somewhat single-threaded
- Adjust some runners to have a second registration which only accepts
'single' jobs, with a concurrency of 10, say
- Consider running everything in a single stage

That might be easy to maintain?
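
On the runner side, that scheme might look roughly like this in
config.toml (names and limits are invented; the tags themselves would be
assigned at 'gitlab-runner register' time, since they live on the GitLab
server rather than in this file):

```toml
# Sketch: one host registered twice, once for world builds and once for
# the lighter 'single' jobs. All names and numbers are illustrative.
concurrent = 11              # global cap across both registrations

[[runners]]
  name = "host-a-world"      # registered with the world-build tags
  limit = 1                  # one world build at a time

[[runners]]
  name = "host-a-single"     # registered with the 'single' tag
  limit = 10                 # up to ten 'single' jobs concurrently
```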

Regards,
Simon


* Re: Strange gitlab idea
  2023-08-17 13:41       ` Simon Glass
  2023-08-17 14:28         ` Simon Glass
@ 2023-08-17 15:10         ` Tom Rini
  2023-08-17 16:58           ` Simon Glass
  1 sibling, 1 reply; 10+ messages in thread
From: Tom Rini @ 2023-08-17 15:10 UTC (permalink / raw)
  To: Simon Glass; +Cc: U-Boot Mailing List


On Thu, Aug 17, 2023 at 07:41:50AM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> > > >
> > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > >
> > > > > Hi Tom,
> > > > >
> > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > since we only run one at a time.
> > > > >
> > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > runs 40 in parallel)?
> > > > >
> > > > > In general our use of the runners seems a bit primitive, since the
> > > > > main use of parallelism is in the world builds.
> > > >
> > > > I'm honestly not sure. I think there are a few tweaks that we should
> > > > make, like putting the opensbi and coreboot files into the Dockerfile
> > > > logic instead. And maybe, just as we have a docker registry cache, we
> > > > could see if we can set up a local pypi cache too? I'm not otherwise
> > > > sure what's taking the 23 seconds or so of
> > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > and run parts aren't much.
> > > >
> > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > host is that any wins we get from a shorter queue will be lost to
> > > > buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> > > > slower.
> > >
> > > Yes, perhaps.
> > >
> > > >
> > > > My second big worry is that getting the right tags on runners will be a
> > > > little tricky.
> > >
> > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > >
> > > >
> > > > My third big worry (but this is something you can test easily enough at
> > > > least) is that running the big sandbox tests 2 or 3 times at once on
> > > > the same host will get much slower. I think, but profiling would be
> > > > helpful, that those get slow due to I/O and not CPU.
> > >
> > > I suspect it would be fast enough.
> > >
> > > But actually the other problem is that I am not sure whether the jobs
> > > would have their own filesystem?
> >
> > > Yes, they should be properly sandboxed. If you want to test some of
> > > these ideas, I think the best path is to temporarily un-register some
> > > of your runners (comment out the token in config.toml) and then
> > > register them with just the DM tree and experiment.
> 
> OK, thanks for the idea. I tried this on tui.
>
> I used 'concurrent = 10' and it got up to a load of 70 or so every
> now and then, but mostly it was much less.
> 
> The whole run (of just the test.py stage) took 8 minutes, with
> 'sandbox with clang test' taking the longest.
> 
> I'm not too sure what that tells us...

Well, looking at
https://source.denx.de/u-boot/u-boot/-/pipelines/17391/builds the whole
run took 56 minutes, of which 46 minutes was the 32-bit ARM world build.
And the longest test.py stage was sandbox without LTO at just under 8
minutes. So I think trying to get more concurrency in this stage is
likely to be a wash in terms of overall CI run time.

-- 
Tom



* Re: Strange gitlab idea
  2023-08-17 15:10         ` Tom Rini
@ 2023-08-17 16:58           ` Simon Glass
  2023-08-17 17:07             ` Tom Rini
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Glass @ 2023-08-17 16:58 UTC (permalink / raw)
  To: Tom Rini; +Cc: U-Boot Mailing List

Hi Tom,

On Thu, 17 Aug 2023 at 09:10, Tom Rini <trini@konsulko.com> wrote:
>
> On Thu, Aug 17, 2023 at 07:41:50AM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini@konsulko.com> wrote:
> > >
> > > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> > > > >
> > > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > > >
> > > > > > Hi Tom,
> > > > > >
> > > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > > since we only run one at a time.
> > > > > >
> > > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > > runs 40 in parallel)?
> > > > > >
> > > > > > In general our use of the runners seems a bit primitive, since the
> > > > > > main use of parallelism is in the world builds.
> > > > >
> > > > > I'm honestly not sure. I think there are a few tweaks that we should
> > > > > make, like putting the opensbi and coreboot files into the Dockerfile
> > > > > logic instead. And maybe, just as we have a docker registry cache, we
> > > > > could see if we can set up a local pypi cache too? I'm not otherwise
> > > > > sure what's taking the 23 seconds or so of
> > > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > > and run parts aren't much.
> > > > >
> > > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > > host is that any wins we get from a shorter queue will be lost to
> > > > > buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> > > > > slower.
> > > >
> > > > Yes, perhaps.
> > > >
> > > > >
> > > > > My second big worry is that getting the right tags on runners will be a
> > > > > little tricky.
> > > >
> > > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > > >
> > > > >
> > > > > My third big worry (but this is something you can test easily enough at
> > > > > least) is that running the big sandbox tests 2 or 3 times at once on
> > > > > the same host will get much slower. I think, but profiling would be
> > > > > helpful, that those get slow due to I/O and not CPU.
> > > >
> > > > I suspect it would be fast enough.
> > > >
> > > > But actually the other problem is that I am not sure whether the jobs
> > > > would have their own filesystem?
> > >
> > > Yes, they should be properly sandboxed. If you want to test some of
> > > these ideas, I think the best path is to temporarily un-register some
> > > of your runners (comment out the token in config.toml) and then
> > > register them with just the DM tree and experiment.
> >
> > OK, thanks for the idea. I tried this on tui.
> >
> > I used 'concurrent = 10' and it got up to a load of 70 or so every
> > now and then, but mostly it was much less.
> >
> > The whole run (of just the test.py stage) took 8 minutes, with
> > 'sandbox with clang test' taking the longest.
> >
> > I'm not too sure what that tells us...
>
> Well, looking at
> https://source.denx.de/u-boot/u-boot/-/pipelines/17391/builds the whole
> run took 56 minutes, of which 46 minutes was the 32-bit ARM world build.
> And the longest test.py stage was sandbox without LTO at just under 8
> minutes. So I think trying to get more concurrency in this stage is
> likely to be a wash in terms of overall CI run time.

There is quite a lot of variability. Two of the machines take about
15 mins to build 32-bit ARM and another two take under 20 mins, e.g.:

https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/676055

Perhaps we should reserve the big jobs for the fastest machines? But
then what if they all go offline at once?

Regards,
Simon


* Re: Strange gitlab idea
  2023-08-17 16:58           ` Simon Glass
@ 2023-08-17 17:07             ` Tom Rini
  2023-08-18  3:10               ` Simon Glass
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Rini @ 2023-08-17 17:07 UTC (permalink / raw)
  To: Simon Glass; +Cc: U-Boot Mailing List


On Thu, Aug 17, 2023 at 10:58:15AM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On Thu, 17 Aug 2023 at 09:10, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Thu, Aug 17, 2023 at 07:41:50AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini@konsulko.com> wrote:
> > > >
> > > > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > > > Hi Tom,
> > > > >
> > > > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> > > > > >
> > > > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > > > >
> > > > > > > Hi Tom,
> > > > > > >
> > > > > > > I notice that the runners are not utilised much by the QEMU jobs,
> > > > > > > since we only run one at a time.
> > > > > > >
> > > > > > > I wonder if we could improve this, perhaps by using a different tag
> > > > > > > for the QEMU ones and then having a machine that only runs those (and
> > > > > > > runs 40 in parallel)?
> > > > > > >
> > > > > > > In general our use of the runners seems a bit primitive, since the
> > > > > > > main use of parallelism is in the world builds.
> > > > > >
> > > > > > I'm honestly not sure. I think there are a few tweaks that we should
> > > > > > make, like putting the opensbi and coreboot files into the Dockerfile
> > > > > > logic instead. And maybe, just as we have a docker registry cache, we
> > > > > > could see if we can set up a local pypi cache too? I'm not otherwise
> > > > > > sure what's taking the 23 seconds or so of
> > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since the build
> > > > > > and run parts aren't much.
> > > > > >
> > > > > > My first big worry about running 2 or 3 qemu jobs at the same time on a
> > > > > > host is that any wins we get from a shorter queue will be lost to
> > > > > > buildman doing "make -j$(nproc)" 2 or 3 times at once, so we build
> > > > > > slower.
> > > > >
> > > > > Yes, perhaps.
> > > > >
> > > > > >
> > > > > > My second big worry is that getting the right tags on runners will be a
> > > > > > little tricky.
> > > > >
> > > > > Yes, and error-prone. Also it makes it harder to deal with broken machines.
> > > > >
> > > > > >
> > > > > > My third big worry (but this is something you can test easily enough at
> > > > > > least) is that running the big sandbox tests 2 or 3 times at once on
> > > > > > the same host will get much slower. I think, but profiling would be
> > > > > > helpful, that those get slow due to I/O and not CPU.
> > > > >
> > > > > I suspect it would be fast enough.
> > > > >
> > > > > But actually the other problem is that I am not sure whether the jobs
> > > > > would have their own filesystem?
> > > >
> > > > > Yes, they should be properly sandboxed. If you want to test some of
> > > > > these ideas, I think the best path is to temporarily un-register some
> > > > > of your runners (comment out the token in config.toml) and then
> > > > > register them with just the DM tree and experiment.
> > >
> > > OK, thanks for the idea. I tried this on tui.
> > >
> > > I used 'concurrent = 10' and it got up to a load of 70 or so every
> > > now and then, but mostly it was much less.
> > >
> > > The whole run (of just the test.py stage) took 8 minutes, with
> > > 'sandbox with clang test' taking the longest.
> > >
> > > I'm not too sure what that tells us...
> >
> > Well, looking at
> > https://source.denx.de/u-boot/u-boot/-/pipelines/17391/builds the whole
> > run took 56 minutes, of which 46 minutes was the 32-bit ARM world build.
> > And the longest test.py stage was sandbox without LTO at just under 8
> > minutes. So I think trying to get more concurrency in this stage is
> > likely to be a wash in terms of overall CI run time.
> 
> There is quite a lot of variability. Two of the machines take about
> 15 mins to build 32-bit ARM and another two take under 20 mins, e.g.:
> 
> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/676055
> 
> Perhaps we should reserve the big jobs for the fastest machines? But
> then what if they all go offline at once?

Barring some significant donation of resources, we are probably just
going to have to live with enough variation in build time that "about
an hour" is what we'll end up with. I see that overall the pipeline
the above example is from took 50 minutes.

-- 
Tom



* Re: Strange gitlab idea
  2023-08-17 17:07             ` Tom Rini
@ 2023-08-18  3:10               ` Simon Glass
  0 siblings, 0 replies; 10+ messages in thread
From: Simon Glass @ 2023-08-18  3:10 UTC (permalink / raw)
  To: Tom Rini; +Cc: U-Boot Mailing List

Hi Tom,

On Thu, 17 Aug 2023 at 11:07, Tom Rini <trini@konsulko.com> wrote:
>
> On Thu, Aug 17, 2023 at 10:58:15AM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Thu, 17 Aug 2023 at 09:10, Tom Rini <trini@konsulko.com> wrote:
> > >
> > > On Thu, Aug 17, 2023 at 07:41:50AM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Tue, 15 Aug 2023 at 08:56, Tom Rini <trini@konsulko.com> wrote:
> > > > >
> > > > > On Tue, Aug 15, 2023 at 08:44:20AM -0600, Simon Glass wrote:
> > > > > > Hi Tom,
> > > > > >
> > > > > > On Sun, 13 Aug 2023 at 09:52, Tom Rini <trini@konsulko.com> wrote:
> > > > > > >
> > > > > > > On Sat, Aug 12, 2023 at 09:14:45PM -0600, Simon Glass wrote:
> > > > > > >
> > > > > > > > Hi Tom,
> > > > > > > >
> > > > > > > > I notice that the runners are not utilised much by the QEMU
> > > > > > > > jobs, since we only run one at a time.
> > > > > > > >
> > > > > > > > I wonder if we could improve this, perhaps by using a
> > > > > > > > different tag for the QEMU ones and then having a machine
> > > > > > > > that only runs those (and runs 40 in parallel)?
> > > > > > > >
> > > > > > > > In general our use of the runners seems a bit primitive,
> > > > > > > > since the main use of parallelism is in the world builds.
> > > > > > >
> > > > > > > I'm honestly not sure. I think there are a few tweaks that we
> > > > > > > should make, like putting the opensbi and coreboot files into
> > > > > > > the Dockerfile logic instead. And maybe, just as we have a
> > > > > > > docker registry cache, we could see if we can set up a local
> > > > > > > pypi cache too? I'm not otherwise sure what's taking the 23
> > > > > > > seconds or so of
> > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/673565#L34 since
> > > > > > > the build and run parts aren't much.
> > > > > > >
> > > > > > > My first big worry about running 2 or 3 qemu jobs at the same
> > > > > > > time on a host is that any wins we get from a shorter queue
> > > > > > > will be lost to buildman doing "make -j$(nproc)" 2 or 3 times
> > > > > > > at once, so we build slower.
> > > > > >
> > > > > > Yes, perhaps.
> > > > > >
> > > > > > >
> > > > > > > My second big worry is that getting the right tags on runners
> > > > > > > will be a little tricky.
> > > > > >
> > > > > > Yes, and error-prone. Also it makes it harder to deal with
> > > > > > broken machines.
> > > > > >
> > > > > > >
> > > > > > > My third big worry (but this is something you can test easily
> > > > > > > enough at least) is that running the big sandbox tests 2 or 3
> > > > > > > times at once on the same host will get much slower. I think,
> > > > > > > but profiling would be helpful, that those get slow due to I/O
> > > > > > > and not CPU.
> > > > > >
> > > > > > I suspect it would be fast enough.
> > > > > >
> > > > > > But actually the other problem is that I am not sure whether
> > > > > > the jobs would have their own filesystem?
> > > > >
> > > > > Yes, they should be properly sandboxed. If you want to test some
> > > > > of these ideas, I think the best path is to temporarily
> > > > > un-register some of your runners (comment out the token in
> > > > > config.toml) and then register them with just the DM tree and
> > > > > experiment.
> > > >
> > > > OK, thanks for the idea. I tried this on tui.
> > > >
> > > > I used 'concurrent = 10' and it got up to a load of 70 or so every
> > > > now and then, but mostly it was much less.
> > > >
> > > > The whole run (of just the test.py stage) took 8 minutes, with
> > > > 'sandbox with clang test' taking the longest.
> > > >
> > > > I'm not too sure what that tells us...
> > >
> > > Well, looking at
> > > https://source.denx.de/u-boot/u-boot/-/pipelines/17391/builds the
> > > whole run took 56 minutes, of which 46 minutes was the 32-bit ARM
> > > world build. And the longest test.py stage was sandbox without LTO at
> > > just under 8 minutes. So I think trying to get more concurrency in
> > > this stage is likely to be a wash in terms of overall CI run time.
> >
> > There is quite a lot of variability. Two of the machines take about
> > 15 mins to build 32-bit ARM and another two take under 20 mins, e.g.:
> >
> > https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/676055
> >
> > Perhaps we should reserve the big jobs for the fastest machines? But
> > then what if they all go offline at once?
>
> Barring some significant donation of resources, we are probably just
> going to have to live with enough variation in build time that "about
> an hour" is what we'll end up with. I see that overall the pipeline
> the above example is from took 50 minutes.

Yes, I think so.

Regards,
Simon

