On Fri, Sep 24, 2021 at 05:36:31PM -0600, Simon Glass wrote:
> Hi Tom,
>
> On Fri, 24 Sept 2021 at 08:55, Tom Rini wrote:
> >
> > On Fri, Sep 24, 2021 at 08:38:49AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Fri, 24 Sept 2021 at 08:20, Tom Rini wrote:
> > > >
> > > > On Fri, Sep 24, 2021 at 04:01:21PM +0200, Harald Seiler wrote:
> > > > > Hi Simon,
> > > > >
> > > > > On Mon, 2021-09-20 at 08:06 -0600, Simon Glass wrote:
> > > > > > Hi Harald,
> > > > > >
> > > > > > On Mon, 20 Sept 2021 at 02:12, Harald Seiler wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Sat, 2021-09-18 at 10:37 -0600, Simon Glass wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is there something screwy with this? It seems that denx-vulcan
> > > > > > > > does two builds at once?
> > > > > > > >
> > > > > > > > https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/323540
> > > > > > >
> > > > > > > Hm, I made some changes to the vulcan runner which might have
> > > > > > > caused this... But still, even if it is running multiple jobs in
> > > > > > > parallel, they should still be isolated, so how does this lead
> > > > > > > to a build failure?
> > > > > >
> > > > > > I'm not sure that it does, but I do see this at the above link:
> > > > > >
> > > > > > Error: Unable to create
> > > > > > '/builds/u-boot/custodians/u-boot-dm/.git/logs/HEAD.lock': File
> > > > > > exists.
> > > > >
> > > > > This is super strange... Each build should be running in its own
> > > > > container, so there should never be a way for such a race to occur.
> > > > > No clue what is going on here...
> > > >
> > > > I know this from having to track down a different oddball failure
> > > > with konsulko-bootbake. It comes down to something along the lines
> > > > of volumes being re-used. Good in that it means that not every job
> > > > is doing a whole clone of the U-Boot tree every time. Bad in that
> > > > if the job gets wedged/killed in a crazy spot, you end up with
> > > > problems like this. If you run a 'find' on vulcan you'll figure out
> > > > which overlay has a problem. Or you can stop the runner for a
> > > > moment and tell docker to purge unused volumes and it'll clear it
> > > > up.
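For reference, the cleanup Tom describes might look roughly like this on
the runner host, assuming a Docker executor with the default storage
location (the paths here are illustrative, not taken from vulcan):

  # Locate the stale lock file left behind by a wedged/killed job
  # (adjust the search root to the host's Docker data directory):
  find /var/lib/docker -name 'HEAD.lock' 2>/dev/null

  # Or stop the runner and drop all unused volumes, so the next job
  # starts from a fresh clone:
  gitlab-runner stop
  docker volume prune -f
  gitlab-runner start
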
> > > > > > Re doing multiple builds, have you set it up so it doesn't
> > > > > > take on the very large builds? I would love to enable multiple
> > > > > > builds for the qemu steps since they mostly use a single CPU,
> > > > > > but am not sure how to do it.
> > > > >
> > > > > Actually, this was more a mistake than an intentional change. I
> > > > > updated the runner on vulcan to also take jobs for some other
> > > > > repos and wanted those jobs to run in parallel. It looks like I
> > > > > just forgot to set the `limit = 1` option for the U-Boot runner.
> > > > >
> > > > > Now, I think doing what you suggest is possible. We need to tag
> > > > > build and "test" jobs differently and then define multiple
> > > > > runners with different limits. E.g. in `.gitlab-ci.yml`:
> > > > >
> > > > >   build all 32bit ARM platforms:
> > > > >     stage: world build
> > > > >     tags:
> > > > >       - build
> > > > >
> > > > >   cppcheck:
> > > > >     stage: testsuites
> > > > >     tags:
> > > > >       - test
> > > > >
> > > > > And then define two runners in `/etc/gitlab-runner/config.toml`:
> > > > >
> > > > >   concurrent = 4
> > > > >
> > > > >   [[runners]]
> > > > >     name = "u-boot builder on vulcan"
> > > > >     limit = 1
> > > > >     ...
> > > > >
> > > > >   [[runners]]
> > > > >     name = "u-boot tester on vulcan"
> > > > >     limit = 4
> > > > >     ...
> > > > >
> > > > > and during registration they get the `build` and `test` tags
> > > > > respectively. This would allow running (in this example) up to 4
> > > > > test jobs concurrently, but only ever one large build job at
> > > > > once.
> > > >
> > > > Yes, but this would also make it harder for people to use the CI
> > > > as-is with their own runners. For example, the only thing stopping
> > > > people from using the free gitlab CI runners on their own is the
> > > > squashfs test being broken.
> > >
> > > Thanks for the info Harald.
> > >
> > > Would it just mean that they would need to add both 'build' and
> > > 'test' tags to their runner? If so, that does not sound onerous.
> >
> > Along with not being able to use the gitlab free runners.
> >
> > > I believe it would speed up CI quite a bit.
> >
> > I'm not sure? First, did you upgrade your runners recently? I started
> > by looking at
> > https://source.denx.de/u-boot/u-boot/-/pipelines/9238/builds and all
> > of the last stage jobs went super quick. But second, assuming the time
>
> They are the same as ever: tui did about 1 build per second on average
> and kaki did 0.5 builds per second, but this has slowed by about 15%
> recently. They both have quite a few cores. It could just be that the
> other two runners were busy, so kaki and tui did everything.
>
> > there includes spinning up the runner, sandbox+clang took 2x as long
> > to run as regular sandbox, to run fewer tests:
> > https://source.denx.de/u-boot/u-boot/-/jobs/326772
> > https://source.denx.de/u-boot/u-boot/-/jobs/326773
>
> Yes, but tui is 2x as fast as kaki (both in terms of number of CPUs and
> single-threaded performance) so that might explain it.
>
> > But we might save a minute, or two, if all of the other much quicker
> > tests ran to completion sooner, but we'd still be stuck waiting on the
> > longest running test.
>
> Yes, which can be many minutes. But each qemu run takes a good minute
> and we have about 30 of them now. Even if all four runners are working
> on them, that is about 7 minutes. In parallel it might only take a
> minute or two.
>
> > So while I think splitting the job into stages, such that if something
> > fails early we call it all off, is worthwhile, a time test where we
> > just have a single stage would mean more stuff in parallel and maybe
> > would be quicker, especially when we have more free runners. And to
> > me, sadly, that's our biggest gating factor and the one that can be
> > solved with money rather than technical wizardry.
>
> Makes sense. The other problem is that, to run the tests in parallel,
> we might need to clean some of them up (the series I sent is a start on
> that). But I think tui could probably run all the qemu jobs in parallel
> at once, for example.
>
> So perhaps we can come back to this when we get parallel tests running.
> It definitely is not efficient at present, in the second (qemu) stage.

OK. And I guess the other part of this would be that you could take
tui/kaki/etc out of general rotation for a bit and run some pipelines to
see what the time change is with your ideas in place.

--
Tom
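
As a footnote to Harald's tagging scheme above, the registration step he
mentions might look roughly like this, again assuming the Docker
executor; the URL, token, and container image below are placeholders:

  # Hypothetical registration of the two runners from the example:
  gitlab-runner register --non-interactive \
    --url https://source.denx.de/ \
    --registration-token TOKEN \
    --description "u-boot builder on vulcan" \
    --tag-list build \
    --limit 1 \
    --executor docker \
    --docker-image ubuntu:20.04

  gitlab-runner register --non-interactive \
    --url https://source.denx.de/ \
    --registration-token TOKEN \
    --description "u-boot tester on vulcan" \
    --tag-list test \
    --limit 4 \
    --executor docker \
    --docker-image ubuntu:20.04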