All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
@ 2021-02-08 16:33 Daniel P. Berrangé
  2021-02-08 16:33 ` [PATCH 1/3] gitlab: always build container images Daniel P. Berrangé
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-08 16:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Thomas Huth, Philippe Mathieu-Daudé,
	Daniel P. Berrangé,
	Wainer dos Santos Moschetta

This series fixes a problem with our gitlab CI rules that cause
container builds to be skipped. See the commit description in the
first patch for the details on this problem.

The overall result of this series though is a small increase in overall
pipeline time.

Previously

 - When container jobs are skipped: approx 1hr 5 mins
 - When container jobs are run, cached by docker: approx 1hr 15 mins
 - When container jobs are run, not cached by docker: approx 1hr 30 mins

With this series applied the first scenario no longer exists, so
all piplines are either 1hr 15 or 1hr 30 depending on whether the
container phase is skipped.

On the plus side the builds are more reliable as we're actually
building container images at correct times.

There is still a race condition though where build jobs can run
with the wrong containers. This happens if you push two different
branches to gitlab with different docker file content. If the
container jobs for the 2nd branch finish before the 1st
branch runs its build jobs, the 1st branch can end up using
containers fro the second branch.  The only fix to truely fix
that would be to stop using "latest" docker tag and always
use a tag based on the branch name. This would mean we build
up a growing set of docker images in the gitlab registry.

At least this series is much more correct that what exists in
git currently.

Daniel P. Berrangé (3):
  gitlab: always build container images
  gitlab: add fine grained job deps for all build jobs
  gitlab: fix inconsistent indentation

 .gitlab-ci.d/containers.yml |  7 ----
 .gitlab-ci.yml              | 74 +++++++++++++++++++++++++++++++++----
 2 files changed, 66 insertions(+), 15 deletions(-)

-- 
2.29.2




^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] gitlab: always build container images
  2021-02-08 16:33 [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
@ 2021-02-08 16:33 ` Daniel P. Berrangé
  2021-02-09  6:37   ` Thomas Huth
  2021-02-08 16:33 ` [PATCH 2/3] gitlab: add fine grained job deps for all build jobs Daniel P. Berrangé
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-08 16:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Thomas Huth, Philippe Mathieu-Daudé,
	Daniel P. Berrangé,
	Wainer dos Santos Moschetta

Currently we attempt to skip building container images if the commits do
not involve changes to the dockerfiles or gitlab CI definitions.

Conceptually this makes sense, but there is a challenge in the real
world implementation of this in gitlab.

In the case of a CI pipeline triggered from a merge request, GitLab
knows the common ancestor of the merge request and the main git repo,
so it can trivially determine if any of the commits associated with
the MR change the dockerfiles.

In the case of a CI pipeline triggered from a push to a branch, it is
much more difficult. There is no concept of a common ancestor in this
case. Instead GitLab looks at the set of commits in the git push event.

On the surface this may sound reasonable, but it doesn't take into
account that a push event does not always contain the full set of
patches from a branch.

For example, consider pushing 5 commits, one of which contains a
dockerfile change. This will trigger a CI pipeline for the
containers. Now consider you do some more work on the branch and push 3
further commits, so you now have a branch of 8 commits. For the second
push GitLab will only look at the 3 most recent commits, the other 5
were already present. Thus GitLab will not realize that the branch has
dockerfile changes that need to trigger the container build.

This can cause real world problems:

 - Push 5 commits to branch "foo", including a dockerfile change

    => rebuilds the container images with content from "foo"
    => build jobs runs against containers from "foo"

 - Refresh your master branch with latest upstream master

    => rebuilds the container images with content from "master"
    => build jobs runs against containers from "master"

 - Push 3 more commits to branch "foo", with no dockerfile change

    => no container rebuild triggers
    => build jobs runs against containers from "master"

The "changes" conditional in gitlab is OK, *provided* your build
jobs are not relying on any external state from previous builds.

This is NOT the case in QEMU, because we are building container
images and these are cached. This is a scenario in which the
"changes" conditional is not usuable.

The only other way to avoid this problem would be to use the git
branch name as the container image tag, instead of always using
"latest". The downside of this approach is that the user's gitlab
registry will grow significantly until it starts to trigger
GitLab's automatic deletion policy.  Every time the user starts
a new branch they will have to trigger a rebuild of the container
images. Given this, we might as well just drop the conditional
and always build the container images. Most of the time docker
will be able to use the layer cache to avoid the most expensive
part of the rebuild process (installing all the RPMs/debs/etc)

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 .gitlab-ci.d/containers.yml | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/.gitlab-ci.d/containers.yml b/.gitlab-ci.d/containers.yml
index 90fac85ce4..33e4046e23 100644
--- a/.gitlab-ci.d/containers.yml
+++ b/.gitlab-ci.d/containers.yml
@@ -20,13 +20,6 @@
     - docker push "$TAG"
   after_script:
     - docker logout
-  rules:
-    - changes:
-      - .gitlab-ci.d/containers.yml
-      - tests/docker/*
-      - tests/docker/dockerfiles/*
-    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
-    - if: '$CI_COMMIT_REF_NAME == "testing/next"'
 
 amd64-alpine-container:
   <<: *container_job_definition
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] gitlab: add fine grained job deps for all build jobs
  2021-02-08 16:33 [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
  2021-02-08 16:33 ` [PATCH 1/3] gitlab: always build container images Daniel P. Berrangé
@ 2021-02-08 16:33 ` Daniel P. Berrangé
  2021-02-09  6:39   ` Thomas Huth
  2021-02-08 16:33 ` [PATCH 3/3] gitlab: fix inconsistent indentation Daniel P. Berrangé
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-08 16:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Thomas Huth, Philippe Mathieu-Daudé,
	Daniel P. Berrangé,
	Wainer dos Santos Moschetta

This allows the build jobs to start running as soon as their respective
container image is ready, instead of waiting for all container builds
to finish.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 .gitlab-ci.yml | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 7c0db64710..a1d5a876e3 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -110,6 +110,8 @@ acceptance-system-alpine:
 
 build-system-ubuntu:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-ubuntu2004-container
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-fdt=system --enable-slirp=system
@@ -142,6 +144,8 @@ acceptance-system-ubuntu:
 
 build-system-debian:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-container
   variables:
     IMAGE: debian-amd64
     CONFIGURE_ARGS: --enable-fdt=system
@@ -187,6 +191,8 @@ acceptance-system-debian:
 
 build-system-fedora:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-fedora-container
   variables:
     IMAGE: fedora
     CONFIGURE_ARGS: --disable-gcrypt --enable-nettle --enable-docs
@@ -220,6 +226,8 @@ acceptance-system-fedora:
 
 build-system-centos:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-centos8-container
   variables:
     IMAGE: centos8
     CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
@@ -252,6 +260,8 @@ acceptance-system-centos:
 
 build-system-opensuse:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-opensuse-leap-container
   variables:
     IMAGE: opensuse-leap
     CONFIGURE_ARGS: --enable-fdt=system
@@ -284,6 +294,8 @@ acceptance-system-opensuse:
 
 build-disabled:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-fedora-container
   variables:
     IMAGE: fedora
     CONFIGURE_ARGS:
@@ -366,6 +378,8 @@ build-disabled:
 # available.
 build-tcg-disabled:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-centos8-container
   variables:
     IMAGE: centos8
   script:
@@ -386,6 +400,8 @@ build-tcg-disabled:
 
 build-user:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-user-cross-container
   variables:
     IMAGE: debian-all-test-cross
     CONFIGURE_ARGS: --disable-tools --disable-system
@@ -393,6 +409,8 @@ build-user:
 
 build-user-static:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-user-cross-container
   variables:
     IMAGE: debian-all-test-cross
     CONFIGURE_ARGS: --disable-tools --disable-system --static
@@ -401,6 +419,8 @@ build-user-static:
 # Only build the softmmu targets we have check-tcg tests for
 build-some-softmmu:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-user-cross-container
   variables:
     IMAGE: debian-all-test-cross
     CONFIGURE_ARGS: --disable-tools --enable-debug-tcg
@@ -412,6 +432,8 @@ build-some-softmmu:
 # we skip cris-linux-user as it doesn't use the common run loop
 build-user-plugins:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-user-cross-container
   variables:
     IMAGE: debian-all-test-cross
     CONFIGURE_ARGS: --disable-tools --disable-system --enable-plugins --enable-debug-tcg --target-list-exclude=sparc64-linux-user,cris-linux-user
@@ -427,6 +449,8 @@ build-user-centos7:
 
 build-some-softmmu-plugins:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-user-cross-container
   variables:
     IMAGE: debian-all-test-cross
     CONFIGURE_ARGS: --disable-tools --disable-user --enable-plugins --enable-debug-tcg
@@ -435,6 +459,8 @@ build-some-softmmu-plugins:
 
 build-clang:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-fedora-container
   variables:
     IMAGE: fedora
     CONFIGURE_ARGS: --cc=clang --cxx=clang++
@@ -445,6 +471,8 @@ build-clang:
 # These targets are on the way out
 build-deprecated:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-user-cross-container
   variables:
     IMAGE: debian-all-test-cross
     CONFIGURE_ARGS: --disable-docs --disable-tools
@@ -470,6 +498,8 @@ check-deprecated:
 
 build-oss-fuzz:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-fedora-container
   variables:
     IMAGE: fedora
   script:
@@ -488,6 +518,8 @@ build-oss-fuzz:
 
 build-tci:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-fedora-container
   variables:
     IMAGE: fedora
   script:
@@ -511,6 +543,8 @@ build-tci:
 # However we can't test against KVM on Gitlab-CI so we can only run unit tests
 build-coroutine-ucontext:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-ubuntu2004-container
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --with-coroutine=ucontext --disable-tcg
@@ -518,6 +552,8 @@ build-coroutine-ucontext:
 
 build-coroutine-sigaltstack:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-ubuntu2004-container
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --with-coroutine=sigaltstack --disable-tcg
@@ -529,6 +565,8 @@ build-coroutine-sigaltstack:
 # which had some API differences.
 build-crypto-old-nettle:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-centos7-container
   variables:
     IMAGE: centos7
     TARGETS: x86_64-softmmu x86_64-linux-user
@@ -550,6 +588,8 @@ check-crypto-old-nettle:
 
 build-crypto-old-gcrypt:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-centos7-container
   variables:
     IMAGE: centos7
     TARGETS: x86_64-softmmu x86_64-linux-user
@@ -571,6 +611,8 @@ check-crypto-old-gcrypt:
 
 build-crypto-only-gnutls:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-centos7-container
   variables:
     IMAGE: centos7
     TARGETS: x86_64-softmmu x86_64-linux-user
@@ -592,18 +634,24 @@ check-crypto-only-gnutls:
 # We don't need to exercise every backend with every front-end
 build-trace-multi-user:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-ubuntu2004-container
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-trace-backends=log,simple,syslog --disable-system
 
 build-trace-ftrace-system:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-ubuntu2004-container
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-trace-backends=ftrace --target-list=x86_64-softmmu
 
 build-trace-ust-system:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-ubuntu2004-container
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-trace-backends=ust --target-list=x86_64-softmmu
@@ -611,12 +659,16 @@ build-trace-ust-system:
 # Check our reduced build configurations
 build-without-default-devices:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-centos8-container
   variables:
     IMAGE: centos8
     CONFIGURE_ARGS: --without-default-devices --disable-user
 
 build-without-default-features:
   <<: *native_build_job_definition
+  needs:
+    job: amd64-debian-container
   variables:
     IMAGE: debian-amd64
     CONFIGURE_ARGS: --without-default-features --disable-user
@@ -626,6 +678,8 @@ build-without-default-features:
 check-patch:
   stage: build
   image: $CI_REGISTRY_IMAGE/qemu/centos8:latest
+  needs:
+    job: amd64-centos8-container
   script: .gitlab-ci.d/check-patch.py
   except:
     variables:
@@ -637,6 +691,8 @@ check-patch:
 check-dco:
   stage: build
   image: $CI_REGISTRY_IMAGE/qemu/centos8:latest
+  needs:
+    job: amd64-centos8-container
   script: .gitlab-ci.d/check-dco.py
   except:
     variables:
@@ -647,6 +703,8 @@ check-dco:
 build-libvhost-user:
   stage: build
   image: $CI_REGISTRY_IMAGE/qemu/fedora:latest
+  needs:
+    job: amd64-fedora-container
   before_script:
     - dnf install -y meson ninja-build
   script:
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] gitlab: fix inconsistent indentation
  2021-02-08 16:33 [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
  2021-02-08 16:33 ` [PATCH 1/3] gitlab: always build container images Daniel P. Berrangé
  2021-02-08 16:33 ` [PATCH 2/3] gitlab: add fine grained job deps for all build jobs Daniel P. Berrangé
@ 2021-02-08 16:33 ` Daniel P. Berrangé
  2021-02-08 17:20   ` Philippe Mathieu-Daudé
  2021-02-08 17:22 ` [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
  2021-02-16 10:39 ` Philippe Mathieu-Daudé
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-08 16:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Thomas Huth, Philippe Mathieu-Daudé,
	Daniel P. Berrangé,
	Wainer dos Santos Moschetta

The standard is to use 2 space indent, not 3.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 .gitlab-ci.yml | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index a1d5a876e3..53173da9bf 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -282,14 +282,14 @@ check-system-opensuse:
     MAKE_CHECK_ARGS: check
 
 acceptance-system-opensuse:
-   <<: *native_test_job_definition
-   needs:
-     - job: build-system-opensuse
-       artifacts: true
-   variables:
-     IMAGE: opensuse-leap
-     MAKE_CHECK_ARGS: check-acceptance
-   <<: *acceptance_definition
+  <<: *native_test_job_definition
+  needs:
+    - job: build-system-opensuse
+      artifacts: true
+  variables:
+    IMAGE: opensuse-leap
+    MAKE_CHECK_ARGS: check-acceptance
+  <<: *acceptance_definition
 
 
 build-disabled:
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] gitlab: fix inconsistent indentation
  2021-02-08 16:33 ` [PATCH 3/3] gitlab: fix inconsistent indentation Daniel P. Berrangé
@ 2021-02-08 17:20   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-08 17:20 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Thomas Huth, Alex Bennée, Wainer dos Santos Moschetta

On 2/8/21 5:33 PM, Daniel P. Berrangé wrote:
> The standard is to use 2 space indent, not 3.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  .gitlab-ci.yml | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-08 16:33 [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2021-02-08 16:33 ` [PATCH 3/3] gitlab: fix inconsistent indentation Daniel P. Berrangé
@ 2021-02-08 17:22 ` Daniel P. Berrangé
  2021-02-08 18:08   ` Philippe Mathieu-Daudé
  2021-02-16 10:39 ` Philippe Mathieu-Daudé
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-08 17:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Thomas Huth, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta

On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote:
> This series fixes a problem with our gitlab CI rules that cause
> container builds to be skipped. See the commit description in the
> first patch for the details on this problem.
> 
> The overall result of this series though is a small increase in overall
> pipeline time.
> 
> Previously
> 
>  - When container jobs are skipped: approx 1hr 5 mins
>  - When container jobs are run, cached by docker: approx 1hr 15 mins
>  - When container jobs are run, not cached by docker: approx 1hr 30 mins
> 
> With this series applied the first scenario no longer exists, so
> all piplines are either 1hr 15 or 1hr 30 depending on whether the
> container phase is skipped.

I mean to say the biggest problem I see is the cross-win64-system
job. This consumes 1 hour 5 minutes all on its own. It is at least
15 minutes longer that every other job AFAICT. So no matter how
well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline
duration right now.

We might want to consider how to split the win64 job or cut down
what it does in some way ?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-08 17:22 ` [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
@ 2021-02-08 18:08   ` Philippe Mathieu-Daudé
  2021-02-08 18:12     ` Daniel P. Berrangé
  0 siblings, 1 reply; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-08 18:08 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	qemu-devel, Yonggang Luo, Sunil Muthuswamy, Stefan Weil
  Cc: Thomas Huth, Alex Bennée, Wainer dos Santos Moschetta

On 2/8/21 6:22 PM, Daniel P. Berrangé wrote:
> On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote:
>> This series fixes a problem with our gitlab CI rules that cause
>> container builds to be skipped. See the commit description in the
>> first patch for the details on this problem.
>>
>> The overall result of this series though is a small increase in overall
>> pipeline time.
>>
>> Previously
>>
>>  - When container jobs are skipped: approx 1hr 5 mins
>>  - When container jobs are run, cached by docker: approx 1hr 15 mins
>>  - When container jobs are run, not cached by docker: approx 1hr 30 mins
>>
>> With this series applied the first scenario no longer exists, so
>> all piplines are either 1hr 15 or 1hr 30 depending on whether the
>> container phase is skipped.
> 
> I mean to say the biggest problem I see is the cross-win64-system
> job. This consumes 1 hour 5 minutes all on its own. It is at least
> 15 minutes longer that every other job AFAICT. So no matter how
> well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline
> duration right now.
> 
> We might want to consider how to split the win64 job or cut down
> what it does in some way ?

When the win64 build was added (on Debian), it was to mostly to cover
the WHPX. Later we moved mingw jobs to Fedora. I just checked and
WHPX is no more built, and nobody complained, so it is not relevant
anymore.

I don't mind much what you do with the Gitlab win64 job, as this config
is better covered on Cirrus. I'd like to keep the win32 job because it
has been useful to catch 32-bit issues.

Regards,

Phil.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-08 18:08   ` Philippe Mathieu-Daudé
@ 2021-02-08 18:12     ` Daniel P. Berrangé
  2021-02-09  6:01       ` Stefan Weil
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-08 18:12 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Thomas Huth, Stefan Weil, qemu-devel,
	Wainer dos Santos Moschetta, Yonggang Luo, Sunil Muthuswamy,
	Alex Bennée

On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/8/21 6:22 PM, Daniel P. Berrangé wrote:
> > On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote:
> >> This series fixes a problem with our gitlab CI rules that cause
> >> container builds to be skipped. See the commit description in the
> >> first patch for the details on this problem.
> >>
> >> The overall result of this series though is a small increase in overall
> >> pipeline time.
> >>
> >> Previously
> >>
> >>  - When container jobs are skipped: approx 1hr 5 mins
> >>  - When container jobs are run, cached by docker: approx 1hr 15 mins
> >>  - When container jobs are run, not cached by docker: approx 1hr 30 mins
> >>
> >> With this series applied the first scenario no longer exists, so
> >> all piplines are either 1hr 15 or 1hr 30 depending on whether the
> >> container phase is skipped.
> > 
> > I mean to say the biggest problem I see is the cross-win64-system
> > job. This consumes 1 hour 5 minutes all on its own. It is at least
> > 15 minutes longer that every other job AFAICT. So no matter how
> > well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline
> > duration right now.
> > 
> > We might want to consider how to split the win64 job or cut down
> > what it does in some way ?
> 
> When the win64 build was added (on Debian), it was to mostly to cover
> the WHPX. Later we moved mingw jobs to Fedora. I just checked and
> WHPX is no more built, and nobody complained, so it is not relevant
> anymore.
> 
> I don't mind much what you do with the Gitlab win64 job, as this config
> is better covered on Cirrus. I'd like to keep the win32 job because it
> has been useful to catch 32-bit issues.

I'm not suggesting we remove it. Most developers won't setup Cirrus CI,
so will only run GitLab CI jobs.  IME it is good to have both win32
and win64 coverage because things do break differently on them - especially
if you use bad printf format strings that are not independant of host
word size


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-08 18:12     ` Daniel P. Berrangé
@ 2021-02-09  6:01       ` Stefan Weil
  2021-02-09  6:55         ` Thomas Huth
  2021-02-09  9:53         ` Daniel P. Berrangé
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Weil @ 2021-02-09  6:01 UTC (permalink / raw)
  To: Daniel P. Berrangé, Philippe Mathieu-Daudé
  Cc: Thomas Huth, qemu-devel, Wainer dos Santos Moschetta,
	Yonggang Luo, Sunil Muthuswamy, Alex Bennée

Am 08.02.21 um 19:12 schrieb Daniel P. Berrangé:

> On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote:
>> On 2/8/21 6:22 PM, Daniel P. Berrangé wrote:
>>> On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote:
>>>> This series fixes a problem with our gitlab CI rules that cause
>>>> container builds to be skipped. See the commit description in the
>>>> first patch for the details on this problem.
>>>>
>>>> The overall result of this series though is a small increase in overall
>>>> pipeline time.
>>>>
>>>> Previously
>>>>
>>>>   - When container jobs are skipped: approx 1hr 5 mins
>>>>   - When container jobs are run, cached by docker: approx 1hr 15 mins
>>>>   - When container jobs are run, not cached by docker: approx 1hr 30 mins
>>>>
>>>> With this series applied the first scenario no longer exists, so
>>>> all piplines are either 1hr 15 or 1hr 30 depending on whether the
>>>> container phase is skipped.
>>> I mean to say the biggest problem I see is the cross-win64-system
>>> job. This consumes 1 hour 5 minutes all on its own. It is at least
>>> 15 minutes longer that every other job AFAICT. So no matter how
>>> well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline
>>> duration right now.
>>>
>>> We might want to consider how to split the win64 job or cut down
>>> what it does in some way ?
>> When the win64 build was added (on Debian), it was to mostly to cover
>> the WHPX. Later we moved mingw jobs to Fedora. I just checked and
>> WHPX is no more built, and nobody complained, so it is not relevant
>> anymore.
>>
>> I don't mind much what you do with the Gitlab win64 job, as this config
>> is better covered on Cirrus. I'd like to keep the win32 job because it
>> has been useful to catch 32-bit issues.
> I'm not suggesting we remove it. Most developers won't setup Cirrus CI,
> so will only run GitLab CI jobs.  IME it is good to have both win32
> and win64 coverage because things do break differently on them - especially
> if you use bad printf format strings that are not independant of host
> word size


I would not say that something is not relevant just because nobody 
complains. Nobody would miss any CI if everything were always fine. So 
people would miss WHPX CI as soon as there are changes (which happen 
infrequently) and something breaks. WHPX should be covered by the w64 
build, and as many as possible other features where I currently see a 
"NO" in the configure output as well.

Nevertheless I don't think that each CI job must run frequently. Each 
run not only costs time, but also energy, and contributes to our climate 
change.

I think that for the win32 and win64 jobs it would be sufficient to run 
them weekly, maybe even alternating if that is possible.

Stefan





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] gitlab: always build container images
  2021-02-08 16:33 ` [PATCH 1/3] gitlab: always build container images Daniel P. Berrangé
@ 2021-02-09  6:37   ` Thomas Huth
  2021-02-09  9:58     ` Daniel P. Berrangé
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Huth @ 2021-02-09  6:37 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta

On 08/02/2021 17.33, Daniel P. Berrangé wrote:
[...]
> For example, consider pushing 5 commits, one of which contains a
> dockerfile change. This will trigger a CI pipeline for the
> containers. Now consider you do some more work on the branch and push 3
> further commits, so you now have a branch of 8 commits. For the second
> push GitLab will only look at the 3 most recent commits, the other 5
> were already present. Thus GitLab will not realize that the branch has
> dockerfile changes that need to trigger the container build.
> 
> This can cause real world problems:
> 
>   - Push 5 commits to branch "foo", including a dockerfile change
> 
>      => rebuilds the container images with content from "foo"
>      => build jobs runs against containers from "foo"
> 
>   - Refresh your master branch with latest upstream master
> 
>      => rebuilds the container images with content from "master"
>      => build jobs runs against containers from "master"
> 
>   - Push 3 more commits to branch "foo", with no dockerfile change
> 
>      => no container rebuild triggers
>      => build jobs runs against containers from "master"
> 
> The "changes" conditional in gitlab is OK, *provided* your build
> jobs are not relying on any external state from previous builds.
> 
> This is NOT the case in QEMU, because we are building container
> images and these are cached. This is a scenario in which the
> "changes" conditional is not usuable.
> 
> The only other way to avoid this problem would be to use the git
> branch name as the container image tag, instead of always using
> "latest".
I'm basically fine with your patch, but let me ask one more thing: Won't we 
still have the problem if the user pushes to different branches 
simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles, 
containers start to get rebuild, then pushes to master without waiting for 
the previous CI to finish, then the containers get rebuild from the "master" 
job without the local changes to the dockerfiles. Then in the "foo" CI 
pipelines the following jobs might run with the containers that have been 
built by the "master" job...

So if we really want to get it bulletproof, do we have to use the git branch 
name as the container image tag?

  Thomas



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] gitlab: add fine grained job deps for all build jobs
  2021-02-08 16:33 ` [PATCH 2/3] gitlab: add fine grained job deps for all build jobs Daniel P. Berrangé
@ 2021-02-09  6:39   ` Thomas Huth
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Huth @ 2021-02-09  6:39 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta

On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> This allows the build jobs to start running as soon as their respective
> container image is ready, instead of waiting for all container builds
> to finish.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   .gitlab-ci.yml | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 58 insertions(+)

Please also adjust .gitlab-ci.d/crossbuilds.yml

  Thanks,
   Thomas



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-09  6:01       ` Stefan Weil
@ 2021-02-09  6:55         ` Thomas Huth
  2021-02-09  9:53         ` Daniel P. Berrangé
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Huth @ 2021-02-09  6:55 UTC (permalink / raw)
  To: Stefan Weil, Daniel P. Berrangé, Philippe Mathieu-Daudé
  Cc: Alex Bennée, Yonggang Luo, Sunil Muthuswamy, qemu-devel,
	Wainer dos Santos Moschetta

On 09/02/2021 07.01, Stefan Weil wrote:
> Am 08.02.21 um 19:12 schrieb Daniel P. Berrangé:
> 
>> On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote:
>>> On 2/8/21 6:22 PM, Daniel P. Berrangé wrote:
>>>> On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote:
>>>>> This series fixes a problem with our gitlab CI rules that cause
>>>>> container builds to be skipped. See the commit description in the
>>>>> first patch for the details on this problem.
>>>>>
>>>>> The overall result of this series though is a small increase in overall
>>>>> pipeline time.
>>>>>
>>>>> Previously
>>>>>
>>>>>   - When container jobs are skipped: approx 1hr 5 mins
>>>>>   - When container jobs are run, cached by docker: approx 1hr 15 mins
>>>>>   - When container jobs are run, not cached by docker: approx 1hr 30 mins
>>>>>
>>>>> With this series applied the first scenario no longer exists, so
>>>>> all piplines are either 1hr 15 or 1hr 30 depending on whether the
>>>>> container phase is skipped.
>>>> I mean to say the biggest problem I see is the cross-win64-system
>>>> job. This consumes 1 hour 5 minutes all on its own. It is at least
>>>> 15 minutes longer that every other job AFAICT. So no matter how
>>>> well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline
>>>> duration right now.
>>>>
>>>> We might want to consider how to split the win64 job or cut down
>>>> what it does in some way ?
>>> When the win64 build was added (on Debian), it was to mostly to cover
>>> the WHPX. Later we moved mingw jobs to Fedora. I just checked and
>>> WHPX is no more built, and nobody complained, so it is not relevant
>>> anymore.
>>>
>>> I don't mind much what you do with the Gitlab win64 job, as this config
>>> is better covered on Cirrus. I'd like to keep the win32 job because it
>>> has been useful to catch 32-bit issues.
>> I'm not suggesting we remove it. Most developers won't setup Cirrus CI,
>> so will only run GitLab CI jobs.  IME it is good to have both win32
>> and win64 coverage because things do break differently on them - especially
>> if you use bad printf format strings that are not independant of host
>> word size
> 
> 
> I would not say that something is not relevant just because nobody 
> complains. Nobody would miss any CI if everything were always fine. So 
> people would miss WHPX CI as soon as there are changes (which happen 
> infrequently) and something breaks. WHPX should be covered by the w64 build, 
> and as many as possible other features where I currently see a "NO" in the 
> configure output as well.

Yes, I agree, we should add WHPX there again. Could somebody please check 
whether the headers are already available in the latest Fedora? Then we 
might simply switch the container to use the latest version of Fedora instead.

Otherwise we should install the headers manually there, like we did in 
commit d3dd34a1e5e134 for the now-removed Debian container.

> Nevertheless I don't think that each CI job must run frequently. Each run 
> not only costs time, but also energy, and contributes to our climate change.
> 
> I think that for the win32 and win64 jobs it would be sufficient to run them 
> weekly, maybe even alternating if that is possible.

Maybe it would be sufficient to run those jobs only on tags (so that they 
get checked for pull requests) and on the master and staging branch?

  Thomas



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-09  6:01       ` Stefan Weil
  2021-02-09  6:55         ` Thomas Huth
@ 2021-02-09  9:53         ` Daniel P. Berrangé
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-09  9:53 UTC (permalink / raw)
  To: Stefan Weil
  Cc: Thomas Huth, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, qemu-devel, Yonggang Luo,
	Sunil Muthuswamy, Alex Bennée

On Tue, Feb 09, 2021 at 07:01:49AM +0100, Stefan Weil wrote:
> Am 08.02.21 um 19:12 schrieb Daniel P. Berrangé:
> 
> > On Mon, Feb 08, 2021 at 07:08:39PM +0100, Philippe Mathieu-Daudé wrote:
> > > On 2/8/21 6:22 PM, Daniel P. Berrangé wrote:
> > > > On Mon, Feb 08, 2021 at 04:33:36PM +0000, Daniel P. Berrangé wrote:
> > > > > This series fixes a problem with our gitlab CI rules that cause
> > > > > container builds to be skipped. See the commit description in the
> > > > > first patch for the details on this problem.
> > > > > 
> > > > > The overall result of this series though is a small increase in overall
> > > > > pipeline time.
> > > > > 
> > > > > Previously
> > > > > 
> > > > >   - When container jobs are skipped: approx 1hr 5 mins
> > > > >   - When container jobs are run, cached by docker: approx 1hr 15 mins
> > > > >   - When container jobs are run, not cached by docker: approx 1hr 30 mins
> > > > > 
> > > > > With this series applied the first scenario no longer exists, so
> > > > > all piplines are either 1hr 15 or 1hr 30 depending on whether the
> > > > > container phase is skipped.
> > > > I mean to say the biggest problem I see is the cross-win64-system
> > > > job. This consumes 1 hour 5 minutes all on its own. It is at least
> > > > 15 minutes longer that every other job AFAICT. So no matter how
> > > > well we parallelize stuff, 1 hr 5 is a hard lower limit on pipeline
> > > > duration right now.
> > > > 
> > > > We might want to consider how to split the win64 job or cut down
> > > > what it does in some way ?
> > > When the win64 build was added (on Debian), it was to mostly to cover
> > > the WHPX. Later we moved mingw jobs to Fedora. I just checked and
> > > WHPX is no more built, and nobody complained, so it is not relevant
> > > anymore.
> > > 
> > > I don't mind much what you do with the Gitlab win64 job, as this config
> > > is better covered on Cirrus. I'd like to keep the win32 job because it
> > > has been useful to catch 32-bit issues.
> > I'm not suggesting we remove it. Most developers won't setup Cirrus CI,
> > so will only run GitLab CI jobs.  IME it is good to have both win32
> > and win64 coverage because things do break differently on them - especially
> > if you use bad printf format strings that are not independant of host
> > word size
> 
> 
> I would not say that something is not relevant just because nobody
> complains. Nobody would miss any CI if everything were always fine. So
> people would miss WHPX CI as soon as there are changes (which happen
> infrequently) and something breaks. WHPX should be covered by the w64 build,
> and as many as possible other features where I currently see a "NO" in the
> configure output as well.
> 
> Nevertheless I don't think that each CI job must run frequently. Each run
> not only costs time, but also energy, and contributes to our climate change.
> 
> I think that for the win32 and win64 jobs it would be sufficient to run them
> weekly, maybe even alternating if that is possible.

I think that would be a major step backwards. Not only do we need these
jobs to be gating for merges into git master to prevent regressions getting
merged, but we want want to stop contributors and subsystem maintainers
sending broken patch series.  Not running them every time will loose these
protections.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] gitlab: always build container images
  2021-02-09  6:37   ` Thomas Huth
@ 2021-02-09  9:58     ` Daniel P. Berrangé
  2021-02-10 11:17       ` Daniel P. Berrangé
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-09  9:58 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Philippe Mathieu-Daudé,
	Alex Bennée, qemu-devel, Wainer dos Santos Moschetta

On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
> On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> [...]
> > For example, consider pushing 5 commits, one of which contains a
> > dockerfile change. This will trigger a CI pipeline for the
> > containers. Now consider you do some more work on the branch and push 3
> > further commits, so you now have a branch of 8 commits. For the second
> > push GitLab will only look at the 3 most recent commits, the other 5
> > were already present. Thus GitLab will not realize that the branch has
> > dockerfile changes that need to trigger the container build.
> > 
> > This can cause real world problems:
> > 
> >   - Push 5 commits to branch "foo", including a dockerfile change
> > 
> >      => rebuilds the container images with content from "foo"
> >      => build jobs runs against containers from "foo"
> > 
> >   - Refresh your master branch with latest upstream master
> > 
> >      => rebuilds the container images with content from "master"
> >      => build jobs runs against containers from "master"
> > 
> >   - Push 3 more commits to branch "foo", with no dockerfile change
> > 
> >      => no container rebuild triggers
> >      => build jobs runs against containers from "master"
> > 
> > The "changes" conditional in gitlab is OK, *provided* your build
> > jobs are not relying on any external state from previous builds.
> > 
> > This is NOT the case in QEMU, because we are building container
> > images and these are cached. This is a scenario in which the
> > "changes" conditional is not usuable.
> > 
> > The only other way to avoid this problem would be to use the git
> > branch name as the container image tag, instead of always using
> > "latest".
> I'm basically fine with your patch, but let me ask one more thing: Won't we
> still have the problem if the user pushes to different branches
> simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
> containers start to get rebuild, then pushes to master without waiting for
> the previous CI to finish, then the containers get rebuild from the "master"
> job without the local changes to the dockerfiles. Then in the "foo" CI
> pipelines the following jobs might run with the containers that have been
> built by the "master" job...

Yes,  this is the issue I describe in the cover letter.

> So if we really want to get it bulletproof, do we have to use the git branch
> name as the container image tag?

That is possible, but I'm somewhat loathe to do that, as it means the
container registry in developers forks will accumulate a growing list
of image tags. I know gitlab will force expire once it gets beyond a
certain number of tags, but it still felt pretty wasteful of space
to create so many tags.

Having said that, maybe this is not actually wasteful if we always
use the "master" as a cache for docker, then the "new" images we
build on each branch will just re-use existing docker layers and
thus not add to disk usage. We'd only see extra usage if the branch
contained changes to dockerfiles.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] gitlab: always build container images
  2021-02-09  9:58     ` Daniel P. Berrangé
@ 2021-02-10 11:17       ` Daniel P. Berrangé
  2021-02-16 12:43         ` Daniel P. Berrangé
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-10 11:17 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Alex Bennée, Philippe Mathieu-Daudé,
	qemu-devel, Wainer dos Santos Moschetta

On Tue, Feb 09, 2021 at 09:58:29AM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
> > On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> > [...]
> > > For example, consider pushing 5 commits, one of which contains a
> > > dockerfile change. This will trigger a CI pipeline for the
> > > containers. Now consider you do some more work on the branch and push 3
> > > further commits, so you now have a branch of 8 commits. For the second
> > > push GitLab will only look at the 3 most recent commits, the other 5
> > > were already present. Thus GitLab will not realize that the branch has
> > > dockerfile changes that need to trigger the container build.
> > > 
> > > This can cause real world problems:
> > > 
> > >   - Push 5 commits to branch "foo", including a dockerfile change
> > > 
> > >      => rebuilds the container images with content from "foo"
> > >      => build jobs runs against containers from "foo"
> > > 
> > >   - Refresh your master branch with latest upstream master
> > > 
> > >      => rebuilds the container images with content from "master"
> > >      => build jobs runs against containers from "master"
> > > 
> > >   - Push 3 more commits to branch "foo", with no dockerfile change
> > > 
> > >      => no container rebuild triggers
> > >      => build jobs runs against containers from "master"
> > > 
> > > The "changes" conditional in gitlab is OK, *provided* your build
> > > jobs are not relying on any external state from previous builds.
> > > 
> > > This is NOT the case in QEMU, because we are building container
> > > images and these are cached. This is a scenario in which the
> > > "changes" conditional is not usuable.
> > > 
> > > The only other way to avoid this problem would be to use the git
> > > branch name as the container image tag, instead of always using
> > > "latest".
> > I'm basically fine with your patch, but let me ask one more thing: Won't we
> > still have the problem if the user pushes to different branches
> > simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
> > containers start to get rebuild, then pushes to master without waiting for
> > the previous CI to finish, then the containers get rebuild from the "master"
> > job without the local changes to the dockerfiles. Then in the "foo" CI
> > pipelines the following jobs might run with the containers that have been
> > built by the "master" job...
> 
> Yes,  this is the issue I describe in the cover letter.
> 
> > So if we really want to get it bulletproof, do we have to use the git branch
> > name as the container image tag?
> 
> That is possible, but I'm somewhat loathe to do that, as it means the
> container registry in developers forks will accumulate a growing list
> of image tags. I know gitlab will force expire once it gets beyond a
> certain number of tags, but it still felt pretty wasteful of space
> to create so many tags.
> 
> Having said that, maybe this is not actually wasteful if we always
> use the "master" as a cache for docker, then the "new" images we
> build on each branch will just re-use existing docker layers and
> thus not add to disk usage. We'd only see extra usage if the branch
> contained changes to dockerfiles.

The challenge here is that I need the docker tag name to be in an env
variable in the gitlab-ci.yml file.

I can directly use $CI_COMMIT_REF_NAME  to get the branch name but
the list of valid characters for a git branch is way more permissive
than valid characters for a docker tag.

So we need to filter the git branch name to form a valid docker tag,
and AFAICT, there's no way todo that when setting a global env variable
in the gitlab-ci.yml.  I can only do filtering once in the before_script:
stage, and that's too late to use it in the image name for the job.

We could ignore the problem and hope people always have sane branch
names ? 

   https://docs.docker.com/engine/reference/commandline/tag/

  "A tag name must be valid ASCII and may contain lowercase and 
   uppercase letters, digits, underscores, periods and dashes. 
   A tag name may not start with a period or a dash and may 
   contain a maximum of 128 characters."

that rule would cover all my git branch names, but then ASCII covers
most common english needs.  I worry that we might have contributors
who genuinely use non-ASCII chars in their git branch names, especially
those speakers of non-english/european languages eg persian, chinese,
japanese languages for example. Git is very permissive, allowing
everything except a short list

   https://www.spinics.net/lists/git/msg133704.html

  "A branch name can not:
        - Have a path component that begins with "."
        - Have a double dot ".."
        - Have an ASCII control character, "~", "^", ":" or SP, anywhere
        - End with a "/"
        - End with ".lock"
        - Contain a "\" (backslash"

The result will be if someone names their git branch "🏂", then all
the CI jobs will fail in gitlab.

 $ git branch 🏂

works

 $ docker  tag 470671670cac foo:🏂
  Error: invalid reference format

fails

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] fix build failures from incorrectly skipped container build jobs
  2021-02-08 16:33 [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2021-02-08 17:22 ` [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
@ 2021-02-16 10:39 ` Philippe Mathieu-Daudé
  4 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-16 10:39 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Richard Henderson, Thomas Huth, Alex Bennée,
	Wainer dos Santos Moschetta

On 2/8/21 5:33 PM, Daniel P. Berrangé wrote:
> This series fixes a problem with our gitlab CI rules that cause
> container builds to be skipped. See the commit description in the
> first patch for the details on this problem.
...

> There is still a race condition though where build jobs can run
> with the wrong containers. This happens if you push two different
> branches to gitlab with different docker file content. If the
> container jobs for the 2nd branch finish before the 1st
> branch runs its build jobs, the 1st branch can end up using
> containers fro the second branch.  The only fix to truely fix
> that would be to stop using "latest" docker tag and always
> use a tag based on the branch name. This would mean we build
> up a growing set of docker images in the gitlab registry.

OK this indeed describes the problem I'm facing.

> At least this series is much more correct that what exists in
> git currently.

Good, I'll test it then.

Regards,

Phil.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] gitlab: always build container images
  2021-02-10 11:17       ` Daniel P. Berrangé
@ 2021-02-16 12:43         ` Daniel P. Berrangé
  2021-02-16 13:02           ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-16 12:43 UTC (permalink / raw)
  To: Thomas Huth, Alex Bennée, Philippe Mathieu-Daudé,
	qemu-devel, Wainer dos Santos Moschetta

On Wed, Feb 10, 2021 at 11:17:00AM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 09, 2021 at 09:58:29AM +0000, Daniel P. Berrangé wrote:
> > On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
> > > On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> > > [...]
> > > > For example, consider pushing 5 commits, one of which contains a
> > > > dockerfile change. This will trigger a CI pipeline for the
> > > > containers. Now consider you do some more work on the branch and push 3
> > > > further commits, so you now have a branch of 8 commits. For the second
> > > > push GitLab will only look at the 3 most recent commits, the other 5
> > > > were already present. Thus GitLab will not realize that the branch has
> > > > dockerfile changes that need to trigger the container build.
> > > > 
> > > > This can cause real world problems:
> > > > 
> > > >   - Push 5 commits to branch "foo", including a dockerfile change
> > > > 
> > > >      => rebuilds the container images with content from "foo"
> > > >      => build jobs runs against containers from "foo"
> > > > 
> > > >   - Refresh your master branch with latest upstream master
> > > > 
> > > >      => rebuilds the container images with content from "master"
> > > >      => build jobs runs against containers from "master"
> > > > 
> > > >   - Push 3 more commits to branch "foo", with no dockerfile change
> > > > 
> > > >      => no container rebuild triggers
> > > >      => build jobs runs against containers from "master"
> > > > 
> > > > The "changes" conditional in gitlab is OK, *provided* your build
> > > > jobs are not relying on any external state from previous builds.
> > > > 
> > > > This is NOT the case in QEMU, because we are building container
> > > > images and these are cached. This is a scenario in which the
> > > > "changes" conditional is not usuable.
> > > > 
> > > > The only other way to avoid this problem would be to use the git
> > > > branch name as the container image tag, instead of always using
> > > > "latest".
> > > I'm basically fine with your patch, but let me ask one more thing: Won't we
> > > still have the problem if the user pushes to different branches
> > > simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
> > > containers start to get rebuild, then pushes to master without waiting for
> > > the previous CI to finish, then the containers get rebuild from the "master"
> > > job without the local changes to the dockerfiles. Then in the "foo" CI
> > > pipelines the following jobs might run with the containers that have been
> > > built by the "master" job...
> > 
> > Yes,  this is the issue I describe in the cover letter.
> > 
> > > So if we really want to get it bulletproof, do we have to use the git branch
> > > name as the container image tag?
> > 
> > That is possible, but I'm somewhat loathe to do that, as it means the
> > container registry in developers forks will accumulate a growing list
> > of image tags. I know gitlab will force expire once it gets beyond a
> > certain number of tags, but it still felt pretty wasteful of space
> > to create so many tags.
> > 
> > Having said that, maybe this is not actually wasteful if we always
> > use the "master" as a cache for docker, then the "new" images we
> > build on each branch will just re-use existing docker layers and
> > thus not add to disk usage. We'd only see extra usage if the branch
> > contained changes to dockerfiles.
> 
> The challenge here is that I need the docker tag name to be in an env
> variable in the gitlab-ci.yml file.
> 
> I can directly use $CI_COMMIT_REF_NAME  to get the branch name but
> the list of valid characters for a git branch is way more permissive
> than valid characters for a docker tag.
> 
> So we need to filter the git branch name to form a valid docker tag,
> and AFAICT, there's no way todo that when setting a global env variable
> in the gitlab-ci.yml.  I can only do filtering once in the before_script:
> stage, and that's too late to use it in the image name for the job.

I've thought of a solution here.

We can tag the images with $CI_COMMIT_SHORT_SHA , and the build jobs
can reference them with 

  image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:$CI_COMMIT_SHORT_SHA

In the continer build script, we then *also* tag them with a sanitized
version of $CI_COMMIT_REF_NAME, and also use this as the cache to pull
from when building the image.

The main downside here is that we'll end up creating alot of tags, but
most will have the same content so shouldn't be too bad.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] gitlab: always build container images
  2021-02-16 12:43         ` Daniel P. Berrangé
@ 2021-02-16 13:02           ` Philippe Mathieu-Daudé
  2021-02-16 13:15             ` Daniel P. Berrangé
  0 siblings, 1 reply; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-16 13:02 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Thomas Huth, Alex Bennée, qemu-devel,
	Wainer dos Santos Moschetta

On 2/16/21 1:43 PM, Daniel P. Berrangé wrote:
> On Wed, Feb 10, 2021 at 11:17:00AM +0000, Daniel P. Berrangé wrote:
>> On Tue, Feb 09, 2021 at 09:58:29AM +0000, Daniel P. Berrangé wrote:
>>> On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
>>>> On 08/02/2021 17.33, Daniel P. Berrangé wrote:
>>>> [...]
>>>>> For example, consider pushing 5 commits, one of which contains a
>>>>> dockerfile change. This will trigger a CI pipeline for the
>>>>> containers. Now consider you do some more work on the branch and push 3
>>>>> further commits, so you now have a branch of 8 commits. For the second
>>>>> push GitLab will only look at the 3 most recent commits, the other 5
>>>>> were already present. Thus GitLab will not realize that the branch has
>>>>> dockerfile changes that need to trigger the container build.
>>>>>
>>>>> This can cause real world problems:
>>>>>
>>>>>   - Push 5 commits to branch "foo", including a dockerfile change
>>>>>
>>>>>      => rebuilds the container images with content from "foo"
>>>>>      => build jobs runs against containers from "foo"
>>>>>
>>>>>   - Refresh your master branch with latest upstream master
>>>>>
>>>>>      => rebuilds the container images with content from "master"
>>>>>      => build jobs runs against containers from "master"
>>>>>
>>>>>   - Push 3 more commits to branch "foo", with no dockerfile change
>>>>>
>>>>>      => no container rebuild triggers
>>>>>      => build jobs runs against containers from "master"
>>>>>
>>>>> The "changes" conditional in gitlab is OK, *provided* your build
>>>>> jobs are not relying on any external state from previous builds.
>>>>>
>>>>> This is NOT the case in QEMU, because we are building container
>>>>> images and these are cached. This is a scenario in which the
>>>>> "changes" conditional is not usuable.
>>>>>
>>>>> The only other way to avoid this problem would be to use the git
>>>>> branch name as the container image tag, instead of always using
>>>>> "latest".
>>>> I'm basically fine with your patch, but let me ask one more thing: Won't we
>>>> still have the problem if the user pushes to different branches
>>>> simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
>>>> containers start to get rebuild, then pushes to master without waiting for
>>>> the previous CI to finish, then the containers get rebuild from the "master"
>>>> job without the local changes to the dockerfiles. Then in the "foo" CI
>>>> pipelines the following jobs might run with the containers that have been
>>>> built by the "master" job...
>>>
>>> Yes,  this is the issue I describe in the cover letter.
>>>
>>>> So if we really want to get it bulletproof, do we have to use the git branch
>>>> name as the container image tag?
>>>
>>> That is possible, but I'm somewhat loathe to do that, as it means the
>>> container registry in developers forks will accumulate a growing list
>>> of image tags. I know gitlab will force expire once it gets beyond a
>>> certain number of tags, but it still felt pretty wasteful of space
>>> to create so many tags.
>>>
>>> Having said that, maybe this is not actually wasteful if we always
>>> use the "master" as a cache for docker, then the "new" images we
>>> build on each branch will just re-use existing docker layers and
>>> thus not add to disk usage. We'd only see extra usage if the branch
>>> contained changes to dockerfiles.
>>
>> The challenge here is that I need the docker tag name to be in an env
>> variable in the gitlab-ci.yml file.
>>
>> I can directly use $CI_COMMIT_REF_NAME  to get the branch name but
>> the list of valid characters for a git branch is way more permissive
>> than valid characters for a docker tag.
>>
>> So we need to filter the git branch name to form a valid docker tag,
>> and AFAICT, there's no way todo that when setting a global env variable
>> in the gitlab-ci.yml.  I can only do filtering once in the before_script:
>> stage, and that's too late to use it in the image name for the job.
> 
> I've thought of a solution here.
> 
> We can tag the images with $CI_COMMIT_SHORT_SHA , and the build jobs
> can reference them with 
> 
>   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:$CI_COMMIT_SHORT_SHA
> 
> In the continer build script, we then *also* tag them with a sanitized
> version of $CI_COMMIT_REF_NAME, and also use this as the cache to pull
> from when building the image.
> 
> The main downside here is that we'll end up creating alot of tags, but
> most will have the same content so shouldn't be too bad.

This could be automated (for forks):

https://docs.gitlab.com/ee/user/packages/container_registry/#delete-images-by-using-a-cleanup-policy

Not yet to the qemu-project registry because:

  Cleanup policies can be run on all projects, with these exceptions:

    For GitLab.com, the project must have been created after 2020-02-22.

Regards,

Phil.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] gitlab: always build container images
  2021-02-16 13:02           ` Philippe Mathieu-Daudé
@ 2021-02-16 13:15             ` Daniel P. Berrangé
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel P. Berrangé @ 2021-02-16 13:15 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Thomas Huth, Alex Bennée, qemu-devel, Wainer dos Santos Moschetta

On Tue, Feb 16, 2021 at 02:02:31PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/16/21 1:43 PM, Daniel P. Berrangé wrote:
> > On Wed, Feb 10, 2021 at 11:17:00AM +0000, Daniel P. Berrangé wrote:
> >> On Tue, Feb 09, 2021 at 09:58:29AM +0000, Daniel P. Berrangé wrote:
> >>> On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
> >>>> On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> >>>> [...]
> >>>>> For example, consider pushing 5 commits, one of which contains a
> >>>>> dockerfile change. This will trigger a CI pipeline for the
> >>>>> containers. Now consider you do some more work on the branch and push 3
> >>>>> further commits, so you now have a branch of 8 commits. For the second
> >>>>> push GitLab will only look at the 3 most recent commits, the other 5
> >>>>> were already present. Thus GitLab will not realize that the branch has
> >>>>> dockerfile changes that need to trigger the container build.
> >>>>>
> >>>>> This can cause real world problems:
> >>>>>
> >>>>>   - Push 5 commits to branch "foo", including a dockerfile change
> >>>>>
> >>>>>      => rebuilds the container images with content from "foo"
> >>>>>      => build jobs runs against containers from "foo"
> >>>>>
> >>>>>   - Refresh your master branch with latest upstream master
> >>>>>
> >>>>>      => rebuilds the container images with content from "master"
> >>>>>      => build jobs runs against containers from "master"
> >>>>>
> >>>>>   - Push 3 more commits to branch "foo", with no dockerfile change
> >>>>>
> >>>>>      => no container rebuild triggers
> >>>>>      => build jobs runs against containers from "master"
> >>>>>
> >>>>> The "changes" conditional in gitlab is OK, *provided* your build
> >>>>> jobs are not relying on any external state from previous builds.
> >>>>>
> >>>>> This is NOT the case in QEMU, because we are building container
> >>>>> images and these are cached. This is a scenario in which the
> >>>>> "changes" conditional is not usuable.
> >>>>>
> >>>>> The only other way to avoid this problem would be to use the git
> >>>>> branch name as the container image tag, instead of always using
> >>>>> "latest".
> >>>> I'm basically fine with your patch, but let me ask one more thing: Won't we
> >>>> still have the problem if the user pushes to different branches
> >>>> simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
> >>>> containers start to get rebuild, then pushes to master without waiting for
> >>>> the previous CI to finish, then the containers get rebuild from the "master"
> >>>> job without the local changes to the dockerfiles. Then in the "foo" CI
> >>>> pipelines the following jobs might run with the containers that have been
> >>>> built by the "master" job...
> >>>
> >>> Yes,  this is the issue I describe in the cover letter.
> >>>
> >>>> So if we really want to get it bulletproof, do we have to use the git branch
> >>>> name as the container image tag?
> >>>
> >>> That is possible, but I'm somewhat loathe to do that, as it means the
> >>> container registry in developers forks will accumulate a growing list
> >>> of image tags. I know gitlab will force expire once it gets beyond a
> >>> certain number of tags, but it still felt pretty wasteful of space
> >>> to create so many tags.
> >>>
> >>> Having said that, maybe this is not actually wasteful if we always
> >>> use the "master" as a cache for docker, then the "new" images we
> >>> build on each branch will just re-use existing docker layers and
> >>> thus not add to disk usage. We'd only see extra usage if the branch
> >>> contained changes to dockerfiles.
> >>
> >> The challenge here is that I need the docker tag name to be in an env
> >> variable in the gitlab-ci.yml file.
> >>
> >> I can directly use $CI_COMMIT_REF_NAME  to get the branch name but
> >> the list of valid characters for a git branch is way more permissive
> >> than valid characters for a docker tag.
> >>
> >> So we need to filter the git branch name to form a valid docker tag,
> >> and AFAICT, there's no way todo that when setting a global env variable
> >> in the gitlab-ci.yml.  I can only do filtering once in the before_script:
> >> stage, and that's too late to use it in the image name for the job.
> > 
> > I've thought of a solution here.
> > 
> > We can tag the images with $CI_COMMIT_SHORT_SHA , and the build jobs
> > can reference them with 
> > 
> >   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:$CI_COMMIT_SHORT_SHA
> > 
> > In the continer build script, we then *also* tag them with a sanitized
> > version of $CI_COMMIT_REF_NAME, and also use this as the cache to pull
> > from when building the image.
> > 
> > The main downside here is that we'll end up creating alot of tags, but
> > most will have the same content so shouldn't be too bad.
> 
> This could be automated (for forks):
> 
> https://docs.gitlab.com/ee/user/packages/container_registry/#delete-images-by-using-a-cleanup-policy
> 
> Not yet to the qemu-project registry because:
> 
>   Cleanup policies can be run on all projects, with these exceptions:
> 
>     For GitLab.com, the project must have been created after 2020-02-22.

NB, when they say "project" here it appears to refer to the top level
namespace, ie your personal account namespace, not the individual repos.

None of my repos allow expiration to be turned on, even repos I only
created last week :-(

It apears they are getting close to removing this restriction though

  https://gitlab.com/gitlab-org/gitlab/-/issues/196124#note_492157369

so perhaps in the next 6 months expiration will be active.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-02-16 13:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-08 16:33 [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
2021-02-08 16:33 ` [PATCH 1/3] gitlab: always build container images Daniel P. Berrangé
2021-02-09  6:37   ` Thomas Huth
2021-02-09  9:58     ` Daniel P. Berrangé
2021-02-10 11:17       ` Daniel P. Berrangé
2021-02-16 12:43         ` Daniel P. Berrangé
2021-02-16 13:02           ` Philippe Mathieu-Daudé
2021-02-16 13:15             ` Daniel P. Berrangé
2021-02-08 16:33 ` [PATCH 2/3] gitlab: add fine grained job deps for all build jobs Daniel P. Berrangé
2021-02-09  6:39   ` Thomas Huth
2021-02-08 16:33 ` [PATCH 3/3] gitlab: fix inconsistent indentation Daniel P. Berrangé
2021-02-08 17:20   ` Philippe Mathieu-Daudé
2021-02-08 17:22 ` [PATCH 0/3] fix build failures from incorrectly skipped container build jobs Daniel P. Berrangé
2021-02-08 18:08   ` Philippe Mathieu-Daudé
2021-02-08 18:12     ` Daniel P. Berrangé
2021-02-09  6:01       ` Stefan Weil
2021-02-09  6:55         ` Thomas Huth
2021-02-09  9:53         ` Daniel P. Berrangé
2021-02-16 10:39 ` Philippe Mathieu-Daudé

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.