[RFC 0/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI


* [RFC 0/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
From: Willian Rampazzo @ 2021-09-14 18:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, Daniel P . Berrangé,
	Alex Bennée, Wainer dos Santos Moschetta, Cleber Rosa,
	Philippe Mathieu-Daudé

This adds a high-level plan for the QEMU GitLab CI based on use cases.
The idea is to have a base for evolving the QEMU CI. It sets high-level
characteristics for the QEMU CI use cases, which helps guide its
development.

There is an opportunity to discuss the high-level QEMU CI plan and some of
the implementation details during the KVM Forum.

Willian Rampazzo (1):
  docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI

 docs/devel/ci-plan.rst | 77 ++++++++++++++++++++++++++++++++++++++++++
 docs/devel/ci.rst      |  1 +
 2 files changed, 78 insertions(+)
 create mode 100644 docs/devel/ci-plan.rst

-- 
2.31.1


* [RFC 1/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
From: Willian Rampazzo @ 2021-09-14 18:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, Daniel P . Berrangé,
	Alex Bennée, Wainer dos Santos Moschetta, Cleber Rosa,
	Philippe Mathieu-Daudé

This adds a high-level plan for the QEMU GitLab CI based on use cases.
The idea is to have a base for evolving the QEMU CI. It sets high-level
characteristics for the QEMU CI use cases, which helps guide its
development.

Signed-off-by: Willian Rampazzo <willianr@redhat.com>
---
 docs/devel/ci-plan.rst | 77 ++++++++++++++++++++++++++++++++++++++++++
 docs/devel/ci.rst      |  1 +
 2 files changed, 78 insertions(+)
 create mode 100644 docs/devel/ci-plan.rst

diff --git a/docs/devel/ci-plan.rst b/docs/devel/ci-plan.rst
new file mode 100644
index 0000000000..5e95b6bcea
--- /dev/null
+++ b/docs/devel/ci-plan.rst
@@ -0,0 +1,77 @@
+The GitLab CI structure
+=======================
+
+This section describes the current state of the QEMU GitLab CI and the
+high-level plan for its future.
+
+Current state
+-------------
+
+The upstream QEMU project considers GitLab CI its primary CI system.
+Currently, it runs 120+ jobs, of which ~36 are container build jobs, 69 are
+QEMU build jobs, ~22 are test jobs, 1 is a web page deploy job, and 1 is an
+external job covering the execution of Travis jobs.
+
+Currently, every push to a user's fork runs most of the jobs that run on the
+main repository. The exceptions are the acceptance test jobs, which run
+automatically only on the main repository. Running most of the jobs on every
+push, whether to a user's fork or to the main repository, is not viable: the
+number of jobs tends to grow, making it impractical to run all of them on
+every single push.
+
+Future of QEMU GitLab CI
+------------------------
+
+The following is a proposal to establish a high-level plan and define the
+characteristics of the QEMU GitLab CI. The idea is to organize the CI by use
+case and avoid wasting resources and CI minutes, in anticipation of GitLab
+starting to enforce CI minute limits soon.
+
+Use cases
+^^^^^^^^^
+
+Below is a list of the most common use cases for the QEMU GitLab CI.
+
+Gating
+""""""
+
+The gating set of jobs runs on maintainers' pull requests when the project
+leader pushes them to the project's staging branch. The gating CI pipeline
+has the following characteristics:
+
+ * Jobs tagged as gating run as part of the gating CI pipeline;
+ * The gating CI pipeline consists of stable jobs;
+ * The execution duration of the gating CI pipeline should, as far as
+   possible, not exceed 2 hours.
+
+Developers
+""""""""""
+
+A developer working on a new feature or fixing an issue may want to run/propose
+a specific set of tests. Those tests may, eventually, benefit other developers.
+A developer CI pipeline has the following characteristics:
+
+ * It is easy to run the tests currently available in the project;
+ * It is easy to add new tests or remove unneeded tests;
+ * It is flexible enough to allow changes to the existing jobs.
+
+Maintainers
+"""""""""""
+
+When accepting developers' patches, a maintainer may want to run a specific
+test set. A maintainer CI pipeline has the following characteristics:
+
+ * It consists of tests that are valuable for the subsystem;
+ * It is easy to run a set of specific tests available in the project;
+ * It is easy to add new tests or remove unneeded tests.
+
+Scheduled / periodic pipelines
+""""""""""""""""""""""""""""""
+
+The scheduled CI pipeline runs periodically on the master/main branch of the
+project. It covers as many jobs as needed, or as many as GitLab CI execution
+time limits allow. The main idea of this pipeline is to run jobs that are not
+part of any other use case due to limitations such as execution duration or
+flakiness. This pipeline may be helpful, for example, to collect test/job
+statistics or to assess test/job stability. The scheduled CI pipeline should
+not act as a gating CI pipeline.
diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst
index 8d95247188..c9a43f997d 100644
--- a/docs/devel/ci.rst
+++ b/docs/devel/ci.rst
@@ -9,5 +9,6 @@ found at::
    https://wiki.qemu.org/Testing/CI
 
 .. include:: ci-definitions.rst
+.. include:: ci-plan.rst
 .. include:: ci-jobs.rst
 .. include:: ci-runners.rst
-- 
2.31.1




* Re: [RFC 1/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
From: Daniel P. Berrangé @ 2021-09-15  8:59 UTC (permalink / raw)
  To: Willian Rampazzo
  Cc: Peter Maydell, Thomas Huth, Alex Bennée, qemu-devel,
	Wainer dos Santos Moschetta, Cleber Rosa,
	Philippe Mathieu-Daudé

On Tue, Sep 14, 2021 at 03:48:30PM -0300, Willian Rampazzo wrote:
> [...]
> +Developers
> +""""""""""
> +
> +A developer working on a new feature or fixing an issue may want to run/propose
> +a specific set of tests. Those tests may, eventually, benefit other developers.
> +A developer CI pipeline has the following characteristics:
> +
> + * It is easy to run current tests available in the project;
> + * It is easy to add new tests or remove unneeded tests;
> + * It is flexible enough to allow changes in the current jobs.
> +
> +Maintainers
> +"""""""""""
> +
> +When accepting developers' patches, a maintainer may want to run a specific
> +test set. A maintainer CI pipeline has the following characteristics:
> +
> + * It consists of tests that are valuable for the subsystem;
> + * It is easy to run a set of specific tests available in the project;
> + * It is easy to add new tests or remove unneeded tests.


Neither of these describes why I use CI as a developer and/or subsystem
maintainer.

My aim in using CI is to be able to execute (as closely as possible)
the exact same set of tests that will be run by gating CI
on pull requests.

My goal is to minimize (ideally eliminate) the risk that a patch 
series or pull request gets rejected with a need to re-spin due 
to CI failures. Each such rejection causes a round trip delaying 
merge, and wastes both my time and the maintainers'/gatekeepers' time.
It is also hard to debug failures if I can't replicate the gating 
CI myself.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC 1/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
From: Willian Rampazzo @ 2021-09-15 13:51 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Thomas Huth, Alex Bennée, qemu-devel,
	Wainer dos Santos Moschetta, Cleber Rosa,
	Philippe Mathieu-Daudé

On Wed, Sep 15, 2021 at 6:00 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Sep 14, 2021 at 03:48:30PM -0300, Willian Rampazzo wrote:
> > [...]
> > +Developers
> > +""""""""""
> > +
> > +A developer working on a new feature or fixing an issue may want to run/propose
> > +a specific set of tests. Those tests may, eventually, benefit other developers.
> > +A developer CI pipeline has the following characteristics:
> > +
> > + * It is easy to run current tests available in the project;
> > + * It is easy to add new tests or remove unneeded tests;
> > + * It is flexible enough to allow changes in the current jobs.
> > +
> > +Maintainers
> > +"""""""""""
> > +
> > +When accepting developers' patches, a maintainer may want to run a specific
> > +test set. A maintainer CI pipeline has the following characteristics:
> > +
> > + * It consists of tests that are valuable for the subsystem;
> > + * It is easy to run a set of specific tests available in the project;
> > + * It is easy to add new tests or remove unneeded tests.
>
>
> Neither of these describe why I use CI as a developer and/or subsys
> maintainer.
>
> My desire with using CI is to (as close as possible) be able to
> execute the exact same  set of tests that will be run by gating CI
> on pull requests.

I totally understand your desire and I think it is valid.

What I'm trying to do with this proposal is follow the same strategy we
used when we started planning for the gating CI. The decision was to start
small. Today the CI has grown, and we don't have a so-called gating CI yet,
just a bunch of jobs that run on the staging branch, some of which need
re-evaluation as to whether they should run on staging at all.

My proposal is to organize the CI so that it runs a gating set of jobs and
is flexible enough for developers and maintainers. Just like when we
started planning for the CI, this is another step I think we need to
take now. Unfortunately, at this time, flexibility does not mean
running all the gating jobs, mainly because of hardware limitations.
Still, I don't think we can continue without organizing what we have
today.

>
> My goal is to minimize (ideally eliminate) the risk that a patch
> series or pull request gets rejected with a need to re-spin due
> to CI failures. Each such rejection causes a round trip delaying
> merge, and this wastes my time & the maintainer/gate keepers' time.
> It is also hard to debug failures if I can't replicate the gating
> CI myself.

Again, I totally agree with you. That would be the perfect scenario.

The barrier I see to having it work the way you described is
hardware access. The staging branch runs on two different custom
runners. We have two possible ways to achieve the scenario you
described: remove the custom runners from the staging branch and let
the jobs run on the GitLab CI shared runners, which everyone with
access to GitLab can use, or give developers access to the custom
runners.

Today, I don't think either of those options is feasible or brings value
to the project. That is one of the reasons I'm not covering it in the
future plan for now. As I mentioned before, let's take another small step
and organize a gating CI with some ground rules. Once we reach that, the
next step can be to implement merge requests, think about
reproducibility, and so on.





* Re: [RFC 1/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
From: Daniel P. Berrangé @ 2021-09-15 14:07 UTC (permalink / raw)
  To: Willian Rampazzo
  Cc: Peter Maydell, Thomas Huth, Alex Bennée, qemu-devel,
	Wainer dos Santos Moschetta, Cleber Rosa,
	Philippe Mathieu-Daudé

On Wed, Sep 15, 2021 at 10:51:56AM -0300, Willian Rampazzo wrote:
> On Wed, Sep 15, 2021 at 6:00 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Tue, Sep 14, 2021 at 03:48:30PM -0300, Willian Rampazzo wrote:
> > > [...]
> > > +Developers
> > > +""""""""""
> > > +
> > > +A developer working on a new feature or fixing an issue may want to run/propose
> > > +a specific set of tests. Those tests may, eventually, benefit other developers.
> > > +A developer CI pipeline has the following characteristics:
> > > +
> > > + * It is easy to run current tests available in the project;
> > > + * It is easy to add new tests or remove unneeded tests;
> > > + * It is flexible enough to allow changes in the current jobs.
> > > +
> > > +Maintainers
> > > +"""""""""""
> > > +
> > > +When accepting developers' patches, a maintainer may want to run a specific
> > > +test set. A maintainer CI pipeline has the following characteristics:
> > > +
> > > + * It consists of tests that are valuable for the subsystem;
> > > + * It is easy to run a set of specific tests available in the project;
> > > + * It is easy to add new tests or remove unneeded tests.
> >
> >
> > Neither of these describe why I use CI as a developer and/or subsys
> > maintainer.
> >
> > My desire with using CI is to (as close as possible) be able to
> > execute the exact same  set of tests that will be run by gating CI
> > on pull requests.
> 
> I totally understand your desire and I think it is valid.
> 
> What I'm trying with this proposal is the same strategy we used when
> we started planning for the gating CI. The decision was to start
> small. Today the CI grew and we don´t have a so called gating CI yet,
> but a bunch of jobs that runs on staging branch, some needing
> reevaluation whether they should run on staging or not.

Of course we have a gating CI today; it is the very thing you just
described. Whether the set of CI jobs that run on staging was designed
from the ground up or evolved organically is irrelevant. It is what
exists today and is used to test merges to master, so by definition
it is our gating CI. The set of jobs will never be perfect because
we're in a changing world, so they will always need periodic
re-evaluation to judge whether they're the right mix for our current
needs.

> > My goal is to minimize (ideally eliminate) the risk that a patch
> > series or pull request gets rejected with a need to re-spin due
> > to CI failures. Each such rejection causes a round trip delaying
> > merge, and this wastes my time & the maintainer/gate keepers' time.
> > It is also hard to debug failures if I can't replicate the gating
> > CI myself.
> 
> Again, I totally agree with you. That would be the perfect scenario.

Aside from the custom runners, it is the scenario that exists today
and is relied on by many people. That existing usage and starting 
point has to be acknowledged in any CI plan that is proposed.

> The barrier I see to have it working the way you described is the
> hardware access. The staging branch runs on two different custom
> runners. We have two possible solutions to accomplish the scenario you
> described: remove the custom runners from the staging branch and let
> the jobs run on the GitLab CI shared runners, which everyone with
> access to GitLab can use, or allow developers to access the custom
> runners.

It isn't that large a barrier IMHO. It will be slow, but people
can bring up custom runners for ppc/s390 using QEMU VMs if they lack
access to hardware. The most important thing is build coverage, and
that's already achieved via the cross compilers. The custom runners
essentially only add "make check" as a benefit.

> Today, I don´t think any of those options are feasible or bring value
> to the project. That is one of the reasons I'm not covering it now in
> the future plan. As I mentioned before, let's take another small step
> and organize a gating CI with some ground rules. When we reach it, the
> future future step can be to implement merge requests, think about
> reproducibility, and so on.

Being able to replicate gating CI jobs as a contributor is the most
critical starting point for any plan. Historically, diagnosing failures
in gating CI has been the biggest pain point in submitting code to QEMU,
and it is why I and others have spent so much time on the Travis, and now
GitLab, config to give us a well-defined environment and ruleset for
build jobs. That can't be ignored by any proposed CI plan.

Regards,
Daniel




* Re: [RFC 1/1] docs/deve/ci-plan: define a high-level plan for the QEMU GitLab CI
From: Willian Rampazzo @ 2021-09-15 15:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Thomas Huth, Alex Bennée, qemu-devel,
	Wainer dos Santos Moschetta, Cleber Rosa,
	Philippe Mathieu-Daudé

On Wed, Sep 15, 2021 at 11:07 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Sep 15, 2021 at 10:51:56AM -0300, Willian Rampazzo wrote:
> > On Wed, Sep 15, 2021 at 6:00 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 03:48:30PM -0300, Willian Rampazzo wrote:
> > > > [...]
> > > > +Developers
> > > > +""""""""""
> > > > +
> > > > +A developer working on a new feature or fixing an issue may want to run/propose
> > > > +a specific set of tests. Those tests may, eventually, benefit other developers.
> > > > +A developer CI pipeline has the following characteristics:
> > > > +
> > > > + * It is easy to run current tests available in the project;
> > > > + * It is easy to add new tests or remove unneeded tests;
> > > > + * It is flexible enough to allow changes in the current jobs.
> > > > +
> > > > +Maintainers
> > > > +"""""""""""
> > > > +
> > > > +When accepting developers' patches, a maintainer may want to run a specific
> > > > +test set. A maintainer CI pipeline has the following characteristics:
> > > > +
> > > > + * It consists of tests that are valuable for the subsystem;
> > > > + * It is easy to run a set of specific tests available in the project;
> > > > + * It is easy to add new tests or remove unneeded tests.
> > >
> > >
> > > Neither of these describe why I use CI as a developer and/or subsys
> > > maintainer.
> > >
> > > My desire with using CI is to (as close as possible) be able to
> > > execute the exact same  set of tests that will be run by gating CI
> > > on pull requests.
> >
> > I totally understand your desire and I think it is valid.
> >
> > What I'm trying with this proposal is the same strategy we used when
> > we started planning for the gating CI. The decision was to start
> > small. Today the CI grew and we don´t have a so called gating CI yet,
> > but a bunch of jobs that runs on staging branch, some needing
> > reevaluation whether they should run on staging or not.
>
> Of course we have a gating CI today, it is the very thing you just
> described. Whether or not the set of CI jobs that run on staging is
> designed ground up, or evolved organically is irrelevant. It is what
> exists today and is used to test merges to master, so by definition
> is is our gating CI.  The set of jobs will never be perfect because
> we're in a changing world, so they will always need re-evaluation
> periodically to judge whether they're the right mix for our current
> needs.

Okay, let me rephrase. Today the CI has grown, and we have an
opportunity to improve the gating CI to reduce the number of manual
interventions and make it fit the project better. For example,
during the release freeze window, or right before it, the gating CI
results were sometimes ignored because the pipeline took too long to
run. Another example is the set of flaky tests we have running
today. They should not be part of the gating CI.

>
> > > My goal is to minimize (ideally eliminate) the risk that a patch
> > > series or pull request gets rejected with a need to re-spin due
> > > to CI failures. Each such rejection causes a round trip delaying
> > > merge, and this wastes my time & the maintainer/gate keepers' time.
> > > It is also hard to debug failures if I can't replicate the gating
> > > CI myself.
> >
> > Again, I totally agree with you. That would be the perfect scenario.
>
> Aside from the custom runners, it is the scenario that exists today
> and is relied on by many people. That existing usage and starting
> point has to be acknowledged in any CI plan that is proposed.

If I understood correctly, we should first find a way to let
developers run the same jobs as the gating CI and then think about
other improvements, right? I can adjust the proposal to include that, no
problem. At least we have a plan.

>
> > The barrier I see to have it working the way you described is the
> > hardware access. The staging branch runs on two different custom
> > runners. We have two possible solutions to accomplish the scenario you
> > described: remove the custom runners from the staging branch and let
> > the jobs run on the GitLab CI shared runners, which everyone with
> > access to GitLab can use, or allow developers to access the custom
> > runners.
>
> It isn't that large of a barrier IMHO. It will be slow, but people
> can bring up custom runners for ppc/s390 using QEMU VMs if they lack
> access to hardware. The most important is the build coverage and
> that's already acquired via the cross compilers. The custom runners
> essentially only add "make check" as a benefit.

Okay, I can adjust the plan to list this too. My only concern is about
those developers who do not have access to a custom runner, but we
can discuss it during the implementation.

>
> > Today, I don´t think any of those options are feasible or bring value
> > to the project. That is one of the reasons I'm not covering it now in
> > the future plan. As I mentioned before, let's take another small step
> > and organize a gating CI with some ground rules. When we reach it, the
> > future future step can be to implement merge requests, think about
> > reproducibility, and so on.
>
> Being able to replicate gating CI jobs as a contributor is the most
> critical starting point for any plan. Historically diagnosing failures
> in gating CI has been the biggest pain point in submitting code to QEMU,
> and why myself and others have spent so much time on Travis, and now
> GitLab config to let us have a well defined environment and ruleset for
> build jobs. That can't be ignored by any proposed CI plan.

Alright, I can adjust the plan to add this too.

And just a side note: I never said the work done so far is not
valuable. I'm sure all the work done on the CI so far is valuable.
I feel that today we have reached a point where we need to talk about the
next steps. I personally find it difficult to contribute to the CI
because there are diverging ideas about what we should do next, so
having a high-level plan helps newcomers interested in contributing
to the CI.




