* [RFC] QEMU Gating CI
@ 2019-12-02 14:05 Cleber Rosa
  2019-12-02 17:00 ` Stefan Hajnoczi
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Cleber Rosa @ 2019-12-02 14:05 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Alex Bennée
  Cc: Ademar Reis, Jeff Nelson, Stefan Hajnoczi,
	Wainer dos Santos Moschetta, Markus Armbruster

RFC: QEMU Gating CI
===================

This RFC attempts to address most of the issues described in
"Requirements/GatingCI"[1].  Also relevant is the "State of QEMU CI as
we enter 4.0" write up[2].

The general approach is to minimize the infrastructure maintenance
and development burden by leveraging, as much as possible, "other
people's" infrastructure and code.  GitLab's CI/CD platform is the
most relevant component dealt with here.

Problem Statement
-----------------

The following is copied verbatim from Peter Maydell's write up[1]:

"A gating CI is a prerequisite to having a multi-maintainer model of
merging. By having a common set of tests that are run prior to a merge
you do not rely on who is currently doing merging duties having access
to the current set of test machines."

This is a very simplified view of the problem, which I'd like to break
down further into the following key points:

 * Common set of tests
 * Pre-merge ("prior to a merge")
 * Access to the current set of test machines
 * Multi-maintainer model

Common set of tests
~~~~~~~~~~~~~~~~~~~

Before we delve any further, let's make it clear that a "common set of
tests" is really a "dynamic common set of tests".  My point is that a
set of tests in QEMU may include or exclude different tests depending
on the environment.

The exact tests that will be executed may differ depending on the
environment, including:

 * Hardware
 * Operating system
 * Build configuration
 * Environment variables

In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
those "common set of tests":

 * check
 * check-tcg
 * check-softfloat
 * check-block
 * check-acceptance

While Peter mentions that most of his checks are limited to:

 * check
 * check-tcg

Our current inability to quickly identify a faulty test from test
execution results (especially in remote environments), and to act upon
it (say, quickly disable it on a given host platform), makes me believe
that it's fair to start a gating CI implementation that uses this
rather coarse granularity.

Another benefit is a close or even a 1:1 relationship between a common
test set and an entry in the CI configuration.  For instance, the
"check" common test set would map to a "make check" command in a
"script:" YAML entry.

To exemplify my point, if one specific test run as part of "check-tcg"
is found to be faulty on a specific job (say on a specific OS), the
entire "check-tcg" test set may be disabled as a CI-level maintenance
action.  Of course a follow-up action to deal with the specific test
is required, probably in the form of a Launchpad bug and patches
dealing with the issue, but without necessarily a CI-related angle to
it.

If/when test result presentation and control mechanisms evolve, we may
feel confident enough to go for finer granularity.  For instance, a
mechanism for disabling nothing but "tests/migration-test" in a given
environment would be possible and desirable from a CI management level.

Pre-merge
~~~~~~~~~

The natural way to have pre-merge CI jobs in GitLab is to send "Merge
Requests"[3] (abbreviated as "MR" from now on).  In most projects, an
MR comes from individual contributors, usually the authors of the
changes themselves.  It's my understanding that the current maintainer
model employed in QEMU will *not* change at this time, meaning that
code contributions and reviews will continue to happen on the mailing
list.  A maintainer then, having collected a number of patches, would
submit an MR either in addition to, or instead of, the pull requests
sent to the mailing list.

"Pipelines for Merged Results"[4] is a very important feature to
support the multi-maintainer model, and looks in practice, similar to
Peter's "staging" branch approach, with an "automatic refresh" of the
target branch.  It can give a maintainer extra confidence that a MR
will play nicely with the updated status of the target branch.  It's
my understanding that it should be the "key to the gates".  A minor
note is that conflicts are still possible in a multi-maintainer model
if there are more than one person doing the merges.

A worthy point is that the GitLab web UI is not the only way to create
a Merge Request; a rich set of APIs is also available[5].  This is
interesting for many reasons: for instance, some of Peter's
"apply-pullreq"[6] actions (such as the checks for bad UTF-8 or bogus
qemu-devel email addresses) could be performed earlier, as part of a
"send-mergereq"-like script, bringing conformance checks earlier in the
merge process, at the MR creation stage.

Note: It's possible to have CI job definitions that are specific to
MRs, allowing generic non-MR jobs to be kept in the default
configuration.  This can be used so that individual contributors
continue to leverage the "free" (shared) runners made available on
gitlab.com.
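
A minimal sketch of that distinction, assuming GitLab's "only" and
"except" job policies for merge request pipelines (job names and
commands are illustrative):

   # runs only on Merge Request pipelines, e.g. on private runners
   build-gating:
     only:
       - merge_requests
     script:
       - ./configure
       - make

   # runs on regular branch pushes, e.g. on gitlab.com shared runners
   build-shared:
     except:
       - merge_requests
     script:
       - ./configure
       - make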

Multi-maintainer model
~~~~~~~~~~~~~~~~~~~~~~

The previous section already introduced some of the proposed workflow
that can enable such a multi-maintainer model.  With a Gating CI
system, though, it will be natural to have a smaller "Mean time
between (CI) failures", simply because of the expected increased
number of systems and checks.  A lot of countermeasures have to be
employed to keep that MTBF in check.

For one, it's imperative that the maintainers of such systems and
jobs are clearly defined and readily reachable.  Either the existing
MAINTAINERS file or a more suitable variation of that data should be
defined before activating the *gating* rules.  This would allow
requests for attention to be routed to the responsible maintainer.

In case of unresponsive maintainers, or any other condition that
leaves one or more CI jobs failing for a previously established
amount of time, the job can be demoted with an "allow_failure"
configuration[7].  Once such a change is committed, the path back to
promotion would be just the same as for a newly added job
definition.
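
A minimal sketch of such a demotion (the job name is hypothetical):

   test-check-example:
     allow_failure: true   # demoted: failures no longer block the pipeline
     script:
       - make check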

Note: In a future phase we can evaluate the creation of rules that
look at the paths changed in an MR (similar to "F:" entries in
MAINTAINERS) and trigger the specific CI jobs which are the
responsibility of a given maintainer[8].
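
A minimal sketch of such a rule, using GitLab's "only: changes:"
support for merge request pipelines[8] (job name and paths are
illustrative):

   test-block-example:
     only:
       refs:
         - merge_requests
       changes:
         - block/*
         - tests/qemu-iotests/*
     script:
       - make check-block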

Access to the current set of test machines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When compared to the various CI systems and services already being
employed in the QEMU project, this is the most striking difference in
the proposed model.  Instead of relying on shared/public/free
resources, this proposal also deals with privately owned and
operated machines.

Even though the QEMU project operates with great cooperation, it's
crucial to define clear boundaries when it comes to machine access.
Restricted access to machines is important because:

 * The results of jobs are often directly tied to the setup and
   state of the machines.  Even "soft" changes such as removing or
   updating packages can introduce failures in jobs (this is greatly
   minimized, but not completely eliminated, when using containers or
   VMs).  Updating firmware or changing its settings are also examples
   of changes that may alter the outcome of jobs.

 * If maintainers are to be held accountable for the status of the jobs
   defined to run on specific machines, they must be able to trust the
   state of those machines.

 * Machines need regular monitoring and will receive required
   maintenance actions, which can cause job regressions.

Thus, there needs to be one clear way for machines to be *used* for
running jobs sent by different maintainers, while still prohibiting
any other privileged action that can cause permanent change to the
machine.  The GitLab agent (gitlab-runner) is designed to do just
that, and defining what will be executed in a job (on a given system)
should be all that's generally allowed.  The job definition itself
will, of course, be subject to code review before a maintainer decides
to send an MR containing such new or updated job definitions.

Still related to machine maintenance, it's highly desirable for jobs
tied to specific host machines to be introduced alongside
documentation and/or scripts that can replicate the machine setup.  If
the machine setup steps can be easily and reliably reproduced, then:

 * Other people may help to debug issues and regressions if they
   happen to have the same hardware available

 * Other people may provide more machines to run the same types of
   jobs

 * If a machine maintainer goes MIA, it'd be easier to find another
   maintainer

GitLab Jobs and Pipelines
-------------------------

GitLab CI is built around two major concepts: jobs and pipelines.  The
current GitLab CI configuration in QEMU uses jobs only (or, putting it
another way, all jobs in a single pipeline stage).  Consider the
following job definition[9]:

   build-tci:
    script:
    - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
    - ./configure --enable-tcg-interpreter
         --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
    - make -j2
    - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
    - for tg in $TARGETS ; do
        export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
        ./tests/boot-serial-test || exit 1 ;
        ./tests/cdrom-test || exit 1 ;
      done
    - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
    - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow

All the lines under "script" are executed sequentially.  It should be
clear that there's the possibility of breaking this down into multiple
stages, so that a build happens first, and then the "common sets of
tests" run in parallel.  Using the example above, it would look
something like:

   +---------------+------------------------+
   |  BUILD STAGE  |        TEST STAGE      |
   +---------------+------------------------+
   |   +-------+   |  +------------------+  |
   |   | build |   |  | boot-serial-test |  |
   |   +-------+   |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | cdrom-test       |  |
   |               |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | x86_64-pxe-test  |  |
   |               |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | s390x-pxe-test   |  |
   |               |  +------------------+  |
   |               |                        |
   +---------------+------------------------+
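
As a rough sketch of that split, reduced here to a single target and a
single test for brevity (the job names are illustrative):

   stages:
     - build
     - test

   build:
     stage: build
     script:
       - ./configure --enable-tcg-interpreter --target-list=x86_64-softmmu
       - make -j2
       - make tests/boot-serial-test

   boot-serial-test:
     stage: test
     # note: the build output must be shared between stages, which is
     # discussed in the "Current limitations" section below
     script:
       - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/boot-serial-test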

Of course it would be silly to break down that job into smaller jobs that
would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
the pipeline approach is valid because:

 * Common set of tests would run in parallel, giving a quicker result
   turnaround

 * It's easier to determine the possible nature of the problem from
   just the basic CI job status

 * Different maintainers could be defined for different "common sets
   of tests", and, again by leveraging the basic CI job status,
   automation for directed notifications can be implemented

In the following example, "check-block" maintainers could be left
undisturbed by failures in the "check-acceptance" job:

   +---------------+------------------------+
   |  BUILD STAGE  |        TEST STAGE      |
   +---------------+------------------------+
   |   +-------+   |  +------------------+  |
   |   | build |   |  | check-block      |  |
   |   +-------+   |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | check-acceptance |  |
   |               |  +------------------+  |
   |               |                        |
   +---------------+------------------------+

The same logic applies to test sets for different targets.  For
instance, combining the two previous examples, there could be different
maintainers defined for the different jobs in the test stage:

   +---------------+------------------------+
   |  BUILD STAGE  |        TEST STAGE      |
   +---------------+------------------------+
   |   +-------+   |  +------------------+  |
   |   | build |   |  | x86_64-block     |  |
   |   +-------+   |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | x86_64-acceptance|  |
   |               |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | s390x-block      |  |
   |               |  +------------------+  |
   |               |                        |
   |               |  +------------------+  |
   |               |  | s390x-acceptance |  |
   |               |  +------------------+  |
   +---------------+------------------------+

Current limitations for a multi-stage pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because it's assumed that each job will happen in an isolated and
independent execution environment, jobs must explicitly define the
resources that will be shared between stages.  GitLab will
automatically make the same source code revision available to all
jobs.  Additionally, GitLab supports the concept of artifacts: by
defining artifacts in the "build" stage, jobs in the "test" stage can
expect to have a copy of those artifacts automatically.

In theory, there's nothing that prevents an entire QEMU build
directory from being treated as an artifact.  In practice, there are
predefined limits on GitLab that prevent that from being possible,
resulting in errors such as:

   Uploading artifacts...
   build: found 3164 matching files                   
   ERROR: Uploading artifacts to coordinator... too large archive
          id=xxxxxxx responseStatus=413 Request Entity Too Large
          status=413 Request Entity Too Large token=yyyyyyyyy
   FATAL: too large                                   
   ERROR: Job failed: exit code 1

As far as I can tell, this is an instance-defined limit that's clearly
influenced by storage costs.  I see a few possible solutions to this
limitation:

 1) Provide our own "artifact" like solution that uses our own storage
    solution

 2) Reduce or eliminate the dependency on a complete build tree

The first solution can go against the general trend of not having to
maintain CI infrastructure.  It could be made simpler by using cloud
storage, but there would still be some interaction with another
external infrastructure component.

I find the second solution preferable, given that most tests depend
on having one or a few binaries available.  I've run multi-stage
pipelines with some of those binaries (qemu-img,
$target-softmmu/qemu-system-$target) defined as artifacts and they
behaved as expected.  However, this could require some intrusive
changes to the current "make"-based test invocation.
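
A sketch of what defining just a few binaries as artifacts could look
like (the paths assume an in-tree build of a single target, matching
the examples above):

   build:
     stage: build
     script:
       - ./configure --target-list=x86_64-softmmu
       - make -j2
     artifacts:
       paths:
         - qemu-img
         - x86_64-softmmu/qemu-system-x86_64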

Job Naming convention
---------------------

Based only on the very simple example jobs above, it should already be
clear that there's a lot of potential for confusion and chaos.  For
instance, by looking at the "build" job definition or results, it's
very hard to tell what it's really about.  A bit more can be inferred
from the "x86_64-block" job name.

Still, the problem we have to address here is not only the amount of
information easily obtained from a job name, but also allowing for
very similar job definitions within a global namespace.  For instance,
if we add an Operating System component to the mix, we need an extra
qualifier for unique job names.

Some of the possible components in a job definition are:

 * Stage
 * Build profile
 * Test set (a shorter name for what was described in the "Common set
   of tests" section)
 * Host architecture
 * Target architecture
 * Host Operating System identification (name and version)
 * Execution mode/environment (bare metal, container, VM, etc)

Stage
~~~~~

The stage of a job (which maps roughly to its purpose) should be
clearly defined.  A job that builds QEMU should start with "build" and
a job that tests QEMU should start with "test".

IMO, in a second phase, once multi-stage pipelines are taken for
granted, we could evaluate dropping this component altogether from the
naming convention, and relying purely on the stage classification.

Build profile
~~~~~~~~~~~~~

Different build profiles already abound in QEMU's various CI
configuration files.  It's hard to put a naming convention here,
except that it should represent the most distinguishing
characteristics of the build configuration.  For instance, the current
".gitlab-ci.yml" file has a "build-disabled" job that is aptly named,
as it forcefully disables a lot of build options.

Test set
~~~~~~~~

As mentioned in the "Common set of tests" section, I believe that the
make target name can be used to identify the test set that will be
executed in a job.  That is, if a job is to be run at the "test"
stage, and will run "make check", its name should start with
"test-check".

QEMU Targets
~~~~~~~~~~~~

Because a given job could, and usually does, involve multiple targets,
I honestly cannot think of how to add this to the naming convention.
I'll ignore it for now, and consider the targets to be defined in the
build profile.

Host Architecture
~~~~~~~~~~~~~~~~~

The host architecture naming convention should be an easy pick, given
that QEMU itself employs an architecture naming convention for its
targets.

Host OS
~~~~~~~

The suggestion I have for the host OS name is to follow the
libosinfo[10] convention as closely as possible.  libosinfo's "Short
ID" should be well suited here.  Examples include: "openbsd4.2",
"opensuse42.3", "rhel8.0", "ubuntu9.10" and "win2k12r2".

Execution Environment
~~~~~~~~~~~~~~~~~~~~~

Distinguishing between running tests on bare metal versus in a nested
VM environment is quite significant to a number of people.

Still, I think it could probably be optional for the initial
implementation phase, like the naming convention for the QEMU Targets.

Example 1
~~~~~~~~~

Defining a job that will build QEMU with common debug options, on
a RHEL 8.0 system on an x86_64 host:

   build-debug-rhel8.0-x86_64:
     script:
       - ./configure --enable-debug
       - make

Example 2
~~~~~~~~~

Defining a job that will run the "qtest" test set on a NetBSD 8.1
system on an aarch64 host:

   test-qtest-netbsd8.1-aarch64:
     script:
       - make check-qtest

Job and Machine Scheduling
--------------------------

While the naming convention gives some information to human beings,
and hopefully allows for some order and avoids collisions in the
global job namespace, it's not enough to define where those jobs
should run.

Tags[11] are the mechanism available to tie jobs to specific machines
running the GitLab CI agent, "gitlab-runner".  Unfortunately, some
duplication seems unavoidable, in the sense that some of the naming
components listed above are machine specific, and will then also need
to be given as tags.

Note: it may be a good idea to be extra verbose with tags, by having a
qualifier prefix.  The justification is that tags also live in a
global namespace and, in theory, at a given point, tags of different
"categories", say a CPU name and an Operating System name, may
collide.  Or, it may just be me being paranoid.
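
A hypothetical sketch of such prefixed tags (the "os-" and "arch-"
prefixes are not an existing convention, just an illustration;
Examples 1 and 2 below use plain tags):

   test-qtest-netbsd8.1-aarch64:
     tags:
       - os-netbsd8.1
       - arch-aarch64
     script:
       - make check-qtest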

Example 1
~~~~~~~~~

   build-debug-rhel8.0-x86_64:
     tags:
       - rhel8.0
       - x86_64
     script:
       - ./configure --enable-debug
       - make

Example 2
~~~~~~~~~

   test-qtest-netbsd8.1-aarch64:
     tags:
       - netbsd8.1
       - aarch64
     script:
       - make check-qtest

Operating System definition versus Container Images
---------------------------------------------------

In the previous section and examples, we're assuming that tests will
run on machines that have registered "gitlab-runner" agents with
matching tags.  The tags given at gitlab-runner registration time
would of course match the same naming convention defined earlier.

So, if one is registering a "gitlab-runner" instance on an x86_64
machine running RHEL 8.0, the tags "rhel8.0" and "x86_64" would be
given (possibly among others).

Nevertheless, most deployment scenarios will probably rely on jobs
being executed by gitlab-runner's container executor (currently
Docker-only).  This means that tags given to a job *may* drop the tag
associated with the host operating system selection, and instead
provide the ".gitlab-ci.yml" configuration directive that determines
the container image to be used.

Most jobs would probably *not* require matching host operating
system and container images, but there should still be the capability
to make that a requirement.  For instance, jobs containing tests that
require the KVM accelerator in specific scenarios may require a
matching host Operating System.

Note: What was mentioned in the "Execution Environment" section under
the naming conventions is also closely related to this requirement;
that is, one may require a job to run under a container, a VM, or on
bare metal.

Example 1
~~~~~~~~~

Build QEMU on a "rhel8.0" image hosted under the "qemuci" organization
and require the runner to support container execution:

   build-debug-rhel8.0-x86_64:
     tags:
       - x86_64
       - container
     image: qemuci/rhel8.0
     script:
       - ./configure --enable-debug
       - make

Example 2
~~~~~~~~~

Run "make check" on a "rhel8.0" image hosted under the "qemuci"
organization, and require the runner to support container execution
and to be on a matching host:

   test-check-rhel8.0-x86_64:
     tags:
       - x86_64
       - rhel8.0
       - container
     image: qemuci/rhel8.0
     script:
       - make check

Next
----

Because this document is already too long, and that can be
distracting, I decided to defer many other implementation-level
details to a second RFC, alongside some code.

Some complementary topics that I have prepared include:

 * Container images creation, hosting and management
 * Advanced pipeline definitions
   - Job dependencies
   - Artifacts
   - Results
 * GitLab CI for Individual Contributors
 * GitLab runner:
   - Official and Custom Binaries
   - Executors
   - Security implications
   - Helper container images for unsupported architectures
 * Checklists for:
   - Preparing and documenting machine setup
   - Proposing new runners and jobs
   - Runners and jobs promotions and demotions

Of course any other topics that spur from this discussion will also
be added to the follow-up threads.

References:
-----------
 [1] https://wiki.qemu.org/Requirements/GatingCI
 [2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg04909.html
 [3] https://docs.gitlab.com/ee/gitlab-basics/add-merge-request.html
 [4] https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/index.html
 [5] https://docs.gitlab.com/ee/api/merge_requests.html#create-mr-pipeline
 [6] https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree/apply-pullreq
 [7] https://docs.gitlab.com/ee/ci/yaml/README.html#allow_failure
 [8] https://docs.gitlab.com/ee/ci/yaml/README.html#using-onlychanges-with-pipelines-for-merge-requests
 [9] https://github.com/qemu/qemu/blob/fb2246882a2c8d7f084ebe0617e97ac78467d156/.gitlab-ci.yml#L70 
 [10] https://libosinfo.org/
 [11] https://docs.gitlab.com/ee/ci/runners/README.html#using-tags




* Re: [RFC] QEMU Gating CI
  2019-12-02 14:05 [RFC] QEMU Gating CI Cleber Rosa
@ 2019-12-02 17:00 ` Stefan Hajnoczi
  2019-12-02 17:08   ` Peter Maydell
  2019-12-02 18:12   ` Cleber Rosa
  2019-12-03 14:07 ` Alex Bennée
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-02 17:00 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Peter Maydell, qemu-devel, Wainer dos Santos Moschetta,
	Markus Armbruster, Jeff Nelson, Alex Bennée, Ademar Reis

On Mon, Dec 02, 2019 at 09:05:52AM -0500, Cleber Rosa wrote:
> RFC: QEMU Gating CI
> ===================

Excellent, thank you for your work on this!

> 
> This RFC attempts to address most of the issues described in
> "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> QEMU CI as we enter 4.0"[2].
> 
> The general approach is one to minimize the infrastructure maintenance
> and development burden, leveraging as much as possible "other people's"
> infrastructure and code.  GitLab's CI/CD platform is the most relevant
> component dealt with here.
> 
> Problem Statement
> -----------------
> 
> The following is copied verbatim from Peter Maydell's write up[1]:
> 
> "A gating CI is a prerequisite to having a multi-maintainer model of
> merging. By having a common set of tests that are run prior to a merge
> you do not rely on who is currently doing merging duties having access
> to the current set of test machines."
> 
> This is of a very simplified view of the problem that I'd like to break
> down even further into the following key points:
> 
>  * Common set of tests
>  * Pre-merge ("prior to a merge")
>  * Access to the current set of test machines
>  * Multi-maintainer model
> 
> Common set of tests
> ~~~~~~~~~~~~~~~~~~~
> 
> Before we delve any further, let's make it clear that a "common set of
> tests" is really a "dynamic common set of tests".  My point is that a
> set of tests in QEMU may include or exclude different tests depending
> on the environment.
> 
> The exact tests that will be executed may differ depending on the
> environment, including:
> 
>  * Hardware
>  * Operating system
>  * Build configuration
>  * Environment variables
> 
> In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> those "common set of tests":
> 
>  * check
>  * check-tcg
>  * check-softfloat
>  * check-block
>  * check-acceptance
> 
> While Peter mentions that most of his checks are limited to:
> 
>  * check
>  * check-tcg
> 
> Our current inability to quickly identify a faulty test from test
> execution results (and specially in remote environments), and act upon
> it (say quickly disable it on a given host platform), makes me believe
> that it's fair to start a gating CI implementation that uses this
> rather coarse granularity.
> 
> Another benefit is a close or even a 1:1 relationship between a common
> test set and an entry in the CI configuration.  For instance, the
> "check" common test set would map to a "make check" command in a
> "script:" YAML entry.
> 
> To exemplify my point, if one specific test run as part of "check-tcg"
> is found to be faulty on a specific job (say on a specific OS), the
> entire "check-tcg" test set may be disabled as a CI-level maintenance
> action.  Of course a follow up action to deal with the specific test
> is required, probably in the form of a Launchpad bug and patches
> dealing with the issue, but without necessarily a CI related angle to
> it.

I think this coarse level of granularity is unrealistic.  We cannot
disable 99 tests because of 1 known failure.  There must be a way of
disabling individual tests.  You don't need to implement it yourself,
but I think this needs to be solved by someone before a gating CI can be
put into use.

It probably involves adding a "make EXCLUDE_TESTS=foo,bar check"
variable so that .gitlab-ci.yml can be modified to exclude specific
tests on certain OSes.

> 
> If/when test result presentation and control mechanism evolve, we may
> feel confident and go into finer grained granularity.  For instance, a
> mechanism for disabling nothing but "tests/migration-test" on a given
> environment would be possible and desirable from a CI management level.
> 
> Pre-merge
> ~~~~~~~~~
> 
> The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> MR comes from individual contributors, usually the authors of the
> changes themselves.  It's my understanding that the current maintainer
> model employed in QEMU will *not* change at this time, meaning that
> code contributions and reviews will continue to happen on the mailing
> list.  A maintainer then, having collected a number of patches, would
> submit a MR either in addition or in substitution to the Pull Requests
> sent to the mailing list.
> 
> "Pipelines for Merged Results"[4] is a very important feature to
> support the multi-maintainer model, and looks in practice, similar to
> Peter's "staging" branch approach, with an "automatic refresh" of the
> target branch.  It can give a maintainer extra confidence that a MR
> will play nicely with the updated status of the target branch.  It's
> my understanding that it should be the "key to the gates".  A minor
> note is that conflicts are still possible in a multi-maintainer model
> if there are more than one person doing the merges.

The intention is to have only 1 active maintainer at a time.  The
maintainer will handle all merges for the current QEMU release and then
hand over to the next maintainer after the release has been made.

Solving the problem for multiple active maintainers is low priority at
the moment.

> A worthy point is that the GitLab web UI is not the only way to create
> a Merge Request, but a rich set of APIs are available[5].  This is
> interesting for many reasons, and maybe some of Peter's
> "apply-pullreq"[6] actions (such as bad UTF8 or bogus qemu-devel email
> addresses checks could be made earlier) as part of a
> "send-mergereq"-like script, bringing conformance earlier on the merge
> process, at the MR creation stage.
> 
> Note: It's possible to have CI jobs definition that are specific to
> MR, allowing generic non-MR jobs to be kept on the default
> configuration.  This can be used so individual contributors continue
> to leverage some of the "free" (shared) runner made available on
> gitlab.com.

I expected this section to say:
1. Maintainer sets up a personal gitlab.com account with a qemu.git fork.
2. Maintainer adds QEMU's CI tokens to their personal account.
3. Each time a maintainer pushes to their "staging" branch the CI
   triggers.

IMO this model is simpler than MRs because once it has been set up the
maintainer just uses git push.  Why are MRs necessary?

> Multi-maintainer model
> ~~~~~~~~~~~~~~~~~~~~~~
> 
> The previous section already introduced some of the proposed workflow
> that can enable such a multi-maintainer model.  With a Gating CI
> system, though, it will be natural to have a smaller "Mean time
> between (CI) failures", simply because of the expected increased
> number of systems and checks.  A lot of countermeasures have to be
> employed to keep that MTBF in check.

I expect the CI to be in a state of partial failure all the time.
Previously the idea of Tier 1 and Tier 2 platforms was raised where Tier
2 platforms can be failing without gating the CI.  I think this is
reality for us.  Niche host OSes and features fail and remain in the
failing state for days/weeks.  The CI should be designed to run in this
mode all the time.

> For once, it's imperative that the maintainers for such systems and
> jobs are clearly defined and readily accessible.  Either the same
> MAINTAINERS file or a more suitable variation of such data should be
> defined before activating the *gating* rules.  This would allow a
> routing to request the attention of the maintainer responsible.
> 
> In case of unresposive maintainers, or any other condition that
> renders and keeps one or more CI jobs failing for a given previously
> established amount of time, the job can be demoted with an
> "allow_failure" configuration[7].  Once such a change is commited, the
> path to promotion would be just the same as in a newly added job
> definition.
> 
> Note: In a future phase we can evaluate the creation of rules that
> look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> and the execution of specific CI jobs, which would be the
> responsibility of a given maintainer[8].
> 
> Access to the current set of test machines
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> When compared to the various CI systems and services already being
> employed in the QEMU project, this is the most striking difference in
> the proposed model.  Instead of relying on shared/public/free
> resources, this proposal also deals with privately owned and
> operated machines.
> 
> Even though the QEMU project operates with great cooperation, it's
> crucial to define clear boundaries when it comes to machine access.
> Restricted access to machines is important because:
> 
>  * The results of jobs are many times directly tied to the setup and
>    status of machines.  Even "soft" changes such as removing or updating
>    packages can introduce failures in jobs (this is greatly minimized
>    but not completely eliminated when using containers or VMs).
>    Updating firmware or changing its settings are also examples of
>    changes that may change the outcome of jobs.
> 
>  * If maintainers will be accounted for the status of the jobs defined
>    to run on specific machines, they must be sure of the status of the
>    machines.
> 
>  * Machines need regular monitoring and will receive required
>    maintainance actions which can cause job regressions.
> 
> Thus, there needs to be one clear way for machines to be *used* for
> running jobs sent by different maintainers, while still prohibiting
> any other privileged action that can cause permanent change to the
> machine.  The GitLab agent (gitlab-runner) is designed to do just
> that, and defining what will be excuted in a job (in a given system)
> should be all that's generally allowed.  The job definition itself,
> will of course be subject to code review before a maintainer decides
> to send a MR containing such new or updated job definitions.
> 
> Still related to machine maintanance, it's highly desirable for jobs
> tied to specific host machines to be introduced alongside with
> documentation and/or scripts that can replicate the machine setup.  If
> the machine setup steps can be easily and reliably reproduced, then:
> 
>  * Other people may help to debug issues and regressions if they
>    happen to have the same hardware available
> 
>  * Other people may provide more machines to run the same types of
>    jobs
> 
>  * If a machine maintainer goes MIA, it'd be easier to find another
>    maintainer

qemu.git has tests/vm for Ubuntu (i386), FreeBSD, NetBSD, OpenBSD,
CentOS, Fedora and tests/docker for Debian cross-compilation.  These are
a good starting point for automated/reproducible environments for
running builds and tests.  It would be great to integrate with
gitlab-runner.

> 
> GitLab Jobs and Pipelines
> -------------------------
> 
> GitLab CI is built around two major concepts: jobs and pipelines.  The
> current GitLab CI configuration in QEMU uses jobs only (or putting it
> another way, all jobs in a single pipeline stage).  Consider the
> folowing job definition[9]:
> 
>    build-tci:
>     script:
>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>     - ./configure --enable-tcg-interpreter
>          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>     - make -j2
>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>     - for tg in $TARGETS ; do
>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>         ./tests/boot-serial-test || exit 1 ;
>         ./tests/cdrom-test || exit 1 ;
>       done
>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
> 
> All the lines under "script" are performed sequentially.  It should be
> clear that there's the possibility of breaking this down into multiple
> stages, so that a build happens first, and then "common set of tests"
> run in parallel.  Using the example above, it would look something
> like:
> 
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | boot-serial-test |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | cdrom-test       |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | x86_64-pxe-test  |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-pxe-test   |  |
>    |               |  +------------------+  |
>    |               |                        |
>    +---------------+------------------------+
> 
> Of course it would be silly to break down that job into smaller jobs that
> would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
> the pipeline approach is valid because:
> 
>  * Common set of tests would run in parallel, giving a quicker result
>    turnaround
> 
>  * It's easier to determine to possible nature of the problem with
>    just the basic CI job status
> 
>  * Different maintainers could be defined for different "common set of
>    tests", and again by leveraging the basic CI job status, automation
>    for directed notification can be implemented
> 
> In the following example, "check-block" maintainers could be left
> undisturbed with failures in the "check-acceptance" job:
> 
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | check-block      |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | check-acceptance |  |
>    |               |  +------------------+  |
>    |               |                        |
>    +---------------+------------------------+
> 
> The same logic applies for test sets for different targets.  For
> instance, combining the two previous examples, there could different
> maintainers defined for the different jobs on the test stage:
> 
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | x86_64-block     |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | x86_64-acceptance|  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-block      |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-acceptance |  |
>    |               |  +------------------+  |
>    +---------------+------------------------+
> 
> Current limitations for a multi-stage pipeline
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Because it's assumed that each job will happen in an isolated and
> independent execution environment, jobs must explicitly define the
> resources that will be shared between stages.  GitLab will make sure
> the same source code revision will be available on all jobs
> automatically.  Additionaly, GitLab supports the concept of artifacts.
> By defining artifacts in the "build" stage, jobs in the "test" stage
> can expect to have a copy of those artifacts automatically.
> 
> In theory, there's nothing that prevents an entire QEMU build
> directory, to be treated as an artifact.  In practice, there are
> predefined limits on GitLab that prevents that from being possible,
> resulting in errors such as:
> 
>    Uploading artifacts...
>    build: found 3164 matching files                   
>    ERROR: Uploading artifacts to coordinator... too large archive
>           id=xxxxxxx responseStatus=413 Request Entity Too Large
>           status=413 Request Entity Too Large token=yyyyyyyyy
>    FATAL: too large                                   
>    ERROR: Job failed: exit code 1
> 
> As far as I can tell, this is an instance define limit that's clearly
> influenced by storage costs.  I see a few possible solutions to this
> limitation:
> 
>  1) Provide our own "artifact" like solution that uses our own storage
>     solution
> 
>  2) Reduce or eliminate the dependency on a complete build tree
> 
> The first solution can go against the general trend of not having to
> maintain CI infrastructure.  It could be made simpler by using cloud
> storage, but there would still be some interaction with another
> external infrastructure component.
> 
> I find the second solution preferrable, given that most tests depend
> on having one or a few binaries available.  I've run multi-stage
> pipelines with some of those binaries (qemu-img,
> $target-softmmu/qemu-system-$target) defined as artifcats and they
> behaved as expected.  But, this could require some intrusive changes
> to the current "make"-based test invocation.

I agree.  It should be possible to bring the necessary artifacts down to
below the limit.  This wasn't a problem for the virtio-fs GitLab CI
scripts I wrote that build a Linux kernel, QEMU, and a guest image, so I
think it will be possible for QEMU as a whole:
https://gitlab.com/virtio-fs/virtio-fs-ci/



* Re: [RFC] QEMU Gating CI
  2019-12-02 17:00 ` Stefan Hajnoczi
@ 2019-12-02 17:08   ` Peter Maydell
  2019-12-02 18:28     ` Cleber Rosa
  2019-12-02 18:12   ` Cleber Rosa
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2019-12-02 17:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Markus Armbruster, Wainer dos Santos Moschetta, QEMU Developers,
	Jeff Nelson, Cleber Rosa, Alex Bennée, Ademar Reis

On Mon, 2 Dec 2019 at 17:00, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Dec 02, 2019 at 09:05:52AM -0500, Cleber Rosa wrote:
> > To exemplify my point, if one specific test run as part of "check-tcg"
> > is found to be faulty on a specific job (say on a specific OS), the
> > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > action.  Of course a follow up action to deal with the specific test
> > is required, probably in the form of a Launchpad bug and patches
> > dealing with the issue, but without necessarily a CI related angle to
> > it.
>
> I think this coarse level of granularity is unrealistic.  We cannot
> disable 99 tests because of 1 known failure.  There must be a way of
> disabling individual tests.  You don't need to implement it yourself,
> but I think this needs to be solved by someone before a gating CI can be
> put into use.
>
> It probably involves adding a "make EXCLUDE_TESTS=foo,bar check"
> variable so that .gitlab-ci.yml can be modified to exclude specific
> tests on certain OSes.

We don't have this at the moment, so I'm not sure we need to
add it as part of moving to doing merge testing via gitlab ?
The current process is "if the pullreq causes a test to fail
then the pullreq needs to be changed, perhaps by adding a
patch which disables the test on a particular platform if
necessary". Making that smoother might be nice, but I would
be a little wary about adding requirements to the move-to-gitlab
that don't absolutely need to be there.

thanks
-- PMM



* Re: [RFC] QEMU Gating CI
  2019-12-02 17:00 ` Stefan Hajnoczi
  2019-12-02 17:08   ` Peter Maydell
@ 2019-12-02 18:12   ` Cleber Rosa
  2019-12-03 14:14     ` Stefan Hajnoczi
  1 sibling, 1 reply; 22+ messages in thread
From: Cleber Rosa @ 2019-12-02 18:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, qemu-devel, Wainer dos Santos Moschetta,
	Markus Armbruster, Jeff Nelson, Alex Bennée, Ademar Reis

On Mon, Dec 02, 2019 at 05:00:18PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 02, 2019 at 09:05:52AM -0500, Cleber Rosa wrote:
> > RFC: QEMU Gating CI
> > ===================
> 
> Excellent, thank you for your work on this!
> 
> > 
> > This RFC attempts to address most of the issues described in
> > "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> > QEMU CI as we enter 4.0"[2].
> > 
> > The general approach is one to minimize the infrastructure maintenance
> > and development burden, leveraging as much as possible "other people's"
> > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > component dealt with here.
> > 
> > Problem Statement
> > -----------------
> > 
> > The following is copied verbatim from Peter Maydell's write up[1]:
> > 
> > "A gating CI is a prerequisite to having a multi-maintainer model of
> > merging. By having a common set of tests that are run prior to a merge
> > you do not rely on who is currently doing merging duties having access
> > to the current set of test machines."
> > 
> > This is of a very simplified view of the problem that I'd like to break
> > down even further into the following key points:
> > 
> >  * Common set of tests
> >  * Pre-merge ("prior to a merge")
> >  * Access to the current set of test machines
> >  * Multi-maintainer model
> > 
> > Common set of tests
> > ~~~~~~~~~~~~~~~~~~~
> > 
> > Before we delve any further, let's make it clear that a "common set of
> > tests" is really a "dynamic common set of tests".  My point is that a
> > set of tests in QEMU may include or exclude different tests depending
> > on the environment.
> > 
> > The exact tests that will be executed may differ depending on the
> > environment, including:
> > 
> >  * Hardware
> >  * Operating system
> >  * Build configuration
> >  * Environment variables
> > 
> > In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> > those "common set of tests":
> > 
> >  * check
> >  * check-tcg
> >  * check-softfloat
> >  * check-block
> >  * check-acceptance
> > 
> > While Peter mentions that most of his checks are limited to:
> > 
> >  * check
> >  * check-tcg
> > 
> > Our current inability to quickly identify a faulty test from test
> > execution results (and specially in remote environments), and act upon
> > it (say quickly disable it on a given host platform), makes me believe
> > that it's fair to start a gating CI implementation that uses this
> > rather coarse granularity.
> > 
> > Another benefit is a close or even a 1:1 relationship between a common
> > test set and an entry in the CI configuration.  For instance, the
> > "check" common test set would map to a "make check" command in a
> > "script:" YAML entry.
> > 
> > To exemplify my point, if one specific test run as part of "check-tcg"
> > is found to be faulty on a specific job (say on a specific OS), the
> > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > action.  Of course a follow up action to deal with the specific test
> > is required, probably in the form of a Launchpad bug and patches
> > dealing with the issue, but without necessarily a CI related angle to
> > it.
> 
> I think this coarse level of granularity is unrealistic.  We cannot
> disable 99 tests because of 1 known failure.  There must be a way of
> disabling individual tests.  You don't need to implement it yourself,
> but I think this needs to be solved by someone before a gating CI can be
> put into use.
>

IMO it should be realistic if you look at it from a "CI-related
angle".  The pull request could still be revised to disable a single
test because of a known failure, but this would not necessarily be
related to the CI.

> It probably involves adding a "make EXCLUDE_TESTS=foo,bar check"
> variable so that .gitlab-ci.yml can be modified to exclude specific
> tests on certain OSes.
>

I certainly acknowledge the issue, but I don't think this (and many
other issues that will certainly come up) should be a blocker to the
transition to GitLab.

> > 
> > If/when test result presentation and control mechanism evolve, we may
> > feel confident and go into finer grained granularity.  For instance, a
> > mechanism for disabling nothing but "tests/migration-test" on a given
> > environment would be possible and desirable from a CI management level.
> > 
> > Pre-merge
> > ~~~~~~~~~
> > 
> > The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> > Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> > MR comes from individual contributors, usually the authors of the
> > changes themselves.  It's my understanding that the current maintainer
> > model employed in QEMU will *not* change at this time, meaning that
> > code contributions and reviews will continue to happen on the mailing
> > list.  A maintainer then, having collected a number of patches, would
> > submit a MR either in addition or in substitution to the Pull Requests
> > sent to the mailing list.
> > 
> > "Pipelines for Merged Results"[4] is a very important feature to
> > support the multi-maintainer model, and looks in practice, similar to
> > Peter's "staging" branch approach, with an "automatic refresh" of the
> > target branch.  It can give a maintainer extra confidence that a MR
> > will play nicely with the updated status of the target branch.  It's
> > my understanding that it should be the "key to the gates".  A minor
> > note is that conflicts are still possible in a multi-maintainer model
> > if there are more than one person doing the merges.
> 
> The intention is to have only 1 active maintainer at a time.  The
> maintainer will handle all merges for the current QEMU release and then
> hand over to the next maintainer after the release has been made.
> 
> Solving the problem for multiple active maintainers is low priority at
> the moment.
>

Even so, I have the impression that the following workflow:

 - Look at Merge Results Pipeline for MR#1
 - Merge MR #1
 - Hack on something else
 - Look at *automatically updated* Merge Results Pipeline for MR#2
 - Merge MR #2

Is better than:

 - Push PR #1 to staging
 - Wait for PR #1 Pipeline to finish
 - Look at PR #1 Pipeline results
 - Push staging into master
 - Push PR #2 to staging 
 - Wait for PR #2 Pipeline to finish
 - Push staging into master

But I don't think I'll be a direct user of those workflows, so I'm
completely open to feedback on it.

> > A worthy point is that the GitLab web UI is not the only way to create
> > a Merge Request, but a rich set of APIs are available[5].  This is
> > interesting for many reasons, and maybe some of Peter's
> > "apply-pullreq"[6] actions (such as bad UTF8 or bogus qemu-devel email
> > addresses checks could be made earlier) as part of a
> > "send-mergereq"-like script, bringing conformance earlier on the merge
> > process, at the MR creation stage.
> > 
> > Note: It's possible to have CI jobs definition that are specific to
> > MR, allowing generic non-MR jobs to be kept on the default
> > configuration.  This can be used so individual contributors continue
> > to leverage some of the "free" (shared) runner made available on
> > gitlab.com.
> 
> I expected this section to say:
> 1. Maintainer sets up a personal gitlab.com account with a qemu.git fork.
> 2. Maintainer adds QEMU's CI tokens to their personal account.
> 3. Each time a maintainer pushes to their "staging" branch the CI
>    triggers.
> 
> IMO this model is simpler than MRs because once it has been set up the
> maintainer just uses git push.  Why are MRs necessary?
>

I am not sure GitLab "Specific Runners" can be used from other
accounts/forks.  AFAICT, you'd need a MR to send jobs that would run
on those machines, because (again AFAICT) the token used to register
those gitlab-runner instances on those machines is not shareable
across forks.  But, I'll double check that.

> > Multi-maintainer model
> > ~~~~~~~~~~~~~~~~~~~~~~
> > 
> > The previous section already introduced some of the proposed workflow
> > that can enable such a multi-maintainer model.  With a Gating CI
> > system, though, it will be natural to have a smaller "Mean time
> > between (CI) failures", simply because of the expected increased
> > number of systems and checks.  A lot of countermeasures have to be
> > employed to keep that MTBF in check.
> 
> I expect the CI to be in a state of partial failure all the time.
> Previously the idea of Tier 1 and Tier 2 platforms was raised where Tier
> 2 platforms can be failing without gating the CI.  I think this is
> reality for us.  Niche host OSes and features fail and remain in the
> failing state for days/weeks.  The CI should be designed to run in this
> mode all the time.
>

The most important tool we'd have at the CI level is indeed to "allow
failures".  GitLab CI itself doesn't provide the concept of
different tiers, so effectively we'd have to mimic that with jobs that
are not blocking.  What I think we should have is a well-defined
methodology, and tools, to either promote or demote
failing/passing jobs.

For example, a newly introduced job will always be in "allow failure"
mode (similar to Tier 2), until it proves itself by running reliably
for 100 runs or 2 months, whatever comes last.  Likewise, a job that
is not allowed to fail (similar to a Tier 1) would be demoted if it
fails twice and is not repaired within 24 hours.

> > For once, it's imperative that the maintainers for such systems and
> > jobs are clearly defined and readily accessible.  Either the same
> > MAINTAINERS file or a more suitable variation of such data should be
> > defined before activating the *gating* rules.  This would allow a
> > routing to request the attention of the maintainer responsible.
> > 
> > In case of unresposive maintainers, or any other condition that
> > renders and keeps one or more CI jobs failing for a given previously
> > established amount of time, the job can be demoted with an
> > "allow_failure" configuration[7].  Once such a change is commited, the
> > path to promotion would be just the same as in a newly added job
> > definition.
> > 
> > Note: In a future phase we can evaluate the creation of rules that
> > look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> > and the execution of specific CI jobs, which would be the
> > responsibility of a given maintainer[8].
> > 
> > Access to the current set of test machines
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > When compared to the various CI systems and services already being
> > employed in the QEMU project, this is the most striking difference in
> > the proposed model.  Instead of relying on shared/public/free
> > resources, this proposal also deals with privately owned and
> > operated machines.
> > 
> > Even though the QEMU project operates with great cooperation, it's
> > crucial to define clear boundaries when it comes to machine access.
> > Restricted access to machines is important because:
> > 
> >  * The results of jobs are many times directly tied to the setup and
> >    status of machines.  Even "soft" changes such as removing or updating
> >    packages can introduce failures in jobs (this is greatly minimized
> >    but not completely eliminated when using containers or VMs).
> >    Updating firmware or changing its settings are also examples of
> >    changes that may change the outcome of jobs.
> > 
> >  * If maintainers will be accounted for the status of the jobs defined
> >    to run on specific machines, they must be sure of the status of the
> >    machines.
> > 
> >  * Machines need regular monitoring and will receive required
> >    maintainance actions which can cause job regressions.
> > 
> > Thus, there needs to be one clear way for machines to be *used* for
> > running jobs sent by different maintainers, while still prohibiting
> > any other privileged action that can cause permanent change to the
> > machine.  The GitLab agent (gitlab-runner) is designed to do just
> > that, and defining what will be executed in a job (in a given system)
> > should be all that's generally allowed.  The job definition itself
> > will of course be subject to code review before a maintainer decides
> > to send a MR containing such new or updated job definitions.
> > 
> > Still related to machine maintenance, it's highly desirable for jobs
> > tied to specific host machines to be introduced alongside
> > documentation and/or scripts that can replicate the machine setup.  If
> > the machine setup steps can be easily and reliably reproduced, then:
> > 
> >  * Other people may help to debug issues and regressions if they
> >    happen to have the same hardware available
> > 
> >  * Other people may provide more machines to run the same types of
> >    jobs
> > 
> >  * If a machine maintainer goes MIA, it'd be easier to find another
> >    maintainer
> 
> qemu.git has tests/vm for Ubuntu (i386), FreeBSD, NetBSD, OpenBSD,
> CentOS, Fedora and tests/docker for Debian cross-compilation.  These are
> a good starting point for automated/reproducible environments for
> running builds and tests.  It would be great to integrate with
> gitlab-runner.
>

Yes, the idea is to close the gap as much as possible, and make what
we already have on qemu.git available to CI/gitlab-runner and
vice-versa.
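
Just as a sketch (I haven't tried this on a registered runner yet, and
the "kvm" tag is made up), a job reusing the existing tests/vm tooling
could be as simple as:

   build-freebsd-vm:
     tags:
       - x86_64
       - kvm
     script:
       - make vm-build-freebsd J=2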

> > 
> > GitLab Jobs and Pipelines
> > -------------------------
> > 
> > GitLab CI is built around two major concepts: jobs and pipelines.  The
> > current GitLab CI configuration in QEMU uses jobs only (or putting it
> > another way, all jobs in a single pipeline stage).  Consider the
> > following job definition[9]:
> > 
> >    build-tci:
> >     script:
> >     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
> >     - ./configure --enable-tcg-interpreter
> >          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
> >     - make -j2
> >     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
> >     - for tg in $TARGETS ; do
> >         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
> >         ./tests/boot-serial-test || exit 1 ;
> >         ./tests/cdrom-test || exit 1 ;
> >       done
> >     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
> >     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
> > 
> > All the lines under "script" are performed sequentially.  It should be
> > clear that there's the possibility of breaking this down into multiple
> > stages, so that a build happens first, and then "common set of tests"
> > run in parallel.  Using the example above, it would look something
> > like:
> > 
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | boot-serial-test |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | cdrom-test       |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | x86_64-pxe-test  |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-pxe-test   |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    +---------------+------------------------+
> > 
> > Of course it would be silly to break down that job into smaller jobs that
> > would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
> > the pipeline approach is valid because:
> > 
> >  * Common set of tests would run in parallel, giving a quicker result
> >    turnaround
> > 
> >  * It's easier to determine the possible nature of the problem with
> >    just the basic CI job status
> > 
> >  * Different maintainers could be defined for different "common set of
> >    tests", and again by leveraging the basic CI job status, automation
> >    for directed notification can be implemented
> > 
> > In the following example, "check-block" maintainers could be left
> > undisturbed with failures in the "check-acceptance" job:
> > 
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | check-block      |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | check-acceptance |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    +---------------+------------------------+
> > 
> > The same logic applies for test sets for different targets.  For
> > instance, combining the two previous examples, there could be different
> > maintainers defined for the different jobs on the test stage:
> > 
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | x86_64-block     |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | x86_64-acceptance|  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-block      |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-acceptance |  |
> >    |               |  +------------------+  |
> >    +---------------+------------------------+
> > 
> > Current limitations for a multi-stage pipeline
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > Because it's assumed that each job will happen in an isolated and
> > independent execution environment, jobs must explicitly define the
> > resources that will be shared between stages.  GitLab will make sure
> > the same source code revision will be available on all jobs
> > automatically.  Additionally, GitLab supports the concept of artifacts.
> > By defining artifacts in the "build" stage, jobs in the "test" stage
> > can expect to have a copy of those artifacts automatically.
> > 
> > In theory, there's nothing that prevents an entire QEMU build
> > directory from being treated as an artifact.  In practice, there are
> > predefined limits on GitLab that prevent that from being possible,
> > resulting in errors such as:
> > 
> >    Uploading artifacts...
> >    build: found 3164 matching files                   
> >    ERROR: Uploading artifacts to coordinator... too large archive
> >           id=xxxxxxx responseStatus=413 Request Entity Too Large
> >           status=413 Request Entity Too Large token=yyyyyyyyy
> >    FATAL: too large                                   
> >    ERROR: Job failed: exit code 1
> > 
> > As far as I can tell, this is an instance-defined limit that's clearly
> > influenced by storage costs.  I see a few possible solutions to this
> > limitation:
> > 
> >  1) Provide our own "artifact" like solution that uses our own storage
> >     solution
> > 
> >  2) Reduce or eliminate the dependency on a complete build tree
> > 
> > The first solution can go against the general trend of not having to
> > maintain CI infrastructure.  It could be made simpler by using cloud
> > storage, but there would still be some interaction with another
> > external infrastructure component.
> > 
> > I find the second solution preferable, given that most tests depend
> > on having one or a few binaries available.  I've run multi-stage
> > pipelines with some of those binaries (qemu-img,
> > $target-softmmu/qemu-system-$target) defined as artifacts and they
> > behaved as expected.  But, this could require some intrusive changes
> > to the current "make"-based test invocation.
> 
> I agree.  It should be possible to bring the necessary artifacts down to
> below the limit.  This wasn't a problem for the virtio-fs GitLab CI
> scripts I wrote that build a Linux kernel, QEMU, and guest image so I
> think will be possible for QEMU as a whole:
> https://gitlab.com/virtio-fs/virtio-fs-ci/


Cool, thanks for the pointer and feedback!
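
To make the earlier point more concrete, the multi-stage pipelines I
experimented with looked roughly like the following (a sketch only,
mirroring the "build-tci" example; the exact list of artifacts needed
will vary with the build configuration):

   build-x86_64:
     stage: build
     script:
       - ./configure --target-list=x86_64-softmmu
       - make -j2
       - make tests/boot-serial-test tests/cdrom-test
     artifacts:
       paths:
         - qemu-img
         - x86_64-softmmu/qemu-system-x86_64
         - tests/boot-serial-test
         - tests/cdrom-test

   test-x86_64:
     stage: test
     script:
       - export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
       - ./tests/boot-serial-test
       - ./tests/cdrom-test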

- Cleber.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 17:08   ` Peter Maydell
@ 2019-12-02 18:28     ` Cleber Rosa
  2019-12-02 18:36       ` Warner Losh
  0 siblings, 1 reply; 22+ messages in thread
From: Cleber Rosa @ 2019-12-02 18:28 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Stefan Hajnoczi, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Jeff Nelson, Alex Bennée, Ademar Reis

On Mon, Dec 02, 2019 at 05:08:35PM +0000, Peter Maydell wrote:
> On Mon, 2 Dec 2019 at 17:00, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Dec 02, 2019 at 09:05:52AM -0500, Cleber Rosa wrote:
> > > To exemplify my point, if one specific test run as part of "check-tcg"
> > > is found to be faulty on a specific job (say on a specific OS), the
> > > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > > action.  Of course a follow up action to deal with the specific test
> > > is required, probably in the form of a Launchpad bug and patches
> > > dealing with the issue, but without necessarily a CI related angle to
> > > it.
> >
> > I think this coarse level of granularity is unrealistic.  We cannot
> > disable 99 tests because of 1 known failure.  There must be a way of
> > disabling individual tests.  You don't need to implement it yourself,
> > but I think this needs to be solved by someone before a gating CI can be
> > put into use.
> >
> > It probably involves adding a "make EXCLUDE_TESTS=foo,bar check"
> > variable so that .gitlab-ci.yml can be modified to exclude specific
> > tests on certain OSes.
> 
> We don't have this at the moment, so I'm not sure we need to
> add it as part of moving to doing merge testing via gitlab ?
> The current process is "if the pullreq causes a test to fail
> then the pullreq needs to be changed, perhaps by adding a
> patch which disables the test on a particular platform if
> necessary". Making that smoother might be nice, but I would
> be a little wary about adding requirements to the move-to-gitlab
> that don't absolutely need to be there.
> 
> thanks
> -- PMM
> 

Right, it goes without saying that:

1) I acknowledge the problem (and I can have a long conversation
about it :)

2) I don't think it has to be a prerequisite to the "move-to-gitlab"
effort

Thanks,
- Cleber.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 18:28     ` Cleber Rosa
@ 2019-12-02 18:36       ` Warner Losh
  2019-12-02 22:38         ` Cleber Rosa
  0 siblings, 1 reply; 22+ messages in thread
From: Warner Losh @ 2019-12-02 18:36 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Peter Maydell, Stefan Hajnoczi, QEMU Developers,
	Wainer dos Santos Moschetta, Markus Armbruster, Jeff Nelson,
	Alex Bennée, Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 2685 bytes --]

On Mon, Dec 2, 2019 at 11:29 AM Cleber Rosa <crosa@redhat.com> wrote:

> On Mon, Dec 02, 2019 at 05:08:35PM +0000, Peter Maydell wrote:
> > On Mon, 2 Dec 2019 at 17:00, Stefan Hajnoczi <stefanha@redhat.com>
> wrote:
> > >
> > > On Mon, Dec 02, 2019 at 09:05:52AM -0500, Cleber Rosa wrote:
> > > > To exemplify my point, if one specific test run as part of
> "check-tcg"
> > > > is found to be faulty on a specific job (say on a specific OS), the
> > > > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > > > action.  Of course a follow up action to deal with the specific test
> > > > is required, probably in the form of a Launchpad bug and patches
> > > > dealing with the issue, but without necessarily a CI related angle to
> > > > it.
> > >
> > > I think this coarse level of granularity is unrealistic.  We cannot
> > > disable 99 tests because of 1 known failure.  There must be a way of
> > > disabling individual tests.  You don't need to implement it yourself,
> > > but I think this needs to be solved by someone before a gating CI can
> be
> > > put into use.
> > >
> > > It probably involves adding a "make EXCLUDE_TESTS=foo,bar check"
> > > variable so that .gitlab-ci.yml can be modified to exclude specific
> > > tests on certain OSes.
> >
> > We don't have this at the moment, so I'm not sure we need to
> > add it as part of moving to doing merge testing via gitlab ?
> > The current process is "if the pullreq causes a test to fail
> > then the pullreq needs to be changed, perhaps by adding a
> > patch which disables the test on a particular platform if
> > necessary". Making that smoother might be nice, but I would
> > be a little wary about adding requirements to the move-to-gitlab
> > that don't absolutely need to be there.
> >
> > thanks
> > -- PMM
> >
>
> Right, it goes without saying that:
>
> 1) I acknowledge the problem (and I can have a long conversation
> about it :)
>

Just make sure that any pipeline and mandatory CI steps don't slow things
down too much... While the examples have talked about 1 or 2 pull requests
getting done in parallel, and that's great, the problem is when you try to
land 10 or 20 all at once, one of which causes a failure and you aren't sure
which one it actually is... Make sure whatever you design has sane
exception case handling to not cause too much collateral damage... I worked
one place that would back everything out if a once-a-week CI test ran and
had failures... That CI test-run took 2 days to run, so it wasn't practical
to run it often, or for every commit. In the end, though, the powers that
be implemented an automated bisection tool that made it marginally less
sucky...

Warner

[-- Attachment #2: Type: text/html, Size: 3466 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 18:36       ` Warner Losh
@ 2019-12-02 22:38         ` Cleber Rosa
  0 siblings, 0 replies; 22+ messages in thread
From: Cleber Rosa @ 2019-12-02 22:38 UTC (permalink / raw)
  To: Warner Losh
  Cc: Peter Maydell, Stefan Hajnoczi, QEMU Developers,
	Wainer dos Santos Moschetta, Markus Armbruster, Jeff Nelson,
	Alex Bennée, Ademar Reis

On Mon, Dec 02, 2019 at 11:36:35AM -0700, Warner Losh wrote:
> 
> Just make sure that any pipeline and mandatory CI steps don't slow things
> down too much... While the examples have talked about 1 or 2 pull requests
> getting done in parallel, and that's great, the problem is when you try to
> land 10 or 20 all at once, one of which causes a failure and you aren't sure
> which one it actually is... Make sure whatever you design has sane
> exception case handling to not cause too much collateral damage... I worked
> one place that would back everything out if a once-a-week CI test ran and
> had failures... That CI test-run took 2 days to run, so it wasn't practical
> to run it often, or for every commit. In the end, though, the powers that
> be implemented an automated bisection tool that made it marginally less
> sucky...
> 
> Warner

What I would personally like to see is the availability of enough
resources to give a ~2 hour maximum result turnaround, that is, the
complete pipeline finishing within those 2 hours.  Of course the exact
maximum time should be settled by consensus.

If someone is contributing a new job that is supposed to run on existing
hardware, its acceptance should be carefully considered.  If more
hardware is being added and the job is capable of running in parallel
with others, then it shouldn't be an issue (I don't think we'll hit
GitLab's scheduling limits anytime soon).

With regards to the "1 or 2 pull requests done in parallel", of course
there could be a queue of pending jobs, but given that the idea is for
these jobs to be run based on maintainers' actions (say a Merge
Request), the volume should be much lower than if individual
contributors were triggering the same jobs on their patch series, and
certainly not on every commit (as you describe with the ~2-day jobs).

Anyway, thanks for the feedback and please do not refrain from further
participation in this effort.  Your experience seems quite valuable.

Thanks,
- Cleber.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 14:05 [RFC] QEMU Gating CI Cleber Rosa
  2019-12-02 17:00 ` Stefan Hajnoczi
@ 2019-12-03 14:07 ` Alex Bennée
  2019-12-04  8:55   ` Thomas Huth
  2019-12-06 19:03   ` Cleber Rosa
  2019-12-03 17:54 ` Peter Maydell
  2020-01-17 14:33 ` Peter Maydell
  3 siblings, 2 replies; 22+ messages in thread
From: Alex Bennée @ 2019-12-03 14:07 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Peter Maydell, Stefan Hajnoczi, qemu-devel,
	Wainer dos Santos Moschetta, Markus Armbruster, Jeff Nelson,
	Ademar Reis


Cleber Rosa <crosa@redhat.com> writes:

> RFC: QEMU Gating CI
> ===================
>
> This RFC attempts to address most of the issues described in
> "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> QEMU CI as we enter 4.0"[2].
>
> The general approach is one to minimize the infrastructure maintenance
> and development burden, leveraging as much as possible "other people's"
> infrastructure and code.  GitLab's CI/CD platform is the most relevant
> component dealt with here.
>
> Problem Statement
> -----------------
>
> The following is copied verbatim from Peter Maydell's write up[1]:
>
> "A gating CI is a prerequisite to having a multi-maintainer model of
> merging. By having a common set of tests that are run prior to a merge
> you do not rely on who is currently doing merging duties having access
> to the current set of test machines."
>
> This is of a very simplified view of the problem that I'd like to break
> down even further into the following key points:
>
>  * Common set of tests
>  * Pre-merge ("prior to a merge")
>  * Access to the current set of test machines
>  * Multi-maintainer model
>
> Common set of tests
> ~~~~~~~~~~~~~~~~~~~
>
> Before we delve any further, let's make it clear that a "common set of
> tests" is really a "dynamic common set of tests".  My point is that a
> set of tests in QEMU may include or exclude different tests depending
> on the environment.
>
> The exact tests that will be executed may differ depending on the
> environment, including:
>
>  * Hardware
>  * Operating system
>  * Build configuration
>  * Environment variables
>
> In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> those "common set of tests":
>
>  * check

Check encompasses a subset of the other checks - currently:

 - check-unit
 - check-qtest
 - check-block

The thing that stops other groups of tests from being included is
generally whether they are solid on all the various hw/os/config/env
setups you describe.  For example, check-tcg currently fails gloriously
on non-x86 with docker enabled as it tries to get all the cross-compiler
images working.

>  * check-tcg
>  * check-softfloat
>  * check-block
>  * check-acceptance
>
> While Peter mentions that most of his checks are limited to:
>
>  * check
>  * check-tcg
>
> Our current inability to quickly identify a faulty test from test
> execution results (and specially in remote environments), and act upon
> it (say quickly disable it on a given host platform), makes me believe
> that it's fair to start a gating CI implementation that uses this
> rather coarse granularity.
>
> Another benefit is a close or even a 1:1 relationship between a common
> test set and an entry in the CI configuration.  For instance, the
> "check" common test set would map to a "make check" command in a
> "script:" YAML entry.
>
> To exemplify my point, if one specific test run as part of "check-tcg"
> is found to be faulty on a specific job (say on a specific OS), the
> entire "check-tcg" test set may be disabled as a CI-level maintenance
> action.

In this example, that would eliminate practically all emulation testing
apart from the very minimal boot code that gets spun up by the various
qtest migration tests.  And of course, the longer a group of tests is
disabled, the larger the window for additional regressions to get in.

It may be a reasonable approach but it's not without consequence.

> Of course a follow up action to deal with the specific test
> is required, probably in the form of a Launchpad bug and patches
> dealing with the issue, but without necessarily a CI related angle to
> it.
>
> If/when test result presentation and control mechanism evolve, we may
> feel confident and go into finer grained granularity.  For instance, a
> mechanism for disabling nothing but "tests/migration-test" on a given
> environment would be possible and desirable from a CI management
> level.

The migration tests have found regressions, although the problem has
generally been that they were intermittent failures and hard to reproduce
locally.  The last one took a few weeks of grinding to reproduce and get
patches together.

> Pre-merge
> ~~~~~~~~~
>
> The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> MR comes from individual contributors, usually the authors of the
> changes themselves.  It's my understanding that the current maintainer
> model employed in QEMU will *not* change at this time, meaning that
> code contributions and reviews will continue to happen on the mailing
> list.  A maintainer then, having collected a number of patches, would
> submit a MR either in addition or in substitution to the Pull Requests
> sent to the mailing list.
>
> "Pipelines for Merged Results"[4] is a very important feature to
> support the multi-maintainer model, and looks in practice, similar to
> Peter's "staging" branch approach, with an "automatic refresh" of the
> target branch.  It can give a maintainer extra confidence that a MR
> will play nicely with the updated status of the target branch.  It's
> my understanding that it should be the "key to the gates".  A minor
> note is that conflicts are still possible in a multi-maintainer model
> if there are more than one person doing the merges.
>
> A worthy point is that the GitLab web UI is not the only way to create
> a Merge Request, but a rich set of APIs are available[5].  This is
> interesting for many reasons, and maybe some of Peter's
> "apply-pullreq"[6] actions (such as bad UTF8 or bogus qemu-devel email
> addresses checks could be made earlier) as part of a
> "send-mergereq"-like script, bringing conformance earlier on the merge
> process, at the MR creation stage.
>
> Note: It's possible to have CI jobs definition that are specific to
> MR, allowing generic non-MR jobs to be kept on the default
> configuration.  This can be used so individual contributors continue
> to leverage some of the "free" (shared) runner made available on
> gitlab.com.
>
> Multi-maintainer model
> ~~~~~~~~~~~~~~~~~~~~~~
>
> The previous section already introduced some of the proposed workflow
> that can enable such a multi-maintainer model.  With a Gating CI
> system, though, it will be natural to have a smaller "Mean time
> between (CI) failures", simply because of the expected increased
> number of systems and checks.  A lot of countermeasures have to be
> employed to keep that MTBF in check.
>
> For once, it's imperative that the maintainers for such systems and
> jobs are clearly defined and readily accessible.  Either the same
> MAINTAINERS file or a more suitable variation of such data should be
> defined before activating the *gating* rules.  This would allow a
> routing to request the attention of the maintainer responsible.
>
> In case of unresponsive maintainers, or any other condition that
> renders and keeps one or more CI jobs failing for a given previously
> established amount of time, the job can be demoted with an
> "allow_failure" configuration[7].  Once such a change is committed, the
> path to promotion would be just the same as in a newly added job
> definition.
>
> Note: In a future phase we can evaluate the creation of rules that
> look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> and the execution of specific CI jobs, which would be the
> responsibility of a given maintainer[8].
>
> Access to the current set of test machines
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> When compared to the various CI systems and services already being
> employed in the QEMU project, this is the most striking difference in
> the proposed model.  Instead of relying on shared/public/free
> resources, this proposal also deals with privately owned and
> operated machines.
>
> Even though the QEMU project operates with great cooperation, it's
> crucial to define clear boundaries when it comes to machine access.
> Restricted access to machines is important because:
>
>  * The results of jobs are many times directly tied to the setup and
>    status of machines.  Even "soft" changes such as removing or updating
>    packages can introduce failures in jobs (this is greatly minimized
>    but not completely eliminated when using containers or VMs).
>    Updating firmware or changing its settings are also examples of
>    changes that may change the outcome of jobs.
>
>  * If maintainers are to be held accountable for the status of the jobs
>    defined to run on specific machines, they must be sure of the status
>    of the machines.
>
>  * Machines need regular monitoring and will receive required
>    maintenance actions which can cause job regressions.
>
> Thus, there needs to be one clear way for machines to be *used* for
> running jobs sent by different maintainers, while still prohibiting
> any other privileged action that can cause permanent change to the
> machine.  The GitLab agent (gitlab-runner) is designed to do just
> that, and defining what will be executed in a job (in a given system)
> should be all that's generally allowed.  The job definition itself
> will of course be subject to code review before a maintainer decides
> to send a MR containing such new or updated job definitions.
>
> Still related to machine maintenance, it's highly desirable for jobs
> tied to specific host machines to be introduced alongside
> documentation and/or scripts that can replicate the machine setup.  If
> the machine setup steps can be easily and reliably reproduced, then:
>
>  * Other people may help to debug issues and regressions if they
>    happen to have the same hardware available
>
>  * Other people may provide more machines to run the same types of
>    jobs
>
>  * If a machine maintainer goes MIA, it'd be easier to find another
>    maintainer
>
> GitLab Jobs and Pipelines
> -------------------------
>
> GitLab CI is built around two major concepts: jobs and pipelines.  The
> current GitLab CI configuration in QEMU uses jobs only (or putting it
> another way, all jobs in a single pipeline stage).  Consider the
> following job definition[9]:
>
>    build-tci:
>     script:
>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>     - ./configure --enable-tcg-interpreter
>          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>     - make -j2
>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>     - for tg in $TARGETS ; do
>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>         ./tests/boot-serial-test || exit 1 ;
>         ./tests/cdrom-test || exit 1 ;
>       done
>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
>
> All the lines under "script" are performed sequentially.  It should be
> clear that there's the possibility of breaking this down into multiple
> stages, so that a build happens first, and then "common set of tests"
> run in parallel.  Using the example above, it would look something
> like:
>
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | boot-serial-test |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | cdrom-test       |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | x86_64-pxe-test  |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-pxe-test   |  |
>    |               |  +------------------+  |
>    |               |                        |
>    +---------------+------------------------+
>
> Of course it would be silly to break down that job into smaller jobs that
> would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
> the pipeline approach is valid because:
>
>  * Common set of tests would run in parallel, giving a quicker result
>    turnaround

check-unit is a good candidate for parallel tests.  The others depend -
I've recently turned most of the make checks back to -j 1 on Travis
because it's a real pain to see which test has hung when other tests
keep running.
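
Roughly speaking (illustrative invocations, using the current make
target names):

   # unit tests are independent binaries and parallelise well
   make -j"$(nproc)" check-unit
   # serialising the qtests makes it obvious which test has hung
   make -j1 check-qtest-x86_64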

>
>  * It's easier to determine the possible nature of the problem with
>    just the basic CI job status
>
>  * Different maintainers could be defined for different "common set of
>    tests", and again by leveraging the basic CI job status, automation
>    for directed notification can be implemented
>
> In the following example, "check-block" maintainers could be left
> undisturbed with failures in the "check-acceptance" job:
>
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | check-block      |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | check-acceptance |  |
>    |               |  +------------------+  |
>    |               |                        |
>    +---------------+------------------------+
>
> The same logic applies for test sets for different targets.  For
> instance, combining the two previous examples, there could be different
> maintainers defined for the different jobs on the test stage:
>
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | x86_64-block     |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | x86_64-acceptance|  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-block      |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-acceptance |  |
>    |               |  +------------------+  |
>    +---------------+------------------------+
>
> Current limitations for a multi-stage pipeline
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Because it's assumed that each job will happen in an isolated and
> independent execution environment, jobs must explicitly define the
> resources that will be shared between stages.  GitLab will make sure
> the same source code revision will be available on all jobs
> automatically.  Additionally, GitLab supports the concept of artifacts.
> By defining artifacts in the "build" stage, jobs in the "test" stage
> can expect to have a copy of those artifacts automatically.
>
> In theory, there's nothing that prevents an entire QEMU build
> directory from being treated as an artifact.  In practice, there are
> predefined limits on GitLab that prevent that from being possible,
> resulting in errors such as:
>
>    Uploading artifacts...
>    build: found 3164 matching files                   
>    ERROR: Uploading artifacts to coordinator... too large archive
>           id=xxxxxxx responseStatus=413 Request Entity Too Large
>           status=413 Request Entity Too Large token=yyyyyyyyy
>    FATAL: too large                                   
>    ERROR: Job failed: exit code 1
>
> As far as I can tell, this is an instance-defined limit that's clearly
> influenced by storage costs.  I see a few possible solutions to this
> limitation:
>
>  1) Provide our own "artifact" like solution that uses our own storage
>     solution
>
>  2) Reduce or eliminate the dependency on a complete build tree
>
> The first solution can go against the general trend of not having to
> maintain CI infrastructure.  It could be made simpler by using cloud
> storage, but there would still be some interaction with another
> external infrastructure component.
>
> I find the second solution preferable, given that most tests depend
> on having one or a few binaries available.  I've run multi-stage
> pipelines with some of those binaries (qemu-img,
> $target-softmmu/qemu-system-$target) defined as artifacts and they
> behaved as expected.  But, this could require some intrusive changes
> to the current "make"-based test invocation.

It would be nice if "make check" could be run against a "make install"ed
set of binaries.  I'm not sure how much hackery would be required to get
that to work nicely.  Does specifying QEMU and QEMU_IMG prevent make
from trying to re-build everything in situ?
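
i.e. something along these lines (completely untested sketch; I'm
assuming QTEST_QEMU_BINARY/QTEST_QEMU_IMG are the knobs the qtest
harness honours, and whether make still insists on rebuilding the
tree is exactly the open question):

   make install DESTDIR=$PWD/dist
   QTEST_QEMU_BINARY=$PWD/dist/usr/local/bin/qemu-system-x86_64 \
   QTEST_QEMU_IMG=$PWD/dist/usr/local/bin/qemu-img \
     make check-qtest-x86_64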

>
> Job Naming convention
> ---------------------
>
> Based only on the very simple example jobs above, it should already be
> clear that there's a lot of possibility for confusion and chaos.  For
> instance, by looking at the "build" job definition or results, it's
> very hard to tell what it's really about.  A bit more could be inferred by
> the "x86_64-block" job name.
>
> Still, the problem we have to address here is not only about the
> amount of information easily obtained from a job name, but allowing
> for very similar job definitions within a global namespace.  For
> instance, if we add an Operating Systems component to the mix, we need
> an extra qualifier for unique job names.
>
> Some of the possible components in a job definition are:
>
>  * Stage
>  * Build profile
>  * Test set (a shorter name for what was described in the "Common set
>    of tests" section)
>  * Host architecture
>  * Target architecture
>  * Host Operating System identification (name and version)
>  * Execution mode/environment (bare metal, container, VM, etc)
>
> Stage
> ~~~~~
>
> The stage of a job (which maps roughly to its purpose) should be
> clearly defined.  A job that builds QEMU should start with "build" and
> a job that tests QEMU should start with "test".
>
> IMO, in a second phase, once multi-stage pipelines are taken for
> granted, we could evaluate dropping this component altogether from the
> naming convention, and relying purely on the stage classification.
>
> Build profile
> ~~~~~~~~~~~~~
>
> Different build profiles already abound in QEMU's various CI
> configuration files.  It's hard to put a naming convention here,
> except that it should represent the most distinguishable
> characteristics of the build configuration.  For instance, we can find
> a "build-disabled" job in the current ".gitlab-ci.yml" file that is
> aptly named, as it forcefully disables a lot of build options.
>
> Test set
> ~~~~~~~~
>
> As mentioned in the "Common set of tests" section, I believe that the
> make target name can be used to identify the test set that will be
> executed in a job.  That is, if a job is to be run at the "test"
> stage, and will run "make check", its name should start with
> "test-check".
>
> QEMU Targets
> ~~~~~~~~~~~~
>
> Because a given job could, and usually does, involve multiple targets, I
> honestly can not think of how to add this to the naming convention.
> I'll ignore it for now, and consider the targets are defined in the
> build profile.

I like to think of three groups:

  Core SoftMMU - the major KVM architectures
  The rest of SoftMMU - all our random emulation targets
  linux-user
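
Purely as an illustration (the exact target lists are up for debate),
those groups could map onto build profiles along the lines of:

   build-softmmu-major:
     script:
       - ./configure --target-list="x86_64-softmmu aarch64-softmmu ppc64-softmmu s390x-softmmu"
       - make -j2

   build-softmmu-other:
     script:
       - ./configure --target-list="alpha-softmmu hppa-softmmu m68k-softmmu moxie-softmmu"
       - make -j2

   build-linux-user:
     script:
       - ./configure --disable-system --enable-linux-user
       - make -j2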

>
> Host Architecture
> ~~~~~~~~~~~~~~~~~
>
> The host architecture name convention should be an easy pick, given
> that QEMU itself employs an architecture convention for its targets.
>
> Host OS
> ~~~~~~~
>
> The suggestion I have for the host OS name is to follow the
> libosinfo[10] convention as closely as possible.  libosinfo's "Short
> ID" should be well suitable here.  Examples include: "openbsd4.2",
> "opensuse42.3", "rhel8.0", "ubuntu9.10" and "win2k12r2".
>
> Execution Environment
> ~~~~~~~~~~~~~~~~~~~~~
>
> Distinguishing between running tests in a bare-metal versus a nested
> VM environment is quite significant to a number of people.
>
> Still, I think it could probably be optional for the initial
> implementation phase, like the naming convention for the QEMU Targets.
>
> Example 1
> ~~~~~~~~~
>
> Defining a job that will build QEMU with common debug options, on
> a RHEL 8.0 system on a x86_64 host:
>
>    build-debug-rhel8.0-x86_64:
>      script:
>        - ./configure --enable-debug
>        - make
>
> Example 2
> ~~~~~~~~~
>
> Defining a job that will run the "qtest" test set on a NetBSD 8.1
> system on an aarch64 host:
>
>    test-qtest-netbsd8.1-aarch64:
>      script:
>        - make check-qtest
>
> Job and Machine Scheduling
> --------------------------
>
> While the naming convention gives some information to human beings,
> and hopefully allows for some order and avoids collisions in the
> global job namespace, it's not enough to define where those jobs
> should run.
>
> Tags[11] is the available mechanism to tie jobs to specific machines
> running the GitLab CI agent, "gitlab-runner".  Unfortunately, some
> duplication seems unavoidable, in the sense that some of the naming
> components listed above are machine specific, and will then need to be
> also given as tags.
>
> Note: it may be a good idea to be extra verbose with tags, by having a
> qualifier prefix.  The justification is that tags also live in a
> global namespace, and in theory, at a given point, tags of different
> "categories", say a CPU name and Operating System name may collide.
> Or, it may just be me being paranoid.
>
> Example 1
> ~~~~~~~~~
>
>    build-debug-rhel8.0-x86_64:
>      tags:
>        - rhel8.0
>        - x86_64
>      script:
>        - ./configure --enable-debug
>        - make
>
> Example 2
> ~~~~~~~~~
>
>    test-qtest-netbsd8.1-aarch64:
>      tags:
>        - netbsd8.1
>        - aarch64
>      script:
>        - make check-qtest

Where are all these going to go? Are we overloading the existing
gitlab.yml or are we going to have a new set of configs for the GatingCI
and keep gitlab.yml as the current subset that people run on their own
accounts?
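
(If the latter, GitLab's "include" mechanism would presumably let us
keep the current jobs as they are and pull the gating definitions from
a separate, clearly labelled file - the path below is just made up:)

   # .gitlab-ci.yml
   include:
     - local: '/.gitlab-ci.d/gating.yml'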

>
> Operating System definition versus Container Images
> ---------------------------------------------------
>
> In the previous section and examples, we're assuming that tests will
> run on machines that have registered "gitlab-runner" agents with
> matching tags.  The tags given at gitlab-runner registration time
> would of course match the same naming convention defined earlier.
>
> So, if one is registering a "gitlab-runner" instance on a x86_64
> machine, running RHEL 8.0, the tags "rhel8.0" and "x86_64" would be
> given (possibly among others).
>
> Nevertheless, most deployment scenarios will probably rely on jobs
> being executed by gitlab-runner's container executor (currently
> Docker-only).  This means that tags given to a job *may* drop the tag
> associated with the host operating system selection, and instead
> provide the ".gitlab-ci.yml" configuration directive that determines
> the container image to be used.
>
> Most jobs would probably *not* require a matching host operating
> system and container images, but there should still be the capability
> to make it a requirement.  For instance, jobs containing tests that
> require the KVM accelerator on specific scenarios may require a
> matching host Operating System.
>
> Note: What was mentioned in the "Execution Environment" section under
> the naming conventions section, is also closely related to this
> requirement, that is, one may require a job to run under a container,
> VM or bare metal.
>
> Example 1
> ~~~~~~~~~
>
> Build QEMU on a "rhel8.0" image hosted under the "qemuci" organization
> and require the runner to support container execution:
>
>    build-debug-rhel8.0-x86_64:
>      tags:
>        - x86_64
>        - container
>      image: qemuci/rhel8.0
>      script:
>        - ./configure --enable-debug
>        - make
>
> Example 2
> ~~~~~~~~~
>
> Run check QEMU on a "rhel8.0" image hosted under the "qemuci"
> organization and require the runner to support container execution and
> be on a matching host:
>
>    test-check-rhel8.0-x86_64:
>      tags:
>        - x86_64
>        - rhel8.0
>        - container
>      image: qemuci/rhel8.0
>      script:
>        - make check
>
> Next
> ----
>
> Because this document is already too long and that can be distracting,
> I decided to defer many other implementation level details to a second
> RFC, alongside some code.
>
> Some complementary topics that I have prepared include:
>
>  * Container images creation, hosting and management
>  * Advanced pipeline definitions
>    - Job dependencies
>    - Artifacts
>    - Results
>  * GitLab CI for Individual Contributors
>  * GitLab runner:
>    - Official and Custom Binaries
>    - Executors
>    - Security implications
>    - Helper container images for non supported architectures
>  * Checklists for:
>    - Preparing and documenting machine setup
>    - Proposing new runners and jobs
>    - Runners and jobs promotions and demotions
>
> Of course any other topics that spur from this discussion will also
> be added to the following threads.
>
> References:
> -----------
>  [1] https://wiki.qemu.org/Requirements/GatingCI
>  [2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg04909.html
>  [3] https://docs.gitlab.com/ee/gitlab-basics/add-merge-request.html
>  [4] https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/index.html
>  [5] https://docs.gitlab.com/ee/api/merge_requests.html#create-mr-pipeline
>  [6] https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree/apply-pullreq
>  [7] https://docs.gitlab.com/ee/ci/yaml/README.html#allow_failure
>  [8] https://docs.gitlab.com/ee/ci/yaml/README.html#using-onlychanges-with-pipelines-for-merge-requests
>  [9] https://github.com/qemu/qemu/blob/fb2246882a2c8d7f084ebe0617e97ac78467d156/.gitlab-ci.yml#L70 
>  [10] https://libosinfo.org/
>  [11] https://docs.gitlab.com/ee/ci/runners/README.html#using-tags


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 18:12   ` Cleber Rosa
@ 2019-12-03 14:14     ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-03 14:14 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Peter Maydell, qemu-devel, Wainer dos Santos Moschetta,
	Markus Armbruster, Jeff Nelson, Alex Bennée, Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 8997 bytes --]

On Mon, Dec 02, 2019 at 01:12:54PM -0500, Cleber Rosa wrote:
> On Mon, Dec 02, 2019 at 05:00:18PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Dec 02, 2019 at 09:05:52AM -0500, Cleber Rosa wrote:
> > > RFC: QEMU Gating CI
> > > ===================
> > 
> > Excellent, thank you for your work on this!
> > 
> > > 
> > > This RFC attempts to address most of the issues described in
> > > "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> > > QEMU CI as we enter 4.0"[2].
> > > 
> > > The general approach is one to minimize the infrastructure maintenance
> > > and development burden, leveraging as much as possible "other people's"
> > > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > > component dealt with here.
> > > 
> > > Problem Statement
> > > -----------------
> > > 
> > > The following is copied verbatim from Peter Maydell's write up[1]:
> > > 
> > > "A gating CI is a prerequisite to having a multi-maintainer model of
> > > merging. By having a common set of tests that are run prior to a merge
> > > you do not rely on who is currently doing merging duties having access
> > > to the current set of test machines."
> > > 
> > > This is of a very simplified view of the problem that I'd like to break
> > > down even further into the following key points:
> > > 
> > >  * Common set of tests
> > >  * Pre-merge ("prior to a merge")
> > >  * Access to the current set of test machines
> > >  * Multi-maintainer model
> > > 
> > > Common set of tests
> > > ~~~~~~~~~~~~~~~~~~~
> > > 
> > > Before we delve any further, let's make it clear that a "common set of
> > > tests" is really a "dynamic common set of tests".  My point is that a
> > > set of tests in QEMU may include or exclude different tests depending
> > > on the environment.
> > > 
> > > The exact tests that will be executed may differ depending on the
> > > environment, including:
> > > 
> > >  * Hardware
> > >  * Operating system
> > >  * Build configuration
> > >  * Environment variables
> > > 
> > > In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> > > those "common set of tests":
> > > 
> > >  * check
> > >  * check-tcg
> > >  * check-softfloat
> > >  * check-block
> > >  * check-acceptance
> > > 
> > > While Peter mentions that most of his checks are limited to:
> > > 
> > >  * check
> > >  * check-tcg
> > > 
> > > Our current inability to quickly identify a faulty test from test
> > > execution results (and specially in remote environments), and act upon
> > > it (say quickly disable it on a given host platform), makes me believe
> > > that it's fair to start a gating CI implementation that uses this
> > > rather coarse granularity.
> > > 
> > > Another benefit is a close or even a 1:1 relationship between a common
> > > test set and an entry in the CI configuration.  For instance, the
> > > "check" common test set would map to a "make check" command in a
> > > "script:" YAML entry.
> > > 
> > > To exemplify my point, if one specific test run as part of "check-tcg"
> > > is found to be faulty on a specific job (say on a specific OS), the
> > > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > > action.  Of course a follow up action to deal with the specific test
> > > is required, probably in the form of a Launchpad bug and patches
> > > dealing with the issue, but without necessarily a CI related angle to
> > > it.
> > 
> > I think this coarse level of granularity is unrealistic.  We cannot
> > disable 99 tests because of 1 known failure.  There must be a way of
> > disabling individual tests.  You don't need to implement it yourself,
> > but I think this needs to be solved by someone before a gating CI can be
> > put into use.
> >
> 
> IMO it should be realistic if you look at it from a "CI related
> angle".  The pull request could still be revised and disable a single
> test because of a known failure, but this would not be necessarily
> related to the CI.

That sounds fine, thanks.  I interpreted the text a little differently.
I agree this functionality doesn't need to be present in order to move to
GitLab.

> 
> > It probably involves adding a "make EXCLUDE_TESTS=foo,bar check"
> > variable so that .gitlab-ci.yml can be modified to exclude specific
> > tests on certain OSes.
> >
> 
> I certainly acknowledge the issue, but I don't think this (and many
> other issues that will certainly come up) should be a blocker to the
> transition to GitLab.
> 
> > > 
> > > If/when test result presentation and control mechanism evolve, we may
> > > feel confident and go into finer grained granularity.  For instance, a
> > > mechanism for disabling nothing but "tests/migration-test" on a given
> > > environment would be possible and desirable from a CI management level.
> > > 
> > > Pre-merge
> > > ~~~~~~~~~
> > > 
> > > The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> > > Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> > > MR comes from individual contributors, usually the authors of the
> > > changes themselves.  It's my understanding that the current maintainer
> > > model employed in QEMU will *not* change at this time, meaning that
> > > code contributions and reviews will continue to happen on the mailing
> > > list.  A maintainer then, having collected a number of patches, would
> > > submit a MR either in addition or in substitution to the Pull Requests
> > > sent to the mailing list.
> > > 
> > > "Pipelines for Merged Results"[4] is a very important feature to
> > > support the multi-maintainer model, and looks in practice, similar to
> > > Peter's "staging" branch approach, with an "automatic refresh" of the
> > > target branch.  It can give a maintainer extra confidence that a MR
> > > will play nicely with the updated status of the target branch.  It's
> > > my understanding that it should be the "key to the gates".  A minor
> > > note is that conflicts are still possible in a multi-maintainer model
> > > if there are more than one person doing the merges.
> > 
> > The intention is to have only 1 active maintainer at a time.  The
> > maintainer will handle all merges for the current QEMU release and then
> > hand over to the next maintainer after the release has been made.
> > 
> > Solving the problem for multiple active maintainers is low priority at
> > the moment.
> >
> 
> Even so, I have the impression that the following workflow:
> 
>  - Look at Merge Results Pipeline for MR#1
>  - Merge MR #1
>  - Hack on something else
>  - Look at *automatically updated* Merge Results Pipeline for MR#2
>  - Merge MR #2
> 
> Is better than:
> 
>  - Push PR #1 to staging
>  - Wait for PR #1 Pipeline to finish
>  - Look at PR #1 Pipeline results
>  - Push staging into master
>  - Push PR #2 to staging 
>  - Wait for PR #2 Pipeline to finish
>  - Push staging into master
> 
> But I don't think I'll be a direct user of those workflows, so I'm
> completely open to feedback on it.

If the goal is to run multiple trees through the CI in parallel then
multiple branches can be used.  I guess I'm just

> 
> > > A worthy point is that the GitLab web UI is not the only way to create
> > > a Merge Request, but a rich set of APIs are available[5].  This is
> > > interesting for many reasons, and maybe some of Peter's
> > > "apply-pullreq"[6] actions (such as bad UTF8 or bogus qemu-devel email
> > > addresses checks could be made earlier) as part of a
> > > "send-mergereq"-like script, bringing conformance earlier on the merge
> > > process, at the MR creation stage.
> > > 
> > > Note: It's possible to have CI jobs definition that are specific to
> > > MR, allowing generic non-MR jobs to be kept on the default
> > > configuration.  This can be used so individual contributors continue
> > > to leverage some of the "free" (shared) runner made available on
> > > gitlab.com.
> > 
> > I expected this section to say:
> > 1. Maintainer sets up a personal gitlab.com account with a qemu.git fork.
> > 2. Maintainer adds QEMU's CI tokens to their personal account.
> > 3. Each time a maintainer pushes to their "staging" branch the CI
> >    triggers.
> > 
> > IMO this model is simpler than MRs because once it has been set up the
> > maintainer just uses git push.  Why are MRs necessary?
> >
> 
> I am not sure GitLab "Specific Runners" can be used from other
> accounts/forks.  AFAICT, you'd need a MR to send jobs that would run
> on those machines, because (again AFAICT) the token used to register
> those gitlab-runner instances on those machines is not shareable
> across forks.  But, I'll double check that.

Another question:
Is a Merge Request necessary in order to trigger the CI or is just
pushing to a branch enough?  With GitHub + Travis just pushing is
enough.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 14:05 [RFC] QEMU Gating CI Cleber Rosa
  2019-12-02 17:00 ` Stefan Hajnoczi
  2019-12-03 14:07 ` Alex Bennée
@ 2019-12-03 17:54 ` Peter Maydell
  2019-12-05  5:05   ` Cleber Rosa
  2020-01-17 14:33 ` Peter Maydell
  3 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2019-12-03 17:54 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Jeff Nelson, QEMU Developers, Wainer dos Santos Moschetta,
	Markus Armbruster, Stefan Hajnoczi, Alex Bennée,
	Ademar Reis

On Mon, 2 Dec 2019 at 14:06, Cleber Rosa <crosa@redhat.com> wrote:
>
> RFC: QEMU Gating CI
> ===================
>
> This RFC attempts to address most of the issues described in
> "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> QEMU CI as we enter 4.0"[2].
>
> The general approach is one to minimize the infrastructure maintenance
> and development burden, leveraging as much as possible "other people's"
> infrastructure and code.  GitLab's CI/CD platform is the most relevant
> component dealt with here.

Thanks for writing up this RFC.

My overall view is that there's some interesting stuff in
here and definitely some things we'll want to cover at some
point, but there's also a fair amount that is veering away
from solving the immediate problem we want to solve, and
which we should thus postpone for later (beyond making some
reasonable efforts not to design something which paints us
into a corner so it's annoyingly hard to improve later).

> To exemplify my point, if one specific test run as part of "check-tcg"
> is found to be faulty on a specific job (say on a specific OS), the
> entire "check-tcg" test set may be disabled as a CI-level maintenance
> action.  Of course a follow up action to deal with the specific test
> is required, probably in the form of a Launchpad bug and patches
> dealing with the issue, but without necessarily a CI related angle to
> it.
>
> If/when test result presentation and control mechanism evolve, we may
> feel confident and go into finer grained granularity.  For instance, a
> mechanism for disabling nothing but "tests/migration-test" on a given
> environment would be possible and desirable from a CI management level.

For instance, we don't have anything today for granularity of
definition of what tests we run where or where we disable them.
So we don't need it in order to move away from the scripting
approach I have at the moment. We can just say "the CI system
will run make and make check (and maybe in some hosts some
additional test-running commands) on these hosts" and hardcode
that into whatever yaml file the CI system's configured in.
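
Something as dumb as this (names invented, one job per host type)
would be enough for a first cut:

   build-and-check-somehost:
     tags:
       - somehost
     script:
       - ./configure
       - make -j8
       - make check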

> Pre-merge
> ~~~~~~~~~
>
> The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> MR comes from individual contributors, usually the authors of the
> changes themselves.  It's my understanding that the current maintainer
> model employed in QEMU will *not* change at this time, meaning that
> code contributions and reviews will continue to happen on the mailing
> list.  A maintainer then, having collected a number of patches, would
> submit a MR either in addition or in substitution to the Pull Requests
> sent to the mailing list.

Eventually it would be nice to allow any submaintainer
to send a merge request to the CI system (though you would
want it to have a "but don't apply until somebody else approves it"
gate as well as the automated testing part). But right now all
we need is for the one person managing merges and releases
to be able to say "here's the branch where I merged this
pullrequest, please test it". At any rate, supporting multiple
submaintainers all talking to the CI independently should be
out of scope for now.

> Multi-maintainer model
> ~~~~~~~~~~~~~~~~~~~~~~
>
> The previous section already introduced some of the proposed workflow
> that can enable such a multi-maintainer model.  With a Gating CI
> system, though, it will be natural to have a smaller "Mean time
> between (CI) failures", simply because of the expected increased
> number of systems and checks.  A lot of countermeasures have to be
> employed to keep that MTBF in check.
>
> For once, it's imperative that the maintainers for such systems and
> jobs are clearly defined and readily accessible.  Either the same
> MAINTAINERS file or a more suitable variation of such data should be
> defined before activating the *gating* rules.  This would allow a
> routing to request the attention of the maintainer responsible.
>
> In case of unresponsive maintainers, or any other condition that
> renders and keeps one or more CI jobs failing for a given previously
> established amount of time, the job can be demoted with an
> "allow_failure" configuration[7].  Once such a change is commited, the
> path to promotion would be just the same as in a newly added job
> definition.
>
> Note: In a future phase we can evaluate the creation of rules that
> look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> and the execution of specific CI jobs, which would be the
> responsibility of a given maintainer[8].

All this stuff is not needed to start with. We cope at the
moment with "everything is gating, and if something doesn't
pass it needs to be fixed or manually removed from the setup".

> GitLab Jobs and Pipelines
> -------------------------
>
> GitLab CI is built around two major concepts: jobs and pipelines.  The
> current GitLab CI configuration in QEMU uses jobs only (or putting it
> another way, all jobs in a single pipeline stage).  Consider the
> following job definition[9]:
>
>    build-tci:
>     script:
>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>     - ./configure --enable-tcg-interpreter
>          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>     - make -j2
>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>     - for tg in $TARGETS ; do
>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>         ./tests/boot-serial-test || exit 1 ;
>         ./tests/cdrom-test || exit 1 ;
>       done
>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
>
> All the lines under "script" are performed sequentially.  It should be
> clear that there's the possibility of breaking this down into multiple
> stages, so that a build happens first, and then "common set of tests"
> run in parallel.

We could do this, but we don't do it today, so we don't need
to think about this at all to start with.

> In theory, there's nothing that prevents an entire QEMU build
> directory, to be treated as an artifact.  In practice, there are
> predefined limits on GitLab that prevents that from being possible,

...so we don't need to worry about somehow defining some
cut-down "build artefact" that we provide to the testing
phase. Just do a build and test run as a single thing.
We can always come back and improve later.


Have you been able to investigate and confirm that we can
get a gitlab-runner setup that works on non-x86 ? That seems
to me like an important thing we should be confident about
early before we sink too much effort into a gitlab-based
solution.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-03 14:07 ` Alex Bennée
@ 2019-12-04  8:55   ` Thomas Huth
  2019-12-06 19:03   ` Cleber Rosa
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2019-12-04  8:55 UTC (permalink / raw)
  To: Alex Bennée, Cleber Rosa
  Cc: Peter Maydell, Stefan Hajnoczi, Markus Armbruster,
	Wainer dos Santos Moschetta, qemu-devel, Jeff Nelson,
	Philippe Mathieu-Daudé,
	Ademar Reis

On 03/12/2019 15.07, Alex Bennée wrote:
[...]
>> GitLab Jobs and Pipelines
>> -------------------------
>>
>> GitLab CI is built around two major concepts: jobs and pipelines.  The
>> current GitLab CI configuration in QEMU uses jobs only (or putting it
>> another way, all jobs in a single pipeline stage).

Yeah, the initial gitlab-ci.yml file was one of the very first YAML files
and one of the very first CI files that I wrote, with hardly any experience
in this area ... there is definitely a lot of room for improvement here!

>>  Consider the
>> following job definition[9]:
>>
>>    build-tci:
>>     script:
>>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>>     - ./configure --enable-tcg-interpreter
>>          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>>     - make -j2
>>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>>     - for tg in $TARGETS ; do
>>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>>         ./tests/boot-serial-test || exit 1 ;
>>         ./tests/cdrom-test || exit 1 ;
>>       done
>>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
>>
>> All the lines under "script" are performed sequentially.  It should be
>> clear that there's the possibility of breaking this down into multiple
>> stages, so that a build happens first, and then "common set of tests"
>> run in parallel.  Using the example above, it would look something
>> like:
>>
>>    +---------------+------------------------+
>>    |  BUILD STAGE  |        TEST STAGE      |
>>    +---------------+------------------------+
>>    |   +-------+   |  +------------------+  |
>>    |   | build |   |  | boot-serial-test |  |
>>    |   +-------+   |  +------------------+  |
>>    |               |                        |
>>    |               |  +------------------+  |
>>    |               |  | cdrom-test       |  |
>>    |               |  +------------------+  |
>>    |               |                        |
>>    |               |  +------------------+  |
>>    |               |  | x86_64-pxe-test  |  |
>>    |               |  +------------------+  |
>>    |               |                        |
>>    |               |  +------------------+  |
>>    |               |  | s390x-pxe-test   |  |
>>    |               |  +------------------+  |
>>    |               |                        |
>>    +---------------+------------------------+
>>
>> Of course it would be silly to break down that job into smaller jobs that
>> would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
>> the pipeline approach is valid because:
>>
>>  * Common set of tests would run in parallel, giving a quicker result
>>    turnaround

Ok, full ack for the idea to use separate pipelines for the testing
(Philippe once showed me this idea already, too; he's using it for EDK2
testing IIRC). But the example with the build-tci is quite bad. The
single steps here are basically just a subset of "check-qtest" to skip
the tests that we are not interested in here. If we don't care about
losing some minutes of testing, we can simply replace all those steps
with "make check-qtest" again.

I think what we really want to put into different pipelines are the
sub-steps of "make check", i.e.:

- check-block
- check-qapi-schema
- check-unit
- check-softfloat
- check-qtest
- check-decodetree

And of course also the other ones that are not included in "make check"
yet, e.g. "check-acceptance" etc.

> check-unit is a good candidate for parallel tests. The others depend -
> I've recently turned most make check's back to -j 1 on travis because
> it's a real pain to see what test has hung when other tests keep
> running.

If I understood correctly, it's not about running the check steps in
parallel with "make -jXX" in one pipeline, but rather about running the
different test steps in different pipelines. So you get a separate
output for each test subsystem.

>> Current limitations for a multi-stage pipeline
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Because it's assumed that each job will happen in an isolated and
>> independent execution environment, jobs must explicitly define the
>> resources that will be shared between stages.  GitLab will make sure
>> the same source code revision will be available on all jobs
>> automatically.  Additionally, GitLab supports the concept of artifacts.
>> By defining artifacts in the "build" stage, jobs in the "test" stage
>> can expect to have a copy of those artifacts automatically.
>>
>> In theory, there's nothing that prevents an entire QEMU build
>> directory, to be treated as an artifact.  In practice, there are
>> predefined limits on GitLab that prevents that from being possible,
>> resulting in errors such as:
>>
>>    Uploading artifacts...
>>    build: found 3164 matching files                   
>>    ERROR: Uploading artifacts to coordinator... too large archive
>>           id=xxxxxxx responseStatus=413 Request Entity Too Large
>>           status=413 Request Entity Too Large token=yyyyyyyyy
>>    FATAL: too large                                   
>>    ERROR: Job failed: exit code 1
>>
>> As far as I can tell, this is an instance-defined limit that's clearly
>> influenced by storage costs.  I see a few possible solutions to this
>> limitation:
>>
>>  1) Provide our own "artifact" like solution that uses our own storage
>>     solution
>>
>>  2) Reduce or eliminate the dependency on a complete build tree
>>
>> The first solution can go against the general trend of not having to
>> maintain CI infrastructure.  It could be made simpler by using cloud
>> storage, but there would still be some interaction with another
>> external infrastructure component.
>>
>> I find the second solution preferable, given that most tests depend
>> on having one or a few binaries available.  I've run multi-stage
>> pipelines with some of those binaries (qemu-img,
>> $target-softmmu/qemu-system-$target) defined as artifacts and they
>> behaved as expected.  But, this could require some intrusive changes
>> to the current "make"-based test invocation.

I think it should be sufficient to define a simple set of artifacts like:

- tests/*
- *-softmmu/qemu-system-*
- qemu-img, qemu-nbd ... and all the other helper binaries
- Makefile*

... and maybe some more missing files. It's some initial work, but once
we have the basic list, I don't expect to change it much in the course
of time.
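
In ".gitlab-ci.yml" terms that would be roughly something like the
following (the exact path list is just a first guess and would need
to be checked against what the test jobs actually consume):

   build:
     stage: build
     script:
       - ./configure
       - make -j2
     artifacts:
       paths:
         - tests/
         - "*-softmmu/qemu-system-*"
         - qemu-img
         - qemu-nbd
         - Makefile*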

 Thomas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-03 17:54 ` Peter Maydell
@ 2019-12-05  5:05   ` Cleber Rosa
  0 siblings, 0 replies; 22+ messages in thread
From: Cleber Rosa @ 2019-12-05  5:05 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jeff Nelson, QEMU Developers, Wainer dos Santos Moschetta,
	Markus Armbruster, Stefan Hajnoczi, Alex Bennée,
	Ademar Reis

On Tue, Dec 03, 2019 at 05:54:38PM +0000, Peter Maydell wrote:
> On Mon, 2 Dec 2019 at 14:06, Cleber Rosa <crosa@redhat.com> wrote:
> >
> > RFC: QEMU Gating CI
> > ===================
> >
> > This RFC attempts to address most of the issues described in
> > "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> > QEMU CI as we enter 4.0"[2].
> >
> > The general approach is one to minimize the infrastructure maintenance
> > and development burden, leveraging as much as possible "other people's"
> > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > component dealt with here.
> 
> Thanks for writing up this RFC.
> 
> My overall view is that there's some interesting stuff in
> here and definitely some things we'll want to cover at some
> point, but there's also a fair amount that is veering away
> from solving the immediate problem we want to solve, and
> which we should thus postpone for later (beyond making some
> reasonable efforts not to design something which paints us
> into a corner so it's annoyingly hard to improve later).
>

Right.  I think this is a valid perspective to consider as we define
the order and scope of tasks.  I'll follow up with a more
straightforward suggestion with the bare minimum actions for a first
round.

> > To exemplify my point, if one specific test run as part of "check-tcg"
> > is found to be faulty on a specific job (say on a specific OS), the
> > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > action.  Of course a follow up action to deal with the specific test
> > is required, probably in the form of a Launchpad bug and patches
> > dealing with the issue, but without necessarily a CI related angle to
> > it.
> >
> > If/when test result presentation and control mechanism evolve, we may
> > feel confident and go into finer grained granularity.  For instance, a
> > mechanism for disabling nothing but "tests/migration-test" on a given
> > environment would be possible and desirable from a CI management level.
> 
> For instance, we don't have anything today for granularity of
> definition of what tests we run where or where we disable them.
> So we don't need it in order to move away from the scripting
> approach I have at the moment. We can just say "the CI system
> will run make and make check (and maybe in some hosts some
> additional test-running commands) on these hosts" and hardcode
> that into whatever yaml file the CI system's configured in.
>

I absolutely agree.  That's why I even considered *if* this will be done,
and not only *when*.  Because I happen to be biased from working on a
test runner/framework, this is something that I had to at least talk
about, so that it can be evaluated and maybe turned into a goal.

> > Pre-merge
> > ~~~~~~~~~
> >
> > The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> > Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> > MR comes from individual contributors, usually the authors of the
> > changes themselves.  It's my understanding that the current maintainer
> > model employed in QEMU will *not* change at this time, meaning that
> > code contributions and reviews will continue to happen on the mailing
> > list.  A maintainer then, having collected a number of patches, would
> > submit a MR either in addition or in substitution to the Pull Requests
> > sent to the mailing list.
> 
> Eventually it would be nice to allow any submaintainer
> to send a merge request to the CI system (though you would
> want it to have a "but don't apply until somebody else approves it"
> gate as well as the automated testing part). But right now all
> we need is for the one person managing merges and releases
> to be able to say "here's the branch where I merged this
> pullrequest, please test it". At any rate, supporting multiple
> submaintainers all talking to the CI independently should be
> out of scope for now.
>

OK, noted.

> > Multi-maintainer model
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > The previous section already introduced some of the proposed workflow
> > that can enable such a multi-maintainer model.  With a Gating CI
> > system, though, it will be natural to have a smaller "Mean time
> > between (CI) failures", simply because of the expected increased
> > number of systems and checks.  A lot of countermeasures have to be
> > employed to keep that MTBF in check.
> >
> > For once, it's imperative that the maintainers for such systems and
> > jobs are clearly defined and readily accessible.  Either the same
> > MAINTAINERS file or a more suitable variation of such data should be
> > defined before activating the *gating* rules.  This would allow a
> > routing to request the attention of the maintainer responsible.
> >
> > In case of unresponsive maintainers, or any other condition that
> > renders and keeps one or more CI jobs failing for a given previously
> > established amount of time, the job can be demoted with an
> > "allow_failure" configuration[7].  Once such a change is commited, the
> > path to promotion would be just the same as in a newly added job
> > definition.
> >
> > Note: In a future phase we can evaluate the creation of rules that
> > look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> > and the execution of specific CI jobs, which would be the
> > responsibility of a given maintainer[8].
> 
> All this stuff is not needed to start with. We cope at the
> moment with "everything is gating, and if something doesn't
> pass it needs to be fixed or manually removed from the setup".
>

OK, I get your point.  I think it's fair to say, though, that one
big motivation we also have for this work is to be able to bring
new machines and jobs into the Gating CI in the very near
future.  And to do that, we must set common rules so that anyone
else can do the same and abide by the same terms.

> > GitLab Jobs and Pipelines
> > -------------------------
> >
> > GitLab CI is built around two major concepts: jobs and pipelines.  The
> > current GitLab CI configuration in QEMU uses jobs only (or putting it
> > another way, all jobs in a single pipeline stage).  Consider the
> > following job definition[9]:
> >
> >    build-tci:
> >     script:
> >     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
> >     - ./configure --enable-tcg-interpreter
> >          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
> >     - make -j2
> >     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
> >     - for tg in $TARGETS ; do
> >         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
> >         ./tests/boot-serial-test || exit 1 ;
> >         ./tests/cdrom-test || exit 1 ;
> >       done
> >     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
> >     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
> >
> > All the lines under "script" are performed sequentially.  It should be
> > clear that there's the possibility of breaking this down into multiple
> > stages, so that a build happens first, and then "common set of tests"
> > run in parallel.
> 
> We could do this, but we don't do it today, so we don't need
> to think about this at all to start with.
>

So, in your opinion, this is phase >= 1 material.  Noted.

> > In theory, there's nothing that prevents an entire QEMU build
> > directory, to be treated as an artifact.  In practice, there are
> > predefined limits on GitLab that prevents that from being possible,
> 
> ...so we don't need to worry about somehow defining some
> cut-down "build artefact" that we provide to the testing
> phase. Just do a build and test run as a single thing.
> We can always come back and improve later.
> 
> 
> Have you been able to investigate and confirm that we can
> get a gitlab-runner setup that works on non-x86 ? That seems
> to me like an important thing we should be confident about
> early before we sink too much effort into a gitlab-based
> solution.
>

I've successfully built gitlab-runner and run jobs on aarch64, ppc64le
and s390x.  The binaries are available here:

   https://cleber.fedorapeople.org/gitlab-runner/v12.4.1/

But, with the "shell" executor (given that Docker helper images are
not available for those architectures).  I don't think we'd have to
depend on GitLab providing those images though, it should be possible
to create them for different architectures and tweak the gitlab-runner
code to use different image references on those architectures.
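
For reference, registering one of those machines as a shell-executor
runner is a single command; the URL, token, description and tags
below are placeholders:

   gitlab-runner register \
     --non-interactive \
     --url https://gitlab.com/ \
     --registration-token <TOKEN> \
     --executor shell \
     --description "qemu-aarch64-ubuntu18.04" \
     --tag-list "aarch64,ubuntu18.04"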

Does this answer this specific question?

Best,
- Cleber.

> thanks
> -- PMM
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-03 14:07 ` Alex Bennée
  2019-12-04  8:55   ` Thomas Huth
@ 2019-12-06 19:03   ` Cleber Rosa
  1 sibling, 0 replies; 22+ messages in thread
From: Cleber Rosa @ 2019-12-06 19:03 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, Stefan Hajnoczi, qemu-devel,
	Wainer dos Santos Moschetta, Markus Armbruster, Jeff Nelson,
	Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 27651 bytes --]

On Tue, Dec 03, 2019 at 02:07:32PM +0000, Alex Bennée wrote:
> 
> Cleber Rosa <crosa@redhat.com> writes:
> 
> > RFC: QEMU Gating CI
> > ===================
> >
> > This RFC attempts to address most of the issues described in
> > "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> > QEMU CI as we enter 4.0"[2].
> >
> > The general approach is one to minimize the infrastructure maintenance
> > and development burden, leveraging as much as possible "other people's"
> > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > component dealt with here.
> >
> > Problem Statement
> > -----------------
> >
> > The following is copied verbatim from Peter Maydell's write up[1]:
> >
> > "A gating CI is a prerequisite to having a multi-maintainer model of
> > merging. By having a common set of tests that are run prior to a merge
> > you do not rely on who is currently doing merging duties having access
> > to the current set of test machines."
> >
> > This is of a very simplified view of the problem that I'd like to break
> > down even further into the following key points:
> >
> >  * Common set of tests
> >  * Pre-merge ("prior to a merge")
> >  * Access to the current set of test machines
> >  * Multi-maintainer model
> >
> > Common set of tests
> > ~~~~~~~~~~~~~~~~~~~
> >
> > Before we delve any further, let's make it clear that a "common set of
> > tests" is really a "dynamic common set of tests".  My point is that a
> > set of tests in QEMU may include or exclude different tests depending
> > on the environment.
> >
> > The exact tests that will be executed may differ depending on the
> > environment, including:
> >
> >  * Hardware
> >  * Operating system
> >  * Build configuration
> >  * Environment variables
> >
> > In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> > those "common set of tests":
> >
> >  * check
> 
> Check encompasses a subset of the other checks - currently:
> 
>  - check-unit
>  - check-qtest
>  - check-block
>
> The thing that stops other groups of tests being included is generally
> whether they are solid on all the various hw/os/config/env setups you
> describe.
> For example check-tcg currently fails gloriously on non-x86 with docker
> enabled as it tries to get all the cross compiler images working.
>

Right.

> >  * check-tcg
> >  * check-softfloat
> >  * check-block
> >  * check-acceptance
> >
> > While Peter mentions that most of his checks are limited to:
> >
> >  * check
> >  * check-tcg
> >
> > Our current inability to quickly identify a faulty test from test
> > execution results (and specially in remote environments), and act upon
> > it (say quickly disable it on a given host platform), makes me believe
> > that it's fair to start a gating CI implementation that uses this
> > rather coarse granularity.
> >
> > Another benefit is a close or even a 1:1 relationship between a common
> > test set and an entry in the CI configuration.  For instance, the
> > "check" common test set would map to a "make check" command in a
> > "script:" YAML entry.
> >
> > To exemplify my point, if one specific test run as part of "check-tcg"
> > is found to be faulty on a specific job (say on a specific OS), the
> > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > action.
> 
> This would in this example eliminate practically all emulation testing
> apart from the very minimal boot-codes that get spun up by the various
> qtest migration tests. And of course the longer a group of tests is
> disabled the larger the window for additional regressions to get in.
> 
> It may be a reasonable approach but it's not without consequence.
>

Thanks for the insights.  I agree that there are tradeoffs, and I bet
that I'm speculating a lot, and there's just too much to be learned here.

My main point in proposing this very crude rule, though, is trying to
address the operational difficulties when such a Gating CI grows
beyond a single *machine* and *job* maintainer.  A practical example
may help.  Let's consider that the following jobs are initially (in phase
1) active and *gating*:

 * test-qtest-ubuntu19.04-x86_64
 * test-qtest-ubuntu19.04-aarch64
 * test-qtest-ubuntu19.04-s390x

Then, because the model has proven successful, a new job that has
already been running for a while with successful results, but with no
influence on gating, is added to the gating group.  This job is being
run on a machine that is managed by a different maintainer:

 * test-qtest-centos8.0-ppc64le

After some time, the test-qtest-centos8.0-ppc64le job starts to fail,
with seemingly no relation to recently merged code.  From a CI
management perspective, disabling the job completely is reasonable if:

 * the machine seems to be faulty
 * ppc64le machine maintainer is unresponsive
 * there's no mechanism to disable a portion of the job (such as a
   specific test)
 * bug has been found but there's no short-term fix

This doesn't mean that a lot of tests will be eliminated for good.
Unless the machine is faulty, it's expected that the tests will
continue to run, but not with Gating powers.  Also, it's expected that
the same (or similar) tests will be running on other machines/jobs.

IMO, it can actually be the opposite: "skip test y on platform x"
conditions hidden in test code can survive a lot longer than a
disabled job/machine with a frustrated (and engaged) maintainer trying
to get it back to a "green" status, and then to a reliable status for
a given time so that it can be considered a gating job again.

> > Of course a follow up action to deal with the specific test
> > is required, probably in the form of a Launchpad bug and patches
> > dealing with the issue, but without necessarily a CI related angle to
> > it.
> >
> > If/when test result presentation and control mechanism evolve, we may
> > feel confident and go into finer grained granularity.  For instance, a
> > mechanism for disabling nothing but "tests/migration-test" on a given
> > environment would be possible and desirable from a CI management
> > level.
> 
> The migration tests have found regressions although the problem has
> generally been they were intermittent failures and hard to reproduce
> locally. The last one took a few weeks of grinding to reproduce and get
> patches together.
>

Right.  So I believe we are in sync with the nature of the problem,
that is, that some tests would benefit from individually being pulled
from specific jobs until a permanent solution can be applied to them.

At the same time, if we can't do that (see the conditions that may
render us unable to do it), it would be fair to remove a CI job from
being gating.

> > Pre-merge
> > ~~~~~~~~~
> >
> > The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> > Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> > MR comes from individual contributors, usually the authors of the
> > changes themselves.  It's my understanding that the current maintainer
> > model employed in QEMU will *not* change at this time, meaning that
> > code contributions and reviews will continue to happen on the mailing
> > list.  A maintainer then, having collected a number of patches, would
> > submit a MR either in addition or in substitution to the Pull Requests
> > sent to the mailing list.
> >
> > "Pipelines for Merged Results"[4] is a very important feature to
> > support the multi-maintainer model, and looks in practice, similar to
> > Peter's "staging" branch approach, with an "automatic refresh" of the
> > target branch.  It can give a maintainer extra confidence that a MR
> > will play nicely with the updated status of the target branch.  It's
> > my understanding that it should be the "key to the gates".  A minor
> > note is that conflicts are still possible in a multi-maintainer model
> > if there are more than one person doing the merges.
> >
> > A worthy point is that the GitLab web UI is not the only way to create
> > a Merge Request, but a rich set of APIs are available[5].  This is
> > interesting for many reasons, and maybe some of Peter's
> > "apply-pullreq"[6] actions (such as bad UTF8 or bogus qemu-devel email
> > addresses checks could be made earlier) as part of a
> > "send-mergereq"-like script, bringing conformance earlier on the merge
> > process, at the MR creation stage.
> >
> > Note: It's possible to have CI jobs definition that are specific to
> > MR, allowing generic non-MR jobs to be kept on the default
> > configuration.  This can be used so individual contributors continue
> > to leverage some of the "free" (shared) runner made available on
> > gitlab.com.
> >
> > Multi-maintainer model
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > The previous section already introduced some of the proposed workflow
> > that can enable such a multi-maintainer model.  With a Gating CI
> > system, though, it will be natural to have a smaller "Mean time
> > between (CI) failures", simply because of the expected increased
> > number of systems and checks.  A lot of countermeasures have to be
> > employed to keep that MTBF in check.
> >
> > For once, it's imperative that the maintainers for such systems and
> > jobs are clearly defined and readily accessible.  Either the same
> > MAINTAINERS file or a more suitable variation of such data should be
> > defined before activating the *gating* rules.  This would allow a
> > routing to request the attention of the maintainer responsible.
> >
> > In case of unresponsive maintainers, or any other condition that
> > renders and keeps one or more CI jobs failing for a given previously
> > established amount of time, the job can be demoted with an
> > "allow_failure" configuration[7].  Once such a change is commited, the
> > path to promotion would be just the same as in a newly added job
> > definition.
> >
> > Note: In a future phase we can evaluate the creation of rules that
> > look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> > and the execution of specific CI jobs, which would be the
> > responsibility of a given maintainer[8].
> >
> > Access to the current set of test machines
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > When compared to the various CI systems and services already being
> > employed in the QEMU project, this is the most striking difference in
> > the proposed model.  Instead of relying on shared/public/free
> > resources, this proposal also deals with privately owned and
> > operated machines.
> >
> > Even though the QEMU project operates with great cooperation, it's
> > crucial to define clear boundaries when it comes to machine access.
> > Restricted access to machines is important because:
> >
> >  * The results of jobs are many times directly tied to the setup and
> >    status of machines.  Even "soft" changes such as removing or updating
> >    packages can introduce failures in jobs (this is greatly minimized
> >    but not completely eliminated when using containers or VMs).
> >    Updating firmware or changing its settings are also examples of
> >    changes that may change the outcome of jobs.
> >
> >  * If maintainers will be held accountable for the status of the jobs defined
> >    to run on specific machines, they must be sure of the status of the
> >    machines.
> >
> >  * Machines need regular monitoring and will receive required
> >    maintenance actions which can cause job regressions.
> >
> > Thus, there needs to be one clear way for machines to be *used* for
> > running jobs sent by different maintainers, while still prohibiting
> > any other privileged action that can cause permanent change to the
> > machine.  The GitLab agent (gitlab-runner) is designed to do just
> > that, and defining what will be executed in a job (in a given system)
> > should be all that's generally allowed.  The job definition itself,
> > will of course be subject to code review before a maintainer decides
> > to send a MR containing such new or updated job definitions.
> >
> > Still related to machine maintenance, it's highly desirable for jobs
> > tied to specific host machines to be introduced alongside with
> > documentation and/or scripts that can replicate the machine setup.  If
> > the machine setup steps can be easily and reliably reproduced, then:
> >
> >  * Other people may help to debug issues and regressions if they
> >    happen to have the same hardware available
> >
> >  * Other people may provide more machines to run the same types of
> >    jobs
> >
> >  * If a machine maintainer goes MIA, it'd be easier to find another
> >    maintainer
> >
> > GitLab Jobs and Pipelines
> > -------------------------
> >
> > GitLab CI is built around two major concepts: jobs and pipelines.  The
> > current GitLab CI configuration in QEMU uses jobs only (or putting it
> > another way, all jobs in a single pipeline stage).  Consider the
> > following job definition[9]:
> >
> >    build-tci:
> >     script:
> >     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
> >     - ./configure --enable-tcg-interpreter
> >          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
> >     - make -j2
> >     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
> >     - for tg in $TARGETS ; do
> >         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
> >         ./tests/boot-serial-test || exit 1 ;
> >         ./tests/cdrom-test || exit 1 ;
> >       done
> >     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
> >     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
> >
> > All the lines under "script" are performed sequentially.  It should be
> > clear that there's the possibility of breaking this down into multiple
> > stages, so that a build happens first, and then "common set of tests"
> > run in parallel.  Using the example above, it would look something
> > like:
> >
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | boot-serial-test |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | cdrom-test       |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | x86_64-pxe-test  |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-pxe-test   |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    +---------------+------------------------+
> >
> > Of course it would be silly to break down that job into smaller jobs that
> > would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
> > the pipeline approach is valid because:
> >
> >  * Common set of tests would run in parallel, giving a quicker result
> >    turnaround
> 
> check-unit is a good candidate for parallel tests. The others depend -
> I've recently turned most make check's back to -j 1 on travis because
> it's a real pain to see what test has hung when other tests keep
> running.
>

Agreed.  Running tests in parallel while keeping/presenting traceable
results is a real problem, especially in a remote environment such as
a CI.

FYI, slightly off-topic and Avocado-specific: Avocado keeps each test
result in a separate directory and log file, which helps with that.
I'm bridging the parallel test runner (introduced a few releases back)
to the existing result/reporting infrastructure.  My goal is to have
that running the acceptance tests shortly.

> >
> >  * It's easier to determine the possible nature of the problem with
> >    just the basic CI job status
> >
> >  * Different maintainers could be defined for different "common set of
> >    tests", and again by leveraging the basic CI job status, automation
> >    for directed notification can be implemented
> >
> > In the following example, "check-block" maintainers could be left
> > undisturbed with failures in the "check-acceptance" job:
> >
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | check-block      |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | check-acceptance |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    +---------------+------------------------+
> >
> > The same logic applies for test sets for different targets.  For
> > instance, combining the two previous examples, there could be different
> > maintainers defined for the different jobs on the test stage:
> >
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | x86_64-block     |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | x86_64-acceptance|  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-block      |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-acceptance |  |
> >    |               |  +------------------+  |
> >    +---------------+------------------------+
> >
> > Current limitations for a multi-stage pipeline
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Because it's assumed that each job will happen in an isolated and
> > independent execution environment, jobs must explicitly define the
> > resources that will be shared between stages.  GitLab will make sure
> > the same source code revision will be available on all jobs
> > automatically.  Additionally, GitLab supports the concept of artifacts.
> > By defining artifacts in the "build" stage, jobs in the "test" stage
> > can expect to have a copy of those artifacts automatically.
> >
> > In theory, there's nothing that prevents an entire QEMU build
> > directory, to be treated as an artifact.  In practice, there are
> > predefined limits on GitLab that prevents that from being possible,
> > resulting in errors such as:
> >
> >    Uploading artifacts...
> >    build: found 3164 matching files                   
> >    ERROR: Uploading artifacts to coordinator... too large archive
> >           id=xxxxxxx responseStatus=413 Request Entity Too Large
> >           status=413 Request Entity Too Large token=yyyyyyyyy
> >    FATAL: too large                                   
> >    ERROR: Job failed: exit code 1
> >
> > As far as I can tell, this is an instance-defined limit that's clearly
> > influenced by storage costs.  I see a few possible solutions to this
> > limitation:
> >
> >  1) Provide our own "artifact" like solution that uses our own storage
> >     solution
> >
> >  2) Reduce or eliminate the dependency on a complete build tree
> >
> > The first solution can go against the general trend of not having to
> > maintain CI infrastructure.  It could be made simpler by using cloud
> > storage, but there would still be some interaction with another
> > external infrastructure component.
> >
> > I find the second solution preferable, given that most tests depend
> > on having one or a few binaries available.  I've run multi-stage
> > pipelines with some of those binaries (qemu-img,
> > $target-softmmu/qemu-system-$target) defined as artifacts and they
> > behaved as expected.  But, this could require some intrusive changes
> > to the current "make"-based test invocation.
> 
> It would be nice if the make check could be run with a make install'ed
> set of binaries. I'm not sure how much hackery would be required to get
> that to work nicely. Does specifying QEMU and QEMU_IMG prevent make
> trying to re-build everything in situ?
>

At this point, I don't know how hard it'd be.  I'll certainly give it
a try.  Thomas has provided some extra info in another response to
this thread too.

> >
> > Job Naming convention
> > ---------------------
> >
> > Based only on the very simple example jobs above, it should already be
> > clear that there's a lot of possibility for confusion and chaos.  For
> > instance, by looking at the "build" job definition or results, it's
> > very hard to tell what it's really about.  A bit more could be inferred by
> > the "x86_64-block" job name.
> >
> > Still, the problem we have to address here is not only about the
> > amount of information easily obtained from a job name, but allowing
> > for very similar job definitions within a global namespace.  For
> > instance, if we add an Operating Systems component to the mix, we need
> > an extra qualifier for unique job names.
> >
> > Some of the possible components in a job definition are:
> >
> >  * Stage
> >  * Build profile
> >  * Test set (a shorter name for what was described in the "Common set
> >    of tests" section)
> >  * Host architecture
> >  * Target architecture
> >  * Host Operating System identification (name and version)
> >  * Execution mode/environment (bare metal, container, VM, etc)
> >
> > Stage
> > ~~~~~
> >
> > The stage of a job (which maps roughly to its purpose) should be
> > clearly defined.  A job that builds QEMU should start with "build" and
> > a job that tests QEMU should start with "test".
> >
> > IMO, in a second phase, once multi-stage pipelines are taken for
> > granted, we could evaluate dropping this component altogether from the
> > naming convention, and relying purely on the stage classification.
> >
> > Build profile
> > ~~~~~~~~~~~~~
> >
> > Different build profiles already abound in QEMU's various CI
> > configuration files.  It's hard to put a naming convention here,
> > except that it should represent the most distinguishable
> > characteristics of the build configuration.  For instance, we can find
> > a "build-disabled" job in the current ".gitlab-ci.yml" file that is
> > aptly named, as it forcefully disables a lot of build options.
> >
> > Test set
> > ~~~~~~~~
> >
> > As mentioned in the "Common set of tests" section, I believe that the
> > make target name can be used to identify the test set that will be
> > executed in a job.  That is, if a job is to be run at the "test"
> > stage, and will run "make check", its name should start with
> > "test-check".
> >
> > QEMU Targets
> > ~~~~~~~~~~~~
> >
> > Because a given job could, and usually does, involve multiple targets, I
> > honestly can not think of how to add this to the naming convention.
> > I'll ignore it for now, and consider the targets are defined in the
> > build profile.
> 
> I like to think of three groups:
> 
>   Core SoftMMU - the major KVM architectures
>   The rest of SoftMMU - all our random emulation targets
>   linux-user
>

OK, makes sense.  It'd be nice to know if others share the same general
idea.  I'll check how pervasive this general definition is in the
documentation too.

> >
> > Host Architecture
> > ~~~~~~~~~~~~~~~~~
> >
> > The host architecture name convention should be an easy pick, given
> > that QEMU itself employs an architecture convention for its targets.
> >
> > Host OS
> > ~~~~~~~
> >
> > The suggestion I have for the host OS name is to follow the
> > libosinfo[10] convention as closely as possible.  libosinfo's "Short
> > ID" should be well suitable here.  Examples include: "openbsd4.2",
> > "opensuse42.3", "rhel8.0", "ubuntu9.10" and "win2k12r2".
> >
> > Execution Environment
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > Distinguishing between running tests in a bare-metal versus a nested
> > VM environment is quite significant to a number of people.
> >
> > Still, I think it could probably be optional for the initial
> > implementation phase, like the naming convention for the QEMU Targets.
> >
> > Example 1
> > ~~~~~~~~~
> >
> > Defining a job that will build QEMU with common debug options, on
> > a RHEL 8.0 system on a x86_64 host:
> >
> >    build-debug-rhel8.0-x86_64:
> >      script:
> >        - ./configure --enable-debug
> >        - make
> >
> > Example 2
> > ~~~~~~~~~
> >
> > Defining a job that will run the "qtest" test set on a NetBSD 8.1
> > system on an aarch64 host:
> >
> >    test-qtest-netbsd8.1-aarch64:
> >      script:
> >        - make check-qtest
> >
> > Job and Machine Scheduling
> > --------------------------
> >
> > While the naming convention gives some information to human beings,
> > and hopefully allows for some order and avoids collisions on the
> > global job namespace, it's not enough to define where those jobs
> > should run.
> >
> > Tags[11] is the available mechanism to tie jobs to specific machines
> > running the GitLab CI agent, "gitlab-runner".  Unfortunately, some
> > duplication seems unavoidable, in the sense that some of the naming
> > components listed above are machine specific, and will then need to be
> > also given as tags.
> >
> > Note: it may be a good idea to be extra verbose with tags, by having a
> > qualifier prefix.  The justification is that tags also live in a
> > global namespace, and in theory, at a given point, tags of different
> > "categories", say a CPU name and Operating System name may collide.
> > Or, it may just be me being paranoid.
> >
> > Example 1
> > ~~~~~~~~~
> >
> >    build-debug-rhel8.0-x86_64:
> >      tags:
> >        - rhel8.0
> >        - x86_64
> >      script:
> >        - ./configure --enable-debug
> >        - make
> >
> > Example 2
> > ~~~~~~~~~
> >
> >    test-qtest-netbsd8.1-aarch64:
> >      tags:
> >        - netbsd8.1
> >        - aarch64
> >      script:
> >        - make check-qtest
> 
> Where are all these going to go? Are we overloading the existing
> gitlab.yml or are we going to have a new set of configs for the GatingCI
> and keep gitlab.yml as the current subset that people run on their own
> accounts?
>

These will have to go into the existing ".gitlab-ci.yml" file, because
current GitLab has no support for multiple pipelines, and no support
for multiple "gitlab.yml" files.  That's one of the reasons why I took
the time to describe the proposal: normalizing the current
file to receive extra jobs is, if not strictly necessary, highly desirable.
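
One way to keep such gating jobs from also running on the shared
runners used by individual contributors is to restrict them to the
branch that receives the merges; the job name, branch and tags below
are just for illustration:

   test-check-ubuntu18.04-x86_64:
     only:
       - staging
     tags:
       - ubuntu18.04
       - x86_64
     script:
       - make check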

Thanks for the feedback and insights,
- Cleber.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2019-12-02 14:05 [RFC] QEMU Gating CI Cleber Rosa
                   ` (2 preceding siblings ...)
  2019-12-03 17:54 ` Peter Maydell
@ 2020-01-17 14:33 ` Peter Maydell
  2020-01-21 20:00   ` Cleber Rosa
  2020-02-03  3:27   ` Cleber Rosa
  3 siblings, 2 replies; 22+ messages in thread
From: Peter Maydell @ 2020-01-17 14:33 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Jeff Nelson, QEMU Developers, Wainer dos Santos Moschetta,
	Markus Armbruster, Stefan Hajnoczi, Alex Bennée,
	Ademar Reis

On Mon, 2 Dec 2019 at 14:06, Cleber Rosa <crosa@redhat.com> wrote:
>
> RFC: QEMU Gating CI
> ===================
>
> This RFC attempts to address most of the issues described in
> "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> QEMU CI as we enter 4.0"[2].
>
> The general approach is one to minimize the infrastructure maintenance
> and development burden, leveraging as much as possible "other people's"
> infrastructure and code.  GitLab's CI/CD platform is the most relevant
> component dealt with here.

Happy New Year! Now we're in 2020, any chance of an update on
plans/progress here? I would very much like to be able to hand
processing of pull requests over to somebody else after the
5.0 cycle, if not before. (I'm quite tempted to make that a
hard deadline and just say that somebody else will have to
pick it up for 5.1, regardless...)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-01-17 14:33 ` Peter Maydell
@ 2020-01-21 20:00   ` Cleber Rosa
  2020-02-03  3:27   ` Cleber Rosa
  1 sibling, 0 replies; 22+ messages in thread
From: Cleber Rosa @ 2020-01-21 20:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jeff Nelson, QEMU Developers, Wainer dos Santos Moschetta,
	Markus Armbruster, Stefan Hajnoczi, Alex Bennée,
	Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 1986 bytes --]

On Fri, Jan 17, 2020 at 02:33:54PM +0000, Peter Maydell wrote:
> On Mon, 2 Dec 2019 at 14:06, Cleber Rosa <crosa@redhat.com> wrote:
> >
> > RFC: QEMU Gating CI
> > ===================
> >
> > This RFC attempts to address most of the issues described in
> > "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> > QEMU CI as we enter 4.0"[2].
> >
> > The general approach is one to minimize the infrastructure maintenance
> > and development burden, leveraging as much as possible "other people's"
> > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > component dealt with here.
> 
> Happy New Year! Now we're in 2020, any chance of an update on
> plans/progress here? I would very much like to be able to hand
> processing of pull requests over to somebody else after the
> 5.0 cycle, if not before. (I'm quite tempted to make that a
> hard deadline and just say that somebody else will have to
> pick it up for 5.1, regardless...)
> 
> thanks
> -- PMM
> 

Hi Peter,

Happy New Year too!

As a status update, I have some work queued up related to this effort
that needs some minor polishing before I post it to the mailing
list.  That has to do with the changes to container definition files,
and the most basic changes to the GitLab YAML configuration files to
achieve a first stage of implementation.

We'd also have to coordinate access to the existing machines in use,
so that we can validate that this proposal will work.  Just to be
extra clear, I'm available to do this initial configuration on the
machines that are currently running the tests, provided you think it's
a good idea.  That would also be helpful to me, as I'm learning
a lot of this stuff as I go, and there are always some tricky new
details in different environments.

PS: I'm actually on the road, but I should be settled by tomorrow, and
I expect to resume work on this the following day.

Best Regards,
- Cleber.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-01-17 14:33 ` Peter Maydell
  2020-01-21 20:00   ` Cleber Rosa
@ 2020-02-03  3:27   ` Cleber Rosa
  2020-02-03 15:00     ` Cleber Rosa
  2020-02-07 16:42     ` Peter Maydell
  1 sibling, 2 replies; 22+ messages in thread
From: Cleber Rosa @ 2020-02-03  3:27 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 5905 bytes --]

On Fri, Jan 17, 2020 at 02:33:54PM +0000, Peter Maydell wrote:
> On Mon, 2 Dec 2019 at 14:06, Cleber Rosa <crosa@redhat.com> wrote:
> >
> > RFC: QEMU Gating CI
> > ===================
> >
> > This RFC attempts to address most of the issues described in
> > "Requirements/GatinCI"[1].  An also relevant write up is the "State of
> > QEMU CI as we enter 4.0"[2].
> >
> > The general approach is one to minimize the infrastructure maintenance
> > and development burden, leveraging as much as possible "other people's"
> > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > component dealt with here.
> 
> Happy New Year! Now we're in 2020, any chance of an update on
> plans/progress here? I would very much like to be able to hand
> processing of pull requests over to somebody else after the
> 5.0 cycle, if not before. (I'm quite tempted to make that a
> hard deadline and just say that somebody else will have to
> pick it up for 5.1, regardless...)
> 
> thanks
> -- PMM
> 

Hi Peter,

Last time, I believe the take was to be as simple as possible, and to
try to focus on the bare minimum necessary to implement the workflow
you described[1].  The following lines preceded by ">>>" were
extracted from the Wiki and will be used to explain those points.

   >>> The set of machine I currently test on are:
   >>>
   >>>  * an S390x box (this is provided to the project by IBM's Community
   >>>    Cloud so can be used for the new CI setup)
   >>>  * aarch32 (as a chroot on an aarch64 system)
   >>>  * aarch64
   >>>  * ppc64 (on the GCC compile farm)

I've built an updated gitlab-runner version for s390x, aarch64 and
ppc64[2].  I've now tested its behavior with the shell executor
(instead of docker) on aarch64 and ppc64.  I did not get a chance yet
to test this new version and executor with s390x, but I'm planning
to do it soon.

   >>>  * OSX
   >>>  * Windows crossbuilds
   >>>  * NetBSD, FreeBSD and OpenBSD using the tests/vm VMs

gitlab-runner clients are available for Darwin, Windows (native)
and FreeBSD.  I have *not* tested any of those, though.  I have
tried a Windows crossbuild, and with the right packages installed,
it worked like a charm on a Fedora machine.
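
Such a crossbuild job can be as simple as the sketch below (job name
and tags are hypothetical, and it assumes the Fedora mingw cross
toolchain packages are installed on the runner):

   build-win64-cross-fedora-x86_64:
     tags:
       - fedora
       - x86_64
     script:
       - ./configure --cross-prefix=x86_64-w64-mingw32-
       - make -j$(nproc)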

   >>>  * x86-64 Linux with a variety of different build configs (see the
   >>>    'remake-merge-builds' script for how these are set up)

This is of course the more standard setup for gitlab-runner, and the
bulk of the work that I'm posting here is related to those different
build configs.  I assumed those x86-64 machines run some version
of Ubuntu, so I used 18.04.3 LTS.  Hopefully that matches most or all of
the current environment.  Please refer to messages on the mailing list
with $SUBJECT:

 [RFC PATCH 1/2] GitLab CI: avoid calling before_scripts on unintended jobs
 [RFC PATCH 2/2] GitLab CI: crude mapping of PMM's scripts to jobs

There are a few questions in there which I'd appreciate help with.

   >>> Testing process:
   >>>
   >>>  * I get an email which is a pull request, and I run the
   >>>    "apply-pullreq" script, which takes the GIT URL and tag/branch name
   >>>    to test.
   >>>  * apply-pullreq performs the merge into a 'staging' branch
   >>>  * apply-pullreq also performs some simple local tests:
   >>>     * does git verify-tag like the GPG signature?
   >>>     * are we trying to apply the pull before reopening the dev tree
   >>>       for a new release?
   >>>     * does the pull include commits with bad UTF8 or bogus qemu-devel
   >>>       email addresses?
   >>>     * submodule updates are only allowed if the --submodule-ok option
   >>>       was specifically passed

These steps could go unchanged at this point.  One minor remark is
that the repo hosted at gitlab.com would be used instead.  The
'staging' branch can be protected[4] so that only authorized people
can push to it (and trigger the pipeline and its jobs).

   >>>  * apply-pullreq then invokes parallel-buildtest to do the actual
   >>>    testing

This would be done by GitLab instead.  The dispatching of jobs is
based on the tags given to jobs and machines.  IMO at least the OS
version and architecture should be given as tags, and the machine
needs proper setup to run a job, such as having the right packages
installed.  It can start with proper documentation for every type of
OS and version (and possibly job type), and evolve into scripts
or other types of automation.

These are usually identical or very similar to what is defined in
"tests/docker/dockerfiles", but need to be done at the machine level
because of the "shell" executor.

   >>>  * parallel-buildtest is a trivial wrapper around GNU Parallel which
   >>>    invokes 'mergebuild' on each of the test machines
   >>>  * if all is OK then the user gets to do the 'git push' to push the
   >>>    staging branch to master

The central place to check for success or failure would be the
pipeline page.  Also, there's a configurable notification system[5] that
should (I've not tested it thoroughly) send failed and/or successful
pipeline results to the pipeline author.  IIUC, this means whoever
pushed to the 'staging' branch and thus caused the pipeline to be
triggered.

Let me know if this makes sense to you, and if so, we can arrange
a real-world PoC.  FYI, I've run hundreds of jobs in an internal
GitLab instance, and GitLab itself (server and runner) seems very
stable.

Regards,
- Cleber.

---

[1] - https://wiki.qemu.org/Requirements/GatingCI
[2] - https://cleber.fedorapeople.org/gitlab-runner/v12.7.0/
[3] - https://docs.gitlab.com/runner/install/bleeding-edge.html#download-the-standalone-binaries
[4] - https://docs.gitlab.com/ee/user/project/protected_branches.html
[5] - https://docs.gitlab.com/ee/user/profile/notifications.html#issue--epics--merge-request-events

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-02-03  3:27   ` Cleber Rosa
@ 2020-02-03 15:00     ` Cleber Rosa
  2020-02-07 16:42     ` Peter Maydell
  1 sibling, 0 replies; 22+ messages in thread
From: Cleber Rosa @ 2020-02-03 15:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 827 bytes --]

On Sun, Feb 02, 2020 at 10:27:12PM -0500, Cleber Rosa wrote:
>    >>> The set of machines I currently test on are:
>    >>>
>    >>>  * an S390x box (this is provided to the project by IBM's Community
>    >>>    Cloud so can be used for the new CI setup)
>    >>>  * aarch32 (as a chroot on an aarch64 system)
>    >>>  * aarch64
>    >>>  * ppc64 (on the GCC compile farm)
> 
> I've built an updated gitlab-runner version for s390x, aarch64 and
> ppc64[2].  I've now tested its behavior with the shell executor
> (instead of docker) on aarch64 and ppc64.  I did not get a chance yet
> to test this new version and executor with s390x, but I'm planning
> to do it soon.
>

Just a quick update on s390x.  I've run a job and had no issues:

  https://gitlab.com/cleber.gnu/qemuci/-/jobs/424084346

- Cleber.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-02-03  3:27   ` Cleber Rosa
  2020-02-03 15:00     ` Cleber Rosa
@ 2020-02-07 16:42     ` Peter Maydell
  2020-02-07 20:38       ` Cleber Rosa
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2020-02-07 16:42 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

On Mon, 3 Feb 2020 at 03:28, Cleber Rosa <crosa@redhat.com> wrote:
>    >>> Testing process:
>    >>>
>    >>>  * I get an email which is a pull request, and I run the
>    >>>    "apply-pullreq" script, which takes the GIT URL and tag/branch name
>    >>>    to test.
>    >>>  * apply-pullreq performs the merge into a 'staging' branch
>    >>>  * apply-pullreq also performs some simple local tests:
>    >>>     * does git verify-tag like the GPG signature?
>    >>>     * are we trying to apply the pull before reopening the dev tree
>    >>>       for a new release?
>    >>>     * does the pull include commits with bad UTF8 or bogus qemu-devel
>    >>>       email addresses?
>    >>>     * submodule updates are only allowed if the --submodule-ok option
>    >>>       was specifically passed
>
> These steps could go unchanged at this point.  One minor remark is
> that the repo hosted at gitlab.com would be used instead.  The
> 'staging' branch can be protected[4] so that only authorized people
> can push to it (and thus trigger the pipeline and its jobs).
>
>    >>>  * apply-pullreq then invokes parallel-buildtest to do the actual
>    >>>    testing
>
> This would be done by GitLab instead.  The dispatching of jobs is
> based on the tags given to jobs and machines.  IMO at least the OS
> version and architecture should be given as tags, and the machine
> needs proper setup to run a job, such as having the right packages
> installed.  It can start with proper documentation for every type of
> OS and version (and possibly job type), and evolve into scripts
> or other type of automation.
>
> These are usually identical or very similar to what is defined in
> "tests/docker/dockerfiles", but need to be done at the machine level
> because of the "shell" executor.
>
>    >>>  * parallel-buildtest is a trivial wrapper around GNU Parallel which
>    >>>    invokes 'mergebuild' on each of the test machines
>    >>>  * if all is OK then the user gets to do the 'git push' to push the
>    >>>    staging branch to master
>
> The central place to check for success or failure would be the
> pipeline page.  Also, there's a configurable notification system that
> should (I've not tested it thoroughly) send failed and/or successful
> pipeline results to the pipeline author.  IIUC, this means whoever
> pushed to the 'staging' branch and thus caused the pipeline to be
> triggered.
>
> Let me know if this makes sense to you, and if so, we can arrange
> a real-world PoC.  FYI, I've run hundreds of jobs in an internal
> GitLab instance, and GitLab itself (server and runner) seems very
> stable.

This all sounds like the right thing and great progress. So yes,
I agree that the next step would be to get to a point where you
can give me some instructions on how to say "OK, here's my staging
branch" and run it through the new test process and look at the
results.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-02-07 16:42     ` Peter Maydell
@ 2020-02-07 20:38       ` Cleber Rosa
  2020-02-08 13:08         ` Peter Maydell
  0 siblings, 1 reply; 22+ messages in thread
From: Cleber Rosa @ 2020-02-07 20:38 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

On Fri, Feb 07, 2020 at 04:42:10PM +0000, Peter Maydell wrote:
> 
> This all sounds like the right thing and great progress. So yes,
> I agree that the next step would be to get to a point where you
> can give me some instructions on how to say "OK, here's my staging
> branch" and run it through the new test process and look at the
> results.
>

IIUC the point you're describing, we must:

 * Have the right jobs defined in .gitlab-ci.yml (there are some
   questions to be answered on that thread)

 * Setup machines with:
   - gitlab-runner (with tags matching OS and arch)
   - packages needed for the actual job execution (compilers, etc)

At this point, the "parallel-buildtest" command[1], would be replaced
with something like:

 - git push git@gitlab.com:qemu-project/qemu.git staging:staging

Which would automatically generate a pipeline.  Checking the results can
be done programmatically using the GitLab APIs[2].

Once the result is validated, you would run "git push publish-upstream
staging:master" as usual (as instructed by the script)[3].

So this leaves us with the "musts" above, and also with creating a
command line tool that uses the GitLab APIs to check on the status of
the pipeline associated with the staging branch.
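
As a rough illustration of what that tool could do (the project ID and
token below are placeholders, and 'jq' is assumed to be available):

  # Sketch only: query the status of the most recent pipeline on the
  # 'staging' branch.
  curl --silent \
       --header "PRIVATE-TOKEN: <your_access_token>" \
       "https://gitlab.com/api/v4/projects/<PROJECT_ID>/pipelines?ref=staging&per_page=1" \
    | jq -r '.[0].status'    # e.g. "pending", "running", "success", "failed"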

> thanks
> -- PMM
> 

Thanks for the feedback, and please let me know if I got your
point.

Cheers,
- Cleber.

[1] - https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree/apply-pullreq#n125
[2] - https://docs.gitlab.com/ee/api/pipelines.html#list-project-pipelines
[3] - https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree/apply-pullreq#n136


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-02-07 20:38       ` Cleber Rosa
@ 2020-02-08 13:08         ` Peter Maydell
  2020-03-02 15:27           ` Peter Maydell
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2020-02-08 13:08 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

On Fri, 7 Feb 2020 at 20:39, Cleber Rosa <crosa@redhat.com> wrote:
> On Fri, Feb 07, 2020 at 04:42:10PM +0000, Peter Maydell wrote:
> > This all sounds like the right thing and great progress. So yes,
> > I agree that the next step would be to get to a point where you
> > can give me some instructions on how to say "OK, here's my staging
> > branch" and run it through the new test process and look at the
> > results.
> >
>
> IIUC the point you're describing, we must:
>
>  * Have the right jobs defined in .gitlab-ci.yml (there are some
>    questions to be answered on that thread)

For the non-x86 architectures, do we define the jobs
to run on those in the same .yml file? (Generally the non-x86
machines just want to run a simple make/make check; they
don't need to check the wide variety of different configs x86 does.)

>  * Setup machines with:
>    - gitlab-runner (with tags matching OS and arch)
>    - packages needed for the actual job execution (compilers, etc)
>
> At this point, the "parallel-buildtest" command[1], would be replaced
> with something like:
>
>  - git push git@gitlab.com:qemu-project/qemu.git staging:staging
>
> Which would automatically generate a pipeline.  Checking the results can
> be done programmatically using the GitLab APIs[2].
>
> Once the result is validated, you would run "git push publish-upstream
> staging:master" as usual (as instructed by the script)[3].
>
> So this leaves us with the "musts" above, and also with creating a
> command line tool that uses the GitLab APIs to check on the status of
> the pipeline associated with the staging branch.

Yeah, that sounds right. To start with I'm ok with checking a web
page by hand to see what the job results are, so getting the
runners set up so we can test by doing git push is the place to start,
I think. Once we've got the actual testing going and all the machines
and configs we want to test in place, we can go back and look at
improving the UX for the person doing pullreqs so it's a bit
more automated using the GitLab APIs.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-02-08 13:08         ` Peter Maydell
@ 2020-03-02 15:27           ` Peter Maydell
  2020-03-05  6:50             ` Cleber Rosa
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2020-03-02 15:27 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

On Sat, 8 Feb 2020 at 13:08, Peter Maydell <peter.maydell@linaro.org> wrote:
> On Fri, 7 Feb 2020 at 20:39, Cleber Rosa <crosa@redhat.com> wrote:
> > On Fri, Feb 07, 2020 at 04:42:10PM +0000, Peter Maydell wrote:
> > > This all sounds like the right thing and great progress. So yes,
> > > I agree that the next step would be to get to a point where you
> > > can give me some instructions on how to say "OK, here's my staging
> > > branch" and run it through the new test process and look at the
> > > results.

Hi -- any progress on this front? (Maybe I missed an email; if
so, sorry about that...)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] QEMU Gating CI
  2020-03-02 15:27           ` Peter Maydell
@ 2020-03-05  6:50             ` Cleber Rosa
  0 siblings, 0 replies; 22+ messages in thread
From: Cleber Rosa @ 2020-03-05  6:50 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jeff Nelson, Markus Armbruster, Wainer dos Santos Moschetta,
	QEMU Developers, Stefan Hajnoczi, Alex Bennée, Ademar Reis

[-- Attachment #1: Type: text/plain, Size: 962 bytes --]

On Mon, Mar 02, 2020 at 03:27:42PM +0000, Peter Maydell wrote:
> 
> Hi -- any progress on this front? (Maybe I missed an email; if
> so, sorry about that...)
> 
> thanks
> -- PMM
> 

Hi Peter,

Yes, I've made some progress on some of the points raised in the last
email exchanges:

 1) Jobs on non-Linux OSes. I've built/set up gitlab-runner for FreeBSD,
    and tested a job:
    - https://gitlab.com/cleber.gnu/qemuci/-/jobs/440379169
    
    There are some limitations in a library that gitlab-runner uses to
    manage services (and that has no implementation for FreeBSD "services"),
    but there are workarounds that work all right.

 2) Wrote a script that checks/waits on the pipeline:
    - https://gitlab.com/cleber.gnu/qemuci/-/commit/d90c5cf917c43f06c0724dc025205d618521c4cc

 3) Wrote machine setup documentation/scripts.

I'm tidying it all up to send a PR in the next day or two.

Thanks for your patience!
- Cleber.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-03-05  6:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-02 14:05 [RFC] QEMU Gating CI Cleber Rosa
2019-12-02 17:00 ` Stefan Hajnoczi
2019-12-02 17:08   ` Peter Maydell
2019-12-02 18:28     ` Cleber Rosa
2019-12-02 18:36       ` Warner Losh
2019-12-02 22:38         ` Cleber Rosa
2019-12-02 18:12   ` Cleber Rosa
2019-12-03 14:14     ` Stefan Hajnoczi
2019-12-03 14:07 ` Alex Bennée
2019-12-04  8:55   ` Thomas Huth
2019-12-06 19:03   ` Cleber Rosa
2019-12-03 17:54 ` Peter Maydell
2019-12-05  5:05   ` Cleber Rosa
2020-01-17 14:33 ` Peter Maydell
2020-01-21 20:00   ` Cleber Rosa
2020-02-03  3:27   ` Cleber Rosa
2020-02-03 15:00     ` Cleber Rosa
2020-02-07 16:42     ` Peter Maydell
2020-02-07 20:38       ` Cleber Rosa
2020-02-08 13:08         ` Peter Maydell
2020-03-02 15:27           ` Peter Maydell
2020-03-05  6:50             ` Cleber Rosa
