From: Alex Bennée
To: Cleber Rosa
Cc: Peter Maydell, Stefan Hajnoczi, qemu-devel@nongnu.org,
    Wainer dos Santos Moschetta, Markus Armbruster, Jeff Nelson, Ademar Reis
Subject: Re: [RFC] QEMU Gating CI
Date: Tue, 03 Dec 2019 14:07:32 +0000
Message-ID: <87a789bizf.fsf@linaro.org>
In-Reply-To: <20191202140552.GA5353@localhost.localdomain>

Cleber Rosa writes:

> RFC: QEMU Gating CI
> ===================
>
> This RFC attempts to address most of the issues described in
> "Requirements/GatingCI"[1]. Another relevant write-up is the "State of
> QEMU CI as we enter 4.0"[2].
>
> The general approach is to minimize the infrastructure maintenance
> and development burden, leveraging as much as possible "other people's"
> infrastructure and code. GitLab's CI/CD platform is the most relevant
> component dealt with here.
>
> Problem Statement
> -----------------
>
> The following is copied verbatim from Peter Maydell's write-up[1]:
>
> "A gating CI is a prerequisite to having a multi-maintainer model of
> merging. By having a common set of tests that are run prior to a merge
> you do not rely on who is currently doing merging duties having access
> to the current set of test machines."
>
> This is a very simplified view of the problem, which I'd like to break
> down even further into the following key points:
>
> * Common set of tests
> * Pre-merge ("prior to a merge")
> * Access to the current set of test machines
> * Multi-maintainer model
>
> Common set of tests
> ~~~~~~~~~~~~~~~~~~~
>
> Before we delve any further, let's make it clear that a "common set of
> tests" is really a "dynamic common set of tests". My point is that a
> set of tests in QEMU may include or exclude different tests depending
> on the environment.
>
> The exact tests that will be executed may differ depending on the
> environment, including:
>
> * Hardware
> * Operating system
> * Build configuration
> * Environment variables
>
> In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> those "common set of tests":
>
> * check

Check encompasses a subset of the other checks - currently:

  - check-unit
  - check-qtest
  - check-block

The thing that stops other groups of tests being included is generally
whether they are solid on all the various hw/os/config/env setups you
describe.
For example, check-tcg currently fails gloriously on non-x86 with
docker enabled, as it tries to get all the cross-compiler images
working.

> * check-tcg
> * check-softfloat
> * check-block
> * check-acceptance
>
> While Peter mentions that most of his checks are limited to:
>
> * check
> * check-tcg
>
> Our current inability to quickly identify a faulty test from test
> execution results (especially in remote environments), and act upon
> it (say, quickly disabling it on a given host platform), makes me
> believe that it's fair to start a gating CI implementation that uses
> this rather coarse granularity.
>
> Another benefit is a close or even a 1:1 relationship between a common
> test set and an entry in the CI configuration. For instance, the
> "check" common test set would map to a "make check" command in a
> "script:" YAML entry.
>
> To exemplify my point, if one specific test run as part of "check-tcg"
> is found to be faulty on a specific job (say on a specific OS), the
> entire "check-tcg" test set may be disabled as a CI-level maintenance
> action.

This would, in this example, eliminate practically all emulation
testing apart from the very minimal boot code that gets spun up by the
various qtest migration tests. And of course the longer a group of
tests is disabled, the larger the window for additional regressions to
get in. It may be a reasonable approach, but it's not without
consequence.

> Of course a follow-up action to deal with the specific test
> is required, probably in the form of a Launchpad bug and patches
> dealing with the issue, but without necessarily a CI-related angle to
> it.
>
> If/when test result presentation and control mechanisms evolve, we may
> feel confident enough to go to a finer granularity. For instance, a
> mechanism for disabling nothing but "tests/migration-test" on a given
> environment would be possible and desirable from a CI management
> level.

The migration tests have found regressions, although the problem has
generally been that they were intermittent failures and hard to
reproduce locally. The last one took a few weeks of grinding to
reproduce and get patches together.

> Pre-merge
> ~~~~~~~~~
>
> The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> Requests"[3] (abbreviated as "MR" from now on). In most projects, an
> MR comes from individual contributors, usually the authors of the
> changes themselves. It's my understanding that the current maintainer
> model employed in QEMU will *not* change at this time, meaning that
> code contributions and reviews will continue to happen on the mailing
> list. A maintainer then, having collected a number of patches, would
> submit an MR either in addition to or in substitution for the pull
> requests sent to the mailing list.
>
> "Pipelines for Merged Results"[4] is a very important feature to
> support the multi-maintainer model, and looks, in practice, similar to
> Peter's "staging" branch approach, with an "automatic refresh" of the
> target branch. It can give a maintainer extra confidence that an MR
> will play nicely with the updated status of the target branch. It's
> my understanding that it should be the "key to the gates". A minor
> note is that conflicts are still possible in a multi-maintainer model
> if more than one person is doing the merges.
>
> A worthy point is that the GitLab web UI is not the only way to create
> a Merge Request; a rich set of APIs is available[5]. This is
> interesting for many reasons, and maybe some of Peter's
> "apply-pullreq"[6] checks (such as those for bad UTF-8 or bogus
> qemu-devel email addresses) could be made earlier, as part of a
> "send-mergereq"-like script, bringing conformance earlier in the merge
> process, at the MR creation stage.
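To make that a bit more concrete, creating the MR via the API[5] is
essentially one authenticated POST; a hypothetical "send-mergereq"
helper (the project ID, token variable and branch names below are just
placeholders) could run its conformance checks locally and then do
something like:

  # hypothetical helper step; assumes the branch has already been pushed
  curl --request POST \
       --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
       --data "source_branch=staging" \
       --data "target_branch=master" \
       --data "title=Pull request for -next" \
       "https://gitlab.com/api/v4/projects/${PROJECT_ID}/merge_requests"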
> Note: It's possible to have CI job definitions that are specific to
> MRs, allowing generic non-MR jobs to be kept in the default
> configuration. This can be used so individual contributors continue
> to leverage some of the "free" (shared) runners made available on
> gitlab.com.
>
> Multi-maintainer model
> ~~~~~~~~~~~~~~~~~~~~~~
>
> The previous section already introduced some of the proposed workflow
> that can enable such a multi-maintainer model. With a Gating CI
> system, though, it will be natural to have a smaller "Mean time
> between (CI) failures", simply because of the expected increase in the
> number of systems and checks. A lot of countermeasures have to be
> employed to keep that MTBF in check.
>
> For one, it's imperative that the maintainers for such systems and
> jobs are clearly defined and readily accessible. Either the same
> MAINTAINERS file or a more suitable variation of such data should be
> defined before activating the *gating* rules. This would allow
> requests for attention to be routed to the responsible maintainer.
>
> In case of unresponsive maintainers, or any other condition that
> leaves one or more CI jobs failing for a previously established
> amount of time, the job can be demoted with an "allow_failure"
> configuration[7]. Once such a change is committed, the path to
> promotion would be just the same as for a newly added job definition.
>
> Note: In a future phase we can evaluate the creation of rules that
> look at changed paths in an MR (similar to "F:" entries in
> MAINTAINERS) and trigger the execution of specific CI jobs, which
> would be the responsibility of a given maintainer[8].
>
> Access to the current set of test machines
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> When compared to the various CI systems and services already being
> employed in the QEMU project, this is the most striking difference in
> the proposed model. Instead of relying on shared/public/free
> resources, this proposal also deals with privately owned and
> operated machines.
>
> Even though the QEMU project operates with great cooperation, it's
> crucial to define clear boundaries when it comes to machine access.
> Restricted access to machines is important because:
>
> * The results of jobs are often directly tied to the setup and
>   status of machines. Even "soft" changes such as removing or updating
>   packages can introduce failures in jobs (this is greatly minimized
>   but not completely eliminated when using containers or VMs).
>   Updating firmware or changing its settings are also examples of
>   changes that may affect the outcome of jobs.
>
> * If maintainers are to be held accountable for the status of the jobs
>   defined to run on specific machines, they must be sure of the status
>   of the machines.
>
> * Machines need regular monitoring and will receive required
>   maintenance actions which can cause job regressions.
>
> Thus, there needs to be one clear way for machines to be *used* for
> running jobs sent by different maintainers, while still prohibiting
> any other privileged action that can cause permanent change to the
> machine. The GitLab agent (gitlab-runner) is designed to do just
> that, and defining what will be executed in a job (in a given system)
> should be all that's generally allowed. The job definition itself
> will of course be subject to code review before a maintainer decides
> to send an MR containing such new or updated job definitions.
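As an aside, the registration step is where a machine admin pins down
what the runner will and won't pick up. A sketch (the URL, token,
description and tags are placeholders for whatever the real setup
uses):

  gitlab-runner register \
    --non-interactive \
    --url "https://gitlab.com/" \
    --registration-token "${REGISTRATION_TOKEN}" \
    --description "rhel8.0-x86_64-01" \
    --executor "shell" \
    --tag-list "rhel8.0,x86_64" \
    --run-untagged="false" \
    --locked="true"

The --locked and --run-untagged options just stop the runner from being
enabled for other projects or from picking up jobs without matching
tags. As far as I understand it, the runner only ever polls GitLab for
jobs to execute, so there is no inbound access to the machine beyond
what the reviewed job scripts themselves do.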
> Still related to machine maintenance, it's highly desirable for jobs
> tied to specific host machines to be introduced alongside
> documentation and/or scripts that can replicate the machine setup. If
> the machine setup steps can be easily and reliably reproduced, then:
>
> * Other people may help to debug issues and regressions if they
>   happen to have the same hardware available
>
> * Other people may provide more machines to run the same types of
>   jobs
>
> * If a machine maintainer goes MIA, it'd be easier to find another
>   maintainer
>
> GitLab Jobs and Pipelines
> -------------------------
>
> GitLab CI is built around two major concepts: jobs and pipelines. The
> current GitLab CI configuration in QEMU uses jobs only (or, putting it
> another way, all jobs in a single pipeline stage). Consider the
> following job definition[9]:
>
>   build-tci:
>     script:
>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>     - ./configure --enable-tcg-interpreter
>         --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>     - make -j2
>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>     - for tg in $TARGETS ; do
>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>         ./tests/boot-serial-test || exit 1 ;
>         ./tests/cdrom-test || exit 1 ;
>       done
>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
>
> All the lines under "script" are executed sequentially. It should be
> clear that there's the possibility of breaking this down into multiple
> stages, so that a build happens first, and then a "common set of
> tests" runs in parallel. Using the example above, it would look
> something like:
>
> +---------------+------------------------+
> |  BUILD STAGE  |       TEST STAGE       |
> +---------------+------------------------+
> |   +-------+   |  +------------------+  |
> |   | build |   |  | boot-serial-test |  |
> |   +-------+   |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  |    cdrom-test    |  |
> |               |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  |  x86_64-pxe-test |  |
> |               |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  |  s390x-pxe-test  |  |
> |               |  +------------------+  |
> |               |                        |
> +---------------+------------------------+
>
> Of course it would be silly to break down that job into smaller jobs
> that would run individual tests like "boot-serial-test" or
> "cdrom-test". Still, the pipeline approach is valid because:
>
> * Common set of tests would run in parallel, giving a quicker result
>   turnaround

check-unit is a good candidate for parallel tests. The others depend -
I've recently turned most "make check"s back to -j 1 on Travis because
it's a real pain to see which test has hung when other tests keep
running.

> * It's easier to determine the possible nature of the problem with
>   just the basic CI job status
>
> * Different maintainers could be defined for different "common sets
>   of tests", and again by leveraging the basic CI job status,
>   automation for directed notification can be implemented
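To make the stage split concrete, a minimal sketch of the YAML involved
(the job names and target list are made up for illustration, and none
of this has been tested against the real tree):

  stages:
    - build
    - test

  build-x86_64:
    stage: build
    script:
      - ./configure --target-list=x86_64-softmmu
      - make -j2

  test-check-unit-x86_64:
    stage: test
    script:
      # assumes the build output is made available to this job, which
      # is exactly the artifact problem discussed further down
      - make check-unit

  test-check-qtest-x86_64:
    stage: test
    script:
      - make check-qtest

The two test jobs would be scheduled in parallel as soon as the build
stage finishes.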
> In the following example, "check-block" maintainers could be left
> undisturbed by failures in the "check-acceptance" job:
>
> +---------------+------------------------+
> |  BUILD STAGE  |       TEST STAGE       |
> +---------------+------------------------+
> |   +-------+   |  +------------------+  |
> |   | build |   |  |   check-block    |  |
> |   +-------+   |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  | check-acceptance |  |
> |               |  +------------------+  |
> |               |                        |
> +---------------+------------------------+
>
> The same logic applies for test sets for different targets. For
> instance, combining the two previous examples, there could be
> different maintainers defined for the different jobs on the test
> stage:
>
> +---------------+------------------------+
> |  BUILD STAGE  |       TEST STAGE       |
> +---------------+------------------------+
> |   +-------+   |  +------------------+  |
> |   | build |   |  |   x86_64-block   |  |
> |   +-------+   |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  | x86_64-acceptance|  |
> |               |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  |    s390x-block   |  |
> |               |  +------------------+  |
> |               |                        |
> |               |  +------------------+  |
> |               |  | s390x-acceptance |  |
> |               |  +------------------+  |
> +---------------+------------------------+
>
> Current limitations for a multi-stage pipeline
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Because it's assumed that each job will happen in an isolated and
> independent execution environment, jobs must explicitly define the
> resources that will be shared between stages. GitLab will make sure
> the same source code revision will be available on all jobs
> automatically. Additionally, GitLab supports the concept of artifacts.
> By defining artifacts in the "build" stage, jobs in the "test" stage
> can expect to have a copy of those artifacts automatically.
>
> In theory, there's nothing that prevents an entire QEMU build
> directory from being treated as an artifact. In practice, there are
> predefined limits on GitLab that prevent that from being possible,
> resulting in errors such as:
>
>   Uploading artifacts...
>   build: found 3164 matching files
>   ERROR: Uploading artifacts to coordinator... too large archive
>          id=xxxxxxx responseStatus=413 Request Entity Too Large
>          status=413 Request Entity Too Large token=yyyyyyyyy
>   FATAL: too large
>   ERROR: Job failed: exit code 1
>
> As far as I can tell, this is an instance-defined limit that's clearly
> influenced by storage costs. I see a few possible solutions to this
> limitation:
>
> 1) Provide our own "artifact"-like solution that uses our own storage
> solution
>
> 2) Reduce or eliminate the dependency on a complete build tree
>
> The first solution can go against the general trend of not having to
> maintain CI infrastructure. It could be made simpler by using cloud
> storage, but there would still be some interaction with another
> external infrastructure component.
>
> I find the second solution preferable, given that most tests depend
> on having one or a few binaries available. I've run multi-stage
> pipelines with some of those binaries (qemu-img,
> $target-softmmu/qemu-system-$target) defined as artifacts and they
> behaved as expected. But this could require some intrusive changes
> to the current "make"-based test invocation.
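A rough sketch of what that could look like - the target list, file
paths and job names here are just illustrative, and the exact set of
files a given test needs (test binaries, BIOS blobs, keymaps, etc.)
would need some experimentation:

  build-x86_64:
    stage: build
    script:
      - ./configure --target-list=x86_64-softmmu
      - make -j2
      - make tests/boot-serial-test
    artifacts:
      expire_in: 2 days
      paths:
        - qemu-img
        - x86_64-softmmu/qemu-system-x86_64
        - tests/boot-serial-test

  test-boot-serial-x86_64:
    stage: test
    dependencies:
      - build-x86_64
    script:
      - QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
        ./tests/boot-serial-test

That should keep the upload far below the size of a full build tree, at
the cost of having to know up front which files each test set consumes.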
It would be nice if "make check" could be run against a "make
install"-ed set of binaries. I'm not sure how much hackery would be
required to get that to work nicely. Does specifying QEMU and QEMU_IMG
prevent make from trying to re-build everything in situ?

> Job Naming convention
> ---------------------
>
> Based only on the very simple example jobs above, it should already be
> clear that there's a lot of potential for confusion and chaos. For
> instance, by looking at the "build" job definition or results, it's
> very hard to tell what it's really about. A bit more could be inferred
> from the "x86_64-block" job name.
>
> Still, the problem we have to address here is not only about the
> amount of information easily obtained from a job name, but allowing
> for very similar job definitions within a global namespace. For
> instance, if we add an Operating System component to the mix, we need
> an extra qualifier for unique job names.
>
> Some of the possible components in a job definition are:
>
> * Stage
> * Build profile
> * Test set (a shorter name for what was described in the "Common set
>   of tests" section)
> * Host architecture
> * Target architecture
> * Host Operating System identification (name and version)
> * Execution mode/environment (bare metal, container, VM, etc.)
>
> Stage
> ~~~~~
>
> The stage of a job (which maps roughly to its purpose) should be
> clearly defined. A job that builds QEMU should start with "build" and
> a job that tests QEMU should start with "test".
>
> IMO, in a second phase, once multi-stage pipelines are taken for
> granted, we could evaluate dropping this component altogether from the
> naming convention, and relying purely on the stage classification.
>
> Build profile
> ~~~~~~~~~~~~~
>
> Different build profiles already abound in QEMU's various CI
> configuration files. It's hard to put a naming convention here,
> except that it should represent the most distinguishable
> characteristics of the build configuration. For instance, we can find
> a "build-disabled" job in the current ".gitlab-ci.yml" file that is
> aptly named, as it forcefully disables a lot of build options.
>
> Test set
> ~~~~~~~~
>
> As mentioned in the "Common set of tests" section, I believe that the
> make target name can be used to identify the test set that will be
> executed in a job. That is, if a job is to be run at the "test"
> stage, and will run "make check", its name should start with
> "test-check".
>
> QEMU Targets
> ~~~~~~~~~~~~
>
> Because a given job could, and usually does, involve multiple targets,
> I honestly cannot think of how to add this to the naming convention.
> I'll ignore it for now, and consider the targets are defined in the
> build profile.

I like to think of three groups:

  Core SoftMMU - the major KVM architectures
  The rest of SoftMMU - all our random emulation targets
  linux-user

> Host Architecture
> ~~~~~~~~~~~~~~~~~
>
> The host architecture name convention should be an easy pick, given
> that QEMU itself employs an architecture naming convention for its
> targets.
>
> Host OS
> ~~~~~~~
>
> The suggestion I have for the host OS name is to follow the
> libosinfo[10] convention as closely as possible. libosinfo's "Short
> ID" should be well suited here. Examples include: "openbsd4.2",
> "opensuse42.3", "rhel8.0", "ubuntu9.10" and "win2k12r2".
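For reference, those short IDs can be listed with the libosinfo tooling
(assuming the libosinfo/osinfo-db packages are installed), e.g.:

  osinfo-query os | grep -i rhel

which prints each matching OS along with its "Short ID", so the job
names would stay consistent with what other virt tooling already uses.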
> Execution Environment
> ~~~~~~~~~~~~~~~~~~~~~
>
> Distinguishing between running tests in a bare-metal versus a nested
> VM environment is quite significant to a number of people.
>
> Still, I think it could probably be optional for the initial
> implementation phase, like the naming convention for the QEMU Targets.
>
> Example 1
> ~~~~~~~~~
>
> Defining a job that will build QEMU with common debug options, on
> a RHEL 8.0 system on an x86_64 host:
>
>   build-debug-rhel8.0-x86_64:
>     script:
>       - ./configure --enable-debug
>       - make
>
> Example 2
> ~~~~~~~~~
>
> Defining a job that will run the "qtest" test set on a NetBSD 8.1
> system on an aarch64 host:
>
>   test-qtest-netbsd8.1-aarch64:
>     script:
>       - make check-qtest
>
> Job and Machine Scheduling
> --------------------------
>
> While the naming convention gives some information to human beings,
> and hopefully allows for some order and avoids collisions in the
> global job namespace, it's not enough to define where those jobs
> should run.
>
> Tags[11] are the available mechanism to tie jobs to specific machines
> running the GitLab CI agent, "gitlab-runner". Unfortunately, some
> duplication seems unavoidable, in the sense that some of the naming
> components listed above are machine-specific, and will then also need
> to be given as tags.
>
> Note: it may be a good idea to be extra verbose with tags, by having a
> qualifier prefix. The justification is that tags also live in a
> global namespace, and in theory, at a given point, tags of different
> "categories", say a CPU name and an Operating System name, may
> collide. Or, it may just be me being paranoid.
>
> Example 1
> ~~~~~~~~~
>
>   build-debug-rhel8.0-x86_64:
>     tags:
>       - rhel8.0
>       - x86_64
>     script:
>       - ./configure --enable-debug
>       - make
>
> Example 2
> ~~~~~~~~~
>
>   test-qtest-netbsd8.1-aarch64:
>     tags:
>       - netbsd8.1
>       - aarch64
>     script:
>       - make check-qtest

Where are all these going to go? Are we overloading the existing
.gitlab-ci.yml, or are we going to have a new set of configs for the
gating CI and keep .gitlab-ci.yml as the current subset that people run
on their own accounts?

> Operating System definition versus Container Images
> ----------------------------------------------------
>
> In the previous section and examples, we're assuming that tests will
> run on machines that have registered "gitlab-runner" agents with
> matching tags. The tags given at gitlab-runner registration time
> would of course match the same naming convention defined earlier.
>
> So, if one is registering a "gitlab-runner" instance on an x86_64
> machine running RHEL 8.0, the tags "rhel8.0" and "x86_64" would be
> given (possibly among others).
>
> Nevertheless, most deployment scenarios will probably rely on jobs
> being executed by gitlab-runner's container executor (currently
> Docker-only). This means that the tags given to a job *may* drop the
> tag associated with the host operating system selection, and instead
> provide the ".gitlab-ci.yml" configuration directive that determines
> the container image to be used.
>
> Most jobs would probably *not* require a matching host operating
> system and container image, but there should still be the capability
> to make it a requirement. For instance, jobs containing tests that
> require the KVM accelerator in specific scenarios may require a
> matching host Operating System.
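For the KVM case, the runner's own configuration is probably where that
gets expressed. A sketch of a docker-executor runner on a KVM-capable
host (the token and image name are placeholders) could look like:

  concurrent = 1

  [[runners]]
    name = "rhel8.0-x86_64-01"
    url = "https://gitlab.com/"
    token = "RUNNER_TOKEN"
    executor = "docker"
    [runners.docker]
      image = "qemuci/rhel8.0"
      privileged = false
      # pass the host's KVM device through to job containers
      devices = ["/dev/kvm"]

Jobs would then select it with the matching "x86_64", "rhel8.0" and
"container" tags, as in the examples that follow.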
> Note: What was mentioned in the "Execution Environment" section under
> the naming conventions section is also closely related to this
> requirement, that is, one may require a job to run under a container,
> VM or bare metal.
>
> Example 1
> ~~~~~~~~~
>
> Build QEMU on a "rhel8.0" image hosted under the "qemuci" organization
> and require the runner to support container execution:
>
>   build-debug-rhel8.0-x86_64:
>     tags:
>       - x86_64
>       - container
>     image: qemuci/rhel8.0
>     script:
>       - ./configure --enable-debug
>       - make
>
> Example 2
> ~~~~~~~~~
>
> Run the "check" test set on a "rhel8.0" image hosted under the
> "qemuci" organization, and require the runner to support container
> execution and to be on a matching host:
>
>   test-check-rhel8.0-x86_64:
>     tags:
>       - x86_64
>       - rhel8.0
>       - container
>     image: qemuci/rhel8.0
>     script:
>       - make check
>
> Next
> ----
>
> Because this document is already too long and that can be distracting,
> I decided to defer many other implementation-level details to a second
> RFC, alongside some code.
>
> Some complementary topics that I have prepared include:
>
> * Container images creation, hosting and management
> * Advanced pipeline definitions
>   - Job dependencies
>   - Artifacts
>   - Results
> * GitLab CI for Individual Contributors
> * GitLab runner:
>   - Official and Custom Binaries
>   - Executors
>   - Security implications
>   - Helper container images for unsupported architectures
> * Checklists for:
>   - Preparing and documenting machine setup
>   - Proposing new runners and jobs
>   - Runner and job promotions and demotions
>
> Of course any other topics that spring from this discussion will also
> be added to the follow-up threads.
>
> References:
> -----------
> [1] https://wiki.qemu.org/Requirements/GatingCI
> [2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg04909.html
> [3] https://docs.gitlab.com/ee/gitlab-basics/add-merge-request.html
> [4] https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/index.html
> [5] https://docs.gitlab.com/ee/api/merge_requests.html#create-mr-pipeline
> [6] https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree/apply-pullreq
> [7] https://docs.gitlab.com/ee/ci/yaml/README.html#allow_failure
> [8] https://docs.gitlab.com/ee/ci/yaml/README.html#using-onlychanges-with-pipelines-for-merge-requests
> [9] https://github.com/qemu/qemu/blob/fb2246882a2c8d7f084ebe0617e97ac78467d156/.gitlab-ci.yml#L70
> [10] https://libosinfo.org/
> [11] https://docs.gitlab.com/ee/ci/runners/README.html#using-tags

--
Alex Bennée