From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Jeff King <peff@peff.net>
Subject: Re: [PATCH] .github/workflows: add coverity action
Date: Tue, 29 Aug 2023 10:18:24 +0200 (CEST) [thread overview]
Message-ID: <6534ba5d-f4a6-71e6-5b0f-9cba2be8426e@gmx.de> (raw)
In-Reply-To: <4590e1381feb8962cadf2b40b22086531d662ef8.1692675172.git.me@ttaylorr.com>
Hi Taylor,
On Mon, 21 Aug 2023, Taylor Blau wrote:
> diff --git a/.github/workflows/coverity.yml b/.github/workflows/coverity.yml
> new file mode 100644
> index 0000000000..26b9145d9e
> --- /dev/null
> +++ b/.github/workflows/coverity.yml
> @@ -0,0 +1,35 @@
> +name: Coverity
> +
> +on:
> + push:
> + branches:
> + - master
> + tags:
> + - '*'
I would consider it too late to do a Coverity analysis when the version
is already tagged. Therefore, I would like to see this `tags: *' part go.
And I agree with Peff that we should activate this build for `next`, too.
And for good measure for `maint` (and maybe `maint-*`, although those
branches are typically not pushed to `git/git`).
> +
> +jobs:
> + coverity:
> + runs-on: ubuntu-latest
> + env:
> + HAVE_COVERITY_TOKEN: ${{ secrets.COVERITY_SCAN_EMAIL != '' && secrets.COVERITY_SCAN_TOKEN != '' }}
> + steps:
> + - id: check-coverity
> + name: check whether Coverity token is configured
> + run: |
> + echo "enabled=$HAVE_COVERITY_TOKEN" >>$GITHUB_OUTPUT
The canonical output for a step that has no other outputs is called
`result`. That output name is handled by linters such as
https://rhysd.github.io/actionlint/, too, in contrast to outputs with
different names that may be set in a free-form `run:` step that is
impossible to parse by `actionlint`.
But I see a bigger problem: Contrary to what the commit message claims,
the design makes this workflow far from a no-op: There will always be the
need for a VM to be spun up, to receive the orchestration script, to
download the necessary Actions (in this instance,
`vapier/coverity-scan-action`), only to check environment variables, see
that actually nothing needs to be done, update the job's and the run's
status (claiming "success" even if nothing was sent to Coverity) and then
shut down.
We discussed this in the past, as it is a great waste of resources (even
if other people pay for it, it still costs money [*1*] and electricity),
and for some of us even more relevant: it is a great waste of time because
when you are already waiting for 20 queued jobs to be picked up, having to
wait for the next build agent to become available only to spend, I don't
know, half a minute (wall clock time)? on getting this workflow run
queued, the next build agent to become available, the job to be set up,
the first step to be run, the others to be skipped, and then to wrap it
all up by updating the run as completed. I would not fault anyone for
being frustrated with this.
Therefore, I would strongly suggest looking for alternative ways to skip
this workflow run, before applying this patch.
I see these viable alternatives, all pulling the `if:` directive from the
step level to the job level, just like the `l10n.yml` workflow does [*2*],
to avoid even starting the job unless really needed:
- Limit it to the primary repository, `git/git`:
if: github.event.repository.fork == false
This is what Git for Windows does in the build that validates the
revision to use in the `git-for-windows/setup-git-for-windows-sdk`
Action that is used in Git's CI runs to build and test Git on Windows:
https://github.com/git-for-windows/git-sdk-64/blob/894191270a78/.github/workflows/ci-artifacts.yml#L14
While this still clutters the list of workflow runs with plenty of
skipped runs (see https://github.com/git/git/actions/workflows/l10n.yml
for an example how this looks), at least it avoids wasting a build agent
on essentially only checking for the presence of two secrets.
Developers wishing to run this workflow in their fork could always
create a branch, edit the workflow to remove this `if:` directive, and
push the branch to their fork.
- Limit it to `git/git` explicitly, by name:
if: github.event.repository.owner.login == 'git'
This is how we run the `vs-build`/`vs-test` combo only in the
`git-for-windows` fork:
https://github.com/git/git/blob/v2.42.0/.github/workflows/main.yml#L150
The advantage here is that it would be slightly more obvious what we are
doing here than checking `fork == false`, as the org name is mentioned
explicitly.
- Limit it to `git/git` but also allow it to be triggered manually:
if: github.event.repository.fork == false || github.event_name == 'workflow_dispatch'
This is what Git for Windows does in the test verifying that it works in
the "Nano Server" container image that allows for kind of a minimal
Windows system to be run via Docker:
https://github.com/git-for-windows/git-sdk-64/blob/894191270a78/.github/workflows/nano-server.yml#L7
The advantage of allowing the workflow to be triggered via a
`workflow_dispatch` is that developers wishing to run this workflow in
their fork would not need to edit the GitHub workflow but could run it
unmodified. For details, see
https://docs.github.com/en/actions/using-workflows/manually-running-a-workflow
- Limit it by repository "topics" (think: "repository tags"):
if: contains(github.event.repository.topics, 'has-coverity-secrets')
On GitHub, every repository can be annotated with "topics", i.e. a list
of labels (which in other scenarios would be called "tags"). These
topics can be seen by the GitHub workflow run, _before_ even starting a
job, via the `github.event.repository` attribute.
This is how InnoSetup does it:
https://github.com/jrsoftware/issrc/blob/main/.github/workflows/build.yml#L12
The primary repository does not have the secrets required to run this
workflow, and hence does not have the "has-issrc-build-env" topic as
can be seen here: https://github.com/jrsoftware/issrc (look below the
"About" section on the right side, where you see the topics "installer"
and "inno-setup"). My personal fork does have them secrets, and the
topic "has-issrc-build-env" says as much: https://github.com/dscho/issrc
The advantage of this solution is that it is very easy to "turn on and
off" the workflow on a repository level, simply by adding or removing
the "has-coverity-secrets" topic. See:
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/classifying-your-repository-with-topics
The downside is that this method is somewhat unintuitive and needs the
developer to read documentation (*shudder*) to know what they need to
do.
I have no strong preference as long as the job is skipped whenever the
Coverity build (read: the `coverity` GitHub workflow run) should be
skipped.
> + - uses: actions/checkout@v3
> + if: steps.check-coverity.outputs.enabled == 'true'
> + - run: ci/install-dependencies.sh
> + env:
> + CC: gcc
> + CC_PACKAGE: gcc-9
> + jobname: linux-gcc-default
How about modifying `ci/install-dependencies.sh` to introduce a new job
name, `coverity`, that is handled like `linux-gcc-default`? I.e. something
like this:
-- snip --
diff --git a/ci/install-dependencies.sh b/ci/install-dependencies.sh
index 107757a1fea..e6a894e507d 100755
--- a/ci/install-dependencies.sh
+++ b/ci/install-dependencies.sh
@@ -74,7 +74,7 @@ Documentation)
test -n "$ALREADY_HAVE_ASCIIDOCTOR" ||
sudo gem install --version 1.5.8 asciidoctor
;;
-linux-gcc-default)
+linux-gcc-default|coverity)
sudo apt-get -q update
sudo apt-get -q -y install $UBUNTU_COMMON_PKGS
;;
diff --git a/ci/lib.sh b/ci/lib.sh
index 1b0cc2b57db..7a196aa0d17 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -227,7 +227,7 @@ export SKIP_DASHED_BUILT_INS=YesPlease
case "$runs_on_pool" in
ubuntu-latest)
- if test "$jobname" = "linux-gcc-default"
+ if test "$jobname" = "linux-gcc-default" || test "$jobname" = "coverity"
then
break
fi
-- snap --
That would be a lot cleaner and would also allow for Coverity-specific
things in this step that won't affect `linux-gcc-default` at the same
time.
> + runs_on_pool: ubuntu-latest
> + if: steps.check-coverity.outputs.enabled == 'true'
> + - uses: vapier/coverity-scan-action@v1
In contrast to the other Actions we use in Git, this one is neither
provided by GitHub nor by the Git (or Git for Windows) project [*3*].
The recommended way to handle such third-party Actions is to pin them by
full length commit SHA:
https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions
This is particularly important because we're passing secret values to that
Action.
> + if: steps.check-coverity.outputs.enabled == 'true'
> + with:
> + email: ${{ secrets.COVERITY_SCAN_EMAIL }}
> + token: ${{ secrets.COVERITY_SCAN_TOKEN }}
> + command: make -j8
> +
The commit message would make for a fine place to describe the role of
these `COVERITY_SCAN_*` secrets, how to obtain them, and how to configure
them (i.e. providing a link to
https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository).
The reason I am suggesting that is that many a time, I have authored
commits like this one, only to puzzle about the exact steps to configure
the the thing properly when I needed it six months later, and regretting
dearly not having had the foresight to help my future self while writing
the commit message.
Another important issue that should be discussed in the commit message is
the matter of Coverity's false positives. I regularly dread going through
the new reports in Coverity (I am doing this for git-for-windows) because
of the many falsely pointed out "buffer overruns" when working e.g. with
`strvec` where Coverity assumes that its `empty_strvec` is ever written
to.
In Git for Windows, I have not had any luck with suppressing these false
reports. For what it's worth, this is the "model file" I am currently
using:
-- snip --
/* modelfile for git */
char strbuf_slopbuf[64];
void *malloc(size_t);
void *calloc(size_t, size_t);
void *realloc(void *, size_t);
void free(void *);
void *xrealloc(void *ptr, size_t size)
{
void *ret = realloc(ptr, size);
if (!ret) __coverity_panic__();
return ret;
}
void *xmalloc(size_t size)
{
void *mem = malloc(size);
if (!mem) __coverity_panic__();
return mem;
}
void xcalloc(size_t num, size_t size)
{
void *ret = calloc(num, size);
if (!ret) __coverity_panic__();
return ret;
}
void usage(const char *err) {
__coverity_panic__();
}
void usagef(const char *err, ...) {
__coverity_panic__();
}
void die(const char *err, ...) {
__coverity_panic__();
}
void die_errno(const char *err, ...) {
__coverity_panic__();
}
void BUG_fl(const char *file, int line, const char *fmt, ...) {
__coverity_panic__();
}
struct strbuf {
size_t alloc;
size_t len;
char *buf;
};
void strbuf_setlen(struct strbuf *sb, size_t len)
{
sb->len = len;
sb->buf[len] = '\0';
}
void strbuf_grow(struct strbuf *sb, size_t amount);
void *memcpy(void *dest, const void *src, size_t n);
void strbuf_add(struct strbuf *sb, const void *data, size_t len)
{
strbuf_grow(sb, len);
memcpy(sb->buf + sb->len, data, len);
sb->len = len;
sb->buf[len] = '\0';
}
void *memmove(void *dest, const void *src, size_t n);
void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
const void *data, size_t dlen)
{
if (pos > sb->len)
die("`pos' is too far after the end of the buffer");
if (pos + len > sb->len)
die("`pos + len' is too far after the end of the buffer");
if (dlen >= len)
strbuf_grow(sb, dlen - len);
memmove(sb->buf + pos + dlen,
sb->buf + pos + len,
sb->len - pos - len);
memcpy(sb->buf + pos, data, dlen);
sb->len = sb->len + dlen - len;
sb->buf[sb->len + dlen - len] = '\0';
}
-- snap --
It is getting a bit ridiculous at this point, the size of it and the
meager benefit it reaps. But it is better than nothing, I guess.
The reason I would really like to see an extensive discussion about this
in the commit message is that this place would be _the_ most logical
habitat for such a description, to help developers who want to set up
their own Coverity coverage (either in their own Git fork, or even in a
separate project) trying to learn from the Git project.
Ciao,
Johannes
Footnote *1*: In case it was unclear: Build agents are essentially virtual
machines. Those virtual machines are destroyed after every run, and
recreated to a pristine state before picking up the next job. To avoid
having to wait for a virtual machine to be created whenever a job needs to
be run, there is a pool of virtual machines ready to run. There have been
times when we ran so many jobs in the Git project that we drained that
pool and had to wait, painfully, for the virtual machines to be created
which, let me tell you, is not a matter of a mere couple of seconds.
This complexity is reflected in the pricing you can see at
https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions
Just to put things into perspective: In our security work, we have to run
our GitHub workflow in a private repository, where GitHub Actions are not
provided free of cost by GitHub. We have a very nice donation from GitHub
in the form of 50,000 build minutes we can use every month. If we spent
all of those minutes running on Linux (i.e. the cheapest option), GitHub
would essentially donate $400 to us. That's what we usually spend in a
matter of days preparing for security releases: The workflow run to
validate v2.39.3, for example, while taking "only" 36 minutes and 1 second
wall clock time, used up over 12 hours of those 50,000 build minutes, and
every time we iterated, we pushed 11 branches and 11 tags. Three
iterations in, we were out of build minutes.
All this is to say: We, the Git project, are not exactly cautious when it
comes to spending GitHub Actions build minutes. I really would love to see
us all to become a lot more mindful on this matter.
Footnote *2*:
https://github.com/git/git/blob/v2.42.0/.github/workflows/l10n.yml#L13
Footnote *3*: We do already use a third-party Action in `l10n.yml`:
`mshick/add-pr-comment@v1`. As you can verify easily, this is not pinned
by full length commit SHA. But then, this step only gets the
`GITHUB_TOKEN` secret, which only has the permission to write to the Pull
Request (i.e. add a comment), not to write to the repository.
Still, we will probably want to pin that Action at some stage, better safe
'n sorry. #leftoverbits
next prev parent reply other threads:[~2023-08-29 8:19 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-22 3:34 [PATCH] .github/workflows: add coverity action Taylor Blau
2023-08-25 21:32 ` Jeff King
2023-08-25 21:56 ` Jeff King
2023-08-29 8:18 ` Johannes Schindelin [this message]
2023-08-30 0:18 ` Jeff King
2023-08-30 8:07 ` Johannes Schindelin
2023-08-30 19:03 ` Jeff King
2023-09-21 21:53 ` [PATCH v2] " Taylor Blau
2023-09-21 23:30 ` Junio C Hamano
2023-09-22 11:09 ` Johannes Schindelin
2023-09-23 6:21 ` Jeff King
2023-09-23 7:09 ` Jeff King
2023-09-23 6:10 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6534ba5d-f4a6-71e6-5b0f-9cba2be8426e@gmx.de \
--to=johannes.schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).