git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Eric Wong <e@80x24.org>
Cc: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	git@vger.kernel.org, "Lars Schneider" <larsxschneider@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Jeff King" <peff@peff.net>
Subject: test suite speedups via some not-so-crazy ideas (was: scripting speedups[...])
Date: Wed, 03 Nov 2021 10:24:57 +0100	[thread overview]
Message-ID: <211103.864k8t1sma.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <211030.86ee8246hy.gmgdl@evledraar.gmail.com>


On Sat, Oct 30 2021, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Oct 26 2021, Eric Wong wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>> * Test suite is slow. Shell scripts and process forking.
>>> 
>>>    * What if we had a special shell that interpreted the commands in a
>>>      single process?
>>> 
>>>    * Even Git commands like rev-parse and hash-object, as long as that’s
>>>      not the command you’re trying to test
>>
>> This is something I've wanted in a very long time as a scripter.
>> fast-import has been great over the years, as is
>> "cat-file --batch(-check)", but there's gaps should be filled
>> (preferably without fragile linkage of shared libraries into a
>> script process)
>>
>>>    * Dscho wants to slip in a C-based solution
>>> 
>>>    * Jonathan tan commented: going back to your custom shell for tests
>>>      idea, one thing we could do is have a custom command that generates
>>>      the repo commits that we want (and that saves process spawns and
>>>      might make the tests simpler too)
>>
>> Perhaps a not-seriously-proposed patch from 2006 could be
>> modernized for our now-libified internals:
>
> I think something very short of a "C-based solution" could give us most
> of the wins here. Johannes was probably thinking of the scripting being
> slow on Windows aspect of it.
>
> But the main benefit of hypothetical C-based testing is that you can
> connect it to the dependency tree we have in the Makefile, and only
> re-run tests for code you needed to re-compile.
>
> So e.g. we don't need to run tests that invoke "git tag" if the
> dependency graph of builtin/tag.c didn't change.
>
> With COMPUTE_HEADER_DEPENDENCIES we've got access to that dependency
> information for our C code.
>
> With trace2 we could record an initial test run, and know which built-in
> commands are executed by which tests (even down to the sub-test level).
>
> Connecting these two means that we can find all tests that say run "git
> fsck", and if builtin/fsck.c is the only thing that changed in an
> interactive rebase, that's the only tests we need to run.
>
> Of course changes to things like cache.h or t/test-lib.sh would spoil
> that cache entirely, but pretty much the same is true for re-compiling
> things now, so would changing say builtin/init-db.c, as almost every
> test does a "git init" somewhere.
>
> But I think that approch is viable, and should take us from a huge
> hypothetical project like "rewrite all the tests in C" to something
> that's a viable weekend hacking project for someone who's interested.

First to outline some goals: I think saying we'd like to speed up
scripts is really getting into the weeds.

Surely we'd like to speed up test runs, and generally speaking our test
suite can be parallelized, and it mostly doesn't matter if it runs on
your computer or other people's computers, as long as it runs your
code. So:

 1. Even for contributors that have a slow system they could benefit from
    the hosted CI (on GitHub or wherever else) being faster.

 2. Our CI takes around 30-60m to finish.

 3. That CI time is almost entirely something that could be sped up by
    throwing hardware at it.

 4. We're currently using "Dv2 and DSv2-series" hosted runners
    (https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners)
    we have quite a few people on-list who work for the
    company/companies involved.

    Is it within the realm of possibility to get more CI resources
    assigned to git/git's organization network?

 5. Or, is there willingness to host/pay for hosted runners from
    someone?

    Not wearing PLC hat I'd think that we could speed that up a lot with
    some reasonable money spending, and if pushing to CI made CI run in
    3-5m instead of 60m that would be worthwhile.

 6. Related to #5: I've been able to setup hosted runner jobs, and
    self-hosted runner jobs, but is there a way to do some opportunistic
    mixture of the two? Even one where self-hosted runners could come
    and go, and if they're present contribute resources to git/git's
    network?

 7. We run the various GIT_TEST_* etc. jobs in sequence, is there a
    reason for why we're serializing things in GitHub CI that could be
    parallelized?

    The vs-build and vs-test tests run in parallel, any reason we're not
    doing that trick on the ubuntu runners other than "nobody got to
    it?". We seem to be trying hard to do the exact opposite there..

    At the extreme end we could build git ~once, and have N tests depend
    on that, where N ~= $(ls t/*.sh) x $number_of_test_modes). But
    perhaps runner starting overhead starts to be the limiting factor at
    some point.

 8. To a first approximation, does anyone really care about getting an
    exhaustive list of all failures in a run, or just that we have *a*
    failure? You can always do an exhaustive run later.

 9. On the "no" answer to #8: When I build/test my own git I first run
    those tests that I modified in the relevant branches, and if any of
    those fail I just stop.

    I generally don't need to run the entirety of the rest of the test
    suite to stop and investigate why I have a failure.

    Perhaps our CI could use a similar trick, i.e. first test the set of
    modified test files, and perhaps with some ad-hoc matching of
    filenames, so e.g. if you modify builtin/add.c we'd run t/*add*.sh
    in the first set, and all with --immediate per #8 above.

    If we pass that we'd run the full set, minus that initial set.

  reply	other threads:[~2021-11-03  9:44 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-21 11:55 Notes from the Git Contributors' Summit 2021, virtual, Oct 19/20 Johannes Schindelin
2021-10-21 11:55 ` [Summit topic] Crazy (and not so crazy) ideas Johannes Schindelin
2021-10-21 12:30   ` Son Luong Ngoc
2021-10-26 20:14   ` scripting speedups [was: [Summit topic] Crazy (and not so crazy) ideas] Eric Wong
2021-10-30 19:58     ` Ævar Arnfjörð Bjarmason
2021-11-03  9:24       ` Ævar Arnfjörð Bjarmason [this message]
2021-11-03 22:12         ` test suite speedups via some not-so-crazy ideas Junio C Hamano
2021-11-02 13:52     ` scripting speedups [was: [Summit topic] Crazy (and not so crazy) ideas] Johannes Schindelin
2021-10-21 11:55 ` [Summit topic] SHA-256 Updates Johannes Schindelin
2021-10-21 11:56 ` [Summit topic] Server-side merge/rebase: needs and wants? Johannes Schindelin
2021-10-22  3:06   ` Bagas Sanjaya
2021-10-22 10:01     ` Johannes Schindelin
2021-10-23 20:52       ` Ævar Arnfjörð Bjarmason
2021-11-08 18:21   ` Taylor Blau
2021-11-09  2:15     ` Ævar Arnfjörð Bjarmason
2021-11-30 10:06       ` Christian Couder
2021-10-21 11:56 ` [Summit topic] Submodules and how to make them worth using Johannes Schindelin
2021-10-21 11:56 ` [Summit topic] Sparse checkout behavior and plans Johannes Schindelin
2021-10-21 11:56 ` [Summit topic] The state of getting a reftable backend working in git.git Johannes Schindelin
2021-10-25 19:00   ` Han-Wen Nienhuys
2021-10-25 22:09     ` Ævar Arnfjörð Bjarmason
2021-10-26  8:12       ` Han-Wen Nienhuys
2021-10-28 14:17         ` Philip Oakley
2021-10-26 15:51       ` Philip Oakley
2021-10-21 11:56 ` [Summit topic] Documentation (translations, FAQ updates, new user-focused, general improvements, etc.) Johannes Schindelin
2021-10-22 14:20   ` Jean-Noël Avila
2021-10-22 14:31     ` Ævar Arnfjörð Bjarmason
2021-10-27  7:02       ` Jean-Noël Avila
2021-10-27  8:50       ` Jeff King
2021-10-21 11:56 ` [Summit topic] Increasing diversity & inclusion (transition to `main`, etc) Johannes Schindelin
2021-10-21 12:55   ` Son Luong Ngoc
2021-10-22 10:02     ` vale check, was " Johannes Schindelin
2021-10-22 10:03       ` Johannes Schindelin
2021-10-21 11:57 ` [Summit topic] Improving Git UX Johannes Schindelin
2021-10-21 16:45   ` changing the experimental 'git switch' (was: [Summit topic] Improving Git UX) Ævar Arnfjörð Bjarmason
2021-10-21 23:03     ` changing the experimental 'git switch' Junio C Hamano
2021-10-22  3:33     ` changing the experimental 'git switch' (was: [Summit topic] Improving Git UX) Bagas Sanjaya
2021-10-22 14:04     ` martin
2021-10-22 14:24       ` Ævar Arnfjörð Bjarmason
2021-10-22 15:30         ` martin
2021-10-23  8:27           ` changing the experimental 'git switch' Sergey Organov
2021-10-22 21:54         ` Sergey Organov
2021-10-24  6:54       ` changing the experimental 'git switch' (was: [Summit topic] Improving Git UX) Martin
2021-10-24 20:27         ` changing the experimental 'git switch' Junio C Hamano
2021-10-25 12:48           ` Ævar Arnfjörð Bjarmason
2021-10-25 17:06             ` Junio C Hamano
2021-10-25 16:44     ` Sergey Organov
2021-10-25 22:23       ` Ævar Arnfjörð Bjarmason
2021-10-27 18:54         ` Sergey Organov
2021-10-21 11:57 ` [Summit topic] Improving reviewer quality of life (patchwork, subsystem lists?, etc) Johannes Schindelin
2021-10-21 13:41   ` Konstantin Ryabitsev
2021-10-22 22:06     ` Ævar Arnfjörð Bjarmason
2021-10-22  8:02 ` Missing notes, was Re: Notes from the Git Contributors' Summit 2021, virtual, Oct 19/20 Johannes Schindelin
2021-10-22  8:22   ` Johannes Schindelin
2021-10-22  8:30     ` Johannes Schindelin
2021-10-22  9:07       ` Johannes Schindelin
2021-10-22  9:44 ` Let's have public Git chalk talks, " Johannes Schindelin
2021-10-25 12:58   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211103.864k8t1sma.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).