From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Eric Sunshine via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Elijah Newren <newren@gmail.com>,
Fabian Stelzer <fs@gigacodes.de>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel
Date: Thu, 01 Sep 2022 14:36:57 +0200 [thread overview]
Message-ID: <220901.86bkrzjm6e.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <62fc652eb47a4df83d88a197e376f28dbbab3b52.1661992197.git.gitgitgadget@gmail.com>
On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote:
> From: Eric Sunshine <sunshine@sunshineco.com>
>
> Although chainlint.pl has undergone a good deal of optimization during
> its development -- increasing in speed significantly -- parsing and
> validating 1050+ scripts and 16500+ tests via Perl is not exactly
> instantaneous. However, perceived performance can be improved by taking
> advantage of the fact that there is no interdependence between test
> scripts or test definitions, thus parsing and validating can be done in
> parallel. The number of available cores is determined automatically but
> can be overridden via the --jobs option.
Per your CL:
Ævar offered some sensible comments[2,3] about optimizing the Makefile rules
related to chainlint, but those optimizations are not tackled here for a few
reasons: (1) this series is already quite long, (2) I'd like to keep the
series focused on its primary goal of installing a new and improved linter,
(3) these patches do not make the Makefile situation any worse[4], and (4)
those optimizations can easily be done atop this series[5].
I have been running with those t/Makefile changesg locally, but didn't
submit them. FWIW that's here:
https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-dependency-graph-for-check-chainlint
Which I'm not entirely sure I'm happy about, and it's jeust about the
chainlint tests, but...
> +sub ncores {
> + # Windows
> + return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS});
> + # Linux / MSYS2 / Cygwin / WSL
> + do { local @ARGV='/proc/cpuinfo'; return scalar(grep(/^processor\s*:/, <>)); } if -r '/proc/cpuinfo';
> + # macOS & BSD
> + return qx/sysctl -n hw.ncpu/ if $^O =~ /(?:^darwin$|bsd)/;
> + return 1;
> +}
> +
> sub show_stats {
> my ($start_time, $stats) = @_;
> my $walltime = $interval->($start_time);
> @@ -621,7 +633,9 @@ sub exit_code {
> Getopt::Long::Configure(qw{bundling});
> GetOptions(
> "emit-all!" => \$emit_all,
> + "jobs|j=i" => \$jobs,
> "stats|show-stats!" => \$show_stats) or die("option error\n");
> +$jobs = ncores() if $jobs < 1;
>
> my $start_time = $getnow->();
> my @stats;
> @@ -633,6 +647,40 @@ unless (@scripts) {
> exit;
> }
>
> -push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
> +unless ($Config{useithreads} && eval {
> + require threads; threads->import();
> + require Thread::Queue; Thread::Queue->import();
> + 1;
> + }) {
> + push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
> + show_stats($start_time, \@stats) if $show_stats;
> + exit(exit_code(\@stats));
> +}
> +
> +my $script_queue = Thread::Queue->new();
> +my $output_queue = Thread::Queue->new();
> +
> +sub next_script { return $script_queue->dequeue(); }
> +sub emit { $output_queue->enqueue(@_); }
> +
> +sub monitor {
> + while (my $s = $output_queue->dequeue()) {
> + print($s);
> + }
> +}
> +
> +my $mon = threads->create({'context' => 'void'}, \&monitor);
> +threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs;
> +
> +$script_queue->enqueue(@scripts);
> +$script_queue->end();
> +
> +for (threads->list()) {
> + push(@stats, $_->join()) unless $_ == $mon;
> +}
> +
> +$output_queue->end();
> +$mon->join();
Maybe I'm misunderstanding this whole thing, but this really seems like
the wrong direction in an otherwise fantastic direction of a series.
I.e. it's *great* that we can do chain-lint without needing to actually
execute the *.sh file, this series adds a lint parser that can parse
those *.sh "at rest".
But in your 16/18 you then do:
+if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
+then
+ "$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
+ BUG "lint error (see '?!...!? annotations above)"
+fi
I may just be missing something here, but why not instead just borrow
what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs"
non-.PHONY, 2021-10-15)?
I.e. if we can run against t0001-init.sh or whatever *once* to see if it
chain-lints OK then surely we could have a rule like:
t0001-init.sh.chainlint-ok: t0001-init.sh
perl chainlint.pl $< >$@
Then whenever you change t0001-init.sh we refresh that
t0001-init.sh.chainlint-ok, if the chainlint.pl exits non-zero we'll
fail to make it, and will unlink that t0001-init.sh.chainlint-ok.
That way you wouldn't need any parallelism in the Perl script, because
you'd have "make" take care of it, and the common case of re-testing
where the speed matters would be that we woudln't need to run this at
all, or would only re-run it for the test scripts that changed.
(Obviously a "real" implementation would want to create that ".ok" file
in t/.build/chainlint" or whatever)
A drawback is that you'd probably be slower on the initial run, as you'd
spwn N chainlint.pl. You could use $? instead of $< to get around that,
but that requires some re-structuring, and I've found it to generally
not be worth it.
It would also have the drawback that a:
./t0001-init.sh
wouldn't run the chain-lint, but this would:
make T=t0001-init.sh
But if want the former to work we could carry some
"GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the
test-via-test-lib.sh if it isn't set.
next prev parent reply other threads:[~2022-09-01 12:48 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-01 0:29 [PATCH 00/18] make test "linting" more comprehensive Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 01/18] t: add skeleton chainlint.pl Eric Sunshine via GitGitGadget
2022-09-01 12:27 ` Ævar Arnfjörð Bjarmason
2022-09-02 18:53 ` Eric Sunshine
2022-09-01 0:29 ` [PATCH 02/18] chainlint.pl: add POSIX shell lexical analyzer Eric Sunshine via GitGitGadget
2022-09-01 12:32 ` Ævar Arnfjörð Bjarmason
2022-09-03 6:00 ` Eric Sunshine
2022-09-01 0:29 ` [PATCH 03/18] chainlint.pl: add POSIX shell parser Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 04/18] chainlint.pl: add parser to validate tests Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 05/18] chainlint.pl: add parser to identify test definitions Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 06/18] chainlint.pl: validate test scripts in parallel Eric Sunshine via GitGitGadget
2022-09-01 12:36 ` Ævar Arnfjörð Bjarmason [this message]
2022-09-03 7:51 ` Eric Sunshine
2022-09-06 22:35 ` Eric Wong
2022-09-06 22:52 ` Eric Sunshine
2022-09-06 23:26 ` Jeff King
2022-11-21 4:02 ` Eric Sunshine
2022-11-21 13:28 ` Ævar Arnfjörð Bjarmason
2022-11-21 14:07 ` Eric Sunshine
2022-11-21 14:18 ` Ævar Arnfjörð Bjarmason
2022-11-21 14:48 ` Eric Sunshine
2022-11-21 18:04 ` Jeff King
2022-11-21 18:47 ` Eric Sunshine
2022-11-21 18:50 ` Eric Sunshine
2022-11-21 18:52 ` Jeff King
2022-11-21 19:00 ` Eric Sunshine
2022-11-21 19:28 ` Jeff King
2022-11-22 0:11 ` Ævar Arnfjörð Bjarmason
2022-09-01 0:29 ` [PATCH 07/18] chainlint.pl: don't require `return|exit|continue` to end with `&&` Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 08/18] t/Makefile: apply chainlint.pl to existing self-tests Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 09/18] chainlint.pl: don't require `&` background command to end with `&&` Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 10/18] chainlint.pl: don't flag broken &&-chain if `$?` handled explicitly Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 11/18] chainlint.pl: don't flag broken &&-chain if failure indicated explicitly Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 12/18] chainlint.pl: complain about loops lacking explicit failure handling Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 13/18] chainlint.pl: allow `|| echo` to signal failure upstream of a pipe Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 14/18] t/chainlint: add more chainlint.pl self-tests Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 15/18] test-lib: retire "lint harder" optimization hack Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 16/18] test-lib: replace chainlint.sed with chainlint.pl Eric Sunshine via GitGitGadget
2022-09-03 5:07 ` Elijah Newren
2022-09-03 5:24 ` Eric Sunshine
2022-09-01 0:29 ` [PATCH 17/18] t/Makefile: teach `make test` and `make prove` to run chainlint.pl Eric Sunshine via GitGitGadget
2022-09-01 0:29 ` [PATCH 18/18] t: retire unused chainlint.sed Eric Sunshine via GitGitGadget
2022-09-02 12:42 ` several messages Johannes Schindelin
2022-09-02 18:16 ` Eric Sunshine
2022-09-02 18:34 ` Jeff King
2022-09-02 18:44 ` Junio C Hamano
2022-09-11 5:28 ` [PATCH 00/18] make test "linting" more comprehensive Jeff King
2022-09-11 7:01 ` Eric Sunshine
2022-09-11 18:31 ` Jeff King
2022-09-12 23:17 ` Eric Sunshine
2022-09-13 0:04 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=220901.86bkrzjm6e.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=fs@gigacodes.de \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).