All of lore.kernel.org
 help / color / mirror / Atom feed
* Why the Makefile is so eager to re-build & re-link
@ 2021-06-24 13:16 Ævar Arnfjörð Bjarmason
  2021-06-24 15:16 ` Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 13:16 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin

This is probably all stuff that's been on list-before / known by
some/all people in the CC list, but in case not: I looked a bit into why
we'e so frequently re-linking and re compiling things these days,
slowing down e.g. "git rebase --exec='make ...'".

These are all fixable issues, I haven't worked on them, just some notes
in case anyone has better ideas:

 * version.c: The biggest offender, every time you move to a new commit
   we have a FORCE on GIT-VERSION-FILE, if it changes (which it will, if
   you're in a git checkout), we in turn re-compile version.o, link
   libgit to contain it, and then need to re-link everything that uses
   libgit.

   Some of this can be micro-optimized by moving that out of libgit,
   e.g. we re-link git-shell, which doesn't ever report the version, we
   do the same for all the perl scripts, just because obscure codepaths
   of them that could shell out to "git version" want to report it, and
   we embed it in the generated code.

   But I think the best approach here is to piggy-back on
   SKIP_DASHED_BUILT_INS, if you have that enabled then only "git" (and
   git-upload-pack etc.) need to be aware of the version, which they can
   then set in the environment for the others.

   This also applies to e.g. the git-http* stuff, which has a user agent
   derived from the version.

 * {command,config}-list.h (and in-flight, my hook-list.h): Every time
   you touch a Documentation/git-*.txt we need to re-generate these, and
   since their mtime changes we re-compile and re-link all the way up to
   libgit and our other tools.

   I think the best solution here is to make the generate-*.sh
   shellscripts faster (just one takes ~300ms of nested shellscripting,
   just to grep out the first few lines of every git-*.txt, in e.g. Perl
   or a smarter awk script this would be <5ms).

   Then we make those FORCE, but most of the time the config or command
   summary (or list of hooks) doesn't change, so we don't need to
   replace the file.

Perhaps even better would be to piggy-back on the RUNTIME_PREFIX
support, and simply drop in generated plain-text files, so in your build
checkout the list of hooks, commands etc. would be parsed instead of
compiled in. Then we wouldn't need to re-build or re-link anything for
the version or this other data.

We could/should still have some facility to compile those in for
"install", see also [1] for some concerns / ideas for other similar
things that could use that.

1. https://lore.kernel.org/git/87czvoowg2.fsf@evledraar.gmail.com/


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
@ 2021-06-24 15:16 ` Jeff King
  2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
                     ` (3 more replies)
  2021-06-24 23:35 ` Øystein Walle
  2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
  2 siblings, 4 replies; 87+ messages in thread
From: Jeff King @ 2021-06-24 15:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin

On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This is probably all stuff that's been on list-before / known by
> some/all people in the CC list, but in case not: I looked a bit into why
> we'e so frequently re-linking and re compiling things these days,
> slowing down e.g. "git rebase --exec='make ...'".
> 
> These are all fixable issues, I haven't worked on them, just some notes
> in case anyone has better ideas:

From a quick skim I didn't see anything wrong in your analysis or
suggestions. I do kind of wonder if we are hitting a point of
diminishing returns here. "make -j16" on my system takes ~175ms for a
noop, and ~650ms if I have to regenerate version.h (it's something like
2s total of CPU, but I have 8 cores).

I know I've probably got a nicer machine than many other folks. But at
some point correctness and avoiding complexity in the Makefile become a
win over shaving off a second from compile times. You'd probably find
lower hanging fruit in the test suite which could shave off tens of
seconds. :)

>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>    you touch a Documentation/git-*.txt we need to re-generate these, and
>    since their mtime changes we re-compile and re-link all the way up to
>    libgit and our other tools.
> 
>    I think the best solution here is to make the generate-*.sh
>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>    or a smarter awk script this would be <5ms).

Yeah, I think Eric mentioned he had looked into doing this in perl, but
we weren't entirely happy with the dependency. Here's another really odd
thing I noticed:

  $ time sh ./generate-cmdlist.sh command-list.txt >one
  real	0m1.323s
  user	0m1.531s
  sys	0m0.834s

  $ time sh -x ./generate-cmdlist.sh command-list.txt >two
  [a bunch of trace output]
  real	0m0.513s
  user	0m0.754s
  sys	0m0.168s

  $ cmp one two
  [no output]

Er, what? Running with "-x" makes it almost 3 times faster to generate
the same output? I'd have said this is an anomaly, but it's repeatable
(and swapping the order produces the same output, so it's not some weird
priming thing). And then to top it all off, redirecting the trace is
slow again:

  $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
  real	0m1.363s
  user	0m1.538s
  sys	0m0.902s

A little mini-mystery that I think I may leave unsolved for now.

>    Then we make those FORCE, but most of the time the config or command
>    summary (or list of hooks) doesn't change, so we don't need to
>    replace the file.

Yes, possibly we could use the "if it hasn't changed, don't update the
file" trick to avoid cascading updates.

> Perhaps even better would be to piggy-back on the RUNTIME_PREFIX
> support, and simply drop in generated plain-text files, so in your build
> checkout the list of hooks, commands etc. would be parsed instead of
> compiled in. Then we wouldn't need to re-build or re-link anything for
> the version or this other data.

Yeah, that would work. I worry a bit that the value of something like
"version.h" is lost with a runtime file, though. The point is to bake it
into the binary so you can't accidentally get the wrong value (say, from
running "./git" from the build directory, which looks at the runtime
file where the binary _would_ be installed, except you haven't run "make
install" yet).

For cmdlist stuff it could be a good match, though.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 15:16 ` Jeff King
@ 2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
  2021-06-24 21:30   ` Johannes Sixt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 15:28 UTC (permalink / raw)
  To: Jeff King
  Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin


On Thu, Jun 24 2021, Jeff King wrote:

> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> This is probably all stuff that's been on list-before / known by
>> some/all people in the CC list, but in case not: I looked a bit into why
>> we'e so frequently re-linking and re compiling things these days,
>> slowing down e.g. "git rebase --exec='make ...'".
>> 
>> These are all fixable issues, I haven't worked on them, just some notes
>> in case anyone has better ideas:
>
> From a quick skim I didn't see anything wrong in your analysis or
> suggestions. I do kind of wonder if we are hitting a point of
> diminishing returns here. "make -j16" on my system takes ~175ms for a
> noop, and ~650ms if I have to regenerate version.h (it's something like
> 2s total of CPU, but I have 8 cores).
>
> I know I've probably got a nicer machine than many other folks. But at
> some point correctness and avoiding complexity in the Makefile become a
> win over shaving off a second from compile times. You'd probably find
> lower hanging fruit in the test suite which could shave off tens of
> seconds. :)

It's mainly annoying when e.g. doing a rebase of an N patch series,
those ~700ms v.s. ~200ms add up quickly.

>>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>    you touch a Documentation/git-*.txt we need to re-generate these, and
>>    since their mtime changes we re-compile and re-link all the way up to
>>    libgit and our other tools.
>> 
>>    I think the best solution here is to make the generate-*.sh
>>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>    or a smarter awk script this would be <5ms).
>
> Yeah, I think Eric mentioned he had looked into doing this in perl, but
> we weren't entirely happy with the dependency. Here's another really odd
> thing I noticed:
>
>   $ time sh ./generate-cmdlist.sh command-list.txt >one
>   real	0m1.323s
>   user	0m1.531s
>   sys	0m0.834s
>
>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>   [a bunch of trace output]
>   real	0m0.513s
>   user	0m0.754s
>   sys	0m0.168s
>
>   $ cmp one two
>   [no output]
>
> Er, what? Running with "-x" makes it almost 3 times faster to generate
> the same output? I'd have said this is an anomaly, but it's repeatable
> (and swapping the order produces the same output, so it's not some weird
> priming thing). And then to top it all off, redirecting the trace is
> slow again:
>
>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>   real	0m1.363s
>   user	0m1.538s
>   sys	0m0.902s
>
> A little mini-mystery that I think I may leave unsolved for now.

Sounds interesting if true, I haven't looked into it.

>>    Then we make those FORCE, but most of the time the config or command
>>    summary (or list of hooks) doesn't change, so we don't need to
>>    replace the file.
>
> Yes, possibly we could use the "if it hasn't changed, don't update the
> file" trick to avoid cascading updates.

The problem is also that you can only do it at the lowest level, or
you'll get into a dead-end of something else depending on the FORCE
target continually re-making it, even though the target itself decided
there was nothing to do based on a cmp(1).

>> Perhaps even better would be to piggy-back on the RUNTIME_PREFIX
>> support, and simply drop in generated plain-text files, so in your build
>> checkout the list of hooks, commands etc. would be parsed instead of
>> compiled in. Then we wouldn't need to re-build or re-link anything for
>> the version or this other data.
>
> Yeah, that would work. I worry a bit that the value of something like
> "version.h" is lost with a runtime file, though. The point is to bake it
> into the binary so you can't accidentally get the wrong value (say, from
> running "./git" from the build directory, which looks at the runtime
> file where the binary _would_ be installed, except you haven't run "make
> install" yet).

I think all of those concerns are covered under RUNTIME_PREFIX, it
discovers files relative to git whether you have it installed or not.

I still haven't looked into why I sometimes need --exec-path=$PWD in the
build checkout, and sometimes not though...

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 15:16 ` Jeff King
  2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
@ 2021-06-24 21:30   ` Johannes Sixt
  2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
  2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
  2021-06-29  5:04   ` Eric Sunshine
  3 siblings, 1 reply; 87+ messages in thread
From: Johannes Sixt @ 2021-06-24 21:30 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin

Am 24.06.21 um 17:16 schrieb Jeff King:
> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>    you touch a Documentation/git-*.txt we need to re-generate these, and
>>    since their mtime changes we re-compile and re-link all the way up to
>>    libgit and our other tools.
>>
>>    I think the best solution here is to make the generate-*.sh
>>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>    or a smarter awk script this would be <5ms).
> 
> Yeah, I think Eric mentioned he had looked into doing this in perl, but
> we weren't entirely happy with the dependency. Here's another really odd
> thing I noticed:
> 
>   $ time sh ./generate-cmdlist.sh command-list.txt >one
>   real	0m1.323s
>   user	0m1.531s
>   sys	0m0.834s
> 
>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>   [a bunch of trace output]
>   real	0m0.513s
>   user	0m0.754s
>   sys	0m0.168s
> 
>   $ cmp one two
>   [no output]
> 
> Er, what? Running with "-x" makes it almost 3 times faster to generate
> the same output? I'd have said this is an anomaly, but it's repeatable
> (and swapping the order produces the same output, so it's not some weird
> priming thing). And then to top it all off, redirecting the trace is
> slow again:
> 
>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>   real	0m1.363s
>   user	0m1.538s
>   sys	0m0.902s
> 
> A little mini-mystery that I think I may leave unsolved for now.

Strange, really. Reminds me of the case that the `read` built-in must
read input byte by byte if stdin is not connected to a (searchable) file.

I have two patches that speed up generate-cmdlist.sh a bit:

generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
(https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03)
generate-cmdlist.sh: spawn fewer processes
(https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304)

that I can submit (give me some time, though) or interested parties
could pick up.

-- Hannes

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
  2021-06-24 15:16 ` Jeff King
@ 2021-06-24 23:35 ` Øystein Walle
  2021-06-24 23:39   ` Øystein Walle
  2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
  2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 87+ messages in thread
From: Øystein Walle @ 2021-06-24 23:35 UTC (permalink / raw)
  To: avarab
  Cc: Johannes.Schindelin, emilyshaffer, git, gitster, me, peff, sunshine

Hi, Ævar

> {command,config}-list.h (and in-flight, my hook-list.h): Every time
> you touch a Documentation/git-*.txt we need to re-generate these, and
> since their mtime changes we re-compile and re-link all the way up to
> libgit and our other tools.
> 
> I think the best solution here is to make the generate-*.sh
> shellscripts faster (just one takes ~300ms of nested shellscripting,
> just to grep out the first few lines of every git-*.txt, in e.g. Perl
> or a smarter awk script this would be <5ms).
> 
> Then we make those FORCE, but most of the time the config or command
> summary (or list of hooks) doesn't change, so we don't need to
> replace the file.

One possible technique to fix this is to generate to a temporary file, and copy
the temporary file over the real file if it's actually different. I see the
Makefile already does create a temporary file, so how about something like
this:

diff --git a/Makefile b/Makefile
index 93664d6714..9b2f081a2a 100644
--- a/Makefile
+++ b/Makefile
@@ -2216,7 +2216,8 @@ command-list.h: generate-cmdlist.sh command-list.txt
 command-list.h: $(wildcard Documentation/git*.txt)
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
-		command-list.txt >$@+ && mv $@+ $@
+		command-list.txt >$@+ && \
+		if ! cmp -s $@ $@+; then mv $@+ $@; fi
 
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\


This seems to work fine from my basic testing. I think it can even be written
more terse as `&& ! cmp ... &&` but that looks a bit weird to me. In any case
it looks like it can easily be added the other relevant places in the Makefile
too.

Øsse

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 23:35 ` Øystein Walle
@ 2021-06-24 23:39   ` Øystein Walle
  2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 87+ messages in thread
From: Øystein Walle @ 2021-06-24 23:39 UTC (permalink / raw)
  To: oystwa
  Cc: Johannes.Schindelin, avarab, emilyshaffer, git, gitster, me,
	peff, sunshine

Oops, I missed that you had already discussed this idea. Sorry about about
that.

Øsse

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 23:35 ` Øystein Walle
  2021-06-24 23:39   ` Øystein Walle
@ 2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-25  0:11 UTC (permalink / raw)
  To: Øystein Walle
  Cc: Johannes.Schindelin, emilyshaffer, git, gitster, me, peff, sunshine


On Fri, Jun 25 2021, Øystein Walle wrote:

> Hi, Ævar
>
>> {command,config}-list.h (and in-flight, my hook-list.h): Every time
>> you touch a Documentation/git-*.txt we need to re-generate these, and
>> since their mtime changes we re-compile and re-link all the way up to
>> libgit and our other tools.
>> 
>> I think the best solution here is to make the generate-*.sh
>> shellscripts faster (just one takes ~300ms of nested shellscripting,
>> just to grep out the first few lines of every git-*.txt, in e.g. Perl
>> or a smarter awk script this would be <5ms).
>> 
>> Then we make those FORCE, but most of the time the config or command
>> summary (or list of hooks) doesn't change, so we don't need to
>> replace the file.
>
> One possible technique to fix this is to generate to a temporary file, and copy
> the temporary file over the real file if it's actually different. I see the
> Makefile already does create a temporary file, so how about something like
> this:
>
> diff --git a/Makefile b/Makefile
> index 93664d6714..9b2f081a2a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2216,7 +2216,8 @@ command-list.h: generate-cmdlist.sh command-list.txt
>  command-list.h: $(wildcard Documentation/git*.txt)
>  	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
>  		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
> -		command-list.txt >$@+ && mv $@+ $@
> +		command-list.txt >$@+ && \
> +		if ! cmp -s $@ $@+; then mv $@+ $@; fi
>  
>  SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
>  	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
>
>
> This seems to work fine from my basic testing. I think it can even be written
> more terse as `&& ! cmp ... &&` but that looks a bit weird to me. In any case
> it looks like it can easily be added the other relevant places in the Makefile
> too.

I see your downthread "oops", no worries. Just to elaborate slightly on
the comment I had in [1] (and as you've doubtless discovered) then yes,
this works the first time:
    
    $ touch Documentation/git-bisect.txt
    $ make -k -j $(nproc) git
        GEN command-list.h

Without your change we'd have also had to recompile "git", but the
problem is that if you run and re-run that we'll keep making
command-list.h again and again, which is "make" getting stuck on,
because:

    $ make -d -k -j $(nproc) git 2>&1 | grep newer.*command-list.h
        Prerequisite 'Documentation/git-bisect.txt' is newer than target 'command-list.h'.

Hence the "lowest level" comment in [1], i.e. you can only use this
"cmp" trick for things that don't themselves have dependencies.

Which at the end of the day is why make is both wonderful, and why it
sucks.

It sometimes gets hard to force your "circular" dependency graph into
the "square peg" of files and their mtimes, but it's also great for
simplicity that everyone can't write their own custom "this is what it
means for my dependencies to have changed" function (which in practice
would break way more in the hands of most programmers, including yours
truly, than files & their mtimes).

E.g. in this case we don't *actually* depend on Documentation/git*.txt
for the purposes of re-making things all the way up the graph, we depend
on the tiny bit of data we extract from those files, basically just the
NAME section at the very beginning.

The only way to represent that information in a way that "make"
understands would be to add another intermediate step where we extract
exactly that information out into intermediate files with a script, and
then in turn depend on those files.

Which actually, might not be such a bad idea as a solution to this
particular problem.

I.e. just to have individual generated versions of
git-bisect.txt.just-what-command-list.h-needs. We do something vaguely
similar with Documentation/build-docdep.perl, which is another glorious
hack of trying to get make's dependency graph to behave as we'd like.

As a practical matter it wouldn't get you very far except for doc-only
changes, since you'd still have the version.c problem, which also
changes on every (code) commit.

1. http://lore.kernel.org/git/87y2azyzer.fsf@evledraar.gmail.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 21:30   ` Johannes Sixt
@ 2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
  2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
  2021-06-29  2:13       ` Jeff King
  0 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-25  8:34 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Git List, Junio C Hamano, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin


On Thu, Jun 24 2021, Johannes Sixt wrote:

> Am 24.06.21 um 17:16 schrieb Jeff King:
>> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>>    you touch a Documentation/git-*.txt we need to re-generate these, and
>>>    since their mtime changes we re-compile and re-link all the way up to
>>>    libgit and our other tools.
>>>
>>>    I think the best solution here is to make the generate-*.sh
>>>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>>>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>>    or a smarter awk script this would be <5ms).
>> 
>> Yeah, I think Eric mentioned he had looked into doing this in perl, but
>> we weren't entirely happy with the dependency. Here's another really odd
>> thing I noticed:
>> 
>>   $ time sh ./generate-cmdlist.sh command-list.txt >one
>>   real	0m1.323s
>>   user	0m1.531s
>>   sys	0m0.834s
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>>   [a bunch of trace output]
>>   real	0m0.513s
>>   user	0m0.754s
>>   sys	0m0.168s
>> 
>>   $ cmp one two
>>   [no output]
>> 
>> Er, what? Running with "-x" makes it almost 3 times faster to generate
>> the same output? I'd have said this is an anomaly, but it's repeatable
>> (and swapping the order produces the same output, so it's not some weird
>> priming thing). And then to top it all off, redirecting the trace is
>> slow again:
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>>   real	0m1.363s
>>   user	0m1.538s
>>   sys	0m0.902s
>> 
>> A little mini-mystery that I think I may leave unsolved for now.
>
> Strange, really. Reminds me of the case that the `read` built-in must
> read input byte by byte if stdin is not connected to a (searchable) file.
>
> I have two patches that speed up generate-cmdlist.sh a bit:
>
> generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
> (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03)
> generate-cmdlist.sh: spawn fewer processes
> (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304)
>
> that I can submit (give me some time, though) or interested parties
> could pick up.

Interesting, but I think rather than micro-optimizing the O(n) loop it
makes more sense to turn it into a series of O(1) in -j parallel,
i.e. actually use the make dependency graph for this as I suggested in:
https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/

Something like the hacky throwaway patch that follows. Now when you
touch a file in Documentation/git-*.txt you re-make just that file
chain, which gets assembled into the command-list.h:
	
	$ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1
	GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    CC version.o
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN command-list.h
	    CC help.o
	    AR libgit.a
	    LINK git

Those build/* files are, respectively, the relevant line corresponding
to the *.txt file from command-list.txt:

    $ cat build/Documentation/git-add.txt.cmdlist.in
    git-add                                 mainporcelain           worktree

And then the line we want in command-list.h at the end:

    $ cat build/Documentation/git-add.txt.cmdlist
        { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree },

The build process in then just a matter of keeping those files
up-to-date, and having "make" create the command-list.h is just a matter
of:

    cat build/Documentation/*.cmdlist

Well, with the categories still being an O(n) affair, but the cost of
generating those is trivial.

I think that aside from optimizing for speed it just makes things
clearer, if you modify e.g. git-add.txt you see a clear chain of output
from make about how we generate output from that, and then make the
command-list.h.

Now we'd just show a mysterious command-list.h entry. Whenever you have
"make" create one file from many it becomes hard to see at a glance
what's going on.

I'd be curious to see what how that performs e.g. on Windows, on Linux I
get e.g. (warm cache):
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN build/Documentation/git-apply.txt.cmdlist.in
	    GEN build/Documentation/git-annotate.txt.cmdlist.in
	    GEN build/Documentation/git-am.txt.cmdlist.in
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    GEN build/Documentation/git-archimport.txt.cmdlist.in
	    GEN build/Documentation/git-archive.txt.cmdlist.in
	    GEN build/Documentation/git-apply.txt.cmdlist
	    GEN build/Documentation/git-annotate.txt.cmdlist
	    GEN build/Documentation/git-am.txt.cmdlist
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN build/Documentation/git-archimport.txt.cmdlist
	    GEN build/Documentation/git-archive.txt.cmdlist
	    GEN command-list.h
	
	real    0m0.214s
	user    0m0.196s
	sys     0m0.071s
	
Doing the same on master takes around 600ms for me at best:
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN command-list.h
	
	real    0m0.611s
	user    0m0.756s
	sys     0m0.112s

It's even faster when I have to make all of them (my -j is = 8), or
around 450ms after a "touch Documentation/git-*.txt".

We have ~170 lines we process in command-list.txt, I'd think on Windows
you'd get much better results, instead of optimizing those number of
"sort | uniq" to the same number of "sort -u" the common case is just
running 1-5 of those, as that's all the *.txt files you changed, then
just "cat-ing" the full set.

diff --git a/.gitignore b/.gitignore
index 311841f9bed..9b365395496 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@
 /GIT-USER-AGENT
 /GIT-VERSION-FILE
 /bin-wrappers/
+/build/
 /git
 /git-add
 /git-add--interactive
diff --git a/Makefile b/Makefile
index c3565fc0f8f..5e845bd0f69 100644
--- a/Makefile
+++ b/Makefile
@@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \
 		>$@+ && mv $@+ $@
 
-command-list.h: generate-cmdlist.sh command-list.txt
+build/Documentation:
+	$(QUIET_GEN)mkdir -p build/Documentation
+.PRECIOUS: build/Documentation/git%.txt.cmdlist.in
+build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt
+	$(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \
+	then \
+		# For e.g. git-init-db, which has a *.txt file, but no \
+		# command-list.h entry \
+		>$@; \
+	fi
+build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in
+	$(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@
 
-command-list.h: $(wildcard Documentation/git*.txt)
-	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
+COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt))
+
+COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES))
+command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN)
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
-		command-list.txt >$@+ && mv $@+ $@
+		command-list.txt >$@+ && \
+	echo "static struct cmdname_help command_list[] = {" >>$@+ && \
+	cat build/Documentation/*.txt.cmdlist >>$@+ && \
+	echo "};" >>$@+ && \
+	mv $@+ $@
 
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..f80266d5138 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
+mode=$1
+shift
+
 die () {
 	echo "$@" >&2
 	exit 1
@@ -61,8 +64,6 @@ define_category_names () {
 }
 
 print_command_list () {
-	echo "static struct cmdname_help command_list[] = {"
-
 	command_list "$1" |
 	while read cmd rest
 	do
@@ -73,6 +74,9 @@ print_command_list () {
 		done
 		echo " },"
 	done
+}
+
+end_print_command_list () {
 	echo "};"
 }
 
@@ -84,6 +88,12 @@ do
 	shift
 done
 
+if test "$mode" = "--tail"
+then
+	print_command_list "$1"
+	exit 0
+fi
+
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
 	const char *name;
@@ -94,5 +104,3 @@ struct cmdname_help {
 define_categories "$1"
 echo
 define_category_names "$1"
-echo
-print_command_list "$1"

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
@ 2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
  2021-06-29  2:13       ` Jeff King
  1 sibling, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-25  9:01 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, Git List, Junio C Hamano, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin, Johannes Schindelin


On Fri, Jun 25 2021, Ævar Arnfjörð Bjarmason wrote:

Just one small thing I forgot to note:

> On Thu, Jun 24 2021, Johannes Sixt wrote:
>> I have two patches that speed up generate-cmdlist.sh a bit:
> [...]
> Interesting, but I think rather than micro-optimizing the O(n) loop it
> makes more sense to turn it into a series of O(1) in -j parallel,
> i.e. actually use the make dependency graph for this as I suggested in:
> https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/
> [...]
> diff --git a/.gitignore b/.gitignore
> index 311841f9bed..9b365395496 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -13,6 +13,7 @@
>  /GIT-USER-AGENT
>  /GIT-VERSION-FILE
>  /bin-wrappers/
> +/build/
>  /git
>  /git-add
>  /git-add--interactive
> diff --git a/Makefile b/Makefile
> index c3565fc0f8f..5e845bd0f69 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt
>  	$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \
>  		>$@+ && mv $@+ $@
>  
> -command-list.h: generate-cmdlist.sh command-list.txt
> +build/Documentation:
> +	$(QUIET_GEN)mkdir -p build/Documentation
> +.PRECIOUS: build/Documentation/git%.txt.cmdlist.in
> +build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt
> +	$(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \
> +	then \
> +		# For e.g. git-init-db, which has a *.txt file, but no \
> +		# command-list.h entry \
> +		>$@; \
> +	fi
> +build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in
> +	$(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@
>  
> -command-list.h: $(wildcard Documentation/git*.txt)
> -	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
> +COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt))
> +
> +COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES))
> +command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN)
> +	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \
>  		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \

I left this --exclude-program code (added in 724d63569fe (help -a: do
not list commands that are excluded from the build, 2019-04-18)) in just
because I couldn't be bothered to rewrite it for this demo. But of
course that's also something that becomes much easier once we do this in
a Makefile-native way, we'd just $(filter) that list out ourselves, so
no need to have some shellscript deal with it.

That's also true on "master" right now, 724d63569fe doesn't discuss why
it went for the shellscript approach. I suspect it's just that the
author thought it was easier than dealing with some combination of
$(filter) and (a possibly nested) $(pathsubst).

> -		command-list.txt >$@+ && mv $@+ $@
> +		command-list.txt >$@+ && \
> +	echo "static struct cmdname_help command_list[] = {" >>$@+ && \
> +	cat build/Documentation/*.txt.cmdlist >>$@+ && \
> +	echo "};" >>$@+ && \
> +	mv $@+ $@

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 15:16 ` Jeff King
  2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
  2021-06-24 21:30   ` Johannes Sixt
@ 2021-06-25 21:17   ` Felipe Contreras
  2021-06-29  5:04   ` Eric Sunshine
  3 siblings, 0 replies; 87+ messages in thread
From: Felipe Contreras @ 2021-06-25 21:17 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer,
	Eric Sunshine, Johannes Schindelin

Jeff King wrote:
> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > This is probably all stuff that's been on list-before / known by
> > some/all people in the CC list, but in case not: I looked a bit into why
> > we'e so frequently re-linking and re compiling things these days,
> > slowing down e.g. "git rebase --exec='make ...'".
> > 
> > These are all fixable issues, I haven't worked on them, just some notes
> > in case anyone has better ideas:
> 
> From a quick skim I didn't see anything wrong in your analysis or
> suggestions. I do kind of wonder if we are hitting a point of
> diminishing returns here. "make -j16" on my system takes ~175ms for a
> noop, and ~650ms if I have to regenerate version.h (it's something like
> 2s total of CPU, but I have 8 cores).
> 
> I know I've probably got a nicer machine than many other folks. But at
> some point correctness and avoiding complexity in the Makefile become a
> win over shaving off a second from compile times. You'd probably find
> lower hanging fruit in the test suite which could shave off tens of
> seconds. :)

That's not a good enough reason to not try to improve something.
In fact, it's a known fallacy called the fallacy of relative privation [1].

Also, your definition of "correctness" is not necessarily shared by
everyone.

I for one don't see what I'm gaining by running tests with
GIT_VERSION = 2.32.0.97.g949e814b27, and then 2.32.0.98.gcfb60a24d6.

What's wrong with GIT_VERSION = 2.33-dev?

Sure, I understand that some people might want to have a precise
version, but when I'm doing development I don't. Perhaps set a static
version when DEVELOPER=1?

[1] https://rationalwiki.org/wiki/Not_as_bad_as

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
  2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
@ 2021-06-29  2:13       ` Jeff King
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-06-29  2:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Sixt, Git List, Junio C Hamano, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Johannes Schindelin

On Fri, Jun 25, 2021 at 10:34:20AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Interesting, but I think rather than micro-optimizing the O(n) loop it
> makes more sense to turn it into a series of O(1) in -j parallel,
> i.e. actually use the make dependency graph for this as I suggested in:
> https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/

I have mixed feelings on that. I do like the general notion of breaking
apart tasks and feeding the dependencies to "make", because that lets it
do a better job of parallelizing or avoiding already-done work. But
there's a cost to running any job, so eventually you get to a unit of
work that's so small the overhead dominates.

For instance, starting from a built Git but dirtying all doc files with
"touch Documentation/*.txt", running "time make -j16" yields:

  real	0m1.749s
  user	0m2.963s
  sys	0m1.146s

With your patch to break it apart into many jobs, the same operation
gives:

  real	0m0.762s
  user	0m3.054s
  sys	0m0.600s

So that took fewer wall-clock seconds, but we actually spent more CPU.
On a system with fewer cores, it would probably be a loss in both
places.

Now maybe that's a good tradeoff, especially because the common case
(aside from a build-from-scratch, which will spend loads more time
actually compiling anyway) is that only a handful of files would be
updated.

But if we can just make the actual operation faster, then even O(n)
repeated work might be a win in all cases, because it's avoiding the
overhead of extra jobs.

I dunno. I think there's a formula here that depends on the overhead of
a job versus the time to handle a single file in the script, coupled
with the expected number of changed files for any run. I'm not sure of
the exact values of those numbers in this case, but am mostly pointing
out that it's a tradeoff and not always a pure win. :)

> Something like the hacky throwaway patch that follows. Now when you
> touch a file in Documentation/git-*.txt you re-make just that file
> chain, which gets assembled into the command-list.h:

I know you said this was throwaway, but in case you do pursue it
further, my first run hit:

  $ time make
  GIT_VERSION = 2.32.0.94.gaa5e6f14dd
      * new prefix flags
      GEN build/Documentation
      GEN build/Documentation/git-add.txt.cmdlist.in
  /bin/sh: 1: cannot create build/Documentation/git-add.txt.cmdlist.in: Directory nonexistent
  /bin/sh: 5: cannot create build/Documentation/git-add.txt.cmdlist.in: Directory nonexistent
      GEN build/Documentation/git-am.txt.cmdlist.in
  /bin/sh: 1: cannot create build/Documentation/git-am.txt.cmdlist.in: Directory nonexistent
  /bin/sh: 5: cannot create build/Documentation/git-am.txt.cmdlist.in: Directory nonexistent

So I'd guess there's some race with creating the build/Documentation
directory (a subsequent run worked fine).

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: Why the Makefile is so eager to re-build & re-link
  2021-06-24 15:16 ` Jeff King
                     ` (2 preceding siblings ...)
  2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
@ 2021-06-29  5:04   ` Eric Sunshine
  3 siblings, 0 replies; 87+ messages in thread
From: Eric Sunshine @ 2021-06-29  5:04 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano,
	Taylor Blau, Emily Shaffer, Johannes Schindelin

On Thu, Jun 24, 2021 at 11:16 AM Jeff King <peff@peff.net> wrote:
> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >    I think the best solution here is to make the generate-*.sh
> >    shellscripts faster (just one takes ~300ms of nested shellscripting,
> >    just to grep out the first few lines of every git-*.txt, in e.g. Perl
> >    or a smarter awk script this would be <5ms).
>
> Yeah, I think Eric mentioned he had looked into doing this in perl, but
> we weren't entirely happy with the dependency. [...]

For what it's worth, the original `generate-cmdlist` was a shell
script which I rewrote[1] in `awk` to extend the functionality, but
Junio felt uncomfortable[2] about making `awk` a build dependency, so
I rewrote[3] it again in `perl`. However, the `perl` version didn't
last long since we got a report[4] that Git would no longer build in
the FreeBSD ports tree, so I rewrote[5] it a final time in shell, thus
coming full circle (but with extended functionality).

[1]: https://lore.kernel.org/git/1431976697-26288-4-git-send-email-sebastien.guimmara@gmail.com/
[2]: https://lore.kernel.org/git/xmqqr3qda7kx.fsf@gitster.dls.corp.google.com/
[3]: https://lore.kernel.org/git/1432149781-24596-4-git-send-email-sebastien.guimmara@gmail.com/
[4]: https://lore.kernel.org/git/loom.20150814T171757-901@post.gmane.org/
[5]: https://lore.kernel.org/git/1440365469-9928-1-git-send-email-sunshine@sunshineco.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
  2021-06-24 15:16 ` Jeff King
  2021-06-24 23:35 ` Øystein Walle
@ 2021-07-02 11:58 ` Ævar Arnfjörð Bjarmason
  2021-07-02 15:53   ` Junio C Hamano
  2021-07-03  1:05   ` Felipe Contreras
  2 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-02 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Martin Ågren, Jeff King, Felipe Contreras,
	Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle,
	Ævar Arnfjörð Bjarmason

Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17)
we've been eagerly re-building the documentation whenever the output
of "git version" (via the GIT-VERSION file) changed. This was never
the intention, and was a regression on what we intended in
7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation,
2007-03-25).

So let's add an ASCIIDOC_MANVERSION variable that we exclude from
ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch
cases where we e.g. switched between asciidoc and asciidoctor, not to
undo the logic in 7b8a74f39cb and force a re-build every time our HEAD
changed in the repository.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

As a follow-up to
https://lore.kernel.org/git/874kdn1j6i.fsf@evledraar.gmail.com/ I cut
"make man" out of my "rebase -x" invocations, I could swear it didn't
used to take so long. Turns out it didn't, and that its eagerness is a
recent-ish regression.

This is what we used to do before v2.22.0, so I'm not too worried
about the edge case discussed in the comment here. I think an
improvement on this might be to e.g. force all the flags with a "make
dist" or one of the install targets.

In practice I don't think there's many/any people who build releases
that matter to anyone out of the checkout they've been using for their
own development.

 Documentation/Makefile | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index f5605b7767f..6b3f0bb6c8b 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -137,11 +137,12 @@ ASCIIDOC_HTML = xhtml11
 ASCIIDOC_DOCBOOK = docbook
 ASCIIDOC_CONF = -f asciidoc.conf
 ASCIIDOC_COMMON = $(ASCIIDOC) $(ASCIIDOC_EXTRA) $(ASCIIDOC_CONF) \
-		-amanversion=$(GIT_VERSION) \
 		-amanmanual='Git Manual' -amansource='Git'
+ASCIIDOC_MANVERSION = -amanversion=$(GIT_VERSION)
+ASCIIDOC_ALL = $(ASCIIDOC_COMMON) $(ASCIIDOC_MANVERSION)
 ASCIIDOC_DEPS = asciidoc.conf GIT-ASCIIDOCFLAGS
-TXT_TO_HTML = $(ASCIIDOC_COMMON) -b $(ASCIIDOC_HTML)
-TXT_TO_XML = $(ASCIIDOC_COMMON) -b $(ASCIIDOC_DOCBOOK)
+TXT_TO_HTML = $(ASCIIDOC_ALL) -b $(ASCIIDOC_HTML)
+TXT_TO_XML = $(ASCIIDOC_ALL) -b $(ASCIIDOC_DOCBOOK)
 MANPAGE_XSL = manpage-normal.xsl
 XMLTO = xmlto
 XMLTO_EXTRA =
@@ -333,6 +334,16 @@ mergetools-list.made: ../git-mergetool--lib.sh $(wildcard ../mergetools/*)
 		show_tool_names can_merge "* " || :' >mergetools-merge.txt && \
 	date >$@
 
+# We use $(ASCIIDOC_COMMON) here, and not $(ASCIIDOC_ALL). We don't
+# want to include $(ASCIIDOC_MANVERSION) and have the documentation
+# re-built every time HEAD changes.
+#
+# This is a trade-off requiring a "clean" build of the documentation
+# for release purposes, in the future we might include the version if
+# there's a cheaper way to re-insert the "Source" version during
+# re-builds. If we detect that that's the only thing we changed we
+# could insert it with a cheap search/replacement against the existing
+# files.
 TRACK_ASCIIDOCFLAGS = $(subst ','\'',$(ASCIIDOC_COMMON):$(ASCIIDOC_HTML):$(ASCIIDOC_DOCBOOK))
 
 GIT-ASCIIDOCFLAGS: FORCE
-- 
2.32.0.634.g284ac724283


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
@ 2021-07-02 15:53   ` Junio C Hamano
  2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
  2021-07-03  1:05   ` Felipe Contreras
  1 sibling, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-07-02 15:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17)
> we've been eagerly re-building the documentation whenever the output
> of "git version" (via the GIT-VERSION file) changed. This was never
> the intention, and was a regression on what we intended in
> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation,
> 2007-03-25).

I am not sure.  Even if there were no changes in say
'Documentation/git-cat-file.txt' and the sources it depends on
between 'master' and 'next', after doing this:

    $ git checkout next
    $ make prefix=$HOME/git-next/ install install-doc
    $ git checkout master
    $ make prefix=$HOME/git-master/ install install-doc
    $ $HOME/git-master/bin/git help cat-file | tail -n 1

I should see that the documentation should say it is from the
'master' branch in its footer, no?

In other words, I think 7b8a74f39cb's reasoning (not the
implementation), especially the last sentence of its log message, is
flawed, where it said:

    Documentation: Replace @@GIT_VERSION@@ in documentation
    
    Include GIT-VERSION-FILE and replace @@GIT_VERSION@@ in
    the HTML and XML asciidoc output. The documentation
    doesn't depend on GIT-VERSION-FILE so it will not be
    automatically rebuild if nothing else changed.

Thanks.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
  2021-07-02 15:53   ` Junio C Hamano
@ 2021-07-03  1:05   ` Felipe Contreras
  2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 87+ messages in thread
From: Felipe Contreras @ 2021-07-03  1:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Martin Ågren, Jeff King, Felipe Contreras,
	Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle,
	Ævar Arnfjörð Bjarmason

Ævar Arnfjörð Bjarmason wrote:
> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17)
> we've been eagerly re-building the documentation whenever the output
> of "git version" (via the GIT-VERSION file) changed. This was never
> the intention, and was a regression on what we intended in
> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation,
> 2007-03-25).
> 
> So let's add an ASCIIDOC_MANVERSION variable that we exclude from
> ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch
> cases where we e.g. switched between asciidoc and asciidoctor, not to
> undo the logic in 7b8a74f39cb and force a re-build every time our HEAD
> changed in the repository.

Once again, why do we care that the version is 2.32.0.98.gcfb60a24d6 and
not 2.32.0.97.g949e814b27?

Not just in the documentation, but everywhere.

Maybe we can add a GIT_RELEASE variable that unlike GIT_VERSION it
doesn't contain the precise commit. For example GIT_RELEASE = 2.33-dev.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-02 15:53   ` Junio C Hamano
@ 2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
  2021-07-05 19:48       ` Junio C Hamano
  0 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-03 11:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle


On Fri, Jul 02 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17)
>> we've been eagerly re-building the documentation whenever the output
>> of "git version" (via the GIT-VERSION file) changed. This was never
>> the intention, and was a regression on what we intended in
>> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation,
>> 2007-03-25).
>
> I am not sure.  Even if there were no changes in say
> 'Documentation/git-cat-file.txt' and the sources it depends on
> between 'master' and 'next', after doing this:
>
>     $ git checkout next
>     $ make prefix=$HOME/git-next/ install install-doc
>     $ git checkout master
>     $ make prefix=$HOME/git-master/ install install-doc
>     $ $HOME/git-master/bin/git help cat-file | tail -n 1
>
> I should see that the documentation should say it is from the
> 'master' branch in its footer, no?

Yes in theory, in practice it's very annoying to have the very slow
documentation build be re-built so aggressively. Since it wasn't a
practical issue anyone worried about before 2019 I think it's worth
reverting it.

> In other words, I think 7b8a74f39cb's reasoning (not the
> implementation), especially the last sentence of its log message, is
> flawed, where it said:
>
>     Documentation: Replace @@GIT_VERSION@@ in documentation
>     
>     Include GIT-VERSION-FILE and replace @@GIT_VERSION@@ in
>     the HTML and XML asciidoc output. The documentation
>     doesn't depend on GIT-VERSION-FILE so it will not be
>     automatically rebuild if nothing else changed.

Arguably it's a feature. The point of the version in the documentation
is to make it clear what version we're discussing. If I build something
on the master SHA-1 and advance to next, and none of the documentation
dependencies change, it's most useful to refer to the oldest last
version we can cover.

I think nobody's doing such a "chained" build when building the docs for
a "real" release, and having mixed versions might be confusing, but for
the "local build" case from a development checkout it's arguably more
useful.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-03  1:05   ` Felipe Contreras
@ 2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
  2021-07-03 18:56       ` Felipe Contreras
  2021-07-05 19:38       ` Junio C Hamano
  0 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-03 12:03 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: git, Junio C Hamano, Martin Ågren, Jeff King, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle


On Fri, Jul 02 2021, Felipe Contreras wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17)
>> we've been eagerly re-building the documentation whenever the output
>> of "git version" (via the GIT-VERSION file) changed. This was never
>> the intention, and was a regression on what we intended in
>> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation,
>> 2007-03-25).
>> 
>> So let's add an ASCIIDOC_MANVERSION variable that we exclude from
>> ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch
>> cases where we e.g. switched between asciidoc and asciidoctor, not to
>> undo the logic in 7b8a74f39cb and force a re-build every time our HEAD
>> changed in the repository.
>
> Once again, why do we care that the version is 2.32.0.98.gcfb60a24d6 and
> not 2.32.0.97.g949e814b27?
>
> Not just in the documentation, but everywhere.

It's useful e.g. on my Debian system to see that the "next" Debian
packaged is 2.31.0.291.g576ba9dcdaf in docs & "git version", arguably
less so for documentation.

But yeah, I think we should at least have some opt-out for sticking the
exact version everywhere, given the mostly unless re-building it does
during development.

> Maybe we can add a GIT_RELEASE variable that unlike GIT_VERSION it
> doesn't contain the precise commit. For example GIT_RELEASE = 2.33-dev.

I just added this to my pre-make script:

	grep -q ^/version .git/info/exclude || echo /version >>.git/info/exclude
	echo $(grep -o -P '(?<=^DEF_VER=v).*' GIT-VERSION-GEN)-dev >version

It makes use of GIT-VERSION-GEN picking up a tarball "version" file.

So far it seems like Junio isn't interested in the patch, and in any
case it doesn't fix the more general over-rebuilding due to version
changes noted upthread when it comes to *.o and libgit. Doing this fixes
both for me. Then when I build an installed version I don't use that
trick.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
@ 2021-07-03 18:56       ` Felipe Contreras
  2021-07-05 19:38       ` Junio C Hamano
  1 sibling, 0 replies; 87+ messages in thread
From: Felipe Contreras @ 2021-07-03 18:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Felipe Contreras
  Cc: git, Junio C Hamano, Martin Ågren, Jeff King, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle

Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jul 02 2021, Felipe Contreras wrote:
> 
> > Ævar Arnfjörð Bjarmason wrote:
> >> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17)
> >> we've been eagerly re-building the documentation whenever the output
> >> of "git version" (via the GIT-VERSION file) changed. This was never
> >> the intention, and was a regression on what we intended in
> >> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation,
> >> 2007-03-25).
> >> 
> >> So let's add an ASCIIDOC_MANVERSION variable that we exclude from
> >> ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch
> >> cases where we e.g. switched between asciidoc and asciidoctor, not to
> >> undo the logic in 7b8a74f39cb and force a re-build every time our HEAD
> >> changed in the repository.
> >
> > Once again, why do we care that the version is 2.32.0.98.gcfb60a24d6 and
> > not 2.32.0.97.g949e814b27?
> >
> > Not just in the documentation, but everywhere.
> 
> It's useful e.g. on my Debian system to see that the "next" Debian
> packaged is 2.31.0.291.g576ba9dcdaf in docs & "git version", arguably
> less so for documentation.

Obviously packagers would use real versions I meant this only for
developers (e.g. DEVELOPER=1).

And BTW, where is that Debian package coming from? I thought
distributions didn't package git pre-releases.

> > Maybe we can add a GIT_RELEASE variable that unlike GIT_VERSION it
> > doesn't contain the precise commit. For example GIT_RELEASE = 2.33-dev.
> 
> I just added this to my pre-make script:
> 
> 	grep -q ^/version .git/info/exclude || echo /version >>.git/info/exclude
> 	echo $(grep -o -P '(?<=^DEF_VER=v).*' GIT-VERSION-GEN)-dev >version
> 
> It makes use of GIT-VERSION-GEN picking up a tarball "version" file.

This would also do the trick (you need DEVELOPER on the environment
though).

--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -6,9 +6,10 @@ DEF_VER=v2.32.0
 LF='
 '
 
-# First see if there is a version file (included in release tarballs),
-# then try git-describe, then default.
-if test -f version
+if test -n "$DEVELOPER"
+then
+       VN="$DEF_VER"-dev
+elif test -f version
 then
        VN=$(cat version) || VN="$DEF_VER"
 elif test -d ${GIT_DIR:-.git} -o -f .git &&


-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
  2021-07-03 18:56       ` Felipe Contreras
@ 2021-07-05 19:38       ` Junio C Hamano
  2021-07-06 22:25         ` Felipe Contreras
  1 sibling, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-07-05 19:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Felipe Contreras, git, Martin Ågren, Jeff King, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It's useful e.g. on my Debian system to see that the "next" Debian
> packaged is 2.31.0.291.g576ba9dcdaf in docs & "git version", arguably
> less so for documentation.

If it is arguable, perhaps make an argument that is more convincing?

What I dislike the most is that in the sample scenario where master
and next has the same documentation material to build "git-cat-file.1",
the installed result would be different depending on the order of
building the documentation, with the change being discussed, i.e.

    $ git checkout master && make prefix=$HOME/git-master install-doc
    $ git checkout next && make prefix=$HOME/git-next install-doc

would make "~/git-next/bin/git help cat-file" to claim the
documentation is from the "master" version.  Which is not all that
bad, given that there wasn't anything that changed the documentation
between 'master' and 'next'.  But if you swap the installation
order, "~/git-master/bin/git help cat-file" would say that the
documentation is from a version much newer than 'master', which is
not quite acceptable.

I am OK with the approach you hinted to have an option to _hide_ the
version string in the generated documentation (hence they lose their
dependency on GIT-VERSION-FILE), while keeping the dependency of
version.o on GIT-VERSION-FILE, so that something goes wrong in a
built binary, the developer can still ask "git version" to identify
where the binary came from.

Thanks.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
@ 2021-07-05 19:48       ` Junio C Hamano
  0 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-07-05 19:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think nobody's doing such a "chained" build when building the docs for
> a "real" release, and having mixed versions might be confusing, but for
> the "local build" case from a development checkout it's arguably more
> useful.

Well, when I prepare a cascade of maintenance releases for CVE, what
do you think is done?  Yes, I do have "make distclean" in the loop
that iterates over the maint-$version branches, just to be extra
careful, but I shouldn't have to.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes
  2021-07-05 19:38       ` Junio C Hamano
@ 2021-07-06 22:25         ` Felipe Contreras
  0 siblings, 0 replies; 87+ messages in thread
From: Felipe Contreras @ 2021-07-06 22:25 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Felipe Contreras, git, Martin Ågren, Jeff King, Taylor Blau,
	Emily Shaffer, Eric Sunshine, Øystein Walle

Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I am OK with the approach you hinted to have an option to _hide_ the
> version string in the generated documentation (hence they lose their
> dependency on GIT-VERSION-FILE), while keeping the dependency of
> version.o on GIT-VERSION-FILE, so that something goes wrong in a
> built binary, the developer can still ask "git version" to identify
> where the binary came from.

But it would be even better if the build system knew what was the intent
of the build.

There's a difference when building a revision just to run `make test` on
it, and building it to do `make install`. In the former there's no need
for a real version in version.o, only in the latter.

Maybe RELASE=1. Not a release from the point of view of a maintainer,
but a release in the sense that the binaries will escape our in-tree
build.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-06-29  2:13       ` Jeff King
@ 2021-10-20 18:39         ` Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
                             ` (9 more replies)
  0 siblings, 10 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

This series is based off an off-hand comment I made about making the
cmdlist.sh faster, in the meantime much of the same methods are
already cooking in "next" for the "lint-docs" target.

See 7/8 for the main performance numbers, along the way I stole some
patches from Johannes Sixt who'd worked on optimizing the script
before, which compliment this new method of generating this file by
piggy-backing more on GNU make for managing a dependency graph for us.

1. https://lore.kernel.org/git/87r1gqxqxn.fsf@evledraar.gmail.com/

Johannes Sixt (2):
  generate-cmdlist.sh: spawn fewer processes
  generate-cmdlist.sh: replace for loop by printf's auto-repeat feature

Ævar Arnfjörð Bjarmason (6):
  command-list.txt: sort with "LC_ALL=C sort"
  generate-cmdlist.sh: trivial whitespace change
  generate-cmdlist.sh: don't call get_categories() from category_list()
  generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  Makefile: stop having command-list.h depend on a wildcard
  Makefile: assert correct generate-cmdlist.sh output

 .gitignore          |  1 +
 Makefile            | 57 ++++++++++++++++++++++++++++++++++++++++-----
 command-list.txt    | 20 ++++++++--------
 generate-cmdlist.sh | 53 ++++++++++++++++++++++++++++-------------
 help.c              |  3 ---
 5 files changed, 99 insertions(+), 35 deletions(-)

-- 
2.33.1.1338.g20da966911a


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort"
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
                             ` (8 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

We should keep these files sorted in the C locale, e.g. in the C
locale the order is:

    git-check-mailmap
    git-check-ref-format
    git-checkout

But under en_US.UTF-8 it's:

    git-check-mailmap
    git-checkout
    git-check-ref-format

In a subsequent commit I'll change generate-cmdlist.sh to use C sort
order, and without this change we'd be led to believe that that change
caused a meaningful change in the output, so let's do this as a
separate step, right now the generate-cmdlist.sh script just uses the
order found in this file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 command-list.txt | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/command-list.txt b/command-list.txt
index a289f09ed6f..02fc7ddde68 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -60,9 +60,9 @@ git-cat-file                            plumbinginterrogators
 git-check-attr                          purehelpers
 git-check-ignore                        purehelpers
 git-check-mailmap                       purehelpers
+git-check-ref-format                    purehelpers
 git-checkout                            mainporcelain
 git-checkout-index                      plumbingmanipulators
-git-check-ref-format                    purehelpers
 git-cherry                              plumbinginterrogators          complete
 git-cherry-pick                         mainporcelain
 git-citool                              mainporcelain
@@ -111,7 +111,6 @@ git-index-pack                          plumbingmanipulators
 git-init                                mainporcelain           init
 git-instaweb                            ancillaryinterrogators          complete
 git-interpret-trailers                  purehelpers
-gitk                                    mainporcelain
 git-log                                 mainporcelain           info
 git-ls-files                            plumbinginterrogators
 git-ls-remote                           plumbinginterrogators
@@ -124,11 +123,11 @@ git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
 git-merge-index                         plumbingmanipulators
 git-merge-one-file                      purehelpers
-git-mergetool                           ancillarymanipulators           complete
 git-merge-tree                          ancillaryinterrogators
-git-multi-pack-index                    plumbingmanipulators
+git-mergetool                           ancillarymanipulators           complete
 git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
+git-multi-pack-index                    plumbingmanipulators
 git-mv                                  mainporcelain           worktree
 git-name-rev                            plumbinginterrogators
 git-notes                               mainporcelain
@@ -154,23 +153,23 @@ git-request-pull                        foreignscminterface             complete
 git-rerere                              ancillaryinterrogators
 git-reset                               mainporcelain           history
 git-restore                             mainporcelain           worktree
-git-revert                              mainporcelain
 git-rev-list                            plumbinginterrogators
 git-rev-parse                           plumbinginterrogators
+git-revert                              mainporcelain
 git-rm                                  mainporcelain           worktree
 git-send-email                          foreignscminterface             complete
 git-send-pack                           synchingrepositories
+git-sh-i18n                             purehelpers
+git-sh-setup                            purehelpers
 git-shell                               synchelpers
 git-shortlog                            mainporcelain
 git-show                                mainporcelain           info
 git-show-branch                         ancillaryinterrogators          complete
 git-show-index                          plumbinginterrogators
 git-show-ref                            plumbinginterrogators
-git-sh-i18n                             purehelpers
-git-sh-setup                            purehelpers
 git-sparse-checkout                     mainporcelain           worktree
-git-stash                               mainporcelain
 git-stage                                                               complete
+git-stash                               mainporcelain
 git-status                              mainporcelain           info
 git-stripspace                          purehelpers
 git-submodule                           mainporcelain
@@ -189,10 +188,11 @@ git-var                                 plumbinginterrogators
 git-verify-commit                       ancillaryinterrogators
 git-verify-pack                         plumbinginterrogators
 git-verify-tag                          ancillaryinterrogators
-gitweb                                  ancillaryinterrogators
 git-whatchanged                         ancillaryinterrogators          complete
 git-worktree                            mainporcelain
 git-write-tree                          plumbingmanipulators
+gitk                                    mainporcelain
+gitweb                                  ancillaryinterrogators
 gitattributes                           guide
 gitcli                                  guide
 gitcore-tutorial                        guide
@@ -211,6 +211,6 @@ gitremote-helpers                       guide
 gitrepository-layout                    guide
 gitrevisions                            guide
 gitsubmodules                           guide
-gittutorial-2                           guide
 gittutorial                             guide
+gittutorial-2                           guide
 gitworkflows                            guide
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
                             ` (7 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

This makes a subsequent diff smaller, and won't leave us with this
syntax nit at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..5114f46680a 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -10,7 +10,7 @@ command_list () {
 }
 
 get_categories () {
-	tr ' ' '\012'|
+	tr ' ' '\012' |
 	grep -v '^$' |
 	sort |
 	uniq
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

From: Johannes Sixt <j6t@kdbg.org>

The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.

Furthermore, use sort -u instead of sort | uniq to remove yet another
process.

[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:

$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):     371.3 ms ±  64.2 ms    [User: 430.4 ms, System: 72.5 ms]
  Range (min … max):   320.5 ms … 517.7 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):     489.9 ms ± 185.4 ms    [User: 724.7 ms, System: 141.3 ms]
  Range (min … max):   346.0 ms … 885.3 ms    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 5114f46680a..27367915611 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -11,15 +11,14 @@ command_list () {
 
 get_categories () {
 	tr ' ' '\012' |
-	grep -v '^$' |
-	sort |
-	uniq
+	LC_ALL=C sort -u
 }
 
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories
+	get_categories |
+	grep -v '^$'
 }
 
 get_synopsis () {
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list()
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

This isn't for optimization as the get_categories() is a purely shell
function, but rather for ease of readability, let's just inline these
two lines. We'll be changing this code some more in subsequent commits
to make this worth it.

Rename the get_categories() function to get_category_line(), since
that's what it's doing now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 27367915611..16043e38476 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -9,7 +9,7 @@ command_list () {
 	eval "grep -ve '^#' $exclude_programs" <"$1"
 }
 
-get_categories () {
+get_category_line () {
 	tr ' ' '\012' |
 	LC_ALL=C sort -u
 }
@@ -17,7 +17,8 @@ get_categories () {
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories |
+	tr ' ' '\012' |
+	LC_ALL=C sort -u |
 	grep -v '^$'
 }
 
@@ -66,7 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_categories)
+		for cat in $(echo "$rest" | get_category_line)
 		do
 			printf " | CAT_$cat"
 		done
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

This doesn't matter for performance, but let's not include the empty
lines in our sorting. This makes the intent of the code clearer.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 16043e38476..e517c33710a 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -18,8 +18,8 @@ category_list () {
 	command_list "$1" |
 	cut -c 40- |
 	tr ' ' '\012' |
-	LC_ALL=C sort -u |
-	grep -v '^$'
+	grep -v '^$' |
+	LC_ALL=C sort -u
 }
 
 get_synopsis () {
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-21 14:42             ` Jeff King
  2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

From: Johannes Sixt <j6t@kdbg.org>

This is just a small code reduction. There is a small probability that
the new code breaks when the category list is empty. But that would be
noticed during the compile step.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index e517c33710a..a1ab2b1f077 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -67,10 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_category_line)
-		do
-			printf " | CAT_$cat"
-		done
+		printf " | CAT_%s" $(echo "$rest" | get_category_line)
 		echo " },"
 	done
 	echo "};"
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-21 14:45             ` Jeff King
  2021-10-21 22:46             ` Øystein Walle
  2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  9 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

Change the dependency graph for "command-list.h" so that it's not
re-built in its entirety every time one file in Documentation/git*.txt
changes. This makes the generation of the command-list.h use a pattern
similar to that established in 8650c6298c1 (doc lint: make "lint-docs"
non-.PHONY, 2021-10-15) for the "lint-docs" target.

This change replaces the monolithic "generate-cmdlist.sh" invocation
with a dependency chain like:

    Documentation/git-add.txt ->
    .build/command-list.h.d/git-add.gen (and "git-add.txt" intermediary) ->
    command-list.h

I.e. we'll now generate intermediary files when extracting the "NAME"
section the documentation. Here the 6th line of
"Documentation/git.txt" will be extracted to
".build/command-list.h.d/git-add.gen.txt":

    git-add - Add file contents to the index

We'll then create an intermediate
".build/command-list.h.d/git-add.gen" (leading "\t" stripped):

    ./generate-cmdlist.sh --entry-only .build/command-list.h.d/git-add.gen.txt
    { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree },

The "command-list.h" itself can the be made by having
"generate-cmdlist.sh" emit only the header section, followed by:

    LC_ALL=sort .build/command-list.h.d/*.gen

Unfortunately we can't drop the old code completely due to the CMake
integration, see 061c2240b1b (Introduce CMake support for configuring
Git, 2020-06-12). It will keep using the older and slower script.

With this the initial creation of the command-list.h is a bit slower
with -j1, but around 2x as fast -j8[1]. The real benefit comes from
the more common case of an incremental build, say when only
"Documentation/git-add.txt" was updated after a "pull". There we're
4-5x as fast with this new method[2].

The benefit of optimizing this is because this file is very frequently
re-generated, e.g. for "git rebase -i --exec 'make git'" with
"git-add.txt" modified we're around 1.7 times as fast[3]. That target
will need to re-make "git" (via "help.o") due to help.c's use of
command-list.h.

In terms of implementation: I'm adding $(QUIET) while I'm at it, which
is here so we don't quiet the equivalent of trace output under V=1,
this could be used in other places that use "@cmd" to quiet "cmd"
output.

Some of the dependencies between "command-list.h" and
"$(COMMAND_LIST_GEN)" are redundant, considering that the former
depends on the latter. I'm sticking to listing dependencies mentioned
in the rule itself, e.g. "command-list.h" itself calls
"generate-cmdlist.sh", so I list the dependency even though it would
get it via a recursive dependency.

These rules can be lazy about leaving behind files on error thanks to
the .DELETE_ON_ERROR flag, see 7b76d6bf221 (Makefile: add and use the
".DELETE_ON_ERROR" flag, 2021-06-29), or in the case of the
"*.txt.gen" files because we'll unconditionally clobber them anyway if
the relevant source file is touched, so we can leave them for "make
clean".

1. Piped through: <cmd> 2>/dev/null | grep -P -v "^\s*$"; ditto below:
   hyperfine -s basic -L j ", -j8" -L s ,.old -p 'rm -rf .build command-list.h' 'make{j} -f Makefile{s} command-list.h'
   Benchmark #1: make -f Makefile command-list.h
     Time (mean ± σ):     769.3 ms ±  90.4 ms    [User: 892.4 ms, System: 98.8 ms]
     Range (min … max):   665.6 ms … 941.9 ms    10 runs
   Benchmark #2: make -j8 -f Makefile command-list.h
     Time (mean ± σ):     212.5 ms ±  45.0 ms    [User: 954.4 ms, System: 136.9 ms]
     Range (min … max):   187.6 ms … 326.2 ms    11 runs
   Benchmark #3: make -f Makefile.old command-list.h
     Time (mean ± σ):     515.0 ms ±  70.5 ms    [User: 613.8 ms, System: 110.9 ms]
     Range (min … max):   407.6 ms … 603.9 ms    10 runs
   Benchmark #4: make -j8 -f Makefile.old command-list.h
     Time (mean ± σ):     474.9 ms ±  62.7 ms    [User: 569.7 ms, System: 106.0 ms]
     Range (min … max):   407.0 ms … 556.8 ms    10 runs
   Summary
     'make -j8 -f Makefile command-list.h' ran
       2.24 ± 0.56 times faster than 'make -j8 -f Makefile.old command-list.h'
       2.42 ± 0.61 times faster than 'make -f Makefile.old command-list.h'
       3.62 ± 0.88 times faster than 'make -f Makefile command-list.h'
2. hyperfine -s basic -L j ", -j8" -L s ,.old -p 'touch Documentation/git-add.txt' 'make{j} -f Makefile{s} command-list.h'
   Benchmark #1: make -f Makefile command-list.h
     Time (mean ± σ):      94.6 ms ±  26.7 ms    [User: 83.0 ms, System: 14.1 ms]
     Range (min … max):    79.4 ms … 186.0 ms    15 runs
   Benchmark #2: make -j8 -f Makefile command-list.h
     Time (mean ± σ):      80.3 ms ±  10.5 ms    [User: 81.3 ms, System: 12.9 ms]
     Range (min … max):    76.9 ms … 127.6 ms    37 runs
   Benchmark #3: make -f Makefile.old command-list.h
     Time (mean ± σ):     348.9 ms ± 174.5 ms    [User: 415.3 ms, System: 70.6 ms]
     Range (min … max):    66.0 ms … 550.3 ms    43 runs
   Benchmark #4: make -j8 -f Makefile.old command-list.h
     Time (mean ± σ):     406.7 ms ± 157.1 ms    [User: 473.4 ms, System: 78.3 ms]
     Range (min … max):    67.4 ms … 560.4 ms    34 runs
   Summary
     'make -j8 -f Makefile command-list.h' ran
       1.18 ± 0.37 times faster than 'make -f Makefile command-list.h'
       4.34 ± 2.25 times faster than 'make -f Makefile.old command-list.h'
       5.07 ± 2.07 times faster than 'make -j8 -f Makefile.old command-list.h'
3. hyperfine -s basic -L j " -j8" -L s ,.old -p 'touch Documentation/git-add.txt' 'make{j} -f Makefile{s} git'
   Benchmark #1: make -j8 -f Makefile git
     Time (mean ± σ):     484.2 ms ±  53.5 ms    [User: 253.2 ms, System: 83.5 ms]
     Range (min … max):   400.7 ms … 580.5 ms    10 runs
   Benchmark #2: make -j8 -f Makefile.old git
     Time (mean ± σ):     836.7 ms ±  37.3 ms    [User: 729.2 ms, System: 157.1 ms]
     Range (min … max):   774.2 ms … 885.6 ms    10 runs
   Summary
     'make -j8 -f Makefile git' ran
       1.73 ± 0.21 times faster than 'make -j8 -f Makefile.old git'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 .gitignore          |  1 +
 Makefile            | 52 +++++++++++++++++++++++++++++++++++++++------
 generate-cmdlist.sh | 36 +++++++++++++++++++++++++------
 3 files changed, 77 insertions(+), 12 deletions(-)

diff --git a/.gitignore b/.gitignore
index 054249b20a8..dc03a988b16 100644
--- a/.gitignore
+++ b/.gitignore
@@ -198,6 +198,7 @@
 *.[aos]
 *.o.json
 *.py[co]
+.build/
 .depend/
 *.gcda
 *.gcno
diff --git a/Makefile b/Makefile
index 381bed2c1d2..ce4cce57eb8 100644
--- a/Makefile
+++ b/Makefile
@@ -1954,6 +1954,7 @@ endif
 
 ifneq ($(findstring s,$(MAKEFLAGS)),s)
 ifndef V
+	QUIET          = @
 	QUIET_CC       = @echo '   ' CC $@;
 	QUIET_AR       = @echo '   ' AR $@;
 	QUIET_LINK     = @echo '   ' LINK $@;
@@ -2242,12 +2243,50 @@ config-list.h: generate-configlist.sh
 config-list.h: Documentation/*config.txt Documentation/config/*.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh >$@
 
-command-list.h: generate-cmdlist.sh command-list.txt
-
-command-list.h: $(wildcard Documentation/git*.txt)
-	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
-		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
-		command-list.txt >$@
+EXCLUDED_COMMAND_LIST =
+EXCLUDED_COMMAND_LIST += git
+EXCLUDED_COMMAND_LIST += git-bisect-lk2009
+EXCLUDED_COMMAND_LIST += git-credential-cache--daemon
+EXCLUDED_COMMAND_LIST += git-fsck-objects
+EXCLUDED_COMMAND_LIST += git-init-db
+EXCLUDED_COMMAND_LIST += git-mergetool--lib
+EXCLUDED_COMMAND_LIST += git-remote-ext
+EXCLUDED_COMMAND_LIST += git-remote-fd
+EXCLUDED_COMMAND_LIST += git-sh-i18n--envsubst
+EXCLUDED_COMMAND_LIST += git-submodules
+EXCLUDED_COMMAND_LIST += git-tools
+EXCLUDED_COMMAND_LIST += git-version
+EXCLUDED_COMMAND_LIST += git-web--browse
+EXCLUDED_COMMAND_LIST += gitweb.conf
+
+EXCLUDED_TXT += $(patsubst %,Documentation/%.txt,$(EXCLUDED_PROGRAMS) $(EXCLUDED_COMMAND_LIST))
+COMMAND_LIST_TXT_DEP = $(filter-out $(EXCLUDED_TXT), $(wildcard Documentation/git*.txt))
+
+COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,.build/command-list.h.d/%.gen,$(COMMAND_LIST_TXT_DEP))
+
+.build:
+	$(QUIET)mkdir .build
+.build/command-list.h.d: | .build
+	$(QUIET)mkdir -p .build/command-list.h.d
+
+# We must depend on .build/command-list.h as an "order-only"
+# prerequisite, its mtime will change when these targets run.
+$(COMMAND_LIST_GEN): | .build/command-list.h.d
+$(COMMAND_LIST_GEN): command-list.txt
+$(COMMAND_LIST_GEN): generate-cmdlist.sh
+$(COMMAND_LIST_GEN): .build/command-list.h.d/%.gen: Documentation/%.txt
+	$(QUIET)grep "^$(patsubst .build/command-list.h.d/%.gen,%,$@) " command-list.txt >$@.txt && \
+	./generate-cmdlist.sh --entry-only $@.txt >$@
+
+command-list.h: $(COMMAND_LIST_GEN)
+command-list.h: generate-cmdlist.sh
+command-list.h: command-list.txt
+	$(QUIET_GEN){ \
+		$(SHELL_PATH) ./generate-cmdlist.sh --header-only command-list.txt && \
+		echo "static struct cmdname_help command_list[] = {" && \
+		LC_ALL=C sort $(COMMAND_LIST_GEN) && \
+		echo "};"; \
+	} >$@
 
 hook-list.h: generate-hooklist.sh Documentation/githooks.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-hooklist.sh >$@
@@ -3238,6 +3277,7 @@ clean: profile-clean coverage-clean cocciclean
 	$(RM) $(SP_OBJ)
 	$(RM) $(HCC)
 	$(RM) -r bin-wrappers $(dep_dirs) $(compdb_dir) compile_commands.json
+	$(RM) -r .build/
 	$(RM) -r po/build/
 	$(RM) *.pyc *.pyo */*.pyc */*.pyo $(GENERATED_H) $(ETAGS_TARGET) tags cscope*
 	$(RM) -r .dist-tmp-dir .doc-tmp-dir
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f077..2bc528e8cae 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -15,7 +15,7 @@ get_category_line () {
 }
 
 category_list () {
-	command_list "$1" |
+	grep -v '^#' "$1" |
 	cut -c 40- |
 	tr ' ' '\012' |
 	grep -v '^$' |
@@ -23,13 +23,14 @@ category_list () {
 }
 
 get_synopsis () {
+	head -n 10 "Documentation/$1.txt" |
 	sed -n '
 		/^NAME/,/'"$1"'/H
 		${
 			x
 			s/.*'"$1"' - \(.*\)/N_("\1")/
 			p
-		}' "Documentation/$1.txt"
+		}'
 }
 
 define_categories () {
@@ -61,16 +62,12 @@ define_category_names () {
 }
 
 print_command_list () {
-	echo "static struct cmdname_help command_list[] = {"
-
-	command_list "$1" |
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
 		printf " | CAT_%s" $(echo "$rest" | get_category_line)
 		echo " },"
 	done
-	echo "};"
 }
 
 exclude_programs=
@@ -81,6 +78,19 @@ do
 	shift
 done
 
+header_only=
+case $1 in
+--entry-only)
+	shift
+	print_command_list $1 <"$1"
+	exit 0
+	;;
+--header-only)
+	shift
+	header_only=t
+	;;
+esac
+
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
 	const char *name;
@@ -92,4 +102,18 @@ define_categories "$1"
 echo
 define_category_names "$1"
 echo
+
+if test -n "$header_only"
+then
+	exit 0
+fi
+
+# The old compatibility mode for CMmake. See 061c2240b1b (Introduce
+# CMake support for configuring Git, 2020-06-12)
+echo "static struct cmdname_help command_list[] = {"
+grep -v \
+	-e '^#' \
+	-e '^git-fsck-objects ' \
+	"$1" |
 print_command_list "$1"
+echo "};"
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
@ 2021-10-20 18:39           ` Ævar Arnfjörð Bjarmason
  2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Ævar Arnfjörð Bjarmason

Because generate-cmdlist.sh invokes "sed" to extract the "NAME" blurb
from the Documentation/git-*.txt files, we can end up with bad content
if those files aren't what we expected. E.g. we'll emit multiple lines
from that "sed" one-liner if the "NAME" section were to move further
down than what our "head -n 10" covers.

Let's assert that this can't happen by checking that the number of
lines we make is what we'd expect to get. This means we can remove an
assertion added in cfb22a02ab5 (help: use command-list.h for common
command list, 2018-05-10). We'll catch this during compilation
instead.

There are still other cases where it's possible to generate a bad
command-list.h, but the rest should be caught by a C compilation
error.

It would be possible to change our generated "command-list.h" to
simply include this new ".build/command-list.h.gen" instead of cat-ing
it in the Makefile, but let's leave the generated file as it is.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile            | 7 ++++++-
 generate-cmdlist.sh | 2 +-
 help.c              | 3 ---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index ce4cce57eb8..711f0c064e6 100644
--- a/Makefile
+++ b/Makefile
@@ -2278,13 +2278,18 @@ $(COMMAND_LIST_GEN): .build/command-list.h.d/%.gen: Documentation/%.txt
 	$(QUIET)grep "^$(patsubst .build/command-list.h.d/%.gen,%,$@) " command-list.txt >$@.txt && \
 	./generate-cmdlist.sh --entry-only $@.txt >$@
 
+.build/command-list.h.gen: $(COMMAND_LIST_GEN)
+	$(QUIET)LC_ALL=C sort $(COMMAND_LIST_GEN) >$@ && \
+	test $$(wc -l <$@) -eq $(words $(COMMAND_LIST_GEN))
+
 command-list.h: $(COMMAND_LIST_GEN)
 command-list.h: generate-cmdlist.sh
 command-list.h: command-list.txt
+command-list.h: .build/command-list.h.gen
 	$(QUIET_GEN){ \
 		$(SHELL_PATH) ./generate-cmdlist.sh --header-only command-list.txt && \
 		echo "static struct cmdname_help command_list[] = {" && \
-		LC_ALL=C sort $(COMMAND_LIST_GEN) && \
+		cat $< && \
 		echo "};"; \
 	} >$@
 
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 2bc528e8cae..bdefa151ae1 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -23,7 +23,7 @@ category_list () {
 }
 
 get_synopsis () {
-	head -n 10 "Documentation/$1.txt" |
+	head -n 6 "Documentation/$1.txt" |
 	sed -n '
 		/^NAME/,/'"$1"'/H
 		${
diff --git a/help.c b/help.c
index 973e47cdc30..cd6dd4c2440 100644
--- a/help.c
+++ b/help.c
@@ -57,9 +57,6 @@ static void extract_cmds(struct cmdname_help **p_cmds, uint32_t mask)
 	int i, nr = 0;
 	struct cmdname_help *cmds;
 
-	if (ARRAY_SIZE(command_list) == 0)
-		BUG("empty command_list[] is a sign of broken generate-cmdlist.sh");
-
 	ALLOC_ARRAY(cmds, ARRAY_SIZE(command_list) + 1);
 
 	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
-- 
2.33.1.1338.g20da966911a


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (7 preceding siblings ...)
  2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
@ 2021-10-20 20:35           ` Jeff King
  2021-10-20 21:31             ` Taylor Blau
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
  9 siblings, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-20 20:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This series is based off an off-hand comment I made about making the
> cmdlist.sh faster, in the meantime much of the same methods are
> already cooking in "next" for the "lint-docs" target.
> 
> See 7/8 for the main performance numbers, along the way I stole some
> patches from Johannes Sixt who'd worked on optimizing the script
> before, which compliment this new method of generating this file by
> piggy-backing more on GNU make for managing a dependency graph for us.

I still think this is a much more complicated and error-prone approach
than just making the script faster. I know we can't rely on perl, but
could we use it optimistically?

The proof-of-concept below on top of your patch 6 does two things:

  - observes that there is no need for get_category_line in the loop; it
    is just sorting and de-duping the bitfields, but since we just OR
    them together, neither of those things matters

  - uses perl to open each individual doc file to get the synopsis. It
    _feels_ like this should be something that sed or awk could do, but
    it is beyond me. However, speculatively trying perl is an easy win,
    and we can fall back to the shell loop.

Here are my timings:

Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):      40.4 ms ±  18.1 ms    [User: 44.9 ms, System: 7.1 ms]
  Range (min … max):    20.3 ms …  65.5 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):      1.414 s ±  0.038 s    [User: 1.641 s, System: 0.863 s]
  Range (min … max):    1.344 s …  1.451 s    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
   34.96 ± 15.66 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

I hate having fallbacks, because the seldom-used version may bitrot. I'm
tempted to just write that loop in C, but there's a circular dependency
with using any of libgit.a (even though it's really only the git
porcelain that cares about command-list.h, it goes into help.o which
goes into libgit.a. We could break that dependency if we wanted,
though). If we can do it in awk, that may be worthwhile.

But either way, I think this is superior to trying to parallelize the
Makefile:

  - it actually uses less CPU, rather than just trying to improve
    wall-clock time by using more cores

  - there's little chance of having some subtle dependency problem

Parallelizing makes a lot of sense to me when the operation is truly
expensive. But in this case it's literally just opening a file, and the
only reason it's slow is because we spawn a ton of processes to do it.

---
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..f922eebe23 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -63,11 +63,23 @@ define_category_names () {
 print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"
 
+	# try perl first, as we can do it all in one process
+	command_list "$1" |
+	perl -ne '
+		my ($cmd, @rest) = split;
+		open(my $fh, "<", "Documentation/$cmd.txt");
+		while (<$fh>) {
+			next unless /^$cmd - (.*)/;
+			print "{ \"$cmd\", N_(\"$1\"), 0";
+			print " | CAT_$_" for (@rest);
+			print " },\n";
+		}
+	' ||
 	command_list "$1" |
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
@ 2021-10-20 21:31             ` Taylor Blau
  2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Taylor Blau @ 2021-10-20 21:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Johannes Sixt, Øystein Walle

On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> > This series is based off an off-hand comment I made about making the
> > cmdlist.sh faster, in the meantime much of the same methods are
> > already cooking in "next" for the "lint-docs" target.
> >
> > See 7/8 for the main performance numbers, along the way I stole some
> > patches from Johannes Sixt who'd worked on optimizing the script
> > before, which compliment this new method of generating this file by
> > piggy-backing more on GNU make for managing a dependency graph for us.
>
> I still think this is a much more complicated and error-prone approach
> than just making the script faster. I know we can't rely on perl, but
> could we use it optimistically?

I'll take credit for this terrible idea of using Perl when available.

But I don't think we even need to, since we could just rely on Awk. That
has all the benefits you described while still avoiding the circular
dependency on libgit.a. But the killer feature is that we don't have to
rely on two implementations, the lesser-used of which is likely to
bitrot over time.

The resulting awk is a little ugly, because of the nested-ness. I'm also
no awk-spert, but I think that something like the below gets the job
done.

It also has the benefit of being slightly faster than the equivalent
Perl implementation, for whatever those extra ~9 ms are worth ;).

Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
  Range (min … max):    15.5 ms …  31.7 ms    95 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
  Range (min … max):    22.4 ms …  54.8 ms    64 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

---

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..39338ef1cc 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -64,12 +64,19 @@ print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"

 	command_list "$1" |
-	while read cmd rest
-	do
-		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
-		echo " },"
-	done
+	awk '{
+		f="Documentation/" $1 ".txt"
+		while((getline line<f) > 0) {
+			if (match(line, "^" $1 " - ")) {
+				syn=substr(line, RLENGTH+1)
+				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
+				for (i=2; i<=NF; i++) {
+					printf " | CAT_%s", $i
+				}
+				print " },"
+			}
+		}
+	}'
 	echo "};"
 }

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-20 21:31             ` Taylor Blau
@ 2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
  2021-10-20 23:46                 ` Jeff King
  2021-10-21  5:39                 ` Eric Sunshine
  0 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-20 23:14 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Jeff King, git, Junio C Hamano, Johannes Sixt, Øystein Walle


On Wed, Oct 20 2021, Taylor Blau wrote:

> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based off an off-hand comment I made about making the
>> > cmdlist.sh faster, in the meantime much of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers, along the way I stole some
>> > patches from Johannes Sixt who'd worked on optimizing the script
>> > before, which compliment this new method of generating this file by
>> > piggy-backing more on GNU make for managing a dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?

Jeff: Just in terms of error prone both of these implementations will
accept bad input that's being caught in 8/8 of this series.

We accept a lot of bad input now, ending up with some combinations of
bad output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.

If you mean the general concept of making a "foo.gen" from a "foo.txt"
as an intermediate with make as a way to get to "many-foo.h" I don't
really see how it's error prone conceptually. You get error checking
each step of the way, and it encourages logic that's simpler each step
of the way.

> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness. I'm also
> no awk-spert, but I think that something like the below gets the job
> done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
>   Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
>   Range (min … max):    15.5 ms …  31.7 ms    95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
>   Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
>   Range (min … max):    22.4 ms …  54.8 ms    64 runs
>
> Summary
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>
>  	command_list "$1" |
> -	while read cmd rest
> -	do
> -		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> -		echo " },"
> -	done
> +	awk '{
> +		f="Documentation/" $1 ".txt"
> +		while((getline line<f) > 0) {
> +			if (match(line, "^" $1 " - ")) {
> +				syn=substr(line, RLENGTH+1)
> +				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> +				for (i=2; i<=NF; i++) {
> +					printf " | CAT_%s", $i
> +				}
> +				print " },"
> +			}
> +		}
> +	}'
>  	echo "};"
>  }

Per Eric's Sunshine's upthread comments an awk and Perl implementation
were both considered before[1].

I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than a partial build.

I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
with/without this awk version the difference in runtime is within the
error bars. I.e. making the loop faster isn't necessary. It's better to
get to a point where make can save you from doing all/most of the work
by checking modification times, rather than making an O(n) loop faster.

The only reason there's even a loop there is because it's used by the
cmake logic in contrib/* (how we've ended up with a hard dependency in
contrib is another matter...).

I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out has
just the synopsis line that can be used to replace what's now in
Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
"man git".

There's subtle dependency issues there as well, and just having a
one-off solution for the the command-list.h doesn't get us closer to
addressing that sibling implementation.

In terms of future Makefile work I was hoping to get this in, untangle
some of the complexity between the inter-dependency of Makefile &
Documentation/Makefile (eventually just merging the two, and leaving a
stub in Documentation/Makefile). I've also got a working implementation
for getting rid of all of the "FORCE" dependencies (except the version
one).

1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
@ 2021-10-20 23:46                 ` Jeff King
  2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
  2021-10-21  5:39                 ` Eric Sunshine
  1 sibling, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-20 23:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Thu, Oct 21, 2021 at 01:14:37AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Jeff: Just in terms of error prone both of these implementations will
> accept bad input that's being caught in 8/8 of this series.
> 
> We accept a lot of bad input now, ending up with some combinations of
> bad output or compile errors if you screw with the input *.txt files. I
> think I've addressed all of those in this series.

I don't mind more error-checking, though TBH I don't find a huge value
in it. But what I did mean was:

> If you mean the general concept of making a "foo.gen" from a "foo.txt"
> as an intermediate with make as a way to get to "many-foo.h" I don't
> really see how it's error prone conceptually. You get error checking
> each step of the way, and it encourages logic that's simpler each step
> of the way.

Yes. It just seems like the Makefile gets more complicated, and
sometimes that can lead to subtle dependency issues (e.g., the ".build"
dependency in the earlier iteration of the series).

And in general I'd much rather debug an awk script than a Makefile.

> Per Eric's Sunshine's upthread comments an awk and Perl implementation
> were both considered before[1].

Ah sorry, I thought it was just a perl one that had been the
show-stopper. I hadn't noticed the awk one. However, the point of my
patch was to use perl if available, and fall back otherwise. Maybe
that's too ugly, but it does address the concern with Eric's
implementation.

> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
> with/without this awk version the difference in runtime is within the
> error bars. I.e. making the loop faster isn't necessary. It's better to
> get to a point where make can save you from doing all/most of the work
> by checking modification times, rather than making an O(n) loop faster.

FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
partial results is IMHO inferior to just making things faster.

> I'm also interested in (and have WIP patches for) simplifying things
> more generally in the Makefile. Once we have a file exploded out has
> just the synopsis line that can be used to replace what's now in
> Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
> "man git".
> 
> There's subtle dependency issues there as well, and just having a
> one-off solution for the the command-list.h doesn't get us closer to
> addressing that sibling implementation.

So I don't know what "subtle dependency issues" you found here, but this
is exactly the kind of complexity it was my goal to avoid.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-20 23:46                 ` Jeff King
@ 2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
  2021-10-21  2:20                     ` Taylor Blau
  2021-10-21 14:34                     ` Jeff King
  0 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-21  0:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle


On Wed, Oct 20 2021, Jeff King wrote:

> On Thu, Oct 21, 2021 at 01:14:37AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Jeff: Just in terms of error prone both of these implementations will
>> accept bad input that's being caught in 8/8 of this series.
>> 
>> We accept a lot of bad input now, ending up with some combinations of
>> bad output or compile errors if you screw with the input *.txt files. I
>> think I've addressed all of those in this series.
>
> I don't mind more error-checking, though TBH I don't find a huge value
> in it. But what I did mean was:
>
>> If you mean the general concept of making a "foo.gen" from a "foo.txt"
>> as an intermediate with make as a way to get to "many-foo.h" I don't
>> really see how it's error prone conceptually. You get error checking
>> each step of the way, and it encourages logic that's simpler each step
>> of the way.
>
> Yes. It just seems like the Makefile gets more complicated, and
> sometimes that can lead to subtle dependency issues (e.g., the ".build"
> dependency in the earlier iteration of the series).

FWIW there wasn't an earlier version of the series, just a POC patch I
had as a comment in
https://lore.kernel.org/git/87r1gqxqxn.fsf@evledraar.gmail.com/

> And in general I'd much rather debug an awk script than a Makefile.
>
>> Per Eric's Sunshine's upthread comments an awk and Perl implementation
>> were both considered before[1].
>
> Ah sorry, I thought it was just a perl one that had been the
> show-stopper. I hadn't noticed the awk one. However, the point of my
> patch was to use perl if available, and fall back otherwise. Maybe
> that's too ugly, but it does address the concern with Eric's
> implementation.

I think carrying two implementations is worse than just having the one
slightly slower one.

>> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
>> with/without this awk version the difference in runtime is within the
>> error bars. I.e. making the loop faster isn't necessary. It's better to
>> get to a point where make can save you from doing all/most of the work
>> by checking modification times, rather than making an O(n) loop faster.
>
> FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
> partial results is IMHO inferior to just making things faster.

I agree with you in the general case, but for something that's consumed
by a make dependency graph I find it easier to debug things if
e.g. changing git-add.txt results in a change to git-add.gen, which is
then cat'd together.

IOW if we had a sufficiently fast C compiler I think I'd still prefer
make's existing rules over some equivalent of:

    cat *.c | super-fast-cc

Since similar to how the *.sp files depend on the the *.o files now,
declaring the dependency graph allows you to easily add more built
things.

>> I'm also interested in (and have WIP patches for) simplifying things
>> more generally in the Makefile. Once we have a file exploded out has
>> just the synopsis line that can be used to replace what's now in
>> Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
>> "man git".
>> 
>> There's subtle dependency issues there as well, and just having a
>> one-off solution for the the command-list.h doesn't get us closer to
>> addressing that sibling implementation.
>
> So I don't know what "subtle dependency issues" you found here, but this
> is exactly the kind of complexity it was my goal to avoid.

But how? I don't see how narrowly making the loop in generate-cmdlist.sh
gets us closer to generating the "cmds_txt" in the
Documentation/Makefile.

Whereas after this series we're pretty much there in terms of generating
those files. i.e. try:

    cat Documentation/cmds-mainporcelain.txt

All of those synopsis blurbs are extracted, and reverse-attributable to
the corresponding files.

The dependencies there are (arguably) subtly broken because those files
aren't re-made if a "cmd-list.made" is more recent, so if you remove one
of the generated text files the Makefile logic will get stuck because
the graph is incomplete (which can happen e.g. if "make clean" is
interrupted, or you run a "git clean -dxf '*.txt'".

I did the latter and ran into that recently, because I was trying to
ad-hoc fix another more general dependency issue we tend to have, which
is using wildcards on potentially generated files, so if you checkout a
new verison, build, and then checkout an old version (or are adding one
of the files involved) a script like build-docdep.perl will "helpfully"
pick up bad dependencies.

I guess you could argue that those are all problems with the Makefile,
but I think they're ultimately best solved by driving the dependencies
from the Makefile.

I.e. all we need is the one list of built-ins in command-list.txt, pair
that up with the "category" and we can always generated everything down
to the manpages correctly without relying on FS wildcards.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
@ 2021-10-21  2:20                     ` Taylor Blau
  2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
  2021-10-21 14:34                     ` Jeff King
  1 sibling, 1 reply; 87+ messages in thread
From: Taylor Blau @ 2021-10-21  2:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Taylor Blau, git, Junio C Hamano, Johannes Sixt,
	Øystein Walle

On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> Per Eric's Sunshine's upthread comments an awk and Perl implementation
> >> were both considered before[1].
> >
> > Ah sorry, I thought it was just a perl one that had been the
> > show-stopper. I hadn't noticed the awk one. However, the point of my
> > patch was to use perl if available, and fall back otherwise. Maybe
> > that's too ugly, but it does address the concern with Eric's
> > implementation.
>
> I think carrying two implementations is worse than just having the one
> slightly slower one.

I have no opinion on whether or not assuming that awk or Perl exists and
can be relied upon during the build is reasonable or not. It seems like
the former might be a slightly safer assumption than the latter, but in
all honesty it seems like both are always likely to be around.

In any case, I think the point was that we could improve upon Peff's
patch by just having a single implementation done in awk. And when I
wrote that I definitely was in the mindset of being able to rely on awk
during compilation.

> >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
> >> with/without this awk version the difference in runtime is within the
> >> error bars. I.e. making the loop faster isn't necessary. It's better to
> >> get to a point where make can save you from doing all/most of the work
> >> by checking modification times, rather than making an O(n) loop faster.
> >
> > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
> > partial results is IMHO inferior to just making things faster.
>
> I agree with you in the general case, but for something that's consumed
> by a make dependency graph I find it easier to debug things if
> e.g. changing git-add.txt results in a change to git-add.gen, which is
> then cat'd together.
>
> IOW if we had a sufficiently fast C compiler I think I'd still prefer
> make's existing rules over some equivalent of:
>
>     cat *.c | super-fast-cc
>
> Since similar to how the *.sp files depend on the the *.o files now,
> declaring the dependency graph allows you to easily add more built
> things.

This seems like an unfair comparison to me. I might be more sympathetic
if we were generating a more complicated artifact by running
generate-cmdlist.sh, but its inputs and outputs seem very well defined
(and non-complicated) to me.

In any case, I agree with Peff that this isn't the approach that I would
have taken. But I also think that *just* parallelizing isn't necessarily
a win here. There are two reasons I think that:

  - The cognitive load required to parallelize this process is
    complicated; the .build directory seems like another thing to keep
    track of, and it's not clear to me what updates it, or what the
    result of touching some file in that directory is.

  - But even if the parallelization was achievable by more
    straightforward means, you still have to do the slow thing when
    you're rebuilding from scratch. So this is strictly worse the first
    time you are compiling, at least on machines with fewer cores.

In any case, this is all overkill in my mind for what we are talking
about. I agree that 'cat *.c | super-fast-cc' is worse than a competent
Makefile that knows what to build and when. But the problem here is a
slow loop in shell that is easily made much faster by implementing it
in a language that can execute the whole loop in a single process.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
  2021-10-20 23:46                 ` Jeff King
@ 2021-10-21  5:39                 ` Eric Sunshine
  1 sibling, 0 replies; 87+ messages in thread
From: Eric Sunshine @ 2021-10-21  5:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, Jeff King, Git List, Junio C Hamano, Johannes Sixt,
	Øystein Walle

On Wed, Oct 20, 2021 at 7:34 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Wed, Oct 20 2021, Taylor Blau wrote:
> > But I don't think we even need to, since we could just rely on Awk. That
> > has all the benefits you described while still avoiding the circular
> > dependency on libgit.a. But the killer feature is that we don't have to
> > rely on two implementations, the lesser-used of which is likely to
> > bitrot over time.
>
> Per Eric's Sunshine's upthread comments an awk and Perl implementation
> were both considered before[1].
> 1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/

Thanks for saving me the trouble of digging up that email reference.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
  2021-10-21  2:20                     ` Taylor Blau
@ 2021-10-21 14:34                     ` Jeff King
  2021-10-21 22:34                       ` Junio C Hamano
  2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 87+ messages in thread
From: Jeff King @ 2021-10-21 14:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> There's subtle dependency issues there as well, and just having a
> >> one-off solution for the the command-list.h doesn't get us closer to
> >> addressing that sibling implementation.
> >
> > So I don't know what "subtle dependency issues" you found here, but this
> > is exactly the kind of complexity it was my goal to avoid.
> 
> But how? I don't see how narrowly making the loop in generate-cmdlist.sh
> gets us closer to generating the "cmds_txt" in the
> Documentation/Makefile.

What I meant is that the work to get everything right in the Makefile to
correctly handle dependencies and a partial rebuild can be tricky. For
instance, you're still stuck with a big wildcard dependency on
Documentation/git*.txt (and a manual list of exclusions in the Makefile)
because it's hard in make to do make new dynamic rules based on an
existing one (i.e., the list _should_ come from what's in
command-list.txt). Or the fact that we apparently need to keep the old
script around or cmake anyway.

It's also much slower. Here are from-scratch builds before and after
your patch 7:

  $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
  Benchmark #1: make command-list.h
    Time (mean ± σ):      1.527 s ±  0.060 s    [User: 1.320 s, System: 0.649 s]
    Range (min … max):    1.433 s …  1.625 s    10 runs
   
  
  $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
  Benchmark #1: make command-list.h
    Time (mean ± σ):      2.661 s ±  0.080 s    [User: 2.359 s, System: 1.082 s]
    Range (min … max):    2.481 s …  2.756 s    10 runs

I know that partial builds will offset that in some cases, but it still
feels like a step in the wrong direction. Even with a partial build,
swapping out "make clean" for "touch Documentation/git-add.txt" takes
about 200ms for me. Whereas with the faster version of
generate-cmdlist.sh I showed, it takes 150ms.

Now performance isn't everything, and it's possible these partial
snippets will be useful in other places. But I'm not sure I see any real
advantage in this series, and it seems like we're taking a hit in both
performance and complexity in the meantime.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
  2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
@ 2021-10-21 14:42             ` Jeff King
  2021-10-21 16:25               ` Jeff King
  0 siblings, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-21 14:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Wed, Oct 20, 2021 at 08:39:57PM +0200, Ævar Arnfjörð Bjarmason wrote:

> From: Johannes Sixt <j6t@kdbg.org>
> 
> This is just a small code reduction. There is a small probability that
> the new code breaks when the category list is empty. But that would be
> noticed during the compile step.
> 
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  generate-cmdlist.sh | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index e517c33710a..a1ab2b1f077 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -67,10 +67,7 @@ print_command_list () {
>  	while read cmd rest
>  	do
>  		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		for cat in $(echo "$rest" | get_category_line)
> -		do
> -			printf " | CAT_$cat"
> -		done
> +		printf " | CAT_%s" $(echo "$rest" | get_category_line)
>  		echo " },"

I think this is fine, but regardless of what happens in patch 7, it's
probably worth dropping this get_category_line call. All it does is sort
and de-dup the tokens in $rest, but we don't care because we're just
OR-ing them together. And of the 3 processes spawned by each loop, it is
responsible for 2 of them.

Even if this loop is broken out into individual bits of Makefile
snippet, avoiding the extra processes is worth doing.

The patch below gives me:

  $ git show HEAD:generate-cmdlist.sh >generate-cmdlist.sh.old
  $ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
  Benchmark #1: sh generate-cmdlist.sh command-list.txt
    Time (mean ± σ):     591.3 ms ±  31.5 ms    [User: 392.9 ms, System: 243.7 ms]
    Range (min … max):   543.7 ms … 630.6 ms    10 runs
   
  Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
    Time (mean ± σ):      1.236 s ±  0.060 s    [User: 1.100 s, System: 0.556 s]
    Range (min … max):    1.089 s …  1.275 s    10 runs
   
  Summary
    'sh generate-cmdlist.sh command-list.txt' ran
      2.09 ± 0.15 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

---
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..fab9e6a671 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -67,7 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"

I think you could also delete get_category_line, as it was inlined in
the other caller.

-Peff

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard
  2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
@ 2021-10-21 14:45             ` Jeff King
  2021-10-21 18:24               ` Junio C Hamano
  2021-10-21 22:46             ` Øystein Walle
  1 sibling, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-21 14:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Wed, Oct 20, 2021 at 08:39:58PM +0200, Ævar Arnfjörð Bjarmason wrote:

>  get_synopsis () {
> +	head -n 10 "Documentation/$1.txt" |
>  	sed -n '
>  		/^NAME/,/'"$1"'/H
>  		${
>  			x
>  			s/.*'"$1"' - \(.*\)/N_("\1")/
>  			p
> -		}' "Documentation/$1.txt"
> +		}'
>  }

By the way, I'm not sure about the utility of this change. It reduces
the number of lines that sed looks at, but at the cost of an extra
process. That's probably a net loss. And if we did want to limit the
data sed covers, doing "pq" after we matched would be simpler.

It also feels like it's orthogonal to what this patch is doing, but
maybe there's some subtle non-performance reason to want this.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
  2021-10-21 14:42             ` Jeff King
@ 2021-10-21 16:25               ` Jeff King
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff King @ 2021-10-21 16:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Thu, Oct 21, 2021 at 10:42:52AM -0400, Jeff King wrote:

> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..fab9e6a671 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -67,7 +67,7 @@ print_command_list () {
>  	while read cmd rest
>  	do
>  		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> +		printf " | CAT_%s" $rest
>  		echo " },"
>  	done
>  	echo "};"
> 
> I think you could also delete get_category_line, as it was inlined in
> the other caller.

Just for fun, I did a pure-shell loop to drop get_synopsis, which means
we don't exec any processes inside the loop. That patch is below, which
yields the timings (orig is up to your patch 6, no-sort is the patch
above, and pure-shell is the patch below on top):

  $ hyperfine --warmup 1 -L v orig,no-sort,pure-shell -p 'make clean' 'sh generate-cmdlist.sh.{v} command-list.txt'
  Benchmark #1: sh generate-cmdlist.sh.orig command-list.txt
    Time (mean ± σ):      1.286 s ±  0.148 s    [User: 1.503 s, System: 0.781 s]
    Range (min … max):    0.938 s …  1.451 s    10 runs
   
  Benchmark #2: sh generate-cmdlist.sh.no-sort command-list.txt
    Time (mean ± σ):     553.6 ms ± 143.3 ms    [User: 396.7 ms, System: 198.3 ms]
    Range (min … max):   192.6 ms … 683.5 ms    10 runs
   
  Benchmark #3: sh generate-cmdlist.sh.pure-shell command-list.txt
    Time (mean ± σ):      29.7 ms ±  15.6 ms    [User: 22.6 ms, System: 19.4 ms]
    Range (min … max):    12.0 ms …  49.1 ms    10 runs
   
  Summary
    'sh generate-cmdlist.sh.pure-shell command-list.txt' ran
     18.65 ± 10.93 times faster than 'sh generate-cmdlist.sh.no-sort command-list.txt'
     43.33 ± 23.32 times faster than 'sh generate-cmdlist.sh.orig command-list.txt'

So that's building all of the commands faster than I could get even
"touch Documentation/git-add.txt && make command-list.h" to run with
your patch (not entirely fair; I'm not invoking make here, which
probably does add 100ms of overhead, but I think it's still a net win).

The patch below doesn't enforce the /NAME/ section as the sed does. IMHO
that's not of much value because it uses the line with the command-name
as the lower bound. But it could be done pretty easily with an extra
$seen_name variable.

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index fab9e6a671..eae4bbb4c7 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -22,16 +22,6 @@ category_list () {
 	LC_ALL=C sort -u
 }
 
-get_synopsis () {
-	sed -n '
-		/^NAME/,/'"$1"'/H
-		${
-			x
-			s/.*'"$1"' - \(.*\)/N_("\1")/
-			p
-		}' "Documentation/$1.txt"
-}
-
 define_categories () {
 	echo
 	echo "/* Command categories */"
@@ -66,7 +56,18 @@ print_command_list () {
 	command_list "$1" |
 	while read cmd rest
 	do
-		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
+		synopsis=
+		while read line
+		do
+			case "$line" in
+			"$cmd - "*)
+				synopsis=${line#$cmd - }
+				break
+				;;
+			esac
+		done <"Documentation/$cmd.txt"
+
+		printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis"
 		printf " | CAT_%s" $rest
 		echo " },"
 	done

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard
  2021-10-21 14:45             ` Jeff King
@ 2021-10-21 18:24               ` Junio C Hamano
  0 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-21 18:24 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Sixt,
	Øystein Walle

Jeff King <peff@peff.net> writes:

> On Wed, Oct 20, 2021 at 08:39:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>>  get_synopsis () {
>> +	head -n 10 "Documentation/$1.txt" |
>>  	sed -n '
>>  		/^NAME/,/'"$1"'/H
>>  		${
>>  			x
>>  			s/.*'"$1"' - \(.*\)/N_("\1")/
>>  			p
>> -		}' "Documentation/$1.txt"
>> +		}'
>>  }
>
> By the way, I'm not sure about the utility of this change. It reduces
> the number of lines that sed looks at, but at the cost of an extra
> process. That's probably a net loss. And if we did want to limit the
> data sed covers, doing "pq" after we matched would be simpler.

Doesn't the above

 - slurp the lines into hold space while we are in the synopsis
   part;
 - otherwise keep reading and discarding;
 - and do the processing at end

So presumably, instead of waiting till the end, can't we immediately
process the thing and exit?

I guess your "'pq' after we matched" is saying the same thing.  I
agree that extra process with an ad-hoc limitation of 10 does leave
a bad taste in my mouth.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-21 14:34                     ` Jeff King
@ 2021-10-21 22:34                       ` Junio C Hamano
  2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-21 22:34 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, git,
	Johannes Sixt, Øystein Walle

Jeff King <peff@peff.net> writes:

> Now performance isn't everything, and it's possible these partial
> snippets will be useful in other places. But I'm not sure I see any real
> advantage in this series, and it seems like we're taking a hit in both
> performance and complexity in the meantime.

Let's not forget about a hit we are taking in reviewer bandwidth.
And added complexity will cost more over time.  I am not sure if
these 8-patches deserved this much attention, compared to other
neglected topics.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard
  2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
  2021-10-21 14:45             ` Jeff King
@ 2021-10-21 22:46             ` Øystein Walle
  1 sibling, 0 replies; 87+ messages in thread
From: Øystein Walle @ 2021-10-21 22:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git, Junio C Hamano, Jeff King, Johannes Sixt

On Wed, 20 Oct 2021 at 20:40, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> Unfortunately we can't drop the old code completely due to the CMake
> integration, see 061c2240b1b (Introduce CMake support for configuring
> Git, 2020-06-12). It will keep using the older and slower script.

I took a quick stab at this earlier today and it's not too difficult to
implement. I can take a deeper dive over the weekend if this patch
series gets traction.

> +header_only=
> +case $1 in
> +--entry-only)
> +       shift
> +       print_command_list $1 <"$1"
> +       exit 0
> +       ;;

Why is "$1" given twice here? As far as I can tell print_command_list
doesn't reference its arguments anymore.

> +# The old compatibility mode for CMmake. See 061c2240b1b (Introduce
> +# CMake support for configuring Git, 2020-06-12)

s/CMmake/CMake/

> +echo "static struct cmdname_help command_list[] = {"
> +grep -v \
> +       -e '^#' \
> +       -e '^git-fsck-objects ' \
> +       "$1" |
>  print_command_list "$1"
> +echo "};"

I suppose it stems from this unchanged line. That can be changed to just
`print_command_list`.

Øsse

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-21 14:34                     ` Jeff King
  2021-10-21 22:34                       ` Junio C Hamano
@ 2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
  2021-10-22 18:31                         ` Jeff King
  1 sibling, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 10:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle


On Thu, Oct 21 2021, Jeff King wrote:

> On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> >> There's subtle dependency issues there as well, and just having a
>> >> one-off solution for the the command-list.h doesn't get us closer to
>> >> addressing that sibling implementation.
>> >
>> > So I don't know what "subtle dependency issues" you found here, but this
>> > is exactly the kind of complexity it was my goal to avoid.
>> 
>> But how? I don't see how narrowly making the loop in generate-cmdlist.sh
>> gets us closer to generating the "cmds_txt" in the
>> Documentation/Makefile.
>
> What I meant is that the work to get everything right in the Makefile to
> correctly handle dependencies and a partial rebuild can be tricky. For
> instance, you're still stuck with a big wildcard dependency on
> Documentation/git*.txt (and a manual list of exclusions in the Makefile)
> because it's hard in make to do make new dynamic rules based on an
> existing one (i.e., the list _should_ come from what's in
> command-list.txt). Or the fact that we apparently need to keep the old
> script around or cmake anyway.
>
> It's also much slower. Here are from-scratch builds before and after
> your patch 7:
>
>   $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
>   Benchmark #1: make command-list.h
>     Time (mean ± σ):      1.527 s ±  0.060 s    [User: 1.320 s, System: 0.649 s]
>     Range (min … max):    1.433 s …  1.625 s    10 runs
>    
>   
>   $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
>   Benchmark #1: make command-list.h
>     Time (mean ± σ):      2.661 s ±  0.080 s    [User: 2.359 s, System: 1.082 s]
>     Range (min … max):    2.481 s …  2.756 s    10 runs
>
> I know that partial builds will offset that in some cases, but it still
> feels like a step in the wrong direction. Even with a partial build,
> swapping out "make clean" for "touch Documentation/git-add.txt" takes
> about 200ms for me. Whereas with the faster version of
> generate-cmdlist.sh I showed, it takes 150ms.
>
> Now performance isn't everything, and it's possible these partial
> snippets will be useful in other places. But I'm not sure I see any real
> advantage in this series, and it seems like we're taking a hit in both
> performance and complexity in the meantime.

Yes, the same numbers are noted in the 7/8 commit message. I.e. it's
slower on -j1, but faster with higher -j<n> numbers.

Aside from any changes I'm proposing here it seems rather pointless to
me to optimize the runtime of -j1 runs.

I think we use those in e.g. CI, so of course if they become *really*
slow it will matter, but the purpose of this change is to make hacking
on git easier, both in terms of runtime and discovering what the
Makefile is doing wih V=1.

I think anyone hacking on git is going to be on a system with -j2 at
least. So again, separate from these specific changes, if we've got a
change that speeds up -jN runs at the cost of a -j1 run that seems like
good thing.

In terms of the utility of benchmarks this benchmark uses "make" and is
meaningful, but e.g. <YXCKqAEwtwFozWk6@nand.local> (and I think some
other ones?) in this thread invoke the shellscript directly.

Those sorts of benchmarks may or may not matter, and in this case the
script is always called in the context of a Makefile, so that's really
the only meaningful way to test it. If e.g. its performance changes in a
way that won't be noticed in other Makefile noise it probably won't
matter to anyone.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-21  2:20                     ` Taylor Blau
@ 2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 12:37 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Jeff King, git, Junio C Hamano, Johannes Sixt, Øystein Walle


On Wed, Oct 20 2021, Taylor Blau wrote:

> On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> Per Eric's Sunshine's upthread comments an awk and Perl implementation
>> >> were both considered before[1].
>> >
>> > Ah sorry, I thought it was just a perl one that had been the
>> > show-stopper. I hadn't noticed the awk one. However, the point of my
>> > patch was to use perl if available, and fall back otherwise. Maybe
>> > that's too ugly, but it does address the concern with Eric's
>> > implementation.
>>
>> I think carrying two implementations is worse than just having the one
>> slightly slower one.
>
> I have no opinion on whether or not assuming that awk or Perl exists and
> can be relied upon during the build is reasonable or not. It seems like
> the former might be a slightly safer assumption than the latter, but in
> all honesty it seems like both are always likely to be around.
>
> In any case, I think the point was that we could improve upon Peff's
> patch by just having a single implementation done in awk. And when I
> wrote that I definitely was in the mindset of being able to rely on awk
> during compilation.
>
>> >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
>> >> with/without this awk version the difference in runtime is within the
>> >> error bars. I.e. making the loop faster isn't necessary. It's better to
>> >> get to a point where make can save you from doing all/most of the work
>> >> by checking modification times, rather than making an O(n) loop faster.
>> >
>> > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
>> > partial results is IMHO inferior to just making things faster.
>>
>> I agree with you in the general case, but for something that's consumed
>> by a make dependency graph I find it easier to debug things if
>> e.g. changing git-add.txt results in a change to git-add.gen, which is
>> then cat'd together.
>>
>> IOW if we had a sufficiently fast C compiler I think I'd still prefer
>> make's existing rules over some equivalent of:
>>
>>     cat *.c | super-fast-cc
>>
>> Since similar to how the *.sp files depend on the the *.o files now,
>> declaring the dependency graph allows you to easily add more built
>> things.
>
> This seems like an unfair comparison to me. I might be more sympathetic
> if we were generating a more complicated artifact by running
> generate-cmdlist.sh, but its inputs and outputs seem very well defined
> (and non-complicated) to me.

They are? a foo.o to foo.o input is relatively uncomplicated, and you
can discover the exact dependencies with 3rd party tools, like the GCC
and Clang switches we use generate the .depends dirs[1].

Whereas with the custom shellscripts that have for-loops of their own
like generate-cmdlist.sh what it depends on exactly is relatively opaque
to you until you read the shellscript.

I guess it's a matter of taste, but if you run this with/without this
series:

    touch Documentation/git-a*.txt; time make -j1 command-list.h --debug=b V=1

You'll see that before we'd spot that e.g. git-add.txt changed, but
we'll run one target in response to that at the end.

So it's just like what you'd get when you make %.o from %.c to produce a
"program" that links all those %.o together at the end.

So I do think it's a fair comparison, if anything it's unfair to this
series, because as noted you can discover these dependency independently
with GCC etc. for C code. But for a custom *.txt format with an ad-hoc
*.sh to parse it there's no such aid available.

1. $ cat .depend/help.o.d 
help.o: help.c cache.h git-compat-util.h compat/bswap.h wildmatch.h \
 banned.h strbuf.h hashmap.h hash.h repository.h path.h sha1dc_git.h \
 sha1collisiondetection/lib/sha1.h sha256/block/sha256.h list.h advice.h \
 gettext.h convert.h string-list.h trace.h trace2.h pack-revindex.h \
 oid-array.h mem-pool.h config.h builtin.h commit.h object.h tree.h \
 decorate.h gpg-interface.h pretty.h commit-slab.h commit-slab-decl.h \
 commit-slab-impl.h exec-cmd.h run-command.h thread-utils.h strvec.h \
 levenshtein.h help.h command-list.h column.h version.h refs.h \
 parse-options.h prompt.h
[...]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
@ 2021-10-22 18:31                         ` Jeff King
  2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-22 18:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle

On Fri, Oct 22, 2021 at 12:51:11PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Yes, the same numbers are noted in the 7/8 commit message. I.e. it's
> slower on -j1, but faster with higher -j<n> numbers.
> 
> Aside from any changes I'm proposing here it seems rather pointless to
> me to optimize the runtime of -j1 runs.
> 
> I think we use those in e.g. CI, so of course if they become *really*
> slow it will matter, but the purpose of this change is to make hacking
> on git easier, both in terms of runtime and discovering what the
> Makefile is doing wih V=1.
> 
> I think anyone hacking on git is going to be on a system with -j2 at
> least. So again, separate from these specific changes, if we've got a
> change that speeds up -jN runs at the cost of a -j1 run that seems like
> good thing.

It seems weird to me to assume that all of our processors are available
to build command-list.h. In most cases you are not running "make -j16
command-list.h", but rather "make -j16", and those other processors are
doing useful things, like say, building actual C code.

So counting CPU time is the interesting thing, because every cycle you
save there gets used for other work. And "make -j1" just brings
wall-clock and CPU time together.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster
  2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
                             ` (8 preceding siblings ...)
  2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
@ 2021-10-22 19:36           ` Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
                               ` (13 more replies)
  9 siblings, 14 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This version of this series drops the Makefile-powered version of the
cmdlist in favor of making the shellscript much faster, mostly with
suggestions from Jeff King.

I still think that splitting out the generated data into files may be
useful for unifying the Documentation/ and C code build processes,
there's another custom parser for command-list.txt in
Documentation/cmd-list.perl.

But if and when I've got something for that I can dig that out of the
v1, in the meantime the v1 of this should be mostly uncontroversial.

The last tow patches make things a bit slower for me, but since they
replace command invocations with pure-shell logic they presumably make
things a bit less painful on e.g. Windows, and the 8th patch here
already made things quite very fast already.

Jeff King (1):
  generate-cmdlist.sh: do not shell out to "sed"

Johannes Sixt (2):
  generate-cmdlist.sh: spawn fewer processes
  generate-cmdlist.sh: replace for loop by printf's auto-repeat feature

Ævar Arnfjörð Bjarmason (7):
  command-list.txt: sort with "LC_ALL=C sort"
  generate-cmdlist.sh: trivial whitespace change
  generate-cmdlist.sh: don't call get_categories() from category_list()
  generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  generate-cmdlist.sh: stop sorting category lines
  generate-cmdlist.sh: replace "grep' invocation with a shell version
  generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell

 command-list.txt    | 20 +++++++-------
 generate-cmdlist.sh | 66 ++++++++++++++++++++++++++-------------------
 2 files changed, 48 insertions(+), 38 deletions(-)

Range-diff against v1:
 1:  96885282988 =  1:  96885282988 command-list.txt: sort with "LC_ALL=C sort"
 2:  5e8fef90e42 =  2:  5e8fef90e42 generate-cmdlist.sh: trivial whitespace change
 3:  6b4de6a6088 =  3:  6b4de6a6088 generate-cmdlist.sh: spawn fewer processes
 4:  074685cf714 =  4:  074685cf714 generate-cmdlist.sh: don't call get_categories() from category_list()
 5:  f01c1fd8088 =  5:  f01c1fd8088 generate-cmdlist.sh: run "grep | sort", not "sort | grep"
 6:  e0b11514b8d =  6:  e0b11514b8d generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
 7:  0c6f9b80d3b <  -:  ----------- Makefile: stop having command-list.h depend on a wildcard
 8:  23d4cc77b6c <  -:  ----------- Makefile: assert correct generate-cmdlist.sh output
 -:  ----------- >  7:  f2f37c2963b generate-cmdlist.sh: stop sorting category lines
 -:  ----------- >  8:  83318d6c0da generate-cmdlist.sh: do not shell out to "sed"
 -:  ----------- >  9:  7903dd1f8c2 generate-cmdlist.sh: replace "grep' invocation with a shell version
 -:  ----------- > 10:  e10a43756d1 generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-25 18:29               ` Junio C Hamano
  2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
                               ` (12 subsequent siblings)
  13 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

We should keep these files sorted in the C locale, e.g. in the C
locale the order is:

    git-check-mailmap
    git-check-ref-format
    git-checkout

But under en_US.UTF-8 it's:

    git-check-mailmap
    git-checkout
    git-check-ref-format

In a subsequent commit I'll change generate-cmdlist.sh to use C sort
order, and without this change we'd be led to believe that that change
caused a meaningful change in the output, so let's do this as a
separate step, right now the generate-cmdlist.sh script just uses the
order found in this file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 command-list.txt | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/command-list.txt b/command-list.txt
index a289f09ed6f..02fc7ddde68 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -60,9 +60,9 @@ git-cat-file                            plumbinginterrogators
 git-check-attr                          purehelpers
 git-check-ignore                        purehelpers
 git-check-mailmap                       purehelpers
+git-check-ref-format                    purehelpers
 git-checkout                            mainporcelain
 git-checkout-index                      plumbingmanipulators
-git-check-ref-format                    purehelpers
 git-cherry                              plumbinginterrogators          complete
 git-cherry-pick                         mainporcelain
 git-citool                              mainporcelain
@@ -111,7 +111,6 @@ git-index-pack                          plumbingmanipulators
 git-init                                mainporcelain           init
 git-instaweb                            ancillaryinterrogators          complete
 git-interpret-trailers                  purehelpers
-gitk                                    mainporcelain
 git-log                                 mainporcelain           info
 git-ls-files                            plumbinginterrogators
 git-ls-remote                           plumbinginterrogators
@@ -124,11 +123,11 @@ git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
 git-merge-index                         plumbingmanipulators
 git-merge-one-file                      purehelpers
-git-mergetool                           ancillarymanipulators           complete
 git-merge-tree                          ancillaryinterrogators
-git-multi-pack-index                    plumbingmanipulators
+git-mergetool                           ancillarymanipulators           complete
 git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
+git-multi-pack-index                    plumbingmanipulators
 git-mv                                  mainporcelain           worktree
 git-name-rev                            plumbinginterrogators
 git-notes                               mainporcelain
@@ -154,23 +153,23 @@ git-request-pull                        foreignscminterface             complete
 git-rerere                              ancillaryinterrogators
 git-reset                               mainporcelain           history
 git-restore                             mainporcelain           worktree
-git-revert                              mainporcelain
 git-rev-list                            plumbinginterrogators
 git-rev-parse                           plumbinginterrogators
+git-revert                              mainporcelain
 git-rm                                  mainporcelain           worktree
 git-send-email                          foreignscminterface             complete
 git-send-pack                           synchingrepositories
+git-sh-i18n                             purehelpers
+git-sh-setup                            purehelpers
 git-shell                               synchelpers
 git-shortlog                            mainporcelain
 git-show                                mainporcelain           info
 git-show-branch                         ancillaryinterrogators          complete
 git-show-index                          plumbinginterrogators
 git-show-ref                            plumbinginterrogators
-git-sh-i18n                             purehelpers
-git-sh-setup                            purehelpers
 git-sparse-checkout                     mainporcelain           worktree
-git-stash                               mainporcelain
 git-stage                                                               complete
+git-stash                               mainporcelain
 git-status                              mainporcelain           info
 git-stripspace                          purehelpers
 git-submodule                           mainporcelain
@@ -189,10 +188,11 @@ git-var                                 plumbinginterrogators
 git-verify-commit                       ancillaryinterrogators
 git-verify-pack                         plumbinginterrogators
 git-verify-tag                          ancillaryinterrogators
-gitweb                                  ancillaryinterrogators
 git-whatchanged                         ancillaryinterrogators          complete
 git-worktree                            mainporcelain
 git-write-tree                          plumbingmanipulators
+gitk                                    mainporcelain
+gitweb                                  ancillaryinterrogators
 gitattributes                           guide
 gitcli                                  guide
 gitcore-tutorial                        guide
@@ -211,6 +211,6 @@ gitremote-helpers                       guide
 gitrepository-layout                    guide
 gitrevisions                            guide
 gitsubmodules                           guide
-gittutorial-2                           guide
 gittutorial                             guide
+gittutorial-2                           guide
 gitworkflows                            guide
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
                               ` (11 subsequent siblings)
  13 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This makes a subsequent diff smaller, and won't leave us with this
syntax nit at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..5114f46680a 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -10,7 +10,7 @@ command_list () {
 }
 
 get_categories () {
-	tr ' ' '\012'|
+	tr ' ' '\012' |
 	grep -v '^$' |
 	sort |
 	uniq
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
                               ` (10 subsequent siblings)
  13 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

From: Johannes Sixt <j6t@kdbg.org>

The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.

Furthermore, use sort -u instead of sort | uniq to remove yet another
process.

[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:

$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):     371.3 ms ±  64.2 ms    [User: 430.4 ms, System: 72.5 ms]
  Range (min … max):   320.5 ms … 517.7 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):     489.9 ms ± 185.4 ms    [User: 724.7 ms, System: 141.3 ms]
  Range (min … max):   346.0 ms … 885.3 ms    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 5114f46680a..27367915611 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -11,15 +11,14 @@ command_list () {
 
 get_categories () {
 	tr ' ' '\012' |
-	grep -v '^$' |
-	sort |
-	uniq
+	LC_ALL=C sort -u
 }
 
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories
+	get_categories |
+	grep -v '^$'
 }
 
 get_synopsis () {
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list()
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
                               ` (9 subsequent siblings)
  13 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This isn't for optimization as the get_categories() is a purely shell
function, but rather for ease of readability, let's just inline these
two lines. We'll be changing this code some more in subsequent commits
to make this worth it.

Rename the get_categories() function to get_category_line(), since
that's what it's doing now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 27367915611..16043e38476 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -9,7 +9,7 @@ command_list () {
 	eval "grep -ve '^#' $exclude_programs" <"$1"
 }
 
-get_categories () {
+get_category_line () {
 	tr ' ' '\012' |
 	LC_ALL=C sort -u
 }
@@ -17,7 +17,8 @@ get_categories () {
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories |
+	tr ' ' '\012' |
+	LC_ALL=C sort -u |
 	grep -v '^$'
 }
 
@@ -66,7 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_categories)
+		for cat in $(echo "$rest" | get_category_line)
 		do
 			printf " | CAT_$cat"
 		done
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
                               ` (8 subsequent siblings)
  13 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This doesn't matter for performance, but let's not include the empty
lines in our sorting. This makes the intent of the code clearer.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 16043e38476..e517c33710a 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -18,8 +18,8 @@ category_list () {
 	command_list "$1" |
 	cut -c 40- |
 	tr ' ' '\012' |
-	LC_ALL=C sort -u |
-	grep -v '^$'
+	grep -v '^$' |
+	LC_ALL=C sort -u
 }
 
 get_synopsis () {
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
                               ` (7 subsequent siblings)
  13 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

From: Johannes Sixt <j6t@kdbg.org>

This is just a small code reduction. There is a small probability that
the new code breaks when the category list is empty. But that would be
noticed during the compile step.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index e517c33710a..a1ab2b1f077 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -67,10 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_category_line)
-		do
-			printf " | CAT_$cat"
-		done
+		printf " | CAT_%s" $(echo "$rest" | get_category_line)
 		echo " },"
 	done
 	echo "};"
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-25 16:39               ` Jeff King
  2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  13 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

In a preceding commit we changed the print_command_list() loop to use
printf's auto-repeat feature. Let's now get rid of get_category_line()
entirely by not sorting the categories.

This will change the output of the generated code from e.g.:

    -       { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators },

To:

    +       { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete },

I.e. the categories are no longer sorted, but as they're OR'd together
it won't matter for the end result.

This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old)
and "master" to this code:

  'sh generate-cmdlist.sh command-list.txt' ran
    1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
    1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f077..f50112c50f8 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -9,11 +9,6 @@ command_list () {
 	eval "grep -ve '^#' $exclude_programs" <"$1"
 }
 
-get_category_line () {
-	tr ' ' '\012' |
-	LC_ALL=C sort -u
-}
-
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
@@ -67,7 +62,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed"
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-25 16:46               ` Jeff King
  2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  13 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

From: Jeff King <peff@peff.net>

Replace the "sed" invocation in get_synopsis() with a pure-shell
version. This speeds up generate-cmdlist.sh significantly. Compared to
HEAD~ (old) and "master" we are, according to hyperfine(1):

  'sh generate-cmdlist.sh command-list.txt' ran
   12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
   18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 generate-cmdlist.sh | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index f50112c50f8..9b7d6aea629 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -17,16 +17,6 @@ category_list () {
 	LC_ALL=C sort -u
 }
 
-get_synopsis () {
-	sed -n '
-		/^NAME/,/'"$1"'/H
-		${
-			x
-			s/.*'"$1"' - \(.*\)/N_("\1")/
-			p
-		}' "Documentation/$1.txt"
-}
-
 define_categories () {
 	echo
 	echo "/* Command categories */"
@@ -61,7 +51,18 @@ print_command_list () {
 	command_list "$1" |
 	while read cmd rest
 	do
-		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
+		synopsis=
+		while read line
+		do
+			case "$line" in
+			"$cmd - "*)
+				synopsis=${line#$cmd - }
+				break
+				;;
+			esac
+		done <"Documentation/$cmd.txt"
+
+		printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis"
 		printf " | CAT_%s" $rest
 		echo " },"
 	done
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (7 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-23 22:19               ` Junio C Hamano
  2021-10-23 22:26               ` Junio C Hamano
  2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  13 siblings, 2 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

Replace the "grep" we run to exclude certain programs from the
generated output with a pure-shell loop that strips out the comments,
and sees if the "cmd" we're reading is on a list of excluded
programs. This uses a trick similar to test_have_prereq() in
test-lib-functions.sh.

On my *nix system this makes things quite a bit slower compared to
HEAD~, but since the generate-cmdlist.sh is already quite fast, and
this likely helps systems where command invocations are more
expensive (i.e. Windows) let's use this anyway.

  'sh generate-cmdlist.sh.old command-list.txt' ran
    1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
   18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9b7d6aea629..2b184bbc65f 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -6,12 +6,27 @@ die () {
 }
 
 command_list () {
-	eval "grep -ve '^#' $exclude_programs" <"$1"
+	while read cmd rest
+	do
+		case "$cmd" in
+		"#"*)
+			continue;
+			;;
+		*)
+			case "$exclude_programs" in
+				*":$cmd:"*)
+				;;
+			*)
+				echo "$cmd $rest"
+				;;
+			esac
+		esac
+	done
 }
 
 category_list () {
-	command_list "$1" |
-	cut -c 40- |
+	command_list <"$1" |
+	cut -d' ' -f2- |
 	tr ' ' '\012' |
 	grep -v '^$' |
 	LC_ALL=C sort -u
@@ -48,7 +63,7 @@ define_category_names () {
 print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"
 
-	command_list "$1" |
+	command_list <"$1" |
 	while read cmd rest
 	do
 		synopsis=
@@ -69,11 +84,11 @@ print_command_list () {
 	echo "};"
 }
 
-exclude_programs=
+exclude_programs=:
 while test "--exclude-program" = "$1"
 do
 	shift
-	exclude_programs="$exclude_programs -e \"^$1 \""
+	exclude_programs="$exclude_programs$1:"
 	shift
 done
 
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (8 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
@ 2021-10-22 19:36             ` Ævar Arnfjörð Bjarmason
  2021-10-23 22:26               ` Junio C Hamano
  2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
                               ` (3 subsequent siblings)
  13 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

Extend the pure-shell parsing of command-list.txt by using having
command_list() take an argument indicating whether we're interested in
the "$cmd" part of the line, or just the "$rest".

That takes care of the "cut -d", and printf's auto-repeat feature can
replace the "tr". We don't need the "grep -v" either, as we're not
emitting any empty lines here (the command-list.txt doesn't have any).

This doesn't make things any faster or slower in my tests, but as with
the preceding commit let's do it just to get rid of command
invocations, it'll probably help on e.g. Windows.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 2b184bbc65f..394443c66df 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -17,7 +17,12 @@ command_list () {
 				*":$cmd:"*)
 				;;
 			*)
-				echo "$cmd $rest"
+				if test -n "$1"
+				then
+					printf "%s\n" $rest
+				else
+					echo "$cmd $rest"
+				fi
 				;;
 			esac
 		esac
@@ -25,10 +30,7 @@ command_list () {
 }
 
 category_list () {
-	command_list <"$1" |
-	cut -d' ' -f2- |
-	tr ' ' '\012' |
-	grep -v '^$' |
+	command_list --no-cat <"$1" |
 	LC_ALL=C sort -u
 }
 
-- 
2.33.1.1505.g075a284c562


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
  2021-10-22 18:31                         ` Jeff King
@ 2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-22 20:50 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle


On Fri, Oct 22 2021, Jeff King wrote:

> On Fri, Oct 22, 2021 at 12:51:11PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Yes, the same numbers are noted in the 7/8 commit message. I.e. it's
>> slower on -j1, but faster with higher -j<n> numbers.
>> 
>> Aside from any changes I'm proposing here it seems rather pointless to
>> me to optimize the runtime of -j1 runs.
>> 
>> I think we use those in e.g. CI, so of course if they become *really*
>> slow it will matter, but the purpose of this change is to make hacking
>> on git easier, both in terms of runtime and discovering what the
>> Makefile is doing wih V=1.
>> 
>> I think anyone hacking on git is going to be on a system with -j2 at
>> least. So again, separate from these specific changes, if we've got a
>> change that speeds up -jN runs at the cost of a -j1 run that seems like
>> good thing.
>
> It seems weird to me to assume that all of our processors are available
> to build command-list.h. In most cases you are not running "make -j16
> command-list.h", but rather "make -j16", and those other processors are
> doing useful things, like say, building actual C code.
>
> So counting CPU time is the interesting thing, because every cycle you
> save there gets used for other work. And "make -j1" just brings
> wall-clock and CPU time together.

There's been a lot of goalpost moving in the discussion around this
series, but if you look at the numbers for the v1 I proposed at
<patch-7.8-0c6f9b80d3b-20211020T183533Z-avarab@gmail.com> it also had a
significant drop in "user" time.

The only thing that was slower in my tests in either wallclock or user
time was the wallclock time on the building from scratch case (the user
time was lower). Maybe those measurements were bad or whatever, but
that's the context for the above.

Now, since then I've submitted a v2 with just the "make the shellscript
faster" parts of this, including with some of your suggestions, making
the shellscript version faster.

FWIW there's a rather trivial addition to my version that makes the
"make all the things" faster again in wallclock/user time. We don't need
to re-parse the command-list.txt at all again to emit the headers if
command-list.txt doesn't change, which is the common case. So we can
just cache the whole header portion in another *.gen file and "cat" it.

Anyway, I've been running these changes on top of other Makefile changes
I've got locally where we re-build almost nothing as HEAD changes. No
FORCE targets shellscripting (but the same via native GNU make
features), and no *.sh/*.perl script re-generation on HEAD changes (the
former being on-list).

So with that the common case really is that everything hangs on the
command-list.h generation. Since we have maybe 1-5 *.c files modified,
then we need to link everything, and the linking is hanging on help.o,
which needs command-list.h.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (9 preceding siblings ...)
  2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
@ 2021-10-22 21:20             ` Taylor Blau
  2021-10-23 22:34             ` Junio C Hamano
                               ` (2 subsequent siblings)
  13 siblings, 0 replies; 87+ messages in thread
From: Taylor Blau @ 2021-10-22 21:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Johannes Sixt,
	Øystein Walle, Eric Sunshine

On Fri, Oct 22, 2021 at 09:36:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This version of this series drops the Makefile-powered version of the
> cmdlist in favor of making the shellscript much faster, mostly with
> suggestions from Jeff King.

I'd be happy with the version that you suggest here, since at least the
net-effect is that generate-commandlist.sh is faster than before without
much additional complexity.

That said, I did find the structure of these patches somewhat confusing.
There is a lot of refactoring of get_categories() and related functions,
and those patches were a little tricky to read for me. I wonder how much
could be cleaned up by placing "generate-cmdlist.sh: stop sorting
category lines" earlier in the series, getting rid of the caller.

It's too bad that the penultimate patch slowed things down a bit, but
I think that things are so fast now that much more discussion in this
area is really just splitting hairs. It would be interesting to hear
from somebody on Windows whether or not the speed-up there was worth it
(otherwise dropping that patch might make sense).

Anyway, I think we've all spent a lot of time discussing a rather
straightforward set of patches ;-). So this version looks good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version
  2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
@ 2021-10-23 22:19               ` Junio C Hamano
  2021-10-23 22:26               ` Junio C Hamano
  1 sibling, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-23 22:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Replace the "grep" we run to exclude certain programs from the
> generated output with a pure-shell loop that strips out the comments,
> and sees if the "cmd" we're reading is on a list of excluded
> programs. This uses a trick similar to test_have_prereq() in
> test-lib-functions.sh.
>
> On my *nix system this makes things quite a bit slower compared to
> HEAD~, but since the generate-cmdlist.sh is already quite fast, and
> this likely helps systems where command invocations are more
> expensive (i.e. Windows) let's use this anyway.
>
>   'sh generate-cmdlist.sh.old command-list.txt' ran
>     1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
>    18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  generate-cmdlist.sh | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index 9b7d6aea629..2b184bbc65f 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -6,12 +6,27 @@ die () {
>  }
>  
>  command_list () {
> -	eval "grep -ve '^#' $exclude_programs" <"$1"
> +	while read cmd rest
> +	do
> +		case "$cmd" in
> +		"#"*)
> +			continue;
> +			;;
> +		*)
> +			case "$exclude_programs" in
> +				*":$cmd:"*)
> +				;;

Funny indentation.

> +			*)
> +				echo "$cmd $rest"
> +				;;
> +			esac
> +		esac
> +	done
>  }

>  category_list () {
> -	command_list "$1" |
> +	command_list <"$1" |

This change is unnecessary if you did

	while read cmd rest
	do
		...
	done <"$1"

to keep the external interface to the command_list helper unchanged.

> -	cut -c 40- |
> +	cut -d' ' -f2- |

Is this just a subjective preference or a logical consequence of how
the output from command_list looks like got somehow changed?

> @@ -48,7 +63,7 @@ define_category_names () {
>  print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>  
> -	command_list "$1" |
> +	command_list <"$1" |

Ditto.

> -exclude_programs=
> +exclude_programs=:
>  while test "--exclude-program" = "$1"
>  do
>  	shift
> -	exclude_programs="$exclude_programs -e \"^$1 \""
> +	exclude_programs="$exclude_programs$1:"
>  	shift
>  done

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell
  2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
@ 2021-10-23 22:26               ` Junio C Hamano
  0 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-23 22:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Extend the pure-shell parsing of command-list.txt by using having

using having???

> command_list() take an argument indicating whether we're interested in
> the "$cmd" part of the line, or just the "$rest".

OK, --no-cat stands for --no-category?  Even if (or especially if,
perhaps) you do not bother to parse the option in the command_list
helper, it would help the readers if it is spelled out.  I somehow
thought if this option has anything to do with "/bin/cat".

> That takes care of the "cut -d", and printf's auto-repeat feature can
> replace the "tr". We don't need the "grep -v" either, as we're not
> emitting any empty lines here (the command-list.txt doesn't have any).

It may make sense to ensure that the case arm won't feed an empty 
line that made cmd an empty by tightening the condition.

	case "$cmd" in
	"#"*)
		continue
		;; 
-	*)
+	?*)
		case "$exclude_programs" in
		*:"$cmd":*)
			;;

If anything, that would serve as a clear documentation that we are
safe even when the input has an empty line.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version
  2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
  2021-10-23 22:19               ` Junio C Hamano
@ 2021-10-23 22:26               ` Junio C Hamano
  1 sibling, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-23 22:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> +		"#"*)
> +			continue;
> +			;;

Stray ';' after continue.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (10 preceding siblings ...)
  2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
@ 2021-10-23 22:34             ` Junio C Hamano
  2021-10-25 16:57             ` Jeff King
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
  13 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-23 22:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Subject: Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster

Stale topic as there is no change to the Makefile.

> This version of this series drops the Makefile-powered version of the
> cmdlist in favor of making the shellscript much faster, mostly with
> suggestions from Jeff King.

OK.

I looked at the whole thing and it looked almost done, modulo just a
little breakages here and there whose corrections should be fairly
obvious.

Thanks.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines
  2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
@ 2021-10-25 16:39               ` Jeff King
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff King @ 2021-10-25 16:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau

On Fri, Oct 22, 2021 at 09:36:11PM +0200, Ævar Arnfjörð Bjarmason wrote:

> In a preceding commit we changed the print_command_list() loop to use
> printf's auto-repeat feature. Let's now get rid of get_category_line()
> entirely by not sorting the categories.
> 
> This will change the output of the generated code from e.g.:
> 
>     -       { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators },
> 
> To:
> 
>     +       { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete },
> 
> I.e. the categories are no longer sorted, but as they're OR'd together
> it won't matter for the end result.

Thanks for picking this up. The commit message here is well explained.

> This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old)
> and "master" to this code:
> 
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>     1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Curious. I get much more dramatic results (as I'd expect, as we are
cutting out 2 of 3 process spawns in the loop):

    'sh generate-cmdlist.sh command-list.txt' ran
    2.16 ± 0.17 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
    2.37 ± 0.28 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Either way, I think it's a good idea (and it paves the way for the next
patch, where we get the biggest speedup because we stop spawning any
processes at all).

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed"
  2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
@ 2021-10-25 16:46               ` Jeff King
  2021-10-25 17:52                 ` Jeff King
  0 siblings, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-25 16:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau

On Fri, Oct 22, 2021 at 09:36:12PM +0200, Ævar Arnfjörð Bjarmason wrote:

> From: Jeff King <peff@peff.net>
> 
> Replace the "sed" invocation in get_synopsis() with a pure-shell
> version. This speeds up generate-cmdlist.sh significantly. Compared to
> HEAD~ (old) and "master" we are, according to hyperfine(1):

Unsurprisingly I'm in favor of this. ;)

Curiously again, I get more dramatic results than you:

>   'sh generate-cmdlist.sh command-list.txt' ran
>    12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>    18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

    'sh generate-cmdlist.sh command-list.txt' ran
     22.44 ± 13.59 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
     57.35 ± 34.10 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

It's like spawning processes is somehow faster on your machine than
mine. I wonder if it's a CPU governor thing. This is a laptop, and those
numbers come from using "powersave". Doing "cpufreq-set -g performance",
I get:

   'sh generate-cmdlist.sh command-list.txt' ran
   14.35 ± 0.23 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
   33.15 ± 0.50 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

which is closer. But most notably all versions are 3-5x faster than
their "powersave" counterparts. I wonder if that has been driving some
of the confusion in our timings in this thread.

Either way, I think this is still a good direction to go.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (11 preceding siblings ...)
  2021-10-23 22:34             ` Junio C Hamano
@ 2021-10-25 16:57             ` Jeff King
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
  13 siblings, 0 replies; 87+ messages in thread
From: Jeff King @ 2021-10-25 16:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau

On Fri, Oct 22, 2021 at 09:36:04PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This version of this series drops the Makefile-powered version of the
> cmdlist in favor of making the shellscript much faster, mostly with
> suggestions from Jeff King.
> 
> I still think that splitting out the generated data into files may be
> useful for unifying the Documentation/ and C code build processes,
> there's another custom parser for command-list.txt in
> Documentation/cmd-list.perl.
> 
> But if and when I've got something for that I can dig that out of the
> v1, in the meantime the v1 of this should be mostly uncontroversial.

Thanks, up through patch 8 this all looks good to me.

> The last tow patches make things a bit slower for me, but since they
> replace command invocations with pure-shell logic they presumably make
> things a bit less painful on e.g. Windows, and the 8th patch here
> already made things quite very fast already.

These ones I could take or leave. They probably do help a little on
Windows, but I'm much more concerned about O(nr_of_commands) process
invocations than I am in reducing the base number of invocations
(because one gives a 169x speedup over the other).

And in patch 9 in particular, we're trading a grep one-liner for a
much-longer shell loop.  And I don't think this is hypocritical with
respect to patch 8; there we are replacing ugly sed with ugly shell, and
the speed benefit is clear and large.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed"
  2021-10-25 16:46               ` Jeff King
@ 2021-10-25 17:52                 ` Jeff King
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff King @ 2021-10-25 17:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau

On Mon, Oct 25, 2021 at 12:46:45PM -0400, Jeff King wrote:

> It's like spawning processes is somehow faster on your machine than
> mine. I wonder if it's a CPU governor thing. This is a laptop, and those
> numbers come from using "powersave". Doing "cpufreq-set -g performance",
> I get:
> 
>    'sh generate-cmdlist.sh command-list.txt' ran
>    14.35 ± 0.23 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>    33.15 ± 0.50 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
> 
> which is closer. But most notably all versions are 3-5x faster than
> their "powersave" counterparts. I wonder if that has been driving some
> of the confusion in our timings in this thread.

BTW, I should have mentioned these governors are used with the
intel_pstate driver. So "powersave" here is not the default "use the
lowest frequency" governor in linux, but the pstate-specific one which
is more like "ondemand".

Not that I expect anybody to look into or pontificate on governor
issues, but I wanted to make sure I didn't confuse anyone.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
@ 2021-10-25 18:29               ` Junio C Hamano
  2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-10-25 18:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> We should keep these files sorted in the C locale, e.g. in the C
> locale the order is:
>
>     git-check-mailmap
>     git-check-ref-format
>     git-checkout
>
> But under en_US.UTF-8 it's:
>
>     git-check-mailmap
>     git-checkout
>     git-check-ref-format
>
> In a subsequent commit I'll change generate-cmdlist.sh to use C sort
> order, and without this change we'd be led to believe that that change
> caused a meaningful change in the output, so let's do this as a
> separate step, right now the generate-cmdlist.sh script just uses the
> order found in this file.

Hmph, I do not mind sorting this file bytewise like this at all, but
does the justification above still apply to this round?  I had an
impression that we lose the sorting altogether in the end...

Also, I am not sure where that "led to believe" comes from---do we
have a test that checks the output from generate-cmdlist somehow?



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-10-25 18:29               ` Junio C Hamano
@ 2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
  2021-10-25 21:26                   ` Junio C Hamano
  0 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-25 21:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau


On Mon, Oct 25 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> We should keep these files sorted in the C locale, e.g. in the C
>> locale the order is:
>>
>>     git-check-mailmap
>>     git-check-ref-format
>>     git-checkout
>>
>> But under en_US.UTF-8 it's:
>>
>>     git-check-mailmap
>>     git-checkout
>>     git-check-ref-format
>>
>> In a subsequent commit I'll change generate-cmdlist.sh to use C sort
>> order, and without this change we'd be led to believe that that change
>> caused a meaningful change in the output, so let's do this as a
>> separate step, right now the generate-cmdlist.sh script just uses the
>> order found in this file.
>
> Hmph, I do not mind sorting this file bytewise like this at all, but
> does the justification above still apply to this round?  I had an
> impression that we lose the sorting altogether in the end...
>
> Also, I am not sure where that "led to believe" comes from---do we
> have a test that checks the output from generate-cmdlist somehow?

We end up unsorting the categories we bitwise-OR together. I.e. the
CAT_* on a line like this:

        { "git-whatchanged", N_("Show logs with difference each commit introduces"), 0 | CAT_ancillaryinterrogators | CAT_complete },

But we'll still sort the actual command list, and spew it out as-is in
some places "git" and "help" output. So having it be unsorted or
shuffled wouldn't be nice.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
@ 2021-10-25 21:26                   ` Junio C Hamano
  0 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-25 21:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Also, I am not sure where that "led to believe" comes from---do we
>> have a test that checks the output from generate-cmdlist somehow?
>
> We end up unsorting the categories we bitwise-OR together. I.e. the
> CAT_* on a line like this:
>
>         { "git-whatchanged", N_("Show logs with difference each commit introduces"), 0 | CAT_ancillaryinterrogators | CAT_complete },
>
> But we'll still sort the actual command list, and spew it out as-is in
> some places "git" and "help" output. So having it be unsorted or
> shuffled wouldn't be nice.

The last paragraph would be a good replacement sentence for the part
of the proposed log message that made me ask the question, I would
think.

Thanks.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster
  2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
                               ` (12 preceding siblings ...)
  2021-10-25 16:57             ` Jeff King
@ 2021-11-05 14:07             ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
                                 ` (9 more replies)
  13 siblings, 10 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This series makes the rather slow generate-cmdlist.sh shellscript run
in ~20ms on my box instead of ~400ms. This helps a lot with "make"
runtime, e.g. during interactive rebase.

This hopefully addresses all the feedback on v2. I kept the
penultimate patch as explained in an updated commit message it helped
quite a bit in the CI environment, and presumably on various other
setup.

There's also a new ~1.3x speedup of "generate-cmdlist.sh: don't parse
command-list.txt thrice" new at the end here, but on some faster boxes
that's against a relative slowdown in 9/10 compared to Jeff's
8/10[1]. I opted to keep 9/10 to give slower systems where process
spawning isn't as fast the benefit of the doubt. We spend less user
time in 10/10 than in 8/10, but more system time.

1. This is HEAD~n at the tip of the series. So .2 = 8/10, .1 = 9/10 '' = 10/10
  $ hyperfine --warmup 20 -L v .2,.1, 'sh generate-cmdlist.sh{v} command-list.txt'
   Benchmark #1: sh generate-cmdlist.sh.2 command-list.txt
     Time (mean ± σ):      19.5 ms ±   0.2 ms    [User: 16.7 ms, System: 8.1 ms]
     Range (min … max):    18.9 ms …  20.5 ms    151 runs
   
   Benchmark #2: sh generate-cmdlist.sh.1 command-list.txt
     Time (mean ± σ):      30.1 ms ±   0.3 ms    [User: 24.8 ms, System: 17.1 ms]
     Range (min … max):    29.3 ms …  31.3 ms    97 runs
   
   Benchmark #3: sh generate-cmdlist.sh command-list.txt
     Time (mean ± σ):      22.8 ms ±   0.3 ms    [User: 15.2 ms, System: 10.1 ms]
     Range (min … max):    22.5 ms …  23.7 ms    125 runs
   
   Summary
     'sh generate-cmdlist.sh.2 command-list.txt' ran
       1.17 ± 0.02 times faster than 'sh generate-cmdlist.sh command-list.txt'
       1.54 ± 0.02 times faster than 'sh generate-cmdlist.sh.1 command-list.txt'
   

Jeff King (1):
  generate-cmdlist.sh: do not shell out to "sed"

Johannes Sixt (2):
  generate-cmdlist.sh: spawn fewer processes
  generate-cmdlist.sh: replace for loop by printf's auto-repeat feature

Ævar Arnfjörð Bjarmason (7):
  command-list.txt: sort with "LC_ALL=C sort"
  generate-cmdlist.sh: trivial whitespace change
  generate-cmdlist.sh: don't call get_categories() from category_list()
  generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  generate-cmdlist.sh: stop sorting category lines
  generate-cmdlist.sh: replace "grep' invocation with a shell version
  generate-cmdlist.sh: don't parse command-list.txt thrice

 command-list.txt    | 22 ++++++-------
 generate-cmdlist.sh | 78 ++++++++++++++++++++++++++-------------------
 2 files changed, 56 insertions(+), 44 deletions(-)

Range-diff against v2:
 1:  96885282988 !  1:  c385e84c04c command-list.txt: sort with "LC_ALL=C sort"
    @@ Commit message
         separate step, right now the generate-cmdlist.sh script just uses the
         order found in this file.
     
    +    Note that this refers to the sort order of the lines in
    +    command-list.txt, a subsequent commit will also change how we treat
    +    the sort order of the "category" fields, but that's unrelated to this
    +    change.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## command-list.txt ##
    @@ command-list.txt: git-request-pull                        foreignscminterface
      git-show-ref                            plumbinginterrogators
     -git-sh-i18n                             purehelpers
     -git-sh-setup                            purehelpers
    - git-sparse-checkout                     mainporcelain           worktree
    + git-sparse-checkout                     mainporcelain
     -git-stash                               mainporcelain
      git-stage                                                               complete
     +git-stash                               mainporcelain
    @@ command-list.txt: git-var                                 plumbinginterrogators
      git-whatchanged                         ancillaryinterrogators          complete
      git-worktree                            mainporcelain
      git-write-tree                          plumbingmanipulators
    +@@ command-list.txt: gitfaq                                  guide
    + gitglossary                             guide
    + githooks                                guide
    + gitignore                               guide
     +gitk                                    mainporcelain
    -+gitweb                                  ancillaryinterrogators
    - gitattributes                           guide
    - gitcli                                  guide
    - gitcore-tutorial                        guide
    + gitmailmap                              guide
    + gitmodules                              guide
    + gitnamespaces                           guide
     @@ command-list.txt: gitremote-helpers                       guide
      gitrepository-layout                    guide
      gitrevisions                            guide
    @@ command-list.txt: gitremote-helpers                       guide
     -gittutorial-2                           guide
      gittutorial                             guide
     +gittutorial-2                           guide
    ++gitweb                                  ancillaryinterrogators
      gitworkflows                            guide
 2:  5e8fef90e42 !  2:  b4b4c3aa135 generate-cmdlist.sh: trivial whitespace change
    @@ Metadata
      ## Commit message ##
         generate-cmdlist.sh: trivial whitespace change
     
    -    This makes a subsequent diff smaller, and won't leave us with this
    -    syntax nit at the end.
    +    Add " " before a "|" at the end of a line in generate-cmdlist.sh for
    +    consistency with other code in the file. Some of the surrounding code
    +    will be modified in subsequent commits.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
 3:  6b4de6a6088 =  3:  737cca59d99 generate-cmdlist.sh: spawn fewer processes
 4:  074685cf714 =  4:  6ad17ab56c2 generate-cmdlist.sh: don't call get_categories() from category_list()
 5:  f01c1fd8088 =  5:  d7be565b567 generate-cmdlist.sh: run "grep | sort", not "sort | grep"
 6:  e0b11514b8d =  6:  646363db11f generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
 7:  f2f37c2963b =  7:  d8cc7c246b8 generate-cmdlist.sh: stop sorting category lines
 8:  83318d6c0da =  8:  aeeecc575fb generate-cmdlist.sh: do not shell out to "sed"
 9:  7903dd1f8c2 !  9:  e2702bcc1d0 generate-cmdlist.sh: replace "grep' invocation with a shell version
    @@ Commit message
         test-lib-functions.sh.
     
         On my *nix system this makes things quite a bit slower compared to
    -    HEAD~, but since the generate-cmdlist.sh is already quite fast, and
    -    this likely helps systems where command invocations are more
    -    expensive (i.e. Windows) let's use this anyway.
    -
    +    HEAD~:
    +    o
           'sh generate-cmdlist.sh.old command-list.txt' ran
             1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
            18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
     
    -    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    +    But when I tried running generate-cmdlist.sh 100 times in CI I found
    +    that it helped across the board even on OSX & Linux. I tried testing
    +    it in CI with this ad-hoc few-liner:
    +
    +        for i in $(seq -w 0 11 | sort -nr)
    +        do
    +            git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh &&
    +            git add generate-cmdlist* &&
    +            cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : &&
    +            perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh &&
    +            git add t/t00*.sh
    +        done && git commit -m"generated it"
    +
    +    Here HEAD~02 and the t0002* file refers to this change, and HEAD~03
    +    and t0003* file to the preceding commit, the relevant results were:
    +
    +        linux-gcc:
    +
    +        [12:05:33] t0002-generate-cmdlist.sh .. ok       14 ms ( 0.00 usr  0.00 sys +  3.64 cusr  3.09 csys =  6.73 CPU)
    +        [12:05:30] t0003-generate-cmdlist.sh .. ok       32 ms ( 0.00 usr  0.00 sys +  2.66 cusr  1.81 csys =  4.47 CPU)
    +
    +        osx-gcc:
    +
    +        [11:58:04] t0002-generate-cmdlist.sh .. ok    80081 ms ( 0.02 usr  0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
    +        [11:58:16] t0003-generate-cmdlist.sh .. ok    92127 ms ( 0.02 usr  0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU)
    +
    +        vs-test:
    +
    +        [12:03:14] t0002-generate-cmdlist.sh .. ok       30 s ( 0.02 usr  0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
    +        [12:03:20] t0003-generate-cmdlist.sh .. ok       32 s ( 0.00 usr  0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU)
    +
    +    I.e. even on *nix running 100 of these in a loop was up to ~2x faster
    +    in absolute runtime, I suspect it's due factors that are exacerbated
    +    in the CI, e.g. much slower process startup due to some platform
    +    limits, or a slower FS.
    +
    +    The "cut -d" change here is because we're not emitting the
    +    40-character aligned output anymore, i.e. we'll get the output from
    +    command_list() now, not an as-is line from command-list.txt.
    +
    +    This also makes the parsing more reliable, as we could tweak the
    +    whitespace alignment without breaking this parser. Let's reword a
    +    now-inaccurate comment in "command-list.txt" describing that previous
    +    alignment limitation. We'll still need the "### command-list [...]"
    +    line due to the "Documentation/cmd-list.perl" logic added in
    +    11c6659d85d (command-list: prepare machinery for upcoming "common
    +    groups" section, 2015-05-21).
    +
    +    There was a proposed change subsequent to this one[3] which continued
    +    moving more logic into the "command_list() function, i.e. replaced the
    +    "cut | tr | grep" chain in "category_list()" with an argument to
    +    "command_list()".
    +
    +    That change might have had a bit of an effect, but not as much as the
    +    preceding commit, so I decided to drop it. The relevant performance
    +    numbers from it were:
    +
    +        linux-gcc:
    +
    +        [12:05:33] t0001-generate-cmdlist.sh .. ok       13 ms ( 0.00 usr  0.00 sys +  3.33 cusr  2.78 csys =  6.11 CPU)
    +        [12:05:33] t0002-generate-cmdlist.sh .. ok       14 ms ( 0.00 usr  0.00 sys +  3.64 cusr  3.09 csys =  6.73 CPU)
    +
    +        osx-gcc:
    +
    +        [11:58:03] t0001-generate-cmdlist.sh .. ok    78416 ms ( 0.02 usr  0.01 sys + 11.78 cusr  6.22 csys = 18.03 CPU)
    +        [11:58:04] t0002-generate-cmdlist.sh .. ok    80081 ms ( 0.02 usr  0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
    +
    +        vs-test:
    +
    +        [12:03:20] t0001-generate-cmdlist.sh .. ok       34 s ( 0.00 usr  0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU)
    +        [12:03:14] t0002-generate-cmdlist.sh .. ok       30 s ( 0.02 usr  0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
    +
    +    As above HEAD~2 and t0002* are testing the code in this commit (and
    +    the line is the same), but HEAD~1 and t0001* are testing that dropped
    +    change in [3].
    +
    +    1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/
    +    2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/
    +    3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/
    +
    + ## command-list.txt ##
    +@@
    + # specified here, which can only have "guide" attribute and nothing
    + # else.
    + #
    +-### command list (do not change this line, also do not change alignment)
    ++### command list (do not change this line)
    + # command name                          category [category] [category]
    + git-add                                 mainporcelain           worktree
    + git-am                                  mainporcelain
     
      ## generate-cmdlist.sh ##
     @@ generate-cmdlist.sh: die () {
    @@ generate-cmdlist.sh: die () {
     +	while read cmd rest
     +	do
     +		case "$cmd" in
    -+		"#"*)
    -+			continue;
    ++		"#"* | '')
    ++			# Ignore comments and allow empty lines
    ++			continue
     +			;;
     +		*)
     +			case "$exclude_programs" in
    -+				*":$cmd:"*)
    ++			*":$cmd:"*)
     +				;;
     +			*)
     +				echo "$cmd $rest"
     +				;;
     +			esac
     +		esac
    -+	done
    ++	done <"$1"
      }
      
      category_list () {
    --	command_list "$1" |
    + 	command_list "$1" |
     -	cut -c 40- |
    -+	command_list <"$1" |
     +	cut -d' ' -f2- |
      	tr ' ' '\012' |
      	grep -v '^$' |
      	LC_ALL=C sort -u
    -@@ generate-cmdlist.sh: define_category_names () {
    - print_command_list () {
    - 	echo "static struct cmdname_help command_list[] = {"
    - 
    --	command_list "$1" |
    -+	command_list <"$1" |
    - 	while read cmd rest
    - 	do
    - 		synopsis=
     @@ generate-cmdlist.sh: print_command_list () {
      	echo "};"
      }
10:  e10a43756d1 <  -:  ----------- generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell
 -:  ----------- > 10:  100084070fd generate-cmdlist.sh: don't parse command-list.txt thrice
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:07               ` Ævar Arnfjörð Bjarmason
  2021-11-05 22:45                 ` Junio C Hamano
  2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
                                 ` (8 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

We should keep these files sorted in the C locale, e.g. in the C
locale the order is:

    git-check-mailmap
    git-check-ref-format
    git-checkout

But under en_US.UTF-8 it's:

    git-check-mailmap
    git-checkout
    git-check-ref-format

In a subsequent commit I'll change generate-cmdlist.sh to use C sort
order, and without this change we'd be led to believe that that change
caused a meaningful change in the output, so let's do this as a
separate step, right now the generate-cmdlist.sh script just uses the
order found in this file.

Note that this refers to the sort order of the lines in
command-list.txt, a subsequent commit will also change how we treat
the sort order of the "category" fields, but that's unrelated to this
change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 command-list.txt | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/command-list.txt b/command-list.txt
index eb9cee8dee9..04cde20c3da 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -60,9 +60,9 @@ git-cat-file                            plumbinginterrogators
 git-check-attr                          purehelpers
 git-check-ignore                        purehelpers
 git-check-mailmap                       purehelpers
+git-check-ref-format                    purehelpers
 git-checkout                            mainporcelain
 git-checkout-index                      plumbingmanipulators
-git-check-ref-format                    purehelpers
 git-cherry                              plumbinginterrogators          complete
 git-cherry-pick                         mainporcelain
 git-citool                              mainporcelain
@@ -111,7 +111,6 @@ git-index-pack                          plumbingmanipulators
 git-init                                mainporcelain           init
 git-instaweb                            ancillaryinterrogators          complete
 git-interpret-trailers                  purehelpers
-gitk                                    mainporcelain
 git-log                                 mainporcelain           info
 git-ls-files                            plumbinginterrogators
 git-ls-remote                           plumbinginterrogators
@@ -124,11 +123,11 @@ git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
 git-merge-index                         plumbingmanipulators
 git-merge-one-file                      purehelpers
-git-mergetool                           ancillarymanipulators           complete
 git-merge-tree                          ancillaryinterrogators
-git-multi-pack-index                    plumbingmanipulators
+git-mergetool                           ancillarymanipulators           complete
 git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
+git-multi-pack-index                    plumbingmanipulators
 git-mv                                  mainporcelain           worktree
 git-name-rev                            plumbinginterrogators
 git-notes                               mainporcelain
@@ -154,23 +153,23 @@ git-request-pull                        foreignscminterface             complete
 git-rerere                              ancillaryinterrogators
 git-reset                               mainporcelain           history
 git-restore                             mainporcelain           worktree
-git-revert                              mainporcelain
 git-rev-list                            plumbinginterrogators
 git-rev-parse                           plumbinginterrogators
+git-revert                              mainporcelain
 git-rm                                  mainporcelain           worktree
 git-send-email                          foreignscminterface             complete
 git-send-pack                           synchingrepositories
+git-sh-i18n                             purehelpers
+git-sh-setup                            purehelpers
 git-shell                               synchelpers
 git-shortlog                            mainporcelain
 git-show                                mainporcelain           info
 git-show-branch                         ancillaryinterrogators          complete
 git-show-index                          plumbinginterrogators
 git-show-ref                            plumbinginterrogators
-git-sh-i18n                             purehelpers
-git-sh-setup                            purehelpers
 git-sparse-checkout                     mainporcelain
-git-stash                               mainporcelain
 git-stage                                                               complete
+git-stash                               mainporcelain
 git-status                              mainporcelain           info
 git-stripspace                          purehelpers
 git-submodule                           mainporcelain
@@ -189,7 +188,6 @@ git-var                                 plumbinginterrogators
 git-verify-commit                       ancillaryinterrogators
 git-verify-pack                         plumbinginterrogators
 git-verify-tag                          ancillaryinterrogators
-gitweb                                  ancillaryinterrogators
 git-whatchanged                         ancillaryinterrogators          complete
 git-worktree                            mainporcelain
 git-write-tree                          plumbingmanipulators
@@ -204,6 +202,7 @@ gitfaq                                  guide
 gitglossary                             guide
 githooks                                guide
 gitignore                               guide
+gitk                                    mainporcelain
 gitmailmap                              guide
 gitmodules                              guide
 gitnamespaces                           guide
@@ -211,6 +210,7 @@ gitremote-helpers                       guide
 gitrepository-layout                    guide
 gitrevisions                            guide
 gitsubmodules                           guide
-gittutorial-2                           guide
 gittutorial                             guide
+gittutorial-2                           guide
+gitweb                                  ancillaryinterrogators
 gitworkflows                            guide
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
  2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
                                 ` (7 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

Add " " before a "|" at the end of a line in generate-cmdlist.sh for
consistency with other code in the file. Some of the surrounding code
will be modified in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..5114f46680a 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -10,7 +10,7 @@ command_list () {
 }
 
 get_categories () {
-	tr ' ' '\012'|
+	tr ' ' '\012' |
 	grep -v '^$' |
 	sort |
 	uniq
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
  2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 22:47                 ` Junio C Hamano
  2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
                                 ` (6 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

From: Johannes Sixt <j6t@kdbg.org>

The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.

Furthermore, use sort -u instead of sort | uniq to remove yet another
process.

[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:

$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):     371.3 ms ±  64.2 ms    [User: 430.4 ms, System: 72.5 ms]
  Range (min … max):   320.5 ms … 517.7 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):     489.9 ms ± 185.4 ms    [User: 724.7 ms, System: 141.3 ms]
  Range (min … max):   346.0 ms … 885.3 ms    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 5114f46680a..27367915611 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -11,15 +11,14 @@ command_list () {
 
 get_categories () {
 	tr ' ' '\012' |
-	grep -v '^$' |
-	sort |
-	uniq
+	LC_ALL=C sort -u
 }
 
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories
+	get_categories |
+	grep -v '^$'
 }
 
 get_synopsis () {
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list()
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (2 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
                                 ` (5 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This isn't for optimization as the get_categories() is a purely shell
function, but rather for ease of readability, let's just inline these
two lines. We'll be changing this code some more in subsequent commits
to make this worth it.

Rename the get_categories() function to get_category_line(), since
that's what it's doing now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 27367915611..16043e38476 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -9,7 +9,7 @@ command_list () {
 	eval "grep -ve '^#' $exclude_programs" <"$1"
 }
 
-get_categories () {
+get_category_line () {
 	tr ' ' '\012' |
 	LC_ALL=C sort -u
 }
@@ -17,7 +17,8 @@ get_categories () {
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories |
+	tr ' ' '\012' |
+	LC_ALL=C sort -u |
 	grep -v '^$'
 }
 
@@ -66,7 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_categories)
+		for cat in $(echo "$rest" | get_category_line)
 		do
 			printf " | CAT_$cat"
 		done
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (3 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
                                 ` (4 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

This doesn't matter for performance, but let's not include the empty
lines in our sorting. This makes the intent of the code clearer.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 16043e38476..e517c33710a 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -18,8 +18,8 @@ category_list () {
 	command_list "$1" |
 	cut -c 40- |
 	tr ' ' '\012' |
-	LC_ALL=C sort -u |
-	grep -v '^$'
+	grep -v '^$' |
+	LC_ALL=C sort -u
 }
 
 get_synopsis () {
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (4 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
                                 ` (3 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

From: Johannes Sixt <j6t@kdbg.org>

This is just a small code reduction. There is a small probability that
the new code breaks when the category list is empty. But that would be
noticed during the compile step.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index e517c33710a..a1ab2b1f077 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -67,10 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_category_line)
-		do
-			printf " | CAT_$cat"
-		done
+		printf " | CAT_%s" $(echo "$rest" | get_category_line)
 		echo " },"
 	done
 	echo "};"
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (5 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
                                 ` (2 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

In a preceding commit we changed the print_command_list() loop to use
printf's auto-repeat feature. Let's now get rid of get_category_line()
entirely by not sorting the categories.

This will change the output of the generated code from e.g.:

    -       { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators },

To:

    +       { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete },

I.e. the categories are no longer sorted, but as they're OR'd together
it won't matter for the end result.

This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old)
and "master" to this code:

  'sh generate-cmdlist.sh command-list.txt' ran
    1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
    1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f077..f50112c50f8 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -9,11 +9,6 @@ command_list () {
 	eval "grep -ve '^#' $exclude_programs" <"$1"
 }
 
-get_category_line () {
-	tr ' ' '\012' |
-	LC_ALL=C sort -u
-}
-
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
@@ -67,7 +62,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed"
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (6 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

From: Jeff King <peff@peff.net>

Replace the "sed" invocation in get_synopsis() with a pure-shell
version. This speeds up generate-cmdlist.sh significantly. Compared to
HEAD~ (old) and "master" we are, according to hyperfine(1):

  'sh generate-cmdlist.sh command-list.txt' ran
   12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
   18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 generate-cmdlist.sh | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index f50112c50f8..9b7d6aea629 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -17,16 +17,6 @@ category_list () {
 	LC_ALL=C sort -u
 }
 
-get_synopsis () {
-	sed -n '
-		/^NAME/,/'"$1"'/H
-		${
-			x
-			s/.*'"$1"' - \(.*\)/N_("\1")/
-			p
-		}' "Documentation/$1.txt"
-}
-
 define_categories () {
 	echo
 	echo "/* Command categories */"
@@ -61,7 +51,18 @@ print_command_list () {
 	command_list "$1" |
 	while read cmd rest
 	do
-		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
+		synopsis=
+		while read line
+		do
+			case "$line" in
+			"$cmd - "*)
+				synopsis=${line#$cmd - }
+				break
+				;;
+			esac
+		done <"Documentation/$cmd.txt"
+
+		printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis"
 		printf " | CAT_%s" $rest
 		echo " },"
 	done
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (7 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

Replace the "grep" we run to exclude certain programs from the
generated output with a pure-shell loop that strips out the comments,
and sees if the "cmd" we're reading is on a list of excluded
programs. This uses a trick similar to test_have_prereq() in
test-lib-functions.sh.

On my *nix system this makes things quite a bit slower compared to
HEAD~:
o
  'sh generate-cmdlist.sh.old command-list.txt' ran
    1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
   18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

But when I tried running generate-cmdlist.sh 100 times in CI I found
that it helped across the board even on OSX & Linux. I tried testing
it in CI with this ad-hoc few-liner:

    for i in $(seq -w 0 11 | sort -nr)
    do
    	git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh &&
    	git add generate-cmdlist* &&
    	cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : &&
    	perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh &&
    	git add t/t00*.sh
    done && git commit -m"generated it"

Here HEAD~02 and the t0002* file refers to this change, and HEAD~03
and t0003* file to the preceding commit, the relevant results were:

    linux-gcc:

    [12:05:33] t0002-generate-cmdlist.sh .. ok       14 ms ( 0.00 usr  0.00 sys +  3.64 cusr  3.09 csys =  6.73 CPU)
    [12:05:30] t0003-generate-cmdlist.sh .. ok       32 ms ( 0.00 usr  0.00 sys +  2.66 cusr  1.81 csys =  4.47 CPU)

    osx-gcc:

    [11:58:04] t0002-generate-cmdlist.sh .. ok    80081 ms ( 0.02 usr  0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
    [11:58:16] t0003-generate-cmdlist.sh .. ok    92127 ms ( 0.02 usr  0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU)

    vs-test:

    [12:03:14] t0002-generate-cmdlist.sh .. ok       30 s ( 0.02 usr  0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
    [12:03:20] t0003-generate-cmdlist.sh .. ok       32 s ( 0.00 usr  0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU)

I.e. even on *nix running 100 of these in a loop was up to ~2x faster
in absolute runtime, I suspect it's due factors that are exacerbated
in the CI, e.g. much slower process startup due to some platform
limits, or a slower FS.

The "cut -d" change here is because we're not emitting the
40-character aligned output anymore, i.e. we'll get the output from
command_list() now, not an as-is line from command-list.txt.

This also makes the parsing more reliable, as we could tweak the
whitespace alignment without breaking this parser. Let's reword a
now-inaccurate comment in "command-list.txt" describing that previous
alignment limitation. We'll still need the "### command-list [...]"
line due to the "Documentation/cmd-list.perl" logic added in
11c6659d85d (command-list: prepare machinery for upcoming "common
groups" section, 2015-05-21).

There was a proposed change subsequent to this one[3] which continued
moving more logic into the "command_list() function, i.e. replaced the
"cut | tr | grep" chain in "category_list()" with an argument to
"command_list()".

That change might have had a bit of an effect, but not as much as the
preceding commit, so I decided to drop it. The relevant performance
numbers from it were:

    linux-gcc:

    [12:05:33] t0001-generate-cmdlist.sh .. ok       13 ms ( 0.00 usr  0.00 sys +  3.33 cusr  2.78 csys =  6.11 CPU)
    [12:05:33] t0002-generate-cmdlist.sh .. ok       14 ms ( 0.00 usr  0.00 sys +  3.64 cusr  3.09 csys =  6.73 CPU)

    osx-gcc:

    [11:58:03] t0001-generate-cmdlist.sh .. ok    78416 ms ( 0.02 usr  0.01 sys + 11.78 cusr  6.22 csys = 18.03 CPU)
    [11:58:04] t0002-generate-cmdlist.sh .. ok    80081 ms ( 0.02 usr  0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)

    vs-test:

    [12:03:20] t0001-generate-cmdlist.sh .. ok       34 s ( 0.00 usr  0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU)
    [12:03:14] t0002-generate-cmdlist.sh .. ok       30 s ( 0.02 usr  0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)

As above HEAD~2 and t0002* are testing the code in this commit (and
the line is the same), but HEAD~1 and t0001* are testing that dropped
change in [3].

1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/
3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/
---
 command-list.txt    |  2 +-
 generate-cmdlist.sh | 24 ++++++++++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/command-list.txt b/command-list.txt
index 04cde20c3da..675c28f0bd0 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -43,7 +43,7 @@
 # specified here, which can only have "guide" attribute and nothing
 # else.
 #
-### command list (do not change this line, also do not change alignment)
+### command list (do not change this line)
 # command name                          category [category] [category]
 git-add                                 mainporcelain           worktree
 git-am                                  mainporcelain
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9b7d6aea629..cfe0454d1de 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -6,12 +6,28 @@ die () {
 }
 
 command_list () {
-	eval "grep -ve '^#' $exclude_programs" <"$1"
+	while read cmd rest
+	do
+		case "$cmd" in
+		"#"* | '')
+			# Ignore comments and allow empty lines
+			continue
+			;;
+		*)
+			case "$exclude_programs" in
+			*":$cmd:"*)
+				;;
+			*)
+				echo "$cmd $rest"
+				;;
+			esac
+		esac
+	done <"$1"
 }
 
 category_list () {
 	command_list "$1" |
-	cut -c 40- |
+	cut -d' ' -f2- |
 	tr ' ' '\012' |
 	grep -v '^$' |
 	LC_ALL=C sort -u
@@ -69,11 +85,11 @@ print_command_list () {
 	echo "};"
 }
 
-exclude_programs=
+exclude_programs=:
 while test "--exclude-program" = "$1"
 do
 	shift
-	exclude_programs="$exclude_programs -e \"^$1 \""
+	exclude_programs="$exclude_programs$1:"
 	shift
 done
 
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice
  2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
                                 ` (8 preceding siblings ...)
  2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
@ 2021-11-05 14:08               ` Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle,
	Eric Sunshine, Taylor Blau,
	Ævar Arnfjörð Bjarmason

Change the "define_categories()" and "define_category_names()" functions
to take the already-parsed output of "category_list()" as an argument,
which brings our number of passes over "command-list.txt" from three
to two.

Then have "category_list()" itself take the output of "command_list()"
as an argument, bringing the number of times we parse the file to one.

Compared to the pre-image this speeds us up quite a bit:

    $ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
    $ hyperfine --warmup 10 -L v ,.old 'sh generate-cmdlist.sh{v} command-list.txt'
    Benchmark #1: sh generate-cmdlist.sh command-list.txt
      Time (mean ± σ):      22.9 ms ±   0.3 ms    [User: 15.8 ms, System: 9.6 ms]
      Range (min … max):    22.5 ms …  24.0 ms    125 runs

    Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
      Time (mean ± σ):      30.1 ms ±   0.4 ms    [User: 24.4 ms, System: 17.5 ms]
      Range (min … max):    29.5 ms …  32.3 ms    96 runs

    Summary
      'sh generate-cmdlist.sh command-list.txt' ran
        1.32 ± 0.02 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index cfe0454d1de..205541e0f7f 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -26,7 +26,7 @@ command_list () {
 }
 
 category_list () {
-	command_list "$1" |
+	echo "$1" |
 	cut -d' ' -f2- |
 	tr ' ' '\012' |
 	grep -v '^$' |
@@ -37,7 +37,7 @@ define_categories () {
 	echo
 	echo "/* Command categories */"
 	bit=0
-	category_list "$1" |
+	echo "$1" |
 	while read cat
 	do
 		echo "#define CAT_$cat (1UL << $bit)"
@@ -51,7 +51,7 @@ define_category_names () {
 	echo "/* Category names */"
 	echo "static const char *category_names[] = {"
 	bit=0
-	category_list "$1" |
+	echo "$1" |
 	while read cat
 	do
 		echo "	\"$cat\", /* (1UL << $bit) */"
@@ -64,7 +64,7 @@ define_category_names () {
 print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"
 
-	command_list "$1" |
+	echo "$1" |
 	while read cmd rest
 	do
 		synopsis=
@@ -93,6 +93,9 @@ do
 	shift
 done
 
+commands="$(command_list "$1")"
+categories="$(category_list "$commands")"
+
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
 	const char *name;
@@ -100,8 +103,8 @@ struct cmdname_help {
 	uint32_t category;
 };
 "
-define_categories "$1"
+define_categories "$categories"
 echo
-define_category_names "$1"
+define_category_names "$categories"
 echo
-print_command_list "$1"
+print_command_list "$commands"
-- 
2.34.0.rc1.721.ga0c1db665bc


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
@ 2021-11-05 22:45                 ` Junio C Hamano
  2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-11-05 22:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In a subsequent commit I'll change generate-cmdlist.sh to use C sort
> order, and without this change we'd be led to believe that that change
> caused a meaningful change in the output,

What I found misleading in this sentence (which hasn't changed after
I pointed it out in the previous iterations) is that
command-list.txt is an input file, and if the sort order used in the
script that reads this to produce some other file as its output
changes, nobody will be "led to believe" anything.  Not unless/until
which output file to look at and compare between revisions.

Is this talking about the order of entries in command-list.h file?

Also, if the script sorts the input, whether it is in C locale or
other locale, it would not matter how the input originally is
sorted, as the input does not have duplicated entries to make sort
stability matter, no?

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes
  2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
@ 2021-11-05 22:47                 ` Junio C Hamano
  2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-11-05 22:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index 5114f46680a..27367915611 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -11,15 +11,14 @@ command_list () {
>  
>  get_categories () {
>  	tr ' ' '\012' |
> -	grep -v '^$' |
> -	sort |
> -	uniq
> +	LC_ALL=C sort -u
>  }
>  
>  category_list () {
>  	command_list "$1" |
>  	cut -c 40- |
> -	get_categories
> +	get_categories |
> +	grep -v '^$'
>  }

It is funny that this changes "grep then sort" into "sort then
grep", which will be "corrected" in two steps down.  The series
seems a bit over-engineered and broken down too much, at least to
me, but let's not waste any more time on it by an extra reroll.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes
  2021-11-05 22:47                 ` Junio C Hamano
@ 2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-06  4:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau


On Fri, Nov 05 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
>> index 5114f46680a..27367915611 100755
>> --- a/generate-cmdlist.sh
>> +++ b/generate-cmdlist.sh
>> @@ -11,15 +11,14 @@ command_list () {
>>  
>>  get_categories () {
>>  	tr ' ' '\012' |
>> -	grep -v '^$' |
>> -	sort |
>> -	uniq
>> +	LC_ALL=C sort -u
>>  }
>>  
>>  category_list () {
>>  	command_list "$1" |
>>  	cut -c 40- |
>> -	get_categories
>> +	get_categories |
>> +	grep -v '^$'
>>  }
>
> It is funny that this changes "grep then sort" into "sort then
> grep", which will be "corrected" in two steps down.  The series
> seems a bit over-engineered and broken down too much, at least to
> me, but let's not waste any more time on it by an extra reroll.

Yes, it's a bit of back and forth, but I didn't want to outright drop
Johannes's patches which I'd integrated here, and thought it would be
helpful to others to distill the history of various optimization steps
(starting with Johannes's work here) into the permanent commit history.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-11-05 22:45                 ` Junio C Hamano
@ 2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
  2021-11-08 19:18                     ` Junio C Hamano
  0 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-06  4:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau


On Fri, Nov 05 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> In a subsequent commit I'll change generate-cmdlist.sh to use C sort
>> order, and without this change we'd be led to believe that that change
>> caused a meaningful change in the output,
>
> What I found misleading in this sentence (which hasn't changed after
> I pointed it out in the previous iterations) is that[...]

I tried to clarify what you raised in the previous iteration with the
new paragraph after the one you're quoting. I.e.:

    Note that this refers to the sort order of the lines in
    command-list.txt, a subsequent commit will also change how we treat
    the sort order of the "category" fields, but that's unrelated to this
    change.

> command-list.txt is an input file, and if the sort order used in the
> script that reads this to produce some other file as its output
> changes, nobody will be "led to believe" anything.  Not unless/until
> which output file to look at and compare between revisions.

I'd read your comment on the previous iteration as you being unclear on
whether we were changing the sort order of lines in the file, or the
category fields found within those lines.

We don't sort the lines, and never have, just the fields, and within
this series we stop doing that sorting (as it ends up in an OR'd
bitfield, so it doesn't matter).

> Is this talking about the order of entries in command-list.h file?
>
> Also, if the script sorts the input, whether it is in C locale or
> other locale, it would not matter how the input originally is
> sorted, as the input does not have duplicated entries to make sort
> stability matter, no?

This change is just to make this consistent for human editors. I think
we re-sort this wherever we display this in git, whether that's via
help.c or the completion powered by git.c.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort"
  2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
@ 2021-11-08 19:18                     ` Junio C Hamano
  0 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-11-08 19:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine,
	Taylor Blau

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Nov 05 2021, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> In a subsequent commit I'll change generate-cmdlist.sh to use C sort
>>> order, and without this change we'd be led to believe that that change
>>> caused a meaningful change in the output,
>>
>> What I found misleading in this sentence (which hasn't changed after
>> I pointed it out in the previous iterations) is that[...]
>
> I tried to clarify what you raised in the previous iteration with the
> new paragraph after the one you're quoting. I.e.:
>
>     Note that this refers to the sort order of the lines in
>     command-list.txt, a subsequent commit will also change how we treat
>     the sort order of the "category" fields, but that's unrelated to this
>     change.

Yeah, but my question was not about the order of category tokens on
each line.  My question was the order of these lines in the file.

The new pargraph is answering a question that wasn't asked.

>> Is this talking about the order of entries in command-list.h file?
>>
>> Also, if the script sorts the input, whether it is in C locale or
>> other locale, it would not matter how the input originally is
>> sorted, as the input does not have duplicated entries to make sort
>> stability matter, no?
>
> This change is just to make this consistent for human editors. I think
> we re-sort this wherever we display this in git, whether that's via
> help.c or the completion powered by git.c.

So

>> order, and without this change we'd be led to believe that that change
>> caused a meaningful change in the output,

is something irrelevant to explain what this change is, no?

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2021-11-08 19:18 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30   ` Johannes Sixt
2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
2021-06-29  2:13       ` Jeff King
2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42             ` Jeff King
2021-10-21 16:25               ` Jeff King
2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45             ` Jeff King
2021-10-21 18:24               ` Junio C Hamano
2021-10-21 22:46             ` Øystein Walle
2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31             ` Taylor Blau
2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
2021-10-20 23:46                 ` Jeff King
2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
2021-10-21  2:20                     ` Taylor Blau
2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34                     ` Jeff King
2021-10-21 22:34                       ` Junio C Hamano
2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31                         ` Jeff King
2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
2021-10-21  5:39                 ` Eric Sunshine
2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29               ` Junio C Hamano
2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26                   ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39               ` Jeff King
2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46               ` Jeff King
2021-10-25 17:52                 ` Jeff King
2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19               ` Junio C Hamano
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34             ` Junio C Hamano
2021-10-25 16:57             ` Jeff King
2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45                 ` Junio C Hamano
2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18                     ` Junio C Hamano
2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47                 ` Junio C Hamano
2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29  5:04   ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39   ` Øystein Walle
2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53   ` Junio C Hamano
2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48       ` Junio C Hamano
2021-07-03  1:05   ` Felipe Contreras
2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56       ` Felipe Contreras
2021-07-05 19:38       ` Junio C Hamano
2021-07-06 22:25         ` Felipe Contreras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.