git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"René Scharfe" <l.s.r@web.de>,
	"SZEDER Gábor" <szeder.dev@gmail.com>, "Eric Wong" <e@80x24.org>
Subject: Re: [PATCH] Makefile: fix bugs in coccicheck and speed it up
Date: Fri, 5 Mar 2021 05:24:13 -0500	[thread overview]
Message-ID: <YEIGzXMDax83cwAx@coredump.intra.peff.net> (raw)
In-Reply-To: <20210302205103.12230-1-avarab@gmail.com>

On Tue, Mar 02, 2021 at 09:51:03PM +0100, Ævar Arnfjörð Bjarmason wrote:

> What we were doing before was processing all our *.c files, and for
> each of those *.c files we'd recursively look around for includes and
> see if we needed to search/replace in those too.
> 
> That we did that dates back to [1] when we were only processing *.c
> files, and it was always very redundant. We'd e.g. visit the likes of
> strbuf.h lots of times since it's widely used as an include.

OK. I don't offhand know if processing the includes along with the C
files buys us anything else. Coccinelle's behavior is often quite a
mystery, but if we think this produces the same results with less time,
I'm for it.

BTW, this "dates back to when we were only processing *.c files"
statement confused me a bit. Doesn't that include the current state
before this patch?  I.e., this hunk:

> -FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
> +FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES)))

seems to be an important part of the change, without which moving away
from --all-includes would be wrong.

> Then in the most recent attempt to optimize coccicheck in [2] this
> anti-pattern finally turned into a bug.
>
> Namely: before this change, if your coccicheck rule applied to
> e.g. making a change in strbuf.h itself we'd get *lots* of duplicate
> hunks applying the exact same change, as concurrent spatch processes
> invoked by xargs raced one another. In one instance I ended up with 27
> copies of the same hunk in a strbuf.patch.
> 
> Setting SPATCH_BATCH_SIZE=0 and processing all the files in one giant
> batch mitigated this. I suspect the author of [2] either mostly ran in
> that mode, or didn't test on changes that impacted widely used header
> files.

This "turned into a bug" I didn't understand, though. That commit [2]
just switched from always feeding files one at a time to letting you
feed more than one.  So yes, feeding all at once (with
SPATCH_BATCH_SIZE=0) mitigated the bug. But wouldn't any bug that shows
up with SPATCH_BATCH_SIZE=1 have already existed previously?

IOW, I don't think the batch-size stuff made anything worse. It made one
specific case better, but that was not even its purpose.

That is maybe splitting hairs, but I want to make sure I understand all
of what you're claiming here. But also...

> I'm entirely removing SPATCH_BATCH_SIZE. If you want to tweak it you
> can tweak SPATCH_XARGS_FLAGS to e.g. "-n 256", or "-P 4 -n 128". But
> in my testing it isn't worth it to tweak SPATCH_XARGS_FLAGS for a full
> "make coccicheck".

You hard-code this to 32 now. But I'm not sure if that's going to be the
universal best value.

Applying just part of your patch, but leaving SPATCH_BATCH_SIZE in
place, like so:

diff --git a/Makefile b/Makefile
index dfb0f1000f..88cb157547 100644
--- a/Makefile
+++ b/Makefile
@@ -1201,7 +1201,7 @@ SP_EXTRA_FLAGS = -Wno-universal-initializer
 # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will
 # usually result in less CPU usage at the cost of higher peak memory.
 # Setting it to 0 will feed all files in a single spatch invocation.
-SPATCH_FLAGS = --all-includes --patch .
+SPATCH_FLAGS = --no-includes --patch .
 SPATCH_BATCH_SIZE = 1
 
 include config.mak.uname
@@ -2859,7 +2859,7 @@ check: config-list.h command-list.h
 		exit 1; \
 	fi
 
-FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
+FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES)))
 COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES))
 
 %.cocci.patch: %.cocci $(COCCI_SOURCES)

Then I get this with various batch sizes (using "make -j16" on an 8-core
machine, so all of the patches are being run at the same time):

  size |  real  |  user  | sys
  -------------------------------
    1  |  1m28s | 10m00s | 0m56s
    2  |  1m03s |  7m49s | 0m33s
    4  |  0m51s |  6m28s | 0m20s
    8  |  0m45s |  6m02s | 0m14s
   16  |  0m45s |  6m17s | 0m10s
   32  |  0m44s |  6m33s | 0m07s
   64  |  0m45s |  6m48s | 0m06s
    0  |  1m08s | 10m08s | 0m03s

So there's a sweet spot at 8. Doing "32" isn't that much worse (runtime
is about the same, but we use more CPU), but it gets progressively worse
as the batch size increases.

That's the same sweet spot before your patch, too, btw. I know because I
happened to be timing it the other day, as coccinelle 1.1.0 hit debian
unstable. When I originally wrote the SPATCH_BATCH_SIZE feature, I think
I was using 1.0.4 or 1.0.7. Back then SPATCH_BATCH_SIZE=0 was a clear
win, assuming you had the memory. But in 1.0.8 it ran for many minutes
without finishing. I found back then that "2" was the sweet spot. But
now it's "8".

All of which is to say: the timing difference between my "8" and "32"
cases isn't that exciting. But the performance of spatch has been
sufficiently baffling to me that I think it's worth keeping the knob.

Your XARGS_FLAGS does accomplish something similar (and is in fact more
flexible, though at the cost of being less abstract).  I'm OK to replace
one with the other, but shouldn't it happen in a separate patch? It's
completely orthogonal to the --no-includes behavior.

> I'm also the whole "cat $@.log" introduced in [3]. Since we don't call
> this in a loop anymore (and xargs will early-exit) we can just rely on
> standard V=1 for debugging issues.

I think this is missing the point of the cat. We've redirected spatch's
stderr to the logfile. So if there's an error, you have no idea what it
was without manually figuring out which file to cat. And V=1 doesn't
help that.

We could stop doing that redirection, but the problem is that spatch is
very verbose. So the point was to store the output but only dump it
when we see an error.

So this part of the patch seems like a pure regression to me. E.g.,
introduce a syntax error like:

diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci
index 4490069df9..c6c6562a0a 100644
--- a/contrib/coccinelle/free.cocci
+++ b/contrib/coccinelle/free.cocci
@@ -11,7 +11,7 @@ expression E;
   free(E);
 
 @@
-expression E;
+expression E
 @@
 - free(E);
 + FREE_AND_NULL(E);

and run "make coccicheck" before and after your patch:

  [before]
  $ make coccicheck
  GIT_VERSION = 2.31.0.rc1.8.gbe7935ed8b.dirty
      SPATCH contrib/coccinelle/free.cocci
  init_defs_builtins: /usr/bin/../lib/coccinelle/standard.h
  meta: parse error: 
    File "contrib/coccinelle/free.cocci", line 15, column 0, charpos = 99
    around = '@@',
    whole content = @@

  xargs: spatch: exited with status 255; aborting
  make: *** [Makefile:2866: contrib/coccinelle/free.cocci.patch] Error 1

  [after]
  $ make coccicheck
    SPATCH contrib/coccinelle/free.cocci
  make: *** [Makefile:2875: contrib/coccinelle/free.cocci.patch] Error 124

-Peff

  parent reply	other threads:[~2021-03-05 10:25 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 23:48 [RFC PATCH] *.h: remove extern from function declarations Denton Liu
2019-04-13  1:24 ` Jeff King
2019-04-13  5:45   ` Junio C Hamano
2019-04-15 18:24 ` [PATCH v2 0/3] " Denton Liu
2019-04-15 18:24   ` [PATCH v2 1/3] *.[ch]: remove extern from function declarations using spatch Denton Liu
2019-04-15 19:19     ` Thomas Gummerer
2019-04-15 18:24   ` [PATCH v2 2/3] *.[ch]: remove extern from function declarations using sed Denton Liu
2019-04-15 18:24   ` [PATCH v2 3/3] cocci: prevent extern function declarations Denton Liu
2019-04-17  7:58   ` [PATCH v3 0/4] remove extern from " Denton Liu
2019-04-17  7:58     ` [PATCH v3 1/4] *.[ch]: remove extern from function declarations using spatch Denton Liu
2019-04-17  7:58     ` [PATCH v3 2/4] *.[ch]: remove extern from function declarations using sed Denton Liu
2019-04-17  7:58     ` [PATCH v3 3/4] *.[ch]: manually align parameter lists Denton Liu
2019-04-17  7:58     ` [PATCH v3 4/4] cocci: prevent extern function declarations Denton Liu
2019-04-22  5:44     ` [PATCH] cache.h: fix mismerge of 'dl/no-extern-in-func-decl' Denton Liu
2019-04-22  6:30       ` Junio C Hamano
2019-04-22 11:19         ` Junio C Hamano
2019-04-22 21:49     ` [PATCH v3 0/4] remove extern from function declarations Jeff King
2019-04-25 12:07       ` SZEDER Gábor
2019-04-25 18:05         ` Denton Liu
2019-04-30 23:21         ` Johannes Schindelin
2019-05-01 10:01           ` Denton Liu
2019-05-01 18:56             ` Jeff King
2019-05-02  0:04             ` SZEDER Gábor
2019-05-03  9:32               ` Johannes Schindelin
2019-05-03 14:42                 ` SZEDER Gábor
2019-05-03 14:58                   ` SZEDER Gábor
2019-05-03 17:45                   ` Jeff King
2019-05-03 18:44                     ` SZEDER Gábor
2019-05-05  5:28                     ` Junio C Hamano
2019-05-05 18:09                       ` Jacob Keller
2019-05-05 18:08                     ` Jacob Keller
2019-05-06  5:11                       ` [PATCH] coccicheck: optionally process every source file at once Jeff King
2019-05-06  9:34                         ` Duy Nguyen
2019-05-06 23:43                           ` [PATCH] coccicheck: optionally batch spatch invocations Jeff King
2019-05-07  1:41                             ` Jacob Keller
2019-05-07  2:04                               ` Jeff King
2019-05-07  2:42                             ` Junio C Hamano
2019-05-07  2:55                               ` Jeff King
2019-05-07  3:04                                 ` Jacob Keller
2019-05-07  4:52                                 ` Junio C Hamano
2019-05-08  7:07                                   ` Jeff King
2019-05-08 12:36                                     ` Denton Liu
2019-05-08 22:39                                       ` Jeff King
2019-05-07 10:20                             ` Duy Nguyen
2019-05-07 11:19                             ` SZEDER Gábor
2021-03-02 20:51                             ` [PATCH] Makefile: fix bugs in coccicheck and speed it up Ævar Arnfjörð Bjarmason
2021-03-03  9:43                               ` Denton Liu
2021-03-03 11:45                               ` Ævar Arnfjörð Bjarmason
2021-03-04 23:18                               ` Junio C Hamano
2021-03-05 11:17                                 ` Ævar Arnfjörð Bjarmason
2021-03-05 10:24                               ` Jeff King [this message]
2021-03-05 17:20                                 ` Ævar Arnfjörð Bjarmason
2021-03-06 10:59                                   ` Jeff King
2021-03-05 17:07                               ` [PATCH v2 0/4] Makefile/coccicheck: fix bugs " Ævar Arnfjörð Bjarmason
2021-03-05 19:10                                 ` René Scharfe.
     [not found]                                   ` <xmqqim659u57.fsf@gitster.c.googlers.com>
2021-03-06 11:26                                     ` René Scharfe.
2021-03-06 12:43                                       ` René Scharfe.
     [not found]                                       ` <xmqqft16914r.fsf@gitster.c.googlers.com>
2021-03-13 16:10                                         ` René Scharfe.
2021-03-06 17:27                                   ` Ævar Arnfjörð Bjarmason
2021-03-06 17:41                                     ` René Scharfe.
2021-03-06 17:52                                       ` Ævar Arnfjörð Bjarmason
2021-03-06 19:08                                         ` René Scharfe.
2021-03-05 17:07                               ` [PATCH v2 1/4] Makefile/coccicheck: add comment heading for all SPATCH flags Ævar Arnfjörð Bjarmason
2021-03-05 17:07                               ` [PATCH v2 2/4] Makefile/coccicheck: speed up and fix bug with duplicate hunks Ævar Arnfjörð Bjarmason
2021-03-06 10:45                                 ` Jeff King
2021-03-06 19:29                                   ` Ævar Arnfjörð Bjarmason
2021-03-05 17:07                               ` [PATCH v2 3/4] Makefile/coccicheck: allow for setting xargs concurrency Ævar Arnfjörð Bjarmason
2021-03-06 10:51                                 ` Jeff King
2021-03-05 17:07                               ` [PATCH v2 4/4] Makefile/coccicheck: set SPATCH_BATCH_SIZE to 8 Ævar Arnfjörð Bjarmason
2021-03-06 19:25                                 ` [PATCH v2 5/4] Makefile/coccicheck: use --include-headers-for-types Ævar Arnfjörð Bjarmason
2021-03-18 20:49                                   ` SZEDER Gábor
2021-03-19 10:32                                     ` Ævar Arnfjörð Bjarmason
2021-03-22 12:11                                   ` [PATCH v4 0/4] Makefile/coccicheck: fix bugs and speed it up Ævar Arnfjörð Bjarmason
2021-03-22 12:11                                     ` [PATCH v4 1/4] Makefile/coccicheck: add comment heading for all SPATCH flags Ævar Arnfjörð Bjarmason
2021-03-22 18:04                                       ` René Scharfe.
2021-03-22 12:11                                     ` [PATCH v4 2/4] Makefile/coccicheck: speed up and fix bug with duplicate hunks Ævar Arnfjörð Bjarmason
2021-03-22 18:05                                       ` René Scharfe.
2021-03-24 19:19                                         ` Jeff King
2021-03-22 19:09                                       ` Junio C Hamano
2021-03-22 12:11                                     ` [PATCH v4 3/4] Makefile/coccicheck: allow for setting xargs concurrency Ævar Arnfjörð Bjarmason
2021-03-24 19:26                                       ` Jeff King
2021-03-25  2:29                                         ` Ævar Arnfjörð Bjarmason
2021-03-26  4:11                                           ` Jeff King
2021-03-22 12:11                                     ` [PATCH v4 4/4] Makefile/coccicheck: set SPATCH_BATCH_SIZE to 8 Ævar Arnfjörð Bjarmason
2021-03-22 18:05                                       ` René Scharfe.
2021-03-24 19:27                                         ` Jeff King
2021-03-27 17:43                                     ` [PATCH v4 0/4] Makefile/coccicheck: fix bugs and speed it up Junio C Hamano
2021-03-27 19:46                                       ` Ævar Arnfjörð Bjarmason
2019-05-03  9:40               ` [PATCH v3 0/4] remove extern from function declarations Denton Liu
2019-04-23 23:40   ` [PATCH v4 " Denton Liu
2019-04-23 23:40     ` [PATCH v4 1/4] *.[ch]: remove extern from function declarations using spatch Denton Liu
2019-04-23 23:40     ` [PATCH v4 2/4] *.[ch]: remove extern from function declarations using sed Denton Liu
2019-04-24  4:56       ` Junio C Hamano
2019-04-25 19:00         ` Denton Liu
2019-04-23 23:40     ` [PATCH v4 3/4] *.[ch]: manually align parameter lists Denton Liu
2019-04-23 23:40     ` [PATCH v4 4/4] cocci: prevent extern function declarations Denton Liu
2019-04-29  8:28     ` [PATCH v5 0/3] *** SUBJECT HERE *** Denton Liu
2019-04-29  8:28       ` [PATCH v5 1/3] *.[ch]: remove extern from function declarations using spatch Denton Liu
2019-04-29  8:28       ` [PATCH v5 2/3] *.[ch]: remove extern from function declarations using sed Denton Liu
2019-04-29  8:28       ` [PATCH v5 3/3] *.[ch]: manually align parameter lists Denton Liu
2019-04-29  8:30       ` [PATCH v5 0/3] *** SUBJECT HERE *** Denton Liu
2019-05-06 11:03         ` Ævar Arnfjörð Bjarmason
2019-05-06 15:34           ` Denton Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YEIGzXMDax83cwAx@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).