Re: [PATCH 0/4] test-tool: split up "read-cache" tool

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Emily Shaffer <emilyshaffer@google.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 0/4] test-tool: split up "read-cache" tool
Date: Thu, 01 Jul 2021 00:24:37 +0200	[thread overview]
Message-ID: <87pmw3dlkl.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YNzp4WIlZIzqD7PU@google.com>

On Wed, Jun 30 2021, Emily Shaffer wrote:

> On Mon, Jun 07, 2021 at 01:58:23PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> When the sparse index topic was being discussed I suggested that the
>> t/helper/read-cache.c tool was getting to the point of doing too many
>> things and should be split up.
>> 
>> Since that series has landed on master here's that suggestion again in
>> the form of patches on top of master. The 4/4 patch is a "while I was
>> at it" addition of an extra perf test for index refreshing.
>> 
>> 1. https://lore.kernel.org/git/20210317132814.30175-6-avarab@gmail.com/
>> 
>
> Since "What's cooking" mentioned this series was starved for review, we
> took a look at it today. Most of my comments are pretty general, so I'll
> reply primarily to the cover letter instead.
>
> The result after this series is that we have three forms of 'test-tool
> read-cache':
>  - one for unit testing, with no iteration (test-tool read-cache)
>  - one for not-so-perfy iteration (test-tool read-cache-again)
>  - one for perfy iteration and refresh_index benching (test-tool
>    read-cache-perf)
>
> Before I read patch 4, I said, "'again' and 'perf' have a lot of code in
> common, but I guess we are trying to reduce the amount of overhead for
> tight-loop performance testing, so OK." But patch 4 adds an alternative
> body for the inside of the loop, which *looks* not very performant
> (although is probably optimized well) by checking an unchanging bool on
> every iteration of the loop.
>
> I tend to think it would be easier to read
> and understand to have two forms of 'test-tool read-cache' - one which
> iterates and one which does not. Maybe the one that iterates should be
> called -perf, maybe it should be called something else, whatever. And
> perhaps it makes sense for the iterating one to look like so (heavily
> pseudocoded, hope you can follow along with the rough sketch):
>
>   enum iteration_mode;
>
>   parse_options();
>   if (should_do_again) {
>     iteration_mode = AGAIN;
>   else if (should_do_perf) {
>     iteration_mode = PERF;
>   else if (should_do_refresh) {
>     iteration_mode = REFRESH;
>   }
>
>   while (passes--)
>     switch (iteration_mode) {
>     case AGAIN:
>       read index;
>       refresh index;
>       index_name_pos;
>       error reporting;
>       write_file;
>       discard_index;
>       break;
>     case PERF:
>       read index;
>       discard index;
>       break;
>     case REFRESH:
>       refresh_index;
>       break;
>     }
>
> This would put all our "loop lots of times for performance benchmarking"
> into one place. We know that the switch statement is very performant,
> especially if we manage to const-ify iteration_mode.  The cases make it
> very clear that the body of the loop is being swapped out depending on
> the arguments, and that entirely different behavior is happening in each
> scenario.
>
> There's also an orthogonal bit of cleanup here by moving to
> parse_options(), which I am excited about in general :) but which I
> think wasn't done very cleanly in this series. In patch 1, the commit
> message makes no mention of the fairly significant refactor happening to
> move 'test-tool read-cache' to parse_options(), and I think the mix of
> cut&paste with refactor makes the patch a little muddy. What about a
> series like so:
>
>  1/3: teach test-tool read-cache to use parse_options()
>  2/3: add test-tool read-cache-(perf|iterating|whatever)
>  3/3: teach test-tool read-cache-perf "--refresh"
>
>
> Thanks, and hopefully this is a welcome necromancy and not an annoying
> one ;)

Thanks.

Yes 1/4 is a bit confsing, but the alternative to converting it to
parse_option while we're at it would be to introduce support for
understanding "print-and-refresh" in one commit, only to remove it in
another. So I aimed for something that would play as well with the diff
rename detection as possible for 1/4. Do you think it's worth it to add
such an back & forth step?

For the rest I think there's been a bit of a misunderstanding (which is
on me in not explaining this well enough), i.e. your:

    I guess we are trying to reduce the amount of overhead for
    tight-loop performance testing

That wasn't really what I was going for at all, but rather (copied from
the 1st commit message):

    I think that having one test tool do so many different things makes it
    harder to read its code. Let's instead split up the "again" and "perf"
    uses for it into their own tools.

This series was extracted out of an RFC-reply of mine to one of
Derrick's recent patches where he was adding more knobs to
test-read-cache.c (and I understand the plan in the near-term is to add
more sparse-specific things to it).

So the goal here was only to split these different use cases all sharing
one function so that they used 3 functions instead.

Your suggestion of the switch/case loop would work, and would be
performant. Optimizing this wasn't a goal at all though, making it more
readable was.

I don't think it matters much if at all how performant the scaffolding
of the test-tool itself is, if it was relatively more expensive we could
just run more loops and we'd eventually get meaningful numbers in the
perf test. The "perf" in the name is just "this is for the perf test"
not "this is the performant version".

I think that for the test tools where we're mostly not aiming for
generalized but helpers narrowly tailored to specific tests it makes
more sense to split early than have tools that take N options, with
functions that use some small subset of those N options for any given
use-case.

There's going to be duplication in the scaffolding
(e.g. parse_options()), but that sort of thing is easy to read, whereas
reasoning about some inner loop and wondering "hrm, did this config call
there impact any of this" takes more time than it should.

Anyway, knowing that's the goal (and performance of the tool itself
wasn't, at all), do you think the end result makes more/less sense than
before?