git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling
Date: Wed, 09 Jun 2021 00:12:58 +0200	[thread overview]
Message-ID: <87lf7k2bem.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <74183ce6-e17f-1b11-1ceb-7a8d873bc1c7@web.de>


On Tue, Jun 08 2021, René Scharfe wrote:

> Am 08.06.21 um 12:58 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Tue, Jun 08 2021, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>>> So I think this pattern works:
>>>>>
>>>>> 	for (i = 0; i < nr; i++) {
>>>>> 		display_progress(p, i);
>>>>> 		/* work work work */
>>>>> 	}
>>>>> 	display_progress(p, nr);
>>>>>
>>>>> Alternatively, if the work part doesn't contain continue statements:
>>>>>
>>>>> 	for (i = 0; i < nr; i++) {
>>>>> 		/* work work work */
>>>>> 		display_progress(p, i + 1);
>>>>> 	}
>>>>
>>>> But yes, I agree with the issue in theory, but I think in practice we
>>>> don't need to worry about these 100% cases.
>>>
>>> Hmph, but in practice we do need to worry, don't we?  Otherwise you
>>> wouldn't have started this thread and René wouldn't have responded.
>>
>> I started this thread because of:
>>
>> 	for (i = 0; i < large_number; i++) {
>> 		if (maybe_branch_here())
>> 			continue;
>> 		/* work work work */
>> 		display_progress(p, i);
>> 	}
>> 	display_progress(p, large_number);
>>
>> Mainly because it's a special snowflake in how the process.c API is
>> used, with most other callsites doing:
>>
>> 	for (i = 0; i < large_number; i++) {
>> 		display_progress(p, i + 1);
>> 		/* work work work */
>> 	}
>
> Moving the first call to the top of the loop makes sense.  It ensures
> all kind of progress -- skipping and actual work -- is reported without
> undue delay.
>
> Adding one would introduce an off-by-one error.  Removing the call after
> the loop would leave the progress report at one short of 100%.  I don't
> see any benefits of these additional changes, only downsides.
>
> If other callsites have an off-by-one error and we care enough then we
> should fix them.  Copying their style and spreading the error doesn't
> make sense -- correctness trumps consistency.
>
>> Fair enough, but in the meantime can we take this patch? I think fixing
>> that (IMO in practice hypothetical issue) is much easier when we
>> consistently use that "i + 1" pattern above (which we mostly do
>> already). We can just search-replace "++i" to "i++" and "i + 1" to "i"
>> and have stop_progress() be what bumps it to 100%.
>
> This assumes the off-by-one error is consistent.  Even if that is the
> case you could apply your mechanical fix and leave out read-cache.
> This would happen automatically because when keeping i there is no ++i
> to be found.
>
> stop_progress() doesn't set the progress to 100%:
>
>    $ (echo progress 0; echo update) |
>      ./t/helper/test-tool progress --total 1 test
>    test:   0% (0/1), done.
>

I was too quick with that "But yes, I agree with the issue in theory".

Having thought about it some more I think you're wrong, it doesn't make
sense to use the API in the way you're suggesting.

There's an off-by-one error, but it's in the pattern you're
suggesting. Calling "i + 1" on the "first item" doesn't just make for
less verbose code, it's also more correct.

Why? Because:

    /* 1. Setup progress */
    large_number = 3;
    progress = start_progress(title, total);

    /* 2. Before we "actually" do anything */
    for (i = 0; i < 3; i++) {
        /* 3. Now doing O(large_number) work */
        display_progress(progress, i + 1);
    }

    /* 4. Done */
    stop_progress(&progress);

As you'll see if you insert a sleep or whatever at "2" we'll signal and
display the progress bar even if we haven't called display_progress()
yet.

Thus calling display_progress with n=0 makes "we are doing the first
item" indistinguishable from "we haven't gotten to the first item
yet". When you're in the loop the first item should be n=1.

This isn't just more correct with this API. I think it also makes sense
not to leak the zero indexed C abstraction to human output. As in:

    I sat down at the table with my three apples (stages 1..2 above),
    and now I'm eating my first apple (stage 3), I'm not starting to eat
    my zeroeth apple. At some point I'm done eating all 3 apples (stage
    4).

I think this gets to the crux of the issue for you, per your upthread:

    [...]With your change you'd get 100% for a minute.  Both would be
    annoying, but the latter would have me raging.  "If you're done",
    I'd yell at the uncaring machine, "what are you still doing!?".

If we said we're done and we're not then yes, that would be a bug.

But that's not what we said. Distinguishing "I'm on 3/3" from "I'm done"
is exactly what the progress.c output does:

    $ perl -wE 'for (0..3) { say "update"; say "progress $_" }' |
      ./helper/test-tool progress --total=3 Apples 2>&1 |
      cat -v | perl -pe 's/\^M\K/\n/g'
    Apples:   0% (0/3)^M
    Apples:  33% (1/3)^M
    Apples:  66% (2/3)^M
    Apples: 100% (3/3)^M
    Apples: 100% (3/3), done.

It's not immediately obvious from that output, but the last line ending
in ", done" isn't added by any display_progress() call, but by calling
stop_progress(). That's when we're done.

Arguably this is too subtle and we should say ", but not done yet!" or
something on that penultimate line. But that's bikeshedding a display
issue; The API use in my patch is correct.

As I noted upthread that's a "matter of some philosophy". I guess you
might report your progress as being at 2/3 apples, even though you're
one bite away from finishing the third apple. Personally I'd say I'm on
the third apple, I'm just not done with it.

But the philosophizing of whether your "progress" is the zeroeth apple
of the day as you're chewing on it aside, I think it's clear that in the
context of the progress.c API using n=0 for the first apple would be a
bug.

You'll conflate "setup" with "work" per the above, and you'll say you're
on the second apple even if you're a bite away from finishing it. We
don't conflate that "100%" state with being "done", so I don't see why
we'd do that.

> I wonder (only in a semi-curious way, though) if we can detect
> off-by-one errors by adding an assertion to display_progress() that
> requires the first update to have the value 0, and in stop_progress()
> one that requires the previous display_progress() call to have a value
> equal to the total number of work items.  Not sure it'd be worth the
> hassle..

That's intentional. We started eating 3 apples, got to one, but now our
house is on fire and we're eating no more apples today, even if we
planned to eat 3 when we sat down.

The progress bar reflects this unexpected but recoverable state:
    
    $ perl -wE 'for (0..1) { say "update"; say "progress $_" }' |
      ./helper/test-tool progress --total=3 Apples 2>&1 |
      cat -v | perl -pe 's/\^M\K/\n/g'
    Apples:   0% (0/3)^M
    Apples:  33% (1/3)^M
    Apples:  33% (1/3), done.

We're at 1/3, but we're done. No more apples.

This isn't just some hypothetical, e.g. consider neeing to unlink() or
remove files/directories one at a time in a directory and getting the
estimated number from st_nlink (yeah yeah, unportable, but it was the
first thing I thought of).

We might think we're processing 10 entries, but another other processes
might make our progress bar end at more or less than the 100% we
expected. That's OK, not something we should invoke BUG() about.

Similarly, the n=0 being distinguishable from the first
display_progress() is actually useful in practice. It's something I've
seen git.git emit (not recently, I patched the relevant code to emit
more granular progress).

It's useful to know that we're stalling on the setup code before the
for-loop, not on the first item.

  reply	other threads:[~2021-06-08 23:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-07 14:43 [PATCH 0/2] trivial progress.c API usage fixes Ævar Arnfjörð Bjarmason
2021-06-07 14:43 ` [PATCH 1/2] read-cache.c: don't guard calls to progress.c API Ævar Arnfjörð Bjarmason
2021-06-07 15:28   ` Derrick Stolee
2021-06-07 15:52     ` Ævar Arnfjörð Bjarmason
2021-06-07 16:11       ` Derrick Stolee
2021-06-07 14:43 ` [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling Ævar Arnfjörð Bjarmason
2021-06-07 15:31   ` Derrick Stolee
2021-06-07 15:58     ` Ævar Arnfjörð Bjarmason
2021-06-07 19:20       ` René Scharfe
2021-06-07 19:49         ` Ævar Arnfjörð Bjarmason
2021-06-07 23:41           ` Junio C Hamano
2021-06-08 10:58             ` Ævar Arnfjörð Bjarmason
2021-06-08 16:14               ` René Scharfe
2021-06-08 22:12                 ` Ævar Arnfjörð Bjarmason [this message]
2021-06-10  5:30                   ` Junio C Hamano
2021-06-10 15:14                     ` René Scharfe
2021-06-10 15:14                   ` René Scharfe
2021-06-14 11:07                     ` Ævar Arnfjörð Bjarmason
2021-06-14 17:18                       ` René Scharfe
2021-06-14 19:08                         ` Ævar Arnfjörð Bjarmason
2021-06-15  2:32                           ` Junio C Hamano
2021-06-15 15:14                           ` René Scharfe
2021-06-15 16:46                             ` Ævar Arnfjörð Bjarmason
2021-06-20 12:53                               ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lf7k2bem.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=pclouds@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).