git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Ramsay Jones <ramsay@ramsayjones.plus.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH v3 6/7] fsck: use oidset for skiplist
Date: Tue, 28 Aug 2018 16:29:09 +0200	[thread overview]
Message-ID: <c356f239-f3f4-242b-fb75-95a4ccbe374b@web.de> (raw)
In-Reply-To: <87lg8refcr.fsf@evledraar.gmail.com>

Am 27.08.2018 um 22:15 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:
> 
>> From: René Scharfe <l.s.r@web.de>
>>
>> Object IDs to skip are stored in a shared static oid_array.  Lookups do
>> a binary search on the sorted array.  The code checks if the object IDs
>> are already in the correct order while loading and skips sorting in that
>> case.  Lookups are done before reporting a (non-fatal) corruption and
>> before checking .gitmodules files.
>>
>> Simplify the code by using an oidset instead.  Memory usage is a bit
>> higher, but we don't need to worry about any sort order anymore.  Embed
>> the oidset into struct fsck_options to make its ownership clear (no
>> hidden sharing) and avoid unnecessary pointer indirection.
>>
>> Performance on repositories with a low number of reported issues and
>> .gitmodules files (i.e. the usual case) won't be affected much.  The
>> oidset should be a bit quicker with higher numbers of bad objects in
>> the skipList.
> 
> I didn't change this commit message at all, but FWIW this still has me
> somewhat confused. What is the interaction with .gitmodules being
> described here? You also mentioned it in
> https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
> (but I don't get that either)

The skipList is consulted before checking .gitmodules blobs, since
fb16287719 (fsck: check skiplist for object in fsck_blob()).

> Does that just mean that when cloning with --recursive with
> transfer.fsckObjects=true we'll re-read the file for each "clone"
> invocation, both for the main project and everything listed in
> .gitmodules?

That is probably true, but I didn't mean that.  I was only talking
about when an object is looked up in the skiplist (oid_array/oidset).

> If so, I think something like this commit message would be clearer:
> 
>     fsck: use oidset instead of oid_array for skipList
> 
>     Change the implementation of the skipList feature to use oidset
>     instead of oid_array to store SHA-1s for later lookup.
> 
>     This list is parsed once on startup by fsck, fetch-pack or
>     receive-pack depending on the *.skipList config in use, so for fetch
>     it's currently re-parsed for each submodule fetch.

OK; the patch doesn't change that, though.  Mentioning it sound like
a good idea if the load takes a significant amount of time -- I
didn't measure this separately, so perhaps this needs to be explored
further if people use big skipLists.

>     Memory usage is a bit higher, but we don't need to keep track of the
>     sort order anymore. Embed the oidset into struct fsck_options to
>     make its ownership clear (no hidden sharing) and avoid unnecessary
>     pointer indirection.
> 
> Then let's just attach the test program you wrote in
> https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
> and note the results and let them speak for themselves.

I was surprised that such a big repo can be mocked up relatively
quickly, but the numbers are a bit underwhelming -- there is not a
lot of change.  The script is not too expensive, so I don't mind
adding it.

> I can do all that if that seems fine to you. I also notice that the test
> case only sets up a case where all the items on the skip list are bad
> commits in the repo, it would be interesting to test with a few needles
> in a large haystack (I can modify it to do that...).

I tried something like this first, and its results are even more
boring.  Since skipList lookups are done only for bad objects (and
.gitmodules) you won't see  any difference at all.  Having only bad
objects is the edge case, this test being designed to highlight the
performance of the skipList implementation.

René

  parent reply	other threads:[~2018-08-28 14:29 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:59       ` René Scharfe
2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:29     ` René Scharfe [this message]
2018-08-27 19:43 ` [PATCH v3 7/7] " Ævar Arnfjörð Bjarmason
2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c356f239-f3f4-242b-fb75-95a4ccbe374b@web.de \
    --to=l.s.r@web.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).