git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "Git List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Jeff King" <peff@peff.net>, "Stefan Beller" <sbeller@google.com>
Subject: Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph
Date: Fri, 05 Oct 2018 15:05:56 +0200	[thread overview]
Message-ID: <87ftxkh7bf.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <dcb8f115-ce3c-64fa-50cc-dd03569c0164@gmail.com>


On Fri, Oct 05 2018, Derrick Stolee wrote:

> On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:
>> I don't have time to polish this up for submission now, but here's a WIP
>> patch that implements this, highlights:
>>
>>   * There's a gc.clone.autoDetach=false default setting which overrides
>>     gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
>>     --cloning option to indicate this).
>
> I'll repeat that it could make sense to do the same thing on clone
> _and_ fetch. Perhaps a "--post-fetch" flag would be good here to
> communicate that we just downloaded a pack from a remote.

I don't think that makes sense, but let's talk about why, because maybe
I've missed something, you're certainly more familiar with the
commit-graph than I am.

The reason to do it on clone as a special-case or when the file is
missing, is because we know the file is desired (via the GC config), and
presumably is expected to help performance, and we have 0% of it. So by
going from 0% to 100% on clone we'll get fast --contains and other
goodies the graph helps with.

But when we're doing a fetch, or really anything else that runs "git gc
--auto" we can safely assume that we have a recent enough graph, because
it will have been run whenever auto-gc kicked in.

I.e.:

    # Slow, if we assume background forked commit-graph generation
    # (which I'm avoiding)
    git clone x && cd x && git tag --contains
    # Fast enough, since we have an existing commit-graph
    cd x && git fetch && git tag --contains

I *do* think it might make sense to in general split off parts of "gc
--auto" that we'd like to be more aggressive about, simply because the
ratio of how long it takes to do, and how much it helps with performance
makes more sense than a full repack, which is what the current heuristic
is based on.

And maybe when we run in that mode we should run in the foreground, but
I don't see why git-fetch should be a special case there, and in this
regard, the gc.clone.autoDetach=false setting I've made doesn't make
much sence. I.e. maybe we should also skip forking to the background in
such a mode when we trigger such a "mini gc" via git-commit or whatever.

>>   * A clone of say git.git with gc.writeCommitGraph=true looks like:
>>
>>     [...]
>>     Receiving objects: 100% (255262/255262), 100.49 MiB | 17.78 MiB/s, done.
>>     Resolving deltas: 100% (188947/188947), done.
>>     Computing commit graph generation numbers: 100% (55210/55210), done.
>
> This looks like good UX. Thanks for the progress here!
>
>>   * The 'git gc --auto' command also knows to (only) run the commit-graph
>>     (and space is left for future optimization steps) if general GC isn't
>>     needed, but we need "optimization":
>>
>>     $ rm .git/objects/info/commit-graph; ~/g/git/git --exec-path=$PWD -c gc.writeCommitGraph=true -c gc.autoDetach=false gc --auto;
>>     Annotating commits in commit graph: 341229, done.
>>     Computing commit graph generation numbers: 100% (165969/165969), done.
>>     $
>
> Will this also trigger a full commit-graph rewrite on every 'git
> commit' command?

Nope, because "git commit" can safely be assumed to have some
commit-graph anyway, and I'm just special casing the case where it
doesn't exist.

But if it doesn't exist and you do a "git commit" then "gc --auto" will
be run, and we'll fork to the background and generate it...

>  Or is there some way we can compute the staleness of
> the commit-graph in order to only update if we get too far ahead?
> Previously, this was solved by relying on the auto-GC threshold.

So re the "I don't think that makes sense..." at the start of my E-Mail,
isn't it fine to rely on the default thresholds here, or should we be
more aggressive?

>>   * The patch to gc.c looks less scary with -w, most of it is indenting
>>     the existing pack-refs etc. with a "!auto_gc || should_gc" condition.
>>
>>   * I added a commit_graph_exists() exists function and only care if I
>>     get ENOENT for the purposes of this gc mode. This would need to be
>>     tweaked for the incremental mode Derrick talks about, but if we just
>>     set "should_optimize" that'll also work as far as gc --auto is
>>     concerned (e.g. on fetch, am etc.)
>
> The incremental mode would operate the same as split-index, which
> means we will still look for .git/objects/info/commit-graph. That file
> may point us to more files.

Ah!

>> +int commit_graph_exists(const char *graph_file)
>> +{
>> +	struct stat st;
>> +	if (stat(graph_file, &st)) {
>> +		if (errno == ENOENT)
>> +			return 0;
>> +		else
>> +			return -1;
>> +	}
>> +	return 1;
>> +}
>> +
>
> This method serves a very similar purpose to
> generation_numbers_enabled(), except your method only cares about the
> file existing. It ignores information like `core.commitGraph`, which
> should keep us from doing anything with the commit-graph file if
> false.
>
> Nothing about your method is specific to the commit-graph file, since
> you provide a filename as a parameter. It could easily be "int
> file_exists(const char *filename)".

I was being paranoid about not doing this if it didn't exist but it was
something else than ENOENT (e.g. permission error?), but in retrospect
that's silly. I'll drop this helper and just use file_exists().

  reply	other threads:[~2018-10-05 13:06 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-03 13:23 We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason
2018-10-03 13:36 ` SZEDER Gábor
2018-10-03 13:42   ` Derrick Stolee
2018-10-03 14:18     ` Ævar Arnfjörð Bjarmason
2018-10-03 14:01   ` Ævar Arnfjörð Bjarmason
2018-10-03 14:17     ` SZEDER Gábor
2018-10-03 14:22       ` Ævar Arnfjörð Bjarmason
2018-10-03 14:53         ` SZEDER Gábor
2018-10-03 15:19           ` Ævar Arnfjörð Bjarmason
2018-10-03 16:59             ` SZEDER Gábor
2018-10-05  6:09               ` Junio C Hamano
2018-10-10 22:07                 ` SZEDER Gábor
2018-10-10 23:01                   ` Ævar Arnfjörð Bjarmason
2018-10-03 19:08           ` Stefan Beller
2018-10-03 19:21             ` Jeff King
2018-10-03 20:35               ` Ævar Arnfjörð Bjarmason
2018-10-03 17:47         ` Stefan Beller
2018-10-03 18:47           ` Ævar Arnfjörð Bjarmason
2018-10-03 18:51             ` Jeff King
2018-10-03 18:59               ` Derrick Stolee
2018-10-03 19:18                 ` Jeff King
2018-10-08 16:41                   ` SZEDER Gábor
2018-10-08 16:57                     ` Derrick Stolee
2018-10-08 18:10                       ` SZEDER Gábor
2018-10-08 18:29                         ` Derrick Stolee
2018-10-09  3:08                           ` Jeff King
2018-10-09 13:48                             ` Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph) Derrick Stolee
2018-10-09 18:45                               ` Ævar Arnfjörð Bjarmason
2018-10-09 18:46                               ` Jeff King
2018-10-09 19:03                                 ` Derrick Stolee
2018-10-09 21:14                                   ` Jeff King
2018-10-09 23:12                                     ` Bloom Filters Jeff King
2018-10-09 23:13                                       ` [PoC -- do not apply 1/3] initial tree-bitmap proof of concept Jeff King
2018-10-09 23:14                                       ` [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode Jeff King
2018-10-10  0:48                                         ` Junio C Hamano
2018-10-11  3:13                                           ` Jeff King
2018-10-09 23:14                                       ` [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding Jeff King
2018-10-10  0:58                                         ` Junio C Hamano
2018-10-11  3:20                                           ` Jeff King
2018-10-11 12:33                                       ` Bloom Filters Derrick Stolee
2018-10-11 13:43                                         ` Jeff King
2018-10-09 21:30                             ` We should add a "git gc --auto" after "git clone" due to commit graph SZEDER Gábor
2018-10-09 19:34                       ` [PATCH 0/4] Bloom filter experiment SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 1/4] Add a (very) barebones Bloom filter implementation SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 2/4] commit-graph: write a Bloom filter containing changed paths for each commit SZEDER Gábor
2018-10-09 21:06                           ` Jeff King
2018-10-09 21:37                             ` SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 3/4] revision.c: use the Bloom filter to speed up path-limited revision walks SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 4/4] revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics SZEDER Gábor
2018-10-09 19:47                         ` [PATCH 0/4] Bloom filter experiment Derrick Stolee
2018-10-11  1:21                         ` [PATCH 0/2] Per-commit filter proof of concept Jonathan Tan
2018-10-11  1:21                           ` [PATCH 1/2] One filter per commit Jonathan Tan
2018-10-11 12:49                             ` Derrick Stolee
2018-10-11 19:11                               ` [PATCH] Per-commit and per-parent filters for 2 parents Jonathan Tan
2018-10-11  1:21                           ` [PATCH 2/2] Only make bloom filter for first parent Jonathan Tan
2018-10-11  7:37                           ` [PATCH 0/2] Per-commit filter proof of concept Ævar Arnfjörð Bjarmason
2018-10-15 14:39                         ` [PATCH 0/4] Bloom filter experiment Derrick Stolee
2018-10-16  4:45                           ` Junio C Hamano
2018-10-16 11:13                             ` Derrick Stolee
2018-10-16 12:57                               ` Ævar Arnfjörð Bjarmason
2018-10-16 13:03                                 ` Derrick Stolee
2018-10-18  2:00                                 ` Junio C Hamano
2018-10-16 23:41                           ` Jonathan Tan
2018-10-08 23:02                     ` We should add a "git gc --auto" after "git clone" due to commit graph Junio C Hamano
2018-10-03 14:32     ` Duy Nguyen
2018-10-03 16:45 ` Duy Nguyen
2018-10-04 21:42 ` [RFC PATCH] " Ævar Arnfjörð Bjarmason
2018-10-05 12:05   ` Derrick Stolee
2018-10-05 13:05     ` Ævar Arnfjörð Bjarmason [this message]
2018-10-05 13:45       ` Derrick Stolee
2018-10-05 14:04         ` Ævar Arnfjörð Bjarmason
2018-10-05 19:21         ` Jeff King
2018-10-05 19:41           ` Derrick Stolee
2018-10-05 19:47             ` Jeff King
2018-10-05 20:00               ` Derrick Stolee
2018-10-05 20:02                 ` Jeff King
2018-10-05 20:01               ` Ævar Arnfjörð Bjarmason
2018-10-05 20:09                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ftxkh7bf.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).