From: "Ævar Arnfjörð Bjarmason" <firstname.lastname@example.org>
To: Derrick Stolee <email@example.com>
Cc: "Git List" <firstname.lastname@example.org>,
    "Nguyễn Thái Ngọc Duy" <email@example.com>,
    "SZEDER Gábor" <firstname.lastname@example.org>,
    "Jeff King" <email@example.com>,
    "Stefan Beller" <firstname.lastname@example.org>
Subject: Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph
Date: Fri, 05 Oct 2018 16:04:13 +0200
Message-ID: <email@example.com>
In-Reply-To: <firstname.lastname@example.org>

On Fri, Oct 05 2018, Derrick Stolee wrote:

> On 10/5/2018 9:05 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Oct 05 2018, Derrick Stolee wrote:
>>
>>> On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:
>>>> I don't have time to polish this up for submission now, but here's a WIP
>>>> patch that implements this, highlights:
>>>>
>>>> * There's a gc.clone.autoDetach=false default setting which overrides
>>>>   gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
>>>>   --cloning option to indicate this).
>>> I'll repeat that it could make sense to do the same thing on clone
>>> _and_ fetch. Perhaps a "--post-fetch" flag would be good here to
>>> communicate that we just downloaded a pack from a remote.
>> I don't think that makes sense, but let's talk about why, because maybe
>> I've missed something, you're certainly more familiar with the
>> commit-graph than I am.
>>
>> The reason to do it on clone as a special-case or when the file is
>> missing, is because we know the file is desired (via the GC config), and
>> presumably is expected to help performance, and we have 0% of it. So by
>> going from 0% to 100% on clone we'll get fast --contains and other
>> goodies the graph helps with.
>>
>> But when we're doing a fetch, or really anything else that runs "git gc
>> --auto" we can safely assume that we have a recent enough graph, because
>> it will have been run whenever auto-gc kicked in.
>>
>> I.e.:
>>
>>     # Slow, if we assume background forked commit-graph generation
>>     # (which I'm avoiding)
>>     git clone x && cd x && git tag --contains
>>     # Fast enough, since we have an existing commit-graph
>>     cd x && git fetch && git tag --contains
>>
>> I *do* think it might make sense to in general split off parts of "gc
>> --auto" that we'd like to be more aggressive about, simply because the
>> ratio of how long it takes to do, and how much it helps with performance
>> makes more sense than a full repack, which is what the current heuristic
>> is based on.
>>
>> And maybe when we run in that mode we should run in the foreground, but
>> I don't see why git-fetch should be a special case there, and in this
>> regard, the gc.clone.autoDetach=false setting I've made doesn't make
>> much sense. I.e. maybe we should also skip forking to the background in
>> such a mode when we trigger such a "mini gc" via git-commit or whatever.
>
> My misunderstanding was that your proposed change to gc computes the
> commit-graph in either of these two cases:
>
> (1) The auto-GC threshold is met.
>
> (2) There is no commit-graph file.
>
> And what I hope to have instead of (2) is (3):
>
> (3) The commit-graph file is "sufficiently behind" the tip refs.
>
> This condition is intentionally vague at the moment. It could be that
> we hint that (3) holds by saying "--post-fetch" (i.e. "We just
> downloaded a pack, and it probably contains a lot of new commits") or
> we could create some more complicated condition based on counting
> reachable commits with infinite generation number (the number of
> commits not in the commit-graph file).
>
> I like that you are moving forward to make the commit-graph be written
> more frequently, but I'm trying to push us in a direction of writing
> it even more often than your proposed strategy. We should avoid
> creating too many orthogonal conditions that trigger the commit-graph
> write, which is why I'm pushing on your design here.
>
> Anyone else have thoughts on this direction?

Ah. I see. I think #3 makes perfect sense, but it probably makes sense
to do it as a follow-up, or maybe you'd like to stick a patch on top of
the series I have when I send it. I don't know how to write the "I'm not
quite happy about the commit graph" code :)

What I will do is refactor gc.c a bit and leave it in a state where it's
going to be really easy to change the existing "we have no commit graph,
and thus should do the optimization step" to have some more complex
condition instead of "we have no commit graph", i.e. your "we just
grabbed a lot of data".

Also, I'll drop the gc.clone.autoDetach=false setting and name it
something more general, maybe gc.AutoDetachOnBigOptimization=false?
Anyway, something more generic, so that "clone" will always pass in some
option saying "expect a large % commit graph update" (100% in its case),
and then in "fetch" we could have some detection of how big what we just
got from the server is, and do the same.

This seems to me to be the most general thing that would make sense, and
it could also be extended e.g. to "git commit" and other users of gc
--auto. For example, if I started with a README file in an empty repo,
and then made a commit where I added 1 million files all in one commit,
we'd (depending on that setting) also block in the foreground and
generate the commit-graph.
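
To make the three conditions discussed above concrete, here is a minimal
shell sketch of how they could combine into one decision. It is only an
illustration: the function names, the "limit" parameter and the
"write"/"skip" output are all hypothetical and exist neither in Git nor
in the WIP patch. It also assumes the single-file layout where the graph
lives at objects/info/commit-graph, and it takes the count of reachable
commits missing from the graph as an input rather than computing it.

```shell
#!/bin/sh
# Hypothetical sketch: none of these function names exist in Git.

# Condition (2): is there a commit-graph file at all?
# Assumes the single-file layout under objects/info/.
has_commit_graph() {
    test -f "$1/objects/info/commit-graph"
}

# Decide whether the commit-graph should be (re)written.
#   $1: path to the repository's git directory
#   $2: 1 if the regular auto-gc threshold was met, else 0   (condition 1)
#   $3: number of reachable commits not covered by the graph (condition 3)
#   $4: hypothetical limit on uncovered commits before a rewrite is due
# Prints "write" or "skip".
commit_graph_action() {
    git_dir=$1
    gc_due=$2
    uncovered=$3
    limit=$4

    if test "$gc_due" -eq 1; then
        echo write          # (1) a full auto-gc is happening anyway
    elif ! has_commit_graph "$git_dir"; then
        echo write          # (2) we have 0% of the graph, e.g. after clone
    elif test "$uncovered" -gt "$limit"; then
        echo write          # (3) the graph is "sufficiently behind"
    else
        echo skip
    fi
}
```

A caller that gets "write" back could then run something like
"git commit-graph write --reachable". Whether that happens in the
foreground or after detaching is a separate question, which is what the
proposed AutoDetach-style setting would control.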
Git Mailing List Archive on lore.kernel.org