Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* We should add a "git gc --auto" after "git clone" due to commit graph
@ 2018-10-03 13:23 Ævar Arnfjörð Bjarmason
  2018-10-03 13:36 ` SZEDER Gábor
                   ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-03 13:23 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git List, Nguyễn Thái Ngọc Duy

Don't have time to patch this now, but thought I'd send a note / RFC
about this.

Now that we have the commit graph it's nice to be able to set
e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
/etc/gitconfig to apply them to all repos.

But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
until whenever my first "gc" kicks in, which may be quite some time if
I'm just using it passively.

So we should make "git gc --auto" be run on clone, and change the
need_to_gc() / cmd_gc() behavior so that we detect that the
gc.writeCommitGraph=true setting is on, but we have no commit graph, and
then just generate that without doing a full repack.

As an aside such more granular "gc" would be nice for e.g. pack-refs
too. It's possible for us to just have one pack, but to have 100k loose
refs.

It might also be good to have some gc.autoDetachOnClone option and have
it false by default, so we don't have a race condition where "clone
linux && git -C linux tag --contains" is slow because the graph hasn't
been generated yet, and generating the graph initially doesn't take that
long compared to the time to clone a large repo (and on a small one it
won't matter either way).

I was going to say "also for midx", but of course after clone we have
just one pack, so I can't imagine us needing this. But I can see us
having other such optional side-indexes in the future generated by gc,
and they'd also benefit from this.

#leftoverbits

^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH 1/2] One filter per commit
  2018-10-11  1:21                           ` [PATCH 1/2] One filter per commit Jonathan Tan
@ 2018-10-11 12:49 Derrick Stolee
  2018-10-11 19:11 ` [PATCH] Per-commit and per-parent filters for 2 parents Jonathan Tan
  -1 siblings, 1 reply; 76+ messages in thread
From: Derrick Stolee @ 2018-10-11 12:49 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: peff, avarab, szeder.dev

On 10/10/2018 9:21 PM, Jonathan Tan wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index f415d3b41f..90b0b3df90 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -715,13 +715,11 @@ static int add_ref_to_list(const char *refname,
>   static void add_changes_to_bloom_filter(struct bloom_filter *bf,
>   					struct commit *parent,
>   					struct commit *commit,
> +					int index,
>   					struct diff_options *diffopt)
>   {
> -	unsigned char p_c_hash[GIT_MAX_RAWSZ];
>   	int i;
>   
> -	hashxor(parent->object.oid.hash, commit->object.oid.hash, p_c_hash);
> -
>   	diff_tree_oid(&parent->object.oid, &commit->object.oid, "", diffopt);
>   	diffcore_std(diffopt);
>   
> @@ -756,8 +754,8 @@ static void add_changes_to_bloom_filter(struct bloom_filter *bf,
>   			the_hash_algo->update_fn(&ctx, path, p - path);
>   			the_hash_algo->final_fn(name_hash, &ctx);
>   
> -			hashxor(name_hash, p_c_hash, hash);
> -			bloom_filter_add_hash(bf, hash);
> +			hashxor(name_hash, parent->object.oid.hash, hash);
> +			bloom_filter_add_hash(bf, index, hash);
>   		} while (*p);
>   
>   		diff_free_filepair(diff_queued_diff.queue[i]);
[snip]
> @@ -768,11 +766,10 @@ static void add_changes_to_bloom_filter(struct bloom_filter *bf,
>   }
>   
>   static void fill_bloom_filter(struct bloom_filter *bf,
> -				    struct progress *progress)
> +				    struct progress *progress, struct commit **commits, int commit_nr)
>   {
>   	struct rev_info revs;
>   	const char *revs_argv[] = {NULL, "--all", NULL};
> -	struct commit *commit;
>   	int i = 0;
>   
>   	/* We (re-)create the bloom filter from scratch every time for now. */
> @@ -783,18 +780,19 @@ static void fill_bloom_filter(struct bloom_filter *bf,
>   	if (prepare_revision_walk(&revs))
>   		die("revision walk setup failed while preparing bloom filter");
>   
> -	while ((commit = get_revision(&revs))) {
> +	for (i = 0; i < commit_nr; i++) {
> +		struct commit *commit = commits[i];
>   		struct commit_list *parent;
>   
>   		for (parent = commit->parents; parent; parent = parent->next)
> -			add_changes_to_bloom_filter(bf, parent->item, commit,
> +			add_changes_to_bloom_filter(bf, parent->item, commit, i,
>   						    &revs.diffopt);
>   
[snip]
>   
> -		hashxor(pi->name_hash, p_c_hash, hash);
> -		if (bloom_filter_check_hash(&bf, hash)) {
> +		hashxor(pi->name_hash, parent->object.oid.hash, hash);
> +		if (bloom_filter_check_hash(&bf, commit->graph_pos, hash)) {
>   			/*
>   			 * At least one of the interesting pathspecs differs,
>   			 * so we can return early and let the diff machinery
One main benefit of storing on Bloom filter per commit is to avoid 
recomputing hashes at every commit. Currently, this patch only improves 
locality when checking membership at the cost of taking up more space. 
Drop the dependence on the parent oid and then we can save the time 
spent hashing during history queries.

-Stolee

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, back to index

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-03 13:23 We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason
2018-10-03 13:36 ` SZEDER Gábor
2018-10-03 13:42   ` Derrick Stolee
2018-10-03 14:18     ` Ævar Arnfjörð Bjarmason
2018-10-03 14:01   ` Ævar Arnfjörð Bjarmason
2018-10-03 14:17     ` SZEDER Gábor
2018-10-03 14:22       ` Ævar Arnfjörð Bjarmason
2018-10-03 14:53         ` SZEDER Gábor
2018-10-03 15:19           ` Ævar Arnfjörð Bjarmason
2018-10-03 16:59             ` SZEDER Gábor
2018-10-05  6:09               ` Junio C Hamano
2018-10-10 22:07                 ` SZEDER Gábor
2018-10-10 23:01                   ` Ævar Arnfjörð Bjarmason
2018-10-03 19:08           ` Stefan Beller
2018-10-03 19:21             ` Jeff King
2018-10-03 20:35               ` Ævar Arnfjörð Bjarmason
2018-10-03 17:47         ` Stefan Beller
2018-10-03 18:47           ` Ævar Arnfjörð Bjarmason
2018-10-03 18:51             ` Jeff King
2018-10-03 18:59               ` Derrick Stolee
2018-10-03 19:18                 ` Jeff King
2018-10-08 16:41                   ` SZEDER Gábor
2018-10-08 16:57                     ` Derrick Stolee
2018-10-08 18:10                       ` SZEDER Gábor
2018-10-08 18:29                         ` Derrick Stolee
2018-10-09  3:08                           ` Jeff King
2018-10-09 13:48                             ` Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph) Derrick Stolee
2018-10-09 18:45                               ` Ævar Arnfjörð Bjarmason
2018-10-09 18:46                               ` Jeff King
2018-10-09 19:03                                 ` Derrick Stolee
2018-10-09 21:14                                   ` Jeff King
2018-10-09 23:12                                     ` Bloom Filters Jeff King
2018-10-09 23:13                                       ` [PoC -- do not apply 1/3] initial tree-bitmap proof of concept Jeff King
2018-10-09 23:14                                       ` [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode Jeff King
2018-10-10  0:48                                         ` Junio C Hamano
2018-10-11  3:13                                           ` Jeff King
2018-10-09 23:14                                       ` [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding Jeff King
2018-10-10  0:58                                         ` Junio C Hamano
2018-10-11  3:20                                           ` Jeff King
2018-10-11 12:33                                       ` Bloom Filters Derrick Stolee
2018-10-11 13:43                                         ` Jeff King
2018-10-09 21:30                             ` We should add a "git gc --auto" after "git clone" due to commit graph SZEDER Gábor
2018-10-09 19:34                       ` [PATCH 0/4] Bloom filter experiment SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 1/4] Add a (very) barebones Bloom filter implementation SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 2/4] commit-graph: write a Bloom filter containing changed paths for each commit SZEDER Gábor
2018-10-09 21:06                           ` Jeff King
2018-10-09 21:37                             ` SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 3/4] revision.c: use the Bloom filter to speed up path-limited revision walks SZEDER Gábor
2018-10-09 19:34                         ` [PATCH 4/4] revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics SZEDER Gábor
2018-10-09 19:47                         ` [PATCH 0/4] Bloom filter experiment Derrick Stolee
2018-10-11  1:21                         ` [PATCH 0/2] Per-commit filter proof of concept Jonathan Tan
2018-10-11  1:21                           ` [PATCH 1/2] One filter per commit Jonathan Tan
2018-10-11  1:21                           ` [PATCH 2/2] Only make bloom filter for first parent Jonathan Tan
2018-10-11  7:37                           ` [PATCH 0/2] Per-commit filter proof of concept Ævar Arnfjörð Bjarmason
2018-10-15 14:39                         ` [PATCH 0/4] Bloom filter experiment Derrick Stolee
2018-10-16  4:45                           ` Junio C Hamano
2018-10-16 11:13                             ` Derrick Stolee
2018-10-16 12:57                               ` Ævar Arnfjörð Bjarmason
2018-10-16 13:03                                 ` Derrick Stolee
2018-10-18  2:00                                 ` Junio C Hamano
2018-10-16 23:41                           ` Jonathan Tan
2018-10-08 23:02                     ` We should add a "git gc --auto" after "git clone" due to commit graph Junio C Hamano
2018-10-03 14:32     ` Duy Nguyen
2018-10-03 16:45 ` Duy Nguyen
2018-10-04 21:42 ` [RFC PATCH] " Ævar Arnfjörð Bjarmason
2018-10-05 12:05   ` Derrick Stolee
2018-10-05 13:05     ` Ævar Arnfjörð Bjarmason
2018-10-05 13:45       ` Derrick Stolee
2018-10-05 14:04         ` Ævar Arnfjörð Bjarmason
2018-10-05 19:21         ` Jeff King
2018-10-05 19:41           ` Derrick Stolee
2018-10-05 19:47             ` Jeff King
2018-10-05 20:00               ` Derrick Stolee
2018-10-05 20:02                 ` Jeff King
2018-10-05 20:01               ` Ævar Arnfjörð Bjarmason
2018-10-05 20:09                 ` Jeff King
2018-10-11 12:49 [PATCH 1/2] One filter per commit Derrick Stolee
2018-10-11 19:11 ` [PATCH] Per-commit and per-parent filters for 2 parents Jonathan Tan

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git