git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, szeder.dev@gmail.com
Subject: Re: [PATCH 1/2] One filter per commit
Date: Thu, 11 Oct 2018 08:49:24 -0400
Message-ID: <2f49b953-4e07-0eb6-05b8-90d2eb72994b@gmail.com>
In-Reply-To: <ebf0a811be047e9fd24b61fea3c4a164b3d18dc0.1539219248.git.jonathantanmy@google.com>

On 10/10/2018 9:21 PM, Jonathan Tan wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index f415d3b41f..90b0b3df90 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -715,13 +715,11 @@ static int add_ref_to_list(const char *refname,
>   static void add_changes_to_bloom_filter(struct bloom_filter *bf,
>   					struct commit *parent,
>   					struct commit *commit,
> +					int index,
>   					struct diff_options *diffopt)
>   {
> -	unsigned char p_c_hash[GIT_MAX_RAWSZ];
>   	int i;
>   
> -	hashxor(parent->object.oid.hash, commit->object.oid.hash, p_c_hash);
> -
>   	diff_tree_oid(&parent->object.oid, &commit->object.oid, "", diffopt);
>   	diffcore_std(diffopt);
>   
> @@ -756,8 +754,8 @@ static void add_changes_to_bloom_filter(struct bloom_filter *bf,
>   			the_hash_algo->update_fn(&ctx, path, p - path);
>   			the_hash_algo->final_fn(name_hash, &ctx);
>   
> -			hashxor(name_hash, p_c_hash, hash);
> -			bloom_filter_add_hash(bf, hash);
> +			hashxor(name_hash, parent->object.oid.hash, hash);
> +			bloom_filter_add_hash(bf, index, hash);
>   		} while (*p);
>   
>   		diff_free_filepair(diff_queued_diff.queue[i]);
[snip]
> @@ -768,11 +766,10 @@ static void add_changes_to_bloom_filter(struct bloom_filter *bf,
>   }
>   
>   static void fill_bloom_filter(struct bloom_filter *bf,
> -				    struct progress *progress)
> +				    struct progress *progress, struct commit **commits, int commit_nr)
>   {
>   	struct rev_info revs;
>   	const char *revs_argv[] = {NULL, "--all", NULL};
> -	struct commit *commit;
>   	int i = 0;
>   
>   	/* We (re-)create the bloom filter from scratch every time for now. */
> @@ -783,18 +780,19 @@ static void fill_bloom_filter(struct bloom_filter *bf,
>   	if (prepare_revision_walk(&revs))
>   		die("revision walk setup failed while preparing bloom filter");
>   
> -	while ((commit = get_revision(&revs))) {
> +	for (i = 0; i < commit_nr; i++) {
> +		struct commit *commit = commits[i];
>   		struct commit_list *parent;
>   
>   		for (parent = commit->parents; parent; parent = parent->next)
> -			add_changes_to_bloom_filter(bf, parent->item, commit,
> +			add_changes_to_bloom_filter(bf, parent->item, commit, i,
>   						    &revs.diffopt);
>   
[snip]
>   
> -		hashxor(pi->name_hash, p_c_hash, hash);
> -		if (bloom_filter_check_hash(&bf, hash)) {
> +		hashxor(pi->name_hash, parent->object.oid.hash, hash);
> +		if (bloom_filter_check_hash(&bf, commit->graph_pos, hash)) {
>   			/*
>   			 * At least one of the interesting pathspecs differs,
>   			 * so we can return early and let the diff machinery
One main benefit of storing one Bloom filter per commit is that we can
avoid recomputing hashes at every commit. As written, this patch only
improves locality when checking membership, at the cost of taking up
more space, because the hash still mixes in the parent oid. Drop the
dependence on the parent oid and we can hash each path once, saving the
time spent hashing during history queries.
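
To make that concrete, here is a minimal standalone sketch of the idea.
It is not git code: the names (struct commit_filter, struct path_key,
path_key_init), the 64-bit filter size, the probe count and the FNV-1a
hash standing in for the_hash_algo are all assumptions chosen for
illustration. The point is only that the key depends on the path alone,
so it can be computed once per query and reused against every commit's
filter.

```c
#include <stdint.h>

/* Hypothetical sizes, chosen only to keep the sketch small. */
#define FILTER_BITS 64
#define NUM_PROBES 3

/* One tiny Bloom filter per commit (assumed layout, not git's). */
struct commit_filter {
	uint64_t bits;
};

/* Hash material derived from the path only; no parent oid mixed in. */
struct path_key {
	uint32_t h1, h2;
};

/* 32-bit FNV-1a with a seed; a stand-in for the real hash function. */
static uint32_t fnv1a(const char *s, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Computed once per queried path, before walking history. */
static struct path_key path_key_init(const char *path)
{
	struct path_key k;

	k.h1 = fnv1a(path, 0);
	k.h2 = fnv1a(path, 0x9e3779b9u) | 1; /* odd step for double hashing */
	return k;
}

/* Bit position of the i-th probe for this key. */
static unsigned probe_bit(const struct path_key *k, uint32_t i)
{
	return (k->h1 + i * k->h2) % FILTER_BITS;
}

/* Called while writing: record a changed path in this commit's filter. */
static void commit_filter_add(struct commit_filter *f,
			      const struct path_key *k)
{
	uint32_t i;

	for (i = 0; i < NUM_PROBES; i++)
		f->bits |= (uint64_t)1 << probe_bit(k, i);
}

/* Called per commit during the walk: only bit tests, no hashing. */
static int commit_filter_maybe_contains(const struct commit_filter *f,
					const struct path_key *k)
{
	uint32_t i;

	for (i = 0; i < NUM_PROBES; i++)
		if (!(f->bits & ((uint64_t)1 << probe_bit(k, i))))
			return 0; /* definitely not changed here */
	return 1; /* maybe changed; fall back to the real diff */
}
```

A history query would call path_key_init() once for each pathspec and
then commit_filter_maybe_contains() for each commit visited. With the
parent oid XORed into the hash, as in the patch above, the key has to
be rebuilt for every (parent, commit) pair instead.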

-Stolee

