* [GSoC Patch 0/3] Move generation, graph_pos to a slab @ 2020-06-04 7:27 Abhishek Kumar 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar ` (6 more replies) 0 siblings, 7 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-04 7:27 UTC (permalink / raw) To: git; +Cc: stolee, jnareb The struct commit is used in many contexts. However, members generation and graph_pos are only used for commit-graph related operations and otherwise waste memory. This wastage would have been more pronounced as transistion to generation number v2, which uses 64-bit generation number instead of current 32-bits. The third patch ("commit: convert commit->graph_pos to a slab", 2020-06-04) is currently failing diff-submodule related tests (t4041, t4059 and t4060) for gcc [1]. I am going to send a second version soon, fixing that. [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189 Abhishek Kumar (3): commit: introduce helpers for generation slab commit: convert commit->generation to a slab commit: convert commit->graph_pos to a slab alloc.c | 2 - blame.c | 2 +- bloom.c | 6 +- commit-graph.c | 116 +++++++++++++++++++++------- commit-graph.h | 8 ++ commit-reach.c | 50 ++++++------ commit.c | 6 +- commit.h | 6 -- contrib/coccinelle/generation.cocci | 12 +++ contrib/coccinelle/graph_pos.cocci | 12 +++ revision.c | 16 ++-- 11 files changed, 158 insertions(+), 78 deletions(-) create mode 100644 contrib/coccinelle/generation.cocci create mode 100644 contrib/coccinelle/graph_pos.cocci -- 2.27.0 ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSoC Patch 1/3] commit: introduce helpers for generation slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar @ 2020-06-04 7:27 ` Abhishek Kumar 2020-06-04 14:36 ` Derrick Stolee ` (2 more replies) 2020-06-04 7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar ` (5 subsequent siblings) 6 siblings, 3 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-04 7:27 UTC (permalink / raw) To: git; +Cc: stolee, jnareb The struct member generation refers to "generation number" (or more broadly, a reachablity index value) used by commit-graph to reduce time taken to walk commits. However, generation is not useful in other contexts and bloats the struct. Let's move it to a commit-slab and shrink the struct by four bytes. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- commit-graph.c | 27 +++++++++++++++++++++++++++ commit-graph.h | 5 +++++ commit.h | 3 --- 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index e3420ddcbf..63f419048d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb) commit_pos_at(&commit_pos, b); } +define_commit_slab(generation_slab, uint32_t); +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab); + +uint32_t generation(const struct commit *c) +{ + uint32_t *gen = generation_slab_peek(&generation_slab, c); + + return gen ? *gen : GENERATION_NUMBER_INFINITY; +} + +static void set_generation(const struct commit *c, const uint32_t generation) +{ + unsigned int i = generation_slab.slab_count; + uint32_t *gen = generation_slab_at(&generation_slab, c); + + /* + * commit-slab initializes with zero, overwrite this with + * GENERATION_NUMBER_INFINITY + */ + for (; i < generation_slab.slab_count; ++i) { + memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY, + generation_slab.slab_size * sizeof(uint32_t)); + } + + *gen = generation; +} + static int commit_gen_cmp(const void *va, const void *vb) { const struct commit *a = *(const struct commit **)va; diff --git a/commit-graph.h b/commit-graph.h index 4212766a4f..653bd041ad 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -8,6 +8,10 @@ #include "object-store.h" #include "oidset.h" +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF +#define GENERATION_NUMBER_MAX 0x3FFFFFFF +#define GENERATION_NUMBER_ZERO 0 + #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *); */ void disable_commit_graph(struct repository *r); +uint32_t generation(const struct commit *c); #endif diff --git a/commit.h b/commit.h index 1b2dea5d85..cc610400d5 100644 --- a/commit.h +++ b/commit.h @@ -11,9 +11,6 @@ #include "commit-slab.h" #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF -#define GENERATION_NUMBER_MAX 0x3FFFFFFF -#define GENERATION_NUMBER_ZERO 0 struct commit_list { struct commit *item; -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 1/3] commit: introduce helpers for generation slab 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar @ 2020-06-04 14:36 ` Derrick Stolee 2020-06-04 17:35 ` Junio C Hamano 2020-06-05 23:23 ` Jakub Narębski 2 siblings, 0 replies; 39+ messages in thread From: Derrick Stolee @ 2020-06-04 14:36 UTC (permalink / raw) To: Abhishek Kumar, git; +Cc: jnareb On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > The struct member generation refers to "generation number" (or more > broadly, a reachablity index value) used by commit-graph to reduce time > taken to walk commits. However, generation is not useful in other > contexts and bloats the struct. > > Let's move it to a commit-slab and shrink the struct by four bytes. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 27 +++++++++++++++++++++++++++ > commit-graph.h | 5 +++++ > commit.h | 3 --- > 3 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index e3420ddcbf..63f419048d 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb) > commit_pos_at(&commit_pos, b); > } > > +define_commit_slab(generation_slab, uint32_t); > +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab); > + > +uint32_t generation(const struct commit *c) > +{ > + uint32_t *gen = generation_slab_peek(&generation_slab, c); > + > + return gen ? *gen : GENERATION_NUMBER_INFINITY; > +} This is good: if we don't have the value, then use INFINITY. In the header file, perhaps include a warning comment that a caller _must_ first parse the commit or else we have no guarantee that the generation slab is populated. This matches the current expectations before accessing the generation member. > +static void set_generation(const struct commit *c, const uint32_t generation) > +{ > + unsigned int i = generation_slab.slab_count; > + uint32_t *gen = generation_slab_at(&generation_slab, c); > + > + /* > + * commit-slab initializes with zero, overwrite this with > + * GENERATION_NUMBER_INFINITY > + */ > + for (; i < generation_slab.slab_count; ++i) { > + memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY, > + generation_slab.slab_size * sizeof(uint32_t)); > + } Here is an example where combining the graph_pos and generation slabs into one would be helpful. The only reason the generation would be INFINITY is if graph_pos is COMMIT_NOT_FROM_GRAPH. If the two values are side-by-side, we could just check graph_pos first and return INFINITY instead of paying this initialization cost as the slab grows. I would also like to avoid initializing the slab if there is no commit-graph present. I wonder if we can populate the slab while parsing the commit-graph and check here if the slab is NULL before doing any other logic? (I'm not sure if this is possible, but it would be nice.) > diff --git a/commit-graph.h b/commit-graph.h > index 4212766a4f..653bd041ad 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -8,6 +8,10 @@ > #include "object-store.h" > #include "oidset.h" > > +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > +#define GENERATION_NUMBER_MAX 0x3FFFFFFF > +#define GENERATION_NUMBER_ZERO 0 > + > #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" > #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" > #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" > @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *); > */ > void disable_commit_graph(struct repository *r); > > +uint32_t generation(const struct commit *c); > #endif > diff --git a/commit.h b/commit.h > index 1b2dea5d85..cc610400d5 100644 > --- a/commit.h > +++ b/commit.h > @@ -11,9 +11,6 @@ > #include "commit-slab.h" > > #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF > -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > -#define GENERATION_NUMBER_MAX 0x3FFFFFFF > -#define GENERATION_NUMBER_ZERO 0 I appreciate that you are able to relocate these constants to a more appropriate location. Thanks, -Stolee ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 1/3] commit: introduce helpers for generation slab 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar 2020-06-04 14:36 ` Derrick Stolee @ 2020-06-04 17:35 ` Junio C Hamano 2020-06-05 23:23 ` Jakub Narębski 2 siblings, 0 replies; 39+ messages in thread From: Junio C Hamano @ 2020-06-04 17:35 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, stolee, jnareb Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > The struct member generation refers to "generation number" (or more > broadly, a reachablity index value) used by commit-graph to reduce time > taken to walk commits. However, generation is not useful in other > contexts and bloats the struct. > > Let's move it to a commit-slab and shrink the struct by four bytes. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 27 +++++++++++++++++++++++++++ > commit-graph.h | 5 +++++ > commit.h | 3 --- > 3 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index e3420ddcbf..63f419048d 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb) > commit_pos_at(&commit_pos, b); > } > > +define_commit_slab(generation_slab, uint32_t); > +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab); > + > +uint32_t generation(const struct commit *c) > +{ > + uint32_t *gen = generation_slab_peek(&generation_slab, c); > + > + return gen ? *gen : GENERATION_NUMBER_INFINITY; > +} > + > +static void set_generation(const struct commit *c, const uint32_t generation) > +{ > + unsigned int i = generation_slab.slab_count; > + uint32_t *gen = generation_slab_at(&generation_slab, c); > + > + /* > + * commit-slab initializes with zero, overwrite this with > + * GENERATION_NUMBER_INFINITY > + */ > + for (; i < generation_slab.slab_count; ++i) { Style: favor post-increment over pre-increment when there is no difference, especially when updating the loop control in for() loop. > + memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY, > + generation_slab.slab_size * sizeof(uint32_t)); > + } > + > + *gen = generation; > +} > + > static int commit_gen_cmp(const void *va, const void *vb) > { > const struct commit *a = *(const struct commit **)va; > diff --git a/commit-graph.h b/commit-graph.h > index 4212766a4f..653bd041ad 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -8,6 +8,10 @@ > #include "object-store.h" > #include "oidset.h" > > +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > +#define GENERATION_NUMBER_MAX 0x3FFFFFFF > +#define GENERATION_NUMBER_ZERO 0 > + Makes sense to move it from commit.h, I guess. > #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" > #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" > #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" > @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *); > */ > void disable_commit_graph(struct repository *r); > > +uint32_t generation(const struct commit *c); > #endif > diff --git a/commit.h b/commit.h > index 1b2dea5d85..cc610400d5 100644 > --- a/commit.h > +++ b/commit.h > @@ -11,9 +11,6 @@ > #include "commit-slab.h" > > #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF > -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > -#define GENERATION_NUMBER_MAX 0x3FFFFFFF > -#define GENERATION_NUMBER_ZERO 0 > > struct commit_list { > struct commit *item; ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 1/3] commit: introduce helpers for generation slab 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar 2020-06-04 14:36 ` Derrick Stolee 2020-06-04 17:35 ` Junio C Hamano @ 2020-06-05 23:23 ` Jakub Narębski 2 siblings, 0 replies; 39+ messages in thread From: Jakub Narębski @ 2020-06-05 23:23 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, Derrick Stolee Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > The struct member generation refers to "generation number" (or more Again, a minor thing: 'generation'. > broadly, a reachablity index value) used by commit-graph to reduce time > taken to walk commits. However, generation is not useful in other > contexts and bloats the struct. > > Let's move it to a commit-slab and shrink the struct by four bytes. It looks like the description is from earlier version of the commit, before it was split -- because this commit does not remove 'generation' member from the 'struct commit', actually. This commit is about creating helper functions. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 27 +++++++++++++++++++++++++++ > commit-graph.h | 5 +++++ > commit.h | 3 --- > 3 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index e3420ddcbf..63f419048d 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb) > commit_pos_at(&commit_pos, b); > } > > +define_commit_slab(generation_slab, uint32_t); > +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab); All right, we need this for the following helper functions to work. We might want to encapsulate all commit-graph data together, in a single struct (e.g. as 'struct commit_graph_data' instead of uint32_t here). On the other hand other data is stored on slab often as separate scalar data (contains_cache, commit_seen, indegree_slab, author_date_slab, commit_base, commit_pos), but not always; sometimes it is a struct (bloom_filter_slab, buffer_slab, commit_rev_name), sometimes it is an array (commit_depth, ref_bitmap, commit_weight), and sometimes it is an array/list of structs or pointer to struct (commit_names, commit_name_slab, saved_parents, blame_suspects, commit_todo_item). > + > +uint32_t generation(const struct commit *c) > +{ > + uint32_t *gen = generation_slab_peek(&generation_slab, c); > + > + return gen ? *gen : GENERATION_NUMBER_INFINITY; > +} All right, this is a synthetic getter using the fact that commits outside the commit-graph should get GENERATION_NUMBER_INFINITY (because [effective] commit-graph is closed under reachability, is full DAG). Should we have something like that for 'graph_pos' and COMMIT_NOT_FROM_GRAPH? > + > +static void set_generation(const struct commit *c, const uint32_t generation) > +{ > + unsigned int i = generation_slab.slab_count; > + uint32_t *gen = generation_slab_at(&generation_slab, c); > + > + /* > + * commit-slab initializes with zero, overwrite this with > + * GENERATION_NUMBER_INFINITY > + */ > + for (; i < generation_slab.slab_count; ++i) { > + memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY, > + generation_slab.slab_size * sizeof(uint32_t)); > + } > + > + *gen = generation; > +} All right. I wonder if putting 'generation' and 'graph_pos' on the slab together, gathered in 'struct commit_graph_data' would make this helper more complex... > + > static int commit_gen_cmp(const void *va, const void *vb) > { > const struct commit *a = *(const struct commit **)va; > diff --git a/commit-graph.h b/commit-graph.h > index 4212766a4f..653bd041ad 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -8,6 +8,10 @@ > #include "object-store.h" > #include "oidset.h" > > +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > +#define GENERATION_NUMBER_MAX 0x3FFFFFFF > +#define GENERATION_NUMBER_ZERO 0 > + > #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" > #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" > #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" > @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *); > */ > void disable_commit_graph(struct repository *r); > > +uint32_t generation(const struct commit *c); > #endif > diff --git a/commit.h b/commit.h > index 1b2dea5d85..cc610400d5 100644 > --- a/commit.h > +++ b/commit.h > @@ -11,9 +11,6 @@ > #include "commit-slab.h" > > #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF > -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > -#define GENERATION_NUMBER_MAX 0x3FFFFFFF > -#define GENERATION_NUMBER_ZERO 0 > > struct commit_list { > struct commit *item; Why this change? Best, -- Jakub Narębski ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSoC Patch 2/3] commit: convert commit->generation to a slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar @ 2020-06-04 7:27 ` Abhishek Kumar 2020-06-04 14:27 ` Derrick Stolee ` (2 more replies) 2020-06-04 7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar ` (4 subsequent siblings) 6 siblings, 3 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-04 7:27 UTC (permalink / raw) To: git; +Cc: stolee, jnareb In this commit, we will use the generation slab helpers introduced in last commit and replace existing uses of commit->generation using 'contrib/coccinelle/generation.cocci' Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- alloc.c | 1 - blame.c | 2 +- commit-graph.c | 39 +++++++++++----------- commit-reach.c | 50 ++++++++++++++--------------- commit.c | 4 +-- commit.h | 1 - contrib/coccinelle/generation.cocci | 12 +++++++ revision.c | 16 ++++----- 8 files changed, 68 insertions(+), 57 deletions(-) create mode 100644 contrib/coccinelle/generation.cocci diff --git a/alloc.c b/alloc.c index 1c64c4dd16..cbed187094 100644 --- a/alloc.c +++ b/alloc.c @@ -109,7 +109,6 @@ void init_commit_node(struct repository *r, struct commit *c) c->object.type = OBJ_COMMIT; c->index = alloc_commit_index(r); c->graph_pos = COMMIT_NOT_FROM_GRAPH; - c->generation = GENERATION_NUMBER_INFINITY; } void *alloc_commit_node(struct repository *r) diff --git a/blame.c b/blame.c index da7e28800e..50e6316076 100644 --- a/blame.c +++ b/blame.c @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r, if (!bd) return 1; - if (origin->commit->generation == GENERATION_NUMBER_INFINITY) + if (generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; filter = get_bloom_filter(r, origin->commit, 0); diff --git a/commit-graph.c b/commit-graph.c index 63f419048d..9ce7d4acb1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -120,9 +120,9 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *b = *(const struct commit **)vb; /* lower generation commits first */ - if (a->generation < b->generation) + if (generation(a) < generation(b)) return -1; - else if (a->generation > b->generation) + else if (generation(a) > generation(b)) return 1; /* use date as a heuristic when generations are equal */ @@ -712,7 +712,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; item->graph_pos = pos; - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2); } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -754,7 +754,7 @@ static int fill_commit_in_graph(struct repository *r, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2); pptr = &item->parents; @@ -1048,7 +1048,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, else packedDate[0] = 0; - packedDate[0] |= htonl((*list)->generation << 2); + packedDate[0] |= htonl(generation((*list)) << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1280,8 +1280,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { display_progress(ctx->progress, i + 1); - if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY && - ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO) + if (generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && + generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1292,22 +1292,23 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) uint32_t max_generation = 0; for (parent = current->parents; parent; parent = parent->next) { - if (parent->item->generation == GENERATION_NUMBER_INFINITY || - parent->item->generation == GENERATION_NUMBER_ZERO) { + if (generation(parent->item) == GENERATION_NUMBER_INFINITY || + generation(parent->item) == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (parent->item->generation > max_generation) { - max_generation = parent->item->generation; + } else if (generation(parent->item) > max_generation) { + max_generation = generation(parent->item); } } if (all_parents_computed) { - current->generation = max_generation + 1; + set_generation(current, max_generation + 1); pop_commit(&list); - if (current->generation > GENERATION_NUMBER_MAX) - current->generation = GENERATION_NUMBER_MAX; + if (generation(current) > GENERATION_NUMBER_MAX) + set_generation(current, + GENERATION_NUMBER_MAX); } } } @@ -2314,8 +2315,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) oid_to_hex(&graph_parents->item->object.oid), oid_to_hex(&odb_parents->item->object.oid)); - if (graph_parents->item->generation > max_generation) - max_generation = graph_parents->item->generation; + if (generation(graph_parents->item) > max_generation) + max_generation = generation(graph_parents->item); graph_parents = graph_parents->next; odb_parents = odb_parents->next; @@ -2325,7 +2326,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) graph_report(_("commit-graph parent list for commit %s terminates early"), oid_to_hex(&cur_oid)); - if (!graph_commit->generation) { + if (!generation(graph_commit)) { if (generation_zero == GENERATION_NUMBER_EXISTS) graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"), oid_to_hex(&cur_oid)); @@ -2345,10 +2346,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (max_generation == GENERATION_NUMBER_MAX) max_generation--; - if (graph_commit->generation != max_generation + 1) + if (generation(graph_commit) != max_generation + 1) graph_report(_("commit-graph generation for commit %s is %u != %u"), oid_to_hex(&cur_oid), - graph_commit->generation, + generation(graph_commit), max_generation + 1); if (graph_commit->date != odb_commit->date) diff --git a/commit-reach.c b/commit-reach.c index 4ca7e706a1..77c980054a 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit_list *parents; int flags; - if (min_generation && commit->generation > last_gen) + if (min_generation && generation(commit) > last_gen) BUG("bad generation skip %8x > %8x at %s", - commit->generation, last_gen, + generation(commit), last_gen, oid_to_hex(&commit->object.oid)); - last_gen = commit->generation; + last_gen = generation(commit); - if (commit->generation < min_generation) + if (generation(commit) < min_generation) break; flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); @@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; - uint32_t min_generation = array[i]->generation; + uint32_t min_generation = generation(array[i]); if (redundant[i]) continue; @@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt filled_index[filled] = j; work[filled++] = array[j]; - if (array[j]->generation < min_generation) - min_generation = array[j]->generation; + if (generation(array[j]) < min_generation) + min_generation = generation(array[j]); } common = paint_down_to_common(r, array[i], filled, work, min_generation); @@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, for (i = 0; i < nr_reference; i++) { if (repo_parse_commit(r, reference[i])) return ret; - if (reference[i]->generation < min_generation) - min_generation = reference[i]->generation; + if (generation(reference[i]) < min_generation) + min_generation = generation(reference[i]); } - if (commit->generation > min_generation) + if (generation(commit) > min_generation) return ret; bases = paint_down_to_common(r, commit, nr_reference, reference, - commit->generation); + generation(commit)); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); @@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate, /* Otherwise, we don't know; prepare to recurse */ parse_commit_or_die(candidate); - if (candidate->generation < cutoff) + if (generation(candidate) < cutoff) return CONTAINS_NO; return CONTAINS_UNKNOWN; @@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate, for (p = want; p; p = p->next) { struct commit *c = p->item; load_commit_graph_info(the_repository, c); - if (c->generation < cutoff) - cutoff = c->generation; + if (generation(c) < cutoff) + cutoff = generation(c); } result = contains_test(candidate, want, cache, cutoff); @@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - if (a->generation < b->generation) + if (generation(a) < generation(b)) return -1; - if (a->generation > b->generation) + if (generation(a) > generation(b)) return 1; return 0; } @@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from, list[nr_commits] = (struct commit *)from_one; if (parse_commit(list[nr_commits]) || - list[nr_commits]->generation < min_generation) { + generation(list[nr_commits]) < min_generation) { result = 0; goto cleanup; } @@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from, if (parse_commit(parent->item) || parent->item->date < min_commit_date || - parent->item->generation < min_generation) + generation(parent->item) < min_generation) continue; commit_list_insert(parent->item, &stack); @@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; - if (from_iter->item->generation < min_generation) - min_generation = from_iter->item->generation; + if (generation(from_iter->item) < min_generation) + min_generation = generation(from_iter->item); } from_iter = from_iter->next; @@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; - if (to_iter->item->generation < min_generation) - min_generation = to_iter->item->generation; + if (generation(to_iter->item) < min_generation) + min_generation = generation(to_iter->item); } to_iter->item->object.flags |= PARENT2; @@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit *c = *item; parse_commit(c); - if (c->generation < min_generation) - min_generation = c->generation; + if (generation(c) < min_generation) + min_generation = generation(c); if (!(c->object.flags & PARENT1)) { c->object.flags |= PARENT1; @@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, parse_commit(p); - if (p->generation < min_generation) + if (generation(p) < min_generation) continue; if (p->object.flags & PARENT2) diff --git a/commit.c b/commit.c index 87686a7055..8dad0f8446 100644 --- a/commit.c +++ b/commit.c @@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void const struct commit *a = a_, *b = b_; /* newer commits first */ - if (a->generation < b->generation) + if (generation(a) < generation(b)) return 1; - else if (a->generation > b->generation) + else if (generation(a) > generation(b)) return -1; /* use date as a heuristic when generations are equal */ diff --git a/commit.h b/commit.h index cc610400d5..01e1c4c3eb 100644 --- a/commit.h +++ b/commit.h @@ -34,7 +34,6 @@ struct commit { */ struct tree *maybe_tree; uint32_t graph_pos; - uint32_t generation; unsigned int index; }; diff --git a/contrib/coccinelle/generation.cocci b/contrib/coccinelle/generation.cocci new file mode 100644 index 0000000000..da13c44856 --- /dev/null +++ b/contrib/coccinelle/generation.cocci @@ -0,0 +1,12 @@ +@@ +struct commit *c; +expression E; +@@ +- c->generation = E ++ set_generation(c, E) + +@@ +struct commit *c; +@@ +- c->generation ++ generation(c) diff --git a/revision.c b/revision.c index 60cca8c0b9..d76382007c 100644 --- a/revision.c +++ b/revision.c @@ -720,7 +720,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (!revs->repo->objects->commit_graph) return -1; - if (commit->generation == GENERATION_NUMBER_INFINITY) + if (generation(commit) == GENERATION_NUMBER_INFINITY) return -1; filter = get_bloom_filter(revs->repo, commit, 0); @@ -3314,7 +3314,7 @@ static void explore_to_depth(struct rev_info *revs, struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; while ((c = prio_queue_peek(&info->explore_queue)) && - c->generation >= gen_cutoff) + generation(c) >= gen_cutoff) explore_walk_step(revs); } @@ -3330,7 +3330,7 @@ static void indegree_walk_step(struct rev_info *revs) if (parse_commit_gently(c, 1) < 0) return; - explore_to_depth(revs, c->generation); + explore_to_depth(revs, generation(c)); for (p = c->parents; p; p = p->next) { struct commit *parent = p->item; @@ -3354,7 +3354,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs, struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; while ((c = prio_queue_peek(&info->indegree_queue)) && - c->generation >= gen_cutoff) + generation(c) >= gen_cutoff) indegree_walk_step(revs); } @@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs) test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); - if (c->generation < info->min_generation) - info->min_generation = c->generation; + if (generation(c) < info->min_generation) + info->min_generation = generation(c); *(indegree_slab_at(&info->indegree, c)) = 1; @@ -3473,8 +3473,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) if (parse_commit_gently(parent, 1) < 0) continue; - if (parent->generation < info->min_generation) { - info->min_generation = parent->generation; + if (generation(parent) < info->min_generation) { + info->min_generation = generation(parent); compute_indegrees_to_depth(revs, info->min_generation); } -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 2/3] commit: convert commit->generation to a slab 2020-06-04 7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar @ 2020-06-04 14:27 ` Derrick Stolee 2020-06-04 17:49 ` Junio C Hamano 2020-06-06 22:03 ` Jakub Narębski 2 siblings, 0 replies; 39+ messages in thread From: Derrick Stolee @ 2020-06-04 14:27 UTC (permalink / raw) To: Abhishek Kumar, git; +Cc: jnareb On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > In this commit, we will use the generation slab helpers introduced in > last commit and replace existing uses of commit->generation using > 'contrib/coccinelle/generation.cocci' > @@ -1048,7 +1048,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, > else > packedDate[0] = 0; > > - packedDate[0] |= htonl((*list)->generation << 2); > + packedDate[0] |= htonl(generation((*list)) << 2); nit: We no longer need the extra parens around *list. > @@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs) > test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); > test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); > > - if (c->generation < info->min_generation) > - info->min_generation = c->generation; > + if (generation(c) < info->min_generation) > + info->min_generation = generation(c); A pattern I've noticed in several places is that the struct member is accessed multiple times in the same method body, and this is auto-converted to multiple method calls. However, these values are fixed, so it would be better to store the value as a local variable and reuse that variable instead. This is one of the shortcomings of the Coccinelle transformation, so you'll need to manually inspect each of the diff fragments to see if we can reduce the number of method calls. It might be helpful to do that as a follow-up, so we can see that this patch is generated by the Coccinelle script, and then a later patch can be scrutinized more carefully when you are doing manual code manipulation. Thanks, -Stolee ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 2/3] commit: convert commit->generation to a slab 2020-06-04 7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar 2020-06-04 14:27 ` Derrick Stolee @ 2020-06-04 17:49 ` Junio C Hamano 2020-06-06 22:03 ` Jakub Narębski 2 siblings, 0 replies; 39+ messages in thread From: Junio C Hamano @ 2020-06-04 17:49 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, stolee, jnareb Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > In this commit, we will use the generation slab helpers introduced in > last commit and replace existing uses of commit->generation using > 'contrib/coccinelle/generation.cocci' > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > alloc.c | 1 - > blame.c | 2 +- > commit-graph.c | 39 +++++++++++----------- > commit-reach.c | 50 ++++++++++++++--------------- > commit.c | 4 +-- > commit.h | 1 - > contrib/coccinelle/generation.cocci | 12 +++++++ > revision.c | 16 ++++----- > 8 files changed, 68 insertions(+), 57 deletions(-) > create mode 100644 contrib/coccinelle/generation.cocci > > diff --git a/alloc.c b/alloc.c > index 1c64c4dd16..cbed187094 100644 > --- a/alloc.c > +++ b/alloc.c > @@ -109,7 +109,6 @@ void init_commit_node(struct repository *r, struct commit *c) > c->object.type = OBJ_COMMIT; > c->index = alloc_commit_index(r); > c->graph_pos = COMMIT_NOT_FROM_GRAPH; > - c->generation = GENERATION_NUMBER_INFINITY; > } > > void *alloc_commit_node(struct repository *r) > diff --git a/blame.c b/blame.c > index da7e28800e..50e6316076 100644 > --- a/blame.c > +++ b/blame.c > @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r, > if (!bd) > return 1; > > - if (origin->commit->generation == GENERATION_NUMBER_INFINITY) > + if (generation(origin->commit) == GENERATION_NUMBER_INFINITY) Hmmmm, as C is not all that object-oriented that lets us say "commit objects have generation() method", a plain vanilla function whose name is generation() is a bit overly vague. The field name this helper function replaces, .generation, is very localized to a commit "object" and does not have such a problem. We probably need to choose a better name in the previous step to fix it. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 2/3] commit: convert commit->generation to a slab 2020-06-04 7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar 2020-06-04 14:27 ` Derrick Stolee 2020-06-04 17:49 ` Junio C Hamano @ 2020-06-06 22:03 ` Jakub Narębski 2 siblings, 0 replies; 39+ messages in thread From: Jakub Narębski @ 2020-06-06 22:03 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, Derrick Stolee Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > In this commit, we will use the generation slab helpers introduced in > last commit and replace existing uses of commit->generation using > 'contrib/coccinelle/generation.cocci' > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > alloc.c | 1 - > blame.c | 2 +- > commit-graph.c | 39 +++++++++++----------- > commit-reach.c | 50 ++++++++++++++--------------- > commit.c | 4 +-- > commit.h | 1 - > contrib/coccinelle/generation.cocci | 12 +++++++ > revision.c | 16 ++++----- > 8 files changed, 68 insertions(+), 57 deletions(-) > create mode 100644 contrib/coccinelle/generation.cocci > For easier review I have changed the order of files, grouping together different categories of changes. First category is removing commit->generation and related changes: > diff --git a/commit.h b/commit.h > index cc610400d5..01e1c4c3eb 100644 > --- a/commit.h > +++ b/commit.h > @@ -34,7 +34,6 @@ struct commit { > */ > struct tree *maybe_tree; > uint32_t graph_pos; > - uint32_t generation; > unsigned int index; > }; > This is quite straightforward. > diff --git a/alloc.c b/alloc.c > index 1c64c4dd16..cbed187094 100644 > --- a/alloc.c > +++ b/alloc.c > @@ -109,7 +109,6 @@ void init_commit_node(struct repository *r, struct commit *c) > c->object.type = OBJ_COMMIT; > c->index = alloc_commit_index(r); > c->graph_pos = COMMIT_NOT_FROM_GRAPH; > - c->generation = GENERATION_NUMBER_INFINITY; > } > > void *alloc_commit_node(struct repository *r) But this change might need a more detailed write-up in the commit message. If I understand it correctly, this is function is used by generic commit allocator, and given commit object may not need generation number. That is why the default value of GENERATION_NUMBER_INFINITY is handled by new "getters" and "setters", i.e. generation() and set_generation() -- which are called only if commit-graph is present, I think. The generation() returns GENERATION_NUMBER_INFINITY for commits not in generation_slab, assuming that for all commits in the commit graph it would be filled by commit-graph parsing -- just like init_commit_node() would initialize commit->generation to this value. Because <slabname>_at() (including generation_slab_at()) extends array if necessary, we want all data on generation_slab to be correctly initialized to GENERATION_NUMBER_INFINITY, just like init_commit_node() would do it, because generation() "getter" cannot. The commit might be present on generation_slab just because allocation is done in chunks (slabs). Second category is semantic patch that generates the rest of changes (which could be adjusted manually in subsequent commits). > diff --git a/contrib/coccinelle/generation.cocci b/contrib/coccinelle/generation.cocci > new file mode 100644 > index 0000000000..da13c44856 > --- /dev/null > +++ b/contrib/coccinelle/generation.cocci > @@ -0,0 +1,12 @@ > +@@ > +struct commit *c; > +expression E; > +@@ > +- c->generation = E > ++ set_generation(c, E) > + > +@@ > +struct commit *c; > +@@ > +- c->generation > ++ generation(c) I wonder if Coccinelle is able to automatically discard extra parentheses (the problem noticed by Stolee in his reply) with the following chunk: +@@ +struct commit *c; +@@ +- (c)->generation ++ generation(c) Third category is all the changes that are just straight mechanical changes being the result of applying the Coccinelle patch. > diff --git a/blame.c b/blame.c > index da7e28800e..50e6316076 100644 > --- a/blame.c > +++ b/blame.c > @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r, > if (!bd) > return 1; > > - if (origin->commit->generation == GENERATION_NUMBER_INFINITY) > + if (generation(origin->commit) == GENERATION_NUMBER_INFINITY) > return 1; > > filter = get_bloom_filter(r, origin->commit, 0); > diff --git a/commit-graph.c b/commit-graph.c > index 63f419048d..9ce7d4acb1 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -120,9 +120,9 @@ static int commit_gen_cmp(const void *va, const void *vb) > const struct commit *b = *(const struct commit **)vb; > > /* lower generation commits first */ > - if (a->generation < b->generation) > + if (generation(a) < generation(b)) > return -1; > - else if (a->generation > b->generation) > + else if (generation(a) > generation(b)) > return 1; > > /* use date as a heuristic when generations are equal */ > @@ -712,7 +712,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, > lex_index = pos - g->num_commits_in_base; > commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; > item->graph_pos = pos; > - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > + set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2); > } > > static inline void set_commit_tree(struct commit *c, struct tree *t) > @@ -754,7 +754,7 @@ static int fill_commit_in_graph(struct repository *r, > date_low = get_be32(commit_data + g->hash_len + 12); > item->date = (timestamp_t)((date_high << 32) | date_low); > > - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > + set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2); > > pptr = &item->parents; > > @@ -1048,7 +1048,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, > else > packedDate[0] = 0; > > - packedDate[0] |= htonl((*list)->generation << 2); > + packedDate[0] |= htonl(generation((*list)) << 2); > > packedDate[1] = htonl((*list)->date); > hashwrite(f, packedDate, 8); > @@ -1280,8 +1280,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > ctx->commits.nr); > for (i = 0; i < ctx->commits.nr; i++) { > display_progress(ctx->progress, i + 1); > - if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY && > - ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO) > + if (generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && > + generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) > continue; > > commit_list_insert(ctx->commits.list[i], &list); > @@ -1292,22 +1292,23 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > uint32_t max_generation = 0; > > for (parent = current->parents; parent; parent = parent->next) { > - if (parent->item->generation == GENERATION_NUMBER_INFINITY || > - parent->item->generation == GENERATION_NUMBER_ZERO) { > + if (generation(parent->item) == GENERATION_NUMBER_INFINITY || > + generation(parent->item) == GENERATION_NUMBER_ZERO) { > all_parents_computed = 0; > commit_list_insert(parent->item, &list); > break; > - } else if (parent->item->generation > max_generation) { > - max_generation = parent->item->generation; > + } else if (generation(parent->item) > max_generation) { > + max_generation = generation(parent->item); > } > } Here generation(parent->item) is called three times, which is probably cost effective to save the value to the local variable (as Stolee noticed). > > if (all_parents_computed) { > - current->generation = max_generation + 1; > + set_generation(current, max_generation + 1); > pop_commit(&list); > > - if (current->generation > GENERATION_NUMBER_MAX) > - current->generation = GENERATION_NUMBER_MAX; > + if (generation(current) > GENERATION_NUMBER_MAX) > + set_generation(current, > + GENERATION_NUMBER_MAX); > } > } > } > @@ -2314,8 +2315,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) > oid_to_hex(&graph_parents->item->object.oid), > oid_to_hex(&odb_parents->item->object.oid)); > > - if (graph_parents->item->generation > max_generation) > - max_generation = graph_parents->item->generation; > + if (generation(graph_parents->item) > max_generation) > + max_generation = generation(graph_parents->item); > > graph_parents = graph_parents->next; > odb_parents = odb_parents->next; > @@ -2325,7 +2326,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) > graph_report(_("commit-graph parent list for commit %s terminates early"), > oid_to_hex(&cur_oid)); > > - if (!graph_commit->generation) { > + if (!generation(graph_commit)) { > if (generation_zero == GENERATION_NUMBER_EXISTS) > graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"), > oid_to_hex(&cur_oid)); > @@ -2345,10 +2346,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) > if (max_generation == GENERATION_NUMBER_MAX) > max_generation--; > > - if (graph_commit->generation != max_generation + 1) > + if (generation(graph_commit) != max_generation + 1) > graph_report(_("commit-graph generation for commit %s is %u != %u"), > oid_to_hex(&cur_oid), > - graph_commit->generation, > + generation(graph_commit), > max_generation + 1); > > if (graph_commit->date != odb_commit->date) > diff --git a/commit-reach.c b/commit-reach.c > index 4ca7e706a1..77c980054a 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r, > struct commit_list *parents; > int flags; > > - if (min_generation && commit->generation > last_gen) > + if (min_generation && generation(commit) > last_gen) > BUG("bad generation skip %8x > %8x at %s", > - commit->generation, last_gen, > + generation(commit), last_gen, > oid_to_hex(&commit->object.oid)); > - last_gen = commit->generation; > + last_gen = generation(commit); > > - if (commit->generation < min_generation) > + if (generation(commit) < min_generation) > break; Here generation(commit) is called three times (two times in the exceptional case). > > flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); > @@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt > repo_parse_commit(r, array[i]); > for (i = 0; i < cnt; i++) { > struct commit_list *common; > - uint32_t min_generation = array[i]->generation; > + uint32_t min_generation = generation(array[i]); > > if (redundant[i]) > continue; > @@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt > filled_index[filled] = j; > work[filled++] = array[j]; > > - if (array[j]->generation < min_generation) > - min_generation = array[j]->generation; > + if (generation(array[j]) < min_generation) > + min_generation = generation(array[j]); > } > common = paint_down_to_common(r, array[i], filled, > work, min_generation); > @@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, > for (i = 0; i < nr_reference; i++) { > if (repo_parse_commit(r, reference[i])) > return ret; > - if (reference[i]->generation < min_generation) > - min_generation = reference[i]->generation; > + if (generation(reference[i]) < min_generation) > + min_generation = generation(reference[i]); > } > > - if (commit->generation > min_generation) > + if (generation(commit) > min_generation) > return ret; > > bases = paint_down_to_common(r, commit, > nr_reference, reference, > - commit->generation); > + generation(commit)); > if (commit->object.flags & PARENT2) > ret = 1; > clear_commit_marks(commit, all_flags); > @@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate, > /* Otherwise, we don't know; prepare to recurse */ > parse_commit_or_die(candidate); > > - if (candidate->generation < cutoff) > + if (generation(candidate) < cutoff) > return CONTAINS_NO; > > return CONTAINS_UNKNOWN; > @@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate, > for (p = want; p; p = p->next) { > struct commit *c = p->item; > load_commit_graph_info(the_repository, c); > - if (c->generation < cutoff) > - cutoff = c->generation; > + if (generation(c) < cutoff) > + cutoff = generation(c); > } > > result = contains_test(candidate, want, cache, cutoff); > @@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b) > const struct commit *a = *(const struct commit * const *)_a; > const struct commit *b = *(const struct commit * const *)_b; > > - if (a->generation < b->generation) > + if (generation(a) < generation(b)) > return -1; > - if (a->generation > b->generation) > + if (generation(a) > generation(b)) > return 1; > return 0; > } > @@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from, > > list[nr_commits] = (struct commit *)from_one; > if (parse_commit(list[nr_commits]) || > - list[nr_commits]->generation < min_generation) { > + generation(list[nr_commits]) < min_generation) { > result = 0; > goto cleanup; > } > @@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from, > > if (parse_commit(parent->item) || > parent->item->date < min_commit_date || > - parent->item->generation < min_generation) > + generation(parent->item) < min_generation) > continue; > > commit_list_insert(parent->item, &stack); > @@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, > if (from_iter->item->date < min_commit_date) > min_commit_date = from_iter->item->date; > > - if (from_iter->item->generation < min_generation) > - min_generation = from_iter->item->generation; > + if (generation(from_iter->item) < min_generation) > + min_generation = generation(from_iter->item); > } > > from_iter = from_iter->next; > @@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, > if (to_iter->item->date < min_commit_date) > min_commit_date = to_iter->item->date; > > - if (to_iter->item->generation < min_generation) > - min_generation = to_iter->item->generation; > + if (generation(to_iter->item) < min_generation) > + min_generation = generation(to_iter->item); > } > > to_iter->item->object.flags |= PARENT2; > @@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > struct commit *c = *item; > > parse_commit(c); > - if (c->generation < min_generation) > - min_generation = c->generation; > + if (generation(c) < min_generation) > + min_generation = generation(c); > > if (!(c->object.flags & PARENT1)) { > c->object.flags |= PARENT1; > @@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > > parse_commit(p); > > - if (p->generation < min_generation) > + if (generation(p) < min_generation) > continue; > > if (p->object.flags & PARENT2) > diff --git a/commit.c b/commit.c > index 87686a7055..8dad0f8446 100644 > --- a/commit.c > +++ b/commit.c > @@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void > const struct commit *a = a_, *b = b_; > > /* newer commits first */ > - if (a->generation < b->generation) > + if (generation(a) < generation(b)) > return 1; > - else if (a->generation > b->generation) > + else if (generation(a) > generation(b)) > return -1; > > /* use date as a heuristic when generations are equal */ > diff --git a/revision.c b/revision.c > index 60cca8c0b9..d76382007c 100644 > --- a/revision.c > +++ b/revision.c > @@ -720,7 +720,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, > if (!revs->repo->objects->commit_graph) > return -1; > > - if (commit->generation == GENERATION_NUMBER_INFINITY) > + if (generation(commit) == GENERATION_NUMBER_INFINITY) > return -1; > > filter = get_bloom_filter(revs->repo, commit, 0); > @@ -3314,7 +3314,7 @@ static void explore_to_depth(struct rev_info *revs, > struct topo_walk_info *info = revs->topo_walk_info; > struct commit *c; > while ((c = prio_queue_peek(&info->explore_queue)) && > - c->generation >= gen_cutoff) > + generation(c) >= gen_cutoff) > explore_walk_step(revs); > } > > @@ -3330,7 +3330,7 @@ static void indegree_walk_step(struct rev_info *revs) > if (parse_commit_gently(c, 1) < 0) > return; > > - explore_to_depth(revs, c->generation); > + explore_to_depth(revs, generation(c)); > > for (p = c->parents; p; p = p->next) { > struct commit *parent = p->item; > @@ -3354,7 +3354,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs, > struct topo_walk_info *info = revs->topo_walk_info; > struct commit *c; > while ((c = prio_queue_peek(&info->indegree_queue)) && > - c->generation >= gen_cutoff) > + generation(c) >= gen_cutoff) > indegree_walk_step(revs); > } > > @@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs) > test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); > test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); > > - if (c->generation < info->min_generation) > - info->min_generation = c->generation; > + if (generation(c) < info->min_generation) > + info->min_generation = generation(c); > > *(indegree_slab_at(&info->indegree, c)) = 1; > > @@ -3473,8 +3473,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) > if (parse_commit_gently(parent, 1) < 0) > continue; > > - if (parent->generation < info->min_generation) { > - info->min_generation = parent->generation; > + if (generation(parent) < info->min_generation) { > + info->min_generation = generation(parent); > compute_indegrees_to_depth(revs, info->min_generation); > } Best, -- Jakub Narębski ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSoC Patch 3/3] commit: convert commit->graph_pos to a slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar 2020-06-04 7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar @ 2020-06-04 7:27 ` Abhishek Kumar 2020-06-07 12:12 ` Jakub Narębski 2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee ` (3 subsequent siblings) 6 siblings, 1 reply; 39+ messages in thread From: Abhishek Kumar @ 2020-06-04 7:27 UTC (permalink / raw) To: git; +Cc: stolee, jnareb The member graph_pos refers to the integer position used to identify a commit in commit-graph files. However, graph_pos is not useful in other contexts and bloats the struct. Let's move it to a commit-slab and shrink the struct by four bytes. Existing references to graph_pos are replaced using 'contrib/coccinelle/graph_pos.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- alloc.c | 1 - bloom.c | 6 ++-- commit-graph.c | 50 +++++++++++++++++++++++------- commit-graph.h | 3 ++ commit.c | 2 +- commit.h | 2 -- contrib/coccinelle/graph_pos.cocci | 12 +++++++ 7 files changed, 58 insertions(+), 18 deletions(-) create mode 100644 contrib/coccinelle/graph_pos.cocci diff --git a/alloc.c b/alloc.c index cbed187094..f37fb3b8b6 100644 --- a/alloc.c +++ b/alloc.c @@ -108,7 +108,6 @@ void init_commit_node(struct repository *r, struct commit *c) { c->object.type = OBJ_COMMIT; c->index = alloc_commit_index(r); - c->graph_pos = COMMIT_NOT_FROM_GRAPH; } void *alloc_commit_node(struct repository *r) diff --git a/bloom.c b/bloom.c index 9b86aa3f59..5bee5bb0c1 100644 --- a/bloom.c +++ b/bloom.c @@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, { uint32_t lex_pos, start_index, end_index; - while (c->graph_pos < g->num_commits_in_base) + while (graph_pos(c) < g->num_commits_in_base) g = g->base_graph; /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ if (!g->chunk_bloom_indexes) return 0; - lex_pos = c->graph_pos - g->num_commits_in_base; + lex_pos = graph_pos(c) - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); @@ -188,7 +188,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (c->graph_pos != COMMIT_NOT_FROM_GRAPH && + if (graph_pos(c) != COMMIT_NOT_FROM_GRAPH && r->objects->commit_graph->chunk_bloom_indexes) { if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) return filter; diff --git a/commit-graph.c b/commit-graph.c index 9ce7d4acb1..7ff460b442 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -87,6 +87,34 @@ static int commit_pos_cmp(const void *va, const void *vb) commit_pos_at(&commit_pos, b); } +define_commit_slab(graph_pos_slab, uint32_t); +static struct graph_pos_slab graph_pos_slab = COMMIT_SLAB_INIT(1, graph_pos_slab); + +uint32_t graph_pos(const struct commit *c) +{ + uint32_t *pos = graph_pos_slab_peek(&graph_pos_slab, c); + + return pos ? *pos : COMMIT_NOT_FROM_GRAPH; +} + +static void set_graph_pos(const struct commit *c, const uint32_t position) +{ + unsigned int i = graph_pos_slab.slab_count; + uint32_t *pos = graph_pos_slab_at(&graph_pos_slab, c); + + /* + * commit-slab initializes with zero, overwrite this with + * COMMIT_NOT_FROM_GRAPH + */ + for (; i < graph_pos_slab.slab_count; ++i) + { + memset(graph_pos_slab.slab[i], COMMIT_NOT_FROM_GRAPH, + graph_pos_slab.slab_size * sizeof(uint32_t)); + } + + *pos = position; +} + define_commit_slab(generation_slab, uint32_t); static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab); @@ -697,7 +725,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r, c = lookup_commit(r, &oid); if (!c) die(_("could not find commit %s"), oid_to_hex(&oid)); - c->graph_pos = pos; + set_graph_pos(c, pos); return &commit_list_insert(c, pptr)->next; } @@ -711,7 +739,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; - item->graph_pos = pos; + set_graph_pos(item, pos); set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2); } @@ -741,7 +769,7 @@ static int fill_commit_in_graph(struct repository *r, * Store the "full" position, but then use the * "local" position for the rest of the calculation. */ - item->graph_pos = pos; + set_graph_pos(item, pos); lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; @@ -786,8 +814,8 @@ static int fill_commit_in_graph(struct repository *r, static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) { - if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) { - *pos = item->graph_pos; + if (graph_pos(item) != COMMIT_NOT_FROM_GRAPH) { + *pos = graph_pos(item); return 1; } else { struct commit_graph *cur_g = g; @@ -843,11 +871,11 @@ static struct tree *load_tree_for_commit(struct repository *r, struct object_id oid; const unsigned char *commit_data; - while (c->graph_pos < g->num_commits_in_base) + while (graph_pos(c) < g->num_commits_in_base) g = g->base_graph; commit_data = g->chunk_commit_data + - GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base); + GRAPH_DATA_WIDTH * (graph_pos(c) - g->num_commits_in_base); hashcpy(oid.hash, commit_data); set_commit_tree(c, lookup_tree(r, &oid)); @@ -861,7 +889,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r, { if (c->maybe_tree) return c->maybe_tree; - if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) + if (graph_pos(c) == COMMIT_NOT_FROM_GRAPH) BUG("get_commit_tree_in_graph_one called from non-commit-graph commit"); return load_tree_for_commit(r, g, (struct commit *)c); @@ -1247,7 +1275,7 @@ static void close_reachable(struct write_commit_graph_context *ctx) continue; if (ctx->split) { if ((!parse_commit(commit) && - commit->graph_pos == COMMIT_NOT_FROM_GRAPH) || + graph_pos(commit) == COMMIT_NOT_FROM_GRAPH) || flags == COMMIT_GRAPH_SPLIT_REPLACE) add_missing_parents(ctx, commit); } else if (!parse_commit_no_graph(commit)) @@ -1493,7 +1521,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) if (ctx->split) { struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]); - if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH) + if (!c || graph_pos(c) != COMMIT_NOT_FROM_GRAPH) continue; } @@ -1527,7 +1555,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx) ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]); if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE && - ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH) + graph_pos(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH) continue; if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE) diff --git a/commit-graph.h b/commit-graph.h index 653bd041ad..3cb59ba336 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -8,6 +8,7 @@ #include "object-store.h" #include "oidset.h" +#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF #define GENERATION_NUMBER_INFINITY 0xFFFFFFFF #define GENERATION_NUMBER_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 @@ -142,4 +143,6 @@ void free_commit_graph(struct commit_graph *); void disable_commit_graph(struct repository *r); uint32_t generation(const struct commit *c); + +uint32_t graph_pos(const struct commit *c); #endif diff --git a/commit.c b/commit.c index 8dad0f8446..da6de08b2b 100644 --- a/commit.c +++ b/commit.c @@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r, if (commit->maybe_tree || !commit->object.parsed) return commit->maybe_tree; - if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH) + if (graph_pos(commit) != COMMIT_NOT_FROM_GRAPH) return get_commit_tree_in_graph(r, commit); return NULL; diff --git a/commit.h b/commit.h index 01e1c4c3eb..0b10464a10 100644 --- a/commit.h +++ b/commit.h @@ -10,8 +10,6 @@ #include "pretty.h" #include "commit-slab.h" -#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF - struct commit_list { struct commit *item; struct commit_list *next; diff --git a/contrib/coccinelle/graph_pos.cocci b/contrib/coccinelle/graph_pos.cocci new file mode 100644 index 0000000000..0929164bdf --- /dev/null +++ b/contrib/coccinelle/graph_pos.cocci @@ -0,0 +1,12 @@ +@@ +struct commit *c; +expression E; +@@ +- c->graph_pos = E ++ set_graph_pos(c, E) + +@@ +struct commit *c; +@@ +- c->graph_pos ++ graph_pos(c) -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 3/3] commit: convert commit->graph_pos to a slab 2020-06-04 7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar @ 2020-06-07 12:12 ` Jakub Narębski 0 siblings, 0 replies; 39+ messages in thread From: Jakub Narębski @ 2020-06-07 12:12 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, Derrick Stolee Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > The member graph_pos refers to the integer position used to identify a > commit in commit-graph files. However, graph_pos is not useful in other > contexts and bloats the struct. > > Let's move it to a commit-slab and shrink the struct by four bytes. > > Existing references to graph_pos are replaced using > 'contrib/coccinelle/graph_pos.cocci'. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > alloc.c | 1 - > bloom.c | 6 ++-- > commit-graph.c | 50 +++++++++++++++++++++++------- > commit-graph.h | 3 ++ > commit.c | 2 +- > commit.h | 2 -- > contrib/coccinelle/graph_pos.cocci | 12 +++++++ > 7 files changed, 58 insertions(+), 18 deletions(-) > create mode 100644 contrib/coccinelle/graph_pos.cocci I have reordered the chunks to make it easier to review. > diff --git a/commit.h b/commit.h > index 01e1c4c3eb..0b10464a10 100644 > --- a/commit.h > +++ b/commit.h > @@ -10,8 +10,6 @@ > #include "pretty.h" > #include "commit-slab.h" > > -#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF > - > struct commit_list { > struct commit *item; > struct commit_list *next; > diff --git a/commit-graph.h b/commit-graph.h > index 653bd041ad..3cb59ba336 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -8,6 +8,7 @@ > #include "object-store.h" > #include "oidset.h" > > +#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF > #define GENERATION_NUMBER_INFINITY 0xFFFFFFFF > #define GENERATION_NUMBER_MAX 0x3FFFFFFF > #define GENERATION_NUMBER_ZERO 0 Those two chunks move COMMIT_NOT_FROM_GRAPH from commit.h to commit-graph.h, because it is no longer needed in init_commit_node() from alloc.c. On the other hand the remaining commit-graph #define-s were moved in first commit in series. What I DO NOT SEE in this commit is actual *removal* of `graph_pos` field from the `struct commit`, i.e.: diff --git a/commit.h b/commit.h index cc610400d5..01e1c4c3eb 100644 --- a/commit.h +++ b/commit.h @@ -34,6 +34,5 @@ struct commit { */ struct tree *maybe_tree; - uint32_t graph_pos; unsigned int index; }; > diff --git a/alloc.c b/alloc.c > index cbed187094..f37fb3b8b6 100644 > --- a/alloc.c > +++ b/alloc.c > @@ -108,7 +108,6 @@ void init_commit_node(struct repository *r, struct commit *c) > { > c->object.type = OBJ_COMMIT; > c->index = alloc_commit_index(r); > - c->graph_pos = COMMIT_NOT_FROM_GRAPH; > } > > void *alloc_commit_node(struct repository *r) This removes commit->graph_pos initialization from init_commit_node(), and thus from alloc_commit_node(); the handling of COMMIT_NOT_FROM_GRAPH is moved to setter and getter "methods", see below. > diff --git a/commit-graph.c b/commit-graph.c > index 9ce7d4acb1..7ff460b442 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -87,6 +87,34 @@ static int commit_pos_cmp(const void *va, const void *vb) > commit_pos_at(&commit_pos, b); > } > > +define_commit_slab(graph_pos_slab, uint32_t); > +static struct graph_pos_slab graph_pos_slab = COMMIT_SLAB_INIT(1, graph_pos_slab); > + > +uint32_t graph_pos(const struct commit *c) > +{ > + uint32_t *pos = graph_pos_slab_peek(&graph_pos_slab, c); > + > + return pos ? *pos : COMMIT_NOT_FROM_GRAPH; > +} > + > +static void set_graph_pos(const struct commit *c, const uint32_t position) > +{ > + unsigned int i = graph_pos_slab.slab_count; > + uint32_t *pos = graph_pos_slab_at(&graph_pos_slab, c); > + > + /* > + * commit-slab initializes with zero, overwrite this with > + * COMMIT_NOT_FROM_GRAPH > + */ > + for (; i < graph_pos_slab.slab_count; ++i) > + { > + memset(graph_pos_slab.slab[i], COMMIT_NOT_FROM_GRAPH, > + graph_pos_slab.slab_size * sizeof(uint32_t)); > + } > + > + *pos = position; > +} > + > define_commit_slab(generation_slab, uint32_t); > static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab); > [...] > @@ -142,4 +143,6 @@ void free_commit_graph(struct commit_graph *); > void disable_commit_graph(struct repository *r); > > uint32_t generation(const struct commit *c); > + > +uint32_t graph_pos(const struct commit *c); > #endif I wonder why those helper functions: graph_pos() and set_graph_pos() were not introduced in the 1st patch of this series, together with generation() and set_generation(). The same comments as for previous patch apply: if graph_pos was not explicitely set, we want for it to be COMMIT_NOT_FROM_GRAPH (like init_commit_node() did before this change). If it is not on commit-slab, it is not set -- this is handled by graph_pos() function. However we allocate memory on slab in chunks, and set_graph_pos() ensures that those extra allocated `graph_pos` values on commit-slab are properly initialized to COMMIT_NOT_FROM_GRAPH, like init_commit_node() did. Note that we always have COMMIT_NOT_FROM_GRAPH for `graph_pos` if and only if there is GENERATION_NUMBER_INFINITY for `generation`, so perhaps putting those two together in `struct commit_graph_info` would make sense. But whether doing more work in "setter" set_*() functions or doing extra conditional in "getter" *() would give better performance needs (micro-)benchmarking. > diff --git a/contrib/coccinelle/graph_pos.cocci b/contrib/coccinelle/graph_pos.cocci > new file mode 100644 > index 0000000000..0929164bdf > --- /dev/null > +++ b/contrib/coccinelle/graph_pos.cocci > @@ -0,0 +1,12 @@ > +@@ > +struct commit *c; > +expression E; > +@@ > +- c->graph_pos = E > ++ set_graph_pos(c, E) > + > +@@ > +struct commit *c; > +@@ > +- c->graph_pos > ++ graph_pos(c) This is semantic patch that generates the rest of changes. (If any of those needs manual improvement, it is better left for a separate "manual fixup" patch, in my opinion.) > diff --git a/bloom.c b/bloom.c > index 9b86aa3f59..5bee5bb0c1 100644 > --- a/bloom.c > +++ b/bloom.c > @@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, > { > uint32_t lex_pos, start_index, end_index; > > - while (c->graph_pos < g->num_commits_in_base) > + while (graph_pos(c) < g->num_commits_in_base) > g = g->base_graph; > > /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ > if (!g->chunk_bloom_indexes) > return 0; > > - lex_pos = c->graph_pos - g->num_commits_in_base; > + lex_pos = graph_pos(c) - g->num_commits_in_base; > > end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); > Here graph_pos(c) is used twice. It probably needs to be checked (with benchmark) if it matters. > @@ -188,7 +188,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, > > if (!filter->data) { > load_commit_graph_info(r, c); > - if (c->graph_pos != COMMIT_NOT_FROM_GRAPH && > + if (graph_pos(c) != COMMIT_NOT_FROM_GRAPH && > r->objects->commit_graph->chunk_bloom_indexes) { > if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) > return filter; > diff --git a/commit-graph.c b/commit-graph.c > index 9ce7d4acb1..7ff460b442 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -697,7 +725,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r, > c = lookup_commit(r, &oid); > if (!c) > die(_("could not find commit %s"), oid_to_hex(&oid)); > - c->graph_pos = pos; > + set_graph_pos(c, pos); > return &commit_list_insert(c, pptr)->next; > } > > @@ -711,7 +739,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, > > lex_index = pos - g->num_commits_in_base; > commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; > - item->graph_pos = pos; > + set_graph_pos(item, pos); > set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2); > } > > @@ -741,7 +769,7 @@ static int fill_commit_in_graph(struct repository *r, > * Store the "full" position, but then use the > * "local" position for the rest of the calculation. > */ > - item->graph_pos = pos; > + set_graph_pos(item, pos); > lex_index = pos - g->num_commits_in_base; > > commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; > @@ -786,8 +814,8 @@ static int fill_commit_in_graph(struct repository *r, > > static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) > { > - if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) { > - *pos = item->graph_pos; > + if (graph_pos(item) != COMMIT_NOT_FROM_GRAPH) { > + *pos = graph_pos(item); > return 1; Here graph_pos(item) is used twice. > } else { > struct commit_graph *cur_g = g; > @@ -843,11 +871,11 @@ static struct tree *load_tree_for_commit(struct repository *r, > struct object_id oid; > const unsigned char *commit_data; > > - while (c->graph_pos < g->num_commits_in_base) > + while (graph_pos(c) < g->num_commits_in_base) > g = g->base_graph; > > commit_data = g->chunk_commit_data + > - GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base); > + GRAPH_DATA_WIDTH * (graph_pos(c) - g->num_commits_in_base); > > hashcpy(oid.hash, commit_data); > set_commit_tree(c, lookup_tree(r, &oid)); Here graph_pos(c) is used twice. > @@ -861,7 +889,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r, > { > if (c->maybe_tree) > return c->maybe_tree; > - if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) > + if (graph_pos(c) == COMMIT_NOT_FROM_GRAPH) > BUG("get_commit_tree_in_graph_one called from non-commit-graph commit"); > > return load_tree_for_commit(r, g, (struct commit *)c); > @@ -1247,7 +1275,7 @@ static void close_reachable(struct write_commit_graph_context *ctx) > continue; > if (ctx->split) { > if ((!parse_commit(commit) && > - commit->graph_pos == COMMIT_NOT_FROM_GRAPH) || > + graph_pos(commit) == COMMIT_NOT_FROM_GRAPH) || > flags == COMMIT_GRAPH_SPLIT_REPLACE) > add_missing_parents(ctx, commit); > } else if (!parse_commit_no_graph(commit)) > @@ -1493,7 +1521,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) > if (ctx->split) { > struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]); > > - if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH) > + if (!c || graph_pos(c) != COMMIT_NOT_FROM_GRAPH) > continue; > } > > @@ -1527,7 +1555,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx) > ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]); > > if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE && > - ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH) > + graph_pos(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH) > continue; > > if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE) > diff --git a/commit.c b/commit.c > index 8dad0f8446..da6de08b2b 100644 > --- a/commit.c > +++ b/commit.c > @@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r, > if (commit->maybe_tree || !commit->object.parsed) > return commit->maybe_tree; > > - if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH) > + if (graph_pos(commit) != COMMIT_NOT_FROM_GRAPH) > return get_commit_tree_in_graph(r, commit); > > return NULL; Best, -- Jakub Narębski ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar ` (2 preceding siblings ...) 2020-06-04 7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar @ 2020-06-04 14:22 ` Derrick Stolee 2020-06-04 17:55 ` Junio C Hamano 2020-06-07 19:53 ` SZEDER Gábor 2020-06-05 19:00 ` Jakub Narębski ` (2 subsequent siblings) 6 siblings, 2 replies; 39+ messages in thread From: Derrick Stolee @ 2020-06-04 14:22 UTC (permalink / raw) To: Abhishek Kumar, git; +Cc: jnareb On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > The struct commit is used in many contexts. However, members generation > and graph_pos are only used for commit-graph related operations and > otherwise waste memory. > > This wastage would have been more pronounced as transistion to > generation number v2, which uses 64-bit generation number instead of > current 32-bits. Thanks! This is an important step, and will already improve performance in subtle ways. > The third patch ("commit: convert commit->graph_pos to a slab", > 2020-06-04) is currently failing diff-submodule related tests (t4041, > t4059 and t4060) for gcc [1]. I am going to send a second version soon, > fixing that. > > [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189 > > Abhishek Kumar (3): > commit: introduce helpers for generation slab > commit: convert commit->generation to a slab > commit: convert commit->graph_pos to a slab If we have a commit-graph file, then we have graph_pos and generation both coming from that file. Perhaps it would be better to combine the data into a single slab that stores a "struct commit_graph_data" or something? This would change only the slab definitions, since you already do a good job of wrapping the slab access in methods. > alloc.c | 2 - > blame.c | 2 +- > bloom.c | 6 +- > commit-graph.c | 116 +++++++++++++++++++++------- > commit-graph.h | 8 ++ > commit-reach.c | 50 ++++++------ > commit.c | 6 +- > commit.h | 6 -- > contrib/coccinelle/generation.cocci | 12 +++ > contrib/coccinelle/graph_pos.cocci | 12 +++ > revision.c | 16 ++-- > 11 files changed, 158 insertions(+), 78 deletions(-) > create mode 100644 contrib/coccinelle/generation.cocci > create mode 100644 contrib/coccinelle/graph_pos.cocci I appreciate the Coccinelle scripts to help identify automatic fixes for other topics in-flight. However, I wonder if they would be better placed inside the existing commit.cocci file? Thanks, -Stolee ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee @ 2020-06-04 17:55 ` Junio C Hamano 2020-06-07 19:53 ` SZEDER Gábor 1 sibling, 0 replies; 39+ messages in thread From: Junio C Hamano @ 2020-06-04 17:55 UTC (permalink / raw) To: Derrick Stolee; +Cc: Abhishek Kumar, git, jnareb Derrick Stolee <stolee@gmail.com> writes: > If we have a commit-graph file, then we have graph_pos > and generation both coming from that file. Perhaps it > would be better to combine the data into a single slab > that stores a "struct commit_graph_data" or something? Excellent. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee 2020-06-04 17:55 ` Junio C Hamano @ 2020-06-07 19:53 ` SZEDER Gábor 2020-06-08 5:48 ` Abhishek Kumar 1 sibling, 1 reply; 39+ messages in thread From: SZEDER Gábor @ 2020-06-07 19:53 UTC (permalink / raw) To: Derrick Stolee; +Cc: Abhishek Kumar, git, jnareb On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote: > On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > > The struct commit is used in many contexts. However, members generation > > and graph_pos are only used for commit-graph related operations and > > otherwise waste memory. > > > > This wastage would have been more pronounced as transistion to > > generation number v2, which uses 64-bit generation number instead of > > current 32-bits. > > Thanks! This is an important step, and will already improve > performance in subtle ways. While the reduced memory footprint of each commit object might improve performance, accessing graph position and generation numbers in a commit-slab is more expensive than direct field accesses in 'struct commit' instances. Consequently, these patches increase the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux repository from 0.630s to 0.940s. > > create mode 100644 contrib/coccinelle/generation.cocci > > create mode 100644 contrib/coccinelle/graph_pos.cocci > > I appreciate the Coccinelle scripts to help identify > automatic fixes for other topics in-flight. However, > I wonder if they would be better placed inside the > existing commit.cocci file? We add Coccinelle scripts to avoid undesirable code patterns entering our code base. That, however, is not the case here: this is a one-time conversion, and at the end of this series 'struct commit' won't have a 'generation' field anymore, so once it's merged the compiler will catch any new 'commit->generation' accesses. Therefore I don't think that these Coccinelle scripts should be added at all. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-07 19:53 ` SZEDER Gábor @ 2020-06-08 5:48 ` Abhishek Kumar 2020-06-08 8:36 ` SZEDER Gábor 0 siblings, 1 reply; 39+ messages in thread From: Abhishek Kumar @ 2020-06-08 5:48 UTC (permalink / raw) To: SZEDER Gábor; +Cc: stolee, jnareb, git On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote: > On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote: > > On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > > > The struct commit is used in many contexts. However, members generation > > > and graph_pos are only used for commit-graph related operations and > > > otherwise waste memory. > > > > > > This wastage would have been more pronounced as transistion to > > > generation number v2, which uses 64-bit generation number instead of > > > current 32-bits. > > > > Thanks! This is an important step, and will already improve > > performance in subtle ways. > > While the reduced memory footprint of each commit object might improve > performance, accessing graph position and generation numbers in a > commit-slab is more expensive than direct field accesses in 'struct > commit' instances. Consequently, these patches increase the runtime > of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux > repository from 0.630s to 0.940s. > Thank you for checking performance. Performance penalty was something we had discussed here [1]. Caching the commit slab results in local variables helped wonderfully in v2 [2]. For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux repository increased from 0.762 to 0.767s. Since this is a change of <1%, it is *no longer* a performance regression in my opinion. [1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/ [2]: https://lore.kernel.org/git/20200607193237.699335-5-abhishekkumar8222@gmail.com/ > > > > create mode 100644 contrib/coccinelle/generation.cocci > > > create mode 100644 contrib/coccinelle/graph_pos.cocci > > > > I appreciate the Coccinelle scripts to help identify > > automatic fixes for other topics in-flight. However, > > I wonder if they would be better placed inside the > > existing commit.cocci file? > > We add Coccinelle scripts to avoid undesirable code patterns entering > our code base. That, however, is not the case here: this is a > one-time conversion, and at the end of this series 'struct commit' > won't have a 'generation' field anymore, so once it's merged the > compiler will catch any new 'commit->generation' accesses. Therefore > I don't think that these Coccinelle scripts should be added at all. > Alright, that makes sense to me. Will remove in a subsequent version. Thanks Abhishek ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-08 5:48 ` Abhishek Kumar @ 2020-06-08 8:36 ` SZEDER Gábor 2020-06-08 13:45 ` Derrick Stolee 2020-06-08 15:21 ` Jakub Narębski 0 siblings, 2 replies; 39+ messages in thread From: SZEDER Gábor @ 2020-06-08 8:36 UTC (permalink / raw) To: Abhishek Kumar; +Cc: stolee, jnareb, git On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote: > On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote: > > On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote: > > > On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > > > > The struct commit is used in many contexts. However, members generation > > > > and graph_pos are only used for commit-graph related operations and > > > > otherwise waste memory. > > > > > > > > This wastage would have been more pronounced as transistion to > > > > generation number v2, which uses 64-bit generation number instead of > > > > current 32-bits. > > > > > > Thanks! This is an important step, and will already improve > > > performance in subtle ways. > > > > While the reduced memory footprint of each commit object might improve > > performance, accessing graph position and generation numbers in a > > commit-slab is more expensive than direct field accesses in 'struct > > commit' instances. Consequently, these patches increase the runtime > > of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux > > repository from 0.630s to 0.940s. > > > > Thank you for checking performance. Performance penalty was something we > had discussed here [1]. > > Caching the commit slab results in local variables helped wonderfully in v2 [2]. > For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD' > in the linux repository increased from 0.762 to 0.767s. Since this is a > change of <1%, it is *no longer* a performance regression in my opinion. Interesting, I measured 0.870s with v2, still a notable increase from 0.630s. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-08 8:36 ` SZEDER Gábor @ 2020-06-08 13:45 ` Derrick Stolee 2020-06-08 16:46 ` SZEDER Gábor 2020-06-08 15:21 ` Jakub Narębski 1 sibling, 1 reply; 39+ messages in thread From: Derrick Stolee @ 2020-06-08 13:45 UTC (permalink / raw) To: SZEDER Gábor, Abhishek Kumar; +Cc: jnareb, git On 6/8/2020 4:36 AM, SZEDER Gábor wrote: > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote: >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote: >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote: >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote: >>>>> The struct commit is used in many contexts. However, members generation >>>>> and graph_pos are only used for commit-graph related operations and >>>>> otherwise waste memory. >>>>> >>>>> This wastage would have been more pronounced as transistion to >>>>> generation number v2, which uses 64-bit generation number instead of >>>>> current 32-bits. >>>> >>>> Thanks! This is an important step, and will already improve >>>> performance in subtle ways. >>> >>> While the reduced memory footprint of each commit object might improve >>> performance, accessing graph position and generation numbers in a >>> commit-slab is more expensive than direct field accesses in 'struct >>> commit' instances. Consequently, these patches increase the runtime >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux >>> repository from 0.630s to 0.940s. >>> >> >> Thank you for checking performance. Performance penalty was something we >> had discussed here [1]. >> >> Caching the commit slab results in local variables helped wonderfully in v2 [2]. >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD' >> in the linux repository increased from 0.762 to 0.767s. Since this is a >> change of <1%, it is *no longer* a performance regression in my opinion. > > Interesting, I measured 0.870s with v2, still a notable increase from > 0.630s. This is an interesting point. The --is-ancestor is critical to the performance issue (as measured on my machine). For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get v2.27.0: real 0m0.515s user 0m0.467s sys 0m0.048s v2 series: real 0m0.534s user 0m0.481s sys 0m0.053s With "--is-ancestor" I see the following: v2.27.0: real 0m0.591s user 0m0.539s sys 0m0.052s v2 series: real 0m0.773s user 0m0.733s sys 0m0.040s The --is-ancestor option [1] says Check if the first <commit> is an ancestor of the second <commit>, and exit with status 0 if true, or with status 1 if not. Errors are signaled by a non-zero status that is not 1. [1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor This _should_ be faster than "git branch --contains HEAD~50000", but it is much much slower: $ time git branch --contains HEAD~50000 real 0m0.068s user 0m0.061s sys 0m0.008s So, there is definitely something going on that slows the "--is-ancestor" path in this case. But, the solution is not to halt the current patch (which likely has memory footprint benefits when dealing with a lot of tree and blob objects) and instead fix the underlying algorithm. Let's add that to the list of things to do. >>> create mode 100644 contrib/coccinelle/generation.cocci >>> create mode 100644 contrib/coccinelle/graph_pos.cocci >> >> I appreciate the Coccinelle scripts to help identify >> automatic fixes for other topics in-flight. However, >> I wonder if they would be better placed inside the >> existing commit.cocci file? > > We add Coccinelle scripts to avoid undesirable code patterns entering > our code base. That, however, is not the case here: this is a > one-time conversion, and at the end of this series 'struct commit' > won't have a 'generation' field anymore, so once it's merged the > compiler will catch any new 'commit->generation' accesses. Therefore > I don't think that these Coccinelle scripts should be added at all. I disagree. We _also_ add Coccinelle scripts when doing one-time refactors to avoid logical merge conflicts with other topics in flight. If someone else is working on a parallel topic that adds references to graph_pos or generation member, then the scripts provide an easy way for the maintainer to update those references in the merge commit. Alternatively, the contributor could rebase on top of this series and run the scripts themselves to fix their patches before submission. For example, this was done carefully in the sha->object_id conversion using contrib/coccinelle/object_id.cocci. Thanks, -Stolee ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-08 13:45 ` Derrick Stolee @ 2020-06-08 16:46 ` SZEDER Gábor 0 siblings, 0 replies; 39+ messages in thread From: SZEDER Gábor @ 2020-06-08 16:46 UTC (permalink / raw) To: Derrick Stolee; +Cc: Abhishek Kumar, jnareb, git On Mon, Jun 08, 2020 at 09:45:12AM -0400, Derrick Stolee wrote: > On 6/8/2020 4:36 AM, SZEDER Gábor wrote: > > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote: > >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote: > >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote: > >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote: > >>>>> The struct commit is used in many contexts. However, members generation > >>>>> and graph_pos are only used for commit-graph related operations and > >>>>> otherwise waste memory. > >>>>> > >>>>> This wastage would have been more pronounced as transistion to > >>>>> generation number v2, which uses 64-bit generation number instead of > >>>>> current 32-bits. > >>>> > >>>> Thanks! This is an important step, and will already improve > >>>> performance in subtle ways. > >>> > >>> While the reduced memory footprint of each commit object might improve > >>> performance, accessing graph position and generation numbers in a > >>> commit-slab is more expensive than direct field accesses in 'struct > >>> commit' instances. Consequently, these patches increase the runtime > >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux > >>> repository from 0.630s to 0.940s. > >>> > >> > >> Thank you for checking performance. Performance penalty was something we > >> had discussed here [1]. > >> > >> Caching the commit slab results in local variables helped wonderfully in v2 [2]. > >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD' > >> in the linux repository increased from 0.762 to 0.767s. Since this is a > >> change of <1%, it is *no longer* a performance regression in my opinion. > > > > Interesting, I measured 0.870s with v2, still a notable increase from > > 0.630s. > > This is an interesting point. The --is-ancestor is critical to the > performance issue (as measured on my machine). > > For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get > > v2.27.0: > real 0m0.515s > user 0m0.467s > sys 0m0.048s > > v2 series: > real 0m0.534s > user 0m0.481s > sys 0m0.053s I, too, see similarly small differences in this case. > With "--is-ancestor" I see the following: > > v2.27.0: > real 0m0.591s > user 0m0.539s > sys 0m0.052s > > v2 series: > real 0m0.773s > user 0m0.733s > sys 0m0.040s > > The --is-ancestor option [1] says > > Check if the first <commit> is an ancestor of the second > <commit>, and exit with status 0 if true, or with status > 1 if not. Errors are signaled by a non-zero status that > is not 1. > > [1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor > > This _should_ be faster than "git branch --contains HEAD~50000", > but it is much much slower: > > $ time git branch --contains HEAD~50000 > real 0m0.068s > user 0m0.061s > sys 0m0.008s > > So, there is definitely something going on that slows the > "--is-ancestor" path in this case. But, the solution is not > to halt the current patch (which likely has memory footprint > benefits when dealing with a lot of tree and blob objects) > and instead fix the underlying algorithm. Other, more common cases are affected as well, notably the simple 'git rev-list --topo-order': performance: 1.226479734 s: git command: /home/szeder/src/git/BUILDS/v2.27.0/bin/git rev-list --topo-order HEAD max RSS: 162400k performance: 1.741309536 s: git command: /home/szeder/src/git/git rev-list --topo-order HEAD max RSS: 169556k Is the supposed memory footprint reduction that large to justify this runtime increase? > Let's add that to the list of things to do. And to the commit messages. > >>> create mode 100644 contrib/coccinelle/generation.cocci > >>> create mode 100644 contrib/coccinelle/graph_pos.cocci > >> > >> I appreciate the Coccinelle scripts to help identify > >> automatic fixes for other topics in-flight. However, > >> I wonder if they would be better placed inside the > >> existing commit.cocci file? > > > > We add Coccinelle scripts to avoid undesirable code patterns entering > > our code base. That, however, is not the case here: this is a > > one-time conversion, and at the end of this series 'struct commit' > > won't have a 'generation' field anymore, so once it's merged the > > compiler will catch any new 'commit->generation' accesses. Therefore > > I don't think that these Coccinelle scripts should be added at all. > > I disagree. We _also_ add Coccinelle scripts when doing one-time > refactors to avoid logical merge conflicts with other topics in > flight. If someone else is working on a parallel topic that adds > references to graph_pos or generation member, then the scripts provide > an easy way for the maintainer to update those references in the merge > commit. Alternatively, the contributor could rebase on top of this > series and run the scripts themselves to fix their patches before > submission. > > For example, this was done carefully in the sha->object_id > conversion using contrib/coccinelle/object_id.cocci. 'object_id.cocci' is not about sha->object_id conversions, but about avoiding undesirable code patterns, e.g. we prefer oideq() over !oidcmp(), and the compiler, of course, can't help to catch that. Coccinelle scripts used for actual sha->object_id transformations were not added to 'object_id.cocci', but were recorded only in the commit messages for reference, see e.g. 9b56149996 (merge-recursive: convert struct merge_file_info to object_id, 2016-06-24) and a couple of its ancestors. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-08 8:36 ` SZEDER Gábor 2020-06-08 13:45 ` Derrick Stolee @ 2020-06-08 15:21 ` Jakub Narębski 1 sibling, 0 replies; 39+ messages in thread From: Jakub Narębski @ 2020-06-08 15:21 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Abhishek Kumar, Derrick Stolee, git SZEDER Gábor <szeder.dev@gmail.com> writes: > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote: >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote: >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote: >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote: >>>>> The struct commit is used in many contexts. However, members generation >>>>> and graph_pos are only used for commit-graph related operations and >>>>> otherwise waste memory. >>>>> >>>>> This wastage would have been more pronounced as transistion to >>>>> generation number v2, which uses 64-bit generation number instead of >>>>> current 32-bits. >>>> >>>> Thanks! This is an important step, and will already improve >>>> performance in subtle ways. >>> >>> While the reduced memory footprint of each commit object might improve >>> performance, accessing graph position and generation numbers in a >>> commit-slab is more expensive than direct field accesses in 'struct >>> commit' instances. Consequently, these patches increase the runtime >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux >>> repository from 0.630s to 0.940s. >> >> Thank you for checking performance. Performance penalty was something we >> had discussed here [1]. >> >> Caching the commit slab results in local variables helped wonderfully in v2 [2]. >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD' >> in the linux repository increased from 0.762 to 0.767s. Since this is a >> change of <1%, it is *no longer* a performance regression in my opinion. >> >> [1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/ >> [2]: https://lore.kernel.org/git/20200607193237.699335-5-abhishekkumar8222@gmail.com/ > > Interesting, I measured 0.870s with v2, still a notable increase from > 0.630s [a change of +38%]. I wonder what might be the cause for this difference. Is it difference in hardware (faster memory, larger CPU cache?), difference in operating system, or difference in position of HEAD? On one hand it is large relative difference. On the other hand it is almost unnoticeable absolute difference of 0.25s. I also wonder how the performance changes (with moving commit-graph data to the slab) for commands that do not use this data, like e.g.: $ git -o core.commitGraph=false merge-base --is-ancestor HEAD~50000 HEAD or $ git gc Sidenote: I think the performance changes should be mentioned at least in the cover letter for the series, if not in commit message(s). Best, -- Jakub Narębski ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar ` (3 preceding siblings ...) 2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee @ 2020-06-05 19:00 ` Jakub Narębski 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar 6 siblings, 0 replies; 39+ messages in thread From: Jakub Narębski @ 2020-06-05 19:00 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, Derrick Stolee, Jakub Narębski Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > The struct commit is used in many contexts. However, members generation > and graph_pos are only used for commit-graph related operations and > otherwise waste memory. Very minor nitpick: this sentence would read better if the names of `generation` and `graph_pos` fields (but especially the 'generation') were quoted. > > This wastage would have been more pronounced as transistion to > generation number v2, which uses 64-bit generation number instead of > current 32-bits. Good. Moving reachability index value into a commit slab was one of prerequisites to switching to the generation number v2, see [2] [2]: https://public-inbox.org/git/cfa2c367-5cd7-add5-0293-caa75b103f34@gmail.com/t/#u The other prerequisite was proper handling of commit-graph format change, either by using "metadata chunk" as more flexible replacement of mishandled format version field in the commit-graph file header, or as proposed in [3] (and subsequent posts), removing "CDAT" chunk and replacing it with "CDA2" chunk. [3]: https://public-inbox.org/git/xmqq369z7i1b.fsf@gitster.c.googlers.com/t/#u Also, we should probably stop mishandling the format version field, that is do not error out [4] when commit-graph version of the file does not match version supported by git code running the command, but just simply not use the commit-graph (like it is done for Bloom filter chunks). [4]: https://github.com/git/git/blob/master/commit-graph.c#L253 > > The third patch ("commit: convert commit->graph_pos to a slab", > 2020-06-04) is currently failing diff-submodule related tests (t4041, > t4059 and t4060) for gcc [1]. I am going to send a second version soon, > fixing that. > > [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189 > > Abhishek Kumar (3): > commit: introduce helpers for generation slab > commit: convert commit->generation to a slab > commit: convert commit->graph_pos to a slab > > alloc.c | 2 - > blame.c | 2 +- > bloom.c | 6 +- > commit-graph.c | 116 +++++++++++++++++++++------- > commit-graph.h | 8 ++ > commit-reach.c | 50 ++++++------ > commit.c | 6 +- > commit.h | 6 -- > contrib/coccinelle/generation.cocci | 12 +++ > contrib/coccinelle/graph_pos.cocci | 12 +++ It is nice to see the use of Coccinelle scripts. > revision.c | 16 ++-- > 11 files changed, 158 insertions(+), 78 deletions(-) > create mode 100644 contrib/coccinelle/generation.cocci > create mode 100644 contrib/coccinelle/graph_pos.cocci Best, -- Jakub Narębski ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v2 0/4] Move generation, graph_pos to a slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar ` (4 preceding siblings ...) 2020-06-05 19:00 ` Jakub Narębski @ 2020-06-07 19:32 ` Abhishek Kumar 2020-06-07 19:32 ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar ` (5 more replies) 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar 6 siblings, 6 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw) To: git; +Cc: jnareb, stolee The struct commit is used in many contexts. However, members `generation` and `graph_pos` are only used for commit graph related operations and otherwise waste memory. This wastage would have been more pronounced as we transition to generation number v2, which uses 64-bite generation number instead of current 32-bits. Abhishek Kumar (4): commit-graph: introduce commit_graph_data_slab commit: move members graph_pos, generation to a slab commit-graph: use generation directly when writing commit-graph commit-graph: minimize commit_graph_data_slab access alloc.c | 2 - blame.c | 2 +- bloom.c | 7 +- commit-graph.c | 127 ++++++++++++++++++++++++-------- commit-graph.h | 10 +++ commit-reach.c | 69 ++++++++++------- commit.c | 8 +- contrib/coccinelle/commit.cocci | 18 +++++ revision.c | 20 +++-- 9 files changed, 190 insertions(+), 73 deletions(-) -- 2.27.0 Thanks to Dr. Stolee, Dr. Narebski and Junio for their excellent suggestions. Changes in v2: - Introduce struct commit_graph_data. - Merge `graph_pos`, `generation` slabs into a single, `commit_graph_data` slab. - Use graph position for an intermediate check for generation, saving the cost of initializing generation numbers. - Add an follow-up patch caching results of slab access in local variables. - Move coccinelle transformation to commit.coccinelle instead of creating new scripts. - Elaborate on removing default values from init_commit_node(). - Revert moving macro constants (e.g. COMMIT_NOT_FROM_GRAPH, GENERATION_NUMBER_ZERO) from commit.h to commit-graph.h About the failing diff-submodule related tests, I came up with a plausible explanation but could be wrong on this: Commit slabs rely on uniqueness of commit->index to access data. But submodules are repositories on their own, alloc_commit_index(), which relies on repository->parsed_objects->commit_count no longer returns unique values. A commit belong to super repo and another belonging to submodule might have the same index but different generation and graph positions. This could be fixed by defining commit index as maximum of commit index of all repositories + 1 but I have no idea how that would impact other code. Thoughts on this? Regards Abhishek ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar @ 2020-06-07 19:32 ` Abhishek Kumar 2020-06-15 16:27 ` Taylor Blau 2020-06-07 19:32 ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar ` (4 subsequent siblings) 5 siblings, 1 reply; 39+ messages in thread From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw) To: git; +Cc: jnareb, stolee The struct commit is used in many contexts. However, members `generation` and `graph_pos` are only used for commit-graph related operations and otherwise waste memory. As they are often accessed together, let's introduce struct commit_graph_data and move them to a commit_graph_data slab. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- commit-graph.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ commit-graph.h | 10 ++++++++++ 2 files changed, 59 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index e3420ddcbf..7d887a6a2c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb) commit_pos_at(&commit_pos, b); } +define_commit_slab(commit_graph_data_slab, struct commit_graph_data); +static struct commit_graph_data_slab commit_graph_data_slab = + COMMIT_SLAB_INIT(1, commit_graph_data_slab); + +uint32_t commit_graph_position(const struct commit *c) +{ + struct commit_graph_data *data = + commit_graph_data_slab_peek(&commit_graph_data_slab, c); + + return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH; +} + +uint32_t commit_graph_generation(const struct commit *c) +{ + struct commit_graph_data *data = + commit_graph_data_slab_peek(&commit_graph_data_slab, c); + + if (!data) + return GENERATION_NUMBER_INFINITY; + if (data->graph_pos == COMMIT_NOT_FROM_GRAPH) + return GENERATION_NUMBER_INFINITY; + + return data->generation; +} + +static struct commit_graph_data *commit_graph_data_at(const struct commit *c) +{ + uint32_t i = commit_graph_data_slab.slab_count, j; + uint32_t slab_size = commit_graph_data_slab.slab_size; + struct commit_graph_data *data = + commit_graph_data_slab_at(&commit_graph_data_slab, c); + + /* + * commit-slab initializes elements with zero, overwrite this with + * COMMIT_NOT_FROM_GRAPH for graph_pos. + * + * We avoid the cost of initializing `generation` as generation + * number would be GENERATION_NUMBER_INFINITY if graph position + * is COMMIT_NOT_FROM_GRAPH. + */ + for (; i < commit_graph_data_slab.slab_count; i++) { + for (j = 0; j < slab_size; j++) { + commit_graph_data_slab.slab[i][j].graph_pos = COMMIT_NOT_FROM_GRAPH; + } + } + + return data; +} + static int commit_gen_cmp(const void *va, const void *vb) { const struct commit *a = *(const struct commit **)va; diff --git a/commit-graph.h b/commit-graph.h index 4212766a4f..9d22f98f44 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -137,4 +137,14 @@ void free_commit_graph(struct commit_graph *); */ void disable_commit_graph(struct repository *r); +struct commit_graph_data { + uint32_t graph_pos; + uint32_t generation; +}; + +/* + * Commits should be parsed before accessing generation, graph positions. + */ +uint32_t commit_graph_generation(const struct commit *); +uint32_t commit_graph_position(const struct commit *); #endif -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab 2020-06-07 19:32 ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar @ 2020-06-15 16:27 ` Taylor Blau 0 siblings, 0 replies; 39+ messages in thread From: Taylor Blau @ 2020-06-15 16:27 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, jnareb, stolee On Mon, Jun 08, 2020 at 01:02:34AM +0530, Abhishek Kumar wrote: > The struct commit is used in many contexts. However, members > `generation` and `graph_pos` are only used for commit-graph related > operations and otherwise waste memory. > > As they are often accessed together, let's introduce struct > commit_graph_data and move them to a commit_graph_data slab. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ > commit-graph.h | 10 ++++++++++ > 2 files changed, 59 insertions(+) > > diff --git a/commit-graph.c b/commit-graph.c > index e3420ddcbf..7d887a6a2c 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb) > commit_pos_at(&commit_pos, b); > } > > +define_commit_slab(commit_graph_data_slab, struct commit_graph_data); > +static struct commit_graph_data_slab commit_graph_data_slab = > + COMMIT_SLAB_INIT(1, commit_graph_data_slab); > + > +uint32_t commit_graph_position(const struct commit *c) > +{ > + struct commit_graph_data *data = > + commit_graph_data_slab_peek(&commit_graph_data_slab, c); > + > + return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH; > +} > + > +uint32_t commit_graph_generation(const struct commit *c) > +{ > + struct commit_graph_data *data = > + commit_graph_data_slab_peek(&commit_graph_data_slab, c); > + > + if (!data) > + return GENERATION_NUMBER_INFINITY; > + if (data->graph_pos == COMMIT_NOT_FROM_GRAPH) > + return GENERATION_NUMBER_INFINITY; > + > + return data->generation; > +} > + > +static struct commit_graph_data *commit_graph_data_at(const struct commit *c) > +{ > + uint32_t i = commit_graph_data_slab.slab_count, j; > + uint32_t slab_size = commit_graph_data_slab.slab_size; > + struct commit_graph_data *data = > + commit_graph_data_slab_at(&commit_graph_data_slab, c); > + > + /* > + * commit-slab initializes elements with zero, overwrite this with > + * COMMIT_NOT_FROM_GRAPH for graph_pos. > + * > + * We avoid the cost of initializing `generation` as generation > + * number would be GENERATION_NUMBER_INFINITY if graph position > + * is COMMIT_NOT_FROM_GRAPH. > + */ > + for (; i < commit_graph_data_slab.slab_count; i++) { > + for (j = 0; j < slab_size; j++) { > + commit_graph_data_slab.slab[i][j].graph_pos = COMMIT_NOT_FROM_GRAPH; > + } > + } > + > + return data; > +} > + It looks like this function ('commit_graph_data_at') is unused in this patch. No big deal, especially since I can see that you are using it in the second patch, but I think that you should remove it from this patch and instead introduce it there. In fact, this causes a build error (with 'make DEVELOPER=1') because of '-Wunused-function'. It's good practice to "git rebase -x 'make DEVELOPER=1' origin/master" before sending. > static int commit_gen_cmp(const void *va, const void *vb) > { > const struct commit *a = *(const struct commit **)va; > diff --git a/commit-graph.h b/commit-graph.h > index 4212766a4f..9d22f98f44 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -137,4 +137,14 @@ void free_commit_graph(struct commit_graph *); > */ > void disable_commit_graph(struct repository *r); > > +struct commit_graph_data { > + uint32_t graph_pos; > + uint32_t generation; > +}; > + > +/* > + * Commits should be parsed before accessing generation, graph positions. > + */ > +uint32_t commit_graph_generation(const struct commit *); > +uint32_t commit_graph_position(const struct commit *); > #endif > -- > 2.27.0 Thanks, Taylor ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar 2020-06-07 19:32 ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar @ 2020-06-07 19:32 ` Abhishek Kumar 2020-06-08 8:26 ` SZEDER Gábor 2020-06-07 19:32 ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar ` (3 subsequent siblings) 5 siblings, 1 reply; 39+ messages in thread From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw) To: git; +Cc: jnareb, stolee We remove members `graph_pos` and `generation` from the struct commit. The default assignments in init_commit_node() are no longer valid, which is fine as the slab helpers return appropriate default values and the assignments are removed. We will replace existing use of commit->generation and commit->graph_pos by commit_graph_data slab helpers using `contrib/coccinelle/commit.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- alloc.c | 2 -- blame.c | 2 +- bloom.c | 6 ++-- commit-graph.c | 60 ++++++++++++++++----------------- commit-graph.h | 2 +- commit-reach.c | 50 +++++++++++++-------------- commit.c | 6 ++-- contrib/coccinelle/commit.cocci | 18 ++++++++++ revision.c | 16 ++++----- 9 files changed, 89 insertions(+), 73 deletions(-) diff --git a/alloc.c b/alloc.c index 1c64c4dd16..f37fb3b8b6 100644 --- a/alloc.c +++ b/alloc.c @@ -108,8 +108,6 @@ void init_commit_node(struct repository *r, struct commit *c) { c->object.type = OBJ_COMMIT; c->index = alloc_commit_index(r); - c->graph_pos = COMMIT_NOT_FROM_GRAPH; - c->generation = GENERATION_NUMBER_INFINITY; } void *alloc_commit_node(struct repository *r) diff --git a/blame.c b/blame.c index da7e28800e..82fa16d658 100644 --- a/blame.c +++ b/blame.c @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r, if (!bd) return 1; - if (origin->commit->generation == GENERATION_NUMBER_INFINITY) + if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; filter = get_bloom_filter(r, origin->commit, 0); diff --git a/bloom.c b/bloom.c index 9b86aa3f59..df62e3763d 100644 --- a/bloom.c +++ b/bloom.c @@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, { uint32_t lex_pos, start_index, end_index; - while (c->graph_pos < g->num_commits_in_base) + while (commit_graph_position(c) < g->num_commits_in_base) g = g->base_graph; /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ if (!g->chunk_bloom_indexes) return 0; - lex_pos = c->graph_pos - g->num_commits_in_base; + lex_pos = commit_graph_position(c) - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); @@ -188,7 +188,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (c->graph_pos != COMMIT_NOT_FROM_GRAPH && + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && r->objects->commit_graph->chunk_bloom_indexes) { if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) return filter; diff --git a/commit-graph.c b/commit-graph.c index 7d887a6a2c..f7cca4def4 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -142,9 +142,9 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *b = *(const struct commit **)vb; /* lower generation commits first */ - if (a->generation < b->generation) + if (commit_graph_generation(a) < commit_graph_generation(b)) return -1; - else if (a->generation > b->generation) + else if (commit_graph_generation(a) > commit_graph_generation(b)) return 1; /* use date as a heuristic when generations are equal */ @@ -719,7 +719,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r, c = lookup_commit(r, &oid); if (!c) die(_("could not find commit %s"), oid_to_hex(&oid)); - c->graph_pos = pos; + commit_graph_data_at(c)->graph_pos = pos; return &commit_list_insert(c, pptr)->next; } @@ -733,8 +733,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; - item->graph_pos = pos; - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + commit_graph_data_at(item)->graph_pos = pos; + commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -763,7 +763,7 @@ static int fill_commit_in_graph(struct repository *r, * Store the "full" position, but then use the * "local" position for the rest of the calculation. */ - item->graph_pos = pos; + commit_graph_data_at(item)->graph_pos = pos; lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; @@ -776,7 +776,7 @@ static int fill_commit_in_graph(struct repository *r, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; pptr = &item->parents; @@ -808,8 +808,8 @@ static int fill_commit_in_graph(struct repository *r, static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) { - if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) { - *pos = item->graph_pos; + if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) { + *pos = commit_graph_position(item); return 1; } else { struct commit_graph *cur_g = g; @@ -865,11 +865,11 @@ static struct tree *load_tree_for_commit(struct repository *r, struct object_id oid; const unsigned char *commit_data; - while (c->graph_pos < g->num_commits_in_base) + while (commit_graph_position(c) < g->num_commits_in_base) g = g->base_graph; commit_data = g->chunk_commit_data + - GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base); + GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base); hashcpy(oid.hash, commit_data); set_commit_tree(c, lookup_tree(r, &oid)); @@ -883,7 +883,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r, { if (c->maybe_tree) return c->maybe_tree; - if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) + if (commit_graph_position(c) == COMMIT_NOT_FROM_GRAPH) BUG("get_commit_tree_in_graph_one called from non-commit-graph commit"); return load_tree_for_commit(r, g, (struct commit *)c); @@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, else packedDate[0] = 0; - packedDate[0] |= htonl((*list)->generation << 2); + packedDate[0] |= htonl(commit_graph_generation((*list)) << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1269,7 +1269,7 @@ static void close_reachable(struct write_commit_graph_context *ctx) continue; if (ctx->split) { if ((!parse_commit(commit) && - commit->graph_pos == COMMIT_NOT_FROM_GRAPH) || + commit_graph_position(commit) == COMMIT_NOT_FROM_GRAPH) || flags == COMMIT_GRAPH_SPLIT_REPLACE) add_missing_parents(ctx, commit); } else if (!parse_commit_no_graph(commit)) @@ -1302,8 +1302,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { display_progress(ctx->progress, i + 1); - if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY && - ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO) + if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && + commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1314,22 +1314,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) uint32_t max_generation = 0; for (parent = current->parents; parent; parent = parent->next) { - if (parent->item->generation == GENERATION_NUMBER_INFINITY || - parent->item->generation == GENERATION_NUMBER_ZERO) { + if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY || + commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (parent->item->generation > max_generation) { - max_generation = parent->item->generation; + } else if (commit_graph_generation(parent->item) > max_generation) { + max_generation = commit_graph_generation(parent->item); } } if (all_parents_computed) { - current->generation = max_generation + 1; + commit_graph_data_at(current)->generation = max_generation + 1; pop_commit(&list); - if (current->generation > GENERATION_NUMBER_MAX) - current->generation = GENERATION_NUMBER_MAX; + if (commit_graph_generation(current) > GENERATION_NUMBER_MAX) + commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX; } } } @@ -1514,7 +1514,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) if (ctx->split) { struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]); - if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH) + if (!c || commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) continue; } @@ -1548,7 +1548,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx) ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]); if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE && - ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH) + commit_graph_position(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH) continue; if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE) @@ -2336,8 +2336,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) oid_to_hex(&graph_parents->item->object.oid), oid_to_hex(&odb_parents->item->object.oid)); - if (graph_parents->item->generation > max_generation) - max_generation = graph_parents->item->generation; + if (commit_graph_generation(graph_parents->item) > max_generation) + max_generation = commit_graph_generation(graph_parents->item); graph_parents = graph_parents->next; odb_parents = odb_parents->next; @@ -2347,7 +2347,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) graph_report(_("commit-graph parent list for commit %s terminates early"), oid_to_hex(&cur_oid)); - if (!graph_commit->generation) { + if (!commit_graph_generation(graph_commit)) { if (generation_zero == GENERATION_NUMBER_EXISTS) graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"), oid_to_hex(&cur_oid)); @@ -2367,10 +2367,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (max_generation == GENERATION_NUMBER_MAX) max_generation--; - if (graph_commit->generation != max_generation + 1) + if (commit_graph_generation(graph_commit) != max_generation + 1) graph_report(_("commit-graph generation for commit %s is %u != %u"), oid_to_hex(&cur_oid), - graph_commit->generation, + commit_graph_generation(graph_commit), max_generation + 1); if (graph_commit->date != odb_commit->date) diff --git a/commit-graph.h b/commit-graph.h index 9d22f98f44..2d1fecf481 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -142,7 +142,7 @@ struct commit_graph_data { uint32_t generation; }; -/* +/* * Commits should be parsed before accessing generation, graph positions. */ uint32_t commit_graph_generation(const struct commit *); diff --git a/commit-reach.c b/commit-reach.c index 4ca7e706a1..3b2f863f5f 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit_list *parents; int flags; - if (min_generation && commit->generation > last_gen) + if (min_generation && commit_graph_generation(commit) > last_gen) BUG("bad generation skip %8x > %8x at %s", - commit->generation, last_gen, + commit_graph_generation(commit), last_gen, oid_to_hex(&commit->object.oid)); - last_gen = commit->generation; + last_gen = commit_graph_generation(commit); - if (commit->generation < min_generation) + if (commit_graph_generation(commit) < min_generation) break; flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); @@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; - uint32_t min_generation = array[i]->generation; + uint32_t min_generation = commit_graph_generation(array[i]); if (redundant[i]) continue; @@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt filled_index[filled] = j; work[filled++] = array[j]; - if (array[j]->generation < min_generation) - min_generation = array[j]->generation; + if (commit_graph_generation(array[j]) < min_generation) + min_generation = commit_graph_generation(array[j]); } common = paint_down_to_common(r, array[i], filled, work, min_generation); @@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, for (i = 0; i < nr_reference; i++) { if (repo_parse_commit(r, reference[i])) return ret; - if (reference[i]->generation < min_generation) - min_generation = reference[i]->generation; + if (commit_graph_generation(reference[i]) < min_generation) + min_generation = commit_graph_generation(reference[i]); } - if (commit->generation > min_generation) + if (commit_graph_generation(commit) > min_generation) return ret; bases = paint_down_to_common(r, commit, nr_reference, reference, - commit->generation); + commit_graph_generation(commit)); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); @@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate, /* Otherwise, we don't know; prepare to recurse */ parse_commit_or_die(candidate); - if (candidate->generation < cutoff) + if (commit_graph_generation(candidate) < cutoff) return CONTAINS_NO; return CONTAINS_UNKNOWN; @@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate, for (p = want; p; p = p->next) { struct commit *c = p->item; load_commit_graph_info(the_repository, c); - if (c->generation < cutoff) - cutoff = c->generation; + if (commit_graph_generation(c) < cutoff) + cutoff = commit_graph_generation(c); } result = contains_test(candidate, want, cache, cutoff); @@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - if (a->generation < b->generation) + if (commit_graph_generation(a) < commit_graph_generation(b)) return -1; - if (a->generation > b->generation) + if (commit_graph_generation(a) > commit_graph_generation(b)) return 1; return 0; } @@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from, list[nr_commits] = (struct commit *)from_one; if (parse_commit(list[nr_commits]) || - list[nr_commits]->generation < min_generation) { + commit_graph_generation(list[nr_commits]) < min_generation) { result = 0; goto cleanup; } @@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from, if (parse_commit(parent->item) || parent->item->date < min_commit_date || - parent->item->generation < min_generation) + commit_graph_generation(parent->item) < min_generation) continue; commit_list_insert(parent->item, &stack); @@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; - if (from_iter->item->generation < min_generation) - min_generation = from_iter->item->generation; + if (commit_graph_generation(from_iter->item) < min_generation) + min_generation = commit_graph_generation(from_iter->item); } from_iter = from_iter->next; @@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; - if (to_iter->item->generation < min_generation) - min_generation = to_iter->item->generation; + if (commit_graph_generation(to_iter->item) < min_generation) + min_generation = commit_graph_generation(to_iter->item); } to_iter->item->object.flags |= PARENT2; @@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit *c = *item; parse_commit(c); - if (c->generation < min_generation) - min_generation = c->generation; + if (commit_graph_generation(c) < min_generation) + min_generation = commit_graph_generation(c); if (!(c->object.flags & PARENT1)) { c->object.flags |= PARENT1; @@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, parse_commit(p); - if (p->generation < min_generation) + if (commit_graph_generation(p) < min_generation) continue; if (p->object.flags & PARENT2) diff --git a/commit.c b/commit.c index 87686a7055..ad9a76dcc6 100644 --- a/commit.c +++ b/commit.c @@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r, if (commit->maybe_tree || !commit->object.parsed) return commit->maybe_tree; - if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH) + if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH) return get_commit_tree_in_graph(r, commit); return NULL; @@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void const struct commit *a = a_, *b = b_; /* newer commits first */ - if (a->generation < b->generation) + if (commit_graph_generation(a) < commit_graph_generation(b)) return 1; - else if (a->generation > b->generation) + else if (commit_graph_generation(a) > commit_graph_generation(b)) return -1; /* use date as a heuristic when generations are equal */ diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci index 778e4704f6..af6dd4c20c 100644 --- a/contrib/coccinelle/commit.cocci +++ b/contrib/coccinelle/commit.cocci @@ -32,3 +32,21 @@ expression c; - c->maybe_tree + repo_get_commit_tree(specify_the_right_repo_here, c) ...>} + +@@ +struct commit *c; +expression E; +@@ +( +- c->generation = E; ++ commit_graph_data_at(c)->generation = E; +| +- c->graph_pos = E; ++ commit_graph_data_at(c)->graph_pos = E; +| +- c->generation ++ commit_graph_generation(c) +| +- c->graph_pos ++ commit_graph_position(c) +) diff --git a/revision.c b/revision.c index 60cca8c0b9..cb1b200e9f 100644 --- a/revision.c +++ b/revision.c @@ -720,7 +720,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (!revs->repo->objects->commit_graph) return -1; - if (commit->generation == GENERATION_NUMBER_INFINITY) + if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; filter = get_bloom_filter(revs->repo, commit, 0); @@ -3314,7 +3314,7 @@ static void explore_to_depth(struct rev_info *revs, struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; while ((c = prio_queue_peek(&info->explore_queue)) && - c->generation >= gen_cutoff) + commit_graph_generation(c) >= gen_cutoff) explore_walk_step(revs); } @@ -3330,7 +3330,7 @@ static void indegree_walk_step(struct rev_info *revs) if (parse_commit_gently(c, 1) < 0) return; - explore_to_depth(revs, c->generation); + explore_to_depth(revs, commit_graph_generation(c)); for (p = c->parents; p; p = p->next) { struct commit *parent = p->item; @@ -3354,7 +3354,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs, struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; while ((c = prio_queue_peek(&info->indegree_queue)) && - c->generation >= gen_cutoff) + commit_graph_generation(c) >= gen_cutoff) indegree_walk_step(revs); } @@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs) test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); - if (c->generation < info->min_generation) - info->min_generation = c->generation; + if (commit_graph_generation(c) < info->min_generation) + info->min_generation = commit_graph_generation(c); *(indegree_slab_at(&info->indegree, c)) = 1; @@ -3473,8 +3473,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) if (parse_commit_gently(parent, 1) < 0) continue; - if (parent->generation < info->min_generation) { - info->min_generation = parent->generation; + if (commit_graph_generation(parent) < info->min_generation) { + info->min_generation = commit_graph_generation(parent); compute_indegrees_to_depth(revs, info->min_generation); } -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab 2020-06-07 19:32 ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar @ 2020-06-08 8:26 ` SZEDER Gábor 2020-06-08 12:35 ` Derrick Stolee 0 siblings, 1 reply; 39+ messages in thread From: SZEDER Gábor @ 2020-06-08 8:26 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, jnareb, stolee On Mon, Jun 08, 2020 at 01:02:35AM +0530, Abhishek Kumar wrote: > diff --git a/commit-graph.c b/commit-graph.c > index 7d887a6a2c..f7cca4def4 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1302,8 +1302,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > ctx->commits.nr); > for (i = 0; i < ctx->commits.nr; i++) { > display_progress(ctx->progress, i + 1); > - if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY && > - ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO) > + if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && > + commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) > continue; > > commit_list_insert(ctx->commits.list[i], &list); > @@ -1314,22 +1314,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > uint32_t max_generation = 0; > > for (parent = current->parents; parent; parent = parent->next) { > - if (parent->item->generation == GENERATION_NUMBER_INFINITY || > - parent->item->generation == GENERATION_NUMBER_ZERO) { > + if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY || > + commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) { > all_parents_computed = 0; > commit_list_insert(parent->item, &list); > break; > - } else if (parent->item->generation > max_generation) { > - max_generation = parent->item->generation; > + } else if (commit_graph_generation(parent->item) > max_generation) { > + max_generation = commit_graph_generation(parent->item); > } > } > > if (all_parents_computed) { > - current->generation = max_generation + 1; > + commit_graph_data_at(current)->generation = max_generation + 1; > pop_commit(&list); > > - if (current->generation > GENERATION_NUMBER_MAX) > - current->generation = GENERATION_NUMBER_MAX; > + if (commit_graph_generation(current) > GENERATION_NUMBER_MAX) > + commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX; > } > } > } Something about these conversions is not right, as they send compute_generation_numbers() into an endless loop, and 't5318-commit-graph.sh' hangs because of this. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab 2020-06-08 8:26 ` SZEDER Gábor @ 2020-06-08 12:35 ` Derrick Stolee 0 siblings, 0 replies; 39+ messages in thread From: Derrick Stolee @ 2020-06-08 12:35 UTC (permalink / raw) To: SZEDER Gábor, Abhishek Kumar; +Cc: git, jnareb On 6/8/2020 4:26 AM, SZEDER Gábor wrote: > On Mon, Jun 08, 2020 at 01:02:35AM +0530, Abhishek Kumar wrote: >> diff --git a/commit-graph.c b/commit-graph.c >> index 7d887a6a2c..f7cca4def4 100644 >> --- a/commit-graph.c >> +++ b/commit-graph.c > >> @@ -1302,8 +1302,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) >> ctx->commits.nr); >> for (i = 0; i < ctx->commits.nr; i++) { >> display_progress(ctx->progress, i + 1); >> - if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY && >> - ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO) >> + if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && >> + commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) >> continue; >> >> commit_list_insert(ctx->commits.list[i], &list); >> @@ -1314,22 +1314,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) >> uint32_t max_generation = 0; >> >> for (parent = current->parents; parent; parent = parent->next) { >> - if (parent->item->generation == GENERATION_NUMBER_INFINITY || >> - parent->item->generation == GENERATION_NUMBER_ZERO) { >> + if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY || >> + commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) { >> all_parents_computed = 0; >> commit_list_insert(parent->item, &list); >> break; >> - } else if (parent->item->generation > max_generation) { >> - max_generation = parent->item->generation; >> + } else if (commit_graph_generation(parent->item) > max_generation) { >> + max_generation = commit_graph_generation(parent->item); >> } >> } >> >> if (all_parents_computed) { >> - current->generation = max_generation + 1; >> + commit_graph_data_at(current)->generation = max_generation + 1; >> pop_commit(&list); >> >> - if (current->generation > GENERATION_NUMBER_MAX) >> - current->generation = GENERATION_NUMBER_MAX; >> + if (commit_graph_generation(current) > GENERATION_NUMBER_MAX) >> + commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX; >> } >> } >> } > > Something about these conversions is not right, as they send > compute_generation_numbers() into an endless loop, and > 't5318-commit-graph.sh' hangs because of this. Abhishek responded off-list, but it's worth having the discussion here, too. While the next patch fixes the bug introduced here, we strive to have every patch compile and pass all tests on all platforms. It can be hard to verify that last "all platforms" condition, but we can run (most) tests on each of our patches using the following: $ git rebase -x "make -j12 DEVELOPER=1 && (cd t && prove -j8 t[0-8]*.sh)" <base> Thanks, Szeder, for finding this issue in the patch. Looking at this patch and patch 3, I think you should just squash that patch into this one, since the code you are removing in patch 3 was added by this one. Add a paragraph in your commit message that details why we need to use commit_graph_data_at() directly in write_graph_chunk_data() and compute_generation_numbers(). Thanks, -Stolee ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar 2020-06-07 19:32 ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar 2020-06-07 19:32 ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar @ 2020-06-07 19:32 ` Abhishek Kumar 2020-06-08 16:31 ` Jakub Narębski 2020-06-07 19:32 ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar ` (2 subsequent siblings) 5 siblings, 1 reply; 39+ messages in thread From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw) To: git; +Cc: jnareb, stolee commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the graph position for commit is COMMIT_NOT_FROM_GRAPH. While this is true when reading from a commit graph, no graph positions are associated with a commit when writing a commit graph. Therefore, the helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a finite generation number. Let's fix this by using generation number directly when writing a commit graph. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- commit-graph.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f7cca4def4..0dc79e7c90 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, else packedDate[0] = 0; - packedDate[0] |= htonl(commit_graph_generation((*list)) << 2); + packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1301,9 +1301,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { + uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + display_progress(ctx->progress, i + 1); - if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && - commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) + if (generation != GENERATION_NUMBER_INFINITY && + generation != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1314,8 +1316,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) uint32_t max_generation = 0; for (parent = current->parents; parent; parent = parent->next) { - if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY || - commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) { + + if (generation == GENERATION_NUMBER_INFINITY || + generation == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph 2020-06-07 19:32 ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar @ 2020-06-08 16:31 ` Jakub Narębski 2020-06-15 16:31 ` Taylor Blau 0 siblings, 1 reply; 39+ messages in thread From: Jakub Narębski @ 2020-06-08 16:31 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, Derrick Stolee Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the > graph position for commit is COMMIT_NOT_FROM_GRAPH. > > While this is true when reading from a commit graph, no graph positions > are associated with a commit when writing a commit graph. Therefore, the > helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a > finite generation number. > > Let's fix this by using generation number directly when writing a commit > graph. I think that to avoid having non-working patch (which can cause problems when bisecting), it would be a better idea to switch the order of patches 2 and 3. This way we won't have incorrect behaviour. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index f7cca4def4..0dc79e7c90 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, > else > packedDate[0] = 0; > > - packedDate[0] |= htonl(commit_graph_generation((*list)) << 2); > + packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); > All right. > packedDate[1] = htonl((*list)->date); > hashwrite(f, packedDate, 8); > @@ -1301,9 +1301,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > _("Computing commit graph generation numbers"), > ctx->commits.nr); > for (i = 0; i < ctx->commits.nr; i++) { > + uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; > + > display_progress(ctx->progress, i + 1); > - if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && > - commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) > + if (generation != GENERATION_NUMBER_INFINITY && > + generation != GENERATION_NUMBER_ZERO) > continue; > All right; this also introduces local variable to avoid accessing the slab twice^W four times... > commit_list_insert(ctx->commits.list[i], &list); > @@ -1314,8 +1316,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > uint32_t max_generation = 0; > > for (parent = current->parents; parent; parent = parent->next) { > - if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY || > - commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) { > + > + if (generation == GENERATION_NUMBER_INFINITY || > + generation == GENERATION_NUMBER_ZERO) { > all_parents_computed = 0; > commit_list_insert(parent->item, &list); > break; ... which is then used here. Best, -- Jakub Narębski ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph 2020-06-08 16:31 ` Jakub Narębski @ 2020-06-15 16:31 ` Taylor Blau 0 siblings, 0 replies; 39+ messages in thread From: Taylor Blau @ 2020-06-15 16:31 UTC (permalink / raw) To: Jakub Narębski; +Cc: Abhishek Kumar, git, Derrick Stolee On Mon, Jun 08, 2020 at 06:31:49PM +0200, Jakub Narębski wrote: > Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > > > commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the > > graph position for commit is COMMIT_NOT_FROM_GRAPH. > > > > While this is true when reading from a commit graph, no graph positions > > are associated with a commit when writing a commit graph. Therefore, the > > helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a > > finite generation number. > > > > Let's fix this by using generation number directly when writing a commit > > graph. > > I think that to avoid having non-working patch (which can cause problems > when bisecting), it would be a better idea to switch the order of > patches 2 and 3. This way we won't have incorrect behaviour. Yup... agreed (with the additional caveat that the unused function in the first patch also be moved into this new--larger--patch). > > > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > > --- > > commit-graph.c | 13 ++++++++----- > > 1 file changed, 8 insertions(+), 5 deletions(-) > > > > diff --git a/commit-graph.c b/commit-graph.c > > index f7cca4def4..0dc79e7c90 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, > > else > > packedDate[0] = 0; > > > > - packedDate[0] |= htonl(commit_graph_generation((*list)) << 2); > > + packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); > > > > All right. > > > packedDate[1] = htonl((*list)->date); > > hashwrite(f, packedDate, 8); > > @@ -1301,9 +1301,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > > _("Computing commit graph generation numbers"), > > ctx->commits.nr); > > for (i = 0; i < ctx->commits.nr; i++) { > > + uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; > > + > > display_progress(ctx->progress, i + 1); > > - if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY && > > - commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO) > > + if (generation != GENERATION_NUMBER_INFINITY && > > + generation != GENERATION_NUMBER_ZERO) > > continue; > > > > All right; this also introduces local variable to avoid accessing the > slab twice^W four times... > > > commit_list_insert(ctx->commits.list[i], &list); > > @@ -1314,8 +1316,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > > uint32_t max_generation = 0; > > > > for (parent = current->parents; parent; parent = parent->next) { > > - if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY || > > - commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) { > > + > > + if (generation == GENERATION_NUMBER_INFINITY || > > + generation == GENERATION_NUMBER_ZERO) { > > all_parents_computed = 0; > > commit_list_insert(parent->item, &list); > > break; > > ... which is then used here. > > Best, > -- > Jakub Narębski Thanks, Taylor ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar ` (2 preceding siblings ...) 2020-06-07 19:32 ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar @ 2020-06-07 19:32 ` Abhishek Kumar 2020-06-08 16:22 ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski 2020-06-15 16:24 ` Taylor Blau 5 siblings, 0 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw) To: git; +Cc: jnareb, stolee In an earlier patch, multiple struct acccesses to `graph_pos` and `generation` were auto-converted to multiple method calls. Since the values are fixed and commit-slab access costly, we would be better off with storing the values as a local variable and reuse it. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> [1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/ --- bloom.c | 5 +++-- commit-graph.c | 55 +++++++++++++++++++++++++++++----------------- commit-reach.c | 59 ++++++++++++++++++++++++++++++++------------------ commit.c | 6 +++-- revision.c | 12 ++++++---- 5 files changed, 88 insertions(+), 49 deletions(-) diff --git a/bloom.c b/bloom.c index df62e3763d..568ebd75e7 100644 --- a/bloom.c +++ b/bloom.c @@ -33,15 +33,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, struct commit *c) { uint32_t lex_pos, start_index, end_index; + uint32_t graph_pos = commit_graph_position(c); - while (commit_graph_position(c) < g->num_commits_in_base) + while (graph_pos < g->num_commits_in_base) g = g->base_graph; /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ if (!g->chunk_bloom_indexes) return 0; - lex_pos = commit_graph_position(c) - g->num_commits_in_base; + lex_pos = graph_pos - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); diff --git a/commit-graph.c b/commit-graph.c index 0dc79e7c90..5b10d1da2c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -141,10 +141,12 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; + uint32_t generation_a = commit_graph_generation(a); + uint32_t generation_b = commit_graph_generation(b); /* lower generation commits first */ - if (commit_graph_generation(a) < commit_graph_generation(b)) + if (generation_a < generation_b) return -1; - else if (commit_graph_generation(a) > commit_graph_generation(b)) + else if (generation_a > generation_b) return 1; /* use date as a heuristic when generations are equal */ @@ -726,6 +728,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r, static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos) { const unsigned char *commit_data; + struct commit_graph_data *graph_data; uint32_t lex_index; while (pos < g->num_commits_in_base) @@ -733,8 +736,10 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; - commit_graph_data_at(item)->graph_pos = pos; - commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + + graph_data = commit_graph_data_at(item); + graph_data->graph_pos = pos; + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -750,6 +755,7 @@ static int fill_commit_in_graph(struct repository *r, uint32_t *parent_data_ptr; uint64_t date_low, date_high; struct commit_list **pptr; + struct commit_graph_data *graph_data; const unsigned char *commit_data; uint32_t lex_index; @@ -763,7 +769,8 @@ static int fill_commit_in_graph(struct repository *r, * Store the "full" position, but then use the * "local" position for the rest of the calculation. */ - commit_graph_data_at(item)->graph_pos = pos; + graph_data = commit_graph_data_at(item); + graph_data->graph_pos = pos; lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; @@ -776,7 +783,7 @@ static int fill_commit_in_graph(struct repository *r, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; pptr = &item->parents; @@ -808,8 +815,9 @@ static int fill_commit_in_graph(struct repository *r, static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) { - if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) { - *pos = commit_graph_position(item); + uint32_t graph_pos = commit_graph_position(item); + if (graph_pos != COMMIT_NOT_FROM_GRAPH) { + *pos = graph_pos; return 1; } else { struct commit_graph *cur_g = g; @@ -864,12 +872,13 @@ static struct tree *load_tree_for_commit(struct repository *r, { struct object_id oid; const unsigned char *commit_data; + uint32_t graph_pos = commit_graph_position(c); - while (commit_graph_position(c) < g->num_commits_in_base) + while (graph_pos < g->num_commits_in_base) g = g->base_graph; commit_data = g->chunk_commit_data + - GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base); + GRAPH_DATA_WIDTH * (graph_pos - g->num_commits_in_base); hashcpy(oid.hash, commit_data); set_commit_tree(c, lookup_tree(r, &oid)); @@ -1301,7 +1310,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { - uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + uint32_t generation = commit_graph_generation(ctx->commits.list[i]); display_progress(ctx->progress, i + 1); if (generation != GENERATION_NUMBER_INFINITY && @@ -1316,23 +1325,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) uint32_t max_generation = 0; for (parent = current->parents; parent; parent = parent->next) { + generation = commit_graph_generation(parent->item); if (generation == GENERATION_NUMBER_INFINITY || generation == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (commit_graph_generation(parent->item) > max_generation) { - max_generation = commit_graph_generation(parent->item); + } else if (generation > max_generation) { + max_generation = generation; } } if (all_parents_computed) { - commit_graph_data_at(current)->generation = max_generation + 1; + struct commit_graph_data *graph_data = commit_graph_data_at(current); + + graph_data->generation = max_generation + 1; pop_commit(&list); - if (commit_graph_generation(current) > GENERATION_NUMBER_MAX) - commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX; + if (graph_data->generation > GENERATION_NUMBER_MAX) + graph_data->generation = GENERATION_NUMBER_MAX; } } } @@ -2301,6 +2313,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) struct commit *graph_commit, *odb_commit; struct commit_list *graph_parents, *odb_parents; uint32_t max_generation = 0; + uint32_t generation; display_progress(progress, i + 1); hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -2339,8 +2352,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) oid_to_hex(&graph_parents->item->object.oid), oid_to_hex(&odb_parents->item->object.oid)); - if (commit_graph_generation(graph_parents->item) > max_generation) - max_generation = commit_graph_generation(graph_parents->item); + generation = commit_graph_generation(graph_parents->item); + if (generation > max_generation) + max_generation = generation; graph_parents = graph_parents->next; odb_parents = odb_parents->next; @@ -2370,10 +2384,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (max_generation == GENERATION_NUMBER_MAX) max_generation--; - if (commit_graph_generation(graph_commit) != max_generation + 1) + generation = commit_graph_generation(graph_commit); + if (generation != max_generation + 1) graph_report(_("commit-graph generation for commit %s is %u != %u"), oid_to_hex(&cur_oid), - commit_graph_generation(graph_commit), + generation, max_generation + 1); if (graph_commit->date != odb_commit->date) diff --git a/commit-reach.c b/commit-reach.c index 3b2f863f5f..f5e5c0a32b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -58,14 +58,15 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit *commit = prio_queue_get(&queue); struct commit_list *parents; int flags; + uint32_t generation = commit_graph_generation(commit); - if (min_generation && commit_graph_generation(commit) > last_gen) + if (min_generation && generation > last_gen) BUG("bad generation skip %8x > %8x at %s", - commit_graph_generation(commit), last_gen, + generation, last_gen, oid_to_hex(&commit->object.oid)); - last_gen = commit_graph_generation(commit); + last_gen = generation; - if (commit_graph_generation(commit) < min_generation) + if (generation < min_generation) break; flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); @@ -181,13 +182,15 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt if (redundant[i]) continue; for (j = filled = 0; j < cnt; j++) { + uint32_t curr_generation; if (i == j || redundant[j]) continue; filled_index[filled] = j; work[filled++] = array[j]; - if (commit_graph_generation(array[j]) < min_generation) - min_generation = commit_graph_generation(array[j]); + curr_generation = commit_graph_generation(array[j]); + if (curr_generation < min_generation) + min_generation = curr_generation; } common = paint_down_to_common(r, array[i], filled, work, min_generation); @@ -316,23 +319,26 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, { struct commit_list *bases; int ret = 0, i; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY; if (repo_parse_commit(r, commit)) return ret; for (i = 0; i < nr_reference; i++) { if (repo_parse_commit(r, reference[i])) return ret; - if (commit_graph_generation(reference[i]) < min_generation) - min_generation = commit_graph_generation(reference[i]); + + generation = commit_graph_generation(reference[i]); + if (generation < min_generation) + min_generation = generation; } - if (commit_graph_generation(commit) > min_generation) + generation = commit_graph_generation(commit); + if (generation > min_generation) return ret; bases = paint_down_to_common(r, commit, nr_reference, reference, - commit_graph_generation(commit)); + generation); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); @@ -490,10 +496,12 @@ static enum contains_result contains_tag_algo(struct commit *candidate, const struct commit_list *p; for (p = want; p; p = p->next) { + uint32_t generation; struct commit *c = p->item; load_commit_graph_info(the_repository, c); - if (commit_graph_generation(c) < cutoff) - cutoff = commit_graph_generation(c); + generation = commit_graph_generation(c); + if (generation < cutoff) + cutoff = generation; } result = contains_test(candidate, want, cache, cutoff); @@ -544,9 +552,12 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - if (commit_graph_generation(a) < commit_graph_generation(b)) + uint32_t generation_a = commit_graph_generation(a); + uint32_t generation_b = commit_graph_generation(b); + + if (generation_a < generation_b) return -1; - if (commit_graph_generation(a) > commit_graph_generation(b)) + if (generation_a > generation_b) return 1; return 0; } @@ -662,11 +673,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, add_object_array(&from_iter->item->object, NULL, &from_objs); if (!parse_commit(from_iter->item)) { + uint32_t generation; if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; - if (commit_graph_generation(from_iter->item) < min_generation) - min_generation = commit_graph_generation(from_iter->item); + generation = commit_graph_generation(from_iter->item); + if (generation < min_generation) + min_generation = generation; } from_iter = from_iter->next; @@ -674,11 +687,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, while (to_iter) { if (!parse_commit(to_iter->item)) { + uint32_t generation; if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; - if (commit_graph_generation(to_iter->item) < min_generation) - min_generation = commit_graph_generation(to_iter->item); + generation = commit_graph_generation(to_iter->item); + if (generation < min_generation) + min_generation = generation; } to_iter->item->object.flags |= PARENT2; @@ -718,11 +733,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; for (item = to; item < to_last; item++) { + uint32_t generation; struct commit *c = *item; parse_commit(c); - if (commit_graph_generation(c) < min_generation) - min_generation = commit_graph_generation(c); + generation = commit_graph_generation(c); + if (generation < min_generation) + min_generation = generation; if (!(c->object.flags & PARENT1)) { c->object.flags |= PARENT1; diff --git a/commit.c b/commit.c index ad9a76dcc6..f85ade78ed 100644 --- a/commit.c +++ b/commit.c @@ -729,11 +729,13 @@ int compare_commits_by_author_date(const void *a_, const void *b_, int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused) { const struct commit *a = a_, *b = b_; + const uint32_t generation_a = commit_graph_generation(a), + generation_b = commit_graph_generation(b); /* newer commits first */ - if (commit_graph_generation(a) < commit_graph_generation(b)) + if (generation_a < generation_b) return 1; - else if (commit_graph_generation(a) > commit_graph_generation(b)) + else if (generation_a > generation_b) return -1; /* use date as a heuristic when generations are equal */ diff --git a/revision.c b/revision.c index cb1b200e9f..c257089808 100644 --- a/revision.c +++ b/revision.c @@ -3407,6 +3407,7 @@ static void init_topo_walk(struct rev_info *revs) info->min_generation = GENERATION_NUMBER_INFINITY; for (list = revs->commits; list; list = list->next) { struct commit *c = list->item; + uint32_t generation; if (parse_commit_gently(c, 1)) continue; @@ -3414,8 +3415,9 @@ static void init_topo_walk(struct rev_info *revs) test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); - if (commit_graph_generation(c) < info->min_generation) - info->min_generation = commit_graph_generation(c); + generation = commit_graph_generation(c); + if (generation < info->min_generation) + info->min_generation = generation; *(indegree_slab_at(&info->indegree, c)) = 1; @@ -3466,6 +3468,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) for (p = commit->parents; p; p = p->next) { struct commit *parent = p->item; int *pi; + uint32_t generation; if (parent->object.flags & UNINTERESTING) continue; @@ -3473,8 +3476,9 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) if (parse_commit_gently(parent, 1) < 0) continue; - if (commit_graph_generation(parent) < info->min_generation) { - info->min_generation = commit_graph_generation(parent); + generation = commit_graph_generation(parent); + if (generation < info->min_generation) { + info->min_generation = generation; compute_indegrees_to_depth(revs, info->min_generation); } -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 0/4] Move generation, graph_pos to a slab 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar ` (3 preceding siblings ...) 2020-06-07 19:32 ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar @ 2020-06-08 16:22 ` Jakub Narębski 2020-06-15 16:24 ` Taylor Blau 5 siblings, 0 replies; 39+ messages in thread From: Jakub Narębski @ 2020-06-08 16:22 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, Derrick Stolee Abhishek Kumar <abhishekkumar8222@gmail.com> writes: > The struct commit is used in many contexts. However, members > `generation` and `graph_pos` are only used for commit graph related > operations and otherwise waste memory. > > This wastage would have been more pronounced as we transition to > generation number v2, which uses 64-bite generation number instead of > current 32-bits. It would be nice (though not required) to have some specific data: benchmarks with time and memory (RSS maybe?) of Git commands using commit-graph and those not using it, before and after this patch series. Maybe time to run the test suite, or the perf suite... But this is not a show stopper, in my opinion. Best, Jakub Narębski > Abhishek Kumar (4): > commit-graph: introduce commit_graph_data_slab > commit: move members graph_pos, generation to a slab > commit-graph: use generation directly when writing commit-graph > commit-graph: minimize commit_graph_data_slab access > > alloc.c | 2 - > blame.c | 2 +- > bloom.c | 7 +- > commit-graph.c | 127 ++++++++++++++++++++++++-------- > commit-graph.h | 10 +++ > commit-reach.c | 69 ++++++++++------- > commit.c | 8 +- > contrib/coccinelle/commit.cocci | 18 +++++ > revision.c | 20 +++-- > 9 files changed, 190 insertions(+), 73 deletions(-) ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v2 0/4] Move generation, graph_pos to a slab 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar ` (4 preceding siblings ...) 2020-06-08 16:22 ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski @ 2020-06-15 16:24 ` Taylor Blau 5 siblings, 0 replies; 39+ messages in thread From: Taylor Blau @ 2020-06-15 16:24 UTC (permalink / raw) To: Abhishek Kumar; +Cc: git, jnareb, stolee Hi Abhishek, I am so excited that you are working on this, and welcome to Git! This change will make a meaningful difference for us at GitHub (who use commit graphs extensively), and it sounds like they will even make a larger difference with your later changes. I was out of the office last week, and so I hadn't gotten a chance to review the first version of this series, but I'll review v2 now... On Mon, Jun 08, 2020 at 01:02:33AM +0530, Abhishek Kumar wrote: > The struct commit is used in many contexts. However, members > `generation` and `graph_pos` are only used for commit graph related > operations and otherwise waste memory. Thanks, Taylor ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v4 0/4] Move generation, graph_pos to a slab 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar ` (5 preceding siblings ...) 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar @ 2020-06-17 9:14 ` Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar ` (4 more replies) 6 siblings, 5 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-17 9:14 UTC (permalink / raw) To: git; +Cc: jnareb, stolee The struct commit is used in many contexts. However, members `generation` and `graph_pos` are only used for commit graph related operations and otherwise waste memory. This wastage would have been more pronounced as we transition to generation nuber v2, which uses 64-bit generation number instead of current 32-bits. While the overall test suite runs as fast as master (series: 26m48s, master: 27m34s, faster by 2.87%), certain commands like `git merge-base --is-ancestor` were slowed by 40% as discovered by Szeder Gábor [1]. After minimizing commit-slab access, the slow down persists but is closer to 20%. Derrick Stolee believes the slow down is attributable to the underlying algorithm rather than the slowness of commit-slab access [2] and we will follow-up in a later series. Abhishek Kumar (4): object: drop parsed_object_pool->commit_count commit-graph: introduce commit_graph_data_slab commit: move members graph_pos, generation to a slab commit-graph: minimize commit_graph_data_slab access alloc.c | 18 +++-- alloc.h | 2 +- blame.c | 2 +- blob.c | 2 +- bloom.c | 7 +- builtin/commit-graph.c | 2 +- builtin/fsck.c | 2 +- commit-graph.c | 130 ++++++++++++++++++++++++-------- commit-graph.h | 10 +++ commit-reach.c | 69 ++++++++++------- commit.c | 12 +-- commit.h | 2 - contrib/coccinelle/commit.cocci | 18 +++++ object.c | 4 +- object.h | 3 +- refs.c | 2 +- revision.c | 20 +++-- t/helper/test-reach.c | 2 +- tag.c | 2 +- tree.c | 2 +- 20 files changed, 217 insertions(+), 94 deletions(-) -- 2.27.0 Changes in v4: - Fix segfault while initializing commit_graph_data_slab. - Fix the "commit index should have unique values" more cleanly Changes in v3: - Introduce alloc commit to fix the failing diff-submodule test. - Elaborate on perforamnce and the slow down in commit message Changes in v2: - Introduce struct commit_graph_data - Merge `graph_pos`, `generation` slabs into a single, `commit_graph_data` slab. - Use graph position for an intermediate check for generation, saving the cost of initializing generation numbers. - Add an follow-up patch caching results of slab access in local variables. - Move coccinelle transformation to commit.coccinelle instead of creating new scripts. - Elaborate on removing default values from init_commit_node(). - Revert moving macro constants (e.g. COMMIT_NOT_FROM_GRAPH, GENERATION_NUMBER_ZERO0 from commit.h to commit-graph.h ^ permalink raw reply [flat|nested] 39+ messages in thread
* [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar @ 2020-06-17 9:14 ` Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar ` (3 subsequent siblings) 4 siblings, 0 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-17 9:14 UTC (permalink / raw) To: git; +Cc: jnareb, stolee 14ba97f8 (alloc: allow arbitrary repositories for alloc functions, 2018-05-15) introduced parsed_object_pool->commit_count to keep count of commits per repository and was used to assign commit->index. However, commit-slab code requires commit->index values to be unique and a global count would be correct, rather than a per-repo count. Let's introduce a static counter variable, `parsed_commits_count` to keep track of parsed commits so far. As commit_count has no use anymore, let's also drop it from the struct. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- alloc.c | 16 +++++++++++----- alloc.h | 2 +- blob.c | 2 +- builtin/commit-graph.c | 2 +- builtin/fsck.c | 2 +- commit.c | 4 ++-- object.c | 4 ++-- object.h | 3 +-- refs.c | 2 +- t/helper/test-reach.c | 2 +- tag.c | 2 +- tree.c | 2 +- 12 files changed, 24 insertions(+), 19 deletions(-) diff --git a/alloc.c b/alloc.c index 1c64c4dd16..99fa934b32 100644 --- a/alloc.c +++ b/alloc.c @@ -99,15 +99,21 @@ void *alloc_object_node(struct repository *r) return obj; } -static unsigned int alloc_commit_index(struct repository *r) +/* + * The returned count is to be used as an index into commit slabs, + * that are *NOT* maintained per repository, and that is why a single + * global counter is used. + */ +static unsigned int alloc_commit_index(void) { - return r->parsed_objects->commit_count++; + static unsigned int parsed_commits_count; + return parsed_commits_count++; } -void init_commit_node(struct repository *r, struct commit *c) +void init_commit_node(struct commit *c) { c->object.type = OBJ_COMMIT; - c->index = alloc_commit_index(r); + c->index = alloc_commit_index(); c->graph_pos = COMMIT_NOT_FROM_GRAPH; c->generation = GENERATION_NUMBER_INFINITY; } @@ -115,7 +121,7 @@ void init_commit_node(struct repository *r, struct commit *c) void *alloc_commit_node(struct repository *r) { struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit)); - init_commit_node(r, c); + init_commit_node(c); return c; } diff --git a/alloc.h b/alloc.h index ed1071c11e..371d388b55 100644 --- a/alloc.h +++ b/alloc.h @@ -9,7 +9,7 @@ struct repository; void *alloc_blob_node(struct repository *r); void *alloc_tree_node(struct repository *r); -void init_commit_node(struct repository *r, struct commit *c); +void init_commit_node(struct commit *c); void *alloc_commit_node(struct repository *r); void *alloc_tag_node(struct repository *r); void *alloc_object_node(struct repository *r); diff --git a/blob.c b/blob.c index 36f9abda19..182718aba9 100644 --- a/blob.c +++ b/blob.c @@ -10,7 +10,7 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid) struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_blob_node(r)); - return object_as_type(r, obj, OBJ_BLOB, 0); + return object_as_type(obj, OBJ_BLOB, 0); } int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 75455da138..f6797e2a9f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -154,7 +154,7 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, NULL, 0); if (!result) return error(_("invalid object: %s"), hash); - else if (object_as_type(the_repository, result, OBJ_COMMIT, 1)) + else if (object_as_type(result, OBJ_COMMIT, 1)) oidset_insert(commits, &result->oid); display_progress(progress, oidset_size(commits)); diff --git a/builtin/fsck.c b/builtin/fsck.c index f02cbdb439..b2cef01389 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -241,7 +241,7 @@ static void mark_unreachable_referents(const struct object_id *oid) enum object_type type = oid_object_info(the_repository, &obj->oid, NULL); if (type > 0) - object_as_type(the_repository, obj, type, 0); + object_as_type(obj, type, 0); } options.walk = mark_used; diff --git a/commit.c b/commit.c index 87686a7055..b30875e66b 100644 --- a/commit.c +++ b/commit.c @@ -37,7 +37,7 @@ struct commit *lookup_commit_reference_gently(struct repository *r, if (!obj) return NULL; - return object_as_type(r, obj, OBJ_COMMIT, quiet); + return object_as_type(obj, OBJ_COMMIT, quiet); } struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid) @@ -62,7 +62,7 @@ struct commit *lookup_commit(struct repository *r, const struct object_id *oid) struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_commit_node(r)); - return object_as_type(r, obj, OBJ_COMMIT, 0); + return object_as_type(obj, OBJ_COMMIT, 0); } struct commit *lookup_commit_reference_by_name(const char *name) diff --git a/object.c b/object.c index 794c86650e..3257518656 100644 --- a/object.c +++ b/object.c @@ -157,13 +157,13 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o) return obj; } -void *object_as_type(struct repository *r, struct object *obj, enum object_type type, int quiet) +void *object_as_type(struct object *obj, enum object_type type, int quiet) { if (obj->type == type) return obj; else if (obj->type == OBJ_NONE) { if (type == OBJ_COMMIT) - init_commit_node(r, (struct commit *) obj); + init_commit_node((struct commit *) obj); else obj->type = type; return obj; diff --git a/object.h b/object.h index b22328b838..532d7d7f28 100644 --- a/object.h +++ b/object.h @@ -15,7 +15,6 @@ struct parsed_object_pool { struct alloc_state *commit_state; struct alloc_state *tag_state; struct alloc_state *object_state; - unsigned commit_count; /* parent substitutions from .git/info/grafts and .git/shallow */ struct commit_graft **grafts; @@ -121,7 +120,7 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid); void *create_object(struct repository *r, const struct object_id *oid, void *obj); -void *object_as_type(struct repository *r, struct object *obj, enum object_type type, int quiet); +void *object_as_type(struct object *obj, enum object_type type, int quiet); /* * Returns the object, having parsed it to find out what it is. diff --git a/refs.c b/refs.c index 224ff66c7b..1f551dd279 100644 --- a/refs.c +++ b/refs.c @@ -339,7 +339,7 @@ enum peel_status peel_object(const struct object_id *name, struct object_id *oid if (o->type == OBJ_NONE) { int type = oid_object_info(the_repository, name, NULL); - if (type < 0 || !object_as_type(the_repository, o, type, 0)) + if (type < 0 || !object_as_type(o, type, 0)) return PEEL_INVALID; } diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index a0272178b7..ccf837cb33 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -67,7 +67,7 @@ int cmd__reach(int ac, const char **av) die("failed to load commit for input %s resulting in oid %s\n", buf.buf, oid_to_hex(&oid)); - c = object_as_type(r, peeled, OBJ_COMMIT, 0); + c = object_as_type(peeled, OBJ_COMMIT, 0); if (!c) die("failed to load commit for input %s resulting in oid %s\n", diff --git a/tag.c b/tag.c index 71b544467e..1ed2684e45 100644 --- a/tag.c +++ b/tag.c @@ -103,7 +103,7 @@ struct tag *lookup_tag(struct repository *r, const struct object_id *oid) struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_tag_node(r)); - return object_as_type(r, obj, OBJ_TAG, 0); + return object_as_type(obj, OBJ_TAG, 0); } static timestamp_t parse_tag_date(const char *buf, const char *tail) diff --git a/tree.c b/tree.c index 1466bcc6a8..e76517f6b1 100644 --- a/tree.c +++ b/tree.c @@ -200,7 +200,7 @@ struct tree *lookup_tree(struct repository *r, const struct object_id *oid) struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_tree_node(r)); - return object_as_type(r, obj, OBJ_TREE, 0); + return object_as_type(obj, OBJ_TREE, 0); } int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size) -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar @ 2020-06-17 9:14 ` Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar ` (2 subsequent siblings) 4 siblings, 0 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-17 9:14 UTC (permalink / raw) To: git; +Cc: jnareb, stolee The struct commit is used in many contexts. However, members `generation` and `graph_pos` are only used for commit-graph related operations and otherwise waste memory. This wastage would have been more pronounced as we transition to generation number v2, which uses 64-bit generation number instead of current 32-bits. As they are often accessed together, let's introduce struct commit_graph_data and move them to a commit_graph_data slab. While the overall test suite runs just as fast as master, (series: 26m48s, master: 27m34s, faster by 2.87%), certain commands like `git merge-base --is-ancestor` were slowed by 40% as discovered by Szeder Gábor [1]. After minimizing commit-slab access, the slow down persists but is closer to 20%. Derrick Stolee believes the slow down is attributable to the underlying algorithm rather than the slowness of commit-slab access [2] and we will follow-up in a later series. [1]: https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/ [2]: https://lore.kernel.org/git/13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- On linux.git with HEAD at 08bf1a27 (Merge tag 'powerpc-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux, 2020-06-13): `git merge-base --is-ancestor HEAD~50000 HEAD` Time (master): 0.787s Time (series): 0.927s Change: 17.79% (slower) Max RSS (master): 177694kb Max RSS (series): 177707kb Change: 0.01% (more) `git gc` Time (master): 3m55s Time (series): 3m38s Change: 7.23% (faster) Max RSS (master): 4889868kb Max RSS (series): 4911960kb Change: 0.45% (more) Earlier implementation of commit_graph_data_at() was incorrect, as we used to iterate from old slab count to new slab count - assuming all intermediate slabs are allocated. This is incorrect as the slabs are allocated only when there's a corresponding commit. It now makes *two slab accesses* in the worst case, but it's okay since the worst case occurs once nearly every (512kb / 8b) commits. commit-graph.c | 78 +++++++++++++++++++++++++++++++++++++++++++------- commit-graph.h | 10 +++++++ 2 files changed, 78 insertions(+), 10 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 2ff042fbf4..8ad7d202b2 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -87,6 +87,58 @@ static int commit_pos_cmp(const void *va, const void *vb) commit_pos_at(&commit_pos, b); } +define_commit_slab(commit_graph_data_slab, struct commit_graph_data); +static struct commit_graph_data_slab commit_graph_data_slab = + COMMIT_SLAB_INIT(1, commit_graph_data_slab); + +uint32_t commit_graph_position(const struct commit *c) +{ + struct commit_graph_data *data = + commit_graph_data_slab_peek(&commit_graph_data_slab, c); + + return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH; +} + +uint32_t commit_graph_generation(const struct commit *c) +{ + struct commit_graph_data *data = + commit_graph_data_slab_peek(&commit_graph_data_slab, c); + + if (!data) + return GENERATION_NUMBER_INFINITY; + else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH) + return GENERATION_NUMBER_INFINITY; + + return data->generation; +} + +static struct commit_graph_data *commit_graph_data_at(const struct commit *c) +{ + unsigned int i, nth_slab; + struct commit_graph_data *data = + commit_graph_data_slab_peek(&commit_graph_data_slab, c); + + if (data) + return data; + + nth_slab = c->index / commit_graph_data_slab.slab_size; + data = commit_graph_data_slab_at(&commit_graph_data_slab, c); + + /* + * commit-slab initializes elements with zero, overwrite this with + * COMMIT_NOT_FROM_GRAPH for graph_pos. + * + * We avoid initializing generation with checking if graph position + * is not COMMIT_NOT_FROM_GRAPH. + */ + for (i = 0; i < commit_graph_data_slab.slab_size; i++) { + commit_graph_data_slab.slab[nth_slab][i].graph_pos = + COMMIT_NOT_FROM_GRAPH; + } + + return data; +} + static int commit_gen_cmp(const void *va, const void *vb) { const struct commit *a = *(const struct commit **)va; @@ -1020,7 +1072,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, else packedDate[0] = 0; - packedDate[0] |= htonl((*list)->generation << 2); + packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1251,9 +1303,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { + uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + display_progress(ctx->progress, i + 1); - if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY && - ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO) + if (generation != GENERATION_NUMBER_INFINITY && + generation != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1264,22 +1318,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) uint32_t max_generation = 0; for (parent = current->parents; parent; parent = parent->next) { - if (parent->item->generation == GENERATION_NUMBER_INFINITY || - parent->item->generation == GENERATION_NUMBER_ZERO) { + generation = commit_graph_data_at(parent->item)->generation; + + if (generation == GENERATION_NUMBER_INFINITY || + generation == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (parent->item->generation > max_generation) { - max_generation = parent->item->generation; + } else if (generation > max_generation) { + max_generation = generation; } } if (all_parents_computed) { - current->generation = max_generation + 1; + struct commit_graph_data *data = commit_graph_data_at(current); + + data->generation = max_generation + 1; pop_commit(&list); - if (current->generation > GENERATION_NUMBER_MAX) - current->generation = GENERATION_NUMBER_MAX; + if (data->generation > GENERATION_NUMBER_MAX) + data->generation = GENERATION_NUMBER_MAX; } } } diff --git a/commit-graph.h b/commit-graph.h index 3ba0da1e5f..28f89cdf3e 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -135,4 +135,14 @@ void free_commit_graph(struct commit_graph *); */ void disable_commit_graph(struct repository *r); +struct commit_graph_data { + uint32_t graph_pos; + uint32_t generation; +}; + +/* + * Commits should be parsed before accessing generation, graph positions. + */ +uint32_t commit_graph_generation(const struct commit *); +uint32_t commit_graph_position(const struct commit *); #endif -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar @ 2020-06-17 9:14 ` Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar 2020-06-19 13:59 ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee 4 siblings, 0 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-17 9:14 UTC (permalink / raw) To: git; +Cc: jnareb, stolee We remove members `graph_pos` and `generation` from the struct commit. The default assignments in init_commit_node() are no longer valid, which is fine as the slab helpers return appropriate default values and the assignments are removed. We will replace existing use of commit->generation and commit->graph_pos by commit_graph_data_slab helpers using `contrib/coccinelle/commit.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- alloc.c | 2 -- blame.c | 2 +- bloom.c | 6 ++-- commit-graph.c | 40 +++++++++++++------------- commit-reach.c | 50 ++++++++++++++++----------------- commit.c | 6 ++-- commit.h | 2 -- contrib/coccinelle/commit.cocci | 18 ++++++++++++ revision.c | 16 +++++------ 9 files changed, 78 insertions(+), 64 deletions(-) diff --git a/alloc.c b/alloc.c index 99fa934b32..957a0af362 100644 --- a/alloc.c +++ b/alloc.c @@ -114,8 +114,6 @@ void init_commit_node(struct commit *c) { c->object.type = OBJ_COMMIT; c->index = alloc_commit_index(); - c->graph_pos = COMMIT_NOT_FROM_GRAPH; - c->generation = GENERATION_NUMBER_INFINITY; } void *alloc_commit_node(struct repository *r) diff --git a/blame.c b/blame.c index da7e28800e..82fa16d658 100644 --- a/blame.c +++ b/blame.c @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r, if (!bd) return 1; - if (origin->commit->generation == GENERATION_NUMBER_INFINITY) + if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; filter = get_bloom_filter(r, origin->commit, 0); diff --git a/bloom.c b/bloom.c index 6c7611847a..3062aafaba 100644 --- a/bloom.c +++ b/bloom.c @@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, { uint32_t lex_pos, start_index, end_index; - while (c->graph_pos < g->num_commits_in_base) + while (commit_graph_position(c) < g->num_commits_in_base) g = g->base_graph; /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ if (!g->chunk_bloom_indexes) return 0; - lex_pos = c->graph_pos - g->num_commits_in_base; + lex_pos = commit_graph_position(c) - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); @@ -193,7 +193,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (c->graph_pos != COMMIT_NOT_FROM_GRAPH && + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && r->objects->commit_graph->chunk_bloom_indexes) { if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) return filter; diff --git a/commit-graph.c b/commit-graph.c index 8ad7d202b2..14cc7e931c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -145,9 +145,9 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *b = *(const struct commit **)vb; /* lower generation commits first */ - if (a->generation < b->generation) + if (commit_graph_generation(a) < commit_graph_generation(b)) return -1; - else if (a->generation > b->generation) + else if (commit_graph_generation(a) > commit_graph_generation(b)) return 1; /* use date as a heuristic when generations are equal */ @@ -722,7 +722,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r, c = lookup_commit(r, &oid); if (!c) die(_("could not find commit %s"), oid_to_hex(&oid)); - c->graph_pos = pos; + commit_graph_data_at(c)->graph_pos = pos; return &commit_list_insert(c, pptr)->next; } @@ -736,8 +736,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; - item->graph_pos = pos; - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + commit_graph_data_at(item)->graph_pos = pos; + commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -766,7 +766,7 @@ static int fill_commit_in_graph(struct repository *r, * Store the "full" position, but then use the * "local" position for the rest of the calculation. */ - item->graph_pos = pos; + commit_graph_data_at(item)->graph_pos = pos; lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; @@ -779,7 +779,7 @@ static int fill_commit_in_graph(struct repository *r, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; pptr = &item->parents; @@ -811,8 +811,8 @@ static int fill_commit_in_graph(struct repository *r, static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) { - if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) { - *pos = item->graph_pos; + if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) { + *pos = commit_graph_position(item); return 1; } else { struct commit_graph *cur_g = g; @@ -868,11 +868,11 @@ static struct tree *load_tree_for_commit(struct repository *r, struct object_id oid; const unsigned char *commit_data; - while (c->graph_pos < g->num_commits_in_base) + while (commit_graph_position(c) < g->num_commits_in_base) g = g->base_graph; commit_data = g->chunk_commit_data + - GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base); + GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base); hashcpy(oid.hash, commit_data); set_commit_tree(c, lookup_tree(r, &oid)); @@ -886,7 +886,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r, { if (c->maybe_tree) return c->maybe_tree; - if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) + if (commit_graph_position(c) == COMMIT_NOT_FROM_GRAPH) BUG("get_commit_tree_in_graph_one called from non-commit-graph commit"); return load_tree_for_commit(r, g, (struct commit *)c); @@ -1271,7 +1271,7 @@ static void close_reachable(struct write_commit_graph_context *ctx) continue; if (ctx->split) { if ((!parse_commit(commit) && - commit->graph_pos == COMMIT_NOT_FROM_GRAPH) || + commit_graph_position(commit) == COMMIT_NOT_FROM_GRAPH) || flags == COMMIT_GRAPH_SPLIT_REPLACE) add_missing_parents(ctx, commit); } else if (!parse_commit_no_graph(commit)) @@ -1516,7 +1516,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) if (ctx->split) { struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]); - if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH) + if (!c || commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) continue; } @@ -1550,7 +1550,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx) ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]); if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE && - ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH) + commit_graph_position(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH) continue; if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE) @@ -2337,8 +2337,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) oid_to_hex(&graph_parents->item->object.oid), oid_to_hex(&odb_parents->item->object.oid)); - if (graph_parents->item->generation > max_generation) - max_generation = graph_parents->item->generation; + if (commit_graph_generation(graph_parents->item) > max_generation) + max_generation = commit_graph_generation(graph_parents->item); graph_parents = graph_parents->next; odb_parents = odb_parents->next; @@ -2348,7 +2348,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) graph_report(_("commit-graph parent list for commit %s terminates early"), oid_to_hex(&cur_oid)); - if (!graph_commit->generation) { + if (!commit_graph_generation(graph_commit)) { if (generation_zero == GENERATION_NUMBER_EXISTS) graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"), oid_to_hex(&cur_oid)); @@ -2368,10 +2368,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (max_generation == GENERATION_NUMBER_MAX) max_generation--; - if (graph_commit->generation != max_generation + 1) + if (commit_graph_generation(graph_commit) != max_generation + 1) graph_report(_("commit-graph generation for commit %s is %u != %u"), oid_to_hex(&cur_oid), - graph_commit->generation, + commit_graph_generation(graph_commit), max_generation + 1); if (graph_commit->date != odb_commit->date) diff --git a/commit-reach.c b/commit-reach.c index 4ca7e706a1..3b2f863f5f 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit_list *parents; int flags; - if (min_generation && commit->generation > last_gen) + if (min_generation && commit_graph_generation(commit) > last_gen) BUG("bad generation skip %8x > %8x at %s", - commit->generation, last_gen, + commit_graph_generation(commit), last_gen, oid_to_hex(&commit->object.oid)); - last_gen = commit->generation; + last_gen = commit_graph_generation(commit); - if (commit->generation < min_generation) + if (commit_graph_generation(commit) < min_generation) break; flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); @@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; - uint32_t min_generation = array[i]->generation; + uint32_t min_generation = commit_graph_generation(array[i]); if (redundant[i]) continue; @@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt filled_index[filled] = j; work[filled++] = array[j]; - if (array[j]->generation < min_generation) - min_generation = array[j]->generation; + if (commit_graph_generation(array[j]) < min_generation) + min_generation = commit_graph_generation(array[j]); } common = paint_down_to_common(r, array[i], filled, work, min_generation); @@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, for (i = 0; i < nr_reference; i++) { if (repo_parse_commit(r, reference[i])) return ret; - if (reference[i]->generation < min_generation) - min_generation = reference[i]->generation; + if (commit_graph_generation(reference[i]) < min_generation) + min_generation = commit_graph_generation(reference[i]); } - if (commit->generation > min_generation) + if (commit_graph_generation(commit) > min_generation) return ret; bases = paint_down_to_common(r, commit, nr_reference, reference, - commit->generation); + commit_graph_generation(commit)); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); @@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate, /* Otherwise, we don't know; prepare to recurse */ parse_commit_or_die(candidate); - if (candidate->generation < cutoff) + if (commit_graph_generation(candidate) < cutoff) return CONTAINS_NO; return CONTAINS_UNKNOWN; @@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate, for (p = want; p; p = p->next) { struct commit *c = p->item; load_commit_graph_info(the_repository, c); - if (c->generation < cutoff) - cutoff = c->generation; + if (commit_graph_generation(c) < cutoff) + cutoff = commit_graph_generation(c); } result = contains_test(candidate, want, cache, cutoff); @@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - if (a->generation < b->generation) + if (commit_graph_generation(a) < commit_graph_generation(b)) return -1; - if (a->generation > b->generation) + if (commit_graph_generation(a) > commit_graph_generation(b)) return 1; return 0; } @@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from, list[nr_commits] = (struct commit *)from_one; if (parse_commit(list[nr_commits]) || - list[nr_commits]->generation < min_generation) { + commit_graph_generation(list[nr_commits]) < min_generation) { result = 0; goto cleanup; } @@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from, if (parse_commit(parent->item) || parent->item->date < min_commit_date || - parent->item->generation < min_generation) + commit_graph_generation(parent->item) < min_generation) continue; commit_list_insert(parent->item, &stack); @@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; - if (from_iter->item->generation < min_generation) - min_generation = from_iter->item->generation; + if (commit_graph_generation(from_iter->item) < min_generation) + min_generation = commit_graph_generation(from_iter->item); } from_iter = from_iter->next; @@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; - if (to_iter->item->generation < min_generation) - min_generation = to_iter->item->generation; + if (commit_graph_generation(to_iter->item) < min_generation) + min_generation = commit_graph_generation(to_iter->item); } to_iter->item->object.flags |= PARENT2; @@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit *c = *item; parse_commit(c); - if (c->generation < min_generation) - min_generation = c->generation; + if (commit_graph_generation(c) < min_generation) + min_generation = commit_graph_generation(c); if (!(c->object.flags & PARENT1)) { c->object.flags |= PARENT1; @@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, parse_commit(p); - if (p->generation < min_generation) + if (commit_graph_generation(p) < min_generation) continue; if (p->object.flags & PARENT2) diff --git a/commit.c b/commit.c index b30875e66b..ed0917a2c7 100644 --- a/commit.c +++ b/commit.c @@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r, if (commit->maybe_tree || !commit->object.parsed) return commit->maybe_tree; - if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH) + if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH) return get_commit_tree_in_graph(r, commit); return NULL; @@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void const struct commit *a = a_, *b = b_; /* newer commits first */ - if (a->generation < b->generation) + if (commit_graph_generation(a) < commit_graph_generation(b)) return 1; - else if (a->generation > b->generation) + else if (commit_graph_generation(a) > commit_graph_generation(b)) return -1; /* use date as a heuristic when generations are equal */ diff --git a/commit.h b/commit.h index 1b2dea5d85..e901538909 100644 --- a/commit.h +++ b/commit.h @@ -36,8 +36,6 @@ struct commit { * or get_commit_tree_oid(). */ struct tree *maybe_tree; - uint32_t graph_pos; - uint32_t generation; unsigned int index; }; diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci index 778e4704f6..af6dd4c20c 100644 --- a/contrib/coccinelle/commit.cocci +++ b/contrib/coccinelle/commit.cocci @@ -32,3 +32,21 @@ expression c; - c->maybe_tree + repo_get_commit_tree(specify_the_right_repo_here, c) ...>} + +@@ +struct commit *c; +expression E; +@@ +( +- c->generation = E; ++ commit_graph_data_at(c)->generation = E; +| +- c->graph_pos = E; ++ commit_graph_data_at(c)->graph_pos = E; +| +- c->generation ++ commit_graph_generation(c) +| +- c->graph_pos ++ commit_graph_position(c) +) diff --git a/revision.c b/revision.c index ebb4d2a0f2..8648d7c43c 100644 --- a/revision.c +++ b/revision.c @@ -725,7 +725,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (!revs->repo->objects->commit_graph) return -1; - if (commit->generation == GENERATION_NUMBER_INFINITY) + if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; filter = get_bloom_filter(revs->repo, commit, 0); @@ -3320,7 +3320,7 @@ static void explore_to_depth(struct rev_info *revs, struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; while ((c = prio_queue_peek(&info->explore_queue)) && - c->generation >= gen_cutoff) + commit_graph_generation(c) >= gen_cutoff) explore_walk_step(revs); } @@ -3336,7 +3336,7 @@ static void indegree_walk_step(struct rev_info *revs) if (parse_commit_gently(c, 1) < 0) return; - explore_to_depth(revs, c->generation); + explore_to_depth(revs, commit_graph_generation(c)); for (p = c->parents; p; p = p->next) { struct commit *parent = p->item; @@ -3360,7 +3360,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs, struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; while ((c = prio_queue_peek(&info->indegree_queue)) && - c->generation >= gen_cutoff) + commit_graph_generation(c) >= gen_cutoff) indegree_walk_step(revs); } @@ -3420,8 +3420,8 @@ static void init_topo_walk(struct rev_info *revs) test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); - if (c->generation < info->min_generation) - info->min_generation = c->generation; + if (commit_graph_generation(c) < info->min_generation) + info->min_generation = commit_graph_generation(c); *(indegree_slab_at(&info->indegree, c)) = 1; @@ -3479,8 +3479,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) if (parse_commit_gently(parent, 1) < 0) continue; - if (parent->generation < info->min_generation) { - info->min_generation = parent->generation; + if (commit_graph_generation(parent) < info->min_generation) { + info->min_generation = commit_graph_generation(parent); compute_indegrees_to_depth(revs, info->min_generation); } -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar ` (2 preceding siblings ...) 2020-06-17 9:14 ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar @ 2020-06-17 9:14 ` Abhishek Kumar 2020-06-19 13:59 ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee 4 siblings, 0 replies; 39+ messages in thread From: Abhishek Kumar @ 2020-06-17 9:14 UTC (permalink / raw) To: git; +Cc: jnareb, stolee In an earlier patch, multiple struct acccesses to `graph_pos` and `generation` were auto-converted to multiple method calls. Since the values are fixed and commit-slab access costly, we would be better off with storing the values as a local variable and reusing it. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> --- bloom.c | 5 +++-- commit-graph.c | 40 ++++++++++++++++++++++------------ commit-reach.c | 59 ++++++++++++++++++++++++++++++++------------------ commit.c | 6 +++-- revision.c | 12 ++++++---- 5 files changed, 79 insertions(+), 43 deletions(-) diff --git a/bloom.c b/bloom.c index 3062aafaba..6a7f2f2bdc 100644 --- a/bloom.c +++ b/bloom.c @@ -33,15 +33,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, struct commit *c) { uint32_t lex_pos, start_index, end_index; + uint32_t graph_pos = commit_graph_position(c); - while (commit_graph_position(c) < g->num_commits_in_base) + while (graph_pos < g->num_commits_in_base) g = g->base_graph; /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ if (!g->chunk_bloom_indexes) return 0; - lex_pos = commit_graph_position(c) - g->num_commits_in_base; + lex_pos = graph_pos - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); diff --git a/commit-graph.c b/commit-graph.c index 14cc7e931c..fdd1c4fa7c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -144,10 +144,12 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; + uint32_t generation_a = commit_graph_generation(a); + uint32_t generation_b = commit_graph_generation(b); /* lower generation commits first */ - if (commit_graph_generation(a) < commit_graph_generation(b)) + if (generation_a < generation_b) return -1; - else if (commit_graph_generation(a) > commit_graph_generation(b)) + else if (generation_a > generation_b) return 1; /* use date as a heuristic when generations are equal */ @@ -729,6 +731,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r, static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos) { const unsigned char *commit_data; + struct commit_graph_data *graph_data; uint32_t lex_index; while (pos < g->num_commits_in_base) @@ -736,8 +739,10 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; - commit_graph_data_at(item)->graph_pos = pos; - commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + + graph_data = commit_graph_data_at(item); + graph_data->graph_pos = pos; + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -753,6 +758,7 @@ static int fill_commit_in_graph(struct repository *r, uint32_t *parent_data_ptr; uint64_t date_low, date_high; struct commit_list **pptr; + struct commit_graph_data *graph_data; const unsigned char *commit_data; uint32_t lex_index; @@ -766,7 +772,8 @@ static int fill_commit_in_graph(struct repository *r, * Store the "full" position, but then use the * "local" position for the rest of the calculation. */ - commit_graph_data_at(item)->graph_pos = pos; + graph_data = commit_graph_data_at(item); + graph_data->graph_pos = pos; lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; @@ -779,7 +786,7 @@ static int fill_commit_in_graph(struct repository *r, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; pptr = &item->parents; @@ -811,8 +818,9 @@ static int fill_commit_in_graph(struct repository *r, static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) { - if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) { - *pos = commit_graph_position(item); + uint32_t graph_pos = commit_graph_position(item); + if (graph_pos != COMMIT_NOT_FROM_GRAPH) { + *pos = graph_pos; return 1; } else { struct commit_graph *cur_g = g; @@ -867,12 +875,13 @@ static struct tree *load_tree_for_commit(struct repository *r, { struct object_id oid; const unsigned char *commit_data; + uint32_t graph_pos = commit_graph_position(c); - while (commit_graph_position(c) < g->num_commits_in_base) + while (graph_pos < g->num_commits_in_base) g = g->base_graph; commit_data = g->chunk_commit_data + - GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base); + GRAPH_DATA_WIDTH * (graph_pos - g->num_commits_in_base); hashcpy(oid.hash, commit_data); set_commit_tree(c, lookup_tree(r, &oid)); @@ -2299,6 +2308,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) struct commit *graph_commit, *odb_commit; struct commit_list *graph_parents, *odb_parents; uint32_t max_generation = 0; + uint32_t generation; display_progress(progress, i + 1); hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -2337,8 +2347,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) oid_to_hex(&graph_parents->item->object.oid), oid_to_hex(&odb_parents->item->object.oid)); - if (commit_graph_generation(graph_parents->item) > max_generation) - max_generation = commit_graph_generation(graph_parents->item); + generation = commit_graph_generation(graph_parents->item); + if (generation > max_generation) + max_generation = generation; graph_parents = graph_parents->next; odb_parents = odb_parents->next; @@ -2368,10 +2379,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (max_generation == GENERATION_NUMBER_MAX) max_generation--; - if (commit_graph_generation(graph_commit) != max_generation + 1) + generation = commit_graph_generation(graph_commit); + if (generation != max_generation + 1) graph_report(_("commit-graph generation for commit %s is %u != %u"), oid_to_hex(&cur_oid), - commit_graph_generation(graph_commit), + generation, max_generation + 1); if (graph_commit->date != odb_commit->date) diff --git a/commit-reach.c b/commit-reach.c index 3b2f863f5f..f5e5c0a32b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -58,14 +58,15 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit *commit = prio_queue_get(&queue); struct commit_list *parents; int flags; + uint32_t generation = commit_graph_generation(commit); - if (min_generation && commit_graph_generation(commit) > last_gen) + if (min_generation && generation > last_gen) BUG("bad generation skip %8x > %8x at %s", - commit_graph_generation(commit), last_gen, + generation, last_gen, oid_to_hex(&commit->object.oid)); - last_gen = commit_graph_generation(commit); + last_gen = generation; - if (commit_graph_generation(commit) < min_generation) + if (generation < min_generation) break; flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); @@ -181,13 +182,15 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt if (redundant[i]) continue; for (j = filled = 0; j < cnt; j++) { + uint32_t curr_generation; if (i == j || redundant[j]) continue; filled_index[filled] = j; work[filled++] = array[j]; - if (commit_graph_generation(array[j]) < min_generation) - min_generation = commit_graph_generation(array[j]); + curr_generation = commit_graph_generation(array[j]); + if (curr_generation < min_generation) + min_generation = curr_generation; } common = paint_down_to_common(r, array[i], filled, work, min_generation); @@ -316,23 +319,26 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, { struct commit_list *bases; int ret = 0, i; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY; if (repo_parse_commit(r, commit)) return ret; for (i = 0; i < nr_reference; i++) { if (repo_parse_commit(r, reference[i])) return ret; - if (commit_graph_generation(reference[i]) < min_generation) - min_generation = commit_graph_generation(reference[i]); + + generation = commit_graph_generation(reference[i]); + if (generation < min_generation) + min_generation = generation; } - if (commit_graph_generation(commit) > min_generation) + generation = commit_graph_generation(commit); + if (generation > min_generation) return ret; bases = paint_down_to_common(r, commit, nr_reference, reference, - commit_graph_generation(commit)); + generation); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); @@ -490,10 +496,12 @@ static enum contains_result contains_tag_algo(struct commit *candidate, const struct commit_list *p; for (p = want; p; p = p->next) { + uint32_t generation; struct commit *c = p->item; load_commit_graph_info(the_repository, c); - if (commit_graph_generation(c) < cutoff) - cutoff = commit_graph_generation(c); + generation = commit_graph_generation(c); + if (generation < cutoff) + cutoff = generation; } result = contains_test(candidate, want, cache, cutoff); @@ -544,9 +552,12 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - if (commit_graph_generation(a) < commit_graph_generation(b)) + uint32_t generation_a = commit_graph_generation(a); + uint32_t generation_b = commit_graph_generation(b); + + if (generation_a < generation_b) return -1; - if (commit_graph_generation(a) > commit_graph_generation(b)) + if (generation_a > generation_b) return 1; return 0; } @@ -662,11 +673,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, add_object_array(&from_iter->item->object, NULL, &from_objs); if (!parse_commit(from_iter->item)) { + uint32_t generation; if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; - if (commit_graph_generation(from_iter->item) < min_generation) - min_generation = commit_graph_generation(from_iter->item); + generation = commit_graph_generation(from_iter->item); + if (generation < min_generation) + min_generation = generation; } from_iter = from_iter->next; @@ -674,11 +687,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, while (to_iter) { if (!parse_commit(to_iter->item)) { + uint32_t generation; if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; - if (commit_graph_generation(to_iter->item) < min_generation) - min_generation = commit_graph_generation(to_iter->item); + generation = commit_graph_generation(to_iter->item); + if (generation < min_generation) + min_generation = generation; } to_iter->item->object.flags |= PARENT2; @@ -718,11 +733,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; for (item = to; item < to_last; item++) { + uint32_t generation; struct commit *c = *item; parse_commit(c); - if (commit_graph_generation(c) < min_generation) - min_generation = commit_graph_generation(c); + generation = commit_graph_generation(c); + if (generation < min_generation) + min_generation = generation; if (!(c->object.flags & PARENT1)) { c->object.flags |= PARENT1; diff --git a/commit.c b/commit.c index ed0917a2c7..43d29a800d 100644 --- a/commit.c +++ b/commit.c @@ -729,11 +729,13 @@ int compare_commits_by_author_date(const void *a_, const void *b_, int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused) { const struct commit *a = a_, *b = b_; + const uint32_t generation_a = commit_graph_generation(a), + generation_b = commit_graph_generation(b); /* newer commits first */ - if (commit_graph_generation(a) < commit_graph_generation(b)) + if (generation_a < generation_b) return 1; - else if (commit_graph_generation(a) > commit_graph_generation(b)) + else if (generation_a > generation_b) return -1; /* use date as a heuristic when generations are equal */ diff --git a/revision.c b/revision.c index 8648d7c43c..32be93f404 100644 --- a/revision.c +++ b/revision.c @@ -3413,6 +3413,7 @@ static void init_topo_walk(struct rev_info *revs) info->min_generation = GENERATION_NUMBER_INFINITY; for (list = revs->commits; list; list = list->next) { struct commit *c = list->item; + uint32_t generation; if (parse_commit_gently(c, 1)) continue; @@ -3420,8 +3421,9 @@ static void init_topo_walk(struct rev_info *revs) test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED); test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE); - if (commit_graph_generation(c) < info->min_generation) - info->min_generation = commit_graph_generation(c); + generation = commit_graph_generation(c); + if (generation < info->min_generation) + info->min_generation = generation; *(indegree_slab_at(&info->indegree, c)) = 1; @@ -3472,6 +3474,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) for (p = commit->parents; p; p = p->next) { struct commit *parent = p->item; int *pi; + uint32_t generation; if (parent->object.flags & UNINTERESTING) continue; @@ -3479,8 +3482,9 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) if (parse_commit_gently(parent, 1) < 0) continue; - if (commit_graph_generation(parent) < info->min_generation) { - info->min_generation = commit_graph_generation(parent); + generation = commit_graph_generation(parent); + if (generation < info->min_generation) { + info->min_generation = generation; compute_indegrees_to_depth(revs, info->min_generation); } -- 2.27.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v4 0/4] Move generation, graph_pos to a slab 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar ` (3 preceding siblings ...) 2020-06-17 9:14 ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar @ 2020-06-19 13:59 ` Derrick Stolee 2020-06-19 17:44 ` Junio C Hamano 4 siblings, 1 reply; 39+ messages in thread From: Derrick Stolee @ 2020-06-19 13:59 UTC (permalink / raw) To: Abhishek Kumar, git; +Cc: jnareb, SZEDER Gábor, Taylor Blau On 6/17/2020 5:14 AM, Abhishek Kumar wrote: > The struct commit is used in many contexts. However, members > `generation` and `graph_pos` are only used for commit graph related > operations and otherwise waste memory. > > This wastage would have been more pronounced as we transition to > generation nuber v2, which uses 64-bit generation number instead of > current 32-bits. Thanks, Szeder (CC'd) for the quality review in the previous versions. I manually built and tested all of the patches here and verified they passed all tests. I think this series is in good shape. Thanks, -Stolee ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [GSOC Patch v4 0/4] Move generation, graph_pos to a slab 2020-06-19 13:59 ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee @ 2020-06-19 17:44 ` Junio C Hamano 0 siblings, 0 replies; 39+ messages in thread From: Junio C Hamano @ 2020-06-19 17:44 UTC (permalink / raw) To: Derrick Stolee Cc: Abhishek Kumar, git, jnareb, SZEDER Gábor, Taylor Blau Derrick Stolee <stolee@gmail.com> writes: > On 6/17/2020 5:14 AM, Abhishek Kumar wrote: >> The struct commit is used in many contexts. However, members >> `generation` and `graph_pos` are only used for commit graph related >> operations and otherwise waste memory. >> >> This wastage would have been more pronounced as we transition to >> generation nuber v2, which uses 64-bit generation number instead of >> current 32-bits. > > Thanks, Szeder (CC'd) for the quality review in the previous > versions. I manually built and tested all of the patches here > and verified they passed all tests. > > I think this series is in good shape. Thank you to all who are involved in this topic. Looking good. ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2020-06-19 17:44 UTC | newest] Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-04 7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar 2020-06-04 7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar 2020-06-04 14:36 ` Derrick Stolee 2020-06-04 17:35 ` Junio C Hamano 2020-06-05 23:23 ` Jakub Narębski 2020-06-04 7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar 2020-06-04 14:27 ` Derrick Stolee 2020-06-04 17:49 ` Junio C Hamano 2020-06-06 22:03 ` Jakub Narębski 2020-06-04 7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar 2020-06-07 12:12 ` Jakub Narębski 2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee 2020-06-04 17:55 ` Junio C Hamano 2020-06-07 19:53 ` SZEDER Gábor 2020-06-08 5:48 ` Abhishek Kumar 2020-06-08 8:36 ` SZEDER Gábor 2020-06-08 13:45 ` Derrick Stolee 2020-06-08 16:46 ` SZEDER Gábor 2020-06-08 15:21 ` Jakub Narębski 2020-06-05 19:00 ` Jakub Narębski 2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar 2020-06-07 19:32 ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar 2020-06-15 16:27 ` Taylor Blau 2020-06-07 19:32 ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar 2020-06-08 8:26 ` SZEDER Gábor 2020-06-08 12:35 ` Derrick Stolee 2020-06-07 19:32 ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar 2020-06-08 16:31 ` Jakub Narębski 2020-06-15 16:31 ` Taylor Blau 2020-06-07 19:32 ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar 2020-06-08 16:22 ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski 2020-06-15 16:24 ` Taylor Blau 2020-06-17 9:14 ` [GSOC Patch v4 " Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar 2020-06-17 9:14 ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar 2020-06-19 13:59 ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee 2020-06-19 17:44 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).