From: Jeff King <peff@peff.net> To: git@vger.kernel.org Cc: "Junio C Hamano" <gitster@pobox.com>, "Jakub Narebski" <jnareb@gmail.com>, "Ted Ts\'o" <tytso@mit.edu>, "Jonathan Nieder" <jrnieder@gmail.com>, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, "Clemens Buchacher" <drizzd@aon.at>, "Shawn O. Pearce" <spearce@spearce.org> Subject: [RFC/PATCHv2 3/6] commit: add commit_generation function Date: Wed, 13 Jul 2011 03:05:17 -0400 Message-ID: <20110713070517.GC18566@sigill.intra.peff.net> (raw) In-Reply-To: <20110713064709.GA18499@sigill.intra.peff.net> A commit's generation is its height in the history graph, as measured from the farthest root. It is defined as: - If the commit has no parents, then its generation is 0. - Otherwise, its generation is 1 more than the maximum of its parents generations. The following diagram shows a sample history with generations: A(0)--B(1)--C(2)------G(5)--H(6) \ / D(2)--E(3)--F(4) Note that C and D have the same generation, as they are both children of B. Note also that the merge commit G's generation is 5, not 3, as we take the maximum of its parents' generations. Generation numbers can be useful for bounding traversals. For example, if we have two commits with generations 500 and 600, we know that the second cannot be an ancestor of the first. The first could be an ancestor of the second, but we can't know unless we traverse the history graph. However, when walking backwards from the "600" commit, once we reach generation "499", we know that the "500" commit cannot be an ancestor of the "499" commit, and we can stop the traversal without even looking at the earlier parts of the history. We already do something similar with commit timestamps in many traversals. However, timestamps are somewhat untrustworthy, as we have to deal with clock skew and with imports from buggy systems. Generation numbers are easy to calculate recursively, though you have to go to the roots to do so. This patch calculates and stores them in a persistent cache. It uses a simple recursive algorithm; you could probably drop the recursion by topologically sorting a list of all commits and filling in generation numbers left to right. But the recursive definition coupled with the cache make it very cheap to calculate generation numbers for new commits at the tip of history (you only have to traverse back to the last cached parents). We could also store generation numbers in the commit header directly. These would be faster to look at than an external cache (they would be on par speed-wise with commit timestamps). But there are a few reasons not to: 1. The reason to avoid commit timestamps is that they are unreliable. Generation numbers would probably be more reliable, but they are still redundant with the actual graph structure represented by the parent pointers, and can therefore be out of sync with the parent information. By calculating them ourselves, we know they are correct. 2. With grafts and replacement objects, the graph structure (and thus the generation numbers) can be changed. So the generation number, while immutable for a given commit object, can be changed when we "lie" about the graph structure via these mechanisms. Being able to simply clear the cache when these things are changed is helpful. 3. There's a lot of existing git history without generation headers. So we'd still need to have the same cache to handle those cases. 4. It generally pollutes the header with redundant information, which we try to avoid. Putting it in the commit header is purely a speedup, and it seems we can get similar performance with a generation cache. Signed-off-by: Jeff King <peff@peff.net> --- Same as before, but rebased onto the new metadata-cache interface. commit.c | 36 ++++++++++++++++++++++++++++++++++++ commit.h | 2 ++ 2 files changed, 38 insertions(+), 0 deletions(-) diff --git a/commit.c b/commit.c index ac337c7..fb37aa0 100644 --- a/commit.c +++ b/commit.c @@ -6,6 +6,7 @@ #include "diff.h" #include "revision.h" #include "notes.h" +#include "metadata-cache.h" int save_commit_buffer = 1; @@ -878,3 +879,38 @@ int commit_tree(const char *msg, unsigned char *tree, strbuf_release(&buffer); return result; } + +static struct metadata_cache generations = + METADATA_CACHE_INIT("generations", sizeof(uint32_t), NULL); + +static unsigned long commit_generation_recurse(struct commit *c) +{ + struct commit_list *p; + uint32_t r; + + if (!metadata_cache_lookup_uint32(&generations, &c->object, &r)) + return r; + + if (parse_commit(c) < 0) + die("unable to parse commit: %s", sha1_to_hex(c->object.sha1)); + + if (!c->parents) + return 0; + + r = 0; + for (p = c->parents; p; p = p->next) { + unsigned long pgen = commit_generation_recurse(p->item); + if (pgen > r) + r = pgen; + } + r++; + + metadata_cache_add_uint32(&generations, &c->object, r); + return r; +} + +unsigned long commit_generation(const struct commit *commit) +{ + /* drop const because we may call parse_commit */ + return commit_generation_recurse((struct commit *)commit); +} diff --git a/commit.h b/commit.h index a2d571b..bff6b36 100644 --- a/commit.h +++ b/commit.h @@ -176,4 +176,6 @@ extern int commit_tree(const char *msg, unsigned char *tree, struct commit_list *parents, unsigned char *ret, const char *author); +unsigned long commit_generation(const struct commit *commit); + #endif /* COMMIT_H */ -- 1.7.6.37.g989c6
next prev parent reply index Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top 2011-07-13 6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King 2011-07-13 6:57 ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jeff King 2011-07-13 17:52 ` Jonathan Nieder 2011-07-13 20:08 ` Jeff King 2011-07-14 17:34 ` Jeff King 2011-07-14 17:51 ` [PATCH 1/3] implement generic key/value map Jeff King 2011-07-14 18:52 ` Bert Wesarg 2011-07-14 18:54 ` Bert Wesarg 2011-07-14 18:55 ` Jeff King 2011-07-14 19:07 ` Bert Wesarg 2011-07-14 19:14 ` Jeff King 2011-07-14 19:18 ` Bert Wesarg 2011-07-14 17:52 ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King 2011-07-15 9:40 ` Sverre Rabbelier 2011-07-15 20:00 ` Jeff King 2011-07-14 17:53 ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King 2011-07-14 21:06 ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano 2011-08-04 22:43 ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King 2011-08-04 22:45 ` [PATCH 1/5] implement generic key/value map Jeff King 2011-08-04 22:46 ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King 2011-08-04 22:46 ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King 2011-08-04 22:46 ` [PATCH 4/5] map: implement persistent maps Jeff King 2011-08-04 22:46 ` [PATCH 5/5] implement metadata cache subsystem Jeff King 2011-08-05 11:03 ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King 2011-08-05 15:31 ` René Scharfe 2011-08-06 6:30 ` Jeff King 2011-07-13 7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King 2011-07-13 8:18 ` Bert Wesarg 2011-07-13 8:31 ` Jeff King 2011-07-13 8:45 ` Bert Wesarg 2011-07-13 19:18 ` Jeff King 2011-07-13 19:40 ` Junio C Hamano 2011-07-13 19:33 ` Junio C Hamano 2011-07-13 20:25 ` Jeff King 2011-07-13 7:05 ` Jeff King [this message] 2011-07-13 14:26 ` [RFC/PATCHv2 3/6] commit: add commit_generation function Eric Sunshine 2011-07-13 7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King 2011-07-13 7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King 2011-07-13 14:26 ` Eric Sunshine 2011-07-13 19:35 ` Jeff King 2011-07-13 7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King 2011-07-13 7:23 ` Jeff King 2011-07-13 20:33 ` Junio C Hamano 2011-07-13 20:58 ` Jeff King 2011-07-13 21:12 ` Junio C Hamano 2011-07-13 21:18 ` Jeff King 2011-07-15 18:22 ` Junio C Hamano 2011-07-15 20:40 ` Jeff King 2011-07-15 21:04 ` Junio C Hamano 2011-07-15 21:14 ` Jeff King 2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski 2011-07-15 21:10 ` Jeff King 2011-07-16 21:10 ` Jakub Narebski 2011-08-04 22:48 [RFC/PATCH 0/2] patch-id caching Jeff King 2011-08-04 22:49 ` [PATCH 1/2] cherry: read default config Jeff King 2011-08-04 22:49 ` [PATCH 2/2] cache patch ids on disk Jeff King 2011-08-04 22:52 ` Jeff King
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20110713070517.GC18566@sigill.intra.peff.net \ --to=peff@peff.net \ --cc=avarab@gmail.com \ --cc=drizzd@aon.at \ --cc=git@vger.kernel.org \ --cc=gitster@pobox.com \ --cc=jnareb@gmail.com \ --cc=jrnieder@gmail.com \ --cc=spearce@spearce.org \ --cc=tytso@mit.edu \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Mailing List Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/git/0 git/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 git git/ https://lore.kernel.org/git \ git@vger.kernel.org public-inbox-index git Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git