git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "Jakub Narebski" <jnareb@gmail.com>,
	"Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>
Subject: Re: [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation
Date: Fri, 15 Jul 2011 11:22:03 -0700	[thread overview]
Message-ID: <7vpqlb1k1g.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <20110713070644.GF18566@sigill.intra.peff.net> (Jeff King's message of "Wed, 13 Jul 2011 03:06:44 -0400")

Jeff King <peff@peff.net> writes:

> diff --git a/builtin/tag.c b/builtin/tag.c
> index 63bce6e..df6de47 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -40,7 +40,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
>  }
>  
>  static int contains_recurse(struct commit *candidate,
> -			    const struct commit_list *want)
> +			    const struct commit_list *want,
> +			    unsigned long cutoff)
>  {
>  	struct commit_list *p;
>  
> @@ -57,9 +58,13 @@ static int contains_recurse(struct commit *candidate,
>  	if (parse_commit(candidate) < 0)
>  		return 0;
>  
> +	/* stop searching if we go too far back in time */
> +	if (commit_generation(candidate) < cutoff)
> +		return 0;
> +

Here, the "generation number" was the commit timestamp of the commit in
your earlier round, but now it is not.

I agree with Linus that for the purpose of "rev-list A ^B ^C" computation,
"generation number" is a much better thing to use than the commit
timestamp, and I also think if we want to revamp the codepath around
still_interesting() in revision.c, Linus's "let's add generation header in
new commits" is a good first step. I haven't stared at that codepath long
enough lately to say for sure, but I suspect that in that codepath we
could use generation number in commits when available and fall back to
timestamp with slop without recomputing the generation number and caching
for older commits.

But that is _not_ the codepath your series is about.

What you are trying to say in this series is that "If a tag points at a
commit X (i.e. candidate), another commit Y (i.e. "want") that is much
younger than cannot possibly be included by it, because a tag was made on
commit X way before Y was created". You cut off by "age".

The heuristics would work efficiently to check what tags point at a
relatively recent commit (e.g. when trying to see which releases are
affected and needing a fix by a recently discovered bug introduced by
commit Y) by discarding tags for ancient releases. In such a scenario, the
timestamp of a tagged and ancient commit X on a side branch that was never
merged in the mainline that leads to commit Y (i.e. "want"), in an ideal
world without clock skews, will be way older than the timestamp of commit
Y. In other words, if you use timestamp as "age", even though X and Y do
not relate to each other directly, except that they may share an ancient
common ancestor, their "age"s can be compared and you could apply your
heuristics to optimize.

But if you use the generation number as "age", even in an ideal world
without clock skew nor miscomputed generation number, you no longer can
compare "age" of X and Y.  The ancient side branch that led to X may have
tons more commits than the history leading to Y.

So I have two tangents here.

 * As to revision traversal, the reason we do the SLOP in
   still_interesting() is to avoid the issue arising from the following
   topology:

              5
             /
     ---2---4---...---1
         \
          3---6

   where numbers denote the timestamp of the commit (1 incorrectly records
   ancient timestamp, while everybody else has newer timestamp than its
   parents). When we try to "rev-list 5 --not 1 6", we start from having
   ~1, ~6 and 5 to "currently active" list, and iteratively pick more
   recent commits from the active list to dig deeper while newly found
   ancestors back to the active list. ~6 leads to 3 marked as
   uninteresting and active list gets ~3 while ~6 is removed. 5 leads to 4
   marked as interesting and 4 gets inserted in the list while 5 is
   removed. 4 finds 2 to also be interesting. Then ~3 finds 2 to be
   uninteresting. At that point, we have ~1 (still not processed) and ~2
   in the active list, while we have 5 and 4 in the result list. We need
   to dig through ~1 to realize that 4 is reachable from it to mark it
   uninteresting.

   So how about marking commits (using the metainfo-cache facility) that
   has an ancestor (not necessarily its direct parent) that records a
   younger timestamp (e.g. 1 is such a commit, as its ancestors include
   things like 2 and 4)? There should be relatively small number of them,
   and still_interesting() logic can be told to dig through such commits
   even if everybody is uninteresting in the active list.

 * As to "tag --contains", when timestamp based heuristics breaks down is
   when a tagged commit incorrectly records way young timestamp or the
   "want" commit records way old timetsamp. I haven't thought things
   through, but the same metainfo-cache may be useful to detect which
   commit to dig through ignoring the cutoff heuristics.

  parent reply	other threads:[~2011-07-15 18:22 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-13  6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King
2011-07-13  6:57 ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jeff King
2011-07-13 17:52   ` Jonathan Nieder
2011-07-13 20:08     ` Jeff King
2011-07-14 17:34       ` Jeff King
2011-07-14 17:51         ` [PATCH 1/3] implement generic key/value map Jeff King
2011-07-14 18:52           ` Bert Wesarg
2011-07-14 18:54             ` Bert Wesarg
2011-07-14 18:55               ` Jeff King
2011-07-14 19:07                 ` Bert Wesarg
2011-07-14 19:14                   ` Jeff King
2011-07-14 19:18                     ` Bert Wesarg
2011-07-14 17:52         ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-07-15  9:40           ` Sverre Rabbelier
2011-07-15 20:00             ` Jeff King
2011-07-14 17:53         ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King
2011-07-14 21:06         ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano
2011-08-04 22:43           ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-04 22:45             ` [PATCH 1/5] implement generic key/value map Jeff King
2011-08-04 22:46             ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-08-04 22:46             ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King
2011-08-04 22:46             ` [PATCH 4/5] map: implement persistent maps Jeff King
2011-08-04 22:46             ` [PATCH 5/5] implement metadata cache subsystem Jeff King
2011-08-04 22:48             ` [RFC/PATCH 0/2] patch-id caching Jeff King
2011-08-04 22:49               ` [PATCH 1/2] cherry: read default config Jeff King
2011-08-04 22:49               ` [PATCH 2/2] cache patch ids on disk Jeff King
2011-08-04 22:52                 ` Jeff King
2011-08-05 11:03             ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-05 15:31               ` René Scharfe
2011-08-06  6:30                 ` Jeff King
2011-07-13  7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King
2011-07-13  8:18   ` Bert Wesarg
2011-07-13  8:31     ` Jeff King
2011-07-13  8:45       ` Bert Wesarg
2011-07-13 19:18         ` Jeff King
2011-07-13 19:40       ` Junio C Hamano
2011-07-13 19:33   ` Junio C Hamano
2011-07-13 20:25     ` Jeff King
2011-07-13  7:05 ` [RFC/PATCHv2 3/6] commit: add commit_generation function Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13  7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13 19:35     ` Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King
2011-07-13  7:23   ` Jeff King
2011-07-13 20:33     ` Junio C Hamano
2011-07-13 20:58       ` Jeff King
2011-07-13 21:12         ` Junio C Hamano
2011-07-13 21:18           ` Jeff King
2011-07-15 18:22   ` Junio C Hamano [this message]
2011-07-15 20:40     ` Jeff King
2011-07-15 21:04       ` Junio C Hamano
2011-07-15 21:14         ` Jeff King
2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski
2011-07-15 21:10   ` Jeff King
2011-07-16 21:10     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vpqlb1k1g.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).