git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] speed up reflog unreachability pruning
@ 2009-03-31 17:03 Linus Torvalds
  2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:03 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


Ok, this is a two-patch series that first tries to clean things up a bit, 
and then applies Junio's approach to speed up the reachability test.

The reason I did it this way is that I think we can improve on the 
reachability logic a bit more, but in order to do that I refuse to work 
with the crazy duplicated complex logic inside 'expire_reflog_ent()', and 
I wanted to abstract it out.

Then, the first cut at speedup is just Junio's approach. Which is fairly 
hacky, but works.

I'd _like_ to do more of a "dynamically do 'mark_reachable()' only when 
necessary" thing, but that's a separate cleanup thing.

As is, this improves the reflog expire quite enormously for me:

 - before:

	[torvalds@nehalem linux]$ time git reflog expire --all
	real	0m37.193s
	user	0m37.174s
	sys	0m0.020s

 - after:

	[torvalds@nehalem linux]$ time ~/git/git reflog expire --all
	real	0m1.693s
	user	0m1.672s
	sys	0m0.020s

although I do suspect that the 'mark_reachable()' could slow things down 
in some less extreme cases. But probably never by a huge amount.

Total diffstat:

 builtin-reflog.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 69 insertions(+), 6 deletions(-)

with the two individual patches coming up next.

			Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] Clean up reflog unreachability pruning decision
  2009-03-31 17:03 [PATCH 0/2] speed up reflog unreachability pruning Linus Torvalds
@ 2009-03-31 17:09 ` Linus Torvalds
  2009-03-31 17:11   ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
  2009-03-31 17:44   ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:09 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 31 Mar 2009 09:45:22 -0700

This clarifies the pruning rules for unreachable commits by having a 
separate helpder function for the unreachability decision.

It's preparation for actual bigger changes to come to speed up the
decision when the reachability calculations become a bottleneck.

In the process it also _does_ change behavior, although in a way that I 
think is much saner than the old behavior (which was in my opinion not 
designed, just a result of how the tests were written). It now will prune 
reflog entries that are older than that 'prune_unreacable' time _and_ that 
have commit references that can't be even looked up.

Of course, "--stale-fix" also does that, and does it regardless of the age 
of the reflog entry is, but I really think this is the right thing to do. 
If we can't even look it up, we should consider it to be unreachable.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 builtin-reflog.c |   32 ++++++++++++++++++++++++++------
 1 files changed, 26 insertions(+), 6 deletions(-)

Note the behavioural change. I think it's sane and "ObviouslyCorrect(tm)", 
but maybe somebody disagrees.

diff --git a/builtin-reflog.c b/builtin-reflog.c
index d95f515..0355ce6 100644
--- a/builtin-reflog.c
+++ b/builtin-reflog.c
@@ -209,6 +209,31 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
 	return 1;
 }
 
+static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
+{
+	/*
+	 * We may or may not have the commit yet - if not, look it
+	 * up using the supplied sha1.
+	 */
+	if (!commit) {
+		if (is_null_sha1(sha1))
+			return 0;
+
+		commit = lookup_commit_reference_gently(sha1, 1);
+
+		/* We can't even look it up - consider it unreachable */
+		if (!commit)
+			return 1;
+	}
+
+	/* Reachable from the current reflog top? Don't prune */
+	if (in_merge_bases(commit, &cb->ref_commit, 1))
+		return 0;
+
+	/* We can't reach it - prune it. */
+	return 1;
+}
+
 static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
 		const char *email, unsigned long timestamp, int tz,
 		const char *message, void *cb_data)
@@ -230,12 +255,7 @@ static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
 	if (timestamp < cb->cmd->expire_unreachable) {
 		if (!cb->ref_commit)
 			goto prune;
-		if (!old && !is_null_sha1(osha1))
-			old = lookup_commit_reference_gently(osha1, 1);
-		if (!new && !is_null_sha1(nsha1))
-			new = lookup_commit_reference_gently(nsha1, 1);
-		if ((old && !in_merge_bases(old, &cb->ref_commit, 1)) ||
-		    (new && !in_merge_bases(new, &cb->ref_commit, 1)))
+		if (unreachable(cb, old, osha1) || unreachable(cb, new, nsha1))
 			goto prune;
 	}
 
-- 
1.6.2.1.404.gb0085.dirty

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] Speed up reflog pruning of unreachable commits
  2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
@ 2009-03-31 17:11   ` Linus Torvalds
  2009-03-31 17:46     ` Junio C Hamano
  2009-03-31 17:44   ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:11 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


From: Junio Hamano <gitster@pobox.com>
Date: Mon, 30 Mar 2009 21:34:14 -0700

Instead of doing the (potentially very expensive) "in_merge_base()"
check for each commit that might be pruned if it is unreachable, do a
preparatory reachability graph of the commit space, so that the common
case of being reachable can be tested directly.

[ Cleaned up a bit and tweaked to actually work.  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 builtin-reflog.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/builtin-reflog.c b/builtin-reflog.c
index 0355ce6..f29ab2f 100644
--- a/builtin-reflog.c
+++ b/builtin-reflog.c
@@ -52,6 +52,7 @@ struct collect_reflog_cb {
 
 #define INCOMPLETE	(1u<<10)
 #define STUDYING	(1u<<11)
+#define REACHABLE	(1u<<12)
 
 static int tree_is_complete(const unsigned char *sha1)
 {
@@ -209,6 +210,43 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
 	return 1;
 }
 
+static void mark_reachable(struct commit *commit, unsigned long expire_limit)
+{
+	/*
+	 * We need to compute if commit on either side of an reflog
+	 * entry is reachable from the tip of the ref for all entries.
+	 * Mark commits that are reachable from the tip down to the
+	 * time threashold first; we know a commit marked thusly is
+	 * reachable from the tip without running in_merge_bases()
+	 * at all.
+	 */
+	struct commit_list *pending = NULL;
+
+	commit_list_insert(commit, &pending);
+	while (pending) {
+		struct commit_list *entry = pending;
+		struct commit_list *parent;
+		pending = entry->next;
+		commit = entry->item;
+		free(entry);
+		if (commit->object.flags & REACHABLE)
+			continue;
+		if (parse_commit(commit))
+			continue;
+		commit->object.flags |= REACHABLE;
+		if (commit->date < expire_limit)
+			continue;
+		parent = commit->parents;
+		while (parent) {
+			commit = parent->item;
+			parent = parent->next;
+			if (commit->object.flags & REACHABLE)
+				continue;
+			commit_list_insert(commit, &pending);
+		}
+	}
+}
+
 static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
 {
 	/*
@@ -227,6 +265,8 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig
 	}
 
 	/* Reachable from the current reflog top? Don't prune */
+	if (commit->object.flags & REACHABLE)
+		return 0;
 	if (in_merge_bases(commit, &cb->ref_commit, 1))
 		return 0;
 
@@ -308,7 +348,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
 	cb.ref_commit = lookup_commit_reference_gently(sha1, 1);
 	cb.ref = ref;
 	cb.cmd = cmd;
+
+	mark_reachable(cb.ref_commit, cmd->expire_total);
 	for_each_reflog_ent(ref, expire_reflog_ent, &cb);
+	clear_commit_marks(cb.ref_commit, REACHABLE);
  finish:
 	if (cb.newlog) {
 		if (fclose(cb.newlog)) {
-- 
1.6.2.1.404.gb0085.dirty

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] Clean up reflog unreachability pruning decision
  2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
  2009-03-31 17:11   ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
@ 2009-03-31 17:44   ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2009-03-31 17:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 31 Mar 2009 09:45:22 -0700
>
> This clarifies the pruning rules for unreachable commits by having a 
> separate helpder function for the unreachability decision.
>
> It's preparation for actual bigger changes to come to speed up the
> decision when the reachability calculations become a bottleneck.
>
> In the process it also _does_ change behavior, although in a way that I 
> think is much saner than the old behavior (which was in my opinion not 
> designed, just a result of how the tests were written). It now will prune 
> reflog entries that are older than that 'prune_unreacable' time _and_ that 
> have commit references that can't be even looked up.

> Of course, "--stale-fix" also does that, and does it regardless of the age 
> of the reflog entry is, but I really think this is the right thing to do. 
> If we can't even look it up, we should consider it to be unreachable.
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  builtin-reflog.c |   32 ++++++++++++++++++++++++++------
>  1 files changed, 26 insertions(+), 6 deletions(-)
>
> Note the behavioural change. I think it's sane and "ObviouslyCorrect(tm)", 
> but maybe somebody disagrees.

I did not recall initially when I was discussing this with you last night
but I think this "no commit? not prune" is intentional.  The codepath is
not about logs/heads but logs/*anything*, and we allow pointers to non
commit objects (like v2.6.11-tree).

Also even if we limit ourselves to commits, @{-N} notation can be used to
see the history of branch switching, and that does not need any underlying
objects.  I do not think it is interesting to be able to see which branch
you were on 30 days abo, though ;-)


> diff --git a/builtin-reflog.c b/builtin-reflog.c
> index d95f515..0355ce6 100644
> --- a/builtin-reflog.c
> +++ b/builtin-reflog.c
> @@ -209,6 +209,31 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
>  	return 1;
>  }
>  
> +static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
> +{
> +	/*
> +	 * We may or may not have the commit yet - if not, look it
> +	 * up using the supplied sha1.
> +	 */
> +	if (!commit) {
> +		if (is_null_sha1(sha1))
> +			return 0;
> +
> +		commit = lookup_commit_reference_gently(sha1, 1);
> +
> +		/* We can't even look it up - consider it unreachable */
> +		if (!commit)
> +			return 1;

	/* If it is not a commit, keep it. */
        if (!commit)
        	return 0;

> +	}
> +
> +	/* Reachable from the current reflog top? Don't prune */

That's "tip of the ref", not necessarily "reflog top".

> +	if (in_merge_bases(commit, &cb->ref_commit, 1))
> +		return 0;
> +
> +	/* We can't reach it - prune it. */
> +	return 1;
> +}
> +
>  static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
>  		const char *email, unsigned long timestamp, int tz,
>  		const char *message, void *cb_data)
> @@ -230,12 +255,7 @@ static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
>  	if (timestamp < cb->cmd->expire_unreachable) {
>  		if (!cb->ref_commit)
>  			goto prune;
> -		if (!old && !is_null_sha1(osha1))
> -			old = lookup_commit_reference_gently(osha1, 1);
> -		if (!new && !is_null_sha1(nsha1))
> -			new = lookup_commit_reference_gently(nsha1, 1);
> -		if ((old && !in_merge_bases(old, &cb->ref_commit, 1)) ||
> -		    (new && !in_merge_bases(new, &cb->ref_commit, 1)))
> +		if (unreachable(cb, old, osha1) || unreachable(cb, new, nsha1))
>  			goto prune;
>  	}
>  
> -- 
> 1.6.2.1.404.gb0085.dirty

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] Speed up reflog pruning of unreachable commits
  2009-03-31 17:11   ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
@ 2009-03-31 17:46     ` Junio C Hamano
  2009-03-31 17:58       ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2009-03-31 17:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> From: Junio Hamano <gitster@pobox.com>
> Date: Mon, 30 Mar 2009 21:34:14 -0700
>
> Instead of doing the (potentially very expensive) "in_merge_base()"
> check for each commit that might be pruned if it is unreachable, do a
> preparatory reachability graph of the commit space, so that the common
> case of being reachable can be tested directly.
>
> [ Cleaned up a bit and tweaked to actually work.  - Linus ]
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  builtin-reflog.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-reflog.c b/builtin-reflog.c
> index 0355ce6..f29ab2f 100644
> --- a/builtin-reflog.c
> +++ b/builtin-reflog.c
> @@ -52,6 +52,7 @@ struct collect_reflog_cb {
>  
>  #define INCOMPLETE	(1u<<10)
>  #define STUDYING	(1u<<11)
> +#define REACHABLE	(1u<<12)
>  
>  static int tree_is_complete(const unsigned char *sha1)
>  {
> @@ -209,6 +210,43 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
>  	return 1;
>  }
>  
> +static void mark_reachable(struct commit *commit, unsigned long expire_limit)
> +{
> +	/*
> +	 * We need to compute if commit on either side of an reflog
> +	 * entry is reachable from the tip of the ref for all entries.
> +	 * Mark commits that are reachable from the tip down to the
> +	 * time threashold first; we know a commit marked thusly is
> +	 * reachable from the tip without running in_merge_bases()
> +	 * at all.
> +	 */
> +	struct commit_list *pending = NULL;
> +
> +	commit_list_insert(commit, &pending);
> +	while (pending) {
> +		struct commit_list *entry = pending;
> +		struct commit_list *parent;
> +		pending = entry->next;
> +		commit = entry->item;
> +		free(entry);
> +		if (commit->object.flags & REACHABLE)
> +			continue;
> +		if (parse_commit(commit))
> +			continue;
> +		commit->object.flags |= REACHABLE;
> +		if (commit->date < expire_limit)
> +			continue;
> +		parent = commit->parents;
> +		while (parent) {
> +			commit = parent->item;
> +			parent = parent->next;
> +			if (commit->object.flags & REACHABLE)
> +				continue;
> +			commit_list_insert(commit, &pending);
> +		}
> +	}
> +}
> +
>  static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
>  {
>  	/*
> @@ -227,6 +265,8 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig
>  	}
>  
>  	/* Reachable from the current reflog top? Don't prune */
> +	if (commit->object.flags & REACHABLE)
> +		return 0;
>  	if (in_merge_bases(commit, &cb->ref_commit, 1))
>  		return 0;
>  
> @@ -308,7 +348,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
>  	cb.ref_commit = lookup_commit_reference_gently(sha1, 1);
>  	cb.ref = ref;
>  	cb.cmd = cmd;
> +

You seem to have lost "if (cb.ref_commit)" from the last round to protect
mark_rechable().  It can be NULL.

> +	mark_reachable(cb.ref_commit, cmd->expire_total);
>  	for_each_reflog_ent(ref, expire_reflog_ent, &cb);
> +	clear_commit_marks(cb.ref_commit, REACHABLE);
>   finish:
>  	if (cb.newlog) {
>  		if (fclose(cb.newlog)) {
> -- 
> 1.6.2.1.404.gb0085.dirty

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] Speed up reflog pruning of unreachable commits
  2009-03-31 17:46     ` Junio C Hamano
@ 2009-03-31 17:58       ` Linus Torvalds
  0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List



On Tue, 31 Mar 2009, Junio C Hamano wrote:
> 
> You seem to have lost "if (cb.ref_commit)" from the last round to protect
> mark_rechable().  It can be NULL.

I based it on your original patch, so yeah, it's missing all your updates 
since.

			Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-03-31 18:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-31 17:03 [PATCH 0/2] speed up reflog unreachability pruning Linus Torvalds
2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
2009-03-31 17:11   ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
2009-03-31 17:46     ` Junio C Hamano
2009-03-31 17:58       ` Linus Torvalds
2009-03-31 17:44   ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).