* [PATCH 0/2] speed up reflog unreachability pruning
@ 2009-03-31 17:03 Linus Torvalds
2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:03 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Ok, this is a two-patch series that first tries to clean things up a bit,
and then applies Junio's approach to speed up the reachability test.
The reason I did it this way is that I think we can improve on the
reachability logic a bit more, but in order to do that I refuse to work
with the crazy duplicated complex logic inside 'expire_reflog_ent()', and
I wanted to abstract it out.
Then, the first cut at speedup is just Junio's approach. Which is fairly
hacky, but works.
I'd _like_ to do more of a "dynamically do 'mark_reachable()' only when
necessary" thing, but that's a separate cleanup thing.
As is, this improves the reflog expire quite enormously for me:
- before:
[torvalds@nehalem linux]$ time git reflog expire --all
real 0m37.193s
user 0m37.174s
sys 0m0.020s
- after:
[torvalds@nehalem linux]$ time ~/git/git reflog expire --all
real 0m1.693s
user 0m1.672s
sys 0m0.020s
although I do suspect that the 'mark_reachable()' could slow things down
in some less extreme cases. But probably never by a huge amount.
Total diffstat:
builtin-reflog.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 69 insertions(+), 6 deletions(-)
with the two individual patches coming up next.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] Clean up reflog unreachability pruning decision
2009-03-31 17:03 [PATCH 0/2] speed up reflog unreachability pruning Linus Torvalds
@ 2009-03-31 17:09 ` Linus Torvalds
2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
2009-03-31 17:44 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano
0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:09 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 31 Mar 2009 09:45:22 -0700
This clarifies the pruning rules for unreachable commits by having a
separate helpder function for the unreachability decision.
It's preparation for actual bigger changes to come to speed up the
decision when the reachability calculations become a bottleneck.
In the process it also _does_ change behavior, although in a way that I
think is much saner than the old behavior (which was in my opinion not
designed, just a result of how the tests were written). It now will prune
reflog entries that are older than that 'prune_unreacable' time _and_ that
have commit references that can't be even looked up.
Of course, "--stale-fix" also does that, and does it regardless of the age
of the reflog entry is, but I really think this is the right thing to do.
If we can't even look it up, we should consider it to be unreachable.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
builtin-reflog.c | 32 ++++++++++++++++++++++++++------
1 files changed, 26 insertions(+), 6 deletions(-)
Note the behavioural change. I think it's sane and "ObviouslyCorrect(tm)",
but maybe somebody disagrees.
diff --git a/builtin-reflog.c b/builtin-reflog.c
index d95f515..0355ce6 100644
--- a/builtin-reflog.c
+++ b/builtin-reflog.c
@@ -209,6 +209,31 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
return 1;
}
+static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
+{
+ /*
+ * We may or may not have the commit yet - if not, look it
+ * up using the supplied sha1.
+ */
+ if (!commit) {
+ if (is_null_sha1(sha1))
+ return 0;
+
+ commit = lookup_commit_reference_gently(sha1, 1);
+
+ /* We can't even look it up - consider it unreachable */
+ if (!commit)
+ return 1;
+ }
+
+ /* Reachable from the current reflog top? Don't prune */
+ if (in_merge_bases(commit, &cb->ref_commit, 1))
+ return 0;
+
+ /* We can't reach it - prune it. */
+ return 1;
+}
+
static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
const char *email, unsigned long timestamp, int tz,
const char *message, void *cb_data)
@@ -230,12 +255,7 @@ static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
if (timestamp < cb->cmd->expire_unreachable) {
if (!cb->ref_commit)
goto prune;
- if (!old && !is_null_sha1(osha1))
- old = lookup_commit_reference_gently(osha1, 1);
- if (!new && !is_null_sha1(nsha1))
- new = lookup_commit_reference_gently(nsha1, 1);
- if ((old && !in_merge_bases(old, &cb->ref_commit, 1)) ||
- (new && !in_merge_bases(new, &cb->ref_commit, 1)))
+ if (unreachable(cb, old, osha1) || unreachable(cb, new, nsha1))
goto prune;
}
--
1.6.2.1.404.gb0085.dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] Speed up reflog pruning of unreachable commits
2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
@ 2009-03-31 17:11 ` Linus Torvalds
2009-03-31 17:46 ` Junio C Hamano
2009-03-31 17:44 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano
1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:11 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
From: Junio Hamano <gitster@pobox.com>
Date: Mon, 30 Mar 2009 21:34:14 -0700
Instead of doing the (potentially very expensive) "in_merge_base()"
check for each commit that might be pruned if it is unreachable, do a
preparatory reachability graph of the commit space, so that the common
case of being reachable can be tested directly.
[ Cleaned up a bit and tweaked to actually work. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
builtin-reflog.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 files changed, 43 insertions(+), 0 deletions(-)
diff --git a/builtin-reflog.c b/builtin-reflog.c
index 0355ce6..f29ab2f 100644
--- a/builtin-reflog.c
+++ b/builtin-reflog.c
@@ -52,6 +52,7 @@ struct collect_reflog_cb {
#define INCOMPLETE (1u<<10)
#define STUDYING (1u<<11)
+#define REACHABLE (1u<<12)
static int tree_is_complete(const unsigned char *sha1)
{
@@ -209,6 +210,43 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
return 1;
}
+static void mark_reachable(struct commit *commit, unsigned long expire_limit)
+{
+ /*
+ * We need to compute if commit on either side of an reflog
+ * entry is reachable from the tip of the ref for all entries.
+ * Mark commits that are reachable from the tip down to the
+ * time threashold first; we know a commit marked thusly is
+ * reachable from the tip without running in_merge_bases()
+ * at all.
+ */
+ struct commit_list *pending = NULL;
+
+ commit_list_insert(commit, &pending);
+ while (pending) {
+ struct commit_list *entry = pending;
+ struct commit_list *parent;
+ pending = entry->next;
+ commit = entry->item;
+ free(entry);
+ if (commit->object.flags & REACHABLE)
+ continue;
+ if (parse_commit(commit))
+ continue;
+ commit->object.flags |= REACHABLE;
+ if (commit->date < expire_limit)
+ continue;
+ parent = commit->parents;
+ while (parent) {
+ commit = parent->item;
+ parent = parent->next;
+ if (commit->object.flags & REACHABLE)
+ continue;
+ commit_list_insert(commit, &pending);
+ }
+ }
+}
+
static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
{
/*
@@ -227,6 +265,8 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig
}
/* Reachable from the current reflog top? Don't prune */
+ if (commit->object.flags & REACHABLE)
+ return 0;
if (in_merge_bases(commit, &cb->ref_commit, 1))
return 0;
@@ -308,7 +348,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
cb.ref_commit = lookup_commit_reference_gently(sha1, 1);
cb.ref = ref;
cb.cmd = cmd;
+
+ mark_reachable(cb.ref_commit, cmd->expire_total);
for_each_reflog_ent(ref, expire_reflog_ent, &cb);
+ clear_commit_marks(cb.ref_commit, REACHABLE);
finish:
if (cb.newlog) {
if (fclose(cb.newlog)) {
--
1.6.2.1.404.gb0085.dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] Clean up reflog unreachability pruning decision
2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
@ 2009-03-31 17:44 ` Junio C Hamano
1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2009-03-31 17:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 31 Mar 2009 09:45:22 -0700
>
> This clarifies the pruning rules for unreachable commits by having a
> separate helpder function for the unreachability decision.
>
> It's preparation for actual bigger changes to come to speed up the
> decision when the reachability calculations become a bottleneck.
>
> In the process it also _does_ change behavior, although in a way that I
> think is much saner than the old behavior (which was in my opinion not
> designed, just a result of how the tests were written). It now will prune
> reflog entries that are older than that 'prune_unreacable' time _and_ that
> have commit references that can't be even looked up.
> Of course, "--stale-fix" also does that, and does it regardless of the age
> of the reflog entry is, but I really think this is the right thing to do.
> If we can't even look it up, we should consider it to be unreachable.
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> builtin-reflog.c | 32 ++++++++++++++++++++++++++------
> 1 files changed, 26 insertions(+), 6 deletions(-)
>
> Note the behavioural change. I think it's sane and "ObviouslyCorrect(tm)",
> but maybe somebody disagrees.
I did not recall initially when I was discussing this with you last night
but I think this "no commit? not prune" is intentional. The codepath is
not about logs/heads but logs/*anything*, and we allow pointers to non
commit objects (like v2.6.11-tree).
Also even if we limit ourselves to commits, @{-N} notation can be used to
see the history of branch switching, and that does not need any underlying
objects. I do not think it is interesting to be able to see which branch
you were on 30 days abo, though ;-)
> diff --git a/builtin-reflog.c b/builtin-reflog.c
> index d95f515..0355ce6 100644
> --- a/builtin-reflog.c
> +++ b/builtin-reflog.c
> @@ -209,6 +209,31 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
> return 1;
> }
>
> +static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
> +{
> + /*
> + * We may or may not have the commit yet - if not, look it
> + * up using the supplied sha1.
> + */
> + if (!commit) {
> + if (is_null_sha1(sha1))
> + return 0;
> +
> + commit = lookup_commit_reference_gently(sha1, 1);
> +
> + /* We can't even look it up - consider it unreachable */
> + if (!commit)
> + return 1;
/* If it is not a commit, keep it. */
if (!commit)
return 0;
> + }
> +
> + /* Reachable from the current reflog top? Don't prune */
That's "tip of the ref", not necessarily "reflog top".
> + if (in_merge_bases(commit, &cb->ref_commit, 1))
> + return 0;
> +
> + /* We can't reach it - prune it. */
> + return 1;
> +}
> +
> static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
> const char *email, unsigned long timestamp, int tz,
> const char *message, void *cb_data)
> @@ -230,12 +255,7 @@ static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
> if (timestamp < cb->cmd->expire_unreachable) {
> if (!cb->ref_commit)
> goto prune;
> - if (!old && !is_null_sha1(osha1))
> - old = lookup_commit_reference_gently(osha1, 1);
> - if (!new && !is_null_sha1(nsha1))
> - new = lookup_commit_reference_gently(nsha1, 1);
> - if ((old && !in_merge_bases(old, &cb->ref_commit, 1)) ||
> - (new && !in_merge_bases(new, &cb->ref_commit, 1)))
> + if (unreachable(cb, old, osha1) || unreachable(cb, new, nsha1))
> goto prune;
> }
>
> --
> 1.6.2.1.404.gb0085.dirty
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] Speed up reflog pruning of unreachable commits
2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
@ 2009-03-31 17:46 ` Junio C Hamano
2009-03-31 17:58 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2009-03-31 17:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> From: Junio Hamano <gitster@pobox.com>
> Date: Mon, 30 Mar 2009 21:34:14 -0700
>
> Instead of doing the (potentially very expensive) "in_merge_base()"
> check for each commit that might be pruned if it is unreachable, do a
> preparatory reachability graph of the commit space, so that the common
> case of being reachable can be tested directly.
>
> [ Cleaned up a bit and tweaked to actually work. - Linus ]
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> builtin-reflog.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-reflog.c b/builtin-reflog.c
> index 0355ce6..f29ab2f 100644
> --- a/builtin-reflog.c
> +++ b/builtin-reflog.c
> @@ -52,6 +52,7 @@ struct collect_reflog_cb {
>
> #define INCOMPLETE (1u<<10)
> #define STUDYING (1u<<11)
> +#define REACHABLE (1u<<12)
>
> static int tree_is_complete(const unsigned char *sha1)
> {
> @@ -209,6 +210,43 @@ static int keep_entry(struct commit **it, unsigned char *sha1)
> return 1;
> }
>
> +static void mark_reachable(struct commit *commit, unsigned long expire_limit)
> +{
> + /*
> + * We need to compute if commit on either side of an reflog
> + * entry is reachable from the tip of the ref for all entries.
> + * Mark commits that are reachable from the tip down to the
> + * time threashold first; we know a commit marked thusly is
> + * reachable from the tip without running in_merge_bases()
> + * at all.
> + */
> + struct commit_list *pending = NULL;
> +
> + commit_list_insert(commit, &pending);
> + while (pending) {
> + struct commit_list *entry = pending;
> + struct commit_list *parent;
> + pending = entry->next;
> + commit = entry->item;
> + free(entry);
> + if (commit->object.flags & REACHABLE)
> + continue;
> + if (parse_commit(commit))
> + continue;
> + commit->object.flags |= REACHABLE;
> + if (commit->date < expire_limit)
> + continue;
> + parent = commit->parents;
> + while (parent) {
> + commit = parent->item;
> + parent = parent->next;
> + if (commit->object.flags & REACHABLE)
> + continue;
> + commit_list_insert(commit, &pending);
> + }
> + }
> +}
> +
> static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1)
> {
> /*
> @@ -227,6 +265,8 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig
> }
>
> /* Reachable from the current reflog top? Don't prune */
> + if (commit->object.flags & REACHABLE)
> + return 0;
> if (in_merge_bases(commit, &cb->ref_commit, 1))
> return 0;
>
> @@ -308,7 +348,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
> cb.ref_commit = lookup_commit_reference_gently(sha1, 1);
> cb.ref = ref;
> cb.cmd = cmd;
> +
You seem to have lost "if (cb.ref_commit)" from the last round to protect
mark_rechable(). It can be NULL.
> + mark_reachable(cb.ref_commit, cmd->expire_total);
> for_each_reflog_ent(ref, expire_reflog_ent, &cb);
> + clear_commit_marks(cb.ref_commit, REACHABLE);
> finish:
> if (cb.newlog) {
> if (fclose(cb.newlog)) {
> --
> 1.6.2.1.404.gb0085.dirty
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] Speed up reflog pruning of unreachable commits
2009-03-31 17:46 ` Junio C Hamano
@ 2009-03-31 17:58 ` Linus Torvalds
0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2009-03-31 17:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Tue, 31 Mar 2009, Junio C Hamano wrote:
>
> You seem to have lost "if (cb.ref_commit)" from the last round to protect
> mark_rechable(). It can be NULL.
I based it on your original patch, so yeah, it's missing all your updates
since.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-03-31 18:01 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-31 17:03 [PATCH 0/2] speed up reflog unreachability pruning Linus Torvalds
2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds
2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds
2009-03-31 17:46 ` Junio C Hamano
2009-03-31 17:58 ` Linus Torvalds
2009-03-31 17:44 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).