* [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
@ 2005-10-18 17:52 Linus Torvalds
2005-10-18 18:45 ` Junio C Hamano
2005-10-18 18:49 ` Junio C Hamano
0 siblings, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2005-10-18 17:52 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
If everything is up-to-date locally, we don't need to even ask for a
pack-file from the remote, or try to unpack it.
This is especially important for tags - since the pack-file common commit
logic is based purely on the commit history, it will never be able to find
a common tag, and will thus always end up re-fetching them.
Especially notably, if the tag points to a non-commit (eg a tagged tree),
the pack-file would be unnecessarily big, just because it cannot any most
recent common point between commits for pruning.
Short-circuiting the case where we already have that reference means that
we avoid a lot of these in the common case.
NOTE! This only matches remote ref names against the same local name,
which works well for tags, but is not as generic as it could be. If we
ever need to, we could match against _any_ local ref (if we have it, we
have it), but this "match against same name" is simpler and more
efficient, and covers the common case.
Renaming of refs is common for branch heads, but since those are always
commits, the pack-file generation can optimize that case.
In some cases we might still end up fetching pack-files unnecessarily, but
this at least avoids the re-fetching of tags over and over if you use a
regular
git fetch --tags ...
which was the main reason behind the change.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
Ok, this should be the trivially fixed (famous last words) patch, which
should correct the case where we don't have a local name for the remote
ref at all.
If you already applied the previous patch, you just need to fix the
if (read_ref(...) < 0)
continue
if (memcmp(..)) {
to be one case (a failing read_ref should do the exact same thing as a
failed memcmp):
if (read_ref(...) < 0 ||
memcmp(...) {
and everything should be ok.
This has gotten _some_ testing, but obviously not enough ;)
Linus
diff --git a/fetch-pack.c b/fetch-pack.c
index 953c0cf..4597369 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -12,6 +12,7 @@ static const char *exec = "git-upload-pa
static int find_common(int fd[2], unsigned char *result_sha1,
struct ref *refs)
{
+ int fetching;
static char line[1000];
int count = 0, flushes = 0, retval;
FILE *revs;
@@ -20,16 +21,19 @@ static int find_common(int fd[2], unsign
if (!revs)
die("unable to run 'git-rev-list'");
- while (refs) {
+ fetching = 0;
+ for ( ; refs ; refs = refs->next) {
unsigned char *remote = refs->old_sha1;
- if (verbose)
- fprintf(stderr,
- "want %s (%s)\n", sha1_to_hex(remote),
- refs->name);
+ unsigned char *local = refs->new_sha1;
+
+ if (!memcmp(remote, local, 20))
+ continue;
packet_write(fd[1], "want %s\n", sha1_to_hex(remote));
- refs = refs->next;
+ fetching++;
}
packet_flush(fd[1]);
+ if (!fetching)
+ return 1;
flushes = 1;
retval = -1;
while (fgets(line, sizeof(line), revs) != NULL) {
@@ -74,6 +78,35 @@ static int find_common(int fd[2], unsign
return retval;
}
+static int everything_local(struct ref *refs)
+{
+ int retval;
+
+ for (retval = 1; refs ; refs = refs->next) {
+ const unsigned char *remote = refs->old_sha1;
+ unsigned char local[20];
+
+ if (read_ref(git_path("%s", refs->name), local) < 0 ||
+ memcmp(remote, local, 20)) {
+ retval = 0;
+ if (!verbose)
+ continue;
+ fprintf(stderr,
+ "want %s (%s)\n", sha1_to_hex(remote),
+ refs->name);
+ continue;
+ }
+
+ memcpy(refs->new_sha1, local, 20);
+ if (!verbose)
+ continue;
+ fprintf(stderr,
+ "already have %s (%s)\n", sha1_to_hex(remote),
+ refs->name);
+ }
+ return retval;
+}
+
static int fetch_pack(int fd[2], int nr_match, char **match)
{
struct ref *ref;
@@ -86,6 +119,10 @@ static int fetch_pack(int fd[2], int nr_
packet_flush(fd[1]);
die("no matching remote head");
}
+ if (everything_local(ref)) {
+ packet_flush(fd[1]);
+ goto all_done;
+ }
if (find_common(fd, sha1, ref) < 0)
fprintf(stderr, "warning: no common commits\n");
pid = fork();
@@ -109,6 +146,7 @@ static int fetch_pack(int fd[2], int nr_
int code = WEXITSTATUS(status);
if (code)
die("git-unpack-objects died with error code %d", code);
+all_done:
while (ref) {
printf("%s %s\n",
sha1_to_hex(ref->old_sha1), ref->name);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
2005-10-18 17:52 [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing Linus Torvalds
@ 2005-10-18 18:45 ` Junio C Hamano
2005-10-18 18:49 ` Junio C Hamano
1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2005-10-18 18:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@osdl.org> writes:
> If everything is up-to-date locally, we don't need to even ask for a
> pack-file from the remote, or try to unpack it.
>
> This is especially important for tags - since the pack-file common commit
> logic is based purely on the commit history, it will never be able to find
> a common tag, and will thus always end up re-fetching them.
>
> Especially notably, if the tag points to a non-commit (eg a tagged tree),
> the pack-file would be unnecessarily big, just because it cannot any most
> recent common point between commits for pruning.
>
> Short-circuiting the case where we already have that reference means that
> we avoid a lot of these in the common case.
>
> NOTE! This only matches remote ref names against the same local name,
> which works well for tags, but is not as generic as it could be. If we
> ever need to, we could match against _any_ local ref (if we have it, we
> have it), but this "match against same name" is simpler and more
> efficient, and covers the common case.
>
> Renaming of refs is common for branch heads, but since those are always
> commits, the pack-file generation can optimize that case.
>
> In some cases we might still end up fetching pack-files unnecessarily, but
> this at least avoids the re-fetching of tags over and over if you use a
> regular
>
> git fetch --tags ...
>
> which was the main reason behind the change.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> ---
>
> Ok, this should be the trivially fixed (famous last words) patch, which
> should correct the case where we don't have a local name for the remote
> ref at all.
>
> If you already applied the previous patch, you just need to fix the
>
> if (read_ref(...) < 0)
> continue
> if (memcmp(..)) {
>
> to be one case (a failing read_ref should do the exact same thing as a
> failed memcmp):
>
> if (read_ref(...) < 0 ||
> memcmp(...) {
>
> and everything should be ok.
>
> This has gotten _some_ testing, but obviously not enough ;)
>
> Linus
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 953c0cf..4597369 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -12,6 +12,7 @@ static const char *exec = "git-upload-pa
> static int find_common(int fd[2], unsigned char *result_sha1,
> struct ref *refs)
> {
> + int fetching;
> static char line[1000];
> int count = 0, flushes = 0, retval;
> FILE *revs;
> @@ -20,16 +21,19 @@ static int find_common(int fd[2], unsign
> if (!revs)
> die("unable to run 'git-rev-list'");
>
> - while (refs) {
> + fetching = 0;
> + for ( ; refs ; refs = refs->next) {
> unsigned char *remote = refs->old_sha1;
> - if (verbose)
> - fprintf(stderr,
> - "want %s (%s)\n", sha1_to_hex(remote),
> - refs->name);
> + unsigned char *local = refs->new_sha1;
> +
> + if (!memcmp(remote, local, 20))
> + continue;
> packet_write(fd[1], "want %s\n", sha1_to_hex(remote));
> - refs = refs->next;
> + fetching++;
> }
> packet_flush(fd[1]);
> + if (!fetching)
> + return 1;
> flushes = 1;
> retval = -1;
> while (fgets(line, sizeof(line), revs) != NULL) {
> @@ -74,6 +78,35 @@ static int find_common(int fd[2], unsign
> return retval;
> }
>
> +static int everything_local(struct ref *refs)
> +{
> + int retval;
> +
> + for (retval = 1; refs ; refs = refs->next) {
> + const unsigned char *remote = refs->old_sha1;
> + unsigned char local[20];
> +
> + if (read_ref(git_path("%s", refs->name), local) < 0 ||
> + memcmp(remote, local, 20)) {
> + retval = 0;
> + if (!verbose)
> + continue;
> + fprintf(stderr,
> + "want %s (%s)\n", sha1_to_hex(remote),
> + refs->name);
> + continue;
> + }
> +
> + memcpy(refs->new_sha1, local, 20);
> + if (!verbose)
> + continue;
> + fprintf(stderr,
> + "already have %s (%s)\n", sha1_to_hex(remote),
> + refs->name);
> + }
> + return retval;
> +}
> +
> static int fetch_pack(int fd[2], int nr_match, char **match)
> {
> struct ref *ref;
> @@ -86,6 +119,10 @@ static int fetch_pack(int fd[2], int nr_
> packet_flush(fd[1]);
> die("no matching remote head");
> }
> + if (everything_local(ref)) {
> + packet_flush(fd[1]);
> + goto all_done;
> + }
> if (find_common(fd, sha1, ref) < 0)
> fprintf(stderr, "warning: no common commits\n");
> pid = fork();
> @@ -109,6 +146,7 @@ static int fetch_pack(int fd[2], int nr_
> int code = WEXITSTATUS(status);
> if (code)
> die("git-unpack-objects died with error code %d", code);
> +all_done:
> while (ref) {
> printf("%s %s\n",
> sha1_to_hex(ref->old_sha1), ref->name);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
2005-10-18 17:52 [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing Linus Torvalds
2005-10-18 18:45 ` Junio C Hamano
@ 2005-10-18 18:49 ` Junio C Hamano
2005-10-18 19:19 ` Junio C Hamano
2005-10-18 20:38 ` Linus Torvalds
1 sibling, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2005-10-18 18:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@osdl.org> writes:
> Ok, this should be the trivially fixed (famous last words) patch, which
> should correct the case where we don't have a local name for the remote
> ref at all.
>
> If you already applied the previous patch, you just need to fix the
No, I haven't (not my git day today).
It strikes me that we could walk from our refs, depth reasonably
limited to say 20 or so commit chain and/or last 5 days of
commit time, to see if any of the remotes are reachable from our
refs and omit issuing "want" quite cheaply. Do you think that
would be a worthy change to make things more efficient?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
2005-10-18 18:49 ` Junio C Hamano
@ 2005-10-18 19:19 ` Junio C Hamano
2005-10-18 20:38 ` Linus Torvalds
1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2005-10-18 19:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano <junkio@cox.net> writes:
> No, I haven't (not my git day today).
>
> It strikes me that we could walk from our refs, depth reasonably
> limited to say 20 or so commit chain and/or last 5 days of
> commit time, to see if any of the remotes are reachable from our
> refs and omit issuing "want" quite cheaply. Do you think that
> would be a worthy change to make things more efficient?
Something like this on top of your second patch?
------------
Subject: do not ask for objects known to be complete.
On top of optimization by Linus not to ask refs that already
match, we can shallowly walk our refs and not issue "want" for
things that we know are reachable from them.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/fetch-pack.c b/fetch-pack.c
index 4597369..212e00f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1,6 +1,9 @@
#include "cache.h"
#include "refs.h"
#include "pkt-line.h"
+#include "commit.h"
+#include "tag.h"
+#include <time.h>
#include <sys/wait.h>
static int quiet;
@@ -78,16 +81,58 @@ static int find_common(int fd[2], unsign
return retval;
}
+#define COMPLETE (1U << 0)
+
+/*
+ * 5 days - this should be configurable.
+ */
+#define RECENT (5 * 24 * 60 * 60)
+
+static struct commit_list *complete = NULL;
+
+static int mark_complete(const char *path, const unsigned char *sha1)
+{
+ struct object *o = parse_object(sha1);
+
+ while (o && o->type == tag_type) {
+ o->flags |= COMPLETE;
+ o = parse_object(((struct tag *)o)->tagged->sha1);
+ }
+ if (o->type == commit_type) {
+ struct commit *commit = (struct commit *)o;
+ commit->object.flags |= COMPLETE;
+ insert_by_date(commit, &complete);
+ }
+ return 0;
+}
+
+static void mark_recent_complete_commits(unsigned long cutoff_date)
+{
+ while (complete && cutoff_date <= complete->item->date) {
+ if (verbose)
+ fprintf(stderr, "Marking %s as complete\n",
+ sha1_to_hex(complete->item->object.sha1));
+ pop_most_recent_commit(&complete, COMPLETE);
+ }
+}
+
static int everything_local(struct ref *refs)
{
int retval;
+ time_t now;
+
+ time(&now);
+
+ for_each_ref(mark_complete);
+ mark_recent_complete_commits((unsigned long) now - RECENT);
for (retval = 1; refs ; refs = refs->next) {
const unsigned char *remote = refs->old_sha1;
unsigned char local[20];
+ struct object *o;
- if (read_ref(git_path("%s", refs->name), local) < 0 ||
- memcmp(remote, local, 20)) {
+ o = parse_object(remote);
+ if (!o || !(o->flags & COMPLETE)) {
retval = 0;
if (!verbose)
continue;
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
2005-10-18 18:49 ` Junio C Hamano
2005-10-18 19:19 ` Junio C Hamano
@ 2005-10-18 20:38 ` Linus Torvalds
2005-10-18 20:42 ` Linus Torvalds
1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2005-10-18 20:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Tue, 18 Oct 2005, Junio C Hamano wrote:
>
> It strikes me that we could walk from our refs, depth reasonably
> limited to say 20 or so commit chain and/or last 5 days of
> commit time, to see if any of the remotes are reachable from our
> refs and omit issuing "want" quite cheaply. Do you think that
> would be a worthy change to make things more efficient?
Probably doesn't make a huge difference, but it might be worth trying.
There's a cheap test you can do _before_ you even start walking: check if
you have the object that is pointed to by the remote ref at all. If you
don't have it, then you know it can't be reachable from any of the local
refs. And if you do have it, the likelihood that it _is_ reachable is
likely pretty high.
(I didn't do that for the current fetch-pack optimization, since just
doing the read_ref() is likely faster than even bothering with the object
lookup. But if you start traversing commit lists, it suddenly becomes
more worthwhile).
You'd need to look up the object anyway in order to figure out that it's a
commit (or points to a commit).
So it might be wasting a bit of time, but the good news is that it wastes
time on the _client_ side, where we've got plenty.
I'll see if I can come up with a good patch.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
2005-10-18 20:38 ` Linus Torvalds
@ 2005-10-18 20:42 ` Linus Torvalds
2005-10-19 0:27 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2005-10-18 20:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Tue, 18 Oct 2005, Linus Torvalds wrote:
>
> I'll see if I can come up with a good patch.
I see you already did. Looks fine. I'd suggest limiting the commits by
number in mark_recent_commit_complete(), because
(a) somebody might have their clock set wrong and you don't want to walk
a huge tree just because of something like that.
(b) you might just have imported a huge history (badly) from somewhere
else
(c) a _lot_ can happen in five days with automated things.
but yes, the approach looks very sane otherwise.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing
2005-10-18 20:42 ` Linus Torvalds
@ 2005-10-19 0:27 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2005-10-19 0:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> I see you already did. Looks fine. I'd suggest limiting the commits by
> number in mark_recent_commit_complete(), because
>
> (a) somebody might have their clock set wrong and you don't want to walk
> a huge tree just because of something like that.
> (b) you might just have imported a huge history (badly) from somewhere
> else
> (c) a _lot_ can happen in five days with automated things.
>
> but yes, the approach looks very sane otherwise.
When you have several dozen commits on top of a head you fetched
from the remote last time you polled them, and the remote has
not updated that head since then, it may be worthwhile to have
the client dig deeper to avoid asking the server.
What I am thinking is:
- For objects our refs directly refer to, mark them COMPLETE
as the patch I sent out.
- See if we have any objects the remote refs refer to
already; find the timestamp of the latest one if we have
commits among them, and use its time as the cutoff time.
It is likely that we have synched with them after that
timestamp (either upload or download). walk the commits
from our ref, and mark *everything* that are newer than
that timestamp. This can turn out to be a huge walking
but that happens on the client side.
This way I can get rid of the arbitrary 5-day window, and I do
not have to invent another arbitrary number to limit the commits
we walk.
In other words, let's put the burden on the client if its effort
possibly can help the server.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-10-19 0:28 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-10-18 17:52 [CORRECTED PATCH] git-fetch-pack: avoid unnecessary zero packing Linus Torvalds
2005-10-18 18:45 ` Junio C Hamano
2005-10-18 18:49 ` Junio C Hamano
2005-10-18 19:19 ` Junio C Hamano
2005-10-18 20:38 ` Linus Torvalds
2005-10-18 20:42 ` Linus Torvalds
2005-10-19 0:27 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).