* [RFC] log(n)-transmissions common commit handshake @ 2008-09-17 23:00 Thomas Rast 2008-09-17 23:01 ` [RFC PATCH] fetch-pack: log(n)-transmission find_common() Thomas Rast 2008-09-18 8:18 ` [RFC] log(n)-transmissions common commit handshake Thomas Rast 0 siblings, 2 replies; 13+ messages in thread From: Thomas Rast @ 2008-09-17 23:00 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 8601 bytes --] Hi * This is a fairly long mail, so here's a brief abstract: Using two passes, first exponential stride through history, then bisection on the "right" windows, we implement the find_common handshake in log(history_size) many transmissions. The followup patch partially achieves that, but a few points are open and strict backwards compatibility is missing (but its absence is Mostly Harmless). Terminology =========== The sending side (running upload-pack) is called server, and the receiving side (running fetch-pack) is the client. Commits fall into the following categories: 'want': Requested by the client for transmission. 'have': In the client's object storage. 'common': In both client's and server's object storage. 'private': have, but not in the server's storage. 'required': Part of the history graph "between" commons and wants; minimal subset that needs to be sent over the wire as part of the transmission. ("Verbing weirds language." -- Calvin) When I say "a commit is not _yet_ X", I mean "it is not yet known to be X", usually from the client's POV. Motivation ========== This spawned out of the O(#haves*#wants*#history) analysis late last week. As a brief summary, the server uses the fairly expensive test ok_to_give_up() to check whether it can cut the handshake short since it knows enough commons to cover all wants. The thread is here: http://article.gmane.org/gmane.comp.version-control.git/95787 (Be sure to read the follow-up regarding the first '*' bullet point.) It does this shortcut because the client risks getting side-tracked on a private chunk of history leading to a private root. (According to the commit message. I'm not sure how effective it really is.) This check by far dominates the rest of the server's work, which is mostly lookup_object(): it only has to take each "have" line sha1, see if it is present, and if so, ACK it. The client announces when it considers the handshake finished, so there is no other requirement to test for it. A more efficient handshake could -- in addition to being more efficient :-) -- solve this problem simply by hitting the private root very quickly and proceeding elsewhere, avoiding the server-side check. Outline of current algorithm ============================ The client sends all its 'have' sha1s, ordered by date, in chunks of 32. The server replies with an ACK for each 'common', i.e., for any sha1 it finds in its own object store. [This is complicated a bit by the exchange of refs, see "Open questions".] As it goes, the client marks the commons, and as soon as it runs out of unmarket commits, it sends a "done". The server then knows the entire set of commons too, and can proceed to generate a pack file of the requireds. There is a hard limit of 256 commits that go un-ACKed until the client gives up and declares that there are no common commits. Idea ==== The proposed algorithm scans the haves in a different order, using two stages: exponential stride and bisection. (1) Exponential stride: ----------------------- Emit the first commit, then the 2nd from that, then the 4th from that, and so on. As a boundary case, always emit the root. If the history has "bounded width" (the meaning should be fairly obvious), then this stage emits O(log(n)) commits up to the root. (2) Bisection: -------------- Call a pair of (A,B) where A is an ancestor of B "interesting" if B is not yet common and A is not yet private. Then the bisection step is: between any interesting pair of commits (A,B) emitted so far, emit the one in the middle M=M(A,B). ["Cut points" between last common and first private commits clearly lie only between such pairs.] Assuming for a moment that the server responds immediately, this also takes only O(log(n)) commits since M must be either common or private, and thus A..M or M..B is no longer interesting, respectively. To deal with the delayed responses in the actual protocol, at each pass 'i' emit M(A,B) for all (A,B) in pairs(i) only if the pair is still interesting. Add both (A,M) and (M,B) to pairs(i+1). Iterate. pairs(0) is constructed from pass (1). Intuitively, this means we emit more than log(n) only if we have nothing else left to do. ---- Between the two stages, we achieve O(log(n)) complexity. (This notably means that we hit private roots in far less transmitted sha1s than the chunk size of 32.) Unfortunately, this ordering sometimes conflicts with the server's ok_to_give_up() shortcut mentioned in "Motivation": Consider a linear branch 'master', and suppose the only 'common' sent out in the first batch was the root. ok_to_give_up() detects that it has a common on the only ancestry line of the want, and trigger a fake ACK for all further commits. This becomes a problem when we later emit, say, 'master~1' and the server ACKs it even though it is private. (It does not affect correctness, but can increase the depth of history transmitted by a factor of 2 at worst.) RFC Implementation ================== I'll just highlight a few design decisions. It's my first serious C coding in years, so warm up the flamethrowers... * The work_list (replacing rev_list) is the "recursion stack" for get_rev(). New "frames" are inserted such that the order of emitted commits is (a) refs (b) stage (1) in date order (b) stage (2) in bisection depth order get_rev() is split across functions for each case. Refs are moved to front because the current code emits them anyway (see "Open questions" below), mostly to help fill the first 64 "have" lines with things we are going to send anyway, as opposed to the "real" walk where we make choices. * The bisection does not actually remember pairs, but flags commits with the directions it has already started a bisection from them. Combined with the existing POPPED flag for commits that were emitted, the pair (A,B) can be determined from (A,direction) by a parent/children search through the history. This simplifies the implementation. The idea is then to scan with two pointers 'half' and 'full' in parallel, where the first one makes only half as many steps, until we hit a POPPED commit. This must then be the interval we want to bisect, and we can check whether it is still interesting, emit M, and queue it for recursion at one level deeper. * We maintain a list 'outstanding' of all commits that have been sent, so that an "implied not-ACK" can be inferred from the next ACK. This is used to do a child-recursive NOTCOMMON flagging, analogous to the parent-recursive COMMON flagging on an ACK. The bisection uses these flags to determine ranges. * Yes, commit_info is a pointless level of indirection. I had more tracking information in it before the last draft, and left it in for the time being. [Hint: when playing with the patch, command lines such as git fetch-pack -v -k <url> <head> 2>&1 | git name-rev --stdin help see the effects.] Open Problems ============= * Effect of branch/merge points I do not see a better solution to deal with branch/merge points than recursing into them. There may be examples of history where this extra level of recursion spoils the log(n) bound. Carefully choosing which side to scan first might help. * Impact/clever use of refs For some reason, current git sends all refs to the server, even if the server should already know about lots of them. For example, in git.git, emitting v1.6.0.2 covers almost all tags in the repository by simple ancestor scanning. Is there a reason for this behaviour? Otherwise it would be better to emit them in date order and intelligently handle commons. (In fact this does not depend on the discussed change.) * Backwards compatibility As mentioned above, ok_to_give_up() is not really compatible with this algorithm. I suppose we should introduce a new capability identifier that has the client enable the log(n) walker, and the server disable ok_to_give_up()? Is there a better approach, or are there problems with it? (I haven't worked with the protocol enough to say, though it seems harmless.) ... and of course everything that I missed. ;-) I'd greatly appreciate comments, especially from people who know the protocol side of git. - Thomas -- Thomas Rast trast@student.ethz.ch [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH] fetch-pack: log(n)-transmission find_common() 2008-09-17 23:00 [RFC] log(n)-transmissions common commit handshake Thomas Rast @ 2008-09-17 23:01 ` Thomas Rast 2008-09-24 0:48 ` [RFC PATCH v2] " Thomas Rast 2008-09-18 8:18 ` [RFC] log(n)-transmissions common commit handshake Thomas Rast 1 sibling, 1 reply; 13+ messages in thread From: Thomas Rast @ 2008-09-17 23:01 UTC (permalink / raw) To: git See thread starter for discussion. Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- builtin-fetch-pack.c | 333 +++++++++++++++++++++++++++++++++++++++++++------ upload-pack.c | 4 +- 2 files changed, 294 insertions(+), 43 deletions(-) diff --git a/builtin-fetch-pack.c b/builtin-fetch-pack.c index 4dfef29..9f010fe 100644 --- a/builtin-fetch-pack.c +++ b/builtin-fetch-pack.c @@ -25,6 +25,11 @@ static const char fetch_pack_usage[] = #define COMMON_REF (1U << 2) #define SEEN (1U << 3) #define POPPED (1U << 4) +#define BISECTED_FW (1U << 5) +#define BISECTED_BW (1U << 6) +#define NOTCOMMON (1U << 7) +#define HAS_INFO (1U << 8) +#define CLEAN_MARKS (COMMON | COMMON_REF | SEEN | POPPED | BISECTED_FW | BISECTED_BW | NOTCOMMON) static int marked; @@ -34,19 +39,78 @@ static int marked; */ #define MAX_IN_VAIN 256 -static struct commit_list *rev_list; +struct work_list +{ + struct work_list *next; + struct commit *commit; + unsigned int stride; /* current stride length between commits */ + unsigned int steps; /* steps left to go before emitting again */ + unsigned int bisect; + unsigned int is_ref :1; /* prioritise refs */ +}; + +static struct work_list *work_list = NULL; + +static struct commit_list *outstanding = NULL; /* list of commits not ACKed yet */ +static struct commit_list *outstanding_end = NULL; + +struct commit_info +{ + struct commit_list *children; +}; + + static int non_common_revs, multi_ack, use_sideband; -static void rev_list_push(struct commit *commit, int mark) +static struct commit_info *get_commit_info(struct commit *commit) +{ + if (commit->object.flags & HAS_INFO) + return commit->util; + + struct commit_info *info = xmalloc(sizeof(struct commit_info)); + info->children = NULL; + commit->util = info; + commit->object.flags |= HAS_INFO; + return info; +} + +static void work_list_insert(struct work_list *work_item) { + struct work_list **pp = &work_list; + struct work_list *p; + + while ((p = *pp)) { + if ((work_item->is_ref && (!p->is_ref || p->commit->date < work_item->commit->date)) + || (!work_item->bisect && (p->bisect || p->commit->date < work_item->commit->date)) + || p->bisect > work_item->bisect) + break; + pp = &p->next; + } + + work_item->next = p; + *pp = work_item; +} + + +static void rev_list_push(struct commit *commit, int mark, unsigned int stride, unsigned int steps, unsigned int is_ref) +{ + struct work_list *work_item; + if (!(commit->object.flags & mark)) { commit->object.flags |= mark; - if (!(commit->object.parsed)) + if (!(commit->object.parsed)) { if (parse_commit(commit)) return; + } - insert_by_date(commit, &rev_list); + work_item = xmalloc(sizeof(struct work_list)); + work_item->commit = commit; + work_item->stride = stride; + work_item->steps = steps; + work_item->bisect = 0; + work_item->is_ref = is_ref; + work_list_insert(work_item); if (!(commit->object.flags & COMMON)) non_common_revs++; @@ -58,7 +122,7 @@ static int rev_list_insert_ref(const char *path, const unsigned char *sha1, int struct object *o = deref_tag(parse_object(sha1), path, 0); if (o && o->type == OBJ_COMMIT) - rev_list_push((struct commit *)o, SEEN); + rev_list_push((struct commit *)o, SEEN, 1, 1, 1); return 0; } @@ -68,8 +132,7 @@ static int clear_marks(const char *path, const unsigned char *sha1, int flag, vo struct object *o = deref_tag(parse_object(sha1), path, 0); if (o && o->type == OBJ_COMMIT) - clear_commit_marks((struct commit *)o, - COMMON | COMMON_REF | SEEN | POPPED); + clear_commit_marks((struct commit *)o, CLEAN_MARKS); return 0; } @@ -89,7 +152,7 @@ static void mark_common(struct commit *commit, o->flags |= COMMON; if (!(o->flags & SEEN)) - rev_list_push(commit, SEEN); + rev_list_push(commit, SEEN, 1, 1, 0); else { struct commit_list *parents; @@ -111,49 +174,216 @@ static void mark_common(struct commit *commit, Get the next rev to send, ignoring the common. */ -static const unsigned char* get_rev(void) + +static void get_rev_queue_parents(struct commit *commit, int mark, unsigned int stride, unsigned int steps) +{ + struct commit_list *parents = commit->parents; + + while (parents) { + struct commit_info *info = get_commit_info(parents->item); + struct commit_list *entry; + for (entry = info->children; entry; entry = entry->next) + if (entry->item == commit) + break; + if (!entry) + commit_list_insert(commit, &(info->children)); + if (!(parents->item->object.flags & SEEN)) + rev_list_push(parents->item, mark, stride, steps, 0); + if (mark & COMMON) + mark_common(parents->item, 1, 0); + parents = parents->next; + } +} + + +static struct commit* get_rev_stride_skip(struct work_list *work_item) +{ + get_rev_queue_parents(work_item->commit, SEEN, work_item->stride, + work_item->steps + 1); + + free(work_item); + return NULL; +} + +static void setup_bisect(struct commit *commit, int depth) +{ + struct work_list *work_item = xmalloc(sizeof(struct work_list)); + + work_item->commit = commit; + work_item->bisect = 1+depth; + work_item->is_ref = 0; + work_list_insert(work_item); +} + +static void setup_bisect_all (struct commit_list* list, int depth) +{ + while (list) { + setup_bisect(list->item, depth); + list = list->next; + } +} + +static struct commit* get_rev_stride_emit(struct work_list *work_item) +{ + struct commit *commit = work_item->commit; + unsigned int mark; + + commit->object.flags |= POPPED; + + /* This one got hit by our walk! */ + if (!(commit->object.flags & COMMON)) + non_common_revs--; + + if (commit->object.flags & COMMON) { + /* do not send "have", and ignore ancestors */ + commit = NULL; + mark = COMMON | SEEN; + } else if (commit->object.flags & COMMON_REF) { + /* send "have", and ignore ancestors */ + mark = COMMON | SEEN; + } else { + /* send "have", also for its ancestors */ + mark = SEEN; + setup_bisect(commit, 0); + } + + if (commit) + get_rev_queue_parents(commit, mark, work_item->stride*2, 0); + + free(work_item); + + return commit; +} + +static struct commit* get_rev_bisect(struct work_list *work_item) +{ + struct commit *full; + struct commit *half; + int half_step; + struct commit_info *info; + unsigned int flags = work_item->commit->object.flags; + + if (flags & NOTCOMMON) { + /* we inferred that this side of the bisection is not + * interesting any longer */ + free(work_item); + return NULL; + } + + + if (!(flags & (COMMON|BISECTED_BW))) { + /* Server does not have this. Search backward in history */ + + full = work_item->commit; + full->object.flags |= BISECTED_BW; + half = work_item->commit; + half_step = 0; + + while (full && full->parents) { + setup_bisect_all(full->parents->next, work_item->bisect); + full = full->parents->item; + half_step ^= 1; + if (half_step) + half = half->parents->item; + if (full->object.flags & (POPPED|COMMON)) + break; + } + + /* also insert the same bisection again so we can try forward too */ + work_list_insert(work_item); + + if (full->object.flags & POPPED + && !(full->object.flags & NOTCOMMON) + && !(half->object.flags & (COMMON|POPPED))) { + setup_bisect(half, work_item->bisect); + half->object.flags |= POPPED; + info = get_commit_info(half); + return half; + } else { + return NULL; + } + } + + if (!(flags & (NOTCOMMON|BISECTED_FW))) { + /* We have not seen this yet when bisecting, search forward */ + + full = work_item->commit; + full->object.flags |= BISECTED_FW; + half = work_item->commit; + half_step = 0; + + while (full && (info=get_commit_info(full))->children) { + setup_bisect_all(info->children->next, work_item->bisect); + full = info->children->item; + half_step ^= 1; + if (half_step) { + info = get_commit_info(half); + half = info->children->item; + } + if (full->object.flags & (POPPED|NOTCOMMON)) + break; + } + + if (full->object.flags & POPPED + && !(full->object.flags & COMMON) + && !(half->object.flags & POPPED)) { + setup_bisect(half, work_item->bisect); + half->object.flags |= POPPED; + info = get_commit_info(half); + free(work_item); + return half; + } + } + + free(work_item); + return NULL; +} + + +static void mark_not_common(struct commit *commit) +{ + struct commit_info *info = get_commit_info(commit); + struct commit_list *child; + + if (commit->object.flags & COMMON) + /* this has already been acked earlier */ + return; + + commit->object.flags |= NOTCOMMON; + + for (child = info->children; child; child = child->next) + mark_not_common(child->item); +} + +static struct commit *get_rev(void) { struct commit *commit = NULL; while (commit == NULL) { - unsigned int mark; - struct commit_list *parents; + struct work_list *work_item = NULL; - if (rev_list == NULL || non_common_revs == 0) + if (work_list == NULL || non_common_revs == 0) return NULL; - commit = rev_list->item; - if (!commit->object.parsed) + work_item = work_list; + work_list = work_item->next; + + commit = work_item->commit; + + if (commit && !commit->object.parsed) parse_commit(commit); - parents = commit->parents; - commit->object.flags |= POPPED; - if (!(commit->object.flags & COMMON)) - non_common_revs--; - - if (commit->object.flags & COMMON) { - /* do not send "have", and ignore ancestors */ - commit = NULL; - mark = COMMON | SEEN; - } else if (commit->object.flags & COMMON_REF) - /* send "have", and ignore ancestors */ - mark = COMMON | SEEN; - else - /* send "have", also for its ancestors */ - mark = SEEN; - - while (parents) { - if (!(parents->item->object.flags & SEEN)) - rev_list_push(parents->item, mark); - if (mark & COMMON) - mark_common(parents->item, 1, 0); - parents = parents->next; + if (work_item->bisect) { + commit = get_rev_bisect(work_item); + } else if (work_item->steps >= work_item->stride-1 + || (commit && !commit->parents)) { + commit = get_rev_stride_emit(work_item); + } else { + commit = get_rev_stride_skip(work_item); } - - rev_list = rev_list->next; } - return commit->object.sha1; + return commit; } static int find_common(int fd[2], unsigned char *result_sha1, @@ -161,6 +391,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, { int fetching; int count = 0, flushes = 0, retval; + struct commit *commit; const unsigned char *sha1; unsigned in_vain = 0; int got_continue = 0; @@ -243,11 +474,21 @@ static int find_common(int fd[2], unsigned char *result_sha1, flushes = 0; retval = -1; - while ((sha1 = get_rev())) { + while ((commit = get_rev())) { + sha1 = commit->object.sha1; packet_write(fd[1], "have %s\n", sha1_to_hex(sha1)); if (args.verbose) fprintf(stderr, "have %s\n", sha1_to_hex(sha1)); in_vain++; + + if (outstanding) { + commit_list_insert(commit, &(outstanding_end->next)); + outstanding_end = outstanding_end->next; + } else { + commit_list_insert(commit, &outstanding); + outstanding_end = outstanding; + } + if (!(31 & ++count)) { int ack; @@ -274,6 +515,16 @@ static int find_common(int fd[2], unsigned char *result_sha1, } else if (ack == 2) { struct commit *commit = lookup_commit(result_sha1); + struct commit_list *item; + while (commit != outstanding->item) { + mark_not_common(commit); + item = outstanding; + outstanding = item->next; + free(item); + } + item = outstanding; + outstanding = item->next; + free(item); mark_common(commit, 0, 1); retval = 0; in_vain = 0; @@ -445,7 +696,7 @@ static int everything_local(struct ref **refs, int nr_match, char **match) continue; if (!(o->flags & SEEN)) { - rev_list_push((struct commit *)o, COMMON_REF | SEEN); + rev_list_push((struct commit *)o, COMMON_REF | SEEN, 1, 1, 1); mark_common((struct commit *)o, 1, 1); } diff --git a/upload-pack.c b/upload-pack.c index e5adbc0..c6dfb32 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -414,9 +414,9 @@ static int get_common_commits(void) if (!prefixcmp(line, "have ")) { switch (got_sha1(line+5, sha1)) { case -1: /* they have what we do not */ - if (multi_ack && ok_to_give_up()) +/* if (multi_ack && ok_to_give_up()) packet_write(1, "ACK %s continue\n", - sha1_to_hex(sha1)); + sha1_to_hex(sha1));*/ break; default: memcpy(hex, sha1_to_hex(sha1), 41); -- tg: (1293c95..) t/fetch-pack-speedup (depends on: origin/master) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-09-17 23:01 ` [RFC PATCH] fetch-pack: log(n)-transmission find_common() Thomas Rast @ 2008-09-24 0:48 ` Thomas Rast 2008-10-23 19:38 ` Thomas Rast 0 siblings, 1 reply; 13+ messages in thread From: Thomas Rast @ 2008-09-24 0:48 UTC (permalink / raw) To: git; +Cc: Clemens Buchacher Replaces the existing simple history search with a more sophisticated algorithm: 1) Walk history with exponentially increasing stride lengths; i.e., send the 1st commit, then the 2nd after that, then the 4th after that, and so on. 2) Bisect the resulting intervals. Combined with tracking the "outstanding haves" so that we can detect which sha1s were never ACKed by upload-pack (and hence not common), this gives O(log(n)) required "have" lines. Unfortunately this cannot work if the server sends fake ACKs, so we introduce a capability 'exp-stride' which instructs upload-pack to disable ok_to_give_up(). (Which incidentally saves the server a lot of work.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- So, after the resounding success that v1 of this patch was, I went through and actually made it work, or at least I hope so. I eventually gave up on trying to understand the exact interaction of mark_common(), which sometimes pushes, and rev_list_push(), which sometimes marks, and just rewrote them to each only do one thing. I also rewrote the old strategy in terms of the new get_rev_stride (always keeping stride length at 1) because that was far less work than integrating the existing code with the new one. For the timings below, I ran the (new) daemon locally, with two repos: - a 'clone --mirror' of my own git.git, with a branch 'next' 525 commits from the above cc185a6 - a 'clone --mirror' of the completely unrelated adps.git, which shares no objects with git.git Some not really statistically sound timings, usually best of 3-5: ## A stripped down git.git, with only one branch and no tags $ git branch -a * master $ git tag $ git log -1 --pretty=oneline master cc185a6a8ac24737a26ec4b40cc401c2db8b2e97 Start draft release notes for 1.6.0.3 ## I still have the old (dashed) forms in /usr/bin from RPMs $ /usr/bin/git --version git version 1.5.6 ## (1a) "Updating" from the git.git that is further ahead (OLD) $ time /usr/bin/git-fetch-pack -k git://localhost/git next [...] real 0m1.004s user 0m0.228s sys 0m0.028s ## (1b) "Updating" from the git.git that is further ahead (NEW) $ time git fetch-pack -k git://localhost/git next [...] real 0m0.977s user 0m0.208s sys 0m0.068s ## (2a) Fetching the unrelated repo from scratch (OLD) $ time /usr/bin/git-fetch-pack -k git://localhost/adps master [...] real 0m9.560s user 0m0.720s sys 0m0.128s ## (2b) Fetching the unrelated repo from scratch (NEW) $ time git fetch-pack -k git://localhost/adps master [...] real 0m0.596s user 0m0.372s sys 0m0.008s ## (3a) Fetching over the network from real repo (OLD) $ time /usr/bin/git-fetch-pack git://git.kernel.org/pub/scm/git/git.git next [...] user 0m0.528s sys 0m0.136s ## (3b) Fetching over the network from real repo (NEW) $ time git fetch-pack git://git.kernel.org/pub/scm/git/git.git next [...] user 0m0.540s sys 0m0.180s ## Add more refs to make it more interesting $ git remote add origin ~/dev/git $ git fetch --tags origin ## (4a) like 1a, but with lots of tags $ time /usr/bin/git-fetch-pack -k git://localhost/git next [...] real 0m1.075s user 0m0.452s sys 0m0.048s ## (4b) like 1b, but with lots of tags $ time git fetch-pack -k git://localhost/git next [...] real 0m0.837s user 0m0.236s sys 0m0.040s Clearly, this approach does solve the issue I mentioned in the motivation earlier in the thread, where the initial fetch of completely disjoint repositories takes _ages_. Somewhat to my own surprise, it seems to do quite well in _all_ cases even though it aggressively digs through history. Note, however, that the timings aren't really solid; the final write of the pack, which I assume is fsync()ed, seems to have a big impact on the execution time. Test (3) obviously depends more on network performance than anything else; and it uses the old (linear) algorithm, but in the new implementation. I'd appreciate testers and eyeballs on the patch. I'll think about the ref heads "reduced DAG" again, to possibly shave off some more (client side cpu) time when I get the time, but that likely won't be in the next few days. - Thomas PS: Thanks drizzd for the discussion of the topic on IRC. PPS: Junio, enjoy your holidays :-) builtin-fetch-pack.c | 477 +++++++++++++++++++++++++++++++++++++++++--------- upload-pack.c | 7 +- 2 files changed, 400 insertions(+), 84 deletions(-) diff --git a/builtin-fetch-pack.c b/builtin-fetch-pack.c index 4dfef29..721fe08 100644 --- a/builtin-fetch-pack.c +++ b/builtin-fetch-pack.c @@ -25,6 +25,12 @@ static const char fetch_pack_usage[] = #define COMMON_REF (1U << 2) #define SEEN (1U << 3) #define POPPED (1U << 4) +#define BISECTED_FW (1U << 5) +#define BISECTED_BW (1U << 6) +#define NOTCOMMON (1U << 7) +#define HAS_INFO (1U << 8) +#define CLEAN_MARKS (COMMON | COMMON_REF | SEEN | POPPED | BISECTED_FW \ + | BISECTED_BW | NOTCOMMON) static int marked; @@ -34,133 +40,412 @@ static int marked; */ #define MAX_IN_VAIN 256 -static struct commit_list *rev_list; -static int non_common_revs, multi_ack, use_sideband; +struct work_list +{ + struct work_list *next; + struct commit *commit; + unsigned int stride; /* current stride length between commits */ + unsigned int steps; /* steps left to go before emitting again */ + unsigned int depth; /* sideline depth */ + unsigned int bisect; +}; + +/* "recursion stack" of our strategy of choosing sha1s to transmit */ +static struct work_list *work_list = NULL; -static void rev_list_push(struct commit *commit, int mark) +/* list of commits not ACKed yet */ +static struct commit_list *outstanding = NULL; +static struct commit_list *outstanding_end = NULL; + +struct commit_info { - if (!(commit->object.flags & mark)) { - commit->object.flags |= mark; + struct commit_list *children; +}; - if (!(commit->object.parsed)) - if (parse_commit(commit)) - return; - insert_by_date(commit, &rev_list); +static int non_common_revs = 1, multi_ack, use_sideband, exp_stride_algorithm; - if (!(commit->object.flags & COMMON)) - non_common_revs++; +static struct commit_info *get_commit_info(struct commit *commit) +{ + if (commit->object.flags & HAS_INFO) + return commit->util; + + struct commit_info *info = xmalloc(sizeof(struct commit_info)); + info->children = NULL; + commit->util = info; + commit->object.flags |= HAS_INFO; + return info; +} + +static void work_list_insert(struct work_list *work_item) +{ + struct work_list **pp = &work_list; + struct work_list *p; + + while ((p = *pp)) { + if (p->bisect || work_item->bisect) { + if (p->bisect > work_item->bisect) + break; + } else { + if (p->depth > work_item->depth) + break; + else if (p->commit->date < work_item->commit->date) + break; + } + + pp = &p->next; } + + work_item->next = p; + *pp = work_item; } -static int rev_list_insert_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data) +static void add_child(struct commit *parent, struct commit *child) +{ + struct commit_info *info = get_commit_info(parent); + struct commit_list *item; + for (item = info->children; item; item = item->next) + if (item->item == child) + break; + if (!item) + commit_list_insert(child, &(info->children)); +} + + +/* + Marks all (known) children of this commit NOTCOMMON. We call this + for all sha1s the server did not ACK. +*/ + +static void mark_not_common(struct commit *commit) +{ + struct commit_info *info = get_commit_info(commit); + struct commit_list *child; + + assert(!(commit->object.flags & COMMON)); + + if (commit->object.flags & NOTCOMMON) + return; + + commit->object.flags |= NOTCOMMON; + + for (child = info->children; child; child = child->next) + mark_not_common(child->item); +} + + +static void check_parsed_and_mark(struct commit *commit) +{ + struct commit_list *parent; + + if (!commit && commit->object.parsed) + return; + + parse_commit(commit); + + for (parent = commit->parents; parent; parent = parent->next) { + add_child(parent->item, commit); + if (parent->item->object.flags & NOTCOMMON) + mark_not_common(commit); + } +} + + +static void rev_list_push(struct commit *commit, unsigned int stride, + unsigned int steps, unsigned int depth) +{ + struct work_list *work_item; + + check_parsed_and_mark(commit); + + work_item = xmalloc(sizeof(struct work_list)); + work_item->commit = commit; + work_item->stride = stride; + work_item->steps = steps; + work_item->depth = depth; + work_item->bisect = 0; + work_list_insert(work_item); +} + +static int rev_list_insert_ref(const char *path, const unsigned char *sha1, + int flag, void *cb_data) { struct object *o = deref_tag(parse_object(sha1), path, 0); - if (o && o->type == OBJ_COMMIT) - rev_list_push((struct commit *)o, SEEN); + if (o && o->type == OBJ_COMMIT && !(o->flags & SEEN)) + rev_list_push((struct commit *)o, 1, 1, 0); return 0; } -static int clear_marks(const char *path, const unsigned char *sha1, int flag, void *cb_data) +static int clear_marks(const char *path, const unsigned char *sha1, int flag, + void *cb_data) { struct object *o = deref_tag(parse_object(sha1), path, 0); if (o && o->type == OBJ_COMMIT) - clear_commit_marks((struct commit *)o, - COMMON | COMMON_REF | SEEN | POPPED); + clear_commit_marks((struct commit *)o, CLEAN_MARKS); return 0; } + /* - This function marks a rev and its ancestors as common. - In some cases, it is desirable to mark only the ancestors (for example - when only the server does not yet know that they are common). + Mark commits backwards through history as COMMON. */ -static void mark_common(struct commit *commit, - int ancestors_only, int dont_parse) +static void mark_common(struct commit *commit, int ancestors_only, int dont_parse) { - if (commit != NULL && !(commit->object.flags & COMMON)) { - struct object *o = (struct object *)commit; + struct commit_list *parents; + struct object *o; - if (!ancestors_only) - o->flags |= COMMON; + if (commit == NULL || commit->object.flags & (COMMON_REF|COMMON)) + return; - if (!(o->flags & SEEN)) - rev_list_push(commit, SEEN); - else { - struct commit_list *parents; - - if (!ancestors_only && !(o->flags & POPPED)) - non_common_revs--; - if (!o->parsed && !dont_parse) - if (parse_commit(commit)) - return; - - for (parents = commit->parents; - parents; - parents = parents->next) - mark_common(parents->item, 0, dont_parse); - } + if (!dont_parse && !(commit->object.parsed)) + check_parsed_and_mark(commit); + + o = (struct object *)commit; + + if (!ancestors_only) + o->flags |= COMMON; + + for (parents = commit->parents; parents; parents = parents->next) { + add_child(parents->item, commit); + mark_common(parents->item, 0, dont_parse); } } + + /* - Get the next rev to send, ignoring the common. + Queue all revs in the list for simple backward search. Does not + requeue already SEEN commits. */ -static const unsigned char* get_rev(void) +static void queue_list(struct commit *child, struct commit_list *list, + unsigned int stride, unsigned int steps, + unsigned int depth) { - struct commit *commit = NULL; + while (list) { + add_child(list->item, child); + if (!(list->item->object.flags & SEEN)) + rev_list_push(list->item, stride, steps, depth); + list = list->next; + } +} + + +/* + Queues a commit for bisection +*/ + +static void setup_bisect(struct commit *commit, int depth) +{ + struct work_list *work_item = xmalloc(sizeof(struct work_list)); + + work_item->commit = commit; + work_item->bisect = 1+depth; + work_item->depth = 0; + work_list_insert(work_item); +} + + +/* + Queues a list of commits for bisection +*/ + +static void setup_bisect_all (struct commit_list* list, int depth) +{ + while (list) { + setup_bisect(list->item, depth); + list = list->next; + } +} + + +/* + Backwards search. If we use exp_stride_algorithm, it takes + exponential strides between commits chosen. +*/ + +static struct commit* get_rev_stride(struct work_list *work_item) +{ + struct commit *commit = work_item->commit; + unsigned int steps = work_item->steps; + + while (steps++ < work_item->stride-1 && commit->parents + && !(commit->object.flags & (SEEN|COMMON|COMMON_REF))) { + /* all but the first parent line are queued for later */ + if (commit->parents) + queue_list(commit, commit->parents->next, + work_item->stride, steps, + work_item->depth+1); + + /* follow the first parent line directly */ + commit->object.flags |= SEEN; + add_child(commit->parents->item, commit); + commit = commit->parents->item; + check_parsed_and_mark(commit); + } + + if (commit->object.flags & (SEEN|COMMON)) { + /* already been here! */ + commit = NULL; + } else if (commit->object.flags & COMMON_REF) { + /* this came from a ref; we stop here, but need to + * emit it so the server knows too */ + commit->object.flags |= POPPED|SEEN|BISECTED_BW; + if (exp_stride_algorithm) + setup_bisect(commit, 0); + } else { + /* usual case: this is news to us, so we scan further + * back. note that the first-depth bisection only + * needs to go forward */ + commit->object.flags |= POPPED|SEEN|BISECTED_BW; + queue_list(commit, commit->parents, + exp_stride_algorithm ? work_item->stride*2 : 1, + 0, work_item->depth); + if (exp_stride_algorithm) + setup_bisect(commit, 0); + } + + free(work_item); + + return commit; +} - while (commit == NULL) { - unsigned int mark; - struct commit_list *parents; - if (rev_list == NULL || non_common_revs == 0) +/* + Bisection state of the exponential stride algorithm: scans forward + (backward) for a non-common/already-sent (common/already-sent, + resp.) commit, then emits the middle of that range, and re-queues. + + This means we bisect the discovered range in log(n) transmissions, + although we do make O(n) steps through history. +*/ + +static struct commit* get_rev_bisect(struct work_list *work_item) +{ + struct commit *full; + struct commit *half; + int half_step; + struct commit_info *info; + unsigned int flags = work_item->commit->object.flags; + + if (!(flags & (COMMON|BISECTED_BW))) { + /* We don't know anything about the history _backward_ + * from this, so search it */ + + full = work_item->commit; + full->object.flags |= BISECTED_BW; + half = work_item->commit; + half_step = 0; + + while (full && full->parents) { + full = full->parents->item; + half_step ^= 1; + if (half_step) + half = half->parents->item; + if (full->object.flags & (POPPED|COMMON)) + break; + } + + /* also insert the same bisection again so we can try + * forward too */ + work_list_insert(work_item); + + if (full->object.flags & POPPED + && !(full->object.flags & NOTCOMMON) + && !(half->object.flags & (COMMON|POPPED))) { + setup_bisect(half, work_item->bisect); + half->object.flags |= POPPED; + return half; + } else { return NULL; + } + } - commit = rev_list->item; - if (!commit->object.parsed) - parse_commit(commit); - parents = commit->parents; - - commit->object.flags |= POPPED; - if (!(commit->object.flags & COMMON)) - non_common_revs--; - - if (commit->object.flags & COMMON) { - /* do not send "have", and ignore ancestors */ - commit = NULL; - mark = COMMON | SEEN; - } else if (commit->object.flags & COMMON_REF) - /* send "have", and ignore ancestors */ - mark = COMMON | SEEN; - else - /* send "have", also for its ancestors */ - mark = SEEN; - - while (parents) { - if (!(parents->item->object.flags & SEEN)) - rev_list_push(parents->item, mark); - if (mark & COMMON) - mark_common(parents->item, 1, 0); - parents = parents->next; + if (!(flags & (NOTCOMMON|BISECTED_FW))) { + /* We don't know anything about the history _forward_ + * from this, so search it */ + + full = work_item->commit; + full->object.flags |= BISECTED_FW; + half = work_item->commit; + half_step = 0; + + while (full && (info=get_commit_info(full))->children) { + setup_bisect_all(info->children->next, work_item->bisect); + full = info->children->item; + half_step ^= 1; + if (half_step) { + info = get_commit_info(half); + half = info->children->item; + } + if (full->object.flags & (POPPED|NOTCOMMON)) + break; } - rev_list = rev_list->next; + if (full->object.flags & POPPED + && !(full->object.flags & COMMON) + && !(half->object.flags & (NOTCOMMON|POPPED))) { + setup_bisect(half, work_item->bisect); + half->object.flags |= POPPED; + free(work_item); + return half; + } } - return commit->object.sha1; + free(work_item); + return NULL; } + +/* + Get the next revision to send. +*/ + +static struct commit *get_rev(void) +{ + struct commit *commit = NULL; + + while (commit == NULL && work_list) { + struct work_list *work_item = NULL; + + work_item = work_list; + /* we update the pointer early so the get_rev + * functions can free() or reuse the work_item as + * required */ + work_list = work_item->next; + + commit = work_item->commit; + + check_parsed_and_mark(commit); + + if (work_item->bisect) { + commit = get_rev_bisect(work_item); + } else { + commit = get_rev_stride(work_item); + } + } + + return commit; +} + +static void pop_outstanding() +{ + struct commit_list *item = outstanding; + outstanding = item->next; + free(item); +} + + static int find_common(int fd[2], unsigned char *result_sha1, struct ref *refs) { int fetching; int count = 0, flushes = 0, retval; + struct commit *commit; const unsigned char *sha1; unsigned in_vain = 0; int got_continue = 0; @@ -192,7 +477,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, } if (!fetching) - packet_write(fd[1], "want %s%s%s%s%s%s%s%s\n", + packet_write(fd[1], "want %s%s%s%s%s%s%s%s%s\n", sha1_to_hex(remote), (multi_ack ? " multi_ack" : ""), (use_sideband == 2 ? " side-band-64k" : ""), @@ -200,6 +485,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, (args.use_thin_pack ? " thin-pack" : ""), (args.no_progress ? " no-progress" : ""), (args.include_tag ? " include-tag" : ""), + (exp_stride_algorithm ? " exp-stride" : ""), " ofs-delta"); else packet_write(fd[1], "want %s\n", sha1_to_hex(remote)); @@ -243,13 +529,25 @@ static int find_common(int fd[2], unsigned char *result_sha1, flushes = 0; retval = -1; - while ((sha1 = get_rev())) { + while ((commit = get_rev())) { + sha1 = commit->object.sha1; packet_write(fd[1], "have %s\n", sha1_to_hex(sha1)); if (args.verbose) fprintf(stderr, "have %s\n", sha1_to_hex(sha1)); in_vain++; + + if (outstanding) { + commit_list_insert(commit, &(outstanding_end->next)); + outstanding_end = outstanding_end->next; + } else { + commit_list_insert(commit, &outstanding); + outstanding_end = outstanding; + } + if (!(31 & ++count)) { int ack; + int unwound = 0; + struct commit_list *item; packet_flush(fd[1]); flushes++; @@ -274,12 +572,23 @@ static int find_common(int fd[2], unsigned char *result_sha1, } else if (ack == 2) { struct commit *commit = lookup_commit(result_sha1); + while (commit != outstanding->item) { + mark_not_common(outstanding->item); + pop_outstanding(); + unwound++; + } + pop_outstanding(); + unwound++; mark_common(commit, 0, 1); retval = 0; in_vain = 0; got_continue = 1; } } while (ack); + while (unwound++ < 32) { + mark_not_common(outstanding->item); + pop_outstanding(); + } flushes--; if (got_continue && MAX_IN_VAIN < in_vain) { if (args.verbose) @@ -445,9 +754,8 @@ static int everything_local(struct ref **refs, int nr_match, char **match) continue; if (!(o->flags & SEEN)) { - rev_list_push((struct commit *)o, COMMON_REF | SEEN); - - mark_common((struct commit *)o, 1, 1); + rev_list_push((struct commit *)o, 1, 0, 0); + o->flags |= COMMON_REF; } } @@ -597,6 +905,11 @@ static struct ref *do_fetch_pack(int fd[2], fprintf(stderr, "Server supports side-band\n"); use_sideband = 1; } + if (server_supports("exp-stride")) { + if (args.verbose) + fprintf(stderr, "Server supports exp-stride\n"); + exp_stride_algorithm = 1; + } if (everything_local(&ref, nr_match, match)) { packet_flush(fd[1]); goto all_done; diff --git a/upload-pack.c b/upload-pack.c index e5adbc0..14012ee 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -37,6 +37,7 @@ static unsigned int timeout; */ static int use_sideband; static int debug_fd; +static int exp_stride_algorithm; static void reset_timeout(void) { @@ -414,7 +415,7 @@ static int get_common_commits(void) if (!prefixcmp(line, "have ")) { switch (got_sha1(line+5, sha1)) { case -1: /* they have what we do not */ - if (multi_ack && ok_to_give_up()) + if (multi_ack && !exp_stride_algorithm && ok_to_give_up()) packet_write(1, "ACK %s continue\n", sha1_to_hex(sha1)); break; @@ -501,6 +502,8 @@ static void receive_needs(void) no_progress = 1; if (strstr(line+45, "include-tag")) use_include_tag = 1; + if (strstr(line+45, "exp-stride")) + exp_stride_algorithm = 1; /* We have sent all our refs already, and the other end * should have chosen out of them; otherwise they are @@ -573,7 +576,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo { static const char *capabilities = "multi_ack thin-pack side-band" " side-band-64k ofs-delta shallow no-progress" - " include-tag"; + " include-tag exp-stride"; struct object *o = parse_object(sha1); if (!o) -- tg: (1293c95..) t/fetch-pack-speedup (depends on: origin/master) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-09-24 0:48 ` [RFC PATCH v2] " Thomas Rast @ 2008-10-23 19:38 ` Thomas Rast 2008-10-24 15:49 ` Nicolas Pitre 2008-10-27 10:29 ` Nanako Shiraishi 0 siblings, 2 replies; 13+ messages in thread From: Thomas Rast @ 2008-10-23 19:38 UTC (permalink / raw) To: git Replaces the existing simple history search with a more sophisticated algorithm: 1) Walk history with exponentially increasing stride lengths; i.e., send the 1st commit, then the 2nd after that, then the 4th after that, and so on. 2) Bisect the resulting intervals. Combined with tracking the "outstanding haves" so that we can detect which sha1s were never ACKed by upload-pack (and hence not common), this gives O(log(n)) required "have" lines. Unfortunately this cannot work if the server sends fake ACKs, so we introduce a capability 'exp-stride' which instructs upload-pack to disable ok_to_give_up(). (Which incidentally saves the server a lot of work.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- This is a simple resend of v2, in the hope of attracting some discussion or at least encouraging words this time around. Here are the timings I took when I wrote the last mail: -- 8< -- For the timings below, I ran the (new) daemon locally, with two repos: - a 'clone --mirror' of my own git.git, with a branch 'next' 525 commits from the above cc185a6 - a 'clone --mirror' of the completely unrelated adps.git, which shares no objects with git.git Some not really statistically sound timings, usually best of 3-5: ## A stripped down git.git, with only one branch and no tags $ git branch -a * master $ git tag $ git log -1 --pretty=oneline master cc185a6a8ac24737a26ec4b40cc401c2db8b2e97 Start draft release notes for 1.6.0.3 ## I still have the old (dashed) forms in /usr/bin from RPMs $ /usr/bin/git --version git version 1.5.6 ## (1a) "Updating" from the git.git that is further ahead (OLD) $ time /usr/bin/git-fetch-pack -k git://localhost/git next [...] real 0m1.004s user 0m0.228s sys 0m0.028s ## (1b) "Updating" from the git.git that is further ahead (NEW) $ time git fetch-pack -k git://localhost/git next [...] real 0m0.977s user 0m0.208s sys 0m0.068s ## (2a) Fetching the unrelated repo from scratch (OLD) $ time /usr/bin/git-fetch-pack -k git://localhost/adps master [...] real 0m9.560s user 0m0.720s sys 0m0.128s ## (2b) Fetching the unrelated repo from scratch (NEW) $ time git fetch-pack -k git://localhost/adps master [...] real 0m0.596s user 0m0.372s sys 0m0.008s ## (3a) Fetching over the network from real repo (OLD) $ time /usr/bin/git-fetch-pack git://git.kernel.org/pub/scm/git/git.git next [...] user 0m0.528s sys 0m0.136s ## (3b) Fetching over the network from real repo (NEW) $ time git fetch-pack git://git.kernel.org/pub/scm/git/git.git next [...] user 0m0.540s sys 0m0.180s ## Add more refs to make it more interesting $ git remote add origin ~/dev/git $ git fetch --tags origin ## (4a) like 1a, but with lots of tags $ time /usr/bin/git-fetch-pack -k git://localhost/git next [...] real 0m1.075s user 0m0.452s sys 0m0.048s ## (4b) like 1b, but with lots of tags $ time git fetch-pack -k git://localhost/git next [...] real 0m0.837s user 0m0.236s sys 0m0.040s Clearly, this approach does solve the issue I mentioned in the motivation earlier in the thread, where the initial fetch of completely disjoint repositories takes _ages_. Somewhat to my own surprise, it seems to do quite well in _all_ cases even though it aggressively digs through history. Note, however, that the timings aren't really solid; the final write of the pack, which I assume is fsync()ed, seems to have a big impact on the execution time. Test (3) obviously depends more on network performance than anything else; and it uses the old (linear) algorithm, but in the new implementation. -- >8 -- builtin-fetch-pack.c | 477 +++++++++++++++++++++++++++++++++++++++++--------- upload-pack.c | 7 +- 2 files changed, 400 insertions(+), 84 deletions(-) diff --git a/builtin-fetch-pack.c b/builtin-fetch-pack.c index 372bfa2..9ab84ab 100644 --- a/builtin-fetch-pack.c +++ b/builtin-fetch-pack.c @@ -25,6 +25,12 @@ static const char fetch_pack_usage[] = #define COMMON_REF (1U << 2) #define SEEN (1U << 3) #define POPPED (1U << 4) +#define BISECTED_FW (1U << 5) +#define BISECTED_BW (1U << 6) +#define NOTCOMMON (1U << 7) +#define HAS_INFO (1U << 8) +#define CLEAN_MARKS (COMMON | COMMON_REF | SEEN | POPPED | BISECTED_FW \ + | BISECTED_BW | NOTCOMMON) static int marked; @@ -34,133 +40,412 @@ static int marked; */ #define MAX_IN_VAIN 256 -static struct commit_list *rev_list; -static int non_common_revs, multi_ack, use_sideband; +struct work_list +{ + struct work_list *next; + struct commit *commit; + unsigned int stride; /* current stride length between commits */ + unsigned int steps; /* steps left to go before emitting again */ + unsigned int depth; /* sideline depth */ + unsigned int bisect; +}; + +/* "recursion stack" of our strategy of choosing sha1s to transmit */ +static struct work_list *work_list = NULL; -static void rev_list_push(struct commit *commit, int mark) +/* list of commits not ACKed yet */ +static struct commit_list *outstanding = NULL; +static struct commit_list *outstanding_end = NULL; + +struct commit_info { - if (!(commit->object.flags & mark)) { - commit->object.flags |= mark; + struct commit_list *children; +}; - if (!(commit->object.parsed)) - if (parse_commit(commit)) - return; - insert_by_date(commit, &rev_list); +static int non_common_revs = 1, multi_ack, use_sideband, exp_stride_algorithm; - if (!(commit->object.flags & COMMON)) - non_common_revs++; +static struct commit_info *get_commit_info(struct commit *commit) +{ + if (commit->object.flags & HAS_INFO) + return commit->util; + + struct commit_info *info = xmalloc(sizeof(struct commit_info)); + info->children = NULL; + commit->util = info; + commit->object.flags |= HAS_INFO; + return info; +} + +static void work_list_insert(struct work_list *work_item) +{ + struct work_list **pp = &work_list; + struct work_list *p; + + while ((p = *pp)) { + if (p->bisect || work_item->bisect) { + if (p->bisect > work_item->bisect) + break; + } else { + if (p->depth > work_item->depth) + break; + else if (p->commit->date < work_item->commit->date) + break; + } + + pp = &p->next; } + + work_item->next = p; + *pp = work_item; } -static int rev_list_insert_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data) +static void add_child(struct commit *parent, struct commit *child) +{ + struct commit_info *info = get_commit_info(parent); + struct commit_list *item; + for (item = info->children; item; item = item->next) + if (item->item == child) + break; + if (!item) + commit_list_insert(child, &(info->children)); +} + + +/* + Marks all (known) children of this commit NOTCOMMON. We call this + for all sha1s the server did not ACK. +*/ + +static void mark_not_common(struct commit *commit) +{ + struct commit_info *info = get_commit_info(commit); + struct commit_list *child; + + assert(!(commit->object.flags & COMMON)); + + if (commit->object.flags & NOTCOMMON) + return; + + commit->object.flags |= NOTCOMMON; + + for (child = info->children; child; child = child->next) + mark_not_common(child->item); +} + + +static void check_parsed_and_mark(struct commit *commit) +{ + struct commit_list *parent; + + if (!commit && commit->object.parsed) + return; + + parse_commit(commit); + + for (parent = commit->parents; parent; parent = parent->next) { + add_child(parent->item, commit); + if (parent->item->object.flags & NOTCOMMON) + mark_not_common(commit); + } +} + + +static void rev_list_push(struct commit *commit, unsigned int stride, + unsigned int steps, unsigned int depth) +{ + struct work_list *work_item; + + check_parsed_and_mark(commit); + + work_item = xmalloc(sizeof(struct work_list)); + work_item->commit = commit; + work_item->stride = stride; + work_item->steps = steps; + work_item->depth = depth; + work_item->bisect = 0; + work_list_insert(work_item); +} + +static int rev_list_insert_ref(const char *path, const unsigned char *sha1, + int flag, void *cb_data) { struct object *o = deref_tag(parse_object(sha1), path, 0); - if (o && o->type == OBJ_COMMIT) - rev_list_push((struct commit *)o, SEEN); + if (o && o->type == OBJ_COMMIT && !(o->flags & SEEN)) + rev_list_push((struct commit *)o, 1, 1, 0); return 0; } -static int clear_marks(const char *path, const unsigned char *sha1, int flag, void *cb_data) +static int clear_marks(const char *path, const unsigned char *sha1, int flag, + void *cb_data) { struct object *o = deref_tag(parse_object(sha1), path, 0); if (o && o->type == OBJ_COMMIT) - clear_commit_marks((struct commit *)o, - COMMON | COMMON_REF | SEEN | POPPED); + clear_commit_marks((struct commit *)o, CLEAN_MARKS); return 0; } + /* - This function marks a rev and its ancestors as common. - In some cases, it is desirable to mark only the ancestors (for example - when only the server does not yet know that they are common). + Mark commits backwards through history as COMMON. */ -static void mark_common(struct commit *commit, - int ancestors_only, int dont_parse) +static void mark_common(struct commit *commit, int ancestors_only, int dont_parse) { - if (commit != NULL && !(commit->object.flags & COMMON)) { - struct object *o = (struct object *)commit; + struct commit_list *parents; + struct object *o; - if (!ancestors_only) - o->flags |= COMMON; + if (commit == NULL || commit->object.flags & (COMMON_REF|COMMON)) + return; - if (!(o->flags & SEEN)) - rev_list_push(commit, SEEN); - else { - struct commit_list *parents; - - if (!ancestors_only && !(o->flags & POPPED)) - non_common_revs--; - if (!o->parsed && !dont_parse) - if (parse_commit(commit)) - return; - - for (parents = commit->parents; - parents; - parents = parents->next) - mark_common(parents->item, 0, dont_parse); - } + if (!dont_parse && !(commit->object.parsed)) + check_parsed_and_mark(commit); + + o = (struct object *)commit; + + if (!ancestors_only) + o->flags |= COMMON; + + for (parents = commit->parents; parents; parents = parents->next) { + add_child(parents->item, commit); + mark_common(parents->item, 0, dont_parse); } } + + /* - Get the next rev to send, ignoring the common. + Queue all revs in the list for simple backward search. Does not + requeue already SEEN commits. */ -static const unsigned char* get_rev(void) +static void queue_list(struct commit *child, struct commit_list *list, + unsigned int stride, unsigned int steps, + unsigned int depth) { - struct commit *commit = NULL; + while (list) { + add_child(list->item, child); + if (!(list->item->object.flags & SEEN)) + rev_list_push(list->item, stride, steps, depth); + list = list->next; + } +} + + +/* + Queues a commit for bisection +*/ + +static void setup_bisect(struct commit *commit, int depth) +{ + struct work_list *work_item = xmalloc(sizeof(struct work_list)); + + work_item->commit = commit; + work_item->bisect = 1+depth; + work_item->depth = 0; + work_list_insert(work_item); +} + + +/* + Queues a list of commits for bisection +*/ + +static void setup_bisect_all (struct commit_list* list, int depth) +{ + while (list) { + setup_bisect(list->item, depth); + list = list->next; + } +} + + +/* + Backwards search. If we use exp_stride_algorithm, it takes + exponential strides between commits chosen. +*/ + +static struct commit* get_rev_stride(struct work_list *work_item) +{ + struct commit *commit = work_item->commit; + unsigned int steps = work_item->steps; + + while (steps++ < work_item->stride-1 && commit->parents + && !(commit->object.flags & (SEEN|COMMON|COMMON_REF))) { + /* all but the first parent line are queued for later */ + if (commit->parents) + queue_list(commit, commit->parents->next, + work_item->stride, steps, + work_item->depth+1); + + /* follow the first parent line directly */ + commit->object.flags |= SEEN; + add_child(commit->parents->item, commit); + commit = commit->parents->item; + check_parsed_and_mark(commit); + } + + if (commit->object.flags & (SEEN|COMMON)) { + /* already been here! */ + commit = NULL; + } else if (commit->object.flags & COMMON_REF) { + /* this came from a ref; we stop here, but need to + * emit it so the server knows too */ + commit->object.flags |= POPPED|SEEN|BISECTED_BW; + if (exp_stride_algorithm) + setup_bisect(commit, 0); + } else { + /* usual case: this is news to us, so we scan further + * back. note that the first-depth bisection only + * needs to go forward */ + commit->object.flags |= POPPED|SEEN|BISECTED_BW; + queue_list(commit, commit->parents, + exp_stride_algorithm ? work_item->stride*2 : 1, + 0, work_item->depth); + if (exp_stride_algorithm) + setup_bisect(commit, 0); + } + + free(work_item); + + return commit; +} - while (commit == NULL) { - unsigned int mark; - struct commit_list *parents; - if (rev_list == NULL || non_common_revs == 0) +/* + Bisection state of the exponential stride algorithm: scans forward + (backward) for a non-common/already-sent (common/already-sent, + resp.) commit, then emits the middle of that range, and re-queues. + + This means we bisect the discovered range in log(n) transmissions, + although we do make O(n) steps through history. +*/ + +static struct commit* get_rev_bisect(struct work_list *work_item) +{ + struct commit *full; + struct commit *half; + int half_step; + struct commit_info *info; + unsigned int flags = work_item->commit->object.flags; + + if (!(flags & (COMMON|BISECTED_BW))) { + /* We don't know anything about the history _backward_ + * from this, so search it */ + + full = work_item->commit; + full->object.flags |= BISECTED_BW; + half = work_item->commit; + half_step = 0; + + while (full && full->parents) { + full = full->parents->item; + half_step ^= 1; + if (half_step) + half = half->parents->item; + if (full->object.flags & (POPPED|COMMON)) + break; + } + + /* also insert the same bisection again so we can try + * forward too */ + work_list_insert(work_item); + + if (full->object.flags & POPPED + && !(full->object.flags & NOTCOMMON) + && !(half->object.flags & (COMMON|POPPED))) { + setup_bisect(half, work_item->bisect); + half->object.flags |= POPPED; + return half; + } else { return NULL; + } + } - commit = rev_list->item; - if (!commit->object.parsed) - parse_commit(commit); - parents = commit->parents; - - commit->object.flags |= POPPED; - if (!(commit->object.flags & COMMON)) - non_common_revs--; - - if (commit->object.flags & COMMON) { - /* do not send "have", and ignore ancestors */ - commit = NULL; - mark = COMMON | SEEN; - } else if (commit->object.flags & COMMON_REF) - /* send "have", and ignore ancestors */ - mark = COMMON | SEEN; - else - /* send "have", also for its ancestors */ - mark = SEEN; - - while (parents) { - if (!(parents->item->object.flags & SEEN)) - rev_list_push(parents->item, mark); - if (mark & COMMON) - mark_common(parents->item, 1, 0); - parents = parents->next; + if (!(flags & (NOTCOMMON|BISECTED_FW))) { + /* We don't know anything about the history _forward_ + * from this, so search it */ + + full = work_item->commit; + full->object.flags |= BISECTED_FW; + half = work_item->commit; + half_step = 0; + + while (full && (info=get_commit_info(full))->children) { + setup_bisect_all(info->children->next, work_item->bisect); + full = info->children->item; + half_step ^= 1; + if (half_step) { + info = get_commit_info(half); + half = info->children->item; + } + if (full->object.flags & (POPPED|NOTCOMMON)) + break; } - rev_list = rev_list->next; + if (full->object.flags & POPPED + && !(full->object.flags & COMMON) + && !(half->object.flags & (NOTCOMMON|POPPED))) { + setup_bisect(half, work_item->bisect); + half->object.flags |= POPPED; + free(work_item); + return half; + } } - return commit->object.sha1; + free(work_item); + return NULL; } + +/* + Get the next revision to send. +*/ + +static struct commit *get_rev(void) +{ + struct commit *commit = NULL; + + while (commit == NULL && work_list) { + struct work_list *work_item = NULL; + + work_item = work_list; + /* we update the pointer early so the get_rev + * functions can free() or reuse the work_item as + * required */ + work_list = work_item->next; + + commit = work_item->commit; + + check_parsed_and_mark(commit); + + if (work_item->bisect) { + commit = get_rev_bisect(work_item); + } else { + commit = get_rev_stride(work_item); + } + } + + return commit; +} + +static void pop_outstanding() +{ + struct commit_list *item = outstanding; + outstanding = item->next; + free(item); +} + + static int find_common(int fd[2], unsigned char *result_sha1, struct ref *refs) { int fetching; int count = 0, flushes = 0, retval; + struct commit *commit; const unsigned char *sha1; unsigned in_vain = 0; int got_continue = 0; @@ -192,7 +477,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, } if (!fetching) - packet_write(fd[1], "want %s%s%s%s%s%s%s%s\n", + packet_write(fd[1], "want %s%s%s%s%s%s%s%s%s\n", sha1_to_hex(remote), (multi_ack ? " multi_ack" : ""), (use_sideband == 2 ? " side-band-64k" : ""), @@ -200,6 +485,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, (args.use_thin_pack ? " thin-pack" : ""), (args.no_progress ? " no-progress" : ""), (args.include_tag ? " include-tag" : ""), + (exp_stride_algorithm ? " exp-stride" : ""), " ofs-delta"); else packet_write(fd[1], "want %s\n", sha1_to_hex(remote)); @@ -243,13 +529,25 @@ static int find_common(int fd[2], unsigned char *result_sha1, flushes = 0; retval = -1; - while ((sha1 = get_rev())) { + while ((commit = get_rev())) { + sha1 = commit->object.sha1; packet_write(fd[1], "have %s\n", sha1_to_hex(sha1)); if (args.verbose) fprintf(stderr, "have %s\n", sha1_to_hex(sha1)); in_vain++; + + if (outstanding) { + commit_list_insert(commit, &(outstanding_end->next)); + outstanding_end = outstanding_end->next; + } else { + commit_list_insert(commit, &outstanding); + outstanding_end = outstanding; + } + if (!(31 & ++count)) { int ack; + int unwound = 0; + struct commit_list *item; packet_flush(fd[1]); flushes++; @@ -274,12 +572,23 @@ static int find_common(int fd[2], unsigned char *result_sha1, } else if (ack == 2) { struct commit *commit = lookup_commit(result_sha1); + while (commit != outstanding->item) { + mark_not_common(outstanding->item); + pop_outstanding(); + unwound++; + } + pop_outstanding(); + unwound++; mark_common(commit, 0, 1); retval = 0; in_vain = 0; got_continue = 1; } } while (ack); + while (unwound++ < 32) { + mark_not_common(outstanding->item); + pop_outstanding(); + } flushes--; if (got_continue && MAX_IN_VAIN < in_vain) { if (args.verbose) @@ -445,9 +754,8 @@ static int everything_local(struct ref **refs, int nr_match, char **match) continue; if (!(o->flags & SEEN)) { - rev_list_push((struct commit *)o, COMMON_REF | SEEN); - - mark_common((struct commit *)o, 1, 1); + rev_list_push((struct commit *)o, 1, 0, 0); + o->flags |= COMMON_REF; } } @@ -597,6 +905,11 @@ static struct ref *do_fetch_pack(int fd[2], fprintf(stderr, "Server supports side-band\n"); use_sideband = 1; } + if (server_supports("exp-stride")) { + if (args.verbose) + fprintf(stderr, "Server supports exp-stride\n"); + exp_stride_algorithm = 1; + } if (everything_local(&ref, nr_match, match)) { packet_flush(fd[1]); goto all_done; diff --git a/upload-pack.c b/upload-pack.c index e5adbc0..14012ee 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -37,6 +37,7 @@ static unsigned int timeout; */ static int use_sideband; static int debug_fd; +static int exp_stride_algorithm; static void reset_timeout(void) { @@ -414,7 +415,7 @@ static int get_common_commits(void) if (!prefixcmp(line, "have ")) { switch (got_sha1(line+5, sha1)) { case -1: /* they have what we do not */ - if (multi_ack && ok_to_give_up()) + if (multi_ack && !exp_stride_algorithm && ok_to_give_up()) packet_write(1, "ACK %s continue\n", sha1_to_hex(sha1)); break; @@ -501,6 +502,8 @@ static void receive_needs(void) no_progress = 1; if (strstr(line+45, "include-tag")) use_include_tag = 1; + if (strstr(line+45, "exp-stride")) + exp_stride_algorithm = 1; /* We have sent all our refs already, and the other end * should have chosen out of them; otherwise they are @@ -573,7 +576,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo { static const char *capabilities = "multi_ack thin-pack side-band" " side-band-64k ofs-delta shallow no-progress" - " include-tag"; + " include-tag exp-stride"; struct object *o = parse_object(sha1); if (!o) -- tg: (759ad19..) t/fetch-pack-speedup (depends on: origin/master) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-10-23 19:38 ` Thomas Rast @ 2008-10-24 15:49 ` Nicolas Pitre 2008-10-27 10:29 ` Nanako Shiraishi 1 sibling, 0 replies; 13+ messages in thread From: Nicolas Pitre @ 2008-10-24 15:49 UTC (permalink / raw) To: Thomas Rast; +Cc: git On Thu, 23 Oct 2008, Thomas Rast wrote: > Replaces the existing simple history search with a more sophisticated > algorithm: > > 1) Walk history with exponentially increasing stride lengths; i.e., > send the 1st commit, then the 2nd after that, then the 4th after > that, and so on. > > 2) Bisect the resulting intervals. > > Combined with tracking the "outstanding haves" so that we can detect > which sha1s were never ACKed by upload-pack (and hence not common), > this gives O(log(n)) required "have" lines. > > Unfortunately this cannot work if the server sends fake ACKs, so we > introduce a capability 'exp-stride' which instructs upload-pack to > disable ok_to_give_up(). (Which incidentally saves the server a lot > of work.) > > Signed-off-by: Thomas Rast <trast@student.ethz.ch> > > --- > > This is a simple resend of v2, in the hope of attracting some > discussion or at least encouraging words this time around. OK, I gave this a quick try, and fetch operations appear to make their mind about what to do quicker. Some fetch requests which used to take up to 5 seconds are somewhat faster. I have no formal measurements though. Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-10-23 19:38 ` Thomas Rast 2008-10-24 15:49 ` Nicolas Pitre @ 2008-10-27 10:29 ` Nanako Shiraishi 2008-10-28 3:24 ` Junio C Hamano 1 sibling, 1 reply; 13+ messages in thread From: Nanako Shiraishi @ 2008-10-27 10:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: Thomas Rast, git Quoting Thomas Rast <trast@student.ethz.ch>: > Replaces the existing simple history search with a more sophisticated > algorithm: > > 1) Walk history with exponentially increasing stride lengths; i.e., > send the 1st commit, then the 2nd after that, then the 4th after > that, and so on. > > 2) Bisect the resulting intervals. Junio, may I ask what the status of this patch is? I see Nicolas responded and said "I gave this a quick try". Wasn't it a good enough review? -- Nanako Shiraishi http://ivory.ap.teacup.com/nanako3/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-10-27 10:29 ` Nanako Shiraishi @ 2008-10-28 3:24 ` Junio C Hamano 2008-10-28 14:46 ` Nicolas Pitre 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2008-10-28 3:24 UTC (permalink / raw) To: Nanako Shiraishi; +Cc: Thomas Rast, git Nanako Shiraishi <nanako3@lavabit.com> writes: > Quoting Thomas Rast <trast@student.ethz.ch>: > >> Replaces the existing simple history search with a more sophisticated >> algorithm: >> >> 1) Walk history with exponentially increasing stride lengths; i.e., >> send the 1st commit, then the 2nd after that, then the 4th after >> that, and so on. >> >> 2) Bisect the resulting intervals. > > Junio, may I ask what the status of this patch is? I see Nicolas responded and said "I gave this a quick try". Wasn't it a good enough review? I took the "quick try" more about "first feel in performance" and not "code review concentrating on correctness and trying to catch mistakes". I like what the patch tries to solve, and the approach it takes to solve it. I haven't had a chance to have a block of time for me to concentrate on this patch to assess where it could go wrong yet. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-10-28 3:24 ` Junio C Hamano @ 2008-10-28 14:46 ` Nicolas Pitre 2008-10-28 19:37 ` Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 0/2] " Thomas Rast 0 siblings, 2 replies; 13+ messages in thread From: Nicolas Pitre @ 2008-10-28 14:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nanako Shiraishi, Thomas Rast, git On Mon, 27 Oct 2008, Junio C Hamano wrote: > Nanako Shiraishi <nanako3@lavabit.com> writes: > > > Quoting Thomas Rast <trast@student.ethz.ch>: > > > >> Replaces the existing simple history search with a more sophisticated > >> algorithm: > >> > >> 1) Walk history with exponentially increasing stride lengths; i.e., > >> send the 1st commit, then the 2nd after that, then the 4th after > >> that, and so on. > >> > >> 2) Bisect the resulting intervals. > > > > Junio, may I ask what the status of this patch is? I see Nicolas responded and said "I gave this a quick try". Wasn't it a good enough review? > > I took the "quick try" more about "first feel in performance" and not > "code review concentrating on correctness and trying to catch mistakes". Exact. FWIW, I had to back this patch out from my version as things seemed to fall into an infinite loop of ref negotiation while fetching the Linux kernel repository at some point. Doing a "git fetch -v -v" turned up an endless stream of "got" and "have" lines. I was in a hurry for $work so didn't think of preserving my local refs for reproduction of the problem. Sorry for not being more helpful. This is some part of git that I know f-all about. Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2] fetch-pack: log(n)-transmission find_common() 2008-10-28 14:46 ` Nicolas Pitre @ 2008-10-28 19:37 ` Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 0/2] " Thomas Rast 1 sibling, 0 replies; 13+ messages in thread From: Thomas Rast @ 2008-10-28 19:37 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, Nanako Shiraishi, git [-- Attachment #1: Type: text/plain, Size: 678 bytes --] Nicolas Pitre wrote: > > FWIW, I had to back this patch out from my version as things seemed to > fall into an infinite loop of ref negotiation while fetching the Linux > kernel repository at some point. Doing a "git fetch -v -v" turned up an > endless stream of "got" and "have" lines. I was in a hurry for $work so > didn't think of preserving my local refs for reproduction of the > problem. Ah. Thanks for the report, though having a test case would be great. I've been running it without problems since I completed v2 more than a month ago, so I don't expect it to be an obvious mistake. - Thomas -- Thomas Rast trast@{inf,student}.ethz.ch [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v3 0/2] fetch-pack: log(n)-transmission find_common() 2008-10-28 14:46 ` Nicolas Pitre 2008-10-28 19:37 ` Thomas Rast @ 2008-12-06 23:20 ` Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 1/2] fetch-pack: rearrange main loop Thomas Rast 1 sibling, 1 reply; 13+ messages in thread From: Thomas Rast @ 2008-12-06 23:20 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Nicolas Pitre, Nanako Shiraishi If you still know what this was about, you can skip ahead to the "==" marker below. The mail that started it all was http://kerneltrap.org/mailarchive/git/2008/9/17/3328364 [Abstract:] Using two passes, first exponential stride through history, then bisection on the "right" windows, we implement the find_common handshake in log(history_size) many transmissions. The followup patch partially achieves that, but a few points are open and strict backwards compatibility is missing (but its absence is Mostly Harmless). This spawned out of a problem originally reported on IRC where pulling several disjoint histories together takes ages (due to the load on the server caused by ok_to_give_up()). Apparently this is a common workflow in the Ruby-on-Rails. The bound on the number of transmissions holds in other cases too, of course. The idea attracted some interest, but to my knowledge nobody actually reviewed the code, and it eventually died because of Nicholas's finding: Nicolas Pitre wrote: > FWIW, I had to back this patch out from my version as things seemed to > fall into an infinite loop of ref negotiation while fetching the Linux > kernel repository at some point. Doing a "git fetch -v -v" turned up an > endless stream of "got" and "have" lines. I was in a hurry for $work so > didn't think of preserving my local refs for reproduction of the > problem. Unfortunately, I was never able to reliably reproduce this problem. (IIRC I did once when running an automated hammering script but it vanished when I tried again manually.) == After a long break, I decided to pick this up again. I'm not sure this is the best time to do so, but I had the time to spare, and (unless there are more serious bugs, and Gods and Junio willing) we might get it into 'next' sometime early in the next cycle for better testing. I rewrote the core of the algorithm, though some helpers and most of the glue to surrounding routines are still the same. I took the following design decisions that weren't in v2: 0. Lots of comments. Well, by my standards in any case. 1. The old code stays. I mostly want to make it (far) easier to be certain that it behaves exactly as before if the server does not support disabling ok_to_give_up(). [The idea is that it should be easy to verify that 1/2 is a no-change refactoring and 2/2 only does something if the server supports it.] 2. The work list now uses a Fibonacci heap to order jobs. I'm not religious about the specific flavour of heap, but I had the description of these handy. (The work list can get rather large.) 3. The bisection is essentially precomputed. By staring at the binary representation of the distance from the starting commit hard enough, it can be constructed during the stride pass. Barring a circular bisection "tree", the algorithm is now guaranteed to terminate even if it, say, emits too many or too few sha1s due to some other bug. The last point of course means that the main work is now in the tangled mess that is get_rev_new_stride(), and "staring hard enough" frequently enough meant grabbing a notepad. YMMV. While I did run a lot of tests, including 'make test' and some automated hammering, it seems quite hard to make sure it _really_ bisects correctly (except manually in toy examples, such as two linear histories with a common base). Suggestions for automated tests welcome. To see the effects of the patch, try git fetch-pack -v -k <url> <head> 2>&1 \ | git name-rev --stdin refs/heads/master with a daemon that supports the feature. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v3 1/2] fetch-pack: rearrange main loop 2008-12-06 23:20 ` [RFC PATCH v3 0/2] " Thomas Rast @ 2008-12-06 23:20 ` Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 2/2] fetch-pack: log(n)-transmission find_common() Thomas Rast 0 siblings, 1 reply; 13+ messages in thread From: Thomas Rast @ 2008-12-06 23:20 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Nicolas Pitre, Nanako Shiraishi This patch does not change the results (nor any of the semantics except for the get_rev return type), but we need the changed layout for the exponential-stride feature. Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- builtin-fetch-pack.c | 108 +++++++++++++++++++++++++++++-------------------- 1 files changed, 64 insertions(+), 44 deletions(-) diff --git a/builtin-fetch-pack.c b/builtin-fetch-pack.c index 372bfa2..ae0a67a 100644 --- a/builtin-fetch-pack.c +++ b/builtin-fetch-pack.c @@ -111,7 +111,7 @@ static void mark_common(struct commit *commit, Get the next rev to send, ignoring the common. */ -static const unsigned char* get_rev(void) +static struct commit* get_rev(void) { struct commit *commit = NULL; @@ -153,15 +153,41 @@ static const unsigned char* get_rev(void) rev_list = rev_list->next; } - return commit->object.sha1; + return commit; } +/* + * Send 'have' for the next batch of revisions. Returns 0 if we ran + * out of commits to send, 1 otherwise. + */ + +static int send_have_lines(int fd[2], int *flushes, unsigned *in_vain) +{ + struct commit *commit; + int i; + + for (i = 0; i < 32; i++) { + commit = get_rev(); + if (!commit) + return 0; + packet_write(fd[1], "have %s\n", + sha1_to_hex(commit->object.sha1)); + if (args.verbose) + fprintf(stderr, "have %s\n", + sha1_to_hex(commit->object.sha1)); + } + packet_flush(fd[1]); + *flushes += 1; + *in_vain += 32; + return 1; +} + + static int find_common(int fd[2], unsigned char *result_sha1, struct ref *refs) { int fetching; int count = 0, flushes = 0, retval; - const unsigned char *sha1; unsigned in_vain = 0; int got_continue = 0; @@ -243,51 +269,45 @@ static int find_common(int fd[2], unsigned char *result_sha1, flushes = 0; retval = -1; - while ((sha1 = get_rev())) { - packet_write(fd[1], "have %s\n", sha1_to_hex(sha1)); - if (args.verbose) - fprintf(stderr, "have %s\n", sha1_to_hex(sha1)); - in_vain++; - if (!(31 & ++count)) { - int ack; - packet_flush(fd[1]); - flushes++; + /* + * We keep one window "ahead" of the other side, and + * will wait for an ACK only on the next one + */ + if (!send_have_lines(fd, &flushes, &in_vain)) + goto done; - /* - * We keep one window "ahead" of the other side, and - * will wait for an ACK only on the next one - */ - if (count == 32) - continue; + while (send_have_lines(fd, &flushes, &in_vain)) { + int ack; + int unwound = 0; - do { - ack = get_ack(fd[0], result_sha1); - if (args.verbose && ack) - fprintf(stderr, "got ack %d %s\n", ack, - sha1_to_hex(result_sha1)); - if (ack == 1) { - flushes = 0; - multi_ack = 0; - retval = 0; - goto done; - } else if (ack == 2) { - struct commit *commit = - lookup_commit(result_sha1); - mark_common(commit, 0, 1); - retval = 0; - in_vain = 0; - got_continue = 1; - } - } while (ack); - flushes--; - if (got_continue && MAX_IN_VAIN < in_vain) { - if (args.verbose) - fprintf(stderr, "giving up\n"); - break; /* give up */ + do { + ack = get_ack(fd[0], result_sha1); + if (args.verbose && ack) + fprintf(stderr, "got ack %d %s\n", ack, + sha1_to_hex(result_sha1)); + if (ack == 1) { + flushes = 0; + multi_ack = 0; + retval = 0; + goto done; + } else if (ack == 2) { + struct commit *commit = + lookup_commit(result_sha1); + mark_common(commit, 0, 1); + retval = 0; + in_vain = 0; + got_continue = 1; } + } while (ack); + flushes--; + if (got_continue && MAX_IN_VAIN < in_vain) { + if (args.verbose) + fprintf(stderr, "giving up\n"); + break; /* give up */ } } + done: packet_write(fd[1], "done\n"); if (args.verbose) -- tg: (7f705dc..) t/fp-refactor (depends on: origin/master) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 2/2] fetch-pack: log(n)-transmission find_common() 2008-12-06 23:20 ` [RFC PATCH v3 1/2] fetch-pack: rearrange main loop Thomas Rast @ 2008-12-06 23:20 ` Thomas Rast 0 siblings, 0 replies; 13+ messages in thread From: Thomas Rast @ 2008-12-06 23:20 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Nicolas Pitre, Nanako Shiraishi Replaces the existing simple history search with a more sophisticated algorithm: 1) Walk history with exponentially increasing stride lengths; i.e., send the 1st commit, then the 2nd after that, then the 4th after that, and so on. 2) Bisect the resulting intervals. Combined with tracking the "outstanding haves" so that we can detect which sha1s were never ACKed by upload-pack (and hence not common), this gives O(log(n)) required "have" lines. Unfortunately this cannot work if the server sends fake ACKs, so we introduce a capability 'no-giveup' which instructs upload-pack to disable ok_to_give_up(). (Which incidentally saves the server a lot of work.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- builtin-fetch-pack.c | 789 +++++++++++++++++++++++++++++++++++++++++++++++++- upload-pack.c | 7 +- 2 files changed, 787 insertions(+), 9 deletions(-) diff --git a/builtin-fetch-pack.c b/builtin-fetch-pack.c index ae0a67a..ea88e28 100644 --- a/builtin-fetch-pack.c +++ b/builtin-fetch-pack.c @@ -9,6 +9,7 @@ #include "fetch-pack.h" #include "remote.h" #include "run-command.h" +#include <string.h> static int transfer_unpack_limit = -1; static int fetch_unpack_limit = -1; @@ -25,6 +26,14 @@ static const char fetch_pack_usage[] = #define COMMON_REF (1U << 2) #define SEEN (1U << 3) #define POPPED (1U << 4) +#define BISECTED_FW (1U << 5) +#define BISECTED_BW (1U << 6) +#define NOTCOMMON (1U << 7) +#define HAS_INFO (1U << 8) +#define PARSED_WITH_PARENTS (1U << 9) +#define CLEAN_MARKS (COMMON | COMMON_REF | SEEN | POPPED | BISECTED_FW \ + | BISECTED_BW | NOTCOMMON | HAS_INFO \ + | PARSED_WITH_PARENTS) static int marked; @@ -34,8 +43,61 @@ static int marked; */ #define MAX_IN_VAIN 256 +struct work_heap +{ + struct work_heap *next; + struct work_heap *prev; + struct work_item *item; +}; + +struct work_item +{ + /* Fibonacci heap data */ + /* we don't need a parent pointer because we don't support changing keys */ + struct work_item *child; + struct work_item *sibling; + int rank; + int marked; + + /* actual content */ + struct commit *commit; + unsigned int stride; /* for stride mode */ + unsigned int steps; + unsigned int depth; + unsigned int bisect; /* for bisection mode */ + + /* + * Helper array to generate the bisection order. A size of 32 + * is ok as long as the history length is less than 2**32. + */ + struct commit **ladder; +}; + +/* + * "Recursion stack" of the exponential-stride algorithm + * + * This is a priority queue implemented as a Fibonacci heap since we + * expect to stuff a lot of commits into it. + */ +static struct work_heap *work_heap = NULL; +static struct work_heap *work_min = NULL; + +/* list of commits not ACKed yet */ +static struct commit_list *outstanding = NULL; +static struct commit_list *outstanding_end = NULL; + +struct commit_info +{ + struct commit_list *children; + struct commit_list *bisect_forward; + struct commit_list *bisect_backward; + struct commit_list *forward_bound; + struct commit_list *backward_bound; +}; + static struct commit_list *rev_list; static int non_common_revs, multi_ack, use_sideband; +static int exp_stride_algorithm; static void rev_list_push(struct commit *commit, int mark) { @@ -68,8 +130,7 @@ static int clear_marks(const char *path, const unsigned char *sha1, int flag, vo struct object *o = deref_tag(parse_object(sha1), path, 0); if (o && o->type == OBJ_COMMIT) - clear_commit_marks((struct commit *)o, - COMMON | COMMON_REF | SEEN | POPPED); + clear_commit_marks((struct commit *)o, CLEAN_MARKS); return 0; } @@ -107,6 +168,354 @@ static void mark_common(struct commit *commit, } } +static struct commit_info *get_commit_info(struct commit *commit) +{ + if (commit->object.flags & HAS_INFO) + return commit->util; + + struct commit_info *info = xmalloc(sizeof(struct commit_info)); + info->children = NULL; + info->bisect_forward = NULL; + info->bisect_backward = NULL; + info->forward_bound = NULL; + info->backward_bound = NULL; + commit->util = info; + commit->object.flags |= HAS_INFO; + return info; +} + +static void add_child(struct commit *parent, struct commit *child) +{ + struct commit_info *info = get_commit_info(parent); + struct commit_list *item; + for (item = info->children; item; item = item->next) + if (item->item == child) + break; + if (!item) + commit_list_insert(child, &(info->children)); +} + +/* implemented further below */ +static void mark_not_common(struct commit *commit); + +static void check_parsed_and_mark(struct commit *commit) +{ + struct commit_list *parent; + + if (!commit || commit->object.flags & PARSED_WITH_PARENTS) + return; + + parse_commit(commit); + + commit->object.flags |= PARSED_WITH_PARENTS; + + for (parent = commit->parents; parent; parent = parent->next) { + add_child(parent->item, commit); + if (parent->item->object.flags & NOTCOMMON) + mark_not_common(commit); + } +} + +static void work_item_free(struct work_item *item) +{ + if (!item->bisect) + free(item->ladder); + free(item); +} + +/* + * Ordering imposed on the "recursion" that generates the commits. + * The smallest item is processed first. To maintain the O(log n) + * transmissions bound we want to achieve, bisections must come after + * (compare >0) strides. + */ + +static int work_item_cmp(struct work_item *a, struct work_item *b) +{ + if (a->bisect || b->bisect) { + /* bisections last */ + if (a->bisect > b->bisect) + return 1; + else if (a->bisect < b->bisect) + return -1; + + /* older commits later */ + if (a->commit->date < b->commit->date) + return 1; + else if (a->commit->date == b->commit->date) + return 0; + else + return -1; + } else { + /* deeper sidelines of history later */ + if (a->depth > b->depth) + return 1; + else if (a->depth < b->depth) + return -1; + + /* older commits later */ + if (a->commit->date < b->commit->date) + return 1; + else if (a->commit->date == b->commit->date) + return 0; + else + return -1; + } +} + +/* + * Insert/remove items from the work_heap. _pop() always returns the + * smallest one. + */ + +static void work_heap_insert_internal(struct work_item *item) +{ + struct work_heap *entry = xmalloc(sizeof(struct work_heap)); + entry->item = item; + + if (!work_heap) { + entry->prev = entry; + entry->next = entry; + work_heap = entry; + work_min = work_heap; + return; + } + + entry->next = work_heap; + entry->prev = work_heap->prev; + work_heap->prev = entry; + entry->prev->next = entry; + if (!work_min || work_item_cmp(item, work_min->item) < 0) + work_min = entry; + work_heap = entry; +} + +static void work_heap_insert(struct work_item *item) +{ + item->marked = 0; + item->child = NULL; + item->sibling = NULL; + item->rank = 0; + + work_heap_insert_internal(item); +} + +static struct work_item *work_heap_pop(void) +{ + struct work_heap *entry = work_min; + struct work_item *x, *y, *t; + struct work_item *child, *item; + /* The rank of any child in a Fib heap of size n can be at + * most log_{3/2}(n), and we only support 2**32 items + * anyway */ + struct work_item *ranks[55] = {0,}; + int i; + + if (!entry) + return NULL; + + if (entry == entry->prev) { + work_heap = NULL; + work_min = NULL; + item = entry->item; + free(entry); + for (child = item->child; child; child = child->sibling) { + child->marked = 0; + work_heap_insert_internal(child); + } + return item; + } + + work_heap = entry->next; + entry->next->prev = entry->prev; + entry->prev->next = entry->next; + item = entry->item; + free(entry); + work_min = NULL; + + for (child = item->child; child; child = child->sibling) { + child->marked = 0; + work_heap_insert_internal(child); + } + + while (work_heap) { + entry = work_heap; + if (entry == entry->prev) { + work_heap = NULL; + } else { + work_heap = entry->next; + entry->next->prev = entry->prev; + entry->prev->next = entry->next; + } + x = entry->item; + free(entry); + while ((y = ranks[x->rank])) { + ranks[x->rank] = NULL; + if (work_item_cmp(x, y) > 0) { + t = x; + x = y; + y = t; + } + y->sibling = x->child; + x->child = y; + x->rank++; + } + ranks[x->rank] = x; + } + + for (i = 0; i < 55; i++) + if (ranks[i]) + work_heap_insert_internal(ranks[i]); + + return item; +} + +static void work_insert_stride(struct commit *commit, int stride, + int steps, int depth, struct commit **ladder) +{ + struct work_item *item = xmalloc(sizeof(struct work_item)); + check_parsed_and_mark(commit); + item->commit = commit; + item->stride = stride; + item->steps = steps; + item->depth = depth; + item->bisect = 0; + item->ladder = xmalloc(32*sizeof(struct commit *)); + if (ladder) + memcpy(item->ladder, ladder, 32*sizeof(struct commit *)); + work_heap_insert(item); +} + +static void work_insert_bisect(struct commit *commit, int bisect) +{ + struct work_item *item = xmalloc(sizeof(struct work_item)); + item->commit = commit; + item->stride = 0; + item->steps = 0; + item->depth = 0; + item->bisect = bisect; + item->ladder = NULL; + work_heap_insert(item); +} + +static void work_insert_strides(struct commit_list *commits, + int stride, int steps, int depth, int depth2, + struct commit **ladder, + struct work_item *reusable_item) +{ + struct work_item *item = reusable_item; + + while (commits) { + if (!(commits->item->object.flags &= SEEN)) { + if (!item) + item = xmalloc(sizeof(struct work_item)); + + item->commit = commits->item; + item->stride = stride; + item->steps = steps; + item->depth = depth; + item->bisect = 0; + item->ladder = xmalloc(32*sizeof(struct commit *)); + if (ladder) + memcpy(item->ladder, ladder, 32*sizeof(struct commit *)); + work_heap_insert(item); + } + commits = commits->next; + item = NULL; + depth = depth2; + } + + if (item) + work_item_free(item); +} + +static void work_insert_bisects(struct commit_list *commits, + int bisect, + struct work_item *reusable_item) +{ + struct work_item *item = reusable_item; + + while (commits) { + if (!item) + item = xmalloc(sizeof(struct work_item)); + + item->commit = commits->item; + item->stride = 0; + item->steps = 0; + item->depth = 0; + item->bisect = bisect; + work_heap_insert(item); + + commits = commits->next; + item = NULL; + } + + if (item) + work_item_free(item); +} + +static int work_heap_insert_ref(const char *path, const unsigned char *sha1, + int flag, void *cb_data) +{ + struct object *o = deref_tag(parse_object(sha1), path, 0); + + if (o && o->type == OBJ_COMMIT && !(o->flags & SEEN)) + work_insert_stride((struct commit *)o, 0, 0, 0, NULL); + + return 0; +} + + +/* + * Marks all (known) children of this commit NOTCOMMON. We call this + * for all sha1s the server did not ACK. + */ + +static void mark_not_common(struct commit *commit) +{ + struct commit_info *info = get_commit_info(commit); + struct commit_list *child; + + assert(!(commit->object.flags & COMMON)); + + if (commit->object.flags & NOTCOMMON) + return; + + commit->object.flags |= NOTCOMMON; + + for (child = info->children; child; child = child->next) + mark_not_common(child->item); +} + +/* + * Similar for the ACKed ones. We have a _new() version because the + * old one interacts with the work queue. + */ + +static void mark_common_new(struct commit *commit, int ancestors_only, int dont_parse) +{ + struct commit_list *parents; + struct object *o; + + if (commit == NULL || commit->object.flags & (COMMON_REF|COMMON)) + return; + + if (!dont_parse && !(commit->object.parsed)) + check_parsed_and_mark(commit); + + o = (struct object *)commit; + + if (!ancestors_only) + o->flags |= COMMON; + + for (parents = commit->parents; parents; parents = parents->next) { + add_child(parents->item, commit); + mark_common_new(parents->item, 0, dont_parse); + } +} + + + /* Get the next rev to send, ignoring the common. */ @@ -156,6 +565,330 @@ static struct commit* get_rev(void) return commit; } + +int flag_in_list(int flag, struct commit_list *list) +{ + while (list) { + if (list->item->object.flags & flag) + return 1; + list = list->next; + } + return 0; +} + + +/* + * Find the second bit set in x, similar to ffs() from string.h. + * Assumes that x>0. + */ + +int fss(int x) +{ + return ffs(x - (1<<(ffs(x)-1))); +} + +/* + * The exponential-stride algorithm, part 1 and 1.5. + * + * We take exponential strides through history, that is we emit the + * 1st commit, the 2nd, the 4th, etc. But we also simultaneously + * build pointers that help us bisect later: Every commit has a + * bisect_forward and bisect_backward list (if history is linear, both + * have at most 1 element) that tells us where to continue bisection + * after this. + * + * To achieve this with good running time, we think of the commits in + * a line of history as being numbered in binary: + * + * o 1000 + * o 111 + * o 110 <- current [1] + * o 101 [0] + * t | o 100 [2] + * i | o 11 + * m | o 10 + * e v o 1 + * + * Every commit of the form 10...0 is emitted, since it is an + * exponential step. Between that, we want to have + * (100).bisect_backward=[110], (110).bisect_forward=[101], + * (110).bisect_backward=[111], etc. Define the lowest bit set (as + * per ffs()-1) in a commit as its "order", then we can achieve this + * by keeping track of the last commit of every order (marked with [n] + * in the diagram) as we go through history. Then [n] must be on + * [n+1].bisect_backward and [n-1] must be on [n].bisect_forward. + * + * Similarly, we track the boundaries forward_bound and + * backward_bound, which indicate the furthest commit(s) that can be + * reached by bisecting this one. We use them to detect when it + * becomes pointless to pursue this branch of the bisection because + * the far end is already COMMON (NOTCOMMON) in forward (backward) + * mode. In the scheme chosen, a current commit [n] numbered N is a + * backward boundary of [i] for all i=0..n, and [fss(N)-1] is a + * forward boundary of the current [n]. + * + * The tough problem is if we stop (because we hit a root, or a part + * of history that we've already seen) at a non-exp-stride commit. + * Then we must ensure that all history up to the current commit can + * be reached by bisecting. Consider + * + * o 1000 + * o 111 + * o 110 + * o 101 <- current [0] + * t | o 100 [2] + * i | o 11 + * m | o 10 [1] + * e v o 1 + * + * then stopping at the commit 101 without any corrective measures + * would leave it unreachable because the intermediate step 110 has + * not been seen yet. + * + * Observe that any given commit x10z, where x is arbitrary and z only + * zeros, has outgoing pointers to x01z (forward) and x11z (backward). + * Conversely this means x01z needs an incoming forward pointer from + * x10z, even if we haven't seen it yet. So there are two cases: + * + * SEEN CASE: We stopped because we hit a SEEN commit. Assume that + * the last commit not SEEN is numbered x1y1z, where x is arbitrary + * and y and z are only zeros (z possibly empty). Then unless y is + * empty, x1y'01z needs an incoming forward pointer from x1y'10z + * (where y'0=y), which we haven't seen yet. Repeat until y is empty. + * Note that during this repetition, we might hit a root commit. + * + * ROOT CASE: We cannot fix this with forward pointers since there are + * no commits left to put them in. Assume that the root commit is + * numbered x1y1z as in the SEEN case. Then we fix by pointing a + * backwards pointer from x1y'00z to x1y'01z, then iterate with + * x1y'00z. Note that x1y'00z is the last seen commit of order + * fss(x1y1z). + * + * Note that we only need to bisect (first level) exp-stride commits + * in one direction. We choose backward. This choice needs to be + * consistent among get_rev_new_stride() and get_rev_new(), and it + * happens to match the faster code path of get_rev_new_bisect(). + */ + +static struct commit *get_rev_new_stride(struct work_item *item) +{ + struct commit_info *info; + struct commit *commit = item->commit; + int steps = item->steps; + int stride = item->stride; + int depth = item->depth; + int order = -1; + struct commit **ladder = item->ladder; + int i; + /* The next two are for the pointer-fixing iterations */ + int order2; + struct commit *commit2; + + while (!(commit->object.flags & SEEN)) { + steps++; + order = ffs(steps)-1; + + commit->object.flags |= SEEN; + + if (order > 0 && stride == order) { + info = get_commit_info(ladder[order-1]); + commit_list_insert(commit, &info->backward_bound); + for (i = order-2; i>=0; i--) { + info = get_commit_info(ladder[i]); + commit_list_insert(commit, &info->backward_bound); + } + } + + ladder[order] = commit; + + if (stride == order) { + /* this is precisely a 2**k'th commit, emit it */ + work_insert_strides(commit->parents, stride+1, steps, + depth, depth+1, ladder, item); + return commit; + } + + /* + * We are the next step in the higher-order commit's + * backward (in time) bisection + */ + info = get_commit_info(ladder[order+1]); + commit_list_insert(commit, &info->bisect_backward); + info = get_commit_info(commit); + commit_list_insert(ladder[fss(steps)-1], &info->forward_bound); + + if (order > 0) { + /* + * The lower-order commit is our next step for + * forward (in time) bisection + */ + info = get_commit_info(commit); + commit_list_insert(ladder[order-1], &info->bisect_forward); + for (i = order-1; i >= 0; i--) { + info = get_commit_info(ladder[i]); + commit_list_insert(commit, &info->backward_bound); + } + } + + if (commit->parents) { + work_insert_strides(commit->parents->next, stride, + steps, depth+1, depth+1, ladder, + NULL); + commit = commit->parents->item; + check_parsed_and_mark(commit); + } else { + /* ROOT CASE: fix missing pointers */ + order2 = fss(steps)-1; + commit2 = commit; + while (order != order2-1 && order2 > 0) { + info = get_commit_info(ladder[order2]); + commit_list_insert(commit2, &info->bisect_backward); + info = get_commit_info(commit2); + commit_list_insert(ladder[order2], &info->forward_bound); + commit2 = ladder[order2]; + steps -= 1 << order; + order = order2; + order2 = fss(steps)-1; + } + work_item_free(item); + return commit; + } + } + + if (order == -1) { + /* this just means we hit a SEEN commit immediately */ + work_item_free(item); + return NULL; + } + + /* SEEN CASE: fix missing pointers w.r.t. the _last_ commit + * before this */ + + /* unless/until we hit a root, 'commit' tracks the steps + * backward through the SEEN commits that we stick the + * pointers in. 'order2' only serves to figure out if the 'y' + * (see comment at function head) of 'commit' is empty. */ + order2 = fss(steps)-1; + assert(order2 > order || order2 == 0); + while (order != order2-1 && order2 > 0) { + int i; + /* skip ahead to the next commit2 of order 'order+1' */ + for (i = (1<<order); i > 0; i--) { + steps++; + assert (steps < (1<<stride)); + if (commit->parents) { + commit = commit->parents->item; + check_parsed_and_mark(commit); + continue; + } + /* we hit a root commit! use the root strategy + * for the rest */ + order2 = fss(steps)-1; + commit2 = commit; + while (order != order2-1 && order2 > 0) { + info = get_commit_info(ladder[order2]); + commit_list_insert(commit2, &info->bisect_backward); + info = get_commit_info(commit2); + commit_list_insert(ladder[order2], &info->forward_bound); + commit2 = ladder[order2]; + steps -= 1 << order; + order = order2; + order2 = fss(steps)-1; + } + /* short-cut out of the big loop */ + work_item_free(item); + return NULL; + } + info = get_commit_info(commit); + commit_list_insert(ladder[order], &info->bisect_forward); + for (i = order; i >= 0; i--) { + info = get_commit_info(ladder[i]); + commit_list_insert(commit, &info->backward_bound); + } + order++; + /* this has not changed order2. we still keep track + * of 'steps' carefully in case we hit a root + * commit. */ + } + + work_item_free(item); + return NULL; +} + +/* + * The exponential-stride algorithm, part 2. + * + * Bisecting is now easy: just follow the forward/backward pointers, + * stopping when we already know enough to not search any further. + */ + +static struct commit *get_rev_new_bisect(struct work_item *item) +{ + struct commit *commit = item->commit; + struct commit_info *info = get_commit_info(commit); + int flags = commit->object.flags; + + if (!(flags & (BISECTED_FW|NOTCOMMON)) + && !flag_in_list(COMMON, info->forward_bound)) { + commit->object.flags |= BISECTED_FW; + work_insert_bisects(info->bisect_forward, item->bisect+1, NULL); + if (!(flags & (POPPED|COMMON))) { + /* re-queue for backward bisection */ + work_heap_insert(item); + item = NULL; + return commit; + } + } + + if (!(flags & (BISECTED_BW|COMMON)) + && !flag_in_list(NOTCOMMON, info->backward_bound)) { + commit->object.flags |= BISECTED_BW; + work_insert_bisects(info->bisect_backward, item->bisect+1, item); + item = NULL; + if (!(flags & (POPPED|NOTCOMMON))) + return commit; + } + + if (item) + free(item); + return NULL; +} + + +static struct commit* get_rev_new(void) +{ + struct commit *commit = NULL; + struct work_item *item; + + while (commit == NULL) { + item = work_heap_pop(); + if (!item) + return NULL; + check_parsed_and_mark(item->commit); + if (item->bisect) { + commit = get_rev_new_bisect(item); + } else { + commit = get_rev_new_stride(item); + if (commit) { + /* there's no point in bisecting the + * first level both ways */ + commit->object.flags |= BISECTED_FW; + work_insert_bisect(commit, 1); + } + } + } + + commit->object.flags |= POPPED; + return commit; +} + +static void pop_outstanding(void) +{ + struct commit_list *item = outstanding; + outstanding = item->next; + free(item); +} + /* * Send 'have' for the next batch of revisions. Returns 0 if we ran * out of commits to send, 1 otherwise. @@ -167,7 +900,10 @@ static int send_have_lines(int fd[2], int *flushes, unsigned *in_vain) int i; for (i = 0; i < 32; i++) { - commit = get_rev(); + if (exp_stride_algorithm) + commit = get_rev_new(); + else + commit = get_rev(); if (!commit) return 0; packet_write(fd[1], "have %s\n", @@ -175,6 +911,14 @@ static int send_have_lines(int fd[2], int *flushes, unsigned *in_vain) if (args.verbose) fprintf(stderr, "have %s\n", sha1_to_hex(commit->object.sha1)); + + if (outstanding) { + commit_list_insert(commit, &(outstanding_end->next)); + outstanding_end = outstanding_end->next; + } else { + commit_list_insert(commit, &outstanding); + outstanding_end = outstanding; + } } packet_flush(fd[1]); *flushes += 1; @@ -195,7 +939,10 @@ static int find_common(int fd[2], unsigned char *result_sha1, for_each_ref(clear_marks, NULL); marked = 1; - for_each_ref(rev_list_insert_ref, NULL); + if (exp_stride_algorithm) + for_each_ref(work_heap_insert_ref, NULL); + else + for_each_ref(rev_list_insert_ref, NULL); fetching = 0; for ( ; refs ; refs = refs->next) { @@ -218,7 +965,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, } if (!fetching) - packet_write(fd[1], "want %s%s%s%s%s%s%s%s\n", + packet_write(fd[1], "want %s%s%s%s%s%s%s%s%s\n", sha1_to_hex(remote), (multi_ack ? " multi_ack" : ""), (use_sideband == 2 ? " side-band-64k" : ""), @@ -226,6 +973,7 @@ static int find_common(int fd[2], unsigned char *result_sha1, (args.use_thin_pack ? " thin-pack" : ""), (args.no_progress ? " no-progress" : ""), (args.include_tag ? " include-tag" : ""), + (exp_stride_algorithm ? " no-giveup" : ""), " ofs-delta"); else packet_write(fd[1], "want %s\n", sha1_to_hex(remote)); @@ -294,12 +1042,31 @@ static int find_common(int fd[2], unsigned char *result_sha1, } else if (ack == 2) { struct commit *commit = lookup_commit(result_sha1); - mark_common(commit, 0, 1); + if (exp_stride_algorithm) { + while (commit != outstanding->item) { + if (args.verbose) + fprintf(stderr, "unwinding: %s\n", sha1_to_hex(outstanding->item->object.sha1)); + mark_not_common(outstanding->item); + pop_outstanding(); + unwound++; + } + pop_outstanding(); + unwound++; + mark_common_new(commit, 0, 1); + } else { + mark_common(commit, 0, 1); + } retval = 0; in_vain = 0; got_continue = 1; } } while (ack); + while (exp_stride_algorithm && unwound++ < 32) { + if (args.verbose) + fprintf(stderr, "unwinding: %s\n", sha1_to_hex(outstanding->item->object.sha1)); + mark_not_common(outstanding->item); + pop_outstanding(); + } flushes--; if (got_continue && MAX_IN_VAIN < in_vain) { if (args.verbose) @@ -467,7 +1234,10 @@ static int everything_local(struct ref **refs, int nr_match, char **match) if (!(o->flags & SEEN)) { rev_list_push((struct commit *)o, COMMON_REF | SEEN); - mark_common((struct commit *)o, 1, 1); + if (exp_stride_algorithm) + mark_common_new((struct commit *)o, 1, 1); + else + mark_common((struct commit *)o, 1, 1); } } @@ -617,6 +1387,11 @@ static struct ref *do_fetch_pack(int fd[2], fprintf(stderr, "Server supports side-band\n"); use_sideband = 1; } + if (server_supports("no-giveup")) { + if (args.verbose) + fprintf(stderr, "Server supports no-giveup\n"); + exp_stride_algorithm = 1; + } if (everything_local(&ref, nr_match, match)) { packet_flush(fd[1]); goto all_done; diff --git a/upload-pack.c b/upload-pack.c index e5adbc0..8f05dbf 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -37,6 +37,7 @@ static unsigned int timeout; */ static int use_sideband; static int debug_fd; +static int disable_give_up; static void reset_timeout(void) { @@ -414,7 +415,7 @@ static int get_common_commits(void) if (!prefixcmp(line, "have ")) { switch (got_sha1(line+5, sha1)) { case -1: /* they have what we do not */ - if (multi_ack && ok_to_give_up()) + if (multi_ack && !disable_give_up && ok_to_give_up()) packet_write(1, "ACK %s continue\n", sha1_to_hex(sha1)); break; @@ -501,6 +502,8 @@ static void receive_needs(void) no_progress = 1; if (strstr(line+45, "include-tag")) use_include_tag = 1; + if (strstr(line+45, "no-giveup")) + disable_give_up = 1; /* We have sent all our refs already, and the other end * should have chosen out of them; otherwise they are @@ -573,7 +576,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo { static const char *capabilities = "multi_ack thin-pack side-band" " side-band-64k ofs-delta shallow no-progress" - " include-tag"; + " include-tag no-giveup"; struct object *o = parse_object(sha1); if (!o) -- tg: (73ef856..) t/fp-speedup (depends on: origin/master t/fp-refactor) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC] log(n)-transmissions common commit handshake 2008-09-17 23:00 [RFC] log(n)-transmissions common commit handshake Thomas Rast 2008-09-17 23:01 ` [RFC PATCH] fetch-pack: log(n)-transmission find_common() Thomas Rast @ 2008-09-18 8:18 ` Thomas Rast 1 sibling, 0 replies; 13+ messages in thread From: Thomas Rast @ 2008-09-18 8:18 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1446 bytes --] I wrote: > * Impact/clever use of refs > > For some reason, current git sends all refs to the server, even if > the server should already know about lots of them. For example, in > git.git, emitting v1.6.0.2 covers almost all tags in the repository > by simple ancestor scanning. > > Is there a reason for this behaviour? Otherwise it would be better > to emit them in date order and intelligently handle commons. (In > fact this does not depend on the discussed change.) As an addendum, I think the following would be a good way to cleverly use refs to reduce work: Cache a "reduced" DAG which just maps the ref'd commit relationships, i.e., shows the reachability of refs only. This needs to be written out to disk between invocations. At the start of the protocol, the server announces all its refs. We can use the reduced DAG to infer the minimal set of ref heads we need to announce to have the server know all common ones. We can also mark all the other refs as "common but not announced yet", so that the backwards marking and searching routines know to stop there. This should reduce the number of refs listed back to the server to only a handful, and at the same time, stop the client from searching backwards through _all_ history (which can take a bit of time, and is one of the weaknesses of my proposal) in most cases. - Thomas -- Thomas Rast trast@student.ethz.ch [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-12-06 23:22 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-09-17 23:00 [RFC] log(n)-transmissions common commit handshake Thomas Rast 2008-09-17 23:01 ` [RFC PATCH] fetch-pack: log(n)-transmission find_common() Thomas Rast 2008-09-24 0:48 ` [RFC PATCH v2] " Thomas Rast 2008-10-23 19:38 ` Thomas Rast 2008-10-24 15:49 ` Nicolas Pitre 2008-10-27 10:29 ` Nanako Shiraishi 2008-10-28 3:24 ` Junio C Hamano 2008-10-28 14:46 ` Nicolas Pitre 2008-10-28 19:37 ` Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 0/2] " Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 1/2] fetch-pack: rearrange main loop Thomas Rast 2008-12-06 23:20 ` [RFC PATCH v3 2/2] fetch-pack: log(n)-transmission find_common() Thomas Rast 2008-09-18 8:18 ` [RFC] log(n)-transmissions common commit handshake Thomas Rast
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).