* [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
@ 2013-07-11 22:01 Matthijs Kooijman
  2013-07-11 22:53 ` Junio C Hamano
  2013-08-08  4:50 ` Duy Nguyen
  0 siblings, 2 replies; 30+ messages in thread
From: Matthijs Kooijman @ 2013-07-11 22:01 UTC (permalink / raw)
  To: git

Hi folks,

while playing with shallow fetches, I've found that in some
circumstances running git fetch with --depth can return too many objects
(in particular, _all_ the objects for the requested revisions are
returned, even when some of those objects are already known to the
client).

This happens when a client issues a fetch with a depth bigger or equal
to the number of commits the server is ahead of the client. In this
case, the revisions to be sent over will be completely detached from any
revisions the client already has (history-wise), causing the server to
effectively ignore all objects the client has (as advertised using its
have lines) and just send over _all_ objects (needed for the revisions
it is sending over).

I've traced this down to the way do_rev_list in upload-pack.c works. If
I've pored over the code enough to understand it, this is what happens:
 - The new shallow roots are made into graft points without parents.
 - The "want" commits are added to the pending list (revs->pending)
 - The "have" commits are marked uninteresting and added to the pending list
 - prepare_revision_walk is called, which adds everything from the
   pending list into the commit list (revs->commits)
 - limit_list is called, which traverses the history of each interesting
   commit in the commit list (i.e., all want revisions), up to but excluding
   the first uninteresting commit (i.e. a have revision). The result of
   this is the new commit list.

   This means the commit list now contains all commits that the client
   wants, up to (excluding) any commits he already has or up to
   (including) any (new) shallow roots.
 - mark_edges_uninteresting is called, which marks the tree of every
   parent of each edge in the commit list as uninteresting (in practice,
   this marks the tree of each uninteresting parent, since those are by
   definition the only kinds of revisions that can be beyond the edge).
 - All trees and blobs that are referenced by trees in the commit list
   but are not marked as uninteresting, are passed to git-pack-objects
   to put into the pack.
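
The steps above can be sketched as a toy model. This is only an
illustration, not git's code: the function name objects_to_send, the
dictionary layout and the object names are all made up, commit objects
themselves are left out, and each commit simply carries the set of tree
and blob names reachable from its tree.

```python
def objects_to_send(commits, wants, haves, shallow_roots=()):
    """Toy model of the walk: returns (commit list, trees/blobs to pack).

    commits maps a commit name to (parents, objects), where objects is the
    set of tree and blob names reachable from that commit's tree.
    """
    def parents(name):
        # Graft: a shallow root is treated as having no parents.
        return [] if name in shallow_roots else commits[name][0]

    # "have" commits and their ancestors are uninteresting.
    uninteresting = set()
    stack = list(haves)
    while stack:
        name = stack.pop()
        if name not in uninteresting:
            uninteresting.add(name)
            stack.extend(parents(name))

    # Walk from the wants, stopping at uninteresting commits.
    to_send, seen = [], set()
    stack = list(wants)
    while stack:
        name = stack.pop()
        if name in seen or name in uninteresting:
            continue
        seen.add(name)
        to_send.append(name)
        stack.extend(parents(name))

    # Edge marking: only trees of uninteresting parents are excluded.
    known = set()
    for name in to_send:
        for parent in parents(name):
            if parent in uninteresting:
                known |= commits[parent][1]

    needed = set()
    for name in to_send:
        needed |= commits[name][1]
    return to_send, needed - known


# Hypothetical history A <- B <- C; the client has A, the server is at C.
history = {
    "A": ([], {"tree-A", "blob-1", "blob-2"}),
    "B": (["A"], {"tree-B", "blob-1", "blob-3"}),
    "C": (["B"], {"tree-C", "blob-1", "blob-3", "blob-4"}),
}
normal = objects_to_send(history, ["C"], ["A"])
shallow = objects_to_send(history, ["C"], ["A"], shallow_roots={"C"})
```

In the second call blob-1 ends up in the result even though the client
already has it through A, which is the behaviour described in this mail.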

Normally, the list of commits to send over is connected to the
client's existing commits (which are marked as uninteresting). This
means that only the trees of those uninteresting ("have") commits that
are actually (direct) predecessors of the commits to send over are
marked as uninteresting. This is probably useful, since it prevents
having to go over all trees the client has (for other branches, for
example) and instead limits to the trees that are the most likely to
contain duplicate (or similar, for delta-ing) objects.

However, in the "detached shallow fetch" case, this assumption is no
longer valid. There will be no uninteresting commits as parents for
the commit list, since all edge commits will be shallow roots (hence
have no parents).  Ideally, one would find out which of the "detached"
"have" revisions are the closest to the new shallow roots, but with the
current code these shallow roots have their parents cut off long before
this code even runs, so this is probably not feasible.

Instead, what we can do in this case, is simply mark the trees of all
"have" commits as uninteresting. This prevents all objects that are
contained in the "have" commits themselves from being sent to the
client, which can be a big win for bigger repositories. Marking them all
is probably more work than strictly needed, but is easy to implement.
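
As a rough sketch of that idea (a toy model with made-up names, not the
patch itself): take the object set the walk would otherwise send and
subtract everything reachable from the trees of all "have" commits.

```python
def drop_objects_in_have_trees(candidates, have_trees):
    """Remove every object contained in the tree of any "have" commit.

    candidates: set of object names the walk selected for sending.
    have_trees: iterable of sets, one per "have" commit, each holding the
                object names reachable from that commit's tree.
    """
    known = set()
    for tree_objects in have_trees:
        known |= tree_objects
    return candidates - known


# Hypothetical detached shallow fetch: the client has commit A only.
selected = {"tree-C", "blob-1", "blob-3", "blob-4"}
client_trees = [{"tree-A", "blob-1", "blob-2"}]
remaining = drop_objects_in_have_trees(selected, client_trees)
```

Here blob-1 is dropped because it already appears in the tree of A.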

I have created a mockup patch which does this, and also adds a test case
demonstrating the problem. Right now, the above fix is applied always,
even in cases where it isn't needed.

Looking at the code, I think it would be good to let
mark_edges_uninteresting look for shallow roots in the commit list (or
perhaps just add another loop over the commit list inside do_rev_list)
and only apply the fix if any shallow roots are in the commit list
(meaning at least a part of the history to send over is detached from
the client's current history). I haven't implemented this yet, wanting to
get some feedback first.

Also, I'm not quite sure how this fits in with the concept of "thin
packs". There might be some opportunities missing here as well, though
git-pack-objects is called without --thin when shallow roots are
involved. I think this is related to the "-" prefixed commit sha's that
are sent to git-pack-objects, but I couldn't find any documentation on
what the - prefix is supposed to mean.

(On a somewhat related note, show_commit in upload-pack.c checks the
BOUNDARY flag, but AFAICS the revs->boundary flag is never set, so
BOUNDARY cannot ever be set in this case either?)

How does this patch look?

Gr.

Matthijs

---
 t/t5500-fetch-pack.sh | 11 +++++++++++
 upload-pack.c         |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index fd2598e..a022d65 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -393,6 +393,17 @@ test_expect_success 'fetch in shallow repo unreachable shallow objects' '
 		git fsck --no-dangling
 	)
 '
+test_expect_success 'fetch creating new shallow root' '
+	(
+		git clone "file://$(pwd)/." shallow10 &&
+		git commit --allow-empty -m empty &&
+		cd shallow10 &&
+		git fetch --depth=1 --progress 2> actual &&
+		# This should fetch only the empty commit, no tree or
+		# blob objects
+		grep "remote: Total 1" actual
+	)
+'
 
 test_expect_success 'setup tests for the --stdin parameter' '
 	for head in C D E F
diff --git a/upload-pack.c b/upload-pack.c
index 59f43d1..5885f33 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void *user_data)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	mark_edges_uninteresting(revs.commits, &revs, show_edge);
+	/* In case we create a new shallow root, make sure that all
+	 * we don't send over objects that the client already has just
+	 * because their "have" revisions are no longer reachable from
+	 * the shallow root. */
+	for (i = 0; i < have_obj.nr; i++) {
+		struct commit *commit = (struct commit *)have_obj.objects[i].item;
+		mark_tree_uninteresting(commit->tree);
+	}
 	if (use_thin_pack)
 		for (i = 0; i < extra_edge_obj.nr; i++)
 			fprintf(pack_pipe, "-%s\n", sha1_to_hex(
-- 
1.8.3.2.736.ge92bb95.dirty


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-07-11 22:01 [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
@ 2013-07-11 22:53 ` Junio C Hamano
  2013-07-12  7:11   ` Matthijs Kooijman
  2013-08-08  4:50 ` Duy Nguyen
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2013-07-11 22:53 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: git

Matthijs Kooijman <matthijs@stdin.nl> writes:

[administrivia: you seem to have mail-followup-to that points at you
and the list; is that really needed???]

> This happens when a client issues a fetch with a depth bigger or equal
> to the number of commits the server is ahead of the client.

Do you mean "smaller" (not "bigger")?

> diff --git a/upload-pack.c b/upload-pack.c
> index 59f43d1..5885f33 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void *user_data)
>  	if (prepare_revision_walk(&revs))
>  		die("revision walk setup failed");
>  	mark_edges_uninteresting(revs.commits, &revs, show_edge);
> +	/* In case we create a new shallow root, make sure that all
> +	 * we don't send over objects that the client already has just
> +	 * because their "have" revisions are no longer reachable from
> +	 * the shallow root. */
> +	for (i = 0; i < have_obj.nr; i++) {
> +		struct commit *commit = (struct commit *)have_obj.objects[i].item;
> +		mark_tree_uninteresting(commit->tree);
> +	}

Hmph.

In your discussion (including the comment), you talk about "shallow
root" (I think that is the same as what we call "shallow boundary"),
but in this added block, there is nothing that checks CLIENT_SHALLOW
or SHALLOW flags to special case that.

Is it a good idea to unconditionally do this for all "have"
revisions?

Also there is another loop that iterates over "have" revisions just
above the precontext.  I wonder if this added code belongs in that
loop.


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-07-11 22:53 ` Junio C Hamano
@ 2013-07-12  7:11   ` Matthijs Kooijman
  2013-08-07 10:27     ` Matthijs Kooijman
  0 siblings, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-07-12  7:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

> [administrivia: you seem to have mail-followup-to that points at you
> and the list; is that really needed???]
I'm not subscribed to the list, so yes :-)

> > This happens when a client issues a fetch with a depth bigger or equal
> > to the number of commits the server is ahead of the client.
> 
> Do you mean "smaller" (not "bigger")?
Yes, I meant smaller (reworded this first sentence a few times and then messed
up :-)

> > diff --git a/upload-pack.c b/upload-pack.c
> > index 59f43d1..5885f33 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void *user_data)
> >  	if (prepare_revision_walk(&revs))
> >  		die("revision walk setup failed");
> >  	mark_edges_uninteresting(revs.commits, &revs, show_edge);
> > +	/* In case we create a new shallow root, make sure that all
> > +	 * we don't send over objects that the client already has just
> > +	 * because their "have" revisions are no longer reachable from
> > +	 * the shallow root. */
> > +	for (i = 0; i < have_obj.nr; i++) {
> > +		struct commit *commit = (struct commit *)have_obj.objects[i].item;
> > +		mark_tree_uninteresting(commit->tree);
> > +	}
> 
> Hmph.
> 
> In your discussion (including the comment), you talk about "shallow
> root" (I think that is the same as what we call "shallow boundary"),
I think so, yes. I mean to refer to the commits referenced in
.git/shallow, that have their parents "hidden".

> but in this added block, there is nothing that checks CLIENT_SHALLOW
> or SHALLOW flags to special case that.
>
> Is it a good idea to unconditionally do this for all "have"
> revisions?
That's what I meant in my mail with "applying the fix unconditionally" -
there is probably some check needed (I discussed a few options in the
mail as well).

Note that this entire do_rev_list function is only called when there are
shallow revisions involved, so there is also a basic "only when shallow"
check in place.

> Also there is another loop that iterates over "have" revisions just
> above the precontext.  I wonder if this added code belongs in that
> loop.
I think we could add it there, yes. On the other hand, if we only want
to execute this code when there are shallow boundaries in the list of
revisions to send (as I suggested in my previous mail), then we can't
move this code up.

Gr.

Matthijs


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-07-12  7:11   ` Matthijs Kooijman
@ 2013-08-07 10:27     ` Matthijs Kooijman
  2013-08-08  1:01       ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-08-07 10:27 UTC (permalink / raw)
  To: Junio C Hamano, git

Hi Junio,

I haven't got a reply to my mail yet. Could you have a look, so I can
update and resubmit my patch?

On Fri, Jul 12, 2013 at 09:11:57AM +0200, Matthijs Kooijman wrote:
> > [administrivia: you seem to have mail-followup-to that points at you
> > and the list; is that really needed???]
> > In your discussion (including the comment), you talk about "shallow
> > root" (I think that is the same as what we call "shallow boundary"),
> I think so, yes. I mean to refer to the commits referenced in
> .git/shallow, that have their parents "hidden".
Could you confirm that I got the terms right here (or is the shallow
boundary the first hidden commit?)

> > but in this added block, there is nothing that checks CLIENT_SHALLOW
> > or SHALLOW flags to special case that.
> >
> > Is it a good idea to unconditionally do this for all "have"
> > revisions?
> That's what I meant in my mail with "applying the fix unconditionally" -
> there is probably some check needed (I discussed a few options in the
> mail as well).
>
> Note that this entire do_rev_list function is only called when there are
> shallow revisions involved, so there is also a basic "only when shallow"
> check in place.

My proposal was to only apply the fix for all have revisions when the
previous history traversal came across some shallow boundary commits. If
this happens, then that shallow boundary commit will be a "new" one and
it will have prevented the history traversal from finding the full list
of relevant "have" commits. In this case, we should just use all "have"
commits instead.

Now, looking at the code, I see a few options for detecting this case:

 1 Modify mark_edges_uninteresting to return a boolean (or have an
   output argument) if any of the commits in the list of commits to find
   (not the edges) is a shallow boundary.
 2 Modify mark_edges_uninteresting to have a "show_shallow" argument
   that gets called for every shallow boundary. The show_shallow
   function passed would then simply keep a boolean if it is passed at
   least once.
 3 Add another loop over the commits _after_ the call to
   mark_edges_uninteresting, that simply looks for any shallow boundary
   commit.

The last option seems sensible to me, since it prevents modifying the
somewhat generic mark_edges_uninteresting function for this specific
usecase. On the other hand, it does mean that the list of commits is
looped twice, not sure what that means for performance.
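
Option 3 could look roughly like this in a toy model (names and data
layout are hypothetical; the real code would walk the commit list and
check whatever marks a commit as a shallow boundary):

```python
def exclusion_set(commit_list, shallow_roots, edge_objects, have_objects):
    """Pick the objects to treat as already present on the client.

    The extra "have" objects are only added when the list of commits to
    send contains at least one shallow root, i.e. when part of the history
    being sent is detached from what the client has.
    """
    detached = any(commit in shallow_roots for commit in commit_list)
    if detached:
        return set(edge_objects) | set(have_objects)
    return set(edge_objects)


# Detached case: C is a new shallow root, so all "have" objects count.
detached_case = exclusion_set(["C"], {"C"}, set(), {"blob-1"})
# Connected case: no shallow root in the list, edges alone are used.
connected_case = exclusion_set(["C", "B"], set(), {"tree-A"}, {"tree-A", "blob-9"})
```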

Before I go and implement one of these, which option seems best to you?

Gr.

Matthijs


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-07 10:27     ` Matthijs Kooijman
@ 2013-08-08  1:01       ` Junio C Hamano
  2013-08-08  1:09         ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2013-08-08  1:01 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: git

Matthijs Kooijman <matthijs@stdin.nl> writes:

>> > In your discussion (including the comment), you talk about "shallow
>> > root" (I think that is the same as what we call "shallow boundary"),
>> I think so, yes. I mean to refer to the commits referenced in
>> .git/shallow, that have their parents "hidden".
> Could you confirm that I got the terms right here (or is the shallow
> boundary the first hidden commit?)

As long as you are consistent it is fine. I _think_ boundary refers
to what is recorded in the .git/shallow file, so they are commits
that are missing from our repository, and their immediate children
are available.

> My proposal was to only apply the fix for all have revisions when the
> previous history traversal came across some shallow boundary commits. If
> this happens, then that shallow boundary commit will be a "new" one and
> it will have prevented the history traversal from finding the full list
> of relevant "have" commits. In this case, we should just use all "have"
> commits instead.
>
> Now, looking at the code, I see a few options for detecting this case:
>
>  1 Modify mark_edges_uninteresting to return a boolean (or have an
>    output argument) if any of the commits in the list of commits to find
>    (not the edges) is a shallow boundary.
>  2 Modify mark_edges_uninteresting to have a "show_shallow" argument
>    that gets called for every shallow boundary. The show_shallow
>    function passed would then simply keep a boolean if it is passed at
>    least once.
>  3 Add another loop over the commits _after_ the call to
>    mark_edges_uninteresting, that simply looks for any shallow boundary
>    commit.
>
> The last option seems sensible to me, since it prevents modifying the
> somewhat generic mark_edges_uninteresting function for this specific
> usecase. On the other hand, it does mean that the list of commits is
> looped twice, not sure what that means for performance.
>
> Before I go and implement one of these, which option seems best to you?

My gut feeling without looking at any patch is that the simplest
(i.e. 3.) would be the best among these three.

But I suspect, with any of these approaches, you would need to be
very careful futzing with the edge ones.  It may have interesting
interactions with --thin transfer.


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08  1:01       ` Junio C Hamano
@ 2013-08-08  1:09         ` Duy Nguyen
  2013-08-08  6:39           ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2013-08-08  1:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthijs Kooijman, Git Mailing List

On Thu, Aug 8, 2013 at 8:01 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Matthijs Kooijman <matthijs@stdin.nl> writes:
>
>>> > In your discussion (including the comment), you talk about "shallow
>>> > root" (I think that is the same as what we call "shallow boundary"),
>>> I think so, yes. I mean to refer to the commits referenced in
>>> .git/shallow, that have their parents "hidden".
>> Could you confirm that I got the terms right here (or is the shallow
>> boundary the first hidden commit?)
>
> As long as you are consistent it is fine. I _think_ boundary refers
> to what is recorded in the .git/shallow file, so they are commits
> that are missing from our repository, and their immediate children
> are available.

Haven't found time to read the rest yet, but this I can answer.
.git/shallow records graft points. If a commit is in .git/shallow and
it exists in the repository, the commit is considered to have no
parents regardless of what's recorded in repository. So .git/shallow
refers to the new roots, not the missing bits.
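
Put differently, as a small sketch (the helper name and the dictionary
are made up for illustration; this is not how git stores grafts
internally):

```python
def effective_parents(commit, recorded_parents, shallow):
    """Parents as seen by history traversal once shallow grafts apply."""
    if commit in shallow:
        # Listed in the shallow file: treated as a root.
        return []
    return list(recorded_parents.get(commit, []))


# Hypothetical history A <- B <- C, with C listed in the shallow file.
recorded = {"C": ["B"], "B": ["A"], "A": []}
grafted_parents = effective_parents("C", recorded, {"C"})
plain_parents = effective_parents("C", recorded, set())
```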
-- 
Duy


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-07-11 22:01 [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
  2013-07-11 22:53 ` Junio C Hamano
@ 2013-08-08  4:50 ` Duy Nguyen
  2013-08-08  6:51   ` Junio C Hamano
  1 sibling, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2013-08-08  4:50 UTC (permalink / raw)
  To: Matthijs Kooijman, Git Mailing List

On Fri, Jul 12, 2013 at 5:01 AM, Matthijs Kooijman <matthijs@stdin.nl> wrote:
> Hi folks,
>
> while playing with shallow fetches, I've found that in some
> circumstances running git fetch with --depth can return too many objects
> (in particular, _all_ the objects for the requested revisions are
> returned, even when some of those objects are already known to the
> client).
>
> This happens when a client issues a fetch with a depth bigger or equal
> to the number of commits the server is ahead of the client. In this
> case, the revisions to be sent over will be completely detached from any
> revisions the client already has (history-wise), causing the server to
> effectively ignore all objects the client has (as advertised using its
> have lines) and just send over _all_ objects (needed for the revisions
> it is sending over).
>
> I've traced this down to the way do_rev_list in upload-pack.c works. If
> I've pored over the code enough to understand it, this is what happens:
>  - The new shallow roots are made into graft points without parents.
>  - The "want" commits are added to the pending list (revs->pending)
>  - The "have" commits are marked uninteresting and added to the pending list
>  - prepare_revision_walk is called, which adds everything from the
>    pending list into the commit list (revs->commits)
>  - limit_list is called, which traverses the history of each interesting
>    commit in the commit list (i.e., all want revisions), up to but excluding
>    the first uninteresting commit (i.e. a have revision). The result of
>    this is the new commit list.
>
>    This means the commit list now contains all commits that the client
>    wants, up to (excluding) any commits he already has or up to
>    (including) any (new) shallow roots.
>  - mark_edges_uninteresting is called, which marks the tree of every
>    parent of each edge in the commit list as uninteresting (in practice,
>    this marks the tree of each uninteresting parent, since those are by
>    definition the only kinds of revisions that can be beyond the edge).
>  - All trees and blobs that are referenced by trees in the commit list
>    but are not marked as uninteresting, are passed to git-pack-objects
>    to put into the pack.
>
> Normally, the list of commits to send over is connected to the
> client's existing commits (which are marked as uninteresting). This
> means that only the trees of those uninteresting ("have") commits that
> are actually (direct) predecessors of the commits to send over are
> marked as uninteresting. This is probably useful, since it prevents
> having to go over all trees the client has (for other branches, for
> example) and instead limits to the trees that are the most likely to
> contain duplicate (or similar, for delta-ing) objects.
>
> However, in the "detached shallow fetch" case, this assumption is no
> longer valid. There will be no uninteresting commits as parents for
> the commit list, since all edge commits will be shallow roots (hence
> have no parents).  Ideally, one would find out which of the "detached"
> "have" revisions are the closest to the new shallow roots, but with the
> current code these shallow roots have their parents cut off long before
> this code even runs, so this is probably not feasible.

I think this applies to the general case as well, not just shallow.
Imagine I have a disconnected commit that points to the latest tree
(i.e. it contains most of the latest changes). Because it's disconnected,
it'll be ignored by the server side. But if the server side does
mark_tree_interesting on this commit, a bunch of blobs might be
excluded from sending. I used to (ab)use git and store a bunch of tags
pointing to trees. These trees share a lot. Still, fetching a new tag
means pulling all objects of the new tree even though it only needs a
few new blobs and trees. So perhaps we could go over have_obj list
again, if it's not processed and is

 - a tree-ish, mark_tree_uninteresting
 - a blob, just mark uninteresting

and do this regardless of shallow state or edges. The only downside
is that mark_tree_uninteresting is recursive, so it unpacks lots of trees
if have_obj is long, or the worktree is really big. Commit bitmap should
help reduce the cost if have_obj is a committish, at least.
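
A rough sketch of that second pass, as a toy model (the object store
layout and function name are invented here; git's
mark_tree_uninteresting works on parsed objects and flags, not on a
Python set):

```python
def mark_have_uninteresting(store, name, marked):
    """Mark an object, and everything below it, as known to the client.

    store maps an object name to (kind, payload): a commit's payload is
    its tree name, a tree's payload is a list of child names, and a
    blob's payload is None.
    """
    if name in marked:
        return
    marked.add(name)
    kind, payload = store[name]
    if kind == "commit":
        mark_have_uninteresting(store, payload, marked)
    elif kind == "tree":
        # Recursive: this is where the cost for large trees comes from.
        for child in payload:
            mark_have_uninteresting(store, child, marked)


# Hypothetical store: the client has a tag pointing straight at a tree.
store = {
    "tree-v1": ("tree", ["blob-a", "sub"]),
    "sub": ("tree", ["blob-b"]),
    "blob-a": ("blob", None),
    "blob-b": ("blob", None),
    "blob-c": ("blob", None),
}
marked = set()
for have in ["tree-v1"]:
    mark_have_uninteresting(store, have, marked)
```

Only blob-c stays unmarked here, so a new tree sharing blob-a and blob-b
would need just its own tree object and blob-c.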
-- 
Duy


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08  1:09         ` Duy Nguyen
@ 2013-08-08  6:39           ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2013-08-08  6:39 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Matthijs Kooijman, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> Haven't found time to read the rest yet, but this I can answer.
> .git/shallow records graft points. If a commit is in .git/shallow and
> it exists in the repository, the commit is considered to have no
> parents regardless of what's recorded in repository. So .git/shallow
> refers to the new roots, not the missing bits.

Thanks.


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08  4:50 ` Duy Nguyen
@ 2013-08-08  6:51   ` Junio C Hamano
  2013-08-08  7:21     ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2013-08-08  6:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Matthijs Kooijman, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> I think this applies to the general case as well, not just shallow.
> Imagine I have a disconnected commit that points to the latest tree
> (i.e. it contains most of the latest changes). Because it's disconnected,
> it'll be ignored by the server side. But if the server side does
> mark_tree_interesting on this commit, a bunch of blobs might be
> excluded from sending.

I think you meant mark_tree_UNinteresting.

> ... So perhaps we could go over have_obj list
> again, if it's not processed and is
>
>  - a tree-ish, mark_tree_uninteresting
>  - a blob, just mark uninteresting
>
> and do this regardless of shallow state or edges.

As a general idea, I agree it may be worth trying out, to see whether
your concern that the "have" list may be so big that this approach is
more costly than it is worth turns out to be real.

If the recipient is known to have something, we do not have to send
it.

The things that we decide not to send are not necessarily what the
recipient has, which introduces a twist you need to watch out for if
we want to go that route.

If the recipient is known to have something, a thin transfer can
send a delta against it.  You do not want to send the commits before
the shallow boundary (i.e. the parents of the commits listed in
.git/shallow) because the recipient does not want them, and that
means you may have to use a different mark to record that fact.  The
recipient does not have them, we do not want to send them, and they
cannot be used as a delta base for what we do send.  Which is quite
different from the ordinary "uninteresting" objects, those we decide
not to send because the recipient has them.
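
To make the distinction concrete, here is a sketch with two separate
marks. The enum and its member names are hypothetical; upload-pack has
no such type.

```python
from enum import Enum


class Mark(Enum):
    SEND = "send"           # goes into the pack
    CLIENT_HAS = "has"      # not sent; may serve as a delta base (thin pack)
    BEYOND_SHALLOW = "cut"  # not sent; the client lacks it, so no delta base


def objects_in_pack(marks):
    """Objects that are actually transferred."""
    return {name for name, mark in marks.items() if mark is Mark.SEND}


def usable_delta_bases(marks):
    """Objects a thin pack may delta against without sending them."""
    return {name for name, mark in marks.items() if mark is Mark.CLIENT_HAS}


marks = {
    "blob-new": Mark.SEND,
    "blob-old": Mark.CLIENT_HAS,
    "blob-hidden": Mark.BEYOND_SHALLOW,
}
```

With a single "uninteresting" mark, blob-old and blob-hidden would look
the same, and blob-hidden could wrongly be offered as a delta base.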


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08  6:51   ` Junio C Hamano
@ 2013-08-08  7:21     ` Duy Nguyen
  2013-08-08 17:10       ` Junio C Hamano
  2013-08-12  8:02       ` Matthijs Kooijman
  0 siblings, 2 replies; 30+ messages in thread
From: Duy Nguyen @ 2013-08-08  7:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthijs Kooijman, Git Mailing List

On Thu, Aug 8, 2013 at 1:51 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> I think this applies to the general case as well, not just shallow.
>> Imagine I have a disconnected commit that points to the latest tree
>> (i.e. it contains most of the latest changes). Because it's disconnected,
>> it'll be ignored by the server side. But if the server side does
>> mark_tree_interesting on this commit, a bunch of blobs might be
>> excluded from sending.
>
> I think you meant mark_tree_UNinteresting.

Yes, thanks for correcting.

>> ... So perhaps we could go over have_obj list
>> again, if it's not processed and is
>>
>>  - a tree-ish, mark_tree_uninteresting
>>  - a blob, just mark uninteresting
>>
>> and do this regardless of shallow state or edges.
>
> As a general idea, I agree it may be worth trying out, to see whether
> your concern that the "have" list may be so big that this approach is
> more costly than it is worth turns out to be real.
>
> If the recipient is known to have something, we do not have to send
> it.

OK. Matthijs, do you want to make a patch for it?

> The things that we decide not to send are not necessarily what the
> recipient has, which introduces a twist you need to watch out for if
> we want to go that route.
>
> If the recipient is known to have something, a thin transfer can
> send a delta against it.  You do not want to send the commits before
> the shallow boundary (i.e. the parents of the commits listed in
> .git/shallow) because the recipient does not want them, and that
> means you may have to use a different mark to record that fact.  The
> recipient does not have them, we do not want to send them, and they
> cannot be used as a delta base for what we do send.  Which is quite
> different from the ordinary "uninteresting" objects, those we decide
> not to send because the recipient has them.

I fail to see the point here. There are two different things: what we
want to send, and what we can make deltas against. Shallow boundary
affects the former. What the recipient has affects the latter. What is the
twist about?

As for considering objects before shallow boundary uninteresting, I
have a plan for it: kill upload-pack.c:do_rev_list(). The function is
created to make a cut at shallow boundary, but we already have a tool
for that: grafting. In my ongoing shallow series I will create a
temporary shallow file that contains new roots and pass the file to
pack-objects with --shallow-file. pack-objects will never see anything
outside what the recipient may want (i.e. commits before shallow
boundary) to receive and pack-objects' rev-list should do what
upload-pack.c:do_rev_list() currently does.
-- 
Duy


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08  7:21     ` Duy Nguyen
@ 2013-08-08 17:10       ` Junio C Hamano
  2013-08-09 13:13         ` Duy Nguyen
  2013-08-12  8:02       ` Matthijs Kooijman
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2013-08-08 17:10 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Matthijs Kooijman, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> I fail to see the point here. There are two different things: what we
> want to send, and what we can make deltas against. Shallow boundary
> affects the former. What the recipient has affects the latter. What is the
> twist about?

do_rev_list() --> mark_edges_uninteresting() --> show_edge() callchain
that eventually does this:

static void show_edge(struct commit *commit)
{
	fprintf(pack_pipe, "-%s\n", sha1_to_hex(commit->object.sha1));
}

was what I had in mind.

For a non-shallow transfer, feeding "-<boundary commit>" is done for
commits that we do not send (we do not do so for all of them) and
those that we know the recipient does have.  Two different things
used to be the same, but with your suggestion they are not.  Which
is a good thing but we need to be careful to make sure existing
codepaths do not conflate them and untangle ones that do if there
are any, that's all.

> As for considering objects before shallow boundary uninteresting, I
> have a plan for it: kill upload-pack.c:do_rev_list(). The function is
> created to make a cut at shallow boundary,...

Hmph, that function is not primarily about shallow boundary but does
all packing in general.

The edge hinting in there is for thin transfer where the sender
sends deltas against base objects that are known to be present in
the receiving repository, without sending the base objects.


* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08 17:10       ` Junio C Hamano
@ 2013-08-09 13:13         ` Duy Nguyen
  0 siblings, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2013-08-09 13:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthijs Kooijman, Git Mailing List

On Fri, Aug 9, 2013 at 12:10 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> I fail to see the point here. There are two different things: what we
>> want to send, and what we can make deltas against. Shallow boundary
>> affects the former. What the recipient has affects the latter. What is the
>> twist about?
>
> do_rev_list() --> mark_edges_uninteresting() --> show_edge() callchain
> that eventually does this:
>
> static void show_edge(struct commit *commit)
> {
>         fprintf(pack_pipe, "-%s\n", sha1_to_hex(commit->object.sha1));
> }
>
> was what I had in mind.

Now I see. Thanks.

mark_edges_uninteresting() actually calls
mark_edge_parents_uninteresting(), which calls show_edge(). The middle
function is important because after calculating the new depth, upload-pack
calls register_shallow() for both old and new shallow roots, and
those commits will have their 'parents' pointer set to NULL, which
renders mark_edge_parents_uninteresting() a no-op. So show_edge() is
never called on shallow points' parents.
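
A toy model may make this concrete. It is only a sketch, not git's
code: struct toy_commit, toy_register_shallow() and
toy_count_parent_edges() are made-up names, and history is simplified
to a single parent per commit.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNINTERESTING 1u

/* Toy stand-in for git's struct commit; one parent at most. */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parent;	/* NULL for a root or a shallow root */
};

/* Models the effect of register_shallow(): hide the commit's history. */
static void toy_register_shallow(struct toy_commit *c)
{
	assert(c != NULL);
	c->parent = NULL;
}

/*
 * Models mark_edge_parents_uninteresting(): an uninteresting parent of
 * an interesting commit counts as an edge. Returns the number of edges.
 */
static int toy_count_parent_edges(const struct toy_commit *c)
{
	assert(c != NULL);
	if (c->parent && (c->parent->flags & TOY_UNINTERESTING))
		return 1;
	return 0;
}
```

With a "have" commit A as the parent of a "want" commit B,
toy_count_parent_edges(&B) finds one edge before
toy_register_shallow(&B) and none afterwards, which is the behaviour
described above.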

>> As for considering objects before shallow boundary uninteresting, I
>> have a plan for it: kill upload-pack.c:do_rev_list(). The function is
>> created to make a cut at shallow boundary,...
>
> Hmph, that function is not primarily about shallow boundary but does
> all packing in general.
>
> The edge hinting in there is for thin transfer where the sender
> sends deltas against base objects that are known to be present in
> the receiving repository, without sending the base objects.

OK, but edge hinting is the same in pack-objects.c:get_object_list(), so
the plan might still work, right? I still need to study
extra_edge_obj in upload-pack.c though. That's knowledge
that pack-objects won't have.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-08  7:21     ` Duy Nguyen
  2013-08-08 17:10       ` Junio C Hamano
@ 2013-08-12  8:02       ` Matthijs Kooijman
  2013-08-16  9:51         ` Duy Nguyen
  1 sibling, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-08-12  8:02 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List

Hi Duy,

> OK. Mathijs, do you want make a patch for it?
I'm willing, but:
 - I don't understand the code and all of your comments well enough yet
   to start coding right away (though I haven't actually invested enough
   time in this yet, either).
 - I'll be on vacation for the next two weeks.

When I get back, I'll re-read this thread properly and reply where I
don't follow it. Feel free to continue discussing the plan until then,
of course :-)

Gr.

Matthijs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-12  8:02       ` Matthijs Kooijman
@ 2013-08-16  9:51         ` Duy Nguyen
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Duy Nguyen @ 2013-08-16  9:51 UTC (permalink / raw)
  To: Matthijs Kooijman, Duy Nguyen, Junio C Hamano, Git Mailing List

On Mon, Aug 12, 2013 at 3:02 PM, Matthijs Kooijman <matthijs@stdin.nl> wrote:
> Hi Duy,
>
>> OK. Mathijs, do you want make a patch for it?
> I'm willing, but:
>  - I don't understand the code and all of your comments well enough yet
>    to start coding right away (though I haven't actually invested enough
>    time in this yet, either).
>  - I'll be on vacation for the next two weeks.
>
> When I get back, I'll re-read this thread properly and reply where I
> don't follow it. Feel free to continue discussing the plan until then,
> of course :-)

I thought a bit, but my thoughts often get stuck if I don't write them
down in the form of code :-) so this is what I've got so far. 4/6 is a good
thing in my opinion, but I might have overlooked something. 6/6 is about this
thread. I'm likely offline this weekend, so all is good :-D
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c
  2013-08-16  9:51         ` Duy Nguyen
@ 2013-08-16  9:52           ` Nguyễn Thái Ngọc Duy
  2013-08-16  9:52             ` [PATCH 2/6] shallow: only add shallow graft points to new shallow file Nguyễn Thái Ngọc Duy
                               ` (4 more replies)
  2013-08-28 15:36           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
  2013-10-21  7:51           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
  2 siblings, 5 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-16  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Matthijs Kooijman, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.h     |  3 +++
 fetch-pack.c | 53 +----------------------------------------------------
 shallow.c    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 52 deletions(-)

diff --git a/commit.h b/commit.h
index d912a9d..790e31b 100644
--- a/commit.h
+++ b/commit.h
@@ -198,6 +198,9 @@ extern struct commit_list *get_shallow_commits(struct object_array *heads,
 		int depth, int shallow_flag, int not_shallow_flag);
 extern void check_shallow_file_for_update(void);
 extern void set_alternate_shallow_file(const char *path);
+extern int write_shallow_commits(struct strbuf *out, int use_pack_protocol);
+extern void setup_alternate_shallow(struct lock_file *shallow_lock,
+				    const char **alternate_shallow_file);
 
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit *);
diff --git a/fetch-pack.c b/fetch-pack.c
index 6684348..28195ed 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -184,36 +184,6 @@ static void consume_shallow_list(struct fetch_pack_args *args, int fd)
 	}
 }
 
-struct write_shallow_data {
-	struct strbuf *out;
-	int use_pack_protocol;
-	int count;
-};
-
-static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
-{
-	struct write_shallow_data *data = cb_data;
-	const char *hex = sha1_to_hex(graft->sha1);
-	data->count++;
-	if (data->use_pack_protocol)
-		packet_buf_write(data->out, "shallow %s", hex);
-	else {
-		strbuf_addstr(data->out, hex);
-		strbuf_addch(data->out, '\n');
-	}
-	return 0;
-}
-
-static int write_shallow_commits(struct strbuf *out, int use_pack_protocol)
-{
-	struct write_shallow_data data;
-	data.out = out;
-	data.use_pack_protocol = use_pack_protocol;
-	data.count = 0;
-	for_each_commit_graft(write_one_shallow, &data);
-	return data.count;
-}
-
 static enum ack_type get_ack(int fd, unsigned char *result_sha1)
 {
 	int len;
@@ -795,27 +765,6 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
-static void setup_alternate_shallow(void)
-{
-	struct strbuf sb = STRBUF_INIT;
-	int fd;
-
-	check_shallow_file_for_update();
-	fd = hold_lock_file_for_update(&shallow_lock, git_path("shallow"),
-				       LOCK_DIE_ON_ERROR);
-	if (write_shallow_commits(&sb, 0)) {
-		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
-			die_errno("failed to write to %s", shallow_lock.filename);
-		alternate_shallow_file = shallow_lock.filename;
-	} else
-		/*
-		 * is_repository_shallow() sees empty string as "no
-		 * shallow file".
-		 */
-		alternate_shallow_file = "";
-	strbuf_release(&sb);
-}
-
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -896,7 +845,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	if (args->stateless_rpc)
 		packet_flush(fd[1]);
 	if (args->depth > 0)
-		setup_alternate_shallow();
+		setup_alternate_shallow(&shallow_lock, &alternate_shallow_file);
 	if (get_pack(args, fd, pack_lockfile))
 		die("git fetch-pack: fetch failed.");
 
diff --git a/shallow.c b/shallow.c
index 8a9c96d..68dd106 100644
--- a/shallow.c
+++ b/shallow.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "commit.h"
 #include "tag.h"
+#include "pkt-line.h"
 
 static int is_shallow = -1;
 static struct stat shallow_stat;
@@ -141,3 +142,56 @@ void check_shallow_file_for_update(void)
 		   )
 		die("shallow file was changed during fetch");
 }
+
+struct write_shallow_data {
+	struct strbuf *out;
+	int use_pack_protocol;
+	int count;
+};
+
+static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
+{
+	struct write_shallow_data *data = cb_data;
+	const char *hex = sha1_to_hex(graft->sha1);
+	data->count++;
+	if (data->use_pack_protocol)
+		packet_buf_write(data->out, "shallow %s", hex);
+	else {
+		strbuf_addstr(data->out, hex);
+		strbuf_addch(data->out, '\n');
+	}
+	return 0;
+}
+
+int write_shallow_commits(struct strbuf *out, int use_pack_protocol)
+{
+	struct write_shallow_data data;
+	data.out = out;
+	data.use_pack_protocol = use_pack_protocol;
+	data.count = 0;
+	for_each_commit_graft(write_one_shallow, &data);
+	return data.count;
+}
+
+void setup_alternate_shallow(struct lock_file *shallow_lock,
+			     const char **alternate_shallow_file)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+
+	check_shallow_file_for_update();
+	fd = hold_lock_file_for_update(shallow_lock, git_path("shallow"),
+				       LOCK_DIE_ON_ERROR);
+	if (write_shallow_commits(&sb, 0)) {
+		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
+			die_errno("failed to write to %s",
+				  shallow_lock->filename);
+		*alternate_shallow_file = shallow_lock->filename;
+	} else
+		/*
+		 * is_repository_shallow() sees empty string as "no
+		 * shallow file".
+		 */
+		*alternate_shallow_file = "";
+	strbuf_release(&sb);
+}
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/6] shallow: only add shallow graft points to new shallow file
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
@ 2013-08-16  9:52             ` Nguyễn Thái Ngọc Duy
  2013-08-16 23:50               ` Eric Sunshine
  2013-08-16  9:52             ` [PATCH 3/6] shallow: add setup_temporary_shallow() Nguyễn Thái Ngọc Duy
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-16  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Matthijs Kooijman, Nguyễn Thái Ngọc Duy

for_each_commit_graft() goes through all graft points, and shallow
boundaries are just one special kind of grafting. If $GIT_DIR/shallow
and $GIT_DIR/info/grafts are both present, write_shallow_commits may
catch both sets, accidentally turning some graft points into shallow
boundaries. Don't do that.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
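Note (not for the commit message): a minimal sketch of the filter
added below, assuming, as the check in the diff does, that
nr_parent == -1 identifies a shallow entry. struct toy_graft and
toy_count_shallow() are hypothetical stand-ins for struct commit_graft
and the write_one_shallow() callback; they are not git code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for git's struct commit_graft. */
struct toy_graft {
	const char *hex;	/* object name of the grafted commit */
	int nr_parent;		/* -1 marks a shallow boundary */
};

/* Counts only the shallow entries, skipping ordinary graft points. */
static size_t toy_count_shallow(const struct toy_graft *grafts, size_t nr)
{
	size_t i, count = 0;

	assert(grafts != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (grafts[i].nr_parent != -1)
			continue;	/* a regular graft, not shallow */
		count++;
	}
	return count;
}
```

Given two shallow entries and one regular graft with two parents, the
function counts two, so the regular graft would no longer end up in
the new shallow file.
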
 shallow.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/shallow.c b/shallow.c
index 68dd106..5f626c0 100644
--- a/shallow.c
+++ b/shallow.c
@@ -153,6 +153,8 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 {
 	struct write_shallow_data *data = cb_data;
 	const char *hex = sha1_to_hex(graft->sha1);
+	if (graft->nr_parent != -1)
+		return 0;
 	data->count++;
 	if (data->use_pack_protocol)
 		packet_buf_write(data->out, "shallow %s", hex);
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/6] shallow: add setup_temporary_shallow()
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
  2013-08-16  9:52             ` [PATCH 2/6] shallow: only add shallow graft points to new shallow file Nguyễn Thái Ngọc Duy
@ 2013-08-16  9:52             ` Nguyễn Thái Ngọc Duy
  2013-08-16 23:52               ` Eric Sunshine
  2013-08-16  9:52             ` [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects Nguyễn Thái Ngọc Duy
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-16  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Matthijs Kooijman, Nguyễn Thái Ngọc Duy

This function is like setup_alternate_shallow() except that it does
not lock $GIT_DIR/shallow. It's supposed to be used when a program
generates a temporary shallow file for use by another program, then
throws the shallow file away.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.h  |  1 +
 shallow.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/commit.h b/commit.h
index 790e31b..c4d324c 100644
--- a/commit.h
+++ b/commit.h
@@ -201,6 +201,7 @@ extern void set_alternate_shallow_file(const char *path);
 extern int write_shallow_commits(struct strbuf *out, int use_pack_protocol);
 extern void setup_alternate_shallow(struct lock_file *shallow_lock,
 				    const char **alternate_shallow_file);
+extern char *setup_temporary_shallow(void);
 
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit *);
diff --git a/shallow.c b/shallow.c
index 5f626c0..cdf37d6 100644
--- a/shallow.c
+++ b/shallow.c
@@ -175,6 +175,29 @@ int write_shallow_commits(struct strbuf *out, int use_pack_protocol)
 	return data.count;
 }
 
+char *setup_temporary_shallow(void)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+
+	if (write_shallow_commits(&sb, 0)) {
+		struct strbuf path = STRBUF_INIT;
+		strbuf_addstr(&path, git_path("shallow_XXXXXX"));
+		fd = xmkstemp(path.buf);
+		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
+			die_errno("failed to write to %s",
+				  path.buf);
+		close(fd);
+		strbuf_release(&sb);
+		return strbuf_detach(&path, NULL);
+	}
+	/*
+	 * is_repository_shallow() sees empty string as "no shallow
+	 * file".
+	 */
+	return xstrdup("");
+}
+
 void setup_alternate_shallow(struct lock_file *shallow_lock,
 			     const char **alternate_shallow_file)
 {
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
  2013-08-16  9:52             ` [PATCH 2/6] shallow: only add shallow graft points to new shallow file Nguyễn Thái Ngọc Duy
  2013-08-16  9:52             ` [PATCH 3/6] shallow: add setup_temporary_shallow() Nguyễn Thái Ngọc Duy
@ 2013-08-16  9:52             ` Nguyễn Thái Ngọc Duy
  2013-08-28 14:52               ` Matthijs Kooijman
  2013-08-16  9:52             ` [PATCH 5/6] list-objects: reduce one argument in mark_edges_uninteresting Nguyễn Thái Ngọc Duy
  2013-08-16  9:52             ` [PATCH 6/6] list-objects: mark more commits as edges " Nguyễn Thái Ngọc Duy
  4 siblings, 1 reply; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-16  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Matthijs Kooijman, Nguyễn Thái Ngọc Duy

upload-pack has a special rev walking code for shallow recipients. It
works almost like the similar code in pack-objects except:

1. in upload-pack, graft points could be added for deepening

2. also when the repository is deepened, the shallow point will be
   moved further away from the tip, but the old shallow point will be
   marked as edge to produce more efficient packs. See 6523078 (make
   shallow repository deepening more network efficient - 2009-09-03)

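For the first point, write the shallow graft points to a temporary file
with setup_temporary_shallow() and pass the file to pack-objects via
--shallow-file. This will override $GIT_DIR/shallow and give
pack-objects the exact repository shape that upload-pack has.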

For the second point, mark edge commits by revision command arguments.
Even if old shallow points are passed as "--not" revisions as in this
patch, they will not be picked up by mark_edges_uninteresting() because
this function looks up to parents for edges, while in this case the edge
is a child, in the opposite direction. This will be fixed in the next
patch when all given uninteresting commits are marked as edges.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
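Note (not for the commit message): a rough sketch of the revision list
that create_pack_file() writes to pack-objects' stdin after this
patch: the wanted objects, a "--not" line, the objects to exclude
(haves and extra edges), then an empty line. toy_add_line() and
toy_rev_input() are made-up helpers that build the text in a buffer
instead of writing to a pipe, and the one-letter names in the example
stand in for real hex object names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends "line\n" at offset len and returns the new length. */
static size_t toy_add_line(char *buf, size_t size, size_t len,
			   const char *line)
{
	size_t n = strlen(line);

	assert(len + n + 2 <= size);	/* line, '\n' and trailing NUL */
	memcpy(buf + len, line, n);
	buf[len + n] = '\n';
	buf[len + n + 1] = '\0';
	return len + n + 1;
}

/* Builds: wants, "--not", excluded objects, terminating empty line. */
static size_t toy_rev_input(char *buf, size_t size,
			    const char **want, size_t nr_want,
			    const char **have, size_t nr_have)
{
	size_t i, len = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (i = 0; i < nr_want; i++)
		len = toy_add_line(buf, size, len, want[i]);
	len = toy_add_line(buf, size, len, "--not");
	for (i = 0; i < nr_have; i++)
		len = toy_add_line(buf, size, len, have[i]);
	return toy_add_line(buf, size, len, "");
}
```

For one want "B" and the excluded objects "A" and "C", the buffer
holds "B\n--not\nA\nC\n\n".
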
 t/t5530-upload-pack-error.sh |   3 -
 upload-pack.c                | 128 +++++++++++--------------------------------
 2 files changed, 32 insertions(+), 99 deletions(-)

diff --git a/t/t5530-upload-pack-error.sh b/t/t5530-upload-pack-error.sh
index c983d36..3932e79 100755
--- a/t/t5530-upload-pack-error.sh
+++ b/t/t5530-upload-pack-error.sh
@@ -54,9 +54,6 @@ test_expect_success 'upload-pack fails due to error in rev-list' '
 	printf "0032want %s\n0034shallow %s00000009done\n0000" \
 		$(git rev-parse HEAD) $(git rev-parse HEAD^) >input &&
 	test_must_fail git upload-pack . <input >/dev/null 2>output.err &&
-	# pack-objects survived
-	grep "Total.*, reused" output.err &&
-	# but there was an error, which must have been in rev-list
 	grep "bad tree object" output.err
 '
 
diff --git a/upload-pack.c b/upload-pack.c
index 127e59a..d5a003a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -68,87 +68,28 @@ static ssize_t send_client_data(int fd, const char *data, ssize_t sz)
 	return sz;
 }
 
-static FILE *pack_pipe = NULL;
-static void show_commit(struct commit *commit, void *data)
-{
-	if (commit->object.flags & BOUNDARY)
-		fputc('-', pack_pipe);
-	if (fputs(sha1_to_hex(commit->object.sha1), pack_pipe) < 0)
-		die("broken output pipe");
-	fputc('\n', pack_pipe);
-	fflush(pack_pipe);
-	free(commit->buffer);
-	commit->buffer = NULL;
-}
-
-static void show_object(struct object *obj,
-			const struct name_path *path, const char *component,
-			void *cb_data)
-{
-	show_object_with_name(pack_pipe, obj, path, component);
-}
-
-static void show_edge(struct commit *commit)
-{
-	fprintf(pack_pipe, "-%s\n", sha1_to_hex(commit->object.sha1));
-}
-
-static int do_rev_list(int in, int out, void *user_data)
-{
-	int i;
-	struct rev_info revs;
-
-	pack_pipe = xfdopen(out, "w");
-	init_revisions(&revs, NULL);
-	revs.tag_objects = 1;
-	revs.tree_objects = 1;
-	revs.blob_objects = 1;
-	if (use_thin_pack)
-		revs.edge_hint = 1;
-
-	for (i = 0; i < want_obj.nr; i++) {
-		struct object *o = want_obj.objects[i].item;
-		/* why??? */
-		o->flags &= ~UNINTERESTING;
-		add_pending_object(&revs, o, NULL);
-	}
-	for (i = 0; i < have_obj.nr; i++) {
-		struct object *o = have_obj.objects[i].item;
-		o->flags |= UNINTERESTING;
-		add_pending_object(&revs, o, NULL);
-	}
-	setup_revisions(0, NULL, &revs, NULL);
-	if (prepare_revision_walk(&revs))
-		die("revision walk setup failed");
-	mark_edges_uninteresting(revs.commits, &revs, show_edge);
-	if (use_thin_pack)
-		for (i = 0; i < extra_edge_obj.nr; i++)
-			fprintf(pack_pipe, "-%s\n", sha1_to_hex(
-					extra_edge_obj.objects[i].item->sha1));
-	traverse_commit_list(&revs, show_commit, show_object, NULL);
-	fflush(pack_pipe);
-	fclose(pack_pipe);
-	return 0;
-}
-
 static void create_pack_file(void)
 {
-	struct async rev_list;
 	struct child_process pack_objects;
 	char data[8193], progress[128];
 	char abort_msg[] = "aborting due to possible repository "
 		"corruption on the remote side.";
 	int buffered = -1;
 	ssize_t sz;
-	const char *argv[10];
-	int arg = 0;
+	const char *argv[12];
+	int i, arg = 0;
+	FILE *pipe_fd;
+	char *shallow_file = NULL;
 
-	argv[arg++] = "pack-objects";
-	if (!shallow_nr) {
-		argv[arg++] = "--revs";
-		if (use_thin_pack)
-			argv[arg++] = "--thin";
+	if (shallow_nr) {
+		shallow_file = setup_temporary_shallow();
+		argv[arg++] = "--shallow-file";
+		argv[arg++] = shallow_file;
 	}
+	argv[arg++] = "pack-objects";
+	argv[arg++] = "--revs";
+	if (use_thin_pack)
+		argv[arg++] = "--thin";
 
 	argv[arg++] = "--stdout";
 	if (!no_progress)
@@ -169,29 +110,21 @@ static void create_pack_file(void)
 	if (start_command(&pack_objects))
 		die("git upload-pack: unable to fork git-pack-objects");
 
-	if (shallow_nr) {
-		memset(&rev_list, 0, sizeof(rev_list));
-		rev_list.proc = do_rev_list;
-		rev_list.out = pack_objects.in;
-		if (start_async(&rev_list))
-			die("git upload-pack: unable to fork git-rev-list");
-	}
-	else {
-		FILE *pipe_fd = xfdopen(pack_objects.in, "w");
-		int i;
-
-		for (i = 0; i < want_obj.nr; i++)
-			fprintf(pipe_fd, "%s\n",
-				sha1_to_hex(want_obj.objects[i].item->sha1));
-		fprintf(pipe_fd, "--not\n");
-		for (i = 0; i < have_obj.nr; i++)
-			fprintf(pipe_fd, "%s\n",
-				sha1_to_hex(have_obj.objects[i].item->sha1));
-		fprintf(pipe_fd, "\n");
-		fflush(pipe_fd);
-		fclose(pipe_fd);
-	}
-
+	pipe_fd = xfdopen(pack_objects.in, "w");
+
+	for (i = 0; i < want_obj.nr; i++)
+		fprintf(pipe_fd, "%s\n",
+			sha1_to_hex(want_obj.objects[i].item->sha1));
+	fprintf(pipe_fd, "--not\n");
+	for (i = 0; i < have_obj.nr; i++)
+		fprintf(pipe_fd, "%s\n",
+			sha1_to_hex(have_obj.objects[i].item->sha1));
+	for (i = 0; i < extra_edge_obj.nr; i++)
+		fprintf(pipe_fd, "%s\n",
+			sha1_to_hex(extra_edge_obj.objects[i].item->sha1));
+	fprintf(pipe_fd, "\n");
+	fflush(pipe_fd);
+	fclose(pipe_fd);
 
 	/* We read from pack_objects.err to capture stderr output for
 	 * progress bar, and pack_objects.out to capture the pack data.
@@ -290,8 +223,11 @@ static void create_pack_file(void)
 		error("git upload-pack: git-pack-objects died with error.");
 		goto fail;
 	}
-	if (shallow_nr && finish_async(&rev_list))
-		goto fail;	/* error was already reported */
+	if (shallow_file) {
+		if (*shallow_file)
+			unlink(shallow_file);
+		free(shallow_file);
+	}
 
 	/* flush the data */
 	if (0 <= buffered) {
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/6] list-objects: reduce one argument in mark_edges_uninteresting
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
                               ` (2 preceding siblings ...)
  2013-08-16  9:52             ` [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects Nguyễn Thái Ngọc Duy
@ 2013-08-16  9:52             ` Nguyễn Thái Ngọc Duy
  2013-08-16  9:52             ` [PATCH 6/6] list-objects: mark more commits as edges " Nguyễn Thái Ngọc Duy
  4 siblings, 0 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-16  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Matthijs Kooijman, Nguyễn Thái Ngọc Duy

mark_edges_uninteresting() is always called with this form

  mark_edges_uninteresting(revs->commits, revs, ...);

Remove the first argument and let mark_edges_uninteresting figure that
out by itself. It helps answer the question "are this commit list and
revs related in any way?" when looking at mark_edges_uninteresting
implementation.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 bisect.c               | 2 +-
 builtin/pack-objects.c | 2 +-
 builtin/rev-list.c     | 2 +-
 http-push.c            | 2 +-
 list-objects.c         | 7 +++----
 list-objects.h         | 2 +-
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index 71c1958..1e46a4f 100644
--- a/bisect.c
+++ b/bisect.c
@@ -624,7 +624,7 @@ static void bisect_common(struct rev_info *revs)
 	if (prepare_revision_walk(revs))
 		die("revision walk setup failed");
 	if (revs->tree_objects)
-		mark_edges_uninteresting(revs->commits, revs, NULL);
+		mark_edges_uninteresting(revs, NULL);
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f069462..dd117b3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2378,7 +2378,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
-	mark_edges_uninteresting(revs.commits, &revs, show_edge);
+	mark_edges_uninteresting(&revs, show_edge);
 	traverse_commit_list(&revs, show_commit, show_object, NULL);
 
 	if (keep_unreachable)
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index a5ec30d..4fc1616 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -336,7 +336,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	if (revs.tree_objects)
-		mark_edges_uninteresting(revs.commits, &revs, show_edge);
+		mark_edges_uninteresting(&revs, show_edge);
 
 	if (bisect_list) {
 		int reaches = reaches, all = all;
diff --git a/http-push.c b/http-push.c
index 6dad188..cde6416 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1976,7 +1976,7 @@ int main(int argc, char **argv)
 		pushing = 0;
 		if (prepare_revision_walk(&revs))
 			die("revision walk setup failed");
-		mark_edges_uninteresting(revs.commits, &revs, NULL);
+		mark_edges_uninteresting(&revs, NULL);
 		objects_to_send = get_delta(&revs, ref_lock);
 		finish_all_active_slots();
 
diff --git a/list-objects.c b/list-objects.c
index 3dd4a96..db8ee4f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -145,11 +145,10 @@ static void mark_edge_parents_uninteresting(struct commit *commit,
 	}
 }
 
-void mark_edges_uninteresting(struct commit_list *list,
-			      struct rev_info *revs,
-			      show_edge_fn show_edge)
+void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
 {
-	for ( ; list; list = list->next) {
+	struct commit_list *list;
+	for (list = revs->commits; list; list = list->next) {
 		struct commit *commit = list->item;
 
 		if (commit->object.flags & UNINTERESTING) {
diff --git a/list-objects.h b/list-objects.h
index 3db7bb6..136a1da 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -6,6 +6,6 @@ typedef void (*show_object_fn)(struct object *, const struct name_path *, const
 void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
-void mark_edges_uninteresting(struct commit_list *, struct rev_info *, show_edge_fn);
+void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
 
 #endif
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/6] list-objects: mark more commits as edges in mark_edges_uninteresting
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
                               ` (3 preceding siblings ...)
  2013-08-16  9:52             ` [PATCH 5/6] list-objects: reduce one argument in mark_edges_uninteresting Nguyễn Thái Ngọc Duy
@ 2013-08-16  9:52             ` Nguyễn Thái Ngọc Duy
  4 siblings, 0 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-16  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Matthijs Kooijman, Nguyễn Thái Ngọc Duy

The purpose of edge commits is to let pack-objects know what objects
it can use as base, but does not need to include in the thin pack
because the other side is supposed to already have them. So far we
mark uninteresting parents of interesting commits as edges. But even
an unrelated uninteresting commit (that the other side has) may become
a good base for pack-objects and help produce more efficient packs.

This is especially true for shallow clones, when the client issues a
fetch with a depth smaller than or equal to the number of commits the
server is ahead of the client. For example, in this commit history the
client has up to "A" and the server has up to "B":

    -------A---B
     have--^   ^
              /
       want--+

If depth 1 is requested, the commit list to send to the client
includes only B. mark_edges_uninteresting() (m_e_u below) checks whether
the parent commits of B are uninteresting and, if so, marks them as
edges. Due to the shallow graft, commit B has no parents and the
revision walker never sees A as the parent of B. In fact it marks no
edges at all in this simple case and sends everything B has to the
client, even though it could have excluded what A, and therefore the
client, already has. In a slightly different case where A is not a
direct parent of B (i.e. there are commits in between A and B), marking
A as an edge can still save some data because B may still share objects
with the far ancestor A.

There is another case from the previous patch, when we deepen a ref
from C->E to A->E:

    ---A---B   C---D---E
     want--^   ^       ^
       shallow-+      /
          have-------+

In this case we need to send A and B to the client, and C (i.e. the
current shallow point that the client informs the server about) is a very
good base because it's closest to A and B. Normal m_e_u won't recognize
C as an edge because it only looks back to parents (i.e. A<-B), not the
opposite way B->C, even if C is already marked as an uninteresting commit
by the previous patch.

This patch includes all uninteresting commits from command line as
edges and lets pack-objects decide what's best to do. The upside is we
have better chance of producing better packs in certain cases. The
downside is we may need to process some extra objects on the server
side.

For the shallow case on git.git, when the client is 5 commits behind
and does "fetch --depth=3", the resulting pack is 99.26 KiB instead of
4.92 MiB.

Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
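Note (not for the commit message): a toy model of the two passes,
under simplifying assumptions. toy_commit, toy_show_edge_once() and
toy_mark_edges() are made-up names; each commit has at most one
parent, trees are ignored, and the edge_hint check is left out. The
SHOWN-style flag is used as in the diff below, so that a commit is
reported as an edge only once.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNINTERESTING (1u << 0)
#define TOY_SHOWN         (1u << 1)

struct toy_commit {
	unsigned flags;
	struct toy_commit *parent;	/* NULL once grafted as shallow root */
};

/* Reports c as an edge at most once; returns 1 if newly reported. */
static int toy_show_edge_once(struct toy_commit *c)
{
	if (c->flags & TOY_SHOWN)
		return 0;
	c->flags |= TOY_SHOWN;
	return 1;
}

/* Returns the number of edges newly reported by this call. */
static int toy_mark_edges(struct toy_commit **list, size_t nr_list,
			  struct toy_commit **cmdline, size_t nr_cmdline)
{
	size_t i;
	int edges = 0;

	assert(list != NULL || nr_list == 0);
	assert(cmdline != NULL || nr_cmdline == 0);

	/* Existing pass: uninteresting parents of commits to be sent. */
	for (i = 0; i < nr_list; i++) {
		struct toy_commit *p = list[i]->parent;
		if (p && (p->flags & TOY_UNINTERESTING))
			edges += toy_show_edge_once(p);
	}
	/* Added pass: uninteresting revisions named on the command line. */
	for (i = 0; i < nr_cmdline; i++)
		if (cmdline[i]->flags & TOY_UNINTERESTING)
			edges += toy_show_edge_once(cmdline[i]);
	return edges;
}
```

In the depth 1 example above (B grafted to have no parent, A given as
"--not"), the first pass alone reports no edge, while adding the
command-line pass reports A once.
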
 list-objects.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/list-objects.c b/list-objects.c
index db8ee4f..05c8c5c 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -148,15 +148,32 @@ static void mark_edge_parents_uninteresting(struct commit *commit,
 void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
 {
 	struct commit_list *list;
+	int i;
+
 	for (list = revs->commits; list; list = list->next) {
 		struct commit *commit = list->item;
 
 		if (commit->object.flags & UNINTERESTING) {
 			mark_tree_uninteresting(commit->tree);
+			if (revs->edge_hint && !(commit->object.flags & SHOWN)) {
+				commit->object.flags |= SHOWN;
+				show_edge(commit);
+			}
 			continue;
 		}
 		mark_edge_parents_uninteresting(commit, revs, show_edge);
 	}
+	for (i = 0; i < revs->cmdline.nr; i++) {
+		struct object *obj = revs->cmdline.rev[i].item;
+		struct commit *commit = (struct commit *)obj;
+		if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
+			continue;
+		mark_tree_uninteresting(commit->tree);
+		if (revs->edge_hint && !(obj->flags & SHOWN)) {
+			obj->flags |= SHOWN;
+			show_edge(commit);
+		}
+	}
 }
 
 static void add_pending_tree(struct rev_info *revs, struct tree *tree)
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] shallow: only add shallow graft points to new shallow file
  2013-08-16  9:52             ` [PATCH 2/6] shallow: only add shallow graft points to new shallow file Nguyễn Thái Ngọc Duy
@ 2013-08-16 23:50               ` Eric Sunshine
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Sunshine @ 2013-08-16 23:50 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Git List, Junio C Hamano, Matthijs Kooijman

On Fri, Aug 16, 2013 at 5:52 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> for_each_commit_graft() goes through all graft points and shallow
> boudaries are just one special kind of grafting. If $GIT_DIR/shallow

s/boudaries/boundaries/

> and $GIT_DIR/info/grafts are both present, write_shallow_commits may
> catch both sets, accidentally turning some graft points to shallow
> boundaries. Don't do that.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] shallow: add setup_temporary_shallow()
  2013-08-16  9:52             ` [PATCH 3/6] shallow: add setup_temporary_shallow() Nguyễn Thái Ngọc Duy
@ 2013-08-16 23:52               ` Eric Sunshine
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Sunshine @ 2013-08-16 23:52 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Git List, Junio C Hamano, Matthijs Kooijman

On Fri, Aug 16, 2013 at 5:52 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> This function is like setup_alternate_shallow() except that it does
> not lock $GIT_DIR/shallow. It's supposed to be used when a program
> generates temporary shallow for for use by another program, then throw

s/for for/for/

> the shallow file away.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects
  2013-08-16  9:52             ` [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects Nguyễn Thái Ngọc Duy
@ 2013-08-28 14:52               ` Matthijs Kooijman
  2013-08-29  9:48                 ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-08-28 14:52 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

Hi Nguy,

On Fri, Aug 16, 2013 at 04:52:05PM +0700, Nguyễn Thái Ngọc Duy wrote:
> upload-pack has a special rev walking code for shallow recipients. It
> works almost like the similar code in pack-objects except:
> 
> 1. in upload-pack, graft points could be added for deepening
> 
> 2. also when the repository is deepened, the shallow point will be
>    moved further away from the tip, but the old shallow point will be
>    marked as edge to produce more efficient packs. See 6523078 (make
>    shallow repository deepening more network efficient - 2009-09-03)
> 
> pass the file to pack-objects via --shallow-file. This will override
> $GIT_DIR/shallow and give pack-objects the exact repository shape that
> upload-pack has.
> 
> mark edge commits by revision command arguments. Even if old shallow
> points are passed as "--not" revisions as in this patch, they will not
> be picked up by mark_edges_uninteresting() because this function looks
> up to parents for edges, while in this case the edge is the children,
> in the opposite direction. This will be fixed in the next patch when
> all given uninteresting commits are marked as edges.

This says "the next patch" but it really refers to 6/6, not 5/6. Patch
6/6 has the same problem (it says "previous patch"). Perhaps patches 4
and 5 should just be swapped?

Gr.

Matthijs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-16  9:51         ` Duy Nguyen
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
@ 2013-08-28 15:36           ` Matthijs Kooijman
  2013-08-28 16:02             ` [PATCH] Add testcase for needless objects during a shallow fetch Matthijs Kooijman
  2013-10-21  7:51           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
  2 siblings, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-08-28 15:36 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List

Hi Duy,

> I thought a bit but my thoughts often get stuck if I don't write them
> down in form of code :-) so this is what I got so far. 4/6 is a good
> thing in my opinion, but I might overlook something. 6/6 is about this
> thread.

The series looks good to me, though I don't know enough about the code
to do detailed analysis.

In any case, I agree that 4/6 is a good change, it removes a bunch of
similar code for the shallow special case (which is now no longer a
completely separate special case).

The total series also seems to actually fix the problem I reported. I'll
resend the testcase from my original patch as well, which now passes
with your series applied.

Thanks for diving into this!

Gr.

Matthijs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] Add testcase for needless objects during a shallow fetch
  2013-08-28 15:36           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
@ 2013-08-28 16:02             ` Matthijs Kooijman
  2013-08-29  9:50               ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-08-28 16:02 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Matthijs Kooijman

This is a testcase that checks for a problem where, during a specific
shallow fetch where the client does not have any commits that are a
successor of the new shallow root (i.e., the fetch creates a new
detached piece of history), the server would simply send over _all_
objects, instead of taking into account the objects already present in
the client.

The actual problem was fixed by a recent patch series by Nguyễn Thái
Ngọc Duy already.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
---
 t/t5500-fetch-pack.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index fd2598e..a022d65 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -393,6 +393,17 @@ test_expect_success 'fetch in shallow repo unreachable shallow objects' '
 		git fsck --no-dangling
 	)
 '
+test_expect_success 'fetch creating new shallow root' '
+	(
+		git clone "file://$(pwd)/." shallow10 &&
+		git commit --allow-empty -m empty &&
+		cd shallow10 &&
+		git fetch --depth=1 --progress 2> actual &&
+		# This should fetch only the empty commit, no tree or
+		# blob objects
+		grep "remote: Total 1" actual
+	)
+'
 
 test_expect_success 'setup tests for the --stdin parameter' '
 	for head in C D E F
-- 
1.8.4.rc1
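
For readers who want to try the scenario outside the test suite, the
following is a minimal sketch of the same steps as a standalone script.
The directory names "server" and "client", the scratch directory, and
the committer identity are assumptions made for illustration; they are
not part of the patch. The script prints the "Total" line from the
fetch progress output instead of asserting on it. With the fix applied,
the test above expects that line to read "remote: Total 1".

```shell
#!/bin/sh
# Sketch only: reproduces the "new shallow root" fetch by hand.
# "server", "client" and the identity below are illustrative names.
set -e
work=$(mktemp -d)
cd "$work"

# Build a small upstream repository with one commit that carries a file.
git init -q server
cd server
echo content >file
git add file
git -c user.name=Tester -c user.email=tester@example.com \
	commit -q -m initial
cd ..

# Clone over file:// so that the normal pack negotiation is used.
git clone -q "file://$work/server" client

# Move the server one commit ahead. The empty commit reuses the
# existing tree, so the client already has every tree and blob.
git -C server -c user.name=Tester -c user.email=tester@example.com \
	commit -q --allow-empty -m empty

# Fetch with a depth equal to the number of commits the server is
# ahead; this makes the new commit a shallow root in the client.
cd client
git fetch --depth=1 --progress 2>actual

# Show how many objects the server reported sending.
grep "Total" actual || true

# Count the commits reachable from the fetched tip.
shallow_count=$(git rev-list --count FETCH_HEAD)
echo "commits reachable from FETCH_HEAD: $shallow_count"
```

The count printed at the end reflects the shallow boundary recorded in
.git/shallow, not the number of objects transferred; the "Total" line
is the figure the test case checks.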

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects
  2013-08-28 14:52               ` Matthijs Kooijman
@ 2013-08-29  9:48                 ` Duy Nguyen
  0 siblings, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2013-08-29  9:48 UTC (permalink / raw)
  To: Matthijs Kooijman, Nguyễn Thái Ngọc,
	Git Mailing List, Junio C Hamano

On Wed, Aug 28, 2013 at 9:52 PM, Matthijs Kooijman <matthijs@stdin.nl> wrote:
> Hi Nguy,
>
> On Fri, Aug 16, 2013 at 04:52:05PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> upload-pack has a special rev walking code for shallow recipients. It
>> works almost like the similar code in pack-objects except:
>>
>> 1. in upload-pack, graft points could be added for deepening
>>
>> 2. also when the repository is deepened, the shallow point will be
>>    moved further away from the tip, but the old shallow point will be
>>    marked as edge to produce more efficient packs. See 6523078 (make
>>    shallow repository deepening more network efficient - 2009-09-03)
>>
>> pass the file to pack-objects via --shallow-file. This will override
>> $GIT_DIR/shallow and give pack-objects the exact repository shape that
>> upload-pack has.
>>
>> mark edge commits by revision command arguments. Even if old shallow
>> points are passed as "--not" revisions as in this patch, they will not
>> be picked up by mark_edges_uninteresting() because this function looks
>> up to parents for edges, while in this case the edge is the children,
>> in the opposite direction. This will be fixed in the next patch when
>> all given uninteresting commits are marked as edges.
> This says "the next patch" but it really refers to 6/6, not 5/6. Patch
> 6/6 has the same problem (it says "previous patch"). Perhaps patches 4
> and 5 should just be swapped?

Yeah. I guess I reordered the patches before sending out and forgot
that the commit message needs a special order. Will do.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Add testcase for needless objects during a shallow fetch
  2013-08-28 16:02             ` [PATCH] Add testcase for needless objects during a shallow fetch Matthijs Kooijman
@ 2013-08-29  9:50               ` Duy Nguyen
  2013-08-31  1:25                 ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2013-08-29  9:50 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: Junio C Hamano, Git Mailing List

On Wed, Aug 28, 2013 at 11:02 PM, Matthijs Kooijman <matthijs@stdin.nl> wrote:
> This is a testcase that checks for a problem where, during a specific
> shallow fetch where the client does not have any commits that are a
> successor of the new shallow root (i.e., the fetch creates a new
> detached piece of history), the server would simply send over _all_
> objects, instead of taking into account the objects already present in
> the client.

Thanks. This reminds me I should add a test case in the 4/6 to
demonstrate the regression and let it verify again in 6/6 that the
temporary regression is gone. Will reroll the series with your patch
included.

>
> The actual problem was fixed by a recent patch series by Nguyễn Thái
> Ngọc Duy already.
>
> Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
> ---
>  t/t5500-fetch-pack.sh | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index fd2598e..a022d65 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -393,6 +393,17 @@ test_expect_success 'fetch in shallow repo unreachable shallow objects' '
>                 git fsck --no-dangling
>         )
>  '
> +test_expect_success 'fetch creating new shallow root' '
> +       (
> +               git clone "file://$(pwd)/." shallow10 &&
> +               git commit --allow-empty -m empty &&
> +               cd shallow10 &&
> +               git fetch --depth=1 --progress 2> actual &&
> +               # This should fetch only the empty commit, no tree or
> +               # blob objects
> +               grep "remote: Total 1" actual
> +       )
> +'
>
>  test_expect_success 'setup tests for the --stdin parameter' '
>         for head in C D E F
> --
> 1.8.4.rc1
>



-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Add testcase for needless objects during a shallow fetch
  2013-08-29  9:50               ` Duy Nguyen
@ 2013-08-31  1:25                 ` Duy Nguyen
  0 siblings, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2013-08-31  1:25 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: Junio C Hamano, Git Mailing List

On Thu, Aug 29, 2013 at 4:50 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Wed, Aug 28, 2013 at 11:02 PM, Matthijs Kooijman <matthijs@stdin.nl> wrote:
>> This is a testcase that checks for a problem where, during a specific
>> shallow fetch where the client does not have any commits that are a
>> successor of the new shallow root (i.e., the fetch creates a new
>> detached piece of history), the server would simply send over _all_
>> objects, instead of taking into account the objects already present in
>> the client.
>
> Thanks. This reminds me I should add a test case in the 4/6 to
> demonstrate the regression and let it verify again in 6/6 that the
> temporary regression is gone. Will reroll the series with your patch
> included.

No. It's too hard. The difference is what base a delta object uses and
checking that might not be entirely reliable because the algorithm in
pack-objects might change some day.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-08-16  9:51         ` Duy Nguyen
  2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
  2013-08-28 15:36           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
@ 2013-10-21  7:51           ` Matthijs Kooijman
  2013-10-26 10:49             ` Duy Nguyen
  2 siblings, 1 reply; 30+ messages in thread
From: Matthijs Kooijman @ 2013-10-21  7:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Kai Hendry - Webconverger

Hi Duy,

I saw your patch series got accepted in git master a while back, great!
Since I hope to be using the fixed behaviour soon, what was the plan for
including it? Am I correct in thinking that git master will become 1.8.5
in a while? Would this series perhaps be considered for backporting to
1.8.4.x?

Gr.

Matthijs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
  2013-10-21  7:51           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
@ 2013-10-26 10:49             ` Duy Nguyen
  0 siblings, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2013-10-26 10:49 UTC (permalink / raw)
  To: Matthijs Kooijman, Duy Nguyen, Junio C Hamano, Git Mailing List,
	Kai Hendry - Webconverger

On Mon, Oct 21, 2013 at 2:51 PM, Matthijs Kooijman <matthijs@stdin.nl> wrote:
> Hi Duy,
>
> I saw your patch series got accepted in git master a while back, great!
> Since I hope to be using the fixed behaviour soon, what was the plan for
> including it? Am I correct in thinking that git master will become 1.8.5
> in a while? Would this series perhaps be considered for backporting to
> 1.8.4.x?

I was waiting for Junio to answer this as I rarely run released
versions and do not care much about releases. I think normally master
will be cut for the next release (1.8.5?), while maint branches get
backported bug fixes. I consider this an improvement rather than a bug
fix, so my guess is it will not be backported to 1.8.4.x.

>
> Gr.
>
> Matthijs



-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-10-26 10:49 UTC | newest]

Thread overview: 30+ messages
2013-07-11 22:01 [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
2013-07-11 22:53 ` Junio C Hamano
2013-07-12  7:11   ` Matthijs Kooijman
2013-08-07 10:27     ` Matthijs Kooijman
2013-08-08  1:01       ` Junio C Hamano
2013-08-08  1:09         ` Duy Nguyen
2013-08-08  6:39           ` Junio C Hamano
2013-08-08  4:50 ` Duy Nguyen
2013-08-08  6:51   ` Junio C Hamano
2013-08-08  7:21     ` Duy Nguyen
2013-08-08 17:10       ` Junio C Hamano
2013-08-09 13:13         ` Duy Nguyen
2013-08-12  8:02       ` Matthijs Kooijman
2013-08-16  9:51         ` Duy Nguyen
2013-08-16  9:52           ` [PATCH 1/6] Move setup_alternate_shallow and write_shallow_commits to shallow.c Nguyễn Thái Ngọc Duy
2013-08-16  9:52             ` [PATCH 2/6] shallow: only add shallow graft points to new shallow file Nguyễn Thái Ngọc Duy
2013-08-16 23:50               ` Eric Sunshine
2013-08-16  9:52             ` [PATCH 3/6] shallow: add setup_temporary_shallow() Nguyễn Thái Ngọc Duy
2013-08-16 23:52               ` Eric Sunshine
2013-08-16  9:52             ` [PATCH 4/6] upload-pack: delegate rev walking in shallow fetch to pack-objects Nguyễn Thái Ngọc Duy
2013-08-28 14:52               ` Matthijs Kooijman
2013-08-29  9:48                 ` Duy Nguyen
2013-08-16  9:52             ` [PATCH 5/6] list-objects: reduce one argument in mark_edges_uninteresting Nguyễn Thái Ngọc Duy
2013-08-16  9:52             ` [PATCH 6/6] list-objects: mark more commits as edges " Nguyễn Thái Ngọc Duy
2013-08-28 15:36           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
2013-08-28 16:02             ` [PATCH] Add testcase for needless objects during a shallow fetch Matthijs Kooijman
2013-08-29  9:50               ` Duy Nguyen
2013-08-31  1:25                 ` Duy Nguyen
2013-10-21  7:51           ` [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects Matthijs Kooijman
2013-10-26 10:49             ` Duy Nguyen