All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fetch-pack: always allow fetching of literal SHA1s
@ 2017-05-09 18:20 Jonathan Tan
  2017-05-09 22:16 ` Jeff King
                   ` (7 more replies)
  0 siblings, 8 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-05-09 18:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c          | 30 ++++++++++++++++--------------
 t/t5500-fetch-pack.sh |  6 ++++++
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..180405dff 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -592,6 +592,14 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static int is_literal_sha1(const struct ref *ref)
+{
+	struct object_id oid;
+	return !get_oid_hex(ref->name, &oid) &&
+	       !ref->name[40] &&
+	       !oidcmp(&oid, &ref->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
@@ -601,7 +609,6 @@ static void filter_refs(struct fetch_pack_args *args,
 	struct ref *ref, *next;
 	int i;
 
-	i = 0;
 	for (ref = *refs; ref; ref = next) {
 		int keep = 0;
 		next = ref->next;
@@ -610,15 +617,14 @@ static void filter_refs(struct fetch_pack_args *args,
 		    check_refname_format(ref->name, 0))
 			; /* trash */
 		else {
-			while (i < nr_sought) {
-				int cmp = strcmp(ref->name, sought[i]->name);
-				if (cmp < 0)
-					break; /* definitely do not have it */
-				else if (cmp == 0) {
-					keep = 1; /* definitely have it */
-					sought[i]->match_status = REF_MATCHED;
+			for (i = 0; i < nr_sought; i++) {
+				struct ref *s = sought[i];
+				if (!strcmp(ref->name, s->name) ||
+				    (is_literal_sha1(s) &&
+				     !oidcmp(&ref->old_oid, &s->old_oid))) {
+					keep = 1;
+					s->match_status = REF_MATCHED;
 				}
-				i++;
 			}
 		}
 
@@ -637,14 +643,10 @@ static void filter_refs(struct fetch_pack_args *args,
 
 	/* Append unmatched requests to the list */
 	for (i = 0; i < nr_sought; i++) {
-		unsigned char sha1[20];
-
 		ref = sought[i];
 		if (ref->match_status != REF_NOT_MATCHED)
 			continue;
-		if (get_sha1_hex(ref->name, sha1) ||
-		    ref->name[40] != '\0' ||
-		    hashcmp(sha1, ref->old_oid.hash))
+		if (!is_literal_sha1(ref))
 			continue;
 
 		if ((allow_unadvertised_object_request &
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..d87983bc2 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,12 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	git init server &&
+	test_commit -C server 4 &&
+	git fetch-pack server $(git -C server rev-parse refs/heads/master)
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
@ 2017-05-09 22:16 ` Jeff King
  2017-05-10  4:22   ` Shawn Pearce
  2017-05-10 16:44 ` [PATCH v2] " Jonathan Tan
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2017-05-09 22:16 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, May 09, 2017 at 11:20:42AM -0700, Jonathan Tan wrote:

> fetch-pack, when fetching a literal SHA-1 from a server that is not
> configured with uploadpack.allowtipsha1inwant (or similar), always
> returns an error message of the form "Server does not allow request for
> unadvertised object %s". However, it is sometimes the case that such
> object is advertised.
> 
> Teach fetch-pack to also check the SHA-1s of the refs in the received
> ref advertisement if a literal SHA-1 was given by the user.

Hmm. That makes sense generally, as the request should succeed. But it
seems like we're creating a client that will sometimes succeed and
sometimes fail, and the reasoning will be somewhat opaque to the user.
I have a feeling I'm missing some context on when you'd expect this to
kick in.

> +static int is_literal_sha1(const struct ref *ref)
> +{
> +	struct object_id oid;
> +	return !get_oid_hex(ref->name, &oid) &&
> +	       !ref->name[40] &&
> +	       !oidcmp(&oid, &ref->old_oid);
> +}

I think the preferred method these days is to avoid the bare "40":

  struct object_oid oid;
  const char *end;
  return !parse_oid_hex(ref->name, &oid, &end) &&
         !*end &&
	 !oidcmp(&oid, &ref->old_oid);

I was confused at first why we need this oidcmp() and the one below. But
this one is checking "does the name parse to itself", and the other is
checking "does this parse to our sought ref?". So both checks are
needed.

> +			for (i = 0; i < nr_sought; i++) {
> +				struct ref *s = sought[i];
> +				if (!strcmp(ref->name, s->name) ||
> +				    (is_literal_sha1(s) &&
> +				     !oidcmp(&ref->old_oid, &s->old_oid))) {
> +					keep = 1;
> +					s->match_status = REF_MATCHED;
>  				}
> -				i++;
>  			}

This will reparse ref->name as an oid via is_literal_sha1() for each
pass through the loop. Should it be hoisted out? Maybe that is just
premature optimization, though.

Other than those minor nits, the code itself looks fine to me.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 22:16 ` Jeff King
@ 2017-05-10  4:22   ` Shawn Pearce
  2017-05-10  4:33     ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Shawn Pearce @ 2017-05-10  4:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git

On Tue, May 9, 2017 at 3:16 PM, Jeff King <peff@peff.net> wrote:
> On Tue, May 09, 2017 at 11:20:42AM -0700, Jonathan Tan wrote:
>
>> fetch-pack, when fetching a literal SHA-1 from a server that is not
>> configured with uploadpack.allowtipsha1inwant (or similar), always
>> returns an error message of the form "Server does not allow request for
>> unadvertised object %s". However, it is sometimes the case that such
>> object is advertised.
>>
>> Teach fetch-pack to also check the SHA-1s of the refs in the received
>> ref advertisement if a literal SHA-1 was given by the user.
>
> Hmm. That makes sense generally, as the request should succeed. But it
> seems like we're creating a client that will sometimes succeed and
> sometimes fail, and the reasoning will be somewhat opaque to the user.
> I have a feeling I'm missing some context on when you'd expect this to
> kick in.

Specifically, someone I know was looking at building an application
that is passed only a SHA-1 for the tip of a ref on a popular hosting
site[1]. They wanted to run `git fetch URL SHA1`, but this failed
because the site doesn't have upload.allowtipsha1inwant enabled.
However the SHA1 was clearly in the output of git ls-remote.

So a workaround is to do this in shell, which is just weird:

  r=$(git ls-remote $url | grep ^$sha1);
  if [ -n "$r" ]; then
    exec git fetch $url $r:refs/tmp/foo
  else
    echo >&2 "cannot find $sha1"
    exit 1
  fi

For various reasons they expected this to work, because it works
against other sites that do have upload.allowtipsha1inwant enabled.
And I personally just expected it to work because the fetch client
accepts SHA-1s, and the wire protocol uses "want SHA1" not "want ref",
and the SHA-1 passed on the command line was currently in the
advertisement when the connection opened, so its certainly something
the server is currently willing to serve.

The application in question is using a SHA-1 rather than a ref name,
because thats what it was given by something else. And the other thing
basically wants this to fail-fast if it can't get that exact SHA-1. So
to pass a ref name to git fetch the supplier would have to actually
pass a tuple of ref+sha1, and then the fetcher has to check that the
ref it just obtained provides the sha1 it expected.

So this all turned into a bug report by me to Jonathan Tan, because
git fetch violated my assumption of what it would do if I handed it a
SHA-1 and the SHA-1 was currently the tip of a remote ref.


[1] github.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10  4:22   ` Shawn Pearce
@ 2017-05-10  4:33     ` Jeff King
  2017-05-10  4:46       ` Mike Hommey
                         ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Jeff King @ 2017-05-10  4:33 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Jonathan Tan, git

On Tue, May 09, 2017 at 09:22:11PM -0700, Shawn Pearce wrote:

> > Hmm. That makes sense generally, as the request should succeed. But it
> > seems like we're creating a client that will sometimes succeed and
> > sometimes fail, and the reasoning will be somewhat opaque to the user.
> > I have a feeling I'm missing some context on when you'd expect this to
> > kick in.
> 
> Specifically, someone I know was looking at building an application
> that is passed only a SHA-1 for the tip of a ref on a popular hosting
> site[1]. They wanted to run `git fetch URL SHA1`, but this failed
> because the site doesn't have upload.allowtipsha1inwant enabled.
> However the SHA1 was clearly in the output of git ls-remote.

OK. So this is basically a case where we expect that the user knows what
they're doing.

> For various reasons they expected this to work, because it works
> against other sites that do have upload.allowtipsha1inwant enabled.
> And I personally just expected it to work because the fetch client
> accepts SHA-1s, and the wire protocol uses "want SHA1" not "want ref",
> and the SHA-1 passed on the command line was currently in the
> advertisement when the connection opened, so its certainly something
> the server is currently willing to serve.

Right, makes sense.  I wondered if GitHub should be turning on
allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
some internal refs[1], and they're things that users wouldn't want to
fetch. The problem for your case really is just on the client side, and
this patch fixes it.

Some of this context might be useful in the commit message.

-Peff

[1] The reachability checks from upload-pack don't actually do much on
    GitHub, because you can generally access the objects via the API or
    the web site anyway. So I'm not really opposed to turning on
    allowTipSHA1InWant if it would be useful for users, but after
    Jonathan's patch I don't see how it would be.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10  4:33     ` Jeff King
@ 2017-05-10  4:46       ` Mike Hommey
  2017-05-10 17:50         ` Ævar Arnfjörð Bjarmason
  2017-05-10  4:57       ` Shawn Pearce
  2017-05-10 17:00       ` Jonathan Nieder
  2 siblings, 1 reply; 47+ messages in thread
From: Mike Hommey @ 2017-05-10  4:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Jonathan Tan, git

On Wed, May 10, 2017 at 12:33:44AM -0400, Jeff King wrote:
> On Tue, May 09, 2017 at 09:22:11PM -0700, Shawn Pearce wrote:
> 
> > > Hmm. That makes sense generally, as the request should succeed. But it
> > > seems like we're creating a client that will sometimes succeed and
> > > sometimes fail, and the reasoning will be somewhat opaque to the user.
> > > I have a feeling I'm missing some context on when you'd expect this to
> > > kick in.
> > 
> > Specifically, someone I know was looking at building an application
> > that is passed only a SHA-1 for the tip of a ref on a popular hosting
> > site[1]. They wanted to run `git fetch URL SHA1`, but this failed
> > because the site doesn't have upload.allowtipsha1inwant enabled.
> > However the SHA1 was clearly in the output of git ls-remote.
> 
> OK. So this is basically a case where we expect that the user knows what
> they're doing.
> 
> > For various reasons they expected this to work, because it works
> > against other sites that do have upload.allowtipsha1inwant enabled.
> > And I personally just expected it to work because the fetch client
> > accepts SHA-1s, and the wire protocol uses "want SHA1" not "want ref",
> > and the SHA-1 passed on the command line was currently in the
> > advertisement when the connection opened, so its certainly something
> > the server is currently willing to serve.
> 
> Right, makes sense.  I wondered if GitHub should be turning on
> allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
> some internal refs[1], and they're things that users wouldn't want to
> fetch. The problem for your case really is just on the client side, and
> this patch fixes it.

More broadly, I think it is desirable that any commit that can be
reached from public refs can be fetched by an explicit sha1 without
allowTipSHA1InWant.

Mike

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10  4:33     ` Jeff King
  2017-05-10  4:46       ` Mike Hommey
@ 2017-05-10  4:57       ` Shawn Pearce
  2017-05-10 17:00       ` Jonathan Nieder
  2 siblings, 0 replies; 47+ messages in thread
From: Shawn Pearce @ 2017-05-10  4:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git

On Tue, May 9, 2017 at 9:33 PM, Jeff King <peff@peff.net> wrote:
> On Tue, May 09, 2017 at 09:22:11PM -0700, Shawn Pearce wrote:
>
>> > Hmm. That makes sense generally, as the request should succeed. But it
>> > seems like we're creating a client that will sometimes succeed and
>> > sometimes fail, and the reasoning will be somewhat opaque to the user.
>> > I have a feeling I'm missing some context on when you'd expect this to
>> > kick in.
>>
>> Specifically, someone I know was looking at building an application
>> that is passed only a SHA-1 for the tip of a ref on a popular hosting
>> site[1]. They wanted to run `git fetch URL SHA1`, but this failed
>> because the site doesn't have upload.allowtipsha1inwant enabled.
>> However the SHA1 was clearly in the output of git ls-remote.
>
> OK. So this is basically a case where we expect that the user knows what
> they're doing.
>
>> For various reasons they expected this to work, because it works
>> against other sites that do have upload.allowtipsha1inwant enabled.
>> And I personally just expected it to work because the fetch client
>> accepts SHA-1s, and the wire protocol uses "want SHA1" not "want ref",
>> and the SHA-1 passed on the command line was currently in the
>> advertisement when the connection opened, so its certainly something
>> the server is currently willing to serve.
>
> Right, makes sense.  I wondered if GitHub should be turning on
> allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
> some internal refs[1], and they're things that users wouldn't want to
> fetch. The problem for your case really is just on the client side, and
> this patch fixes it.
>
> Some of this context might be useful in the commit message.
>
> -Peff
>
> [1] The reachability checks from upload-pack don't actually do much on
>     GitHub, because you can generally access the objects via the API or
>     the web site anyway. So I'm not really opposed to turning on
>     allowTipSHA1InWant if it would be useful for users, but after
>     Jonathan's patch I don't see how it would be.

Right, I agree. That is why we proposed this patch to the client,
rather than a support request with GitHub to change server
configuration. :)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
  2017-05-09 22:16 ` Jeff King
@ 2017-05-10 16:44 ` Jonathan Tan
  2017-05-10 18:01   ` Jonathan Nieder
  2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2017-05-10 16:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, spearce, peff

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

Thanks, peff, for your suggestions. Changes from v1:
 - updated commit message to explain a bit of the context
 - use parse_oid_hex instead of get_oid_hex to avoid the constant 40

peff also suggested a possible optimization in computing is_literal_sha1
outside the loop. After some thought, I don't think that it takes that
much time in the general case where the ref name starts with "refs/",
because parse_oid_hex will see the "r" (which is not in "[0-9a-f]") and
immediately return. So I left it as-is.

 fetch-pack.c          | 31 +++++++++++++++++--------------
 t/t5500-fetch-pack.sh |  6 ++++++
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..d4b57b9ba 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -592,6 +592,15 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static int is_literal_sha1(const struct ref *ref)
+{
+	struct object_id oid;
+	const char *end;
+	return !parse_oid_hex(ref->name, &oid, &end) &&
+	       !*end &&
+	       !oidcmp(&oid, &ref->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
@@ -601,7 +610,6 @@ static void filter_refs(struct fetch_pack_args *args,
 	struct ref *ref, *next;
 	int i;
 
-	i = 0;
 	for (ref = *refs; ref; ref = next) {
 		int keep = 0;
 		next = ref->next;
@@ -610,15 +618,14 @@ static void filter_refs(struct fetch_pack_args *args,
 		    check_refname_format(ref->name, 0))
 			; /* trash */
 		else {
-			while (i < nr_sought) {
-				int cmp = strcmp(ref->name, sought[i]->name);
-				if (cmp < 0)
-					break; /* definitely do not have it */
-				else if (cmp == 0) {
-					keep = 1; /* definitely have it */
-					sought[i]->match_status = REF_MATCHED;
+			for (i = 0; i < nr_sought; i++) {
+				struct ref *s = sought[i];
+				if (!strcmp(ref->name, s->name) ||
+				    (is_literal_sha1(s) &&
+				     !oidcmp(&ref->old_oid, &s->old_oid))) {
+					keep = 1;
+					s->match_status = REF_MATCHED;
 				}
-				i++;
 			}
 		}
 
@@ -637,14 +644,10 @@ static void filter_refs(struct fetch_pack_args *args,
 
 	/* Append unmatched requests to the list */
 	for (i = 0; i < nr_sought; i++) {
-		unsigned char sha1[20];
-
 		ref = sought[i];
 		if (ref->match_status != REF_NOT_MATCHED)
 			continue;
-		if (get_sha1_hex(ref->name, sha1) ||
-		    ref->name[40] != '\0' ||
-		    hashcmp(sha1, ref->old_oid.hash))
+		if (!is_literal_sha1(ref))
 			continue;
 
 		if ((allow_unadvertised_object_request &
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..d87983bc2 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,12 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	git init server &&
+	test_commit -C server 4 &&
+	git fetch-pack server $(git -C server rev-parse refs/heads/master)
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10  4:33     ` Jeff King
  2017-05-10  4:46       ` Mike Hommey
  2017-05-10  4:57       ` Shawn Pearce
@ 2017-05-10 17:00       ` Jonathan Nieder
  2017-05-10 18:55         ` Sebastian Schuberth
  2017-05-11  9:59         ` Jeff King
  2 siblings, 2 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-10 17:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Jonathan Tan, git

Hi,

Jeff King wrote:

> Right, makes sense.  I wondered if GitHub should be turning on
> allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
> some internal refs[1], and they're things that users wouldn't want to
> fetch. The problem for your case really is just on the client side, and
> this patch fixes it.
[...]
> [1] The reachability checks from upload-pack don't actually do much on
>     GitHub, because you can generally access the objects via the API or
>     the web site anyway. So I'm not really opposed to turning on
>     allowTipSHA1InWant if it would be useful for users, but after
>     Jonathan's patch I don't see how it would be.

Given that, what would make me really happy is if github enables
uploadpack.allowAnySHA1InWant.  That would be useful for me, at least.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10  4:46       ` Mike Hommey
@ 2017-05-10 17:50         ` Ævar Arnfjörð Bjarmason
  2017-05-10 18:20           ` Jonathan Nieder
  0 siblings, 1 reply; 47+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-10 17:50 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Jeff King, Shawn Pearce, Jonathan Tan, git

On Wed, May 10, 2017 at 6:46 AM, Mike Hommey <mh@glandium.org> wrote:
> On Wed, May 10, 2017 at 12:33:44AM -0400, Jeff King wrote:
>> On Tue, May 09, 2017 at 09:22:11PM -0700, Shawn Pearce wrote:
>>
>> > > Hmm. That makes sense generally, as the request should succeed. But it
>> > > seems like we're creating a client that will sometimes succeed and
>> > > sometimes fail, and the reasoning will be somewhat opaque to the user.
>> > > I have a feeling I'm missing some context on when you'd expect this to
>> > > kick in.
>> >
>> > Specifically, someone I know was looking at building an application
>> > that is passed only a SHA-1 for the tip of a ref on a popular hosting
>> > site[1]. They wanted to run `git fetch URL SHA1`, but this failed
>> > because the site doesn't have upload.allowtipsha1inwant enabled.
>> > However the SHA1 was clearly in the output of git ls-remote.
>>
>> OK. So this is basically a case where we expect that the user knows what
>> they're doing.
>>
>> > For various reasons they expected this to work, because it works
>> > against other sites that do have upload.allowtipsha1inwant enabled.
>> > And I personally just expected it to work because the fetch client
>> > accepts SHA-1s, and the wire protocol uses "want SHA1" not "want ref",
>> > and the SHA-1 passed on the command line was currently in the
>> > advertisement when the connection opened, so its certainly something
>> > the server is currently willing to serve.
>>
>> Right, makes sense.  I wondered if GitHub should be turning on
>> allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
>> some internal refs[1], and they're things that users wouldn't want to
>> fetch. The problem for your case really is just on the client side, and
>> this patch fixes it.
>
> More broadly, I think it is desirable that any commit that can be
> reached from public refs can be fetched by an explicit sha1 without
> allowTipSHA1InWant.

Just a side question, what are the people who use this feature using
it for? The only thing I can think of myself is some out of band ref
advertisement because you've got squillions of refs as a hack around
git's limitations in that area.

Are there other use-cases for this? All the commits[1] that touched
this feature just explain what, not why.

1. git log --reverse -p -i -Gallowtipsha1inwant

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 16:44 ` [PATCH v2] " Jonathan Tan
@ 2017-05-10 18:01   ` Jonathan Nieder
  0 siblings, 0 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-10 18:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, spearce, peff

Hi,

Jonathan Tan wrote:

> fetch-pack, when fetching a literal SHA-1 from a server that is not
> configured with uploadpack.allowtipsha1inwant (or similar), always
> returns an error message of the form "Server does not allow request for
> unadvertised object %s". However, it is sometimes the case that such
> object is advertised.

Thanks for fixing it.

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -592,6 +592,15 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
>  	}
>  }
>  
> +static int is_literal_sha1(const struct ref *ref)
> +{
> +	struct object_id oid;
> +	const char *end;
> +	return !parse_oid_hex(ref->name, &oid, &end) &&
> +	       !*end &&
> +	       !oidcmp(&oid, &ref->old_oid);

When would a ref have its name be a oid and its value be a different
oid?

Ah, this check comes from

	commit b7916422c741b50925d454819b1977449fccc111
	Author: Jeff King <peff@peff.net>
	Date:   Thu Mar 19 16:34:51 2015 -0400

	    filter_ref: avoid overwriting ref->old_sha1 with garbage

which explains that the answer is "never" (leaving me even more
confused).  Anyway, that's orthogonal to this patch.

> +}
> +
>  static void filter_refs(struct fetch_pack_args *args,
>  			struct ref **refs,
>  			struct ref **sought, int nr_sought)
> @@ -601,7 +610,6 @@ static void filter_refs(struct fetch_pack_args *args,
>  	struct ref *ref, *next;
>  	int i;
>  
> -	i = 0;
>  	for (ref = *refs; ref; ref = next) {
>  		int keep = 0;
>  		next = ref->next;
> @@ -610,15 +618,14 @@ static void filter_refs(struct fetch_pack_args *args,
>  		    check_refname_format(ref->name, 0))
>  			; /* trash */
>  		else {
> -			while (i < nr_sought) {
> -				int cmp = strcmp(ref->name, sought[i]->name);
> -				if (cmp < 0)
> -					break; /* definitely do not have it */
> -				else if (cmp == 0) {
> -					keep = 1; /* definitely have it */
> -					sought[i]->match_status = REF_MATCHED;
> +			for (i = 0; i < nr_sought; i++) {

This resets i each time this code path is encountered.  Doesn't that
make this more expensive e.g. in the case where there are no sha1s
among the refs to be searched for?

Should this be conditional on whether there are any sha1-shaped refs in
'sought' to avoid that performance regression?

It's tempting to split 'sought' into two lists --- sha1-shaped refs
that have to be processed in full every time and the others that can
be processed in a single pass because they are in sorted order.

[...]
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -547,6 +547,12 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
>  	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
>  '
>  
> +test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
> +	git init server &&
> +	test_commit -C server 4 &&
> +	git fetch-pack server $(git -C server rev-parse refs/heads/master)
> +'

Is there test for the error case, too? (I.e. the server does not advertise
fetch-by-sha1 and the sha1 is not among the advertised refs)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 17:50         ` Ævar Arnfjörð Bjarmason
@ 2017-05-10 18:20           ` Jonathan Nieder
  2017-05-10 18:48             ` Martin Fick
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-10 18:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Mike Hommey, Jeff King, Shawn Pearce, Jonathan Tan, git

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Just a side question, what are the people who use this feature using
> it for? The only thing I can think of myself is some out of band ref
> advertisement because you've got squillions of refs as a hack around
> git's limitations in that area.

That's one use case.

Another is when you really care about the exact sha1 (for example
because you are an automated build system and this is the specific
sha1 you have already decided you want to build).

> Are there other use-cases for this? All the commits[1] that touched
> this feature just explain what, not why.

Similar to the build system case I described above is when a human has
a sha1 (from a mailing list, or source browser, or whatever) and wants
to fetch just that revision, with --depth=1.  You could use "git
archive --remote", but (1) github doesn't support that and (2) that
doesn't give you all the usual git-ish goodness.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 18:20           ` Jonathan Nieder
@ 2017-05-10 18:48             ` Martin Fick
  2017-05-10 18:54               ` Jonathan Nieder
  0 siblings, 1 reply; 47+ messages in thread
From: Martin Fick @ 2017-05-10 18:48 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, Mike Hommey, Jeff King,
	Shawn Pearce, Jonathan Tan, git

On Wednesday, May 10, 2017 11:20:49 AM Jonathan Nieder 
wrote:
> Hi,
> 
> Ævar Arnfjörð Bjarmason wrote:
> > Just a side question, what are the people who use this
> > feature using it for? The only thing I can think of
> > myself is some out of band ref advertisement because
> > you've got squillions of refs as a hack around git's
> > limitations in that area.
> 
> That's one use case.
> 
> Another is when you really care about the exact sha1 (for
> example because you are an automated build system and
> this is the specific sha1 you have already decided you
> want to build).
> > Are there other use-cases for this? All the commits[1]
> > that touched this feature just explain what, not why.
> 
> Similar to the build system case I described above is when
> a human has a sha1 (from a mailing list, or source
> browser, or whatever) and wants to fetch just that
> revision, with --depth=1.  You could use "git archive
> --remote", but (1) github doesn't support that and (2)
> that doesn't give you all the usual git-ish goodness.


Perhaps another use case is submodules and repo(android 
tool) subprojects since they can be "pinned" to sha1s,

-Martin
-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 18:48             ` Martin Fick
@ 2017-05-10 18:54               ` Jonathan Nieder
  0 siblings, 0 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-10 18:54 UTC (permalink / raw)
  To: Martin Fick
  Cc: Ævar Arnfjörð Bjarmason, Mike Hommey, Jeff King,
	Shawn Pearce, Jonathan Tan, git

On Wed, May 10, 2017 at 12:48:37PM -0600, Martin Fick wrote:
>> Ævar Arnfjörð Bjarmason wrote:

>>> Just a side question, what are the people who use this
>>> feature using it for? The only thing I can think of
>>> myself is some out of band ref advertisement because
>>> you've got squillions of refs as a hack around git's
>>> limitations in that area.
[...]
> Perhaps another use case is submodules and repo(android
> tool) subprojects since they can be "pinned" to sha1s,

Yes, thanks for mentioning that.  We've talked a little about making
'git fetch --recurse-submodules' use this feature, either always, or
if after the usual fetch the desired commit is not available in the
submodule.

Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 17:00       ` Jonathan Nieder
@ 2017-05-10 18:55         ` Sebastian Schuberth
  2017-05-11  9:59         ` Jeff King
  1 sibling, 0 replies; 47+ messages in thread
From: Sebastian Schuberth @ 2017-05-10 18:55 UTC (permalink / raw)
  To: git; +Cc: Shawn Pearce, Jonathan Tan, git

On 2017-05-10 19:00, Jonathan Nieder wrote:

>> Right, makes sense.  I wondered if GitHub should be turning on
>> allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
>> some internal refs[1], and they're things that users wouldn't want to
>> fetch. The problem for your case really is just on the client side, and
>> this patch fixes it.
> [...]
>> [1] The reachability checks from upload-pack don't actually do much on
>>      GitHub, because you can generally access the objects via the API or
>>      the web site anyway. So I'm not really opposed to turning on
>>      allowTipSHA1InWant if it would be useful for users, but after
>>      Jonathan's patch I don't see how it would be.
> 
> Given that, what would make me really happy is if github enables
> uploadpack.allowAnySHA1InWant.  That would be useful for me, at least.

Me too, BTW. If you're only interested in building / analyzing a 
specific SHA1 it's really a waste of network resources to fetch all of 
the history.

-- 
Sebastian Schuberth


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
  2017-05-09 22:16 ` Jeff King
  2017-05-10 16:44 ` [PATCH v2] " Jonathan Tan
@ 2017-05-10 22:11 ` Jonathan Tan
  2017-05-10 23:22   ` Jonathan Nieder
                     ` (2 more replies)
  2017-05-11 21:14 ` [PATCH v4] " Jonathan Tan
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-05-10 22:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, peff, spearce

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

Reviewers, note that the patch has been substantially rewritten.

After looking at ways to solve jrnieder's performance concerns, if we're
going to need to manage one more item of state within the function, I
might as well use my earlier idea of storing unmatched refs in its own
list instead of immediately freeing them. This version of the patch
should have much better performance characteristics.

I have also added the test requested by jrnieder (to show that
fetch-pack cannot fetch a SHA-1 if it is not advertised as a ref).

 fetch-pack.c          | 29 ++++++++++++++++++++++++++++-
 t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..5cace7458 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -598,6 +598,7 @@ static void filter_refs(struct fetch_pack_args *args,
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
 	int i;
 
@@ -631,13 +632,15 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
 	/* Append unmatched requests to the list */
 	for (i = 0; i < nr_sought; i++) {
 		unsigned char sha1[20];
+		int can_append = 0;
 
 		ref = sought[i];
 		if (ref->match_status != REF_NOT_MATCHED)
@@ -649,6 +652,25 @@ static void filter_refs(struct fetch_pack_args *args,
 
 		if ((allow_unadvertised_object_request &
 		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+			can_append = 1;
+		} else {
+			struct ref *u;
+			/* Check all refs, including those already matched */
+			for (u = unmatched; u; u = u->next) {
+				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
+					can_append = 1;
+					goto can_append;
+				}
+			}
+			for (u = newlist; u; u = u->next) {
+				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
+					can_append = 1;
+					break;
+				}
+			}
+		}
+can_append:
+		if (can_append) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -657,6 +679,11 @@ static void filter_refs(struct fetch_pack_args *args,
 		}
 	}
 	*refs = newlist;
+
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
 }
 
 static void mark_alternate_complete(struct object *obj)
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..80a1a3239 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master)
+'
+
+test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+	test_commit -C server 2 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/tags/1) refs/tags/1
+'
+
+test_expect_success 'fetch-pack cannot fetch a raw sha1 that is not advertised as a ref' '
+	rm -rf server &&
+
+	git init server &&
+	test_commit -C server 5 &&
+	git -C server tag -d 5 &&
+	test_commit -C server 6 &&
+
+	git init client &&
+	test_must_fail git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master^) 2>err &&
+	test_i18ngrep "Server does not allow request for unadvertised object" err
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
@ 2017-05-10 23:22   ` Jonathan Nieder
  2017-05-11  9:46   ` Jeff King
  2017-05-11 10:05   ` Jeff King
  2 siblings, 0 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-10 23:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, spearce

Hi,

Jonathan Tan wrote:

> fetch-pack, when fetching a literal SHA-1 from a server that is not
> configured with uploadpack.allowtipsha1inwant (or similar), always
> returns an error message of the form "Server does not allow request for
> unadvertised object %s". However, it is sometimes the case that such
> object is advertised. This situation would occur, for example, if a user
> or a script was provided a SHA-1 instead of a branch or tag name for
> fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
> that SHA-1.

Yay, thanks again.

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -598,6 +598,7 @@ static void filter_refs(struct fetch_pack_args *args,
>  {
>  	struct ref *newlist = NULL;
>  	struct ref **newtail = &newlist;
> +	struct ref *unmatched = NULL;
>  	struct ref *ref, *next;
>  	int i;
>  
> @@ -631,13 +632,15 @@ static void filter_refs(struct fetch_pack_args *args,
>  			ref->next = NULL;
>  			newtail = &ref->next;
>  		} else {
> -			free(ref);
> +			ref->next = unmatched;
> +			unmatched = ref;

Interesting.  Makes sense.

[...]
>  	/* Append unmatched requests to the list */
>  	for (i = 0; i < nr_sought; i++) {
>  		unsigned char sha1[20];
> +		int can_append = 0;

Can this be simplified by factoring out a function?  I.e. something
like

	if ((... ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)
	    || find_oid_among_refs(unmatched, oid)
	    || find_oid_among_refs(newlist, oid)) {
		ref->match_status = REF_MATCHED;
		...
	} else {
		ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
	}

[...]
> @@ -657,6 +679,11 @@ static void filter_refs(struct fetch_pack_args *args,
>  		}
>  	}
>  	*refs = newlist;
> +
> +	for (ref = unmatched; ref; ref = next) {
> +		next = ref->next;
> +		free(ref);
> +	}
>  }

optional nit: keeping the "*refs =" line as the last line of the
function would make it easier to see the contract at a glance.  (That
said, a doc comment at the top would make it even clearer.)

> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '

Yay, thanks much for these.

[...]
> +test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '

Ha, you read my mind. :)

Except for the search-ref-list-for-oid function needing to be factored out,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
  2017-05-10 23:22   ` Jonathan Nieder
@ 2017-05-11  9:46   ` Jeff King
  2017-05-11 17:51     ` Jonathan Tan
  2017-05-11 10:05   ` Jeff King
  2 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2017-05-11  9:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, spearce

On Wed, May 10, 2017 at 03:11:57PM -0700, Jonathan Tan wrote:

> After looking at ways to solve jrnieder's performance concerns, if we're
> going to need to manage one more item of state within the function, I
> might as well use my earlier idea of storing unmatched refs in its own
> list instead of immediately freeing them. This version of the patch
> should have much better performance characteristics.

Hrm. So the problem in your original was that the loop became quadratic
in the number of refs when fetching all of them (because the original
relies on the sorting to essentially do a list-merge). Are there any
quadratic bits left?

> @@ -649,6 +652,25 @@ static void filter_refs(struct fetch_pack_args *args,
>  
>  		if ((allow_unadvertised_object_request &
>  		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
> +			can_append = 1;
> +		} else {
> +			struct ref *u;
> +			/* Check all refs, including those already matched */
> +			for (u = unmatched; u; u = u->next) {
> +				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
> +					can_append = 1;
> +					goto can_append;
> +				}
> +			}

This is inside the nr_sought loop. So if I were to do:

  git fetch origin $(git ls-remote origin | awk '{print $1}')

we're back to being quadratic. I realize that's probably a silly thing
to do, but in the general case, you're O(m*n), where "n" is number of
unmatched remote refs and "m" is the number of SHA-1s you're looking
for.

Doing better would require either sorting both lists, or storing the
oids in something that has better than linear-time lookup.  Perhaps a
sha1_array or an oidset? We don't actually need to know anything about
the unmatched refs after the first loop. We just need the list of oids
that let us do can_append.

AIUI, you could also avoid creating the unmatched list entirely when the
server advertises tip/reachable sha1s. That's a small optimization, but
I think it may actually make the logic clearer.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 17:00       ` Jonathan Nieder
  2017-05-10 18:55         ` Sebastian Schuberth
@ 2017-05-11  9:59         ` Jeff King
  2017-05-11 19:03           ` Jonathan Nieder
  1 sibling, 1 reply; 47+ messages in thread
From: Jeff King @ 2017-05-11  9:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Shawn Pearce, Jonathan Tan, git

On Wed, May 10, 2017 at 10:00:44AM -0700, Jonathan Nieder wrote:

> > Right, makes sense.  I wondered if GitHub should be turning on
> > allowTipSHA1InWant, but it really doesn't make sense to. We _do_ hide
> > some internal refs[1], and they're things that users wouldn't want to
> > fetch. The problem for your case really is just on the client side, and
> > this patch fixes it.
> [...]
> > [1] The reachability checks from upload-pack don't actually do much on
> >     GitHub, because you can generally access the objects via the API or
> >     the web site anyway. So I'm not really opposed to turning on
> >     allowTipSHA1InWant if it would be useful for users, but after
> >     Jonathan's patch I don't see how it would be.
> 
> Given that, what would make me really happy is if github enables
> uploadpack.allowAnySHA1InWant.  That would be useful for me, at least.

One of my hesitations is that we've actually considered moving in the
opposite direction. The object storage for all of the repositories in a
network is shared, so I can fork git.git, push up malicious crap, and
then point people to:

  https://github.com/git/git/commit/$sha1

and it resolves. Obviously there's a social-engineering component to any
such attack, but it's not great. And even without security in mind, it's
potentially confusing. So we've looked at enforcing reachability from
the refs of git/git for a case like that. There's some collateral
damage, though (e.g., people might actually want to look at unreferenced
objects after a force-push). And there are complications around things
like refs/pull (which could still come from another fork, but which you
might reasonably want to reference as part of a PR in the context of
git/git).

Turning on allowAnySHA1InWant brings that confusion to "git fetch", too.
To some degree it's already there for refs/pull, but with the current
client you at least know that you're fetching PR refs (and they're not
even fetched by default). Whereas after Jonathan Tan's patch, you can
social-engineer somebody into:

  git fetch https://github.com/git/git $sha1

if you open a PR that points to some malicious $sha1. I don't think
that's a reason not to take his patch, though.

Arguably refs/pull/ is an abomination that mixes up ownership and should
be destroyed. There really isn't a great alternative, though, short of
representing it as a completely separate repository (which would mean
anybody fetching those refs would have to make a separate fetch
request).

But even leaving all the refs/pull stuff aside, allowAnySHA1InWant does
seem to increase that confusion, and I don't see a way around it short
of never sharing objects between repositories at all. So I think at most
we'd do allowReachableSHA1InWant.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
  2017-05-10 23:22   ` Jonathan Nieder
  2017-05-11  9:46   ` Jeff King
@ 2017-05-11 10:05   ` Jeff King
  2017-05-11 17:00     ` Brandon Williams
  2 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2017-05-11 10:05 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, spearce

On Wed, May 10, 2017 at 03:11:57PM -0700, Jonathan Tan wrote:

> fetch-pack, when fetching a literal SHA-1 from a server that is not
> configured with uploadpack.allowtipsha1inwant (or similar), always
> returns an error message of the form "Server does not allow request for
> unadvertised object %s". However, it is sometimes the case that such
> object is advertised. This situation would occur, for example, if a user
> or a script was provided a SHA-1 instead of a branch or tag name for
> fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
> that SHA-1.
> 
> Teach fetch-pack to also check the SHA-1s of the refs in the received
> ref advertisement if a literal SHA-1 was given by the user.

Stepping back a bit, what does this mean for a world where we implement
protocol extensions to let the client specify a set of refspecs to limit
the advertisement?

If we give the server our usual set of fetch refspecs, then we might
fail to fulfill a request that would have been advertised outside of
that set. And the behavior is confusing and non-transparent to the user.
I don't think that really makes sense, though; the advertisement we ask
for from the server should include only the bits we're interested in for
_this_ fetch.

If we tell the server "we are interested in abcd1234", then it's not
going to find any matching ref by name, obviously. So should servers
then treat 40-hex names in the incoming refspecs as a request to show
any names which have a matching sha1? That works against any server-side
optimizations to avoid looking at the complete set of refs, but it would
only have to kick in when the user actually specified a single SHA-1
(and even then only when allowAnySHA1 isn't on). So that's probably
workable.

None of this is your problem now either way; the advertisement-limiting
extension is still vaporware, albeit one we've discussed a lot. I just
wanted to make sure we weren't painting ourselves into any corners. And
I think this case could probably be handled.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 10:05   ` Jeff King
@ 2017-05-11 17:00     ` Brandon Williams
  2017-05-13  9:29       ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-05-11 17:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git, jrnieder, spearce

On 05/11, Jeff King wrote:
> On Wed, May 10, 2017 at 03:11:57PM -0700, Jonathan Tan wrote:
> 
> > fetch-pack, when fetching a literal SHA-1 from a server that is not
> > configured with uploadpack.allowtipsha1inwant (or similar), always
> > returns an error message of the form "Server does not allow request for
> > unadvertised object %s". However, it is sometimes the case that such
> > object is advertised. This situation would occur, for example, if a user
> > or a script was provided a SHA-1 instead of a branch or tag name for
> > fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
> > that SHA-1.
> > 
> > Teach fetch-pack to also check the SHA-1s of the refs in the received
> > ref advertisement if a literal SHA-1 was given by the user.
> 
> Stepping back a bit, what does this mean for a world where we implement
> protocol extensions to let the client specify a set of refspecs to limit
> the advertisement?
> 
> If we give the server our usual set of fetch refspecs, then we might
> fail to fulfill a request that would have been advertised outside of
> that set. And the behavior is confusing and non-transparent to the user.
> I don't think that really makes sense, though; the advertisement we ask
> for from the server should include only the bits we're interested in for
> _this_ fetch.
> 
> If we tell the server "we are interested in abcd1234", then it's not
> going to find any matching ref by name, obviously. So should servers
> then treat 40-hex names in the incoming refspecs as a request to show
> any names which have a matching sha1? That works against any server-side
> optimizations to avoid looking at the complete set of refs, but it would
> only have to kick in when the user actually specified a single SHA-1
> (and even then only when allowAnySHA1 isn't on). So that's probably
> workable.
> 
> None of this is your problem now either way; the advertisement-limiting
> extension is still vaporware, albeit one we've discussed a lot. I just
> wanted to make sure we weren't painting ourselves into any corners. And
> I think this case could probably be handled.

I can't remember, is there any reason why it is still vaporware? As in
what is holding us back from doing the advertisement-limiting (or doing
away with the initial ref  advertisement)?

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11  9:46   ` Jeff King
@ 2017-05-11 17:51     ` Jonathan Tan
  2017-05-11 20:52       ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2017-05-11 17:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jrnieder, spearce

Thanks for your suggestions. I'll hold off on sending out a new patch 
(following Jonathan Nieder's suggestions [1]) until we decide if further 
optimizations (for example, as suggested by Peff) need to be done.

[1] <20170510232231.GC28740@aiede.svl.corp.google.com>

On 05/11/2017 02:46 AM, Jeff King wrote:
> On Wed, May 10, 2017 at 03:11:57PM -0700, Jonathan Tan wrote:
>
>> After looking at ways to solve jrnieder's performance concerns, if we're
>> going to need to manage one more item of state within the function, I
>> might as well use my earlier idea of storing unmatched refs in its own
>> list instead of immediately freeing them. This version of the patch
>> should have much better performance characteristics.
>
> Hrm. So the problem in your original was that the loop became quadratic
> in the number of refs when fetching all of them (because the original
> relies on the sorting to essentially do a list-merge). Are there any
> quadratic bits left?
>
>> @@ -649,6 +652,25 @@ static void filter_refs(struct fetch_pack_args *args,
>>
>>  		if ((allow_unadvertised_object_request &
>>  		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
>> +			can_append = 1;
>> +		} else {
>> +			struct ref *u;
>> +			/* Check all refs, including those already matched */
>> +			for (u = unmatched; u; u = u->next) {
>> +				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
>> +					can_append = 1;
>> +					goto can_append;
>> +				}
>> +			}
>
> This is inside the nr_sought loop. So if I were to do:
>
>   git fetch origin $(git ls-remote origin | awk '{print $1}')
>
> we're back to being quadratic. I realize that's probably a silly thing
> to do, but in the general case, you're O(m*n), where "n" is number of
> unmatched remote refs and "m" is the number of SHA-1s you're looking
> for.

The original patch was quadratic regardless of whether we're fetching 
names or SHA-1s, which can be said to be bad since it is a regression in 
an existing and common use case (and I agree). This one is O(m*n), as 
you said - the "quadratic-ness" only kicks in if you fetch SHA-1s, which 
before this patch is impossible.

> Doing better would require either sorting both lists, or storing the
> oids in something that has better than linear-time lookup.  Perhaps a
> sha1_array or an oidset? We don't actually need to know anything about
> the unmatched refs after the first loop. We just need the list of oids
> that let us do can_append.

Having a sha1_array or oidset would require allocation (O(n log n) time, 
I think, in addition to O(n) space) and this cost would be incurred 
regardless of how many SHA-1s were actually fetched (if m is an order of 
magnitude less than log n, for example, having a sha1_array might be 
actually worse). Also, presumably we don't want to incur this cost if we 
are fetching zero SHA-1s, so we would need to do some sort of pre-check. 
I'm inclined to leave it the way it is and consider this only if the use 
case of fetching many SHA1-s comes up.

> AIUI, you could also avoid creating the unmatched list entirely when the
> server advertises tip/reachable sha1s. That's a small optimization, but
> I think it may actually make the logic clearer.

If you mean adding an "if" block at the point where we add the unmatched 
ref to the unmatched list (that either adds it to the list or 
immediately frees it), I think it makes the logic slightly more 
complicated. Or maybe you had something else in mind?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11  9:59         ` Jeff King
@ 2017-05-11 19:03           ` Jonathan Nieder
  2017-05-11 21:04             ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-11 19:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Jonathan Tan, git

Jeff King wrote:
> On Wed, May 10, 2017 at 10:00:44AM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> [1] The reachability checks from upload-pack don't actually do much on
>>>     GitHub, because you can generally access the objects via the API or
>>>     the web site anyway.
[...]
>> Given that, what would make me really happy is if github enables
>> uploadpack.allowAnySHA1InWant.  That would be useful for me, at least.
>
> One of my hesitations is that we've actually considered moving in the
> opposite direction. The object storage for all of the repositories in a
> network is shared, so I can fork git.git, push up malicious crap, and
> then point people to:
>
>   https://github.com/git/git/commit/$sha1
>
> and it resolves. Obviously there's a social-engineering component to any
> such attack, but it's not great. And even without security in mind, it's
> potentially confusing.
[...]
> But even leaving all the refs/pull stuff aside, allowAnySHA1InWant does
> seem to increase that confusion, and I don't see a way around it short
> of never sharing objects between repositories at all. So I think at most
> we'd do allowReachableSHA1InWant.

I had guessed you didn't want to do allowReachableSHA1InWant for
performance reasons.  (I haven't checked to what extent we are already
taking advantage of bitmaps to avoid a slow reachability check.)  If I
was wrong and allowReachableSHA1InWant is on the table then it is of
course even better. :)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 17:51     ` Jonathan Tan
@ 2017-05-11 20:52       ` Jeff King
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-11 20:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, spearce

On Thu, May 11, 2017 at 10:51:37AM -0700, Jonathan Tan wrote:

> > This is inside the nr_sought loop. So if I were to do:
> > 
> >   git fetch origin $(git ls-remote origin | awk '{print $1}')
> > 
> > we're back to being quadratic. I realize that's probably a silly thing
> > to do, but in the general case, you're O(m*n), where "n" is number of
> > unmatched remote refs and "m" is the number of SHA-1s you're looking
> > for.
> 
> The original patch was quadratic regardless of whether we're fetching names
> or SHA-1s, which can be said to be bad since it is a regression in an
> existing and common use case (and I agree). This one is O(m*n), as you said
> - the "quadratic-ness" only kicks in if you fetch SHA-1s, which before this
> patch is impossible.

Yeah, no question this is an improvement over the original. I think this
still "regresses" a certain request performance-wise, but it's a request
that could never have succeeded in the original. Mostly I'd just rather
not leave a hidden quadratic loop that will blow up in somebody's face
a year or two down the road.

> Having a sha1_array or oidset would require allocation (O(n log n) time, I
> think, in addition to O(n) space) and this cost would be incurred regardless
> of how many SHA-1s were actually fetched (if m is an order of magnitude less
> than log n, for example, having a sha1_array might be actually worse). Also,
> presumably we don't want to incur this cost if we are fetching zero SHA-1s,
> so we would need to do some sort of pre-check. I'm inclined to leave it the
> way it is and consider this only if the use case of fetching many SHA1-s
> comes up.

An oidset should be O(n) in both time and space. Which we're already
spending in the earlier loop (and in general, when we allocate O(n) ref
structs in the first place). I don't think we need to care too much
about micro-optimizing bumps to the constant-factor here; we just need
to make sure we don't end up in an algorithmic explosion.

And if we initialize the oidset lazily, then we incur that cost only
when we would be doing the linear walk in your patch anyway. See the
patch below.

> > AIUI, you could also avoid creating the unmatched list entirely when the
> > server advertises tip/reachable sha1s. That's a small optimization, but
> > I think it may actually make the logic clearer.
> 
> If you mean adding an "if" block at the point where we add the unmatched ref
> to the unmatched list (that either adds it to the list or immediately frees
> it), I think it makes the logic slightly more complicated. Or maybe you had
> something else in mind?

I was just thinking that it does not leave a case where we create a
data structure (the unmatched list) even though we know right then that
it will not ever be used. So it's an extra line of logic there, but the
overall function may be clearer. I dunno, it's probably not a big deal
either way.

-Peff

---
diff --git a/fetch-pack.c b/fetch-pack.c
index 5cace7458..a660331e6 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,6 +593,12 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, const struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
@@ -600,6 +607,8 @@ static void filter_refs(struct fetch_pack_args *args,
 	struct ref **newtail = &newlist;
 	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
+	int tip_oids_initialized = 0;
 	int i;
 
 	i = 0;
@@ -654,22 +663,15 @@ static void filter_refs(struct fetch_pack_args *args,
 		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
 			can_append = 1;
 		} else {
-			struct ref *u;
-			/* Check all refs, including those already matched */
-			for (u = unmatched; u; u = u->next) {
-				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
-					can_append = 1;
-					goto can_append;
-				}
-			}
-			for (u = newlist; u; u = u->next) {
-				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
-					can_append = 1;
-					break;
-				}
+			if (!tip_oids_initialized) {
+				/* Check all refs, including those already matched */
+				add_refs_to_oidset(&tip_oids, unmatched);
+				add_refs_to_oidset(&tip_oids, newlist);
+				tip_oids_initialized = 1;
 			}
+			can_append = oidset_contains(&tip_oids, &ref->old_oid);
 		}
-can_append:
+
 		if (can_append) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
@@ -680,6 +682,7 @@ static void filter_refs(struct fetch_pack_args *args,
 	}
 	*refs = newlist;
 
+	oidset_clear(&tip_oids);
 	for (ref = unmatched; ref; ref = next) {
 		next = ref->next;
 		free(ref);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 19:03           ` Jonathan Nieder
@ 2017-05-11 21:04             ` Jeff King
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-11 21:04 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Shawn Pearce, Jonathan Tan, git

On Thu, May 11, 2017 at 12:03:54PM -0700, Jonathan Nieder wrote:

> > But even leaving all the refs/pull stuff aside, allowAnySHA1InWant does
> > seem to increase that confusion, and I don't see a way around it short
> > of never sharing objects between repositories at all. So I think at most
> > we'd do allowReachableSHA1InWant.
> 
> I had guessed you didn't want to do allowReachableSHA1InWant for
> performance reasons.  (I haven't checked to what extent we are already
> taking advantage of bitmaps to avoid a slow reachability check.)  If I
> was wrong and allowReachableSHA1InWant is on the table then it is of
> course even better. :)

Performance is definitely a consideration for resolving sha1's via the
website, because those can be any object, and you have to look in all
the trees. Traversing all the commits in linux.git is ~5s of CPU. Doing
the same for the whole object graph is more like ~45s.

Our experiments did involve bitmaps, but there are some corner cases
where we might not have good bitmap coverage of certain refs (I think
this is something that could be improved; the commit selection in the
current bitmap writer is fairly naive).

But I think upload-pack's reachability check only handles commits
anyway. Adding 5s of CPU to a request isn't great, but it's pretty easy
to convince Git to use 5s of CPU already. And this would only kick in
when the client asked for a non-tip ref anyway, and in general I'd
expect requested sha1's to be near the ref tips.

I'll open a discussion internally about enabling it and measuring the
performance.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v4] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
                   ` (2 preceding siblings ...)
  2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
@ 2017-05-11 21:14 ` Jonathan Tan
  2017-05-11 21:35   ` Jonathan Nieder
  2017-05-11 22:30 ` [PATCH v5] " Jonathan Tan
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2017-05-11 21:14 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, peff

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

Thanks, peff. I've incorporated your suggestions - I don't feel very
strongly about this, but I guess it's worthwhile to avoid the quadratic
behavior if we can.

Also incorporated Jonathan Nieder's suggestion about the placement of
the last line. The relevant function is also factored out (following
peff's suggestion).
---
 fetch-pack.c          | 36 +++++++++++++++++++++++++++++++++++-
 t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..1adb1a6c2 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,13 +593,22 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, const struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
+	int tip_oids_initialized = 0;
 	int i;
 
 	i = 0;
@@ -631,13 +641,15 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
 	/* Append unmatched requests to the list */
 	for (i = 0; i < nr_sought; i++) {
 		unsigned char sha1[20];
+		int can_append = 0;
 
 		ref = sought[i];
 		if (ref->match_status != REF_NOT_MATCHED)
@@ -649,6 +661,21 @@ static void filter_refs(struct fetch_pack_args *args,
 
 		if ((allow_unadvertised_object_request &
 		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+			can_append = 1;
+		} else {
+			if (!tip_oids_initialized) {
+				/*
+				 * Check all refs, including those already
+				 * matched
+				 */
+				add_refs_to_oidset(&tip_oids, unmatched);
+				add_refs_to_oidset(&tip_oids, newlist);
+				tip_oids_initialized = 1;
+			}
+			can_append = oidset_contains(&tip_oids, &ref->old_oid);
+		}
+
+		if (can_append) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -656,6 +683,13 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
 		}
 	}
+
+	oidset_clear(&tip_oids);
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
+
 	*refs = newlist;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..80a1a3239 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master)
+'
+
+test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+	test_commit -C server 2 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/tags/1) refs/tags/1
+'
+
+test_expect_success 'fetch-pack cannot fetch a raw sha1 that is not advertised as a ref' '
+	rm -rf server &&
+
+	git init server &&
+	test_commit -C server 5 &&
+	git -C server tag -d 5 &&
+	test_commit -C server 6 &&
+
+	git init client &&
+	test_must_fail git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master^) 2>err &&
+	test_i18ngrep "Server does not allow request for unadvertised object" err
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 21:14 ` [PATCH v4] " Jonathan Tan
@ 2017-05-11 21:35   ` Jonathan Nieder
  2017-05-11 21:59     ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-11 21:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Hi,

Jonathan Tan wrote:

> Thanks, peff. I've incorporated your suggestions - I don't feel very
> strongly about this, but I guess it's worthwhile to avoid the quadratic
> behavior if we can.
>
> Also incorporated Jonathan Nieder's suggestion about the placement of
> the last line. The relevant function is also factored out (following
> peff's suggestion).

Thanks.  The structure still seems more complicated than it needs to
be.  More details below.

[...]
> +++ b/fetch-pack.c
[...]
> @@ -592,13 +593,22 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
>  	}
>  }
>  
> +static void add_refs_to_oidset(struct oidset *oids, const struct ref *refs)
> +{
> +	for (; refs; refs = refs->next)
> +		oidset_insert(oids, &refs->old_oid);

Makes sense.

[...]
>  	/* Append unmatched requests to the list */
>  	for (i = 0; i < nr_sought; i++) {
>  		unsigned char sha1[20];
> +		int can_append = 0;
>  
>  		ref = sought[i];
>  		if (ref->match_status != REF_NOT_MATCHED)
> @@ -649,6 +661,21 @@ static void filter_refs(struct fetch_pack_args *args,
>  
>  		if ((allow_unadvertised_object_request &
>  		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
> +			can_append = 1;
> +		} else {
> +			if (!tip_oids_initialized) {
> +				/*
> +				 * Check all refs, including those already
> +				 * matched
> +				 */
> +				add_refs_to_oidset(&tip_oids, unmatched);
> +				add_refs_to_oidset(&tip_oids, newlist);
> +				tip_oids_initialized = 1;
> +			}
> +			can_append = oidset_contains(&tip_oids, &ref->old_oid);
> +		}
> +
> +		if (can_append) {

This structure could be simplified by putting the lazy-initializing
tip_oids lookup in a separate function.  For example:

	int tip_oids_contain(struct oidset *tip_oids,
		struct ref *unmatched, struct ref *newlist,
		const struct oid *id)
	{
		if (oidset_is_empty(tip_oids)) {
			add_refs_to_oidset(tip_oids, unmatched);
			add_refs_to_oidset(tip_oids, newlist);
		}
		return oidset_contains(tip_oids, id);
	}

That way, the caller could be kept simple (eliminating can_append
and the repeated if).

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 21:35   ` Jonathan Nieder
@ 2017-05-11 21:59     ` Jeff King
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-11 21:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git

On Thu, May 11, 2017 at 02:35:17PM -0700, Jonathan Nieder wrote:

> This structure could be simplified by putting the lazy-initializing
> tip_oids lookup in a separate function.  For example:
> 
> 	int tip_oids_contain(struct oidset *tip_oids,
> 		struct ref *unmatched, struct ref *newlist,
> 		const struct oid *id)
> 	{
> 		if (oidset_is_empty(tip_oids)) {
> 			add_refs_to_oidset(tip_oids, unmatched);
> 			add_refs_to_oidset(tip_oids, newlist);
> 		}
> 		return oidset_contains(tip_oids, id);
> 	}

Yeah, I started to write it that way, but in the empty-ref case it will
try to add the empty refs over and over. I guess that's not that big a
deal, though, as we know the noop add is O(1) then. :)

> That way, the caller could be kept simple (eliminating can_append
> and the repeated if).

Yes, shoving it all into the sub-function is a big win for readability.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
                   ` (3 preceding siblings ...)
  2017-05-11 21:14 ` [PATCH v4] " Jonathan Tan
@ 2017-05-11 22:30 ` Jonathan Tan
  2017-05-11 22:46   ` Jonathan Nieder
  2017-05-12  3:06   ` Jeff King
  2017-05-12 20:45 ` Jonathan Tan
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-05-11 22:30 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, peff

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

Change from v4: incorporated Jonathan Nieder's suggestion about using
another function. There is no oidset_is_empty, so I checked for a
presence of a member variable instead.
---
 fetch-pack.c          | 34 ++++++++++++++++++++++++++++++++--
 t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..9dd430a65 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,13 +593,32 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
+static int tip_oids_contain(struct oidset *tip_oids,
+			    struct ref *unmatched, struct ref *newlist,
+			    const struct object_id *id)
+{
+	if (!tip_oids->map.cmpfn) {
+		add_refs_to_oidset(tip_oids, unmatched);
+		add_refs_to_oidset(tip_oids, newlist);
+	}
+	return oidset_contains(tip_oids, id);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
 	int i;
 
 	i = 0;
@@ -631,7 +651,8 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
@@ -648,7 +669,9 @@ static void filter_refs(struct fetch_pack_args *args,
 			continue;
 
 		if ((allow_unadvertised_object_request &
-		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+		     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
+		    tip_oids_contain(&tip_oids, unmatched, newlist,
+				     &ref->old_oid)) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -656,6 +679,13 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
 		}
 	}
+
+	oidset_clear(&tip_oids);
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
+
 	*refs = newlist;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..80a1a3239 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master)
+'
+
+test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+	test_commit -C server 2 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/tags/1) refs/tags/1
+'
+
+test_expect_success 'fetch-pack cannot fetch a raw sha1 that is not advertised as a ref' '
+	rm -rf server &&
+
+	git init server &&
+	test_commit -C server 5 &&
+	git -C server tag -d 5 &&
+	test_commit -C server 6 &&
+
+	git init client &&
+	test_must_fail git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master^) 2>err &&
+	test_i18ngrep "Server does not allow request for unadvertised object" err
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 22:30 ` [PATCH v5] " Jonathan Tan
@ 2017-05-11 22:46   ` Jonathan Nieder
  2017-05-12  2:59     ` Jeff King
  2017-05-12  6:01     ` Junio C Hamano
  2017-05-12  3:06   ` Jeff King
  1 sibling, 2 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-11 22:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Jonathan Tan wrote:

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -15,6 +15,7 @@
>  #include "version.h"
>  #include "prio-queue.h"
>  #include "sha1-array.h"
> +#include "oidset.h"
>  
>  static int transfer_unpack_limit = -1;
>  static int fetch_unpack_limit = -1;
> @@ -592,13 +593,32 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
>  	}
>  }
>  
> +static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
> +{
> +	for (; refs; refs = refs->next)
> +		oidset_insert(oids, &refs->old_oid);
> +}
> +
> +static int tip_oids_contain(struct oidset *tip_oids,
> +			    struct ref *unmatched, struct ref *newlist,
> +			    const struct object_id *id)
> +{
> +	if (!tip_oids->map.cmpfn) {

This feels like a layering violation.  Could it be e.g. a static inline
function oidset_is_initialized in oidset.h?

> +		add_refs_to_oidset(tip_oids, unmatched);
> +		add_refs_to_oidset(tip_oids, newlist);
> +	}
> +	return oidset_contains(tip_oids, id);
> +}

The rest looks good.

With or without that change,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for your patient work.

diff --git i/fetch-pack.c w/fetch-pack.c
index 9dd430a65a..0394580434 100644
--- i/fetch-pack.c
+++ w/fetch-pack.c
@@ -603,7 +603,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
 			    struct ref *unmatched, struct ref *newlist,
 			    const struct object_id *id)
 {
-	if (!tip_oids->map.cmpfn) {
+	if (!oidset_initialized(tip_oids)) {
 		add_refs_to_oidset(tip_oids, unmatched);
 		add_refs_to_oidset(tip_oids, newlist);
 	}
diff --git i/oidset.c w/oidset.c
index ac169f05d3..f2a6753b8a 100644
--- i/oidset.c
+++ w/oidset.c
@@ -18,7 +18,7 @@ int oidset_contains(const struct oidset *set, const struct object_id *oid)
 {
 	struct hashmap_entry key;
 
-	if (!set->map.cmpfn)
+	if (!oidset_initialized(set))
 		return 0;
 
 	hashmap_entry_init(&key, sha1hash(oid->hash));
@@ -29,7 +29,7 @@ int oidset_insert(struct oidset *set, const struct object_id *oid)
 {
 	struct oidset_entry *entry;
 
-	if (!set->map.cmpfn)
+	if (!oidset_initialized(set))
 		hashmap_init(&set->map, oidset_hashcmp, 0);
 
 	if (oidset_contains(set, oid))
diff --git i/oidset.h w/oidset.h
index b7eaab5b88..2e7d889770 100644
--- i/oidset.h
+++ w/oidset.h
@@ -22,6 +22,16 @@ struct oidset {
 
 #define OIDSET_INIT { { NULL } }
 
+/**
+ * Returns true iff "set" has been initialized (for example by inserting
+ * an entry). An oidset is considered uninitialized if it hasn't had any
+ * oids inserted since it was last cleared.
+ */
+static inline int oidset_initialized(const struct oidset *set)
+{
+	return set->map.cmpfn ? 1 : 0;
+}
+
 /**
  * Returns true iff `set` contains `oid`.
  */

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 22:46   ` Jonathan Nieder
@ 2017-05-12  2:59     ` Jeff King
  2017-05-12  6:01     ` Junio C Hamano
  1 sibling, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-12  2:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git

On Thu, May 11, 2017 at 03:46:39PM -0700, Jonathan Nieder wrote:

> > +static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
> > +{
> > +	for (; refs; refs = refs->next)
> > +		oidset_insert(oids, &refs->old_oid);
> > +}
> > +
> > +static int tip_oids_contain(struct oidset *tip_oids,
> > +			    struct ref *unmatched, struct ref *newlist,
> > +			    const struct object_id *id)
> > +{
> > +	if (!tip_oids->map.cmpfn) {
> 
> This feels like a layering violation.  Could it be e.g. a static inline
> function oidset_is_initialized in oidset.h?

It definitely is a layering violation, though we normally are pretty
"open" with letting callers peek in at our data structures.

I'd be fine with it as-is or with the helper function you suggested.
But...

> +/**
> + * Returns true iff "set" has been initialized (for example by inserting
> + * an entry). An oidset is considered uninitialized if it hasn't had any
> + * oids inserted since it was last cleared.
> + */
> +static inline int oidset_initialized(const struct oidset *set)
> +{
> +	return set->map.cmpfn ? 1 : 0;
> +}

Now we're committing our own layering violation. I was surprised to find
that hashmap_free() clears "cmpfn", and I'm not sure if we would want to
count on that (not that it probably matters all that much, but it is
required for the description you gave to be accurate).

The hashmap documentation suggests using "tablesize" for checking
whether something is initialized. Maybe we ought to use that.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 22:30 ` [PATCH v5] " Jonathan Tan
  2017-05-11 22:46   ` Jonathan Nieder
@ 2017-05-12  3:06   ` Jeff King
  1 sibling, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-12  3:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder

On Thu, May 11, 2017 at 03:30:54PM -0700, Jonathan Tan wrote:

> fetch-pack, when fetching a literal SHA-1 from a server that is not
> configured with uploadpack.allowtipsha1inwant (or similar), always
> returns an error message of the form "Server does not allow request for
> unadvertised object %s". However, it is sometimes the case that such
> object is advertised. This situation would occur, for example, if a user
> or a script was provided a SHA-1 instead of a branch or tag name for
> fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
> that SHA-1.
> 
> Teach fetch-pack to also check the SHA-1s of the refs in the received
> ref advertisement if a literal SHA-1 was given by the user.
> 
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---

This looks good to me. There's one minor nit that I don't think I saw
mentioned, and I don't think needs to hold up the patch. But I wanted to
mention it just in case I'm wrong that it doesn't matter.

>  static void filter_refs(struct fetch_pack_args *args,
>  			struct ref **refs,
>  			struct ref **sought, int nr_sought)
>  {
>  	struct ref *newlist = NULL;
>  	struct ref **newtail = &newlist;
> +	struct ref *unmatched = NULL;
>  	struct ref *ref, *next;
> +	struct oidset tip_oids = OIDSET_INIT;
>  	int i;
>  
>  	i = 0;
> @@ -631,7 +651,8 @@ static void filter_refs(struct fetch_pack_args *args,
>  			ref->next = NULL;
>  			newtail = &ref->next;
>  		} else {
> -			free(ref);
> +			ref->next = unmatched;
> +			unmatched = ref;
>  		}

The incoming "refs" list is sorted, and we rely on that sorting to do
the linear walk. Likewise, we append to newlist via newtail, so it
remains sorted (and I suspect the consumers of the list rely on that).
But your new "unmatched" list is done by prepending, so it's in reverse
order.

I don't think that matters for our purposes here, and the list doesn't
escape our function. So there's no bug, but I just wonder if it might
end up biting somebody in the future. I'm OK with leaving it, though.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 22:46   ` Jonathan Nieder
  2017-05-12  2:59     ` Jeff King
@ 2017-05-12  6:01     ` Junio C Hamano
  2017-05-12  7:59       ` Jeff King
  1 sibling, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-05-12  6:01 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git, peff

Jonathan Nieder <jrnieder@gmail.com> writes:

>> +static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
>> +{
>> +	for (; refs; refs = refs->next)
>> +		oidset_insert(oids, &refs->old_oid);
>> +}
>> +
>> +static int tip_oids_contain(struct oidset *tip_oids,
>> +			    struct ref *unmatched, struct ref *newlist,
>> +			    const struct object_id *id)
>> +{
>> +	if (!tip_oids->map.cmpfn) {
>
> This feels like a layering violation.  Could it be e.g. a static inline
> function oidset_is_initialized in oidset.h?

Surely it does.

Also, tip_oids_contain() uses unmatched and newlist only on the
first call, but the internal API this patch establishes gives an
illusion (confusion) that updating unmatched and newlist while
making repeated to calls to this function may affect the outcome of
tip_oids_contain() function.  In fact, doesn't the loop that calls
this function extend "newlist" by extending the list at its tail?


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12  6:01     ` Junio C Hamano
@ 2017-05-12  7:59       ` Jeff King
  2017-05-12  8:14         ` Jeff King
  2017-05-12 18:09         ` Jonathan Tan
  0 siblings, 2 replies; 47+ messages in thread
From: Jeff King @ 2017-05-12  7:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Jonathan Tan, git

On Fri, May 12, 2017 at 03:01:35PM +0900, Junio C Hamano wrote:

> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
> >> +static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
> >> +{
> >> +	for (; refs; refs = refs->next)
> >> +		oidset_insert(oids, &refs->old_oid);
> >> +}
> >> +
> >> +static int tip_oids_contain(struct oidset *tip_oids,
> >> +			    struct ref *unmatched, struct ref *newlist,
> >> +			    const struct object_id *id)
> >> +{
> >> +	if (!tip_oids->map.cmpfn) {
> >
> > This feels like a layering violation.  Could it be e.g. a static inline
> > function oidset_is_initialized in oidset.h?
> 
> Surely it does.
> 
> Also, tip_oids_contain() uses unmatched and newlist only on the
> first call, but the internal API this patch establishes gives an
> illusion (confusion) that updating unmatched and newlist while
> making repeated to calls to this function may affect the outcome of
> tip_oids_contain() function.  In fact, doesn't the loop that calls
> this function extend "newlist" by extending the list at its tail?

It does, but only with elements whose oids were already in the set. So I
don't think it's wrong, but I agree the interface makes it a bit muddy.

The whole thing is much easier to follow if you just create the oidset
before chopping it up into two lists. But that loses the "lazy"
property, and you pay for the oidset even if there are no raw sha1s
requested. But it is nice and short. See the patch below (as a
replacement for what we've been discussing, not on top).

It's hard to resolve that because the loop that chops up the lists is
also the loop that is figuring out if there are leftover raw-sha1
names. But you could probably structure it like:

  1. Loop and see if we have unmatched names that look like sha1s.

  2. Set up the oidset from the two chopped-up lists if there are
     sha1 candidates (and if we aren't going to just send them
     regardless due to server options).

  3. Loop over the unmatched candidates and check them against the
     oidset.

That avoids the lazy-init, so pulling (2) out into a function results in
a better interface.

---
diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..4f3e8fa6b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -599,6 +600,9 @@ static void filter_refs(struct fetch_pack_args *args,
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
+	int send_raw_oids = (allow_unadvertised_object_request &
+			     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
 	int i;
 
 	i = 0;
@@ -626,6 +630,9 @@ static void filter_refs(struct fetch_pack_args *args,
 		    (!args->deepen || !starts_with(ref->name, "refs/tags/")))
 			keep = 1;
 
+		if (!send_raw_oids)
+			oidset_insert(&tip_oids, &ref->old_oid);
+
 		if (keep) {
 			*newtail = ref;
 			ref->next = NULL;
@@ -647,8 +654,7 @@ static void filter_refs(struct fetch_pack_args *args,
 		    hashcmp(sha1, ref->old_oid.hash))
 			continue;
 
-		if ((allow_unadvertised_object_request &
-		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+		if (send_raw_oids || oidset_contains(&tip_oids, &ref->old_oid)) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -657,6 +663,8 @@ static void filter_refs(struct fetch_pack_args *args,
 		}
 	}
 	*refs = newlist;
+
+	oidset_clear(&tip_oids);
 }
 
 static void mark_alternate_complete(struct object *obj)

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12  7:59       ` Jeff King
@ 2017-05-12  8:14         ` Jeff King
  2017-05-12 18:00           ` Jonathan Tan
  2017-05-12 18:09         ` Jonathan Tan
  1 sibling, 1 reply; 47+ messages in thread
From: Jeff King @ 2017-05-12  8:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Jonathan Tan, git

On Fri, May 12, 2017 at 03:59:31AM -0400, Jeff King wrote:

> It's hard to resolve that because the loop that chops up the lists is
> also the loop that is figuring out if there are leftover raw-sha1
> names. But you could probably structure it like:
> 
>   1. Loop and see if we have unmatched names that look like sha1s.
> 
>   2. Set up the oidset from the two chopped-up lists if there are
>      sha1 candidates (and if we aren't going to just send them
>      regardless due to server options).
> 
>   3. Loop over the unmatched candidates and check them against the
>      oidset.
> 
> That avoids the lazy-init, so pulling (2) out into a function results in
> a better interface.

Here that is for reference. It's a bit more complex than the other
solutions, requiring the unmatched list, the is_literal_sha1() helper,
and the oidset. But it avoids any extra allocation when it isn't
necessary, and drops the laziness.

I mostly did this to give the discussion something concrete to look at.
I'm actually OK with any of the options so far, as long as it does not
have any quadratic behavior. :)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..e167213c0 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,13 +593,27 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static int is_literal_sha1(const struct ref *ref)
+{
+	struct object_id oid;
+	const char *end;
+	return !parse_oid_hex(ref->name, &oid, &end) &&
+	       !*end &&
+	       !oidcmp(&oid, &ref->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
+	int send_raw_oids = (allow_unadvertised_object_request &
+			     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
+	int seeking_raw_oid = 0;
 	int i;
 
 	i = 0;
@@ -617,7 +632,8 @@ static void filter_refs(struct fetch_pack_args *args,
 				else if (cmp == 0) {
 					keep = 1; /* definitely have it */
 					sought[i]->match_status = REF_MATCHED;
-				}
+				} else if (is_literal_sha1(sought[i]))
+					seeking_raw_oid = 1;
 				i++;
 			}
 		}
@@ -631,24 +647,27 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
+	if (seeking_raw_oid && !send_raw_oids) {
+		for (ref = newlist; ref; ref = ref->next)
+			oidset_insert(&tip_oids, &ref->old_oid);
+		for (ref = unmatched; ref; ref = ref->next)
+			oidset_insert(&tip_oids, &ref->old_oid);
+	}
+
 	/* Append unmatched requests to the list */
 	for (i = 0; i < nr_sought; i++) {
-		unsigned char sha1[20];
-
 		ref = sought[i];
 		if (ref->match_status != REF_NOT_MATCHED)
 			continue;
-		if (get_sha1_hex(ref->name, sha1) ||
-		    ref->name[40] != '\0' ||
-		    hashcmp(sha1, ref->old_oid.hash))
+		if (!is_literal_sha1(ref))
 			continue;
 
-		if ((allow_unadvertised_object_request &
-		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+		if (send_raw_oids || oidset_contains(&tip_oids, &ref->old_oid)) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -656,6 +675,13 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
 		}
 	}
+
+	oidset_clear(&tip_oids);
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
+
 	*refs = newlist;
 }
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12  8:14         ` Jeff King
@ 2017-05-12 18:00           ` Jonathan Tan
  2017-05-13  8:30             ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2017-05-12 18:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jonathan Nieder, Git mailing list

On Fri, May 12, 2017 at 1:14 AM, Jeff King <peff@peff.net> wrote:
> diff --git a/fetch-pack.c b/fetch-pack.c
> index afb8b0502..e167213c0 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -15,6 +15,7 @@
>  #include "version.h"
>  #include "prio-queue.h"
>  #include "sha1-array.h"
> +#include "oidset.h"
>
>  static int transfer_unpack_limit = -1;
>  static int fetch_unpack_limit = -1;
> @@ -592,13 +593,27 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
>         }
>  }
>
> +static int is_literal_sha1(const struct ref *ref)
> +{
> +       struct object_id oid;
> +       const char *end;
> +       return !parse_oid_hex(ref->name, &oid, &end) &&
> +              !*end &&
> +              !oidcmp(&oid, &ref->old_oid);
> +}
> +
>  static void filter_refs(struct fetch_pack_args *args,
>                         struct ref **refs,
>                         struct ref **sought, int nr_sought)
>  {
>         struct ref *newlist = NULL;
>         struct ref **newtail = &newlist;
> +       struct ref *unmatched = NULL;
>         struct ref *ref, *next;
> +       struct oidset tip_oids = OIDSET_INIT;
> +       int send_raw_oids = (allow_unadvertised_object_request &
> +                            (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
> +       int seeking_raw_oid = 0;
>         int i;
>
>         i = 0;
> @@ -617,7 +632,8 @@ static void filter_refs(struct fetch_pack_args *args,
>                                 else if (cmp == 0) {
>                                         keep = 1; /* definitely have it */
>                                         sought[i]->match_status = REF_MATCHED;
> -                               }
> +                               } else if (is_literal_sha1(sought[i]))
> +                                       seeking_raw_oid = 1;

As far as I can tell, this seems to coincidentally work because SHA-1s
as strings compare less than any ref name (HEAD or refs/...) - if this
weren't the case, the "break" statement above might cause this line to
never be executed. I'm not sure if we want to rely on that.

>                                 i++;
>                         }
>                 }
> @@ -631,24 +647,27 @@ static void filter_refs(struct fetch_pack_args *args,
>                         ref->next = NULL;
>                         newtail = &ref->next;
>                 } else {
> -                       free(ref);
> +                       ref->next = unmatched;
> +                       unmatched = ref;
>                 }
>         }
>
> +       if (seeking_raw_oid && !send_raw_oids) {
> +               for (ref = newlist; ref; ref = ref->next)
> +                       oidset_insert(&tip_oids, &ref->old_oid);
> +               for (ref = unmatched; ref; ref = ref->next)
> +                       oidset_insert(&tip_oids, &ref->old_oid);
> +       }
> +
>         /* Append unmatched requests to the list */
>         for (i = 0; i < nr_sought; i++) {
> -               unsigned char sha1[20];
> -
>                 ref = sought[i];
>                 if (ref->match_status != REF_NOT_MATCHED)
>                         continue;
> -               if (get_sha1_hex(ref->name, sha1) ||
> -                   ref->name[40] != '\0' ||
> -                   hashcmp(sha1, ref->old_oid.hash))
> +               if (!is_literal_sha1(ref))
>                         continue;
>
> -               if ((allow_unadvertised_object_request &
> -                   (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
> +               if (send_raw_oids || oidset_contains(&tip_oids, &ref->old_oid)) {
>                         ref->match_status = REF_MATCHED;
>                         *newtail = copy_ref(ref);
>                         newtail = &(*newtail)->next;
> @@ -656,6 +675,13 @@ static void filter_refs(struct fetch_pack_args *args,
>                         ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
>                 }
>         }
> +
> +       oidset_clear(&tip_oids);
> +       for (ref = unmatched; ref; ref = next) {
> +               next = ref->next;
> +               free(ref);
> +       }
> +
>         *refs = newlist;
>  }
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12  7:59       ` Jeff King
  2017-05-12  8:14         ` Jeff King
@ 2017-05-12 18:09         ` Jonathan Tan
  2017-05-12 19:06           ` Jonathan Nieder
  1 sibling, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2017-05-12 18:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jonathan Nieder, Git mailing list

On Fri, May 12, 2017 at 12:59 AM, Jeff King <peff@peff.net> wrote:
> On Fri, May 12, 2017 at 03:01:35PM +0900, Junio C Hamano wrote:
>> Also, tip_oids_contain() uses unmatched and newlist only on the
>> first call, but the internal API this patch establishes gives an
>> illusion (confusion) that updating unmatched and newlist while
>> making repeated to calls to this function may affect the outcome of
>> tip_oids_contain() function.  In fact, doesn't the loop that calls
>> this function extend "newlist" by extending the list at its tail?
>
> It does, but only with elements whose oids were already in the set. So I
> don't think it's wrong, but I agree the interface makes it a bit muddy.

To make the interface less muddy, would you agree with this (untested):

@@ -648,7 +669,9 @@ static void filter_refs(struct fetch_pack_args *args,
  continue;

  if ((allow_unadvertised_object_request &
-    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
+      (check_tip_oids_initialized(&tip_oids, unmatched, newlist) &&
oidset_contains(&tip_oids,
+     &ref->old_oid))) {
  ref->match_status = REF_MATCHED;
  *newtail = copy_ref(ref);
  newtail = &(*newtail)->next;

(making the function-to-abstract be merely an initialization one,
instead of one that does 2 things). That decreases the scope of the
function that Jonathan Nieder and Peff wanted, but it might be a
warranted reduction in scope.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12 18:09         ` Jonathan Tan
@ 2017-05-12 19:06           ` Jonathan Nieder
  0 siblings, 0 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-12 19:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jeff King, Junio C Hamano, Git mailing list

Jonathan Tan wrote:

> To make the interface less muddy, would you agree with this (untested):
>
> @@ -648,7 +669,9 @@ static void filter_refs(struct fetch_pack_args *args,
>   continue;
> 
>   if ((allow_unadvertised_object_request &
> -    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
> +     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
> +      (check_tip_oids_initialized(&tip_oids, unmatched, newlist) &&
> oidset_contains(&tip_oids,
> +     &ref->old_oid))) {
>   ref->match_status = REF_MATCHED;
>   *newtail = copy_ref(ref);
>   newtail = &(*newtail)->next;
>
> (making the function-to-abstract be merely an initialization one,
> instead of one that does 2 things). That decreases the scope of the
> function that Jonathan Nieder and Peff wanted, but it might be a
> warranted reduction in scope.

Yeah, that sounds nicer than anything I suggested.

nit that check_ would be clearer as ensure_

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
                   ` (4 preceding siblings ...)
  2017-05-11 22:30 ` [PATCH v5] " Jonathan Tan
@ 2017-05-12 20:45 ` Jonathan Tan
  2017-05-12 20:46 ` [PATCH v6] " Jonathan Tan
  2017-05-15 17:32 ` [PATCH v7] " Jonathan Tan
  7 siblings, 0 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-05-12 20:45 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, peff, gitster

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

Change from v5: used "ensure_tip_oids_initialized" function instead.
This removes some of the muddiness (e.g. with newlist being modified
after the function).

I used an "int" return type for ensure_tip_oids_initialized so that I
could chain with "&&", but "void" and chaining with "," is also possible
(but I don't see much use of the comma operator in Git code).
---
 fetch-pack.c          | 35 +++++++++++++++++++++++++++++++++--
 t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..f05f48d7b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,13 +593,32 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
+static int ensure_tip_oids_initialized(struct oidset *tip_oids,
+				       struct ref *unmatched,
+				       struct ref *newlist)
+{
+	if (!tip_oids->map.tablesize) {
+		add_refs_to_oidset(tip_oids, unmatched);
+		add_refs_to_oidset(tip_oids, newlist);
+	}
+	return 1;
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
 	int i;
 
 	i = 0;
@@ -631,7 +651,8 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
@@ -648,7 +669,10 @@ static void filter_refs(struct fetch_pack_args *args,
 			continue;
 
 		if ((allow_unadvertised_object_request &
-		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+		     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
+		    (ensure_tip_oids_initialized(&tip_oids, unmatched,
+						 newlist) &&
+		     oidset_contains(&tip_oids, &ref->old_oid))) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -656,6 +680,13 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
 		}
 	}
+
+	oidset_clear(&tip_oids);
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
+
 	*refs = newlist;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..80a1a3239 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master)
+'
+
+test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+	test_commit -C server 2 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/tags/1) refs/tags/1
+'
+
+test_expect_success 'fetch-pack cannot fetch a raw sha1 that is not advertised as a ref' '
+	rm -rf server &&
+
+	git init server &&
+	test_commit -C server 5 &&
+	git -C server tag -d 5 &&
+	test_commit -C server 6 &&
+
+	git init client &&
+	test_must_fail git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master^) 2>err &&
+	test_i18ngrep "Server does not allow request for unadvertised object" err
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
                   ` (5 preceding siblings ...)
  2017-05-12 20:45 ` Jonathan Tan
@ 2017-05-12 20:46 ` Jonathan Tan
  2017-05-12 22:28   ` Jonathan Nieder
  2017-05-13  8:36   ` Jeff King
  2017-05-15 17:32 ` [PATCH v7] " Jonathan Tan
  7 siblings, 2 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-05-12 20:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, peff, gitster

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

Change from v5: used "ensure_tip_oids_initialized" function instead.
This removes some of the muddiness (e.g. with newlist being modified
after the function).

I used an "int" return type for ensure_tip_oids_initialized so that I
could chain with "&&", but "void" and chaining with "," is also possible
(but I don't see much use of the comma operator in Git code).
---
 fetch-pack.c          | 35 +++++++++++++++++++++++++++++++++--
 t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..f05f48d7b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,13 +593,32 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
+static int ensure_tip_oids_initialized(struct oidset *tip_oids,
+				       struct ref *unmatched,
+				       struct ref *newlist)
+{
+	if (!tip_oids->map.tablesize) {
+		add_refs_to_oidset(tip_oids, unmatched);
+		add_refs_to_oidset(tip_oids, newlist);
+	}
+	return 1;
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
 	int i;
 
 	i = 0;
@@ -631,7 +651,8 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
@@ -648,7 +669,10 @@ static void filter_refs(struct fetch_pack_args *args,
 			continue;
 
 		if ((allow_unadvertised_object_request &
-		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+		     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
+		    (ensure_tip_oids_initialized(&tip_oids, unmatched,
+						 newlist) &&
+		     oidset_contains(&tip_oids, &ref->old_oid))) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -656,6 +680,13 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
 		}
 	}
+
+	oidset_clear(&tip_oids);
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
+
 	*refs = newlist;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..80a1a3239 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master)
+'
+
+test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+	test_commit -C server 2 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/tags/1) refs/tags/1
+'
+
+test_expect_success 'fetch-pack cannot fetch a raw sha1 that is not advertised as a ref' '
+	rm -rf server &&
+
+	git init server &&
+	test_commit -C server 5 &&
+	git -C server tag -d 5 &&
+	test_commit -C server 6 &&
+
+	git init client &&
+	test_must_fail git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master^) 2>err &&
+	test_i18ngrep "Server does not allow request for unadvertised object" err
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v6] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12 20:46 ` [PATCH v6] " Jonathan Tan
@ 2017-05-12 22:28   ` Jonathan Nieder
  2017-05-13  8:36   ` Jeff King
  1 sibling, 0 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-12 22:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, gitster

Jonathan Tan wrote:

> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c          | 35 +++++++++++++++++++++++++++++++++--
>  t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for your patience.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12 18:00           ` Jonathan Tan
@ 2017-05-13  8:30             ` Jeff King
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-13  8:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, Jonathan Nieder, Git mailing list

On Fri, May 12, 2017 at 11:00:36AM -0700, Jonathan Tan wrote:

> > @@ -617,7 +632,8 @@ static void filter_refs(struct fetch_pack_args *args,
> >                                 else if (cmp == 0) {
> >                                         keep = 1; /* definitely have it */
> >                                         sought[i]->match_status = REF_MATCHED;
> > -                               }
> > +                               } else if (is_literal_sha1(sought[i]))
> > +                                       seeking_raw_oid = 1;
> 
> As far as I can tell, this seems to coincidentally work because SHA-1s
> as strings compare less than any ref name (HEAD or refs/...) - if this
> weren't the case, the "break" statement above might cause this line to
> never be executed. I'm not sure if we want to rely on that.

Right, I was thinking we'd hit all of the sought refs in this loop, but
we don't necessarily (if we run out of remote refs to compare to).
Finishing it off with:

  /* scan the remainder of the requests */
  for (; i < nr_sought; i++)
	if (is_literal_sha1(...))

though it would probably be less confusing to just do a separate loop
after the original is done.

Anyway, it looks like your v6 just keeps the lazy approach, so this is
all moot.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6] fetch-pack: always allow fetching of literal SHA1s
  2017-05-12 20:46 ` [PATCH v6] " Jonathan Tan
  2017-05-12 22:28   ` Jonathan Nieder
@ 2017-05-13  8:36   ` Jeff King
  2017-05-15  1:26     ` Junio C Hamano
  1 sibling, 1 reply; 47+ messages in thread
From: Jeff King @ 2017-05-13  8:36 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, gitster

On Fri, May 12, 2017 at 01:46:48PM -0700, Jonathan Tan wrote:

> Change from v5: used "ensure_tip_oids_initialized" function instead.
> This removes some of the muddiness (e.g. with newlist being modified
> after the function).

I don't think it really improves the muddiness. You are still calling
the ensure function each time through the loop with a potentially
modified list, but it doesn't actually look at the list after the first
time. So the muddy part is still there.

I think rather than changing the code, you'd do better with a comment
like:

  /*
   * Note that this only looks at the ref lists the first time
   * it's called. This works out in filter_refs() because even
   * though it may add to "newlist" between calls, the additions
   * will always be for oids that are already in the set.
   */

At which point the original all-in-one function is probably fine (as it
avoids the "return 1 just to join the &&-chain" bit).

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
  2017-05-11 17:00     ` Brandon Williams
@ 2017-05-13  9:29       ` Jeff King
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-13  9:29 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Jonathan Tan, git, jrnieder, spearce

On Thu, May 11, 2017 at 10:00:50AM -0700, Brandon Williams wrote:

> > None of this is your problem now either way; the advertisement-limiting
> > extension is still vaporware, albeit one we've discussed a lot. I just
> > wanted to make sure we weren't painting ourselves into any corners. And
> > I think this case could probably be handled.
> 
> I can't remember, is there any reason why it is still vaporware? As in
> what is holding us back from doing the advertisement-limiting (or doing
> away with the initial ref  advertisement)?

The tricky part is handling the compatibility issues. The client has no
way to speak first in the current protocol, so we have to break protocol
to tell the server our refspecs before the advertisement.

I actually put together patches last fall for this, got derailed by a
small snag while polishing them, and just haven't picked them up again.
If you're interested, they're at the jk/early-caps-wip branch of
https://github.com/peff/git.

I got as far as breaking it all up into patches, but the commit messages
still need to be written. Some of them are pretty obvious to me still
after 6 months, but a few are a bit inscrutable.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6] fetch-pack: always allow fetching of literal SHA1s
  2017-05-13  8:36   ` Jeff King
@ 2017-05-15  1:26     ` Junio C Hamano
  0 siblings, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2017-05-15  1:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git, jrnieder

Jeff King <peff@peff.net> writes:

> On Fri, May 12, 2017 at 01:46:48PM -0700, Jonathan Tan wrote:
>
>> Change from v5: used "ensure_tip_oids_initialized" function instead.
>> This removes some of the muddiness (e.g. with newlist being modified
>> after the function).
>
> I don't think it really improves the muddiness. You are still calling
> the ensure function each time through the loop with a potentially
> modified list, but it doesn't actually look at the list after the first
> time. So the muddy part is still there.
>
> I think rather than changing the code, you'd do better with a comment
> like:
>
>   /*
>    * Note that this only looks at the ref lists the first time
>    * it's called. This works out in filter_refs() because even
>    * though it may add to "newlist" between calls, the additions
>    * will always be for oids that are already in the set.
>    */
>
> At which point the original all-in-one function is probably fine (as it
> avoids the "return 1 just to join the &&-chain" bit).

Yes.  I agree that the in-code comment is the best approach to the
"muddy combination of 'grabbing it for the first time' and 'the
thing that is grabbed mutates in the loop'" confusion.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v7] fetch-pack: always allow fetching of literal SHA1s
  2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
                   ` (6 preceding siblings ...)
  2017-05-12 20:46 ` [PATCH v6] " Jonathan Tan
@ 2017-05-15 17:32 ` Jonathan Tan
  2017-05-15 17:46   ` Jonathan Nieder
  2017-05-15 22:10   ` Jeff King
  7 siblings, 2 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-05-15 17:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, peff, gitster

fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---

OK, one combined function (for lazy initialization and checking
containment in the oidset) with comment it is.

Change from v6: changed back to "tip_oids_contain", and included Peff's
comment.
---
 fetch-pack.c          | 40 ++++++++++++++++++++++++++++++++++++++--
 t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index afb8b0502..703e7ec78 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,13 +593,38 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
+static int tip_oids_contain(struct oidset *tip_oids,
+			    struct ref *unmatched, struct ref *newlist,
+			    const struct object_id *id)
+{
+	/*
+	 * Note that this only looks at the ref lists the first time it's
+	 * called. This works out in filter_refs() because even though it may
+	 * add to "newlist" between calls, the additions will always be for
+	 * oids that are already in the set.
+	 */
+	if (!tip_oids->map.tablesize) {
+		add_refs_to_oidset(tip_oids, unmatched);
+		add_refs_to_oidset(tip_oids, newlist);
+	}
+	return oidset_contains(tip_oids, id);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
 {
 	struct ref *newlist = NULL;
 	struct ref **newtail = &newlist;
+	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
 	int i;
 
 	i = 0;
@@ -631,7 +657,8 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->next = NULL;
 			newtail = &ref->next;
 		} else {
-			free(ref);
+			ref->next = unmatched;
+			unmatched = ref;
 		}
 	}
 
@@ -648,7 +675,9 @@ static void filter_refs(struct fetch_pack_args *args,
 			continue;
 
 		if ((allow_unadvertised_object_request &
-		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
+		     (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
+		    tip_oids_contain(&tip_oids, unmatched, newlist,
+				     &ref->old_oid)) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
 			newtail = &(*newtail)->next;
@@ -656,6 +685,13 @@ static void filter_refs(struct fetch_pack_args *args,
 			ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
 		}
 	}
+
+	oidset_clear(&tip_oids);
+	for (ref = unmatched; ref; ref = next) {
+		next = ref->next;
+		free(ref);
+	}
+
 	*refs = newlist;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b5865b385..80a1a3239 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,41 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'fetch-pack can fetch a raw sha1 that is advertised as a ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master)
+'
+
+test_expect_success 'fetch-pack can fetch a raw sha1 overlapping a named ref' '
+	rm -rf server client &&
+	git init server &&
+	test_commit -C server 1 &&
+	test_commit -C server 2 &&
+
+	git init client &&
+	git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/tags/1) refs/tags/1
+'
+
+test_expect_success 'fetch-pack cannot fetch a raw sha1 that is not advertised as a ref' '
+	rm -rf server &&
+
+	git init server &&
+	test_commit -C server 5 &&
+	git -C server tag -d 5 &&
+	test_commit -C server 6 &&
+
+	git init client &&
+	test_must_fail git -C client fetch-pack ../server \
+		$(git -C server rev-parse refs/heads/master^) 2>err &&
+	test_i18ngrep "Server does not allow request for unadvertised object" err
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
-- 
2.13.0.rc2.291.g57267f2277-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v7] fetch-pack: always allow fetching of literal SHA1s
  2017-05-15 17:32 ` [PATCH v7] " Jonathan Tan
@ 2017-05-15 17:46   ` Jonathan Nieder
  2017-05-15 22:10   ` Jeff King
  1 sibling, 0 replies; 47+ messages in thread
From: Jonathan Nieder @ 2017-05-15 17:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, gitster

Jonathan Tan wrote:

> fetch-pack, when fetching a literal SHA-1 from a server that is not
> configured with uploadpack.allowtipsha1inwant (or similar), always
> returns an error message of the form "Server does not allow request for
> unadvertised object %s". However, it is sometimes the case that such
> object is advertised. This situation would occur, for example, if a user
> or a script was provided a SHA-1 instead of a branch or tag name for
> fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
> that SHA-1.
>
> Teach fetch-pack to also check the SHA-1s of the refs in the received
> ref advertisement if a literal SHA-1 was given by the user.
>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c          | 40 ++++++++++++++++++++++++++++++++++++++--
>  t/t5500-fetch-pack.sh | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 73 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

I think we've done enough golf on this one.

Thanks for your patient work.
Jonathan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v7] fetch-pack: always allow fetching of literal SHA1s
  2017-05-15 17:32 ` [PATCH v7] " Jonathan Tan
  2017-05-15 17:46   ` Jonathan Nieder
@ 2017-05-15 22:10   ` Jeff King
  1 sibling, 0 replies; 47+ messages in thread
From: Jeff King @ 2017-05-15 22:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, gitster

On Mon, May 15, 2017 at 10:32:20AM -0700, Jonathan Tan wrote:

> OK, one combined function (for lazy initialization and checking
> containment in the oidset) with comment it is.
> 
> Change from v6: changed back to "tip_oids_contain", and included Peff's
> comment.

Thanks, looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2017-05-15 22:10 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
2017-05-09 22:16 ` Jeff King
2017-05-10  4:22   ` Shawn Pearce
2017-05-10  4:33     ` Jeff King
2017-05-10  4:46       ` Mike Hommey
2017-05-10 17:50         ` Ævar Arnfjörð Bjarmason
2017-05-10 18:20           ` Jonathan Nieder
2017-05-10 18:48             ` Martin Fick
2017-05-10 18:54               ` Jonathan Nieder
2017-05-10  4:57       ` Shawn Pearce
2017-05-10 17:00       ` Jonathan Nieder
2017-05-10 18:55         ` Sebastian Schuberth
2017-05-11  9:59         ` Jeff King
2017-05-11 19:03           ` Jonathan Nieder
2017-05-11 21:04             ` Jeff King
2017-05-10 16:44 ` [PATCH v2] " Jonathan Tan
2017-05-10 18:01   ` Jonathan Nieder
2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
2017-05-10 23:22   ` Jonathan Nieder
2017-05-11  9:46   ` Jeff King
2017-05-11 17:51     ` Jonathan Tan
2017-05-11 20:52       ` Jeff King
2017-05-11 10:05   ` Jeff King
2017-05-11 17:00     ` Brandon Williams
2017-05-13  9:29       ` Jeff King
2017-05-11 21:14 ` [PATCH v4] " Jonathan Tan
2017-05-11 21:35   ` Jonathan Nieder
2017-05-11 21:59     ` Jeff King
2017-05-11 22:30 ` [PATCH v5] " Jonathan Tan
2017-05-11 22:46   ` Jonathan Nieder
2017-05-12  2:59     ` Jeff King
2017-05-12  6:01     ` Junio C Hamano
2017-05-12  7:59       ` Jeff King
2017-05-12  8:14         ` Jeff King
2017-05-12 18:00           ` Jonathan Tan
2017-05-13  8:30             ` Jeff King
2017-05-12 18:09         ` Jonathan Tan
2017-05-12 19:06           ` Jonathan Nieder
2017-05-12  3:06   ` Jeff King
2017-05-12 20:45 ` Jonathan Tan
2017-05-12 20:46 ` [PATCH v6] " Jonathan Tan
2017-05-12 22:28   ` Jonathan Nieder
2017-05-13  8:36   ` Jeff King
2017-05-15  1:26     ` Junio C Hamano
2017-05-15 17:32 ` [PATCH v7] " Jonathan Tan
2017-05-15 17:46   ` Jonathan Nieder
2017-05-15 22:10   ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.