All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Issue 323 in msysgit: Can't clone over http
       [not found] <0016e6470f36315b8a0472bc75a8@google.com>
@ 2009-09-04 10:25 ` Junio C Hamano
  2009-09-04 13:29 ` Tay Ray Chuan
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-09-04 10:25 UTC (permalink / raw)
  To: git, Tay Ray Chuan; +Cc: msysgit


codesite-noreply@google.com writes:

> Status: New
> Owner: ----
>
> New issue 323 by bjelli: Can't clone over http
> http://code.google.com/p/msysgit/issues/detail?id=323
>
> What steps will reproduce the problem?
> 1.Install Git-1.6.4-preview20090730.exe
> 2.Clone exsiting repository http://github.com/tekkub/addontemplate.git

This does not seem to be an msysgit issue.  Even on a Linux host, v1.6.2.5
seems to work Ok but 'maint', 'master', nor 'next' does not clone this one
correctly.

> Output:
> got 2c8851d269d51676b8c626e63991ee68a6f5d578
> walk 2c8851d269d51676b8c626e63991ee68a6f5d578
> got 758419d18ad255c3417ca341c6e12c6ca1aa203e
> got fa8a1ec5a791c245789f70e90a844f2b9a275991
> walk fa8a1ec5a791c245789f70e90a844f2b9a275991
> got e884a228df0e08e0f862edab6012d8407907ab48
> got f7ea166470af2538a6a19642f8c45213bac7bd40
> got 6100842656e95bf50f2c6f3ff6e997bcbe2474cc
> got 445c0ea7c7193f6fcb42b32db50104926d328322
> got a44d6309e48622590b2780f96bed371122db6b71
> got 3ecefa3f04f394f64f8fe7be14ac20e69f2f2c18
> Getting alternates list for http://github.com/tekkub/addontemplate.git
> Getting pack list for http://github.com/tekkub/addontemplate.git
> error: Unable to verify pack 382c25c935b744e909c749532578112d72a4aff9 is
> available
> error: Unable to find 0a41ac04d56ccc96491989dc71d9875cd804fc6b under
> http://github.com/tekkub/addontemplate.git
> Cannot obtain needed blob 0a41ac04d56ccc96491989dc71d9875cd804fc6b
> while processing commit fa8a1ec5a791c245789f70e90a844f2b9a275991.
> fatal: Fetch failed.
>
> What version of the product are you using? On what operating system?
> Git-1.6.4-preview20090730.exe
> Windows XP
>
> Please provide any additional information below.
> Defect was discussed on the github support board here:
> http://support.github.com/discussions/repos/957-cant-clone-over-http-or-git

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
       [not found] <0016e6470f36315b8a0472bc75a8@google.com>
  2009-09-04 10:25 ` Issue 323 in msysgit: Can't clone over http Junio C Hamano
@ 2009-09-04 13:29 ` Tay Ray Chuan
  2009-09-07  4:59   ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-04 13:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit

Hi,

On Fri, Sep 4, 2009 at 6:25 PM, Junio C Hamano<gitster@pobox.com> wrote:
> codesite-noreply@google.com writes:
>
>> Status: New
>> Owner: ----
>>
>> New issue 323 by bjelli: Can't clone over http
>> http://code.google.com/p/msysgit/issues/detail?id=323

Junio, thanks for bringing this issue to the list's and my attention.

> This does not seem to be an msysgit issue.  Even on a Linux host, v1.6.2.5
> seems to work Ok but 'maint', 'master', nor 'next' does not clone this one
> correctly.

Releases including and after v1.6.4 will have this issue:

>> error: Unable to verify pack 382c25c935b744e909c749532578112d72a4aff9 is
>> available
>> error: Unable to find 0a41ac04d56ccc96491989dc71d9875cd804fc6b under
>> http://github.com/tekkub/addontemplate.git

The issue at hand is due to git checking the http repository for the
pack file before commencing the transfer; failing which, the transfer
aborts.

Right now, git chokes on the 500 error that github.com gives it, which
shouldn't be the case, even though that's a weird response.

--
Cheers,
Ray Chuan

-- >8 --
Subject: [PATCH] http.c: clarify missing-pack-check

Abort the pack transfer only if the pack is not available in the HTTP-
served repository; otherwise, allow the transfer to continue, even if
the check failed.

This addresses an issue raised by bjelli:

  http://code.google.com/p/msysgit/issues/detail?id=323

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 http.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/http.c b/http.c
index 5926c5b..cba7e9a 100644
--- a/http.c
+++ b/http.c
@@ -864,6 +864,7 @@ int http_fetch_ref(const char *base, struct ref *ref)
 static int fetch_pack_index(unsigned char *sha1, const char *base_url)
 {
 	int ret = 0;
+	int result;
 	char *hex = xstrdup(sha1_to_hex(sha1));
 	char *filename;
 	char *url;
@@ -874,11 +875,14 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
 	strbuf_addf(&buf, "objects/pack/pack-%s.pack", hex);
 	url = strbuf_detach(&buf, 0);

-	if (http_get_strbuf(url, NULL, 0)) {
-		ret = error("Unable to verify pack %s is available",
+	result = http_get_strbuf(url, NULL, 0);
+	if (result == HTTP_MISSING_TARGET) {
+		ret = error("Unable to find pack %s",
 			    hex);
 		goto cleanup;
-	}
+	} else if (result && http_is_verbose)
+		fprintf(stderr, "Unable to verify pack %s is available\n",
+			hex);

 	if (has_pack_index(sha1)) {
 		ret = 0;
--
1.6.4.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-04 13:29 ` Tay Ray Chuan
@ 2009-09-07  4:59   ` Junio C Hamano
  2009-09-07  5:53     ` Tay Ray Chuan
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2009-09-07  4:59 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git, msysgit

Tay Ray Chuan <rctay89@gmail.com> writes:

> Subject: [PATCH] http.c: clarify missing-pack-check
>
> Abort the pack transfer only if the pack is not available in the HTTP-
> served repository; otherwise, allow the transfer to continue, even if
> the check failed.
>
> This addresses an issue raised by bjelli:
>
>   http://code.google.com/p/msysgit/issues/detail?id=323
>
> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
> ---
>  http.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/http.c b/http.c
> index 5926c5b..cba7e9a 100644
> --- a/http.c
> +++ b/http.c
> @@ -864,6 +864,7 @@ int http_fetch_ref(const char *base, struct ref *ref)
>  static int fetch_pack_index(unsigned char *sha1, const char *base_url)
>  {
>  	int ret = 0;
> +	int result;
>  	char *hex = xstrdup(sha1_to_hex(sha1));
>  	char *filename;
>  	char *url;
> @@ -874,11 +875,14 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
>  	strbuf_addf(&buf, "objects/pack/pack-%s.pack", hex);
>  	url = strbuf_detach(&buf, 0);
>
> -	if (http_get_strbuf(url, NULL, 0)) {
> -		ret = error("Unable to verify pack %s is available",
> +	result = http_get_strbuf(url, NULL, 0);
> +	if (result == HTTP_MISSING_TARGET) {
> +		ret = error("Unable to find pack %s",
>  			    hex);
>  		goto cleanup;
> -	}
> +	} else if (result && http_is_verbose)
> +		fprintf(stderr, "Unable to verify pack %s is available\n",
> +			hex);
>
>  	if (has_pack_index(sha1)) {
>  		ret = 0;

You said:

> Releases including and after v1.6.4 will have this issue:
>
>>> error: Unable to verify pack 382c25c935b744e909c749532578112d72a4aff9 is
>>> available
>>> error: Unable to find 0a41ac04d56ccc96491989dc71d9875cd804fc6b under
>>> http://github.com/tekkub/addontemplate.git
>
> The issue at hand is due to git checking the http repository for the
> pack file before commencing the transfer; failing which, the transfer
> aborts.
>
> Right now, git chokes on the 500 error that github.com gives it, which
> shouldn't be the case, even though that's a weird response.

I am assuming that you meant 39dc52c was the culprit by "including and
after v1.6.4", but it is not quite clear how this patch helps if that is
the case.

Before 39dc52c (http: use new http API in fetch_index(), 2009-06-06), we
used to run the slot by hand and checked results.curl_request against
CURLE_OK.  Everything else was an error.

With the updated code, that all went to http_get_strbuf() which in turn
calls http_request() that does the same thing, and the function returns
HTTP_OK only when it gets CURLE_OK, but now it says MISSING_TARGET when we
ask for an idx file we think exists in the repository but the server says
it doesn't, and all other errors will be ignored with this patch.

If this codepath is what was broken by github returning 500 [*1*], the
client before 39dc52c would have failed the same way.  I do not think
loosening error checking like this is a real solution, but I may be
reading the patch incorrectly.

Do people on the github side see something strange in the log?  Perhaps we
think we are making a request to objects/pack/ of the repository but by
mistake the request is going to somewhere completely off (but then we
would get 401 not 500).


[Footnote]

*1* Which I do agree is somewhat strange thing to say for a request to a
file in the objects/pack directory in a public repository---I would
understand it if it were 404, but then it means the repository is
inconsistent, i.e. has a stale objects/info/packs relative to its set of
packs it has.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07  4:59   ` Junio C Hamano
@ 2009-09-07  5:53     ` Tay Ray Chuan
  2009-09-07  7:10       ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-07  5:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit

Hi,

On Mon, Sep 7, 2009 at 12:59 PM, Junio C Hamano<gitster@pobox.com> wrote:
> Before 39dc52c (http: use new http API in fetch_index(), 2009-06-06), we
> used to run the slot by hand and checked results.curl_request against
> CURLE_OK.  Everything else was an error.
>
> With the updated code, that all went to http_get_strbuf() which in turn
> calls http_request() that does the same thing, and the function returns
> HTTP_OK only when it gets CURLE_OK, but now it says MISSING_TARGET when we
> ask for an idx file we think exists in the repository but the server says
> it doesn't, and all other errors will be ignored with this patch.

We should only be interested in the MISSING_TARGET error, because it
tells us that the pack file is indeed not available. We aren't
interested in other errors, like being unable to perform the request
(HTTP_START_FAILED), or, say, a 401 (Unauthorized) error, or even a
500; we simply move along and we tell the user we couldn't perform the
check.

> If this codepath is what was broken by github returning 500 [*1*], the
> client before 39dc52c would have failed the same way.  I do not think
> loosening error checking like this is a real solution, but I may be
> reading the patch incorrectly.

You're right to say that git before 39dc52c would have failed. It did,
but no one had the chance to break anything, because 39dc52c was part
of my http patch series that only went wild in v1.6.4.

We can trace this back to 48188c2 ("http-walker: verify remote
packs"), which copied the "feature" from http-push.c to http-walker.c.

Before that, if you tried fetching a pack with http-push.c from a
500-prone server, you would also experience this.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07  5:53     ` Tay Ray Chuan
@ 2009-09-07  7:10       ` Junio C Hamano
  2009-09-07  8:18         ` Junio C Hamano
  2009-09-07  9:27         ` Tay Ray Chuan
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-09-07  7:10 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git, msysgit, Tom Preston-Werner

Tay Ray Chuan <rctay89@gmail.com> writes:

> We should only be interested in the MISSING_TARGET error, because it
> tells us that the pack file is indeed not available. We aren't
> interested in other errors, like being unable to perform the request
> (HTTP_START_FAILED), or, say, a 401 (Unauthorized) error, or even a
> 500; we simply move along and we tell the user we couldn't perform the
> check.

We couldn't perform the check, and then what happens?  We continue as if
everything is peachy, assuming that the *.idx file we thought we were
going to get describe what objects are in the corresponding pack, and barf
when we try to read the *.idx file that we failed to download even though
the server said we should be able to get it?

> You're right to say that git before 39dc52c would have failed. It did,
> but no one had the chance to break anything, because 39dc52c was part
> of my http patch series that only went wild in v1.6.4.
>
> We can trace this back to 48188c2 ("http-walker: verify remote
> packs"), which copied the "feature" from http-push.c to http-walker.c.

Ahh, v1.6.3 ships with fetch_index() that checks CURLE_OK and returns an
error(), but that is about .idx file, and it did not have the "do they
have the corresponding .pack file?" check introduced by 48188c2
(http-walker: verify remote packs, 2009-06-06), which is what makes the
server give us 500 error.  Before that check, we ignored a request to
fetch_index() if we already had one.

Why do we even call fetch_index() when we have one?  After all, we are
talking about "git clone" here, so it is not about "we failed once and the
previous attempt left .idx files we fetched".  Why

should we even have .idx file to begin with, that would have protected
v1.6.3 clients from getting this error?

Unless we are calling fetch_index() on the same .idx file twice from our
own stupidity, that is.

The same logic now is in fetch_pack_index(), which is called from
fetch_and_setup_pack_index().  I do not still see how we end up calling
the function for the same .idx file twice, though.

The repository in question http://github.com/tekkub/addontemplate.git/
return one liner file when asked for $URL/objects/info/packs.

Which means that it is not like that the loop in http_get_info_packs() is
calling the fetch_and_setup_pack() twice because the server lists the same
pack twice.  But if your patch matters, somebody is causing us to call the
function twice for the same .idx file, and I do not see where it is.

There definitely is something else going on.

Having said all that, after digging some more, I came to the conclusion
that I'd rather not see us proceed with bug hunting, based on the failures
we see with the current github repositories over http.  Read on for the
reasons.

The github's URL responds to request for $URL/HEAD and tells us that it
points at refs/heads/master, but requests for $URL/packed-refs and
$URL/refs/heads/master go to somewhere completely unrelated to the
request, without giving any failure indication.

In order to support fetch/clone over HTTP, a server at least must respond
to requests to the follwoing locations sensibly, meaning that it gives us
back the data without frills if they exist, and give us Not found if they
don't.

 - $URL/objects/info/refs

   This must list all the refs available in the repository and must be
   up-to-date.  We do not run PROPFIND, nor parse $URL/refs/index.htm, but
   trust this file's contents and start from there.

 - $URL/objects/info/packs

   This must list all the packs in the repository and must be up-to-date.
   We do not run PROPFIND, nor parse $URL/objects/pack/index.htm, but
   trust this file's contents.  If the repository does not have any pack,
   request to this file must give us Not found.

 - $URL/packed-refs

   This may be a valid packed-refs file left after "git pack-refs".  If
   the repository's refs are not packed, the file may not exist, and that
   is Ok, but in that case, the request must give us Not found.

 - $URL/objects/pack/pack-*.pack and pack-*.idx

   If the repository lacks what we asked for , the request must result in Not
   found.

 - $URL/objects/[0-9a-f][0-9][a-f]/*

   Loose objects.  If the repository lacks what we asked for, the request
   must result in Not found.

It appears that github always redirects the request to some random project
page when Not found response is expected, which is very broken from the
point of view of git-fetch/git-clone.

I cannot tell offhand if it is just this "addontemplate.git" repository,
or if the way github arranges the URL space is fundamentally broken and
all of their repositories are unusable in exactly the same way (their URL
space seems to overlay UI pages meant for browsers over output meant to be
read by git-fetch/git-clone).

In either case, cloning over http from that "addontemplate" URL is not
expected to work right now, and the primary reason is that the server is
utterly misbehaving.

Hopefully that is a temporary breakage something github folks can promptly
fix.  Tom, could you have your server folks in touch with the git mailing
list, so that we can figure out what is going on?  Or are they already on
top of this issue and we can just wait (hopefully not for too long)?

In the meantime, I do not think it is a good idea to loosen the error
checking on our side to accomodate a server in such a (hopefully
temporary) broken state, however popular it is.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07  7:10       ` Junio C Hamano
@ 2009-09-07  8:18         ` Junio C Hamano
  2009-09-07  9:27         ` Tay Ray Chuan
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-09-07  8:18 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: Junio C Hamano, git, msysgit, Tom Preston-Werner

Junio C Hamano <gitster@pobox.com> writes:

> In order to support fetch/clone over HTTP, a server at least must respond
> to requests to the follwoing locations sensibly, meaning that it gives us
> back the data without frills if they exist, and give us Not found if they
> don't.
>
>  - $URL/objects/info/refs
>
>    This must list all the refs available in the repository and must be
>    up-to-date.  We do not run PROPFIND, nor parse $URL/refs/index.htm, but
>    trust this file's contents and start from there.

Correction.  This is $URL/info/refs and the github repository in question
does respond correctly.  Sorry about the confusion.

Recent code makes CURLOPT_NOBODY request, which will turn into a HEAD
request over HTTP, to packfiles, in order to see if they exist.  Perhaps
github is not prepared to handle that and returns 500, even though it will
give the pack correctly if asked with a GET request?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07  7:10       ` Junio C Hamano
  2009-09-07  8:18         ` Junio C Hamano
@ 2009-09-07  9:27         ` Tay Ray Chuan
  2009-09-07 19:06           ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-07  9:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit, Tom Preston-Werner

Hi,

On Mon, Sep 7, 2009 at 3:10 PM, Junio C Hamano<gitster@pobox.com> wrote:
> We couldn't perform the check, and then what happens?  We continue as if
> everything is peachy, assuming that the *.idx file we thought we were
> going to get describe what objects are in the corresponding pack, and barf
> when we try to read the *.idx file that we failed to download even though
> the server said we should be able to get it?

We didn't check for the *.idx, only the corresponding *.pack file.

About barfing on a failed *.idx file transfer: in 48188c2 and later, we
still do have ample checks here and there in the HTTP api. http_get_file()
returns a HTTP_ERROR if we fail to get a file (a *.idx file here).

The issue is that the check on the file (a HEAD request) fails, but not
its actual transfer (a GET request). There's no compromise on error
handling; we can live with a bad response to HEAD requests, but we will
still fail if we can't GET the file.

> Ahh, v1.6.3 ships with fetch_index() that checks CURLE_OK and returns an
> error(), but that is about .idx file, and it did not have the "do they
> have the corresponding .pack file?" check introduced by 48188c2
> (http-walker: verify remote packs, 2009-06-06), which is what makes the
> server give us 500 error.

Just to clarify: the "check" of CURLE_OK in http-walker.c::fetch_index()
in v1.6.3 is fundamentally different from the check we have in 48188c2
(http-walker: verify remote packs, 2009-06-06).

The first "check" is a full-blooded GET request, and we do get back
actual data (in this case, the pack index file). The second check isn't
a GET request, just an inconsequential HEAD request; we don't get back
any real data.

> Before that check, we ignored a request to
> fetch_index() if we already had one.

48188c2 didn't really remove the check (for the presence of the *.idx
locally), it just pushed it further down.

But being placed so late makes it rather ineffective and useless.

> Why do we even call fetch_index() when we have one?  After all, we are
> talking about "git clone" here, so it is not about "we failed once and the
> previous attempt left .idx files we fetched".  Why
>
> should we even have .idx file to begin with, that would have protected
> v1.6.3 clients from getting this error?
>
> Unless we are calling fetch_index() on the same .idx file twice from our
> own stupidity, that is.
>
> The same logic now is in fetch_pack_index(), which is called from
> fetch_and_setup_pack_index().  I do not still see how we end up calling
> the function for the same .idx file twice, though.

We should bump up this check before the verify-remote-pack one, I think
there's no mysterious bug here to find.

> In the meantime, I do not think it is a good idea to loosen the error
> checking on our side to accomodate a server in such a (hopefully
> temporary) broken state, however popular it is.

Agreed. I didn't intend my patch to loosen up error checking, merely to
be clearer about what we're looking for. If read in another context
(separate from fixing cloning over github.com), my patch can be seen as
one that clarifies the verify-remote-pack check:

  Case 1: A 404 is received. The pack file is not available, so we stop.

  Case 2: Our check failed, due to some reason (request failed,
          unauthorized, etc). Nothing conclusive about availabilty of
          file. Continue anyway.

PS. I wrote 48188c2 with the aim of having http-push.c and http-walker.c
fetch behaviour match each other as closely as possible, and later move
these out into http. At that time, I toyed with the idea of removing
the check. I wondered if the overhead incurred by the HEAD request was
useful, since if there's going to be an error, we would be hit anyway
with a GET. We already do this for almost every other http request
(info/refs, objects/info/packs, etc). But the patch series was meant to
be a refactor job, not a feature/behaviour change, so there you have it.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07  9:27         ` Tay Ray Chuan
@ 2009-09-07 19:06           ` Junio C Hamano
  2009-09-08 12:57             ` Jakub Narebski
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-09-07 19:06 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git, msysgit, Tom Preston-Werner

Tay Ray Chuan <rctay89@gmail.com> writes:

> Just to clarify: the "check" of CURLE_OK in http-walker.c::fetch_index()
> in v1.6.3 is fundamentally different from the check we have in 48188c2
> (http-walker: verify remote packs, 2009-06-06).
>
> The first "check" is a full-blooded GET request, and we do get back
> actual data (in this case, the pack index file). The second check isn't
> a GET request, just an inconsequential HEAD request; we don't get back
> any real data.

Yeah, I realized after I wrote the message you are responding to and ran
some experiments with github repository myself to see for some of their
URLs HEAD gives 500 when GET gives the contents just fine.

  Tom, sorry to have given a confusing list of issues in my earlier message.
  Please disregard it.  The only funny your HTTP server folks may want to
  look into is that GET request to fetch the following URL gives the
  contents just fine, but HEAD request to it results in an Internal Server
  Error.

  http://github.com/tekkub/addontemplate.git/objects/pack/pack-382c25c935b744e909c749532578112d72a4aff9.pack

  Sorry about the confusion.

> Agreed. I didn't intend my patch to loosen up error checking, merely to
> be clearer about what we're looking for. If read in another context
> (separate from fixing cloning over github.com), my patch can be seen as
> one that clarifies the verify-remote-pack check:
>
>   Case 1: A 404 is received. The pack file is not available, so we stop.
>
>   Case 2: Our check failed, due to some reason (request failed,
>           unauthorized, etc). Nothing conclusive about availabilty of
>           file. Continue anyway.

I am torn about this.

On one hand, if we are going to treat such a half failure as nothing
conclusive, I do not see a point to keep that check to begin with.

On the other hand, if a HEAD request to a URL results in an unauthorized,
what plausible excuse the server operator could give for allowing a GET
request to the same URL?  If we are going to keep the check if *.pack that
corresponds to the *.idx will be available, shouldn't we trust whatever
check we perform?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07 19:06           ` Junio C Hamano
@ 2009-09-08 12:57             ` Jakub Narebski
  2009-09-08 13:18               ` Tay Ray Chuan
  2009-09-08 13:10             ` Tay Ray Chuan
  2009-09-09 12:33             ` Tay Ray Chuan
  2 siblings, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2009-09-08 12:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tay Ray Chuan, git, msysgit, Tom Preston-Werner


Junio C Hamano <gitster@pobox.com> writes:
> Tay Ray Chuan <rctay89@gmail.com> writes:
> 
> > Just to clarify: the "check" of CURLE_OK in http-walker.c::fetch_index()
> > in v1.6.3 is fundamentally different from the check we have in 48188c2
> > (http-walker: verify remote packs, 2009-06-06).
> >
> > The first "check" is a full-blooded GET request, and we do get back
> > actual data (in this case, the pack index file). The second check isn't
> > a GET request, just an inconsequential HEAD request; we don't get back
> > any real data.
> 
> Yeah, I realized after I wrote the message you are responding to and ran
> some experiments with github repository myself to see for some of their
> URLs HEAD gives 500 when GET gives the contents just fine.
> 
>   Tom, sorry to have given a confusing list of issues in my earlier message.
>   Please disregard it.  The only funny your HTTP server folks may want to
>   look into is that GET request to fetch the following URL gives the
>   contents just fine, but HEAD request to it results in an Internal Server
>   Error.
> 
>   http://github.com/tekkub/addontemplate.git/objects/pack/pack-382c25c935b744e909c749532578112d72a4aff9.pack
> 
>   Sorry about the confusion.

[...]

> On the other hand, if a HEAD request to a URL results in an unauthorized,
> what plausible excuse the server operator could give for allowing a GET
> request to the same URL?  If we are going to keep the check if *.pack that
> corresponds to the *.idx will be available, shouldn't we trust whatever
> check we perform?

500 Internal Server Error doesn't look right (well, it can indicate
bug in server code), but would git respond to correct error code other
than 404 Not Found, like 405 Method Not Allowed, or 501 Not Implemented?

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07 19:06           ` Junio C Hamano
  2009-09-08 12:57             ` Jakub Narebski
@ 2009-09-08 13:10             ` Tay Ray Chuan
  2009-09-09 12:33             ` Tay Ray Chuan
  2 siblings, 0 replies; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-08 13:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit, Tom Preston-Werner

Hi,

On Tue, Sep 8, 2009 at 3:06 AM, Junio C Hamano<gitster@pobox.com> wrote:
> I am torn about this.
>
> On one hand, if we are going to treat such a half failure as nothing
> conclusive, I do not see a point to keep that check to begin with.
>
> On the other hand, if a HEAD request to a URL results in an unauthorized,
> what plausible excuse the server operator could give for allowing a GET
> request to the same URL?  If we are going to keep the check if *.pack that
> corresponds to the *.idx will be available, shouldn't we trust whatever
> check we perform?

I am in favour or removing this check, not just due to its
unreliability, but for the sake of consistency (we very rarely send a
HEAD request to poll data before doing a GET).

Do disregard the patch included earlier, if you haven't already done so.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-08 12:57             ` Jakub Narebski
@ 2009-09-08 13:18               ` Tay Ray Chuan
  0 siblings, 0 replies; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-08 13:18 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, git, msysgit, Tom Preston-Werner

Hi,

On Tue, Sep 8, 2009 at 8:57 PM, Jakub Narebski<jnareb@gmail.com> wrote:
> 500 Internal Server Error doesn't look right (well, it can indicate
> bug in server code), but would git respond to correct error code other
> than 404 Not Found, like 405 Method Not Allowed, or 501 Not Implemented?

I believe you're referring to git response to such error codes for GET requests?

For 404, we'll try using alternates (objects/info/http-alternates,
objects/info/alternates).

For the rest, we'll just abort the request. I'm not really sure of the
specific responses by git, like if we do ask for authentication, in
the case of 401s.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-07 19:06           ` Junio C Hamano
  2009-09-08 12:57             ` Jakub Narebski
  2009-09-08 13:10             ` Tay Ray Chuan
@ 2009-09-09 12:33             ` Tay Ray Chuan
  2009-09-09 19:08               ` Junio C Hamano
  2009-09-11  8:54               ` Junio C Hamano
  2 siblings, 2 replies; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-09 12:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit, Tom Preston-Werner, Jakub Narebski

Hi,

On Tue, Sep 8, 2009 at 9:10 PM, Tay Ray Chuan <rctay89@gmail.com> wrote:
> On Tue, Sep 8, 2009 at 3:06 AM, Junio C Hamano<gitster@pobox.com> wrote:
>> I am torn about this.
>>
>> On one hand, if we are going to treat such a half failure as nothing
>> conclusive, I do not see a point to keep that check to begin with.
>>
>> On the other hand, if a HEAD request to a URL results in an unauthorized,
>> what plausible excuse the server operator could give for allowing a GET
>> request to the same URL?  If we are going to keep the check if *.pack that
>> corresponds to the *.idx will be available, shouldn't we trust whatever
>> check we perform?
>
> I am in favour or removing this check, not just due to its
> unreliability, but for the sake of consistency (we very rarely send a
> HEAD request to poll data before doing a GET).

my patch below is in response to earlier deliberation.

I still think github.com should look into this issue (of differing
responses for HEAD and GET requests).

 -- >8 --

Subject: [PATCH] http.c: remove verification of remote packs

Make http.c::fetch_pack_index() no longer check for the remote pack
with a HEAD request before fetching the corresponding pack index file.

Not only does sending a HEAD request before we do a GET incur a
performance penalty, it does not offer any significant error-
prevention advantages (pack fetching in the *_http_pack_request()
methods is capable of handling any errors on its own).

This addresses an issue raised elsewhere:

  http://code.google.com/p/msysgit/issues/detail?id=323
  http://support.github.com/discussions/repos/957-cant-clone-over-http-or-git

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---

Junio, I'm not sure if the credits and references ("This addresses...")
should be included, since the patch doesn't look like it's fixing any
thing, even though it is a response to an acknowledged issue.

Please remove those lines if you so wish.

 http.c |   11 -----------
 1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/http.c b/http.c
index 5926c5b..84def9f 100644
--- a/http.c
+++ b/http.c
@@ -869,17 +869,6 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
 	char *url;
 	struct strbuf buf = STRBUF_INIT;

-	/* Don't use the index if the pack isn't there */
-	end_url_with_slash(&buf, base_url);
-	strbuf_addf(&buf, "objects/pack/pack-%s.pack", hex);
-	url = strbuf_detach(&buf, 0);
-
-	if (http_get_strbuf(url, NULL, 0)) {
-		ret = error("Unable to verify pack %s is available",
-			    hex);
-		goto cleanup;
-	}
-
 	if (has_pack_index(sha1)) {
 		ret = 0;
 		goto cleanup;
--
1.6.4.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-09 12:33             ` Tay Ray Chuan
@ 2009-09-09 19:08               ` Junio C Hamano
  2009-09-11  8:54               ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-09-09 19:08 UTC (permalink / raw)
  To: Tay Ray Chuan
  Cc: Junio C Hamano, git, msysgit, Tom Preston-Werner, Jakub Narebski

Tay Ray Chuan <rctay89@gmail.com> writes:

> I still think github.com should look into this issue (of differing
> responses for HEAD and GET requests).
>
>  -- >8 --
>
> Subject: [PATCH] http.c: remove verification of remote packs
>
> Make http.c::fetch_pack_index() no longer check for the remote pack
> with a HEAD request before fetching the corresponding pack index file.
>
> Not only does sending a HEAD request before we do a GET incur a
> performance penalty, it does not offer any significant error-
> prevention advantages (pack fetching in the *_http_pack_request()
> methods is capable of handling any errors on its own).
>
> This addresses an issue raised elsewhere:
>
>   http://code.google.com/p/msysgit/issues/detail?id=323
>   http://support.github.com/discussions/repos/957-cant-clone-over-http-or-git
>
> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
> ---
>
> Junio, I'm not sure if the credits and references ("This addresses...")
> should be included, since the patch doesn't look like it's fixing any
> thing, even though it is a response to an acknowledged issue.
>
> Please remove those lines if you so wish.

I think the backstory deserves to be recorded in this case, which is what
you did.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-09 12:33             ` Tay Ray Chuan
  2009-09-09 19:08               ` Junio C Hamano
@ 2009-09-11  8:54               ` Junio C Hamano
  2009-09-12 10:01                 ` Tay Ray Chuan
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2009-09-11  8:54 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git, msysgit, Tom Preston-Werner, Jakub Narebski

Tay Ray Chuan <rctay89@gmail.com> writes:

>> I am in favour or removing this check, not just due to its
>> unreliability, but for the sake of consistency (we very rarely send a
>> HEAD request to poll data before doing a GET).

I think this is sensible.  The only case I think it could be a problem is
this scenario:

 - You clone from remote.

 - Remote works more and has two more packs, A and B. 

 - You fetch, but this does not need both of them (perhaps you fetched
   only one branch).  You get *.idx files for both, but *.pack file only
   for A.

 - Remote works a bit more and then repacks everything into a single pack
   C.

 - You fetch, and the walker walks the loose objects, and then finds one
   object that cannot be obtained as a loose object.  It tries to look up
   in the *.idx file and finds it in B.

   But the packfile B is long gone.

I didn't follow the codepath that uses http_get_info_packs() and then uses
repo->packs list to see what it does, but as long as the above does not
happen we should be Ok.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-11  8:54               ` Junio C Hamano
@ 2009-09-12 10:01                 ` Tay Ray Chuan
  2009-09-12 17:31                   ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Tay Ray Chuan @ 2009-09-12 10:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit, Tom Preston-Werner, Jakub Narebski

Hi,

On Fri, Sep 11, 2009 at 4:54 PM, Junio C Hamano <gitster@pobox.com> wrote:
>  - You fetch, and the walker walks the loose objects, and then finds one
>   object that cannot be obtained as a loose object.  It tries to look up
>   in the *.idx file and finds it in B.
>
>   But the packfile B is long gone.
>
> I didn't follow the codepath that uses http_get_info_packs() and then uses
> repo->packs list to see what it does, but as long as the above does not
> happen we should be Ok.

To determine which pack to get when fetching an object, http-walker.c
does not refer to the *.idx files git already has (that is, those
found locally). Instead, it builds a list of *.idx files (repo->packs
or walker->data->alt->packs) from the remote's objects/info/packs, and
uses that.

So even if the *.idx file for pack B was downloaded from a previous
fetch, it won't be used at all.

Therefore, your concern (over fetching a non-existent pack) won't play
out, unless the server does a repack -a, but forgets to update the
pack list (at objects/info/packs).

PS. The above was solely based on my reading of the code, no testing done.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Issue 323 in msysgit: Can't clone over http
  2009-09-12 10:01                 ` Tay Ray Chuan
@ 2009-09-12 17:31                   ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-09-12 17:31 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git, msysgit, Tom Preston-Werner, Jakub Narebski

Tay Ray Chuan <rctay89@gmail.com> writes:

> PS. The above was solely based on my reading of the code, no testing done.

That matches the understanding I had from reading the code.  I just wanted
a sanity check by another person who looked at and worked on the code more
deeply than I did.

Thanks, so the patch is a go.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-09-12 17:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0016e6470f36315b8a0472bc75a8@google.com>
2009-09-04 10:25 ` Issue 323 in msysgit: Can't clone over http Junio C Hamano
2009-09-04 13:29 ` Tay Ray Chuan
2009-09-07  4:59   ` Junio C Hamano
2009-09-07  5:53     ` Tay Ray Chuan
2009-09-07  7:10       ` Junio C Hamano
2009-09-07  8:18         ` Junio C Hamano
2009-09-07  9:27         ` Tay Ray Chuan
2009-09-07 19:06           ` Junio C Hamano
2009-09-08 12:57             ` Jakub Narebski
2009-09-08 13:18               ` Tay Ray Chuan
2009-09-08 13:10             ` Tay Ray Chuan
2009-09-09 12:33             ` Tay Ray Chuan
2009-09-09 19:08               ` Junio C Hamano
2009-09-11  8:54               ` Junio C Hamano
2009-09-12 10:01                 ` Tay Ray Chuan
2009-09-12 17:31                   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.