* Strangely broken git repo [not found] ` <46a038f90510082014i6b296f2bvbac56e25344cbdf2@mail.gmail.com> @ 2005-10-10 4:26 ` Martin Langhoff (CatalystIT) 2005-10-10 9:00 ` Junio C Hamano 0 siblings, 1 reply; 21+ messages in thread From: Martin Langhoff (CatalystIT) @ 2005-10-10 4:26 UTC (permalink / raw) To: git Hi! We are having strange problems pushing (and pulling) with a particular head in a git repo. The repo is publicly available via http, and when I clone it it gives me a lot of errors $ cg-clone http://locke.catalyst.net.nz/git/moodle.git#moodle--topnz ... Getting index for pack 3e3492f365bb0d4a1ae11dfa7cee9ebbf345e647 Getting pack 3113eb34ef85482c87c3575721ce978c7232071f which contains 007a0323cf0941476dc262f6e3aff6bb9600dcd8 error: The requested file was not found ... (many more like these) FINISHED --15:56:38-- Downloaded: 4,723 bytes in 1 files New branch: 6759e2800c0cef00017c63b7dbbed80e481dbe2c Cloned to moodle.git#moodle--topnz/ (origin http://locke.catalyst.net.nz/git/moodle.git#moodle--topnz available as branch "origin") aporo:~/tmp/moodle.git#moodle--topnz martin$ git-fsck-objects bad sha1 file: .git/objects/00/7a0323cf0941476dc262f6e3aff6bb9600dcd8.temp ... (many more like these) When using git+ssh, we are seeing very strange stalls during the fetch, and when trying to push a couple of small new commits, it has given us (from a Debian etch-ppc, git 0.99.8.b) gives us "fatal unpack should have generated <sha1> but I can't find it". This has been discussed earlier here http://www.gelato.unsw.edu.au/archives/git/0508/7152.html -- but our scenario doesn't involve rewinding or any strange trickery, it's just clone - edit - commit - push. From a different machine (Debian sarge, i386, git 0.99.8b) the same operation succeeds. On the repo itself, I've run git-fsck-objects --full --strict and it only complained about dangling tags. There are some heads on that repo that I have deleted a while ago without purging unreachable objects, so I kind of expected those. There are some automated jobs that touch this repo and may have messed things up: - cronjobs running git-cvsimport and git-archimport and pushing to this repo - weekly git-repack run over the repo Uff. I have the feeling that this isn't a very complete picture, but I'm not sure what else to do to debug this one. Pointers welcome. cheers, martin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 4:26 ` Strangely broken git repo Martin Langhoff (CatalystIT) @ 2005-10-10 9:00 ` Junio C Hamano 2005-10-10 14:54 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-10 9:00 UTC (permalink / raw) To: Martin Langhoff (CatalystIT); +Cc: Nick Hengeveld, Daniel Barkalow, git "Martin Langhoff (CatalystIT)" <martin@catalyst.net.nz> writes: > $ cg-clone http://locke.catalyst.net.nz/git/moodle.git#moodle--topnz > ... > Getting index for pack 3e3492f365bb0d4a1ae11dfa7cee9ebbf345e647 > Getting pack 3113eb34ef85482c87c3575721ce978c7232071f > which contains 007a0323cf0941476dc262f6e3aff6bb9600dcd8 > error: The requested file was not found > ... (many more like these) So it failed to grab 007a0323cf0941476dc262f6e3aff6bb9600dcd8, but that is OK. That object is in a pack: pack-3113eb34ef85482c87c3575721ce978c7232071f has it. > FINISHED --15:56:38-- > Downloaded: 4,723 bytes in 1 files > New branch: 6759e2800c0cef00017c63b7dbbed80e481dbe2c > Cloned to moodle.git#moodle--topnz/ (origin > http://locke.catalyst.net.nz/git/moodle.git#moodle--topnz available as > branch "origin") > aporo:~/tmp/moodle.git#moodle--topnz martin$ git-fsck-objects > bad sha1 file: .git/objects/00/7a0323cf0941476dc262f6e3aff6bb9600dcd8.temp > ... (many more like these) This *.temp file is what git-http-fetch creates while retrieving the object. After successfully downloading it, it is renamed to its final name (i.e. sans .temp), but that rename did not happen. Well, this object is in a pack, so after failing to retrieve that, we should have got the list of packs, retrieved pack index files, and grabbed the pack that contained that object. We *should* clean up the *.temp file for this object whe we do this, but I do not think we do. But I do not think this is the _cause_ of your problem. It is a symptom that something is going wrong -- namely, the question is why it was not found in pack-3113eb34ef85482c87c3575721ce978c7232071f. Did the client download that pack? I just tried to clone your repository with 'git clone', and saw something failing, but I think the object trasfer went fine all the way down to the initial commit. I can see this object in my cloned repository: commit 34cc394a7427adf730cfd3fb482dc6d6d5b58775 Author: Martin Langhoff <martin@catalyst.net.nz> Date: Tue Aug 3 02:50:51 2004 +0000 initial import There are funky tag names in your refs/tags directory. The really core tools should be able to grok them just fine, but git-http-fetch failed with this from curl library: The requested URL returned error: 404 error: Couldn't get http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID** for tags/MOODLE_15_MERGED **INVALID** I do not speak curl, but I wonder if we should be quoting these funky characters like SP and asterisk in the URL when we make that request, or it is what the library does for us. Hmph. Interesting. I just tried. $ curl 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID**' gives an error page "404 Not Found", while $ wget -O - -o /dev/null 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID**' works fine and gives 2ddfec0dfd0cffd4892af9aaf48ee29c40c7ada3 back. So we do need to fix things up somewhat in our scripts as well. Anyway, I think I know the problems 'git-clone' would have had if you tried to clone it with it (not cg-clone which I do not know much about), and luckily it is only towards the end (after fetching most of the heads, but hitting the first funky tag). We should be able to fix this relatively easily. Oh, Daniel and Nick, while I was reading the http-fetch.c code, I noticed another thing. The fetch_alternates() function was supposed to be call-once and we have a static variable got_alternates that becomes 1 when it runs for the first time. However, there are other 'return 0's introduced that does not set got_alternates to 1. I am wondering if the semantics has changed that now we chain the alternates? Initially, if we are cloning/fetching from repository A, which borrows from repository B (i.e. alternates we retrieve from A would name B), and if B in turn borrows from C, then we assumed that A's alternates would also name C, and that was the reason why fetch_alternates() was call-once function. I do not mind if we change it to chain the alternates file, but if that is the case we should move the got_alternates variable into "struct alt_base", and pass the struct, not just alt->base, to fetch_alternates(), like this (untested, of course): diff --git a/http-fetch.c b/http-fetch.c index 5821c9e..2301f88 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -38,6 +38,7 @@ struct alt_base { char *base; int got_indices; + int got_alternates; struct packed_git *packs; struct alt_base *next; }; @@ -529,8 +530,6 @@ void prefetch(unsigned char *sha1) #endif } -static int got_alternates = 0; - static int fetch_index(struct alt_base *repo, unsigned char *sha1) { char *hex = sha1_to_hex(sha1); @@ -611,8 +610,9 @@ static int setup_index(struct alt_base * return 0; } -static int fetch_alternates(char *base) +static int fetch_alternates(struct alt_base *this_alt) { + char *base = this_alt->base; int ret = 0; struct buffer buffer; char *url; @@ -622,15 +622,17 @@ static int fetch_alternates(char *base) struct alt_base *tail = alt; struct active_request_slot *slot; - if (got_alternates) + if (this_alt->got_alternates) return 0; + this_alt->got_alternates = 1; data = xmalloc(4096); buffer.size = 4095; buffer.posn = 0; buffer.buffer = data; if (get_verbosely) - fprintf(stderr, "Getting alternates list\n"); + fprintf(stderr, "Getting alternates list for %s\n", + base); url = xmalloc(strlen(base) + 31); sprintf(url, "%s/objects/info/http-alternates", base); @@ -721,7 +723,6 @@ static int fetch_alternates(char *base) } i = posn + 1; } - got_alternates = 1; return ret; } @@ -1092,9 +1093,10 @@ int main(int argc, char **argv) alt = xmalloc(sizeof(*alt)); alt->base = url; alt->got_indices = 0; + alt->got_alternates = 0; alt->packs = NULL; alt->next = NULL; - fetch_alternates(alt->base); + fetch_alternates(alt); if (pull(commit_id)) return 1; ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 9:00 ` Junio C Hamano @ 2005-10-10 14:54 ` Linus Torvalds 2005-10-10 15:21 ` Linus Torvalds 2005-10-11 4:29 ` Quote reference names while fetching with curl Junio C Hamano 2005-10-12 3:26 ` Strangely broken git repo Nick Hengeveld 2 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2005-10-10 14:54 UTC (permalink / raw) To: Junio C Hamano Cc: Martin Langhoff (CatalystIT), Nick Hengeveld, Daniel Barkalow, git On Mon, 10 Oct 2005, Junio C Hamano wrote: > > Hmph. Interesting. I just tried. > > $ curl 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID**' > > gives an error page "404 Not Found", while > > $ wget -O - -o /dev/null 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID**' > > works fine and gives 2ddfec0dfd0cffd4892af9aaf48ee29c40c7ada3 > back. So we do need to fix things up somewhat in our scripts as > well. It seems to be the space. Doing $ curl 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED%20**INVALID**' works ok (ie %20 instead of ' '). As far as I can tell, we should probably _also_ quote any curl-specific stuff. As far as I can tell from the manual, if the tag were to have special characters like '[' and '{', curl might confuse them with being range specifiers. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 14:54 ` Linus Torvalds @ 2005-10-10 15:21 ` Linus Torvalds 2005-10-10 18:19 ` Morten Welinder 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2005-10-10 15:21 UTC (permalink / raw) To: Junio C Hamano Cc: Martin Langhoff (CatalystIT), Nick Hengeveld, Daniel Barkalow, git On Mon, 10 Oct 2005, Linus Torvalds wrote: > > It seems to be the space. Doing an strace on curl vs wget shows that curl seems to do no quoting at all. I'd personally argue that that is a serious bug in curl: it sure as hell knows that it's a http transport, and it seems to be just doing GET %s HTTP/1.0\r\nUser-agent:... without any sanity checking at all. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 15:21 ` Linus Torvalds @ 2005-10-10 18:19 ` Morten Welinder 2005-10-10 18:23 ` Linus Torvalds 2005-10-10 18:30 ` Johannes Schindelin 0 siblings, 2 replies; 21+ messages in thread From: Morten Welinder @ 2005-10-10 18:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Spaces in URLs are off-spec. (And common, go figure.) M. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 18:19 ` Morten Welinder @ 2005-10-10 18:23 ` Linus Torvalds 2005-10-10 18:30 ` Johannes Schindelin 1 sibling, 0 replies; 21+ messages in thread From: Linus Torvalds @ 2005-10-10 18:23 UTC (permalink / raw) To: Morten Welinder; +Cc: git On Mon, 10 Oct 2005, Morten Welinder wrote: > > Spaces in URLs are off-spec. (And common, go figure.) They may be off-spec in url's, but that doesn't mean that "curl" should just ignore them. It should either escape them, or refuse them. Using user-supplied data without any checks is usually a really bad idea. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 18:19 ` Morten Welinder 2005-10-10 18:23 ` Linus Torvalds @ 2005-10-10 18:30 ` Johannes Schindelin 1 sibling, 0 replies; 21+ messages in thread From: Johannes Schindelin @ 2005-10-10 18:30 UTC (permalink / raw) To: Morten Welinder; +Cc: Linus Torvalds, git Hi, On Mon, 10 Oct 2005, Morten Welinder wrote: > Spaces in URLs are off-spec. (And common, go figure.) %20 Hth, Dscho ^ permalink raw reply [flat|nested] 21+ messages in thread
* Quote reference names while fetching with curl. 2005-10-10 9:00 ` Junio C Hamano 2005-10-10 14:54 ` Linus Torvalds @ 2005-10-11 4:29 ` Junio C Hamano 2005-10-11 5:07 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Junio C Hamano 2005-10-12 3:26 ` Strangely broken git repo Nick Hengeveld 2 siblings, 1 reply; 21+ messages in thread From: Junio C Hamano @ 2005-10-11 4:29 UTC (permalink / raw) To: Martin Langhoff (CatalystIT); +Cc: git curl_escape ought to do this, but we should not let it quote slashes (nobody said refs/tags can have subdirectories), so we roll our own safer version. With this, the last part of git-clone that used to fail now works, which reads: $ git-http-fetch -v -a -w 'tags/MOODLE_15_MERGED **INVALID**' \ 'tags/MOODLE_15_MERGED **INVALID**' \ http://locke.catalyst.net.nz/git/moodle.git/ Signed-off-by: Junio C Hamano <junkio@cox.net> --- Junio C Hamano <junkio@cox.net> writes: > I do not speak curl, but I wonder if we should be quoting > these funky characters like SP and asterisk in the URL when we > make that request, or it is what the library does for us. > > Hmph. Interesting. I just tried. > > $ curl 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID**' > > gives an error page "404 Not Found", while > > $ wget -O - -o /dev/null 'http://locke.catalyst.net.nz/git/moodle.git/refs/tags/MOODLE_15_MERGED **INVALID**' > > works fine and gives 2ddfec0dfd0cffd4892af9aaf48ee29c40c7ada3 > back. So we do need to fix things up somewhat in our scripts as > well. > > Anyway, I think I know the problems 'git-clone' would have had > if you tried to clone it with it (not cg-clone which I do not > know much about), and luckily it is only towards the end (after > fetching most of the heads, but hitting the first funky tag). > We should be able to fix this relatively easily. With this patch on top of the parallel transfer http-fetch in proposed updates branch, git-clone successfully cloned your repository. But in general, we should avoid spaces in reference names. Scripts have problem with them. For example, "git-fetch --tags" currently cannot handle it. If people cared deeply enough maybe they can rewrite parts of it in Perl and send me a patch ;-). Anyway, I _do_ care about the really core part (i.e. things written in C, roughly speaking), so this patch will likely make into "master" branch. http-fetch.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 files changed, 49 insertions(+), 8 deletions(-) applies-to: 4ac37eb1d8cf308455b69828c5df8f64634b5789 1d32c9c017af915a390693021938ae8c56f33007 diff --git a/http-fetch.c b/http-fetch.c index e537591..acae805 100644 --- a/http-fetch.c +++ b/http-fetch.c @@ -969,9 +969,56 @@ int fetch(unsigned char *sha1) alt->base); } +static inline int needs_quote(int ch) +{ + switch (ch) { + case '/': case '-': + case 'A'...'Z': + case 'a'...'z': + case '0'...'9': + return 0; + default: + return 1; + } +} + +static inline int hex(int v) +{ + if (v < 10) return '0' + v; + else return 'A' + v - 10; +} + +static char *quote_ref_url(const char *base, const char *ref) +{ + const char *cp; + char *dp, *qref; + int len, baselen, ch; + + baselen = strlen(base); + len = baselen + 6; /* "refs/" + NUL */ + for (cp = ref; (ch = *cp) != 0; cp++, len++) + if (needs_quote(ch)) + len += 2; /* extra two hex plus replacement % */ + qref = xmalloc(len); + memcpy(qref, base, baselen); + memcpy(qref + baselen, "refs/", 5); + for (cp = ref, dp = qref + baselen + 5; (ch = *cp) != 0; cp++) { + if (needs_quote(ch)) { + *dp++ = '%'; + *dp++ = hex((ch >> 4) & 0xF); + *dp++ = hex(ch & 0xF); + } + else + *dp++ = ch; + } + *dp = 0; + + return qref; +} + int fetch_ref(char *ref, unsigned char *sha1) { - char *url, *posn; + char *url; char hex[42]; struct buffer buffer; char *base = alt->base; @@ -981,13 +1028,7 @@ int fetch_ref(char *ref, unsigned char * buffer.buffer = hex; hex[41] = '\0'; - url = xmalloc(strlen(base) + 6 + strlen(ref)); - strcpy(url, base); - posn = url + strlen(base); - strcpy(posn, "refs/"); - posn += 5; - strcpy(posn, ref); - + url = quote_ref_url(base, ref); slot = get_active_slot(); curl_easy_setopt(slot->curl, CURLOPT_FILE, &buffer); curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, fwrite_buffer); --- 0.99.8.GIT ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-11 4:29 ` Quote reference names while fetching with curl Junio C Hamano @ 2005-10-11 5:07 ` Junio C Hamano 2005-10-11 6:04 ` Junio C Hamano 2005-10-11 19:55 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Matthias Urlichs 0 siblings, 2 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-11 5:07 UTC (permalink / raw) To: Martin Langhoff (CatalystIT); +Cc: git "git-fetch --tags" can get confused with tags with spaces in their names, it used to use shell IFS to split the list of tags and also used curl which insists the URL to be escaped. Fix it so it can work with Martin's moodle repository http://locke.catalyst.net.nz/git/moodle.git/. Signed-off-by: Junio C Hamano <junkio@cox.net> --- Junio C Hamano <junkio@cox.net> writes: > But in general, we should avoid spaces in reference names. > Scripts have problem with them. For example, "git-fetch > --tags" currently cannot handle it. If people cared deeply > enough maybe they can rewrite parts of it in Perl and send me > a patch ;-). And as usual, I end up being "people", but I did not use Perl ;-). Only lightly tested but it should fetch tags from your repository. I did not want to keep slurping those 70MB and 16MB packs, so I cheated by creating a small repository with funky tag names locally while testing. git-fetch.sh | 21 +++++++++++++-------- 1 files changed, 13 insertions(+), 8 deletions(-) applies-to: 5db8c58bc7403c7b076dea420133e6e890835e1a c1a4b1dd71e23d877dfbf04a2ac8b1c238fc5eb4 diff --git a/git-fetch.sh b/git-fetch.sh index d398866..b3f3782 100755 --- a/git-fetch.sh +++ b/git-fetch.sh @@ -5,6 +5,10 @@ _x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]' _x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40" +LF=' +' +IFS="$LF" + tags= append= force= @@ -170,11 +174,14 @@ esac reflist=$(get_remote_refs_for_fetch "$@") if test "$tags" then - taglist=$(git-ls-remote --tags "$remote" | awk '{ print "."$2":"$2 }') + taglist=$(git-ls-remote --tags "$remote" | + sed -e ' + s/^[^ ]* // + s/.*/&:&/') if test "$#" -gt 1 then # remote URL plus explicit refspecs; we need to merge them. - reflist="$reflist $taglist" + reflist="$reflist$LF$taglist" else # No explicit refspecs; fetch tags only. reflist=$taglist @@ -183,7 +190,7 @@ fi for ref in $reflist do - refs="$refs $ref" + refs="$refs$LF$ref" # These are relative path from $GIT_DIR, typically starting at refs/ # but may be HEAD @@ -204,7 +211,7 @@ do remote_name=$(expr "$ref" : '\([^:]*\):') local_name=$(expr "$ref" : '[^:]*:\(.*\)') - rref="$rref $remote_name" + rref="$rref$LF$remote_name" # There are transports that can fetch only one head at a time... case "$remote" in @@ -212,11 +219,9 @@ do if [ -n "$GIT_SSL_NO_VERIFY" ]; then curl_extra_args="-k" fi - head=$(curl -nsf $curl_extra_args "$remote/$remote_name") && - expr "$head" : "$_x40\$" >/dev/null || - die "Failed to fetch $remote_name from $remote" echo >&2 Fetching "$remote_name from $remote" using http - git-http-fetch -v -a "$head" "$remote/" || exit + ref_name=`expr "$remote_name" : 'refs/\(.*\)'` + git-http-fetch -v -a "$ref_name" "$remote/" || exit ;; rsync://*) TMP_HEAD="$GIT_DIR/TMP_HEAD" --- 0.99.8.GIT ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-11 5:07 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Junio C Hamano @ 2005-10-11 6:04 ` Junio C Hamano 2005-10-12 5:29 ` Junio C Hamano 2005-10-11 19:55 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Matthias Urlichs 1 sibling, 1 reply; 21+ messages in thread From: Junio C Hamano @ 2005-10-11 6:04 UTC (permalink / raw) To: git; +Cc: Martin Langhoff (CatalystIT) Junio C Hamano <junkio@cox.net> writes: > "git-fetch --tags" can get confused with tags with spaces in their names, > it used to use shell IFS to split the list of tags and also used curl > which insists the URL to be escaped. Fix it so it can work with Martin's > moodle repository http://locke.catalyst.net.nz/git/moodle.git/. The one I sent to the list was buggy and broke usual multi-head fetches, and what I have in the proposed updates branch is a replacement one. We cannot still do arbitrary reference names, but at least now we allow spaces in them. But I am not sure if this is a good change. Do we in general want to support references with [^-a-zA-Z0-9.] in them? Most notably spaces? The current replacement patch implies that .git/remotes/ short-cut file format now has a slight incompatible change. You cannot have more than one refspec on single Pull: line. I used to have: Pull: master:ko-master +pu:ko-pu maint:ko-maint but these should now be split into multiple lines, like this: Pull: master:ko-master Pull: +pu:ko-pu Pull: maint:ko-maint The latter format, one refpair per line, has always been supported, BTW. Opinions? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-11 6:04 ` Junio C Hamano @ 2005-10-12 5:29 ` Junio C Hamano 2005-10-12 8:26 ` Petr Baudis 2005-10-12 15:36 ` H. Peter Anvin 0 siblings, 2 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-12 5:29 UTC (permalink / raw) To: git; +Cc: Martin Langhoff (CatalystIT) Junio C Hamano <junkio@cox.net> writes: > We cannot still do arbitrary reference names, but at least now > we allow spaces in them. But I am not sure if this is a good > change. > > Do we in general want to support references with [^-a-zA-Z0-9.] > in them? Most notably spaces? > > Opinions? Here is mine. I do not personally think it is too much of a restriction if we said we only allow tags using letters from [-a-zA-Z0-9.] (yes I am trying to be controversial by not allowing even latin-1 names). Even without going that far, if we just said we do not allow shell metacharacters in tagnames (i.e. assuming UTF-8 encoded pathnames, non-ascii part of unicode character space is allowed), I suspect things will get much simpler to handle. The troublesome tags Martin's repository had were autocreated with cvsimport. That is something we could easily fix (I think we already do certain tagname munging). ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-12 5:29 ` Junio C Hamano @ 2005-10-12 8:26 ` Petr Baudis 2005-10-12 15:36 ` H. Peter Anvin 1 sibling, 0 replies; 21+ messages in thread From: Petr Baudis @ 2005-10-12 8:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Martin Langhoff (CatalystIT) Dear diary, on Wed, Oct 12, 2005 at 07:29:45AM CEST, I got a letter where Junio C Hamano <junkio@cox.net> told me that... > I do not personally think it is too much of a restriction if we > said we only allow tags using letters from [-a-zA-Z0-9.] (yes I > am trying to be controversial by not allowing even latin-1 > names). I think this is perfectly reasonable, as long as you also throw _ to the set. ;-) Actually, cg-tag now already does at least (echo $name | egrep -qv '[^a-zA-Z0-9_.@!:-]') || \ die "name contains invalid characters" but I'm in no way emotionally attached to the @!: characters. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-12 5:29 ` Junio C Hamano 2005-10-12 8:26 ` Petr Baudis @ 2005-10-12 15:36 ` H. Peter Anvin 2005-10-12 15:47 ` H. Peter Anvin 2005-10-12 18:10 ` Junio C Hamano 1 sibling, 2 replies; 21+ messages in thread From: H. Peter Anvin @ 2005-10-12 15:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Martin Langhoff (CatalystIT) Junio C Hamano wrote: > > Here is mine. > > I do not personally think it is too much of a restriction if we > said we only allow tags using letters from [-a-zA-Z0-9.] (yes I > am trying to be controversial by not allowing even latin-1 > names). Even without going that far, if we just said we do not > allow shell metacharacters in tagnames (i.e. assuming UTF-8 > encoded pathnames, non-ascii part of unicode character space is > allowed), I suspect things will get much simpler to handle. > > The troublesome tags Martin's repository had were autocreated > with cvsimport. That is something we could easily fix (I think > we already do certain tagname munging). > I don't know about this. Saying we only support ASCII tagnames can be quote brutal for non-English-language projects. Even with the set above, we'd at the very least need underscore, and I know of at least one project which require # in tagnames. In short, I don't think we can make that decision for people. We can disallow whitespace, and we *have* to disallow at least newline due to the file format; I believe we should disallow all control characters (0-31, 127-159.) -hpa ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-12 15:36 ` H. Peter Anvin @ 2005-10-12 15:47 ` H. Peter Anvin 2005-10-12 18:57 ` Junio C Hamano 2005-10-12 18:10 ` Junio C Hamano 1 sibling, 1 reply; 21+ messages in thread From: H. Peter Anvin @ 2005-10-12 15:47 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Junio C Hamano, git, Martin Langhoff (CatalystIT) H. Peter Anvin wrote: > > We can disallow whitespace, and we *have* to disallow at least newline > due to the file format; I believe we should disallow all control > characters (0-31, 127-159.) > Actually, disallowing anything 128 and above means knowing the encoding system. If we enforce UTF-8, we should presumably disallow at the very least U+FFFE and U+FFFF too. C99 contains a list of Unicode characters allowed in C identifiers. I believe we should allow those characters (plus at a minimum -+.#:@) as well, but I'd much rather we didn't make those kinds of decisions for the user. If we have *specific* characters we can't tolerate we should rule them out (like control characters), but even that occationally leads to major frustration on the part of the user: "why can't I use '.' in tag names in CVS"? -hpa ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-12 15:47 ` H. Peter Anvin @ 2005-10-12 18:57 ` Junio C Hamano 0 siblings, 0 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-12 18:57 UTC (permalink / raw) To: H. Peter Anvin; +Cc: git, Martin Langhoff (CatalystIT) "H. Peter Anvin" <hpa@zytor.com> writes: > H. Peter Anvin wrote: >> We can disallow whitespace, and we *have* to disallow at least >> newline due to the file format; I believe we should disallow all >> control characters (0-31, 127-159.) > > Actually, disallowing anything 128 and above means knowing the encoding > system. If we enforce UTF-8, we should presumably disallow at the very > least U+FFFE and U+FFFF too. Hmph. I think enforcing (or rather supporting preferentially) UTF-8 in log messages was alright, but enforcing UTF-8 tagnames imply UTF-8 host pathnames because we do not currently convert when we fetch refs from remote and store locally. * git-clone-pack, git-fetch-pack and git-peek-remote run git-upload-pack on the other end. Currently upload-pack sends a list of refs read from the remote filesystem without conversion, and: (1) clone-pack uses the names without conversion to replicate refs on the local filesystem. (2) fetch-pack sends the names given on the command line, and/or read from the local filesystem, to upload-pack without conversion. (3) fetch-pack and peek-remote outputs the names obtained from the remote without conversion to stdout. * over http, the encoding of the refnames client sees is what is stored in project.git/info/refs on the remote. Currently, update-server-info reads from the filesystem and writes this file out without conversion. While walking the commits, names are not used, so there is no refname encoding issues. What we should do at this point is to declare that exchanging refnames between systems is to happen after converting them to UTF-8. And version 1.0 just assumes pathnames are UTF-8. If people on systems with non UTF-8 pathnames cared enough, the tools can be made aware of local pathname encodings, and taught how to convert what for_each_ref() read from the filesystem, the refspecs given from the command line, etc. to UTF-8. But that can come later. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-12 15:36 ` H. Peter Anvin 2005-10-12 15:47 ` H. Peter Anvin @ 2005-10-12 18:10 ` Junio C Hamano 2005-10-12 22:01 ` [PATCH] git-check-ref-format: reject funny ref names Junio C Hamano 2005-10-12 22:01 ` [PATCH] Refuse to create funny refs in clone-pack, git-fetch and receive-pack Junio C Hamano 1 sibling, 2 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-12 18:10 UTC (permalink / raw) To: H. Peter Anvin; +Cc: git, Martin Langhoff (CatalystIT) "H. Peter Anvin" <hpa@zytor.com> writes: > We can disallow whitespace, and we *have* to disallow at least newline > due to the file format; I believe we should disallow all control > characters (0-31, 127-159.) I agree. Currently SP, TAB, and LF are all we care about, and nothing else; NUL does not even count. But if we are codifying a list of disallowed byte values, excluding all control characters is probably a sane thing to do, without introducing arbitrary and unnecessary limitation too much. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] git-check-ref-format: reject funny ref names. 2005-10-12 18:10 ` Junio C Hamano @ 2005-10-12 22:01 ` Junio C Hamano 2005-10-12 22:01 ` [PATCH] Refuse to create funny refs in clone-pack, git-fetch and receive-pack Junio C Hamano 1 sibling, 0 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-12 22:01 UTC (permalink / raw) To: git Update check_ref_format() function to reject ref names that: * has a path component that begins with a ".", or * has ASCII control character, "~", "^", ":" or SP, anywhere, or * ends with a "/". Use it in 'git-checkout -b', 'git-branch', and 'git-tag' to make sure that newly created refs are well-formed. Signed-off-by: Junio C Hamano <junkio@cox.net> --- * This is the beginning of currently two-patch series. This one "fixes" the programs that trivially create new refs. Also check_ref_format() is used by commit walkers when writing a new ref or updating an existing ref, so this patch makes them to refuse funny refs being created. Makefile | 2 +- check-ref-format.c | 17 ++++++++++++++ git-branch.sh | 63 +++++++++++++++++++++++++++++----------------------- git-checkout.sh | 2 ++ git-tag.sh | 2 ++ refs.c | 52 +++++++++++++++++++++++++++++++++++-------- 6 files changed, 100 insertions(+), 38 deletions(-) create mode 100644 check-ref-format.c applies-to: ea37b42d53264d65f746b3e42577349e8a44d5c4 3fa00f52f80e5e51cc8098979ce0883c77caab52 diff --git a/Makefile b/Makefile index 7c8f647..2860d47 100644 --- a/Makefile +++ b/Makefile @@ -120,7 +120,7 @@ PROGRAMS = \ git-ssh-upload$X git-tar-tree$X git-unpack-file$X \ git-unpack-objects$X git-update-index$X git-update-server-info$X \ git-upload-pack$X git-verify-pack$X git-write-tree$X \ - git-update-ref$X git-symbolic-ref$X \ + git-update-ref$X git-symbolic-ref$X git-check-ref-format$X \ $(SIMPLE_PROGRAMS) # Backward compatibility -- to be removed after 1.0 diff --git a/check-ref-format.c b/check-ref-format.c new file mode 100644 index 0000000..a0adb3d --- /dev/null +++ b/check-ref-format.c @@ -0,0 +1,17 @@ +/* + * GIT - The information manager from hell + */ + +#include "cache.h" +#include "refs.h" + +#include <stdio.h> + +int main(int ac, char **av) +{ + if (ac != 2) + usage("git-check-ref-format refname"); + if (check_ref_format(av[1])) + exit(1); + return 0; +} diff --git a/git-branch.sh b/git-branch.sh index 074229c..e2db906 100755 --- a/git-branch.sh +++ b/git-branch.sh @@ -13,38 +13,42 @@ If two arguments, create a new branch <b } delete_branch () { - option="$1" branch_name="$2" + option="$1" + shift headref=$(GIT_DIR="$GIT_DIR" git-symbolic-ref HEAD | sed -e 's|^refs/heads/||') - case ",$headref," in - ",$branch_name,") - die "Cannot delete the branch you are on." ;; - ,,) - die "What branch are you on anyway?" ;; - esac - branch=$(cat "$GIT_DIR/refs/heads/$branch_name") && - branch=$(git-rev-parse --verify "$branch^0") || - die "Seriously, what branch are you talking about?" - case "$option" in - -D) - ;; - *) - mbs=$(git-merge-base -a "$branch" HEAD | tr '\012' ' ') - case " $mbs " in - *' '$branch' '*) - # the merge base of branch and HEAD contains branch -- - # which means that the HEAD contains everything in the HEAD. + for branch_name + do + case ",$headref," in + ",$branch_name,") + die "Cannot delete the branch you are on." ;; + ,,) + die "What branch are you on anyway?" ;; + esac + branch=$(cat "$GIT_DIR/refs/heads/$branch_name") && + branch=$(git-rev-parse --verify "$branch^0") || + die "Seriously, what branch are you talking about?" + case "$option" in + -D) ;; *) - echo >&2 "The branch '$branch_name' is not a strict subset of your current HEAD. -If you are sure you want to delete it, run 'git branch -D $branch_name'." - exit 1 + mbs=$(git-merge-base -a "$branch" HEAD | tr '\012' ' ') + case " $mbs " in + *' '$branch' '*) + # the merge base of branch and HEAD contains branch -- + # which means that the HEAD contains everything in the HEAD. + ;; + *) + echo >&2 "The branch '$branch_name' is not a strict subset of your current HEAD. + If you are sure you want to delete it, run 'git branch -D $branch_name'." + exit 1 + ;; + esac ;; esac - ;; - esac - rm -f "$GIT_DIR/refs/heads/$branch_name" - echo "Deleted branch $branch_name." + rm -f "$GIT_DIR/refs/heads/$branch_name" + echo "Deleted branch $branch_name." + done exit 0 } @@ -52,7 +56,7 @@ while case "$#,$1" in 0,*) break ;; *,-* do case "$1" in -d | -D) - delete_branch "$1" "$2" + delete_branch "$@" exit ;; --) @@ -93,6 +97,9 @@ branchname="$1" rev=$(git-rev-parse --verify "$head") || exit -[ -e "$GIT_DIR/refs/heads/$branchname" ] && die "$branchname already exists" +[ -e "$GIT_DIR/refs/heads/$branchname" ] && + die "$branchname already exists." +git-check-ref-format "heads/$branchname" || + die "we do not like '$branchname' as a branch name." echo $rev > "$GIT_DIR/refs/heads/$branchname" diff --git a/git-checkout.sh b/git-checkout.sh index c382590..2c053a3 100755 --- a/git-checkout.sh +++ b/git-checkout.sh @@ -17,6 +17,8 @@ while [ "$#" != "0" ]; do die "git checkout: -b needs a branch name" [ -e "$GIT_DIR/refs/heads/$newbranch" ] && die "git checkout: branch $newbranch already exists" + git-check-ref-format "heads/$newbranch" || + die "we do not like '$newbranch' as a branch name." ;; "-f") force=1 diff --git a/git-tag.sh b/git-tag.sh index 25c1a0e..faa7667 100755 --- a/git-tag.sh +++ b/git-tag.sh @@ -53,6 +53,8 @@ if [ -e "$GIT_DIR/refs/tags/$name" -a -z die "tag '$name' already exists" fi shift +git-check-ref-format "tags/$name" || + die "we do not like '$name' as a tag name." object=$(git-rev-parse --verify --default HEAD "$@") || exit 1 type=$(git-cat-file -t $object) || exit 1 diff --git a/refs.c b/refs.c index 5a8cbd4..2d2144c 100644 --- a/refs.c +++ b/refs.c @@ -335,17 +335,51 @@ int write_ref_sha1(const char *ref, int return retval; } +/* + * Make sure "ref" is something reasonable to have under ".git/refs/"; + * We do not like it if: + * + * - any path component of it begins with ".", or + * - it has ASCII control character, "~", "^", ":" or SP, anywhere, or + * - it ends with a "/". + */ + +static inline int bad_ref_char(int ch) +{ + return (((unsigned) ch) <= ' ' || + ch == '~' || ch == '^' || ch == ':'); +} + int check_ref_format(const char *ref) { - char *middle; - if (ref[0] == '.' || ref[0] == '/') - return -1; - middle = strchr(ref, '/'); - if (!middle || !middle[1]) - return -1; - if (strchr(middle + 1, '/')) - return -1; - return 0; + int ch, level; + const char *cp = ref; + + level = 0; + while (1) { + while ((ch = *cp++) == '/') + ; /* tolerate duplicated slashes */ + if (!ch) + return -1; /* should not end with slashes */ + + /* we are at the beginning of the path component */ + if (ch == '.' || bad_ref_char(ch)) + return -1; + + /* scan the rest of the path component */ + while ((ch = *cp++) != 0) { + if (bad_ref_char(ch)) + return -1; + if (ch == '/') + break; + } + level++; + if (!ch) { + if (level < 2) + return -1; /* at least of form "heads/blah" */ + return 0; + } + } } int write_ref_sha1_unlocked(const char *ref, const unsigned char *sha1) --- 0.99.8.GIT ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH] Refuse to create funny refs in clone-pack, git-fetch and receive-pack. 2005-10-12 18:10 ` Junio C Hamano 2005-10-12 22:01 ` [PATCH] git-check-ref-format: reject funny ref names Junio C Hamano @ 2005-10-12 22:01 ` Junio C Hamano 1 sibling, 0 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-12 22:01 UTC (permalink / raw) To: git Using git-check-ref-format, make sure we do not create refs with funny names when cloning from elsewhere (clone-pack), fast forwarding local heads (git-fetch), or somebody pushes into us (receive-pack). Signed-off-by: Junio C Hamano <junkio@cox.net> --- * This is the second installment. Cloning over HTTP uses the commit walker '-w' ref updates, and should already be covered by the previous "check_ref_format()" updates. clone-pack.c | 6 ++++++ git-parse-remote.sh | 6 ++++++ receive-pack.c | 4 ++++ 3 files changed, 16 insertions(+), 0 deletions(-) applies-to: 409fe70483c97167e9b7d396334ec5f96932694d 8da42e4b3363de13ede2f006ad96ee8268e52250 diff --git a/clone-pack.c b/clone-pack.c index c102ca8..48bee96 100644 --- a/clone-pack.c +++ b/clone-pack.c @@ -34,6 +34,12 @@ static void write_one_ref(struct ref *re int fd; char *hex; + if (!strncmp(ref->name, "refs/", 5) && + check_ref_format(ref->name + 5)) { + error("refusing to create funny ref '%s' locally", ref->name); + return; + } + if (safe_create_leading_directories(path)) die("unable to create leading directory for %s", ref->name); fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0666); diff --git a/git-parse-remote.sh b/git-parse-remote.sh index 5e75e15..aea7b0e 100755 --- a/git-parse-remote.sh +++ b/git-parse-remote.sh @@ -94,6 +94,12 @@ canon_refs_list_for_fetch () { heads/* | tags/* ) local="refs/$local" ;; *) local="refs/heads/$local" ;; esac + + if local_ref_name=$(expr "$local" : 'refs/\(.*\)') + then + git-check-ref-format "$local_ref_name" || + die "* refusing to create funny ref '$local_ref_name' locally" + fi echo "${dot_prefix}${force}${remote}:${local}" dot_prefix=. done diff --git a/receive-pack.c b/receive-pack.c index 06857eb..8f157bc 100644 --- a/receive-pack.c +++ b/receive-pack.c @@ -95,6 +95,10 @@ static int update(const char *name, char new_hex[60], *old_hex, *lock_name; int newfd, namelen, written; + if (!strncmp(name, "refs/", 5) && check_ref_format(name + 5)) + return error("refusing to create funny ref '%s' locally", + name); + namelen = strlen(name); lock_name = xmalloc(namelen + 10); memcpy(lock_name, name, namelen); --- 0.99.8.GIT ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] git-fetch --tags: deal with tags with spaces in them. 2005-10-11 5:07 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Junio C Hamano 2005-10-11 6:04 ` Junio C Hamano @ 2005-10-11 19:55 ` Matthias Urlichs 1 sibling, 0 replies; 21+ messages in thread From: Matthias Urlichs @ 2005-10-11 19:55 UTC (permalink / raw) To: git Hi, Junio C Hamano wrote: > I cheated by creating a small repository > with funky tag names locally while testing. Your patch is missing the file "t/t5410-send-funky-names.sh". ;-) -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de - - Being generous is inborn; being altruistic is a learned perversity. No resemblance ... -- Lazarus Long ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-10 9:00 ` Junio C Hamano 2005-10-10 14:54 ` Linus Torvalds 2005-10-11 4:29 ` Quote reference names while fetching with curl Junio C Hamano @ 2005-10-12 3:26 ` Nick Hengeveld 2005-10-12 4:22 ` Junio C Hamano 2 siblings, 1 reply; 21+ messages in thread From: Nick Hengeveld @ 2005-10-12 3:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: Martin Langhoff (CatalystIT), Daniel Barkalow, git On Mon, Oct 10, 2005 at 02:00:06AM -0700, Junio C Hamano wrote: > The fetch_alternates() function was supposed to be call-once and > we have a static variable got_alternates that becomes 1 when it > runs for the first time. However, there are other 'return 0's > introduced that does not set got_alternates to 1. Oops, thanks for catching that - I'll fix it. > I am wondering if the semantics has changed that now we chain > the alternates? Initially, if we are cloning/fetching from > repository A, which borrows from repository B (i.e. alternates > we retrieve from A would name B), and if B in turn borrows from > C, then we assumed that A's alternates would also name C, and > that was the reason why fetch_alternates() was call-once > function. I do not mind if we change it to chain the alternates > file, but if that is the case we should move the got_alternates > variable into "struct alt_base", and pass the struct, not just > alt->base, to fetch_alternates(), like this (untested, of > course): I'd thought about chaining the alternates, but wasn't sure whether it's safe to consider all alternates in the chain equivalent. I would want to implement fetching the chain such that it could happen parallel to other request and not block the pull. I think it would also be useful to track the network performance of each alternate and prefer faster sources as new requests are started. -- For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Strangely broken git repo 2005-10-12 3:26 ` Strangely broken git repo Nick Hengeveld @ 2005-10-12 4:22 ` Junio C Hamano 0 siblings, 0 replies; 21+ messages in thread From: Junio C Hamano @ 2005-10-12 4:22 UTC (permalink / raw) To: Nick Hengeveld; +Cc: Martin Langhoff (CatalystIT), Daniel Barkalow, git Nick Hengeveld <nickh@reactrix.com> writes: > Oops, thanks for catching that - I'll fix it. That's OK. It turns out that fetch_alternates() function itself is now called from the main just once; it used to be called lazily and needed the call-once guard. > I'd thought about chaining the alternates, but wasn't sure whether it's > safe to consider all alternates in the chain equivalent. Well, I decided we shouldn't, because the original repository that has objects/info/alternates surely doesn't. ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2005-10-12 22:01 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <46a038f90510062014l7f5740e0l77fc53b50f822e8f@mail.gmail.com> [not found] ` <46a038f90510082014i6b296f2bvbac56e25344cbdf2@mail.gmail.com> 2005-10-10 4:26 ` Strangely broken git repo Martin Langhoff (CatalystIT) 2005-10-10 9:00 ` Junio C Hamano 2005-10-10 14:54 ` Linus Torvalds 2005-10-10 15:21 ` Linus Torvalds 2005-10-10 18:19 ` Morten Welinder 2005-10-10 18:23 ` Linus Torvalds 2005-10-10 18:30 ` Johannes Schindelin 2005-10-11 4:29 ` Quote reference names while fetching with curl Junio C Hamano 2005-10-11 5:07 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Junio C Hamano 2005-10-11 6:04 ` Junio C Hamano 2005-10-12 5:29 ` Junio C Hamano 2005-10-12 8:26 ` Petr Baudis 2005-10-12 15:36 ` H. Peter Anvin 2005-10-12 15:47 ` H. Peter Anvin 2005-10-12 18:57 ` Junio C Hamano 2005-10-12 18:10 ` Junio C Hamano 2005-10-12 22:01 ` [PATCH] git-check-ref-format: reject funny ref names Junio C Hamano 2005-10-12 22:01 ` [PATCH] Refuse to create funny refs in clone-pack, git-fetch and receive-pack Junio C Hamano 2005-10-11 19:55 ` [PATCH] git-fetch --tags: deal with tags with spaces in them Matthias Urlichs 2005-10-12 3:26 ` Strangely broken git repo Nick Hengeveld 2005-10-12 4:22 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).