* git fails with control characters in trunk directory name @ 2009-05-11 20:08 Hugo Mildenberger 2009-05-12 6:51 ` Alex Riesen 0 siblings, 1 reply; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-11 20:08 UTC (permalink / raw) To: git Using the mouse to paste a git url from a website into a terminal session in order to clone the repository, I recently managed to include invisible control characters into the git trunk directory name. Consequently, I faced all sort of strange behaviour like git pull not working (error 2), later on a kernel make which supposedly could not finding a rule to create the trunk directory and more such inconsistencies. I then reinstalled git, rcs and so on and also tried unsuccessfully several git versions. The next morning I looked into the .git/config file and recognized that the "url" key value within the [remote "origin"] section contained some control characters: ^J and \n, as fas as I remember. While this was almost entirely my fault, git could possibly apply a filter, reject such a name or at least issue a warning. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-11 20:08 git fails with control characters in trunk directory name Hugo Mildenberger @ 2009-05-12 6:51 ` Alex Riesen 2009-05-12 9:02 ` Hugo Mildenberger 0 siblings, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-12 6:51 UTC (permalink / raw) To: Hugo Mildenberger; +Cc: git 2009/5/11 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: > Using the mouse to paste a git url from a website into a terminal session in > order to clone the repository, I recently managed to include invisible > control characters into the git trunk directory name. Git has no "trunk". Not in CVS/CVN sense, at least > Consequently, I faced all sort of strange behaviour like git pull not working > (error 2), later on a kernel make which supposedly could not finding a rule > to create the trunk directory and more such inconsistencies. > I then reinstalled git, rcs and so on and also tried unsuccessfully several > git versions. The next morning I looked into the .git/config file and > recognized that the "url" key value within the [remote "origin"] section > contained some control characters: ^J and \n, as fas as I remember. What platform are you on? Can you show your .git/config? > While this was almost entirely my fault, git could possibly apply a filter, > reject such a name or at least issue a warning. Maybe. Or maybe it can just work (well, assuming the user meant to use an url with character you considered "control"). ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 6:51 ` Alex Riesen @ 2009-05-12 9:02 ` Hugo Mildenberger 2009-05-12 10:54 ` Alex Riesen 0 siblings, 1 reply; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-12 9:02 UTC (permalink / raw) To: git Hello Alex, > 2009/5/11 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: > > Using the mouse to paste a git url from a website into a terminal session > > in order to clone the repository, I recently managed to include invisible > > control characters into the git trunk directory name. > > Git has no "trunk". Not in CVS/CVN sense, at least > > > Consequently, I faced all sort of strange behaviour like git pull not > > working (error 2), later on a kernel make which supposedly could not > > finding a rule to create the trunk directory and more such > > inconsistencies. > > I then reinstalled git, rcs and so on and also tried unsuccessfully > > several git versions. The next morning I looked into the .git/config file > > and recognized that the "url" key value within the [remote "origin"] > > section contained some control characters: ^J and \n, as fas as I > > remember. > > What platform are you on? > Can you show your .git/config? > > > While this was almost entirely my fault, git could possibly apply a > > filter, reject such a name or at least issue a warning. > > Maybe. Or maybe it can just work (well, assuming the user meant to > use an url with character you considered "control"). I said it actually did not work well, independent from how anyone prefers to classify characters. My platform is Gentoo-hardened with unicode support and an ext3 disk format . With "trunk directory" I meant the top level directory which is created when you run git clone on a remote url -- sorry for still not being a native git speaker. I don't have the original setup anymore. My _working_ ".git/config" is now: [core] repositoryformatversion = 0 filemode = true bare = false logallrefupdates = true [remote "origin"] url = git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master The remote url did contain something like "^J\n" at the end, as did the top level (the much derided "trunk") directory name on disk. I probably got there by puting the copied url within quotes on the command line, but today I'm unsure exactly how I arrived there. However, old fashioned as I am, I still consider e.g. a linefeed to be a "control character", and inspite of your flashing git punditry I still consider this to be an issue. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 9:02 ` Hugo Mildenberger @ 2009-05-12 10:54 ` Alex Riesen 2009-05-12 13:57 ` Hugo Mildenberger 0 siblings, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-12 10:54 UTC (permalink / raw) To: Hugo Mildenberger; +Cc: git 2009/5/12 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: >> 2009/5/11 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: >> > While this was almost entirely my fault, git could possibly apply a >> > filter, reject such a name or at least issue a warning. >> >> Maybe. Or maybe it can just work (well, assuming the user meant to >> use an url with character you considered "control"). > > I said it actually did not work well, independent from how anyone prefers to > classify characters. My platform is Gentoo-hardened with unicode support and > an ext3 disk format . With "trunk directory" I meant the top level > directory which is created when you run git clone on a remote url -- sorry > for still not being a native git speaker. I don't have the original setup > anymore. My _working_ ".git/config" is now: Ok, let's assume you picked up '\r\n' from that web page > The remote url did contain something like "^J\n" at the end, as did the top > level (the much derided "trunk") directory name on disk. I probably got there > by puting the copied url within quotes on the command line, but today I'm > unsure exactly how I arrived there. However, old fashioned as I am, I still > consider e.g. a linefeed to be a "control character", and inspite of your > flashing git punditry I still consider this to be an issue. Git tries not to enforce any specific rules where the program itself does not need them. One can easily imagine an automatic system which generates names of repositories from something like binary data (something like BigTable comes to mind) and the less rules the underlying levels impose on it, the simplier the upper levels will be. OTOH, a warning about commonly used delimiters not fitting a name context, maybe a good idea. Like "\r\n\t", backslash (came up recently on this list). Such a check and associated warning may be useful for repository names and branches. Still, it's more of a policy issue and I would make it optional, even if enabled by default. Maybe even by defining a regexp which the repo name or branch name must (for hard error) or should (for a warning) match. For your specific case, you can take a look at builtin-clone.c, just after the line containing "guess_dir_name(repo_name"... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 10:54 ` Alex Riesen @ 2009-05-12 13:57 ` Hugo Mildenberger 2009-05-12 14:59 ` Alex Riesen 0 siblings, 1 reply; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-12 13:57 UTC (permalink / raw) To: Alex Riesen; +Cc: git > OTOH, a warning about commonly used delimiters not fitting a name > context, maybe a good idea. Like "\r\n\t", backslash (came up recently > on this list). Such a check and associated warning may be useful for > repository names and branches. > > Still, it's more of a policy issue and I would make it optional, even > if enabled by default. Maybe even by defining a regexp which the > repo name or branch name must (for hard error) or should (for a warning) > match. > > For your specific case, you can take a look at builtin-clone.c, > just after the line containing "guess_dir_name(repo_name"... > -- But at least the git versions I tried (up to 1.6.3) really do have a problem when facing a trailing newline in repository names; so one should enforce a convention. Although the situation here is certainly a far-flung, uncommon one, it could also happen when git was called from scripts. I looked into guess_dir_name(). A regex call would be easy to fit, but currently the git binary does not depend on libpcre. Is it generally considered to be acceptable to add such a dependency? While I like the idea to make use of a configurable regular expression, such an expression had to be a command line parameter with a reasonable default value, because .git/config still would not exist when the value would be needed. Last not least, I managed to reproduce the problem almost exactly: 1.) hm@localhost git clone "git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git " (Note the trailing linefeed) 2.) hm@localhost ~/tmp/bluetooth-testing.git $ make Makefile:313: /home/hm/tmp/bluetooth-testing.git /scripts/Kbuild.include: No such file or directory make[1]: /home/hm/tmp/bluetooth-testing.git: No such file or directory make[1]: *** No rule to make target `/home/hm/tmp/bluetooth-testing.git'. Stop. make: *** No rule to make target `include/config/auto.conf', needed by `include/config/kernel.release'. Stop. 3.) hm@localhost ~/tmp/bluetooth-testing.git $ git pull fatal: Error in line 2: 4.) ".git/config" now contains url = git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git\n I particulary liked the git message "fatal: Error in line 2:" ... grep says, it would stem from builtin-fmt-merge-msg.c/fmt_merge_msg(): 270 /* get a line */ 271 while (pos < in->len) { 272 int len; 273 char *newline, *p = in->buf + pos; 274 275 newline = strchr(p, '\n'); 276 len = newline ? newline - p : strlen(p); 277 pos += len + !!newline; 278 i++; 279 p[len] = 0; 280 if (handle_line(p)) 281 die ("Error in line %d: %.*s", i, len, p); 282 } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 13:57 ` Hugo Mildenberger @ 2009-05-12 14:59 ` Alex Riesen 2009-05-12 16:59 ` Hugo Mildenberger 0 siblings, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-12 14:59 UTC (permalink / raw) To: Hugo Mildenberger; +Cc: git 2009/5/12 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: >> OTOH, a warning about commonly used delimiters not fitting a name >> context, maybe a good idea. Like "\r\n\t", backslash (came up recently >> on this list). Such a check and associated warning may be useful for >> repository names and branches. >> >> Still, it's more of a policy issue and I would make it optional, even >> if enabled by default. Maybe even by defining a regexp which the >> repo name or branch name must (for hard error) or should (for a warning) >> match. >> >> For your specific case, you can take a look at builtin-clone.c, >> just after the line containing "guess_dir_name(repo_name"... >> -- > > But at least the git versions I tried (up to 1.6.3) really do have a problem > when facing a trailing newline in repository names; so one should enforce a > convention. That's what I mean by saying: "... even if enabled by default". I just want to disable newbie-helping annoyances on my systems. > I looked into guess_dir_name(). That's not the right place. The place I meant is right below the call to this function (you have to parse the names given in the command-line too). The automatically generated (that's the case with guess_dir_name) directory name certainly shouldn't contain any unexpected characters. > A regex call would be easy to fit, but > currently the git binary does not depend on libpcre. Is it generally > considered to be acceptable to add such a dependency? No. And pcre is not the only regex lib in the world. And we prefer shell patterns, if any at all. > While I like the idea to make use of a configurable regular expression, such > an expression had to be a command line parameter with a reasonable default > value, because .git/config still would not exist when the value would be > needed. That's where _default_ policy plays its role. "Default" like in "it is compiled into the git executable and needs no configuration present". > Last not least, I managed to reproduce the problem almost exactly: > > 1.) hm@localhost git > clone "git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git > " > (Note the trailing linefeed) That's all the command printed? No "Initialized empty Git repository" line? > 2.) hm@localhost ~/tmp/bluetooth-testing.git $ make Hmm... At this point the clone may have worked (at least partially). It named "bluetooth-testing.git", which it shouldn't (but explainable: the repo url suffix is not .git anymore, but ".git\r\n"). But it looks like the post-clone checkout failed (silently? which would be bad): > Makefile:313: /home/hm/tmp/bluetooth-testing.git > /scripts/Kbuild.include: No such file or directory > make[1]: /home/hm/tmp/bluetooth-testing.git: No such file or directory > make[1]: *** No rule to make target `/home/hm/tmp/bluetooth-testing.git'. > Stop. > make: *** No rule to make target `include/config/auto.conf', needed by > `include/config/kernel.release'. Stop. Assuming the files must be present, of course. > 3.) hm@localhost ~/tmp/bluetooth-testing.git $ git pull > fatal: Error in line 2: > > 4.) ".git/config" now contains > > url = > git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git\n > > I particulary liked the git message "fatal: Error in line 2:" ... > Separate issue. Will look at it later. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 14:59 ` Alex Riesen @ 2009-05-12 16:59 ` Hugo Mildenberger 2009-05-12 17:18 ` Alex Riesen 2009-05-12 17:41 ` git fails with control characters in trunk directory name Alex Riesen 0 siblings, 2 replies; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-12 16:59 UTC (permalink / raw) To: Alex Riesen; +Cc: git > > I looked into guess_dir_name(). > > That's not the right place. The place I meant is right below the call > to this function (you have to parse the names given in the command-line > too). > > The automatically generated (that's the case with guess_dir_name) > directory name certainly shouldn't contain any unexpected characters. > > > A regex call would be easy to fit, but > > currently the git binary does not depend on libpcre. Is it generally > > considered to be acceptable to add such a dependency? > > No. And pcre is not the only regex lib in the world. And we prefer > shell patterns, if any at all. > > > While I like the idea to make use of a configurable regular expression, such > > an expression had to be a command line parameter with a reasonable default > > value, because .git/config still would not exist when the value would be > > needed. > > That's where _default_ policy plays its role. "Default" like in "it is compiled > into the git executable and needs no configuration present". > > > Last not least, I managed to reproduce the problem almost exactly: > > > > 1.) hm@localhost git > > clone "git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git > > " > > (Note the trailing linefeed) > > That's all the command printed? No "Initialized empty Git repository" line? > > > 2.) hm@localhost ~/tmp/bluetooth-testing.git $ make > > Hmm... At this point the clone may have worked (at least partially). > It named "bluetooth-testing.git", which it shouldn't (but explainable: > the repo url suffix is not .git anymore, but ".git\r\n"). But it looks like > the post-clone checkout failed (silently? which would be bad): > > > Makefile:313: /home/hm/tmp/bluetooth-testing.git > > /scripts/Kbuild.include: No such file or directory > > make[1]: /home/hm/tmp/bluetooth-testing.git: No such file or directory > > make[1]: *** No rule to make target `/home/hm/tmp/bluetooth-testing.git'. > > Stop. > > make: *** No rule to make target `include/config/auto.conf', needed by > > `include/config/kernel.release'. Stop. > > Assuming the files must be present, of course. > > > 3.) hm@localhost ~/tmp/bluetooth-testing.git $ git pull > > fatal: Error in line 2: > > > > 4.) ".git/config" now contains > > > > url = > > git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git\n > > > > I particulary liked the git message "fatal: Error in line 2:" ... > > > > Separate issue. Will look at it later. > -- You really want to use shell patterns to match against a string from within a binary? Although git already makes use of regexec from glibc or compat/regex directory in numerous places? Regarding the procedure above, sorry for keeping it much too short. Here comes a complete sequence (I will put some comments within square brackets): hm@localhost /var/tmp $ git --version git version 1.6.0.6 hm@localhost /var/tmp $ git clone "git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git " Initialized empty Git repository in /mnt/hda1/tmp/bluetooth-testing.git /.git/ remote: Counting objects: 1177836, done. remote: Compressing objects: 100% (189467/189467), done. remote: Total 1177836 (delta 982785), reused 1176855 (delta 981880) Receiving objects: 100% (1177836/1177836), 288.16 MiB | 1288 KiB/s, done. Resolving deltas: 100% (982785/982785), done. Checking out files: 100% (27842/27842), done. hm@localhost /var/tmp $ ls bluetooth-testing.git? [Note that question mark replacing \n in repository's name] hm@localhost /var/tmp $ cd blue bluetooth-testing.git^J/ bluez-utils-2.25-r1.new/ [I typed cd blue<TAB> here, and there is the "^J" I remembered to have seen elsewhere] hm@localhost /var/tmp $ cd 'bluetooth-testing.git '/ [bash puts the directory name automatically within single quotes in this case] hm@localhost /var/tmp/bluetooth-testing.git $ make defconfig *** Default configuration is based on 'i386_defconfig' # # configuration written to .config # hm@localhost /var/tmp/bluetooth-testing.git $ make make[1]: /mnt/hda1/tmp/bluetooth-testing.git: No such file or directory make[1]: *** No rule to make target `/mnt/hda1/tmp/bluetooth-testing.git'. Stop. make: *** No rule to make target `include/config/auto.conf', needed by `include/config/kernel.release'. Stop. [The problem is simply the odd directory name, while make tries to use the real one] hm@localhost /var/tmp/bluetooth-testing.git $ git pull fatal: Error in line 2: ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 16:59 ` Hugo Mildenberger @ 2009-05-12 17:18 ` Alex Riesen 2009-05-12 17:24 ` [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD Alex Riesen 2009-05-12 17:41 ` git fails with control characters in trunk directory name Alex Riesen 1 sibling, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-12 17:18 UTC (permalink / raw) To: Hugo Mildenberger; +Cc: git 2009/5/12 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: >> > Last not least, I managed to reproduce the problem almost exactly: >> > >> > 1.) hm@localhost git >> > clone "git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git >> > " >> > (Note the trailing linefeed) >> >> That's all the command printed? No "Initialized empty Git repository" line? > > hm@localhost /var/tmp $ git clone "git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-testing.git > " > Initialized empty Git repository in /mnt/hda1/tmp/bluetooth-testing.git > /.git/ > remote: Counting objects: 1177836, done. > remote: Compressing objects: 100% (189467/189467), done. > remote: Total 1177836 (delta 982785), reused 1176855 (delta 981880) > Receiving objects: 100% (1177836/1177836), 288.16 MiB | 1288 KiB/s, done. > Resolving deltas: 100% (982785/982785), done. > Checking out files: 100% (27842/27842), done. > > hm@localhost /var/tmp $ ls > bluetooth-testing.git? [Note that question mark replacing \n in repository's name] > Ok, clone works. Fully and correctly (even if a bit unexpected). > hm@localhost /var/tmp/bluetooth-testing.git $ make > make[1]: /mnt/hda1/tmp/bluetooth-testing.git: No such file or directory > make[1]: *** No rule to make target `/mnt/hda1/tmp/bluetooth-testing.git'. Stop. It is just linux build system which does not support compilation in oddly named directories (which is reasonable, if you ask me). > hm@localhost /var/tmp/bluetooth-testing.git $ git pull > fatal: Error in line 2: That's already fixed. Will send the fix in a minute. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-12 17:18 ` Alex Riesen @ 2009-05-12 17:24 ` Alex Riesen 2009-05-12 23:16 ` Junio C Hamano 2009-05-13 18:08 ` [PATCH 1/2] " Alex Riesen 0 siblings, 2 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-12 17:24 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Daniel Barkalow, Hugo Mildenberger The fmt-merge-msg does a strong syntax checking of its input and fails with if it is incorrect. The LF character is the only character important for fmt-merge-msg. As the url in FETCH_HEAD plays only informational role, a quoted representation of the url should be good and true enough. The url often comes from either user-editable config or command line, so it is reasonable to expect all kinds of characters in it, including the characters which the format of FETCH_HEAD considers special (line separator in this case). Noticed and reported by Hugo Mildenberger. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> --- Alex Riesen, Tue, May 12, 2009 19:18:33 +0200: > 2009/5/12 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: > > hm@localhost /var/tmp/bluetooth-testing.git $ git pull > > fatal: Error in line 2: > > That's already fixed. Will send the fix in a minute. Here builtin-fetch.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/builtin-fetch.c b/builtin-fetch.c index 3c998ea..eef7091 100644 --- a/builtin-fetch.c +++ b/builtin-fetch.c @@ -353,12 +353,17 @@ static int store_updated_refs(const char *url, const char *remote_name, kind); note_len += sprintf(note + note_len, "'%s' of ", what); } - note_len += sprintf(note + note_len, "%.*s", url_len, url); - fprintf(fp, "%s\t%s\t%s\n", + fprintf(fp, "%s\t%s\t%s", sha1_to_hex(commit ? commit->object.sha1 : rm->old_sha1), rm->merge ? "" : "not-for-merge", note); + for (i = 0; i < url_len; ++i) + if ('\n' == url[i]) + fputs("\\n", fp); + else + fputc(url[i], fp); + fputc('\n', fp); if (ref) rc |= update_local_ref(ref, what, note); -- 1.6.3.28.ga852b ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-12 17:24 ` [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD Alex Riesen @ 2009-05-12 23:16 ` Junio C Hamano 2009-05-13 6:06 ` Alex Riesen 2009-05-13 18:08 ` [PATCH 1/2] " Alex Riesen 1 sibling, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2009-05-12 23:16 UTC (permalink / raw) To: Alex Riesen; +Cc: git, Junio C Hamano, Daniel Barkalow, Hugo Mildenberger Alex Riesen <raa.lkml@gmail.com> writes: > + for (i = 0; i < url_len; ++i) > + if ('\n' == url[i]) > + fputs("\\n", fp); > + else > + fputc(url[i], fp); > + fputc('\n', fp); This ad-hoc quoting feels _very_ wrong. Who is on the reading side and how does it unquote? If it uses quote.c infrastructure, we should be quoting using the function from the same library shouldn't we? If it is just informational use only, then it might make more sense to drop this ugly "quoted \n" silently. I dunno. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-12 23:16 ` Junio C Hamano @ 2009-05-13 6:06 ` Alex Riesen [not found] ` <200905131340.31509.Hugo.Mildenberger@namir.de> ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-13 6:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Daniel Barkalow, Hugo Mildenberger 2009/5/13 Junio C Hamano <gitster@pobox.com>: > Alex Riesen <raa.lkml@gmail.com> writes: > >> + for (i = 0; i < url_len; ++i) >> + if ('\n' == url[i]) >> + fputs("\\n", fp); >> + else >> + fputc(url[i], fp); >> + fputc('\n', fp); > > This ad-hoc quoting feels _very_ wrong. Who is on the reading side and > how does it unquote? git fmt-merge-msg. It does not unquote. The url is purely informational here. OTOH, the \n shouldn't be in url text at all, so treat it as slightly less annoying warning. > If it is just informational use only, then it might make more sense to > drop this ugly "quoted \n" silently. I dunno. That'd mean to loose the information completely. Which is just as bad as putting the LF in the url in the first place. ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <200905131340.31509.Hugo.Mildenberger@namir.de>]
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD [not found] ` <200905131340.31509.Hugo.Mildenberger@namir.de> @ 2009-05-13 12:10 ` Alex Riesen 2009-05-13 14:49 ` Hugo Mildenberger 0 siblings, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-13 12:10 UTC (permalink / raw) To: Hugo Mildenberger; +Cc: Git Mailing List 2009/5/13 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: > Am Mittwoch, 13. Mai 2009 schrieb Alex Riesen <raa.lkml@gmail.com> >> >> That'd mean to loose the information completely. Which is just as bad >> as putting the LF in the url in the first place. > > This stray linefeed is not information, but pure contamination. Thus it > would be much better to simply strip it off. Not at the place where the patch changes it. In git clone, maybe (the many times mentioned guessing function). But then, we have to provide an option to leave the names alone, verbatim (which, I think, the non-quessing form already provides. No additional coding necessary). > And besides from the fact that git apply rejects this patch (fatal: corrupt > patch at line 6), This is your local problem. Maybe you copied the patch through clipboard and it kept the presentation in KMail, or KMail generally corrupts mails, I don't know. Just save the mail as is and do "git am -3" on it. > I think it would also not handle the equally wrong repository directory name on disk, You think wrong. It does handle it perfectly. And it is not wrong. It is just how it is. Don't spend too much time in MacOS and that overgrown DOS. They'll wreck your brain. > which then possibly leads to subsequent make failures (as it actually happend in > the case I described earlier here.) The make(1) works in such directories just fine. Fix yours (thinking about it now - you wont find a make which does not work in such directories). > Why not just return to your original idea, which proposed testing the > repository name against a regular expression describing a forbidden > set of characters (which is "\n", currently) and then terminate with a > clear message? The idea was not abandoned nor followed upon. It's just I don't need that warning and would switch it off immediately anyway (if it was implemented). I just spent a minute thinking about it (that's where regexp idea came from). I'm not going to work on it (at least, not right now). I'm even sure you wont be working on it too, now when you have learned what the problem is and how to work around it. And that's while all you needed is to put a two-three lines in that guessing function... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 12:10 ` Alex Riesen @ 2009-05-13 14:49 ` Hugo Mildenberger 0 siblings, 0 replies; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-13 14:49 UTC (permalink / raw) To: Alex Riesen; +Cc: Git Mailing List Am Mittwoch, 13. Mai 2009 schrieb Alex Riesen: > 2009/5/13 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: > > Am Mittwoch, 13. Mai 2009 schrieb Alex Riesen <raa.lkml@gmail.com> > >> > >> That'd mean to loose the information completely. Which is just as bad > >> as putting the LF in the url in the first place. > > > > This stray linefeed is not information, but pure contamination. Thus it > > would be much better to simply strip it off. > > Not at the place where the patch changes it. In git clone, maybe (the many > times mentioned guessing function). But then, we have to provide an option > to leave the names alone, verbatim (which, I think, the non-quessing form > already provides. No additional coding necessary). > > > And besides from the fact that git apply rejects this patch (fatal: corrupt > > patch at line 6), > > This is your local problem. Maybe you copied the patch through clipboard > and it kept the presentation in KMail, or KMail generally corrupts > mails, I don't > know. Just save the mail as is and do "git am -3" on it. My apologies, the problem actually was copy & paste from kmail. > > I think it would also not handle the equally wrong repository directory name on disk, > > You think wrong. It does handle it perfectly. I'm thinking right. The stray linefeed is still there: ls -l 'bluetooth-testing.git '/ -d drwxr-xr-x 23 hm hm 4096 13. Mai 15:06 bluetooth-testing.git?/ > > which then possibly leads to subsequent make failures (as it actually happend in > > the case I described earlier here.) > > The make(1) works in such directories just fine. Fix yours (thinking about > it now - you wont find a make which does not work in such directories). You aren't saying which make you are using, but GNU make version 3.81 or the linux kernel make system still can't cope with a linefeed contaminated directory name: hm@localhost /var/tmp/bluetooth-testing.git $ make make[1]: /mnt/hda1/tmp/bluetooth-testing.git: No such file or directory make[1]: *** No rule to make target `/mnt/hda1/tmp/bluetooth-testing.git'. Stop. make: *** No rule to make target `include/config/auto.conf', needed by `include/config/kernel.release'. Stop. > > Why not just return to your original idea, which proposed testing the > > repository name against a regular expression describing a forbidden > > set of characters (which is "\n", currently) and then terminate with a > > clear message? > > The idea was not abandoned nor followed upon. It's just I don't need that > warning and would switch it off immediately anyway (if it was implemented). > I just spent a minute thinking about it (that's where regexp idea came from). > I'm not going to work on it (at least, not right now). I'm even sure you wont > be working on it too, now when you have learned what the problem is and > how to work around it. And that's while all you needed is to put a two-three > lines in that guessing function... I reported this problem here after I already knew the cause and how to avoid the problem, because I thought it might be useful fix it generally, especially useful for those who run git within a brushwood of scripts, which I do not. If you personally would need this or not is not of so much interest here. But I was interested in discussing a clean way to implement it. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 6:06 ` Alex Riesen [not found] ` <200905131340.31509.Hugo.Mildenberger@namir.de> @ 2009-05-13 12:39 ` Hugo Mildenberger 2009-05-13 15:18 ` Daniel Barkalow 2 siblings, 0 replies; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-13 12:39 UTC (permalink / raw) To: git Am Mittwoch, 13. Mai 2009 schrieb Alex Riesen <raa.lkml@gmail.com> > 2009/5/13 Junio C Hamano <gitster@pobox.com>: > > Alex Riesen <raa.lkml@gmail.com> writes: > > > >> + for (i = 0; i < url_len; ++i) > >> + if ('\n' == url[i]) > >> + fputs("\\n", fp); > >> + else > >> + fputc(url[i], fp); > >> + fputc('\n', fp); > > > > This ad-hoc quoting feels _very_ wrong. Who is on the reading side and > > how does it unquote? > > git fmt-merge-msg. It does not unquote. The url is purely informational here. > OTOH, the \n shouldn't be in url text at all, so treat it as slightly > less annoying > warning. > > > If it is just informational use only, then it might make more sense to > > drop this ugly "quoted \n" silently. I dunno. > > That'd mean to loose the information completely. Which is just as bad > as putting the LF in the url in the first place. > -- This stray linefeed is not information, but pure contamination. Thus it would be much better to simply strip it off. And besides from the fact that git apply rejects this patch (fatal: corrupt patch at line 6), I think it would also not handle the equally wrong repository directory name on disk, which then possibly leads to subsequent make failures (as it actually happend in the case I described earlier here.) Why not just return to your original idea, which proposed testing the repository name against a regular expression describing a forbidden set of characters (which is "\n", currently) and then terminate with a clear message? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 6:06 ` Alex Riesen [not found] ` <200905131340.31509.Hugo.Mildenberger@namir.de> 2009-05-13 12:39 ` Hugo Mildenberger @ 2009-05-13 15:18 ` Daniel Barkalow 2009-05-13 16:09 ` Alex Riesen 2009-05-13 18:23 ` Junio C Hamano 2 siblings, 2 replies; 30+ messages in thread From: Daniel Barkalow @ 2009-05-13 15:18 UTC (permalink / raw) To: Alex Riesen; +Cc: Junio C Hamano, git, Hugo Mildenberger [-- Attachment #1: Type: TEXT/PLAIN, Size: 1523 bytes --] On Wed, 13 May 2009, Alex Riesen wrote: > 2009/5/13 Junio C Hamano <gitster@pobox.com>: > > Alex Riesen <raa.lkml@gmail.com> writes: > > > >> + for (i = 0; i < url_len; ++i) > >> + if ('\n' == url[i]) > >> + fputs("\\n", fp); > >> + else > >> + fputc(url[i], fp); > >> + fputc('\n', fp); > > > > This ad-hoc quoting feels _very_ wrong. Who is on the reading side and > > how does it unquote? > > git fmt-merge-msg. It does not unquote. The url is purely informational here. > OTOH, the \n shouldn't be in url text at all, so treat it as slightly > less annoying > warning. > > > If it is just informational use only, then it might make more sense to > > drop this ugly "quoted \n" silently. I dunno. > > That'd mean to loose the information completely. Which is just as bad > as putting the LF in the url in the first place. Looking back at the original message, it looks like the user included a newline in an argument to clone, and the fetch must have stripped it out (or ignored it in some other fashion), because data was retrieved from a repository that doesn't have a newline in its name. Most likely, the newline should just be prohibited in the URL in the config file in the first place, and we shouldn't be able to get to the point of writing a FETCH_HEAD with that value. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 15:18 ` Daniel Barkalow @ 2009-05-13 16:09 ` Alex Riesen 2009-05-13 17:07 ` Alex Riesen 2009-05-13 17:12 ` Daniel Barkalow 2009-05-13 18:23 ` Junio C Hamano 1 sibling, 2 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-13 16:09 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Junio C Hamano, git, Hugo Mildenberger 2009/5/13 Daniel Barkalow <barkalow@iabervon.org>: > On Wed, 13 May 2009, Alex Riesen wrote: >> 2009/5/13 Junio C Hamano <gitster@pobox.com>: >> >> > If it is just informational use only, then it might make more sense to >> > drop this ugly "quoted \n" silently. I dunno. >> >> That'd mean to loose the information completely. Which is just as bad >> as putting the LF in the url in the first place. > > Looking back at the original message, it looks like the user included a > newline in an argument to clone, and the fetch must have stripped it out > (or ignored it in some other fashion), because data was retrieved from a > repository that doesn't have a newline in its name. _That_ looks like a bug to me. We should have asked exactly the name we were given, or notify the user about what we have used for the url if we have ignored user's input and decided to use something else. Patches follow. > ... Most likely, the > newline should just be prohibited in the URL in the config file in the > first place, and we shouldn't be able to get to the point of writing a > FETCH_HEAD with that value. What I cannot understand is what's wrong with url containing an LF? Especially if we can handle it, all the tools can handle it, and putting LF in a filename is a long-standing UNIX tradition. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 16:09 ` Alex Riesen @ 2009-05-13 17:07 ` Alex Riesen 2009-05-13 17:12 ` Daniel Barkalow 1 sibling, 0 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-13 17:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Daniel Barkalow, Hugo Mildenberger Strip trailing spaces off guessed target directory, and replace every 'control' character in the new directory name with ASCII space (\x20). The user still can have any name by specifying it explicitely after url. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> --- Alex Riesen, Wed, May 13, 2009 18:09:42 +0200: > 2009/5/13 Daniel Barkalow <barkalow@iabervon.org>: > > On Wed, 13 May 2009, Alex Riesen wrote: > >> 2009/5/13 Junio C Hamano <gitster@pobox.com>: > >> > >> > If it is just informational use only, then it might make more sense to > >> > drop this ugly "quoted \n" silently. I dunno. > >> > >> That'd mean to loose the information completely. Which is just as bad > >> as putting the LF in the url in the first place. > > > > Looking back at the original message, it looks like the user included a > > newline in an argument to clone, and the fetch must have stripped it out > > (or ignored it in some other fashion), because data was retrieved from a > > repository that doesn't have a newline in its name. > > _That_ looks like a bug to me. We should have asked exactly the name > we were given, or notify the user about what we have used for the url > if we have ignored user's input and decided to use something else. > Patches follow. > Now I just have to figure out why the fetch works. Something seem to strip spaces off urls... builtin-clone.c | 24 ++++++++++++++++++------ 1 files changed, 18 insertions(+), 6 deletions(-) diff --git a/builtin-clone.c b/builtin-clone.c index 880373f..bdbe931 100644 --- a/builtin-clone.c +++ b/builtin-clone.c @@ -104,11 +104,12 @@ static char *get_repo_path(const char *repo, int *is_bundle) static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) { const char *end = repo + strlen(repo), *start; + char *dir; /* - * Strip trailing slashes and /.git + * Strip trailing spaces, slashes and /.git */ - while (repo < end && is_dir_sep(end[-1])) + while (repo < end && (is_dir_sep(end[-1]) || isspace(end[-1]))) end--; if (end - repo > 5 && is_dir_sep(end[-5]) && !strncmp(end - 4, ".git", 4)) { @@ -140,10 +141,21 @@ static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) if (is_bare) { struct strbuf result = STRBUF_INIT; strbuf_addf(&result, "%.*s.git", (int)(end - start), start); - return strbuf_detach(&result, 0); - } - - return xstrndup(start, end - start); + dir = strbuf_detach(&result, 0); + } else + dir = xstrndup(start, end - start); + /* replace all 'control' characters with ascii space */ + for (start = dir; *start; ++start) + if (*(const unsigned char *)start < 32u) + dir[start - dir] = '\x20'; + /* remove trailing spaces */ + if (dir < start) + for (end = start; dir < --end; ) + if (!isspace(*end)) + break; + else + dir[end - dir] = '\0'; + return dir; } static void strip_trailing_slashes(char *dir) -- 1.6.3.28.ga852b ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 16:09 ` Alex Riesen 2009-05-13 17:07 ` Alex Riesen @ 2009-05-13 17:12 ` Daniel Barkalow 2009-05-13 18:11 ` Alex Riesen 1 sibling, 1 reply; 30+ messages in thread From: Daniel Barkalow @ 2009-05-13 17:12 UTC (permalink / raw) To: Alex Riesen; +Cc: Junio C Hamano, git, Hugo Mildenberger [-- Attachment #1: Type: TEXT/PLAIN, Size: 2022 bytes --] On Wed, 13 May 2009, Alex Riesen wrote: > 2009/5/13 Daniel Barkalow <barkalow@iabervon.org>: > > On Wed, 13 May 2009, Alex Riesen wrote: > >> 2009/5/13 Junio C Hamano <gitster@pobox.com>: > >> > >> > If it is just informational use only, then it might make more sense to > >> > drop this ugly "quoted \n" silently. I dunno. > >> > >> That'd mean to loose the information completely. Which is just as bad > >> as putting the LF in the url in the first place. > > > > Looking back at the original message, it looks like the user included a > > newline in an argument to clone, and the fetch must have stripped it out > > (or ignored it in some other fashion), because data was retrieved from a > > repository that doesn't have a newline in its name. > > _That_ looks like a bug to me. We should have asked exactly the name > we were given, or notify the user about what we have used for the url > if we have ignored user's input and decided to use something else. > Patches follow. > > > ... Most likely, the > > newline should just be prohibited in the URL in the config file in the > > first place, and we shouldn't be able to get to the point of writing a > > FETCH_HEAD with that value. > > What I cannot understand is what's wrong with url containing an LF? > Especially if we can handle it, all the tools can handle it, and putting > LF in a filename is a long-standing UNIX tradition. The RFC for URLs (which we don't actually follow particularly much) doesn't permit whitespace. Protocols that use URLs directly generally use newlines as control characters. For example, an HTTP request for: http://host/path/with newline/in/it would parse as a request for "http://host/path/with" with an invalid header line. Normally, to refer to a resource whose name contains a newline, the URL contains "%0A" instead. Of course, our "url" config can contain some things that aren't URLs, but this particular case was of the form that's supposed to be a standard URL. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 17:12 ` Daniel Barkalow @ 2009-05-13 18:11 ` Alex Riesen 0 siblings, 0 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-13 18:11 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Junio C Hamano, git, Hugo Mildenberger Daniel Barkalow, Wed, May 13, 2009 19:12:26 +0200: > Of course, our "url" config can contain some things that aren't URLs, but > this particular case was of the form that's supposed to be a standard URL. This particular case handled the _url_ just fine. The problem was in the automatically generated name for local directory. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-13 15:18 ` Daniel Barkalow 2009-05-13 16:09 ` Alex Riesen @ 2009-05-13 18:23 ` Junio C Hamano 1 sibling, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2009-05-13 18:23 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Alex Riesen, Junio C Hamano, git, Hugo Mildenberger Daniel Barkalow <barkalow@iabervon.org> writes: > On Wed, 13 May 2009, Alex Riesen wrote: > >> 2009/5/13 Junio C Hamano <gitster@pobox.com>: >> > Alex Riesen <raa.lkml@gmail.com> writes: >> > >> >> + for (i = 0; i < url_len; ++i) >> >> + if ('\n' == url[i]) >> >> + fputs("\\n", fp); >> >> + else >> >> + fputc(url[i], fp); >> >> + fputc('\n', fp); >> > >> > This ad-hoc quoting feels _very_ wrong. Who is on the reading side and >> > how does it unquote? >> >> git fmt-merge-msg. It does not unquote. The url is purely informational here. >> OTOH, the \n shouldn't be in url text at all, so treat it as slightly >> less annoying >> warning. >> >> > If it is just informational use only, then it might make more sense to >> > drop this ugly "quoted \n" silently. I dunno. >> >> That'd mean to loose the information completely. Which is just as bad >> as putting the LF in the url in the first place. > > Looking back at the original message, it looks like the user included a > newline in an argument to clone, and the fetch must have stripped it out > (or ignored it in some other fashion), because data was retrieved from a > repository that doesn't have a newline in its name. Most likely, the > newline should just be prohibited in the URL in the config file in the > first place, and we shouldn't be able to get to the point of writing a > FETCH_HEAD with that value. Let me attempt to summarize the situation. . FETCH_HEAD is a LF terminated sequence of records; each record describes the commit object name, if it is meant to be merged, what URL and what ref it came from; . "git pull" reads from FETCH_HEAD to decide which commit objects to pass to underlying "git merge", and "git fmt-merge-msg" reads from FETCH_HEAD to decide what message to produce; and . "git fetch" allows a URL with an LF in it and fetches from such a URL just fine, but "git pull" barfed because "git fmt-patch" noticed. Because we copy the LF in the URL straight to FETCH_HEAD, it breaks the file format. Alex's proposal is to quote it as "\\n" to avoid it (I suggested stripping it). Either change will _fix_ the situation in the sense that the file format will now be correct, "git pull" will extract the right set of commits (as it does not look at the URL at all when deciding which records are used for merge), and "git fmt-merge-msg" will generate correct set of branches and list of incoming commits without parse errors, even though the user won't be able to cut and paste the URL it records in its output either way (if stripped, it will be "information loss", but even if it were quoted, it won't be usable because nobody unquotes it). Given all of the above, I think: (1) Alex's patch is a good minimum fix to the issue [*1*]. If unfixed, people will be able to fetch from but won't be able to pull from a URL with a LF in it. (2) Even though it seems that we do _support_ fetching from such a URL, it is not a good thing. People may do all sorts of crazy things, and this may be one of these crazy things that we _could_ break the backward compatibility to avoid innocent mistakes, by forbidding LF in URLs. Nobody could have been using such a URL in real settings because of the issue Alex's patch addresses anyway. (3) But forbidding or warning will be done by new code. Is the cost to do such a check (implementation and maintenance of the new code) justifiable? Where do we check and when? So for now, I'd say I take Alex's patch as-is, and do nothing else. If somebody has a compelling example that this kind of mistake is common and is hard to figure out why, we can explicitly forbid certain byte values in the repository URL as a separate step. [Footnote] *1* The output is purely informational, and especially its URL field is not meant to be used as cut-and-paste source by general public anyway (e.g. you may pull using git over SSH but general public may not have an SSH access to the repository). I actually was tempted, when I did the initial version of fmt-merge-msg, to shorten an overlong URL with ellipses, even though I didn't. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] Quote LF in urls git fetch saves in FETCH_HEAD 2009-05-12 17:24 ` [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD Alex Riesen 2009-05-12 23:16 ` Junio C Hamano @ 2009-05-13 18:08 ` Alex Riesen 2009-05-13 20:53 ` [PATCH 2/2] Improve the naming of guessed target repository for git clone Alex Riesen 1 sibling, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-13 18:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Daniel Barkalow, Hugo Mildenberger The fmt-merge-msg does a strong syntax checking of its input and fails with if it is incorrect. The LF character is the only character important for fmt-merge-msg. As the url in FETCH_HEAD plays only informational role, a quoted representation of the url should be good and true enough. The url often comes from either user-editable config or command line, so it is reasonable to expect all kinds of characters in it, including the characters which the format of FETCH_HEAD considers special (line separator in this case). Noticed and reported by Hugo Mildenberger. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> --- Alex Riesen, Tue, May 12, 2009 19:24:52 +0200: > Alex Riesen, Tue, May 12, 2009 19:18:33 +0200: > > 2009/5/12 Hugo Mildenberger <Hugo.Mildenberger@namir.de>: > > > hm@localhost /var/tmp/bluetooth-testing.git $ git pull > > > fatal: Error in line 2: > > > > That's already fixed. Will send the fix in a minute. > > Here > That one has a bug which breaks t5515: printing an unterminated string ('note'). builtin-fetch.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/builtin-fetch.c b/builtin-fetch.c index 3c998ea..ec75df0 100644 --- a/builtin-fetch.c +++ b/builtin-fetch.c @@ -353,12 +353,18 @@ static int store_updated_refs(const char *url, const char *remote_name, kind); note_len += sprintf(note + note_len, "'%s' of ", what); } - note_len += sprintf(note + note_len, "%.*s", url_len, url); - fprintf(fp, "%s\t%s\t%s\n", + note[note_len] = '\0'; + fprintf(fp, "%s\t%s\t%s", sha1_to_hex(commit ? commit->object.sha1 : rm->old_sha1), rm->merge ? "" : "not-for-merge", note); + for (i = 0; i < url_len; ++i) + if ('\n' == url[i]) + fputs("\\n", fp); + else + fputc(url[i], fp); + fputc('\n', fp); if (ref) rc |= update_local_ref(ref, what, note); -- 1.6.3.28.ga852b ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-13 18:08 ` [PATCH 1/2] " Alex Riesen @ 2009-05-13 20:53 ` Alex Riesen 2009-05-14 0:41 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-13 20:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Daniel Barkalow, Hugo Mildenberger Strip trailing spaces off guessed target directory in builtin clone, and replace 'control' characters with an ASCII space. User still can have any name by specifying it explicitely after url. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> --- This should take care of accidental pastings inside shell quotes. At least for the local part of the operation. Now I'm looking at the code and think I should have stripped the heading whitespace as well. It is much less likely to happen, though. builtin-clone.c | 24 ++++++++++++++++++------ 1 files changed, 18 insertions(+), 6 deletions(-) diff --git a/builtin-clone.c b/builtin-clone.c index 880373f..bdbe931 100644 --- a/builtin-clone.c +++ b/builtin-clone.c @@ -104,11 +104,12 @@ static char *get_repo_path(const char *repo, int *is_bundle) static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) { const char *end = repo + strlen(repo), *start; + char *dir; /* - * Strip trailing slashes and /.git + * Strip trailing spaces, slashes and /.git */ - while (repo < end && is_dir_sep(end[-1])) + while (repo < end && (is_dir_sep(end[-1]) || isspace(end[-1]))) end--; if (end - repo > 5 && is_dir_sep(end[-5]) && !strncmp(end - 4, ".git", 4)) { @@ -140,10 +141,21 @@ static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) if (is_bare) { struct strbuf result = STRBUF_INIT; strbuf_addf(&result, "%.*s.git", (int)(end - start), start); - return strbuf_detach(&result, 0); - } - - return xstrndup(start, end - start); + dir = strbuf_detach(&result, 0); + } else + dir = xstrndup(start, end - start); + /* replace all 'control' characters with ascii space */ + for (start = dir; *start; ++start) + if (*(const unsigned char *)start < 32u) + dir[start - dir] = '\x20'; + /* remove trailing spaces */ + if (dir < start) + for (end = start; dir < --end; ) + if (!isspace(*end)) + break; + else + dir[end - dir] = '\0'; + return dir; } static void strip_trailing_slashes(char *dir) -- 1.6.3.28.ga852b ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-13 20:53 ` [PATCH 2/2] Improve the naming of guessed target repository for git clone Alex Riesen @ 2009-05-14 0:41 ` Junio C Hamano 2009-05-14 5:54 ` Alex Riesen 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2009-05-14 0:41 UTC (permalink / raw) To: Alex Riesen; +Cc: git, Daniel Barkalow, Hugo Mildenberger Alex Riesen <raa.lkml@gmail.com> writes: > Strip trailing spaces off guessed target directory in builtin clone, > and replace 'control' characters with an ASCII space. > > User still can have any name by specifying it explicitely after url. > > Signed-off-by: Alex Riesen <raa.lkml@gmail.com> > --- > > This should take care of accidental pastings inside shell quotes. > At least for the local part of the operation. > Now I'm looking at the code and think I should have stripped the > heading whitespace as well. It is much less likely to happen, though. > @@ -140,10 +141,21 @@ static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) > if (is_bare) { > struct strbuf result = STRBUF_INIT; > strbuf_addf(&result, "%.*s.git", (int)(end - start), start); > + dir = strbuf_detach(&result, 0); > + } else > + dir = xstrndup(start, end - start); > + /* replace all 'control' characters with ascii space */ > + for (start = dir; *start; ++start) > + if (*(const unsigned char *)start < 32u) > + dir[start - dir] = '\x20'; What's this strange mixture of 32u and '\x20'? > + /* remove trailing spaces */ > + if (dir < start) > + for (end = start; dir < --end; ) > + if (!isspace(*end)) > + break; > + else > + dir[end - dir] = '\0'; > + return dir; > } Honestly, I regret having asked if there was a 2/2 ;-) What's the point of this change, now that you have a fix in 1/2? Who are you helping with this patch? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-14 0:41 ` Junio C Hamano @ 2009-05-14 5:54 ` Alex Riesen 2009-05-14 6:35 ` Junio C Hamano 2009-05-14 8:33 ` Alex Riesen 0 siblings, 2 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-14 5:54 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Daniel Barkalow, Hugo Mildenberger 2009/5/14 Junio C Hamano <gitster@pobox.com>: > Alex Riesen <raa.lkml@gmail.com> writes: >> + /* replace all 'control' characters with ascii space */ >> + for (start = dir; *start; ++start) >> + if (*(const unsigned char *)start < 32u) >> + dir[start - dir] = '\x20'; > > What's this strange mixture of 32u and '\x20'? > Not sure myself. I probably wanted visibility, and somehow ended up using different presentations. >> + /* remove trailing spaces */ >> + if (dir < start) >> + for (end = start; dir < --end; ) >> + if (!isspace(*end)) >> + break; >> + else >> + dir[end - dir] = '\0'; >> + return dir; >> } > > Honestly, I regret having asked if there was a 2/2 ;-) > > What's the point of this change, now that you have a fix in 1/2? Who are > you helping with this patch? > Without this the _automatically_ generated names for cloned repositories have all the whitespace around them. As you cannot sanely depend on automatically generated names, I thought that making them simpler will make sense. But I should complete the patch: remove heading whitespace, and replace multiple spaces and control characters with one space. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-14 5:54 ` Alex Riesen @ 2009-05-14 6:35 ` Junio C Hamano 2009-05-14 8:45 ` Alex Riesen 2009-05-14 8:33 ` Alex Riesen 1 sibling, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2009-05-14 6:35 UTC (permalink / raw) To: Alex Riesen; +Cc: git, Daniel Barkalow, Hugo Mildenberger Alex Riesen <raa.lkml@gmail.com> writes: >> What's the point of this change, now that you have a fix in 1/2? Who are >> you helping with this patch? > > Without this the _automatically_ generated names for cloned repositories > have all the whitespace around them. Even if it has whitespace around its name, that's what you got from the upstream (a valid source of clone), and wasn't it you who said something about UNIX tradition of allowing LF and others in the filename? If clone reports "ok we created this new repository" so that the caller can capture it, then the whole process should be able to cope with automatically generated names with or without the patch, shouldn't it? Or are you trying to help a human user who gives a pathname ridden with excess whitespaces to "git clone", and that pathname _happens_ to work as a valid clone source, creating a new repository whose name is ridden with excess whitespaces the same way as the input pathname? After all, the user deliberately gave them to us, and the repository we cloned from had these excesses in its name (iow, without the excess whitespaces the clone itself wouldn't have worked). In such a case, is it really helping him to remove these whitespaces as excesses? I do not understand the point of this patch and that is why I asked who you are trying to help. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-14 6:35 ` Junio C Hamano @ 2009-05-14 8:45 ` Alex Riesen 2009-05-14 12:50 ` Hugo Mildenberger 0 siblings, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-14 8:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Daniel Barkalow, Hugo Mildenberger 2009/5/14 Junio C Hamano <gitster@pobox.com>: > Alex Riesen <raa.lkml@gmail.com> writes: > >>> What's the point of this change, now that you have a fix in 1/2? Who are >>> you helping with this patch? >> >> Without this the _automatically_ generated names for cloned repositories >> have all the whitespace around them. > > Even if it has whitespace around its name, that's what you got from the > upstream (a valid source of clone), and wasn't it you who said something > about UNIX tradition of allowing LF and others in the filename? Yes, when user explicitely asked a program about that. This here (clone with only URL as argument) is not the case, I think. > If clone reports "ok we created this new repository" so that the caller > can capture it, then the whole process should be able to cope with > automatically generated names with or without the patch, shouldn't it? No, don't think so. You're not always able to capture the output of git clone (Windows again), and BTW - init-db output is not designed to be captured unambiguously. > Or are you trying to help a human user who gives a pathname ridden with > excess whitespaces to "git clone", and that pathname _happens_ to work as > a valid clone source, creating a new repository whose name is ridden with > excess whitespaces the same way as the input pathname? Not really. I just try to make the _generated_ output, which the user cannot predict anyway (nor does the user care much about it) to be less problematic. Yes, I did say that LF-anything in UNIX filenames is a normal thing. That does not mean that such names are so very convenient to use. They do cause problems, even if just through scrambling terminal output. They are "inconvenient". If our users don't expect precise output anyway, we can be a little more adhering to usual practices in choosing names. > ... After all, the > user deliberately gave them to us, and the repository we cloned from had > these excesses in its name (iow, without the excess whitespaces the clone > itself wouldn't have worked). In such a case, is it really helping him to > remove these whitespaces as excesses? I think yes. Otherwise the strict form of git clone could have been used, which involves absolutely no guessing and mangling. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-14 8:45 ` Alex Riesen @ 2009-05-14 12:50 ` Hugo Mildenberger 0 siblings, 0 replies; 30+ messages in thread From: Hugo Mildenberger @ 2009-05-14 12:50 UTC (permalink / raw) To: Alex Riesen; +Cc: Junio C Hamano, git, Daniel Barkalow Am Donnerstag, 14. Mai 2009 schrieb Alex Riesen: > 2009/5/14 Junio C Hamano <gitster@pobox.com>: > > Alex Riesen <raa.lkml@gmail.com> writes: > > > >>> What's the point of this change, now that you have a fix in 1/2? Who are > >>> you helping with this patch? > >> > >> Without this the _automatically_ generated names for cloned repositories > >> have all the whitespace around them. > > > > Even if it has whitespace around its name, that's what you got from the > > upstream (a valid source of clone), and wasn't it you who said something > > about UNIX tradition of allowing LF and others in the filename? > > Yes, when user explicitely asked a program about that. This here > (clone with only URL as argument) is not the case, I think. > > > If clone reports "ok we created this new repository" so that the caller > > can capture it, then the whole process should be able to cope with > > automatically generated names with or without the patch, shouldn't it? > > No, don't think so. You're not always able to capture the output of git clone > (Windows again), and BTW - init-db output is not designed to be captured > unambiguously. > > > Or are you trying to help a human user who gives a pathname ridden with > > excess whitespaces to "git clone", and that pathname _happens_ to work as > > a valid clone source, creating a new repository whose name is ridden with > > excess whitespaces the same way as the input pathname? > > Not really. I just try to make the _generated_ output, which the user cannot > predict anyway (nor does the user care much about it) to be less > problematic. Yes, I did say that LF-anything in UNIX filenames is a normal > thing. That does not mean that such names are so very convenient to use. > They do cause problems, even if just through scrambling terminal output. > They are "inconvenient". If our users don't expect precise output anyway, > we can be a little more adhering to usual practices in choosing names. > > > ... After all, the > > user deliberately gave them to us, and the repository we cloned from had > > these excesses in its name (iow, without the excess whitespaces the clone > > itself wouldn't have worked). In such a case, is it really helping him to > > remove these whitespaces as excesses? > > I think yes. Otherwise the strict form of git clone could have been used, > which involves absolutely no guessing and mangling. > This discussion began with the problem I encountered during a git clone operation applied to an url which accidentally included a linefeed. It partly went through although the remote side did not include that character. After Alex Riesen fixed the git part by escaping linefeeds, the linefeed left over in the top level directory name still caused a linux kernel make to fail (which really is not problem of git or GNU make itself). Elsewhere in this thread, Daniel Barkalow proposed to apply the HTTP RFC to the whole url. If accepted, this would cope with the problem in a standard way. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-14 5:54 ` Alex Riesen 2009-05-14 6:35 ` Junio C Hamano @ 2009-05-14 8:33 ` Alex Riesen 2009-05-16 17:49 ` Junio C Hamano 1 sibling, 1 reply; 30+ messages in thread From: Alex Riesen @ 2009-05-14 8:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Daniel Barkalow, Hugo Mildenberger [-- Attachment #1: Type: text/plain, Size: 2218 bytes --] Strip leading and trailing spaces off guessed target directory, and replace sequences of whitespace and 'control' characters with one space character. User still can have any name by specifying it explicitely after url. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> --- 2009/5/14 Alex Riesen <raa.lkml@gmail.com>: > But I should complete the patch: remove heading whitespace, and replace > multiple spaces and control characters with one space. Done. builtin-clone.c | 34 +++++++++++++++++++++++++++++----- 1 files changed, 29 insertions(+), 5 deletions(-) diff --git a/builtin-clone.c b/builtin-clone.c index 880373f..d068b7e 100644 --- a/builtin-clone.c +++ b/builtin-clone.c @@ -104,11 +104,12 @@ static char *get_repo_path(const char *repo, int *is_bundle) static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) { const char *end = repo + strlen(repo), *start; + char *dir; /* - * Strip trailing slashes and /.git + * Strip trailing spaces, slashes and /.git */ - while (repo < end && is_dir_sep(end[-1])) + while (repo < end && (is_dir_sep(end[-1]) || isspace(end[-1]))) end--; if (end - repo > 5 && is_dir_sep(end[-5]) && !strncmp(end - 4, ".git", 4)) { @@ -140,10 +141,33 @@ static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) if (is_bare) { struct strbuf result = STRBUF_INIT; strbuf_addf(&result, "%.*s.git", (int)(end - start), start); - return strbuf_detach(&result, 0); + dir = strbuf_detach(&result, 0); + } else + dir = xstrndup(start, end - start); + /* + * Replace sequences of 'control' characters and whitespace + * with one ascii space, remove leading and trailing spaces. + */ + if (*dir) { + char *out = dir; + int prev_space = 1 /* strip leading whitespace */; + for (end = dir; *end; ++end) { + char ch = *end; + if ((unsigned char)ch < '\x20') + ch = '\x20'; + if (isspace(ch)) { + if (prev_space) + continue; + prev_space = 1; + } else + prev_space = 0; + *out++ = ch; + } + *out = '\0'; + if (out > dir && prev_space) + out[-1] = '\0'; } - - return xstrndup(start, end - start); + return dir; } static void strip_trailing_slashes(char *dir) -- 1.6.3.1.37.g5d96e [-- Attachment #2: 0002-Improve-the-naming-of-guessed-target-repository-for-.patch --] [-- Type: application/octet-stream, Size: 2269 bytes --] From 6d6075130afbdd35260e3e33c099f57d9faebe98 Mon Sep 17 00:00:00 2001 From: Alex Riesen <raa.lkml@gmail.com> Date: Wed, 13 May 2009 18:32:06 +0200 Subject: [PATCH 2/2] Improve the naming of guessed target repository for git clone Strip leading and trailing spaces off guessed target directory, and replace sequences of whitespace and 'control' characters with one space character. User still can have any name by specifying it explicitely after url. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> --- builtin-clone.c | 34 +++++++++++++++++++++++++++++----- 1 files changed, 29 insertions(+), 5 deletions(-) diff --git a/builtin-clone.c b/builtin-clone.c index 880373f..d068b7e 100644 --- a/builtin-clone.c +++ b/builtin-clone.c @@ -104,11 +104,12 @@ static char *get_repo_path(const char *repo, int *is_bundle) static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) { const char *end = repo + strlen(repo), *start; + char *dir; /* - * Strip trailing slashes and /.git + * Strip trailing spaces, slashes and /.git */ - while (repo < end && is_dir_sep(end[-1])) + while (repo < end && (is_dir_sep(end[-1]) || isspace(end[-1]))) end--; if (end - repo > 5 && is_dir_sep(end[-5]) && !strncmp(end - 4, ".git", 4)) { @@ -140,10 +141,33 @@ static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) if (is_bare) { struct strbuf result = STRBUF_INIT; strbuf_addf(&result, "%.*s.git", (int)(end - start), start); - return strbuf_detach(&result, 0); + dir = strbuf_detach(&result, 0); + } else + dir = xstrndup(start, end - start); + /* + * Replace sequences of 'control' characters and whitespace + * with one ascii space, remove leading and trailing spaces. + */ + if (*dir) { + char *out = dir; + int prev_space = 1 /* strip leading whitespace */; + for (end = dir; *end; ++end) { + char ch = *end; + if ((unsigned char)ch < '\x20') + ch = '\x20'; + if (isspace(ch)) { + if (prev_space) + continue; + prev_space = 1; + } else + prev_space = 0; + *out++ = ch; + } + *out = '\0'; + if (out > dir && prev_space) + out[-1] = '\0'; } - - return xstrndup(start, end - start); + return dir; } static void strip_trailing_slashes(char *dir) -- 1.6.3.1.37.g5d96e ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] Improve the naming of guessed target repository for git clone 2009-05-14 8:33 ` Alex Riesen @ 2009-05-16 17:49 ` Junio C Hamano 0 siblings, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2009-05-16 17:49 UTC (permalink / raw) To: Alex Riesen; +Cc: git, Daniel Barkalow, Hugo Mildenberger Alex Riesen <raa.lkml@gmail.com> writes: > Strip leading and trailing spaces off guessed target directory, and > replace sequences of whitespace and 'control' characters with one > space character. > > User still can have any name by specifying it explicitely after url. > > Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Thanks, will queue. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git fails with control characters in trunk directory name 2009-05-12 16:59 ` Hugo Mildenberger 2009-05-12 17:18 ` Alex Riesen @ 2009-05-12 17:41 ` Alex Riesen 1 sibling, 0 replies; 30+ messages in thread From: Alex Riesen @ 2009-05-12 17:41 UTC (permalink / raw) To: Hugo Mildenberger; +Cc: git Hugo Mildenberger, Tue, May 12, 2009 18:59:58 +0200: > > > I looked into guess_dir_name(). > > > > That's not the right place. The place I meant is right below the call > > to this function (you have to parse the names given in the command-line > > too). > > > > The automatically generated (that's the case with guess_dir_name) > > directory name certainly shouldn't contain any unexpected characters. > > > > > A regex call would be easy to fit, but > > > currently the git binary does not depend on libpcre. Is it generally > > > considered to be acceptable to add such a dependency? > > > > No. And pcre is not the only regex lib in the world. And we prefer > > shell patterns, if any at all. > > You really want to use shell patterns to match against a string from within a binary? Is that a problem? Especially if the matching expression is just something like a character class? > Although git already makes use of regexec from glibc or compat/regex > directory in numerous places? You said pcre, and I wasn't able to look at the source at the time I answered your mail to check if git relies on POSIX regexp. I did (and remembered git grep), so no need for pcre and regular exceptions are just a line away. P.S. Could you please quote more appropriately? Your discussion is a little hard to follow. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2009-05-16 17:50 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-05-11 20:08 git fails with control characters in trunk directory name Hugo Mildenberger 2009-05-12 6:51 ` Alex Riesen 2009-05-12 9:02 ` Hugo Mildenberger 2009-05-12 10:54 ` Alex Riesen 2009-05-12 13:57 ` Hugo Mildenberger 2009-05-12 14:59 ` Alex Riesen 2009-05-12 16:59 ` Hugo Mildenberger 2009-05-12 17:18 ` Alex Riesen 2009-05-12 17:24 ` [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD Alex Riesen 2009-05-12 23:16 ` Junio C Hamano 2009-05-13 6:06 ` Alex Riesen [not found] ` <200905131340.31509.Hugo.Mildenberger@namir.de> 2009-05-13 12:10 ` Alex Riesen 2009-05-13 14:49 ` Hugo Mildenberger 2009-05-13 12:39 ` Hugo Mildenberger 2009-05-13 15:18 ` Daniel Barkalow 2009-05-13 16:09 ` Alex Riesen 2009-05-13 17:07 ` Alex Riesen 2009-05-13 17:12 ` Daniel Barkalow 2009-05-13 18:11 ` Alex Riesen 2009-05-13 18:23 ` Junio C Hamano 2009-05-13 18:08 ` [PATCH 1/2] " Alex Riesen 2009-05-13 20:53 ` [PATCH 2/2] Improve the naming of guessed target repository for git clone Alex Riesen 2009-05-14 0:41 ` Junio C Hamano 2009-05-14 5:54 ` Alex Riesen 2009-05-14 6:35 ` Junio C Hamano 2009-05-14 8:45 ` Alex Riesen 2009-05-14 12:50 ` Hugo Mildenberger 2009-05-14 8:33 ` Alex Riesen 2009-05-16 17:49 ` Junio C Hamano 2009-05-12 17:41 ` git fails with control characters in trunk directory name Alex Riesen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.