All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Daniel Barkalow <barkalow@iabervon.org>
Cc: Alex Riesen <raa.lkml@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org,
	Hugo Mildenberger <Hugo.Mildenberger@namir.de>
Subject: Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD
Date: Wed, 13 May 2009 11:23:49 -0700	[thread overview]
Message-ID: <7veius6eve.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <alpine.LNX.2.00.0905131109240.2147@iabervon.org> (Daniel Barkalow's message of "Wed\, 13 May 2009 11\:18\:40 -0400 \(EDT\)")

Daniel Barkalow <barkalow@iabervon.org> writes:

> On Wed, 13 May 2009, Alex Riesen wrote:
>
>> 2009/5/13 Junio C Hamano <gitster@pobox.com>:
>> > Alex Riesen <raa.lkml@gmail.com> writes:
>> >
>> >> +             for (i = 0; i < url_len; ++i)
>> >> +                     if ('\n' == url[i])
>> >> +                             fputs("\\n", fp);
>> >> +                     else
>> >> +                             fputc(url[i], fp);
>> >> +             fputc('\n', fp);
>> >
>> > This ad-hoc quoting feels _very_ wrong.  Who is on the reading side and
>> > how does it unquote?
>> 
>> git fmt-merge-msg. It does not unquote. The url is purely informational here.
>> OTOH, the \n shouldn't be in url text at all, so treat it as slightly
>> less annoying
>> warning.
>> 
>> > If it is just informational use only, then it might make more sense to
>> > drop this ugly "quoted \n" silently.  I dunno.
>> 
>> That'd mean to loose the information completely. Which is just as bad
>> as putting the LF in the url in the first place.
>
> Looking back at the original message, it looks like the user included a 
> newline in an argument to clone, and the fetch must have stripped it out 
> (or ignored it in some other fashion), because data was retrieved from a 
> repository that doesn't have a newline in its name. Most likely, the 
> newline should just be prohibited in the URL in the config file in the 
> first place, and we shouldn't be able to get to the point of writing a 
> FETCH_HEAD with that value.

Let me attempt to summarize the situation.

 . FETCH_HEAD is a LF terminated sequence of records; each record
   describes the commit object name, if it is meant to be merged, what URL
   and what ref it came from;

 . "git pull" reads from FETCH_HEAD to decide which commit objects to pass
   to underlying "git merge", and "git fmt-merge-msg" reads from
   FETCH_HEAD to decide what message to produce; and

 . "git fetch" allows a URL with an LF in it and fetches from such a URL
   just fine, but "git pull" barfed because "git fmt-patch" noticed.

Because we copy the LF in the URL straight to FETCH_HEAD, it breaks the
file format.  Alex's proposal is to quote it as "\\n" to avoid it (I
suggested stripping it).  Either change will _fix_ the situation in the
sense that the file format will now be correct, "git pull" will extract
the right set of commits (as it does not look at the URL at all when
deciding which records are used for merge), and "git fmt-merge-msg" will
generate correct set of branches and list of incoming commits without
parse errors, even though the user won't be able to cut and paste the URL
it records in its output either way (if stripped, it will be "information
loss", but even if it were quoted, it won't be usable because nobody
unquotes it).

Given all of the above, I think:

 (1) Alex's patch is a good minimum fix to the issue [*1*].  If unfixed,
     people will be able to fetch from but won't be able to pull from a
     URL with a LF in it.

 (2) Even though it seems that we do _support_ fetching from such a URL,
     it is not a good thing.  People may do all sorts of crazy things, and
     this may be one of these crazy things that we _could_ break the
     backward compatibility to avoid innocent mistakes, by forbidding LF
     in URLs.  Nobody could have been using such a URL in real settings
     because of the issue Alex's patch addresses anyway.

 (3) But forbidding or warning will be done by new code.  Is the cost to
     do such a check (implementation and maintenance of the new code)
     justifiable?  Where do we check and when?

So for now, I'd say I take Alex's patch as-is, and do nothing else.

If somebody has a compelling example that this kind of mistake is common
and is hard to figure out why, we can explicitly forbid certain byte
values in the repository URL as a separate step.


[Footnote]

*1* The output is purely informational, and especially its URL field is
not meant to be used as cut-and-paste source by general public anyway
(e.g. you may pull using git over SSH but general public may not have an
SSH access to the repository).  I actually was tempted, when I did the
initial version of fmt-merge-msg, to shorten an overlong URL with
ellipses, even though I didn't.

  parent reply	other threads:[~2009-05-13 18:25 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-11 20:08 git fails with control characters in trunk directory name Hugo Mildenberger
2009-05-12  6:51 ` Alex Riesen
2009-05-12  9:02   ` Hugo Mildenberger
2009-05-12 10:54     ` Alex Riesen
2009-05-12 13:57       ` Hugo Mildenberger
2009-05-12 14:59         ` Alex Riesen
2009-05-12 16:59           ` Hugo Mildenberger
2009-05-12 17:18             ` Alex Riesen
2009-05-12 17:24               ` [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD Alex Riesen
2009-05-12 23:16                 ` Junio C Hamano
2009-05-13  6:06                   ` Alex Riesen
     [not found]                     ` <200905131340.31509.Hugo.Mildenberger@namir.de>
2009-05-13 12:10                       ` Alex Riesen
2009-05-13 14:49                         ` Hugo Mildenberger
2009-05-13 12:39                     ` Hugo Mildenberger
2009-05-13 15:18                     ` Daniel Barkalow
2009-05-13 16:09                       ` Alex Riesen
2009-05-13 17:07                         ` Alex Riesen
2009-05-13 17:12                         ` Daniel Barkalow
2009-05-13 18:11                           ` Alex Riesen
2009-05-13 18:23                       ` Junio C Hamano [this message]
2009-05-13 18:08                 ` [PATCH 1/2] " Alex Riesen
2009-05-13 20:53                   ` [PATCH 2/2] Improve the naming of guessed target repository for git clone Alex Riesen
2009-05-14  0:41                     ` Junio C Hamano
2009-05-14  5:54                       ` Alex Riesen
2009-05-14  6:35                         ` Junio C Hamano
2009-05-14  8:45                           ` Alex Riesen
2009-05-14 12:50                             ` Hugo Mildenberger
2009-05-14  8:33                         ` Alex Riesen
2009-05-16 17:49                           ` Junio C Hamano
2009-05-12 17:41             ` git fails with control characters in trunk directory name Alex Riesen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7veius6eve.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=Hugo.Mildenberger@namir.de \
    --cc=barkalow@iabervon.org \
    --cc=git@vger.kernel.org \
    --cc=raa.lkml@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.