All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fast-import docs: LT is valid in email, GT is not
@ 2010-04-24  0:45 Mark Lodato
  2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder
  2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder
  0 siblings, 2 replies; 13+ messages in thread
From: Mark Lodato @ 2010-04-24  0:45 UTC (permalink / raw)
  To: git; +Cc: Mark Lodato

In git-fast-import(1), fix a mistake that said LT and LF were the
invalid characters in email addresses.  This should have been GT and LF,
since the GT ends the email address and LF ends the command.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
---
 Documentation/git-fast-import.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 19082b0..6dcd583 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -394,7 +394,7 @@ Here `<name>` is the person's display name (for example
 and greater-than (\x3e) symbols.  These are required to delimit
 the email address from the other fields in the line.  Note that
 `<name>` is free-form and may contain any sequence of bytes, except
-`LT` and `LF`.  It is typically UTF-8 encoded.
+`GT` and `LF`.  It is typically UTF-8 encoded.
 
 The time of the change is specified by `<when>` using the date format
 that was selected by the \--date-format=<fmt> command line option.
-- 
1.7.0.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] fsck: check ident lines in commit objects
  2010-04-24  0:45 [PATCH] fast-import docs: LT is valid in email, GT is not Mark Lodato
@ 2010-04-24 16:06 ` Jonathan Nieder
  2010-04-24 16:59   ` Jonathan Nieder
  2010-04-24 19:04   ` Shawn O. Pearce
  2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder
  1 sibling, 2 replies; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-24 16:06 UTC (permalink / raw)
  To: Mark Lodato; +Cc: git, Shawn O. Pearce, Nicolas Pitre

Check that email addresses do not contain <, >, or newline so they can
be quickly scanned without trouble.  The copy() function in ident.c
already ensures that ordinary git commands will not write email
addresses without this property.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Thoughts?  Should some of these errors be warnings?

git fast-import is capable of producing commits with some of these
problems: for example, it is fine with

	committer C O Mitter <foo@b>ar.net> 005 -    +5

 fsck.c          |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh |   25 +++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fsck.c b/fsck.c
index 89278c1..ae9ae1a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -222,12 +222,47 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
+static int fsck_ident(char **ident, struct object *obj, fsck_error error_func)
+{
+	if (**ident == '<' || **ident == '\n')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before email");
+	*ident += strcspn(*ident, "<\n");
+	if ((*ident)[-1] != ' ')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before email");
+	if (**ident != '<')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing email");
+	(*ident)++;
+	*ident += strcspn(*ident, "<>\n");
+	if (**ident != '>')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad email");
+	(*ident)++;
+	if (**ident != ' ')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before date");
+	(*ident)++;
+	if (**ident == '0' && (*ident)[1] != ' ')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - zero-padded date");
+	*ident += strspn(*ident, "0123456789");
+	if (**ident != ' ')
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad date");
+	(*ident)++;
+	if ((**ident != '+' && **ident != '-') ||
+	    !isdigit((*ident)[1]) ||
+	    !isdigit((*ident)[2]) ||
+	    !isdigit((*ident)[3]) ||
+	    !isdigit((*ident)[4]) ||
+	    ((*ident)[5] != '\n'))
+		return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad time zone");
+	(*ident) += 6;
+	return 0;
+}
+
 static int fsck_commit(struct commit *commit, fsck_error error_func)
 {
 	char *buffer = commit->buffer;
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
 	int parents = 0;
+	int err;
 
 	if (commit->date == ULONG_MAX)
 		return error_func(&commit->object, FSCK_ERROR, "invalid author/committer line");
@@ -266,6 +301,18 @@ static int fsck_commit(struct commit *commit, fsck_error error_func)
 	}
 	if (memcmp(buffer, "author ", 7))
 		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'author' line");
+	buffer += 7;
+	err = fsck_ident(&buffer, &commit->object, error_func);
+	if (err)
+		return err;
+	if (memcmp(buffer, "committer ", strlen("committer ")))
+		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'committer' line");
+	buffer += strlen("committer ");
+	err = fsck_ident(&buffer, &commit->object, error_func);
+	if (err)
+		return err;
+	if (*buffer != '\n')
+		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected blank line");
 	if (!commit->tree)
 		return error_func(&commit->object, FSCK_ERROR, "could not load commit's tree %s", sha1_to_hex(tree_sha1));
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 49cae3e..d8eed9b 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -57,6 +57,31 @@ test_expect_success 'branch pointing to non-commit' '
 	git update-ref -d refs/heads/invalid
 '
 
+new=nothing
+test_expect_success 'email without @ is okay' '
+	git cat-file commit HEAD >basis &&
+	sed "s/@/AT/" basis >okay &&
+	new=$(git hash-object -t commit -w --stdin <okay) &&
+	echo "$new" &&
+	git update-ref refs/heads/bogus "$new" &&
+	git fsck
+'
+git update-ref -d refs/heads/bogus
+rm -f ".git/objects/$new"
+
+new=nothing
+test_expect_success 'email with embedded > is not okay' '
+	git cat-file commit HEAD >basis &&
+	sed "s/@[a-z]/&>/" basis >bad-email &&
+	new=$(git hash-object -t commit -w --stdin <bad-email) &&
+	echo "$new" &&
+	git update-ref refs/heads/bogus "$new" &&
+	git fsck 2>out &&
+	grep "error in commit $new" out
+'
+git update-ref -d refs/heads/bogus
+rm -f ".git/objects/$new"
+
 cat > invalid-tag <<EOF
 object ffffffffffffffffffffffffffffffffffffffff
 type commit
-- 
1.7.0.6.2.g02f3f0.dirty

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] fast-import docs: LT is valid in email, GT is not
  2010-04-24  0:45 [PATCH] fast-import docs: LT is valid in email, GT is not Mark Lodato
  2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder
@ 2010-04-24 16:12 ` Jonathan Nieder
  2010-04-24 16:59   ` Mark Lodato
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-24 16:12 UTC (permalink / raw)
  To: Mark Lodato; +Cc: git

Hi Mark,

Mark Lodato wrote:

> +++ b/Documentation/git-fast-import.txt
> @@ -394,7 +394,7 @@ Here `<name>` is the person's display name (for example
>  and greater-than (\x3e) symbols.  These are required to delimit
>  the email address from the other fields in the line.  Note that
>  `<name>` is free-form and may contain any sequence of bytes, except
> -`LT` and `LF`.  It is typically UTF-8 encoded.
> +`GT` and `LF`.  It is typically UTF-8 encoded.

	Here <name> is the person’s display name (for example
	“Com M Itter”)

So the original text is correct --- a <name> cannot contain LT because
a less-than sign marks the boundary between a name and email address.

Maybe you were wondering what characters are valid in an e-mail address?
The comments in fast-import.c and code in ident.c are consistent about
this: the forbidden characters are <, >, and LF, though no one seems to
check (see also my other reply).  A patch to explain this (including a
reference to git-commit-tree(1), I guess) might be useful.

git won’t understand an email with embedded > or LF.  I’m not sure a <
would cause problems, but I don’t mind that it is disallowed.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fsck: check ident lines in commit objects
  2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder
@ 2010-04-24 16:59   ` Jonathan Nieder
  2010-04-24 19:04   ` Shawn O. Pearce
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-24 16:59 UTC (permalink / raw)
  To: Mark Lodato; +Cc: git, Shawn O. Pearce, Nicolas Pitre

Jonathan Nieder wrote:

> Check that email addresses do not contain <, >, or newline so they can
> be quickly scanned without trouble.

Test was bogus: the object format tests in fsck do not affect its exit
code.  Here’s a fixup.

Sorry for the trouble,
Jonathan

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d8eed9b..22a80c8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -64,7 +64,9 @@ test_expect_success 'email without @ is okay' '
 	new=$(git hash-object -t commit -w --stdin <okay) &&
 	echo "$new" &&
 	git update-ref refs/heads/bogus "$new" &&
-	git fsck
+	git fsck 2>out &&
+	cat out &&
+	! grep "error in commit $new" out
 '
 git update-ref -d refs/heads/bogus
 rm -f ".git/objects/$new"
@@ -77,6 +79,7 @@ test_expect_success 'email with embedded > is not okay' '
 	echo "$new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	git fsck 2>out &&
+	cat out &&
 	grep "error in commit $new" out
 '
 git update-ref -d refs/heads/bogus

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] fast-import docs: LT is valid in email, GT is not
  2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder
@ 2010-04-24 16:59   ` Mark Lodato
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Lodato @ 2010-04-24 16:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Sat, Apr 24, 2010 at 12:12 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Mark Lodato wrote:
>> +++ b/Documentation/git-fast-import.txt
>> @@ -394,7 +394,7 @@ Here `<name>` is the person's display name (for example
>>  and greater-than (\x3e) symbols.  These are required to delimit
>>  the email address from the other fields in the line.  Note that
>>  `<name>` is free-form and may contain any sequence of bytes, except
>> -`LT` and `LF`.  It is typically UTF-8 encoded.
>> +`GT` and `LF`.  It is typically UTF-8 encoded.
>
>        Here <name> is the person’s display name (for example
>        “Com M Itter”)
>
> So the original text is correct --- a <name> cannot contain LT because
> a less-than sign marks the boundary between a name and email address.

Ah, you're right, sorry.  I thought it was <email>, not <name>.

> Maybe you were wondering what characters are valid in an e-mail address?
> The comments in fast-import.c and code in ident.c are consistent about
> this: the forbidden characters are <, >, and LF, though no one seems to
> check (see also my other reply).  A patch to explain this (including a
> reference to git-commit-tree(1), I guess) might be useful.
>
> git won’t understand an email with embedded > or LF.  I’m not sure a <
> would cause problems, but I don’t mind that it is disallowed.

It seems like it would be good to disallow <, >, and LF in both name
and email.  With your other patch, > is allowed in the name.

Thanks for the clarification,
Mark

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fsck: check ident lines in commit objects
  2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder
  2010-04-24 16:59   ` Jonathan Nieder
@ 2010-04-24 19:04   ` Shawn O. Pearce
  2010-04-24 20:38     ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder
  1 sibling, 1 reply; 13+ messages in thread
From: Shawn O. Pearce @ 2010-04-24 19:04 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Mark Lodato, git, Nicolas Pitre

Jonathan Nieder <jrnieder@gmail.com> wrote:
> Check that email addresses do not contain <, >, or newline so they can
> be quickly scanned without trouble.  The copy() function in ident.c
> already ensures that ordinary git commands will not write email
> addresses without this property.
> 
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
> Thoughts?  Should some of these errors be warnings?

These should be errors.  We should never see this sort of thing
occur in a live repository.
 
> git fast-import is capable of producing commits with some of these
> problems: for example, it is fine with
> 
> 	committer C O Mitter <foo@b>ar.net> 005 -    +5

Yuck.  We probably should tighten up the parser in fast-import a
bit more.  The above is pretty insane for it to produce into the
repository.  I can't even begin to count how many ways the above
line is just wrong...  :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/2] fast-import: tighten up parsing ident line
  2010-04-24 19:04   ` Shawn O. Pearce
@ 2010-04-24 20:38     ` Jonathan Nieder
  2010-04-24 20:50       ` [PATCH 1/2] fast-import: be strict about formatting of raw dates Jonathan Nieder
  2010-04-24 21:10       ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder
  0 siblings, 2 replies; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-24 20:38 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre

Shawn O. Pearce wrote:
> Jonathan Nieder <jrnieder@gmail.com> wrote:

>> git fast-import is capable of producing commits with some of these
>> problems: for example, it is fine with
>> 
>> 	committer C O Mitter <foo@b>ar.net> 005 -    +5
>
> Yuck.  We probably should tighten up the parser in fast-import a
> bit more.

How about this?

Jonathan Nieder (2):
  fast-import: be strict about formatting of dates
  fast-import: validate entire ident string

 Documentation/git-fast-import.txt |    9 ++--
 fast-import.c                     |   63 ++++++++++++++--------
 t/t9300-fast-import.sh            |  105 +++++++++++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+), 27 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] fast-import: be strict about formatting of raw dates
  2010-04-24 20:38     ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder
@ 2010-04-24 20:50       ` Jonathan Nieder
  2010-04-24 21:10       ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-24 20:50 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre

git proper never zero- or space-pads its dates and its time zones are
always 4 digits long.  Require fast-import front-ends to behave
likewise to avoid generating puzzling objects.

Since this makes the input format more strict, some front-ends may not
be happy.  But they were warned:

   This is the Git native format and is <time> SP <offutc>. It is also
   fast-import’s default format, if --date-format was not specified.

   Unlike the rfc2822 format, this format is very strict. Any variation
   in formatting will cause fast-import to reject the value.

Aside from ensuring the format is predictable so tools like git can
handle it, making the date format this strict ensures that there is
only one valid representation for a given date and time zone, which
would be useful for round-trip conversion of objects to and from other
formats (for storage by other version control systems, for example).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Is -0000 the same time zone as +0000?  I wasn’t sure so I erred on the
side of not worrying about it.

 fast-import.c          |    9 +++++++--
 t/t9300-fast-import.sh |   30 ++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 74f08bd..1701cf1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1906,6 +1906,8 @@ static int validate_raw_date(const char *src, char *result, int maxlen)
 
 	errno = 0;
 
+	if ((*src == '0' && isdigit(src[1])) || !isdigit(*src))
+		return -1;
 	num = strtoul(src, &endp, 10);
 	/* NEEDSWORK: perhaps check for reasonable values? */
 	if (errno || endp == src || *endp != ' ')
@@ -1915,8 +1917,11 @@ static int validate_raw_date(const char *src, char *result, int maxlen)
 	if (*src != '-' && *src != '+')
 		return -1;
 
-	num = strtoul(src + 1, &endp, 10);
-	if (errno || endp == src + 1 || *endp || (endp - orig_src) >= maxlen ||
+	src++;
+	if (*src != '0' && *src != '1')
+		return -1;
+	num = strtoul(src, &endp, 10);
+	if (errno || endp != src + 4 || *endp || (endp - orig_src) >= maxlen ||
 	    1400 < num)
 		return -1;
 
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 131f032..ed653a7 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -348,6 +348,36 @@ test_expect_success \
 
 cat >input <<INPUT_END
 commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 1170783301 -  0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 117078330 -0500
+data <<COMMIT
+Malformed time zone
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: blanks in raw time zone' '
+    test_must_fail git fast-import --date-format=raw <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 01170783301 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 117078330 -0500
+data <<COMMIT
+Malformed date
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: leading zero in raw date' '
+    test_must_fail git fast-import --date-format=raw <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Tue Feb 6 11:22:18 2007 -0500
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:02 2007 -0500
 data <<COMMIT
-- 
1.7.1.rc1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] fast-import: validate entire ident string
  2010-04-24 20:38     ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder
  2010-04-24 20:50       ` [PATCH 1/2] fast-import: be strict about formatting of raw dates Jonathan Nieder
@ 2010-04-24 21:10       ` Jonathan Nieder
  2010-04-26 16:02         ` Shawn O. Pearce
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-24 21:10 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre

The author, committer, and tagger name and email should not include
any embedded <, >, or newline characters.  The format of the
identification string is

  ('author'|'committer'|'tagger') sp name sp < email > sp date

If an object has no name attached, then git expects to find two spaces
in a row.

Helped-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
For malformed input, the parser in pretty.c and ‘git commit --amend’
tend to end up with different ideas of who the author is.  A lot of
the time, commit --amend gives up with "fatal: invalid commit".

 Documentation/git-fast-import.txt |    9 ++--
 fast-import.c                     |   54 ++++++++++++++++----------
 t/t9300-fast-import.sh            |   75 +++++++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+), 25 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 19082b0..ee725c6 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -337,8 +337,8 @@ change to the project.
 ....
 	'commit' SP <ref> LF
 	mark?
-	('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
-	'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	('author' SP <name>? SP LT <email> GT SP <when> LF)?
+	'committer' SP <name>? SP LT <email> GT SP <when> LF
 	data
 	('from' SP <committish> LF)?
 	('merge' SP <committish> LF)?
@@ -393,8 +393,9 @@ Here `<name>` is the person's display name (for example
 (``cm@example.com'').  `LT` and `GT` are the literal less-than (\x3c)
 and greater-than (\x3e) symbols.  These are required to delimit
 the email address from the other fields in the line.  Note that
-`<name>` is free-form and may contain any sequence of bytes, except
-`LT` and `LF`.  It is typically UTF-8 encoded.
+`<name>` and `<email>` are free-form and may contain any sequence
+of bytes that are not `LT`, `GT`, or `LF`.  Both are typically UTF-8
+encoded.
 
 The time of the change is specified by `<when>` using the date format
 that was selected by the \--date-format=<fmt> command line option.
diff --git a/fast-import.c b/fast-import.c
index 1701cf1..d919168 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -19,8 +19,8 @@ Format of STDIN stream:
 
   new_commit ::= 'commit' sp ref_str lf
     mark?
-    ('author' (sp name)? sp '<' email '>' sp when lf)?
-    'committer' (sp name)? sp '<' email '>' sp when lf
+    ('author' sp name? sp '<' email '>' sp when lf)?
+    'committer' sp name? sp '<' email '>' sp when lf
     commit_msg
     ('from' sp committish lf)?
     ('merge' sp committish lf)*
@@ -47,7 +47,7 @@ Format of STDIN stream:
 
   new_tag ::= 'tag' sp tag_str lf
     'from' sp committish lf
-    ('tagger' (sp name)? sp '<' email '>' sp when lf)?
+    ('tagger' sp name? sp '<' email '>' sp when lf)?
     tag_msg;
   tag_msg ::= data;
 
@@ -123,9 +123,8 @@ Format of STDIN stream:
   sha1exp ::= # Any valid GIT SHA1 expression;
   hexsha1 ::= # SHA1 in hexadecimal format;
 
-     # note: name and email are UTF8 strings, however name must not
-     # contain '<' or lf and email must not contain any of the
-     # following: '<', '>', lf.
+     # note: name and email are UTF8 strings, however name and email
+     # must not contain any of the following: '<', '>', lf.
      #
   name  ::= # valid GIT author/committer name;
   email ::= # valid GIT author/committer email;
@@ -1929,34 +1928,47 @@ static int validate_raw_date(const char *src, char *result, int maxlen)
 	return 0;
 }
 
-static char *parse_ident(const char *buf)
+static size_t parse_name_and_email(const char *src, char **result, size_t extra)
 {
-	const char *gt;
+	const char *lt, *gt;
 	size_t name_len;
-	char *ident;
 
-	gt = strrchr(buf, '>');
-	if (!gt)
-		die("Missing > in ident string: %s", buf);
+	lt = src + strcspn(src, "<>\n");
+	if (lt == src || lt[-1] != ' ' || *lt != '<')
+		die("Invalid name in ident string: %s", src);
+	gt = lt + 1 + strcspn(lt + 1, "<>\n");
+	if (*gt != '>')
+		die("Invalid email in ident string: %s", src);
 	gt++;
 	if (*gt != ' ')
-		die("Missing space after > in ident string: %s", buf);
+		die("Missing space after > in ident string: %s", src);
 	gt++;
-	name_len = gt - buf;
-	ident = xmalloc(name_len + 24);
-	strncpy(ident, buf, name_len);
+	name_len = gt - src;
+	*result = xmalloc(name_len + extra);
+	memcpy(*result, src, name_len);
+	return name_len;
+}
+
+static char *parse_ident(const char *buf)
+{
+	const char *date;
+	size_t name_len;
+	char *ident;
+
+	name_len = parse_name_and_email(buf, &ident, 24);
+	date = buf + name_len;
 
 	switch (whenspec) {
 	case WHENSPEC_RAW:
-		if (validate_raw_date(gt, ident + name_len, 24) < 0)
-			die("Invalid raw date \"%s\" in ident: %s", gt, buf);
+		if (validate_raw_date(date, ident + name_len, 24) < 0)
+			die("Invalid raw date \"%s\" in ident: %s", date, buf);
 		break;
 	case WHENSPEC_RFC2822:
-		if (parse_date(gt, ident + name_len, 24) < 0)
-			die("Invalid rfc2822 date \"%s\" in ident: %s", gt, buf);
+		if (parse_date(date, ident + name_len, 24) < 0)
+			die("Invalid rfc2822 date \"%s\" in ident: %s", date, buf);
 		break;
 	case WHENSPEC_NOW:
-		if (strcmp("now", gt))
+		if (strcmp("now", date))
 			die("Date in ident must be 'now': %s", buf);
 		datestamp(ident + name_len, 24);
 		break;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index ed653a7..a7e379f 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -348,6 +348,81 @@ test_expect_success \
 
 cat >input <<INPUT_END
 commit refs/heads/branch
+author <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 14:49:52 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:01 2007 -0500
+data <<COMMIT
+Nameless author, first attempt
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: require space after author name' '
+    test_must_fail git fast-import --date-format=rfc2822 <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author  <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 14:49:52 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:01 2007 -0500
+data <<COMMIT
+Nameless author
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: do not require author name, though' '
+    git fast-import --date-format=rfc2822 <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 14:49:52 -0500
+committer C O >Mitter <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:01 2007 -0500
+data <<COMMIT
+Odd committer
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: unparsable committer' '
+    test_must_fail git fast-import --date-format=rfc2822 <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 15:05:27 -0500
+committer $GIT_COMMITTER_NAME <aggh@<example.com> Tue Feb 6 12:35:01 2007 -0500
+data <<COMMIT
+Odd email
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: unparsable email' '
+    test_must_fail git fast-import --date-format=rfc2822 <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 15:05:27 -0500
+committer $GIT_COMMITTER_NAME <äggh!some!other!machine!example> Tue Feb 6 12:35:01 2007 -0500
+data <<COMMIT
+Bang path
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_success 'E: okay email' '
+    git fast-import --date-format=rfc2822 <input
+'
+
+cat >input <<INPUT_END
+commit refs/heads/branch
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 1170783301 -  0500
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 117078330 -0500
 data <<COMMIT
-- 
1.7.1.rc1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] fast-import: validate entire ident string
  2010-04-24 21:10       ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder
@ 2010-04-26 16:02         ` Shawn O. Pearce
  2010-04-26 16:24           ` Jonathan Nieder
  2010-05-04 17:11           ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Shawn O. Pearce @ 2010-04-26 16:02 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Mark Lodato, git, Nicolas Pitre

Jonathan Nieder <jrnieder@gmail.com> wrote:
> The author, committer, and tagger name and email should not include
> any embedded <, >, or newline characters.  The format of the
> identification string is
> 
>   ('author'|'committer'|'tagger') sp name sp < email > sp date
> 
> If an object has no name attached, then git expects to find two spaces
> in a row.

This is going to be a problem I think.  Some importers are probably
writing "committer <bob> ...." when pulling from systems that don't
have a concept of name vs. email (e.g. CVS or SVN).  I highly suspect
that requiring two spaces here will cause a lot of importers to fail.

If we really need to require two spaces, I think we need to honor
the documented input format but rewrite the line inside of the
import process to match the two space convention.
 
-- 
Shawn.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] fast-import: validate entire ident string
  2010-04-26 16:02         ` Shawn O. Pearce
@ 2010-04-26 16:24           ` Jonathan Nieder
  2010-04-26 16:30             ` Jonathan Nieder
  2010-05-04 17:11           ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-26 16:24 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre

Shawn O. Pearce wrote:

> Some importers are probably
> writing "committer <bob> ...." when pulling from systems that don't
> have a concept of name vs. email (e.g. CVS or SVN).  I highly suspect
> that requiring two spaces here will cause a lot of importers to fail.
> 
> If we really need to require two spaces,

It is not a huge deal, but ‘git commit --amend’ will die with "invalid
commit" if it does not find a “ <” sequence after the “author ”
string.  Maybe that should be changed.  Patch below.

> I think we need to honor
> the documented input format but rewrite the line inside of the
> import process to match the two space convention.

Yes, that’s doable.

Thanks for the feedback,
Jonathan

diff --git a/builtin/commit.c b/builtin/commit.c
index c5ab683..c56f2c9 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -449,19 +449,23 @@ static void determine_author_info(void)
 	date = getenv("GIT_AUTHOR_DATE");
 
 	if (use_message && !renew_authorship) {
-		const char *a, *lb, *rb, *eol;
+		const char *a, *n, *lb, *rb, *eol;
 
 		a = strstr(use_message_buffer, "\nauthor ");
 		if (!a)
 			die("invalid commit: %s", use_message);
 
-		lb = strstr(a + 8, " <");
-		rb = strstr(a + 8, "> ");
-		eol = strchr(a + 8, '\n');
+		n = a + strlen("\nauthor");
+		lb = strstr(n, " <");
+		rb = strstr(lb + 2, "> ");
+		eol = strchr(rb + 2, '\n');
 		if (!lb || !rb || !eol)
 			die("invalid commit: %s", use_message);
 
-		name = xstrndup(a + 8, lb - (a + 8));
+		if (lb == n)
+			name = xstrndup("", 0);
+		else
+			name = xstrndup(n + 1, lb - (n + 1));
 		email = xstrndup(lb + 2, rb - (lb + 2));
 		date = xstrndup(rb + 2, eol - (rb + 2));
 	}
@@ -470,7 +474,7 @@ static void determine_author_info(void)
 		const char *lb = strstr(force_author, " <");
 		const char *rb = strchr(force_author, '>');
 
-		if (!lb || !rb)
+		if (!lb || !rb || rb < lb)
 			die("malformed --author parameter");
 		name = xstrndup(force_author, lb - force_author);
 		email = xstrndup(lb + 2, rb - (lb + 2));
-- 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] fast-import: validate entire ident string
  2010-04-26 16:24           ` Jonathan Nieder
@ 2010-04-26 16:30             ` Jonathan Nieder
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2010-04-26 16:30 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre

Jonathan Nieder wrote:

> -		lb = strstr(a + 8, " <");
> -		rb = strstr(a + 8, "> ");
> -		eol = strchr(a + 8, '\n');
> +		n = a + strlen("\nauthor");
> +		lb = strstr(n, " <");
> +		rb = strstr(lb + 2, "> ");
> +		eol = strchr(rb + 2, '\n');
>  		if (!lb || !rb || !eol)
>  			die("invalid commit: %s", use_message);

Err, this will segv when it fails; better to use

	lb = a + strlen("\nauthor ");
	lb = strchrnul(lb, '<');
	rb = strchrnul(lb, '>');
	eol = strchrnul(rb, '\n');
	if (!*lb || !*rb || !*eol)
		die("invalid commit: %s", use_message);

This is even more permissive, but I think that’s okay.

Sorry for the noise.
Jonathan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] fast-import: validate entire ident string
  2010-04-26 16:02         ` Shawn O. Pearce
  2010-04-26 16:24           ` Jonathan Nieder
@ 2010-05-04 17:11           ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2010-05-04 17:11 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Jonathan Nieder, Mark Lodato, git, Nicolas Pitre

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Jonathan Nieder <jrnieder@gmail.com> wrote:
>> The author, committer, and tagger name and email should not include
>> any embedded <, >, or newline characters.  The format of the
>> identification string is
>> 
>>   ('author'|'committer'|'tagger') sp name sp < email > sp date
>> 
>> If an object has no name attached, then git expects to find two spaces
>> in a row.
>
> This is going to be a problem I think.  Some importers are probably
> writing "committer <bob> ...." when pulling from systems that don't
> have a concept of name vs. email (e.g. CVS or SVN).  I highly suspect
> that requiring two spaces here will cause a lot of importers to fail.
>
> If we really need to require two spaces, I think we need to honor
> the documented input format but rewrite the line inside of the
> import process to match the two space convention.

It probably should document [sp name] (or [name sp]) an "zero or one"
item, if we want to be lenient.

I also think it may not be such a bad idea to allow fast-import to add a
phoney "name" by taking everything in e-mail before the first '@', just
like how the git-cvsimport and git-svn does, when the input stream does
not have a "name".  I am not sure how usable "shortlog" output would be
otherwise, without any "name" field.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-05-04 17:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-24  0:45 [PATCH] fast-import docs: LT is valid in email, GT is not Mark Lodato
2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder
2010-04-24 16:59   ` Jonathan Nieder
2010-04-24 19:04   ` Shawn O. Pearce
2010-04-24 20:38     ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder
2010-04-24 20:50       ` [PATCH 1/2] fast-import: be strict about formatting of raw dates Jonathan Nieder
2010-04-24 21:10       ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder
2010-04-26 16:02         ` Shawn O. Pearce
2010-04-26 16:24           ` Jonathan Nieder
2010-04-26 16:30             ` Jonathan Nieder
2010-05-04 17:11           ` Junio C Hamano
2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder
2010-04-24 16:59   ` Mark Lodato

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.