All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fast-import: catch garbage after marks in from/merge
@ 2012-04-01 22:54 Pete Wyckoff
  2012-04-01 23:12 ` Jonathan Nieder
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-01 22:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Dmitry Ivankov

A forgotten LF can lead to a confusing bug.  The last
line in this commit command is wrong:

    commit refs/heads/S2
    committer Name <name@example.com> 1112912893 -0400
    data <<COMMIT
    commit message
    COMMIT
    from :1M 100644 :103 hello.c

It is missing a newline and should be:

    from :1
    M 100644 :103 hello.c

Make fast-import complain about the buggy input, for both
from and merge lines that use marks.

Signed-off-by: Pete Wyckoff <pw@padd.com>
---
I spent too long tracking down the bug described in the
commit message.  It might help future users if fast-import
were to complain in this case.

 fast-import.c          |   24 +++++++++--
 t/t9300-fast-import.sh |  104 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index a85275d..13001bb 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2537,8 +2537,16 @@ static int parse_from(struct branch *b)
 		hashcpy(b->branch_tree.versions[0].sha1, t);
 		hashcpy(b->branch_tree.versions[1].sha1, t);
 	} else if (*from == ':') {
-		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
-		struct object_entry *oe = find_mark(idnum);
+		char *eptr;
+		uintmax_t idnum = strtoumax(from + 1, &eptr, 10);
+		struct object_entry *oe;
+		if (eptr) {
+			for (; *eptr && isspace(*eptr); eptr++) ;
+			if (*eptr)
+				die("Garbage after mark: %s",
+				    command_buf.buf);
+		}
+		oe = find_mark(idnum);
 		if (oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", idnum);
 		hashcpy(b->sha1, oe->idx.sha1);
@@ -2572,8 +2580,16 @@ static struct hash_list *parse_merge(unsigned int *count)
 		if (s)
 			hashcpy(n->sha1, s->sha1);
 		else if (*from == ':') {
-			uintmax_t idnum = strtoumax(from + 1, NULL, 10);
-			struct object_entry *oe = find_mark(idnum);
+			char *eptr;
+			uintmax_t idnum = strtoumax(from + 1, &eptr, 10);
+			struct object_entry *oe;
+			if (eptr) {
+				for (; *eptr && isspace(*eptr); eptr++) ;
+				if (*eptr)
+					die("Garbage after mark: %s",
+					    command_buf.buf);
+			}
+			oe = find_mark(idnum);
 			if (oe->type != OBJ_COMMIT)
 				die("Mark :%" PRIuMAX " not a commit", idnum);
 			hashcpy(n->sha1, oe->idx.sha1);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..46346cd 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2635,4 +2635,108 @@ test_expect_success \
 	'n=$(grep $a verify | wc -l) &&
 	 test 1 = $n'
 
+###
+### series S
+###
+#
+# Set up 1, 2, 3 of this:
+#
+# 1--2--4
+#  \   /
+#   -3-
+#
+# Then typo the merge to create #4, make sure it complains.
+#
+test_tick
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :1
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+blob 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+blob 2
+COMMIT
+from :1
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: add commits' '
+	git fast-import --export-marks=marks <input
+'
+
+cat >input <<INPUT_END
+blob
+mark :103
+data <<BLOB
+blob 3
+BLOB
+
+commit refs/heads/S2
+mark :3
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+blob 3
+COMMIT
+from :1M 100644 :103 hello.c
+INPUT_END
+
+test_expect_success 'S: from mark line overrun' '
+	test_must_fail git fast-import --import-marks=marks <input
+'
+
+cat >input <<INPUT_END
+blob
+mark :103
+data <<BLOB
+blob 3
+BLOB
+
+commit refs/heads/S2
+mark :3
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+blob 3
+COMMIT
+from :1
+M 100644 :103 hello.c
+INPUT_END
+
+test_expect_success 'S: add from commit' '
+	git fast-import --import-marks=marks --export-marks=marks <input
+'
+
+cat >input <<INPUT_END
+blob
+mark :104
+data <<BLOB
+blob 4
+BLOB
+
+commit refs/heads/S
+mark :4
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+blob 4
+COMMIT
+from :2
+merge :3M 100644 :4 hello.c
+INPUT_END
+
+test_expect_success 'S: merge mark line overrun' '
+	test_must_fail git fast-import --import-marks=marks <input
+'
+
 test_done
-- 
1.7.10.rc2.55.gb775a

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] fast-import: catch garbage after marks in from/merge
  2012-04-01 22:54 [PATCH] fast-import: catch garbage after marks in from/merge Pete Wyckoff
@ 2012-04-01 23:12 ` Jonathan Nieder
  2012-04-02  0:13   ` Pete Wyckoff
  2012-04-02 16:15 ` Junio C Hamano
  2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
  2 siblings, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-01 23:12 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier

Hi Pete,

Pete Wyckoff wrote:

>     from :1M 100644 :103 hello.c
>
> It is missing a newline and should be:
>
>     from :1
>     M 100644 :103 hello.c

Good idea; thanks.

I agree that this at least deserves a warning and probably should
error out.

[...]
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -2537,8 +2537,16 @@ static int parse_from(struct branch *b)
>  		hashcpy(b->branch_tree.versions[0].sha1, t);
>  		hashcpy(b->branch_tree.versions[1].sha1, t);
>  	} else if (*from == ':') {
> -		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
> -		struct object_entry *oe = find_mark(idnum);
> +		char *eptr;
> +		uintmax_t idnum = strtoumax(from + 1, &eptr, 10);
> +		struct object_entry *oe;
> +		if (eptr) {
> +			for (; *eptr && isspace(*eptr); eptr++) ;
> +			if (*eptr)
> +				die("Garbage after mark: %s",

The implementation seems more complicated than it needs to be.  Why
allow whitespace after the mark number?

Curious,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fast-import: catch garbage after marks in from/merge
  2012-04-01 23:12 ` Jonathan Nieder
@ 2012-04-02  0:13   ` Pete Wyckoff
  2012-04-02  6:56     ` Dmitry Ivankov
  2012-04-02 15:43     ` Jonathan Nieder
  0 siblings, 2 replies; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-02  0:13 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier

jrnieder@gmail.com wrote on Sun, 01 Apr 2012 18:12 -0500:
> Hi Pete,
> 
> Pete Wyckoff wrote:
> 
> >     from :1M 100644 :103 hello.c
> >
> > It is missing a newline and should be:
> >
> >     from :1
> >     M 100644 :103 hello.c
> 
> Good idea; thanks.
> 
> I agree that this at least deserves a warning and probably should
> error out.
> 
> [...]
> > --- a/fast-import.c
> > +++ b/fast-import.c
> > @@ -2537,8 +2537,16 @@ static int parse_from(struct branch *b)
> >  		hashcpy(b->branch_tree.versions[0].sha1, t);
> >  		hashcpy(b->branch_tree.versions[1].sha1, t);
> >  	} else if (*from == ':') {
> > -		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
> > -		struct object_entry *oe = find_mark(idnum);
> > +		char *eptr;
> > +		uintmax_t idnum = strtoumax(from + 1, &eptr, 10);
> > +		struct object_entry *oe;
> > +		if (eptr) {
> > +			for (; *eptr && isspace(*eptr); eptr++) ;
> > +			if (*eptr)
> > +				die("Garbage after mark: %s",
> 
> The implementation seems more complicated than it needs to be.  Why
> allow whitespace after the mark number?

Fear of breaking existing fast-import users that might happen
to have stray whitespace, or \r\n terminators.

Other similar fast-import are less forgiving, such as
parse_cat_blob.  Maybe we should generalize and enforce its
approach to parsing marks.

		-- Pete

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fast-import: catch garbage after marks in from/merge
  2012-04-02  0:13   ` Pete Wyckoff
@ 2012-04-02  6:56     ` Dmitry Ivankov
  2012-04-02 16:16       ` Junio C Hamano
  2012-04-02 15:43     ` Jonathan Nieder
  1 sibling, 1 reply; 22+ messages in thread
From: Dmitry Ivankov @ 2012-04-02  6:56 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Jonathan Nieder, git, David Barr, Sverre Rabbelier

On Mon, Apr 2, 2012 at 6:13 AM, Pete Wyckoff <pw@padd.com> wrote:
> jrnieder@gmail.com wrote on Sun, 01 Apr 2012 18:12 -0500:
>> Hi Pete,
>>
>> Pete Wyckoff wrote:
>>
>> >     from :1M 100644 :103 hello.c
>> >
>> > It is missing a newline and should be:
>> >
>> >     from :1
>> >     M 100644 :103 hello.c
>>
>> Good idea; thanks.
>>
>> I agree that this at least deserves a warning and probably should
>> error out.
>>
>> [...]
>> > --- a/fast-import.c
>> > +++ b/fast-import.c
>> > @@ -2537,8 +2537,16 @@ static int parse_from(struct branch *b)
>> >             hashcpy(b->branch_tree.versions[0].sha1, t);
>> >             hashcpy(b->branch_tree.versions[1].sha1, t);
>> >     } else if (*from == ':') {
>> > -           uintmax_t idnum = strtoumax(from + 1, NULL, 10);
>> > -           struct object_entry *oe = find_mark(idnum);
>> > +           char *eptr;
>> > +           uintmax_t idnum = strtoumax(from + 1, &eptr, 10);
>> > +           struct object_entry *oe;
>> > +           if (eptr) {
>> > +                   for (; *eptr && isspace(*eptr); eptr++) ;
>> > +                   if (*eptr)
>> > +                           die("Garbage after mark: %s",
>>
>> The implementation seems more complicated than it needs to be.  Why
>> allow whitespace after the mark number?
>
> Fear of breaking existing fast-import users that might happen
> to have stray whitespace, or \r\n terminators.
>
> Other similar fast-import are less forgiving, such as
> parse_cat_blob.  Maybe we should generalize and enforce its
> approach to parsing marks.

Docs say that "fast-import is very strict about its input", so
probably it is ok to both deny trailing spaces and fix all other
strtoumax()-es.

>
>                -- Pete

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fast-import: catch garbage after marks in from/merge
  2012-04-02  0:13   ` Pete Wyckoff
  2012-04-02  6:56     ` Dmitry Ivankov
@ 2012-04-02 15:43     ` Jonathan Nieder
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-02 15:43 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier

Pete Wyckoff wrote:
> jrnieder@gmail.com wrote on Sun, 01 Apr 2012 18:12 -0500:

>> Why allow whitespace after the mark number?
>
> Fear of breaking existing fast-import users that might happen
> to have stray whitespace, or \r\n terminators.

Oh.  I think worrying about that sort of thing is a very good thing.
In particular, the possibility of \r\n terminators from an importer
that writes its output in text mode on Windows is somewhat scary.

Luckily such a problem would be caught as soon as the importer uses
a filemodify (M) command:

 - if it quotes the path:

	fatal: Garbage after path in: M ...

 - if it doesn't quote the path, the resulting import would use
   filenames with trailing CRs.

So we narrowly escape that problem.

Other types of stray whitespace at eol seem less likely but it's
probably worth letting a patch like this incubate in the "next" branch
for a little while to see if anyone complains.

The reason to care instead of just being permissive is that git
fast-import is not the only fast-import backend.  The tighter we can
make the parsing without breaking existing frontends, the better the
chance that a frontend that was only tested against git fast-import
will be usable against all other backends, too.

In other words, there are two requirements in tension here
(compatibility with sloppy frontends, compatibility with fussy
backends).  Some day fast-import might want to learn a --permissive
option to reconcile them.  In this particular case being strict
unconditionally seems safe enough.

My only other hint is that as Dmitry mentioned it would be nice to
nice to use the same logic for other places where fast-import parses a
number at the end of a line, so people don't keep on making the same
print vs println mistake and sending new patches to detect it in each
place one at a time.

Sane?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fast-import: catch garbage after marks in from/merge
  2012-04-01 22:54 [PATCH] fast-import: catch garbage after marks in from/merge Pete Wyckoff
  2012-04-01 23:12 ` Jonathan Nieder
@ 2012-04-02 16:15 ` Junio C Hamano
  2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
  2 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2012-04-02 16:15 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: git, Jonathan Nieder, Dmitry Ivankov

Pete Wyckoff <pw@padd.com> writes:

> A forgotten LF can lead to a confusing bug.  The last
> line in this commit command is wrong:
> ...
> It is missing a newline and should be:
>
>     from :1
>     M 100644 :103 hello.c

During my first reading of this, this introductory part made me think "oh,
so this is a patch to fix somebody who produces a wrong data that is fed
to fast-import".  But it does not seem to be the case.

Please rephrase the first sentence.  Is it a confusing _bug_, or the
program produces a garbage output when fed a garbage output?

> Make fast-import complain about the buggy input, for both
> from and merge lines that use marks.

Perhaps these two lines, negated to state the current behaviour e.g. "git
fast-import does not complain when a mark that is used in 'from' or
'merge' command to name a commit is not followed by the mandatory LF." can
be used to replace the first sentence, followed by "It does X" to describe
the user-observable breakage for a bonus point.

> Signed-off-by: Pete Wyckoff <pw@padd.com>
> ---
> I spent too long tracking down the bug described in the
> commit message.  It might help future users if fast-import
> were to complain in this case.

It would help future users if the commit log actually described the
symptom caused by the bug; otherwise, future users would not notice when
hitting the same issue.

> diff --git a/fast-import.c b/fast-import.c
> index a85275d..13001bb 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -2537,8 +2537,16 @@ static int parse_from(struct branch *b)
>  		hashcpy(b->branch_tree.versions[0].sha1, t);
>  		hashcpy(b->branch_tree.versions[1].sha1, t);
>  	} else if (*from == ':') {
> -		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
> -		struct object_entry *oe = find_mark(idnum);
> +		char *eptr;
> +		uintmax_t idnum = strtoumax(from + 1, &eptr, 10);
> +		struct object_entry *oe;
> +		if (eptr) {
> +			for (; *eptr && isspace(*eptr); eptr++) ;

Put the empty body on a separate line, i.e.

			for ( ; isspace(*eptr); eptr++)
				; /* nothing */

> +			if (*eptr)
> +				die("Garbage after mark: %s",
> +				    command_buf.buf);

Good.

> +		}
> +		oe = find_mark(idnum);
>  		if (oe->type != OBJ_COMMIT)
>  			die("Mark :%" PRIuMAX " not a commit", idnum);

Would it help future callers if you made this small part that parses a
mark into a separate small helper function that returns an oe and
increment the pointer so that the caller can peek at the terminating
character to enforce the syntax?  E.g.

	} else if (*from == ':') {
		char *cp = from + 1;
		struct object_entry *oe = parse_mark(&cp);
		if (*cp)
			die("Garbage after mark: %s", command_buf.buf);
                if (!oe || oe->type != OBJ_COMMIT)
                	die("No such commit: %s", command_buf.buf);
	}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] fast-import: catch garbage after marks in from/merge
  2012-04-02  6:56     ` Dmitry Ivankov
@ 2012-04-02 16:16       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2012-04-02 16:16 UTC (permalink / raw)
  To: Dmitry Ivankov
  Cc: Pete Wyckoff, Jonathan Nieder, git, David Barr, Sverre Rabbelier

Dmitry Ivankov <divanorama@gmail.com> writes:

>> Other similar fast-import are less forgiving, such as
>> parse_cat_blob. Maybe we should generalize and enforce its
>> approach to parsing marks.
>
> Docs say that "fast-import is very strict about its input", so
> probably it is ok to both deny trailing spaces and fix all other
> strtoumax()-es.

Concurred.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCHv2 0/2] fast-import: tighten parsing of mark references
  2012-04-01 22:54 [PATCH] fast-import: catch garbage after marks in from/merge Pete Wyckoff
  2012-04-01 23:12 ` Jonathan Nieder
  2012-04-02 16:15 ` Junio C Hamano
@ 2012-04-03  1:51 ` Pete Wyckoff
  2012-04-03  1:51   ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
                     ` (3 more replies)
  2 siblings, 4 replies; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-03  1:51 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano

Thanks Dmitry, Jonathan and Junio for the comments.  I'm happy to
have fast-import be strict about its format, and have added code
and tests that demand exactly one space, or an end-of-line, as
necessary.  I also made sure the other error messages involved
with parsing datarefs are correct.

Jonathan, good observation on CRLF users.  If we did want to
cater to them, doing it centrally in read_next_command() would be
the way to go.  But why bother.

Regarding fixing up all end-of-line number parsing, I think the
only other one is dates.  Both "raw" and "now" check for garbage
at end-of-line, but "rfc2822" uses a generic function that
accepts junk.  I'm not motivated to add a lot of code to fix that
corner case.

Junio, I made the commit message more clear.  For the idea of
combining find_mark + parse_mark, that isn't general enough for
all users.  This is the construct used in many places:

    oe = find_mark(parse_mark_ref_space(p, &x));

Also, I did the unit tests first, to make sure things were broken
as I expected.  You can squash it all together if you prefer.

		-- Pete

Pete Wyckoff (2):
  fast-import: test behavior of garbage after mark references
  fast-import: tighten parsing of mark references

 fast-import.c          |   97 ++++++++++++++----
 t/t9300-fast-import.sh |  267 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 342 insertions(+), 22 deletions(-)

-- 
1.7.10.rc2.2.g38670

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
  2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
@ 2012-04-03  1:51   ` Pete Wyckoff
  2012-04-03 14:00     ` Jonathan Nieder
  2012-04-03  1:51   ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-03  1:51 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano

Add 15 tests to see what happens when extra characters
appear after a mark reference, in all places that take marks.
Ten of these fail.

Signed-off-by: Pete Wyckoff <pw@padd.com>
---
 t/t9300-fast-import.sh |  267 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)

diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..621f02a 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2635,4 +2635,271 @@ test_expect_success \
 	'n=$(grep $a verify | wc -l) &&
 	 test 1 = $n'
 
+###
+### series S
+###
+#
+# Set up is roughly this.  Commits marked 1,2,3,4.  Blobs
+# marked 100 + commit.  Notes 200 +.  Make sure missing spaces
+# and EOLs after mark references cause errors.
+#
+# 1--2--4
+#  \   /
+#   -3-
+#
+test_tick
+
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :1
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 2
+COMMIT
+from :1
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+
+blob
+mark :103
+data <<BLOB
+blob 3
+BLOB
+
+blob
+mark :202
+data <<BLOB
+note 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: add commits 1 and 2, and blob 103' '
+	git fast-import --export-marks=marks <input
+'
+
+#
+# filemodify, three datarefs
+#
+test_expect_failure 'S: filemodify markref no space' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 :103x hello.c
+	EOF
+	cat err &&
+	grep -q "Missing space after mark" err
+'
+
+test_expect_failure 'S: filemodify inline no space' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 inlineX hello.c
+	data <<BLOB
+	inline
+	BLOB
+	EOF
+	cat err &&
+	grep -q "Missing space after .inline." err
+'
+
+test_expect_success 'S: filemodify sha1 no space' '
+	sha1=$(grep -w :103 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 ${sha1}x hello.c
+	EOF
+	cat err &&
+	grep -q "Missing space after SHA1" err
+'
+
+#
+# notemodify, three ways to say dataref
+#
+test_expect_failure 'S: notemodify dataref markref no space' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref markref
+	COMMIT
+	N :103x :2
+	EOF
+	cat err &&
+	grep -q "Missing space after mark" err
+'
+
+test_expect_failure 'S: notemodify dataref inline no space' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref inline
+	COMMIT
+	N inlineX :2
+	data <<BLOB
+	note blob
+	BLOB
+	EOF
+	cat err &&
+	grep -q "Missing space after .inline." err
+'
+
+test_expect_success 'S: notemodify dataref sha1 no space' '
+	sha1=$(grep -w :2 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref sha1
+	COMMIT
+	N ${sha1}x :2
+	EOF
+	cat err &&
+	grep -q "Missing space after SHA1" err
+'
+
+#
+# notemodify, mark in committish
+#
+test_expect_failure 'S: notemodify committish markref junk at eol' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/Snotes
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note committish
+	COMMIT
+	N :202 :2x
+	EOF
+	cat err &&
+	grep -q "Garbage after mark" err
+'
+
+#
+# from
+#
+test_expect_failure 'S: from markref junk at eol' '
+	# no &&
+	git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
+	commit refs/heads/S2
+	mark :3
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :1x
+	M 100644 :103 hello.c
+	EOF
+
+	ret=$? &&
+	echo returned $ret &&
+	test $ret -ne 0 && # failed, but it created the commit
+
+	# go create the commit, need it for merge test
+	git fast-import --import-marks=marks --export-marks=marks <<-EOF &&
+	commit refs/heads/S2
+	mark :3
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :1
+	M 100644 :103 hello.c
+	EOF
+
+	# now evaluate the error
+	cat err &&
+	grep -q "Garbage after mark" err
+'
+
+
+#
+# merge
+#
+test_expect_failure 'S: merge markref junk at eol' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	mark :4
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :2
+	merge :3x
+	M 100644 :103 hello.c
+	EOF
+	cat err &&
+	grep -q "Garbage after mark" err
+'
+
+#
+# tag, from markref
+#
+test_expect_failure 'S: tag markref junk at eol' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	tag refs/tags/Stag
+	from :2x
+	tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<TAG
+	tag S
+	TAG
+	EOF
+	cat err &&
+	grep -q "Garbage after mark" err
+'
+
+#
+# cat-blob markref
+#
+test_expect_success 'S: cat-blob markref junk at eol' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	cat-blob :2x
+	EOF
+	cat err &&
+	grep -q "Garbage after mark" err
+'
+
+#
+# ls markref
+#
+test_expect_failure 'S: ls markref space' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	ls :2x hello.c
+	EOF
+	cat err &&
+	grep -q "Missing space after mark" err
+'
+
+test_expect_failure 'S: ls sha1 space' '
+	sha1=$(grep -w :2 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	ls ${sha1}x hello.c
+	EOF
+	cat err &&
+	grep -q "Missing space after SHA1" err
+'
+
 test_done
-- 
1.7.10.rc2.2.g38670

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCHv2 2/2] fast-import: tighten parsing of mark references
  2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
  2012-04-03  1:51   ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
@ 2012-04-03  1:51   ` Pete Wyckoff
  2012-04-03 14:20     ` Jonathan Nieder
  2012-04-03  2:00   ` [PATCHv2 0/2] " Sverre Rabbelier
  2012-04-05  1:51   ` [PATCHv3] " Pete Wyckoff
  3 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-03  1:51 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano

The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference.  Fast-import does not complain when garbage
appears after a mark reference in some cases.

Factor out parsing of mark references and complain if
errant characters are found.

Buggy input can cause fast-import to produce the wrong output,
silently, without error.  This makes it difficult to track
down buggy generators of fast-import streams.  An example is
seen in the last line of this commit command:

    commit refs/heads/S2
    committer Name <name@example.com> 1112912893 -0400
    data <<COMMIT
    commit message
    COMMIT
    from :1M 100644 :103 hello.c

It is missing a newline and should be:

    [...]
    from :1
    M 100644 :103 hello.c

What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^.  What the buggy
program was expecting was the contents of blob :103.  While
the resulting commit graph looked correct, the contents in
some commits were wrong.

Signed-off-by: Pete Wyckoff <pw@padd.com>
---
 fast-import.c          |   97 +++++++++++++++++++++++++++++++++++++-----------
 t/t9300-fast-import.sh |   20 +++++-----
 2 files changed, 85 insertions(+), 32 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index a85275d..bd1b9d1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2207,6 +2207,57 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
 	return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
 }
 
+/*
+ * Given a pointer into a string, parse a mark reference:
+ *
+ *   idnum ::= ':' bigint;
+ *
+ * Return the first character after the value in *endptr.
+ *
+ * Complain if the following character is not what is expected,
+ * either a space or end of the string.
+ */
+static uintmax_t parse_mark_ref(const char *p, char **endptr)
+{
+	uintmax_t mark;
+
+	assert(*p == ':');
+	++p;
+	mark = strtoumax(p, endptr, 10);
+	if (*endptr == p)
+		die("No value after ':' in mark: %s", command_buf.buf);
+	return mark;
+}
+
+/*
+ * Parse the mark reference, and complain if this is not the end of
+ * the string.
+ */
+static uintmax_t parse_mark_ref_eol(const char *p)
+{
+	char *end;
+	uintmax_t mark;
+
+	mark = parse_mark_ref(p, &end);
+	if (*end != '\0')
+		die("Garbage after mark: %s", command_buf.buf);
+	return mark;
+}
+
+/*
+ * Parse the mark reference, demanding a trailing space.  Return a
+ * pointer to the space.
+ */
+static uintmax_t parse_mark_ref_space(const char *p, char **endptr)
+{
+	uintmax_t mark;
+
+	mark = parse_mark_ref(p, endptr);
+	if (**endptr != ' ')
+		die("Missing space after mark: %s", command_buf.buf);
+	return mark;
+}
+
 static void file_change_m(struct branch *b)
 {
 	const char *p = command_buf.buf + 2;
@@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
 
 	if (*p == ':') {
 		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
+		oe = find_mark(parse_mark_ref_space(p, &x));
 		hashcpy(sha1, oe->idx.sha1);
 		p = x;
 	} else if (!prefixcmp(p, "inline")) {
 		inline_data = 1;
 		p += 6;
+		if (*p != ' ')
+			die("Missing space after 'inline': %s",
+			    command_buf.buf);
 	} else {
 		if (get_sha1_hex(p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
 		oe = find_object(sha1);
 		p += 40;
+		if (*p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
 	}
-	if (*p++ != ' ')
-		die("Missing space after SHA1: %s", command_buf.buf);
+	++p;  /* skip space */
 
 	strbuf_reset(&uq);
 	if (!unquote_c_style(&uq, p, &endp)) {
@@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
 	/* <dataref> or 'inline' */
 	if (*p == ':') {
 		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
+		oe = find_mark(parse_mark_ref_space(p, &x));
 		hashcpy(sha1, oe->idx.sha1);
 		p = x;
 	} else if (!prefixcmp(p, "inline")) {
 		inline_data = 1;
 		p += 6;
+		if (*p != ' ')
+			die("Missing space after 'inline': %s",
+			    command_buf.buf);
 	} else {
 		if (get_sha1_hex(p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
 		oe = find_object(sha1);
 		p += 40;
+		if (*p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
 	}
-	if (*p++ != ' ')
-		die("Missing space after SHA1: %s", command_buf.buf);
+	++p;  /* skip space */
 
 	/* <committish> */
 	s = lookup_branch(p);
@@ -2430,7 +2489,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
 			die("Can't add a note on empty branch.");
 		hashcpy(commit_sha1, s->sha1);
 	} else if (*p == ':') {
-		uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+		uintmax_t commit_mark = parse_mark_ref_eol(p);
 		struct object_entry *commit_oe = find_mark(commit_mark);
 		if (commit_oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", commit_mark);
@@ -2537,7 +2596,7 @@ static int parse_from(struct branch *b)
 		hashcpy(b->branch_tree.versions[0].sha1, t);
 		hashcpy(b->branch_tree.versions[1].sha1, t);
 	} else if (*from == ':') {
-		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+		uintmax_t idnum = parse_mark_ref_eol(from);
 		struct object_entry *oe = find_mark(idnum);
 		if (oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2572,7 +2631,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 		if (s)
 			hashcpy(n->sha1, s->sha1);
 		else if (*from == ':') {
-			uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+			uintmax_t idnum = parse_mark_ref_eol(from);
 			struct object_entry *oe = find_mark(idnum);
 			if (oe->type != OBJ_COMMIT)
 				die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2735,7 +2794,7 @@ static void parse_new_tag(void)
 		type = OBJ_COMMIT;
 	} else if (*from == ':') {
 		struct object_entry *oe;
-		from_mark = strtoumax(from + 1, NULL, 10);
+		from_mark = parse_mark_ref_eol(from);
 		oe = find_mark(from_mark);
 		type = oe->type;
 		hashcpy(sha1, oe->idx.sha1);
@@ -2867,14 +2926,9 @@ static void parse_cat_blob(void)
 	/* cat-blob SP <object> LF */
 	p = command_buf.buf + strlen("cat-blob ");
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
-		if (x == p + 1)
-			die("Invalid mark: %s", command_buf.buf);
+		oe = find_mark(parse_mark_ref_eol(p));
 		if (!oe)
 			die("Unknown mark: %s", command_buf.buf);
-		if (*x)
-			die("Garbage after mark: %s", command_buf.buf);
 		hashcpy(sha1, oe->idx.sha1);
 	} else {
 		if (get_sha1_hex(p, sha1))
@@ -2945,9 +2999,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 
 	if (**p == ':') {	/* <mark> */
 		char *endptr;
-		e = find_mark(strtoumax(*p + 1, &endptr, 10));
-		if (endptr == *p + 1)
-			die("Invalid mark: %s", command_buf.buf);
+		e = find_mark(parse_mark_ref_space(*p, &endptr));
 		if (!e)
 			die("Unknown mark: %s", command_buf.buf);
 		*p = endptr;
@@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	} else {	/* <sha1> */
 		if (get_sha1_hex(*p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
-		e = find_object(sha1);
 		*p += 40;
+		if (**p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
+		e = find_object(sha1);
 	}
+	*p += 1;  /* skip space */
 
 	while (!e || e->type != OBJ_TREE)
 		e = dereference(e, sha1);
@@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
 		root = new_tree_entry();
 		hashcpy(root->versions[1].sha1, e->idx.sha1);
 		load_tree(root);
-		if (*p++ != ' ')
-			die("Missing space after tree-ish: %s", command_buf.buf);
 	}
 	if (*p == '"') {
 		static struct strbuf uq = STRBUF_INIT;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 621f02a..875ac7a 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2693,7 +2693,7 @@ test_expect_success 'S: add commits 1 and 2, and blob 103' '
 #
 # filemodify, three datarefs
 #
-test_expect_failure 'S: filemodify markref no space' '
+test_expect_success 'S: filemodify markref no space' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	commit refs/heads/S
 	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2706,7 +2706,7 @@ test_expect_failure 'S: filemodify markref no space' '
 	grep -q "Missing space after mark" err
 '
 
-test_expect_failure 'S: filemodify inline no space' '
+test_expect_success 'S: filemodify inline no space' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	commit refs/heads/S
 	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2739,7 +2739,7 @@ test_expect_success 'S: filemodify sha1 no space' '
 #
 # notemodify, three ways to say dataref
 #
-test_expect_failure 'S: notemodify dataref markref no space' '
+test_expect_success 'S: notemodify dataref markref no space' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	commit refs/heads/S
 	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2752,7 +2752,7 @@ test_expect_failure 'S: notemodify dataref markref no space' '
 	grep -q "Missing space after mark" err
 '
 
-test_expect_failure 'S: notemodify dataref inline no space' '
+test_expect_success 'S: notemodify dataref inline no space' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	commit refs/heads/S
 	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2785,7 +2785,7 @@ test_expect_success 'S: notemodify dataref sha1 no space' '
 #
 # notemodify, mark in committish
 #
-test_expect_failure 'S: notemodify committish markref junk at eol' '
+test_expect_success 'S: notemodify committish markref junk at eol' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	commit refs/heads/Snotes
 	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2801,7 +2801,7 @@ test_expect_failure 'S: notemodify committish markref junk at eol' '
 #
 # from
 #
-test_expect_failure 'S: from markref junk at eol' '
+test_expect_success 'S: from markref junk at eol' '
 	# no &&
 	git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
 	commit refs/heads/S2
@@ -2839,7 +2839,7 @@ test_expect_failure 'S: from markref junk at eol' '
 #
 # merge
 #
-test_expect_failure 'S: merge markref junk at eol' '
+test_expect_success 'S: merge markref junk at eol' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	commit refs/heads/S
 	mark :4
@@ -2858,7 +2858,7 @@ test_expect_failure 'S: merge markref junk at eol' '
 #
 # tag, from markref
 #
-test_expect_failure 'S: tag markref junk at eol' '
+test_expect_success 'S: tag markref junk at eol' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	tag refs/tags/Stag
 	from :2x
@@ -2885,7 +2885,7 @@ test_expect_success 'S: cat-blob markref junk at eol' '
 #
 # ls markref
 #
-test_expect_failure 'S: ls markref space' '
+test_expect_success 'S: ls markref space' '
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	ls :2x hello.c
 	EOF
@@ -2893,7 +2893,7 @@ test_expect_failure 'S: ls markref space' '
 	grep -q "Missing space after mark" err
 '
 
-test_expect_failure 'S: ls sha1 space' '
+test_expect_success 'S: ls sha1 space' '
 	sha1=$(grep -w :2 marks | cut -d\  -f2) &&
 	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
 	ls ${sha1}x hello.c
-- 
1.7.10.rc2.2.g38670

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 0/2] fast-import: tighten parsing of mark references
  2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
  2012-04-03  1:51   ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
  2012-04-03  1:51   ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
@ 2012-04-03  2:00   ` Sverre Rabbelier
  2012-04-05  1:51   ` [PATCHv3] " Pete Wyckoff
  3 siblings, 0 replies; 22+ messages in thread
From: Sverre Rabbelier @ 2012-04-03  2:00 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Jonathan Nieder, Dmitry Ivankov, David Barr, Junio C Hamano

On Mon, Apr 2, 2012 at 20:51, Pete Wyckoff <pw@padd.com> wrote:
> Also, I did the unit tests first, to make sure things were broken
> as I expected.  You can squash it all together if you prefer.

Nice series. I agree that it's good to be strict in what we accept
(breaking from the "be liberal in what you accept" mantra), due to the
importance of the input stream being parsed exactly right (as opposed
to a browser, where it's better to not render something bold than to
just break because a </b> was missing).

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
  2012-04-03  1:51   ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
@ 2012-04-03 14:00     ` Jonathan Nieder
  2012-04-04  0:46       ` Pete Wyckoff
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-03 14:00 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier, Junio C Hamano

Pete Wyckoff wrote:

> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -2635,4 +2635,271 @@ test_expect_success \
>  	'n=$(grep $a verify | wc -l) &&
>  	 test 1 = $n'
>  
> +###
> +### series S
> +###
> +#
> +# Set up is roughly this.  Commits marked 1,2,3,4.  Blobs
> +# marked 100 + commit.  Notes 200 +.  Make sure missing spaces
> +# and EOLs after mark references cause errors.

Nit: "Set up" should be "Setup" when it is a noun.

[...]
> +test_expect_success 'S: add commits 1 and 2, and blob 103' '
> +	git fast-import --export-marks=marks <input
> +'

Ok, this one sets up for later ones...

> +
> +#
> +# filemodify, three datarefs
> +#
> +test_expect_failure 'S: filemodify markref no space' '

What is this testing for?  The ideal is that each test_expect_foo line
contains a proposition and the test checks whether that proposition is
true or false.  For example:

	test_expect_failure 'S: filemodify with garbage after mark errors out' '

Likewise in later tests.

> +	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> +	commit refs/heads/S
> +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +	data <<COMMIT
> +	commit N
> +	COMMIT
> +	M 100644 :103x hello.c
> +	EOF
> +	cat err &&
> +	grep -q "Missing space after mark" err

Is this using "grep -q" to avoid repeating the same line in the output
twice?  It seems better to use plain grep or test_i18ngrep.

I'm also worried that if someone wants to change these messages
(perhaps to make the 'm' in "Missing" lowercase or something), they
will have to change all of these tests.  If we want to be absolutely
sure that git detects the right error instead of something else, I
would suggest

	test_i18ngrep "space after mark" message

I'm also not convinced the error message is worth checking at all ---
as long as fast-import errors out, won't the frontend author be able
to look in the logs to find out the problematic line anyway?

> +'
> +
> +test_expect_failure 'S: filemodify inline no space' '
> +	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> +	commit refs/heads/S
> +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +	data <<COMMIT
> +	commit N
> +	COMMIT
> +	M 100644 inlineX hello.c
> +	data <<BLOB
> +	inline
> +	BLOB
> +	EOF
> +	cat err &&
> +	grep -q "Missing space after .inline." err

Does this fail because the error message is "Missing space after SHA1"
instead?  I'm not sure that's actually a bug, unless we want to
correctly nitpick that the keyword "inline" that is a stand-in for an
object name is not itself one.

I don't think the tests for exact error messages make too much sense
without the next patch, so I would suggest leaving them out if this
patch is supposed to be applicable on its own.

Thanks for some thorough tests.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references
  2012-04-03  1:51   ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
@ 2012-04-03 14:20     ` Jonathan Nieder
  2012-04-04  1:20       ` Pete Wyckoff
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-03 14:20 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano, Johan Herland

(cc-ing Johan for noteimport code)
Pete Wyckoff wrote:

>                    Fast-import does not complain when garbage
> appears after a mark reference in some cases.

Thanks for fixing it.

[...]
> +++ b/fast-import.c
[...]
> @@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
>  
>  	if (*p == ':') {
>  		char *x;
> -		oe = find_mark(strtoumax(p + 1, &x, 10));
> +		oe = find_mark(parse_mark_ref_space(p, &x));
>  		hashcpy(sha1, oe->idx.sha1);
>  		p = x;

Simpler:

	if (*p == ':') {
		oe = find_mark(parse_mark_ref_space(p, &p));
		hashcpy(sha1, oe->idx.sha1);
	} else if ...

>  	} else if (!prefixcmp(p, "inline")) {
>  		inline_data = 1;
>  		p += 6;
> +		if (*p != ' ')
> +			die("Missing space after 'inline': %s",
> +			    command_buf.buf);
>  	} else {
>  		if (get_sha1_hex(p, sha1))
>  			die("Invalid SHA1: %s", command_buf.buf);

If I write

	M 100644 inliness some/path/to/file

was my mistake actually leaving out a space after 'inline' or
was it using an invalid <dataref>?

I think the latter, so I would suggest

	} else if (!prefixcmp(p, "inline ")) {
		inline_data = 1;
		p += strlen("inline");	/* advance to space */
	} else {
		if (get_sha1_hex(p, sha1))
			...

[...]
>  	}
> -	if (*p++ != ' ')
> -		die("Missing space after SHA1: %s", command_buf.buf);
> +	++p;  /* skip space */

I guess I'd suggest

	assert(*p == ' ');
	p++;

as defense against coders introducing additional cases that
are not as careful.

> @@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
>  	/* <dataref> or 'inline' */
>  	if (*p == ':') {
>  		char *x;
> -		oe = find_mark(strtoumax(p + 1, &x, 10));
> +		oe = find_mark(parse_mark_ref_space(p, &x));
>  		hashcpy(sha1, oe->idx.sha1);
>  		p = x;

Likewise (btw, why doesn't this share code with the filemodify case?):

	if (*p == ':') {
		oe = find_mark(parse_mark_with_trailing_space(p, &p));
		hashcpy(sha1, oe->idx.sha1);
	} else if ...

and so on.

[...]
> @@ -2430,7 +2489,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
>  			die("Can't add a note on empty branch.");
>  		hashcpy(commit_sha1, s->sha1);
>  	} else if (*p == ':') {
> -		uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
> +		uintmax_t commit_mark = parse_mark_ref_eol(p);
>  		struct object_entry *commit_oe = find_mark(commit_mark);
>  		if (commit_oe->type != OBJ_COMMIT)
>  			die("Mark :%" PRIuMAX " not a commit", commit_mark);
> @@ -2537,7 +2596,7 @@ static int parse_from(struct branch *b)
>  		hashcpy(b->branch_tree.versions[0].sha1, t);
>  		hashcpy(b->branch_tree.versions[1].sha1, t);
>  	} else if (*from == ':') {
> -		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
> +		uintmax_t idnum = parse_mark_ref_eol(from);

The title feature.  Nice.

[...]
> @@ -2945,9 +2999,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
>  
>  	if (**p == ':') {	/* <mark> */
>  		char *endptr;
> -		e = find_mark(strtoumax(*p + 1, &endptr, 10));
> -		if (endptr == *p + 1)
> -			die("Invalid mark: %s", command_buf.buf);
> +		e = find_mark(parse_mark_ref_space(*p, &endptr));
>  		if (!e)
>  			die("Unknown mark: %s", command_buf.buf);
>  		*p = endptr;

Simpler:

	if (**p == ':') {
		e = find_mark(parse_mark_...(*p, p));
		if (!e)
			die(...);
	} else {

> @@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
>  	} else {	/* <sha1> */
>  		if (get_sha1_hex(*p, sha1))
>  			die("Invalid SHA1: %s", command_buf.buf);
> -		e = find_object(sha1);
>  		*p += 40;
> +		if (**p != ' ')
> +			die("Missing space after SHA1: %s", command_buf.buf);
> +		e = find_object(sha1);

This seems dangerous.  What if a new caller arises that wants to
parse a <dataref> representing a tree-ish at the end of the line?

So I think checking the character after the tree-ish should still
be the caller's responsibility.

>  	}
> +	*p += 1;  /* skip space */

If other patches in flight use the same function, they would expect
*p to point to the space when parse_treeish_dataref returns.  If we
wanted to change that (as mentioned above I don't think we ought to)
then the function's name should be changed to force such new callers
not to compile.

> @@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
>  		root = new_tree_entry();
>  		hashcpy(root->versions[1].sha1, e->idx.sha1);
>  		load_tree(root);
> -		if (*p++ != ' ')
> -			die("Missing space after tree-ish: %s", command_buf.buf);

(here's the caller).

Except where noted above, this looks good.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
  2012-04-03 14:00     ` Jonathan Nieder
@ 2012-04-04  0:46       ` Pete Wyckoff
  2012-04-04  5:43         ` Jonathan Nieder
  0 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-04  0:46 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier, Junio C Hamano

jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:00 -0500:
> Pete Wyckoff wrote:
> > +
> > +#
> > +# filemodify, three datarefs
> > +#
> > +test_expect_failure 'S: filemodify markref no space' '
> 
> What is this testing for?  The ideal is that each test_expect_foo line
> contains a proposition and the test checks whether that proposition is
> true or false.  For example:
> 
> 	test_expect_failure 'S: filemodify with garbage after mark errors out' '
> 
> Likewise in later tests.

I've fixed these locally, thanks for reading them!

> > +	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> > +	commit refs/heads/S
> > +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> > +	data <<COMMIT
> > +	commit N
> > +	COMMIT
> > +	M 100644 :103x hello.c
> > +	EOF
> > +	cat err &&
> > +	grep -q "Missing space after mark" err
> 
> Is this using "grep -q" to avoid repeating the same line in the output
> twice?  It seems better to use plain grep or test_i18ngrep.
> 
> I'm also worried that if someone wants to change these messages
> (perhaps to make the 'm' in "Missing" lowercase or something), they
> will have to change all of these tests.  If we want to be absolutely
> sure that git detects the right error instead of something else, I
> would suggest
> 
> 	test_i18ngrep "space after mark" message
> 
> I'm also not convinced the error message is worth checking at all ---
> as long as fast-import errors out, won't the frontend author be able
> to look in the logs to find out the problematic line anyway?

I'm a bit confused.  I read through 5e9637c (i18n: add
infrastructure for translating Git with gettext, 2011-11-18), and
GETTEXT_POISON seems to be used to find untranslated messages.
What I want to test here is that the functionality works: do the
right untranslated messages get printed.

Changing the "Missing" to "missing" would require fixing the
tests, and that seems okay.

I'm happy to drop the "-q" and drop the "Missing", but wonder if
you're looking for something deeper.

> > +test_expect_failure 'S: filemodify inline no space' '
> > +	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> > +	commit refs/heads/S
> > +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> > +	data <<COMMIT
> > +	commit N
> > +	COMMIT
> > +	M 100644 inlineX hello.c
> > +	data <<BLOB
> > +	inline
> > +	BLOB
> > +	EOF
> > +	cat err &&
> > +	grep -q "Missing space after .inline." err
> 
> Does this fail because the error message is "Missing space after SHA1"
> instead?  I'm not sure that's actually a bug, unless we want to
> correctly nitpick that the keyword "inline" that is a stand-in for an
> object name is not itself one.

The form of the message matters.  Datarefs can be inline, SHA1s,
or marks.  It is confusing to see an error message about a SHA1
when the input stream has no SHA1s.  The existing code always
says "SHA1", which is wrong if you gave it a mark or an inline.

> I don't think the tests for exact error messages make too much sense
> without the next patch, so I would suggest leaving them out if this
> patch is supposed to be applicable on its own.
> 
> Thanks for some thorough tests.

You think I should squash it all together, then?  Or factor the
tests into two chunks:
    - tests for behavior that silently accepts broken input; and,
    - tests for behavior where the bogus input is detcetd, but
      incorrect error messages are given?

		-- Pete

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references
  2012-04-03 14:20     ` Jonathan Nieder
@ 2012-04-04  1:20       ` Pete Wyckoff
  2012-04-04  5:32         ` Jonathan Nieder
  0 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-04  1:20 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano, Johan Herland

jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:20 -0500:
> (cc-ing Johan for noteimport code)

Thanks, glad you noticed.

> Pete Wyckoff wrote:
> [...]
> > +++ b/fast-import.c
> [...]
> > @@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
> >  
> >  	if (*p == ':') {
> >  		char *x;
> > -		oe = find_mark(strtoumax(p + 1, &x, 10));
> > +		oe = find_mark(parse_mark_ref_space(p, &x));
> >  		hashcpy(sha1, oe->idx.sha1);
> >  		p = x;
> 
> Simpler:
> 
> 	if (*p == ':') {
> 		oe = find_mark(parse_mark_ref_space(p, &p));
> 		hashcpy(sha1, oe->idx.sha1);
> 	} else if ...

Yes.  I thought about just passing in plain old &p.  Even though
these approaches would work, it is a bit more difficult for
novice C coders to read.  Figured we should err on the side of
helping future code readers.  I can add more cleverness if you
feel strongly.

> >  	} else if (!prefixcmp(p, "inline")) {
> >  		inline_data = 1;
> >  		p += 6;
> > +		if (*p != ' ')
> > +			die("Missing space after 'inline': %s",
> > +			    command_buf.buf);
> >  	} else {
> >  		if (get_sha1_hex(p, sha1))
> >  			die("Invalid SHA1: %s", command_buf.buf);
> 
> If I write
> 
> 	M 100644 inliness some/path/to/file
> 
> was my mistake actually leaving out a space after 'inline' or
> was it using an invalid <dataref>?
> 
> I think the latter, so I would suggest
> 
> 	} else if (!prefixcmp(p, "inline ")) {
> 		inline_data = 1;
> 		p += strlen("inline");	/* advance to space */
> 	} else {
> 		if (get_sha1_hex(p, sha1))
> 			...

Insead of "Missing space after 'inline'", you'll get "Invalid
SHA1".  You misspelled "inline" with "inliness"?  And would
prefer to be told you provided an invalid SHA1?

I'm tempted to guess that any string starting with "inline", e.g.
"inlinePath/To/File" without a space is still a good indication
that they were trying to say "inline ".  The chance that they
horribly typed a SHA1, or really had a path staring with "inline"
and forgot the dataref entirely, feel less likely.

> [...]
> >  	}
> > -	if (*p++ != ' ')
> > -		die("Missing space after SHA1: %s", command_buf.buf);
> > +	++p;  /* skip space */
> 
> I guess I'd suggest
> 
> 	assert(*p == ' ');
> 	p++;
> 
> as defense against coders introducing additional cases that
> are not as careful.

Good suggestion, thanks.

> > @@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
> >  	/* <dataref> or 'inline' */
> >  	if (*p == ':') {
> >  		char *x;
> > -		oe = find_mark(strtoumax(p + 1, &x, 10));
> > +		oe = find_mark(parse_mark_ref_space(p, &x));
> >  		hashcpy(sha1, oe->idx.sha1);
> >  		p = x;
> 
> Likewise (btw, why doesn't this share code with the filemodify case?):
> 
> 	if (*p == ':') {
> 		oe = find_mark(parse_mark_with_trailing_space(p, &p));
> 		hashcpy(sha1, oe->idx.sha1);
> 	} else if ...
> 
> and so on.

It does feel like a good opportunity for some refactoring.  Two
out of the three callers to parse an mark followed by a space
could be put together here.

[..]
> > @@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
> >  	} else {	/* <sha1> */
> >  		if (get_sha1_hex(*p, sha1))
> >  			die("Invalid SHA1: %s", command_buf.buf);
> > -		e = find_object(sha1);
> >  		*p += 40;
> > +		if (**p != ' ')
> > +			die("Missing space after SHA1: %s", command_buf.buf);
> > +		e = find_object(sha1);
> 
> This seems dangerous.  What if a new caller arises that wants to
> parse a <dataref> representing a tree-ish at the end of the line?
> 
> So I think checking the character after the tree-ish should still
> be the caller's responsibility.

I was close to just moving parse_treeish_dataref() into its
single caller, parse_ls(), just so we wouldn't have to think
about this.

There are two cases it handles:  mark and sha1.  The mark case
uses the handy new parse_mark_ref_space(), which does the space
checking.  The sha1 branch had no check in this function.  So
I hoisted the space check up to make the branches symmetrical.

> >  	}
> > +	*p += 1;  /* skip space */
> 
> If other patches in flight use the same function, they would expect
> *p to point to the space when parse_treeish_dataref returns.  If we
> wanted to change that (as mentioned above I don't think we ought to)
> then the function's name should be changed to force such new callers
> not to compile.
> 
> > @@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
> >  		root = new_tree_entry();
> >  		hashcpy(root->versions[1].sha1, e->idx.sha1);
> >  		load_tree(root);
> > -		if (*p++ != ' ')
> > -			die("Missing space after tree-ish: %s", command_buf.buf);
> 
> (here's the caller).

I would prefer just to inline the whole thing.  Or new name
parse_ls_dataref() if you have a preference.

> Except where noted above, this looks good.

Thanks.

		-- Pete

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references
  2012-04-04  1:20       ` Pete Wyckoff
@ 2012-04-04  5:32         ` Jonathan Nieder
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-04  5:32 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano, Johan Herland

Pete Wyckoff wrote:
> jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:20 -0500:

>> Simpler:
>> 
>> 	if (*p == ':') {
>> 		oe = find_mark(parse_mark_ref_space(p, &p));
>> 		hashcpy(sha1, oe->idx.sha1);
>> 	} else if ...
>
> Yes.  I thought about just passing in plain old &p.  Even though
> these approaches would work, it is a bit more difficult for
> novice C coders to read.  Figured we should err on the side of
> helping future code readers.  I can add more cleverness if you
> feel strongly.

It would be clearest with one argument, like so:

		oe = find_mark(parse_mark_...(&p));
		hashcpy(sha1, oe->idx.sha1);

[...]
> Insead of "Missing space after 'inline'", you'll get "Invalid
> SHA1".  You misspelled "inline" with "inliness"?  And would
> prefer to be told you provided an invalid SHA1?

It wasn't a great example, but what I meant is that if someone
asked me, a human, to parse

	M 100644 foobar path/to/file

I would assume that foobar is a <dataref>.  Likewise, for any
string baz in

	M 100644 baz path/to/file

including strings that start with "inline", except for "inline"
itself.

To put it another way: checking for 'inline' at the start of a word as
a way to check for typos seems odd to me.  We do not diagnose

	M 100644 Inline path/to/file

as a misspelled version of "inline", nor

	M 100644inline path/to/file

as an instance of a missing space character, and we shouldn't.

The goal in fast-import's behavior is usually predictability and
simplicity in terms of the mental model of the person writing a
frontend.  Trying to guess the user's intention on malformed input
only takes away from that goal.

Why I care: if some day git permits other kinds of <dataref> (for
example if it supports refnames some day), I do not want datarefs
beginning with "inline" to be forbidden.

[...]
> There are two cases it handles:  mark and sha1.  The mark case
> uses the handy new parse_mark_ref_space(), which does the space
> checking.  The sha1 branch had no check in this function.  So
> I hoisted the space check up to make the branches symmetrical.

I think it's ok to sacrifice symmetry here, but:

[...]
> I would prefer just to inline the whole thing.  Or new name
> parse_ls_dataref() if you have a preference.

if changing the behavior of the function that parses a treeish dataref
seems right, that's fine with me as long as its name or signature
changes.

For example, it could become

	static struct object_entry *parse_treeish(const char **p);

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
  2012-04-04  0:46       ` Pete Wyckoff
@ 2012-04-04  5:43         ` Jonathan Nieder
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-04  5:43 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier, Junio C Hamano

On Tue, Apr 03, 2012 at 08:46:10PM -0400, Pete Wyckoff wrote:
> jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:00 -0500:

>> Is this using "grep -q" to avoid repeating the same line in the output
>> twice?  It seems better to use plain grep or test_i18ngrep.
[...]
> What I want to test here is that the functionality works: do the
> right untranslated messages get printed.
>
> Changing the "Missing" to "missing" would require fixing the
> tests, and that seems okay.

Let me reiterate this a little then.

Suppose I mark the messages in fast-import.c with _() so they get
translated.  Then your tests will fail, so I have to tweak them.  Fine
--- the test tweaks take some time, but they're doable.  Nothing lost,
right?

No, something major would be lost.

Tests normally save later coders time, by giving immediate feedback
that they would normally only get by letting a feature be used over a
long time by real users.  They also dissuade people from changing
git's behavior without thinking carefully about the consequences ---
each broken test represents a class of script or user expectation that
is potentially being broken.

Similarly, a test that checks that git produces such-and-such exact
output is dissuading me from making certain behavior changes by adding
to the work needed to make them (I have to adjust tests, too).  So now
I am less likely to

 (1) reword the message to make it clearer in some way in response to
     user feedback

 (2) mark it for translation so the operator can see a message in her
     native language

How is making that hard in any way a good thing?

Relaxing the pattern addresses (1).  Using test_i18ngrep instead of
grep addresses (2).

Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCHv3] fast-import: tighten parsing of mark references
  2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
                     ` (2 preceding siblings ...)
  2012-04-03  2:00   ` [PATCHv2 0/2] " Sverre Rabbelier
@ 2012-04-05  1:51   ` Pete Wyckoff
  2012-04-05  2:24     ` Jonathan Nieder
  2012-04-07 22:59     ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
  3 siblings, 2 replies; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-05  1:51 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano, Johan Herland

The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference.  Fast-import does not complain when garbage
appears after a mark reference in some cases.

Factor out parsing of mark references and complain if
errant characters are found.

Buggy input can cause fast-import to produce the wrong output,
silently, without error.  This makes it difficult to track
down buggy generators of fast-import streams.  An example is
seen in the last line of this commit command:

    commit refs/heads/S2
    committer Name <name@example.com> 1112912893 -0400
    data <<COMMIT
    commit message
    COMMIT
    from :1M 100644 :103 hello.c

It is missing a newline and should be:

    [...]
    from :1
    M 100644 :103 hello.c

What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^.  What the buggy
program was expecting was the contents of blob :103.  While
the resulting commit graph looked correct, the contents in
some commits were wrong.
---

This addresses all of Jonathan's comments, in particular:

  - give tests descriptive names

  - add asserts for trailing space for filemodify, notemodify
    to protect against flaws in future dataref implementions

  - compactify end pointer return and update, so:

      oe = find_mark(parse_mark_ref_space(&p));

  - replace "grep -q" with "test_i18ngrep"

  - drop first word in failure messages, so no "Missing"
    or "Garbage", resp., in:

      test_i18ngrep "space after SHA1" err
      test_i18ngrep "after mark" err

  - Erroneous datarefs "inlineX" and "no-such-dataref" should
    behave the same, in particular, they now complain "Invalid SHA1"
    rather than guessing an attempt at "inline ".

  - Revert change parse_treeish_dataref() API in case other
    changes are inflight.  Verify space handling in caller.

I did not refactor the 15 or so common lines in filemodify and
notemodify dataref handling.

I'll resend once 1.7.11 opens up.

Thanks all for the careful review.

		-- Pete

 fast-import.c          |  102 +++++++++++++-----
 t/t9300-fast-import.sh |  276 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 349 insertions(+), 29 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index a85275d..0525e12 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2207,6 +2207,59 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
 	return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
 }
 
+/*
+ * Given a pointer into a string, parse a mark reference:
+ *
+ *   idnum ::= ':' bigint;
+ *
+ * Return the first character after the value in *endptr.
+ *
+ * Complain if the following character is not what is expected,
+ * either a space or end of the string.
+ */
+static uintmax_t parse_mark_ref(const char *p, char **endptr)
+{
+	uintmax_t mark;
+
+	assert(*p == ':');
+	++p;
+	mark = strtoumax(p, endptr, 10);
+	if (*endptr == p)
+		die("No value after ':' in mark: %s", command_buf.buf);
+	return mark;
+}
+
+/*
+ * Parse the mark reference, and complain if this is not the end of
+ * the string.
+ */
+static uintmax_t parse_mark_ref_eol(const char *p)
+{
+	char *end;
+	uintmax_t mark;
+
+	mark = parse_mark_ref(p, &end);
+	if (*end != '\0')
+		die("Garbage after mark: %s", command_buf.buf);
+	return mark;
+}
+
+/*
+ * Parse the mark reference, demanding a trailing space.  Return a
+ * pointer to the space.
+ */
+static uintmax_t parse_mark_ref_space(const char **p)
+{
+	uintmax_t mark;
+	char *end;
+
+	mark = parse_mark_ref(*p, &end);
+	if (*end != ' ')
+		die("Missing space after mark: %s", command_buf.buf);
+	*p = end;
+	return mark;
+}
+
 static void file_change_m(struct branch *b)
 {
 	const char *p = command_buf.buf + 2;
@@ -2235,21 +2288,21 @@ static void file_change_m(struct branch *b)
 	}
 
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
+		oe = find_mark(parse_mark_ref_space(&p));
 		hashcpy(sha1, oe->idx.sha1);
-		p = x;
-	} else if (!prefixcmp(p, "inline")) {
+	} else if (!prefixcmp(p, "inline ")) {
 		inline_data = 1;
-		p += 6;
+		p += strlen("inline");  /* advance to space */
 	} else {
 		if (get_sha1_hex(p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
 		oe = find_object(sha1);
 		p += 40;
+		if (*p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
 	}
-	if (*p++ != ' ')
-		die("Missing space after SHA1: %s", command_buf.buf);
+	assert(*p == ' ');
+	++p;  /* skip space */
 
 	strbuf_reset(&uq);
 	if (!unquote_c_style(&uq, p, &endp)) {
@@ -2407,21 +2460,21 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
 	/* Now parse the notemodify command. */
 	/* <dataref> or 'inline' */
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
+		oe = find_mark(parse_mark_ref_space(&p));
 		hashcpy(sha1, oe->idx.sha1);
-		p = x;
-	} else if (!prefixcmp(p, "inline")) {
+	} else if (!prefixcmp(p, "inline ")) {
 		inline_data = 1;
-		p += 6;
+		p += strlen("inline");  /* advance to space */
 	} else {
 		if (get_sha1_hex(p, sha1))
 			die("Invalid SHA1: %s", command_buf.buf);
 		oe = find_object(sha1);
 		p += 40;
+		if (*p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
 	}
-	if (*p++ != ' ')
-		die("Missing space after SHA1: %s", command_buf.buf);
+	assert(*p == ' ');
+	++p;  /* skip space */
 
 	/* <committish> */
 	s = lookup_branch(p);
@@ -2430,7 +2483,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
 			die("Can't add a note on empty branch.");
 		hashcpy(commit_sha1, s->sha1);
 	} else if (*p == ':') {
-		uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+		uintmax_t commit_mark = parse_mark_ref_eol(p);
 		struct object_entry *commit_oe = find_mark(commit_mark);
 		if (commit_oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", commit_mark);
@@ -2537,7 +2590,7 @@ static int parse_from(struct branch *b)
 		hashcpy(b->branch_tree.versions[0].sha1, t);
 		hashcpy(b->branch_tree.versions[1].sha1, t);
 	} else if (*from == ':') {
-		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+		uintmax_t idnum = parse_mark_ref_eol(from);
 		struct object_entry *oe = find_mark(idnum);
 		if (oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2572,7 +2625,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 		if (s)
 			hashcpy(n->sha1, s->sha1);
 		else if (*from == ':') {
-			uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+			uintmax_t idnum = parse_mark_ref_eol(from);
 			struct object_entry *oe = find_mark(idnum);
 			if (oe->type != OBJ_COMMIT)
 				die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2735,7 +2788,7 @@ static void parse_new_tag(void)
 		type = OBJ_COMMIT;
 	} else if (*from == ':') {
 		struct object_entry *oe;
-		from_mark = strtoumax(from + 1, NULL, 10);
+		from_mark = parse_mark_ref_eol(from);
 		oe = find_mark(from_mark);
 		type = oe->type;
 		hashcpy(sha1, oe->idx.sha1);
@@ -2867,14 +2920,9 @@ static void parse_cat_blob(void)
 	/* cat-blob SP <object> LF */
 	p = command_buf.buf + strlen("cat-blob ");
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
-		if (x == p + 1)
-			die("Invalid mark: %s", command_buf.buf);
+		oe = find_mark(parse_mark_ref_eol(p));
 		if (!oe)
 			die("Unknown mark: %s", command_buf.buf);
-		if (*x)
-			die("Garbage after mark: %s", command_buf.buf);
 		hashcpy(sha1, oe->idx.sha1);
 	} else {
 		if (get_sha1_hex(p, sha1))
@@ -2944,13 +2992,9 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	struct object_entry *e;
 
 	if (**p == ':') {	/* <mark> */
-		char *endptr;
-		e = find_mark(strtoumax(*p + 1, &endptr, 10));
-		if (endptr == *p + 1)
-			die("Invalid mark: %s", command_buf.buf);
+		e = find_mark(parse_mark_ref_space(p));
 		if (!e)
 			die("Unknown mark: %s", command_buf.buf);
-		*p = endptr;
 		hashcpy(sha1, e->idx.sha1);
 	} else {	/* <sha1> */
 		if (get_sha1_hex(*p, sha1))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..cbc0e81 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2635,4 +2635,280 @@ test_expect_success \
 	'n=$(grep $a verify | wc -l) &&
 	 test 1 = $n'
 
+###
+### series S
+###
+#
+# Setup is roughly this.  Commits marked 1,2,3,4.  Blobs
+# marked 100 + commit.  Notes 200 +.  Make sure missing spaces
+# and EOLs after mark references cause errors.
+#
+# The error message looks like either:
+#   Missing space after ..
+# or
+#   Garbage after ..
+#
+# 1--2--4
+#  \   /
+#   -3-
+#
+test_tick
+
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :1
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 2
+COMMIT
+from :1
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+
+blob
+mark :103
+data <<BLOB
+blob 3
+BLOB
+
+blob
+mark :202
+data <<BLOB
+note 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: initialize for S tests' '
+	git fast-import --export-marks=marks <input
+'
+
+#
+# filemodify, three datarefs
+#
+test_expect_success 'S: filemodify with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 :103x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after mark" err
+'
+
+# inline is misspelled; fast-import thinks it is some unknown dataref
+# and complains "Invalid SHA1"
+test_expect_success 'S: filemodify with garbage after inline must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 inlineX hello.c
+	data <<BLOB
+	inline
+	BLOB
+	EOF
+	cat err &&
+	test_i18ngrep "nvalid SHA1" err
+'
+
+test_expect_success 'S: filemodify with garbage after sha1 must fail' '
+	sha1=$(grep -w :103 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 ${sha1}x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, three ways to say dataref
+#
+test_expect_success 'S: notemodify with garabge after mark dataref must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref markref
+	COMMIT
+	N :103x :2
+	EOF
+	cat err &&
+	test_i18ngrep "space after mark" err
+'
+
+# inline is misspelled; fast-import thinks it is some unknown dataref
+# and complains "Invalid SHA1"
+test_expect_success 'S: notemodify with garbage after inline dataref must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref inline
+	COMMIT
+	N inlineX :2
+	data <<BLOB
+	note blob
+	BLOB
+	EOF
+	cat err &&
+	test_i18ngrep "nvalid SHA1" err
+'
+
+test_expect_success 'S: notemodify with garbage after sha1 dataref must fail' '
+	sha1=$(grep -w :2 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref sha1
+	COMMIT
+	N ${sha1}x :2
+	EOF
+	cat err &&
+	test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, mark in committish
+#
+test_expect_success 'S: notemodify with garbarge after mark committish must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/Snotes
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note committish
+	COMMIT
+	N :202 :2x
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# from
+#
+test_expect_success 'S: from with garbage after mark must fail' '
+	# no &&
+	git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
+	commit refs/heads/S2
+	mark :3
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :1x
+	M 100644 :103 hello.c
+	EOF
+
+	ret=$? &&
+	echo returned $ret &&
+	test $ret -ne 0 && # failed, but it created the commit
+
+	# go create the commit, need it for merge test
+	git fast-import --import-marks=marks --export-marks=marks <<-EOF &&
+	commit refs/heads/S2
+	mark :3
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :1
+	M 100644 :103 hello.c
+	EOF
+
+	# now evaluate the error
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+
+#
+# merge
+#
+test_expect_success 'S: merge with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	mark :4
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :2
+	merge :3x
+	M 100644 :103 hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# tag, from markref
+#
+test_expect_success 'S: tag with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	tag refs/tags/Stag
+	from :2x
+	tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<TAG
+	tag S
+	TAG
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# cat-blob markref
+#
+test_expect_success 'S: cat-blob with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	cat-blob :2x
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# ls markref
+#
+test_expect_success 'S: ls with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	ls :2x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after mark" err
+'
+
+test_expect_success 'S: ls with garbage after sha1 must fail' '
+	sha1=$(grep -w :2 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	ls ${sha1}x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after tree-ish" err
+'
+
 test_done
-- 
1.7.10.rc2.62.gac32b.dirty

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCHv3] fast-import: tighten parsing of mark references
  2012-04-05  1:51   ` [PATCHv3] " Pete Wyckoff
@ 2012-04-05  2:24     ` Jonathan Nieder
  2012-04-05 17:20       ` Junio C Hamano
  2012-04-07 22:59     ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
  1 sibling, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-05  2:24 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano, Johan Herland

Pete Wyckoff wrote:

> This addresses all of Jonathan's comments, in particular:

Nice.  Thanks much.  I only have a few small worries left:

[...]
> +++ b/t/t9300-fast-import.sh
> @@ -2635,4 +2635,280 @@ test_expect_success \
[...]
> +test_expect_success 'S: filemodify with garbage after sha1 must fail' '
> +	sha1=$(grep -w :103 marks | cut -d\  -f2) &&

"grep -w" isn't used elsewhere in the testsuite.  Is it portable?

[...]
> +# inline is misspelled; fast-import thinks it is some unknown dataref
> +# and complains "Invalid SHA1"
> +test_expect_success 'S: notemodify with garbage after inline dataref must fail' '
> +	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> +	commit refs/heads/S
> +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +	data <<COMMIT
> +	commit S note dataref inline
> +	COMMIT
> +	N inlineX :2
> +	data <<BLOB
> +	note blob
> +	BLOB
> +	EOF
> +	cat err &&
> +	test_i18ngrep "nvalid SHA1" err
> +'

If I understood the discussion before correctly, this error message is
suboptimal and something like "invalid dataref" would be a little
clearer, right?

That's orthogonal to what this patch is about so I'm not suggesting
changing it.  But shouldn't the test just check that fast-import fails
without committing to any particular message?

Cheers,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHv3] fast-import: tighten parsing of mark references
  2012-04-05  2:24     ` Jonathan Nieder
@ 2012-04-05 17:20       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2012-04-05 17:20 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Pete Wyckoff, git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Johan Herland

Jonathan Nieder <jrnieder@gmail.com> writes:

> Pete Wyckoff wrote:
>
>> This addresses all of Jonathan's comments, in particular:
>
> Nice.  Thanks much.  I only have a few small worries left:
>
> [...]
>> +++ b/t/t9300-fast-import.sh
>> @@ -2635,4 +2635,280 @@ test_expect_success \
> [...]
>> +test_expect_success 'S: filemodify with garbage after sha1 must fail' '
>> +	sha1=$(grep -w :103 marks | cut -d\  -f2) &&
>
> "grep -w" isn't used elsewhere in the testsuite.  Is it portable?

It is not portable enough.

> If I understood the discussion before correctly, this error message is
> suboptimal and something like "invalid dataref" would be a little
> clearer, right?
>
> That's orthogonal to what this patch is about so I'm not suggesting
> changing it.  But shouldn't the test just check that fast-import fails
> without committing to any particular message?

That would certainly make more sense.

Thanks for being extra careful.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCHv4] fast-import: tighten parsing of datarefs
  2012-04-05  1:51   ` [PATCHv3] " Pete Wyckoff
  2012-04-05  2:24     ` Jonathan Nieder
@ 2012-04-07 22:59     ` Pete Wyckoff
  2012-04-10 21:40       ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-07 22:59 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
	Junio C Hamano, Johan Herland

The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference.  Fast-import does not complain when garbage
appears after a mark reference in some cases.

Factor out parsing of mark references and complain if
errant characters are found.  Also be a little more careful
when parsing "inline" and SHA1s, complaining if extra
characters appear or if the form of the dataref is unrecognized.

Buggy input can cause fast-import to produce the wrong output,
silently, without error.  This makes it difficult to track
down buggy generators of fast-import streams.  An example is
seen in the last line of this commit command:

    commit refs/heads/S2
    committer Name <name@example.com> 1112912893 -0400
    data <<COMMIT
    commit message
    COMMIT
    from :1M 100644 :103 hello.c

It is missing a newline and should be:

    [...]
    from :1
    M 100644 :103 hello.c

What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^.  What the buggy
program was expecting was the contents of blob :103.  While
the resulting commit graph looked correct, the contents in
some commits were wrong.

Signed-off-by: Pete Wyckoff <pw@padd.com>
---

Thanks for the patient comments.  Changes from v3:

    - say "invalid dataref" when neither a sha1 nor
      "inline " for a mark could be correctly parsed, including
      requisite spaces

    - avoid "grep -w": rearrange marks used in the tests
      so they are easily greppable without -w

 fast-import.c          |  110 +++++++++++++------
 t/t9300-fast-import.sh |  287 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 364 insertions(+), 33 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index a85275d..7f1fbed 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2207,6 +2207,59 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
 	return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
 }
 
+/*
+ * Given a pointer into a string, parse a mark reference:
+ *
+ *   idnum ::= ':' bigint;
+ *
+ * Return the first character after the value in *endptr.
+ *
+ * Complain if the following character is not what is expected,
+ * either a space or end of the string.
+ */
+static uintmax_t parse_mark_ref(const char *p, char **endptr)
+{
+	uintmax_t mark;
+
+	assert(*p == ':');
+	++p;
+	mark = strtoumax(p, endptr, 10);
+	if (*endptr == p)
+		die("No value after ':' in mark: %s", command_buf.buf);
+	return mark;
+}
+
+/*
+ * Parse the mark reference, and complain if this is not the end of
+ * the string.
+ */
+static uintmax_t parse_mark_ref_eol(const char *p)
+{
+	char *end;
+	uintmax_t mark;
+
+	mark = parse_mark_ref(p, &end);
+	if (*end != '\0')
+		die("Garbage after mark: %s", command_buf.buf);
+	return mark;
+}
+
+/*
+ * Parse the mark reference, demanding a trailing space.  Return a
+ * pointer to the space.
+ */
+static uintmax_t parse_mark_ref_space(const char **p)
+{
+	uintmax_t mark;
+	char *end;
+
+	mark = parse_mark_ref(*p, &end);
+	if (*end != ' ')
+		die("Missing space after mark: %s", command_buf.buf);
+	*p = end;
+	return mark;
+}
+
 static void file_change_m(struct branch *b)
 {
 	const char *p = command_buf.buf + 2;
@@ -2235,21 +2288,21 @@ static void file_change_m(struct branch *b)
 	}
 
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
+		oe = find_mark(parse_mark_ref_space(&p));
 		hashcpy(sha1, oe->idx.sha1);
-		p = x;
-	} else if (!prefixcmp(p, "inline")) {
+	} else if (!prefixcmp(p, "inline ")) {
 		inline_data = 1;
-		p += 6;
+		p += strlen("inline");  /* advance to space */
 	} else {
 		if (get_sha1_hex(p, sha1))
-			die("Invalid SHA1: %s", command_buf.buf);
+			die("Invalid dataref: %s", command_buf.buf);
 		oe = find_object(sha1);
 		p += 40;
+		if (*p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
 	}
-	if (*p++ != ' ')
-		die("Missing space after SHA1: %s", command_buf.buf);
+	assert(*p == ' ');
+	++p;  /* skip space */
 
 	strbuf_reset(&uq);
 	if (!unquote_c_style(&uq, p, &endp)) {
@@ -2407,21 +2460,21 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
 	/* Now parse the notemodify command. */
 	/* <dataref> or 'inline' */
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
+		oe = find_mark(parse_mark_ref_space(&p));
 		hashcpy(sha1, oe->idx.sha1);
-		p = x;
-	} else if (!prefixcmp(p, "inline")) {
+	} else if (!prefixcmp(p, "inline ")) {
 		inline_data = 1;
-		p += 6;
+		p += strlen("inline");  /* advance to space */
 	} else {
 		if (get_sha1_hex(p, sha1))
-			die("Invalid SHA1: %s", command_buf.buf);
+			die("Invalid dataref: %s", command_buf.buf);
 		oe = find_object(sha1);
 		p += 40;
+		if (*p != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
 	}
-	if (*p++ != ' ')
-		die("Missing space after SHA1: %s", command_buf.buf);
+	assert(*p == ' ');
+	++p;  /* skip space */
 
 	/* <committish> */
 	s = lookup_branch(p);
@@ -2430,7 +2483,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
 			die("Can't add a note on empty branch.");
 		hashcpy(commit_sha1, s->sha1);
 	} else if (*p == ':') {
-		uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+		uintmax_t commit_mark = parse_mark_ref_eol(p);
 		struct object_entry *commit_oe = find_mark(commit_mark);
 		if (commit_oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", commit_mark);
@@ -2537,7 +2590,7 @@ static int parse_from(struct branch *b)
 		hashcpy(b->branch_tree.versions[0].sha1, t);
 		hashcpy(b->branch_tree.versions[1].sha1, t);
 	} else if (*from == ':') {
-		uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+		uintmax_t idnum = parse_mark_ref_eol(from);
 		struct object_entry *oe = find_mark(idnum);
 		if (oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2572,7 +2625,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 		if (s)
 			hashcpy(n->sha1, s->sha1);
 		else if (*from == ':') {
-			uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+			uintmax_t idnum = parse_mark_ref_eol(from);
 			struct object_entry *oe = find_mark(idnum);
 			if (oe->type != OBJ_COMMIT)
 				die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2735,7 +2788,7 @@ static void parse_new_tag(void)
 		type = OBJ_COMMIT;
 	} else if (*from == ':') {
 		struct object_entry *oe;
-		from_mark = strtoumax(from + 1, NULL, 10);
+		from_mark = parse_mark_ref_eol(from);
 		oe = find_mark(from_mark);
 		type = oe->type;
 		hashcpy(sha1, oe->idx.sha1);
@@ -2867,18 +2920,13 @@ static void parse_cat_blob(void)
 	/* cat-blob SP <object> LF */
 	p = command_buf.buf + strlen("cat-blob ");
 	if (*p == ':') {
-		char *x;
-		oe = find_mark(strtoumax(p + 1, &x, 10));
-		if (x == p + 1)
-			die("Invalid mark: %s", command_buf.buf);
+		oe = find_mark(parse_mark_ref_eol(p));
 		if (!oe)
 			die("Unknown mark: %s", command_buf.buf);
-		if (*x)
-			die("Garbage after mark: %s", command_buf.buf);
 		hashcpy(sha1, oe->idx.sha1);
 	} else {
 		if (get_sha1_hex(p, sha1))
-			die("Invalid SHA1: %s", command_buf.buf);
+			die("Invalid dataref: %s", command_buf.buf);
 		if (p[40])
 			die("Garbage after SHA1: %s", command_buf.buf);
 		oe = find_object(sha1);
@@ -2944,17 +2992,13 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	struct object_entry *e;
 
 	if (**p == ':') {	/* <mark> */
-		char *endptr;
-		e = find_mark(strtoumax(*p + 1, &endptr, 10));
-		if (endptr == *p + 1)
-			die("Invalid mark: %s", command_buf.buf);
+		e = find_mark(parse_mark_ref_space(p));
 		if (!e)
 			die("Unknown mark: %s", command_buf.buf);
-		*p = endptr;
 		hashcpy(sha1, e->idx.sha1);
 	} else {	/* <sha1> */
 		if (get_sha1_hex(*p, sha1))
-			die("Invalid SHA1: %s", command_buf.buf);
+			die("Invalid dataref: %s", command_buf.buf);
 		e = find_object(sha1);
 		*p += 40;
 	}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..0125413 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2634,5 +2634,292 @@ test_expect_success \
 	'R: blob appears only once' \
 	'n=$(grep $a verify | wc -l) &&
 	 test 1 = $n'
+ 
+###
+### series S
+###
+#
+# Make sure missing spaces and EOLs after mark references
+# cause errors.
+#
+# Setup:
+#
+#   1--2--4
+#    \   /
+#     -3-
+#
+#   commit marks:  301, 302, 303, 304
+#   blob marks:              403, 404, resp.
+#   note mark:          202
+#
+# The error message when a space is missing not at the
+# end of the line is:
+#
+#   Missing space after ..
+#
+# or when extra characters come after the mark at the end
+# of the line:
+#
+#   Garbage after ..
+#
+# or when the dataref is neither "inline " or a known SHA1,
+#
+#   Invalid dataref ..
+#
+test_tick
+
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :301
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :302
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 2
+COMMIT
+from :301
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+
+blob
+mark :403
+data <<BLOB
+blob 3
+BLOB
+
+blob
+mark :202
+data <<BLOB
+note 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: initialize for S tests' '
+	git fast-import --export-marks=marks <input
+'
+
+#
+# filemodify, three datarefs
+#
+test_expect_success 'S: filemodify with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 :403x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after mark" err
+'
+
+# inline is misspelled; fast-import thinks it is some unknown dataref
+test_expect_success 'S: filemodify with garbage after inline must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 inlineX hello.c
+	data <<BLOB
+	inline
+	BLOB
+	EOF
+	cat err &&
+	test_i18ngrep "nvalid dataref" err
+'
+
+test_expect_success 'S: filemodify with garbage after sha1 must fail' '
+	sha1=$(grep :403 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit N
+	COMMIT
+	M 100644 ${sha1}x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, three ways to say dataref
+#
+test_expect_success 'S: notemodify with garabge after mark dataref must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref markref
+	COMMIT
+	N :202x :302
+	EOF
+	cat err &&
+	test_i18ngrep "space after mark" err
+'
+
+test_expect_success 'S: notemodify with garbage after inline dataref must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref inline
+	COMMIT
+	N inlineX :302
+	data <<BLOB
+	note blob
+	BLOB
+	EOF
+	cat err &&
+	test_i18ngrep "nvalid dataref" err
+'
+
+test_expect_success 'S: notemodify with garbage after sha1 dataref must fail' '
+	sha1=$(grep :202 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note dataref sha1
+	COMMIT
+	N ${sha1}x :302
+	EOF
+	cat err &&
+	test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, mark in committish
+#
+test_expect_success 'S: notemodify with garbarge after mark committish must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/Snotes
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit S note committish
+	COMMIT
+	N :202 :302x
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# from
+#
+test_expect_success 'S: from with garbage after mark must fail' '
+	# no &&
+	git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
+	commit refs/heads/S2
+	mark :303
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :301x
+	M 100644 :403 hello.c
+	EOF
+
+	ret=$? &&
+	echo returned $ret &&
+	test $ret -ne 0 && # failed, but it created the commit
+
+	# go create the commit, need it for merge test
+	git fast-import --import-marks=marks --export-marks=marks <<-EOF &&
+	commit refs/heads/S2
+	mark :303
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	commit 3
+	COMMIT
+	from :301
+	M 100644 :403 hello.c
+	EOF
+
+	# now evaluate the error
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+
+#
+# merge
+#
+test_expect_success 'S: merge with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	commit refs/heads/S
+	mark :304
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	merge 4
+	COMMIT
+	from :302
+	merge :303x
+	M 100644 :403 hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# tag, from markref
+#
+test_expect_success 'S: tag with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	tag refs/tags/Stag
+	from :302x
+	tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<TAG
+	tag S
+	TAG
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# cat-blob markref
+#
+test_expect_success 'S: cat-blob with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	cat-blob :403x
+	EOF
+	cat err &&
+	test_i18ngrep "after mark" err
+'
+
+#
+# ls markref
+#
+test_expect_success 'S: ls with garbage after mark must fail' '
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	ls :302x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after mark" err
+'
+
+test_expect_success 'S: ls with garbage after sha1 must fail' '
+	sha1=$(grep :302 marks | cut -d\  -f2) &&
+	test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+	ls ${sha1}x hello.c
+	EOF
+	cat err &&
+	test_i18ngrep "space after tree-ish" err
+'
 
 test_done
-- 
1.7.10.rc2.65.gbd86d

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCHv4] fast-import: tighten parsing of datarefs
  2012-04-07 22:59     ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
@ 2012-04-10 21:40       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2012-04-10 21:40 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: git, Jonathan Nieder, Dmitry Ivankov, David Barr,
	Sverre Rabbelier, Junio C Hamano, Johan Herland

Pete Wyckoff <pw@padd.com> writes:

> The syntax for the use of mark references in fast-import
> demands either a SP (space) or LF (end-of-line) after
> a mark reference.  Fast-import does not complain when garbage
> appears after a mark reference in some cases.
> 
> Factor out parsing of mark references and complain if
> errant characters are found.  Also be a little more careful
> when parsing "inline" and SHA1s, complaining if extra
> characters appear or if the form of the dataref is unrecognized.


> +static uintmax_t parse_mark_ref(const char *p, char **endptr)
> +{
> +	uintmax_t mark;
> +
> +	assert(*p == ':');
> +	++p;
> +	mark = strtoumax(p, endptr, 10);
> +	if (*endptr == p)
> +		die("No value after ':' in mark: %s", command_buf.buf);
> +	return mark;
> +}

> +static uintmax_t parse_mark_ref_eol(const char *p)
> +{
> +...
> +}
> +
> +static uintmax_t parse_mark_ref_space(const char **p)
> +{
> +...
> +}
> +

The first helper looks sensible, but the two seemingly similar
parse_mark_ref_WHATTOEXPECT() that have different interfaces are somewhat
tasteless.

I wonder if the calling sites in file_change_m(), note_change_n(),
parse_merge(), parse_cat_blob() and parse_treeish_dataref() can be made to
share even more code by slight restructuring, though.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-04-10 21:40 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-01 22:54 [PATCH] fast-import: catch garbage after marks in from/merge Pete Wyckoff
2012-04-01 23:12 ` Jonathan Nieder
2012-04-02  0:13   ` Pete Wyckoff
2012-04-02  6:56     ` Dmitry Ivankov
2012-04-02 16:16       ` Junio C Hamano
2012-04-02 15:43     ` Jonathan Nieder
2012-04-02 16:15 ` Junio C Hamano
2012-04-03  1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
2012-04-03  1:51   ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
2012-04-03 14:00     ` Jonathan Nieder
2012-04-04  0:46       ` Pete Wyckoff
2012-04-04  5:43         ` Jonathan Nieder
2012-04-03  1:51   ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
2012-04-03 14:20     ` Jonathan Nieder
2012-04-04  1:20       ` Pete Wyckoff
2012-04-04  5:32         ` Jonathan Nieder
2012-04-03  2:00   ` [PATCHv2 0/2] " Sverre Rabbelier
2012-04-05  1:51   ` [PATCHv3] " Pete Wyckoff
2012-04-05  2:24     ` Jonathan Nieder
2012-04-05 17:20       ` Junio C Hamano
2012-04-07 22:59     ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
2012-04-10 21:40       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.