* [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
2012-04-03 1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
@ 2012-04-03 1:51 ` Pete Wyckoff
2012-04-03 14:00 ` Jonathan Nieder
2012-04-03 1:51 ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-03 1:51 UTC (permalink / raw)
To: git
Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano
Add 15 tests to see what happens when extra characters
appear after a mark reference, in all places that take marks.
Ten of these fail.
Signed-off-by: Pete Wyckoff <pw@padd.com>
---
t/t9300-fast-import.sh | 267 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 267 insertions(+)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..621f02a 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2635,4 +2635,271 @@ test_expect_success \
'n=$(grep $a verify | wc -l) &&
test 1 = $n'
+###
+### series S
+###
+#
+# Set up is roughly this. Commits marked 1,2,3,4. Blobs
+# marked 100 + commit. Notes 200 +. Make sure missing spaces
+# and EOLs after mark references cause errors.
+#
+# 1--2--4
+# \ /
+# -3-
+#
+test_tick
+
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :1
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 2
+COMMIT
+from :1
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+
+blob
+mark :103
+data <<BLOB
+blob 3
+BLOB
+
+blob
+mark :202
+data <<BLOB
+note 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: add commits 1 and 2, and blob 103' '
+ git fast-import --export-marks=marks <input
+'
+
+#
+# filemodify, three datarefs
+#
+test_expect_failure 'S: filemodify markref no space' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 :103x hello.c
+ EOF
+ cat err &&
+ grep -q "Missing space after mark" err
+'
+
+test_expect_failure 'S: filemodify inline no space' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 inlineX hello.c
+ data <<BLOB
+ inline
+ BLOB
+ EOF
+ cat err &&
+ grep -q "Missing space after .inline." err
+'
+
+test_expect_success 'S: filemodify sha1 no space' '
+ sha1=$(grep -w :103 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 ${sha1}x hello.c
+ EOF
+ cat err &&
+ grep -q "Missing space after SHA1" err
+'
+
+#
+# notemodify, three ways to say dataref
+#
+test_expect_failure 'S: notemodify dataref markref no space' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref markref
+ COMMIT
+ N :103x :2
+ EOF
+ cat err &&
+ grep -q "Missing space after mark" err
+'
+
+test_expect_failure 'S: notemodify dataref inline no space' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref inline
+ COMMIT
+ N inlineX :2
+ data <<BLOB
+ note blob
+ BLOB
+ EOF
+ cat err &&
+ grep -q "Missing space after .inline." err
+'
+
+test_expect_success 'S: notemodify dataref sha1 no space' '
+ sha1=$(grep -w :2 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref sha1
+ COMMIT
+ N ${sha1}x :2
+ EOF
+ cat err &&
+ grep -q "Missing space after SHA1" err
+'
+
+#
+# notemodify, mark in committish
+#
+test_expect_failure 'S: notemodify committish markref junk at eol' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/Snotes
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note committish
+ COMMIT
+ N :202 :2x
+ EOF
+ cat err &&
+ grep -q "Garbage after mark" err
+'
+
+#
+# from
+#
+test_expect_failure 'S: from markref junk at eol' '
+ # no &&
+ git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
+ commit refs/heads/S2
+ mark :3
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :1x
+ M 100644 :103 hello.c
+ EOF
+
+ ret=$? &&
+ echo returned $ret &&
+ test $ret -ne 0 && # failed, but it created the commit
+
+ # go create the commit, need it for merge test
+ git fast-import --import-marks=marks --export-marks=marks <<-EOF &&
+ commit refs/heads/S2
+ mark :3
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :1
+ M 100644 :103 hello.c
+ EOF
+
+ # now evaluate the error
+ cat err &&
+ grep -q "Garbage after mark" err
+'
+
+
+#
+# merge
+#
+test_expect_failure 'S: merge markref junk at eol' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ mark :4
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :2
+ merge :3x
+ M 100644 :103 hello.c
+ EOF
+ cat err &&
+ grep -q "Garbage after mark" err
+'
+
+#
+# tag, from markref
+#
+test_expect_failure 'S: tag markref junk at eol' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ tag refs/tags/Stag
+ from :2x
+ tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<TAG
+ tag S
+ TAG
+ EOF
+ cat err &&
+ grep -q "Garbage after mark" err
+'
+
+#
+# cat-blob markref
+#
+test_expect_success 'S: cat-blob markref junk at eol' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ cat-blob :2x
+ EOF
+ cat err &&
+ grep -q "Garbage after mark" err
+'
+
+#
+# ls markref
+#
+test_expect_failure 'S: ls markref space' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ ls :2x hello.c
+ EOF
+ cat err &&
+ grep -q "Missing space after mark" err
+'
+
+test_expect_failure 'S: ls sha1 space' '
+ sha1=$(grep -w :2 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ ls ${sha1}x hello.c
+ EOF
+ cat err &&
+ grep -q "Missing space after SHA1" err
+'
+
test_done
--
1.7.10.rc2.2.g38670
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
2012-04-03 1:51 ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
@ 2012-04-03 14:00 ` Jonathan Nieder
2012-04-04 0:46 ` Pete Wyckoff
0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-03 14:00 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier, Junio C Hamano
Pete Wyckoff wrote:
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -2635,4 +2635,271 @@ test_expect_success \
> 'n=$(grep $a verify | wc -l) &&
> test 1 = $n'
>
> +###
> +### series S
> +###
> +#
> +# Set up is roughly this. Commits marked 1,2,3,4. Blobs
> +# marked 100 + commit. Notes 200 +. Make sure missing spaces
> +# and EOLs after mark references cause errors.
Nit: "Set up" should be "Setup" when it is a noun.
[...]
> +test_expect_success 'S: add commits 1 and 2, and blob 103' '
> + git fast-import --export-marks=marks <input
> +'
Ok, this one sets up for later ones...
> +
> +#
> +# filemodify, three datarefs
> +#
> +test_expect_failure 'S: filemodify markref no space' '
What is this testing for? The ideal is that each test_expect_foo line
contains a proposition and the test checks whether that proposition is
true or false. For example:
test_expect_failure 'S: filemodify with garbage after mark errors out' '
Likewise in later tests.
> + test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> + commit refs/heads/S
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + commit N
> + COMMIT
> + M 100644 :103x hello.c
> + EOF
> + cat err &&
> + grep -q "Missing space after mark" err
Is this using "grep -q" to avoid repeating the same line in the output
twice? It seems better to use plain grep or test_i18ngrep.
I'm also worried that if someone wants to change these messages
(perhaps to make the 'm' in "Missing" lowercase or something), they
will have to change all of these tests. If we want to be absolutely
sure that git detects the right error instead of something else, I
would suggest
test_i18ngrep "space after mark" message
I'm also not convinced the error message is worth checking at all ---
as long as fast-import errors out, won't the frontend author be able
to look in the logs to find out the problematic line anyway?
> +'
> +
> +test_expect_failure 'S: filemodify inline no space' '
> + test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> + commit refs/heads/S
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + commit N
> + COMMIT
> + M 100644 inlineX hello.c
> + data <<BLOB
> + inline
> + BLOB
> + EOF
> + cat err &&
> + grep -q "Missing space after .inline." err
Does this fail because the error message is "Missing space after SHA1"
instead? I'm not sure that's actually a bug, unless we want to
correctly nitpick that the keyword "inline" that is a stand-in for an
object name is not itself one.
I don't think the tests for exact error messages make too much sense
without the next patch, so I would suggest leaving them out if this
patch is supposed to be applicable on its own.
Thanks for some thorough tests.
Hope that helps,
Jonathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
2012-04-03 14:00 ` Jonathan Nieder
@ 2012-04-04 0:46 ` Pete Wyckoff
2012-04-04 5:43 ` Jonathan Nieder
0 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-04 0:46 UTC (permalink / raw)
To: Jonathan Nieder
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier, Junio C Hamano
jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:00 -0500:
> Pete Wyckoff wrote:
> > +
> > +#
> > +# filemodify, three datarefs
> > +#
> > +test_expect_failure 'S: filemodify markref no space' '
>
> What is this testing for? The ideal is that each test_expect_foo line
> contains a proposition and the test checks whether that proposition is
> true or false. For example:
>
> test_expect_failure 'S: filemodify with garbage after mark errors out' '
>
> Likewise in later tests.
I've fixed these locally, thanks for reading them!
> > + test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> > + commit refs/heads/S
> > + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> > + data <<COMMIT
> > + commit N
> > + COMMIT
> > + M 100644 :103x hello.c
> > + EOF
> > + cat err &&
> > + grep -q "Missing space after mark" err
>
> Is this using "grep -q" to avoid repeating the same line in the output
> twice? It seems better to use plain grep or test_i18ngrep.
>
> I'm also worried that if someone wants to change these messages
> (perhaps to make the 'm' in "Missing" lowercase or something), they
> will have to change all of these tests. If we want to be absolutely
> sure that git detects the right error instead of something else, I
> would suggest
>
> test_i18ngrep "space after mark" message
>
> I'm also not convinced the error message is worth checking at all ---
> as long as fast-import errors out, won't the frontend author be able
> to look in the logs to find out the problematic line anyway?
I'm a bit confused. I read through 5e9637c (i18n: add
infrastructure for translating Git with gettext, 2011-11-18), and
GETTEXT_POISON seems to be used to find untranslated messages.
What I want to test here is that the functionality works: do the
right untranslated messages get printed.
Changing the "Missing" to "missing" would require fixing the
tests, and that seems okay.
I'm happy to drop the "-q" and drop the "Missing", but wonder if
you're looking for something deeper.
> > +test_expect_failure 'S: filemodify inline no space' '
> > + test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> > + commit refs/heads/S
> > + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> > + data <<COMMIT
> > + commit N
> > + COMMIT
> > + M 100644 inlineX hello.c
> > + data <<BLOB
> > + inline
> > + BLOB
> > + EOF
> > + cat err &&
> > + grep -q "Missing space after .inline." err
>
> Does this fail because the error message is "Missing space after SHA1"
> instead? I'm not sure that's actually a bug, unless we want to
> correctly nitpick that the keyword "inline" that is a stand-in for an
> object name is not itself one.
The form of the message matters. Datarefs can be inline, SHA1s,
or marks. It is confusing to see an error message about a SHA1
when the input stream has no SHA1s. The existing code always
says "SHA1", which is wrong if you gave it a mark or an inline.
> I don't think the tests for exact error messages make too much sense
> without the next patch, so I would suggest leaving them out if this
> patch is supposed to be applicable on its own.
>
> Thanks for some thorough tests.
You think I should squash it all together, then? Or factor the
tests into two chunks:
- tests for behavior that silently accepts broken input; and,
- tests for behavior where the bogus input is detcetd, but
incorrect error messages are given?
-- Pete
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv2 1/2] fast-import: test behavior of garbage after mark references
2012-04-04 0:46 ` Pete Wyckoff
@ 2012-04-04 5:43 ` Jonathan Nieder
0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-04 5:43 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier, Junio C Hamano
On Tue, Apr 03, 2012 at 08:46:10PM -0400, Pete Wyckoff wrote:
> jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:00 -0500:
>> Is this using "grep -q" to avoid repeating the same line in the output
>> twice? It seems better to use plain grep or test_i18ngrep.
[...]
> What I want to test here is that the functionality works: do the
> right untranslated messages get printed.
>
> Changing the "Missing" to "missing" would require fixing the
> tests, and that seems okay.
Let me reiterate this a little then.
Suppose I mark the messages in fast-import.c with _() so they get
translated. Then your tests will fail, so I have to tweak them. Fine
--- the test tweaks take some time, but they're doable. Nothing lost,
right?
No, something major would be lost.
Tests normally save later coders time, by giving immediate feedback
that they would normally only get by letting a feature be used over a
long time by real users. They also dissuade people from changing
git's behavior without thinking carefully about the consequences ---
each broken test represents a class of script or user expectation that
is potentially being broken.
Similarly, a test that checks that git produces such-and-such exact
output is dissuading me from making certain behavior changes by adding
to the work needed to make them (I have to adjust tests, too). So now
I am less likely to
(1) reword the message to make it clearer in some way in response to
user feedback
(2) mark it for translation so the operator can see a message in her
native language
How is making that hard in any way a good thing?
Relaxing the pattern addresses (1). Using test_i18ngrep instead of
grep addresses (2).
Jonathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCHv2 2/2] fast-import: tighten parsing of mark references
2012-04-03 1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
2012-04-03 1:51 ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
@ 2012-04-03 1:51 ` Pete Wyckoff
2012-04-03 14:20 ` Jonathan Nieder
2012-04-03 2:00 ` [PATCHv2 0/2] " Sverre Rabbelier
2012-04-05 1:51 ` [PATCHv3] " Pete Wyckoff
3 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-03 1:51 UTC (permalink / raw)
To: git
Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano
The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference. Fast-import does not complain when garbage
appears after a mark reference in some cases.
Factor out parsing of mark references and complain if
errant characters are found.
Buggy input can cause fast-import to produce the wrong output,
silently, without error. This makes it difficult to track
down buggy generators of fast-import streams. An example is
seen in the last line of this commit command:
commit refs/heads/S2
committer Name <name@example.com> 1112912893 -0400
data <<COMMIT
commit message
COMMIT
from :1M 100644 :103 hello.c
It is missing a newline and should be:
[...]
from :1
M 100644 :103 hello.c
What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^. What the buggy
program was expecting was the contents of blob :103. While
the resulting commit graph looked correct, the contents in
some commits were wrong.
Signed-off-by: Pete Wyckoff <pw@padd.com>
---
fast-import.c | 97 +++++++++++++++++++++++++++++++++++++-----------
t/t9300-fast-import.sh | 20 +++++-----
2 files changed, 85 insertions(+), 32 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index a85275d..bd1b9d1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2207,6 +2207,57 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
}
+/*
+ * Given a pointer into a string, parse a mark reference:
+ *
+ * idnum ::= ':' bigint;
+ *
+ * Return the first character after the value in *endptr.
+ *
+ * Complain if the following character is not what is expected,
+ * either a space or end of the string.
+ */
+static uintmax_t parse_mark_ref(const char *p, char **endptr)
+{
+ uintmax_t mark;
+
+ assert(*p == ':');
+ ++p;
+ mark = strtoumax(p, endptr, 10);
+ if (*endptr == p)
+ die("No value after ':' in mark: %s", command_buf.buf);
+ return mark;
+}
+
+/*
+ * Parse the mark reference, and complain if this is not the end of
+ * the string.
+ */
+static uintmax_t parse_mark_ref_eol(const char *p)
+{
+ char *end;
+ uintmax_t mark;
+
+ mark = parse_mark_ref(p, &end);
+ if (*end != '\0')
+ die("Garbage after mark: %s", command_buf.buf);
+ return mark;
+}
+
+/*
+ * Parse the mark reference, demanding a trailing space. Return a
+ * pointer to the space.
+ */
+static uintmax_t parse_mark_ref_space(const char *p, char **endptr)
+{
+ uintmax_t mark;
+
+ mark = parse_mark_ref(p, endptr);
+ if (**endptr != ' ')
+ die("Missing space after mark: %s", command_buf.buf);
+ return mark;
+}
+
static void file_change_m(struct branch *b)
{
const char *p = command_buf.buf + 2;
@@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
if (*p == ':') {
char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
+ oe = find_mark(parse_mark_ref_space(p, &x));
hashcpy(sha1, oe->idx.sha1);
p = x;
} else if (!prefixcmp(p, "inline")) {
inline_data = 1;
p += 6;
+ if (*p != ' ')
+ die("Missing space after 'inline': %s",
+ command_buf.buf);
} else {
if (get_sha1_hex(p, sha1))
die("Invalid SHA1: %s", command_buf.buf);
oe = find_object(sha1);
p += 40;
+ if (*p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
}
- if (*p++ != ' ')
- die("Missing space after SHA1: %s", command_buf.buf);
+ ++p; /* skip space */
strbuf_reset(&uq);
if (!unquote_c_style(&uq, p, &endp)) {
@@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
/* <dataref> or 'inline' */
if (*p == ':') {
char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
+ oe = find_mark(parse_mark_ref_space(p, &x));
hashcpy(sha1, oe->idx.sha1);
p = x;
} else if (!prefixcmp(p, "inline")) {
inline_data = 1;
p += 6;
+ if (*p != ' ')
+ die("Missing space after 'inline': %s",
+ command_buf.buf);
} else {
if (get_sha1_hex(p, sha1))
die("Invalid SHA1: %s", command_buf.buf);
oe = find_object(sha1);
p += 40;
+ if (*p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
}
- if (*p++ != ' ')
- die("Missing space after SHA1: %s", command_buf.buf);
+ ++p; /* skip space */
/* <committish> */
s = lookup_branch(p);
@@ -2430,7 +2489,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
die("Can't add a note on empty branch.");
hashcpy(commit_sha1, s->sha1);
} else if (*p == ':') {
- uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+ uintmax_t commit_mark = parse_mark_ref_eol(p);
struct object_entry *commit_oe = find_mark(commit_mark);
if (commit_oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", commit_mark);
@@ -2537,7 +2596,7 @@ static int parse_from(struct branch *b)
hashcpy(b->branch_tree.versions[0].sha1, t);
hashcpy(b->branch_tree.versions[1].sha1, t);
} else if (*from == ':') {
- uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+ uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2572,7 +2631,7 @@ static struct hash_list *parse_merge(unsigned int *count)
if (s)
hashcpy(n->sha1, s->sha1);
else if (*from == ':') {
- uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+ uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2735,7 +2794,7 @@ static void parse_new_tag(void)
type = OBJ_COMMIT;
} else if (*from == ':') {
struct object_entry *oe;
- from_mark = strtoumax(from + 1, NULL, 10);
+ from_mark = parse_mark_ref_eol(from);
oe = find_mark(from_mark);
type = oe->type;
hashcpy(sha1, oe->idx.sha1);
@@ -2867,14 +2926,9 @@ static void parse_cat_blob(void)
/* cat-blob SP <object> LF */
p = command_buf.buf + strlen("cat-blob ");
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
- if (x == p + 1)
- die("Invalid mark: %s", command_buf.buf);
+ oe = find_mark(parse_mark_ref_eol(p));
if (!oe)
die("Unknown mark: %s", command_buf.buf);
- if (*x)
- die("Garbage after mark: %s", command_buf.buf);
hashcpy(sha1, oe->idx.sha1);
} else {
if (get_sha1_hex(p, sha1))
@@ -2945,9 +2999,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
if (**p == ':') { /* <mark> */
char *endptr;
- e = find_mark(strtoumax(*p + 1, &endptr, 10));
- if (endptr == *p + 1)
- die("Invalid mark: %s", command_buf.buf);
+ e = find_mark(parse_mark_ref_space(*p, &endptr));
if (!e)
die("Unknown mark: %s", command_buf.buf);
*p = endptr;
@@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
} else { /* <sha1> */
if (get_sha1_hex(*p, sha1))
die("Invalid SHA1: %s", command_buf.buf);
- e = find_object(sha1);
*p += 40;
+ if (**p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
+ e = find_object(sha1);
}
+ *p += 1; /* skip space */
while (!e || e->type != OBJ_TREE)
e = dereference(e, sha1);
@@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
root = new_tree_entry();
hashcpy(root->versions[1].sha1, e->idx.sha1);
load_tree(root);
- if (*p++ != ' ')
- die("Missing space after tree-ish: %s", command_buf.buf);
}
if (*p == '"') {
static struct strbuf uq = STRBUF_INIT;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 621f02a..875ac7a 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2693,7 +2693,7 @@ test_expect_success 'S: add commits 1 and 2, and blob 103' '
#
# filemodify, three datarefs
#
-test_expect_failure 'S: filemodify markref no space' '
+test_expect_success 'S: filemodify markref no space' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
commit refs/heads/S
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2706,7 +2706,7 @@ test_expect_failure 'S: filemodify markref no space' '
grep -q "Missing space after mark" err
'
-test_expect_failure 'S: filemodify inline no space' '
+test_expect_success 'S: filemodify inline no space' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
commit refs/heads/S
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2739,7 +2739,7 @@ test_expect_success 'S: filemodify sha1 no space' '
#
# notemodify, three ways to say dataref
#
-test_expect_failure 'S: notemodify dataref markref no space' '
+test_expect_success 'S: notemodify dataref markref no space' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
commit refs/heads/S
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2752,7 +2752,7 @@ test_expect_failure 'S: notemodify dataref markref no space' '
grep -q "Missing space after mark" err
'
-test_expect_failure 'S: notemodify dataref inline no space' '
+test_expect_success 'S: notemodify dataref inline no space' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
commit refs/heads/S
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2785,7 +2785,7 @@ test_expect_success 'S: notemodify dataref sha1 no space' '
#
# notemodify, mark in committish
#
-test_expect_failure 'S: notemodify committish markref junk at eol' '
+test_expect_success 'S: notemodify committish markref junk at eol' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
commit refs/heads/Snotes
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -2801,7 +2801,7 @@ test_expect_failure 'S: notemodify committish markref junk at eol' '
#
# from
#
-test_expect_failure 'S: from markref junk at eol' '
+test_expect_success 'S: from markref junk at eol' '
# no &&
git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
commit refs/heads/S2
@@ -2839,7 +2839,7 @@ test_expect_failure 'S: from markref junk at eol' '
#
# merge
#
-test_expect_failure 'S: merge markref junk at eol' '
+test_expect_success 'S: merge markref junk at eol' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
commit refs/heads/S
mark :4
@@ -2858,7 +2858,7 @@ test_expect_failure 'S: merge markref junk at eol' '
#
# tag, from markref
#
-test_expect_failure 'S: tag markref junk at eol' '
+test_expect_success 'S: tag markref junk at eol' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
tag refs/tags/Stag
from :2x
@@ -2885,7 +2885,7 @@ test_expect_success 'S: cat-blob markref junk at eol' '
#
# ls markref
#
-test_expect_failure 'S: ls markref space' '
+test_expect_success 'S: ls markref space' '
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
ls :2x hello.c
EOF
@@ -2893,7 +2893,7 @@ test_expect_failure 'S: ls markref space' '
grep -q "Missing space after mark" err
'
-test_expect_failure 'S: ls sha1 space' '
+test_expect_success 'S: ls sha1 space' '
sha1=$(grep -w :2 marks | cut -d\ -f2) &&
test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
ls ${sha1}x hello.c
--
1.7.10.rc2.2.g38670
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references
2012-04-03 1:51 ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
@ 2012-04-03 14:20 ` Jonathan Nieder
2012-04-04 1:20 ` Pete Wyckoff
0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-03 14:20 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano, Johan Herland
(cc-ing Johan for noteimport code)
Pete Wyckoff wrote:
> Fast-import does not complain when garbage
> appears after a mark reference in some cases.
Thanks for fixing it.
[...]
> +++ b/fast-import.c
[...]
> @@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
>
> if (*p == ':') {
> char *x;
> - oe = find_mark(strtoumax(p + 1, &x, 10));
> + oe = find_mark(parse_mark_ref_space(p, &x));
> hashcpy(sha1, oe->idx.sha1);
> p = x;
Simpler:
if (*p == ':') {
oe = find_mark(parse_mark_ref_space(p, &p));
hashcpy(sha1, oe->idx.sha1);
} else if ...
> } else if (!prefixcmp(p, "inline")) {
> inline_data = 1;
> p += 6;
> + if (*p != ' ')
> + die("Missing space after 'inline': %s",
> + command_buf.buf);
> } else {
> if (get_sha1_hex(p, sha1))
> die("Invalid SHA1: %s", command_buf.buf);
If I write
M 100644 inliness some/path/to/file
was my mistake actually leaving out a space after 'inline' or
was it using an invalid <dataref>?
I think the latter, so I would suggest
} else if (!prefixcmp(p, "inline ")) {
inline_data = 1;
p += strlen("inline"); /* advance to space */
} else {
if (get_sha1_hex(p, sha1))
...
[...]
> }
> - if (*p++ != ' ')
> - die("Missing space after SHA1: %s", command_buf.buf);
> + ++p; /* skip space */
I guess I'd suggest
assert(*p == ' ');
p++;
as defense against coders introducing additional cases that
are not as careful.
> @@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
> /* <dataref> or 'inline' */
> if (*p == ':') {
> char *x;
> - oe = find_mark(strtoumax(p + 1, &x, 10));
> + oe = find_mark(parse_mark_ref_space(p, &x));
> hashcpy(sha1, oe->idx.sha1);
> p = x;
Likewise (btw, why doesn't this share code with the filemodify case?):
if (*p == ':') {
oe = find_mark(parse_mark_with_trailing_space(p, &p));
hashcpy(sha1, oe->idx.sha1);
} else if ...
and so on.
[...]
> @@ -2430,7 +2489,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
> die("Can't add a note on empty branch.");
> hashcpy(commit_sha1, s->sha1);
> } else if (*p == ':') {
> - uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
> + uintmax_t commit_mark = parse_mark_ref_eol(p);
> struct object_entry *commit_oe = find_mark(commit_mark);
> if (commit_oe->type != OBJ_COMMIT)
> die("Mark :%" PRIuMAX " not a commit", commit_mark);
> @@ -2537,7 +2596,7 @@ static int parse_from(struct branch *b)
> hashcpy(b->branch_tree.versions[0].sha1, t);
> hashcpy(b->branch_tree.versions[1].sha1, t);
> } else if (*from == ':') {
> - uintmax_t idnum = strtoumax(from + 1, NULL, 10);
> + uintmax_t idnum = parse_mark_ref_eol(from);
The title feature. Nice.
[...]
> @@ -2945,9 +2999,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
>
> if (**p == ':') { /* <mark> */
> char *endptr;
> - e = find_mark(strtoumax(*p + 1, &endptr, 10));
> - if (endptr == *p + 1)
> - die("Invalid mark: %s", command_buf.buf);
> + e = find_mark(parse_mark_ref_space(*p, &endptr));
> if (!e)
> die("Unknown mark: %s", command_buf.buf);
> *p = endptr;
Simpler:
if (**p == ':') {
e = find_mark(parse_mark_...(*p, p));
if (!e)
die(...);
} else {
> @@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
> } else { /* <sha1> */
> if (get_sha1_hex(*p, sha1))
> die("Invalid SHA1: %s", command_buf.buf);
> - e = find_object(sha1);
> *p += 40;
> + if (**p != ' ')
> + die("Missing space after SHA1: %s", command_buf.buf);
> + e = find_object(sha1);
This seems dangerous. What if a new caller arises that wants to
parse a <dataref> representing a tree-ish at the end of the line?
So I think checking the character after the tree-ish should still
be the caller's responsibility.
> }
> + *p += 1; /* skip space */
If other patches in flight use the same function, they would expect
*p to point to the space when parse_treeish_dataref returns. If we
wanted to change that (as mentioned above I don't think we ought to)
then the function's name should be changed to force such new callers
not to compile.
> @@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
> root = new_tree_entry();
> hashcpy(root->versions[1].sha1, e->idx.sha1);
> load_tree(root);
> - if (*p++ != ' ')
> - die("Missing space after tree-ish: %s", command_buf.buf);
(here's the caller).
Except where noted above, this looks good.
Thanks and hope that helps,
Jonathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references
2012-04-03 14:20 ` Jonathan Nieder
@ 2012-04-04 1:20 ` Pete Wyckoff
2012-04-04 5:32 ` Jonathan Nieder
0 siblings, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-04 1:20 UTC (permalink / raw)
To: Jonathan Nieder
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano, Johan Herland
jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:20 -0500:
> (cc-ing Johan for noteimport code)
Thanks, glad you noticed.
> Pete Wyckoff wrote:
> [...]
> > +++ b/fast-import.c
> [...]
> > @@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
> >
> > if (*p == ':') {
> > char *x;
> > - oe = find_mark(strtoumax(p + 1, &x, 10));
> > + oe = find_mark(parse_mark_ref_space(p, &x));
> > hashcpy(sha1, oe->idx.sha1);
> > p = x;
>
> Simpler:
>
> if (*p == ':') {
> oe = find_mark(parse_mark_ref_space(p, &p));
> hashcpy(sha1, oe->idx.sha1);
> } else if ...
Yes. I thought about just passing in plain old &p. Even though
these approaches would work, it is a bit more difficult for
novice C coders to read. Figured we should err on the side of
helping future code readers. I can add more cleverness if you
feel strongly.
> > } else if (!prefixcmp(p, "inline")) {
> > inline_data = 1;
> > p += 6;
> > + if (*p != ' ')
> > + die("Missing space after 'inline': %s",
> > + command_buf.buf);
> > } else {
> > if (get_sha1_hex(p, sha1))
> > die("Invalid SHA1: %s", command_buf.buf);
>
> If I write
>
> M 100644 inliness some/path/to/file
>
> was my mistake actually leaving out a space after 'inline' or
> was it using an invalid <dataref>?
>
> I think the latter, so I would suggest
>
> } else if (!prefixcmp(p, "inline ")) {
> inline_data = 1;
> p += strlen("inline"); /* advance to space */
> } else {
> if (get_sha1_hex(p, sha1))
> ...
Insead of "Missing space after 'inline'", you'll get "Invalid
SHA1". You misspelled "inline" with "inliness"? And would
prefer to be told you provided an invalid SHA1?
I'm tempted to guess that any string starting with "inline", e.g.
"inlinePath/To/File" without a space is still a good indication
that they were trying to say "inline ". The chance that they
horribly typed a SHA1, or really had a path staring with "inline"
and forgot the dataref entirely, feel less likely.
> [...]
> > }
> > - if (*p++ != ' ')
> > - die("Missing space after SHA1: %s", command_buf.buf);
> > + ++p; /* skip space */
>
> I guess I'd suggest
>
> assert(*p == ' ');
> p++;
>
> as defense against coders introducing additional cases that
> are not as careful.
Good suggestion, thanks.
> > @@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
> > /* <dataref> or 'inline' */
> > if (*p == ':') {
> > char *x;
> > - oe = find_mark(strtoumax(p + 1, &x, 10));
> > + oe = find_mark(parse_mark_ref_space(p, &x));
> > hashcpy(sha1, oe->idx.sha1);
> > p = x;
>
> Likewise (btw, why doesn't this share code with the filemodify case?):
>
> if (*p == ':') {
> oe = find_mark(parse_mark_with_trailing_space(p, &p));
> hashcpy(sha1, oe->idx.sha1);
> } else if ...
>
> and so on.
It does feel like a good opportunity for some refactoring. Two
out of the three callers to parse an mark followed by a space
could be put together here.
[..]
> > @@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
> > } else { /* <sha1> */
> > if (get_sha1_hex(*p, sha1))
> > die("Invalid SHA1: %s", command_buf.buf);
> > - e = find_object(sha1);
> > *p += 40;
> > + if (**p != ' ')
> > + die("Missing space after SHA1: %s", command_buf.buf);
> > + e = find_object(sha1);
>
> This seems dangerous. What if a new caller arises that wants to
> parse a <dataref> representing a tree-ish at the end of the line?
>
> So I think checking the character after the tree-ish should still
> be the caller's responsibility.
I was close to just moving parse_treeish_dataref() into its
single caller, parse_ls(), just so we wouldn't have to think
about this.
There are two cases it handles: mark and sha1. The mark case
uses the handy new parse_mark_ref_space(), which does the space
checking. The sha1 branch had no check in this function. So
I hoisted the space check up to make the branches symmetrical.
> > }
> > + *p += 1; /* skip space */
>
> If other patches in flight use the same function, they would expect
> *p to point to the space when parse_treeish_dataref returns. If we
> wanted to change that (as mentioned above I don't think we ought to)
> then the function's name should be changed to force such new callers
> not to compile.
>
> > @@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
> > root = new_tree_entry();
> > hashcpy(root->versions[1].sha1, e->idx.sha1);
> > load_tree(root);
> > - if (*p++ != ' ')
> > - die("Missing space after tree-ish: %s", command_buf.buf);
>
> (here's the caller).
I would prefer just to inline the whole thing. Or new name
parse_ls_dataref() if you have a preference.
> Except where noted above, this looks good.
Thanks.
-- Pete
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references
2012-04-04 1:20 ` Pete Wyckoff
@ 2012-04-04 5:32 ` Jonathan Nieder
0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-04 5:32 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano, Johan Herland
Pete Wyckoff wrote:
> jrnieder@gmail.com wrote on Tue, 03 Apr 2012 09:20 -0500:
>> Simpler:
>>
>> if (*p == ':') {
>> oe = find_mark(parse_mark_ref_space(p, &p));
>> hashcpy(sha1, oe->idx.sha1);
>> } else if ...
>
> Yes. I thought about just passing in plain old &p. Even though
> these approaches would work, it is a bit more difficult for
> novice C coders to read. Figured we should err on the side of
> helping future code readers. I can add more cleverness if you
> feel strongly.
It would be clearest with one argument, like so:
oe = find_mark(parse_mark_...(&p));
hashcpy(sha1, oe->idx.sha1);
[...]
> Insead of "Missing space after 'inline'", you'll get "Invalid
> SHA1". You misspelled "inline" with "inliness"? And would
> prefer to be told you provided an invalid SHA1?
It wasn't a great example, but what I meant is that if someone
asked me, a human, to parse
M 100644 foobar path/to/file
I would assume that foobar is a <dataref>. Likewise, for any
string baz in
M 100644 baz path/to/file
including strings that start with "inline", except for "inline"
itself.
To put it another way: checking for 'inline' at the start of a word as
a way to check for typos seems odd to me. We do not diagnose
M 100644 Inline path/to/file
as a misspelled version of "inline", nor
M 100644inline path/to/file
as an instance of a missing space character, and we shouldn't.
The goal in fast-import's behavior is usually predictability and
simplicity in terms of the mental model of the person writing a
frontend. Trying to guess the user's intention on malformed input
only takes away from that goal.
Why I care: if some day git permits other kinds of <dataref> (for
example if it supports refnames some day), I do not want datarefs
beginning with "inline" to be forbidden.
[...]
> There are two cases it handles: mark and sha1. The mark case
> uses the handy new parse_mark_ref_space(), which does the space
> checking. The sha1 branch had no check in this function. So
> I hoisted the space check up to make the branches symmetrical.
I think it's ok to sacrifice symmetry here, but:
[...]
> I would prefer just to inline the whole thing. Or new name
> parse_ls_dataref() if you have a preference.
if changing the behavior of the function that parses a treeish dataref
seems right, that's fine with me as long as its name or signature
changes.
For example, it could become
static struct object_entry *parse_treeish(const char **p);
Hope that helps,
Jonathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv2 0/2] fast-import: tighten parsing of mark references
2012-04-03 1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
2012-04-03 1:51 ` [PATCHv2 1/2] fast-import: test behavior of garbage after " Pete Wyckoff
2012-04-03 1:51 ` [PATCHv2 2/2] fast-import: tighten parsing of " Pete Wyckoff
@ 2012-04-03 2:00 ` Sverre Rabbelier
2012-04-05 1:51 ` [PATCHv3] " Pete Wyckoff
3 siblings, 0 replies; 22+ messages in thread
From: Sverre Rabbelier @ 2012-04-03 2:00 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Jonathan Nieder, Dmitry Ivankov, David Barr, Junio C Hamano
On Mon, Apr 2, 2012 at 20:51, Pete Wyckoff <pw@padd.com> wrote:
> Also, I did the unit tests first, to make sure things were broken
> as I expected. You can squash it all together if you prefer.
Nice series. I agree that it's good to be strict in what we accept
(breaking from the "be liberal in what you accept" mantra), due to the
importance of the input stream being parsed exactly right (as opposed
to a browser, where it's better to not render something bold than to
just break because a </b> was missing).
--
Cheers,
Sverre Rabbelier
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCHv3] fast-import: tighten parsing of mark references
2012-04-03 1:51 ` [PATCHv2 0/2] fast-import: tighten parsing of mark references Pete Wyckoff
` (2 preceding siblings ...)
2012-04-03 2:00 ` [PATCHv2 0/2] " Sverre Rabbelier
@ 2012-04-05 1:51 ` Pete Wyckoff
2012-04-05 2:24 ` Jonathan Nieder
2012-04-07 22:59 ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
3 siblings, 2 replies; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-05 1:51 UTC (permalink / raw)
To: git
Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano, Johan Herland
The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference. Fast-import does not complain when garbage
appears after a mark reference in some cases.
Factor out parsing of mark references and complain if
errant characters are found.
Buggy input can cause fast-import to produce the wrong output,
silently, without error. This makes it difficult to track
down buggy generators of fast-import streams. An example is
seen in the last line of this commit command:
commit refs/heads/S2
committer Name <name@example.com> 1112912893 -0400
data <<COMMIT
commit message
COMMIT
from :1M 100644 :103 hello.c
It is missing a newline and should be:
[...]
from :1
M 100644 :103 hello.c
What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^. What the buggy
program was expecting was the contents of blob :103. While
the resulting commit graph looked correct, the contents in
some commits were wrong.
---
This addresses all of Jonathan's comments, in particular:
- give tests descriptive names
- add asserts for trailing space for filemodify, notemodify
to protect against flaws in future dataref implementions
- compactify end pointer return and update, so:
oe = find_mark(parse_mark_ref_space(&p));
- replace "grep -q" with "test_i18ngrep"
- drop first word in failure messages, so no "Missing"
or "Garbage", resp., in:
test_i18ngrep "space after SHA1" err
test_i18ngrep "after mark" err
- Erroneous datarefs "inlineX" and "no-such-dataref" should
behave the same, in particular, they now complain "Invalid SHA1"
rather than guessing an attempt at "inline ".
- Revert change parse_treeish_dataref() API in case other
changes are inflight. Verify space handling in caller.
I did not refactor the 15 or so common lines in filemodify and
notemodify dataref handling.
I'll resend once 1.7.11 opens up.
Thanks all for the careful review.
-- Pete
fast-import.c | 102 +++++++++++++-----
t/t9300-fast-import.sh | 276 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 349 insertions(+), 29 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index a85275d..0525e12 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2207,6 +2207,59 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
}
+/*
+ * Given a pointer into a string, parse a mark reference:
+ *
+ * idnum ::= ':' bigint;
+ *
+ * Return the first character after the value in *endptr.
+ *
+ * Complain if the following character is not what is expected,
+ * either a space or end of the string.
+ */
+static uintmax_t parse_mark_ref(const char *p, char **endptr)
+{
+ uintmax_t mark;
+
+ assert(*p == ':');
+ ++p;
+ mark = strtoumax(p, endptr, 10);
+ if (*endptr == p)
+ die("No value after ':' in mark: %s", command_buf.buf);
+ return mark;
+}
+
+/*
+ * Parse the mark reference, and complain if this is not the end of
+ * the string.
+ */
+static uintmax_t parse_mark_ref_eol(const char *p)
+{
+ char *end;
+ uintmax_t mark;
+
+ mark = parse_mark_ref(p, &end);
+ if (*end != '\0')
+ die("Garbage after mark: %s", command_buf.buf);
+ return mark;
+}
+
+/*
+ * Parse the mark reference, demanding a trailing space. Return a
+ * pointer to the space.
+ */
+static uintmax_t parse_mark_ref_space(const char **p)
+{
+ uintmax_t mark;
+ char *end;
+
+ mark = parse_mark_ref(*p, &end);
+ if (*end != ' ')
+ die("Missing space after mark: %s", command_buf.buf);
+ *p = end;
+ return mark;
+}
+
static void file_change_m(struct branch *b)
{
const char *p = command_buf.buf + 2;
@@ -2235,21 +2288,21 @@ static void file_change_m(struct branch *b)
}
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
+ oe = find_mark(parse_mark_ref_space(&p));
hashcpy(sha1, oe->idx.sha1);
- p = x;
- } else if (!prefixcmp(p, "inline")) {
+ } else if (!prefixcmp(p, "inline ")) {
inline_data = 1;
- p += 6;
+ p += strlen("inline"); /* advance to space */
} else {
if (get_sha1_hex(p, sha1))
die("Invalid SHA1: %s", command_buf.buf);
oe = find_object(sha1);
p += 40;
+ if (*p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
}
- if (*p++ != ' ')
- die("Missing space after SHA1: %s", command_buf.buf);
+ assert(*p == ' ');
+ ++p; /* skip space */
strbuf_reset(&uq);
if (!unquote_c_style(&uq, p, &endp)) {
@@ -2407,21 +2460,21 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
/* Now parse the notemodify command. */
/* <dataref> or 'inline' */
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
+ oe = find_mark(parse_mark_ref_space(&p));
hashcpy(sha1, oe->idx.sha1);
- p = x;
- } else if (!prefixcmp(p, "inline")) {
+ } else if (!prefixcmp(p, "inline ")) {
inline_data = 1;
- p += 6;
+ p += strlen("inline"); /* advance to space */
} else {
if (get_sha1_hex(p, sha1))
die("Invalid SHA1: %s", command_buf.buf);
oe = find_object(sha1);
p += 40;
+ if (*p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
}
- if (*p++ != ' ')
- die("Missing space after SHA1: %s", command_buf.buf);
+ assert(*p == ' ');
+ ++p; /* skip space */
/* <committish> */
s = lookup_branch(p);
@@ -2430,7 +2483,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
die("Can't add a note on empty branch.");
hashcpy(commit_sha1, s->sha1);
} else if (*p == ':') {
- uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+ uintmax_t commit_mark = parse_mark_ref_eol(p);
struct object_entry *commit_oe = find_mark(commit_mark);
if (commit_oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", commit_mark);
@@ -2537,7 +2590,7 @@ static int parse_from(struct branch *b)
hashcpy(b->branch_tree.versions[0].sha1, t);
hashcpy(b->branch_tree.versions[1].sha1, t);
} else if (*from == ':') {
- uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+ uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2572,7 +2625,7 @@ static struct hash_list *parse_merge(unsigned int *count)
if (s)
hashcpy(n->sha1, s->sha1);
else if (*from == ':') {
- uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+ uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2735,7 +2788,7 @@ static void parse_new_tag(void)
type = OBJ_COMMIT;
} else if (*from == ':') {
struct object_entry *oe;
- from_mark = strtoumax(from + 1, NULL, 10);
+ from_mark = parse_mark_ref_eol(from);
oe = find_mark(from_mark);
type = oe->type;
hashcpy(sha1, oe->idx.sha1);
@@ -2867,14 +2920,9 @@ static void parse_cat_blob(void)
/* cat-blob SP <object> LF */
p = command_buf.buf + strlen("cat-blob ");
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
- if (x == p + 1)
- die("Invalid mark: %s", command_buf.buf);
+ oe = find_mark(parse_mark_ref_eol(p));
if (!oe)
die("Unknown mark: %s", command_buf.buf);
- if (*x)
- die("Garbage after mark: %s", command_buf.buf);
hashcpy(sha1, oe->idx.sha1);
} else {
if (get_sha1_hex(p, sha1))
@@ -2944,13 +2992,9 @@ static struct object_entry *parse_treeish_dataref(const char **p)
struct object_entry *e;
if (**p == ':') { /* <mark> */
- char *endptr;
- e = find_mark(strtoumax(*p + 1, &endptr, 10));
- if (endptr == *p + 1)
- die("Invalid mark: %s", command_buf.buf);
+ e = find_mark(parse_mark_ref_space(p));
if (!e)
die("Unknown mark: %s", command_buf.buf);
- *p = endptr;
hashcpy(sha1, e->idx.sha1);
} else { /* <sha1> */
if (get_sha1_hex(*p, sha1))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..cbc0e81 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2635,4 +2635,280 @@ test_expect_success \
'n=$(grep $a verify | wc -l) &&
test 1 = $n'
+###
+### series S
+###
+#
+# Setup is roughly this. Commits marked 1,2,3,4. Blobs
+# marked 100 + commit. Notes 200 +. Make sure missing spaces
+# and EOLs after mark references cause errors.
+#
+# The error message looks like either:
+# Missing space after ..
+# or
+# Garbage after ..
+#
+# 1--2--4
+# \ /
+# -3-
+#
+test_tick
+
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :1
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 2
+COMMIT
+from :1
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+
+blob
+mark :103
+data <<BLOB
+blob 3
+BLOB
+
+blob
+mark :202
+data <<BLOB
+note 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: initialize for S tests' '
+ git fast-import --export-marks=marks <input
+'
+
+#
+# filemodify, three datarefs
+#
+test_expect_success 'S: filemodify with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 :103x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after mark" err
+'
+
+# inline is misspelled; fast-import thinks it is some unknown dataref
+# and complains "Invalid SHA1"
+test_expect_success 'S: filemodify with garbage after inline must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 inlineX hello.c
+ data <<BLOB
+ inline
+ BLOB
+ EOF
+ cat err &&
+ test_i18ngrep "nvalid SHA1" err
+'
+
+test_expect_success 'S: filemodify with garbage after sha1 must fail' '
+ sha1=$(grep -w :103 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 ${sha1}x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, three ways to say dataref
+#
+test_expect_success 'S: notemodify with garabge after mark dataref must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref markref
+ COMMIT
+ N :103x :2
+ EOF
+ cat err &&
+ test_i18ngrep "space after mark" err
+'
+
+# inline is misspelled; fast-import thinks it is some unknown dataref
+# and complains "Invalid SHA1"
+test_expect_success 'S: notemodify with garbage after inline dataref must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref inline
+ COMMIT
+ N inlineX :2
+ data <<BLOB
+ note blob
+ BLOB
+ EOF
+ cat err &&
+ test_i18ngrep "nvalid SHA1" err
+'
+
+test_expect_success 'S: notemodify with garbage after sha1 dataref must fail' '
+ sha1=$(grep -w :2 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref sha1
+ COMMIT
+ N ${sha1}x :2
+ EOF
+ cat err &&
+ test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, mark in committish
+#
+test_expect_success 'S: notemodify with garbarge after mark committish must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/Snotes
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note committish
+ COMMIT
+ N :202 :2x
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# from
+#
+test_expect_success 'S: from with garbage after mark must fail' '
+ # no &&
+ git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
+ commit refs/heads/S2
+ mark :3
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :1x
+ M 100644 :103 hello.c
+ EOF
+
+ ret=$? &&
+ echo returned $ret &&
+ test $ret -ne 0 && # failed, but it created the commit
+
+ # go create the commit, need it for merge test
+ git fast-import --import-marks=marks --export-marks=marks <<-EOF &&
+ commit refs/heads/S2
+ mark :3
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :1
+ M 100644 :103 hello.c
+ EOF
+
+ # now evaluate the error
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+
+#
+# merge
+#
+test_expect_success 'S: merge with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ mark :4
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :2
+ merge :3x
+ M 100644 :103 hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# tag, from markref
+#
+test_expect_success 'S: tag with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ tag refs/tags/Stag
+ from :2x
+ tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<TAG
+ tag S
+ TAG
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# cat-blob markref
+#
+test_expect_success 'S: cat-blob with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ cat-blob :2x
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# ls markref
+#
+test_expect_success 'S: ls with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ ls :2x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after mark" err
+'
+
+test_expect_success 'S: ls with garbage after sha1 must fail' '
+ sha1=$(grep -w :2 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ ls ${sha1}x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after tree-ish" err
+'
+
test_done
--
1.7.10.rc2.62.gac32b.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCHv3] fast-import: tighten parsing of mark references
2012-04-05 1:51 ` [PATCHv3] " Pete Wyckoff
@ 2012-04-05 2:24 ` Jonathan Nieder
2012-04-05 17:20 ` Junio C Hamano
2012-04-07 22:59 ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
1 sibling, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2012-04-05 2:24 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano, Johan Herland
Pete Wyckoff wrote:
> This addresses all of Jonathan's comments, in particular:
Nice. Thanks much. I only have a few small worries left:
[...]
> +++ b/t/t9300-fast-import.sh
> @@ -2635,4 +2635,280 @@ test_expect_success \
[...]
> +test_expect_success 'S: filemodify with garbage after sha1 must fail' '
> + sha1=$(grep -w :103 marks | cut -d\ -f2) &&
"grep -w" isn't used elsewhere in the testsuite. Is it portable?
[...]
> +# inline is misspelled; fast-import thinks it is some unknown dataref
> +# and complains "Invalid SHA1"
> +test_expect_success 'S: notemodify with garbage after inline dataref must fail' '
> + test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
> + commit refs/heads/S
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + commit S note dataref inline
> + COMMIT
> + N inlineX :2
> + data <<BLOB
> + note blob
> + BLOB
> + EOF
> + cat err &&
> + test_i18ngrep "nvalid SHA1" err
> +'
If I understood the discussion before correctly, this error message is
suboptimal and something like "invalid dataref" would be a little
clearer, right?
That's orthogonal to what this patch is about so I'm not suggesting
changing it. But shouldn't the test just check that fast-import fails
without committing to any particular message?
Cheers,
Jonathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv3] fast-import: tighten parsing of mark references
2012-04-05 2:24 ` Jonathan Nieder
@ 2012-04-05 17:20 ` Junio C Hamano
0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2012-04-05 17:20 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Pete Wyckoff, git, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Johan Herland
Jonathan Nieder <jrnieder@gmail.com> writes:
> Pete Wyckoff wrote:
>
>> This addresses all of Jonathan's comments, in particular:
>
> Nice. Thanks much. I only have a few small worries left:
>
> [...]
>> +++ b/t/t9300-fast-import.sh
>> @@ -2635,4 +2635,280 @@ test_expect_success \
> [...]
>> +test_expect_success 'S: filemodify with garbage after sha1 must fail' '
>> + sha1=$(grep -w :103 marks | cut -d\ -f2) &&
>
> "grep -w" isn't used elsewhere in the testsuite. Is it portable?
It is not portable enough.
> If I understood the discussion before correctly, this error message is
> suboptimal and something like "invalid dataref" would be a little
> clearer, right?
>
> That's orthogonal to what this patch is about so I'm not suggesting
> changing it. But shouldn't the test just check that fast-import fails
> without committing to any particular message?
That would certainly make more sense.
Thanks for being extra careful.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCHv4] fast-import: tighten parsing of datarefs
2012-04-05 1:51 ` [PATCHv3] " Pete Wyckoff
2012-04-05 2:24 ` Jonathan Nieder
@ 2012-04-07 22:59 ` Pete Wyckoff
2012-04-10 21:40 ` Junio C Hamano
1 sibling, 1 reply; 22+ messages in thread
From: Pete Wyckoff @ 2012-04-07 22:59 UTC (permalink / raw)
To: git
Cc: Jonathan Nieder, Dmitry Ivankov, David Barr, Sverre Rabbelier,
Junio C Hamano, Johan Herland
The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference. Fast-import does not complain when garbage
appears after a mark reference in some cases.
Factor out parsing of mark references and complain if
errant characters are found. Also be a little more careful
when parsing "inline" and SHA1s, complaining if extra
characters appear or if the form of the dataref is unrecognized.
Buggy input can cause fast-import to produce the wrong output,
silently, without error. This makes it difficult to track
down buggy generators of fast-import streams. An example is
seen in the last line of this commit command:
commit refs/heads/S2
committer Name <name@example.com> 1112912893 -0400
data <<COMMIT
commit message
COMMIT
from :1M 100644 :103 hello.c
It is missing a newline and should be:
[...]
from :1
M 100644 :103 hello.c
What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^. What the buggy
program was expecting was the contents of blob :103. While
the resulting commit graph looked correct, the contents in
some commits were wrong.
Signed-off-by: Pete Wyckoff <pw@padd.com>
---
Thanks for the patient comments. Changes from v3:
- say "invalid dataref" when neither a sha1 nor
"inline " for a mark could be correctly parsed, including
requisite spaces
- avoid "grep -w": rearrange marks used in the tests
so they are easily greppable without -w
fast-import.c | 110 +++++++++++++------
t/t9300-fast-import.sh | 287 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 364 insertions(+), 33 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index a85275d..7f1fbed 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2207,6 +2207,59 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
}
+/*
+ * Given a pointer into a string, parse a mark reference:
+ *
+ * idnum ::= ':' bigint;
+ *
+ * Return the first character after the value in *endptr.
+ *
+ * Complain if the following character is not what is expected,
+ * either a space or end of the string.
+ */
+static uintmax_t parse_mark_ref(const char *p, char **endptr)
+{
+ uintmax_t mark;
+
+ assert(*p == ':');
+ ++p;
+ mark = strtoumax(p, endptr, 10);
+ if (*endptr == p)
+ die("No value after ':' in mark: %s", command_buf.buf);
+ return mark;
+}
+
+/*
+ * Parse the mark reference, and complain if this is not the end of
+ * the string.
+ */
+static uintmax_t parse_mark_ref_eol(const char *p)
+{
+ char *end;
+ uintmax_t mark;
+
+ mark = parse_mark_ref(p, &end);
+ if (*end != '\0')
+ die("Garbage after mark: %s", command_buf.buf);
+ return mark;
+}
+
+/*
+ * Parse the mark reference, demanding a trailing space. Return a
+ * pointer to the space.
+ */
+static uintmax_t parse_mark_ref_space(const char **p)
+{
+ uintmax_t mark;
+ char *end;
+
+ mark = parse_mark_ref(*p, &end);
+ if (*end != ' ')
+ die("Missing space after mark: %s", command_buf.buf);
+ *p = end;
+ return mark;
+}
+
static void file_change_m(struct branch *b)
{
const char *p = command_buf.buf + 2;
@@ -2235,21 +2288,21 @@ static void file_change_m(struct branch *b)
}
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
+ oe = find_mark(parse_mark_ref_space(&p));
hashcpy(sha1, oe->idx.sha1);
- p = x;
- } else if (!prefixcmp(p, "inline")) {
+ } else if (!prefixcmp(p, "inline ")) {
inline_data = 1;
- p += 6;
+ p += strlen("inline"); /* advance to space */
} else {
if (get_sha1_hex(p, sha1))
- die("Invalid SHA1: %s", command_buf.buf);
+ die("Invalid dataref: %s", command_buf.buf);
oe = find_object(sha1);
p += 40;
+ if (*p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
}
- if (*p++ != ' ')
- die("Missing space after SHA1: %s", command_buf.buf);
+ assert(*p == ' ');
+ ++p; /* skip space */
strbuf_reset(&uq);
if (!unquote_c_style(&uq, p, &endp)) {
@@ -2407,21 +2460,21 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
/* Now parse the notemodify command. */
/* <dataref> or 'inline' */
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
+ oe = find_mark(parse_mark_ref_space(&p));
hashcpy(sha1, oe->idx.sha1);
- p = x;
- } else if (!prefixcmp(p, "inline")) {
+ } else if (!prefixcmp(p, "inline ")) {
inline_data = 1;
- p += 6;
+ p += strlen("inline"); /* advance to space */
} else {
if (get_sha1_hex(p, sha1))
- die("Invalid SHA1: %s", command_buf.buf);
+ die("Invalid dataref: %s", command_buf.buf);
oe = find_object(sha1);
p += 40;
+ if (*p != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
}
- if (*p++ != ' ')
- die("Missing space after SHA1: %s", command_buf.buf);
+ assert(*p == ' ');
+ ++p; /* skip space */
/* <committish> */
s = lookup_branch(p);
@@ -2430,7 +2483,7 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
die("Can't add a note on empty branch.");
hashcpy(commit_sha1, s->sha1);
} else if (*p == ':') {
- uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+ uintmax_t commit_mark = parse_mark_ref_eol(p);
struct object_entry *commit_oe = find_mark(commit_mark);
if (commit_oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", commit_mark);
@@ -2537,7 +2590,7 @@ static int parse_from(struct branch *b)
hashcpy(b->branch_tree.versions[0].sha1, t);
hashcpy(b->branch_tree.versions[1].sha1, t);
} else if (*from == ':') {
- uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+ uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2572,7 +2625,7 @@ static struct hash_list *parse_merge(unsigned int *count)
if (s)
hashcpy(n->sha1, s->sha1);
else if (*from == ':') {
- uintmax_t idnum = strtoumax(from + 1, NULL, 10);
+ uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2735,7 +2788,7 @@ static void parse_new_tag(void)
type = OBJ_COMMIT;
} else if (*from == ':') {
struct object_entry *oe;
- from_mark = strtoumax(from + 1, NULL, 10);
+ from_mark = parse_mark_ref_eol(from);
oe = find_mark(from_mark);
type = oe->type;
hashcpy(sha1, oe->idx.sha1);
@@ -2867,18 +2920,13 @@ static void parse_cat_blob(void)
/* cat-blob SP <object> LF */
p = command_buf.buf + strlen("cat-blob ");
if (*p == ':') {
- char *x;
- oe = find_mark(strtoumax(p + 1, &x, 10));
- if (x == p + 1)
- die("Invalid mark: %s", command_buf.buf);
+ oe = find_mark(parse_mark_ref_eol(p));
if (!oe)
die("Unknown mark: %s", command_buf.buf);
- if (*x)
- die("Garbage after mark: %s", command_buf.buf);
hashcpy(sha1, oe->idx.sha1);
} else {
if (get_sha1_hex(p, sha1))
- die("Invalid SHA1: %s", command_buf.buf);
+ die("Invalid dataref: %s", command_buf.buf);
if (p[40])
die("Garbage after SHA1: %s", command_buf.buf);
oe = find_object(sha1);
@@ -2944,17 +2992,13 @@ static struct object_entry *parse_treeish_dataref(const char **p)
struct object_entry *e;
if (**p == ':') { /* <mark> */
- char *endptr;
- e = find_mark(strtoumax(*p + 1, &endptr, 10));
- if (endptr == *p + 1)
- die("Invalid mark: %s", command_buf.buf);
+ e = find_mark(parse_mark_ref_space(p));
if (!e)
die("Unknown mark: %s", command_buf.buf);
- *p = endptr;
hashcpy(sha1, e->idx.sha1);
} else { /* <sha1> */
if (get_sha1_hex(*p, sha1))
- die("Invalid SHA1: %s", command_buf.buf);
+ die("Invalid dataref: %s", command_buf.buf);
e = find_object(sha1);
*p += 40;
}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 0f5b5e5..0125413 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2634,5 +2634,292 @@ test_expect_success \
'R: blob appears only once' \
'n=$(grep $a verify | wc -l) &&
test 1 = $n'
+
+###
+### series S
+###
+#
+# Make sure missing spaces and EOLs after mark references
+# cause errors.
+#
+# Setup:
+#
+# 1--2--4
+# \ /
+# -3-
+#
+# commit marks: 301, 302, 303, 304
+# blob marks: 403, 404, resp.
+# note mark: 202
+#
+# The error message when a space is missing not at the
+# end of the line is:
+#
+# Missing space after ..
+#
+# or when extra characters come after the mark at the end
+# of the line:
+#
+# Garbage after ..
+#
+# or when the dataref is neither "inline " or a known SHA1,
+#
+# Invalid dataref ..
+#
+test_tick
+
+cat >input <<INPUT_END
+commit refs/heads/S
+mark :301
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 1
+COMMIT
+M 100644 inline hello.c
+data <<BLOB
+blob 1
+BLOB
+
+commit refs/heads/S
+mark :302
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit 2
+COMMIT
+from :301
+M 100644 inline hello.c
+data <<BLOB
+blob 2
+BLOB
+
+blob
+mark :403
+data <<BLOB
+blob 3
+BLOB
+
+blob
+mark :202
+data <<BLOB
+note 2
+BLOB
+INPUT_END
+
+test_expect_success 'S: initialize for S tests' '
+ git fast-import --export-marks=marks <input
+'
+
+#
+# filemodify, three datarefs
+#
+test_expect_success 'S: filemodify with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 :403x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after mark" err
+'
+
+# inline is misspelled; fast-import thinks it is some unknown dataref
+test_expect_success 'S: filemodify with garbage after inline must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 inlineX hello.c
+ data <<BLOB
+ inline
+ BLOB
+ EOF
+ cat err &&
+ test_i18ngrep "nvalid dataref" err
+'
+
+test_expect_success 'S: filemodify with garbage after sha1 must fail' '
+ sha1=$(grep :403 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit N
+ COMMIT
+ M 100644 ${sha1}x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, three ways to say dataref
+#
+test_expect_success 'S: notemodify with garabge after mark dataref must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref markref
+ COMMIT
+ N :202x :302
+ EOF
+ cat err &&
+ test_i18ngrep "space after mark" err
+'
+
+test_expect_success 'S: notemodify with garbage after inline dataref must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref inline
+ COMMIT
+ N inlineX :302
+ data <<BLOB
+ note blob
+ BLOB
+ EOF
+ cat err &&
+ test_i18ngrep "nvalid dataref" err
+'
+
+test_expect_success 'S: notemodify with garbage after sha1 dataref must fail' '
+ sha1=$(grep :202 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note dataref sha1
+ COMMIT
+ N ${sha1}x :302
+ EOF
+ cat err &&
+ test_i18ngrep "space after SHA1" err
+'
+
+#
+# notemodify, mark in committish
+#
+test_expect_success 'S: notemodify with garbarge after mark committish must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/Snotes
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit S note committish
+ COMMIT
+ N :202 :302x
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# from
+#
+test_expect_success 'S: from with garbage after mark must fail' '
+ # no &&
+ git fast-import --import-marks=marks --export-marks=marks <<-EOF 2>err
+ commit refs/heads/S2
+ mark :303
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :301x
+ M 100644 :403 hello.c
+ EOF
+
+ ret=$? &&
+ echo returned $ret &&
+ test $ret -ne 0 && # failed, but it created the commit
+
+ # go create the commit, need it for merge test
+ git fast-import --import-marks=marks --export-marks=marks <<-EOF &&
+ commit refs/heads/S2
+ mark :303
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ commit 3
+ COMMIT
+ from :301
+ M 100644 :403 hello.c
+ EOF
+
+ # now evaluate the error
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+
+#
+# merge
+#
+test_expect_success 'S: merge with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ commit refs/heads/S
+ mark :304
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ merge 4
+ COMMIT
+ from :302
+ merge :303x
+ M 100644 :403 hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# tag, from markref
+#
+test_expect_success 'S: tag with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ tag refs/tags/Stag
+ from :302x
+ tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<TAG
+ tag S
+ TAG
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# cat-blob markref
+#
+test_expect_success 'S: cat-blob with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ cat-blob :403x
+ EOF
+ cat err &&
+ test_i18ngrep "after mark" err
+'
+
+#
+# ls markref
+#
+test_expect_success 'S: ls with garbage after mark must fail' '
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ ls :302x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after mark" err
+'
+
+test_expect_success 'S: ls with garbage after sha1 must fail' '
+ sha1=$(grep :302 marks | cut -d\ -f2) &&
+ test_must_fail git fast-import --import-marks=marks <<-EOF 2>err &&
+ ls ${sha1}x hello.c
+ EOF
+ cat err &&
+ test_i18ngrep "space after tree-ish" err
+'
test_done
--
1.7.10.rc2.65.gbd86d
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCHv4] fast-import: tighten parsing of datarefs
2012-04-07 22:59 ` [PATCHv4] fast-import: tighten parsing of datarefs Pete Wyckoff
@ 2012-04-10 21:40 ` Junio C Hamano
0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2012-04-10 21:40 UTC (permalink / raw)
To: Pete Wyckoff
Cc: git, Jonathan Nieder, Dmitry Ivankov, David Barr,
Sverre Rabbelier, Junio C Hamano, Johan Herland
Pete Wyckoff <pw@padd.com> writes:
> The syntax for the use of mark references in fast-import
> demands either a SP (space) or LF (end-of-line) after
> a mark reference. Fast-import does not complain when garbage
> appears after a mark reference in some cases.
>
> Factor out parsing of mark references and complain if
> errant characters are found. Also be a little more careful
> when parsing "inline" and SHA1s, complaining if extra
> characters appear or if the form of the dataref is unrecognized.
> +static uintmax_t parse_mark_ref(const char *p, char **endptr)
> +{
> + uintmax_t mark;
> +
> + assert(*p == ':');
> + ++p;
> + mark = strtoumax(p, endptr, 10);
> + if (*endptr == p)
> + die("No value after ':' in mark: %s", command_buf.buf);
> + return mark;
> +}
> +static uintmax_t parse_mark_ref_eol(const char *p)
> +{
> +...
> +}
> +
> +static uintmax_t parse_mark_ref_space(const char **p)
> +{
> +...
> +}
> +
The first helper looks sensible, but the two seemingly similar
parse_mark_ref_WHATTOEXPECT() that have different interfaces are somewhat
tasteless.
I wonder if the calling sites in file_change_m(), note_change_n(),
parse_merge(), parse_cat_blob() and parse_treeish_dataref() can be made to
share even more code by slight restructuring, though.
^ permalink raw reply [flat|nested] 22+ messages in thread