All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] Fix some fast-import parsing issues
@ 2019-02-20 22:58 Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Elijah Newren @ 2019-02-20 22:58 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr, Elijah Newren

I found a few issues with parsing in fast-import (dating back to
git-2.6.0).  I was thrown off track for a while by the fact that the
.git/fast_import_crash_$PID file erroneously claimed that the
penultimate command it read was somehow truncated; e.g. for input of
the form:

    commit refs/heads/master
    mark :1
    author Full Name <user@organization.tld> 1000000000 +0100
    committer Full Name <user@organization.tld> 1000000000 +0100
    data 53
    Commit message for intentionally empty orphan commit
    get-mark :1
    
the crash file was reporting

    Most Recent Commands Before Crash
    ---------------------------------
      commit refs/heads/master
      mark :1
      author Full Name <user@organization.tld> 1000000000 +0100
      committer Full Name <user@organization.tld> 1000000000 +0100
      data 53
    * get-mark :1
    
In other words, it dropped the commit message line entirely, which
made me wonder if I was having weird buffering or flushing issues
until I was able to narrow it down to the simple testcase above and
could duplicate with
    cat seven-line-test-file | git fast-import
I don't know if there are other ways to get fast-import to give a
corrupted "Most Recent Commands Before Crash" report, but this one
doesn't trigger after my other fixes below.

Switching gears from the bad reporting to the original parsing bug
though, it is clear fast-import somehow mishandled get-mark directives
after empty orphan commits.  Digging into it, I think it was caused by
following a precedent set by a somewhat unsafe implementation of
trying to allow cat-blob directives to appear in the middle of a
commit.  My spelunking showed the following commits seemed to be the
most relevant as to how we got to the current state:

    777f80d7429b ("fast-import: Allow cat-blob requests at arbitrary
                 points in stream", 2010-11-28)
    8dc6a373d201 ("fast-import: add 'ls' command", 2010-12-02)
    97313bef2a16 ("fast-import: use skip_prefix for parsing input",
                  2014-06-18)
    28c7b1f7b7b7 ("fast-import: add a get-mark command", 2015-07-01)

I've cc'ed the relevant folks, and have a few patches that fix the
issue and I think make the parser more robust against future issues in
a way that I think is safe enough for backward compatibility, but
"backward compatible enough" might concern some folks; if so, please
take a look at patches 4 and 5.

Elijah Newren (5):
  t9300: demonstrate bug with get-mark and empty orphan commits
  git-fast-import.txt: fix wording about where ls command can appear
  fast-import: check most prominent commands first
  fast-import: only allow cat-blob requests where it makes sense
  fast-import: fix erroneous handling of get-mark with empty orphan
    commits

 Documentation/git-fast-import.txt | 22 +++++++++---------
 fast-import.c                     | 31 ++++++++++++++------------
 t/t9300-fast-import.sh            | 37 +++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 24 deletions(-)

-- 
2.21.0.rc2.5.g8f70af2367

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/5] t9300: demonstrate bug with get-mark and empty orphan commits
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
@ 2019-02-20 22:58 ` Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 2/5] git-fast-import.txt: fix wording about where ls command can appear Elijah Newren
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-02-20 22:58 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt |  7 +++++-
 t/t9300-fast-import.sh            | 37 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 43ab3b1637..339b6e7e98 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -422,7 +422,12 @@ However it is recommended that a `filedeleteall` command precede
 all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
 the same commit, as `filedeleteall` wipes the branch clean (see below).
 
-The `LF` after the command is optional (it used to be required).
+The `LF` after the command is optional (it used to be required).  Note
+that for reasons of backward compatibility, if the commit ends with a
+`data` command (i.e. it has has no `from`, `merge`, `filemodify`,
+`filedelete`, `filecopy`, `filerename`, `filedeleteall` or
+`notemodify` commands) then two `LF` commands may appear at the end of
+the command instead of just one.
 
 `author`
 ^^^^^^^^
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 59a13b6a77..c304c8c47c 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3262,4 +3262,41 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
 	background_import_still_running
 '
 
+###
+### series W (get-mark and empty orphan commits)
+###
+
+cat >>W-input <<-W_INPUT_END
+	commit refs/heads/W-branch
+	mark :1
+	author Full Name <user@company.tld> 1000000000 +0100
+	committer Full Name <user@company.tld> 1000000000 +0100
+	data 27
+	Intentionally empty commit
+	LFsget-mark :1
+	W_INPUT_END
+
+test_expect_failure !MINGW 'W: get-mark & empty orphan commit with no newlines' '
+	sed -e s/LFs// W-input | tr L "\n" | git fast-import
+'
+
+test_expect_failure !MINGW 'W: get-mark & empty orphan commit with one newline' '
+	sed -e s/LFs/L/ W-input | tr L "\n" | git fast-import
+'
+
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with ugly second newline' '
+	# Technically, this should fail as it has too many linefeeds
+	# according to the grammar in fast-import.txt.  But, for whatever
+	# reason, it works.  Since using the correct number of newlines
+	# does not work with older (pre-2.22) versions of git, allow apps
+	# that used this second-newline workaround to keep working by
+	# checking it with this test...
+	sed -e s/LFs/LL/ W-input | tr L "\n" | git fast-import
+'
+
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous third newline' '
+	# ...but do NOT allow more empty lines than that (see previous test).
+	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
+'
+
 test_done
-- 
2.21.0.rc2.5.g8f70af2367


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/5] git-fast-import.txt: fix wording about where ls command can appear
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
@ 2019-02-20 22:58 ` Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 3/5] fast-import: check most prominent commands first Elijah Newren
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-02-20 22:58 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr, Elijah Newren

The docs claimed `ls` commands could appear almost anywhere, but the
code told a different story.  Modify the docs to match the code.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 339b6e7e98..f7e2d330b1 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1016,8 +1016,8 @@ printing a blob from the active commit (with `cat-blob`) or copying a
 blob or tree from a previous commit for use in the current one (with
 `filemodify`).
 
-The `ls` command can be used anywhere in the stream that comments are
-accepted, including the middle of a commit.
+The `ls` command can also be used where a `filemodify` directive can
+appear, allowing it to be used in the middle of a commit.
 
 Reading from the active commit::
 	This form can only be used in the middle of a `commit`.
-- 
2.21.0.rc2.5.g8f70af2367


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 3/5] fast-import: check most prominent commands first
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 2/5] git-fast-import.txt: fix wording about where ls command can appear Elijah Newren
@ 2019-02-20 22:58 ` Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 4/5] fast-import: only allow cat-blob requests where it makes sense Elijah Newren
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-02-20 22:58 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr, Elijah Newren

This is not a very important change, and one that I expect to have no
performance impact whatsoever, but reading the code bothered me.  The
parsing of command types in cmd_main() mostly runs in order of most
common to least common commands; sure, it's hard to say for sure what
the most common are without some type of study, but it seems fairly
clear to mark the original four ("blob", "commit", "tag", "reset") as
the most prominent.  Indeed, the parsing for most other commands were
added to later in the list.  However, when "ls" was added, it was stuck
near the top of the list, with no rationale for that particular
location.  Move it down to later to appease my Tourette's-like internal
twitching that its former location was causing.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 fast-import.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index b7ba755c2b..3114ce17f1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -3303,14 +3303,14 @@ int cmd_main(int argc, const char **argv)
 		const char *v;
 		if (!strcmp("blob", command_buf.buf))
 			parse_new_blob();
-		else if (skip_prefix(command_buf.buf, "ls ", &v))
-			parse_ls(v, NULL);
 		else if (skip_prefix(command_buf.buf, "commit ", &v))
 			parse_new_commit(v);
 		else if (skip_prefix(command_buf.buf, "tag ", &v))
 			parse_new_tag(v);
 		else if (skip_prefix(command_buf.buf, "reset ", &v))
 			parse_reset_branch(v);
+		else if (skip_prefix(command_buf.buf, "ls ", &v))
+			parse_ls(v, NULL);
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!strcmp("done", command_buf.buf))
-- 
2.21.0.rc2.5.g8f70af2367


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 4/5] fast-import: only allow cat-blob requests where it makes sense
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
                   ` (2 preceding siblings ...)
  2019-02-20 22:58 ` [RFC PATCH 3/5] fast-import: check most prominent commands first Elijah Newren
@ 2019-02-20 22:58 ` Elijah Newren
  2019-02-20 22:58 ` [RFC PATCH 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits Elijah Newren
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-02-20 22:58 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr, Elijah Newren

In commit 777f80d7429b ("fast-import: Allow cat-blob requests at
arbitrary points in stream", 2010-11-28), fast-import started allowing
cat-blob commands to appear on the start of any line except in the
middle of a "data" command.  It could be in the middle of various
directives that were part of a tag command, or in the middle of
checkpoints or progresses (each of which allow an optional second empty
newline), or even immediately after the mark command of a blob before
the data directive appeared (raising the question of what if it used the
mark for the blob that just barely appeared in the stream that we do not
yet have the data for).  None of these locations make any sense as
places to put cat-blob requests.

The purpose of this change as stated in that commit message was to

   [save] frontends from having to loop over everything they want to
   commit in the next commit and cat-ing the necessary objects in
   advance.

However, that can be achieved by simply allowing cat-blob requests to
appear whenever a filemodify directive is allowed.  Further, it avoids
setting a bad precedent for other commands to follow (e.g. get-mark); a
precedent which caused parsing problems in corner cases.

Technically, inline filemodify directives add a slight wrinkle in that
frontends might want to have cat-blob directives appear after the start
of the filemodify and before the data directive contained within it.  I
think it would have been better to disallow such a case (it would be
trivial to use cat-blob before the filemodify instead), but since there
is evidence this was used, for backwards compatibility let's support
that case too.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt |  7 ++++---
 fast-import.c                     | 19 +++++++++++++------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f7e2d330b1..982f82b0b3 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1001,9 +1001,10 @@ Output uses the same format as `git cat-file --batch`:
 	<contents> LF
 ====
 
-This command can be used anywhere in the stream that comments are
-accepted.  In particular, the `cat-blob` command can be used in the
-middle of a commit but not in the middle of a `data` command.
+This command can be used where a `filemodify` directive can appear,
+allowing it to be used in the middle of a commit.  For a `filemodify`
+using an inline directive, it can also appear right before the `data`
+directive.
 
 See ``Responses To Commands'' below for details about how to read
 this output safely.
diff --git a/fast-import.c b/fast-import.c
index 3114ce17f1..338db61e6e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1786,10 +1786,6 @@ static int read_next_command(void)
 			parse_get_mark(p);
 			continue;
 		}
-		if (skip_prefix(command_buf.buf, "cat-blob ", &p)) {
-			parse_cat_blob(p);
-			continue;
-		}
 		if (command_buf.buf[0] == '#')
 			continue;
 		return 0;
@@ -2254,8 +2250,15 @@ static void file_change_m(const char *p, struct branch *b)
 			strbuf_addstr(&uq, p);
 			p = uq.buf;
 		}
-		read_next_command();
-		parse_and_store_blob(&last_blob, &oid, 0);
+		while (read_next_command() != EOF) {
+			const char *v;
+			if (skip_prefix(command_buf.buf, "cat-blob ", &v))
+				parse_cat_blob(v);
+			else {
+				parse_and_store_blob(&last_blob, &oid, 0);
+				break;
+			}
+		}
 	} else {
 		enum object_type expected = S_ISDIR(mode) ?
 						OBJ_TREE: OBJ_BLOB;
@@ -2627,6 +2630,8 @@ static void parse_new_commit(const char *arg)
 			file_change_deleteall(b);
 		else if (skip_prefix(command_buf.buf, "ls ", &v))
 			parse_ls(v, b);
+		else if (skip_prefix(command_buf.buf, "cat-blob ", &v))
+			parse_cat_blob(v);
 		else {
 			unread_command_buf = 1;
 			break;
@@ -3311,6 +3316,8 @@ int cmd_main(int argc, const char **argv)
 			parse_reset_branch(v);
 		else if (skip_prefix(command_buf.buf, "ls ", &v))
 			parse_ls(v, NULL);
+		else if (skip_prefix(command_buf.buf, "cat-blob ", &v))
+			parse_cat_blob(v);
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!strcmp("done", command_buf.buf))
-- 
2.21.0.rc2.5.g8f70af2367


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
                   ` (3 preceding siblings ...)
  2019-02-20 22:58 ` [RFC PATCH 4/5] fast-import: only allow cat-blob requests where it makes sense Elijah Newren
@ 2019-02-20 22:58 ` Elijah Newren
  2019-03-08 16:53 ` [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
  6 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-02-20 22:58 UTC (permalink / raw)
  To: git
  Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr, Elijah Newren

When get-mark was introduced in commit 28c7b1f7b7b7 ("fast-import: add a
get-mark command", 2015-07-01), it followed the precedent of the
cat-blob command to be allowed on any line other than in the middle of a
data directive; see commit 777f80d7429b ("fast-import: Allow cat-blob
requests at arbitrary points in stream", 2010-11-28).  It was useful to
allow cat-blob directives in the middle of a commit to get more data
that would be used in writing the current commit object.  get-mark is
not similarly useful since fast-import can already use either object id
or mark.  Further, trying to allow this command anywhere caused parsing
bugs.  Fix the parsing problems by only allowing get-mark commands to
appear when other commands have completed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt | 4 ----
 fast-import.c                     | 8 ++------
 t/t9300-fast-import.sh            | 4 ++--
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 982f82b0b3..33cce1e150 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -971,10 +971,6 @@ might want to refer to in their commit messages.
 	'get-mark' SP ':' <idnum> LF
 ....
 
-This command can be used anywhere in the stream that comments are
-accepted.  In particular, the `get-mark` command can be used in the
-middle of a commit but not in the middle of a `data` command.
-
 See ``Responses To Commands'' below for details about how to read
 this output safely.
 
diff --git a/fast-import.c b/fast-import.c
index 338db61e6e..064c55e8be 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1748,8 +1748,6 @@ static int read_next_command(void)
 	}
 
 	for (;;) {
-		const char *p;
-
 		if (unread_command_buf) {
 			unread_command_buf = 0;
 		} else {
@@ -1782,10 +1780,6 @@ static int read_next_command(void)
 			rc->prev->next = rc;
 			cmd_tail = rc;
 		}
-		if (skip_prefix(command_buf.buf, "get-mark ", &p)) {
-			parse_get_mark(p);
-			continue;
-		}
 		if (command_buf.buf[0] == '#')
 			continue;
 		return 0;
@@ -3318,6 +3312,8 @@ int cmd_main(int argc, const char **argv)
 			parse_ls(v, NULL);
 		else if (skip_prefix(command_buf.buf, "cat-blob ", &v))
 			parse_cat_blob(v);
+		else if (skip_prefix(command_buf.buf, "get-mark ", &v))
+			parse_get_mark(v);
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!strcmp("done", command_buf.buf))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index c304c8c47c..3668263c40 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3276,11 +3276,11 @@ cat >>W-input <<-W_INPUT_END
 	LFsget-mark :1
 	W_INPUT_END
 
-test_expect_failure !MINGW 'W: get-mark & empty orphan commit with no newlines' '
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with no newlines' '
 	sed -e s/LFs// W-input | tr L "\n" | git fast-import
 '
 
-test_expect_failure !MINGW 'W: get-mark & empty orphan commit with one newline' '
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with one newline' '
 	sed -e s/LFs/L/ W-input | tr L "\n" | git fast-import
 '
 
-- 
2.21.0.rc2.5.g8f70af2367


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/5] Fix some fast-import parsing issues
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
                   ` (4 preceding siblings ...)
  2019-02-20 22:58 ` [RFC PATCH 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits Elijah Newren
@ 2019-03-08 16:53 ` Elijah Newren
  2019-04-01 10:44   ` Junio C Hamano
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
  6 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2019-03-08 16:53 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Michael Haggerty, Jonathan Nieder, Jeff King, David Barr

Hi,

On Wed, Feb 20, 2019 at 2:58 PM Elijah Newren <newren@gmail.com> wrote:
>
> I found a few issues with parsing in fast-import (dating back to
....
> I've cc'ed the relevant folks, and have a few patches that fix the
> issue and I think make the parser more robust against future issues in
> a way that I think is safe enough for backward compatibility, but
> "backward compatible enough" might concern some folks; if so, please
> take a look at patches 4 and 5.

Just thought I'd ping to see if folks have any concerns with this
slight tweak to backward compatibility; if not, I'll just repost the
patches removing the RFC label.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/5] Fix some fast-import parsing issues
  2019-03-08 16:53 ` [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
@ 2019-04-01 10:44   ` Junio C Hamano
  2019-04-01 15:52     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2019-04-01 10:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr

Elijah Newren <newren@gmail.com> writes:

> On Wed, Feb 20, 2019 at 2:58 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> I found a few issues with parsing in fast-import (dating back to
> ....
>> I've cc'ed the relevant folks, and have a few patches that fix the
>> issue and I think make the parser more robust against future issues in
>> a way that I think is safe enough for backward compatibility, but
>> "backward compatible enough" might concern some folks; if so, please
>> take a look at patches 4 and 5.
>
> Just thought I'd ping to see if folks have any concerns with this
> slight tweak to backward compatibility; if not, I'll just repost the
> patches removing the RFC label.

It's been more than a month and we haven't hard from anybody,
perhaps?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/5] Fix some fast-import parsing issues
  2019-04-01 10:44   ` Junio C Hamano
@ 2019-04-01 15:52     ` Elijah Newren
  0 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 15:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr

On Mon, Apr 1, 2019 at 3:44 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Wed, Feb 20, 2019 at 2:58 PM Elijah Newren <newren@gmail.com> wrote:
> >>
> >> I found a few issues with parsing in fast-import (dating back to
> > ....
> >> I've cc'ed the relevant folks, and have a few patches that fix the
> >> issue and I think make the parser more robust against future issues in
> >> a way that I think is safe enough for backward compatibility, but
> >> "backward compatible enough" might concern some folks; if so, please
> >> take a look at patches 4 and 5.
> >
> > Just thought I'd ping to see if folks have any concerns with this
> > slight tweak to backward compatibility; if not, I'll just repost the
> > patches removing the RFC label.
>
> It's been more than a month and we haven't hard from anybody,
> perhaps?

Indeed; I was going to resend sooner but got wrapped up in
directory-rename-conflict-by-default, switch and restore series
discussions, etc.  I'll rip off the RFC label and resend.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0/5] Fix some fast-import parsing issues
  2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
                   ` (5 preceding siblings ...)
  2019-03-08 16:53 ` [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
@ 2019-04-01 16:00 ` Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
                     ` (4 more replies)
  6 siblings, 5 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 16:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr, Elijah Newren

This series fixes a few issues with parsing in fast-import, dating back to
git-2.6.0.  The only change since v1 is removing the RFC label; sadly, no
one commented on the first round.

Text from my original submission:

The point behind this series is that fast-import somehow mishandled
get-mark directives after empty orphan commits.  Digging into it, I
think it was caused by following a precedent set by a somewhat unsafe
implementation of trying to allow cat-blob directives to appear in the
middle of a commit.  My spelunking showed the following commits seemed
to be the most relevant as to how we got to the current state:

    777f80d7429b ("fast-import: Allow cat-blob requests at arbitrary
                 points in stream", 2010-11-28)
    8dc6a373d201 ("fast-import: add 'ls' command", 2010-12-02)
    97313bef2a16 ("fast-import: use skip_prefix for parsing input",
                  2014-06-18)
    28c7b1f7b7b7 ("fast-import: add a get-mark command", 2015-07-01)

I have a few patches that fix the issue and I think make the parser
more robust against future issues in a way that I think is safe enough
for backward compatibility, but "backward compatible enough" might
concern some folks; if so, please take a look at patches 4 and 5.

Elijah Newren (5):
  t9300: demonstrate bug with get-mark and empty orphan commits
  git-fast-import.txt: fix wording about where ls command can appear
  fast-import: check most prominent commands first
  fast-import: only allow cat-blob requests where it makes sense
  fast-import: fix erroneous handling of get-mark with empty orphan
    commits

 Documentation/git-fast-import.txt | 22 +++++++++---------
 fast-import.c                     | 31 ++++++++++++++------------
 t/t9300-fast-import.sh            | 37 +++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 24 deletions(-)

-- 
2.21.0.1.g0a561c1dbd


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/5] t9300: demonstrate bug with get-mark and empty orphan commits
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
@ 2019-04-01 16:00   ` Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 2/5] git-fast-import.txt: fix wording about where ls command can appear Elijah Newren
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 16:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt |  7 +++++-
 t/t9300-fast-import.sh            | 37 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 43ab3b1637..339b6e7e98 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -422,7 +422,12 @@ However it is recommended that a `filedeleteall` command precede
 all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
 the same commit, as `filedeleteall` wipes the branch clean (see below).
 
-The `LF` after the command is optional (it used to be required).
+The `LF` after the command is optional (it used to be required).  Note
+that for reasons of backward compatibility, if the commit ends with a
+`data` command (i.e. it has has no `from`, `merge`, `filemodify`,
+`filedelete`, `filecopy`, `filerename`, `filedeleteall` or
+`notemodify` commands) then two `LF` commands may appear at the end of
+the command instead of just one.
 
 `author`
 ^^^^^^^^
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 59a13b6a77..c304c8c47c 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3262,4 +3262,41 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
 	background_import_still_running
 '
 
+###
+### series W (get-mark and empty orphan commits)
+###
+
+cat >>W-input <<-W_INPUT_END
+	commit refs/heads/W-branch
+	mark :1
+	author Full Name <user@company.tld> 1000000000 +0100
+	committer Full Name <user@company.tld> 1000000000 +0100
+	data 27
+	Intentionally empty commit
+	LFsget-mark :1
+	W_INPUT_END
+
+test_expect_failure !MINGW 'W: get-mark & empty orphan commit with no newlines' '
+	sed -e s/LFs// W-input | tr L "\n" | git fast-import
+'
+
+test_expect_failure !MINGW 'W: get-mark & empty orphan commit with one newline' '
+	sed -e s/LFs/L/ W-input | tr L "\n" | git fast-import
+'
+
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with ugly second newline' '
+	# Technically, this should fail as it has too many linefeeds
+	# according to the grammar in fast-import.txt.  But, for whatever
+	# reason, it works.  Since using the correct number of newlines
+	# does not work with older (pre-2.22) versions of git, allow apps
+	# that used this second-newline workaround to keep working by
+	# checking it with this test...
+	sed -e s/LFs/LL/ W-input | tr L "\n" | git fast-import
+'
+
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous third newline' '
+	# ...but do NOT allow more empty lines than that (see previous test).
+	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
+'
+
 test_done
-- 
2.21.0.1.g0a561c1dbd


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/5] git-fast-import.txt: fix wording about where ls command can appear
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
@ 2019-04-01 16:00   ` Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 3/5] fast-import: check most prominent commands first Elijah Newren
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 16:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr, Elijah Newren

The docs claimed `ls` commands could appear almost anywhere, but the
code told a different story.  Modify the docs to match the code.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 339b6e7e98..f7e2d330b1 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1016,8 +1016,8 @@ printing a blob from the active commit (with `cat-blob`) or copying a
 blob or tree from a previous commit for use in the current one (with
 `filemodify`).
 
-The `ls` command can be used anywhere in the stream that comments are
-accepted, including the middle of a commit.
+The `ls` command can also be used where a `filemodify` directive can
+appear, allowing it to be used in the middle of a commit.
 
 Reading from the active commit::
 	This form can only be used in the middle of a `commit`.
-- 
2.21.0.1.g0a561c1dbd


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/5] fast-import: check most prominent commands first
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 2/5] git-fast-import.txt: fix wording about where ls command can appear Elijah Newren
@ 2019-04-01 16:00   ` Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 4/5] fast-import: only allow cat-blob requests where it makes sense Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits Elijah Newren
  4 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 16:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr, Elijah Newren

This is not a very important change, and one that I expect to have no
performance impact whatsoever, but reading the code bothered me.  The
parsing of command types in cmd_main() mostly runs in order of most
common to least common commands; sure, it's hard to say for sure what
the most common are without some type of study, but it seems fairly
clear to mark the original four ("blob", "commit", "tag", "reset") as
the most prominent.  Indeed, the parsing for most other commands were
added to later in the list.  However, when "ls" was added, it was stuck
near the top of the list, with no rationale for that particular
location.  Move it down to later to appease my Tourette's-like internal
twitching that its former location was causing.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 fast-import.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index b7ba755c2b..3114ce17f1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -3303,14 +3303,14 @@ int cmd_main(int argc, const char **argv)
 		const char *v;
 		if (!strcmp("blob", command_buf.buf))
 			parse_new_blob();
-		else if (skip_prefix(command_buf.buf, "ls ", &v))
-			parse_ls(v, NULL);
 		else if (skip_prefix(command_buf.buf, "commit ", &v))
 			parse_new_commit(v);
 		else if (skip_prefix(command_buf.buf, "tag ", &v))
 			parse_new_tag(v);
 		else if (skip_prefix(command_buf.buf, "reset ", &v))
 			parse_reset_branch(v);
+		else if (skip_prefix(command_buf.buf, "ls ", &v))
+			parse_ls(v, NULL);
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!strcmp("done", command_buf.buf))
-- 
2.21.0.1.g0a561c1dbd


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/5] fast-import: only allow cat-blob requests where it makes sense
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
                     ` (2 preceding siblings ...)
  2019-04-01 16:00   ` [PATCH v2 3/5] fast-import: check most prominent commands first Elijah Newren
@ 2019-04-01 16:00   ` Elijah Newren
  2019-04-01 16:00   ` [PATCH v2 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits Elijah Newren
  4 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 16:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr, Elijah Newren

In commit 777f80d7429b ("fast-import: Allow cat-blob requests at
arbitrary points in stream", 2010-11-28), fast-import started allowing
cat-blob commands to appear on the start of any line except in the
middle of a "data" command.  It could be in the middle of various
directives that were part of a tag command, or in the middle of
checkpoints or progresses (each of which allow an optional second empty
newline), or even immediately after the mark command of a blob before
the data directive appeared (raising the question of what if it used the
mark for the blob that just barely appeared in the stream that we do not
yet have the data for).  None of these locations make any sense as
places to put cat-blob requests.

The purpose of this change as stated in that commit message was to

   [save] frontends from having to loop over everything they want to
   commit in the next commit and cat-ing the necessary objects in
   advance.

However, that can be achieved by simply allowing cat-blob requests to
appear whenever a filemodify directive is allowed.  Further, it avoids
setting a bad precedent for other commands to follow (e.g. get-mark); a
precedent which caused parsing problems in corner cases.

Technically, inline filemodify directives add a slight wrinkle in that
frontends might want to have cat-blob directives appear after the start
of the filemodify and before the data directive contained within it.  I
think it would have been better to disallow such a case (it would be
trivial to use cat-blob before the filemodify instead), but since there
is evidence this was used, for backwards compatibility let's support
that case too.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt |  7 ++++---
 fast-import.c                     | 19 +++++++++++++------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f7e2d330b1..982f82b0b3 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1001,9 +1001,10 @@ Output uses the same format as `git cat-file --batch`:
 	<contents> LF
 ====
 
-This command can be used anywhere in the stream that comments are
-accepted.  In particular, the `cat-blob` command can be used in the
-middle of a commit but not in the middle of a `data` command.
+This command can be used where a `filemodify` directive can appear,
+allowing it to be used in the middle of a commit.  For a `filemodify`
+using an inline directive, it can also appear right before the `data`
+directive.
 
 See ``Responses To Commands'' below for details about how to read
 this output safely.
diff --git a/fast-import.c b/fast-import.c
index 3114ce17f1..338db61e6e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1786,10 +1786,6 @@ static int read_next_command(void)
 			parse_get_mark(p);
 			continue;
 		}
-		if (skip_prefix(command_buf.buf, "cat-blob ", &p)) {
-			parse_cat_blob(p);
-			continue;
-		}
 		if (command_buf.buf[0] == '#')
 			continue;
 		return 0;
@@ -2254,8 +2250,15 @@ static void file_change_m(const char *p, struct branch *b)
 			strbuf_addstr(&uq, p);
 			p = uq.buf;
 		}
-		read_next_command();
-		parse_and_store_blob(&last_blob, &oid, 0);
+		while (read_next_command() != EOF) {
+			const char *v;
+			if (skip_prefix(command_buf.buf, "cat-blob ", &v))
+				parse_cat_blob(v);
+			else {
+				parse_and_store_blob(&last_blob, &oid, 0);
+				break;
+			}
+		}
 	} else {
 		enum object_type expected = S_ISDIR(mode) ?
 						OBJ_TREE: OBJ_BLOB;
@@ -2627,6 +2630,8 @@ static void parse_new_commit(const char *arg)
 			file_change_deleteall(b);
 		else if (skip_prefix(command_buf.buf, "ls ", &v))
 			parse_ls(v, b);
+		else if (skip_prefix(command_buf.buf, "cat-blob ", &v))
+			parse_cat_blob(v);
 		else {
 			unread_command_buf = 1;
 			break;
@@ -3311,6 +3316,8 @@ int cmd_main(int argc, const char **argv)
 			parse_reset_branch(v);
 		else if (skip_prefix(command_buf.buf, "ls ", &v))
 			parse_ls(v, NULL);
+		else if (skip_prefix(command_buf.buf, "cat-blob ", &v))
+			parse_cat_blob(v);
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!strcmp("done", command_buf.buf))
-- 
2.21.0.1.g0a561c1dbd


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits
  2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
                     ` (3 preceding siblings ...)
  2019-04-01 16:00   ` [PATCH v2 4/5] fast-import: only allow cat-blob requests where it makes sense Elijah Newren
@ 2019-04-01 16:00   ` Elijah Newren
  4 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2019-04-01 16:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michael Haggerty, Jonathan Nieder, Jeff King,
	David Barr, Elijah Newren

When get-mark was introduced in commit 28c7b1f7b7b7 ("fast-import: add a
get-mark command", 2015-07-01), it followed the precedent of the
cat-blob command to be allowed on any line other than in the middle of a
data directive; see commit 777f80d7429b ("fast-import: Allow cat-blob
requests at arbitrary points in stream", 2010-11-28).  It was useful to
allow cat-blob directives in the middle of a commit to get more data
that would be used in writing the current commit object.  get-mark is
not similarly useful since fast-import can already use either object id
or mark.  Further, trying to allow this command anywhere caused parsing
bugs.  Fix the parsing problems by only allowing get-mark commands to
appear when other commands have completed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-import.txt | 4 ----
 fast-import.c                     | 8 ++------
 t/t9300-fast-import.sh            | 4 ++--
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 982f82b0b3..33cce1e150 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -971,10 +971,6 @@ might want to refer to in their commit messages.
 	'get-mark' SP ':' <idnum> LF
 ....
 
-This command can be used anywhere in the stream that comments are
-accepted.  In particular, the `get-mark` command can be used in the
-middle of a commit but not in the middle of a `data` command.
-
 See ``Responses To Commands'' below for details about how to read
 this output safely.
 
diff --git a/fast-import.c b/fast-import.c
index 338db61e6e..064c55e8be 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1748,8 +1748,6 @@ static int read_next_command(void)
 	}
 
 	for (;;) {
-		const char *p;
-
 		if (unread_command_buf) {
 			unread_command_buf = 0;
 		} else {
@@ -1782,10 +1780,6 @@ static int read_next_command(void)
 			rc->prev->next = rc;
 			cmd_tail = rc;
 		}
-		if (skip_prefix(command_buf.buf, "get-mark ", &p)) {
-			parse_get_mark(p);
-			continue;
-		}
 		if (command_buf.buf[0] == '#')
 			continue;
 		return 0;
@@ -3318,6 +3312,8 @@ int cmd_main(int argc, const char **argv)
 			parse_ls(v, NULL);
 		else if (skip_prefix(command_buf.buf, "cat-blob ", &v))
 			parse_cat_blob(v);
+		else if (skip_prefix(command_buf.buf, "get-mark ", &v))
+			parse_get_mark(v);
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!strcmp("done", command_buf.buf))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index c304c8c47c..3668263c40 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3276,11 +3276,11 @@ cat >>W-input <<-W_INPUT_END
 	LFsget-mark :1
 	W_INPUT_END
 
-test_expect_failure !MINGW 'W: get-mark & empty orphan commit with no newlines' '
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with no newlines' '
 	sed -e s/LFs// W-input | tr L "\n" | git fast-import
 '
 
-test_expect_failure !MINGW 'W: get-mark & empty orphan commit with one newline' '
+test_expect_success !MINGW 'W: get-mark & empty orphan commit with one newline' '
 	sed -e s/LFs/L/ W-input | tr L "\n" | git fast-import
 '
 
-- 
2.21.0.1.g0a561c1dbd


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-04-01 16:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-20 22:58 [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
2019-02-20 22:58 ` [RFC PATCH 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
2019-02-20 22:58 ` [RFC PATCH 2/5] git-fast-import.txt: fix wording about where ls command can appear Elijah Newren
2019-02-20 22:58 ` [RFC PATCH 3/5] fast-import: check most prominent commands first Elijah Newren
2019-02-20 22:58 ` [RFC PATCH 4/5] fast-import: only allow cat-blob requests where it makes sense Elijah Newren
2019-02-20 22:58 ` [RFC PATCH 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits Elijah Newren
2019-03-08 16:53 ` [RFC PATCH 0/5] Fix some fast-import parsing issues Elijah Newren
2019-04-01 10:44   ` Junio C Hamano
2019-04-01 15:52     ` Elijah Newren
2019-04-01 16:00 ` [PATCH v2 " Elijah Newren
2019-04-01 16:00   ` [PATCH v2 1/5] t9300: demonstrate bug with get-mark and empty orphan commits Elijah Newren
2019-04-01 16:00   ` [PATCH v2 2/5] git-fast-import.txt: fix wording about where ls command can appear Elijah Newren
2019-04-01 16:00   ` [PATCH v2 3/5] fast-import: check most prominent commands first Elijah Newren
2019-04-01 16:00   ` [PATCH v2 4/5] fast-import: only allow cat-blob requests where it makes sense Elijah Newren
2019-04-01 16:00   ` [PATCH v2 5/5] fast-import: fix erroneous handling of get-mark with empty orphan commits Elijah Newren

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.