All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fast-export: fix surprising behavior with --first-parent
@ 2021-11-23 11:28 William Sprent via GitGitGadget
  2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: William Sprent via GitGitGadget @ 2021-11-23 11:28 UTC (permalink / raw)
  To: git; +Cc: William Sprent, William Sprent

From: William Sprent <williams@unity3d.com>

When invoking git-fast-export with the --first-parent flag on a branch
with merges, fast-export would early-out on processing the first merge
on the branch. If combined with --reverse, fast-export would instead
output all single parent commits on the branch.

This commit makes fast-export output the same commits as rev-list
--first-parent, and makes --reverse not have an effect on which commits
are output.

The fix involves removing logic within fast-export which was responsible
for ensuring that parents are processed before their children, which was
what was exiting early due to missing second parents. This is replaced
by setting 'reverse = 1' before revision walking, which, in conjuction
with topo_order, allows for delegating the ordering of commits to
revision.c. The reverse flag is set after parsing rev-list arguments to
avoid having it disabled.

Signed-off-by: William Sprent <williams@unity3d.com>
---
    fast-export: fix surprising behavior with --first-parent
    
    Hi,
    
    This is my first time patching git, so I probably need some guidance on
    my approach. :)
    
    I've noticed that git fast-export exhibits some odd behavior when passed
    the --first-parent flag. In the repository I was working on, it would
    only output the initial commit before exiting. Moreover, adding the
    --reverse flag causes it to behave differently and instead output all
    commits in the first parent line that only have one parent. My
    expectation is more or less that git fast-export should output the same
    set of commits as git rev-list would output given the same arguments,
    which matches how it acts when given revision ranges.
    
    It seems like this behavior comes from the fact that has_unshown_parents
    will never be false for any merge commits encountered when fast-export
    is called with --first-parent. This causes the while loop to follow the
    pattern of pushing all commits into the "commits" queue until the
    initial commit is encountered, at which point it calls handle_tail which
    falls through on the first merge commit, causing most of the commits to
    be unhandled.
    
    My impression is that this logic only serves to ensure that parents are
    processed before their children, so in my patch I've opted to fix the
    issue by delegating that responsibility to revision.c by adding the
    reverse flag before performing the revision walk. From what I can see,
    this should be equivalent to what the previous logic is trying to
    achieve, but I can also see that there could be risk in these changes.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1084

 builtin/fast-export.c  | 36 ++----------------------------------
 t/t9350-fast-export.sh | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 34 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 8e2caf72819..50f8c224b6e 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
 
 static struct decoration idnums;
 static uint32_t last_idnum;
-
-static int has_unshown_parent(struct commit *commit)
-{
-	struct commit_list *parent;
-
-	for (parent = commit->parents; parent; parent = parent->next)
-		if (!(parent->item->object.flags & SHOWN) &&
-		    !(parent->item->object.flags & UNINTERESTING))
-			return 1;
-	return 0;
-}
-
 struct anonymized_entry {
 	struct hashmap_entry hash;
 	const char *anon;
@@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
 	return strbuf_detach(&out, NULL);
 }
 
-static void handle_tail(struct object_array *commits, struct rev_info *revs,
-			struct string_list *paths_of_changed_objects)
-{
-	struct commit *commit;
-	while (commits->nr) {
-		commit = (struct commit *)object_array_pop(commits);
-		if (has_unshown_parent(commit)) {
-			/* Queue again, to be handled later */
-			add_object_array(&commit->object, NULL, commits);
-			return;
-		}
-		handle_commit(commit, revs, paths_of_changed_objects);
-	}
-}
 
 static void handle_tag(const char *name, struct tag *tag)
 {
@@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
 int cmd_fast_export(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
-	struct object_array commits = OBJECT_ARRAY_INIT;
 	struct commit *commit;
 	char *export_filename = NULL,
 	     *import_filename = NULL,
@@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 
 	get_tags_and_duplicates(&revs.cmdline);
 
+	revs.reverse = 1;
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	revs.diffopt.format_callback = show_filemodify;
 	revs.diffopt.format_callback_data = &paths_of_changed_objects;
 	revs.diffopt.flags.recursive = 1;
 	while ((commit = get_revision(&revs))) {
-		if (has_unshown_parent(commit)) {
-			add_object_array(&commit->object, NULL, &commits);
-		}
-		else {
-			handle_commit(commit, &revs, &paths_of_changed_objects);
-			handle_tail(&commits, &revs, &paths_of_changed_objects);
-		}
+		handle_commit(commit, &revs, &paths_of_changed_objects);
 	}
 
 	handle_tags_and_duplicates(&extra_refs);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 409b48e2442..bd08101b81d 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -750,4 +750,34 @@ test_expect_success 'merge commit gets exported with --import-marks' '
 	)
 '
 
+
+test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
+	git init first-parent &&
+	cd first-parent &&
+	test_commit init &&
+	git checkout -b topic1 &&
+	test_commit file2 file2.txt &&
+	git checkout main &&
+	git merge topic1 --no-ff &&
+
+	git checkout -b topic2 &&
+	test_commit file3 file3.txt &&
+	git checkout main &&
+	git merge topic2 --no-ff &&
+
+	test_commit branch-head &&
+
+	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&
+
+	git fast-export main -- --first-parent > first-parent-export &&
+	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&
+	git init import && cd import &&
+	cat ../first-parent-export | git fast-import &&
+
+	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&
+	test $(git rev-list --all | wc -l) -eq 4 &&
+	test_cmp ../expected actual &&
+	test_cmp ../first-parent-export ../first-parent-reverse-export
+'
+
 test_done

base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] fast-export: fix surprising behavior with --first-parent
  2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
@ 2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
  2021-11-24 11:57   ` William Sprent
  2021-11-23 19:14 ` Elijah Newren
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-23 13:07 UTC (permalink / raw)
  To: William Sprent via GitGitGadget; +Cc: git, William Sprent


On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote:

> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.
>
> This commit makes fast-export output the same commits as rev-list
> --first-parent, and makes --reverse not have an effect on which commits
> are output.
>
> The fix involves removing logic within fast-export which was responsible
> for ensuring that parents are processed before their children, which was
> what was exiting early due to missing second parents. This is replaced
> by setting 'reverse = 1' before revision walking, which, in conjuction
> with topo_order, allows for delegating the ordering of commits to
> revision.c. The reverse flag is set after parsing rev-list arguments to
> avoid having it disabled.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>     
>     Hi,
>     
>     This is my first time patching git, so I probably need some guidance on
>     my approach. :)

Hi, thanks for your first contribution to git. This is a rather shallow
review, a deeper one is much deserved.

I notice that you're removing code in builtin/fast-export.c, presumably
we have code in revision.c that does the same thing. It would really
help a reviewer for you to dig a bit into the relevant commit history
and note it in the commit message.

I.e. could revision.c always do this, and this was always needless
duplication, or at time X it was needed, but as of Y revision.c learned
to do this, and callers A, B and C were adjusted, but just not this
missed call D? etc.

> -static int has_unshown_parent(struct commit *commit)
> -{
> -	struct commit_list *parent;
> -
> -	for (parent = commit->parents; parent; parent = parent->next)
> -		if (!(parent->item->object.flags & SHOWN) &&
> -		    !(parent->item->object.flags & UNINTERESTING))
> -			return 1;
> -	return 0;
> -}
> -
>  struct anonymized_entry {
>  	struct hashmap_entry hash;
>  	const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>  	return strbuf_detach(&out, NULL);
>  }
>  
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -			struct string_list *paths_of_changed_objects)
> -{
> -	struct commit *commit;
> -	while (commits->nr) {
> -		commit = (struct commit *)object_array_pop(commits);
> -		if (has_unshown_parent(commit)) {
> -			/* Queue again, to be handled later */
> -			add_object_array(&commit->object, NULL, commits);
> -			return;
> -		}
> -		handle_commit(commit, revs, paths_of_changed_objects);
> -	}
> -}

...

>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>  	struct rev_info revs;
> -	struct object_array commits = OBJECT_ARRAY_INIT;
>  	struct commit *commit;
>  	char *export_filename = NULL,
>  	     *import_filename = NULL,
> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  
>  	get_tags_and_duplicates(&revs.cmdline);
>  
> +	revs.reverse = 1;

Is the placement of revs.reverse = 1 here important, or could it go
earlier after init_revision_sources() when we assign some other values
ir revs?

>  	if (prepare_revision_walk(&revs))
>  		die("revision walk setup failed");

A light reading of prepare_revision_walk() suggests it could come after,
but maybe I'm entirely wrong.

>  	revs.diffopt.format_callback = show_filemodify;
>  	revs.diffopt.format_callback_data = &paths_of_changed_objects;
>  	revs.diffopt.flags.recursive = 1;
>  	while ((commit = get_revision(&revs))) {
> -		if (has_unshown_parent(commit)) {
> -			add_object_array(&commit->object, NULL, &commits);
> -		}
> -		else {
> -			handle_commit(commit, &revs, &paths_of_changed_objects);
> -			handle_tail(&commits, &revs, &paths_of_changed_objects);
> -		}
> +		handle_commit(commit, &revs, &paths_of_changed_objects);
>  	}
>  

Yay code deletion, good if it works (I didn't check).

Since this is just a one-statement while-loop we can also remove its
braces now.

> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +	git init first-parent &&
> +	cd first-parent &&

Do any such "cd" in a sub-shell:

	git init x &&
	(
    		cd x &&
                ...
	)
Otherwise the next test after you is going to run in anotherdirectory.

> +	test_commit init &&
> +	git checkout -b topic1 &&
> +	test_commit file2 file2.txt &&
> +	git checkout main &&
> +	git merge topic1 --no-ff &&
> +
> +	git checkout -b topic2 &&
> +	test_commit file3 file3.txt &&
> +	git checkout main &&
> +	git merge topic2 --no-ff &&

Just a nit. I'd use "test_commit A", "test_commit B" etc. when the
filenames etc. aren't important. There's no subsequent reference here,
so I assume they're not.

> +	test_commit branch-head &&
> +
> +	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&

nit; >expected, not > expected is the usual style.
> +
> +	git fast-export main -- --first-parent > first-parent-export &&
> +	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&

ditto:

> +	git init import && cd import &&

ditto earlier "cd" comment.

> +	cat ../first-parent-export | git fast-import &&

Instead of "cat x | prog" do "prog <x".

> +	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&

> +	test $(git rev-list --all | wc -l) -eq 4 &&

Instead:

    git rev-list --all >tmp &&
    test_line_count = 4 tmp

(for some value of tmp)

> +	test_cmp ../expected actual &&
> +	test_cmp ../first-parent-export ../first-parent-reverse-export
> +'

Maybe some of the CD-ing around here wouldu be easier by not doing that
and instead running e.g.:

    git -C subdir fast-export >file-not-in-subdir &&
    ...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fast-export: fix surprising behavior with --first-parent
  2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
  2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
@ 2021-11-23 19:14 ` Elijah Newren
  2021-11-24 13:05   ` William Sprent
  2021-11-24  0:41 ` Junio C Hamano
  2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
  3 siblings, 1 reply; 18+ messages in thread
From: Elijah Newren @ 2021-11-23 19:14 UTC (permalink / raw)
  To: William Sprent via GitGitGadget
  Cc: Git Mailing List, William Sprent, Johannes Schindelin

On Tue, Nov 23, 2021 at 5:25 AM William Sprent via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.
>
> This commit makes fast-export output the same commits as rev-list
> --first-parent, and makes --reverse not have an effect on which commits
> are output.
>
> The fix involves removing logic within fast-export which was responsible
> for ensuring that parents are processed before their children, which was
> what was exiting early due to missing second parents. This is replaced
> by setting 'reverse = 1' before revision walking, which, in conjuction
> with topo_order, allows for delegating the ordering of commits to
> revision.c. The reverse flag is set after parsing rev-list arguments to
> avoid having it disabled.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>
>     Hi,
>
>     This is my first time patching git, so I probably need some guidance on
>     my approach. :)
>
>     I've noticed that git fast-export exhibits some odd behavior when passed
>     the --first-parent flag. In the repository I was working on, it would
>     only output the initial commit before exiting. Moreover, adding the
>     --reverse flag causes it to behave differently and instead output all
>     commits in the first parent line that only have one parent. My
>     expectation is more or less that git fast-export should output the same
>     set of commits as git rev-list would output given the same arguments,
>     which matches how it acts when given revision ranges.
>
>     It seems like this behavior comes from the fact that has_unshown_parents
>     will never be false for any merge commits encountered when fast-export
>     is called with --first-parent. This causes the while loop to follow the
>     pattern of pushing all commits into the "commits" queue until the
>     initial commit is encountered, at which point it calls handle_tail which
>     falls through on the first merge commit, causing most of the commits to
>     be unhandled.
>
>     My impression is that this logic only serves to ensure that parents are
>     processed before their children, so in my patch I've opted to fix the
>     issue by delegating that responsibility to revision.c by adding the
>     reverse flag before performing the revision walk. From what I can see,
>     this should be equivalent to what the previous logic is trying to
>     achieve, but I can also see that there could be risk in these changes.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1084
>
>  builtin/fast-export.c  | 36 ++----------------------------------
>  t/t9350-fast-export.sh | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 8e2caf72819..50f8c224b6e 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
>
>  static struct decoration idnums;
>  static uint32_t last_idnum;
> -
> -static int has_unshown_parent(struct commit *commit)
> -{
> -       struct commit_list *parent;
> -
> -       for (parent = commit->parents; parent; parent = parent->next)
> -               if (!(parent->item->object.flags & SHOWN) &&
> -                   !(parent->item->object.flags & UNINTERESTING))
> -                       return 1;
> -       return 0;
> -}
> -
>  struct anonymized_entry {
>         struct hashmap_entry hash;
>         const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>         return strbuf_detach(&out, NULL);
>  }
>
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -                       struct string_list *paths_of_changed_objects)
> -{
> -       struct commit *commit;
> -       while (commits->nr) {
> -               commit = (struct commit *)object_array_pop(commits);
> -               if (has_unshown_parent(commit)) {
> -                       /* Queue again, to be handled later */
> -                       add_object_array(&commit->object, NULL, commits);
> -                       return;
> -               }
> -               handle_commit(commit, revs, paths_of_changed_objects);
> -       }
> -}
>
>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>         struct rev_info revs;
> -       struct object_array commits = OBJECT_ARRAY_INIT;
>         struct commit *commit;
>         char *export_filename = NULL,
>              *import_filename = NULL,
> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>
>         get_tags_and_duplicates(&revs.cmdline);
>
> +       revs.reverse = 1;
>         if (prepare_revision_walk(&revs))
>                 die("revision walk setup failed");
>         revs.diffopt.format_callback = show_filemodify;
>         revs.diffopt.format_callback_data = &paths_of_changed_objects;
>         revs.diffopt.flags.recursive = 1;
>         while ((commit = get_revision(&revs))) {
> -               if (has_unshown_parent(commit)) {
> -                       add_object_array(&commit->object, NULL, &commits);
> -               }
> -               else {
> -                       handle_commit(commit, &revs, &paths_of_changed_objects);
> -                       handle_tail(&commits, &revs, &paths_of_changed_objects);
> -               }
> +               handle_commit(commit, &revs, &paths_of_changed_objects);

Wow, interesting.  I did some related work that I never submitted at
https://github.com/newren/git/commit/08f799b4667de1c755c78e1ea1657f946b588556
-- in that commit, I also noticed that fast-export did not seem
careful enough about making sure to process all commits, and came up
with a much different fix.  However, in my case, I didn't know how to
trigger the problems in fast-export without some additional changes I
was trying to make, and since I never got those other changes working,
I didn't think it was worth submitting my fix.  My solution is more
complex, in part because it was designed to handle ordering
requirements & recommendations other than just ancestry.  However, we
don't currently have any additional ordering requirements (the current
code only considers ancestry), so your solution is much simpler.  If I
do at some point get my other changes working, I'd need to
re-introduce the queue and handle_tail() and my changes to these, but
that can always be done later if and when I get those other changes
working.

Interestingly, Dscho added both the --reverse ability to revision.c
and the commits object_array and the handle_tail() stuff in
fast-export.c.  Both were done in the same year, and --reverse came
first.  See 9c5e66e97da8 (Teach revision machinery about --reverse,
2007-01-20) and f2dc849e9c5f (Add 'git fast-export', the sister of
'git fast-import', 2007-12-02).  It's not clear why revs.reverse = 1
wasn't used previously, although it certainly didn't occur to me
either and I think it would have been an alternative solution to my
first ever git.git contribution -- 784f8affe4df (fast-export: ensure
we traverse commits in topological order, 2009-02-10).  (Though
rev.reverse = 1 wouldn't have provided the same speedups that
rev.topo_order=1 provided.)

I can't think of any cases where this would cause any problems, and it
seems like using rev.reverse is a pretty clean solution.  Just in case
I'm missing something, though, it would be nice to get Dscho to also
comment on this.  I'm cc'ing him.

(Also, I see that Ævar has already provided some good feedback on the
testcase, so I'll stop here.)

>         }
>
>         handle_tags_and_duplicates(&extra_refs);
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index 409b48e2442..bd08101b81d 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -750,4 +750,34 @@ test_expect_success 'merge commit gets exported with --import-marks' '
>         )
>  '
>
> +
> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +       git init first-parent &&
> +       cd first-parent &&
> +       test_commit init &&
> +       git checkout -b topic1 &&
> +       test_commit file2 file2.txt &&
> +       git checkout main &&
> +       git merge topic1 --no-ff &&
> +
> +       git checkout -b topic2 &&
> +       test_commit file3 file3.txt &&
> +       git checkout main &&
> +       git merge topic2 --no-ff &&
> +
> +       test_commit branch-head &&
> +
> +       git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&
> +
> +       git fast-export main -- --first-parent > first-parent-export &&
> +       git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&
> +       git init import && cd import &&
> +       cat ../first-parent-export | git fast-import &&
> +
> +       git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&
> +       test $(git rev-list --all | wc -l) -eq 4 &&
> +       test_cmp ../expected actual &&
> +       test_cmp ../first-parent-export ../first-parent-reverse-export
> +'
> +
>  test_done
>
> base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fast-export: fix surprising behavior with --first-parent
  2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
  2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
  2021-11-23 19:14 ` Elijah Newren
@ 2021-11-24  0:41 ` Junio C Hamano
  2021-11-24 13:05   ` William Sprent
  2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
  3 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2021-11-24  0:41 UTC (permalink / raw)
  To: William Sprent via GitGitGadget; +Cc: git, William Sprent

"William Sprent via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.

I do not doubt we would want to make the command behave sensibly
with all options it accepts, but let me first ask a more basic and
possibly stupid question.

What is "git fast-export --first-parent <options>" supposed to do
differently from "git fast-export <options>" (with the same set of
options other than "--first-parent")?  Should it omit merge commits
altogether, pretending that the first single-parent ancestor it
finds on the first parent chain is a direct parent of a
single-parent descendant, e.g. if the real history were with two
single-parente commits A and B, with two merges M and N, on the
mainline, making the resulting commits into a single strand of two
pearls, with A and B before and after the rewrite to have the same
tree objects?

    ---A---M---N---B             ---A---B
          /   /           ==>
         X   Y

Or should it pretend merge commits have only their first parent as
their parents, i.e.

    ---A---M---N---B             ---A---M---N---B
          /   /           ==>
         X   Y

"git fast-export --help" does not even mention "--first-parent" and
pretend that any and all [<git-rev-list-args>...] are valid requests
to make to the command, but I am wondering if that is what we intend
to support in the first place.  In builtin/fast-export.c, I do not
see any attempt to do anything differently when "--first-parent" is
requested.  Perhaps we shouldn't be even taking "--first-parent" as
an option to begin with.

The "--reverse" feels even more iffy.  Are we reversing the history
with such an export, i.e. pretending that parents are children and
merges are forks?

    ---A---M---N---B             B---N---M---A---
          /   /           ==>         \   \
         X   Y                         X   Y

Or are we supposed to produce the same history in the end, just
spewing older commits first in the output stream?  I am not sure
what purpose such a request serves---the "fast-import" downstream
would need the same set of objects before it can create each commit
anyway, so I am not sure what the point of giving "--reverse" is.

If there is no sensible interpretation for some of the options that
are valid in rev-list in the context of "fast-export" command, should
we just error out when we parse the command line, instead of producing
nonsense output stream, I wonder.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fast-export: fix surprising behavior with --first-parent
  2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
@ 2021-11-24 11:57   ` William Sprent
  0 siblings, 0 replies; 18+ messages in thread
From: William Sprent @ 2021-11-24 11:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, William Sprent via GitGitGadget
  Cc: git

On 23/11/2021 14.07, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote:
> 
>> From: William Sprent <williams@unity3d.com>
>>
>> When invoking git-fast-export with the --first-parent flag on a branch
>> with merges, fast-export would early-out on processing the first merge
>> on the branch. If combined with --reverse, fast-export would instead
>> output all single parent commits on the branch.
>>
>> This commit makes fast-export output the same commits as rev-list
>> --first-parent, and makes --reverse not have an effect on which commits
>> are output.
>>
>> The fix involves removing logic within fast-export which was responsible
>> for ensuring that parents are processed before their children, which was
>> what was exiting early due to missing second parents. This is replaced
>> by setting 'reverse = 1' before revision walking, which, in conjuction
>> with topo_order, allows for delegating the ordering of commits to
>> revision.c. The reverse flag is set after parsing rev-list arguments to
>> avoid having it disabled.
>>
>> Signed-off-by: William Sprent <williams@unity3d.com>
>> ---
>>      fast-export: fix surprising behavior with --first-parent
>>      
>>      Hi,
>>      
>>      This is my first time patching git, so I probably need some guidance on
>>      my approach. :)
> 
> Hi, thanks for your first contribution to git. This is a rather shallow
> review, a deeper one is much deserved.
> > I notice that you're removing code in builtin/fast-export.c, presumably
> we have code in revision.c that does the same thing. It would really
> help a reviewer for you to dig a bit into the relevant commit history
> and note it in the commit message.
> 
> I.e. could revision.c always do this, and this was always needless
> duplication, or at time X it was needed, but as of Y revision.c learned
> to do this, and callers A, B and C were adjusted, but just not this
> missed call D? etc.
> 

That's a really good suggestion. I should've done that. I did dig just 
enough to see that the logic has been around since fast-export was 
introduced, but I didn't check whether the 'reverse' option was part of 
revision.c at that point. I see that Elijah has done that homework for 
me later in this thread and discovered that --reverse was introduce a 
year or so before fast-export though.

>> -static int has_unshown_parent(struct commit *commit)
>> -{
>> -	struct commit_list *parent;
>> -
>> -	for (parent = commit->parents; parent; parent = parent->next)
>> -		if (!(parent->item->object.flags & SHOWN) &&
>> -		    !(parent->item->object.flags & UNINTERESTING))
>> -			return 1;
>> -	return 0;
>> -}
>> -
>>   struct anonymized_entry {
>>   	struct hashmap_entry hash;
>>   	const char *anon;
>> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>>   	return strbuf_detach(&out, NULL);
>>   }
>>   
>> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
>> -			struct string_list *paths_of_changed_objects)
>> -{
>> -	struct commit *commit;
>> -	while (commits->nr) {
>> -		commit = (struct commit *)object_array_pop(commits);
>> -		if (has_unshown_parent(commit)) {
>> -			/* Queue again, to be handled later */
>> -			add_object_array(&commit->object, NULL, commits);
>> -			return;
>> -		}
>> -		handle_commit(commit, revs, paths_of_changed_objects);
>> -	}
>> -}
> 
> ...
> 
>>   static void handle_tag(const char *name, struct tag *tag)
>>   {
>> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>>   int cmd_fast_export(int argc, const char **argv, const char *prefix)
>>   {
>>   	struct rev_info revs;
>> -	struct object_array commits = OBJECT_ARRAY_INIT;
>>   	struct commit *commit;
>>   	char *export_filename = NULL,
>>   	     *import_filename = NULL,
>> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>>   
>>   	get_tags_and_duplicates(&revs.cmdline);
>>   
>> +	revs.reverse = 1;
> 
> Is the placement of revs.reverse = 1 here important, or could it go
> earlier after init_revision_sources() when we assign some other values
> ir revs?
> 
>>   	if (prepare_revision_walk(&revs))
>>   		die("revision walk setup failed");
> 
> A light reading of prepare_revision_walk() suggests it could come after,
> but maybe I'm entirely wrong.

It needs to go after the setup_revisions() call as otherwise a --reverse 
passed to fast-export will toggle the option off again. You are right in 
that it can be moved down. I've done that in my working directory for 
the next patch.

Another option would be to move the revs.reverse up as you suggest and 
then then call die() if it was toggled off by setup_revisions(). The 
only downside I can think of is that it would make any current usage of 
'fast-export --reverse' go from a no-op to an error.

>>   	revs.diffopt.format_callback = show_filemodify;
>>   	revs.diffopt.format_callback_data = &paths_of_changed_objects;
>>   	revs.diffopt.flags.recursive = 1;
>>   	while ((commit = get_revision(&revs))) {
>> -		if (has_unshown_parent(commit)) {
>> -			add_object_array(&commit->object, NULL, &commits);
>> -		}
>> -		else {
>> -			handle_commit(commit, &revs, &paths_of_changed_objects);
>> -			handle_tail(&commits, &revs, &paths_of_changed_objects);
>> -		}
>> +		handle_commit(commit, &revs, &paths_of_changed_objects);
>>   	}
>>   
> 
> Yay code deletion, good if it works (I didn't check).
> 
> Since this is just a one-statement while-loop we can also remove its
> braces now.
>  >> +test_expect_success 'fast-export --first-parent outputs all 
revisions output by revision walk' '
>> +	git init first-parent &&
>> +	cd first-parent &&
> 
> Do any such "cd" in a sub-shell:
> 
> 	git init x &&
> 	(
>      		cd x &&
>                  ...
> 	)
> Otherwise the next test after you is going to run in anotherdirectory.
> 
>> +	test_commit init &&
>> +	git checkout -b topic1 &&
>> +	test_commit file2 file2.txt &&
>> +	git checkout main &&
>> +	git merge topic1 --no-ff &&
>> +
>> +	git checkout -b topic2 &&
>> +	test_commit file3 file3.txt &&
>> +	git checkout main &&
>> +	git merge topic2 --no-ff &&
> 
> Just a nit. I'd use "test_commit A", "test_commit B" etc. when the
> filenames etc. aren't important. There's no subsequent reference here,
> so I assume they're not.
> 
>> +	test_commit branch-head &&
>> +
>> +	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&
> 
> nit; >expected, not > expected is the usual style.
>> +
>> +	git fast-export main -- --first-parent > first-parent-export &&
>> +	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&
> 
> ditto:
> 
>> +	git init import && cd import &&
> 
> ditto earlier "cd" comment.
> 
>> +	cat ../first-parent-export | git fast-import &&
> 
> Instead of "cat x | prog" do "prog <x".
> 
>> +	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&
> 
>> +	test $(git rev-list --all | wc -l) -eq 4 &&
> 
> Instead:
> 
>      git rev-list --all >tmp &&
>      test_line_count = 4 tmp
> 
> (for some value of tmp)
> 
>> +	test_cmp ../expected actual &&
>> +	test_cmp ../first-parent-export ../first-parent-reverse-export
>> +'
> 
> Maybe some of the CD-ing around here wouldu be easier by not doing that
> and instead running e.g.:
> 
>      git -C subdir fast-export >file-not-in-subdir &&
>      ...
> 

Thanks for taking the time to give your feedback. :) I'll remove those 
braces from the while loop and incorporate your comments about the test 
for v2.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fast-export: fix surprising behavior with --first-parent
  2021-11-24  0:41 ` Junio C Hamano
@ 2021-11-24 13:05   ` William Sprent
  0 siblings, 0 replies; 18+ messages in thread
From: William Sprent @ 2021-11-24 13:05 UTC (permalink / raw)
  To: Junio C Hamano, William Sprent via GitGitGadget; +Cc: git

On 24/11/2021 01.41, Junio C Hamano wrote:
> "William Sprent via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: William Sprent <williams@unity3d.com>
>>
>> When invoking git-fast-export with the --first-parent flag on a branch
>> with merges, fast-export would early-out on processing the first merge
>> on the branch. If combined with --reverse, fast-export would instead
>> output all single parent commits on the branch.
> 
> I do not doubt we would want to make the command behave sensibly
> with all options it accepts, but let me first ask a more basic and
> possibly stupid question.
> 
> What is "git fast-export --first-parent <options>" supposed to do
> differently from "git fast-export <options>" (with the same set of
> options other than "--first-parent")?  Should it omit merge commits
> altogether, pretending that the first single-parent ancestor it
> finds on the first parent chain is a direct parent of a
> single-parent descendant, e.g. if the real history were with two
> single-parente commits A and B, with two merges M and N, on the
> mainline, making the resulting commits into a single strand of two
> pearls, with A and B before and after the rewrite to have the same
> tree objects?
> 
>      ---A---M---N---B             ---A---B
>            /   /           ==>
>           X   Y
> 
> Or should it pretend merge commits have only their first parent as
> their parents, i.e.
> 
>      ---A---M---N---B             ---A---M---N---B
>            /   /           ==>
>           X   Y
> 
> "git fast-export --help" does not even mention "--first-parent" and
> pretend that any and all [<git-rev-list-args>...] are valid requests
> to make to the command, but I am wondering if that is what we intend
> to support in the first place.  In builtin/fast-export.c, I do not
> see any attempt to do anything differently when "--first-parent" is
> requested.  Perhaps we shouldn't be even taking "--first-parent" as
> an option to begin with.
> The "--reverse" feels even more iffy.  Are we reversing the history
> with such an export, i.e. pretending that parents are children and
> merges are forks?
> 
>      ---A---M---N---B             B---N---M---A---
>            /   /           ==>         \   \
>           X   Y                         X   Y
> 
> Or are we supposed to produce the same history in the end, just
> spewing older commits first in the output stream?  I am not sure
> what purpose such a request serves---the "fast-import" downstream
> would need the same set of objects before it can create each commit
> anyway, so I am not sure what the point of giving "--reverse" is.
> 
> If there is no sensible interpretation for some of the options that
> are valid in rev-list in the context of "fast-export" command, should
> we just error out when we parse the command line, instead of producing
> nonsense output stream, I wonder.
> 


I agree with the concerns. I just skimmed the list of flags that git 
rev-list take, and I'm pretty sure that there are both flags that don't 
make sense at all in the context of fast-export, and that there are 
flags where it is unclear what the behavior of fast-export would be when 
passed.

However, I do think that having git fast-export support history limiting 
is useful. And I also think that the workflow of crafting a git rev-list 
command (perhaps using --graph) which outputs the part of history you 
want, and then applying it to git fast-export is fairly straight 
forward. But I also agree that "git fast-export --reverse" is nonsense.

I've thought about this a bit, and I wonder if having "git fast-export" 
accept revisions on stdin in a similar format as "git rev-list 
--parents" outputs would be an API that would be flexible enough, but 
without the oddities of allowing all rev-list flags. Or maybe there 
should be a list of acceptable rev-list flags which fast-export should 
accept. I don't really know.

For the specific question of what "--first-parent" should output, my 
thinking is that I would expect "git fast-export --first-parent" to 
output the same set of commits as "git rev-list --first-parent", which 
would be latter of your examples, i.e.

      ---A---M---N---B

Similarly, I guess if the user wanted

      ---A---B

then they could pass "--no-merges" as well, which would leave out the 
merge commits.

With regards to this patch in particular, we now overrides any 
"--reverse" flag that the user passes, so I can make "git fast-export 
--reverse" cause an error while I'm at it, if that is desired behavior.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fast-export: fix surprising behavior with --first-parent
  2021-11-23 19:14 ` Elijah Newren
@ 2021-11-24 13:05   ` William Sprent
  0 siblings, 0 replies; 18+ messages in thread
From: William Sprent @ 2021-11-24 13:05 UTC (permalink / raw)
  To: Elijah Newren, William Sprent via GitGitGadget
  Cc: Git Mailing List, Johannes Schindelin

On 23/11/2021 20.14, Elijah Newren wrote:
> On Tue, Nov 23, 2021 at 5:25 AM William Sprent via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: William Sprent <williams@unity3d.com>
>> ...
> 
> Wow, interesting.  I did some related work that I never submitted at
> https://github.com/newren/git/commit/08f799b4667de1c755c78e1ea1657f946b588556
> -- in that commit, I also noticed that fast-export did not seem
> careful enough about making sure to process all commits, and came up
> with a much different fix.  However, in my case, I didn't know how to
> trigger the problems in fast-export without some additional changes I
> was trying to make, and since I never got those other changes working,
> I didn't think it was worth submitting my fix.  My solution is more
> complex, in part because it was designed to handle ordering
> requirements & recommendations other than just ancestry.  However, we
> don't currently have any additional ordering requirements (the current
> code only considers ancestry), so your solution is much simpler.  If I
> do at some point get my other changes working, I'd need to
> re-introduce the queue and handle_tail() and my changes to these, but
> that can always be done later if and when I get those other changes
> working.
> 

Ah that is interesting. Good to know. I was happy that I could make 
fast-export rely on revision.c for ordering. At surface level, with the 
minimal amount of insight I have, that seems like an alright separation 
of concerns.

> Interestingly, Dscho added both the --reverse ability to revision.c
> and the commits object_array and the handle_tail() stuff in
> fast-export.c.  Both were done in the same year, and --reverse came
> first.  See 9c5e66e97da8 (Teach revision machinery about --reverse,
> 2007-01-20) and f2dc849e9c5f (Add 'git fast-export', the sister of
> 'git fast-import', 2007-12-02).  It's not clear why revs.reverse = 1
> wasn't used previously, although it certainly didn't occur to me
> either and I think it would have been an alternative solution to my
> first ever git.git contribution -- 784f8affe4df (fast-export: ensure
> we traverse commits in topological order, 2009-02-10).  (Though
> rev.reverse = 1 wouldn't have provided the same speedups that
> rev.topo_order=1 provided.)

Ah. Thanks for the insight.

> I can't think of any cases where this would cause any problems, and it
> seems like using rev.reverse is a pretty clean solution.  Just in case
> I'm missing something, though, it would be nice to get Dscho to also
> comment on this.  I'm cc'ing him.

Thanks. Good idea.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-11-24  0:41 ` Junio C Hamano
@ 2021-12-09  8:13 ` William Sprent via GitGitGadget
  2021-12-10  3:48   ` Elijah Newren
  2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
  3 siblings, 2 replies; 18+ messages in thread
From: William Sprent via GitGitGadget @ 2021-12-09  8:13 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, William Sprent, William Sprent

From: William Sprent <williams@unity3d.com>

When invoking git-fast-export with the --first-parent flag on a branch
with merges, fast-export would early-out on processing the first merge
on the branch. If combined with --reverse, fast-export would instead
output all single parent commits on the branch.

This commit makes fast-export output the same commits as rev-list
--first-parent, and makes --reverse not have an effect on which commits
are output.

The fix involves removing logic within fast-export which was responsible
for ensuring that parents are processed before their children, which was
what was exiting early due to missing second parents. This is replaced
by setting 'reverse = 1' before revision walking, which, in conjuction
with topo_order, allows for delegating the ordering of commits to
revision.c. The reverse flag is set after parsing rev-list arguments to
avoid having it disabled.

Signed-off-by: William Sprent <williams@unity3d.com>
---
    fast-export: fix surprising behavior with --first-parent
    
    Hi,
    
    I've noticed that git fast-export exhibits some odd behavior when passed
    the --first-parent flag. In the repository I was working on, it would
    only output the initial commit before exiting. Moreover, adding the
    --reverse flag causes it to behave differently and instead output all
    commits in the first parent line that only have one parent. My
    expectation is more or less that git fast-export should output the same
    set of commits as git rev-list would output given the same arguments,
    which matches how it acts when given revision ranges.
    
    It seems like this behavior comes from the fact that has_unshown_parents
    will never be false for any merge commits encountered when fast-export
    is called with --first-parent. This causes the while loop to follow the
    pattern of pushing all commits into the "commits" queue until the
    initial commit is encountered, at which point it calls handle_tail which
    falls through on the first merge commit, causing most of the commits to
    be unhandled.
    
    My impression is that this logic only serves to ensure that parents are
    processed before their children, so in my patch I've opted to fix the
    issue by delegating that responsibility to revision.c by adding the
    reverse flag before performing the revision walk. From what I can see,
    this should be equivalent to what the previous logic is trying to
    achieve, but I can also see that there could be risk in these changes.
    
    Changes since v1:
    
     * Moved revs.reverse assignment down to similar assignments below.
     * Removed braces around single statement while loop.
     * The test now only changes directory inside a sub-shell.
     * Applied stylistic feedback on test such as: making redirections be on
       the form >FILE etc.
    
    There were questions raised about whether it makes sense at all for
    fast-export to simply accept all git rev-list arguments whether they
    have an effect or not - in particular for a flag like --reverse. I think
    I agree that it is questionable behavior, or at least questionably
    documented, but I also think it is out of scope for this change.
    
    I did consider teaching fast-export to complain if given --reverse, but
    it seemed inconsistent to me as it will gladly accept every other
    rev-list argument (for example, "git fast-export HEAD --show-signature
    --graph" works just fine).
    
    cc: Ævar Arnfjörð Bjarmason avarab@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1084

Range-diff vs v1:

 1:  e2824193a8f ! 1:  af552f37dbe fast-export: fix surprising behavior with --first-parent
     @@ builtin/fast-export.c: static int parse_opt_anonymize_map(const struct option *o
       	     *import_filename = NULL,
      @@ builtin/fast-export.c: int cmd_fast_export(int argc, const char **argv, const char *prefix)
       
     - 	get_tags_and_duplicates(&revs.cmdline);
     - 
     -+	revs.reverse = 1;
       	if (prepare_revision_walk(&revs))
       		die("revision walk setup failed");
     ++
     ++	revs.reverse = 1;
       	revs.diffopt.format_callback = show_filemodify;
       	revs.diffopt.format_callback_data = &paths_of_changed_objects;
       	revs.diffopt.flags.recursive = 1;
     - 	while ((commit = get_revision(&revs))) {
     +-	while ((commit = get_revision(&revs))) {
      -		if (has_unshown_parent(commit)) {
      -			add_object_array(&commit->object, NULL, &commits);
      -		}
     @@ builtin/fast-export.c: int cmd_fast_export(int argc, const char **argv, const ch
      -			handle_commit(commit, &revs, &paths_of_changed_objects);
      -			handle_tail(&commits, &revs, &paths_of_changed_objects);
      -		}
     +-	}
     ++	while ((commit = get_revision(&revs)))
      +		handle_commit(commit, &revs, &paths_of_changed_objects);
     - 	}
       
       	handle_tags_and_duplicates(&extra_refs);
     + 	handle_tags_and_duplicates(&tag_refs);
      
       ## t/t9350-fast-export.sh ##
      @@ t/t9350-fast-export.sh: test_expect_success 'merge commit gets exported with --import-marks' '
     @@ t/t9350-fast-export.sh: test_expect_success 'merge commit gets exported with --i
      +
      +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
      +	git init first-parent &&
     -+	cd first-parent &&
     -+	test_commit init &&
     -+	git checkout -b topic1 &&
     -+	test_commit file2 file2.txt &&
     -+	git checkout main &&
     -+	git merge topic1 --no-ff &&
     ++	(
     ++		cd first-parent &&
     ++		test_commit A &&
     ++		git checkout -b topic1 &&
     ++		test_commit B &&
     ++		git checkout main &&
     ++		git merge topic1 --no-ff &&
     ++
     ++		git checkout -b topic2 &&
     ++		test_commit C &&
     ++		git checkout main &&
     ++		git merge topic2 --no-ff &&
     ++
     ++		test_commit D &&
      +
     -+	git checkout -b topic2 &&
     -+	test_commit file3 file3.txt &&
     -+	git checkout main &&
     -+	git merge topic2 --no-ff &&
     ++		git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected &&
      +
     -+	test_commit branch-head &&
     ++		git fast-export main -- --first-parent >first-parent-export &&
     ++		git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
      +
     -+	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&
     ++		git init import &&
     ++		git -C import fast-import <first-parent-export &&
      +
     -+	git fast-export main -- --first-parent > first-parent-export &&
     -+	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&
     -+	git init import && cd import &&
     -+	cat ../first-parent-export | git fast-import &&
     ++		git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
     ++		git -C import rev-list --all >tmp &&
      +
     -+	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&
     -+	test $(git rev-list --all | wc -l) -eq 4 &&
     -+	test_cmp ../expected actual &&
     -+	test_cmp ../first-parent-export ../first-parent-reverse-export
     ++		test_line_count = 4 tmp &&
     ++		test_cmp expected actual &&
     ++		test_cmp first-parent-export first-parent-reverse-export
     ++	)
      +'
      +
       test_done


 builtin/fast-export.c  | 40 ++++------------------------------------
 t/t9350-fast-export.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 8e2caf72819..cb1d6473f12 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
 
 static struct decoration idnums;
 static uint32_t last_idnum;
-
-static int has_unshown_parent(struct commit *commit)
-{
-	struct commit_list *parent;
-
-	for (parent = commit->parents; parent; parent = parent->next)
-		if (!(parent->item->object.flags & SHOWN) &&
-		    !(parent->item->object.flags & UNINTERESTING))
-			return 1;
-	return 0;
-}
-
 struct anonymized_entry {
 	struct hashmap_entry hash;
 	const char *anon;
@@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
 	return strbuf_detach(&out, NULL);
 }
 
-static void handle_tail(struct object_array *commits, struct rev_info *revs,
-			struct string_list *paths_of_changed_objects)
-{
-	struct commit *commit;
-	while (commits->nr) {
-		commit = (struct commit *)object_array_pop(commits);
-		if (has_unshown_parent(commit)) {
-			/* Queue again, to be handled later */
-			add_object_array(&commit->object, NULL, commits);
-			return;
-		}
-		handle_commit(commit, revs, paths_of_changed_objects);
-	}
-}
 
 static void handle_tag(const char *name, struct tag *tag)
 {
@@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
 int cmd_fast_export(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
-	struct object_array commits = OBJECT_ARRAY_INIT;
 	struct commit *commit;
 	char *export_filename = NULL,
 	     *import_filename = NULL,
@@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
+
+	revs.reverse = 1;
 	revs.diffopt.format_callback = show_filemodify;
 	revs.diffopt.format_callback_data = &paths_of_changed_objects;
 	revs.diffopt.flags.recursive = 1;
-	while ((commit = get_revision(&revs))) {
-		if (has_unshown_parent(commit)) {
-			add_object_array(&commit->object, NULL, &commits);
-		}
-		else {
-			handle_commit(commit, &revs, &paths_of_changed_objects);
-			handle_tail(&commits, &revs, &paths_of_changed_objects);
-		}
-	}
+	while ((commit = get_revision(&revs)))
+		handle_commit(commit, &revs, &paths_of_changed_objects);
 
 	handle_tags_and_duplicates(&extra_refs);
 	handle_tags_and_duplicates(&tag_refs);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 409b48e2442..df1a5b1013a 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -750,4 +750,39 @@ test_expect_success 'merge commit gets exported with --import-marks' '
 	)
 '
 
+
+test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
+	git init first-parent &&
+	(
+		cd first-parent &&
+		test_commit A &&
+		git checkout -b topic1 &&
+		test_commit B &&
+		git checkout main &&
+		git merge topic1 --no-ff &&
+
+		git checkout -b topic2 &&
+		test_commit C &&
+		git checkout main &&
+		git merge topic2 --no-ff &&
+
+		test_commit D &&
+
+		git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected &&
+
+		git fast-export main -- --first-parent >first-parent-export &&
+		git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
+
+		git init import &&
+		git -C import fast-import <first-parent-export &&
+
+		git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
+		git -C import rev-list --all >tmp &&
+
+		test_line_count = 4 tmp &&
+		test_cmp expected actual &&
+		test_cmp first-parent-export first-parent-reverse-export
+	)
+'
+
 test_done

base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
@ 2021-12-10  3:48   ` Elijah Newren
  2021-12-10 21:55     ` Junio C Hamano
  2021-12-13 15:09     ` William Sprent
  2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
  1 sibling, 2 replies; 18+ messages in thread
From: Elijah Newren @ 2021-12-10  3:48 UTC (permalink / raw)
  To: William Sprent via GitGitGadget; +Cc: Git Mailing List, William Sprent

 On Thu, Dec 9, 2021 at 12:13 AM William Sprent via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.
>
> This commit makes fast-export output the same commits as rev-list
> --first-parent, and makes --reverse not have an effect on which commits
> are output.
>
> The fix involves removing logic within fast-export which was responsible
> for ensuring that parents are processed before their children, which was
> what was exiting early due to missing second parents. This is replaced
> by setting 'reverse = 1' before revision walking, which, in conjuction
> with topo_order, allows for delegating the ordering of commits to
> revision.c. The reverse flag is set after parsing rev-list arguments to
> avoid having it disabled.

This explains how you discovered the issue, but seems to focus on
these particular flags, leaving the reader wondering whether there are
other flags that are also mishandled that need additional care, and
whether this fix might break things for other sets of flags.  It took
me quite a bit of reasoning to satisfy myself on those points; it's
actually somewhat complex.  May I suggest an alternative commit
message based on that?  Here's what I think are the relevant points
(and yeah, it's lengthy):


The revision traversal machinery typically processes and returns all
children before any parent.  fast-export needs to operate in the
reverse fashion, handling parents before any of their children in
order to build up the history starting from the root commit(s).  This
would be a clear case where we could just use the revision traversal
machinery's "reverse" option to achieve this desired affect.

However, this wasn't what the code did.  It added its own array for
queuing.  The obvious hand-rolled solution would be to just push all
the commits into the array and then traverse afterwards, but it didn't
quite do that either.  It instead attempted to process anything it
could as soon as it could, and once it could, check whether it could
process anything that had been queued.  As far as I can tell, this was
an effort to save a little memory in the case of multiple root commits
since it could process some commits before queueing all of them.  This
involved some helper functions named has_unshown_parent() and
handle_tail().  For typical invocations of fast-export, this
alternative essentially amounted to a hand-rolled method of reversing
the commits -- it was a bunch of work to duplicate the revision
traversal machinery's "reverse" option.

This hand-rolled reversing mechanism is actually somewhat difficult to
reason about.  It takes some time to figure out how it ensures in
normal cases that it will actually process all traversed commits
(rather than just dropping some and not printing anything for them).

And it turns out there are some cases where the code does drop commits
without handling them, and not even printing an error or warning for
the user.  Due to the has_unshown_parent() checks, some commits could
be left in the array at the end of the "while...get_revision()" loop
which would be unprocessed.  This could be triggered for example with
    git fast-export main -- --first-parent
or non-sensical traversal rules such as
    git fast-export main -- --grep=Merge --invert-grep

While most traversals that don't include all parents should likely
trigger errors in fast-export (or at least require being used in
combination with --reference-excluded-parents), the --first-parent
traversal is at least reasonable and it'd be nice if it didn't just
drop commits.  It'd also be nice to have a simpler "reverse traversal"
mechanism.  Use the "reverse" option of the revision traversal
machinery to achieve both.

Even for the non-sensical traversal flags like the --grep one above,
this would be an improvement.  For example, in that case, the code
previously would have silently truncated history to only those commits
that do not have an ancestor containing "Merge" in their commit
message.  After this code change, that case would would include all
commits without "Merge" in their commit message -- but any commit that
previously had a "Merge"-mentioning parent would lose that parent
(likely resulting in many new root commits).  While the new behavior
is still odd, it is at least understandable given that
--reference-excluded-parents is not the default.


>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>
>     Hi,
>
>     I've noticed that git fast-export exhibits some odd behavior when passed
>     the --first-parent flag. In the repository I was working on, it would
>     only output the initial commit before exiting. Moreover, adding the
>     --reverse flag causes it to behave differently and instead output all
>     commits in the first parent line that only have one parent. My
>     expectation is more or less that git fast-export should output the same
>     set of commits as git rev-list would output given the same arguments,
>     which matches how it acts when given revision ranges.
>
>     It seems like this behavior comes from the fact that has_unshown_parents
>     will never be false for any merge commits encountered when fast-export
>     is called with --first-parent. This causes the while loop to follow the
>     pattern of pushing all commits into the "commits" queue until the
>     initial commit is encountered, at which point it calls handle_tail which
>     falls through on the first merge commit, causing most of the commits to
>     be unhandled.
>
>     My impression is that this logic only serves to ensure that parents are
>     processed before their children, so in my patch I've opted to fix the
>     issue by delegating that responsibility to revision.c by adding the
>     reverse flag before performing the revision walk. From what I can see,
>     this should be equivalent to what the previous logic is trying to
>     achieve, but I can also see that there could be risk in these changes.
>
>     Changes since v1:
>
>      * Moved revs.reverse assignment down to similar assignments below.
>      * Removed braces around single statement while loop.
>      * The test now only changes directory inside a sub-shell.
>      * Applied stylistic feedback on test such as: making redirections be on
>        the form >FILE etc.
>
>     There were questions raised about whether it makes sense at all for
>     fast-export to simply accept all git rev-list arguments whether they
>     have an effect or not - in particular for a flag like --reverse. I think
>     I agree that it is questionable behavior, or at least questionably
>     documented, but I also think it is out of scope for this change.
>
>     I did consider teaching fast-export to complain if given --reverse, but
>     it seemed inconsistent to me as it will gladly accept every other
>     rev-list argument (for example, "git fast-export HEAD --show-signature
>     --graph" works just fine).
>
>     cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
>
...
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 8e2caf72819..cb1d6473f12 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
>
>  static struct decoration idnums;
>  static uint32_t last_idnum;
> -
> -static int has_unshown_parent(struct commit *commit)
> -{
> -       struct commit_list *parent;
> -
> -       for (parent = commit->parents; parent; parent = parent->next)
> -               if (!(parent->item->object.flags & SHOWN) &&
> -                   !(parent->item->object.flags & UNINTERESTING))
> -                       return 1;
> -       return 0;
> -}
> -
>  struct anonymized_entry {
>         struct hashmap_entry hash;
>         const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>         return strbuf_detach(&out, NULL);
>  }
>
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -                       struct string_list *paths_of_changed_objects)
> -{
> -       struct commit *commit;
> -       while (commits->nr) {
> -               commit = (struct commit *)object_array_pop(commits);
> -               if (has_unshown_parent(commit)) {
> -                       /* Queue again, to be handled later */
> -                       add_object_array(&commit->object, NULL, commits);
> -                       return;
> -               }
> -               handle_commit(commit, revs, paths_of_changed_objects);
> -       }
> -}

Deleted code is debugged code!  :-)

>
>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>         struct rev_info revs;
> -       struct object_array commits = OBJECT_ARRAY_INIT;
>         struct commit *commit;
>         char *export_filename = NULL,
>              *import_filename = NULL,
> @@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>
>         if (prepare_revision_walk(&revs))
>                 die("revision walk setup failed");
> +
> +       revs.reverse = 1;

I really wanted to see this next to the
    rev.topo_order = 1
statement elsewhere, but as you alluded to in the commit message, this
part of revision.c makes that unsafe:

"""
} else if (!strcmp(arg, "--reverse")) {
    revs->reverse ^= 1;
"""

However, given that it's unsafe to set revs.reverse=1 earlier, now
that I think about it, isn't it also unsafe to set revs.topo_order
where it is?  Someone could override it by passing --date-order to
fast-export.  (It's okay if you want to leave fixing that to someone
else, just thought I'd mention it while reviewing.)

>         revs.diffopt.format_callback = show_filemodify;
>         revs.diffopt.format_callback_data = &paths_of_changed_objects;
>         revs.diffopt.flags.recursive = 1;
> -       while ((commit = get_revision(&revs))) {
> -               if (has_unshown_parent(commit)) {
> -                       add_object_array(&commit->object, NULL, &commits);
> -               }
> -               else {
> -                       handle_commit(commit, &revs, &paths_of_changed_objects);
> -                       handle_tail(&commits, &revs, &paths_of_changed_objects);
> -               }
> -       }
> +       while ((commit = get_revision(&revs)))
> +               handle_commit(commit, &revs, &paths_of_changed_objects);

Looks good.  Nice work on finding this.

>
>         handle_tags_and_duplicates(&extra_refs);
>         handle_tags_and_duplicates(&tag_refs);
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index 409b48e2442..df1a5b1013a 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -750,4 +750,39 @@ test_expect_success 'merge commit gets exported with --import-marks' '
>         )
>  '
>
> +
> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +       git init first-parent &&
> +       (
> +               cd first-parent &&
> +               test_commit A &&
> +               git checkout -b topic1 &&
> +               test_commit B &&
> +               git checkout main &&
> +               git merge topic1 --no-ff &&
> +
> +               git checkout -b topic2 &&
> +               test_commit C &&
> +               git checkout main &&
> +               git merge topic2 --no-ff &&

micro nit: I'd really prefer the --no-ff before the "topic1" and "topic2".

> +
> +               test_commit D &&
> +
> +               git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected &&

I'd much prefer this were changed to
    git log --format="%ad %s" --first-parent >expected &&
because:
  * --topo-order is irrelevant when you have a linear history (which
--first-parent gives you)
  * --no-commit-header can be tossed by using log instead of rev-list
  * %B provides the entire (multi-line) commit message body but you
clearly care about how many commits you saw below and were assuming
just one line per commit, so %s is better.
  * The format looks weird when inspecting as a human; it's much nicer
with a space between the %ad and %s.

> +
> +               git fast-export main -- --first-parent >first-parent-export &&
> +               git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
> +
> +               git init import &&
> +               git -C import fast-import <first-parent-export &&
> +
> +               git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&

Same simplifications as above here:
   git -C import log --format="%ad %s" --topo-order --all >actual &&

However, is there a reason you're using "--all" instead of "main"?
Although main is the only branch, which makes either output the same
thing, it'd be easier for folks reading to catch the equivalence if it
was clear you were now listing information about the same branch.

> +               git -C import rev-list --all >tmp &&
> +
> +               test_line_count = 4 tmp &&

I don't understand why you need tmp.  Why not just use actual on the
test_line_count line?

> +               test_cmp expected actual &&
> +               test_cmp first-parent-export first-parent-reverse-export

This last line would be much nicer to include right after
first-parent-reverse-export is created.  Or else move the creation of
first-parent-reverse-export down to just above this line.

> +       )
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-12-10  3:48   ` Elijah Newren
@ 2021-12-10 21:55     ` Junio C Hamano
  2021-12-10 22:02       ` Elijah Newren
  2021-12-13 15:09     ` William Sprent
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2021-12-10 21:55 UTC (permalink / raw)
  To: Elijah Newren
  Cc: William Sprent via GitGitGadget, Git Mailing List, William Sprent

Elijah Newren <newren@gmail.com> writes:

> ...  Here's what I think are the relevant points
> (and yeah, it's lengthy):
>
>
> The revision traversal machinery typically processes and returns all
> children before any parent.  fast-export needs to operate in the
> reverse fashion, handling parents before any of their children in
> order to build up the history starting from the root commit(s).  This
> would be a clear case where we could just use the revision traversal
> machinery's "reverse" option to achieve this desired affect.
>
> However, this wasn't what the code did.  It added its own array for
> queuing.  The obvious hand-rolled solution would be to just push all
> the commits into the array and then traverse afterwards, but it didn't
> quite do that either.  It instead attempted to process anything it
> could as soon as it could, and once it could, check whether it could
> process anything that had been queued.  As far as I can tell, this was
> an effort to save a little memory in the case of multiple root commits
> since it could process some commits before queueing all of them.  This
> involved some helper functions named has_unshown_parent() and
> handle_tail().  For typical invocations of fast-export, this
> alternative essentially amounted to a hand-rolled method of reversing
> the commits -- it was a bunch of work to duplicate the revision
> traversal machinery's "reverse" option.
>
> This hand-rolled reversing mechanism is actually somewhat difficult to
> reason about.  It takes some time to figure out how it ensures in
> normal cases that it will actually process all traversed commits
> (rather than just dropping some and not printing anything for them).
>
> And it turns out there are some cases where the code does drop commits
> without handling them, and not even printing an error or warning for
> the user.  Due to the has_unshown_parent() checks, some commits could
> be left in the array at the end of the "while...get_revision()" loop
> which would be unprocessed.  This could be triggered for example with
>     git fast-export main -- --first-parent
> or non-sensical traversal rules such as
>     git fast-export main -- --grep=Merge --invert-grep
>
> While most traversals that don't include all parents should likely
> trigger errors in fast-export (or at least require being used in
> combination with --reference-excluded-parents), the --first-parent
> traversal is at least reasonable and it'd be nice if it didn't just
> drop commits.  It'd also be nice to have a simpler "reverse traversal"
> mechanism.  Use the "reverse" option of the revision traversal
> machinery to achieve both.

The above is a very helpful and understandable explanation of what
is going on.  I am a bit puzzled by the very last part, though. By
"It'd also be nice to have a simpler 'reverse traversal' mechanism",
do you mean that the end users have need to control the direction
the traversal goes (in other words, they use "git fast-export" for
some thing, and "git fast-export --reverse" to achieve some other
things)?  Or do you just mean that we need to do a reverse traversal
but that is already available in the revision traversal machinery,
and not using it and rolling our own does not make sense?

> Even for the non-sensical traversal flags like the --grep one above,
> this would be an improvement.  For example, in that case, the code
> previously would have silently truncated history to only those commits
> that do not have an ancestor containing "Merge" in their commit
> message.  After this code change, that case would would include all

"would would" -> "would"

> commits without "Merge" in their commit message -- but any commit that
> previously had a "Merge"-mentioning parent would lose that parent
> (likely resulting in many new root commits).  While the new behavior
> is still odd, it is at least understandable given that
> --reference-excluded-parents is not the default.

Nicely written.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-12-10 21:55     ` Junio C Hamano
@ 2021-12-10 22:02       ` Elijah Newren
  0 siblings, 0 replies; 18+ messages in thread
From: Elijah Newren @ 2021-12-10 22:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: William Sprent via GitGitGadget, Git Mailing List, William Sprent

On Fri, Dec 10, 2021 at 1:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > ...  Here's what I think are the relevant points
> > (and yeah, it's lengthy):
> >
> >
> > The revision traversal machinery typically processes and returns all
> > children before any parent.  fast-export needs to operate in the
> > reverse fashion, handling parents before any of their children in
> > order to build up the history starting from the root commit(s).  This
> > would be a clear case where we could just use the revision traversal
> > machinery's "reverse" option to achieve this desired affect.
> >
> > However, this wasn't what the code did.  It added its own array for
> > queuing.  The obvious hand-rolled solution would be to just push all
> > the commits into the array and then traverse afterwards, but it didn't
> > quite do that either.  It instead attempted to process anything it
> > could as soon as it could, and once it could, check whether it could
> > process anything that had been queued.  As far as I can tell, this was
> > an effort to save a little memory in the case of multiple root commits
> > since it could process some commits before queueing all of them.  This
> > involved some helper functions named has_unshown_parent() and
> > handle_tail().  For typical invocations of fast-export, this
> > alternative essentially amounted to a hand-rolled method of reversing
> > the commits -- it was a bunch of work to duplicate the revision
> > traversal machinery's "reverse" option.
> >
> > This hand-rolled reversing mechanism is actually somewhat difficult to
> > reason about.  It takes some time to figure out how it ensures in
> > normal cases that it will actually process all traversed commits
> > (rather than just dropping some and not printing anything for them).
> >
> > And it turns out there are some cases where the code does drop commits
> > without handling them, and not even printing an error or warning for
> > the user.  Due to the has_unshown_parent() checks, some commits could
> > be left in the array at the end of the "while...get_revision()" loop
> > which would be unprocessed.  This could be triggered for example with
> >     git fast-export main -- --first-parent
> > or non-sensical traversal rules such as
> >     git fast-export main -- --grep=Merge --invert-grep
> >
> > While most traversals that don't include all parents should likely
> > trigger errors in fast-export (or at least require being used in
> > combination with --reference-excluded-parents), the --first-parent
> > traversal is at least reasonable and it'd be nice if it didn't just
> > drop commits.  It'd also be nice to have a simpler "reverse traversal"
> > mechanism.  Use the "reverse" option of the revision traversal
> > machinery to achieve both.
>
> The above is a very helpful and understandable explanation of what
> is going on.  I am a bit puzzled by the very last part, though. By
> "It'd also be nice to have a simpler 'reverse traversal' mechanism",
> do you mean that the end users have need to control the direction
> the traversal goes (in other words, they use "git fast-export" for
> some thing, and "git fast-export --reverse" to achieve some other
> things)?  Or do you just mean that we need to do a reverse traversal
> but that is already available in the revision traversal machinery,
> and not using it and rolling our own does not make sense?

Sorry, yeah, I meant the latter.  I do not think end users should have
control of the direction.  Perhaps if that was reworded to "...It'd
also be nice for future readers of the code to have a simpler..." it'd
be clearer?

> > Even for the non-sensical traversal flags like the --grep one above,
> > this would be an improvement.  For example, in that case, the code
> > previously would have silently truncated history to only those commits
> > that do not have an ancestor containing "Merge" in their commit
> > message.  After this code change, that case would would include all
>
> "would would" -> "would"

Good catch.

> > commits without "Merge" in their commit message -- but any commit that
> > previously had a "Merge"-mentioning parent would lose that parent
> > (likely resulting in many new root commits).  While the new behavior
> > is still odd, it is at least understandable given that
> > --reference-excluded-parents is not the default.
>
> Nicely written.

Thanks.  :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-12-10  3:48   ` Elijah Newren
  2021-12-10 21:55     ` Junio C Hamano
@ 2021-12-13 15:09     ` William Sprent
  2021-12-14  0:31       ` Elijah Newren
  1 sibling, 1 reply; 18+ messages in thread
From: William Sprent @ 2021-12-13 15:09 UTC (permalink / raw)
  To: Elijah Newren, William Sprent via GitGitGadget; +Cc: Git Mailing List

On 10/12/2021 04.48, Elijah Newren wrote:
>   On Thu, Dec 9, 2021 at 12:13 AM William Sprent via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: William Sprent <williams@unity3d.com>
>>
>> When invoking git-fast-export with the --first-parent flag on a branch
>> with merges, fast-export would early-out on processing the first merge
>> on the branch. If combined with --reverse, fast-export would instead
>> output all single parent commits on the branch.
>>
>> This commit makes fast-export output the same commits as rev-list
>> --first-parent, and makes --reverse not have an effect on which commits
>> are output.
>>
>> The fix involves removing logic within fast-export which was responsible
>> for ensuring that parents are processed before their children, which was
>> what was exiting early due to missing second parents. This is replaced
>> by setting 'reverse = 1' before revision walking, which, in conjuction
>> with topo_order, allows for delegating the ordering of commits to
>> revision.c. The reverse flag is set after parsing rev-list arguments to
>> avoid having it disabled.
> 
> This explains how you discovered the issue, but seems to focus on
> these particular flags, leaving the reader wondering whether there are
> other flags that are also mishandled that need additional care, and
> whether this fix might break things for other sets of flags.  It took
> me quite a bit of reasoning to satisfy myself on those points; it's
> actually somewhat complex.  May I suggest an alternative commit
> message based on that?  Here's what I think are the relevant points
> (and yeah, it's lengthy):
> 
> 
> The revision traversal machinery typically processes and returns all
> children before any parent.  fast-export needs to operate in the
> reverse fashion, handling parents before any of their children in
> order to build up the history starting from the root commit(s).  This
> would be a clear case where we could just use the revision traversal
> machinery's "reverse" option to achieve this desired affect.
> 
> However, this wasn't what the code did.  It added its own array for
> queuing.  The obvious hand-rolled solution would be to just push all
> the commits into the array and then traverse afterwards, but it didn't
> quite do that either.  It instead attempted to process anything it
> could as soon as it could, and once it could, check whether it could
> process anything that had been queued.  As far as I can tell, this was
> an effort to save a little memory in the case of multiple root commits
> since it could process some commits before queueing all of them.  This
> involved some helper functions named has_unshown_parent() and
> handle_tail().  For typical invocations of fast-export, this
> alternative essentially amounted to a hand-rolled method of reversing
> the commits -- it was a bunch of work to duplicate the revision
> traversal machinery's "reverse" option.
> 
> This hand-rolled reversing mechanism is actually somewhat difficult to
> reason about.  It takes some time to figure out how it ensures in
> normal cases that it will actually process all traversed commits
> (rather than just dropping some and not printing anything for them).
> 
> And it turns out there are some cases where the code does drop commits
> without handling them, and not even printing an error or warning for
> the user.  Due to the has_unshown_parent() checks, some commits could
> be left in the array at the end of the "while...get_revision()" loop
> which would be unprocessed.  This could be triggered for example with
>      git fast-export main -- --first-parent
> or non-sensical traversal rules such as
>      git fast-export main -- --grep=Merge --invert-grep
> 
> While most traversals that don't include all parents should likely
> trigger errors in fast-export (or at least require being used in
> combination with --reference-excluded-parents), the --first-parent
> traversal is at least reasonable and it'd be nice if it didn't just
> drop commits.  It'd also be nice to have a simpler "reverse traversal"
> mechanism.  Use the "reverse" option of the revision traversal
> machinery to achieve both.
> 
> Even for the non-sensical traversal flags like the --grep one above,
> this would be an improvement.  For example, in that case, the code
> previously would have silently truncated history to only those commits
> that do not have an ancestor containing "Merge" in their commit
> message.  After this code change, that case would would include all
> commits without "Merge" in their commit message -- but any commit that
> previously had a "Merge"-mentioning parent would lose that parent
> (likely resulting in many new root commits).  While the new behavior
> is still odd, it is at least understandable given that
> --reference-excluded-parents is not the default.
> 
> 

Thanks for taking the time to write such a thorough explanation. I'll
apply Junio's comments and use it as the commit message for the next
version.

>>
>> Signed-off-by: William Sprent <williams@unity3d.com>
>> ---
>>      fast-export: fix surprising behavior with --first-parent
>>
>>      Hi,
>>
>>      I've noticed that git fast-export exhibits some odd behavior when passed
>>      the --first-parent flag. In the repository I was working on, it would
>>      only output the initial commit before exiting. Moreover, adding the
>>      --reverse flag causes it to behave differently and instead output all
>>      commits in the first parent line that only have one parent. My
>>      expectation is more or less that git fast-export should output the same
>>      set of commits as git rev-list would output given the same arguments,
>>      which matches how it acts when given revision ranges.
>>
>>      It seems like this behavior comes from the fact that has_unshown_parents
>>      will never be false for any merge commits encountered when fast-export
>>      is called with --first-parent. This causes the while loop to follow the
>>      pattern of pushing all commits into the "commits" queue until the
>>      initial commit is encountered, at which point it calls handle_tail which
>>      falls through on the first merge commit, causing most of the commits to
>>      be unhandled.
>>
>>      My impression is that this logic only serves to ensure that parents are
>>      processed before their children, so in my patch I've opted to fix the
>>      issue by delegating that responsibility to revision.c by adding the
>>      reverse flag before performing the revision walk. From what I can see,
>>      this should be equivalent to what the previous logic is trying to
>>      achieve, but I can also see that there could be risk in these changes.
>>
>>      Changes since v1:
>>
>>       * Moved revs.reverse assignment down to similar assignments below.
>>       * Removed braces around single statement while loop.
>>       * The test now only changes directory inside a sub-shell.
>>       * Applied stylistic feedback on test such as: making redirections be on
>>         the form >FILE etc.
>>
>>      There were questions raised about whether it makes sense at all for
>>      fast-export to simply accept all git rev-list arguments whether they
>>      have an effect or not - in particular for a flag like --reverse. I think
>>      I agree that it is questionable behavior, or at least questionably
>>      documented, but I also think it is out of scope for this change.
>>
>>      I did consider teaching fast-export to complain if given --reverse, but
>>      it seemed inconsistent to me as it will gladly accept every other
>>      rev-list argument (for example, "git fast-export HEAD --show-signature
>>      --graph" works just fine).
>>
>>      cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
>>
> ...
>> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
>> index 8e2caf72819..cb1d6473f12 100644
>> --- a/builtin/fast-export.c
>> +++ b/builtin/fast-export.c
>> @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
>>
>>   static struct decoration idnums;
>>   static uint32_t last_idnum;
>> -
>> -static int has_unshown_parent(struct commit *commit)
>> -{
>> -       struct commit_list *parent;
>> -
>> -       for (parent = commit->parents; parent; parent = parent->next)
>> -               if (!(parent->item->object.flags & SHOWN) &&
>> -                   !(parent->item->object.flags & UNINTERESTING))
>> -                       return 1;
>> -       return 0;
>> -}
>> -
>>   struct anonymized_entry {
>>          struct hashmap_entry hash;
>>          const char *anon;
>> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>>          return strbuf_detach(&out, NULL);
>>   }
>>
>> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
>> -                       struct string_list *paths_of_changed_objects)
>> -{
>> -       struct commit *commit;
>> -       while (commits->nr) {
>> -               commit = (struct commit *)object_array_pop(commits);
>> -               if (has_unshown_parent(commit)) {
>> -                       /* Queue again, to be handled later */
>> -                       add_object_array(&commit->object, NULL, commits);
>> -                       return;
>> -               }
>> -               handle_commit(commit, revs, paths_of_changed_objects);
>> -       }
>> -}
> 
> Deleted code is debugged code!  :-)
> 
>>
>>   static void handle_tag(const char *name, struct tag *tag)
>>   {
>> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>>   int cmd_fast_export(int argc, const char **argv, const char *prefix)
>>   {
>>          struct rev_info revs;
>> -       struct object_array commits = OBJECT_ARRAY_INIT;
>>          struct commit *commit;
>>          char *export_filename = NULL,
>>               *import_filename = NULL,
>> @@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>>
>>          if (prepare_revision_walk(&revs))
>>                  die("revision walk setup failed");
>> +
>> +       revs.reverse = 1;
> 
> I really wanted to see this next to the
>      rev.topo_order = 1
> statement elsewhere, but as you alluded to in the commit message, this
> part of revision.c makes that unsafe:
> 
> """
> } else if (!strcmp(arg, "--reverse")) {
>      revs->reverse ^= 1;
> """
> 
> However, given that it's unsafe to set revs.reverse=1 earlier, now
> that I think about it, isn't it also unsafe to set revs.topo_order
> where it is?  Someone could override it by passing --date-order to
> fast-export.  (It's okay if you want to leave fixing that to someone
> else, just thought I'd mention it while reviewing.)
> 

I couldn't tell you for sure if the topo_order placement is safe. I at
least don't see any place where topo_order itself can be toggled off in
revision.c. I'm sure there exists at least one rev-list argument which
will cause unexpected behaviour, though.

I agree that it would be nice to have the traversal order options be
assigned in the same place. I guess we have three options:


    1. Put the reverse assignment to the top (together with topo_order),
allowing the user to disable it with --reverse, which will cause odd
behaviour.

    2. Put the reverse assignment to the top and throw an error if the
user passes the --reverse option.

    3. Keep the reverse assignment at the bottom, silently ignoring any
--reverse option.


I don't think any of the three options are particularly good. The first
one for obvious reasons. The second seems inconsistent to me as we would
only error on --reverse but not any of the other "nonsensical" rev-list
args. However, silently ignoring certain arguments does also not make
for a good user experience.

I think that it might be a good idea to move up the 'reverse' assignment
and then add a paragraph to the man page for git-fast-export explaining
that some arguments, in particular the ones that change the ordering of
commits and the ones that change how commits are displayed (such as 
--graph), may have no or unexpected effects.

I've tried writing a snippet in git-fast-export.txt, which I could include
in the next version, if you think it seems like a reasonable approach:

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 1978dbdc6a..34875ef01d 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -157,16 +157,21 @@ by keeping the marks the same across runs.
 [<git-rev-list-args>...]::
 	A list of arguments, acceptable to 'git rev-parse' and
 	'git rev-list', that specifies the specific objects and references
 	to export.  For example, `master~10..master` causes the
 	current master reference to be exported along with all objects
 	added since its 10th ancestor commit and (unless the
 	--reference-excluded-parents option is specified) all files
 	common to master{tilde}9 and master{tilde}10.
++
+Arguments to `git rev-list` which change the _order_ in which commits are
+traversed, such as '--reverse', as well as arguments which control how commits
+are displayed, such as '--graph', may either have no effect or have an
+unexpected effect on which commits are exported.
 
 EXAMPLES
 --------


>>          revs.diffopt.format_callback = show_filemodify;
>>          revs.diffopt.format_callback_data = &paths_of_changed_objects;
>>          revs.diffopt.flags.recursive = 1;
>> -       while ((commit = get_revision(&revs))) {
>> -               if (has_unshown_parent(commit)) {
>> -                       add_object_array(&commit->object, NULL, &commits);
>> -               }
>> -               else {
>> -                       handle_commit(commit, &revs, &paths_of_changed_objects);
>> -                       handle_tail(&commits, &revs, &paths_of_changed_objects);
>> -               }
>> -       }
>> +       while ((commit = get_revision(&revs)))
>> +               handle_commit(commit, &revs, &paths_of_changed_objects);
> 
> Looks good.  Nice work on finding this.
> 
>>
>>          handle_tags_and_duplicates(&extra_refs);
>>          handle_tags_and_duplicates(&tag_refs);
>> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
>> index 409b48e2442..df1a5b1013a 100755
>> --- a/t/t9350-fast-export.sh
>> +++ b/t/t9350-fast-export.sh
>> @@ -750,4 +750,39 @@ test_expect_success 'merge commit gets exported with --import-marks' '
>>          )
>>   '
>>
>> +
>> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
>> +       git init first-parent &&
>> +       (
>> +               cd first-parent &&
>> +               test_commit A &&
>> +               git checkout -b topic1 &&
>> +               test_commit B &&
>> +               git checkout main &&
>> +               git merge topic1 --no-ff &&
>> +
>> +               git checkout -b topic2 &&
>> +               test_commit C &&
>> +               git checkout main &&
>> +               git merge topic2 --no-ff &&
> 
> micro nit: I'd really prefer the --no-ff before the "topic1" and "topic2".
> 

I agree. I'll move it forward.

>> +
>> +               test_commit D &&
>> +
>> +               git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected &&
> 
> I'd much prefer this were changed to
>      git log --format="%ad %s" --first-parent >expected &&
> because:
>    * --topo-order is irrelevant when you have a linear history (which
> --first-parent gives you)
>    * --no-commit-header can be tossed by using log instead of rev-list
>    * %B provides the entire (multi-line) commit message body but you
> clearly care about how many commits you saw below and were assuming
> just one line per commit, so %s is better.
>    * The format looks weird when inspecting as a human; it's much nicer
> with a space between the %ad and %s.
>

I think that makes sense. That would also allow me to use 'actual' in
the line count test below.

>> +
>> +               git fast-export main -- --first-parent >first-parent-export &&
>> +               git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
>> +
>> +               git init import &&
>> +               git -C import fast-import <first-parent-export &&
>> +
>> +               git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
> 
> Same simplifications as above here:
>     git -C import log --format="%ad %s" --topo-order --all >actual &&
> 
> However, is there a reason you're using "--all" instead of "main"?
> Although main is the only branch, which makes either output the same
> thing, it'd be easier for folks reading to catch the equivalence if it
> was clear you were now listing information about the same branch.
> 

I guess the intent is to be completely sure that only four commits were
exported, and no other refs made it into the new repository. I don't feel
too strongly about it, but I think it is a slightly stronger test than
leaving out the '--all'.

>> +               git -C import rev-list --all >tmp &&
>> +
>> +               test_line_count = 4 tmp &&
> 
> I don't understand why you need tmp.  Why not just use actual on the
> test_line_count line?
> 

As you noted above, I used '%B' for writing into 'actual', so I don't think
I can use that to verify that the correct number of commits have been
exported. I guess I can use your suggestion above and then save one check.

>> +               test_cmp expected actual &&
>> +               test_cmp first-parent-export first-parent-reverse-export
> 
> This last line would be much nicer to include right after
> first-parent-reverse-export is created.  Or else move the creation of
> first-parent-reverse-export down to just above this line.
> 

I agree. I will move it up. Except if we decide on moving the revs.reverse
assignment to the topo_order assignment, then passing '--reverse' won't be
a no-op and I'll remove the check instead.

>> +       )
>> +'
>> +
>>   test_done


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-12-13 15:09     ` William Sprent
@ 2021-12-14  0:31       ` Elijah Newren
  2021-12-14 13:11         ` William Sprent
  0 siblings, 1 reply; 18+ messages in thread
From: Elijah Newren @ 2021-12-14  0:31 UTC (permalink / raw)
  To: William Sprent; +Cc: William Sprent via GitGitGadget, Git Mailing List

On Mon, Dec 13, 2021 at 7:09 AM William Sprent <williams@unity3d.com> wrote:
>
> > However, given that it's unsafe to set revs.reverse=1 earlier, now
> > that I think about it, isn't it also unsafe to set revs.topo_order
> > where it is?  Someone could override it by passing --date-order to
> > fast-export.  (It's okay if you want to leave fixing that to someone
> > else, just thought I'd mention it while reviewing.)
> >
>
> I couldn't tell you for sure if the topo_order placement is safe. I at
> least don't see any place where topo_order itself can be toggled off in
> revision.c.  I'm sure there exists at least one rev-list argument which
> will cause unexpected behaviour, though.
>
> I agree that it would be nice to have the traversal order options be
> assigned in the same place. I guess we have three options:
>
>
>     1. Put the reverse assignment to the top (together with topo_order),
> allowing the user to disable it with --reverse, which will cause odd
> behaviour.

I'd call it broken rather than merely odd; more on this below.

>     2. Put the reverse assignment to the top and throw an error if the
> user passes the --reverse option.

Might be a reasonable longer term solution if someone wants to dive
into all the non-sensical options and mark them as such.  But I agree
that it's slightly odd only picking one specific one when we know
there's likely a pile of them here.

>     3. Keep the reverse assignment at the bottom, silently ignoring any
> --reverse option.

"silently ignored" or "dismissed with prejudice"?  :-)

> I don't think any of the three options are particularly good. The first
> one for obvious reasons. The second seems inconsistent to me as we would
> only error on --reverse but not any of the other "nonsensical" rev-list
> args. However, silently ignoring certain arguments does also not make
> for a good user experience.
>
> I think that it might be a good idea to move up the 'reverse' assignment
> and then add a paragraph to the man page for git-fast-export explaining
> that some arguments, in particular the ones that change the ordering of
> commits and the ones that change how commits are displayed (such as
> --graph), may have no or unexpected effects.

I'd rather choose option #3, like builtin/add.c does with max_count.
In part this is because...

> I've tried writing a snippet in git-fast-export.txt, which I could include
> in the next version, if you think it seems like a reasonable approach:
>
> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
> index 1978dbdc6a..34875ef01d 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -157,16 +157,21 @@ by keeping the marks the same across runs.
>  [<git-rev-list-args>...]::
>         A list of arguments, acceptable to 'git rev-parse' and
>         'git rev-list', that specifies the specific objects and references
>         to export.  For example, `master~10..master` causes the
>         current master reference to be exported along with all objects
>         added since its 10th ancestor commit and (unless the
>         --reference-excluded-parents option is specified) all files
>         common to master{tilde}9 and master{tilde}10.
> ++
> +Arguments to `git rev-list` which change the _order_ in which commits are
> +traversed, such as '--reverse', as well as arguments which control how commits
> +are displayed, such as '--graph', may either have no effect or have an
> +unexpected effect on which commits are exported.

After your patch, --reverse won't have an unexpected effect on _which_
commits are exported, it would instead have an unexpected effect on
_how_ commits are exported (turning _every_ commit into a root
commit).  I'd rather just go with your option #3.

> >> +
> >> +               git fast-export main -- --first-parent >first-parent-export &&
> >> +               git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
> >> +
> >> +               git init import &&
> >> +               git -C import fast-import <first-parent-export &&
> >> +
> >> +               git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
> >
> > Same simplifications as above here:
> >     git -C import log --format="%ad %s" --topo-order --all >actual &&
> >
> > However, is there a reason you're using "--all" instead of "main"?
> > Although main is the only branch, which makes either output the same
> > thing, it'd be easier for folks reading to catch the equivalence if it
> > was clear you were now listing information about the same branch.
> >
>
> I guess the intent is to be completely sure that only four commits were
> exported, and no other refs made it into the new repository. I don't feel
> too strongly about it, but I think it is a slightly stronger test than
> leaving out the '--all'.

Fair enough, '--all' works for me with that explanation.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
  2021-12-14  0:31       ` Elijah Newren
@ 2021-12-14 13:11         ` William Sprent
  0 siblings, 0 replies; 18+ messages in thread
From: William Sprent @ 2021-12-14 13:11 UTC (permalink / raw)
  To: Elijah Newren; +Cc: William Sprent via GitGitGadget, Git Mailing List

On 14/12/2021 01.31, Elijah Newren wrote:
> On Mon, Dec 13, 2021 at 7:09 AM William Sprent <williams@unity3d.com> wrote:
>>
>>> However, given that it's unsafe to set revs.reverse=1 earlier, now
>>> that I think about it, isn't it also unsafe to set revs.topo_order
>>> where it is?  Someone could override it by passing --date-order to
>>> fast-export.  (It's okay if you want to leave fixing that to someone
>>> else, just thought I'd mention it while reviewing.)
>>>
>>
>> I couldn't tell you for sure if the topo_order placement is safe. I at
>> least don't see any place where topo_order itself can be toggled off in
>> revision.c.  I'm sure there exists at least one rev-list argument which
>> will cause unexpected behaviour, though.
>>
>> I agree that it would be nice to have the traversal order options be
>> assigned in the same place. I guess we have three options:
>>
>>
>>      1. Put the reverse assignment to the top (together with topo_order),
>> allowing the user to disable it with --reverse, which will cause odd
>> behaviour.
> 
> I'd call it broken rather than merely odd; more on this below.
> 
>>      2. Put the reverse assignment to the top and throw an error if the
>> user passes the --reverse option.
> 
> Might be a reasonable longer term solution if someone wants to dive
> into all the non-sensical options and mark them as such.  But I agree
> that it's slightly odd only picking one specific one when we know
> there's likely a pile of them here.
> 
>>      3. Keep the reverse assignment at the bottom, silently ignoring any
>> --reverse option.
> 
> "silently ignored" or "dismissed with prejudice"?  :-)
> 

Heh. :) It would definitely an "interesting" option to be able to 
reverse the commit graph, if it worked..

>> I don't think any of the three options are particularly good. The first
>> one for obvious reasons. The second seems inconsistent to me as we would
>> only error on --reverse but not any of the other "nonsensical" rev-list
>> args. However, silently ignoring certain arguments does also not make
>> for a good user experience.
>>
>> I think that it might be a good idea to move up the 'reverse' assignment
>> and then add a paragraph to the man page for git-fast-export explaining
>> that some arguments, in particular the ones that change the ordering of
>> commits and the ones that change how commits are displayed (such as
>> --graph), may have no or unexpected effects.
> 
> I'd rather choose option #3, like builtin/add.c does with max_count.
> In part this is because...
> 
>> I've tried writing a snippet in git-fast-export.txt, which I could include
>> in the next version, if you think it seems like a reasonable approach:
>>
>> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
>> index 1978dbdc6a..34875ef01d 100644
>> --- a/Documentation/git-fast-export.txt
>> +++ b/Documentation/git-fast-export.txt
>> @@ -157,16 +157,21 @@ by keeping the marks the same across runs.
>>   [<git-rev-list-args>...]::
>>          A list of arguments, acceptable to 'git rev-parse' and
>>          'git rev-list', that specifies the specific objects and references
>>          to export.  For example, `master~10..master` causes the
>>          current master reference to be exported along with all objects
>>          added since its 10th ancestor commit and (unless the
>>          --reference-excluded-parents option is specified) all files
>>          common to master{tilde}9 and master{tilde}10.
>> ++
>> +Arguments to `git rev-list` which change the _order_ in which commits are
>> +traversed, such as '--reverse', as well as arguments which control how commits
>> +are displayed, such as '--graph', may either have no effect or have an
>> +unexpected effect on which commits are exported.
> 
> After your patch, --reverse won't have an unexpected effect on _which_
> commits are exported, it would instead have an unexpected effect on
> _how_ commits are exported (turning _every_ commit into a root
> commit).  I'd rather just go with your option #3.
> 

Alright. I guess another solution would have been to move topo_order 
down, but that seems unsafe as well. According to your commit, 
668f3aa776 (fast-export: Set revs.topo_order before calling 
setup_revisions, 2009-06-25).

I'll leave the revs.reverse assignment where it is for now.

>>>> +
>>>> +               git fast-export main -- --first-parent >first-parent-export &&
>>>> +               git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
>>>> +
>>>> +               git init import &&
>>>> +               git -C import fast-import <first-parent-export &&
>>>> +
>>>> +               git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
>>>
>>> Same simplifications as above here:
>>>      git -C import log --format="%ad %s" --topo-order --all >actual &&
>>>
>>> However, is there a reason you're using "--all" instead of "main"?
>>> Although main is the only branch, which makes either output the same
>>> thing, it'd be easier for folks reading to catch the equivalence if it
>>> was clear you were now listing information about the same branch.
>>>
>>
>> I guess the intent is to be completely sure that only four commits were
>> exported, and no other refs made it into the new repository. I don't feel
>> too strongly about it, but I think it is a slightly stronger test than
>> leaving out the '--all'.
> 
> Fair enough, '--all' works for me with that explanation.
> 

Ok. In summary I'll use the commit message you wrote and apply the rest 
of your feedback for the test for the next version.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3] fast-export: fix surprising behavior with --first-parent
  2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
  2021-12-10  3:48   ` Elijah Newren
@ 2021-12-16 16:23   ` William Sprent via GitGitGadget
  2021-12-21 18:47     ` Elijah Newren
  1 sibling, 1 reply; 18+ messages in thread
From: William Sprent via GitGitGadget @ 2021-12-16 16:23 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	William Sprent, William Sprent

From: William Sprent <williams@unity3d.com>

The revision traversal machinery typically processes and returns all
children before any parent.  fast-export needs to operate in the
reverse fashion, handling parents before any of their children in
order to build up the history starting from the root commit(s).  This
would be a clear case where we could just use the revision traversal
machinery's "reverse" option to achieve this desired affect.

However, this wasn't what the code did.  It added its own array for
queuing.  The obvious hand-rolled solution would be to just push all
the commits into the array and then traverse afterwards, but it didn't
quite do that either.  It instead attempted to process anything it
could as soon as it could, and once it could, check whether it could
process anything that had been queued.  As far as I can tell, this was
an effort to save a little memory in the case of multiple root commits
since it could process some commits before queueing all of them.  This
involved some helper functions named has_unshown_parent() and
handle_tail().  For typical invocations of fast-export, this
alternative essentially amounted to a hand-rolled method of reversing
the commits -- it was a bunch of work to duplicate the revision
traversal machinery's "reverse" option.

This hand-rolled reversing mechanism is actually somewhat difficult to
reason about.  It takes some time to figure out how it ensures in
normal cases that it will actually process all traversed commits
(rather than just dropping some and not printing anything for them).

And it turns out there are some cases where the code does drop commits
without handling them, and not even printing an error or warning for
the user.  Due to the has_unshown_parent() checks, some commits could
be left in the array at the end of the "while...get_revision()" loop
which would be unprocessed.  This could be triggered for example with
    git fast-export main -- --first-parent
or non-sensical traversal rules such as
    git fast-export main -- --grep=Merge --invert-grep

While most traversals that don't include all parents should likely
trigger errors in fast-export (or at least require being used in
combination with --reference-excluded-parents), the --first-parent
traversal is at least reasonable and it'd be nice if it didn't just drop
commits. It'd also be nice for future readers of the code to have a
simpler "reverse traversal" mechanism. Use the "reverse" option of the
revision traversal machinery to achieve both.

Even for the non-sensical traversal flags like the --grep one above,
this would be an improvement. For example, in that case, the code
previously would have silently truncated history to only those commits
that do not have an ancestor containing "Merge" in their commit message.
After this code change, that case would include all commits without
"Merge" in their commit message -- but any commit that previously had a
"Merge"-mentioning parent would lose that parent
(likely resulting in many new root commits). While the new behavior is
still odd, it is at least understandable given that
--reference-excluded-parents is not the default.

Signed-off-by: William Sprent <williams@unity3d.com>
---
    fast-export: fix surprising behavior with --first-parent
    
    Hi,
    
    Here's a revised version of my previously submitted patch regarding
    fast-export and how it reacts to the --first-parent flag.
    
    Changes since v2:
    
     * Changed commit message to include the detailed description of the
       problem area as suggested by Elijah. I went back and forth with my
       self about whether the message needs some "lead in", but it ended up
       getting long without adding much.
     * Cleaned up test such that the comparison with the --reverse output is
       just below where it is generated.
     * In test, moved all usages of --no-ff to just after 'git merge' .
     * In test, replaced long 'git rev-list' incantation with simpler 'git
       log' format.
    
    Changes since v1:
    
     * Moved revs.reverse assignment down to similar assignments below.
     * Removed braces around single statement while loop.
     * The test now only changes directory inside a sub-shell.
     * Applied stylistic feedback on test such as: making redirections be on
       the form >FILE etc.
    
    There were questions raised about whether it makes sense at all for
    fast-export to simply accept all git rev-list arguments whether they
    have an effect or not - in particular for a flag like --reverse. I think
    I agree that it is questionable behavior, or at least questionably
    documented, but I also think it is out of scope for this change.
    
    I did consider teaching fast-export to complain if given --reverse, but
    it seemed inconsistent to me as it will gladly accept every other
    rev-list argument (for example, "git fast-export HEAD --show-signature
    --graph" works just fine).
    
    Original Message: I've noticed that git fast-export exhibits some odd
    behavior when passed the --first-parent flag. In the repository I was
    working on, it would only output the initial commit before exiting.
    Moreover, adding the --reverse flag causes it to behave differently and
    instead output all commits in the first parent line that only have one
    parent. My expectation is more or less that git fast-export should
    output the same set of commits as git rev-list would output given the
    same arguments, which matches how it acts when given revision ranges.
    
    It seems like this behavior comes from the fact that has_unshown_parents
    will never be false for any merge commits encountered when fast-export
    is called with --first-parent. This causes the while loop to follow the
    pattern of pushing all commits into the "commits" queue until the
    initial commit is encountered, at which point it calls handle_tail which
    falls through on the first merge commit, causing most of the commits to
    be unhandled.
    
    My impression is that this logic only serves to ensure that parents are
    processed before their children, so in my patch I've opted to fix the
    issue by delegating that responsibility to revision.c by adding the
    reverse flag before performing the revision walk. From what I can see,
    this should be equivalent to what the previous logic is trying to
    achieve, but I can also see that there could be risk in these changes.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1084

Range-diff vs v2:

 1:  af552f37dbe ! 1:  312255260b1 fast-export: fix surprising behavior with --first-parent
     @@ Metadata
       ## Commit message ##
          fast-export: fix surprising behavior with --first-parent
      
     -    When invoking git-fast-export with the --first-parent flag on a branch
     -    with merges, fast-export would early-out on processing the first merge
     -    on the branch. If combined with --reverse, fast-export would instead
     -    output all single parent commits on the branch.
     +    The revision traversal machinery typically processes and returns all
     +    children before any parent.  fast-export needs to operate in the
     +    reverse fashion, handling parents before any of their children in
     +    order to build up the history starting from the root commit(s).  This
     +    would be a clear case where we could just use the revision traversal
     +    machinery's "reverse" option to achieve this desired affect.
      
     -    This commit makes fast-export output the same commits as rev-list
     -    --first-parent, and makes --reverse not have an effect on which commits
     -    are output.
     +    However, this wasn't what the code did.  It added its own array for
     +    queuing.  The obvious hand-rolled solution would be to just push all
     +    the commits into the array and then traverse afterwards, but it didn't
     +    quite do that either.  It instead attempted to process anything it
     +    could as soon as it could, and once it could, check whether it could
     +    process anything that had been queued.  As far as I can tell, this was
     +    an effort to save a little memory in the case of multiple root commits
     +    since it could process some commits before queueing all of them.  This
     +    involved some helper functions named has_unshown_parent() and
     +    handle_tail().  For typical invocations of fast-export, this
     +    alternative essentially amounted to a hand-rolled method of reversing
     +    the commits -- it was a bunch of work to duplicate the revision
     +    traversal machinery's "reverse" option.
      
     -    The fix involves removing logic within fast-export which was responsible
     -    for ensuring that parents are processed before their children, which was
     -    what was exiting early due to missing second parents. This is replaced
     -    by setting 'reverse = 1' before revision walking, which, in conjuction
     -    with topo_order, allows for delegating the ordering of commits to
     -    revision.c. The reverse flag is set after parsing rev-list arguments to
     -    avoid having it disabled.
     +    This hand-rolled reversing mechanism is actually somewhat difficult to
     +    reason about.  It takes some time to figure out how it ensures in
     +    normal cases that it will actually process all traversed commits
     +    (rather than just dropping some and not printing anything for them).
     +
     +    And it turns out there are some cases where the code does drop commits
     +    without handling them, and not even printing an error or warning for
     +    the user.  Due to the has_unshown_parent() checks, some commits could
     +    be left in the array at the end of the "while...get_revision()" loop
     +    which would be unprocessed.  This could be triggered for example with
     +        git fast-export main -- --first-parent
     +    or non-sensical traversal rules such as
     +        git fast-export main -- --grep=Merge --invert-grep
     +
     +    While most traversals that don't include all parents should likely
     +    trigger errors in fast-export (or at least require being used in
     +    combination with --reference-excluded-parents), the --first-parent
     +    traversal is at least reasonable and it'd be nice if it didn't just drop
     +    commits. It'd also be nice for future readers of the code to have a
     +    simpler "reverse traversal" mechanism. Use the "reverse" option of the
     +    revision traversal machinery to achieve both.
     +
     +    Even for the non-sensical traversal flags like the --grep one above,
     +    this would be an improvement. For example, in that case, the code
     +    previously would have silently truncated history to only those commits
     +    that do not have an ancestor containing "Merge" in their commit message.
     +    After this code change, that case would include all commits without
     +    "Merge" in their commit message -- but any commit that previously had a
     +    "Merge"-mentioning parent would lose that parent
     +    (likely resulting in many new root commits). While the new behavior is
     +    still odd, it is at least understandable given that
     +    --reference-excluded-parents is not the default.
      
          Signed-off-by: William Sprent <williams@unity3d.com>
      
     @@ t/t9350-fast-export.sh: test_expect_success 'merge commit gets exported with --i
      +		git checkout -b topic1 &&
      +		test_commit B &&
      +		git checkout main &&
     -+		git merge topic1 --no-ff &&
     ++		git merge --no-ff topic1 &&
      +
      +		git checkout -b topic2 &&
      +		test_commit C &&
      +		git checkout main &&
     -+		git merge topic2 --no-ff &&
     ++		git merge --no-ff topic2 &&
      +
      +		test_commit D &&
      +
     -+		git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected &&
     -+
      +		git fast-export main -- --first-parent >first-parent-export &&
      +		git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
     ++		test_cmp first-parent-export first-parent-reverse-export &&
      +
      +		git init import &&
      +		git -C import fast-import <first-parent-export &&
      +
     -+		git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
     -+		git -C import rev-list --all >tmp &&
     -+
     -+		test_line_count = 4 tmp &&
     ++		git log --format="%ad %s" --first-parent main >expected &&
     ++		git -C import log --format="%ad %s" --all >actual &&
      +		test_cmp expected actual &&
     -+		test_cmp first-parent-export first-parent-reverse-export
     ++		test_line_count = 4 actual
      +	)
      +'
      +


 builtin/fast-export.c  | 40 ++++------------------------------------
 t/t9350-fast-export.sh | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 8e2caf72819..cb1d6473f12 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
 
 static struct decoration idnums;
 static uint32_t last_idnum;
-
-static int has_unshown_parent(struct commit *commit)
-{
-	struct commit_list *parent;
-
-	for (parent = commit->parents; parent; parent = parent->next)
-		if (!(parent->item->object.flags & SHOWN) &&
-		    !(parent->item->object.flags & UNINTERESTING))
-			return 1;
-	return 0;
-}
-
 struct anonymized_entry {
 	struct hashmap_entry hash;
 	const char *anon;
@@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
 	return strbuf_detach(&out, NULL);
 }
 
-static void handle_tail(struct object_array *commits, struct rev_info *revs,
-			struct string_list *paths_of_changed_objects)
-{
-	struct commit *commit;
-	while (commits->nr) {
-		commit = (struct commit *)object_array_pop(commits);
-		if (has_unshown_parent(commit)) {
-			/* Queue again, to be handled later */
-			add_object_array(&commit->object, NULL, commits);
-			return;
-		}
-		handle_commit(commit, revs, paths_of_changed_objects);
-	}
-}
 
 static void handle_tag(const char *name, struct tag *tag)
 {
@@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
 int cmd_fast_export(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
-	struct object_array commits = OBJECT_ARRAY_INIT;
 	struct commit *commit;
 	char *export_filename = NULL,
 	     *import_filename = NULL,
@@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
+
+	revs.reverse = 1;
 	revs.diffopt.format_callback = show_filemodify;
 	revs.diffopt.format_callback_data = &paths_of_changed_objects;
 	revs.diffopt.flags.recursive = 1;
-	while ((commit = get_revision(&revs))) {
-		if (has_unshown_parent(commit)) {
-			add_object_array(&commit->object, NULL, &commits);
-		}
-		else {
-			handle_commit(commit, &revs, &paths_of_changed_objects);
-			handle_tail(&commits, &revs, &paths_of_changed_objects);
-		}
-	}
+	while ((commit = get_revision(&revs)))
+		handle_commit(commit, &revs, &paths_of_changed_objects);
 
 	handle_tags_and_duplicates(&extra_refs);
 	handle_tags_and_duplicates(&tag_refs);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 409b48e2442..7b7a18dd2c1 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -750,4 +750,36 @@ test_expect_success 'merge commit gets exported with --import-marks' '
 	)
 '
 
+
+test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
+	git init first-parent &&
+	(
+		cd first-parent &&
+		test_commit A &&
+		git checkout -b topic1 &&
+		test_commit B &&
+		git checkout main &&
+		git merge --no-ff topic1 &&
+
+		git checkout -b topic2 &&
+		test_commit C &&
+		git checkout main &&
+		git merge --no-ff topic2 &&
+
+		test_commit D &&
+
+		git fast-export main -- --first-parent >first-parent-export &&
+		git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
+		test_cmp first-parent-export first-parent-reverse-export &&
+
+		git init import &&
+		git -C import fast-import <first-parent-export &&
+
+		git log --format="%ad %s" --first-parent main >expected &&
+		git -C import log --format="%ad %s" --all >actual &&
+		test_cmp expected actual &&
+		test_line_count = 4 actual
+	)
+'
+
 test_done

base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] fast-export: fix surprising behavior with --first-parent
  2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
@ 2021-12-21 18:47     ` Elijah Newren
  2021-12-21 20:50       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Elijah Newren @ 2021-12-21 18:47 UTC (permalink / raw)
  To: William Sprent via GitGitGadget
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason, William Sprent

On Thu, Dec 16, 2021 at 8:23 AM William Sprent via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: William Sprent <williams@unity3d.com>
>
> The revision traversal machinery typically processes and returns all
> children before any parent.  fast-export needs to operate in the
> reverse fashion, handling parents before any of their children in
> order to build up the history starting from the root commit(s).  This
> would be a clear case where we could just use the revision traversal
> machinery's "reverse" option to achieve this desired affect.
>
> However, this wasn't what the code did.  It added its own array for
> queuing.  The obvious hand-rolled solution would be to just push all
> the commits into the array and then traverse afterwards, but it didn't
> quite do that either.  It instead attempted to process anything it
> could as soon as it could, and once it could, check whether it could
> process anything that had been queued.  As far as I can tell, this was
> an effort to save a little memory in the case of multiple root commits
> since it could process some commits before queueing all of them.  This
> involved some helper functions named has_unshown_parent() and
> handle_tail().  For typical invocations of fast-export, this
> alternative essentially amounted to a hand-rolled method of reversing
> the commits -- it was a bunch of work to duplicate the revision
> traversal machinery's "reverse" option.
>
> This hand-rolled reversing mechanism is actually somewhat difficult to
> reason about.  It takes some time to figure out how it ensures in
> normal cases that it will actually process all traversed commits
> (rather than just dropping some and not printing anything for them).
>
> And it turns out there are some cases where the code does drop commits
> without handling them, and not even printing an error or warning for
> the user.  Due to the has_unshown_parent() checks, some commits could
> be left in the array at the end of the "while...get_revision()" loop
> which would be unprocessed.  This could be triggered for example with
>     git fast-export main -- --first-parent
> or non-sensical traversal rules such as
>     git fast-export main -- --grep=Merge --invert-grep
>
> While most traversals that don't include all parents should likely
> trigger errors in fast-export (or at least require being used in
> combination with --reference-excluded-parents), the --first-parent
> traversal is at least reasonable and it'd be nice if it didn't just drop
> commits. It'd also be nice for future readers of the code to have a
> simpler "reverse traversal" mechanism. Use the "reverse" option of the
> revision traversal machinery to achieve both.
>
> Even for the non-sensical traversal flags like the --grep one above,
> this would be an improvement. For example, in that case, the code
> previously would have silently truncated history to only those commits
> that do not have an ancestor containing "Merge" in their commit message.
> After this code change, that case would include all commits without
> "Merge" in their commit message -- but any commit that previously had a
> "Merge"-mentioning parent would lose that parent
> (likely resulting in many new root commits). While the new behavior is
> still odd, it is at least understandable given that
> --reference-excluded-parents is not the default.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>
>     Hi,
>
>     Here's a revised version of my previously submitted patch regarding
>     fast-export and how it reacts to the --first-parent flag.
>
>     Changes since v2:
>
>      * Changed commit message to include the detailed description of the
>        problem area as suggested by Elijah. I went back and forth with my
>        self about whether the message needs some "lead in", but it ended up
>        getting long without adding much.

If it feels weird quoting someone else so extensively and just using
their words for the commit message, we've used a "Commit-message-by:"
trailer in the past; that could be useful here.

>      * Cleaned up test such that the comparison with the --reverse output is
>        just below where it is generated.
>      * In test, moved all usages of --no-ff to just after 'git merge' .
>      * In test, replaced long 'git rev-list' incantation with simpler 'git
>        log' format.
>
>     Changes since v1:
>
>      * Moved revs.reverse assignment down to similar assignments below.
>      * Removed braces around single statement while loop.
>      * The test now only changes directory inside a sub-shell.
>      * Applied stylistic feedback on test such as: making redirections be on
>        the form >FILE etc.
>
>     There were questions raised about whether it makes sense at all for
>     fast-export to simply accept all git rev-list arguments whether they
>     have an effect or not - in particular for a flag like --reverse. I think
>     I agree that it is questionable behavior, or at least questionably
>     documented, but I also think it is out of scope for this change.
>
>     I did consider teaching fast-export to complain if given --reverse, but
>     it seemed inconsistent to me as it will gladly accept every other
>     rev-list argument (for example, "git fast-export HEAD --show-signature
>     --graph" works just fine).
>
>     Original Message: I've noticed that git fast-export exhibits some odd
>     behavior when passed the --first-parent flag. In the repository I was
>     working on, it would only output the initial commit before exiting.
>     Moreover, adding the --reverse flag causes it to behave differently and
>     instead output all commits in the first parent line that only have one
>     parent. My expectation is more or less that git fast-export should
>     output the same set of commits as git rev-list would output given the
>     same arguments, which matches how it acts when given revision ranges.
>
>     It seems like this behavior comes from the fact that has_unshown_parents
>     will never be false for any merge commits encountered when fast-export
>     is called with --first-parent. This causes the while loop to follow the
>     pattern of pushing all commits into the "commits" queue until the
>     initial commit is encountered, at which point it calls handle_tail which
>     falls through on the first merge commit, causing most of the commits to
>     be unhandled.
>
>     My impression is that this logic only serves to ensure that parents are
>     processed before their children, so in my patch I've opted to fix the
>     issue by delegating that responsibility to revision.c by adding the
>     reverse flag before performing the revision walk. From what I can see,
>     this should be equivalent to what the previous logic is trying to
>     achieve, but I can also see that there could be risk in these changes.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1084
>
> Range-diff vs v2:
>
>  1:  af552f37dbe ! 1:  312255260b1 fast-export: fix surprising behavior with --first-parent
>      @@ Metadata
>        ## Commit message ##
>           fast-export: fix surprising behavior with --first-parent
>
>      -    When invoking git-fast-export with the --first-parent flag on a branch
>      -    with merges, fast-export would early-out on processing the first merge
>      -    on the branch. If combined with --reverse, fast-export would instead
>      -    output all single parent commits on the branch.
>      +    The revision traversal machinery typically processes and returns all
>      +    children before any parent.  fast-export needs to operate in the
>      +    reverse fashion, handling parents before any of their children in
>      +    order to build up the history starting from the root commit(s).  This
>      +    would be a clear case where we could just use the revision traversal
>      +    machinery's "reverse" option to achieve this desired affect.
>
>      -    This commit makes fast-export output the same commits as rev-list
>      -    --first-parent, and makes --reverse not have an effect on which commits
>      -    are output.
>      +    However, this wasn't what the code did.  It added its own array for
>      +    queuing.  The obvious hand-rolled solution would be to just push all
>      +    the commits into the array and then traverse afterwards, but it didn't
>      +    quite do that either.  It instead attempted to process anything it
>      +    could as soon as it could, and once it could, check whether it could
>      +    process anything that had been queued.  As far as I can tell, this was
>      +    an effort to save a little memory in the case of multiple root commits
>      +    since it could process some commits before queueing all of them.  This
>      +    involved some helper functions named has_unshown_parent() and
>      +    handle_tail().  For typical invocations of fast-export, this
>      +    alternative essentially amounted to a hand-rolled method of reversing
>      +    the commits -- it was a bunch of work to duplicate the revision
>      +    traversal machinery's "reverse" option.
>
>      -    The fix involves removing logic within fast-export which was responsible
>      -    for ensuring that parents are processed before their children, which was
>      -    what was exiting early due to missing second parents. This is replaced
>      -    by setting 'reverse = 1' before revision walking, which, in conjuction
>      -    with topo_order, allows for delegating the ordering of commits to
>      -    revision.c. The reverse flag is set after parsing rev-list arguments to
>      -    avoid having it disabled.
>      +    This hand-rolled reversing mechanism is actually somewhat difficult to
>      +    reason about.  It takes some time to figure out how it ensures in
>      +    normal cases that it will actually process all traversed commits
>      +    (rather than just dropping some and not printing anything for them).
>      +
>      +    And it turns out there are some cases where the code does drop commits
>      +    without handling them, and not even printing an error or warning for
>      +    the user.  Due to the has_unshown_parent() checks, some commits could
>      +    be left in the array at the end of the "while...get_revision()" loop
>      +    which would be unprocessed.  This could be triggered for example with
>      +        git fast-export main -- --first-parent
>      +    or non-sensical traversal rules such as
>      +        git fast-export main -- --grep=Merge --invert-grep
>      +
>      +    While most traversals that don't include all parents should likely
>      +    trigger errors in fast-export (or at least require being used in
>      +    combination with --reference-excluded-parents), the --first-parent
>      +    traversal is at least reasonable and it'd be nice if it didn't just drop
>      +    commits. It'd also be nice for future readers of the code to have a
>      +    simpler "reverse traversal" mechanism. Use the "reverse" option of the
>      +    revision traversal machinery to achieve both.
>      +
>      +    Even for the non-sensical traversal flags like the --grep one above,
>      +    this would be an improvement. For example, in that case, the code
>      +    previously would have silently truncated history to only those commits
>      +    that do not have an ancestor containing "Merge" in their commit message.
>      +    After this code change, that case would include all commits without
>      +    "Merge" in their commit message -- but any commit that previously had a
>      +    "Merge"-mentioning parent would lose that parent
>      +    (likely resulting in many new root commits). While the new behavior is
>      +    still odd, it is at least understandable given that
>      +    --reference-excluded-parents is not the default.
>
>           Signed-off-by: William Sprent <williams@unity3d.com>
>
>      @@ t/t9350-fast-export.sh: test_expect_success 'merge commit gets exported with --i
>       +         git checkout -b topic1 &&
>       +         test_commit B &&
>       +         git checkout main &&
>      -+         git merge topic1 --no-ff &&
>      ++         git merge --no-ff topic1 &&
>       +
>       +         git checkout -b topic2 &&
>       +         test_commit C &&
>       +         git checkout main &&
>      -+         git merge topic2 --no-ff &&
>      ++         git merge --no-ff topic2 &&
>       +
>       +         test_commit D &&
>       +
>      -+         git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected &&
>      -+
>       +         git fast-export main -- --first-parent >first-parent-export &&
>       +         git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
>      ++         test_cmp first-parent-export first-parent-reverse-export &&
>       +
>       +         git init import &&
>       +         git -C import fast-import <first-parent-export &&
>       +
>      -+         git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual &&
>      -+         git -C import rev-list --all >tmp &&
>      -+
>      -+         test_line_count = 4 tmp &&
>      ++         git log --format="%ad %s" --first-parent main >expected &&
>      ++         git -C import log --format="%ad %s" --all >actual &&
>       +         test_cmp expected actual &&
>      -+         test_cmp first-parent-export first-parent-reverse-export
>      ++         test_line_count = 4 actual
>       + )
>       +'
>       +
>
>
>  builtin/fast-export.c  | 40 ++++------------------------------------
>  t/t9350-fast-export.sh | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 8e2caf72819..cb1d6473f12 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
>
>  static struct decoration idnums;
>  static uint32_t last_idnum;
> -
> -static int has_unshown_parent(struct commit *commit)
> -{
> -       struct commit_list *parent;
> -
> -       for (parent = commit->parents; parent; parent = parent->next)
> -               if (!(parent->item->object.flags & SHOWN) &&
> -                   !(parent->item->object.flags & UNINTERESTING))
> -                       return 1;
> -       return 0;
> -}
> -
>  struct anonymized_entry {
>         struct hashmap_entry hash;
>         const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>         return strbuf_detach(&out, NULL);
>  }
>
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -                       struct string_list *paths_of_changed_objects)
> -{
> -       struct commit *commit;
> -       while (commits->nr) {
> -               commit = (struct commit *)object_array_pop(commits);
> -               if (has_unshown_parent(commit)) {
> -                       /* Queue again, to be handled later */
> -                       add_object_array(&commit->object, NULL, commits);
> -                       return;
> -               }
> -               handle_commit(commit, revs, paths_of_changed_objects);
> -       }
> -}
>
>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>         struct rev_info revs;
> -       struct object_array commits = OBJECT_ARRAY_INIT;
>         struct commit *commit;
>         char *export_filename = NULL,
>              *import_filename = NULL,
> @@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>
>         if (prepare_revision_walk(&revs))
>                 die("revision walk setup failed");
> +
> +       revs.reverse = 1;
>         revs.diffopt.format_callback = show_filemodify;
>         revs.diffopt.format_callback_data = &paths_of_changed_objects;
>         revs.diffopt.flags.recursive = 1;
> -       while ((commit = get_revision(&revs))) {
> -               if (has_unshown_parent(commit)) {
> -                       add_object_array(&commit->object, NULL, &commits);
> -               }
> -               else {
> -                       handle_commit(commit, &revs, &paths_of_changed_objects);
> -                       handle_tail(&commits, &revs, &paths_of_changed_objects);
> -               }
> -       }
> +       while ((commit = get_revision(&revs)))
> +               handle_commit(commit, &revs, &paths_of_changed_objects);
>
>         handle_tags_and_duplicates(&extra_refs);
>         handle_tags_and_duplicates(&tag_refs);
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index 409b48e2442..7b7a18dd2c1 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -750,4 +750,36 @@ test_expect_success 'merge commit gets exported with --import-marks' '
>         )
>  '
>
> +
> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +       git init first-parent &&
> +       (
> +               cd first-parent &&
> +               test_commit A &&
> +               git checkout -b topic1 &&
> +               test_commit B &&
> +               git checkout main &&
> +               git merge --no-ff topic1 &&
> +
> +               git checkout -b topic2 &&
> +               test_commit C &&
> +               git checkout main &&
> +               git merge --no-ff topic2 &&
> +
> +               test_commit D &&
> +
> +               git fast-export main -- --first-parent >first-parent-export &&
> +               git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
> +               test_cmp first-parent-export first-parent-reverse-export &&
> +
> +               git init import &&
> +               git -C import fast-import <first-parent-export &&
> +
> +               git log --format="%ad %s" --first-parent main >expected &&
> +               git -C import log --format="%ad %s" --all >actual &&
> +               test_cmp expected actual &&
> +               test_line_count = 4 actual
> +       )
> +'
> +
>  test_done
>
> base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc

This version looks good to me:

Reviewed-by: Elijah Newren <newren@gmail.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] fast-export: fix surprising behavior with --first-parent
  2021-12-21 18:47     ` Elijah Newren
@ 2021-12-21 20:50       ` Junio C Hamano
  2021-12-22  8:38         ` William Sprent
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2021-12-21 20:50 UTC (permalink / raw)
  To: Elijah Newren
  Cc: William Sprent via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason, William Sprent

Elijah Newren <newren@gmail.com> writes:

>>      * Changed commit message to include the detailed description of the
>>        problem area as suggested by Elijah. I went back and forth with my
>>        self about whether the message needs some "lead in", but it ended up
>>        getting long without adding much.
>
> If it feels weird quoting someone else so extensively and just using
> their words for the commit message, we've used a "Commit-message-by:"
> trailer in the past; that could be useful here.

I do not mind tweaking the copy I already have with "Helped-by"
myself, while ...

> This version looks good to me:
>
> Reviewed-by: Elijah Newren <newren@gmail.com>

... adding this one before merging the result to 'next'.

Thanks, both.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] fast-export: fix surprising behavior with --first-parent
  2021-12-21 20:50       ` Junio C Hamano
@ 2021-12-22  8:38         ` William Sprent
  0 siblings, 0 replies; 18+ messages in thread
From: William Sprent @ 2021-12-22  8:38 UTC (permalink / raw)
  To: Junio C Hamano, Elijah Newren
  Cc: William Sprent via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason

On 21/12/2021 21.50, Junio C Hamano wrote:
> Elijah Newren <newren@gmail.com> writes:
> 
>>>       * Changed commit message to include the detailed description of the
>>>         problem area as suggested by Elijah. I went back and forth with my
>>>         self about whether the message needs some "lead in", but it ended up
>>>         getting long without adding much.
>>
>> If it feels weird quoting someone else so extensively and just using
>> their words for the commit message, we've used a "Commit-message-by:"
>> trailer in the past; that could be useful here.
> 
> I do not mind tweaking the copy I already have with "Helped-by"
> myself, while ...
> 
>> This version looks good to me:
>>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
> 
> ... adding this one before merging the result to 'next'.
> 
> Thanks, both.

Either way works for me. I'm not too familiar with conventions around 
trailers. Both seem very reasonable to me.

Thanks a bunch for the help. :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-12-22  8:38 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
2021-11-24 11:57   ` William Sprent
2021-11-23 19:14 ` Elijah Newren
2021-11-24 13:05   ` William Sprent
2021-11-24  0:41 ` Junio C Hamano
2021-11-24 13:05   ` William Sprent
2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
2021-12-10  3:48   ` Elijah Newren
2021-12-10 21:55     ` Junio C Hamano
2021-12-10 22:02       ` Elijah Newren
2021-12-13 15:09     ` William Sprent
2021-12-14  0:31       ` Elijah Newren
2021-12-14 13:11         ` William Sprent
2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
2021-12-21 18:47     ` Elijah Newren
2021-12-21 20:50       ` Junio C Hamano
2021-12-22  8:38         ` William Sprent

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.