All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: William Sprent via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, William Sprent <williams@unity3d.com>
Subject: Re: [PATCH] fast-export: fix surprising behavior with --first-parent
Date: Tue, 23 Nov 2021 14:07:26 +0100	[thread overview]
Message-ID: <211123.865ysjui34.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <pull.1084.git.1637666927224.gitgitgadget@gmail.com>


On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote:

> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.
>
> This commit makes fast-export output the same commits as rev-list
> --first-parent, and makes --reverse not have an effect on which commits
> are output.
>
> The fix involves removing logic within fast-export which was responsible
> for ensuring that parents are processed before their children, which was
> what was exiting early due to missing second parents. This is replaced
> by setting 'reverse = 1' before revision walking, which, in conjuction
> with topo_order, allows for delegating the ordering of commits to
> revision.c. The reverse flag is set after parsing rev-list arguments to
> avoid having it disabled.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>     
>     Hi,
>     
>     This is my first time patching git, so I probably need some guidance on
>     my approach. :)

Hi, thanks for your first contribution to git. This is a rather shallow
review, a deeper one is much deserved.

I notice that you're removing code in builtin/fast-export.c, presumably
we have code in revision.c that does the same thing. It would really
help a reviewer for you to dig a bit into the relevant commit history
and note it in the commit message.

I.e. could revision.c always do this, and this was always needless
duplication, or at time X it was needed, but as of Y revision.c learned
to do this, and callers A, B and C were adjusted, but just not this
missed call D? etc.

> -static int has_unshown_parent(struct commit *commit)
> -{
> -	struct commit_list *parent;
> -
> -	for (parent = commit->parents; parent; parent = parent->next)
> -		if (!(parent->item->object.flags & SHOWN) &&
> -		    !(parent->item->object.flags & UNINTERESTING))
> -			return 1;
> -	return 0;
> -}
> -
>  struct anonymized_entry {
>  	struct hashmap_entry hash;
>  	const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>  	return strbuf_detach(&out, NULL);
>  }
>  
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -			struct string_list *paths_of_changed_objects)
> -{
> -	struct commit *commit;
> -	while (commits->nr) {
> -		commit = (struct commit *)object_array_pop(commits);
> -		if (has_unshown_parent(commit)) {
> -			/* Queue again, to be handled later */
> -			add_object_array(&commit->object, NULL, commits);
> -			return;
> -		}
> -		handle_commit(commit, revs, paths_of_changed_objects);
> -	}
> -}

...

>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>  	struct rev_info revs;
> -	struct object_array commits = OBJECT_ARRAY_INIT;
>  	struct commit *commit;
>  	char *export_filename = NULL,
>  	     *import_filename = NULL,
> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  
>  	get_tags_and_duplicates(&revs.cmdline);
>  
> +	revs.reverse = 1;

Is the placement of revs.reverse = 1 here important, or could it go
earlier after init_revision_sources() when we assign some other values
ir revs?

>  	if (prepare_revision_walk(&revs))
>  		die("revision walk setup failed");

A light reading of prepare_revision_walk() suggests it could come after,
but maybe I'm entirely wrong.

>  	revs.diffopt.format_callback = show_filemodify;
>  	revs.diffopt.format_callback_data = &paths_of_changed_objects;
>  	revs.diffopt.flags.recursive = 1;
>  	while ((commit = get_revision(&revs))) {
> -		if (has_unshown_parent(commit)) {
> -			add_object_array(&commit->object, NULL, &commits);
> -		}
> -		else {
> -			handle_commit(commit, &revs, &paths_of_changed_objects);
> -			handle_tail(&commits, &revs, &paths_of_changed_objects);
> -		}
> +		handle_commit(commit, &revs, &paths_of_changed_objects);
>  	}
>  

Yay code deletion, good if it works (I didn't check).

Since this is just a one-statement while-loop we can also remove its
braces now.

> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +	git init first-parent &&
> +	cd first-parent &&

Do any such "cd" in a sub-shell:

	git init x &&
	(
    		cd x &&
                ...
	)
Otherwise the next test after you is going to run in anotherdirectory.

> +	test_commit init &&
> +	git checkout -b topic1 &&
> +	test_commit file2 file2.txt &&
> +	git checkout main &&
> +	git merge topic1 --no-ff &&
> +
> +	git checkout -b topic2 &&
> +	test_commit file3 file3.txt &&
> +	git checkout main &&
> +	git merge topic2 --no-ff &&

Just a nit. I'd use "test_commit A", "test_commit B" etc. when the
filenames etc. aren't important. There's no subsequent reference here,
so I assume they're not.

> +	test_commit branch-head &&
> +
> +	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&

nit; >expected, not > expected is the usual style.
> +
> +	git fast-export main -- --first-parent > first-parent-export &&
> +	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&

ditto:

> +	git init import && cd import &&

ditto earlier "cd" comment.

> +	cat ../first-parent-export | git fast-import &&

Instead of "cat x | prog" do "prog <x".

> +	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&

> +	test $(git rev-list --all | wc -l) -eq 4 &&

Instead:

    git rev-list --all >tmp &&
    test_line_count = 4 tmp

(for some value of tmp)

> +	test_cmp ../expected actual &&
> +	test_cmp ../first-parent-export ../first-parent-reverse-export
> +'

Maybe some of the CD-ing around here wouldu be easier by not doing that
and instead running e.g.:

    git -C subdir fast-export >file-not-in-subdir &&
    ...

  reply	other threads:[~2021-11-23 13:17 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason [this message]
2021-11-24 11:57   ` William Sprent
2021-11-23 19:14 ` Elijah Newren
2021-11-24 13:05   ` William Sprent
2021-11-24  0:41 ` Junio C Hamano
2021-11-24 13:05   ` William Sprent
2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
2021-12-10  3:48   ` Elijah Newren
2021-12-10 21:55     ` Junio C Hamano
2021-12-10 22:02       ` Elijah Newren
2021-12-13 15:09     ` William Sprent
2021-12-14  0:31       ` Elijah Newren
2021-12-14 13:11         ` William Sprent
2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
2021-12-21 18:47     ` Elijah Newren
2021-12-21 20:50       ` Junio C Hamano
2021-12-22  8:38         ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211123.865ysjui34.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=williams@unity3d.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.