All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Phillip Wood <phillip.wood@dunelm.org.uk>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1 8/8] sequencer: try to commit without forking 'git commit'
Date: Tue, 7 Nov 2017 02:36:30 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.21.1.1711070203230.6482@virtualbox> (raw)
In-Reply-To: <20171106112709.2121-9-phillip.wood@talktalk.net>

Hi Phillip,

On Mon, 6 Nov 2017, Phillip Wood wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> If the commit message does not need to be edited then create the
> commit without forking 'git commit'. Taking the best time of ten runs
> with a warm cache this reduces the time taken to cherry-pick 10
> commits by 27% (from 282ms to 204ms), and the time taken by 'git
> rebase --continue' to pick 10 commits by 45% (from 386ms to 212ms) on
> my computer running linux. Some of greater saving for rebase is
> because it longer wastes time creating the commit summary just to

I usually leave the grammar reviews to people who prefer to review grammar
over code, but in this case I think the "no" in "no longer" is rather
crucial.

> throw it away.

Those are impressive improvements, and I am certain that they will be even
more noticable on Windows, where creating processes is a lot more
expensive than on Linux (which is the reason why you will find a lot more
multi-threaded processes on Windows...).

> The code to create the commit is based on builtin/commit.c. It is
> slightly simplified as it doesn't have to deal with merges and
> modified so try and return an error rather than dying so that the
> sequencer exits cleanly, as it would when forking 'git commit'.
> 
> Even when not forking 'git commit' the commit message is written to a
> file and CHERRY_PICK_HEAD is created unnecessarily. This could be
> eliminated in future. I hacked up a version that does not write these
> files and just passed an strbuf (with the wrong message for fixup and
> squash commands) to do_commit() but I couldn't measure any significant
> time difference when running cherry-pick or rebase. I think
> eliminating the writes properly for rebase would require a bit of
> effort as the code would need to be restructured.

True. And it totally makes sense to go for the big bucks.

> diff --git a/sequencer.c b/sequencer.c
> index b8cf679751449591d6f97102904e060ebee9d7a1..0636d027e9e1cdebaab4802e5becd89e8398a425 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -592,6 +592,18 @@ static int read_env_script(struct argv_array *env)
>  	return 0;
>  }
>  
> +static char *get_author(const char* message)
> +{
> +	size_t len;
> +	const char *a;
> +
> +	a = find_commit_header(message, "author", &len);
> +	if (a)
> +		return xmemdupz(a, len);
> +
> +	return NULL;
> +}

I was surprised that there is no helper for that yet, so I looked, but it
seems that the three existing callers of `find_commit_header(...,
"author", ...)` want to split the ident line right away, and do not need
the duplicated buffer.

In short: the code added here is necessary.

> @@ -984,6 +996,151 @@ int print_commit_summary(const char *prefix, const struct object_id *oid,
>  	return ret;
>  }
>  
> +static int parse_head(struct commit **head)
> +{
> +	struct commit *current_head;
> +	struct object_id oid;
> +
> +	if (get_oid("HEAD", &oid)) {
> +		current_head = NULL;
> +	} else {
> +		current_head = lookup_commit_reference(&oid);
> +		if (!current_head)
> +			return error(_("could not parse HEAD"));
> +		if (oidcmp(&oid, &current_head->object.oid)) {
> +			warning(_("HEAD %s is not a commit!"),
> +				oid_to_hex(&oid));
> +		}
> +		if (parse_commit(current_head))
> +			return error(_("could not parse HEAD commit"));
> +	}
> +	*head = current_head;
> +
> +	return 0;
> +}
> +
> +static int try_to_commit(struct strbuf *msg, const char *author,
> +			 struct replay_opts *opts, unsigned int flags,
> +			 struct object_id *oid)

Since this is a file-local function, i.e. not in any way tied to a
process exit status, it should probably return -1 in the case of errors,
as Git does elsewhere, too.

> +{
> +	struct object_id tree;
> +	struct commit *current_head;
> +	struct commit_list *parents = NULL;
> +	struct commit_extra_header *extra = NULL;
> +	struct strbuf err = STRBUF_INIT;
> +	struct strbuf amend_msg = STRBUF_INIT;
> +	char *amend_author = NULL;
> +	const char *gpg_sign;
> +	enum cleanup_mode cleanup;
> +	int res = 0;
> +
> +	if (parse_head(&current_head))
> +		return -1;
> +
> +	if (flags & AMEND_MSG) {
> +		const char *exclude_gpgsig[2] = { "gpgsig", NULL };

Git's current source code seems to prefer to infer the array length; The
`2` is unnecessary here.

> +		const char *out_enc = get_commit_output_encoding();
> +		const char *message = logmsg_reencode(current_head, NULL,
> +						      out_enc);
> +
> +		if (!msg) {
> +			const char *body = NULL;
> +
> +			find_commit_subject(message, &body);

Maybe `orig_message` would be better here; I expected `body` to refer to
the part of the commit message *after* the subject, but reading the code
of `find_commit_subject()`, I find that it stores the beginning of the
commit message.

Dunno.

> +			msg = &amend_msg;
> +			strbuf_addstr(msg, body);
> +		}
> +		author = amend_author = get_author (message);

Please lose the space after the function name.

> +		unuse_commit_buffer(current_head, message);
> +		if (!author) {
> +			res = error(_("unable to parse commit author"));
> +			goto out;
> +		}
> +		parents = copy_commit_list(current_head->parents);
> +		extra = read_commit_extra_headers(current_head, exclude_gpgsig);
> +	} else if (current_head) {
> +		commit_list_insert(current_head, &parents);
> +	}
> +
> +	cleanup = (flags & CLEANUP_MSG) ? CLEANUP_ALL : default_msg_cleanup;
> +	if (cleanup != CLEANUP_NONE)
> +		strbuf_stripspace(msg, cleanup == CLEANUP_ALL);
> +	if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) {
> +		res = 1;
> +		goto out;
> +	}
> +
> +	gpg_sign = (opts->gpg_sign) ? opts->gpg_sign : default_gpg_sign;

Others will probably complain about those extra parentheses. I am not
offended by them, though.

> +	if (write_cache_as_tree(tree.hash, 0, NULL)) {
> +		res = error(_("git write-tree failed to write a tree"));
> +		goto out;
> +	}
> +
> +	if (!(flags & ALLOW_EMPTY) && !oidcmp(current_head ?
> +					      &current_head->tree->object.oid :
> +					      &empty_tree_oid, &tree)) {

I'll leave it to Junio to comment on the formatting here.

> +		res = 1;
> +		goto out;
> +	}
> +
> +	if (commit_tree_extended(msg->buf, msg->len, tree.hash, parents,
> +				 oid->hash, author, gpg_sign, extra)) {
> +		res = error(_("failed to write commit object"));
> +		goto out;
> +	}
> +
> +	if (update_head(current_head, oid, getenv("GIT_REFLOG_ACTION"), msg,
> +			&err)){
> +		res = error("%s", err.buf);
> +		goto out;
> +	}
> +
> +	if (flags & AMEND_MSG)
> +		commit_post_rewrite(current_head, oid);
> +
> +out:
> +	free_commit_extra_headers(extra);
> +	strbuf_release(&err);
> +	strbuf_release(&amend_msg);
> +	if (amend_author)
> +		free(amend_author);

Git's source code uses the fact that `free(NULL);` is essentially a no-op
(and certainly allowed) to avoid conditionals in such cases.

That would make the `if (amend_author)` unnecessary.

> +
> +	return res;
> +}
> +
> +static int do_commit(const char *msg_file, const char* author,
> +		     struct replay_opts *opts, unsigned int flags)
> +{
> +	int res = 1;

Same as above, the error code should most likely be -1 instead.

> +	if (~flags & EDIT_MSG && ~flags & VERIFY_MSG) {

I *think* it is more common to write `!(flags & EDIT_MSG)` in Git's source
code.

> +		struct object_id oid;
> +		struct strbuf sb = STRBUF_INIT;
> +
> +		if (msg_file && strbuf_read_file(&sb, msg_file, 2048) < 0)
> +			return error_errno(_("unable to read commit message "
> +					     "from '%s'"),
> +					   msg_file);
> +
> +		res = try_to_commit(msg_file ? &sb : NULL, author, opts, flags,
> +				    &oid);
> +		strbuf_release(&sb);
> +		if (res == 0) {

Usually, Git's source code uses `if (!res)` in such cases.

> +			unlink(git_path_cherry_pick_head());
> +			unlink(git_path_merge_msg());
> +			if (!is_rebase_i(opts))
> +				res = print_commit_summary(NULL, &oid,
> +						SUMMARY_SHOW_AUTHOR_DATE);
> +			return res;
> +		}

I wonder whether we should move the `return res;` one line lower, to avoid
falling through to call `run_git_commit()` if `try_to_commit()` failed...

> +	}
> +	if (res == 1)
> +		return run_git_commit(msg_file, opts, flags);

Maybe this code could be simplified even further by moving this
conditional to the beginning of the function, as:

	if ((flags & (EDIT_MSG | VERIFY_MSG)))
		return run_git_commit(msg_file, opts, flags);

But maybe I misunderstood and you really wanted to fall back on
`run_git_commit()` if `try_to_commit()` failed?

> +
> +	return res;
> +}
> +
>  static int is_original_commit_empty(struct commit *commit)
>  {
>  	const struct object_id *ptree_oid;
> @@ -1235,6 +1392,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  	struct object_id head;
>  	struct commit *base, *next, *parent;
>  	const char *base_label, *next_label;
> +	char *author = NULL;
>  	struct commit_message msg = { NULL, NULL, NULL, NULL };
>  	struct strbuf msgbuf = STRBUF_INIT;
>  	int res, unborn = 0, allow;
> @@ -1350,6 +1508,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  			strbuf_addstr(&msgbuf, oid_to_hex(&commit->object.oid));
>  			strbuf_addstr(&msgbuf, ")\n");
>  		}
> +		if (!is_fixup (command))
> +			author = get_author(msg.message);
>  	}
>  
>  	if (command == TODO_REWORD)
> @@ -1435,9 +1595,13 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  		goto leave;
>  	} else if (allow)
>  		flags |= ALLOW_EMPTY;
> -	if (!opts->no_commit)
> +	if (!opts->no_commit) {
>  fast_forward_edit:
> -		res = run_git_commit(msg_file, opts, flags);
> +		if (author || command == TODO_REVERT || (flags & AMEND_MSG))
> +			res = do_commit(msg_file, author, opts, flags);
> +		else
> +			res = error(_("unable to parse commit author"));

Would this be a bug here? Or do we expect `get_author()` to possibly fail?

> +	}
>  
>  	if (!res && final_fixup) {
>  		unlink(rebase_path_fixup_msg());
> @@ -1446,6 +1610,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
>  
>  leave:
>  	free_message(commit, &msg);
> +	if (author)
> +		free(author);

As above, please write an unconditional `free(author);` here.

All in all, this patch series was a nice an pleasant read. I am impressed
by the performance wins.

Thank you very much,
Dscho

  reply	other threads:[~2017-11-07  1:36 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-25 10:10 [RFC PATCH 0/8] sequencer: dont't fork git commit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 1/8] commit: move empty message checks to libgit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 2/8] commit: move code to update HEAD " Phillip Wood
2017-10-07  9:54   ` Junio C Hamano
2017-10-24 10:01     ` Phillip Wood
2017-10-24 12:41       ` Junio C Hamano
2017-09-25 10:10 ` [RFC PATCH 3/8] sequencer: refactor update_head() Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 4/8] commit: move post-rewrite code to libgit Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 5/8] commit: move print_commit_summary() " Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 7/8] sequencer: load commit related config Phillip Wood
2017-09-25 10:10 ` [RFC PATCH 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-06 11:27 ` [PATCH v1 0/8] sequencer: dont't fork git commit Phillip Wood
2017-11-06 11:27   ` [PATCH v1 1/8] commit: move empty message checks to libgit Phillip Wood
2017-11-07  0:43     ` Johannes Schindelin
2017-11-07 14:24       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 2/8] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-07  2:56     ` Junio C Hamano
2017-11-07  3:02       ` Johannes Schindelin
2017-11-07 14:26         ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 3/8] commit: move post-rewrite code to libgit Phillip Wood
2017-11-07  3:03     ` Junio C Hamano
2017-11-07 14:28       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 4/8] commit: move print_commit_summary() " Phillip Wood
2017-11-07  3:38     ` Junio C Hamano
2017-11-07 14:32       ` Phillip Wood
2017-11-08  1:04         ` Junio C Hamano
2017-11-06 11:27   ` [PATCH v1 5/8] sequencer: don't die in print_commit_summary() Phillip Wood
2017-11-07  4:18     ` Junio C Hamano
2017-11-07 10:24       ` Johannes Schindelin
2017-11-07 15:13       ` Junio C Hamano
2017-11-10 14:53         ` Phillip Wood
2017-11-10 18:05           ` Junio C Hamano
2017-11-13 11:11             ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-07  0:52     ` Johannes Schindelin
2017-11-07  4:52     ` Junio C Hamano
2017-11-07 14:46       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 7/8] sequencer: load commit related config Phillip Wood
2017-11-07  1:02     ` Johannes Schindelin
2017-11-07 10:50       ` Phillip Wood
2017-11-06 11:27   ` [PATCH v1 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-07  1:36     ` Johannes Schindelin [this message]
2017-11-07 11:16       ` Phillip Wood
2017-11-07 14:09         ` Johannes Schindelin
2017-11-10 11:09 ` [PATCH v2 0/9] sequencer: dont't fork git commit Phillip Wood
2017-11-10 11:09   ` [PATCH v2 1/9] t3404: check intermediate squash messages Phillip Wood
2017-11-10 11:09   ` [PATCH v2 2/9] commit: move empty message checks to libgit Phillip Wood
2017-11-10 18:51     ` Ramsay Jones
2017-11-13 11:08       ` Phillip Wood
2017-11-10 11:09   ` [PATCH v2 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-10 18:36     ` Junio C Hamano
2017-11-13 11:25       ` Phillip Wood
2017-11-10 11:09   ` [PATCH v2 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-11-10 11:09   ` [PATCH v2 5/9] commit: move print_commit_summary() " Phillip Wood
2017-11-10 11:09   ` [PATCH v2 6/9] sequencer: don't die in print_commit_summary() Phillip Wood
2017-11-10 11:09   ` [PATCH v2 7/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-10 11:09   ` [PATCH v2 8/9] sequencer: load commit related config Phillip Wood
2017-11-10 11:09   ` [PATCH v2 9/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-10 19:21   ` [PATCH v2 0/9] sequencer: dont't fork git commit Junio C Hamano
2017-11-13 11:24     ` Phillip Wood
2017-11-14  1:15       ` Junio C Hamano
2017-11-17 11:34 ` [PATCH v3 0/8] sequencer: don't " Phillip Wood
2017-11-17 11:34   ` [PATCH v3 1/8] t3404: check intermediate squash messages Phillip Wood
2017-11-17 11:34   ` [PATCH v3 2/8] commit: move empty message checks to libgit Phillip Wood
2017-11-17 11:34   ` [PATCH v3 3/8] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-17 11:34   ` [PATCH v3 4/8] commit: move post-rewrite code to libgit Phillip Wood
2017-11-17 11:34   ` [PATCH v3 5/8] commit: move print_commit_summary() " Phillip Wood
2017-11-17 11:34   ` [PATCH v3 6/8] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-17 11:34   ` [PATCH v3 7/8] sequencer: load commit related config Phillip Wood
2017-11-17 11:34   ` [PATCH v3 8/8] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-18  3:41   ` [PATCH v3 0/8] sequencer: don't fork git commit Junio C Hamano
2017-11-18  3:57     ` Junio C Hamano
2017-11-18 11:32       ` Phillip Wood
2017-11-18 14:33       ` Phillip Wood
2017-11-24 11:07 ` [PATCH v4 0/9] " Phillip Wood
2017-11-24 11:07   ` [PATCH v4 1/9] t3404: check intermediate squash messages Phillip Wood
2017-11-24 11:07   ` [PATCH v4 2/9] commit: move empty message checks to libgit Phillip Wood
2017-11-24 11:07   ` [PATCH v4 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-11-24 11:07   ` [PATCH v4 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-11-24 11:07   ` [PATCH v4 5/9] commit: move print_commit_summary() " Phillip Wood
2017-11-24 11:07   ` [PATCH v4 6/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-11-24 11:07   ` [PATCH v4 7/9] sequencer: load commit related config Phillip Wood
2017-11-24 13:48     ` Junio C Hamano
2017-11-24 14:38       ` Phillip Wood
2017-12-04 18:30     ` Junio C Hamano
2017-12-05 11:21       ` Phillip Wood
2017-12-05 12:10         ` Phillip Wood
2017-12-09 19:05         ` Phillip Wood
2017-11-24 11:07   ` [PATCH v4 8/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2017-11-24 11:07   ` [PATCH v4 9/9] t3512/t3513: remove KNOWN_FAILURE_CHERRY_PICK_SEES_EMPTY_COMMIT=1 Phillip Wood
2017-12-04 19:24     ` Stefan Beller
2017-12-05 12:13       ` Phillip Wood
2017-12-11 14:13 ` [PATCH v5 0/9] sequencer: don't fork git commit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 1/9] t3404: check intermediate squash messages Phillip Wood
2017-12-11 14:13   ` [PATCH v5 2/9] commit: move empty message checks to libgit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 3/9] Add a function to update HEAD after creating a commit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 4/9] commit: move post-rewrite code to libgit Phillip Wood
2017-12-11 14:13   ` [PATCH v5 5/9] commit: move print_commit_summary() " Phillip Wood
2017-12-11 14:13   ` [PATCH v5 6/9] sequencer: simplify adding Signed-off-by: trailer Phillip Wood
2017-12-11 14:13   ` [PATCH v5 7/9] sequencer: load commit related config Phillip Wood
2017-12-11 18:53     ` Phillip Wood
2017-12-11 14:13   ` [PATCH v5 8/9] sequencer: try to commit without forking 'git commit' Phillip Wood
2018-01-10 20:53     ` Jonathan Nieder
2018-01-10 22:40       ` Johannes Schindelin
2018-01-11 10:41         ` Phillip Wood
2018-01-11 20:21           ` Johannes Schindelin
2017-12-11 14:13   ` [PATCH v5 9/9] t3512/t3513: remove KNOWN_FAILURE_CHERRY_PICK_SEES_EMPTY_COMMIT=1 Phillip Wood
2017-12-11 23:44   ` [PATCH v5 0/9] sequencer: don't fork git commit Junio C Hamano
2017-12-12 10:32     ` Phillip Wood
2017-12-13 11:46     ` [PATCH] sequencer: improve config handling Phillip Wood
2017-12-20 18:33       ` Error in `git': free(): invalid pointer (was Re: [PATCH] sequencer: improve config handling) Kaartic Sivaraam
2017-12-21 14:06         ` Johannes Schindelin
2017-12-21 16:42           ` Kaartic Sivaraam
2017-12-22 11:49             ` Johannes Schindelin
2017-12-25  8:51               ` Kaartic Sivaraam
     [not found]         ` <18737953.1042351513802399608.JavaMail.defaultUser@defaultHost>
2017-12-21 15:02           ` Kaartic Sivaraam
2017-12-21 16:53         ` phillip.wood
2017-12-21 17:14           ` Kaartic Sivaraam
2017-12-22 10:47             ` phillip.wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1.1711070203230.6482@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.