From: Junio C Hamano <gitster@pobox.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2] checkout: avoid unnecessary match_pathspec calls
Date: Mon, 25 Mar 2013 09:26:08 -0700 [thread overview]
Message-ID: <7vr4j3zbrz.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <1364129725-29597-1-git-send-email-pclouds@gmail.com> (=?utf-8?B?Ik5ndXnhu4VuCVRow6FpIE5n4buNYw==?= Duy"'s message of "Sun, 24 Mar 2013 19:55:25 +0700")
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> In checkout_paths() we do this
>
> - for all updated items, call match_pathspec
> - for all items, call match_pathspec (inside unmerge_cache)
> - for all items, call match_pathspec (for showing "path .. is unmerged)
> - for updated items, call match_pathspec and update paths
>
> That's a lot of duplicate match_pathspec(s) and the function is not
> exactly cheap to be called so many times, especially on large indexes.
> This patch makes it call match_pathspec once per updated index entry,
> save the result in ce_flags and reuse the results in the following
> loops.
>
> The changes in 0a1283b (checkout $tree $path: do not clobber local
> changes in $path not in $tree - 2011-09-30) limit the affected paths
> to ones we read from $tree. We do not do anything to other modified
> entries in this case, so the "for all items" above could be modified
> to "for all updated items". But..
>
> The command's behavior now is modified slightly: unmerged entries that
> match $path, but not updated by $tree, are now NOT touched. Although
> this should be considered a bug fix, not a regression.
Could we have a test to show the difference please, especially if we
are going to sell this as a fix?
The change itself looks quite sane to me (I didn't apply or test it,
though---just eyeballing).
Thanks.
>
> And while at there, free ps_matched after use.
>
> The following command is tested on webkit, 215k entries. The pattern
> is chosen mainly to make match_pathspec sweat:
>
> git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*"
>
> before after
> real 0m3.493s 0m2.737s
> user 0m2.239s 0m1.586s
> sys 0m1.252s 0m1.151s
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> > and worry about larger ones later, so if there were another choice,
> > i.e.
> >
> > - eject nd/magic-pathspecs for now, cook this (and other small
> > independent and clear improvements you may come up with, some of
> > which might come out of nd/magic-pathspecs itself) and let it
> > graduate first, and later revisit rerolld nd/magic-pathspecs
> >
> > that would be the ideal among the given choices ;-).
>
> Whichever is easier for you.
>
> > The above is a faithful rewrite, but I have to wonder why you need
> > two separate loops.
> >
> > Do you understand what the original loop is doing with ps_matched,
> > and why the code excludes certain paths while doing so?
>
> Nope, I did not dig that deep. I expected you to do it ;-) j/k
>
> > After commenting on the above, it makes me wonder if we even need to
> > bother marking entries that were in the index that did not come from
> > the tree-ish we are checking paths out of, though. What breaks if
> > you did not do the rewrite above and dropped the second loop in your
> > patch?
>
> The test suite says none. There is a behavior change regarding
> unmerged entries as mentioned in the commit message. But I think it's
> a good change.
>
> builtin/checkout.c | 34 +++++++++++++++++++++++++++-------
> cache.h | 1 +
> resolve-undo.c | 19 ++++++++++++++++++-
> resolve-undo.h | 1 +
> 4 files changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index a9c1b5a..359b983 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -271,24 +271,46 @@ static int checkout_paths(const struct checkout_opts *opts,
> ;
> ps_matched = xcalloc(1, pos);
>
> + /*
> + * Make sure all pathspecs participated in locating the paths
> + * to be checked out.
> + */
> for (pos = 0; pos < active_nr; pos++) {
> struct cache_entry *ce = active_cache[pos];
> + ce->ce_flags &= ~CE_MATCHED;
> if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
> + /*
> + * "git checkout tree-ish -- path", but this entry
> + * is in the original index; it will not be checked
> + * out to the working tree and it does not matter
> + * if pathspec matched this entry. We will not do
> + * anything to this entry at all.
> + */
> continue;
> - match_pathspec(opts->pathspec, ce->name, ce_namelen(ce), 0, ps_matched);
> + /*
> + * Either this entry came from the tree-ish we are
> + * checking the paths out of, or we are checking out
> + * of the index.
> + */
> + if (match_pathspec(opts->pathspec, ce->name, ce_namelen(ce),
> + 0, ps_matched))
> + ce->ce_flags |= CE_MATCHED;
> }
>
> - if (report_path_error(ps_matched, opts->pathspec, opts->prefix))
> + if (report_path_error(ps_matched, opts->pathspec, opts->prefix)) {
> + free(ps_matched);
> return 1;
> + }
> + free(ps_matched);
>
> /* "checkout -m path" to recreate conflicted state */
> if (opts->merge)
> - unmerge_cache(opts->pathspec);
> + unmerge_marked_index(&the_index);
>
> /* Any unmerged paths? */
> for (pos = 0; pos < active_nr; pos++) {
> struct cache_entry *ce = active_cache[pos];
> - if (match_pathspec(opts->pathspec, ce->name, ce_namelen(ce), 0, NULL)) {
> + if (ce->ce_flags & CE_MATCHED) {
> if (!ce_stage(ce))
> continue;
> if (opts->force) {
> @@ -313,9 +335,7 @@ static int checkout_paths(const struct checkout_opts *opts,
> state.refresh_cache = 1;
> for (pos = 0; pos < active_nr; pos++) {
> struct cache_entry *ce = active_cache[pos];
> - if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
> - continue;
> - if (match_pathspec(opts->pathspec, ce->name, ce_namelen(ce), 0, NULL)) {
> + if (ce->ce_flags & CE_MATCHED) {
> if (!ce_stage(ce)) {
> errs |= checkout_entry(ce, &state, NULL);
> continue;
> diff --git a/cache.h b/cache.h
> index c56315c..04e6090 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -161,6 +161,7 @@ struct cache_entry {
>
> #define CE_UNPACKED (1 << 24)
> #define CE_NEW_SKIP_WORKTREE (1 << 25)
> +#define CE_MATCHED (1 << 26)
>
> /*
> * Extended on-disk flags
> diff --git a/resolve-undo.c b/resolve-undo.c
> index 72b4612..639eb9c 100644
> --- a/resolve-undo.c
> +++ b/resolve-undo.c
> @@ -118,7 +118,7 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
> struct cache_entry *ce;
> struct string_list_item *item;
> struct resolve_undo_info *ru;
> - int i, err = 0;
> + int i, err = 0, matched;
>
> if (!istate->resolve_undo)
> return pos;
> @@ -137,6 +137,7 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
> ru = item->util;
> if (!ru)
> return pos;
> + matched = ce->ce_flags & CE_MATCHED;
> remove_index_entry_at(istate, pos);
> for (i = 0; i < 3; i++) {
> struct cache_entry *nce;
> @@ -144,6 +145,8 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
> continue;
> nce = make_cache_entry(ru->mode[i], ru->sha1[i],
> ce->name, i + 1, 0);
> + if (matched)
> + nce->ce_flags |= CE_MATCHED;
> if (add_index_entry(istate, nce, ADD_CACHE_OK_TO_ADD)) {
> err = 1;
> error("cannot unmerge '%s'", ce->name);
> @@ -156,6 +159,20 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
> return unmerge_index_entry_at(istate, pos);
> }
>
> +void unmerge_marked_index(struct index_state *istate)
> +{
> + int i;
> +
> + if (!istate->resolve_undo)
> + return;
> +
> + for (i = 0; i < istate->cache_nr; i++) {
> + struct cache_entry *ce = istate->cache[i];
> + if (ce->ce_flags & CE_MATCHED)
> + i = unmerge_index_entry_at(istate, i);
> + }
> +}
> +
> void unmerge_index(struct index_state *istate, const char **pathspec)
> {
> int i;
> diff --git a/resolve-undo.h b/resolve-undo.h
> index 8458769..7a30206 100644
> --- a/resolve-undo.h
> +++ b/resolve-undo.h
> @@ -12,5 +12,6 @@ extern struct string_list *resolve_undo_read(const char *, unsigned long);
> extern void resolve_undo_clear_index(struct index_state *);
> extern int unmerge_index_entry_at(struct index_state *, int);
> extern void unmerge_index(struct index_state *, const char **);
> +extern void unmerge_marked_index(struct index_state *);
>
> #endif
next prev parent reply other threads:[~2013-03-25 16:26 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-23 10:55 [PATCH] checkout: avoid unncessary match_pathspec calls Nguyễn Thái Ngọc Duy
2013-03-24 2:45 ` Eric Sunshine
2013-03-24 6:47 ` Junio C Hamano
2013-03-24 12:55 ` [PATCH v2] checkout: avoid unnecessary " Nguyễn Thái Ngọc Duy
2013-03-25 16:26 ` Junio C Hamano [this message]
2013-03-27 5:58 ` [PATCH v3] " Nguyễn Thái Ngọc Duy
2013-03-28 22:32 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7vr4j3zbrz.fsf@alter.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).