git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Martin Melka" <martin.melka@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Samuel Lijin" <sxlijin@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v5 00/12] Avoid multiple recursive calls for same path in read_directory_recursive()
Date: Wed, 01 Apr 2020 04:17:34 +0000	[thread overview]
Message-ID: <pull.700.v5.git.git.1585714667.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.700.v4.git.git.1585258061.gitgitgadget@gmail.com>

This series provides some "modest" speedups (see commit message for patch
8), and should allow 'git status --ignored' to complete in a more reasonable
timeframe for Martin Melka (see 
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
). It also cleans up the fill_directory() code and API, and fixes
bash-completion for 'git add untracked-dir/'.

Changes since v4:

 * cleanups suggested by Junio (patch 1)
 * new testcases that would have displayed multiple bugs with v4 (patch 2)
 * fixed the bugs with v4 (look for LEADING_PATHSPEC in patch 8)
 * fixed ANOTHER exponential slowdown codepath (look for MODE_MATCHING in
   patch 8)
 * make DIR_KEEP_UNTRACKED_CONTENTS less of a weird one-off (patch 9)
 * reduce number of calls to [do_]match_pathspec() (patch 10)
 * fix error-proneness of fill_directory() API (patch 11)
 * fix bash-completion results for 'git add' on an untracked dir (patch 12)

This is one of those rare patchsets that is absolutely perfect and
risk-free. That's right, bask in their glory and the ease of conscience from
using such solid stuff. Using this series will even innoculate you from bugs
outside of dir.c, and ones external to git, and even bugs external to your
computer. It's just that good. Pay no attention to the man behind the
curtain, er, I mean the huge warnings in patch 8, er...I mean what warnings?
There's no warnings to view, this stuff is solid as can be.

But if an extra pair of eyes wants to look at commit message in patch 8, or
at the new patches (2 and 9-12) and opine on how perfect everything looks
and feels, be my guest.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (11):
  t7063: more thorough status checking
  t3000: add more testcases testing a variety of ls-files issues
  dir: fix simple typo in comment
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one
  dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
  dir: replace double pathspec matching with single in treat_directory()
  Fix error-prone fill_directory() API; make it only return matches
  completion: fix 'git add' on paths under an untracked directory

 builtin/clean.c                        |   6 -
 builtin/grep.c                         |   2 -
 builtin/ls-files.c                     |   5 +-
 builtin/stash.c                        |  17 +-
 contrib/completion/git-completion.bash |   2 +-
 dir.c                                  | 422 +++++++++++++++----------
 t/t3000-ls-files-others.sh             | 121 +++++++
 t/t7063-status-untracked-cache.sh      |  52 +++
 t/t9902-completion.sh                  |   5 +
 wt-status.c                            |   6 +-
 10 files changed, 437 insertions(+), 201 deletions(-)


base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v5
Pull-Request: https://github.com/git/git/pull/700

Range-diff vs v4:

  1:  752403e339b !  1:  e2704245854 t7063: more thorough status checking
     @@ -11,8 +11,10 @@
          restructuring.
      
          Unfortunately, it's not easy to run status and tell it to ignore the
     -    untracked cache; the only knob we have it to instruct it to *delete*
     -    (and ignore) the untracked cache.
     +    untracked cache; the only knob we have is core.untrackedCache=false,
     +    which is used to instruct git to *delete* the untracked cache (which
     +    might also ignore the untracked cache when it operates, but that isn't
     +    specified in the docs).
      
          Create a simple helper that will create a clone of the index that is
          missing the untracked cache bits, and use it to compare that the results
     @@ -33,9 +35,9 @@
      +#    iuc status --porcelain >expect &&
      +#    git status --porcelain >actual &&
      +#    test_cmp expect actual
     -+iuc() {
     ++iuc () {
      +	git ls-files -s >../current-index-entries
     -+	git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries
     ++	git ls-files -t | sed -ne s/^S.//p >../current-sparse-entries
      +
      +	GIT_INDEX_FILE=.git/tmp_index
      +	export GIT_INDEX_FILE
  -:  ----------- >  2:  88e9d5d5dbd t3000: add more testcases testing a variety of ls-files issues
  2:  a4287d690be =  3:  38d4d5a46b1 dir: fix simple typo in comment
  3:  48f37e5b114 =  4:  eeb38a25f3a dir: consolidate treat_path() and treat_one_path()
  4:  b5ad1939379 =  5:  6e29f1f6aec dir: fix broken comment
  5:  2603c1a9d13 =  6:  62dae938c8f dir: fix confusion based on variable tense
  6:  576f364329d =  7:  25921cb792e dir: refactor treat_directory to clarify control flow
  7:  e20525429e5 !  8:  b2caa426790 dir: replace exponential algorithm with a linear one
     @@ -187,8 +187,23 @@
       
      -	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
      +	/*
     -+	 * We only need to recurse into untracked/ignored directories if
     -+	 * either of the following bits is set:
     ++	 * If we have a pathspec which could match something _below_ this
     ++	 * directory (e.g. when checking 'subdir/' having a pathspec like
     ++	 * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
     ++	 * need to recurse.
     ++	 */
     ++	if (pathspec) {
     ++		int ret = do_match_pathspec(istate, pathspec, dirname, len,
     ++					    0 /* prefix */, NULL /* seen */,
     ++					    DO_MATCH_LEADING_PATHSPEC);
     ++		if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
     ++			return path_recurse;
     ++	}
     ++
     ++	/*
     ++	 * Other than the path_recurse case immediately above, we only need
     ++	 * to recurse into untracked/ignored directories if either of the
     ++	 * following bits is set:
      +	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
      +	 *                           there are ignored directories below)
      +	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
     @@ -197,6 +212,16 @@
      +	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
       		return excluded ? path_excluded : path_untracked;
       
     ++	/*
     ++	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
     ++	 * recursing into ignored directories if the path is excluded and
     ++	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
     ++	 */
     ++	if (excluded &&
     ++	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
     ++	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
     ++		return path_excluded;
     ++
      +	/*
      +	 * If we have we don't want to know the all the paths under an
      +	 * untracked or ignored directory, we still need to go into the
     @@ -241,59 +266,52 @@
      +
      +	/* There are a variety of reasons we may need to fixup the state... */
      +	if (state == path_excluded) {
     -+		int i;
     -+
     -+		/*
     -+		 * When stop_early is set, read_directory_recursive() will
     -+		 * never return path_untracked regardless of whether
     -+		 * underlying paths were untracked or ignored (because
     -+		 * returning early means it excluded some paths, or
     -+		 * something like that -- see commit 5aaa7fd39aaf ("Improve
     -+		 * performance of git status --ignored", 2017-09-18)).
     -+		 * However, we're not really concerned with the status of
     -+		 * files under the directory, we just wanted to know
     -+		 * whether the directory was empty (state == path_none) or
     -+		 * not (state == path_excluded), and if not, we'd return
     -+		 * our original status based on whether the untracked
     -+		 * directory matched an exclusion pattern.
     ++		/* state == path_excluded implies all paths under
     ++		 * dirname were ignored...
     ++		 *
     ++		 * if running e.g. `git status --porcelain --ignored=matching`,
     ++		 * then we want to see the subpaths that are ignored.
     ++		 *
     ++		 * if running e.g. just `git status --porcelain`, then
     ++		 * we just want the directory itself to be listed as ignored
     ++		 * and not the individual paths underneath.
      +		 */
     -+		if (stop_early)
     -+			state = excluded ? path_excluded : path_untracked;
     ++		int want_ignored_subpaths =
     ++			((dir->flags & DIR_SHOW_IGNORED_TOO) &&
     ++			 (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING));
      +
     -+		else {
     ++		if (want_ignored_subpaths) {
      +			/*
     -+			 * When
     -+			 *     !stop_early && state == path_excluded
     -+			 * then all paths under dirname were ignored.  For
     -+			 * this case, git status --porcelain wants to just
     -+			 * list the directory itself as ignored and not
     -+			 * list the individual paths underneath.  Remove
     -+			 * the individual paths underneath.
     ++			 * with --ignored=matching, we want the subpaths
     ++			 * INSTEAD of the directory itself.
      +			 */
     ++			state = path_none;
     ++		} else {
     ++			int i;
      +			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
     -+				free(dir->ignored[i]);
     ++				FREE_AND_NULL(dir->ignored[i]);
      +			dir->ignored_nr = old_ignored_nr;
      +		}
      +	}
     -+
     -+	/*
     + 
     + 	/*
     +-	 * If this is an excluded directory, then we only need to check if
     +-	 * the directory contains any files.
      +	 * If there is nothing under the current directory and we are not
      +	 * hiding empty directories, then we need to report on the
      +	 * untracked or ignored status of the directory itself.
     -+	 */
     + 	 */
     +-	return read_directory_recursive(dir, istate, dirname, len,
     +-					untracked, 1, excluded, pathspec);
      +	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
      +		state = excluded ? path_excluded : path_untracked;
     - 
     - 	/*
     --	 * If this is an excluded directory, then we only need to check if
     --	 * the directory contains any files.
     ++
     ++	/*
      +	 * We can recurse into untracked directories that don't match any
      +	 * of the given pathspecs when some file underneath the directory
      +	 * might match one of the pathspecs.  If so, we should make sure
      +	 * to note that the directory itself did not match.
     - 	 */
     --	return read_directory_recursive(dir, istate, dirname, len,
     --					untracked, 1, excluded, pathspec);
     ++	 */
      +	if (pathspec &&
      +	    !match_pathspec(istate, pathspec, dirname, len,
      +			    0 /* prefix */, NULL,
     @@ -316,6 +334,47 @@
       	strbuf_setlen(path, baselen);
       	if (!cdir->ucd) {
       		strbuf_addstr(path, cdir->file);
     +@@
     + 				      const struct pathspec *pathspec)
     + {
     + 	int has_path_in_index, dtype, excluded;
     +-	enum path_treatment path_treatment;
     + 
     + 	if (!cdir->d_name)
     + 		return treat_path_fast(dir, untracked, cdir, istate, path,
     +@@
     + 	default:
     + 		return path_none;
     + 	case DT_DIR:
     +-		strbuf_addch(path, '/');
     +-		path_treatment = treat_directory(dir, istate, untracked,
     +-						 path->buf, path->len,
     +-						 baselen, excluded, pathspec);
     + 		/*
     +-		 * If 1) we only want to return directories that
     +-		 * match an exclude pattern and 2) this directory does
     +-		 * not match an exclude pattern but all of its
     +-		 * contents are excluded, then indicate that we should
     +-		 * recurse into this directory (instead of marking the
     +-		 * directory itself as an ignored path).
     ++		 * WARNING: Do not ignore/amend the return value from
     ++		 * treat_directory(), and especially do not change it to return
     ++		 * path_recurse as that can cause exponential slowdown.
     ++		 * Instead, modify treat_directory() to return the right value.
     + 		 */
     +-		if (!excluded &&
     +-		    path_treatment == path_excluded &&
     +-		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
     +-		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
     +-			return path_recurse;
     +-		return path_treatment;
     ++		strbuf_addch(path, '/');
     ++		return treat_directory(dir, istate, untracked,
     ++				       path->buf, path->len,
     ++				       baselen, excluded, pathspec);
     + 	case DT_REG:
     + 	case DT_LNK:
     + 		return excluded ? path_excluded : path_untracked;
      @@
       	int stop_at_first_file, const struct pathspec *pathspec)
       {
  -:  ----------- >  9:  08a10869816 dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
  -:  ----------- > 10:  cee74871e43 dir: replace double pathspec matching with single in treat_directory()
  -:  ----------- > 11:  61d9c9d758e Fix error-prone fill_directory() API; make it only return matches
  -:  ----------- > 12:  725adf0a9b8 completion: fix 'git add' on paths under an untracked directory

-- 
gitgitgadget

  parent reply	other threads:[~2020-04-01  4:17 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-30 15:20   ` Derrick Stolee
2020-01-31 18:04   ` SZEDER Gábor
2020-01-31 18:17     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
2020-01-30 15:33   ` Derrick Stolee
2020-01-30 15:45     ` Elijah Newren
2020-01-30 16:00       ` Derrick Stolee
2020-01-30 16:10         ` Derrick Stolee
2020-01-30 16:20           ` Elijah Newren
2020-01-30 18:17             ` Derrick Stolee
2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-30 15:55   ` Derrick Stolee
2020-01-30 17:13     ` Elijah Newren
2020-01-30 17:45       ` Elijah Newren
2020-01-31 17:13   ` SZEDER Gábor
2020-01-31 17:47     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
2020-03-26 13:02       ` Derrick Stolee
2020-03-26 21:18         ` Elijah Newren
2020-03-25 19:31     ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
2020-03-26 13:13       ` Derrick Stolee
2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-03-27 13:09         ` Derrick Stolee
2020-03-29 18:18           ` Junio C Hamano
2020-03-31 20:15             ` Elijah Newren
2020-03-26 21:27       ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
2020-03-28 17:33         ` Elijah Newren
2020-03-29 18:20           ` Junio C Hamano
2020-04-01  4:17       ` Elijah Newren via GitGitGadget [this message]
2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-04-01 13:57           ` Derrick Stolee
2020-04-01 15:59             ` Elijah Newren
2020-04-01  4:17         ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget
2020-07-19  6:33           ` Andreas Schwab
2020-07-19 12:39             ` Martin Ågren
2020-07-20 15:25               ` Elijah Newren
2020-07-20 18:45                 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren
2020-07-20 18:49                   ` Elijah Newren
2020-07-20 18:51                     ` Martin Ågren
2020-07-20 20:25                   ` Junio C Hamano
2020-07-20 18:58                 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Junio C Hamano
2020-04-01  4:17         ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.700.v5.git.git.1585714667.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=martin.melka@gmail.com \
    --cc=newren@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=stolee@gmail.com \
    --cc=sxlijin@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).