Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive()
@ 2020-01-29 22:03 Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

This patch series builds on en/fill-directory-fixes-more. This series should
be considered an RFC because of the untracked-cache changes (see the last
two commits), for which I'm hoping to get an untracked-cache expert to
comment. This series does provide some modest speedups (see second to last
commit message), and should allow 'git status --ignored' to complete in a
more reasonable timeframe for Martin Melka (see 
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
)

Elijah Newren (6):
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: move setting of nested_repo next to its actual usage
  dir: replace exponential algorithm with a linear one
  t7063: blindly accept diffs

 dir.c                             | 295 +++++++++++++++++-------------
 t/t7063-status-untracked-cache.sh |  50 ++---
 2 files changed, 191 insertions(+), 154 deletions(-)


base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v1
Pull-Request: https://github.com/git/git/pull/700
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/6] dir: consolidate treat_path() and treat_one_path()
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 16e2cfa90993 ("read_directory(): further split treat_path()",
2010-01-08) split treat_one_path() out of treat_path(), because
treat_leading_path() would not have access to a dirent but wanted to
re-use as much of treat_path() as possible.  Not re-using all of
treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir:
fix checks on common prefix directory", 2019-12-19).  Finally, in commit
ad6f2157f951 ("dir: restructure in a way to avoid passing around a
struct dirent", 2020-01-16), dirents were removed from treat_path() and
other functions entirely.

Since the only reason for splitting these functions was the lack of a
dirent -- which no longer applies to either function -- and since the
split caused problems in the past resulting in us not using
treat_one_path() separately anymore, just undo the split.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 121 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 55 insertions(+), 66 deletions(-)

diff --git a/dir.c b/dir.c
index b460211e61..68c56aeddb 100644
--- a/dir.c
+++ b/dir.c
@@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate,
 	return dtype;
 }
 
-static enum path_treatment treat_one_path(struct dir_struct *dir,
-					  struct untracked_cache_dir *untracked,
-					  struct index_state *istate,
-					  struct strbuf *path,
-					  int baselen,
-					  const struct pathspec *pathspec,
-					  int dtype)
-{
-	int exclude;
-	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
+static enum path_treatment treat_path_fast(struct dir_struct *dir,
+					   struct untracked_cache_dir *untracked,
+					   struct cached_dir *cdir,
+					   struct index_state *istate,
+					   struct strbuf *path,
+					   int baselen,
+					   const struct pathspec *pathspec)
+{
+	strbuf_setlen(path, baselen);
+	if (!cdir->ucd) {
+		strbuf_addstr(path, cdir->file);
+		return path_untracked;
+	}
+	strbuf_addstr(path, cdir->ucd->name);
+	/* treat_one_path() does this before it calls treat_directory() */
+	strbuf_complete(path, '/');
+	if (cdir->ucd->check_only)
+		/*
+		 * check_only is set as a result of treat_directory() getting
+		 * to its bottom. Verify again the same set of directories
+		 * with check_only set.
+		 */
+		return read_directory_recursive(dir, istate, path->buf, path->len,
+						cdir->ucd, 1, 0, pathspec);
+	/*
+	 * We get path_recurse in the first run when
+	 * directory_exists_in_index() returns index_nonexistent. We
+	 * are sure that new changes in the index does not impact the
+	 * outcome. Return now.
+	 */
+	return path_recurse;
+}
+
+static enum path_treatment treat_path(struct dir_struct *dir,
+				      struct untracked_cache_dir *untracked,
+				      struct cached_dir *cdir,
+				      struct index_state *istate,
+				      struct strbuf *path,
+				      int baselen,
+				      const struct pathspec *pathspec)
+{
+	int has_path_in_index, dtype, exclude;
 	enum path_treatment path_treatment;
 
-	dtype = resolve_dtype(dtype, istate, path->buf, path->len);
+	if (!cdir->d_name)
+		return treat_path_fast(dir, untracked, cdir, istate, path,
+				       baselen, pathspec);
+	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
+		return path_none;
+	strbuf_setlen(path, baselen);
+	strbuf_addstr(path, cdir->d_name);
+	if (simplify_away(path->buf, path->len, pathspec))
+		return path_none;
+
+	dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len);
 
 	/* Always exclude indexed files */
+	has_path_in_index = !!index_file_exists(istate, path->buf, path->len,
+						ignore_case);
 	if (dtype != DT_DIR && has_path_in_index)
 		return path_none;
 
@@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	}
 }
 
-static enum path_treatment treat_path_fast(struct dir_struct *dir,
-					   struct untracked_cache_dir *untracked,
-					   struct cached_dir *cdir,
-					   struct index_state *istate,
-					   struct strbuf *path,
-					   int baselen,
-					   const struct pathspec *pathspec)
-{
-	strbuf_setlen(path, baselen);
-	if (!cdir->ucd) {
-		strbuf_addstr(path, cdir->file);
-		return path_untracked;
-	}
-	strbuf_addstr(path, cdir->ucd->name);
-	/* treat_one_path() does this before it calls treat_directory() */
-	strbuf_complete(path, '/');
-	if (cdir->ucd->check_only)
-		/*
-		 * check_only is set as a result of treat_directory() getting
-		 * to its bottom. Verify again the same set of directories
-		 * with check_only set.
-		 */
-		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, 0, pathspec);
-	/*
-	 * We get path_recurse in the first run when
-	 * directory_exists_in_index() returns index_nonexistent. We
-	 * are sure that new changes in the index does not impact the
-	 * outcome. Return now.
-	 */
-	return path_recurse;
-}
-
-static enum path_treatment treat_path(struct dir_struct *dir,
-				      struct untracked_cache_dir *untracked,
-				      struct cached_dir *cdir,
-				      struct index_state *istate,
-				      struct strbuf *path,
-				      int baselen,
-				      const struct pathspec *pathspec)
-{
-	if (!cdir->d_name)
-		return treat_path_fast(dir, untracked, cdir, istate, path,
-				       baselen, pathspec);
-	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
-		return path_none;
-	strbuf_setlen(path, baselen);
-	strbuf_addstr(path, cdir->d_name);
-	if (simplify_away(path->buf, path->len, pathspec))
-		return path_none;
-
-	return treat_one_path(dir, untracked, istate, path, baselen, pathspec,
-			      cdir->d_type);
-}
-
 static void add_untracked(struct untracked_cache_dir *dir, const char *name)
 {
 	if (!dir)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 2/6] dir: fix broken comment
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 68c56aeddb..c358158f55 100644
--- a/dir.c
+++ b/dir.c
@@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 					add_untracked(untracked, path.buf + baselen);
 				break;
 			}
-			/* skip the dir_add_* part */
+			/* skip the add_path_to_appropriate_result_list() */
 			continue;
 		}
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-30 15:20   ` Derrick Stolee
  2020-01-31 18:04   ` SZEDER Gábor
  2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Despite having contributed several fixes in this area, I have for months
(years?) assumed that the "exclude" variable was a directive; this
caused me to think of it as a different mode we operate in and left me
confused as I tried to build up a mental model around why we'd need such
a directive.  I mostly tried to ignore it while focusing on the pieces I
was trying to understand.

Then I finally traced this variable all back to a call to is_excluded(),
meaning it was actually functioning as an adjective.  In particular, it
was a checked property ("Does this path match a rule in .gitignore?"),
rather than a mode passed in from the caller.  Change the variable name
to match the part of speech used by the function called to define it,
which will hopefully make these bits of code slightly clearer to the
next reader.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index c358158f55..225f0bc082 100644
--- a/dir.c
+++ b/dir.c
@@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
 static enum path_treatment treat_directory(struct dir_struct *dir,
 	struct index_state *istate,
 	struct untracked_cache_dir *untracked,
-	const char *dirname, int len, int baselen, int exclude,
+	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
@@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		}
 		if (nested_repo)
 			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(exclude ? path_excluded : path_untracked));
+				(excluded ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
-		if (exclude &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
+		if (excluded &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
 
 			/*
 			 * This is an excluded directory and we are
@@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* This is the "show_other_directories" case */
 
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
@@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * the directory contains any files.
 	 */
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, exclude, pathspec);
+					untracked, 1, excluded, pathspec);
 }
 
 /*
@@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      int baselen,
 				      const struct pathspec *pathspec)
 {
-	int has_path_in_index, dtype, exclude;
+	int has_path_in_index, dtype, excluded;
 	enum path_treatment path_treatment;
 
 	if (!cdir->d_name)
@@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	    (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent))
 		return path_none;
 
-	exclude = is_excluded(dir, istate, path->buf, &dtype);
+	excluded = is_excluded(dir, istate, path->buf, &dtype);
 
 	/*
 	 * Excluded? If we don't explicitly want to show
 	 * ignored files, ignore it
 	 */
-	if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
+	if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
 		return path_excluded;
 
 	switch (dtype) {
@@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		strbuf_addch(path, '/');
 		path_treatment = treat_directory(dir, istate, untracked,
 						 path->buf, path->len,
-						 baselen, exclude, pathspec);
+						 baselen, excluded, pathspec);
 		/*
 		 * If 1) we only want to return directories that
 		 * match an exclude pattern and 2) this directory does
@@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		 * recurse into this directory (instead of marking the
 		 * directory itself as an ignored path).
 		 */
-		if (!exclude &&
+		if (!excluded &&
 		    path_treatment == path_excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
@@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		return path_treatment;
 	case DT_REG:
 	case DT_LNK:
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-30 15:33   ` Derrick Stolee
  2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 225f0bc082..ef3307718a 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	int nested_repo;
 
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
@@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
+		nested_repo = 0;
 		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-30 15:55   ` Derrick Stolee
  2020-01-31 17:13   ` SZEDER Gábor
  2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  6 siblings, 2 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in order
to walk the directory tree.  Treating of directories is sometimes weird
because there are so many different permutations about how to handle
directories.  Some examples:

   * 'git ls-files -o --directory' only needs to know that a directory
     itself is untracked; it doesn't need to recurse into it to see what
     is underneath.

   * 'git status' needs to recurse into an untracked directory, but only
     to determine whether or not it is empty.  If there are no files
     underneath, the directory itself will be omitted from the output.
     If it is not empty, only the directory will be listed.

   * 'git status --ignored' needs to recurse into untracked directories
     and report all the ignored entries and then report the directory as
     untracked -- UNLESS all the entries under the directory are
     ignored, in which case we don't print any of the entries under the
     directory and just report the directory itself as ignored.

   * For 'git clean', we may need to recurse into a directory that
     doesn't match any specified pathspecs, if it's possible that there
     is an entry underneath the directory that can match one of the
     pathspecs.  In such a case, we need to be careful to omit the
     directory itself from the list of paths (see e.g. commit
     404ebceda01c ("dir: also check directories for matching pathspecs",
     2019-09-17))

Part of the tension noted above is that the treatment of a directory can
changed based on the files within it, and based on the various settings
in dir->flags.  Trying to keep this in mind while reading over the code,
it is easy to (accidentally?) think in terms of "treat_directory() tells
us what to do with a directory, and read_directory_recursive() is the
thing that recurses".  Since we need to look into a directory to know
how to treat it, though, it was quite easy to decide to recurse into the
directory from treat_directory() by adding a read_directory_recursive()
call.  Adding such a call is actually fine, IF we didn't also cause
read_directory_recursive() to recurse into the same directory again.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory.  So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.  While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for a some directory and a later one would return e.g.
path_none, and the code relied on the side-effect of the first adding
untracked entries to dir->entries in order to get the correct output
despite the supposed override in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation.  I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that probably
deserved better comments to begin with).  However, much of my work felt
more like a game of whackamole while attempting to make the code match
the existing regression tests than an attempt to create an
implementation that matched some clear design.  That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient.  (I'll note that this turmoil makes working
with dir.c extremely unpleasant for me; I keep hoping it'll get better,
but it never seems to.)

However, on the positive side, it does make the code much faster.  For
the following simple shell loop in an empty repository:

  for depth in $(seq 10 25)
  do
    dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
    rm -rf dir
    mkdir -p $dirs
    >$dirs/untracked-file
    /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
  done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

After this fix, those drop to:

    10: 0.00
    11: 0.00
    12: 0.00
    13: 0.00
    14: 0.00
    15: 0.00
    16: 0.00
    17: 0.00
    18: 0.00
    19: 0.00
    20: 0.00
    21: 0.00
    22: 0.00
    23: 0.00
    24: 0.00
    25: 0.00

In fact, it isn't until a depth of 190 nested directories that it
sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested directories.
The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case.  It's not often
that you get to speed something up by a factor of 3*10^69.

WARNING: This change breaks t7063.  I don't know whether that is to be expected
(I now intentionally visit untracked directories differently so naturally the
untracked cache should change), or if I've broken something.  I'm hoping to get
an untracked cache expert to chime in...

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 105 insertions(+), 46 deletions(-)

diff --git a/dir.c b/dir.c
index ef3307718a..aaf038a9c4 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
+	enum path_treatment state;
+	int nested_repo, old_ignored_nr, stop_early;
 
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
@@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/* This is the "show_other_directories" case */
 
-	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+	/*
+	 * We only need to recurse into untracked/ignored directories if
+	 * either of the following bits is set:
+	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
+	 *                           there are ignored directories below)
+	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
+	 *                                 the directory is empty)
+	 */
+	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
 		return excluded ? path_excluded : path_untracked;
 
+	/*
+	 * If we only want to determine if dirname is empty, then we can
+	 * stop at the first file we find underneath that directory rather
+	 * than continuing to recurse beyond it.  If DIR_SHOW_IGNORED_TOO
+	 * is set, then we want MORE than just determining if dirname is
+	 * empty.
+	 */
+	stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
+		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
+
+	/*
+	 * If /every/ file within an untracked directory is ignored, then
+	 * we want to treat the directory as ignored (for e.g. status
+	 * --porcelain), without listing the individual ignored files
+	 * underneath.  To do so, we'll save the current ignored_nr, and
+	 * pop all the ones added after it if it turns out the entire
+	 * directory is ignored.
+	 */
+	old_ignored_nr = dir->ignored_nr;
+
+	/* Actually recurse into dirname now, we'll fixup the state later. */
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
+	state = read_directory_recursive(dir, istate, dirname, len, untracked,
+					 stop_early, stop_early, pathspec);
+
+	/* There are a variety of reasons we may need to fixup the state... */
+	if (state == path_excluded) {
+		int i;
+
+		/*
+		 * When stop_early is set, read_directory_recursive() will
+		 * never return path_untracked regardless of whether
+		 * underlying paths were untracked or ignored (because
+		 * returning early means it excluded some paths, or
+		 * something like that -- see commit 5aaa7fd39aaf ("Improve
+		 * performance of git status --ignored", 2017-09-18)).
+		 * However, we're not really concerned with the status of
+		 * files under the directory, we just wanted to know
+		 * whether the directory was empty (state == path_none) or
+		 * not (state == path_excluded), and if not, we'd return
+		 * our original status based on whether the untracked
+		 * directory matched an exclusion pattern.
+		 */
+		if (stop_early)
+			state = excluded ? path_excluded : path_untracked;
+
+		else {
+			/*
+			 * When
+			 *     !stop_early && state == path_excluded
+			 * then all paths under dirname were ignored.  For
+			 * this case, git status --porcelain wants to just
+			 * list the directory itself as ignored and not
+			 * list the individual paths underneath.  Remove
+			 * the individual paths underneath.
+			 */
+			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
+				free(dir->ignored[i]);
+			dir->ignored_nr = old_ignored_nr;
+		}
+	}
 
 	/*
-	 * If this is an excluded directory, then we only need to check if
-	 * the directory contains any files.
+	 * If there is nothing under the current directory and we are not
+	 * hiding empty directories, then we need to report on the
+	 * untracked or ignored status of the directory itself.
 	 */
-	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, excluded, pathspec);
+	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+		state = excluded ? path_excluded : path_untracked;
+
+	/*
+	 * We can recurse into untracked directories that don't match any
+	 * of the given pathspecs when some file underneath the directory
+	 * might match one of the pathspecs.  If so, we should make sure
+	 * to note that the directory itself did not match.
+	 */
+	if (pathspec &&
+	    !match_pathspec(istate, pathspec, dirname, len,
+			    0 /* prefix */, NULL,
+			    0 /* do NOT special case dirs */))
+		state = path_none;
+
+	return state;
 }
 
 /*
@@ -1872,6 +1961,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 					   int baselen,
 					   const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
 	strbuf_setlen(path, baselen);
 	if (!cdir->ucd) {
 		strbuf_addstr(path, cdir->file);
@@ -2177,14 +2271,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in treat_leading_path().  See the commit message for the
-	 * commit adding this warning as well as the commit preceding it
-	 * for details.
+	 * WARNING: Do NOT call recurse unless path_recurse is returned
+	 *          from treat_path().  Recursing on any other return value
+	 *          results in exponential slowdown.
 	 */
-
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2206,13 +2296,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			dir_state = state;
 
 		/* recurse into subdir if instructed by treat_path */
-		if ((state == path_recurse) ||
-			((state == path_untracked) &&
-			 (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) &&
-			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  (pathspec &&
-			   do_match_pathspec(istate, pathspec, path.buf, path.len,
-					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
+		if (state == path_recurse) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -2296,15 +2380,6 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
-	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in read_directory_recursive().  See 777b420347 (dir:
-	 * synchronize treat_leading_path() and read_directory_recursive(),
-	 * 2019-12-19) and its parent commit for details.
-	 */
-
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf subdir = STRBUF_INIT;
 	int prevlen, baselen;
@@ -2355,23 +2430,7 @@ static int treat_leading_path(struct dir_struct *dir,
 		strbuf_reset(&subdir);
 		strbuf_add(&subdir, path+prevlen, baselen-prevlen);
 		cdir.d_name = subdir.buf;
-		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
-				    pathspec);
-		if (state == path_untracked &&
-		    resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
-		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
-				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
-			if (!match_pathspec(istate, pathspec, sb.buf, sb.len,
-					    0 /* prefix */, NULL,
-					    0 /* do NOT special case dirs */))
-				state = path_none;
-			add_path_to_appropriate_result_list(dir, NULL, &cdir,
-							    istate,
-							    &sb, baselen,
-							    pathspec, state);
-			state = path_recurse;
-		}
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
 
 		if (state != path_recurse)
 			break; /* do not recurse into it */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 6/6] t7063: blindly accept diffs
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Assuming that the changes I made in the last commit to drastically
modify how and when and especially how frequently untracked paths are
visited should result in changes to the untracked-cache, this commit
simply updates the t7063 testcases to match what the code now reports.

If this is correct, this commit should be squashed into the previous
one.

It'd be nice if I could get an untracked-cache expert to comment on
this...

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 50 ++++++++++++-------------------
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 190ae149cf..c1b0fd0540 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -85,9 +85,7 @@ dtwo/
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 
 test_expect_success 'status first time (empty cache)' '
@@ -140,8 +138,6 @@ test_expect_success 'modify in root directory, one dir invalidation' '
 A  done/one
 A  one
 A  two
-?? dthree/
-?? dtwo/
 ?? four
 ?? three
 EOF
@@ -164,15 +160,11 @@ core.excludesfile 0000000000000000000000000000000000000000
 exclude_per_dir .gitignore
 flags 00000006
 / 0000000000000000000000000000000000000000 recurse valid
-dthree/
-dtwo/
 four
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -217,9 +209,7 @@ dtwo/
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -235,6 +225,7 @@ A  done/one
 A  one
 A  two
 ?? .gitignore
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -256,11 +247,11 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -277,7 +268,6 @@ flags 00000006
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -290,7 +280,6 @@ test_expect_success 'status after the move' '
 A  done/one
 A  one
 ?? .gitignore
-?? dtwo/
 ?? two
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -312,12 +301,10 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 two
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -334,7 +321,6 @@ flags 00000006
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -348,7 +334,6 @@ A  done/one
 A  one
 A  two
 ?? .gitignore
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
@@ -369,11 +354,9 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -392,7 +375,6 @@ test_expect_success 'status after commit' '
 	git status --porcelain >../actual &&
 	cat >../status.expect <<EOF &&
 ?? .gitignore
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
@@ -413,11 +395,9 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -451,7 +431,6 @@ test_expect_success 'test sparse status with untracked cache' '
  M done/two
 ?? .gitignore
 ?? done/five
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
@@ -472,12 +451,10 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -491,7 +468,6 @@ test_expect_success 'test sparse status again with untracked cache' '
  M done/two
 ?? .gitignore
 ?? done/five
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
@@ -519,7 +495,6 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 ?? .gitignore
 ?? done/five
 ?? done/sub/
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
@@ -540,17 +515,13 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
 sub/
 /done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
-sub/
 /done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
-file
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect-from-test-dump ../actual
 '
@@ -615,6 +586,23 @@ test_expect_success 'setting core.untrackedCache to true and using git status cr
 	test_cmp ../expect-no-uc ../actual &&
 	git status &&
 	test-tool dump-untracked-cache >../actual &&
+	cat >../expect-from-test-dump <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
+.gitignore
+dthree/
+dtwo/
+/done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
+five
+sub/
+/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
+/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+EOF
 	test_cmp ../expect-from-test-dump ../actual
 '
 
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-01-30 15:20   ` Derrick Stolee
  2020-01-31 18:04   ` SZEDER Gábor
  1 sibling, 0 replies; 68+ messages in thread
From: Derrick Stolee @ 2020-01-30 15:20 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Despite having contributed several fixes in this area, I have for months
> (years?) assumed that the "exclude" variable was a directive; this
> caused me to think of it as a different mode we operate in and left me
> confused as I tried to build up a mental model around why we'd need such
> a directive.  I mostly tried to ignore it while focusing on the pieces I
> was trying to understand.
> 
> Then I finally traced this variable all back to a call to is_excluded(),
> meaning it was actually functioning as an adjective.  In particular, it
> was a checked property ("Does this path match a rule in .gitignore?"),
> rather than a mode passed in from the caller.  Change the variable name
> to match the part of speech used by the function called to define it,
> which will hopefully make these bits of code slightly clearer to the
> next reader.

I agree that some of the terminology in the .gitignore is confusing,
especially when the terminology was used in the opposite sense for
the sparse-checkout feature. I think this rename is worth the noise.

For reference, here are some commits from ds/include-exclude that
performed similar refactors:

468ce99b77 unpack-trees: rename 'is_excluded_from_list()'
65edd96aec treewide: rename 'exclude' methods to 'pattern'
4ff89ee52c treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
caa3d55444 treewide: rename 'struct exclude_list' to 'struct pattern_list'
ab8db61390 treewide: rename 'struct exclude' to 'struct path_pattern'

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
@ 2020-01-30 15:33   ` Derrick Stolee
  2020-01-30 15:45     ` Elijah Newren
  0 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-01-30 15:33 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 225f0bc082..ef3307718a 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  	const char *dirname, int len, int baselen, int excluded,
>  	const struct pathspec *pathspec)
>  {
> -	int nested_repo = 0;
> +	int nested_repo;
>  
>  	/* The "len-1" is to strip the final '/' */
>  	switch (directory_exists_in_index(istate, dirname, len-1)) {
> @@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  		return path_none;
>  
>  	case index_nonexistent:
> +		nested_repo = 0;

I had to look at this code in-full from en/fill-directory-fixes-more to
be sure that this case was the only use of nested_repo. However, I found
that this switch statement is unnecessarily complicated. By converting
the switch to multiple "if" statements, I noticed that the third case
actually has a "break" statement that can lead to the final "fourth case"
outside the switch statement.

Hopefully the patch below is a worthy replacement for this one:

-->8--

From b5c04e6e028cb6c7f9e78fbdd2182383d928fe6d Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Thu, 30 Jan 2020 15:28:39 +0000
Subject: [PATCH] dir: refactor treat_directory to clarify variable scope

The nested_repo variable in treat_directory() is created and
initialized before a multi-case switch statement, but is only
used by one case. In fact, this switch is very asymmetrical,
as the first two cases are simple but the third is more
complicated than the rest of the method.

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while highlighting the fact
that a "break" statement in a condition of the third case
actually leads to jumping to the fourth case (after the switch).
This assists a reader who provides an initial scan to notice
there is a second way to approach the "show_other_directories"
case than simply the response from directory_exists_in_index().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/dir.c b/dir.c
index b460211e61..e48812efe6 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,17 +1659,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int exclude,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
-
 	/* The "len-1" is to strip the final '/' */
-	switch (directory_exists_in_index(istate, dirname, len-1)) {
-	case index_directory:
-		return path_recurse;
+	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
-	case index_gitdir:
+	if (status == index_directory)
+		return path_recurse;
+	if (status == index_gitdir)
 		return path_none;
 
-	case index_nonexistent:
+	if (status == index_nonexistent) {
+		int nested_repo = 0;
 		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
@@ -1682,7 +1681,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 				(exclude ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
-			break;
+			goto show_other_directories;
 		if (exclude &&
 			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
 			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
@@ -1711,7 +1710,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
-
+show_other_directories:
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
 		return exclude ? path_excluded : path_untracked;
 
-- 
2.25.0.vfs.1.1



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 15:33   ` Derrick Stolee
@ 2020-01-30 15:45     ` Elijah Newren
  2020-01-30 16:00       ` Derrick Stolee
  0 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren @ 2020-01-30 15:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On Thu, Jan 30, 2020 at 7:33 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  dir.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/dir.c b/dir.c
> > index 225f0bc082..ef3307718a 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >       const char *dirname, int len, int baselen, int excluded,
> >       const struct pathspec *pathspec)
> >  {
> > -     int nested_repo = 0;
> > +     int nested_repo;
> >
> >       /* The "len-1" is to strip the final '/' */
> >       switch (directory_exists_in_index(istate, dirname, len-1)) {
> > @@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >               return path_none;
> >
> >       case index_nonexistent:
> > +             nested_repo = 0;
>
> I had to look at this code in-full from en/fill-directory-fixes-more to
> be sure that this case was the only use of nested_repo. However, I found
> that this switch statement is unnecessarily complicated. By converting
> the switch to multiple "if" statements, I noticed that the third case
> actually has a "break" statement that can lead to the final "fourth case"
> outside the switch statement.
>
> Hopefully the patch below is a worthy replacement for this one:
>
> -->8--
>
> From b5c04e6e028cb6c7f9e78fbdd2182383d928fe6d Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Thu, 30 Jan 2020 15:28:39 +0000
> Subject: [PATCH] dir: refactor treat_directory to clarify variable scope
>
> The nested_repo variable in treat_directory() is created and
> initialized before a multi-case switch statement, but is only
> used by one case. In fact, this switch is very asymmetrical,
> as the first two cases are simple but the third is more
> complicated than the rest of the method.
>
> Extract the switch statement into a series of "if" statements.
> This simplifies the trivial cases, while highlighting the fact
> that a "break" statement in a condition of the third case
> actually leads to jumping to the fourth case (after the switch).
> This assists a reader who provides an initial scan to notice
> there is a second way to approach the "show_other_directories"
> case than simply the response from directory_exists_in_index().

Wait, I'm lost.  Wasn't that break statement the only way to get to
the "show_other_directories" block of code after the switch statement?
 I can't see where the second way is; am I missing something?

That is, unless directory_exists_in_index() suddenly starts returning
some value other than the three current possibilities.  Perhaps we
should throw a BUG() if we get anything other than index_directory,
index_gitdir, or index_nonexistent.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index b460211e61..e48812efe6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1659,17 +1659,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>         const char *dirname, int len, int baselen, int exclude,
>         const struct pathspec *pathspec)
>  {
> -       int nested_repo = 0;
> -
>         /* The "len-1" is to strip the final '/' */
> -       switch (directory_exists_in_index(istate, dirname, len-1)) {
> -       case index_directory:
> -               return path_recurse;
> +       enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>
> -       case index_gitdir:
> +       if (status == index_directory)
> +               return path_recurse;
> +       if (status == index_gitdir)
>                 return path_none;
>
> -       case index_nonexistent:
> +       if (status == index_nonexistent) {
> +               int nested_repo = 0;
>                 if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
>                     !(dir->flags & DIR_NO_GITLINKS)) {
>                         struct strbuf sb = STRBUF_INIT;
> @@ -1682,7 +1681,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>                                 (exclude ? path_excluded : path_untracked));
>
>                 if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
> -                       break;
> +                       goto show_other_directories;
>                 if (exclude &&
>                         (dir->flags & DIR_SHOW_IGNORED_TOO) &&
>                         (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
> @@ -1711,7 +1710,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>         }

I'd say we'd want to add a BUG("Unhandled value for
directory_exists_in_index: %d\n", status); right here.

>
>         /* This is the "show_other_directories" case */
> -
> +show_other_directories:
>         if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
>                 return exclude ? path_excluded : path_untracked;
>
> --
> 2.25.0.vfs.1.1

Otherwise, the patch looks good to me and I'll be happy to replace my
patch with this one.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-01-30 15:55   ` Derrick Stolee
  2020-01-30 17:13     ` Elijah Newren
  2020-01-31 17:13   ` SZEDER Gábor
  1 sibling, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-01-30 15:55 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren,
	Kevin.Willford

I am very enticed by the subject!

On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote:
> Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
> for ignored files", 2017-05-18), added exactly such a case to the code,

I was disappointed that the commit you mention did not add a test for
the new behavior, but then found a test change in the following commit
fb89888849 (dir: hide untracked contents of untracked dirs, 2017-05-18).
This makes me feel better that your changes are less likely to un-do
the intention of df5bcdf83aeb.

> meaning we'd have two calls to read_directory_recursive() for an
> untracked directory.  So, if we had a file named
>    one/two/three/four/five/somefile.txt
> and nothing in one/ was tracked, then 'git status --ignored' would
> call read_directory_recursive() twice on the directory 'one/', and
> each of those would call read_directory_recursive() twice on the
> directory 'one/two/', and so on until read_directory_recursive() was
> called 2^5 times for 'one/two/three/four/five/'.

Wow! Good find. "Accidentally exponential" is a lot worse than
"accidentally quadratic". At least the N here _usually_ does not
grow too quickly, but the constant here (lstat-ing directories and
files) is significant enough that 2^3 or 2^4 is enough to notice
the difference.

> Avoid calling read_directory_recursive() twice per level by moving a
> lot of the special logic into treat_directory().
> 
> Since dir.c is somewhat complex, extra cruft built up around this over
> time.  While trying to unravel it, I noticed several instances where the
> first call to read_directory_recursive() would return e.g.
> path_untracked for a some directory and a later one would return e.g.
> path_none, and the code relied on the side-effect of the first adding
> untracked entries to dir->entries in order to get the correct output
> despite the supposed override in return value by the later call.
>
> I am somewhat concerned that there are still bugs and maybe even
> testcases with the wrong expectation.  I have tried to carefully
> document treat_directory() since it becomes more complex after this
> change (though much of this complexity came from elsewhere that probably
> deserved better comments to begin with).  However, much of my work felt
> more like a game of whackamole while attempting to make the code match
> the existing regression tests than an attempt to create an
> implementation that matched some clear design.  That seems wrong to me,
> but the rules of existing behavior had so many special cases that I had
> a hard time coming up with some overarching rules about what correct
> behavior is for all cases, forcing me to hope that the regression tests
> are correct and sufficient.  (I'll note that this turmoil makes working
> with dir.c extremely unpleasant for me; I keep hoping it'll get better,
> but it never seems to.)

Keep fighting the good fight! It appears that some of our most-important
code has these complicated cases and side-effects because it has grown
so organically over time. It's unlikely that someone _could_ rewrite it
to avoid that pain, as dir.c contains a lot of accumulated knowledge from
the many special-cases Git handles. I suppose the only thing we can do
is try to write as many detailed tests as possible.

> However, on the positive side, it does make the code much faster.  For
> the following simple shell loop in an empty repository:
> 
>   for depth in $(seq 10 25)
>   do
>     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>     rm -rf dir
>     mkdir -p $dirs
>     >$dirs/untracked-file
>     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
>   done
> 
> I saw the following timings, in seconds (note that the numbers are a
> little noisy from run-to-run, but the trend is very clear with every
> run):
> 
>     10: 0.03
>     11: 0.05
>     12: 0.08
>     13: 0.19
>     14: 0.29
>     15: 0.50
>     16: 1.05
>     17: 2.11
>     18: 4.11
>     19: 8.60
>     20: 17.55
>     21: 33.87
>     22: 68.71
>     23: 140.05
>     24: 274.45
>     25: 551.15

Are these timings on Linux? I imagine that the timings will increase
much more quickly on Windows.

> After this fix, those drop to:
> 
>     10: 0.00
...
>     25: 0.00

Nice. I wonder if presenting these 0.00 values as a table is worth
the space? At least the effect is dramatic.

> In fact, it isn't until a depth of 190 nested directories that it
> sometimes starts reporting a time of 0.01 seconds and doesn't
> consistently report 0.01 seconds until there are 240 nested directories.
> The previous code would have taken
>   17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
> to have completed the 240 nested directories case.  It's not often
> that you get to speed something up by a factor of 3*10^69.

Awesome.

> WARNING: This change breaks t7063.  I don't know whether that is to be expected
> (I now intentionally visit untracked directories differently so naturally the
> untracked cache should change), or if I've broken something.  I'm hoping to get
> an untracked cache expert to chime in...

I suppose that when the untracked cache is enabled, your expectation that we
do not need to recurse into an untracked directory is incorrect: we actually
want to explore that directory. Is there a mode we can check to see if we
are REALLY REALLY collecting _all_ untracked paths? Perhaps we need to create
one?

I'm CC'ing Kevin Willford because he is more familiar with the Git index
than me, and perhaps the untracked cache in particular.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 105 insertions(+), 46 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index ef3307718a..aaf038a9c4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  	const char *dirname, int len, int baselen, int excluded,
>  	const struct pathspec *pathspec)
>  {
> -	int nested_repo;
> +	/*
> +	 * WARNING: From this function, you can return path_recurse or you
> +	 *          can call read_directory_recursive() (or neither), but
> +	 *          you CAN'T DO BOTH.
> +	 */
> +	enum path_treatment state;
> +	int nested_repo, old_ignored_nr, stop_early;
>  
>  	/* The "len-1" is to strip the final '/' */
>  	switch (directory_exists_in_index(istate, dirname, len-1)) {
> @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  
>  	/* This is the "show_other_directories" case */
>  
> -	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> +	/*
> +	 * We only need to recurse into untracked/ignored directories if
> +	 * either of the following bits is set:
> +	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
> +	 *                           there are ignored directories below)
> +	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
> +	 *                                 the directory is empty)

Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED
flag to ensure the untracked cache loads all untracked paths?

> +	 */
> +	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
>  		return excluded ? path_excluded : path_untracked;
>  
> +	/*
> +	 * If we only want to determine if dirname is empty, then we can
> +	 * stop at the first file we find underneath that directory rather
> +	 * than continuing to recurse beyond it.  If DIR_SHOW_IGNORED_TOO
> +	 * is set, then we want MORE than just determining if dirname is
> +	 * empty.
> +	 */
> +	stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
> +		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
> +
> +	/*
> +	 * If /every/ file within an untracked directory is ignored, then
> +	 * we want to treat the directory as ignored (for e.g. status
> +	 * --porcelain), without listing the individual ignored files
> +	 * underneath.  To do so, we'll save the current ignored_nr, and
> +	 * pop all the ones added after it if it turns out the entire
> +	 * directory is ignored.

Here is a question for an untracked cache expert: Do we store ignored
paths in the untracked cache?

> +	 */
> +	old_ignored_nr = dir->ignored_nr;
> +
> +	/* Actually recurse into dirname now, we'll fixup the state later. */
>  	untracked = lookup_untracked(dir->untracked, untracked,
>  				     dirname + baselen, len - baselen);
> +	state = read_directory_recursive(dir, istate, dirname, len, untracked,
> +					 stop_early, stop_early, pathspec);
> +
> +	/* There are a variety of reasons we may need to fixup the state... */
> +	if (state == path_excluded) {
> +		int i;
> +
> +		/*
> +		 * When stop_early is set, read_directory_recursive() will
> +		 * never return path_untracked regardless of whether
> +		 * underlying paths were untracked or ignored (because
> +		 * returning early means it excluded some paths, or
> +		 * something like that -- see commit 5aaa7fd39aaf ("Improve
> +		 * performance of git status --ignored", 2017-09-18)).
> +		 * However, we're not really concerned with the status of
> +		 * files under the directory, we just wanted to know
> +		 * whether the directory was empty (state == path_none) or
> +		 * not (state == path_excluded), and if not, we'd return
> +		 * our original status based on whether the untracked
> +		 * directory matched an exclusion pattern.
> +		 */
> +		if (stop_early)
> +			state = excluded ? path_excluded : path_untracked;
> +
> +		else {
> +			/*
> +			 * When
> +			 *     !stop_early && state == path_excluded
> +			 * then all paths under dirname were ignored.  For
> +			 * this case, git status --porcelain wants to just
> +			 * list the directory itself as ignored and not
> +			 * list the individual paths underneath.  Remove
> +			 * the individual paths underneath.
> +			 */
> +			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
> +				free(dir->ignored[i]);
> +			dir->ignored_nr = old_ignored_nr;
> +		}
> +	}
>  
>  	/*
> -	 * If this is an excluded directory, then we only need to check if
> -	 * the directory contains any files.
> +	 * If there is nothing under the current directory and we are not
> +	 * hiding empty directories, then we need to report on the
> +	 * untracked or ignored status of the directory itself.
>  	 */
> -	return read_directory_recursive(dir, istate, dirname, len,
> -					untracked, 1, excluded, pathspec);
> +	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> +		state = excluded ? path_excluded : path_untracked;
> +
> +	/*
> +	 * We can recurse into untracked directories that don't match any
> +	 * of the given pathspecs when some file underneath the directory
> +	 * might match one of the pathspecs.  If so, we should make sure
> +	 * to note that the directory itself did not match.
> +	 */
> +	if (pathspec &&
> +	    !match_pathspec(istate, pathspec, dirname, len,
> +			    0 /* prefix */, NULL,
> +			    0 /* do NOT special case dirs */))
> +		state = path_none;
> +
> +	return state;
>  }

This is certainly a substantial change, and I'm not able to read it
carefully right now. I hope to return to it soon, but hopefully I've
pointed out some places that may lead you to resolve your untracked
cache issues.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 15:45     ` Elijah Newren
@ 2020-01-30 16:00       ` Derrick Stolee
  2020-01-30 16:10         ` Derrick Stolee
  0 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-01-30 16:00 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On 1/30/2020 10:45 AM, Elijah Newren wrote:
> On Thu, Jan 30, 2020 at 7:33 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
>>>
>>> Signed-off-by: Elijah Newren <newren@gmail.com>
>>> ---
>>>  dir.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/dir.c b/dir.c
>>> index 225f0bc082..ef3307718a 100644
>>> --- a/dir.c
>>> +++ b/dir.c
>>> @@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>>       const char *dirname, int len, int baselen, int excluded,
>>>       const struct pathspec *pathspec)
>>>  {
>>> -     int nested_repo = 0;
>>> +     int nested_repo;
>>>
>>>       /* The "len-1" is to strip the final '/' */
>>>       switch (directory_exists_in_index(istate, dirname, len-1)) {
>>> @@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>>               return path_none;
>>>
>>>       case index_nonexistent:
>>> +             nested_repo = 0;
>>
>> I had to look at this code in-full from en/fill-directory-fixes-more to
>> be sure that this case was the only use of nested_repo. However, I found
>> that this switch statement is unnecessarily complicated. By converting
>> the switch to multiple "if" statements, I noticed that the third case
>> actually has a "break" statement that can lead to the final "fourth case"
>> outside the switch statement.
>>
>> Hopefully the patch below is a worthy replacement for this one:
>>
>> -->8--
>>
>> From b5c04e6e028cb6c7f9e78fbdd2182383d928fe6d Mon Sep 17 00:00:00 2001
>> From: Derrick Stolee <dstolee@microsoft.com>
>> Date: Thu, 30 Jan 2020 15:28:39 +0000
>> Subject: [PATCH] dir: refactor treat_directory to clarify variable scope
>>
>> The nested_repo variable in treat_directory() is created and
>> initialized before a multi-case switch statement, but is only
>> used by one case. In fact, this switch is very asymmetrical,
>> as the first two cases are simple but the third is more
>> complicated than the rest of the method.
>>
>> Extract the switch statement into a series of "if" statements.
>> This simplifies the trivial cases, while highlighting the fact
>> that a "break" statement in a condition of the third case
>> actually leads to jumping to the fourth case (after the switch).
>> This assists a reader who provides an initial scan to notice
>> there is a second way to approach the "show_other_directories"
>> case than simply the response from directory_exists_in_index().
> 
> Wait, I'm lost.  Wasn't that break statement the only way to get to
> the "show_other_directories" block of code after the switch statement?
>  I can't see where the second way is; am I missing something?

Ah, I guess I didn't realize that exist_status didn't have a fourth
mode. I was assuming that normally the switch would not hit any of
the case statements was the way you would _assume_ to hit the block
after the switch.

So yes, my statement is incorrect, but the intention is correct:
the flow of this method is very confusing.

> That is, unless directory_exists_in_index() suddenly starts returning
> some value other than the three current possibilities.  Perhaps we
> should throw a BUG() if we get anything other than index_directory,
> index_gitdir, or index_nonexistent.
> 
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  dir.c | 17 ++++++++---------
>>  1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/dir.c b/dir.c
>> index b460211e61..e48812efe6 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1659,17 +1659,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>         const char *dirname, int len, int baselen, int exclude,
>>         const struct pathspec *pathspec)
>>  {
>> -       int nested_repo = 0;
>> -
>>         /* The "len-1" is to strip the final '/' */
>> -       switch (directory_exists_in_index(istate, dirname, len-1)) {
>> -       case index_directory:
>> -               return path_recurse;
>> +       enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>>
>> -       case index_gitdir:
>> +       if (status == index_directory)
>> +               return path_recurse;
>> +       if (status == index_gitdir)
>>                 return path_none;
>>
>> -       case index_nonexistent:
>> +       if (status == index_nonexistent) {

Since exist_status only has three options, this "if" is redundant.

>> +               int nested_repo = 0;
>>                 if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
>>                     !(dir->flags & DIR_NO_GITLINKS)) {
>>                         struct strbuf sb = STRBUF_INIT;
>> @@ -1682,7 +1681,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>                                 (exclude ? path_excluded : path_untracked));
>>
>>                 if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
>> -                       break;
>> +                       goto show_other_directories;

It would be better to nest the rest of this block in an 
if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES))

>>                 if (exclude &&
>>                         (dir->flags & DIR_SHOW_IGNORED_TOO) &&
>>                         (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
>> @@ -1711,7 +1710,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>         }
> 
> I'd say we'd want to add a BUG("Unhandled value for
> directory_exists_in_index: %d\n", status); right here.
> 
>>
>>         /* This is the "show_other_directories" case */
>> -
>> +show_other_directories:

...allowing us to drop this.

>>         if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
>>                 return exclude ? path_excluded : path_untracked;
>>
>> --
>> 2.25.0.vfs.1.1
> 
> Otherwise, the patch looks good to me and I'll be happy to replace my
> patch with this one.
 
Let me send a v2 of this patch now that you've pointed out my error. It
is worth making this method clearer before you expand substantially on
this final case.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 16:00       ` Derrick Stolee
@ 2020-01-30 16:10         ` Derrick Stolee
  2020-01-30 16:20           ` Elijah Newren
  0 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-01-30 16:10 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On 1/30/2020 11:00 AM, Derrick Stolee wrote:
>  
> Let me send a v2 of this patch now that you've pointed out my error. It
> is worth making this method clearer before you expand substantially on
> this final case.

Here we are:

-->8--

From 3fb4fdda25affe9fe6b3e91050e8ad105bcb6fe0 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Thu, 30 Jan 2020 15:28:39 +0000
Subject: [PATCH v2] dir: refactor treat_directory to clarify control flow

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index();

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index b460211e61..0989558ae6 100644
--- a/dir.c
+++ b/dir.c
@@ -1660,29 +1660,26 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
-
 	/* The "len-1" is to strip the final '/' */
-	switch (directory_exists_in_index(istate, dirname, len-1)) {
-	case index_directory:
-		return path_recurse;
+	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
-	case index_gitdir:
+	if (status == index_directory)
+		return path_recurse;
+	if (status == index_gitdir)
 		return path_none;
 
-	case index_nonexistent:
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    !(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			nested_repo = is_nonbare_repository_dir(&sb);
-			strbuf_release(&sb);
-		}
-		if (nested_repo)
-			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(exclude ? path_excluded : path_untracked));
+	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		!(dir->flags & DIR_NO_GITLINKS)) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, dirname);
+		nested_repo = is_nonbare_repository_dir(&sb);
+		strbuf_release(&sb);
+	}
+	if (nested_repo)
+		return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+			(exclude ? path_excluded : path_untracked));
 
-		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
-			break;
+	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
 		if (exclude &&
 			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
 			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
-- 
2.25.0.vfs.1.1




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 16:10         ` Derrick Stolee
@ 2020-01-30 16:20           ` Elijah Newren
  2020-01-30 18:17             ` Derrick Stolee
  0 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren @ 2020-01-30 16:20 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On Thu, Jan 30, 2020 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/30/2020 11:00 AM, Derrick Stolee wrote:
> >
> > Let me send a v2 of this patch now that you've pointed out my error. It
> > is worth making this method clearer before you expand substantially on
> > this final case.
>
> Here we are:
>
> -->8--
>
> From 3fb4fdda25affe9fe6b3e91050e8ad105bcb6fe0 Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Thu, 30 Jan 2020 15:28:39 +0000
> Subject: [PATCH v2] dir: refactor treat_directory to clarify control flow
>
> The logic in treat_directory() is handled by a multi-case
> switch statement, but this switch is very asymmetrical, as
> the first two cases are simple but the third is more
> complicated than the rest of the method. In fact, the third
> case includes a "break" statement that leads to the block
> of code outside the switch statement. That is the only way
> to reach that block, as the switch handles all possible
> values from directory_exists_in_index();
>
> Extract the switch statement into a series of "if" statements.
> This simplifies the trivial cases, while clarifying how to
> reach the "show_other_directories" case. This is particularly
> important as the "show_other_directories" case will expand
> in a later change.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index b460211e61..0989558ae6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1660,29 +1660,26 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>         const struct pathspec *pathspec)
>  {
>         int nested_repo = 0;
> -
>         /* The "len-1" is to strip the final '/' */
> -       switch (directory_exists_in_index(istate, dirname, len-1)) {
> -       case index_directory:
> -               return path_recurse;
> +       enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>
> -       case index_gitdir:
> +       if (status == index_directory)
> +               return path_recurse;
> +       if (status == index_gitdir)
>                 return path_none;

I think right here we should add:

        if (status != index_nonexistent):
                BUG("Unhandled value for directory_exists_in_index:
%d\n", status);

for future-proofing, since both you and I had to look up what
possibilities existed as a return status from
directory_exists_in_index(), and I'd rather a large warning was thrown
if someone ever adds a fourth option to that function rather than
assume treat_directory() is fine and only needs to special case two
choices.

Or we could add an assert or a code comment, just so long as we
document to future readers that the remainder of the code is assuming
status==index_nonexistent.

> -       case index_nonexistent:
> -               if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> -                   !(dir->flags & DIR_NO_GITLINKS)) {
> -                       struct strbuf sb = STRBUF_INIT;
> -                       strbuf_addstr(&sb, dirname);
> -                       nested_repo = is_nonbare_repository_dir(&sb);
> -                       strbuf_release(&sb);
> -               }
> -               if (nested_repo)
> -                       return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
> -                               (exclude ? path_excluded : path_untracked));
> +       if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> +               !(dir->flags & DIR_NO_GITLINKS)) {
> +               struct strbuf sb = STRBUF_INIT;
> +               strbuf_addstr(&sb, dirname);
> +               nested_repo = is_nonbare_repository_dir(&sb);
> +               strbuf_release(&sb);
> +       }
> +       if (nested_repo)
> +               return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
> +                       (exclude ? path_excluded : path_untracked));
>
> -               if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
> -                       break;
> +       if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
>                 if (exclude &&
>                         (dir->flags & DIR_SHOW_IGNORED_TOO) &&
>                         (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
> --
> 2.25.0.vfs.1.1

Otherwise, I'm quite happy with these changes.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-30 15:55   ` Derrick Stolee
@ 2020-01-30 17:13     ` Elijah Newren
  2020-01-30 17:45       ` Elijah Newren
  0 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren @ 2020-01-30 17:13 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Kevin.Willford

On Thu, Jan 30, 2020 at 7:55 AM Derrick Stolee <stolee@gmail.com> wrote:
> > However, on the positive side, it does make the code much faster.  For
> > the following simple shell loop in an empty repository:
> >
> >   for depth in $(seq 10 25)
> >   do
> >     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >     rm -rf dir
> >     mkdir -p $dirs
> >     >$dirs/untracked-file
> >     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> >   done
> >
> > I saw the following timings, in seconds (note that the numbers are a
> > little noisy from run-to-run, but the trend is very clear with every
> > run):
> >
> >     10: 0.03
> >     11: 0.05
> >     12: 0.08
> >     13: 0.19
> >     14: 0.29
> >     15: 0.50
> >     16: 1.05
> >     17: 2.11
> >     18: 4.11
> >     19: 8.60
> >     20: 17.55
> >     21: 33.87
> >     22: 68.71
> >     23: 140.05
> >     24: 274.45
> >     25: 551.15
>
> Are these timings on Linux? I imagine that the timings will increase
> much more quickly on Windows.

Yes, on Linux, with an SSD for the hard drive in this case (though I
suspect OS caching of the directories would probably eliminate any
differences between an SSD and a spinny disk since the same
directories are visited so many times).

> > After this fix, those drop to:
> >
> >     10: 0.00
> ...
> >     25: 0.00
>
> Nice. I wonder if presenting these 0.00 values as a table is worth
> the space? At least the effect is dramatic.

I first considered a table, but then noted it didn't match the code
snippet I provided and was worried I'd have to spend more time
explaining how I post-processed the output from two runs than we'd
gain from compressing the number of lines of the commit message.
Assuming reader time was more valuable, I opted to just keep the two
snippets of output.

> > WARNING: This change breaks t7063.  I don't know whether that is to be expected
> > (I now intentionally visit untracked directories differently so naturally the
> > untracked cache should change), or if I've broken something.  I'm hoping to get
> > an untracked cache expert to chime in...
>
> I suppose that when the untracked cache is enabled, your expectation that we
> do not need to recurse into an untracked directory is incorrect: we actually
> want to explore that directory. Is there a mode we can check to see if we
> are REALLY REALLY collecting _all_ untracked paths? Perhaps we need to create
> one?

I don't think I made any significant changes about using the untracked
cache versus traversing; the primary differences should be that I
traverse each directory once instead of 2^N times.  However, the
previous code would traverse with both check_only=0 and check_only=1,
and to avoid the whole 2^N thing I only traverse once.  That
fundamentally means I only won't traverse with both settings of that
flag.

The output in t7063 seems to suggest to me that the check_only flag
matters to what the untracked-cache stores ("check_only" literally
appears as part of the expected output), and the output also suggests
that the untracked-cache is recording when entries are visited
multiple times somehow.  Or maybe I'm just totally misunderstanding
the expected output in t7063.  I really have no clue about that stuff.

> I'm CC'ing Kevin Willford because he is more familiar with the Git index
> than me, and perhaps the untracked cache in particular.

Getting another set of eyes, even if they only know enough to provide
hunches or guesses, would be very welcome.

> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------
> >  1 file changed, 105 insertions(+), 46 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index ef3307718a..aaf038a9c4 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >       const char *dirname, int len, int baselen, int excluded,
> >       const struct pathspec *pathspec)
> >  {
> > -     int nested_repo;
> > +     /*
> > +      * WARNING: From this function, you can return path_recurse or you
> > +      *          can call read_directory_recursive() (or neither), but
> > +      *          you CAN'T DO BOTH.
> > +      */
> > +     enum path_treatment state;
> > +     int nested_repo, old_ignored_nr, stop_early;
> >
> >       /* The "len-1" is to strip the final '/' */
> >       switch (directory_exists_in_index(istate, dirname, len-1)) {
> > @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >
> >       /* This is the "show_other_directories" case */
> >
> > -     if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> > +     /*
> > +      * We only need to recurse into untracked/ignored directories if
> > +      * either of the following bits is set:
> > +      *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
> > +      *                           there are ignored directories below)
> > +      *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
> > +      *                                 the directory is empty)
>
> Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED
> flag to ensure the untracked cache loads all untracked paths?

Do you mean DIR_KEEP_UNTRACKED_CONTENTS (which is documented in dir.h
as only having meaning when DIR_SHOW_IGNORED_TOO is also set, and thus
caused me to not list it separately)?

Speaking of DIR_KEEP_UNTRACKED_CONTENTS, though, its handling as a
post-processing step in read_directory() is now inconsistent with how
we handle squashing a directory full of ignores into just marking the
containing directory as ignored.  I think I should move the
read_directory() logic for DIR_KEEP_UNTRACKED_CONTENTS to
treat_directory() and use another counter similar to old_ignored_nr.
It should be more efficient that way, too.

>
> > +      */
> > +     if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
> >               return excluded ? path_excluded : path_untracked;
> >
> > +     /*
> > +      * If we only want to determine if dirname is empty, then we can
> > +      * stop at the first file we find underneath that directory rather
> > +      * than continuing to recurse beyond it.  If DIR_SHOW_IGNORED_TOO
> > +      * is set, then we want MORE than just determining if dirname is
> > +      * empty.
> > +      */
> > +     stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
> > +                   !(dir->flags & DIR_SHOW_IGNORED_TOO));
> > +
> > +     /*
> > +      * If /every/ file within an untracked directory is ignored, then
> > +      * we want to treat the directory as ignored (for e.g. status
> > +      * --porcelain), without listing the individual ignored files
> > +      * underneath.  To do so, we'll save the current ignored_nr, and
> > +      * pop all the ones added after it if it turns out the entire
> > +      * directory is ignored.
>
> Here is a question for an untracked cache expert: Do we store ignored
> paths in the untracked cache?

According to 0dcb8d7fe0ec ("untracked cache: record .gitignore
information and dir hierarchy", 2015-03-08), no:

    This cached output is about untracked files only, not ignored files
    because the number of tracked files is usually small, so small cache
    overhead, while the number of ignored files could go really high
    (e.g. *.o files mixing with source code).

...unless, of course, someone came along later and changed the design goals.

[...]

> This is certainly a substantial change, and I'm not able to read it
> carefully right now. I hope to return to it soon, but hopefully I've
> pointed out some places that may lead you to resolve your untracked
> cache issues.

Yeah, it's pretty hard to reason about; personally I needed lots of
dumps of state during traversals just to partially make sense of it.

I had dumps of output from both before and after my changes printing
out return values of treat_directory() and paths and a bunch of other
stuff and was doing lots of comparisons (and repeatedly did this for
many, many different testcases with different toplevel git commands).
It was particularly annoying that the old stuff would traverse
everything 2^N times, half the time with check_only on and half the
time with it off.  It would return different state values for the same
path from different calls, often depending on the side effects of
dir.entries having had more entries added by the first recursion to
get the right output, despite the fact that the "wrong" state was
returned by treat_directory() for later visits to the same path (e.g.
path_untracked returned for the first time it was visited, then
path_none later, and it was a case where path_untracked was correct in
my view).

Despite those difficulties, having an extra set of eyes try to reason
about it and pointing out anything that looks amiss or even that just
looks hard to understand would be very welcome.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-30 17:13     ` Elijah Newren
@ 2020-01-30 17:45       ` Elijah Newren
  0 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren @ 2020-01-30 17:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Kevin.Willford

On Thu, Jan 30, 2020 at 9:13 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Jan 30, 2020 at 7:55 AM Derrick Stolee <stolee@gmail.com> wrote:
[...]
> > > @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> > >
> > >       /* This is the "show_other_directories" case */
> > >
> > > -     if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> > > +     /*
> > > +      * We only need to recurse into untracked/ignored directories if
> > > +      * either of the following bits is set:
> > > +      *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
> > > +      *                           there are ignored directories below)
> > > +      *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
> > > +      *                                 the directory is empty)
> >
> > Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED
> > flag to ensure the untracked cache loads all untracked paths?
>
> Do you mean DIR_KEEP_UNTRACKED_CONTENTS (which is documented in dir.h
> as only having meaning when DIR_SHOW_IGNORED_TOO is also set, and thus
> caused me to not list it separately)?
>
> Speaking of DIR_KEEP_UNTRACKED_CONTENTS, though, its handling as a
> post-processing step in read_directory() is now inconsistent with how
> we handle squashing a directory full of ignores into just marking the
> containing directory as ignored.  I think I should move the
> read_directory() logic for DIR_KEEP_UNTRACKED_CONTENTS to
> treat_directory() and use another counter similar to old_ignored_nr.
> It should be more efficient that way, too.

Oh, actually, I think I understand what you're getting at so let me
clear it up.  With DIR_SHOW_IGNORED_TOO, we always recurse to the
bottom, because it's needed to find any files that might be ignored.
(Maybe we could do something clever with checking .gitignore entries
and seeing if it's impossible for them to match anything below the
current directory, but the code doesn't do anything that clever.)  As
a side effect, we'll get all untracked files whenever that flag is
set.  As such, the only question is whether we want to keep all those
extra untracked files that we found or not, which is the purpose of
DIR_KEEP_UNTRACKED_CONTENTS.  Without DIR_SHOW_IGNORED_TOO, there's no
need or want to visit all untracked files without also learning of all
ignored files (and, in fact, git-clean is currently the only one that
wants to know about all untracked files).

As far as a simple test goes, in a simple repository with a file named
   one/two/three/four/five/untracked-file
and with nothing else under one/:

Before my changes:
    $ strace -e trace=file git status --ignored 2>&1 | grep
'open("one/' | grep -v gitignore.*ENOENT | wc -l
    62
Note that 62 == 2^5 + 2^4 + 2^3 + 2^2 + 2^1, showing how many
directories we open and read.

After my changes:
    $ strace -e trace=file git status --ignored 2>&1 | grep
'open("one/' | grep -v gitignore.*ENOENT | wc -l
    5
showing that it does open and read each directory, but does so only once.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 16:20           ` Elijah Newren
@ 2020-01-30 18:17             ` Derrick Stolee
  0 siblings, 0 replies; 68+ messages in thread
From: Derrick Stolee @ 2020-01-30 18:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On 1/30/2020 11:20 AM, Elijah Newren wrote:
> On Thu, Jan 30, 2020 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>> diff --git a/dir.c b/dir.c
>> index b460211e61..0989558ae6 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1660,29 +1660,26 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>         const struct pathspec *pathspec)
>>  {
>>         int nested_repo = 0;
>> -
>>         /* The "len-1" is to strip the final '/' */
>> -       switch (directory_exists_in_index(istate, dirname, len-1)) {
>> -       case index_directory:
>> -               return path_recurse;
>> +       enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>>
>> -       case index_gitdir:
>> +       if (status == index_directory)
>> +               return path_recurse;
>> +       if (status == index_gitdir)
>>                 return path_none;
> 
> I think right here we should add:
> 
>         if (status != index_nonexistent):
>                 BUG("Unhandled value for directory_exists_in_index:
> %d\n", status);
> 
> for future-proofing, since both you and I had to look up what
> possibilities existed as a return status from
> directory_exists_in_index(), and I'd rather a large warning was thrown
> if someone ever adds a fourth option to that function rather than
> assume treat_directory() is fine and only needs to special case two
> choices.
> 
> Or we could add an assert or a code comment, just so long as we
> document to future readers that the remainder of the code is assuming
> status==index_nonexistent.

I'm happy if you squash this into the commit. Thanks!


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
  2020-01-30 15:55   ` Derrick Stolee
@ 2020-01-31 17:13   ` SZEDER Gábor
  2020-01-31 17:47     ` Elijah Newren
  1 sibling, 1 reply; 68+ messages in thread
From: SZEDER Gábor @ 2020-01-31 17:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Martin Melka, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On Wed, Jan 29, 2020 at 10:03:42PM +0000, Elijah Newren via GitGitGadget wrote:
> Part of the tension noted above is that the treatment of a directory can
> changed based on the files within it, and based on the various settings

s/changed/change/, or perhaps s/changed/be changed/ ?

> Since dir.c is somewhat complex, extra cruft built up around this over
> time.  While trying to unravel it, I noticed several instances where the
> first call to read_directory_recursive() would return e.g.
> path_untracked for a some directory and a later one would return e.g.

s/for a some/for some/

> However, on the positive side, it does make the code much faster.  For
> the following simple shell loop in an empty repository:
> 
>   for depth in $(seq 10 25)
>   do
>     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>     rm -rf dir
>     mkdir -p $dirs
>     >$dirs/untracked-file
>     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
>   done
> 
> I saw the following timings, in seconds (note that the numbers are a
> little noisy from run-to-run, but the trend is very clear with every
> run):
> 
>     10: 0.03
>     11: 0.05
>     12: 0.08
>     13: 0.19
>     14: 0.29
>     15: 0.50
>     16: 1.05
>     17: 2.11
>     18: 4.11
>     19: 8.60
>     20: 17.55
>     21: 33.87
>     22: 68.71
>     23: 140.05
>     24: 274.45
>     25: 551.15
> 
> After this fix, those drop to:
> 
>     10: 0.00
>     11: 0.00
>     12: 0.00
>     13: 0.00
>     14: 0.00
>     15: 0.00
>     16: 0.00
>     17: 0.00
>     18: 0.00
>     19: 0.00
>     20: 0.00
>     21: 0.00
>     22: 0.00
>     23: 0.00
>     24: 0.00
>     25: 0.00

I agree with Derrick here: if you just said that all these report
0.00, I would have taken your word for it.

Having said that...  I don't know how to get more decimal places out
of /use/bin/time, but our trace performance facility uses nanosecond
resolution timestamps.  So using this command in the loop above:

  GIT_TRACE_PERFORMANCE=2 git status --ignored 2>&1 >/dev/null |
    sed -n -e "s/.* performance: \(.*\): git command.*/$depth: \1/p"

gave me this:

  1: 0.000574302 s
  2: 0.000584995 s
  3: 0.000608684 s
  4: 0.000951336 s
  5: 0.000762019 s
  6: 0.000816685 s
  7: 0.000672516 s
  8: 0.000912628 s
  9: 0.000661538 s
  10: 0.000687465 s
  11: 0.000708880 s
  12: 0.000693754 s
  13: 0.000726120 s
  14: 0.000737334 s
  15: 0.000787362 s
  16: 0.000856687 s
  17: 0.000780892 s
  18: 0.000790798 s
  19: 0.000834411 s
  20: 0.000859094 s
  21: 0.001230912 s
  22: 0.001048852 s
  23: 0.000891057 s
  24: 0.000934097 s
  25: 0.001051704 s

Not sure it's worth including, though.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-31 17:13   ` SZEDER Gábor
@ 2020-01-31 17:47     ` Elijah Newren
  0 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren @ 2020-01-31 17:47 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	Samuel Lijin, Nguyễn Thái Ngọc Duy

On Fri, Jan 31, 2020 at 9:13 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 10:03:42PM +0000, Elijah Newren via GitGitGadget wrote:
> > Part of the tension noted above is that the treatment of a directory can
> > changed based on the files within it, and based on the various settings
>
> s/changed/change/, or perhaps s/changed/be changed/ ?
>
> > Since dir.c is somewhat complex, extra cruft built up around this over
> > time.  While trying to unravel it, I noticed several instances where the
> > first call to read_directory_recursive() would return e.g.
> > path_untracked for a some directory and a later one would return e.g.
>
> s/for a some/for some/
>
> > However, on the positive side, it does make the code much faster.  For
> > the following simple shell loop in an empty repository:
> >
> >   for depth in $(seq 10 25)
> >   do
> >     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >     rm -rf dir
> >     mkdir -p $dirs
> >     >$dirs/untracked-file
> >     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> >   done
> >
> > I saw the following timings, in seconds (note that the numbers are a
> > little noisy from run-to-run, but the trend is very clear with every
> > run):
> >
> >     10: 0.03
> >     11: 0.05
> >     12: 0.08
> >     13: 0.19
> >     14: 0.29
> >     15: 0.50
> >     16: 1.05
> >     17: 2.11
> >     18: 4.11
> >     19: 8.60
> >     20: 17.55
> >     21: 33.87
> >     22: 68.71
> >     23: 140.05
> >     24: 274.45
> >     25: 551.15
> >
> > After this fix, those drop to:
> >
> >     10: 0.00
> >     11: 0.00
> >     12: 0.00
> >     13: 0.00
> >     14: 0.00
> >     15: 0.00
> >     16: 0.00
> >     17: 0.00
> >     18: 0.00
> >     19: 0.00
> >     20: 0.00
> >     21: 0.00
> >     22: 0.00
> >     23: 0.00
> >     24: 0.00
> >     25: 0.00
>
> I agree with Derrick here: if you just said that all these report
> 0.00, I would have taken your word for it.

Thanks, I'll include all these fixes.  Good timing too, as I was about
to send a re-roll.

> Having said that...  I don't know how to get more decimal places out
> of /use/bin/time, but our trace performance facility uses nanosecond
> resolution timestamps.  So using this command in the loop above:
>
>   GIT_TRACE_PERFORMANCE=2 git status --ignored 2>&1 >/dev/null |
>     sed -n -e "s/.* performance: \(.*\): git command.*/$depth: \1/p"
>
> gave me this:
>
>   1: 0.000574302 s
>   2: 0.000584995 s
>   3: 0.000608684 s
>   4: 0.000951336 s
>   5: 0.000762019 s
>   6: 0.000816685 s
>   7: 0.000672516 s
>   8: 0.000912628 s
>   9: 0.000661538 s
>   10: 0.000687465 s
>   11: 0.000708880 s
>   12: 0.000693754 s
>   13: 0.000726120 s
>   14: 0.000737334 s
>   15: 0.000787362 s
>   16: 0.000856687 s
>   17: 0.000780892 s
>   18: 0.000790798 s
>   19: 0.000834411 s
>   20: 0.000859094 s
>   21: 0.001230912 s
>   22: 0.001048852 s
>   23: 0.000891057 s
>   24: 0.000934097 s
>   25: 0.001051704 s
>
> Not sure it's worth including, though.

Yeah, I'm afraid people will spend time trying to analyze it and the
numbers are extremely noisy.  I instead included some words about
counting the number of untracked files opened according to strace,
which shows before we had 2^(1+$depth)-2 untracked directories get
opened and after we had exactly $depth get opened.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
  2020-01-30 15:20   ` Derrick Stolee
@ 2020-01-31 18:04   ` SZEDER Gábor
  2020-01-31 18:17     ` Elijah Newren
  1 sibling, 1 reply; 68+ messages in thread
From: SZEDER Gábor @ 2020-01-31 18:04 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Martin Melka, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On Wed, Jan 29, 2020 at 10:03:40PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Despite having contributed several fixes in this area, I have for months
> (years?) assumed that the "exclude" variable was a directive; this
> caused me to think of it as a different mode we operate in and left me
> confused as I tried to build up a mental model around why we'd need such
> a directive.  I mostly tried to ignore it while focusing on the pieces I
> was trying to understand.
> 
> Then I finally traced this variable all back to a call to is_excluded(),
> meaning it was actually functioning as an adjective.  In particular, it
> was a checked property ("Does this path match a rule in .gitignore?"),
> rather than a mode passed in from the caller.  Change the variable name
> to match the part of speech used by the function called to define it,
> which will hopefully make these bits of code slightly clearer to the
> next reader.

Slightly related questions: Does 'excluded' always mean ignored?  Or
is it possible for a file to be excluded but for some other reason
than being ignored?

I'm never really sure, and of course it doesn't help that we have both
'.gitignore' and '.git/info/exclude' files and conditions like:

> +		if (excluded &&
> +		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
> +		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-31 18:04   ` SZEDER Gábor
@ 2020-01-31 18:17     ` Elijah Newren
  0 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren @ 2020-01-31 18:17 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	Samuel Lijin, Nguyễn Thái Ngọc Duy

On Fri, Jan 31, 2020 at 10:04 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 10:03:40PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Despite having contributed several fixes in this area, I have for months
> > (years?) assumed that the "exclude" variable was a directive; this
> > caused me to think of it as a different mode we operate in and left me
> > confused as I tried to build up a mental model around why we'd need such
> > a directive.  I mostly tried to ignore it while focusing on the pieces I
> > was trying to understand.
> >
> > Then I finally traced this variable all back to a call to is_excluded(),
> > meaning it was actually functioning as an adjective.  In particular, it
> > was a checked property ("Does this path match a rule in .gitignore?"),
> > rather than a mode passed in from the caller.  Change the variable name
> > to match the part of speech used by the function called to define it,
> > which will hopefully make these bits of code slightly clearer to the
> > next reader.
>
> Slightly related questions: Does 'excluded' always mean ignored?  Or
> is it possible for a file to be excluded but for some other reason
> than being ignored?
>
> I'm never really sure, and of course it doesn't help that we have both
> '.gitignore' and '.git/info/exclude' files and conditions like:
>
> > +             if (excluded &&
> > +                 (dir->flags & DIR_SHOW_IGNORED_TOO) &&
> > +                 (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
>

Good question; no idea.  You can start digging into is_excluded() and
the pattern list stored in the dir struct and try to trace it back to
see if it's just the combination of ignore rules in .gitignore and
.git/info/exclude and core.excludesFile, or if there is something else
meant here.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
@ 2020-01-31 18:31 ` Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
                     ` (6 more replies)
  6 siblings, 7 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren

This patch series builds on en/fill-directory-fixes-more. This series should
be considered an RFC because of the untracked-cache changes (see the last
two commits), for which I'm hoping to get an untracked-cache expert to
comment. This series does provide some modest speedups (see second to last
commit message), and should allow 'git status --ignored' to complete in a
more reasonable timeframe for Martin Melka (see 
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
)

Changes since v1:

 * Replaced patch 4 with improved version from Stolee (with additional
   improvement of my own)
 * Clarifications, wording fixes, and more about linear perf in commit
   message to patch 5
 * More detail in patch 5 about why "whackamole" particularly makes me
   uneasy for dir.c

Stuff clearly still missing from v2:

 * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in 
   https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/ 
   which I think would make the code cleaner & clearer.
 * I still have not addressed the untracked-cache issue mentioned in the
   last two commits. I looked at it very, very briefly, but I was really
   close to doing something similar to [1] and just dropping my patches in
   this series before even submitting them on Wednesday[2] (dir.c is a
   really unpleasant to work in). Other than wording fixes, I just need a
   week or two off from this area before I dig further, unless someone else
   wants to dive in and needs me to provide pointers on what I've done so
   far.

[1] 
https://lore.kernel.org/git/pull.676.v3.git.git.1576571586.gitgitgadget@gmail.com/
[2] I was inches from doing that Wednesday morning. I had done several
rounds of "Okay, I fixed all the tests that broke with my changes last time,
let's re-run the testsuite -- wow, four totally different tests from
testfiles I hadn't looked at before now break", and decided that I would
only do one more before dropping it an maybe coming back in a month or two.
That time happened to work, minus the untracked-cache, so I decided to put
it in front of other eyeballs.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (5):
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one
  t7063: blindly accept diffs

 dir.c                             | 331 +++++++++++++++++-------------
 t/t7063-status-untracked-cache.sh |  50 ++---
 2 files changed, 208 insertions(+), 173 deletions(-)


base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v2
Pull-Request: https://github.com/git/git/pull/700

Range-diff vs v1:

 1:  27bc135796 = 1:  27bc135796 dir: consolidate treat_path() and treat_one_path()
 2:  2ceb64ae61 = 2:  2ceb64ae61 dir: fix broken comment
 3:  e6d21228d1 = 3:  e6d21228d1 dir: fix confusion based on variable tense
 4:  3b2ec5eaf6 ! 4:  f73f0d66d1 dir: move setting of nested_repo next to its actual usage
     @@ -1,26 +1,73 @@
     -Author: Elijah Newren <newren@gmail.com>
     +Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    dir: move setting of nested_repo next to its actual usage
     +    dir: refactor treat_directory to clarify control flow
      
     +    The logic in treat_directory() is handled by a multi-case
     +    switch statement, but this switch is very asymmetrical, as
     +    the first two cases are simple but the third is more
     +    complicated than the rest of the method. In fact, the third
     +    case includes a "break" statement that leads to the block
     +    of code outside the switch statement. That is the only way
     +    to reach that block, as the switch handles all possible
     +    values from directory_exists_in_index();
     +
     +    Extract the switch statement into a series of "if" statements.
     +    This simplifies the trivial cases, while clarifying how to
     +    reach the "show_other_directories" case. This is particularly
     +    important as the "show_other_directories" case will expand
     +    in a later change.
     +
     +    Helped-by: Elijah Newren <newren@gmail.com>
     +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       diff --git a/dir.c b/dir.c
       --- a/dir.c
       +++ b/dir.c
      @@
     - 	const char *dirname, int len, int baselen, int excluded,
       	const struct pathspec *pathspec)
       {
     --	int nested_repo = 0;
     -+	int nested_repo;
     - 
     + 	int nested_repo = 0;
     +-
       	/* The "len-1" is to strip the final '/' */
     - 	switch (directory_exists_in_index(istate, dirname, len-1)) {
     -@@
     +-	switch (directory_exists_in_index(istate, dirname, len-1)) {
     +-	case index_directory:
     +-		return path_recurse;
     ++	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
     + 
     +-	case index_gitdir:
     ++	if (status == index_directory)
     ++		return path_recurse;
     ++	if (status == index_gitdir)
       		return path_none;
     ++	if (status != index_nonexistent)
     ++		BUG("Unhandled value for directory_exists_in_index: %d\n", status);
     + 
     +-	case index_nonexistent:
     +-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
     +-		    !(dir->flags & DIR_NO_GITLINKS)) {
     +-			struct strbuf sb = STRBUF_INIT;
     +-			strbuf_addstr(&sb, dirname);
     +-			nested_repo = is_nonbare_repository_dir(&sb);
     +-			strbuf_release(&sb);
     +-		}
     +-		if (nested_repo)
     +-			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
     +-				(excluded ? path_excluded : path_untracked));
     ++	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
     ++		!(dir->flags & DIR_NO_GITLINKS)) {
     ++		struct strbuf sb = STRBUF_INIT;
     ++		strbuf_addstr(&sb, dirname);
     ++		nested_repo = is_nonbare_repository_dir(&sb);
     ++		strbuf_release(&sb);
     ++	}
     ++	if (nested_repo)
     ++		return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
     ++			(excluded ? path_excluded : path_untracked));
       
     - 	case index_nonexistent:
     -+		nested_repo = 0;
     - 		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
     - 		    !(dir->flags & DIR_NO_GITLINKS)) {
     - 			struct strbuf sb = STRBUF_INIT;
     +-		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
     +-			break;
     ++	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
     + 		if (excluded &&
     + 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
     + 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
 5:  40b378e7ad ! 5:  d3136ef52f dir: replace exponential algorithm with a linear one
     @@ -20,26 +20,29 @@
               and report all the ignored entries and then report the directory as
               untracked -- UNLESS all the entries under the directory are
               ignored, in which case we don't print any of the entries under the
     -         directory and just report the directory itself as ignored.
     +         directory and just report the directory itself as ignored.  (Note
     +         that although this forces us to walk all untracked files underneath
     +         the directory as well, we strip them from the output, except for
     +         users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.)
      
             * For 'git clean', we may need to recurse into a directory that
               doesn't match any specified pathspecs, if it's possible that there
               is an entry underneath the directory that can match one of the
               pathspecs.  In such a case, we need to be careful to omit the
     -         directory itself from the list of paths (see e.g. commit
     -         404ebceda01c ("dir: also check directories for matching pathspecs",
     -         2019-09-17))
     +         directory itself from the list of paths (see commit 404ebceda01c
     +         ("dir: also check directories for matching pathspecs", 2019-09-17))
      
          Part of the tension noted above is that the treatment of a directory can
     -    changed based on the files within it, and based on the various settings
     +    change based on the files within it, and based on the various settings
          in dir->flags.  Trying to keep this in mind while reading over the code,
     -    it is easy to (accidentally?) think in terms of "treat_directory() tells
     -    us what to do with a directory, and read_directory_recursive() is the
     -    thing that recurses".  Since we need to look into a directory to know
     -    how to treat it, though, it was quite easy to decide to recurse into the
     +    it is easy to think in terms of "treat_directory() tells us what to do
     +    with a directory, and read_directory_recursive() is the thing that
     +    recurses".  Since we need to look into a directory to know how to treat
     +    it, though, it is quite easy to decide to (also) recurse into the
          directory from treat_directory() by adding a read_directory_recursive()
     -    call.  Adding such a call is actually fine, IF we didn't also cause
     -    read_directory_recursive() to recurse into the same directory again.
     +    call.  Adding such a call is actually fine, IF we make sure that
     +    read_directory_recursive() does not also recurse into that same
     +    directory.
      
          Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
          for ignored files", 2017-05-18), added exactly such a case to the code,
     @@ -58,10 +61,12 @@
          Since dir.c is somewhat complex, extra cruft built up around this over
          time.  While trying to unravel it, I noticed several instances where the
          first call to read_directory_recursive() would return e.g.
     -    path_untracked for a some directory and a later one would return e.g.
     -    path_none, and the code relied on the side-effect of the first adding
     -    untracked entries to dir->entries in order to get the correct output
     -    despite the supposed override in return value by the later call.
     +    path_untracked for some directory and a later one would return e.g.
     +    path_none, despite the fact that the directory clearly should have been
     +    considered untracked.  The code happened to work due to the side-effect
     +    from the first invocation of adding untracked entries to dir->entries;
     +    this allowed it to get the correct output despite the supposed override
     +    in return value by the later call.
      
          I am somewhat concerned that there are still bugs and maybe even
          testcases with the wrong expectation.  I have tried to carefully
     @@ -74,9 +79,40 @@
          but the rules of existing behavior had so many special cases that I had
          a hard time coming up with some overarching rules about what correct
          behavior is for all cases, forcing me to hope that the regression tests
     -    are correct and sufficient.  (I'll note that this turmoil makes working
     -    with dir.c extremely unpleasant for me; I keep hoping it'll get better,
     -    but it never seems to.)
     +    are correct and sufficient.  Such a hope seems likely to be ill-founded,
     +    given my experience with dir.c-related testcases in the last few months:
     +
     +      Examples where the documentation was hard to parse or even just wrong:
     +       * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
     +                       -n/--dry-run, 2019-09-17)
     +       * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
     +                       repository, 2019-09-17)
     +       * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
     +      Examples where testcases were declared wrong and changed:
     +       * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
     +                       repository, 2019-09-17)
     +       * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
     +       * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
     +                       leading directories", 2019-12-10)
     +      Examples where testcases were clearly inadequate:
     +       * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
     +                       ignored file breakage, 2019-08-25)
     +       * 7541cc530239 (t7300: add testcases showing failure to clean specified
     +                       pathspecs, 2019-09-17)
     +       * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
     +                       2019-09-17)
     +       * 404ebceda01c (dir: also check directories for matching pathspecs,
     +                       2019-09-17)
     +       * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
     +                       repository, 2019-09-17)
     +       * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
     +       * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
     +                       2019-12-10)
     +       * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
     +      Examples where "correct behavior" was unclear to everyone:
     +        https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
     +      Other commits of note:
     +       * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)
      
          However, on the positive side, it does make the code much faster.  For
          the following simple shell loop in an empty repository:
     @@ -111,27 +147,14 @@
              24: 274.45
              25: 551.15
      
     -    After this fix, those drop to:
     -
     -        10: 0.00
     -        11: 0.00
     -        12: 0.00
     -        13: 0.00
     -        14: 0.00
     -        15: 0.00
     -        16: 0.00
     -        17: 0.00
     -        18: 0.00
     -        19: 0.00
     -        20: 0.00
     -        21: 0.00
     -        22: 0.00
     -        23: 0.00
     -        24: 0.00
     -        25: 0.00
     +    For the above run, using strace I can look for the number of untracked
     +    directories opened and can verify that it matches the expected
     +    2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).
      
     -    In fact, it isn't until a depth of 190 nested directories that it
     -    sometimes starts reporting a time of 0.01 seconds and doesn't
     +    After this fix, with strace I can verify that the number of untracked
     +    directories that are opened drops to just $depth, and the timings all
     +    drop to 0.00.  In fact, it isn't until a depth of 190 nested directories
     +    that it sometimes starts reporting a time of 0.01 seconds and doesn't
          consistently report 0.01 seconds until there are 240 nested directories.
          The previous code would have taken
            17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
     @@ -152,17 +175,17 @@
       	const char *dirname, int len, int baselen, int excluded,
       	const struct pathspec *pathspec)
       {
     --	int nested_repo;
     +-	int nested_repo = 0;
      +	/*
      +	 * WARNING: From this function, you can return path_recurse or you
      +	 *          can call read_directory_recursive() (or neither), but
      +	 *          you CAN'T DO BOTH.
      +	 */
      +	enum path_treatment state;
     -+	int nested_repo, old_ignored_nr, stop_early;
     - 
     ++	int nested_repo = 0, old_ignored_nr, stop_early;
       	/* The "len-1" is to strip the final '/' */
     - 	switch (directory_exists_in_index(istate, dirname, len-1)) {
     + 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
     + 
      @@
       
       	/* This is the "show_other_directories" case */
     @@ -292,9 +315,9 @@
      -	 * updates in treat_leading_path().  See the commit message for the
      -	 * commit adding this warning as well as the commit preceding it
      -	 * for details.
     -+	 * WARNING: Do NOT call recurse unless path_recurse is returned
     -+	 *          from treat_path().  Recursing on any other return value
     -+	 *          results in exponential slowdown.
     ++	 * WARNING: Do NOT recurse unless path_recurse is returned from
     ++	 *          treat_path().  Recursing on any other return value
     ++	 *          can result in exponential slowdown.
       	 */
      -
       	struct cached_dir cdir;
 6:  7fb8063541 = 6:  9a3f20656e t7063: blindly accept diffs

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path()
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
@ 2020-01-31 18:31   ` Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 16e2cfa90993 ("read_directory(): further split treat_path()",
2010-01-08) split treat_one_path() out of treat_path(), because
treat_leading_path() would not have access to a dirent but wanted to
re-use as much of treat_path() as possible.  Not re-using all of
treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir:
fix checks on common prefix directory", 2019-12-19).  Finally, in commit
ad6f2157f951 ("dir: restructure in a way to avoid passing around a
struct dirent", 2020-01-16), dirents were removed from treat_path() and
other functions entirely.

Since the only reason for splitting these functions was the lack of a
dirent -- which no longer applies to either function -- and since the
split caused problems in the past resulting in us not using
treat_one_path() separately anymore, just undo the split.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 121 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 55 insertions(+), 66 deletions(-)

diff --git a/dir.c b/dir.c
index b460211e61..68c56aeddb 100644
--- a/dir.c
+++ b/dir.c
@@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate,
 	return dtype;
 }
 
-static enum path_treatment treat_one_path(struct dir_struct *dir,
-					  struct untracked_cache_dir *untracked,
-					  struct index_state *istate,
-					  struct strbuf *path,
-					  int baselen,
-					  const struct pathspec *pathspec,
-					  int dtype)
-{
-	int exclude;
-	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
+static enum path_treatment treat_path_fast(struct dir_struct *dir,
+					   struct untracked_cache_dir *untracked,
+					   struct cached_dir *cdir,
+					   struct index_state *istate,
+					   struct strbuf *path,
+					   int baselen,
+					   const struct pathspec *pathspec)
+{
+	strbuf_setlen(path, baselen);
+	if (!cdir->ucd) {
+		strbuf_addstr(path, cdir->file);
+		return path_untracked;
+	}
+	strbuf_addstr(path, cdir->ucd->name);
+	/* treat_one_path() does this before it calls treat_directory() */
+	strbuf_complete(path, '/');
+	if (cdir->ucd->check_only)
+		/*
+		 * check_only is set as a result of treat_directory() getting
+		 * to its bottom. Verify again the same set of directories
+		 * with check_only set.
+		 */
+		return read_directory_recursive(dir, istate, path->buf, path->len,
+						cdir->ucd, 1, 0, pathspec);
+	/*
+	 * We get path_recurse in the first run when
+	 * directory_exists_in_index() returns index_nonexistent. We
+	 * are sure that new changes in the index does not impact the
+	 * outcome. Return now.
+	 */
+	return path_recurse;
+}
+
+static enum path_treatment treat_path(struct dir_struct *dir,
+				      struct untracked_cache_dir *untracked,
+				      struct cached_dir *cdir,
+				      struct index_state *istate,
+				      struct strbuf *path,
+				      int baselen,
+				      const struct pathspec *pathspec)
+{
+	int has_path_in_index, dtype, exclude;
 	enum path_treatment path_treatment;
 
-	dtype = resolve_dtype(dtype, istate, path->buf, path->len);
+	if (!cdir->d_name)
+		return treat_path_fast(dir, untracked, cdir, istate, path,
+				       baselen, pathspec);
+	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
+		return path_none;
+	strbuf_setlen(path, baselen);
+	strbuf_addstr(path, cdir->d_name);
+	if (simplify_away(path->buf, path->len, pathspec))
+		return path_none;
+
+	dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len);
 
 	/* Always exclude indexed files */
+	has_path_in_index = !!index_file_exists(istate, path->buf, path->len,
+						ignore_case);
 	if (dtype != DT_DIR && has_path_in_index)
 		return path_none;
 
@@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	}
 }
 
-static enum path_treatment treat_path_fast(struct dir_struct *dir,
-					   struct untracked_cache_dir *untracked,
-					   struct cached_dir *cdir,
-					   struct index_state *istate,
-					   struct strbuf *path,
-					   int baselen,
-					   const struct pathspec *pathspec)
-{
-	strbuf_setlen(path, baselen);
-	if (!cdir->ucd) {
-		strbuf_addstr(path, cdir->file);
-		return path_untracked;
-	}
-	strbuf_addstr(path, cdir->ucd->name);
-	/* treat_one_path() does this before it calls treat_directory() */
-	strbuf_complete(path, '/');
-	if (cdir->ucd->check_only)
-		/*
-		 * check_only is set as a result of treat_directory() getting
-		 * to its bottom. Verify again the same set of directories
-		 * with check_only set.
-		 */
-		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, 0, pathspec);
-	/*
-	 * We get path_recurse in the first run when
-	 * directory_exists_in_index() returns index_nonexistent. We
-	 * are sure that new changes in the index does not impact the
-	 * outcome. Return now.
-	 */
-	return path_recurse;
-}
-
-static enum path_treatment treat_path(struct dir_struct *dir,
-				      struct untracked_cache_dir *untracked,
-				      struct cached_dir *cdir,
-				      struct index_state *istate,
-				      struct strbuf *path,
-				      int baselen,
-				      const struct pathspec *pathspec)
-{
-	if (!cdir->d_name)
-		return treat_path_fast(dir, untracked, cdir, istate, path,
-				       baselen, pathspec);
-	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
-		return path_none;
-	strbuf_setlen(path, baselen);
-	strbuf_addstr(path, cdir->d_name);
-	if (simplify_away(path->buf, path->len, pathspec))
-		return path_none;
-
-	return treat_one_path(dir, untracked, istate, path, baselen, pathspec,
-			      cdir->d_type);
-}
-
 static void add_untracked(struct untracked_cache_dir *dir, const char *name)
 {
 	if (!dir)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 2/6] dir: fix broken comment
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
@ 2020-01-31 18:31   ` Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 68c56aeddb..c358158f55 100644
--- a/dir.c
+++ b/dir.c
@@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 					add_untracked(untracked, path.buf + baselen);
 				break;
 			}
-			/* skip the dir_add_* part */
+			/* skip the add_path_to_appropriate_result_list() */
 			continue;
 		}
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 3/6] dir: fix confusion based on variable tense
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
@ 2020-01-31 18:31   ` Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Despite having contributed several fixes in this area, I have for months
(years?) assumed that the "exclude" variable was a directive; this
caused me to think of it as a different mode we operate in and left me
confused as I tried to build up a mental model around why we'd need such
a directive.  I mostly tried to ignore it while focusing on the pieces I
was trying to understand.

Then I finally traced this variable all back to a call to is_excluded(),
meaning it was actually functioning as an adjective.  In particular, it
was a checked property ("Does this path match a rule in .gitignore?"),
rather than a mode passed in from the caller.  Change the variable name
to match the part of speech used by the function called to define it,
which will hopefully make these bits of code slightly clearer to the
next reader.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index c358158f55..225f0bc082 100644
--- a/dir.c
+++ b/dir.c
@@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
 static enum path_treatment treat_directory(struct dir_struct *dir,
 	struct index_state *istate,
 	struct untracked_cache_dir *untracked,
-	const char *dirname, int len, int baselen, int exclude,
+	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
@@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		}
 		if (nested_repo)
 			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(exclude ? path_excluded : path_untracked));
+				(excluded ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
-		if (exclude &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
+		if (excluded &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
 
 			/*
 			 * This is an excluded directory and we are
@@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* This is the "show_other_directories" case */
 
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
@@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * the directory contains any files.
 	 */
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, exclude, pathspec);
+					untracked, 1, excluded, pathspec);
 }
 
 /*
@@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      int baselen,
 				      const struct pathspec *pathspec)
 {
-	int has_path_in_index, dtype, exclude;
+	int has_path_in_index, dtype, excluded;
 	enum path_treatment path_treatment;
 
 	if (!cdir->d_name)
@@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	    (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent))
 		return path_none;
 
-	exclude = is_excluded(dir, istate, path->buf, &dtype);
+	excluded = is_excluded(dir, istate, path->buf, &dtype);
 
 	/*
 	 * Excluded? If we don't explicitly want to show
 	 * ignored files, ignore it
 	 */
-	if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
+	if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
 		return path_excluded;
 
 	switch (dtype) {
@@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		strbuf_addch(path, '/');
 		path_treatment = treat_directory(dir, istate, untracked,
 						 path->buf, path->len,
-						 baselen, exclude, pathspec);
+						 baselen, excluded, pathspec);
 		/*
 		 * If 1) we only want to return directories that
 		 * match an exclude pattern and 2) this directory does
@@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		 * recurse into this directory (instead of marking the
 		 * directory itself as an ignored path).
 		 */
-		if (!exclude &&
+		if (!excluded &&
 		    path_treatment == path_excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
@@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		return path_treatment;
 	case DT_REG:
 	case DT_LNK:
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-01-31 18:31   ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-01-31 18:31   ` Derrick Stolee via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index();

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index 225f0bc082..6867356a31 100644
--- a/dir.c
+++ b/dir.c
@@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
-
 	/* The "len-1" is to strip the final '/' */
-	switch (directory_exists_in_index(istate, dirname, len-1)) {
-	case index_directory:
-		return path_recurse;
+	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
-	case index_gitdir:
+	if (status == index_directory)
+		return path_recurse;
+	if (status == index_gitdir)
 		return path_none;
+	if (status != index_nonexistent)
+		BUG("Unhandled value for directory_exists_in_index: %d\n", status);
 
-	case index_nonexistent:
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    !(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			nested_repo = is_nonbare_repository_dir(&sb);
-			strbuf_release(&sb);
-		}
-		if (nested_repo)
-			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(excluded ? path_excluded : path_untracked));
+	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		!(dir->flags & DIR_NO_GITLINKS)) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, dirname);
+		nested_repo = is_nonbare_repository_dir(&sb);
+		strbuf_release(&sb);
+	}
+	if (nested_repo)
+		return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+			(excluded ? path_excluded : path_untracked));
 
-		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
-			break;
+	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
 		if (excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 5/6] dir: replace exponential algorithm with a linear one
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-01-31 18:31   ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
@ 2020-01-31 18:31   ` Elijah Newren via GitGitGadget
  2020-01-31 18:31   ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in order
to walk the directory tree.  Treating of directories is sometimes weird
because there are so many different permutations about how to handle
directories.  Some examples:

   * 'git ls-files -o --directory' only needs to know that a directory
     itself is untracked; it doesn't need to recurse into it to see what
     is underneath.

   * 'git status' needs to recurse into an untracked directory, but only
     to determine whether or not it is empty.  If there are no files
     underneath, the directory itself will be omitted from the output.
     If it is not empty, only the directory will be listed.

   * 'git status --ignored' needs to recurse into untracked directories
     and report all the ignored entries and then report the directory as
     untracked -- UNLESS all the entries under the directory are
     ignored, in which case we don't print any of the entries under the
     directory and just report the directory itself as ignored.  (Note
     that although this forces us to walk all untracked files underneath
     the directory as well, we strip them from the output, except for
     users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.)

   * For 'git clean', we may need to recurse into a directory that
     doesn't match any specified pathspecs, if it's possible that there
     is an entry underneath the directory that can match one of the
     pathspecs.  In such a case, we need to be careful to omit the
     directory itself from the list of paths (see commit 404ebceda01c
     ("dir: also check directories for matching pathspecs", 2019-09-17))

Part of the tension noted above is that the treatment of a directory can
change based on the files within it, and based on the various settings
in dir->flags.  Trying to keep this in mind while reading over the code,
it is easy to think in terms of "treat_directory() tells us what to do
with a directory, and read_directory_recursive() is the thing that
recurses".  Since we need to look into a directory to know how to treat
it, though, it is quite easy to decide to (also) recurse into the
directory from treat_directory() by adding a read_directory_recursive()
call.  Adding such a call is actually fine, IF we make sure that
read_directory_recursive() does not also recurse into that same
directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory.  So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.  While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have been
considered untracked.  The code happened to work due to the side-effect
from the first invocation of adding untracked entries to dir->entries;
this allowed it to get the correct output despite the supposed override
in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation.  I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that probably
deserved better comments to begin with).  However, much of my work felt
more like a game of whackamole while attempting to make the code match
the existing regression tests than an attempt to create an
implementation that matched some clear design.  That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient.  Such a hope seems likely to be ill-founded,
given my experience with dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)
  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.  For
the following simple shell loop in an empty repository:

  for depth in $(seq 10 25)
  do
    dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
    rm -rf dir
    mkdir -p $dirs
    >$dirs/untracked-file
    /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
  done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00.  In fact, it isn't until a depth of 190 nested directories
that it sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested directories.
The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case.  It's not often
that you get to speed something up by a factor of 3*10^69.

WARNING: This change breaks t7063.  I don't know whether that is to be expected
(I now intentionally visit untracked directories differently so naturally the
untracked cache should change), or if I've broken something.  I'm hoping to get
an untracked cache expert to chime in...

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 105 insertions(+), 46 deletions(-)

diff --git a/dir.c b/dir.c
index 6867356a31..9816fa31d9 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
+	enum path_treatment state;
+	int nested_repo = 0, old_ignored_nr, stop_early;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
@@ -1711,18 +1717,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/* This is the "show_other_directories" case */
 
-	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+	/*
+	 * We only need to recurse into untracked/ignored directories if
+	 * either of the following bits is set:
+	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
+	 *                           there are ignored directories below)
+	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
+	 *                                 the directory is empty)
+	 */
+	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
 		return excluded ? path_excluded : path_untracked;
 
+	/*
+	 * If we only want to determine if dirname is empty, then we can
+	 * stop at the first file we find underneath that directory rather
+	 * than continuing to recurse beyond it.  If DIR_SHOW_IGNORED_TOO
+	 * is set, then we want MORE than just determining if dirname is
+	 * empty.
+	 */
+	stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
+		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
+
+	/*
+	 * If /every/ file within an untracked directory is ignored, then
+	 * we want to treat the directory as ignored (for e.g. status
+	 * --porcelain), without listing the individual ignored files
+	 * underneath.  To do so, we'll save the current ignored_nr, and
+	 * pop all the ones added after it if it turns out the entire
+	 * directory is ignored.
+	 */
+	old_ignored_nr = dir->ignored_nr;
+
+	/* Actually recurse into dirname now, we'll fixup the state later. */
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
+	state = read_directory_recursive(dir, istate, dirname, len, untracked,
+					 stop_early, stop_early, pathspec);
+
+	/* There are a variety of reasons we may need to fixup the state... */
+	if (state == path_excluded) {
+		int i;
+
+		/*
+		 * When stop_early is set, read_directory_recursive() will
+		 * never return path_untracked regardless of whether
+		 * underlying paths were untracked or ignored (because
+		 * returning early means it excluded some paths, or
+		 * something like that -- see commit 5aaa7fd39aaf ("Improve
+		 * performance of git status --ignored", 2017-09-18)).
+		 * However, we're not really concerned with the status of
+		 * files under the directory, we just wanted to know
+		 * whether the directory was empty (state == path_none) or
+		 * not (state == path_excluded), and if not, we'd return
+		 * our original status based on whether the untracked
+		 * directory matched an exclusion pattern.
+		 */
+		if (stop_early)
+			state = excluded ? path_excluded : path_untracked;
+
+		else {
+			/*
+			 * When
+			 *     !stop_early && state == path_excluded
+			 * then all paths under dirname were ignored.  For
+			 * this case, git status --porcelain wants to just
+			 * list the directory itself as ignored and not
+			 * list the individual paths underneath.  Remove
+			 * the individual paths underneath.
+			 */
+			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
+				free(dir->ignored[i]);
+			dir->ignored_nr = old_ignored_nr;
+		}
+	}
 
 	/*
-	 * If this is an excluded directory, then we only need to check if
-	 * the directory contains any files.
+	 * If there is nothing under the current directory and we are not
+	 * hiding empty directories, then we need to report on the
+	 * untracked or ignored status of the directory itself.
 	 */
-	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, excluded, pathspec);
+	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+		state = excluded ? path_excluded : path_untracked;
+
+	/*
+	 * We can recurse into untracked directories that don't match any
+	 * of the given pathspecs when some file underneath the directory
+	 * might match one of the pathspecs.  If so, we should make sure
+	 * to note that the directory itself did not match.
+	 */
+	if (pathspec &&
+	    !match_pathspec(istate, pathspec, dirname, len,
+			    0 /* prefix */, NULL,
+			    0 /* do NOT special case dirs */))
+		state = path_none;
+
+	return state;
 }
 
 /*
@@ -1870,6 +1959,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 					   int baselen,
 					   const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
 	strbuf_setlen(path, baselen);
 	if (!cdir->ucd) {
 		strbuf_addstr(path, cdir->file);
@@ -2175,14 +2269,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in treat_leading_path().  See the commit message for the
-	 * commit adding this warning as well as the commit preceding it
-	 * for details.
+	 * WARNING: Do NOT recurse unless path_recurse is returned from
+	 *          treat_path().  Recursing on any other return value
+	 *          can result in exponential slowdown.
 	 */
-
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2204,13 +2294,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			dir_state = state;
 
 		/* recurse into subdir if instructed by treat_path */
-		if ((state == path_recurse) ||
-			((state == path_untracked) &&
-			 (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) &&
-			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  (pathspec &&
-			   do_match_pathspec(istate, pathspec, path.buf, path.len,
-					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
+		if (state == path_recurse) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -2294,15 +2378,6 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
-	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in read_directory_recursive().  See 777b420347 (dir:
-	 * synchronize treat_leading_path() and read_directory_recursive(),
-	 * 2019-12-19) and its parent commit for details.
-	 */
-
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf subdir = STRBUF_INIT;
 	int prevlen, baselen;
@@ -2353,23 +2428,7 @@ static int treat_leading_path(struct dir_struct *dir,
 		strbuf_reset(&subdir);
 		strbuf_add(&subdir, path+prevlen, baselen-prevlen);
 		cdir.d_name = subdir.buf;
-		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
-				    pathspec);
-		if (state == path_untracked &&
-		    resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
-		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
-				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
-			if (!match_pathspec(istate, pathspec, sb.buf, sb.len,
-					    0 /* prefix */, NULL,
-					    0 /* do NOT special case dirs */))
-				state = path_none;
-			add_path_to_appropriate_result_list(dir, NULL, &cdir,
-							    istate,
-							    &sb, baselen,
-							    pathspec, state);
-			state = path_recurse;
-		}
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
 
 		if (state != path_recurse)
 			break; /* do not recurse into it */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 6/6] t7063: blindly accept diffs
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-01-31 18:31   ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-01-31 18:31   ` Elijah Newren via GitGitGadget
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  6 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Assuming that the changes I made in the last commit to drastically
modify how and when and especially how frequently untracked paths are
visited should result in changes to the untracked-cache, this commit
simply updates the t7063 testcases to match what the code now reports.

If this is correct, this commit should be squashed into the previous
one.

It'd be nice if I could get an untracked-cache expert to comment on
this...

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 50 ++++++++++++-------------------
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 190ae149cf..c1b0fd0540 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -85,9 +85,7 @@ dtwo/
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 
 test_expect_success 'status first time (empty cache)' '
@@ -140,8 +138,6 @@ test_expect_success 'modify in root directory, one dir invalidation' '
 A  done/one
 A  one
 A  two
-?? dthree/
-?? dtwo/
 ?? four
 ?? three
 EOF
@@ -164,15 +160,11 @@ core.excludesfile 0000000000000000000000000000000000000000
 exclude_per_dir .gitignore
 flags 00000006
 / 0000000000000000000000000000000000000000 recurse valid
-dthree/
-dtwo/
 four
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -217,9 +209,7 @@ dtwo/
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -235,6 +225,7 @@ A  done/one
 A  one
 A  two
 ?? .gitignore
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -256,11 +247,11 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -277,7 +268,6 @@ flags 00000006
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -290,7 +280,6 @@ test_expect_success 'status after the move' '
 A  done/one
 A  one
 ?? .gitignore
-?? dtwo/
 ?? two
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -312,12 +301,10 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 two
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -334,7 +321,6 @@ flags 00000006
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -348,7 +334,6 @@ A  done/one
 A  one
 A  two
 ?? .gitignore
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
@@ -369,11 +354,9 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -392,7 +375,6 @@ test_expect_success 'status after commit' '
 	git status --porcelain >../actual &&
 	cat >../status.expect <<EOF &&
 ?? .gitignore
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
@@ -413,11 +395,9 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -451,7 +431,6 @@ test_expect_success 'test sparse status with untracked cache' '
  M done/two
 ?? .gitignore
 ?? done/five
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
@@ -472,12 +451,10 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -491,7 +468,6 @@ test_expect_success 'test sparse status again with untracked cache' '
  M done/two
 ?? .gitignore
 ?? done/five
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
@@ -519,7 +495,6 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 ?? .gitignore
 ?? done/five
 ?? done/sub/
-?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
@@ -540,17 +515,13 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
-dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
 sub/
 /done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
-sub/
 /done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
-file
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
 /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect-from-test-dump ../actual
 '
@@ -615,6 +586,23 @@ test_expect_success 'setting core.untrackedCache to true and using git status cr
 	test_cmp ../expect-no-uc ../actual &&
 	git status &&
 	test-tool dump-untracked-cache >../actual &&
+	cat >../expect-from-test-dump <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
+.gitignore
+dthree/
+dtwo/
+/done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
+five
+sub/
+/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
+/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+EOF
 	test_cmp ../expect-from-test-dump ../actual
 '
 
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-01-31 18:31   ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
@ 2020-03-25 19:31   ` Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
                       ` (7 more replies)
  6 siblings, 8 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren

Sorry for the really long delay on this series; I think I finally figured
out the untracked cache stuff, though it comes with bad news: existing users
of untracked cache are dealing with a buggy implementation that may be
causing git commands that list untracked files to omit expected paths from
the output. This series fixes it, though with a big hammer (partial
disabling of the cache; see the final commit in the series).

Also, as before, this series provides some "modest" speedups (see last
commit message), and should allow 'git status --ignored' to complete in a
more reasonable timeframe for Martin Melka (see 
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
).

Changes since v2:

 * Added a patch at the beginning which highlights how the untracked cache
   has been broken from the beginning. Using it will result in other
   commands giving erroneous output. At least, before this series it did.
 * Added another patch at the beginning of the series to fix a simple
   comment typo.
 * Dropped the final patch of the previous series, and instead squashed in a
   fix for the untracked cache problems to what is now the final patch of
   the series. I would have liked to have made that a separate commit
   earlier in the series, but the fix depended on both disabling the
   check_only portion of the cache and the avoid-exponential-visiting. If I
   move the partial disabling earlier, nothing is fixed and stuff is still
   visited. If I move the partial disabling later, then I have to
   temporarily mark lots of extra tests with test_expect_failure. It's only
   three extra one-line changes to dir.c, which you can probably spot in the
   range-diff.

Stuff still missing from v3:

 * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in 
   https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/ 
   which I think would make the code cleaner & clearer. I guess I'm leaving
   that for future work.

As per the commit message of the final patch, this series has some risk.
Extra eyes would be greatly appreciated. Also, we should probably merge it
early in some cycle, either this one or a later one.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (6):
  t7063: correct broken test expectation
  dir: fix simple typo in comment
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one, fix untracked
    cache

 dir.c                             | 339 +++++++++++++++++-------------
 t/t7063-status-untracked-cache.sh |  79 +++----
 2 files changed, 220 insertions(+), 198 deletions(-)


base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v3
Pull-Request: https://github.com/git/git/pull/700

Range-diff vs v2:

 6:  9a3f20656e2 ! 1:  d4fe5d33577 t7063: blindly accept diffs
     @@ -1,17 +1,61 @@
      Author: Elijah Newren <newren@gmail.com>
      
     -    t7063: blindly accept diffs
     +    t7063: correct broken test expectation
      
     -    Assuming that the changes I made in the last commit to drastically
     -    modify how and when and especially how frequently untracked paths are
     -    visited should result in changes to the untracked-cache, this commit
     -    simply updates the t7063 testcases to match what the code now reports.
     +    The untracked cache is caching wrong information, resulting in commands
     +    like `git status --porcelain` producing erroneous answers.  The tests in
     +    t7063 actually have a wide enough test to catch a relevant case, in
     +    particular surrounding the directory 'dthree/', but it appears the
     +    answers were not checked quite closely enough and the tests were coded
     +    with the wrong expectation.  Once the wrong info got into the cache in
     +    an early test, since later tests built on it, many others have a wrong
     +    expectation as well.  This affects just over a third of the tests in
     +    t7063.
      
     -    If this is correct, this commit should be squashed into the previous
     -    one.
     +    The error can be seen starting at t7063.12 (the first one switched from
     +    expect_success to expect_failure in this patch).  That test runs in a
     +    directory with the following files present:
     +      done/one
     +      dthree/three
     +      dtwo/two
     +      four
     +      .gitignore
     +      one
     +      three
     +      two
      
     -    It'd be nice if I could get an untracked-cache expert to comment on
     -    this...
     +    Of those files, the following files are tracked:
     +      done/one
     +      one
     +      two
     +
     +    and the contents of .gitignore are:
     +      four
     +
     +    and the contents of .git/info/exclude are:
     +      three
     +
     +    And there is no core.excludesfile.  Therefore, the following should be
     +    untracked:
     +      .gitignore
     +      dthree/
     +      dtwo/
     +    Indeed, these three paths are reported if you run
     +      git ls-files -o --directory --exclude-standard
     +    within this directory.  However, 'git status --porcelain' was reporting
     +    for this test:
     +      A  done/one
     +      A  one
     +      A  two
     +      ?? .gitignore
     +      ?? dtwo/
     +    which was clearly wrong -- dthree/ should also be listed as untracked.
     +    This appears to have been broken since the test was introduced with
     +    commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
     +    Correct the test to expect the right output, marking the test as failed
     +    for now.  Make the same change throughout the remainder of the testsuite
     +    to reflect that dthree/ remains an untracked directory throughout and
     +    should be recognized as such.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ -19,50 +63,14 @@
       --- a/t/t7063-status-untracked-cache.sh
       +++ b/t/t7063-status-untracked-cache.sh
      @@
     - three
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --three
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 
     - test_expect_success 'status first time (empty cache)' '
     -@@
     - A  done/one
     - A  one
     - A  two
     --?? dthree/
     --?? dtwo/
     - ?? four
     - ?? three
     - EOF
     -@@
     - exclude_per_dir .gitignore
     - flags 00000006
     - / 0000000000000000000000000000000000000000 recurse valid
     --dthree/
     --dtwo/
     - four
     - three
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --three
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     -@@
     - three
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --three
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
       	test_cmp ../expect ../actual
       '
     + 
     +-test_expect_success 'new info/exclude invalidates everything' '
     ++test_expect_failure 'new info/exclude invalidates everything' '
     + 	avoid_racy &&
     + 	echo three >>.git/info/exclude &&
     + 	: >../trace &&
      @@
       A  one
       A  two
     @@ -71,6 +79,15 @@
       ?? dtwo/
       EOF
       	test_cmp ../status.expect ../actual &&
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'verify untracked cache dump' '
     ++test_expect_failure 'verify untracked cache dump' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
      @@
       flags 00000006
       / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     @@ -79,164 +96,246 @@
       dtwo/
       /done/ 0000000000000000000000000000000000000000 recurse valid
       /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
      @@
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
       	test_cmp ../expect ../actual
       '
     + 
     +-test_expect_success 'status after the move' '
     ++test_expect_failure 'status after the move' '
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
      @@
       A  done/one
       A  one
       ?? .gitignore
     --?? dtwo/
     ++?? dthree/
     + ?? dtwo/
       ?? two
       EOF
     - 	test_cmp ../status.expect ../actual &&
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'verify untracked cache dump' '
     ++test_expect_failure 'verify untracked cache dump' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
      @@
       flags 00000006
       / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
       .gitignore
     --dtwo/
     ++dthree/
     + dtwo/
       two
       /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
      @@
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
       	test_cmp ../expect ../actual
       '
     + 
     +-test_expect_success 'status after the move' '
     ++test_expect_failure 'status after the move' '
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
      @@
       A  one
       A  two
       ?? .gitignore
     --?? dtwo/
     ++?? dthree/
     + ?? dtwo/
       EOF
       	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'verify untracked cache dump' '
     ++test_expect_failure 'verify untracked cache dump' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
      @@
       flags 00000006
       / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
       .gitignore
     --dtwo/
     ++dthree/
     + dtwo/
       /done/ 0000000000000000000000000000000000000000 recurse valid
       /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
      @@
     + 	git commit -m "first commit"
     + '
     + 
     +-test_expect_success 'status after commit' '
     ++test_expect_failure 'status after commit' '
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
       	git status --porcelain >../actual &&
       	cat >../status.expect <<EOF &&
       ?? .gitignore
     --?? dtwo/
     ++?? dthree/
     + ?? dtwo/
       EOF
       	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'untracked cache correct after commit' '
     ++test_expect_failure 'untracked cache correct after commit' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
      @@
       flags 00000006
       / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
       .gitignore
     --dtwo/
     ++dthree/
     + dtwo/
       /done/ 0000000000000000000000000000000000000000 recurse valid
       /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     +@@
     + 	sync_mtime
       '
     + 
     +-test_expect_success 'test sparse status with untracked cache' '
     ++test_expect_failure 'test sparse status with untracked cache' '
     + 	: >../trace &&
     + 	avoid_racy &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
      @@
        M done/two
       ?? .gitignore
       ?? done/five
     --?? dtwo/
     ++?? dthree/
     + ?? dtwo/
       EOF
       	test_cmp ../status.expect ../status.actual &&
     - 	cat >../trace.expect <<EOF &&
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'untracked cache correct after status' '
     ++test_expect_failure 'untracked cache correct after status' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
      @@
       flags 00000006
       / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
       .gitignore
     --dtwo/
     ++dthree/
     + dtwo/
       /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
       five
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     +@@
       	test_cmp ../expect ../actual
       '
     + 
     +-test_expect_success 'test sparse status again with untracked cache' '
     ++test_expect_failure 'test sparse status again with untracked cache' '
     + 	avoid_racy &&
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
      @@
        M done/two
       ?? .gitignore
       ?? done/five
     --?? dtwo/
     ++?? dthree/
     + ?? dtwo/
       EOF
       	test_cmp ../status.expect ../status.actual &&
     - 	cat >../trace.expect <<EOF &&
     +@@
     + 	echo "sub" > done/sub/sub/file
     + '
     + 
     +-test_expect_success 'test sparse status with untracked cache and subdir' '
     ++test_expect_failure 'test sparse status with untracked cache and subdir' '
     + 	avoid_racy &&
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
      @@
       ?? .gitignore
       ?? done/five
       ?? done/sub/
     --?? dtwo/
     ++?? dthree/
     + ?? dtwo/
       EOF
       	test_cmp ../status.expect ../status.actual &&
     - 	cat >../trace.expect <<EOF &&
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
     ++test_expect_failure 'verify untracked cache dump (sparse/subdirs)' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect-from-test-dump <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
      @@
       flags 00000006
       / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
       .gitignore
     --dtwo/
     ++dthree/
     + dtwo/
       /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
       five
     - sub/
     - /done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     --sub/
     - /done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     --file
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     +@@
       	test_cmp ../expect-from-test-dump ../actual
       '
     + 
     +-test_expect_success 'test sparse status again with untracked cache and subdir' '
     ++test_expect_failure 'test sparse status again with untracked cache and subdir' '
     + 	avoid_racy &&
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
      @@
     - 	test_cmp ../expect-no-uc ../actual &&
     - 	git status &&
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_success 'move entry in subdir from untracked to cached' '
     ++test_expect_failure 'move entry in subdir from untracked to cached' '
     + 	git add dtwo/two &&
     + 	git status --porcelain >../status.actual &&
     + 	cat >../status.expect <<EOF &&
     +@@
     + ?? .gitignore
     + ?? done/five
     + ?? done/sub/
     ++?? dthree/
     + EOF
     + 	test_cmp ../status.expect ../status.actual
     + '
     + 
     +-test_expect_success 'move entry in subdir from cached to untracked' '
     ++test_expect_failure 'move entry in subdir from cached to untracked' '
     + 	git rm --cached dtwo/two &&
     + 	git status --porcelain >../status.actual &&
     + 	cat >../status.expect <<EOF &&
     +@@
     + ?? .gitignore
     + ?? done/five
     + ?? done/sub/
     ++?? dthree/
     + ?? dtwo/
     + EOF
     + 	test_cmp ../status.expect ../status.actual
     +@@
     + 	test_cmp ../expect-no-uc ../actual
     + '
     + 
     +-test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' '
     ++test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' '
     + 	git config core.untrackedCache true &&
       	test-tool dump-untracked-cache >../actual &&
     -+	cat >../expect-from-test-dump <<EOF &&
     -+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -+core.excludesfile 0000000000000000000000000000000000000000
     -+exclude_per_dir .gitignore
     -+flags 00000006
     -+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     -+.gitignore
     -+dthree/
     -+dtwo/
     -+/done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     -+five
     -+sub/
     -+/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     -+/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     -+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     -+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     -+EOF
     - 	test_cmp ../expect-from-test-dump ../actual
     + 	test_cmp ../expect-no-uc ../actual &&
     +@@
     + 	test_cmp ../expect-empty ../actual
       '
       
     +-test_expect_success 'setting core.untrackedCache to keep' '
     ++test_expect_failure 'setting core.untrackedCache to keep' '
     + 	git config core.untrackedCache keep &&
     + 	git update-index --untracked-cache &&
     + 	test-tool dump-untracked-cache >../actual &&
 -:  ----------- > 2:  b20bc3b9afd dir: fix simple typo in comment
 1:  27bc1357964 = 3:  fa9035949e0 dir: consolidate treat_path() and treat_one_path()
 2:  2ceb64ae61e = 4:  02e652d1869 dir: fix broken comment
 3:  e6d21228d12 = 5:  705c008d993 dir: fix confusion based on variable tense
 4:  f73f0d66d14 = 6:  f5d69102946 dir: refactor treat_directory to clarify control flow
 5:  d3136ef52f3 ! 7:  6cfca619e2c dir: replace exponential algorithm with a linear one
     @@ -1,6 +1,6 @@
      Author: Elijah Newren <newren@gmail.com>
      
     -    dir: replace exponential algorithm with a linear one
     +    dir: replace exponential algorithm with a linear one, fix untracked cache
      
          dir's read_directory_recursive() naturally operates recursively in order
          to walk the directory tree.  Treating of directories is sometimes weird
     @@ -161,10 +161,27 @@
          to have completed the 240 nested directories case.  It's not often
          that you get to speed something up by a factor of 3*10^69.
      
     -    WARNING: This change breaks t7063.  I don't know whether that is to be expected
     -    (I now intentionally visit untracked directories differently so naturally the
     -    untracked cache should change), or if I've broken something.  I'm hoping to get
     -    an untracked cache expert to chime in...
     +    Finally, this also fixes the untracked cache, as noted by the test fixes
     +    in t7063.  Unfortunately, it does so by passing stop_at_first_file to
     +    close_cached_dir() in order to disable the caching of whether
     +    directories were empty (this caching was only relevant for directories
     +    that we knew we didn't need to walk all the entries under but just
     +    needed to know whether the directory had any entries within it in order
     +    to know if the directory itself should be marked as path_none or
     +    path_untracked).  I'm not convinced that disabling the is-the-dir-empty
     +    check is necessary; there is probably some way to still cache that and
     +    not get erroneous results.  However, I have not figured out how to do
     +    so.  If I revert the change to close_cached_dir() in this patch (thus
     +    continuing to cache cases where stop_at_first_file is true meaning we
     +    continue to cache whether directories are empty), then the untracked
     +    cache breakage in t7063 becomes more prevalant.  With my change to
     +    close_cached_dir() and the other changes to avoid traversing directories
     +    2^n times in this patch, I not only avoid making the untracked_cache
     +    breakage in t7063 worse but actually fix the existing breakage.  Update
     +    the test results in t7063 to no longer expect check_only cache entries,
     +    to reflect that we have to do a bit more work in terms of how many
     +    directories we have to open, and to reflect that we fixed the 1/3 of
     +    tests that were broken in that testsuite.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ -305,6 +322,24 @@
       	strbuf_setlen(path, baselen);
       	if (!cdir->ucd) {
       		strbuf_addstr(path, cdir->file);
     +@@
     + 	return -1;
     + }
     + 
     +-static void close_cached_dir(struct cached_dir *cdir)
     ++static void close_cached_dir(struct cached_dir *cdir, int stop_at_first_file)
     + {
     + 	if (cdir->fdir)
     + 		closedir(cdir->fdir);
     +@@
     + 	 * We have gone through this directory and found no untracked
     + 	 * entries. Mark it valid.
     + 	 */
     +-	if (cdir->untracked) {
     ++	if (!stop_at_first_file && cdir->untracked) {
     + 		cdir->untracked->valid = 1;
     + 		cdir->untracked->recurse = 1;
     + 	}
      @@
       	int stop_at_first_file, const struct pathspec *pathspec)
       {
     @@ -338,6 +373,15 @@
       			struct untracked_cache_dir *ud;
       			ud = lookup_untracked(dir->untracked, untracked,
       					      path.buf + baselen,
     +@@
     + 						    istate, &path, baselen,
     + 						    pathspec, state);
     + 	}
     +-	close_cached_dir(&cdir);
     ++	close_cached_dir(&cdir, stop_at_first_file);
     +  out:
     + 	strbuf_release(&path);
     + 
      @@
       			      const char *path, int len,
       			      const struct pathspec *pathspec)
     @@ -379,3 +423,342 @@
       
       		if (state != path_recurse)
       			break; /* do not recurse into it */
     +
     + diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
     + --- a/t/t7063-status-untracked-cache.sh
     + +++ b/t/t7063-status-untracked-cache.sh
     +@@
     + dtwo/
     + three
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-three
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 
     + test_expect_success 'status first time (empty cache)' '
     +@@
     + EOF
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 0
     + directory invalidation: 1
     +-opendir: 1
     ++opendir: 3
     + EOF
     + 	test_cmp ../trace.expect ../trace
     + 
     +@@
     + four
     + three
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-three
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     +@@
     + EOF
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 1
     + directory invalidation: 1
     + opendir: 4
     +@@
     + dtwo/
     + three
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-three
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     + 
     +-test_expect_failure 'new info/exclude invalidates everything' '
     ++test_expect_success 'new info/exclude invalidates everything' '
     + 	avoid_racy &&
     + 	echo three >>.git/info/exclude &&
     + 	: >../trace &&
     +@@
     + EOF
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 1
     + directory invalidation: 0
     + opendir: 4
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'verify untracked cache dump' '
     ++test_expect_success 'verify untracked cache dump' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     +@@
     + dthree/
     + dtwo/
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     +@@
     + flags 00000006
     + / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     + 
     +-test_expect_failure 'status after the move' '
     ++test_expect_success 'status after the move' '
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     +@@
     + EOF
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 0
     + directory invalidation: 0
     +-opendir: 1
     ++opendir: 3
     + EOF
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'verify untracked cache dump' '
     ++test_expect_success 'verify untracked cache dump' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     +@@
     + dtwo/
     + two
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     +@@
     + flags 00000006
     + / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     + 
     +-test_expect_failure 'status after the move' '
     ++test_expect_success 'status after the move' '
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     +@@
     + EOF
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 0
     + directory invalidation: 0
     +-opendir: 1
     ++opendir: 3
     + EOF
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'verify untracked cache dump' '
     ++test_expect_success 'verify untracked cache dump' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     +@@
     + dthree/
     + dtwo/
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     +@@
     + 	git commit -m "first commit"
     + '
     + 
     +-test_expect_failure 'status after commit' '
     ++test_expect_success 'status after commit' '
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     +@@
     + EOF
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 0
     + directory invalidation: 0
     +-opendir: 2
     ++opendir: 4
     + EOF
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'untracked cache correct after commit' '
     ++test_expect_success 'untracked cache correct after commit' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     +@@
     + dthree/
     + dtwo/
     + /done/ 0000000000000000000000000000000000000000 recurse valid
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     +@@
     + 	sync_mtime
     + '
     + 
     +-test_expect_failure 'test sparse status with untracked cache' '
     ++test_expect_success 'test sparse status with untracked cache' '
     + 	: >../trace &&
     + 	avoid_racy &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     +@@
     + EOF
     + 	test_cmp ../status.expect ../status.actual &&
     + 	cat >../trace.expect <<EOF &&
     +-node creation: 0
     ++node creation: 2
     + gitignore invalidation: 1
     + directory invalidation: 2
     +-opendir: 2
     ++opendir: 4
     + EOF
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'untracked cache correct after status' '
     ++test_expect_success 'untracked cache correct after status' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     +@@
     + dtwo/
     + /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     + five
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect ../actual
     + '
     + 
     +-test_expect_failure 'test sparse status again with untracked cache' '
     ++test_expect_success 'test sparse status again with untracked cache' '
     + 	avoid_racy &&
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     +@@
     + 	echo "sub" > done/sub/sub/file
     + '
     + 
     +-test_expect_failure 'test sparse status with untracked cache and subdir' '
     ++test_expect_success 'test sparse status with untracked cache and subdir' '
     + 	avoid_racy &&
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'verify untracked cache dump (sparse/subdirs)' '
     ++test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
     + 	test-tool dump-untracked-cache >../actual &&
     + 	cat >../expect-from-test-dump <<EOF &&
     + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     +@@
     + /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     + five
     + sub/
     +-/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-sub/
     +-/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-file
     +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     +-two
     + EOF
     + 	test_cmp ../expect-from-test-dump ../actual
     + '
     + 
     +-test_expect_failure 'test sparse status again with untracked cache and subdir' '
     ++test_expect_success 'test sparse status again with untracked cache and subdir' '
     + 	avoid_racy &&
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     +@@
     + 	test_cmp ../trace.expect ../trace
     + '
     + 
     +-test_expect_failure 'move entry in subdir from untracked to cached' '
     ++test_expect_success 'move entry in subdir from untracked to cached' '
     + 	git add dtwo/two &&
     + 	git status --porcelain >../status.actual &&
     + 	cat >../status.expect <<EOF &&
     +@@
     + 	test_cmp ../status.expect ../status.actual
     + '
     + 
     +-test_expect_failure 'move entry in subdir from cached to untracked' '
     ++test_expect_success 'move entry in subdir from cached to untracked' '
     + 	git rm --cached dtwo/two &&
     + 	git status --porcelain >../status.actual &&
     + 	cat >../status.expect <<EOF &&
     +@@
     + 	test_cmp ../expect-no-uc ../actual
     + '
     + 
     +-test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' '
     ++test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' '
     + 	git config core.untrackedCache true &&
     + 	test-tool dump-untracked-cache >../actual &&
     + 	test_cmp ../expect-no-uc ../actual &&
     +@@
     + 	test_cmp ../expect-empty ../actual
     + '
     + 
     +-test_expect_failure 'setting core.untrackedCache to keep' '
     ++test_expect_success 'setting core.untrackedCache to keep' '
     + 	git config core.untrackedCache keep &&
     + 	git update-index --untracked-cache &&
     + 	test-tool dump-untracked-cache >../actual &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 1/7] t7063: correct broken test expectation
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
@ 2020-03-25 19:31     ` Elijah Newren via GitGitGadget
  2020-03-26 13:02       ` Derrick Stolee
  2020-03-25 19:31     ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The untracked cache is caching wrong information, resulting in commands
like `git status --porcelain` producing erroneous answers.  The tests in
t7063 actually have a wide enough test to catch a relevant case, in
particular surrounding the directory 'dthree/', but it appears the
answers were not checked quite closely enough and the tests were coded
with the wrong expectation.  Once the wrong info got into the cache in
an early test, since later tests built on it, many others have a wrong
expectation as well.  This affects just over a third of the tests in
t7063.

The error can be seen starting at t7063.12 (the first one switched from
expect_success to expect_failure in this patch).  That test runs in a
directory with the following files present:
  done/one
  dthree/three
  dtwo/two
  four
  .gitignore
  one
  three
  two

Of those files, the following files are tracked:
  done/one
  one
  two

and the contents of .gitignore are:
  four

and the contents of .git/info/exclude are:
  three

And there is no core.excludesfile.  Therefore, the following should be
untracked:
  .gitignore
  dthree/
  dtwo/
Indeed, these three paths are reported if you run
  git ls-files -o --directory --exclude-standard
within this directory.  However, 'git status --porcelain' was reporting
for this test:
  A  done/one
  A  one
  A  two
  ?? .gitignore
  ?? dtwo/
which was clearly wrong -- dthree/ should also be listed as untracked.
This appears to have been broken since the test was introduced with
commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
Correct the test to expect the right output, marking the test as failed
for now.  Make the same change throughout the remainder of the testsuite
to reflect that dthree/ remains an untracked directory throughout and
should be recognized as such.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 51 ++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 190ae149cf3..41705ec1526 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -224,7 +224,7 @@ EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_success 'new info/exclude invalidates everything' '
+test_expect_failure 'new info/exclude invalidates everything' '
 	avoid_racy &&
 	echo three >>.git/info/exclude &&
 	: >../trace &&
@@ -235,6 +235,7 @@ A  done/one
 A  one
 A  two
 ?? .gitignore
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -247,7 +248,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'verify untracked cache dump' '
+test_expect_failure 'verify untracked cache dump' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -256,6 +257,7 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
@@ -282,7 +284,7 @@ EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_success 'status after the move' '
+test_expect_failure 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
@@ -290,6 +292,7 @@ test_expect_success 'status after the move' '
 A  done/one
 A  one
 ?? .gitignore
+?? dthree/
 ?? dtwo/
 ?? two
 EOF
@@ -303,7 +306,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'verify untracked cache dump' '
+test_expect_failure 'verify untracked cache dump' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -312,6 +315,7 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 two
 /done/ 0000000000000000000000000000000000000000 recurse valid
@@ -339,7 +343,7 @@ EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_success 'status after the move' '
+test_expect_failure 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
@@ -348,6 +352,7 @@ A  done/one
 A  one
 A  two
 ?? .gitignore
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -360,7 +365,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'verify untracked cache dump' '
+test_expect_failure 'verify untracked cache dump' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -369,6 +374,7 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
@@ -386,12 +392,13 @@ test_expect_success 'set up for sparse checkout testing' '
 	git commit -m "first commit"
 '
 
-test_expect_success 'status after commit' '
+test_expect_failure 'status after commit' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
 	cat >../status.expect <<EOF &&
 ?? .gitignore
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../actual &&
@@ -404,7 +411,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'untracked cache correct after commit' '
+test_expect_failure 'untracked cache correct after commit' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -413,6 +420,7 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
 /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
@@ -442,7 +450,7 @@ test_expect_success 'create/modify files, some of which are gitignored' '
 	sync_mtime
 '
 
-test_expect_success 'test sparse status with untracked cache' '
+test_expect_failure 'test sparse status with untracked cache' '
 	: >../trace &&
 	avoid_racy &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -451,6 +459,7 @@ test_expect_success 'test sparse status with untracked cache' '
  M done/two
 ?? .gitignore
 ?? done/five
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
@@ -463,7 +472,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'untracked cache correct after status' '
+test_expect_failure 'untracked cache correct after status' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -472,6 +481,7 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
@@ -482,7 +492,7 @@ EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_success 'test sparse status again with untracked cache' '
+test_expect_failure 'test sparse status again with untracked cache' '
 	avoid_racy &&
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -491,6 +501,7 @@ test_expect_success 'test sparse status again with untracked cache' '
  M done/two
 ?? .gitignore
 ?? done/five
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
@@ -509,7 +520,7 @@ test_expect_success 'set up for test of subdir and sparse checkouts' '
 	echo "sub" > done/sub/sub/file
 '
 
-test_expect_success 'test sparse status with untracked cache and subdir' '
+test_expect_failure 'test sparse status with untracked cache and subdir' '
 	avoid_racy &&
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -519,6 +530,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 ?? .gitignore
 ?? done/five
 ?? done/sub/
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual &&
@@ -531,7 +543,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
+test_expect_failure 'verify untracked cache dump (sparse/subdirs)' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect-from-test-dump <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -540,6 +552,7 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
 .gitignore
+dthree/
 dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
@@ -555,7 +568,7 @@ EOF
 	test_cmp ../expect-from-test-dump ../actual
 '
 
-test_expect_success 'test sparse status again with untracked cache and subdir' '
+test_expect_failure 'test sparse status again with untracked cache and subdir' '
 	avoid_racy &&
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -570,7 +583,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_success 'move entry in subdir from untracked to cached' '
+test_expect_failure 'move entry in subdir from untracked to cached' '
 	git add dtwo/two &&
 	git status --porcelain >../status.actual &&
 	cat >../status.expect <<EOF &&
@@ -579,11 +592,12 @@ A  dtwo/two
 ?? .gitignore
 ?? done/five
 ?? done/sub/
+?? dthree/
 EOF
 	test_cmp ../status.expect ../status.actual
 '
 
-test_expect_success 'move entry in subdir from cached to untracked' '
+test_expect_failure 'move entry in subdir from cached to untracked' '
 	git rm --cached dtwo/two &&
 	git status --porcelain >../status.actual &&
 	cat >../status.expect <<EOF &&
@@ -591,6 +605,7 @@ test_expect_success 'move entry in subdir from cached to untracked' '
 ?? .gitignore
 ?? done/five
 ?? done/sub/
+?? dthree/
 ?? dtwo/
 EOF
 	test_cmp ../status.expect ../status.actual
@@ -609,7 +624,7 @@ test_expect_success 'git status does not change anything' '
 	test_cmp ../expect-no-uc ../actual
 '
 
-test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' '
+test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' '
 	git config core.untrackedCache true &&
 	test-tool dump-untracked-cache >../actual &&
 	test_cmp ../expect-no-uc ../actual &&
@@ -642,7 +657,7 @@ test_expect_success 'using --untracked-cache does not fail when core.untrackedCa
 	test_cmp ../expect-empty ../actual
 '
 
-test_expect_success 'setting core.untrackedCache to keep' '
+test_expect_failure 'setting core.untrackedCache to keep' '
 	git config core.untrackedCache keep &&
 	git update-index --untracked-cache &&
 	test-tool dump-untracked-cache >../actual &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 2/7] dir: fix simple typo in comment
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
@ 2020-03-25 19:31     ` Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index b460211e614..b505ba747bb 100644
--- a/dir.c
+++ b/dir.c
@@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir,
  * If 'stop_at_first_file' is specified, 'path_excluded' is returned
  * to signal that a file was found. This is the least significant value that
  * indicates that a file was encountered that does not depend on the order of
- * whether an untracked or exluded path was encountered first.
+ * whether an untracked or excluded path was encountered first.
  *
  * Returns the most significant path_treatment value encountered in the scan.
  * If 'stop_at_first_file' is specified, `path_excluded` is the most
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path()
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
@ 2020-03-25 19:31     ` Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 16e2cfa90993 ("read_directory(): further split treat_path()",
2010-01-08) split treat_one_path() out of treat_path(), because
treat_leading_path() would not have access to a dirent but wanted to
re-use as much of treat_path() as possible.  Not re-using all of
treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir:
fix checks on common prefix directory", 2019-12-19).  Finally, in commit
ad6f2157f951 ("dir: restructure in a way to avoid passing around a
struct dirent", 2020-01-16), dirents were removed from treat_path() and
other functions entirely.

Since the only reason for splitting these functions was the lack of a
dirent -- which no longer applies to either function -- and since the
split caused problems in the past resulting in us not using
treat_one_path() separately anymore, just undo the split.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 121 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 55 insertions(+), 66 deletions(-)

diff --git a/dir.c b/dir.c
index b505ba747bb..d0f3d660850 100644
--- a/dir.c
+++ b/dir.c
@@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate,
 	return dtype;
 }
 
-static enum path_treatment treat_one_path(struct dir_struct *dir,
-					  struct untracked_cache_dir *untracked,
-					  struct index_state *istate,
-					  struct strbuf *path,
-					  int baselen,
-					  const struct pathspec *pathspec,
-					  int dtype)
-{
-	int exclude;
-	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
+static enum path_treatment treat_path_fast(struct dir_struct *dir,
+					   struct untracked_cache_dir *untracked,
+					   struct cached_dir *cdir,
+					   struct index_state *istate,
+					   struct strbuf *path,
+					   int baselen,
+					   const struct pathspec *pathspec)
+{
+	strbuf_setlen(path, baselen);
+	if (!cdir->ucd) {
+		strbuf_addstr(path, cdir->file);
+		return path_untracked;
+	}
+	strbuf_addstr(path, cdir->ucd->name);
+	/* treat_one_path() does this before it calls treat_directory() */
+	strbuf_complete(path, '/');
+	if (cdir->ucd->check_only)
+		/*
+		 * check_only is set as a result of treat_directory() getting
+		 * to its bottom. Verify again the same set of directories
+		 * with check_only set.
+		 */
+		return read_directory_recursive(dir, istate, path->buf, path->len,
+						cdir->ucd, 1, 0, pathspec);
+	/*
+	 * We get path_recurse in the first run when
+	 * directory_exists_in_index() returns index_nonexistent. We
+	 * are sure that new changes in the index does not impact the
+	 * outcome. Return now.
+	 */
+	return path_recurse;
+}
+
+static enum path_treatment treat_path(struct dir_struct *dir,
+				      struct untracked_cache_dir *untracked,
+				      struct cached_dir *cdir,
+				      struct index_state *istate,
+				      struct strbuf *path,
+				      int baselen,
+				      const struct pathspec *pathspec)
+{
+	int has_path_in_index, dtype, exclude;
 	enum path_treatment path_treatment;
 
-	dtype = resolve_dtype(dtype, istate, path->buf, path->len);
+	if (!cdir->d_name)
+		return treat_path_fast(dir, untracked, cdir, istate, path,
+				       baselen, pathspec);
+	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
+		return path_none;
+	strbuf_setlen(path, baselen);
+	strbuf_addstr(path, cdir->d_name);
+	if (simplify_away(path->buf, path->len, pathspec))
+		return path_none;
+
+	dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len);
 
 	/* Always exclude indexed files */
+	has_path_in_index = !!index_file_exists(istate, path->buf, path->len,
+						ignore_case);
 	if (dtype != DT_DIR && has_path_in_index)
 		return path_none;
 
@@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	}
 }
 
-static enum path_treatment treat_path_fast(struct dir_struct *dir,
-					   struct untracked_cache_dir *untracked,
-					   struct cached_dir *cdir,
-					   struct index_state *istate,
-					   struct strbuf *path,
-					   int baselen,
-					   const struct pathspec *pathspec)
-{
-	strbuf_setlen(path, baselen);
-	if (!cdir->ucd) {
-		strbuf_addstr(path, cdir->file);
-		return path_untracked;
-	}
-	strbuf_addstr(path, cdir->ucd->name);
-	/* treat_one_path() does this before it calls treat_directory() */
-	strbuf_complete(path, '/');
-	if (cdir->ucd->check_only)
-		/*
-		 * check_only is set as a result of treat_directory() getting
-		 * to its bottom. Verify again the same set of directories
-		 * with check_only set.
-		 */
-		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, 0, pathspec);
-	/*
-	 * We get path_recurse in the first run when
-	 * directory_exists_in_index() returns index_nonexistent. We
-	 * are sure that new changes in the index does not impact the
-	 * outcome. Return now.
-	 */
-	return path_recurse;
-}
-
-static enum path_treatment treat_path(struct dir_struct *dir,
-				      struct untracked_cache_dir *untracked,
-				      struct cached_dir *cdir,
-				      struct index_state *istate,
-				      struct strbuf *path,
-				      int baselen,
-				      const struct pathspec *pathspec)
-{
-	if (!cdir->d_name)
-		return treat_path_fast(dir, untracked, cdir, istate, path,
-				       baselen, pathspec);
-	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
-		return path_none;
-	strbuf_setlen(path, baselen);
-	strbuf_addstr(path, cdir->d_name);
-	if (simplify_away(path->buf, path->len, pathspec))
-		return path_none;
-
-	return treat_one_path(dir, untracked, istate, path, baselen, pathspec,
-			      cdir->d_type);
-}
-
 static void add_untracked(struct untracked_cache_dir *dir, const char *name)
 {
 	if (!dir)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 4/7] dir: fix broken comment
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-03-25 19:31     ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
@ 2020-03-25 19:31     ` Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d0f3d660850..3a367683661 100644
--- a/dir.c
+++ b/dir.c
@@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 					add_untracked(untracked, path.buf + baselen);
 				break;
 			}
-			/* skip the dir_add_* part */
+			/* skip the add_path_to_appropriate_result_list() */
 			continue;
 		}
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 5/7] dir: fix confusion based on variable tense
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2020-03-25 19:31     ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
@ 2020-03-25 19:31     ` Elijah Newren via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Despite having contributed several fixes in this area, I have for months
(years?) assumed that the "exclude" variable was a directive; this
caused me to think of it as a different mode we operate in and left me
confused as I tried to build up a mental model around why we'd need such
a directive.  I mostly tried to ignore it while focusing on the pieces I
was trying to understand.

Then I finally traced this variable all back to a call to is_excluded(),
meaning it was actually functioning as an adjective.  In particular, it
was a checked property ("Does this path match a rule in .gitignore?"),
rather than a mode passed in from the caller.  Change the variable name
to match the part of speech used by the function called to define it,
which will hopefully make these bits of code slightly clearer to the
next reader.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index 3a367683661..8074e651e6f 100644
--- a/dir.c
+++ b/dir.c
@@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
 static enum path_treatment treat_directory(struct dir_struct *dir,
 	struct index_state *istate,
 	struct untracked_cache_dir *untracked,
-	const char *dirname, int len, int baselen, int exclude,
+	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
@@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		}
 		if (nested_repo)
 			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(exclude ? path_excluded : path_untracked));
+				(excluded ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
-		if (exclude &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
+		if (excluded &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
 
 			/*
 			 * This is an excluded directory and we are
@@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* This is the "show_other_directories" case */
 
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
@@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * the directory contains any files.
 	 */
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, exclude, pathspec);
+					untracked, 1, excluded, pathspec);
 }
 
 /*
@@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      int baselen,
 				      const struct pathspec *pathspec)
 {
-	int has_path_in_index, dtype, exclude;
+	int has_path_in_index, dtype, excluded;
 	enum path_treatment path_treatment;
 
 	if (!cdir->d_name)
@@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	    (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent))
 		return path_none;
 
-	exclude = is_excluded(dir, istate, path->buf, &dtype);
+	excluded = is_excluded(dir, istate, path->buf, &dtype);
 
 	/*
 	 * Excluded? If we don't explicitly want to show
 	 * ignored files, ignore it
 	 */
-	if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
+	if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
 		return path_excluded;
 
 	switch (dtype) {
@@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		strbuf_addch(path, '/');
 		path_treatment = treat_directory(dir, istate, untracked,
 						 path->buf, path->len,
-						 baselen, exclude, pathspec);
+						 baselen, excluded, pathspec);
 		/*
 		 * If 1) we only want to return directories that
 		 * match an exclude pattern and 2) this directory does
@@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		 * recurse into this directory (instead of marking the
 		 * directory itself as an ignored path).
 		 */
-		if (!exclude &&
+		if (!excluded &&
 		    path_treatment == path_excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
@@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		return path_treatment;
 	case DT_REG:
 	case DT_LNK:
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2020-03-25 19:31     ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-03-25 19:31     ` Derrick Stolee via GitGitGadget
  2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  7 siblings, 0 replies; 68+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index();

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index 8074e651e6f..d9bcb7e19b6 100644
--- a/dir.c
+++ b/dir.c
@@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
-
 	/* The "len-1" is to strip the final '/' */
-	switch (directory_exists_in_index(istate, dirname, len-1)) {
-	case index_directory:
-		return path_recurse;
+	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
-	case index_gitdir:
+	if (status == index_directory)
+		return path_recurse;
+	if (status == index_gitdir)
 		return path_none;
+	if (status != index_nonexistent)
+		BUG("Unhandled value for directory_exists_in_index: %d\n", status);
 
-	case index_nonexistent:
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    !(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			nested_repo = is_nonbare_repository_dir(&sb);
-			strbuf_release(&sb);
-		}
-		if (nested_repo)
-			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(excluded ? path_excluded : path_untracked));
+	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		!(dir->flags & DIR_NO_GITLINKS)) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, dirname);
+		nested_repo = is_nonbare_repository_dir(&sb);
+		strbuf_release(&sb);
+	}
+	if (nested_repo)
+		return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+			(excluded ? path_excluded : path_untracked));
 
-		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
-			break;
+	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
 		if (excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                       ` (5 preceding siblings ...)
  2020-03-25 19:31     ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
@ 2020-03-25 19:31     ` Elijah Newren via GitGitGadget
  2020-03-26 13:13       ` Derrick Stolee
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  7 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in order
to walk the directory tree.  Treating of directories is sometimes weird
because there are so many different permutations about how to handle
directories.  Some examples:

   * 'git ls-files -o --directory' only needs to know that a directory
     itself is untracked; it doesn't need to recurse into it to see what
     is underneath.

   * 'git status' needs to recurse into an untracked directory, but only
     to determine whether or not it is empty.  If there are no files
     underneath, the directory itself will be omitted from the output.
     If it is not empty, only the directory will be listed.

   * 'git status --ignored' needs to recurse into untracked directories
     and report all the ignored entries and then report the directory as
     untracked -- UNLESS all the entries under the directory are
     ignored, in which case we don't print any of the entries under the
     directory and just report the directory itself as ignored.  (Note
     that although this forces us to walk all untracked files underneath
     the directory as well, we strip them from the output, except for
     users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.)

   * For 'git clean', we may need to recurse into a directory that
     doesn't match any specified pathspecs, if it's possible that there
     is an entry underneath the directory that can match one of the
     pathspecs.  In such a case, we need to be careful to omit the
     directory itself from the list of paths (see commit 404ebceda01c
     ("dir: also check directories for matching pathspecs", 2019-09-17))

Part of the tension noted above is that the treatment of a directory can
change based on the files within it, and based on the various settings
in dir->flags.  Trying to keep this in mind while reading over the code,
it is easy to think in terms of "treat_directory() tells us what to do
with a directory, and read_directory_recursive() is the thing that
recurses".  Since we need to look into a directory to know how to treat
it, though, it is quite easy to decide to (also) recurse into the
directory from treat_directory() by adding a read_directory_recursive()
call.  Adding such a call is actually fine, IF we make sure that
read_directory_recursive() does not also recurse into that same
directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory.  So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.  While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have been
considered untracked.  The code happened to work due to the side-effect
from the first invocation of adding untracked entries to dir->entries;
this allowed it to get the correct output despite the supposed override
in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation.  I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that probably
deserved better comments to begin with).  However, much of my work felt
more like a game of whackamole while attempting to make the code match
the existing regression tests than an attempt to create an
implementation that matched some clear design.  That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient.  Such a hope seems likely to be ill-founded,
given my experience with dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)
  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.  For
the following simple shell loop in an empty repository:

  for depth in $(seq 10 25)
  do
    dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
    rm -rf dir
    mkdir -p $dirs
    >$dirs/untracked-file
    /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
  done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00.  In fact, it isn't until a depth of 190 nested directories
that it sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested directories.
The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case.  It's not often
that you get to speed something up by a factor of 3*10^69.

Finally, this also fixes the untracked cache, as noted by the test fixes
in t7063.  Unfortunately, it does so by passing stop_at_first_file to
close_cached_dir() in order to disable the caching of whether
directories were empty (this caching was only relevant for directories
that we knew we didn't need to walk all the entries under but just
needed to know whether the directory had any entries within it in order
to know if the directory itself should be marked as path_none or
path_untracked).  I'm not convinced that disabling the is-the-dir-empty
check is necessary; there is probably some way to still cache that and
not get erroneous results.  However, I have not figured out how to do
so.  If I revert the change to close_cached_dir() in this patch (thus
continuing to cache cases where stop_at_first_file is true meaning we
continue to cache whether directories are empty), then the untracked
cache breakage in t7063 becomes more prevalant.  With my change to
close_cached_dir() and the other changes to avoid traversing directories
2^n times in this patch, I not only avoid making the untracked_cache
breakage in t7063 worse but actually fix the existing breakage.  Update
the test results in t7063 to no longer expect check_only cache entries,
to reflect that we have to do a bit more work in terms of how many
directories we have to open, and to reflect that we fixed the 1/3 of
tests that were broken in that testsuite.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             | 157 ++++++++++++++++++++----------
 t/t7063-status-untracked-cache.sh | 100 ++++++-------------
 2 files changed, 138 insertions(+), 119 deletions(-)

diff --git a/dir.c b/dir.c
index d9bcb7e19b6..803e2851964 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
+	enum path_treatment state;
+	int nested_repo = 0, old_ignored_nr, stop_early;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
@@ -1711,18 +1717,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/* This is the "show_other_directories" case */
 
-	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+	/*
+	 * We only need to recurse into untracked/ignored directories if
+	 * either of the following bits is set:
+	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
+	 *                           there are ignored directories below)
+	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
+	 *                                 the directory is empty)
+	 */
+	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
 		return excluded ? path_excluded : path_untracked;
 
+	/*
+	 * If we only want to determine if dirname is empty, then we can
+	 * stop at the first file we find underneath that directory rather
+	 * than continuing to recurse beyond it.  If DIR_SHOW_IGNORED_TOO
+	 * is set, then we want MORE than just determining if dirname is
+	 * empty.
+	 */
+	stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
+		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
+
+	/*
+	 * If /every/ file within an untracked directory is ignored, then
+	 * we want to treat the directory as ignored (for e.g. status
+	 * --porcelain), without listing the individual ignored files
+	 * underneath.  To do so, we'll save the current ignored_nr, and
+	 * pop all the ones added after it if it turns out the entire
+	 * directory is ignored.
+	 */
+	old_ignored_nr = dir->ignored_nr;
+
+	/* Actually recurse into dirname now, we'll fixup the state later. */
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
+	state = read_directory_recursive(dir, istate, dirname, len, untracked,
+					 stop_early, stop_early, pathspec);
+
+	/* There are a variety of reasons we may need to fixup the state... */
+	if (state == path_excluded) {
+		int i;
+
+		/*
+		 * When stop_early is set, read_directory_recursive() will
+		 * never return path_untracked regardless of whether
+		 * underlying paths were untracked or ignored (because
+		 * returning early means it excluded some paths, or
+		 * something like that -- see commit 5aaa7fd39aaf ("Improve
+		 * performance of git status --ignored", 2017-09-18)).
+		 * However, we're not really concerned with the status of
+		 * files under the directory, we just wanted to know
+		 * whether the directory was empty (state == path_none) or
+		 * not (state == path_excluded), and if not, we'd return
+		 * our original status based on whether the untracked
+		 * directory matched an exclusion pattern.
+		 */
+		if (stop_early)
+			state = excluded ? path_excluded : path_untracked;
+
+		else {
+			/*
+			 * When
+			 *     !stop_early && state == path_excluded
+			 * then all paths under dirname were ignored.  For
+			 * this case, git status --porcelain wants to just
+			 * list the directory itself as ignored and not
+			 * list the individual paths underneath.  Remove
+			 * the individual paths underneath.
+			 */
+			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
+				free(dir->ignored[i]);
+			dir->ignored_nr = old_ignored_nr;
+		}
+	}
 
 	/*
-	 * If this is an excluded directory, then we only need to check if
-	 * the directory contains any files.
+	 * If there is nothing under the current directory and we are not
+	 * hiding empty directories, then we need to report on the
+	 * untracked or ignored status of the directory itself.
 	 */
-	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, excluded, pathspec);
+	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+		state = excluded ? path_excluded : path_untracked;
+
+	/*
+	 * We can recurse into untracked directories that don't match any
+	 * of the given pathspecs when some file underneath the directory
+	 * might match one of the pathspecs.  If so, we should make sure
+	 * to note that the directory itself did not match.
+	 */
+	if (pathspec &&
+	    !match_pathspec(istate, pathspec, dirname, len,
+			    0 /* prefix */, NULL,
+			    0 /* do NOT special case dirs */))
+		state = path_none;
+
+	return state;
 }
 
 /*
@@ -1870,6 +1959,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 					   int baselen,
 					   const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
 	strbuf_setlen(path, baselen);
 	if (!cdir->ucd) {
 		strbuf_addstr(path, cdir->file);
@@ -2102,7 +2196,7 @@ static int read_cached_dir(struct cached_dir *cdir)
 	return -1;
 }
 
-static void close_cached_dir(struct cached_dir *cdir)
+static void close_cached_dir(struct cached_dir *cdir, int stop_at_first_file)
 {
 	if (cdir->fdir)
 		closedir(cdir->fdir);
@@ -2110,7 +2204,7 @@ static void close_cached_dir(struct cached_dir *cdir)
 	 * We have gone through this directory and found no untracked
 	 * entries. Mark it valid.
 	 */
-	if (cdir->untracked) {
+	if (!stop_at_first_file && cdir->untracked) {
 		cdir->untracked->valid = 1;
 		cdir->untracked->recurse = 1;
 	}
@@ -2175,14 +2269,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in treat_leading_path().  See the commit message for the
-	 * commit adding this warning as well as the commit preceding it
-	 * for details.
+	 * WARNING: Do NOT recurse unless path_recurse is returned from
+	 *          treat_path().  Recursing on any other return value
+	 *          can result in exponential slowdown.
 	 */
-
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2204,13 +2294,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			dir_state = state;
 
 		/* recurse into subdir if instructed by treat_path */
-		if ((state == path_recurse) ||
-			((state == path_untracked) &&
-			 (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) &&
-			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  (pathspec &&
-			   do_match_pathspec(istate, pathspec, path.buf, path.len,
-					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
+		if (state == path_recurse) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -2266,7 +2350,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 						    istate, &path, baselen,
 						    pathspec, state);
 	}
-	close_cached_dir(&cdir);
+	close_cached_dir(&cdir, stop_at_first_file);
  out:
 	strbuf_release(&path);
 
@@ -2294,15 +2378,6 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
-	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in read_directory_recursive().  See 777b420347 (dir:
-	 * synchronize treat_leading_path() and read_directory_recursive(),
-	 * 2019-12-19) and its parent commit for details.
-	 */
-
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf subdir = STRBUF_INIT;
 	int prevlen, baselen;
@@ -2353,23 +2428,7 @@ static int treat_leading_path(struct dir_struct *dir,
 		strbuf_reset(&subdir);
 		strbuf_add(&subdir, path+prevlen, baselen-prevlen);
 		cdir.d_name = subdir.buf;
-		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
-				    pathspec);
-		if (state == path_untracked &&
-		    resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
-		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
-				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
-			if (!match_pathspec(istate, pathspec, sb.buf, sb.len,
-					    0 /* prefix */, NULL,
-					    0 /* do NOT special case dirs */))
-				state = path_none;
-			add_path_to_appropriate_result_list(dir, NULL, &cdir,
-							    istate,
-							    &sb, baselen,
-							    pathspec, state);
-			state = path_recurse;
-		}
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
 
 		if (state != path_recurse)
 			break; /* do not recurse into it */
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 41705ec1526..72b6877837b 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -84,10 +84,6 @@ dthree/
 dtwo/
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 
 test_expect_success 'status first time (empty cache)' '
@@ -147,10 +143,10 @@ A  two
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 0
 directory invalidation: 1
-opendir: 1
+opendir: 3
 EOF
 	test_cmp ../trace.expect ../trace
 
@@ -169,10 +165,6 @@ dtwo/
 four
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -194,7 +186,7 @@ A  two
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 1
 directory invalidation: 1
 opendir: 4
@@ -216,15 +208,11 @@ dthree/
 dtwo/
 three
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-three
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_failure 'new info/exclude invalidates everything' '
+test_expect_success 'new info/exclude invalidates everything' '
 	avoid_racy &&
 	echo three >>.git/info/exclude &&
 	: >../trace &&
@@ -240,7 +228,7 @@ A  two
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 1
 directory invalidation: 0
 opendir: 4
@@ -248,7 +236,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'verify untracked cache dump' '
+test_expect_success 'verify untracked cache dump' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -260,9 +248,6 @@ flags 00000006
 dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -277,14 +262,11 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_failure 'status after the move' '
+test_expect_success 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
@@ -298,15 +280,15 @@ A  one
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 0
 directory invalidation: 0
-opendir: 1
+opendir: 3
 EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'verify untracked cache dump' '
+test_expect_success 'verify untracked cache dump' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -319,9 +301,6 @@ dthree/
 dtwo/
 two
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -336,14 +315,11 @@ exclude_per_dir .gitignore
 flags 00000006
 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_failure 'status after the move' '
+test_expect_success 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
@@ -357,15 +333,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 0
 directory invalidation: 0
-opendir: 1
+opendir: 3
 EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'verify untracked cache dump' '
+test_expect_success 'verify untracked cache dump' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -377,9 +353,6 @@ flags 00000006
 dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -392,7 +365,7 @@ test_expect_success 'set up for sparse checkout testing' '
 	git commit -m "first commit"
 '
 
-test_expect_failure 'status after commit' '
+test_expect_success 'status after commit' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
@@ -403,15 +376,15 @@ test_expect_failure 'status after commit' '
 EOF
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 0
 directory invalidation: 0
-opendir: 2
+opendir: 4
 EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'untracked cache correct after commit' '
+test_expect_success 'untracked cache correct after commit' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -423,9 +396,6 @@ flags 00000006
 dthree/
 dtwo/
 /done/ 0000000000000000000000000000000000000000 recurse valid
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
@@ -450,7 +420,7 @@ test_expect_success 'create/modify files, some of which are gitignored' '
 	sync_mtime
 '
 
-test_expect_failure 'test sparse status with untracked cache' '
+test_expect_success 'test sparse status with untracked cache' '
 	: >../trace &&
 	avoid_racy &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -464,15 +434,15 @@ test_expect_failure 'test sparse status with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
+node creation: 2
 gitignore invalidation: 1
 directory invalidation: 2
-opendir: 2
+opendir: 4
 EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'untracked cache correct after status' '
+test_expect_success 'untracked cache correct after status' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -485,14 +455,11 @@ dthree/
 dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect ../actual
 '
 
-test_expect_failure 'test sparse status again with untracked cache' '
+test_expect_success 'test sparse status again with untracked cache' '
 	avoid_racy &&
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -520,7 +487,7 @@ test_expect_success 'set up for test of subdir and sparse checkouts' '
 	echo "sub" > done/sub/sub/file
 '
 
-test_expect_failure 'test sparse status with untracked cache and subdir' '
+test_expect_success 'test sparse status with untracked cache and subdir' '
 	avoid_racy &&
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -543,7 +510,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'verify untracked cache dump (sparse/subdirs)' '
+test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
 	test-tool dump-untracked-cache >../actual &&
 	cat >../expect-from-test-dump <<EOF &&
 info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
@@ -557,18 +524,11 @@ dtwo/
 /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
 five
 sub/
-/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
-sub/
-/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
-file
-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
-two
 EOF
 	test_cmp ../expect-from-test-dump ../actual
 '
 
-test_expect_failure 'test sparse status again with untracked cache and subdir' '
+test_expect_success 'test sparse status again with untracked cache and subdir' '
 	avoid_racy &&
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
@@ -583,7 +543,7 @@ EOF
 	test_cmp ../trace.expect ../trace
 '
 
-test_expect_failure 'move entry in subdir from untracked to cached' '
+test_expect_success 'move entry in subdir from untracked to cached' '
 	git add dtwo/two &&
 	git status --porcelain >../status.actual &&
 	cat >../status.expect <<EOF &&
@@ -597,7 +557,7 @@ EOF
 	test_cmp ../status.expect ../status.actual
 '
 
-test_expect_failure 'move entry in subdir from cached to untracked' '
+test_expect_success 'move entry in subdir from cached to untracked' '
 	git rm --cached dtwo/two &&
 	git status --porcelain >../status.actual &&
 	cat >../status.expect <<EOF &&
@@ -624,7 +584,7 @@ test_expect_success 'git status does not change anything' '
 	test_cmp ../expect-no-uc ../actual
 '
 
-test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' '
+test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' '
 	git config core.untrackedCache true &&
 	test-tool dump-untracked-cache >../actual &&
 	test_cmp ../expect-no-uc ../actual &&
@@ -657,7 +617,7 @@ test_expect_success 'using --untracked-cache does not fail when core.untrackedCa
 	test_cmp ../expect-empty ../actual
 '
 
-test_expect_failure 'setting core.untrackedCache to keep' '
+test_expect_success 'setting core.untrackedCache to keep' '
 	git config core.untrackedCache keep &&
 	git update-index --untracked-cache &&
 	test-tool dump-untracked-cache >../actual &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/7] t7063: correct broken test expectation
  2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
@ 2020-03-26 13:02       ` Derrick Stolee
  2020-03-26 21:18         ` Elijah Newren
  0 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-03-26 13:02 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 3/25/2020 3:31 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> The untracked cache is caching wrong information, resulting in commands
> like `git status --porcelain` producing erroneous answers.  The tests in
> t7063 actually have a wide enough test to catch a relevant case, in
> particular surrounding the directory 'dthree/', but it appears the
> answers were not checked quite closely enough and the tests were coded
> with the wrong expectation.  Once the wrong info got into the cache in
> an early test, since later tests built on it, many others have a wrong
> expectation as well.  This affects just over a third of the tests in
> t7063.

Wow. Good find.

> The error can be seen starting at t7063.12 (the first one switched from
> expect_success to expect_failure in this patch).  That test runs in a
> directory with the following files present:
>   done/one
>   dthree/three
>   dtwo/two
>   four
>   .gitignore
>   one
>   three
>   two
> 
> Of those files, the following files are tracked:
>   done/one
>   one
>   two
> 
> and the contents of .gitignore are:
>   four
> 
> and the contents of .git/info/exclude are:
>   three
> 
> And there is no core.excludesfile.  Therefore, the following should be
> untracked:
>   .gitignore
>   dthree/
>   dtwo/
> Indeed, these three paths are reported if you run
>   git ls-files -o --directory --exclude-standard
> within this directory.  However, 'git status --porcelain' was reporting
> for this test:
>   A  done/one
>   A  one
>   A  two
>   ?? .gitignore
>   ?? dtwo/
> which was clearly wrong -- dthree/ should also be listed as untracked.
> This appears to have been broken since the test was introduced with
> commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
> Correct the test to expect the right output, marking the test as failed
> for now.  Make the same change throughout the remainder of the testsuite
> to reflect that dthree/ remains an untracked directory throughout and
> should be recognized as such.

I wonder if we could simultaneously verify these "expected" results match
using another command without the untracked cache? It's good that we have
the expected outputs explicitly, but perhaps double-checking the command
with `-c core.untrackedCache=false` would help us know these are the correct
expected outputs?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache
  2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
@ 2020-03-26 13:13       ` Derrick Stolee
  0 siblings, 0 replies; 68+ messages in thread
From: Derrick Stolee @ 2020-03-26 13:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 3/25/2020 3:31 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> dir's read_directory_recursive() naturally operates recursively in order
> to walk the directory tree.  Treating of directories is sometimes weird
> because there are so many different permutations about how to handle
> directories.  Some examples:
> 
>    * 'git ls-files -o --directory' only needs to know that a directory
>      itself is untracked; it doesn't need to recurse into it to see what
>      is underneath.
> 
>    * 'git status' needs to recurse into an untracked directory, but only
>      to determine whether or not it is empty.  If there are no files
>      underneath, the directory itself will be omitted from the output.
>      If it is not empty, only the directory will be listed.
> 
>    * 'git status --ignored' needs to recurse into untracked directories
>      and report all the ignored entries and then report the directory as
>      untracked -- UNLESS all the entries under the directory are
>      ignored, in which case we don't print any of the entries under the
>      directory and just report the directory itself as ignored.  (Note
>      that although this forces us to walk all untracked files underneath
>      the directory as well, we strip them from the output, except for
>      users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.)
> 
>    * For 'git clean', we may need to recurse into a directory that
>      doesn't match any specified pathspecs, if it's possible that there
>      is an entry underneath the directory that can match one of the
>      pathspecs.  In such a case, we need to be careful to omit the
>      directory itself from the list of paths (see commit 404ebceda01c
>      ("dir: also check directories for matching pathspecs", 2019-09-17))
> 
> Part of the tension noted above is that the treatment of a directory can
> change based on the files within it, and based on the various settings
> in dir->flags.  Trying to keep this in mind while reading over the code,
> it is easy to think in terms of "treat_directory() tells us what to do
> with a directory, and read_directory_recursive() is the thing that
> recurses".  Since we need to look into a directory to know how to treat
> it, though, it is quite easy to decide to (also) recurse into the
> directory from treat_directory() by adding a read_directory_recursive()
> call.  Adding such a call is actually fine, IF we make sure that
> read_directory_recursive() does not also recurse into that same
> directory.
> 
> Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
> for ignored files", 2017-05-18), added exactly such a case to the code,
> meaning we'd have two calls to read_directory_recursive() for an
> untracked directory.  So, if we had a file named
>    one/two/three/four/five/somefile.txt
> and nothing in one/ was tracked, then 'git status --ignored' would
> call read_directory_recursive() twice on the directory 'one/', and
> each of those would call read_directory_recursive() twice on the
> directory 'one/two/', and so on until read_directory_recursive() was
> called 2^5 times for 'one/two/three/four/five/'.
> 
> Avoid calling read_directory_recursive() twice per level by moving a
> lot of the special logic into treat_directory().
> 
> Since dir.c is somewhat complex, extra cruft built up around this over
> time.  While trying to unravel it, I noticed several instances where the
> first call to read_directory_recursive() would return e.g.
> path_untracked for some directory and a later one would return e.g.
> path_none, despite the fact that the directory clearly should have been
> considered untracked.  The code happened to work due to the side-effect
> from the first invocation of adding untracked entries to dir->entries;
> this allowed it to get the correct output despite the supposed override
> in return value by the later call.
> 
> I am somewhat concerned that there are still bugs and maybe even
> testcases with the wrong expectation.

For my part, I recently set up draft PRs to test the 'next' branch in
Scalar [1] and VFS for Git [2]. I'll create a Git installer using these
patches as well so I can run our functional test suite for a little extra
check of the behavior here.

[1] https://github.com/microsoft/scalar/pull/354/files
[2] https://github.com/microsoft/VFSForGit/pull/1645

>  I have tried to carefully
> document treat_directory() since it becomes more complex after this
> change (though much of this complexity came from elsewhere that probably
> deserved better comments to begin with).

I do enjoy your warning comments.

> However, much of my work felt
> more like a game of whackamole while attempting to make the code match
> the existing regression tests than an attempt to create an
> implementation that matched some clear design.  That seems wrong to me,
> but the rules of existing behavior had so many special cases that I had
> a hard time coming up with some overarching rules about what correct
> behavior is for all cases, forcing me to hope that the regression tests
> are correct and sufficient.  Such a hope seems likely to be ill-founded,
> given my experience with dir.c-related testcases in the last few months:
> 
>   Examples where the documentation was hard to parse or even just wrong:
>    * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
>                    -n/--dry-run, 2019-09-17)
>    * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
>                    repository, 2019-09-17)
>    * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
>   Examples where testcases were declared wrong and changed:
>    * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
>                    repository, 2019-09-17)
>    * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
>    * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
>                    leading directories", 2019-12-10)
>   Examples where testcases were clearly inadequate:
>    * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
>                    ignored file breakage, 2019-08-25)
>    * 7541cc530239 (t7300: add testcases showing failure to clean specified
>                    pathspecs, 2019-09-17)
>    * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
>                    2019-09-17)
>    * 404ebceda01c (dir: also check directories for matching pathspecs,
>                    2019-09-17)
>    * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
>                    repository, 2019-09-17)
>    * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
>    * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
>                    2019-12-10)
>    * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
>   Examples where "correct behavior" was unclear to everyone:
>     https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
>   Other commits of note:
>    * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

Thanks for all of these pointers. These will be helpful if we ever do find
a regression that bisects to this patch.

> However, on the positive side, it does make the code much faster.  For
> the following simple shell loop in an empty repository:
> 
>   for depth in $(seq 10 25)
>   do
>     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>     rm -rf dir
>     mkdir -p $dirs
>     >$dirs/untracked-file
>     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
>   done
> 
> I saw the following timings, in seconds (note that the numbers are a
> little noisy from run-to-run, but the trend is very clear with every
> run):
> 
>     10: 0.03
>     11: 0.05
>     12: 0.08
>     13: 0.19
>     14: 0.29
>     15: 0.50
>     16: 1.05
>     17: 2.11
>     18: 4.11
>     19: 8.60
>     20: 17.55
>     21: 33.87
>     22: 68.71
>     23: 140.05
>     24: 274.45
>     25: 551.15

These are still impressive.

> For the above run, using strace I can look for the number of untracked
> directories opened and can verify that it matches the expected
> 2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).
> 
> After this fix, with strace I can verify that the number of untracked
> directories that are opened drops to just $depth, and the timings all
> drop to 0.00.  In fact, it isn't until a depth of 190 nested directories
> that it sometimes starts reporting a time of 0.01 seconds and doesn't
> consistently report 0.01 seconds until there are 240 nested directories.
> The previous code would have taken
>   17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
> to have completed the 240 nested directories case.  It's not often
> that you get to speed something up by a factor of 3*10^69.
> 
> Finally, this also fixes the untracked cache, as noted by the test fixes
> in t7063.  Unfortunately, it does so by passing stop_at_first_file to
> close_cached_dir() in order to disable the caching of whether
> directories were empty (this caching was only relevant for directories
> that we knew we didn't need to walk all the entries under but just
> needed to know whether the directory had any entries within it in order
> to know if the directory itself should be marked as path_none or
> path_untracked).  I'm not convinced that disabling the is-the-dir-empty
> check is necessary; there is probably some way to still cache that and
> not get erroneous results.  However, I have not figured out how to do
> so.  If I revert the change to close_cached_dir() in this patch (thus
> continuing to cache cases where stop_at_first_file is true meaning we
> continue to cache whether directories are empty), then the untracked
> cache breakage in t7063 becomes more prevalant.  With my change to
> close_cached_dir() and the other changes to avoid traversing directories
> 2^n times in this patch, I not only avoid making the untracked_cache
> breakage in t7063 worse but actually fix the existing breakage.  Update
> the test results in t7063 to no longer expect check_only cache entries,
> to reflect that we have to do a bit more work in terms of how many
> directories we have to open, and to reflect that we fixed the 1/3 of
> tests that were broken in that testsuite.

We use the untracked cache in Scalar, so we should get some coverage
of that, too.

I'll let you know when the tests are done, and then do a review.

-Stolee

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/7] t7063: correct broken test expectation
  2020-03-26 13:02       ` Derrick Stolee
@ 2020-03-26 21:18         ` Elijah Newren
  0 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren @ 2020-03-26 21:18 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On Thu, Mar 26, 2020 at 6:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/25/2020 3:31 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > The untracked cache is caching wrong information, resulting in commands
> > like `git status --porcelain` producing erroneous answers.  The tests in
> > t7063 actually have a wide enough test to catch a relevant case, in
> > particular surrounding the directory 'dthree/', but it appears the
> > answers were not checked quite closely enough and the tests were coded
> > with the wrong expectation.  Once the wrong info got into the cache in
> > an early test, since later tests built on it, many others have a wrong
> > expectation as well.  This affects just over a third of the tests in
> > t7063.
>
> Wow. Good find.

or maybe not...

> > The error can be seen starting at t7063.12 (the first one switched from
> > expect_success to expect_failure in this patch).  That test runs in a
> > directory with the following files present:
> >   done/one
> >   dthree/three
> >   dtwo/two
> >   four
> >   .gitignore
> >   one
> >   three
> >   two
> >
> > Of those files, the following files are tracked:
> >   done/one
> >   one
> >   two
> >
> > and the contents of .gitignore are:
> >   four
> >
> > and the contents of .git/info/exclude are:
> >   three
> >
> > And there is no core.excludesfile.  Therefore, the following should be
> > untracked:
> >   .gitignore
> >   dthree/
> >   dtwo/
> > Indeed, these three paths are reported if you run
> >   git ls-files -o --directory --exclude-standard
> > within this directory.  However, 'git status --porcelain' was reporting
> > for this test:
> >   A  done/one
> >   A  one
> >   A  two
> >   ?? .gitignore
> >   ?? dtwo/
> > which was clearly wrong -- dthree/ should also be listed as untracked.
> > This appears to have been broken since the test was introduced with
> > commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
> > Correct the test to expect the right output, marking the test as failed
> > for now.  Make the same change throughout the remainder of the testsuite
> > to reflect that dthree/ remains an untracked directory throughout and
> > should be recognized as such.
>
> I wonder if we could simultaneously verify these "expected" results match
> using another command without the untracked cache? It's good that we have
> the expected outputs explicitly, but perhaps double-checking the command
> with `-c core.untrackedCache=false` would help us know these are the correct
> expected outputs?

This was an *awesome* idea, even if the implementation doesn't quite
work.  It turns out that -c core.untrackedCache=false does not
instruct status to ignore the untracked cache, it instructs status to
delete it. Since we had subsequent tests that depended on the
untrackedCache created in previous tests, this would break a number of
tests.  But I can introduce a helper to workaround that:

# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
# compare commands side-by-side, e.g.
#    iuc status --porcelain >expect &&
#    git status --porcelain >actual &&
#    test_cmp expect actual
iuc() {
        git ls-files -s >../current-index-entries
        git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries

        GIT_INDEX_FILE=.git/tmp_index
        export GIT_INDEX_FILE
        git update-index --index-info <../current-index-entries
        git update-index --skip-worktree $(cat ../current-sparse-entries)

        git -c core.untrackedCache=false "$@"
        ret=$?

        rm ../current-index-entries
        rm $GIT_INDEX_FILE
        unset GIT_INDEX_FILE

        return $ret
}


Doing that helped me discover that the test didn't have a wrong
expectation; I did.  When a directory that is not tracked is filled
entirely with files that are ignored, then status --porcelain treats
the directory itself as ignored...and thus doesn't display it.  (`git
status --porcelain --ignored` will show it).  I had seen that
somewhere, but hadn't fully understood the check_only and
stop_at_first_file pieces related to it.  Anyway, with this helpful
hint:

  * I can say that there was not a bug in the untracked cache (at
least not any that I'm aware of)
  * I can update my first patch to do more thorough checking instead
of changing the expectation
  * I found the bug in my final patch that had been evading me
  * I added a huge comment explaining check_only and
stop_at_first_file, how they're used, and what they mean for the
future reader
  * I also no longer need to partially disable the untracked cache in
my changes.

New patches incoming...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                       ` (6 preceding siblings ...)
  2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
@ 2020-03-26 21:27     ` Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
                         ` (8 more replies)
  7 siblings, 9 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren

This series provides some "modest" speedups (see last commit message), and
should allow 'git status --ignored' to complete in a more reasonable
timeframe for Martin Melka (see 
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
).

Changes since v3:

 * Turns out I was wrong about the untracked cache stuff and had some bugs
   around untracked directories with nothing bug ignored sub-entries.
 * First patch now is no longer a change of expectation of the untracked
   cache, but some more thorough testing/verification in that test that
   helped explain my misunderstanding and uncover the bug in my refactor.
 * Corrected the check_only and stop_at_first_file logic in the last patch
   and added a big comment explaining how/why it all works. Also stopped
   disabling part of the untracked cache in the same patch, and undid all
   the changes to t7063 in that patch.

Stuff still missing from v4:

 * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in 
   https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/ 
   which I think would make the code cleaner & clearer. I guess I'm leaving
   that for future work.

As per the commit message of the final patch, this series has some risk.
Extra eyes would be greatly appreciated; one pair already helped me find one
bug. Also, we should probably merge it early in some cycle, either this one
or a later one.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (6):
  t7063: more thorough status checking
  dir: fix simple typo in comment
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one

 dir.c                             | 349 ++++++++++++++++++------------
 t/t7063-status-untracked-cache.sh |  52 +++++
 2 files changed, 258 insertions(+), 143 deletions(-)


base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v4
Pull-Request: https://github.com/git/git/pull/700

Range-diff vs v3:

 1:  d4fe5d33577 ! 1:  752403e339b t7063: correct broken test expectation
     @@ -1,61 +1,23 @@
      Author: Elijah Newren <newren@gmail.com>
      
     -    t7063: correct broken test expectation
     +    t7063: more thorough status checking
      
     -    The untracked cache is caching wrong information, resulting in commands
     -    like `git status --porcelain` producing erroneous answers.  The tests in
     -    t7063 actually have a wide enough test to catch a relevant case, in
     -    particular surrounding the directory 'dthree/', but it appears the
     -    answers were not checked quite closely enough and the tests were coded
     -    with the wrong expectation.  Once the wrong info got into the cache in
     -    an early test, since later tests built on it, many others have a wrong
     -    expectation as well.  This affects just over a third of the tests in
     -    t7063.
     +    It turns out the t7063 has some testcases that even without using the
     +    untracked cache cover situations that nothing else in the testsuite
     +    handles.  Checking the results of
     +      git status --porcelain
     +    both with and without the untracked cache, and comparing both against
     +    our expected results helped uncover a critical bug in some dir.c
     +    restructuring.
      
     -    The error can be seen starting at t7063.12 (the first one switched from
     -    expect_success to expect_failure in this patch).  That test runs in a
     -    directory with the following files present:
     -      done/one
     -      dthree/three
     -      dtwo/two
     -      four
     -      .gitignore
     -      one
     -      three
     -      two
     +    Unfortunately, it's not easy to run status and tell it to ignore the
     +    untracked cache; the only knob we have it to instruct it to *delete*
     +    (and ignore) the untracked cache.
      
     -    Of those files, the following files are tracked:
     -      done/one
     -      one
     -      two
     -
     -    and the contents of .gitignore are:
     -      four
     -
     -    and the contents of .git/info/exclude are:
     -      three
     -
     -    And there is no core.excludesfile.  Therefore, the following should be
     -    untracked:
     -      .gitignore
     -      dthree/
     -      dtwo/
     -    Indeed, these three paths are reported if you run
     -      git ls-files -o --directory --exclude-standard
     -    within this directory.  However, 'git status --porcelain' was reporting
     -    for this test:
     -      A  done/one
     -      A  one
     -      A  two
     -      ?? .gitignore
     -      ?? dtwo/
     -    which was clearly wrong -- dthree/ should also be listed as untracked.
     -    This appears to have been broken since the test was introduced with
     -    commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
     -    Correct the test to expect the right output, marking the test as failed
     -    for now.  Make the same change throughout the remainder of the testsuite
     -    to reflect that dthree/ remains an untracked directory throughout and
     -    should be recognized as such.
     +    Create a simple helper that will create a clone of the index that is
     +    missing the untracked cache bits, and use it to compare that the results
     +    with the untracked cache match the results we get without the untracked
     +    cache.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ -63,279 +25,230 @@
       --- a/t/t7063-status-untracked-cache.sh
       +++ b/t/t7063-status-untracked-cache.sh
      @@
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_success 'new info/exclude invalidates everything' '
     -+test_expect_failure 'new info/exclude invalidates everything' '
     - 	avoid_racy &&
     - 	echo three >>.git/info/exclude &&
     + 	test_must_be_empty ../status.actual
     + }
     + 
     ++# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
     ++# compare commands side-by-side, e.g.
     ++#    iuc status --porcelain >expect &&
     ++#    git status --porcelain >actual &&
     ++#    test_cmp expect actual
     ++iuc() {
     ++	git ls-files -s >../current-index-entries
     ++	git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries
     ++
     ++	GIT_INDEX_FILE=.git/tmp_index
     ++	export GIT_INDEX_FILE
     ++	git update-index --index-info <../current-index-entries
     ++	git update-index --skip-worktree $(cat ../current-sparse-entries)
     ++
     ++	git -c core.untrackedCache=false "$@"
     ++	ret=$?
     ++
     ++	rm ../current-index-entries
     ++	rm $GIT_INDEX_FILE
     ++	unset GIT_INDEX_FILE
     ++
     ++	return $ret
     ++}
     ++
     + test_lazy_prereq UNTRACKED_CACHE '
     + 	{ git update-index --test-untracked-cache; ret=$?; } &&
     + 	test $ret -ne 1
     +@@
       	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     ++	test_cmp ../status.expect ../status.iuc &&
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 3
     +@@
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     ++	test_cmp ../status.expect ../status.iuc &&
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
     + A  done/one
       A  one
     - A  two
     - ?? .gitignore
     -+?? dthree/
     - ?? dtwo/
     +@@
     + ?? four
     + ?? three
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'verify untracked cache dump' '
     -+test_expect_failure 'verify untracked cache dump' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
     + A  done/one
     + A  one
      @@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     - .gitignore
     -+dthree/
     - dtwo/
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     + ?? dtwo/
     + ?? three
     + EOF
     ++	test_cmp ../status.expect ../status.iuc &&
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_success 'status after the move' '
     -+test_expect_failure 'status after the move' '
       	: >../trace &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
       	git status --porcelain >../actual &&
     -@@
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
       A  done/one
       A  one
     +@@
       ?? .gitignore
     -+?? dthree/
       ?? dtwo/
     - ?? two
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'verify untracked cache dump' '
     -+test_expect_failure 'verify untracked cache dump' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     + 	: >../trace &&
     + 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
     + A  done/one
     + A  one
      @@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     - .gitignore
     -+dthree/
     - dtwo/
     - two
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     + ?? dtwo/
     + ?? two
     + EOF
     ++	test_cmp ../status.expect ../status.iuc &&
     + 	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_success 'status after the move' '
     -+test_expect_failure 'status after the move' '
       	: >../trace &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
       	git status --porcelain >../actual &&
     -@@
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
     + A  done/one
       A  one
     - A  two
     +@@
       ?? .gitignore
     -+?? dthree/
       ?? dtwo/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'verify untracked cache dump' '
     -+test_expect_failure 'verify untracked cache dump' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     - .gitignore
     -+dthree/
     - dtwo/
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     -@@
     - 	git commit -m "first commit"
     - '
     - 
     --test_expect_success 'status after commit' '
     -+test_expect_failure 'status after commit' '
       	: >../trace &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
       	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
       	cat >../status.expect <<EOF &&
       ?? .gitignore
     -+?? dthree/
       ?? dtwo/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'untracked cache correct after commit' '
     -+test_expect_failure 'untracked cache correct after commit' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     - .gitignore
     -+dthree/
     - dtwo/
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     -@@
     - 	sync_mtime
     - '
     - 
     --test_expect_success 'test sparse status with untracked cache' '
     -+test_expect_failure 'test sparse status with untracked cache' '
     - 	: >../trace &&
       	avoid_racy &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     + 	git status --porcelain >../status.actual &&
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
        M done/two
       ?? .gitignore
       ?? done/five
     -+?? dthree/
       ?? dtwo/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../status.actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'untracked cache correct after status' '
     -+test_expect_failure 'untracked cache correct after status' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     - .gitignore
     -+dthree/
     - dtwo/
     - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     - five
     -@@
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_success 'test sparse status again with untracked cache' '
     -+test_expect_failure 'test sparse status again with untracked cache' '
     - 	avoid_racy &&
       	: >../trace &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     + 	git status --porcelain >../status.actual &&
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
        M done/two
       ?? .gitignore
       ?? done/five
     -+?? dthree/
       ?? dtwo/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../status.actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	echo "sub" > done/sub/sub/file
     - '
     - 
     --test_expect_success 'test sparse status with untracked cache and subdir' '
     -+test_expect_failure 'test sparse status with untracked cache and subdir' '
     - 	avoid_racy &&
       	: >../trace &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     + 	git status --porcelain >../status.actual &&
     ++	iuc status --porcelain >../status.iuc &&
     + 	cat >../status.expect <<EOF &&
     +  M done/two
       ?? .gitignore
     - ?? done/five
     +@@
       ?? done/sub/
     -+?? dthree/
       ?? dtwo/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../status.actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 2
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
     -+test_expect_failure 'verify untracked cache dump (sparse/subdirs)' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect-from-test-dump <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
     - .gitignore
     -+dthree/
     - dtwo/
     - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     - five
     -@@
     - 	test_cmp ../expect-from-test-dump ../actual
     - '
     - 
     --test_expect_success 'test sparse status again with untracked cache and subdir' '
     -+test_expect_failure 'test sparse status again with untracked cache and subdir' '
     - 	avoid_racy &&
       	: >../trace &&
       	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     + 	git status --porcelain >../status.actual &&
     ++	iuc status --porcelain >../status.iuc &&
     ++	test_cmp ../status.expect ../status.iuc &&
     + 	test_cmp ../status.expect ../status.actual &&
     + 	cat >../trace.expect <<EOF &&
     + node creation: 0
      @@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_success 'move entry in subdir from untracked to cached' '
     -+test_expect_failure 'move entry in subdir from untracked to cached' '
     + test_expect_success 'move entry in subdir from untracked to cached' '
       	git add dtwo/two &&
       	git status --porcelain >../status.actual &&
     ++	iuc status --porcelain >../status.iuc &&
       	cat >../status.expect <<EOF &&
     +  M done/two
     + A  dtwo/two
      @@
     - ?? .gitignore
       ?? done/five
       ?? done/sub/
     -+?? dthree/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../status.actual
       '
       
     --test_expect_success 'move entry in subdir from cached to untracked' '
     -+test_expect_failure 'move entry in subdir from cached to untracked' '
     + test_expect_success 'move entry in subdir from cached to untracked' '
       	git rm --cached dtwo/two &&
       	git status --porcelain >../status.actual &&
     ++	iuc status --porcelain >../status.iuc &&
       	cat >../status.expect <<EOF &&
     -@@
     +  M done/two
       ?? .gitignore
     - ?? done/five
     +@@
       ?? done/sub/
     -+?? dthree/
       ?? dtwo/
       EOF
     ++	test_cmp ../status.expect ../status.iuc &&
       	test_cmp ../status.expect ../status.actual
     -@@
     - 	test_cmp ../expect-no-uc ../actual
     - '
     - 
     --test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' '
     -+test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' '
     - 	git config core.untrackedCache true &&
     - 	test-tool dump-untracked-cache >../actual &&
     - 	test_cmp ../expect-no-uc ../actual &&
     -@@
     - 	test_cmp ../expect-empty ../actual
       '
       
     --test_expect_success 'setting core.untrackedCache to keep' '
     -+test_expect_failure 'setting core.untrackedCache to keep' '
     - 	git config core.untrackedCache keep &&
     - 	git update-index --untracked-cache &&
     - 	test-tool dump-untracked-cache >../actual &&
 2:  b20bc3b9afd = 2:  a4287d690be dir: fix simple typo in comment
 3:  fa9035949e0 = 3:  48f37e5b114 dir: consolidate treat_path() and treat_one_path()
 4:  02e652d1869 = 4:  b5ad1939379 dir: fix broken comment
 5:  705c008d993 = 5:  2603c1a9d13 dir: fix confusion based on variable tense
 6:  f5d69102946 = 6:  576f364329d dir: refactor treat_directory to clarify control flow
 7:  6cfca619e2c ! 7:  e20525429e5 dir: replace exponential algorithm with a linear one, fix untracked cache
     @@ -1,6 +1,6 @@
      Author: Elijah Newren <newren@gmail.com>
      
     -    dir: replace exponential algorithm with a linear one, fix untracked cache
     +    dir: replace exponential algorithm with a linear one
      
          dir's read_directory_recursive() naturally operates recursively in order
          to walk the directory tree.  Treating of directories is sometimes weird
     @@ -161,28 +161,6 @@
          to have completed the 240 nested directories case.  It's not often
          that you get to speed something up by a factor of 3*10^69.
      
     -    Finally, this also fixes the untracked cache, as noted by the test fixes
     -    in t7063.  Unfortunately, it does so by passing stop_at_first_file to
     -    close_cached_dir() in order to disable the caching of whether
     -    directories were empty (this caching was only relevant for directories
     -    that we knew we didn't need to walk all the entries under but just
     -    needed to know whether the directory had any entries within it in order
     -    to know if the directory itself should be marked as path_none or
     -    path_untracked).  I'm not convinced that disabling the is-the-dir-empty
     -    check is necessary; there is probably some way to still cache that and
     -    not get erroneous results.  However, I have not figured out how to do
     -    so.  If I revert the change to close_cached_dir() in this patch (thus
     -    continuing to cache cases where stop_at_first_file is true meaning we
     -    continue to cache whether directories are empty), then the untracked
     -    cache breakage in t7063 becomes more prevalant.  With my change to
     -    close_cached_dir() and the other changes to avoid traversing directories
     -    2^n times in this patch, I not only avoid making the untracked_cache
     -    breakage in t7063 worse but actually fix the existing breakage.  Update
     -    the test results in t7063 to no longer expect check_only cache entries,
     -    to reflect that we have to do a bit more work in terms of how many
     -    directories we have to open, and to reflect that we fixed the 1/3 of
     -    tests that were broken in that testsuite.
     -
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       diff --git a/dir.c b/dir.c
     @@ -199,7 +177,7 @@
      +	 *          you CAN'T DO BOTH.
      +	 */
      +	enum path_treatment state;
     -+	int nested_repo = 0, old_ignored_nr, stop_early;
     ++	int nested_repo = 0, old_ignored_nr, check_only, stop_early;
       	/* The "len-1" is to strip the final '/' */
       	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
       
     @@ -220,16 +198,32 @@
       		return excluded ? path_excluded : path_untracked;
       
      +	/*
     -+	 * If we only want to determine if dirname is empty, then we can
     -+	 * stop at the first file we find underneath that directory rather
     -+	 * than continuing to recurse beyond it.  If DIR_SHOW_IGNORED_TOO
     -+	 * is set, then we want MORE than just determining if dirname is
     -+	 * empty.
     ++	 * If we have we don't want to know the all the paths under an
     ++	 * untracked or ignored directory, we still need to go into the
     ++	 * directory to determine if it is empty (because an empty directory
     ++	 * should be path_none instead of path_excluded or path_untracked).
      +	 */
     -+	stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
     ++	check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
      +		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
      +
      +	/*
     ++	 * However, there's another optimization possible as a subset of
     ++	 * check_only, based on the cases we have to consider:
     ++	 *   A) Directory matches no exclude patterns:
     ++	 *     * Directory is empty => path_none
     ++	 *     * Directory has an untracked file under it => path_untracked
     ++	 *     * Directory has only ignored files under it => path_excluded
     ++	 *   B) Directory matches an exclude pattern:
     ++	 *     * Directory is empty => path_none
     ++	 *     * Directory has an untracked file under it => path_excluded
     ++	 *     * Directory has only ignored files under it => path_excluded
     ++	 * In case A, we can exit as soon as we've found an untracked
     ++	 * file but otherwise have to walk all files.  In case B, though,
     ++	 * we can stop at the first file we find under the directory.
     ++	 */
     ++	stop_early = check_only && excluded;
     ++
     ++	/*
      +	 * If /every/ file within an untracked directory is ignored, then
      +	 * we want to treat the directory as ignored (for e.g. status
      +	 * --porcelain), without listing the individual ignored files
     @@ -243,7 +237,7 @@
       	untracked = lookup_untracked(dir->untracked, untracked,
       				     dirname + baselen, len - baselen);
      +	state = read_directory_recursive(dir, istate, dirname, len, untracked,
     -+					 stop_early, stop_early, pathspec);
     ++					 check_only, stop_early, pathspec);
      +
      +	/* There are a variety of reasons we may need to fixup the state... */
      +	if (state == path_excluded) {
     @@ -281,25 +275,25 @@
      +			dir->ignored_nr = old_ignored_nr;
      +		}
      +	}
     - 
     - 	/*
     --	 * If this is an excluded directory, then we only need to check if
     --	 * the directory contains any files.
     ++
     ++	/*
      +	 * If there is nothing under the current directory and we are not
      +	 * hiding empty directories, then we need to report on the
      +	 * untracked or ignored status of the directory itself.
     - 	 */
     --	return read_directory_recursive(dir, istate, dirname, len,
     --					untracked, 1, excluded, pathspec);
     ++	 */
      +	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
      +		state = excluded ? path_excluded : path_untracked;
     -+
     -+	/*
     + 
     + 	/*
     +-	 * If this is an excluded directory, then we only need to check if
     +-	 * the directory contains any files.
      +	 * We can recurse into untracked directories that don't match any
      +	 * of the given pathspecs when some file underneath the directory
      +	 * might match one of the pathspecs.  If so, we should make sure
      +	 * to note that the directory itself did not match.
     -+	 */
     + 	 */
     +-	return read_directory_recursive(dir, istate, dirname, len,
     +-					untracked, 1, excluded, pathspec);
      +	if (pathspec &&
      +	    !match_pathspec(istate, pathspec, dirname, len,
      +			    0 /* prefix */, NULL,
     @@ -322,24 +316,6 @@
       	strbuf_setlen(path, baselen);
       	if (!cdir->ucd) {
       		strbuf_addstr(path, cdir->file);
     -@@
     - 	return -1;
     - }
     - 
     --static void close_cached_dir(struct cached_dir *cdir)
     -+static void close_cached_dir(struct cached_dir *cdir, int stop_at_first_file)
     - {
     - 	if (cdir->fdir)
     - 		closedir(cdir->fdir);
     -@@
     - 	 * We have gone through this directory and found no untracked
     - 	 * entries. Mark it valid.
     - 	 */
     --	if (cdir->untracked) {
     -+	if (!stop_at_first_file && cdir->untracked) {
     - 		cdir->untracked->valid = 1;
     - 		cdir->untracked->recurse = 1;
     - 	}
      @@
       	int stop_at_first_file, const struct pathspec *pathspec)
       {
     @@ -373,15 +349,6 @@
       			struct untracked_cache_dir *ud;
       			ud = lookup_untracked(dir->untracked, untracked,
       					      path.buf + baselen,
     -@@
     - 						    istate, &path, baselen,
     - 						    pathspec, state);
     - 	}
     --	close_cached_dir(&cdir);
     -+	close_cached_dir(&cdir, stop_at_first_file);
     -  out:
     - 	strbuf_release(&path);
     - 
      @@
       			      const char *path, int len,
       			      const struct pathspec *pathspec)
     @@ -423,342 +390,3 @@
       
       		if (state != path_recurse)
       			break; /* do not recurse into it */
     -
     - diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
     - --- a/t/t7063-status-untracked-cache.sh
     - +++ b/t/t7063-status-untracked-cache.sh
     -@@
     - dtwo/
     - three
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --three
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 
     - test_expect_success 'status first time (empty cache)' '
     -@@
     - EOF
     - 	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 0
     - directory invalidation: 1
     --opendir: 1
     -+opendir: 3
     - EOF
     - 	test_cmp ../trace.expect ../trace
     - 
     -@@
     - four
     - three
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --three
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     -@@
     - EOF
     - 	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 1
     - directory invalidation: 1
     - opendir: 4
     -@@
     - dtwo/
     - three
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --three
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_failure 'new info/exclude invalidates everything' '
     -+test_expect_success 'new info/exclude invalidates everything' '
     - 	avoid_racy &&
     - 	echo three >>.git/info/exclude &&
     - 	: >../trace &&
     -@@
     - EOF
     - 	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 1
     - directory invalidation: 0
     - opendir: 4
     -@@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'verify untracked cache dump' '
     -+test_expect_success 'verify untracked cache dump' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - dthree/
     - dtwo/
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     -@@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_failure 'status after the move' '
     -+test_expect_success 'status after the move' '
     - 	: >../trace &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     - 	git status --porcelain >../actual &&
     -@@
     - EOF
     - 	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 0
     - directory invalidation: 0
     --opendir: 1
     -+opendir: 3
     - EOF
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'verify untracked cache dump' '
     -+test_expect_success 'verify untracked cache dump' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - dtwo/
     - two
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     -@@
     - flags 00000006
     - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_failure 'status after the move' '
     -+test_expect_success 'status after the move' '
     - 	: >../trace &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     - 	git status --porcelain >../actual &&
     -@@
     - EOF
     - 	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 0
     - directory invalidation: 0
     --opendir: 1
     -+opendir: 3
     - EOF
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'verify untracked cache dump' '
     -+test_expect_success 'verify untracked cache dump' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - dthree/
     - dtwo/
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     -@@
     - 	git commit -m "first commit"
     - '
     - 
     --test_expect_failure 'status after commit' '
     -+test_expect_success 'status after commit' '
     - 	: >../trace &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     - 	git status --porcelain >../actual &&
     -@@
     - EOF
     - 	test_cmp ../status.expect ../actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 0
     - directory invalidation: 0
     --opendir: 2
     -+opendir: 4
     - EOF
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'untracked cache correct after commit' '
     -+test_expect_success 'untracked cache correct after commit' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - dthree/
     - dtwo/
     - /done/ 0000000000000000000000000000000000000000 recurse valid
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     -@@
     - 	sync_mtime
     - '
     - 
     --test_expect_failure 'test sparse status with untracked cache' '
     -+test_expect_success 'test sparse status with untracked cache' '
     - 	: >../trace &&
     - 	avoid_racy &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     - EOF
     - 	test_cmp ../status.expect ../status.actual &&
     - 	cat >../trace.expect <<EOF &&
     --node creation: 0
     -+node creation: 2
     - gitignore invalidation: 1
     - directory invalidation: 2
     --opendir: 2
     -+opendir: 4
     - EOF
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'untracked cache correct after status' '
     -+test_expect_success 'untracked cache correct after status' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - dtwo/
     - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     - five
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect ../actual
     - '
     - 
     --test_expect_failure 'test sparse status again with untracked cache' '
     -+test_expect_success 'test sparse status again with untracked cache' '
     - 	avoid_racy &&
     - 	: >../trace &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     - 	echo "sub" > done/sub/sub/file
     - '
     - 
     --test_expect_failure 'test sparse status with untracked cache and subdir' '
     -+test_expect_success 'test sparse status with untracked cache and subdir' '
     - 	avoid_racy &&
     - 	: >../trace &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'verify untracked cache dump (sparse/subdirs)' '
     -+test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
     - 	test-tool dump-untracked-cache >../actual &&
     - 	cat >../expect-from-test-dump <<EOF &&
     - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
     -@@
     - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid
     - five
     - sub/
     --/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     --sub/
     --/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid
     --file
     --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
     --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
     --two
     - EOF
     - 	test_cmp ../expect-from-test-dump ../actual
     - '
     - 
     --test_expect_failure 'test sparse status again with untracked cache and subdir' '
     -+test_expect_success 'test sparse status again with untracked cache and subdir' '
     - 	avoid_racy &&
     - 	: >../trace &&
     - 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
     -@@
     - 	test_cmp ../trace.expect ../trace
     - '
     - 
     --test_expect_failure 'move entry in subdir from untracked to cached' '
     -+test_expect_success 'move entry in subdir from untracked to cached' '
     - 	git add dtwo/two &&
     - 	git status --porcelain >../status.actual &&
     - 	cat >../status.expect <<EOF &&
     -@@
     - 	test_cmp ../status.expect ../status.actual
     - '
     - 
     --test_expect_failure 'move entry in subdir from cached to untracked' '
     -+test_expect_success 'move entry in subdir from cached to untracked' '
     - 	git rm --cached dtwo/two &&
     - 	git status --porcelain >../status.actual &&
     - 	cat >../status.expect <<EOF &&
     -@@
     - 	test_cmp ../expect-no-uc ../actual
     - '
     - 
     --test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' '
     -+test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' '
     - 	git config core.untrackedCache true &&
     - 	test-tool dump-untracked-cache >../actual &&
     - 	test_cmp ../expect-no-uc ../actual &&
     -@@
     - 	test_cmp ../expect-empty ../actual
     - '
     - 
     --test_expect_failure 'setting core.untrackedCache to keep' '
     -+test_expect_success 'setting core.untrackedCache to keep' '
     - 	git config core.untrackedCache keep &&
     - 	git update-index --untracked-cache &&
     - 	test-tool dump-untracked-cache >../actual &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 1/7] t7063: more thorough status checking
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
@ 2020-03-26 21:27       ` Elijah Newren via GitGitGadget
  2020-03-27 13:09         ` Derrick Stolee
  2020-03-26 21:27       ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

It turns out the t7063 has some testcases that even without using the
untracked cache cover situations that nothing else in the testsuite
handles.  Checking the results of
  git status --porcelain
both with and without the untracked cache, and comparing both against
our expected results helped uncover a critical bug in some dir.c
restructuring.

Unfortunately, it's not easy to run status and tell it to ignore the
untracked cache; the only knob we have it to instruct it to *delete*
(and ignore) the untracked cache.

Create a simple helper that will create a clone of the index that is
missing the untracked cache bits, and use it to compare that the results
with the untracked cache match the results we get without the untracked
cache.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 190ae149cf3..156d06c34e8 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -30,6 +30,30 @@ status_is_clean() {
 	test_must_be_empty ../status.actual
 }
 
+# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
+# compare commands side-by-side, e.g.
+#    iuc status --porcelain >expect &&
+#    git status --porcelain >actual &&
+#    test_cmp expect actual
+iuc() {
+	git ls-files -s >../current-index-entries
+	git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries
+
+	GIT_INDEX_FILE=.git/tmp_index
+	export GIT_INDEX_FILE
+	git update-index --index-info <../current-index-entries
+	git update-index --skip-worktree $(cat ../current-sparse-entries)
+
+	git -c core.untrackedCache=false "$@"
+	ret=$?
+
+	rm ../current-index-entries
+	rm $GIT_INDEX_FILE
+	unset GIT_INDEX_FILE
+
+	return $ret
+}
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -95,6 +119,8 @@ test_expect_success 'status first time (empty cache)' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 3
@@ -115,6 +141,8 @@ test_expect_success 'status second time (fully populated cache)' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -136,6 +164,7 @@ test_expect_success 'modify in root directory, one dir invalidation' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -145,6 +174,7 @@ A  two
 ?? four
 ?? three
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -183,6 +213,7 @@ test_expect_success 'new .gitignore invalidates recursively' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -192,6 +223,7 @@ A  two
 ?? dtwo/
 ?? three
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -230,6 +262,7 @@ test_expect_success 'new info/exclude invalidates everything' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -237,6 +270,7 @@ A  two
 ?? .gitignore
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -286,6 +320,7 @@ test_expect_success 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -293,6 +328,7 @@ A  one
 ?? dtwo/
 ?? two
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -343,6 +379,7 @@ test_expect_success 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -350,6 +387,7 @@ A  two
 ?? .gitignore
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -390,10 +428,12 @@ test_expect_success 'status after commit' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 ?? .gitignore
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -447,12 +487,14 @@ test_expect_success 'test sparse status with untracked cache' '
 	avoid_racy &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
 ?? done/five
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -487,12 +529,14 @@ test_expect_success 'test sparse status again with untracked cache' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
 ?? done/five
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -514,6 +558,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
@@ -521,6 +566,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 ?? done/sub/
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 2
@@ -560,6 +606,8 @@ test_expect_success 'test sparse status again with untracked cache and subdir' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -573,6 +621,7 @@ EOF
 test_expect_success 'move entry in subdir from untracked to cached' '
 	git add dtwo/two &&
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 A  dtwo/two
@@ -580,12 +629,14 @@ A  dtwo/two
 ?? done/five
 ?? done/sub/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual
 '
 
 test_expect_success 'move entry in subdir from cached to untracked' '
 	git rm --cached dtwo/two &&
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
@@ -593,6 +644,7 @@ test_expect_success 'move entry in subdir from cached to untracked' '
 ?? done/sub/
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 2/7] dir: fix simple typo in comment
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
@ 2020-03-26 21:27       ` Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index b460211e614..b505ba747bb 100644
--- a/dir.c
+++ b/dir.c
@@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir,
  * If 'stop_at_first_file' is specified, 'path_excluded' is returned
  * to signal that a file was found. This is the least significant value that
  * indicates that a file was encountered that does not depend on the order of
- * whether an untracked or exluded path was encountered first.
+ * whether an untracked or excluded path was encountered first.
  *
  * Returns the most significant path_treatment value encountered in the scan.
  * If 'stop_at_first_file' is specified, `path_excluded` is the most
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path()
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
@ 2020-03-26 21:27       ` Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 16e2cfa90993 ("read_directory(): further split treat_path()",
2010-01-08) split treat_one_path() out of treat_path(), because
treat_leading_path() would not have access to a dirent but wanted to
re-use as much of treat_path() as possible.  Not re-using all of
treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir:
fix checks on common prefix directory", 2019-12-19).  Finally, in commit
ad6f2157f951 ("dir: restructure in a way to avoid passing around a
struct dirent", 2020-01-16), dirents were removed from treat_path() and
other functions entirely.

Since the only reason for splitting these functions was the lack of a
dirent -- which no longer applies to either function -- and since the
split caused problems in the past resulting in us not using
treat_one_path() separately anymore, just undo the split.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 121 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 55 insertions(+), 66 deletions(-)

diff --git a/dir.c b/dir.c
index b505ba747bb..d0f3d660850 100644
--- a/dir.c
+++ b/dir.c
@@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate,
 	return dtype;
 }
 
-static enum path_treatment treat_one_path(struct dir_struct *dir,
-					  struct untracked_cache_dir *untracked,
-					  struct index_state *istate,
-					  struct strbuf *path,
-					  int baselen,
-					  const struct pathspec *pathspec,
-					  int dtype)
-{
-	int exclude;
-	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
+static enum path_treatment treat_path_fast(struct dir_struct *dir,
+					   struct untracked_cache_dir *untracked,
+					   struct cached_dir *cdir,
+					   struct index_state *istate,
+					   struct strbuf *path,
+					   int baselen,
+					   const struct pathspec *pathspec)
+{
+	strbuf_setlen(path, baselen);
+	if (!cdir->ucd) {
+		strbuf_addstr(path, cdir->file);
+		return path_untracked;
+	}
+	strbuf_addstr(path, cdir->ucd->name);
+	/* treat_one_path() does this before it calls treat_directory() */
+	strbuf_complete(path, '/');
+	if (cdir->ucd->check_only)
+		/*
+		 * check_only is set as a result of treat_directory() getting
+		 * to its bottom. Verify again the same set of directories
+		 * with check_only set.
+		 */
+		return read_directory_recursive(dir, istate, path->buf, path->len,
+						cdir->ucd, 1, 0, pathspec);
+	/*
+	 * We get path_recurse in the first run when
+	 * directory_exists_in_index() returns index_nonexistent. We
+	 * are sure that new changes in the index does not impact the
+	 * outcome. Return now.
+	 */
+	return path_recurse;
+}
+
+static enum path_treatment treat_path(struct dir_struct *dir,
+				      struct untracked_cache_dir *untracked,
+				      struct cached_dir *cdir,
+				      struct index_state *istate,
+				      struct strbuf *path,
+				      int baselen,
+				      const struct pathspec *pathspec)
+{
+	int has_path_in_index, dtype, exclude;
 	enum path_treatment path_treatment;
 
-	dtype = resolve_dtype(dtype, istate, path->buf, path->len);
+	if (!cdir->d_name)
+		return treat_path_fast(dir, untracked, cdir, istate, path,
+				       baselen, pathspec);
+	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
+		return path_none;
+	strbuf_setlen(path, baselen);
+	strbuf_addstr(path, cdir->d_name);
+	if (simplify_away(path->buf, path->len, pathspec))
+		return path_none;
+
+	dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len);
 
 	/* Always exclude indexed files */
+	has_path_in_index = !!index_file_exists(istate, path->buf, path->len,
+						ignore_case);
 	if (dtype != DT_DIR && has_path_in_index)
 		return path_none;
 
@@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	}
 }
 
-static enum path_treatment treat_path_fast(struct dir_struct *dir,
-					   struct untracked_cache_dir *untracked,
-					   struct cached_dir *cdir,
-					   struct index_state *istate,
-					   struct strbuf *path,
-					   int baselen,
-					   const struct pathspec *pathspec)
-{
-	strbuf_setlen(path, baselen);
-	if (!cdir->ucd) {
-		strbuf_addstr(path, cdir->file);
-		return path_untracked;
-	}
-	strbuf_addstr(path, cdir->ucd->name);
-	/* treat_one_path() does this before it calls treat_directory() */
-	strbuf_complete(path, '/');
-	if (cdir->ucd->check_only)
-		/*
-		 * check_only is set as a result of treat_directory() getting
-		 * to its bottom. Verify again the same set of directories
-		 * with check_only set.
-		 */
-		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, 0, pathspec);
-	/*
-	 * We get path_recurse in the first run when
-	 * directory_exists_in_index() returns index_nonexistent. We
-	 * are sure that new changes in the index does not impact the
-	 * outcome. Return now.
-	 */
-	return path_recurse;
-}
-
-static enum path_treatment treat_path(struct dir_struct *dir,
-				      struct untracked_cache_dir *untracked,
-				      struct cached_dir *cdir,
-				      struct index_state *istate,
-				      struct strbuf *path,
-				      int baselen,
-				      const struct pathspec *pathspec)
-{
-	if (!cdir->d_name)
-		return treat_path_fast(dir, untracked, cdir, istate, path,
-				       baselen, pathspec);
-	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
-		return path_none;
-	strbuf_setlen(path, baselen);
-	strbuf_addstr(path, cdir->d_name);
-	if (simplify_away(path->buf, path->len, pathspec))
-		return path_none;
-
-	return treat_one_path(dir, untracked, istate, path, baselen, pathspec,
-			      cdir->d_type);
-}
-
 static void add_untracked(struct untracked_cache_dir *dir, const char *name)
 {
 	if (!dir)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 4/7] dir: fix broken comment
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (2 preceding siblings ...)
  2020-03-26 21:27       ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
@ 2020-03-26 21:27       ` Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d0f3d660850..3a367683661 100644
--- a/dir.c
+++ b/dir.c
@@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 					add_untracked(untracked, path.buf + baselen);
 				break;
 			}
-			/* skip the dir_add_* part */
+			/* skip the add_path_to_appropriate_result_list() */
 			continue;
 		}
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 5/7] dir: fix confusion based on variable tense
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (3 preceding siblings ...)
  2020-03-26 21:27       ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
@ 2020-03-26 21:27       ` Elijah Newren via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Despite having contributed several fixes in this area, I have for months
(years?) assumed that the "exclude" variable was a directive; this
caused me to think of it as a different mode we operate in and left me
confused as I tried to build up a mental model around why we'd need such
a directive.  I mostly tried to ignore it while focusing on the pieces I
was trying to understand.

Then I finally traced this variable all back to a call to is_excluded(),
meaning it was actually functioning as an adjective.  In particular, it
was a checked property ("Does this path match a rule in .gitignore?"),
rather than a mode passed in from the caller.  Change the variable name
to match the part of speech used by the function called to define it,
which will hopefully make these bits of code slightly clearer to the
next reader.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index 3a367683661..8074e651e6f 100644
--- a/dir.c
+++ b/dir.c
@@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
 static enum path_treatment treat_directory(struct dir_struct *dir,
 	struct index_state *istate,
 	struct untracked_cache_dir *untracked,
-	const char *dirname, int len, int baselen, int exclude,
+	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
@@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		}
 		if (nested_repo)
 			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(exclude ? path_excluded : path_untracked));
+				(excluded ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
-		if (exclude &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
+		if (excluded &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
 
 			/*
 			 * This is an excluded directory and we are
@@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* This is the "show_other_directories" case */
 
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
@@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * the directory contains any files.
 	 */
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, exclude, pathspec);
+					untracked, 1, excluded, pathspec);
 }
 
 /*
@@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      int baselen,
 				      const struct pathspec *pathspec)
 {
-	int has_path_in_index, dtype, exclude;
+	int has_path_in_index, dtype, excluded;
 	enum path_treatment path_treatment;
 
 	if (!cdir->d_name)
@@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	    (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent))
 		return path_none;
 
-	exclude = is_excluded(dir, istate, path->buf, &dtype);
+	excluded = is_excluded(dir, istate, path->buf, &dtype);
 
 	/*
 	 * Excluded? If we don't explicitly want to show
 	 * ignored files, ignore it
 	 */
-	if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
+	if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
 		return path_excluded;
 
 	switch (dtype) {
@@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		strbuf_addch(path, '/');
 		path_treatment = treat_directory(dir, istate, untracked,
 						 path->buf, path->len,
-						 baselen, exclude, pathspec);
+						 baselen, excluded, pathspec);
 		/*
 		 * If 1) we only want to return directories that
 		 * match an exclude pattern and 2) this directory does
@@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		 * recurse into this directory (instead of marking the
 		 * directory itself as an ignored path).
 		 */
-		if (!exclude &&
+		if (!excluded &&
 		    path_treatment == path_excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
@@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		return path_treatment;
 	case DT_REG:
 	case DT_LNK:
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (4 preceding siblings ...)
  2020-03-26 21:27       ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-03-26 21:27       ` Derrick Stolee via GitGitGadget
  2020-03-26 21:27       ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index();

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index 8074e651e6f..d9bcb7e19b6 100644
--- a/dir.c
+++ b/dir.c
@@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
-
 	/* The "len-1" is to strip the final '/' */
-	switch (directory_exists_in_index(istate, dirname, len-1)) {
-	case index_directory:
-		return path_recurse;
+	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
-	case index_gitdir:
+	if (status == index_directory)
+		return path_recurse;
+	if (status == index_gitdir)
 		return path_none;
+	if (status != index_nonexistent)
+		BUG("Unhandled value for directory_exists_in_index: %d\n", status);
 
-	case index_nonexistent:
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    !(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			nested_repo = is_nonbare_repository_dir(&sb);
-			strbuf_release(&sb);
-		}
-		if (nested_repo)
-			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(excluded ? path_excluded : path_untracked));
+	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		!(dir->flags & DIR_NO_GITLINKS)) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, dirname);
+		nested_repo = is_nonbare_repository_dir(&sb);
+		strbuf_release(&sb);
+	}
+	if (nested_repo)
+		return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+			(excluded ? path_excluded : path_untracked));
 
-		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
-			break;
+	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
 		if (excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 7/7] dir: replace exponential algorithm with a linear one
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (5 preceding siblings ...)
  2020-03-26 21:27       ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
@ 2020-03-26 21:27       ` Elijah Newren via GitGitGadget
  2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in order
to walk the directory tree.  Treating of directories is sometimes weird
because there are so many different permutations about how to handle
directories.  Some examples:

   * 'git ls-files -o --directory' only needs to know that a directory
     itself is untracked; it doesn't need to recurse into it to see what
     is underneath.

   * 'git status' needs to recurse into an untracked directory, but only
     to determine whether or not it is empty.  If there are no files
     underneath, the directory itself will be omitted from the output.
     If it is not empty, only the directory will be listed.

   * 'git status --ignored' needs to recurse into untracked directories
     and report all the ignored entries and then report the directory as
     untracked -- UNLESS all the entries under the directory are
     ignored, in which case we don't print any of the entries under the
     directory and just report the directory itself as ignored.  (Note
     that although this forces us to walk all untracked files underneath
     the directory as well, we strip them from the output, except for
     users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.)

   * For 'git clean', we may need to recurse into a directory that
     doesn't match any specified pathspecs, if it's possible that there
     is an entry underneath the directory that can match one of the
     pathspecs.  In such a case, we need to be careful to omit the
     directory itself from the list of paths (see commit 404ebceda01c
     ("dir: also check directories for matching pathspecs", 2019-09-17))

Part of the tension noted above is that the treatment of a directory can
change based on the files within it, and based on the various settings
in dir->flags.  Trying to keep this in mind while reading over the code,
it is easy to think in terms of "treat_directory() tells us what to do
with a directory, and read_directory_recursive() is the thing that
recurses".  Since we need to look into a directory to know how to treat
it, though, it is quite easy to decide to (also) recurse into the
directory from treat_directory() by adding a read_directory_recursive()
call.  Adding such a call is actually fine, IF we make sure that
read_directory_recursive() does not also recurse into that same
directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory.  So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.  While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have been
considered untracked.  The code happened to work due to the side-effect
from the first invocation of adding untracked entries to dir->entries;
this allowed it to get the correct output despite the supposed override
in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation.  I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that probably
deserved better comments to begin with).  However, much of my work felt
more like a game of whackamole while attempting to make the code match
the existing regression tests than an attempt to create an
implementation that matched some clear design.  That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient.  Such a hope seems likely to be ill-founded,
given my experience with dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)
  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.  For
the following simple shell loop in an empty repository:

  for depth in $(seq 10 25)
  do
    dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
    rm -rf dir
    mkdir -p $dirs
    >$dirs/untracked-file
    /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
  done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00.  In fact, it isn't until a depth of 190 nested directories
that it sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested directories.
The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case.  It's not often
that you get to speed something up by a factor of 3*10^69.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 167 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 121 insertions(+), 46 deletions(-)

diff --git a/dir.c b/dir.c
index d9bcb7e19b6..29283fc2588 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
+	enum path_treatment state;
+	int nested_repo = 0, old_ignored_nr, check_only, stop_early;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
@@ -1711,18 +1717,117 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/* This is the "show_other_directories" case */
 
-	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+	/*
+	 * We only need to recurse into untracked/ignored directories if
+	 * either of the following bits is set:
+	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
+	 *                           there are ignored directories below)
+	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
+	 *                                 the directory is empty)
+	 */
+	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
 		return excluded ? path_excluded : path_untracked;
 
+	/*
+	 * If we have we don't want to know the all the paths under an
+	 * untracked or ignored directory, we still need to go into the
+	 * directory to determine if it is empty (because an empty directory
+	 * should be path_none instead of path_excluded or path_untracked).
+	 */
+	check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
+		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
+
+	/*
+	 * However, there's another optimization possible as a subset of
+	 * check_only, based on the cases we have to consider:
+	 *   A) Directory matches no exclude patterns:
+	 *     * Directory is empty => path_none
+	 *     * Directory has an untracked file under it => path_untracked
+	 *     * Directory has only ignored files under it => path_excluded
+	 *   B) Directory matches an exclude pattern:
+	 *     * Directory is empty => path_none
+	 *     * Directory has an untracked file under it => path_excluded
+	 *     * Directory has only ignored files under it => path_excluded
+	 * In case A, we can exit as soon as we've found an untracked
+	 * file but otherwise have to walk all files.  In case B, though,
+	 * we can stop at the first file we find under the directory.
+	 */
+	stop_early = check_only && excluded;
+
+	/*
+	 * If /every/ file within an untracked directory is ignored, then
+	 * we want to treat the directory as ignored (for e.g. status
+	 * --porcelain), without listing the individual ignored files
+	 * underneath.  To do so, we'll save the current ignored_nr, and
+	 * pop all the ones added after it if it turns out the entire
+	 * directory is ignored.
+	 */
+	old_ignored_nr = dir->ignored_nr;
+
+	/* Actually recurse into dirname now, we'll fixup the state later. */
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
+	state = read_directory_recursive(dir, istate, dirname, len, untracked,
+					 check_only, stop_early, pathspec);
+
+	/* There are a variety of reasons we may need to fixup the state... */
+	if (state == path_excluded) {
+		int i;
+
+		/*
+		 * When stop_early is set, read_directory_recursive() will
+		 * never return path_untracked regardless of whether
+		 * underlying paths were untracked or ignored (because
+		 * returning early means it excluded some paths, or
+		 * something like that -- see commit 5aaa7fd39aaf ("Improve
+		 * performance of git status --ignored", 2017-09-18)).
+		 * However, we're not really concerned with the status of
+		 * files under the directory, we just wanted to know
+		 * whether the directory was empty (state == path_none) or
+		 * not (state == path_excluded), and if not, we'd return
+		 * our original status based on whether the untracked
+		 * directory matched an exclusion pattern.
+		 */
+		if (stop_early)
+			state = excluded ? path_excluded : path_untracked;
+
+		else {
+			/*
+			 * When
+			 *     !stop_early && state == path_excluded
+			 * then all paths under dirname were ignored.  For
+			 * this case, git status --porcelain wants to just
+			 * list the directory itself as ignored and not
+			 * list the individual paths underneath.  Remove
+			 * the individual paths underneath.
+			 */
+			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
+				free(dir->ignored[i]);
+			dir->ignored_nr = old_ignored_nr;
+		}
+	}
+
+	/*
+	 * If there is nothing under the current directory and we are not
+	 * hiding empty directories, then we need to report on the
+	 * untracked or ignored status of the directory itself.
+	 */
+	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+		state = excluded ? path_excluded : path_untracked;
 
 	/*
-	 * If this is an excluded directory, then we only need to check if
-	 * the directory contains any files.
+	 * We can recurse into untracked directories that don't match any
+	 * of the given pathspecs when some file underneath the directory
+	 * might match one of the pathspecs.  If so, we should make sure
+	 * to note that the directory itself did not match.
 	 */
-	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, excluded, pathspec);
+	if (pathspec &&
+	    !match_pathspec(istate, pathspec, dirname, len,
+			    0 /* prefix */, NULL,
+			    0 /* do NOT special case dirs */))
+		state = path_none;
+
+	return state;
 }
 
 /*
@@ -1870,6 +1975,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 					   int baselen,
 					   const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
 	strbuf_setlen(path, baselen);
 	if (!cdir->ucd) {
 		strbuf_addstr(path, cdir->file);
@@ -2175,14 +2285,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in treat_leading_path().  See the commit message for the
-	 * commit adding this warning as well as the commit preceding it
-	 * for details.
+	 * WARNING: Do NOT recurse unless path_recurse is returned from
+	 *          treat_path().  Recursing on any other return value
+	 *          can result in exponential slowdown.
 	 */
-
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2204,13 +2310,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			dir_state = state;
 
 		/* recurse into subdir if instructed by treat_path */
-		if ((state == path_recurse) ||
-			((state == path_untracked) &&
-			 (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) &&
-			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  (pathspec &&
-			   do_match_pathspec(istate, pathspec, path.buf, path.len,
-					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
+		if (state == path_recurse) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -2294,15 +2394,6 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
-	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in read_directory_recursive().  See 777b420347 (dir:
-	 * synchronize treat_leading_path() and read_directory_recursive(),
-	 * 2019-12-19) and its parent commit for details.
-	 */
-
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf subdir = STRBUF_INIT;
 	int prevlen, baselen;
@@ -2353,23 +2444,7 @@ static int treat_leading_path(struct dir_struct *dir,
 		strbuf_reset(&subdir);
 		strbuf_add(&subdir, path+prevlen, baselen-prevlen);
 		cdir.d_name = subdir.buf;
-		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
-				    pathspec);
-		if (state == path_untracked &&
-		    resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
-		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
-				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
-			if (!match_pathspec(istate, pathspec, sb.buf, sb.len,
-					    0 /* prefix */, NULL,
-					    0 /* do NOT special case dirs */))
-				state = path_none;
-			add_path_to_appropriate_result_list(dir, NULL, &cdir,
-							    istate,
-							    &sb, baselen,
-							    pathspec, state);
-			state = path_recurse;
-		}
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
 
 		if (state != path_recurse)
 			break; /* do not recurse into it */
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 1/7] t7063: more thorough status checking
  2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
@ 2020-03-27 13:09         ` Derrick Stolee
  2020-03-29 18:18           ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-03-27 13:09 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 3/26/2020 5:27 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> It turns out the t7063 has some testcases that even without using the
> untracked cache cover situations that nothing else in the testsuite
> handles.  Checking the results of
>   git status --porcelain
> both with and without the untracked cache, and comparing both against
> our expected results helped uncover a critical bug in some dir.c
> restructuring.
> 
> Unfortunately, it's not easy to run status and tell it to ignore the
> untracked cache; the only knob we have it to instruct it to *delete*
> (and ignore) the untracked cache.
> 
> Create a simple helper that will create a clone of the index that is
> missing the untracked cache bits, and use it to compare that the results
> with the untracked cache match the results we get without the untracked
> cache.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
> index 190ae149cf3..156d06c34e8 100755
> --- a/t/t7063-status-untracked-cache.sh
> +++ b/t/t7063-status-untracked-cache.sh
> @@ -30,6 +30,30 @@ status_is_clean() {
>  	test_must_be_empty ../status.actual
>  }
>  
> +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
> +# compare commands side-by-side, e.g.
> +#    iuc status --porcelain >expect &&
> +#    git status --porcelain >actual &&
> +#    test_cmp expect actual
> +iuc() {
> +	git ls-files -s >../current-index-entries
> +	git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries
> +
> +	GIT_INDEX_FILE=.git/tmp_index
> +	export GIT_INDEX_FILE
> +	git update-index --index-info <../current-index-entries
> +	git update-index --skip-worktree $(cat ../current-sparse-entries)
> +
> +	git -c core.untrackedCache=false "$@"
> +	ret=$?
> +
> +	rm ../current-index-entries
> +	rm $GIT_INDEX_FILE
> +	unset GIT_INDEX_FILE
> +
> +	return $ret
> +}

This is a clever way to get around the untracked cache deletion.

Thanks for adding these extra comparisons! It really does help guarantee
that we are doing the right thing in each case.

-Stolee


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (6 preceding siblings ...)
  2020-03-26 21:27       ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-03-27 13:13       ` Derrick Stolee
  2020-03-28 17:33         ` Elijah Newren
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
  8 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-03-27 13:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 3/26/2020 5:27 PM, Elijah Newren via GitGitGadget wrote:
> This series provides some "modest" speedups (see last commit message), and
> should allow 'git status --ignored' to complete in a more reasonable
> timeframe for Martin Melka (see 
> https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
> ).
> 
> Changes since v3:
> 
>  * Turns out I was wrong about the untracked cache stuff and had some bugs
>    around untracked directories with nothing bug ignored sub-entries.
>  * First patch now is no longer a change of expectation of the untracked
>    cache, but some more thorough testing/verification in that test that
>    helped explain my misunderstanding and uncover the bug in my refactor.
>  * Corrected the check_only and stop_at_first_file logic in the last patch
>    and added a big comment explaining how/why it all works. Also stopped
>    disabling part of the untracked cache in the same patch, and undid all
>    the changes to t7063 in that patch.
> 
> Stuff still missing from v4:
> 
>  * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in 
>    https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/ 
>    which I think would make the code cleaner & clearer. I guess I'm leaving
>    that for future work.
> 
> As per the commit message of the final patch, this series has some risk.
> Extra eyes would be greatly appreciated; one pair already helped me find one
> bug.

I'm glad that I could help you discover mixed expectations. This pair of eyes
is now satisfied with this series to the extent I can check it.

Adding the previous patch to our microsoft/git fork pass the functional tests
in Scalar and VFS for Git, for what it's worth:

[1] https://github.com/microsoft/scalar/pull/358
[2] https://github.com/microsoft/VFSForGit/pull/1646

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
@ 2020-03-28 17:33         ` Elijah Newren
  2020-03-29 18:20           ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren @ 2020-03-28 17:33 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On Fri, Mar 27, 2020 at 6:13 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/26/2020 5:27 PM, Elijah Newren via GitGitGadget wrote:
> > This series provides some "modest" speedups (see last commit message), and
> > should allow 'git status --ignored' to complete in a more reasonable
> > timeframe for Martin Melka (see
> > https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
> > ).
> >
> > Changes since v3:
> >
> >  * Turns out I was wrong about the untracked cache stuff and had some bugs
> >    around untracked directories with nothing bug ignored sub-entries.
> >  * First patch now is no longer a change of expectation of the untracked
> >    cache, but some more thorough testing/verification in that test that
> >    helped explain my misunderstanding and uncover the bug in my refactor.
> >  * Corrected the check_only and stop_at_first_file logic in the last patch
> >    and added a big comment explaining how/why it all works. Also stopped
> >    disabling part of the untracked cache in the same patch, and undid all
> >    the changes to t7063 in that patch.
> >
> > Stuff still missing from v4:
> >
> >  * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in
> >    https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/
> >    which I think would make the code cleaner & clearer. I guess I'm leaving
> >    that for future work.
> >
> > As per the commit message of the final patch, this series has some risk.
> > Extra eyes would be greatly appreciated; one pair already helped me find one
> > bug.
>
> I'm glad that I could help you discover mixed expectations. This pair of eyes
> is now satisfied with this series to the extent I can check it.
>
> Adding the previous patch to our microsoft/git fork pass the functional tests
> in Scalar and VFS for Git, for what it's worth:
>
> [1] https://github.com/microsoft/scalar/pull/358
> [2] https://github.com/microsoft/VFSForGit/pull/1646

Thanks, that helps.

An update of my own for this series: Based on Felipe's reported
bash-completion issue I was modifying commands to try out a number of
other things and discovered some cases that can trigger the die("git
ls-files: internal error - directory entry not superset of prefix")
message from ls-files so there's still some fixes I need to make.
Will send an update when I've got it.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 1/7] t7063: more thorough status checking
  2020-03-27 13:09         ` Derrick Stolee
@ 2020-03-29 18:18           ` Junio C Hamano
  2020-03-31 20:15             ` Elijah Newren
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2020-03-29 18:18 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, git, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

Derrick Stolee <stolee@gmail.com> writes:

>> +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
>> +# compare commands side-by-side, e.g.
>> +#    iuc status --porcelain >expect &&
>> +#    git status --porcelain >actual &&
>> +#    test_cmp expect actual

;-)

>> +iuc() {

Missing SP after "iuc".

>> +	git ls-files -s >../current-index-entries
>> +	git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries

When you see yourself piping grep output to sed, think twice to see
if you can lose one of them.  sed -ne 's/^S.//p' perhaps?

>> +
>> +	GIT_INDEX_FILE=.git/tmp_index
>> +	export GIT_INDEX_FILE
>> +	git update-index --index-info <../current-index-entries
>> +	git update-index --skip-worktree $(cat ../current-sparse-entries)

Are the dances with ls-files and update-index to prepare us for a
possible future in which we do not use .git/index as the index file,
or something?  IOW, would 

	export GIT_INDEX_FILE=.git/tmp_index &&
	cp .git/index "$GIT_INDEX_FILE &&

be insufficient?

>> +
>> +	git -c core.untrackedCache=false "$@"
>> +	ret=$?
>> +
>> +	rm ../current-index-entries
>> +	rm $GIT_INDEX_FILE
>> +	unset GIT_INDEX_FILE
>> +
>> +	return $ret
>> +}
>
> This is a clever way to get around the untracked cache deletion.
>
> Thanks for adding these extra comparisons! It really does help guarantee
> that we are doing the right thing in each case.

Yes, I think it is a great idea to see tested commands behave the
same way with or without the untracked cache.

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-28 17:33         ` Elijah Newren
@ 2020-03-29 18:20           ` Junio C Hamano
  0 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2020-03-29 18:20 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List,
	Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

Elijah Newren <newren@gmail.com> writes:

> An update of my own for this series: Based on Felipe's reported
> bash-completion issue I was modifying commands to try out a number of
> other things and discovered some cases that can trigger the die("git
> ls-files: internal error - directory entry not superset of prefix")
> message from ls-files so there's still some fixes I need to make.
> Will send an update when I've got it.

Thanks.  This is uncomfortably exciting ;-)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 1/7] t7063: more thorough status checking
  2020-03-29 18:18           ` Junio C Hamano
@ 2020-03-31 20:15             ` Elijah Newren
  0 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren @ 2020-03-31 20:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List,
	Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On Sun, Mar 29, 2020 at 11:18 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Derrick Stolee <stolee@gmail.com> writes:
>
> >> +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
> >> +# compare commands side-by-side, e.g.
> >> +#    iuc status --porcelain >expect &&
> >> +#    git status --porcelain >actual &&
> >> +#    test_cmp expect actual
>
> ;-)
>
> >> +iuc() {
>
> Missing SP after "iuc".

Will fix.

> >> +    git ls-files -s >../current-index-entries
> >> +    git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries
>
> When you see yourself piping grep output to sed, think twice to see
> if you can lose one of them.  sed -ne 's/^S.//p' perhaps?

Ooh, thanks.  I have to admit that I don't know sed very well.  In
fact, 'sed -e s/pattern/replacement/' was the _only_ piece of sed I
knew.  But the -n flag and p modifier look handy; I think I ran across
them in perl before as well.

> >> +
> >> +    GIT_INDEX_FILE=.git/tmp_index
> >> +    export GIT_INDEX_FILE
> >> +    git update-index --index-info <../current-index-entries
> >> +    git update-index --skip-worktree $(cat ../current-sparse-entries)
>
> Are the dances with ls-files and update-index to prepare us for a
> possible future in which we do not use .git/index as the index file,
> or something?  IOW, would
>
>         export GIT_INDEX_FILE=.git/tmp_index &&
>         cp .git/index "$GIT_INDEX_FILE &&
>
> be insufficient?

I guess it's a matter of perspective.  Do we want to compare to how
git behaves when there is no untracked cache (as I was trying to
implement), or compare to how git behaves when there is an untracked
cache and git is told to remove it?  (The documentation for
core.untrackedCache doesn't actually say when
core.untrackedCache=false that git will ignore it, just that it will
delete the untracked cache when that option is set.  Perhaps if we do
go the route with your alternative, we at least need to update the
documentation as well and perhaps also audit the code to make sure it
ignores the untracked cache as I'd expect?  Or maybe we just need to
run two operations, one to delete the untracked cache, and then the
second that we are actually comparing to?)

> >> +
> >> +    git -c core.untrackedCache=false "$@"
> >> +    ret=$?
> >> +
> >> +    rm ../current-index-entries
> >> +    rm $GIT_INDEX_FILE
> >> +    unset GIT_INDEX_FILE
> >> +
> >> +    return $ret
> >> +}
> >
> > This is a clever way to get around the untracked cache deletion.
> >
> > Thanks for adding these extra comparisons! It really does help guarantee
> > that we are doing the right thing in each case.
>
> Yes, I think it is a great idea to see tested commands behave the
> same way with or without the untracked cache.

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 00/12] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (7 preceding siblings ...)
  2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
@ 2020-04-01  4:17       ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
                           ` (11 more replies)
  8 siblings, 12 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren

This series provides some "modest" speedups (see commit message for patch
8), and should allow 'git status --ignored' to complete in a more reasonable
timeframe for Martin Melka (see 
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
). It also cleans up the fill_directory() code and API, and fixes
bash-completion for 'git add untracked-dir/'.

Changes since v4:

 * cleanups suggested by Junio (patch 1)
 * new testcases that would have displayed multiple bugs with v4 (patch 2)
 * fixed the bugs with v4 (look for LEADING_PATHSPEC in patch 8)
 * fixed ANOTHER exponential slowdown codepath (look for MODE_MATCHING in
   patch 8)
 * make DIR_KEEP_UNTRACKED_CONTENTS less of a weird one-off (patch 9)
 * reduce number of calls to [do_]match_pathspec() (patch 10)
 * fix error-proneness of fill_directory() API (patch 11)
 * fix bash-completion results for 'git add' on an untracked dir (patch 12)

This is one of those rare patchsets that is absolutely perfect and
risk-free. That's right, bask in their glory and the ease of conscience from
using such solid stuff. Using this series will even innoculate you from bugs
outside of dir.c, and ones external to git, and even bugs external to your
computer. It's just that good. Pay no attention to the man behind the
curtain, er, I mean the huge warnings in patch 8, er...I mean what warnings?
There's no warnings to view, this stuff is solid as can be.

But if an extra pair of eyes wants to look at commit message in patch 8, or
at the new patches (2 and 9-12) and opine on how perfect everything looks
and feels, be my guest.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (11):
  t7063: more thorough status checking
  t3000: add more testcases testing a variety of ls-files issues
  dir: fix simple typo in comment
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one
  dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
  dir: replace double pathspec matching with single in treat_directory()
  Fix error-prone fill_directory() API; make it only return matches
  completion: fix 'git add' on paths under an untracked directory

 builtin/clean.c                        |   6 -
 builtin/grep.c                         |   2 -
 builtin/ls-files.c                     |   5 +-
 builtin/stash.c                        |  17 +-
 contrib/completion/git-completion.bash |   2 +-
 dir.c                                  | 422 +++++++++++++++----------
 t/t3000-ls-files-others.sh             | 121 +++++++
 t/t7063-status-untracked-cache.sh      |  52 +++
 t/t9902-completion.sh                  |   5 +
 wt-status.c                            |   6 +-
 10 files changed, 437 insertions(+), 201 deletions(-)


base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v5
Pull-Request: https://github.com/git/git/pull/700

Range-diff vs v4:

  1:  752403e339b !  1:  e2704245854 t7063: more thorough status checking
     @@ -11,8 +11,10 @@
          restructuring.
      
          Unfortunately, it's not easy to run status and tell it to ignore the
     -    untracked cache; the only knob we have it to instruct it to *delete*
     -    (and ignore) the untracked cache.
     +    untracked cache; the only knob we have is core.untrackedCache=false,
     +    which is used to instruct git to *delete* the untracked cache (which
     +    might also ignore the untracked cache when it operates, but that isn't
     +    specified in the docs).
      
          Create a simple helper that will create a clone of the index that is
          missing the untracked cache bits, and use it to compare that the results
     @@ -33,9 +35,9 @@
      +#    iuc status --porcelain >expect &&
      +#    git status --porcelain >actual &&
      +#    test_cmp expect actual
     -+iuc() {
     ++iuc () {
      +	git ls-files -s >../current-index-entries
     -+	git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries
     ++	git ls-files -t | sed -ne s/^S.//p >../current-sparse-entries
      +
      +	GIT_INDEX_FILE=.git/tmp_index
      +	export GIT_INDEX_FILE
  -:  ----------- >  2:  88e9d5d5dbd t3000: add more testcases testing a variety of ls-files issues
  2:  a4287d690be =  3:  38d4d5a46b1 dir: fix simple typo in comment
  3:  48f37e5b114 =  4:  eeb38a25f3a dir: consolidate treat_path() and treat_one_path()
  4:  b5ad1939379 =  5:  6e29f1f6aec dir: fix broken comment
  5:  2603c1a9d13 =  6:  62dae938c8f dir: fix confusion based on variable tense
  6:  576f364329d =  7:  25921cb792e dir: refactor treat_directory to clarify control flow
  7:  e20525429e5 !  8:  b2caa426790 dir: replace exponential algorithm with a linear one
     @@ -187,8 +187,23 @@
       
      -	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
      +	/*
     -+	 * We only need to recurse into untracked/ignored directories if
     -+	 * either of the following bits is set:
     ++	 * If we have a pathspec which could match something _below_ this
     ++	 * directory (e.g. when checking 'subdir/' having a pathspec like
     ++	 * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
     ++	 * need to recurse.
     ++	 */
     ++	if (pathspec) {
     ++		int ret = do_match_pathspec(istate, pathspec, dirname, len,
     ++					    0 /* prefix */, NULL /* seen */,
     ++					    DO_MATCH_LEADING_PATHSPEC);
     ++		if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
     ++			return path_recurse;
     ++	}
     ++
     ++	/*
     ++	 * Other than the path_recurse case immediately above, we only need
     ++	 * to recurse into untracked/ignored directories if either of the
     ++	 * following bits is set:
      +	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
      +	 *                           there are ignored directories below)
      +	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
     @@ -197,6 +212,16 @@
      +	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
       		return excluded ? path_excluded : path_untracked;
       
     ++	/*
     ++	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
     ++	 * recursing into ignored directories if the path is excluded and
     ++	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
     ++	 */
     ++	if (excluded &&
     ++	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
     ++	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
     ++		return path_excluded;
     ++
      +	/*
      +	 * If we have we don't want to know the all the paths under an
      +	 * untracked or ignored directory, we still need to go into the
     @@ -241,59 +266,52 @@
      +
      +	/* There are a variety of reasons we may need to fixup the state... */
      +	if (state == path_excluded) {
     -+		int i;
     -+
     -+		/*
     -+		 * When stop_early is set, read_directory_recursive() will
     -+		 * never return path_untracked regardless of whether
     -+		 * underlying paths were untracked or ignored (because
     -+		 * returning early means it excluded some paths, or
     -+		 * something like that -- see commit 5aaa7fd39aaf ("Improve
     -+		 * performance of git status --ignored", 2017-09-18)).
     -+		 * However, we're not really concerned with the status of
     -+		 * files under the directory, we just wanted to know
     -+		 * whether the directory was empty (state == path_none) or
     -+		 * not (state == path_excluded), and if not, we'd return
     -+		 * our original status based on whether the untracked
     -+		 * directory matched an exclusion pattern.
     ++		/* state == path_excluded implies all paths under
     ++		 * dirname were ignored...
     ++		 *
     ++		 * if running e.g. `git status --porcelain --ignored=matching`,
     ++		 * then we want to see the subpaths that are ignored.
     ++		 *
     ++		 * if running e.g. just `git status --porcelain`, then
     ++		 * we just want the directory itself to be listed as ignored
     ++		 * and not the individual paths underneath.
      +		 */
     -+		if (stop_early)
     -+			state = excluded ? path_excluded : path_untracked;
     ++		int want_ignored_subpaths =
     ++			((dir->flags & DIR_SHOW_IGNORED_TOO) &&
     ++			 (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING));
      +
     -+		else {
     ++		if (want_ignored_subpaths) {
      +			/*
     -+			 * When
     -+			 *     !stop_early && state == path_excluded
     -+			 * then all paths under dirname were ignored.  For
     -+			 * this case, git status --porcelain wants to just
     -+			 * list the directory itself as ignored and not
     -+			 * list the individual paths underneath.  Remove
     -+			 * the individual paths underneath.
     ++			 * with --ignored=matching, we want the subpaths
     ++			 * INSTEAD of the directory itself.
      +			 */
     ++			state = path_none;
     ++		} else {
     ++			int i;
      +			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
     -+				free(dir->ignored[i]);
     ++				FREE_AND_NULL(dir->ignored[i]);
      +			dir->ignored_nr = old_ignored_nr;
      +		}
      +	}
     -+
     -+	/*
     + 
     + 	/*
     +-	 * If this is an excluded directory, then we only need to check if
     +-	 * the directory contains any files.
      +	 * If there is nothing under the current directory and we are not
      +	 * hiding empty directories, then we need to report on the
      +	 * untracked or ignored status of the directory itself.
     -+	 */
     + 	 */
     +-	return read_directory_recursive(dir, istate, dirname, len,
     +-					untracked, 1, excluded, pathspec);
      +	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
      +		state = excluded ? path_excluded : path_untracked;
     - 
     - 	/*
     --	 * If this is an excluded directory, then we only need to check if
     --	 * the directory contains any files.
     ++
     ++	/*
      +	 * We can recurse into untracked directories that don't match any
      +	 * of the given pathspecs when some file underneath the directory
      +	 * might match one of the pathspecs.  If so, we should make sure
      +	 * to note that the directory itself did not match.
     - 	 */
     --	return read_directory_recursive(dir, istate, dirname, len,
     --					untracked, 1, excluded, pathspec);
     ++	 */
      +	if (pathspec &&
      +	    !match_pathspec(istate, pathspec, dirname, len,
      +			    0 /* prefix */, NULL,
     @@ -316,6 +334,47 @@
       	strbuf_setlen(path, baselen);
       	if (!cdir->ucd) {
       		strbuf_addstr(path, cdir->file);
     +@@
     + 				      const struct pathspec *pathspec)
     + {
     + 	int has_path_in_index, dtype, excluded;
     +-	enum path_treatment path_treatment;
     + 
     + 	if (!cdir->d_name)
     + 		return treat_path_fast(dir, untracked, cdir, istate, path,
     +@@
     + 	default:
     + 		return path_none;
     + 	case DT_DIR:
     +-		strbuf_addch(path, '/');
     +-		path_treatment = treat_directory(dir, istate, untracked,
     +-						 path->buf, path->len,
     +-						 baselen, excluded, pathspec);
     + 		/*
     +-		 * If 1) we only want to return directories that
     +-		 * match an exclude pattern and 2) this directory does
     +-		 * not match an exclude pattern but all of its
     +-		 * contents are excluded, then indicate that we should
     +-		 * recurse into this directory (instead of marking the
     +-		 * directory itself as an ignored path).
     ++		 * WARNING: Do not ignore/amend the return value from
     ++		 * treat_directory(), and especially do not change it to return
     ++		 * path_recurse as that can cause exponential slowdown.
     ++		 * Instead, modify treat_directory() to return the right value.
     + 		 */
     +-		if (!excluded &&
     +-		    path_treatment == path_excluded &&
     +-		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
     +-		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
     +-			return path_recurse;
     +-		return path_treatment;
     ++		strbuf_addch(path, '/');
     ++		return treat_directory(dir, istate, untracked,
     ++				       path->buf, path->len,
     ++				       baselen, excluded, pathspec);
     + 	case DT_REG:
     + 	case DT_LNK:
     + 		return excluded ? path_excluded : path_untracked;
      @@
       	int stop_at_first_file, const struct pathspec *pathspec)
       {
  -:  ----------- >  9:  08a10869816 dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
  -:  ----------- > 10:  cee74871e43 dir: replace double pathspec matching with single in treat_directory()
  -:  ----------- > 11:  61d9c9d758e Fix error-prone fill_directory() API; make it only return matches
  -:  ----------- > 12:  725adf0a9b8 completion: fix 'git add' on paths under an untracked directory

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 01/12] t7063: more thorough status checking
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget
                           ` (10 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

It turns out the t7063 has some testcases that even without using the
untracked cache cover situations that nothing else in the testsuite
handles.  Checking the results of
  git status --porcelain
both with and without the untracked cache, and comparing both against
our expected results helped uncover a critical bug in some dir.c
restructuring.

Unfortunately, it's not easy to run status and tell it to ignore the
untracked cache; the only knob we have is core.untrackedCache=false,
which is used to instruct git to *delete* the untracked cache (which
might also ignore the untracked cache when it operates, but that isn't
specified in the docs).

Create a simple helper that will create a clone of the index that is
missing the untracked cache bits, and use it to compare that the results
with the untracked cache match the results we get without the untracked
cache.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 190ae149cf3..69c39ff2e49 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -30,6 +30,30 @@ status_is_clean() {
 	test_must_be_empty ../status.actual
 }
 
+# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
+# compare commands side-by-side, e.g.
+#    iuc status --porcelain >expect &&
+#    git status --porcelain >actual &&
+#    test_cmp expect actual
+iuc () {
+	git ls-files -s >../current-index-entries
+	git ls-files -t | sed -ne s/^S.//p >../current-sparse-entries
+
+	GIT_INDEX_FILE=.git/tmp_index
+	export GIT_INDEX_FILE
+	git update-index --index-info <../current-index-entries
+	git update-index --skip-worktree $(cat ../current-sparse-entries)
+
+	git -c core.untrackedCache=false "$@"
+	ret=$?
+
+	rm ../current-index-entries
+	rm $GIT_INDEX_FILE
+	unset GIT_INDEX_FILE
+
+	return $ret
+}
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -95,6 +119,8 @@ test_expect_success 'status first time (empty cache)' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 3
@@ -115,6 +141,8 @@ test_expect_success 'status second time (fully populated cache)' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -136,6 +164,7 @@ test_expect_success 'modify in root directory, one dir invalidation' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -145,6 +174,7 @@ A  two
 ?? four
 ?? three
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -183,6 +213,7 @@ test_expect_success 'new .gitignore invalidates recursively' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -192,6 +223,7 @@ A  two
 ?? dtwo/
 ?? three
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -230,6 +262,7 @@ test_expect_success 'new info/exclude invalidates everything' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -237,6 +270,7 @@ A  two
 ?? .gitignore
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -286,6 +320,7 @@ test_expect_success 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -293,6 +328,7 @@ A  one
 ?? dtwo/
 ?? two
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -343,6 +379,7 @@ test_expect_success 'status after the move' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 A  done/one
 A  one
@@ -350,6 +387,7 @@ A  two
 ?? .gitignore
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -390,10 +428,12 @@ test_expect_success 'status after commit' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
 ?? .gitignore
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -447,12 +487,14 @@ test_expect_success 'test sparse status with untracked cache' '
 	avoid_racy &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
 ?? done/five
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -487,12 +529,14 @@ test_expect_success 'test sparse status again with untracked cache' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
 ?? done/five
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -514,6 +558,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
@@ -521,6 +566,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 ?? done/sub/
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 2
@@ -560,6 +606,8 @@ test_expect_success 'test sparse status again with untracked cache and subdir' '
 	: >../trace &&
 	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
 	cat >../trace.expect <<EOF &&
 node creation: 0
@@ -573,6 +621,7 @@ EOF
 test_expect_success 'move entry in subdir from untracked to cached' '
 	git add dtwo/two &&
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 A  dtwo/two
@@ -580,12 +629,14 @@ A  dtwo/two
 ?? done/five
 ?? done/sub/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual
 '
 
 test_expect_success 'move entry in subdir from cached to untracked' '
 	git rm --cached dtwo/two &&
 	git status --porcelain >../status.actual &&
+	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
  M done/two
 ?? .gitignore
@@ -593,6 +644,7 @@ test_expect_success 'move entry in subdir from cached to untracked' '
 ?? done/sub/
 ?? dtwo/
 EOF
+	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget
                           ` (9 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds seven new ls-files tests.  While currently all seven test
pass, my earlier rounds of restructuring dir.c to replace an exponential
algorithm with a linear one passed all the tests in the testsuite but
failed six of these seven new tests.  Add these tests to increase our
case coverage.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3000-ls-files-others.sh | 121 +++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/t/t3000-ls-files-others.sh b/t/t3000-ls-files-others.sh
index 0aefadacb05..ffdfb16f580 100755
--- a/t/t3000-ls-files-others.sh
+++ b/t/t3000-ls-files-others.sh
@@ -91,4 +91,125 @@ test_expect_success SYMLINKS 'ls-files --others with symlinked submodule' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup nested pathspec search' '
+	test_create_repo nested &&
+	(
+		cd nested &&
+
+		mkdir -p partially_tracked/untracked_dir &&
+		> partially_tracked/content &&
+		> partially_tracked/untracked_dir/file &&
+
+		mkdir -p untracked/deep &&
+		> untracked/deep/path &&
+		> untracked/deep/foo.c &&
+
+		git add partially_tracked/content
+	)
+'
+
+test_expect_success 'ls-files -o --directory with single deep dir pathspec' '
+	(
+		cd nested &&
+
+		git ls-files -o --directory untracked/deep/ >actual &&
+
+		cat <<-EOF >expect &&
+		untracked/deep/
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ls-files -o --directory with multiple dir pathspecs' '
+	(
+		cd nested &&
+
+		git ls-files -o --directory partially_tracked/ untracked/ >actual &&
+
+		cat <<-EOF >expect &&
+		partially_tracked/untracked_dir/
+		untracked/
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ls-files -o --directory with mix dir/file pathspecs' '
+	(
+		cd nested &&
+
+		git ls-files -o --directory partially_tracked/ untracked/deep/path >actual &&
+
+		cat <<-EOF >expect &&
+		partially_tracked/untracked_dir/
+		untracked/deep/path
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ls-files --o --directory with glob filetype match' '
+	(
+		cd nested &&
+
+		# globs kinda defeat --directory, but only for that pathspec
+		git ls-files --others --directory partially_tracked "untracked/*.c" >actual &&
+
+		cat <<-EOF >expect &&
+		partially_tracked/untracked_dir/
+		untracked/deep/foo.c
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ls-files --o --directory with mix of tracked states' '
+	(
+		cd nested &&
+
+		# globs kinda defeat --directory, but only for that pathspec
+		git ls-files --others --directory partially_tracked/ "untracked/?*" >actual &&
+
+		cat <<-EOF >expect &&
+		partially_tracked/untracked_dir/
+		untracked/deep/
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ls-files --o --directory with glob filetype match only' '
+	(
+		cd nested &&
+
+		git ls-files --others --directory "untracked/*.c" >actual &&
+
+		cat <<-EOF >expect &&
+		untracked/deep/foo.c
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ls-files --o --directory to get immediate paths under one dir only' '
+	(
+		cd nested &&
+
+		git ls-files --others --directory "untracked/?*" >actual &&
+
+		cat <<-EOF >expect &&
+		untracked/deep/
+		EOF
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 03/12] dir: fix simple typo in comment
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
                           ` (8 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index b460211e614..b505ba747bb 100644
--- a/dir.c
+++ b/dir.c
@@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir,
  * If 'stop_at_first_file' is specified, 'path_excluded' is returned
  * to signal that a file was found. This is the least significant value that
  * indicates that a file was encountered that does not depend on the order of
- * whether an untracked or exluded path was encountered first.
+ * whether an untracked or excluded path was encountered first.
  *
  * Returns the most significant path_treatment value encountered in the scan.
  * If 'stop_at_first_file' is specified, `path_excluded` is the most
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path()
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (2 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget
                           ` (7 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 16e2cfa90993 ("read_directory(): further split treat_path()",
2010-01-08) split treat_one_path() out of treat_path(), because
treat_leading_path() would not have access to a dirent but wanted to
re-use as much of treat_path() as possible.  Not re-using all of
treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir:
fix checks on common prefix directory", 2019-12-19).  Finally, in commit
ad6f2157f951 ("dir: restructure in a way to avoid passing around a
struct dirent", 2020-01-16), dirents were removed from treat_path() and
other functions entirely.

Since the only reason for splitting these functions was the lack of a
dirent -- which no longer applies to either function -- and since the
split caused problems in the past resulting in us not using
treat_one_path() separately anymore, just undo the split.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 121 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 55 insertions(+), 66 deletions(-)

diff --git a/dir.c b/dir.c
index b505ba747bb..d0f3d660850 100644
--- a/dir.c
+++ b/dir.c
@@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate,
 	return dtype;
 }
 
-static enum path_treatment treat_one_path(struct dir_struct *dir,
-					  struct untracked_cache_dir *untracked,
-					  struct index_state *istate,
-					  struct strbuf *path,
-					  int baselen,
-					  const struct pathspec *pathspec,
-					  int dtype)
-{
-	int exclude;
-	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
+static enum path_treatment treat_path_fast(struct dir_struct *dir,
+					   struct untracked_cache_dir *untracked,
+					   struct cached_dir *cdir,
+					   struct index_state *istate,
+					   struct strbuf *path,
+					   int baselen,
+					   const struct pathspec *pathspec)
+{
+	strbuf_setlen(path, baselen);
+	if (!cdir->ucd) {
+		strbuf_addstr(path, cdir->file);
+		return path_untracked;
+	}
+	strbuf_addstr(path, cdir->ucd->name);
+	/* treat_one_path() does this before it calls treat_directory() */
+	strbuf_complete(path, '/');
+	if (cdir->ucd->check_only)
+		/*
+		 * check_only is set as a result of treat_directory() getting
+		 * to its bottom. Verify again the same set of directories
+		 * with check_only set.
+		 */
+		return read_directory_recursive(dir, istate, path->buf, path->len,
+						cdir->ucd, 1, 0, pathspec);
+	/*
+	 * We get path_recurse in the first run when
+	 * directory_exists_in_index() returns index_nonexistent. We
+	 * are sure that new changes in the index does not impact the
+	 * outcome. Return now.
+	 */
+	return path_recurse;
+}
+
+static enum path_treatment treat_path(struct dir_struct *dir,
+				      struct untracked_cache_dir *untracked,
+				      struct cached_dir *cdir,
+				      struct index_state *istate,
+				      struct strbuf *path,
+				      int baselen,
+				      const struct pathspec *pathspec)
+{
+	int has_path_in_index, dtype, exclude;
 	enum path_treatment path_treatment;
 
-	dtype = resolve_dtype(dtype, istate, path->buf, path->len);
+	if (!cdir->d_name)
+		return treat_path_fast(dir, untracked, cdir, istate, path,
+				       baselen, pathspec);
+	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
+		return path_none;
+	strbuf_setlen(path, baselen);
+	strbuf_addstr(path, cdir->d_name);
+	if (simplify_away(path->buf, path->len, pathspec))
+		return path_none;
+
+	dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len);
 
 	/* Always exclude indexed files */
+	has_path_in_index = !!index_file_exists(istate, path->buf, path->len,
+						ignore_case);
 	if (dtype != DT_DIR && has_path_in_index)
 		return path_none;
 
@@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	}
 }
 
-static enum path_treatment treat_path_fast(struct dir_struct *dir,
-					   struct untracked_cache_dir *untracked,
-					   struct cached_dir *cdir,
-					   struct index_state *istate,
-					   struct strbuf *path,
-					   int baselen,
-					   const struct pathspec *pathspec)
-{
-	strbuf_setlen(path, baselen);
-	if (!cdir->ucd) {
-		strbuf_addstr(path, cdir->file);
-		return path_untracked;
-	}
-	strbuf_addstr(path, cdir->ucd->name);
-	/* treat_one_path() does this before it calls treat_directory() */
-	strbuf_complete(path, '/');
-	if (cdir->ucd->check_only)
-		/*
-		 * check_only is set as a result of treat_directory() getting
-		 * to its bottom. Verify again the same set of directories
-		 * with check_only set.
-		 */
-		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, 0, pathspec);
-	/*
-	 * We get path_recurse in the first run when
-	 * directory_exists_in_index() returns index_nonexistent. We
-	 * are sure that new changes in the index does not impact the
-	 * outcome. Return now.
-	 */
-	return path_recurse;
-}
-
-static enum path_treatment treat_path(struct dir_struct *dir,
-				      struct untracked_cache_dir *untracked,
-				      struct cached_dir *cdir,
-				      struct index_state *istate,
-				      struct strbuf *path,
-				      int baselen,
-				      const struct pathspec *pathspec)
-{
-	if (!cdir->d_name)
-		return treat_path_fast(dir, untracked, cdir, istate, path,
-				       baselen, pathspec);
-	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
-		return path_none;
-	strbuf_setlen(path, baselen);
-	strbuf_addstr(path, cdir->d_name);
-	if (simplify_away(path->buf, path->len, pathspec))
-		return path_none;
-
-	return treat_one_path(dir, untracked, istate, path, baselen, pathspec,
-			      cdir->d_type);
-}
-
 static void add_untracked(struct untracked_cache_dir *dir, const char *name)
 {
 	if (!dir)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 05/12] dir: fix broken comment
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (3 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
                           ` (6 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d0f3d660850..3a367683661 100644
--- a/dir.c
+++ b/dir.c
@@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 					add_untracked(untracked, path.buf + baselen);
 				break;
 			}
-			/* skip the dir_add_* part */
+			/* skip the add_path_to_appropriate_result_list() */
 			continue;
 		}
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 06/12] dir: fix confusion based on variable tense
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (4 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
                           ` (5 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Despite having contributed several fixes in this area, I have for months
(years?) assumed that the "exclude" variable was a directive; this
caused me to think of it as a different mode we operate in and left me
confused as I tried to build up a mental model around why we'd need such
a directive.  I mostly tried to ignore it while focusing on the pieces I
was trying to understand.

Then I finally traced this variable all back to a call to is_excluded(),
meaning it was actually functioning as an adjective.  In particular, it
was a checked property ("Does this path match a rule in .gitignore?"),
rather than a mode passed in from the caller.  Change the variable name
to match the part of speech used by the function called to define it,
which will hopefully make these bits of code slightly clearer to the
next reader.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index 3a367683661..8074e651e6f 100644
--- a/dir.c
+++ b/dir.c
@@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
 static enum path_treatment treat_directory(struct dir_struct *dir,
 	struct index_state *istate,
 	struct untracked_cache_dir *untracked,
-	const char *dirname, int len, int baselen, int exclude,
+	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
@@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		}
 		if (nested_repo)
 			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(exclude ? path_excluded : path_untracked));
+				(excluded ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
-		if (exclude &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			(dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
+		if (excluded &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
 
 			/*
 			 * This is an excluded directory and we are
@@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* This is the "show_other_directories" case */
 
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
@@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * the directory contains any files.
 	 */
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, exclude, pathspec);
+					untracked, 1, excluded, pathspec);
 }
 
 /*
@@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      int baselen,
 				      const struct pathspec *pathspec)
 {
-	int has_path_in_index, dtype, exclude;
+	int has_path_in_index, dtype, excluded;
 	enum path_treatment path_treatment;
 
 	if (!cdir->d_name)
@@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	    (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent))
 		return path_none;
 
-	exclude = is_excluded(dir, istate, path->buf, &dtype);
+	excluded = is_excluded(dir, istate, path->buf, &dtype);
 
 	/*
 	 * Excluded? If we don't explicitly want to show
 	 * ignored files, ignore it
 	 */
-	if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
+	if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO)))
 		return path_excluded;
 
 	switch (dtype) {
@@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		strbuf_addch(path, '/');
 		path_treatment = treat_directory(dir, istate, untracked,
 						 path->buf, path->len,
-						 baselen, exclude, pathspec);
+						 baselen, excluded, pathspec);
 		/*
 		 * If 1) we only want to return directories that
 		 * match an exclude pattern and 2) this directory does
@@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		 * recurse into this directory (instead of marking the
 		 * directory itself as an ignored path).
 		 */
-		if (!exclude &&
+		if (!excluded &&
 		    path_treatment == path_excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
@@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		return path_treatment;
 	case DT_REG:
 	case DT_LNK:
-		return exclude ? path_excluded : path_untracked;
+		return excluded ? path_excluded : path_untracked;
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (5 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Derrick Stolee via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
                           ` (4 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index();

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index 8074e651e6f..d9bcb7e19b6 100644
--- a/dir.c
+++ b/dir.c
@@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const struct pathspec *pathspec)
 {
 	int nested_repo = 0;
-
 	/* The "len-1" is to strip the final '/' */
-	switch (directory_exists_in_index(istate, dirname, len-1)) {
-	case index_directory:
-		return path_recurse;
+	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
-	case index_gitdir:
+	if (status == index_directory)
+		return path_recurse;
+	if (status == index_gitdir)
 		return path_none;
+	if (status != index_nonexistent)
+		BUG("Unhandled value for directory_exists_in_index: %d\n", status);
 
-	case index_nonexistent:
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    !(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			nested_repo = is_nonbare_repository_dir(&sb);
-			strbuf_release(&sb);
-		}
-		if (nested_repo)
-			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
-				(excluded ? path_excluded : path_untracked));
+	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		!(dir->flags & DIR_NO_GITLINKS)) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, dirname);
+		nested_repo = is_nonbare_repository_dir(&sb);
+		strbuf_release(&sb);
+	}
+	if (nested_repo)
+		return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+			(excluded ? path_excluded : path_untracked));
 
-		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
-			break;
+	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
 		if (excluded &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
 		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 08/12] dir: replace exponential algorithm with a linear one
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (6 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01 13:57           ` Derrick Stolee
  2020-04-01  4:17         ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget
                           ` (3 subsequent siblings)
  11 siblings, 1 reply; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in order
to walk the directory tree.  Treating of directories is sometimes weird
because there are so many different permutations about how to handle
directories.  Some examples:

   * 'git ls-files -o --directory' only needs to know that a directory
     itself is untracked; it doesn't need to recurse into it to see what
     is underneath.

   * 'git status' needs to recurse into an untracked directory, but only
     to determine whether or not it is empty.  If there are no files
     underneath, the directory itself will be omitted from the output.
     If it is not empty, only the directory will be listed.

   * 'git status --ignored' needs to recurse into untracked directories
     and report all the ignored entries and then report the directory as
     untracked -- UNLESS all the entries under the directory are
     ignored, in which case we don't print any of the entries under the
     directory and just report the directory itself as ignored.  (Note
     that although this forces us to walk all untracked files underneath
     the directory as well, we strip them from the output, except for
     users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.)

   * For 'git clean', we may need to recurse into a directory that
     doesn't match any specified pathspecs, if it's possible that there
     is an entry underneath the directory that can match one of the
     pathspecs.  In such a case, we need to be careful to omit the
     directory itself from the list of paths (see commit 404ebceda01c
     ("dir: also check directories for matching pathspecs", 2019-09-17))

Part of the tension noted above is that the treatment of a directory can
change based on the files within it, and based on the various settings
in dir->flags.  Trying to keep this in mind while reading over the code,
it is easy to think in terms of "treat_directory() tells us what to do
with a directory, and read_directory_recursive() is the thing that
recurses".  Since we need to look into a directory to know how to treat
it, though, it is quite easy to decide to (also) recurse into the
directory from treat_directory() by adding a read_directory_recursive()
call.  Adding such a call is actually fine, IF we make sure that
read_directory_recursive() does not also recurse into that same
directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory.  So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.  While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have been
considered untracked.  The code happened to work due to the side-effect
from the first invocation of adding untracked entries to dir->entries;
this allowed it to get the correct output despite the supposed override
in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation.  I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that probably
deserved better comments to begin with).  However, much of my work felt
more like a game of whackamole while attempting to make the code match
the existing regression tests than an attempt to create an
implementation that matched some clear design.  That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient.  Such a hope seems likely to be ill-founded,
given my experience with dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)
  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.  For
the following simple shell loop in an empty repository:

  for depth in $(seq 10 25)
  do
    dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
    rm -rf dir
    mkdir -p $dirs
    >$dirs/untracked-file
    /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
  done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00.  In fact, it isn't until a depth of 190 nested directories
that it sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested directories.
The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case.  It's not often
that you get to speed something up by a factor of 3*10^69.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 210 ++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 147 insertions(+), 63 deletions(-)

diff --git a/dir.c b/dir.c
index d9bcb7e19b6..1b3c095b5a4 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
+	enum path_treatment state;
+	int nested_repo = 0, old_ignored_nr, check_only, stop_early;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
@@ -1711,18 +1717,135 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/* This is the "show_other_directories" case */
 
-	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+	/*
+	 * If we have a pathspec which could match something _below_ this
+	 * directory (e.g. when checking 'subdir/' having a pathspec like
+	 * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
+	 * need to recurse.
+	 */
+	if (pathspec) {
+		int ret = do_match_pathspec(istate, pathspec, dirname, len,
+					    0 /* prefix */, NULL /* seen */,
+					    DO_MATCH_LEADING_PATHSPEC);
+		if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
+			return path_recurse;
+	}
+
+	/*
+	 * Other than the path_recurse case immediately above, we only need
+	 * to recurse into untracked/ignored directories if either of the
+	 * following bits is set:
+	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
+	 *                           there are ignored directories below)
+	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
+	 *                                 the directory is empty)
+	 */
+	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
 		return excluded ? path_excluded : path_untracked;
 
+	/*
+	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
+	 * recursing into ignored directories if the path is excluded and
+	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+	 */
+	if (excluded &&
+	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
+	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+		return path_excluded;
+
+	/*
+	 * If we have we don't want to know the all the paths under an
+	 * untracked or ignored directory, we still need to go into the
+	 * directory to determine if it is empty (because an empty directory
+	 * should be path_none instead of path_excluded or path_untracked).
+	 */
+	check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
+		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
+
+	/*
+	 * However, there's another optimization possible as a subset of
+	 * check_only, based on the cases we have to consider:
+	 *   A) Directory matches no exclude patterns:
+	 *     * Directory is empty => path_none
+	 *     * Directory has an untracked file under it => path_untracked
+	 *     * Directory has only ignored files under it => path_excluded
+	 *   B) Directory matches an exclude pattern:
+	 *     * Directory is empty => path_none
+	 *     * Directory has an untracked file under it => path_excluded
+	 *     * Directory has only ignored files under it => path_excluded
+	 * In case A, we can exit as soon as we've found an untracked
+	 * file but otherwise have to walk all files.  In case B, though,
+	 * we can stop at the first file we find under the directory.
+	 */
+	stop_early = check_only && excluded;
+
+	/*
+	 * If /every/ file within an untracked directory is ignored, then
+	 * we want to treat the directory as ignored (for e.g. status
+	 * --porcelain), without listing the individual ignored files
+	 * underneath.  To do so, we'll save the current ignored_nr, and
+	 * pop all the ones added after it if it turns out the entire
+	 * directory is ignored.
+	 */
+	old_ignored_nr = dir->ignored_nr;
+
+	/* Actually recurse into dirname now, we'll fixup the state later. */
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
+	state = read_directory_recursive(dir, istate, dirname, len, untracked,
+					 check_only, stop_early, pathspec);
+
+	/* There are a variety of reasons we may need to fixup the state... */
+	if (state == path_excluded) {
+		/* state == path_excluded implies all paths under
+		 * dirname were ignored...
+		 *
+		 * if running e.g. `git status --porcelain --ignored=matching`,
+		 * then we want to see the subpaths that are ignored.
+		 *
+		 * if running e.g. just `git status --porcelain`, then
+		 * we just want the directory itself to be listed as ignored
+		 * and not the individual paths underneath.
+		 */
+		int want_ignored_subpaths =
+			((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+			 (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING));
+
+		if (want_ignored_subpaths) {
+			/*
+			 * with --ignored=matching, we want the subpaths
+			 * INSTEAD of the directory itself.
+			 */
+			state = path_none;
+		} else {
+			int i;
+			for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i)
+				FREE_AND_NULL(dir->ignored[i]);
+			dir->ignored_nr = old_ignored_nr;
+		}
+	}
 
 	/*
-	 * If this is an excluded directory, then we only need to check if
-	 * the directory contains any files.
+	 * If there is nothing under the current directory and we are not
+	 * hiding empty directories, then we need to report on the
+	 * untracked or ignored status of the directory itself.
 	 */
-	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, excluded, pathspec);
+	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+		state = excluded ? path_excluded : path_untracked;
+
+	/*
+	 * We can recurse into untracked directories that don't match any
+	 * of the given pathspecs when some file underneath the directory
+	 * might match one of the pathspecs.  If so, we should make sure
+	 * to note that the directory itself did not match.
+	 */
+	if (pathspec &&
+	    !match_pathspec(istate, pathspec, dirname, len,
+			    0 /* prefix */, NULL,
+			    0 /* do NOT special case dirs */))
+		state = path_none;
+
+	return state;
 }
 
 /*
@@ -1870,6 +1993,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 					   int baselen,
 					   const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+	 */
 	strbuf_setlen(path, baselen);
 	if (!cdir->ucd) {
 		strbuf_addstr(path, cdir->file);
@@ -1904,7 +2032,6 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      const struct pathspec *pathspec)
 {
 	int has_path_in_index, dtype, excluded;
-	enum path_treatment path_treatment;
 
 	if (!cdir->d_name)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
@@ -1961,24 +2088,16 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	default:
 		return path_none;
 	case DT_DIR:
-		strbuf_addch(path, '/');
-		path_treatment = treat_directory(dir, istate, untracked,
-						 path->buf, path->len,
-						 baselen, excluded, pathspec);
 		/*
-		 * If 1) we only want to return directories that
-		 * match an exclude pattern and 2) this directory does
-		 * not match an exclude pattern but all of its
-		 * contents are excluded, then indicate that we should
-		 * recurse into this directory (instead of marking the
-		 * directory itself as an ignored path).
+		 * WARNING: Do not ignore/amend the return value from
+		 * treat_directory(), and especially do not change it to return
+		 * path_recurse as that can cause exponential slowdown.
+		 * Instead, modify treat_directory() to return the right value.
 		 */
-		if (!excluded &&
-		    path_treatment == path_excluded &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-			return path_recurse;
-		return path_treatment;
+		strbuf_addch(path, '/');
+		return treat_directory(dir, istate, untracked,
+				       path->buf, path->len,
+				       baselen, excluded, pathspec);
 	case DT_REG:
 	case DT_LNK:
 		return excluded ? path_excluded : path_untracked;
@@ -2175,14 +2294,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in treat_leading_path().  See the commit message for the
-	 * commit adding this warning as well as the commit preceding it
-	 * for details.
+	 * WARNING: Do NOT recurse unless path_recurse is returned from
+	 *          treat_path().  Recursing on any other return value
+	 *          can result in exponential slowdown.
 	 */
-
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2204,13 +2319,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			dir_state = state;
 
 		/* recurse into subdir if instructed by treat_path */
-		if ((state == path_recurse) ||
-			((state == path_untracked) &&
-			 (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) &&
-			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  (pathspec &&
-			   do_match_pathspec(istate, pathspec, path.buf, path.len,
-					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
+		if (state == path_recurse) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -2294,15 +2403,6 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
-	/*
-	 * WARNING WARNING WARNING:
-	 *
-	 * Any updates to the traversal logic here may need corresponding
-	 * updates in read_directory_recursive().  See 777b420347 (dir:
-	 * synchronize treat_leading_path() and read_directory_recursive(),
-	 * 2019-12-19) and its parent commit for details.
-	 */
-
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf subdir = STRBUF_INIT;
 	int prevlen, baselen;
@@ -2353,23 +2453,7 @@ static int treat_leading_path(struct dir_struct *dir,
 		strbuf_reset(&subdir);
 		strbuf_add(&subdir, path+prevlen, baselen-prevlen);
 		cdir.d_name = subdir.buf;
-		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
-				    pathspec);
-		if (state == path_untracked &&
-		    resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR &&
-		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
-		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
-				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
-			if (!match_pathspec(istate, pathspec, sb.buf, sb.len,
-					    0 /* prefix */, NULL,
-					    0 /* do NOT special case dirs */))
-				state = path_none;
-			add_path_to_appropriate_result_list(dir, NULL, &cdir,
-							    istate,
-							    &sb, baselen,
-							    pathspec, state);
-			state = path_recurse;
-		}
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
 
 		if (state != path_recurse)
 			break; /* do not recurse into it */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (7 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget
                           ` (2 subsequent siblings)
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Handling DIR_KEEP_UNTRACKED_CONTENTS within treat_directory() instead of
as a post-processing step in read_directory():
  * allows us to directly access and remove the relevant entries instead
    of needing to calculate which ones need to be removed
  * keeps the logic for directory handling in one location (and puts it
    closer the the logic for stripping out extra ignored entries, which
    seems logical).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 43 +++++++++++++++++++------------------------
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/dir.c b/dir.c
index 1b3c095b5a4..8be31df58c2 100644
--- a/dir.c
+++ b/dir.c
@@ -1665,7 +1665,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 *          you CAN'T DO BOTH.
 	 */
 	enum path_treatment state;
-	int nested_repo = 0, old_ignored_nr, check_only, stop_early;
+	int nested_repo = 0, check_only, stop_early;
+	int old_ignored_nr, old_untracked_nr;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
 
@@ -1785,9 +1786,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * --porcelain), without listing the individual ignored files
 	 * underneath.  To do so, we'll save the current ignored_nr, and
 	 * pop all the ones added after it if it turns out the entire
-	 * directory is ignored.
+	 * directory is ignored.  Also, when DIR_SHOW_IGNORED_TOO and
+	 * !DIR_KEEP_UNTRACKED_CONTENTS then we don't want to show
+	 * untracked paths so will need to pop all those off the last
+	 * after we traverse.
 	 */
 	old_ignored_nr = dir->ignored_nr;
+	old_untracked_nr = dir->nr;
 
 	/* Actually recurse into dirname now, we'll fixup the state later. */
 	untracked = lookup_untracked(dir->untracked, untracked,
@@ -1825,6 +1830,18 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		}
 	}
 
+	/*
+	 * We may need to ignore some of the untracked paths we found while
+	 * traversing subdirectories.
+	 */
+	if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+	    !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) {
+		int i;
+		for (i = old_untracked_nr + 1; i<dir->nr; ++i)
+			FREE_AND_NULL(dir->entries[i]);
+		dir->nr = old_untracked_nr;
+	}
+
 	/*
 	 * If there is nothing under the current directory and we are not
 	 * hiding empty directories, then we need to report on the
@@ -2653,28 +2670,6 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-	/*
-	 * If DIR_SHOW_IGNORED_TOO is set, read_directory_recursive() will
-	 * also pick up untracked contents of untracked dirs; by default
-	 * we discard these, but given DIR_KEEP_UNTRACKED_CONTENTS we do not.
-	 */
-	if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
-		     !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) {
-		int i, j;
-
-		/* remove from dir->entries untracked contents of untracked dirs */
-		for (i = j = 0; j < dir->nr; j++) {
-			if (i &&
-			    check_dir_entry_contains(dir->entries[i - 1], dir->entries[j])) {
-				FREE_AND_NULL(dir->entries[j]);
-			} else {
-				dir->entries[i++] = dir->entries[j];
-			}
-		}
-
-		dir->nr = i;
-	}
-
 	trace_performance_leave("read directory %.*s", len, path);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 10/12] dir: replace double pathspec matching with single in treat_directory()
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (8 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

treat_directory() had a call to both do_match_pathspec() and
match_pathspec().  These calls have migrated through the code somewhat
since their introduction, but we don't actually need both.  Replace the
two calls with one, and while at it, move the check earlier in order to
reduce the need for callers of fill_directory() to do post-filtering of
results.

The next patch will address post-filtering more forcefully and provide
more relevant history and context.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/dir.c b/dir.c
index 8be31df58c2..a67930dcff6 100644
--- a/dir.c
+++ b/dir.c
@@ -1665,6 +1665,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 *          you CAN'T DO BOTH.
 	 */
 	enum path_treatment state;
+	int matches_how = 0;
 	int nested_repo = 0, check_only, stop_early;
 	int old_ignored_nr, old_untracked_nr;
 	/* The "len-1" is to strip the final '/' */
@@ -1677,6 +1678,22 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (status != index_nonexistent)
 		BUG("Unhandled value for directory_exists_in_index: %d\n", status);
 
+	/*
+	 * We don't want to descend into paths that don't match the necessary
+	 * patterns.  Clearly, if we don't have a pathspec, then we can't check
+	 * for matching patterns.  Also, if (excluded) then we know we matched
+	 * the exclusion patterns so as an optimization we can skip checking
+	 * for matching patterns.
+	 */
+	if (pathspec && !excluded) {
+		matches_how = do_match_pathspec(istate, pathspec, dirname, len,
+						0 /* prefix */, NULL /* seen */,
+						DO_MATCH_LEADING_PATHSPEC);
+		if (!matches_how)
+			return path_none;
+	}
+
+
 	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		!(dir->flags & DIR_NO_GITLINKS)) {
 		struct strbuf sb = STRBUF_INIT;
@@ -1724,13 +1741,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
 	 * need to recurse.
 	 */
-	if (pathspec) {
-		int ret = do_match_pathspec(istate, pathspec, dirname, len,
-					    0 /* prefix */, NULL /* seen */,
-					    DO_MATCH_LEADING_PATHSPEC);
-		if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
-			return path_recurse;
-	}
+	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
+		return path_recurse;
 
 	/*
 	 * Other than the path_recurse case immediately above, we only need
@@ -1850,18 +1862,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
 		state = excluded ? path_excluded : path_untracked;
 
-	/*
-	 * We can recurse into untracked directories that don't match any
-	 * of the given pathspecs when some file underneath the directory
-	 * might match one of the pathspecs.  If so, we should make sure
-	 * to note that the directory itself did not match.
-	 */
-	if (pathspec &&
-	    !match_pathspec(istate, pathspec, dirname, len,
-			    0 /* prefix */, NULL,
-			    0 /* do NOT special case dirs */))
-		state = path_none;
-
 	return state;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (9 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  2020-04-01  4:17         ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Traditionally, the expected calling convention for the dir.c API was:

    fill_directory(&dir, ..., pathspec)
    foreach entry in dir->entries:
        if (dir_path_match(entry, pathspec))
            process_or_display(entry)

This may have made sense once upon a time, because the fill_directory() call
could use cheap checks to avoid doing full pathspec matching, and an external
caller may have wanted to do other post-processing of the results anyway.
However:

    * this structure makes it easy for users of the API to get it wrong

    * this structure actually makes it harder to understand
      fill_directory() and the functions it uses internally.  It has
      tripped me up several times while trying to fix bugs and
      restructure things.

    * relying on post-filtering was already found to produce wrong
      results; pathspec matching had to be added internally for multiple
      cases in order to get the right results (see commits 404ebceda01c
      (dir: also check directories for matching pathspecs, 2019-09-17)
      and 89a1f4aaf765 (dir: if our pathspec might match files under a
      dir, recurse into it, 2019-09-17))

    * it's bad for performance: fill_directory() already has to do lots
      of checks and knows the subset of cases where it still needs to do
      more checks.  Forcing external callers to do full pathspec
      matching means they must re-check _every_ path.

So, add the pathspec matching within the fill_directory() internals, and
remove it from external callers.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c    |  6 ------
 builtin/grep.c     |  2 --
 builtin/ls-files.c |  5 +++--
 builtin/stash.c    | 17 +++++------------
 dir.c              |  9 ++++++++-
 wt-status.c        |  6 ++----
 6 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 5abf087e7c4..b189b7b4ea0 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -989,12 +989,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		if (!cache_name_is_other(ent->name, ent->len))
 			continue;
 
-		if (pathspec.nr)
-			matches = dir_path_match(&the_index, ent, &pathspec, 0, NULL);
-
-		if (pathspec.nr && !matches)
-			continue;
-
 		if (lstat(ent->name, &st))
 			die_errno("Cannot lstat '%s'", ent->name);
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 50ce8d94612..f3425102999 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -691,8 +691,6 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 
 	fill_directory(&dir, opt->repo->index, pathspec);
 	for (i = 0; i < dir.nr; i++) {
-		if (!dir_path_match(opt->repo->index, dir.entries[i], pathspec, 0, NULL))
-			continue;
 		hit |= grep_file(opt, dir.entries[i]->name);
 		if (hit && opt->status_only)
 			break;
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index f069a028cea..b87c22ac240 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -128,8 +128,9 @@ static void show_dir_entry(const struct index_state *istate,
 	if (len > ent->len)
 		die("git ls-files: internal error - directory entry not superset of prefix");
 
-	if (!dir_path_match(istate, ent, &pathspec, len, ps_matched))
-		return;
+	/* If ps_matches is non-NULL, figure out which pathspec(s) match. */
+	if (ps_matched)
+		dir_path_match(istate, ent, &pathspec, len, ps_matched);
 
 	fputs(tag, stdout);
 	write_eolinfo(istate, NULL, ent->name);
diff --git a/builtin/stash.c b/builtin/stash.c
index 4ad3adf4ba5..704740b245c 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -856,30 +856,23 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 			       struct strbuf *untracked_files)
 {
 	int i;
-	int max_len;
 	int found = 0;
-	char *seen;
 	struct dir_struct dir;
 
 	memset(&dir, 0, sizeof(dir));
 	if (include_untracked != INCLUDE_ALL_FILES)
 		setup_standard_excludes(&dir);
 
-	seen = xcalloc(ps->nr, 1);
-
-	max_len = fill_directory(&dir, the_repository->index, ps);
+	fill_directory(&dir, the_repository->index, ps);
 	for (i = 0; i < dir.nr; i++) {
 		struct dir_entry *ent = dir.entries[i];
-		if (dir_path_match(&the_index, ent, ps, max_len, seen)) {
-			found++;
-			strbuf_addstr(untracked_files, ent->name);
-			/* NUL-terminate: will be fed to update-index -z */
-			strbuf_addch(untracked_files, '\0');
-		}
+		found++;
+		strbuf_addstr(untracked_files, ent->name);
+		/* NUL-terminate: will be fed to update-index -z */
+		strbuf_addch(untracked_files, '\0');
 		free(ent);
 	}
 
-	free(seen);
 	free(dir.entries);
 	free(dir.ignored);
 	clear_directory(&dir);
diff --git a/dir.c b/dir.c
index a67930dcff6..2de64910401 100644
--- a/dir.c
+++ b/dir.c
@@ -2117,7 +2117,14 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				       baselen, excluded, pathspec);
 	case DT_REG:
 	case DT_LNK:
-		return excluded ? path_excluded : path_untracked;
+		if (excluded)
+			return path_excluded;
+		if (pathspec &&
+		    !do_match_pathspec(istate, pathspec, path->buf, path->len,
+				       0 /* prefix */, NULL /* seen */,
+				       0 /* flags */))
+			return path_none;
+		return path_untracked;
 	}
 }
 
diff --git a/wt-status.c b/wt-status.c
index cc6f94504d9..98dfa6f73f9 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -722,16 +722,14 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 	for (i = 0; i < dir.nr; i++) {
 		struct dir_entry *ent = dir.entries[i];
-		if (index_name_is_other(istate, ent->name, ent->len) &&
-		    dir_path_match(istate, ent, &s->pathspec, 0, NULL))
+		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->untracked, ent->name);
 		free(ent);
 	}
 
 	for (i = 0; i < dir.ignored_nr; i++) {
 		struct dir_entry *ent = dir.ignored[i];
-		if (index_name_is_other(istate, ent->name, ent->len) &&
-		    dir_path_match(istate, ent, &s->pathspec, 0, NULL))
+		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->ignored, ent->name);
 		free(ent);
 	}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory
  2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
                           ` (10 preceding siblings ...)
  2020-04-01  4:17         ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget
@ 2020-04-01  4:17         ` Elijah Newren via GitGitGadget
  11 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-04-01  4:17 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Derrick Stolee,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

As reported on the git mailing list, since git-2.25,
    git add untracked-dir/
has been tab completing to
    git add untracked-dir/./

The cause for this was that with commit b9670c1f5e (dir: fix checks on
common prefix directory, 2019-12-19),
    git ls-files -o --directory untracked-dir/
(or the equivalent `git -C untracked-dir ls-files -o --directory`) began
reporting
    untracked-dir/
instead of listing paths underneath that directory.  It may also be
worth noting that the real command in question was
    git -C untracked-dir ls-files -o --directory '*'
which is equivalent to
    git ls-files -o --directory 'untracked-dir/*'
which behaves the same for the purposes of this issue (the '*' can match
the empty string), but becomes relevant for the proposed fix.

At first, based on the report, I decided to try to view this as a
regression and tried to find a way to recover the old behavior without
breaking other stuff, or at least breaking as little as possible.
However, in the end, I couldn't figure out a way to do it that wouldn't
just cause lots more problems than it solved.  The old behavior was a
bug:
  * Although older git would avoid cleaning anything with `git clean -f
    .git`, it would wipe out everything under that direcotry with `git
    clean -f .git/`.  Despite the difference in command used, this is
    relevant because the exact same change that fixed clean changed the
    behavior of ls-files.
  * Older git would report different results based solely on presence or
    absence of a trailing slash for $SUBDIR in the command `git ls-files
    -o --directory $SUBDIR`.
  * Older git violated the documented behavior of not recursing into
    directories that matched the pathspec when --directory was
    specified.
  * And, after all, commit b9670c1f5e (dir: fix checks on common prefix
    directory, 2019-12-19) didn't overlook this issue; it explicitly
    stated that the behavior of the command was being changed to bring
    it inline with the docs.

(Also, if it helps, despite that commit being merged during the 2.25
series, this bug was not reported during the 2.25 cycle, nor even during
most of the 2.26 cycle -- it was reported a day before 2.26 was
released.  So the impact of the change is at least somewhat small.)

Instead of relying on a bug of ls-files in reporting the wrong content,
change the invocation of ls-files used by git-completion to make it grab
paths one depth deeper.  Do this by changing '$DIR/*' (match $DIR/ plus
0 or more characters) into '$DIR/?*' (match $DIR/ plus 1 or more
characters).  Note that the '?' character should not be added when
trying to complete a filename (e.g. 'git ls-files -o --directory
"merge.c?*"' would not correctly return "merge.c" when such a file
exists), so we have to make sure to add the '?' character only in cases
where the path specified so far is a directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 contrib/completion/git-completion.bash | 2 +-
 t/t9902-completion.sh                  | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index e4d9ff4a95c..1032b642297 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -504,7 +504,7 @@ __git_index_files ()
 {
 	local root="$2" match="$3"
 
-	__git_ls_files_helper "$root" "$1" "$match" |
+	__git_ls_files_helper "$root" "$1" "${match:-?}" |
 	awk -F / -v pfx="${2//\\/\\\\}" '{
 		paths[$1] = 1
 	}
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 93877ba9cd6..d9a6425671f 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -1581,6 +1581,11 @@ test_expect_success 'complete files' '
 	echo modify > modified &&
 	test_completion "git add " "modified" &&
 
+	mkdir -p some/deep &&
+	touch some/deep/path &&
+	test_completion "git add some/" "some/deep" &&
+	git clean -f some &&
+
 	touch untracked &&
 
 	: TODO .gitignore should not be here &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v5 08/12] dir: replace exponential algorithm with a linear one
  2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-04-01 13:57           ` Derrick Stolee
  2020-04-01 15:59             ` Elijah Newren
  0 siblings, 1 reply; 68+ messages in thread
From: Derrick Stolee @ 2020-04-01 13:57 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 4/1/2020 12:17 AM, Elijah Newren via GitGitGadget wrote:
> @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  	const char *dirname, int len, int baselen, int excluded,
>  	const struct pathspec *pathspec)
>  {
> -	int nested_repo = 0;
> +	/*
> +	 * WARNING: From this function, you can return path_recurse or you
> +	 *          can call read_directory_recursive() (or neither), but
> +	 *          you CAN'T DO BOTH.
> +	 */
> +	enum path_treatment state;
> +	int nested_repo = 0, old_ignored_nr, check_only, stop_early;
>  	/* The "len-1" is to strip the final '/' */
>  	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>  
> @@ -1711,18 +1717,135 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  
>  	/* This is the "show_other_directories" case */
>  
> -	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> +	/*
> +	 * If we have a pathspec which could match something _below_ this
> +	 * directory (e.g. when checking 'subdir/' having a pathspec like
> +	 * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
> +	 * need to recurse.

I was extremely skeptical about this approach due to leading wildcards
like "*.c" or "sub*/*.h" but found this comment inside math_pathspec_item():

		/*
		 * Here is where we would perform a wildmatch to check if
		 * "name" can be matched as a directory (or a prefix) against
		 * the pathspec.  Since wildmatch doesn't have this capability
		 * at the present we have to punt and say that it is a match,
		 * potentially returning a false positive
		 * The submodules themselves will be able to perform more
		 * accurate matching to determine if the pathspec matches.
		 */
		return MATCHED_RECURSIVELY_LEADING_PATHSPEC;

So it looks like it will return MATCHED_RECURSIVELY_LEADING_PATHSPEC as
expected by this block below:

> +	 */
> +	if (pathspec) {
> +		int ret = do_match_pathspec(istate, pathspec, dirname, len,
> +					    0 /* prefix */, NULL /* seen */,
> +					    DO_MATCH_LEADING_PATHSPEC);
> +		if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
> +			return path_recurse;
> +	}

I can't say that I fully understand the change to this patch yet, but at
least my initial "THAT CAN'T POSSIBLY WORK!" reaction has been tempered.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v5 08/12] dir: replace exponential algorithm with a linear one
  2020-04-01 13:57           ` Derrick Stolee
@ 2020-04-01 15:59             ` Elijah Newren
  0 siblings, 0 replies; 68+ messages in thread
From: Elijah Newren @ 2020-04-01 15:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy

On Wed, Apr 1, 2020 at 6:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/1/2020 12:17 AM, Elijah Newren via GitGitGadget wrote:
> > @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >       const char *dirname, int len, int baselen, int excluded,
> >       const struct pathspec *pathspec)
> >  {
> > -     int nested_repo = 0;
> > +     /*
> > +      * WARNING: From this function, you can return path_recurse or you
> > +      *          can call read_directory_recursive() (or neither), but
> > +      *          you CAN'T DO BOTH.
> > +      */
> > +     enum path_treatment state;
> > +     int nested_repo = 0, old_ignored_nr, check_only, stop_early;
> >       /* The "len-1" is to strip the final '/' */
> >       enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
> >
> > @@ -1711,18 +1717,135 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >
> >       /* This is the "show_other_directories" case */
> >
> > -     if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> > +     /*
> > +      * If we have a pathspec which could match something _below_ this
> > +      * directory (e.g. when checking 'subdir/' having a pathspec like
> > +      * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
> > +      * need to recurse.
>
> I was extremely skeptical about this approach due to leading wildcards
> like "*.c" or "sub*/*.h" but found this comment inside math_pathspec_item():
>
>                 /*
>                  * Here is where we would perform a wildmatch to check if
>                  * "name" can be matched as a directory (or a prefix) against
>                  * the pathspec.  Since wildmatch doesn't have this capability
>                  * at the present we have to punt and say that it is a match,
>                  * potentially returning a false positive
>                  * The submodules themselves will be able to perform more
>                  * accurate matching to determine if the pathspec matches.
>                  */
>                 return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
>
> So it looks like it will return MATCHED_RECURSIVELY_LEADING_PATHSPEC as
> expected by this block below:
>
> > +      */
> > +     if (pathspec) {
> > +             int ret = do_match_pathspec(istate, pathspec, dirname, len,
> > +                                         0 /* prefix */, NULL /* seen */,
> > +                                         DO_MATCH_LEADING_PATHSPEC);
> > +             if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
> > +                     return path_recurse;
> > +     }
>
> I can't say that I fully understand the change to this patch yet, but at
> least my initial "THAT CAN'T POSSIBLY WORK!" reaction has been tempered.

I don't know if it helps you feel better about this block or not, but
it existed (in just slightly modified form) in dir.c before patch 7; I
just missed it when I was restructuring and thus didn't have it in my
first four rounds of this series.  (Funnily enough, I was the one who
added this LEADING_PATHSPEC logic to dir.c a while back, and you'd
think if I was going to overlook some code when I was restructuring,
that it surely couldn't be bits that I had added myself.)  So, that
basically means that dir.c has been relying on this logic for some
time, and I just needed to make sure to include it in this
restructuring.

Elijah

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, back to index

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-30 15:20   ` Derrick Stolee
2020-01-31 18:04   ` SZEDER Gábor
2020-01-31 18:17     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
2020-01-30 15:33   ` Derrick Stolee
2020-01-30 15:45     ` Elijah Newren
2020-01-30 16:00       ` Derrick Stolee
2020-01-30 16:10         ` Derrick Stolee
2020-01-30 16:20           ` Elijah Newren
2020-01-30 18:17             ` Derrick Stolee
2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-30 15:55   ` Derrick Stolee
2020-01-30 17:13     ` Elijah Newren
2020-01-30 17:45       ` Elijah Newren
2020-01-31 17:13   ` SZEDER Gábor
2020-01-31 17:47     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
2020-03-26 13:02       ` Derrick Stolee
2020-03-26 21:18         ` Elijah Newren
2020-03-25 19:31     ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
2020-03-26 13:13       ` Derrick Stolee
2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-03-27 13:09         ` Derrick Stolee
2020-03-29 18:18           ` Junio C Hamano
2020-03-31 20:15             ` Elijah Newren
2020-03-26 21:27       ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
2020-03-28 17:33         ` Elijah Newren
2020-03-29 18:20           ` Junio C Hamano
2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-04-01 13:57           ` Derrick Stolee
2020-04-01 15:59             ` Elijah Newren
2020-04-01  4:17         ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git