* [PATCH 0/8] Directory traversal bugs
@ 2019-12-09 20:47 Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
                   ` (8 more replies)
  0 siblings, 9 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano

This series fixes multiple fill_directory() bugs, one of them new in 2.24.0
(introduced by en/clean-nested-with-ignored-topic), the rest having been
around in versions of git going back as far as a decade. There is also one
testcase it documents but does not fix; I tracked the code for that testcase
far enough to determine that fill_directory() and its callees were not at
fault. Rather, the post-fill_directory() processing done by ls-files
eventually calls git_fnmatch(), which incorrectly filters out one of the
returned paths. I suspect there's a whole can of wildmatch() worms to be
found following that thread, so I just opted to document my findings next
to that testcase.

See https://lore.kernel.org/git/87fti15agv.fsf@kyleam.com/ for the report
that spawned this series.

Some comments about notable items in this series:

 * Patch 2: Revert a previous patch which fixed status --ignored behavior
   incorrectly and which complicates code that we will need to significantly
   restructure in order to fix all the issues we want to address (patches 6
   & 7 provide the right fix)
 * Patch 4: a fix for the new bugs introduced by my
   en/clean-nested-with-ignored-topic
 * Patches 6 & 7: the fixes to the old issues (the other patches add
   testcases, code cleanups, comment cleanups, etc.)

CC: blees@dcon.de, gitster@pobox.com, kyle@kyleam.com, sxlijin@gmail.com

Elijah Newren (8):
  t3011: demonstrate directory traversal failures
  Revert "dir.c: make 'git-status --ignored' work within leading
    directories"
  dir: remove stray quote character in comment
  dir: exit before wildcard fall-through if there is no wildcard
  dir: break part of read_directory_recursive() out for reuse
  dir: fix checks on common prefix directory
  dir: synchronize treat_leading_path() and read_directory_recursive()
  dir: consolidate similar code in treat_directory()

 dir.c                                         | 174 +++++++++++-----
 ...common-prefixes-and-directory-traversal.sh | 193 ++++++++++++++++++
 t/t7061-wtstatus-ignore.sh                    |   7 +-
 3 files changed, 323 insertions(+), 51 deletions(-)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh


base-commit: da72936f544fec5a335e66432610e4cef4430991
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-676%2Fnewren%2Fls-files-bug-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-676/newren/ls-files-bug-v1
Pull-Request: https://github.com/git/git/pull/676
-- 
gitgitgadget


* [PATCH 1/8] t3011: demonstrate directory traversal failures
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 21:06   ` Denton Liu
  2019-12-09 20:47 ` [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add several tests demonstrating directory traversal failures of various
sorts in dir.c (and one similar-looking test that turns out to be a
git_fnmatch bug).  A lot of these tests look like near duplicates of
each other, but an optimization in dir.c that pre-descends into a
common prefix, together with dir.c's special treatment of trailing
slashes, means these tiny differences are sometimes important and can
cause different codepaths to be exercised.

Of the 7 failing tests, 2 are new to git-2.24.0 (caused by side effects
of en/clean-nested-with-ignored-topic); the other 5 also failed
under git-2.23.0 and earlier.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 ...common-prefixes-and-directory-traversal.sh | 193 ++++++++++++++++++
 1 file changed, 193 insertions(+)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh

diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
new file mode 100755
index 0000000000..773d6038d1
--- /dev/null
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -0,0 +1,193 @@
+#!/bin/sh
+
+test_description='directory traversal handling, especially with common prefixes'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit hello &&
+
+	>empty &&
+	mkdir untracked_dir &&
+	>untracked_dir/empty &&
+	git init untracked_repo &&
+	>untracked_repo/empty &&
+
+	echo ignored >.gitignore &&
+	echo an_ignored_dir/ >>.gitignore &&
+	mkdir an_ignored_dir &&
+	mkdir an_untracked_dir &&
+	>an_ignored_dir/ignored &&
+	>an_ignored_dir/untracked &&
+	>an_untracked_dir/ignored &&
+	>an_untracked_dir/untracked
+'
+
+test_expect_success 'git ls-files -o shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_ignored_dir/ignored
+	an_ignored_dir/untracked
+	an_untracked_dir/ignored
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --exclude-standard shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o --exclude-standard >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_repo does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+	echo untracked_dir/empty >expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
+	echo untracked_dir/empty >expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+	echo untracked_dir/ >expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o --directory untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
+	echo untracked_dir/ >expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o .git shows nothing' '
+	>expect &&
+	git ls-files -o .git >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o .git/ shows nothing' '
+	>expect &&
+	git ls-files -o .git/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
+	mkdir "untracked_*" &&
+	>"untracked_*/empty" &&
+
+	echo "untracked_*/empty" >expect &&
+	echo untracked_dir/empty >>expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+# It turns out fill_directory returns the right paths, but ls-files' post-call
+# filtering in show_dir_entry() via calling dir_path_match() which ends up
+# in git_fnmatch() has logic for PATHSPEC_ONESTAR that assumes the pathspec
+# must match the full path; it doesn't check it for matching a leading
+# directory.
+test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
+	echo "untracked_*/empty" >expect &&
+	echo untracked_dir/empty >>expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
+	echo "untracked_*/" >expect &&
+	echo untracked_dir/ >>expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o --directory "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
+	echo "untracked_*/" >expect &&
+	echo untracked_dir/ >>expect &&
+	echo untracked_repo/ >>expect &&
+	git ls-files -o --directory "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o consistent between one or two dirs' '
+	git ls-files -o --exclude-standard an_ignored_dir/ an_untracked_dir/ >tmp &&
+	! grep ^an_ignored_dir/ tmp >expect &&
+	git ls-files -o --exclude-standard an_ignored_dir/ >actual &&
+	test_cmp expect actual
+'
+
+# ls-files doesn't have a way to request showing both untracked and ignored
+# files at the same time, so use `git status --ignored`
+test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+	cat <<-EOF >expect &&
+	?? an_untracked_dir/
+	!! an_untracked_dir/ignored
+	EOF
+	git status --porcelain --ignored >output &&
+	grep an_untracked_dir output >expect &&
+	git status --porcelain --ignored an_untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget



* [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 21:32   ` Denton Liu
  2019-12-09 20:47 ` [PATCH 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit be8a84c52669 ("dir.c: make 'git-status --ignored' work within
leading directories", 2013-04-15) noted that
   git status --ignored <SOMEPATH>
would not list ignored files and directories within <SOMEPATH> if
<SOMEPATH> was untracked, and modified the behavior to make it show
them.  However, it did so via a hack that broke consistency; it would
show paths under <SOMEPATH> differently than a simple
   git status --ignored | grep <SOMEPATH>
would show them.  A correct fix is somewhat more involved, and is
further complicated by this hack, so we revert this commit (but keep
corrected versions of the testcases) and will fix the original bug in a
subsequent patch.

Some history may be helpful:

A case very, very similar to the one addressed by the commit we are
reverting was raised in commit 48ffef966c76 ("ls-files: fix overeager
pathspec optimization", 2010-01-08), but that commit actually went in
somewhat the opposite direction.  It mentioned how
   git ls-files -o --exclude-standard t/
used to show untracked files under t/ even when t/ was ignored, and then
changed the behavior to stop showing untracked files under an ignored
directory.  More importantly, that commit considered keeping this
behavior but rejected it because it would be inconsistent with the
behavior when multiple pathspecs were specified.

The reason for this whole inconsistency when one pathspec is specified
versus zero or two is that common prefixes of pathspecs are sent
through a different set of checks (in treat_leading_path()) than normal
file/directory traversal (which goes through read_directory_recursive()
and treat_path()).  As such, for consistency, one needs to check that
both codepaths produce the same result.
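
To illustrate why a trailing slash or a second pathspec can change which
codepath runs, here is a minimal, simplified sketch of computing the
common leading directory of a list of pathspecs.  The function name
common_dir_prefix_len() is hypothetical; it assumes literal pathspecs
only (no wildcards, pathspec magic, or case folding) and is not the code
in dir.c.  Under these assumptions a non-zero result is the part that
would go through the leading-path checks, while everything after it is
left to the normal traversal.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified sketch: return the length of the longest
 * leading directory (ending in '/') shared by every pathspec in `specs`.
 * Assumes literal pathspecs only (no wildcards, magic or case folding).
 */
size_t common_dir_prefix_len(const char *const *specs, size_t nr)
{
	size_t max = 0;

	assert(nr == 0 || specs != NULL);
	for (size_t n = 0; n < nr; n++) {
		size_t i = 0, len = 0;

		/* Compare against the first pathspec, never past the current best. */
		while (specs[n][i] && (n == 0 || i < max)) {
			if (specs[n][i] != specs[0][i])
				break;
			if (specs[n][i] == '/')
				len = i + 1; /* last directory boundary seen so far */
			i++;
		}
		/* The first pathspec sets the bound; later ones can only shrink it. */
		if (n == 0 || len < max)
			max = len;
		if (!max)
			break; /* nothing in common; stop early */
	}
	return max;
}
```

With this sketch, "untracked_dir/" alone yields 14 (the whole
"untracked_dir/"), while "untracked_dir" alone, or "untracked_dir/" and
"untracked_repo/" together, yield 0, so only the first case would be
routed through the leading-path checks.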

Revert commit be8a84c526691667fc04a8241d93a3de1de298ab, except instead
of removing the testcase it added, modify it to check for correct and
consistent behavior.  A subsequent patch in this series will fix the
testcase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                      | 3 ---
 t/t7061-wtstatus-ignore.sh | 7 +++++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index 61f559f980..0dd5266629 100644
--- a/dir.c
+++ b/dir.c
@@ -2083,14 +2083,12 @@ static int treat_leading_path(struct dir_struct *dir,
 	struct strbuf sb = STRBUF_INIT;
 	int baselen, rc = 0;
 	const char *cp;
-	int old_flags = dir->flags;
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
 	baselen = 0;
-	dir->flags &= ~DIR_SHOW_OTHER_DIRECTORIES;
 	while (1) {
 		cp = path + baselen + !!baselen;
 		cp = memchr(cp, '/', path + len - cp);
@@ -2113,7 +2111,6 @@ static int treat_leading_path(struct dir_struct *dir,
 		}
 	}
 	strbuf_release(&sb);
-	dir->flags = old_flags;
 	return rc;
 }
 
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 0c394cf995..ded7f97181 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -43,11 +43,14 @@ test_expect_success 'status untracked directory with --ignored -u' '
 	test_cmp expected actual
 '
 cat >expected <<\EOF
-?? untracked/uncommitted
+?? untracked/
 !! untracked/ignored
 EOF
 
-test_expect_success 'status prefixed untracked directory with --ignored' '
+test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+	git status --porcelain --ignored | grep untracked/ >actual &&
+	test_cmp expected actual &&
+
 	git status --porcelain --ignored untracked/ >actual &&
 	test_cmp expected actual
 '
-- 
gitgitgadget



* [PATCH 3/8] dir: remove stray quote character in comment
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 0dd5266629..5dacacd469 100644
--- a/dir.c
+++ b/dir.c
@@ -373,7 +373,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
-		/* name" doesn't match up to the first wild character */
+		/* name doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
 		    ps_strncmp(item, match, name,
 			       item->nowildcard_len - prefix))
-- 
gitgitgadget



* [PATCH 4/8] dir: exit before wildcard fall-through if there is no wildcard
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2019-12-09 20:47 ` [PATCH 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The DO_MATCH_LEADING_PATHSPEC code had a fall-through case for when there
is a wildcard, noting that we don't yet have enough information to
determine whether further paths under the current directory might match
due to the presence of wildcards.  But if our pathspec has no wildcards,
then we should never reach that fall-through case.
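
The decision order after this change can be sketched as follows.  This
is a minimal illustration assuming a stripped-down, hypothetical
struct spec_item (pattern text, its length, and the length of its
wildcard-free leading part); it ignores the prefix offset, pathspec
magic and case folding handled by the real match_pathspec_item().
may_match_below() is an illustrative name, not a git function; it asks
whether some path under directory `name` could still match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pathspec item (not git's struct pathspec_item). */
struct spec_item {
	const char *match;     /* pattern text, e.g. "foo/ba*" */
	size_t len;            /* strlen(match) */
	size_t nowildcard_len; /* length of the leading part with no wildcard */
};

/* Could some path below directory `name` (NUL-terminated, namelen bytes) match? */
bool may_match_below(const struct spec_item *item,
		     const char *name, size_t namelen)
{
	assert(item && name);

	/* name is a literal leading directory of the pathspec */
	if (namelen < item->len && item->match[namelen] == '/' &&
	    !strncmp(item->match, name, namelen))
		return true;

	/* name doesn't match up to the first wildcard character */
	if (item->nowildcard_len < item->len &&
	    strncmp(item->match, name, item->nowildcard_len))
		return false;

	/* no wildcard and no leading-directory match: nothing below can match */
	if (item->nowildcard_len == item->len)
		return false;

	/* a wildcard remains; conservatively assume a match (may be a false positive) */
	return true;
}
```

Without the third check, asking whether anything under "untracked_dir"
could match the wildcard-free pathspec "untracked_repo" would fall
through to the final `return true`.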

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                              | 7 +++++++
 t/t3011-common-prefixes-and-directory-traversal.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 5dacacd469..517a569e10 100644
--- a/dir.c
+++ b/dir.c
@@ -379,6 +379,13 @@ static int match_pathspec_item(const struct index_state *istate,
 			       item->nowildcard_len - prefix))
 			return 0;
 
+		/*
+		 * name has no wildcard, and it didn't match as a leading
+		 * pathspec so return.
+		 */
+		if (item->nowildcard_len == item->len)
+			return 0;
+
 		/*
 		 * Here is where we would perform a wildmatch to check if
 		 * "name" can be matched as a directory (or a prefix) against
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 773d6038d1..d4c06fcd76 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -90,7 +90,7 @@ test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+test_expect_success 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
 	echo untracked_dir/empty >expect &&
 	echo untracked_repo/ >>expect &&
 	git ls-files -o untracked_dir untracked_repo >actual &&
@@ -104,7 +104,7 @@ test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses int
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
 	echo untracked_dir/ >expect &&
 	echo untracked_repo/ >>expect &&
 	git ls-files -o --directory untracked_dir untracked_repo >actual &&
-- 
gitgitgadget



* [PATCH 5/8] dir: break part of read_directory_recursive() out for reuse
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2019-12-09 20:47 ` [PATCH 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create an add_path_to_appropriate_result_list() function from the code
at the end of read_directory_recursive() so we can use it elsewhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 60 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/dir.c b/dir.c
index 517a569e10..645b44ea64 100644
--- a/dir.c
+++ b/dir.c
@@ -1932,6 +1932,40 @@ static void close_cached_dir(struct cached_dir *cdir)
 	}
 }
 
+static void add_path_to_appropriate_result_list(struct dir_struct *dir,
+	struct untracked_cache_dir *untracked,
+	struct cached_dir *cdir,
+	struct index_state *istate,
+	struct strbuf *path,
+	int baselen,
+	const struct pathspec *pathspec,
+	enum path_treatment state)
+{
+	/* add the path to the appropriate result list */
+	switch (state) {
+	case path_excluded:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			dir_add_name(dir, istate, path->buf, path->len);
+		else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			((dir->flags & DIR_COLLECT_IGNORED) &&
+			exclude_matches_pathspec(path->buf, path->len,
+						 pathspec)))
+			dir_add_ignored(dir, istate, path->buf, path->len);
+		break;
+
+	case path_untracked:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			break;
+		dir_add_name(dir, istate, path->buf, path->len);
+		if (cdir->fdir)
+			add_untracked(untracked, path->buf + baselen);
+		break;
+
+	default:
+		break;
+	}
+}
+
 /*
  * Read a directory tree. We currently ignore anything but
  * directories, regular files and symlinks. That's because git
@@ -2035,29 +2069,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			continue;
 		}
 
-		/* add the path to the appropriate result list */
-		switch (state) {
-		case path_excluded:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				dir_add_name(dir, istate, path.buf, path.len);
-			else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-				((dir->flags & DIR_COLLECT_IGNORED) &&
-				exclude_matches_pathspec(path.buf, path.len,
-							 pathspec)))
-				dir_add_ignored(dir, istate, path.buf, path.len);
-			break;
-
-		case path_untracked:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				break;
-			dir_add_name(dir, istate, path.buf, path.len);
-			if (cdir.fdir)
-				add_untracked(untracked, path.buf + baselen);
-			break;
-
-		default:
-			break;
-		}
+		add_path_to_appropriate_result_list(dir, untracked, &cdir,
+						    istate, &path, baselen,
+						    pathspec, state);
 	}
 	close_cached_dir(&cdir);
  out:
-- 
gitgitgadget



* [PATCH 6/8] dir: fix checks on common prefix directory
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2019-12-09 20:47 ` [PATCH 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Many years ago, the directory traversing logic had an optimization that
would always recurse into any directory that was a common prefix of all
the pathspecs without walking the leading directories to get down to
the desired directory.  Thus,
   git ls-files -o .git/                        # case A
would notice that .git/ was a common prefix of all pathspecs (since
it is the only pathspec listed), and then traverse into it and start
showing unknown files under that directory.  Unfortunately, .git/ is not
a directory we should be traversing into, which made this optimization
problematic.  This also affected cases like
   git ls-files -o --exclude-standard t/        # case B
where t/ is listed in .gitignore and thus isn't interesting and
shouldn't be recursed into.  It also affected cases like
   git ls-files -o --directory untracked_dir/   # case C
where untracked_dir/ is indeed untracked and thus interesting, but the
--directory flag means we only want to show the directory itself, not
recurse into it and start listing untracked files below it.

The case B class of bugs was noted and fixed in commits 16e2cfa90993
("read_directory(): further split treat_path()", 2010-01-08) and
48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), with the idea being that we first wanted to check whether
the common prefix was interesting.  The former patch noted that
treat_path() couldn't be used when checking the common prefix because
treat_path() requires a dirent and we haven't read any directories
at the point we are checking the common prefix.  So, that patch split
treat_one_path() out of treat_path().  The latter patch then created a
new treat_leading_path() which duplicated by hand the bits of
treat_path() that couldn't be broken out and then called
treat_one_path() for the remainder.  There were three problems with this
approach:

  * The duplicated logic in treat_leading_path() accidentally missed the
    check for special paths (such as is_dot_or_dotdot and matching
    ".git"), causing case A types of bugs to continue to be an issue.
  * The treat_leading_path() logic assumed we should traverse into
    anything where path_treatment was not path_none, i.e. it perpetuated
    case C types of bugs.
  * It meant we had split logic that needed to be kept in sync, running
    the risk that people would introduce new inconsistencies (such as in
    commit be8a84c52669, which we reverted earlier in this series, or in
    commit df5bcdf83ae, which we'll fix in a subsequent commit).

Fix most of these problems by making treat_leading_path() not only loop
over each leading path component, but also call treat_path() directly on
each.  To do so, we have to create a synthetic dirent, but that only
takes a few lines.  Then, pay attention to the path_treatment result we
get from treat_path() and don't treat path_excluded, path_untracked, and
path_recurse all the same as path_recurse.
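
The per-component walk described above can be sketched as follows.  This
is a simplified illustration, not the code in this patch: enum treatment,
classify_fn, walk_leading_path() and the toy demo_classify() are all
hypothetical, the classifier merely stands in for treat_path(), and the
is_directory() check and result-list bookkeeping are omitted.  The walk
stops at the first component whose treatment is anything other than
"recurse" and reports that treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of enum path_treatment. */
enum treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/* Classify component `name` (namelen bytes) found under `prefix` (prefixlen bytes). */
typedef enum treatment (*classify_fn)(const char *prefix, size_t prefixlen,
				      const char *name, size_t namelen);

/* Toy classifier: never enter .git; report untracked_dir as a whole (as with --directory). */
enum treatment demo_classify(const char *prefix, size_t prefixlen,
			     const char *name, size_t namelen)
{
	(void)prefix;
	(void)prefixlen;
	if (namelen == 4 && !memcmp(name, ".git", 4))
		return T_NONE;
	if (namelen == 13 && !memcmp(name, "untracked_dir", 13))
		return T_UNTRACKED;
	return T_RECURSE;
}

/*
 * Walk "foo/bar/baz/" as "foo", then "bar" under "foo/", then "baz" under
 * "foo/bar/".  Return T_RECURSE only if every component says so; otherwise
 * return the treatment of the component where the walk stopped.
 */
enum treatment walk_leading_path(const char *path, size_t len,
				 classify_fn classify)
{
	size_t prevlen, baselen = 0;
	enum treatment state = T_NONE;

	assert(path && classify);
	while (len && path[len - 1] == '/')
		len--; /* trailing slashes are not significant */
	if (!len)
		return T_RECURSE; /* empty leading path: start at the top */

	for (;;) {
		const char *cp;

		prevlen = baselen + !!baselen; /* step past the parent's '/' */
		cp = memchr(path + prevlen, '/', len - prevlen);
		baselen = cp ? (size_t)(cp - path) : len;
		state = classify(path, prevlen, path + prevlen, baselen - prevlen);
		if (state != T_RECURSE || baselen >= len)
			break; /* stopped early, or all components checked */
	}
	return state;
}
```

With the toy classifier, ".git/info/" stops at ".git" (T_NONE), and
"untracked_dir/" is reported as T_UNTRACKED rather than descended into.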

This leaves one remaining problem, the new inconsistency from commit
df5bcdf83ae.  That will be addressed in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 54 +++++++++++++++----
 ...common-prefixes-and-directory-traversal.sh |  6 +--
 2 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/dir.c b/dir.c
index 645b44ea64..9c71a9ac21 100644
--- a/dir.c
+++ b/dir.c
@@ -2102,37 +2102,69 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const struct pathspec *pathspec)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int baselen, rc = 0;
+	int prevlen, baselen;
 	const char *cp;
+	struct cached_dir cdir;
+	struct dirent de;
+	enum path_treatment state = path_none;
+
+	/*
+	 * For each directory component of path, we are going to check whether
+	 * that path is relevant given the pathspec.  For example, if path is
+	 *    foo/bar/baz/
+	 * then we will ask treat_path() whether we should go into foo, then
+	 * whether we should go into bar, then whether baz is relevant.
+	 * Checking each is important because e.g. if path is
+	 *    .git/info/
+	 * then we need to check .git to know we shouldn't traverse it.
+	 * If the return from treat_path() is:
+	 *    * path_none, for any path, we return false.
+	 *    * path_recurse, for all path components, we return true
+	 *    * <anything else> for some intermediate component, we make sure
+	 *        to add that path to the relevant list but return false
+	 *        signifying that we shouldn't recurse into it.
+	 */
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
+
+	memset(&cdir, 0, sizeof(cdir));
+	memset(&de, 0, sizeof(de));
+	cdir.de = &de;
+	de.d_type = DT_DIR;
 	baselen = 0;
+	prevlen = 0;
 	while (1) {
-		cp = path + baselen + !!baselen;
+		prevlen = baselen + !!baselen;
+		cp = path + prevlen;
 		cp = memchr(cp, '/', path + len - cp);
 		if (!cp)
 			baselen = len;
 		else
 			baselen = cp - path;
-		strbuf_setlen(&sb, 0);
+		strbuf_reset(&sb);
 		strbuf_add(&sb, path, baselen);
 		if (!is_directory(sb.buf))
 			break;
-		if (simplify_away(sb.buf, sb.len, pathspec))
-			break;
-		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
-				   DT_DIR, NULL) == path_none)
+		strbuf_reset(&sb);
+		strbuf_add(&sb, path, prevlen);
+		memcpy(de.d_name, path+prevlen, baselen-prevlen);
+		de.d_name[baselen-prevlen] = '\0';
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
+				    pathspec);
+		if (state != path_recurse)
 			break; /* do not recurse into it */
-		if (len <= baselen) {
-			rc = 1;
+		if (len <= baselen)
 			break; /* finished checking */
-		}
 	}
+	add_path_to_appropriate_result_list(dir, NULL, &cdir, istate,
+					    &sb, baselen, pathspec,
+					    state);
+
 	strbuf_release(&sb);
-	return rc;
+	return state == path_recurse;
 }
 
 static const char *get_ident_string(void)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index d4c06fcd76..0151ea8b6d 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -72,7 +72,7 @@ test_expect_success 'git ls-files -o --directory untracked_dir does not recurse'
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir/ does not recurse' '
 	echo untracked_dir/ >expect &&
 	git ls-files -o --directory untracked_dir/ >actual &&
 	test_cmp expect actual
@@ -84,7 +84,7 @@ test_expect_success 'git ls-files -o untracked_repo does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+test_expect_success 'git ls-files -o untracked_repo/ does not recurse' '
 	echo untracked_repo/ >expect &&
 	git ls-files -o untracked_repo/ >actual &&
 	test_cmp expect actual
@@ -124,7 +124,7 @@ test_expect_success 'git ls-files -o .git shows nothing' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o .git/ shows nothing' '
+test_expect_success 'git ls-files -o .git/ shows nothing' '
 	>expect &&
 	git ls-files -o .git/ >actual &&
 	test_cmp expect actual
-- 
gitgitgadget



* [PATCH 7/8] dir: synchronize treat_leading_path() and read_directory_recursive()
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2019-12-09 20:47 ` [PATCH 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-09 20:47 ` [PATCH 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our optimization to avoid calling into read_directory_recursive() when
all pathspecs have a common leading directory means that we need to match
the logic that read_directory_recursive() would use if we had just
called it from the root.  Since it does more than call treat_path(), we
need to copy that same logic.

Alternatively, we could try to change treat_path to return path_recurse
for an untracked directory under the given special circumstances that
this logic checks for, but a simple switch results in many test failures
such as 'git clean -d' not wiping out untracked but empty directories.
To work around that, we'd need the caller of treat_path to check for
path_recurse and sometimes special case it into path_untracked.  In
other words, we'd still have extra logic in both places.

Needing to duplicate logic like this means it is guaranteed someone will
eventually need to make further changes and forget to update both
locations.  It is tempting to just nuke the leading_directory special
casing to avoid such bugs and simplify the code, but unpack_trees'
verify_clean_subdirectory() also calls read_directory() and does so with
a non-empty leading path, so I'm hesitant to try to restructure further.
Add obnoxious warnings to treat_leading_path() and
read_directory_recursive() to try to warn people of such problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 30 +++++++++++++++++++
 ...common-prefixes-and-directory-traversal.sh |  2 +-
 t/t7061-wtstatus-ignore.sh                    |  2 +-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 9c71a9ac21..bb6e481909 100644
--- a/dir.c
+++ b/dir.c
@@ -1990,6 +1990,15 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct untracked_cache_dir *untracked, int check_only,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in treat_leading_path().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2101,6 +2110,15 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in read_directory_recursive().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct strbuf sb = STRBUF_INIT;
 	int prevlen, baselen;
 	const char *cp;
@@ -2154,6 +2172,18 @@ static int treat_leading_path(struct dir_struct *dir,
 		de.d_name[baselen-prevlen] = '\0';
 		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
 				    pathspec);
+		if (state == path_untracked &&
+		    get_dtype(cdir.de, istate, sb.buf, sb.len) == DT_DIR &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
+		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
+				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
+			add_path_to_appropriate_result_list(dir, NULL, &cdir,
+							    istate,
+							    &sb, baselen,
+							    pathspec, state);
+			state = path_recurse;
+		}
+
 		if (state != path_recurse)
 			break; /* do not recurse into it */
 		if (len <= baselen)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 0151ea8b6d..d48ee1a320 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -179,7 +179,7 @@ test_expect_success 'git ls-files -o consistent between one or two dirs' '
 
 # ls-files doesn't have a way to request showing both untracked and ignored
 # files at the same time, so use `git status --ignored`
-test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+test_expect_success 'git status --ignored shows same files under dir with or without pathspec' '
 	cat <<-EOF >expect &&
 	?? an_untracked_dir/
 	!! an_untracked_dir/ignored
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index ded7f97181..d8060a42e4 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -47,7 +47,7 @@ cat >expected <<\EOF
 !! untracked/ignored
 EOF
 
-test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+test_expect_success 'status of untracked directory with --ignored works with or without prefix' '
 	git status --porcelain --ignored | grep untracked/ >actual &&
 	test_cmp expected actual &&
 
-- 
gitgitgadget
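
For readers following the hunk in treat_leading_path(), here is a minimal
sketch of the new condition reduced to plain booleans.  Everything in it is a
hypothetical stand-in rather than dir.c's API: `is_dir` plays the role of the
get_dtype(...) == DT_DIR test, `show_ignored_too` the DIR_SHOW_IGNORED_TOO
flag, `pathspec_matches_below` a MATCHED_RECURSIVELY_LEADING_PATHSPEC result
from do_match_pathspec(), and `added` the
add_path_to_appropriate_result_list() call.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for dir.c's enum path_treatment. */
enum treatment { T_NONE, T_RECURSE, T_EXCLUDED, T_UNTRACKED };

/*
 * Sketch of the step added after treat_path() returns for a component of
 * the common leading path: an untracked directory is recorded and still
 * descended into when ignored files are wanted too, or when the pathspec
 * matches something below it.
 */
static enum treatment leading_path_step(enum treatment state, bool is_dir,
					bool show_ignored_too,
					bool pathspec_matches_below,
					bool *added)
{
	*added = false;
	if (state == T_UNTRACKED && is_dir &&
	    (show_ignored_too || pathspec_matches_below)) {
		/* stands in for add_path_to_appropriate_result_list() */
		*added = true;
		return T_RECURSE;
	}
	/* anything other than T_RECURSE stops the descent */
	return state;
}
```

With this model, an untracked directory on the leading path is both recorded
and descended into, instead of stopping the walk as a plain path_untracked
result would.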



* [PATCH 8/8] dir: consolidate similar code in treat_directory()
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                   ` (6 preceding siblings ...)
  2019-12-09 20:47 ` [PATCH 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
@ 2019-12-09 20:47 ` Elijah Newren via GitGitGadget
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-09 20:47 UTC
  To: git; +Cc: Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Both the DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS cases were checking
whether a path was actually a nonbare repository.  That code could be
shared, with only the resulting action differing between the two
cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/dir.c b/dir.c
index bb6e481909..04541b798b 100644
--- a/dir.c
+++ b/dir.c
@@ -1461,6 +1461,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int exclude,
 	const struct pathspec *pathspec)
 {
+	int nested_repo = 0;
+
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
 	case index_directory:
@@ -1470,15 +1472,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
-		if (dir->flags & DIR_SKIP_NESTED_GIT) {
-			int nested_repo;
+		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addstr(&sb, dirname);
 			nested_repo = is_nonbare_repository_dir(&sb);
 			strbuf_release(&sb);
-			if (nested_repo)
-				return path_none;
 		}
+		if (nested_repo)
+			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+				(exclude ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
@@ -1506,13 +1509,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 			return path_none;
 		}
-		if (!(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			if (is_nonbare_repository_dir(&sb))
-				return exclude ? path_excluded : path_untracked;
-			strbuf_release(&sb);
-		}
 		return path_recurse;
 	}
 
-- 
gitgitgadget
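
The consolidated logic in treat_directory() can be sketched as a small pure
function.  The flag macros, the enum, and the function are hypothetical
stand-ins (the real DIR_* flags and enum path_treatment have different names
and values); `dir_is_repo` stands for what is_nonbare_repository_dir() would
report, and `probes` counts how often that check would actually run.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real DIR_* flag values and enum differ. */
#define F_SKIP_NESTED_GIT 0x1u
#define F_NO_GITLINKS     0x2u

enum treatment { T_NONE, T_RECURSE, T_EXCLUDED, T_UNTRACKED };

/*
 * Sketch of the shared nested-repository handling for a directory that
 * is absent from the index.
 */
static enum treatment nested_repo_treatment(unsigned flags, bool exclude,
					    bool dir_is_repo, int *probes)
{
	bool nested_repo = false;

	/* Only probe the filesystem when one of the two flags cares. */
	if ((flags & F_SKIP_NESTED_GIT) || !(flags & F_NO_GITLINKS)) {
		(*probes)++;
		nested_repo = dir_is_repo;
	}
	if (nested_repo)
		return (flags & F_SKIP_NESTED_GIT) ? T_NONE :
			(exclude ? T_EXCLUDED : T_UNTRACKED);

	/* Simplification: the real function has further checks here. */
	return T_RECURSE;
}
```

In this sketch the repository probe is skipped only when DIR_NO_GITLINKS is
set without DIR_SKIP_NESTED_GIT, the one combination where neither branch
would act on the answer.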


* Re: [PATCH 1/8] t3011: demonstrate directory traversal failures
  2019-12-09 20:47 ` [PATCH 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
@ 2019-12-09 21:06   ` Denton Liu
  0 siblings, 0 replies; 69+ messages in thread
From: Denton Liu @ 2019-12-09 21:06 UTC
  To: Elijah Newren via GitGitGadget; +Cc: git, Junio C Hamano, Elijah Newren

Hi Elijah,

On Mon, Dec 09, 2019 at 08:47:38PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Add several tests demonstrating directory traversal failures of various
> sorts in dir.c (and one similar looking test that turns out to be a
> git_fnmatch bug).  A lot of these tests look like near duplicates of
> each other, but an optimization path in dir.c to pre-descend into a
> common prefix and the specialized treatment of trailing slashes in dir.c
> mean the tiny differences are sometimes important and potentially cause
> different codepaths to be explored.
> 
> Of the 7 failing tests, 2 are new to git-2.24.0 (tweaked by side effects
> of the en/clean-nested-with-ignored-topic); the other 5 also failed
> under git-2.23.0 and earlier.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  ...common-prefixes-and-directory-traversal.sh | 193 ++++++++++++++++++
>  1 file changed, 193 insertions(+)
>  create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh
> 
> diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
> new file mode 100755
> index 0000000000..773d6038d1
> --- /dev/null
> +++ b/t/t3011-common-prefixes-and-directory-traversal.sh
> @@ -0,0 +1,193 @@
> +#!/bin/sh
> +
> +test_description='directory traversal handling, especially with common prefixes'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +	test_commit hello &&
> +
> +	>empty &&
> +	mkdir untracked_dir &&
> +	>untracked_dir/empty &&
> +	git init untracked_repo &&
> +	>untracked_repo/empty &&
> +
> +	echo ignored >.gitignore &&
> +	echo an_ignored_dir/ >>.gitignore &&

Could we just do

	cat <<-EOF >.gitignore &&
	ignored
	an_ignored_dir/
	EOF

instead? Alternatively, we could use test_write_lines(), but since we
have here-doc usages below already, let's keep using here-docs here to
be consistent. The same comment applies anywhere in this patch where we
have a >>.

> +	mkdir an_ignored_dir &&
> +	mkdir an_untracked_dir &&
> +	>an_ignored_dir/ignored &&
> +	>an_ignored_dir/untracked &&
> +	>an_untracked_dir/ignored &&
> +	>an_untracked_dir/untracked
> +'
> +
> +test_expect_success 'git ls-files -o shows the right entries' '
> +	cat <<-EOF >expect &&
> +	.gitignore
> +	actual
> +	an_ignored_dir/ignored
> +	an_ignored_dir/untracked
> +	an_untracked_dir/ignored
> +	an_untracked_dir/untracked
> +	empty
> +	expect
> +	untracked_dir/empty
> +	untracked_repo/
> +	EOF
> +	git ls-files -o >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o --exclude-standard shows the right entries' '
> +	cat <<-EOF >expect &&
> +	.gitignore
> +	actual
> +	an_untracked_dir/untracked
> +	empty
> +	expect
> +	untracked_dir/empty
> +	untracked_repo/
> +	EOF
> +	git ls-files -o --exclude-standard >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o untracked_dir recurses' '
> +	echo untracked_dir/empty >expect &&
> +	git ls-files -o untracked_dir >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o untracked_dir/ recurses' '
> +	echo untracked_dir/empty >expect &&
> +	git ls-files -o untracked_dir/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o --directory untracked_dir does not recurse' '
> +	echo untracked_dir/ >expect &&
> +	git ls-files -o --directory untracked_dir >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
> +	echo untracked_dir/ >expect &&
> +	git ls-files -o --directory untracked_dir/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o untracked_repo does not recurse' '
> +	echo untracked_repo/ >expect &&
> +	git ls-files -o untracked_repo >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
> +	echo untracked_repo/ >expect &&
> +	git ls-files -o untracked_repo/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
> +	echo untracked_dir/empty >expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o untracked_dir untracked_repo >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
> +	echo untracked_dir/empty >expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
> +	echo untracked_dir/ >expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o --directory untracked_dir untracked_repo >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
> +	echo untracked_dir/ >expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o .git shows nothing' '
> +	>expect &&
> +	git ls-files -o .git >actual &&
> +	test_cmp expect actual
> +'

Could we use test_must_be_empty here instead?

> +
> +test_expect_failure 'git ls-files -o .git/ shows nothing' '
> +	>expect &&
> +	git ls-files -o .git/ >actual &&
> +	test_cmp expect actual
> +'

test_must_be_empty?

Thanks,

Denton

> +
> +test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
> +	mkdir "untracked_*" &&
> +	>"untracked_*/empty" &&
> +
> +	echo "untracked_*/empty" >expect &&
> +	echo untracked_dir/empty >>expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o "untracked_*" >actual &&
> +	test_cmp expect actual
> +'
> +
> +# It turns out fill_directory returns the right paths, but ls-files' post-call
> +# filtering in show_dir_entry() via calling dir_path_match() which ends up
> +# in git_fnmatch() has logic for PATHSPEC_ONESTAR that assumes the pathspec
> +# must match the full path; it doesn't check it for matching a leading
> +# directory.
> +test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
> +	echo "untracked_*/empty" >expect &&
> +	echo untracked_dir/empty >>expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o "untracked_*/" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
> +	echo "untracked_*/" >expect &&
> +	echo untracked_dir/ >>expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o --directory "untracked_*" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
> +	echo "untracked_*/" >expect &&
> +	echo untracked_dir/ >>expect &&
> +	echo untracked_repo/ >>expect &&
> +	git ls-files -o --directory "untracked_*/" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git ls-files -o consistent between one or two dirs' '
> +	git ls-files -o --exclude-standard an_ignored_dir/ an_untracked_dir/ >tmp &&
> +	! grep ^an_ignored_dir/ tmp >expect &&
> +	git ls-files -o --exclude-standard an_ignored_dir/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +# ls-files doesn't have a way to request showing both untracked and ignored
> +# files at the same time, so use `git status --ignored`
> +test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
> +	cat <<-EOF >expect &&
> +	?? an_untracked_dir/
> +	!! an_untracked_dir/ignored
> +	EOF
> +	git status --porcelain --ignored >output &&
> +	grep an_untracked_dir output >expect &&
> +	git status --porcelain --ignored an_untracked_dir/ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done
> -- 
> gitgitgadget
> 
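
The comment above about git_fnmatch() and PATHSPEC_ONESTAR turns on the
difference between matching a pattern against a whole path and against one of
its leading directories.  The sketch below illustrates that difference with
POSIX fnmatch(3), not git's own git_fnmatch(); both helper names are
hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Does PATTERN match PATH as a whole?  (flags 0: '*' may also match '/') */
static bool matches_full_path(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, 0) == 0;
}

/*
 * Does PATTERN match PATH itself, or any leading directory of PATH
 * (a prefix ending in '/')?
 */
static bool matches_path_or_leading_dir(const char *pattern, const char *path)
{
	char prefix[4096];
	size_t len = strlen(path);

	if (matches_full_path(pattern, path))
		return true;
	if (len >= sizeof(prefix))
		return false;	/* keep the sketch simple: fixed-size buffer */
	for (size_t i = 0; i < len; i++) {
		if (path[i] != '/')
			continue;
		memcpy(prefix, path, i + 1);	/* copy up to and including '/' */
		prefix[i + 1] = '\0';
		if (fnmatch(pattern, prefix, 0) == 0)
			return true;
	}
	return false;
}
```

Here `untracked_*/` does not match `untracked_dir/empty` as a whole path, but
it does match its leading directory `untracked_dir/`, which is the kind of
check the comment says is missing.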


* Re: [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-09 20:47 ` [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
@ 2019-12-09 21:32   ` Denton Liu
  2019-12-09 21:51     ` Elijah Newren
  2019-12-09 22:09     ` Eric Sunshine
  0 siblings, 2 replies; 69+ messages in thread
From: Denton Liu @ 2019-12-09 21:32 UTC
  To: Elijah Newren via GitGitGadget; +Cc: git, Junio C Hamano, Elijah Newren

Hi Elijah,

On Mon, Dec 09, 2019 at 08:47:39PM +0000, Elijah Newren via GitGitGadget wrote:
> diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> index 0c394cf995..ded7f97181 100755
> --- a/t/t7061-wtstatus-ignore.sh
> +++ b/t/t7061-wtstatus-ignore.sh
> @@ -43,11 +43,14 @@ test_expect_success 'status untracked directory with --ignored -u' '
>  	test_cmp expected actual
>  '
>  cat >expected <<\EOF
> -?? untracked/uncommitted
> +?? untracked/
>  !! untracked/ignored
>  EOF
>  
> -test_expect_success 'status prefixed untracked directory with --ignored' '
> +test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
> +	git status --porcelain --ignored | grep untracked/ >actual &&

Can we break this pipe up into two invocations so that we don't have a
git command in the upstream of a pipe?

Thanks,

Denton

P.S. Perhaps in the future, we (I) could try to extend chainlint so that
it catches this and git commands in non-assignment command
substitutions... I think that would be pretty nice.

> +	test_cmp expected actual &&
> +
>  	git status --porcelain --ignored untracked/ >actual &&
>  	test_cmp expected actual
>  '
> -- 
> gitgitgadget
> 
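
The reason for avoiding git upstream of a pipe: unless pipefail is in effect,
a shell pipeline's exit status is that of its last command, so a failing
`git` on the left of `|` is invisible to the && chain.  The small sketch below
demonstrates this through system(3); it assumes a POSIX /bin/sh with pipefail
off, and sh_status() is a hypothetical helper.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run CMD with /bin/sh (via system(3)) and return its exit status,
 * or -1 if the shell could not be run or did not exit normally.
 */
static int sh_status(const char *cmd)
{
	int rc = system(cmd);

	if (rc == -1 || !WIFEXITED(rc))
		return -1;
	return WEXITSTATUS(rc);
}
```

Here sh_status("false | cat") is 0 even though `false` failed, which is why
the test is better written as `git status ... >tmp` followed by
`grep untracked/ tmp >actual`.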


* Re: [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-09 21:32   ` Denton Liu
@ 2019-12-09 21:51     ` Elijah Newren
  2019-12-09 22:09     ` Eric Sunshine
  1 sibling, 0 replies; 69+ messages in thread
From: Elijah Newren @ 2019-12-09 21:51 UTC
  To: Denton Liu
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Junio C Hamano

On Mon, Dec 9, 2019 at 1:32 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> On Mon, Dec 09, 2019 at 08:47:39PM +0000, Elijah Newren via GitGitGadget wrote:
> > diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> > index 0c394cf995..ded7f97181 100755
> > --- a/t/t7061-wtstatus-ignore.sh
> > +++ b/t/t7061-wtstatus-ignore.sh
> > @@ -43,11 +43,14 @@ test_expect_success 'status untracked directory with --ignored -u' '
> >       test_cmp expected actual
> >  '
> >  cat >expected <<\EOF
> > -?? untracked/uncommitted
> > +?? untracked/
> >  !! untracked/ignored
> >  EOF
> >
> > -test_expect_success 'status prefixed untracked directory with --ignored' '
> > +test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
> > +     git status --porcelain --ignored | grep untracked/ >actual &&
>
> Can we break this pipe up into two invocations so that we don't have a
> git command in the upstream of a pipe?

Sigh...yeah, I keep doing this.  And I'll probably keep doing it if
someone can't chainlint (or pipefail) it.  I'll fix it up.


* Re: [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-09 21:32   ` Denton Liu
  2019-12-09 21:51     ` Elijah Newren
@ 2019-12-09 22:09     ` Eric Sunshine
  1 sibling, 0 replies; 69+ messages in thread
From: Eric Sunshine @ 2019-12-09 22:09 UTC
  To: Denton Liu
  Cc: Elijah Newren via GitGitGadget, Git List, Junio C Hamano, Elijah Newren

On Mon, Dec 9, 2019 at 4:32 PM Denton Liu <liu.denton@gmail.com> wrote:
> On Mon, Dec 09, 2019 at 08:47:39PM +0000, Elijah Newren via GitGitGadget wrote:
> > +test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
> > +     git status --porcelain --ignored | grep untracked/ >actual &&
>
> Can we break this pipe up into two invocations so that we don't have a
> git command in the upstream of a pipe?
>
> P.S. Perhaps in the future, we (I) could try to extend chainlint so that
> it catches this and git commands in non-assignment command
> substitutions... I think that would be pretty nice.

Rather than getting mired down in chainlint (which could make your
eyeballs melt), an easier way to catch this sort of thing would be to
introduce a new script which checks test scripts for Git
best-practices non-conformity, similar to how
t/check-non-portable-shell.pl checks for non-portable shell
constructs. (You could even extend check-non-portable-shell.pl with
the functionality, but then the script would no longer be specific to
"non-portable shell", so either renaming it or making a new script
is warranted.)

By the way, I have considered adding a best-practices linting script
like this, but it (at least at first) would need to have some sort of
opt-in or opt-out feature since there (likely) are still so many
instances of tests which don't follow best-practices, and it could
take a while to "fix" them all (and eat up a lot of reviewer time, so
it should be done in small batches).
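
As a rough illustration of the kind of best-practices check Eric describes,
the heuristic below flags a test-script line in which a `git` command word
appears upstream of a single `|`.  It is a hypothetical sketch in C, not an
existing Git tool (Eric's suggestion points at a Perl script along the lines
of t/check-non-portable-shell.pl), and it ignores quoting, here-docs, and line
continuations.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical lint heuristic: does LINE run a "git" command whose
 * output is piped (single '|') into another command?
 */
static bool git_upstream_of_pipe(const char *line)
{
	bool seen_git = false;	/* "git" seen since the last separator */

	for (const char *p = line; *p; p++) {
		if (*p == ';') {	/* new command: forget earlier git */
			seen_git = false;
			continue;
		}
		if ((*p == '&' || *p == '|') && p[1] == *p) {	/* "&&" or "||" */
			seen_git = false;
			p++;
			continue;
		}
		if (*p == '|') {	/* a real pipe */
			if (seen_git)
				return true;
			continue;
		}
		/* "git" as a whole command word followed by a blank */
		if (strncmp(p, "git", 3) == 0 &&
		    (p == line || strchr(" \t;(&|", p[-1])) &&
		    (p[3] == ' ' || p[3] == '\t'))
			seen_git = true;
	}
	return false;
}
```

It flags `git status --porcelain --ignored | grep untracked/ >actual &&` but
accepts `git rev-parse HEAD || echo none`, since `||` and `&&` end the current
pipeline.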


* [PATCH v2 0/8] Directory traversal bugs
  2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                   ` (7 preceding siblings ...)
  2019-12-09 20:47 ` [PATCH 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
@ 2019-12-10 20:00 ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
                     ` (8 more replies)
  8 siblings, 9 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano

This series fixes multiple fill_directory() bugs, one of them new to 2.24.0
coming from en/clean-nested-with-ignored-topic, the rest having been around
in versions of git going back up to a decade.
See https://lore.kernel.org/git/87fti15agv.fsf@kyleam.com/ for the report
spawning this series.

Changes since v1:

 * Testcase cleanups and tweaks suggested by Denton
 * A tweak to this cover letter so that gitgitgadget will hopefully pick up
   the cc-list. (It apparently needs to be 'Cc' now, instead of 'CC')

Elijah Newren (8):
  t3011: demonstrate directory traversal failures
  Revert "dir.c: make 'git-status --ignored' work within leading
    directories"
  dir: remove stray quote character in comment
  dir: exit before wildcard fall-through if there is no wildcard
  dir: break part of read_directory_recursive() out for reuse
  dir: fix checks on common prefix directory
  dir: synchronize treat_leading_path() and read_directory_recursive()
  dir: consolidate similar code in treat_directory()

 dir.c                                         | 174 +++++++++++----
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 t/t7061-wtstatus-ignore.sh                    |   9 +-
 3 files changed, 341 insertions(+), 51 deletions(-)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh


base-commit: da72936f544fec5a335e66432610e4cef4430991
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-676%2Fnewren%2Fls-files-bug-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-676/newren/ls-files-bug-v2
Pull-Request: https://github.com/git/git/pull/676

Range-diff vs v1:

 1:  4b24ba9966 ! 1:  6d659b2302 t3011: demonstrate directory traversal failures
     @@ -36,8 +36,10 @@
      +	git init untracked_repo &&
      +	>untracked_repo/empty &&
      +
     -+	echo ignored >.gitignore &&
     -+	echo an_ignored_dir/ >>.gitignore &&
     ++	cat <<-EOF >.gitignore &&
     ++	ignored
     ++	an_ignored_dir/
     ++	EOF
      +	mkdir an_ignored_dir &&
      +	mkdir an_untracked_dir &&
      +	>an_ignored_dir/ignored &&
     @@ -114,52 +116,60 @@
      +'
      +
      +test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
     -+	echo untracked_dir/empty >expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_dir/empty
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o untracked_dir untracked_repo >actual &&
      +	test_cmp expect actual
      +'
      +
      +test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
     -+	echo untracked_dir/empty >expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_dir/empty
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
      +	test_cmp expect actual
      +'
      +
      +test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
     -+	echo untracked_dir/ >expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_dir/
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o --directory untracked_dir untracked_repo >actual &&
      +	test_cmp expect actual
      +'
      +
      +test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
     -+	echo untracked_dir/ >expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_dir/
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
      +	test_cmp expect actual
      +'
      +
      +test_expect_success 'git ls-files -o .git shows nothing' '
     -+	>expect &&
      +	git ls-files -o .git >actual &&
     -+	test_cmp expect actual
     ++	test_must_be_empty actual
      +'
      +
      +test_expect_failure 'git ls-files -o .git/ shows nothing' '
     -+	>expect &&
      +	git ls-files -o .git/ >actual &&
     -+	test_cmp expect actual
     ++	test_must_be_empty actual
      +'
      +
      +test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
      +	mkdir "untracked_*" &&
      +	>"untracked_*/empty" &&
      +
     -+	echo "untracked_*/empty" >expect &&
     -+	echo untracked_dir/empty >>expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_*/empty
     ++	untracked_dir/empty
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o "untracked_*" >actual &&
      +	test_cmp expect actual
      +'
     @@ -170,25 +180,31 @@
      +# must match the full path; it doesn't check it for matching a leading
      +# directory.
      +test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
     -+	echo "untracked_*/empty" >expect &&
     -+	echo untracked_dir/empty >>expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_*/empty
     ++	untracked_dir/empty
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o "untracked_*/" >actual &&
      +	test_cmp expect actual
      +'
      +
      +test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
     -+	echo "untracked_*/" >expect &&
     -+	echo untracked_dir/ >>expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_*/
     ++	untracked_dir/
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o --directory "untracked_*" >actual &&
      +	test_cmp expect actual
      +'
      +
      +test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
     -+	echo "untracked_*/" >expect &&
     -+	echo untracked_dir/ >>expect &&
     -+	echo untracked_repo/ >>expect &&
     ++	cat <<-EOF >expect &&
     ++	untracked_*/
     ++	untracked_dir/
     ++	untracked_repo/
     ++	EOF
      +	git ls-files -o --directory "untracked_*/" >actual &&
      +	test_cmp expect actual
      +'
 2:  bfaf7592ee ! 2:  79f2b56174 Revert "dir.c: make 'git-status --ignored' work within leading directories"
     @@ -83,7 +83,9 @@
       
      -test_expect_success 'status prefixed untracked directory with --ignored' '
      +test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
     -+	git status --porcelain --ignored | grep untracked/ >actual &&
     ++	git status --porcelain --ignored >tmp &&
     ++	grep untracked/ tmp >actual &&
     ++	rm tmp &&
      +	test_cmp expected actual &&
      +
       	git status --porcelain --ignored untracked/ >actual &&
 3:  ea2588e87c = 3:  d6f858cab1 dir: remove stray quote character in comment
 4:  c3220758ab ! 4:  8d2d98eec3 dir: exit before wildcard fall-through if there is no wildcard
     @@ -37,15 +37,15 @@
       
      -test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
      +test_expect_success 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
     - 	echo untracked_dir/empty >expect &&
     - 	echo untracked_repo/ >>expect &&
     - 	git ls-files -o untracked_dir untracked_repo >actual &&
     + 	cat <<-EOF >expect &&
     + 	untracked_dir/empty
     + 	untracked_repo/
      @@
       	test_cmp expect actual
       '
       
      -test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
      +test_expect_success 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
     - 	echo untracked_dir/ >expect &&
     - 	echo untracked_repo/ >>expect &&
     - 	git ls-files -o --directory untracked_dir untracked_repo >actual &&
     + 	cat <<-EOF >expect &&
     + 	untracked_dir/
     + 	untracked_repo/
 5:  738d9ae4c9 = 5:  d2f5623bd7 dir: break part of read_directory_recursive() out for reuse
 6:  b897095136 ! 6:  9839aca00a dir: fix checks on common prefix directory
     @@ -164,11 +164,11 @@
       	git ls-files -o untracked_repo/ >actual &&
       	test_cmp expect actual
      @@
     - 	test_cmp expect actual
     + 	test_must_be_empty actual
       '
       
      -test_expect_failure 'git ls-files -o .git/ shows nothing' '
      +test_expect_success 'git ls-files -o .git/ shows nothing' '
     - 	>expect &&
       	git ls-files -o .git/ >actual &&
     - 	test_cmp expect actual
     + 	test_must_be_empty actual
     + '
 7:  4f8bf05d26 ! 7:  df7f08886a dir: synchronize treat_leading_path() and read_directory_recursive()
     @@ -104,6 +104,6 @@
       
      -test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
      +test_expect_success 'status of untracked directory with --ignored works with or without prefix' '
     - 	git status --porcelain --ignored | grep untracked/ >actual &&
     - 	test_cmp expected actual &&
     - 
     + 	git status --porcelain --ignored >tmp &&
     + 	grep untracked/ tmp >actual &&
     + 	rm tmp &&
 8:  2200bf144a = 8:  77b57e44fd dir: consolidate similar code in treat_directory()

-- 
gitgitgadget


* [PATCH v2 1/8] t3011: demonstrate directory traversal failures
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add several tests demonstrating directory traversal failures of various
sorts in dir.c (and one similar looking test that turns out to be a
git_fnmatch bug).  A lot of these tests look like near duplicates of
each other, but an optimization path in dir.c to pre-descend into a
common prefix and the specialized treatment of trailing slashes in dir.c
mean the tiny differences are sometimes important and potentially cause
different codepaths to be explored.
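To make the trailing-slash point concrete, here is a minimal, hypothetical C sketch that computes a common directory prefix for a list of pathspecs. It assumes the prefix is cut back to just after the last '/' shared by every pathspec, and it ignores wildcards and pathspec magic. Treat it as an illustration of the idea, not as dir.c's actual code. Under that assumption, `untracked_dir/` yields a 14-byte prefix while `untracked_dir` yields none, so a trailing slash alone can send otherwise identical commands down different codepaths.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: length of the longest prefix shared by every
 * pathspec, cut back so that it ends just after a '/'.  Wildcards and
 * pathspec magic are deliberately ignored.
 */
static size_t common_dir_prefix_len(const char **specs, size_t nr)
{
	size_t max = 0, n;

	assert(nr > 0);
	for (n = 0; n < nr; n++) {
		size_t i = 0, len = 0;

		/* Walk while this pathspec agrees with the first one. */
		while (specs[n][i] && (n == 0 || i < max)) {
			if (specs[n][i] != specs[0][i])
				break;
			if (specs[n][i] == '/')
				len = i + 1; /* prefix may only end after '/' */
			i++;
		}
		if (n == 0 || len < max)
			max = len;
		if (!max)
			break;               /* nothing in common; stop early */
	}
	return max;
}
```

For example, `{"a/b/c", "a/b/d"}` shares the prefix `a/b/` (4 bytes), while `{"untracked_dir/", "untracked_repo/"}` shares no directory at all.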

Of the 7 failing tests, 2 are new to git-2.24.0 (side effects of the
en/clean-nested-with-ignored-topic); the other 5 also failed under
git-2.23.0 and earlier.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh

diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
new file mode 100755
index 0000000000..54f80c62b8
--- /dev/null
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -0,0 +1,209 @@
+#!/bin/sh
+
+test_description='directory traversal handling, especially with common prefixes'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit hello &&
+
+	>empty &&
+	mkdir untracked_dir &&
+	>untracked_dir/empty &&
+	git init untracked_repo &&
+	>untracked_repo/empty &&
+
+	cat <<-EOF >.gitignore &&
+	ignored
+	an_ignored_dir/
+	EOF
+	mkdir an_ignored_dir &&
+	mkdir an_untracked_dir &&
+	>an_ignored_dir/ignored &&
+	>an_ignored_dir/untracked &&
+	>an_untracked_dir/ignored &&
+	>an_untracked_dir/untracked
+'
+
+test_expect_success 'git ls-files -o shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_ignored_dir/ignored
+	an_ignored_dir/untracked
+	an_untracked_dir/ignored
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --exclude-standard shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o --exclude-standard >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_repo does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o .git shows nothing' '
+	git ls-files -o .git >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_failure 'git ls-files -o .git/ shows nothing' '
+	git ls-files -o .git/ >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
+	mkdir "untracked_*" &&
+	>"untracked_*/empty" &&
+
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+# It turns out fill_directory returns the right paths, but ls-files' post-call
+# filtering in show_dir_entry() via calling dir_path_match() which ends up
+# in git_fnmatch() has logic for PATHSPEC_ONESTAR that assumes the pathspec
+# must match the full path; it doesn't check it for matching a leading
+# directory.
+test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o consistent between one or two dirs' '
+	git ls-files -o --exclude-standard an_ignored_dir/ an_untracked_dir/ >tmp &&
+	! grep ^an_ignored_dir/ tmp >expect &&
+	git ls-files -o --exclude-standard an_ignored_dir/ >actual &&
+	test_cmp expect actual
+'
+
+# ls-files doesn't have a way to request showing both untracked and ignored
+# files at the same time, so use `git status --ignored`
+test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+	cat <<-EOF >expect &&
+	?? an_untracked_dir/
+	!! an_untracked_dir/ignored
+	EOF
+	git status --porcelain --ignored >output &&
+	grep an_untracked_dir output >expect &&
+	git status --porcelain --ignored an_untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v2 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit be8a84c52669 ("dir.c: make 'git-status --ignored' work within
leading directories", 2013-04-15) noted that
   git status --ignored <SOMEPATH>
would not list ignored files and directories within <SOMEPATH> if
<SOMEPATH> was untracked, and modified the behavior to make it show
them.  However, it did so via a hack that broke consistency; it would
show paths under <SOMEPATH> differently than a simple
   git status --ignored | grep <SOMEPATH>
would show them.  A correct fix is more involved and is complicated by
this hack, so we revert this commit (but keep corrected versions of the
testcases) and will fix the original bug in a subsequent patch.

Some history may be helpful:

A case very similar to the commit we are reverting was raised in
commit 48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), but that commit went in somewhat the opposite direction.
It noted how
   git ls-files -o --exclude-standard t/
used to show untracked files under t/ even when t/ was ignored, and
changed the behavior to stop showing untracked files under an ignored
directory.  More importantly, that commit considered keeping the old
behavior but rejected it because it would be inconsistent with the
behavior when multiple pathspecs were specified.

The inconsistency between specifying one pathspec versus zero or two
arises because common prefixes of pathspecs are sent through a
different set of checks (in treat_leading_path()) than normal
file/directory traversal (which goes through read_directory_recursive()
and treat_path()).  As such, for consistency, one needs to ensure that
both codepaths produce the same result.

Revert commit be8a84c526691667fc04a8241d93a3de1de298ab, except instead
of removing the testcase it added, modify it to check for correct and
consistent behavior.  A subsequent patch in this series will fix the
testcase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                      | 3 ---
 t/t7061-wtstatus-ignore.sh | 9 +++++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index 61f559f980..0dd5266629 100644
--- a/dir.c
+++ b/dir.c
@@ -2083,14 +2083,12 @@ static int treat_leading_path(struct dir_struct *dir,
 	struct strbuf sb = STRBUF_INIT;
 	int baselen, rc = 0;
 	const char *cp;
-	int old_flags = dir->flags;
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
 	baselen = 0;
-	dir->flags &= ~DIR_SHOW_OTHER_DIRECTORIES;
 	while (1) {
 		cp = path + baselen + !!baselen;
 		cp = memchr(cp, '/', path + len - cp);
@@ -2113,7 +2111,6 @@ static int treat_leading_path(struct dir_struct *dir,
 		}
 	}
 	strbuf_release(&sb);
-	dir->flags = old_flags;
 	return rc;
 }
 
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 0c394cf995..84366050da 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -43,11 +43,16 @@ test_expect_success 'status untracked directory with --ignored -u' '
 	test_cmp expected actual
 '
 cat >expected <<\EOF
-?? untracked/uncommitted
+?? untracked/
 !! untracked/ignored
 EOF
 
-test_expect_success 'status prefixed untracked directory with --ignored' '
+test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+	git status --porcelain --ignored >tmp &&
+	grep untracked/ tmp >actual &&
+	rm tmp &&
+	test_cmp expected actual &&
+
 	git status --porcelain --ignored untracked/ >actual &&
 	test_cmp expected actual
 '
-- 
gitgitgadget



* [PATCH v2 3/8] dir: remove stray quote character in comment
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 0dd5266629..5dacacd469 100644
--- a/dir.c
+++ b/dir.c
@@ -373,7 +373,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
-		/* name" doesn't match up to the first wild character */
+		/* name doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
 		    ps_strncmp(item, match, name,
 			       item->nowildcard_len - prefix))
-- 
gitgitgadget



* [PATCH v2 4/8] dir: exit before wildcard fall-through if there is no wildcard
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2019-12-10 20:00   ` [PATCH v2 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The DO_MATCH_LEADING_PATHSPEC handling in match_pathspec_item() had a
fall-through case for when there was a wildcard, noting that we don't
yet have enough information to determine whether further paths under
the current directory might match due to the presence of wildcards.
But if our pathspec has no wildcards, we should never reach that
fall-through case.
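A rough, hypothetical C sketch of the control flow being changed follows. The function and result names are invented, and the sketch ignores the `prefix` offset and case-insensitive matching that the real match_pathspec_item() handles; it only shows where the new early return sits. Without the `nowildcard_len == speclen` check, a directory such as `untracked_dir` tested against the literal pathspec `untracked_repo` would fall through to the "maybe, via wildcard" answer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum leading_match {
	NO_MATCH = 0,
	MATCHED_LEADING,     /* name is a literal leading directory of spec */
	MAYBE_VIA_WILDCARD   /* undecidable without running a wildmatch */
};

/*
 * Can directory `name` lead to something that `spec` matches?
 * nowildcard_len is the length of spec's literal (wildcard-free) prefix.
 */
static enum leading_match match_leading(const char *name, const char *spec,
					size_t nowildcard_len)
{
	size_t namelen = strlen(name), speclen = strlen(spec);

	assert(nowildcard_len <= speclen);

	/* "a" is a leading directory of "a/b" */
	if (namelen < speclen && spec[namelen] == '/' &&
	    namelen <= nowildcard_len && !strncmp(spec, name, namelen))
		return MATCHED_LEADING;

	/* name diverges from spec before the first wildcard character */
	if (nowildcard_len < speclen && strncmp(spec, name, nowildcard_len))
		return NO_MATCH;

	/* The fix: without any wildcard there is nothing left to try. */
	if (nowildcard_len == speclen)
		return NO_MATCH;

	return MAYBE_VIA_WILDCARD;
}
```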

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                              | 7 +++++++
 t/t3011-common-prefixes-and-directory-traversal.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 5dacacd469..517a569e10 100644
--- a/dir.c
+++ b/dir.c
@@ -379,6 +379,13 @@ static int match_pathspec_item(const struct index_state *istate,
 			       item->nowildcard_len - prefix))
 			return 0;
 
+		/*
+		 * name has no wildcard, and it didn't match as a leading
+		 * pathspec so return.
+		 */
+		if (item->nowildcard_len == item->len)
+			return 0;
+
 		/*
 		 * Here is where we would perform a wildmatch to check if
 		 * "name" can be matched as a directory (or a prefix) against
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 54f80c62b8..d6e161ddd8 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -92,7 +92,7 @@ test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+test_expect_success 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
 	cat <<-EOF >expect &&
 	untracked_dir/empty
 	untracked_repo/
@@ -110,7 +110,7 @@ test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses int
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
 	cat <<-EOF >expect &&
 	untracked_dir/
 	untracked_repo/
-- 
gitgitgadget



* [PATCH v2 5/8] dir: break part of read_directory_recursive() out for reuse
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2019-12-10 20:00   ` [PATCH v2 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create an add_path_to_appropriate_result_list() function from the code
at the end of read_directory_recursive() so we can use it elsewhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 60 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/dir.c b/dir.c
index 517a569e10..645b44ea64 100644
--- a/dir.c
+++ b/dir.c
@@ -1932,6 +1932,40 @@ static void close_cached_dir(struct cached_dir *cdir)
 	}
 }
 
+static void add_path_to_appropriate_result_list(struct dir_struct *dir,
+	struct untracked_cache_dir *untracked,
+	struct cached_dir *cdir,
+	struct index_state *istate,
+	struct strbuf *path,
+	int baselen,
+	const struct pathspec *pathspec,
+	enum path_treatment state)
+{
+	/* add the path to the appropriate result list */
+	switch (state) {
+	case path_excluded:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			dir_add_name(dir, istate, path->buf, path->len);
+		else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			((dir->flags & DIR_COLLECT_IGNORED) &&
+			exclude_matches_pathspec(path->buf, path->len,
+						 pathspec)))
+			dir_add_ignored(dir, istate, path->buf, path->len);
+		break;
+
+	case path_untracked:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			break;
+		dir_add_name(dir, istate, path->buf, path->len);
+		if (cdir->fdir)
+			add_untracked(untracked, path->buf + baselen);
+		break;
+
+	default:
+		break;
+	}
+}
+
 /*
  * Read a directory tree. We currently ignore anything but
  * directories, regular files and symlinks. That's because git
@@ -2035,29 +2069,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			continue;
 		}
 
-		/* add the path to the appropriate result list */
-		switch (state) {
-		case path_excluded:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				dir_add_name(dir, istate, path.buf, path.len);
-			else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-				((dir->flags & DIR_COLLECT_IGNORED) &&
-				exclude_matches_pathspec(path.buf, path.len,
-							 pathspec)))
-				dir_add_ignored(dir, istate, path.buf, path.len);
-			break;
-
-		case path_untracked:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				break;
-			dir_add_name(dir, istate, path.buf, path.len);
-			if (cdir.fdir)
-				add_untracked(untracked, path.buf + baselen);
-			break;
-
-		default:
-			break;
-		}
+		add_path_to_appropriate_result_list(dir, untracked, &cdir,
+						    istate, &path, baselen,
+						    pathspec, state);
 	}
 	close_cached_dir(&cdir);
  out:
-- 
gitgitgadget



* [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2019-12-10 20:00   ` [PATCH v2 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-15 10:29     ` Johannes Schindelin
  2019-12-10 20:00   ` [PATCH v2 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Many years ago, the directory traversing logic had an optimization that
would always recurse into any directory that was a common prefix of all
the pathspecs without walking the leading directories to get down to
the desired directory.  Thus,
   git ls-files -o .git/                        # case A
would notice that .git/ was a common prefix of all pathspecs (since
it is the only pathspec listed), and then traverse into it and start
showing unknown files under that directory.  Unfortunately, .git/ is not
a directory we should be traversing into, which made this optimization
problematic.  This also affected cases like
   git ls-files -o --exclude-standard t/        # case B
where t/ is listed in .gitignore and thus isn't interesting and
shouldn't be recursed into.  It also affected cases like
   git ls-files -o --directory untracked_dir/   # case C
where untracked_dir/ is indeed untracked and thus interesting, but the
--directory flag means we only want to show the directory itself, not
recurse into it and start listing untracked files below it.

The case B class of bugs was noted and fixed in commits 16e2cfa90993
("read_directory(): further split treat_path()", 2010-01-08) and
48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), with the idea being that we first wanted to check whether
the common prefix was interesting.  The former patch noted that
treat_path() couldn't be used when checking the common prefix because
treat_path() requires a directory entry (a struct dirent) and we haven't
read any directories at the point we are checking the common prefix.
So, that patch split treat_one_path() out of treat_path().  The latter
patch then created a new treat_leading_path() which duplicated by hand
the bits of treat_path() that couldn't be broken out and then called
treat_one_path() for the remainder.  There were three problems with this
approach:

  * The duplicated logic in treat_leading_path() accidentally missed the
    check for special paths (such as is_dot_or_dotdot and matching
    ".git"), causing case A types of bugs to continue to be an issue.
  * The treat_leading_path() logic assumed we should traverse into
    anything where path_treatment was not path_none, i.e. it perpetuated
    case C types of bugs.
  * It meant we had split logic that needed to be kept in sync, running
    the risk that people would introduce new inconsistencies (such as in
    commit be8a84c52669, which we reverted earlier in this series, or in
    commit df5bcdf83ae, which we'll fix in a subsequent commit).
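The special-path check that the first bullet says was missed can be pictured with a small, hypothetical C sketch. The helper name is invented for illustration and is not git's API. Real code may also need to compare ".git" case-insensitively on some filesystems, which this sketch ignores. If such a check is applied to each leading component, a pathspec like `.git/info/` is rejected at its very first component.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: should a single path component never be
 * traversed?  Covers "." and ".." plus the repository's own ".git".
 */
static int is_special_component(const char *name)
{
	assert(name != NULL);
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return 1;
	return !strcmp(name, ".git");  /* exact, case-sensitive match only */
}
```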

Fix most of these problems by making treat_leading_path() not only loop
over each leading path component, but also call treat_path() directly
on each.  To do so, we have to create a synthetic struct dirent, but
that only takes a few lines.  Then, pay attention to the path_treatment
result we get from treat_path() and don't treat path_excluded,
path_untracked, and path_recurse all the same as path_recurse.
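The per-component walk described above can be sketched in standalone C as follows. This is a hypothetical simplification, not the patch itself: `walk_leading_path()`, `classify_fn`, the `T_*` values and the toy `demo_classify()` are invented names, and the classifier stands in for treat_path(). Like the patch, the sketch strips trailing slashes, copies each component name into a synthetic entry buffer, and stops at the first component whose treatment is not "recurse".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for dir.c's enum path_treatment values. */
enum treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/* Decides how to treat component `name` inside parent `dir` (dirlen bytes). */
typedef enum treatment (*classify_fn)(const char *dir, size_t dirlen,
				      const char *name);

/*
 * Walk each leading component of path ("foo/bar/" -> "foo", then "bar"),
 * asking classify() about each one in the context of its parent, the way
 * a top-down directory walk would.  Stops at the first component that is
 * not T_RECURSE and returns that component's treatment.
 */
static enum treatment walk_leading_path(const char *path, size_t len,
					classify_fn classify)
{
	char name[256];                 /* synthetic directory-entry name */
	size_t prevlen, baselen = 0;
	enum treatment state = T_RECURSE;

	assert(!len || path[0] != '/'); /* paths are worktree-relative */
	while (len && path[len - 1] == '/')
		len--;                  /* ignore trailing slashes */

	while (len) {
		const char *slash;

		prevlen = baselen + !!baselen;  /* skip '/' after the parent */
		slash = memchr(path + prevlen, '/', len - prevlen);
		baselen = slash ? (size_t)(slash - path) : len;

		assert(baselen - prevlen < sizeof(name));
		memcpy(name, path + prevlen, baselen - prevlen);
		name[baselen - prevlen] = '\0';

		state = classify(path, prevlen, name);
		if (state != T_RECURSE || baselen >= len)
			break;          /* do not recurse, or finished checking */
	}
	return state;
}

/* Toy classifier for illustration only: never enter ".git", treat "t"
 * as ignored, and recurse into everything else. */
static enum treatment demo_classify(const char *dir, size_t dirlen,
				    const char *name)
{
	(void)dir;
	(void)dirlen;
	if (!strcmp(name, ".git"))
		return T_NONE;
	if (!strcmp(name, "t"))
		return T_EXCLUDED;
	return T_RECURSE;
}
```

With this toy classifier, `walk_leading_path(".git/info/", 10, demo_classify)` stops at `.git` and returns `T_NONE`, which is the behavior case A needs.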

This leaves one remaining problem, the new inconsistency from commit
df5bcdf83ae.  That will be addressed in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 54 +++++++++++++++----
 ...common-prefixes-and-directory-traversal.sh |  6 +--
 2 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/dir.c b/dir.c
index 645b44ea64..9c71a9ac21 100644
--- a/dir.c
+++ b/dir.c
@@ -2102,37 +2102,69 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const struct pathspec *pathspec)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int baselen, rc = 0;
+	int prevlen, baselen;
 	const char *cp;
+	struct cached_dir cdir;
+	struct dirent de;
+	enum path_treatment state = path_none;
+
+	/*
+	 * For each directory component of path, we are going to check whether
+	 * that path is relevant given the pathspec.  For example, if path is
+	 *    foo/bar/baz/
+	 * then we will ask treat_path() whether we should go into foo, then
+	 * whether we should go into bar, then whether baz is relevant.
+	 * Checking each is important because e.g. if path is
+	 *    .git/info/
+	 * then we need to check .git to know we shouldn't traverse it.
+	 * If the return from treat_path() is:
+	 *    * path_none, for any path, we return false.
+	 *    * path_recurse, for all path components, we return true
+	 *    * <anything else> for some intermediate component, we make sure
+	 *        to add that path to the relevant list but return false
+	 *        signifying that we shouldn't recurse into it.
+	 */
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
+
+	memset(&cdir, 0, sizeof(cdir));
+	memset(&de, 0, sizeof(de));
+	cdir.de = &de;
+	de.d_type = DT_DIR;
 	baselen = 0;
+	prevlen = 0;
 	while (1) {
-		cp = path + baselen + !!baselen;
+		prevlen = baselen + !!baselen;
+		cp = path + prevlen;
 		cp = memchr(cp, '/', path + len - cp);
 		if (!cp)
 			baselen = len;
 		else
 			baselen = cp - path;
-		strbuf_setlen(&sb, 0);
+		strbuf_reset(&sb);
 		strbuf_add(&sb, path, baselen);
 		if (!is_directory(sb.buf))
 			break;
-		if (simplify_away(sb.buf, sb.len, pathspec))
-			break;
-		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
-				   DT_DIR, NULL) == path_none)
+		strbuf_reset(&sb);
+		strbuf_add(&sb, path, prevlen);
+		memcpy(de.d_name, path+prevlen, baselen-prevlen);
+		de.d_name[baselen-prevlen] = '\0';
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
+				    pathspec);
+		if (state != path_recurse)
 			break; /* do not recurse into it */
-		if (len <= baselen) {
-			rc = 1;
+		if (len <= baselen)
 			break; /* finished checking */
-		}
 	}
+	add_path_to_appropriate_result_list(dir, NULL, &cdir, istate,
+					    &sb, baselen, pathspec,
+					    state);
+
 	strbuf_release(&sb);
-	return rc;
+	return state == path_recurse;
 }
 
 static const char *get_ident_string(void)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index d6e161ddd8..098fddc75b 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -74,7 +74,7 @@ test_expect_success 'git ls-files -o --directory untracked_dir does not recurse'
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir/ does not recurse' '
 	echo untracked_dir/ >expect &&
 	git ls-files -o --directory untracked_dir/ >actual &&
 	test_cmp expect actual
@@ -86,7 +86,7 @@ test_expect_success 'git ls-files -o untracked_repo does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+test_expect_success 'git ls-files -o untracked_repo/ does not recurse' '
 	echo untracked_repo/ >expect &&
 	git ls-files -o untracked_repo/ >actual &&
 	test_cmp expect actual
@@ -133,7 +133,7 @@ test_expect_success 'git ls-files -o .git shows nothing' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'git ls-files -o .git/ shows nothing' '
+test_expect_success 'git ls-files -o .git/ shows nothing' '
 	git ls-files -o .git/ >actual &&
 	test_must_be_empty actual
 '
-- 
gitgitgadget



* [PATCH v2 7/8] dir: synchronize treat_leading_path() and read_directory_recursive()
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2019-12-10 20:00   ` [PATCH v2 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-10 20:00   ` [PATCH v2 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our optimization to avoid calling into read_directory_recursive() when
all pathspecs have a common leading directory means that we need to
match the logic that read_directory_recursive() would use if we had just
called it from the root.  Since it does more than call treat_path(), we
need to copy that same logic.

Alternatively, we could try to change treat_path to return path_recurse
for an untracked directory under the given special circumstances that
this logic checks for, but a simple switch results in many test failures
such as 'git clean -d' not wiping out untracked but empty directories.
To work around that, we'd need the caller of treat_path to check for
path_recurse and sometimes special case it into path_untracked.  In
other words, we'd still have extra logic in both places.

Needing to duplicate logic like this means it is all but guaranteed that
someone will eventually make further changes and forget to update both
locations.  It is tempting to just nuke the leading_directory special
casing to avoid such bugs and simplify the code, but unpack_trees'
verify_clean_subdirectory() also calls read_directory() and does so with
a non-empty leading path, so I'm hesitant to try to restructure further.
Add obnoxious warnings to treat_leading_path() and
read_directory_recursive() to try to warn people of such problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 30 +++++++++++++++++++
 ...common-prefixes-and-directory-traversal.sh |  2 +-
 t/t7061-wtstatus-ignore.sh                    |  2 +-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 9c71a9ac21..bb6e481909 100644
--- a/dir.c
+++ b/dir.c
@@ -1990,6 +1990,15 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct untracked_cache_dir *untracked, int check_only,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in treat_leading_path().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2101,6 +2110,15 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in treat_leading_path().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct strbuf sb = STRBUF_INIT;
 	int prevlen, baselen;
 	const char *cp;
@@ -2154,6 +2172,18 @@ static int treat_leading_path(struct dir_struct *dir,
 		de.d_name[baselen-prevlen] = '\0';
 		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
 				    pathspec);
+		if (state == path_untracked &&
+		    get_dtype(cdir.de, istate, sb.buf, sb.len) == DT_DIR &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
+		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
+				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
+			add_path_to_appropriate_result_list(dir, NULL, &cdir,
+							    istate,
+							    &sb, baselen,
+							    pathspec, state);
+			state = path_recurse;
+		}
+
 		if (state != path_recurse)
 			break; /* do not recurse into it */
 		if (len <= baselen)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 098fddc75b..3da5b2b6e7 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -195,7 +195,7 @@ test_expect_success 'git ls-files -o consistent between one or two dirs' '
 
 # ls-files doesn't have a way to request showing both untracked and ignored
 # files at the same time, so use `git status --ignored`
-test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+test_expect_success 'git status --ignored shows same files under dir with or without pathspec' '
 	cat <<-EOF >expect &&
 	?? an_untracked_dir/
 	!! an_untracked_dir/ignored
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 84366050da..e4cf5484f9 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -47,7 +47,7 @@ cat >expected <<\EOF
 !! untracked/ignored
 EOF
 
-test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+test_expect_success 'status of untracked directory with --ignored works with or without prefix' '
 	git status --porcelain --ignored >tmp &&
 	grep untracked/ tmp >actual &&
 	rm tmp &&
-- 
gitgitgadget

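The condition this patch adds to treat_leading_path() can be read as a small decision function. The following is a hedged model, not git's code: `enum treatment`, `leading_dir_treatment()` and its boolean parameters are hypothetical stand-ins for enum path_treatment, the get_dtype() == DT_DIR test, DIR_SHOW_IGNORED_TOO and the MATCHED_RECURSIVELY_LEADING_PATHSPEC result of do_match_pathspec().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum path_treatment values in dir.c. */
enum treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/*
 * Models the check added to treat_leading_path(): an untracked leading
 * directory is recorded (*report) and then recursed into when ignored
 * files are wanted as well or the pathspec reaches below it.  Any other
 * state is passed through unchanged.
 */
enum treatment leading_dir_treatment(enum treatment state, bool is_dir,
				     bool show_ignored_too,
				     bool pathspec_reaches_below,
				     bool *report)
{
	assert(report);
	*report = false;
	if (state == T_UNTRACKED && is_dir &&
	    (show_ignored_too || pathspec_reaches_below)) {
		*report = true;	/* add_path_to_appropriate_result_list() */
		return T_RECURSE;
	}
	return state;
}
```

Reading it this way shows why the two tests flip to success: `git status --ignored` sets the "ignored too" behaviour, so an untracked leading directory is both listed and descended into, just as read_directory_recursive() would do for it.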


* [PATCH v2 8/8] dir: consolidate similar code in treat_directory()
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2019-12-10 20:00   ` [PATCH v2 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
@ 2019-12-10 20:00   ` Elijah Newren via GitGitGadget
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-10 20:00 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Both the DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS cases checked whether
a path was actually a nonbare repository.  That check can be shared;
only the action taken on its result differs between the two cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/dir.c b/dir.c
index bb6e481909..04541b798b 100644
--- a/dir.c
+++ b/dir.c
@@ -1461,6 +1461,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int exclude,
 	const struct pathspec *pathspec)
 {
+	int nested_repo = 0;
+
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
 	case index_directory:
@@ -1470,15 +1472,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
-		if (dir->flags & DIR_SKIP_NESTED_GIT) {
-			int nested_repo;
+		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addstr(&sb, dirname);
 			nested_repo = is_nonbare_repository_dir(&sb);
 			strbuf_release(&sb);
-			if (nested_repo)
-				return path_none;
 		}
+		if (nested_repo)
+			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+				(exclude ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
@@ -1506,13 +1509,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 			return path_none;
 		}
-		if (!(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			if (is_nonbare_repository_dir(&sb))
-				return exclude ? path_excluded : path_untracked;
-			strbuf_release(&sb);
-		}
 		return path_recurse;
 	}
 
-- 
gitgitgadget

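The consolidated check in treat_directory() can be modeled as a pure function of the flags. A minimal sketch, assuming hypothetical names (SKIP_NESTED_GIT, NO_GITLINKS, enum treatment, nested_repo_treatment()) in place of git's DIR_* flags and enum path_treatment; the is_nonbare_repository_dir() probe is reduced to a boolean, and T_CONTINUE stands for falling through to the rest of the index_nonexistent case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real DIR_* flags are defined in git's dir.h. */
#define SKIP_NESTED_GIT (1u << 0)
#define NO_GITLINKS     (1u << 1)

enum treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_CONTINUE };

/* Is the nonbare-repository probe needed at all for these flags? */
bool needs_repo_check(unsigned flags)
{
	return (flags & SKIP_NESTED_GIT) || !(flags & NO_GITLINKS);
}

/*
 * One probe result drives both cases: skip the nested repository
 * entirely, or report it as a single excluded/untracked entry.
 */
enum treatment nested_repo_treatment(unsigned flags, bool nested_repo,
				     bool exclude)
{
	if (!needs_repo_check(flags) || !nested_repo)
		return T_CONTINUE;
	if (flags & SKIP_NESTED_GIT)
		return T_NONE;
	return exclude ? T_EXCLUDED : T_UNTRACKED;
}
```

When both flags are set, DIR_SKIP_NESTED_GIT wins, which matches the ordering of the ternary in the patch.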

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-10 20:00   ` [PATCH v2 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
@ 2019-12-15 10:29     ` Johannes Schindelin
  2019-12-16 13:51       ` Elijah Newren
  0 siblings, 1 reply; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-15 10:29 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, blees, Junio C Hamano, kyle, sxlijin, Junio C Hamano, Elijah Newren

Hi Elijah,

I have not had time to dive deeply into this, but I know that it _does_
cause a ton of segmentation faults in the `shears/pu` branch (where all of
Git for Windows' patches are rebased on top of `pu`):

On Tue, 10 Dec 2019, Elijah Newren via GitGitGadget wrote:

> diff --git a/dir.c b/dir.c
> index 645b44ea64..9c71a9ac21 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2102,37 +2102,69 @@ static int treat_leading_path(struct dir_struct *dir,
>  			      const struct pathspec *pathspec)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> -	int baselen, rc = 0;
> +	int prevlen, baselen;
>  	const char *cp;
> +	struct cached_dir cdir;
> +	struct dirent de;
> +	enum path_treatment state = path_none;
> +
> +	/*
> +	 * For each directory component of path, we are going to check whether
> +	 * that path is relevant given the pathspec.  For example, if path is
> +	 *    foo/bar/baz/
> +	 * then we will ask treat_path() whether we should go into foo, then
> +	 * whether we should go into bar, then whether baz is relevant.
> +	 * Checking each is important because e.g. if path is
> +	 *    .git/info/
> +	 * then we need to check .git to know we shouldn't traverse it.
> +	 * If the return from treat_path() is:
> +	 *    * path_none, for any path, we return false.
> +	 *    * path_recurse, for all path components, we return true
> +	 *    * <anything else> for some intermediate component, we make sure
> +	 *        to add that path to the relevant list but return false
> +	 *        signifying that we shouldn't recurse into it.
> +	 */
>
>  	while (len && path[len - 1] == '/')
>  		len--;
>  	if (!len)
>  		return 1;
> +
> +	memset(&cdir, 0, sizeof(cdir));
> +	memset(&de, 0, sizeof(de));
> +	cdir.de = &de;
> +	de.d_type = DT_DIR;

So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.

>  	baselen = 0;
> +	prevlen = 0;
>  	while (1) {
> -		cp = path + baselen + !!baselen;
> +		prevlen = baselen + !!baselen;
> +		cp = path + prevlen;
>  		cp = memchr(cp, '/', path + len - cp);
>  		if (!cp)
>  			baselen = len;
>  		else
>  			baselen = cp - path;
> -		strbuf_setlen(&sb, 0);
> +		strbuf_reset(&sb);
>  		strbuf_add(&sb, path, baselen);
>  		if (!is_directory(sb.buf))
>  			break;
> -		if (simplify_away(sb.buf, sb.len, pathspec))
> -			break;
> -		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
> -				   DT_DIR, NULL) == path_none)
> +		strbuf_reset(&sb);
> +		strbuf_add(&sb, path, prevlen);
> +		memcpy(de.d_name, path+prevlen, baselen-prevlen);

But here we try to copy a path into that `de.d_name`, which is still
`NULL`?

That can't be right, can it?

Thanks for your help,
Dscho

> +		de.d_name[baselen-prevlen] = '\0';
> +		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
> +				    pathspec);
> +		if (state != path_recurse)
>  			break; /* do not recurse into it */
> -		if (len <= baselen) {
> -			rc = 1;
> +		if (len <= baselen)
>  			break; /* finished checking */
> -		}
>  	}
> +	add_path_to_appropriate_result_list(dir, NULL, &cdir, istate,
> +					    &sb, baselen, pathspec,
> +					    state);
> +
>  	strbuf_release(&sb);
> -	return rc;
> +	return state == path_recurse;
>  }
>
>  static const char *get_ident_string(void)
> diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
> index d6e161ddd8..098fddc75b 100755
> --- a/t/t3011-common-prefixes-and-directory-traversal.sh
> +++ b/t/t3011-common-prefixes-and-directory-traversal.sh
> @@ -74,7 +74,7 @@ test_expect_success 'git ls-files -o --directory untracked_dir does not recurse'
>  	test_cmp expect actual
>  '
>
> -test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
> +test_expect_success 'git ls-files -o --directory untracked_dir/ does not recurse' '
>  	echo untracked_dir/ >expect &&
>  	git ls-files -o --directory untracked_dir/ >actual &&
>  	test_cmp expect actual
> @@ -86,7 +86,7 @@ test_expect_success 'git ls-files -o untracked_repo does not recurse' '
>  	test_cmp expect actual
>  '
>
> -test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
> +test_expect_success 'git ls-files -o untracked_repo/ does not recurse' '
>  	echo untracked_repo/ >expect &&
>  	git ls-files -o untracked_repo/ >actual &&
>  	test_cmp expect actual
> @@ -133,7 +133,7 @@ test_expect_success 'git ls-files -o .git shows nothing' '
>  	test_must_be_empty actual
>  '
>
> -test_expect_failure 'git ls-files -o .git/ shows nothing' '
> +test_expect_success 'git ls-files -o .git/ shows nothing' '
>  	git ls-files -o .git/ >actual &&
>  	test_must_be_empty actual
>  '
> --
> gitgitgadget
>
>
>

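The prevlen/baselen loop quoted above can be exercised on its own. A minimal sketch, assuming a hypothetical split_leading_components() and an arbitrary COMPONENT_MAX limit; it copies each component into storage the caller owns, so it does not depend on how a platform lays out struct dirent's d_name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMPONENT_MAX 256	/* hypothetical per-component limit for this sketch */

/*
 * Splits path into its leading components, copying each NUL-terminated
 * name into names[i] and its starting offset (prevlen) into offsets[i].
 * Returns the number of components, or -1 if max or COMPONENT_MAX is
 * exceeded.
 */
int split_leading_components(const char *path, size_t len,
			     char names[][COMPONENT_MAX], size_t offsets[],
			     int max)
{
	size_t prevlen, baselen = 0;
	int count = 0;

	assert(names && offsets);
	/* Strip trailing slashes, as treat_leading_path() does. */
	while (len && path[len - 1] == '/')
		len--;
	if (!len)
		return 0;

	while (1) {
		const char *cp;

		/* Skip the '/' that ended the previous component, if any. */
		prevlen = baselen + !!baselen;
		cp = memchr(path + prevlen, '/', len - prevlen);
		baselen = cp ? (size_t)(cp - path) : len;

		if (count == max || baselen - prevlen >= COMPONENT_MAX)
			return -1;
		/* Copy into storage we own rather than into a dirent's d_name. */
		memcpy(names[count], path + prevlen, baselen - prevlen);
		names[count][baselen - prevlen] = '\0';
		offsets[count] = prevlen;
		count++;

		if (len <= baselen)
			break;	/* finished checking */
	}
	return count;
}
```

For "foo/bar/baz/" this yields foo, bar and baz at offsets 0, 4 and 8, the order in which the patch asks treat_path() about each component.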

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-15 10:29     ` Johannes Schindelin
@ 2019-12-16 13:51       ` Elijah Newren
  2019-12-16 16:00         ` Elijah Newren
  0 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren @ 2019-12-16 13:51 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Junio C Hamano, Kyle Meyer, Samuel Lijin

On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> I have not had time to dive deeply into this, but I know that it _does_
> cause a ton of segmentation faults in the `shears/pu` branch (where all of
> Git for Windows' patches are rebased on top of `pu`):

Weird.  If it's going to cause segmentation faults at all, it would
certainly do it all over the place, but I tested the patches on the
major platforms using your Azure Pipelines setup on git.git so it
should be good on all the platforms.  Did your shears/pu branch make
some other changes to the setup?

> On Tue, 10 Dec 2019, Elijah Newren via GitGitGadget wrote:
>
> > diff --git a/dir.c b/dir.c
> > index 645b44ea64..9c71a9ac21 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -2102,37 +2102,69 @@ static int treat_leading_path(struct dir_struct *dir,
> >                             const struct pathspec *pathspec)
> >  {
> >       struct strbuf sb = STRBUF_INIT;
> > -     int baselen, rc = 0;
> > +     int prevlen, baselen;
> >       const char *cp;
> > +     struct cached_dir cdir;
> > +     struct dirent de;
> > +     enum path_treatment state = path_none;
> > +
> > +     /*
> > +      * For each directory component of path, we are going to check whether
> > +      * that path is relevant given the pathspec.  For example, if path is
> > +      *    foo/bar/baz/
> > +      * then we will ask treat_path() whether we should go into foo, then
> > +      * whether we should go into bar, then whether baz is relevant.
> > +      * Checking each is important because e.g. if path is
> > +      *    .git/info/
> > +      * then we need to check .git to know we shouldn't traverse it.
> > +      * If the return from treat_path() is:
> > +      *    * path_none, for any path, we return false.
> > +      *    * path_recurse, for all path components, we return true
> > +      *    * <anything else> for some intermediate component, we make sure
> > +      *        to add that path to the relevant list but return false
> > +      *        signifying that we shouldn't recurse into it.
> > +      */
> >
> >       while (len && path[len - 1] == '/')
> >               len--;
> >       if (!len)
> >               return 1;
> > +
> > +     memset(&cdir, 0, sizeof(cdir));
> > +     memset(&de, 0, sizeof(de));
> > +     cdir.de = &de;
> > +     de.d_type = DT_DIR;
>
> So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.

Um, yeah...didn't I have an allocation of de.d_name here?  It will
always have a subset of path copied into it, so an allocation of len+1
is plenty long enough.

> >       baselen = 0;
> > +     prevlen = 0;
> >       while (1) {
> > -             cp = path + baselen + !!baselen;
> > +             prevlen = baselen + !!baselen;
> > +             cp = path + prevlen;
> >               cp = memchr(cp, '/', path + len - cp);
> >               if (!cp)
> >                       baselen = len;
> >               else
> >                       baselen = cp - path;
> > -             strbuf_setlen(&sb, 0);
> > +             strbuf_reset(&sb);
> >               strbuf_add(&sb, path, baselen);
> >               if (!is_directory(sb.buf))
> >                       break;
> > -             if (simplify_away(sb.buf, sb.len, pathspec))
> > -                     break;
> > -             if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
> > -                                DT_DIR, NULL) == path_none)
> > +             strbuf_reset(&sb);
> > +             strbuf_add(&sb, path, prevlen);
> > +             memcpy(de.d_name, path+prevlen, baselen-prevlen);
>
> But here we try to copy a path into that `de.d_name`, which is still
> `NULL`?
>
> That can't be right, can it?

Yes, it can't be right.  How did this possibly pass on any platform
let alone all of them?
(https://dev.azure.com/git/git/_build/results?buildId=1462&view=results).
This is absolutely an important codepath that is hit; otherwise it
couldn't fix the three tests from failure to success.  Further, the
subsequent patch added code within this if-block after this memcpy and
fixed a few tests from failures to success.  So it had to hit this
code path as well.  How could it not have segfaulted?  I'm very
confused...


* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-16 13:51       ` Elijah Newren
@ 2019-12-16 16:00         ` Elijah Newren
  2019-12-16 18:13           ` Junio C Hamano
  2019-12-17  0:04           ` Johannes Schindelin
  0 siblings, 2 replies; 69+ messages in thread
From: Elijah Newren @ 2019-12-16 16:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Junio C Hamano, Kyle Meyer, Samuel Lijin

On Mon, Dec 16, 2019 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Elijah,
> >
> > I have not had time to dive deeply into this, but I know that it _does_
> > cause a ton of segmentation faults in the `shears/pu` branch (where all of
> > Git for Windows' patches are rebased on top of `pu`):
>
> Weird.  If it's going to cause segmentation faults at all, it would
> certainly do it all over the place, but I tested the patches on the
> major platforms using your Azure Pipelines setup on git.git so it
> should be good on all the platforms.  Did your shears/pu branch make
> some other changes to the setup?
>
> > On Tue, 10 Dec 2019, Elijah Newren via GitGitGadget wrote:
> >
> > > diff --git a/dir.c b/dir.c
> > > index 645b44ea64..9c71a9ac21 100644
> > > --- a/dir.c
> > > +++ b/dir.c
> > > @@ -2102,37 +2102,69 @@ static int treat_leading_path(struct dir_struct *dir,
> > >                             const struct pathspec *pathspec)
> > >  {
> > >       struct strbuf sb = STRBUF_INIT;
> > > -     int baselen, rc = 0;
> > > +     int prevlen, baselen;
> > >       const char *cp;
> > > +     struct cached_dir cdir;
> > > +     struct dirent de;
> > > +     enum path_treatment state = path_none;
> > > +
> > > +     /*
> > > +      * For each directory component of path, we are going to check whether
> > > +      * that path is relevant given the pathspec.  For example, if path is
> > > +      *    foo/bar/baz/
> > > +      * then we will ask treat_path() whether we should go into foo, then
> > > +      * whether we should go into bar, then whether baz is relevant.
> > > +      * Checking each is important because e.g. if path is
> > > +      *    .git/info/
> > > +      * then we need to check .git to know we shouldn't traverse it.
> > > +      * If the return from treat_path() is:
> > > +      *    * path_none, for any path, we return false.
> > > +      *    * path_recurse, for all path components, we return true
> > > +      *    * <anything else> for some intermediate component, we make sure
> > > +      *        to add that path to the relevant list but return false
> > > +      *        signifying that we shouldn't recurse into it.
> > > +      */
> > >
> > >       while (len && path[len - 1] == '/')
> > >               len--;
> > >       if (!len)
> > >               return 1;
> > > +
> > > +     memset(&cdir, 0, sizeof(cdir));
> > > +     memset(&de, 0, sizeof(de));
> > > +     cdir.de = &de;
> > > +     de.d_type = DT_DIR;
> >
> > So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.
>
> Um, yeah...didn't I have an allocation of de.d_name here?  It will
> always have a subset of path copied into it, so an allocation of len+1
> is plenty long enough.

Actually, it looks like I looked up the definition of dirent
previously and forgot by the time you emailed.  On linux, from
/usr/include/bits/dirent.h:

struct dirent
  {
    ....
    unsigned char d_type;
    char d_name[256];           /* We must not include limits.h! */
  };

and compat/win32/dirent.h defines it as:

struct dirent {
        unsigned char d_type;      /* file type to prevent lstat after readdir */
        char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
};

and 'man dirent' on Mac OS X says it's defined as:

struct dirent {
        ...
        __uint8_t d_type;
        __uint8_t d_namlen;   /* length of string in d_name */
        char    d_name[255+1];  /* name must be no longer than this */
};

so, allocating it would be incorrect and my memset would just fill
d_name with nul characters.


But that raises the question...what kind of segfaults are you getting?
Can you link to any builds or post any stack traces?  Can I duplicate
with some copy of git-for-windows on linux?


* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-16 16:00         ` Elijah Newren
@ 2019-12-16 18:13           ` Junio C Hamano
  2019-12-16 21:08             ` Elijah Newren
  2019-12-17  0:04           ` Johannes Schindelin
  1 sibling, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-16 18:13 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin, Elijah Newren via GitGitGadget,
	Git Mailing List, blees, Kyle Meyer, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

>> > > +     memset(&cdir, 0, sizeof(cdir));
>> > > +     memset(&de, 0, sizeof(de));
>> > > +     cdir.de = &de;
>> > > +     de.d_type = DT_DIR;
>> >
>> > So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.
>>
>> Um, yeah...didn't I have an allocation of de.d_name here?  It will
>> always have a subset of path copied into it, so an allocation of len+1
>> is plenty long enough.
>
> Actually, it looks like I looked up the definition of dirent
> previously and forgot by the time you emailed.  On linux, from
> /usr/include/bits/dirent.h:
>
> struct dirent
>   {
>     ....
>     unsigned char d_type;
>     char d_name[256];           /* We must not include limits.h! */
>   };
>
> ...

Uh, oh.  The size of "struct dirent" is unspecified and it is asking
for trouble to allocate one yourself (iow, treat it pretty much as
something you can only get a pointer to an instance from readdir()).
For example, a dirent that comes back from readdir() may have a lot
longer name than the sizeof(.d_name[]) above may imply.

Do you really need to manufacture a dirent yourself, or can you use
a more concrete type you invent yourself?

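Junio's point suggests treating a struct dirent as data you only ever receive from readdir(). A hedged sketch of consuming entries that way, with hypothetical helper names; the name length is taken from strlen() rather than sizeof(d_name), and no dirent is ever constructed by hand.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a heap copy of a name obtained from readdir(), or NULL if
 * allocation fails.  The length comes from strlen(), never from
 * sizeof(de->d_name): the size of struct dirent is unspecified and a
 * returned name may be longer than the declared array suggests.
 */
char *copy_entry_name(const struct dirent *de)
{
	size_t len;
	char *copy;

	assert(de);
	len = strlen(de->d_name);
	copy = malloc(len + 1);
	if (copy)
		memcpy(copy, de->d_name, len + 1);	/* include the NUL */
	return copy;
}

/* Returns 1 if directory dirpath contains an entry called name, else 0. */
int dir_has_entry(const char *dirpath, const char *name)
{
	DIR *d = opendir(dirpath);
	struct dirent *de;
	int found = 0;

	if (!d)
		return 0;
	/* Only pointers returned by readdir() are used; none is built by hand. */
	while (!found && (de = readdir(d)) != NULL) {
		char *copy = copy_entry_name(de);

		found = copy && !strcmp(copy, name);
		free(copy);
	}
	closedir(d);
	return found;
}
```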

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-16 18:13           ` Junio C Hamano
@ 2019-12-16 21:08             ` Elijah Newren
  2019-12-16 21:25               ` Junio C Hamano
  0 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren @ 2019-12-16 21:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Elijah Newren via GitGitGadget,
	Git Mailing List, blees, Kyle Meyer, Samuel Lijin

On Mon, Dec 16, 2019 at 10:13 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> >> > > +     memset(&cdir, 0, sizeof(cdir));
> >> > > +     memset(&de, 0, sizeof(de));
> >> > > +     cdir.de = &de;
> >> > > +     de.d_type = DT_DIR;
> >> >
> >> > So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.
> >>
> >> Um, yeah...didn't I have an allocation of de.d_name here?  It will
> >> always have a subset of path copied into it, so an allocation of len+1
> >> is plenty long enough.
> >
> > Actually, it looks like I looked up the definition of dirent
> > previously and forgot by the time you emailed.  On linux, from
> > /usr/include/bits/dirent.h:
> >
> > struct dirent
> >   {
> >     ....
> >     unsigned char d_type;
> >     char d_name[256];           /* We must not include limits.h! */
> >   };
> >
> > ...
>
> Uh, oh.  The size of "struct dirent" is unspecified and it is asking
> for trouble to allocate one yourself (iow, treat it pretty much as
> something you can only get a pointer to an instance from readdir()).
> For example, a dirent that comes back from readdir() may have a lot
> longer name than the sizeof(.d_name[]) above may imply.
>
> Do you really need to manufacture a dirent yourself, or can you use
> a more concrete type you invent yourself?

I need to manufacture a dirent myself; short of that, the most likely
alternative is to drop patches 2 & 5-8 of this series and throw my
hands in the air and give up.  That probably deserves an
explanation...

Years ago someone noticed that if a user ran "git ls-files -o
foo/bar/one foo/bar/two", that we could try to optimize by noticing
that we won't be interested in anything until we get to foo/bar/.  So,
they tried to short-circuit the read_directory_recursive() and
readdir() calls, but couldn't reuse the same treat_path() logic to
check that we should even go into foo/bar/ at all.  So there was some
copy & paste from treat_path() into a new treat_leading_path(), which
both missed some important parts and diverged further over time.

This patch was about categorizing the suite of bugs that arose from
not using treat_path() for checks from both codepaths, and tried to
correct those problems.  treat_path() takes a dirent, and several of
the functions it calls all take a dirent.  It'd be an awful lot of
work to rip it out.  So I manufactured a dirent myself so that we
could use the same codepaths and not only fix all these bugs but
prevent future ones as well.  If we can't manufacture a dirent, then
unless someone else has a bright idea about something clever we can
do, I think this problem blows up in complexity to a level where it
isn't worth addressing.

I almost ripped the optimization out altogether (just how much do we
really save by not looking into the leading two directories?), except
that unpack_trees() calls into the same code with a leading path and I
didn't want to mess with that.

Any bright ideas about what to do here?

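The optimization described above starts from the longest leading directory shared by all pathspecs. A minimal sketch of computing that prefix, assuming plain literal paths (no wildcard handling) and a hypothetical common_leading_dir_len(); git's own pathspec code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the length of the longest leading directory shared by all n
 * paths, including its trailing '/', or 0 if there is none.  Paths are
 * treated as literal strings; wildcard pathspecs are not handled.
 */
size_t common_leading_dir_len(const char *paths[], size_t n)
{
	size_t i, len, dirlen = 0;

	assert(paths && n > 0);
	for (len = 0; paths[0][len]; len++) {
		char c = paths[0][len];

		/* Stop at the first byte where any path differs (or ends). */
		for (i = 1; i < n; i++)
			if (paths[i][len] != c)
				return dirlen;
		if (c == '/')
			dirlen = len + 1;	/* end of the last complete directory */
	}
	return dirlen;
}
```

For "foo/bar/one" and "foo/bar/two" it returns 8, the length of "foo/bar/"; traversal could then begin there instead of at the top level.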

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-16 21:08             ` Elijah Newren
@ 2019-12-16 21:25               ` Junio C Hamano
  2019-12-16 22:39                 ` Elijah Newren
  0 siblings, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-16 21:25 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin, Elijah Newren via GitGitGadget,
	Git Mailing List, blees, Kyle Meyer, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

> Any bright ideas about what to do here?

Restructuring the code so that we do not use "struct dirent" in the
first place, even in the original code that used only those obtained
from readdir(), perhaps?  Then the codepath that would take the _thing_
that describes the directory entry would expect to see the data in
the struct you define (not "struct dirent" from the system), and you
can safely manufacture ones out of thin air.
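One way to read this suggestion: give the traversal code its own entry type that carries only what it reads, so callers can build one from a readdir() result or out of thin air. A sketch under that assumption; struct dir_entry, enum entry_type and both helpers are hypothetical, not git's API.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical replacement for the parts of struct dirent that the
 * traversal reads; its layout is ours, so any caller may fill one in. */
enum entry_type { ENTRY_UNKNOWN, ENTRY_FILE, ENTRY_DIR };

struct dir_entry {
	const char *name;	/* borrowed; need not be NUL-terminated at namelen */
	size_t namelen;		/* authoritative length of name */
	enum entry_type type;
};

/* Wrap a readdir() result.  Only d_name is portable, so the type is
 * left unknown for the caller to resolve (e.g. with lstat()). */
void dir_entry_from_dirent(struct dir_entry *e, const struct dirent *de)
{
	assert(e && de);
	e->name = de->d_name;
	e->namelen = strlen(de->d_name);
	e->type = ENTRY_UNKNOWN;
}

/* Manufacture an entry "out of thin air" for a component already known
 * to be a directory, e.g. one leading component of a longer path. */
void dir_entry_for_directory(struct dir_entry *e, const char *name,
			     size_t len)
{
	assert(e && name);
	e->name = name;
	e->namelen = len;
	e->type = ENTRY_DIR;
}
```

Because name is borrowed with an explicit length, leading-path code could point it straight into the path being walked without copying. An entry wrapped from readdir() stays valid only until the next readdir() call on the same stream.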




* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-16 21:25               ` Junio C Hamano
@ 2019-12-16 22:39                 ` Elijah Newren
  0 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren @ 2019-12-16 22:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Elijah Newren via GitGitGadget,
	Git Mailing List, blees, Kyle Meyer, Samuel Lijin

On Mon, Dec 16, 2019 at 1:25 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Any bright ideas about what to do here?
>
> Restructuring the code so that we do not use "struct dirent" in the
> first place, even in the original code that used only those obtained
> from readdir(), perhaps?  Then the codepath that would take the _thing_
> that describes the directory entry would expect to see the data in
> the struct you define (not "struct dirent" from the system), and you
> can safely manufacture ones out of thin air.

Okay, I'll submit a new series dropping most of the patches, but note
this thread in the commit message of the new testcases in case someone
(maybe my future self?) wants to dig further.


* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-16 16:00         ` Elijah Newren
  2019-12-16 18:13           ` Junio C Hamano
@ 2019-12-17  0:04           ` Johannes Schindelin
  2019-12-17  0:14             ` Junio C Hamano
  2019-12-17  5:26             ` Elijah Newren
  1 sibling, 2 replies; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-17  0:04 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Junio C Hamano, Kyle Meyer, Samuel Lijin

Hi Elijah,

On Mon, 16 Dec 2019, Elijah Newren wrote:

> On Mon, Dec 16, 2019 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > Hi Elijah,
> > >
> > > I have not had time to dive deeply into this, but I know that it _does_
> > > cause a ton of segmentation faults in the `shears/pu` branch (where all of
> > > Git for Windows' patches are rebased on top of `pu`):
> >
> > Weird.  If it's going to cause segmentation faults at all, it would
> > certainly do it all over the place, but I tested the patches on the
> > major platforms using your Azure Pipelines setup on git.git so it
> > should be good on all the platforms.  Did your shears/pu branch make
> > some other changes to the setup?

Not really.

> > > On Tue, 10 Dec 2019, Elijah Newren via GitGitGadget wrote:
> > >
> > > > diff --git a/dir.c b/dir.c
> > > > index 645b44ea64..9c71a9ac21 100644
> > > > --- a/dir.c
> > > > +++ b/dir.c
> > > > @@ -2102,37 +2102,69 @@ static int treat_leading_path(struct dir_struct *dir,
> > > >                             const struct pathspec *pathspec)
> > > >  {
> > > >       struct strbuf sb = STRBUF_INIT;
> > > > -     int baselen, rc = 0;
> > > > +     int prevlen, baselen;
> > > >       const char *cp;
> > > > +     struct cached_dir cdir;
> > > > +     struct dirent de;
> > > > +     enum path_treatment state = path_none;
> > > > +
> > > > +     /*
> > > > +      * For each directory component of path, we are going to check whether
> > > > +      * that path is relevant given the pathspec.  For example, if path is
> > > > +      *    foo/bar/baz/
> > > > +      * then we will ask treat_path() whether we should go into foo, then
> > > > +      * whether we should go into bar, then whether baz is relevant.
> > > > +      * Checking each is important because e.g. if path is
> > > > +      *    .git/info/
> > > > +      * then we need to check .git to know we shouldn't traverse it.
> > > > +      * If the return from treat_path() is:
> > > > +      *    * path_none, for any path, we return false.
> > > > +      *    * path_recurse, for all path components, we return true
> > > > +      *    * <anything else> for some intermediate component, we make sure
> > > > +      *        to add that path to the relevant list but return false
> > > > +      *        signifying that we shouldn't recurse into it.
> > > > +      */
> > > >
> > > >       while (len && path[len - 1] == '/')
> > > >               len--;
> > > >       if (!len)
> > > >               return 1;
> > > > +
> > > > +     memset(&cdir, 0, sizeof(cdir));
> > > > +     memset(&de, 0, sizeof(de));
> > > > +     cdir.de = &de;
> > > > +     de.d_type = DT_DIR;
> > >
> > > So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.
> >
> > Um, yeah...didn't I have an allocation of de.d_name here?  It will
> > always have a subset of path copied into it, so an allocation of len+1
> > is plenty long enough.
>
> Actually, it looks like I looked up the definition of dirent
> previously and forgot by the time you emailed.  On linux, from
> /usr/include/bits/dirent.h:
>
> struct dirent
>   {
>     ....
>     unsigned char d_type;
>     char d_name[256];           /* We must not include limits.h! */
>   };
>
> and compat/win32/dirent.h defines it as:
>
> struct dirent {
>         unsigned char d_type;      /* file type to prevent lstat after readdir */
>         char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> };
>
> and 'man dirent' on Mac OS X says it's defined as:
>
> struct dirent {
>         ...
>         __uint8_t d_type;
>         __uint8_t d_namlen;   /* length of string in d_name */
>         char    d_name[255+1];  /* name must be no longer than this */
> };
>
> so, allocating it would be incorrect and my memset would just fill
> d_name with nul characters.
>
>
> But that raises the question...what kind of segfaults are you getting?
> Can you link to any builds or post any stack traces?  Can I duplicate
> with some copy of git-for-windows on linux?

If you care to look at our very own `compat/win32/dirent.h`, you will see
this:

struct dirent {
        unsigned char d_type; /* file type to prevent lstat after readdir */
        char *d_name;         /* file name */
};

And looking at
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
do not see any guarantee of that `[256]` at all:

The <dirent.h> header shall [...] define the structure dirent which shall
include the following members:

[XSI][Option Start]
ino_t  d_ino       File serial number.
[Option End]
char   d_name[]    Filename string of entry.

You will notice that not even `d_type` is guaranteed.

Ciao,
Dscho

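Because d_type is an extension rather than part of POSIX (and may be DT_UNKNOWN even where it exists), a directory check needs a fallback. A hedged sketch with a hypothetical entry_is_dir(), assuming glibc's _DIRENT_HAVE_D_TYPE macro for detection and lstat() otherwise; other C libraries may need different detection.

```c
#define _DEFAULT_SOURCE	/* glibc: expose lstat() and the DT_* constants */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Returns 1 if the entry de, read from dirpath, is a directory, 0 if it
 * is not, and -1 on error.  d_type is trusted only when the platform
 * provides it and it is not DT_UNKNOWN; otherwise the entry is
 * lstat()ed, so a symlink to a directory counts as "not a directory"
 * either way.
 */
int entry_is_dir(const char *dirpath, const struct dirent *de)
{
	char buf[4096];
	struct stat st;
	int n;

	assert(dirpath && de);
#ifdef _DIRENT_HAVE_D_TYPE
	if (de->d_type == DT_DIR)
		return 1;
	if (de->d_type != DT_UNKNOWN)
		return 0;
#endif
	n = snprintf(buf, sizeof(buf), "%s/%s", dirpath, de->d_name);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;	/* path does not fit in this sketch's buffer */
	if (lstat(buf, &st))
		return -1;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}
```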

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17  0:04           ` Johannes Schindelin
@ 2019-12-17  0:14             ` Junio C Hamano
  2019-12-17 11:08               ` Johannes Schindelin
  2019-12-17  5:26             ` Elijah Newren
  1 sibling, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-17  0:14 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> If you care to look at our very own `compat/win32/dirent.h`, you will see
> this:
>
> struct dirent {
>         unsigned char d_type; /* file type to prevent lstat after readdir */
>         char *d_name;         /* file name */
> };
>
> And looking at
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
> do not see any guarantee of that `[256]` at all:
>
> The <dirent.h> header shall [...] define the structure dirent which shall
> include the following members:
>
> [XSI][Option Start]
> ino_t  d_ino       File serial number.
> [Option End]
> char   d_name[]    Filename string of entry.
>
> You will notice that not even `d_type` is guaranteed.

I am reasonably sure that the code (without Elijah's patches anyway)
takes the possibility of missing d_type into account already.
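
For readers following along, here is a sketch of what "taking a missing d_type into account" usually looks like. It is modelled on the DTYPE() fallback pattern and the NO_D_TYPE_IN_DIRENT build knob (treat the exact shape as an assumption, not a copy of Git's code). The helper entry_dtype() is hypothetical. It trusts d_type when present and falls back to lstat() when the type is missing or reported as DT_UNKNOWN.

```c
#define _DEFAULT_SOURCE /* expose DT_* constants and lstat() on glibc */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* If the platform has no usable d_type, never touch the member at all. */
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
#define DTYPE(de) ((de)->d_type)
#else
#undef DT_UNKNOWN
#undef DT_DIR
#undef DT_REG
#undef DT_LNK
#define DT_UNKNOWN 0
#define DT_DIR 1
#define DT_REG 2
#define DT_LNK 3
#define DTYPE(de) DT_UNKNOWN
#endif

/* Hypothetical helper: type of entry `de` found while reading `dirpath`. */
static int entry_dtype(const char *dirpath, const struct dirent *de)
{
	char path[4096];
	struct stat st;
	int n, dtype = DTYPE(de);

	if (dtype != DT_UNKNOWN)
		return dtype;	/* cheap path: readdir() already told us */

	/* Fallback: one lstat() per entry, the cost d_type exists to avoid. */
	n = snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return DT_UNKNOWN;
	if (lstat(path, &st))
		return DT_UNKNOWN;
	if (S_ISDIR(st.st_mode))
		return DT_DIR;
	if (S_ISREG(st.st_mode))
		return DT_REG;
	if (S_ISLNK(st.st_mode))
		return DT_LNK;
	return DT_UNKNOWN;
}
```

Since d_type is only a hint, callers have to read DT_UNKNOWN as "ask the filesystem", never as "not a directory".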

Doesn't the above mean d_name[] has to be an in-place array of some
size (i.e. even a flex-array is OK)?  It does not look to me that it
allows for it to be a pointer pointing at elsewhere (possibly on
heap), which may be asking for trouble.
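
To make the in-place point concrete, consider code that has only a path component and must hand a `struct dirent` to a function that expects one. A minimal sketch follows. The helper make_fake_dirent() is hypothetical and does not come from the patches in this thread. It sizes the allocation from offsetof(struct dirent, d_name) and copies the name in place. That is only valid under the POSIX layout. If d_name were a `char *`, the same memcpy() would write through the NULL pointer left by calloc(), which is the kind of mismatch that can produce crashes like those reported above.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build a heap-allocated struct dirent for one path
 * component.  Assumes d_name is an in-place array (fixed-size or flexible),
 * as POSIX specifies; with a pointer d_name this writes through NULL.
 */
static struct dirent *make_fake_dirent(const char *name, size_t len)
{
	size_t need = offsetof(struct dirent, d_name) + len + 1;
	struct dirent *de;

	/* Never allocate less than the whole struct if d_name is fixed-size. */
	if (need < sizeof(struct dirent))
		need = sizeof(struct dirent);
	de = calloc(1, need);	/* d_type, where present, stays 0 (DT_UNKNOWN on common platforms) */
	if (!de)
		return NULL;
	memcpy(de->d_name, name, len);
	de->d_name[len] = '\0';
	return de;
}
```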

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17  0:04           ` Johannes Schindelin
  2019-12-17  0:14             ` Junio C Hamano
@ 2019-12-17  5:26             ` Elijah Newren
  2019-12-17 11:15               ` Johannes Schindelin
  1 sibling, 1 reply; 69+ messages in thread
From: Elijah Newren @ 2019-12-17  5:26 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Junio C Hamano, Kyle Meyer, Samuel Lijin

Hi Dscho,

On Mon, Dec 16, 2019 at 4:04 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Mon, 16 Dec 2019, Elijah Newren wrote:
> > On Mon, Dec 16, 2019 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
> > >
> > > On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
> > > <Johannes.Schindelin@gmx.de> wrote:
> > > >
> > > > Hi Elijah,
> > > >
> > > > I have not had time to dive deeply into this, but I know that it _does_
> > > > cause a ton of segmentation faults in the `shears/pu` branch (where all of
> > > > Git for Windows' patches are rebased on top of `pu`):
> > >
> > > Weird.  If it's going to cause segmentation faults at all, it would
> > > certainly do it all over the place, but I tested the patches on the
> > > major platforms using your Azure Pipelines setup on git.git so it
> > > should be good on all the platforms.  Did your shears/pu branch make
> > > some other changes to the setup?
>
> Not really.
>
> >
> > Actually, it looks like I looked up the definition of dirent
> > previously and forgot by the time you emailed.  On linux, from
> > /usr/include/bits/dirent.h:
...
> > and compat/win32/dirent.h defines it as:
> >
> > struct dirent {
> >         unsigned char d_type;      /* file type to prevent lstat after
> > readdir */
> >         char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> > };
...
>
> If you care to look at our very own `compat/win32/dirent.h`, you will see
> this:

Interesting, we both brought up compat/win32/dirent.h and quoted from
it in our emails...

> struct dirent {
>         unsigned char d_type; /* file type to prevent lstat after readdir */
>         char *d_name;         /* file name */
> };

...but the contents were different?  Looks like git-for-windows forked
compat/win32/dirent.h, possibly in a way that violates POSIX as
pointed out by Junio.  Any reason those changes weren't sent back
upstream, by chance?  Feels odd having a compat/win32/ directory that
our downstream windows users aren't actually using.  It also means the
testing I'm getting from gitgitgadget and your Azure setup (which all
is really, really nice by the way), is far less reassuring and helpful
than I hoped.

> And looking at
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
> do not see any guarantee of that `[256]` at all:
>
> The <dirent.h> header shall [...] define the structure dirent which shall
> include the following members:
>
> [XSI][Option Start]
> ino_t  d_ino       File serial number.
> [Option End]
> char   d_name[]    Filename string of entry.
>
> You will notice that not even `d_type` is guaranteed.

Doh, yeah, I messed that up too.

Anyway, as I mentioned to Junio, I'll resubmit after gutting the
series.  I'll still include a fix for the issue that a real world user
reported, but all the other ancillary bugs I found that have been
around for over a decade aren't important enough to merit a major
refactor, IMO.

Elijah

* [PATCH v3 0/3] Directory traversal bugs
  2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2019-12-10 20:00   ` [PATCH v2 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
@ 2019-12-17  8:33   ` Elijah Newren via GitGitGadget
  2019-12-17  8:33     ` [PATCH v3 1/3] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
                       ` (4 more replies)
  8 siblings, 5 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-17  8:33 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano

This series documents multiple fill_directory() bugs, and fixes the one that
is new to 2.24.0 coming from en/clean-nested-with-ignored-topic, the rest
having been around in versions of git going back up to a decade. 

Changes since v2:

 * gutted the series of most of the fixes, dropping the patch count from 8 to
   3, due to incompatibility with git-for-windows (which interestingly has a
   different compat/win32/dirent.h than git.git does). The only bugs
   reported by a user are fixed by patch 3, and fixing the remaining bugs
   (which I found while investigating the one fixed bug) would require a
   major refactor that I don't have the time for currently.

Elijah Newren (3):
  t3011: demonstrate directory traversal failures
  dir: remove stray quote character in comment
  dir: exit before wildcard fall-through if there is no wildcard

 dir.c                                         |   9 +-
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 2 files changed, 217 insertions(+), 1 deletion(-)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh


base-commit: da72936f544fec5a335e66432610e4cef4430991
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-676%2Fnewren%2Fls-files-bug-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-676/newren/ls-files-bug-v3
Pull-Request: https://github.com/git/git/pull/676

Range-diff vs v2:

 1:  6d659b2302 ! 1:  61d303d8bd t3011: demonstrate directory traversal failures
     @@ -14,6 +14,18 @@
          of the en/clean-nested-with-ignored-topic); the other 5 also failed
          under git-2.23.0 and earlier.
      
     +    The old failing tests can be traced down to the common prefix
     +    optimization in dir.c handling paths differently than
     +    read_directory_recursive() and treat_path() would, due to incomplete
     +    duplication of logic into treat_leading_path() and having that
     +    function call treat_one_path() rather than treat_path().  Fixing
     +    that problem would require restructuring treat_path() and its full
     +    call hierarchy to stop taking a dirent; see
     +       https://lore.kernel.org/git/xmqqzhfshsk2.fsf@gitster-ct.c.googlers.com/
     +    and the thread surrounding it for details.
     +
     +    For now, simply document the breakages.
     +
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
 2:  79f2b56174 < -:  ---------- Revert "dir.c: make 'git-status --ignored' work within leading directories"
 3:  d6f858cab1 = 2:  49b0b628db dir: remove stray quote character in comment
 4:  8d2d98eec3 = 3:  47814640e4 dir: exit before wildcard fall-through if there is no wildcard
 5:  d2f5623bd7 < -:  ---------- dir: break part of read_directory_recursive() out for reuse
 6:  9839aca00a < -:  ---------- dir: fix checks on common prefix directory
 7:  df7f08886a < -:  ---------- dir: synchronize treat_leading_path() and read_directory_recursive()
 8:  77b57e44fd < -:  ---------- dir: consolidate similar code in treat_directory()

-- 
gitgitgadget

* [PATCH v3 1/3] t3011: demonstrate directory traversal failures
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
@ 2019-12-17  8:33     ` Elijah Newren via GitGitGadget
  2019-12-17  8:33     ` [PATCH v3 2/3] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-17  8:33 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add several tests demonstrating directory traversal failures of various
sorts in dir.c (and one similar looking test that turns out to be a
git_fnmatch bug).  A lot of these tests look like near duplicates of
each other, but an optimization path in dir.c to pre-descend into a
common prefix and the specialized treatment of trailing slashes in dir.c
mean the tiny differences are sometimes important and potentially cause
different codepaths to be explored.
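
Here is one way to see why a trailing slash can change which codepath runs. The function below is a simplified model, loosely based on dir.c's common_prefix_len(). It ignores pathspec magic, and approximating the literal part with strcspn() over `*?[\` is an assumption. The name common_dir_prefix_len() is hypothetical. Only literal characters before the first wildcard count, and the result always ends on a '/' boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model: length of the longest leading directory shared by all
 * pathspecs, counting only literal characters before the first wildcard
 * and only whole directories (the result ends at a '/' or is 0).
 */
static size_t common_dir_prefix_len(const char **specs, size_t nr)
{
	size_t max = 0;

	for (size_t n = 0; n < nr; n++) {
		size_t item_len = strcspn(specs[n], "*?[\\");
		size_t i = 0, len = 0;

		while (i < item_len && (n == 0 || i < max)) {
			char c = specs[n][i];
			if (c != specs[0][i])
				break;
			if (c == '/')
				len = i + 1;	/* only whole directories count */
			i++;
		}
		if (n == 0 || len < max) {
			max = len;
			if (!max)
				break;
		}
	}
	return max;
}
```

Under this model a single `untracked_dir/` yields a 14-byte prefix, so the pre-descend path would be taken. By contrast, `untracked_dir` alone, `untracked_*/`, or `untracked_dir/ untracked_repo/` together all yield 0. That fits the pattern below: several single-argument, trailing-slash cases are marked test_expect_failure while their two-argument counterparts pass.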

Of the 7 failing tests, 2 are new to git-2.24.0 (tweaked by side effects
of the en/clean-nested-with-ignored-topic); the other 5 also failed
under git-2.23.0 and earlier.

The old failing tests can be traced down to the common prefix
optimization in dir.c handling paths differently than
read_directory_recursive() and treat_path() would, due to incomplete
duplication of logic into treat_leading_path() and having that
function call treat_one_path() rather than treat_path().  Fixing
that problem would require restructuring treat_path() and its full
call hierarchy to stop taking a dirent; see
   https://lore.kernel.org/git/xmqqzhfshsk2.fsf@gitster-ct.c.googlers.com/
and the thread surrounding it for details.
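
One possible shape for "stop taking a dirent" is sketched below. Every name in it (struct path_entry, entry_from_dirent(), entry_from_component(), entry_is_skipped()) is hypothetical. The idea is that the readdir()-driven caller and the leading-path caller both build a small (name, length, type-hint) record and share a single check. Because the adapter only reads de->d_name, it compiles whether d_name is an array or a pointer. The exact-match test for ".git" is a simplification; a real implementation would probably need a case-insensitive comparison on case-insensitive filesystems.

```c
#define _DEFAULT_SOURCE /* expose DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <string.h>

#if !defined(DT_UNKNOWN) || defined(NO_D_TYPE_IN_DIRENT)
#define NO_DTYPE_HINT 1
#endif
#ifndef DT_UNKNOWN
#define DT_UNKNOWN 0
#endif

/* Hypothetical replacement for passing struct dirent down the call chain. */
struct path_entry {
	const char *name;	/* not necessarily NUL-terminated */
	size_t len;
	int dtype;		/* DT_UNKNOWN when the caller has no hint */
};

/* Adapter for the readdir()-driven caller. */
static struct path_entry entry_from_dirent(const struct dirent *de)
{
	struct path_entry e = { de->d_name, strlen(de->d_name), DT_UNKNOWN };
#ifndef NO_DTYPE_HINT
	e.dtype = de->d_type;
#endif
	return e;
}

/* Adapter for the leading-path caller, which only has a string. */
static struct path_entry entry_from_component(const char *name, size_t len)
{
	struct path_entry e = { name, len, DT_UNKNOWN };
	return e;
}

/* One shared check for both callers: skip ".", "..", and ".git". */
static int entry_is_skipped(const struct path_entry *e)
{
	return (e->len == 1 && e->name[0] == '.') ||
	       (e->len == 2 && !memcmp(e->name, "..", 2)) ||
	       (e->len == 4 && !memcmp(e->name, ".git", 4));
}
```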

For now, simply document the breakages.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh

diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
new file mode 100755
index 0000000000..54f80c62b8
--- /dev/null
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -0,0 +1,209 @@
+#!/bin/sh
+
+test_description='directory traversal handling, especially with common prefixes'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit hello &&
+
+	>empty &&
+	mkdir untracked_dir &&
+	>untracked_dir/empty &&
+	git init untracked_repo &&
+	>untracked_repo/empty &&
+
+	cat <<-EOF >.gitignore &&
+	ignored
+	an_ignored_dir/
+	EOF
+	mkdir an_ignored_dir &&
+	mkdir an_untracked_dir &&
+	>an_ignored_dir/ignored &&
+	>an_ignored_dir/untracked &&
+	>an_untracked_dir/ignored &&
+	>an_untracked_dir/untracked
+'
+
+test_expect_success 'git ls-files -o shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_ignored_dir/ignored
+	an_ignored_dir/untracked
+	an_untracked_dir/ignored
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --exclude-standard shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o --exclude-standard >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_repo does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o .git shows nothing' '
+	git ls-files -o .git >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_failure 'git ls-files -o .git/ shows nothing' '
+	git ls-files -o .git/ >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
+	mkdir "untracked_*" &&
+	>"untracked_*/empty" &&
+
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+# It turns out fill_directory returns the right paths, but ls-files' post-call
+# filtering in show_dir_entry() via calling dir_path_match() which ends up
+# in git_fnmatch() has logic for PATHSPEC_ONESTAR that assumes the pathspec
+# must match the full path; it doesn't check it for matching a leading
+# directory.
+test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o consistent between one or two dirs' '
+	git ls-files -o --exclude-standard an_ignored_dir/ an_untracked_dir/ >tmp &&
+	! grep ^an_ignored_dir/ tmp >expect &&
+	git ls-files -o --exclude-standard an_ignored_dir/ >actual &&
+	test_cmp expect actual
+'
+
+# ls-files doesn't have a way to request showing both untracked and ignored
+# files at the same time, so use `git status --ignored`
+test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+	cat <<-EOF >expect &&
+	?? an_untracked_dir/
+	!! an_untracked_dir/ignored
+	EOF
+	git status --porcelain --ignored >output &&
+	grep an_untracked_dir output >expect &&
+	git status --porcelain --ignored an_untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget

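The PATHSPEC_ONESTAR comment in the test above describes a match that only considers the full path. Here is a sketch of the difference. It uses POSIX fnmatch() with no flags as a stand-in for git_fnmatch(); this is an assumption, since Git's matcher differs in details. The helper matches_full_or_leading() is hypothetical. It accepts a path if the pattern matches the whole path or any leading directory of it, taken with its trailing '/', which is the extra check the comment says is missing.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if `pattern` matches `path` itself, or any leading directory
 * of `path` (including its trailing '/'), using fnmatch() with no flags,
 * so '*' may also match '/'.
 */
static int matches_full_or_leading(const char *pattern, const char *path)
{
	size_t len = strlen(path);
	char *buf;
	int found = 0;

	if (!fnmatch(pattern, path, 0))
		return 1;

	buf = malloc(len + 1);
	if (!buf)
		return 0;
	memcpy(buf, path, len + 1);
	for (size_t i = 0; i < len && !found; i++) {
		if (buf[i] != '/')
			continue;
		char saved = buf[i + 1];
		buf[i + 1] = '\0';	/* truncate to the leading "dir/" */
		found = !fnmatch(pattern, buf, 0);
		buf[i + 1] = saved;
	}
	free(buf);
	return found;
}
```

A full-path-only match of `untracked_*/` against `untracked_dir/empty` fails. The leading-directory check succeeds on the `untracked_dir/` prefix.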

* [PATCH v3 2/3] dir: remove stray quote character in comment
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-17  8:33     ` [PATCH v3 1/3] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
@ 2019-12-17  8:33     ` Elijah Newren via GitGitGadget
  2019-12-17  8:33     ` [PATCH v3 3/3] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-17  8:33 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 61f559f980..13c0c2333c 100644
--- a/dir.c
+++ b/dir.c
@@ -373,7 +373,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
-		/* name" doesn't match up to the first wild character */
+		/* name doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
 		    ps_strncmp(item, match, name,
 			       item->nowildcard_len - prefix))
-- 
gitgitgadget


* [PATCH v3 3/3] dir: exit before wildcard fall-through if there is no wildcard
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-17  8:33     ` [PATCH v3 1/3] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
  2019-12-17  8:33     ` [PATCH v3 2/3] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
@ 2019-12-17  8:33     ` Elijah Newren via GitGitGadget
  2019-12-17 11:18     ` [PATCH v3 0/3] Directory traversal bugs Johannes Schindelin
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
  4 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-17  8:33 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The DO_MATCH_LEADING_PATHSPEC handling had a fall-through case for when
there was a wildcard, noting that we don't yet have enough information to
determine whether further paths under the current directory might match
due to the presence of wildcards.  But if we have no wildcards in our
pathspec, then we shouldn't get to that fall-through case.
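
The decision being changed can be modelled in isolation. The function below is a hypothetical, heavily simplified stand-in for the DO_MATCH_LEADING_PATHSPEC branch of match_pathspec_item(). It has no prefix, icase, or other pathspec magic, and using strcspn() over `*?[\` for nowildcard_len is an assumed approximation. A `with_fix` flag toggles the early return added by this patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model: may directory `name` contain paths matching the
 * pathspec `match`?  Returns 1 for "possibly", 0 for "no".
 */
static int may_contain_match(const char *match, const char *name, int with_fix)
{
	size_t matchlen = strlen(match), namelen = strlen(name);
	size_t nowildcard_len = strcspn(match, "*?[\\");
	size_t offset = (namelen && name[namelen - 1] == '/') ? 1 : 0;

	/* name is a literal leading directory of the pathspec */
	if (namelen < matchlen && match[namelen - offset] == '/' &&
	    !strncmp(match, name, namelen))
		return 1;

	/* name doesn't match up to the first wildcard character */
	if (nowildcard_len < matchlen && strncmp(match, name, nowildcard_len))
		return 0;

	/* no wildcard at all, and not a leading directory: stop here */
	if (with_fix && nowildcard_len == matchlen)
		return 0;

	/* a wildcard might still match something below name: punt */
	return 1;
}
```

With `with_fix` off, the literal pathspec `untracked_dir` answers "possibly" for the directory `untracked_repo/`. That is the kind of false positive the early return removes. Wildcard pathspecs such as `untracked_*` still fall through as before.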

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                              | 7 +++++++
 t/t3011-common-prefixes-and-directory-traversal.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 13c0c2333c..f08bb073ef 100644
--- a/dir.c
+++ b/dir.c
@@ -379,6 +379,13 @@ static int match_pathspec_item(const struct index_state *istate,
 			       item->nowildcard_len - prefix))
 			return 0;
 
+		/*
+		 * name has no wildcard, and it didn't match as a leading
+		 * pathspec so return.
+		 */
+		if (item->nowildcard_len == item->len)
+			return 0;
+
 		/*
 		 * Here is where we would perform a wildmatch to check if
 		 * "name" can be matched as a directory (or a prefix) against
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 54f80c62b8..d6e161ddd8 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -92,7 +92,7 @@ test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+test_expect_success 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
 	cat <<-EOF >expect &&
 	untracked_dir/empty
 	untracked_repo/
@@ -110,7 +110,7 @@ test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses int
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
 	cat <<-EOF >expect &&
 	untracked_dir/
 	untracked_repo/
-- 
gitgitgadget

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17  0:14             ` Junio C Hamano
@ 2019-12-17 11:08               ` Johannes Schindelin
  2019-12-17 17:33                 ` Junio C Hamano
  0 siblings, 1 reply; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-17 11:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

Hi Junio,

On Mon, 16 Dec 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > If you care to look at our very own `compat/win32/dirent.h`, you will see
> > this:
> >
> > struct dirent {
> >         unsigned char d_type; /* file type to prevent lstat after readdir */
> >         char *d_name;         /* file name */
> > };
> >
> > And looking at
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
> > do not see any guarantee of that `[256]` at all:
> >
> > The <dirent.h> header shall [...] define the structure dirent which shall
> > include the following members:
> >
> > [XSI][Option Start]
> > ino_t  d_ino       File serial number.
> > [Option End]
> > char   d_name[]    Filename string of entry.
> >
> > You will notice that not even `d_type` is guaranteed.
>
> I am reasonably sure that the code (without Elijah's patches anyway)
> takes the possibility of missing d_type into account already.
>
> Doesn't the above mean d_name[] has to be an in-place array of some
> size (i.e. even a flex-array is OK)?  It does not look to me that it
> allows for it to be a pointer pointing at elsewhere (possibly on
> heap), which may be asking for trouble.

You are right, of course.

I also was not _quite_ spot on, as I had looked at Git for Windows'
`shears/pu` branch, not at the `pu` branch. Alas, we have patches in Git
for Windows that allow for switching to a faster, caching way to access
the stat() and readdir() data (it is called the "FSCache" and it is
responsible for some rather dramatic speed-ups). And these patches change
`struct dirent` to the form that is quoted above, to allow switching back
and forth between two _different_ backends, storing the actual `d_name`
not in `struct dirent` but in `DIR`.

Is this compliant with POSIX? I guess not. Does it work? Yes, it does.

I can't know for sure that it makes a dent, but FSCache is designed for
speed, and it actually does not even store the `d_name` in the `DIR`, but
directly in the cache structure, avoiding any copying.
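
A toy model of this trade-off follows. All names in it (struct ptr_dirent, struct name_cache, cached_readdir()) are hypothetical, and it is not Git for Windows' FSCache code. readdir() hands out a pointer straight into cache-owned storage, so no name is copied. The flip side is the one Junio raised: an entry that some other code builds for itself has nothing behind d_name unless that code also supplies storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-style entry, shaped like the variant quoted above. */
struct ptr_dirent {
	unsigned char d_type;
	char *d_name;		/* points into cache storage, never copied */
};

/* Hypothetical cache owning the names. */
struct name_cache {
	char *names[8];
	size_t nr;
};

struct cached_dir {
	struct name_cache *cache;
	size_t pos;
	struct ptr_dirent ent;	/* one entry reused for every call */
};

/* readdir() equivalent: zero-copy, the name stays in the cache. */
static struct ptr_dirent *cached_readdir(struct cached_dir *dir)
{
	if (dir->pos >= dir->cache->nr)
		return NULL;
	dir->ent.d_type = 0;	/* type unknown in this toy */
	dir->ent.d_name = dir->cache->names[dir->pos++];
	return &dir->ent;
}
```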

In short: if we can allow FSCache to keep operating that way (i.e. keep
`d_name` as a pointer), then that would be helpful to keep the performance
on Windows somewhat within acceptable boundaries.

Ciao,
Dscho

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17  5:26             ` Elijah Newren
@ 2019-12-17 11:15               ` Johannes Schindelin
  2019-12-17 16:58                 ` Elijah Newren
  0 siblings, 1 reply; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-17 11:15 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Junio C Hamano, Kyle Meyer, Samuel Lijin

Hi Elijah,

On Mon, 16 Dec 2019, Elijah Newren wrote:

> On Mon, Dec 16, 2019 at 4:04 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > On Mon, 16 Dec 2019, Elijah Newren wrote:
> > > On Mon, Dec 16, 2019 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
> > > >
> > > > On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
> > > > <Johannes.Schindelin@gmx.de> wrote:
> > > > >
> > > > > Hi Elijah,
> > > > >
> > > > > I have not had time to dive deeply into this, but I know that it _does_
> > > > > cause a ton of segmentation faults in the `shears/pu` branch (where all of
> > > > > Git for Windows' patches are rebased on top of `pu`):
> > > >
> > > > Weird.  If it's going to cause segmentation faults at all, it would
> > > > certainly do it all over the place, but I tested the patches on the
> > > > major platforms using your Azure Pipelines setup on git.git so it
> > > > should be good on all the platforms.  Did your shears/pu branch make
> > > > some other changes to the setup?
> >
> > Not really.
> >
> > >
> > > Actually, it looks like I looked up the definition of dirent
> > > previously and forgot by the time you emailed.  On linux, from
> > > /usr/include/bits/dirent.h:
> ...
> > > and compat/win32/dirent.h defines it as:
> > >
> > > struct dirent {
> > >         unsigned char d_type;      /* file type to prevent lstat after
> > > readdir */
> > >         char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> > > };
> ...
> >
> > If you care to look at our very own `compat/win32/dirent.h`, you will see
> > this:
>
> Interesting, we both brought up compat/win32/dirent.h and quoted from
> it in our emails...
>
> > struct dirent {
> >         unsigned char d_type; /* file type to prevent lstat after readdir */
> >         char *d_name;         /* file name */
> > };
>
> ...but the contents were different?  Looks like git-for-windows forked
> compat/win32/dirent.h, possibly in a way that violates POSIX as
> pointed out by Junio.

Yep, I messed that up, sorry.

> Any reason those changes weren't sent back upstream, by chance?  Feels
> odd having a compat/win32/ directory that our downstream windows users
> aren't actually using.  It also means the testing I'm getting from
> gitgitgadget and your Azure setup (which all is really, really nice by
> the way), is far less reassuring and helpful than I hoped.

Yes. I was ready to submit the FSCache feature to the Git mailing list for
review some 2.5 years ago when along came Ben Peart, finding ways to speed
up FSCache even further. That is the reason why I held off, and I still
have to condense the patches (which currently form a topology of 17 patch
series!!!) into a nice small patch series that does not reflect the
meandering history of the FSCache development, but instead presents one
neat story.

> > And looking at
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
> > do not see any guarantee of that `[256]` at all:
> >
> > The <dirent.h> header shall [...] define the structure dirent which shall
> > include the following members:
> >
> > [XSI][Option Start]
> > ino_t  d_ino       File serial number.
> > [Option End]
> > char   d_name[]    Filename string of entry.
> >
> > You will notice that not even `d_type` is guaranteed.
>
> Doh, yeah, I messed that up too.
>
> Anyway, as I mentioned to Junio, I'll resubmit after gutting the
> series.  I'll still include a fix for the issue that a real world user
> reported, but all the other ancillary bugs I found that have been
> around for over a decade aren't important enough to merit a major
> refactor, IMO.

Hmm. I am really sorry that I nudged you to go down this route. Quite
honestly, I'd rather add an ugly work-around that is Windows-only just so
that you can fix those ancillary bugs.

I might even go so far as to adjust the FSCache's internal data structure
to _store_ `struct dirent` items, then the fast `readdir()` implementation
could be even faster by just pointing at those items.
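
The idea of having the cache store the entries themselves might look roughly like this sketch. The names (struct flex_dirent, cache_entry_new(), cache_readdir()) are hypothetical, and this is not FSCache's code. It only shows that a cache can keep d_name as an in-place flexible array, the layout Junio pointed out POSIX expects, and still hand out entries on each readdir() without copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical POSIX-shaped entry: d_name lives inside the entry. */
struct flex_dirent {
	unsigned char d_type;
	char d_name[];		/* C99 flexible array member */
};

/* Allocate one cache entry with its name stored in place. */
static struct flex_dirent *cache_entry_new(const char *name, unsigned char type)
{
	size_t len = strlen(name);
	struct flex_dirent *e = malloc(sizeof(*e) + len + 1);

	if (!e)
		return NULL;
	e->d_type = type;
	memcpy(e->d_name, name, len + 1);
	return e;
}

struct entry_cache {
	struct flex_dirent *entries[8];
	size_t nr;
};

/* readdir() equivalent: return a pointer to the stored entry itself. */
static struct flex_dirent *cache_readdir(const struct entry_cache *c, size_t *pos)
{
	return *pos < c->nr ? c->entries[(*pos)++] : NULL;
}
```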

What do you think? Can we strike a deal to keep those bug fixes?

Ciao,
Dscho

* Re: [PATCH v3 0/3] Directory traversal bugs
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2019-12-17  8:33     ` [PATCH v3 3/3] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
@ 2019-12-17 11:18     ` Johannes Schindelin
  2019-12-17 18:24       ` Junio C Hamano
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
  4 siblings, 1 reply; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-17 11:18 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, blees, Junio C Hamano, kyle, sxlijin, Junio C Hamano

Hi Elijah,

On Tue, 17 Dec 2019, Elijah Newren via GitGitGadget wrote:

> This series documents multiple fill_directory() bugs, and fixes the one that
> is new to 2.24.0 coming from en/clean-nested-with-ignored-topic, the rest
> having been around in versions of git going back up to a decade.
>
> Changes since v2:
>
>  * gutted the series of most of the fixes, dropping the patch count from 8 to
>    3, due to incompatibility with git-for-windows (which interestingly has a
>    different compat/win32/dirent.h than git.git does). The only bugs
>    reported by a user are fixed by patch 3, and fixing the remaining bugs
>    (which I found while investigating the one fixed bug) would require a
>    major refactor that I don't have the time for currently.

I am really sorry that I caused you so much work.

As I said elsewhere, if Git for Windows' FSCache hack is the only thing
that is broken by this patch series, in light of the bugs that it _does_
fix I would rather adjust the FSCache patches to accommodate v2.

What do you think?

Ciao,
Dscho

> Elijah Newren (3):
>   t3011: demonstrate directory traversal failures
>   dir: remove stray quote character in comment
>   dir: exit before wildcard fall-through if there is no wildcard
>
>  dir.c                                         |   9 +-
>  ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
>  2 files changed, 217 insertions(+), 1 deletion(-)
>  create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh
>
>
> base-commit: da72936f544fec5a335e66432610e4cef4430991
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-676%2Fnewren%2Fls-files-bug-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-676/newren/ls-files-bug-v3
> Pull-Request: https://github.com/git/git/pull/676
>
> Range-diff vs v2:
>
>  1:  6d659b2302 ! 1:  61d303d8bd t3011: demonstrate directory traversal failures
>      @@ -14,6 +14,18 @@
>           of the en/clean-nested-with-ignored-topic); the other 5 also failed
>           under git-2.23.0 and earlier.
>
>      +    The old failing tests can be traced down to the common prefix
>      +    optimization in dir.c handling paths differently than
>      +    read_directory_recursive() and treat_path() would, due to incomplete
>      +    duplication of logic into treat_leading_path() and having that
>      +    function call treat_one_path() rather than treat_path().  Fixing
>      +    that problem would require restructuring treat_path() and its full
>      +    call hierarchy to stop taking a dirent; see
>      +       https://lore.kernel.org/git/xmqqzhfshsk2.fsf@gitster-ct.c.googlers.com/
>      +    and the thread surrounding it for details.
>      +
>      +    For now, simply document the breakages.
>      +
>           Signed-off-by: Elijah Newren <newren@gmail.com>
>
>        diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
>  2:  79f2b56174 < -:  ---------- Revert "dir.c: make 'git-status --ignored' work within leading directories"
>  3:  d6f858cab1 = 2:  49b0b628db dir: remove stray quote character in comment
>  4:  8d2d98eec3 = 3:  47814640e4 dir: exit before wildcard fall-through if there is no wildcard
>  5:  d2f5623bd7 < -:  ---------- dir: break part of read_directory_recursive() out for reuse
>  6:  9839aca00a < -:  ---------- dir: fix checks on common prefix directory
>  7:  df7f08886a < -:  ---------- dir: synchronize treat_leading_path() and read_directory_recursive()
>  8:  77b57e44fd < -:  ---------- dir: consolidate similar code in treat_directory()
>
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17 11:15               ` Johannes Schindelin
@ 2019-12-17 16:58                 ` Elijah Newren
  0 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren @ 2019-12-17 16:58 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Junio C Hamano, Kyle Meyer, Samuel Lijin

Hi Dscho,

On Tue, Dec 17, 2019 at 3:16 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Mon, 16 Dec 2019, Elijah Newren wrote:
>
> > On Mon, Dec 16, 2019 at 4:04 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > > On Mon, 16 Dec 2019, Elijah Newren wrote:
> > > > On Mon, Dec 16, 2019 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
> > > > >
> > > > > On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
> > > > > <Johannes.Schindelin@gmx.de> wrote:
> > > > > >
> > > > > > Hi Elijah,
> > > > > >
> > > > > > I have not had time to dive deeply into this, but I know that it _does_
> > > > > > cause a ton of segmentation faults in the `shears/pu` branch (where all of
> > > > > > Git for Windows' patches are rebased on top of `pu`):
> > > > >
> > > > > Weird.  If it's going to cause segmentation faults at all, it would
> > > > > certainly do it all over the place, but I tested the patches on the
> > > > > major platforms using your Azure Pipelines setup on git.git so it
> > > > > should be good on all the platforms.  Did your shears/pu branch make
> > > > > some other changes to the setup?
> > >
> > > Not really.
> > >
> > > >
> > > > Actually, it looks like I looked up the definition of dirent
> > > > previously and forgot by the time you emailed.  On linux, from
> > > > /usr/include/bits/dirent.h:
> > ...
> > > > and from compat/win32/dirent.h defines it as:
> > > >
> > > > struct dirent {
> > > >         unsigned char d_type;      /* file type to prevent lstat after
> > > > readdir */
> > > >         char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> > > > };
> > ...
> > >
> > > If you care to look at our very own `compat/win32/dirent.h`, you will see
> > > this:
> >
> > Interesting, we both brought up compat/win32/dirent.h and quoted from
> > it in our emails...
> >
> > > struct dirent {
> > >         unsigned char d_type; /* file type to prevent lstat after readdir */
> > >         char *d_name;         /* file name */
> > > };
> >
> > ...but the contents were different?  Looks like git-for-windows forked
> > compat/win32/dirent.h, possibly in a way that violates POSIX as
> > pointed out by Junio.
>
> Yep, I messed that up, sorry.
>
> > Any reason those changes weren't sent back upstream, by chance?  Feels
> > odd having a compat/win32/ directory that our downstream windows users
> > aren't actually using.  It also means the testing I'm getting from
> > gitgitgadget and your Azure setup (which all is really, really nice by
> > the way), is far less reassuring and helpful than I hoped.
>
> Yes. I was ready to submit the FSCache feature to the Git mailing list for
> review some 2.5 years ago when along came Ben Peart, finding ways to speed
> up FSCache even further. That is the reason why I held off, and I still
> have to condense the patches (which currently form a topology of 17 patch
> series!!!) into a nice small patch series that does not reflect the
> meandering history of the FSCache history, but instead presents one neat
> story.
>
> > > And looking at
> > > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
> > > do not see any guarantee of that `[256]` at all:
> > >
> > > The <dirent.h> header shall [...] define the structure dirent which shall
> > > include the following members:
> > >
> > > [XSI][Option Start]
> > > ino_t  d_ino       File serial number.
> > > [Option End]
> > > char   d_name[]    Filename string of entry.
> > >
> > > You will notice that not even `d_type` is guaranteed.
> >
> > Doh, yeah, I messed that up too.
> >
> > Anyway, as I mentioned to Junio, I'll resubmit after gutting the
> > series.  I'll still include a fix for the issue that a real world user
> > reported, but all the other ancillary bugs I found that have been
> > around for over a decade aren't important enough to merit a major
> > refactor, IMO.
>
> Hmm. I am really sorry that I nudged you to go down this route. Quite
> honestly, I'd rather add an ugly work-around that is Windows-only just so
> that you can fix those ancillary bugs.

You brought up issues; that's what you're supposed to do.  You
shouldn't feel bad about that.  Besides, the d_type one is real, and
means the patches at least need a
    #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
surrounding my explicit setting of d_type.  The problem wasn't what
you brought up or how you brought it up, it's massive fatigue on my
end from dir.c, from before even submitting this series[*].  I'm not
giving up on these changes or trying to discourage anyone else from
picking them up and extending them, I just don't want to touch them
right now and would rather put them on the shelf for a while.

Elijah


[*]  If you're really curious...I got involved in dir.c because of a
simple bug report nearly two years ago[1], and found myself working on
a foundation that was error-prone by design[2], with ambiguous or even
wrong documentation[3] about not just what the code does but the
intent.  Further, it was a place where not only is the correct fix
unclear, and not only is the "right" behavior unclear, but the cases
in question affect so few people that pinging the list periodically
over more than a year can't generate enough interest for anyone else
to hazard a guess as to what "correct" behavior is[4].  Stack on that
the fact that every time I touch this area, I think I'm really close
to having a fix, only to find I never, ever am.  There's always
one-more-thing before I can finally get back to something I really
wanted to work on instead.  Speaking of which, I've only managed to
work on my new merge strategy like once every 3-6 months for a small
amount of time each time.  Yes, part of that's my fault with
git-filter-repo (another case of perpetually thinking I'm close to
done), rebase changes, and whatnot.  But this series arose right when
I had my calendar nearly cleared so that I could work on the merge
strategy again (and of course the rebase bug report came in about the
same time too).  But at least git-filter-repo and rebase are generally
useful; dir.c at most generates "meh, this seems annoying" reports.
And I've already fixed all of those, the remaining fixes are stuff
that it appears I'm the only one to have reported, and I only reported
it because I was digging into the other "meh, seems annoying" reports.
I'm usually happy when I have a patch series ready to submit to git;
it means I think I'll make things better for others.  I didn't feel
that way with this series; I kind of wanted to just drop it entirely
and not even turn it in.  But I figured I should at least document
my findings, so I pushed myself to submit and hoped no one would
respond.  Then this issue arose, and when I listed possible fixes,
noted that ripping out the usage of dirent would be a lot of work
that would probably cause me to give up, and asked for ideas, Junio
responded that we should rip out dirent.  I think he's right, and
it's important that he defend code quality and point out the right
way to do things; it's just that I want out of this rabbit hole
right now.

[1] https://lore.kernel.org/git/20180405173446.32372-1-newren@gmail.com/
[2] https://lore.kernel.org/git/xmqqefjp6sko.fsf@gitster-ct.c.googlers.com/
[3] e.g. https://lore.kernel.org/git/20190905154735.29784-10-newren@gmail.com/
[4] https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
and links referenced therein


* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17 11:08               ` Johannes Schindelin
@ 2019-12-17 17:33                 ` Junio C Hamano
  2019-12-17 19:32                   ` Johannes Schindelin
  0 siblings, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-17 17:33 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > [XSI][Option Start]
>> > ino_t  d_ino       File serial number.
>> > [Option End]
>> > char   d_name[]    Filename string of entry.
>> >
>> > You will notice that not even `d_type` is guaranteed.
>>
>> I am reasonably sure that the code (without Elijah's patches anyway)
>> takes the possibility of missing d_type into account already.
>>
>> Doesn't the above mean d_name[] has to be an in-place array of some
>> size (i.e. even a flex-array is OK)?  It does not look to me that it
>> allows for it to be a pointer pointing at elsewhere (possibly on
>> heap), which may be asking for trouble.
>
> You are right, of course.
>
> ...
>
> Is this compliant with POSIX? I guess not. Does it work? Yes, it does.

I actually would not throw it into "it works" category.  The obvious
implication is that a program like this:

	#include <dirent.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	static struct dirent *fabricate(const char *name)
	{
		/* over-allocate as we do not know how long the d_name[] is */
		struct dirent *ent = calloc(1, sizeof(*ent) + strlen(name) + 1);
		strcpy(ent->d_name, name);
		return ent;
	}

	static void show_name(const struct dirent *ent)
	{
		printf("%s\n", ent->d_name);
	}

	int main(int ac, char **av)
	{
		struct dirent *mine = fabricate("mine");
		show_name(mine);
		free(mine);
		return 0;
	}

would be broken if you do not have d_name as an array.

I would not be surprised if the segfaults you saw with Elijah's
series all were caused by your d_name not being an array, and if
that is the case, I'd rather see it fixed on your end than fixes
withdrawn.

Thanks.


* Re: [PATCH v3 0/3] Directory traversal bugs
  2019-12-17 11:18     ` [PATCH v3 0/3] Directory traversal bugs Johannes Schindelin
@ 2019-12-17 18:24       ` Junio C Hamano
  2019-12-21 22:05         ` Johannes Schindelin
  0 siblings, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-17 18:24 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren via GitGitGadget, git, blees, kyle, sxlijin

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> As I said elsewhere, if Git for Windows' FSCache hack is the only thing
> that is broken by this patch series, in light of the bugs that it _does_
> fix I would rather adjust the FSCache patches to accommodate v2.

With "FSCache hack", do you refer to the "d_name is a pointer to
elsewhere" thing?  If so, I too very much appreciate the direction
you are suggesting.  Seeing that these three patches essentially are
the same as three (1/8, 3/8 and 4/8) from the v2, I'd keep all the 8
patches from v2 in my tree for now.

Thanks, both.


* Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
  2019-12-17 17:33                 ` Junio C Hamano
@ 2019-12-17 19:32                   ` Johannes Schindelin
  0 siblings, 0 replies; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-17 19:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

Hi Junio,

On Tue, 17 Dec 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> > [XSI][Option Start]
> >> > ino_t  d_ino       File serial number.
> >> > [Option End]
> >> > char   d_name[]    Filename string of entry.
> >> >
> >> > You will notice that not even `d_type` is guaranteed.
> >>
> >> I am reasonably sure that the code (without Elijah's patches anyway)
> >> takes the possibility of missing d_type into account already.
> >>
> >> Doesn't the above mean d_name[] has to be an in-place array of some
> >> size (i.e. even a flex-array is OK)?  It does not look to me that it
> >> allows for it to be a pointer pointing at elsewhere (possibly on
> >> heap), which may be asking for trouble.
> >
> > You are right, of course.
> >
> > ...
> >
> > Is this compliant with POSIX? I guess not. Does it work? Yes, it does.
>
> I actually would not throw it into "it works" category.  The obvious
> implication is that a program like this:
>
> 	static struct dirent *fabricate(const char *name)
> 	{
>         	/* over-allocate as we do not know how long the	d_name[] is */
> 		struct dirent *ent = calloc(1, sizeof(*ent) + strlen(name) + 1);
> 		strcpy(ent->d_name, name);
> 		return ent;
> 	}
>
> 	static void show_name(const struct dirent *ent)
> 	{
> 		printf("%s\n", ent->d_name);
> 	}
>
> 	int main(int ac, char **av)
> 	{
> 		struct dirent *mine = fabricate("mine");
>                 show_name(mine);
> 		free(mine);
> 		return 0;
> 	}
>
> would be broken if you do not have d_name as an array.
>
> I would not be surprised if the segfaults you saw with Elijah's
> series all were caused by your d_name not being an array, and if
> that is the case, I'd rather see it fixed on your end than fixes
> withdrawn.

I agree with this reasoning.

Ciao,
Dscho


* [PATCH v4 0/8] Directory traversal bugs
  2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2019-12-17 11:18     ` [PATCH v3 0/3] Directory traversal bugs Johannes Schindelin
@ 2019-12-18 19:29     ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
                         ` (8 more replies)
  4 siblings, 9 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano

This series documents multiple fill_directory() bugs, and fixes the one that
is new to 2.24.0 coming from en/clean-nested-with-ignored-topic, the rest
having been around in versions of git going back up to a decade. 

Changes since v2 (v3 was sent earlier, but this series is closer to v2):

 * protected access to d_type with a !defined(NO_D_TYPE_IN_DIRENT) check,
   and made sure to allocate the dirent on the heap with some extra space
   for d_name rather than allocating it on the stack.

Elijah Newren (8):
  t3011: demonstrate directory traversal failures
  Revert "dir.c: make 'git-status --ignored' work within leading
    directories"
  dir: remove stray quote character in comment
  dir: exit before wildcard fall-through if there is no wildcard
  dir: break part of read_directory_recursive() out for reuse
  dir: fix checks on common prefix directory
  dir: synchronize treat_leading_path() and read_directory_recursive()
  dir: consolidate similar code in treat_directory()

 dir.c                                         | 177 +++++++++++----
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 t/t7061-wtstatus-ignore.sh                    |   9 +-
 3 files changed, 344 insertions(+), 51 deletions(-)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh


base-commit: da72936f544fec5a335e66432610e4cef4430991
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-676%2Fnewren%2Fls-files-bug-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-676/newren/ls-files-bug-v4
Pull-Request: https://github.com/git/git/pull/676

Range-diff vs v3:

 1:  61d303d8bd ! 1:  6d659b2302 t3011: demonstrate directory traversal failures
     @@ -14,18 +14,6 @@
          of the en/clean-nested-with-ignored-topic); the other 5 also failed
          under git-2.23.0 and earlier.
      
     -    The old failing tests can be traced down to the common prefix
     -    optimization in dir.c handling paths differently than
     -    read_directory_recursive() and treat_path() would, due to incomplete
     -    duplication of logic into treat_leading_path() and having that
     -    function call treat_one_path() rather than treat_path().  Fixing
     -    that problem would require restructuring treat_path() and its full
     -    call hierarchy to stop taking a dirent; see
     -       https://lore.kernel.org/git/xmqqzhfshsk2.fsf@gitster-ct.c.googlers.com/
     -    and the thread surrounding it for details.
     -
     -    For now, simply document the breakages.
     -
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
 -:  ---------- > 2:  79f2b56174 Revert "dir.c: make 'git-status --ignored' work within leading directories"
 2:  49b0b628db = 3:  d6f858cab1 dir: remove stray quote character in comment
 3:  47814640e4 = 4:  8d2d98eec3 dir: exit before wildcard fall-through if there is no wildcard
 -:  ---------- > 5:  d2f5623bd7 dir: break part of read_directory_recursive() out for reuse
 -:  ---------- > 6:  1f3978aa46 dir: fix checks on common prefix directory
 -:  ---------- > 7:  542c6e5792 dir: synchronize treat_leading_path() and read_directory_recursive()
 -:  ---------- > 8:  31079dc1cf dir: consolidate similar code in treat_directory()

-- 
gitgitgadget


* [PATCH v4 1/8] t3011: demonstrate directory traversal failures
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add several tests demonstrating directory traversal failures of various
sorts in dir.c (and one similar looking test that turns out to be a
git_fnmatch bug).  A lot of these tests look like near duplicates of
each other, but an optimization path in dir.c to pre-descend into a
common prefix and the specialized treatment of trailing slashes in dir.c
mean the tiny differences are sometimes important and potentially cause
different codepaths to be explored.

Of the 7 failing tests, 2 are new to git-2.24.0 (tweaked by side effects
of the en/clean-nested-with-ignored-topic); the other 5 also failed
under git-2.23.0 and earlier.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh

diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
new file mode 100755
index 0000000000..54f80c62b8
--- /dev/null
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -0,0 +1,209 @@
+#!/bin/sh
+
+test_description='directory traversal handling, especially with common prefixes'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit hello &&
+
+	>empty &&
+	mkdir untracked_dir &&
+	>untracked_dir/empty &&
+	git init untracked_repo &&
+	>untracked_repo/empty &&
+
+	cat <<-EOF >.gitignore &&
+	ignored
+	an_ignored_dir/
+	EOF
+	mkdir an_ignored_dir &&
+	mkdir an_untracked_dir &&
+	>an_ignored_dir/ignored &&
+	>an_ignored_dir/untracked &&
+	>an_untracked_dir/ignored &&
+	>an_untracked_dir/untracked
+'
+
+test_expect_success 'git ls-files -o shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_ignored_dir/ignored
+	an_ignored_dir/untracked
+	an_untracked_dir/ignored
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --exclude-standard shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o --exclude-standard >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_repo does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o .git shows nothing' '
+	git ls-files -o .git >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_failure 'git ls-files -o .git/ shows nothing' '
+	git ls-files -o .git/ >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
+	mkdir "untracked_*" &&
+	>"untracked_*/empty" &&
+
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+# It turns out fill_directory returns the right paths, but ls-files' post-call
+# filtering in show_dir_entry() via calling dir_path_match() which ends up
+# in git_fnmatch() has logic for PATHSPEC_ONESTAR that assumes the pathspec
+# must match the full path; it doesn't check it for matching a leading
+# directory.
+test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o consistent between one or two dirs' '
+	git ls-files -o --exclude-standard an_ignored_dir/ an_untracked_dir/ >tmp &&
+	! grep ^an_ignored_dir/ tmp >expect &&
+	git ls-files -o --exclude-standard an_ignored_dir/ >actual &&
+	test_cmp expect actual
+'
+
+# ls-files doesn't have a way to request showing both untracked and ignored
+# files at the same time, so use `git status --ignored`
+test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+	cat <<-EOF >expect &&
+	?? an_untracked_dir/
+	!! an_untracked_dir/ignored
+	EOF
+	git status --porcelain --ignored >output &&
+	grep an_untracked_dir output >expect &&
+	git status --porcelain --ignored an_untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v4 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit be8a84c52669 ("dir.c: make 'git-status --ignored' work within
leading directories", 2013-04-15) noted that
   git status --ignored <SOMEPATH>
would not list ignored files and directories within <SOMEPATH> if
<SOMEPATH> was untracked, and modified the behavior to make it show
them.  However, it did so via a hack that broke consistency; it would
show paths under <SOMEPATH> differently than a simple
   git status --ignored | grep <SOMEPATH>
would show them.  A correct fix is somewhat more involved and is made
harder by this hack, so we revert this commit (but keep corrected
versions of the testcases) and will fix the original bug in a
subsequent patch.

Some history may be helpful:

A very, very similar case to the commit we are reverting was raised in
commit 48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), but that commit went in somewhat the opposite direction.  It
noted that
   git ls-files -o --exclude-standard t/
used to show untracked files under t/ even when t/ was ignored, and then
changed the behavior to stop showing untracked files under an ignored
directory.  More importantly, that commit considered keeping the old
behavior but rejected it because it would be inconsistent with the
behavior when multiple pathspecs were specified.

This inconsistency between specifying one pathspec versus zero or two
arises because the common prefix of the pathspecs is sent through a
different set of checks (in treat_leading_path()) than normal
file/directory traversal (which goes through read_directory_recursive()
and treat_path()).  For consistency, both codepaths need to produce the
same result.

Revert commit be8a84c526691667fc04a8241d93a3de1de298ab, except instead
of removing the testcase it added, modify it to check for correct and
consistent behavior.  A subsequent patch in this series will fix the
testcase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                      | 3 ---
 t/t7061-wtstatus-ignore.sh | 9 +++++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index 61f559f980..0dd5266629 100644
--- a/dir.c
+++ b/dir.c
@@ -2083,14 +2083,12 @@ static int treat_leading_path(struct dir_struct *dir,
 	struct strbuf sb = STRBUF_INIT;
 	int baselen, rc = 0;
 	const char *cp;
-	int old_flags = dir->flags;
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
 	baselen = 0;
-	dir->flags &= ~DIR_SHOW_OTHER_DIRECTORIES;
 	while (1) {
 		cp = path + baselen + !!baselen;
 		cp = memchr(cp, '/', path + len - cp);
@@ -2113,7 +2111,6 @@ static int treat_leading_path(struct dir_struct *dir,
 		}
 	}
 	strbuf_release(&sb);
-	dir->flags = old_flags;
 	return rc;
 }
 
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 0c394cf995..84366050da 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -43,11 +43,16 @@ test_expect_success 'status untracked directory with --ignored -u' '
 	test_cmp expected actual
 '
 cat >expected <<\EOF
-?? untracked/uncommitted
+?? untracked/
 !! untracked/ignored
 EOF
 
-test_expect_success 'status prefixed untracked directory with --ignored' '
+test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+	git status --porcelain --ignored >tmp &&
+	grep untracked/ tmp >actual &&
+	rm tmp &&
+	test_cmp expected actual &&
+
 	git status --porcelain --ignored untracked/ >actual &&
 	test_cmp expected actual
 '
-- 
gitgitgadget



* [PATCH v4 3/8] dir: remove stray quote character in comment
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 0dd5266629..5dacacd469 100644
--- a/dir.c
+++ b/dir.c
@@ -373,7 +373,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
-		/* name" doesn't match up to the first wild character */
+		/* name doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
 		    ps_strncmp(item, match, name,
 			       item->nowildcard_len - prefix))
-- 
gitgitgadget



* [PATCH v4 4/8] dir: exit before wildcard fall-through if there is no wildcard
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
                         ` (2 preceding siblings ...)
  2019-12-18 19:29       ` [PATCH v4 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The DO_MATCH_LEADING_PATHSPEC code path had a fall-through case for when
there was a wildcard, noting that we don't yet have enough information to
determine whether further paths under the current directory might match
due to the presence of wildcards.  But if there are no wildcards in our
pathspec, then we shouldn't reach that fall-through case.
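Below is a minimal sketch of the early return described above. It assumes a simplified, hypothetical sketch_pathspec_item that keeps only match, len and nowildcard_len. git's real struct pathspec_item and match_pathspec_item() carry much more, including the prefix handling, which this sketch ignores:

```c
#include <string.h>

/* Hypothetical, trimmed-down pathspec item: only what this sketch needs. */
struct sketch_pathspec_item {
	const char *match;  /* pathspec text, e.g. "foo/bar" or "*.c" */
	int len;            /* strlen(match) */
	int nowildcard_len; /* length of the leading part with no glob chars */
};

enum sketch_match {
	SKETCH_NO_MATCH = 0, /* nothing at or below name can match */
	SKETCH_LEADING,      /* name is a leading directory of the pathspec */
	SKETCH_MAYBE         /* a wildcard remains; deeper paths might match */
};

/* Classify directory "name" (NUL-terminated, no trailing slash). */
static enum sketch_match sketch_match_leading(const struct sketch_pathspec_item *item,
					      const char *name, int namelen)
{
	/* "name/" is a prefix of the pathspec: keep descending. */
	if (namelen < item->len && item->match[namelen] == '/' &&
	    !strncmp(item->match, name, namelen))
		return SKETCH_LEADING;

	/* name doesn't match up to the first wild character */
	if (item->nowildcard_len < item->len &&
	    strncmp(item->match, name, item->nowildcard_len))
		return SKETCH_NO_MATCH;

	/* The fix: no wildcard and not a leading directory, so stop here. */
	if (item->nowildcard_len == item->len)
		return SKETCH_NO_MATCH;

	/* Only a real wildcard reaches the fall-through case. */
	return SKETCH_MAYBE;
}
```

Take a literal pathspec such as untracked_dir. A sibling directory like untracked_repo now returns a definite "no match". Without the new check, it would fall through to the "a wildcard might match deeper" answer.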

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                              | 7 +++++++
 t/t3011-common-prefixes-and-directory-traversal.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 5dacacd469..517a569e10 100644
--- a/dir.c
+++ b/dir.c
@@ -379,6 +379,13 @@ static int match_pathspec_item(const struct index_state *istate,
 			       item->nowildcard_len - prefix))
 			return 0;
 
+		/*
+		 * name has no wildcard, and it didn't match as a leading
+		 * pathspec so return.
+		 */
+		if (item->nowildcard_len == item->len)
+			return 0;
+
 		/*
 		 * Here is where we would perform a wildmatch to check if
 		 * "name" can be matched as a directory (or a prefix) against
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 54f80c62b8..d6e161ddd8 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -92,7 +92,7 @@ test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+test_expect_success 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
 	cat <<-EOF >expect &&
 	untracked_dir/empty
 	untracked_repo/
@@ -110,7 +110,7 @@ test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses int
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
 	cat <<-EOF >expect &&
 	untracked_dir/
 	untracked_repo/
-- 
gitgitgadget



* [PATCH v4 5/8] dir: break part of read_directory_recursive() out for reuse
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
                         ` (3 preceding siblings ...)
  2019-12-18 19:29       ` [PATCH v4 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create an add_path_to_appropriate_result_list() function from the code
at the end of read_directory_recursive() so we can use it elsewhere.
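The extracted function is essentially a decision about which result list a path belongs in. The version below is a stripped-down, hypothetical rendering of that decision. It leaves out DIR_COLLECT_IGNORED, the pathspec check and the untracked cache, and the SK_* flags and LIST_* values are invented for illustration:

```c
enum sketch_treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

#define SK_SHOW_IGNORED     (1u << 0) /* models DIR_SHOW_IGNORED */
#define SK_SHOW_IGNORED_TOO (1u << 1) /* models DIR_SHOW_IGNORED_TOO */

enum sketch_list { LIST_NONE, LIST_ENTRIES, LIST_IGNORED };

/* Pick the result list for a path, given the flags and its treatment. */
static enum sketch_list sketch_pick_list(unsigned flags, enum sketch_treatment state)
{
	switch (state) {
	case T_EXCLUDED:
		if (flags & SK_SHOW_IGNORED)
			return LIST_ENTRIES;  /* like dir_add_name() */
		if (flags & SK_SHOW_IGNORED_TOO)
			return LIST_IGNORED;  /* like dir_add_ignored() */
		return LIST_NONE;
	case T_UNTRACKED:
		if (flags & SK_SHOW_IGNORED)
			return LIST_NONE;     /* only ignored paths were requested */
		return LIST_ENTRIES;
	default:
		return LIST_NONE;         /* T_NONE and T_RECURSE add nothing */
	}
}
```

The decision depends only on the flags and the path_treatment, so any caller that has those two values can reuse it unchanged.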

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 60 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/dir.c b/dir.c
index 517a569e10..645b44ea64 100644
--- a/dir.c
+++ b/dir.c
@@ -1932,6 +1932,40 @@ static void close_cached_dir(struct cached_dir *cdir)
 	}
 }
 
+static void add_path_to_appropriate_result_list(struct dir_struct *dir,
+	struct untracked_cache_dir *untracked,
+	struct cached_dir *cdir,
+	struct index_state *istate,
+	struct strbuf *path,
+	int baselen,
+	const struct pathspec *pathspec,
+	enum path_treatment state)
+{
+	/* add the path to the appropriate result list */
+	switch (state) {
+	case path_excluded:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			dir_add_name(dir, istate, path->buf, path->len);
+		else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			((dir->flags & DIR_COLLECT_IGNORED) &&
+			exclude_matches_pathspec(path->buf, path->len,
+						 pathspec)))
+			dir_add_ignored(dir, istate, path->buf, path->len);
+		break;
+
+	case path_untracked:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			break;
+		dir_add_name(dir, istate, path->buf, path->len);
+		if (cdir->fdir)
+			add_untracked(untracked, path->buf + baselen);
+		break;
+
+	default:
+		break;
+	}
+}
+
 /*
  * Read a directory tree. We currently ignore anything but
  * directories, regular files and symlinks. That's because git
@@ -2035,29 +2069,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			continue;
 		}
 
-		/* add the path to the appropriate result list */
-		switch (state) {
-		case path_excluded:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				dir_add_name(dir, istate, path.buf, path.len);
-			else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-				((dir->flags & DIR_COLLECT_IGNORED) &&
-				exclude_matches_pathspec(path.buf, path.len,
-							 pathspec)))
-				dir_add_ignored(dir, istate, path.buf, path.len);
-			break;
-
-		case path_untracked:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				break;
-			dir_add_name(dir, istate, path.buf, path.len);
-			if (cdir.fdir)
-				add_untracked(untracked, path.buf + baselen);
-			break;
-
-		default:
-			break;
-		}
+		add_path_to_appropriate_result_list(dir, untracked, &cdir,
+						    istate, &path, baselen,
+						    pathspec, state);
 	}
 	close_cached_dir(&cdir);
  out:
-- 
gitgitgadget



* [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
                         ` (4 preceding siblings ...)
  2019-12-18 19:29       ` [PATCH v4 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 21:29         ` Junio C Hamano
  2019-12-18 19:29       ` [PATCH v4 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Many years ago, the directory traversing logic had an optimization that
would always recurse into any directory that was a common prefix of all
the pathspecs without walking the leading directories to get down to
the desired directory.  Thus,
   git ls-files -o .git/                        # case A
would notice that .git/ was a common prefix of all pathspecs (since
it is the only pathspec listed), and then traverse into it and start
showing unknown files under that directory.  Unfortunately, .git/ is not
a directory we should be traversing into, which made this optimization
problematic.  This also affected cases like
   git ls-files -o --exclude-standard t/        # case B
where t/ was in the .gitignore file and thus isn't interesting and
shouldn't be recursed into.  It also affected cases like
   git ls-files -o --directory untracked_dir/   # case C
where untracked_dir/ is indeed untracked and thus interesting, but the
--directory flag means we only want to show the directory itself, not
recurse into it and start listing untracked files below it.
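All three cases reduce to one question for each leading component: should it be skipped, reported, or descended into? The toy classifier below illustrates the intended answers. It assumes a hypothetical sketch_component record with precomputed tracked and ignored bits; git's treat_path() derives those facts from the index and the exclude lists, and its real ordering of checks is more involved:

```c
#include <string.h>

enum sketch_treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/* Hypothetical facts about one leading directory component. */
struct sketch_component {
	const char *name; /* last path component, e.g. ".git" or "t" */
	int tracked;      /* some file below it is in the index */
	int ignored;      /* matched by an exclude pattern */
};

/*
 * Toy classifier; show_dirs models "ls-files -o --directory".
 * Only T_RECURSE should let the leading-path walk continue.
 */
static enum sketch_treatment sketch_classify(const struct sketch_component *c,
					     int show_dirs)
{
	if (!strcmp(c->name, ".git"))
		return T_NONE;      /* case A: never traverse the repository itself */
	if (c->ignored)
		return T_EXCLUDED;  /* case B: ignored, so do not recurse */
	if (c->tracked)
		return T_RECURSE;   /* tracked content below: must look inside */
	if (show_dirs)
		return T_UNTRACKED; /* case C: show "dir/" itself, do not recurse */
	return T_RECURSE;       /* plain untracked dir: list files below it */
}
```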

The case B class of bugs was noted and fixed in commits 16e2cfa90993
("read_directory(): further split treat_path()", 2010-01-08) and
48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), with the idea being that we first wanted to check whether
the common prefix was interesting.  The former patch noted that
treat_path() couldn't be used when checking the common prefix because
treat_path() requires a dir_entry() and we haven't read any directories
at the point we are checking the common prefix.  So, that patch split
treat_one_path() out of treat_path().  The latter patch then created a
new treat_leading_path() which duplicated by hand the bits of
treat_path() that couldn't be broken out and then called
treat_one_path() for the remainder.  There were three problems with this
approach:

  * The duplicated logic in treat_leading_path() accidentally missed the
    check for special paths (such as is_dot_or_dotdot and matching
    ".git"), causing case A types of bugs to continue to be an issue.
  * The treat_leading_path() logic assumed we should traverse into
    anything where path_treatment was not path_none, i.e. it perpetuated
    case C types of bugs.
  * It meant we had split logic that needed to be kept in sync, running the
    risk that people introduced new inconsistencies (such as in commit
    be8a84c52669, which we reverted earlier in this series, or in commit
    df5bcdf83ae, which we'll fix in a subsequent commit).

Fix most of these problems by making treat_leading_path() not only loop
over each leading path component, but also call treat_path() directly on
each.  To do so, we have to create a synthetic dir_entry, but that only
takes a few lines.  Then, pay attention to the path_treatment result we
get from treat_path() and stop treating path_excluded, path_untracked,
and path_recurse all as if they were path_recurse.
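The component loop can be sketched on its own as follows. This is a hypothetical, self-contained simplification. sketch_walk_leading() and its callback stand in for treat_leading_path() and treat_path(). The is_directory() check, the synthetic dirent and the result-list bookkeeping are left out:

```c
#include <string.h>

enum sketch_treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/* Callback: classify component name[0..namelen) found under dir[0..dirlen). */
typedef enum sketch_treatment (*sketch_classify_fn)(const char *dir, int dirlen,
						     const char *name, int namelen);

/* Example classifier: refuse to enter ".git", recurse into everything else. */
static enum sketch_treatment sketch_skip_dot_git(const char *dir, int dirlen,
						 const char *name, int namelen)
{
	(void)dir;
	(void)dirlen;
	if (namelen == 4 && !memcmp(name, ".git", 4))
		return T_NONE;
	return T_RECURSE;
}

/*
 * Visit each leading component of path ("foo", then "foo/bar", ...),
 * asking classify() about each one.  Returns 1 only if every component
 * says T_RECURSE; *stopped_at gets the length of the last prefix checked.
 */
static int sketch_walk_leading(const char *path, int len,
			       sketch_classify_fn classify, int *stopped_at)
{
	int prevlen, baselen = 0;
	enum sketch_treatment state = T_NONE;

	*stopped_at = 0;
	while (len && path[len - 1] == '/')
		len--;                         /* ignore trailing slashes */
	if (!len)
		return 1;                      /* empty prefix: nothing to check */

	while (1) {
		const char *cp;

		prevlen = baselen + !!baselen; /* skip the '/' after the last component */
		cp = memchr(path + prevlen, '/', len - prevlen);
		baselen = cp ? (int)(cp - path) : len;

		/* parent is path[0..prevlen), component is path[prevlen..baselen) */
		state = classify(path, prevlen, path + prevlen, baselen - prevlen);
		*stopped_at = baselen;
		if (state != T_RECURSE || baselen >= len)
			break;
	}
	return state == T_RECURSE;
}
```

Walking .git/info/ stops after the first component. That is the case A check that the old hand-rolled loop never performed.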

This leaves one remaining problem, the new inconsistency from commit
df5bcdf83ae.  That will be addressed in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 57 +++++++++++++++----
 ...common-prefixes-and-directory-traversal.sh |  6 +-
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/dir.c b/dir.c
index 645b44ea64..1de5d7ad33 100644
--- a/dir.c
+++ b/dir.c
@@ -2102,37 +2102,72 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const struct pathspec *pathspec)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int baselen, rc = 0;
+	int prevlen, baselen;
 	const char *cp;
+	struct cached_dir cdir;
+	struct dirent *de;
+	enum path_treatment state = path_none;
+
+	/*
+	 * For each directory component of path, we are going to check whether
+	 * that path is relevant given the pathspec.  For example, if path is
+	 *    foo/bar/baz/
+	 * then we will ask treat_path() whether we should go into foo, then
+	 * whether we should go into bar, then whether baz is relevant.
+	 * Checking each is important because e.g. if path is
+	 *    .git/info/
+	 * then we need to check .git to know we shouldn't traverse it.
+	 * If the return from treat_path() is:
+	 *    * path_none, for any path, we return false.
+	 *    * path_recurse, for all path components, we return true
+	 *    * <anything else> for some intermediate component, we make sure
+	 *        to add that path to the relevant list but return false
+	 *        signifying that we shouldn't recurse into it.
+	 */
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
+
+	de = xcalloc(1, sizeof(struct dirent)+len+1);
+	memset(&cdir, 0, sizeof(cdir));
+	cdir.de = de;
+#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
+	de->d_type = DT_DIR;
+#endif
 	baselen = 0;
+	prevlen = 0;
 	while (1) {
-		cp = path + baselen + !!baselen;
+		prevlen = baselen + !!baselen;
+		cp = path + prevlen;
 		cp = memchr(cp, '/', path + len - cp);
 		if (!cp)
 			baselen = len;
 		else
 			baselen = cp - path;
-		strbuf_setlen(&sb, 0);
+		strbuf_reset(&sb);
 		strbuf_add(&sb, path, baselen);
 		if (!is_directory(sb.buf))
 			break;
-		if (simplify_away(sb.buf, sb.len, pathspec))
-			break;
-		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
-				   DT_DIR, NULL) == path_none)
+		strbuf_reset(&sb);
+		strbuf_add(&sb, path, prevlen);
+		memcpy(de->d_name, path+prevlen, baselen-prevlen);
+		de->d_name[baselen-prevlen] = '\0';
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
+				    pathspec);
+		if (state != path_recurse)
 			break; /* do not recurse into it */
-		if (len <= baselen) {
-			rc = 1;
+		if (len <= baselen)
 			break; /* finished checking */
-		}
 	}
+	add_path_to_appropriate_result_list(dir, NULL, &cdir, istate,
+					    &sb, baselen, pathspec,
+					    state);
+
+	free(de);
 	strbuf_release(&sb);
-	return rc;
+	return state == path_recurse;
 }
 
 static const char *get_ident_string(void)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index d6e161ddd8..098fddc75b 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -74,7 +74,7 @@ test_expect_success 'git ls-files -o --directory untracked_dir does not recurse'
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir/ does not recurse' '
 	echo untracked_dir/ >expect &&
 	git ls-files -o --directory untracked_dir/ >actual &&
 	test_cmp expect actual
@@ -86,7 +86,7 @@ test_expect_success 'git ls-files -o untracked_repo does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+test_expect_success 'git ls-files -o untracked_repo/ does not recurse' '
 	echo untracked_repo/ >expect &&
 	git ls-files -o untracked_repo/ >actual &&
 	test_cmp expect actual
@@ -133,7 +133,7 @@ test_expect_success 'git ls-files -o .git shows nothing' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'git ls-files -o .git/ shows nothing' '
+test_expect_success 'git ls-files -o .git/ shows nothing' '
 	git ls-files -o .git/ >actual &&
 	test_must_be_empty actual
 '
-- 
gitgitgadget



* [PATCH v4 7/8] dir: synchronize treat_leading_path() and read_directory_recursive()
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
                         ` (5 preceding siblings ...)
  2019-12-18 19:29       ` [PATCH v4 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-18 19:29       ` [PATCH v4 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our optimization to avoid calling into read_directory_recursive() when
all pathspecs have a common leading directory means that we need to match
the logic that read_directory_recursive() would use if we had just
called it from the root.  Since read_directory_recursive() does more than
call treat_path(), we need to copy that same logic.
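Below is a minimal sketch of the extra rule being copied. It assumes the rule can be reduced to a few booleans. sketch_adjust() and its parameters are hypothetical: is_dir stands in for the get_dtype() check, and pathspec_goes_deeper stands in for the DO_MATCH_LEADING_PATHSPEC match:

```c
enum sketch_treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/*
 * Decide what the leading-path walk should do with a component after
 * treat_path() returned "state".  An untracked directory is still
 * entered when ignored files were requested too, or when the pathspec
 * reaches below it.  *also_report tells the caller to list "dir/" itself
 * before descending.
 */
static enum sketch_treatment sketch_adjust(enum sketch_treatment state,
					   int is_dir, int show_ignored_too,
					   int pathspec_goes_deeper,
					   int *also_report)
{
	*also_report = 0;
	if (state == T_UNTRACKED && is_dir &&
	    (show_ignored_too || pathspec_goes_deeper)) {
		*also_report = 1;
		return T_RECURSE;
	}
	return state;
}
```

Keeping the rule as a single predicate like this makes the duplication easy to see. If read_directory_recursive() gains another condition, the copy in treat_leading_path() has to change with it.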

Alternatively, we could try to change treat_path to return path_recurse
for an untracked directory under the given special circumstances that
this logic checks for, but a simple switch results in many test failures
such as 'git clean -d' not wiping out untracked but empty directories.
To work around that, we'd need the caller of treat_path to check for
path_recurse and sometimes special case it into path_untracked.  In
other words, we'd still have extra logic in both places.

Needing to duplicate logic like this means it is guaranteed someone will
eventually need to make further changes and forget to update both
locations.  It is tempting to just nuke the leading_directory special
casing to avoid such bugs and simplify the code, but unpack_trees'
verify_clean_subdirectory() also calls read_directory() and does so with
a non-empty leading path, so I'm hesitant to try to restructure further.
Add obnoxious warnings to treat_leading_path() and
read_directory_recursive() to try to warn people of such problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 30 +++++++++++++++++++
 ...common-prefixes-and-directory-traversal.sh |  2 +-
 t/t7061-wtstatus-ignore.sh                    |  2 +-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 1de5d7ad33..f500fd9279 100644
--- a/dir.c
+++ b/dir.c
@@ -1990,6 +1990,15 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct untracked_cache_dir *untracked, int check_only,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in treat_leading_path().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2101,6 +2110,15 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in read_directory_recursive().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct strbuf sb = STRBUF_INIT;
 	int prevlen, baselen;
 	const char *cp;
@@ -2156,6 +2174,18 @@ static int treat_leading_path(struct dir_struct *dir,
 		de->d_name[baselen-prevlen] = '\0';
 		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
 				    pathspec);
+		if (state == path_untracked &&
+		    get_dtype(cdir.de, istate, sb.buf, sb.len) == DT_DIR &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
+		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
+				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
+			add_path_to_appropriate_result_list(dir, NULL, &cdir,
+							    istate,
+							    &sb, baselen,
+							    pathspec, state);
+			state = path_recurse;
+		}
+
 		if (state != path_recurse)
 			break; /* do not recurse into it */
 		if (len <= baselen)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 098fddc75b..3da5b2b6e7 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -195,7 +195,7 @@ test_expect_success 'git ls-files -o consistent between one or two dirs' '
 
 # ls-files doesn't have a way to request showing both untracked and ignored
 # files at the same time, so use `git status --ignored`
-test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+test_expect_success 'git status --ignored shows same files under dir with or without pathspec' '
 	cat <<-EOF >expect &&
 	?? an_untracked_dir/
 	!! an_untracked_dir/ignored
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 84366050da..e4cf5484f9 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -47,7 +47,7 @@ cat >expected <<\EOF
 !! untracked/ignored
 EOF
 
-test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+test_expect_success 'status of untracked directory with --ignored works with or without prefix' '
 	git status --porcelain --ignored >tmp &&
 	grep untracked/ tmp >actual &&
 	rm tmp &&
-- 
gitgitgadget



* [PATCH v4 8/8] dir: consolidate similar code in treat_directory()
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
                         ` (6 preceding siblings ...)
  2019-12-18 19:29       ` [PATCH v4 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
@ 2019-12-18 19:29       ` Elijah Newren via GitGitGadget
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-18 19:29 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Both the DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS cases were checking
whether a path was actually a nonbare repository.  That check can be
shared; only the action taken on the result differs between the two
cases.
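The consolidated decision can be sketched as a small function. This is a hypothetical simplification: the SK_* flags model DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS, and looks_like_repo stands in for the result of is_nonbare_repository_dir(). In the real code, that probe only runs when the combined condition holds:

```c
enum sketch_treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

#define SK_SKIP_NESTED_GIT (1u << 0) /* models DIR_SKIP_NESTED_GIT */
#define SK_NO_GITLINKS     (1u << 1) /* models DIR_NO_GITLINKS */

/* Returns -1 when the nested-repository check does not decide the outcome. */
static int sketch_nested_repo_treatment(unsigned flags, int looks_like_repo,
					int exclude)
{
	int nested_repo = 0;

	/* Only consult the probe when at least one flag cares about it. */
	if ((flags & SK_SKIP_NESTED_GIT) || !(flags & SK_NO_GITLINKS))
		nested_repo = looks_like_repo;

	if (nested_repo)
		return (flags & SK_SKIP_NESTED_GIT) ? T_NONE :
			(exclude ? T_EXCLUDED : T_UNTRACKED);
	return -1;
}
```

The probe is shared; only the answer differs. Skipping nested repositories yields "none", while treating them as gitlinks reports them as excluded or untracked.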

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/dir.c b/dir.c
index f500fd9279..6912f7eb5e 100644
--- a/dir.c
+++ b/dir.c
@@ -1461,6 +1461,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int exclude,
 	const struct pathspec *pathspec)
 {
+	int nested_repo = 0;
+
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
 	case index_directory:
@@ -1470,15 +1472,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
-		if (dir->flags & DIR_SKIP_NESTED_GIT) {
-			int nested_repo;
+		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addstr(&sb, dirname);
 			nested_repo = is_nonbare_repository_dir(&sb);
 			strbuf_release(&sb);
-			if (nested_repo)
-				return path_none;
 		}
+		if (nested_repo)
+			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+				(exclude ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
@@ -1506,13 +1509,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 			return path_none;
 		}
-		if (!(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			if (is_nonbare_repository_dir(&sb))
-				return exclude ? path_excluded : path_untracked;
-			strbuf_release(&sb);
-		}
 		return path_recurse;
 	}
 
-- 
gitgitgadget


* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-18 19:29       ` [PATCH v4 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
@ 2019-12-18 21:29         ` Junio C Hamano
  2019-12-19 20:23           ` Elijah Newren
  0 siblings, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-18 21:29 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, blees, kyle, sxlijin, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> ...
> Fix most these problems by making treat_leading_path() not only loop
> over each leading path component, but calling treat_path() directly on
> each.  To do so, we have to create a synthetic dir_entry, but that only
> takes a few lines.  Then, pay attention to the path_treatment result we
> get from treat_path() and don't treat path_excluded, path_untracked, and
> path_recurse all the same as path_recurse.
>
> This leaves one remaining problem, the new inconsistency from commit
> df5bcdf83ae.  That will be addressed in a subsequent commit.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c                                         | 57 +++++++++++++++----
>  ...common-prefixes-and-directory-traversal.sh |  6 +-
>  2 files changed, 49 insertions(+), 14 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 645b44ea64..1de5d7ad33 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2102,37 +2102,72 @@ static int treat_leading_path(struct dir_struct *dir,
>  			      const struct pathspec *pathspec)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> -	int baselen, rc = 0;
> +	int prevlen, baselen;
>  	const char *cp;
> +	struct cached_dir cdir;
> +	struct dirent *de;
> +	enum path_treatment state = path_none;
> +
> +	/*
> +	 * For each directory component of path, we are going to check whether
> +	 * that path is relevant given the pathspec.  For example, if path is
> +	 *    foo/bar/baz/
> +	 * then we will ask treat_path() whether we should go into foo, then
> +	 * whether we should go into bar, then whether baz is relevant.
> +	 * Checking each is important because e.g. if path is
> +	 *    .git/info/
> +	 * then we need to check .git to know we shouldn't traverse it.
> +	 * If the return from treat_path() is:
> +	 *    * path_none, for any path, we return false.
> +	 *    * path_recurse, for all path components, we return true
> +	 *    * <anything else> for some intermediate component, we make sure
> +	 *        to add that path to the relevant list but return false
> +	 *        signifying that we shouldn't recurse into it.
> +	 */
>  
>  	while (len && path[len - 1] == '/')
>  		len--;
>  	if (!len)
>  		return 1;
> +
> +	de = xcalloc(1, sizeof(struct dirent)+len+1);

That "+len+1" may deserve a comment?  If we wanted to shoot for the
minimum memory consumption (and we do not), we would probably
allocate

	(sizeof(struct dirent) - sizeof(de->d_name)) +
		max(sizeof(de->d_name), len + 1)

bytes, but unconditionally adding len+1 is simpler and easier to
understand.  Either way, we *are* relying on the assumption that
either:

 (1) the "struct dirent" would have d_name[] array at the end of the
     struct, and by over-allocating, we can safely fit and carry a
     name that is much longer than sizeof(.d_name[]); OR

 (2) the "struct dirent" has d_name[] that is large enough to hold len+1
     bytes, if the assumption (1) does not hold.

is true.
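To make the two sizing options concrete, the sketch below uses a hypothetical dirent-like struct with a deliberately small, fixed d_name[16]; real struct dirent layouts differ by platform. The code only computes sizes. Actually storing a name longer than the declared array relies on assumption (1), the classic "struct hack", which strict C does not guarantee; C99 flexible array members are the sanctioned form.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dirent; d_name is deliberately small. */
struct sketch_dirent {
	unsigned char d_type;
	char d_name[16];
};

#define SKETCH_NAME_CAP sizeof(((struct sketch_dirent *)0)->d_name)

/* What the patch does: always over-allocate by len + 1 bytes. */
static size_t sketch_simple_size(size_t len)
{
	return sizeof(struct sketch_dirent) + len + 1;
}

/* The minimum described above: non-name part + max(sizeof(d_name), len + 1). */
static size_t sketch_minimal_size(size_t len)
{
	size_t need = len + 1;

	return (sizeof(struct sketch_dirent) - SKETCH_NAME_CAP) +
	       (need > SKETCH_NAME_CAP ? need : SKETCH_NAME_CAP);
}
```

The simple form wastes min(sizeof(d_name), len + 1) bytes, which is never more than sizeof(d_name), on a single short-lived allocation. That small cost is why simplicity wins here.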

> +	memset(&cdir, 0, sizeof(cdir));
> +	cdir.de = de;
> +#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
> +	de->d_type = DT_DIR;
> +#endif
>  	baselen = 0;
> +	prevlen = 0;
>  	while (1) {
> -		cp = path + baselen + !!baselen;
> +		prevlen = baselen + !!baselen;
> +		cp = path + prevlen;
>  		cp = memchr(cp, '/', path + len - cp);
>  		if (!cp)
>  			baselen = len;
>  		else
>  			baselen = cp - path;
> -		strbuf_setlen(&sb, 0);
> +		strbuf_reset(&sb);
>  		strbuf_add(&sb, path, baselen);
>  		if (!is_directory(sb.buf))
>  			break;



> -		if (simplify_away(sb.buf, sb.len, pathspec))
> -			break;
> -		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
> -				   DT_DIR, NULL) == path_none)
> +		strbuf_reset(&sb);
> +		strbuf_add(&sb, path, prevlen);
> +		memcpy(de->d_name, path+prevlen, baselen-prevlen);
> +		de->d_name[baselen-prevlen] = '\0';
> +		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
> +				    pathspec);

So this is the crux of the fix---instead of doing a (poor) imitation
of what treat_path() does by calling simplify_away() and
treat_one_path() ourselves, we make a call to the real thing.

Looking good.  Thanks.


* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-18 21:29         ` Junio C Hamano
@ 2019-12-19 20:23           ` Elijah Newren
  2019-12-19 22:24             ` Jeff King
  0 siblings, 1 reply; 69+ messages in thread
From: Elijah Newren @ 2019-12-19 20:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, blees,
	Kyle Meyer, Samuel Lijin

On Wed, Dec 18, 2019 at 1:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
...
> >       while (len && path[len - 1] == '/')
> >               len--;
> >       if (!len)
> >               return 1;
> > +
> > +     de = xcalloc(1, sizeof(struct dirent)+len+1);
>
> That "+len+1" may deserve a comment?

Good point, I'll add one and send a re-roll.

>  If we wanted to shoot for the
> minimum memory consumption (and we do not), we would probably
> allocate
>
>         (sizeof(struct dirent) - sizeof(de->d_name)) +
>                 max(sizeof(de->d_name), len + 1)
>
> bytes, but unconditionally adding len+1 is simpler and easier to
> understand.  Either way, we *are* relying on the assumption that
> either:
>
>  (1) the "struct dirent" would have d_name[] array at the end of the
>      struct, and by over-allocating, we can safely fit and carry a
>      name that is much longer than sizeof(.d_name[]); OR
>
>  (2) the "struct dirent" has d_name[] that is large enough to hold len+1
>      bytes, if the assumption (1) does not hold.
>
> is true.
>
> > +     memset(&cdir, 0, sizeof(cdir));
> > +     cdir.de = de;
> > +#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
> > +     de->d_type = DT_DIR;
> > +#endif
> >       baselen = 0;
> > +     prevlen = 0;
> >       while (1) {
> > -             cp = path + baselen + !!baselen;
> > +             prevlen = baselen + !!baselen;
> > +             cp = path + prevlen;
> >               cp = memchr(cp, '/', path + len - cp);
> >               if (!cp)
> >                       baselen = len;
> >               else
> >                       baselen = cp - path;
> > -             strbuf_setlen(&sb, 0);
> > +             strbuf_reset(&sb);
> >               strbuf_add(&sb, path, baselen);
> >               if (!is_directory(sb.buf))
> >                       break;
>
>
>
> > -             if (simplify_away(sb.buf, sb.len, pathspec))
> > -                     break;
> > -             if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
> > -                                DT_DIR, NULL) == path_none)
> > +             strbuf_reset(&sb);
> > +             strbuf_add(&sb, path, prevlen);
> > +             memcpy(de->d_name, path+prevlen, baselen-prevlen);
> > +             de->d_name[baselen-prevlen] = '\0';
> > +             state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
> > +                                 pathspec);
>
> So this is the crux of the fix---instead of doing a (poor) imitation
> of what treat_path() does by calling simplify_away() and
> treat_one_path() ourselves, we make a call to the real thing.
>
> Looking good.  Thanks.


* [PATCH v5 0/8] Directory traversal bugs
  2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
                         ` (7 preceding siblings ...)
  2019-12-18 19:29       ` [PATCH v4 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
@ 2019-12-19 21:28       ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
                           ` (7 more replies)
  8 siblings, 8 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano

This series documents multiple fill_directory() bugs, and fixes the one that
is new to 2.24.0 coming from en/clean-nested-with-ignored-topic, the rest
having been around in versions of git going back up to a decade. 

Changes since v4:

 * added a comment with the rationale for allocating an extra len+1 bytes
   for the dirent.

Elijah Newren (8):
  t3011: demonstrate directory traversal failures
  Revert "dir.c: make 'git-status --ignored' work within leading
    directories"
  dir: remove stray quote character in comment
  dir: exit before wildcard fall-through if there is no wildcard
  dir: break part of read_directory_recursive() out for reuse
  dir: fix checks on common prefix directory
  dir: synchronize treat_leading_path() and read_directory_recursive()
  dir: consolidate similar code in treat_directory()

 dir.c                                         | 187 ++++++++++++----
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 t/t7061-wtstatus-ignore.sh                    |   9 +-
 3 files changed, 354 insertions(+), 51 deletions(-)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh


base-commit: da72936f544fec5a335e66432610e4cef4430991
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-676%2Fnewren%2Fls-files-bug-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-676/newren/ls-files-bug-v5
Pull-Request: https://github.com/git/git/pull/676

Range-diff vs v4:

 1:  6d659b2302 = 1:  6d659b2302 t3011: demonstrate directory traversal failures
 2:  79f2b56174 = 2:  79f2b56174 Revert "dir.c: make 'git-status --ignored' work within leading directories"
 3:  d6f858cab1 = 3:  d6f858cab1 dir: remove stray quote character in comment
 4:  8d2d98eec3 = 4:  8d2d98eec3 dir: exit before wildcard fall-through if there is no wildcard
 5:  d2f5623bd7 = 5:  d2f5623bd7 dir: break part of read_directory_recursive() out for reuse
 6:  1f3978aa46 ! 6:  97e145489d dir: fix checks on common prefix directory
     @@ -93,6 +93,16 @@
       	if (!len)
       		return 1;
      +
     ++	/*
     ++	 * We need a manufactured dirent with sufficient space to store a
     ++	 * leading directory component of path in its d_name.  Here, we
     ++	 * assume that the dirent's d_name is either declared as
     ++	 *    char d_name[BIG_ENOUGH]
     ++	 * or that it is declared at the end of the struct as
     ++	 *    char d_name[]
     ++	 * For either case, padding with len+1 bytes at the end will ensure
     ++	 * sufficient storage space.
     ++	 */
      +	de = xcalloc(1, sizeof(struct dirent)+len+1);
      +	memset(&cdir, 0, sizeof(cdir));
      +	cdir.de = de;
 7:  542c6e5792 = 7:  5275e6d7f0 dir: synchronize treat_leading_path() and read_directory_recursive()
 8:  31079dc1cf = 8:  e4768931d0 dir: consolidate similar code in treat_directory()

-- 
gitgitgadget


* [PATCH v5 1/8] t3011: demonstrate directory traversal failures
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add several tests demonstrating directory traversal failures of various
sorts in dir.c (and one similar-looking test that turns out to be a
git_fnmatch bug).  A lot of these tests look like near duplicates of
each other, but an optimization path in dir.c to pre-descend into a
common prefix and the specialized treatment of trailing slashes in dir.c
mean the tiny differences are sometimes important and potentially cause
different codepaths to be explored.

Of the 7 failing tests, 2 are new to git-2.24.0 (introduced by side effects
of en/clean-nested-with-ignored-topic); the other 5 also failed
under git-2.23.0 and earlier.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 ...common-prefixes-and-directory-traversal.sh | 209 ++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100755 t/t3011-common-prefixes-and-directory-traversal.sh

diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
new file mode 100755
index 0000000000..54f80c62b8
--- /dev/null
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -0,0 +1,209 @@
+#!/bin/sh
+
+test_description='directory traversal handling, especially with common prefixes'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit hello &&
+
+	>empty &&
+	mkdir untracked_dir &&
+	>untracked_dir/empty &&
+	git init untracked_repo &&
+	>untracked_repo/empty &&
+
+	cat <<-EOF >.gitignore &&
+	ignored
+	an_ignored_dir/
+	EOF
+	mkdir an_ignored_dir &&
+	mkdir an_untracked_dir &&
+	>an_ignored_dir/ignored &&
+	>an_ignored_dir/untracked &&
+	>an_untracked_dir/ignored &&
+	>an_untracked_dir/untracked
+'
+
+test_expect_success 'git ls-files -o shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_ignored_dir/ignored
+	an_ignored_dir/untracked
+	an_untracked_dir/ignored
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --exclude-standard shows the right entries' '
+	cat <<-EOF >expect &&
+	.gitignore
+	actual
+	an_untracked_dir/untracked
+	empty
+	expect
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o --exclude-standard >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ recurses' '
+	echo untracked_dir/empty >expect &&
+	git ls-files -o untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+	echo untracked_dir/ >expect &&
+	git ls-files -o --directory untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_repo does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+	echo untracked_repo/ >expect &&
+	git ls-files -o untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses into untracked_dir only' '
+	cat <<-EOF >expect &&
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir untracked_repo >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o --directory untracked_dir/ untracked_repo/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory untracked_dir/ untracked_repo/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o .git shows nothing' '
+	git ls-files -o .git >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_failure 'git ls-files -o .git/ shows nothing' '
+	git ls-files -o .git/ >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o untracked_* recurses appropriately' '
+	mkdir "untracked_*" &&
+	>"untracked_*/empty" &&
+
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+# It turns out fill_directory returns the right paths, but ls-files' post-call
+# filtering in show_dir_entry() via calling dir_path_match() which ends up
+# in git_fnmatch() has logic for PATHSPEC_ONESTAR that assumes the pathspec
+# must match the full path; it doesn't check it for matching a leading
+# directory.
+test_expect_failure FUNNYNAMES 'git ls-files -o untracked_*/ recurses appropriately' '
+	cat <<-EOF >expect &&
+	untracked_*/empty
+	untracked_dir/empty
+	untracked_repo/
+	EOF
+	git ls-files -o "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_* does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'git ls-files -o --directory untracked_*/ does not recurse' '
+	cat <<-EOF >expect &&
+	untracked_*/
+	untracked_dir/
+	untracked_repo/
+	EOF
+	git ls-files -o --directory "untracked_*/" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git ls-files -o consistent between one or two dirs' '
+	git ls-files -o --exclude-standard an_ignored_dir/ an_untracked_dir/ >tmp &&
+	! grep ^an_ignored_dir/ tmp >expect &&
+	git ls-files -o --exclude-standard an_ignored_dir/ >actual &&
+	test_cmp expect actual
+'
+
+# ls-files doesn't have a way to request showing both untracked and ignored
+# files at the same time, so use `git status --ignored`
+test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+	cat <<-EOF >expect &&
+	?? an_untracked_dir/
+	!! an_untracked_dir/ignored
+	EOF
+	git status --porcelain --ignored >output &&
+	grep an_untracked_dir output >expect &&
+	git status --porcelain --ignored an_untracked_dir/ >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v5 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories"
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit be8a84c52669 ("dir.c: make 'git-status --ignored' work within
leading directories", 2013-04-15) noted that
   git status --ignored <SOMEPATH>
would not list ignored files and directories within <SOMEPATH> if
<SOMEPATH> was untracked, and modified the behavior to make it show
them.  However, it did so via a hack that broke consistency; it would
show paths under <SOMEPATH> differently than a simple
   git status --ignored | grep <SOMEPATH>
would show them.  A correct fix is more involved and is complicated by
this hack, so we revert this commit (but keep corrected versions of the
testcases) and will fix the original bug in a subsequent patch.

Some history may be helpful:

A case very similar to the commit we are reverting was raised in
commit 48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), but that commit went in roughly the opposite direction.  It
noted that
   git ls-files -o --exclude-standard t/
used to show untracked files under t/ even when t/ was ignored, and
changed the behavior to stop showing untracked files under an ignored
directory.  More importantly, that commit considered keeping the old
behavior but rejected it because it would be inconsistent with the
behavior when multiple pathspecs were specified.

This inconsistency between specifying one pathspec versus zero or two
arises because the common prefix of the pathspecs is sent through a
different set of checks (in treat_leading_path()) than normal
file/directory traversal (which goes through read_directory_recursive()
and treat_path()).  For consistency, both codepaths must produce the
same result.
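
A simplified sketch of what "common prefix" means here may help.
common_dir_prefix_len() is a hypothetical helper, not git's
implementation: it assumes literal pathspecs (no wildcards or pathspec
magic) and counts only whole leading directories, i.e. up to and
including the last shared '/'.  It also shows why a trailing slash can
change which codepath runs: "untracked_dir/" alone has a 14-byte common
prefix, while "untracked_dir" alone has none.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: length of the longest leading directory prefix
 * (ending in '/') shared by every pathspec in specs[0..n).  Assumes
 * literal pathspecs; a real implementation must also stop at wildcards.
 */
static size_t common_dir_prefix_len(const char **specs, size_t n)
{
	size_t i, pos, len = 0;

	if (!n)
		return 0;
	for (pos = 0; specs[0][pos]; pos++) {
		char c = specs[0][pos];

		/* a shorter spec hits '\0' here, which never equals c */
		for (i = 1; i < n; i++)
			if (specs[i][pos] != c)
				return len;
		/* only whole directory components count */
		if (c == '/')
			len = pos + 1;
	}
	return len;
}
```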

Revert commit be8a84c526691667fc04a8241d93a3de1de298ab, except instead
of removing the testcase it added, modify it to check for correct and
consistent behavior.  A subsequent patch in this series will fix the
testcase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                      | 3 ---
 t/t7061-wtstatus-ignore.sh | 9 +++++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index 61f559f980..0dd5266629 100644
--- a/dir.c
+++ b/dir.c
@@ -2083,14 +2083,12 @@ static int treat_leading_path(struct dir_struct *dir,
 	struct strbuf sb = STRBUF_INIT;
 	int baselen, rc = 0;
 	const char *cp;
-	int old_flags = dir->flags;
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
 	baselen = 0;
-	dir->flags &= ~DIR_SHOW_OTHER_DIRECTORIES;
 	while (1) {
 		cp = path + baselen + !!baselen;
 		cp = memchr(cp, '/', path + len - cp);
@@ -2113,7 +2111,6 @@ static int treat_leading_path(struct dir_struct *dir,
 		}
 	}
 	strbuf_release(&sb);
-	dir->flags = old_flags;
 	return rc;
 }
 
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 0c394cf995..84366050da 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -43,11 +43,16 @@ test_expect_success 'status untracked directory with --ignored -u' '
 	test_cmp expected actual
 '
 cat >expected <<\EOF
-?? untracked/uncommitted
+?? untracked/
 !! untracked/ignored
 EOF
 
-test_expect_success 'status prefixed untracked directory with --ignored' '
+test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+	git status --porcelain --ignored >tmp &&
+	grep untracked/ tmp >actual &&
+	rm tmp &&
+	test_cmp expected actual &&
+
 	git status --porcelain --ignored untracked/ >actual &&
 	test_cmp expected actual
 '
-- 
gitgitgadget



* [PATCH v5 3/8] dir: remove stray quote character in comment
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 0dd5266629..5dacacd469 100644
--- a/dir.c
+++ b/dir.c
@@ -373,7 +373,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
-		/* name" doesn't match up to the first wild character */
+		/* name doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
 		    ps_strncmp(item, match, name,
 			       item->nowildcard_len - prefix))
-- 
gitgitgadget



* [PATCH v5 4/8] dir: exit before wildcard fall-through if there is no wildcard
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                           ` (2 preceding siblings ...)
  2019-12-19 21:28         ` [PATCH v5 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The DO_MATCH_LEADING_PATHSPEC code had a fall-through case for when the
pathspec contains a wildcard, noting that we do not yet have enough
information to determine whether further paths under the current
directory might match.  But if the pathspec has no wildcards, we should
never reach that fall-through case; return early instead.
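
The early return can be sketched as follows.  This is a simplified,
hypothetical model of the DO_MATCH_LEADING_PATHSPEC branch, not git's
match_pathspec_item(): it assumes '*', '?', '[' and '\' are the glob
specials, assumes exact and recursive matches were already handled by
the caller, and only answers whether `dir` could be a leading directory
of something `spec` names.

```c
#include <assert.h>
#include <string.h>

/* Length of the pathspec before its first (assumed) glob special. */
static size_t nowildcard_len(const char *spec)
{
	return strcspn(spec, "*?[\\");
}

enum leading_match { NO_MATCH, LEADING_MATCH, MAYBE_WILDCARD };

/* Hypothetical: could `dir` be a leading directory of what `spec` names? */
static enum leading_match match_leading(const char *spec, const char *dir)
{
	size_t dirlen = strlen(dir);
	size_t speclen = strlen(spec);
	size_t literal = nowildcard_len(spec);

	/* dir is a literal leading directory of the pathspec */
	if (dirlen < literal && spec[dirlen] == '/' &&
	    !strncmp(spec, dir, dirlen))
		return LEADING_MATCH;

	/* dir doesn't match up to the first wild character */
	if (strncmp(spec, dir, literal < dirlen ? literal : dirlen))
		return NO_MATCH;

	/* no wildcard at all: the leading check above was conclusive */
	if (literal == speclen)
		return NO_MATCH;

	/* a wildcard remains; only a real wildmatch could decide */
	return MAYBE_WILDCARD;
}
```

Without the no-wildcard check, a literal pathspec such as "a" would
fall through to MAYBE_WILDCARD for a directory like "abc", which is the
case the patch closes off.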

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                              | 7 +++++++
 t/t3011-common-prefixes-and-directory-traversal.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 5dacacd469..517a569e10 100644
--- a/dir.c
+++ b/dir.c
@@ -379,6 +379,13 @@ static int match_pathspec_item(const struct index_state *istate,
 			       item->nowildcard_len - prefix))
 			return 0;
 
+		/*
+		 * name has no wildcard, and it didn't match as a leading
+		 * pathspec so return.
+		 */
+		if (item->nowildcard_len == item->len)
+			return 0;
+
 		/*
 		 * Here is where we would perform a wildmatch to check if
 		 * "name" can be matched as a directory (or a prefix) against
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 54f80c62b8..d6e161ddd8 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -92,7 +92,7 @@ test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
+test_expect_success 'git ls-files -o untracked_dir untracked_repo recurses into untracked_dir only' '
 	cat <<-EOF >expect &&
 	untracked_dir/empty
 	untracked_repo/
@@ -110,7 +110,7 @@ test_expect_success 'git ls-files -o untracked_dir/ untracked_repo/ recurses int
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir untracked_repo does not recurse' '
 	cat <<-EOF >expect &&
 	untracked_dir/
 	untracked_repo/
-- 
gitgitgadget



* [PATCH v5 5/8] dir: break part of read_directory_recursive() out for reuse
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                           ` (3 preceding siblings ...)
  2019-12-19 21:28         ` [PATCH v5 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create an add_path_to_appropriate_result_list() function from the code
at the end of read_directory_recursive() so we can use it elsewhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 60 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/dir.c b/dir.c
index 517a569e10..645b44ea64 100644
--- a/dir.c
+++ b/dir.c
@@ -1932,6 +1932,40 @@ static void close_cached_dir(struct cached_dir *cdir)
 	}
 }
 
+static void add_path_to_appropriate_result_list(struct dir_struct *dir,
+	struct untracked_cache_dir *untracked,
+	struct cached_dir *cdir,
+	struct index_state *istate,
+	struct strbuf *path,
+	int baselen,
+	const struct pathspec *pathspec,
+	enum path_treatment state)
+{
+	/* add the path to the appropriate result list */
+	switch (state) {
+	case path_excluded:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			dir_add_name(dir, istate, path->buf, path->len);
+		else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			((dir->flags & DIR_COLLECT_IGNORED) &&
+			exclude_matches_pathspec(path->buf, path->len,
+						 pathspec)))
+			dir_add_ignored(dir, istate, path->buf, path->len);
+		break;
+
+	case path_untracked:
+		if (dir->flags & DIR_SHOW_IGNORED)
+			break;
+		dir_add_name(dir, istate, path->buf, path->len);
+		if (cdir->fdir)
+			add_untracked(untracked, path->buf + baselen);
+		break;
+
+	default:
+		break;
+	}
+}
+
 /*
  * Read a directory tree. We currently ignore anything but
  * directories, regular files and symlinks. That's because git
@@ -2035,29 +2069,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			continue;
 		}
 
-		/* add the path to the appropriate result list */
-		switch (state) {
-		case path_excluded:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				dir_add_name(dir, istate, path.buf, path.len);
-			else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-				((dir->flags & DIR_COLLECT_IGNORED) &&
-				exclude_matches_pathspec(path.buf, path.len,
-							 pathspec)))
-				dir_add_ignored(dir, istate, path.buf, path.len);
-			break;
-
-		case path_untracked:
-			if (dir->flags & DIR_SHOW_IGNORED)
-				break;
-			dir_add_name(dir, istate, path.buf, path.len);
-			if (cdir.fdir)
-				add_untracked(untracked, path.buf + baselen);
-			break;
-
-		default:
-			break;
-		}
+		add_path_to_appropriate_result_list(dir, untracked, &cdir,
+						    istate, &path, baselen,
+						    pathspec, state);
 	}
 	close_cached_dir(&cdir);
  out:
-- 
gitgitgadget



* [PATCH v5 6/8] dir: fix checks on common prefix directory
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                           ` (4 preceding siblings ...)
  2019-12-19 21:28         ` [PATCH v5 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Many years ago, the directory traversing logic had an optimization that
would always recurse into any directory that was a common prefix of all
the pathspecs without walking the leading directories to get down to
the desired directory.  Thus,
   git ls-files -o .git/                        # case A
would notice that .git/ was a common prefix of all pathspecs (since
it is the only pathspec listed), and then traverse into it and start
showing unknown files under that directory.  Unfortunately, .git/ is not
a directory we should be traversing into, which made this optimization
problematic.  This also affected cases like
   git ls-files -o --exclude-standard t/        # case B
where t/ was in the .gitignore file and thus isn't interesting and
shouldn't be recursed into.  It also affected cases like
   git ls-files -o --directory untracked_dir/   # case C
where untracked_dir/ is indeed untracked and thus interesting, but the
--directory flag means we only want to show the directory itself, not
recurse into it and start listing untracked files below it.

The case B class of bugs were noted and fixed in commits 16e2cfa90993
("read_directory(): further split treat_path()", 2010-01-08) and
48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), with the idea being that we first wanted to check whether
the common prefix was interesting.  The former patch noted that
treat_path() couldn't be used when checking the common prefix because
treat_path() requires a dir_entry() and we haven't read any directories
at the point we are checking the common prefix.  So, that patch split
treat_one_path() out of treat_path().  The latter patch then created a
new treat_leading_path() which duplicated by hand the bits of
treat_path() that couldn't be broken out and then called
treat_one_path() for the remainder.  There were three problems with this
approach:

  * The duplicated logic in treat_leading_path() accidentally missed the
    check for special paths (such as is_dot_or_dotdot and matching
    ".git"), causing case A types of bugs to continue to be an issue.
  * The treat_leading_path() logic assumed we should traverse into
    anything where path_treatment was not path_none, i.e. it perpetuated
    class C types of bugs.
  * It meant we had split logic that needed to be kept in sync, running
    the risk that people would introduce new inconsistencies (such as in
    commit be8a84c52669, which we reverted earlier in this series, or in
    commit df5bcdf83ae, which we'll fix in a subsequent commit).

Fix most of these problems by making treat_leading_path() not only loop
over each leading path component, but also call treat_path() directly on
each.  To do so, we have to create a synthetic dir_entry, but that only
takes a few lines.  Then, pay attention to the path_treatment result we
get from treat_path() instead of treating path_excluded, path_untracked,
and path_recurse all the same as path_recurse.
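
The per-component loop described above can be sketched in isolation.
walk_leading(), enum treatment and demo_treat() are hypothetical
stand-ins for treat_leading_path(), enum path_treatment and
treat_path(); the prevlen/baselen arithmetic follows the patch, but a
fixed 256-byte name buffer replaces the calloc'd dirent for simplicity.
demo_treat() is only an example policy: it refuses to enter .git and
reports untracked_dir as untracked.

```c
#include <assert.h>
#include <string.h>

enum treatment { T_NONE, T_EXCLUDED, T_UNTRACKED, T_RECURSE };

/* Decide how to treat component `name` found under path[0..baselen). */
typedef enum treatment (*treat_fn)(const char *base, size_t baselen,
				   const char *name, void *data);

/*
 * Walk each leading component of path[0..len): prevlen is where the
 * component starts, baselen where it ends.  Stop at the first result
 * other than T_RECURSE and return it; T_RECURSE means "descend".
 */
static enum treatment walk_leading(const char *path, size_t len,
				   treat_fn treat, void *data)
{
	char name[256];
	size_t prevlen, baselen = 0;
	enum treatment state = T_NONE;

	while (len && path[len - 1] == '/')
		len--;
	if (!len)
		return T_RECURSE;
	while (1) {
		const char *cp;

		prevlen = baselen + !!baselen;
		cp = memchr(path + prevlen, '/', len - prevlen);
		baselen = cp ? (size_t)(cp - path) : len;
		if (baselen - prevlen >= sizeof(name))
			return T_NONE; /* component too long for this sketch */
		memcpy(name, path + prevlen, baselen - prevlen);
		name[baselen - prevlen] = '\0';
		state = treat(path, prevlen, name, data);
		if (state != T_RECURSE || baselen >= len)
			break; /* do not recurse, or finished checking */
	}
	return state;
}

/* Example policy: never enter .git; report untracked_dir as untracked. */
static enum treatment demo_treat(const char *base, size_t baselen,
				 const char *name, void *data)
{
	int *calls = data;

	(void)base;
	(void)baselen;
	(*calls)++;
	if (!strcmp(name, ".git"))
		return T_NONE;
	if (!strcmp(name, "untracked_dir"))
		return T_UNTRACKED;
	return T_RECURSE;
}
```

With that policy, walking ".git/info/" stops after the first component
with T_NONE, while "a/b/c" visits all three components and returns
T_RECURSE.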

This leaves one remaining problem, the new inconsistency from commit
df5bcdf83ae.  That will be addressed in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 67 ++++++++++++++++---
 ...common-prefixes-and-directory-traversal.sh |  6 +-
 2 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/dir.c b/dir.c
index 645b44ea64..a42cc2aa8c 100644
--- a/dir.c
+++ b/dir.c
@@ -2102,37 +2102,82 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const struct pathspec *pathspec)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int baselen, rc = 0;
+	int prevlen, baselen;
 	const char *cp;
+	struct cached_dir cdir;
+	struct dirent *de;
+	enum path_treatment state = path_none;
+
+	/*
+	 * For each directory component of path, we are going to check whether
+	 * that path is relevant given the pathspec.  For example, if path is
+	 *    foo/bar/baz/
+	 * then we will ask treat_path() whether we should go into foo, then
+	 * whether we should go into bar, then whether baz is relevant.
+	 * Checking each is important because e.g. if path is
+	 *    .git/info/
+	 * then we need to check .git to know we shouldn't traverse it.
+	 * If the return from treat_path() is:
+	 *    * path_none, for any path, we return false.
+	 *    * path_recurse, for all path components, we return true
+	 *    * <anything else> for some intermediate component, we make sure
+	 *        to add that path to the relevant list but return false
+	 *        signifying that we shouldn't recurse into it.
+	 */
 
 	while (len && path[len - 1] == '/')
 		len--;
 	if (!len)
 		return 1;
+
+	/*
+	 * We need a manufactured dirent with sufficient space to store a
+	 * leading directory component of path in its d_name.  Here, we
+	 * assume that the dirent's d_name is either declared as
+	 *    char d_name[BIG_ENOUGH]
+	 * or that it is declared at the end of the struct as
+	 *    char d_name[]
+	 * For either case, padding with len+1 bytes at the end will ensure
+	 * sufficient storage space.
+	 */
+	de = xcalloc(1, sizeof(struct dirent)+len+1);
+	memset(&cdir, 0, sizeof(cdir));
+	cdir.de = de;
+#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
+	de->d_type = DT_DIR;
+#endif
 	baselen = 0;
+	prevlen = 0;
 	while (1) {
-		cp = path + baselen + !!baselen;
+		prevlen = baselen + !!baselen;
+		cp = path + prevlen;
 		cp = memchr(cp, '/', path + len - cp);
 		if (!cp)
 			baselen = len;
 		else
 			baselen = cp - path;
-		strbuf_setlen(&sb, 0);
+		strbuf_reset(&sb);
 		strbuf_add(&sb, path, baselen);
 		if (!is_directory(sb.buf))
 			break;
-		if (simplify_away(sb.buf, sb.len, pathspec))
-			break;
-		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
-				   DT_DIR, NULL) == path_none)
+		strbuf_reset(&sb);
+		strbuf_add(&sb, path, prevlen);
+		memcpy(de->d_name, path+prevlen, baselen-prevlen);
+		de->d_name[baselen-prevlen] = '\0';
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
+				    pathspec);
+		if (state != path_recurse)
 			break; /* do not recurse into it */
-		if (len <= baselen) {
-			rc = 1;
+		if (len <= baselen)
 			break; /* finished checking */
-		}
 	}
+	add_path_to_appropriate_result_list(dir, NULL, &cdir, istate,
+					    &sb, baselen, pathspec,
+					    state);
+
+	free(de);
 	strbuf_release(&sb);
-	return rc;
+	return state == path_recurse;
 }
 
 static const char *get_ident_string(void)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index d6e161ddd8..098fddc75b 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -74,7 +74,7 @@ test_expect_success 'git ls-files -o --directory untracked_dir does not recurse'
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
+test_expect_success 'git ls-files -o --directory untracked_dir/ does not recurse' '
 	echo untracked_dir/ >expect &&
 	git ls-files -o --directory untracked_dir/ >actual &&
 	test_cmp expect actual
@@ -86,7 +86,7 @@ test_expect_success 'git ls-files -o untracked_repo does not recurse' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
+test_expect_success 'git ls-files -o untracked_repo/ does not recurse' '
 	echo untracked_repo/ >expect &&
 	git ls-files -o untracked_repo/ >actual &&
 	test_cmp expect actual
@@ -133,7 +133,7 @@ test_expect_success 'git ls-files -o .git shows nothing' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'git ls-files -o .git/ shows nothing' '
+test_expect_success 'git ls-files -o .git/ shows nothing' '
 	git ls-files -o .git/ >actual &&
 	test_must_be_empty actual
 '
-- 
gitgitgadget



* [PATCH v5 7/8] dir: synchronize treat_leading_path() and read_directory_recursive()
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                           ` (5 preceding siblings ...)
  2019-12-19 21:28         ` [PATCH v5 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  2019-12-19 21:28         ` [PATCH v5 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our optimization to avoid calling into read_directory_recursive() when
all pathspecs have a common leading directory means that we need to match
the logic that read_directory_recursive() would use if we had just
called it from the root.  Since it does more than call treat_path(), we
need to copy that same logic.
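To make the duplicated condition concrete, here is a simplified, hypothetical model of it as a single predicate. `struct recurse_inputs` and `should_recurse()` do not exist in dir.c. The booleans stand in for the `get_dtype()`, `DIR_SHOW_IGNORED_TOO`, and `do_match_pathspec()` checks that the patch copies inline, and the real call sites differ in small details.

```c
#include <stdbool.h>

/* Simplified stand-in; the enumerator names mirror dir.c. */
enum path_treatment {
	path_none = 0,
	path_recurse,
	path_excluded,
	path_untracked
};

/* Hypothetical summary of facts each caller already computes. */
struct recurse_inputs {
	enum path_treatment state;   /* result of treat_path() */
	bool is_dir;                 /* get_dtype(...) == DT_DIR */
	bool show_ignored_too;       /* dir->flags & DIR_SHOW_IGNORED_TOO */
	bool leading_pathspec_match; /* do_match_pathspec(...) ==
	                                MATCHED_RECURSIVELY_LEADING_PATHSPEC */
};

/*
 * One predicate for "should traversal descend into this entry?".
 * If both traversal entry points used a helper like this, the two
 * copies of the condition could not drift apart.
 */
static bool should_recurse(const struct recurse_inputs *in)
{
	if (in->state == path_recurse)
		return true;
	return in->state == path_untracked && in->is_dir &&
	       (in->show_ignored_too || in->leading_pathspec_match);
}
```

The patch itself keeps the condition inline in both functions; a shared helper along these lines is only one possible way to keep them in sync.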

Alternatively, we could try to change treat_path to return path_recurse
for an untracked directory under the given special circumstances that
this logic checks for, but a simple switch results in many test failures
such as 'git clean -d' not wiping out untracked but empty directories.
To work around that, we'd need the caller of treat_path to check for
path_recurse and sometimes special case it into path_untracked.  In
other words, we'd still have extra logic in both places.

Needing to duplicate logic like this means it is guaranteed someone will
eventually need to make further changes and forget to update both
locations.  It is tempting to just nuke the leading_directory special
casing to avoid such bugs and simplify the code, but unpack_trees'
verify_clean_subdirectory() also calls read_directory() and does so with
a non-empty leading path, so I'm hesitant to try to restructure further.
Add obnoxious warnings to treat_leading_path() and
read_directory_recursive() to try to warn people of such problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                                         | 30 +++++++++++++++++++
 ...common-prefixes-and-directory-traversal.sh |  2 +-
 t/t7061-wtstatus-ignore.sh                    |  2 +-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index a42cc2aa8c..357f9593c4 100644
--- a/dir.c
+++ b/dir.c
@@ -1990,6 +1990,15 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct untracked_cache_dir *untracked, int check_only,
 	int stop_at_first_file, const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in treat_leading_path().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
 	struct strbuf path = STRBUF_INIT;
@@ -2101,6 +2110,15 @@ static int treat_leading_path(struct dir_struct *dir,
 			      const char *path, int len,
 			      const struct pathspec *pathspec)
 {
+	/*
+	 * WARNING WARNING WARNING:
+	 *
+	 * Any updates to the traversal logic here may need corresponding
+	 * updates in read_directory_recursive().  See the commit message for the
+	 * commit adding this warning as well as the commit preceding it
+	 * for details.
+	 */
+
 	struct strbuf sb = STRBUF_INIT;
 	int prevlen, baselen;
 	const char *cp;
@@ -2166,6 +2184,18 @@ static int treat_leading_path(struct dir_struct *dir,
 		de->d_name[baselen-prevlen] = '\0';
 		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
 				    pathspec);
+		if (state == path_untracked &&
+		    get_dtype(cdir.de, istate, sb.buf, sb.len) == DT_DIR &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
+		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
+				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
+			add_path_to_appropriate_result_list(dir, NULL, &cdir,
+							    istate,
+							    &sb, baselen,
+							    pathspec, state);
+			state = path_recurse;
+		}
+
 		if (state != path_recurse)
 			break; /* do not recurse into it */
 		if (len <= baselen)
diff --git a/t/t3011-common-prefixes-and-directory-traversal.sh b/t/t3011-common-prefixes-and-directory-traversal.sh
index 098fddc75b..3da5b2b6e7 100755
--- a/t/t3011-common-prefixes-and-directory-traversal.sh
+++ b/t/t3011-common-prefixes-and-directory-traversal.sh
@@ -195,7 +195,7 @@ test_expect_success 'git ls-files -o consistent between one or two dirs' '
 
 # ls-files doesn't have a way to request showing both untracked and ignored
 # files at the same time, so use `git status --ignored`
-test_expect_failure 'git status --ignored shows same files under dir with or without pathspec' '
+test_expect_success 'git status --ignored shows same files under dir with or without pathspec' '
 	cat <<-EOF >expect &&
 	?? an_untracked_dir/
 	!! an_untracked_dir/ignored
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 84366050da..e4cf5484f9 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -47,7 +47,7 @@ cat >expected <<\EOF
 !! untracked/ignored
 EOF
 
-test_expect_failure 'status of untracked directory with --ignored works with or without prefix' '
+test_expect_success 'status of untracked directory with --ignored works with or without prefix' '
 	git status --porcelain --ignored >tmp &&
 	grep untracked/ tmp >actual &&
 	rm tmp &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v5 8/8] dir: consolidate similar code in treat_directory()
  2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
                           ` (6 preceding siblings ...)
  2019-12-19 21:28         ` [PATCH v5 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
@ 2019-12-19 21:28         ` Elijah Newren via GitGitGadget
  7 siblings, 0 replies; 69+ messages in thread
From: Elijah Newren via GitGitGadget @ 2019-12-19 21:28 UTC (permalink / raw)
  To: git; +Cc: blees, gitster, kyle, sxlijin, Junio C Hamano, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Both the DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS cases were checking for
whether a path was actually a nonbare repository.  That code could be
shared, with just the result of how to act differing between the two
cases.
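A small model of the new control flow in the `index_nonexistent` case may help. This is a sketch under assumptions: the flag bit values, `repo_probe_fn`, and `classify_nested_repo()` are hypothetical, and the probe callback stands in for `is_nonbare_repository_dir()`. It shows the repository check running at most once, and only when one of the two flags needs its answer.

```c
#include <stdbool.h>

enum path_treatment { path_none = 0, path_recurse, path_excluded, path_untracked };

/* Hypothetical bit values for illustration; dir.h has its own. */
enum {
	DIR_SKIP_NESTED_GIT = 1 << 0,
	DIR_NO_GITLINKS     = 1 << 1
};

/* Stand-in for is_nonbare_repository_dir(). */
typedef bool (*repo_probe_fn)(const char *dirname);

/*
 * Probe at most once, and only if either flag needs the answer.
 * Returns true and sets *out when a nested repository decides the
 * treatment; returns false when the caller should go on with its
 * remaining directory checks.
 */
static bool classify_nested_repo(unsigned flags, const char *dirname,
				 bool exclude, repo_probe_fn probe,
				 enum path_treatment *out)
{
	bool nested_repo = false;

	if ((flags & DIR_SKIP_NESTED_GIT) || !(flags & DIR_NO_GITLINKS))
		nested_repo = probe(dirname);
	if (!nested_repo)
		return false;

	/* Skipping wins; otherwise treat the repo as a single path. */
	*out = (flags & DIR_SKIP_NESTED_GIT) ? path_none :
	       (exclude ? path_excluded : path_untracked);
	return true;
}
```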

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/dir.c b/dir.c
index 357f9593c4..e1b74f6478 100644
--- a/dir.c
+++ b/dir.c
@@ -1461,6 +1461,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int exclude,
 	const struct pathspec *pathspec)
 {
+	int nested_repo = 0;
+
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
 	case index_directory:
@@ -1470,15 +1472,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
-		if (dir->flags & DIR_SKIP_NESTED_GIT) {
-			int nested_repo;
+		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addstr(&sb, dirname);
 			nested_repo = is_nonbare_repository_dir(&sb);
 			strbuf_release(&sb);
-			if (nested_repo)
-				return path_none;
 		}
+		if (nested_repo)
+			return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
+				(exclude ? path_excluded : path_untracked));
 
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
@@ -1506,13 +1509,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 			return path_none;
 		}
-		if (!(dir->flags & DIR_NO_GITLINKS)) {
-			struct strbuf sb = STRBUF_INIT;
-			strbuf_addstr(&sb, dirname);
-			if (is_nonbare_repository_dir(&sb))
-				return exclude ? path_excluded : path_untracked;
-			strbuf_release(&sb);
-		}
 		return path_recurse;
 	}
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-19 20:23           ` Elijah Newren
@ 2019-12-19 22:24             ` Jeff King
  2019-12-20 17:00               ` Elijah Newren
  2019-12-20 18:01               ` Junio C Hamano
  0 siblings, 2 replies; 69+ messages in thread
From: Jeff King @ 2019-12-19 22:24 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

On Thu, Dec 19, 2019 at 12:23:29PM -0800, Elijah Newren wrote:

> > >       while (len && path[len - 1] == '/')
> > >               len--;
> > >       if (!len)
> > >               return 1;
> > > +
> > > +     de = xcalloc(1, sizeof(struct dirent)+len+1);
> >
> > That "+len+1" may deserve a comment?
> 
> Good point, I'll add one and send a re-roll.

Please use st_add3() while you are at it.

I'd also usually suggest FLEX_ALLOC_MEM() for even more simplicity, but
it looks like filling the string is handled separately (and done many
times).

I have to wonder, though, if it wouldn't be simpler to move away from
"struct dirent" here (and it looks like Junio suggested the same earlier
in the thread). I don't know this code very well, but it looks
like it could easily get by passing around a name pointer and a dtype
through the cached_dir. The patch below seems like it's not too bad a
cleanup, but possibly the names could be better.

---
 dir.c | 48 ++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/dir.c b/dir.c
index 43e2f47f66..e1cba688f3 100644
--- a/dir.c
+++ b/dir.c
@@ -41,7 +41,8 @@ struct cached_dir {
 	int nr_files;
 	int nr_dirs;
 
-	struct dirent *de;
+	const char *d_name;
+	int d_type;
 	const char *file;
 	struct untracked_cache_dir *ucd;
 };
@@ -50,8 +51,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *path, int len,
 	struct untracked_cache_dir *untracked,
 	int check_only, int stop_at_first_file, const struct pathspec *pathspec);
-static int get_dtype(struct dirent *de, struct index_state *istate,
-		     const char *path, int len);
+static int resolve_dtype(int dtype, struct index_state *istate,
+			 const char *path, int len);
 
 int count_slashes(const char *s)
 {
@@ -1050,8 +1051,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 		int prefix = pattern->nowildcardlen;
 
 		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
-			if (*dtype == DT_UNKNOWN)
-				*dtype = get_dtype(NULL, istate, pathname, pathlen);
+			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
 			if (*dtype != DT_DIR)
 				continue;
 		}
@@ -1639,10 +1639,9 @@ static int get_index_dtype(struct index_state *istate,
 	return DT_UNKNOWN;
 }
 
-static int get_dtype(struct dirent *de, struct index_state *istate,
-		     const char *path, int len)
+static int resolve_dtype(int dtype, struct index_state *istate,
+			 const char *path, int len)
 {
-	int dtype = de ? DTYPE(de) : DT_UNKNOWN;
 	struct stat st;
 
 	if (dtype != DT_UNKNOWN)
@@ -1667,14 +1666,13 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 					  struct strbuf *path,
 					  int baselen,
 					  const struct pathspec *pathspec,
-					  int dtype, struct dirent *de)
+					  int dtype)
 {
 	int exclude;
 	int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case);
 	enum path_treatment path_treatment;
 
-	if (dtype == DT_UNKNOWN)
-		dtype = get_dtype(de, istate, path->buf, path->len);
+	dtype = resolve_dtype(dtype, istate, path->buf, path->len);
 
 	/* Always exclude indexed files */
 	if (dtype != DT_DIR && has_path_in_index)
@@ -1782,21 +1780,18 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      int baselen,
 				      const struct pathspec *pathspec)
 {
-	int dtype;
-	struct dirent *de = cdir->de;
-
-	if (!de)
+	if (!cdir->d_name)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
 				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
-	strbuf_addstr(path, de->d_name);
+	strbuf_addstr(path, cdir->d_name);
 	if (simplify_away(path->buf, path->len, pathspec))
 		return path_none;
 
-	dtype = DTYPE(de);
-	return treat_one_path(dir, untracked, istate, path, baselen, pathspec, dtype, de);
+	return treat_one_path(dir, untracked, istate, path, baselen, pathspec,
+			      cdir->d_type);
 }
 
 static void add_untracked(struct untracked_cache_dir *dir, const char *name)
@@ -1884,10 +1879,17 @@ static int open_cached_dir(struct cached_dir *cdir,
 
 static int read_cached_dir(struct cached_dir *cdir)
 {
+	struct dirent *de;
+
 	if (cdir->fdir) {
-		cdir->de = readdir(cdir->fdir);
-		if (!cdir->de)
+		de = readdir(cdir->fdir);
+		if (!de) {
+			cdir->d_name = NULL;
+			cdir->d_type = DT_UNKNOWN;
 			return -1;
+		}
+		cdir->d_name = de->d_name;
+		cdir->d_type = DTYPE(de);
 		return 0;
 	}
 	while (cdir->nr_dirs < cdir->untracked->dirs_nr) {
@@ -1970,7 +1972,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* recurse into subdir if instructed by treat_path */
 		if ((state == path_recurse) ||
 			((state == path_untracked) &&
-			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
+			 (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) &&
 			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
 			  (pathspec &&
 			   do_match_pathspec(istate, pathspec, path.buf, path.len,
@@ -2103,7 +2105,7 @@ static int treat_leading_path(struct dir_struct *dir,
 		if (simplify_away(sb.buf, sb.len, pathspec))
 			break;
 		if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
-				   DT_DIR, NULL) == path_none)
+				   DT_DIR) == path_none)
 			break; /* do not recurse into it */
 		if (len <= baselen) {
 			rc = 1;

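For readers unfamiliar with the helper, the following is a rough, standalone approximation of what `resolve_dtype()` does in the patch above: keep a known `d_type`, and only ask the filesystem when it is `DT_UNKNOWN`. It is a sketch, not the dir.c code. `resolve_dtype_sketch()` is a hypothetical name, and the index lookup that dir.c performs first (via `get_index_dtype()`, visible in the diff context) is omitted.

```c
#define _DEFAULT_SOURCE /* expose DT_* and lstat() on glibc */
#include <dirent.h>
#include <sys/stat.h>

#ifndef DT_UNKNOWN /* fallback for platforms without d_type */
#define DT_UNKNOWN 0
#define DT_DIR     1
#define DT_REG     2
#define DT_LNK     3
#endif

/*
 * Simplified resolve_dtype(): trust a known type, otherwise lstat().
 * The index lookup done by the real function is intentionally left out.
 */
static int resolve_dtype_sketch(int dtype, const char *path)
{
	struct stat st;

	if (dtype != DT_UNKNOWN)
		return dtype;          /* readdir() already told us */
	if (lstat(path, &st))
		return DT_UNKNOWN;     /* cannot tell; leave it unknown */
	if (S_ISREG(st.st_mode))
		return DT_REG;
	if (S_ISDIR(st.st_mode))
		return DT_DIR;
	if (S_ISLNK(st.st_mode))
		return DT_LNK;
	return DT_UNKNOWN;
}
```

Passing the type by value, as the patch does, lets callers that never had a `struct dirent` (such as treat_leading_path()) supply `DT_DIR` directly.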
^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-19 22:24             ` Jeff King
@ 2019-12-20 17:00               ` Elijah Newren
  2019-12-20 21:14                 ` Jeff King
  2019-12-20 18:01               ` Junio C Hamano
  1 sibling, 1 reply; 69+ messages in thread
From: Elijah Newren @ 2019-12-20 17:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

Hi Peff,

On Thu, Dec 19, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Dec 19, 2019 at 12:23:29PM -0800, Elijah Newren wrote:
>
> > > >       while (len && path[len - 1] == '/')
> > > >               len--;
> > > >       if (!len)
> > > >               return 1;
> > > > +
> > > > +     de = xcalloc(1, sizeof(struct dirent)+len+1);
> > >
> > > That "+len+1" may deserve a comment?
> >
> > Good point, I'll add one and send a re-roll.
>
> Please use st_add3() while you are at it.

I would, but Junio already took the patches and applied them to next
already.  (I am curious, though, why we're worried about overflow in a
context like this?)

> I'd also usually suggest FLEX_ALLOC_MEM() for even more simplicity, but
> it looks like filling the string is handled separately (and done many
> times).

Yes, the string is handled separately; I don't manufacture a dirent
per leading directory component of the common prefix, but just
allocate one and re-use it.
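The allocate-once, reuse-per-component approach can be illustrated with a simplified sketch. It assumes a hypothetical `struct fake_dirent` with a flexible array member rather than the platform's real `struct dirent`, and a hypothetical `for_each_leading_component()` walker. The point is only that sizing the name buffer for the whole prefix (`len + 1` bytes) leaves room for every component.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a manufactured struct dirent. */
struct fake_dirent {
	unsigned char d_type;
	char d_name[];          /* flexible array member */
};

/*
 * Walk the '/'-separated components of path[0..len), reusing a single
 * entry whose name buffer can hold the entire prefix plus a NUL.
 * Calls visit() once per non-empty component and returns how many
 * components were visited, or -1 if the size would overflow or the
 * allocation fails.
 */
static int for_each_leading_component(const char *path, size_t len,
				      void (*visit)(const char *name, void *ctx),
				      void *ctx)
{
	struct fake_dirent *de;
	size_t start = 0;
	int count = 0;

	if (len > SIZE_MAX - sizeof(*de) - 1)
		return -1;
	de = calloc(1, sizeof(*de) + len + 1);   /* allocated once */
	if (!de)
		return -1;

	while (start < len) {
		const char *slash = memchr(path + start, '/', len - start);
		size_t end = slash ? (size_t)(slash - path) : len;

		if (end > start) {              /* skip empty "//" parts */
			memcpy(de->d_name, path + start, end - start);
			de->d_name[end - start] = '\0';
			visit(de->d_name, ctx);
			count++;
		}
		start = end + 1;
	}
	free(de);
	return count;
}
```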

> I have to wonder, though, if it wouldn't be simpler to move away from
> "struct dirent" here (and it looks like Junio suggested the same earlier
> in the thread). I don't know this code very well, but it looks
> like it could easily get by passing around a name pointer and a dtype
> through the cached_dir. The patch below seems like it's not too bad a
> cleanup, but possibly the names could be better.

This was mentioned twice upthread, first by me then by Junio (and I'll
include my final response too):

>>> I need to manufacture a dirent myself; short of that, the most
>>> likely alternative is to drop patches 2 & 5-8 of this series and
>>> throw my hands in the air and give up.
>>> ...
>>> It'd be an awful lot of work to rip [dirent] out...unless someone
>>> else has some bright ideas about something clever we can do, then I
>>> think this problem blows up in complexity to a level where I don't
>>> think it's worth addressing.
>>> ...
>>> Any bright ideas about what to do here?
>>
>> Restructuring the code so that we do not use "struct dirent" in the
>> first place, even in the original code that used only those obtained
>> from readdir(), perhaps?
>
> Okay, I'll submit a new series dropping most of the patches.

It's possible I vastly overestimated how much work ripping out the
dirent would be; I mean I've mis-estimated absolutely everything in
dir.c and assumed each "little" thing would all be a small amount of
work, so maybe I'm just swinging the pendulum too far the other way.
But, although I think this alternative would be the cleanest, I saw a
couple things that looked like this was going to turn into a huge can
of worms when I started to peek at what it all touched.  I'd be happy
for someone to take this route, but it won't be me (see also
https://lore.kernel.org/git/CABPp-BEkX9cH1=r3dJ4WLzcJKVcF-KpGUkshL34MMp3Xhhhpuw@mail.gmail.com/).

Elijah

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-19 22:24             ` Jeff King
  2019-12-20 17:00               ` Elijah Newren
@ 2019-12-20 18:01               ` Junio C Hamano
  2019-12-20 21:15                 ` Jeff King
  1 sibling, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2019-12-20 18:01 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

Jeff King <peff@peff.net> writes:

> Please use st_add3() while you are at it.
>
> I'd also usually suggest FLEX_ALLOC_MEM() for even more simplicity, but
> it looks like filling the string is handled separately (and done many
> times).
>
> I have to wonder, though, if it wouldn't be simpler to move away from
> "struct dirent" here (and it looks like Junio suggested the same earlier
> in the thread). I don't know this code very well, but it looks
> like it could easily get by passing around a name pointer and a dtype
> through the cached_dir. The patch below seems like it's not too bad a
> cleanup, but possibly the names could be better.

It does look like a good clean-up.

In the meantime, here is to apologize for merging the patch a bit
too early to 'next'.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Fri, 20 Dec 2019 09:55:53 -0800
Subject: [PATCH] dir.c: use st_add3() for allocation size

When preparing a manufactured dirent instance, we add a length of
path to the size of struct to decide how many bytes to allocate.
Make sure this addition does not wrap around and cause us to
underallocate.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index e1b74f6478..113170aeb9 100644
--- a/dir.c
+++ b/dir.c
@@ -2154,7 +2154,7 @@ static int treat_leading_path(struct dir_struct *dir,
 	 * For either case, padding with len+1 bytes at the end will ensure
 	 * sufficient storage space.
 	 */
-	de = xcalloc(1, sizeof(struct dirent)+len+1);
+	de = xcalloc(1, st_add3(sizeof(struct dirent), len, 1));
 	memset(&cdir, 0, sizeof(cdir));
 	cdir.de = de;
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
-- 
2.24.1-769-g187e15c71d




^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-20 17:00               ` Elijah Newren
@ 2019-12-20 21:14                 ` Jeff King
  0 siblings, 0 replies; 69+ messages in thread
From: Jeff King @ 2019-12-20 21:14 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

On Fri, Dec 20, 2019 at 09:00:40AM -0800, Elijah Newren wrote:

> > > > > +     de = xcalloc(1, sizeof(struct dirent)+len+1);
> > > >
> > > > That "+len+1" may deserve a comment?
> > >
> > > Good point, I'll add one and send a re-roll.
> >
> > Please use st_add3() while you are at it.
> 
> I would, but Junio already took the patches and applied them to next
> already.  (I am curious, though, why we're worried about overflow in a
> context like this?)

If len is large enough to cause integer overflow when computing the
total size, then we'd allocate a too-small buffer (and then later
overflow the buffer when writing into it).
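The concern is the classic unchecked `a + b` in an allocation size. The following sketch shows an overflow-checked three-way addition in the spirit of `st_add3()`. `checked_add3()` is a hypothetical name, and unlike Git's helper, which dies on overflow, it returns `false` so the caller can handle the failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute a + b + c in size_t, refusing to wrap around.
 * Returns true and stores the sum in *out on success; returns false
 * (leaving *out untouched) if any partial sum would exceed SIZE_MAX.
 */
static bool checked_add3(size_t a, size_t b, size_t c, size_t *out)
{
	if (b > SIZE_MAX - a)
		return false;
	a += b;
	if (c > SIZE_MAX - a)
		return false;
	*out = a + c;
	return true;
}
```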

I'm not sure how possible that is here. On 32-bit systems, overflowing
size_t only needs 4GB. You're not likely to have a 4GB path on a
filesystem, but malicious folks could shove them into a tree. I'm not
sure if this code could be triggered for anything that doesn't actually
exist on the filesystem, though.

You're also not likely to actually manage to store a 4GB string in
"path" on a 32-bit system in the first place. But "len" is actually an
"int". On a 64-bit system it would be easy to do, though, and int is
still 32 bits there. But because the result of sizeof() is a size_t, I
think the int will be promoted as well during the addition (and assuming
it's not negative, will be too small to overflow). (Also, the "len"
parameter probably should be a size_t in the first place, but that's
not new).
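The conversion argument can be checked with a few lines of C. This sketch assumes a hypothetical `struct entry_header` in place of `struct dirent`, and that `size_t` is at least as wide as `int`, as on common platforms. Because `sizeof` yields a `size_t`, `len` is converted to `size_t` before the addition; a negative `len` becomes a huge value and the sum wraps.

```c
#include <stddef.h>

/* Hypothetical struct standing in for struct dirent. */
struct entry_header {
	long ino;
	char name[256];
};

/*
 * sizeof(...) is size_t, so the usual arithmetic conversions turn len
 * into a size_t and the whole sum is computed in size_t.  Order matters:
 * writing len + 1 + sizeof(...) would add len + 1 in int first, which
 * is undefined behavior when len == INT_MAX.
 */
static size_t alloc_size(int len)
{
	return sizeof(struct entry_header) + len + 1;
}
```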

So I don't think it's exploitable, but as you can see there's a bit of
pondering to see that it's so. When I audit I usually look for something
like /x[mc]alloc.*[+*] / to find potential problem spots. Even if we're
pretty sure a particular site isn't vulnerable, marking it with st_add()
errs on the safe side, and makes those audits easier.
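The audit pattern quoted above can be applied programmatically with POSIX `regcomp()`/`regexec()`. This is a hypothetical helper, and it assumes the space before the closing slash is part of the pattern. With that space, the match relies on Git's usual spaced `a + b` style, so an unspaced expression such as `sizeof(struct dirent)+len+1` would not be flagged.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if line matches the ERE "x[mc]alloc.*[+*] ", i.e. an
 * xmalloc()/xcalloc() call followed later on the line by '+' or '*'
 * and then a space.  Returns false on no match or if the pattern
 * fails to compile.
 */
static bool looks_like_unchecked_alloc(const char *line)
{
	regex_t re;
	bool hit;

	if (regcomp(&re, "x[mc]alloc.*[+*] ", REG_EXTENDED | REG_NOSUB))
		return false;
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```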

> It's possible I vastly overestimated how much work ripping out the
> dirent would be; I mean I've mis-estimated absolutely everything in
> dir.c and assumed each "little" thing would all be a small amount of
> work, so maybe I'm just swinging the pendulum too far the other way.
> But, although I think this alternative would be the cleanest, I saw a
> couple things that looked like this was going to turn into a huge can
> of worms when I started to peek at what it all touched.  I'd be happy
> for someone to take this route, but it won't be me (see also
> https://lore.kernel.org/git/CABPp-BEkX9cH1=r3dJ4WLzcJKVcF-KpGUkshL34MMp3Xhhhpuw@mail.gmail.com/).

OK. I certainly don't insist on this direction. I just saw the
portability issues and wondered how bad it would be to do so. Hence the
patch I sent, which I _think_ is correct, but I really don't know the
dir.c code very well. And I'm sure it will not surprise you that I have
generally been confused and/or frightened by it when I do look at it. :)

-Peff

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 6/8] dir: fix checks on common prefix directory
  2019-12-20 18:01               ` Junio C Hamano
@ 2019-12-20 21:15                 ` Jeff King
  0 siblings, 0 replies; 69+ messages in thread
From: Jeff King @ 2019-12-20 21:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
	blees, Kyle Meyer, Samuel Lijin

On Fri, Dec 20, 2019 at 10:01:21AM -0800, Junio C Hamano wrote:

> In the meantime, here is to apologize for merging the patch a bit
> too early to 'next'.
> 
> -- >8 --
> From: Junio C Hamano <gitster@pobox.com>
> Date: Fri, 20 Dec 2019 09:55:53 -0800
> Subject: [PATCH] dir.c: use st_add3() for allocation size

Thanks, I think this is an easy improvement worth doing (I laid out more
in my response to Elijah, but: I don't think this is exploitable, but
I'd rather err on the side of caution and ease of auditing).

-Peff

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 0/3] Directory traversal bugs
  2019-12-17 18:24       ` Junio C Hamano
@ 2019-12-21 22:05         ` Johannes Schindelin
  0 siblings, 0 replies; 69+ messages in thread
From: Johannes Schindelin @ 2019-12-21 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, blees, kyle, sxlijin

Hi Junio,

On Tue, 17 Dec 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > As I said elsewhere, if Git for Windows' FSCache hack is the only thing
> > that is broken by this patch series, in light of the bugs that it _does_
> > fix I would rather adjust the FSCache patches to accommodate v2.
>
> With "FSCache hack", do you refer to the "d_name is a pointer to
> elsewhere" thing?

Yes.

> If so, I too very much appreciate the direction you are suggesting.
> Seeing that these three patches essentially are the same as three (1/8,
> 3/8 and 4/8) from the v2, I'd keep all the 8 patches from v2 in my tree
> for now.
>
> Thanks, both.

Thank you,
Dscho

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2019-12-21 22:06 UTC | newest]

Thread overview: 69+ messages
2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-09 21:06   ` Denton Liu
2019-12-09 20:47 ` [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-09 21:32   ` Denton Liu
2019-12-09 21:51     ` Elijah Newren
2019-12-09 22:09     ` Eric Sunshine
2019-12-09 20:47 ` [PATCH 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-15 10:29     ` Johannes Schindelin
2019-12-16 13:51       ` Elijah Newren
2019-12-16 16:00         ` Elijah Newren
2019-12-16 18:13           ` Junio C Hamano
2019-12-16 21:08             ` Elijah Newren
2019-12-16 21:25               ` Junio C Hamano
2019-12-16 22:39                 ` Elijah Newren
2019-12-17  0:04           ` Johannes Schindelin
2019-12-17  0:14             ` Junio C Hamano
2019-12-17 11:08               ` Johannes Schindelin
2019-12-17 17:33                 ` Junio C Hamano
2019-12-17 19:32                   ` Johannes Schindelin
2019-12-17  5:26             ` Elijah Newren
2019-12-17 11:15               ` Johannes Schindelin
2019-12-17 16:58                 ` Elijah Newren
2019-12-10 20:00   ` [PATCH v2 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-17  8:33     ` [PATCH v3 1/3] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-17  8:33     ` [PATCH v3 2/3] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-17  8:33     ` [PATCH v3 3/3] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-17 11:18     ` [PATCH v3 0/3] Directory traversal bugs Johannes Schindelin
2019-12-17 18:24       ` Junio C Hamano
2019-12-21 22:05         ` Johannes Schindelin
2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-18 21:29         ` Junio C Hamano
2019-12-19 20:23           ` Elijah Newren
2019-12-19 22:24             ` Jeff King
2019-12-20 17:00               ` Elijah Newren
2019-12-20 21:14                 ` Jeff King
2019-12-20 18:01               ` Junio C Hamano
2019-12-20 21:15                 ` Jeff King
2019-12-18 19:29       ` [PATCH v4 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
