All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Eric Sunshine <sunshine@sunshineco.com>,
	Elijah Newren <newren@gmail.com>,
	Derrick Stolee <stolee@gmail.com>, Jeff King <peff@peff.net>,
	Philip Oakley <philipoakley@iee.email>,
	Jeff Hostetler <jeffhost@microsoft.com>,
	Josh Steadmon <steadmon@google.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v3 0/8] Directory traversal fixes
Date: Sat, 08 May 2021 19:58:56 +0000	[thread overview]
Message-ID: <pull.1020.v3.git.git.1620503945.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1020.v2.git.git.1620432500.gitgitgadget@gmail.com>

This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should (one of which was originally reported on this
list at [1]). And it includes a few cleanups

Changes since v2:

 * Move the RFC patches to the front
 * Deletes all the ugly test code that stole reviewer attention away from
   the rest of the series. :-) The RFC patches being first allow the test to
   be dramatically simplified and rewritten.
 * Included cleanups suggested by Phillip Oakley and Eric Sunshine (the
   cleanups suggested by others are obsolete with the test rewrite).

Patches 1-3 are RFC because

 * (1) I'm not that familiar with trace1 & trace2; I've only used trace2 for
   region_enter() and region_leave() calls before. And I'm unsure if
   removing trace1 counts as a backward compatibility issue or not, though
   the trace2 documentation claims it's meant to replace trace1.
 * (2) The ls-files -i handling to print an error instead of operating as
   before might be considered a backward incompatible change. I want to hear
   others' opinions on that.

Also, if anyone has any ideas about a better place to put the "Some
sidenotes" from the sixth commit message rather than keeping them in a
random commit message, that might be helpful too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  [RFC] dir: convert trace calls to trace2 equivalents
  [RFC] dir: report number of visited directories and paths with trace2
  [RFC] ls-files: error out on -i unless -o or -c are specified
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles

 builtin/ls-files.c                 |   3 +
 dir.c                              | 103 +++++++++------
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 194 ++++++++++++++++-------------
 t/t7300-clean.sh                   |  41 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 238 insertions(+), 126 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v3
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v2:

 7:  3a2394506a53 = 1:  9f1c0d78d739 [RFC] dir: convert trace calls to trace2 equivalents
 8:  fba4d65b78c7 ! 2:  8b511f228af8 [RFC] dir: reported number of visited directories and paths with trace2
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] dir: reported number of visited directories and paths with trace2
     +    [RFC] dir: report number of visited directories and paths with trace2
      
     -    Previously, tests that wanted to verify that we don't traverse into a
     -    deep directory hierarchy that is ignored had no easy way to verify and
     -    enforce that behavior.  Record information about the number of
     -    directories and paths we inspect while traversing the directory
     -    hierarchy in read_directory(), and when trace2 is enabled, print these
     -    statistics.
     -
     -    Make use of these statistics in t7300 to simplify (and vastly improve
     -    the performance of) the "avoid traversing into ignored directories"
     -    test.
     +    Provide more statistics in trace2 output that include the number of
     +    directories and total paths visited by the directory traversal logic.
     +    Subsequent patches will take advantage of this to ensure we do not
     +    unnecessarily traverse into ignored directories.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ t/t7063-status-untracked-cache.sh: get_relevant_traces() {
       	    | cut -d "|" -f 9 \
       	    >$OUTPUT_FILE
       }
     -
     - ## t/t7300-clean.sh ##
     -@@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
     - '
     - 
     - test_expect_success 'avoid traversing into ignored directories' '
     --	test_when_finished rm -f output error &&
     -+	test_when_finished rm -f output error trace.* &&
     - 	test_create_repo avoid-traversing-deep-hierarchy &&
     - 	(
     - 		cd avoid-traversing-deep-hierarchy &&
     - 
     --		>directory-random-file.txt &&
     --		# Put this file under directory400/directory399/.../directory1/
     --		depth=400 &&
     --		for x in $(test_seq 1 $depth); do
     --			mkdir "tmpdirectory$x" &&
     --			mv directory* "tmpdirectory$x" &&
     --			mv "tmpdirectory$x" "directory$x"
     --		done &&
     --
     --		git clean -ffdxn -e directory$depth >../output 2>../error &&
     --
     --		test_must_be_empty ../output &&
     --		# We especially do not want things like
     --		#   "warning: could not open directory "
     --		# appearing in the error output.  It is true that directories
     --		# that are too long cannot be opened, but we should not be
     --		# recursing into those directories anyway since the very first
     --		# level is ignored.
     --		test_must_be_empty ../error &&
     --
     --		# alpine-linux-musl fails to "rm -rf" a directory with such
     --		# a deeply nested hierarchy.  Help it out by deleting the
     --		# leading directories ourselves.  Super slow, but, what else
     --		# can we do?  Without this, we will hit a
     --		#     error: Tests passed but test cleanup failed; aborting
     --		# so do this ugly manual cleanup...
     --		while test ! -f directory-random-file.txt; do
     --			name=$(ls -d directory*) &&
     --			mv $name/* . &&
     --			rmdir $name
     --		done
     -+		mkdir -p untracked/subdir/with/a &&
     -+		>untracked/subdir/with/a/random-file.txt &&
     -+
     -+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     -+		git clean -ffdxn -e untracked &&
     -+
     -+		grep data.*read_directo.*visited ../trace.output \
     -+			| cut -d "|" -f 9 >../trace.relevant &&
     -+		cat >../trace.expect <<-EOF &&
     -+		 directories-visited:1
     -+		 paths-visited:4
     -+		EOF
     -+		test_cmp ../trace.expect ../trace.relevant
     - 	)
     - '
     - 
 5:  3d8dd00ccd10 = 3:  44a1322c4402 [RFC] ls-files: error out on -i unless -o or -c are specified
 1:  a3bd253fa8e8 ! 4:  dc3d3f247141 t7300: add testcase showing unnecessary traversal into ignored directory
     @@ Metadata
       ## Commit message ##
          t7300: add testcase showing unnecessary traversal into ignored directory
      
     -    PNPM is apparently creating deeply nested (but ignored) directory
     -    structures; traversing them is costly performance-wise, unnecessary, and
     -    in some cases is even throwing warnings/errors because the paths are too
     -    long to handle on various platforms.  Add a testcase that demonstrates
     -    this problem.
     +    The PNPM package manager is apparently creating deeply nested (but
     +    ignored) directory structures; traversing them is costly
     +    performance-wise, unnecessary, and in some cases is even throwing
     +    warnings/errors because the paths are too long to handle on various
     +    platforms.  Add a testcase that checks for such unnecessary directory
     +    traversal.
      
     -    Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
     -    Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## t/t7300-clean.sh ##
     @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
       '
       
      +test_expect_failure 'avoid traversing into ignored directories' '
     -+	test_when_finished rm -f output error &&
     ++	test_when_finished rm -f output error trace.* &&
      +	test_create_repo avoid-traversing-deep-hierarchy &&
      +	(
      +		cd avoid-traversing-deep-hierarchy &&
      +
     -+		>directory-random-file.txt &&
     -+		# Put this file under directory400/directory399/.../directory1/
     -+		depth=400 &&
     -+		for x in $(test_seq 1 $depth); do
     -+			mkdir "tmpdirectory$x" &&
     -+			mv directory* "tmpdirectory$x" &&
     -+			mv "tmpdirectory$x" "directory$x"
     -+		done &&
     ++		mkdir -p untracked/subdir/with/a &&
     ++		>untracked/subdir/with/a/random-file.txt &&
      +
     -+		git clean -ffdxn -e directory$depth >../output 2>../error &&
     ++		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     ++		git clean -ffdxn -e untracked
     ++	) &&
      +
     -+		test_must_be_empty ../output &&
     -+		# We especially do not want things like
     -+		#   "warning: could not open directory "
     -+		# appearing in the error output.  It is true that directories
     -+		# that are too long cannot be opened, but we should not be
     -+		# recursing into those directories anyway since the very first
     -+		# level is ignored.
     -+		test_must_be_empty ../error &&
     -+
     -+		# alpine-linux-musl fails to "rm -rf" a directory with such
     -+		# a deeply nested hierarchy.  Help it out by deleting the
     -+		# leading directories ourselves.  Super slow, but, what else
     -+		# can we do?  Without this, we will hit a
     -+		#     error: Tests passed but test cleanup failed; aborting
     -+		# so do this ugly manual cleanup...
     -+		while test ! -f directory-random-file.txt; do
     -+			name=$(ls -d directory*) &&
     -+			mv $name/* . &&
     -+			rmdir $name
     -+		done
     -+	)
     ++	grep data.*read_directo.*visited trace.output \
     ++		| cut -d "|" -f 9 >trace.relevant &&
     ++	cat >trace.expect <<-EOF &&
     ++	 directories-visited:1
     ++	 paths-visited:4
     ++	EOF
     ++	test_cmp trace.expect trace.relevant
      +'
      +
       test_done
 2:  aa3a41e26eca ! 5:  73b03a1e8e05 t3001, t7300: add testcase showcasing missed directory traversal
     @@ t/t3001-ls-files-others-exclude.sh: EOF
      
       ## t/t7300-clean.sh ##
      @@ t/t7300-clean.sh: test_expect_failure 'avoid traversing into ignored directories' '
     - 	)
     + 	test_cmp trace.expect trace.relevant
       '
       
      +test_expect_failure 'traverse into directories that may have ignored entries' '
 3:  3c3f6111da13 ! 6:  66ffc7f02d08 dir: avoid unnecessary traversal into ignored directory
     @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
       
      -test_expect_failure 'avoid traversing into ignored directories' '
      +test_expect_success 'avoid traversing into ignored directories' '
     - 	test_when_finished rm -f output error &&
     + 	test_when_finished rm -f output error trace.* &&
       	test_create_repo avoid-traversing-deep-hierarchy &&
       	(
 4:  fad048339b81 ! 7:  acde436b220e dir: traverse into untracked directories if they may have ignored subfiles
     @@ t/t3001-ls-files-others-exclude.sh: EOF
      
       ## t/t7300-clean.sh ##
      @@ t/t7300-clean.sh: test_expect_success 'avoid traversing into ignored directories' '
     - 	)
     + 	test_cmp trace.expect trace.relevant
       '
       
      -test_expect_failure 'traverse into directories that may have ignored entries' '
 6:  1d825dfdc70b = 8:  57135c357774 dir: update stale description of treat_directory()

-- 
gitgitgadget

  parent reply	other threads:[~2021-05-08 19:59 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-07  4:27   ` Eric Sunshine
2021-05-07  5:00     ` Elijah Newren
2021-05-07  5:31       ` Eric Sunshine
2021-05-07  5:42         ` Elijah Newren
2021-05-07  5:56           ` Eric Sunshine
2021-05-07 23:05       ` Jeff King
2021-05-07 23:15         ` Eric Sunshine
2021-05-08  0:04         ` Elijah Newren
2021-05-08  0:10           ` Eric Sunshine
2021-05-08 17:20             ` Elijah Newren
2021-05-08 11:13   ` Philip Oakley
2021-05-08 17:20     ` Elijah Newren
2021-05-07  4:04 ` [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-07  4:04 ` [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-07  4:04 ` [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-07  4:05 ` [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee
2021-05-07 17:57   ` Elijah Newren
2021-05-07 16:27 ` [PATCH 0/5] Directory traversal fixes Derrick Stolee
2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-08 10:13     ` Junio C Hamano
2021-05-08 17:34       ` Elijah Newren
2021-05-08 10:19     ` Junio C Hamano
2021-05-08 17:41       ` Elijah Newren
2021-05-08  0:08   ` [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 6/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-08 19:58   ` Elijah Newren via GitGitGadget [this message]
2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-10  4:49       ` Junio C Hamano
2021-05-11 17:23         ` Elijah Newren
2021-05-11 16:17       ` Jeff Hostetler
2021-05-11 17:29         ` Elijah Newren
2021-05-08 19:58     ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-10  5:00       ` Junio C Hamano
2021-05-08 19:58     ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-10  5:09       ` Junio C Hamano
2021-05-11 17:40         ` Elijah Newren
2021-05-11 22:32           ` Junio C Hamano
2021-05-08 19:59     ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-10  5:28       ` Junio C Hamano
2021-05-11 17:45         ` Elijah Newren
2021-05-11 22:43           ` Junio C Hamano
2021-05-12  2:07             ` Elijah Newren
2021-05-12  3:17               ` Junio C Hamano
2021-05-08 19:59     ` [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-08 19:59     ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-10  5:48       ` Junio C Hamano
2021-05-11 17:57         ` Elijah Newren
2021-05-08 19:59     ` [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-08 19:59     ` [PATCH v3 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-11 19:06         ` Jeff Hostetler
2021-05-11 20:12           ` Elijah Newren
2021-05-11 23:12             ` Jeff Hostetler
2021-05-12  0:44               ` Elijah Newren
2021-05-12 12:26                 ` Jeff Hostetler
2021-05-12 15:24                   ` Elijah Newren
2021-05-11 18:34       ` [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-17 17:20           ` Derrick Stolee
2021-05-17 19:44             ` Junio C Hamano
2021-05-18  3:32               ` Elijah Newren
2021-05-19  1:44                 ` Junio C Hamano
2021-05-12 17:28         ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
2021-05-17 17:22           ` Derrick Stolee
2021-05-18  3:34             ` Elijah Newren
2021-05-17 17:23         ` [PATCH v5 0/9] Directory traversal fixes Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1020.v3.git.git.1620503945.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jeffhost@microsoft.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=philipoakley@iee.email \
    --cc=steadmon@google.com \
    --cc=stolee@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.