All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/10] tree name and depth limits
@ 2023-08-31  6:17 Jeff King
  2023-08-31  6:17 ` [PATCH 01/10] tree-walk: reduce stack size for recursive functions Jeff King
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:17 UTC (permalink / raw)
  To: git

There aren't currently any limits on the size of pathnames in tree
objects, nor of the depth of recursive trees. This can create two
problems:

  1. We process pathnames from trees as strings in lots of places, and
     we have a history of using "int" to access strings. I don't know of
     a specific problem area, but I have little doubt there is a spot
     where you can convince an "int" to overflow with a 2GB tree.

     In the long run it would be nice to fix all of that code to use
     size_t. And hopefully other audits to use st_add(), etc, for
     allocations mean that we'd avoid actual buffer overflows. But in
     the meantime, it sure would be nice to have another layer
     protecting us.

  2. Because we recurse for most tree algorithms, it's easy for a
     malicious tree to run you out of stack space and cause a segfault.
     By itself this isn't a security problem, but it's confusing for
     users. And if you happen to run a public hosting site and monitor
     your segfaults, it's awfully annoying to investigate each one to
     see if it's the first sign of a real attack, or if somebody is just
     throwing garbage at you.

So here are some patches I wrote for GitHub several years ago. They add
limits on the size of a single pathname component, as well as on the
depth we're willing to recurse when handling trees.

I kind of hate them, because I hate arbitrary limits. But I also think
they can do some good in practice, and the default limits are set such
that I doubt anybody will ever notice, let alone complain (they have
been in place on github.com for 4+ years, and I don't recall ever
getting any feedback from users who were not security researchers poking
at the system).

And most operating systems have arbitrary limits (that are much
smaller!) anyway. So if you plan to do exotic things like "run
git-checkout", you'd run into problems way before you hit either of
these.

So IMHO they'll do more good than harm.

The first three patches are cleanups / prep.

Patch 4 is the name limit.

Patches 5-10 implement the depth limits.

  [01/10]: tree-walk: reduce stack size for recursive functions
  [02/10]: tree-walk: drop MAX_TRAVERSE_TREES macro
  [03/10]: tree-walk: rename "error" variable
  [04/10]: fsck: detect very large tree pathnames
  [05/10]: add core.maxTreeDepth config
  [06/10]: traverse_trees(): respect max_allowed_tree_depth
  [07/10]: read_tree(): respect max_allowed_tree_depth
  [08/10]: list-objects: respect max_allowed_tree_depth
  [09/10]: tree-diff: respect max_allowed_tree_depth
  [10/10]: lower core.maxTreeDepth default to 2048

 Documentation/config/core.txt |  6 +++
 Documentation/fsck-msgids.txt |  7 +++
 config.c                      |  5 ++
 environment.c                 |  1 +
 environment.h                 |  1 +
 fsck.c                        | 24 ++++++++-
 fsck.h                        |  1 +
 list-objects.c                |  8 +++
 sparse-index.c                |  2 +-
 t/t1450-fsck.sh               | 10 ++++
 t/t6700-tree-depth.sh         | 93 +++++++++++++++++++++++++++++++++++
 tree-diff.c                   | 23 ++++++---
 tree-walk.c                   | 20 +++++---
 tree-walk.h                   |  2 -
 tree.c                        |  9 +++-
 tree.h                        |  1 +
 unpack-trees.c                |  9 +++-
 unpack-trees.h                |  2 +-
 wt-status.c                   |  2 +-
 19 files changed, 201 insertions(+), 25 deletions(-)
 create mode 100755 t/t6700-tree-depth.sh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 01/10] tree-walk: reduce stack size for recursive functions
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
@ 2023-08-31  6:17 ` Jeff King
  2023-08-31  6:19 ` [PATCH 02/10] tree-walk: drop MAX_TRAVERSE_TREES macro Jeff King
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:17 UTC (permalink / raw)
  To: git

The traverse_trees() and traverse_trees_recursive() functions call each
other recursively. In a deep tree, this can result in running out of
stack space and crashing.

There's obviously going to be some limit here based on available stack,
but the problem is exacerbated by a few large structs, many of which we
over-allocate. For example, in traverse_trees() we store a name_entry
and tree_desc_x per tree, both of which contain an object_id (which is
now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even
though many traversals will only look at 1 or 2.

Interestingly, we used to allocate these on the heap, prior to
8dd40c0472 (traverse_trees(): use stack array for name entries,
2020-01-30). That commit was trying to simplify away allocation size
computations, and naively assumed that the sizes were small enough not
to matter. And they don't in normal cases, but on my stock Debian system
I see a crash running "git archive" on a tree with ~3600 entries.
That's deep enough we wouldn't see it in practice, but probably shallow
enough that we'd prefer not to make it a hard limit. Especially because
other systems may have even smaller stacks.

We can replace these stack variables with a few malloc invocations. This
reduces the stack sizes for the two functions from 1128 and 752 bytes,
respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on
my machine (the improvement isn't in linear proportion because my
numbers don't count the size of parameters and other function overhead).

The possible downsides are:

  1. We now have to remember to free(). But both functions have an easy
     single exit (and already had to clean up other bits anyway).

  2. The extra malloc()/free() overhead might be measurable. I tested
     this by setting up a 3000-depth tree with a single blob and running
     "git archive" on it. After switching to the heap, it consistently
     runs 2-3% faster! Presumably this is because the 1K+ of wasted
     stack space penalized memory caches.

On a more real-world case like linux.git, the speed difference isn't
measurable at all, simply because most trees aren't that deep and
there's so much other work going on (like accessing the objects
themselves). So the improvement I saw should be taken as evidence that
we're not making anything worse, but isn't really that interesting on
its own. The main motivation here is that we're now less likely to run
out of stack space and crash.

Signed-off-by: Jeff King <peff@peff.net>
---
 tree-walk.c    | 10 ++++++----
 unpack-trees.c |  9 +++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 29ead71be1..ad49d55290 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -442,9 +442,9 @@ int traverse_trees(struct index_state *istate,
 		   struct traverse_info *info)
 {
 	int error = 0;
-	struct name_entry entry[MAX_TRAVERSE_TREES];
+	struct name_entry *entry;
 	int i;
-	struct tree_desc_x tx[ARRAY_SIZE(entry)];
+	struct tree_desc_x *tx;
 	struct strbuf base = STRBUF_INIT;
 	int interesting = 1;
 	char *traverse_path;
@@ -455,8 +455,8 @@ int traverse_trees(struct index_state *istate,
 	if (traverse_trees_cur_depth > traverse_trees_max_depth)
 		traverse_trees_max_depth = traverse_trees_cur_depth;
 
-	if (n >= ARRAY_SIZE(entry))
-		BUG("traverse_trees() called with too many trees (%d)", n);
+	ALLOC_ARRAY(entry, n);
+	ALLOC_ARRAY(tx, n);
 
 	for (i = 0; i < n; i++) {
 		tx[i].d = t[i];
@@ -551,6 +551,8 @@ int traverse_trees(struct index_state *istate,
 	}
 	for (i = 0; i < n; i++)
 		free_extended_entry(tx + i);
+	free(tx);
+	free(entry);
 	free(traverse_path);
 	info->traverse_path = NULL;
 	strbuf_release(&base);
diff --git a/unpack-trees.c b/unpack-trees.c
index 87517364dc..a203f9a3d7 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -864,8 +864,8 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	struct unpack_trees_options *o = info->data;
 	int i, ret, bottom;
 	int nr_buf = 0;
-	struct tree_desc t[MAX_UNPACK_TREES];
-	void *buf[MAX_UNPACK_TREES];
+	struct tree_desc *t;
+	void **buf;
 	struct traverse_info newinfo;
 	struct name_entry *p;
 	int nr_entries;
@@ -902,6 +902,9 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.pathlen = st_add3(newinfo.pathlen, tree_entry_len(p), 1);
 	newinfo.df_conflicts |= df_conflicts;
 
+	ALLOC_ARRAY(t, n);
+	ALLOC_ARRAY(buf, n);
+
 	/*
 	 * Fetch the tree from the ODB for each peer directory in the
 	 * n commits.
@@ -937,6 +940,8 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 
 	for (i = 0; i < nr_buf; i++)
 		free(buf[i]);
+	free(buf);
+	free(t);
 
 	return ret;
 }
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 02/10] tree-walk: drop MAX_TRAVERSE_TREES macro
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
  2023-08-31  6:17 ` [PATCH 01/10] tree-walk: reduce stack size for recursive functions Jeff King
@ 2023-08-31  6:19 ` Jeff King
  2023-08-31  6:19 ` [PATCH 03/10] tree-walk: rename "error" variable Jeff King
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:19 UTC (permalink / raw)
  To: git

Since the previous commit dropped the hard-coded limit in
traverse_trees(), we don't need this macro there anymore (the code can
handle any number of trees in parallel).

We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to
5290d45134 (tree-walk.c: break circular dependency with unpack-trees,
2020-02-01). So we can just directly define that as "8" now; we know
traverse_trees() can handle whatever we throw at it.

Signed-off-by: Jeff King <peff@peff.net>
---
I think from previous discussions that this "8" may also be excessive.
But digging into that should definitely be left for another series.

 tree-walk.h    | 2 --
 unpack-trees.h | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/tree-walk.h b/tree-walk.h
index 74cdceb3fe..a6bfa3da3a 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -6,8 +6,6 @@
 struct index_state;
 struct repository;
 
-#define MAX_TRAVERSE_TREES 8
-
 /**
  * The tree walking API is used to traverse and inspect trees.
  */
diff --git a/unpack-trees.h b/unpack-trees.h
index 9b827c307f..5867e26e17 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -7,7 +7,7 @@
 #include "string-list.h"
 #include "tree-walk.h"
 
-#define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
+#define MAX_UNPACK_TREES 8
 
 struct cache_entry;
 struct unpack_trees_options;
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 03/10] tree-walk: rename "error" variable
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
  2023-08-31  6:17 ` [PATCH 01/10] tree-walk: reduce stack size for recursive functions Jeff King
  2023-08-31  6:19 ` [PATCH 02/10] tree-walk: drop MAX_TRAVERSE_TREES macro Jeff King
@ 2023-08-31  6:19 ` Jeff King
  2023-08-31  6:20 ` [PATCH 04/10] fsck: detect very large tree pathnames Jeff King
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:19 UTC (permalink / raw)
  To: git

The "error" variable in traverse_trees() shadows the global error()
function (meaning we can't call error() from here). Let's call the local
variable "ret" instead, which matches the idiom in other functions.

Signed-off-by: Jeff King <peff@peff.net>
---
 tree-walk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index ad49d55290..4efd0fc391 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -441,7 +441,7 @@ int traverse_trees(struct index_state *istate,
 		   int n, struct tree_desc *t,
 		   struct traverse_info *info)
 {
-	int error = 0;
+	int ret = 0;
 	struct name_entry *entry;
 	int i;
 	struct tree_desc_x *tx;
@@ -539,7 +539,7 @@ int traverse_trees(struct index_state *istate,
 		if (interesting) {
 			trees_used = info->fn(n, mask, dirmask, entry, info);
 			if (trees_used < 0) {
-				error = trees_used;
+				ret = trees_used;
 				if (!info->show_all_errors)
 					break;
 			}
@@ -558,7 +558,7 @@ int traverse_trees(struct index_state *istate,
 	strbuf_release(&base);
 
 	traverse_trees_cur_depth--;
-	return error;
+	return ret;
 }
 
 struct dir_state {
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 04/10] fsck: detect very large tree pathnames
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (2 preceding siblings ...)
  2023-08-31  6:19 ` [PATCH 03/10] tree-walk: rename "error" variable Jeff King
@ 2023-08-31  6:20 ` Jeff King
  2023-08-31  6:21 ` [PATCH 05/10] add core.maxTreeDepth config Jeff King
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:20 UTC (permalink / raw)
  To: git

In general, Git tries not to arbitrarily limit what it will store, and
there are currently no limits at all on the size of the path we find in
a tree. In theory you could have one that is gigabytes long.

But in practice this freedom is not really helping anybody, and is
potentially harmful:

  1. Most operating systems have much lower limits for the size of a
     single pathname component (e.g., on Linux you'll generally get
     ENAMETOOLONG for anything over 255 bytes). And while you _can_ use
     Git in a way that never touches the filesystem (manipulating the
     index and trees directly), it's still probably not a good idea to
     have gigantic tree names. Many operations load and traverse them,
     so any clever Git-as-a-database scheme is likely to perform poorly
     in that case.

  2. We still have a lot of code which assumes strings are reasonably
     sized, and I won't be at all surprised if you can trigger some
     interesting integer overflows with gigantic pathnames. Stopping
     malicious trees from entering the repository provides an extra line
     of defense, protecting downstream code.

This patch implements an fsck check so that such trees can be rejected
by transfer.fsckObjects. I've picked a reasonably high maximum depth
here (4096) that hopefully should not bother anybody in practice. I've
also made it configurable, as an escape hatch.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/fsck-msgids.txt |  7 +++++++
 fsck.c                        | 24 +++++++++++++++++++++++-
 fsck.h                        |  1 +
 t/t1450-fsck.sh               | 10 ++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 12eae8a222..09b0aecbf8 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -103,6 +103,13 @@
 `hasDotgit`::
 	(WARN) A tree contains an entry named `.git`.
 
+`largePathname`::
+	(WARN) A tree contains an entry with a very long path name. If
+	the value of `fsck.largePathname` contains a colon, that value
+	is used as the maximum allowable length (e.g., "warn:10" would
+	complain about any path component of 11 or more bytes). The
+	default value is 4096.
+
 `mailmapSymlink`::
 	(INFO) `.mailmap` is a symlink.
 
diff --git a/fsck.c b/fsck.c
index 2b1e348005..6a0bbc5087 100644
--- a/fsck.c
+++ b/fsck.c
@@ -24,6 +24,8 @@
 #include "credential.h"
 #include "help.h"
 
+static ssize_t max_tree_entry_len = 4096;
+
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -154,15 +156,29 @@ void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
+	char *to_free = NULL;
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
 
+	if (msg_id == FSCK_MSG_LARGE_PATHNAME) {
+		const char *colon = strchr(msg_type_str, ':');
+		if (colon) {
+			msg_type_str = to_free =
+				xmemdupz(msg_type_str, colon - msg_type_str);
+			colon++;
+			if (!git_parse_ssize_t(colon, &max_tree_entry_len))
+				die("unable to parse max tree entry len: %s", colon);
+		}
+	}
+	msg_type = parse_msg_type(msg_type_str);
+
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
+	free(to_free);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
@@ -578,6 +594,7 @@ static int fsck_tree(const struct object_id *tree_oid,
 	int has_bad_modes = 0;
 	int has_dup_entries = 0;
 	int not_properly_sorted = 0;
+	int has_large_name = 0;
 	struct tree_desc desc;
 	unsigned o_mode;
 	const char *o_name;
@@ -607,6 +624,7 @@ static int fsck_tree(const struct object_id *tree_oid,
 		has_dotdot |= !strcmp(name, "..");
 		has_dotgit |= is_hfs_dotgit(name) || is_ntfs_dotgit(name);
 		has_zero_pad |= *(char *)desc.buffer == '0';
+		has_large_name |= tree_entry_len(&desc.entry) > max_tree_entry_len;
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
@@ -749,6 +767,10 @@ static int fsck_tree(const struct object_id *tree_oid,
 		retval += report(options, tree_oid, OBJ_TREE,
 				 FSCK_MSG_TREE_NOT_SORTED,
 				 "not properly sorted");
+	if (has_large_name)
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_LARGE_PATHNAME,
+				 "contains excessively large pathname");
 	return retval;
 }
 
diff --git a/fsck.h b/fsck.h
index 6359ba359b..e3adf9d911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -73,6 +73,7 @@ enum fsck_msg_type {
 	FUNC(NULL_SHA1, WARN) \
 	FUNC(ZERO_PADDED_FILEMODE, WARN) \
 	FUNC(NUL_IN_COMMIT, WARN) \
+	FUNC(LARGE_PATHNAME, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(BAD_FILEMODE, INFO) \
 	FUNC(GITMODULES_PARSE, INFO) \
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5805d47eb9..10a539158c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -589,6 +589,16 @@ test_expect_success 'fsck notices submodule entry pointing to null sha1' '
 	)
 '
 
+test_expect_success 'fsck notices excessively large tree entry name' '
+	git init large-name &&
+	(
+		cd large-name &&
+		test_commit a-long-name &&
+		git -c fsck.largePathname=warn:10 fsck 2>out &&
+		grep "warning.*large pathname" out
+	)
+'
+
 while read name path pretty; do
 	while read mode type; do
 		: ${pretty:=$path}
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 05/10] add core.maxTreeDepth config
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (3 preceding siblings ...)
  2023-08-31  6:20 ` [PATCH 04/10] fsck: detect very large tree pathnames Jeff King
@ 2023-08-31  6:21 ` Jeff King
  2023-08-31  6:21 ` [PATCH 06/10] traverse_trees(): respect max_allowed_tree_depth Jeff King
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:21 UTC (permalink / raw)
  To: git

Most of our tree traversal algorithms use recursion to visit sub-trees.
For pathologically large trees, this can cause us to run out of stack
space and abort in an uncontrolled way. Let's put our own limit here so
that we can fail gracefully rather than segfaulting.

In similar cases where we recursed along the commit graph, we rewrote
the algorithms to avoid recursion and keep any stack data on the heap.
But the commit graph is meant to grow without bound, whereas it's not an
imposition to put a limit on the maximum size of tree we'll handle.

And this has a bonus side effect: coupled with a limit on individual
tree entry names, this limits the total size of a path we may encounter.
This gives us an extra protection against code handling long path names
which may suffer from integer overflows in the size (which could then be
exploited by malicious trees).

The default of 4096 is set to be much longer than anybody would care
about in the real world. Even with single-letter interior tree names
(like "a/b/c"), such a path is at least 8191 bytes. While most operating
systems will let you create such a path incrementally, trying to
reference the whole thing in a system call (as Git would do when
actually trying to access it) will result in ENAMETOOLONG. Coupled with
the recent fsck.largePathname warning, the maximum total pathname Git
will handle is (by default) 16MB.

This config option doesn't do anything yet; future patches will convert
various algorithms to respect the limit.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config/core.txt | 6 ++++++
 config.c                      | 5 +++++
 environment.c                 | 1 +
 environment.h                 | 1 +
 4 files changed, 13 insertions(+)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index dfbdaf00b8..0e8c2832bf 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -736,3 +736,9 @@ core.abbrev::
 	If set to "no", no abbreviation is made and the object names
 	are shown in their full length.
 	The minimum length is 4.
+
+core.maxTreeDepth::
+	The maximum depth Git is willing to recurse while traversing a
+	tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe
+	to allow Git to abort cleanly, and should not generally need to
+	be adjusted. The default is 4096.
diff --git a/config.c b/config.c
index 3846a37be9..2ea39295cd 100644
--- a/config.c
+++ b/config.c
@@ -1801,6 +1801,11 @@ static int git_default_core_config(const char *var, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(var, "core.maxtreedepth")) {
+		max_allowed_tree_depth = git_config_int(var, value, ctx->kvi);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return platform_core_config(var, value, ctx, cb);
 }
diff --git a/environment.c b/environment.c
index f98d76f080..8e25b5ef02 100644
--- a/environment.c
+++ b/environment.c
@@ -81,6 +81,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
+int max_allowed_tree_depth = 4096;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/environment.h b/environment.h
index c5377473c6..e5351c9dd9 100644
--- a/environment.h
+++ b/environment.h
@@ -132,6 +132,7 @@ extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
 extern unsigned long big_file_threshold;
 extern unsigned long pack_size_limit_cfg;
+extern int max_allowed_tree_depth;
 
 /*
  * Accessors for the core.sharedrepository config which lazy-load the value
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 06/10] traverse_trees(): respect max_allowed_tree_depth
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (4 preceding siblings ...)
  2023-08-31  6:21 ` [PATCH 05/10] add core.maxTreeDepth config Jeff King
@ 2023-08-31  6:21 ` Jeff King
  2023-08-31  6:21 ` [PATCH 07/10] read_tree(): " Jeff King
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:21 UTC (permalink / raw)
  To: git

The tree-walk.c code walks trees recursively, and may run out of stack
space. The easiest way to see this is with git-archive; on my 64-bit
Linux system it runs out of stack trying to generate a tarfile with a
tree depth of 13,772.

I've picked 4100 as the depth for our "big" test. I ran it with a much
higher value to confirm that we do get a segfault without this patch.
But really anything over 4096 is sufficient for its stated purpose,
which is to find out if our default limit of 4096 is low enough to
prevent segfaults on all platforms. Keeping it small saves us time on
the test setup.

The tree-walk code that's touched here underlies unpack_trees(), so this
protects any programs which use it, not just git-archive (but archive is
easy to test, and was what alerted me to this issue in a real-world
case).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6700-tree-depth.sh | 66 +++++++++++++++++++++++++++++++++++++++++++
 tree-walk.c           |  4 +++
 2 files changed, 70 insertions(+)
 create mode 100755 t/t6700-tree-depth.sh

diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh
new file mode 100755
index 0000000000..d4d17db2ae
--- /dev/null
+++ b/t/t6700-tree-depth.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='handling of deep trees in various commands'
+. ./test-lib.sh
+
+# We'll test against two depths here: a small one that will let us check the
+# behavior of the config setting easily, and a large one that should be
+# forbidden by default. Testing the default depth will let us know whether our
+# default is enough to prevent segfaults on systems that run the tests.
+small_depth=50
+big_depth=4100
+
+small_ok="-c core.maxtreedepth=$small_depth"
+small_no="-c core.maxtreedepth=$((small_depth-1))"
+
+# usage: mkdeep <name> <depth>
+#   Create a tag <name> containing a file whose path has depth <depth>.
+#
+# We'll use fast-import here for two reasons:
+#
+#   1. It's faster than creating $big_depth tree objects.
+#
+#   2. As we tighten tree limits, it's more likely to allow large sizes
+#      than trying to stuff a deep path into the index.
+mkdeep () {
+	{
+		echo "commit refs/tags/$1" &&
+		echo "committer foo <foo@example.com> 1234 -0000" &&
+		echo "data <<EOF" &&
+		echo "the commit message" &&
+		echo "EOF" &&
+
+		printf 'M 100644 inline ' &&
+		i=0 &&
+		while test $i -lt $2
+		do
+			printf 'a/'
+			i=$((i+1))
+		done &&
+		echo "file" &&
+
+		echo "data <<EOF" &&
+		echo "the file contents" &&
+		echo "EOF" &&
+		echo
+	} | git fast-import
+}
+
+test_expect_success 'create small tree' '
+	mkdeep small $small_depth
+'
+
+test_expect_success 'create big tree' '
+	mkdeep big $big_depth
+'
+
+test_expect_success 'limit recursion of git-archive' '
+	git $small_ok archive small >/dev/null &&
+	test_must_fail git $small_no archive small >/dev/null
+'
+
+test_expect_success 'default limit for git-archive fails gracefully' '
+	test_must_fail git archive big >/dev/null
+'
+
+test_done
diff --git a/tree-walk.c b/tree-walk.c
index 4efd0fc391..b517792ba2 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -9,6 +9,7 @@
 #include "tree.h"
 #include "pathspec.h"
 #include "json-writer.h"
+#include "environment.h"
 
 static const char *get_mode(const char *str, unsigned int *modep)
 {
@@ -449,6 +450,9 @@ int traverse_trees(struct index_state *istate,
 	int interesting = 1;
 	char *traverse_path;
 
+	if (traverse_trees_cur_depth > max_allowed_tree_depth)
+		return error("exceeded maximum allowed tree depth");
+
 	traverse_trees_count++;
 	traverse_trees_cur_depth++;
 
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 07/10] read_tree(): respect max_allowed_tree_depth
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (5 preceding siblings ...)
  2023-08-31  6:21 ` [PATCH 06/10] traverse_trees(): respect max_allowed_tree_depth Jeff King
@ 2023-08-31  6:21 ` Jeff King
  2023-08-31  6:22 ` [PATCH 08/10] list-objects: " Jeff King
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:21 UTC (permalink / raw)
  To: git

The read_tree() function reads trees recursively (via its read_tree_at()
helper). This can cause it to run out of stack space on very deep trees.
Let's teach it about the new core.maxTreeDepth option.

The easiest way to demonstrate this is via "ls-tree -r", which the test
covers. Note that I needed a tree depth of ~30k to trigger a segfault on
my Linux system, not the 4100 used by our "big" test in t6700. However,
that test still tells us what we want: that the default 4096 limit is
enough to prevent segfaults on all platforms. We could bump it, but that
increases the cost of the test setup for little gain.

As an interesting side-note: when I originally wrote this patch about 4
years ago, I needed a depth of ~50k to segfault. But porting it forward,
the number is much lower. Seemingly little things like cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) take it from 32,722
to 29,080.

Signed-off-by: Jeff King <peff@peff.net>
---
 sparse-index.c        | 2 +-
 t/t6700-tree-depth.sh | 9 +++++++++
 tree.c                | 9 +++++++--
 tree.h                | 1 +
 wt-status.c           | 2 +-
 5 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 1fdb07a9e6..3578feb283 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -391,7 +391,7 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 		strbuf_setlen(&base, 0);
 		strbuf_add(&base, ce->name, strlen(ce->name));
 
-		read_tree_at(istate->repo, tree, &base, &ps,
+		read_tree_at(istate->repo, tree, &base, 0, &ps,
 			     add_path_to_index, &ctx);
 
 		/* free directory entries. full entries are re-used */
diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh
index d4d17db2ae..93ec41df03 100755
--- a/t/t6700-tree-depth.sh
+++ b/t/t6700-tree-depth.sh
@@ -63,4 +63,13 @@ test_expect_success 'default limit for git-archive fails gracefully' '
 	test_must_fail git archive big >/dev/null
 '
 
+test_expect_success 'limit recursion of ls-tree -r' '
+	git $small_ok ls-tree -r small &&
+	test_must_fail git $small_no ls-tree -r small
+'
+
+test_expect_success 'default limit for ls-tree fails gracefully' '
+	test_must_fail git ls-tree -r big >/dev/null
+'
+
 test_done
diff --git a/tree.c b/tree.c
index c745462f96..990f9c9854 100644
--- a/tree.c
+++ b/tree.c
@@ -10,11 +10,13 @@
 #include "alloc.h"
 #include "tree-walk.h"
 #include "repository.h"
+#include "environment.h"
 
 const char *tree_type = "tree";
 
 int read_tree_at(struct repository *r,
 		 struct tree *tree, struct strbuf *base,
+		 int depth,
 		 const struct pathspec *pathspec,
 		 read_tree_fn_t fn, void *context)
 {
@@ -24,6 +26,9 @@ int read_tree_at(struct repository *r,
 	int len, oldlen = base->len;
 	enum interesting retval = entry_not_interesting;
 
+	if (depth > max_allowed_tree_depth)
+		return error("exceeded maximum allowed tree depth");
+
 	if (parse_tree(tree))
 		return -1;
 
@@ -74,7 +79,7 @@ int read_tree_at(struct repository *r,
 		strbuf_add(base, entry.path, len);
 		strbuf_addch(base, '/');
 		retval = read_tree_at(r, lookup_tree(r, &oid),
-				      base, pathspec,
+				      base, depth + 1, pathspec,
 				      fn, context);
 		strbuf_setlen(base, oldlen);
 		if (retval)
@@ -89,7 +94,7 @@ int read_tree(struct repository *r,
 	      read_tree_fn_t fn, void *context)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret = read_tree_at(r, tree, &sb, pathspec, fn, context);
+	int ret = read_tree_at(r, tree, &sb, 0, pathspec, fn, context);
 	strbuf_release(&sb);
 	return ret;
 }
diff --git a/tree.h b/tree.h
index 1b5ecbda6b..cc6ddf51b3 100644
--- a/tree.h
+++ b/tree.h
@@ -44,6 +44,7 @@ typedef int (*read_tree_fn_t)(const struct object_id *, struct strbuf *, const c
 
 int read_tree_at(struct repository *r,
 		 struct tree *tree, struct strbuf *base,
+		 int depth,
 		 const struct pathspec *pathspec,
 		 read_tree_fn_t fn, void *context);
 
diff --git a/wt-status.c b/wt-status.c
index 5b1378965c..996f8635c3 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -739,7 +739,7 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			ps.max_depth = -1;
 
 			strbuf_add(&base, ce->name, ce->ce_namelen);
-			read_tree_at(istate->repo, tree, &base, &ps,
+			read_tree_at(istate->repo, tree, &base, 0, &ps,
 				     add_file_to_list, s);
 			continue;
 		}
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 08/10] list-objects: respect max_allowed_tree_depth
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (6 preceding siblings ...)
  2023-08-31  6:21 ` [PATCH 07/10] read_tree(): " Jeff King
@ 2023-08-31  6:22 ` Jeff King
  2023-08-31  6:22 ` [PATCH 09/10] tree-diff: " Jeff King
  2023-08-31  6:23 ` [PATCH 10/10] lower core.maxTreeDepth default to 2048 Jeff King
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:22 UTC (permalink / raw)
  To: git

The tree traversal in list-objects.c, which is used by "rev-list
--objects", etc, uses recursion and may run out of stack space. Let's
teach it about the new core.maxTreeDepth config option.

We unfortunately can't return an error here, as this code doesn't
produce an error return at all. We'll die() instead, which matches the
behavior when we see an otherwise broken tree.

Note that this will also generally reject such deep trees from entering
the repository from a fetch or push, due to the use of rev-list in the
connectivity check. But it's not foolproof! We stop traversing when we
see an UNINTERESTING object, and the connectivity check marks existing
ref tips as UNINTERESTING. So imagine commit X has a tree
with maximum depth N. If you then create a new commit Y with a tree
entry "Y:subdir" that points to "X^{tree}", then the depth of Y will be
N+1. But a connectivity check running "git rev-list --objects Y --not X"
won't realize that; it will stop traversing at X^{tree}, since that was
already reachable.

So this will stop naive pushes of too-deep trees, but not carefully
crafted malicious ones. Doing it robustly and efficiently would require
caching the maximum depth of each tree (i.e., the longest path to any
leaf entry). That's much more complex and not strictly needed. If each
recursive algorithm limits itself already, then that's sufficient.
Blocking the objects from entering the repo would be a nice
belt-and-suspenders addition, but it's not worth the extra cost.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c        | 8 ++++++++
 t/t6700-tree-depth.sh | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/list-objects.c b/list-objects.c
index e60a6cd5b4..c25c72b32c 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -14,13 +14,15 @@
 #include "packfile.h"
 #include "object-store-ll.h"
 #include "trace.h"
+#include "environment.h"
 
 struct traversal_context {
 	struct rev_info *revs;
 	show_object_fn show_object;
 	show_commit_fn show_commit;
 	void *show_data;
 	struct filter *filter;
+	int depth;
 };
 
 static void show_commit(struct traversal_context *ctx,
@@ -118,7 +120,9 @@ static void process_tree_contents(struct traversal_context *ctx,
 				    entry.path, oid_to_hex(&tree->object.oid));
 			}
 			t->object.flags |= NOT_USER_GIVEN;
+			ctx->depth++;
 			process_tree(ctx, t, base, entry.path);
+			ctx->depth--;
 		}
 		else if (S_ISGITLINK(entry.mode))
 			; /* ignore gitlink */
@@ -156,6 +160,9 @@ static void process_tree(struct traversal_context *ctx,
 	    !revs->include_check_obj(&tree->object, revs->include_check_data))
 		return;
 
+	if (ctx->depth > max_allowed_tree_depth)
+		die("exceeded maximum allowed tree depth");
+
 	failed_parse = parse_tree_gently(tree, 1);
 	if (failed_parse) {
 		if (revs->ignore_missing_links)
@@ -349,6 +356,7 @@ static void traverse_non_commits(struct traversal_context *ctx,
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
+			ctx->depth = 0;
 			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh
index 93ec41df03..f5d284b16e 100755
--- a/t/t6700-tree-depth.sh
+++ b/t/t6700-tree-depth.sh
@@ -72,4 +72,13 @@ test_expect_success 'default limit for ls-tree fails gracefully' '
 	test_must_fail git ls-tree -r big >/dev/null
 '
 
+test_expect_success 'limit recursion of rev-list --objects' '
+	git $small_ok rev-list --objects small >/dev/null &&
+	test_must_fail git $small_no rev-list --objects small >/dev/null
+'
+
+test_expect_success 'default limit for rev-list fails gracefully' '
+	test_must_fail git rev-list --objects big >/dev/null
+'
+
 test_done
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 09/10] tree-diff: respect max_allowed_tree_depth
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (7 preceding siblings ...)
  2023-08-31  6:22 ` [PATCH 08/10] list-objects: " Jeff King
@ 2023-08-31  6:22 ` Jeff King
  2023-08-31  6:23 ` [PATCH 10/10] lower core.maxTreeDepth default to 2048 Jeff King
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:22 UTC (permalink / raw)
  To: git

When diffing trees, we recurse to handle subtrees. That means we may run
out of stack space and segfault. Let's teach this code path about
core.maxTreeDepth in order to fail more gracefully.

As with the previous patch, we have no way to return an error (and other
tree-loading problems would just cause us to die()). So we'll likewise
call die() if we exceed the maximum depth.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6700-tree-depth.sh |  9 +++++++++
 tree-diff.c           | 23 +++++++++++++++--------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh
index f5d284b16e..e410c41234 100755
--- a/t/t6700-tree-depth.sh
+++ b/t/t6700-tree-depth.sh
@@ -81,4 +81,13 @@ test_expect_success 'default limit for rev-list fails gracefully' '
 	test_must_fail git rev-list --objects big >/dev/null
 '
 
+test_expect_success 'limit recursion of diff-tree -r' '
+	git $small_ok diff-tree -r $EMPTY_TREE small &&
+	test_must_fail git $small_no diff-tree -r $EMPTY_TREE small
+'
+
+test_expect_success 'default limit for diff-tree fails gracefully' '
+	test_must_fail git diff-tree -r $EMPTY_TREE big
+'
+
 test_done
diff --git a/tree-diff.c b/tree-diff.c
index 8fc159b86e..46107772d1 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -7,6 +7,7 @@
 #include "hash.h"
 #include "tree.h"
 #include "tree-walk.h"
+#include "environment.h"
 
 /*
  * Some mode bits are also used internally for computations.
@@ -45,7 +46,8 @@
 static struct combine_diff_path *ll_diff_tree_paths(
 	struct combine_diff_path *p, const struct object_id *oid,
 	const struct object_id **parents_oid, int nparent,
-	struct strbuf *base, struct diff_options *opt);
+	struct strbuf *base, struct diff_options *opt,
+	int depth);
 static void ll_diff_tree_oid(const struct object_id *old_oid,
 			     const struct object_id *new_oid,
 			     struct strbuf *base, struct diff_options *opt);
@@ -196,7 +198,7 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
 static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 	struct strbuf *base, struct diff_options *opt, int nparent,
 	struct tree_desc *t, struct tree_desc *tp,
-	int imin)
+	int imin, int depth)
 {
 	unsigned short mode;
 	const char *path;
@@ -302,7 +304,8 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 
 		strbuf_add(base, path, pathlen);
 		strbuf_addch(base, '/');
-		p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt);
+		p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt,
+				       depth + 1);
 		FAST_ARRAY_FREE(parents_oid, nparent);
 	}
 
@@ -423,12 +426,16 @@ static inline void update_tp_entries(struct tree_desc *tp, int nparent)
 static struct combine_diff_path *ll_diff_tree_paths(
 	struct combine_diff_path *p, const struct object_id *oid,
 	const struct object_id **parents_oid, int nparent,
-	struct strbuf *base, struct diff_options *opt)
+	struct strbuf *base, struct diff_options *opt,
+	int depth)
 {
 	struct tree_desc t, *tp;
 	void *ttree, **tptree;
 	int i;
 
+	if (depth > max_allowed_tree_depth)
+		die("exceeded maximum allowed tree depth");
+
 	FAST_ARRAY_ALLOC(tp, nparent);
 	FAST_ARRAY_ALLOC(tptree, nparent);
 
@@ -522,7 +529,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
 
 			/* D += {δ(t,pi) if pi=p[imin];  "+a" if pi > p[imin]} */
 			p = emit_path(p, base, opt, nparent,
-					&t, tp, imin);
+					&t, tp, imin, depth);
 
 		skip_emit_t_tp:
 			/* t↓,  ∀ pi=p[imin]  pi↓ */
@@ -534,7 +541,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
 		else if (cmp < 0) {
 			/* D += "+t" */
 			p = emit_path(p, base, opt, nparent,
-					&t, /*tp=*/NULL, -1);
+					&t, /*tp=*/NULL, -1, depth);
 
 			/* t↓ */
 			update_tree_entry(&t);
@@ -550,7 +557,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
 			}
 
 			p = emit_path(p, base, opt, nparent,
-					/*t=*/NULL, tp, imin);
+					/*t=*/NULL, tp, imin, depth);
 
 		skip_emit_tp:
 			/* ∀ pi=p[imin]  pi↓ */
@@ -572,7 +579,7 @@ struct combine_diff_path *diff_tree_paths(
 	const struct object_id **parents_oid, int nparent,
 	struct strbuf *base, struct diff_options *opt)
 {
-	p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt);
+	p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt, 0);
 
 	/*
 	 * free pre-allocated last element, if any
-- 
2.42.0.561.gaa987ecc69


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 10/10] lower core.maxTreeDepth default to 2048
  2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
                   ` (8 preceding siblings ...)
  2023-08-31  6:22 ` [PATCH 09/10] tree-diff: " Jeff King
@ 2023-08-31  6:23 ` Jeff King
  2023-08-31 10:39   ` Oswald Buddenhagen
  9 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2023-08-31  6:23 UTC (permalink / raw)
  To: git

On my Linux system, all of our recursive tree walking algorithms can run
up to the 4096 default limit without segfaulting. But not all platforms
will have stack sizes as generous (nor might even Linux if we kick off a
recursive walk within a thread).

In particular, several of the tests added in the previous few commits
fail in our Windows CI environment. Through some guess-and-check
pushing, I found that 3072 is still too much, but 2048 is OK.

These are obviously vague heuristics, and there is nothing to promise
that another system might not have trouble at even lower values. But it
seems unlikely anybody will be too angry about a 2048-depth limit (this
is close to the default max-pathname limit on Linux even for a
pathological path like "a/a/a/..."). So let's just lower it.

Some alternatives are:

  - configure separate defaults for Windows versus other platforms.

  - just skip the tests on Windows. This leaves Windows users with the
    annoying case that they can be crashed by running out of stack
    space, but there shouldn't be any security implications (they can't
    go deep enough to hit integer overflow problems).

Since the original default was arbitrary, it seems less confusing to
just lower it, keeping behavior consistent across platforms.

Signed-off-by: Jeff King <peff@peff.net>
---
Arguably this could be squashed into patch 5. But I thought that
following the sequence of logic (from "4096 is probably OK" to "whoops,
it's not") had some value to share. AFAIK GitHub is still running with
the 4096 limit; I discovered the Windows issue only while preparing this
for upstream. But I still find it highly unlikely that somebody would
run into it in practice.

 environment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index 8e25b5ef02..bb3c2a96a3 100644
--- a/environment.c
+++ b/environment.c
@@ -81,7 +81,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
-int max_allowed_tree_depth = 4096;
+int max_allowed_tree_depth = 2048;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
-- 
2.42.0.561.gaa987ecc69

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/10] lower core.maxTreeDepth default to 2048
  2023-08-31  6:23 ` [PATCH 10/10] lower core.maxTreeDepth default to 2048 Jeff King
@ 2023-08-31 10:39   ` Oswald Buddenhagen
  2023-08-31 17:42     ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Oswald Buddenhagen @ 2023-08-31 10:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
>But I thought that
>following the sequence of logic (from "4096 is probably OK" to "whoops,
>it's not") had some value to share.
>
of course, but you can just integrate that into the squashed commit 
message. having it all in one place makes it easier to follow.

regards

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/10] lower core.maxTreeDepth default to 2048
  2023-08-31 10:39   ` Oswald Buddenhagen
@ 2023-08-31 17:42     ` Jeff King
  2023-08-31 21:06       ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2023-08-31 17:42 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: git

On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote:

> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
> > But I thought that
> > following the sequence of logic (from "4096 is probably OK" to "whoops,
> > it's not") had some value to share.
> > 
> of course, but you can just integrate that into the squashed commit message.
> having it all in one place makes it easier to follow.

Yes, though I think having it as a separate patch makes it easier to
revisit later (e.g., by reverting or by replacing the patch during a
re-roll).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/10] lower core.maxTreeDepth default to 2048
  2023-08-31 17:42     ` Jeff King
@ 2023-08-31 21:06       ` Junio C Hamano
  2023-08-31 21:59         ` rsbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2023-08-31 21:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Oswald Buddenhagen, git

Jeff King <peff@peff.net> writes:

> On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote:
>
>> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
>> > But I thought that
>> > following the sequence of logic (from "4096 is probably OK" to "whoops,
>> > it's not") had some value to share.
>> > 
>> of course, but you can just integrate that into the squashed commit message.
>> having it all in one place makes it easier to follow.
>
> Yes, though I think having it as a separate patch makes it easier to
> revisit later (e.g., by reverting or by replacing the patch during a
> re-roll).

I am on the fence.  Having it squashed into the same step as it was
introduced may reduce the patch count, but then it would not be easy
to explain why 2048 is a reasonable default at that step when no
code actually uses the variable, so the end result is not all that
easier to follow and read, as that earlier step would be handwaving
"2048 is good at the end of the series, trust me", unlike having it
at the end.  When 4096 is introduced as a "random number that seems
larger than large enough" in the earlier step, it might be worth
mentioning that it is a tentative default and may turn out to be
larger than necessary in which case we may want to shrink it ;-)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 10/10] lower core.maxTreeDepth default to 2048
  2023-08-31 21:06       ` Junio C Hamano
@ 2023-08-31 21:59         ` rsbecker
  2023-08-31 22:31           ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: rsbecker @ 2023-08-31 21:59 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Jeff King'
  Cc: 'Oswald Buddenhagen', git

On Thursday, August 31, 2023 5:06 PM, Junio C Hamano wrote:
>Jeff King <peff@peff.net> writes:
>
>> On Thu, Aug 31, 2023 at 12:39:37PM +0200, Oswald Buddenhagen wrote:
>>
>>> On Thu, Aug 31, 2023 at 02:23:20AM -0400, Jeff King wrote:
>>> > But I thought that
>>> > following the sequence of logic (from "4096 is probably OK" to
>>> > "whoops, it's not") had some value to share.
>>> >
>>> of course, but you can just integrate that into the squashed commit
message.
>>> having it all in one place makes it easier to follow.
>>
>> Yes, though I think having it as a separate patch makes it easier to
>> revisit later (e.g., by reverting or by replacing the patch during a
>> re-roll).
>
>I am on the fence.  Having it squashed into the same step as it was
introduced may
>reduce the patch count, but then it would not be easy to explain why 2048
is a
>reasonable default at that step when no code actually uses the variable, so
the end
>result is not all that easier to follow and read, as that earlier step
would be
>handwaving
>"2048 is good at the end of the series, trust me", unlike having it at the
end.  When
>4096 is introduced as a "random number that seems larger than large enough"
in the
>earlier step, it might be worth mentioning that it is a tentative default
and may turn
>out to be larger than necessary in which case we may want to shrink it ;-)

I have been trying to figure out the implications of this and went down the
wrong rabbit hole. Are we taking about the tree depth of the underlying
Merkel Tree (no) or the tree-ish thing representing the file system
(apparently yes). In this case, a practical depth of 2048 hits the exact max
path size on the NonStop platform, so I have no issue there. My concern is
one of terminology. My assumption of what maxTreeDepth meant, from other
terminology used in git, seemed (wrongly) to align with the use of --depth=n
where n<maxTreeDepth parameters for commands like fetch. From a user
intuition (arguably, if I have any here) is that the parameter should be
more of a path nomenclature, like maxPathHeight or maxHierarchyHeight rather
than what is currently in flight. Just my opinion and I'm fine no matter
which way.

--Randall
--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/10] lower core.maxTreeDepth default to 2048
  2023-08-31 21:59         ` rsbecker
@ 2023-08-31 22:31           ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2023-08-31 22:31 UTC (permalink / raw)
  To: rsbecker; +Cc: 'Junio C Hamano', 'Oswald Buddenhagen', git

On Thu, Aug 31, 2023 at 05:59:28PM -0400, rsbecker@nexbridge.com wrote:

> I have been trying to figure out the implications of this and went down the
> wrong rabbit hole. Are we taking about the tree depth of the underlying
> Merkel Tree (no) or the tree-ish thing representing the file system
> (apparently yes). In this case, a practical depth of 2048 hits the exact max
> path size on the NonStop platform, so I have no issue there. My concern is
> one of terminology. My assumption of what maxTreeDepth meant, from other
> terminology used in git, seemed (wrongly) to align with the use of --depth=n
> where n<maxTreeDepth parameters for commands like fetch. From a user
> intuition (arguably, if I have any here) is that the parameter should be
> more of a path nomenclature, like maxPathHeight or maxHierarchyHeight rather
> than what is currently in flight. Just my opinion and I'm fine no matter
> which way.

The documentation from the patch is:

  +core.maxTreeDepth::
  +       The maximum depth Git is willing to recurse while traversing a
  +       tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe
  +       to allow Git to abort cleanly, and should not generally need to
  +       be adjusted. The default is 4096.

Does reading that answer your question and make the meaning clear? If
not, can you suggest any changes?

I'd like to stick with "depth" here as it's commonly used in other
places to mean the same thing (e.g., git-grep's --max-depth option).

I also think this is something that most people will remain completely
oblivious to, as you'd only hit it for absurd cases.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-08-31 22:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-31  6:17 [PATCH 0/10] tree name and depth limits Jeff King
2023-08-31  6:17 ` [PATCH 01/10] tree-walk: reduce stack size for recursive functions Jeff King
2023-08-31  6:19 ` [PATCH 02/10] tree-walk: drop MAX_TRAVERSE_TREES macro Jeff King
2023-08-31  6:19 ` [PATCH 03/10] tree-walk: rename "error" variable Jeff King
2023-08-31  6:20 ` [PATCH 04/10] fsck: detect very large tree pathnames Jeff King
2023-08-31  6:21 ` [PATCH 05/10] add core.maxTreeDepth config Jeff King
2023-08-31  6:21 ` [PATCH 06/10] traverse_trees(): respect max_allowed_tree_depth Jeff King
2023-08-31  6:21 ` [PATCH 07/10] read_tree(): " Jeff King
2023-08-31  6:22 ` [PATCH 08/10] list-objects: " Jeff King
2023-08-31  6:22 ` [PATCH 09/10] tree-diff: " Jeff King
2023-08-31  6:23 ` [PATCH 10/10] lower core.maxTreeDepth default to 2048 Jeff King
2023-08-31 10:39   ` Oswald Buddenhagen
2023-08-31 17:42     ` Jeff King
2023-08-31 21:06       ` Junio C Hamano
2023-08-31 21:59         ` rsbecker
2023-08-31 22:31           ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.