All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] Improving performance of git clean
@ 2015-04-25  9:06 Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 1/5] setup: add gentle version of read_gitfile Erik Elfström
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-25  9:06 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

v3 of the patch can be found here:
http://thread.gmane.org/gmane.comp.version-control.git/267422

Changes in v4:
* changed some tests to use more meaningful dir names.
* fixed performance test by doing "git clean -n" to avoid
  timing setup code. Increased test size to 100000 directories
  (~0.5s runtime).
* changed interface of read_gitfile_gently to be able to
  return error code.
* fixed a compiler warning in read_gitfile_gently ("warning:
  ‘dir’ may be used uninitialized in this function").
* added sanity check of git file size in read_gitfile_gently
* updated commit message in [5/5] to more clearly motivate
  remaining behavioral changes of git clean.

Thanks to Junio C Hamano and Jeff King for comments and help on v3.

Erik Elfström (5):
  setup: add gentle version of read_gitfile
  setup: sanity check file size in read_gitfile_gently
  t7300: add tests to document behavior of clean and nested git
  p7300: add performance tests for clean
  clean: improve performance when removing lots of directories

 builtin/clean.c       |  26 +++++++++--
 cache.h               |   3 +-
 setup.c               |  88 ++++++++++++++++++++++++++++-------
 t/perf/p7300-clean.sh |  31 +++++++++++++
 t/t7300-clean.sh      | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 252 insertions(+), 22 deletions(-)
 create mode 100755 t/perf/p7300-clean.sh

-- 
2.4.0.rc3.8.g4ebd28d

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v4 1/5] setup: add gentle version of read_gitfile
  2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
@ 2015-04-25  9:06 ` Erik Elfström
  2015-04-25 16:51   ` Junio C Hamano
  2015-04-25  9:06 ` [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Erik Elfström @ 2015-04-25  9:06 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

read_gitfile will die on most error cases. This makes it unsuitable
for speculative calls. Extract the core logic and provide a gentle
version that returns NULL on failure.

The first usecase of the new gentle version will be to probe for
submodules during git clean.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 cache.h |  3 ++-
 setup.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 67 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index 3d3244b..6e29068 100644
--- a/cache.h
+++ b/cache.h
@@ -431,7 +431,8 @@ extern int set_git_dir(const char *path);
 extern const char *get_git_namespace(void);
 extern const char *strip_namespace(const char *namespaced_ref);
 extern const char *get_git_work_tree(void);
-extern const char *read_gitfile(const char *path);
+extern const char *read_gitfile_gently(const char *path, int *return_error_code);
+#define read_gitfile(path) read_gitfile_gently((path), NULL)
 extern const char *resolve_gitdir(const char *suspect);
 extern void set_git_work_tree(const char *tree);
 
diff --git a/setup.c b/setup.c
index 979b13f..e1897cc 100644
--- a/setup.c
+++ b/setup.c
@@ -335,35 +335,53 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
 /*
  * Try to read the location of the git directory from the .git file,
  * return path to git directory if found.
+ *
+ * On failure, if return_error_code is not NULL, return_error_code
+ * will be set to an error code and NULL will be returned. If
+ * return_error_code is NULL the function will die instead (for most
+ * cases).
  */
-const char *read_gitfile(const char *path)
+const char *read_gitfile_gently(const char *path, int *return_error_code)
 {
-	char *buf;
-	char *dir;
+	int error_code = 0;
+	char *buf = NULL;
+	char *dir = NULL;
 	const char *slash;
 	struct stat st;
 	int fd;
 	ssize_t len;
 
-	if (stat(path, &st))
-		return NULL;
-	if (!S_ISREG(st.st_mode))
-		return NULL;
+	if (stat(path, &st)) {
+		error_code = 1;
+		goto cleanup_return;
+	}
+	if (!S_ISREG(st.st_mode)) {
+		error_code = 2;
+		goto cleanup_return;
+	}
 	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		die_errno("Error opening '%s'", path);
+	if (fd < 0) {
+		error_code = 3;
+		goto cleanup_return;
+	}
 	buf = xmalloc(st.st_size + 1);
 	len = read_in_full(fd, buf, st.st_size);
 	close(fd);
-	if (len != st.st_size)
-		die("Error reading %s", path);
+	if (len != st.st_size) {
+		error_code = 4;
+		goto cleanup_return;
+	}
 	buf[len] = '\0';
-	if (!starts_with(buf, "gitdir: "))
-		die("Invalid gitfile format: %s", path);
+	if (!starts_with(buf, "gitdir: ")) {
+		error_code = 5;
+		goto cleanup_return;
+	}
 	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
 		len--;
-	if (len < 9)
-		die("No path in gitfile: %s", path);
+	if (len < 9) {
+		error_code = 6;
+		goto cleanup_return;
+	}
 	buf[len] = '\0';
 	dir = buf + 8;
 
@@ -378,11 +396,41 @@ const char *read_gitfile(const char *path)
 		buf = dir;
 	}
 
-	if (!is_git_directory(dir))
-		die("Not a git repository: %s", dir);
+	if (!is_git_directory(dir)) {
+		error_code = 7;
+		goto cleanup_return;
+	}
 	path = real_path(dir);
 
+cleanup_return:
 	free(buf);
+
+	if (return_error_code)
+		*return_error_code = error_code;
+
+	if (error_code) {
+		if (return_error_code)
+			return NULL;
+
+		switch (error_code) {
+		case 1: // failed to stat
+		case 2: // not regular file
+			return NULL;
+		case 3:
+			die_errno("Error opening '%s'", path);
+		case 4:
+			die("Error reading %s", path);
+		case 5:
+			die("Invalid gitfile format: %s", path);
+		case 6:
+			die("No path in gitfile: %s", path);
+		case 7:
+			die("Not a git repository: %s", dir);
+		default:
+			assert(0);
+		}
+	}
+
 	return path;
 }
 
-- 
2.4.0.rc3.8.g4ebd28d

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 1/5] setup: add gentle version of read_gitfile Erik Elfström
@ 2015-04-25  9:06 ` Erik Elfström
  2015-04-25 16:47   ` Junio C Hamano
  2015-04-25  9:06 ` [PATCH v4 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Erik Elfström @ 2015-04-25  9:06 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

read_gitfile_gently will allocate a buffer to fit the entire file that
should be read. Add a sanity check of the file size before opening to
avoid allocating a potentially huge amount of memory if we come across
a large file that someone happened to name ".git". The limit is set to
a sufficiently unreasonable size that should never be exceeded by a
genuine .git file.

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---

I'm not sure about this one but it felt like the safe thing to do.
This patch can be dropped if it is not desired.

I considered testing it using
 "mkdir foo && truncate -s 200G foo/.git && git clean -f -d"
but that feels like a pretty evil test that is likely to cause lots
of problems and not fail in any good way.

 setup.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/setup.c b/setup.c
index e1897cc..ed87334 100644
--- a/setup.c
+++ b/setup.c
@@ -364,22 +364,26 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 		error_code = 3;
 		goto cleanup_return;
 	}
+	if (st.st_size > PATH_MAX * 4) {
+		error_code = 4;
+		goto cleanup_return;
+	}
 	buf = xmalloc(st.st_size + 1);
 	len = read_in_full(fd, buf, st.st_size);
 	close(fd);
 	if (len != st.st_size) {
-		error_code = 4;
+		error_code = 5;
 		goto cleanup_return;
 	}
 	buf[len] = '\0';
 	if (!starts_with(buf, "gitdir: ")) {
-		error_code = 5;
+		error_code = 6;
 		goto cleanup_return;
 	}
 	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
 		len--;
 	if (len < 9) {
-		error_code = 6;
+		error_code = 7;
 		goto cleanup_return;
 	}
 	buf[len] = '\0';
@@ -397,7 +401,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 	}
 
 	if (!is_git_directory(dir)) {
-		error_code = 7;
+		error_code = 8;
 		goto cleanup_return;
 	}
 	path = real_path(dir);
@@ -419,12 +423,14 @@ cleanup_return:
 		case 3:
 			die_errno("Error opening '%s'", path);
 		case 4:
-			die("Error reading %s", path);
+			die("Too large to be a .git file: '%s'", path);
 		case 5:
-			die("Invalid gitfile format: %s", path);
+			die("Error reading %s", path);
 		case 6:
-			die("No path in gitfile: %s", path);
+			die("Invalid gitfile format: %s", path);
 		case 7:
+			die("No path in gitfile: %s", path);
+		case 8:
 			die("Not a git repository: %s", dir);
 		default:
 			assert(0);
-- 
2.4.0.rc3.8.g4ebd28d

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 3/5] t7300: add tests to document behavior of clean and nested git
  2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 1/5] setup: add gentle version of read_gitfile Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
@ 2015-04-25  9:06 ` Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 4/5] p7300: add performance tests for clean Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 5/5] clean: improve performance when removing lots of directories Erik Elfström
  4 siblings, 0 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-25  9:06 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 t/t7300-clean.sh | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 99be5d9..11f3a6d 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,6 +455,134 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
+test_expect_failure 'should clean things that almost look like git but are not' '
+	rm -fr almost_git almost_bare_git almost_submodule &&
+	mkdir -p almost_git/.git/objects &&
+	mkdir -p almost_git/.git/refs &&
+	cat >almost_git/.git/HEAD <<-\EOF &&
+	garbage
+	EOF
+	cp -r almost_git/.git/ almost_bare_git &&
+	mkdir almost_submodule/ &&
+	cat >almost_submodule/.git <<-\EOF &&
+	garbage
+	EOF
+	test_when_finished "rm -rf almost_*" &&
+	## This will fail due to die("Invalid gitfile format: %s", path); in
+	## setup.c:read_gitfile.
+	git clean -f -d &&
+	test_path_is_missing almost_git &&
+	test_path_is_missing almost_bare_git &&
+	test_path_is_missing almost_submodule
+'
+
+test_expect_success 'should not clean submodules' '
+	rm -fr repo to_clean sub1 sub2 &&
+	mkdir repo to_clean &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git submodule add ./repo/.git sub1 &&
+	git commit -m "sub1" &&
+	git branch before_sub2 &&
+	git submodule add ./repo/.git sub2 &&
+	git commit -m "sub2" &&
+	git checkout before_sub2 &&
+	>to_clean/should_clean.this &&
+	git clean -f -d &&
+	test_path_is_file repo/.git/index &&
+	test_path_is_file repo/hello.world &&
+	test_path_is_file sub1/.git &&
+	test_path_is_file sub1/hello.world &&
+	test_path_is_file sub2/.git &&
+	test_path_is_file sub2/hello.world &&
+	test_path_is_missing to_clean
+'
+
+test_expect_failure 'nested (empty) git should be kept' '
+	rm -fr empty_repo to_clean &&
+	git init empty_repo &&
+	mkdir to_clean &&
+	>to_clean/should_clean.this &&
+	git clean -f -d &&
+	test_path_is_file empty_repo/.git/HEAD &&
+	test_path_is_missing to_clean
+'
+
+test_expect_success 'nested bare repositories should be cleaned' '
+	rm -fr bare1 bare2 subdir &&
+	git init --bare bare1 &&
+	git clone --local --bare . bare2 &&
+	mkdir subdir &&
+	cp -r bare2 subdir/bare3 &&
+	git clean -f -d &&
+	test_path_is_missing bare1 &&
+	test_path_is_missing bare2 &&
+	test_path_is_missing subdir
+'
+
+test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+	rm -fr strange_bare &&
+	mkdir strange_bare &&
+	git init --bare strange_bare/.git &&
+	git clean -f -d &&
+	test_path_is_missing strange_bare
+'
+
+test_expect_failure 'nested (non-empty) bare repositories should be cleaned even when in .git' '
+	rm -fr strange_bare &&
+	mkdir strange_bare &&
+	git clone --local --bare . strange_bare/.git &&
+	git clean -f -d &&
+	test_path_is_missing strange_bare
+'
+
+test_expect_success 'giving path in nested git work tree will remove it' '
+	rm -fr repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		mkdir -p bar/baz &&
+		test_commit msg bar/baz/hello.world
+	) &&
+	git clean -f -d repo/bar/baz &&
+	test_path_is_file repo/.git/HEAD &&
+	test_path_is_dir repo/bar/ &&
+	test_path_is_missing repo/bar/baz
+'
+
+test_expect_success 'giving path to nested .git will not remove it' '
+	rm -fr repo &&
+	mkdir repo untracked &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git clean -f -d repo/.git &&
+	test_path_is_file repo/.git/HEAD &&
+	test_path_is_dir repo/.git/refs &&
+	test_path_is_dir repo/.git/objects &&
+	test_path_is_dir untracked/
+'
+
+test_expect_success 'giving path to nested .git/ will remove contents' '
+	rm -fr repo untracked &&
+	mkdir repo untracked &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git clean -f -d repo/.git/ &&
+	test_path_is_dir repo/.git &&
+	test_dir_is_empty repo/.git &&
+	test_path_is_dir untracked/
+'
+
 test_expect_success 'force removal of nested git work tree' '
 	rm -fr foo bar baz &&
 	mkdir -p foo bar baz/boo &&
-- 
2.4.0.rc3.8.g4ebd28d

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 4/5] p7300: add performance tests for clean
  2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
                   ` (2 preceding siblings ...)
  2015-04-25  9:06 ` [PATCH v4 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
@ 2015-04-25  9:06 ` Erik Elfström
  2015-04-25  9:06 ` [PATCH v4 5/5] clean: improve performance when removing lots of directories Erik Elfström
  4 siblings, 0 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-25  9:06 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

The tests are run in dry-run mode to avoid having to restore the test
directories for each timed iteration. Using dry-run is an acceptable
compromise since we are mostly interested in the initial computation
of what to clean and not so much in the cleaning it self.

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 t/perf/p7300-clean.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100755 t/perf/p7300-clean.sh

diff --git a/t/perf/p7300-clean.sh b/t/perf/p7300-clean.sh
new file mode 100755
index 0000000..6ae55ec
--- /dev/null
+++ b/t/perf/p7300-clean.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description="Test git-clean performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+test_expect_success 'setup untracked directory with many sub dirs' '
+	rm -rf 500_sub_dirs 100000_sub_dirs clean_test_dir &&
+	mkdir 500_sub_dirs 100000_sub_dirs clean_test_dir &&
+	for i in $(test_seq 1 500)
+	do
+		mkdir 500_sub_dirs/dir$i || return $?
+	done &&
+	for i in $(test_seq 1 200)
+	do
+		cp -r 500_sub_dirs 100000_sub_dirs/dir$i || return $?
+	done
+'
+
+test_perf 'clean many untracked sub dirs, check for nested git' '
+	git clean -n -q -f -d 100000_sub_dirs/
+'
+
+test_perf 'clean many untracked sub dirs, ignore nested git' '
+	git clean -n -q -f -f -d 100000_sub_dirs/
+'
+
+test_done
-- 
2.4.0.rc3.8.g4ebd28d

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 5/5] clean: improve performance when removing lots of directories
  2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
                   ` (3 preceding siblings ...)
  2015-04-25  9:06 ` [PATCH v4 4/5] p7300: add performance tests for clean Erik Elfström
@ 2015-04-25  9:06 ` Erik Elfström
  4 siblings, 0 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-25  9:06 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

"git clean" uses resolve_gitlink_ref() to check for the presence of
nested git repositories, but it has the drawback of creating a
ref_cache entry for every directory that should potentially be
cleaned. The linear search through the ref_cache list causes a massive
performance hit for large number of directories.

Modify clean.c:remove_dirs to use setup.c:is_git_directory and
setup.c:read_gitfile_gently instead.

Both these functions will open files and parse contents when they find
something that looks like a git repository. This is ok from a
performance standpoint since finding repository candidates should be
comparatively rare.

Using is_git_directory and read_gitfile_gently should give a more
standardized check for what is and what isn't a git repository but
also gives three behavioral changes.

The first change is that we will now detect and avoid cleaning empty
nested git repositories (only init run). This is desirable.

Second, we will no longer die when cleaning a file named ".git" with
garbage content (it will be cleaned instead). This is also desirable.

The last change is that we will detect and avoid cleaning empty bare
repositories that have been placed in a directory named ".git". This
is not desirable but should have no real user impact since we already
fail to clean non-empty bare repositories in the same scenario. This
is thus deemed acceptable.

Update t7300 to reflect these changes in behavior.

The time to clean an untracked directory containing 100000 sub
directories went from 61s to 1.7s after this change.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 builtin/clean.c  | 26 ++++++++++++++++++++++----
 t/t7300-clean.sh |  8 +++-----
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 98c103f..17a1c13 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -10,7 +10,6 @@
 #include "cache.h"
 #include "dir.h"
 #include "parse-options.h"
-#include "refs.h"
 #include "string-list.h"
 #include "quote.h"
 #include "column.h"
@@ -148,6 +147,27 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+/*
+ * Return 1 if the given path is the root of a git repository or
+ * submodule else 0. Will not return 1 for bare repositories with the
+ * exception of creating a bare repository in "foo/.git" and calling
+ * is_git_repository("foo").
+ */
+static int is_git_repository(struct strbuf *path)
+{
+	int ret = 0;
+	int unused_error_code;
+	size_t orig_path_len = path->len;
+	assert(orig_path_len != 0);
+	if (path->buf[orig_path_len - 1] != '/')
+		strbuf_addch(path, '/');
+	strbuf_addstr(path, ".git");
+	if (read_gitfile_gently(path->buf, &unused_error_code) || is_git_directory(path->buf))
+		ret = 1;
+	strbuf_setlen(path, orig_path_len);
+	return ret;
+}
+
 static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		int dry_run, int quiet, int *dir_gone)
 {
@@ -155,13 +175,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
 	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
-	unsigned char submodule_head[20];
 	struct string_list dels = STRING_LIST_INIT_DUP;
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
-			!resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 11f3a6d..4d07a4a 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
-test_expect_failure 'should clean things that almost look like git but are not' '
+test_expect_success 'should clean things that almost look like git but are not' '
 	rm -fr almost_git almost_bare_git almost_submodule &&
 	mkdir -p almost_git/.git/objects &&
 	mkdir -p almost_git/.git/refs &&
@@ -468,8 +468,6 @@ test_expect_failure 'should clean things that almost look like git but are not'
 	garbage
 	EOF
 	test_when_finished "rm -rf almost_*" &&
-	## This will fail due to die("Invalid gitfile format: %s", path); in
-	## setup.c:read_gitfile.
 	git clean -f -d &&
 	test_path_is_missing almost_git &&
 	test_path_is_missing almost_bare_git &&
@@ -501,7 +499,7 @@ test_expect_success 'should not clean submodules' '
 	test_path_is_missing to_clean
 '
 
-test_expect_failure 'nested (empty) git should be kept' '
+test_expect_success 'nested (empty) git should be kept' '
 	rm -fr empty_repo to_clean &&
 	git init empty_repo &&
 	mkdir to_clean &&
@@ -523,7 +521,7 @@ test_expect_success 'nested bare repositories should be cleaned' '
 	test_path_is_missing subdir
 '
 
-test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+test_expect_failure 'nested (empty) bare repositories should be cleaned even when in .git' '
 	rm -fr strange_bare &&
 	mkdir strange_bare &&
 	git init --bare strange_bare/.git &&
-- 
2.4.0.rc3.8.g4ebd28d

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-25  9:06 ` [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
@ 2015-04-25 16:47   ` Junio C Hamano
  2015-04-25 17:59     ` Erik Elfström
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2015-04-25 16:47 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git

Erik Elfström <erik.elfstrom@gmail.com> writes:

> read_gitfile_gently will allocate a buffer to fit the entire file that
> should be read. Add a sanity check of the file size before opening to
> avoid allocating a potentially huge amount of memory if we come across
> a large file that someone happened to name ".git". The limit is set to
> a sufficiently unreasonable size that should never be exceeded by a
> genuine .git file.
>
> Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
> ---
>
> I'm not sure about this one but it felt like the safe thing to do.
> This patch can be dropped if it is not desired.

I do not think it is wrong per-se, but the changes in this patch
shows why hardcoded values assigned to error_code without #define is
not a good idea, as these values are now exposed to the callers of
the new function.  After we gain a new caller that does care about
the exact error code (e.g. to react differently to the reason why we
failed to read by giving different error messages) if we decide to
revert this step, or if we decide to add a new error type, for
whatever reason, this organization forces the caller to be updated.

> I considered testing it using
>  "mkdir foo && truncate -s 200G foo/.git && git clean -f -d"
> but that feels like a pretty evil test that is likely to cause lots
> of problems and not fail in any good way.

Amen to that.

>
>  setup.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index e1897cc..ed87334 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -364,22 +364,26 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
>  		error_code = 3;
>  		goto cleanup_return;
>  	}
> +	if (st.st_size > PATH_MAX * 4) {
> +		error_code = 4;
> +		goto cleanup_return;
> +	}
>  	buf = xmalloc(st.st_size + 1);
>  	len = read_in_full(fd, buf, st.st_size);
>  	close(fd);
>  	if (len != st.st_size) {
> -		error_code = 4;
> +		error_code = 5;
>  		goto cleanup_return;
>  	}
>  	buf[len] = '\0';
>  	if (!starts_with(buf, "gitdir: ")) {
> -		error_code = 5;
> +		error_code = 6;
>  		goto cleanup_return;
>  	}
>  	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
>  		len--;
>  	if (len < 9) {
> -		error_code = 6;
> +		error_code = 7;
>  		goto cleanup_return;
>  	}
>  	buf[len] = '\0';
> @@ -397,7 +401,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
>  	}
>  
>  	if (!is_git_directory(dir)) {
> -		error_code = 7;
> +		error_code = 8;
>  		goto cleanup_return;
>  	}
>  	path = real_path(dir);
> @@ -419,12 +423,14 @@ cleanup_return:
>  		case 3:
>  			die_errno("Error opening '%s'", path);
>  		case 4:
> -			die("Error reading %s", path);
> +			die("Too large to be a .git file: '%s'", path);
>  		case 5:
> -			die("Invalid gitfile format: %s", path);
> +			die("Error reading %s", path);
>  		case 6:
> -			die("No path in gitfile: %s", path);
> +			die("Invalid gitfile format: %s", path);
>  		case 7:
> +			die("No path in gitfile: %s", path);
> +		case 8:
>  			die("Not a git repository: %s", dir);
>  		default:
>  			assert(0);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/5] setup: add gentle version of read_gitfile
  2015-04-25  9:06 ` [PATCH v4 1/5] setup: add gentle version of read_gitfile Erik Elfström
@ 2015-04-25 16:51   ` Junio C Hamano
  2015-04-25 16:54     ` Junio C Hamano
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2015-04-25 16:51 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git

Erik Elfström <erik.elfstrom@gmail.com> writes:

> read_gitfile will die on most error cases. This makes it unsuitable
> for speculative calls. Extract the core logic and provide a gentle
> version that returns NULL on failure.
>
> The first usecase of the new gentle version will be to probe for
> submodules during git clean.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
> ---
>  cache.h |  3 ++-
>  setup.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++--------------
>  2 files changed, 67 insertions(+), 18 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 3d3244b..6e29068 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -431,7 +431,8 @@ extern int set_git_dir(const char *path);
>  extern const char *get_git_namespace(void);
>  extern const char *strip_namespace(const char *namespaced_ref);
>  extern const char *get_git_work_tree(void);
> -extern const char *read_gitfile(const char *path);
> +extern const char *read_gitfile_gently(const char *path, int *return_error_code);
> +#define read_gitfile(path) read_gitfile_gently((path), NULL)
>  extern const char *resolve_gitdir(const char *suspect);
>  extern void set_git_work_tree(const char *tree);
>  
> diff --git a/setup.c b/setup.c
> index 979b13f..e1897cc 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -335,35 +335,53 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
>  /*
>   * Try to read the location of the git directory from the .git file,
>   * return path to git directory if found.
> + *
> + * On failure, if return_error_code is not NULL, return_error_code
> + * will be set to an error code and NULL will be returned. If
> + * return_error_code is NULL the function will die instead (for most
> + * cases).
>   */
> -const char *read_gitfile(const char *path)
> +const char *read_gitfile_gently(const char *path, int *return_error_code)
>  {
> -	char *buf;
> -	char *dir;
> +	int error_code = 0;
> +	char *buf = NULL;
> +	char *dir = NULL;
>  	const char *slash;
>  	struct stat st;
>  	int fd;
>  	ssize_t len;
>  
> -	if (stat(path, &st))
> -		return NULL;
> -	if (!S_ISREG(st.st_mode))
> -		return NULL;
> +	if (stat(path, &st)) {
> +		error_code = 1;
> +		goto cleanup_return;
> +	}
> +	if (!S_ISREG(st.st_mode)) {
> +		error_code = 2;
> +		goto cleanup_return;
> +	}
>  	fd = open(path, O_RDONLY);
> -	if (fd < 0)
> -		die_errno("Error opening '%s'", path);
> +	if (fd < 0) {
> +		error_code = 3;
> +		goto cleanup_return;
> +	}
>  	buf = xmalloc(st.st_size + 1);
>  	len = read_in_full(fd, buf, st.st_size);
>  	close(fd);
> -	if (len != st.st_size)
> -		die("Error reading %s", path);
> +	if (len != st.st_size) {
> +		error_code = 4;
> +		goto cleanup_return;
> +	}
>  	buf[len] = '\0';
> -	if (!starts_with(buf, "gitdir: "))
> -		die("Invalid gitfile format: %s", path);
> +	if (!starts_with(buf, "gitdir: ")) {
> +		error_code = 5;
> +		goto cleanup_return;
> +	}
>  	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
>  		len--;
> -	if (len < 9)
> -		die("No path in gitfile: %s", path);
> +	if (len < 9) {
> +		error_code = 6;
> +		goto cleanup_return;
> +	}
>  	buf[len] = '\0';
>  	dir = buf + 8;
>  
> @@ -378,11 +396,41 @@ const char *read_gitfile(const char *path)
>  		buf = dir;
>  	}
>  
> -	if (!is_git_directory(dir))
> -		die("Not a git repository: %s", dir);
> +	if (!is_git_directory(dir)) {
> +		error_code = 7;
> +		goto cleanup_return;
> +	}
>  	path = real_path(dir);
>  
> +cleanup_return:
>  	free(buf);
> +
> +	if (return_error_code)
> +		*return_error_code = error_code;
> +
> +	if (error_code) {
> +		if (return_error_code)
> +			return NULL;
> +
> +		switch (error_code) {
> +		case 1: // failed to stat
> +		case 2: // not regular file

Please do not use C++ // comments.

> +			return NULL;
> +		case 3:
> +			die_errno("Error opening '%s'", path);
> +		case 4:
> +			die("Error reading %s", path);
> +		case 5:
> +			die("Invalid gitfile format: %s", path);
> +		case 6:
> +			die("No path in gitfile: %s", path);
> +		case 7:
> +			die("Not a git repository: %s", dir);
> +		default:
> +			assert(0);
> +		}
> +	}
> +
>  	return path;
>  }

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/5] setup: add gentle version of read_gitfile
  2015-04-25 16:51   ` Junio C Hamano
@ 2015-04-25 16:54     ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2015-04-25 16:54 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>> +		switch (error_code) {
>> +		case 1: // failed to stat
>> +		case 2: // not regular file
>
> Please do not use C++ // comments.

No need to resend for this; I'll locally amend.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-25 16:47   ` Junio C Hamano
@ 2015-04-25 17:59     ` Erik Elfström
  2015-04-26  4:29       ` Junio C Hamano
  0 siblings, 1 reply; 38+ messages in thread
From: Erik Elfström @ 2015-04-25 17:59 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

On Sat, Apr 25, 2015 at 6:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> I do not think it is wrong per-se, but the changes in this patch
> shows why hardcoded values assigned to error_code without #define is
> not a good idea, as these values are now exposed to the callers of
> the new function.  After we gain a new caller that does care about
> the exact error code (e.g. to react differently to the reason why we
> failed to read by giving different error messages) if we decide to
> revert this step, or if we decide to add a new error type, for
> whatever reason, this organization forces the caller to be updated.

Yes, it was a bit silly of me not to add that. Would something like
this be ok?

---
 cache.h |  9 +++++++++
 setup.c | 32 ++++++++++++++++----------------
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/cache.h b/cache.h
index 6e29068..177337a 100644
--- a/cache.h
+++ b/cache.h
@@ -431,6 +431,15 @@ extern int set_git_dir(const char *path);
 extern const char *get_git_namespace(void);
 extern const char *strip_namespace(const char *namespaced_ref);
 extern const char *get_git_work_tree(void);
+
+#define READ_GITFILE_ERR_STAT_FAILED 1
+#define READ_GITFILE_ERR_NOT_A_FILE 2
+#define READ_GITFILE_ERR_OPEN_FAILED 3
+#define READ_GITFILE_ERR_TOO_LARGE 4
+#define READ_GITFILE_ERR_READ_FAILED 5
+#define READ_GITFILE_ERR_INVALID_FORMAT 6
+#define READ_GITFILE_ERR_NO_PATH 7
+#define READ_GITFILE_ERR_NOT_A_REPO 8
 extern const char *read_gitfile_gently(const char *path, int *return_error_code);
 #define read_gitfile(path) read_gitfile_gently((path), NULL)
 extern const char *resolve_gitdir(const char *suspect);
diff --git a/setup.c b/setup.c
index ed87334..792c37b 100644
--- a/setup.c
+++ b/setup.c
@@ -352,38 +352,38 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 	ssize_t len;
 
 	if (stat(path, &st)) {
-		error_code = 1;
+		error_code = READ_GITFILE_ERR_STAT_FAILED;
 		goto cleanup_return;
 	}
 	if (!S_ISREG(st.st_mode)) {
-		error_code = 2;
+		error_code = READ_GITFILE_ERR_NOT_A_FILE;
 		goto cleanup_return;
 	}
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
-		error_code = 3;
+		error_code = READ_GITFILE_ERR_OPEN_FAILED;
 		goto cleanup_return;
 	}
 	if (st.st_size > PATH_MAX * 4) {
-		error_code = 4;
+		error_code = READ_GITFILE_ERR_TOO_LARGE;
 		goto cleanup_return;
 	}
 	buf = xmalloc(st.st_size + 1);
 	len = read_in_full(fd, buf, st.st_size);
 	close(fd);
 	if (len != st.st_size) {
-		error_code = 5;
+		error_code = READ_GITFILE_ERR_READ_FAILED;
 		goto cleanup_return;
 	}
 	buf[len] = '\0';
 	if (!starts_with(buf, "gitdir: ")) {
-		error_code = 6;
+		error_code = READ_GITFILE_ERR_INVALID_FORMAT;
 		goto cleanup_return;
 	}
 	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
 		len--;
 	if (len < 9) {
-		error_code = 7;
+		error_code = READ_GITFILE_ERR_NO_PATH;
 		goto cleanup_return;
 	}
 	buf[len] = '\0';
@@ -401,7 +401,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 	}
 
 	if (!is_git_directory(dir)) {
-		error_code = 8;
+		error_code = READ_GITFILE_ERR_NOT_A_REPO;
 		goto cleanup_return;
 	}
 	path = real_path(dir);
@@ -417,20 +417,20 @@ cleanup_return:
 			return NULL;
 
 		switch (error_code) {
-		case 1: // failed to stat
-		case 2: // not regular file
+		case READ_GITFILE_ERR_STAT_FAILED:
+		case READ_GITFILE_ERR_NOT_A_FILE:
 			return NULL;
-		case 3:
+		case READ_GITFILE_ERR_OPEN_FAILED:
 			die_errno("Error opening '%s'", path);
-		case 4:
+		case READ_GITFILE_ERR_TOO_LARGE:
 			die("Too large to be a .git file: '%s'", path);
-		case 5:
+		case READ_GITFILE_ERR_READ_FAILED:
 			die("Error reading %s", path);
-		case 6:
+		case READ_GITFILE_ERR_INVALID_FORMAT:
 			die("Invalid gitfile format: %s", path);
-		case 7:
+		case READ_GITFILE_ERR_NO_PATH:
 			die("No path in gitfile: %s", path);
-		case 8:
+		case READ_GITFILE_ERR_NOT_A_REPO:
 			die("Not a git repository: %s", dir);
 		default:
 			assert(0);
-- 
2.4.0.rc3.8.g86acfbd.dirty

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-25 17:59     ` Erik Elfström
@ 2015-04-26  4:29       ` Junio C Hamano
  2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2015-04-26  4:29 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git

Erik Elfström <erik.elfstrom@gmail.com> writes:

> On Sat, Apr 25, 2015 at 6:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> I do not think it is wrong per-se, but the changes in this patch
>> shows why hardcoded values assigned to error_code without #define is
>> not a good idea, as these values are now exposed to the callers of
>> the new function.  After we gain a new caller that does care about
>> the exact error code (e.g. to react differently to the reason why we
>> failed to read by giving different error messages) if we decide to
>> revert this step, or if we decide to add a new error type, for
>> whatever reason, this organization forces the caller to be updated.
>
> Yes, it was a bit silly of me not to add that. Would something like
> this be ok?

Yeah, if you used symbolic constants from the get-go, then the
second patch to add the "too-large? it cannot be gitfile" safety
would not have to renumber everything; instead it would be adding a
single constant to the header file and adding a new case to setup.c
that uses the error code.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v5 0/5] Improving performance of git clean
  2015-04-26  4:29       ` Junio C Hamano
@ 2015-04-26  6:49         ` Erik Elfström
  2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
                             ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-26  6:49 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

Changes in v5:
* Added defines for read_gitfile_gently error codes. 
  This was a silly mistake, sorry about that.

Erik Elfström (5):
  setup: add gentle version of read_gitfile
  setup: sanity check file size in read_gitfile_gently
  t7300: add tests to document behavior of clean and nested git
  p7300: add performance tests for clean
  clean: improve performance when removing lots of directories

 builtin/clean.c       |  26 +++++++++--
 cache.h               |  12 ++++-
 setup.c               |  88 ++++++++++++++++++++++++++++-------
 t/perf/p7300-clean.sh |  31 +++++++++++++
 t/t7300-clean.sh      | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 261 insertions(+), 22 deletions(-)
 create mode 100755 t/perf/p7300-clean.sh

-- 
2.4.0.rc3.8.gbb31afb

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
@ 2015-04-26  6:49           ` Erik Elfström
  2015-04-28  6:17             ` Jeff King
  2015-04-29 23:47             ` Stefan Beller
  2015-04-26  6:49           ` [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
                             ` (3 subsequent siblings)
  4 siblings, 2 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-26  6:49 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

read_gitfile will die on most error cases. This makes it unsuitable
for speculative calls. Extract the core logic and provide a gentle
version that returns NULL on failure.

The first usecase of the new gentle version will be to probe for
submodules during git clean.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 cache.h | 11 ++++++++-
 setup.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index 3d3244b..868e4d3 100644
--- a/cache.h
+++ b/cache.h
@@ -431,7 +431,16 @@ extern int set_git_dir(const char *path);
 extern const char *get_git_namespace(void);
 extern const char *strip_namespace(const char *namespaced_ref);
 extern const char *get_git_work_tree(void);
-extern const char *read_gitfile(const char *path);
+
+#define READ_GITFILE_ERR_STAT_FAILED 1
+#define READ_GITFILE_ERR_NOT_A_FILE 2
+#define READ_GITFILE_ERR_OPEN_FAILED 3
+#define READ_GITFILE_ERR_READ_FAILED 4
+#define READ_GITFILE_ERR_INVALID_FORMAT 5
+#define READ_GITFILE_ERR_NO_PATH 6
+#define READ_GITFILE_ERR_NOT_A_REPO 7
+extern const char *read_gitfile_gently(const char *path, int *return_error_code);
+#define read_gitfile(path) read_gitfile_gently((path), NULL)
 extern const char *resolve_gitdir(const char *suspect);
 extern void set_git_work_tree(const char *tree);
 
diff --git a/setup.c b/setup.c
index 979b13f..c4538ca 100644
--- a/setup.c
+++ b/setup.c
@@ -335,35 +335,53 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
 /*
  * Try to read the location of the git directory from the .git file,
  * return path to git directory if found.
+ *
+ * On failure, if return_error_code is not NULL, return_error_code
+ * will be set to an error code and NULL will be returned. If
+ * return_error_code is NULL the function will die instead (for most
+ * cases).
  */
-const char *read_gitfile(const char *path)
+const char *read_gitfile_gently(const char *path, int *return_error_code)
 {
-	char *buf;
-	char *dir;
+	int error_code = 0;
+	char *buf = NULL;
+	char *dir = NULL;
 	const char *slash;
 	struct stat st;
 	int fd;
 	ssize_t len;
 
-	if (stat(path, &st))
-		return NULL;
-	if (!S_ISREG(st.st_mode))
-		return NULL;
+	if (stat(path, &st)) {
+		error_code = READ_GITFILE_ERR_STAT_FAILED;
+		goto cleanup_return;
+	}
+	if (!S_ISREG(st.st_mode)) {
+		error_code = READ_GITFILE_ERR_NOT_A_FILE;
+		goto cleanup_return;
+	}
 	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		die_errno("Error opening '%s'", path);
+	if (fd < 0) {
+		error_code = READ_GITFILE_ERR_OPEN_FAILED;
+		goto cleanup_return;
+	}
 	buf = xmalloc(st.st_size + 1);
 	len = read_in_full(fd, buf, st.st_size);
 	close(fd);
-	if (len != st.st_size)
-		die("Error reading %s", path);
+	if (len != st.st_size) {
+		error_code = READ_GITFILE_ERR_READ_FAILED;
+		goto cleanup_return;
+	}
 	buf[len] = '\0';
-	if (!starts_with(buf, "gitdir: "))
-		die("Invalid gitfile format: %s", path);
+	if (!starts_with(buf, "gitdir: ")) {
+		error_code = READ_GITFILE_ERR_INVALID_FORMAT;
+		goto cleanup_return;
+	}
 	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
 		len--;
-	if (len < 9)
-		die("No path in gitfile: %s", path);
+	if (len < 9) {
+		error_code = READ_GITFILE_ERR_NO_PATH;
+		goto cleanup_return;
+	}
 	buf[len] = '\0';
 	dir = buf + 8;
 
@@ -378,11 +396,41 @@ const char *read_gitfile(const char *path)
 		buf = dir;
 	}
 
-	if (!is_git_directory(dir))
-		die("Not a git repository: %s", dir);
+	if (!is_git_directory(dir)) {
+		error_code = READ_GITFILE_ERR_NOT_A_REPO;
+		goto cleanup_return;
+	}
 	path = real_path(dir);
 
+cleanup_return:
 	free(buf);
+
+	if (return_error_code)
+		*return_error_code = error_code;
+
+	if (error_code) {
+		if (return_error_code)
+			return NULL;
+
+		switch (error_code) {
+		case READ_GITFILE_ERR_STAT_FAILED:
+		case READ_GITFILE_ERR_NOT_A_FILE:
+			return NULL;
+		case READ_GITFILE_ERR_OPEN_FAILED:
+			die_errno("Error opening '%s'", path);
+		case READ_GITFILE_ERR_READ_FAILED:
+			die("Error reading %s", path);
+		case READ_GITFILE_ERR_INVALID_FORMAT:
+			die("Invalid gitfile format: %s", path);
+		case READ_GITFILE_ERR_NO_PATH:
+			die("No path in gitfile: %s", path);
+		case READ_GITFILE_ERR_NOT_A_REPO:
+			die("Not a git repository: %s", dir);
+		default:
+			assert(0);
+		}
+	}
+
 	return path;
 }
 
-- 
2.4.0.rc3.8.gbb31afb

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
  2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
@ 2015-04-26  6:49           ` Erik Elfström
  2015-04-28  6:02             ` Jeff King
  2015-04-29 15:42             ` Junio C Hamano
  2015-04-26  6:49           ` [PATCH v5 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
                             ` (2 subsequent siblings)
  4 siblings, 2 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-26  6:49 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

read_gitfile_gently will allocate a buffer to fit the entire file that
should be read. Add a sanity check of the file size before opening to
avoid allocating a potentially huge amount of memory if we come across
a large file that someone happened to name ".git". The limit is set to
a sufficiently unreasonable size that should never be exceeded by a
genuine .git file.

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 cache.h | 1 +
 setup.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/cache.h b/cache.h
index 868e4d3..c9f1f8e 100644
--- a/cache.h
+++ b/cache.h
@@ -439,6 +439,7 @@ extern const char *get_git_work_tree(void);
 #define READ_GITFILE_ERR_INVALID_FORMAT 5
 #define READ_GITFILE_ERR_NO_PATH 6
 #define READ_GITFILE_ERR_NOT_A_REPO 7
+#define READ_GITFILE_ERR_TOO_LARGE 8
 extern const char *read_gitfile_gently(const char *path, int *return_error_code);
 #define read_gitfile(path) read_gitfile_gently((path), NULL)
 extern const char *resolve_gitdir(const char *suspect);
diff --git a/setup.c b/setup.c
index c4538ca..792c37b 100644
--- a/setup.c
+++ b/setup.c
@@ -364,6 +364,10 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 		error_code = READ_GITFILE_ERR_OPEN_FAILED;
 		goto cleanup_return;
 	}
+	if (st.st_size > PATH_MAX * 4) {
+		error_code = READ_GITFILE_ERR_TOO_LARGE;
+		goto cleanup_return;
+	}
 	buf = xmalloc(st.st_size + 1);
 	len = read_in_full(fd, buf, st.st_size);
 	close(fd);
@@ -418,6 +422,8 @@ cleanup_return:
 			return NULL;
 		case READ_GITFILE_ERR_OPEN_FAILED:
 			die_errno("Error opening '%s'", path);
+		case READ_GITFILE_ERR_TOO_LARGE:
+			die("Too large to be a .git file: '%s'", path);
 		case READ_GITFILE_ERR_READ_FAILED:
 			die("Error reading %s", path);
 		case READ_GITFILE_ERR_INVALID_FORMAT:
-- 
2.4.0.rc3.8.gbb31afb

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 3/5] t7300: add tests to document behavior of clean and nested git
  2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
  2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
  2015-04-26  6:49           ` [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
@ 2015-04-26  6:49           ` Erik Elfström
  2015-04-26  6:49           ` [PATCH v5 4/5] p7300: add performance tests for clean Erik Elfström
  2015-04-26  6:49           ` [PATCH v5 5/5] clean: improve performance when removing lots of directories Erik Elfström
  4 siblings, 0 replies; 38+ messages in thread
From: Erik Elfström @ 2015-04-26  6:49 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 t/t7300-clean.sh | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 99be5d9..11f3a6d 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,6 +455,134 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
+test_expect_failure 'should clean things that almost look like git but are not' '
+	rm -fr almost_git almost_bare_git almost_submodule &&
+	mkdir -p almost_git/.git/objects &&
+	mkdir -p almost_git/.git/refs &&
+	cat >almost_git/.git/HEAD <<-\EOF &&
+	garbage
+	EOF
+	cp -r almost_git/.git/ almost_bare_git &&
+	mkdir almost_submodule/ &&
+	cat >almost_submodule/.git <<-\EOF &&
+	garbage
+	EOF
+	test_when_finished "rm -rf almost_*" &&
+	## This will fail due to die("Invalid gitfile format: %s", path); in
+	## setup.c:read_gitfile.
+	git clean -f -d &&
+	test_path_is_missing almost_git &&
+	test_path_is_missing almost_bare_git &&
+	test_path_is_missing almost_submodule
+'
+
+test_expect_success 'should not clean submodules' '
+	rm -fr repo to_clean sub1 sub2 &&
+	mkdir repo to_clean &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git submodule add ./repo/.git sub1 &&
+	git commit -m "sub1" &&
+	git branch before_sub2 &&
+	git submodule add ./repo/.git sub2 &&
+	git commit -m "sub2" &&
+	git checkout before_sub2 &&
+	>to_clean/should_clean.this &&
+	git clean -f -d &&
+	test_path_is_file repo/.git/index &&
+	test_path_is_file repo/hello.world &&
+	test_path_is_file sub1/.git &&
+	test_path_is_file sub1/hello.world &&
+	test_path_is_file sub2/.git &&
+	test_path_is_file sub2/hello.world &&
+	test_path_is_missing to_clean
+'
+
+test_expect_failure 'nested (empty) git should be kept' '
+	rm -fr empty_repo to_clean &&
+	git init empty_repo &&
+	mkdir to_clean &&
+	>to_clean/should_clean.this &&
+	git clean -f -d &&
+	test_path_is_file empty_repo/.git/HEAD &&
+	test_path_is_missing to_clean
+'
+
+test_expect_success 'nested bare repositories should be cleaned' '
+	rm -fr bare1 bare2 subdir &&
+	git init --bare bare1 &&
+	git clone --local --bare . bare2 &&
+	mkdir subdir &&
+	cp -r bare2 subdir/bare3 &&
+	git clean -f -d &&
+	test_path_is_missing bare1 &&
+	test_path_is_missing bare2 &&
+	test_path_is_missing subdir
+'
+
+test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+	rm -fr strange_bare &&
+	mkdir strange_bare &&
+	git init --bare strange_bare/.git &&
+	git clean -f -d &&
+	test_path_is_missing strange_bare
+'
+
+test_expect_failure 'nested (non-empty) bare repositories should be cleaned even when in .git' '
+	rm -fr strange_bare &&
+	mkdir strange_bare &&
+	git clone --local --bare . strange_bare/.git &&
+	git clean -f -d &&
+	test_path_is_missing strange_bare
+'
+
+test_expect_success 'giving path in nested git work tree will remove it' '
+	rm -fr repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		mkdir -p bar/baz &&
+		test_commit msg bar/baz/hello.world
+	) &&
+	git clean -f -d repo/bar/baz &&
+	test_path_is_file repo/.git/HEAD &&
+	test_path_is_dir repo/bar/ &&
+	test_path_is_missing repo/bar/baz
+'
+
+test_expect_success 'giving path to nested .git will not remove it' '
+	rm -fr repo &&
+	mkdir repo untracked &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git clean -f -d repo/.git &&
+	test_path_is_file repo/.git/HEAD &&
+	test_path_is_dir repo/.git/refs &&
+	test_path_is_dir repo/.git/objects &&
+	test_path_is_dir untracked/
+'
+
+test_expect_success 'giving path to nested .git/ will remove contents' '
+	rm -fr repo untracked &&
+	mkdir repo untracked &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git clean -f -d repo/.git/ &&
+	test_path_is_dir repo/.git &&
+	test_dir_is_empty repo/.git &&
+	test_path_is_dir untracked/
+'
+
 test_expect_success 'force removal of nested git work tree' '
 	rm -fr foo bar baz &&
 	mkdir -p foo bar baz/boo &&
-- 
2.4.0.rc3.8.gbb31afb

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 4/5] p7300: add performance tests for clean
  2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
                             ` (2 preceding siblings ...)
  2015-04-26  6:49           ` [PATCH v5 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
@ 2015-04-26  6:49           ` Erik Elfström
  2015-04-28  6:33             ` Jeff King
  2015-04-26  6:49           ` [PATCH v5 5/5] clean: improve performance when removing lots of directories Erik Elfström
  4 siblings, 1 reply; 38+ messages in thread
From: Erik Elfström @ 2015-04-26  6:49 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

The tests are run in dry-run mode to avoid having to restore the test
directories for each timed iteration. Using dry-run is an acceptable
compromise since we are mostly interested in the initial computation
of what to clean and not so much in the cleaning it self.

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 t/perf/p7300-clean.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100755 t/perf/p7300-clean.sh

diff --git a/t/perf/p7300-clean.sh b/t/perf/p7300-clean.sh
new file mode 100755
index 0000000..6ae55ec
--- /dev/null
+++ b/t/perf/p7300-clean.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description="Test git-clean performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+test_expect_success 'setup untracked directory with many sub dirs' '
+	rm -rf 500_sub_dirs 100000_sub_dirs clean_test_dir &&
+	mkdir 500_sub_dirs 100000_sub_dirs clean_test_dir &&
+	for i in $(test_seq 1 500)
+	do
+		mkdir 500_sub_dirs/dir$i || return $?
+	done &&
+	for i in $(test_seq 1 200)
+	do
+		cp -r 500_sub_dirs 100000_sub_dirs/dir$i || return $?
+	done
+'
+
+test_perf 'clean many untracked sub dirs, check for nested git' '
+	git clean -n -q -f -d 100000_sub_dirs/
+'
+
+test_perf 'clean many untracked sub dirs, ignore nested git' '
+	git clean -n -q -f -f -d 100000_sub_dirs/
+'
+
+test_done
-- 
2.4.0.rc3.8.gbb31afb

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 5/5] clean: improve performance when removing lots of directories
  2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
                             ` (3 preceding siblings ...)
  2015-04-26  6:49           ` [PATCH v5 4/5] p7300: add performance tests for clean Erik Elfström
@ 2015-04-26  6:49           ` Erik Elfström
  2015-04-28  6:24             ` Jeff King
  4 siblings, 1 reply; 38+ messages in thread
From: Erik Elfström @ 2015-04-26  6:49 UTC (permalink / raw)
  To: git, gitster; +Cc: Erik Elfström

"git clean" uses resolve_gitlink_ref() to check for the presence of
nested git repositories, but it has the drawback of creating a
ref_cache entry for every directory that should potentially be
cleaned. The linear search through the ref_cache list causes a massive
performance hit for large number of directories.

Modify clean.c:remove_dirs to use setup.c:is_git_directory and
setup.c:read_gitfile_gently instead.

Both these functions will open files and parse contents when they find
something that looks like a git repository. This is ok from a
performance standpoint since finding repository candidates should be
comparatively rare.

Using is_git_directory and read_gitfile_gently should give a more
standardized check for what is and what isn't a git repository but
also gives three behavioral changes.

The first change is that we will now detect and avoid cleaning empty
nested git repositories (only init run). This is desirable.

Second, we will no longer die when cleaning a file named ".git" with
garbage content (it will be cleaned instead). This is also desirable.

The last change is that we will detect and avoid cleaning empty bare
repositories that have been placed in a directory named ".git". This
is not desirable but should have no real user impact since we already
fail to clean non-empty bare repositories in the same scenario. This
is thus deemed acceptable.

Update t7300 to reflect these changes in behavior.

The time to clean an untracked directory containing 100000 sub
directories went from 61s to 1.7s after this change.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 builtin/clean.c  | 26 ++++++++++++++++++++++----
 t/t7300-clean.sh |  8 +++-----
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 98c103f..17a1c13 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -10,7 +10,6 @@
 #include "cache.h"
 #include "dir.h"
 #include "parse-options.h"
-#include "refs.h"
 #include "string-list.h"
 #include "quote.h"
 #include "column.h"
@@ -148,6 +147,27 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+/*
+ * Return 1 if the given path is the root of a git repository or
+ * submodule else 0. Will not return 1 for bare repositories with the
+ * exception of creating a bare repository in "foo/.git" and calling
+ * is_git_repository("foo").
+ */
+static int is_git_repository(struct strbuf *path)
+{
+	int ret = 0;
+	int unused_error_code;
+	size_t orig_path_len = path->len;
+	assert(orig_path_len != 0);
+	if (path->buf[orig_path_len - 1] != '/')
+		strbuf_addch(path, '/');
+	strbuf_addstr(path, ".git");
+	if (read_gitfile_gently(path->buf, &unused_error_code) || is_git_directory(path->buf))
+		ret = 1;
+	strbuf_setlen(path, orig_path_len);
+	return ret;
+}
+
 static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		int dry_run, int quiet, int *dir_gone)
 {
@@ -155,13 +175,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
 	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
-	unsigned char submodule_head[20];
 	struct string_list dels = STRING_LIST_INIT_DUP;
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
-			!resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 11f3a6d..4d07a4a 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
-test_expect_failure 'should clean things that almost look like git but are not' '
+test_expect_success 'should clean things that almost look like git but are not' '
 	rm -fr almost_git almost_bare_git almost_submodule &&
 	mkdir -p almost_git/.git/objects &&
 	mkdir -p almost_git/.git/refs &&
@@ -468,8 +468,6 @@ test_expect_failure 'should clean things that almost look like git but are not'
 	garbage
 	EOF
 	test_when_finished "rm -rf almost_*" &&
-	## This will fail due to die("Invalid gitfile format: %s", path); in
-	## setup.c:read_gitfile.
 	git clean -f -d &&
 	test_path_is_missing almost_git &&
 	test_path_is_missing almost_bare_git &&
@@ -501,7 +499,7 @@ test_expect_success 'should not clean submodules' '
 	test_path_is_missing to_clean
 '
 
-test_expect_failure 'nested (empty) git should be kept' '
+test_expect_success 'nested (empty) git should be kept' '
 	rm -fr empty_repo to_clean &&
 	git init empty_repo &&
 	mkdir to_clean &&
@@ -523,7 +521,7 @@ test_expect_success 'nested bare repositories should be cleaned' '
 	test_path_is_missing subdir
 '
 
-test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+test_expect_failure 'nested (empty) bare repositories should be cleaned even when in .git' '
 	rm -fr strange_bare &&
 	mkdir strange_bare &&
 	git init --bare strange_bare/.git &&
-- 
2.4.0.rc3.8.gbb31afb

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-26  6:49           ` [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
@ 2015-04-28  6:02             ` Jeff King
  2015-04-28  7:21               ` Windows path limites, was " Johannes Schindelin
  2015-04-28 19:28               ` erik elfström
  2015-04-29 15:42             ` Junio C Hamano
  1 sibling, 2 replies; 38+ messages in thread
From: Jeff King @ 2015-04-28  6:02 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git, gitster

On Sun, Apr 26, 2015 at 08:49:42AM +0200, Erik Elfström wrote:

> read_gitfile_gently will allocate a buffer to fit the entire file that
> should be read. Add a sanity check of the file size before opening to
> avoid allocating a potentially huge amount of memory if we come across
> a large file that someone happened to name ".git". The limit is set to
> a sufficiently unreasonable size that should never be exceeded by a
> genuine .git file.
>
> [...]
> +	if (st.st_size > PATH_MAX * 4) {
> +		error_code = READ_GITFILE_ERR_TOO_LARGE;
> +		goto cleanup_return;
> +	}

My understanding is that PATH_MAX is set absurdly low on Windows
systems (and doesn't actually represent the real limit of a path!).
Since the value is picked arbitrarily anyway, could use something more
independent (like 100K or something, which is large enough to be beyond
absurd and small enough that a malloc isn't a big deal)?

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
@ 2015-04-28  6:17             ` Jeff King
  2015-04-28 20:07               ` erik elfström
  2015-04-29 23:47             ` Stefan Beller
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff King @ 2015-04-28  6:17 UTC (permalink / raw)
  To: Erik Elfström; +Cc: Jonathan Nieder, git, gitster

On Sun, Apr 26, 2015 at 08:49:41AM +0200, Erik Elfström wrote:

> -extern const char *read_gitfile(const char *path);
> +
> +#define READ_GITFILE_ERR_STAT_FAILED 1
> +#define READ_GITFILE_ERR_NOT_A_FILE 2
> +#define READ_GITFILE_ERR_OPEN_FAILED 3
> +#define READ_GITFILE_ERR_READ_FAILED 4
> +#define READ_GITFILE_ERR_INVALID_FORMAT 5
> +#define READ_GITFILE_ERR_NO_PATH 6
> +#define READ_GITFILE_ERR_NOT_A_REPO 7
> +extern const char *read_gitfile_gently(const char *path, int *return_error_code);

There was a discussion not too long ago on strategies for returning
errors, and one of the suggestions was to return an "error strbuf"
rather than a code[1]. That's less flexible, as the caller can't react
differently based on the type of error. But for cases like this, where
the only fate for the code is to get converted back into a message,
it can reduce the boilerplate.

What you have here is OK to me, and I don't want to hold up your patch
series in a flamewar about error-reporting techniques. But I think it's
an interesting case study.

-Peff

[1] The original thread was here:

      http://thread.gmane.org/gmane.comp.version-control.git/259695/focus=260722

    I'm still a little wary of the allocation boilerplate introduced by
    the strbuf approach. But in this case it would not be too bad, I
    think.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 5/5] clean: improve performance when removing lots of directories
  2015-04-26  6:49           ` [PATCH v5 5/5] clean: improve performance when removing lots of directories Erik Elfström
@ 2015-04-28  6:24             ` Jeff King
  2015-04-28 20:31               ` erik elfström
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2015-04-28  6:24 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git, gitster

On Sun, Apr 26, 2015 at 08:49:45AM +0200, Erik Elfström wrote:

> +/*
> + * Return 1 if the given path is the root of a git repository or
> + * submodule else 0. Will not return 1 for bare repositories with the
> + * exception of creating a bare repository in "foo/.git" and calling
> + * is_git_repository("foo").
> + */
> +static int is_git_repository(struct strbuf *path)
> +{
> +	int ret = 0;
> +	int unused_error_code;
> +	size_t orig_path_len = path->len;
> +	assert(orig_path_len != 0);
> +	if (path->buf[orig_path_len - 1] != '/')
> +		strbuf_addch(path, '/');
> +	strbuf_addstr(path, ".git");
> +	if (read_gitfile_gently(path->buf, &unused_error_code) || is_git_directory(path->buf))
> +		ret = 1;
> +	strbuf_setlen(path, orig_path_len);
> +	return ret;
> +}

This iteration looks reasonable overall to me.

Should this is_git_repository() helper be available to other files? I
think there are other calls to resolve_gitlink_ref() that would want the
same treatment (e.g., I think "git status" may have a similar issue).

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 4/5] p7300: add performance tests for clean
  2015-04-26  6:49           ` [PATCH v5 4/5] p7300: add performance tests for clean Erik Elfström
@ 2015-04-28  6:33             ` Jeff King
  2015-04-28 19:36               ` erik elfström
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2015-04-28  6:33 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git, gitster

On Sun, Apr 26, 2015 at 08:49:44AM +0200, Erik Elfström wrote:

> diff --git a/t/perf/p7300-clean.sh b/t/perf/p7300-clean.sh
> new file mode 100755
> index 0000000..6ae55ec
> --- /dev/null
> +++ b/t/perf/p7300-clean.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +
> +test_description="Test git-clean performance"
> +
> +. ./perf-lib.sh
> +
> +test_perf_large_repo

Do we actually need a large repo here? The real cost is coming from the
directories we create. We could actually start with a totally empty
repository if we wanted (though I don't think the t/perf system handles
that right now). But if there's not a reason to use the large repo, I
think using test_perf_default_repo is better, as it works out of the box
without specifying extra environment variables (well, it works either
way, but you get a nasty warning from perf-lib.sh).

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Windows path limites, was Re: [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-28  6:02             ` Jeff King
@ 2015-04-28  7:21               ` Johannes Schindelin
  2015-04-28 15:33                 ` Doug Kelly
  2015-04-28 19:28               ` erik elfström
  1 sibling, 1 reply; 38+ messages in thread
From: Johannes Schindelin @ 2015-04-28  7:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Erik Elfström, git, gitster

Hi Peff,

On 2015-04-28 08:02, Jeff King wrote:

> My understanding is that PATH_MAX is set absurdly low on Windows
> systems (and doesn't actually represent the real limit of a path!).

Well, yes and no. Yes, it is absurdly low on Windows, and yes, it is not the real limit of a path *if you know how to work around it*. Most Win32 API calls actually do have that absurdly low limit, but internally longer paths can be represented (and there is a hack where you prefix the path -- which must be an absolute one for this hack -- by `\\?\`). Keep in mind, though, that even the Windows Explorer is (at least sometimes) limited by the absurdly low path limit.

Note that we have support for `core.longpaths` in Git for Windows (to be submitted just after I get 2.x out), but it is disabled by default because of the Windows Explorer issue.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Windows path limites, was Re: [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-28  7:21               ` Windows path limites, was " Johannes Schindelin
@ 2015-04-28 15:33                 ` Doug Kelly
  2015-04-28 16:20                   ` Windows path limits, " Johannes Schindelin
  0 siblings, 1 reply; 38+ messages in thread
From: Doug Kelly @ 2015-04-28 15:33 UTC (permalink / raw)
  To: Johannes Schindelin, Jeff King
  Cc: Erik Elfström, Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 2:23 AM Johannes Schindelin
<johannes.schindelin@gmx.de> wrote:
>
> Hi Peff,
>
> On 2015-04-28 08:02, Jeff King wrote:
>
> > My understanding is that PATH_MAX is set absurdly low on Windows
> > systems (and doesn't actually represent the real limit of a path!).
>
> Well, yes and no. Yes, it is absurdly low on Windows, and yes, it is not the real limit of a path *if you know how to work around it*. Most Win32 API calls actually do have that absurdly low limit, but internally longer paths can be represented (and there is a hack where you prefix the path -- which must be an absolute one for this hack -- by `\\?\`). Keep in mind, though, that even the Windows Explorer is (at least sometimes) limited by the absurdly low path limit.
>

One more thing worth noting is that path lengths are actually restricted further
within git_path_submodule(), I would presume to reserve some amount of space
when cloning a submodule (it seems to set aside 100 characters)?  For other
cases, yes, Dscho is absolutely right: the MAX_PATH constant is 260 (which gives
something like 256 usable characters).  If you're able to do everything through
the Unicode Win32 APIs, you can reach 65535 characters, assuming the filesystem
supports it (NTFS does, FAT32 would not, for example).  I recall there being one
function (possibly thinking of mktemp) that couldn't properly handle the long
paths, but I believe that was it.

Other apps may or may not handle the longer paths well; the core.longpaths
switch may prevent mishaps in more than just Explorer, but the downside isn't
disaster always either -- for example, Explorer just refuses to browse
to that path.
(Note that other apps may include both the msys bits as well as the Windows
command line built-in commands, which may make things that farm out to sh
behave in unusual ways.  I vaguely remember testing this myself, but I
don't recall
what I did exactly or what happened.)

--Doug

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Windows path limits, was Re: [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-28 15:33                 ` Doug Kelly
@ 2015-04-28 16:20                   ` Johannes Schindelin
  0 siblings, 0 replies; 38+ messages in thread
From: Johannes Schindelin @ 2015-04-28 16:20 UTC (permalink / raw)
  To: Doug Kelly; +Cc: Jeff King, Erik Elfström, Git List, Junio C Hamano

Hi,

On 2015-04-28 17:33, Doug Kelly wrote:

> If you're able to do everything through the Unicode Win32 APIs, you can reach 65535 characters, assuming the filesystem supports it (NTFS does, FAT32 would not, for example).  I recall there being one function (possibly thinking of mktemp) that couldn't properly handle the long paths, but I believe that was it.

IIRC the MSDN documentation of at least one of the directory-related functions was not quite guaranteeing that it would support long paths with all Windows versions, either.

> Other apps may or may not handle the longer paths well; the core.longpaths
> switch may prevent mishaps in more than just Explorer, but the downside isn't
> disaster always either -- for example, Explorer just refuses to browse
> to that path.

The problem comes from a different angle, though: I had *a lot* of fun when working on the tests for the `core.longpaths` feature because our very own `rm.exe` would not handle the long paths. As a consequence, quite a few tests failed to remove the trash directories of previous, failed test runs.

In such a case you will have to resort to use different tools than you are used to, and *that* is the reason why we do not enable that feature by default.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-28  6:02             ` Jeff King
  2015-04-28  7:21               ` Windows path limites, was " Johannes Schindelin
@ 2015-04-28 19:28               ` erik elfström
  1 sibling, 0 replies; 38+ messages in thread
From: erik elfström @ 2015-04-28 19:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 8:02 AM, Jeff King <peff@peff.net> wrote:
>
> My understanding is that PATH_MAX is set absurdly low on Windows
> systems (and doesn't actually represent the real limit of a path!).
> Since the value is picked arbitrarily anyway, could use something more
> independent (like 100K or something, which is large enough to be beyond
> absurd and small enough that a malloc isn't a big deal)?
>
> -Peff

I'm happy to set the limit to anything that makes everybody feel safe. I'll set
it to 1MB to be on the safe side.

I'm not sure though how the code (in general) is supposed to keep working if
a path can exceed PATH_MAX? A cursory search for PATH_MAX comes
up with char array sizes and check-and-die kind of things. If a path is longer
then surely we will be unable to handle it and abort in all sorts of places?

Are you only worried we might have a submodule with a too long path (that will
create various other problems in different codepaths) that we may mistakenly
clean (if it doesn't trigger any other abort earlier in the clean call
chain) or do
you want clean to keep working and do the right thing even in this case?

While digging around looking at this I also noticed that there is
another problem
I have overlooked previously.

read_gitfile_gently will call is_git_directory at the very end and it
contains the
following check at the very beginning:

    if (PATH_MAX <= len + strlen("/objects"))
        die("Too long path: %.*s", 60, suspect);

Now, this is good in the way that we will avoid mistakenly cleaning
stuff because
the path is too long but also bad because it makes read_gitfile_gently
behave very
ungently in this case.

I suspect I should make a gentle version of this also. The question is
what to do
in clean if the path is reported as too long? Abort? Avoid cleaning it
to be safe?
Ignore and clean it?

is_git_directory is also called from the new is_git_repository
directly but here I think
dying is ok since this path is a path in the working tree and if we
can't handle the paths
in the tree then there seem to be little point in trying to go on (as
opposed to when
some string in a file is too large for a path)

Thoughts?

/Erik

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 4/5] p7300: add performance tests for clean
  2015-04-28  6:33             ` Jeff King
@ 2015-04-28 19:36               ` erik elfström
  0 siblings, 0 replies; 38+ messages in thread
From: erik elfström @ 2015-04-28 19:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 8:33 AM, Jeff King <peff@peff.net> wrote:
>
> Do we actually need a large repo here? The real cost is coming from the
> directories we create. We could actually start with a totally empty
> repository if we wanted (though I don't think the t/perf system handles
> that right now). But if there's not a reason to use the large repo, I
> think using test_perf_default_repo is better, as it works out of the box
> without specifying extra environment variables (well, it works either
> way, but you get a nasty warning from perf-lib.sh).
>
> -Peff

Sure, I'll change this to the default. I wasn't sure about the
performance characteristics of clean (does working tree size matter?)
when I started so it felt safest to go with a large repo.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28  6:17             ` Jeff King
@ 2015-04-28 20:07               ` erik elfström
  2015-04-28 20:19                 ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: erik elfström @ 2015-04-28 20:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 8:17 AM, Jeff King <peff@peff.net> wrote:
>
> There was a discussion not too long ago on strategies for returning
> errors, and one of the suggestions was to return an "error strbuf"
> rather than a code[1]. That's less flexible, as the caller can't react
> differently based on the type of error. But for cases like this, where
> the only fate for the code is to get converted back into a message,
> it can reduce the boilerplate.
>
> What you have here is OK to me, and I don't want to hold up your patch
> series in a flamewar about error-reporting techniques. But I think it's
> an interesting case study.
>
> -Peff

Thanks. I haven't had time to look through that thread yet, I'll try
to get to that later.

My initial reaction is a bit skeptical though. For this case we
currently don't want any error reporting, the NULL return is
sufficient and even allocating and sending in the int* is pure noise.
Allocating and releasing a strbuf seems like a lot more overhead for
this type of caller? The one other potential candidate caller for
read_gitfile_gently that I have seen (clone.c:get_repo_path) don't
want any error code or message either as far as i can tell.

Also if it turns out that we actually need to treat the "file too
large" error differently in clean (as discussed in thread on the file
size check) then we can no longer communicate that back using the
strbuf interface.

Overall it seems like a less attractive solution to me but I can be
persuaded if there is more support expressed for the strbuf direction.

/Erik

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:07               ` erik elfström
@ 2015-04-28 20:19                 ` Jeff King
  2015-04-28 20:34                   ` Jonathan Nieder
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2015-04-28 20:19 UTC (permalink / raw)
  To: erik elfström; +Cc: Jonathan Nieder, Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 10:07:43PM +0200, erik elfström wrote:

> On Tue, Apr 28, 2015 at 8:17 AM, Jeff King <peff@peff.net> wrote:
> >
> > There was a discussion not too long ago on strategies for returning
> > errors, and one of the suggestions was to return an "error strbuf"
> > rather than a code[1]. That's less flexible, as the caller can't react
> > differently based on the type of error. But for cases like this, where
> > the only fate for the code is to get converted back into a message,
> > it can reduce the boilerplate.
> >
> > What you have here is OK to me, and I don't want to hold up your patch
> > series in a flamewar about error-reporting techniques. But I think it's
> > an interesting case study.
> >
> > -Peff
> 
> Thanks. I haven't had time to look through that thread yet, I'll try
> to get to that later.
> 
> My initial reaction is a bit skeptical though. For this case we
> currently don't want any error reporting, the NULL return is
> sufficient and even allocating and sending in the int* is pure noise.
> Allocating and releasing a strbuf seems like a lot more overhead for
> this type of caller? The one other potential candidate caller for
> read_gitfile_gently that I have seen (clone.c:get_repo_path) don't
> want any error code or message either as far as i can tell.

I had envisioned that the strbuf would be optional. I.e., you would
have:

  /* like error(), but dump the message in a strbuf instead of stderr */
  int error_buf(struct strbuf *buf, const char *fmt, ...)
  {
	if (buf) {
		va_list ap;
		va_start(ap, fmt);
		strbuf_vaddf(buf, fmt, ap);
		va_end(ap);
	}
	return -1;
  }

and then in the error-reporting function:

  const char *read_gitfile_gently(const char *path, struct strbuf *err)
  {
	...
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		error_buf(err, "unable to open %s: %s", path, strerror(errno));
		return NULL; /* or goto cleanup if necessary */
	}
  }

and then one caller can do:

  if (!read_gitfile_gently(path, NULL)) {
	/* we know there was an error, but we did not ask for details */
	...
  }

and the non-gentle read_gitfile() becomes:

  const char *read_gitfile(const char *path)
  {
	struct strbuf err = STRBUF_INIT;
	const char *ret = read_gitfile_gently(path, &err);
	if (!ret)
		die("%s", err.buf);
	/* no need to free err; if there was no error, nothing was written */
	return path;
  }

Note that the "return -1" from error_buf() is not useful here, but it
might be used as a shortcut in other situations (e.g., the same places
we call "return error()" now).

> Also if it turns out that we actually need to treat the "file too
> large" error differently in clean (as discussed in thread on the file
> size check) then we can no longer communicate that back using the
> strbuf interface.

Yeah, agreed. This system breaks down as soon as you need to
programatically know which error happened.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 5/5] clean: improve performance when removing lots of directories
  2015-04-28  6:24             ` Jeff King
@ 2015-04-28 20:31               ` erik elfström
  0 siblings, 0 replies; 38+ messages in thread
From: erik elfström @ 2015-04-28 20:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 8:24 AM, Jeff King <peff@peff.net> wrote:
>
> This iteration looks reasonable overall to me.
>
> Should this is_git_repository() helper be available to other files? I
> think there are other calls to resolve_gitlink_ref() that would want the
> same treatment (e.g., I think "git status" may have a similar issue).
>
> -Peff

Yes that is probably the direction to go in but I think the current
version is too tailored to the clean case to be generally useful
(although I haven't check the status case). Right now this is more of
a is_this_a_repo_clean_should_care_about_ignoring_some_cornercase_quirks
than a true is_git_repository check.

I would think a general version would look more like this:

static int is_git_repository(struct strbuf *path)
{
    if (is_bare_repository(path))
        return BARE_REPO;
    if (is_submodule(path))
        return SUBMODULE;
    if (is_git_director(path))
        return REPO;
    return 0;
}

probably also communicating any errors back to the caller.

To sum it up I think you are correct in the direction but I suspect
that the current version is not sufficient for the general case and
that it would be best to leave the generalization and making it public
for future work.

/Erik

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:19                 ` Jeff King
@ 2015-04-28 20:34                   ` Jonathan Nieder
  2015-04-28 20:36                     ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Jonathan Nieder @ 2015-04-28 20:34 UTC (permalink / raw)
  To: Jeff King; +Cc: erik elfström, Git List, Junio C Hamano

Jeff King wrote:
> On Tue, Apr 28, 2015 at 10:07:43PM +0200, erik elfström wrote:

>> Also if it turns out that we actually need to treat the "file too
>> large" error differently in clean (as discussed in thread on the file
>> size check) then we can no longer communicate that back using the
>> strbuf interface.
>
> Yeah, agreed. This system breaks down as soon as you need to
> programatically know which error happened.

On the contrary: it separates the information that is used
programatically and the information intended for the user.

The return value (or an int * parameter) distinguishes errors that
affect control flow.  A string can provide information for the user.

This way it is easy to tweak the information that the user sees in
one place, without changing control flow.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:34                   ` Jonathan Nieder
@ 2015-04-28 20:36                     ` Jeff King
  2015-04-28 20:42                       ` Jonathan Nieder
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2015-04-28 20:36 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: erik elfström, Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 01:34:00PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Tue, Apr 28, 2015 at 10:07:43PM +0200, erik elfström wrote:
> 
> >> Also if it turns out that we actually need to treat the "file too
> >> large" error differently in clean (as discussed in thread on the file
> >> size check) then we can no longer communicate that back using the
> >> strbuf interface.
> >
> > Yeah, agreed. This system breaks down as soon as you need to
> > programatically know which error happened.
> 
> On the contrary: it separates the information that is used
> programatically and the information intended for the user.
> 
> The return value (or an int * parameter) distinguishes errors that
> affect control flow.  A string can provide information for the user.
> 
> This way it is easy to tweak the information that the user sees in
> one place, without changing control flow.

But the NULL does not carry the information about _which_ error, and
Erik is suggesting that the caller may need to change behavior based on
that information. IOW, his current patch (return NULL and set the
specific integer code in a variable) allows this, but switching the
integer code out for a human-readable strbuf does not.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:36                     ` Jeff King
@ 2015-04-28 20:42                       ` Jonathan Nieder
  2015-04-28 20:48                         ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Jonathan Nieder @ 2015-04-28 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: erik elfström, Git List, Junio C Hamano

Jeff King wrote:

> But the NULL does not carry the information about _which_ error, and
> Erik is suggesting that the caller may need to change behavior based on
> that information. IOW, his current patch (return NULL and set the
> specific integer code in a variable) allows this, but switching the
> integer code out for a human-readable strbuf does not.

Right.  Two ways to handle that are:

 - two out parameters: an integer code and a human-readable string, or

 - an integer code returned using the ERR_PTR method and a
   human-readable string out parameter

The point I was trying to make clear is that the human-readable error
message and the integer to affect control flow have different purposes
and are not tightly related except in that they are produced at the
same time.

Using an integer to convey the human-readable error message (like errno
does) unnecessarily forces the message to be more vague (by not being
parameterized, and by always using the same error message for the same
integer error).

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:42                       ` Jonathan Nieder
@ 2015-04-28 20:48                         ` Jeff King
  2015-04-28 21:06                           ` Jonathan Nieder
  2015-04-28 23:34                           ` Junio C Hamano
  0 siblings, 2 replies; 38+ messages in thread
From: Jeff King @ 2015-04-28 20:48 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: erik elfström, Git List, Junio C Hamano

On Tue, Apr 28, 2015 at 01:42:13PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > But the NULL does not carry the information about _which_ error, and
> > Erik is suggesting that the caller may need to change behavior based on
> > that information. IOW, his current patch (return NULL and set the
> > specific integer code in a variable) allows this, but switching the
> > integer code out for a human-readable strbuf does not.
> 
> Right.  Two ways to handle that are:

Sure, but "this system" that I was referring to one was not one of
those ways. :)

>  - two out parameters: an integer code and a human-readable string, or
> 
>  - an integer code returned using the ERR_PTR method and a
>    human-readable string out parameter
> 
> The point I was trying to make clear is that the human-readable error
> message and the integer to affect control flow have different purposes
> and are not tightly related except in that they are produced at the
> same time.
> 
> Using an integer to convey the human-readable error message (like errno
> does) unnecessarily forces the message to be more vague (by not being
> parameterized, and by always using the same error message for the same
> integer error).

Yes, I agree converting the integer back into a string later does not
always carry all of the data. OTOH, the caller can often supply the
context (i.e., this is basically how "errno" works). This gets back to
the idea we discussed a while ago of having a "struct error" that can
carry the code and the parameters (or if you like, the code and a
finished error message). It just feels in some ways like that ends up
with the worst of both worlds (a lot of boilerplate for integer codes,
_and_ the allocation and cleanup issues associated with a string error
message).

I dunno. I'd be interested to see a rough draft of an idea applied to a
specific callsite.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:48                         ` Jeff King
@ 2015-04-28 21:06                           ` Jonathan Nieder
  2015-04-28 23:34                           ` Junio C Hamano
  1 sibling, 0 replies; 38+ messages in thread
From: Jonathan Nieder @ 2015-04-28 21:06 UTC (permalink / raw)
  To: Jeff King; +Cc: erik elfström, Git List, Junio C Hamano

Jeff King wrote:
> On Tue, Apr 28, 2015 at 01:42:13PM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> But the NULL does not carry the information about _which_ error, and
>>> Erik is suggesting that the caller may need to change behavior based on
>>> that information. IOW, his current patch (return NULL and set the
>>> specific integer code in a variable) allows this, but switching the
>>> integer code out for a human-readable strbuf does not.
>>
>> Right.  Two ways to handle that are:
>
> Sure, but "this system" that I was referring to one was not one of
> those ways. :)

That means I wasn't communicating well.

I never advocated *not* providing an integer return code when you want
to affect control flow.

An example of a function that provides information both to the caller
and user and doesn't conflate the two is ref_transaction_commit.  It
returns TRANSACTION_NAME_CONFLICT for a certain kind of recoverable
error and TRANSACTION_GENERIC_ERROR for all other errors.  In either
case, the 'err' parameter is populated with information intended for
the user.

Sometimes you might want to affect control flow without providing
information for the user.  Then there's no need for a string output
parameter and you can just provide the int.

At other times you might want to provide information for the user
without affecting control flow.  Then you can provide a string without
the int.

Providing error codes when you also want to return a string or struct
pointer is an unrelated problem.  One runs into that problem even if
only intending to provide a return code.  Possible approaches, like
mentioned before, are

 - an int * parameter to pass back the return code
 - using ERR_PTR to embed error codes in the pointer space

There is another approach which is sometimes nicer:

 - return int and pass the string or struct pointer through an output
   paramter

Thanks and sorry for the lack of clarity,
Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-28 20:48                         ` Jeff King
  2015-04-28 21:06                           ` Jonathan Nieder
@ 2015-04-28 23:34                           ` Junio C Hamano
  1 sibling, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2015-04-28 23:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, erik elfström, Git List

Jeff King <peff@peff.net> writes:

> Yes, I agree converting the integer back into a string later does not
> always carry all of the data. OTOH, the caller can often supply the
> context (i.e., this is basically how "errno" works). This gets back to
> the idea we discussed a while ago of having a "struct error" that can
> carry the code and the parameters (or if you like, the code and a
> finished error message). It just feels in some ways like that ends up
> with the worst of both worlds (a lot of boilerplate for integer codes,
> _and_ the allocation and cleanup issues associated with a string error
> message).
>
> I dunno. I'd be interested to see a rough draft of an idea applied to a
> specific callsite.

For this specific callsite, I think the error code alone is the
right way forward. It conveys all information necessary out of the
callee back to the caller, so that the caller, when it decides to do
so later, can turn error code to human readable form.

For some callsites, there may be cases where carrying a centrally
produced string may be a handy way to grab detailed error message
out of a deep call chain, but even in such a case, it is likely that
we would need a separate error code, and a funtion that returns a
pointer as its primary return value would need "two" extra out
parameters, one for message and one for code.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently
  2015-04-26  6:49           ` [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
  2015-04-28  6:02             ` Jeff King
@ 2015-04-29 15:42             ` Junio C Hamano
  1 sibling, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2015-04-29 15:42 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git

Erik Elfström <erik.elfstrom@gmail.com> writes:

> diff --git a/setup.c b/setup.c
> index c4538ca..792c37b 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -364,6 +364,10 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
>  		error_code = READ_GITFILE_ERR_OPEN_FAILED;
>  		goto cleanup_return;
>  	}
> +	if (st.st_size > PATH_MAX * 4) {
> +		error_code = READ_GITFILE_ERR_TOO_LARGE;

You have fd already open for the path at this point.  When you
reroll, please do not forget to close(fd) here.

> +		goto cleanup_return;
> +	}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
  2015-04-28  6:17             ` Jeff King
@ 2015-04-29 23:47             ` Stefan Beller
  2015-04-30  1:35               ` Junio C Hamano
  1 sibling, 1 reply; 38+ messages in thread
From: Stefan Beller @ 2015-04-29 23:47 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git, Junio C Hamano

On Sat, Apr 25, 2015 at 11:49 PM, Erik Elfström <erik.elfstrom@gmail.com> wrote:
> read_gitfile will die on most error cases. This makes it unsuitable
> for speculative calls. Extract the core logic and provide a gentle
> version that returns NULL on failure.
>
> The first usecase of the new gentle version will be to probe for
> submodules during git clean.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
> ---
>  cache.h | 11 ++++++++-
>  setup.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++--------------
>  2 files changed, 75 insertions(+), 18 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 3d3244b..868e4d3 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -431,7 +431,16 @@ extern int set_git_dir(const char *path);
>  extern const char *get_git_namespace(void);
>  extern const char *strip_namespace(const char *namespaced_ref);
>  extern const char *get_git_work_tree(void);
> -extern const char *read_gitfile(const char *path);
> +
> +#define READ_GITFILE_ERR_STAT_FAILED 1
> +#define READ_GITFILE_ERR_NOT_A_FILE 2
> +#define READ_GITFILE_ERR_OPEN_FAILED 3
> +#define READ_GITFILE_ERR_READ_FAILED 4
> +#define READ_GITFILE_ERR_INVALID_FORMAT 5
> +#define READ_GITFILE_ERR_NO_PATH 6
> +#define READ_GITFILE_ERR_NOT_A_REPO 7
> +extern const char *read_gitfile_gently(const char *path, int *return_error_code);
> +#define read_gitfile(path) read_gitfile_gently((path), NULL)
>  extern const char *resolve_gitdir(const char *suspect);
>  extern void set_git_work_tree(const char *tree);
>
> diff --git a/setup.c b/setup.c
> index 979b13f..c4538ca 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -335,35 +335,53 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
>  /*
>   * Try to read the location of the git directory from the .git file,
>   * return path to git directory if found.
> + *
> + * On failure, if return_error_code is not NULL, return_error_code
> + * will be set to an error code and NULL will be returned. If
> + * return_error_code is NULL the function will die instead (for most
> + * cases).
>   */
> -const char *read_gitfile(const char *path)
> +const char *read_gitfile_gently(const char *path, int *return_error_code)
>  {
> -       char *buf;
> -       char *dir;
> +       int error_code = 0;
> +       char *buf = NULL;
> +       char *dir = NULL;
>         const char *slash;
>         struct stat st;
>         int fd;
>         ssize_t len;
>
> -       if (stat(path, &st))
> -               return NULL;
> -       if (!S_ISREG(st.st_mode))
> -               return NULL;
> +       if (stat(path, &st)) {
> +               error_code = READ_GITFILE_ERR_STAT_FAILED;
> +               goto cleanup_return;
> +       }
> +       if (!S_ISREG(st.st_mode)) {
> +               error_code = READ_GITFILE_ERR_NOT_A_FILE;
> +               goto cleanup_return;
> +       }
>         fd = open(path, O_RDONLY);
> -       if (fd < 0)
> -               die_errno("Error opening '%s'", path);
> +       if (fd < 0) {
> +               error_code = READ_GITFILE_ERR_OPEN_FAILED;
> +               goto cleanup_return;
> +       }
>         buf = xmalloc(st.st_size + 1);
>         len = read_in_full(fd, buf, st.st_size);
>         close(fd);
> -       if (len != st.st_size)
> -               die("Error reading %s", path);
> +       if (len != st.st_size) {
> +               error_code = READ_GITFILE_ERR_READ_FAILED;
> +               goto cleanup_return;

Sorry for the late review.

So when you jump from here to the cleanup, there is no close(fd) involved?
I think there are code paths now, which leak fd.

> +       }
>         buf[len] = '\0';
> -       if (!starts_with(buf, "gitdir: "))
> -               die("Invalid gitfile format: %s", path);
> +       if (!starts_with(buf, "gitdir: ")) {
> +               error_code = READ_GITFILE_ERR_INVALID_FORMAT;
> +               goto cleanup_return;
> +       }
>         while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
>                 len--;
> -       if (len < 9)
> -               die("No path in gitfile: %s", path);
> +       if (len < 9) {
> +               error_code = READ_GITFILE_ERR_NO_PATH;
> +               goto cleanup_return;
> +       }
>         buf[len] = '\0';
>         dir = buf + 8;
>
> @@ -378,11 +396,41 @@ const char *read_gitfile(const char *path)
>                 buf = dir;
>         }
>
> -       if (!is_git_directory(dir))
> -               die("Not a git repository: %s", dir);
> +       if (!is_git_directory(dir)) {
> +               error_code = READ_GITFILE_ERR_NOT_A_REPO;
> +               goto cleanup_return;
> +       }
>         path = real_path(dir);
>
> +cleanup_return:
>         free(buf);
> +
> +       if (return_error_code)
> +               *return_error_code = error_code;
> +
> +       if (error_code) {
> +               if (return_error_code)
> +                       return NULL;
> +
> +               switch (error_code) {
> +               case READ_GITFILE_ERR_STAT_FAILED:
> +               case READ_GITFILE_ERR_NOT_A_FILE:
> +                       return NULL;
> +               case READ_GITFILE_ERR_OPEN_FAILED:
> +                       die_errno("Error opening '%s'", path);
> +               case READ_GITFILE_ERR_READ_FAILED:
> +                       die("Error reading %s", path);
> +               case READ_GITFILE_ERR_INVALID_FORMAT:
> +                       die("Invalid gitfile format: %s", path);
> +               case READ_GITFILE_ERR_NO_PATH:
> +                       die("No path in gitfile: %s", path);
> +               case READ_GITFILE_ERR_NOT_A_REPO:
> +                       die("Not a git repository: %s", dir);
> +               default:
> +                       assert(0);
> +               }
> +       }
> +
>         return path;
>  }
>
> --
> 2.4.0.rc3.8.gbb31afb
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 1/5] setup: add gentle version of read_gitfile
  2015-04-29 23:47             ` Stefan Beller
@ 2015-04-30  1:35               ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2015-04-30  1:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Erik Elfström, git

Stefan Beller <sbeller@google.com> writes:

[administrivia: cull the parts of the original from your quote if
you are not addressing them]

>>         fd = open(path, O_RDONLY);
>> -       if (fd < 0)
>> -               die_errno("Error opening '%s'", path);
>> +       if (fd < 0) {
>> +               error_code = READ_GITFILE_ERR_OPEN_FAILED;
>> +               goto cleanup_return;
>> +       }
>>         buf = xmalloc(st.st_size + 1);
>>         len = read_in_full(fd, buf, st.st_size);
>>         close(fd);
>> -       if (len != st.st_size)
>> -               die("Error reading %s", path);
>> +       if (len != st.st_size) {
>> +               error_code = READ_GITFILE_ERR_READ_FAILED;
>> +               goto cleanup_return;
>
> Sorry for the late review.
>
> So when you jump from here to the cleanup, there is no close(fd) involved?
> I think there are code paths now, which leak fd.

This one comes _after_ close(fd), so there is no issue.

A later step does introduce an issue; see $gmane/267975.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-04-30  1:35 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
2015-04-25  9:06 ` [PATCH v4 1/5] setup: add gentle version of read_gitfile Erik Elfström
2015-04-25 16:51   ` Junio C Hamano
2015-04-25 16:54     ` Junio C Hamano
2015-04-25  9:06 ` [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
2015-04-25 16:47   ` Junio C Hamano
2015-04-25 17:59     ` Erik Elfström
2015-04-26  4:29       ` Junio C Hamano
2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
2015-04-28  6:17             ` Jeff King
2015-04-28 20:07               ` erik elfström
2015-04-28 20:19                 ` Jeff King
2015-04-28 20:34                   ` Jonathan Nieder
2015-04-28 20:36                     ` Jeff King
2015-04-28 20:42                       ` Jonathan Nieder
2015-04-28 20:48                         ` Jeff King
2015-04-28 21:06                           ` Jonathan Nieder
2015-04-28 23:34                           ` Junio C Hamano
2015-04-29 23:47             ` Stefan Beller
2015-04-30  1:35               ` Junio C Hamano
2015-04-26  6:49           ` [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
2015-04-28  6:02             ` Jeff King
2015-04-28  7:21               ` Windows path limites, was " Johannes Schindelin
2015-04-28 15:33                 ` Doug Kelly
2015-04-28 16:20                   ` Windows path limits, " Johannes Schindelin
2015-04-28 19:28               ` erik elfström
2015-04-29 15:42             ` Junio C Hamano
2015-04-26  6:49           ` [PATCH v5 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-26  6:49           ` [PATCH v5 4/5] p7300: add performance tests for clean Erik Elfström
2015-04-28  6:33             ` Jeff King
2015-04-28 19:36               ` erik elfström
2015-04-26  6:49           ` [PATCH v5 5/5] clean: improve performance when removing lots of directories Erik Elfström
2015-04-28  6:24             ` Jeff King
2015-04-28 20:31               ` erik elfström
2015-04-25  9:06 ` [PATCH v4 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-25  9:06 ` [PATCH v4 4/5] p7300: add performance tests for clean Erik Elfström
2015-04-25  9:06 ` [PATCH v4 5/5] clean: improve performance when removing lots of directories Erik Elfström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.