All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Erik Elfström" <erik.elfstrom@gmail.com>
To: git@vger.kernel.org
Cc: "Erik Elfström" <erik.elfstrom@gmail.com>
Subject: [PATCH v4 5/5] clean: improve performance when removing lots of directories
Date: Sat, 25 Apr 2015 11:06:41 +0200	[thread overview]
Message-ID: <1429952801-2646-6-git-send-email-erik.elfstrom@gmail.com> (raw)
In-Reply-To: <1429952801-2646-1-git-send-email-erik.elfstrom@gmail.com>

"git clean" uses resolve_gitlink_ref() to check for the presence of
nested git repositories, but it has the drawback of creating a
ref_cache entry for every directory that should potentially be
cleaned. The linear search through the ref_cache list causes a massive
performance hit for large number of directories.

Modify clean.c:remove_dirs to use setup.c:is_git_directory and
setup.c:read_gitfile_gently instead.

Both these functions will open files and parse contents when they find
something that looks like a git repository. This is ok from a
performance standpoint since finding repository candidates should be
comparatively rare.

Using is_git_directory and read_gitfile_gently should give a more
standardized check for what is and what isn't a git repository but
also gives three behavioral changes.

The first change is that we will now detect and avoid cleaning empty
nested git repositories (only init run). This is desirable.

Second, we will no longer die when cleaning a file named ".git" with
garbage content (it will be cleaned instead). This is also desirable.

The last change is that we will detect and avoid cleaning empty bare
repositories that have been placed in a directory named ".git". This
is not desirable but should have no real user impact since we already
fail to clean non-empty bare repositories in the same scenario. This
is thus deemed acceptable.

Update t7300 to reflect these changes in behavior.

The time to clean an untracked directory containing 100000 sub
directories went from 61s to 1.7s after this change.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 builtin/clean.c  | 26 ++++++++++++++++++++++----
 t/t7300-clean.sh |  8 +++-----
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 98c103f..17a1c13 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -10,7 +10,6 @@
 #include "cache.h"
 #include "dir.h"
 #include "parse-options.h"
-#include "refs.h"
 #include "string-list.h"
 #include "quote.h"
 #include "column.h"
@@ -148,6 +147,27 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+/*
+ * Return 1 if the given path is the root of a git repository or
+ * submodule else 0. Will not return 1 for bare repositories with the
+ * exception of creating a bare repository in "foo/.git" and calling
+ * is_git_repository("foo").
+ */
+static int is_git_repository(struct strbuf *path)
+{
+	int ret = 0;
+	int unused_error_code;
+	size_t orig_path_len = path->len;
+	assert(orig_path_len != 0);
+	if (path->buf[orig_path_len - 1] != '/')
+		strbuf_addch(path, '/');
+	strbuf_addstr(path, ".git");
+	if (read_gitfile_gently(path->buf, &unused_error_code) || is_git_directory(path->buf))
+		ret = 1;
+	strbuf_setlen(path, orig_path_len);
+	return ret;
+}
+
 static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		int dry_run, int quiet, int *dir_gone)
 {
@@ -155,13 +175,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
 	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
-	unsigned char submodule_head[20];
 	struct string_list dels = STRING_LIST_INIT_DUP;
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
-			!resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 11f3a6d..4d07a4a 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
-test_expect_failure 'should clean things that almost look like git but are not' '
+test_expect_success 'should clean things that almost look like git but are not' '
 	rm -fr almost_git almost_bare_git almost_submodule &&
 	mkdir -p almost_git/.git/objects &&
 	mkdir -p almost_git/.git/refs &&
@@ -468,8 +468,6 @@ test_expect_failure 'should clean things that almost look like git but are not'
 	garbage
 	EOF
 	test_when_finished "rm -rf almost_*" &&
-	## This will fail due to die("Invalid gitfile format: %s", path); in
-	## setup.c:read_gitfile.
 	git clean -f -d &&
 	test_path_is_missing almost_git &&
 	test_path_is_missing almost_bare_git &&
@@ -501,7 +499,7 @@ test_expect_success 'should not clean submodules' '
 	test_path_is_missing to_clean
 '
 
-test_expect_failure 'nested (empty) git should be kept' '
+test_expect_success 'nested (empty) git should be kept' '
 	rm -fr empty_repo to_clean &&
 	git init empty_repo &&
 	mkdir to_clean &&
@@ -523,7 +521,7 @@ test_expect_success 'nested bare repositories should be cleaned' '
 	test_path_is_missing subdir
 '
 
-test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+test_expect_failure 'nested (empty) bare repositories should be cleaned even when in .git' '
 	rm -fr strange_bare &&
 	mkdir strange_bare &&
 	git init --bare strange_bare/.git &&
-- 
2.4.0.rc3.8.g4ebd28d

      parent reply	other threads:[~2015-04-25  9:07 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-25  9:06 [PATCH v4 0/5] Improving performance of git clean Erik Elfström
2015-04-25  9:06 ` [PATCH v4 1/5] setup: add gentle version of read_gitfile Erik Elfström
2015-04-25 16:51   ` Junio C Hamano
2015-04-25 16:54     ` Junio C Hamano
2015-04-25  9:06 ` [PATCH v4 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
2015-04-25 16:47   ` Junio C Hamano
2015-04-25 17:59     ` Erik Elfström
2015-04-26  4:29       ` Junio C Hamano
2015-04-26  6:49         ` [PATCH v5 0/5] Improving performance of git clean Erik Elfström
2015-04-26  6:49           ` [PATCH v5 1/5] setup: add gentle version of read_gitfile Erik Elfström
2015-04-28  6:17             ` Jeff King
2015-04-28 20:07               ` erik elfström
2015-04-28 20:19                 ` Jeff King
2015-04-28 20:34                   ` Jonathan Nieder
2015-04-28 20:36                     ` Jeff King
2015-04-28 20:42                       ` Jonathan Nieder
2015-04-28 20:48                         ` Jeff King
2015-04-28 21:06                           ` Jonathan Nieder
2015-04-28 23:34                           ` Junio C Hamano
2015-04-29 23:47             ` Stefan Beller
2015-04-30  1:35               ` Junio C Hamano
2015-04-26  6:49           ` [PATCH v5 2/5] setup: sanity check file size in read_gitfile_gently Erik Elfström
2015-04-28  6:02             ` Jeff King
2015-04-28  7:21               ` Windows path limites, was " Johannes Schindelin
2015-04-28 15:33                 ` Doug Kelly
2015-04-28 16:20                   ` Windows path limits, " Johannes Schindelin
2015-04-28 19:28               ` erik elfström
2015-04-29 15:42             ` Junio C Hamano
2015-04-26  6:49           ` [PATCH v5 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-26  6:49           ` [PATCH v5 4/5] p7300: add performance tests for clean Erik Elfström
2015-04-28  6:33             ` Jeff King
2015-04-28 19:36               ` erik elfström
2015-04-26  6:49           ` [PATCH v5 5/5] clean: improve performance when removing lots of directories Erik Elfström
2015-04-28  6:24             ` Jeff King
2015-04-28 20:31               ` erik elfström
2015-04-25  9:06 ` [PATCH v4 3/5] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-25  9:06 ` [PATCH v4 4/5] p7300: add performance tests for clean Erik Elfström
2015-04-25  9:06 ` Erik Elfström [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1429952801-2646-6-git-send-email-erik.elfstrom@gmail.com \
    --to=erik.elfstrom@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.