git.vger.kernel.org archive mirror
* [PATCH 00/20] Untracked cache to speed up "git status"
@ 2014-05-07 14:51 Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 01/20] dir.c: coding style fix Nguyễn Thái Ngọc Duy
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

First of all, thanks for pointing to many more big repos. I'll look at
them tomorrow. End-of-day report (or ranting :D) time.

The series now looks good enough for public eyes. I haven't run the
test suite with untracked cache on by default, so my confidence level
is not so high, although I suspect the racy timestamp issue will
practically disable the cache anyway.

The idea is as before: exploit directory mtime to cache untracked
files. MSDN tells me NTFS on Windows exposes the "good" dir mtime
behavior, which means this series could speed up Git on Windows (I
think Karsten's fscache only deals with slow lstat, untracked files..).
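
As a rough illustration of the directory mtime idea (not the code in
this series), the sketch below records a directory's mtime and later
checks whether it still matches. The names dir_stamp, dir_stamp_fill
and dir_stamp_matches are hypothetical, and the comparison uses
whole seconds only, so it ignores the racy timestamp problem
mentioned above.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cache record: the directory mtime seen at the last scan. */
struct dir_stamp {
	time_t mtime_sec;
	int valid;
};

/* Record the directory's current mtime. Returns 0 on success, -1 on failure. */
static int dir_stamp_fill(struct dir_stamp *ds, const char *path)
{
	struct stat st;

	assert(ds && path);
	ds->valid = 0;
	if (stat(path, &st) < 0 || !S_ISDIR(st.st_mode))
		return -1;
	ds->mtime_sec = st.st_mtime;
	ds->valid = 1;
	return 0;
}

/*
 * Returns 1 if the recorded mtime still matches (a cached listing
 * could be reused), 0 if the directory has to be scanned again.
 */
static int dir_stamp_matches(const struct dir_stamp *ds, const char *path)
{
	struct stat st;

	assert(ds && path);
	if (!ds->valid || stat(path, &st) < 0 || !S_ISDIR(st.st_mode))
		return 0;
	return ds->mtime_sec == st.st_mtime;
}
```

A caller would fill the stamp right after listing a directory and
call dir_stamp_matches() on the next run before deciding to skip the
opendir()/readdir() pass.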

It would be nice if Windows people could try and confirm this. This
could be a good point for untracked cache vs watchman (no Windows
support, last time I checked). Usage is very simple: run "git
update-index --untracked-cache" and you're ready. Use
--no-untracked-cache to revert.

The performance numbers on webkit look good if we focus on
read_directory time only. Normally it takes 890ms. The first run with
untracked cache goes up to 922ms (filling up the cache, not counting
index write time). The next run goes down to 184ms (best case). The
gain is about 80%. lstat on directories costs only about 20ms out of
that 184ms, so I still need to see if I can lower that number
further.

"git status" performance gain is less impressive of course. Only about
38%, with refresh time now becoming the top offender. With
core.preloadindex on, the gain increases to 50%. There's still room
for improvement to maybe make it to 65% by reducing read time, I think.

But again we may not stay in the best case forever. The more dirs are
damaged, the slower it gets. At the far end of the spectrum, where all
dirs are damaged, we gain nothing but overhead. This is actually when
watchman shines, although projects that do that may need some
improvements.

Another bad point for untracked cache is that the extension data is
so specific to the core git algorithm that it probably cannot be reused
by other implementations. Again, watchman shines here.

Last note, this series conflicts with split-index due to the
write_cache API change, so not a candidate for 'pu' yet. The series
could also be fetched from

https://github.com/pclouds/git/commits/untracked-cache

except the last few timing/experimental patches.

Nguyễn Thái Ngọc Duy (20):
  dir.c: coding style fix
  dir.h: move struct exclude declaration to top level
  prep_exclude: remove the artificial PATH_MAX limit
  dir.c: optionally compute sha-1 of a .gitignore file
  untracked cache: record .gitignore information and dir hierarchy
  untracked cache: initial untracked cache validation
  untracked cache: invalidate dirs recursively if .gitignore changes
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: don't open non-existent .gitignore
  untracked cache: save to an index extension
  untracked cache: load from UNTR index extension
  untracked cache: invalidate at index addition or removal
  untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED
  read-cache.c: split racy stat test to a separate function
  untracked cache: avoid racy timestamps
  status: support untracked cache
  update-index: manually enable or disable untracked cache
  update-index: test the system before enabling untracked cache
  t7063: tests for untracked cache

 .gitignore                                 |   1 +
 Makefile                                   |   1 +
 builtin/commit.c                           |   8 +
 builtin/update-index.c                     | 161 ++++++
 cache.h                                    |   5 +
 dir.c                                      | 853 +++++++++++++++++++++++++++--
 dir.h                                      | 120 +++-
 read-cache.c                               |  51 +-
 t/t7063-status-untracked-cache.sh (new +x) | 352 ++++++++++++
 test-dump-untracked-cache.c (new)          |  61 +++
 unpack-trees.c                             |   7 +-
 wt-status.c                                |   6 +
 12 files changed, 1537 insertions(+), 89 deletions(-)
 create mode 100755 t/t7063-status-untracked-cache.sh
 create mode 100644 test-dump-untracked-cache.c

-- 
1.9.1.346.ga2b5940


* [PATCH 01/20] dir.c: coding style fix
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 02/20] dir.h: move struct exclude declaration to top level Nguyễn Thái Ngọc Duy
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 dir.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index eb6f581..7a83f70 100644
--- a/dir.c
+++ b/dir.c
@@ -553,8 +553,7 @@ int add_excludes_from_file_to_list(const char *fname,
 			buf = xrealloc(buf, size+1);
 			buf[size++] = '\n';
 		}
-	}
-	else {
+	} else {
 		size = xsize_t(st.st_size);
 		if (size == 0) {
 			close(fd);
@@ -789,9 +788,11 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 
 	group = &dir->exclude_list_group[EXC_DIRS];
 
-	/* Pop the exclude lists from the EXCL_DIRS exclude_list_group
+	/*
+	 * Pop the exclude lists from the EXCL_DIRS exclude_list_group
 	 * which originate from directories not in the prefix of the
-	 * path being checked. */
+	 * path being checked.
+	 */
 	while ((stk = dir->exclude_stack) != NULL) {
 		if (stk->baselen <= baselen &&
 		    !strncmp(dir->basebuf, base, stk->baselen))
@@ -818,8 +819,7 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 		if (current < 0) {
 			cp = base;
 			current = 0;
-		}
-		else {
+		} else {
 			cp = strchr(base + current + 1, '/');
 			if (!cp)
 				die("oops in prep_exclude");
-- 
1.9.1.346.ga2b5940


* [PATCH 02/20] dir.h: move struct exclude declaration to top level
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 01/20] dir.c: coding style fix Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 03/20] prep_exclude: remove the artificial PATH_MAX limit Nguyễn Thái Ngọc Duy
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

There is no actual nested struct here. Move it out for clarity.
---
 dir.h | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/dir.h b/dir.h
index 55e5345..02e3710 100644
--- a/dir.h
+++ b/dir.h
@@ -15,6 +15,27 @@ struct dir_entry {
 #define EXC_FLAG_MUSTBEDIR 8
 #define EXC_FLAG_NEGATIVE 16
 
+struct exclude {
+	/*
+	 * This allows callers of last_exclude_matching() etc.
+	 * to determine the origin of the matching pattern.
+	 */
+	struct exclude_list *el;
+
+	const char *pattern;
+	int patternlen;
+	int nowildcardlen;
+	const char *base;
+	int baselen;
+	int flags;
+
+	/*
+	 * Counting starts from 1 for line numbers in ignore files,
+	 * and from -1 decrementing for patterns from CLI args.
+	 */
+	int srcpos;
+};
+
 /*
  * Each excludes file will be parsed into a fresh exclude_list which
  * is appended to the relevant exclude_list_group (either EXC_DIRS or
@@ -32,26 +53,7 @@ struct exclude_list {
 	/* origin of list, e.g. path to filename, or descriptive string */
 	const char *src;
 
-	struct exclude {
-		/*
-		 * This allows callers of last_exclude_matching() etc.
-		 * to determine the origin of the matching pattern.
-		 */
-		struct exclude_list *el;
-
-		const char *pattern;
-		int patternlen;
-		int nowildcardlen;
-		const char *base;
-		int baselen;
-		int flags;
-
-		/*
-		 * Counting starts from 1 for line numbers in ignore files,
-		 * and from -1 decrementing for patterns from CLI args.
-		 */
-		int srcpos;
-	} **excludes;
+	struct exclude **excludes;
 };
 
 /*
-- 
1.9.1.346.ga2b5940


* [PATCH 03/20] prep_exclude: remove the artificial PATH_MAX limit
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 01/20] dir.c: coding style fix Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 02/20] dir.h: move struct exclude declaration to top level Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 04/20] dir.c: optionally compute sha-1 of a .gitignore file Nguyễn Thái Ngọc Duy
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This also fixes the problem of silently ignoring .gitignore if the
full path exceeds PATH_MAX. Now add_excludes_from_file() will report
if it gets ENAMETOOLONG.
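
For readers unfamiliar with the approach, here is a minimal sketch of
a growable path buffer that replaces a fixed char[PATH_MAX] array.
It is not git's strbuf; pathbuf and its helpers are hypothetical
names used only to show why no length limit is needed once the
buffer can grow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable, always NUL-terminated path buffer. */
struct pathbuf {
	char *buf;
	size_t len, alloc;
};

/* Append n bytes from s, growing the allocation as needed. */
static int pathbuf_add(struct pathbuf *pb, const char *s, size_t n)
{
	assert(pb && (s || n == 0));
	if (pb->len + n + 1 > pb->alloc) {
		size_t want = pb->alloc ? pb->alloc * 2 : 64;
		char *p;

		while (want < pb->len + n + 1)
			want *= 2;
		p = realloc(pb->buf, want);
		if (!p)
			return -1;
		pb->buf = p;
		pb->alloc = want;
	}
	if (n)
		memcpy(pb->buf + pb->len, s, n);
	pb->len += n;
	pb->buf[pb->len] = '\0';
	return 0;
}

/* Truncate to len bytes; only shrinking is supported in this sketch. */
static void pathbuf_setlen(struct pathbuf *pb, size_t len)
{
	assert(pb && len <= pb->len);
	pb->len = len;
	if (pb->buf)
		pb->buf[len] = '\0';
}

/* Free the storage and reset to the empty state. */
static void pathbuf_release(struct pathbuf *pb)
{
	assert(pb);
	free(pb->buf);
	pb->buf = NULL;
	pb->len = pb->alloc = 0;
}
```

With a buffer like this, a path deeper than PATH_MAX is simply built
and handed to open(); if the system rejects it, the caller sees the
error instead of the file being skipped silently.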
---
 dir.c | 47 ++++++++++++++++++++++++++++-------------------
 dir.h |  2 +-
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/dir.c b/dir.c
index 7a83f70..c081754 100644
--- a/dir.c
+++ b/dir.c
@@ -795,12 +795,12 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 	 */
 	while ((stk = dir->exclude_stack) != NULL) {
 		if (stk->baselen <= baselen &&
-		    !strncmp(dir->basebuf, base, stk->baselen))
+		    !strncmp(dir->basebuf.buf, base, stk->baselen))
 			break;
 		el = &group->el[dir->exclude_stack->exclude_ix];
 		dir->exclude_stack = stk->prev;
 		dir->exclude = NULL;
-		free((char *)el->src); /* see strdup() below */
+		free((char *)el->src); /* see strbuf_detach() below */
 		clear_exclude_list(el);
 		free(stk);
 		group->nr--;
@@ -810,8 +810,17 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 	if (dir->exclude)
 		return;
 
+	/*
+	 * Lazy initialization. All call sites currently just
+	 * memset(dir, 0, sizeof(*dir)) before use. Changing all of
+	 * them seems lots of work for little benefit.
+	 */
+	if (!dir->basebuf.alloc)
+		strbuf_init(&dir->basebuf, PATH_MAX);
+
 	/* Read from the parent directories and push them down. */
 	current = stk ? stk->baselen : -1;
+	strbuf_setlen(&dir->basebuf, current < 0 ? 0 : current);
 	while (current < baselen) {
 		struct exclude_stack *stk = xcalloc(1, sizeof(*stk));
 		const char *cp;
@@ -829,48 +838,47 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 		stk->baselen = cp - base;
 		stk->exclude_ix = group->nr;
 		el = add_exclude_list(dir, EXC_DIRS, NULL);
-		memcpy(dir->basebuf + current, base + current,
-		       stk->baselen - current);
+		strbuf_add(&dir->basebuf, base + current, stk->baselen - current);
+		assert(stk->baselen == dir->basebuf.len);
 
 		/* Abort if the directory is excluded */
 		if (stk->baselen) {
 			int dt = DT_DIR;
-			dir->basebuf[stk->baselen - 1] = 0;
+			dir->basebuf.buf[stk->baselen - 1] = 0;
 			dir->exclude = last_exclude_matching_from_lists(dir,
-				dir->basebuf, stk->baselen - 1,
-				dir->basebuf + current, &dt);
-			dir->basebuf[stk->baselen - 1] = '/';
+				dir->basebuf.buf, stk->baselen - 1,
+				dir->basebuf.buf + current, &dt);
+			dir->basebuf.buf[stk->baselen - 1] = '/';
 			if (dir->exclude &&
 			    dir->exclude->flags & EXC_FLAG_NEGATIVE)
 				dir->exclude = NULL;
 			if (dir->exclude) {
-				dir->basebuf[stk->baselen] = 0;
 				dir->exclude_stack = stk;
 				return;
 			}
 		}
 
-		/* Try to read per-directory file unless path is too long */
-		if (dir->exclude_per_dir &&
-		    stk->baselen + strlen(dir->exclude_per_dir) < PATH_MAX) {
-			strcpy(dir->basebuf + stk->baselen,
-					dir->exclude_per_dir);
+		/* Try to read per-directory file */
+		if (dir->exclude_per_dir) {
 			/*
 			 * dir->basebuf gets reused by the traversal, but we
 			 * need fname to remain unchanged to ensure the src
 			 * member of each struct exclude correctly
 			 * back-references its source file.  Other invocations
 			 * of add_exclude_list provide stable strings, so we
-			 * strdup() and free() here in the caller.
+			 * strbuf_detach() and free() here in the caller.
 			 */
-			el->src = strdup(dir->basebuf);
-			add_excludes_from_file_to_list(dir->basebuf,
-					dir->basebuf, stk->baselen, el, 1);
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addbuf(&sb, &dir->basebuf);
+			strbuf_addstr(&sb, dir->exclude_per_dir);
+			el->src = strbuf_detach(&sb, NULL);
+			add_excludes_from_file_to_list(el->src, el->src,
+						       stk->baselen, el, 1);
 		}
 		dir->exclude_stack = stk;
 		current = stk->baselen;
 	}
-	dir->basebuf[baselen] = '\0';
+	strbuf_setlen(&dir->basebuf, baselen);
 }
 
 /*
@@ -1668,4 +1676,5 @@ void clear_directory(struct dir_struct *dir)
 		free(stk);
 		stk = prev;
 	}
+	strbuf_release(&dir->basebuf);
 }
diff --git a/dir.h b/dir.h
index 02e3710..6c45e9d 100644
--- a/dir.h
+++ b/dir.h
@@ -119,7 +119,7 @@ struct dir_struct {
 	 */
 	struct exclude_stack *exclude_stack;
 	struct exclude *exclude;
-	char basebuf[PATH_MAX];
+	struct strbuf basebuf;
 };
 
 /*
-- 
1.9.1.346.ga2b5940


* [PATCH 04/20] dir.c: optionally compute sha-1 of a .gitignore file
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 03/20] prep_exclude: remove the artificial PATH_MAX limit Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 05/20] untracked cache: record .gitignore information and dir hierarchy Nguyễn Thái Ngọc Duy
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This is not used anywhere yet. But the goal is to compare quickly if a
.gitignore file has changed when one has SHA-1 of the old .gitignore.
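
The "rehash only when stat data differs" part can be sketched as
follows. This is an illustration under assumptions, not the patch
itself: file_stamp is a hypothetical two-field stand-in for stat
data, and a 32-bit FNV-1a hash stands in for SHA-1 to keep the
example short. The index lookup shortcut in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stat data kept as a reference. */
struct file_stamp {
	long mtime;
	long size;
};

/* Counts how often the content really had to be hashed. */
static int rehash_count;

/* 32-bit FNV-1a, used here only as a stand-in for SHA-1. */
static uint32_t fnv1a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Return the content hash of buf. If the current stamp equals the
 * reference stamp, the reference hash is reused and buf is not
 * hashed. Either way the reference stamp is refreshed.
 */
static uint32_t hash_if_changed(const void *buf, size_t len,
				const struct file_stamp *now,
				struct file_stamp *ref_stamp,
				uint32_t *ref_hash)
{
	assert(buf && now && ref_stamp && ref_hash);
	if (ref_stamp->mtime != now->mtime || ref_stamp->size != now->size) {
		*ref_hash = fnv1a(buf, len);
		rehash_count++;
	}
	*ref_stamp = *now;
	return *ref_hash;
}
```

The reference stamp should start out with values no real file can
have (for example -1 for both fields), so that the first call always
hashes.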
---
 dir.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index c081754..e2edeca 100644
--- a/dir.c
+++ b/dir.c
@@ -466,7 +466,8 @@ void add_exclude(const char *string, const char *base,
 	x->el = el;
 }
 
-static void *read_skip_worktree_file_from_index(const char *path, size_t *size)
+static void *read_skip_worktree_file_from_index(const char *path, size_t *size,
+						unsigned char *sha1)
 {
 	int pos, len;
 	unsigned long sz;
@@ -485,6 +486,8 @@ static void *read_skip_worktree_file_from_index(const char *path, size_t *size)
 		return NULL;
 	}
 	*size = xsize_t(sz);
+	if (sha1)
+		hashcpy(sha1, active_cache[pos]->sha1);
 	return data;
 }
 
@@ -525,11 +528,14 @@ static void trim_trailing_spaces(char *buf)
 		buf[last_space] = '\0';
 }
 
-int add_excludes_from_file_to_list(const char *fname,
-				   const char *base,
-				   int baselen,
-				   struct exclude_list *el,
-				   int check_index)
+static int add_excludes(const char *fname,
+			const char *base,
+			int baselen,
+			struct exclude_list *el,
+			int check_index,
+			unsigned char *sha1,
+			struct stat_data *ref_stat,
+			const unsigned char *ref_sha1)
 {
 	struct stat st;
 	int fd, i, lineno = 1;
@@ -543,9 +549,13 @@ int add_excludes_from_file_to_list(const char *fname,
 		if (0 <= fd)
 			close(fd);
 		if (!check_index ||
-		    (buf = read_skip_worktree_file_from_index(fname, &size)) == NULL)
+		    (buf = read_skip_worktree_file_from_index(fname, &size, sha1)) == NULL)
 			return -1;
+		if (ref_stat)
+			memset(ref_stat, 0, sizeof(*ref_stat));
 		if (size == 0) {
+			if (sha1)
+				hashcpy(sha1, EMPTY_BLOB_SHA1_BIN);
 			free(buf);
 			return 0;
 		}
@@ -556,6 +566,10 @@ int add_excludes_from_file_to_list(const char *fname,
 	} else {
 		size = xsize_t(st.st_size);
 		if (size == 0) {
+			if (ref_stat)
+				fill_stat_data(ref_stat, &st);
+			if (sha1)
+				hashcpy(sha1, EMPTY_BLOB_SHA1_BIN);
 			close(fd);
 			return 0;
 		}
@@ -567,6 +581,21 @@ int add_excludes_from_file_to_list(const char *fname,
 		}
 		buf[size++] = '\n';
 		close(fd);
+		if (sha1) {
+			int pos;
+			if (!ref_stat &&
+			    (pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
+			    !ce_stage(active_cache[pos]) &&
+			    ce_uptodate(active_cache[pos]))
+				hashcpy(sha1, active_cache[pos]->sha1);
+			else if (ref_stat && !match_stat_data(ref_stat, &st)) {
+				if (ref_sha1 != sha1) /* support ref_sha1 == sha1 */
+					hashcpy(sha1, ref_sha1);
+			} else
+				hash_sha1_file(buf, size, "blob", sha1);
+		}
+		if (ref_stat)
+			fill_stat_data(ref_stat, &st);
 	}
 
 	el->filebuf = buf;
@@ -585,6 +614,14 @@ int add_excludes_from_file_to_list(const char *fname,
 	return 0;
 }
 
+int add_excludes_from_file_to_list(const char *fname, const char *base, int baselen,
+				   struct exclude_list *el, int check_index)
+{
+	return add_excludes(fname, base, baselen, el, check_index,
+			    NULL, NULL, NULL);
+}
+
+
 struct exclude_list *add_exclude_list(struct dir_struct *dir,
 				      int group_type, const char *src)
 {
-- 
1.9.1.346.ga2b5940


* [PATCH 05/20] untracked cache: record .gitignore information and dir hierarchy
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 04/20] dir.c: optionally compute sha-1 of a .gitignore file Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 06/20] untracked cache: initial untracked cache validation Nguyễn Thái Ngọc Duy
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The idea is that if we can capture all input and output of
read_directory_recursive() and verify at a later time that all the
input is the same, then read_directory_recursive() should produce the
same output, so we can bypass read_directory_recursive() and reuse the
cached output for the directory in question (the bypass code needs to
verify subdirectories separately).

The list of inputs of read_directory_recursive() is in the big comment
block in dir.h. This cache focuses only on untracked files as the
output from r_d_r(), not ignored files, because the number of
untracked files is usually small, so the cache overhead is small,
while the number of ignored files could go really high (e.g. *.o files
mixed with source code).

This patch captures .gitignore information, check_only bit and the
list of directories that read_directory_recursive() examines.
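
The directory hierarchy is kept as a tree whose children are sorted
by name, so a child can be found by binary search and created on the
first visit. The sketch below shows that lookup-or-add step in
isolation. struct node and lookup_or_add are hypothetical names, the
node layout is simplified compared to the one in this patch, and the
trailing slash stripping is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified directory node with sorted children. */
struct node {
	struct node **kids;
	size_t kids_nr, kids_alloc;
	char *name;
};

/*
 * Find the child of parent whose name is the first len bytes of
 * name, or insert a new child at its sorted position.
 * Returns NULL only on allocation failure.
 */
static struct node *lookup_or_add(struct node *parent,
				  const char *name, size_t len)
{
	size_t first = 0, last;
	struct node *n;

	assert(parent && name);
	last = parent->kids_nr;
	while (first < last) {
		size_t mid = first + (last - first) / 2;
		struct node *k = parent->kids[mid];
		int cmp = strncmp(name, k->name, len);

		/* name is a strict prefix of k->name, so it sorts first */
		if (!cmp && strlen(k->name) > len)
			cmp = -1;
		if (!cmp)
			return k;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}

	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->name = malloc(len + 1);
	if (!n->name) {
		free(n);
		return NULL;
	}
	memcpy(n->name, name, len);
	n->name[len] = '\0';

	if (parent->kids_nr == parent->kids_alloc) {
		size_t want = parent->kids_alloc ? parent->kids_alloc * 2 : 4;
		struct node **p = realloc(parent->kids, want * sizeof(*p));

		if (!p) {
			free(n->name);
			free(n);
			return NULL;
		}
		parent->kids = p;
		parent->kids_alloc = want;
	}
	/* shift the tail up by one slot and place the new child at 'first' */
	memmove(parent->kids + first + 1, parent->kids + first,
		(parent->kids_nr - first) * sizeof(*parent->kids));
	parent->kids[first] = n;
	parent->kids_nr++;
	return n;
}
```

Walking a path such as "a/b/c" then amounts to calling lookup_or_add()
once per component, starting from the root node.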

Two hash_sha1_file() calls are required for $GIT_DIR/info/exclude and
core.excludesfile unless their stat data matches. hash_sha1_file() is
only needed when .gitignore files in the worktree are modified;
otherwise their SHA-1 in the index is used.

We could store stat data for .gitignore files so we don't have to
rehash them if their content is different from the index, but I think
.gitignore files are rarely modified, so it is not worth the extra
cache data (and the hashing penalty in read-cache.c:verify_hdr(), as
we will be storing this as an index extension).

This means if you change .gitignore at the root directory, you had
better add it to the index soon or you lose all the benefit of
untracked cache (a change at root .gitignore invalidates everything)
and pay the cache overhead for nothing.
---
 dir.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 dir.h |  64 +++++++++++++++++++++++++
 2 files changed, 210 insertions(+), 21 deletions(-)

diff --git a/dir.c b/dir.c
index e2edeca..34a10b2 100644
--- a/dir.c
+++ b/dir.c
@@ -32,7 +32,7 @@ enum path_treatment {
 };
 
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
-	const char *path, int len,
+	const char *path, int len, struct untracked_cache_dir *untracked,
 	int check_only, const struct path_simplify *simplify);
 static int get_dtype(struct dirent *de, const char *path, int len);
 
@@ -528,6 +528,46 @@ static void trim_trailing_spaces(char *buf)
 		buf[last_space] = '\0';
 }
 
+static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+						    struct untracked_cache_dir *dir,
+						    const char *name, int len)
+{
+	int first, last;
+	struct untracked_cache_dir *d;
+	if (!dir)
+		return NULL;
+	if (len && name[len - 1] == '/')
+		len--;
+	first = 0;
+	last = dir->dirs_nr;
+	while (last > first) {
+		int cmp, next = (last + first) >> 1;
+		d = dir->dirs[next];
+		cmp = strncmp(name, d->name, len);
+		if (!cmp && strlen(d->name) > len)
+			cmp = -1;
+		if (!cmp)
+			return d;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+
+	uc->dir_created++;
+	d = xmalloc(sizeof(*d) + len);
+	memset(d, 0, sizeof(*d) + len);
+	memcpy(d->name, name, len);
+
+	ALLOC_GROW(dir->dirs, dir->dirs_nr + 1, dir->dirs_alloc);
+	memmove(dir->dirs + first + 1, dir->dirs + first,
+		(dir->dirs_nr - first) * sizeof(*dir->dirs));
+	dir->dirs_nr++;
+	dir->dirs[first] = d;
+	return d;
+}
+
 static int add_excludes(const char *fname,
 			const char *base,
 			int baselen,
@@ -639,14 +679,22 @@ struct exclude_list *add_exclude_list(struct dir_struct *dir,
 /*
  * Used to set up core.excludesfile and .git/info/exclude lists.
  */
-void add_excludes_from_file(struct dir_struct *dir, const char *fname)
+void add_excludes_from_file_1(struct dir_struct *dir, const char *fname,
+			      unsigned char *sha1,
+			      struct stat_data *ref_stat,
+			      const unsigned char *ref_sha1)
 {
 	struct exclude_list *el;
 	el = add_exclude_list(dir, EXC_FILE, fname);
-	if (add_excludes_from_file_to_list(fname, "", 0, el, 0) < 0)
+	if (add_excludes(fname, "", 0, el, 0, sha1, ref_stat, ref_sha1) < 0)
 		die("cannot use %s as an exclude file", fname);
 }
 
+void add_excludes_from_file(struct dir_struct *dir, const char *fname)
+{
+	add_excludes_from_file_1(dir, fname, NULL, NULL, NULL);
+}
+
 int match_basename(const char *basename, int basenamelen,
 		   const char *pattern, int prefix, int patternlen,
 		   int flags)
@@ -821,6 +869,7 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 	struct exclude_list_group *group;
 	struct exclude_list *el;
 	struct exclude_stack *stk = NULL;
+	struct untracked_cache_dir *untracked = NULL;
 	int current;
 
 	group = &dir->exclude_list_group[EXC_DIRS];
@@ -858,18 +907,39 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 	/* Read from the parent directories and push them down. */
 	current = stk ? stk->baselen : -1;
 	strbuf_setlen(&dir->basebuf, current < 0 ? 0 : current);
+
+	if (dir->untracked && current >= 0 && current < baselen) {
+		const char *start = base;
+		const char *end	  = base + current;
+		untracked = dir->untracked->root;
+		while (start < end) {
+			const char *cp = strchrnul(start, '/');
+			untracked =
+				lookup_untracked(dir->untracked, untracked,
+						 start, cp - start);
+			start = *cp == '/' ? cp + 1 : cp;
+		}
+	}
+
 	while (current < baselen) {
 		struct exclude_stack *stk = xcalloc(1, sizeof(*stk));
 		const char *cp;
+		unsigned char sha1[20];
 
 		if (current < 0) {
 			cp = base;
 			current = 0;
+			if (dir->untracked)
+				untracked = dir->untracked->root;
 		} else {
 			cp = strchr(base + current + 1, '/');
 			if (!cp)
 				die("oops in prep_exclude");
 			cp++;
+			untracked =
+				lookup_untracked(dir->untracked, untracked,
+						 base + current,
+						 cp - base - current);
 		}
 		stk->prev = dir->exclude_stack;
 		stk->baselen = cp - base;
@@ -896,6 +966,7 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 		}
 
 		/* Try to read per-directory file */
+		hashclr(sha1);
 		if (dir->exclude_per_dir) {
 			/*
 			 * dir->basebuf gets reused by the traversal, but we
@@ -909,8 +980,11 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 			strbuf_addbuf(&sb, &dir->basebuf);
 			strbuf_addstr(&sb, dir->exclude_per_dir);
 			el->src = strbuf_detach(&sb, NULL);
-			add_excludes_from_file_to_list(el->src, el->src,
-						       stk->baselen, el, 1);
+			add_excludes(el->src, el->src, stk->baselen, el, 1,
+				     untracked ? sha1 : NULL, NULL, NULL);
+		}
+		if (untracked) {
+			hashcpy(untracked->exclude_sha1, sha1);
 		}
 		dir->exclude_stack = stk;
 		current = stk->baselen;
@@ -1091,6 +1165,7 @@ static enum exist_status directory_exists_in_index(const char *dirname, int len)
  *  (c) otherwise, we recurse into it.
  */
 static enum path_treatment treat_directory(struct dir_struct *dir,
+	struct untracked_cache_dir *untracked,
 	const char *dirname, int len, int exclude,
 	const struct path_simplify *simplify)
 {
@@ -1118,7 +1193,9 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
 		return exclude ? path_excluded : path_untracked;
 
-	return read_directory_recursive(dir, dirname, len, 1, simplify);
+	untracked = lookup_untracked(dir->untracked, untracked, dirname, len);
+	return read_directory_recursive(dir, dirname, len,
+					untracked, 1, simplify);
 }
 
 /*
@@ -1234,6 +1311,7 @@ static int get_dtype(struct dirent *de, const char *path, int len)
 }
 
 static enum path_treatment treat_one_path(struct dir_struct *dir,
+					  struct untracked_cache_dir *untracked,
 					  struct strbuf *path,
 					  const struct path_simplify *simplify,
 					  int dtype, struct dirent *de)
@@ -1286,7 +1364,7 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 		return path_none;
 	case DT_DIR:
 		strbuf_addch(path, '/');
-		return treat_directory(dir, path->buf, path->len, exclude,
+		return treat_directory(dir, untracked, path->buf, path->len, exclude,
 			simplify);
 	case DT_REG:
 	case DT_LNK:
@@ -1295,6 +1373,7 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 }
 
 static enum path_treatment treat_path(struct dir_struct *dir,
+				      struct untracked_cache_dir *untracked,
 				      struct dirent *de,
 				      struct strbuf *path,
 				      int baselen,
@@ -1310,7 +1389,16 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		return path_none;
 
 	dtype = DTYPE(de);
-	return treat_one_path(dir, path, simplify, dtype, de);
+	return treat_one_path(dir, untracked, path, simplify, dtype, de);
+}
+
+static void add_untracked(struct untracked_cache_dir *dir, const char *name)
+{
+	if (!dir)
+		return;
+	ALLOC_GROW(dir->untracked, dir->untracked_nr + 1,
+		   dir->untracked_alloc);
+	dir->untracked[dir->untracked_nr++] = xstrdup(name);
 }
 
 /*
@@ -1326,7 +1414,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
  */
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 				    const char *base, int baselen,
-				    int check_only,
+				    struct untracked_cache_dir *untracked, int check_only,
 				    const struct path_simplify *simplify)
 {
 	DIR *fdir;
@@ -1340,24 +1428,36 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	if (!fdir)
 		goto out;
 
+	if (untracked)
+		untracked->check_only = !!check_only;
+
 	while ((de = readdir(fdir)) != NULL) {
 		/* check how the file or directory should be treated */
-		state = treat_path(dir, de, &path, baselen, simplify);
+		state = treat_path(dir, untracked, de, &path, baselen, simplify);
+
 		if (state > dir_state)
 			dir_state = state;
 
 		/* recurse into subdir if instructed by treat_path */
 		if (state == path_recurse) {
-			subdir_state = read_directory_recursive(dir, path.buf,
-				path.len, check_only, simplify);
+			struct untracked_cache_dir *ud;
+			ud = lookup_untracked(dir->untracked, untracked,
+					      path.buf + baselen,
+					      path.len - baselen);
+			subdir_state =
+				read_directory_recursive(dir, path.buf, path.len,
+							 ud, check_only, simplify);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 		}
 
 		if (check_only) {
 			/* abort early if maximum state has been reached */
-			if (dir_state == path_untracked)
+			if (dir_state == path_untracked) {
+				if (untracked)
+					add_untracked(untracked, path.buf + baselen);
 				break;
+			}
 			/* skip the dir_add_* part */
 			continue;
 		}
@@ -1375,8 +1475,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			break;
 
 		case path_untracked:
-			if (!(dir->flags & DIR_SHOW_IGNORED))
-				dir_add_name(dir, path.buf, path.len);
+			if (dir->flags & DIR_SHOW_IGNORED)
+				break;
+			dir_add_name(dir, path.buf, path.len);
+			if (untracked)
+				add_untracked(untracked, path.buf + baselen);
 			break;
 
 		default:
@@ -1454,7 +1557,7 @@ static int treat_leading_path(struct dir_struct *dir,
 			break;
 		if (simplify_away(sb.buf, sb.len, simplify))
 			break;
-		if (treat_one_path(dir, &sb, simplify,
+		if (treat_one_path(dir, NULL, &sb, simplify,
 				   DT_DIR, NULL) == path_none)
 			break; /* do not recurse into it */
 		if (len <= baselen) {
@@ -1494,7 +1597,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 	 */
 	simplify = create_simplify(pathspec ? pathspec->_raw : NULL);
 	if (!len || treat_leading_path(dir, path, len, simplify))
-		read_directory_recursive(dir, path, len, 0, simplify);
+		read_directory_recursive(dir, path, len,
+					 dir->untracked ? dir->untracked->root : NULL,
+					 0, simplify);
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
@@ -1660,10 +1765,30 @@ void setup_standard_excludes(struct dir_struct *dir)
 		home_config_paths(NULL, &xdg_path, "ignore");
 		excludes_file = xdg_path;
 	}
-	if (!access_or_warn(path, R_OK, 0))
-		add_excludes_from_file(dir, path);
-	if (excludes_file && !access_or_warn(excludes_file, R_OK, 0))
-		add_excludes_from_file(dir, excludes_file);
+	if (!access_or_warn(path, R_OK, 0)) {
+		const unsigned char*ref_sha1 = NULL;
+		struct stat_data   *ref_stat = NULL;
+		unsigned char	   *sha1     = NULL;
+		if (dir->untracked) {
+			sha1	 = dir->info_exclude_sha1;
+			ref_stat = &dir->info_exclude_stat;
+			ref_sha1 = sha1;
+		}
+		add_excludes_from_file_1(dir, path, sha1,
+					 ref_stat, ref_sha1);
+	}
+	if (excludes_file && !access_or_warn(excludes_file, R_OK, 0)) {
+		const unsigned char*ref_sha1 = NULL;
+		struct stat_data   *ref_stat = NULL;
+		unsigned char	   *sha1     = NULL;
+		if (dir->untracked) {
+			sha1	 = dir->excludes_file_sha1;
+			ref_stat = &dir->excludes_file_stat;
+			ref_sha1 = sha1;
+		}
+		add_excludes_from_file_1(dir, excludes_file,
+					 sha1, ref_stat, ref_sha1);
+	}
 }
 
 int remove_path(const char *name)
diff --git a/dir.h b/dir.h
index 6c45e9d..bce7055 100644
--- a/dir.h
+++ b/dir.h
@@ -73,6 +73,63 @@ struct exclude_list_group {
 	struct exclude_list *el;
 };
 
+
+/*
+ *  Untracked cache
+ *
+ *  The following inputs are sufficient to determine what files in a
+ *  directory are excluded:
+ *
+ *   - The list of files and directories of the directory in question
+ *   - The $GIT_DIR/index
+ *   - dir_struct flags
+ *   - The content of $GIT_DIR/info/exclude
+ *   - The content of core.excludesfile
+ *   - The content (or the lack) of .gitignore of all parent directories
+ *     from $GIT_WORK_TREE
+ *   - The check_only flag in read_directory_recursive (for
+ *     DIR_HIDE_EMPTY_DIRECTORIES)
+ *
+ *  The first input can be checked using directory mtime. In many
+ *  filesystems, directory mtime (stat_data field) is updated when its
+ *  files or direct subdirs are added or removed.
+ *
+ *  The second one can be hooked from cache_tree_invalidate_path().
+ *  Whenever a file (or a submodule) is added or removed from a directory,
+ *  we invalidate (i.e. setting untracked_nr to -1) that directory.
+ *
+ *  The remaining inputs are easy, their SHA-1 could be used to verify
+ *  their contents (exclude_sha1[], info_exclude_sha1[] and
+ *  excludes_file_sha1[])
+ */
+struct untracked_cache_dir {
+	struct untracked_cache_dir **dirs;
+	char **untracked;
+	/* null SHA-1 means this directory does not have .gitignore */
+	unsigned char exclude_sha1[20];
+	struct stat_data stat_data;
+	unsigned int check_only : 1;
+	unsigned int untracked_nr : 29;
+	unsigned int untracked_alloc, dirs_nr, dirs_alloc;
+	char name[1];
+};
+
+struct untracked_cache {
+	struct stat_data info_exclude_stat;
+	struct stat_data excludes_file_stat;
+	unsigned char info_exclude_sha1[20];
+	unsigned char excludes_file_sha1[20];
+	const char *exclude_per_dir;
+	/*
+	 * dir_struct#flags must match dir_flags or the untracked
+	 * cache is ignored.
+	 */
+	unsigned dir_flags;
+	struct untracked_cache_dir *root;
+	/* Statistics */
+	int dir_created;
+};
+
 struct dir_struct {
 	int nr, alloc;
 	int ignored_nr, ignored_alloc;
@@ -120,6 +177,13 @@ struct dir_struct {
 	struct exclude_stack *exclude_stack;
 	struct exclude *exclude;
 	struct strbuf basebuf;
+
+	/* Enable untracked file cache if set */
+	struct untracked_cache *untracked;
+	struct stat_data info_exclude_stat;
+	struct stat_data excludes_file_stat;
+	unsigned char info_exclude_sha1[20];
+	unsigned char excludes_file_sha1[20];
 };
 
 /*
-- 
1.9.1.346.ga2b5940


* [PATCH 06/20] untracked cache: initial untracked cache validation
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 05/20] untracked cache: record .gitignore information and dir hierarchy Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 07/20] untracked cache: invalidate dirs recursively if .gitignore changes Nguyễn Thái Ngọc Duy
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
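As a reading aid for the validate_untracked_cache() hunk in this patch, the order of the checks can be summarized with a small stand-alone sketch. Everything prefixed toy_ or TOY_ is hypothetical: the flag values are made up for illustration, the structs keep only the fields the checks need, and exclude_per_dir is assumed to be non-NULL.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag bits; the names and values are assumptions, not Git's. */
#define TOY_SHOW_IGNORED           (1u << 0)
#define TOY_SHOW_OTHER_DIRECTORIES (1u << 1)
#define TOY_COLLECT_IGNORED        (1u << 2)

struct toy_request {
	unsigned flags;              /* flags of the current traversal */
	int base_len;                /* non-zero when only a subtree is read */
	int pathspec_nr;             /* number of pathspec items */
	int unmanaged_excludes;      /* exclude files the cache does not track */
	const char *exclude_per_dir; /* assumed non-NULL in this sketch */
};

struct toy_cache {
	unsigned flags;              /* flags the cache was built with */
	const char *exclude_per_dir; /* assumed non-NULL in this sketch */
};

/* Return 1 if the cached data may be consulted, 0 to fall back. */
static int toy_cache_usable(const struct toy_request *rq,
			    const struct toy_cache *uc)
{
	assert(rq);
	if (!uc)
		return 0;	/* cache not enabled */
	if (rq->unmanaged_excludes)
		return 0;	/* extra exclude files were added */
	if (rq->base_len || rq->pathspec_nr)
		return 0;	/* only whole-tree traversal is handled */
	if (rq->flags != uc->flags)
		return 0;	/* different flags, different results */
	if (!(rq->flags & TOY_SHOW_OTHER_DIRECTORIES))
		return 0;
	if (rq->flags & (TOY_SHOW_IGNORED | TOY_COLLECT_IGNORED))
		return 0;	/* ignored files are not cached */
	if (strcmp(rq->exclude_per_dir, uc->exclude_per_dir))
		return 0;	/* per-directory exclude file name changed */
	return 1;
}
```

Any failed check simply means "do not use the cache for this run"; nothing is destroyed, which keeps the fallback path identical to the uncached one.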
 dir.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 dir.h |   3 ++
 2 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 34a10b2..a198aa8 100644
--- a/dir.c
+++ b/dir.c
@@ -568,6 +568,22 @@ static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
 	return d;
 }
 
+static void do_invalidate_gitignore(struct untracked_cache_dir *dir)
+{
+	int i;
+	dir->valid = 0;
+	dir->untracked_nr = 0;
+	for (i = 0; i < dir->dirs_nr; i++)
+		do_invalidate_gitignore(dir->dirs[i]);
+}
+
+static void invalidate_gitignore(struct untracked_cache *uc,
+				 struct untracked_cache_dir *dir)
+{
+	uc->gitignore_invalidated++;
+	do_invalidate_gitignore(dir);
+}
+
 static int add_excludes(const char *fname,
 			const char *base,
 			int baselen,
@@ -685,6 +701,13 @@ void add_excludes_from_file_1(struct dir_struct *dir, const char *fname,
 			      const unsigned char *ref_sha1)
 {
 	struct exclude_list *el;
+	/*
+	 * catch setup_standard_excludes() that's called before
+	 * dir->untracked is assigned. That function behaves
+	 * differently when dir->untracked is non-NULL.
+	 */
+	if (!dir->untracked)
+		dir->unmanaged_exclude_files++;
 	el = add_exclude_list(dir, EXC_FILE, fname);
 	if (add_excludes(fname, "", 0, el, 0, sha1, ref_stat, ref_sha1) < 0)
 		die("cannot use %s as an exclude file", fname);
@@ -692,6 +715,7 @@ void add_excludes_from_file_1(struct dir_struct *dir, const char *fname,
 
 void add_excludes_from_file(struct dir_struct *dir, const char *fname)
 {
+	dir->unmanaged_exclude_files++; /* see validate_untracked_cache() */
 	add_excludes_from_file_1(dir, fname, NULL, NULL, NULL);
 }
 
@@ -1570,9 +1594,89 @@ static int treat_leading_path(struct dir_struct *dir,
 	return rc;
 }
 
+static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
+						      int base_len,
+						      const struct pathspec *pathspec)
+{
+	struct untracked_cache_dir *root;
+
+	if (!dir->untracked)
+		return NULL;
+
+	/*
+	 * We only support $GIT_DIR/info/exclude and core.excludesfile
+	 * as the global ignore rule files. Any other additions
+	 * (e.g. from command line) invalidate the cache. This
+	 * condition also catches running setup_standard_excludes()
+	 * before setting dir->untracked!
+	 */
+	if (dir->unmanaged_exclude_files)
+		return NULL;
+
+	/*
+	 * Optimize for the main use case only: whole-tree git
+	 * status. More work involved in treat_leading_path() if we
+	 * use cache on just a subset of the worktree. pathspec
+	 * support could make the matter even worse.
+	 */
+	if (base_len || (pathspec && pathspec->nr))
+		return NULL;
+
+	/* Different set of flags may produce different results */
+	if (dir->flags != dir->untracked->dir_flags ||
+	    /*
+	     * See treat_directory(), case index_nonexistent. Without
+	     * this flag, we may need to also cache .git file content
+	     * for the resolve_gitlink_ref() call, which we don't.
+	     */
+	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
+	    /* We don't support collecting ignore files */
+	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			   DIR_COLLECT_IGNORED)))
+		return NULL;
+
+	/*
+	 * If we use .gitignore in the cache and now you change it to
+	 * .gitexclude, everything will go wrong.
+	 */
+	if (dir->exclude_per_dir != dir->untracked->exclude_per_dir &&
+	    strcmp(dir->exclude_per_dir, dir->untracked->exclude_per_dir))
+		return NULL;
+
+	/*
+	 * EXC_CMDL is not considered in the cache. If people set it,
+	 * skip the cache.
+	 */
+	if (dir->exclude_list_group[EXC_CMDL].nr)
+		return NULL;
+
+	if (!dir->untracked->root) {
+		const int len = sizeof(*dir->untracked->root);
+		dir->untracked->root = xmalloc(len);
+		memset(dir->untracked->root, 0, len);
+	}
+
+	/* Validate $GIT_DIR/info/exclude and core.excludesfile */
+	root = dir->untracked->root;
+	if (hashcmp(dir->info_exclude_sha1,
+		    dir->untracked->info_exclude_sha1)) {
+		invalidate_gitignore(dir->untracked, root);
+		hashcpy(dir->untracked->info_exclude_sha1,
+			dir->info_exclude_sha1);
+	}
+	if (hashcmp(dir->excludes_file_sha1,
+		    dir->untracked->excludes_file_sha1)) {
+		invalidate_gitignore(dir->untracked, root);
+		hashcpy(dir->untracked->excludes_file_sha1,
+			dir->excludes_file_sha1);
+	}
+	return root;
+}
+
 int read_directory(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec)
 {
 	struct path_simplify *simplify;
+	struct untracked_cache_dir *untracked;
 
 	/*
 	 * Check out create_simplify()
@@ -1596,10 +1700,15 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 	 * create_simplify().
 	 */
 	simplify = create_simplify(pathspec ? pathspec->_raw : NULL);
+	untracked = validate_untracked_cache(dir, len, pathspec);
+	if (!untracked)
+		/*
+		 * make sure untracked cache code path is disabled,
+		 * e.g. prep_exclude()
+		 */
+		dir->untracked = NULL;
 	if (!len || treat_leading_path(dir, path, len, simplify))
-		read_directory_recursive(dir, path, len,
-					 dir->untracked ? dir->untracked->root : NULL,
-					 0, simplify);
+		read_directory_recursive(dir, path, len, untracked, 0, simplify);
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
diff --git a/dir.h b/dir.h
index bce7055..ded251e 100644
--- a/dir.h
+++ b/dir.h
@@ -109,6 +109,7 @@ struct untracked_cache_dir {
 	unsigned char exclude_sha1[20];
 	struct stat_data stat_data;
 	unsigned int check_only : 1;
+	unsigned int valid : 1;
 	unsigned int untracked_nr : 29;
 	unsigned int untracked_alloc, dirs_nr, dirs_alloc;
 	char name[1];
@@ -128,6 +129,7 @@ struct untracked_cache {
 	struct untracked_cache_dir *root;
 	/* Statistics */
 	int dir_created;
+	int gitignore_invalidated;
 };
 
 struct dir_struct {
@@ -180,6 +182,7 @@ struct dir_struct {
 
 	/* Enable untracked file cache if set */
 	struct untracked_cache *untracked;
+	unsigned unmanaged_exclude_files;
 	struct stat_data info_exclude_stat;
 	struct stat_data excludes_file_stat;
 	unsigned char info_exclude_sha1[20];
-- 
1.9.1.346.ga2b5940


* [PATCH 07/20] untracked cache: invalidate dirs recursively if .gitignore changes
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 06/20] untracked cache: initial untracked cache validation Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 08/20] untracked cache: record/validate dir mtime and reuse cached output Nguyễn Thái Ngọc Duy
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

It's easy to see that if an existing .gitignore changes, its SHA-1
would be different and invalidate_gitignore() is called.

If .gitignore is removed, add_excludes() will treat it like an empty
.gitignore, which again should invalidate the cached directory data.

If .gitignore is added, lookup_untracked() already fills in the initial
.gitignore SHA-1 as "empty file", so again invalidate_gitignore() is
called.
---
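The three cases in the message above reduce to one comparison: hash the current .gitignore, compare with the cached hash, and drop the whole subtree on mismatch. A minimal stand-alone sketch of that idea follows; struct toy_dir and the toy_ functions are hypothetical simplifications, not the code in this patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct untracked_cache_dir. */
struct toy_dir {
	struct toy_dir **dirs;
	int dirs_nr;
	unsigned char exclude_sha1[20];
	unsigned int valid : 1;
	unsigned int untracked_nr;
};

/* Clear the cached state of dir and of everything below it. */
static void toy_invalidate_recursive(struct toy_dir *dir)
{
	int i;
	dir->valid = 0;
	dir->untracked_nr = 0;
	for (i = 0; i < dir->dirs_nr; i++)
		toy_invalidate_recursive(dir->dirs[i]);
}

/*
 * Compare the freshly computed .gitignore hash with the cached one,
 * invalidate the subtree on mismatch, then remember the new hash.
 * Returns 1 if the subtree was invalidated, 0 otherwise.
 */
static int toy_check_gitignore(struct toy_dir *dir, const unsigned char *sha1)
{
	int changed;

	assert(dir && sha1);
	changed = memcmp(sha1, dir->exclude_sha1, 20) != 0;
	if (changed)
		toy_invalidate_recursive(dir);
	memcpy(dir->exclude_sha1, sha1, 20);
	return changed;
}
```

The recursion is needed because a .gitignore applies to every directory below it, unlike an mtime change, which only says something about the directory itself.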
 dir.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/dir.c b/dir.c
index a198aa8..6370f6e 100644
--- a/dir.c
+++ b/dir.c
@@ -1007,7 +1007,26 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 			add_excludes(el->src, el->src, stk->baselen, el, 1,
 				     untracked ? sha1 : NULL, NULL, NULL);
 		}
+		/*
+		 * NEEDSWORK: when untracked cache is enabled,
+		 * prep_exclude() will first be called in
+		 * valid_cached_dir() then maybe many times more in
+		 * last_exclude_matching(). When the cache is used,
+		 * last_exclude_matching() will not be called and
+		 * reading .gitignore content will be a waste.
+		 *
+		 * So when it's called by valid_cached_dir() and we
+		 * can get .gitignore SHA-1 from the index
+		 * (i.e. .gitignore is not modified on work tree), we
+		 * could delay reading the .gitignore content until we
+		 * absolutely need it in last_exclude_matching(). Be
+		 * careful about ignore rule order, though, if you do
+		 * that.
+		 */
 		if (untracked) {
+			if (hashcmp(sha1, untracked->exclude_sha1))
+				invalidate_gitignore(dir->untracked,
+						     untracked);
 			hashcpy(untracked->exclude_sha1, sha1);
 		}
 		dir->exclude_stack = stk;
-- 
1.9.1.346.ga2b5940


* [PATCH 08/20] untracked cache: record/validate dir mtime and reuse cached output
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 07/20] untracked cache: invalidate dirs recursively if .gitignore changes Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 09/20] untracked cache: mark what dirs should be recursed/saved Nguyễn Thái Ngọc Duy
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The main readdir loop in read_directory_recursive() is replaced with a
new one that checks if cached results of a directory are still valid.

If a file is added or removed from the index, the containing directory
is invalidated (but not its subdirs). If directory's mtime is changed,
the same happens. If a .gitignore is updated, the containing directory
and all subdirs are invalidated recursively. If dir_struct#flags or
other conditions change, the cache is ignored.

If a directory is invalidated, we opendir/readdir/closedir and run the
exclude machinery on that directory listing as usual. If untracked
cache is also enabled, we'll update the cache along the way. If a
directory is validated, we simply pull the untracked listing out from
the cache. The cache also records the list of direct subdirs that we
have to recurse in. Fully excluded directories are seen as "untracked
files".

In the best case when no dirs are invalidated, read_directory()
becomes a series of stat(dir), open(.gitignore), fstat, read, close
and optionally hash_sha1_file. For comparison, standard
read_directory() is a sequence of opendir, readdir, open(gitignore),
fstat, read, close, (expensive) last_exclude_matching and closedir.

We already try not to open(gitignore) if we know it does not exist, so
open/fstat/read/close sequence does not apply to every directory. The
sequence could be reduced further, as noted in prep_exclude(). So in
theory, the entire best-case read_directory sequence could be reduced
to a series of stat() and nothing else.

This is not a silver bullet approach. When you compile a C file, for
example, the old .o file is removed and a new one with the same name
created, effectively invalidating the containing directory's cache
(but not its subdirectories). If your build process touches every
directory, this cache adds extra overhead for nothing, so it's a good
idea to separate generated files from tracked files. Editors may use
the same strategy for saving files. And of course you're out of luck
running your repo on an unsupported filesystem and/or operating system.
---
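The per-directory check described above boils down to "stat the directory and compare with what was recorded last time". A rough stand-alone sketch, assuming a hypothetical struct toy_dir_record that keeps only mtime (in whole seconds) and inode, whereas the patch compares a full struct stat_data through match_stat_data():

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical per-directory record; Git stores a fuller struct stat_data. */
struct toy_dir_record {
	time_t mtime;
	ino_t ino;
	unsigned int valid : 1;
};

/*
 * Return 1 if the cached listing for path can be reused. Otherwise
 * refresh the recorded stat fields, clear valid and return 0; the
 * caller is expected to rescan the directory and set valid afterwards.
 */
static int toy_dir_unchanged(struct toy_dir_record *rec, const char *path)
{
	struct stat st;

	assert(rec && path);
	if (stat(path, &st)) {
		/* directory is gone or unreadable: forget everything */
		rec->valid = 0;
		rec->mtime = 0;
		rec->ino = 0;
		return 0;
	}
	if (!rec->valid || rec->mtime != st.st_mtime || rec->ino != st.st_ino) {
		rec->valid = 0;
		rec->mtime = st.st_mtime;
		rec->ino = st.st_ino;
		return 0;
	}
	return 1;
}
```

With one-second granularity, a change made within the same second as the previous scan goes unnoticed, which is one form of the racy timestamp issue mentioned in the cover letter; the sketch makes no attempt to handle it.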
 dir.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 dir.h |   2 +
 2 files changed, 172 insertions(+), 10 deletions(-)

diff --git a/dir.c b/dir.c
index 6370f6e..205f323 100644
--- a/dir.c
+++ b/dir.c
@@ -31,6 +31,24 @@ enum path_treatment {
 	path_untracked
 };
 
+/*
+ * Support data structure for our opendir/readdir/closedir wrappers
+ */
+struct cached_dir {
+	DIR *fdir;
+	struct untracked_cache_dir *untracked;
+	int nr_files;
+	int nr_dirs;
+
+	/*
+	 * return data from read_cached_dir(). file and ucd are only
+	 * valid if de is NULL
+	 */
+	struct dirent *de;
+	const char *file;
+	struct untracked_cache_dir *ucd;
+};
+
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	const char *path, int len, struct untracked_cache_dir *untracked,
 	int check_only, const struct path_simplify *simplify);
@@ -584,6 +602,14 @@ static void invalidate_gitignore(struct untracked_cache *uc,
 	do_invalidate_gitignore(dir);
 }
 
+static void invalidate_directory(struct untracked_cache *uc,
+				 struct untracked_cache_dir *dir)
+{
+	uc->dir_invalidated++;
+	dir->valid = 0;
+	dir->untracked_nr = 0;
+}
+
 static int add_excludes(const char *fname,
 			const char *base,
 			int baselen,
@@ -1415,15 +1441,54 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	}
 }
 
+static enum path_treatment treat_path_fast(struct dir_struct *dir,
+					   struct untracked_cache_dir *untracked,
+					   struct cached_dir *cdir,
+					   struct strbuf *path,
+					   int baselen,
+					   const struct path_simplify *simplify)
+{
+	if (!cdir->ucd) {
+		strbuf_setlen(path, baselen);
+		strbuf_addstr(path, cdir->file);
+		return path_untracked;
+	}
+	strbuf_setlen(path, baselen);
+	strbuf_addstr(path, cdir->ucd->name);
+	/* treat_one_path() does this before it calls treat_directory() */
+	if (path->buf[path->len - 1] != '/')
+		strbuf_addch(path, '/');
+	if (cdir->ucd->check_only)
+		/*
+		 * check_only is set as a result of treat_directory() getting
+		 * to its bottom. Verify again the same set of directories
+		 * with check_only set.
+		 */
+		return read_directory_recursive(dir, path->buf, path->len,
+
+						cdir->ucd, 1, simplify);
+	/*
+	 * We get path_recurse in the first run when
+	 * directory_exists_in_index() returns index_nonexistent. We
+	 * are sure that new changes in the index does not impact the
+	 * outcome. Return now.
+	 */
+	return path_recurse;
+}
+
 static enum path_treatment treat_path(struct dir_struct *dir,
 				      struct untracked_cache_dir *untracked,
-				      struct dirent *de,
+				      struct cached_dir *cdir,
 				      struct strbuf *path,
 				      int baselen,
 				      const struct path_simplify *simplify)
 {
 	int dtype;
+	struct dirent *de = cdir->de;
 
+	if (!de)
+		return treat_path_fast(dir, untracked, cdir, path,
+				       baselen, simplify);
 	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
@@ -1444,6 +1509,103 @@ static void add_untracked(struct untracked_cache_dir *dir, const char *name)
 	dir->untracked[dir->untracked_nr++] = xstrdup(name);
 }
 
+static int valid_cached_dir(struct dir_struct *dir,
+			    struct untracked_cache_dir *untracked,
+			    struct strbuf *path,
+			    int check_only)
+{
+	struct stat st;
+
+	if (stat(path->len ? path->buf : ".", &st)) {
+		invalidate_directory(dir->untracked, untracked);
+		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
+		return 0;
+	}
+	if (!untracked->valid ||
+	    match_stat_data(&untracked->stat_data, &st)) {
+		if (untracked->valid)
+			invalidate_directory(dir->untracked, untracked);
+		fill_stat_data(&untracked->stat_data, &st);
+		return 0;
+	}
+
+	if (untracked->check_only != !!check_only) {
+		invalidate_directory(dir->untracked, untracked);
+		return 0;
+	}
+
+	/*
+	 * prep_exclude will be called eventually on this directory,
+	 * but it's called much later in last_exclude_matching(). We
+	 * need it now to determine the validity of the cache for this
+	 * path. The next calls will be nearly no-op, the way
+	 * prep_exclude() is designed.
+	 */
+	if (path->len && path->buf[path->len - 1] != '/') {
+		strbuf_addch(path, '/');
+		prep_exclude(dir, path->buf, path->len);
+		strbuf_setlen(path, path->len - 1);
+	} else
+		prep_exclude(dir, path->buf, path->len);
+
+	/* hopefully prep_exclude() hasn't invalidated this entry... */
+	return untracked->valid;
+}
+
+static int open_cached_dir(struct cached_dir *cdir,
+			   struct dir_struct *dir,
+			   struct untracked_cache_dir *untracked,
+			   struct strbuf *path,
+			   int check_only)
+{
+	memset(cdir, 0, sizeof(*cdir));
+	cdir->untracked = untracked;
+	if (!untracked ||
+	    !valid_cached_dir(dir, untracked, path, check_only)) {
+		cdir->fdir = opendir(path->len ? path->buf : ".");
+		if (dir->untracked)
+			dir->untracked->dir_opened++;
+		if (!cdir->fdir)
+			return -1;
+	}
+	return 0;
+}
+
+int read_cached_dir(struct cached_dir *cdir)
+{
+	if (cdir->fdir) {
+		cdir->de = readdir(cdir->fdir);
+		if (!cdir->de)
+			return -1;
+		return 0;
+	}
+	while (cdir->nr_dirs < cdir->untracked->dirs_nr) {
+		struct untracked_cache_dir *d = cdir->untracked->dirs[cdir->nr_dirs];
+		cdir->ucd = d;
+		cdir->nr_dirs++;
+		return 0;
+	}
+	cdir->ucd = NULL;
+	if (cdir->nr_files < cdir->untracked->untracked_nr) {
+		struct untracked_cache_dir *d = cdir->untracked;
+		cdir->file = d->untracked[cdir->nr_files++];
+		return 0;
+	}
+	return -1;
+}
+
+static void close_cached_dir(struct cached_dir *cdir)
+{
+	if (cdir->fdir)
+		closedir(cdir->fdir);
+	/*
+	 * We have gone through this directory and found no untracked
+	 * entries. Set untracked_nr to zero to make it valid.
+	 */
+	if (cdir->untracked && !cdir->untracked->valid)
+		cdir->untracked->valid = 1;
+}
+
 /*
  * Read a directory tree. We currently ignore anything but
  * directories, regular files and symlinks. That's because git
@@ -1460,23 +1622,21 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 				    struct untracked_cache_dir *untracked, int check_only,
 				    const struct path_simplify *simplify)
 {
-	DIR *fdir;
+	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
-	struct dirent *de;
 	struct strbuf path = STRBUF_INIT;
 
 	strbuf_add(&path, base, baselen);
 
-	fdir = opendir(path.len ? path.buf : ".");
-	if (!fdir)
+	if (open_cached_dir(&cdir, dir, untracked, &path, check_only))
 		goto out;
 
 	if (untracked)
 		untracked->check_only = !!check_only;
 
-	while ((de = readdir(fdir)) != NULL) {
+	while (!read_cached_dir(&cdir)) {
 		/* check how the file or directory should be treated */
-		state = treat_path(dir, untracked, de, &path, baselen, simplify);
+		state = treat_path(dir, untracked, &cdir, &path, baselen, simplify);
 
 		if (state > dir_state)
 			dir_state = state;
@@ -1497,7 +1657,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		if (check_only) {
 			/* abort early if maximum state has been reached */
 			if (dir_state == path_untracked) {
-				if (untracked)
+				if (cdir.fdir)
 					add_untracked(untracked, path.buf + baselen);
 				break;
 			}
@@ -1521,7 +1681,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			if (dir->flags & DIR_SHOW_IGNORED)
 				break;
 			dir_add_name(dir, path.buf, path.len);
-			if (untracked)
+			if (cdir.fdir)
 				add_untracked(untracked, path.buf + baselen);
 			break;
 
@@ -1529,7 +1689,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			break;
 		}
 	}
-	closedir(fdir);
+	close_cached_dir(&cdir);
  out:
 	strbuf_release(&path);
 
diff --git a/dir.h b/dir.h
index ded251e..8955945 100644
--- a/dir.h
+++ b/dir.h
@@ -130,6 +130,8 @@ struct untracked_cache {
 	/* Statistics */
 	int dir_created;
 	int gitignore_invalidated;
+	int dir_invalidated;
+	int dir_opened;
 };
 
 struct dir_struct {
-- 
1.9.1.346.ga2b5940


* [PATCH 09/20] untracked cache: mark what dirs should be recursed/saved
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (7 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 08/20] untracked cache: record/validate dir mtime and reuse cached output Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 10/20] untracked cache: don't open non-existent .gitignore Nguyễn Thái Ngọc Duy
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Suppose untracked cache stores that in directory A we need to recurse
in A/B and A/C. Then A/B is removed. When read_directory() is executed
again, of course we detect that we only need to recurse in A/C when in
A, not A/B any more.

We need a way, though, to let the write phase know not to write A/B
down; that is the purpose of the new "recurse" bit. We can't simply
destroy A/B when A is invalidated, because at that moment we don't
know if A/B is deleted or not.
---
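The A/B versus A/C scenario above can be played through with a small stand-alone sketch. struct toy_node and the toy_ helpers are hypothetical; they only mirror the idea that invalidating a parent clears the bit on its direct children, and that a child gets the bit back once the traversal has actually gone through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified directory node. */
struct toy_node {
	struct toy_node **dirs;
	int dirs_nr;
	unsigned int recurse : 1;
	unsigned int valid : 1;
};

/* Parent changed: children must prove they still exist. */
static void toy_invalidate(struct toy_node *dir)
{
	int i;

	assert(dir);
	dir->valid = 0;
	for (i = 0; i < dir->dirs_nr; i++)
		dir->dirs[i]->recurse = 0;
}

/* Called once the traversal has gone through dir. */
static void toy_mark_visited(struct toy_node *dir)
{
	assert(dir);
	dir->valid = 1;
	dir->recurse = 1;
}

/* Number of direct subdirectories the write phase would keep. */
static int toy_count_saved(const struct toy_node *dir)
{
	int i, n = 0;

	for (i = 0; i < dir->dirs_nr; i++)
		if (dir->dirs[i]->recurse)
			n++;
	return n;
}
```

Deferring the decision to the write phase avoids freeing A/B while it may still turn out to exist; a stale node costs only memory until the next save.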
 dir.c | 15 ++++++++++++++-
 dir.h |  1 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 205f323..63fa960 100644
--- a/dir.c
+++ b/dir.c
@@ -591,6 +591,7 @@ static void do_invalidate_gitignore(struct untracked_cache_dir *dir)
 	int i;
 	dir->valid = 0;
 	dir->untracked_nr = 0;
+	/* dir->recurse = 0; ? */
 	for (i = 0; i < dir->dirs_nr; i++)
 		do_invalidate_gitignore(dir->dirs[i]);
 }
@@ -605,9 +606,12 @@ static void invalidate_gitignore(struct untracked_cache *uc,
 static void invalidate_directory(struct untracked_cache *uc,
 				 struct untracked_cache_dir *dir)
 {
+	int i;
 	uc->dir_invalidated++;
 	dir->valid = 0;
 	dir->untracked_nr = 0;
+	for (i = 0; i < dir->dirs_nr; i++)
+		dir->dirs[i]->recurse = 0;
 }
 
 static int add_excludes(const char *fname,
@@ -1581,6 +1585,10 @@ int read_cached_dir(struct cached_dir *cdir)
 	}
 	while (cdir->nr_dirs < cdir->untracked->dirs_nr) {
 		struct untracked_cache_dir *d = cdir->untracked->dirs[cdir->nr_dirs];
+		if (!d->recurse) {
+			cdir->nr_dirs++;
+			continue;
+		}
 		cdir->ucd = d;
 		cdir->nr_dirs++;
 		return 0;
@@ -1602,8 +1610,10 @@ static void close_cached_dir(struct cached_dir *cdir)
 	 * We have gone through this directory and found no untracked
 	 * entries. Set untracked_nr to zero to make it valid.
 	 */
-	if (cdir->untracked && !cdir->untracked->valid)
+	if (cdir->untracked) {
 		cdir->untracked->valid = 1;
+		cdir->untracked->recurse = 1;
+	}
 }
 
 /*
@@ -1849,6 +1859,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		hashcpy(dir->untracked->excludes_file_sha1,
 			dir->excludes_file_sha1);
 	}
+
+	/* Make sure this directory is not dropped out at saving phase */
+	root->recurse = 1;
 	return root;
 }
 
diff --git a/dir.h b/dir.h
index 8955945..5dde37b 100644
--- a/dir.h
+++ b/dir.h
@@ -108,6 +108,7 @@ struct untracked_cache_dir {
 	/* null SHA-1 means this directory does not have .gitignore */
 	unsigned char exclude_sha1[20];
 	struct stat_data stat_data;
+	unsigned int recurse : 1;
 	unsigned int check_only : 1;
 	unsigned int valid : 1;
 	unsigned int untracked_nr : 29;
-- 
1.9.1.346.ga2b5940


* [PATCH 10/20] untracked cache: don't open non-existent .gitignore
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (8 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 09/20] untracked cache: mark what dirs should be recursed/saved Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 11/20] untracked cache: save to an index extension Nguyễn Thái Ngọc Duy
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This cuts down a significant number of open(.gitignore) calls because
most directories usually don't have .gitignore files.
---
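The condition added to prep_exclude() in this patch can be read as a small predicate: open .gitignore unless the cache says the directory is unchanged and had no .gitignore last time. A stand-alone sketch of that predicate, with hypothetical toy_ names and a stripped-down entry struct:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry holding only what the predicate needs. */
struct toy_entry {
	unsigned char exclude_sha1[20]; /* all zero: no .gitignore was seen */
	unsigned int valid : 1;         /* directory listing is unchanged */
};

static int toy_is_null_sha1(const unsigned char *sha1)
{
	int i;

	assert(sha1);
	for (i = 0; i < 20; i++)
		if (sha1[i])
			return 0;
	return 1;
}

/* Return 1 if .gitignore has to be opened, 0 if the open can be skipped. */
static int toy_should_open_gitignore(const struct toy_entry *e)
{
	if (!e || !e->valid)
		return 1;	/* no cache, or directory content changed */
	return !toy_is_null_sha1(e->exclude_sha1);
}
```

The skip is only safe because a valid entry implies no file was added to the directory since the last scan, so a .gitignore that was absent then is still absent now.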
 dir.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 63fa960..b5bfda8 100644
--- a/dir.c
+++ b/dir.c
@@ -1021,7 +1021,21 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 
 		/* Try to read per-directory file */
 		hashclr(sha1);
-		if (dir->exclude_per_dir) {
+		if (dir->exclude_per_dir &&
+		    /*
+		     * If we know that no files have been added in
+		     * this directory (i.e. valid_cached_dir() has
+		     * been executed and set untracked->valid) ..
+		     */
+		    (!untracked || !untracked->valid ||
+		     /*
+		      * .. and .gitignore does not exist before
+		      * (i.e. null exclude_sha1 and skip_worktree is
+		      * not set). Then we can skip loading .gitignore,
+		      * which would result in ENOENT anyway.
+		      * skip_worktree is taken care in read_directory()
+		      */
+		     !is_null_sha1(untracked->exclude_sha1))) {
 			/*
 			 * dir->basebuf gets reused by the traversal, but we
 			 * need fname to remain unchanged to ensure the src
@@ -1788,6 +1802,7 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 						      const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *root;
+	int i;
 
 	if (!dir->untracked)
 		return NULL;
@@ -1839,6 +1854,15 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (dir->exclude_list_group[EXC_CMDL].nr)
 		return NULL;
 
+	/*
+	 * An optimization in prep_exclude() does not play well with
+	 * CE_SKIP_WORKTREE. It's a rare case anyway, if a single
+	 * entry has that bit set, disable the whole untracked cache.
+	 */
+	for (i = 0; i < active_nr; i++)
+		if (ce_skip_worktree(active_cache[i]))
+			return NULL;
+
 	if (!dir->untracked->root) {
 		const int len = sizeof(*dir->untracked->root);
 		dir->untracked->root = xmalloc(len);
-- 
1.9.1.346.ga2b5940


* [PATCH 11/20] untracked cache: save to an index extension
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (9 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 10/20] untracked cache: don't open non-existent .gitignore Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 12/20] untracked cache: load from UNTR " Nguyễn Thái Ngọc Duy
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

FIXME: save check_only
---
 cache.h      |  3 ++
 dir.c        | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 dir.h        |  1 +
 read-cache.c | 12 ++++++++
 4 files changed, 107 insertions(+)

diff --git a/cache.h b/cache.h
index 107ac61..06fcb6b 100644
--- a/cache.h
+++ b/cache.h
@@ -268,6 +268,8 @@ static inline unsigned int canon_mode(unsigned int mode)
 
 #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
 
+
+struct untracked_cache;
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -279,6 +281,7 @@ struct index_state {
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
+	struct untracked_cache *untracked;
 };
 
 extern struct index_state the_index;
diff --git a/dir.c b/dir.c
index b5bfda8..b7d394a 100644
--- a/dir.c
+++ b/dir.c
@@ -12,6 +12,7 @@
 #include "refs.h"
 #include "wildmatch.h"
 #include "pathspec.h"
+#include "varint.h"
 
 struct path_simplify {
 	int len;
@@ -2165,3 +2166,93 @@ void clear_directory(struct dir_struct *dir)
 	}
 	strbuf_release(&dir->basebuf);
 }
+
+struct ondisk_untracked_cache {
+	struct stat_data info_exclude_stat;
+	struct stat_data excludes_file_stat;
+	uint32_t dir_flags;
+	unsigned char info_exclude_sha1[20];
+	unsigned char excludes_file_sha1[20];
+	char exclude_per_dir[1];
+};
+
+static void stat_data_to_disk(struct stat_data *to, const struct stat_data *from)
+{
+	to->sd_ctime.sec  = htonl(from->sd_ctime.sec);
+	to->sd_ctime.nsec = htonl(from->sd_ctime.nsec);
+	to->sd_mtime.sec  = htonl(from->sd_mtime.sec);
+	to->sd_mtime.nsec = htonl(from->sd_mtime.nsec);
+	to->sd_dev	  = htonl(from->sd_dev);
+	to->sd_ino	  = htonl(from->sd_ino);
+	to->sd_uid	  = htonl(from->sd_uid);
+	to->sd_gid	  = htonl(from->sd_gid);
+	to->sd_size	  = htonl(from->sd_size);
+}
+
+static void write_one_dir(struct strbuf *out, struct untracked_cache_dir *untracked)
+{
+	struct stat_data stat_data;
+	unsigned char intbuf[16];
+	unsigned int intlen, value;
+	int i;
+
+	stat_data_to_disk(&stat_data, &untracked->stat_data);
+	strbuf_add(out, &stat_data, sizeof(stat_data));
+	strbuf_add(out, untracked->exclude_sha1, 20);
+
+	/*
+	 * untracked_nr should be reset whenever valid is clear, but
+	 * for safety..
+	 */
+	if (!untracked->valid) {
+		untracked->untracked_nr = 0;
+		untracked->check_only = 0;
+	}
+
+	/*
+	 * Pack the flags into the two lowest bits (bit 0: valid,
+	 * bit 1: check_only); untracked_nr takes the remaining bits.
+	 */
+	value = untracked->untracked_nr << 2;
+	if (untracked->valid)
+		value |= 1;
+	if (untracked->check_only)
+		value |= 2;
+	intlen = encode_varint(value, intbuf);
+	strbuf_add(out, intbuf, intlen);
+
+	/* skip non-recurse directories */
+	for (i = 0, value = 0; i < untracked->dirs_nr; i++)
+		if (untracked->dirs[i]->recurse)
+			value++;
+	intlen = encode_varint(value, intbuf);
+	strbuf_add(out, intbuf, intlen);
+
+	strbuf_add(out, untracked->name, strlen(untracked->name) + 1);
+
+	for (i = 0; i < untracked->untracked_nr; i++)
+		strbuf_add(out, untracked->untracked[i],
+			   strlen(untracked->untracked[i]) + 1);
+
+	for (i = 0; i < untracked->dirs_nr; i++)
+		if (untracked->dirs[i]->recurse)
+			write_one_dir(out, untracked->dirs[i]);
+}
+
+void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked)
+{
+	struct ondisk_untracked_cache *ouc;
+	int len = 0;
+	if (untracked->exclude_per_dir)
+		len = strlen(untracked->exclude_per_dir);
+	ouc = xmalloc(sizeof(*ouc) + len);
+	stat_data_to_disk(&ouc->info_exclude_stat, &untracked->info_exclude_stat);
+	stat_data_to_disk(&ouc->excludes_file_stat, &untracked->excludes_file_stat);
+	hashcpy(ouc->info_exclude_sha1, untracked->info_exclude_sha1);
+	hashcpy(ouc->excludes_file_sha1, untracked->excludes_file_sha1);
+	ouc->dir_flags = htonl(untracked->dir_flags);
+	memcpy(ouc->exclude_per_dir, untracked->exclude_per_dir, len + 1);
+	strbuf_add(out, ouc, sizeof(*ouc) + len);
+	if (untracked->root)
+		write_one_dir(out, untracked->root);
+}
diff --git a/dir.h b/dir.h
index 5dde37b..e520d58 100644
--- a/dir.h
+++ b/dir.h
@@ -295,4 +295,5 @@ static inline int dir_path_match(const struct dir_entry *ent,
 			      has_trailing_dir);
 }
 
+void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 #endif
diff --git a/read-cache.c b/read-cache.c
index ba13353..a619666 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -34,6 +34,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
 
 struct index_state the_index;
 
@@ -1869,6 +1870,17 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
+	if (istate->untracked) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_untracked_extension(&sb, istate->untracked);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_UNTRACKED,
+					     sb.len) < 0 ||
+			ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd) || fstat(newfd, &st))
 		return -1;
-- 
1.9.1.346.ga2b5940


* [PATCH 12/20] untracked cache: load from UNTR index extension
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (10 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 11/20] untracked cache: save to an index extension Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 13/20] untracked cache: invalidate at index addition or removal Nguyễn Thái Ngọc Duy
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

FIXME: load check_only
---
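Note for reviewers: the per-directory header read back in read_one_dir()
is a single varint that carries the entry count plus two flag bits, as
written by write_one_dir() in the previous patch. A minimal standalone
sketch of that packing follows. The names dir_header, pack_dir_header()
and unpack_dir_header() are made up for illustration and are not part of
this series; only the bit layout is taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields stored in the packed value. */
struct dir_header {
	unsigned int untracked_nr;
	unsigned int valid : 1;
	unsigned int check_only : 1;
};

/* Bit 0 = valid, bit 1 = check_only, remaining bits = untracked_nr. */
static uint32_t pack_dir_header(const struct dir_header *h)
{
	uint32_t value;

	/* the count must leave room for the two flag bits */
	assert(h->untracked_nr <= (UINT32_MAX >> 2));
	value = (uint32_t)h->untracked_nr << 2;
	if (h->valid)
		value |= 1;
	if (h->check_only)
		value |= 2;
	return value;
}

/* Inverse of pack_dir_header(). */
static struct dir_header unpack_dir_header(uint32_t value)
{
	struct dir_header h;

	h.untracked_nr = value >> 2;
	h.valid = value & 1;
	h.check_only = (value >> 1) & 1;
	return h;
}
```

With this layout a valid directory holding five untracked entries and no
check_only flag packs to (5 << 2) | 1 = 21, and unpacking 21 gives the
same three fields back.
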
 dir.c        | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 dir.h        |   1 +
 read-cache.c |   3 ++
 3 files changed, 111 insertions(+)

diff --git a/dir.c b/dir.c
index b7d394a..3c61b42 100644
--- a/dir.c
+++ b/dir.c
@@ -2256,3 +2256,110 @@ void write_untracked_extension(struct strbuf *out, struct untracked_cache *untra
 	if (untracked->root)
 		write_one_dir(out, untracked->root);
 }
+
+static void stat_data_from_disk(struct stat_data *to, const struct stat_data *from)
+{
+	to->sd_ctime.sec  = get_be32(&from->sd_ctime.sec);
+	to->sd_ctime.nsec = get_be32(&from->sd_ctime.nsec);
+	to->sd_mtime.sec  = get_be32(&from->sd_mtime.sec);
+	to->sd_mtime.nsec = get_be32(&from->sd_mtime.nsec);
+	to->sd_dev	  = get_be32(&from->sd_dev);
+	to->sd_ino	  = get_be32(&from->sd_ino);
+	to->sd_uid	  = get_be32(&from->sd_uid);
+	to->sd_gid	  = get_be32(&from->sd_gid);
+	to->sd_size	  = get_be32(&from->sd_size);
+}
+
+static int read_one_dir(struct untracked_cache_dir **untracked_,
+			const unsigned char *data_, unsigned long sz)
+{
+#define NEXT(x) \
+	next = data + (x); \
+	if (next > data_ + sz) \
+		return -1;
+
+	struct untracked_cache_dir ud, *untracked;
+	const unsigned char *next, *data = data_;
+	unsigned int value;
+	int i, len;
+
+	memset(&ud, 0, sizeof(ud));
+	ud.recurse = 1;
+
+	NEXT(sizeof(struct stat_data));
+	stat_data_from_disk(&ud.stat_data, (struct stat_data *)data);
+	data = next;
+
+	NEXT(20);
+	hashcpy(ud.exclude_sha1, data);
+	data = next;
+
+	next = data;
+	value = decode_varint(&next);
+	if (next > data_ + sz)
+		return -1;
+	ud.untracked_alloc = ud.untracked_nr = value >> 2;
+	if (ud.untracked_nr)
+		ud.untracked = xmalloc(sizeof(*ud.untracked) * ud.untracked_nr);
+	if (value & 1)
+		ud.valid = 1;
+	if (value & 2)
+		ud.check_only = 1;
+	data = next;
+
+	next = data;
+	ud.dirs_alloc = ud.dirs_nr = decode_varint(&next);
+	if (next > data_ + sz)
+		return -1;
+	ud.dirs = xmalloc(sizeof(*ud.dirs) * ud.dirs_nr);
+	data = next;
+
+	len = strlen((const char *)data);
+	NEXT(len + 1);
+	*untracked_ = untracked = xmalloc(sizeof(*untracked) + len);
+	memcpy(untracked, &ud, sizeof(ud));
+	memcpy(untracked->name, data, len + 1);
+	data = next;
+
+	for (i = 0; i < untracked->untracked_nr; i++) {
+		len = strlen((const char *)data);
+		NEXT(len + 1);
+		untracked->untracked[i] = xstrdup((const char*)data);
+		data = next;
+	}
+
+	for (i = 0; i < untracked->dirs_nr; i++) {
+		len = read_one_dir(untracked->dirs + i, data, sz - (data - data_));
+		if (len < 0)
+			return -1;
+		data += len;
+	}
+	return data - data_;
+}
+
+struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz)
+{
+	const struct ondisk_untracked_cache *ouc = data;
+	struct untracked_cache *uc;
+	int len;
+
+	if (sz < sizeof(*ouc))
+		return NULL;
+
+	uc = xcalloc(1, sizeof(*uc));
+	stat_data_from_disk(&uc->info_exclude_stat, &ouc->info_exclude_stat);
+	stat_data_from_disk(&uc->excludes_file_stat, &ouc->excludes_file_stat);
+	hashcpy(uc->info_exclude_sha1, ouc->info_exclude_sha1);
+	hashcpy(uc->excludes_file_sha1, ouc->excludes_file_sha1);
+	uc->dir_flags = get_be32(&ouc->dir_flags);
+	uc->exclude_per_dir = xstrdup(ouc->exclude_per_dir);
+	len = sizeof(*ouc) + strlen(ouc->exclude_per_dir);
+	if (sz == len)
+		return uc;
+	if (sz > len &&
+	    read_one_dir(&uc->root, (const unsigned char *)data + len,
+			 sz - len) == sz - len)
+		return uc;
+	free(uc);
+	return NULL;
+}
diff --git a/dir.h b/dir.h
index e520d58..42a09ff 100644
--- a/dir.h
+++ b/dir.h
@@ -295,5 +295,6 @@ static inline int dir_path_match(const struct dir_entry *ent,
 			      has_trailing_dir);
 }
 
+struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz);
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 #endif
diff --git a/read-cache.c b/read-cache.c
index a619666..c350b7b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1332,6 +1332,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_UNTRACKED:
+		istate->untracked = read_untracked_extension(data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
-- 
1.9.1.346.ga2b5940


* [PATCH 13/20] untracked cache: invalidate at index addition or removal
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (11 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 12/20] untracked cache: load from UNTR " Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 14/20] untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Ideally we should replace untracked_cache_invalidate_path() with
untracked_cache_remove_from_index() and untracked_cache_add_to_index(),
and the last two functions would update the untracked cache right away
instead of invalidating it and waiting for the next read_directory() to
deal with it. But that may need some more work in unpack-trees.c, so
keep it simple as a first step.
---
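Note for reviewers: the invalidation here only touches the directory
that directly contains the path (the root when the path has no '/').
A minimal standalone sketch of that idea follows. toy_dir,
parent_dir_len() and toy_invalidate() are hypothetical names used only
for this illustration; the real code looks the directory up with
lookup_untracked() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Length of the directory part of "path", i.e. everything before the
 * last '/'. Returns 0 when the path sits in the top-level directory.
 */
static size_t parent_dir_len(const char *path)
{
	const char *sep;

	assert(path != NULL);
	sep = strrchr(path, '/');
	return sep ? (size_t)(sep - path) : 0;
}

/* Hypothetical, much reduced version of a cached directory node. */
struct toy_dir {
	int valid;
	int untracked_nr;
};

/*
 * Drop the cached listing of one directory and count the event, so
 * that the next directory scan has to reread it from disk.
 */
static void toy_invalidate(struct toy_dir *d, unsigned int *invalidated)
{
	d->valid = 0;
	d->untracked_nr = 0;
	(*invalidated)++;
}
```

For "a/b/c.txt" the directory part is "a/b" (length 3), so only that
node loses its cached listing; "a" and the root stay valid.
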
 dir.c          | 31 +++++++++++++++++++++++++++++++
 dir.h          |  4 ++++
 read-cache.c   |  4 ++++
 unpack-trees.c |  7 +++++--
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 3c61b42..18fe44c 100644
--- a/dir.c
+++ b/dir.c
@@ -2363,3 +2363,34 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	free(uc);
 	return NULL;
 }
+
+void untracked_cache_invalidate_path(struct index_state *istate,
+				     const char *path)
+{
+	const char *sep;
+	struct untracked_cache_dir *d;
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+	sep = strrchr(path, '/');
+	if (sep)
+		d = lookup_untracked(istate->untracked,
+				     istate->untracked->root,
+				     path, sep - path);
+	else
+		d = istate->untracked->root;
+	istate->untracked->dir_invalidated++;
+	d->valid = 0;
+	d->untracked_nr = 0;
+}
+
+void untracked_cache_remove_from_index(struct index_state *istate,
+				       const char *path)
+{
+	untracked_cache_invalidate_path(istate, path);
+}
+
+void untracked_cache_add_to_index(struct index_state *istate,
+				  const char *path)
+{
+	untracked_cache_invalidate_path(istate, path);
+}
diff --git a/dir.h b/dir.h
index 42a09ff..d56c43a 100644
--- a/dir.h
+++ b/dir.h
@@ -295,6 +295,10 @@ static inline int dir_path_match(const struct dir_entry *ent,
 			      has_trailing_dir);
 }
 
+void untracked_cache_invalidate_path(struct index_state *, const char *);
+void untracked_cache_remove_from_index(struct index_state *, const char *);
+void untracked_cache_add_to_index(struct index_state *, const char *);
+
 struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz);
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 #endif
diff --git a/read-cache.c b/read-cache.c
index c350b7b..66c2279 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -66,6 +66,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	memcpy(new->name, new_name, namelen + 1);
 
 	cache_tree_invalidate_path(istate->cache_tree, old->name);
+	untracked_cache_remove_from_index(istate, old->name);
 	remove_index_entry_at(istate, nr);
 	add_index_entry(istate, new, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 }
@@ -520,6 +521,7 @@ int remove_file_from_index(struct index_state *istate, const char *path)
 	if (pos < 0)
 		pos = -pos-1;
 	cache_tree_invalidate_path(istate->cache_tree, path);
+	untracked_cache_remove_from_index(istate, path);
 	while (pos < istate->cache_nr && !strcmp(istate->cache[pos]->name, path))
 		remove_index_entry_at(istate, pos);
 	return 0;
@@ -948,6 +950,8 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	}
 	pos = -pos-1;
 
+	untracked_cache_add_to_index(istate, ce->name);
+
 	/*
 	 * Inserting a merged entry ("stage 0") into the index
 	 * will always replace all non-merged entries..
diff --git a/unpack-trees.c b/unpack-trees.c
index 97fc995..35ef298 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -8,6 +8,7 @@
 #include "progress.h"
 #include "refs.h"
 #include "attr.h"
+#include "dir.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -1258,8 +1259,10 @@ static int verify_uptodate_sparse(const struct cache_entry *ce,
 static void invalidate_ce_path(const struct cache_entry *ce,
 			       struct unpack_trees_options *o)
 {
-	if (ce)
-		cache_tree_invalidate_path(o->src_index->cache_tree, ce->name);
+	if (!ce)
+		return;
+	cache_tree_invalidate_path(o->src_index->cache_tree, ce->name);
+	untracked_cache_invalidate_path(o->src_index, ce->name);
 }
 
 /*
-- 
1.9.1.346.ga2b5940


* [PATCH 14/20] untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (12 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 13/20] untracked cache: invalidate at index addition or removal Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 15/20] read-cache.c: split racy stat test to a separate function Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This could be used to verify correct behavior in tests.
---
 dir.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/dir.c b/dir.c
index 18fe44c..58303ca 100644
--- a/dir.c
+++ b/dir.c
@@ -14,6 +14,8 @@
 #include "pathspec.h"
 #include "varint.h"
 
+#define TRACE_KEY "GIT_TRACE_UNTRACKED"
+
 struct path_simplify {
 	int len;
 	const char *path;
@@ -1929,6 +1931,16 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
+	if (dir->untracked) {
+		trace_printf_key(TRACE_KEY, "node creation: %u\n",
+				 dir->untracked->dir_created);
+		trace_printf_key(TRACE_KEY, "gitignore invalidation: %u\n",
+				 dir->untracked->gitignore_invalidated);
+		trace_printf_key(TRACE_KEY, "directory invalidation: %u\n",
+				 dir->untracked->dir_invalidated);
+		trace_printf_key(TRACE_KEY, "opendir: %u\n",
+				 dir->untracked->dir_opened);
+	}
 	return dir->nr;
 }
 
-- 
1.9.1.346.ga2b5940


* [PATCH 15/20] read-cache.c: split racy stat test to a separate function
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (13 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 14/20] untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 16/20] untracked cache: avoid racy timestamps Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 read-cache.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 66c2279..72adcd6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -258,20 +258,26 @@ static int ce_match_stat_basic(const struct cache_entry *ce, struct stat *st)
 	return changed;
 }
 
-static int is_racy_timestamp(const struct index_state *istate,
-			     const struct cache_entry *ce)
+static int is_racy_stat(const struct index_state *istate,
+			const struct stat_data *sd)
 {
-	return (!S_ISGITLINK(ce->ce_mode) &&
-		istate->timestamp.sec &&
+	return (istate->timestamp.sec &&
 #ifdef USE_NSEC
 		 /* nanosecond timestamped files can also be racy! */
-		(istate->timestamp.sec < ce->ce_stat_data.sd_mtime.sec ||
-		 (istate->timestamp.sec == ce->ce_stat_data.sd_mtime.sec &&
-		  istate->timestamp.nsec <= ce->ce_stat_data.sd_mtime.nsec))
+		(istate->timestamp.sec < sd->sd_mtime.sec ||
+		 (istate->timestamp.sec == sd->sd_mtime.sec &&
+		  istate->timestamp.nsec <= sd->sd_mtime.nsec))
 #else
-		istate->timestamp.sec <= ce->ce_stat_data.sd_mtime.sec
+		istate->timestamp.sec <= sd->sd_mtime.sec
 #endif
-		 );
+		);
+}
+
+static int is_racy_timestamp(const struct index_state *istate,
+			     const struct cache_entry *ce)
+{
+	return (!S_ISGITLINK(ce->ce_mode) &&
+		is_racy_stat(istate, &ce->ce_stat_data));
 }
 
 int ie_match_stat(const struct index_state *istate,
-- 
1.9.1.346.ga2b5940


* [PATCH 16/20] untracked cache: avoid racy timestamps
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (14 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 15/20] read-cache.c: split racy stat test to a separate function Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 17/20] status: support untracked cache Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

When a directory is updated within the same second in which its
timestamp was last saved, we cannot detect the update by checking
timestamps. Assume the worst (something has been updated). See
29e4d36 (Racy GIT - 2005-12-20) for more information.
---
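Note for reviewers: a minimal standalone sketch of the racy check this
patch relies on, modelled on is_racy_stat() from the previous patch.
toy_time and toy_is_racy() are hypothetical names, and the use_nsec
argument stands in for the USE_NSEC compile-time switch.

```c
#include <assert.h>

/* Hypothetical reduced timestamp: seconds plus nanoseconds. */
struct toy_time {
	unsigned int sec;
	unsigned int nsec;
};

/*
 * index_ts is when the index was last written, saved_mtime is the
 * mtime recorded in the cache. The entry is racy when it was modified
 * at or after the moment the index was written, because a later
 * change within the same timestamp tick would be invisible.
 */
static int toy_is_racy(struct toy_time index_ts,
		       struct toy_time saved_mtime, int use_nsec)
{
	assert(index_ts.nsec < 1000000000u);
	assert(saved_mtime.nsec < 1000000000u);

	if (!index_ts.sec)
		return 0; /* no index timestamp, nothing to compare */
	if (use_nsec)
		return index_ts.sec < saved_mtime.sec ||
		       (index_ts.sec == saved_mtime.sec &&
			index_ts.nsec <= saved_mtime.nsec);
	return index_ts.sec <= saved_mtime.sec;
}
```

When this returns nonzero the cached data is treated as changed without
looking at the current stat info, which is what match_stat_data_racy()
does by returning MTIME_CHANGED.
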
 cache.h      | 2 ++
 dir.c        | 6 ++++--
 read-cache.c | 8 ++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 06fcb6b..98c22c4 100644
--- a/cache.h
+++ b/cache.h
@@ -525,6 +525,8 @@ extern void fill_stat_data(struct stat_data *sd, struct stat *st);
  * INODE_CHANGED, and DATA_CHANGED.
  */
 extern int match_stat_data(const struct stat_data *sd, struct stat *st);
+extern int match_stat_data_racy(const struct index_state *istate,
+				const struct stat_data *sd, struct stat *st);
 
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
diff --git a/dir.c b/dir.c
index 58303ca..24ccd22 100644
--- a/dir.c
+++ b/dir.c
@@ -677,7 +677,9 @@ static int add_excludes(const char *fname,
 			    !ce_stage(active_cache[pos]) &&
 			    ce_uptodate(active_cache[pos]))
 				hashcpy(sha1, active_cache[pos]->sha1);
-			else if (ref_stat && !match_stat_data(ref_stat, &st)) {
+			else if (ref_stat &&
+				 !match_stat_data_racy(&the_index,
+						       ref_stat, &st)) {
 				if (ref_sha1 != sha1) /* support ref_sha1 == sha1 */
 					hashcpy(sha1, ref_sha1);
 			} else
@@ -1543,7 +1545,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 	if (!untracked->valid ||
-	    match_stat_data(&untracked->stat_data, &st)) {
+	    match_stat_data_racy(&the_index, &untracked->stat_data, &st)) {
 		if (untracked->valid)
 			invalidate_directory(dir->untracked, untracked);
 		fill_stat_data(&untracked->stat_data, &st);
diff --git a/read-cache.c b/read-cache.c
index 72adcd6..823db9b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -280,6 +280,14 @@ static int is_racy_timestamp(const struct index_state *istate,
 		is_racy_stat(istate, &ce->ce_stat_data));
 }
 
+int match_stat_data_racy(const struct index_state *istate,
+			 const struct stat_data *sd, struct stat *st)
+{
+	if (is_racy_stat(istate, sd))
+		return MTIME_CHANGED;
+	return match_stat_data(sd, st);
+}
+
 int ie_match_stat(const struct index_state *istate,
 		  const struct cache_entry *ce, struct stat *st,
 		  unsigned int options)
-- 
1.9.1.346.ga2b5940


* [PATCH 17/20] status: support untracked cache
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (15 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 16/20] untracked cache: avoid racy timestamps Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 18/20] update-index: manually enable or disable " Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 builtin/commit.c | 8 ++++++++
 wt-status.c      | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/builtin/commit.c b/builtin/commit.c
index 9cfef6c..1e45ff0 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1327,6 +1327,14 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		wt_status_print(&s);
 		break;
 	}
+
+	if (active_cache_changed) {
+		fd = hold_locked_index(&index_lock, 0);
+		if (0 <= fd &&
+		    (write_cache(fd, active_cache, active_nr) ||
+		     commit_locked_index(&index_lock)))
+			die("Unable to write new index file");
+	}
 	return 0;
 }
 
diff --git a/wt-status.c b/wt-status.c
index ec7344e..0355129 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -578,9 +578,15 @@ static void wt_status_collect_untracked(struct wt_status *s)
 			DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
 	if (s->show_ignored_files)
 		dir.flags |= DIR_SHOW_IGNORED_TOO;
+	dir.untracked = the_index.untracked;
 	setup_standard_excludes(&dir);
 
 	fill_directory(&dir, &s->pathspec);
+	if (dir.untracked &&
+	    (dir.untracked->dir_opened ||
+	     dir.untracked->gitignore_invalidated ||
+	     dir.untracked->dir_invalidated))
+		active_cache_changed = 1;
 
 	for (i = 0; i < dir.nr; i++) {
 		struct dir_entry *ent = dir.entries[i];
-- 
1.9.1.346.ga2b5940


* [PATCH 18/20] update-index: manually enable or disable untracked cache
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (16 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 17/20] status: support untracked cache Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:51 ` [PATCH 19/20] update-index: test the system before enabling " Nguyễn Thái Ngọc Duy
  2014-05-07 14:52 ` [PATCH 20/20] t7063: tests for " Nguyễn Thái Ngọc Duy
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Some numbers below. In short, the saving on wt_status_collect_untracked()
is about 80%. There is some overhead on read/write_cache, but it seems
lower than 50ms, while the .._untracked() saving is in the 500ms range
(except linux-2.6, about 150ms). "git status" time saving ranges from
33% to 42%.

On gentoo-x86.git (100k files, 23k dirs, quite balanced
tree, 8.5MB index v4, cache-tree fully populated), before turning
untracked cache on (the most important line is wt_status_collect:625):

   184.650 gitmodules_config:201 if (read_cache() < 0) die("index
     0.004 cmd_status:1299 read_cache_preload(&s.pathspec)
   226.231 cmd_status:1300 refresh_index(&the_index, REFRESH_QUIET
     3.096 cmd_status:1304 update_index_if_able(&the_index, &index_lock)
     6.788  wt_status_collect:619 wt_status_collect_changes_worktree(s)
     6.780  wt_status_collect:624 wt_status_collect_changes_index(s)
   772.866  wt_status_collect:625 wt_status_collect_untracked(s)
   786.686 cmd_status:1308 wt_status_collect(&s)

real    0m1.211s
user    0m0.566s
sys     0m0.638s

and after (saving 42% total time):

   220.888 gitmodules_config:201 if (read_cache() < 0) die("index
    36.368 cmd_status:1299 read_cache_preload(&s.pathspec)
   223.936 cmd_status:1300 refresh_index(&the_index, REFRESH_QUIET
     2.935 cmd_status:1304 update_index_if_able(&the_index, &index_lock)
     7.031  wt_status_collect:619 wt_status_collect_changes_worktree(s)
     7.156  wt_status_collect:624 wt_status_collect_changes_index(s)
   148.022  wt_status_collect:625 wt_status_collect_untracked(s)
   162.443 cmd_status:1308 wt_status_collect(&s)
    36.943 cmd_status:1348 if (the_index.untracked) { struct

real    0m0.696s
user    0m0.406s
sys     0m0.288s

On webkit.git (182k files, 6k dirs, 14M index v4), before:

   283.182 gitmodules_config:201 if (read_cache() < 0) die("index
     0.004 cmd_status:1299 read_cache_preload(&s.pathspec)
   515.836 cmd_status:1300 refresh_index(&the_index, REFRESH_QUIET
     5.428 cmd_status:1304 update_index_if_able(&the_index, &index_lock)
    11.162  wt_status_collect:619 wt_status_collect_changes_worktree(s)
    11.068  wt_status_collect:624 wt_status_collect_changes_index(s)
   887.247  wt_status_collect:625 wt_status_collect_untracked(s)
   909.722 cmd_status:1308 wt_status_collect(&s)

real    0m1.729s
user    0m0.785s
sys     0m0.941s

and after (saving 38% total time):

   290.994 gitmodules_config:201 if (read_cache() < 0) die("index
    10.132 cmd_status:1299 read_cache_preload(&s.pathspec)
   516.656 cmd_status:1300 refresh_index(&the_index, REFRESH_QUIET
     5.159 cmd_status:1304 update_index_if_able(&the_index, &index_lock)
    12.605  wt_status_collect:619 wt_status_collect_changes_worktree(s)
    11.262  wt_status_collect:624 wt_status_collect_changes_index(s)
   186.032  wt_status_collect:625 wt_status_collect_untracked(s)
   210.134 cmd_status:1308 wt_status_collect(&s)
    12.332 cmd_status:1348 if (the_index.untracked) { struct index_state

real    0m1.058s
user    0m0.525s
sys     0m0.532s

And linux-2.6 (45k files, 3k dirs, 3.2MB index v4), before:

    68.668 gitmodules_config:201 if (read_cache() < 0) die("index
     0.004 cmd_status:1299 read_cache_preload(&s.pathspec)
   114.270 cmd_status:1300 refresh_index(&the_index, REFRESH_QUIET
     1.180 cmd_status:1304 update_index_if_able(&the_index, &index_lock)
     4.027  wt_status_collect:619 wt_status_collect_changes_worktree(s)
     4.265  wt_status_collect:624 wt_status_collect_changes_index(s)
   191.285  wt_status_collect:625 wt_status_collect_untracked(s)
   199.825 cmd_status:1308 wt_status_collect(&s)

real    0m0.392s
user    0m0.177s
sys     0m0.215s

and after (saving 33%):

    71.756 gitmodules_config:201 if (read_cache() < 0) die("index
     5.201 cmd_status:1299 read_cache_preload(&s.pathspec)
   111.064 cmd_status:1300 refresh_index(&the_index, REFRESH_QUIET
     1.171 cmd_status:1304 update_index_if_able(&the_index, &index_lock)
     3.054  wt_status_collect:619 wt_status_collect_changes_worktree(s)
     4.945  wt_status_collect:624 wt_status_collect_changes_index(s)
    27.203  wt_status_collect:625 wt_status_collect_untracked(s)
    35.475 cmd_status:1308 wt_status_collect(&s)
    25.759 cmd_status:1348 if (the_index.untracked) { struct index_state

real    0m0.259s
user    0m0.106s
sys     0m0.132s
---
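Note for reviewers: the percentages quoted above are plain relative
savings, (before - after) / before. A minimal sketch for rechecking
them follows; saving_percent() is a hypothetical helper, not part of
this series.

```c
#include <assert.h>

/* Relative saving in percent when a timing drops from before to after. */
static double saving_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Fed with the "real" times above, it gives roughly 42.5 for gentoo-x86
(1.211s to 0.696s), 38.8 for webkit (1.729s to 1.058s) and 33.9 for
linux-2.6 (0.392s to 0.259s). For wt_status_collect_untracked() alone
on gentoo-x86 (772.866 to 148.022) it gives roughly 80.8.
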
 builtin/update-index.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index ba54e19..003e28e 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -734,6 +734,7 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
 int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, line_termination = '\n';
+	int untracked_cache = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -822,6 +823,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			resolve_undo_clear_callback},
 		OPT_INTEGER(0, "index-version", &preferred_index_format,
 			N_("write index in this format")),
+		OPT_BOOL(0, "untracked-cache", &untracked_cache,
+			N_("enable/disable untracked cache")),
 		OPT_END()
 	};
 
@@ -915,6 +918,20 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	if (untracked_cache > 0) {
+		struct untracked_cache *uc;
+
+		uc = xcalloc(1, sizeof(*uc));
+		uc->exclude_per_dir = ".gitignore";
+		/* should be the same flags used by git-status */
+		uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+		the_index.untracked = uc;
+		active_cache_changed = 1;
+	} else if (!untracked_cache) {
+		the_index.untracked = NULL;
+		active_cache_changed = 1;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
1.9.1.346.ga2b5940


* [PATCH 19/20] update-index: test the system before enabling untracked cache
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (17 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 18/20] update-index: manually enable or disable " Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:51 ` Nguyễn Thái Ngọc Duy
  2014-05-07 14:52 ` [PATCH 20/20] t7063: tests for " Nguyễn Thái Ngọc Duy
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 builtin/update-index.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 003e28e..f18076b 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -46,6 +46,145 @@ static void report(const char *fmt, ...)
 	va_end(vp);
 }
 
+static void remove_test_directory(void)
+{
+	struct strbuf sb = STRBUF_INIT;
+	strbuf_addstr(&sb, git_path("dir-mtime-test"));
+	remove_dir_recursively(&sb, 0);
+	strbuf_release(&sb);
+}
+
+static void xmkdir(const char *path)
+{
+	if (mkdir(path, 0700))
+		die_errno(_("failed to create directory %s"), path);
+}
+
+static int xstat(const char *path, struct stat *st)
+{
+	if (stat(path, st))
+		die_errno(_("failed to stat %s"), path);
+	return 0;
+}
+
+static int create_file(const char *path)
+{
+	int fd = open(path, O_CREAT | O_RDWR, 0644);
+	if (fd < 0)
+		die_errno(_("failed to create file %s"), path);
+	return fd;
+}
+
+static void xunlink(const char *path)
+{
+	if (unlink(path))
+		die_errno(_("failed to delete file %s"), path);
+}
+
+static void xrmdir(const char *path)
+{
+	if (rmdir(path))
+		die_errno(_("failed to delete directory %s"), path);
+}
+
+static void avoid_racy(void)
+{
+	/*
+	 * Not sure if we could usleep(10) when USE_NSEC is defined. The
+	 * nsec field could be there, but the OS could choose to
+	 * ignore it?
+	 */
+	sleep(1);
+}
+
+static int test_if_untracked_cache_is_supported(void)
+{
+	struct stat st;
+	struct stat_data base;
+	int fd;
+
+	fprintf(stderr, _("Testing "));
+	xmkdir(git_path("dir-mtime-test"));
+	atexit(remove_test_directory);
+	xstat(git_path("dir-mtime-test"), &st);
+	fill_stat_data(&base, &st);
+	fputc('.', stderr);
+
+	avoid_racy();
+	fd = create_file(git_path("dir-mtime-test/newfile"));
+	xstat(git_path("dir-mtime-test"), &st);
+	if (!match_stat_data(&base, &st)) {
+		fputc('\n', stderr);
+		fprintf_ln(stderr,_("directory stat info does not "
+				    "change after adding a new file"));
+		return 0;
+	}
+	fill_stat_data(&base, &st);
+	fputc('.', stderr);
+
+	avoid_racy();
+	xmkdir(git_path("dir-mtime-test/new-dir"));
+	xstat(git_path("dir-mtime-test"), &st);
+	if (!match_stat_data(&base, &st)) {
+		fputc('\n', stderr);
+		fprintf_ln(stderr, _("directory stat info does not change "
+				     "after adding a new directory"));
+		return 0;
+	}
+	fill_stat_data(&base, &st);
+	fputc('.', stderr);
+
+	avoid_racy();
+	write_or_die(fd, "data", 4);
+	close(fd);
+	xstat(git_path("dir-mtime-test"), &st);
+	if (match_stat_data(&base, &st)) {
+		fputc('\n', stderr);
+		fprintf_ln(stderr, _("directory stat info changes "
+				     "after updating a file"));
+		return 0;
+	}
+	fputc('.', stderr);
+
+	avoid_racy();
+	close(create_file(git_path("dir-mtime-test/new-dir/new")));
+	xstat(git_path("dir-mtime-test"), &st);
+	if (match_stat_data(&base, &st)) {
+		fputc('\n', stderr);
+		fprintf_ln(stderr, _("directory stat info changes after "
+				     "adding a file inside subdirectory"));
+		return 0;
+	}
+	fputc('.', stderr);
+
+	avoid_racy();
+	xunlink(git_path("dir-mtime-test/newfile"));
+	xstat(git_path("dir-mtime-test"), &st);
+	if (!match_stat_data(&base, &st)) {
+		fputc('\n', stderr);
+		fprintf_ln(stderr, _("directory stat info does not "
+				     "change after deleting a file"));
+		return 0;
+	}
+	fill_stat_data(&base, &st);
+	fputc('.', stderr);
+
+	avoid_racy();
+	xunlink(git_path("dir-mtime-test/new-dir/new"));
+	xrmdir(git_path("dir-mtime-test/new-dir"));
+	xstat(git_path("dir-mtime-test"), &st);
+	if (!match_stat_data(&base, &st)) {
+		fputc('\n', stderr);
+		fprintf_ln(stderr, _("directory stat info does not "
+				     "change after deleting a directory"));
+		return 0;
+	}
+
+	xrmdir(git_path("dir-mtime-test"));
+	fprintf_ln(stderr, _(" OK"));
+	return 1;
+}
+
 static int mark_ce_flags(const char *path, int flag, int mark)
 {
 	int namelen = strlen(path);
@@ -823,6 +962,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			resolve_undo_clear_callback},
 		OPT_INTEGER(0, "index-version", &preferred_index_format,
 			N_("write index in this format")),
+		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
+			    N_("enable untracked cache without testing the filesystem"), 2),
 		OPT_BOOL(0, "untracked-cache", &untracked_cache,
 			N_("enable/disable untracked cache")),
 		OPT_END()
@@ -921,6 +1062,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (untracked_cache > 0) {
 		struct untracked_cache *uc;
 
+		if (untracked_cache < 2 &&
+		    !test_if_untracked_cache_is_supported())
+			return 1;
 		uc = xcalloc(1, sizeof(*uc));
 		uc->exclude_per_dir = ".gitignore";
 		/* should be the same flags used by git-status */
-- 
1.9.1.346.ga2b5940
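
For anyone who wants to check a filesystem by hand before running
update-index, the probe in this patch can be approximated from a shell.
This is only a sketch and not part of the series: the function names are
made up, "mktemp -d" is assumed to be available, and instead of sleeping
as avoid_racy() does, it backdates the directory with "touch -t" so that
any later mtime update is visible without waiting. It covers only three
of the six cases the C code checks (add a file, update a file, delete a
file).

```shell
#!/bin/sh
# Hypothetical helpers, not part of the patch.

# Succeeds only if $1 has a later mtime than $2.
is_newer () {
	test -n "$(find "$1" -prune -newer "$2")"
}

# Returns 0 if the directory mtime follows adds and deletes but not
# in-place updates, 1 if it does not, 2 if the probe could not be set up.
dir_mtime_tracks_changes () {
	probe=$(mktemp -d) || return 2
	d=$probe/dir
	ref=$probe/ref
	mkdir "$d" && touch -t 200101010000 "$ref" || {
		rm -rf "$probe"
		return 2
	}
	status=0

	# Backdate the directory below the reference file before each step.
	touch -t 200001010000 "$d"
	: >"$d/newfile"
	is_newer "$d" "$ref" || status=1	# adding a file must update it

	touch -t 200001010000 "$d"
	echo data >>"$d/newfile"
	is_newer "$d" "$ref" && status=1	# updating a file must not

	touch -t 200001010000 "$d"
	rm "$d/newfile"
	is_newer "$d" "$ref" || status=1	# deleting a file must update it

	rm -rf "$probe"
	return $status
}

if dir_mtime_tracks_changes
then
	echo "directory mtime follows adds and deletes"
else
	echo "directory mtime is not usable here"
fi
```

The probe runs in whatever directory mktemp picks, which may sit on a
different filesystem than the repository; point TMPDIR at the worktree's
filesystem if that matters.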


* [PATCH 20/20] t7063: tests for untracked cache
  2014-05-07 14:51 [PATCH 00/20] Untracked cache to speed up "git status" Nguyễn Thái Ngọc Duy
                   ` (18 preceding siblings ...)
  2014-05-07 14:51 ` [PATCH 19/20] update-index: test the system before enabling " Nguyễn Thái Ngọc Duy
@ 2014-05-07 14:52 ` Nguyễn Thái Ngọc Duy
  19 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-07 14:52 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 .gitignore                                 |   1 +
 Makefile                                   |   1 +
 t/t7063-status-untracked-cache.sh (new +x) | 352 +++++++++++++++++++++++++++++
 test-dump-untracked-cache.c (new)          |  61 +++++
 4 files changed, 415 insertions(+)
 create mode 100755 t/t7063-status-untracked-cache.sh
 create mode 100644 test-dump-untracked-cache.c

diff --git a/.gitignore b/.gitignore
index dc600f9..7f3bfdb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -180,6 +180,7 @@
 /test-date
 /test-delta
 /test-dump-cache-tree
+/test-dump-untracked-cache
 /test-scrap-cache-tree
 /test-genrandom
 /test-hashmap
diff --git a/Makefile b/Makefile
index a53f3a8..d8a0482 100644
--- a/Makefile
+++ b/Makefile
@@ -553,6 +553,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
new file mode 100755
index 0000000..bb5124b
--- /dev/null
+++ b/t/t7063-status-untracked-cache.sh
@@ -0,0 +1,352 @@
+#!/bin/sh
+
+test_description='test untracked cache'
+
+. ./test-lib.sh
+
+avoid_racy() {
+	sleep 1
+}
+
+git update-index --untracked-cache
+# It's fine if git update-index returns an error code other than one,
+# it'll be caught in the first test.
+if test $? -eq 1; then
+	skip_all='This system does not support untracked cache'
+	test_done
+fi
+
+test_expect_success 'setup' '
+	git init worktree &&
+	cd worktree &&
+	mkdir done dtwo dthree &&
+	touch one two three done/one dtwo/two dthree/three &&
+	git add one two done/one &&
+	: >.git/info/exclude &&
+	git update-index --untracked-cache
+'
+
+test_expect_success 'untracked cache is empty' '
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude 0000000000000000000000000000000000000000
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+EOF
+	test_cmp ../expect ../actual
+'
+
+cat >../status.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? dthree/
+?? dtwo/
+?? three
+EOF
+
+cat >../dump.expect <<EOF &&
+info/exclude e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ 0000000000000000000000000000000000000000 recurse valid
+dthree/
+dtwo/
+three
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+three
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+
+test_expect_success 'status first time (empty cache)' '
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 3
+gitignore invalidation: 1
+directory invalidation: 0
+opendir: 4
+EOF
+	test_cmp ../trace.expect ../trace
+'
+
+test_expect_success 'untracked cache after first status' '
+	test-dump-untracked-cache >../actual &&
+	test_cmp ../dump.expect ../actual
+'
+
+test_expect_success 'status second time (fully populated cache)' '
+	avoid_racy &&
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 0
+gitignore invalidation: 0
+directory invalidation: 0
+opendir: 0
+EOF
+	test_cmp ../trace.expect ../trace
+'
+
+test_expect_success 'untracked cache after second status' '
+	test-dump-untracked-cache >../actual &&
+	test_cmp ../dump.expect ../actual
+'
+
+test_expect_success 'modify in root directory, one dir invalidation' '
+	avoid_racy &&
+	: >four &&
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	cat >../status.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? dthree/
+?? dtwo/
+?? four
+?? three
+EOF
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 0
+gitignore invalidation: 0
+directory invalidation: 1
+opendir: 1
+EOF
+	test_cmp ../trace.expect ../trace
+
+'
+
+test_expect_success 'verify untracked cache dump' '
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ 0000000000000000000000000000000000000000 recurse valid
+dthree/
+dtwo/
+four
+three
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+three
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_expect_success 'new .gitignore invalidates recursively' '
+	avoid_racy &&
+	echo four >.gitignore &&
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	cat >../status.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? .gitignore
+?? dthree/
+?? dtwo/
+?? three
+EOF
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 0
+gitignore invalidation: 1
+directory invalidation: 1
+opendir: 4
+EOF
+	test_cmp ../trace.expect ../trace
+
+'
+
+test_expect_success 'verify untracked cache dump' '
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
+.gitignore
+dthree/
+dtwo/
+three
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+three
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_expect_success 'new info/exclude invalidates everything' '
+	avoid_racy &&
+	echo three >>.git/info/exclude &&
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	cat >../status.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? .gitignore
+?? dtwo/
+EOF
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 0
+gitignore invalidation: 1
+directory invalidation: 0
+opendir: 4
+EOF
+	test_cmp ../trace.expect ../trace
+'
+
+test_expect_success 'verify untracked cache dump' '
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
+.gitignore
+dtwo/
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_expect_success 'move two from tracked to untracked' '
+	git rm --cached two &&
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_expect_success 'status after the move' '
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	cat >../status.expect <<EOF &&
+A  done/one
+A  one
+?? .gitignore
+?? dtwo/
+?? two
+EOF
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 0
+gitignore invalidation: 0
+directory invalidation: 0
+opendir: 1
+EOF
+	test_cmp ../trace.expect ../trace
+'
+
+test_expect_success 'verify untracked cache dump' '
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
+.gitignore
+dtwo/
+two
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_expect_success 'move two from untracked to tracked' '
+	git add two &&
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_expect_success 'status after the move' '
+	: >../trace &&
+	GIT_TRACE_UNTRACKED="$TRASH_DIRECTORY/trace" \
+	git status --porcelain >../actual &&
+	cat >../status.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? .gitignore
+?? dtwo/
+EOF
+	test_cmp ../status.expect ../actual &&
+	cat >../trace.expect <<EOF &&
+node creation: 0
+gitignore invalidation: 0
+directory invalidation: 0
+opendir: 1
+EOF
+	test_cmp ../trace.expect ../trace
+'
+
+test_expect_success 'verify untracked cache dump' '
+	test-dump-untracked-cache >../actual &&
+	cat >../expect <<EOF &&
+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0
+core.excludesfile 0000000000000000000000000000000000000000
+exclude_per_dir .gitignore
+flags 00000006
+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid
+.gitignore
+dtwo/
+/done/ 0000000000000000000000000000000000000000 recurse valid
+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid
+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid
+two
+EOF
+	test_cmp ../expect ../actual
+'
+
+test_done
diff --git a/test-dump-untracked-cache.c b/test-dump-untracked-cache.c
new file mode 100644
index 0000000..c55db09
--- /dev/null
+++ b/test-dump-untracked-cache.c
@@ -0,0 +1,61 @@
+#include "cache.h"
+#include "dir.h"
+
+static int compare_untracked(const void *a_, const void *b_)
+{
+	const char *const *a = a_;
+	const char *const *b = b_;
+	return strcmp(*a, *b);
+}
+
+static int compare_dir(const void *a_, const void *b_)
+{
+	const struct untracked_cache_dir *const *a = a_;
+	const struct untracked_cache_dir *const *b = b_;
+	return strcmp((*a)->name, (*b)->name);
+}
+
+static void dump(struct untracked_cache_dir *ucd, struct strbuf *base)
+{
+	int i, len;
+	qsort(ucd->untracked, ucd->untracked_nr, sizeof(*ucd->untracked),
+	      compare_untracked);
+	qsort(ucd->dirs, ucd->dirs_nr, sizeof(*ucd->dirs),
+	      compare_dir);
+	len = base->len;
+	strbuf_addf(base, "%s/", ucd->name);
+	printf("%s %s", base->buf,
+	       sha1_to_hex(ucd->exclude_sha1));
+	if (ucd->recurse)
+		fputs(" recurse", stdout);
+	if (ucd->check_only)
+		fputs(" check_only", stdout);
+	if (ucd->valid)
+		fputs(" valid", stdout);
+	printf("\n");
+	for (i = 0; i < ucd->untracked_nr; i++)
+		printf("%s\n", ucd->untracked[i]);
+	for (i = 0; i < ucd->dirs_nr; i++)
+		dump(ucd->dirs[i], base);
+	strbuf_setlen(base, len);
+}
+
+int main(int ac, char **av)
+{
+	struct untracked_cache *uc;
+	struct strbuf base = STRBUF_INIT;
+	if (read_cache() < 0)
+		die("unable to read index file");
+	uc = the_index.untracked;
+	if (!uc) {
+		printf("no untracked cache\n");
+		return 0;
+	}
+	printf("info/exclude %s\n", sha1_to_hex(uc->info_exclude_sha1));
+	printf("core.excludesfile %s\n", sha1_to_hex(uc->excludes_file_sha1));
+	printf("exclude_per_dir %s\n", uc->exclude_per_dir);
+	printf("flags %08x\n", uc->dir_flags);
+	if (uc->root)
+		dump(uc->root, &base);
+	return 0;
+}
-- 
1.9.1.346.ga2b5940
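
The status tests in t7063 all follow one pattern: truncate the trace
file, run the command with the trace variable pointing at it, then
compare the counters against a here-document. A minimal sketch of that
pattern outside the test harness follows. It makes several assumptions:
fake_status is a made-up stand-in for "git status", TRACE_FILE is an
illustrative variable rather than a git one, "mktemp -d" is available,
and plain cmp replaces test_cmp.

```shell
#!/bin/sh
# Hypothetical stand-in for the traced command: appends two counters
# to the file named by $TRACE_FILE.
fake_status () {
	printf 'node creation: 0\nopendir: %d\n' "$1" >>"$TRACE_FILE"
}

# check_counters <actual opendir count> <expected opendir count>
check_counters () {
	work=$(mktemp -d) || return 2
	TRACE_FILE=$work/trace
	: >"$TRACE_FILE"		# start from an empty trace
	fake_status "$1"
	cat >"$work/trace.expect" <<EOF
node creation: 0
opendir: $2
EOF
	cmp -s "$work/trace.expect" "$TRACE_FILE"
	result=$?
	rm -rf "$work"
	return $result
}

if check_counters 0 0
then
	echo "counters match"
fi
```

Truncating the trace first matters because the stand-in, like the real
trace output, appends rather than overwrites.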


