All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr()
@ 2016-06-26  4:14 David Turner
  2016-06-26  4:14 ` [PATCH v13 02/20] read-cache: allow to keep mmap'd memory after reading David Turner
                   ` (17 more replies)
  0 siblings, 18 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index d9fb78b..16cc487 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1345,7 +1345,7 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
 	unsigned char sha1[20];
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 02/20] read-cache: allow to keep mmap'd memory after reading
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 03/20] pkt-line: add gentle version of packet_write David Turner
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Later, we will introduce git index-helper to share this memory with
other git processes.

We only unmap it when we discard the index (although the kernel may of
course choose to page it out).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h      |  3 +++
 read-cache.c | 13 ++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index b829410..4180e2b 100644
--- a/cache.h
+++ b/cache.h
@@ -333,11 +333,14 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 keep_mmap : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	void *mmap;
+	size_t mmap_size;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 16cc487..3cb0ec3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1574,6 +1574,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
+	if (istate->keep_mmap) {
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	}
 	close(fd);
 
 	hdr = mmap;
@@ -1626,11 +1630,13 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	if (!istate->keep_mmap)
+		munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
 	munmap(mmap, mmap_size);
+	istate->mmap = NULL;
 	die("index file corrupt");
 }
 
@@ -1655,6 +1661,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		discard_index(split_index->base);
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	split_index->base->keep_mmap = istate->keep_mmap;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1698,6 +1705,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->keep_mmap && istate->mmap) {
+		munmap(istate->mmap, istate->mmap_size);
+		istate->mmap = NULL;
+	}
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 03/20] pkt-line: add gentle version of packet_write
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
  2016-06-26  4:14 ` [PATCH v13 02/20] read-cache: allow to keep mmap'd memory after reading David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff David Turner
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

packet_write calls write_or_die, which dies with a sigpipe even if
calling code has explicitly blocked that signal.

Add packet_write_gently and packet_flush_gently, which don't.  Soon,
we will use this for communication with git index-helper, which, being
merely an optimization, should be permitted to die without disrupting
clients.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 pkt-line.c | 18 ++++++++++++++++++
 pkt-line.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..f964446 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,12 @@ void packet_flush(int fd)
 	write_or_die(fd, "0000", 4);
 }
 
+int packet_flush_gently(int fd)
+{
+	packet_trace("0000", 4, 1);
+	return write_in_full(fd, "0000", 4) != 4;
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
 	packet_trace("0000", 4, 1);
@@ -130,6 +136,18 @@ void packet_write(int fd, const char *fmt, ...)
 	write_or_die(fd, buf.buf, buf.len);
 }
 
+int packet_write_gently(int fd, const char *fmt, ...)
+{
+	static struct strbuf buf = STRBUF_INIT;
+	va_list args;
+
+	strbuf_reset(&buf);
+	va_start(args, fmt);
+	format_packet(&buf, fmt, args);
+	va_end(args);
+	return write_in_full(fd, buf.buf, buf.len) != buf.len;
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
 	va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..deffcb5 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,7 +20,9 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
+int packet_flush_gently(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
  2016-06-26  4:14 ` [PATCH v13 02/20] read-cache: allow to keep mmap'd memory after reading David Turner
  2016-06-26  4:14 ` [PATCH v13 03/20] pkt-line: add gentle version of packet_write David Turner
@ 2016-06-26  4:14 ` David Turner
       [not found]   ` <CAFUO74nENapwVsM3CUst9AHqy5LcKTFBCnJxGXPk8E952t+X5Q@mail.gmail.com>
  2016-06-30 13:06   ` Johannes Schindelin
  2016-06-26  4:14 ` [PATCH v13 05/20] index-helper: add --strict David Turner
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Ramsay Jones, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Instead of reading the index from disk and worrying about disk
corruption, the index is cached in memory (memory bit-flips happen
too, but hopefully less often). The result is faster read. Read time
is reduced by 70%.

The biggest gain is not having to verify the trailing SHA-1, which
takes lots of time especially on large index files. But this also
opens doors for further optimizations:

 - we could create an in-memory format that's essentially the memory
   dump of the index to eliminate most of parsing/allocation
   overhead. The mmap'd memory can be used straight away. Experiment
   [1] shows we could reduce read time by 88%.

 - we could cache non-index info such as name hash

Shared memory is done by storing files in a per-repository temporary
directory.  This is more portable than shm (which requires
posix-realtime and has various quirks on OS X).  It might even work on
Windows, although this has not been tested. The shared memory file's
name follows the template "shm-<object>-<SHA1>" where <SHA1> is the
trailing SHA-1 of the index file. <object> is "index" for cached index
files (and might later be "name-hash" for name-hash cache). If such
shared memory exists, it contains the same index content as on
disk. The content is already validated by the daemon and git won't
validate it again (except comparing the trailing SHA-1s).

We also add some bits to the index (to_shm and from_shm) to track
when an index came from shared memory or is going to shared memory.

We keep this daemon's logic as thin as possible. The "brain" stays in
git. So the daemon can read and validate stuff, but that's all it's
allowed to do. It does not add/create new information. It doesn't even
accept direct updates from git.

Git can poke the daemon via unix domain sockets to tell it to refresh
the index cache, or to keep it alive some more minutes. It can't give
any real index data directly to the daemon. Real data goes to disk
first, then the daemon reads and verifies it from there. The daemon
only handles $GIT_DIR/index, not temporary index files; it only gets
poked for the former.

$GIT_DIR/index-helper.sock is the socket for the daemon process. The
daemon reads from the socket and executes commands.

Named pipes were considered for portability reasons, but then commands
that need replies from the daemon would have to open their own pipes,
since a named pipe should only have one reader.  Unix domain sockets
don't have this problem.

On webkit.git with index format v2, duplicating 8 times to 1.5m
entries and 236MB in size:

(vanilla)      0.50 s: read_index_from .git/index
(index-helper) 0.18 s: read_index_from .git/index

Interestingly with index v4, we get less out of index-helper. It makes
sense as v4 requires more processing after loading the index:

(vanilla)      0.37 s: read_index_from .git/index
(index-helper) 0.22 s: read_index_from .git/index

[1] http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=248771

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                             |   1 +
 Documentation/git-index-helper.txt     |  50 ++++++
 Makefile                               |   7 +
 cache.h                                |  11 ++
 command-list.txt                       |   1 +
 contrib/completion/git-completion.bash |   1 +
 git-compat-util.h                      |   1 +
 index-helper.c                         | 292 +++++++++++++++++++++++++++++++++
 read-cache.c                           | 123 +++++++++++++-
 t/t7900-index-helper.sh                |  23 +++
 10 files changed, 501 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100755 t/t7900-index-helper.sh

diff --git a/.gitignore b/.gitignore
index 5087ce1..b92f122 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,6 +71,7 @@
 /git-http-fetch
 /git-http-push
 /git-imap-send
+/git-index-helper
 /git-index-pack
 /git-init
 /git-init-db
diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
new file mode 100644
index 0000000..fa6e347
--- /dev/null
+++ b/Documentation/git-index-helper.txt
@@ -0,0 +1,50 @@
+git-index-helper(1)
+===================
+
+NAME
+----
+git-index-helper - A simple cache daemon for speeding up index file access
+
+SYNOPSIS
+--------
+[verse]
+'git index-helper' [options]
+
+DESCRIPTION
+-----------
+Keep the index file in memory for faster access. This daemon is per
+repository and per working tree.  So if you have two working trees
+each with a submodule, you might need four index-helpers.  (In practice,
+this is only worthwhile for large indexes, so only use it if you notice
+that git status is slow).
+
+OPTIONS
+-------
+
+--exit-after=<n>::
+	Exit if the cached index is not accessed for `<n>`
+	seconds. Specify 0 to wait forever. Default is 600.
+
+NOTES
+-----
+
+$GIT_DIR/index-helper.sock a Unix domain socket that the daemon reads
+commands from.  The directory will also contain files named
+"shm-index-<SHA1>".  These are used as backing stores for shared
+memory.  Normally the daemon will clean up these files when it exits
+or when they are no longer relevant.  But if it crashes, some objects
+could remain there and they can be safely deleted with "rm"
+command. The following commands are used to control the daemon:
+
+"refresh"::
+	Reread the index.
+
+"poke":
+	Let the daemon know the index is to be read. It keeps the
+	daemon alive longer, unless `--exit-after=0` is used.
+
+All commands and replies are in pkt-line format.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 2742a69..2b2ac05 100644
--- a/Makefile
+++ b/Makefile
@@ -1433,6 +1433,12 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifndef NO_MMAP
+ifndef NO_UNIX_SOCKETS
+	PROGRAM_OBJS += index-helper.o
+endif
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
@@ -2159,6 +2165,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
+	@echo NO_MMAP=\''$(subst ','\'',$(subst ','\'',$(NO_MMAP)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index 4180e2b..2d7af6f 100644
--- a/cache.h
+++ b/cache.h
@@ -334,6 +334,17 @@ struct index_state {
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
 		 keep_mmap : 1,
+
+		 /*
+		  * This index came from index-helper originally.
+		  */
+		 from_shm : 1,
+		 /*
+		  * We're moving this index to shared memory, so we
+		  * don't need to poke the daemon to request updates
+		  * on it.
+		  */
+		 to_shm : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/command-list.txt b/command-list.txt
index 2a94137..d9ad7d1 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -69,6 +69,7 @@ git-http-backend                        synchingrepositories
 git-http-fetch                          synchelpers
 git-http-push                           synchelpers
 git-imap-send                           foreignscminterface
+git-index-helper                        purehelpers
 git-index-pack                          plumbingmanipulators
 git-init                                mainporcelain           init
 git-instaweb                            ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index e3918c8..98794d2 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -684,6 +684,7 @@ __git_list_porcelain_commands ()
 		for-each-ref)     : plumbing;;
 		hash-object)      : plumbing;;
 		http-*)           : transport;;
+		index-helper)     : plumbing;;
 		index-pack)       : plumbing;;
 		init-db)          : deprecated;;
 		local-fetch)      : plumbing;;
diff --git a/git-compat-util.h b/git-compat-util.h
index 4743954..41a0efa 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -513,6 +513,7 @@ static inline int ends_with(const char *str, const char *suffix)
 #define PROT_READ 1
 #define PROT_WRITE 2
 #define MAP_PRIVATE 1
+#define MAP_SHARED 2
 #endif
 
 #define mmap git_mmap
diff --git a/index-helper.c b/index-helper.c
new file mode 100644
index 0000000..baf5bc6
--- /dev/null
+++ b/index-helper.c
@@ -0,0 +1,292 @@
+#include "cache.h"
+#include "parse-options.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "exec_cmd.h"
+#include "split-index.h"
+#include "lockfile.h"
+#include "cache.h"
+#include "unix-socket.h"
+#include "pkt-line.h"
+
+struct shm {
+	unsigned char sha1[20];
+	void *shm;
+	size_t size;
+};
+
+static struct shm shm_index;
+static struct shm shm_base_index;
+
+static void release_index_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	unlink(git_path("shm-index-%s", sha1_to_hex(is->sha1)));
+	is->shm = NULL;
+}
+
+static void cleanup_shm(void)
+{
+	release_index_shm(&shm_index);
+	release_index_shm(&shm_base_index);
+}
+
+static void cleanup(void)
+{
+	unlink(git_path("index-helper.sock"));
+	cleanup_shm();
+}
+
+static void cleanup_on_signal(int sig)
+{
+	/*
+	 * We ignore sigpipes -- that's just a client suddenly dying.
+	 */
+	if (sig == SIGPIPE)
+		return;
+	cleanup();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static int shared_mmap_create(size_t size, void **new_mmap, const char *path)
+{
+	int fd = -1;
+	int ret = -1;
+
+	fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
+
+	if (fd < 0) {
+		if (errno != EEXIST)
+			goto done;
+
+		unlink(path);
+		fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
+	}
+
+	if (fd < 0)
+		goto done;
+
+	if (ftruncate(fd, size))
+		goto done;
+
+	*new_mmap = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
+			 fd, 0);
+
+	if (*new_mmap == MAP_FAILED) {
+		*new_mmap = NULL;
+		goto done;
+	}
+	madvise(new_mmap, size, MADV_WILLNEED);
+
+	ret = 0;
+done:
+	if (fd >= 0)
+		close(fd);
+	return ret;
+}
+
+static void share_index(struct index_state *istate, struct shm *is)
+{
+	void *new_mmap;
+	if (istate->mmap_size <= 20 ||
+	    hashcmp(istate->sha1,
+		    (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
+	    !hashcmp(istate->sha1, is->sha1))
+		/* Nothing to do */
+		return;
+	if (shared_mmap_create(istate->mmap_size, &new_mmap,
+			       git_path("shm-index-%s",
+					sha1_to_hex(istate->sha1))) < 0) {
+		die("Failed to create shm-index file");
+	}
+
+
+	release_index_shm(is);
+	is->size = istate->mmap_size;
+	is->shm = new_mmap;
+	hashcpy(is->sha1, istate->sha1);
+
+	memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
+
+	/*
+	 * The trailing hash must be written last after everything is
+	 * written. It's the indication that the shared memory is now
+	 * ready.
+	 * The memory barrier here matches read-cache.c:try_shm.
+	 */
+	__sync_synchronize();
+
+	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
+}
+
+static void share_the_index(void)
+{
+	if (the_index.split_index && the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_base_index);
+	share_index(&the_index, &shm_index);
+	discard_index(&the_index);
+}
+
+static void set_socket_blocking_flag(int fd, int make_nonblocking)
+{
+	int flags;
+
+	flags = fcntl(fd, F_GETFL, NULL);
+
+	if (flags < 0)
+		die(_("fcntl failed"));
+
+	if (make_nonblocking)
+		flags |= O_NONBLOCK;
+	else
+		flags &= ~O_NONBLOCK;
+
+	if (fcntl(fd, F_SETFL, flags) < 0)
+		die(_("fcntl failed"));
+}
+
+static void refresh(void)
+{
+	discard_index(&the_index);
+	the_index.keep_mmap = 1;
+	the_index.to_shm    = 1;
+	if (read_cache() < 0)
+		die(_("could not read index"));
+	share_the_index();
+}
+
+#ifndef NO_MMAP
+
+static void loop(int fd, int idle_in_seconds)
+{
+	assert(idle_in_seconds < INT_MAX / 1000);
+
+	if (idle_in_seconds == 0)
+		idle_in_seconds = -1;
+
+	while (1) {
+		struct pollfd pollfd;
+		int result, client_fd;
+		int flags;
+		char buf[4096];
+		int bytes_read;
+
+		/* Wait for a request */
+		pollfd.fd = fd;
+		pollfd.events = POLLIN;
+		result = poll(&pollfd, 1, idle_in_seconds * 1000);
+		if (result < 0) {
+			if (errno == EINTR)
+				/*
+				 * This can lead to an overlong keepalive,
+				 * but that is better than a premature exit.
+				 */
+				continue;
+			die_errno(_("poll() failed"));
+		} else if (result == 0)
+			/* timeout */
+			break;
+
+		client_fd = accept(fd, NULL, NULL);
+		if (client_fd < 0)
+			/*
+			 * An error here is unlikely -- it probably
+			 * indicates that the connecting process has
+			 * already dropped the connection.
+			 */
+			continue;
+
+		/*
+		 * Our connection to the client is blocking since a client
+		 * can always be killed by SIGINT or similar.
+		 */
+		set_socket_blocking_flag(client_fd, 0);
+
+		flags = PACKET_READ_GENTLE_ON_EOF | PACKET_READ_CHOMP_NEWLINE;
+		bytes_read = packet_read(client_fd, NULL, NULL, buf,
+					 sizeof(buf), flags);
+
+		if (bytes_read > 0) {
+			/* ensure string termination */
+			buf[bytes_read] = 0;
+			if (!strcmp(buf, "refresh")) {
+				refresh();
+			} else if (!strcmp(buf, "poke")) {
+				/*
+				 * Just a poke to keep us
+				 * alive, nothing to do.
+				 */
+			} else {
+				warning("BUG: Bogus command %s", buf);
+			}
+		} else {
+			/*
+			 * No command from client.  Probably it's just
+			 * a liveness check or client error.  Just
+			 * close up.
+			 */
+		}
+		close(client_fd);
+	}
+
+	close(fd);
+}
+
+#else
+
+static void loop(int fd, int idle_in_seconds)
+{
+	die(_("index-helper is not supported on this platform"));
+}
+
+#endif
+
+static const char * const usage_text[] = {
+	N_("git index-helper [options]"),
+	NULL
+};
+
+int main(int argc, char **argv)
+{
+	const char *prefix;
+	int idle_in_seconds = 600;
+	int fd;
+	struct strbuf socket_path = STRBUF_INIT;
+	struct option options[] = {
+		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
+			    N_("exit if not used after some seconds")),
+		OPT_END()
+	};
+
+	git_extract_argv0_path(argv[0]);
+	git_setup_gettext();
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(usage_text, options);
+
+	prefix = setup_git_directory();
+	if (parse_options(argc, (const char **)argv, prefix,
+			  options, usage_text, 0))
+		die(_("too many arguments"));
+
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
+	strbuf_git_path(&socket_path, "index-helper.sock");
+
+	fd = unix_stream_listen(socket_path.buf);
+	if (fd < 0)
+		die_errno(_("could not set up index-helper socket"));
+
+	if (idle_in_seconds >= INT_MAX / 1000)
+		die("--exit-after value must be less than %d seconds",
+		    INT_MAX / 1000);
+
+	loop(fd, idle_in_seconds);
+
+	close(fd);
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 3cb0ec3..10d5465 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,9 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "unix-socket.h"
+#include "pkt-line.h"
+#include "sigchain.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1541,6 +1544,96 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int poke_daemon(struct index_state *istate,
+		       const struct stat *st, int refresh_cache)
+{
+	int fd;
+	int ret = 0;
+	const char *socket_path;
+
+	/* if this is from index-helper, do not poke itself (recursively) */
+	if (istate->to_shm)
+		return 0;
+
+	socket_path = git_path("index-helper.sock");
+
+	fd = unix_stream_connect(socket_path);
+	sigchain_push(SIGPIPE, SIG_IGN);
+	if (refresh_cache) {
+		packet_write_gently(fd, "refresh");
+	} else {
+		packet_write_gently(fd, "poke");
+	}
+	packet_flush_gently(fd);
+
+	close(fd);
+	sigchain_pop(SIGPIPE);
+	return ret;
+}
+
+static int is_main_index(struct index_state *istate)
+{
+	return istate == &the_index ||
+		(the_index.split_index &&
+		 istate == the_index.split_index->base);
+}
+
+/*
+ * Try to open and verify a cached shm index if available. Return 0 if
+ * succeeds (istate->mmap and istate->mmap_size are updated). Return
+ * negative otherwise.
+ */
+static int try_shm(struct index_state *istate)
+{
+	void *new_mmap = NULL;
+	size_t old_size = istate->mmap_size;
+	ssize_t new_size;
+	const unsigned char *sha1;
+	struct stat st;
+	int fd = -1;
+
+	if (!is_main_index(istate) ||
+	    old_size <= 20 ||
+	    stat(git_path("index-helper.sock"), &st))
+		return -1;
+	if (poke_daemon(istate, &st, 0))
+		return -1;
+	sha1 = (unsigned char *)istate->mmap + old_size - 20;
+
+	fd = open(git_path("shm-index-%s", sha1_to_hex(sha1)), O_RDONLY);
+	if (fd < 0)
+		goto fail;
+
+	if (fstat(fd, &st))
+		goto fail;
+
+	new_size = st.st_size;
+	new_mmap = mmap(NULL, new_size, PROT_READ, MAP_SHARED, fd, 0);
+	if (new_size <= 20 ||
+	    hashcmp((unsigned char *)istate->mmap + old_size - 20,
+		    (unsigned char *)new_mmap + new_size - 20)) {
+		if (new_mmap)
+			munmap(new_mmap, new_size);
+		goto fail;
+	}
+
+	/* The memory barrier here matches index-helper.c:share_index. */
+	__sync_synchronize();
+
+	munmap(istate->mmap, istate->mmap_size);
+	istate->mmap = new_mmap;
+	istate->mmap_size = new_size;
+	istate->from_shm = 1;
+	close(fd);
+	return 0;
+
+fail:
+	if (fd >= 0)
+		close(fd);
+	poke_daemon(istate, &st, 1);
+	return -1;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1555,6 +1648,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	istate->from_shm = 0;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
 	fd = open(path, O_RDONLY);
@@ -1574,15 +1668,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
-	if (istate->keep_mmap) {
-		istate->mmap = mmap;
-		istate->mmap_size = mmap_size;
-	}
 	close(fd);
 
-	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	istate->mmap = mmap;
+	istate->mmap_size = mmap_size;
+	if (try_shm(istate) &&
+	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
 		goto unmap;
+	hdr = mmap = istate->mmap;
+	mmap_size = istate->mmap_size;
+	if (!istate->keep_mmap)
+		istate->mmap = NULL;
 
 	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
 	istate->version = ntohl(hdr->hdr_version);
@@ -1662,6 +1758,8 @@ int read_index_from(struct index_state *istate, const char *path)
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 	split_index->base->keep_mmap = istate->keep_mmap;
+	split_index->base->to_shm    = istate->to_shm;
+	split_index->base->from_shm  = istate->from_shm;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1712,6 +1810,8 @@ int discard_index(struct index_state *istate)
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
+	istate->from_shm = 0;
+	istate->to_shm   = 0;
 	return 0;
 }
 
@@ -2138,9 +2238,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
 	       (COMMIT_LOCK | CLOSE_LOCK));
-	if (flags & COMMIT_LOCK)
-		return commit_locked_index(lock);
-	else if (flags & CLOSE_LOCK)
+	if (flags & COMMIT_LOCK) {
+		struct stat st;
+		ret = commit_locked_index(lock);
+		if (!ret && is_main_index(istate) &&
+		    !stat(git_path("index-helper.sock"), &st))
+			poke_daemon(istate, &st, 1);
+		return ret;
+	} else if (flags & CLOSE_LOCK)
 		return close_lock_file(lock);
 	else
 		return ret;
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
new file mode 100755
index 0000000..114c112
--- /dev/null
+++ b/t/t7900-index-helper.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+#
+# Copyright (c) 2016, Twitter, Inc
+#
+
+test_description='git-index-helper
+
+Testing git index-helper
+'
+
+. ./test-lib.sh
+
+test -n "$NO_MMAP" && {
+	skip_all='skipping index-helper tests: no mmap'
+	test_done
+}
+
+test_expect_success 'index-helper smoke test' '
+	git index-helper --exit-after 1 &&
+	test_path_is_missing .git/index-helper.sock
+'
+
+test_done
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 05/20] index-helper: add --strict
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (2 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 06/20] daemonize(): set a flag before exiting the main process David Turner
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

There are "holes" in the index-helper approach because the shared
memory is not verified again by git. If $USER is compromised, shared
memory could be modified. But anyone who could do this could already
modify $GIT_DIR/index. A more realistic risk is some bugs in
index-helper that produce corrupt shared memory. --strict is added to
avoid that.

Strictly speaking there's still a very small gap where corrupt shared
memory could still be read by git: after we write the trailing SHA-1 in
the shared memory (thus signaling "this shm is ready") and before
verify_shm() detects an error.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt |  9 +++++++
 cache.h                            |  1 +
 index-helper.c                     | 48 ++++++++++++++++++++++++++++++++++++++
 read-cache.c                       |  9 ++++---
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index fa6e347..ca5a9de 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -25,6 +25,15 @@ OPTIONS
 	Exit if the cached index is not accessed for `<n>`
 	seconds. Specify 0 to wait forever. Default is 600.
 
+--strict::
+--no-strict::
+	Strict mode makes index-helper verify the shared memory after
+	it's created. If the result does not match what's read from
+	$GIT_DIR/index, the shared memory is destroyed. This makes
+	index-helper take more than double the amount of time required
+	for reading an index, but because it will happen in the
+	background, it's not noticable. `--strict` is enabled by default.
+
 NOTES
 -----
 
diff --git a/cache.h b/cache.h
index 2d7af6f..6cb0d02 100644
--- a/cache.h
+++ b/cache.h
@@ -345,6 +345,7 @@ struct index_state {
 		  * on it.
 		  */
 		 to_shm : 1,
+		 always_verify_trailing_sha1 : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/index-helper.c b/index-helper.c
index baf5bc6..d1e9287 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -17,6 +17,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static int to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -122,11 +123,56 @@ static void share_index(struct index_state *istate, struct shm *is)
 	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
 }
 
+static int verify_shm(void)
+{
+	int i;
+	struct index_state istate;
+	memset(&istate, 0, sizeof(istate));
+	istate.always_verify_trailing_sha1 = 1;
+	istate.to_shm = 1;
+	i = read_index(&istate);
+	if (i != the_index.cache_nr)
+		goto done;
+	for (; i < the_index.cache_nr; i++) {
+		struct cache_entry *base, *ce;
+		/* namelen is checked separately */
+		const unsigned int ondisk_flags =
+			CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
+		unsigned int ce_flags, base_flags, ret;
+		base = the_index.cache[i];
+		ce = istate.cache[i];
+		if (ce->ce_namelen != base->ce_namelen ||
+		    strcmp(ce->name, base->name)) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+		ce_flags = ce->ce_flags;
+		base_flags = base->ce_flags;
+		/* only on-disk flags matter */
+		ce->ce_flags   &= ondisk_flags;
+		base->ce_flags &= ondisk_flags;
+		ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
+			     offsetof(struct cache_entry, name) -
+			     offsetof(struct cache_entry, ce_stat_data));
+		ce->ce_flags = ce_flags;
+		base->ce_flags = base_flags;
+		if (ret) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+	}
+done:
+	discard_index(&istate);
+	return i == the_index.cache_nr;
+}
+
 static void share_the_index(void)
 {
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
+	if (to_verify && !verify_shm())
+		cleanup_shm();
 	discard_index(&the_index);
 }
 
@@ -258,6 +304,8 @@ int main(int argc, char **argv)
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
 			    N_("exit if not used after some seconds")),
+		OPT_BOOL(0, "strict", &to_verify,
+			 N_("verify shared memory after creating")),
 		OPT_END()
 	};
 
diff --git a/read-cache.c b/read-cache.c
index 10d5465..befc499 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1672,9 +1672,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 
 	istate->mmap = mmap;
 	istate->mmap_size = mmap_size;
-	if (try_shm(istate) &&
-	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
-		goto unmap;
+	if (try_shm(istate)) {
+		if (verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
+	} else if (istate->always_verify_trailing_sha1 &&
+		   verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
 	hdr = mmap = istate->mmap;
 	mmap_size = istate->mmap_size;
 	if (!istate->keep_mmap)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 06/20] daemonize(): set a flag before exiting the main process
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (3 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 05/20] index-helper: add --strict David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 07/20] index-helper: add --detach David Turner
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

This allows signal handlers and atexit functions to realize this
situation and not clean up.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/gc.c | 2 +-
 cache.h      | 2 +-
 daemon.c     | 2 +-
 setup.c      | 4 +++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c583aad..37180de 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -385,7 +385,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			 * failure to daemonize is ok, we'll continue
 			 * in foreground
 			 */
-			daemonized = !daemonize();
+			daemonized = !daemonize(NULL);
 		}
 	} else
 		add_repack_all_option();
diff --git a/cache.h b/cache.h
index 6cb0d02..4c1529a 100644
--- a/cache.h
+++ b/cache.h
@@ -539,7 +539,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int);
 extern int init_db(const char *template_dir, unsigned int flags);
 
 extern void sanitize_stdfds(void);
-extern int daemonize(void);
+extern int daemonize(int *);
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
diff --git a/daemon.c b/daemon.c
index 8d45c33..a5cf954 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1365,7 +1365,7 @@ int main(int argc, char **argv)
 		return execute();
 
 	if (detach) {
-		if (daemonize())
+		if (daemonize(NULL))
 			die("--detach not supported on this platform");
 	} else
 		sanitize_stdfds();
diff --git a/setup.c b/setup.c
index de1a2a7..9adf13f 100644
--- a/setup.c
+++ b/setup.c
@@ -1017,7 +1017,7 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
-int daemonize(void)
+int daemonize(int *daemonized)
 {
 #ifdef NO_POSIX_GOODIES
 	errno = ENOSYS;
@@ -1029,6 +1029,8 @@ int daemonize(void)
 		case -1:
 			die_errno("fork failed");
 		default:
+			if (daemonized)
+				*daemonized = 1;
 			exit(0);
 	}
 	if (setsid() == -1)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 07/20] index-helper: add --detach
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (4 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 06/20] daemonize(): set a flag before exiting the main process David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 08/20] index-helper: log warnings David Turner
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

We detach after creating and opening the socket, because otherwise
we might return control to the shell before index-helper is ready to
accept commands.  This might lead to flaky tests.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt | 3 +++
 index-helper.c                     | 9 +++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index ca5a9de..228b1df 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -34,6 +34,9 @@ OPTIONS
 	for reading an index, but because it will happen in the
 	background, it's not noticable. `--strict` is enabled by default.
 
+--detach::
+	Detach from the shell.
+
 NOTES
 -----
 
diff --git a/index-helper.c b/index-helper.c
index d1e9287..7cc8166 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -17,7 +17,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
-static int to_verify = 1;
+static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -36,6 +36,8 @@ static void cleanup_shm(void)
 
 static void cleanup(void)
 {
+	if (daemonized)
+		return;
 	unlink(git_path("index-helper.sock"));
 	cleanup_shm();
 }
@@ -298,7 +300,7 @@ static const char * const usage_text[] = {
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600;
+	int idle_in_seconds = 600, detach = 0;
 	int fd;
 	struct strbuf socket_path = STRBUF_INIT;
 	struct option options[] = {
@@ -306,6 +308,7 @@ int main(int argc, char **argv)
 			    N_("exit if not used after some seconds")),
 		OPT_BOOL(0, "strict", &to_verify,
 			 N_("verify shared memory after creating")),
+		OPT_BOOL(0, "detach", &detach, N_("detach the process")),
 		OPT_END()
 	};
 
@@ -333,6 +336,8 @@ int main(int argc, char **argv)
 		die("--exit-after value must be less than %d seconds",
 		    INT_MAX / 1000);
 
+	if (detach && daemonize(&daemonized))
+		die_errno(_("unable to detach"));
 	loop(fd, idle_in_seconds);
 
 	close(fd);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 08/20] index-helper: log warnings
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (5 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 07/20] index-helper: add --detach David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 09/20] read-cache: add watchman 'WAMA' extension David Turner
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

Instead of writing warnings to stderr, write them to a log.  Later, we'll
probably be daemonized, so writing to stderr will be pointless.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt |  3 +++
 index-helper.c                     | 12 +++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 228b1df..3d2c829 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -57,6 +57,9 @@ command. The following commands are used to control the daemon:
 
 All commands and replies are in pkt-line format.
 
+In the event of an error, messages may be written to
+$GIT_DIR/index-helper.log.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/index-helper.c b/index-helper.c
index 7cc8166..142af7a 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -103,7 +103,8 @@ static void share_index(struct index_state *istate, struct shm *is)
 	if (shared_mmap_create(istate->mmap_size, &new_mmap,
 			       git_path("shm-index-%s",
 					sha1_to_hex(istate->sha1))) < 0) {
-		die("Failed to create shm-index file");
+		warning("Failed to create shm-index file");
+		exit(1);
 	}
 
 
@@ -336,8 +337,17 @@ int main(int argc, char **argv)
 		die("--exit-after value must be less than %d seconds",
 		    INT_MAX / 1000);
 
+	if (detach) {
+		FILE *fp = fopen(git_path("index-helper.log"), "a");
+		if (!fp)
+			die("failed to open %s for writing",
+			    git_path("index-helper.log"));
+		set_error_handle(fp);
+	}
+
 	if (detach && daemonize(&daemonized))
 		die_errno(_("unable to detach"));
+
 	loop(fd, idle_in_seconds);
 
 	close(fd);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 09/20] read-cache: add watchman 'WAMA' extension
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (6 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 08/20] index-helper: log warnings David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 10/20] watchman: support watchman to reduce index refresh cost David Turner
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The extension contains a bitmap, one bit for each entry in the
index. If the n-th bit is zero, the n-th entry is considered
unchanged, we can ce_mark_uptodate() it without refreshing. If the bit
is non-zero and we found out the corresponding file is clean after
refresh, we can clear the bit.

In addition, there's a list of directories in the untracked-cache
to invalidate (because they have new or modified entries).

The 'skipping refresh' bit is not in this patch yet as we would need
watchman. More details in later patches.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/technical/index-format.txt |  22 ++++++
 cache.h                                  |   4 +
 dir.h                                    |   3 +
 read-cache.c                             | 127 ++++++++++++++++++++++++++++++-
 4 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c..86ed3a6 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,25 @@ The remaining data of each directory block is grouped by type:
     in the previous ewah bitmap.
 
   - One NUL.
+
+== Watchman cache
+
+  The watchman cache tracks files for which watchman has told us about
+  changes.  The signature for this extension is { 'W', 'A', 'M', 'A' }.
+
+  The extension starts with
+
+  - A NUL-terminated string: the watchman vector clock at the last
+    time we heard from watchman.
+
+  - 32-bit bitmap size: the size of the CE_WATCHMAN_DIRTY bitmap
+
+  - 32-bit untracked cache entry count: the number of dirty untracked
+    cache entries
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is CE_WATCHMAN_DIRTY.
+
+  - a list of N NUL-terminated strings.  Each is a directory that should
+    be marked dirty in the untracked cache because watchman has told us
+    about an update to a file in it.
diff --git a/cache.h b/cache.h
index 4c1529a..f10992d 100644
--- a/cache.h
+++ b/cache.h
@@ -182,6 +182,8 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
+#define CE_WATCHMAN_DIRTY  (0x0001)
+
 /*
  * Range 0xFFFF0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -320,6 +322,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define WATCHMAN_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -353,6 +356,7 @@ struct index_state {
 	struct untracked_cache *untracked;
 	void *mmap;
 	size_t mmap_size;
+	char *last_update;
 };
 
 extern struct index_state the_index;
diff --git a/dir.h b/dir.h
index cd46f30..896b64a 100644
--- a/dir.h
+++ b/dir.h
@@ -139,6 +139,9 @@ struct untracked_cache {
 	int gitignore_invalidated;
 	int dir_invalidated;
 	int dir_opened;
+	/* watchman invalidation data */
+	unsigned int use_watchman : 1;
+	struct string_list invalid_untracked;
 };
 
 struct dir_struct {
diff --git a/read-cache.c b/read-cache.c
index befc499..e0fc634 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -21,6 +21,7 @@
 #include "unix-socket.h"
 #include "pkt-line.h"
 #include "sigchain.h"
+#include "ewah/ewok.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -43,11 +44,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_WATCHMAN 0x57414D41	  /* "WAMA" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | \
+		 WATCHMAN_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -1222,8 +1225,23 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			if (ce->ce_flags & CE_WATCHMAN_DIRTY) {
+				/* The rule is index-helper sets
+				 * CE_WATCHMAN_DIRTY when a file is changed,
+				 * then git clears it when it has verified
+				 * that the in-index entry now matches the
+				 * worktree version. index-helper does not
+				 * clear the bit and git does not set it. We
+				 * have verified here that stat info (and even
+				 * content) matches, so it's safe to clear
+				 * CE_WATCHMAN_DIRTY now.
+				 */
+				ce->ce_flags          &= ~CE_WATCHMAN_DIRTY;
+				istate->cache_changed |= WATCHMAN_CHANGED;
+			}
 			continue;
+		}
 		if (!new) {
 			const char *fmt;
 
@@ -1367,6 +1385,94 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static void mark_no_watchman(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	assert(pos < istate->cache_nr);
+	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+}
+
+static int read_watchman_ext(struct index_state *istate, const void *data,
+			     unsigned long sz)
+{
+	struct ewah_bitmap *bitmap;
+	int ret, len;
+	uint32_t bitmap_size;
+	uint32_t untracked_nr;
+
+	if (memchr(data, 0, sz) == NULL)
+		return error("invalid extension");
+
+	len = strlen(data) + 1;
+	memcpy(&bitmap_size, (const char *)data + len, 4);
+	memcpy(&untracked_nr, (const char *)data + len + 4, 4);
+	untracked_nr = ntohl(untracked_nr);
+	bitmap_size = ntohl(bitmap_size);
+
+	bitmap = ewah_new();
+	ret = ewah_read_mmap(bitmap, (const char *)data + len + 8, bitmap_size);
+	if (ret != bitmap_size) {
+		ewah_free(bitmap);
+		return error("failed to parse ewah bitmap reading watchman index extension");
+	}
+	istate->last_update = xstrdup(data);
+	ewah_each_bit(bitmap, mark_no_watchman, istate);
+	ewah_free(bitmap);
+
+	/*
+	 * TODO: update the untracked cache from the untracked data in this
+	 * extension.
+	 */
+	return 0;
+}
+
+static int untracked_entry_append(struct string_list_item *item, void *sbvoid)
+{
+	struct strbuf *sb = sbvoid;
+
+	strbuf_addstr(sb, item->string);
+	strbuf_addch(sb, 0);
+	return 0;
+}
+
+void write_watchman_ext(struct strbuf *sb, struct index_state *istate)
+{
+	struct ewah_bitmap *bitmap;
+	int i;
+	int ewah_start;
+	int ewah_size = 0;
+	int fixup = 0;
+
+	strbuf_add(sb, istate->last_update, strlen(istate->last_update) + 1);
+	fixup = sb->len;
+	strbuf_add(sb, &ewah_size, 4); /* we'll fix this up later */
+	if (istate->untracked) {
+		uint32_t nr = istate->untracked->invalid_untracked.nr;
+		nr = htonl(nr);
+		strbuf_add(sb, &nr, 4);
+	} else {
+		/* zero */
+		strbuf_add(sb, &ewah_size, 4);
+	}
+
+	ewah_start = sb->len;
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHMAN_DIRTY)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+
+	/* fix up size field */
+	ewah_size = sb->len - ewah_start;
+	ewah_size = htonl(ewah_size);
+	memcpy(sb->buf + fixup, &ewah_size, 4);
+
+	if (istate->untracked)
+		for_each_string_list(&istate->untracked->invalid_untracked,
+				     untracked_entry_append, sb);
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1384,6 +1490,11 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+
+	case CACHE_EXT_WATCHMAN:
+		read_watchman_ext(istate, data, sz);
+		break;
+
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1815,6 +1926,8 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 	istate->from_shm = 0;
 	istate->to_shm   = 0;
+	free(istate->last_update);
+	istate->last_update = NULL;
 	return 0;
 }
 
@@ -2212,6 +2325,16 @@ static int do_write_index(struct index_state *istate, int newfd,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_watchman_ext(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCHMAN, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 10/20] watchman: support watchman to reduce index refresh cost
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (7 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 09/20] read-cache: add watchman 'WAMA' extension David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat() David Turner
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The previous patch has the logic to clear bits in 'WAMA' bitmap. This
patch has logic to set bits as told by watchman. The missing bit,
_using_ these bits, are not here yet.

A lot of this code is written by David Turner originally, mostly from
[1]. I'm just copying and polishing it a bit.

[1] http://article.gmane.org/gmane.comp.version-control.git/248006

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile           |  12 +++++
 cache.h            |   1 +
 config.c           |   5 ++
 configure.ac       |   8 ++++
 environment.c      |   2 +
 watchman-support.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 watchman-support.h |   7 +++
 7 files changed, 170 insertions(+)
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

diff --git a/Makefile b/Makefile
index 2b2ac05..7f4ab41 100644
--- a/Makefile
+++ b/Makefile
@@ -451,6 +451,7 @@ MSGFMT = msgfmt
 CURL_CONFIG = curl-config
 PTHREAD_LIBS = -lpthread
 PTHREAD_CFLAGS =
+WATCHMAN_LIBS =
 GCOV = gcov
 
 export TCL_PATH TCLTK_PATH
@@ -1416,6 +1417,13 @@ else
 	LIB_OBJS += thread-utils.o
 endif
 
+ifdef USE_WATCHMAN
+	LIB_H += watchman-support.h
+	LIB_OBJS += watchman-support.o
+	WATCHMAN_LIBS = -lwatchman
+	BASIC_CFLAGS += -DUSE_WATCHMAN
+endif
+
 ifdef HAVE_PATHS_H
 	BASIC_CFLAGS += -DHAVE_PATHS_H
 endif
@@ -2027,6 +2035,9 @@ git-remote-testsvn$X: remote-testsvn.o GIT-LDFLAGS $(GITLIBS) $(VCSSVN_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) \
 	$(VCSSVN_LIB)
 
+git-index-helper$X: index-helper.o GIT-LDFLAGS $(GITLIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) $(WATCHMAN_LIBS)
+
 $(REMOTE_CURL_ALIASES): $(REMOTE_CURL_PRIMARY)
 	$(QUIET_LNCP)$(RM) $@ && \
 	ln $< $@ 2>/dev/null || \
@@ -2166,6 +2177,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
 	@echo NO_MMAP=\''$(subst ','\'',$(subst ','\'',$(NO_MMAP)))'\' >>$@+
+	@echo USE_WATCHMAN=\''$(subst ','\'',$(subst ','\'',$(USE_WATCHMAN)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index f10992d..452aea2 100644
--- a/cache.h
+++ b/cache.h
@@ -696,6 +696,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_watchman_sync_timeout;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 9ba40bc..e6dc141 100644
--- a/config.c
+++ b/config.c
@@ -882,6 +882,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.watchmansynctimeout")) {
+		core_watchman_sync_timeout = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/configure.ac b/configure.ac
index 0cd9f46..334d63b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1099,6 +1099,14 @@ AC_COMPILE_IFELSE([BSD_SYSCTL_SRC],
 	HAVE_BSD_SYSCTL=])
 GIT_CONF_SUBST([HAVE_BSD_SYSCTL])
 
+#
+# Check for watchman client library
+
+AC_CHECK_LIB([watchman], [watchman_connect],
+	[USE_WATCHMAN=YesPlease],
+	[USE_WATCHMAN=])
+GIT_CONF_SUBST([USE_WATCHMAN])
+
 ## Other checks.
 # Define USE_PIC if you need the main git objects to be built with -fPIC
 # in order to build and link perl/Git.so.  x86-64 seems to need this.
diff --git a/environment.c b/environment.c
index 6dec9d0..ad41015 100644
--- a/environment.c
+++ b/environment.c
@@ -94,6 +94,8 @@ int core_preload_index = 1;
  */
 int ignore_untracked_cache_config;
 
+int core_watchman_sync_timeout = 300;
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
diff --git a/watchman-support.c b/watchman-support.c
new file mode 100644
index 0000000..8421fe0
--- /dev/null
+++ b/watchman-support.c
@@ -0,0 +1,135 @@
+#include "cache.h"
+#include "watchman-support.h"
+#include "strbuf.h"
+#include "dir.h"
+#include <watchman.h>
+
+static struct watchman_query *make_query(const char *last_update)
+{
+	struct watchman_query *query = watchman_query();
+	watchman_query_set_fields(query, WATCHMAN_FIELD_NAME |
+					 WATCHMAN_FIELD_EXISTS |
+					 WATCHMAN_FIELD_NEWER);
+	watchman_query_set_empty_on_fresh(query, 1);
+	query->sync_timeout = core_watchman_sync_timeout;
+	if (*last_update)
+		watchman_query_set_since_oclock(query, last_update);
+	return query;
+}
+
+static struct watchman_query_result *query_watchman(
+	struct index_state *istate, struct watchman_connection *connection,
+	const char *fs_path, const char *last_update)
+{
+	struct watchman_error wm_error;
+	struct watchman_query *query;
+	struct watchman_expression *expr;
+	struct watchman_query_result *result;
+
+	query = make_query(last_update);
+	expr = watchman_true_expression();
+	result = watchman_do_query(connection, fs_path, query, expr, &wm_error);
+	watchman_free_query(query);
+	watchman_free_expression(expr);
+
+	if (!result)
+		warning("Watchman query error: %s (at %s)",
+			wm_error.message,
+			*last_update ? last_update : "the beginning");
+
+	return result;
+}
+
+static void update_index(struct index_state *istate,
+			 struct watchman_query_result *result)
+{
+	int i;
+
+	if (result->is_fresh_instance) {
+		/* let refresh clear them later */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_WATCHMAN_DIRTY;
+		goto done;
+	}
+
+	for (i = 0; i < result->nr; i++) {
+		struct watchman_stat *wm = result->stats + i;
+		int pos;
+
+		if (S_ISDIR(wm->mode) ||
+		    !strncmp(wm->name, ".git/", 5) ||
+		    strstr(wm->name, "/.git/"))
+			continue;
+
+		pos = index_name_pos(istate, wm->name, strlen(wm->name));
+		if (pos < 0) {
+			if (istate->untracked) {
+				char *name = xstrdup(wm->name);
+				char *dname = dirname(name);
+
+				/*
+				 * dirname() returns '.' for the root,
+				 * but we call it ''.
+				 */
+				if (dname[0] == '.' && dname[1] == 0)
+					string_list_append(&istate->untracked->invalid_untracked, "");
+				else
+					string_list_append(&istate->untracked->invalid_untracked,
+							   dname);
+				free(name);
+			}
+			continue;
+		}
+		/* FIXME: ignore staged entries and gitlinks too? */
+
+		istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	}
+
+done:
+	free(istate->last_update);
+	istate->last_update    = xstrdup(result->clock);
+	istate->cache_changed |= WATCHMAN_CHANGED;
+	if (istate->untracked)
+		string_list_remove_duplicates(&istate->untracked->invalid_untracked, 0);
+}
+
+int check_watchman(struct index_state *istate)
+{
+	struct watchman_error wm_error;
+	struct watchman_connection *connection;
+	struct watchman_query_result *result;
+	const char *fs_path;
+	struct timeval timeout;
+	/*
+	 * Convert core_watchman_sync_timeout, in milliseconds, to
+	 * struct timeval, in seconds and microseconds.
+	 */
+
+	fs_path = get_git_work_tree();
+	if (!fs_path)
+		return -1;
+
+	timeout.tv_sec = core_watchman_sync_timeout / 1000;
+	timeout.tv_usec = (core_watchman_sync_timeout % 1000) * 1000;
+	connection = watchman_connect(timeout, &wm_error);
+
+	if (!connection) {
+		warning("Watchman watch error: %s", wm_error.message);
+		return -1;
+	}
+
+	if (watchman_watch(connection, fs_path, &wm_error)) {
+		warning("Watchman watch error: %s", wm_error.message);
+		watchman_connection_close(connection);
+		return -1;
+	}
+
+	result = query_watchman(istate, connection, fs_path,
+				istate->last_update ? istate->last_update : "");
+	watchman_connection_close(connection);
+	if (!result)
+		return -1;
+	update_index(istate, result);
+	watchman_free_query_result(result);
+	return 0;
+}
diff --git a/watchman-support.h b/watchman-support.h
new file mode 100644
index 0000000..ee1ef2c
--- /dev/null
+++ b/watchman-support.h
@@ -0,0 +1,7 @@
+#ifndef WATCHMAN_SUPPORT_H
+#define WATCHMAN_SUPPORT_H
+
+struct index_state;
+int check_watchman(struct index_state *index);
+
+#endif /* WATCHMAN_SUPPORT_H */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (8 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 10/20] watchman: support watchman to reduce index refresh cost David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-30 17:55   ` Ben Peart
  2016-06-26  4:14 ` [PATCH v13 12/20] update-index: enable/disable watchman support David Turner
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Watchman is hidden behind index-helper. Before git tries to read the
index from shm, it notifies index-helper through the socket and waits
for index-helper to prepare a file for sharing memory (with
MAP_SHARED). index-helper then contacts watchman, updates 'WAMA'
extension and put it in a separate file and wakes git up with a reply
to git's socket.

Git uses this extension to not lstat unchanged entries. Git only
trusts the 'WAMA' extension when it's received from the separate file,
not from disk. Unmarked entries are "clean". Marked entries are dirty
from watchman point of view. If it finds out some entries are
'watchman-dirty', but are really unchanged (e.g. the file was changed,
then reverted back), then Git will clear the marking in 'WAMA' before
writing it down.

Hiding watchman behind index-helper means you need both daemons. You
can't run watchman alone. Not so good. But on the other hand, 'git'
binary is not linked to watchman/json libraries, which is good for
packaging. Core git package will run fine without watchman-related
packages. If they need watchman, they can install git-index-helper and
dependencies.

This also lets us trust anything in the untracked cache that we haven't
marked invalid, saving those stat() calls.

Another reason for tying watchman to index-helper is, when used with
untracked cache, we need to keep track of $GIT_WORK_TREE file
listing. That kind of list can be kept in index-helper.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt |  11 ++
 cache.h                            |   2 +
 dir.c                              |  23 +++-
 dir.h                              |   3 +
 index-helper.c                     | 107 +++++++++++++++--
 pkt-line.c                         |   2 +-
 read-cache.c                       | 239 ++++++++++++++++++++++++++++++++++---
 7 files changed, 361 insertions(+), 26 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 3d2c829..2982e03 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -55,6 +55,17 @@ command. The following commands are used to control the daemon:
 	Let the daemon know the index is to be read. It keeps the
 	daemon alive longer, unless `--exit-after=0` is used.
 
+"poke <pid> <capabilities>":
+	Like "poke", but replies with "OK".  If the "watchman"
+	capability is requested, and git is built with watchman, and
+	the index has the watchman extension, then index-helper
+	queries watchman.  When watchman replies, index-helper
+	prepares a shared memory object with the watchman index
+	extension, then replies "OK watchman".  If git was built
+	without watchman, "NAK watchman" is sent.  This shouldn't
+	happen very often, since a git built without watchman won't
+	request the watchman capability.
+
 All commands and replies are in pkt-line format.
 
 In the event of an error, messages may be written to
diff --git a/cache.h b/cache.h
index 452aea2..633e1dd 100644
--- a/cache.h
+++ b/cache.h
@@ -567,6 +567,7 @@ extern int daemonize(int *);
 
 /* Initialize and use the cache information */
 struct lock_file;
+extern int verify_index(const struct index_state *);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
@@ -574,6 +575,7 @@ extern int do_read_index(struct index_state *istate, const char *path,
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
+extern void write_watchman_ext(struct strbuf *sb, struct index_state *istate);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
diff --git a/dir.c b/dir.c
index a4a9d9f..850d4fb 100644
--- a/dir.c
+++ b/dir.c
@@ -592,9 +592,9 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
-						    struct untracked_cache_dir *dir,
-						    const char *name, int len)
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len)
 {
 	int first, last;
 	struct untracked_cache_dir *d;
@@ -1559,6 +1559,17 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	if (dir->untracked->use_watchman) {
+		/*
+		 * With watchman, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1572,6 +1583,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 
+skip_stat:
 	if (untracked->check_only != !!check_only) {
 		invalidate_directory(dir->untracked, untracked);
 		return 0;
@@ -2427,8 +2439,10 @@ static void free_untracked(struct untracked_cache_dir *ucd)
 
 void free_untracked_cache(struct untracked_cache *uc)
 {
-	if (uc)
+	if (uc) {
 		free_untracked(uc->root);
+		string_list_clear(&uc->invalid_untracked, 0);
+	}
 	free(uc);
 }
 
@@ -2577,6 +2591,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 		return NULL;
 
 	uc = xcalloc(1, sizeof(*uc));
+	string_list_init(&uc->invalid_untracked, 1);
 	strbuf_init(&uc->ident, ident_len);
 	strbuf_add(&uc->ident, ident, ident_len);
 	load_sha1_stat(&uc->ss_info_exclude, &ouc->info_exclude_stat,
diff --git a/dir.h b/dir.h
index 896b64a..ed16746 100644
--- a/dir.h
+++ b/dir.h
@@ -312,4 +312,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 void add_untracked_cache(struct index_state *istate);
 void remove_untracked_cache(struct index_state *istate);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len);
 #endif
diff --git a/index-helper.c b/index-helper.c
index 142af7a..bb39326 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -8,15 +8,18 @@
 #include "cache.h"
 #include "unix-socket.h"
 #include "pkt-line.h"
+#include "watchman-support.h"
 
 struct shm {
 	unsigned char sha1[20];
 	void *shm;
 	size_t size;
+	pid_t pid;
 };
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static struct shm shm_watchman;
 static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
@@ -28,10 +31,21 @@ static void release_index_shm(struct shm *is)
 	is->shm = NULL;
 }
 
+static void release_watchman_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	unlink(git_path("shm-watchman-%s-%" PRIuMAX,
+			sha1_to_hex(is->sha1), (uintmax_t)is->pid));
+	is->shm = NULL;
+}
+
 static void cleanup_shm(void)
 {
 	release_index_shm(&shm_index);
 	release_index_shm(&shm_base_index);
+	release_watchman_shm(&shm_watchman);
 }
 
 static void cleanup(void)
@@ -174,9 +188,10 @@ static void share_the_index(void)
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
-	if (to_verify && !verify_shm())
+	if (to_verify && !verify_shm()) {
 		cleanup_shm();
-	discard_index(&the_index);
+		discard_index(&the_index);
+	}
 }
 
 static void set_socket_blocking_flag(int fd, int make_nonblocking)
@@ -209,6 +224,80 @@ static void refresh(void)
 
 #ifndef NO_MMAP
 
+#ifdef USE_WATCHMAN
+static void share_watchman(struct index_state *istate,
+			   struct shm *is, pid_t pid)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *shm;
+
+	write_watchman_ext(&sb, istate);
+	if (!shared_mmap_create(sb.len + 20, &shm,
+				git_path("shm-watchman-%s-%" PRIuMAX,
+					 sha1_to_hex(istate->sha1),
+					 (uintmax_t)pid))) {
+		is->size = sb.len + 20;
+		is->shm = shm;
+		is->pid = pid;
+		hashcpy(is->sha1, istate->sha1);
+
+		memcpy(shm, sb.buf, sb.len);
+		hashcpy((unsigned char *)shm + is->size - 20, is->sha1);
+	}
+	strbuf_release(&sb);
+}
+
+
+static void prepare_with_watchman(pid_t pid)
+{
+	/*
+	 * TODO: with the help of watchman, maybe we could detect if
+	 * $GIT_DIR/index is updated.
+	 */
+	if (!verify_index(&the_index))
+		refresh();
+
+	if (check_watchman(&the_index))
+		return;
+
+	share_watchman(&the_index, &shm_watchman, pid);
+}
+
+static void prepare_index(pid_t pid)
+{
+	if (shm_index.shm == NULL)
+		refresh();
+	release_watchman_shm(&shm_watchman);
+	if (the_index.last_update)
+		prepare_with_watchman(pid);
+}
+
+#endif
+
+static void reply_to_poke(int client_fd, const char *pid_buf)
+{
+	char *capabilities;
+	struct strbuf sb = STRBUF_INIT;
+
+#ifdef USE_WATCHMAN
+	pid_t client_pid = strtoull(pid_buf, NULL, 10);
+
+	prepare_index(client_pid);
+#endif
+	capabilities = strchr(pid_buf, ' ');
+
+	if (capabilities && !strcmp(capabilities, " watchman"))
+#ifdef USE_WATCHMAN
+		packet_buf_write(&sb, "OK watchman");
+#else
+		packet_buf_write(&sb, "NAK watchman");
+#endif
+	else
+		packet_buf_write(&sb, "OK");
+	if (write_in_full(client_fd, sb.buf, sb.len) != sb.len)
+		warning(_("client write failed"));
+}
+
 static void loop(int fd, int idle_in_seconds)
 {
 	assert(idle_in_seconds < INT_MAX / 1000);
@@ -263,11 +352,15 @@ static void loop(int fd, int idle_in_seconds)
 			buf[bytes_read] = 0;
 			if (!strcmp(buf, "refresh")) {
 				refresh();
-			} else if (!strcmp(buf, "poke")) {
-				/*
-				 * Just a poke to keep us
-				 * alive, nothing to do.
-				 */
+			} else if (starts_with(buf, "poke")) {
+				if (buf[4] == ' ') {
+					reply_to_poke(client_fd, buf + 5);
+				} else {
+					/*
+					 * Just a poke to keep us
+					 * alive, nothing to do.
+					 */
+				}
 			} else {
 				warning("BUG: Bogus command %s", buf);
 			}
diff --git a/pkt-line.c b/pkt-line.c
index f964446..8aa6c09 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -94,7 +94,7 @@ void packet_flush(int fd)
 int packet_flush_gently(int fd)
 {
 	packet_trace("0000", 4, 1);
-	return write_in_full(fd, "0000", 4) != 4;
+	return write_in_full(fd, "0000", 4) == 4 ? 0 : -1;
 }
 
 void packet_buf_flush(struct strbuf *buf)
diff --git a/read-cache.c b/read-cache.c
index e0fc634..8521e85 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1385,11 +1385,75 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static struct untracked_cache_dir *find_untracked_cache_dir(
+	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
+	const char *name)
+{
+	int component_len;
+	const char *end;
+	struct untracked_cache_dir *dir;
+
+	if (!*name)
+		return ucd;
+
+	end = strchr(name, '/');
+	if (end)
+		component_len = end - name;
+	else
+		component_len = strlen(name);
+
+	dir = lookup_untracked(uc, ucd, name, component_len);
+	if (dir)
+		return find_untracked_cache_dir(uc, dir, name + component_len + 1);
+
+	return NULL;
+}
+
 static void mark_no_watchman(size_t pos, void *data)
 {
 	struct index_state *istate = data;
+	struct cache_entry *ce = istate->cache[pos];
+	struct strbuf sb = STRBUF_INIT;
+	char *c;
+	struct untracked_cache_dir *dir;
+
 	assert(pos < istate->cache_nr);
-	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	ce->ce_flags |= CE_WATCHMAN_DIRTY;
+
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	strbuf_add(&sb, ce->name, ce_namelen(ce));
+
+	for (c = sb.buf + sb.len - 1; c > sb.buf; c--) {
+		if (*c == '/') {
+			strbuf_setlen(&sb, c - sb.buf);
+			break;
+		}
+	}
+
+	if (c == sb.buf)
+		strbuf_setlen(&sb, 0);
+
+	dir = find_untracked_cache_dir(istate->untracked,
+				       istate->untracked->root, sb.buf);
+	if (dir)
+		dir->valid = 0;
+
+	strbuf_release(&sb);
+}
+
+static int mark_untracked_invalid(struct string_list_item *item, void *uc)
+{
+	struct untracked_cache *untracked = uc;
+	struct untracked_cache_dir *dir;
+
+	dir = find_untracked_cache_dir(untracked, untracked->root,
+				       item->string);
+	if (dir)
+		dir->valid = 0;
+
+	return 0;
 }
 
 static int read_watchman_ext(struct index_state *istate, const void *data,
@@ -1419,10 +1483,24 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
 	ewah_each_bit(bitmap, mark_no_watchman, istate);
 	ewah_free(bitmap);
 
-	/*
-	 * TODO: update the untracked cache from the untracked data in this
-	 * extension.
-	 */
+	if (istate->untracked && istate->untracked->root) {
+		int i;
+		const char *untracked;
+
+		untracked = (const char *)data + len + 8 + bitmap_size;
+		for (i = 0; i < untracked_nr; ++i) {
+			int len = strlen(untracked);
+			string_list_append(&istate->untracked->invalid_untracked,
+					   untracked);
+			untracked += len + 1;
+		}
+
+		for_each_string_list(&istate->untracked->invalid_untracked,
+			 mark_untracked_invalid, istate->untracked);
+
+		if (untracked_nr)
+			istate->cache_changed |= WATCHMAN_CHANGED;
+	}
 	return 0;
 }
 
@@ -1655,27 +1733,88 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+/* in ms */
+#define WATCHMAN_TIMEOUT 1000
+
+static int poke_and_wait_for_reply(int fd)
+{
+	int ret = -1;
+	struct pollfd pollfd;
+	int bytes_read;
+	char reply_buf[4096];
+	const char *requested_capabilities = "";
+
+#ifdef USE_WATCHMAN
+	requested_capabilities = "watchman";
+#endif
+
+	if (fd < 0)
+		return -1;
+
+	if (packet_write_gently(fd, "poke %d %s", getpid(), requested_capabilities))
+		return -1;
+	if (packet_flush_gently(fd))
+		return -1;
+
+	/* Now wait for a reply */
+	pollfd.fd = fd;
+	pollfd.events = POLLIN;
+	if (poll(&pollfd, 1, WATCHMAN_TIMEOUT) <= 0)
+		/* No reply or error, giving up */
+		goto done_poke;
+
+	bytes_read = packet_read(fd, NULL, NULL, reply_buf, sizeof(reply_buf),
+				 PACKET_READ_GENTLE_ON_EOF |
+				 PACKET_READ_CHOMP_NEWLINE);
+
+	if (bytes_read < 0)
+		goto done_poke;
+
+	if (!strcmp(reply_buf, "NAK watchman")
+#ifdef USE_WATCHMAN
+	    || !ends_with(reply_buf, "watchman")
+#endif
+		) {
+		warning("We requested watchman support from index-helper, but "
+			"it doesn't support it. Please use a version of git "
+			"index-helper with watchman support.");
+		goto done_poke;
+	}
+
+	if (!starts_with(reply_buf, "OK"))
+		goto done_poke;
+
+	ret = 0;
+done_poke:
+	close(fd);
+	return ret;
+}
+
 static int poke_daemon(struct index_state *istate,
 		       const struct stat *st, int refresh_cache)
 {
 	int fd;
-	int ret = 0;
-	const char *socket_path;
+	int ret = -1;
 
 	/* if this is from index-helper, do not poke itself (recursively) */
 	if (istate->to_shm)
 		return 0;
 
-	socket_path = git_path("index-helper.sock");
-
-	fd = unix_stream_connect(socket_path);
+	fd = unix_stream_connect(git_path("index-helper.sock"));
+	if (fd < 0) {
+		warning("Failed to connect to index-helper socket");
+		unlink(git_path("index-helper.sock"));
+		return -1;
+	}
 	sigchain_push(SIGPIPE, SIG_IGN);
+
 	if (refresh_cache) {
 		packet_write_gently(fd, "refresh");
+		packet_flush_gently(fd);
+		ret = 0;
 	} else {
-		packet_write_gently(fd, "poke");
+		ret = poke_and_wait_for_reply(fd);
 	}
-	packet_flush_gently(fd);
 
 	close(fd);
 	sigchain_pop(SIGPIPE);
@@ -1745,6 +1884,74 @@ fail:
 	return -1;
 }
 
+static void refresh_by_watchman(struct index_state *istate)
+{
+	void *shm = NULL;
+	int length;
+	int i;
+	struct stat st;
+	int fd = -1;
+	const char *path = git_path("shm-watchman-%s-%"PRIuMAX,
+				    sha1_to_hex(istate->sha1),
+				    (uintmax_t)getpid());
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	/*
+	 * This watchman data is just for us -- no need to keep it
+	 * around once we've got it open.
+	 */
+	unlink(path);
+
+	if (fstat(fd, &st) < 0)
+		goto done;
+
+	length = st.st_size;
+	shm = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
+
+	if (shm == MAP_FAILED)
+		goto done;
+
+	close(fd);
+	fd = -1;
+
+	if (length <= 20 ||
+	    hashcmp(istate->sha1, (unsigned char *)shm + length - 20) ||
+	    /*
+	     * No need to clear CE_WATCHMAN_DIRTY set by 'WAMA' on
+	     * disk. Watchman can only set more, not clear any, so
+	     * this is OR mask.
+	     */
+	    read_watchman_ext(istate, shm, length - 20))
+		goto done;
+
+	/*
+	 * Now that we've marked the invalid entries in the
+	 * untracked-cache itself, we can erase them from the list of
+	 * entries to be processed and mark the untracked cache for
+	 * watchman usage.
+	 */
+	if (istate->untracked) {
+		string_list_clear(&istate->untracked->invalid_untracked, 0);
+		istate->untracked->use_watchman = 1;
+	}
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_stage(ce) || (ce->ce_flags & CE_WATCHMAN_DIRTY))
+			continue;
+		ce_mark_uptodate(ce);
+	}
+done:
+	if (shm)
+		munmap(shm, length);
+
+	if (fd >= 0)
+		close(fd);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1864,7 +2071,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
 		post_read_index_from(istate);
-		return ret;
+		goto done;
 	}
 
 	if (split_index->base)
@@ -1885,6 +2092,10 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	post_read_index_from(istate);
+
+done:
+	if (ret > 0 && istate->from_shm && istate->last_update)
+		refresh_by_watchman(istate);
 	return ret;
 }
 
@@ -2186,7 +2397,7 @@ out:
 	return 0;
 }
 
-static int verify_index(const struct index_state *istate)
+int verify_index(const struct index_state *istate)
 {
 	return verify_index_from(istate, get_index_file());
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 12/20] update-index: enable/disable watchman support
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (9 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat() David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 13/20] unpack-trees: preserve index extensions David Turner
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt |  3 +++
 Documentation/git-update-index.txt |  6 ++++++
 builtin/update-index.c             | 15 +++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 2982e03..b2ca511 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -18,6 +18,9 @@ each with a submodule, you might need four index-helpers.  (In practice,
 this is only worthwhile for large indexes, so only use it if you notice
 that git status is slow).
 
+If you want the index-helper to accelerate untracked file checking,
+run git update-index --watchman before using it.
+
 OPTIONS
 -------
 
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index c6cbed1..6736487 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -19,6 +19,7 @@ SYNOPSIS
 	     [--ignore-submodules]
 	     [--[no-]split-index]
 	     [--[no-|test-|force-]untracked-cache]
+	     [--[no-]watchman]
 	     [--really-refresh] [--unresolve] [--again | -g]
 	     [--info-only] [--index-info]
 	     [-z] [--stdin] [--index-version <n>]
@@ -176,6 +177,11 @@ may not support it yet.
 --no-untracked-cache::
 	Enable or disable untracked cache feature. Please use
 	`--test-untracked-cache` before enabling it.
+
+--watchman::
+--no-watchman::
+	Enable or disable watchman support. This is, at present,
+	only useful with git index-helper.
 +
 These options take effect whatever the value of the `core.untrackedCache`
 configuration variable (see linkgit:git-config[1]). But a warning is
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 1c94ca5..55722b9 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -914,6 +914,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, nul_term_line = 0;
 	enum uc_mode untracked_cache = UC_UNSPECIFIED;
+	int use_watchman = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -1012,6 +1013,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    N_("test if the filesystem supports untracked cache"), UC_TEST),
 		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
 			    N_("enable untracked cache without testing the filesystem"), UC_FORCE),
+		OPT_BOOL(0, "watchman", &use_watchman,
+			N_("use or not use watchman to reduce refresh cost")),
 		OPT_END()
 	};
 
@@ -1149,6 +1152,18 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		die("Bug: bad untracked_cache value: %d", untracked_cache);
 	}
 
+	if (use_watchman > 0) {
+		the_index.last_update    = xstrdup("");
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+#ifndef USE_WATCHMAN
+		warning(_("git was built without watchman support, so this "
+			  "command will probably not result in any speedup."));
+#endif
+	} else if (!use_watchman) {
+		the_index.last_update    = NULL;
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 13/20] unpack-trees: preserve index extensions
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (10 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 12/20] update-index: enable/disable watchman support David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 14/20] watchman: add a config option to enable the extension David Turner
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

Make git checkout (and other unpack_tree operations) preserve the
untracked cache and watchman status. This is valuable for two reasons:

1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.

2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.

The same logic applies to the watchman state.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h                           |  1 +
 read-cache.c                      |  8 ++++++++
 t/t7063-status-untracked-cache.sh | 22 ++++++++++++++++++++++
 t/test-lib-functions.sh           |  4 ++++
 unpack-trees.c                    |  1 +
 5 files changed, 36 insertions(+)

diff --git a/cache.h b/cache.h
index 633e1dd..1b372ed 100644
--- a/cache.h
+++ b/cache.h
@@ -580,6 +580,7 @@ extern void write_watchman_ext(struct strbuf *sb, struct index_state *istate);
 #define CLOSE_LOCK		(1 << 1)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
+extern void move_index_extensions(struct index_state *dst, struct index_state *src);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern int index_dir_exists(struct index_state *istate, const char *name, int namelen);
diff --git a/read-cache.c b/read-cache.c
index 8521e85..bc3c989 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2777,3 +2777,11 @@ void stat_validity_update(struct stat_validity *sv, int fd)
 		fill_stat_data(sv->sd, &st);
 	}
 }
+
+void move_index_extensions(struct index_state *dst, struct index_state *src)
+{
+	dst->untracked = src->untracked;
+	src->untracked = NULL;
+	dst->last_update = src->last_update;
+	src->last_update = NULL;
+}
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index a971884..083516d 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -646,4 +646,26 @@ test_expect_success 'test ident field is working' '
 	test_cmp ../expect ../err
 '
 
+test_expect_success 'untracked cache survives a checkout' '
+	git commit --allow-empty -m empty &&
+	test-dump-untracked-cache >../before &&
+	test_when_finished  "git checkout master" &&
+	git checkout -b other_branch &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after &&
+	test_commit test &&
+	test-dump-untracked-cache >../before &&
+	git checkout master &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
+test_expect_success 'untracked cache survives a commit' '
+	test-dump-untracked-cache >../before &&
+	git add done/two &&
+	git commit -m commit &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
 test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8d99eb3..e974b5b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -186,6 +186,10 @@ test_commit () {
 		test_tick
 	fi &&
 	git commit $signoff -m "$1" &&
+	if [ "$(git config core.bare)" = false ]
+	then
+	    git update-index --force-untracked-cache
+	fi
 	git tag "${4:-$1}"
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 9f55cc2..fc90eb3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1215,6 +1215,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
+		move_index_extensions(&o->result, o->dst_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 14/20] watchman: add a config option to enable the extension
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (11 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 13/20] unpack-trees: preserve index extensions David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 15/20] index-helper: kill mode David Turner
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

For installations that have centrally-managed configuration, it's
easier to set a config once than to run update-index on every
repository.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                    |  1 +
 Documentation/config.txt      |  4 ++++
 Makefile                      |  1 +
 read-cache.c                  |  6 ++++++
 t/t1701-watchman-extension.sh | 37 +++++++++++++++++++++++++++++++++++++
 test-dump-watchman.c          | 16 ++++++++++++++++
 6 files changed, 65 insertions(+)
 create mode 100755 t/t1701-watchman-extension.sh
 create mode 100644 test-dump-watchman.c

diff --git a/.gitignore b/.gitignore
index b92f122..e6a5b2c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -188,6 +188,7 @@
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
+/test-dump-watchman
 /test-fake-ssh
 /test-scrap-cache-tree
 /test-genrandom
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2cd6bdd..15001ce 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1848,6 +1848,10 @@ imap::
 	The configuration variables in the 'imap' section are described
 	in linkgit:git-imap-send[1].
 
+index.addwatchmanextension::
+	Automatically add the watchman extension to the index whenever
+	it is written.
+
 index.version::
 	Specify the version with which new index files should be
 	initialized.  This does not affect existing repositories.
diff --git a/Makefile b/Makefile
index 7f4ab41..25ab7b4 100644
--- a/Makefile
+++ b/Makefile
@@ -599,6 +599,7 @@ TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
+TEST_PROGRAMS_NEED_X += test-dump-watchman
 TEST_PROGRAMS_NEED_X += test-fake-ssh
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
diff --git a/read-cache.c b/read-cache.c
index bc3c989..8141559 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2436,6 +2436,7 @@ static int do_write_index(struct index_state *istate, int newfd,
 	int entries = istate->cache_nr;
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	int watchman = 0;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2459,6 +2460,11 @@ static int do_write_index(struct index_state *istate, int newfd,
 	if (istate->version == 3 || istate->version == 2)
 		istate->version = extended ? 3 : 2;
 
+	if (!git_config_get_bool("index.addwatchmanextension", &watchman) &&
+	    watchman &&
+	    !the_index.last_update)
+		the_index.last_update = xstrdup("");
+
 	hdr_version = istate->version;
 
 	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
diff --git a/t/t1701-watchman-extension.sh b/t/t1701-watchman-extension.sh
new file mode 100755
index 0000000..71f1d46
--- /dev/null
+++ b/t/t1701-watchman-extension.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='watchman extension smoke tests'
+
+# These don't actually test watchman interaction -- just the
+# index extension
+
+. ./test-lib.sh
+
+test_expect_success 'enable watchman' '
+	test_commit a &&
+	test-dump-watchman .git/index >actual &&
+	echo "last_update: (null)" >expect &&
+	test_cmp expect actual &&
+	git update-index --watchman &&
+	test-dump-watchman .git/index >actual &&
+	echo "last_update: " >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'disable watchman' '
+	git update-index --no-watchman &&
+	test-dump-watchman .git/index >actual &&
+	echo "last_update: (null)" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'auto-enable watchman' '
+	test_config index.addwatchmanextension true &&
+	test_commit c &&
+	test-dump-watchman .git/index >actual &&
+	echo "last_update: " >expect &&
+	test_cmp expect actual
+'
+
+
+test_done
diff --git a/test-dump-watchman.c b/test-dump-watchman.c
new file mode 100644
index 0000000..0314fa5
--- /dev/null
+++ b/test-dump-watchman.c
@@ -0,0 +1,16 @@
+#include "cache.h"
+#include "ewah/ewok.h"
+
+int main(int argc, char **argv)
+{
+	do_read_index(&the_index, argv[1], 1);
+	printf("last_update: %s\n", the_index.last_update ?
+	       the_index.last_update : "(null)");
+
+	/*
+	 * For now, we just dump last_update, since it is not reasonable
+	 * to populate the extension itself in tests.
+	 */
+
+	return 0;
+}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 15/20] index-helper: kill mode
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (12 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 14/20] watchman: add a config option to enable the extension David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 16/20] index-helper: don't run if already running David Turner
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

Add a new command (and command-line arg) to allow index-helpers to
exit cleanly.

This is mainly useful for tests.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt |  3 +++
 index-helper.c                     | 31 ++++++++++++++++++++++++++++++-
 t/t7900-index-helper.sh            |  9 +++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index b2ca511..6f63a9e 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -40,6 +40,9 @@ OPTIONS
 --detach::
 	Detach from the shell.
 
+--kill::
+	Kill any running index-helper and clean up the socket
+
 NOTES
 -----
 
diff --git a/index-helper.c b/index-helper.c
index bb39326..f7a002b 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -361,6 +361,8 @@ static void loop(int fd, int idle_in_seconds)
 					 * alive, nothing to do.
 					 */
 				}
+			} else if (!strcmp(buf, "die")) {
+				break;
 			} else {
 				warning("BUG: Bogus command %s", buf);
 			}
@@ -391,10 +393,29 @@ static const char * const usage_text[] = {
 	NULL
 };
 
+static void request_kill(void)
+{
+	int fd = unix_stream_connect(git_path("index-helper.sock"));
+
+	if (fd >= 0) {
+		write_in_full(fd, "die", 4);
+		close(fd);
+	}
+
+	/*
+	 * The child will try to do this anyway, but we want to be
+	 * ready to launch a new daemon immediately after this command
+	 * returns.
+	 */
+
+	unlink(git_path("index-helper.sock"));
+	return;
+}
+
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600, detach = 0;
+	int idle_in_seconds = 600, detach = 0, kill = 0;
 	int fd;
 	struct strbuf socket_path = STRBUF_INIT;
 	struct option options[] = {
@@ -403,6 +424,7 @@ int main(int argc, char **argv)
 		OPT_BOOL(0, "strict", &to_verify,
 			 N_("verify shared memory after creating")),
 		OPT_BOOL(0, "detach", &detach, N_("detach the process")),
+		OPT_BOOL(0, "kill", &kill, N_("request that existing index helper processes exit")),
 		OPT_END()
 	};
 
@@ -417,6 +439,13 @@ int main(int argc, char **argv)
 			  options, usage_text, 0))
 		die(_("too many arguments"));
 
+	if (kill) {
+		if (detach)
+			die(_("--kill doesn't want any other options"));
+		request_kill();
+		return 0;
+	}
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 114c112..e71b5af 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -20,4 +20,13 @@ test_expect_success 'index-helper smoke test' '
 	test_path_is_missing .git/index-helper.sock
 '
 
+test_expect_success 'index-helper creates usable path file and can be killed' '
+	test_when_finished "git index-helper --kill" &&
+	test_path_is_missing .git/index-helper.sock &&
+	git index-helper --detach &&
+	test -S .git/index-helper.sock &&
+	git index-helper --kill &&
+	test_path_is_missing .git/index-helper.sock
+'
+
 test_done
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 16/20] index-helper: don't run if already running
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (13 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 15/20] index-helper: kill mode David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 17/20] index-helper: autorun mode David Turner
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 index-helper.c          | 7 +++++++
 t/t7900-index-helper.sh | 9 +++++++++
 2 files changed, 16 insertions(+)

diff --git a/index-helper.c b/index-helper.c
index f7a002b..9cfcb9e 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -446,6 +446,13 @@ int main(int argc, char **argv)
 		return 0;
 	}
 
+	/* check that no other copy is running */
+	fd = unix_stream_connect(git_path("index-helper.sock"));
+	if (fd >= 0)
+		die(_("Already running"));
+	if (errno != ECONNREFUSED && errno != ENOENT)
+		die_errno(_("Unexpected error checking socket"));
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index e71b5af..7159971 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -29,4 +29,13 @@ test_expect_success 'index-helper creates usable path file and can be killed' '
 	test_path_is_missing .git/index-helper.sock
 '
 
+test_expect_success 'index-helper will not start if already running' '
+	test_when_finished "git index-helper --kill" &&
+	git index-helper --detach &&
+	test -S .git/index-helper.sock &&
+	test_must_fail git index-helper 2>err &&
+	test -S .git/index-helper.sock &&
+	grep "Already running" err
+'
+
 test_done
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 17/20] index-helper: autorun mode
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (14 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 16/20] index-helper: don't run if already running David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 18/20] index-helper: optionally automatically run David Turner
  2016-06-26  4:14 ` [PATCH v13 19/20] trace: measure where the time is spent in the index-heavy operations David Turner
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

Soon, we'll want to automatically start index-helper, so we need
a mode that silently exits if it can't start up (either because
it's not in a git dir, or because another one is already running).

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-helper.txt |  4 ++++
 index-helper.c                     | 29 +++++++++++++++++++++++------
 t/t7900-index-helper.sh            |  8 ++++++++
 3 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 6f63a9e..eea5a38 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -43,6 +43,10 @@ OPTIONS
 --kill::
 	Kill any running index-helper and clean up the socket
 
+--autorun::
+	If index-helper is not already running, start it.  Else, do
+	nothing.
+
 NOTES
 -----
 
diff --git a/index-helper.c b/index-helper.c
index 9cfcb9e..f57fc14 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -415,8 +415,9 @@ static void request_kill(void)
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600, detach = 0, kill = 0;
+	int idle_in_seconds = 600, detach = 0, kill = 0, autorun = 0;
 	int fd;
+	int nongit;
 	struct strbuf socket_path = STRBUF_INIT;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
@@ -425,6 +426,7 @@ int main(int argc, char **argv)
 			 N_("verify shared memory after creating")),
 		OPT_BOOL(0, "detach", &detach, N_("detach the process")),
 		OPT_BOOL(0, "kill", &kill, N_("request that existing index helper processes exit")),
+		OPT_BOOL(0, "autorun", &autorun, N_("this is an automatic run of git index-helper, so certain errors can be solved by silently exiting")),
 		OPT_END()
 	};
 
@@ -434,7 +436,14 @@ int main(int argc, char **argv)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(usage_text, options);
 
-	prefix = setup_git_directory();
+	prefix = setup_git_directory_gently(&nongit);
+	if (nongit) {
+		if (autorun)
+			exit(0);
+		else
+			die(_("not a git repository"));
+	}
+
 	if (parse_options(argc, (const char **)argv, prefix,
 			  options, usage_text, 0))
 		die(_("too many arguments"));
@@ -448,10 +457,18 @@ int main(int argc, char **argv)
 
 	/* check that no other copy is running */
 	fd = unix_stream_connect(git_path("index-helper.sock"));
-	if (fd >= 0)
-		die(_("Already running"));
-	if (errno != ECONNREFUSED && errno != ENOENT)
-		die_errno(_("Unexpected error checking socket"));
+	if (fd >= 0) {
+		if (autorun)
+			exit(0);
+		else
+			die(_("Already running"));
+	}
+	if (errno != ECONNREFUSED && errno != ENOENT) {
+		if (autorun)
+			return 0;
+		else
+			die_errno(_("Unexpected error checking socket"));
+	}
 
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 7159971..aa6e5fc 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -38,4 +38,12 @@ test_expect_success 'index-helper will not start if already running' '
 	grep "Already running" err
 '
 
+test_expect_success 'index-helper is quiet with --autorun' '
+	test_when_finished "git index-helper --kill" &&
+	git index-helper --kill &&
+	git index-helper --detach &&
+	test -S .git/index-helper.sock &&
+	git index-helper --autorun
+'
+
 test_done
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 18/20] index-helper: optionally automatically run
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (15 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 17/20] index-helper: autorun mode David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26  4:14 ` [PATCH v13 19/20] trace: measure where the time is spent in the index-heavy operations David Turner
  17 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: David Turner <dturner@twopensource.com>

Introduce a new config option, indexhelper.autorun, to automatically
run git index-helper before starting up a builtin git command.  This
enables users to keep index-helper running without manual
intervention.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  4 ++++
 read-cache.c             | 35 +++++++++++++++++++++++++++++++++--
 t/t7900-index-helper.sh  | 20 ++++++++++++++++++++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 15001ce..385ea66 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1856,6 +1856,10 @@ index.version::
 	Specify the version with which new index files should be
 	initialized.  This does not affect existing repositories.
 
+indexhelper.autorun::
+	Automatically run git index-helper when any builtin git
+	command is run inside a repository.
+
 init.templateDir::
 	Specify the directory from which templates will be copied.
 	(See the "TEMPLATE DIRECTORY" section of linkgit:git-init[1].)
diff --git a/read-cache.c b/read-cache.c
index 8141559..fde778b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -22,6 +22,7 @@
 #include "pkt-line.h"
 #include "sigchain.h"
 #include "ewah/ewok.h"
+#include "run-command.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1733,6 +1734,31 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int want_auto_index_helper(void)
+{
+	int want = -1;
+	const char *value = NULL;
+	const char *key = "indexhelper.autorun";
+
+	if (git_config_key_is_valid(key) &&
+	    !git_config_get_value(key, &value)) {
+		int b = git_config_maybe_bool(key, value);
+		want = b >= 0 ? b : 0;
+	}
+	return want;
+}
+
+static void autorun_index_helper(void)
+{
+	const char *argv[] = {"git-index-helper", "--detach", "--autorun", NULL};
+	if (want_auto_index_helper() <= 0)
+		return;
+
+	if (run_command_v_opt(argv,
+			      RUN_SILENT_EXEC_FAILURE | RUN_CLEAN_ON_EXIT))
+		warning(_("You specified indexhelper.autorun, but running git-index-helper failed."));
+}
+
 /* in ms */
 #define WATCHMAN_TIMEOUT 1000
 
@@ -1804,6 +1830,7 @@ static int poke_daemon(struct index_state *istate,
 	if (fd < 0) {
 		warning("Failed to connect to index-helper socket");
 		unlink(git_path("index-helper.sock"));
+		autorun_index_helper();
 		return -1;
 	}
 	sigchain_push(SIGPIPE, SIG_IGN);
@@ -1843,9 +1870,13 @@ static int try_shm(struct index_state *istate)
 	int fd = -1;
 
 	if (!is_main_index(istate) ||
-	    old_size <= 20 ||
-	    stat(git_path("index-helper.sock"), &st))
+	    old_size <= 20)
 		return -1;
+
+	if (stat(git_path("index-helper.sock"), &st)) {
+		autorun_index_helper();
+		return -1;
+	}
 	if (poke_daemon(istate, &st, 0))
 		return -1;
 	sha1 = (unsigned char *)istate->mmap + old_size - 20;
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index aa6e5fc..3cfdf63 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -16,6 +16,9 @@ test -n "$NO_MMAP" && {
 }
 
 test_expect_success 'index-helper smoke test' '
+	# We need an existing commit so that the index exists (otherwise,
+	# the index-helper will not be autostarted)
+	test_commit x &&
 	git index-helper --exit-after 1 &&
 	test_path_is_missing .git/index-helper.sock
 '
@@ -46,4 +49,21 @@ test_expect_success 'index-helper is quiet with --autorun' '
 	git index-helper --autorun
 '
 
+test_expect_success 'index-helper autorun works' '
+	test_when_finished "git index-helper --kill" &&
+	rm -f .git/index-helper.sock &&
+	git status &&
+	test_path_is_missing .git/index-helper.sock &&
+	test_config indexhelper.autorun true &&
+	git status &&
+	test -S .git/index-helper.sock &&
+	git status 2>err &&
+	test -S .git/index-helper.sock &&
+	test_must_be_empty err &&
+	git index-helper --kill &&
+	test_config indexhelper.autorun false &&
+	git status &&
+	test_path_is_missing .git/index-helper.sock
+'
+
 test_done
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 19/20] trace: measure where the time is spent in the index-heavy operations
  2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (16 preceding siblings ...)
  2016-06-26  4:14 ` [PATCH v13 18/20] index-helper: optionally automatically run David Turner
@ 2016-06-26  4:14 ` David Turner
  2016-06-26 12:09   ` [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported Nguyễn Thái Ngọc Duy
  17 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-06-26  4:14 UTC (permalink / raw)
  To: git, pclouds, kamggg; +Cc: David Turner, Junio C Hamano

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below (92% of
time is accounted). To sum up the effort of making git scale better:

 - read cache line is being addressed by index-helper
 - preload/refresh index by watchman
 - read packed refs by lmdb backend
 - diff-index by rebuilding cache-tree
 - read directory by untracked cache and watchman
 - write index by split index
 - name hash potientially by index-helper

read-cache.c:2075         performance: 0.004058570 s: read cache .../index
preload-index.c:104       performance: 0.004419864 s: preload index
read-cache.c:1265         performance: 0.000185224 s: refresh index
refs/files-backend.c:1100 performance: 0.001935788 s: read packed refs
diff-lib.c:240            performance: 0.000144132 s: diff-files
diff-lib.c:506            performance: 0.013592000 s: diff-index
name-hash.c:128           performance: 0.000614177 s: initialize name hash
dir.c:2030                performance: 0.015814103 s: read directory
read-cache.c:2565         performance: 0.004052343 s: write index, changed mask = 2
trace.c:420               performance: 0.048365509 s: git command: './git' 'status'

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff-lib.c           |  4 ++++
 dir.c                |  2 ++
 name-hash.c          |  2 ++
 preload-index.c      |  2 ++
 read-cache.c         | 11 +++++++++++
 refs/files-backend.c |  2 ++
 6 files changed, 23 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index bc49c70..7af7f9a 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -90,6 +90,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	int diff_unmerged_stage = revs->max_count;
 	unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)
 			      ? CE_MATCH_RACY_IS_DIRTY : 0);
+	uint64_t start = getnanotime();
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
@@ -236,6 +237,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	}
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
+	trace_performance_since(start, "diff-files");
 	return 0;
 }
 
@@ -491,6 +493,7 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
+	uint64_t start = getnanotime();
 
 	ent = revs->pending.objects;
 	if (diff_cache(revs, ent->item->oid.hash, ent->name, cached))
@@ -500,6 +503,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
+	trace_performance_since(start, "diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 850d4fb..024b13c 100644
--- a/dir.c
+++ b/dir.c
@@ -1991,6 +1991,7 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 {
 	struct path_simplify *simplify;
 	struct untracked_cache_dir *untracked;
+	uint64_t start = getnanotime();
 
 	/*
 	 * Check out create_simplify()
@@ -2026,6 +2027,7 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
+	trace_performance_since(start, "read directory %.*s", len, path);
 	if (dir->untracked) {
 		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 		trace_printf_key(&trace_untracked_stats,
diff --git a/name-hash.c b/name-hash.c
index 6d9f23e..b3966d8 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -115,6 +115,7 @@ static int cache_entry_cmp(const struct cache_entry *ce1,
 static void lazy_init_name_hash(struct index_state *istate)
 {
 	int nr;
+	uint64_t start = getnanotime();
 
 	if (istate->name_hash_initialized)
 		return;
@@ -124,6 +125,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
+	trace_performance_since(start, "initialize name hash");
 }
 
 void add_name_hash(struct index_state *istate, struct cache_entry *ce)
diff --git a/preload-index.c b/preload-index.c
index c1fe3a3..7bb4809 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -72,6 +72,7 @@ static void preload_index(struct index_state *index,
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
+	uint64_t start = getnanotime();
 
 	if (!core_preload_index)
 		return;
@@ -100,6 +101,7 @@ static void preload_index(struct index_state *index,
 		if (pthread_join(p->pthread, NULL))
 			die("unable to join threaded lstat");
 	}
+	trace_performance_since(start, "preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index fde778b..ee777c1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1189,6 +1189,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
+	uint64_t start = getnanotime();
 
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
@@ -1273,6 +1274,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 
 		replace_index_entry(istate, i, new);
 	}
+	trace_performance_since(start, "refresh index");
 	return has_errors;
 }
 
@@ -2092,12 +2094,15 @@ int read_index_from(struct index_state *istate, const char *path)
 {
 	struct split_index *split_index;
 	int ret;
+	uint64_t start;
 
 	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	start = getnanotime();
 	ret = do_read_index(istate, path, 0);
+	trace_performance_since(start, "read cache %s", path);
 
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
@@ -2112,6 +2117,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index->base->keep_mmap = istate->keep_mmap;
 	split_index->base->to_shm    = istate->to_shm;
 	split_index->base->from_shm  = istate->from_shm;
+	start = getnanotime();
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -2123,6 +2129,9 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	post_read_index_from(istate);
+	trace_performance_since(start, "read cache %s",
+				git_path("sharedindex.%s",
+					 sha1_to_hex(split_index->base_sha1)));
 
 done:
 	if (ret > 0 && istate->from_shm && istate->last_update)
@@ -2468,6 +2477,7 @@ static int do_write_index(struct index_state *istate, int newfd,
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int watchman = 0;
+	uint64_t start = getnanotime();
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2588,6 +2598,7 @@ static int do_write_index(struct index_state *istate, int newfd,
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+	trace_performance_since(start, "write index, changed mask = %x", istate->cache_changed);
 	return 0;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 81f68f8..57571ce 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1048,6 +1048,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	struct ref_entry *last = NULL;
 	struct strbuf line = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
+	uint64_t start = getnanotime();
 
 	while (strbuf_getwholeline(&line, f, '\n') != EOF) {
 		unsigned char sha1[20];
@@ -1096,6 +1097,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	}
 
 	strbuf_release(&line);
+	trace_performance_since(start, "read packed refs");
 }
 
 /*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported
  2016-06-26  4:14 ` [PATCH v13 19/20] trace: measure where the time is spent in the index-heavy operations David Turner
@ 2016-06-26 12:09   ` Nguyễn Thái Ngọc Duy
  2016-06-27 12:14     ` Johannes Schindelin
  0 siblings, 1 reply; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-26 12:09 UTC (permalink / raw)
  To: git, novalis
  Cc: Junio C Hamano, kamggg, dturner, Nguyễn Thái Ngọc Duy

This keeps #ifdef at the callee instead of caller, it's less messier.

The caller in question is in read-cache.c which, unlike other
unix-socket callers so far, is always built regardless of unix socket
support. No extra handling (for ENOSYS) is needed because in this
build, index-helper does not exist, $GIT_DIR/index-helper.sock does
not exist, so no unix socket call is made by read-cache.c in the first
place.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 BTW 20/20 didn't seem to make it to the list (or my mailbox). And you
 probably want to update .mailmap with @novalis.org as main address
 too.

 Makefile      |  2 ++
 unix-socket.h | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Makefile b/Makefile
index d1309b8..d8cfc51 100644
--- a/Makefile
+++ b/Makefile
@@ -1337,6 +1337,8 @@ ifndef NO_UNIX_SOCKETS
 	LIB_OBJS += unix-socket.o
 	PROGRAM_OBJS += credential-cache.o
 	PROGRAM_OBJS += credential-cache--daemon.o
+else
+	BASIC_CFLAGS += -DNO_UNIX_SOCKETS
 endif
 
 ifdef NO_ICONV
diff --git a/unix-socket.h b/unix-socket.h
index e271aee..f1cba70 100644
--- a/unix-socket.h
+++ b/unix-socket.h
@@ -1,7 +1,25 @@
 #ifndef UNIX_SOCKET_H
 #define UNIX_SOCKET_H
 
+#ifndef NO_UNIX_SOCKETS
+
 int unix_stream_connect(const char *path);
 int unix_stream_listen(const char *path);
 
+#else
+
+static inline int unix_stream_connect(const char *path)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+static inline int unix_stream_listen(const char *path)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+#endif
+
 #endif /* UNIX_SOCKET_H */
-- 
2.8.2.531.gd073806


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported
  2016-06-26 12:09   ` [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported Nguyễn Thái Ngọc Duy
@ 2016-06-27 12:14     ` Johannes Schindelin
  2016-06-27 16:16       ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Johannes Schindelin @ 2016-06-27 12:14 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, novalis, Junio C Hamano, kamggg, dturner

[-- Attachment #1: Type: text/plain, Size: 749 bytes --]

Hi Duy,

On Sun, 26 Jun 2016, Nguyễn Thái Ngọc Duy wrote:

> This keeps #ifdef at the callee instead of caller, it's less messier.
> 
> The caller in question is in read-cache.c which, unlike other
> unix-socket callers so far, is always built regardless of unix socket
> support. No extra handling (for ENOSYS) is needed because in this
> build, index-helper does not exist, $GIT_DIR/index-helper.sock does
> not exist, so no unix socket call is made by read-cache.c in the first
> place.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

Heh, I made something very similar (although I did not update the errno as
your patch does): https://github.com/git-for-windows/git/commit/919cb1d79

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported
  2016-06-27 12:14     ` Johannes Schindelin
@ 2016-06-27 16:16       ` Duy Nguyen
  2016-06-28 10:06         ` Johannes Schindelin
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-06-27 16:16 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Git Mailing List, David Turner, Junio C Hamano, Keith McGuigan,
	David Turner

On Mon, Jun 27, 2016 at 2:14 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Sun, 26 Jun 2016, Nguyễn Thái Ngọc Duy wrote:
>
>> This keeps #ifdef at the callee instead of caller, it's less messier.
>>
>> The caller in question is in read-cache.c which, unlike other
>> unix-socket callers so far, is always built regardless of unix socket
>> support. No extra handling (for ENOSYS) is needed because in this
>> build, index-helper does not exist, $GIT_DIR/index-helper.sock does
>> not exist, so no unix socket call is made by read-cache.c in the first
>> place.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>
> Heh, I made something very similar (although I did not update the errno as
> your patch does): https://github.com/git-for-windows/git/commit/919cb1d79

Yours lacks the important "else" line in Makefile, so pick mine! :-D
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff
       [not found]   ` <CAFUO74nENapwVsM3CUst9AHqy5LcKTFBCnJxGXPk8E952t+X5Q@mail.gmail.com>
@ 2016-06-27 16:53     ` David Turner
  0 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-06-27 16:53 UTC (permalink / raw)
  To: Keith McGuigan, git, pclouds; +Cc: David Turner, Ramsay Jones, Junio C Hamano

On 06/27/2016 08:58 AM, Keith McGuigan wrote:
>
> If 'indexhelper.exitafter' is set to 0, then loop is called with
> idle_in_seconds == 0.  It gets converted to -1, but then
> 'idle_in_seconds * 1000' is passed to poll(), so poll gets an argument
> of -1000, which it EINVALs on.

Silly OS X!

Will squash when I re-roll thanks.




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported
  2016-06-27 16:16       ` Duy Nguyen
@ 2016-06-28 10:06         ` Johannes Schindelin
  0 siblings, 0 replies; 35+ messages in thread
From: Johannes Schindelin @ 2016-06-28 10:06 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, David Turner, Junio C Hamano, Keith McGuigan,
	David Turner

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

Hi Duym

On Mon, 27 Jun 2016, Duy Nguyen wrote:

> On Mon, Jun 27, 2016 at 2:14 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Sun, 26 Jun 2016, Nguyễn Thái Ngọc Duy wrote:
> >
> >> This keeps #ifdef at the callee instead of caller, it's less messier.
> >>
> >> The caller in question is in read-cache.c which, unlike other
> >> unix-socket callers so far, is always built regardless of unix socket
> >> support. No extra handling (for ENOSYS) is needed because in this
> >> build, index-helper does not exist, $GIT_DIR/index-helper.sock does
> >> not exist, so no unix socket call is made by read-cache.c in the first
> >> place.
> >>
> >> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> >> ---
> >
> > Heh, I made something very similar (although I did not update the errno as
> > your patch does): https://github.com/git-for-windows/git/commit/919cb1d79
> 
> Yours lacks the important "else" line in Makefile, so pick mine! :-D

Oh, I definitely prefer your patch over mine. I just meant to concur that
this needs fixing.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff
  2016-06-26  4:14 ` [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff David Turner
       [not found]   ` <CAFUO74nENapwVsM3CUst9AHqy5LcKTFBCnJxGXPk8E952t+X5Q@mail.gmail.com>
@ 2016-06-30 13:06   ` Johannes Schindelin
  2016-06-30 15:04     ` Duy Nguyen
  1 sibling, 1 reply; 35+ messages in thread
From: Johannes Schindelin @ 2016-06-30 13:06 UTC (permalink / raw)
  To: David Turner
  Cc: git, pclouds, kamggg, David Turner, Ramsay Jones, Junio C Hamano

Hi Dave,

On Sun, 26 Jun 2016, David Turner wrote:

> diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
> new file mode 100755
> index 0000000..114c112
> --- /dev/null
> +++ b/t/t7900-index-helper.sh
> @@ -0,0 +1,23 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2016, Twitter, Inc
> +#
> +
> +test_description='git-index-helper
> +
> +Testing git index-helper
> +'
> +
> +. ./test-lib.sh
> +
> +test -n "$NO_MMAP" && {
> +	skip_all='skipping index-helper tests: no mmap'
> +	test_done
> +}

Even when NO_MMAP is empty, there might be no Unix sockets available (such
as is the case on Windows). In any case, you really only want to skip
these tests when index-helper is not available, so would you mind
squashing this patch in?

-- snipsnap --
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: [PATCH] fixup! index-helper: new daemon for caching index and related
 stuff

---
 t/t7900-index-helper.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 6c9b4dd..12b5bf7 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -10,8 +10,10 @@ Testing git index-helper
 
 . ./test-lib.sh
 
-test -n "$NO_MMAP" && {
-	skip_all='skipping index-helper tests: no mmap'
+git index-helper -h 2>/dev/null
+test $? = 129 ||
+{
+	skip_all='skipping index-helper tests: no index-helper executable'
 	test_done
 }
 
-- 
2.9.0.270.g810e421


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff
  2016-06-30 13:06   ` Johannes Schindelin
@ 2016-06-30 15:04     ` Duy Nguyen
  2016-07-02 11:20       ` Johannes Schindelin
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-06-30 15:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: David Turner, Git Mailing List, Keith McGuigan, David Turner,
	Ramsay Jones, Junio C Hamano

On Thu, Jun 30, 2016 at 3:06 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Even when NO_MMAP is empty, there might be no Unix sockets available (such
> as is the case on Windows). In any case, you really only want to skip
> these tests when index-helper is not available, so would you mind
> squashing this patch in?
>
> -- snipsnap --
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> Subject: [PATCH] fixup! index-helper: new daemon for caching index and related
>  stuff
>
> ---
>  t/t7900-index-helper.sh | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
> index 6c9b4dd..12b5bf7 100755
> --- a/t/t7900-index-helper.sh
> +++ b/t/t7900-index-helper.sh
> @@ -10,8 +10,10 @@ Testing git index-helper
>
>  . ./test-lib.sh
>
> -test -n "$NO_MMAP" && {
> -       skip_all='skipping index-helper tests: no mmap'
> +git index-helper -h 2>/dev/null
> +test $? = 129 ||

So when NO_MMAP is set, "git index-helper -h" will set $? to 1. And we
correctly skip the tests as well. It's a bit subtle though. How about
"git help -a|grep index-helper"?

> +{
> +       skip_all='skipping index-helper tests: no index-helper executable'
>         test_done
>  }
>
> --
> 2.9.0.270.g810e421
>



-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-06-26  4:14 ` [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat() David Turner
@ 2016-06-30 17:55   ` Ben Peart
  2016-06-30 19:14     ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Ben Peart @ 2016-06-30 17:55 UTC (permalink / raw)
  To: git

David Turner <novalis <at> novalis.org> writes:

> 
> Hiding watchman behind index-helper means you need both daemons. You
> can't run watchman alone. Not so good. But on the other hand, 'git'
> binary is not linked to watchman/json libraries, which is good for
> packaging. Core git package will run fine without watchman-related
> packages. If they need watchman, they can install git-index-helper and
> dependencies.
> 

Have you considered splitting index-helper and watchman apart?  Using 
Watchman to not lstat unchanged entries is a huge perf win with very 
large repos.

It would also be interesting to make the Watchman backend replaceable by
using an extensible API.  This has the benefit of not having to link the
'git' binary to the watchman/json libraries.  Is there any pattern 
already in git for accomplishing this?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-06-30 17:55   ` Ben Peart
@ 2016-06-30 19:14     ` Duy Nguyen
  2016-06-30 23:54       ` Ben Peart
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-06-30 19:14 UTC (permalink / raw)
  To: Ben Peart; +Cc: Git Mailing List, David Turner

On Thu, Jun 30, 2016 at 7:55 PM, Ben Peart <peartben@gmail.com> wrote:
> David Turner <novalis <at> novalis.org> writes:
>
>>
>> Hiding watchman behind index-helper means you need both daemons. You
>> can't run watchman alone. Not so good. But on the other hand, 'git'
>> binary is not linked to watchman/json libraries, which is good for
>> packaging. Core git package will run fine without watchman-related
>> packages. If they need watchman, they can install git-index-helper and
>> dependencies.
>>
>
> Have you considered splitting index-helper and watchman apart?  Using
> Watchman to not lstat unchanged entries is a huge perf win with very
> large repos.

On large repos (i.e. lots of files/dirs on worktree), the cost of
reading index will increase proportionally. Yes lstat costs, but I
suspect index reading (integrity verification actually) may cost more,
especially on platforms with cheap lstat like linux. On these repos
you really want to enable all four: index-helper (with watchman),
split-index (I still need to work out pruning on split-index) and
untracked cache. There's still a lot more to make git run fast on
large repos though.

> It would also be interesting to make the Watchman backend replaceable by
> using an extensible API.  This has the benefit of not having to link the
> 'git' binary to the watchman/json libraries.

'git' binary is not linked to watchman libraries. git-index-helper is
a separate binary, by design. In theory you can create a
'git-index-helper' replacement binary with something other than
watchman. I think David documented the protocol well (it may change in
the future though and we are not prepared for capability progression)

> Is there any pattern already in git for accomplishing this?
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-06-30 19:14     ` Duy Nguyen
@ 2016-06-30 23:54       ` Ben Peart
  2016-07-03 12:28         ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Ben Peart @ 2016-06-30 23:54 UTC (permalink / raw)
  To: git

Duy Nguyen <pclouds <at> gmail.com> writes:

> 
> On Thu, Jun 30, 2016 at 7:55 PM, Ben Peart <peartben <at> gmail.com> wrote:
> > David Turner <novalis <at> novalis.org> writes:
> >
> >>
> >> Hiding watchman behind index-helper means you need both daemons. You
> >> can't run watchman alone. Not so good. But on the other hand, 'git'
> >> binary is not linked to watchman/json libraries, which is good for
> >> packaging. Core git package will run fine without watchman-related
> >> packages. If they need watchman, they can install git-index-helper and
> >> dependencies.
> >>
> >
> > Have you considered splitting index-helper and watchman apart?  Using
> > Watchman to not lstat unchanged entries is a huge perf win with very
> > large repos.
> 
> On large repos (i.e. lots of files/dirs on worktree), the cost of
> reading index will increase proportionally. Yes lstat costs, but I
> suspect index reading (integrity verification actually) may cost more,
> especially on platforms with cheap lstat like linux. On these repos
> you really want to enable all four: index-helper (with watchman),
> split-index (I still need to work out pruning on split-index) and
> untracked cache. There's still a lot more to make git run fast on
> large repos though.
> 

I've found (at least on Windows) that as the repo size gets larger, the
time to read the index becomes a much smaller percentage of the overall
time.  I just captured some perf traces of git status on a large repo we
have.  Of that, 92.5% was spent in git!read_directory and only 4.0% was 
spent in git!read_index.  Of that 4%, 2.6% was git!glk_SHA1_Update.

Given there were no dirty files, Watchman would have made a huge 
improvement in the overall time but index helper would have had
relatively little impact.  We've noticed this same pattern in all our
repos which is what is driving our interest in the Watchman model and
also shows why index-helper is of less interest.

> > It would also be interesting to make the Watchman backend replaceable by
> > using an extensible API.  This has the benefit of not having to link the
> > 'git' binary to the watchman/json libraries.
> 
> 'git' binary is not linked to watchman libraries. git-index-helper is
> a separate binary, by design. In theory you can create a
> 'git-index-helper' replacement binary with something other than
> watchman. I think David documented the protocol well (it may change in
> the future though and we are not prepared for capability progression)
> 
> > Is there any pattern already in git for accomplishing this?

While the current design hides watchman behind index-helper, if we were
to change that model so they were independent, we would be interested
in doing it in such a way that provided some abstraction so that it 
could be replaced with another file watching daemon.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff
  2016-06-30 15:04     ` Duy Nguyen
@ 2016-07-02 11:20       ` Johannes Schindelin
  2016-07-02 12:43         ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Johannes Schindelin @ 2016-07-02 11:20 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: David Turner, Git Mailing List, Keith McGuigan, David Turner,
	Ramsay Jones, Junio C Hamano

Hi Duy,

On Thu, 30 Jun 2016, Duy Nguyen wrote:

> On Thu, Jun 30, 2016 at 3:06 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > Even when NO_MMAP is empty, there might be no Unix sockets available (such
> > as is the case on Windows). In any case, you really only want to skip
> > these tests when index-helper is not available, so would you mind
> > squashing this patch in?
> >
> > -- snipsnap --
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > Subject: [PATCH] fixup! index-helper: new daemon for caching index and related
> >  stuff
> >
> > ---
> >  t/t7900-index-helper.sh | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
> > index 6c9b4dd..12b5bf7 100755
> > --- a/t/t7900-index-helper.sh
> > +++ b/t/t7900-index-helper.sh
> > @@ -10,8 +10,10 @@ Testing git index-helper
> >
> >  . ./test-lib.sh
> >
> > -test -n "$NO_MMAP" && {
> > -       skip_all='skipping index-helper tests: no mmap'
> > +git index-helper -h 2>/dev/null
> > +test $? = 129 ||
> 
> So when NO_MMAP is set, "git index-helper -h" will set $? to 1.

Not quite.

When NO_MMAP is set, index-helper will not be compiled. Or at least it
should not be:

> +ifndef NO_MMAP
> +ifndef NO_UNIX_SOCKETS
> +       PROGRAM_OBJS += index-helper.o
> +endif
> +endif

If it is *unset*, *and* if NO_UNIX_SOCKETS is *also* unset, index-helper
gets compiled, and -h triggers code in parse-options.c or usage.c that
exits with status 129.

So I do not think that this is subtle.

But it just occurred to me that the #ifndef NO_MMAP in index-helper.c is
unnecessary and that its #else clause (containing a loop() that fails)
contains dead code: we never compile this code with NO_MMAP, and neither
should we.

Dave, would you mind taking that #ifndef NO_MMAP out of "index-helper: new
daemon for caching index and related stuff" when you re-roll?

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff
  2016-07-02 11:20       ` Johannes Schindelin
@ 2016-07-02 12:43         ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-07-02 12:43 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: David Turner, Git Mailing List, Keith McGuigan, David Turner,
	Ramsay Jones, Junio C Hamano

On Sat, Jul 2, 2016 at 1:20 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> > -test -n "$NO_MMAP" && {
>> > -       skip_all='skipping index-helper tests: no mmap'
>> > +git index-helper -h 2>/dev/null
>> > +test $? = 129 ||
>>
>> So when NO_MMAP is set, "git index-helper -h" will set $? to 1.
>
> Not quite.
>
> When NO_MMAP is set, index-helper will not be compiled. Or at least it
> should not be:
>
>> +ifndef NO_MMAP
>> +ifndef NO_UNIX_SOCKETS
>> +       PROGRAM_OBJS += index-helper.o
>> +endif
>> +endif
>
> If it is *unset*, *and* if NO_UNIX_SOCKETS is *also* unset, index-helper
> gets compiled, and -h triggers code in parse-options.c or usage.c that
> exits with status 129.
>
> So I do not think that this is subtle.

The question is "whether index-helper is supported?" but you need to
go through parse-options.c and that strange (to me) exit status 129 to
conclude that index-helper is in fact built. It's not just as straight
forwarding from reading the test.

> But it just occurred to me that the #ifndef NO_MMAP in index-helper.c is
> unnecessary and that its #else clause (containing a loop() that fails)
> contains dead code: we never compile this code with NO_MMAP, and neither
> should we.

Yeah that's probably old code from POSIX shm time. Back then there
were calls to shm_open and stuff, not just mmap.

> Dave, would you mind taking that #ifndef NO_MMAP out of "index-helper: new
> daemon for caching index and related stuff" when you re-roll?
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-06-30 23:54       ` Ben Peart
@ 2016-07-03 12:28         ` Duy Nguyen
  2016-07-06 16:54           ` Ben Peart
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-07-03 12:28 UTC (permalink / raw)
  To: Ben Peart; +Cc: Git Mailing List

On Fri, Jul 1, 2016 at 1:54 AM, Ben Peart <peartben@gmail.com> wrote:
> I've found (at least on Windows) that as the repo size gets larger, the
> time to read the index becomes a much smaller percentage of the overall
> time.  I just captured some perf traces of git status on a large repo we
> have.  Of that, 92.5% was spent in git!read_directory and only 4.0% was
> spent in git!read_index.  Of that 4%, 2.6% was git!glk_SHA1_Update.

OK.. Windows. I don't have Windows numbers, but "git!read_directory"
(is that read_directory() in "git" process?) does not do a lot of
stat'ing. Although I agree that it dominates git-status. I recorded
some numbers in nd/untracked-cache topic (see 9e59724 (update-index:
manually enable or disable untracked cache - 2015-03-08)).

First step would be enabling that because besides directory
traversing, this code does a lot of string processing that's hopefully
eliminated by untracked cache extension. I cut git-status' time in
half with it. The side effect though, is that it creates a new flow of
stat(), one per directory. It would be great if you could do some
measurements with untracked cache on Windows and see if we get similar
gain.

> Given there were no dirty files, Watchman would have made a huge
> improvement in the overall time but index helper would have had
> relatively little impact.  We've noticed this same pattern in all our
> repos which is what is driving our interest in the Watchman model and
> also shows why index-helper is of less interest.

Assuming that untracked cache cuts git-status time by half on Windows
as well, then index read time now takes a bigger percentage, 8%, where
it starts to make sense to optimize it.

On a quiet repository, having watchman does not help so much because
we already reduce the significant number of filesystem-related system
calls. Yes there is lstat() and eliminating it may gain some more
(with watchman) and it matters on a repo with lots of directories. You
probably can just take these lstat out (I can help point out where)
and see how much the gain is.

Assuming (blindly again) that removing lstat helps like 10% of
read_index(), we would need to profile untracked cache code and see
where what we can do next. There are still a lot of directory
traversing there (except that it traverses the cache instead of
filesystem) and maybe we can do something. But I haven't gotten that
far.

> While the current design hides watchman behind index-helper, if we were
> to change that model so they were independent, we would be interested
> in doing it in such a way that provided some abstraction so that it
> could be replaced with another file watching daemon.

Frankly I'm not that interested in replacing another file watching
daemon. My first attempt at this problem was "file-watcher" which was
tied to Linux inotify system only and it would make sense to make it
replaceable. But watchman supports OS X, Linux, FreeBSD and Windows
now, not just Linux only as before, why another abstraction layer? You
could even replace "watchman.exe" binary. As long as you produce the
same json data, your new daemon should still work.

Tying index caching daemon and file watching daemon (let's avoid
"watchman" for now) gives us a bonus. Because both git and the caching
daemon know that they read the same index, we could answer the
question "what files are dirty?" with "file number 1, 5, 9 in the
index" instead of sending full paths and git has to make some more
lookups to identify them. In this series we send it as a compressed
bit map. Probably not a big deal in terms of real performance, but it
feels nice to do lookups less.

The second reason is because watchman daemon alone does not provide
enough information to reduce untracked cache's lstat() as much as
possible. The current approach in this series is a naive one, which
works for some cases, but not optimal (I'll get to that). We need a
separate long-running daemon to maintain extra info to reduce lstat().
Because our target is watchman, it does not make sense to add yet
another daemon besides index-helper to do this.

OK the optimal lstat() reduction thing. Right now, if any file in a
directory is updated, the directory is invalidated in untracked cache
and we need to traverse it to collect excluded files again. But it
does not have to be that way. We don't care if any file is _updated_
because it will not change untracked cache output. We care about what
files are _added_ or _deleted_. New files will need to be classified
as either tracked, untracked or ignored. Deleted files may invalid
either three file lists. Watchman cannot answer "what files are added
or deleted since the point X in time" and I agree that it's not
watchman's job (watchman issue 65). So we have to maintain some more
info by ourselves, e.g. the list of files at any requested "clock".
With that we can compare the file lists of two "clock"s and tell git
what files are added or deleted.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-07-03 12:28         ` Duy Nguyen
@ 2016-07-06 16:54           ` Ben Peart
  2016-07-08 16:32             ` Duy Nguyen
  2016-07-08 16:39             ` Duy Nguyen
  0 siblings, 2 replies; 35+ messages in thread
From: Ben Peart @ 2016-07-06 16:54 UTC (permalink / raw)
  To: git

Duy Nguyen <pclouds <at> gmail.com> writes:

> 
> First step would be enabling that because besides directory
> traversing, this code does a lot of string processing that's hopefully
> eliminated by untracked cache extension. I cut git-status' time in
> half with it. The side effect though, is that it creates a new flow of
> stat(), one per directory. It would be great if you could do some
> measurements with untracked cache on Windows and see if we get similar
> gain.
 
These numbers were captured with core.fscache and core.untrackedcache 
both set to true in the belief that it would produce the best 
performance.  If that is incorrect, I'm happy to capture numbers with 
other options set.  

If you drill the next step down into the call tree, the bulk of the cost
of read_directory is coming from mingw_stat (85.6%).  These numbers are 
inclusive in that they reflect the of the function plus any of its 
callees.  

If you look at the overall cost of functions exclusively (ie that only 
include the cost of the function and not it's children), 
kernelbase!CreateFileW, kernelbase!CloseHandle, and 
kernebase!GetFileInformationByHandle dominate (84.9% of the time).  
These functions form the basis of the stat emulation on Windows although
the fscache certainly has an impact on what is happening here as well.

While we can certainly work to speed these up, the biggest performance 
win will be eliminating as many of these calls as possible which is what
we are hoping to accomplish by using Watchman to limit the files that 
_need_ a stat call.

> > Given there were no dirty files, Watchman would have made a huge
> > improvement in the overall time but index helper would have had
> > relatively little impact.  We've noticed this same pattern in all our
> > repos which is what is driving our interest in the Watchman model and
> > also shows why index-helper is of less interest.
> 
> Assuming that untracked cache cuts git-status time by half on Windows
> as well, then index read time now takes a bigger percentage, 8%, where
> it starts to make sense to optimize it.
> 
> On a quiet repository, having watchman does not help so much because
> we already reduce the significant number of filesystem-related system
> calls. Yes there is lstat() and eliminating it may gain some more
> (with watchman) and it matters on a repo with lots of directories. You
> probably can just take these lstat out (I can help point out where)
> and see how much the gain is.

I don't understand why Watchman won't provide a _significant_ improvement
here.  My understanding is that it will minimize the stat calls to those
files that may have changed as well as limiting the untracked cache to
only scanning those directories that may have changes in them.  In this
particular scenario, _no_ files have changed so Watchman would return
an empty set thus eliminating virtually the stat calls and directory
enumerations.  I'd expect this to result in a >90% savings.  Am I 
missing something?

> Assuming (blindly again) that removing lstat helps like 10% of
> read_index(), we would need to profile untracked cache code and see
> where what we can do next. There are still a lot of directory
> traversing there (except that it traverses the cache instead of
> filesystem) and maybe we can do something. But I haven't gotten that
> far.
> 
> > While the current design hides watchman behind index-helper, if we were
> > to change that model so they were independent, we would be interested
> > in doing it in such a way that provided some abstraction so that it
> > could be replaced with another file watching daemon.
> 
> Frankly I'm not that interested in replacing another file watching
> daemon. My first attempt at this problem was "file-watcher" which was
> tied to Linux inotify system only and it would make sense to make it
> replaceable. But watchman supports OS X, Linux, FreeBSD and Windows
> now, not just Linux only as before, why another abstraction layer? You
> could even replace "watchman.exe" binary. As long as you produce the
> same json data, your new daemon should still work.

This is a simplification it would be nice to make as we have other code
already running that can report on all the changes happening.  It would
enable us to remove the additional complexity of the Watchman daemon. 
I'm sure we can make it work either way by emulating the Watchman 
interface and output.

> Tying index caching daemon and file watching daemon (let's avoid
> "watchman" for now) gives us a bonus. Because both git and the caching
> daemon know that they read the same index, we could answer the
> question "what files are dirty?" with "file number 1, 5, 9 in the
> index" instead of sending full paths and git has to make some more
> lookups to identify them. In this series we send it as a compressed
> bit map. Probably not a big deal in terms of real performance, but it
> feels nice to do lookups less.

Today, Watchman returns a list of files and directories and then creates
the compressed bitmap that index-helper uses.  The work of looking those
entries up in the index and then creating the bitmap still has to happen
so I suspect you are correct that it doesn't make much of a real 
performance difference.  It's just moving where that lookup and bitmap
creation happens.

I'm in the process of prototyping this, and currently skip much of the 
process of iterating through the list of changed files, looking up the 
entry in the index, creating the bitmap, passing that bitmap through the 
WAMA section to index-helper reading the WAMA section in git and then 
looping through the bitmap to set the CE_WATCHMAN_DIRTY bit on the 
corresponding index entries and updating the untracked cache.

Instead, I iterate through the list of changed files, look up the entry 
in the index and directly set the dirty bit all within read-cache.c.  
At this point, it's <100 lines of code.  I'll keep fleshing this out
and get some perf numbers once it's working.

While this saves several steps and reduces complexity (index-helper and 
the WAMA is section are no longer involved), it does mean that 
read-cache.c needs a platform independent way to query and receive the 
list of modified files and directories.  It also means we need a way to 
store the date/time string passed to the file watching daemon in the 
index.  Using the WAMA section just to store the date/time seems a bit 
heavy but it can work.

> The second reason is because watchman daemon alone does not provide
> enough information to reduce untracked cache's lstat() as much as
> possible. The current approach in this series is a naive one, which
> works for some cases, but not optimal (I'll get to that). We need a
> separate long-running daemon to maintain extra info to reduce lstat().
> Because our target is watchman, it does not make sense to add yet
> another daemon besides index-helper to do this.
> 
> OK the optimal lstat() reduction thing. Right now, if any file in a
> directory is updated, the directory is invalidated in untracked cache
> and we need to traverse it to collect excluded files again. But it
> does not have to be that way. We don't care if any file is _updated_
> because it will not change untracked cache output. We care about what
> files are _added_ or _deleted_. New files will need to be classified
> as either tracked, untracked or ignored. Deleted files may invalid
> either three file lists. Watchman cannot answer "what files are added
> or deleted since the point X in time" and I agree that it's not
> watchman's job (watchman issue 65). So we have to maintain some more
> info by ourselves, e.g. the list of files at any requested "clock".
> With that we can compare the file lists of two "clock"s and tell git
> what files are added or deleted.

This sounds like a nice optimization but right now I'm focused on how we
can scope the cost of status to be limited to the files and directories
that may have changes.  With large repos, this is a small subset of the
overall repo and will hopefully be enough to make the performance 
reasonable.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-07-06 16:54           ` Ben Peart
@ 2016-07-08 16:32             ` Duy Nguyen
  2016-07-08 16:39             ` Duy Nguyen
  1 sibling, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-07-08 16:32 UTC (permalink / raw)
  To: Ben Peart; +Cc: git

On Wed, Jul 06, 2016 at 04:54:00PM +0000, Ben Peart wrote:
> Duy Nguyen <pclouds <at> gmail.com> writes:
>
> >
> > First step would be enabling that because besides directory
> > traversing, this code does a lot of string processing that's hopefully
> > eliminated by untracked cache extension. I cut git-status' time in
> > half with it. The side effect though, is that it creates a new flow of
> > stat(), one per directory. It would be great if you could do some
> > measurements with untracked cache on Windows and see if we get similar
> > gain.
>
> These numbers were captured with core.fscache and core.untrackedcache
> both set to true in the belief that it would produce the best
> performance.

So I guessed it wrong :-) I thought untracked cache was still not popular.

> If that is incorrect, I'm happy to capture numbers with other
> options set.

It's not incorrect. But if you are happy to provide numbers, I'm happy
to read them and provide my opinions.

> If you drill the next step down into the call tree, the bulk of the cost
> of read_directory is coming from mingw_stat (85.6%).  These numbers are
> inclusive in that they reflect the of the function plus any of its
> callees.

If I remember it correctly, fscache changes mingw_stat() to
FindFirst.. and FindNext... basically read the whole directory up. The
assumption is, if you stat(a/b) you probably want to stat(a/c) and
stat(a/d) soon too. And stat() on Windows is slower in that case
(which is a real case, index refresh).

Untracked cache does not follow this pattern, it stat() all
directories an no files (ok, _some_ files, like .gitignore), which is
not what fscache is optimized for. It would be good if you can get
some numbers with untracked cache but _without_ fscache.

You should kill the lstat() stream from index refresh though because
it may interfere. Doing that is as simple as this. This could provide
some good base line for measuring read_directory().

diff --git a/read-cache.c b/read-cache.c
index db27766..1cef379 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1079,7 +1079,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	 * that the change to the work tree does not matter and told
 	 * us not to worry.
 	 */
-	if (!ignore_skip_worktree && ce_skip_worktree(ce)) {
+	if (1 || !ignore_skip_worktree && ce_skip_worktree(ce)) {
 		ce_mark_uptodate(ce);
 		return ce;
 	}


> If you look at the overall cost of functions exclusively (ie that only
> include the cost of the function and not it's children),
> kernelbase!CreateFileW, kernelbase!CloseHandle, and
> kernebase!GetFileInformationByHandle dominate (84.9% of the time).
> These functions form the basis of the stat emulation on Windows although
> the fscache certainly has an impact on what is happening here as well.

Yeah.. I'm hoping to see something different without fscache. But
probably not much.

> > > Given there were no dirty files, Watchman would have made a huge
> > > improvement in the overall time but index helper would have had
> > > relatively little impact.  We've noticed this same pattern in all our
> > > repos which is what is driving our interest in the Watchman model and
> > > also shows why index-helper is of less interest.
> >
> > Assuming that untracked cache cuts git-status time by half on Windows
> > as well, then index read time now takes a bigger percentage, 8%, where
> > it starts to make sense to optimize it.
> >
> > On a quiet repository, having watchman does not help so much because
> > we already reduce the significant number of filesystem-related system
> > calls. Yes there is lstat() and eliminating it may gain some more
> > (with watchman) and it matters on a repo with lots of directories. You
> > probably can just take these lstat out (I can help point out where)
> > and see how much the gain is.
>
> I don't understand why Watchman won't provide a _significant_ improvement
> here.  My understanding is that it will minimize the stat calls to those
> files that may have changed as well as limiting the untracked cache to
> only scanning those directories that may have changes in them.  In this
> particular scenario, _no_ files have changed so Watchman would return
> an empty set thus eliminating virtually the stat calls and directory
> enumerations.  I'd expect this to result in a >90% savings.  Am I
> missing something?

If stat() calls consume most of the time by read_directory() then yes
watchman provides significant improvement. I was writing that thinking
you did not enable untracked cache. It's a big leap with and without
untracked cache. Watchman's performance gain is smaller in comparison
to that (again I can be really wrong when it comes to Windows stuff).


> > > While the current design hides watchman behind index-helper, if we were
> > > to change that model so they were independent, we would be interested
> > > in doing it in such a way that provided some abstraction so that it
> > > could be replaced with another file watching daemon.
> >
> > Frankly I'm not that interested in replacing another file watching
> > daemon. My first attempt at this problem was "file-watcher" which was
> > tied to Linux inotify system only and it would make sense to make it
> > replaceable. But watchman supports OS X, Linux, FreeBSD and Windows
> > now, not just Linux only as before, why another abstraction layer? You
> > could even replace "watchman.exe" binary. As long as you produce the
> > same json data, your new daemon should still work.
>
> This is a simplification it would be nice to make as we have other code
> already running that can report on all the changes happening.  It would
> enable us to remove the additional complexity of the Watchman daemon.
> I'm sure we can make it work either way by emulating the Watchman
> interface and output.

In the spririt of scratching own itches, please feel free to suggest
some changes to split the watchman interface out of index-helper. It
feels like a burden to me, but I am just a contributor like you.

>
> > Tying index caching daemon and file watching daemon (let's avoid
> > "watchman" for now) gives us a bonus. Because both git and the caching
> > daemon know that they read the same index, we could answer the
> > question "what files are dirty?" with "file number 1, 5, 9 in the
> > index" instead of sending full paths and git has to make some more
> > lookups to identify them. In this series we send it as a compressed
> > bit map. Probably not a big deal in terms of real performance, but it
> > feels nice to do lookups less.
>
> Today, Watchman returns a list of files and directories and then creates
> the compressed bitmap that index-helper uses.  The work of looking those
> entries up in the index and then creating the bitmap still has to happen
> so I suspect you are correct that it doesn't make much of a real
> performance difference.  It's just moving where that lookup and bitmap
> creation happens.

We could do the lookups in background, before git requests it, so the
cost would be hidden. But yeah, numbers decide whether this is an evil
micro-optimization or not.


> I'm in the process of prototyping this, and currently skip much of the
> process of iterating through the list of changed files, looking up the
> entry in the index, creating the bitmap, passing that bitmap through the
> WAMA section to index-helper reading the WAMA section in git and then
> looping through the bitmap to set the CE_WATCHMAN_DIRTY bit on the
> corresponding index entries and updating the untracked cache.
>
> Instead, I iterate through the list of changed files, look up the entry
> in the index and directly set the dirty bit all within read-cache.c.
> At this point, it's <100 lines of code.  I'll keep fleshing this out
> and get some perf numbers once it's working.

Great!

> > The second reason is because watchman daemon alone does not provide
> > enough information to reduce untracked cache's lstat() as much as
> > possible. The current approach in this series is a naive one, which
> > works for some cases, but not optimal (I'll get to that). We need a
> > separate long-running daemon to maintain extra info to reduce lstat().
> > Because our target is watchman, it does not make sense to add yet
> > another daemon besides index-helper to do this.
> >
> > OK the optimal lstat() reduction thing. Right now, if any file in a
> > directory is updated, the directory is invalidated in untracked cache
> > and we need to traverse it to collect excluded files again. But it
> > does not have to be that way. We don't care if any file is _updated_
> > because it will not change untracked cache output. We care about what
> > files are _added_ or _deleted_. New files will need to be classified
> > as either tracked, untracked or ignored. Deleted files may invalid
> > either three file lists. Watchman cannot answer "what files are added
> > or deleted since the point X in time" and I agree that it's not
> > watchman's job (watchman issue 65). So we have to maintain some more
> > info by ourselves, e.g. the list of files at any requested "clock".
> > With that we can compare the file lists of two "clock"s and tell git
> > what files are added or deleted.
>
> This sounds like a nice optimization but right now I'm focused on how we
> can scope the cost of status to be limited to the files and directories
> that may have changes.  With large repos, this is a small subset of the
> overall repo and will hopefully be enough to make the performance
> reasonable.

The use case in my mind is building stuff where .o files are put in
the same place as source files, and such a build would touch everything.
I certainly am not familar with what most real large repos look like
and how they are used. So I may be wrong.
--
Duy

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat()
  2016-07-06 16:54           ` Ben Peart
  2016-07-08 16:32             ` Duy Nguyen
@ 2016-07-08 16:39             ` Duy Nguyen
  1 sibling, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-07-08 16:39 UTC (permalink / raw)
  To: Ben Peart; +Cc: Git Mailing List

BTW just for me to have some perspective, roughly how many directories
and files are there in the worktree of this repo?
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-07-08 16:40 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-26  4:14 [PATCH v13 01/20] read-cache.c: fix constness of verify_hdr() David Turner
2016-06-26  4:14 ` [PATCH v13 02/20] read-cache: allow to keep mmap'd memory after reading David Turner
2016-06-26  4:14 ` [PATCH v13 03/20] pkt-line: add gentle version of packet_write David Turner
2016-06-26  4:14 ` [PATCH v13 04/20] index-helper: new daemon for caching index and related stuff David Turner
     [not found]   ` <CAFUO74nENapwVsM3CUst9AHqy5LcKTFBCnJxGXPk8E952t+X5Q@mail.gmail.com>
2016-06-27 16:53     ` David Turner
2016-06-30 13:06   ` Johannes Schindelin
2016-06-30 15:04     ` Duy Nguyen
2016-07-02 11:20       ` Johannes Schindelin
2016-07-02 12:43         ` Duy Nguyen
2016-06-26  4:14 ` [PATCH v13 05/20] index-helper: add --strict David Turner
2016-06-26  4:14 ` [PATCH v13 06/20] daemonize(): set a flag before exiting the main process David Turner
2016-06-26  4:14 ` [PATCH v13 07/20] index-helper: add --detach David Turner
2016-06-26  4:14 ` [PATCH v13 08/20] index-helper: log warnings David Turner
2016-06-26  4:14 ` [PATCH v13 09/20] read-cache: add watchman 'WAMA' extension David Turner
2016-06-26  4:14 ` [PATCH v13 10/20] watchman: support watchman to reduce index refresh cost David Turner
2016-06-26  4:14 ` [PATCH v13 11/20] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-06-30 17:55   ` Ben Peart
2016-06-30 19:14     ` Duy Nguyen
2016-06-30 23:54       ` Ben Peart
2016-07-03 12:28         ` Duy Nguyen
2016-07-06 16:54           ` Ben Peart
2016-07-08 16:32             ` Duy Nguyen
2016-07-08 16:39             ` Duy Nguyen
2016-06-26  4:14 ` [PATCH v13 12/20] update-index: enable/disable watchman support David Turner
2016-06-26  4:14 ` [PATCH v13 13/20] unpack-trees: preserve index extensions David Turner
2016-06-26  4:14 ` [PATCH v13 14/20] watchman: add a config option to enable the extension David Turner
2016-06-26  4:14 ` [PATCH v13 15/20] index-helper: kill mode David Turner
2016-06-26  4:14 ` [PATCH v13 16/20] index-helper: don't run if already running David Turner
2016-06-26  4:14 ` [PATCH v13 17/20] index-helper: autorun mode David Turner
2016-06-26  4:14 ` [PATCH v13 18/20] index-helper: optionally automatically run David Turner
2016-06-26  4:14 ` [PATCH v13 19/20] trace: measure where the time is spent in the index-heavy operations David Turner
2016-06-26 12:09   ` [PATCH v13 21/20] unix-socket.c: add stub implementation when unix sockets are not supported Nguyễn Thái Ngọc Duy
2016-06-27 12:14     ` Johannes Schindelin
2016-06-27 16:16       ` Duy Nguyen
2016-06-28 10:06         ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.