All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Use watchman to reduce index refresh time
@ 2015-11-01 13:55 Nguyễn Thái Ngọc Duy
  2015-11-01 13:55 ` [PATCH 1/5] read-cache: add watchman 'WAMA' extension Nguyễn Thái Ngọc Duy
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2015-11-01 13:55 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Nguyễn Thái Ngọc Duy

This series builds on top of the index-helper series I just sent and
uses watchman to keep track of file changes in order to avoid lstat()
at refresh time. The series can also be found at [1]

When I started this work, watchman did not support Windows yet. It
does now, even if still experimental [2]. So Windows people, please
try it out if you have time.

To put all pieces so far together, we have split-index to reduce index
write time, untracked cache to reduce I/O as well as computation for
.gitignore, index-helper for index read time and this series for
lstat() at refresh time. The remaining piece is killing lstat() from
untracked cache, but right now it's just some idea and incomplete
code.

[1] https://github.com/pclouds/git/commits/refresh-with-watchman
[2] https://github.com/facebook/watchman/issues/19

Nguyễn Thái Ngọc Duy (5):
  read-cache: add watchman 'WAMA' extension
  Add watchman support to reduce index refresh cost
  read-cache: allow index-helper to prepare shm before git reads it
  index-helper: use watchman to avoid refreshing index with lstat()
  update-index: enable/disable watchman support

 Makefile                 |  12 +++++
 builtin/update-index.c   |  11 +++++
 cache.h                  |   7 +++
 config.c                 |   5 ++
 configure.ac             |   8 +++
 environment.c            |   3 ++
 index-helper.c           |  84 +++++++++++++++++++++++++++++--
 read-cache.c             | 126 ++++++++++++++++++++++++++++++++++++++++++++---
 watchman-support.c (new) | 108 ++++++++++++++++++++++++++++++++++++++++
 watchman-support.h (new) |   7 +++
 10 files changed, 361 insertions(+), 10 deletions(-)
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

-- 
2.2.0.513.g477eb31

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] read-cache: add watchman 'WAMA' extension
  2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
@ 2015-11-01 13:55 ` Nguyễn Thái Ngọc Duy
  2015-11-02 22:03   ` David Turner
  2015-11-01 13:55 ` [PATCH 2/5] Add watchman support to reduce index refresh cost Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2015-11-01 13:55 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Nguyễn Thái Ngọc Duy

The extension contains a bitmap, one bit for each entry in the
index. If the n-th bit is zero, the n-th entry is considered
unchanged, we can ce_mark_uptodate() it without refreshing. If the bit
is non-zero and we found out the corresponding file is clean after
refresh, we can clear the bit.

The 'skipping refresh' bit is not in this patch yet as we would need
watchman. More details in later patches.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  4 ++++
 read-cache.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 9633acc..a05fd31 100644
--- a/cache.h
+++ b/cache.h
@@ -163,6 +163,8 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
+#define CE_NO_WATCH  (0x0001)
+
 /*
  * Range 0xFFFF0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -298,6 +300,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define WATCHMAN_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -322,6 +325,7 @@ struct index_state {
 	struct untracked_cache *untracked;
 	void *mmap;
 	size_t mmap_size;
+	char *last_update;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index f609776..893223e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -19,6 +19,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "shm.h"
+#include "ewah/ewok.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -41,11 +42,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_WATCHMAN 0x57414D41	  /* "WAMA" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | \
+		 WATCHMAN_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -1224,8 +1227,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			if (ce->ce_flags & CE_NO_WATCH) {
+				ce->ce_flags          &= ~CE_NO_WATCH;
+				istate->cache_changed |= WATCHMAN_CHANGED;
+			}
 			continue;
+		}
 		if (!new) {
 			const char *fmt;
 
@@ -1369,6 +1377,48 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static void mark_no_watchman(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	assert(pos < istate->cache_nr);
+	istate->cache[pos]->ce_flags |= CE_NO_WATCH;
+}
+
+static int read_watchman_ext(struct index_state *istate, const void *data,
+			     unsigned long sz)
+{
+	struct ewah_bitmap *bitmap;
+	int ret, len;
+
+	if (memchr(data, 0, sz) == NULL)
+		return error("invalid extension");
+	len = strlen(data) + 1;
+	bitmap = ewah_new();
+	ret = ewah_read_mmap(bitmap, (const char *)data + len, sz - len);
+	if (ret != sz - len) {
+		ewah_free(bitmap);
+		return error("fail to parse ewah bitmap");
+	}
+	istate->last_update = xstrdup(data);
+	ewah_each_bit(bitmap, mark_no_watchman, istate);
+	ewah_free(bitmap);
+	return 0;
+}
+
+static void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
+{
+	struct ewah_bitmap *bitmap;
+	int i;
+
+	strbuf_add(sb, istate->last_update, strlen(istate->last_update) + 1);
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_NO_WATCH)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1386,6 +1436,11 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+
+	case CACHE_EXT_WATCHMAN:
+		read_watchman_ext(istate, data, sz);
+		break;
+
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1794,6 +1849,8 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 	istate->from_shm = 0;
 	istate->to_shm   = 0;
+	free(istate->last_update);
+	istate->last_update = NULL;
 	return 0;
 }
 
@@ -2191,6 +2248,16 @@ static int do_write_index(struct index_state *istate, int newfd,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_watchman_ext(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCHMAN, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
-- 
2.2.0.513.g477eb31

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] Add watchman support to reduce index refresh cost
  2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
  2015-11-01 13:55 ` [PATCH 1/5] read-cache: add watchman 'WAMA' extension Nguyễn Thái Ngọc Duy
@ 2015-11-01 13:55 ` Nguyễn Thái Ngọc Duy
  2015-11-02 21:19   ` David Turner
  2015-11-01 13:55 ` [PATCH 3/5] read-cache: allow index-helper to prepare shm before git reads it Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2015-11-01 13:55 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Nguyễn Thái Ngọc Duy

The previous patch has the logic to clear bits in 'WAMA' bitmap. This
patch has logic to set bits as told by watchman. The missing bit,
_using_ these bits, are not here yet.

A lot of this code is written by David Turner originally, mostly from
[1]. I'm just copying and polishing it a bit.

[1] http://article.gmane.org/gmane.comp.version-control.git/248006

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile                 |   7 +++
 cache.h                  |   1 +
 config.c                 |   5 +++
 configure.ac             |   8 ++++
 environment.c            |   3 ++
 watchman-support.c (new) | 108 +++++++++++++++++++++++++++++++++++++++++++++++
 watchman-support.h (new) |   8 ++++
 7 files changed, 140 insertions(+)
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

diff --git a/Makefile b/Makefile
index c01cd2e..761acb6 100644
--- a/Makefile
+++ b/Makefile
@@ -1389,6 +1389,12 @@ else
 	LIB_OBJS += thread-utils.o
 endif
 
+ifdef USE_WATCHMAN
+	LIB_H += watchman-support.h
+	LIB_OBJS += watchman-support.o
+	BASIC_CFLAGS += -DUSE_WATCHMAN
+endif
+
 ifdef HAVE_PATHS_H
 	BASIC_CFLAGS += -DHAVE_PATHS_H
 endif
@@ -2135,6 +2141,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
+	@echo USE_WATCHMAN=\''$(subst ','\'',$(subst ','\'',$(USE_WATCHMAN)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index a05fd31..572299c 100644
--- a/cache.h
+++ b/cache.h
@@ -648,6 +648,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_watchman_sync_timeout;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 248a21a..6b63f66 100644
--- a/config.c
+++ b/config.c
@@ -881,6 +881,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.watchmansynctimeout")) {
+		core_watchman_sync_timeout = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/configure.ac b/configure.ac
index 76170ad..9772f79 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1092,6 +1092,14 @@ AC_COMPILE_IFELSE([BSD_SYSCTL_SRC],
 	HAVE_BSD_SYSCTL=])
 GIT_CONF_SUBST([HAVE_BSD_SYSCTL])
 
+#
+# Check for watchman client library
+
+AC_CHECK_LIB([watchman], [watchman_connect],
+	[USE_WATCHMAN=YesPlease],
+	[USE_WATCHMAN=])
+GIT_CONF_SUBST([USE_WATCHMAN])
+
 ## Other checks.
 # Define USE_PIC if you need the main git objects to be built with -fPIC
 # in order to build and link perl/Git.so.  x86-64 seems to need this.
diff --git a/environment.c b/environment.c
index 2da7fe2..84df431 100644
--- a/environment.c
+++ b/environment.c
@@ -87,6 +87,9 @@ int auto_comment_line_char;
 /* Parallel index stat data preload? */
 int core_preload_index = 1;
 
+int core_watchman_sync_timeout = 300;
+
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
diff --git a/watchman-support.c b/watchman-support.c
new file mode 100644
index 0000000..7f6c0a9
--- /dev/null
+++ b/watchman-support.c
@@ -0,0 +1,108 @@
+#include "cache.h"
+#include "watchman-support.h"
+#include "strbuf.h"
+#include <watchman.h>
+
+static struct watchman_query *make_query(const char *last_update)
+{
+	struct watchman_query *query = watchman_query();
+	watchman_query_set_fields(query, WATCHMAN_FIELD_NAME |
+					 WATCHMAN_FIELD_EXISTS |
+					 WATCHMAN_FIELD_NEWER);
+	watchman_query_set_empty_on_fresh(query, 1);
+	query->sync_timeout = core_watchman_sync_timeout;
+	if (*last_update)
+		watchman_query_set_since_oclock(query, last_update);
+	return query;
+}
+
+static struct watchman_query_result* query_watchman(
+	struct index_state *istate, struct watchman_connection *connection,
+	const char *fs_path, const char *last_update)
+{
+	struct watchman_error wm_error;
+	struct watchman_query *query;
+	struct watchman_expression *expr;
+	struct watchman_query_result *result;
+
+	query = make_query(last_update);
+	expr = watchman_true_expression();
+	result = watchman_do_query(connection, fs_path, query, expr, &wm_error);
+	watchman_free_query(query);
+	watchman_free_expression(expr);
+
+	if (!result)
+		warning("Watchman query error: %s (at %s)",
+			wm_error.message,
+			*last_update ? last_update : "the beginning");
+
+	return result;
+}
+
+static void update_index(struct index_state *istate,
+			 struct watchman_query_result *result)
+{
+	int i;
+
+	if (result->is_fresh_instance) {
+		/* let refresh clear them later */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_NO_WATCH;
+		goto done;
+	}
+
+	for (i = 0; i < result->nr; i++) {
+		struct watchman_stat *wm = result->stats + i;
+		int pos;
+
+		if (!strncmp(wm->name, ".git/", 5) ||
+		    strstr(wm->name, "/.git/"))
+			continue;
+
+		pos = index_name_pos(istate, wm->name, strlen(wm->name));
+		if (pos < 0)
+			continue;
+		/* FIXME: ignore staged entries and gitlinks too? */
+
+		istate->cache[pos]->ce_flags |= CE_NO_WATCH;
+	}
+
+done:
+	free(istate->last_update);
+	istate->last_update    = xstrdup(result->clock);
+	istate->cache_changed |= WATCHMAN_CHANGED;
+}
+
+int check_watchman(struct index_state *istate)
+{
+	struct watchman_error wm_error;
+	struct watchman_connection *connection;
+	struct watchman_query_result *result;
+	const char *fs_path;
+
+	fs_path = get_git_work_tree();
+	if (!fs_path)
+		return -1;
+
+	connection = watchman_connect(&wm_error);
+
+	if (!connection) {
+		warning("Watchman watch error: %s", wm_error.message);
+		return -1;
+	}
+
+	if (watchman_watch(connection, fs_path, &wm_error)) {
+		warning("Watchman watch error: %s", wm_error.message);
+		watchman_connection_close(connection);
+		return -1;
+	}
+
+
+	result = query_watchman(istate, connection, fs_path, istate->last_update);
+	watchman_connection_close(connection);
+	if (!result)
+		return -1;
+	update_index(istate, result);
+	watchman_free_query_result(result);
+	return 0;
+}
diff --git a/watchman-support.h b/watchman-support.h
new file mode 100644
index 0000000..5610409
--- /dev/null
+++ b/watchman-support.h
@@ -0,0 +1,8 @@
+#ifndef WATCHMAN_SUPPORT_H
+#define WATCHMAN_SUPPORT_H
+
+struct index_state;
+int check_watchman(struct index_state *index);
+
+
+#endif /* WATCHMAN_SUPPORT_H */
-- 
2.2.0.513.g477eb31

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] read-cache: allow index-helper to prepare shm before git reads it
  2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
  2015-11-01 13:55 ` [PATCH 1/5] read-cache: add watchman 'WAMA' extension Nguyễn Thái Ngọc Duy
  2015-11-01 13:55 ` [PATCH 2/5] Add watchman support to reduce index refresh cost Nguyễn Thái Ngọc Duy
@ 2015-11-01 13:55 ` Nguyễn Thái Ngọc Duy
  2015-11-01 13:55 ` [PATCH 4/5] index-helper: use watchman to avoid refreshing index with lstat() Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2015-11-01 13:55 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Nguyễn Thái Ngọc Duy

If index-helper puts 'W' before pid in $GIT_DIR/index-helper.pid, then
git will sleep for a while, expecting to be waken up by SIGUSR1 when
index-helper has done shm preparation, or after the timeout.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 893223e..ae33951 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1591,14 +1591,24 @@ static void do_poke(struct strbuf *sb, int refresh_cache)
 	PostMessage(hwnd, refresh_cache ? WM_USER : WM_USER + 1, 0, 0);
 }
 #else
+static void do_nothing(int sig)
+{
+}
+
 static void do_poke(struct strbuf *sb, int refresh_cache)
 {
-	char	*start = sb->buf;
+	int	 wait  = sb->buf[0] == 'W';
+	char	*start = wait ? sb->buf + 1 : sb->buf;
 	char	*end   = NULL;
 	pid_t	 pid   = strtoul(start, &end, 10);
+	int	 ret;
 	if (!end || end != sb->buf + sb->len)
 		return;
-	kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
+	ret = kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
+	if (!refresh_cache && !ret && wait) {
+		signal(SIGHUP, do_nothing);
+		sleep(1);
+	}
 }
 #endif
 
-- 
2.2.0.513.g477eb31

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] index-helper: use watchman to avoid refreshing index with lstat()
  2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2015-11-01 13:55 ` [PATCH 3/5] read-cache: allow index-helper to prepare shm before git reads it Nguyễn Thái Ngọc Duy
@ 2015-11-01 13:55 ` Nguyễn Thái Ngọc Duy
  2015-11-01 13:55 ` [PATCH 5/5] update-index: enable/disable watchman support Nguyễn Thái Ngọc Duy
  2015-11-02 14:54 ` [PATCH 0/5] Use watchman to reduce index refresh time Paolo Ciarrocchi
  5 siblings, 0 replies; 17+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2015-11-01 13:55 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Nguyễn Thái Ngọc Duy

Watchman is hidden behind index-helper. Before git tries to read the
index from shm, it notifies index-helper with SIGHUP and sleep,
waiting for index-helper to prepare shm. index-helper then contacts
watchman, updates 'WAMA' extension and put it in a separate shm and
wakes git up with SIGHUP.

Git uses this extension to not lstat unchanged entries. Git only trust
'WAMA' extension when it's received from the separate shm, not from
disk. Unmarked entries are "clean". Marked entries are dirty from
watchman point of view. If it finds out some entries are
'watchman-dirty', but are really unchanged (e.g. the file was changed,
then reverted back), then Git will clear the marking in 'WAMA' before
writing it down.

Hiding watchman behind index-helper means you need both daemons. You
can't run watchman alone. Not so good. But on the other hand, 'git'
binary is not linked to watchman/json libraries, which is good for
packaging. Core git package will run fine without watchman-related
packages. If they need watchman, they can install git-index-helper and
dependencies.

Another reason for tying watchman to index-helper is, when used with
untracked cache, we need to keep track of $GIT_WORK_TREE file
listing. That kind of list can be kept in index-helper.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile           |  5 ++++
 cache.h            |  2 ++
 index-helper.c     | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 read-cache.c       | 43 ++++++++++++++++++++++++++--
 watchman-support.h |  1 -
 5 files changed, 127 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index 761acb6..3f5eac8 100644
--- a/Makefile
+++ b/Makefile
@@ -434,6 +434,7 @@ MSGFMT = msgfmt
 CURL_CONFIG = curl-config
 PTHREAD_LIBS = -lpthread
 PTHREAD_CFLAGS =
+WATCHMAN_LIBS =
 GCOV = gcov
 
 export TCL_PATH TCLTK_PATH
@@ -1392,6 +1393,7 @@ endif
 ifdef USE_WATCHMAN
 	LIB_H += watchman-support.h
 	LIB_OBJS += watchman-support.o
+	WATCHMAN_LIBS = -lwatchman
 	BASIC_CFLAGS += -DUSE_WATCHMAN
 endif
 
@@ -2005,6 +2007,9 @@ git-remote-testsvn$X: remote-testsvn.o GIT-LDFLAGS $(GITLIBS) $(VCSSVN_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) \
 	$(VCSSVN_LIB)
 
+git-index-helper$X: index-helper.o GIT-LDFLAGS $(GITLIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) $(WATCHMAN_LIBS)
+
 $(REMOTE_CURL_ALIASES): $(REMOTE_CURL_PRIMARY)
 	$(QUIET_LNCP)$(RM) $@ && \
 	ln $< $@ 2>/dev/null || \
diff --git a/cache.h b/cache.h
index 572299c..c04141b 100644
--- a/cache.h
+++ b/cache.h
@@ -518,6 +518,7 @@ extern int daemonize(int *);
 
 /* Initialize and use the cache information */
 struct lock_file;
+extern int verify_index(const struct index_state *);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
@@ -525,6 +526,7 @@ extern int do_read_index(struct index_state *istate, const char *path,
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
+extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
 #define REFRESH_DAEMON		(1 << 2)
diff --git a/index-helper.c b/index-helper.c
index cf26da7..421887e 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -5,15 +5,18 @@
 #include "split-index.h"
 #include "shm.h"
 #include "lockfile.h"
+#include "watchman-support.h"
 
 struct shm {
 	unsigned char sha1[20];
 	void *shm;
 	size_t size;
+	pid_t pid;
 };
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static struct shm shm_watchman;
 static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
@@ -25,10 +28,21 @@ static void release_index_shm(struct shm *is)
 	is->shm = NULL;
 }
 
+static void release_watchman_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	git_shm_unlink("git-watchman-%s-%" PRIuMAX,
+		       sha1_to_hex(is->sha1), (uintmax_t)is->pid);
+	is->shm = NULL;
+}
+
 static void cleanup_shm(void)
 {
 	release_index_shm(&shm_index);
 	release_index_shm(&shm_base_index);
+	release_watchman_shm(&shm_watchman);
 }
 
 static void cleanup(void)
@@ -120,13 +134,15 @@ static void share_the_index(void)
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
-	if (to_verify && !verify_shm())
+	if (to_verify && !verify_shm()) {
 		cleanup_shm();
-	discard_index(&the_index);
+		discard_index(&the_index);
+	}
 }
 
 static void refresh(int sig)
 {
+	discard_index(&the_index);
 	the_index.keep_mmap = 1;
 	the_index.to_shm    = 1;
 	if (read_cache() < 0)
@@ -136,7 +152,55 @@ static void refresh(int sig)
 
 #ifdef HAVE_SHM
 
-static void do_nothing(int sig)
+#ifdef USE_WATCHMAN
+static void share_watchman(struct index_state *istate,
+			   struct shm *is, pid_t pid)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *shm;
+
+	write_watchman_ext(&sb, istate);
+	if (git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, sb.len + 20,
+			&shm, PROT_READ | PROT_WRITE, MAP_SHARED,
+			"git-watchman-%s-%" PRIuMAX,
+			sha1_to_hex(istate->sha1), (uintmax_t)pid) == sb.len + 20) {
+		is->size = sb.len + 20;
+		is->shm = shm;
+		is->pid = pid;
+		hashcpy(is->sha1, istate->sha1);
+
+		memcpy(shm, sb.buf, sb.len);
+		hashcpy((unsigned char *)shm + is->size - 20, is->sha1);
+	}
+	strbuf_release(&sb);
+}
+
+static void prepare_with_watchman(pid_t pid)
+{
+	/*
+	 * with the help of watchman, maybe we could detect if
+	 * $GIT_DIR/index is updated..
+	 */
+	if (!verify_index(&the_index))
+		refresh(0);
+
+	if (check_watchman(&the_index))
+		return;
+
+	share_watchman(&the_index, &shm_watchman, pid);
+}
+
+static void prepare_index(int sig, siginfo_t *si, void *context)
+{
+	free_watchman_shm(&shm_watchman);
+	if (the_index.last_update)
+		prepare_with_watchman(si->si_pid);
+	kill(si->si_pid, SIGHUP); /* stop the waiting in poke_daemon() */
+}
+
+#else
+
+static void prepare_index(int sig, siginfo_t *si, void *context)
 {
 	/*
 	 * what we need is the signal received and interrupts
@@ -145,11 +209,21 @@ static void do_nothing(int sig)
 	 */
 }
 
+#endif
+
 static void loop(const char *pid_file, int idle_in_seconds)
 {
+	struct sigaction sa;
+
 	sigchain_pop(SIGHUP);	/* pushed by sigchain_push_common */
 	sigchain_push(SIGHUP, refresh);
-	sigchain_push(SIGUSR1, do_nothing);
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = prepare_index;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_SIGINFO;
+	sigaction(SIGUSR1, &sa, NULL);
+
 	refresh(0);
 	while (sleep(idle_in_seconds))
 		; /* do nothing, all is handled by signal handlers already */
@@ -245,6 +319,8 @@ int main(int argc, char **argv)
 				       LOCK_DIE_ON_ERROR);
 #ifdef GIT_WINDOWS_NATIVE
 	strbuf_addstr(&sb, "HWND");
+#elif defined(USE_WATCHMAN)
+	strbuf_addch(&sb, 'W');	/* see poke_daemon() */
 #endif
 	strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
 	write_in_full(fd, sb.buf, sb.len);
diff --git a/read-cache.c b/read-cache.c
index ae33951..1b84538 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1405,7 +1405,7 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
 	return 0;
 }
 
-static void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
+void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
 {
 	struct ewah_bitmap *bitmap;
 	int i;
@@ -1678,6 +1678,39 @@ static int try_shm(struct index_state *istate)
 	return 0;
 }
 
+static void refresh_by_watchman(struct index_state *istate)
+{
+	void *shm = NULL;
+	int length;
+	int i;
+
+	length = git_shm_map(O_RDONLY, 0700, -1, &shm,
+			     PROT_READ, MAP_SHARED,
+			     "git-watchman-%s-%" PRIuMAX,
+			     sha1_to_hex(istate->sha1),
+			     (uintmax_t)getpid());
+
+	if (length <= 20 ||
+	    hashcmp(istate->sha1, (unsigned char *)shm + length - 20) ||
+	    /*
+	     * No need to clear CE_NO_WATCH set by 'WAMA' on
+	     * disk. Watchman can only set more, not clear any, so
+	     * this is OR mask.
+	     */
+	    read_watchman_ext(istate, shm, length - 20))
+		goto done;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_stage(ce) || (ce->ce_flags & CE_NO_WATCH))
+			continue;
+		ce_mark_uptodate(ce);
+	}
+done:
+	if (shm)
+		munmap(shm, length);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1797,7 +1830,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
 		check_ce_order(istate);
-		return ret;
+		goto done;
 	}
 
 	if (split_index->base)
@@ -1818,6 +1851,10 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	check_ce_order(istate);
+
+done:
+	if (ret > 0 && istate->from_shm && istate->last_update)
+		refresh_by_watchman(istate);
 	return ret;
 }
 
@@ -2119,7 +2156,7 @@ out:
 	return 0;
 }
 
-static int verify_index(const struct index_state *istate)
+int verify_index(const struct index_state *istate)
 {
 	return verify_index_from(istate, get_index_file());
 }
diff --git a/watchman-support.h b/watchman-support.h
index 5610409..ee1ef2c 100644
--- a/watchman-support.h
+++ b/watchman-support.h
@@ -4,5 +4,4 @@
 struct index_state;
 int check_watchman(struct index_state *index);
 
-
 #endif /* WATCHMAN_SUPPORT_H */
-- 
2.2.0.513.g477eb31

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] update-index: enable/disable watchman support
  2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2015-11-01 13:55 ` [PATCH 4/5] index-helper: use watchman to avoid refreshing index with lstat() Nguyễn Thái Ngọc Duy
@ 2015-11-01 13:55 ` Nguyễn Thái Ngọc Duy
  2015-11-02 14:54 ` [PATCH 0/5] Use watchman to reduce index refresh time Paolo Ciarrocchi
  5 siblings, 0 replies; 17+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2015-11-01 13:55 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/update-index.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 7431938..86aec21 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -903,6 +903,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, line_termination = '\n';
 	int untracked_cache = -1;
+	int use_watchman = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -998,6 +999,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			N_("enable/disable untracked cache")),
 		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
 			    N_("enable untracked cache without testing the filesystem"), 2),
+		OPT_BOOL(0, "watchman", &use_watchman,
+			N_("use or not use watchman to reduce refresh cost")),
 		OPT_END()
 	};
 
@@ -1127,6 +1130,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		the_index.cache_changed |= UNTRACKED_CHANGED;
 	}
 
+	if (use_watchman > 0) {
+		the_index.last_update    = xstrdup("");
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	} else if (!use_watchman) {
+		the_index.last_update    = NULL;
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
2.2.0.513.g477eb31

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2015-11-01 13:55 ` [PATCH 5/5] update-index: enable/disable watchman support Nguyễn Thái Ngọc Duy
@ 2015-11-02 14:54 ` Paolo Ciarrocchi
  2015-11-02 19:23   ` Duy Nguyen
  5 siblings, 1 reply; 17+ messages in thread
From: Paolo Ciarrocchi @ 2015-11-02 14:54 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git, Christian Couder

On Sun, Nov 1, 2015 at 2:55 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:

Hi Duy,

> This series builds on top of the index-helper series I just sent and
> uses watchman to keep track of file changes in order to avoid lstat()
> at refresh time. The series can also be found at [1]
>
> When I started this work, watchman did not support Windows yet. It
> does now, even if still experimental [2]. So Windows people, please
> try it out if you have time.
>
> To put all pieces so far together, we have split-index to reduce index
> write time, untracked cache to reduce I/O as well as computation for
> .gitignore, index-helper for index read time and this series for
> lstat() at refresh time. The remaining piece is killing lstat() from
> untracked cache, but right now it's just some idea and incomplete
> code.

Did you manage to measure the speedup introduced by this series?

Ciao,
-- 
Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-02 14:54 ` [PATCH 0/5] Use watchman to reduce index refresh time Paolo Ciarrocchi
@ 2015-11-02 19:23   ` Duy Nguyen
  2015-11-03  9:21     ` Duy Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Duy Nguyen @ 2015-11-02 19:23 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: Git, Christian Couder

On Mon, Nov 2, 2015 at 3:54 PM, Paolo Ciarrocchi
<paolo.ciarrocchi@gmail.com> wrote:
> On Sun, Nov 1, 2015 at 2:55 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> Hi Duy,
>
>> This series builds on top of the index-helper series I just sent and
>> uses watchman to keep track of file changes in order to avoid lstat()
>> at refresh time. The series can also be found at [1]
>>
>> When I started this work, watchman did not support Windows yet. It
>> does now, even if still experimental [2]. So Windows people, please
>> try it out if you have time.
>>
>> To put all pieces so far together, we have split-index to reduce index
>> write time, untracked cache to reduce I/O as well as computation for
>> .gitignore, index-helper for index read time and this series for
>> lstat() at refresh time. The remaining piece is killing lstat() from
>> untracked cache, but right now it's just some idea and incomplete
>> code.
>
> Did you manage to measure the speedup introduced by this series?

It was from last year. I may have measured it but because I didn't
save it in the commit message, it was lost anyway. Installing watchman
and measuring with webkit.git soon..
-- 
Duy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] Add watchman support to reduce index refresh cost
  2015-11-01 13:55 ` [PATCH 2/5] Add watchman support to reduce index refresh cost Nguyễn Thái Ngọc Duy
@ 2015-11-02 21:19   ` David Turner
  0 siblings, 0 replies; 17+ messages in thread
From: David Turner @ 2015-11-02 21:19 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Christian Couder

On Sun, 2015-11-01 at 14:55 +0100, Nguyễn Thái Ngọc Duy wrote:
> The previous patch has the logic to clear bits in 'WAMA' bitmap. This
> patch has logic to set bits as told by watchman. The missing bit,
> _using_ these bits, are not here yet.
> 
> A lot of this code is written by David Turner originally, mostly from
> [1]. I'm just copying and polishing it a bit.
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/248006

Our code has evolved somewhat from there[1].  It looks like you've
incorporated some of these updates.  But I wanted to call out one thing
in particular that it doesn't look like this patch handles: there's some
real ugliness on at least OSX around case-changing renames. 

You probably won't be able to use our code directly as-is, but we'll
want to do something about this situation.  This is thicket of TOCTOU
issues.  For instance, watchman learns that a file called FOO has
changed, but by the time it goes to the filesystem to look up the
canonical case, FOO has been renamed to foo already (or renamed and then
deleted).  

I won't say for sure that your code is insufficient as I haven't yet
tried it out, but we should ensure this case is handled before this is
merged.

[1] https://github.com/dturner-tw/git/tree/dturner/watchman

> +
> +		pos = index_name_pos(istate, wm->name, strlen(wm->name));

This is the bit where case matters.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] read-cache: add watchman 'WAMA' extension
  2015-11-01 13:55 ` [PATCH 1/5] read-cache: add watchman 'WAMA' extension Nguyễn Thái Ngọc Duy
@ 2015-11-02 22:03   ` David Turner
  2015-11-03 19:17     ` Duy Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: David Turner @ 2015-11-02 22:03 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Christian Couder

On Sun, 2015-11-01 at 14:55 +0100, Nguyễn Thái Ngọc Duy wrote:
>
>+#define CE_NO_WATCH  (0x0001)

This name seems very confusing to me.  CE_NO_WATCHMAN_STAT?
CE_UNKNOWN_TO_WATCHMAN?  

(one reason it may seem more confusing to me than to others is that
Twitter's code has a concept of files that we don't watch at all e.g.
Intellij's .idea dir).  

> @@ -322,6 +325,7 @@ struct index_state {
>  	struct untracked_cache *untracked;
>  	void *mmap;
>  	size_t mmap_size;
> +	char *last_update;

Might be worth a comment explaining what this is.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-02 19:23   ` Duy Nguyen
@ 2015-11-03  9:21     ` Duy Nguyen
  2015-11-03 10:26       ` Paolo Ciarrocchi
  2015-11-09 20:06       ` Christian Couder
  0 siblings, 2 replies; 17+ messages in thread
From: Duy Nguyen @ 2015-11-03  9:21 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: Git, Christian Couder

On Mon, Nov 2, 2015 at 8:23 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Nov 2, 2015 at 3:54 PM, Paolo Ciarrocchi
> <paolo.ciarrocchi@gmail.com> wrote:
>> On Sun, Nov 1, 2015 at 2:55 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>>
>> Hi Duy,
>>
>>> This series builds on top of the index-helper series I just sent and
>>> uses watchman to keep track of file changes in order to avoid lstat()
>>> at refresh time. The series can also be found at [1]
>>>
>>> When I started this work, watchman did not support Windows yet. It
>>> does now, even if still experimental [2]. So Windows people, please
>>> try it out if you have time.
>>>
>>> To put all pieces so far together, we have split-index to reduce index
>>> write time, untracked cache to reduce I/O as well as computation for
>>> .gitignore, index-helper for index read time and this series for
>>> lstat() at refresh time. The remaining piece is killing lstat() from
>>> untracked cache, but right now it's just some idea and incomplete
>>> code.
>>
>> Did you manage to measure the speedup introduced by this series?
>
> It was from last year. I may have measured it but because I didn't
> save it in the commit message, it was lost anyway. Installing watchman
> and measuring with webkit.git soon..

Test repo: webkit.git with 104665 tracked files, 5615 untracked,
3517 dirs. Best numbers out of a few tries. This is best case
scenario. Normal usage could have worse numbers.

There is something strange about the "-uno" measurements. I don't
think watchman+untracked cache can beat -uno..  Maybe I did something
wrong.

0m0.383s   index v2
0m0.351s   index v4
0m0.352s   v2 split-index
0m0.309s   v2 split index-helper
0m0.159s   v2 split helper untracked-cache
0m0.123s   v2 split helper "status -uno"
0m0.098s   v2 split helper untracked watchman
0m0.071s   v2 split helper watchman "status -uno"

Note, the watchman series needs
s/free_watchman_shm/release_watchman_shm/ (I didn't do a good job
of testing after rebase). And there's a small bug in index-helper
--detach code writing incorrect PID..
-- 
Duy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-03  9:21     ` Duy Nguyen
@ 2015-11-03 10:26       ` Paolo Ciarrocchi
  2015-11-09 20:06       ` Christian Couder
  1 sibling, 0 replies; 17+ messages in thread
From: Paolo Ciarrocchi @ 2015-11-03 10:26 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git, Christian Couder

On Tue, Nov 3, 2015 at 10:21 AM, Duy Nguyen <pclouds@gmail.com> wrote:
>> It was from last year. I may have measured it but because I didn't
>> save it in the commit message, it was lost anyway. Installing watchman
>> and measuring with webkit.git soon..
>
> Test repo: webkit.git with 104665 tracked files, 5615 untracked,
> 3517 dirs. Best numbers out of a few tries. This is best case
> scenario. Normal usage could have worse numbers.
>
> There is something strange about the "-uno" measurements. I don't
> think watchman+untracked cache can beat -uno..  Maybe I did something
> wrong.
>
> 0m0.383s   index v2
> 0m0.351s   index v4
> 0m0.352s   v2 split-index
> 0m0.309s   v2 split index-helper
> 0m0.159s   v2 split helper untracked-cache
> 0m0.123s   v2 split helper "status -uno"
> 0m0.098s   v2 split helper untracked watchman
> 0m0.071s   v2 split helper watchman "status -uno"
>
> Note, the watchman series needs
> s/free_watchman_shm/release_watchman_shm/ (I didn't do a good job
> of testing after rebase). And there's a small bug in index-helper
> --detach code writing incorrect PID..


Impressive improvements!

Ciao,
Paolo

-- 
Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] read-cache: add watchman 'WAMA' extension
  2015-11-02 22:03   ` David Turner
@ 2015-11-03 19:17     ` Duy Nguyen
  2015-11-03 19:49       ` David Turner
  0 siblings, 1 reply; 17+ messages in thread
From: Duy Nguyen @ 2015-11-03 19:17 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List, Christian Couder

On Mon, Nov 2, 2015 at 11:03 PM, David Turner <dturner@twopensource.com> wrote:
> On Sun, 2015-11-01 at 14:55 +0100, Nguyễn Thái Ngọc Duy wrote:
>>
>>+#define CE_NO_WATCH  (0x0001)
>
> This name seems very confusing to me.  CE_NO_WATCHMAN_STAT?
> CE_UNKNOWN_TO_WATCHMAN?

Files that are known updated. Maybe CE_WATCHMAN_DIRTY?

> (one reason it may seem more confusing to me than to others is that
> Twitter's code has a concept of files that we don't watch at all e.g.
> Intellij's .idea dir).
>
>> @@ -322,6 +325,7 @@ struct index_state {
>>       struct untracked_cache *untracked;
>>       void *mmap;
>>       size_t mmap_size;
>> +     char *last_update;
>
> Might be worth a comment explaining what this is.

It's the clock value from watchman when we query file status. Will
make a note. Or maybe I should just rename it to watchman_clock.
-- 
Duy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] read-cache: add watchman 'WAMA' extension
  2015-11-03 19:17     ` Duy Nguyen
@ 2015-11-03 19:49       ` David Turner
  0 siblings, 0 replies; 17+ messages in thread
From: David Turner @ 2015-11-03 19:49 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Christian Couder

On Tue, 2015-11-03 at 20:17 +0100, Duy Nguyen wrote:
> On Mon, Nov 2, 2015 at 11:03 PM, David Turner <dturner@twopensource.com> wrote:
> > On Sun, 2015-11-01 at 14:55 +0100, Nguyễn Thái Ngọc Duy wrote:
> >>
> >>+#define CE_NO_WATCH  (0x0001)
> >
> > This name seems very confusing to me.  CE_NO_WATCHMAN_STAT?
> > CE_UNKNOWN_TO_WATCHMAN?
> 
> Files that are known updated. Maybe CE_WATCHMAN_DIRTY?

+1

> > (one reason it may seem more confusing to me than to others is that
> > Twitter's code has a concept of files that we don't watch at all e.g.
> > Intellij's .idea dir).
> >
> >> @@ -322,6 +325,7 @@ struct index_state {
> >>       struct untracked_cache *untracked;
> >>       void *mmap;
> >>       size_t mmap_size;
> >> +     char *last_update;
> >
> > Might be worth a comment explaining what this is.
> 
> It's the clock value from watchman when we query file status. Will
> make a note. Or maybe I should just rename it to watchman_clock.

Either way would work for me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-03  9:21     ` Duy Nguyen
  2015-11-03 10:26       ` Paolo Ciarrocchi
@ 2015-11-09 20:06       ` Christian Couder
  2015-11-10 21:04         ` David Turner
  1 sibling, 1 reply; 17+ messages in thread
From: Christian Couder @ 2015-11-09 20:06 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Paolo Ciarrocchi, Git, David Turner

On Tue, Nov 3, 2015 at 10:21 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Nov 2, 2015 at 8:23 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Mon, Nov 2, 2015 at 3:54 PM, Paolo Ciarrocchi
>> <paolo.ciarrocchi@gmail.com> wrote:
>>> On Sun, Nov 1, 2015 at 2:55 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>>>
>>> Hi Duy,
>>>
>>>> This series builds on top of the index-helper series I just sent and
>>>> uses watchman to keep track of file changes in order to avoid lstat()
>>>> at refresh time. The series can also be found at [1]
>>>>
>>>> When I started this work, watchman did not support Windows yet. It
>>>> does now, even if still experimental [2]. So Windows people, please
>>>> try it out if you have time.
>>>>
>>>> To put all pieces so far together, we have split-index to reduce index
>>>> write time, untracked cache to reduce I/O as well as computation for
>>>> .gitignore, index-helper for index read time and this series for
>>>> lstat() at refresh time. The remaining piece is killing lstat() from
>>>> untracked cache, but right now it's just some idea and incomplete
>>>> code.
>>>
>>> Did you manage to measure the speedup introduced by this series?
>>
>> It was from last year. I may have measured it but because I didn't
>> save it in the commit message, it was lost anyway. Installing watchman
>> and measuring with webkit.git soon..
>
> Test repo: webkit.git with 104665 tracked files, 5615 untracked,
> 3517 dirs. Best numbers out of a few tries. This is best case
> scenario. Normal usage could have worse numbers.

Thank you for the tests!

I tried to replicate your results on webkit.git with 184672 tracked
files, 5631 untracked, 9330 dirs. Also best numbers out of a few
tries.

> There is something strange about the "-uno" measurements. I don't
> think watchman+untracked cache can beat -uno..  Maybe I did something
> wrong.
>
> 0m0.383s   index v2
> 0m0.351s   index v4
> 0m0.352s   v2 split-index
> 0m0.309s   v2 split index-helper
> 0m0.159s   v2 split helper untracked-cache
> 0m0.123s   v2 split helper "status -uno"
> 0m0.098s   v2 split helper untracked watchman
> 0m0.071s   v2 split helper watchman "status -uno"

I got the following results from "time git status ...":

0m0.774s
0m0.799s split
0m0.766s split helper
0m0.335s split helper untracked
0m0.232s split helper untracked -uno
0m0.284s split helper untracked watchman
0m0.188s split helper untracked watchman -uno

Using David's series I get worse results than all of the above but I
guess it's because his series is based on an ancient git version
(v2.0.0-rc0).

> Note, the watchman series needs
> s/free_watchman_shm/release_watchman_shm/ (I didn't do a good job
> of testing after rebase). And there's a small bug in index-helper
> --detach code writing incorrect PID..

I did the s/free_watchman_shm/release_watchman_shm/ change, but I
didn't notice the index-helper bug.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-09 20:06       ` Christian Couder
@ 2015-11-10 21:04         ` David Turner
  2015-11-20  9:45           ` Christian Couder
  0 siblings, 1 reply; 17+ messages in thread
From: David Turner @ 2015-11-10 21:04 UTC (permalink / raw)
  To: Christian Couder; +Cc: Duy Nguyen, Paolo Ciarrocchi, Git

On Mon, 2015-11-09 at 21:06 +0100, Christian Couder wrote:
> Using David's series I get worse results than all of the above but I
> guess it's because his series is based on an ancient git version
> (v2.0.0-rc0).

My more-recent series is on top of 2.4, but (for webkit):
mine: 0m0.206s 
duy's: 0m0.107s

However, I'm getting occasional index-helper segfaults due to
istate->last_update being NULL.  (I'm using Duy's  index-helper branch
from his github + this patchset + a function signature fix due to a
newer version of libwatchman).  I haven't looked into why this is --
maybe accidentally mixing git versions while testing?  

Also, after messing around for a while (on Duy's branch), I ended up in
a state where git status would take ~2.5s every time.  The index helper
was alive but evidently not working right.  Killing and restarting it
worked.  Actually, I think I can repro this more easily: "git rm
Changelog" seems to put the index-helper into this state.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Use watchman to reduce index refresh time
  2015-11-10 21:04         ` David Turner
@ 2015-11-20  9:45           ` Christian Couder
  0 siblings, 0 replies; 17+ messages in thread
From: Christian Couder @ 2015-11-20  9:45 UTC (permalink / raw)
  To: David Turner; +Cc: Duy Nguyen, Paolo Ciarrocchi, Git

On Tue, Nov 10, 2015 at 10:04 PM, David Turner <dturner@twopensource.com> wrote:
> On Mon, 2015-11-09 at 21:06 +0100, Christian Couder wrote:
>> Using David's series I get worse results than all of the above but I
>> guess it's because his series is based on an ancient git version
>> (v2.0.0-rc0).
>
> My more-recent series is on top of 2.4, but (for webkit):
> mine: 0m0.206s
> duy's: 0m0.107s

I tried using your more recent series and I get basically no change
compared to the current git (without using untracked cache) on Webkit
with around 5600 untracked files. Maybe I am doing something wrong.

I compiled watchman and installed it. It is in /usr/local/bin/watchman
but I am not sure it is actually being used though I have configured
"core.usewatchman" to "true".

> However, I'm getting occasional index-helper segfaults due to
> istate->last_update being NULL.  (I'm using Duy's  index-helper branch
> from his github + this patchset + a function signature fix due to a
> newer version of libwatchman).  I haven't looked into why this is --
> maybe accidentally mixing git versions while testing?
>
> Also, after messing around for a while (on Duy's branch), I ended up in
> a state where git status would take ~2.5s every time.  The index helper
> was alive but evidently not working right.  Killing and restarting it
> worked.  Actually, I think I can repro this more easily: "git rm
> Changelog" seems to put the index-helper into this state.

Thanks for this. I have also seen strange things happening with the
index-helper.
When it is working I found that it provides around 10% improvement on
git rebase time.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-11-20  9:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-01 13:55 [PATCH 0/5] Use watchman to reduce index refresh time Nguyễn Thái Ngọc Duy
2015-11-01 13:55 ` [PATCH 1/5] read-cache: add watchman 'WAMA' extension Nguyễn Thái Ngọc Duy
2015-11-02 22:03   ` David Turner
2015-11-03 19:17     ` Duy Nguyen
2015-11-03 19:49       ` David Turner
2015-11-01 13:55 ` [PATCH 2/5] Add watchman support to reduce index refresh cost Nguyễn Thái Ngọc Duy
2015-11-02 21:19   ` David Turner
2015-11-01 13:55 ` [PATCH 3/5] read-cache: allow index-helper to prepare shm before git reads it Nguyễn Thái Ngọc Duy
2015-11-01 13:55 ` [PATCH 4/5] index-helper: use watchman to avoid refreshing index with lstat() Nguyễn Thái Ngọc Duy
2015-11-01 13:55 ` [PATCH 5/5] update-index: enable/disable watchman support Nguyễn Thái Ngọc Duy
2015-11-02 14:54 ` [PATCH 0/5] Use watchman to reduce index refresh time Paolo Ciarrocchi
2015-11-02 19:23   ` Duy Nguyen
2015-11-03  9:21     ` Duy Nguyen
2015-11-03 10:26       ` Paolo Ciarrocchi
2015-11-09 20:06       ` Christian Couder
2015-11-10 21:04         ` David Turner
2015-11-20  9:45           ` Christian Couder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.