All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/19] index-helper, watchman
@ 2016-03-09 18:36 David Turner
  2016-03-09 18:36 ` [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics David Turner
                   ` (19 more replies)
  0 siblings, 20 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

This is a rebase and extension of Duy's work on git index-helper and
watchman support.

Patches 1-14 and are more-or-less just Duy's patches, rebased. I fixed
up a race condition with signal handling, but otherwise didn't change
much.

Patch 15 preserves the untracked cache and watchman extensions across
checkouts.

Patches 16-18 are new index-helper patches that I wrote to improve the
general usability of index-helper.

And patch 19 is for discussion only -- if the overall concept is
correct, I'll rewrite it into the places where it should go.  I didn't
do it yet in part because I am not sure folks will agree with it and
in part because I didn't want to mess too much with Duy's code before
re-sending it.

I haven't used this code much.  I just wrote patches 16-18 today, in
fact. It's got fewer tests than I would like, in part because it is
somewhat difficult to test since it involves two separate persistent
daemons (index-helper, watchman), and the interaction between them.

At Twitter, we're still using something based on my earlier watchman
patches[1].  We would like to switch to this (especially if this is
the version that mainstream git is going towards).  But we have other
local patches[2], and I haven't fully finished migrating them to 2.8.
My informal testing shows that with the untracked cache and watchman
index extensions, and git index-helper[3], performance is slightly
better than my earlier code.  That's not a surprise, since the
index-helper eliminates index verification time.

[1] https://github.com/dturner-tw/git/tree/dturner/watchman

[2] e.g. something like this:
https://github.com/dturner-tw/git/tree/dturner/journal

[3] In both cases, I'm using a version which has the SSE ref-parsing
and hashmap code; these were rejected due to complexity but they provide
a benefit for us, so we still use them.

David Turner (5):
  unpack-trees: preserve index extensions
  index-helper: rewrite pidfile after daemonizing
  index-helper: process management
  index-helper: autorun
  hack: watchman/untracked cache mashup

Nguyễn Thái Ngọc Duy (14):
  trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics
  read-cache.c: fix constness of verify_hdr()
  read-cache: allow to keep mmap'd memory after reading
  index-helper: new daemon for caching index and related stuff
  trace.c: add GIT_TRACE_INDEX_STATS for index statistics
  index-helper: add --strict
  daemonize(): set a flag before exiting the main process
  index-helper: add --detach
  index-helper: add Windows support
  read-cache: add watchman 'WAMA' extension
  Add watchman support to reduce index refresh cost
  read-cache: allow index-helper to prepare shm before git reads it
  index-helper: use watchman to avoid refreshing index with lstat()
  update-index: enable/disable watchman support

 .gitignore                         |   1 +
 Documentation/git-index-helper.txt |  56 +++++
 Documentation/git.txt              |   4 +
 Makefile                           |  21 ++
 builtin/gc.c                       |   2 +-
 builtin/update-index.c             |  11 +
 cache.h                            |  20 +-
 config.c                           |   5 +
 config.mak.uname                   |   3 +
 configure.ac                       |   8 +
 daemon.c                           |   2 +-
 dir.c                              |  23 +-
 dir.h                              |   6 +
 environment.c                      |   3 +
 git-compat-util.h                  |   1 +
 git.c                              |  37 +++-
 index-helper.c                     | 437 ++++++++++++++++++++++++++++++++++++
 read-cache.c                       | 441 +++++++++++++++++++++++++++++++++++--
 setup.c                            |   4 +-
 sha1_file.c                        |  24 ++
 shm.c                              | 163 ++++++++++++++
 shm.h                              |  23 ++
 t/t7063-status-untracked-cache.sh  |  23 ++
 t/t7900-index-helper.sh            |  23 ++
 t/test-lib-functions.sh            |   4 +
 trace.c                            |  16 ++
 trace.h                            |   1 +
 unpack-trees.c                     |   1 +
 watchman-support.c                 | 134 +++++++++++
 watchman-support.h                 |   7 +
 30 files changed, 1481 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100644 shm.c
 create mode 100644 shm.h
 create mode 100755 t/t7900-index-helper.sh
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 22:58   ` Junio C Hamano
  2016-03-09 18:36 ` [PATCH 02/19] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

trace_stats() is intended for GIT_TRACE_*_STATS variable group and
GIT_TRACE_PACK_STATS is more like an example how new vars can be added.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git.txt |  3 +++
 cache.h               |  2 ++
 git.c                 |  1 +
 sha1_file.c           | 24 ++++++++++++++++++++++++
 trace.c               | 13 +++++++++++++
 trace.h               |  1 +
 6 files changed, 44 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 2754af8..794271e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -1055,6 +1055,9 @@ of clones and fetches.
 	time of each Git command.
 	See 'GIT_TRACE' for available trace output options.
 
+'GIT_TRACE_PACK_STATS'::
+	Print various statistics.
+
 'GIT_TRACE_SETUP'::
 	Enables trace messages printing the .git, working tree and current
 	working directory after Git has completed its setup phase.
diff --git a/cache.h b/cache.h
index b829410..7e01403 100644
--- a/cache.h
+++ b/cache.h
@@ -1828,4 +1828,6 @@ void sleep_millisec(int millisec);
  */
 void safe_create_dir(const char *dir, int share);
 
+void report_pack_stats(struct trace_key *key);
+
 #endif /* CACHE_H */
diff --git a/git.c b/git.c
index 6cc0c07..a4f6f71 100644
--- a/git.c
+++ b/git.c
@@ -655,6 +655,7 @@ int main(int argc, char **av)
 	git_setup_gettext();
 
 	trace_command_performance(argv);
+	trace_stats();
 
 	/*
 	 * "git-xxxx" is the same as "git xxxx", but we obviously:
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..14cebdf 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -518,6 +518,7 @@ static unsigned int peak_pack_open_windows;
 static unsigned int pack_open_windows;
 static unsigned int pack_open_fds;
 static unsigned int pack_max_fds;
+static unsigned int pack_access_nr;
 static size_t peak_pack_mapped;
 static size_t pack_mapped;
 struct packed_git *packed_git;
@@ -543,6 +544,28 @@ void pack_report(void)
 		sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
 }
 
+void report_pack_stats(struct trace_key *key)
+{
+	trace_printf_key(key, "\n"
+			 "pack_report: getpagesize()            = %10" SZ_FMT "\n"
+			 "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
+			 "pack_report: core.packedGitLimit      = %10" SZ_FMT "\n"
+			 "pack_report: pack_used_ctr            = %10u\n"
+			 "pack_report: pack_mmap_calls          = %10u\n"
+			 "pack_report: pack_open_windows        = %10u / %10u\n"
+			 "pack_report: pack_mapped              = "
+			 "%10" SZ_FMT " / %10" SZ_FMT "\n"
+			 "pack_report: pack accesss             = %10u\n",
+			 sz_fmt(getpagesize()),
+			 sz_fmt(packed_git_window_size),
+			 sz_fmt(packed_git_limit),
+			 pack_used_ctr,
+			 pack_mmap_calls,
+			 pack_open_windows, peak_pack_open_windows,
+			 sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped),
+			 pack_access_nr);
+}
+
 /*
  * Open and mmap the index file at path, perform a couple of
  * consistency checks, then record its information to p.  Return 0 on
@@ -2238,6 +2261,7 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
 	static struct trace_key pack_access = TRACE_KEY_INIT(PACK_ACCESS);
 	trace_printf_key(&pack_access, "%s %"PRIuMAX"\n",
 			 p->pack_name, (uintmax_t)obj_offset);
+	pack_access_nr++;
 }
 
 int do_check_packed_object_crc;
diff --git a/trace.c b/trace.c
index 4aeea60..b1d0885 100644
--- a/trace.c
+++ b/trace.c
@@ -432,3 +432,16 @@ void trace_command_performance(const char **argv)
 	sq_quote_argv(&command_line, argv, 0);
 	command_start_time = getnanotime();
 }
+
+static struct trace_key trace_pack_stats = TRACE_KEY_INIT(PACK_STATS);
+
+static void print_stats_atexit(void)
+{
+	report_pack_stats(&trace_pack_stats);
+}
+
+void trace_stats(void)
+{
+	if (trace_want(&trace_pack_stats))
+		atexit(print_stats_atexit);
+}
diff --git a/trace.h b/trace.h
index 179b249..52bda4e 100644
--- a/trace.h
+++ b/trace.h
@@ -19,6 +19,7 @@ extern void trace_disable(struct trace_key *key);
 extern uint64_t getnanotime(void);
 extern void trace_command_performance(const char **argv);
 extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
+extern void trace_stats(void);
 
 #ifndef HAVE_VARIADIC_MACROS
 
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 02/19] read-cache.c: fix constness of verify_hdr()
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
  2016-03-09 18:36 ` [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading David Turner
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index d9fb78b..16cc487 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1345,7 +1345,7 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
 	unsigned char sha1[20];
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
  2016-03-09 18:36 ` [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics David Turner
  2016-03-09 18:36 ` [PATCH 02/19] read-cache.c: fix constness of verify_hdr() David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 23:02   ` Junio C Hamano
  2016-03-09 18:36 ` [PATCH 04/19] index-helper: new daemon for caching index and related stuff David Turner
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  3 +++
 read-cache.c | 13 ++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 7e01403..c43ef3d 100644
--- a/cache.h
+++ b/cache.h
@@ -333,11 +333,14 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 keep_mmap : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	void *mmap;
+	size_t mmap_size;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 16cc487..7e387e9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1574,6 +1574,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
+	if (istate->keep_mmap) {
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	}
 	close(fd);
 
 	hdr = mmap;
@@ -1626,10 +1630,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	if (!istate->keep_mmap)
+		munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
+	istate->mmap = NULL;
 	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
@@ -1655,6 +1661,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		discard_index(split_index->base);
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	split_index->base->keep_mmap = istate->keep_mmap;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1698,6 +1705,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->keep_mmap && istate->mmap) {
+		munmap(istate->mmap, istate->mmap_size);
+		istate->mmap = NULL;
+	}
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (2 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 23:09   ` Junio C Hamano
  2016-03-15 11:52   ` Duy Nguyen
  2016-03-09 18:36 ` [PATCH 05/19] trace.c: add GIT_TRACE_INDEX_STATS for index statistics David Turner
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Instead of reading the index from disk and worrying about disk
corruption, the index is cached in memory (memory bit-flips happen
too, but hopefully less often). The result is faster read. Read time
is reduced by 70%.

The biggest gain is not having to verify the trailing SHA-1, which
takes lots of time especially on large index files. But this also
opens doors for further optimiztions:

 - we could create an in-memory format that's essentially the memory
   dump of the index to eliminate most of parsing/allocation
   overhead. The mmap'd memory can be used straight away. Experiment
   [1] shows we could reduce read time by 88%.

 - we could cache non-index info such as name hash

The shared memory's name folows the template "git-<object>-<SHA1>"
where <SHA1> is the trailing SHA-1 of the index file. <object> is
"index" for cached index files (and may be "name-hash" for name-hash
cache). If such shared memory exists, it contains the same index
content as on disk. The content is already validated by the daemon and
git won't validate it again (except comparing the trailing SHA-1s).

Git can poke the daemon to tell it to refresh the index cache, or to
keep it alive some more minutes via UNIX signals. It can't give any
real index data directly to the daemon. Real data goes to disk first,
then the daemon reads and verifies it from there. Poking only happens
for $GIT_DIR/index, not temporary index files.

$GIT_DIR/index-helper.pid contains a reference to daemon process (and
it's pid on *nix). The file's mtime is updated every time it's accessed
(or should be updated often enough). Old index-helper.pid is considered
stale and ignored.

index-helper requires POSIX realtime extension. POSIX shm interface
however is abstracted away so that Windows support can be added later.

On webkit.git with index format v2, duplicating 8 times to 1.4m
entries and 200MB in size:

(vanilla)      0.986986364 s: read_index_from .git/index
(index-helper) 0.267850279 s: read_index_from .git/index

Interestingly with index v4, we get less out of index-helper. It makes
sense as v4 requires more processing after loading the index:

(vanilla)      0.722496666 s: read_index_from .git/index
(index-helper) 0.302741500 s: read_index_from .git/index

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore                         |   1 +
 Documentation/git-index-helper.txt |  44 ++++++++++
 Makefile                           |   9 +++
 cache.h                            |   3 +
 config.mak.uname                   |   1 +
 git-compat-util.h                  |   1 +
 index-helper.c                     | 162 +++++++++++++++++++++++++++++++++++++
 read-cache.c                       | 106 +++++++++++++++++++++---
 shm.c                              |  67 +++++++++++++++
 shm.h                              |  23 ++++++
 10 files changed, 408 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100644 shm.c
 create mode 100644 shm.h

diff --git a/.gitignore b/.gitignore
index 5087ce1..b92f122 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,6 +71,7 @@
 /git-http-fetch
 /git-http-push
 /git-imap-send
+/git-index-helper
 /git-index-pack
 /git-init
 /git-init-db
diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
new file mode 100644
index 0000000..9db28cf
--- /dev/null
+++ b/Documentation/git-index-helper.txt
@@ -0,0 +1,44 @@
+git-index-helper(1)
+===================
+
+NAME
+----
+git-index-helper - A simple cache daemon for speeding up index file access
+
+SYNOPSIS
+--------
+[verse]
+'git index-helper' [options]
+
+DESCRIPTION
+-----------
+Keep the index file in memory for faster access. This daemon is per
+repository.
+
+OPTIONS
+-------
+
+--exit-after=<n>::
+	Exit if the cached index is not accessed for `<n>`
+	minutes. Specify 0 to wait forever. Default is 10.
+
+NOTES
+-----
+On UNIX-like systems, $GIT_DIR/index-helper.pid contains the process
+id of the daemon. At least on Linux, shared memory objects are
+availble via /dev/shm with the name pattern "git-<something>-<SHA1>".
+Normally the daemon will clean up shared memory objects when it exits.
+But if it crashes, some objects could remain there and they can be
+safely deleted with "rm" command. The following signals are used to
+control the daemon:
+
+SIGHUP::
+	Reread the index.
+
+SIGUSR1::
+	Let the daemon know the index is to be read. It keeps the
+	daemon alive longer, unless `--exit-after=0` is used.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 24bef8d..a6c668b 100644
--- a/Makefile
+++ b/Makefile
@@ -367,6 +367,8 @@ all::
 # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function.
 #
 # Define HAVE_GETDELIM if your system has the getdelim() function.
+#
+# Define HAVE_SHM if you platform support shm_* functions in librt.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -802,6 +804,7 @@ LIB_OBJS += sha1-lookup.o
 LIB_OBJS += sha1_file.o
 LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
+LIB_OBJS += shm.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
 LIB_OBJS += split-index.o
@@ -1430,6 +1433,12 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifdef HAVE_SHM
+	BASIC_CFLAGS	+= -DHAVE_SHM
+	EXTLIBS	        += -lrt
+	PROGRAM_OBJS	+= index-helper.o
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
diff --git a/cache.h b/cache.h
index c43ef3d..bc2f529 100644
--- a/cache.h
+++ b/cache.h
@@ -334,6 +334,8 @@ struct index_state {
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
 		 keep_mmap : 1,
+		 from_shm : 1,
+		 to_shm : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
@@ -560,6 +562,7 @@ extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
+#define REFRESH_DAEMON		(1 << 2)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
diff --git a/config.mak.uname b/config.mak.uname
index 4c68e07..b5108e1 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -37,6 +37,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_GETTIME = YesPlease
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	HAVE_GETDELIM = YesPlease
+	HAVE_SHM = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	HAVE_ALLOCA_H = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index c07e0c1..56945a7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -513,6 +513,7 @@ static inline int ends_with(const char *str, const char *suffix)
 #define PROT_READ 1
 #define PROT_WRITE 2
 #define MAP_PRIVATE 1
+#define MAP_SHARED 2
 #endif
 
 #define mmap git_mmap
diff --git a/index-helper.c b/index-helper.c
new file mode 100644
index 0000000..cf2971d
--- /dev/null
+++ b/index-helper.c
@@ -0,0 +1,162 @@
+#include "cache.h"
+#include "parse-options.h"
+#include "sigchain.h"
+#include "exec_cmd.h"
+#include "split-index.h"
+#include "shm.h"
+#include "lockfile.h"
+
+struct shm {
+	unsigned char sha1[20];
+	void *shm;
+	size_t size;
+};
+
+static struct shm shm_index;
+static struct shm shm_base_index;
+
+static void release_index_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	git_shm_unlink("git-index-%s", sha1_to_hex(is->sha1));
+	is->shm = NULL;
+}
+
+static void cleanup_shm(void)
+{
+	release_index_shm(&shm_index);
+	release_index_shm(&shm_base_index);
+}
+
+static void cleanup(void)
+{
+	unlink(git_path("index-helper.pid"));
+	cleanup_shm();
+}
+
+static void cleanup_on_signal(int sig)
+{
+	cleanup();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static void share_index(struct index_state *istate, struct shm *is)
+{
+	void *new_mmap;
+	if (istate->mmap_size <= 20 ||
+	    hashcmp(istate->sha1,
+		    (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
+	    !hashcmp(istate->sha1, is->sha1) ||
+	    git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate->mmap_size,
+			&new_mmap, PROT_READ | PROT_WRITE, MAP_SHARED,
+			"git-index-%s", sha1_to_hex(istate->sha1)) < 0)
+		return;
+
+	release_index_shm(is);
+	is->size = istate->mmap_size;
+	is->shm = new_mmap;
+	hashcpy(is->sha1, istate->sha1);
+	memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
+
+	/*
+	 * The trailing hash must be written last after everything is
+	 * written. It's the indication that the shared memory is now
+	 * ready.
+	 */
+	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
+}
+
+static void share_the_index(void)
+{
+	if (the_index.split_index && the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_base_index);
+	share_index(&the_index, &shm_index);
+	discard_index(&the_index);
+}
+
+static void refresh(int sig)
+{
+	the_index.keep_mmap = 1;
+	the_index.to_shm    = 1;
+	if (read_cache() < 0)
+		die(_("could not read index"));
+	share_the_index();
+}
+
+#ifdef HAVE_SHM
+
+static void do_nothing(int sig)
+{
+	/*
+	 * what we need is the signal received and interrupts
+	 * sleep(). We don't need to do anything else when receving
+	 * the signal
+	 */
+}
+
+static void loop(const char *pid_file, int idle_in_seconds)
+{
+	sigchain_pop(SIGHUP);	/* pushed by sigchain_push_common */
+	sigchain_push(SIGHUP, refresh);
+	sigchain_push(SIGUSR1, do_nothing);
+	refresh(0);
+	while (sleep(idle_in_seconds))
+		; /* do nothing, all is handled by signal handlers already */
+}
+
+#else
+
+static void loop(const char *pid_file, int idle_in_seconds)
+{
+	die(_("index-helper is not supported on this platform"));
+}
+
+#endif
+
+static const char * const usage_text[] = {
+	N_("git index-helper [options]"),
+	NULL
+};
+
+int main(int argc, char **argv)
+{
+	static struct lock_file lock;
+	struct strbuf sb = STRBUF_INIT;
+	const char *prefix;
+	int fd, idle_in_minutes = 10;
+	struct option options[] = {
+		OPT_INTEGER(0, "exit-after", &idle_in_minutes,
+			    N_("exit if not used after some minutes")),
+		OPT_END()
+	};
+
+	git_extract_argv0_path(argv[0]);
+	git_setup_gettext();
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(usage_text, options);
+
+	prefix = setup_git_directory();
+	if (parse_options(argc, (const char **)argv, prefix,
+			  options, usage_text, 0))
+		die(_("too many arguments"));
+
+	fd = hold_lock_file_for_update(&lock,
+				       git_path("index-helper.pid"),
+				       LOCK_DIE_ON_ERROR);
+	strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
+	write_in_full(fd, sb.buf, sb.len);
+	commit_lock_file(&lock);
+
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
+	if (!idle_in_minutes)
+		idle_in_minutes = 0xffffffff / 60;
+	loop(sb.buf, idle_in_minutes * 60);
+	strbuf_release(&sb);
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 7e387e9..eb4b9b4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "shm.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1541,6 +1542,81 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static void do_poke(struct strbuf *sb, int refresh_cache)
+{
+	char	*start = sb->buf;
+	char	*end   = NULL;
+	pid_t	 pid   = strtoul(start, &end, 10);
+	if (!end || end != sb->buf + sb->len)
+		return;
+	kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
+}
+
+static void poke_daemon(struct index_state *istate,
+			const struct stat *st, int refresh_cache)
+{
+	int fd;
+	struct strbuf sb;
+
+	/* if this is from index-helper, do not poke itself (recursively) */
+	if (istate->to_shm)
+		return;
+
+	fd = open(git_path("index-helper.pid"), O_RDONLY);
+	if (fd < 0)
+		return;
+	strbuf_init(&sb, st->st_size + 1);
+	strbuf_setlen(&sb, st->st_size);
+	if (read_in_full(fd, sb.buf, st->st_size) == st->st_size)
+		do_poke(&sb, refresh_cache);
+	close(fd);
+	strbuf_release(&sb);
+}
+
+static int is_main_index(struct index_state *istate)
+{
+	return istate == &the_index ||
+		(the_index.split_index &&
+		 istate == the_index.split_index->base);
+}
+
+/*
+ * Try to open and verify a cached shm index if available. Return 0 if
+ * succeeds (istate->mmap and istate->mmap_size are updated). Return
+ * negative otherwise.
+ */
+static int try_shm(struct index_state *istate)
+{
+	void *new_mmap = NULL;
+	size_t old_size = istate->mmap_size;
+	ssize_t new_size;
+	const unsigned char *sha1;
+	struct stat st;
+
+	if (!is_main_index(istate) ||
+	    old_size <= 20 ||
+	    stat(git_path("index-helper.pid"), &st))
+		return -1;
+	poke_daemon(istate, &st, 0);
+	sha1 = (unsigned char *)istate->mmap + old_size - 20;
+	new_size = git_shm_map(O_RDONLY, 0700, -1, &new_mmap,
+				 PROT_READ, MAP_SHARED,
+				 "git-index-%s", sha1_to_hex(sha1));
+	if (new_size <= 20 ||
+	    hashcmp((unsigned char *)istate->mmap + old_size - 20,
+		    (unsigned char *)new_mmap + new_size - 20)) {
+		if (new_mmap)
+			munmap(new_mmap, new_size);
+		poke_daemon(istate, &st, 1);
+		return -1;
+	}
+	munmap(istate->mmap, istate->mmap_size);
+	istate->mmap = new_mmap;
+	istate->mmap_size = new_size;
+	istate->from_shm = 1;
+	return 0;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1555,6 +1631,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	istate->from_shm = 0;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
 	fd = open(path, O_RDONLY);
@@ -1574,15 +1651,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
-	if (istate->keep_mmap) {
-		istate->mmap = mmap;
-		istate->mmap_size = mmap_size;
-	}
 	close(fd);
 
-	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	istate->mmap = mmap;
+	istate->mmap_size = mmap_size;
+	if (try_shm(istate) &&
+	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
 		goto unmap;
+	hdr = mmap = istate->mmap;
+	mmap_size = istate->mmap_size;
+	if (!istate->keep_mmap)
+		istate->mmap = NULL;
 
 	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
 	istate->version = ntohl(hdr->hdr_version);
@@ -1662,6 +1741,8 @@ int read_index_from(struct index_state *istate, const char *path)
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 	split_index->base->keep_mmap = istate->keep_mmap;
+	split_index->base->to_shm    = istate->to_shm;
+	split_index->base->from_shm  = istate->from_shm;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1712,6 +1793,8 @@ int discard_index(struct index_state *istate)
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
+	istate->from_shm = 0;
+	istate->to_shm   = 0;
 	return 0;
 }
 
@@ -2138,9 +2221,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
 	       (COMMIT_LOCK | CLOSE_LOCK));
-	if (flags & COMMIT_LOCK)
-		return commit_locked_index(lock);
-	else if (flags & CLOSE_LOCK)
+	if (flags & COMMIT_LOCK) {
+		struct stat st;
+		ret = commit_locked_index(lock);
+		if (!ret && is_main_index(istate) &&
+		    !stat(git_path("index-helper.pid"), &st))
+			poke_daemon(istate, &st, 1);
+		return ret;
+	} else if (flags & CLOSE_LOCK)
 		return close_lock_file(lock);
 	else
 		return ret;
diff --git a/shm.c b/shm.c
new file mode 100644
index 0000000..4ec1a00
--- /dev/null
+++ b/shm.c
@@ -0,0 +1,67 @@
+#include "git-compat-util.h"
+#include "shm.h"
+
+#ifdef HAVE_SHM
+
+#define SHM_PATH_LEN 72		/* we don't create very long paths.. */
+
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...)
+{
+	va_list ap;
+	char path[SHM_PATH_LEN];
+	int fd;
+
+	path[0] = '/';
+	va_start(ap, fmt);
+	vsprintf(path + 1, fmt, ap);
+	va_end(ap);
+	fd = shm_open(path, oflag, perm);
+	if (fd < 0)
+		return -1;
+	if (length > 0 && ftruncate(fd, length)) {
+		shm_unlink(path);
+		close(fd);
+		return -1;
+	}
+	if (length < 0 && !(oflag & O_CREAT)) {
+		struct stat st;
+		if (fstat(fd, &st))
+			die_errno("unable to stat %s", path);
+		length = st.st_size;
+	}
+	*mmap = xmmap(NULL, length, prot, flags, fd, 0);
+	close(fd);
+	if (*mmap == MAP_FAILED) {
+		*mmap = NULL;
+		shm_unlink(path);
+		return -1;
+	}
+	return length;
+}
+
+void git_shm_unlink(const char *fmt, ...)
+{
+	va_list ap;
+	char path[SHM_PATH_LEN];
+
+	path[0] = '/';
+	va_start(ap, fmt);
+	vsprintf(path + 1, fmt, ap);
+	va_end(ap);
+	shm_unlink(path);
+}
+
+#else
+
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...)
+{
+	return -1;
+}
+
+void git_shm_unlink(const char *fmt, ...)
+{
+}
+
+#endif
diff --git a/shm.h b/shm.h
new file mode 100644
index 0000000..798d3fd
--- /dev/null
+++ b/shm.h
@@ -0,0 +1,23 @@
+#ifndef SHM_H
+#define SHM_H
+
+/*
+ * Create or open a shared memory and mmap it. Return mmap size if
+ * successful, -1 otherwise. If successful mmap contains the mmap'd
+ * pointer. If oflag does not contain O_CREAT and length is negative,
+ * the mmap size is retrieved from existing shared memory object.
+ *
+ * The mmap could be freed by munmap, even on Windows. Note that on
+ * Windows, git_shm_unlink() is no-op, so the last unmap will destroy
+ * the shared memory.
+ */
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...);
+
+/*
+ * Unlink a shared memory object. Only needed on POSIX platforms. On
+ * Windows this is no-op.
+ */
+void git_shm_unlink(const char *fmt, ...);
+
+#endif
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 05/19] trace.c: add GIT_TRACE_INDEX_STATS for index statistics
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (3 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 04/19] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 06/19] index-helper: add --strict David Turner
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git.txt |  1 +
 cache.h               |  1 +
 read-cache.c          | 16 ++++++++++++++++
 trace.c               |  5 ++++-
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 794271e..71a88a8 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -1056,6 +1056,7 @@ of clones and fetches.
 	See 'GIT_TRACE' for available trace output options.
 
 'GIT_TRACE_PACK_STATS'::
+'GIT_TRACE_INDEX_STATS'::
 	Print various statistics.
 
 'GIT_TRACE_SETUP'::
diff --git a/cache.h b/cache.h
index bc2f529..e22296c 100644
--- a/cache.h
+++ b/cache.h
@@ -1835,5 +1835,6 @@ void sleep_millisec(int millisec);
 void safe_create_dir(const char *dir, int share);
 
 void report_pack_stats(struct trace_key *key);
+void report_index_stats(struct trace_key *key);
 
 #endif /* CACHE_H */
diff --git a/read-cache.c b/read-cache.c
index eb4b9b4..7bd3ce4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -50,6 +50,10 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 struct index_state the_index;
 static const char *alternate_index_output;
 
+static unsigned int nr_read_index;
+static unsigned int nr_read_shm_index;
+static unsigned int nr_write_index;
+
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	istate->cache[nr] = ce;
@@ -1614,6 +1618,7 @@ static int try_shm(struct index_state *istate)
 	istate->mmap = new_mmap;
 	istate->mmap_size = new_size;
 	istate->from_shm = 1;
+	nr_read_shm_index++;
 	return 0;
 }
 
@@ -1711,6 +1716,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	}
 	if (!istate->keep_mmap)
 		munmap(mmap, mmap_size);
+	nr_read_index++;
 	return istate->cache_nr;
 
 unmap:
@@ -2197,6 +2203,7 @@ static int do_write_index(struct index_state *istate, int newfd,
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+	nr_write_index++;
 	return 0;
 }
 
@@ -2423,3 +2430,12 @@ void stat_validity_update(struct stat_validity *sv, int fd)
 		fill_stat_data(sv->sd, &st);
 	}
 }
+
+void report_index_stats(struct trace_key *key)
+{
+	trace_printf_key(key, "\n"
+			 "index stats: file reads        = %10u\n"
+			 "index stats: cache reads       = %10u\n"
+			 "index stats: file writes       = %10u\n",
+			 nr_read_index, nr_read_shm_index, nr_write_index);
+}
diff --git a/trace.c b/trace.c
index b1d0885..eea1fa8 100644
--- a/trace.c
+++ b/trace.c
@@ -434,14 +434,17 @@ void trace_command_performance(const char **argv)
 }
 
 static struct trace_key trace_pack_stats = TRACE_KEY_INIT(PACK_STATS);
+static struct trace_key trace_index_stats = TRACE_KEY_INIT(INDEX_STATS);
 
 static void print_stats_atexit(void)
 {
 	report_pack_stats(&trace_pack_stats);
+	report_index_stats(&trace_index_stats);
 }
 
 void trace_stats(void)
 {
-	if (trace_want(&trace_pack_stats))
+	if (trace_want(&trace_pack_stats) ||
+	    trace_want(&trace_index_stats))
 		atexit(print_stats_atexit);
 }
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/19] index-helper: add --strict
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (4 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 05/19] trace.c: add GIT_TRACE_INDEX_STATS for index statistics David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 07/19] daemonize(): set a flag before exiting the main process David Turner
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

There's are "holes" in the index-helper approach because the shared
memory is not verified again by git. If $USER is compromised, shared
memory could be modified. But then they can already modify
$GIT_DIR/index. A more realistic risk is some bugs in index-helper
produce corrupt shared memory. --strict is added to avoid that

Strictly speaking there's still a very small gap where corrupt shared
memory could still be read by git: after we write the trailing SHA-1 in
the shared memory (thus signaling "this shm is ready") and before
verify_shm() detects an error.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-helper.txt |  9 +++++++
 cache.h                            |  1 +
 index-helper.c                     | 48 ++++++++++++++++++++++++++++++++++++++
 read-cache.c                       |  9 ++++---
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 9db28cf..ad40366 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -22,6 +22,15 @@ OPTIONS
 	Exit if the cached index is not accessed for `<n>`
 	minutes. Specify 0 to wait forever. Default is 10.
 
+--strict::
+--no-strict::
+	Strict mode makes index-helper verify the shared memory after
+	it's created. If the result does not match what's read from
+	$GIT_DIR/index, the shared memory is destroyed. This makes
+	index-helper take more than double the amount of time required
+	for reading an index, but because it will happen in the
+	background, it's not noticable. `--strict` is enabled by default.
+
 NOTES
 -----
 On UNIX-like systems, $GIT_DIR/index-helper.pid contains the process
diff --git a/cache.h b/cache.h
index e22296c..1d7e561 100644
--- a/cache.h
+++ b/cache.h
@@ -336,6 +336,7 @@ struct index_state {
 		 keep_mmap : 1,
 		 from_shm : 1,
 		 to_shm : 1,
+		 always_verify_trailing_sha1 : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/index-helper.c b/index-helper.c
index cf2971d..1140bc0 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -14,6 +14,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static int to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -69,11 +70,56 @@ static void share_index(struct index_state *istate, struct shm *is)
 	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
 }
 
+static int verify_shm(void)
+{
+	int i;
+	struct index_state istate;
+	memset(&istate, 0, sizeof(istate));
+	istate.always_verify_trailing_sha1 = 1;
+	istate.to_shm = 1;
+	i = read_index(&istate);
+	if (i != the_index.cache_nr)
+		goto done;
+	for (; i < the_index.cache_nr; i++) {
+		struct cache_entry *base, *ce;
+		/* namelen is checked separately */
+		const unsigned int ondisk_flags =
+			CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
+		unsigned int ce_flags, base_flags, ret;
+		base = the_index.cache[i];
+		ce = istate.cache[i];
+		if (ce->ce_namelen != base->ce_namelen ||
+		    strcmp(ce->name, base->name)) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+		ce_flags = ce->ce_flags;
+		base_flags = base->ce_flags;
+		/* only on-disk flags matter */
+		ce->ce_flags   &= ondisk_flags;
+		base->ce_flags &= ondisk_flags;
+		ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
+			     offsetof(struct cache_entry, name) -
+			     offsetof(struct cache_entry, ce_stat_data));
+		ce->ce_flags = ce_flags;
+		base->ce_flags = base_flags;
+		if (ret) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+	}
+done:
+	discard_index(&istate);
+	return i == the_index.cache_nr;
+}
+
 static void share_the_index(void)
 {
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
+	if (to_verify && !verify_shm())
+		cleanup_shm();
 	discard_index(&the_index);
 }
 
@@ -130,6 +176,8 @@ int main(int argc, char **argv)
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_minutes,
 			    N_("exit if not used after some minutes")),
+		OPT_BOOL(0, "strict", &to_verify,
+			 "verify shared memory after creating"),
 		OPT_END()
 	};
 
diff --git a/read-cache.c b/read-cache.c
index 7bd3ce4..1a0ab0c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1660,9 +1660,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 
 	istate->mmap = mmap;
 	istate->mmap_size = mmap_size;
-	if (try_shm(istate) &&
-	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
-		goto unmap;
+	if (try_shm(istate)) {
+		if (verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
+	} else if (istate->always_verify_trailing_sha1 &&
+		   verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
 	hdr = mmap = istate->mmap;
 	mmap_size = istate->mmap_size;
 	if (!istate->keep_mmap)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 07/19] daemonize(): set a flag before exiting the main process
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (5 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 06/19] index-helper: add --strict David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 08/19] index-helper: add --detach David Turner
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

This allows signal handlers and atexit functions to realize this
situation and not clean up.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/gc.c | 2 +-
 cache.h      | 2 +-
 daemon.c     | 2 +-
 setup.c      | 4 +++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c583aad..37180de 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -385,7 +385,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			 * failure to daemonize is ok, we'll continue
 			 * in foreground
 			 */
-			daemonized = !daemonize();
+			daemonized = !daemonize(NULL);
 		}
 	} else
 		add_repack_all_option();
diff --git a/cache.h b/cache.h
index 1d7e561..cc1f6f5 100644
--- a/cache.h
+++ b/cache.h
@@ -530,7 +530,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int);
 extern int init_db(const char *template_dir, unsigned int flags);
 
 extern void sanitize_stdfds(void);
-extern int daemonize(void);
+extern int daemonize(int *);
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
diff --git a/daemon.c b/daemon.c
index 8d45c33..a5cf954 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1365,7 +1365,7 @@ int main(int argc, char **argv)
 		return execute();
 
 	if (detach) {
-		if (daemonize())
+		if (daemonize(NULL))
 			die("--detach not supported on this platform");
 	} else
 		sanitize_stdfds();
diff --git a/setup.c b/setup.c
index de1a2a7..9adf13f 100644
--- a/setup.c
+++ b/setup.c
@@ -1017,7 +1017,7 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
-int daemonize(void)
+int daemonize(int *daemonized)
 {
 #ifdef NO_POSIX_GOODIES
 	errno = ENOSYS;
@@ -1029,6 +1029,8 @@ int daemonize(void)
 		case -1:
 			die_errno("fork failed");
 		default:
+			if (daemonized)
+				*daemonized = 1;
 			exit(0);
 	}
 	if (setsid() == -1)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/19] index-helper: add --detach
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (6 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 07/19] daemonize(): set a flag before exiting the main process David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 09/19] index-helper: add Windows support David Turner
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-helper.txt |  3 +++
 index-helper.c                     | 10 ++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index ad40366..9ced091 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -31,6 +31,9 @@ OPTIONS
 	for reading an index, but because it will happen in the
 	background, it's not noticable. `--strict` is enabled by default.
 
+--detach::
+	Detach from the shell.
+
 NOTES
 -----
 On UNIX-like systems, $GIT_DIR/index-helper.pid contains the process
diff --git a/index-helper.c b/index-helper.c
index 1140bc0..4dd9656 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -14,7 +14,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
-static int to_verify = 1;
+static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -33,6 +33,8 @@ static void cleanup_shm(void)
 
 static void cleanup(void)
 {
+	if (daemonized)
+		return;
 	unlink(git_path("index-helper.pid"));
 	cleanup_shm();
 }
@@ -172,12 +174,13 @@ int main(int argc, char **argv)
 	static struct lock_file lock;
 	struct strbuf sb = STRBUF_INIT;
 	const char *prefix;
-	int fd, idle_in_minutes = 10;
+	int fd, idle_in_minutes = 10, detach = 0;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_minutes,
 			    N_("exit if not used after some minutes")),
 		OPT_BOOL(0, "strict", &to_verify,
 			 "verify shared memory after creating"),
+		OPT_BOOL(0, "detach", &detach, "detach the process"),
 		OPT_END()
 	};
 
@@ -202,6 +205,9 @@ int main(int argc, char **argv)
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
+	if (detach && daemonize(&daemonized))
+		die_errno("unable to detach");
+
 	if (!idle_in_minutes)
 		idle_in_minutes = 0xffffffff / 60;
 	loop(sb.buf, idle_in_minutes * 60);
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/19] index-helper: add Windows support
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (7 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 08/19] index-helper: add --detach David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-16 11:42   ` Duy Nguyen
  2016-03-09 18:36 ` [PATCH 10/19] read-cache: add watchman 'WAMA' extension David Turner
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Windows supports shared memory, but the semantics is a bit different
than POSIX shm. The most noticeable thing is there's no way to get the
shared memory's size by the reader, and wrapping fstat to do that
would be hell. So the shm size is added near the end, hidden away from
shm users (storing it in headers would cause more problems with munmap,
storing it as a separate shm is even worse).

PostMessage is used instead of UNIX signals for
notification. Lightweight (at least code-wise) on the client side.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 config.mak.uname |  2 ++
 index-helper.c   | 48 ++++++++++++++++++++++++++++
 read-cache.c     | 13 ++++++++
 shm.c            | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 159 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index b5108e1..49320c7 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -394,6 +394,7 @@ ifndef DEBUG
 else
 	BASIC_CFLAGS += -Zi -MDd
 endif
+	PROGRAM_OBJS += index-helper.o
 	X = .exe
 endif
 ifeq ($(uname_S),Interix)
@@ -574,6 +575,7 @@ else
 		NO_CURL = YesPlease
 	endif
 endif
+	PROGRAM_OBJS += index-helper.o
 endif
 ifeq ($(uname_S),QNX)
 	COMPAT_CFLAGS += -DSA_RESTART=0
diff --git a/index-helper.c b/index-helper.c
index 4dd9656..cf26da7 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -155,6 +155,51 @@ static void loop(const char *pid_file, int idle_in_seconds)
 		; /* do nothing, all is handled by signal handlers already */
 }
 
+#elif defined(GIT_WINDOWS_NATIVE)
+
+static void loop(const char *pid_file, int idle_in_seconds)
+{
+	HWND hwnd;
+	UINT_PTR timer = 0;
+	MSG msg;
+	HINSTANCE hinst = GetModuleHandle(NULL);
+	WNDCLASS wc;
+
+	/*
+	 * Emulate UNIX signals by sending WM_USER+x to a
+	 * window. Register window class and create a new window to
+	 * catch these messages.
+	 */
+	memset(&wc, 0, sizeof(wc));
+	wc.lpfnWndProc	 = DefWindowProc;
+	wc.hInstance	 = hinst;
+	wc.lpszClassName = "git-index-helper";
+	if (!RegisterClass(&wc))
+		die_errno(_("could not register new window class"));
+
+	hwnd = CreateWindow("git-index-helper", pid_file,
+			    0, 0, 0, 1, 1, NULL, NULL, hinst, NULL);
+	if (!hwnd)
+		die_errno(_("could not register new window"));
+
+	refresh(0);
+	while (1) {
+		timer = SetTimer(hwnd, timer, idle_in_seconds * 1000, NULL);
+		if (!timer)
+			die(_("no timer!"));
+		if (!GetMessage(&msg, hwnd, 0, 0) || msg.message == WM_TIMER)
+			break;
+		switch (msg.message) {
+		case WM_USER:
+			refresh(0);
+			break;
+		default:
+			/* just reset the timer */
+			break;
+		}
+	}
+}
+
 #else
 
 static void loop(const char *pid_file, int idle_in_seconds)
@@ -198,6 +243,9 @@ int main(int argc, char **argv)
 	fd = hold_lock_file_for_update(&lock,
 				       git_path("index-helper.pid"),
 				       LOCK_DIE_ON_ERROR);
+#ifdef GIT_WINDOWS_NATIVE
+	strbuf_addstr(&sb, "HWND");
+#endif
 	strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
 	write_in_full(fd, sb.buf, sb.len);
 	commit_lock_file(&lock);
diff --git a/read-cache.c b/read-cache.c
index 1a0ab0c..16fbdf6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1546,6 +1546,18 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+#if defined(GIT_WINDOWS_NATIVE)
+static void do_poke(struct strbuf *sb, int refresh_cache)
+{
+	HWND hwnd;
+	if (!starts_with(sb->buf, "HWND"))
+		return;
+	hwnd = FindWindow("git-index-helper", sb->buf);
+	if (!hwnd)
+		return;
+	PostMessage(hwnd, refresh_cache ? WM_USER : WM_USER + 1, 0, 0);
+}
+#else
 static void do_poke(struct strbuf *sb, int refresh_cache)
 {
 	char	*start = sb->buf;
@@ -1555,6 +1567,7 @@ static void do_poke(struct strbuf *sb, int refresh_cache)
 		return;
 	kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
 }
+#endif
 
 static void poke_daemon(struct index_state *istate,
 			const struct stat *st, int refresh_cache)
diff --git a/shm.c b/shm.c
index 4ec1a00..04d8a35 100644
--- a/shm.c
+++ b/shm.c
@@ -52,6 +52,102 @@ void git_shm_unlink(const char *fmt, ...)
 	shm_unlink(path);
 }
 
+#elif defined(GIT_WINDOWS_NATIVE)
+
+#define SHM_PATH_LEN 82	/* a little bit longer than POSIX because of "Local\\" */
+
+static ssize_t create_shm_map(int oflag, int perm, ssize_t length,
+			      void **mmap, int prot, int flags,
+			      const char *path, unsigned long page_size)
+{
+	size_t real_length;
+	void *last_page;
+	HANDLE h;
+
+	assert(perm   == 0700);
+	assert(oflag  == (O_CREAT | O_EXCL | O_RDWR));
+	assert(prot   == (PROT_READ | PROT_WRITE));
+	assert(flags  == MAP_SHARED);
+	assert(length >= 0);
+
+	real_length = length;
+	if (real_length % page_size)
+		real_length += page_size - (real_length % page_size);
+	real_length += page_size;
+	h = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0,
+			      real_length, path);
+	if (!h)
+		return -1;
+	*mmap = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, real_length);
+	CloseHandle(h);
+	if (!*mmap)
+		return -1;
+	last_page = (unsigned char *)*mmap + real_length - page_size;
+	*(unsigned long *)last_page = length;
+	return length;
+}
+
+static ssize_t open_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+			    int prot, int flags, const char *path,
+			    unsigned long page_size)
+{
+	void *last_page;
+	HANDLE h;
+
+	assert(perm   == 0700);
+	assert(oflag  == O_RDONLY);
+	assert(prot   == PROT_READ);
+	assert(flags  == MAP_SHARED);
+	assert(length <= 0);
+
+	h = OpenFileMapping(FILE_MAP_READ, FALSE, path);
+	if (!h)
+		return -1;
+	*mmap = MapViewOfFile(h, FILE_MAP_READ, 0, 0, 0);
+	CloseHandle(h);
+	if (!*mmap)
+		return -1;
+	if (length < 0) {
+		MEMORY_BASIC_INFORMATION mbi;
+		if (!VirtualQuery(*mmap, &mbi, sizeof(mbi))) {
+			UnmapViewOfFile(*mmap);
+			return -1;
+		}
+		if (mbi.RegionSize % page_size)
+			die("expected size %lu to be %lu aligned",
+				    mbi.RegionSize, page_size);
+		last_page = (unsigned char *)*mmap + mbi.RegionSize - page_size;
+		length = *(unsigned long *)last_page;
+	}
+	return length;
+}
+
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...)
+{
+	SYSTEM_INFO si;
+	va_list ap;
+	char path[SHM_PATH_LEN];
+
+	GetSystemInfo(&si);
+
+	strcpy(path, "Local\\");
+	va_start(ap, fmt);
+	vsprintf(path + strlen(path), fmt, ap);
+	va_end(ap);
+
+	if (oflag & O_CREAT)
+		return create_shm_map(oflag, perm, length, mmap, prot,
+				      flags, path, si.dwPageSize);
+	else
+		return open_shm_map(oflag, perm, length, mmap, prot,
+				    flags, path, si.dwPageSize);
+}
+
+void git_shm_unlink(const char *fmt, ...)
+{
+}
+
 #else
 
 ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/19] read-cache: add watchman 'WAMA' extension
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (8 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 09/19] index-helper: add Windows support David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 11/19] Add watchman support to reduce index refresh cost David Turner
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The extension contains a bitmap, one bit for each entry in the
index. If the n-th bit is zero, the n-th entry is considered
unchanged, we can ce_mark_uptodate() it without refreshing. If the bit
is non-zero and we found out the corresponding file is clean after
refresh, we can clear the bit.

The 'skipping refresh' bit is not in this patch yet as we would need
watchman. More details in later patches.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  4 ++++
 read-cache.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index cc1f6f5..8f7b4b1 100644
--- a/cache.h
+++ b/cache.h
@@ -182,6 +182,8 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
+#define CE_WATCHMAN_DIRTY  (0x0001)
+
 /*
  * Range 0xFFFF0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -320,6 +322,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define WATCHMAN_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -344,6 +347,7 @@ struct index_state {
 	struct untracked_cache *untracked;
 	void *mmap;
 	size_t mmap_size;
+	char *last_update;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 16fbdf6..85ef15b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -19,6 +19,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "shm.h"
+#include "ewah/ewok.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -41,11 +42,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_WATCHMAN 0x57414D41	  /* "WAMA" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | \
+		 WATCHMAN_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -1224,8 +1227,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			if (ce->ce_flags & CE_WATCHMAN_DIRTY) {
+				ce->ce_flags          &= ~CE_WATCHMAN_DIRTY;
+				istate->cache_changed |= WATCHMAN_CHANGED;
+			}
 			continue;
+		}
 		if (!new) {
 			const char *fmt;
 
@@ -1369,6 +1377,48 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static void mark_no_watchman(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	assert(pos < istate->cache_nr);
+	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+}
+
+static int read_watchman_ext(struct index_state *istate, const void *data,
+			     unsigned long sz)
+{
+	struct ewah_bitmap *bitmap;
+	int ret, len;
+
+	if (memchr(data, 0, sz) == NULL)
+		return error("invalid extension");
+	len = strlen(data) + 1;
+	bitmap = ewah_new();
+	ret = ewah_read_mmap(bitmap, (const char *)data + len, sz - len);
+	if (ret != sz - len) {
+		ewah_free(bitmap);
+		return error("failed to parse ewah bitmap reading watchman index extension");
+	}
+	istate->last_update = xstrdup(data);
+	ewah_each_bit(bitmap, mark_no_watchman, istate);
+	ewah_free(bitmap);
+	return 0;
+}
+
+static void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
+{
+	struct ewah_bitmap *bitmap;
+	int i;
+
+	strbuf_add(sb, istate->last_update, strlen(istate->last_update) + 1);
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHMAN_DIRTY)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1386,6 +1436,11 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+
+	case CACHE_EXT_WATCHMAN:
+		read_watchman_ext(istate, data, sz);
+		break;
+
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1817,6 +1872,8 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 	istate->from_shm = 0;
 	istate->to_shm   = 0;
+	free(istate->last_update);
+	istate->last_update = NULL;
 	return 0;
 }
 
@@ -2214,6 +2271,16 @@ static int do_write_index(struct index_state *istate, int newfd,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_watchman_ext(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCHMAN, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/19] Add watchman support to reduce index refresh cost
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (9 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 10/19] read-cache: add watchman 'WAMA' extension David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 12/19] read-cache: allow index-helper to prepare shm before git reads it David Turner
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The previous patch has the logic to clear bits in 'WAMA' bitmap. This
patch has logic to set bits as told by watchman. The missing bit,
_using_ these bits, are not here yet.

A lot of this code is written by David Turner originally, mostly from
[1]. I'm just copying and polishing it a bit.

[1] http://article.gmane.org/gmane.comp.version-control.git/248006

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile           |   7 ++++
 cache.h            |   1 +
 config.c           |   5 +++
 configure.ac       |   8 ++++
 environment.c      |   3 ++
 watchman-support.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 watchman-support.h |   7 ++++
 7 files changed, 146 insertions(+)
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

diff --git a/Makefile b/Makefile
index a6c668b..e51331c 100644
--- a/Makefile
+++ b/Makefile
@@ -1416,6 +1416,12 @@ else
 	LIB_OBJS += thread-utils.o
 endif
 
+ifdef USE_WATCHMAN
+	LIB_H += watchman-support.h
+	LIB_OBJS += watchman-support.o
+	BASIC_CFLAGS += -DUSE_WATCHMAN
+endif
+
 ifdef HAVE_PATHS_H
 	BASIC_CFLAGS += -DHAVE_PATHS_H
 endif
@@ -2164,6 +2170,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
+	@echo USE_WATCHMAN=\''$(subst ','\'',$(subst ','\'',$(USE_WATCHMAN)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index 8f7b4b1..bf20652 100644
--- a/cache.h
+++ b/cache.h
@@ -688,6 +688,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_watchman_sync_timeout;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 9ba40bc..e6dc141 100644
--- a/config.c
+++ b/config.c
@@ -882,6 +882,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.watchmansynctimeout")) {
+		core_watchman_sync_timeout = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/configure.ac b/configure.ac
index 89e2590..6f10a15 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1092,6 +1092,14 @@ AC_COMPILE_IFELSE([BSD_SYSCTL_SRC],
 	HAVE_BSD_SYSCTL=])
 GIT_CONF_SUBST([HAVE_BSD_SYSCTL])
 
+#
+# Check for watchman client library
+
+AC_CHECK_LIB([watchman], [watchman_connect],
+	[USE_WATCHMAN=YesPlease],
+	[USE_WATCHMAN=])
+GIT_CONF_SUBST([USE_WATCHMAN])
+
 ## Other checks.
 # Define USE_PIC if you need the main git objects to be built with -fPIC
 # in order to build and link perl/Git.so.  x86-64 seems to need this.
diff --git a/environment.c b/environment.c
index 6dec9d0..35e03c7 100644
--- a/environment.c
+++ b/environment.c
@@ -94,6 +94,9 @@ int core_preload_index = 1;
  */
 int ignore_untracked_cache_config;
 
+int core_watchman_sync_timeout = 300;
+
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
diff --git a/watchman-support.c b/watchman-support.c
new file mode 100644
index 0000000..08e37ae
--- /dev/null
+++ b/watchman-support.c
@@ -0,0 +1,115 @@
+#include "cache.h"
+#include "watchman-support.h"
+#include "strbuf.h"
+#include <watchman.h>
+
+static struct watchman_query *make_query(const char *last_update)
+{
+	struct watchman_query *query = watchman_query();
+	watchman_query_set_fields(query, WATCHMAN_FIELD_NAME |
+					 WATCHMAN_FIELD_EXISTS |
+					 WATCHMAN_FIELD_NEWER);
+	watchman_query_set_empty_on_fresh(query, 1);
+	query->sync_timeout = core_watchman_sync_timeout;
+	if (*last_update)
+		watchman_query_set_since_oclock(query, last_update);
+	return query;
+}
+
+static struct watchman_query_result* query_watchman(
+	struct index_state *istate, struct watchman_connection *connection,
+	const char *fs_path, const char *last_update)
+{
+	struct watchman_error wm_error;
+	struct watchman_query *query;
+	struct watchman_expression *expr;
+	struct watchman_query_result *result;
+
+	query = make_query(last_update);
+	expr = watchman_true_expression();
+	result = watchman_do_query(connection, fs_path, query, expr, &wm_error);
+	watchman_free_query(query);
+	watchman_free_expression(expr);
+
+	if (!result)
+		warning("Watchman query error: %s (at %s)",
+			wm_error.message,
+			*last_update ? last_update : "the beginning");
+
+	return result;
+}
+
+static void update_index(struct index_state *istate,
+			 struct watchman_query_result *result)
+{
+	int i;
+
+	if (result->is_fresh_instance) {
+		/* let refresh clear them later */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_WATCHMAN_DIRTY;
+		goto done;
+	}
+
+	for (i = 0; i < result->nr; i++) {
+		struct watchman_stat *wm = result->stats + i;
+		int pos;
+
+		if (!strncmp(wm->name, ".git/", 5) ||
+		    strstr(wm->name, "/.git/"))
+			continue;
+
+		pos = index_name_pos(istate, wm->name, strlen(wm->name));
+		if (pos < 0)
+			continue;
+		/* FIXME: ignore staged entries and gitlinks too? */
+
+		istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	}
+
+done:
+	free(istate->last_update);
+	istate->last_update    = xstrdup(result->clock);
+	istate->cache_changed |= WATCHMAN_CHANGED;
+}
+
+int check_watchman(struct index_state *istate)
+{
+	struct watchman_error wm_error;
+	struct watchman_connection *connection;
+	struct watchman_query_result *result;
+	const char *fs_path;
+	struct timeval timeout;
+	/*
+	 * Convert core_watchman_sync_timeout, in milliseconds, to
+	 * struct timeval, in seconds and microseconds.
+	 */
+
+	fs_path = get_git_work_tree();
+	if (!fs_path)
+		return -1;
+
+	timeout.tv_sec = core_watchman_sync_timeout / 1000;
+	timeout.tv_usec = (core_watchman_sync_timeout % 1000) * 1000;
+	connection = watchman_connect(timeout, &wm_error);
+
+	if (!connection) {
+		warning("Watchman watch error: %s", wm_error.message);
+		return -1;
+	}
+
+	if (watchman_watch(connection, fs_path, &wm_error)) {
+		warning("Watchman watch error: %s", wm_error.message);
+		watchman_connection_close(connection);
+		return -1;
+	}
+
+
+	result = query_watchman(istate, connection, fs_path, istate->last_update);
+	watchman_connection_close(connection);
+	if (!result)
+		return -1;
+	update_index(istate, result);
+	watchman_free_query_result(result);
+	return 0;
+}
diff --git a/watchman-support.h b/watchman-support.h
new file mode 100644
index 0000000..ee1ef2c
--- /dev/null
+++ b/watchman-support.h
@@ -0,0 +1,7 @@
+#ifndef WATCHMAN_SUPPORT_H
+#define WATCHMAN_SUPPORT_H
+
+struct index_state;
+int check_watchman(struct index_state *index);
+
+#endif /* WATCHMAN_SUPPORT_H */
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/19] read-cache: allow index-helper to prepare shm before git reads it
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (10 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 11/19] Add watchman support to reduce index refresh cost David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 13/19] index-helper: use watchman to avoid refreshing index with lstat() David Turner
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

If index-helper puts 'W' before pid in $GIT_DIR/index-helper.pid, then
git will sleep for a while, expecting to be waken up by SIGUSR1 when
index-helper has done shm preparation, or after the timeout.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 read-cache.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 85ef15b..57c5df9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1613,14 +1613,46 @@ static void do_poke(struct strbuf *sb, int refresh_cache)
 	PostMessage(hwnd, refresh_cache ? WM_USER : WM_USER + 1, 0, 0);
 }
 #else
+
+static volatile int done_sleeping;
+
+static void mark_done_sleeping(int sig)
+{
+	done_sleeping = 1;
+}
+
+/*
+ * Send a message to the index-helper to let it know that we're going
+ * to read the index.  If refresh_cache is true, then the index-helper
+ * should re-read the index; otherwise, it should just stay alive.
+ *
+ * If the index-helper supports watchman, it will refresh the index
+ * before it hands it over.  Wait up to one second for a response
+ * indicating that the index has been successfully refreshed.
+ *
+ */
 static void do_poke(struct strbuf *sb, int refresh_cache)
 {
-	char	*start = sb->buf;
+	int	 wait  = sb->buf[0] == 'W';
+	char	*start = wait ? sb->buf + 1 : sb->buf;
 	char	*end   = NULL;
 	pid_t	 pid   = strtoul(start, &end, 10);
+	int	 ret;
+	int	 count = 0;
+
+	done_sleeping = 0;
 	if (!end || end != sb->buf + sb->len)
 		return;
-	kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
+	if (!refresh_cache && wait)
+		signal(SIGHUP, mark_done_sleeping);
+	ret = kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
+	if (!refresh_cache && wait) {
+		if (!ret)
+			while (!done_sleeping && count++ < 1000)
+				sleep_millisec(1);
+
+		sigaction(SIGHUP, NULL, NULL);
+	}
 }
 #endif
 
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/19] index-helper: use watchman to avoid refreshing index with lstat()
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (11 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 12/19] read-cache: allow index-helper to prepare shm before git reads it David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 14/19] update-index: enable/disable watchman support David Turner
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Watchman is hidden behind index-helper. Before git tries to read the
index from shm, it notifies index-helper with SIGHUP and sleep,
waiting for index-helper to prepare shm. index-helper then contacts
watchman, updates 'WAMA' extension and put it in a separate shm and
wakes git up with SIGHUP.

Git uses this extension to not lstat unchanged entries. Git only trust
'WAMA' extension when it's received from the separate shm, not from
disk. Unmarked entries are "clean". Marked entries are dirty from
watchman point of view. If it finds out some entries are
'watchman-dirty', but are really unchanged (e.g. the file was changed,
then reverted back), then Git will clear the marking in 'WAMA' before
writing it down.

Hiding watchman behind index-helper means you need both daemons. You
can't run watchman alone. Not so good. But on the other hand, 'git'
binary is not linked to watchman/json libraries, which is good for
packaging. Core git package will run fine without watchman-related
packages. If they need watchman, they can install git-index-helper and
dependencies.

Another reason for tying watchman to index-helper is, when used with
untracked cache, we need to keep track of $GIT_WORK_TREE file
listing. That kind of list can be kept in index-helper.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile       |  5 ++++
 cache.h        |  2 ++
 index-helper.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 read-cache.c   | 43 +++++++++++++++++++++++++++---
 4 files changed, 127 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index e51331c..d79fc0c 100644
--- a/Makefile
+++ b/Makefile
@@ -450,6 +450,7 @@ MSGFMT = msgfmt
 CURL_CONFIG = curl-config
 PTHREAD_LIBS = -lpthread
 PTHREAD_CFLAGS =
+WATCHMAN_LIBS =
 GCOV = gcov
 
 export TCL_PATH TCLTK_PATH
@@ -1419,6 +1420,7 @@ endif
 ifdef USE_WATCHMAN
 	LIB_H += watchman-support.h
 	LIB_OBJS += watchman-support.o
+	WATCHMAN_LIBS = -lwatchman
 	BASIC_CFLAGS += -DUSE_WATCHMAN
 endif
 
@@ -2032,6 +2034,9 @@ git-remote-testsvn$X: remote-testsvn.o GIT-LDFLAGS $(GITLIBS) $(VCSSVN_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) \
 	$(VCSSVN_LIB)
 
+git-index-helper$X: index-helper.o GIT-LDFLAGS $(GITLIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) $(WATCHMAN_LIBS)
+
 $(REMOTE_CURL_ALIASES): $(REMOTE_CURL_PRIMARY)
 	$(QUIET_LNCP)$(RM) $@ && \
 	ln $< $@ 2>/dev/null || \
diff --git a/cache.h b/cache.h
index bf20652..272c928 100644
--- a/cache.h
+++ b/cache.h
@@ -558,6 +558,7 @@ extern int daemonize(int *);
 
 /* Initialize and use the cache information */
 struct lock_file;
+extern int verify_index(const struct index_state *);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
@@ -565,6 +566,7 @@ extern int do_read_index(struct index_state *istate, const char *path,
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
+extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
 #define REFRESH_DAEMON		(1 << 2)
diff --git a/index-helper.c b/index-helper.c
index cf26da7..7e7ce9b 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -5,15 +5,18 @@
 #include "split-index.h"
 #include "shm.h"
 #include "lockfile.h"
+#include "watchman-support.h"
 
 struct shm {
 	unsigned char sha1[20];
 	void *shm;
 	size_t size;
+	pid_t pid;
 };
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static struct shm shm_watchman;
 static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
@@ -25,10 +28,21 @@ static void release_index_shm(struct shm *is)
 	is->shm = NULL;
 }
 
+static void release_watchman_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	git_shm_unlink("git-watchman-%s-%" PRIuMAX,
+		       sha1_to_hex(is->sha1), (uintmax_t)is->pid);
+	is->shm = NULL;
+}
+
 static void cleanup_shm(void)
 {
 	release_index_shm(&shm_index);
 	release_index_shm(&shm_base_index);
+	release_watchman_shm(&shm_watchman);
 }
 
 static void cleanup(void)
@@ -120,13 +134,15 @@ static void share_the_index(void)
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
-	if (to_verify && !verify_shm())
+	if (to_verify && !verify_shm()) {
 		cleanup_shm();
-	discard_index(&the_index);
+		discard_index(&the_index);
+	}
 }
 
 static void refresh(int sig)
 {
+	discard_index(&the_index);
 	the_index.keep_mmap = 1;
 	the_index.to_shm    = 1;
 	if (read_cache() < 0)
@@ -136,7 +152,55 @@ static void refresh(int sig)
 
 #ifdef HAVE_SHM
 
-static void do_nothing(int sig)
+#ifdef USE_WATCHMAN
+static void share_watchman(struct index_state *istate,
+			   struct shm *is, pid_t pid)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *shm;
+
+	write_watchman_ext(&sb, istate);
+	if (git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, sb.len + 20,
+			&shm, PROT_READ | PROT_WRITE, MAP_SHARED,
+			"git-watchman-%s-%" PRIuMAX,
+			sha1_to_hex(istate->sha1), (uintmax_t)pid) == sb.len + 20) {
+		is->size = sb.len + 20;
+		is->shm = shm;
+		is->pid = pid;
+		hashcpy(is->sha1, istate->sha1);
+
+		memcpy(shm, sb.buf, sb.len);
+		hashcpy((unsigned char *)shm + is->size - 20, is->sha1);
+	}
+	strbuf_release(&sb);
+}
+
+static void prepare_with_watchman(pid_t pid)
+{
+	/*
+	 * with the help of watchman, maybe we could detect if
+	 * $GIT_DIR/index is updated..
+	 */
+	if (!verify_index(&the_index))
+		refresh(0);
+
+	if (check_watchman(&the_index))
+		return;
+
+	share_watchman(&the_index, &shm_watchman, pid);
+}
+
+static void prepare_index(int sig, siginfo_t *si, void *context)
+{
+	release_watchman_shm(&shm_watchman);
+	if (the_index.last_update)
+		prepare_with_watchman(si->si_pid);
+	kill(si->si_pid, SIGHUP); /* stop the waiting in poke_daemon() */
+}
+
+#else
+
+static void prepare_index(int sig, siginfo_t *si, void *context)
 {
 	/*
 	 * what we need is the signal received and interrupts
@@ -145,11 +209,21 @@ static void do_nothing(int sig)
 	 */
 }
 
+#endif
+
 static void loop(const char *pid_file, int idle_in_seconds)
 {
+	struct sigaction sa;
+
 	sigchain_pop(SIGHUP);	/* pushed by sigchain_push_common */
 	sigchain_push(SIGHUP, refresh);
-	sigchain_push(SIGUSR1, do_nothing);
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = prepare_index;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_SIGINFO;
+	sigaction(SIGUSR1, &sa, NULL);
+
 	refresh(0);
 	while (sleep(idle_in_seconds))
 		; /* do nothing, all is handled by signal handlers already */
@@ -245,6 +319,8 @@ int main(int argc, char **argv)
 				       LOCK_DIE_ON_ERROR);
 #ifdef GIT_WINDOWS_NATIVE
 	strbuf_addstr(&sb, "HWND");
+#elif defined(USE_WATCHMAN)
+	strbuf_addch(&sb, 'W');	/* see poke_daemon() */
 #endif
 	strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
 	write_in_full(fd, sb.buf, sb.len);
diff --git a/read-cache.c b/read-cache.c
index 57c5df9..78f5f0e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1405,7 +1405,7 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
 	return 0;
 }
 
-static void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
+void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
 {
 	struct ewah_bitmap *bitmap;
 	int i;
@@ -1722,6 +1722,39 @@ static int try_shm(struct index_state *istate)
 	return 0;
 }
 
+static void refresh_by_watchman(struct index_state *istate)
+{
+	void *shm = NULL;
+	int length;
+	int i;
+
+	length = git_shm_map(O_RDONLY, 0700, -1, &shm,
+			     PROT_READ, MAP_SHARED,
+			     "git-watchman-%s-%" PRIuMAX,
+			     sha1_to_hex(istate->sha1),
+			     (uintmax_t)getpid());
+
+	if (length <= 20 ||
+	    hashcmp(istate->sha1, (unsigned char *)shm + length - 20) ||
+	    /*
+	     * No need to clear CE_WATCHMAN_DIRTY set by 'WAMA' on
+	     * disk. Watchman can only set more, not clear any, so
+	     * this is OR mask.
+	     */
+	    read_watchman_ext(istate, shm, length - 20))
+		goto done;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_stage(ce) || (ce->ce_flags & CE_WATCHMAN_DIRTY))
+			continue;
+		ce_mark_uptodate(ce);
+	}
+done:
+	if (shm)
+		munmap(shm, length);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1842,7 +1875,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
 		post_read_index_from(istate);
-		return ret;
+		goto done;
 	}
 
 	if (split_index->base)
@@ -1863,6 +1896,10 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	post_read_index_from(istate);
+
+done:
+	if (ret > 0 && istate->from_shm && istate->last_update)
+		refresh_by_watchman(istate);
 	return ret;
 }
 
@@ -2164,7 +2201,7 @@ out:
 	return 0;
 }
 
-static int verify_index(const struct index_state *istate)
+int verify_index(const struct index_state *istate)
 {
 	return verify_index_from(istate, get_index_file());
 }
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/19] update-index: enable/disable watchman support
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (12 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 13/19] index-helper: use watchman to avoid refreshing index with lstat() David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 15/19] unpack-trees: preserve index extensions David Turner
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/update-index.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 1c94ca5..7c08e1c 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -914,6 +914,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, nul_term_line = 0;
 	enum uc_mode untracked_cache = UC_UNSPECIFIED;
+	int use_watchman = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -1012,6 +1013,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    N_("test if the filesystem supports untracked cache"), UC_TEST),
 		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
 			    N_("enable untracked cache without testing the filesystem"), UC_FORCE),
+		OPT_BOOL(0, "watchman", &use_watchman,
+			N_("use or not use watchman to reduce refresh cost")),
 		OPT_END()
 	};
 
@@ -1149,6 +1152,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		die("Bug: bad untracked_cache value: %d", untracked_cache);
 	}
 
+	if (use_watchman > 0) {
+		the_index.last_update    = xstrdup("");
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	} else if (!use_watchman) {
+		the_index.last_update    = NULL;
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/19] unpack-trees: preserve index extensions
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (13 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 14/19] update-index: enable/disable watchman support David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 16/19] index-helper: rewrite pidfile after daemonizing David Turner
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Make git checkout (and other unpack_tree operations) preserve the
untracked cache and watchman status. This is valuable for two reasons:

1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.

2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.

The same logic applies to the watchman state.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 cache.h                           |  1 +
 read-cache.c                      |  8 ++++++++
 t/t7063-status-untracked-cache.sh | 23 +++++++++++++++++++++++
 t/test-lib-functions.sh           |  4 ++++
 unpack-trees.c                    |  1 +
 5 files changed, 37 insertions(+)

diff --git a/cache.h b/cache.h
index 272c928..03c34f1 100644
--- a/cache.h
+++ b/cache.h
@@ -572,6 +572,7 @@ extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define REFRESH_DAEMON		(1 << 2)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
+extern void move_index_extensions(struct index_state *dst, struct index_state *src);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern int index_dir_exists(struct index_state *istate, const char *name, int namelen);
diff --git a/read-cache.c b/read-cache.c
index 78f5f0e..40d789a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2591,3 +2591,11 @@ void report_index_stats(struct trace_key *key)
 			 "index stats: file writes       = %10u\n",
 			 nr_read_index, nr_read_shm_index, nr_write_index);
 }
+
+void move_index_extensions(struct index_state *dst, struct index_state *src)
+{
+	dst->untracked = src->untracked;
+	src->untracked = NULL;
+	dst->last_update = src->last_update;
+	src->last_update = NULL;
+}
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index a971884..a2c8535 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -646,4 +646,27 @@ test_expect_success 'test ident field is working' '
 	test_cmp ../expect ../err
 '
 
+test_expect_success 'untracked cache survives a checkout' '
+	git commit --allow-empty -m empty &&
+	test-dump-untracked-cache >../before &&
+	test_when_finished  "git checkout master" &&
+	git checkout -b other_branch &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after &&
+	test_commit test &&
+	test-dump-untracked-cache >../before &&
+	git checkout master &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
+
+test_expect_success 'untracked cache survives a commit' '
+	test-dump-untracked-cache >../before &&
+	git add done/two &&
+	git commit -m commit &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
 test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8d99eb3..e974b5b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -186,6 +186,10 @@ test_commit () {
 		test_tick
 	fi &&
 	git commit $signoff -m "$1" &&
+	if [ "$(git config core.bare)" = false ]
+	then
+	    git update-index --force-untracked-cache
+	fi
 	git tag "${4:-$1}"
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 9f55cc2..fc90eb3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1215,6 +1215,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
+		move_index_extensions(&o->result, o->dst_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/19] index-helper: rewrite pidfile after daemonizing
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (14 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 15/19] unpack-trees: preserve index extensions David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 17/19] index-helper: process management David Turner
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

When we daemonize, our pid changes, so we need to rewrite the pid file.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 index-helper.c | 48 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/index-helper.c b/index-helper.c
index 7e7ce9b..dc03e1e 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -288,12 +288,33 @@ static const char * const usage_text[] = {
 	NULL
 };
 
-int main(int argc, char **argv)
+static const char *write_pid(void)
 {
+	static struct strbuf sb = STRBUF_INIT;
 	static struct lock_file lock;
-	struct strbuf sb = STRBUF_INIT;
+	int fd;
+
+	strbuf_reset(&sb);
+	fd = hold_lock_file_for_update(&lock,
+				       git_path("index-helper.pid"),
+				       LOCK_DIE_ON_ERROR);
+#ifdef GIT_WINDOWS_NATIVE
+	strbuf_addstr(&sb, "HWND");
+#elif defined(USE_WATCHMAN)
+	strbuf_addch(&sb, 'W');	/* see poke_daemon() */
+#endif
+	strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
+	write_in_full(fd, sb.buf, sb.len);
+	commit_lock_file(&lock);
+
+	return sb.buf;
+}
+
+int main(int argc, char **argv)
+{
 	const char *prefix;
-	int fd, idle_in_minutes = 10, detach = 0;
+	int idle_in_minutes = 10, detach = 0;
+	const char *pid_file;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_minutes,
 			    N_("exit if not used after some minutes")),
@@ -314,17 +335,7 @@ int main(int argc, char **argv)
 			  options, usage_text, 0))
 		die(_("too many arguments"));
 
-	fd = hold_lock_file_for_update(&lock,
-				       git_path("index-helper.pid"),
-				       LOCK_DIE_ON_ERROR);
-#ifdef GIT_WINDOWS_NATIVE
-	strbuf_addstr(&sb, "HWND");
-#elif defined(USE_WATCHMAN)
-	strbuf_addch(&sb, 'W');	/* see poke_daemon() */
-#endif
-	strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
-	write_in_full(fd, sb.buf, sb.len);
-	commit_lock_file(&lock);
+	write_pid();
 
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
@@ -332,9 +343,14 @@ int main(int argc, char **argv)
 	if (detach && daemonize(&daemonized))
 		die_errno("unable to detach");
 
+	/*
+	 * Now that we're daemonized, we need to rewrite the PID file
+	 * because we have a new PID.
+	 */
+	pid_file = write_pid();
+
 	if (!idle_in_minutes)
 		idle_in_minutes = 0xffffffff / 60;
-	loop(sb.buf, idle_in_minutes * 60);
-	strbuf_release(&sb);
+	loop(pid_file, idle_in_minutes * 60);
 	return 0;
 }
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/19] index-helper: process management
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (15 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 16/19] index-helper: rewrite pidfile after daemonizing David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-09 18:36 ` [PATCH 18/19] index-helper: autorun David Turner
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Don't start index-helper when it is already running for a repository.

Keep the index-helper pid file's mtime up-to-date.  Before starting
index-helper, check that (a) the pid file's mtime is recent and (b)
there is some process with that pid.  Of course, it's possible that
the mtime is recent but the index-helper has died anyway (and some other
process has taken its pid), but that's pretty unlikely.

Provide index-helper --kill (to kill the running index-helper) and
index-helper --ignore-existing (to start up a new index-helper even
if we believe that there's one already running).

Add a test for index-helper's pid-file writing and for --kill.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 index-helper.c          | 95 +++++++++++++++++++++++++++++++++++++++++++++----
 t/t7900-index-helper.sh | 23 ++++++++++++
 2 files changed, 111 insertions(+), 7 deletions(-)
 create mode 100755 t/t7900-index-helper.sh

diff --git a/index-helper.c b/index-helper.c
index dc03e1e..a75da60 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -152,6 +152,17 @@ static void refresh(int sig)
 
 #ifdef HAVE_SHM
 
+static void touch_pid_file(void)
+{
+	/*
+	 * If we fail to update the times on index-helper.pid, we've
+	 * probably had the repo deleted out from under us or otherwise
+	 * lost access; might as well give up.
+	 */
+	if (utimes(git_path("index-helper.pid"), NULL))
+		exit(0);
+}
+
 #ifdef USE_WATCHMAN
 static void share_watchman(struct index_state *istate,
 			   struct shm *is, pid_t pid)
@@ -211,9 +222,10 @@ static void prepare_index(int sig, siginfo_t *si, void *context)
 
 #endif
 
-static void loop(const char *pid_file, int idle_in_seconds)
+static void loop(const char *pid_file, int idle_in_minutes)
 {
 	struct sigaction sa;
+	int timeout_count = 0;
 
 	sigchain_pop(SIGHUP);	/* pushed by sigchain_push_common */
 	sigchain_push(SIGHUP, refresh);
@@ -225,19 +237,25 @@ static void loop(const char *pid_file, int idle_in_seconds)
 	sigaction(SIGUSR1, &sa, NULL);
 
 	refresh(0);
-	while (sleep(idle_in_seconds))
-		; /* do nothing, all is handled by signal handlers already */
+	while (timeout_count < idle_in_minutes) {
+		if (sleep(60) == EINTR)
+			timeout_count = 0;
+		else
+			timeout_count ++;
+		touch_pid_file();
+	}
 }
 
 #elif defined(GIT_WINDOWS_NATIVE)
 
-static void loop(const char *pid_file, int idle_in_seconds)
+static void loop(const char *pid_file, int idle_in_minutes)
 {
 	HWND hwnd;
 	UINT_PTR timer = 0;
 	MSG msg;
 	HINSTANCE hinst = GetModuleHandle(NULL);
 	WNDCLASS wc;
+	int timeout_count = 0;
 
 	/*
 	 * Emulate UNIX signals by sending WM_USER+x to a
@@ -258,11 +276,18 @@ static void loop(const char *pid_file, int idle_in_seconds)
 
 	refresh(0);
 	while (1) {
-		timer = SetTimer(hwnd, timer, idle_in_seconds * 1000, NULL);
+		timer = SetTimer(hwnd, timer, 60 * 1000, NULL);
 		if (!timer)
 			die(_("no timer!"));
-		if (!GetMessage(&msg, hwnd, 0, 0) || msg.message == WM_TIMER)
+		if (!GetMessage(&msg, hwnd, 0, 0))
 			break;
+		if (msg.message == WM_TIMER) {
+			timeout_count ++;
+			if (timeout_count == idle_in_minutes)
+				break;
+		}
+		timeout_count = 0;
+		touch_pid_file();
 		switch (msg.message) {
 		case WM_USER:
 			refresh(0);
@@ -288,6 +313,44 @@ static const char * const usage_text[] = {
 	NULL
 };
 
+static int already_running(void)
+{
+	char contents[20] = {0};
+	char *end;
+	int fd, dead, i = 0;
+	time_t now;
+	pid_t pid;
+	struct stat st;
+
+	fd = open(git_path("index-helper.pid"), O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	if (fstat(fd, &st) < 0)
+		return 0;
+
+	now = time(&now);
+
+	/*
+	 * We touch the pid file every minute, so if a pid file hasn't
+	 * been updated in two minutes, it's out-of-date.
+	 */
+	if (st.st_mtime + 120 < now)
+		return 0;
+
+	read_in_full(fd, &contents, sizeof(contents));
+
+	while (!isdigit(contents[i]) && contents[i])
+		i++;
+	pid = strtoll(contents + i, &end, 10);
+	if (contents == end)
+		return 0;
+	dead = kill(pid, 0);
+	if (!dead)
+		return pid;
+	return 0;
+}
+
 static const char *write_pid(void)
 {
 	static struct strbuf sb = STRBUF_INIT;
@@ -314,6 +377,8 @@ int main(int argc, char **argv)
 {
 	const char *prefix;
 	int idle_in_minutes = 10, detach = 0;
+	int ignore_existing = 0;
+	int kill_existing = 0;
 	const char *pid_file;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_minutes,
@@ -321,6 +386,8 @@ int main(int argc, char **argv)
 		OPT_BOOL(0, "strict", &to_verify,
 			 "verify shared memory after creating"),
 		OPT_BOOL(0, "detach", &detach, "detach the process"),
+		OPT_BOOL(0, "ignore-existing", &ignore_existing, "run even if another index-helper seems to be running for this repo"),
+		OPT_BOOL(0, "kill", &kill_existing, "kill any running index-helper for this repo"),
 		OPT_END()
 	};
 
@@ -335,6 +402,20 @@ int main(int argc, char **argv)
 			  options, usage_text, 0))
 		die(_("too many arguments"));
 
+	if (ignore_existing && kill_existing)
+		die(_("--ignore-existing and --kill don't make sense together"));
+
+	if (!ignore_existing) {
+		pid_t pid = already_running();
+		if (pid) {
+			if (kill_existing)
+				exit(kill(pid, SIGKILL));
+			else if (!detach)
+				warning("index-helper is apparently already running with pid %d", (int)pid);
+			exit(0);
+		}
+	}
+
 	write_pid();
 
 	atexit(cleanup);
@@ -351,6 +432,6 @@ int main(int argc, char **argv)
 
 	if (!idle_in_minutes)
 		idle_in_minutes = 0xffffffff / 60;
-	loop(pid_file, idle_in_minutes * 60);
+	loop(pid_file, idle_in_minutes);
 	return 0;
 }
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
new file mode 100755
index 0000000..6708180
--- /dev/null
+++ b/t/t7900-index-helper.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+#
+# Copyright (c) 2016, Twitter, Inc
+#
+
+test_description='git-index-helper
+
+Testing git index-helper
+'
+
+. ./test-lib.sh
+
+test_expect_success 'index-helper creates usable pid file and can be killed' '
+	test_path_is_missing .git/index-helper.pid &&
+	git index-helper --detach &&
+	test_path_is_file .git/index-helper.pid &&
+	pid=$(sed s/[^0-9]//g .git/index-helper.pid) &&
+	kill -0 $pid &&
+	git index-helper --kill &&
+	! kill -0 $pid
+'
+
+test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 18/19] index-helper: autorun
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (16 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 17/19] index-helper: process management David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-15 12:12   ` Duy Nguyen
  2016-03-09 18:36 ` [PATCH 19/19] hack: watchman/untracked cache mashup David Turner
  2016-03-29 17:09 ` [PATCH 00/19] index-helper, watchman Torsten Bögershausen
  19 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Introduce a new config option, indexhelper.autorun, to automatically
run git index-helper before starting up a builtin git command.  This
enables users to keep index-helper running without manual
intervention.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 git.c                   | 38 +++++++++++++++++++++++++++++++++++++-
 index-helper.c          | 11 ++++++++++-
 t/t7900-index-helper.sh | 10 ++++++++++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/git.c b/git.c
index a4f6f71..ccf04ab 100644
--- a/git.c
+++ b/git.c
@@ -521,6 +521,40 @@ static void strip_extension(const char **argv)
 #define strip_extension(cmd)
 #endif
 
+static int want_auto_index_helper(void)
+{
+	int want = -1;
+	const char *value = NULL;
+	const char *key = "indexhelper.autorun";
+
+	if (git_config_key_is_valid(key) &&
+	    !git_config_get_value(key, &value)) {
+		int b = git_config_maybe_bool(key, value);
+		want = b >= 0 ? b : 0;
+	}
+	return want;
+}
+
+static void maybe_run_index_helper(struct cmd_struct *cmd)
+{
+	const char *argv[] = {"git-index-helper", "--detach", "--auto", NULL};
+	int status;
+
+	if (!(cmd->option & NEED_WORK_TREE))
+		return;
+
+	if (want_auto_index_helper() <= 0)
+		return;
+
+	trace_argv_printf(argv, "trace: auto index-helper:");
+
+	status = run_command_v_opt(argv, RUN_SILENT_EXEC_FAILURE | RUN_CLEAN_ON_EXIT);
+
+	if (status) {
+		warning("You specified indexhelper.autorun, but running git-index-helper failed");
+	}
+}
+
 static void handle_builtin(int argc, const char **argv)
 {
 	const char *cmd;
@@ -536,8 +570,10 @@ static void handle_builtin(int argc, const char **argv)
 	}
 
 	builtin = get_builtin(cmd);
-	if (builtin)
+	if (builtin) {
+		maybe_run_index_helper(builtin);
 		exit(run_builtin(builtin, argc, argv));
+	}
 }
 
 static void execv_dashed_external(const char **argv)
diff --git a/index-helper.c b/index-helper.c
index a75da60..bc5c328 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -379,6 +379,7 @@ int main(int argc, char **argv)
 	int idle_in_minutes = 10, detach = 0;
 	int ignore_existing = 0;
 	int kill_existing = 0;
+	int nongit = 0, autorun = 0;
 	const char *pid_file;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_minutes,
@@ -388,6 +389,7 @@ int main(int argc, char **argv)
 		OPT_BOOL(0, "detach", &detach, "detach the process"),
 		OPT_BOOL(0, "ignore-existing", &ignore_existing, "run even if another index-helper seems to be running for this repo"),
 		OPT_BOOL(0, "kill", &kill_existing, "kill any running index-helper for this repo"),
+		OPT_BOOL(0, "auto", &autorun, "this is an automatic run of git index-helper, so certain errors can be solved by silently exiting"),
 		OPT_END()
 	};
 
@@ -397,11 +399,18 @@ int main(int argc, char **argv)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(usage_text, options);
 
-	prefix = setup_git_directory();
+	prefix = setup_git_directory_gently(&nongit);
 	if (parse_options(argc, (const char **)argv, prefix,
 			  options, usage_text, 0))
 		die(_("too many arguments"));
 
+	if (nongit) {
+		if (autorun)
+			exit(0);
+		else
+			die("Not a git repository");
+	}
+
 	if (ignore_existing && kill_existing)
 		die(_("--ignore-existing and --kill don't make sense together"));
 
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 6708180..e4f9564 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -20,4 +20,14 @@ test_expect_success 'index-helper creates usable pid file and can be killed' '
 	! kill -0 $pid
 '
 
+test_expect_success 'index-helper autorun works' '
+	rm -f .git/index-helper.pid &&
+	git status &&
+	test_path_is_missing .git/index-helper.pid &&
+	test_config indexhelper.autorun true &&
+	git status &&
+	test_path_is_file .git/index-helper.pid &&
+	git index-helper --kill
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 19/19] hack: watchman/untracked cache mashup
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (17 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 18/19] index-helper: autorun David Turner
@ 2016-03-09 18:36 ` David Turner
  2016-03-15 12:31   ` Duy Nguyen
  2016-03-29 17:09 ` [PATCH 00/19] index-helper, watchman Torsten Bögershausen
  19 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-09 18:36 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

DO NOT APPLY THIS

This is a hack to add untracked cache info to the watchman WAMA
extension.  The idea is that we don't have to stat directories if
watchman hasn't told us of any changes in those dirs.

If this is the right idea, I can go back and add it to the relevant
patch(es).  But I want to see if other folks think it's the right idea
first.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 dir.c              |  23 +++++++--
 dir.h              |   6 +++
 read-cache.c       | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 watchman-support.c |  21 +++++++-
 4 files changed, 182 insertions(+), 10 deletions(-)

diff --git a/dir.c b/dir.c
index 69e0be6..5058b29 100644
--- a/dir.c
+++ b/dir.c
@@ -597,9 +597,9 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
-						    struct untracked_cache_dir *dir,
-						    const char *name, int len)
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len)
 {
 	int first, last;
 	struct untracked_cache_dir *d;
@@ -1726,6 +1726,17 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	if (dir->untracked->use_watchman) {
+		/*
+		 * With watchman, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1739,6 +1750,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 
+skip_stat:
 	if (untracked->check_only != !!check_only) {
 		invalidate_directory(dir->untracked, untracked);
 		return 0;
@@ -2625,8 +2637,10 @@ static void free_untracked(struct untracked_cache_dir *ucd)
 
 void free_untracked_cache(struct untracked_cache *uc)
 {
-	if (uc)
+	if (uc) {
 		free_untracked(uc->root);
+		string_list_clear(&uc->invalid_untracked, 0);
+	}
 	free(uc);
 }
 
@@ -2775,6 +2789,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 		return NULL;
 
 	uc = xcalloc(1, sizeof(*uc));
+	string_list_init(&uc->invalid_untracked, 1);
 	strbuf_init(&uc->ident, ident_len);
 	strbuf_add(&uc->ident, ident, ident_len);
 	load_sha1_stat(&uc->ss_info_exclude, &ouc->info_exclude_stat,
diff --git a/dir.h b/dir.h
index 3ec3fb0..8fd3f9e 100644
--- a/dir.h
+++ b/dir.h
@@ -142,6 +142,9 @@ struct untracked_cache {
 	int gitignore_invalidated;
 	int dir_invalidated;
 	int dir_opened;
+	/* watchman invalidation data */
+	unsigned int use_watchman : 1;
+	struct string_list invalid_untracked;
 };
 
 struct dir_struct {
@@ -312,4 +315,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 void add_untracked_cache(struct index_state *istate);
 void remove_untracked_cache(struct index_state *istate);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len);
 #endif
diff --git a/read-cache.c b/read-cache.c
index 40d789a..66438d8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1237,7 +1237,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (!new) {
 			const char *fmt;
 
-			if (really && cache_errno == EINVAL) {
+			if (really || cache_errno == EINVAL) {
 				/* If we are doing --really-refresh that
 				 * means the index is not valid anymore.
 				 */
@@ -1377,11 +1377,82 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static struct untracked_cache_dir *find_untracked_cache_dir(
+	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
+	const char *name)
+{
+	int component_len;
+	const char *end;
+	struct untracked_cache_dir *dir;
+
+	if (!*name)
+		return ucd;
+
+	end = strchr(name, '/');
+	if (end)
+		component_len = end - name;
+	else
+		component_len = strlen(name);
+
+	dir = lookup_untracked(uc, ucd, name, component_len);
+	if (dir)
+		return find_untracked_cache_dir(uc, dir, name + component_len + 1);
+
+	return NULL;
+}
+
 static void mark_no_watchman(size_t pos, void *data)
 {
 	struct index_state *istate = data;
+	struct cache_entry *ce = istate->cache[pos];
+	struct strbuf sb = STRBUF_INIT;
+	char *c;
+	struct untracked_cache_dir *dir;
+
 	assert(pos < istate->cache_nr);
-	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	ce->ce_flags |= CE_WATCHMAN_DIRTY;
+
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	strbuf_add(&sb, ce->name, ce_namelen(ce));
+
+	for (c = sb.buf + sb.len - 1; c > sb.buf; c--) {
+		if (*c == '/') {
+			strbuf_setlen(&sb, c - sb.buf);
+			break;
+		}
+	}
+
+	if (c == sb.buf) {
+		strbuf_setlen(&sb, 0);
+	}
+
+	dir = find_untracked_cache_dir(istate->untracked,
+				       istate->untracked->root, sb.buf);
+	if (dir) {
+		fprintf(stderr, "from '%s', dir '%s', invalidated '%s'\n",
+			ce->name, sb.buf, dir->name);
+		dir->valid = 0;
+	} else {
+		fprintf(stderr, "from '%s', dir '%s', not found\n",
+			ce->name, sb.buf);
+	}
+
+	strbuf_release(&sb);
+}
+
+static int mark_untracked_invalid(struct string_list_item *item, void *uc)
+{
+	struct untracked_cache *untracked = uc;
+	struct untracked_cache_dir *dir;
+
+	dir = find_untracked_cache_dir(untracked, untracked->root,
+				       item->string);
+	if (dir)
+		dir->valid = 0;
+
+	return 0;
 }
 
 static int read_watchman_ext(struct index_state *istate, const void *data,
@@ -1389,19 +1460,56 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
 {
 	struct ewah_bitmap *bitmap;
 	int ret, len;
+	uint32_t bitmap_size;
+	uint32_t untracked_nr;
+	const char *untracked;
 
 	if (memchr(data, 0, sz) == NULL)
 		return error("invalid extension");
+
 	len = strlen(data) + 1;
+	memcpy(&bitmap_size, (const char *)data + len, 4);
+	memcpy(&untracked_nr, (const char *)data + len + 4, 4);
+	untracked_nr = ntohl(untracked_nr);
+	bitmap_size = ntohl(bitmap_size);
+
 	bitmap = ewah_new();
-	ret = ewah_read_mmap(bitmap, (const char *)data + len, sz - len);
-	if (ret != sz - len) {
+	ret = ewah_read_mmap(bitmap, (const char *)data + len + 8, bitmap_size);
+	if (ret != bitmap_size) {
 		ewah_free(bitmap);
 		return error("failed to parse ewah bitmap reading watchman index extension");
 	}
 	istate->last_update = xstrdup(data);
 	ewah_each_bit(bitmap, mark_no_watchman, istate);
 	ewah_free(bitmap);
+
+	if (istate->untracked && istate->untracked->root) {
+		int i;
+
+		istate->untracked->use_watchman = 1;
+		untracked = data + len + 8 + bitmap_size;
+		for (i = 0; i < untracked_nr; ++i) {
+			int len = strlen(untracked);
+			string_list_append(&istate->untracked->invalid_untracked,
+					   untracked);
+			untracked += len + 1;
+		}
+
+		for_each_string_list(&istate->untracked->invalid_untracked,
+			 mark_untracked_invalid, istate->untracked);
+
+		string_list_clear(&istate->untracked->invalid_untracked, 0);
+		istate->cache_changed |= WATCHMAN_CHANGED;
+	}
+	return 0;
+}
+
+int untracked_entry_append(struct string_list_item *item, void *sbvoid)
+{
+	struct strbuf *sb = sbvoid;
+
+	strbuf_addstr(sb, item->string);
+	strbuf_addch(sb, 0);
 	return 0;
 }
 
@@ -1409,14 +1517,38 @@ void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
 {
 	struct ewah_bitmap *bitmap;
 	int i;
+	int ewah_start;
+	int ewah_size = 0;
+	int fixup = 0;
 
 	strbuf_add(sb, istate->last_update, strlen(istate->last_update) + 1);
+	fixup = sb->len;
+	strbuf_add(sb, &ewah_size, 4); /* we'll fix this up later */
+	if (istate->untracked) {
+		uint32_t nr = istate->untracked->invalid_untracked.nr;
+		nr = htonl(nr);
+		strbuf_add(sb, &nr, 4);
+	} else {
+		/* zero */
+		strbuf_add(sb, &ewah_size, 4);
+	}
+
+	ewah_start = sb->len;
 	bitmap = ewah_new();
 	for (i = 0; i < istate->cache_nr; i++)
 		if (istate->cache[i]->ce_flags & CE_WATCHMAN_DIRTY)
 			ewah_set(bitmap, i);
 	ewah_serialize_strbuf(bitmap, sb);
 	ewah_free(bitmap);
+
+	/* fix up size field */
+	ewah_size = sb->len - ewah_start;
+	ewah_size = htonl(ewah_size);
+	memcpy(sb->buf + fixup, &ewah_size, 4);
+
+	if (istate->untracked)
+		for_each_string_list(&istate->untracked->invalid_untracked,
+				     untracked_entry_append, sb);
 }
 
 static int read_index_extension(struct index_state *istate,
@@ -1956,7 +2088,7 @@ int unmerged_index(const struct index_state *istate)
 	return 0;
 }
 
-#define WRITE_BUFFER_SIZE 8192
+#define WRITE_BUFFER_SIZE 8192*16
 static unsigned char write_buffer[WRITE_BUFFER_SIZE];
 static unsigned long write_buffer_len;
 
diff --git a/watchman-support.c b/watchman-support.c
index 08e37ae..b7302b9 100644
--- a/watchman-support.c
+++ b/watchman-support.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "watchman-support.h"
 #include "strbuf.h"
+#include "dir.h"
 #include <watchman.h>
 
 static struct watchman_query *make_query(const char *last_update)
@@ -60,8 +61,24 @@ static void update_index(struct index_state *istate,
 			continue;
 
 		pos = index_name_pos(istate, wm->name, strlen(wm->name));
-		if (pos < 0)
+		if (pos < 0) {
+			if (istate->untracked) {
+				char *name = xstrdup(wm->name);
+				char *dname = dirname(name);
+
+				/*
+				 * dirname() returns '.' for the root,
+				 * but we call it ''.
+				 */
+				if (dname[0] == '.' && dname[1] == 0)
+					string_list_append(&istate->untracked->invalid_untracked, "");
+				else
+					string_list_append(&istate->untracked->invalid_untracked,
+							   dname);
+				free(name);
+			}
 			continue;
+		}
 		/* FIXME: ignore staged entries and gitlinks too? */
 
 		istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
@@ -71,6 +88,8 @@ done:
 	free(istate->last_update);
 	istate->last_update    = xstrdup(result->clock);
 	istate->cache_changed |= WATCHMAN_CHANGED;
+	if (istate->untracked)
+		string_list_remove_duplicates(&istate->untracked->invalid_untracked, 0);
 }
 
 int check_watchman(struct index_state *istate)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics
  2016-03-09 18:36 ` [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics David Turner
@ 2016-03-09 22:58   ` Junio C Hamano
  2016-03-10  0:05     ` David Turner
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2016-03-09 22:58 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds

David Turner <dturner@twopensource.com> writes:

> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> trace_stats() is intended for GIT_TRACE_*_STATS variable group and
> GIT_TRACE_PACK_STATS is more like an example how new vars can be added.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/git.txt |  3 +++
>  cache.h               |  2 ++
>  git.c                 |  1 +
>  sha1_file.c           | 24 ++++++++++++++++++++++++
>  trace.c               | 13 +++++++++++++
>  trace.h               |  1 +
>  6 files changed, 44 insertions(+)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 2754af8..794271e 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -1055,6 +1055,9 @@ of clones and fetches.
>  	time of each Git command.
>  	See 'GIT_TRACE' for available trace output options.
>  
> +'GIT_TRACE_PACK_STATS'::
> +	Print various statistics.

"Various" is quite vague.

Perhaps a new description in Documentation/technical/ might be a
good thing to have?

> +void report_pack_stats(struct trace_key *key)
> +{
> +	trace_printf_key(key, "\n"
> +			 "pack_report: getpagesize()            = %10" SZ_FMT "\n"
> +			 "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
> +			 "pack_report: core.packedGitLimit      = %10" SZ_FMT "\n"
> +			 "pack_report: pack_used_ctr            = %10u\n"
> +			 "pack_report: pack_mmap_calls          = %10u\n"
> +			 "pack_report: pack_open_windows        = %10u / %10u\n"
> +			 "pack_report: pack_mapped              = "
> +			 "%10" SZ_FMT " / %10" SZ_FMT "\n"
> +			 "pack_report: pack accesss             = %10u\n",
> +			 sz_fmt(getpagesize()),
> +			 sz_fmt(packed_git_window_size),
> +			 sz_fmt(packed_git_limit),
> +			 pack_used_ctr,
> +			 pack_mmap_calls,
> +			 pack_open_windows, peak_pack_open_windows,
> +			 sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped),
> +			 pack_access_nr);
> +}
> +
>  /*
>   * Open and mmap the index file at path, perform a couple of
>   * consistency checks, then record its information to p.  Return 0 on
> @@ -2238,6 +2261,7 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
>  	static struct trace_key pack_access = TRACE_KEY_INIT(PACK_ACCESS);
>  	trace_printf_key(&pack_access, "%s %"PRIuMAX"\n",
>  			 p->pack_name, (uintmax_t)obj_offset);
> +	pack_access_nr++;
>  }

This looks rather half-hearted, in that those who are interested in
this new number can run "wc -l" on the pack-access trace log without
adding a new "stats" anyway.  It may make the "stats" far more
useful to keep track of the summary information of what would be
written to the pack access log and add to the report_pack_stats()
output things like the average and median distance of seeks
(i.e. differences in the "obj_offset" into the same pack by two
adjacent pack accesse) and the variance, for example?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading
  2016-03-09 18:36 ` [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading David Turner
@ 2016-03-09 23:02   ` Junio C Hamano
  2016-03-10  0:09     ` David Turner
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2016-03-09 23:02 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds

David Turner <dturner@twopensource.com> writes:

> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

As usual in Duy's patches, this one seriously lacks "why".  And also
makes the reader wonder if the memory region is ever unmapped() and
if so under what condition.

> diff --git a/cache.h b/cache.h
> index 7e01403..c43ef3d 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -333,11 +333,14 @@ struct index_state {
>  	struct split_index *split_index;
>  	struct cache_time timestamp;
>  	unsigned name_hash_initialized : 1,
> +		 keep_mmap : 1,
>  		 initialized : 1;
>  	struct hashmap name_hash;
>  	struct hashmap dir_hash;
>  	unsigned char sha1[20];
>  	struct untracked_cache *untracked;
> +	void *mmap;
> +	size_t mmap_size;
>  };
>  
>  extern struct index_state the_index;
> diff --git a/read-cache.c b/read-cache.c
> index 16cc487..7e387e9 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1574,6 +1574,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
>  	if (mmap == MAP_FAILED)
>  		die_errno("unable to map index file");
> +	if (istate->keep_mmap) {
> +		istate->mmap = mmap;
> +		istate->mmap_size = mmap_size;
> +	}
>  	close(fd);
>  
>  	hdr = mmap;
> @@ -1626,10 +1630,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  		src_offset += 8;
>  		src_offset += extsize;
>  	}
> -	munmap(mmap, mmap_size);
> +	if (!istate->keep_mmap)
> +		munmap(mmap, mmap_size);
>  	return istate->cache_nr;
>  
>  unmap:
> +	istate->mmap = NULL;
>  	munmap(mmap, mmap_size);
>  	die("index file corrupt");
>  }
> @@ -1655,6 +1661,7 @@ int read_index_from(struct index_state *istate, const char *path)
>  		discard_index(split_index->base);
>  	else
>  		split_index->base = xcalloc(1, sizeof(*split_index->base));
> +	split_index->base->keep_mmap = istate->keep_mmap;
>  	ret = do_read_index(split_index->base,
>  			    git_path("sharedindex.%s",
>  				     sha1_to_hex(split_index->base_sha1)), 1);
> @@ -1698,6 +1705,10 @@ int discard_index(struct index_state *istate)
>  	free(istate->cache);
>  	istate->cache = NULL;
>  	istate->cache_alloc = 0;
> +	if (istate->keep_mmap && istate->mmap) {
> +		munmap(istate->mmap, istate->mmap_size);
> +		istate->mmap = NULL;
> +	}
>  	discard_split_index(istate);
>  	free_untracked_cache(istate->untracked);
>  	istate->untracked = NULL;

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 18:36 ` [PATCH 04/19] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-03-09 23:09   ` Junio C Hamano
  2016-03-09 23:21     ` Junio C Hamano
                       ` (2 more replies)
  2016-03-15 11:52   ` Duy Nguyen
  1 sibling, 3 replies; 60+ messages in thread
From: Junio C Hamano @ 2016-03-09 23:09 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds

David Turner <dturner@twopensource.com> writes:

> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> Instead of reading the index from disk and worrying about disk
> corruption, the index is cached in memory (memory bit-flips happen
> too, but hopefully less often). The result is faster read. Read time
> is reduced by 70%.
>
> The biggest gain is not having to verify the trailing SHA-1, which
> takes lots of time especially on large index files. But this also
> opens doors for further optimiztions:
>
>  - we could create an in-memory format that's essentially the memory
>    dump of the index to eliminate most of parsing/allocation
>    overhead. The mmap'd memory can be used straight away. Experiment
>    [1] shows we could reduce read time by 88%.
>
>  - we could cache non-index info such as name hash
>
> The shared memory's name folows the template "git-<object>-<SHA1>"
> where <SHA1> is the trailing SHA-1 of the index file. <object> is
> "index" for cached index files (and may be "name-hash" for name-hash
> cache). If such shared memory exists, it contains the same index
> content as on disk. The content is already validated by the daemon and
> git won't validate it again (except comparing the trailing SHA-1s).

This indeed is an interesting approach; what is not explained but
must be is when the on-disk index is updated to reflect the reality
(if I am reading the explanation and the code right, while the
daemon is running, its in-core cache becomes the source of truth by
forcing everybody's read-index-from() to go to the daemon).  The
explanation could be "this is only for read side, and updating the
index happens via the traditional 'write a new file and rename it to
the final place' codepath, at which time the daemon must be told to
re-read it."

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 23:09   ` Junio C Hamano
@ 2016-03-09 23:21     ` Junio C Hamano
  2016-03-10  0:01       ` David Turner
  2016-03-10 11:17       ` Duy Nguyen
  2016-03-10  0:18     ` David Turner
  2016-03-15 11:56     ` Duy Nguyen
  2 siblings, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2016-03-09 23:21 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds

Junio C Hamano <gitster@pobox.com> writes:

> David Turner <dturner@twopensource.com> writes:
>
>> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>>
>> Instead of reading the index from disk and worrying about disk
>> corruption, the index is cached in memory (memory bit-flips happen
>> too, but hopefully less often). The result is faster read. Read time
>> is reduced by 70%.
>>
>> The biggest gain is not having to verify the trailing SHA-1, which
>> takes lots of time especially on large index files.

Come to think of it, wouldn't it be far less intrusive change to
just put the index on a ramdisk and skip the trailing SHA-1
verification, to obtain a similar result with the same trade off
(i.e. blindly trusting memory instead of being paranoid)?

 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 23:21     ` Junio C Hamano
@ 2016-03-10  0:01       ` David Turner
  2016-03-10 11:17       ` Duy Nguyen
  1 sibling, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-10  0:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pclouds

On Wed, 2016-03-09 at 15:21 -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > David Turner <dturner@twopensource.com> writes:
> > 
> > > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > > 
> > > Instead of reading the index from disk and worrying about disk
> > > corruption, the index is cached in memory (memory bit-flips
> > > happen
> > > too, but hopefully less often). The result is faster read. Read
> > > time
> > > is reduced by 70%.
> > > 
> > > The biggest gain is not having to verify the trailing SHA-1,
> > > which
> > > takes lots of time especially on large index files.
> 
> Come to think of it, wouldn't it be far less intrusive change to
> just put the index on a ramdisk and skip the trailing SHA-1
> verification, to obtain a similar result with the same trade off
> (i.e. blindly trusting memory instead of being paranoid)?
> 

1. If the index were stored on a ramdisk, we would *also* have to write
it to durable storage to avoid losing the user's work when they power
off.  That's more complicated.  

2. Duy notes that it is easier to add further optimizations to this --
for instance, we could share a pre-parsed version of the index.  It
would be harder to add these optimizations to the disk format, because
(a) they would take up more space, and (b) they would probably be
harder to write in a single pass, which is presently how index writing
works.

3. If we wanted to just skip SHA-1 verification, we would not need a
ramdisk; it's almost certain that the index would be in the disk cache
most of the time, so regular storage should be very nearly as fast as a
ramdisk. I think mlock might help ensure that the data remains in the
cache, although I'm not sure what the permissions story is on that.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics
  2016-03-09 22:58   ` Junio C Hamano
@ 2016-03-10  0:05     ` David Turner
  2016-03-10 10:59       ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-10  0:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pclouds

On Wed, 2016-03-09 at 14:58 -0800, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
...
> > trace_stats() is intended for GIT_TRACE_*_STATS variable group and
> > GIT_TRACE_PACK_STATS is more like an example how new vars can be
> > added.
...

> > +	pack_access_nr++;
> >  }
> 
> This looks rather half-hearted, in that those who are interested in
> this new number can run "wc -l" on the pack-access trace log without
> adding a new "stats" anyway.  It may make the "stats" far more
> useful to keep track of the summary information of what would be
> written to the pack access log and add to the report_pack_stats()
> output things like the average and median distance of seeks
> (i.e. differences in the "obj_offset" into the same pack by two
> adjacent pack accesse) and the variance, for example?

On reflection, I do not think I need this patch; it was in Duy's series
and the index stats patch would need to be rewritten slightly to get
rid of it, but I'm OK with that.  I wanted to make minimal changes to
Duy's patches, but I'd rather make larger changes than do work on
patches I don't need.

So unless Duy objects, I'm going to drop this from the series.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading
  2016-03-09 23:02   ` Junio C Hamano
@ 2016-03-10  0:09     ` David Turner
  0 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-10  0:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pclouds

On Wed, 2016-03-09 at 15:02 -0800, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > 
> > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > ---
> 
> As usual in Duy's patches, this one seriously lacks "why".  And also
> makes the reader wonder if the memory region is ever unmapped() and
> if so under what condition.

How about this:

Later, we will introduce git index-helper to share this memory with
other git processes.

Since the memory will be shared, it will never be unmapped (although
the kernel may of course choose to page it out).

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 23:09   ` Junio C Hamano
  2016-03-09 23:21     ` Junio C Hamano
@ 2016-03-10  0:18     ` David Turner
  2016-03-15 11:56     ` Duy Nguyen
  2 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-10  0:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pclouds

On Wed, 2016-03-09 at 15:09 -0800, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > 
> > Instead of reading the index from disk and worrying about disk
> > corruption, the index is cached in memory (memory bit-flips happen
> > too, but hopefully less often). The result is faster read. Read
> > time
> > is reduced by 70%.
> > 
> > The biggest gain is not having to verify the trailing SHA-1, which
> > takes lots of time especially on large index files. But this also
> > opens doors for further optimiztions:
> > 
> >  - we could create an in-memory format that's essentially the
> > memory
> >    dump of the index to eliminate most of parsing/allocation
> >    overhead. The mmap'd memory can be used straight away.
> > Experiment
> >    [1] shows we could reduce read time by 88%.
> > 
> >  - we could cache non-index info such as name hash
> > 
> > The shared memory's name folows the template "git-<object>-<SHA1>"
> > where <SHA1> is the trailing SHA-1 of the index file. <object> is
> > "index" for cached index files (and may be "name-hash" for name
> > -hash
> > cache). If such shared memory exists, it contains the same index
> > content as on disk. The content is already validated by the daemon
> > and
> > git won't validate it again (except comparing the trailing SHA-1s).
> 
> This indeed is an interesting approach; what is not explained but
> must be is when the on-disk index is updated to reflect the reality
> (if I am reading the explanation and the code right, while the
> daemon is running, its in-core cache becomes the source of truth by
> forcing everybody's read-index-from() to go to the daemon).  The
> explanation could be "this is only for read side, and updating the
> index happens via the traditional 'write a new file and rename it to
> the final place' codepath, at which time the daemon must be told to
> re-read it."

This seems like the explanation (from the current commit message):

"Git can poke the daemon to tell it to refresh the index cache, or to
keep it alive some more minutes via UNIX signals. It can't give any
real index data directly to the daemon. Real data goes to disk first,
then the daemon reads and verifies it from there. Poking only happens
for $GIT_DIR/index, not temporary index files."

I guess this could be rewritten as:

"Index validity is ensured by the following method: When a read is
requested from the index-helper, it checks the SHA1 of its cached index
copy against the on-disk version.  If they differ, index-helper rereads
the index.  In addition, any git process may explicitly suggest a
reread via a UNIX signal, but this is only an optimization and it is
not necessary for correctness.

In addition, Git can signal the daemon with a heartbeat signal, to keep
the daemon alive longer."

How does that sound?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics
  2016-03-10  0:05     ` David Turner
@ 2016-03-10 10:59       ` Duy Nguyen
  0 siblings, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-10 10:59 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Git Mailing List

On Thu, Mar 10, 2016 at 7:05 AM, David Turner <dturner@twopensource.com> wrote:
> On Wed, 2016-03-09 at 14:58 -0800, Junio C Hamano wrote:
>> David Turner <dturner@twopensource.com> writes:
> ...
>> > trace_stats() is intended for GIT_TRACE_*_STATS variable group and
>> > GIT_TRACE_PACK_STATS is more like an example how new vars can be
>> > added.
> ...
>
>> > +   pack_access_nr++;
>> >  }
>>
>> This looks rather half-hearted, in that those who are interested in
>> this new number can run "wc -l" on the pack-access trace log without
>> adding a new "stats" anyway.  It may make the "stats" far more
>> useful to keep track of the summary information of what would be
>> written to the pack access log and add to the report_pack_stats()
>> output things like the average and median distance of seeks
>> (i.e. differences in the "obj_offset" into the same pack by two
>> adjacent pack accesse) and the variance, for example?
>
> On reflection, I do not think I need this patch; it was in Duy's series
> and the index stats patch would need to be rewritten slightly to get
> rid of it, but I'm OK with that.  I wanted to make minimal changes to
> Duy's patches, but I'd rather make larger changes than do work on
> patches I don't need.
>
> So unless Duy objects, I'm going to drop this from the series.

I wanted something to verify from the tests and this was the best I
could come up. But since I added trace_printf_key(&trace_exclude,... I
think that could serve better to show what's going on, and we can
easily summarize from the trace if we want to. That's one option. If
you can adjust the tests another way, I'm fine too.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 23:21     ` Junio C Hamano
  2016-03-10  0:01       ` David Turner
@ 2016-03-10 11:17       ` Duy Nguyen
  2016-03-10 20:22         ` David Turner
  1 sibling, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-10 11:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, Git Mailing List

On Thu, Mar 10, 2016 at 6:21 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> David Turner <dturner@twopensource.com> writes:
>>
>>> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>>>
>>> Instead of reading the index from disk and worrying about disk
>>> corruption, the index is cached in memory (memory bit-flips happen
>>> too, but hopefully less often). The result is faster read. Read time
>>> is reduced by 70%.
>>>
>>> The biggest gain is not having to verify the trailing SHA-1, which
>>> takes lots of time especially on large index files.
>
> Come to think of it, wouldn't it be far less intrusive change to
> just put the index on a ramdisk and skip the trailing SHA-1
> verification, to obtain a similar result with the same trade off
> (i.e. blindly trusting memory instead of being paranoid)?

I'm straying off-topic again because this reminded me about lmdb :)
For the record when I looked at lmdb I thought of using it as an index
format too. It has everything we wanted in index-v5. Good enough that
we would not need index-helper and split-index to work around current
index format. Though I finally decided it didn't fit well.

On the good side, lmdb is b+ trees, we can update directly on disk
(without rewriting whole file), read directly from mmap'd file and not
worry about corrupting it. There's no SHA-1 verification either (and
we can do crc32 per entry if we want too). It's good.

But, it does not fit well the our locking model (but we can change
this). And we sometimes want to create temporary throw-away indexes
(e.g. partial commits) which I don't think is easy to do. And the
reading directly from mmap does not give us much because we have to do
byte endian conversion  anyway.

Hmm.. on second thought, even though lmdb can't be the default format
(for a bunch of other limitations), it can still be an option for
super big worktrees, just like index-helper being an optional
optimization. Hm.. hm.. if we can support lmdb as index format in
addition to the current format, bringing back some work from Thomas..
We may still need a daemon or something to deal with watchman, but the
underlying mechanism is exactly the same... David, Junio, what do you
think?
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-10 11:17       ` Duy Nguyen
@ 2016-03-10 20:22         ` David Turner
  2016-03-11  1:11           ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-10 20:22 UTC (permalink / raw)
  To: Duy Nguyen, Junio C Hamano; +Cc: Git Mailing List

On Thu, 2016-03-10 at 18:17 +0700, Duy Nguyen wrote:
> On Thu, Mar 10, 2016 at 6:21 AM, Junio C Hamano <gitster@pobox.com>
> wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> > 
> > > David Turner <dturner@twopensource.com> writes:
> > > 
> > > > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > > > 
> > > > Instead of reading the index from disk and worrying about disk
> > > > corruption, the index is cached in memory (memory bit-flips
> > > > happen
> > > > too, but hopefully less often). The result is faster read. Read
> > > > time
> > > > is reduced by 70%.
> > > > 
> > > > The biggest gain is not having to verify the trailing SHA-1,
> > > > which
> > > > takes lots of time especially on large index files.
> > 
> > Come to think of it, wouldn't it be far less intrusive change to
> > just put the index on a ramdisk and skip the trailing SHA-1
> > verification, to obtain a similar result with the same trade off
> > (i.e. blindly trusting memory instead of being paranoid)?
> 
> I'm straying off-topic again because this reminded me about lmdb :)
> For the record when I looked at lmdb I thought of using it as an
> index
> format too. It has everything we wanted in index-v5. Good enough that
> we would not need index-helper and split-index to work around current
> index format. Though I finally decided it didn't fit well.
> 
> On the good side, lmdb is b+ trees, we can update directly on disk
> (without rewriting whole file), read directly from mmap'd file and
> not
> worry about corrupting it. There's no SHA-1 verification either (and
> we can do crc32 per entry if we want too). It's good.
> 
> But, it does not fit well the our locking model (but we can change
> this). And we sometimes want to create temporary throw-away indexes
> (e.g. partial commits) which I don't think is easy to do. And the
> reading directly from mmap does not give us much because we have to
> do
> byte endian conversion  anyway.
> 
> Hmm.. on second thought, even though lmdb can't be the default format
> (for a bunch of other limitations), it can still be an option for
> super big worktrees, just like index-helper being an optional
> optimization. Hm.. hm.. if we can support lmdb as index format in
> addition to the current format, bringing back some work from Thomas..
> We may still need a daemon or something to deal with watchman, but
> the
> underlying mechanism is exactly the same... David, Junio, what do you
> think?

LMDB makes writes faster, since we only have to write the stuff that's
changed.  That's good.

Reads (ignoring SHA verification) will be slightly slower (due to the
btree overhead).  If, in general, we only had to read part of the
index, that would be faster. But a fair amount of git code is written
to assume that it is reasonable to iterate over the whole index.  So I
don't see a win from just switching to LMDB without other changes.

(I guess we could parallelize index deserialization more easily with
lmdb, but I don't know 

The original intent of the index was to be a "staging area" to build up
a tree before committing it.  This implies an alternate view of the
index as a diff against some (possibly-empty) tree.  An index diff or
status operation, then, just requires iterating over the changes. 

I haven't really dug into merges to understand whether it would be
reasonable for them to use a representation like this, and what that
would do to speed there.

In general, git takes the commit side of the commit-diff duality. This
is generally a good thing, because commits are simpler to reason about.
But I wonder if we could do better for this specific case.

But I don't think switching to LMDB would replace index-helper.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-10 20:22         ` David Turner
@ 2016-03-11  1:11           ` Duy Nguyen
  0 siblings, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-11  1:11 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Git Mailing List

On Fri, Mar 11, 2016 at 3:22 AM, David Turner <dturner@twopensource.com> wrote:
> Reads (ignoring SHA verification) will be slightly slower (due to the
> btree overhead).  If, in general, we only had to read part of the
> index, that would be faster. But a fair amount of git code is written
> to assume that it is reasonable to iterate over the whole index.  So I
> don't see a win from just switching to LMDB without other changes.

I didn't think of the btree overhead. Will need to run some test.. I
think lmdb just gives us pointers to mmap'd file, so an entire index
reading can be fast. We just record where the on-disk data is. We need
to make index-on-lmdb use native byte endian though to avoid
conversion and actually reading all index entries because of that.

> (I guess we could parallelize index deserialization more easily with
> lmdb, but I don't know
>
> The original intent of the index was to be a "staging area" to build up
> a tree before committing it.  This implies an alternate view of the
> index as a diff against some (possibly-empty) tree.  An index diff or
> status operation, then, just requires iterating over the changes.
>
> I haven't really dug into merges to understand whether it would be
> reasonable for them to use a representation like this, and what that
> would do to speed there.

I think we keep the same internal representation of the index, a list
of paths with extra info like stat and some flags. Minimal impacts to
the code base that way. There are some cases (I think) where we just
need to read a portion of the index (filtered by pathspec). If we
don't update the index afterwards, lmdb partial reads definitely help.
I need to check if those cases are common (and worth optimizing for)
though.

> In general, git takes the commit side of the commit-diff duality. This
> is generally a good thing, because commits are simpler to reason about.
> But I wonder if we could do better for this specific case.
>
> But I don't think switching to LMDB would replace index-helper.

The point of index-helper is reduce reading cost (let's leave watchman
out for now). But it looks like we're not certain if using lmdb can
reduce reading cost at lower complexity. I'll give it some thought
over the weekend. Maybe in the end we better just go with
index-helper...
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 18:36 ` [PATCH 04/19] index-helper: new daemon for caching index and related stuff David Turner
  2016-03-09 23:09   ` Junio C Hamano
@ 2016-03-15 11:52   ` Duy Nguyen
  1 sibling, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-15 11:52 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@twopensource.com> wrote:
> Git can poke the daemon to tell it to refresh the index cache, or to
> keep it alive some more minutes via UNIX signals.

The reason I went with UNIX signals was because it made it possible to
make a simple GetMessage loop, the only thing I can remember from my
Windows time, on Windows later. It sounded clever, but because this is
more like UDP (vs TCP) it's harder for communication. For example, we
can't get a confirmation after a request... UNIX sockets would be more
natural.

Since this patch was written, watchman has gained Windows support. I
just looked at the code, it uses named pipe on Windows. So maybe we
can just go with that too (if only because it has been proven working
in practice) and we can go back to UNIX sockets on the *nix side. Too
bad we can't just copy some functions from watchman because of license
incompatibility. But we can leave Windows support to gfw team now, I
think.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-09 23:09   ` Junio C Hamano
  2016-03-09 23:21     ` Junio C Hamano
  2016-03-10  0:18     ` David Turner
@ 2016-03-15 11:56     ` Duy Nguyen
  2016-03-15 15:56       ` Junio C Hamano
  2 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-15 11:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, Git Mailing List

On Thu, Mar 10, 2016 at 6:09 AM, Junio C Hamano <gitster@pobox.com> wrote:
> David Turner <dturner@twopensource.com> writes:
>
>> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>>
>> Instead of reading the index from disk and worrying about disk
>> corruption, the index is cached in memory (memory bit-flips happen
>> too, but hopefully less often). The result is faster read. Read time
>> is reduced by 70%.
>>
>> The biggest gain is not having to verify the trailing SHA-1, which
>> takes lots of time especially on large index files. But this also
>> opens doors for further optimiztions:
>>
>>  - we could create an in-memory format that's essentially the memory
>>    dump of the index to eliminate most of parsing/allocation
>>    overhead. The mmap'd memory can be used straight away. Experiment
>>    [1] shows we could reduce read time by 88%.
>>
>>  - we could cache non-index info such as name hash
>>
>> The shared memory's name folows the template "git-<object>-<SHA1>"
>> where <SHA1> is the trailing SHA-1 of the index file. <object> is
>> "index" for cached index files (and may be "name-hash" for name-hash
>> cache). If such shared memory exists, it contains the same index
>> content as on disk. The content is already validated by the daemon and
>> git won't validate it again (except comparing the trailing SHA-1s).
>
> This indeed is an interesting approach; what is not explained but
> must be is when the on-disk index is updated to reflect the reality
> (if I am reading the explanation and the code right, while the
> daemon is running, its in-core cache becomes the source of truth by
> forcing everybody's read-index-from() to go to the daemon).  The
> explanation could be "this is only for read side, and updating the
> index happens via the traditional 'write a new file and rename it to
> the final place' codepath, at which time the daemon must be told to
> re-read it."

Another aspect that's not mentioned is, we keep this daemon's logic as
thin as possible. The "brain" stays in git. So the daemon can read and
validate stuff, but that's about all it's allowed to do. It's not
supposed to add/create new contents. It's not even allowed to accept
direct updates from git.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-09 18:36 ` [PATCH 18/19] index-helper: autorun David Turner
@ 2016-03-15 12:12   ` Duy Nguyen
  2016-03-15 14:26     ` Johannes Schindelin
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-15 12:12 UTC (permalink / raw)
  To: David Turner, Johannes Schindelin; +Cc: Git Mailing List

On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@twopensource.com> wrote:
> Introduce a new config option, indexhelper.autorun, to automatically
> run git index-helper before starting up a builtin git command.  This
> enables users to keep index-helper running without manual
> intervention.

This could be a problem on Windows because "index-helper --detach"
does not work there. I have no idea how "daemons" are managed on
Windows and not sure if our design is still good when such a "daemon"
is added on Windows. So I'm pulling Johannes in for his opinions.

Background for Johannes. We're adding "git index-helper" daemon (one
per repo) to cache the index in memory to speed up index load time
(and in future probably name-hash too, I think it's also more often
used on Windows because of case-insensitive fs). It also enables
watchman (on Windows) for faster refresh. This patch allows to start
the daemon automatically if it's not running. But I don't know it will
work ok on Windows.

Assuming that "index-helper" service has to be installed and started
from system, there can only be one service running right? This clashes
with the per-repo daemon design... I think it can stilf work, if the
main service just spawns new process, one for each repo. But again I'm
not sure.

> Signed-off-by: David Turner <dturner@twopensource.com>
> ---
>  git.c                   | 38 +++++++++++++++++++++++++++++++++++++-
>  index-helper.c          | 11 ++++++++++-
>  t/t7900-index-helper.sh | 10 ++++++++++
>  3 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/git.c b/git.c
> index a4f6f71..ccf04ab 100644
> --- a/git.c
> +++ b/git.c
> @@ -521,6 +521,40 @@ static void strip_extension(const char **argv)
>  #define strip_extension(cmd)
>  #endif
>
> +static int want_auto_index_helper(void)
> +{
> +       int want = -1;
> +       const char *value = NULL;
> +       const char *key = "indexhelper.autorun";
> +
> +       if (git_config_key_is_valid(key) &&
> +           !git_config_get_value(key, &value)) {
> +               int b = git_config_maybe_bool(key, value);
> +               want = b >= 0 ? b : 0;
> +       }
> +       return want;
> +}
> +
> +static void maybe_run_index_helper(struct cmd_struct *cmd)
> +{
> +       const char *argv[] = {"git-index-helper", "--detach", "--auto", NULL};
> +       int status;
> +
> +       if (!(cmd->option & NEED_WORK_TREE))
> +               return;
> +
> +       if (want_auto_index_helper() <= 0)
> +               return;
> +
> +       trace_argv_printf(argv, "trace: auto index-helper:");
> +
> +       status = run_command_v_opt(argv, RUN_SILENT_EXEC_FAILURE | RUN_CLEAN_ON_EXIT);
> +
> +       if (status) {
> +               warning("You specified indexhelper.autorun, but running git-index-helper failed");
> +       }
> +}
> +
>  static void handle_builtin(int argc, const char **argv)
>  {
>         const char *cmd;
> @@ -536,8 +570,10 @@ static void handle_builtin(int argc, const char **argv)
>         }
>
>         builtin = get_builtin(cmd);
> -       if (builtin)
> +       if (builtin) {
> +               maybe_run_index_helper(builtin);
>                 exit(run_builtin(builtin, argc, argv));
> +       }
>  }
>
>  static void execv_dashed_external(const char **argv)
> diff --git a/index-helper.c b/index-helper.c
> index a75da60..bc5c328 100644
> --- a/index-helper.c
> +++ b/index-helper.c
> @@ -379,6 +379,7 @@ int main(int argc, char **argv)
>         int idle_in_minutes = 10, detach = 0;
>         int ignore_existing = 0;
>         int kill_existing = 0;
> +       int nongit = 0, autorun = 0;
>         const char *pid_file;
>         struct option options[] = {
>                 OPT_INTEGER(0, "exit-after", &idle_in_minutes,
> @@ -388,6 +389,7 @@ int main(int argc, char **argv)
>                 OPT_BOOL(0, "detach", &detach, "detach the process"),
>                 OPT_BOOL(0, "ignore-existing", &ignore_existing, "run even if another index-helper seems to be running for this repo"),
>                 OPT_BOOL(0, "kill", &kill_existing, "kill any running index-helper for this repo"),
> +               OPT_BOOL(0, "auto", &autorun, "this is an automatic run of git index-helper, so certain errors can be solved by silently exiting"),
>                 OPT_END()
>         };
>
> @@ -397,11 +399,18 @@ int main(int argc, char **argv)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage_with_options(usage_text, options);
>
> -       prefix = setup_git_directory();
> +       prefix = setup_git_directory_gently(&nongit);
>         if (parse_options(argc, (const char **)argv, prefix,
>                           options, usage_text, 0))
>                 die(_("too many arguments"));
>
> +       if (nongit) {
> +               if (autorun)
> +                       exit(0);
> +               else
> +                       die("Not a git repository");
> +       }
> +
>         if (ignore_existing && kill_existing)
>                 die(_("--ignore-existing and --kill don't make sense together"));
>
> diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
> index 6708180..e4f9564 100755
> --- a/t/t7900-index-helper.sh
> +++ b/t/t7900-index-helper.sh
> @@ -20,4 +20,14 @@ test_expect_success 'index-helper creates usable pid file and can be killed' '
>         ! kill -0 $pid
>  '
>
> +test_expect_success 'index-helper autorun works' '
> +       rm -f .git/index-helper.pid &&
> +       git status &&
> +       test_path_is_missing .git/index-helper.pid &&
> +       test_config indexhelper.autorun true &&
> +       git status &&
> +       test_path_is_file .git/index-helper.pid &&
> +       git index-helper --kill
> +'
> +
>  test_done
> --
> 2.4.2.767.g62658d5-twtrsrc
>



-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/19] hack: watchman/untracked cache mashup
  2016-03-09 18:36 ` [PATCH 19/19] hack: watchman/untracked cache mashup David Turner
@ 2016-03-15 12:31   ` Duy Nguyen
  2016-03-17  0:56     ` David Turner
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-15 12:31 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@twopensource.com> wrote:
>  static struct watchman_query *make_query(const char *last_update)
> @@ -60,8 +61,24 @@ static void update_index(struct index_state *istate,
>                         continue;
>
>                 pos = index_name_pos(istate, wm->name, strlen(wm->name));
> -               if (pos < 0)
> +               if (pos < 0) {
> +                       if (istate->untracked) {
> +                               char *name = xstrdup(wm->name);
> +                               char *dname = dirname(name);
> +
> +                               /*
> +                                * dirname() returns '.' for the root,
> +                                * but we call it ''.
> +                                */
> +                               if (dname[0] == '.' && dname[1] == 0)
> +                                       string_list_append(&istate->untracked->invalid_untracked, "");
> +                               else
> +                                       string_list_append(&istate->untracked->invalid_untracked,
> +                                                          dname);
> +                               free(name);
> +                       }
>                         continue;
> +               }

So if we detect an updated file that's not in the index, we are
prepared to invalidate that path, correct? We may invalidate more than
necessary if that's true. Imagine a.o is already ignored. If it's
rebuilt, we should not need to update untracked cache.

What I had in mind (and argued with watchman devs a bit [1]) was
maintain the file list of each clock and compare the file list of
different clocks to figure out what files are added or deleted. Then
we can generate invalidation list more accurately. We need to keep at
least one file list corresponds to  the clock saved in the index. When
we get the refresh request, we get a new file list (with new clock),
do a diff then save the invalidation list as an extension. Once we
notice that new clock is written back in index, we can discard older
file lists. In theory we should not need to keep too many file lists,
so even if one list is big, it should not be a big problem.

I have a note with me about race conditions with this approach, but I
haven't remembered exactly why or how yet.. My recent thoughts about
it though, are race conditions when you do "git status" is probably
tolerable. After all if you're doing some I/O when you do git-status,
you're guaranteed to miss some updates.

[1] https://github.com/facebook/watchman/issues/65
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-15 12:12   ` Duy Nguyen
@ 2016-03-15 14:26     ` Johannes Schindelin
  2016-03-16 11:37       ` Duy Nguyen
  2016-03-16 18:11       ` David Turner
  0 siblings, 2 replies; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-15 14:26 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Hi Duy,

On Tue, 15 Mar 2016, Duy Nguyen wrote:

> On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@twopensource.com> wrote:
> > Introduce a new config option, indexhelper.autorun, to automatically
> > run git index-helper before starting up a builtin git command.  This
> > enables users to keep index-helper running without manual
> > intervention.
> 
> This could be a problem on Windows because "index-helper --detach"
> does not work there. I have no idea how "daemons" are managed on
> Windows and not sure if our design is still good when such a "daemon"
> is added on Windows. So I'm pulling Johannes in for his opinions.
> 
> Background for Johannes. We're adding "git index-helper" daemon (one
> per repo) to cache the index in memory to speed up index load time
> (and in future probably name-hash too, I think it's also more often
> used on Windows because of case-insensitive fs). It also enables
> watchman (on Windows) for faster refresh. This patch allows to start
> the daemon automatically if it's not running. But I don't know it will
> work ok on Windows.
> 
> Assuming that "index-helper" service has to be installed and started
> from system, there can only be one service running right? This clashes
> with the per-repo daemon design... I think it can stilf work, if the
> main service just spawns new process, one for each repo. But again I'm
> not sure.

If we want to run the process as a Windows service, you are correct, there
really can only be one. Worse: it runs with admin privileges.

But why not just keep it running as a detached process? We can run those
on Windows, and if we're opening a named pipe whose name reveals the
one-to-one mapping with the index in question, I think we are fine (read:
we can detect whether the process is running already).

We can even tell those processes to have a timeout, or to react to other
system events.

Please note that I am *very* interested in this feature (speeding up index
operations).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
  2016-03-15 11:56     ` Duy Nguyen
@ 2016-03-15 15:56       ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2016-03-15 15:56 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> Another aspect that's not mentioned is, we keep this daemon's logic as
> thin as possible. The "brain" stays in git. So the daemon can read and
> validate stuff, but that's about all it's allowed to do. It's not
> supposed to add/create new contents. It's not even allowed to accept
> direct updates from git.

That explanation does make the intent clear. It is a kind of design
decision that needs to be made early and that is hard to change
later (I am not saying I see the need of changing, though), so it is
worth stating explicitly to guide future readers and updaters of the
code.

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-15 14:26     ` Johannes Schindelin
@ 2016-03-16 11:37       ` Duy Nguyen
  2016-03-16 18:11       ` David Turner
  1 sibling, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-16 11:37 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List

On Tue, Mar 15, 2016 at 9:26 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Tue, 15 Mar 2016, Duy Nguyen wrote:
>
>> On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@twopensource.com> wrote:
>> > Introduce a new config option, indexhelper.autorun, to automatically
>> > run git index-helper before starting up a builtin git command.  This
>> > enables users to keep index-helper running without manual
>> > intervention.
>>
>> This could be a problem on Windows because "index-helper --detach"
>> does not work there. I have no idea how "daemons" are managed on
>> Windows and not sure if our design is still good when such a "daemon"
>> is added on Windows. So I'm pulling Johannes in for his opinions.
>>
>> Background for Johannes. We're adding "git index-helper" daemon (one
>> per repo) to cache the index in memory to speed up index load time
>> (and in future probably name-hash too, I think it's also more often
>> used on Windows because of case-insensitive fs). It also enables
>> watchman (on Windows) for faster refresh. This patch allows to start
>> the daemon automatically if it's not running. But I don't know it will
>> work ok on Windows.
>>
>> Assuming that "index-helper" service has to be installed and started
>> from system, there can only be one service running right? This clashes
>> with the per-repo daemon design... I think it can stilf work, if the
>> main service just spawns new process, one for each repo. But again I'm
>> not sure.
>
> If we want to run the process as a Windows service, you are correct, there
> really can only be one. Worse: it runs with admin privileges.
>
> But why not just keep it running as a detached process?

Because I'm a Windows ignorant who assumed that it's the only way.

> We can run those
> on Windows, and if we're opening a named pipe whose name reveals the
> one-to-one mapping with the index in question, I think we are fine (read:
> we can detect whether the process is running already).
>
> We can even tell those processes to have a timeout, or to react to other
> system events.

So what's you're saying is, we can make this --detach option works,
like how it's daemonized in *nix, right? Then we should have no
problem with this autorun patch. Thanks for checking.

> Please note that I am *very* interested in this feature (speeding up index
> operations).

Yeah I thought so ;-) I'm counting on gfw devs to complete the Windows
part, if this series gets in.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 09/19] index-helper: add Windows support
  2016-03-09 18:36 ` [PATCH 09/19] index-helper: add Windows support David Turner
@ 2016-03-16 11:42   ` Duy Nguyen
  2016-03-17 12:18     ` Johannes Schindelin
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-16 11:42 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

I think it's safe to drop this patch now. It's mostly to prove that it
could work on Windows. But I don't think it's tested a lot (even I
only occasionally test it under wine).

On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@twopensource.com> wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> Windows supports shared memory, but the semantics is a bit different
> than POSIX shm. The most noticeable thing is there's no way to get the
> shared memory's size by the reader, and wrapping fstat to do that
> would be hell. So the shm size is added near the end, hidden away from
> shm users (storing it in headers would cause more problems with munmap,
> storing it as a separate shm is even worse).
>
> PostMessage is used instead of UNIX signals for
> notification. Lightweight (at least code-wise) on the client side.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  config.mak.uname |  2 ++
>  index-helper.c   | 48 ++++++++++++++++++++++++++++
>  read-cache.c     | 13 ++++++++
>  shm.c            | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 159 insertions(+)
>
> diff --git a/config.mak.uname b/config.mak.uname
> index b5108e1..49320c7 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -394,6 +394,7 @@ ifndef DEBUG
>  else
>         BASIC_CFLAGS += -Zi -MDd
>  endif
> +       PROGRAM_OBJS += index-helper.o
>         X = .exe
>  endif
>  ifeq ($(uname_S),Interix)
> @@ -574,6 +575,7 @@ else
>                 NO_CURL = YesPlease
>         endif
>  endif
> +       PROGRAM_OBJS += index-helper.o
>  endif
>  ifeq ($(uname_S),QNX)
>         COMPAT_CFLAGS += -DSA_RESTART=0
> diff --git a/index-helper.c b/index-helper.c
> index 4dd9656..cf26da7 100644
> --- a/index-helper.c
> +++ b/index-helper.c
> @@ -155,6 +155,51 @@ static void loop(const char *pid_file, int idle_in_seconds)
>                 ; /* do nothing, all is handled by signal handlers already */
>  }
>
> +#elif defined(GIT_WINDOWS_NATIVE)
> +
> +static void loop(const char *pid_file, int idle_in_seconds)
> +{
> +       HWND hwnd;
> +       UINT_PTR timer = 0;
> +       MSG msg;
> +       HINSTANCE hinst = GetModuleHandle(NULL);
> +       WNDCLASS wc;
> +
> +       /*
> +        * Emulate UNIX signals by sending WM_USER+x to a
> +        * window. Register window class and create a new window to
> +        * catch these messages.
> +        */
> +       memset(&wc, 0, sizeof(wc));
> +       wc.lpfnWndProc   = DefWindowProc;
> +       wc.hInstance     = hinst;
> +       wc.lpszClassName = "git-index-helper";
> +       if (!RegisterClass(&wc))
> +               die_errno(_("could not register new window class"));
> +
> +       hwnd = CreateWindow("git-index-helper", pid_file,
> +                           0, 0, 0, 1, 1, NULL, NULL, hinst, NULL);
> +       if (!hwnd)
> +               die_errno(_("could not register new window"));
> +
> +       refresh(0);
> +       while (1) {
> +               timer = SetTimer(hwnd, timer, idle_in_seconds * 1000, NULL);
> +               if (!timer)
> +                       die(_("no timer!"));
> +               if (!GetMessage(&msg, hwnd, 0, 0) || msg.message == WM_TIMER)
> +                       break;
> +               switch (msg.message) {
> +               case WM_USER:
> +                       refresh(0);
> +                       break;
> +               default:
> +                       /* just reset the timer */
> +                       break;
> +               }
> +       }
> +}
> +
>  #else
>
>  static void loop(const char *pid_file, int idle_in_seconds)
> @@ -198,6 +243,9 @@ int main(int argc, char **argv)
>         fd = hold_lock_file_for_update(&lock,
>                                        git_path("index-helper.pid"),
>                                        LOCK_DIE_ON_ERROR);
> +#ifdef GIT_WINDOWS_NATIVE
> +       strbuf_addstr(&sb, "HWND");
> +#endif
>         strbuf_addf(&sb, "%" PRIuMAX, (uintmax_t) getpid());
>         write_in_full(fd, sb.buf, sb.len);
>         commit_lock_file(&lock);
> diff --git a/read-cache.c b/read-cache.c
> index 1a0ab0c..16fbdf6 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1546,6 +1546,18 @@ static void post_read_index_from(struct index_state *istate)
>         tweak_untracked_cache(istate);
>  }
>
> +#if defined(GIT_WINDOWS_NATIVE)
> +static void do_poke(struct strbuf *sb, int refresh_cache)
> +{
> +       HWND hwnd;
> +       if (!starts_with(sb->buf, "HWND"))
> +               return;
> +       hwnd = FindWindow("git-index-helper", sb->buf);
> +       if (!hwnd)
> +               return;
> +       PostMessage(hwnd, refresh_cache ? WM_USER : WM_USER + 1, 0, 0);
> +}
> +#else
>  static void do_poke(struct strbuf *sb, int refresh_cache)
>  {
>         char    *start = sb->buf;
> @@ -1555,6 +1567,7 @@ static void do_poke(struct strbuf *sb, int refresh_cache)
>                 return;
>         kill(pid, refresh_cache ? SIGHUP : SIGUSR1);
>  }
> +#endif
>
>  static void poke_daemon(struct index_state *istate,
>                         const struct stat *st, int refresh_cache)
> diff --git a/shm.c b/shm.c
> index 4ec1a00..04d8a35 100644
> --- a/shm.c
> +++ b/shm.c
> @@ -52,6 +52,102 @@ void git_shm_unlink(const char *fmt, ...)
>         shm_unlink(path);
>  }
>
> +#elif defined(GIT_WINDOWS_NATIVE)
> +
> +#define SHM_PATH_LEN 82        /* a little bit longer than POSIX because of "Local\\" */
> +
> +static ssize_t create_shm_map(int oflag, int perm, ssize_t length,
> +                             void **mmap, int prot, int flags,
> +                             const char *path, unsigned long page_size)
> +{
> +       size_t real_length;
> +       void *last_page;
> +       HANDLE h;
> +
> +       assert(perm   == 0700);
> +       assert(oflag  == (O_CREAT | O_EXCL | O_RDWR));
> +       assert(prot   == (PROT_READ | PROT_WRITE));
> +       assert(flags  == MAP_SHARED);
> +       assert(length >= 0);
> +
> +       real_length = length;
> +       if (real_length % page_size)
> +               real_length += page_size - (real_length % page_size);
> +       real_length += page_size;
> +       h = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0,
> +                             real_length, path);
> +       if (!h)
> +               return -1;
> +       *mmap = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, real_length);
> +       CloseHandle(h);
> +       if (!*mmap)
> +               return -1;
> +       last_page = (unsigned char *)*mmap + real_length - page_size;
> +       *(unsigned long *)last_page = length;
> +       return length;
> +}
> +
> +static ssize_t open_shm_map(int oflag, int perm, ssize_t length, void **mmap,
> +                           int prot, int flags, const char *path,
> +                           unsigned long page_size)
> +{
> +       void *last_page;
> +       HANDLE h;
> +
> +       assert(perm   == 0700);
> +       assert(oflag  == O_RDONLY);
> +       assert(prot   == PROT_READ);
> +       assert(flags  == MAP_SHARED);
> +       assert(length <= 0);
> +
> +       h = OpenFileMapping(FILE_MAP_READ, FALSE, path);
> +       if (!h)
> +               return -1;
> +       *mmap = MapViewOfFile(h, FILE_MAP_READ, 0, 0, 0);
> +       CloseHandle(h);
> +       if (!*mmap)
> +               return -1;
> +       if (length < 0) {
> +               MEMORY_BASIC_INFORMATION mbi;
> +               if (!VirtualQuery(*mmap, &mbi, sizeof(mbi))) {
> +                       UnmapViewOfFile(*mmap);
> +                       return -1;
> +               }
> +               if (mbi.RegionSize % page_size)
> +                       die("expected size %lu to be %lu aligned",
> +                                   mbi.RegionSize, page_size);
> +               last_page = (unsigned char *)*mmap + mbi.RegionSize - page_size;
> +               length = *(unsigned long *)last_page;
> +       }
> +       return length;
> +}
> +
> +ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
> +                   int prot, int flags, const char *fmt, ...)
> +{
> +       SYSTEM_INFO si;
> +       va_list ap;
> +       char path[SHM_PATH_LEN];
> +
> +       GetSystemInfo(&si);
> +
> +       strcpy(path, "Local\\");
> +       va_start(ap, fmt);
> +       vsprintf(path + strlen(path), fmt, ap);
> +       va_end(ap);
> +
> +       if (oflag & O_CREAT)
> +               return create_shm_map(oflag, perm, length, mmap, prot,
> +                                     flags, path, si.dwPageSize);
> +       else
> +               return open_shm_map(oflag, perm, length, mmap, prot,
> +                                   flags, path, si.dwPageSize);
> +}
> +
> +void git_shm_unlink(const char *fmt, ...)
> +{
> +}
> +
>  #else
>
>  ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
> --
> 2.4.2.767.g62658d5-twtrsrc
>



-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-15 14:26     ` Johannes Schindelin
  2016-03-16 11:37       ` Duy Nguyen
@ 2016-03-16 18:11       ` David Turner
  2016-03-16 18:27         ` Johannes Schindelin
  1 sibling, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-16 18:11 UTC (permalink / raw)
  To: Johannes Schindelin, Duy Nguyen; +Cc: Git Mailing List

On Tue, 2016-03-15 at 15:26 +0100, Johannes Schindelin wrote:
> Hi Duy,
> 
> On Tue, 15 Mar 2016, Duy Nguyen wrote:
> 
> > On Thu, Mar 10, 2016 at 1:36 AM, David Turner <
> > dturner@twopensource.com> wrote:
> > > Introduce a new config option, indexhelper.autorun, to
> > > automatically
> > > run git index-helper before starting up a builtin git command. 
> > >  This
> > > enables users to keep index-helper running without manual
> > > intervention.
> > 
> > This could be a problem on Windows because "index-helper --detach"
> > does not work there. I have no idea how "daemons" are managed on
> > Windows and not sure if our design is still good when such a
> > "daemon"
> > is added on Windows. So I'm pulling Johannes in for his opinions.
> > 
> > Background for Johannes. We're adding "git index-helper" daemon
> > (one
> > per repo) to cache the index in memory to speed up index load time
> > (and in future probably name-hash too, I think it's also more often
> > used on Windows because of case-insensitive fs). It also enables
> > watchman (on Windows) for faster refresh. This patch allows to
> > start
> > the daemon automatically if it's not running. But I don't know it
> > will
> > work ok on Windows.
> > 
> > Assuming that "index-helper" service has to be installed and
> > started
> > from system, there can only be one service running right? This
> > clashes
> > with the per-repo daemon design... I think it can stilf work, if
> > the
> > main service just spawns new process, one for each repo. But again
> > I'm
> > not sure.
> 
> If we want to run the process as a Windows service, you are correct,
> there
> really can only be one. Worse: it runs with admin privileges.
> 
> But why not just keep it running as a detached process? We can run
> those
> on Windows, and if we're opening a named pipe whose name reveals the
> one-to-one mapping with the index in question, I think we are fine
> (read:
> we can detect whether the process is running already).
> 
> We can even tell those processes to have a timeout, or to react to
> other
> system events.
> 
> Please note that I am *very* interested in this feature (speeding up
> index
> operations).

I don't understand what a "detached process" is on Windows (I have
never done any real Windows programming). Does that mean "call
daemonize() and it'll take care of it?"  Or something else?  Or should
I just not worry about it and let you take care of it?

Also, I'll figure out how to switch to named pipes. 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-16 18:11       ` David Turner
@ 2016-03-16 18:27         ` Johannes Schindelin
  2016-03-17 13:02           ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-16 18:27 UTC (permalink / raw)
  To: David Turner; +Cc: Duy Nguyen, Git Mailing List

Hi David,

On Wed, 16 Mar 2016, David Turner wrote:

> I don't understand what a "detached process" is on Windows (I have never
> done any real Windows programming). Does that mean "call daemonize() and
> it'll take care of it?"  Or something else?  Or should I just not worry
> about it and let you take care of it?

Every process is detached by default, when created via CreateProcess().
There is no fork() followed by a daemonize(). If you want to wait for a
process to finish, you have to wait for it explicitly.

But yeah, why don't you let me worry about this?

I am much more concerned about concurrent accesses and the communication
between the Git processes and the index-helper. Writing to the .pid file
sounds very fragile to me, in particular when multiple processes can poke
the index-helper in succession and some readers are unaware that the index
is being refreshed.

This is just a quick note, and I should go into more detail, also about
the topic branch my colleague Jeff Hostetler and I are working on, that is
similar in spirit but wants to use a home-rolled alternative to watchman.
I will go into more detail, but that will have to wait for tomorrow, as I
need to log off for the day.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/19] hack: watchman/untracked cache mashup
  2016-03-15 12:31   ` Duy Nguyen
@ 2016-03-17  0:56     ` David Turner
  2016-03-17 13:06       ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-17  0:56 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Tue, 2016-03-15 at 19:31 +0700, Duy Nguyen wrote:
> On Thu, Mar 10, 2016 at 1:36 AM, David Turner <
> dturner@twopensource.com> wrote:
> >  static struct watchman_query *make_query(const char *last_update)
> > @@ -60,8 +61,24 @@ static void update_index(struct index_state
> > *istate,
> >                         continue;
> > 
> >                 pos = index_name_pos(istate, wm->name, strlen(wm
> > ->name));
> > -               if (pos < 0)
> > +               if (pos < 0) {
> > +                       if (istate->untracked) {
> > +                               char *name = xstrdup(wm->name);
> > +                               char *dname = dirname(name);
> > +
> > +                               /*
> > +                                * dirname() returns '.' for the
> > root,
> > +                                * but we call it ''.
> > +                                */
> > +                               if (dname[0] == '.' && dname[1] ==
> > 0)
> > +                                       string_list_append(&istate
> > ->untracked->invalid_untracked, "");
> > +                               else
> > +                                       string_list_append(&istate
> > ->untracked->invalid_untracked,
> > +                                                          dname);
> > +                               free(name);
> > +                       }
> >                         continue;
> > +               }
> 
> So if we detect an updated file that's not in the index, we are
> prepared to invalidate that path, correct? We may invalidate more
> than
> necessary if that's true. Imagine a.o is already ignored. If it's
> rebuilt, we should not need to update untracked cache.

Yes, that's true.  But it would be true with the mtime system too. This
is no worse, even if it's no better.  In-tree builds are a hard case to
support, and I'm totally OK with a system that encourages out-of-tree
builds.

We could check if it's ignored, but then if someone changes their
gitignore, we could be wrong.

Or we could suggest that people make their watchmanignore match their
gitignore.


> What I had in mind (and argued with watchman devs a bit [1]) was
> maintain the file list of each clock and compare the file list of
> different clocks to figure out what files are added or deleted. Then
> we can generate invalidation list more accurately. We need to keep at
> least one file list corresponds to  the clock saved in the index.
> When
> we get the refresh request, we get a new file list (with new clock),
> do a diff then save the invalidation list as an extension. Once we
> notice that new clock is written back in index, we can discard older
> file lists. In theory we should not need to keep too many file lists,
> so even if one list is big, it should not be a big problem.
> 
> I have a note with me about race conditions with this approach, but I
> haven't remembered exactly why or how yet.. My recent thoughts about
> it though, are race conditions when you do "git status" is probably
> tolerable. After all if you're doing some I/O when you do git-status,
> you're guaranteed to miss some updates.
> 
> [1] https://github.com/facebook/watchman/issues/65

I think it would be possible to just check the UNTR extension and only
add a dir to it if that dir doesn't already contain (untracked) the
file that's being modified.  Or is that also racy?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 09/19] index-helper: add Windows support
  2016-03-16 11:42   ` Duy Nguyen
@ 2016-03-17 12:18     ` Johannes Schindelin
  2016-03-17 12:59       ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-17 12:18 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Hi Duy,

On Wed, 16 Mar 2016, Duy Nguyen wrote:

> I think it's safe to drop this patch now. It's mostly to prove that it
> could work on Windows. But I don't think it's tested a lot (even I
> only occasionally test it under wine).

I agree that it is more important to get the basics right (such as:
concurrent access, elegant interface to watchman, pluggable backends
different than watchman, support for *other* Git clients modifying the
index under index-helper's feet, etc).

We will try to keep the Windows end going at the same time.

Please note that there is one very important difference between Windows vs
Linux when it comes to shared memory: on Windows, the memory is released
as soon as the last user goes away.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 09/19] index-helper: add Windows support
  2016-03-17 12:18     ` Johannes Schindelin
@ 2016-03-17 12:59       ` Duy Nguyen
  0 siblings, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-17 12:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List

On Thu, Mar 17, 2016 at 7:18 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Please note that there is one very important difference between Windows vs
> Linux when it comes to shared memory: on Windows, the memory is released
> as soon as the last user goes away.

I assume that can be taken care of without changing anything else
fundamentally, or you need to tell us now. I guess that by making
index-helper hold the shared memory all the time, we can avoid that.
You may also want to check if my solution for the lack of ftruncate on
Windows shm is correct.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-16 18:27         ` Johannes Schindelin
@ 2016-03-17 13:02           ` Duy Nguyen
  2016-03-17 14:43             ` Johannes Schindelin
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-17 13:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List

On Thu, Mar 17, 2016 at 1:27 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> I am much more concerned about concurrent accesses and the communication
> between the Git processes and the index-helper. Writing to the .pid file
> sounds very fragile to me, in particular when multiple processes can poke
> the index-helper in succession and some readers are unaware that the index
> is being refreshed.

It's not that bad. We should have protection in place to deal with
this and fall back to reading directly from file when things get
suspicious. But I agree that sending UNIX signals (or PostMessage) is
not really good communication.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/19] hack: watchman/untracked cache mashup
  2016-03-17  0:56     ` David Turner
@ 2016-03-17 13:06       ` Duy Nguyen
  2016-03-17 18:08         ` David Turner
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-17 13:06 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Thu, Mar 17, 2016 at 7:56 AM, David Turner <dturner@twopensource.com> wrote:
>> So if we detect an updated file that's not in the index, we are
>> prepared to invalidate that path, correct? We may invalidate more
>> than
>> necessary if that's true. Imagine a.o is already ignored. If it's
>> rebuilt, we should not need to update untracked cache.
>
> Yes, that's true.  But it would be true with the mtime system too. This
> is no worse, even if it's no better.  In-tree builds are a hard case to
> support, and I'm totally OK with a system that encourages out-of-tree
> builds.
>
> We could check if it's ignored, but then if someone changes their
> gitignore, we could be wrong.
>
> Or we could suggest that people make their watchmanignore match their
> gitignore.

So your purpose is to reduce stat() on those "quiet" directories?
Putting it that way, yes it's reasonable to do this way. I want to "do
better" but we can do in steps and the first step "no worse, no
better" is already good.

>> What I had in mind (and argued with watchman devs a bit [1]) was
>> maintain the file list of each clock and compare the file list of
>> different clocks to figure out what files are added or deleted. Then
>> we can generate invalidation list more accurately. We need to keep at
>> least one file list corresponds to  the clock saved in the index.
>> When
>> we get the refresh request, we get a new file list (with new clock),
>> do a diff then save the invalidation list as an extension. Once we
>> notice that new clock is written back in index, we can discard older
>> file lists. In theory we should not need to keep too many file lists,
>> so even if one list is big, it should not be a big problem.
>>
>> I have a note with me about race conditions with this approach, but I
>> haven't remembered exactly why or how yet.. My recent thoughts about
>> it though, are race conditions when you do "git status" is probably
>> tolerable. After all if you're doing some I/O when you do git-status,
>> you're guaranteed to miss some updates.
>>
>> [1] https://github.com/facebook/watchman/issues/65
>
> I think it would be possible to just check the UNTR extension and only
> add a dir to it if that dir doesn't already contain (untracked) the
> file that's being modified.  Or is that also racy?

Sorry can't comment on this racy thing yet. Still haven't sat down,
redrawn the timeline and retraced what I thought months ago :)
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-17 13:02           ` Duy Nguyen
@ 2016-03-17 14:43             ` Johannes Schindelin
  2016-03-17 18:31               ` David Turner
  2016-03-18  0:50               ` Duy Nguyen
  0 siblings, 2 replies; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-17 14:43 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Hi Duy,

On Thu, 17 Mar 2016, Duy Nguyen wrote:

> On Thu, Mar 17, 2016 at 1:27 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > I am much more concerned about concurrent accesses and the communication
> > between the Git processes and the index-helper. Writing to the .pid file
> > sounds very fragile to me, in particular when multiple processes can poke
> > the index-helper in succession and some readers are unaware that the index
> > is being refreshed.
> 
> It's not that bad.

Well, the way I read the code it is possible that:

1. Git process 1 starts, reading the index
2. Git process 2 starts, poking the index-helper
3. The index-helper updates the .pid file (why not set a bit in the shared
   memory?) with a prefix "W"
4. Git process 2 reads the .pid file and waits for the "W" to go away
   (what if index-helper is not fast enough to write the "W"?)
5. Git process 1 access the index, happily oblivious that it is being
   updated and the data is in an inconsistent state

> We should have protection in place to deal with this and fall back to
> reading directly from file when things get suspicious.

I really want to prevent that. I know of use cases where the index weighs
300MB, and falling back to reading it directly *really* hurts.

> But I agree that sending UNIX signals (or PostMessage) is not really
> good communication.

Yeah, I really would like two-way communication instead. Named pipes?
They'd have the advantage that you could use the full path to the index as
identifier.

The way I read the current code, we would actually create a different
shared memory every time the index changes because its checksum is part of
the shared memory's "path"...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/19] hack: watchman/untracked cache mashup
  2016-03-17 13:06       ` Duy Nguyen
@ 2016-03-17 18:08         ` David Turner
  0 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-17 18:08 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Thu, 2016-03-17 at 20:06 +0700, Duy Nguyen wrote:
> On Thu, Mar 17, 2016 at 7:56 AM, David Turner <
> dturner@twopensource.com> wrote:
> > > So if we detect an updated file that's not in the index, we are
> > > prepared to invalidate that path, correct? We may invalidate more
> > > than
> > > necessary if that's true. Imagine a.o is already ignored. If it's
> > > rebuilt, we should not need to update untracked cache.
> > 
> > Yes, that's true.  But it would be true with the mtime system too.
> > This
> > is no worse, even if it's no better.  In-tree builds are a hard
> > case to
> > support, and I'm totally OK with a system that encourages out-of
> > -tree
> > builds.
> > 
> > We could check if it's ignored, but then if someone changes their
> > gitignore, we could be wrong.
> > 
> > Or we could suggest that people make their watchmanignore match
> > their
> > gitignore.
> 
> So your purpose is to reduce stat() on those "quiet" directories?

Yes.  Twitter's repo is perhaps somewhat unusual in that it has a very
"bushy" directory structure -- tens of thousands of directories.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-17 14:43             ` Johannes Schindelin
@ 2016-03-17 18:31               ` David Turner
  2016-03-18  0:50               ` Duy Nguyen
  1 sibling, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-17 18:31 UTC (permalink / raw)
  To: Johannes Schindelin, Duy Nguyen; +Cc: Git Mailing List

On Thu, 2016-03-17 at 15:43 +0100, Johannes Schindelin wrote:
> Hi Duy,
> 
> On Thu, 17 Mar 2016, Duy Nguyen wrote:
> 
> > On Thu, Mar 17, 2016 at 1:27 AM, Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > > I am much more concerned about concurrent accesses and the
> > > communication
> > > between the Git processes and the index-helper. Writing to the
> > > .pid file
> > > sounds very fragile to me, in particular when multiple processes
> > > can poke
> > > the index-helper in succession and some readers are unaware that
> > > the index
> > > is being refreshed.
> > 
> > It's not that bad.
> 
> Well, the way I read the code it is possible that:
> 
> 1. Git process 1 starts, reading the index
> 2. Git process 2 starts, poking the index-helper
> 3. The index-helper updates the .pid file (why not set a bit in the
> shared
>    memory?) with a prefix "W"
> 4. Git process 2 reads the .pid file and waits for the "W" to go away
>    (what if index-helper is not fast enough to write the "W"?)
> 5. Git process 1 access the index, happily oblivious that it is being
>    updated and the data is in an inconsistent state


That's not quite how I understand it.  It's more like MVCC.  Writes to
the index go to a new index file.  Index files are identified by their
SHA.  Reads from the index go into a new shm, identified by SHA.  

The "W" is set only once -- it just means "this index helper knows how
to talk to watchman".  It's a compile-time option.

(I'm going to change this anyway when I switch to named pipes).

The watchman data is shared independently; if it's not ready in time
(whatever that means -- it's 1s in the current code), then read-cache
should fall back to brute-force checking every file.

> > We should have protection in place to deal with this and fall back
> > to
> > reading directly from file when things get suspicious.
> 
> I really want to prevent that. I know of use cases where the index
> weighs
> 300MB, and falling back to reading it directly *really* hurts.





> > But I agree that sending UNIX signals (or PostMessage) is not
> > really
> > good communication.
> 
> Yeah, I really would like two-way communication instead. Named pipes?
> They'd have the advantage that you could use the full path to the
> index as
> identifier.
> 
> The way I read the current code, we would actually create a different
> shared memory every time the index changes because its checksum is
> part of
> the shared memory's "path"...
> 
> Ciao,
> Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-17 14:43             ` Johannes Schindelin
  2016-03-17 18:31               ` David Turner
@ 2016-03-18  0:50               ` Duy Nguyen
  2016-03-18  7:14                 ` Johannes Schindelin
  2016-03-18  7:17                 ` Johannes Schindelin
  1 sibling, 2 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-18  0:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List

On Thu, Mar 17, 2016 at 9:43 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Thu, 17 Mar 2016, Duy Nguyen wrote:
>
>> On Thu, Mar 17, 2016 at 1:27 AM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>> > I am much more concerned about concurrent accesses and the communication
>> > between the Git processes and the index-helper. Writing to the .pid file
>> > sounds very fragile to me, in particular when multiple processes can poke
>> > the index-helper in succession and some readers are unaware that the index
>> > is being refreshed.
>>
>> It's not that bad.
>
> Well, the way I read the code it is possible that:
>
> 1. Git process 1 starts, reading the index
> 2. Git process 2 starts, poking the index-helper
> 3. The index-helper updates the .pid file (why not set a bit in the shared
>    memory?) with a prefix "W"
> 4. Git process 2 reads the .pid file and waits for the "W" to go away
>    (what if index-helper is not fast enough to write the "W"?)
> 5. Git process 1 access the index, happily oblivious that it is being
>    updated and the data is in an inconsistent state

No, if process 1 reads the index file, then that file will remain
consistent/unchanged all the time. index-helper is not allowed to
touch that file at all.

The process 2 gets the index content from shm (cached by the index
helper), verifies that it's good (with the signature at the end of the
shm). If watchman is used, process 2 can also read the list of
modified files from another shm, combine it with the in-core index,
then write it down the normal way. Only then process 1 (or process 3)
can see the new index content from the file.

>> We should have protection in place to deal with this and fall back to
>> reading directly from file when things get suspicious.
>
> I really want to prevent that. I know of use cases where the index weighs
> 300MB, and falling back to reading it directly *really* hurts.

For crying out loud, what do you store in that repo? What I have in
mind for all these works are indexes in 10MB range, or maybe 50MB max.

Very unscientifically, git.git index is about 274kb and contains ~3000
entries, so 94 bytes per entry on average. With a 300MB index , the
extrapolated number of entries is about 3 millions! At around 1
million index entries, I think it's time to just use a database as
index.

>> But I agree that sending UNIX signals (or PostMessage) is not really
>> good communication.
>
> Yeah, I really would like two-way communication instead. Named pipes?
> They'd have the advantage that you could use the full path to the index as
> identifier.

Yep.

> The way I read the current code, we would actually create a different
> shared memory every time the index changes because its checksum is part of
> the shared memory's "path"...

Yep. shm objects are "immutable", pretty much like git objects. But
now that I think of it, I don't know how cheap/expensive shm creation
operation is on Windows.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18  0:50               ` Duy Nguyen
@ 2016-03-18  7:14                 ` Johannes Schindelin
  2016-03-18  7:44                   ` Duy Nguyen
  2016-03-18  7:17                 ` Johannes Schindelin
  1 sibling, 1 reply; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-18  7:14 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Hi Duy,

On Fri, 18 Mar 2016, Duy Nguyen wrote:

> On Thu, Mar 17, 2016 at 9:43 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > I know of use cases where the index weighs 300MB, and falling back to
> > reading it directly *really* hurts.
> 
> For crying out loud, what do you store in that repo? What I have in
> mind for all these works are indexes in 10MB range, or maybe 50MB max.

Welcome to the real world.

> Very unscientifically, git.git index is about 274kb and contains ~3000
> entries, so 94 bytes per entry on average.

In terms of software projects' size, git.git is but a toy. Most developers
deal with vastly larger (and often messier) repositories. This is
especially true outside Open Source. Even the Linux kernel's repository is
*tiny* compared to real-world repositories.

I am sure that David could tell many a tale about repository/working
directory size, too.

So yeah, this is the challenge: to make Git work at real-world scale
(didn't we hear a lot about this at the latest Git Merge?)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18  0:50               ` Duy Nguyen
  2016-03-18  7:14                 ` Johannes Schindelin
@ 2016-03-18  7:17                 ` Johannes Schindelin
  2016-03-18  7:34                   ` Duy Nguyen
  1 sibling, 1 reply; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-18  7:17 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Hi Duy,

On Fri, 18 Mar 2016, Duy Nguyen wrote:

> > Well, the way I read the code it is possible that:
> >
> > 1. Git process 1 starts, reading the index
> > 2. Git process 2 starts, poking the index-helper
> > 3. The index-helper updates the .pid file (why not set a bit in the shared
> >    memory?) with a prefix "W"
> > 4. Git process 2 reads the .pid file and waits for the "W" to go away
> >    (what if index-helper is not fast enough to write the "W"?)
> > 5. Git process 1 access the index, happily oblivious that it is being
> >    updated and the data is in an inconsistent state
> 
> No, if process 1 reads the index file, then that file will remain
> consistent/unchanged all the time. index-helper is not allowed to
> touch that file at all.
> 
> The process 2 gets the index content from shm (cached by the index
> helper), verifies that it's good (with the signature at the end of the
> shm). If watchman is used, process 2 can also read the list of
> modified files from another shm, combine it with the in-core index,
> then write it down the normal way. Only then process 1 (or process 3)
> can see the new index content from the file.

So how do you deal with releasing the shared memory instances that are
essentially created for every index update?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18  7:17                 ` Johannes Schindelin
@ 2016-03-18  7:34                   ` Duy Nguyen
  2016-03-18 15:57                     ` Johannes Schindelin
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-18  7:34 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List

On Fri, Mar 18, 2016 at 2:17 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Fri, 18 Mar 2016, Duy Nguyen wrote:
>
>> > Well, the way I read the code it is possible that:
>> >
>> > 1. Git process 1 starts, reading the index
>> > 2. Git process 2 starts, poking the index-helper
>> > 3. The index-helper updates the .pid file (why not set a bit in the shared
>> >    memory?) with a prefix "W"
>> > 4. Git process 2 reads the .pid file and waits for the "W" to go away
>> >    (what if index-helper is not fast enough to write the "W"?)
>> > 5. Git process 1 access the index, happily oblivious that it is being
>> >    updated and the data is in an inconsistent state
>>
>> No, if process 1 reads the index file, then that file will remain
>> consistent/unchanged all the time. index-helper is not allowed to
>> touch that file at all.
>>
>> The process 2 gets the index content from shm (cached by the index
>> helper), verifies that it's good (with the signature at the end of the
>> shm). If watchman is used, process 2 can also read the list of
>> modified files from another shm, combine it with the in-core index,
>> then write it down the normal way. Only then process 1 (or process 3)
>> can see the new index content from the file.
>
> So how do you deal with releasing the shared memory instances that are
> essentially created for every index update?

When index-helper reads the index file and realizes the file has been
updated (based on trailing SHA-1), it unlink()s the old shm and
prepares new shm. If no process is accessing old shm, it's gone. If
not, it stays until nobody elses uses it. shm on Windows behaves the
same way, I believe.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18  7:14                 ` Johannes Schindelin
@ 2016-03-18  7:44                   ` Duy Nguyen
  2016-03-18 17:22                     ` David Turner
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2016-03-18  7:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List

On Fri, Mar 18, 2016 at 2:14 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Fri, 18 Mar 2016, Duy Nguyen wrote:
>
>> On Thu, Mar 17, 2016 at 9:43 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>>
>> > I know of use cases where the index weighs 300MB, and falling back to
>> > reading it directly *really* hurts.
>>
>> For crying out loud, what do you store in that repo? What I have in
>> mind for all these works are indexes in 10MB range, or maybe 50MB max.
>
> Welcome to the real world.
>
>> Very unscientifically, git.git index is about 274kb and contains ~3000
>> entries, so 94 bytes per entry on average.
>
> In terms of software projects' size, git.git is but a toy. Most developers
> deal with vastly larger (and often messier) repositories. This is
> especially true outside Open Source. Even the Linux kernel's repository is
> *tiny* compared to real-world repositories.
>
> I am sure that David could tell many a tale about repository/working
> directory size, too.

I know a few real-world repos, decades old, but I don't remember if
any of them reached this size. Did I get that 3 million entry number
right? Because even my whole /home uses over 1m inodes. And whole /usr
only has 300k files. If the number of entries is lower, maybe there's
some improvement we can do to reduce index size a bit. There's some
fast compression we can do, for starter.

> So yeah, this is the challenge: to make Git work at real-world scale
> (didn't we hear a lot about this at the latest Git Merge?)

I'm all for making Junio cry by using Git for what it is (or was) not
intended for, but this seems too much. A repo about 500k files or
less, I think I can deal with,  not those in million range.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18  7:34                   ` Duy Nguyen
@ 2016-03-18 15:57                     ` Johannes Schindelin
  0 siblings, 0 replies; 60+ messages in thread
From: Johannes Schindelin @ 2016-03-18 15:57 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List

Hi Duy,

On Fri, 18 Mar 2016, Duy Nguyen wrote:

> On Fri, Mar 18, 2016 at 2:17 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Fri, 18 Mar 2016, Duy Nguyen wrote:
> >
> >> > Well, the way I read the code it is possible that:
> >> >
> >> > 1. Git process 1 starts, reading the index
> >> > 2. Git process 2 starts, poking the index-helper
> >> > 3. The index-helper updates the .pid file (why not set a bit in the shared
> >> >    memory?) with a prefix "W"
> >> > 4. Git process 2 reads the .pid file and waits for the "W" to go away
> >> >    (what if index-helper is not fast enough to write the "W"?)
> >> > 5. Git process 1 access the index, happily oblivious that it is being
> >> >    updated and the data is in an inconsistent state
> >>
> >> No, if process 1 reads the index file, then that file will remain
> >> consistent/unchanged all the time. index-helper is not allowed to
> >> touch that file at all.
> >>
> >> The process 2 gets the index content from shm (cached by the index
> >> helper), verifies that it's good (with the signature at the end of the
> >> shm). If watchman is used, process 2 can also read the list of
> >> modified files from another shm, combine it with the in-core index,
> >> then write it down the normal way. Only then process 1 (or process 3)
> >> can see the new index content from the file.
> >
> > So how do you deal with releasing the shared memory instances that are
> > essentially created for every index update?
> 
> When index-helper reads the index file and realizes the file has been
> updated (based on trailing SHA-1), it unlink()s the old shm and
> prepares new shm. If no process is accessing old shm, it's gone. If
> not, it stays until nobody elses uses it. shm on Windows behaves the
> same way, I believe.

Ah. On Windows, you do not even have to unlink: once the last user's gone,
the shared memory's gone, too.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18  7:44                   ` Duy Nguyen
@ 2016-03-18 17:22                     ` David Turner
  2016-03-18 23:09                       ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: David Turner @ 2016-03-18 17:22 UTC (permalink / raw)
  To: Duy Nguyen, Johannes Schindelin; +Cc: Git Mailing List

On Fri, 2016-03-18 at 14:44 +0700, Duy Nguyen wrote:
> > So yeah, this is the challenge: to make Git work at real-world
> > scale
> > (didn't we hear a lot about this at the latest Git Merge?)
> 
> I'm all for making Junio cry by using Git for what it is (or was) not
> intended for, but this seems too much. A repo about 500k files or
> less, I think I can deal with,  not those in million range.

Sad news: Facebook's repo was already getting towards that size three
years ago:
http://comments.gmane.org/gmane.comp.version-control.git/189776

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/19] index-helper: autorun
  2016-03-18 17:22                     ` David Turner
@ 2016-03-18 23:09                       ` Duy Nguyen
  0 siblings, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2016-03-18 23:09 UTC (permalink / raw)
  To: David Turner; +Cc: Johannes Schindelin, Git Mailing List

On Sat, Mar 19, 2016 at 12:22 AM, David Turner <dturner@twopensource.com> wrote:
> On Fri, 2016-03-18 at 14:44 +0700, Duy Nguyen wrote:
>> > So yeah, this is the challenge: to make Git work at real-world
>> > scale
>> > (didn't we hear a lot about this at the latest Git Merge?)
>>
>> I'm all for making Junio cry by using Git for what it is (or was) not
>> intended for, but this seems too much. A repo about 500k files or
>> less, I think I can deal with,  not those in million range.
>
> Sad news: Facebook's repo was already getting towards that size three
> years ago:
> http://comments.gmane.org/gmane.comp.version-control.git/189776

I've been doing this for too long I forgot about numbers in that mail.
The good news is I also forgot that I tried an artificial index of 200
MB nearly two years ago [1]. 100ms read time from that test is not
bad. But Johannes' point stands, it will hurt if index-helper is
bypassed at this size.

[1] http://article.gmane.org/gmane.comp.version-control.git/248771
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/19] index-helper, watchman
  2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
                   ` (18 preceding siblings ...)
  2016-03-09 18:36 ` [PATCH 19/19] hack: watchman/untracked cache mashup David Turner
@ 2016-03-29 17:09 ` Torsten Bögershausen
  2016-03-29 21:51   ` David Turner
  19 siblings, 1 reply; 60+ messages in thread
From: Torsten Bögershausen @ 2016-03-29 17:09 UTC (permalink / raw)
  To: David Turner, git, pclouds

On 2016-03-09 19.36, David Turner wrote:
> This is a rebase and extension of Duy's work on git index-helper and
> watchman support.
>
Somewhere we need to tweak something:
t7900 do all fail under Mac OS, because the index-helper is not build.

The best would be to have a precondition when running the tests ?

t7900-index-helper.sh   not ok 1 - index-helper smoke test
t7900-index-helper.sh   not ok 2 - index-helper creates usable pipe file and can
be killed
t7900-index-helper.sh   not ok 3 - index-helper will not start if already running
t7900-index-helper.sh   not ok 4 - index-helper is quiet with --autorun
t7900-index-helper.sh   not ok 5 - index-helper autorun works


The other thing is to enable SHM on other platforms, but first things first.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/19] index-helper, watchman
  2016-03-29 17:09 ` [PATCH 00/19] index-helper, watchman Torsten Bögershausen
@ 2016-03-29 21:51   ` David Turner
  0 siblings, 0 replies; 60+ messages in thread
From: David Turner @ 2016-03-29 21:51 UTC (permalink / raw)
  To: Torsten Bögershausen, git, pclouds

On Tue, 2016-03-29 at 19:09 +0200, Torsten Bögershausen wrote:
> On 2016-03-09 19.36, David Turner wrote:
> > This is a rebase and extension of Duy's work on git index-helper
> > and
> > watchman support.
> > 
> Somewhere we need to tweak something:
> t7900 do all fail under Mac OS, because the index-helper is not
> build.
> 
> The best would be to have a precondition when running the tests ?
> 
> t7900-index-helper.sh   not ok 1 - index-helper smoke test
> t7900-index-helper.sh   not ok 2 - index-helper creates usable pipe
> file and can
> be killed
> t7900-index-helper.sh   not ok 3 - index-helper will not start if
> already running
> t7900-index-helper.sh   not ok 4 - index-helper is quiet with -
> -autorun
> t7900-index-helper.sh   not ok 5 - index-helper autorun works
> 
> 
> The other thing is to enable SHM on other platforms, but first things
> first.

Will fix, thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2016-03-29 21:51 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
2016-03-09 18:36 ` [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics David Turner
2016-03-09 22:58   ` Junio C Hamano
2016-03-10  0:05     ` David Turner
2016-03-10 10:59       ` Duy Nguyen
2016-03-09 18:36 ` [PATCH 02/19] read-cache.c: fix constness of verify_hdr() David Turner
2016-03-09 18:36 ` [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading David Turner
2016-03-09 23:02   ` Junio C Hamano
2016-03-10  0:09     ` David Turner
2016-03-09 18:36 ` [PATCH 04/19] index-helper: new daemon for caching index and related stuff David Turner
2016-03-09 23:09   ` Junio C Hamano
2016-03-09 23:21     ` Junio C Hamano
2016-03-10  0:01       ` David Turner
2016-03-10 11:17       ` Duy Nguyen
2016-03-10 20:22         ` David Turner
2016-03-11  1:11           ` Duy Nguyen
2016-03-10  0:18     ` David Turner
2016-03-15 11:56     ` Duy Nguyen
2016-03-15 15:56       ` Junio C Hamano
2016-03-15 11:52   ` Duy Nguyen
2016-03-09 18:36 ` [PATCH 05/19] trace.c: add GIT_TRACE_INDEX_STATS for index statistics David Turner
2016-03-09 18:36 ` [PATCH 06/19] index-helper: add --strict David Turner
2016-03-09 18:36 ` [PATCH 07/19] daemonize(): set a flag before exiting the main process David Turner
2016-03-09 18:36 ` [PATCH 08/19] index-helper: add --detach David Turner
2016-03-09 18:36 ` [PATCH 09/19] index-helper: add Windows support David Turner
2016-03-16 11:42   ` Duy Nguyen
2016-03-17 12:18     ` Johannes Schindelin
2016-03-17 12:59       ` Duy Nguyen
2016-03-09 18:36 ` [PATCH 10/19] read-cache: add watchman 'WAMA' extension David Turner
2016-03-09 18:36 ` [PATCH 11/19] Add watchman support to reduce index refresh cost David Turner
2016-03-09 18:36 ` [PATCH 12/19] read-cache: allow index-helper to prepare shm before git reads it David Turner
2016-03-09 18:36 ` [PATCH 13/19] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-03-09 18:36 ` [PATCH 14/19] update-index: enable/disable watchman support David Turner
2016-03-09 18:36 ` [PATCH 15/19] unpack-trees: preserve index extensions David Turner
2016-03-09 18:36 ` [PATCH 16/19] index-helper: rewrite pidfile after daemonizing David Turner
2016-03-09 18:36 ` [PATCH 17/19] index-helper: process management David Turner
2016-03-09 18:36 ` [PATCH 18/19] index-helper: autorun David Turner
2016-03-15 12:12   ` Duy Nguyen
2016-03-15 14:26     ` Johannes Schindelin
2016-03-16 11:37       ` Duy Nguyen
2016-03-16 18:11       ` David Turner
2016-03-16 18:27         ` Johannes Schindelin
2016-03-17 13:02           ` Duy Nguyen
2016-03-17 14:43             ` Johannes Schindelin
2016-03-17 18:31               ` David Turner
2016-03-18  0:50               ` Duy Nguyen
2016-03-18  7:14                 ` Johannes Schindelin
2016-03-18  7:44                   ` Duy Nguyen
2016-03-18 17:22                     ` David Turner
2016-03-18 23:09                       ` Duy Nguyen
2016-03-18  7:17                 ` Johannes Schindelin
2016-03-18  7:34                   ` Duy Nguyen
2016-03-18 15:57                     ` Johannes Schindelin
2016-03-09 18:36 ` [PATCH 19/19] hack: watchman/untracked cache mashup David Turner
2016-03-15 12:31   ` Duy Nguyen
2016-03-17  0:56     ` David Turner
2016-03-17 13:06       ` Duy Nguyen
2016-03-17 18:08         ` David Turner
2016-03-29 17:09 ` [PATCH 00/19] index-helper, watchman Torsten Bögershausen
2016-03-29 21:51   ` David Turner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.