* [PATCH 0/6] inotify support
@ 2014-01-12 11:03 Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 1/6] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
                   ` (7 more replies)
  0 siblings, 8 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

It's been 37 weeks since Robert Zeh's attempt to bring inotify support
to Git [1] and unless I missed some mails, no updates since. So here's
another attempt with my preferred approach (can't help it, playing
with your own ideas is more fun than improving other people's code).

To compare to Robert's approach:

- This one uses a UNIX datagram socket. If I read the man page right,
  a UNIX socket respects the containing directory's permissions, which
  means that on normal repos only the user's own processes can access
  it. On shared repos, multiple users can access it. This should work
  on Mac. Windows will need a different transport.

- The daemon is dumb. It passes the paths around and that's it.
  lstat() is done by git. If I design it right, there should not be
  any race conditions that make git miss file updates.

- CE_VALID is reused to avoid mass changes (granted, there are other
  neat ways as well). I quite like the idea of machine-controlled
  CE_VALID.
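
To illustrate the first point, here is a minimal sketch (not code from
this series) of a round trip over UNIX datagram sockets. It uses
socketpair() instead of a socket bound under $GIT_DIR, so it says
nothing about directory permissions; it only shows that each send
arrives as one whole message. dgram_roundtrip is a name made up for
this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one datagram over a connected pair of UNIX sockets and read it
 * back. Returns the received length, or -1 on error. On success `out`
 * is NUL-terminated.
 */
ssize_t dgram_roundtrip(const char *msg, char *out, size_t outsz)
{
	int sv[2];
	ssize_t len;

	assert(msg && out && outsz > 0);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], msg, strlen(msg), 0) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	/* One recv() returns exactly one datagram (truncated if too long). */
	len = recv(sv[1], out, outsz - 1, 0);
	if (len >= 0)
		out[len] = '\0';
	close(sv[0]);
	close(sv[1]);
	return len;
}
```

A real watcher would bind() one end to a path inside the repository
and use sendto()/recvfrom(), as the later patches do.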

inotify support has the potential of reducing syscalls in
read_directory() as well. I wrote about using lstat() to reduce
readdir() a while back, if that's implemented then inotify will fit in
nicely.

This is just a proof of concept. I'm sure I haven't handled all error
cases very well. The first five patches show the protocol and git
side's changes. The last one fills inotify in.

[1] http://thread.gmane.org/gmane.comp.version-control.git/215820/focus=222278

Nguyễn Thái Ngọc Duy (6):
  read-cache: save trailing sha-1
  read-cache: new extension to mark what file is watched
  read-cache: connect to file watcher
  read-cache: get "updated" path list from file watcher
  read-cache: ask file watcher to watch files
  file-watcher: support inotify

 .gitignore           |   1 +
 Makefile             |   1 +
 cache.h              |   4 +
 config.mak.uname     |   1 +
 file-watcher.c (new) | 329 +++++++++++++++++++++++++++++++++++++++++++++++++++
 git-compat-util.h    |   5 +
 pkt-line.c           |   2 +-
 pkt-line.h           |   2 +
 read-cache.c         | 280 ++++++++++++++++++++++++++++++++++++++++++-
 wrapper.c            |  27 +++++
 10 files changed, 645 insertions(+), 7 deletions(-)
 create mode 100644 file-watcher.c

-- 
1.8.5.2.240.g8478abd


* [PATCH 1/6] read-cache: save trailing sha-1
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
@ 2014-01-12 11:03 ` Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 2/6] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This will be used as a signature to know if the index has changed.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      | 1 +
 read-cache.c | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index ce377e1..7f7f306 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
+	unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 33dd676..3b6daf1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1269,10 +1269,11 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(struct cache_header *hdr,
+		      unsigned long size,
+		      unsigned char *sha1)
 {
 	git_SHA_CTX c;
-	unsigned char sha1[20];
 	int hdr_version;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
@@ -1461,7 +1462,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	close(fd);
 
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	if (verify_hdr(hdr, mmap_size, istate->sha1) < 0)
 		goto unmap;
 
 	istate->version = ntohl(hdr->hdr_version);
-- 
1.8.5.2.240.g8478abd


* [PATCH 2/6] read-cache: new extension to mark what file is watched
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 1/6] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
@ 2014-01-12 11:03 ` Nguyễn Thái Ngọc Duy
  2014-01-13 17:02   ` Jonathan Nieder
  2014-01-14  1:39   ` Duy Nguyen
  2014-01-12 11:03 ` [PATCH 3/6] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

If an entry is "watched", git lets an external program decide whether
the entry is modified or not. It works much like --assume-unchanged,
but is designed to be controlled by a machine.

We are running out of on-disk ce_flags, so instead of extending the
on-disk entry format again, "watched" flags are in-core only and are
stored as an index extension instead.
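
As a rough standalone sketch of the extension payload (one bit per
index entry, entry i in bit i % 8 of byte i / 8), the packing can be
written as below. The watch_bitmap_* names and the plain int flag
array are assumptions for this example; the patch itself works on
struct cache_entry and CE_WATCHED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to hold one bit per entry. */
size_t watch_bitmap_size(size_t nr)
{
	return (nr + 7) / 8;
}

/* Set bit (i % 8) of byte (i / 8) for every entry whose flag is nonzero. */
void watch_bitmap_pack(const int *flags, size_t nr, uint8_t *data)
{
	size_t i;

	assert(flags && data);
	memset(data, 0, watch_bitmap_size(nr));
	for (i = 0; i < nr; i++)
		if (flags[i])
			data[i / 8] |= (uint8_t)(1u << (i % 8));
}

/*
 * Reverse of watch_bitmap_pack(). Returns -1 when the payload size
 * does not match the number of entries, 0 otherwise.
 */
int watch_bitmap_unpack(const uint8_t *data, size_t sz, int *flags, size_t nr)
{
	size_t i;

	if (watch_bitmap_size(nr) != sz)
		return -1;
	for (i = 0; i < nr; i++)
		flags[i] = (data[i / 8] >> (i % 8)) & 1;
	return 0;
}
```

The size check mirrors the "invalid 'WATC' extension" test: a payload
whose length disagrees with the entry count is rejected as a whole.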

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  1 +
 read-cache.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 7f7f306..dfa8622 100644
--- a/cache.h
+++ b/cache.h
@@ -168,6 +168,7 @@ struct cache_entry {
 
 /* used to temporarily mark paths matched by pathspecs */
 #define CE_MATCHED           (1 << 26)
+#define CE_WATCHED           (1 << 27)
 
 /*
  * Extended on-disk flags
diff --git a/read-cache.c b/read-cache.c
index 3b6daf1..098d3b6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -33,6 +33,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
 
 struct index_state the_index;
 
@@ -1289,6 +1290,19 @@ static int verify_hdr(struct cache_header *hdr,
 	return 0;
 }
 
+static void read_watch_extension(struct index_state *istate, uint8_t *data,
+				 unsigned long sz)
+{
+	int i;
+	if ((istate->cache_nr + 7) / 8 != sz) {
+		error("invalid 'WATC' extension");
+		return;
+	}
+	for (i = 0; i < istate->cache_nr; i++)
+		if (data[i / 8] & (1 << (i % 8)))
+			istate->cache[i]->ce_flags |= CE_WATCHED;
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1299,6 +1313,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_WATCH:
+		read_watch_extension(istate, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1777,7 +1794,7 @@ int write_index(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
-	int i, err, removed, extended, hdr_version;
+	int i, err, removed, extended, hdr_version, has_watches = 0;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
 	struct stat st;
@@ -1786,6 +1803,8 @@ int write_index(struct index_state *istate, int newfd)
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
+		else if (cache[i]->ce_flags & CE_WATCHED)
+			has_watches++;
 
 		/* reduce extended entries if possible */
 		cache[i]->ce_flags &= ~CE_EXTENDED;
@@ -1857,6 +1876,26 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
+	if (has_watches) {
+		int id, sz = (entries - removed + 7) / 8;
+		uint8_t *data = xmalloc(sz);
+		memset(data, 0, sz);
+		for (i = 0, id = 0; i < entries && has_watches; i++) {
+			struct cache_entry *ce = cache[i];
+			if (ce->ce_flags & CE_REMOVE)
+				continue;
+			if (ce->ce_flags & CE_WATCHED) {
+				data[id / 8] |= 1 << (id % 8);
+				has_watches--;
+			}
+			id++;
+		}
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
+			|| ce_write(&c, newfd, data, sz) < 0;
+		free(data);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd) || fstat(newfd, &st))
 		return -1;
-- 
1.8.5.2.240.g8478abd


* [PATCH 3/6] read-cache: connect to file watcher
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 1/6] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 2/6] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
@ 2014-01-12 11:03 ` Nguyễn Thái Ngọc Duy
  2014-01-15 10:58   ` Jeff King
  2014-01-12 11:03 ` [PATCH 4/6] read-cache: get "updated" path list from " Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This patch establishes a connection between a new file watcher daemon
and git. Each index file may have at most one file watcher attached to
it. The file watcher maintains a UNIX socket at
$GIT_DIR/index.watcher. Any process that has write access to $GIT_DIR
can talk to the file watcher.

A validation is performed after git connects to the file watcher to
make sure both sides have the same view. This is done by exchanging
the index signature (*). The file watcher keeps a copy of the
signature locally while git computes the signature from the index. If
the signatures do not match, something has gone wrong, so both sides
reinitialize with respect to file watching: the file watcher clears
all watches while git clears CE_WATCHED flags.

If the signatures match, we can trust the file watcher and git can
start asking questions that are not important to this patch.

TODO: do not let git hang if the file watcher refuses to
answer. Timeout and move on without file watcher support after 20ms or
so.

(*) for current index versions, the signature is the index SHA-1
trailer. But it could be something else (e.g. v5 does not have SHA-1
trailer)
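
The check on git's side boils down to comparing the watcher's reply
with the "hello <signature>" line git itself would send. A minimal
standalone sketch, assuming a 20-byte signature printed as 40 hex
digits; format_hello and watcher_in_sync are hypothetical names, not
functions from this patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIG_RAWSZ 20			/* raw signature size in bytes */
#define HELLO_LEN (6 + 2 * SIG_RAWSZ)	/* strlen("hello ") + hex digits */

/* Write "hello <40 hex digits>" into buf, which must hold HELLO_LEN + 1. */
void format_hello(const unsigned char *sig, char *buf, size_t bufsz)
{
	size_t i, off;

	assert(sig && buf && bufsz >= HELLO_LEN + 1);
	off = (size_t)snprintf(buf, bufsz, "hello ");
	for (i = 0; i < SIG_RAWSZ; i++)
		off += (size_t)snprintf(buf + off, bufsz - off, "%02x", sig[i]);
}

/* Nonzero if the watcher's reply carries the same signature as ours. */
int watcher_in_sync(const unsigned char *sig, const char *reply)
{
	char expect[HELLO_LEN + 1];

	format_hello(sig, expect, sizeof(expect));
	return !strcmp(expect, reply);
}
```

Any mismatch, including a short or empty reply, is treated the same
way: the watcher is not trusted and the CE_WATCHED flags are dropped.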

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore           |   1 +
 Makefile             |   1 +
 cache.h              |   1 +
 file-watcher.c (new) | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++
 git-compat-util.h    |   5 ++
 read-cache.c         |  48 ++++++++++++++++++
 wrapper.c            |  27 ++++++++++
 7 files changed, 219 insertions(+)
 create mode 100644 file-watcher.c

diff --git a/.gitignore b/.gitignore
index b5f9def..12c78f0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -56,6 +56,7 @@
 /git-fast-import
 /git-fetch
 /git-fetch-pack
+/git-file-watcher
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
diff --git a/Makefile b/Makefile
index b4af1e2..ca5dc96 100644
--- a/Makefile
+++ b/Makefile
@@ -536,6 +536,7 @@ PROGRAMS += $(EXTRA_PROGRAMS)
 PROGRAM_OBJS += credential-store.o
 PROGRAM_OBJS += daemon.o
 PROGRAM_OBJS += fast-import.o
+PROGRAM_OBJS += file-watcher.o
 PROGRAM_OBJS += http-backend.o
 PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
diff --git a/cache.h b/cache.h
index dfa8622..6a182b5 100644
--- a/cache.h
+++ b/cache.h
@@ -281,6 +281,7 @@ struct index_state {
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
 	unsigned char sha1[20];
+	int watcher;
 };
 
 extern struct index_state the_index;
diff --git a/file-watcher.c b/file-watcher.c
new file mode 100644
index 0000000..66b44e5
--- /dev/null
+++ b/file-watcher.c
@@ -0,0 +1,136 @@
+#include "cache.h"
+#include "sigchain.h"
+
+static char index_signature[41];
+
+static int handle_command(int fd, char *msg, int msgsize)
+{
+	struct sockaddr_un sun;
+	int len;
+	socklen_t socklen;
+	const char *arg;
+
+	socklen = sizeof(sun);
+	len = recvfrom(fd, msg, msgsize, 0, &sun, &socklen);
+	if (!len)
+		return -1;
+	if (len == -1)
+		die_errno("read");
+	msg[len] = '\0';
+
+	if ((arg = skip_prefix(msg, "hello "))) {
+		sendtof(fd, 0, &sun, socklen, "hello %s", index_signature);
+		if (!strcmp(index_signature, arg))
+			return 0;
+		/*
+		 * Index SHA-1 mismatch, something has gone
+		 * wrong. Clean up and start over.
+		 */
+		strlcpy(index_signature, arg, sizeof(index_signature));
+	} else {
+		die("unrecognized command %s", msg);
+	}
+	return 0;
+}
+
+static const char *socket_path;
+static int do_not_clean_up;
+
+static void cleanup(void)
+{
+	if (do_not_clean_up)
+		return;
+	unlink(socket_path);
+}
+
+static void cleanup_on_signal(int signo)
+{
+	cleanup();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void daemonize(void)
+{
+#ifdef NO_POSIX_GOODIES
+	die("fork not supported on this platform");
+#else
+	switch (fork()) {
+		case 0:
+			break;
+		case -1:
+			die_errno("fork failed");
+		default:
+			do_not_clean_up = 1;
+			exit(0);
+	}
+	if (setsid() == -1)
+		die_errno("setsid failed");
+	close(0);
+	close(1);
+	close(2);
+	sanitize_stdfds();
+#endif
+}
+
+int main(int argc, char **argv)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct sockaddr_un sun;
+	struct pollfd pfd[2];
+	int fd, err, nr;
+	int msgsize;
+	char *msg;
+	socklen_t vallen = sizeof(msgsize);
+	int no_daemon = 0;
+
+	if (argc > 1 && !strcmp(argv[1], "--no-daemon")) {
+		no_daemon = 1;
+		argv++;
+		argc--;
+	}
+	if (argc < 2)
+		die("insufficient arguments");
+	socket_path = argv[1];
+	memset(index_signature, 0, sizeof(index_signature));
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	sun.sun_family = AF_UNIX;
+	strlcpy(sun.sun_path, socket_path, sizeof(sun.sun_path));
+	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun)))
+		die_errno("unable to bind to %s", socket_path);
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
+	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &msgsize, &vallen))
+		die_errno("could not get SO_SNDBUF");
+	msg = xmalloc(msgsize + 1);
+
+	if (!no_daemon) {
+		strbuf_addf(&sb, "%s.log", socket_path);
+		err = open(sb.buf, O_CREAT | O_TRUNC | O_WRONLY, 0600);
+		if (err == -1)
+			die_errno("unable to create %s", sb.buf);
+		daemonize();
+		dup2(err, 1);
+		dup2(err, 2);
+		close(err);
+	}
+
+	nr = 0;
+	pfd[nr].fd = fd;
+	pfd[nr++].events = POLLIN;
+
+	for (;;) {
+		if (poll(pfd, nr, -1) < 0) {
+			if (errno != EINTR) {
+				error("Poll failed, resuming: %s", strerror(errno));
+				sleep(1);
+			}
+			continue;
+		}
+
+		if ((pfd[0].revents & POLLIN) && handle_command(fd, msg, msgsize))
+			break;
+	}
+	return 0;
+}
diff --git a/git-compat-util.h b/git-compat-util.h
index b73916b..c119a94 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -536,6 +536,11 @@ extern void *xcalloc(size_t nmemb, size_t size);
 extern void *xmmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
 extern ssize_t xread(int fd, void *buf, size_t len);
 extern ssize_t xwrite(int fd, const void *buf, size_t len);
+extern ssize_t writef(int fd, const char *fmt, ...)
+	__attribute__((format (printf, 2, 3)));
+extern ssize_t sendtof(int sockfd, int flags, const void *dest_addr,
+		       socklen_t addrlen, const char *fmt, ...)
+	__attribute__((format (printf, 5, 6)));
 extern int xdup(int fd);
 extern FILE *xfdopen(int fd, const char *mode);
 extern int xmkstemp(char *template);
diff --git a/read-cache.c b/read-cache.c
index 098d3b6..506d488 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1443,6 +1443,49 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void connect_watcher(struct index_state *istate, const char *path)
+{
+	struct stat st;
+	struct strbuf sb = STRBUF_INIT;
+	int i;
+
+	strbuf_addf(&sb, "%s.watcher", path);
+	if (!stat(sb.buf, &st) && S_ISSOCK(st.st_mode)) {
+		struct sockaddr_un sun;
+		istate->watcher = socket(AF_UNIX, SOCK_DGRAM, 0);
+		sun.sun_family = AF_UNIX;
+		strlcpy(sun.sun_path, sb.buf, sizeof(sun.sun_path));
+		if (connect(istate->watcher, (struct sockaddr *)&sun, sizeof(sun))) {
+			perror("connect");
+			close(istate->watcher);
+			istate->watcher = -1;
+		}
+		sprintf(sun.sun_path, "%c%"PRIuMAX, 0, (uintmax_t)getpid());
+		bind(istate->watcher, (struct sockaddr *)&sun, sizeof(sun));
+	} else
+		istate->watcher = -1;
+	strbuf_release(&sb);
+	if (istate->watcher != -1) {
+		char line[1024];
+		int len;
+		strbuf_addf(&sb, "hello %s", sha1_to_hex(istate->sha1));
+		write(istate->watcher, sb.buf, sb.len);
+		len = read(istate->watcher, line, sizeof(line) - 1);
+		if (len > 0) {
+			line[len] = '\0';
+			if (!strcmp(sb.buf, line))
+				return; /* good */
+		}
+	}
+
+	/* No file watcher, or it is out of date: clear everything */
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHED) {
+			istate->cache[i]->ce_flags &= ~CE_WATCHED;
+			istate->cache_changed = 1;
+		}
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1528,6 +1571,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += extsize;
 	}
 	munmap(mmap, mmap_size);
+	connect_watcher(istate, path);
 	return istate->cache_nr;
 
 unmap:
@@ -1557,6 +1601,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->watcher != -1) {
+		close(istate->watcher);
+		istate->watcher = -1;
+	}
 	return 0;
 }
 
diff --git a/wrapper.c b/wrapper.c
index 0cc5636..29e3b35 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -455,3 +455,30 @@ struct passwd *xgetpwuid_self(void)
 		    errno ? strerror(errno) : _("no such user"));
 	return pw;
 }
+
+ssize_t writef(int fd, const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int ret;
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+	ret = write(fd, sb.buf, sb.len);
+	strbuf_release(&sb);
+	return ret;
+}
+
+ssize_t sendtof(int sockfd, int flags, const void *dest_addr, socklen_t addrlen,
+		const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int ret;
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+	ret = sendto(sockfd, sb.buf, sb.len, flags, dest_addr, addrlen);
+	strbuf_release(&sb);
+	return ret;
+}
-- 
1.8.5.2.240.g8478abd


* [PATCH 4/6] read-cache: get "updated" path list from file watcher
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2014-01-12 11:03 ` [PATCH 3/6] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
@ 2014-01-12 11:03 ` Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 5/6] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

A new command is added to the file watcher to send back the list of
updated files to git. These entries will have CE_WATCHED removed. The
remaining CE_WATCHED entries will have CE_VALID set (i.e. no changes
and no lstat either).

The file watcher keeps reporting the same "updated" list until it
receives "forget" commands, which should only be issued after the
updated index has been written. This ensures that if git crashes
halfway, before it could update the index (or if multiple processes
are reading the same index), "updated" info is not lost. After the
index is updated (e.g. in this case because of toggling CE_WATCHED
bits), git sends the new index signature to the file watcher.
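
The "report until told to forget" behaviour can be sketched with a
small in-memory list. This is only an illustration under simplifying
assumptions (fixed-size storage, linear search); struct updated_list
and the updated_* helpers are invented for this example, while the
patch itself uses string_list.

```c
#include <assert.h>
#include <string.h>

#define MAX_UPDATED 64
#define MAX_PATH_LEN 256

struct updated_list {
	char paths[MAX_UPDATED][MAX_PATH_LEN];
	int nr;
};

/* Record a changed path. Returns -1 if it does not fit. */
int updated_add(struct updated_list *l, const char *path)
{
	assert(l && path);
	if (l->nr == MAX_UPDATED || strlen(path) >= MAX_PATH_LEN)
		return -1;
	strcpy(l->paths[l->nr++], path);
	return 0;
}

/* "status": hand out up to max entries; the list itself is untouched. */
int updated_status(const struct updated_list *l, const char **out, int max)
{
	int i, n = l->nr < max ? l->nr : max;

	for (i = 0; i < n; i++)
		out[i] = l->paths[i];
	return n;
}

/* "forget <path>": drop one entry, moving the last one into its slot. */
int updated_forget(struct updated_list *l, const char *path)
{
	int i;

	for (i = 0; i < l->nr; i++)
		if (!strcmp(l->paths[i], path)) {
			l->nr--;
			if (i != l->nr)
				strcpy(l->paths[i], l->paths[l->nr]);
			return 1;
		}
	return 0;
}
```

Two readers asking for "status" in a row see the same paths; only an
explicit "forget", sent after the index is safely written, shrinks the
list.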

The file watcher does not cache stat info and send it back to git.
Its main purpose is to avoid lstat on the many untouched files, not to
eliminate lstat completely.

One can see that, assuming CE_WATCHED is magically set in some
entries, the flags will all be cleared over time and we will be back
to doing lstat on all entries. We haven't talked about how CE_WATCHED
is set yet. More to come later.

TODO: get as many paths as possible in one packet to reduce round
trips

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h        |  1 +
 file-watcher.c | 35 +++++++++++++++++++---
 read-cache.c   | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 6a182b5..2eddc1e 100644
--- a/cache.h
+++ b/cache.h
@@ -282,6 +282,7 @@ struct index_state {
 	struct hash_table dir_hash;
 	unsigned char sha1[20];
 	int watcher;
+	struct string_list *updated_entries;
 };
 
 extern struct index_state the_index;
diff --git a/file-watcher.c b/file-watcher.c
index 66b44e5..6aeed4d 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -1,7 +1,16 @@
 #include "cache.h"
 #include "sigchain.h"
+#include "string-list.h"
 
 static char index_signature[41];
+static struct string_list updated = STRING_LIST_INIT_DUP;
+static int updated_sorted;
+
+static void reset(const char *sig)
+{
+	string_list_clear(&updated, 0);
+	strlcpy(index_signature, sig, sizeof(index_signature));
+}
 
 static int handle_command(int fd, char *msg, int msgsize)
 {
@@ -22,10 +31,28 @@ static int handle_command(int fd, char *msg, int msgsize)
 		sendtof(fd, 0, &sun, socklen, "hello %s", index_signature);
 		if (!strcmp(index_signature, arg))
 			return 0;
-		/*
-		 * Index SHA-1 mismatch, something has gone
-		 * wrong. Clean up and start over.
-		 */
+		reset(arg);
+	} else if ((arg = skip_prefix(msg, "reset "))) {
+		reset(arg);
+	} else if (!strcmp(msg, "status")) {
+		int i;
+		for (i = 0; i < updated.nr; i++)
+			sendto(fd, updated.items[i].string,
+			       strlen(updated.items[i].string),
+			       0, &sun, socklen);
+		sendtof(fd, 0, &sun, socklen, "%c", 0);
+	} else if ((arg = skip_prefix(msg, "forget "))) {
+		struct string_list_item *item;
+		if (!updated_sorted) {
+			sort_string_list(&updated);
+			updated_sorted = 1;
+		}
+		item = string_list_lookup(&updated, arg);
+		if (item)
+			unsorted_string_list_delete_item(&updated,
+							 item - updated.items,
+							 0);
+	} else if ((arg = skip_prefix(msg, "bye "))) {
 		strlcpy(index_signature, arg, sizeof(index_signature));
 	} else {
 		die("unrecognized command %s", msg);
diff --git a/read-cache.c b/read-cache.c
index 506d488..caa2298 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1443,6 +1443,55 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void update_watched_files(struct index_state *istate)
+{
+	int i;
+	if (istate->watcher == -1)
+		return;
+	if (writef(istate->watcher, "status") < 0)
+		goto failed;
+	for (;;) {
+		char line[1024];
+		int len;
+		len = read(istate->watcher, line, sizeof(line) - 1);
+		if (len <= 0)
+			goto failed;
+		line[len] = '\0';
+		if (len == 1 && line[0] == '\0')
+			break;
+		i = index_name_pos(istate, line, len);
+		if (i < 0)
+			continue;
+		if (istate->cache[i]->ce_flags & CE_WATCHED) {
+			istate->cache[i]->ce_flags &= ~CE_WATCHED;
+			istate->cache_changed = 1;
+		}
+		if (!istate->updated_entries) {
+			struct string_list *sl;
+			sl = xmalloc(sizeof(*sl));
+			memset(sl, 0, sizeof(*sl));
+			sl->strdup_strings = 1;
+			istate->updated_entries = sl;
+		}
+		string_list_append(istate->updated_entries, line);
+	}
+
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHED)
+			istate->cache[i]->ce_flags |= CE_VALID;
+	return;
+failed:
+	if (istate->updated_entries) {
+		string_list_clear(istate->updated_entries, 0);
+		free(istate->updated_entries);
+		istate->updated_entries = NULL;
+	}
+	writef(istate->watcher, "reset %s", sha1_to_hex(istate->sha1));
+	for (i = 0; i < istate->cache_nr; i++)
+		istate->cache[i]->ce_flags &= ~CE_WATCHED;
+	istate->cache_changed = 1;
+}
+
 static void connect_watcher(struct index_state *istate, const char *path)
 {
 	struct stat st;
@@ -1473,8 +1522,10 @@ static void connect_watcher(struct index_state *istate, const char *path)
 		len = read(istate->watcher, line, sizeof(line) - 1);
 		if (len > 0) {
 			line[len] = '\0';
-			if (!strcmp(sb.buf, line))
+			if (!strcmp(sb.buf, line)) {
+				update_watched_files(istate);
 				return; /* good */
+			}
 		}
 	}
 
@@ -1486,6 +1537,20 @@ static void connect_watcher(struct index_state *istate, const char *path)
 		}
 }
 
+static void farewell_watcher(struct index_state *istate,
+			     const unsigned char *sha1)
+{
+	if (istate->watcher == -1)
+		return;
+	if (istate->updated_entries) {
+		int i;
+		for (i = 0; i < istate->updated_entries->nr; i++)
+			writef(istate->watcher, "forget %s",
+			       istate->updated_entries->items[i].string);
+	}
+	writef(istate->watcher, "bye %s", sha1_to_hex(sha1));
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1605,6 +1670,11 @@ int discard_index(struct index_state *istate)
 		close(istate->watcher);
 		istate->watcher = -1;
 	}
+	if (istate->updated_entries) {
+		string_list_clear(istate->updated_entries, 0);
+		free(istate->updated_entries);
+		istate->updated_entries = NULL;
+	}
 	return 0;
 }
 
@@ -1665,7 +1735,7 @@ static int write_index_ext_header(git_SHA_CTX *context, int fd,
 		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
 
-static int ce_flush(git_SHA_CTX *context, int fd)
+static int ce_flush(git_SHA_CTX *context, int fd, unsigned char *sha1)
 {
 	unsigned int left = write_buffer_len;
 
@@ -1683,6 +1753,8 @@ static int ce_flush(git_SHA_CTX *context, int fd)
 
 	/* Append the SHA1 signature at the end */
 	git_SHA1_Final(write_buffer + left, context);
+	if (sha1)
+		hashcpy(sha1, write_buffer + left);
 	left += 20;
 	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
 }
@@ -1847,12 +1919,22 @@ int write_index(struct index_state *istate, int newfd)
 	int entries = istate->cache_nr;
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	unsigned char sha1[20];
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
-		else if (cache[i]->ce_flags & CE_WATCHED)
+		else if (cache[i]->ce_flags & CE_WATCHED) {
+			/*
+			 * CE_VALID when used with CE_WATCHED is not
+			 * supposed to be persistent. Next time git
+			 * runs, if this entry is still watched and
+			 * nothing has changed, CE_VALID will be
+			 * reinstated.
+			 */
+			cache[i]->ce_flags &= ~CE_VALID;
 			has_watches++;
+		}
 
 		/* reduce extended entries if possible */
 		cache[i]->ce_flags &= ~CE_EXTENDED;
@@ -1945,8 +2027,9 @@ int write_index(struct index_state *istate, int newfd)
 			return -1;
 	}
 
-	if (ce_flush(&c, newfd) || fstat(newfd, &st))
+	if (ce_flush(&c, newfd, sha1) || fstat(newfd, &st))
 		return -1;
+	farewell_watcher(istate, sha1);
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	return 0;
-- 
1.8.5.2.240.g8478abd


* [PATCH 5/6] read-cache: ask file watcher to watch files
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2014-01-12 11:03 ` [PATCH 4/6] read-cache: get "updated" path list from " Nguyễn Thái Ngọc Duy
@ 2014-01-12 11:03 ` Nguyễn Thái Ngọc Duy
  2014-01-12 11:03 ` [PATCH 6/6] file-watcher: support inotify Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We want to watch files that are never changed because lstat() on those
files is a wasted effort. So we sort unwatched files by date and start
adding them to the file watcher until it barfs (e.g. hits inotify
limit). Recently updated entries are also excluded from the watch list.
CE_VALID is used in combination with CE_WATCHED. Those entries that
have CE_VALID already set will never be watched.
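
A standalone sketch of that selection step, for illustration only:
struct entry, the F_* flags and pick_watchable are stand-ins invented
here for struct cache_entry, CE_WATCHED/CE_VALID and the code in
read-cache.c. The 1800 second quiet period is the value the patch
uses; the comparison-based comparator is a choice made for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define F_WATCHED (1u << 0)
#define F_VALID   (1u << 1)
#define QUIET_PERIOD 1800	/* seconds without modification */

struct entry {
	const char *name;
	uint32_t mtime;
	unsigned flags;
};

/* Not yet watched, not marked valid by the user, and not recently touched. */
int watchable(const struct entry *e, time_t now)
{
	return !(e->flags & (F_WATCHED | F_VALID)) &&
		(time_t)e->mtime + QUIET_PERIOD < now;
}

/* Oldest first; compares instead of subtracting to avoid wraparound. */
static int by_mtime(const void *a_, const void *b_)
{
	const struct entry *a = *(const struct entry *const *)a_;
	const struct entry *b = *(const struct entry *const *)b_;

	return (a->mtime > b->mtime) - (a->mtime < b->mtime);
}

/* Fill out[] (room for nr pointers) with watchable entries, oldest first. */
size_t pick_watchable(struct entry *all, size_t nr, time_t now,
		      struct entry **out)
{
	size_t i, n = 0;

	assert(all && out);
	for (i = 0; i < nr; i++)
		if (watchable(&all[i], now))
			out[n++] = &all[i];
	qsort(out, n, sizeof(*out), by_mtime);
	return n;
}
```

Sorting oldest first means that when the watcher runs out of watches,
the files left unwatched are the ones most likely to change soon
anyway.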

We send as many paths as possible in one packet in pkt-line
format. For small projects like git, all entries can be packed in one
packet. For large projects like webkit (182k entries) it takes two
packets. We may do prefix compression as well to send more in fewer
packets.
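
As a rough illustration of the packing, each path is prefixed with
four hex digits giving the record length including those four bytes,
and records are appended until the buffer is full. pkt_append and
pkt_length below are simplified stand-ins written for this example,
not the functions in pkt-line.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one pkt-line record ("%04x" length + path) at buf + *used.
 * Returns -1, leaving the buffer untouched, if the record does not fit.
 */
int pkt_append(char *buf, size_t bufsz, size_t *used, const char *path)
{
	size_t len;

	assert(buf && used && path);
	len = strlen(path) + 4;
	if (len > 0xffff || *used + len + 1 > bufsz)
		return -1;
	snprintf(buf + *used, bufsz - *used, "%04x%s", (unsigned)len, path);
	*used += len;
	return 0;
}

/* Parse the four hex digits at p. Returns the length, or -1 if malformed. */
int pkt_length(const char *p)
{
	int i, len = 0;

	for (i = 0; i < 4; i++) {
		char c = p[i];
		int v;

		if (c >= '0' && c <= '9')
			v = c - '0';
		else if (c >= 'a' && c <= 'f')
			v = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v = c - 'A' + 10;
		else
			return -1;
		len = len * 16 + v;
	}
	return len;
}
```

A sender keeps calling pkt_append() until it returns -1, ships the
buffer as one datagram, then starts a new buffer with the path that
did not fit.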

The file watcher replies how many entries it can watch (because at
least inotify has system limits).

Note that we still do lstat() on these newly watched files because
they could have changed before the file watcher started watching them.
Watched files can only skip lstat() from the next git run onwards.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c | 27 ++++++++++++++++
 pkt-line.c     |  2 +-
 pkt-line.h     |  2 ++
 read-cache.c   | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/file-watcher.c b/file-watcher.c
index 6aeed4d..35781fa 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -1,17 +1,41 @@
 #include "cache.h"
 #include "sigchain.h"
 #include "string-list.h"
+#include "pkt-line.h"
 
 static char index_signature[41];
 static struct string_list updated = STRING_LIST_INIT_DUP;
 static int updated_sorted;
 
+static int watch_path(char *path)
+{
+	return -1;
+}
+
 static void reset(const char *sig)
 {
 	string_list_clear(&updated, 0);
 	strlcpy(index_signature, sig, sizeof(index_signature));
 }
 
+static void watch_paths(char *buf, int maxlen,
+			int fd, struct sockaddr *sock,
+			socklen_t socklen)
+{
+	char *end = buf + maxlen;
+	int n, ret, len;
+	for (n = ret = 0; buf < end && !ret; buf += len) {
+		char ch;
+		len = packet_length(buf);
+		ch = buf[len];
+		buf[len] = '\0';
+		if (!(ret = watch_path(buf + 4)))
+			n++;
+		buf[len] = ch;
+	}
+	sendtof(fd, 0, sock, socklen, "fine %d", n);
+}
+
 static int handle_command(int fd, char *msg, int msgsize)
 {
 	struct sockaddr_un sun;
@@ -41,6 +65,9 @@ static int handle_command(int fd, char *msg, int msgsize)
 			       strlen(updated.items[i].string),
 			       0, &sun, socklen);
 		sendtof(fd, 0, &sun, socklen, "%c", 0);
+	} else if (starts_with(msg, "watch ")) {
+		watch_paths(msg + 6, len - 6,
+			    fd, (struct sockaddr *)&sun, socklen);
 	} else if ((arg = skip_prefix(msg, "forget "))) {
 		struct string_list_item *item;
 		if (!updated_sorted) {
diff --git a/pkt-line.c b/pkt-line.c
index bc63b3b..b5af84e 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -135,7 +135,7 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 	return ret;
 }
 
-static int packet_length(const char *linelen)
+int packet_length(const char *linelen)
 {
 	int n;
 	int len = 0;
diff --git a/pkt-line.h b/pkt-line.h
index 0a838d1..40470b9 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -75,6 +75,8 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+int packet_length(const char *linelen);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 extern char packet_buffer[LARGE_PACKET_MAX];
diff --git a/read-cache.c b/read-cache.c
index caa2298..839fd7c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "strbuf.h"
 #include "varint.h"
+#include "pkt-line.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
@@ -1537,6 +1538,90 @@ static void connect_watcher(struct index_state *istate, const char *path)
 		}
 }
 
+static int sort_by_date(const void *a_, const void *b_)
+{
+	const struct cache_entry *a = *(const struct cache_entry **)a_;
+	const struct cache_entry *b = *(const struct cache_entry **)b_;
+	uint32_t seca = a->ce_stat_data.sd_mtime.sec;
+	uint32_t secb = b->ce_stat_data.sd_mtime.sec;
+	return seca - secb;
+}
+
+static inline int ce_watchable(struct cache_entry *ce, time_t now)
+{
+	return !(ce->ce_flags & CE_WATCHED) &&
+		!(ce->ce_flags & CE_VALID) &&
+		(ce->ce_stat_data.sd_mtime.sec + 1800 < now);
+}
+
+static int do_watch_entries(struct index_state *istate,
+			    struct cache_entry **cache,
+			    struct strbuf *sb, int start, int now)
+{
+	char line[1024];
+	int i, len;
+
+	write(istate->watcher, sb->buf, sb->len);
+	len = read(istate->watcher, line, sizeof(line) - 1);
+	if (len <= 0)
+		return -1;
+	line[len] = '\0';
+	if (starts_with(line, "fine ")) {
+		char *end;
+		long n = strtoul(line + 5, &end, 10);
+		if (end != line + len)
+			return -1;
+		for (i = 0; i < n; i++)
+			cache[start + i]->ce_flags |= CE_WATCHED;
+		istate->cache_changed = 1;
+		if (i != now)
+			return -1;
+	} else
+		return -1;
+	start = i;
+	strbuf_reset(sb);
+	strbuf_addstr(sb, "watch ");
+	return 0;
+}
+
+static void watch_entries(struct index_state *istate)
+{
+	int i, start, nr;
+	struct cache_entry **sorted;
+	struct strbuf sb = STRBUF_INIT;
+	int val;
+	socklen_t vallen = sizeof(val);
+	time_t now = time(NULL);
+
+	if (istate->watcher == -1)
+		return;
+	for (i = nr = 0; i < istate->cache_nr; i++)
+		if (ce_watchable(istate->cache[i], now))
+			nr++;
+	if (nr < 50)
+		return;
+	sorted = xmalloc(sizeof(*sorted) * nr);
+	for (i = nr = 0; i < istate->cache_nr; i++)
+		if (ce_watchable(istate->cache[i], now))
+			sorted[nr++] = istate->cache[i];
+
+	getsockopt(istate->watcher, SOL_SOCKET, SO_SNDBUF, &val, &vallen);
+	strbuf_grow(&sb, val);
+	strbuf_addstr(&sb, "watch ");
+
+	qsort(sorted, nr, sizeof(*sorted), sort_by_date);
+	for (i = start = 0; i < nr; i++) {
+		if (sb.len + 4 + ce_namelen(sorted[i]) >= val &&
+		    do_watch_entries(istate, sorted, &sb, start, i))
+			break;
+		packet_buf_write(&sb, "%s", sorted[i]->name);
+	}
+	if (i == nr && start < i)
+		do_watch_entries(istate, sorted, &sb, start, i);
+	strbuf_release(&sb);
+	free(sorted);
+}
+
 static void farewell_watcher(struct index_state *istate,
 			     const unsigned char *sha1)
 {
@@ -1637,6 +1722,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	}
 	munmap(mmap, mmap_size);
 	connect_watcher(istate, path);
+	watch_entries(istate);
 	return istate->cache_nr;
 
 unmap:
@@ -1933,6 +2019,17 @@ int write_index(struct index_state *istate, int newfd)
 			 * reinstated.
 			 */
 			cache[i]->ce_flags &= ~CE_VALID;
+			/*
+			 * We may set CE_WATCHED (but not CE_VALID)
+			 * early when refresh has not been done
+			 * yet. At that time we had no idea if the
+			 * entry may have been updated. If it has
+			 * been, remove CE_WATCHED so CE_VALID won't
+			 * incorrectly be set next time if the file
+			 * watcher reports no changes.
+			 */
+			if (!ce_uptodate(cache[i]))
+				cache[i]->ce_flags &= ~CE_WATCHED;
 			has_watches++;
 		}
 
-- 
1.8.5.2.240.g8478abd

* [PATCH 6/6] file-watcher: support inotify
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2014-01-12 11:03 ` [PATCH 5/6] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
@ 2014-01-12 11:03 ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
  2014-02-19 20:35 ` [PATCH 0/6] " Shawn Pearce
  7 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-12 11:03 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

"git diff" on webkit:

        no file watcher  1st run   subsequent runs
real        0m1.361s    0m1.445s      0m0.691s
user        0m0.889s    0m0.940s      0m0.649s
sys         0m0.469s    0m0.495s      0m0.040s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 config.mak.uname |   1 +
 file-watcher.c   | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 140 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index 82d549e..603890d 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_PATHS_H = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	BASIC_CFLAGS += -DHAVE_INOTIFY
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	NO_STRLCPY = YesPlease
diff --git a/file-watcher.c b/file-watcher.c
index 35781fa..1512b46 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -3,17 +3,140 @@
 #include "string-list.h"
 #include "pkt-line.h"
 
+#ifdef HAVE_INOTIFY
+#include <sys/inotify.h>
+#endif
+
 static char index_signature[41];
 static struct string_list updated = STRING_LIST_INIT_DUP;
 static int updated_sorted;
 
+#ifdef HAVE_INOTIFY
+
+static struct string_list watched_dirs = STRING_LIST_INIT_DUP;
+static int watched_dirs_sorted;
+static int inotify_fd;
+
+struct dir_info {
+	int wd;
+	struct string_list names;
+	int names_sorted;
+};
+
+static int handle_inotify(int fd)
+{
+	char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
+	struct inotify_event *event;
+	struct dir_info *dir;
+	struct string_list_item *item;
+	int i;
+	int len = read(fd, buf, sizeof(buf));
+	if (len < 0)
+		return -1;
+	event = (struct inotify_event *)buf;
+
+	if (len <= sizeof(struct inotify_event))
+		return 0;
+
+	for (i = 0; i < watched_dirs.nr; i++) {
+		struct dir_info *dir = watched_dirs.items[i].util;
+		if (dir->wd == event->wd)
+			break;
+	}
+	if (i == watched_dirs.nr)
+		return 0;
+	dir = watched_dirs.items[i].util;
+
+	if (!dir->names_sorted) {
+		sort_string_list(&dir->names);
+		dir->names_sorted = 1;
+	}
+	item = string_list_lookup(&dir->names, event->name);
+	if (item) {
+		if (!strcmp(watched_dirs.items[i].string, "."))
+			string_list_append(&updated, event->name);
+		else {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "%s/%s", watched_dirs.items[i].string,
+				    item->string);
+			string_list_append(&updated, sb.buf);
+			updated_sorted = 0;
+			strbuf_release(&sb);
+		}
+
+		unsorted_string_list_delete_item(&dir->names,
+						 item - dir->names.items, 0);
+		if (dir->names.nr == 0) {
+			inotify_rm_watch(inotify_fd, dir->wd);
+			unsorted_string_list_delete_item(&watched_dirs, i, 1);
+		}
+	}
+	return 0;
+}
+
+static int watch_path(char *path)
+{
+	struct string_list_item *item;
+	char *sep = strrchr(path, '/');
+	struct dir_info *dir;
+	const char *dirname = ".";
+
+	if (sep) {
+		*sep = '\0';
+		dirname = path;
+	}
+
+	if (!watched_dirs_sorted) {
+		sort_string_list(&watched_dirs);
+		watched_dirs_sorted = 1;
+	}
+	item = string_list_lookup(&watched_dirs, dirname);
+	if (!item) {
+		int ret = inotify_add_watch(inotify_fd, dirname,
+					    IN_ATTRIB | IN_DELETE | IN_MODIFY |
+					    IN_MOVED_FROM | IN_MOVED_TO);
+		if (ret < 0)
+			return -1;
+		dir = xmalloc(sizeof(*dir));
+		memset(dir, 0, sizeof(*dir));
+		dir->wd = ret;
+		dir->names.strdup_strings = 1;
+		item = string_list_append(&watched_dirs, dirname);
+		item->util = dir;
+	}
+	dir = item->util;
+	string_list_append(&dir->names, sep ? sep + 1 : path);
+	dir->names_sorted = 0;
+	return 0;
+}
+
+static void reset_watches(void)
+{
+	int i;
+	for (i = 0; i < watched_dirs.nr; i++) {
+		struct dir_info *dir = watched_dirs.items[i].util;
+		inotify_rm_watch(inotify_fd, dir->wd);
+		string_list_clear(&dir->names, 0);
+	}
+	string_list_clear(&watched_dirs, 1);
+}
+
+#else
+
 static int watch_path(char *path)
 {
 	return -1;
 }
 
+static void reset_watches(void)
+{
+}
+
+#endif
+
 static void reset(const char *sig)
 {
+	reset_watches();
 	string_list_clear(&updated, 0);
 	strlcpy(index_signature, sig, sizeof(index_signature));
 }
@@ -155,6 +278,14 @@ int main(int argc, char **argv)
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
+#ifdef HAVE_INOTIFY
+	inotify_fd = inotify_init();
+	if (inotify_fd < 0)
+		die_errno("unable to initialize inotify");
+#else
+	die("no file watching mechanism is supported");
+#endif
+
 	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &msgsize, &vallen))
 		die_errno("could not get SO_SNDBUF");
 	msg = xmalloc(msgsize + 1);
@@ -173,6 +304,10 @@ int main(int argc, char **argv)
 	nr = 0;
 	pfd[nr].fd = fd;
 	pfd[nr++].events = POLLIN;
+#ifdef HAVE_INOTIFY
+	pfd[nr].fd = inotify_fd;
+	pfd[nr++].events = POLLIN;
+#endif
 
 	for (;;) {
 		if (poll(pfd, nr, -1) < 0) {
@@ -185,6 +320,10 @@ int main(int argc, char **argv)
 
 		if ((pfd[0].revents & POLLIN) && handle_command(fd, msg, msgsize))
 			break;
+#ifdef HAVE_INOTIFY
+		if ((pfd[1].revents & POLLIN) && handle_inotify(inotify_fd))
+			break;
+#endif
 	}
 	return 0;
 }
-- 
1.8.5.2.240.g8478abd

* Re: [PATCH 2/6] read-cache: new extension to mark what file is watched
  2014-01-12 11:03 ` [PATCH 2/6] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
@ 2014-01-13 17:02   ` Jonathan Nieder
  2014-01-14  1:25     ` Duy Nguyen
  2014-01-14  1:39   ` Duy Nguyen
  1 sibling, 1 reply; 72+ messages in thread
From: Jonathan Nieder @ 2014-01-13 17:02 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Hi,

Nguyễn Thái Ngọc Duy wrote:

> If an entry is "watched", git lets an external program decide if the
> entry is modified or not. It's more like --assume-unchanged, but
> designed to be controlled by machine.
>
> We are running out of on-disk ce_flags, so instead of extending
> on-disk entry format again, "watched" flags are in-core only and
> stored as extension instead.

Makes sense.

Care to add a brief description of the on-disk format for
Documentation/technical/index-format.txt as well?

[...]
> --- a/cache.h
> +++ b/cache.h
> @@ -168,6 +168,7 @@ struct cache_entry {
>  
>  /* used to temporarily mark paths matched by pathspecs */
>  #define CE_MATCHED           (1 << 26)
> +#define CE_WATCHED           (1 << 27)

Nit: I'd add a blank line before the definition of CE_WATCHED to make
it clear that the comment doesn't apply to it.

Maybe it belongs with one of the groups before (e.g., UNPACKED +
NEW_SKIP_WORKTREE).  I dunno.

> --- a/read-cache.c
> +++ b/read-cache.c
[...]
> @@ -1289,6 +1290,19 @@ static int verify_hdr(struct cache_header *hdr,
>  	return 0;
>  }
>  
> +static void read_watch_extension(struct index_state *istate, uint8_t *data,
> +				 unsigned long sz)
> +{
> +	int i;
> +	if ((istate->cache_nr + 7) / 8 != sz) {
> +		error("invalid 'WATC' extension");
> +		return;
> +	}
> +	for (i = 0; i < istate->cache_nr; i++)
> +		if (data[i / 8] & (1 << (i % 8)))
> +			istate->cache[i]->ce_flags |= CE_WATCHED;
> +}

So the WATC section has one bit per index entry, encoding whether that
entry is WATCHED.  Makes sense.
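
As a standalone illustration of that layout (not code from the patch; the
helper names below are made up for this sketch), a one-bit-per-entry bitmap
with the same (n + 7) / 8 sizing and LSB-first bit order can be handled
like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bitmap holding one bit per index entry. */
static size_t watch_bitmap_size(size_t nr_entries)
{
	return (nr_entries + 7) / 8;
}

/* Mark entry i as watched: byte i / 8, bit i % 8 (least significant first). */
static void watch_bitmap_set(uint8_t *data, size_t i)
{
	data[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Return 1 if entry i is marked watched, 0 otherwise. */
static int watch_bitmap_test(const uint8_t *data, size_t i)
{
	return (data[i / 8] >> (i % 8)) & 1;
}
```

With nine entries the bitmap takes two bytes, and entry 9 lands in the
second byte at bit 1.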

Do I understand correctly that this patch just takes care of the
bookkeeping for the CE_WATCHED bit and the actual semantics will
come in a later patch?

Thanks,
Jonathan

* Re: [PATCH 2/6] read-cache: new extension to mark what file is watched
  2014-01-13 17:02   ` Jonathan Nieder
@ 2014-01-14  1:25     ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-14  1:25 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List

On Tue, Jan 14, 2014 at 12:02 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Nguyễn Thái Ngọc Duy wrote:
>
>> If an entry is "watched", git lets an external program decide if the
>> entry is modified or not. It's more like --assume-unchanged, but
>> designed to be controlled by machine.
>>
>> We are running out of on-disk ce_flags, so instead of extending
>> on-disk entry format again, "watched" flags are in-core only and
>> stored as extension instead.
>
> Makes sense.
>
> Care to add a brief description of the on-disk format for
> Documentation/technical/index-format.txt as well?

Sure, in the reroll after I fix inotify bugs that make the test suite fail.

>> +static void read_watch_extension(struct index_state *istate, uint8_t *data,
>> +                              unsigned long sz)
>> +{
>> +     int i;
>> +     if ((istate->cache_nr + 7) / 8 != sz) {
>> +             error("invalid 'WATC' extension");
>> +             return;
>> +     }
>> +     for (i = 0; i < istate->cache_nr; i++)
>> +             if (data[i / 8] & (1 << (i % 8)))
>> +                     istate->cache[i]->ce_flags |= CE_WATCHED;
>> +}
>
> So the WATC section has one bit per index entry, encoding whether that
> entry is WATCHED.  Makes sense.
>
> Do I understand correctly that this patch just takes care of the
> bookkeeping for the CE_WATCHED bit and the actual semantics will
> come in a later patch?

Correct.
-- 
Duy

* Re: [PATCH 2/6] read-cache: new extension to mark what file is watched
  2014-01-12 11:03 ` [PATCH 2/6] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
  2014-01-13 17:02   ` Jonathan Nieder
@ 2014-01-14  1:39   ` Duy Nguyen
  1 sibling, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-14  1:39 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Nguyễn Thái Ngọc Duy

On Sun, Jan 12, 2014 at 6:03 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> We are running out of on-disk ce_flags,

Correction, we're not. I saw

/*
 * Extended on-disk flags
 */
#define CE_INTENT_TO_ADD     (1 << 29)
#define CE_SKIP_WORKTREE     (1 << 30)

followed by

/* CE_EXTENDED2 is for future extension */
#define CE_EXTENDED2         (1 << 31)

and panicked, but on-disk flags could be added backward (e.g. bit 28,
27...). Anyway using extended flags means 2 extra bytes per entry for
almost every entry in this case (and for index v5 it means redoing
crc32 for almost every entry too when the bit is updated) so it may
still be a good idea to keep the new flag separate.

> so instead of extending
> on-disk entry format again, "watched" flags are in-core only and
> stored as extension instead.
-- 
Duy

* Re: [PATCH 3/6] read-cache: connect to file watcher
  2014-01-12 11:03 ` [PATCH 3/6] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
@ 2014-01-15 10:58   ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2014-01-15 10:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Sun, Jan 12, 2014 at 06:03:39PM +0700, Nguyễn Thái Ngọc Duy wrote:

> This patch establishes a connection between a new file watcher daemon
> and git. Each index file may have at most one file watcher attached to
> it. The file watcher maintains a UNIX socket at
> $GIT_DIR/index.watcher. Any process that has write access to $GIT_DIR
> can talk to the file watcher.

IIRC, this is not portable. Some systems (not Linux) will allow anyone
to connect to the socket if the file is accessible to them (so
anybody with read access to $GIT_DIR can talk to the file watcher). The
usual trick is to put it in a sub-directory that only the connectors can
access (e.g., put it in "$GIT_DIR/watcher/index", and create "watcher"
mode 0700).

-Peff

* [PATCH/WIP v2 00/14] inotify support
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2014-01-12 11:03 ` [PATCH 6/6] file-watcher: support inotify Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47 ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 01/14] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
                     ` (12 more replies)
  2014-02-19 20:35 ` [PATCH 0/6] " Shawn Pearce
  7 siblings, 13 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

This is getting into better shape. I'm still wondering if the design is
right, so documentation, tests and some error cases are still
neglected. I have not addressed Jonathan's and Jeff's comments in this
reroll, but I haven't forgotten them. The test suite seems to be
fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set.

Thomas, you were a proponent of a per-user daemon last time. I agree
that is a better solution when you need to support submodules. So if
you have time, have a look and see if anything I did may prevent
per-user daemon changes later (hint: I have a few unfriendly exit()
calls in file-watcher.c). You have also worked with inotify before, so
maybe you can help spot some mishandling too, as I'm totally new to
inotify.

Nguyễn Thái Ngọc Duy (14):
  read-cache: save trailing sha-1
  read-cache: new extension to mark what file is watched
  read-cache: connect to file watcher
  read-cache: ask file watcher to watch files
  read-cache: put some limits on file watching
  read-cache: get modified file list from file watcher
  read-cache: add config to start file watcher automatically
  read-cache: add GIT_TEST_FORCE_WATCHER for testing
  file-watcher: add --shutdown and --log options
  file-watcher: automatically quit
  file-watcher: support inotify
  file-watcher: exit when cwd is gone
  pkt-line.c: increase buffer size to 8192
  t1301: exclude sockets from file permission check

 .gitignore               |   1 +
 Documentation/config.txt |  14 ++
 Makefile                 |   2 +
 cache.h                  |   8 +
 config.mak.uname         |   1 +
 file-watcher-lib.c (new) | 109 +++++++++++
 file-watcher-lib.h (new) |   9 +
 file-watcher.c (new)     | 483 +++++++++++++++++++++++++++++++++++++++++++++++
 git-compat-util.h        |   3 +
 pkt-line.c               |   4 +-
 pkt-line.h               |   2 +
 read-cache.c             | 338 ++++++++++++++++++++++++++++++++-
 t/t1301-shared-repo.sh   |   3 +-
 trace.c                  |   3 +-
 14 files changed, 969 insertions(+), 11 deletions(-)
 create mode 100644 file-watcher-lib.c
 create mode 100644 file-watcher-lib.h
 create mode 100644 file-watcher.c

-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 01/14] read-cache: save trailing sha-1
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

This will be used as a signature to know if the index has changed.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      | 1 +
 read-cache.c | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 323481c..a09d622 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
+	unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 3f735f3..fe1d153 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1273,10 +1273,11 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(struct cache_header *hdr,
+		      unsigned long size,
+		      unsigned char *sha1)
 {
 	git_SHA_CTX c;
-	unsigned char sha1[20];
 	int hdr_version;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
@@ -1465,7 +1466,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	close(fd);
 
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	if (verify_hdr(hdr, mmap_size, istate->sha1) < 0)
 		goto unmap;
 
 	istate->version = ntohl(hdr->hdr_version);
-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 01/14] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17 11:19     ` Thomas Gummerer
  2014-01-19 17:06     ` Thomas Rast
  2014-01-17  9:47   ` [PATCH/WIP v2 03/14] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
                     ` (10 subsequent siblings)
  12 siblings, 2 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

If an entry is "watched", git lets an external program decide if the
entry is modified or not. It's more like --assume-unchanged, but
designed to be controlled by machine.

We are running out of on-disk ce_flags, so instead of extending
on-disk entry format again, "watched" flags are in-core only and
stored as extension instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  2 ++
 read-cache.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index a09d622..069dce7 100644
--- a/cache.h
+++ b/cache.h
@@ -168,6 +168,8 @@ struct cache_entry {
 /* used to temporarily mark paths matched by pathspecs */
 #define CE_MATCHED           (1 << 26)
 
+#define CE_WATCHED           (1 << 27)
+
 /*
  * Extended on-disk flags
  */
diff --git a/read-cache.c b/read-cache.c
index fe1d153..6f21e3f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -33,6 +33,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
 
 struct index_state the_index;
 
@@ -1293,6 +1294,19 @@ static int verify_hdr(struct cache_header *hdr,
 	return 0;
 }
 
+static void read_watch_extension(struct index_state *istate, uint8_t *data,
+				 unsigned long sz)
+{
+	int i;
+	if ((istate->cache_nr + 7) / 8 != sz) {
+		error("invalid 'WATC' extension");
+		return;
+	}
+	for (i = 0; i < istate->cache_nr; i++)
+		if (data[i / 8] & (1 << (i % 8)))
+			istate->cache[i]->ce_flags |= CE_WATCHED;
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1303,6 +1317,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_WATCH:
+		read_watch_extension(istate, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1781,7 +1798,7 @@ int write_index(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
-	int i, err, removed, extended, hdr_version;
+	int i, err, removed, extended, hdr_version, has_watches = 0;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
 	struct stat st;
@@ -1790,6 +1807,8 @@ int write_index(struct index_state *istate, int newfd)
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
+		else if (cache[i]->ce_flags & CE_WATCHED)
+			has_watches++;
 
 		/* reduce extended entries if possible */
 		cache[i]->ce_flags &= ~CE_EXTENDED;
@@ -1861,6 +1880,26 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
+	if (has_watches) {
+		int id, sz = (entries - removed + 7) / 8;
+		uint8_t *data = xmalloc(sz);
+		memset(data, 0, sz);
+		for (i = 0, id = 0; i < entries && has_watches; i++) {
+			struct cache_entry *ce = cache[i];
+			if (ce->ce_flags & CE_REMOVE)
+				continue;
+			if (ce->ce_flags & CE_WATCHED) {
+				data[id / 8] |= 1 << (id % 8);
+				has_watches--;
+			}
+			id++;
+		}
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
+			|| ce_write(&c, newfd, data, sz) < 0;
+		free(data);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd) || fstat(newfd, &st))
 		return -1;
-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 03/14] read-cache: connect to file watcher
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 01/14] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17 15:24     ` Torsten Bögershausen
  2014-01-17  9:47   ` [PATCH/WIP v2 04/14] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
                     ` (9 subsequent siblings)
  12 siblings, 1 reply; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

This patch establishes a connection between a new file watcher daemon
and git. Each index file may have at most one file watcher attached to
it. The file watcher maintains a UNIX socket at
$GIT_DIR/index.watcher. Any process that has write access to $GIT_DIR
can talk to the file watcher.

A validation is performed after git connects to the file watcher to
make sure both sides have the same view. This is done by exchanging
the index signature (*). The file watcher keeps a copy of the signature
locally while git computes the signature from the index. If the
signatures do not match, something has gone wrong, so both sides
reinitialize with respect to file watching: the file watcher clears all
watches while git clears CE_WATCHED flags.
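
The check on git's side can be sketched roughly as below. This is a
standalone illustration, not the code in this patch: watcher_in_sync() is a
made-up name, and the "hello <signature>" reply format is taken from the
daemon's handle_command().

```c
#include <string.h>

/*
 * Return 1 if `reply` is "hello <sig>" with <sig> equal to the signature
 * git computed from its own index, 0 otherwise.
 */
static int watcher_in_sync(const char *reply, const char *index_sig)
{
	static const char prefix[] = "hello ";
	size_t plen = sizeof(prefix) - 1;

	if (strncmp(reply, prefix, plen))
		return 0;	/* not a hello reply at all */
	return !strcmp(reply + plen, index_sig);
}
```

On a 0 result the caller would clear CE_WATCHED on every entry and stop
trusting the watcher for this index.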

If the signatures match, we can trust the file watcher and git can
start asking questions that are not important to this patch.

This file watching thing is all about speed. So if the daemon is not
responding within 20ms (or even hanging), git moves on without it.
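
A minimal sketch of such a bounded wait, using poll() in the same way as
read_watcher() in this patch. The helper name wait_readable() is only for
illustration:

```c
#include <poll.h>

/*
 * Wait up to `timeout_ms` for `fd` to become readable.
 * Returns 1 if data is ready, 0 on timeout or error.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN);
}
```

A caller that gets 0 back simply falls back to the normal lstat() path, so
a slow or hung daemon costs at most the timeout.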

A note about the per-repo vs global (or per-user) daemon approach. While
I implement a per-repo daemon, this is actually an implementation
detail. Nothing can stop you from writing a global daemon that opens
unix sockets to many repos, e.g. to avoid hitting inotify's limit of
128 instances per user.

If the environment variable GIT_NO_FILE_WATCHER is set, the file
watcher is ignored. The 'WATC' extension is kept, but if the index is
updated (likely), it'll become invalid at the next connection.

(*) For current index versions, the signature is the index SHA-1
trailer. But it could be something else (e.g. v5 does not have a SHA-1
trailer).
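
For illustration only: assuming the signature travels as 40 hex digits
(which the 41-byte index_signature buffer in file-watcher.c suggests), the
20-byte trailer could be turned into text as below. Git has its own helper
for hex conversion; signature_to_hex() is a hypothetical stand-in so the
sketch is self-contained.

```c
#include <stddef.h>

/* Write the 40-character lowercase hex form of a 20-byte SHA-1 into out. */
static void signature_to_hex(const unsigned char sha1[20], char out[41])
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < 20; i++) {
		out[2 * i] = hex[sha1[i] >> 4];		/* high nibble */
		out[2 * i + 1] = hex[sha1[i] & 0x0f];	/* low nibble */
	}
	out[40] = '\0';
}
```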

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore               |   1 +
 Makefile                 |   2 +
 cache.h                  |   3 +
 file-watcher-lib.c (new) |  97 ++++++++++++++++++++++++++++++++
 file-watcher-lib.h (new) |   9 +++
 file-watcher.c (new)     | 142 +++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c             |  37 ++++++++++++
 trace.c                  |   3 +-
 8 files changed, 292 insertions(+), 2 deletions(-)
 create mode 100644 file-watcher-lib.c
 create mode 100644 file-watcher-lib.h
 create mode 100644 file-watcher.c

diff --git a/.gitignore b/.gitignore
index dc600f9..dc870cc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -56,6 +56,7 @@
 /git-fast-import
 /git-fetch
 /git-fetch-pack
+/git-file-watcher
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
diff --git a/Makefile b/Makefile
index 287e6f8..4369b3b 100644
--- a/Makefile
+++ b/Makefile
@@ -536,6 +536,7 @@ PROGRAMS += $(EXTRA_PROGRAMS)
 PROGRAM_OBJS += credential-store.o
 PROGRAM_OBJS += daemon.o
 PROGRAM_OBJS += fast-import.o
+PROGRAM_OBJS += file-watcher.o
 PROGRAM_OBJS += http-backend.o
 PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
@@ -798,6 +799,7 @@ LIB_OBJS += entry.o
 LIB_OBJS += environment.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
+LIB_OBJS += file-watcher-lib.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
diff --git a/cache.h b/cache.h
index 069dce7..0d55551 100644
--- a/cache.h
+++ b/cache.h
@@ -282,6 +282,7 @@ struct index_state {
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
+	int watcher;
 };
 
 extern struct index_state the_index;
@@ -1241,6 +1242,8 @@ extern void alloc_report(void);
 __attribute__((format (printf, 1, 2)))
 extern void trace_printf(const char *format, ...);
 __attribute__((format (printf, 2, 3)))
+extern void trace_printf_key(const char *key, const char *fmt, ...);
+__attribute__((format (printf, 2, 3)))
 extern void trace_argv_printf(const char **argv, const char *format, ...);
 extern void trace_repo_setup(const char *prefix);
 extern int trace_want(const char *key);
diff --git a/file-watcher-lib.c b/file-watcher-lib.c
new file mode 100644
index 0000000..ed14ef9
--- /dev/null
+++ b/file-watcher-lib.c
@@ -0,0 +1,97 @@
+#include "cache.h"
+
+#define WAIT_TIME 20		/* in ms */
+#define TRACE_KEY "GIT_TRACE_WATCHER"
+
+int connect_watcher(const char *path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct stat st;
+	int fd = -1;
+
+	strbuf_addf(&sb, "%s.watcher", path);
+	if (!stat(sb.buf, &st) && S_ISSOCK(st.st_mode)) {
+		struct sockaddr_un sun;
+		fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+		sun.sun_family = AF_UNIX;
+		strlcpy(sun.sun_path, sb.buf, sizeof(sun.sun_path));
+		if (connect(fd, (struct sockaddr *)&sun, sizeof(sun))) {
+			error(_("unable to connect to file watcher: %s"),
+			      strerror(errno));
+			close(fd);
+			fd = -1;
+		} else {
+			sprintf(sun.sun_path, "%c%"PRIuMAX, 0, (uintmax_t)getpid());
+			if (bind(fd, (struct sockaddr *)&sun, sizeof(sun))) {
+				error(_("unable to bind socket: %s"),
+				      strerror(errno));
+				close(fd);
+				fd = -1;
+			}
+		}
+	}
+	strbuf_release(&sb);
+	return fd;
+}
+
+ssize_t send_watcher(int sockfd, struct sockaddr_un *dest,
+		     const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct pollfd pfd;
+	int ret;
+
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	pfd.fd = sockfd;
+	pfd.events = POLLOUT;
+	ret = poll(&pfd, 1, WAIT_TIME);
+	if (ret > 0 && pfd.revents & POLLOUT) {
+		trace_printf_key(TRACE_KEY, "< %s\n", sb.buf);
+		if (dest)
+			ret = sendto(sockfd, sb.buf, sb.len, 0,
+				     (struct sockaddr *)dest,
+				     sizeof(struct sockaddr_un));
+		else
+			ret = write(sockfd, sb.buf, sb.len);
+	}
+	strbuf_release(&sb);
+	return ret;
+}
+
+char *read_watcher(int fd, ssize_t *size, struct sockaddr_un *sun)
+{
+	static char *buf;
+	static int buf_size;
+	struct pollfd pfd;
+	ssize_t len;
+
+	if (!buf_size) {
+		socklen_t vallen = sizeof(buf_size);
+		if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &buf_size, &vallen))
+			die_errno("could not get SO_SNDBUF from socket %d", fd);
+		buf = xmalloc(buf_size + 1);
+	}
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	if (poll(&pfd, 1, WAIT_TIME) > 0 &&
+	    (pfd.revents & POLLIN)) {
+		if (sun) {
+			socklen_t socklen = sizeof(*sun);
+			len = recvfrom(fd, buf, buf_size, 0, sun, &socklen);
+		} else
+			len = read(fd, buf, buf_size);
+		if (len > 0)
+			buf[len] = '\0';
+		if (size)
+			*size = len;
+		trace_printf_key(TRACE_KEY, "> %s\n", buf);
+		return buf;
+	} else if (size)
+		*size = 0;
+	return NULL;
+}
diff --git a/file-watcher-lib.h b/file-watcher-lib.h
new file mode 100644
index 0000000..0fe9399
--- /dev/null
+++ b/file-watcher-lib.h
@@ -0,0 +1,9 @@
+#ifndef __FILE_WATCHER_LIB__
+#define __FILE_WATCHER_LIB__
+
+int connect_watcher(const char *path);
+ssize_t send_watcher(int sockfd, struct sockaddr_un *dest,
+		     const char *fmt, ...)
+	__attribute__((format (printf, 3, 4)));
+char *read_watcher(int fd, ssize_t *size, struct sockaddr_un *sun);
+#endif
diff --git a/file-watcher.c b/file-watcher.c
new file mode 100644
index 0000000..36a9a8d
--- /dev/null
+++ b/file-watcher.c
@@ -0,0 +1,142 @@
+#include "cache.h"
+#include "sigchain.h"
+#include "parse-options.h"
+#include "exec_cmd.h"
+#include "file-watcher-lib.h"
+
+static const char *const file_watcher_usage[] = {
+	N_("git file-watcher [options]"),
+	NULL
+};
+
+static char index_signature[41];
+
+static int handle_command(int fd)
+{
+	struct sockaddr_un sun;
+	ssize_t len;
+	const char *arg;
+	char *msg;
+
+	if (!(msg = read_watcher(fd, &len, &sun)))
+		die_errno("read from socket");
+
+	if ((arg = skip_prefix(msg, "hello "))) {
+		send_watcher(fd, &sun, "hello %s", index_signature);
+		if (strcmp(arg, index_signature))
+			/*
+			 * Index SHA-1 mismatch, something has gone
+			 * wrong. Clean up and start over.
+			 */
+			index_signature[0] = '\0';
+	} else if (!strcmp(msg, "die")) {
+		exit(0);
+	} else {
+		die("unrecognized command %s", msg);
+	}
+	return 0;
+}
+
+static const char *socket_path;
+static int do_not_clean_up;
+
+static void cleanup(void)
+{
+	if (do_not_clean_up)
+		return;
+	unlink(socket_path);
+}
+
+static void cleanup_on_signal(int signo)
+{
+	cleanup();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void daemonize(void)
+{
+#ifdef NO_POSIX_GOODIES
+	die("fork not supported on this platform");
+#else
+	switch (fork()) {
+		case 0:
+			break;
+		case -1:
+			die_errno("fork failed");
+		default:
+			do_not_clean_up = 1;
+			exit(0);
+	}
+	if (setsid() == -1)
+		die_errno("setsid failed");
+	close(0);
+	close(1);
+	close(2);
+	sanitize_stdfds();
+#endif
+}
+
+int main(int argc, const char **argv)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct sockaddr_un sun;
+	struct pollfd pfd[2];
+	int fd, err, nr;
+	const char *prefix;
+	int daemon = 0;
+	struct option options[] = {
+		OPT_BOOL(0, "daemon", &daemon,
+			 N_("run in background")),
+		OPT_END()
+	};
+
+	git_extract_argv0_path(argv[0]);
+	git_setup_gettext();
+	prefix = setup_git_directory();
+	argc = parse_options(argc, argv, prefix, options,
+			     file_watcher_usage, 0);
+	if (argc)
+		die("too many arguments");
+
+	strbuf_addf(&sb, "%s.watcher", get_index_file());
+	socket_path = strbuf_detach(&sb, NULL);
+	memset(index_signature, 0, sizeof(index_signature));
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	sun.sun_family = AF_UNIX;
+	strlcpy(sun.sun_path, socket_path, sizeof(sun.sun_path));
+	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun)))
+		die_errno("unable to bind to %s", socket_path);
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
+	if (daemon) {
+		strbuf_addf(&sb, "%s.log", socket_path);
+		err = open(sb.buf, O_CREAT | O_TRUNC | O_WRONLY, 0600);
+		adjust_shared_perm(sb.buf);
+		if (err == -1)
+			die_errno("unable to create %s", sb.buf);
+		daemonize();
+		dup2(err, 1);
+		dup2(err, 2);
+		close(err);
+	}
+
+	nr = 0;
+	pfd[nr].fd = fd;
+	pfd[nr++].events = POLLIN;
+
+	for (;;) {
+		if (poll(pfd, nr, -1) < 0) {
+			if (errno != EINTR) {
+				error("Poll failed, resuming: %s", strerror(errno));
+				sleep(1);
+			}
+			continue;
+		}
+
+		if ((pfd[0].revents & POLLIN) && handle_command(fd))
+			break;
+	}
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 6f21e3f..76cf0e3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "strbuf.h"
 #include "varint.h"
+#include "file-watcher-lib.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
@@ -1447,6 +1448,37 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void validate_watcher(struct index_state *istate, const char *path)
+{
+	int i;
+
+	if (getenv("GIT_NO_FILE_WATCHER")) {
+		istate->watcher = -1;
+		return;
+	}
+
+	istate->watcher = connect_watcher(path);
+	if (istate->watcher != -1) {
+		struct strbuf sb = STRBUF_INIT;
+		char *msg;
+		strbuf_addf(&sb, "hello %s", sha1_to_hex(istate->sha1));
+		if (send_watcher(istate->watcher, NULL, "%s", sb.buf) > 0 &&
+		    (msg = read_watcher(istate->watcher, NULL, NULL)) != NULL &&
+		    !strcmp(msg, sb.buf)) { /* good */
+			strbuf_release(&sb);
+			return;
+		}
+		strbuf_release(&sb);
+	}
+
+	/* No, the file watcher is out of date, clear everything */
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHED) {
+			istate->cache[i]->ce_flags &= ~CE_WATCHED;
+			istate->cache_changed = 1;
+		}
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1532,6 +1564,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += extsize;
 	}
 	munmap(mmap, mmap_size);
+	validate_watcher(istate, path);
 	return istate->cache_nr;
 
 unmap:
@@ -1557,6 +1590,10 @@ int discard_index(struct index_state *istate)
 	istate->timestamp.nsec = 0;
 	free_name_hash(istate);
 	cache_tree_free(&(istate->cache_tree));
+	if (istate->watcher > 0) {
+		close(istate->watcher);
+		istate->watcher = -1;
+	}
 	istate->initialized = 0;
 	free(istate->cache);
 	istate->cache = NULL;
diff --git a/trace.c b/trace.c
index 3d744d1..0b8ebe0 100644
--- a/trace.c
+++ b/trace.c
@@ -75,8 +75,7 @@ static void trace_vprintf(const char *key, const char *fmt, va_list ap)
 	strbuf_release(&buf);
 }
 
-__attribute__((format (printf, 2, 3)))
-static void trace_printf_key(const char *key, const char *fmt, ...)
+void trace_printf_key(const char *key, const char *fmt, ...)
 {
 	va_list ap;
 	va_start(ap, fmt);
-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 04/14] read-cache: ask file watcher to watch files
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 03/14] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 05/14] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

We want to watch files that never change, because lstat() on those
files is wasted effort. So we sort unwatched files by date and start
adding them to the file watcher until it barfs (e.g. hits the inotify
limit). Recently updated entries are also excluded from the watch list.
CE_VALID is used in combination with CE_WATCHED. Entries that already
have CE_VALID set will never be watched.
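
The "sort by date" step can be sketched as below. This is only an
illustration, not the code in this patch: `struct dated_entry`,
`cmp_mtime` and `sort_oldest_first` are hypothetical names. The
comparator uses explicit comparisons instead of subtraction so the
result stays well defined for any pair of timestamps.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache entry: only what the sketch needs. */
struct dated_entry {
	const char *name;
	uint32_t mtime_sec;
};

/* qsort() comparator over an array of pointers: oldest mtime first. */
static int cmp_mtime(const void *a_, const void *b_)
{
	const struct dated_entry *a = *(const struct dated_entry *const *)a_;
	const struct dated_entry *b = *(const struct dated_entry *const *)b_;
	return (a->mtime_sec > b->mtime_sec) - (a->mtime_sec < b->mtime_sec);
}

/* Sort so the least recently modified entries are offered to the watcher first. */
static void sort_oldest_first(struct dated_entry **list, size_t nr)
{
	qsort(list, nr, sizeof(*list), cmp_mtime);
}
```

Offering the oldest files first means that if the watcher runs out of
watches part-way, the files least likely to change are the ones that
got covered.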

We send as many paths as possible in one packet in pkt-line format to
reduce roundtrips. For small projects like git, all entries can be
packed in one packet. For large projects like webkit (182k entries) it
takes two packets. We may do prefix compression as well to send more
in fewer packets.
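
A rough sketch of the packing described above, assuming each path is
framed as one pkt-line (four hex digits giving the total record length,
then the payload) appended after a "watch " prefix. The helper names
`append_pkt` and `pack_watch_request` are made up for this example, and
the buffer capacity stands in for the socket's send buffer size.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one pkt-line record for path to buf. Returns the new used
 * length, or 0 if the record does not fit (including the trailing NUL)
 * or is too long for a four-digit hex length.
 */
static size_t append_pkt(char *buf, size_t used, size_t cap, const char *path)
{
	size_t len = strlen(path) + 4;
	if (len > 0xffff || used + len >= cap)
		return 0;
	snprintf(buf + used, cap - used, "%04x%s", (unsigned)len, path);
	return used + len;
}

/*
 * Fill buf with "watch " followed by as many paths as fit.
 * Returns how many paths were packed; the caller sends the buffer and
 * starts a new one with the remaining paths.
 */
static int pack_watch_request(char *buf, size_t cap,
			      const char **paths, int nr)
{
	size_t used, next;
	int i;

	if (cap < 7)
		return 0;
	strcpy(buf, "watch ");
	used = 6;
	for (i = 0; i < nr; i++) {
		next = append_pkt(buf, used, cap, paths[i]);
		if (!next)
			break;
		used = next;
	}
	return i;
}
```

With a 32-byte buffer and the paths "a.c", "dir/b.c" and a longer third
path, the first two fit and the request reads
"watch 0007a.c000bdir/b.c".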

The file watcher replies how many entries it can watch (because at
least inotify has system limits).
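
The reply can be checked along these lines. This is a sketch under the
assumption that the reply is the text "fine " followed by a decimal
count, as in the daemon code of this patch; `parse_fine_reply` is a
hypothetical helper, not a function from the series.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse a reply of the form "fine <count>". Returns the count, or -1 if
 * the reply is malformed or claims more paths than were sent.
 */
static long parse_fine_reply(const char *reply, long sent)
{
	char *end;
	unsigned long n;

	if (strncmp(reply, "fine ", 5) != 0)
		return -1;
	if (reply[5] < '0' || reply[5] > '9')
		return -1;	/* reject empty, signed or non-numeric counts */
	n = strtoul(reply + 5, &end, 10);
	if (*end != '\0' || n > (unsigned long)sent)
		return -1;
	return (long)n;
}
```

A count smaller than the number of paths sent tells the caller that the
watcher hit a limit and that only the first entries of the batch may be
marked as watched.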

Note that we still do lstat() on these newly watched files because they
could have changed before the file watcher could watch them. Watched
files can skip lstat() only from the next git run onward.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c |  31 ++++++++++++++++
 pkt-line.c     |   2 +-
 pkt-line.h     |   2 ++
 read-cache.c   | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 143 insertions(+), 3 deletions(-)

diff --git a/file-watcher.c b/file-watcher.c
index 36a9a8d..3a54168 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -3,6 +3,7 @@
 #include "parse-options.h"
 #include "exec_cmd.h"
 #include "file-watcher-lib.h"
+#include "pkt-line.h"
 
 static const char *const file_watcher_usage[] = {
 	N_("git file-watcher [options]"),
@@ -11,6 +12,34 @@ static const char *const file_watcher_usage[] = {
 
 static char index_signature[41];
 
+static int watch_path(char *path)
+{
+	/*
+	 * Consider send "wait" every 10ms or so, in case there are
+	 * many paths to process that takes more than 20ms or the
+	 * sender won't keep waiting. This is usually one-time cost,
+	 * waiting a bit is ok.
+	 */
+	return -1;
+}
+
+static void watch_paths(char *buf, int maxlen,
+			int fd, struct sockaddr_un *sock)
+{
+	char *end = buf + maxlen;
+	int n, ret, len;
+	for (n = ret = 0; buf < end && !ret; buf += len) {
+		char ch;
+		len = packet_length(buf);
+		ch = buf[len];
+		buf[len] = '\0';
+		if (!(ret = watch_path(buf + 4)))
+			n++;
+		buf[len] = ch;
+	}
+	send_watcher(fd, sock, "fine %d", n);
+}
+
 static int handle_command(int fd)
 {
 	struct sockaddr_un sun;
@@ -29,6 +58,8 @@ static int handle_command(int fd)
 			 * wrong. Clean up and start over.
 			 */
 			index_signature[0] = '\0';
+	} else if (starts_with(msg, "watch ")) {
+		watch_paths(msg + 6, len - 6, fd, &sun);
 	} else if (!strcmp(msg, "die")) {
 		exit(0);
 	} else {
diff --git a/pkt-line.c b/pkt-line.c
index bc63b3b..b5af84e 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -135,7 +135,7 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 	return ret;
 }
 
-static int packet_length(const char *linelen)
+int packet_length(const char *linelen)
 {
 	int n;
 	int len = 0;
diff --git a/pkt-line.h b/pkt-line.h
index 0a838d1..40470b9 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -75,6 +75,8 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+int packet_length(const char *linelen);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 extern char packet_buffer[LARGE_PACKET_MAX];
diff --git a/read-cache.c b/read-cache.c
index 76cf0e3..21c3207 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -15,6 +15,7 @@
 #include "strbuf.h"
 #include "varint.h"
 #include "file-watcher-lib.h"
+#include "pkt-line.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
@@ -1479,6 +1480,98 @@ static void validate_watcher(struct index_state *istate, const char *path)
 		}
 }
 
+static int sort_by_date(const void *a_, const void *b_)
+{
+	const struct cache_entry *a = *(const struct cache_entry **)a_;
+	const struct cache_entry *b = *(const struct cache_entry **)b_;
+	uint32_t seca = a->ce_stat_data.sd_mtime.sec;
+	uint32_t secb = b->ce_stat_data.sd_mtime.sec;
+	return seca - secb;
+}
+
+static int do_watch_entries(struct index_state *istate,
+			    struct cache_entry **cache,
+			    struct strbuf *sb, int start, int now)
+{
+	char *line;
+	int i;
+	ssize_t len;
+
+	send_watcher(istate->watcher, NULL, "%s", sb->buf);
+	line = read_watcher(istate->watcher, &len, NULL);
+	if (!line) {
+		if (!len) {
+			close(istate->watcher);
+			istate->watcher = -1;
+		}
+		return -1;
+	}
+	if (starts_with(line, "fine ")) {
+		char *end;
+		long n = strtoul(line + 5, &end, 10);
+		if (end != line + len)
+			return -1;
+		for (i = 0; i < n; i++)
+			cache[start + i]->ce_flags |= CE_WATCHED;
+		istate->cache_changed = 1;
+		if (i != now)
+			return -1;
+	} else
+		return -1;
+	start = i;
+	strbuf_reset(sb);
+	strbuf_addstr(sb, "watch ");
+	return 0;
+}
+
+static inline int ce_watchable(struct cache_entry *ce)
+{
+	return ce_uptodate(ce) && /* write_index will catch late ce_uptodate bits */
+		!(ce->ce_flags & CE_WATCHED) &&
+		!(ce->ce_flags & CE_VALID) &&
+		/*
+		 * S_IFGITLINK should not be watched
+		 * obviously. S_IFLNK could be problematic because
+		 * inotify may follow symlinks without IN_DONT_FOLLOW
+		 */
+		S_ISREG(ce->ce_mode);
+}
+
+static void watch_entries(struct index_state *istate)
+{
+	int i, start, nr;
+	struct cache_entry **sorted;
+	struct strbuf sb = STRBUF_INIT;
+	int val;
+	socklen_t vallen = sizeof(val);
+
+	if (istate->watcher <= 0)
+		return;
+	for (i = nr = 0; i < istate->cache_nr; i++)
+		if (ce_watchable(istate->cache[i]))
+			nr++;
+	sorted = xmalloc(sizeof(*sorted) * nr);
+	for (i = nr = 0; i < istate->cache_nr; i++)
+		if (ce_watchable(istate->cache[i]))
+			sorted[nr++] = istate->cache[i];
+
+	getsockopt(istate->watcher, SOL_SOCKET, SO_SNDBUF, &val, &vallen);
+	strbuf_grow(&sb, val);
+	strbuf_addstr(&sb, "watch ");
+
+	qsort(sorted, nr, sizeof(*sorted), sort_by_date);
+	for (i = start = 0; i < nr; i++) {
+		if (sb.len + 4 + ce_namelen(sorted[i]) >= val &&
+		    do_watch_entries(istate, sorted, &sb, start, i))
+			break;
+		packet_buf_write(&sb, "%s", sorted[i]->name);
+	}
+	if (i == nr && start < i)
+		do_watch_entries(istate, sorted, &sb, start, i);
+	strbuf_release(&sb);
+	free(sorted);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1565,6 +1658,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	}
 	munmap(mmap, mmap_size);
 	validate_watcher(istate, path);
+	watch_entries(istate);
 	return istate->cache_nr;
 
 unmap:
@@ -1844,8 +1938,21 @@ int write_index(struct index_state *istate, int newfd)
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
-		else if (cache[i]->ce_flags & CE_WATCHED)
-			has_watches++;
+		else if (cache[i]->ce_flags & CE_WATCHED) {
+			/*
+			 * We may set CE_WATCHED (but not CE_VALID)
+			 * early when refresh has not been done
+			 * yet. At that time we had no idea if the
+			 * entry may have been updated. If it has
+			 * been, remove CE_WATCHED so CE_VALID won't
+			 * incorrectly be set next time if the file
+			 * watcher reports no changes.
+			 */
+			if (!ce_uptodate(cache[i]))
+				cache[i]->ce_flags &= ~CE_WATCHED;
+			else
+				has_watches++;
+		}
 
 		/* reduce extended entries if possible */
 		cache[i]->ce_flags &= ~CE_EXTENDED;
-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 05/14] read-cache: put some limits on file watching
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (3 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 04/14] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-19 17:06     ` Thomas Rast
  2014-01-17  9:47   ` [PATCH/WIP v2 06/14] read-cache: get modified file list from file watcher Nguyễn Thái Ngọc Duy
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

watch_entries() does a lot of computation and could trigger a lot more
lookups in file-watcher. Normally, after the first set of watches is
in place, we do not need to update often. Moreover, if the number of
entries is small, the overhead of the file watcher may actually slow
git down.

This patch only allows watches to be updated if the number of watchable
files is over a limit (and, if this is not the first time, new files
have been added). Measured on a Core i5-2520M with Linux 3.7.6, about
920 lstat() calls take 1ms. Somewhere between 2^16 and 2^17 lstat()
calls, the total starts to take longer than 100ms. 2^16 is chosen as
the minimum limit to start using the file watcher.
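
As a back-of-the-envelope check of these numbers, the sketch below
extrapolates linearly from the measured rate. The linear model and the
names `LSTAT_PER_MS` and `estimated_lstat_ms` are assumptions made for
this illustration only.

```c
/* Measured rate quoted above: about 920 lstat() calls per millisecond. */
#define LSTAT_PER_MS 920.0

/* Estimated time in milliseconds to lstat() nr files at that rate. */
static double estimated_lstat_ms(unsigned long nr)
{
	return nr / LSTAT_PER_MS;
}
```

At this rate 2^16 files cost roughly 71ms and 2^17 files roughly 142ms,
which agrees with the 100ms mark falling between the two.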

Recently updated files are not considered watchable because they are
likely to be updated again soon, so they are not worth the ping-pong
game with the file watcher. The default limit of 30 minutes is an
arbitrary value.
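
The recency test amounts to a single comparison. A minimal sketch,
where `old_enough` is a hypothetical helper and `recent_limit` is the
number of seconds configured via filewatcher.recentlimit:

```c
#include <time.h>

/*
 * An entry qualifies for watching only if its mtime lies more than
 * recent_limit seconds before now.
 */
static int old_enough(time_t mtime, time_t now, int recent_limit)
{
	return mtime + recent_limit < now;
}
```

With the default of 1800 seconds, a file modified exactly 30 minutes
ago is still treated as recent; it becomes watchable one second later.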

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt |  9 +++++++++
 cache.h                  |  1 +
 read-cache.c             | 44 ++++++++++++++++++++++++++++++++++++--------
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a405806..e394399 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1038,6 +1038,15 @@ difftool.<tool>.cmd::
 difftool.prompt::
 	Prompt before each invocation of the diff tool.
 
+filewatcher.minfiles::
+	Start watching files if the number of watchable files are
+	above this limit. Default value is 65536.
+
+filewatcher.recentlimit::
+	Files that are last updated within filewatcher.recentlimit
+	seconds from now are not considered watchable. Default value
+	is 1800 (30 minutes).
+
 fetch.recurseSubmodules::
 	This option can be either set to a boolean value or to 'on-demand'.
 	Setting it to a boolean changes the behavior of fetch and pull to
diff --git a/cache.h b/cache.h
index 0d55551..bcec29b 100644
--- a/cache.h
+++ b/cache.h
@@ -278,6 +278,7 @@ struct index_state {
 	struct cache_tree *cache_tree;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 update_watches : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/read-cache.c b/read-cache.c
index 21c3207..406834a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -38,6 +38,8 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
 
 struct index_state the_index;
+static int watch_lowerlimit = 65536;
+static int recent_limit = 1800;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
@@ -1014,6 +1016,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 			(istate->cache_nr - pos - 1) * sizeof(ce));
 	set_index_entry(istate, pos, ce);
 	istate->cache_changed = 1;
+	istate->update_watches = 1;
 	return 0;
 }
 
@@ -1300,13 +1303,14 @@ static void read_watch_extension(struct index_state *istate, uint8_t *data,
 				 unsigned long sz)
 {
 	int i;
-	if ((istate->cache_nr + 7) / 8 != sz) {
+	if ((istate->cache_nr + 7) / 8 + 1 != sz) {
 		error("invalid 'WATC' extension");
 		return;
 	}
 	for (i = 0; i < istate->cache_nr; i++)
 		if (data[i / 8] & (1 << (i % 8)))
 			istate->cache[i]->ce_flags |= CE_WATCHED;
+	istate->update_watches = data[sz - 1];
 }
 
 static int read_index_extension(struct index_state *istate,
@@ -1449,6 +1453,19 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static int watcher_config(const char *var, const char *value, void *data)
+{
+	if (!strcmp(var, "filewatcher.minfiles")) {
+		watch_lowerlimit = git_config_int(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "filewatcher.recentlimit")) {
+		recent_limit = git_config_int(var, value);
+		return 0;
+	}
+	return 0;
+}
+
 static void validate_watcher(struct index_state *istate, const char *path)
 {
 	int i;
@@ -1458,6 +1475,7 @@ static void validate_watcher(struct index_state *istate, const char *path)
 		return;
 	}
 
+	git_config(watcher_config, NULL);
 	istate->watcher = connect_watcher(path);
 	if (istate->watcher != -1) {
 		struct strbuf sb = STRBUF_INIT;
@@ -1478,6 +1496,7 @@ static void validate_watcher(struct index_state *istate, const char *path)
 			istate->cache[i]->ce_flags &= ~CE_WATCHED;
 			istate->cache_changed = 1;
 		}
+	istate->update_watches = 1;
 }
 
 static int sort_by_date(const void *a_, const void *b_)
@@ -1524,7 +1543,7 @@ static int do_watch_entries(struct index_state *istate,
 	return 0;
 }
 
-static inline int ce_watchable(struct cache_entry *ce)
+static inline int ce_watchable(struct cache_entry *ce, time_t now)
 {
 	return ce_uptodate(ce) && /* write_index will catch late ce_uptodate bits */
 		!(ce->ce_flags & CE_WATCHED) &&
@@ -1534,7 +1553,8 @@ static inline int ce_watchable(struct cache_entry *ce)
 		 * obviously. S_IFLNK could be problematic because
 		 * inotify may follow symlinks without IN_DONT_FOLLOW
 		 */
-		S_ISREG(ce->ce_mode);
+		S_ISREG(ce->ce_mode) &&
+		(ce->ce_stat_data.sd_mtime.sec + recent_limit < now);
 }
 
 static void watch_entries(struct index_state *istate)
@@ -1544,15 +1564,20 @@ static void watch_entries(struct index_state *istate)
 	struct strbuf sb = STRBUF_INIT;
 	int val;
 	socklen_t vallen = sizeof(val);
+	time_t now = time(NULL);
 
-	if (istate->watcher <= 0)
+	if (istate->watcher <= 0 || !istate->update_watches)
 		return;
+	istate->update_watches = 0;
+	istate->cache_changed = 1;
 	for (i = nr = 0; i < istate->cache_nr; i++)
-		if (ce_watchable(istate->cache[i]))
+		if (ce_watchable(istate->cache[i], now))
 			nr++;
+	if (nr < watch_lowerlimit)
+		return;
 	sorted = xmalloc(sizeof(*sorted) * nr);
 	for (i = nr = 0; i < istate->cache_nr; i++)
-		if (ce_watchable(istate->cache[i]))
+		if (ce_watchable(istate->cache[i], now))
 			sorted[nr++] = istate->cache[i];
 
 	getsockopt(istate->watcher, SOL_SOCKET, SO_SNDBUF, &val, &vallen);
@@ -1616,6 +1641,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
 	istate->initialized = 1;
+	istate->update_watches = 1;
 
 	if (istate->version == 4)
 		previous_name = &previous_name_buf;
@@ -2024,8 +2050,9 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
-	if (has_watches) {
-		int id, sz = (entries - removed + 7) / 8;
+	if (has_watches ||
+	    (istate->watcher != -1 && !istate->update_watches)) {
+		int id, sz = (entries - removed + 7) / 8 + 1;
 		uint8_t *data = xmalloc(sz);
 		memset(data, 0, sz);
 		for (i = 0, id = 0; i < entries && has_watches; i++) {
@@ -2038,6 +2065,7 @@ int write_index(struct index_state *istate, int newfd)
 			}
 			id++;
 		}
+		data[sz - 1] = istate->update_watches;
 		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
 			|| ce_write(&c, newfd, data, sz) < 0;
 		free(data);
-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 06/14] read-cache: get modified file list from file watcher
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (4 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 05/14] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 07/14] read-cache: add config to start file watcher automatically Nguyễn Thái Ngọc Duy
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

A new command is added to file watcher to send back the list of
updated files to git. These entries will have CE_WATCHED removed. The
remaining CE_WATCHED entries will have CE_VALID set (i.e. no changes
and no lstat either).
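
The flag handling described above can be sketched as follows. The
flag constants and `struct watched_entry` are simplified stand-ins for
CE_WATCHED, CE_VALID and the real cache entry, and the linear search is
only there to keep the example short.

```c
#include <string.h>

#define F_WATCHED 1u	/* hypothetical stand-in for CE_WATCHED */
#define F_VALID   2u	/* hypothetical stand-in for CE_VALID */

struct watched_entry {
	const char *name;
	unsigned flags;
};

/*
 * Drop the watched bit from every entry the watcher reported as
 * updated, then mark whatever is still watched as valid so that it can
 * skip lstat().
 */
static void apply_updated(struct watched_entry *e, int nr,
			  const char **updated, int nr_updated)
{
	int i, j;

	for (i = 0; i < nr; i++)
		for (j = 0; j < nr_updated; j++)
			if (!strcmp(e[i].name, updated[j]))
				e[i].flags &= ~F_WATCHED;
	for (i = 0; i < nr; i++)
		if (e[i].flags & F_WATCHED)
			e[i].flags |= F_VALID;
}
```

Entries that were never watched are left alone, so they still go
through the normal lstat() path.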

The file watcher does not cache stat info and send it back to git. Its
main purpose is to reduce lstat() calls on the many untouched files,
not to eliminate lstat() completely.

The file watcher keeps reporting the same "updated" list until it
receives "forget" commands, which should only be issued after the
updated index has been written. This ensures that if git crashes
halfway, before it can update the index (or if multiple processes are
reading the same index), the "updated" info is not lost.
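
On the daemon side, the "forget" rule can be sketched like this. The
fixed-size `struct watcher_state` and `handle_forget` are hypothetical
simplifications (the patch keeps a string_list instead); the point is
that a path is dropped only when the request carries the signature of
the index the daemon currently knows about.

```c
#include <string.h>

#define MAX_UPDATED 16

/* Hypothetical daemon state: current index signature plus updated paths. */
struct watcher_state {
	char signature[41];
	const char *updated[MAX_UPDATED];
	int nr;
};

/*
 * Handle the argument of "forget <signature> <path>".
 * Returns 1 if the path was removed from the updated list, 0 otherwise.
 */
static int handle_forget(struct watcher_state *w, const char *arg)
{
	size_t len = strlen(w->signature);
	const char *path;
	int i;

	if (strncmp(arg, w->signature, len) || arg[len] != ' ')
		return 0;	/* request refers to a different index */
	path = arg + len + 1;
	for (i = 0; i < w->nr; i++) {
		if (!strcmp(w->updated[i], path)) {
			/* order does not matter: fill the hole with the last item */
			w->updated[i] = w->updated[--w->nr];
			return 1;
		}
	}
	return 0;
}
```

A process that read an older index and sends a stale signature is
ignored, so its "forget" cannot hide an update from other readers.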

After the index is updated (e.g. in this case because of toggling
CE_WATCHED bits), git sends the new index signature to the file
watcher.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h        |   1 +
 file-watcher.c |  63 +++++++++++++++++++++++++++++++++---
 read-cache.c   | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 157 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index bcec29b..8f065ed 100644
--- a/cache.h
+++ b/cache.h
@@ -284,6 +284,7 @@ struct index_state {
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	int watcher;
+	struct string_list *updated_entries;
 };
 
 extern struct index_state the_index;
diff --git a/file-watcher.c b/file-watcher.c
index 3a54168..369af37 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -3,6 +3,7 @@
 #include "parse-options.h"
 #include "exec_cmd.h"
 #include "file-watcher-lib.h"
+#include "string-list.h"
 #include "pkt-line.h"
 
 static const char *const file_watcher_usage[] = {
@@ -11,6 +12,8 @@ static const char *const file_watcher_usage[] = {
 };
 
 static char index_signature[41];
+static struct string_list updated = STRING_LIST_INIT_DUP;
+static int updated_sorted;
 
 static int watch_path(char *path)
 {
@@ -23,6 +26,37 @@ static int watch_path(char *path)
 	return -1;
 }
 
+static void reset(void)
+{
+	string_list_clear(&updated, 0);
+	index_signature[0] = '\0';
+}
+
+static void send_status(int fd, struct sockaddr_un *sun)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int i, size;
+	socklen_t vallen = sizeof(size);
+	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &vallen))
+		die_errno("could not get SO_SNDBUF from socket %d", fd);
+
+	strbuf_grow(&sb, size);
+	strbuf_addstr(&sb, "new ");
+
+	for (i = 0; i < updated.nr; i++) {
+		int len = strlen(updated.items[i].string) + 4;
+		if (sb.len + len >= size) {
+			send_watcher(fd, sun, "%s", sb.buf);
+			strbuf_reset(&sb);
+			strbuf_addstr(&sb, "new ");
+		}
+		packet_buf_write(&sb, "%s", updated.items[i].string);
+	}
+	strbuf_addstr(&sb, "0000");
+	send_watcher(fd, sun, "%s", sb.buf);
+	strbuf_release(&sb);
+}
+
 static void watch_paths(char *buf, int maxlen,
 			int fd, struct sockaddr_un *sock)
 {
@@ -40,6 +74,19 @@ static void watch_paths(char *buf, int maxlen,
 	send_watcher(fd, sock, "fine %d", n);
 }
 
+static void remove_updated(const char *path)
+{
+	struct string_list_item *item;
+	if (!updated_sorted) {
+		sort_string_list(&updated);
+		updated_sorted = 1;
+	}
+	item = string_list_lookup(&updated, path);
+	if (!item)
+		return;
+	unsorted_string_list_delete_item(&updated, item - updated.items, 0);
+}
+
 static int handle_command(int fd)
 {
 	struct sockaddr_un sun;
@@ -53,11 +100,17 @@ static int handle_command(int fd)
 	if ((arg = skip_prefix(msg, "hello "))) {
 		send_watcher(fd, &sun, "hello %s", index_signature);
 		if (strcmp(arg, index_signature))
-			/*
-			 * Index SHA-1 mismatch, something has gone
-			 * wrong. Clean up and start over.
-			 */
-			index_signature[0] = '\0';
+			reset();
+	} else if ((arg = skip_prefix(msg, "clear"))) {
+		reset();
+	} else if (!strcmp(msg, "status")) {
+		send_status(fd, &sun);
+	} else if ((arg = skip_prefix(msg, "bye "))) {
+		strlcpy(index_signature, arg, sizeof(index_signature));
+	} else if ((arg = skip_prefix(msg, "forget "))) {
+		int len = strlen(index_signature);
+		if (!strncmp(arg, index_signature, len) && arg[len] == ' ')
+			remove_updated(arg + len + 1);
 	} else if (starts_with(msg, "watch ")) {
 		watch_paths(msg + 6, len - 6, fd, &sun);
 	} else if (!strcmp(msg, "die")) {
diff --git a/read-cache.c b/read-cache.c
index 406834a..3aa541d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1453,6 +1453,69 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void update_watched_files(struct index_state *istate)
+{
+	int i;
+	if (istate->watcher <= 0)
+		return;
+	if (send_watcher(istate->watcher, NULL, "status") < 0)
+		goto failed;
+	for (;;) {
+		char *line, *end;
+		ssize_t len;
+		int ch;
+		line = read_watcher(istate->watcher, &len, NULL);
+		if (!line || !starts_with(line, "new ")) {
+			if (!len) {
+				close(istate->watcher);
+				istate->watcher = -1;
+			}
+			goto failed;
+		}
+		end = line + len;
+		line += 4;
+		for (; line < end; line[len] = ch, line += len) {
+			len = packet_length(line);
+			if (!len)
+				break;
+			ch = line[len];
+			line[len] = '\0';
+			i = index_name_pos(istate, line + 4, len - 4);
+			if (i < 0)
+				continue;
+			if (istate->cache[i]->ce_flags & CE_WATCHED) {
+				istate->cache[i]->ce_flags &= ~CE_WATCHED;
+				istate->cache_changed = 1;
+			}
+			if (!istate->updated_entries) {
+				struct string_list *sl;
+				sl = xmalloc(sizeof(*sl));
+				memset(sl, 0, sizeof(*sl));
+				sl->strdup_strings = 1;
+				istate->updated_entries = sl;
+			}
+			string_list_append(istate->updated_entries, line + 4);
+		}
+		if (!len)
+			break;
+	}
+
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHED)
+			istate->cache[i]->ce_flags |= CE_VALID;
+	return;
+failed:
+	if (istate->updated_entries) {
+		string_list_clear(istate->updated_entries, 0);
+		free(istate->updated_entries);
+		istate->updated_entries = NULL;
+	}
+	send_watcher(istate->watcher, NULL, "clear");
+	for (i = 0; i < istate->cache_nr; i++)
+		istate->cache[i]->ce_flags &= ~CE_WATCHED;
+	istate->cache_changed = 1;
+}
+
 static int watcher_config(const char *var, const char *value, void *data)
 {
 	if (!strcmp(var, "filewatcher.minfiles")) {
@@ -1484,6 +1547,7 @@ static void validate_watcher(struct index_state *istate, const char *path)
 		if (send_watcher(istate->watcher, NULL, "%s", sb.buf) > 0 &&
 		    (msg = read_watcher(istate->watcher, NULL, NULL)) != NULL &&
 		    !strcmp(msg, sb.buf)) { /* good */
+			update_watched_files(istate);
 			strbuf_release(&sb);
 			return;
 		}
@@ -1597,6 +1661,21 @@ static void watch_entries(struct index_state *istate)
 	free(sorted);
 }
 
+static void farewell_watcher(struct index_state *istate,
+			     const unsigned char *sha1)
+{
+	int i;
+	if (istate->watcher <= 0)
+		return;
+	send_watcher(istate->watcher, NULL, "bye %s", sha1_to_hex(sha1));
+	if (!istate->updated_entries)
+		return;
+	for (i = 0; i < istate->updated_entries->nr; i++)
+		send_watcher(istate->watcher, NULL, "forget %s %s",
+			     sha1_to_hex(sha1),
+			     istate->updated_entries->items[i].string);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1718,6 +1797,11 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->updated_entries) {
+		string_list_clear(istate->updated_entries, 0);
+		free(istate->updated_entries);
+		istate->updated_entries = NULL;
+	}
 	return 0;
 }
 
@@ -1778,7 +1862,7 @@ static int write_index_ext_header(git_SHA_CTX *context, int fd,
 		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
 
-static int ce_flush(git_SHA_CTX *context, int fd)
+static int ce_flush(git_SHA_CTX *context, int fd, unsigned char *sha1)
 {
 	unsigned int left = write_buffer_len;
 
@@ -1796,6 +1880,8 @@ static int ce_flush(git_SHA_CTX *context, int fd)
 
 	/* Append the SHA1 signature at the end */
 	git_SHA1_Final(write_buffer + left, context);
+	if (sha1)
+		hashcpy(sha1, write_buffer + left);
 	left += 20;
 	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
 }
@@ -1960,12 +2046,21 @@ int write_index(struct index_state *istate, int newfd)
 	int entries = istate->cache_nr;
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	unsigned char sha1[20];
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
 		else if (cache[i]->ce_flags & CE_WATCHED) {
 			/*
+			 * CE_VALID when used with CE_WATCHED is not
+			 * supposed to be persistent. Next time git
+			 * runs, if this entry is still watched and
+			 * nothing has changed, CE_VALID will be
+			 * reinstated.
+			 */
+			cache[i]->ce_flags &= ~CE_VALID;
+			/*
 			 * We may set CE_WATCHED (but not CE_VALID)
 			 * early when refresh has not been done
 			 * yet. At that time we had no idea if the
@@ -2073,8 +2168,9 @@ int write_index(struct index_state *istate, int newfd)
 			return -1;
 	}
 
-	if (ce_flush(&c, newfd) || fstat(newfd, &st))
+	if (ce_flush(&c, newfd, sha1) || fstat(newfd, &st))
 		return -1;
+	farewell_watcher(istate, sha1);
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	return 0;
-- 
1.8.5.1.208.g05b12ea

* [PATCH/WIP v2 07/14] read-cache: add config to start file watcher automatically
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (5 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 06/14] read-cache: get modified file list from file watcher Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing Nguyễn Thái Ngọc Duy
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt |  5 +++++
 file-watcher-lib.c       | 18 +++++++++++++++---
 file-watcher-lib.h       |  2 +-
 file-watcher.c           |  8 ++++++--
 read-cache.c             | 17 +++++++++++++++--
 5 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e394399..3316b69 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1038,6 +1038,11 @@ difftool.<tool>.cmd::
 difftool.prompt::
 	Prompt before each invocation of the diff tool.
 
+filewatcher.autorun::
+	Run `git file-watcher` automatically if the number of cached
+	entries is at least this limit. Zero disables automatic
+	startup. Default value is zero.
+
 filewatcher.minfiles::
 	Start watching files if the number of watchable files are
 	above this limit. Default value is 65536.
diff --git a/file-watcher-lib.c b/file-watcher-lib.c
index ed14ef9..71c8545 100644
--- a/file-watcher-lib.c
+++ b/file-watcher-lib.c
@@ -1,16 +1,28 @@
 #include "cache.h"
+#include "run-command.h"
 
 #define WAIT_TIME 20		/* in ms */
 #define TRACE_KEY "GIT_TRACE_WATCHER"
 
-int connect_watcher(const char *path)
+int connect_watcher(const char *path, int autorun)
 {
 	struct strbuf sb = STRBUF_INIT;
 	struct stat st;
-	int fd = -1;
+	int fd = -1, ret;
 
 	strbuf_addf(&sb, "%s.watcher", path);
-	if (!stat(sb.buf, &st) && S_ISSOCK(st.st_mode)) {
+	ret = stat(sb.buf, &st);
+	if (autorun && ret && errno == ENOENT) {
+		const char *av[] = { "file-watcher", "--daemon", "--quiet", NULL };
+		struct child_process cp;
+		memset(&cp, 0, sizeof(cp));
+		cp.git_cmd = 1;
+		cp.argv = av;
+		if (run_command(&cp))
+			return -1;
+		ret = stat(sb.buf, &st);
+	}
+	if (!ret && S_ISSOCK(st.st_mode)) {
 		struct sockaddr_un sun;
 		fd = socket(AF_UNIX, SOCK_DGRAM, 0);
 		sun.sun_family = AF_UNIX;
diff --git a/file-watcher-lib.h b/file-watcher-lib.h
index 0fe9399..ef3d196 100644
--- a/file-watcher-lib.h
+++ b/file-watcher-lib.h
@@ -1,7 +1,7 @@
 #ifndef __FILE_WATCHER_LIB__
 #define __FILE_WATCHER_LIB__
 
-int connect_watcher(const char *path);
+int connect_watcher(const char *path, int autorun);
 ssize_t send_watcher(int sockfd, struct sockaddr_un *dest,
 		     const char *fmt, ...)
 	__attribute__((format (printf, 3, 4)));
diff --git a/file-watcher.c b/file-watcher.c
index 369af37..1b4ac0a 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -168,8 +168,9 @@ int main(int argc, const char **argv)
 	struct pollfd pfd[2];
 	int fd, err, nr;
 	const char *prefix;
-	int daemon = 0;
+	int daemon = 0, quiet = 0;
 	struct option options[] = {
+		OPT__QUIET(&quiet, N_("be quiet")),
 		OPT_BOOL(0, "daemon", &daemon,
 			 N_("run in background")),
 		OPT_END()
@@ -189,8 +190,11 @@ int main(int argc, const char **argv)
 	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
 	sun.sun_family = AF_UNIX;
 	strlcpy(sun.sun_path, socket_path, sizeof(sun.sun_path));
-	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun)))
+	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun))) {
+		if (quiet)
+			exit(128);
 		die_errno("unable to bind to %s", socket_path);
+	}
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
diff --git a/read-cache.c b/read-cache.c
index 3aa541d..5dae9eb 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -40,6 +40,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 struct index_state the_index;
 static int watch_lowerlimit = 65536;
 static int recent_limit = 1800;
+static int autorun_watcher = -1;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
@@ -1518,6 +1519,10 @@ failed:
 
 static int watcher_config(const char *var, const char *value, void *data)
 {
+	if (!strcmp(var, "filewatcher.autorun")) {
+		autorun_watcher = git_config_int(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "filewatcher.minfiles")) {
 		watch_lowerlimit = git_config_int(var, value);
 		return 0;
@@ -1538,8 +1543,16 @@ static void validate_watcher(struct index_state *istate, const char *path)
 		return;
 	}
 
-	git_config(watcher_config, NULL);
-	istate->watcher = connect_watcher(path);
+	if (autorun_watcher == -1) {
+		git_config(watcher_config, NULL);
+		if (autorun_watcher == -1)
+			autorun_watcher = 0;
+	}
+
+	istate->watcher = connect_watcher(path,
+					  autorun_watcher &&
+					  istate->cache_nr >= autorun_watcher);
+	autorun_watcher = 0;
 	if (istate->watcher != -1) {
 		struct strbuf sb = STRBUF_INIT;
 		char *msg;
-- 
1.8.5.1.208.g05b12ea

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (6 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 07/14] read-cache: add config to start file watcher automatically Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-19 17:04     ` Thomas Rast
  2014-01-17  9:47   ` [PATCH/WIP v2 09/14] file-watcher: add --shutdown and --log options Nguyễn Thái Ngọc Duy
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

This can be used to force the watcher on when running the test
suite.

git-file-watcher processes are not automatically cleaned up after each
test. So after running the test suite you'll be left with plenty of
git-file-watcher processes that should all end after about a minute.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 5dae9eb..a1245d4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1544,7 +1544,12 @@ static void validate_watcher(struct index_state *istate, const char *path)
 	}
 
 	if (autorun_watcher == -1) {
-		git_config(watcher_config, NULL);
+		if (getenv("GIT_TEST_FORCE_WATCHER")) {
+			watch_lowerlimit = 0;
+			recent_limit = 0;
+			autorun_watcher = 1;
+		} else
+			git_config(watcher_config, NULL);
 		if (autorun_watcher == -1)
 			autorun_watcher = 0;
 	}
-- 
1.8.5.1.208.g05b12ea

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH/WIP v2 09/14] file-watcher: add --shutdown and --log options
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (7 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 10/14] file-watcher: automatically quit Nguyễn Thái Ngọc Duy
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/file-watcher.c b/file-watcher.c
index 1b4ac0a..df06529 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -113,6 +113,8 @@ static int handle_command(int fd)
 			remove_updated(arg + len + 1);
 	} else if (starts_with(msg, "watch ")) {
 		watch_paths(msg + 6, len - 6, fd, &sun);
+	} else if ((arg = skip_prefix(msg, "log "))) {
+		fprintf(stderr, "log %s\n", arg);
 	} else if (!strcmp(msg, "die")) {
 		exit(0);
 	} else {
@@ -168,11 +170,16 @@ int main(int argc, const char **argv)
 	struct pollfd pfd[2];
 	int fd, err, nr;
 	const char *prefix;
-	int daemon = 0, quiet = 0;
+	int daemon = 0, quiet = 0, shutdown = 0;
+	const char *log_string = NULL;
 	struct option options[] = {
 		OPT__QUIET(&quiet, N_("be quiet")),
 		OPT_BOOL(0, "daemon", &daemon,
-			 N_("run in background")),
+			 N_("run in background (default)")),
+		OPT_BOOL(0, "shutdown", &shutdown,
+			 N_("shut down running file-watcher daemon")),
+		OPT_STRING(0, "log", &log_string, "string",
+			   N_("string to log to index.watcher.log")),
 		OPT_END()
 	};
 
@@ -190,11 +197,24 @@ int main(int argc, const char **argv)
 	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
 	sun.sun_family = AF_UNIX;
 	strlcpy(sun.sun_path, socket_path, sizeof(sun.sun_path));
+
+	if (shutdown || log_string) {
+		struct stat st;
+		if (stat(socket_path, &st) || !S_ISSOCK(st.st_mode))
+			return 0;
+		if (log_string && send_watcher(fd, &sun, "log %s", log_string) < 0)
+			die_errno("failed to send log message to file-watcher");
+		if (shutdown && send_watcher(fd, &sun, "die") < 0)
+			die_errno("failed to shut file-watcher down");
+		return 0;
+	}
+
 	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun))) {
 		if (quiet)
 			exit(128);
 		die_errno("unable to bind to %s", socket_path);
 	}
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
-- 
1.8.5.1.208.g05b12ea

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH/WIP v2 10/14] file-watcher: automatically quit
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (8 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 09/14] file-watcher: add --shutdown and --log options Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-17  9:47   ` [PATCH/WIP v2 11/14] file-watcher: support inotify Nguyễn Thái Ngọc Duy
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

If $GIT_DIR/index.watcher or $GIT_DIR/index is gone, exit. We could
watch this path too, but we'll waste precious resources (at least with
inotify). And with inotify, it seems to miss the case when $GIT_DIR is
moved. Just check if the socket path still exists every minute.

As a last resort, if we do not receive any commands in the last 6
hours, exit. The code is structured this way because later on inotify
is also polled. On a busy watched directory, the poll timeout may never
fire to let us kill the watcher, even if index.watcher is already
gone.

For mass cleanup, "killall -USR1 git-file-watcher" asks every watcher
process to question the purpose of its existence.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
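Note (not part of the patch): the exit conditions described above can be
condensed into a single predicate. This is only a minimal sketch under
assumptions: watcher_should_exit() and IDLE_LIMIT are hypothetical names,
not identifiers from this series, and the six-hour limit is taken from
the description above.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

#define IDLE_LIMIT (6 * 60 * 60)	/* assumed: six hours, in seconds */

/* Returns 1 if the watcher should exit, 0 if it should keep running. */
static int watcher_should_exit(const char *socket_path, ino_t socket_ino,
			       const char *index_path,
			       time_t last_command, time_t now)
{
	struct stat st;

	if (last_command + IDLE_LIMIT < now)
		return 1;	/* no command for too long */
	if (stat(socket_path, &st) || st.st_ino != socket_ino)
		return 1;	/* socket is gone or was replaced */
	if (stat(index_path, &st))
		return 1;	/* index is gone */
	return 0;
}
```

Such a predicate would be evaluated whenever poll() times out or SIGUSR1
has been received, which keeps the stat() calls off the hot path.
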
 file-watcher.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/file-watcher.c b/file-watcher.c
index df06529..f334e23 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -163,6 +163,12 @@ static void daemonize(void)
 #endif
 }
 
+static int check_exit_please;
+static void check_exit_signal(int signo)
+{
+	check_exit_please = 1;
+}
+
 int main(int argc, const char **argv)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -172,6 +178,8 @@ int main(int argc, const char **argv)
 	const char *prefix;
 	int daemon = 0, quiet = 0, shutdown = 0;
 	const char *log_string = NULL;
+	struct stat socket_st;
+	struct timeval tv_last_command;
 	struct option options[] = {
 		OPT__QUIET(&quiet, N_("be quiet")),
 		OPT_BOOL(0, "daemon", &daemon,
@@ -217,6 +225,10 @@ int main(int argc, const char **argv)
 
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
+	sigchain_push(SIGUSR1, check_exit_signal);
+
+	if (stat(socket_path, &socket_st))
+		die_errno("failed to stat %s", socket_path);
 
 	if (daemon) {
 		strbuf_addf(&sb, "%s.log", socket_path);
@@ -234,17 +246,37 @@ int main(int argc, const char **argv)
 	pfd[nr].fd = fd;
 	pfd[nr++].events = POLLIN;
 
+	gettimeofday(&tv_last_command, NULL);
 	for (;;) {
-		if (poll(pfd, nr, -1) < 0) {
+		int check_exit = check_exit_please;
+		int ret = poll(pfd, nr, check_exit ? 0 : 60 * 1000);
+		if (ret < 0) {
 			if (errno != EINTR) {
 				error("Poll failed, resuming: %s", strerror(errno));
 				sleep(1);
 			}
 			continue;
+		} else if (!ret)
+			check_exit = 1;
+
+		if ((pfd[0].revents & POLLIN)) {
+			if (handle_command(fd))
+				break;
+			gettimeofday(&tv_last_command, NULL);
 		}
 
-		if ((pfd[0].revents & POLLIN) && handle_command(fd))
-			break;
+		if (check_exit) {
+			struct stat st;
+			struct timeval now;
+			gettimeofday(&now, NULL);
+			if (tv_last_command.tv_sec + 6 * 60 * 60 < now.tv_sec)
+				break;
+			if (stat(socket_path, &st) ||
+			    st.st_ino != socket_st.st_ino ||
+			    stat(get_index_file(), &st))
+				break;
+			check_exit_please = 0;
+		}
 	}
 	return 0;
 }
-- 
1.8.5.1.208.g05b12ea

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH/WIP v2 11/14] file-watcher: support inotify
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (9 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 10/14] file-watcher: automatically quit Nguyễn Thái Ngọc Duy
@ 2014-01-17  9:47   ` Nguyễn Thái Ngọc Duy
  2014-01-19 17:04   ` [PATCH/WIP v2 00/14] inotify support Thomas Rast
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
  12 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-01-17  9:47 UTC (permalink / raw)
  To: git; +Cc: tr, Nguyễn Thái Ngọc Duy

"git diff" on webkit:

        no file watcher  1st run   subsequent runs
real        0m1.361s    0m1.445s      0m0.691s
user        0m0.889s    0m0.940s      0m0.649s
sys         0m0.469s    0m0.495s      0m0.040s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
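Note (not part of the patch): read() on an inotify descriptor returns a
batch of variable-length records, so a handler has to advance through the
buffer by sizeof(struct inotify_event) plus the record's len field for
every record. A minimal sketch of that walk, assuming Linux and
<sys/inotify.h>; collect_event_wds() is a hypothetical helper, not a
function from this series:

```c
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>

/*
 * Walk a buffer filled by read() on an inotify fd and store the watch
 * descriptor of each complete event in wds[] (at most max entries).
 * Returns the number of events stored.
 */
static int collect_event_wds(const char *buf, size_t len, int *wds, int max)
{
	size_t offset = 0;
	int nr = 0;

	while (nr < max && offset + sizeof(struct inotify_event) <= len) {
		struct inotify_event ev;

		/* copy the fixed-size header; buf need not be aligned */
		memcpy(&ev, buf + offset, sizeof(ev));
		if (offset + sizeof(ev) + ev.len > len)
			break;	/* truncated record, stop here */
		wds[nr++] = ev.wd;
		/* the name (ev.len bytes, NUL padded) follows the header */
		offset += sizeof(ev) + ev.len;
	}
	return nr;
}
```

The point of the sketch is that the event pointer has to be recomputed
from the offset on every iteration, since each record can have a
different length.
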
 config.mak.uname  |   1 +
 file-watcher.c    | 194 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 git-compat-util.h |   3 +
 3 files changed, 198 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index 7d31fad..ee548f5 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_PATHS_H = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	BASIC_CFLAGS += -DHAVE_INOTIFY
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	NO_STRLCPY = YesPlease
diff --git a/file-watcher.c b/file-watcher.c
index f334e23..356b58a 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -15,6 +15,166 @@ static char index_signature[41];
 static struct string_list updated = STRING_LIST_INIT_DUP;
 static int updated_sorted;
 
+#ifdef HAVE_INOTIFY
+
+static struct string_list watched_dirs = STRING_LIST_INIT_DUP;
+static int watched_dirs_sorted;
+static int inotify_fd;
+
+struct dir_info {
+	int wd;
+	struct string_list names;
+	int names_sorted;
+};
+
+static void reset_watches(void)
+{
+	int i;
+	for (i = 0; i < watched_dirs.nr; i++) {
+		struct dir_info *dir = watched_dirs.items[i].util;
+		inotify_rm_watch(inotify_fd, dir->wd);
+		string_list_clear(&dir->names, 0);
+	}
+	string_list_clear(&watched_dirs, 1);
+}
+
+static void update(const char *base, const char *name)
+{
+	if (!strcmp(base, "."))
+		string_list_append(&updated, name);
+	else {
+		static struct strbuf sb = STRBUF_INIT;
+		strbuf_addf(&sb, "%s/%s", base, name);
+		string_list_append(&updated, sb.buf);
+		strbuf_reset(&sb);
+	}
+	updated_sorted = 0;
+}
+
+static int do_handle_inotify(const struct inotify_event *event)
+{
+	struct dir_info *dir;
+	struct string_list_item *item;
+	int i;
+
+	if (event->mask & (IN_Q_OVERFLOW | IN_UNMOUNT)) {
+		/*
+		 * The connectionless nature of file watcher means we
+		 * can never tell we are reset in the middle of a
+		 * "session" because there are no "sessions". Close
+		 * the socket so all clients can react on it.
+		 */
+		exit(0);
+	}
+
+	/* Should have indexed them for faster access like trast's watch */
+	for (i = 0; i < watched_dirs.nr; i++) {
+		struct dir_info *dir = watched_dirs.items[i].util;
+		if (dir->wd == event->wd)
+			break;
+	}
+	if (i == watched_dirs.nr)
+		return 0;
+	dir = watched_dirs.items[i].util;
+
+	/*
+	 * If something happened to the watched directory, consider
+	 * everything inside modified
+	 */
+	if (event->mask & (IN_DELETE_SELF | IN_MOVE_SELF)) {
+		int dir_idx = i;
+		for (i = 0; i < dir->names.nr; i++)
+			update(watched_dirs.items[dir_idx].string,
+			       dir->names.items[i].string);
+		inotify_rm_watch(inotify_fd, dir->wd);
+		unsorted_string_list_delete_item(&watched_dirs, dir_idx, 1);
+		return 0;
+	}
+
+	if (!dir->names_sorted) {
+		sort_string_list(&dir->names);
+		dir->names_sorted = 1;
+	}
+	item = string_list_lookup(&dir->names, event->name);
+	if (item) {
+		update(watched_dirs.items[i].string, item->string);
+		unsorted_string_list_delete_item(&dir->names,
+						 item - dir->names.items, 0);
+		if (dir->names.nr == 0) {
+			inotify_rm_watch(inotify_fd, dir->wd);
+			unsorted_string_list_delete_item(&watched_dirs, i, 1);
+		}
+	}
+	return 0;
+}
+
+static int handle_inotify(int fd)
+{
+	static char buf[10 * (sizeof(struct inotify_event) + NAME_MAX + 1)];
+	struct inotify_event *event;
+	int offset = 0;
+	int len = read(fd, buf, sizeof(buf));
+	if (len <= 0)
+		return -1;
+	while (offset < len) {
+		event = (struct inotify_event *)(buf + offset);
+		if (do_handle_inotify(event))
+			return -1;
+		offset += sizeof(struct inotify_event) + event->len;
+	}
+	return 0;
+}
+
+static int watch_path(char *path)
+{
+	struct string_list_item *item;
+	char *sep = strrchr(path, '/');
+	struct dir_info *dir;
+	const char *dirname = ".";
+
+	if (sep) {
+		*sep = '\0';
+		dirname = path;
+	}
+
+	if (!watched_dirs_sorted) {
+		sort_string_list(&watched_dirs);
+		watched_dirs_sorted = 1;
+	}
+	item = string_list_lookup(&watched_dirs, dirname);
+	if (!item) {
+		/*
+		 * IN_CREATE is not included because we're targeting
+		 * lstat() for index vs worktree. If a file is not
+		 * tracked in index, it's not worth watching. If the
+		 * index has it, but the worktree is already gone
+		 * before watching, the file has already been marked
+		 * modified and should _not_ be watched.
+		 *
+		 * Problematic: IN_DONT_FOLLOW
+		 */
+		int ret = inotify_add_watch(inotify_fd, dirname,
+					    IN_DELETE_SELF | IN_MOVE_SELF |
+					    IN_ATTRIB | IN_DELETE | IN_MODIFY |
+					    IN_MOVED_FROM | IN_MOVED_TO);
+		if (ret < 0)
+			return -1;
+		dir = xmalloc(sizeof(*dir));
+		memset(dir, 0, sizeof(*dir));
+		dir->wd = ret;
+		dir->names.strdup_strings = 1;
+		item = string_list_append(&watched_dirs, dirname);
+		item->util = dir;
+		watched_dirs_sorted = 0;
+	}
+	dir = item->util;
+	string_list_append(&dir->names, sep ? sep + 1 : path);
+	dir->names_sorted = 0;
+	return 0;
+}
+
+#else
+
 static int watch_path(char *path)
 {
 	/*
@@ -26,8 +186,15 @@ static int watch_path(char *path)
 	return -1;
 }
 
+static void reset_watches(void)
+{
+}
+
+#endif
+
 static void reset(void)
 {
+	reset_watches();
 	string_list_clear(&updated, 0);
 	index_signature[0] = '\0';
 }
@@ -180,6 +347,7 @@ int main(int argc, const char **argv)
 	const char *log_string = NULL;
 	struct stat socket_st;
 	struct timeval tv_last_command;
+	struct timeval tv_last_inotify;
 	struct option options[] = {
 		OPT__QUIET(&quiet, N_("be quiet")),
 		OPT_BOOL(0, "daemon", &daemon,
@@ -193,6 +361,15 @@ int main(int argc, const char **argv)
 
 	git_extract_argv0_path(argv[0]);
 	git_setup_gettext();
+
+#ifdef HAVE_INOTIFY
+	inotify_fd = inotify_init();
+	if (inotify_fd < 0)
+		die_errno("unable to initialize inotify");
+#else
+	die("no file watching mechanism is supported");
+#endif
+
 	prefix = setup_git_directory();
 	argc = parse_options(argc, argv, prefix, options,
 			     file_watcher_usage, 0);
@@ -245,8 +422,13 @@ int main(int argc, const char **argv)
 	nr = 0;
 	pfd[nr].fd = fd;
 	pfd[nr++].events = POLLIN;
+#ifdef HAVE_INOTIFY
+	pfd[nr].fd = inotify_fd;
+	pfd[nr++].events = POLLIN;
+#endif
 
 	gettimeofday(&tv_last_command, NULL);
+	gettimeofday(&tv_last_inotify, NULL);
 	for (;;) {
 		int check_exit = check_exit_please;
 		int ret = poll(pfd, nr, check_exit ? 0 : 60 * 1000);
@@ -265,6 +447,18 @@ int main(int argc, const char **argv)
 			gettimeofday(&tv_last_command, NULL);
 		}
 
+#ifdef HAVE_INOTIFY
+		if ((pfd[1].revents & POLLIN)) {
+			struct timeval now;
+			if (handle_inotify(inotify_fd))
+				break;
+			gettimeofday(&now, NULL);
+			if (tv_last_inotify.tv_sec + 60 < now.tv_sec) {
+				check_exit = 1;
+				tv_last_inotify = now;
+			}
+		}
+#endif
 		if (check_exit) {
 			struct stat st;
 			struct timeval now;
diff --git a/git-compat-util.h b/git-compat-util.h
index cbd86c3..de5996a 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -128,6 +128,9 @@
 #else
 #include <poll.h>
 #endif
+#ifdef HAVE_INOTIFY
+#include <sys/inotify.h>
+#endif
 
 #if defined(__MINGW32__)
 /* pull in Windows compatibility stuff */
-- 
1.8.5.1.208.g05b12ea

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched
  2014-01-17  9:47   ` [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
@ 2014-01-17 11:19     ` Thomas Gummerer
  2014-01-19 17:06     ` Thomas Rast
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Gummerer @ 2014-01-17 11:19 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: tr, Nguyễn Thái Ngọc Duy

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> If an entry is "watched", git lets an external program decide if the
> entry is modified or not. It's more like --assume-unchanged, but
> designed to be controlled by machine.
>
> We are running out of on-disk ce_flags, so instead of extending
> on-disk entry format again, "watched" flags are in-core only and
> stored as extension instead.

As you said yourself in http://thread.gmane.org/gmane.comp.version-control.git/240339/focus=240385
this is not quite true.  As for your explanation there,

> Anyway using extended flags means 2 extra bytes per entry for
> almost every entry in this case (and for index v5 it means redoing
> crc32 for almost every entry too when the bit is updated) so it may
> still be a good idea to keep the new flag separate.

I don't think adding 2 extra bytes would be too bad, since we are
already using 62 bytes plus the bytes for the filename for each index
entry, so it would be a less than 3% increase in the index file size.
(And the extended flags may be used anyway in some cases)
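
To put a number on that estimate, here is a rough sketch of the
arithmetic. It assumes the 62-byte fixed part quoted above and ignores
on-disk padding; extended_flags_overhead() is a hypothetical helper used
only for illustration.

```c
/*
 * Relative growth of one index entry, in percent, when two extra bytes
 * of extended flags are added to a 62-byte fixed part plus the path name.
 */
static double extended_flags_overhead(int name_len)
{
	return 100.0 * 2 / (62 + name_len);
}
```

Under these assumptions the increase stays below 3% once the path name
is at least 5 bytes long, and drops to 2% for a 38-byte path.
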

As for index-v5 (if that's ever going to happen), whether it makes sense
to store this as an extension depends mostly on how often CE_WATCHED is
going to be updated.

That said, I don't care too deeply if it's stored one way or another,
but I think it would be good to update the commit message with a better
rationale for the choice.

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  cache.h      |  2 ++
>  read-cache.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index a09d622..069dce7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -168,6 +168,8 @@ struct cache_entry {
>  /* used to temporarily mark paths matched by pathspecs */
>  #define CE_MATCHED           (1 << 26)
>
> +#define CE_WATCHED           (1 << 27)
> +
>  /*
>   * Extended on-disk flags
>   */
> diff --git a/read-cache.c b/read-cache.c
> index fe1d153..6f21e3f 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -33,6 +33,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>  #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
>  #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
>  #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
> +#define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
>
>  struct index_state the_index;
>
> @@ -1293,6 +1294,19 @@ static int verify_hdr(struct cache_header *hdr,
>  	return 0;
>  }
>
> +static void read_watch_extension(struct index_state *istate, uint8_t *data,
> +				 unsigned long sz)
> +{
> +	int i;
> +	if ((istate->cache_nr + 7) / 8 != sz) {
> +		error("invalid 'WATC' extension");
> +		return;
> +	}
> +	for (i = 0; i < istate->cache_nr; i++)
> +		if (data[i / 8] & (1 << (i % 8)))
> +			istate->cache[i]->ce_flags |= CE_WATCHED;
> +}
> +
>  static int read_index_extension(struct index_state *istate,
>  				const char *ext, void *data, unsigned long sz)
>  {
> @@ -1303,6 +1317,9 @@ static int read_index_extension(struct index_state *istate,
>  	case CACHE_EXT_RESOLVE_UNDO:
>  		istate->resolve_undo = resolve_undo_read(data, sz);
>  		break;
> +	case CACHE_EXT_WATCH:
> +		read_watch_extension(istate, data, sz);
> +		break;
>  	default:
>  		if (*ext < 'A' || 'Z' < *ext)
>  			return error("index uses %.4s extension, which we do not understand",
> @@ -1781,7 +1798,7 @@ int write_index(struct index_state *istate, int newfd)
>  {
>  	git_SHA_CTX c;
>  	struct cache_header hdr;
> -	int i, err, removed, extended, hdr_version;
> +	int i, err, removed, extended, hdr_version, has_watches = 0;
>  	struct cache_entry **cache = istate->cache;
>  	int entries = istate->cache_nr;
>  	struct stat st;
> @@ -1790,6 +1807,8 @@ int write_index(struct index_state *istate, int newfd)
>  	for (i = removed = extended = 0; i < entries; i++) {
>  		if (cache[i]->ce_flags & CE_REMOVE)
>  			removed++;
> +		else if (cache[i]->ce_flags & CE_WATCHED)
> +			has_watches++;
>
>  		/* reduce extended entries if possible */
>  		cache[i]->ce_flags &= ~CE_EXTENDED;
> @@ -1861,6 +1880,26 @@ int write_index(struct index_state *istate, int newfd)
>  		if (err)
>  			return -1;
>  	}
> +	if (has_watches) {
> +		int id, sz = (entries - removed + 7) / 8;
> +		uint8_t *data = xmalloc(sz);
> +		memset(data, 0, sz);
> +		for (i = 0, id = 0; i < entries && has_watches; i++) {
> +			struct cache_entry *ce = cache[i];
> +			if (ce->ce_flags & CE_REMOVE)
> +				continue;
> +			if (ce->ce_flags & CE_WATCHED) {
> +				data[id / 8] |= 1 << (id % 8);
> +				has_watches--;
> +			}
> +			id++;
> +		}
> +		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
> +			|| ce_write(&c, newfd, data, sz) < 0;
> +		free(data);
> +		if (err)
> +			return -1;
> +	}
>
>  	if (ce_flush(&c, newfd) || fstat(newfd, &st))
>  		return -1;
> --
> 1.8.5.1.208.g05b12ea
>

--
Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH/WIP v2 03/14] read-cache: connect to file watcher
  2014-01-17  9:47   ` [PATCH/WIP v2 03/14] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
@ 2014-01-17 15:24     ` Torsten Bögershausen
  2014-01-17 16:21       ` Duy Nguyen
  0 siblings, 1 reply; 72+ messages in thread
From: Torsten Bögershausen @ 2014-01-17 15:24 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git; +Cc: tr

On 2014-01-17 10.47, Nguyễn Thái Ngọc Duy wrote:
[snip]
> diff --git a/file-watcher-lib.c b/file-watcher-lib.c


> +int connect_watcher(const char *path)
Would it be worth checking whether we can reuse some code from unix-socket.c?

It could be especially important that unix_sockaddr_init() works around a problem
when "long" path names are used.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH/WIP v2 03/14] read-cache: connect to file watcher
  2014-01-17 15:24     ` Torsten Bögershausen
@ 2014-01-17 16:21       ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-17 16:21 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List, Thomas Rast

On Fri, Jan 17, 2014 at 10:24 PM, Torsten Bögershausen <tboegi@web.de> wrote:
> On 2014-01-17 10.47, Nguyễn Thái Ngọc Duy wrote:
> [snip]
>> diff --git a/file-watcher-lib.c b/file-watcher-lib.c
>
>
>> +int connect_watcher(const char *path)
> Would it be worth checking whether we can reuse some code from unix-socket.c?
>
> It could be especially important that unix_sockaddr_init() works around a problem
> when "long" path names are used.
>

Thanks! I did not even know about unix-socket.c. Well, I never paid
attention to credential-cache.c :(
-- 
Duy

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH/WIP v2 00/14] inotify support
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (10 preceding siblings ...)
  2014-01-17  9:47   ` [PATCH/WIP v2 11/14] file-watcher: support inotify Nguyễn Thái Ngọc Duy
@ 2014-01-19 17:04   ` Thomas Rast
  2014-01-20  1:28     ` Duy Nguyen
  2014-01-28 10:46     ` Duy Nguyen
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
  12 siblings, 2 replies; 72+ messages in thread
From: Thomas Rast @ 2014-01-19 17:04 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>   read-cache: save trailing sha-1
>   read-cache: new extension to mark what file is watched
>   read-cache: connect to file watcher
>   read-cache: ask file watcher to watch files
>   read-cache: put some limits on file watching
>   read-cache: get modified file list from file watcher
>   read-cache: add config to start file watcher automatically
>   read-cache: add GIT_TEST_FORCE_WATCHER for testing
>   file-watcher: add --shutdown and --log options
>   file-watcher: automatically quit
>   file-watcher: support inotify
>   file-watcher: exit when cwd is gone
>   pkt-line.c: increase buffer size to 8192
>   t1301: exclude sockets from file permission check

I never got the last three patches, did you send them?

Also, this doesn't cleanly apply anywhere at my end.  Can you push it
somewhere for easier experimentation?

> This is getting in better shape. Still wondering if the design is
> right, so documentation, tests and some error cases are still
> neglected. I have not addressed Jonathan's and Jeff's comments in this
> reroll, but I haven't forgotten them yet. The test suite seems to be
> fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set.

I tried to figure out whether there were any corners you are painting it
into, but doing so without a clear overview of the protocol makes my
head spin.

So here's what I gather from the patches (I would have started from the
final result, but see above).

The slow path, before the watcher is ready, works like:

  spawn watcher
  send "hello $index_sha1"
  receive "hello "
  mark most files CE_WATCHED
  send "watch $paths" for each CE_WATCHED
    the paths are actually a pkt-line-encoded series of paths
    get back "fine $n" (n = number of watches processed)
  do work as usual (including lstat())

The fast path:

  load CE_WATCHED from index
  send "hello $index_sha1"
  receive "hello $index_sha1"
  send "status"
    get back "new $path" for each changed file
    again aggregate
  mark files that are "new" as ~CE_WATCHED
  files that are CE_WATCHED can skip their lstat(), since they are unchanged

On index writing (both paths):

  save CE_WATCHED in index extension
  send "bye $index_sha1"
  send "forget $index_sha1 $path" for every path that is known to be changed
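
As a rough illustration of how the client might choose between the two
paths, here is a sketch that classifies the watcher's "hello" reply. It
assumes the reply arrives as a NUL-terminated string; classify_hello()
and the enum names are hypothetical and do not appear in the patches.

```c
#include <string.h>

enum watcher_path {
	WATCHER_NONE,	/* no usable reply, behave as if there is no watcher */
	WATCHER_SLOW,	/* watcher has no matching state, (re)issue watches */
	WATCHER_FAST	/* watcher state matches the index, ask for "status" */
};

static enum watcher_path classify_hello(const char *index_sha1,
					const char *reply)
{
	if (!reply || strncmp(reply, "hello ", 6))
		return WATCHER_NONE;
	if (!strcmp(reply + 6, index_sha1))
		return WATCHER_FAST;
	return WATCHER_SLOW;
}
```

In this reading, a reply carrying a different signature is treated the
same as an empty one: the watcher's state cannot be trusted, so git
falls back to the slow path.
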

So I think the watcher protocol can be specified roughly as:

  Definitions: The watcher is a separate process that maintains a set of
  _watched paths_.  Git uses the commands below to add or remove paths
  from this set.

  In addition it maintains a set of _changed paths_, which is a subset
  of the watched paths.  Git occasionally queries the changed paths.  If
  at any time after a path P was added to the "watched" set, P has or
  may have changed, it MUST be added to the "changed" set.

  Note that a path being "unchanged" under this definition does NOT mean
  that it is unchanged in the index as per `git diff`.  It may have been
  changed before the watch was issued!  Therefore, Git MUST test whether
  the path started out unchanged using lstat() AFTER it was added to the
  "watched" set.

  Handshake::
    On connecting, git sends "hello <magic>".  <magic> is an opaque
    string that identifies the state of the index.  The watcher MUST
    respond with "hello <previous_magic>", where previous_magic is
    obtained from the "bye" command (below).  If <magic> !=
    <previous_magic>, the watcher MUST reset state for this repository.

  Watch::
    Git sends "watch <pathlist>" where the <pathlist> consists of a
    series of 4-digit lengths and literal pathnames (as in the pkt-line
    format) for each path it is interested in.  The watcher MUST respond
    with "fine <n>" after processing a message that contained <n> paths.
    The watcher MUST add each path to the "watched" set and remove it
    from the "changed" set (no error if it was already watched, or not
    changed).

  Status::
    Git sends "status".  The watcher MUST respond with its current
    "changed" set.  This uses the format "new <pathlist>", with
    <pathlist> formatted as for the "watch" command.

  Bye::
    Git sends "bye <magic>" to indicate it has finished writing the
    index.  The watcher MUST store <magic> so as to use it as the
    <previous_magic> in a "hello" response.

  Forget::
    Git sends "forget <pathlist>", with the <pathlist> formatted as for
    the "watch" command.  The watcher SHOULD remove each path in
    <pathlist> from its watched set.

Did I get that approximately straight?  Perhaps you can fix up as needed
and then turn it into the documentation for the protocol.
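
To make the <pathlist> encoding above concrete, here is a minimal
sketch in C.  It assumes that, as in git's pkt-line format, the four
hex digits count themselves as part of the record length; the function
name encode_pathlist() is made up for illustration and is not taken
from the patches.

```c
#include <stdio.h>
#include <string.h>

/*
 * Encode nr paths into buf as a series of records, each made of four
 * hex digits followed by the literal path.  The length includes the
 * four digits themselves (an assumption borrowed from pkt-line).
 * Returns the total number of bytes written (not counting the final
 * NUL), or -1 if a record does not fit in buf or in four hex digits.
 */
int encode_pathlist(char *buf, size_t bufsz, const char **paths, int nr)
{
	size_t used = 0;
	int i;

	if (bufsz)
		buf[0] = '\0';
	for (i = 0; i < nr; i++) {
		size_t len = strlen(paths[i]) + 4;

		/* keep one byte spare for snprintf's terminating NUL */
		if (len > 0xffff || len + 1 > bufsz - used)
			return -1;
		snprintf(buf + used, bufsz - used, "%04zx%s", len, paths[i]);
		used += len;
	}
	return (int)used;
}
```

With the two paths "a" and "dir/b.c" this produces the 16-byte string
"0005a000bdir/b.c", which would follow the "watch " (or "new ") keyword
on the wire.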

There are several points about this that I don't like:

* What does the datagram socket buy us?  You already do a lot of
  pkt-line work for the really big messages, so why not just use
  pkt-line everywhere and a streaming socket?

* The handshake should have room for capabilities, so that we can extend
  the protocol.

* The hello/hello handshake is confusing, and probably would allow you
  to point two watchers at each other without them noticing.  Make it
  hello/ready, or some other unambiguous choice.

* I took some liberty and tried to fully define the transitions between
  the sets.  So even though I'm not sure this is currently handled, I
  made it well-defined to issue "watch" for a path that is in the
  changed set.

* "bye" is confusing, because in practice git issues "forget"s after
  "bye".  The best I can come up with is "setstate", I'm sure you have
  better ideas.
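
The transitions between the "watched" and "changed" sets, as specified
in the protocol sketch above, can be modeled with a small state table.
This is only an illustration under stated assumptions: the struct and
function names are hypothetical, the table has a fixed arbitrary size,
and path strings are owned by the caller.

```c
#include <string.h>

#define MAX_PATHS 16	/* arbitrary cap for this sketch */

struct path_state {
	const char *name;	/* caller-owned string */
	int watched;
	int changed;		/* only meaningful while watched */
};

struct watcher {
	struct path_state paths[MAX_PATHS];
	int nr;
};

static struct path_state *lookup(struct watcher *w, const char *name)
{
	int i;

	for (i = 0; i < w->nr; i++)
		if (!strcmp(w->paths[i].name, name))
			return &w->paths[i];
	return NULL;
}

/* "watch": add to the watched set and drop from the changed set. */
int watcher_watch(struct watcher *w, const char *name)
{
	struct path_state *p = lookup(w, name);

	if (!p) {
		if (w->nr == MAX_PATHS)
			return -1;
		p = &w->paths[w->nr++];
		p->name = name;
	}
	p->watched = 1;
	p->changed = 0;
	return 0;
}

/* Filesystem notification: only a watched path can become "changed". */
void watcher_fs_event(struct watcher *w, const char *name)
{
	struct path_state *p = lookup(w, name);

	if (p && p->watched)
		p->changed = 1;
}

/* "forget": leave the watched set, and therefore the changed set. */
void watcher_forget(struct watcher *w, const char *name)
{
	struct path_state *p = lookup(w, name);

	if (p)
		p->watched = p->changed = 0;
}

/* "status" would report every path for which this returns 1. */
int watcher_is_changed(struct watcher *w, const char *name)
{
	struct path_state *p = lookup(w, name);

	return p && p->watched && p->changed;
}
```

Issuing "watch" for a path that is already in the changed set simply
clears the changed bit, which is the well-defined behavior argued for
above.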

There's also the problem of ordering guarantees between the socket and
inotify.  I haven't found any, so I would conservatively assume that the
socket messages may in fact arrive before inotify, which is a race in
the current code.  E.g., in the sequence 'touch foo; git status' the
daemon may see

  socket                    inotify                  
  < hello...
  < status
  > new <empty list>
                            touch foo

I think a clever way to handle this would be to add a new command:

  Wait::
    This command serves synchronization.  Git creates a file of its
    choice in $GIT_DIR/watch (say, `.git/watch/wait.<random>`).  Then it
    sends "wait <path>".  The watcher MUST block until it has processed
    all change notifications up to and including <path>.

This assumes the FS notification API to be ordered, which appears to be
the case for inotify; from inotify(7):

  The events returned by reading from an inotify file descriptor form an
  ordered queue.  Thus, for example, it is guaranteed that when renaming
  from one directory to another, events will be produced in the correct
  order on the inotify file descriptor.

As a corollary, from watching .git/watch you get free notification of
'rm .git/watch/socket' as a termination signal.
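
The barrier idea can be sketched as follows.  To keep the example
self-contained, an array of names stands in for the ordered inotify
queue; a real watcher would read struct inotify_event records from the
inotify file descriptor instead.  The function name is hypothetical.

```c
#include <string.h>

/*
 * events[] stands in for the ordered inotify queue, oldest first.
 * Consume events up to and including the barrier file.  Every name
 * seen before the barrier is counted as a changed path.  Returns the
 * number of events consumed, or -1 if the barrier has not shown up
 * yet, in which case a real watcher would keep reading rather than
 * answer the "wait" command.
 */
int drain_until_barrier(const char **events, int nr,
			const char *barrier, int *nr_changed)
{
	int i;

	*nr_changed = 0;
	for (i = 0; i < nr; i++) {
		if (!strcmp(events[i], barrier))
			return i + 1;
		(*nr_changed)++;
	}
	return -1;
}
```

Because the queue is ordered, any change made before git created the
barrier file is guaranteed to have been processed by the time the
barrier itself is seen.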

We also need to think about how other OS's APIs work with this.  From
what I've heard about the Windows API, it should work well, so that's
great.  I suspect the *bsd/darwin API corresponding to inotify won't be
too different, but it's better to work this out now.

> Thomas, you were a proponent of per-user daemon last time. I agree
> that is a better solution when you need to support submodules. So if
> you have time, have a look and see if anything I did may prevent
> per-user daemon changes later (hint, I have a few unfriendly exit() in
> file-watcher.c). You also worked with inotify before maybe you can
> help spot some mishandling too as I'm totally new to inotify.

As you note, the protocol for "help, I don't know what's true any more"
(i.e., the watcher got an inotify buffer overflow event) needs to be a
special "reset" message, not exit().
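
A minimal, Linux-only sketch of that decision follows.  IN_Q_OVERFLOW
is the mask bit inotify(7) documents for a queue overflow; the enum and
the function name are hypothetical, and how the "reset" would actually
be sent to git is left open.

```c
#include <sys/inotify.h>

enum watcher_action {
	WATCHER_MARK_CHANGED,	/* ordinary event: record the path */
	WATCHER_RESET		/* state is unreliable: tell git to reset */
};

/*
 * Decide what to do with one inotify event.  On overflow the kernel
 * has dropped events, so nothing in the "changed" set can be trusted
 * any more; the watcher should report that instead of calling exit().
 */
enum watcher_action classify_event(const struct inotify_event *ev)
{
	if (ev->mask & IN_Q_OVERFLOW)
		return WATCHER_RESET;
	return WATCHER_MARK_CHANGED;
}
```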

The per-repo approach does require keeping open an extra FD per repo
though, regardless of whether the daemon is actually for all of the
user's repos.  The linuxen I run as desktop systems usually default
ulimit -n to 1024, so that's not a massive restriction, but still.

As a reminder to other reviewers, the reason I wanted a per-user watcher
is that /proc/sys/fs/inotify/max_user_instances defaults to 128 on my
systems.  We need one inotify FD per watcher process, and given that a
full android tree had something on the order of 300 repos last I looked,
that just won't fly.

As far as inotify corner-cases go, the only one I'm aware of is
directory renames.  I suspect we'll have to watch directories all the
way up to the repository root to reliably detect when this happens.  Not
sure how to best handle this.  Perhaps we should declare Git completely
agnostic wrt such issues, and behind the scenes issue all watches up to
the root even if we don't need them for anything other than directory
renames.
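
Watching "all the way up" means that for every watched file, each
leading directory needs a watch too.  A sketch of enumerating those
directories for an index-style relative path (no leading slash) is
below; the function name and the fixed buffer size are assumptions made
for the example, and the worktree root itself would need one more
watch that is not listed here.

```c
#include <string.h>

#define DIR_BUF 256	/* arbitrary limit for this sketch */

/*
 * Fill dirs[] with every leading directory of path, shortest first:
 * "a/b/c.txt" yields "a" and "a/b".  Returns the number of entries
 * written, or -1 if there are more than max or one is too long.
 */
int list_parent_dirs(const char *path, char dirs[][DIR_BUF], int max)
{
	const char *slash = path;
	int nr = 0;

	while ((slash = strchr(slash, '/')) != NULL) {
		size_t len = slash - path;

		if (nr == max || len >= DIR_BUF)
			return -1;
		memcpy(dirs[nr], path, len);
		dirs[nr][len] = '\0';
		nr++;
		slash++;	/* continue after this separator */
	}
	return nr;
}
```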


Ok, that's probably a confused sum of rambles.  Let me know if you can
make any sense of it.

-- 
Thomas Rast
tr@thomasrast.ch

* Re: [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing
  2014-01-17  9:47   ` [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing Nguyễn Thái Ngọc Duy
@ 2014-01-19 17:04     ` Thomas Rast
  2014-01-20  1:32       ` Duy Nguyen
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Rast @ 2014-01-19 17:04 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> This can be used to force watcher on when running the test
> suite.
>
> git-file-watcher processes are not automatically cleaned up after each
> test. So after running the test suite you'll be left with plenty
> git-file-watcher processes that should all end after about a minute.

Probably not a very good idea, especially in noninteractive use?  E.g.,
a bisection through the test suite or parallel test runs on different
commits may exhaust the available processes and/or memory.

Each test should make an effort to clean up all watchers before
terminating.

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  read-cache.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 5dae9eb..a1245d4 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1544,7 +1544,12 @@ static void validate_watcher(struct index_state *istate, const char *path)
>  	}
>  
>  	if (autorun_watcher == -1) {
> -		git_config(watcher_config, NULL);
> +		if (getenv("GIT_TEST_FORCE_WATCHER")) {
> +			watch_lowerlimit = 0;
> +			recent_limit = 0;
> +			autorun_watcher = 1;
> +		} else
> +			git_config(watcher_config, NULL);
>  		if (autorun_watcher == -1)
>  			autorun_watcher = 0;
>  	}

-- 
Thomas Rast
tr@thomasrast.ch

* Re: [PATCH/WIP v2 05/14] read-cache: put some limits on file watching
  2014-01-17  9:47   ` [PATCH/WIP v2 05/14] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
@ 2014-01-19 17:06     ` Thomas Rast
  2014-01-20  1:36       ` Duy Nguyen
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Rast @ 2014-01-19 17:06 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> watch_entries() is a lot of computation and could trigger a lot more
> lookups in file-watcher. Normally after the first set of watches are
> in place, we do not need to update often. Moreover if the number of
> entries is small, the overhead of file watcher may actually slow git
> down.
>
> This patch only allows to update watches if the number of watchable
> files is over a limit (and there are new files added if this is not
> the first time). Measurements on Core i5-2520M and Linux 3.7.6, about
> 920 lstat() take 1ms. Somewhere between 2^16 and 2^17 lstat calls that
> it starts to take longer than 100ms. 2^16 is chosen at the minimum
> limit to start using file watcher.
>
> Recently updated files are not considered watchable because they are
> likely to be updated again soon, not worth the ping-pong game with
> file watcher. The default limit 30min is just a random value.

But then a fresh clone of a big repository would not get any benefit
from the watcher?

Not yet sure how to best handle this.

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/config.txt |  9 +++++++++
>  cache.h                  |  1 +
>  read-cache.c             | 44 ++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 46 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a405806..e394399 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1038,6 +1038,15 @@ difftool.<tool>.cmd::
>  difftool.prompt::
>  	Prompt before each invocation of the diff tool.
>  
> +filewatcher.minfiles::
> +	Start watching files if the number of watchable files are
> +	above this limit. Default value is 65536.
> +
> +filewatcher.recentlimit::
> +	Files that are last updated within filewatcher.recentlimit
> +	seconds from now are not considered watchable. Default value
> +	is 1800 (30 minutes).
> +
>  fetch.recurseSubmodules::
>  	This option can be either set to a boolean value or to 'on-demand'.
>  	Setting it to a boolean changes the behavior of fetch and pull to
> diff --git a/cache.h b/cache.h
> index 0d55551..bcec29b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -278,6 +278,7 @@ struct index_state {
>  	struct cache_tree *cache_tree;
>  	struct cache_time timestamp;
>  	unsigned name_hash_initialized : 1,
> +		 update_watches : 1,
>  		 initialized : 1;
>  	struct hashmap name_hash;
>  	struct hashmap dir_hash;
> diff --git a/read-cache.c b/read-cache.c
> index 21c3207..406834a 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -38,6 +38,8 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>  #define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
>  
>  struct index_state the_index;
> +static int watch_lowerlimit = 65536;
> +static int recent_limit = 1800;
>  
>  static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
>  {
> @@ -1014,6 +1016,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
>  			(istate->cache_nr - pos - 1) * sizeof(ce));
>  	set_index_entry(istate, pos, ce);
>  	istate->cache_changed = 1;
> +	istate->update_watches = 1;
>  	return 0;
>  }
>  
> @@ -1300,13 +1303,14 @@ static void read_watch_extension(struct index_state *istate, uint8_t *data,
>  				 unsigned long sz)
>  {
>  	int i;
> -	if ((istate->cache_nr + 7) / 8 != sz) {
> +	if ((istate->cache_nr + 7) / 8 + 1 != sz) {
>  		error("invalid 'WATC' extension");
>  		return;
>  	}
>  	for (i = 0; i < istate->cache_nr; i++)
>  		if (data[i / 8] & (1 << (i % 8)))
>  			istate->cache[i]->ce_flags |= CE_WATCHED;
> +	istate->update_watches = data[sz - 1];
>  }
>  
>  static int read_index_extension(struct index_state *istate,
> @@ -1449,6 +1453,19 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
>  	return ce;
>  }
>  
> +static int watcher_config(const char *var, const char *value, void *data)
> +{
> +	if (!strcmp(var, "filewatcher.minfiles")) {
> +		watch_lowerlimit = git_config_int(var, value);
> +		return 0;
> +	}
> +	if (!strcmp(var, "filewatcher.recentlimit")) {
> +		recent_limit = git_config_int(var, value);
> +		return 0;
> +	}
> +	return 0;
> +}
> +
>  static void validate_watcher(struct index_state *istate, const char *path)
>  {
>  	int i;
> @@ -1458,6 +1475,7 @@ static void validate_watcher(struct index_state *istate, const char *path)
>  		return;
>  	}
>  
> +	git_config(watcher_config, NULL);
>  	istate->watcher = connect_watcher(path);
>  	if (istate->watcher != -1) {
>  		struct strbuf sb = STRBUF_INIT;
> @@ -1478,6 +1496,7 @@ static void validate_watcher(struct index_state *istate, const char *path)
>  			istate->cache[i]->ce_flags &= ~CE_WATCHED;
>  			istate->cache_changed = 1;
>  		}
> +	istate->update_watches = 1;
>  }
>  
>  static int sort_by_date(const void *a_, const void *b_)
> @@ -1524,7 +1543,7 @@ static int do_watch_entries(struct index_state *istate,
>  	return 0;
>  }
>  
> -static inline int ce_watchable(struct cache_entry *ce)
> +static inline int ce_watchable(struct cache_entry *ce, time_t now)
>  {
>  	return ce_uptodate(ce) && /* write_index will catch late ce_uptodate bits */
>  		!(ce->ce_flags & CE_WATCHED) &&
> @@ -1534,7 +1553,8 @@ static inline int ce_watchable(struct cache_entry *ce)
>  		 * obviously. S_IFLNK could be problematic because
>  		 * inotify may follow symlinks without IN_DONT_FOLLOW
>  		 */
> -		S_ISREG(ce->ce_mode);
> +		S_ISREG(ce->ce_mode) &&
> +		(ce->ce_stat_data.sd_mtime.sec + recent_limit < now);
>  }
>  
>  static void watch_entries(struct index_state *istate)
> @@ -1544,15 +1564,20 @@ static void watch_entries(struct index_state *istate)
>  	struct strbuf sb = STRBUF_INIT;
>  	int val;
>  	socklen_t vallen = sizeof(val);
> +	time_t now = time(NULL);
>  
> -	if (istate->watcher <= 0)
> +	if (istate->watcher <= 0 || !istate->update_watches)
>  		return;
> +	istate->update_watches = 0;
> +	istate->cache_changed = 1;
>  	for (i = nr = 0; i < istate->cache_nr; i++)
> -		if (ce_watchable(istate->cache[i]))
> +		if (ce_watchable(istate->cache[i], now))
>  			nr++;
> +	if (nr < watch_lowerlimit)
> +		return;
>  	sorted = xmalloc(sizeof(*sorted) * nr);
>  	for (i = nr = 0; i < istate->cache_nr; i++)
> -		if (ce_watchable(istate->cache[i]))
> +		if (ce_watchable(istate->cache[i], now))
>  			sorted[nr++] = istate->cache[i];
>  
>  	getsockopt(istate->watcher, SOL_SOCKET, SO_SNDBUF, &val, &vallen);
> @@ -1616,6 +1641,7 @@ int read_index_from(struct index_state *istate, const char *path)
>  	istate->cache_alloc = alloc_nr(istate->cache_nr);
>  	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
>  	istate->initialized = 1;
> +	istate->update_watches = 1;
>  
>  	if (istate->version == 4)
>  		previous_name = &previous_name_buf;
> @@ -2024,8 +2050,9 @@ int write_index(struct index_state *istate, int newfd)
>  		if (err)
>  			return -1;
>  	}
> -	if (has_watches) {
> -		int id, sz = (entries - removed + 7) / 8;
> +	if (has_watches ||
> +	    (istate->watcher != -1 && !istate->update_watches)) {
> +		int id, sz = (entries - removed + 7) / 8 + 1;
>  		uint8_t *data = xmalloc(sz);
>  		memset(data, 0, sz);
>  		for (i = 0, id = 0; i < entries && has_watches; i++) {
> @@ -2038,6 +2065,7 @@ int write_index(struct index_state *istate, int newfd)
>  			}
>  			id++;
>  		}
> +		data[sz - 1] = istate->update_watches;
>  		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
>  			|| ce_write(&c, newfd, data, sz) < 0;
>  		free(data);

-- 
Thomas Rast
tr@thomasrast.ch

* Re: [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched
  2014-01-17  9:47   ` [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
  2014-01-17 11:19     ` Thomas Gummerer
@ 2014-01-19 17:06     ` Thomas Rast
  2014-01-20  1:38       ` Duy Nguyen
  1 sibling, 1 reply; 72+ messages in thread
From: Thomas Rast @ 2014-01-19 17:06 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> If an entry is "watched", git lets an external program decide if the
> entry is modified or not. It's more like --assume-unchanged, but
> designed to be controlled by machine.
>
> We are running out of on-disk ce_flags, so instead of extending
> on-disk entry format again, "watched" flags are in-core only and
> stored as extension instead.

I wonder if this would be a good use-case for EWAH bitmaps?  Presumably
most users would end up having only a few large ranges of files that are
being watched.  Quite possibly most users would watch *all* files.

-- 
Thomas Rast
tr@thomasrast.ch

* Re: [PATCH/WIP v2 00/14] inotify support
  2014-01-19 17:04   ` [PATCH/WIP v2 00/14] inotify support Thomas Rast
@ 2014-01-20  1:28     ` Duy Nguyen
  2014-01-20 21:51       ` Thomas Rast
  2014-01-28 10:46     ` Duy Nguyen
  1 sibling, 1 reply; 72+ messages in thread
From: Duy Nguyen @ 2014-01-20  1:28 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git Mailing List

On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast <tr@thomasrast.ch> wrote:
> I never got the last three patches, did you send them?
>
> Also, this doesn't cleanly apply anywhere at my end.  Can you push it
> somewhere for easier experimentation?

Sorry, I rebased it on top of kb/fast-hashmap but never got around to
actually using hash tables. The code is here (and on top of master):

https://github.com/pclouds/git.git file-watcher

> So I think the watcher protocol can be specified roughly as:
>
>   Definitions: The watcher is a separate process that maintains a set of
>   _watched paths_.  Git uses the commands below to add or remove paths
>   from this set.
>
>   In addition it maintains a set of _changed paths_, which is a subset
>   of the watched paths.  Git occasionally queries the changed paths.  If
>   at any time after a path P was added to the "watched" set, P has or
>   may have changed, it MUST be added to the "changed" set.
>
>   Note that a path being "unchanged" under this definition does NOT mean
>   that it is unchanged in the index as per `git diff`.  It may have been
>   changed before the watch was issued!  Therefore, Git MUST test whether
>   the path started out unchanged using lstat() AFTER it was added to the
>   "watched" set.

Correct.

>   Handshake::
>     On connecting, git sends "hello <magic>".  <magic> is an opaque
>     string that identifies the state of the index.  The watcher MUST
>     respond with "hello <previous_magic>", where previous_magic is
>     obtained from the "bye" command (below).  If <magic> !=
>     <previous_magic>, the watcher MUST reset state for this repository.

In addition, git must reset itself (i.e. clear all CE_WATCHED flags)
because nothing can be trusted at this point.
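
Roughly, the client side of that check could look like the sketch
below.  It assumes the reply has the literal form "hello
<previous_magic>"; the flag value and both function names are
hypothetical and only serve the illustration.

```c
#include <string.h>

#define SKETCH_CE_WATCHED 0x1u	/* hypothetical bit, for illustration only */

/* Return 1 if the watcher's reply carries the magic git expects. */
int watcher_reply_matches(const char *reply, const char *magic)
{
	const char *prefix = "hello ";
	size_t plen = strlen(prefix);

	if (strncmp(reply, prefix, plen))
		return 0;
	return !strcmp(reply + plen, magic);
}

/*
 * On a mismatch nothing the watcher knows can be trusted, so drop the
 * watched bit from every entry and fall back to lstat() for all.
 */
void check_handshake(const char *reply, const char *magic,
		     unsigned *flags, int nr)
{
	int i;

	if (watcher_reply_matches(reply, magic))
		return;
	for (i = 0; i < nr; i++)
		flags[i] &= ~SKETCH_CE_WATCHED;
}
```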

...
>
> Did I get that approximately straight?  Perhaps you can fix up as needed
> and then turn it into the documentation for the protocol.

Will do (and probably steal some of your text above).

> There are several points about this that I don't like:
>
> * What does the datagram socket buy us?  You already do a lot of
>   pkt-line work for the really big messages, so why not just use
>   pkt-line everywhere and a streaming socket?

With datagram sockets I did not have to maintain the connected
sockets, which made it somewhat simpler to handle so far.

The default SO_SNDBUF goes up to 212k, so we can send up to that
amount without blocking. With current pkt-line we send up to 64k in
"watch" message then we have to wait for "fine", which results in more
context switches. But I think we can extend pkt-line's length field to
5 hex digits to cover this.

Streaming sockets are probably the way to go for per-user daemon, so
we can identify a socket with a repo.
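
For reference, the two numbers involved can be checked with a few
lines of C.  This is a sketch, not code from the series: the function
names are made up, the send buffer size it reports depends on the
system, and max_pkt_record() only gives the largest value a length
field of that many hex digits can represent, not a limit git enforces.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Send buffer size of a fresh UNIX datagram socket, or -1 on error. */
int unix_dgram_sndbuf(void)
{
	int val = -1;
	socklen_t vallen = sizeof(val);
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &vallen) < 0)
		val = -1;
	close(fd);
	return val;
}

/* Largest length expressible with the given number of hex digits. */
long max_pkt_record(int hex_digits)
{
	return (1L << (4 * hex_digits)) - 1;
}
```

Four digits top out at 65535 and five at 1048575, so a five-digit
length field would comfortably cover a send buffer of the size
mentioned above.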

> * The handshake should have room for capabilities, so that we can extend
>   the protocol.

Yeah. One more point for streaming sockets. With datagram sockets it's
harder to define a "session" and thus hard to add capabilities.

> * The hello/hello handshake is confusing, and probably would allow you
>   to point two watchers at each other without them noticing.  Make it
>   hello/ready, or some other unambiguous choice.

OK

> * I took some liberty and tried to fully define the transitions between
>   the sets.  So even though I'm not sure this is currently handled, I
>   made it well-defined to issue "watch" for a path that is in the
>   changed set.

Yes that should avoid races. The path can be removed from "watched"
set later after git acknowledges it.

> * "bye" is confusing, because in practice git issues "forget"s after
>   "bye".  The best I can come up with is "setstate", I'm sure you have
>   better ideas.

Originally it was "forget", "forget", "forget" then "bye". But with
that order, if git crashes before sending "bye", we could lose info in
the "changed" set, so the order was changed, but I did not update the
command name.

> There's also the problem of ordering guarantees between the socket and
> inotify.  I haven't found any, so I would conservatively assume that the
> socket messages may in fact arrive before inotify, which is a race in
> the current code.  E.g., in the sequence 'touch foo; git status' the
> daemon may see
>
>   socket                    inotify
>   < hello...
>   < status
>   > new <empty list>
>                             touch foo
>
> I think a clever way to handle this would be to add a new command:
>
>   Wait::
>     This command serves synchronization.  Git creates a file of its
>     choice in $GIT_DIR/watch (say, `.git/watch/wait.<random>`).  Then it
>     sends "wait <path>".  The watcher MUST block until it has processed
>     all change notifications up to and including <path>.

So wait.<random> inotify event functions as a barrier. Nice.

> As far as inotify corner-cases go, the only one I'm aware of is
> directory renames.  I suspect we'll have to watch directories all the
> way up to the repository root to reliably detect when this happens.  Not
> sure how to best handle this.  Perhaps we should declare Git completely
> agnostic wrt such issues, and behind the scenes issue all watches up to
> the root even if we don't need them for anything other than directory
> renames.

Under normal circumstances we would watch all directories in the
worktree anyway. I'll need to write some tests for inotify.

> Ok, that's probably a confused sum of rambles.  Let me know if you can
> make any sense of it.

Thank you for your input. Now I'm back to the white board (or paper).
-- 
Duy

* Re: [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing
  2014-01-19 17:04     ` Thomas Rast
@ 2014-01-20  1:32       ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-20  1:32 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git Mailing List

On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast <tr@thomasrast.ch> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> This can be used to force watcher on when running the test
>> suite.
>>
>> git-file-watcher processes are not automatically cleaned up after each
>> test. So after running the test suite you'll be left with plenty
>> git-file-watcher processes that should all end after about a minute.
>
> Probably not a very good idea, especially in noninteractive use?  E.g.,
> a bisection through the test suite or parallel test runs on different
> commits may exhaust the available processes and/or memory.

I think we run out of inotify resources before hitting process/memory
limits. At least that's the case when running tests in parallel.

> Each test should make an effort to clean up all watchers before
> terminating.

For now it's hard to do this correctly. Maybe once we get the
multi-repo file watcher, we could launch one watcher per trash
directory and cleaning up would be easier.
-- 
Duy

* Re: [PATCH/WIP v2 05/14] read-cache: put some limits on file watching
  2014-01-19 17:06     ` Thomas Rast
@ 2014-01-20  1:36       ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-20  1:36 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git Mailing List

On Mon, Jan 20, 2014 at 12:06 AM, Thomas Rast <tr@thomasrast.ch> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> watch_entries() is a lot of computation and could trigger a lot more
>> lookups in file-watcher. Normally after the first set of watches are
>> in place, we do not need to update often. Moreover if the number of
>> entries is small, the overhead of file watcher may actually slow git
>> down.
>>
>> This patch only allows to update watches if the number of watchable
>> files is over a limit (and there are new files added if this is not
>> the first time). Measurements on Core i5-2520M and Linux 3.7.6, about
>> 920 lstat() take 1ms. Somewhere between 2^16 and 2^17 lstat calls that
>> it starts to take longer than 100ms. 2^16 is chosen at the minimum
>> limit to start using file watcher.
>>
>> Recently updated files are not considered watchable because they are
>> likely to be updated again soon, not worth the ping-pong game with
>> file watcher. The default limit 30min is just a random value.
>
> But then a fresh clone of a big repository would not get any benefit
> from the watcher?
>
> Not yet sure how to best handle this.

Gaahh, perhaps limit the number of unwatchable recent files to a
hundred or so in addition to time limit.
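
One possible reading of that idea, as a sketch: files still count as
"recent" by the same mtime test as in the patch, but at most max_recent
of them are left unwatched.  The function name and the max_recent
parameter are hypothetical, and the sketch only counts; it does not
decide which of the recent files stay unwatched.

```c
#include <time.h>

/*
 * Count how many of nr files would be watchable.  A file is "recent"
 * if mtime + recent_limit >= now (the inverse of the test in
 * ce_watchable()).  At most max_recent recent files are excluded.
 */
int count_watchable(const time_t *mtimes, int nr, time_t now,
		    int recent_limit, int max_recent)
{
	int i, recent = 0;

	for (i = 0; i < nr; i++)
		if (mtimes[i] + recent_limit >= now)
			recent++;
	if (recent > max_recent)
		recent = max_recent;
	return nr - recent;
}
```

With such a cap, a fresh clone where every file is recent would still
have all but max_recent of its files watched.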
-- 
Duy

* Re: [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched
  2014-01-19 17:06     ` Thomas Rast
@ 2014-01-20  1:38       ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-20  1:38 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git Mailing List

On Mon, Jan 20, 2014 at 12:06 AM, Thomas Rast <tr@thomasrast.ch> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> If an entry is "watched", git lets an external program decide if the
>> entry is modified or not. It's more like --assume-unchanged, but
>> designed to be controlled by machine.
>>
>> We are running out of on-disk ce_flags, so instead of extending
>> on-disk entry format again, "watched" flags are in-core only and
>> stored as extension instead.
>
> I wonder if this would be a good use-case for EWAH bitmaps?  Presumably
> most users would end up having only a few large ranges of files that are
> being watched.  Quite possibly most users would watch *all* files.

Oh yeah. I edited my commit message locally to say this a while ago:

    On webkit.git with
    182k entries, that's 364k more to be SHA-1'd on current index
    versions, compared to 22k in this format (and even less when
    jk/pack-bitmap ewah graduates and we can use ewah compression)
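
The sizes follow from one bit per entry versus two bytes per entry.
Taking "182k" as 182000 entries, that is 182000 * 2 = 364000 bytes
against (182000 + 7) / 8 = 22750 bytes.  A sketch of the plain
(uncompressed) bitmap, using the same bit order as
read_watch_extension() but with made-up function names:

```c
#include <stdint.h>
#include <string.h>

/* Bytes needed for a bitmap with one bit per index entry. */
size_t watch_bitmap_size(size_t nr_entries)
{
	return (nr_entries + 7) / 8;
}

/* Pack one flag per entry into data[], least significant bit first. */
void pack_watched(const unsigned char *watched, size_t nr, uint8_t *data)
{
	size_t i;

	memset(data, 0, watch_bitmap_size(nr));
	for (i = 0; i < nr; i++)
		if (watched[i])
			data[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Test the bit for entry i, mirroring the check done when reading. */
int is_watched(const uint8_t *data, size_t i)
{
	return (data[i / 8] >> (i % 8)) & 1;
}
```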
-- 
Duy

* Re: [PATCH/WIP v2 00/14] inotify support
  2014-01-20  1:28     ` Duy Nguyen
@ 2014-01-20 21:51       ` Thomas Rast
  0 siblings, 0 replies; 72+ messages in thread
From: Thomas Rast @ 2014-01-20 21:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

>> I think a clever way to handle this would be to add a new command:
>>
>>   Wait::
>>     This command serves synchronization.  Git creates a file of its
>>     choice in $GIT_DIR/watch (say, `.git/watch/wait.<random>`).  Then it
>>     sends "wait <path>".  The watcher MUST block until it has processed
>>     all change notifications up to and including <path>.
>
> So wait.<random> inotify event functions as a barrier. Nice.

I forgot to specify a return for "wait".  Not sure you need one, though
correctly handling the timeout (that you apply for all select()) may be
somewhat tricky without it.

>> Ok, that's probably a confused sum of rambles.  Let me know if you can
>> make any sense of it.
>
> Thank you for your input. Now I'm back to the white board (or paper).

Don't go too far ;-)

Thanks a lot for doing this!  It's good that you picked it up, and I
think your design strikes a good balance in the complexity of the
protocol and the daemon's state.

-- 
Thomas Rast
tr@thomasrast.ch

* Re: [PATCH/WIP v2 00/14] inotify support
  2014-01-19 17:04   ` [PATCH/WIP v2 00/14] inotify support Thomas Rast
  2014-01-20  1:28     ` Duy Nguyen
@ 2014-01-28 10:46     ` Duy Nguyen
  1 sibling, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-01-28 10:46 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git Mailing List

On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast <tr@thomasrast.ch> wrote:
> There's also the problem of ordering guarantees between the socket and
> inotify.  I haven't found any, so I would conservatively assume that the
> socket messages may in fact arrive before inotify, which is a race in
> the current code.  E.g., in the sequence 'touch foo; git status' the
> daemon may see
>
>   socket                    inotify
>   < hello...
>   < status
>   > new <empty list>
>                             touch foo
>
> I think a clever way to handle this would be to add a new command:
>
>   Wait::
>     This command serves synchronization.  Git creates a file of its
>     choice in $GIT_DIR/watch (say, `.git/watch/wait.<random>`).  Then it
>     sends "wait <path>".  The watcher MUST block until it has processed
>     all change notifications up to and including <path>.

Assuming that the time between foo being touched and the event being
put in the daemon's queue is reasonably small, would emptying the
event queue at "hello" be enough? To my innocent eyes (at the kernel),
it seems inotify handling happens immediately after an fs event, and
it's uninterruptible (or at least not interruptible by another user
space process, I don't think we need to care about true interrupts).
If that's true, by the time the "touch syscall" is finished, the event
is already sitting in the daemon's queue.

The problem with wait.<random> is we need to tell the daemon to expect
it. Otherwise, if the daemon processes the "wait.<random>" event before
"wait" is sent, it would try to wait for the (lost) "wait.<random>"
event forever. One extension is for git to touch wait.<random>
regularly. Another is to keep a queue of processed "wait.*" events.
Both look ugly imo. Another option is to send "expect wait.<random>"
first, wait for ack, touch wait.<random>, then send "wait", which is
too much.
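
For the "queue of processed wait.* events" variant, a sketch of what
the daemon would have to keep around is below.  All names and sizes
are hypothetical.  It also shows where the ugliness comes from: the
queue has to be bounded, so a barrier name can be evicted before the
matching "wait" arrives.

```c
#include <stdio.h>
#include <string.h>

#define SEEN_MAX 8	/* arbitrary ring size for this sketch */
#define NAME_LEN 64

struct seen_barriers {
	char names[SEEN_MAX][NAME_LEN];	/* must start out zeroed */
	int next;			/* slot to overwrite next */
};

/* Record a wait.* inotify event that arrived before its "wait". */
void barrier_event(struct seen_barriers *s, const char *name)
{
	snprintf(s->names[s->next], NAME_LEN, "%s", name);
	s->next = (s->next + 1) % SEEN_MAX;
}

/*
 * On "wait <name>": return 1 if the event was already processed, so
 * the daemon can answer at once instead of blocking forever.
 */
int barrier_already_seen(const struct seen_barriers *s, const char *name)
{
	int i;

	if (!*name)
		return 0;	/* empty slots hold "", never match them */
	for (i = 0; i < SEEN_MAX; i++)
		if (!strcmp(s->names[i], name))
			return 1;
	return 0;
}
```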
-- 
Duy

* [PATCH v3 00/26] inotify support
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
                     ` (11 preceding siblings ...)
  2014-01-19 17:04   ` [PATCH/WIP v2 00/14] inotify support Thomas Rast
@ 2014-02-03  4:28   ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 01/26] pkt-line.c: rename global variable buffer[] to something less generic Nguyễn Thái Ngọc Duy
                       ` (26 more replies)
  12 siblings, 27 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

I'm happy with this now. The only things left are applying ewah on the
watch index extension and maybe improving lookup performance a bit. The
former needs jk/pack-bitmap graduated. The latter is not urgent. Oh,
and maybe addressing the "BUGS" section (more like known limitations)
in git-file-watcher.txt.

For early adopters that fear a buggy file-watcher may cause update
loss, set GIT_TEST_WATCHED=1 (or 2). It'll do lstat() to verify
file-watcher results (so no perf. gain). Beware of race condition
that may lead to false positives, mentioned in 20/26 (maybe I should
do something about it).

The series can also be fetched from

https://github.com/pclouds/git.git file-watcher

Nguyễn Thái Ngọc Duy (26):
  pkt-line.c: rename global variable buffer[] to something less generic
  pkt-line.c: add packet_write_timeout()
  pkt-line.c: add packet_read_line_timeout()
  unix-socket: make unlink() optional in unix_stream_listen()
  Add git-file-watcher and basic connection handling logic
  file-watcher: check socket directory permission
  file-watcher: remove socket on exit
  file-watcher: add --detach
  read-cache: save trailing sha-1
  read-cache: new flag CE_WATCHED to mark what file is watched
  Clear CE_WATCHED when set CE_VALID alone
  read-cache: basic hand shaking to the file watcher
  read-cache: ask file watcher to watch files
  read-cache: put some limits on file watching
  read-cache: get changed file list from file watcher
  git-compat-util.h: add inotify stubs on non-Linux platforms
  file-watcher: inotify support, watching part
  file-watcher: inotify support, notification part
  Wrap CE_VALID test with ce_valid()
  read-cache: new variable to verify file-watcher results
  Support running file watcher with the test suite
  file-watcher: quit if $WATCHER/socket is gone
  file-watcher: tests for the daemon
  ls-files: print CE_WATCHED as W (or "w" with CE_VALID)
  file-watcher: tests for the client side
  Disable file-watcher with system inotify on some tests

 .gitignore                               |    2 +
 Documentation/config.txt                 |   19 +
 Documentation/git-file-watcher.txt (new) |   54 ++
 Documentation/git-ls-files.txt           |    1 +
 Documentation/technical/index-format.txt |    9 +
 Makefile                                 |    3 +
 builtin/grep.c                           |    2 +-
 builtin/ls-files.c                       |   14 +-
 builtin/update-index.c                   |   12 +-
 cache.h                                  |   17 +
 config.mak.uname                         |    1 +
 credential-cache--daemon.c               |    2 +-
 daemon.c                                 |   30 +-
 diff-lib.c                               |    4 +-
 diff.c                                   |    2 +-
 file-watcher-lib.c (new)                 |  321 +++++++++
 file-watcher-lib.h (new)                 |    8 +
 file-watcher.c (new)                     | 1149 ++++++++++++++++++++++++++++++
 git-compat-util.h                        |   43 ++
 pkt-line.c                               |   61 +-
 pkt-line.h                               |    2 +
 read-cache.c                             |  119 +++-
 setup.c                                  |   25 +
 t/t1011-read-tree-sparse-checkout.sh     |    2 +
 t/t2104-update-index-skip-worktree.sh    |    2 +
 t/t7011-skip-worktree-reading.sh         |    2 +
 t/t7012-skip-worktree-writing.sh         |    2 +
 t/t7513-file-watcher.sh (new +x)         |  382 ++++++++++
 t/t7514-file-watcher-lib.sh (new +x)     |  190 +++++
 test-file-watcher.c (new)                |  111 +++
 unix-socket.c                            |    5 +-
 unix-socket.h                            |    2 +-
 unpack-trees.c                           |    2 +-
 wrapper.c                                |   47 ++
 34 files changed, 2591 insertions(+), 56 deletions(-)
 create mode 100644 Documentation/git-file-watcher.txt
 create mode 100644 file-watcher-lib.c
 create mode 100644 file-watcher-lib.h
 create mode 100644 file-watcher.c
 create mode 100755 t/t7513-file-watcher.sh
 create mode 100755 t/t7514-file-watcher-lib.sh
 create mode 100644 test-file-watcher.c

-- 
1.8.5.2.240.g8478abd


* [PATCH v3 01/26] pkt-line.c: rename global variable buffer[] to something less generic
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 02/26] pkt-line.c: add packet_write_timeout() Nguyễn Thái Ngọc Duy
                       ` (25 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

"buffer" is a local variable in some other functions. Rename the
global one to make it less confusing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pkt-line.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index bc63b3b..eac45ad 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -64,14 +64,15 @@ void packet_buf_flush(struct strbuf *buf)
 }
 
 #define hex(a) (hexchar[(a) & 15])
-static char buffer[1000];
+static char write_buffer[1000];
 static unsigned format_packet(const char *fmt, va_list args)
 {
 	static char hexchar[] = "0123456789abcdef";
+	char *buffer = write_buffer;
 	unsigned n;
 
-	n = vsnprintf(buffer + 4, sizeof(buffer) - 4, fmt, args);
-	if (n >= sizeof(buffer)-4)
+	n = vsnprintf(buffer + 4, sizeof(write_buffer) - 4, fmt, args);
+	if (n >= sizeof(write_buffer)-4)
 		die("protocol error: impossibly long line");
 	n += 4;
 	buffer[0] = hex(n >> 12);
@@ -90,7 +91,7 @@ void packet_write(int fd, const char *fmt, ...)
 	va_start(args, fmt);
 	n = format_packet(fmt, args);
 	va_end(args);
-	write_or_die(fd, buffer, n);
+	write_or_die(fd, write_buffer, n);
 }
 
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
@@ -101,7 +102,7 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 	va_start(args, fmt);
 	n = format_packet(fmt, args);
 	va_end(args);
-	strbuf_add(buf, buffer, n);
+	strbuf_add(buf, write_buffer, n);
 }
 
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 02/26] pkt-line.c: add packet_write_timeout()
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 01/26] pkt-line.c: rename global variable buffer[] to something less generic Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 03/26] pkt-line.c: add packet_read_line_timeout() Nguyễn Thái Ngọc Duy
                       ` (24 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h    |  1 +
 pkt-line.c | 15 +++++++++++++++
 pkt-line.h |  1 +
 wrapper.c  | 26 ++++++++++++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/cache.h b/cache.h
index dc040fb..718e32b 100644
--- a/cache.h
+++ b/cache.h
@@ -1231,6 +1231,7 @@ extern void fsync_or_die(int fd, const char *);
 
 extern ssize_t read_in_full(int fd, void *buf, size_t count);
 extern ssize_t write_in_full(int fd, const void *buf, size_t count);
+extern ssize_t write_in_full_timeout(int fd, const void *buf, size_t count, int timeout);
 static inline ssize_t write_str_in_full(int fd, const char *str)
 {
 	return write_in_full(fd, str, strlen(str));
diff --git a/pkt-line.c b/pkt-line.c
index eac45ad..cf681e9 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -94,6 +94,21 @@ void packet_write(int fd, const char *fmt, ...)
 	write_or_die(fd, write_buffer, n);
 }
 
+int packet_write_timeout(int fd, int timeout, const char *fmt, ...)
+{
+	static struct strbuf sb = STRBUF_INIT;
+	va_list args;
+	unsigned n;
+
+	if (fd == -1)
+		return -1;
+	va_start(args, fmt);
+	strbuf_reset(&sb);
+	n = format_packet(fmt, args);
+	va_end(args);
+	return write_in_full_timeout(fd, write_buffer, n, timeout);
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
 	va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 0a838d1..4b93a0c 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -21,6 +21,7 @@
  */
 void packet_flush(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_timeout(int fd, int timeout, const char *fmt, ...) __attribute__((format (printf, 3, 4)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
diff --git a/wrapper.c b/wrapper.c
index 0cc5636..9a0e289 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -214,6 +214,32 @@ ssize_t write_in_full(int fd, const void *buf, size_t count)
 	return total;
 }
 
+ssize_t write_in_full_timeout(int fd, const void *buf,
+			      size_t count, int timeout)
+{
+	struct pollfd pfd;
+	const char *p = buf;
+	ssize_t total = 0;
+
+	pfd.fd = fd;
+	pfd.events = POLLOUT;
+	while (count > 0 && poll(&pfd, 1, timeout) > 0 &&
+	       (pfd.revents & POLLOUT)) {
+		ssize_t written = xwrite(fd, p, count);
+		if (written < 0)
+			return -1;
+		if (!written) {
+			errno = ENOSPC;
+			return -1;
+		}
+		count -= written;
+		p += written;
+		total += written;
+	}
+
+	return count ? -1 : total;
+}
+
 int xdup(int fd)
 {
 	int ret = dup(fd);
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 03/26] pkt-line.c: add packet_read_line_timeout()
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 01/26] pkt-line.c: rename global variable buffer[] to something less generic Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 02/26] pkt-line.c: add packet_write_timeout() Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 04/26] unix-socket: make unlink() optional in unix_stream_listen() Nguyễn Thái Ngọc Duy
                       ` (23 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This version is also gentler than its friend packet_read_line()
because it's designed for side channel I/O that should not abort the
program even if the channel is broken.
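
For readers who want the idea without the diff, here is a standalone sketch of a "gentle" timed read. It is not the code in this patch: the name read_full_timeout() is made up, and it uses plain poll()/read() rather than git's xread(). The point is that every failure (timeout, EOF, error) comes back as -1 so the caller can fall back to doing the work itself instead of dying:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Read exactly "count" bytes, waiting at most "timeout_ms" for each
 * chunk. Returns count on success and -1 on timeout, EOF or error.
 * It never exits the program; the caller decides what to do.
 */
static ssize_t read_full_timeout(int fd, void *buf, size_t count,
				 int timeout_ms)
{
	char *p = buf;
	size_t left = count;
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN;
	while (left > 0) {
		ssize_t n;
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready < 0 && errno == EINTR)
			continue;	/* interrupted, poll again */
		if (ready <= 0 || !(pfd.revents & POLLIN))
			return -1;	/* timeout, hangup or poll failure */
		n = read(fd, p, left);
		if (n < 0 && (errno == EINTR || errno == EAGAIN))
			continue;
		if (n <= 0)
			return -1;	/* EOF or read error */
		p += n;
		left -= n;
	}
	return (ssize_t)count;
}
```

A caller would typically read the 4-byte pkt-line length header first and then the payload, treating -1 from either read as "channel unusable".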

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h    |  1 +
 pkt-line.c | 35 +++++++++++++++++++++++++++++++++++
 pkt-line.h |  1 +
 wrapper.c  | 21 +++++++++++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/cache.h b/cache.h
index 718e32b..939db46 100644
--- a/cache.h
+++ b/cache.h
@@ -1230,6 +1230,7 @@ extern int write_or_whine_pipe(int fd, const void *buf, size_t count, const char
 extern void fsync_or_die(int fd, const char *);
 
 extern ssize_t read_in_full(int fd, void *buf, size_t count);
+extern ssize_t read_in_full_timeout(int fd, void *buf, size_t count, int timeout);
 extern ssize_t write_in_full(int fd, const void *buf, size_t count);
 extern ssize_t write_in_full_timeout(int fd, const void *buf, size_t count, int timeout);
 static inline ssize_t write_str_in_full(int fd, const char *str)
diff --git a/pkt-line.c b/pkt-line.c
index cf681e9..5a07e97 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -229,3 +229,38 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
 	return packet_read_line_generic(-1, src, src_len, dst_len);
 }
+
+char *packet_read_line_timeout(int fd, int timeout, int *len_p)
+{
+	char *buf = packet_buffer;
+	int ret, len, buf_len = sizeof(packet_buffer);
+	char linelen[4];
+
+	if (fd == -1)
+		return NULL;
+	if ((ret = read_in_full_timeout(fd, linelen, 4, timeout)) < 0)
+		return NULL;
+	len = packet_length(linelen);
+	if (len < 0) {
+		error("protocol error: bad line length character: %.4s", linelen);
+		return NULL;
+	}
+	if (!len) {
+		packet_trace("0000", 4, 0);
+		if (len_p)
+			*len_p = 0;
+		return "";
+	}
+	len -= 4;
+	if (len >= buf_len) {
+		error("protocol error: bad line length %d", len);
+		return NULL;
+	}
+	if ((ret = read_in_full_timeout(fd, buf, len, timeout)) < 0)
+		return NULL;
+	buf[len] = '\0';
+	if (len_p)
+		*len_p = len;
+	packet_trace(buf, len, 0);
+	return buf;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 4b93a0c..d47dca5 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -69,6 +69,7 @@ int packet_read(int fd, char **src_buffer, size_t *src_len, char
  * packet is written to it.
  */
 char *packet_read_line(int fd, int *size);
+char *packet_read_line_timeout(int fd, int timeout, int *size);
 
 /*
  * Same as packet_read_line, but read from a buf rather than a descriptor;
diff --git a/wrapper.c b/wrapper.c
index 9a0e289..9cf10b2 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -193,6 +193,27 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
 	return total;
 }
 
+ssize_t read_in_full_timeout(int fd, void *buf, size_t count, int timeout)
+{
+	char *p = buf;
+	ssize_t total = 0;
+	struct pollfd pfd;
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	while (count > 0 && poll(&pfd, 1, timeout) > 0 &&
+	       (pfd.revents & POLLIN)) {
+		ssize_t loaded = xread(fd, p, count);
+		if (loaded <= 0)
+			return -1;
+		count -= loaded;
+		p += loaded;
+		total += loaded;
+	}
+
+	return count ? -1 : total;
+}
+
 ssize_t write_in_full(int fd, const void *buf, size_t count)
 {
 	const char *p = buf;
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 04/26] unix-socket: make unlink() optional in unix_stream_listen()
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (2 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 03/26] pkt-line.c: add packet_read_line_timeout() Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 05/26] Add git-file-watcher and basic connection handling logic Nguyễn Thái Ngọc Duy
                       ` (22 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 credential-cache--daemon.c | 2 +-
 unix-socket.c              | 5 +++--
 unix-socket.h              | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/credential-cache--daemon.c b/credential-cache--daemon.c
index 390f194..1b995a9 100644
--- a/credential-cache--daemon.c
+++ b/credential-cache--daemon.c
@@ -207,7 +207,7 @@ static void serve_cache(const char *socket_path)
 {
 	int fd;
 
-	fd = unix_stream_listen(socket_path);
+	fd = unix_stream_listen(socket_path, 1);
 	if (fd < 0)
 		die_errno("unable to bind to '%s'", socket_path);
 
diff --git a/unix-socket.c b/unix-socket.c
index 01f119f..2be1af6 100644
--- a/unix-socket.c
+++ b/unix-socket.c
@@ -93,7 +93,7 @@ fail:
 	return -1;
 }
 
-int unix_stream_listen(const char *path)
+int unix_stream_listen(const char *path, int replace)
 {
 	int fd, saved_errno;
 	struct sockaddr_un sa;
@@ -103,7 +103,8 @@ int unix_stream_listen(const char *path)
 		return -1;
 	fd = unix_stream_socket();
 
-	unlink(path);
+	if (replace)
+		unlink(path);
 	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
 		goto fail;
 
diff --git a/unix-socket.h b/unix-socket.h
index e271aee..18ee290 100644
--- a/unix-socket.h
+++ b/unix-socket.h
@@ -2,6 +2,6 @@
 #define UNIX_SOCKET_H
 
 int unix_stream_connect(const char *path);
-int unix_stream_listen(const char *path);
+int unix_stream_listen(const char *path, int replace);
 
 #endif /* UNIX_SOCKET_H */
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 05/26] Add git-file-watcher and basic connection handling logic
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (3 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 04/26] unix-socket: make unlink() optional in unix_stream_listen() Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 06/26] file-watcher: check socket directory permission Nguyễn Thái Ngọc Duy
                       ` (21 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

git-file-watcher is a daemon (*) that watches file changes and tells
git about them. The intent is to avoid lstat() calls at index refresh
time, which can be numerous in a big working directory.

The actual monitoring needs support from the OS (inotify, FSEvents,
FindFirstChangeNotification or kqueue) and is not part of this patch
or the next few. This patch only provides a UNIX socket server.

(*) it will be a daemon in the end, but in this patch it runs in
the foreground.
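
As a rough illustration of the listening side, the sketch below opens a UNIX stream socket with an optional unlink, in the spirit of the "replace" flag added in 04/26. The name listen_unix() and the backlog of 5 are assumptions for this example, not git's unix-socket.c. With replace set to 0, an existing socket file makes bind() fail with EADDRINUSE instead of being silently removed:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Listen on a UNIX stream socket at "path". When "replace" is zero an
 * existing socket file is left alone, so bind() fails with EADDRINUSE.
 * Returns the listening fd, or -1 with errno set.
 */
static int listen_unix(const char *path, int replace)
{
	struct sockaddr_un sa;
	int fd, saved_errno;

	if (strlen(path) >= sizeof(sa.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	strcpy(sa.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (replace)
		unlink(path);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    listen(fd, 5) < 0) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return -1;
	}
	return fd;
}
```

The returned descriptor would go into slot 0 of a pollfd array, with accepted clients appended after it.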

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore                               |   1 +
 Documentation/git-file-watcher.txt (new) |  29 +++++++
 Makefile                                 |   1 +
 file-watcher.c (new)                     | 140 +++++++++++++++++++++++++++++++
 4 files changed, 171 insertions(+)
 create mode 100644 Documentation/git-file-watcher.txt
 create mode 100644 file-watcher.c

diff --git a/.gitignore b/.gitignore
index b5f9def..12c78f0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -56,6 +56,7 @@
 /git-fast-import
 /git-fetch
 /git-fetch-pack
+/git-file-watcher
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
new file mode 100644
index 0000000..625a389
--- /dev/null
+++ b/Documentation/git-file-watcher.txt
@@ -0,0 +1,29 @@
+git-file-watcher(1)
+===================
+
+NAME
+----
+git-file-watcher - File update notification daemon
+
+SYNOPSIS
+--------
+[verse]
+'git file-watcher' [options] <socket directory>
+
+DESCRIPTION
+-----------
+This program watches file changes in a git working directory and lets
+Git know which files have been changed so that Git does not have to call
+lstat(2) to detect that itself.
+
+OPTIONS
+-------
+
+SEE ALSO
+--------
+linkgit:git-update-index[1],
+linkgit:git-config[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index dddaf4f..8eef0d6 100644
--- a/Makefile
+++ b/Makefile
@@ -536,6 +536,7 @@ PROGRAMS += $(EXTRA_PROGRAMS)
 PROGRAM_OBJS += credential-store.o
 PROGRAM_OBJS += daemon.o
 PROGRAM_OBJS += fast-import.o
+PROGRAM_OBJS += file-watcher.o
 PROGRAM_OBJS += http-backend.o
 PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
diff --git a/file-watcher.c b/file-watcher.c
new file mode 100644
index 0000000..a6d7f17
--- /dev/null
+++ b/file-watcher.c
@@ -0,0 +1,140 @@
+#include "cache.h"
+#include "sigchain.h"
+#include "parse-options.h"
+#include "exec_cmd.h"
+#include "unix-socket.h"
+
+static const char *const file_watcher_usage[] = {
+	N_("git file-watcher [options] <socket directory>"),
+	NULL
+};
+
+struct connection {
+	int sock;
+};
+
+static struct connection **conns;
+static struct pollfd *pfd;
+static int conns_alloc, pfd_nr, pfd_alloc;
+
+static int shutdown_connection(int id)
+{
+	struct connection *conn = conns[id];
+	conns[id] = NULL;
+	pfd[id].fd = -1;
+	if (!conn)
+		return 0;
+	close(conn->sock);
+	free(conn);
+	return 0;
+}
+
+static int handle_command(int conn_id)
+{
+	return 0;
+}
+
+static void accept_connection(int fd)
+{
+	struct connection *conn;
+	int client = accept(fd, NULL, NULL);
+	if (client < 0) {
+		warning(_("accept failed: %s"), strerror(errno));
+		return;
+	}
+
+	ALLOC_GROW(pfd, pfd_nr + 1, pfd_alloc);
+	pfd[pfd_nr].fd = client;
+	pfd[pfd_nr].events = POLLIN;
+	pfd[pfd_nr].revents = 0;
+
+	ALLOC_GROW(conns, pfd_nr + 1, conns_alloc);
+	conn = xmalloc(sizeof(*conn));
+	memset(conn, 0, sizeof(*conn));
+	conn->sock = client;
+	conns[pfd_nr] = conn;
+	pfd_nr++;
+}
+
+int main(int argc, const char **argv)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int i, new_nr, fd, quit = 0, nr_common;
+	const char *socket_path = NULL;
+	struct option options[] = {
+		OPT_END()
+	};
+
+	git_extract_argv0_path(argv[0]);
+	git_setup_gettext();
+	argc = parse_options(argc, argv, NULL, options,
+			     file_watcher_usage, 0);
+	if (argc < 1)
+		die(_("socket path missing"));
+	else if (argc > 1)
+		die(_("too many arguments"));
+
+	socket_path = argv[0];
+	strbuf_addf(&sb, "%s/socket", socket_path);
+	fd = unix_stream_listen(sb.buf, 0);
+	if (fd == -1)
+		die_errno(_("unable to listen at %s"), sb.buf);
+	strbuf_reset(&sb);
+
+	nr_common = 1;
+	pfd_alloc = pfd_nr = nr_common;
+	pfd = xmalloc(sizeof(*pfd) * pfd_alloc);
+	pfd[0].fd = fd;
+	pfd[0].events = POLLIN;
+
+	while (!quit) {
+		if (poll(pfd, pfd_nr, -1) < 0) {
+			if (errno != EINTR) {
+				error("Poll failed, resuming: %s",
+				      strerror(errno));
+				sleep(1);
+			}
+			continue;
+		}
+
+		for (new_nr = i = nr_common; i < pfd_nr; i++) {
+			if (pfd[i].revents & (POLLERR | POLLNVAL))
+				shutdown_connection(i);
+			else if ((pfd[i].revents & POLLIN) && conns[i]) {
+				unsigned int avail = 1;
+				/*
+				 * pkt-line is not gentle with eof, at
+				 * least not with
+				 * packet_read_line(). Avoid feeding
+				 * eof to it.
+				 */
+				if ((pfd[i].revents & POLLHUP) &&
+				    ioctl(pfd[i].fd, FIONREAD, &avail))
+					die_errno("unable to FIONREAD inotify handle");
+				/*
+				 * We better have a way to handle all
+				 * packets in one go...
+				 */
+				if (avail)
+					handle_command(i);
+				else
+					shutdown_connection(i);
+			} else if (pfd[i].revents & POLLHUP)
+				shutdown_connection(i);
+			if (!conns[i])
+				continue;
+			if (i != new_nr) { /* pfd[] is shrunk, move pfd[i] up */
+				conns[new_nr] = conns[i];
+				pfd[new_nr] = pfd[i];
+			}
+			new_nr++; /* keep the good socket */
+		}
+		pfd_nr = new_nr;
+
+		if (pfd[0].revents & POLLIN)
+			accept_connection(pfd[0].fd);
+		if (pfd[0].revents & (POLLHUP | POLLERR | POLLNVAL))
+			die(_("error on listening socket"));
+	}
+	return 0;
+}
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 06/26] file-watcher: check socket directory permission
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (4 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 05/26] Add git-file-watcher and basic connection handling logic Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 07/26] file-watcher: remove socket on exit Nguyễn Thái Ngọc Duy
                       ` (20 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Access to the unix socket $WATCHER/socket is covered by $WATCHER's
permission. While the file watcher does not carry much information,
repo file listing is sometimes not something you want other users to
see. Make sure $WATCHER has 0700 permission to stop unwanted access.
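
The check itself boils down to one mask test on the directory mode. A minimal sketch, with the hypothetical helper name dir_is_private() (the patch does this inline and dies on failure):

```c
#include <sys/stat.h>

/*
 * Returns 1 if "dir" grants no permission bits to group or others,
 * 0 if it does, and -1 if it cannot be stat()ed.
 */
static int dir_is_private(const char *dir)
{
	struct stat st;

	if (stat(dir, &st) < 0)
		return -1;
	/* 077 covers rwx for group and rwx for others */
	return (st.st_mode & 077) ? 0 : 1;
}
```

A directory created with mode 0700 passes; 0750 or 0755 does not.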

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/file-watcher.c b/file-watcher.c
index a6d7f17..91f4cfe 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -56,6 +56,37 @@ static void accept_connection(int fd)
 	pfd_nr++;
 }
 
+static const char permissions_advice[] =
+N_("The permissions on your socket directory are too loose; other\n"
+   "processes may be able to read your file listing. Consider running:\n"
+   "\n"
+   "	chmod 0700 %s");
+static void check_socket_directory(const char *path)
+{
+	struct stat st;
+	char *path_copy = xstrdup(path);
+	char *dir = dirname(path_copy);
+
+	if (!stat(dir, &st)) {
+		if (st.st_mode & 077)
+			die(_(permissions_advice), dir);
+		free(path_copy);
+		return;
+	}
+
+	/*
+	 * We must be sure to create the directory with the correct mode,
+	 * not just chmod it after the fact; otherwise, there is a race
+	 * condition in which somebody can chdir to it, sleep, then try to open
+	 * our protected socket.
+	 */
+	if (safe_create_leading_directories_const(dir) < 0)
+		die_errno(_("unable to create directories for '%s'"), dir);
+	if (mkdir(dir, 0700) < 0)
+		die_errno(_("unable to mkdir '%s'"), dir);
+	free(path_copy);
+}
+
 int main(int argc, const char **argv)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -76,6 +107,7 @@ int main(int argc, const char **argv)
 
 	socket_path = argv[0];
 	strbuf_addf(&sb, "%s/socket", socket_path);
+	check_socket_directory(sb.buf);
 	fd = unix_stream_listen(sb.buf, 0);
 	if (fd == -1)
 		die_errno(_("unable to listen at %s"), sb.buf);
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 07/26] file-watcher: remove socket on exit
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (5 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 06/26] file-watcher: check socket directory permission Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 08/26] file-watcher: add --detach Nguyễn Thái Ngọc Duy
                       ` (19 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/file-watcher.c b/file-watcher.c
index 91f4cfe..9c639ef 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -56,6 +56,26 @@ static void accept_connection(int fd)
 	pfd_nr++;
 }
 
+static const char *socket_path;
+static int do_not_clean_up;
+
+static void cleanup(void)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (do_not_clean_up)
+		return;
+	strbuf_addf(&sb, "%s/socket", socket_path);
+	unlink(sb.buf);
+	strbuf_release(&sb);
+}
+
+static void cleanup_on_signal(int signo)
+{
+	cleanup();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
 static const char permissions_advice[] =
 N_("The permissions on your socket directory are too loose; other\n"
    "processes may be able to read your file listing. Consider running:\n"
@@ -91,7 +111,6 @@ int main(int argc, const char **argv)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int i, new_nr, fd, quit = 0, nr_common;
-	const char *socket_path = NULL;
 	struct option options[] = {
 		OPT_END()
 	};
@@ -113,6 +132,9 @@ int main(int argc, const char **argv)
 		die_errno(_("unable to listen at %s"), sb.buf);
 	strbuf_reset(&sb);
 
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
 	nr_common = 1;
 	pfd_alloc = pfd_nr = nr_common;
 	pfd = xmalloc(sizeof(*pfd) * pfd_alloc);
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 08/26] file-watcher: add --detach
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (6 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 07/26] file-watcher: remove socket on exit Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 09/26] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
                       ` (18 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

In daemon mode, stdout and stderr are saved in $WATCHER/log.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-file-watcher.txt |  2 ++
 cache.h                            |  1 +
 daemon.c                           | 30 ++++--------------------------
 file-watcher.c                     | 17 +++++++++++++++++
 setup.c                            | 25 +++++++++++++++++++++++++
 5 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
index 625a389..ec81f18 100644
--- a/Documentation/git-file-watcher.txt
+++ b/Documentation/git-file-watcher.txt
@@ -18,6 +18,8 @@ lstat(2) to detect that itself.
 
 OPTIONS
 -------
+--detach::
+	Run in background.
 
 SEE ALSO
 --------
diff --git a/cache.h b/cache.h
index 939db46..7a836b1 100644
--- a/cache.h
+++ b/cache.h
@@ -434,6 +434,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int);
 extern int init_db(const char *template_dir, unsigned int flags);
 
 extern void sanitize_stdfds(void);
+extern int daemonize(int *);
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
diff --git a/daemon.c b/daemon.c
index 503e039..2650504 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1056,11 +1056,6 @@ static void drop_privileges(struct credentials *cred)
 	/* nothing */
 }
 
-static void daemonize(void)
-{
-	die("--detach not supported on this platform");
-}
-
 static struct credentials *prepare_credentials(const char *user_name,
     const char *group_name)
 {
@@ -1102,24 +1097,6 @@ static struct credentials *prepare_credentials(const char *user_name,
 
 	return &c;
 }
-
-static void daemonize(void)
-{
-	switch (fork()) {
-		case 0:
-			break;
-		case -1:
-			die_errno("fork failed");
-		default:
-			exit(0);
-	}
-	if (setsid() == -1)
-		die_errno("setsid failed");
-	close(0);
-	close(1);
-	close(2);
-	sanitize_stdfds();
-}
 #endif
 
 static void store_pid(const char *path)
@@ -1333,9 +1310,10 @@ int main(int argc, char **argv)
 	if (inetd_mode || serve_mode)
 		return execute();
 
-	if (detach)
-		daemonize();
-	else
+	if (detach) {
+		if (daemonize(NULL))
+			die("--detach not supported on this platform");
+	} else
 		sanitize_stdfds();
 
 	if (pid_file)
diff --git a/file-watcher.c b/file-watcher.c
index 9c639ef..1e1ccad 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -111,7 +111,10 @@ int main(int argc, const char **argv)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int i, new_nr, fd, quit = 0, nr_common;
+	int daemon = 0;
 	struct option options[] = {
+		OPT_BOOL(0, "detach", &daemon,
+			 N_("run in background")),
 		OPT_END()
 	};
 
@@ -135,6 +138,20 @@ int main(int argc, const char **argv)
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
+	if (daemon) {
+		int err;
+		strbuf_addf(&sb, "%s/log", socket_path);
+		err = open(sb.buf, O_CREAT | O_TRUNC | O_WRONLY, 0600);
+		adjust_shared_perm(sb.buf);
+		if (err == -1)
+			die_errno(_("unable to create %s"), sb.buf);
+		if (daemonize(&do_not_clean_up))
+			die(_("--detach not supported on this platform"));
+		dup2(err, 1);
+		dup2(err, 2);
+		close(err);
+	}
+
 	nr_common = 1;
 	pfd_alloc = pfd_nr = nr_common;
 	pfd = xmalloc(sizeof(*pfd) * pfd_alloc);
diff --git a/setup.c b/setup.c
index 6c3f85f..757c45f 100644
--- a/setup.c
+++ b/setup.c
@@ -787,3 +787,28 @@ void sanitize_stdfds(void)
 	if (fd > 2)
 		close(fd);
 }
+
+int daemonize(int *flag)
+{
+#ifndef NO_POSIX_GOODIES
+	switch (fork()) {
+		case 0:
+			break;
+		case -1:
+			die_errno("fork failed");
+		default:
+			if (flag)
+				*flag = 1;
+			exit(0);
+	}
+	if (setsid() == -1)
+		die_errno("setsid failed");
+	close(0);
+	close(1);
+	close(2);
+	sanitize_stdfds();
+	return 0;
+#else
+	return -1;
+#endif
+}
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 09/26] read-cache: save trailing sha-1
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (7 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 08/26] file-watcher: add --detach Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 10/26] read-cache: new flag CE_WATCHED to mark what file is watched Nguyễn Thái Ngọc Duy
                       ` (17 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This will be used as a signature to know if the index has changed.
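
To illustrate the idea outside of read-cache.c: the index file ends with a 20-byte SHA-1 of everything before it, so two reads of the file can be compared by that trailer alone. The sketch below is standalone; read_trailing_sha1() and same_signature() are made-up names, and the patch itself keeps the value in istate->sha1 rather than re-reading the file:

```c
#include <stdio.h>
#include <string.h>

#define SHA1_RAWSZ 20

/*
 * Copy the last 20 bytes of the file at "path" into "sha1".
 * Returns 0 on success, -1 if the file is unreadable or too short.
 */
static int read_trailing_sha1(const char *path, unsigned char *sha1)
{
	FILE *fp = fopen(path, "rb");
	int ret = -1;

	if (!fp)
		return -1;
	if (!fseek(fp, -SHA1_RAWSZ, SEEK_END) &&
	    fread(sha1, 1, SHA1_RAWSZ, fp) == SHA1_RAWSZ)
		ret = 0;
	fclose(fp);
	return ret;
}

/* Two index snapshots are treated as identical when the trailers match. */
static int same_signature(const unsigned char *a, const unsigned char *b)
{
	return !memcmp(a, b, SHA1_RAWSZ);
}
```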

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      | 1 +
 read-cache.c | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 7a836b1..f14d535 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
+	unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 33dd676..3b6daf1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1269,10 +1269,11 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(struct cache_header *hdr,
+		      unsigned long size,
+		      unsigned char *sha1)
 {
 	git_SHA_CTX c;
-	unsigned char sha1[20];
 	int hdr_version;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
@@ -1461,7 +1462,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	close(fd);
 
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	if (verify_hdr(hdr, mmap_size, istate->sha1) < 0)
 		goto unmap;
 
 	istate->version = ntohl(hdr->hdr_version);
-- 
1.8.5.2.240.g8478abd

* [PATCH v3 10/26] read-cache: new flag CE_WATCHED to mark what file is watched
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (8 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 09/26] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:28     ` [PATCH v3 11/26] Clear CE_WATCHED when set CE_VALID alone Nguyễn Thái Ngọc Duy
                       ` (16 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This bit is basically "dynamic CE_VALID". It marks entries that are
being watched by the upcoming file watcher. When an index is loaded,
the file watcher is contacted and the list of updated paths is retrieved.

These paths will have CE_WATCHED cleared and lstat() will be called on
them. Those that have CE_WATCHED and are not in the list will have
CE_VALID turned on to skip lstat(). The setting is temporary: CE_VALID
is not saved to disk if CE_WATCHED is also set.

We keep CE_WATCHED in a new extension, separate from the entries, to
save some space: extended ce_flags add 2 bytes per entry and this flag
would be present in the majority of entries. When stored as a bitmap,
this extension could compress very well with the ewah algorithm.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
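Not part of the commit message: a minimal standalone sketch of the
bitmap layout described above, assuming the same LSB-first bit order
that read_watch_extension() uses. The helper names
(watch_bitmap_size, pack_watched, is_watched) are hypothetical and do
not exist in this series; they only illustrate the encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to hold one bit per index entry. */
static size_t watch_bitmap_size(size_t nr_entries)
{
	return (nr_entries + 7) / 8;
}

/*
 * Pack one flag per entry into a bitmap: entry i lands in bit (i % 8)
 * of byte (i / 8). "data" must hold watch_bitmap_size(nr) bytes.
 */
static void pack_watched(const unsigned char *watched, size_t nr, uint8_t *data)
{
	size_t i;

	assert(watched && data);
	memset(data, 0, watch_bitmap_size(nr));
	for (i = 0; i < nr; i++)
		if (watched[i])
			data[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Returns 1 if entry i is marked watched in the bitmap, 0 otherwise. */
static int is_watched(const uint8_t *data, size_t i)
{
	return (data[i / 8] >> (i % 8)) & 1;
}
```

With ten entries of which entries 0, 8 and 9 are watched, the
extension body would be the two bytes 0x01 0x03.
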
 Documentation/technical/index-format.txt |  6 +++++
 cache.h                                  |  3 +++
 read-cache.c                             | 41 +++++++++++++++++++++++++++++++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index f352a9b..24fd0ae 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -198,3 +198,9 @@ Git index format
   - At most three 160-bit object names of the entry in stages from 1 to 3
     (nothing is written for a missing stage).
 
+=== File watcher
+
+  The signature of this extension is { 'W', 'A', 'T', 'C' }.
+
+  - A bitmap of all entries in the index; the n-th bit of the m-th byte
+    corresponds to CE_WATCHED of the <m * 8 + n>-th index entry.
diff --git a/cache.h b/cache.h
index f14d535..a0af2a5 100644
--- a/cache.h
+++ b/cache.h
@@ -169,6 +169,9 @@ struct cache_entry {
 /* used to temporarily mark paths matched by pathspecs */
 #define CE_MATCHED           (1 << 26)
 
+/* set CE_VALID at runtime if the entry is guaranteed not updated */
+#define CE_WATCHED           (1 << 27)
+
 /*
  * Extended on-disk flags
  */
diff --git a/read-cache.c b/read-cache.c
index 3b6daf1..098d3b6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -33,6 +33,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
 
 struct index_state the_index;
 
@@ -1289,6 +1290,19 @@ static int verify_hdr(struct cache_header *hdr,
 	return 0;
 }
 
+static void read_watch_extension(struct index_state *istate, uint8_t *data,
+				 unsigned long sz)
+{
+	int i;
+	if ((istate->cache_nr + 7) / 8 != sz) {
+		error("invalid 'WATC' extension");
+		return;
+	}
+	for (i = 0; i < istate->cache_nr; i++)
+		if (data[i / 8] & (1 << (i % 8)))
+			istate->cache[i]->ce_flags |= CE_WATCHED;
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1299,6 +1313,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_WATCH:
+		read_watch_extension(istate, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1777,7 +1794,7 @@ int write_index(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
-	int i, err, removed, extended, hdr_version;
+	int i, err, removed, extended, hdr_version, has_watches = 0;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
 	struct stat st;
@@ -1786,6 +1803,8 @@ int write_index(struct index_state *istate, int newfd)
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
+		else if (cache[i]->ce_flags & CE_WATCHED)
+			has_watches++;
 
 		/* reduce extended entries if possible */
 		cache[i]->ce_flags &= ~CE_EXTENDED;
@@ -1857,6 +1876,26 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
+	if (has_watches) {
+		int id, sz = (entries - removed + 7) / 8;
+		uint8_t *data = xmalloc(sz);
+		memset(data, 0, sz);
+		for (i = 0, id = 0; i < entries && has_watches; i++) {
+			struct cache_entry *ce = cache[i];
+			if (ce->ce_flags & CE_REMOVE)
+				continue;
+			if (ce->ce_flags & CE_WATCHED) {
+				data[id / 8] |= 1 << (id % 8);
+				has_watches--;
+			}
+			id++;
+		}
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
+			|| ce_write(&c, newfd, data, sz) < 0;
+		free(data);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd) || fstat(newfd, &st))
 		return -1;
-- 
1.8.5.2.240.g8478abd

* [PATCH v3 11/26] Clear CE_WATCHED when set CE_VALID alone
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (9 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 10/26] read-cache: new flag CE_WATCHED to mark what file is watched Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:28     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 12/26] read-cache: basic hand shaking to the file watcher Nguyễn Thái Ngọc Duy
                       ` (15 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

If CE_WATCHED is set, CE_VALID is controlled by CE_WATCHED and will be
cleared before writing to disk. Users of --assume-unchanged therefore
need to clear CE_WATCHED to avoid this side effect.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/update-index.c | 12 ++++++++----
 read-cache.c           |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index e3a10d7..9283fd6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -50,9 +50,13 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 	int namelen = strlen(path);
 	int pos = cache_name_pos(path, namelen);
 	if (0 <= pos) {
-		if (mark)
-			active_cache[pos]->ce_flags |= flag;
-		else
+		if (mark) {
+			struct cache_entry *ce = active_cache[pos];
+			if (flag == CE_VALID)
+				ce->ce_flags = (ce->ce_flags & ~CE_WATCHED) | CE_VALID;
+			else
+				ce->ce_flags |= flag;
+		} else
 			active_cache[pos]->ce_flags &= ~flag;
 		cache_tree_invalidate_path(active_cache_tree, path);
 		active_cache_changed = 1;
@@ -235,7 +239,7 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
 	ce->ce_namelen = len;
 	ce->ce_mode = create_ce_mode(mode);
 	if (assume_unchanged)
-		ce->ce_flags |= CE_VALID;
+		ce->ce_flags = (ce->ce_flags & ~CE_WATCHED) | CE_VALID;
 	option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
 	option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
 	if (add_cache_entry(ce, option))
diff --git a/read-cache.c b/read-cache.c
index 098d3b6..8961864 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -133,7 +133,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 	fill_stat_data(&ce->ce_stat_data, st);
 
 	if (assume_unchanged)
-		ce->ce_flags |= CE_VALID;
+		ce->ce_flags = (ce->ce_flags & ~CE_WATCHED) | CE_VALID;
 
 	if (S_ISREG(st->st_mode))
 		ce_mark_uptodate(ce);
-- 
1.8.5.2.240.g8478abd

* [PATCH v3 12/26] read-cache: basic hand shaking to the file watcher
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (10 preceding siblings ...)
  2014-02-03  4:28     ` [PATCH v3 11/26] Clear CE_WATCHED when set CE_VALID alone Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 13/26] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
                       ` (14 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

read_cache() connects to the file watcher, specified by
filewatcher.path config, and performs basic hand shaking. CE_WATCHED
is cleared if git and file watcher have different views on the index
state.

All send/receive calls must complete within a limited time so that a
buggy file watcher cannot hang "git status" forever. The whole point of
doing this is speed: if the file watcher can't respond fast enough, for
whatever reason, then it should not be used.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
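Not part of the commit message: a minimal sketch of how the "index"
request is laid out on the wire, using the same fixed offsets that
handle_command() relies on (signature at byte 6, separator at byte 46,
path at byte 47). parse_index_line is a hypothetical helper, and its
hex-digit check is an extra assumption; the patch itself only checks
the length, the separator and is_absolute_path().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Layout of an "index" request:
 *   bytes 0..5   "index "
 *   bytes 6..45  40 hex digits (index signature)
 *   byte  46     ' '
 *   bytes 47..   absolute work tree path
 *
 * "msg" must be NUL-terminated and "len" its length. On success,
 * copies the NUL-terminated signature into "sig" and returns a
 * pointer to the path inside "msg"; returns NULL on malformed input.
 */
static const char *parse_index_line(const char *msg, size_t len, char sig[41])
{
	size_t i;

	assert(msg && sig);
	if (len < 48 || strncmp(msg, "index ", 6))
		return NULL;
	for (i = 6; i < 46; i++)
		if (!isxdigit((unsigned char)msg[i]))
			return NULL;
	if (msg[46] != ' ' || msg[47] != '/')
		return NULL;
	memcpy(sig, msg + 6, 40);
	sig[40] = '\0';
	return msg + 47;
}
```

A caller would compare "sig" against the stored signature and reply
"ok" or "inconsistent", as the protocol comment in file-watcher.c
describes.
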
 Documentation/config.txt           |  10 +++
 Documentation/git-file-watcher.txt |   4 +-
 Makefile                           |   1 +
 cache.h                            |   1 +
 file-watcher-lib.c (new)           |  91 ++++++++++++++++++++++
 file-watcher-lib.h (new)           |   6 ++
 file-watcher.c                     | 152 ++++++++++++++++++++++++++++++++++++-
 read-cache.c                       |   6 ++
 8 files changed, 269 insertions(+), 2 deletions(-)
 create mode 100644 file-watcher-lib.c
 create mode 100644 file-watcher-lib.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5f4d793..6ad653a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1042,6 +1042,16 @@ difftool.<tool>.cmd::
 difftool.prompt::
 	Prompt before each invocation of the diff tool.
 
+filewatcher.path::
+	The directory that contains the socket of `git file-watcher`.
+	If it's not an absolute path, it's relative to $GIT_DIR. An
+	empty path means no connection to file watcher.
+
+filewatcher.timeout::
+	This is the maximum time in milliseconds that Git waits for
+	the file watcher to respond before giving up. Default value is
+	50. Setting to -1 makes Git wait forever.
+
 fetch.recurseSubmodules::
 	This option can be either set to a boolean value or to 'on-demand'.
 	Setting it to a boolean changes the behavior of fetch and pull to
diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
index ec81f18..d91caf3 100644
--- a/Documentation/git-file-watcher.txt
+++ b/Documentation/git-file-watcher.txt
@@ -14,7 +14,9 @@ DESCRIPTION
 -----------
 This program watches file changes in a git working directory and let
 Git now what files have been changed so that Git does not have to call
-lstat(2) to detect that itself.
+lstat(2) to detect that itself. Config key filewatcher.path needs to
+be set to `<socket directory>` so Git knows how to contact the file
+watcher.
 
 OPTIONS
 -------
diff --git a/Makefile b/Makefile
index 8eef0d6..1c4d659 100644
--- a/Makefile
+++ b/Makefile
@@ -798,6 +798,7 @@ LIB_OBJS += entry.o
 LIB_OBJS += environment.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
+LIB_OBJS += file-watcher-lib.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
diff --git a/cache.h b/cache.h
index a0af2a5..b3ea574 100644
--- a/cache.h
+++ b/cache.h
@@ -283,6 +283,7 @@ struct index_state {
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
 	unsigned char sha1[20];
+	int watcher;
 };
 
 extern struct index_state the_index;
diff --git a/file-watcher-lib.c b/file-watcher-lib.c
new file mode 100644
index 0000000..d0636cc
--- /dev/null
+++ b/file-watcher-lib.c
@@ -0,0 +1,91 @@
+#include "cache.h"
+#include "file-watcher-lib.h"
+#include "pkt-line.h"
+#include "unix-socket.h"
+
+static char *watcher_path;
+static int WAIT_TIME = 50;	/* in ms */
+
+static int connect_watcher(const char *path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+
+	if (!path || !*path)
+		return -1;
+
+	strbuf_addf(&sb, "%s/socket", path);
+	fd = unix_stream_connect(sb.buf);
+	strbuf_release(&sb);
+	return fd;
+}
+
+static void reset_watches(struct index_state *istate, int disconnect)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHED) {
+			istate->cache[i]->ce_flags &= ~(CE_WATCHED | CE_VALID);
+			istate->cache_changed = 1;
+		}
+	if (disconnect && istate->watcher > 0) {
+		close(istate->watcher);
+		istate->watcher = -1;
+	}
+}
+
+static int watcher_config(const char *var, const char *value, void *data)
+{
+	if (!strcmp(var, "filewatcher.path")) {
+		if (is_absolute_path(value))
+			watcher_path = xstrdup(value);
+		else if (*value == '~')
+			watcher_path = expand_user_path(value);
+		else
+			watcher_path = git_pathdup("%s", value);
+		return 0;
+	}
+	if (!strcmp(var, "filewatcher.timeout")) {
+		WAIT_TIME = git_config_int(var, value);
+		return 0;
+	}
+	return 0;
+}
+
+void open_watcher(struct index_state *istate)
+{
+	static int read_config = 0;
+	char *msg;
+
+	if (!get_git_work_tree()) {
+		reset_watches(istate, 1);
+		return;
+	}
+
+	if (!read_config) {
+		/*
+		 * can't hook into git_default_config because
+		 * read_cache() may be called even before git_config()
+		 * call.
+		 */
+		git_config(watcher_config, NULL);
+		read_config = 1;
+	}
+
+	istate->watcher = connect_watcher(watcher_path);
+	if (packet_write_timeout(istate->watcher, WAIT_TIME, "hello") <= 0 ||
+	    (msg = packet_read_line_timeout(istate->watcher, WAIT_TIME, NULL)) == NULL ||
+	    strcmp(msg, "hello")) {
+		reset_watches(istate, 1);
+		return;
+	}
+
+	if (packet_write_timeout(istate->watcher, WAIT_TIME, "index %s %s",
+				 sha1_to_hex(istate->sha1),
+				 get_git_work_tree()) <= 0 ||
+	    (msg = packet_read_line_timeout(istate->watcher, WAIT_TIME, NULL)) == NULL ||
+	    strcmp(msg, "ok")) {
+		reset_watches(istate, 0);
+		return;
+	}
+}
diff --git a/file-watcher-lib.h b/file-watcher-lib.h
new file mode 100644
index 0000000..eb6edf5
--- /dev/null
+++ b/file-watcher-lib.h
@@ -0,0 +1,6 @@
+#ifndef __FILE_WATCHER_LIB__
+#define __FILE_WATCHER_LIB__
+
+void open_watcher(struct index_state *istate);
+
+#endif
diff --git a/file-watcher.c b/file-watcher.c
index 1e1ccad..6df3a48 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -3,20 +3,78 @@
 #include "parse-options.h"
 #include "exec_cmd.h"
 #include "unix-socket.h"
+#include "pkt-line.h"
 
 static const char *const file_watcher_usage[] = {
 	N_("git file-watcher [options] <socket directory>"),
 	NULL
 };
 
+struct repository {
+	char *work_tree;
+	char index_signature[41];
+	/*
+	 * At least with inotify we don't keep track down to "/". So
+	 * if worktree is /abc/def and someone moves /abc to /ghi, and
+	 * /jlk to /abc (and /jlk/def exists before the move), we
+	 * can't detect that /abc/def is totally new. Checking inode
+	 * is probably enough for this case.
+	 */
+	ino_t inode;
+};
+
+const char *invalid_signature = "0000000000000000000000000000000000000000";
+
+static struct repository **repos;
+static int nr_repos;
+
 struct connection {
-	int sock;
+	int sock, polite;
+	struct repository *repo;
 };
 
 static struct connection **conns;
 static struct pollfd *pfd;
 static int conns_alloc, pfd_nr, pfd_alloc;
 
+static struct repository *get_repo(const char *work_tree)
+{
+	int first, last;
+	struct repository *repo;
+
+	first = 0;
+	last = nr_repos;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		int cmp = strcmp(work_tree, repos[next]->work_tree);
+		if (!cmp)
+			return repos[next];
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+
+	nr_repos++;
+	repos = xrealloc(repos, sizeof(*repos) * nr_repos);
+	if (nr_repos > first + 1)
+		memmove(repos + first + 1, repos + first,
+			(nr_repos - first - 1) * sizeof(*repos));
+	repo = xmalloc(sizeof(*repo));
+	memset(repo, 0, sizeof(*repo));
+	repo->work_tree = xstrdup(work_tree);
+	memset(repo->index_signature, '0', 40);
+	repos[first] = repo;
+	return repo;
+}
+
+static void reset_repo(struct repository *repo, ino_t inode)
+{
+	memcpy(repo->index_signature, invalid_signature, 40);
+	repo->inode = inode;
+}
+
 static int shutdown_connection(int id)
 {
 	struct connection *conn = conns[id];
@@ -31,6 +89,98 @@ static int shutdown_connection(int id)
 
 static int handle_command(int conn_id)
 {
+	int fd = conns[conn_id]->sock;
+	int len;
+	const char *arg;
+	char *msg;
+
+	/*
+	 * ">" denotes an incoming packet, "<" outgoing. The lack of
+	 * "<" means no reply expected.
+	 *
+	 * < "error" SP ERROR-STRING
+	 *
+	 * This can be sent whenever the client violates the protocol.
+	 */
+
+	msg = packet_read_line(fd, &len);
+	if (!msg) {
+		packet_write(fd, "error invalid pkt-line");
+		return shutdown_connection(conn_id);
+	}
+
+	/*
+	 * > "hello" [SP CAP [SP CAP..]]
+	 * < "hello" [SP CAP [SP CAP..]]
+	 *
+	 * Advertise capabilities of both sides. File watcher may
+	 * disconnect if the client does not advertise the required
+	 * capabilities. Capabilities in uppercase MUST be
+	 * supported. If any side does not understand any of the
+	 * advertised uppercase capabilities, it must disconnect.
+	 */
+	if ((arg = skip_prefix(msg, "hello"))) {
+		if (*arg) {	/* no capabilities supported yet */
+			packet_write(fd, "error capabilities not supported");
+			return shutdown_connection(conn_id);
+		}
+		packet_write(fd, "hello");
+		conns[conn_id]->polite = 1;
+	}
+
+	/*
+	 * > "index" SP INDEX-SIGNATURE SP WORK-TREE-PATH
+	 * < "ok" | "inconsistent"
+	 *
+	 * INDEX-SIGNATURE consists of 40 hexadecimal letters
+	 * WORK-TREE-PATH must be absolute and normalized path
+	 *
+	 * Watch file changes in index. The client sends the index and
+	 * work tree info. File watcher validates that it holds the
+	 * same info. If so it sends "ok" back indicating both sides
+	 * are on the same page and CE_WATCHED bits can be kept.
+	 *
+	 * Otherwise it sends "inconsistent" and both sides must reset
+	 * back to initial state. File watcher keeps its index
+	 * signature all-zero until the client has updated the index
+	 * on disk and requested an index signature update.
+	 *
+	 * "hello" must be exchanged first. After this command the
+	 * connection is associated with a worktree/index. Many
+	 * commands may require this to proceed.
+	 */
+	else if (starts_with(msg, "index ")) {
+		struct repository *repo;
+		struct stat st;
+		if (!conns[conn_id]->polite) {
+			packet_write(fd, "error why did you not greet me? go away");
+			return shutdown_connection(conn_id);
+		}
+		if (len < 47 || msg[46] != ' ' || !is_absolute_path(msg + 47)) {
+			packet_write(fd, "error invalid index line %s", msg);
+			return shutdown_connection(conn_id);
+		}
+
+		if (lstat(msg + 47, &st) || !S_ISDIR(st.st_mode)) {
+			packet_write(fd, "error work tree does not exist: %s",
+				     strerror(errno));
+			return shutdown_connection(conn_id);
+		}
+		repo = get_repo(msg + 47);
+		conns[conn_id]->repo = repo;
+		if (memcmp(msg + 6, repo->index_signature, 40) ||
+		    !memcmp(msg + 6, invalid_signature, 40) ||
+		    repo->inode != st.st_ino) {
+			packet_write(fd, "inconsistent");
+			reset_repo(repo, st.st_ino);
+			return 0;
+		}
+		packet_write(fd, "ok");
+	}
+	else {
+		packet_write(fd, "error unrecognized command %s", msg);
+		return shutdown_connection(conn_id);
+	}
 	return 0;
 }
 
diff --git a/read-cache.c b/read-cache.c
index 8961864..a7e5735 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "strbuf.h"
 #include "varint.h"
+#include "file-watcher-lib.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
@@ -1528,6 +1529,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += extsize;
 	}
 	munmap(mmap, mmap_size);
+	open_watcher(istate);
 	return istate->cache_nr;
 
 unmap:
@@ -1553,6 +1555,10 @@ int discard_index(struct index_state *istate)
 	istate->timestamp.nsec = 0;
 	free_name_hash(istate);
 	cache_tree_free(&(istate->cache_tree));
+	if (istate->watcher > 0) {
+		close(istate->watcher);
+		istate->watcher = -1;
+	}
 	istate->initialized = 0;
 	free(istate->cache);
 	istate->cache = NULL;
-- 
1.8.5.2.240.g8478abd

* [PATCH v3 13/26] read-cache: ask file watcher to watch files
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (11 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 12/26] read-cache: basic hand shaking to the file watcher Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 14/26] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
                       ` (13 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We want to watch files that are never changed because lstat() on those
files is a wasted effort. So we sort unwatched files by date and start
adding them to the file watcher until it barfs (e.g. hits inotify
limit).

Note that we still do lstat() on these new watched files because they
could have changed before the file watcher could watch them. Watched
files may only skip lstat() at the next run.

Also, this early in the index loading process, we don't know what
files are dirty and thus can skip watching (we do clear CE_WATCHED on
entries that are not verified clean before writing index). So the
watches are set, but git ignores their results. Maybe in the future we
could store the list of dirty files in the WATC extension and use it as
a hint to skip watching.

In the future, the file watcher should figure out which paths are
watchable and which are not (e.g. network filesystems) and reject the
latter. For now it's the user's responsibility to set (or unset)
filewatcher.path properly.

The previous attempt sent paths in batches, 64k per pkt-line, then
waited for a response. It was designed to stop short in case the file
watcher is out of resources. But that's a rare case, and send/wait
cycles increase latency.

Instead we now send everything in one packet, and not in pkt-line to
avoid the 64k limit. Then we wait for the response. On webkit.git,
normal "status -uno" takes 0m1.138s. Sending 14M (of 182k paths)
takes 52ms extra. The previous approach took 213ms extra. Of course in
the end, the extra time will be longer because the file watcher is
basically a no-op so far.

There is not much room for improvement. If we compress paths to reduce
the payload, zlib costs about 300ms (so how small the end result is
no longer matters). Even simple prefix compression (index v4 style)
would cost 76ms in processing time alone (reducing the payload to 3M).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
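Not part of the commit message: a minimal sketch of the PATH-LIST
payload that follows the "watch LENGTH" packet, i.e. every path
followed by its NUL terminator, concatenated into one buffer. Both
helpers (build_path_list, count_paths) are hypothetical names for
illustration; the patch builds the buffer with strbuf instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Concatenate "nr" paths, each followed by its NUL terminator, into
 * one malloc'ed buffer. Stores the total size in *len, which is the
 * LENGTH announced in the "watch" packet. Returns NULL if allocation
 * fails. The caller frees the result.
 */
static char *build_path_list(const char **paths, size_t nr, size_t *len)
{
	size_t i, total = 0, off = 0;
	char *buf;

	assert(paths && len);
	for (i = 0; i < nr; i++)
		total += strlen(paths[i]) + 1;
	buf = malloc(total ? total : 1);
	if (!buf)
		return NULL;
	for (i = 0; i < nr; i++) {
		size_t n = strlen(paths[i]) + 1;
		memcpy(buf + off, paths[i], n);
		off += n;
	}
	*len = total;
	return buf;
}

/*
 * Walk the buffer the way a receiver would and count the paths.
 * Assumes the last byte of the buffer is a NUL terminator.
 */
static size_t count_paths(const char *buf, size_t len)
{
	const char *end = buf + len;
	size_t n = 0;

	while (buf < end) {
		buf += strlen(buf) + 1;
		n++;
	}
	return n;
}
```

Because the receiver stops at the first path it cannot watch, the
order of the list matters: the "watched N" reply refers to the first N
paths in exactly this buffer order.
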
 file-watcher-lib.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
 file-watcher-lib.h |  1 +
 file-watcher.c     | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c       | 18 +++++++++--
 4 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/file-watcher-lib.c b/file-watcher-lib.c
index d0636cc..791faae 100644
--- a/file-watcher-lib.c
+++ b/file-watcher-lib.c
@@ -89,3 +89,86 @@ void open_watcher(struct index_state *istate)
 		return;
 	}
 }
+
+static int sort_by_date(const void *a_, const void *b_)
+{
+	const struct cache_entry *a = *(const struct cache_entry **)a_;
+	const struct cache_entry *b = *(const struct cache_entry **)b_;
+	uint32_t seca = a->ce_stat_data.sd_mtime.sec;
+	uint32_t secb = b->ce_stat_data.sd_mtime.sec;
+	return (seca > secb) - (seca < secb);
+}
+
+static inline int ce_watchable(struct cache_entry *ce)
+{
+	return
+		!(ce->ce_flags & CE_WATCHED) &&
+		!(ce->ce_flags & CE_VALID) &&
+		/*
+		 * S_IFGITLINK should not be watched
+		 * obviously. S_IFLNK could be problematic because
+		 * inotify may follow symlinks without IN_DONT_FOLLOW
+		 */
+		S_ISREG(ce->ce_mode);
+}
+
+static void send_watches(struct index_state *istate,
+			 struct cache_entry **sorted, int nr)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int i, len = 0;
+
+	for (i = 0; i < nr; i++)
+		len += ce_namelen(sorted[i]) + 1;
+
+	if (packet_write_timeout(istate->watcher, WAIT_TIME, "watch %d", len) <= 0)
+		return;
+
+	strbuf_grow(&sb, len);
+	for (i = 0; i < nr; i++)
+		strbuf_add(&sb, sorted[i]->name, ce_namelen(sorted[i]) + 1);
+
+	if (write_in_full_timeout(istate->watcher, sb.buf,
+				  sb.len, WAIT_TIME) != sb.len) {
+		strbuf_release(&sb);
+		return;
+	}
+	strbuf_release(&sb);
+
+	for (;;) {
+		char *line, *end;
+		unsigned long n;
+
+		if (!(line = packet_read_line_timeout(istate->watcher,
+						      WAIT_TIME, &len)))
+			return;
+		if (starts_with(line, "watching "))
+			continue;
+		if (!starts_with(line, "watched "))
+			return;
+		n = strtoul(line + 8, &end, 10);
+		for (i = 0; i < n && i < nr; i++)
+			sorted[i]->ce_flags |= CE_WATCHED;
+		istate->cache_changed = 1;
+		break;
+	}
+}
+
+void watch_entries(struct index_state *istate)
+{
+	int i, nr;
+	struct cache_entry **sorted;
+
+	if (istate->watcher <= 0)
+		return;
+	for (i = nr = 0; i < istate->cache_nr; i++)
+		if (ce_watchable(istate->cache[i]))
+			nr++;
+	sorted = xmalloc(sizeof(*sorted) * nr);
+	for (i = nr = 0; i < istate->cache_nr; i++)
+		if (ce_watchable(istate->cache[i]))
+			sorted[nr++] = istate->cache[i];
+	qsort(sorted, nr, sizeof(*sorted), sort_by_date);
+	send_watches(istate, sorted, nr);
+	free(sorted);
+}
diff --git a/file-watcher-lib.h b/file-watcher-lib.h
index eb6edf5..1641024 100644
--- a/file-watcher-lib.h
+++ b/file-watcher-lib.h
@@ -2,5 +2,6 @@
 #define __FILE_WATCHER_LIB__
 
 void open_watcher(struct index_state *istate);
+void watch_entries(struct index_state *istate);
 
 #endif
diff --git a/file-watcher.c b/file-watcher.c
index 6df3a48..c257414 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -37,6 +37,70 @@ static struct connection **conns;
 static struct pollfd *pfd;
 static int conns_alloc, pfd_nr, pfd_alloc;
 
+static int watch_path(struct repository *repo, char *path)
+{
+	return -1;
+}
+
+static inline uint64_t stamp(void)
+{
+	struct timeval tv;
+	gettimeofday(&tv, NULL);
+	return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
+}
+
+static int shutdown_connection(int id);
+static void watch_paths(int conn_id, char *buf, int maxlen)
+{
+	int ret, len, n;
+	uint64_t start, now;
+	char *end;
+
+	n = strtol(buf, &end, 10);
+	if (end != buf + maxlen) {
+		packet_write(conns[conn_id]->sock,
+			     "error invalid watch number %s", buf);
+		shutdown_connection(conn_id);
+		return;
+	}
+
+	buf = xmallocz(n);
+	end = buf + n;
+	/*
+	 * Careful if this takes longer than 50ms, it'll upset other
+	 * connections
+	 */
+	if (read_in_full(conns[conn_id]->sock, buf, n) != n) {
+		shutdown_connection(conn_id);
+		return;
+	}
+	if (chdir(conns[conn_id]->repo->work_tree)) {
+		packet_write(conns[conn_id]->sock,
+			     "error chdir %s", strerror(errno));
+		return;
+	}
+	start = stamp();
+	for (n = ret = 0; buf < end; buf += len + 1) {
+		len = strlen(buf);
+		if (watch_path(conns[conn_id]->repo, buf))
+			break;
+		n++;
+		if (n & 0x3ff)
+			continue;
+		now = stamp();
+		/*
+		 * If we process for too long, the client may timeout
+		 * and give up. Let the client know we're not dead
+		 * yet, every 30ms.
+		 */
+		if (start + 30000 < now) {
+			packet_write(conns[conn_id]->sock, "watching %d", n);
+			start = now;
+		}
+	}
+	packet_write(conns[conn_id]->sock, "watched %d", n);
+}
+
 static struct repository *get_repo(const char *work_tree)
 {
 	int first, last;
@@ -177,6 +241,33 @@ static int handle_command(int conn_id)
 		}
 		packet_write(fd, "ok");
 	}
+
+	/*
+	 * > "watch" SP LENGTH
+	 * > PATH-LIST
+	 * < "watching" SP NUM
+	 * < "watched" SP NUM
+	 *
+	 * PATH-LIST is the list of paths, each terminated with
+	 * NUL. PATH-LIST is not wrapped in pkt-line format. LENGTH is
+	 * the size of PATH-LIST in bytes.
+	 *
+	 * The client asks file watcher to watch a number of
+	 * paths. File watcher processes them path by path in
+	 * received order. File watcher returns the actual number of
+	 * watched paths with "watched" command.
+	 *
+	 * File watcher may send any number of "watching" messages
+	 * before "watched". This packet is to keep the connection
+	 * alive and carries no other meaning.
+	 */
+	else if (starts_with(msg, "watch ")) {
+		if (!conns[conn_id]->repo) {
+			packet_write(fd, "error have not received index command");
+			return shutdown_connection(conn_id);
+		}
+		watch_paths(conn_id, msg + 6, len - 6);
+	}
 	else {
 		packet_write(fd, "error unrecognized command %s", msg);
 		return shutdown_connection(conn_id);
diff --git a/read-cache.c b/read-cache.c
index a7e5735..cb2188f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1530,6 +1530,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	}
 	munmap(mmap, mmap_size);
 	open_watcher(istate);
+	watch_entries(istate);
 	return istate->cache_nr;
 
 unmap:
@@ -1809,8 +1810,21 @@ int write_index(struct index_state *istate, int newfd)
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
-		else if (cache[i]->ce_flags & CE_WATCHED)
-			has_watches++;
+		else if (cache[i]->ce_flags & CE_WATCHED) {
+			/*
+			 * We may set CE_WATCHED (but not CE_VALID)
+			 * early when refresh has not been done
+			 * yet. At that time we had no idea if the
+			 * entry may have been updated. If it has
+			 * been, remove CE_WATCHED so CE_VALID won't
+			 * incorrectly be set next time if the file
+			 * watcher reports no changes.
+			 */
+			if (!ce_uptodate(cache[i]))
+				cache[i]->ce_flags &= ~CE_WATCHED;
+			else
+				has_watches++;
+		}
 
 		/* reduce extended entries if possible */
 		cache[i]->ce_flags &= ~CE_EXTENDED;
-- 
1.8.5.2.240.g8478abd

* [PATCH v3 14/26] read-cache: put some limits on file watching
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (12 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 13/26] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 15/26] read-cache: get changed file list from file watcher Nguyễn Thái Ngọc Duy
                       ` (12 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

watch_entries() is a lot of computation and could trigger a lot more
lookups in file-watcher. Normally after the first set of watches is
in place, we do not need to update often. Moreover if the number of
entries is small, the overhead of file watcher may actually slow git
down.

This patch only updates watches if the number of watchable files is
over a limit (and, when this is not the first time, only if new files
have been added). Measured on a Core i5-2520M with Linux 3.7.6, about
920 lstat() calls take 1ms. Somewhere between 2^16 and 2^17 lstat()
calls, the total starts to take longer than 100ms. 2^16 is chosen as
the minimum limit to start using file watcher.

Of course this is only a sensible default for the single-repo use
case. Lower it when you need to work with many small repos.

Recently updated files are not considered watchable because they are
likely to be updated again soon, so they are not worth the ping-pong
game with file watcher. The default limit of 10 minutes is an arbitrary
value. The recent limit is ignored if there are no watched files (e.g.
a fresh clone, or after a bad handshake with file watcher).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
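Reader's note: at about 920 lstat() calls per millisecond, 2^16 calls
take roughly 71ms and 2^17 calls roughly 142ms, which is consistent
with the 100ms crossover described above. The two limits can be
sketched as follows. This is a minimal illustration, not the patch
code: struct demo_entry and the demo_* functions are hypothetical
stand-ins for struct cache_entry, ce_watchable() and the counting loop
in watch_entries().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the cache_entry fields used by the check. */
struct demo_entry {
	time_t mtime;    /* last modification time, in seconds */
	int is_regular;  /* non-zero for regular files */
	int watched;     /* non-zero if the entry is already watched */
};

/*
 * An entry is watchable if it is an unwatched regular file that was
 * not modified within the last recent_limit seconds.
 */
static int demo_watchable(const struct demo_entry *e, time_t now,
			  int recent_limit)
{
	return !e->watched && e->is_regular &&
	       e->mtime + recent_limit <= now;
}

/*
 * Returns how many entries would be sent to the watcher, or 0 when
 * fewer than min_files entries qualify and watching is skipped.
 */
static size_t demo_count_to_watch(const struct demo_entry *e, size_t n,
				  time_t now, int recent_limit,
				  size_t min_files)
{
	size_t i, nr = 0;
	assert(e || !n);
	for (i = 0; i < n; i++)
		if (demo_watchable(&e[i], now, recent_limit))
			nr++;
	return nr < min_files ? 0 : nr;
}
```

Setting recent_limit to 0, as the patch does when nothing is watched
yet, makes every unwatched regular file qualify regardless of mtime.
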
 Documentation/config.txt                 |  9 +++++++
 Documentation/technical/index-format.txt |  3 +++
 cache.h                                  |  1 +
 file-watcher-lib.c                       | 42 ++++++++++++++++++++++++++------
 read-cache.c                             | 11 ++++++---
 5 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6ad653a..451c100 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1052,6 +1052,15 @@ filewatcher.timeout::
 	the file watcher to respond before giving up. Default value is
 	50. Setting to -1 makes Git wait forever.
 
+filewatcher.minfiles::
+	Start watching files if the number of watchable files is
+	above this limit. Default value is 65536.
+
+filewatcher.recentlimit::
+	Files that are last updated within filewatcher.recentlimit
+	seconds from now are not considered watchable. Default value
+	is 600 (10 minutes).
+
 fetch.recurseSubmodules::
 	This option can be either set to a boolean value or to 'on-demand'.
 	Setting it to a boolean changes the behavior of fetch and pull to
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index 24fd0ae..7081e55 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -204,3 +204,6 @@ Git index format
 
   - A bit map of all entries in the index, n-th bit of m-th byte
     corresponds to CE_WATCHED of the <m * 8+ n>-th index entry.
+
+  - 1-byte, non-zero indicates the index should be scanned for new
+    watched entries.
diff --git a/cache.h b/cache.h
index b3ea574..10ff33e 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 	struct cache_tree *cache_tree;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 update_watches : 1,
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
diff --git a/file-watcher-lib.c b/file-watcher-lib.c
index 791faae..d4949a5 100644
--- a/file-watcher-lib.c
+++ b/file-watcher-lib.c
@@ -5,6 +5,8 @@
 
 static char *watcher_path;
 static int WAIT_TIME = 50;	/* in ms */
+static int watch_lowerlimit = 65536;
+static int recent_limit = 600;
 
 static int connect_watcher(const char *path)
 {
@@ -22,12 +24,17 @@ static int connect_watcher(const char *path)
 
 static void reset_watches(struct index_state *istate, int disconnect)
 {
-	int i;
+	int i, changed = 0;
 	for (i = 0; i < istate->cache_nr; i++)
 		if (istate->cache[i]->ce_flags & CE_WATCHED) {
 			istate->cache[i]->ce_flags &= ~(CE_WATCHED | CE_VALID);
-			istate->cache_changed = 1;
+			changed = 1;
 		}
+	recent_limit = 0;
+	if (changed) {
+		istate->update_watches = 1;
+		istate->cache_changed = 1;
+	}
 	if (disconnect && istate->watcher > 0) {
 		close(istate->watcher);
 		istate->watcher = -1;
@@ -49,6 +56,14 @@ static int watcher_config(const char *var, const char *value, void *data)
 		WAIT_TIME = git_config_int(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "filewatcher.minfiles")) {
+		watch_lowerlimit = git_config_int(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "filewatcher.recentlimit")) {
+		recent_limit = git_config_int(var, value);
+		return 0;
+	}
 	return 0;
 }
 
@@ -63,12 +78,18 @@ void open_watcher(struct index_state *istate)
 	}
 
 	if (!read_config) {
+		int i;
 		/*
 		 * can't hook into git_default_config because
 		 * read_cache() may be called even before git_config()
 		 * call.
 		 */
 		git_config(watcher_config, NULL);
+		for (i = 0; i < istate->cache_nr; i++)
+			if (istate->cache[i]->ce_flags & CE_WATCHED)
+				break;
+		if (i == istate->cache_nr)
+			recent_limit = 0;
 		read_config = 1;
 	}
 
@@ -86,6 +107,7 @@ void open_watcher(struct index_state *istate)
 	    (msg = packet_read_line_timeout(istate->watcher, WAIT_TIME, NULL)) == NULL ||
 	    strcmp(msg, "ok")) {
 		reset_watches(istate, 0);
+		istate->update_watches = 1;
 		return;
 	}
 }
@@ -99,7 +121,7 @@ static int sort_by_date(const void *a_, const void *b_)
 	return seca - secb;
 }
 
-static inline int ce_watchable(struct cache_entry *ce)
+static inline int ce_watchable(struct cache_entry *ce, time_t now)
 {
 	return
 		!(ce->ce_flags & CE_WATCHED) &&
@@ -109,7 +131,8 @@ static inline int ce_watchable(struct cache_entry *ce)
 		 * obviously. S_IFLNK could be problematic because
 		 * inotify may follow symlinks without IN_DONT_FOLLOW
 		 */
-		S_ISREG(ce->ce_mode);
+		S_ISREG(ce->ce_mode) &&
+		(ce->ce_stat_data.sd_mtime.sec + recent_limit <= now);
 }
 
 static void send_watches(struct index_state *istate,
@@ -158,15 +181,20 @@ void watch_entries(struct index_state *istate)
 {
 	int i, nr;
 	struct cache_entry **sorted;
+	time_t now = time(NULL);
 
-	if (istate->watcher <= 0)
+	if (istate->watcher <= 0 || !istate->update_watches)
 		return;
+	istate->update_watches = 0;
+	istate->cache_changed = 1;
 	for (i = nr = 0; i < istate->cache_nr; i++)
-		if (ce_watchable(istate->cache[i]))
+		if (ce_watchable(istate->cache[i], now))
 			nr++;
+	if (nr < watch_lowerlimit)
+		return;
 	sorted = xmalloc(sizeof(*sorted) * nr);
 	for (i = nr = 0; i < istate->cache_nr; i++)
-		if (ce_watchable(istate->cache[i]))
+		if (ce_watchable(istate->cache[i], now))
 			sorted[nr++] = istate->cache[i];
 	qsort(sorted, nr, sizeof(*sorted), sort_by_date);
 	send_watches(istate, sorted, nr);
diff --git a/read-cache.c b/read-cache.c
index cb2188f..dc49858 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1009,6 +1009,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 			(istate->cache_nr - pos - 1) * sizeof(ce));
 	set_index_entry(istate, pos, ce);
 	istate->cache_changed = 1;
+	istate->update_watches = 1;
 	return 0;
 }
 
@@ -1295,13 +1296,14 @@ static void read_watch_extension(struct index_state *istate, uint8_t *data,
 				 unsigned long sz)
 {
 	int i;
-	if ((istate->cache_nr + 7) / 8 != sz) {
+	if ((istate->cache_nr + 7) / 8 + 1 != sz) {
 		error("invalid 'WATC' extension");
 		return;
 	}
 	for (i = 0; i < istate->cache_nr; i++)
 		if (data[i / 8] & (1 << (i % 8)))
 			istate->cache[i]->ce_flags |= CE_WATCHED;
+	istate->update_watches = data[sz - 1];
 }
 
 static int read_index_extension(struct index_state *istate,
@@ -1488,6 +1490,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
 	istate->initialized = 1;
+	istate->update_watches = 1;
 
 	if (istate->version == 4)
 		previous_name = &previous_name_buf;
@@ -1896,8 +1899,9 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
-	if (has_watches) {
-		int id, sz = (entries - removed + 7) / 8;
+	if (has_watches ||
+	    (istate->watcher != -1 && !istate->update_watches)) {
+		int id, sz = (entries - removed + 7) / 8 + 1;
 		uint8_t *data = xmalloc(sz);
 		memset(data, 0, sz);
 		for (i = 0, id = 0; i < entries && has_watches; i++) {
@@ -1910,6 +1914,7 @@ int write_index(struct index_state *istate, int newfd)
 			}
 			id++;
 		}
+		data[sz - 1] = istate->update_watches;
 		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCH, sz) < 0
 			|| ce_write(&c, newfd, data, sz) < 0;
 		free(data);
-- 
1.8.5.2.240.g8478abd
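
Reader's note: the 'WATC' extension layout touched by this patch (one
bit per index entry, rounded up to whole bytes, plus one trailing flag
byte) can be sketched as below. This is an illustration under
assumptions, not the read-cache.c code: the demo_* names are
hypothetical, and the sketch ignores the CE_REMOVE entries that
write_index() skips when numbering bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Payload size: bitmap rounded up to bytes, plus the trailing flag byte. */
static size_t demo_watc_size(size_t nr_entries)
{
	return (nr_entries + 7) / 8 + 1;
}

/* Encode watched[0..nr) into a newly allocated buffer; the caller frees it. */
static uint8_t *demo_watc_encode(const int *watched, size_t nr,
				 int update_watches, size_t *sz)
{
	size_t i;
	uint8_t *data;
	assert(sz && (watched || !nr));
	*sz = demo_watc_size(nr);
	data = calloc(*sz, 1);
	if (!data)
		return NULL;
	for (i = 0; i < nr; i++)
		if (watched[i])
			data[i / 8] |= (uint8_t)(1u << (i % 8));
	data[*sz - 1] = update_watches ? 1 : 0;
	return data;
}

/* Decode the payload; returns -1 if sz does not match nr, else 0. */
static int demo_watc_decode(const uint8_t *data, size_t sz, int *watched,
			    size_t nr, int *update_watches)
{
	size_t i;
	if (demo_watc_size(nr) != sz)
		return -1;
	for (i = 0; i < nr; i++)
		watched[i] = (data[i / 8] >> (i % 8)) & 1;
	*update_watches = data[sz - 1] != 0;
	return 0;
}
```

The size check mirrors the one in read_watch_extension(): a payload
written without the trailing byte is rejected rather than misread.
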

* [PATCH v3 15/26] read-cache: get changed file list from file watcher
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (13 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 14/26] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 16/26] git-compat-util.h: add inotify stubs on non-Linux platforms Nguyễn Thái Ngọc Duy
                       ` (11 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

When some paths are watched, they are added to the "watched" list in
file watcher. When a path in this list is updated, the path is moved
to the "changed" list and is no longer watched.

With this patch we have a complete path exchanging picture between git
and file-watcher:

1) Hand shake

2) Get the list of changed paths, clear CE_WATCHED on these paths. Set
   CE_VALID on the remaining CE_WATCHED paths

3) (Optionally) Ask to watch more paths. Set CE_WATCHED on
   them. CE_VALID is not set so these are still lstat'd

4) Refresh as usual. lstat is skipped on CE_VALID paths. If one of
   the paths from step 3 is found modified, CE_WATCHED is removed.

5) Write index to disk. Notify file-watcher about new index
   signature. Ask file watcher to remove the "changed paths".

A few points:

 - Changed list remains until step 5. If git crashes or does not write
   the index down, next time it starts, it'll be fed the same changed list.

 - If git writes index down without telling file-watcher about it,
   next time it starts, hand shake should fail and git should clear
   all CE_WATCHED.

 - There's a buffer between starting watch at #3 and saving watch at
   #5. We do verify paths are clean at #4. But by that time all watches
   should have been active for a while. No chance for race conditions.

 - #5 is sort of atomic. If git crashes half way through step 5, file
   watcher should not update its index signature. Which means next
   time git starts, the handshake fails (because a new index has been
   written) so we'll start over.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
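Reader's note: steps 2 to 5 above amount to a small state machine over
two flags. The sketch below is a simplified model, not the patch code:
DEMO_WATCHED and DEMO_VALID stand in for CE_WATCHED and CE_VALID,
changed_on_disk stands in for what lstat() would report, and all
demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_WATCHED 1u
#define DEMO_VALID   2u

struct demo_ce {
	unsigned flags;
	int changed_on_disk;  /* what lstat() would reveal during refresh */
};

/* Step 2: drop WATCHED on reported paths, mark the remaining watched ones VALID. */
static void demo_apply_changed(struct demo_ce *ce, size_t n,
			       const size_t *changed, size_t nr_changed)
{
	size_t i;
	for (i = 0; i < nr_changed; i++) {
		assert(changed[i] < n);
		ce[changed[i]].flags &= ~DEMO_WATCHED;
	}
	for (i = 0; i < n; i++)
		if (ce[i].flags & DEMO_WATCHED)
			ce[i].flags |= DEMO_VALID;
}

/* Step 3: a newly watched entry gets WATCHED but not VALID. */
static void demo_watch(struct demo_ce *ce)
{
	ce->flags |= DEMO_WATCHED;
}

/* Step 4: refresh; returns the number of lstat() calls that would be made. */
static size_t demo_refresh(struct demo_ce *ce, size_t n)
{
	size_t i, nr_lstat = 0;
	for (i = 0; i < n; i++) {
		if (ce[i].flags & DEMO_VALID)
			continue;  /* trusted, no lstat() */
		nr_lstat++;
		if (ce[i].changed_on_disk)
			ce[i].flags &= ~DEMO_WATCHED;
	}
	return nr_lstat;
}

/* Step 5: VALID is not persisted for watched entries. */
static void demo_write(struct demo_ce *ce, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (ce[i].flags & DEMO_WATCHED)
			ce[i].flags &= ~DEMO_VALID;
}
```

The point of the model is that an entry only skips lstat() after it
has survived one full round: watched in one run, reported unchanged by
the watcher in the next.
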
 cache.h            |   1 +
 file-watcher-lib.c |  99 ++++++++++++++++++++++++++++++++++++++
 file-watcher-lib.h |   1 +
 file-watcher.c     | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c       |  21 +++++++-
 5 files changed, 258 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 10ff33e..9f7d952 100644
--- a/cache.h
+++ b/cache.h
@@ -285,6 +285,7 @@ struct index_state {
 	struct hash_table dir_hash;
 	unsigned char sha1[20];
 	int watcher;
+	struct string_list *updated_entries;
 };
 
 extern struct index_state the_index;
diff --git a/file-watcher-lib.c b/file-watcher-lib.c
index d4949a5..b6b0848 100644
--- a/file-watcher-lib.c
+++ b/file-watcher-lib.c
@@ -2,6 +2,7 @@
 #include "file-watcher-lib.h"
 #include "pkt-line.h"
 #include "unix-socket.h"
+#include "string-list.h"
 
 static char *watcher_path;
 static int WAIT_TIME = 50;	/* in ms */
@@ -25,6 +26,11 @@ static int connect_watcher(const char *path)
 static void reset_watches(struct index_state *istate, int disconnect)
 {
 	int i, changed = 0;
+	if (istate->updated_entries) {
+		string_list_clear(istate->updated_entries, 0);
+		free(istate->updated_entries);
+		istate->updated_entries = NULL;
+	}
 	for (i = 0; i < istate->cache_nr; i++)
 		if (istate->cache[i]->ce_flags & CE_WATCHED) {
 			istate->cache[i]->ce_flags &= ~(CE_WATCHED | CE_VALID);
@@ -41,6 +47,58 @@ static void reset_watches(struct index_state *istate, int disconnect)
 	}
 }
 
+static void mark_ce_valid(struct index_state *istate)
+{
+	struct strbuf sb = STRBUF_INIT;
+	char *line, *end;
+	int i, len;
+	unsigned long n;
+	if (packet_write_timeout(istate->watcher, WAIT_TIME, "get-changed") <= 0 ||
+	    !(line = packet_read_line_timeout(istate->watcher, WAIT_TIME, &len)) ||
+	    !starts_with(line, "changed ")) {
+		reset_watches(istate, 1);
+		return;
+	}
+	n = strtoul(line + 8, &end, 10);
+	if (end != line + len) {
+		reset_watches(istate, 1);
+		return;
+	}
+	if (!n)
+		goto done;
+	strbuf_grow(&sb, n);
+	if (read_in_full_timeout(istate->watcher, sb.buf, n, WAIT_TIME) != n) {
+		strbuf_release(&sb);
+		reset_watches(istate, 1);
+		return;
+	}
+	line = sb.buf;
+	end = line + n;
+	for (; line < end; line += len + 1) {
+		len = strlen(line);
+		i = index_name_pos(istate, line, len);
+		if (i < 0)
+			continue;
+		if (istate->cache[i]->ce_flags & CE_WATCHED) {
+			istate->cache[i]->ce_flags &= ~CE_WATCHED;
+			istate->cache_changed = 1;
+		}
+		if (!istate->updated_entries) {
+			struct string_list *sl;
+			sl = xmalloc(sizeof(*sl));
+			memset(sl, 0, sizeof(*sl));
+			sl->strdup_strings = 1;
+			istate->updated_entries = sl;
+		}
+		string_list_append(istate->updated_entries, line);
+	}
+	strbuf_release(&sb);
+done:
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHED)
+			istate->cache[i]->ce_flags |= CE_VALID;
+}
+
 static int watcher_config(const char *var, const char *value, void *data)
 {
 	if (!strcmp(var, "filewatcher.path")) {
@@ -110,6 +168,8 @@ void open_watcher(struct index_state *istate)
 		istate->update_watches = 1;
 		return;
 	}
+
+	mark_ce_valid(istate);
 }
 
 static int sort_by_date(const void *a_, const void *b_)
@@ -200,3 +260,42 @@ void watch_entries(struct index_state *istate)
 	send_watches(istate, sorted, nr);
 	free(sorted);
 }
+
+void close_watcher(struct index_state *istate, const unsigned char *sha1)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int len, i, nr;
+	if (istate->watcher <= 0)
+		return;
+	if (packet_write_timeout(istate->watcher, WAIT_TIME,
+				 "new-index %s", sha1_to_hex(sha1)) <= 0)
+		goto done;
+	nr = istate->updated_entries ? istate->updated_entries->nr : 0;
+	if (!nr) {
+		packet_write_timeout(istate->watcher, WAIT_TIME, "unchange 0");
+		goto done;
+	}
+	for (i = len = 0; i < nr; i++) {
+		const char *s = istate->updated_entries->items[i].string;
+		len += strlen(s) + 1;
+	}
+	if (packet_write_timeout(istate->watcher, WAIT_TIME,
+				 "unchange %d", len) <= 0)
+	    goto done;
+	strbuf_grow(&sb, len);
+	for (i = 0; i < nr; i++) {
+		const char *s = istate->updated_entries->items[i].string;
+		int len = strlen(s);
+		strbuf_add(&sb, s, len + 1);
+	}
+	/*
+	 * it does not matter if it fails anymore, we're closing
+	 * down. If it only gets through partially, file watcher
+	 * should ignore it.
+	 */
+	write_in_full_timeout(istate->watcher, sb.buf, sb.len, WAIT_TIME);
+	strbuf_release(&sb);
+done:
+	close(istate->watcher);
+	istate->watcher = -1;
+}
diff --git a/file-watcher-lib.h b/file-watcher-lib.h
index 1641024..df68a73 100644
--- a/file-watcher-lib.h
+++ b/file-watcher-lib.h
@@ -3,5 +3,6 @@
 
 void open_watcher(struct index_state *istate);
 void watch_entries(struct index_state *istate);
+void close_watcher(struct index_state *istate, const unsigned char *sha1);
 
 #endif
diff --git a/file-watcher.c b/file-watcher.c
index c257414..aa2daf6 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -3,6 +3,7 @@
 #include "parse-options.h"
 #include "exec_cmd.h"
 #include "unix-socket.h"
+#include "string-list.h"
 #include "pkt-line.h"
 
 static const char *const file_watcher_usage[] = {
@@ -21,6 +22,9 @@ struct repository {
 	 * is probably enough for this case.
 	 */
 	ino_t inode;
+	struct string_list updated;
+	int updated_sorted;
+	int updating;
 };
 
 const char *invalid_signature = "0000000000000000000000000000000000000000";
@@ -31,6 +35,8 @@ static int nr_repos;
 struct connection {
 	int sock, polite;
 	struct repository *repo;
+
+	char new_index[41];
 };
 
 static struct connection **conns;
@@ -42,6 +48,24 @@ static int watch_path(struct repository *repo, char *path)
 	return -1;
 }
 
+static void get_changed_list(int conn_id)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int i, size, fd = conns[conn_id]->sock;
+	struct repository *repo = conns[conn_id]->repo;
+
+	for (i = size = 0; i < repo->updated.nr; i++)
+		size += strlen(repo->updated.items[i].string) + 1;
+	packet_write(fd, "changed %d", size);
+	if (!size)
+		return;
+	strbuf_grow(&sb, size);
+	for (i = 0; i < repo->updated.nr; i++)
+		strbuf_add(&sb, repo->updated.items[i].string,
+			   strlen(repo->updated.items[i].string) + 1);
+	write_in_full(fd, sb.buf, sb.len);
+}
+
 static inline uint64_t stamp(void)
 {
 	struct timeval tv;
@@ -101,6 +125,43 @@ static void watch_paths(int conn_id, char *buf, int maxlen)
 	packet_write(conns[conn_id]->sock, "watched %u", n);
 }
 
+static void unchange(int conn_id, unsigned long size)
+{
+	struct connection *conn = conns[conn_id];
+	struct repository *repo = conn->repo;
+	if (size) {
+		struct strbuf sb = STRBUF_INIT;
+		char *p;
+		int len;
+		strbuf_grow(&sb, size);
+		if (read_in_full(conn->sock, sb.buf, size) <= 0)
+			return;
+		if (!repo->updated_sorted) {
+			sort_string_list(&repo->updated);
+			repo->updated_sorted = 1;
+		}
+		for (p = sb.buf; p - sb.buf < size; p += len + 1) {
+			struct string_list_item *item;
+			len = strlen(p);
+			item = string_list_lookup(&repo->updated, p);
+			if (!item)
+				continue;
+			unsorted_string_list_delete_item(&repo->updated,
+							 item - repo->updated.items, 0);
+		}
+		strbuf_release(&sb);
+	}
+	memcpy(repo->index_signature, conn->new_index, 40);
+	/*
+	 * If other connections on this repo are in some sort of
+	 * session that depends on the previous repository state, we
+	 * may need to disconnect them to be safe.
+	 */
+
+	/* pfd[0] is the listening socket, can't be a connection */
+	repo->updating = 0;
+}
+
 static struct repository *get_repo(const char *work_tree)
 {
 	int first, last;
@@ -129,12 +190,14 @@ static struct repository *get_repo(const char *work_tree)
 	memset(repo, 0, sizeof(*repo));
 	repo->work_tree = xstrdup(work_tree);
 	memset(repo->index_signature, '0', 40);
+	repo->updated.strdup_strings = 1;
 	repos[first] = repo;
 	return repo;
 }
 
 static void reset_repo(struct repository *repo, ino_t inode)
 {
+	string_list_clear(&repo->updated, 0);
 	memcpy(repo->index_signature, invalid_signature, 40);
 	repo->inode = inode;
 }
@@ -147,6 +210,8 @@ static int shutdown_connection(int id)
 	if (!conn)
 		return 0;
 	close(conn->sock);
+	if (conn->repo && conn->repo->updating == id)
+		conn->repo->updating = 0;
 	free(conn);
 	return 0;
 }
@@ -268,6 +333,77 @@ static int handle_command(int conn_id)
 		}
 		watch_paths(conn_id, msg + 6, len - 6);
 	}
+
+	/*
+	 * > "get-changed"
+	 * < changed SP LENGTH
+	 * < PATH-LIST
+	 *
+	 * When a watched path gets updated, the path is moved from
+	 * "watched" list to "changed" list and is no longer watched.
+	 * This command gets the list of changed paths. PATH-LIST is
+	 * also sent if LENGTH is non-zero.
+	 */
+	else if (!strcmp(msg, "get-changed")) {
+		if (!conns[conn_id]->repo) {
+			packet_write(fd, "error have not received index command");
+			return shutdown_connection(conn_id);
+		}
+		get_changed_list(conn_id);
+	}
+
+	/*
+	 * > "new-index" INDEX-SIGNATURE
+	 * > "unchange" SP LENGTH
+	 * > PATH-LIST
+	 *
+	 * "new-index" passes new index signature from the
+	 * client. "unchange" sends the list of paths to be removed
+	 * from "changed" list.
+	 *
+	 * "new-index" must be sent before "unchange". File watcher
+	 * waits until the last "unchange" line, then updates its index
+	 * signature as well as "changed" list.
+	 */
+	else if (starts_with(msg, "new-index ")) {
+		if (len != 50) {
+			packet_write(fd, "error invalid new-index line %s", msg);
+			return shutdown_connection(conn_id);
+		}
+		if (!conns[conn_id]->repo) {
+			packet_write(fd, "error have not received index command");
+			return shutdown_connection(conn_id);
+		}
+		if (conns[conn_id]->repo->updating == conn_id) {
+			packet_write(fd, "error received new-index command more than once");
+			return shutdown_connection(conn_id);
+		}
+		memcpy(conns[conn_id]->new_index, msg + 10, 40);
+		/*
+		 * if updating is non-zero the other client will get
+		 * disconnected at the next "unchange" command because
+		 * "updating" no longer points to its connection.
+		 */
+		conns[conn_id]->repo->updating = conn_id;
+	}
+	else if (skip_prefix(msg, "unchange ")) {
+		unsigned long n;
+		char *end;
+		n = strtoul(msg + 9, &end, 10);
+		if (end != msg + len) {
+			packet_write(fd, "error invalid unchange line %s", msg);
+			return shutdown_connection(conn_id);
+		}
+		if (!conns[conn_id]->repo) {
+			packet_write(fd, "error have not received index command");
+			return shutdown_connection(conn_id);
+		}
+		if (conns[conn_id]->repo->updating != conn_id) {
+			packet_write(fd, "error have not received new-index command");
+			return shutdown_connection(conn_id);
+		}
+		unchange(conn_id, n);
+	}
 	else {
 		packet_write(fd, "error unrecognized command %s", msg);
 		return shutdown_connection(conn_id);
@@ -436,6 +572,8 @@ int main(int argc, const char **argv)
 			if (!conns[i])
 				continue;
 			if (i != new_nr) { /* pfd[] is shrunk, move pfd[i] up */
+				if (conns[i]->repo && conns[i]->repo->updating == i)
+					conns[i]->repo->updating = new_nr;
 				conns[new_nr] = conns[i];
 				pfd[new_nr] = pfd[i];
 			}
diff --git a/read-cache.c b/read-cache.c
index dc49858..5540b06 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1567,6 +1567,11 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->updated_entries) {
+		string_list_clear(istate->updated_entries, 0);
+		free(istate->updated_entries);
+		istate->updated_entries = NULL;
+	}
 	return 0;
 }
 
@@ -1627,7 +1632,7 @@ static int write_index_ext_header(git_SHA_CTX *context, int fd,
 		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
 
-static int ce_flush(git_SHA_CTX *context, int fd)
+static int ce_flush(git_SHA_CTX *context, int fd, unsigned char *sha1)
 {
 	unsigned int left = write_buffer_len;
 
@@ -1645,6 +1650,8 @@ static int ce_flush(git_SHA_CTX *context, int fd)
 
 	/* Append the SHA1 signature at the end */
 	git_SHA1_Final(write_buffer + left, context);
+	if (sha1)
+		hashcpy(sha1, write_buffer + left);
 	left += 20;
 	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
 }
@@ -1809,12 +1816,21 @@ int write_index(struct index_state *istate, int newfd)
 	int entries = istate->cache_nr;
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	unsigned char sha1[20];
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
 		else if (cache[i]->ce_flags & CE_WATCHED) {
 			/*
+			 * CE_VALID when used with CE_WATCHED is not
+			 * supposed to be persistent. Next time git
+			 * runs, if this entry is still watched and
+			 * nothing has changed, CE_VALID will be
+			 * reinstated.
+			 */
+			cache[i]->ce_flags &= ~CE_VALID;
+			/*
 			 * We may set CE_WATCHED (but not CE_VALID)
 			 * early when refresh has not been done
 			 * yet. At that time we had no idea if the
@@ -1922,8 +1938,9 @@ int write_index(struct index_state *istate, int newfd)
 			return -1;
 	}
 
-	if (ce_flush(&c, newfd) || fstat(newfd, &st))
+	if (ce_flush(&c, newfd, sha1) || fstat(newfd, &st))
 		return -1;
+	close_watcher(istate, sha1);
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	return 0;
-- 
1.8.5.2.240.g8478abd
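
Reader's note: the PATH-LIST payload used by "changed" and "unchange"
in this patch is a run of NUL-terminated paths, and LENGTH is the sum
of strlen(path) + 1 over all paths. A minimal sketch of packing and
walking such a payload follows. The demo_* names are hypothetical, and
the trailing-NUL check in demo_path_list_count() is an extra safety
check added for the sketch; the loops in the patch call strlen() on
the received buffer directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* LENGTH as sent on the wire: every path plus its terminating NUL. */
static size_t demo_path_list_size(const char *const *paths, size_t nr)
{
	size_t i, size = 0;
	for (i = 0; i < nr; i++)
		size += strlen(paths[i]) + 1;
	return size;
}

/* Pack paths into a new buffer; returns NULL if empty or out of memory. */
static char *demo_path_list_pack(const char *const *paths, size_t nr,
				 size_t *size)
{
	size_t i, off = 0;
	char *buf;
	*size = demo_path_list_size(paths, nr);
	if (!*size)
		return NULL;
	buf = malloc(*size);
	if (!buf)
		return NULL;
	for (i = 0; i < nr; i++) {
		size_t len = strlen(paths[i]) + 1;
		memcpy(buf + off, paths[i], len);
		off += len;
	}
	assert(off == *size);
	return buf;
}

/* Count paths in a payload; -1 if it does not end with a NUL byte. */
static int demo_path_list_count(const char *buf, size_t size)
{
	const char *p = buf, *end = buf + size;
	int nr = 0;
	if (size && end[-1] != '\0')
		return -1;
	for (; p < end; p += strlen(p) + 1)
		nr++;
	return nr;
}
```
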

* [PATCH v3 16/26] git-compat-util.h: add inotify stubs on non-Linux platforms
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (14 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 15/26] read-cache: get changed file list from file watcher Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 17/26] file-watcher: inotify support, watching part Nguyễn Thái Ngọc Duy
                       ` (10 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This is to avoid spreading #ifdef HAVE_INOTIFY around and keep most
code compiled even if it's never active.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
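Reader's note: the stub approach lets callers be written once and
compiled everywhere; on platforms without inotify the call simply
fails with ENOSYS at run time. A minimal sketch of that pattern
follows. demo_inotify_init() and demo_setup_watcher() are hypothetical
names chosen so that the example does not clash with the real
<sys/inotify.h> declarations.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the inotify_init() stub in this patch. */
static inline int demo_inotify_init(void)
{
	errno = ENOSYS;
	return -1;
}

/*
 * Hypothetical caller: returns 1 if watching is available, or 0 when
 * the caller should keep using plain lstat().
 */
static int demo_setup_watcher(int *fd)
{
	assert(fd);
	*fd = demo_inotify_init();
	if (*fd < 0)
		return 0;  /* with the stub, errno is ENOSYS here */
	return 1;
}
```

The caller contains no #ifdef at all; only the definition of the
init function differs between platforms.
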
 config.mak.uname  |  1 +
 git-compat-util.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index 7d31fad..ee548f5 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_PATHS_H = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	BASIC_CFLAGS += -DHAVE_INOTIFY
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	NO_STRLCPY = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index cbd86c3..8b55dd0 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -128,6 +128,9 @@
 #else
 #include <poll.h>
 #endif
+#ifdef HAVE_INOTIFY
+#include <sys/inotify.h>
+#endif
 
 #if defined(__MINGW32__)
 /* pull in Windows compatibility stuff */
@@ -721,4 +724,44 @@ void warn_on_inaccessible(const char *path);
 /* Get the passwd entry for the UID of the current process. */
 struct passwd *xgetpwuid_self(void);
 
+#ifndef HAVE_INOTIFY
+/* Keep inotify-specific code building, even if it's not used */
+
+#define IN_DELETE_SELF	1
+#define IN_MOVE_SELF	2
+#define IN_ATTRIB	4
+#define IN_DELETE	8
+#define IN_MODIFY	16
+#define IN_MOVED_FROM	32
+#define IN_MOVED_TO	64
+#define IN_Q_OVERFLOW	128
+#define IN_UNMOUNT	256
+#define IN_CREATE	512
+#define IN_ISDIR	1024
+#define IN_IGNORED	2048
+
+struct inotify_event {
+	int event, mask, wd, len;
+	char name[FLEX_ARRAY];
+};
+
+static inline int inotify_init(void)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+static inline int inotify_add_watch(int fd, const char *path, int options)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+static inline int inotify_rm_watch(int fd, int wd)
+{
+	errno = ENOSYS;
+	return -1;
+}
+#endif
+
 #endif
-- 
1.8.5.2.240.g8478abd

* [PATCH v3 17/26] file-watcher: inotify support, watching part
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (15 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 16/26] git-compat-util.h: add inotify stubs on non-Linux platforms Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 18/26] file-watcher: inotify support, notification part Nguyễn Thái Ngọc Duy
                       ` (9 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

"struct dir" manages inotify file descriptor and forms a tree. "struct
file" manages a file. When a file is watched, all dirs up to the file
are watched. Any change to a directory impacts all its subdirs and files.

The way the data structure is built might be inotify-specific. I
haven't thought about how other file notification mechanisms may be
implemented. So there may be some refactoring later when a new OS is
supported.

Room for improvement: consecutive watched paths likely share the same
directory part (even though they are sorted by mtime, not name). Try
to remember the last "dir" sequence to reduce lookups.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
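Reader's note: "all dirs up to the file" means every leading directory
of an index path needs a watch. A minimal sketch of enumerating those
directories follows. It is not the patch's watch_path(), and
demo_next_dir() is a hypothetical helper; it assumes a relative path
with '/' separators, as stored in the index.

```c
#include <assert.h>
#include <string.h>

/*
 * Copies the next leading directory of path into out and advances
 * *pos past it. Returns 1 if a directory was produced, or 0 once only
 * the file name is left. *pos must start at 0.
 */
static int demo_next_dir(const char *path, size_t *pos,
			 char *out, size_t outsz)
{
	const char *slash = strchr(path + *pos, '/');
	size_t len;
	if (!slash)
		return 0;
	len = (size_t)(slash - path);
	assert(len < outsz);
	memcpy(out, path, len);
	out[len] = '\0';
	*pos = len + 1;
	return 1;
}
```

For "Documentation/technical/index-format.txt" this yields
"Documentation" and then "Documentation/technical"; a top-level file
such as "Makefile" yields nothing, so only the work tree root would
need a watch.
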
 Documentation/git-file-watcher.txt |  13 +++
 file-watcher.c                     | 226 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 238 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
index d91caf3..d694fea 100644
--- a/Documentation/git-file-watcher.txt
+++ b/Documentation/git-file-watcher.txt
@@ -18,11 +18,24 @@ lstat(2) to detect that itself. Config key filewatcher.path needs to
 be set to `<socket directory>` so Git knows how to contact to the file
 watcher.
 
+This program is only supported under Linux with inotify(7) support.
+
 OPTIONS
 -------
 --detach::
 	Run in background.
 
+BUGS
+----
+On Linux, file watcher may fail to detect changes if you move the work
+tree from outside. For example if you have work tree at
+`/tmp/foo/work`, you move `/tmp/foo` to `/tmp/bar`, make some changes
+in `/tmp/bar/work` and move `/tmp/bar` back to `/tmp/foo`, changes
+won't get noticed. Moving `/tmp/foo/work` to something else is fine.
+
+inotify may not work well with network filesystems and a few other
+special filesystems. Check the inotify(7) documentation.
+
 SEE ALSO
 --------
 linkgit:git-update-index[1],
diff --git a/file-watcher.c b/file-watcher.c
index aa2daf6..d0762e6 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -11,6 +11,28 @@ static const char *const file_watcher_usage[] = {
 	NULL
 };
 
+struct dir;
+struct repository;
+
+struct file {
+	char *name;
+	struct dir *parent;
+	struct repository *repo;
+	struct file *next;
+};
+
+struct dir {
+	char *name;
+	struct dir *parent;
+	struct dir **subdirs;
+	struct file **files;
+	struct repository *repo; /* only for root node */
+	int wd, nr_subdirs, nr_files;
+};
+
+static struct dir **wds;
+static int wds_alloc;
+
 struct repository {
 	char *work_tree;
 	char index_signature[41];
@@ -25,6 +47,7 @@ struct repository {
 	struct string_list updated;
 	int updated_sorted;
 	int updating;
+	struct dir *root;
 };
 
 const char *invalid_signature = "0000000000000000000000000000000000000000";
@@ -42,10 +65,199 @@ struct connection {
 static struct connection **conns;
 static struct pollfd *pfd;
 static int conns_alloc, pfd_nr, pfd_alloc;
+static int inotify_fd;
+
+/*
+ * IN_DONT_FOLLOW does not matter now as we do not monitor
+ * symlinks. See ce_watchable().
+ */
+#define INOTIFY_MASKS (IN_DELETE_SELF | IN_MOVE_SELF | \
+		       IN_CREATE | IN_ATTRIB | IN_DELETE | IN_MODIFY |	\
+		       IN_MOVED_FROM | IN_MOVED_TO)
+static struct dir *create_dir(struct dir *parent, const char *path,
+			      const char *basename)
+{
+	struct dir *d;
+	int wd = inotify_add_watch(inotify_fd, path, INOTIFY_MASKS);
+	if (wd < 0)
+		return NULL;
+
+	d = xmalloc(sizeof(*d));
+	memset(d, 0, sizeof(*d));
+	d->wd = wd;
+	d->parent = parent;
+	d->name = xstrdup(basename);
+
+	ALLOC_GROW(wds, wd + 1, wds_alloc);
+	wds[wd] = d;
+	return d;
+}
+
+static int get_dir_pos(struct dir *d, const char *name)
+{
+	int first, last;
+
+	first = 0;
+	last = d->nr_subdirs;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		int cmp = strcmp(name, d->subdirs[next]->name);
+		if (!cmp)
+			return next;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+
+	return -first-1;
+}
+
+static void free_file(struct dir *d, int pos, int topdown);
+static void free_dir(struct dir *d, int topdown)
+{
+	struct dir *p = d->parent;
+	int pos;
+	if (!topdown && p && (pos = get_dir_pos(p, d->name)) < 0)
+		die("How come this directory is not registered in its parent?");
+	if (d->repo)
+		d->repo->root = NULL;
+	wds[d->wd] = NULL;
+	inotify_rm_watch(inotify_fd, d->wd);
+	if (topdown) {
+		int i;
+		for (i = 0; i < d->nr_subdirs; i++)
+			free_dir(d->subdirs[i], topdown);
+		for (i = 0; i < d->nr_files; i++)
+			free_file(d, i, topdown);
+	}
+	free(d->name);
+	free(d->subdirs);
+	free(d->files);
+	free(d);
+	if (p && !topdown) {
+		p->nr_subdirs--;
+		memmove(p->subdirs + pos, p->subdirs + pos + 1,
+			(p->nr_subdirs - pos) * sizeof(*p->subdirs));
+		if (!p->nr_subdirs && !p->nr_files)
+			free_dir(p, topdown);
+	}
+}
+
+static int get_file_pos(struct dir *d, const char *name)
+{
+	int first, last;
+
+	first = 0;
+	last = d->nr_files;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		int cmp = strcmp(name, d->files[next]->name);
+		if (!cmp)
+			return next;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+
+	return -first-1;
+}
+
+static void free_file(struct dir *d, int pos, int topdown)
+{
+	struct file *f = d->files[pos];
+	free(f->name);
+	free(f);
+	if (!topdown) {
+		d->nr_files--;
+		memmove(d->files + pos, d->files + pos + 1,
+			(d->nr_files - pos) * sizeof(*d->files));
+		if (!d->nr_subdirs && !d->nr_files)
+			free_dir(d, topdown);
+	}
+}
+
+static struct dir *add_dir(struct dir *d,
+			   const char *path, const char *basename)
+{
+	struct dir *new;
+	int pos = get_dir_pos(d, basename);
+	if (pos >= 0)
+		return d->subdirs[pos];
+	pos = -pos-1;
+
+	new = create_dir(d, path, basename);
+	if (!new)
+		return NULL;
+
+	d->nr_subdirs++;
+	d->subdirs = xrealloc(d->subdirs, sizeof(*d->subdirs) * d->nr_subdirs);
+	if (d->nr_subdirs > pos + 1)
+		memmove(d->subdirs + pos + 1, d->subdirs + pos,
+			(d->nr_subdirs - pos - 1) * sizeof(*d->subdirs));
+	d->subdirs[pos] = new;
+	return new;
+}
+
+static struct file *add_file(struct dir *d, const char *name)
+{
+	struct file *new;
+	int pos = get_file_pos(d, name);
+	if (pos >= 0)
+		return d->files[pos];
+	pos = -pos-1;
+
+	new = xmalloc(sizeof(*new));
+	memset(new, 0, sizeof(*new));
+	new->parent = d;
+	new->name = xstrdup(name);
+
+	d->nr_files++;
+	d->files = xrealloc(d->files, sizeof(*d->files) * d->nr_files);
+	if (d->nr_files > pos + 1)
+		memmove(d->files + pos + 1, d->files + pos,
+			(d->nr_files - pos - 1) * sizeof(*d->files));
+	d->files[pos] = new;
+	return new;
+}
 
 static int watch_path(struct repository *repo, char *path)
 {
-	return -1;
+	struct dir *d = repo->root;
+	char *p = path;
+
+	if (!d) {
+		d = create_dir(NULL, ".", "");
+		if (!d)
+			return -1;
+		repo->root = d;
+		d->repo = repo;
+	}
+
+	for (;;) {
+		char *next, *sep;
+		sep = strchr(p, '/');
+		if (!sep) {
+			struct file *file;
+			file = add_file(d, p);
+			if (!file->repo)
+				file->repo = repo;
+			break;
+		}
+
+		next = sep + 1;
+		*sep = '\0';
+		d = add_dir(d, path, p);
+		if (!d)
+			/* we could free oldest watches and try again */
+			return -1;
+		*sep = '/';
+		p = next;
+	}
+	return 0;
 }
 
 static void get_changed_list(int conn_id)
@@ -195,8 +407,15 @@ static struct repository *get_repo(const char *work_tree)
 	return repo;
 }
 
+static void reset_watches(struct repository *repo)
+{
+	if (repo->root)
+		free_dir(repo->root, 1);
+}
+
 static void reset_repo(struct repository *repo, ino_t inode)
 {
+	reset_watches(repo);
 	string_list_clear(&repo->updated, 0);
 	memcpy(repo->index_signature, invalid_signature, 40);
 	repo->inode = inode;
@@ -497,6 +716,11 @@ int main(int argc, const char **argv)
 
 	git_extract_argv0_path(argv[0]);
 	git_setup_gettext();
+
+	inotify_fd = inotify_init();
+	if (inotify_fd < 0)
+		die_errno("unable to initialize inotify");
+
 	argc = parse_options(argc, argv, NULL, options,
 			     file_watcher_usage, 0);
 	if (argc < 1)
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 18/26] file-watcher: inotify support, notification part
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (16 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 17/26] file-watcher: inotify support, watching part Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 19/26] Wrap CE_VALID test with ce_valid() Nguyễn Thái Ngọc Duy
                       ` (8 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 141 insertions(+), 1 deletion(-)

diff --git a/file-watcher.c b/file-watcher.c
index d0762e6..5867942 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -260,6 +260,131 @@ static int watch_path(struct repository *repo, char *path)
 	return 0;
 }
 
+static inline void queue_file_changed(struct file *f, struct strbuf *sb)
+{
+	int len = sb->len;
+	strbuf_addf(sb, "%s%s", f->parent->parent ? "/" : "", f->name);
+	string_list_append(&f->repo->updated, sb->buf);
+	f->repo->updated_sorted = 0;
+	strbuf_setlen(sb, len);
+}
+
+static void construct_path(struct dir *d, struct strbuf *sb)
+{
+	if (!d->parent)
+		return;
+	if (!d->parent->parent) {
+		strbuf_addstr(sb, d->name);
+		return;
+	}
+	construct_path(d->parent, sb);
+	strbuf_addf(sb, "/%s", d->name);
+}
+
+static void file_changed(const struct inotify_event *event,
+			 struct dir *d, int pos)
+{
+	struct strbuf sb = STRBUF_INIT;
+	construct_path(d, &sb);
+	queue_file_changed(d->files[pos], &sb);
+	strbuf_release(&sb);
+	free_file(d, pos, 0);
+}
+
+static void dir_changed(const struct inotify_event *event, struct dir *d,
+			const char *base)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int i;
+
+	if (!base)		/* top call -> base == NULL */
+		construct_path(d, &sb);
+	else {
+		strbuf_addstr(&sb, base);
+		if (sb.len)
+			strbuf_addch(&sb, '/');
+		strbuf_addstr(&sb, d->name);
+	}
+
+	for (i = 0; i < d->nr_files; i++)
+		queue_file_changed(d->files[i], &sb);
+	for (i = 0; i < d->nr_subdirs; i++) {
+		dir_changed(event, d->subdirs[i], sb.buf);
+		if (!base)
+			free_dir(d->subdirs[i], 1);
+	}
+	strbuf_release(&sb);
+	if (!base)
+		free_dir(d, 0);
+}
+
+static void reset_repo(struct repository *repo, ino_t inode);
+static int do_handle_inotify(const struct inotify_event *event)
+{
+	struct dir *d;
+	int pos;
+
+	if (event->mask & (IN_Q_OVERFLOW | IN_UNMOUNT)) {
+		int i;
+		for (i = 0; i < nr_repos; i++)
+			reset_repo(repos[i], 0);
+		return 0;
+	}
+
+	if ((event->mask & IN_IGNORED) ||
+	    /*
+	     * Perhaps left over events that we have not consumed
+	     * before the watch descriptor is removed.
+	     */
+	    event->wd >= wds_alloc || wds[event->wd] == NULL)
+		return 0;
+
+	d = wds[event->wd];
+
+	/*
+	 * If something happened to the watched directory, consider
+	 * everything inside modified
+	 */
+	if (event->mask & (IN_DELETE_SELF | IN_MOVE_SELF)) {
+		dir_changed(event, d, NULL);
+		return 0;
+	}
+
+	if (!(event->mask & IN_ISDIR)) {
+		pos = get_file_pos(d, event->name);
+		if (pos >= 0)
+			file_changed(event, d, pos);
+	}
+
+	return 0;
+}
+
+static int handle_inotify(int fd)
+{
+	static char *buf;
+	static unsigned int buf_len = 0;
+	unsigned int avail, offset;
+	int ret, len;
+
+	/* drain the event queue */
+	if (ioctl(fd, FIONREAD, &avail))
+		die_errno("unable to FIONREAD inotify handle");
+	if (buf_len < avail) {
+		buf = xrealloc(buf, avail);
+		buf_len = avail;
+	}
+	len = read(fd, buf, avail);
+	if (len <= 0)
+		return -1;
+	ret = offset = 0;
+	while (offset < len) {
+		struct inotify_event *event = (void *)(buf + offset);
+		ret += do_handle_inotify(event);
+		offset += sizeof(struct inotify_event) + event->len;
+	}
+	return ret;
+}
+
 static void get_changed_list(int conn_id)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -466,6 +591,12 @@ static int handle_command(int conn_id)
 	 * capabilities. Capabilities in uppercase MUST be
 	 * supported. If any side does not understand any of the
 	 * advertised uppercase capabilities, it must disconnect.
+	 *
+	 * The way the main event loop is structured, we should get at
+	 * least one handle_inotify() before receiving the next
+	 * command. And handle_inotify() should process all events by
+	 * this point of time. This guarantees our reports won't miss
+	 * anything by the time get-changed is called.
 	 */
 	if ((arg = skip_prefix(msg, "hello"))) {
 		if (*arg) {	/* no capabilities supported yet */
@@ -753,11 +884,15 @@ int main(int argc, const char **argv)
 		close(err);
 	}
 
-	nr_common = 1;
+	nr_common = 1 + !!inotify_fd;
 	pfd_alloc = pfd_nr = nr_common;
 	pfd = xmalloc(sizeof(*pfd) * pfd_alloc);
 	pfd[0].fd = fd;
 	pfd[0].events = POLLIN;
+	if (inotify_fd) {
+		pfd[1].fd = inotify_fd;
+		pfd[1].events = POLLIN;
+	}
 
 	while (!quit) {
 		if (poll(pfd, pfd_nr, -1) < 0) {
@@ -769,6 +904,11 @@ int main(int argc, const char **argv)
 			continue;
 		}
 
+		if (inotify_fd && (pfd[1].revents & POLLIN)) {
+			if (handle_inotify(inotify_fd))
+				break;
+		}
+
 		for (new_nr = i = nr_common; i < pfd_nr; i++) {
 			if (pfd[i].revents & (POLLERR | POLLNVAL))
 				shutdown_connection(i);
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 19/26] Wrap CE_VALID test with ce_valid()
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (17 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 18/26] file-watcher: inotify support, notification part Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 20/26] read-cache: new variable to verify file-watcher results Nguyễn Thái Ngọc Duy
                       ` (7 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The next patch wants to ignore CE_VALID under some conditions without
actually clearing it. Centralizing access to the flag makes such a
change easier.

Not every "ce_flags & CE_VALID" test is converted, though. The tests
that really mean the CE_VALID _bit_ remain as they are. The tests that
mean "ignore worktree" are converted.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/grep.c | 2 +-
 cache.h        | 4 ++++
 diff-lib.c     | 4 ++--
 diff.c         | 2 +-
 read-cache.c   | 6 +++---
 unpack-trees.c | 2 +-
 6 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 63f8603..00526a1 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -386,7 +386,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		 * are identical, even if worktree file has been modified, so use
 		 * cache version instead
 		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
+		if (cached || ce_valid(ce) || ce_skip_worktree(ce)) {
 			if (ce_stage(ce))
 				continue;
 			hit |= grep_sha1(opt, ce->sha1, ce->name, 0, ce->name);
diff --git a/cache.h b/cache.h
index 9f7d952..c229bf9 100644
--- a/cache.h
+++ b/cache.h
@@ -222,6 +222,10 @@ static inline unsigned create_ce_flags(unsigned stage)
 #define ce_uptodate(ce) ((ce)->ce_flags & CE_UPTODATE)
 #define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
 #define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
+static inline int ce_valid(const struct cache_entry *ce)
+{
+	return ce->ce_flags & CE_VALID;
+}
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
 static inline unsigned int create_ce_mode(unsigned int mode)
diff --git a/diff-lib.c b/diff-lib.c
index 346cac6..dcda7f3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -198,7 +198,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			continue;
 
 		/* If CE_VALID is set, don't look at workdir for file removal */
-		changed = (ce->ce_flags & CE_VALID) ? 0 : check_removed(ce, &st);
+		changed = ce_valid(ce) ? 0 : check_removed(ce, &st);
 		if (changed) {
 			if (changed < 0) {
 				perror(ce->name);
@@ -369,7 +369,7 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 
 	/* if the entry is not checked out, don't examine work tree */
 	cached = o->index_only ||
-		(idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
+		(idx && (ce_valid(idx) || ce_skip_worktree(idx)));
 	/*
 	 * Backward compatibility wart - "diff-index -m" does
 	 * not mean "do not ignore merges", but "match_missing".
diff --git a/diff.c b/diff.c
index 8e4a6a9..22c73fe 100644
--- a/diff.c
+++ b/diff.c
@@ -2636,7 +2636,7 @@ static int reuse_worktree_file(const char *name, const unsigned char *sha1, int
 	 * If ce is marked as "assume unchanged", there is no
 	 * guarantee that work tree matches what we are looking for.
 	 */
-	if ((ce->ce_flags & CE_VALID) || ce_skip_worktree(ce))
+	if (ce_valid(ce) || ce_skip_worktree(ce))
 		return 0;
 
 	/*
diff --git a/read-cache.c b/read-cache.c
index 5540b06..95c9ccb 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -289,8 +289,8 @@ int ie_match_stat(const struct index_state *istate,
 	 */
 	if (!ignore_skip_worktree && ce_skip_worktree(ce))
 		return 0;
-	if (!ignore_valid && (ce->ce_flags & CE_VALID))
-		return 0;
+	if (!ignore_valid && ce_valid(ce))
+		return 0;
 
 	/*
 	 * Intent-to-add entries have not been added, so the index entry
@@ -1047,7 +1047,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		ce_mark_uptodate(ce);
 		return ce;
 	}
-	if (!ignore_valid && (ce->ce_flags & CE_VALID)) {
+	if (!ignore_valid && ce_valid(ce)) {
 		ce_mark_uptodate(ce);
 		return ce;
 	}
diff --git a/unpack-trees.c b/unpack-trees.c
index 164354d..61c3f35 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1215,7 +1215,7 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 	 * if this entry is truly up-to-date because this file may be
 	 * overwritten.
 	 */
-	if ((ce->ce_flags & CE_VALID) || ce_skip_worktree(ce))
+	if (ce_valid(ce) || ce_skip_worktree(ce))
 		; /* keep checking */
 	else if (o->reset || ce_uptodate(ce))
 		return 0;
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 20/26] read-cache: new variable to verify file-watcher results
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (18 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 19/26] Wrap CE_VALID test with ce_valid() Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 21/26] Support running file watcher with the test suite Nguyễn Thái Ngọc Duy
                       ` (6 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

If GIT_TEST_WATCHED is set to a non-zero value, Git still uses file
watcher if configured. But it does lstat() anyway and notifies the
user if a file is changed but the file watcher said otherwise.

Note that there is a race condition. Changed paths are retrieved at
time X, then refresh and validation at time Y. Even if X and Y are
very close, an update can happen between X and Y, causing a false
report.

If GIT_TEST_WATCHED is set to a value greater than 1, git will abort
instead of just warning and moving on.
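
The two levels described above can be sketched as follows. This is only
an illustration of the decision, not the code in this patch:
parse_test_level(), verify_entry() and the watch_verdict enum are
hypothetical names introduced for the example.

```c
#include <stdlib.h>

/* Hypothetical names, for illustration only. */
enum watch_verdict { WATCH_OK, WATCH_WARN, WATCH_ABORT };

/* Parse a GIT_TEST_WATCHED-style value; an unset variable means 0. */
static int parse_test_level(const char *value)
{
	return value ? atoi(value) : 0;
}

/*
 * Decide what to do for one entry, given the test level, whether the
 * watcher claimed the file was unchanged, and whether lstat() found
 * a change after all.
 */
static enum watch_verdict verify_entry(int level, int watcher_says_clean,
				       int lstat_says_changed)
{
	if (!level || !watcher_says_clean || !lstat_says_changed)
		return WATCH_OK;
	return level > 1 ? WATCH_ABORT : WATCH_WARN;
}
```

Because of the race noted above, a WATCH_WARN result at level 1 may be a
false report; level 2 turns the same condition into an abort.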

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-file-watcher.txt |  8 ++++++++
 cache.h                            |  5 ++++-
 read-cache.c                       | 17 +++++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
index d694fea..dd09e30 100644
--- a/Documentation/git-file-watcher.txt
+++ b/Documentation/git-file-watcher.txt
@@ -25,6 +25,14 @@ OPTIONS
 --detach::
 	Run in background.
 
+TROUBLESHOOTING
+---------------
+Setting the environment variable `GIT_TEST_WATCHED` to a non-zero number
+makes Git communicate with the file watcher, but do lstat anyway to
+verify the file watcher results. Setting it to 1 prints a warning when
+the file watcher fails to monitor files correctly. Setting it to 2 aborts
+Git when that happens.
+
 BUGS
 ----
 On Linux, file watcher may fail to detect changes if you move the work
diff --git a/cache.h b/cache.h
index c229bf9..806c886 100644
--- a/cache.h
+++ b/cache.h
@@ -224,7 +224,10 @@ static inline unsigned create_ce_flags(unsigned stage)
 #define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
 static inline int ce_valid(const struct cache_entry *ce)
 {
-	return ce->ce_flags & CE_VALID;
+	extern int test_watched;
+	if (!test_watched)
+		return ce->ce_flags & CE_VALID;
+	return (ce->ce_flags & CE_VALID) && !(ce->ce_flags & CE_WATCHED);
 }
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
diff --git a/read-cache.c b/read-cache.c
index 95c9ccb..d5f084a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -37,6 +37,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT_WATCH 0x57415443	  /* "WATC" */
 
 struct index_state the_index;
+int test_watched;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
@@ -1117,6 +1118,16 @@ static void show_file(const char * fmt, const char * name, int in_porcelain,
 	printf(fmt, name);
 }
 
+static void report_bad_watcher(struct index_state *istate,
+			       struct cache_entry *ce)
+{
+	if (test_watched > 1)
+		die("%s is updated but file-watcher said no",
+		    ce->name);
+	warning("%s is updated but file-watcher said no",
+		ce->name);
+}
+
 int refresh_index(struct index_state *istate, unsigned int flags,
 		  const struct pathspec *pathspec,
 		  char *seen, const char *header_msg)
@@ -1188,6 +1199,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 				ce->ce_flags &= ~CE_VALID;
 				istate->cache_changed = 1;
 			}
+			if (test_watched &&
+			    (ce->ce_flags & CE_WATCHED) && (ce->ce_flags & CE_VALID))
+				report_bad_watcher(istate, ce);
 			if (quiet)
 				continue;
 
@@ -1460,6 +1474,9 @@ int read_index_from(struct index_state *istate, const char *path)
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	if (getenv("GIT_TEST_WATCHED"))
+		test_watched = atoi(getenv("GIT_TEST_WATCHED"));
+
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
 	fd = open(path, O_RDONLY);
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 21/26] Support running file watcher with the test suite
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (19 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 20/26] read-cache: new variable to verify file-watcher results Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 22/26] file-watcher: quit if $WATCHER/socket is gone Nguyễn Thái Ngọc Duy
                       ` (5 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

To force the test suite to run with file-watcher, start the daemon with

  $ mkdir /tmp/watcher
  $ chmod 700 /tmp/watcher
  $ git-file-watcher /tmp/watcher/

then open another terminal and run

  $ export GIT_TEST_WATCHED=2 GIT_TEST_WATCHER=2
  $ export GIT_TEST_WATCHER_PATH=/tmp/watcher
  $ make test

WAIT_TIME is set to unlimited by GIT_TEST_WATCHER, so the test suite
could hang indefinitely due to a file-watcher bug. Luckily
everything passes.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher-lib.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/file-watcher-lib.c b/file-watcher-lib.c
index b6b0848..93afb52 100644
--- a/file-watcher-lib.c
+++ b/file-watcher-lib.c
@@ -136,6 +136,7 @@ void open_watcher(struct index_state *istate)
 	}
 
 	if (!read_config) {
+		const char *s;
 		int i;
 		/*
 		 * can't hook into git_default_config because
@@ -149,6 +150,18 @@ void open_watcher(struct index_state *istate)
 		if (i == istate->cache_nr)
 			recent_limit = 0;
 		read_config = 1;
+
+		s = getenv("GIT_TEST_WATCHER");
+		if (s) {
+			watch_lowerlimit = 1;
+			recent_limit = 0;
+			WAIT_TIME = -1;
+			if (atoi(s) > 1)
+				istate->update_watches = 1;
+			s = getenv("GIT_TEST_WATCHER_PATH");
+			if (s)
+				watcher_path = xstrdup(s);
+		}
 	}
 
 	istate->watcher = connect_watcher(watcher_path);
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 22/26] file-watcher: quit if $WATCHER/socket is gone
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (20 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 21/26] Support running file watcher with the test suite Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 23/26] file-watcher: tests for the daemon Nguyễn Thái Ngọc Duy
                       ` (4 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This is more of an issue in development than in production. When a
file-watcher related test fails, the daemon may be left hanging. When
you rerun the same test, old $TRASH_DIRECTORY is wiped out and no one
can communicate with the old daemon any more. Make the old daemon quit
after 5 minutes in such cases.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/file-watcher.c b/file-watcher.c
index 5867942..1e45b25 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -834,11 +834,22 @@ static void check_socket_directory(const char *path)
 	free(path_copy);
 }
 
+static void run_housekeeping(void)
+{
+	struct stat st;
+	struct strbuf sb = STRBUF_INIT;
+	strbuf_addf(&sb, "%s/socket", socket_path);
+	if (stat(sb.buf, &st) || !S_ISSOCK(st.st_mode))
+		exit(0);
+	strbuf_release(&sb);
+}
+
 int main(int argc, const char **argv)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int i, new_nr, fd, quit = 0, nr_common;
 	int daemon = 0;
+	time_t last_checked;
 	struct option options[] = {
 		OPT_BOOL(0, "detach", &daemon,
 			 N_("run in background")),
@@ -894,19 +905,25 @@ int main(int argc, const char **argv)
 		pfd[1].events = POLLIN;
 	}
 
+	last_checked = time(NULL);
 	while (!quit) {
-		if (poll(pfd, pfd_nr, -1) < 0) {
+		int ret = poll(pfd, pfd_nr, 300000);
+		int time_for_housekeeping = 0;
+		if (ret < 0) {
 			if (errno != EINTR) {
 				error("Poll failed, resuming: %s",
 				      strerror(errno));
 				sleep(1);
 			}
 			continue;
-		}
+		} else if (ret == 0)
+			time_for_housekeeping = 1;
 
 		if (inotify_fd && (pfd[1].revents & POLLIN)) {
 			if (handle_inotify(inotify_fd))
 				break;
+			if (last_checked + 300 < time(NULL))
+				time_for_housekeeping = 1;
 		}
 
 		for (new_nr = i = nr_common; i < pfd_nr; i++) {
@@ -949,6 +966,11 @@ int main(int argc, const char **argv)
 			accept_connection(pfd[0].fd);
 		if (pfd[0].revents & (POLLHUP | POLLERR | POLLNVAL))
 			die(_("error on listening socket"));
+
+		if (time_for_housekeeping) {
+			run_housekeeping();
+			last_checked = time(NULL);
+		}
 	}
 	return 0;
 }
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 23/26] file-watcher: tests for the daemon
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (21 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 22/26] file-watcher: quit if $WATCHER/socket is gone Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 24/26] ls-files: print CE_WATCHED as W (or "w" with CE_VALID) Nguyễn Thái Ngọc Duy
                       ` (3 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

test-file-watcher is a simple chat program for talking to the file
watcher. Typically you would write something like this:

  cat >expect <<EOF
  # send "hello". Oh and this is a comment!
  <hello
  # Wait for reply and print to stdout.
  # test-file-watcher does not care about anything after '>'
  >hello
  <index foo bar
  >ok
  EOF
  test-file-watcher . <expect >actual

and test-file-watcher will execute the commands and get responses. If
all goes well, "actual" should be the same as "expect". '<' and '>'
denote send and receive packets respectively. '<<' and '>>' can be
used to send and receive a list of NUL-terminated paths.
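
The line prefixes described above could be classified roughly like
this. It is a sketch of the script format only, not the code in
test-file-watcher.c; classify_line(), line_payload() and the line_kind
enum are hypothetical names.

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum line_kind {
	LINE_COMMENT,	/* "# ..."  ignored */
	LINE_SEND,	/* "<..."   send one packet */
	LINE_RECV,	/* ">..."   wait for one packet and print it */
	LINE_SEND_LIST,	/* "<<..."  send a list of NUL-terminated paths */
	LINE_RECV_LIST,	/* ">>..."  receive a list of NUL-terminated paths */
	LINE_UNKNOWN
};

/* Classify one script line by its prefix; two-character prefixes win. */
static enum line_kind classify_line(const char *line)
{
	if (line[0] == '#')
		return LINE_COMMENT;
	if (!strncmp(line, "<<", 2))
		return LINE_SEND_LIST;
	if (!strncmp(line, ">>", 2))
		return LINE_RECV_LIST;
	if (line[0] == '<')
		return LINE_SEND;
	if (line[0] == '>')
		return LINE_RECV;
	return LINE_UNKNOWN;
}

/* Return the text following the prefix (the whole line if unknown). */
static const char *line_payload(const char *line)
{
	switch (classify_line(line)) {
	case LINE_SEND_LIST:
	case LINE_RECV_LIST:
		return line + 2;
	case LINE_COMMENT:
	case LINE_SEND:
	case LINE_RECV:
		return line + 1;
	default:
		return line;
	}
}
```

Checking the two-character prefixes first keeps "<<" from being read
as a single-packet send whose payload starts with '<'.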

$GIT_TEST_WATCHER enables a few more commands for testing purposes.
The most important one is 'test-mode', where system inotify is taken
out and inotify events can be injected via test-file-watcher.
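
An injected event names its mask symbolically, with several names
joined by '|'. The sketch below shows one way to turn such a string
into mask bits without modifying the input. It is Linux-only because
it includes <sys/inotify.h>, it covers only a subset of the masks, and
parse_mask() and mask_names are hypothetical names, not the code in
this patch.

```c
#include <string.h>
#include <sys/inotify.h>

struct mask_name {
	const char *name;
	unsigned int value;
};

/* Subset of inotify masks, chosen for illustration. */
static const struct mask_name mask_names[] = {
	{ "IN_DELETE", IN_DELETE },
	{ "IN_MODIFY", IN_MODIFY },
	{ "IN_MOVED_FROM", IN_MOVED_FROM },
	{ "IN_MOVED_TO", IN_MOVED_TO },
	{ NULL, 0 },
};

/*
 * Parse "A|B|C" into OR-ed mask bits. Returns the mask, or -1 if any
 * name is not in the table.
 */
static long parse_mask(const char *s)
{
	long mask = 0;

	while (*s) {
		size_t len = strcspn(s, "|");	/* length of this name */
		int i;

		for (i = 0; mask_names[i].name; i++)
			if (strlen(mask_names[i].name) == len &&
			    !strncmp(mask_names[i].name, s, len))
				break;
		if (!mask_names[i].name)
			return -1;
		mask |= mask_names[i].value;
		s += len;
		if (*s == '|')
			s++;
	}
	return mask;
}
```

Comparing lengths before strncmp() keeps "IN_MOVE" from matching
"IN_MOVED_TO" by prefix.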

There are two debug commands in file-watcher that are not used by the
test suite but help with debugging: setenv and log. setenv can be used
to turn on GIT_TRACE_PACKET, after which any "log" command shows up in
the trace and functions as a barrier between file watcher events.

GIT_TRACE_WATCHER can also be enabled (dynamically or at startup) to
track inotify events.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore                       |   1 +
 Makefile                         |   1 +
 file-watcher.c                   | 181 ++++++++++++++++++-
 t/t7513-file-watcher.sh (new +x) | 382 +++++++++++++++++++++++++++++++++++++++
 test-file-watcher.c (new)        |  96 ++++++++++
 5 files changed, 657 insertions(+), 4 deletions(-)
 create mode 100755 t/t7513-file-watcher.sh
 create mode 100644 test-file-watcher.c

diff --git a/.gitignore b/.gitignore
index 12c78f0..277f929 100644
--- a/.gitignore
+++ b/.gitignore
@@ -181,6 +181,7 @@
 /test-date
 /test-delta
 /test-dump-cache-tree
+/test-file-watcher
 /test-scrap-cache-tree
 /test-genrandom
 /test-index-version
diff --git a/Makefile b/Makefile
index 1c4d659..f0dc2cc 100644
--- a/Makefile
+++ b/Makefile
@@ -555,6 +555,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-file-watcher
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-index-version
 TEST_PROGRAMS_NEED_X += test-line-buffer
diff --git a/file-watcher.c b/file-watcher.c
index 1e45b25..3ab0a11 100644
--- a/file-watcher.c
+++ b/file-watcher.c
@@ -65,7 +65,8 @@ struct connection {
 static struct connection **conns;
 static struct pollfd *pfd;
 static int conns_alloc, pfd_nr, pfd_alloc;
-static int inotify_fd;
+static int inotify_fd, test_mode;
+static int wd_counter = 1;
 
 /*
  * IN_DONT_FOLLOW does not matter now as we do not monitor
@@ -78,10 +79,19 @@ static struct dir *create_dir(struct dir *parent, const char *path,
 			      const char *basename)
 {
 	struct dir *d;
-	int wd = inotify_add_watch(inotify_fd, path, INOTIFY_MASKS);
+	int wd;
+	if (!test_mode)
+		wd = inotify_add_watch(inotify_fd, path, INOTIFY_MASKS);
+	else {
+		wd = wd_counter++;
+		if (wd > 8)
+			wd = -1;
+	}
 	if (wd < 0)
 		return NULL;
 
+	trace_printf_key("GIT_TRACE_WATCHER", "inotify: watch %d %s\n",
+			 wd, path);
 	d = xmalloc(sizeof(*d));
 	memset(d, 0, sizeof(*d));
 	d->wd = wd;
@@ -124,7 +134,9 @@ static void free_dir(struct dir *d, int topdown)
 	if (d->repo)
 		d->repo->root = NULL;
 	wds[d->wd] = NULL;
-	inotify_rm_watch(inotify_fd, d->wd);
+	if (!test_mode)
+		inotify_rm_watch(inotify_fd, d->wd);
+	trace_printf_key("GIT_TRACE_WATCHER", "inotify: unwatch %d\n", d->wd);
 	if (topdown) {
 		int i;
 		for (i = 0; i < d->nr_subdirs; i++)
@@ -265,6 +277,7 @@ static inline void queue_file_changed(struct file *f, struct strbuf *sb)
 	int len = sb->len;
 	strbuf_addf(sb, "%s%s", f->parent->parent ? "/" : "", f->name);
 	string_list_append(&f->repo->updated, sb->buf);
+	trace_printf_key("GIT_TRACE_WATCHER", "watcher: changed %s\n", sb->buf);
 	f->repo->updated_sorted = 0;
 	strbuf_setlen(sb, len);
 }
@@ -324,6 +337,10 @@ static int do_handle_inotify(const struct inotify_event *event)
 	struct dir *d;
 	int pos;
 
+	trace_printf_key("GIT_TRACE_WATCHER", "inotify: event %08x wd %d %s\n",
+			 event->mask, event->wd,
+			 event->len ? event->name : "N/A");
+
 	if (event->mask & (IN_Q_OVERFLOW | IN_UNMOUNT)) {
 		int i;
 		for (i = 0; i < nr_repos; i++)
@@ -385,6 +402,81 @@ static int handle_inotify(int fd)
 	return ret;
 }
 
+struct constant {
+	const char *name;
+	int value;
+};
+
+#define CONSTANT(x) { #x, x }
+static const struct constant inotify_masks[] = {
+	CONSTANT(IN_DELETE_SELF),
+	CONSTANT(IN_MOVE_SELF),
+	CONSTANT(IN_ATTRIB),
+	CONSTANT(IN_DELETE),
+	CONSTANT(IN_MODIFY),
+	CONSTANT(IN_MOVED_FROM),
+	CONSTANT(IN_MOVED_TO),
+	CONSTANT(IN_Q_OVERFLOW),
+	CONSTANT(IN_UNMOUNT),
+	{ NULL, 0 },
+};
+
+static void inject_inotify(const char *msg)
+{
+	char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
+	struct inotify_event *event = (struct inotify_event *)buf;
+	char *end, *p;
+	int i;
+	memset(event, 0, sizeof(*event));
+	event->wd = strtol(msg, &end, 0);
+	if (*end++ != ' ')
+		die("expect a space after watch descriptor");
+	p = end;
+	end = strchrnul(p, ' ');
+	if (*end)
+		strcpy(event->name, end + 1);
+	while (p < end) {
+		char *sep = strchrnul(p, '|');
+		if (sep > end)
+			sep = end;
+		*sep = '\0';
+		for (i = 0; inotify_masks[i].name; i++)
+			if (!strcmp(inotify_masks[i].name, p))
+				break;
+		if (!inotify_masks[i].name)
+			die("unrecognized event mask %s", p);
+		event->mask |= inotify_masks[i].value;
+		p = sep + 1;
+	}
+	do_handle_inotify(event);
+}
+
+static void dump_watches(struct dir *d, struct strbuf *sb, struct strbuf *out)
+{
+	int i, len = sb->len;
+	strbuf_addstr(sb, d->name);
+	strbuf_addf(out, "%s %d%c", sb->buf[0] ? sb->buf : ".", d->wd, '\0');
+	if (d->name[0])
+		strbuf_addch(sb, '/');
+	for (i = 0; i < d->nr_subdirs; i++)
+		dump_watches(d->subdirs[i], sb, out);
+	for (i = 0; i < d->nr_files; i++)
+		strbuf_addf(out, "%s%s%c", sb->buf, d->files[i]->name, '\0');
+	strbuf_setlen(sb, len);
+}
+
+static void dump_changes(struct repository *repo, struct strbuf *sb)
+{
+	int i;
+	if (!repo->updated_sorted) {
+		sort_string_list(&repo->updated);
+		repo->updated_sorted = 1;
+	}
+	for (i = 0; i < repo->updated.nr; i++)
+		strbuf_add(sb, repo->updated.items[i].string,
+			   strlen(repo->updated.items[i].string) + 1);
+}
+
 static void get_changed_list(int conn_id)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -483,11 +575,13 @@ static void unchange(int conn_id, unsigned long size)
 			item = string_list_lookup(&repo->updated, p);
 			if (!item)
 				continue;
+			trace_printf_key("GIT_TRACE_WATCHER", "watcher: unchange %s\n", p);
 			unsorted_string_list_delete_item(&repo->updated,
 							 item - repo->updated.items, 0);
 		}
 		strbuf_release(&sb);
 	}
+	trace_printf_key("GIT_TRACE_WATCHER", "watcher: unchange complete\n");
 	memcpy(repo->index_signature, conn->new_index, 40);
 	/*
 	 * If other connections on this repo are in some sort of
@@ -540,6 +634,13 @@ static void reset_watches(struct repository *repo)
 
 static void reset_repo(struct repository *repo, ino_t inode)
 {
+	if (test_mode)
+		/*
+		 * test-mode is designed for single repo, we can
+		 * safely reset wd counter because all wd should be
+		 * deleted
+		 */
+		wd_counter = 1;
 	reset_watches(repo);
 	string_list_clear(&repo->updated, 0);
 	memcpy(repo->index_signature, invalid_signature, 40);
@@ -560,6 +661,7 @@ static int shutdown_connection(int id)
 	return 0;
 }
 
+static void cleanup(void);
 static int handle_command(int conn_id)
 {
 	int fd = conns[conn_id]->sock;
@@ -754,6 +856,71 @@ static int handle_command(int conn_id)
 		}
 		unchange(conn_id, n);
 	}
+
+	/*
+	 * Testing and debugging support
+	 */
+	else if (!strcmp(msg, "test-mode") && getenv("GIT_TEST_WATCHER")) {
+		test_mode = 1;
+		packet_write(fd, "test mode on");
+	}
+	else if (starts_with(msg, "setenv ")) {
+		/* useful for setting GIT_TRACE_WATCHER or GIT_TRACE_PACKET */
+		char *sep = strchr(msg + 7, ' ');
+		if (!sep) {
+			packet_write(fd, "error invalid setenv line %s", msg);
+			return shutdown_connection(conn_id);
+		}
+		*sep = '\0';
+		setenv(msg + 7, sep + 1, 1);
+	}
+	else if (starts_with(msg, "log ")) {
+		; /* do nothing, if GIT_TRACE_PACKET is on, it's already logged */
+	}
+	else if (!strcmp(msg, "die") && getenv("GIT_TEST_WATCHER")) {
+		/*
+		 * The client will wait for "see you" before it may
+		 * run another daemon with the same path. So there's
+		 * no racing on unlink() and listen() on the same
+		 * socket path.
+		 */
+		cleanup();
+		packet_write(fd, "see you");
+		close(fd);
+		exit(0);
+	}
+	else if (starts_with(msg, "dump ") && getenv("GIT_TEST_WATCHER")) {
+		struct strbuf sb = STRBUF_INIT;
+		struct strbuf out = STRBUF_INIT;
+		const char *reply = NULL;
+		if (!strcmp(msg + 5, "watches")) {
+			if (conns[conn_id]->repo) {
+				if (conns[conn_id]->repo->root)
+					dump_watches(conns[conn_id]->repo->root, &sb, &out);
+			} else {
+				int i;
+				for (i = 0; i < nr_repos; i++) {
+					strbuf_addf(&out, "%s%c", repos[i]->work_tree, '\0');
+					if (repos[i]->root)
+						dump_watches(repos[i]->root, &sb, &out);
+					strbuf_reset(&out);
+					strbuf_reset(&sb);
+				}
+			}
+			reply = "watching";
+		} else if (!strcmp(msg + 5, "changes")) {
+			dump_changes(conns[conn_id]->repo, &out);
+			reply = "changed";
+		}
+		packet_write(fd, "%s %d", reply, (int)out.len);
+		if (out.len)
+			write_in_full(fd, out.buf, out.len);
+		strbuf_release(&out);
+		strbuf_release(&sb);
+	}
+	else if (starts_with(msg, "inotify ") && getenv("GIT_TEST_WATCHER")) {
+		inject_inotify(msg + 8);
+	}
 	else {
 		packet_write(fd, "error unrecognized command %s", msg);
 		return shutdown_connection(conn_id);
@@ -848,11 +1015,13 @@ int main(int argc, const char **argv)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int i, new_nr, fd, quit = 0, nr_common;
-	int daemon = 0;
+	int daemon = 0, check_support = 0;
 	time_t last_checked;
 	struct option options[] = {
 		OPT_BOOL(0, "detach", &daemon,
 			 N_("run in background")),
+		OPT_BOOL(0, "check-support", &check_support,
+			 N_("return zero if file watcher is available")),
 		OPT_END()
 	};
 
@@ -865,6 +1034,10 @@ int main(int argc, const char **argv)
 
 	argc = parse_options(argc, argv, NULL, options,
 			     file_watcher_usage, 0);
+
+	if (check_support)
+		return 0;
+
 	if (argc < 1)
 		die(_("socket path missing"));
 	else if (argc > 1)
diff --git a/t/t7513-file-watcher.sh b/t/t7513-file-watcher.sh
new file mode 100755
index 0000000..bf64fc4
--- /dev/null
+++ b/t/t7513-file-watcher.sh
@@ -0,0 +1,382 @@
+#!/bin/sh
+
+test_description='File watcher daemon tests'
+
+. ./test-lib.sh
+
+if git file-watcher --check-support && test_have_prereq POSIXPERM; then
+	:				# good
+else
+	skip_all="file-watcher not supported on this system"
+	test_done
+fi
+
+kill_it() {
+	test-file-watcher "$1" <<EOF >/dev/null
+<die
+>see you
+EOF
+}
+
+GIT_TEST_WATCHER=1
+export GIT_TEST_WATCHER
+
+test_expect_success 'test-file-watcher can kill the daemon' '
+	chmod 700 . &&
+	git file-watcher --detach . &&
+	cat >expect <<EOF &&
+<die
+>see you
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual &&
+	! test -S socket
+'
+
+test_expect_success 'exchange hello' '
+	git file-watcher --detach . &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<die
+>see you
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'normal index sequence' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>inconsistent
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual &&
+	cat >expect2 <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+# inconsistent again because new-index has not been issued yet
+>inconsistent
+<new-index $SIG
+<<unchange
+<<
+EOF
+	test-file-watcher . >actual2 <expect2 &&
+	test_cmp expect2 actual2 &&
+	cat >expect3 <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>ok
+<die
+>see you
+EOF
+	test-file-watcher . >actual3 <expect3 &&
+	test_cmp expect3 actual3
+'
+
+test_expect_success 'unaccepted index: hello not sent' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<index $SIG $TRASH_DIRECTORY
+>error why did you not greet me? go away
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual &&
+	kill_it .
+'
+
+test_expect_success 'unaccepted index: signature too short' '
+	git file-watcher --detach . &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index 1234 $TRASH_DIRECTORY
+>error invalid index line index 1234 $TRASH_DIRECTORY
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual &&
+	kill_it .
+'
+
+test_expect_success 'unaccepted index: worktree unavailable' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY/non-existent
+>error work tree does not exist: No such file or directory
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual &&
+	kill_it .
+'
+
+test_expect_success 'watch foo and abc/bar' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>inconsistent
+<test-mode
+>test mode on
+<<watch
+<<foo
+<<abc/bar
+<<
+>watched 2
+<dump watches
+>>watching
+>>. 1
+>>abc 2
+>>abc/bar
+>>foo
+<new-index $SIG
+<<unchange
+<<
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'modify abc/bar' '
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>ok
+<inotify 2 IN_MODIFY bar
+<dump watches
+>>watching
+>>. 1
+>>foo
+<dump changes
+>>changed
+>>abc/bar
+<die
+>see you
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'delete abc makes abc/bar changed' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>inconsistent
+<test-mode
+>test mode on
+<<watch
+<<foo/abc/bar
+<<
+>watched 1
+<dump watches
+>>watching
+>>. 1
+>>foo 2
+>>foo/abc 3
+>>foo/abc/bar
+<inotify 2 IN_DELETE_SELF
+<dump watches
+>>watching
+<dump changes
+>>changed
+>>foo/abc/bar
+<new-index $SIG
+<<unchange
+<<
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'get changed list' '
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>ok
+<get-changed
+>>changed
+>>foo/abc/bar
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'incomplete new-index request' '
+	SIG=0123456789012345678901234567890123456789 &&
+	SIG2=9123456789012345678901234567890123456780 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>ok
+<new-index $SIG2
+<dump changes
+>>changed
+>>foo/abc/bar
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'delete abc/bar from changed list' '
+	SIG=0123456789012345678901234567890123456789 &&
+	SIG2=9123456789012345678901234567890123456780 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>ok
+<new-index $SIG2
+<<unchange
+<<foo/abc/bar
+<<
+<dump changes
+>>changed
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'file-watcher index updated after new-index' '
+	SIG2=9123456789012345678901234567890123456780 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG2 $TRASH_DIRECTORY
+>ok
+<die
+>see you
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+# When test-mode is on, file-watch only accepts 8 directories
+test_expect_success 'watch too many directories' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>inconsistent
+# Do not call inotify_add_watch()
+<test-mode
+>test mode on
+# First batch should be all ok
+<<watch
+<<dir1/foo
+<<dir2/foo
+<<dir3/foo
+<<dir4/foo
+<<
+>watched 4
+# Second batch hits the limit
+<<watch
+<<dir5/foo
+<<dir6/foo
+<<dir7/foo
+<<dir8/foo
+<<dir9/foo
+<<
+>watched 3
+# The third batch is already registered, should accept too
+<<watch
+<<dir5/foo
+<<dir6/foo
+<<dir7/foo
+<<
+>watched 3
+# Try again, see if it still rejects
+<<watch
+<<dir8/foo
+<<dir9/foo
+<<
+>watched 0
+<dump watches
+>>watching
+>>. 1
+>>dir1 2
+>>dir1/foo
+>>dir2 3
+>>dir2/foo
+>>dir3 4
+>>dir3/foo
+>>dir4 5
+>>dir4/foo
+>>dir5 6
+>>dir5/foo
+>>dir6 7
+>>dir6/foo
+>>dir7 8
+>>dir7/foo
+<die
+>see you
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'event overflow' '
+	git file-watcher --detach . &&
+	SIG=0123456789012345678901234567890123456789 &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>inconsistent
+<test-mode
+>test mode on
+<<watch
+<<foo
+<<abc/bar
+<<
+>watched 2
+<inotify 2 IN_MODIFY bar
+<dump watches
+>>watching
+>>. 1
+>>foo
+<dump changes
+>>changed
+>>abc/bar
+<inotify -1 IN_Q_OVERFLOW
+<dump watches
+>>watching
+<dump changes
+>>changed
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual &&
+	cat >expect2 <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+# Must be inconsistent because of IN_Q_OVERFLOW
+>inconsistent
+<die
+>see you
+EOF
+	test-file-watcher . >actual2 <expect2 &&
+	test_cmp expect2 actual2
+'
+
+test_done
diff --git a/test-file-watcher.c b/test-file-watcher.c
new file mode 100644
index 0000000..ffff198
--- /dev/null
+++ b/test-file-watcher.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "unix-socket.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+
+int main(int ac, char **av)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct strbuf packed = STRBUF_INIT;
+	char *packing = NULL;
+	int last_command_is_reply = 0;
+	int fd;
+
+	strbuf_addf(&sb, "%s/socket", av[1]);
+	fd = unix_stream_connect(sb.buf);
+	if (fd < 0)
+		die_errno("connect");
+	strbuf_reset(&sb);
+
+	/*
+	 * test-file-watcher crashes sometimes, make sure to flush
+	 */
+	setbuf(stdout, NULL);
+
+	while (!strbuf_getline(&sb, stdin, '\n')) {
+		if (sb.buf[0] == '#') {
+			puts(sb.buf);
+			continue;
+		}
+		if (sb.buf[0] == '>') {
+			if (last_command_is_reply)
+				continue;
+			last_command_is_reply = 1;
+		} else
+			last_command_is_reply = 0;
+
+		if (sb.buf[0] == '<' && sb.buf[1] == '<') {
+			puts(sb.buf);
+			if (!packing) {
+				packing = xstrdup(sb.buf + 2);
+				strbuf_reset(&packed);
+				continue;
+			}
+			if (!sb.buf[2]) {
+				packet_write(fd, "%s %d", packing, (int)packed.len);
+				if (packed.len)
+					write_in_full(fd, packed.buf, packed.len);
+				free(packing);
+				packing = NULL;
+			} else
+				strbuf_add(&packed, sb.buf + 2, sb.len - 2 + 1);
+			continue;
+		}
+		if (sb.buf[0] == '<') {
+			packet_write(fd, "%s", sb.buf + 1);
+			puts(sb.buf);
+			continue;
+		}
+		if (sb.buf[0] == '>' && sb.buf[1] == '>') {
+			int len;
+			char *p, *reply = packet_read_line(fd, &len);
+			if (!starts_with(reply, sb.buf + 2) ||
+			    reply[sb.len - 2] != ' ') {
+				printf(">%s\n", reply);
+				continue;
+			} else {
+				p = reply + sb.len - 2;
+				printf(">>%.*s\n", (int)(p - reply), reply);
+				len = atoi(p + 1);
+				if (!len)
+					continue;
+			}
+			strbuf_reset(&packed);
+			strbuf_grow(&packed, len);
+			if (read_in_full(fd, packed.buf, len) <= 0)
+				return 1;
+			strbuf_setlen(&packed, len);
+			for (p = packed.buf; p - packed.buf < packed.len; p += len + 1) {
+				len = strlen(p);
+				printf(">>%s\n", p);
+			}
+			continue;
+		}
+		if (sb.buf[0] == '>') {
+			int len;
+			char *reply = packet_read_line(fd, &len);
+			if (!reply)
+				puts(">");
+			else
+				printf(">%s\n", reply);
+			continue;
+		}
+		die("unrecognized command %s", sb.buf);
+	}
+	return 0;
+}
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 24/26] ls-files: print CE_WATCHED as W (or "w" with CE_VALID)
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (22 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 23/26] file-watcher: tests for the daemon Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 25/26] file-watcher: tests for the client side Nguyễn Thái Ngọc Duy
                       ` (2 subsequent siblings)
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-ls-files.txt |  1 +
 builtin/ls-files.c             | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index c0856a6..bdb17a5 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -123,6 +123,7 @@ a space) at the start of each line:
 	R::	removed/deleted
 	C::	modified/changed
 	K::	to be killed
+	W::	being watched by `git file-watcher`
 	?::	other
 
 -v::
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e1cf6d8..f1f7c07 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -46,6 +46,7 @@ static const char *tag_killed = "";
 static const char *tag_modified = "";
 static const char *tag_skip_worktree = "";
 static const char *tag_resolve_undo = "";
+static const char *tag_watched = "";
 
 static void write_name(const char *name)
 {
@@ -231,6 +232,7 @@ static void show_files(struct dir_struct *dir)
 	if (show_cached || show_stage) {
 		for (i = 0; i < active_nr; i++) {
 			const struct cache_entry *ce = active_cache[i];
+			const char *tag;
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
 			    !ce_excluded(dir, ce))
 				continue;
@@ -238,8 +240,15 @@ static void show_files(struct dir_struct *dir)
 				continue;
 			if (ce->ce_flags & CE_UPDATE)
 				continue;
-			show_ce_entry(ce_stage(ce) ? tag_unmerged :
-				(ce_skip_worktree(ce) ? tag_skip_worktree : tag_cached), ce);
+			if (ce_stage(ce))
+				tag = tag_unmerged;
+			else if (ce_skip_worktree(ce))
+				tag = tag_skip_worktree;
+			else if (ce->ce_flags & CE_WATCHED)
+				tag = tag_watched;
+			else
+				tag = tag_cached;
+			show_ce_entry(tag, ce);
 		}
 	}
 	if (show_deleted || show_modified) {
@@ -530,6 +539,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		tag_killed = "K ";
 		tag_skip_worktree = "S ";
 		tag_resolve_undo = "U ";
+		tag_watched = "W ";
 	}
 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
 		require_work_tree = 1;
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 25/26] file-watcher: tests for the client side
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (23 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 24/26] ls-files: print CE_WATCHED as W (or "w" with CE_VALID) Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-03  4:29     ` [PATCH v3 26/26] Disable file-watcher with system inotify on some tests Nguyễn Thái Ngọc Duy
  2014-02-08  8:04     ` [PATCH v3 00/26] inotify support Torsten Bögershausen
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Similar to t7513, system inotify is taken out to give us a
better-controlled environment.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 file-watcher-lib.c                   |  21 ++--
 t/t7514-file-watcher-lib.sh (new +x) | 190 +++++++++++++++++++++++++++++++++++
 test-file-watcher.c                  |  15 +++
 3 files changed, 219 insertions(+), 7 deletions(-)
 create mode 100755 t/t7514-file-watcher-lib.sh

diff --git a/file-watcher-lib.c b/file-watcher-lib.c
index 93afb52..8cc3b73 100644
--- a/file-watcher-lib.c
+++ b/file-watcher-lib.c
@@ -23,7 +23,8 @@ static int connect_watcher(const char *path)
 	return fd;
 }
 
-static void reset_watches(struct index_state *istate, int disconnect)
+static void reset_watches(struct index_state *istate, int disconnect,
+			  const char *reason)
 {
 	int i, changed = 0;
 	if (istate->updated_entries) {
@@ -45,6 +46,8 @@ static void reset_watches(struct index_state *istate, int disconnect)
 		close(istate->watcher);
 		istate->watcher = -1;
 	}
+	trace_printf_key("GIT_TRACE_WATCHER", "reset%s: %s\n",
+			 disconnect ? "/disconnect" : "", reason);
 }
 
 static void mark_ce_valid(struct index_state *istate)
@@ -53,15 +56,16 @@ static void mark_ce_valid(struct index_state *istate)
 	char *line, *end;
 	int i, len;
 	unsigned long n;
+	trace_printf_key("GIT_TRACE_WATCHER", "mark_ce_valid\n");
 	if (packet_write_timeout(istate->watcher, WAIT_TIME, "get-changed") <= 0 ||
 	    !(line = packet_read_line_timeout(istate->watcher, WAIT_TIME, &len)) ||
 	    !starts_with(line, "changed ")) {
-		reset_watches(istate, 1);
+		reset_watches(istate, 1, "invalid get-changed response");
 		return;
 	}
 	n = strtoul(line + 8, &end, 10);
 	if (end != line + len) {
-		reset_watches(istate, 1);
+		reset_watches(istate, 1, "invalid get-changed response");
 		return;
 	}
 	if (!n)
@@ -69,7 +73,7 @@ static void mark_ce_valid(struct index_state *istate)
 	strbuf_grow(&sb, n);
 	if (read_in_full_timeout(istate->watcher, sb.buf, n, WAIT_TIME) != n) {
 		strbuf_release(&sb);
-		reset_watches(istate, 1);
+		reset_watches(istate, 1, "invalid get-changed payload");
 		return;
 	}
 	line = sb.buf;
@@ -131,7 +135,7 @@ void open_watcher(struct index_state *istate)
 	char *msg;
 
 	if (!get_git_work_tree()) {
-		reset_watches(istate, 1);
+		reset_watches(istate, 1, "no worktree");
 		return;
 	}
 
@@ -165,10 +169,11 @@ void open_watcher(struct index_state *istate)
 	}
 
 	istate->watcher = connect_watcher(watcher_path);
+	trace_printf_key("GIT_TRACE_WATCHER", "open watcher %d\n", istate->watcher);
 	if (packet_write_timeout(istate->watcher, WAIT_TIME, "hello") <= 0 ||
 	    (msg = packet_read_line_timeout(istate->watcher, WAIT_TIME, NULL)) == NULL ||
 	    strcmp(msg, "hello")) {
-		reset_watches(istate, 1);
+		reset_watches(istate, 1, "invalid hello response");
 		return;
 	}
 
@@ -177,7 +182,7 @@ void open_watcher(struct index_state *istate)
 				 get_git_work_tree()) <= 0 ||
 	    (msg = packet_read_line_timeout(istate->watcher, WAIT_TIME, NULL)) == NULL ||
 	    strcmp(msg, "ok")) {
-		reset_watches(istate, 0);
+		reset_watches(istate, 0, "inconsistent");
 		istate->update_watches = 1;
 		return;
 	}
@@ -265,6 +270,7 @@ void watch_entries(struct index_state *istate)
 			nr++;
 	if (nr < watch_lowerlimit)
 		return;
+	trace_printf_key("GIT_TRACE_WATCHER", "watch %d\n", nr);
 	sorted = xmalloc(sizeof(*sorted) * nr);
 	for (i = nr = 0; i < istate->cache_nr; i++)
 		if (ce_watchable(istate->cache[i], now))
@@ -280,6 +286,7 @@ void close_watcher(struct index_state *istate, const unsigned char *sha1)
 	int len, i, nr;
 	if (istate->watcher <= 0)
 		return;
+	trace_printf_key("GIT_TRACE_WATCHER", "close watcher\n");
 	if (packet_write_timeout(istate->watcher, WAIT_TIME,
 				 "new-index %s", sha1_to_hex(sha1)) <= 0)
 		goto done;
diff --git a/t/t7514-file-watcher-lib.sh b/t/t7514-file-watcher-lib.sh
new file mode 100755
index 0000000..8dabb13
--- /dev/null
+++ b/t/t7514-file-watcher-lib.sh
@@ -0,0 +1,190 @@
+#!/bin/sh
+
+test_description='File watcher daemon and client tests'
+
+. ./test-lib.sh
+
+if git file-watcher --check-support && test_have_prereq POSIXPERM; then
+	: # good
+else
+	skip_all="file-watcher not supported on this system"
+	test_done
+fi
+
+kill_it() {
+	test-file-watcher "$1" <<EOF >/dev/null
+<die
+>see you
+EOF
+}
+
+GIT_TEST_WATCHER=2
+GIT_TEST_WATCHER_PATH="$TRASH_DIRECTORY"
+export GIT_TEST_WATCHER GIT_TEST_WATCHER_PATH
+
+test_expect_success 'setup' '
+	chmod 700 . &&
+	mkdir foo bar &&
+	touch abc foo/def bar/ghi &&
+	git add . &&
+	git file-watcher --detach . &&
+	cat <<EOF >expect &&
+<test-mode
+>test mode on
+EOF
+	test-file-watcher . <expect >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'initial opening sequence' '
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	rm actual &&
+	GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual" git ls-files >/dev/null &&
+	cat <<EOF >expect &&
+packet:          git> hello
+packet:          git< hello
+packet:          git> index $SIG $TRASH_DIRECTORY
+packet:          git< inconsistent
+packet:          git> watch 20
+packet:          git< watched 3
+EOF
+	test_cmp expect actual &&
+
+	# second time gives the same result because file-watch has not
+	# received new-index
+	GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual2" git ls-files >/dev/null &&
+	test_cmp expect actual2
+'
+
+test_expect_success 'full sequence' '
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	rm actual &&
+	GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual" git status >/dev/null &&
+	SIG2=`test-file-watcher --index-signature .git/index` &&
+	cat <<EOF >expect &&
+packet:          git> hello
+packet:          git< hello
+packet:          git> index $SIG $TRASH_DIRECTORY
+packet:          git< inconsistent
+packet:          git> watch 20
+packet:          git< watched 3
+packet:          git> new-index $SIG2
+packet:          git> unchange 0
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'full sequence after file-watcher is active' '
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	rm actual &&
+	GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual" git ls-files -v >paths.actual &&
+	cat <<EOF >expect &&
+packet:          git> hello
+packet:          git< hello
+packet:          git> index $SIG $TRASH_DIRECTORY
+packet:          git< ok
+packet:          git> get-changed
+packet:          git< changed 0
+EOF
+	test_cmp expect actual &&
+	cat <<EOF >paths.expect &&
+w abc
+w bar/ghi
+w foo/def
+EOF
+	test_cmp paths.expect paths.actual
+'
+
+test_expect_success 'inject a file change' '
+	echo modified >bar/ghi &&
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	cat >expect <<EOF &&
+<hello
+>hello
+<index $SIG $TRASH_DIRECTORY
+>ok
+<inotify 2 IN_MODIFY ghi
+<dump changes
+>>changed
+>>bar/ghi
+EOF
+	test-file-watcher . >actual <expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git obtains the changes' '
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	rm actual &&
+	GIT_TEST_WATCHER=1 GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual" git ls-files -v >paths.actual &&
+	cat <<EOF >expect &&
+packet:          git> hello
+packet:          git< hello
+packet:          git> index $SIG $TRASH_DIRECTORY
+packet:          git< ok
+packet:          git> get-changed
+packet:          git< changed 8
+EOF
+	test_cmp expect actual &&
+	cat <<EOF >paths.expect &&
+w abc
+H bar/ghi
+w foo/def
+EOF
+	test_cmp paths.expect paths.actual
+'
+
+test_expect_success 'sync file-watcher after index update' '
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	rm actual &&
+	GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual" git status --porcelain | grep -vF "??" >paths.actual &&
+	SIG2=`test-file-watcher --index-signature .git/index` &&
+	cat <<EOF >expect &&
+packet:          git> hello
+packet:          git< hello
+packet:          git> index $SIG $TRASH_DIRECTORY
+packet:          git< ok
+packet:          git> get-changed
+packet:          git< changed 8
+packet:          git> watch 8
+packet:          git< watched 1
+packet:          git> new-index $SIG2
+packet:          git> unchange 8
+EOF
+	test_cmp expect actual &&
+	cat <<EOF >paths.expect &&
+A  abc
+AM bar/ghi
+A  foo/def
+EOF
+	test_cmp paths.expect paths.actual
+'
+
+test_expect_success 'make sure file-watcher cleans its changed list' '
+	SIG=`test-file-watcher --index-signature .git/index` &&
+	rm actual &&
+	GIT_TEST_WATCHER=1 GIT_TRACE_PACKET="$TRASH_DIRECTORY/actual" git ls-files -v >paths.actual &&
+	cat <<EOF >expect &&
+packet:          git> hello
+packet:          git< hello
+packet:          git> index $SIG $TRASH_DIRECTORY
+packet:          git< ok
+packet:          git> get-changed
+packet:          git< changed 0
+EOF
+	test_cmp expect actual &&
+	cat <<EOF >paths.expect &&
+w abc
+H bar/ghi
+w foo/def
+EOF
+	test_cmp paths.expect paths.actual
+'
+
+test_expect_success 'closing the daemon' '
+	test-file-watcher . <<EOF >/dev/null
+<die
+>see you
+EOF
+'
+
+test_done
diff --git a/test-file-watcher.c b/test-file-watcher.c
index ffff198..77037e1 100644
--- a/test-file-watcher.c
+++ b/test-file-watcher.c
@@ -11,6 +11,21 @@ int main(int ac, char **av)
 	int last_command_is_reply = 0;
 	int fd;
 
+	if (!strcmp(av[1], "--index-signature")) {
+		struct stat st;
+		void *mmap;
+		if (lstat(av[2], &st))
+			die_errno("lstat");
+		fd = open(av[2], O_RDONLY);
+		if (fd < 0)
+			die_errno("open");
+		mmap = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+		if (mmap == MAP_FAILED)
+			die_errno("mmap");
+		printf("%s\n", sha1_to_hex((unsigned char*)mmap + st.st_size - 20));
+		return 0;
+	}
+
 	strbuf_addf(&sb, "%s/socket", av[1]);
 	fd = unix_stream_connect(sb.buf);
 	if (fd < 0)
-- 
1.8.5.2.240.g8478abd


* [PATCH v3 26/26] Disable file-watcher with system inotify on some tests
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (24 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 25/26] file-watcher: tests for the client side Nguyễn Thái Ngọc Duy
@ 2014-02-03  4:29     ` Nguyễn Thái Ngọc Duy
  2014-02-08  8:04     ` [PATCH v3 00/26] inotify support Torsten Bögershausen
  26 siblings, 0 replies; 72+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-02-03  4:29 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

When file watcher is active via GIT_TEST_WATCHER_PATH, `ls-files -v`
output may be different (e.g. "H" becomes "W"). Disable file watcher
in those cases.

We could make ls-files turn 'W' back to 'H' for these cases, but I am
not sure it's worth the effort. The intention of running with
GIT_TEST_WATCHER_PATH is to exercise file watcher in somewhat real
scenarios and the remaining tests are more than enough for that.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t1011-read-tree-sparse-checkout.sh  | 2 ++
 t/t2104-update-index-skip-worktree.sh | 2 ++
 t/t7011-skip-worktree-reading.sh      | 2 ++
 t/t7012-skip-worktree-writing.sh      | 2 ++
 4 files changed, 8 insertions(+)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 0c74bee..00e92bc 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -14,6 +14,8 @@ test_description='sparse checkout tests
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-read-tree.sh
 
+unset GIT_TEST_WATCHER_PATH
+
 test_expect_success 'setup' '
 	cat >expected <<-\EOF &&
 	100644 77f0ba1734ed79d12881f81b36ee134de6a3327b 0	init.t
diff --git a/t/t2104-update-index-skip-worktree.sh b/t/t2104-update-index-skip-worktree.sh
index 1d0879b..2b93c58 100755
--- a/t/t2104-update-index-skip-worktree.sh
+++ b/t/t2104-update-index-skip-worktree.sh
@@ -7,6 +7,8 @@ test_description='skip-worktree bit test'
 
 . ./test-lib.sh
 
+unset GIT_TEST_WATCHER_PATH
+
 cat >expect.full <<EOF
 H 1
 H 2
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 88d60c1..2242718 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -7,6 +7,8 @@ test_description='skip-worktree bit test'
 
 . ./test-lib.sh
 
+unset GIT_TEST_WATCHER_PATH
+
 cat >expect.full <<EOF
 H 1
 H 2
diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
index 9ceaa40..c8e0eb5 100755
--- a/t/t7012-skip-worktree-writing.sh
+++ b/t/t7012-skip-worktree-writing.sh
@@ -7,6 +7,8 @@ test_description='test worktree writing operations when skip-worktree is used'
 
 . ./test-lib.sh
 
+unset GIT_TEST_WATCHER_PATH
+
 test_expect_success 'setup' '
 	test_commit init &&
 	echo modified >> init.t &&
-- 
1.8.5.2.240.g8478abd


* Re: [PATCH v3 00/26] inotify support
  2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
                       ` (25 preceding siblings ...)
  2014-02-03  4:29     ` [PATCH v3 26/26] Disable file-watcher with system inotify on some tests Nguyễn Thái Ngọc Duy
@ 2014-02-08  8:04     ` Torsten Bögershausen
  2014-02-08  8:53       ` Duy Nguyen
  26 siblings, 1 reply; 72+ messages in thread
From: Torsten Bögershausen @ 2014-02-08  8:04 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git, Torsten Bögershausen

On 03.02.14 05:28, Nguyễn Thái Ngọc Duy wrote:

I managed to review the code 0..12/26, so some parts are missing.
The list below became longer than what I intended,
my comments may be hard to read,
and there is a mixture of minor and major remarks.

I would appreciate it if we could have an outline of the protocol
as a separate "document" somewhere, so that one can look at the protocol
first, before looking into the code.

(Am I using Wireshark too much to dream about a dissector?)

All in all I like the concept, thanks for the work.


1)
  write_in_full_timeout()
  packet_read_line_timeout()
  At other places we handle EINTR after calling poll().
  Looking at the code, it could be easier to introduce
  a new function xpoll() in wrapper.c, and use that instead
  of poll().

2)
  Similar for the usage of accept().
  I like the idea of xread(), xwrite() and all the x functions,
  so it could make sense to establish xpoll() and xaccept()
  before inotify support.
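
  A minimal sketch of what such wrappers could look like, assuming they
  only need to retry when the call is interrupted by a signal. The names
  xpoll() and xaccept() are the hypothetical ones proposed in this review,
  and the parameter name timeout_ms is an assumption; in this simple form
  the timeout starts over on every retry.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical xpoll(): like poll(), but restarts the call when it is
 * interrupted by a signal (EINTR). The full timeout is applied again
 * on each retry; a stricter version would track the remaining time.
 */
int xpoll(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int ret;

	do {
		ret = poll(fds, nfds, timeout_ms);
	} while (ret < 0 && errno == EINTR);
	return ret;
}

/*
 * Hypothetical xaccept(): like accept(), but restarts the call when it
 * is interrupted by a signal (EINTR). Other errors are returned as-is.
 */
int xaccept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
{
	int fd;

	do {
		fd = accept(sockfd, addr, addrlen);
	} while (fd < 0 && errno == EINTR);
	return fd;
}
```

  Callers would then no longer need their own EINTR loops around poll()
  and accept(); every other error is still theirs to handle.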


3)
> -int unix_stream_listen(const char *path)
> +int unix_stream_listen(const char *path, int replace)
>  {
>  	int fd, saved_errno;
>  	struct sockaddr_un sa;
> @@ -103,7 +103,8 @@ int unix_stream_listen(const char *path)
>  		return -1;
>  	fd = unix_stream_socket();
>  
> -	unlink(path);
> +	if (replace)
> +		unlink(path);

Minor remark:
As we do not do the replace here:
s/replace/un_link/ maybe?


4)
> +++ b/file-watcher.c
{}
> +static const char *const file_watcher_usage[] = {
> +	N_("git file-watcher [options] <socket directory>"),
> +	NULL
> +};
Do we already have options here?
I can think of having
-d to daemonize
-u to use a Unix domain socket
(And later -t to use a TCP/IP socket, when the repo
 is on a mounted NFS (or SAMBA) drive, and the daemon is on a
 different machine.
 I am not saying this patch should include this logic in the first round,
 but I can see a gain for this kind of setup.)


5)
> +++ b/file-watcher.c
[]
> +static int shutdown_connection(int id)
> +{
> +	struct connection *conn = conns[id];
> +	conns[id] = NULL;
> +	pfd[id].fd = -1;
> +	if (!conn)
> +		return 0;
> +	close(conn->sock);
> +	free(conn);
> +	return 0;
> +}
The function is called shutdown_connection(), but it does a close().
Could it be named close_connection()?

6) 
> +++ b/file-watcher.c
[]
Do we have a sequence chart about the command flow between the watcher
daemon and the client ?

------------
7)

in 03/26:
>This version is also gentler than its friend packet_read_line()
gentler, what does this mean?

>because it's designed for side channel I/O that should not abort the
>program even if the channel is broken.
I'm not so familiar with side-channel I/O. How does it fit in here?

Would this wording make sense:
In contrast to packet_read_line(), which can call die()
to abort the program, read_in_full_timeout() will keep the program running.
(or something like this)

>
>Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>---
> cache.h    |  1 +
> pkt-line.c | 35 +++++++++++++++++++++++++++++++++++
> pkt-line.h |  1 +
> wrapper.c  | 21 +++++++++++++++++++++
> 4 files changed, 58 insertions(+)
>
>diff --git a/cache.h b/cache.h
>index 718e32b..939db46 100644
>--- a/cache.h
>+++ b/cache.h
>@@ -1230,6 +1230,7 @@ extern int write_or_whine_pipe(int fd, const void *buf, size_t count, const char
> extern void fsync_or_die(int fd, const char *);
> 
> extern ssize_t read_in_full(int fd, void *buf, size_t count);
>+extern ssize_t read_in_full_timeout(int fd, void *buf, size_t count, int timeout);
> extern ssize_t write_in_full(int fd, const void *buf, size_t count);
> extern ssize_t write_in_full_timeout(int fd, const void *buf, size_t count, int timeout);
> static inline ssize_t write_str_in_full(int fd, const char *str)
>diff --git a/pkt-line.c b/pkt-line.c
>index cf681e9..5a07e97 100644
>--- a/pkt-line.c
>+++ b/pkt-line.c
>@@ -229,3 +229,38 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
> {
> 	return packet_read_line_generic(-1, src, src_len, dst_len);
> }
>+

In what unit is the timeout measured?
Seconds, milli years?
As we use poll(), I think it is milliseconds.
(I like the idea of naming timeout timeout_ms)

[]
>+	len -= 4;
>+	if (len >= buf_len) {
>+		error("protocol error: bad line length %d", len);
>+		return NULL;
>+	}
>+	if ((ret = read_in_full_timeout(fd, buf, len, timeout)) < 0)
>+		return NULL;
Do we want a packet_trace here?

When a timeout occurs, do we want to close the connection,
marking it as dead?
Or need to look at errno?


>+	buf[len] = '\0';
>+	if (len_p)
>+		*len_p = len;
>+	packet_trace(buf, len, 0);
>+	return buf;
>+}

>diff --git a/pkt-line.h b/pkt-line.h
>index 4b93a0c..d47dca5 100644
>--- a/pkt-line.h
>+++ b/pkt-line.h
>@@ -69,6 +69,7 @@ int packet_read(int fd, char **src_buffer, size_t *src_len, char
>  * packet is written to it.
>  */
> char *packet_read_line(int fd, int *size);
>+char *packet_read_line_timeout(int fd, int timeout, int *size);
> 
> /*
>  * Same as packet_read_line, but read from a buf rather than a descriptor;
>diff --git a/wrapper.c b/wrapper.c
>index 9a0e289..9cf10b2 100644
>--- a/wrapper.c
>+++ b/wrapper.c
>@@ -193,6 +193,27 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
> 	return total;
> }
> 
>+ssize_t read_in_full_timeout(int fd, void *buf, size_t count, int timeout)
>+{
>+	char *p = buf;
>+	ssize_t total = 0;
>+	struct pollfd pfd;
>+
>+	pfd.fd = fd;
>+	pfd.events = POLLIN;
>+	while (count > 0 && poll(&pfd, 1, timeout) > 0 &&
>+	       (pfd.revents & POLLIN)) {
>+		ssize_t loaded = xread(fd, p, count);
>+		if (loaded <= 0)
>+			return -1;
>+		count -= loaded;
>+		p += loaded;
>+		total += loaded;
>+	}
>+
>+	return count ? -1 : total;
Isn't it that 
ret < 0  means "error of some kind"
ret == 0 means EOF,
ret > 0  means some data
Why do we turn 0 into -1?

--------------
8)
+++ b/unix-socket.c
@@ -93,7 +93,7 @@ fail:
 	return -1;
 }
 
-int unix_stream_listen(const char *path)
+int unix_stream_listen(const char *path, int replace)
 {
 	int fd, saved_errno;
 	struct sockaddr_un sa;
@@ -103,7 +103,8 @@ int unix_stream_listen(const char *path)
 		return -1;
 	fd = unix_stream_socket();
 
-	unlink(path);
+	if (replace)
+		unlink(path);
 	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
Why do we call the parameter replace, when it does an
unlink() ?
s/replace/un_link/ ? 


9)
>Access to the unix socket $WATCHER/socket is covered by $WATCHER's
>permission. While the file watcher does not carry much information,
>repo file listing is sometimes not something you want other users to
>see. Make sure $WATCHER has 0700 permission to stop unwanted access.

>Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>---
> file-watcher.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)

I feel a little bit unsure about the 700.
Most often Git does not care about permissions,
and relies on umask being set appropriately.
(Please correct me if I'm wrong)

My spontaneous feeling is that adjust_shared_perm() could be used.

10)
Another thing:
>strbuf_addf(&sb, "%s/socket", socket_path);
Does it make sense to name the socket "%s/watcher" ?


11)
>In daemon mode, stdout and stderr are saved in $WATCHER/log.

[]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-file-watcher.txt |  2 ++
 cache.h                            |  1 +
 daemon.c                           | 30 ++++--------------------------
 file-watcher.c                     | 17 +++++++++++++++++
 setup.c                            | 25 +++++++++++++++++++++++++
 5 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
index 625a389..ec81f18 100644
--- a/Documentation/git-file-watcher.txt
+++ b/Documentation/git-file-watcher.txt
@@ -18,6 +18,8 @@ lstat(2) to detect that itself.
 
 OPTIONS
 -------
+--detach::
+	Run in background.

Shouldn't that be named --daemon ?


12)
>read_cache() connects to the file watcher, specified by
>filewatcher.path config, and performs basic hand shaking. CE_WATCHED
>is cleared if git and file watcher have different views on the index
>state.
>
>All send/receive calls must be complete within a limited time to avoid
>a buggy file-watcher hang "git status" forever. And the whole point of
>doing this is speed. If file watcher can't respond fast enough, for
>whatever reason, then it should not be used.
>
>Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>---
> Documentation/config.txt           |  10 +++
> Documentation/git-file-watcher.txt |   4 +-
> Makefile                           |   1 +
> cache.h                            |   1 +
> file-watcher-lib.c (new)           |  91 ++++++++++++++++++++++
> file-watcher-lib.h (new)           |   6 ++
> file-watcher.c                     | 152 ++++++++++++++++++++++++++++++++++++-
> read-cache.c                       |   6 ++
> 8 files changed, 269 insertions(+), 2 deletions(-)
> create mode 100644 file-watcher-lib.c
> create mode 100644 file-watcher-lib.h
>
>diff --git a/Documentation/config.txt b/Documentation/config.txt
>index 5f4d793..6ad653a 100644
>--- a/Documentation/config.txt
>+++ b/Documentation/config.txt
>@@ -1042,6 +1042,16 @@ difftool.<tool>.cmd::
> difftool.prompt::
> 	Prompt before each invocation of the diff tool.
> 
>+filewatcher.path::
>+	The directory that contains the socket of `git	file-watcher`.
>+	If it's not an absolute path, it's relative to $GIT_DIR. An
>+	empty path means no connection to file watcher.
I would really like to be able to have different transport mechanisms
using the same config.
Today this is only a matter of documentation and naming things:

How about this idea:
filewatcher.url::
The directory that contains the socket of `git file-watcher`.
It can be an absolute path, meaning a unix domain socket in the
file system.
It can be a path relative to $GIT_DIR.

It must start with either
"/" (absolute path) or
"./" (path relative to $GIT_DIR) or
"~/" (path relative to your $HOME directory).
An empty url means no connection to file watcher.
(Later we can add url schemes like "tcp://" or "pipe://")


>+
>+filewatcher.timeout::
>+	This is the maximum time in milliseconds that Git waits for
>+	the file watcher to respond before giving up. Default value is
>+	50. Setting to -1 makes Git wait forever.
50 feels low, especially on "older/slower machines"
200 is probably acceptable even for the very impatient user,
I could think of 500 to make the user aware of it.

[]
>+	/*
>+	 * ">" denotes an incoming packet, "<" outgoing. The lack of
>+	 * "<" means no reply expected.
>+	 *
>+	 * < "error" SP ERROR-STRING
>+	 *
>+	 * This can be sent whenever the client violates the protocol.
>+	 */
>+
>+	msg = packet_read_line(fd, &len);
>+	if (!msg) {
>+		packet_write(fd, "error invalid pkt-line");
>+		return shutdown_connection(conn_id);
>+	}
>+
>+	/*
>+	 * > "hello" [SP CAP [SP CAP..]]
>+	 * < "hello" [SP CAP [SP CAP..]]
>+	 *
>+	 * Advertise capabilities of both sides. File watcher may
>+	 * disconnect if the client does not advertise the required
>+	 * capabilities. Capabilities in uppercase MUST be
>+	 * supported. If any side does not understand any of the
>+	 * advertised uppercase capabilities, it must disconnect.
>+	 */
>+	if ((arg = skip_prefix(msg, "hello"))) {
>+		if (*arg) {	/* no capabilities supported yet */
>+			packet_write(fd, "error capabilities not supported");
>+			return shutdown_connection(conn_id);
>+		}
>+		packet_write(fd, "hello");
>+		conns[conn_id]->polite = 1;
>+	}
>+
>+	/*
>+	 * > "index" SP INDEX-SIGNATURE SP WORK-TREE-PATH
>+	 * < "ok" | "inconsistent"
>+	 *
>+	 * INDEX-SIGNATURE consists of 40 hexadecimal letters
>+	 * WORK-TREE-PATH must be absolute and normalized path
>+	 *
>+	 * Watch file changes in index. The client sends the index and
>+	 * work tree info. File watcher validates that it holds the
>+	 * same info. If so it sends "ok" back indicating both sides
>+	 * are on the same page and CE_WATCHED bits can be kept.
>+	 *
>+	 * Otherwise it sends "inconsistent" and both sides must reset
>+	 * back to initial state. File watcher keeps its index
>+	 * signature all-zero until the client has updated the index
>+	 * ondisk and request to update index signature.
>+	 *
>+	 * "hello" must be exchanged first. After this command the
>+	 * connection is associated with a worktree/index. Many
>+	 * commands may require this to proceed.
>+	 */
>+	else if (starts_with(msg, "index ")) {
>+		struct repository *repo;
>+		struct stat st;
>+		if (!conns[conn_id]->polite) {
>+			packet_write(fd, "error why did you not greet me? go away");
>+			return shutdown_connection(conn_id);
>+		}
>+		if (len < 47 || msg[46] != ' ' || !is_absolute_path(msg + 47)) {
>+			packet_write(fd, "error invalid index line %s", msg);
>+			return shutdown_connection(conn_id);
>+		}
>+
>+		if (lstat(msg + 47, &st) || !S_ISDIR(st.st_mode)) {
>+			packet_write(fd, "error work tree does not exist: %s",
>+				     strerror(errno));
>+			return shutdown_connection(conn_id);
>+		}
>+		repo = get_repo(msg + 47);
>+		conns[conn_id]->repo = repo;
>+		if (memcmp(msg + 6, repo->index_signature, 40) ||
>+		    !memcmp(msg + 6, invalid_signature, 40) ||
>+		    repo->inode != st.st_ino) {
>+			packet_write(fd, "inconsistent");
>+			reset_repo(repo, st.st_ino);
>+			return 0;
>+		}
>+		packet_write(fd, "ok");
>+	}
>+	else {
>+		packet_write(fd, "error unrecognized command %s", msg);
>+		return shutdown_connection(conn_id);
I feel a little bit unsure about this.
We establish a protocol which is not extendable.
Do we need to call shutdown_connection() ?
(And as I noticed earlier, close_connection() could be a better name)


* Re: [PATCH v3 00/26] inotify support
  2014-02-08  8:04     ` [PATCH v3 00/26] inotify support Torsten Bögershausen
@ 2014-02-08  8:53       ` Duy Nguyen
  2014-02-09 20:19         ` Torsten Bögershausen
  0 siblings, 1 reply; 72+ messages in thread
From: Duy Nguyen @ 2014-02-08  8:53 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

Thanks for the comments. I can see I now have some work to do in the
coming weeks :)

On Sat, Feb 8, 2014 at 3:04 PM, Torsten Bögershausen <tboegi@web.de> wrote:
> I would appreciate if we could have an outline of the protocol
> as a separate "document" somewhere, to be able to have a look at the protocol
> first, before looking into the code.

My fear is that the document becomes outdated because people don't always
remember to update docs when they change code. That is why I embedded the
protocol as comments in the handle_command() function. If the final
version [1] is still not easy to read, I'll split the protocol out
into a separate document.

[1] https://github.com/pclouds/git/blob/file-watcher/file-watcher.c#L672

> (Am I using wireshark too much to dream about a dissector ?)

Try GIT_TRACE_PACKET=2

> 1)
>   write_in_full_timeout()
>   packet_read_line_timeout()
>   At other places we handle EINTR after calling poll().
>   Looking at the code, it could be easier to introduce
>   a new function xpoll() in wrapper.c, and use that instead
>   of poll().

Yeah there are already 4 poll call sites before file-watcher jumps in. Will do.
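
For reference, an xpoll() along those lines could look roughly like the
sketch below. It only illustrates the retry-on-EINTR pattern that the
other x*() wrappers follow and is not code from this series. The name
and signature are assumptions, and each retry waits for the full timeout
again, which a real implementation may want to refine.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical xpoll(): behaves like poll(), but the call is retried
 * when a signal interrupts it. Each retry waits for the full
 * timeout_ms again (a simplification).
 */
int xpoll(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int ret;

	do {
		ret = poll(fds, nfds, timeout_ms);
	} while (ret < 0 && errno == EINTR);
	return ret;
}
```

Callers would then only have to distinguish "ready" (> 0), "timed out"
(0) and "real error" (< 0).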

> 2)
>   Similar for the usage of accept().
>   I like the idea of xread() xwrite() and all the x functions
>   so it could make sense to establish xpoll() and xaccept()
>   before inotify support.

OK

> 3)
>> -int unix_stream_listen(const char *path)
>> +int unix_stream_listen(const char *path, int replace)
>>  {
>>       int fd, saved_errno;
>>       struct sockaddr_un sa;
>> @@ -103,7 +103,8 @@ int unix_stream_listen(const char *path)
>>               return -1;
>>       fd = unix_stream_socket();
>>
>> -     unlink(path);
>> +     if (replace)
>> +             unlink(path);
>
> Minor remark:
> As we do not do the replace here:
> s/replace/un_link/ may be ?

Heh, I thought of using the name "unlink" but it's taken so I chose
"replace" and did not think of an underscore. Will do.
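
To make the effect of that flag concrete, here is a standalone sketch of
a listen helper with an optional unlink step. It is not the
unix_stream_listen() from the patch: the function name
listen_unix_stream() and the backlog value are assumptions for
illustration only.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Sketch only: bind and listen on a unix stream socket at "path".
 * If unlink_first is non-zero, a stale socket file is removed before
 * bind(); otherwise an existing file makes bind() fail.
 * Returns the listening fd, or -1 on error.
 */
int listen_unix_stream(const char *path, int unlink_first)
{
	struct sockaddr_un sa;
	int fd;

	if (strlen(path) >= sizeof(sa.sun_path))
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	strcpy(sa.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (unlink_first)
		unlink(path);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    listen(fd, 5) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With unlink_first set to 0, a second daemon started on the same path
fails instead of silently taking over the socket.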

> 4)
>> +++ b/file-watcher.c
> {}
>> +static const char *const file_watcher_usage[] = {
>> +     N_("git file-watcher [options] <socket directory>"),
>> +     NULL
>> +};
> Do we already have options here?
> I can think of having
> -d daemonize
> -u use a Unix domain socket
> (And later -t to use a TCP/IP socket, when the repo
>  is on a mounted NFS (or Samba) drive and the daemon is on a
>  different machine.
>  I'm not saying this patch should include this logic in the first round,
>  but I can see a gain for this kind of setup.)

Later on we have two, --detach (i.e. daemonize, I reuse the same name
from git-daemon) and --check-support. Transport settings (like unix vs
tcp/ip...) should be config options though. You don't want to specify
them here, then again when you run "git status". Actually I should add
--default, which will retrieve <socket directory> from the config key
filewatcher.path, so the user does not have to specify it.

> 5)
>> +++ b/file-watcher.c
> []
>> +static int shutdown_connection(int id)
>> +{
>> +     struct connection *conn = conns[id];
>> +     conns[id] = NULL;
>> +     pfd[id].fd = -1;
>> +     if (!conn)
>> +             return 0;
>> +     close(conn->sock);
>> +     free(conn);
>> +     return 0;
>> +}
> The function is called shutdown_connection(), but it does a close()
> Could it be named close_connection() ?

Yes, there was a close_connection() which did something similar, and
then it was killed off. Will rename.

> 6)
>> +++ b/file-watcher.c
> []
> Do we have a sequence chart about the command flow between the watcher
> daemon and the client ?

I suggest you have a look at the file-watcher test. For example, from
[2] we have this sequence

<hello
>hello
<index $SIG $TRASH_DIRECTORY
>inconsistent
# Do not call inotify_add_watch()
<test-mode
>test mode on
# First batch should be all ok
<<watch
<<dir1/foo
<<dir2/foo
<<dir3/foo
<<dir4/foo
...

"<" denotes a packet from the client to file-watcher, ">" the opposite
direction. But you can always obtain a real flow with
GIT_TRACE_PACKET=2 (path lists not available though, so really just a
flow).

[2] https://github.com/pclouds/git/blob/file-watcher/t/t7513-file-watcher.sh#L273
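
Each line in that flow travels as one pkt-line: four hex digits giving
the total length (including those four bytes), followed by the payload.
A minimal sketch of the framing, assuming no trailing newline is added;
pkt_line_format() is a made-up helper name, not a function from Git:

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustration only: frame "payload" as a pkt-line in "out".
 * Returns the total packet length, or -1 if it does not fit.
 * Four hex digits cap the length at 0xffff; real implementations
 * may impose a lower limit.
 */
int pkt_line_format(char *out, size_t out_size, const char *payload)
{
	size_t total = strlen(payload) + 4;

	if (total > 0xffff || total + 1 > out_size)
		return -1;
	snprintf(out, out_size, "%04x%s", (unsigned int)total, payload);
	return (int)total;
}
```

Under that assumption "hello" goes over the wire as "0009hello".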

> in 03/26:
>>This version is also gentler than its friend packet_read_line()
> gentler, what does this mean?

No dying, whatever the error. packet_read_line is designed for the git
protocol. That is its main job, so if there's an error, dying is the right
thing to do. file-watcher on the other hand is a side job and should
not stop the running command from doing its work. Will update commit message.

>>because it's designed for side channel I/O that should not abort the
>>program even if the channel is broken.
> I'm not so familiar with side-channel I/O. How does it fit in here?

To sum up, whatever error in communication with file-watcher should
not stop you from doing whatever you're doing. file-watcher is
contacted whenever $GIT_DIR/index is read, so "whatever you're doing"
is basically all git commands that involve worktree or index.

> Does this make sense:
> In contrast to packet_read_line(), which can call die()
> to abort the program, read_in_full_timeout() will keep the program running.
> (or something like this)

Exactly!

>>diff --git a/cache.h b/cache.h
>>index 718e32b..939db46 100644
>>--- a/cache.h
>>+++ b/cache.h
>>@@ -1230,6 +1230,7 @@ extern int write_or_whine_pipe(int fd, const void *buf, size_t count, const char
>> extern void fsync_or_die(int fd, const char *);
>>
>> extern ssize_t read_in_full(int fd, void *buf, size_t count);
>>+extern ssize_t read_in_full_timeout(int fd, void *buf, size_t count, int timeout);
>> extern ssize_t write_in_full(int fd, const void *buf, size_t count);
>> extern ssize_t write_in_full_timeout(int fd, const void *buf, size_t count, int timeout);
>> static inline ssize_t write_str_in_full(int fd, const char *str)
>>diff --git a/pkt-line.c b/pkt-line.c
>>index cf681e9..5a07e97 100644
>>--- a/pkt-line.c
>>+++ b/pkt-line.c
>>@@ -229,3 +229,38 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
>> {
>>       return packet_read_line_generic(-1, src, src_len, dst_len);
>> }
>>+
>
> In what unit is the timeout measured?
> Seconds, milli years?
> As we use poll(), I think it is milliseconds.
> (I like the idea of naming timeout timeout_ms)

Yes in ms. Will rename.

>>+      len -= 4;
>>+      if (len >= buf_len) {
>>+              error("protocol error: bad line length %d", len);
>>+              return NULL;
>>+      }
>>+      if ((ret = read_in_full_timeout(fd, buf, len, timeout)) < 0)
>>+              return NULL;
> Do we want a packet_trace here?

Compared to packet_read_line(), probably not.

> When a timeout occurs, do we want to close the connection,
> marking it as dead?
> Or need to look at errno?

We do want to close the connection, but the caller should do that and
clear the file handle variable at the same time. Otherwise we risk a
race condition.
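
A sketch of what "close and clear at the same time" could mean on the
caller side. The struct and function names are hypothetical and only
serve the illustration:

```c
#include <unistd.h>

/* Hypothetical client-side state; the field name is an assumption. */
struct watcher_client {
	int fd;		/* -1 when not connected */
};

/*
 * Close the connection and forget the descriptor in one place, so no
 * later code path can reuse a stale fd number. Safe to call twice.
 */
void watcher_disconnect(struct watcher_client *w)
{
	if (w->fd >= 0)
		close(w->fd);
	w->fd = -1;
}
```

After a timeout or a NULL return from the read helper, the caller would
call this once and carry on without the watcher.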

>>diff --git a/wrapper.c b/wrapper.c
>>index 9a0e289..9cf10b2 100644
>>--- a/wrapper.c
>>+++ b/wrapper.c
>>@@ -193,6 +193,27 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>>       return total;
>> }
>>
>>+ssize_t read_in_full_timeout(int fd, void *buf, size_t count, int timeout)
>>+{
>>+      char *p = buf;
>>+      ssize_t total = 0;
>>+      struct pollfd pfd;
>>+
>>+      pfd.fd = fd;
>>+      pfd.events = POLLIN;
>>+      while (count > 0 && poll(&pfd, 1, timeout) > 0 &&
>>+             (pfd.revents & POLLIN)) {
>>+              ssize_t loaded = xread(fd, p, count);
>>+              if (loaded <= 0)
>>+                      return -1;
>>+              count -= loaded;
>>+              p += loaded;
>>+              total += loaded;
>>+      }
>>+
>>+      return count ? -1 : total;
> Isn't it that
> ret < 0  means "error of some kind"
> ret == 0 means EOF,
> ret > 0  means some data
> Why do we turn 0 into -1?

Because this is read_in_full. The caller expects either that everything
is read, or an error.
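
The "all or nothing" contract can be sketched in isolation as below.
This is a simplified stand-in for the patch's function, using plain
read() instead of xread(); the name read_all_or_fail() is an assumption.
A timeout, an early EOF and a read error all collapse into -1, because
in each case the caller did not get the bytes it asked for.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch of "all or nothing" reading with a per-wait timeout.
 * Returns count when every byte arrived, -1 on timeout, EOF or error.
 */
ssize_t read_all_or_fail(int fd, void *buf, size_t count, int timeout_ms)
{
	char *p = buf;
	size_t left = count;
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN;
	while (left > 0) {
		ssize_t n;
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready < 0 && errno == EINTR)
			continue;
		if (ready <= 0)
			return -1;	/* timeout or poll error */
		n = read(fd, p, left);
		if (n < 0 && (errno == EINTR || errno == EAGAIN))
			continue;
		if (n <= 0)
			return -1;	/* EOF before "count" bytes, or error */
		p += n;
		left -= n;
	}
	return (ssize_t)count;
}
```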

> 9)
>>Access to the unix socket $WATCHER/socket is covered by $WATCHER's
>>permission. While the file watcher does not carry much information,
>>repo file listing is sometimes not something you want other users to
>>see. Make sure $WATCHER has 0700 permission to stop unwanted access.
>
>>Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>>---
>> file-watcher.c | 32 ++++++++++++++++++++++++++++++++
>> 1 file changed, 32 insertions(+)
>
> I feel a little bit unsure about the 700.
> Most often Git does not care about permissions,
> and relies on umask being set appropriately.
> (Please correct me if I'm wrong)

Git does care. See credential-cache--daemon.c. In fact this function
is a copy of check_socket_directory() from that file.
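
For readers without the source at hand, the kind of check being
discussed can be sketched as follows. This is written in the spirit of
such a check, not copied from credential-cache--daemon.c, and
socket_dir_is_private() is a made-up name:

```c
#include <sys/stat.h>

/*
 * Sketch: 1 if "path" is a directory with no group/other permission
 * bits set (e.g. 0700), 0 if it is more open, -1 if it cannot be used.
 */
int socket_dir_is_private(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0 || !S_ISDIR(st.st_mode))
		return -1;
	return (st.st_mode & 077) ? 0 : 1;
}
```

Because access to a unix socket is governed by the containing
directory, checking the directory is enough to keep other users away
from the socket inside it.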

> My spontaneous feeling is that adjust_shared_perm() could be used.
>
> 10)
> Another thing:
>>strbuf_addf(&sb, "%s/socket", socket_path);
> Does it make sense to name the socket "%s/watcher" ?

No strong opinions. Whatever name works for me. Will change unless
somebody else raises their voice.

> 11)
>>In daemon mode, stdout and stderr are saved in $WATCHER/log.

Mostly for debugging purposes.

> diff --git a/Documentation/git-file-watcher.txt b/Documentation/git-file-watcher.txt
> index 625a389..ec81f18 100644
> --- a/Documentation/git-file-watcher.txt
> +++ b/Documentation/git-file-watcher.txt
> @@ -18,6 +18,8 @@ lstat(2) to detect that itself.
>
>  OPTIONS
>  -------
> +--detach::
> +       Run in background.
>
> Shouldn't that be named --daemon ?

I just followed the naming convention in git-daemon. I think detach is
more friendly than daemon anyway.

> 12)
>>+filewatcher.path::
>>+      The directory that contains the socket of `git  file-watcher`.
>>+      If it's not an absolute path, it's relative to $GIT_DIR. An
>>+      empty path means no connection to file watcher.
> I would really like to be able to have different transport mechanisms
> using the same config.
> Today this is only a matter of documentation and naming things:
>
> How about this idea:
> filewatcher.url::
> The directory that contains the socket of `git file-watcher`.
> It can be an absolute path, meaning a unix domain socket in the
> file system.
> It can be a path relative to $GIT_DIR.
>
> It must start with either
> "/" (absolute path) or
> "./" (path relative to $GIT_DIR) or
> "~/" (path relative to your $HOME directory).
> An empty url means no connection to file watcher.
> (Later we can add url schemes like "tcp://" or "pipe://")

Makes sense.
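
A sketch of how the proposed prefixes could be told apart on the client
side. Nothing like this exists in the series yet; the enum and the
function name are assumptions, and unknown prefixes are simply reported
as unsupported so that schemes such as "tcp://" can be added later.

```c
#include <string.h>

enum watcher_location {
	WATCHER_DISABLED,	/* empty value: do not contact a watcher */
	WATCHER_ABSOLUTE,	/* "/..." */
	WATCHER_IN_GIT_DIR,	/* "./..." relative to $GIT_DIR */
	WATCHER_IN_HOME,	/* "~/..." relative to $HOME */
	WATCHER_UNSUPPORTED	/* anything else, e.g. a future "tcp://" */
};

/* Classify a filewatcher.url value by its leading characters. */
enum watcher_location classify_watcher_url(const char *url)
{
	if (!url || !*url)
		return WATCHER_DISABLED;
	if (url[0] == '/')
		return WATCHER_ABSOLUTE;
	if (!strncmp(url, "./", 2))
		return WATCHER_IN_GIT_DIR;
	if (!strncmp(url, "~/", 2))
		return WATCHER_IN_HOME;
	return WATCHER_UNSUPPORTED;
}
```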

>>+filewatcher.timeout::
>>+      This is the maximum time in milliseconds that Git waits for
>>+      the file watcher to respond before giving up. Default value is
>>+      50. Setting to -1 makes Git wait forever.
> 50 feels low, especially on "older/slower machines"
> 200 is probably acceptable even for the very impatient user,
> I could think of 500 to make the user aware of it.

Note that this timeout should never kick in unless there is an error
in communication. So in practice it should not matter much. OK I lied
a bit because file-watcher hard codes to "ping" the client every 30ms
to avoid this timeout. So it does matter, but very little.

>>+      /*
>>+       * ">" denotes an incoming packet, "<" outgoing. The lack of
>>+       * "<" means no reply expected.
>>+       *
>>+       * < "error" SP ERROR-STRING
>>+       *
>>+       * This can be sent whenever the client violates the protocol.
>>+       */
>>+
>>+      msg = packet_read_line(fd, &len);
>>+      if (!msg) {
>>+              packet_write(fd, "error invalid pkt-line");
>>+              return shutdown_connection(conn_id);
>>+      }
>>+
>>+      /*
>>+       * > "hello" [SP CAP [SP CAP..]]
>>+       * < "hello" [SP CAP [SP CAP..]]
>>+       *
>>+       * Advertise capabilities of both sides. File watcher may
>>+       * disconnect if the client does not advertise the required
>>+       * capabilities. Capabilities in uppercase MUST be
>>+       * supported. If any side does not understand any of the
>>+       * advertised uppercase capabilities, it must disconnect.
>>+       */
>>+      if ((arg = skip_prefix(msg, "hello"))) {
>>+              if (*arg) {     /* no capabilities supported yet */
>>+                      packet_write(fd, "error capabilities not supported");
>>+                      return shutdown_connection(conn_id);
>>+              }
>>+              packet_write(fd, "hello");
>>+              conns[conn_id]->polite = 1;
>>+      }
>>+
>>+      /*
>>+       * > "index" SP INDEX-SIGNATURE SP WORK-TREE-PATH
>>+       * < "ok" | "inconsistent"
>>+       *
>>+       * INDEX-SIGNATURE consists of 40 hexadecimal letters
>>+       * WORK-TREE-PATH must be absolute and normalized path
>>+       *
>>+       * Watch file changes in index. The client sends the index and
>>+       * work tree info. File watcher validates that it holds the
>>+       * same info. If so it sends "ok" back indicating both sides
>>+       * are on the same page and CE_WATCHED bits can be kept.
>>+       *
>>+       * Otherwise it sends "inconsistent" and both sides must reset
>>+       * back to initial state. File watcher keeps its index
>>+       * signature all-zero until the client has updated the index
>>+       * ondisk and request to update index signature.
>>+       *
>>+       * "hello" must be exchanged first. After this command the
>>+       * connection is associated with a worktree/index. Many
>>+       * commands may require this to proceed.
>>+       */
>>+      else if (starts_with(msg, "index ")) {
>>+              struct repository *repo;
>>+              struct stat st;
>>+              if (!conns[conn_id]->polite) {
>>+                      packet_write(fd, "error why did you not greet me? go away");
>>+                      return shutdown_connection(conn_id);
>>+              }
>>+              if (len < 47 || msg[46] != ' ' || !is_absolute_path(msg + 47)) {
>>+                      packet_write(fd, "error invalid index line %s", msg);
>>+                      return shutdown_connection(conn_id);
>>+              }
>>+
>>+              if (lstat(msg + 47, &st) || !S_ISDIR(st.st_mode)) {
>>+                      packet_write(fd, "error work tree does not exist: %s",
>>+                                   strerror(errno));
>>+                      return shutdown_connection(conn_id);
>>+              }
>>+              repo = get_repo(msg + 47);
>>+              conns[conn_id]->repo = repo;
>>+              if (memcmp(msg + 6, repo->index_signature, 40) ||
>>+                  !memcmp(msg + 6, invalid_signature, 40) ||
>>+                  repo->inode != st.st_ino) {
>>+                      packet_write(fd, "inconsistent");
>>+                      reset_repo(repo, st.st_ino);
>>+                      return 0;
>>+              }
>>+              packet_write(fd, "ok");
>>+      }
>>+      else {
>>+              packet_write(fd, "error unrecognized command %s", msg);
>>+              return shutdown_connection(conn_id);
> I feel a little bit unsure about this.
> We establish a protocol which is not extendable.

It should be extensible via "hello" message, as described in the
comments above it.
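
To illustrate the rule from that comment (uppercase capabilities must be
understood, lowercase ones may be ignored), here is a small sketch of
how a receiver might vet a "hello" line. The function names and the
example capability "WATCH" used in the usage note are invented, and
"uppercase" is taken to mean "contains no lowercase letters", which is
an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: mandatory means no lowercase and at least one uppercase. */
static int cap_is_mandatory(const char *cap, size_t len)
{
	size_t i;
	int upper = 0;

	for (i = 0; i < len; i++) {
		if (islower((unsigned char)cap[i]))
			return 0;
		if (isupper((unsigned char)cap[i]))
			upper = 1;
	}
	return upper;
}

/* "known" is a NULL-terminated list of capabilities we understand. */
static int cap_is_known(const char *cap, size_t len, const char *const *known)
{
	for (; *known; known++)
		if (strlen(*known) == len && !strncmp(*known, cap, len))
			return 1;
	return 0;
}

/*
 * Returns 0 if the peer's "hello [SP CAP]..." line is acceptable,
 * -1 if it is malformed or advertises an unknown mandatory capability
 * (in which case the connection should be dropped).
 */
int check_hello(const char *msg, const char *const *known)
{
	const char *p;

	if (strncmp(msg, "hello", 5))
		return -1;
	p = msg + 5;
	while (*p) {
		size_t len;

		if (*p != ' ')
			return -1;
		p++;
		len = strcspn(p, " ");
		if (!len)
			return -1;
		if (cap_is_mandatory(p, len) && !cap_is_known(p, len, known))
			return -1;
		p += len;
	}
	return 0;
}
```

With a known list of { "WATCH", NULL }, "hello WATCH fancy" is accepted
while "hello FOO" is rejected, so new optional features can be
introduced without breaking older peers.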

> Do we need to call shutdown_connection() ?

If you receive an unrecognized command, there's not much you can do
but shut the connection down.
-- 
Duy


* Re: [PATCH v3 00/26] inotify support
  2014-02-08  8:53       ` Duy Nguyen
@ 2014-02-09 20:19         ` Torsten Bögershausen
  2014-02-10 10:37           ` Duy Nguyen
  2014-02-17 12:36           ` Duy Nguyen
  0 siblings, 2 replies; 72+ messages in thread
From: Torsten Bögershausen @ 2014-02-09 20:19 UTC (permalink / raw)
  To: Duy Nguyen, Torsten Bögershausen; +Cc: Git Mailing List


On 2014-02-08 09.53, Duy Nguyen wrote:
> Thanks for the comments. I can see I now have some work to do in the
> coming weeks :)
> 


--------------------
>>> file-watcher.c | 32 ++++++++++++++++++++++++++++++++
>>> 1 file changed, 32 insertions(+)
>>
>> I feel a little bit unsure about the 700.
>> Most often Git does not care about permissions,
>> and relies on umask being set appropriately.
>> (Please correct me if I'm wrong)
>
>Git does care. See credential-cache--daemon.c. In fact this function
>is a copy of check_socket_directory() from that file.
>
I was probably a little bit unclear.
Of course credentials should be protected well and stored with 700.
The rest of the repo could be more loose by using adjust_shared_perm().
Because the whole repo can be shared (or not) and data is visible
to the group or everyone.
(this is a minor issue)

Please see file-watcher.c:
+	if (daemon) {
+		int err;
+		strbuf_addf(&sb, "%s/log", socket_path);
+		err = open(sb.buf, O_CREAT | O_TRUNC | O_WRONLY, 0600);
+		adjust_shared_perm(sb.buf);
(And now we talk about the logfile:
"In daemon mode, stdout and stderr are saved in $WATCHER/log."
It could be nice to make this feature configurable,
either XXX/log or /dev/null.
In the long run we may eat too much disk space on a machine.
The other thing is that we may want to separate stdout
from stderr, but even this is a low-priority comment.)


----------------
There is a small issue when I tested on a machine,
where the "data directory" called "daten" is softlinked to another disk:
daten -> /disk3/home2/tb/daten

and the "projects" directory is softlinked to "daten/projects"
projects -> daten/projects/

t7514 fails like this:
--- expect      2014-02-08 14:37:07.000000000 +0000
+++ actual      2014-02-08 14:37:07.000000000 +0000
@@ -1,6 +1,6 @@
 packet:          git> hello
 packet:          git< hello
-packet:          git> index 6cb9741eee29ca02c5b79e9c0bc647bcf47ce948 /home/tb/projects/git/tb/t/trash directory.t7514-file-watcher-lib
+packet:          git> index 6cb9741eee29ca02c5b79e9c0bc647bcf47ce948 /disk3/home2/tb/daten/projects/git/tb/t/trash directory.t7514-file-watcher-lib

Could we use relative path names internally, relative to $GIT_DIR ?


-------------------
Another thing:
Compiling under Mingw gives this:
    LINK git-credential-store.exe
libgit.a(file-watcher-lib.o): In function `connect_watcher':
c:\Dokumente und Einstellungen\tb\projects\git\tb/file-watcher-lib.c:21: undefined reference to `unix_stream_connect'
collect2: ld returned 1 exit status
make: *** [git-credential-store.exe] Error 1

We may need a compiler option like HAS_UNIX_SOCKETS or so.

--------------------------
+++ b/file-watcher.c

+#define INOTIFY_MASKS (IN_DELETE_SELF | IN_MOVE_SELF | \
+		       IN_CREATE | IN_ATTRIB | IN_DELETE | IN_MODIFY |	\
+		       IN_MOVED_FROM | IN_MOVED_TO)
This feels confusing:
a) we have inotify_masks with lower case below.
b) how about INOTIFY_NEEDED_BITS ?
---------------




I'm OK with having the protocol specified in the
test cases.
One thing that I have on the wish list is to make the
commands/responses more unique, to be able to run grep
on the code base.

One idea could be to use a prefix
"fwr" for "file watcher request" or
"fwr" for "file watcher response".
This does not work, hihi, so

"fwq" for "file watcher reQuest" and
"fwe" for "file watcher rEsponse".
Or 
"ffw" as "from file watcher" and
"tfw" as "to file watcher" for the people who have problems
with left and right, < and > could work.

This is all for today.
I will have a look at different error scenarios, what happens
when the watcher crashes and needs to be restarted,
or when Git itself dies with a segfault and doesn't tell the
watcher.

The easiest way to simulate this would be in terms of test cases.
So I will try to write some
/Torsten
 


* Re: [PATCH v3 00/26] inotify support
  2014-02-09 20:19         ` Torsten Bögershausen
@ 2014-02-10 10:37           ` Duy Nguyen
  2014-02-10 16:55             ` Torsten Bögershausen
  2014-02-17 12:36           ` Duy Nguyen
  1 sibling, 1 reply; 72+ messages in thread
From: Duy Nguyen @ 2014-02-10 10:37 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

On Mon, Feb 10, 2014 at 3:19 AM, Torsten Bögershausen <tboegi@web.de> wrote:
> Please see file-watcher.c:
> +       if (daemon) {
> +               int err;
> +               strbuf_addf(&sb, "%s/log", socket_path);
> +               err = open(sb.buf, O_CREAT | O_TRUNC | O_WRONLY, 0600);
> +               adjust_shared_perm(sb.buf);
> (And now we talk about the logfile:
> "In daemon mode, stdout and stderr are saved in $WATCHER/log."
> It could be nice to make this feature configurable,
> either XXX/log or /dev/null.
> In the long run we may eat too much disk space on a machine.
> The other thing is that we may want to separate stdout
> from stderr, but even this is a low-priority comment.)

I probably should follow git-daemon and put these to syslog.

> ----------------
> There is a small issue when I tested on a machine,
> where the "data directory" called "daten" is softlinked to another disk:
> daten -> /disk3/home2/tb/daten
>
> and the "projects" directory is softlinked to "daten/projects"
> projects -> daten/projects/
>
> t7514 fails like this:
> --- expect      2014-02-08 14:37:07.000000000 +0000
> +++ actual      2014-02-08 14:37:07.000000000 +0000
> @@ -1,6 +1,6 @@
>  packet:          git> hello
>  packet:          git< hello
> -packet:          git> index 6cb9741eee29ca02c5b79e9c0bc647bcf47ce948 /home/tb/projects/git/tb/t/trash directory.t7514-file-watcher-lib
> +packet:          git> index 6cb9741eee29ca02c5b79e9c0bc647bcf47ce948 /disk3/home2/tb/daten/projects/git/tb/t/trash directory.t7514-file-watcher-lib
>
> Could we use relative path names internally, relative to $GIT_DIR ?

No, because this is when the client tells the server about $GIT_DIR. I
guess we can use realpath(1) here.
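
As a rough illustration of the normalization idea, here is a minimal
sketch using realpath(3), the C library counterpart of the realpath(1)
command mentioned above. The helper name canonical_path() and its
signature are made up for this example; nothing here is taken from the
patch series. It only shows that two different symlinked spellings of a
directory collapse to one canonical name.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* ask for realpath() in strict modes */
#endif
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the series): resolve symlinks, "." and
 * ".." in `path` and copy the canonical absolute name into `out`.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int canonical_path(const char *path, char *out, size_t outlen)
{
	/* With a NULL buffer, POSIX.1-2008 realpath() mallocs the result. */
	char *resolved = realpath(path, NULL);

	if (!resolved)
		return -1;
	if (strlen(resolved) >= outlen) {
		free(resolved);
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(out, resolved);
	free(resolved);
	return 0;
}
```

If both the expected and the actual side of the test go through the
same kind of resolution, the symlinked "projects" and "daten" paths in
the failing t7514 output should compare equal.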

> -------------------
> Another thing:
> Compiling under Mingw gives this:
>     LINK git-credential-store.exe
> libgit.a(file-watcher-lib.o): In function `connect_watcher':
> c:\Dokumente und Einstellungen\tb\projects\git\tb/file-watcher-lib.c:21: undefined reference to `unix_stream_connect'
> collect2: ld returned 1 exit status
> make: *** [git-credential-store.exe] Error 1
>
> We may need a compiler option like HAS_UNIX_SOCKETS or so.

I'll make unix-socket.o build unconditionally and return error at runtime.
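
A sketch of what "build unconditionally, fail at runtime" could look
like. The function name sketch_unix_stream_connect() is hypothetical,
and NO_UNIX_SOCKETS is assumed here as the build knob that marks
platforms without AF_UNIX; the real unix-socket.c may be organized
differently. The point is only that the symbol always exists, so the
link step succeeds, and callers see an ordinary -1/errno failure.

```c
#include <errno.h>
#include <string.h>

#ifdef NO_UNIX_SOCKETS
/* Stub for platforms without AF_UNIX: always fail at runtime. */
int sketch_unix_stream_connect(const char *path)
{
	(void)path;
	errno = ENOSYS;
	return -1;
}
#else
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to the stream socket at `path`; return the fd or -1. */
int sketch_unix_stream_connect(const char *path)
{
	struct sockaddr_un sa;
	int fd, saved_errno;

	if (strlen(path) >= sizeof(sa.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	strcpy(sa.sun_path, path);
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		saved_errno = errno;	/* close() may clobber errno */
		close(fd);
		errno = saved_errno;
		return -1;
	}
	return fd;
}
#endif
```

A caller such as connect_watcher() would then treat the stub's failure
the same way as "no watcher is running".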

> --------------------------
> +++ b/file-watcher.c
>
> +#define INOTIFY_MASKS (IN_DELETE_SELF | IN_MOVE_SELF | \
> +                      IN_CREATE | IN_ATTRIB | IN_DELETE | IN_MODIFY |  \
> +                      IN_MOVED_FROM | IN_MOVED_TO)
> This feels confusing:
> a) we have inotify_masks with lower case below.
> b) how about INOTIFY_NEEDED_BITS ?
> ---------------

OK

> I'm OK with having the protocol specified in the
> test cases.
> One thing that I have on the wish list is to make the
> commands/responses more unique, to be able to run grep
> on the code base.
>
> One idea could be to use a prefix
> "fwr" for "file watcher request" or
> "fwr" for "file watcher response".
> This does not work, hihi, so
>
> "fwq" for "file watcher reQuest" and
> "fwe" for "file watcher rEsponse".
> Or
> "ffw" as "from file watcher" and
> "tfw" as "to file watcher" for the people who have problems
> with left and right, < and > could work.

If you want I can update test-file-watcher to accept "send<" and
"recv>" instead of "<" and ">", respectively. The only command with
the same name for response and request is "hello". I can make it
"hello" and "helloack" (or "bonjour" as response?).
-- 
Duy

* Re: [PATCH v3 00/26] inotify support
  2014-02-10 10:37           ` Duy Nguyen
@ 2014-02-10 16:55             ` Torsten Bögershausen
  2014-02-10 23:34               ` Duy Nguyen
  0 siblings, 1 reply; 72+ messages in thread
From: Torsten Bögershausen @ 2014-02-10 16:55 UTC (permalink / raw)
  To: Duy Nguyen, Torsten Bögershausen; +Cc: Git Mailing List

On 2014-02-10 11.37, Duy Nguyen wrote:
>>
>> Could we use relative path names internally, relative to $GIT_DIR ?
> 
> No, because this is when the client tells the server about $GIT_DIR. I
> guess we can use realpath(1) here.
Good.

I realized that the watcher can watch several repos at the same time.

However, we could allow relative path names, which will be relative to $SOCKET_DIR,
and loosen the demand for an absolute path name a little bit.
And $SOCKET_DIR can be the same as $GIT_DIR, when we are watching only one repo.
> If you want I can update test-file-watcher to accept "send<" and
> "recv>" instead of "<" and ">", respectively. The only command with
> the same name for response and request is "hello". I can make it
> "hello" and "helloack" (or "bonjour" as response?).

helloack looks good. 

* Re: [PATCH v3 00/26] inotify support
  2014-02-10 16:55             ` Torsten Bögershausen
@ 2014-02-10 23:34               ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-02-10 23:34 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

On Mon, Feb 10, 2014 at 11:55 PM, Torsten Bögershausen <tboegi@web.de> wrote:
> On 2014-02-10 11.37, Duy Nguyen wrote:
>>>
>>> Could we use relative path names internally, relative to $GIT_DIR ?
>>
>> No, because this is when the client tells the server about $GIT_DIR. I
>> guess we can use realpath(1) here.
> Good.
>
> I realized that the watcher can watch several repos at the same time.
>
> However, we could allow relative path names, which will be relative to $SOCKET_DIR,
> and loosen the demand for an absolute path name a little bit.
> And $SOCKET_DIR can be the same as $GIT_DIR, when we are watching only one repo.

It does not help much anyway because file-watcher-lib.c sends
get_git_work_tree(), which is an absolute, normalized path, to
file-watcher. Nothing sends $GIT_DIR relative to $SOCKET_DIR (and I
don't think we want to make get_git_work_tree() relative before
sending; it's more work on both sides with no benefits, except for
tracing).
-- 
Duy

* Re: [PATCH v3 00/26] inotify support
  2014-02-09 20:19         ` Torsten Bögershausen
  2014-02-10 10:37           ` Duy Nguyen
@ 2014-02-17 12:36           ` Duy Nguyen
  1 sibling, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-02-17 12:36 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

On Mon, Feb 10, 2014 at 3:19 AM, Torsten Bögershausen <tboegi@web.de> wrote:
>
> On 2014-02-08 09.53, Duy Nguyen wrote:
>>>> file-watcher.c | 32 ++++++++++++++++++++++++++++++++
>>>> 1 file changed, 32 insertions(+)
>>>
>>> I feel a little bit unsure about the 700.
>>> Most often Git does not care about permissions,
>>> and relies on umask being set appropriately.
>>> (Please correct me if I'm wrong)
>>
>>Git does care. See credential-cache--daemon.c. In fact this function
>>is a copy of check_socket_directory() from that file.
>>
> I was probably a little bit unclear.
> Of course credentials should be protected well and stored with 700.
> The rest of the repo could be more loose by using adjust_shared_perm().
> Because the whole repo can be shared (or not) and data is visible
> to the group or everyone.
> (this is a minor issue)

So how about a check whenever a worktree connects to the daemon: if
that worktree has stricter permissions, e.g. 0700 vs 0770 for the
daemon socket directory, then the daemon refuses the worktree (maybe
with a warning)?
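
To make the proposal concrete, a minimal sketch of such a check. Both
function names are invented for illustration, and "stricter" is
interpreted here as "the socket directory grants at least one
permission bit that the worktree does not", which is an assumption
about the intended rule rather than something settled in this thread.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical rule: the worktree is "stricter" when the socket
 * directory grants some rwx bit that the worktree withholds,
 * e.g. worktree 0700 vs socket directory 0770.
 */
static int worktree_is_stricter(mode_t worktree_mode, mode_t socket_dir_mode)
{
	mode_t wt = worktree_mode & 0777;
	mode_t sd = socket_dir_mode & 0777;

	return (sd & ~wt) != 0;
}

/* Return 0 if the daemon may serve this worktree, -1 otherwise. */
static int check_worktree_allowed(const char *worktree, const char *socket_dir)
{
	struct stat wt, sd;

	if (stat(worktree, &wt) < 0 || stat(socket_dir, &sd) < 0)
		return -1;
	if (worktree_is_stricter(wt.st_mode, sd.st_mode)) {
		fprintf(stderr, "warning: %s is more restricted than %s, refusing\n",
			worktree, socket_dir);
		return -1;
	}
	return 0;
}
```

With this rule a 0700 worktree is refused by a daemon whose socket
directory is 0770, while a 0770 worktree is accepted by a daemon whose
socket directory is 0700 or 0770.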
-- 
Duy

* Re: [PATCH 0/6] inotify support
  2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
@ 2014-02-19 20:35 ` Shawn Pearce
  2014-02-19 23:45   ` Duy Nguyen
  7 siblings, 1 reply; 72+ messages in thread
From: Shawn Pearce @ 2014-02-19 20:35 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Sun, Jan 12, 2014 at 3:03 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> It's been 37 weeks since Robert Zeh's attempt to bring inotify support
> to Git [1] and unless I missed some mails, no updates since. So here's
> another attempt with my preferred approach (can't help it, playing
> with your own ideas is more fun than improving other people's code)
>
> To compare to Robert's approach:
>
> - This one uses UNIX datagram socket. If I read its man page right,
>   unix socket respects the containing directory's permission. Which
>   means on normal repos, only the user process can access. On shared
>   repos, multiple users can access it. This should work on Mac.
>   Windows will need a different transport.
>
> - The daemon is dumb. It passes the paths around and that's it.
>   lstat() is done by git. If I design it right, there's should not be
>   any race conditions that make git miss file updates.
>
> - CE_VALID is reused to avoid mass changes (granted there's other
>   neat ways as well). I quite like the idea of machine-controlled
>   CE_VALID.
>
> inotify support has the potential of reducing syscalls in
> read_directory() as well. I wrote about using lstat() to reduce
> readdir() a while back, if that's implemented then inotify will fit in
> nicely.
>
> This is just a proof of concept. I'm sure I haven't handled all error
> cases very well. The first five patches show the protocol and git
> side's changes. The last one fills inotify in.
>
> [1] http://thread.gmane.org/gmane.comp.version-control.git/215820/focus=222278

Why a new daemon? Why don't we reuse the stable
https://github.com/facebook/watchman project Facebook built to make
Hg's status system fast?

* Re: [PATCH 0/6] inotify support
  2014-02-19 20:35 ` [PATCH 0/6] " Shawn Pearce
@ 2014-02-19 23:45   ` Duy Nguyen
  0 siblings, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2014-02-19 23:45 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Thu, Feb 20, 2014 at 3:35 AM, Shawn Pearce <spearce@spearce.org> wrote:
> Why a new daemon? Why don't we reuse the stable
> https://github.com/facebook/watchman project Facebook built to make
> Hg's status system fast?

I did look briefly through its readme before, but a few things such as
the CLA and JSON put me off. I agree that reusing the same daemon would
save us maintenance work and buy us stability (and maybe let users of
both git and hg share some inotify handles). I'll need to have a
closer look at it and compare it with what I've got, now that I have a
better picture of how things should work.
-- 
Duy

Thread overview: 72+ messages
2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 1/6] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 2/6] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
2014-01-13 17:02   ` Jonathan Nieder
2014-01-14  1:25     ` Duy Nguyen
2014-01-14  1:39   ` Duy Nguyen
2014-01-12 11:03 ` [PATCH 3/6] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
2014-01-15 10:58   ` Jeff King
2014-01-12 11:03 ` [PATCH 4/6] read-cache: get "updated" path list from " Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 5/6] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 6/6] file-watcher: support inotify Nguyễn Thái Ngọc Duy
2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 01/14] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
2014-01-17 11:19     ` Thomas Gummerer
2014-01-19 17:06     ` Thomas Rast
2014-01-20  1:38       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 03/14] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
2014-01-17 15:24     ` Torsten Bögershausen
2014-01-17 16:21       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 04/14] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 05/14] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
2014-01-19 17:06     ` Thomas Rast
2014-01-20  1:36       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 06/14] read-cache: get modified file list from file watcher Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 07/14] read-cache: add config to start file watcher automatically Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing Nguyễn Thái Ngọc Duy
2014-01-19 17:04     ` Thomas Rast
2014-01-20  1:32       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 09/14] file-watcher: add --shutdown and --log options Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 10/14] file-watcher: automatically quit Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 11/14] file-watcher: support inotify Nguyễn Thái Ngọc Duy
2014-01-19 17:04   ` [PATCH/WIP v2 00/14] inotify support Thomas Rast
2014-01-20  1:28     ` Duy Nguyen
2014-01-20 21:51       ` Thomas Rast
2014-01-28 10:46     ` Duy Nguyen
2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 01/26] pkt-line.c: rename global variable buffer[] to something less generic Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 02/26] pkt-line.c: add packet_write_timeout() Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 03/26] pkt-line.c: add packet_read_line_timeout() Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 04/26] unix-socket: make unlink() optional in unix_stream_listen() Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 05/26] Add git-file-watcher and basic connection handling logic Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 06/26] file-watcher: check socket directory permission Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 07/26] file-watcher: remove socket on exit Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 08/26] file-watcher: add --detach Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 09/26] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 10/26] read-cache: new flag CE_WATCHED to mark what file is watched Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 11/26] Clear CE_WATCHED when set CE_VALID alone Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 12/26] read-cache: basic hand shaking to the file watcher Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 13/26] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 14/26] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 15/26] read-cache: get changed file list from file watcher Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 16/26] git-compat-util.h: add inotify stubs on non-Linux platforms Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 17/26] file-watcher: inotify support, watching part Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 18/26] file-watcher: inotify support, notification part Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 19/26] Wrap CE_VALID test with ce_valid() Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 20/26] read-cache: new variable to verify file-watcher results Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 21/26] Support running file watcher with the test suite Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 22/26] file-watcher: quit if $WATCHER/socket is gone Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 23/26] file-watcher: tests for the daemon Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 24/26] ls-files: print CE_WATCHED as W (or "w" with CE_VALID) Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 25/26] file-watcher: tests for the client side Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 26/26] Disable file-watcher with system inotify on some tests Nguyễn Thái Ngọc Duy
2014-02-08  8:04     ` [PATCH v3 00/26] inotify support Torsten Bögershausen
2014-02-08  8:53       ` Duy Nguyen
2014-02-09 20:19         ` Torsten Bögershausen
2014-02-10 10:37           ` Duy Nguyen
2014-02-10 16:55             ` Torsten Bögershausen
2014-02-10 23:34               ` Duy Nguyen
2014-02-17 12:36           ` Duy Nguyen
2014-02-19 20:35 ` [PATCH 0/6] " Shawn Pearce
2014-02-19 23:45   ` Duy Nguyen
