* [PATCH 00/32] Split index mode for very large indexes
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

I hinted at this earlier [1]. It now passes the test suite and has a
design that I'm happy with (thanks to Junio for a suggestion about the
rename problem).

From the user's point of view, this reduces the amount of index data
that has to be written so that it scales with the number of updated
files. For example, my webkit index v4 is 14MB. With a fresh split, I
only have to update an index of 200KB. Every file I touch will add
about 80 bytes to that. As long as I don't touch every single tracked
file in my worktree, I should not pay the penalty of writing a 14MB
index file on every operation.
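
As a rough illustration of the numbers above, the write cost can be
modelled as a fixed part plus a per-file part. This is only a sketch:
the helper name estimate_split_write() is hypothetical, and the two
constants are simply the approximate webkit figures quoted above, not
values taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approximate, repository-specific figures quoted in the cover letter. */
#define FRESH_SPLIT_BYTES ((size_t)200 * 1024)	/* "200KB" after a fresh split */
#define BYTES_PER_TOUCHED ((size_t)80)		/* "about 80 bytes" per touched file */

/*
 * Hypothetical helper: estimate how many bytes are written on each
 * index update once `touched_files` files have changed since the split.
 */
size_t estimate_split_write(size_t touched_files)
{
	/* Guard against overflow in the multiplication below. */
	assert(touched_files <= (SIZE_MAX - FRESH_SPLIT_BYTES) / BYTES_PER_TOUCHED);
	return FRESH_SPLIT_BYTES + BYTES_PER_TOUCHED * touched_files;
}
```

Under these assumptions, touching 1000 files gives an estimate of
roughly 280KB per write, compared with the full 14MB.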

The read penalty is not addressed here, so I still pay the cost of
hashing 14MB on every read. But that's an easy problem. We could cache
the validated index in a daemon. Whenever git needs to load an index,
it pokes the daemon. The daemon verifies that the on-disk index still
has the same signature, then sends the in-memory index to git. When git
updates the index, it pokes the daemon again to update the in-memory
index. The next time git reads the index, it does not have to pay the
I/O cost any more (strictly speaking the cost is still there, but it is
hidden away at a time when you do not need to read the index yet).

The fourth patch is not really necessary. I started out with a
different approach that needed that abstraction. But I think it's
still a nice thing to keep. The real meat is in patches 17 to 25. In
essence, the new index is more like a journal, where the real index is
put away unchanged.
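
The abstraction added by the fourth patch (write the index, then
either commit or just close the lock, chosen by a flag) can be
sketched in isolation as follows. The flag values and the control flow
mirror do_write_locked_index() in that patch; struct mock_lock and the
mock_*() helpers are hypothetical stand-ins for git's lock file
machinery, so that the example compiles on its own.

```c
#include <assert.h>

#define COMMIT_LOCK	(1 << 0)
#define CLOSE_LOCK	(1 << 1)

/* Hypothetical stand-in for git's struct lock_file. */
struct mock_lock {
	int written;	/* index data was written to the lock file */
	int committed;	/* lock file was renamed into place */
	int closed;	/* lock file was closed but is still held */
};

static int mock_write(struct mock_lock *lock)  { lock->written = 1; return 0; }
static int mock_commit(struct mock_lock *lock) { lock->committed = 1; return 0; }
static int mock_close(struct mock_lock *lock)  { lock->closed = 1; return 0; }

/* Write first, then finish the lock according to `flags`. */
int write_locked(struct mock_lock *lock, unsigned flags)
{
	int ret = mock_write(lock);
	if (ret)
		return ret;
	/* The two flags are mutually exclusive. */
	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
	       (COMMIT_LOCK | CLOSE_LOCK));
	if (flags & COMMIT_LOCK)
		return mock_commit(lock);
	else if (flags & CLOSE_LOCK)
		return mock_close(lock);
	return ret;
}
```

The point of the single entry point is that callers no longer handle
the file descriptor themselves, which leaves the writer free to decide
what actually goes into the file.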

Supporting this in other implementations should be easy (at least the
reading part) and need only a small code change. The whole index
format is retained. All you need is to read a new extension that
contains two ewah-bitmaps and apply the changes to create the final
index.
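
To make the reading side more concrete, here is a minimal sketch of
applying two bitmaps to a base entry list. It rests on assumptions
that are not spelled out above: that one bitmap marks base entries
that were deleted and the other marks base entries that were replaced
(which is what the patch subjects suggest), that a position is set in
at most one bitmap, and that replacements come first in the journal,
in base order. Plain byte arrays stand in for the EWAH-compressed
bitmaps, and struct entry, bit_is_set() and apply_split_index() are
hypothetical names. Re-sorting the appended entries is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified index entry. */
struct entry {
	const char *path;
	int version;
};

/* Test bit `pos` in an uncompressed, least-significant-bit-first bitmap. */
static int bit_is_set(const uint8_t *bits, size_t pos)
{
	return (bits[pos / 8] >> (pos % 8)) & 1;
}

/*
 * Combine the unchanged base entries with the journal entries.
 * `out` must have room for base_nr + journal_nr entries.
 * Returns the number of entries stored in `out`.
 */
size_t apply_split_index(const struct entry *base, size_t base_nr,
			 const uint8_t *delete_bits,
			 const uint8_t *replace_bits,
			 const struct entry *journal, size_t journal_nr,
			 struct entry *out)
{
	size_t i, j = 0, n = 0;

	for (i = 0; i < base_nr; i++) {
		if (bit_is_set(delete_bits, i))
			continue;		/* entry was removed */
		if (bit_is_set(replace_bits, i)) {
			assert(j < journal_nr);
			out[n++] = journal[j++];	/* updated entry wins */
		} else {
			out[n++] = base[i];	/* untouched base entry */
		}
	}
	/* Whatever is left in the journal is a newly added entry. */
	while (j < journal_nr)
		out[n++] = journal[j++];
	return n;
}
```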

This is a preparation step for my untracked file cache. With writing
(and later on reading) the index becoming cheap, I can start to put
more things in there.

[1] http://thread.gmane.org/gmane.comp.version-control.git/246471/focus=247031

Nguyễn Thái Ngọc Duy (32):
  ewah: fix constness of ewah_read_mmap
  ewah: delete unused ewah_read_mmap_native declaration
  sequencer: do not update/refresh index if the lock cannot be held
  read-cache: new API write_locked_index instead of write_index/write_cache
  read-cache: relocate and unexport commit_locked_index()
  read-cache: store in-memory flags in the first 12 bits of ce_flags
  read-cache: be strict about "changed" in remove_marked_cache_entries()
  read-cache: be specific what part of the index has changed
  update-index: be specific what part of the index has changed
  resolve-undo: be specific what part of the index has changed
  unpack-trees: be specific what part of the index has changed
  cache-tree: mark istate->cache_changed on cache tree invalidation
  cache-tree: mark istate->cache_changed on cache tree update
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  read-cache: save index SHA-1 after reading
  read-cache: split-index mode
  read-cache: mark new entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark updated entries for split index
  split-index: the writing part
  split-index: the reading part
  split-index: do not invalidate cache-tree at read time
  split-index: strip pathname of on-disk replaced entries
  update-index: new options to enable/disable split index mode
  update-index --split-index: do not split if $GIT_DIR is read only
  rev-parse: add --shared-index-path to get shared index path
  read-tree: force split-index mode off on --index-output
  read-tree: note about dropping split-index mode or index version
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  t2104: make sure split index mode is off for the version test
  t1700: new tests for split-index mode
-- 
1.9.1.346.ga2b5940


* [PATCH 01/32] ewah: fix constness of ewah_read_mmap
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 ewah/ewah_io.c | 4 ++--
 ewah/ewok.h    | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index f7f700e..1c2d7af 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -110,9 +110,9 @@ int ewah_serialize(struct ewah_bitmap *self, int fd)
 	return ewah_serialize_to(self, write_helper, (void *)(intptr_t)fd);
 }
 
-int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len)
+int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
 {
-	uint8_t *ptr = map;
+	const uint8_t *ptr = map;
 	size_t i;
 
 	self->bit_size = get_be32(ptr);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 43adeb5..0556ca5 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -99,7 +99,7 @@ int ewah_serialize(struct ewah_bitmap *self, int fd);
 int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 
 int ewah_deserialize(struct ewah_bitmap *self, int fd);
-int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len);
+int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
 int ewah_read_mmap_native(struct ewah_bitmap *self, void *map, size_t len);
 
 uint32_t ewah_checksum(struct ewah_bitmap *self);
-- 
1.9.1.346.ga2b5940


* [PATCH 02/32] ewah: delete unused ewah_read_mmap_native declaration
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 ewah/ewok.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ewah/ewok.h b/ewah/ewok.h
index 0556ca5..f6ad190 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -100,7 +100,6 @@ int ewah_serialize_native(struct ewah_bitmap *self, int fd);
 
 int ewah_deserialize(struct ewah_bitmap *self, int fd);
 int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len);
-int ewah_read_mmap_native(struct ewah_bitmap *self, void *map, size_t len);
 
 uint32_t ewah_checksum(struct ewah_bitmap *self);
 
-- 
1.9.1.346.ga2b5940


* [PATCH 03/32] sequencer: do not update/refresh index if the lock cannot be held
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index bde5f04..7b886a6 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -679,7 +679,7 @@ static void read_and_refresh_cache(struct replay_opts *opts)
 	if (read_index_preload(&the_index, NULL) < 0)
 		die(_("git %s: failed to read the index"), action_name(opts));
 	refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, NULL, NULL, NULL);
-	if (the_index.cache_changed) {
+	if (the_index.cache_changed && index_fd >= 0) {
 		if (write_index(&the_index, index_fd) ||
 		    commit_locked_index(&index_lock))
 			die(_("git %s: failed to refresh the index"), action_name(opts));
-- 
1.9.1.346.ga2b5940


* [PATCH 04/32] read-cache: new API write_locked_index instead of write_index/write_cache
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/add.c            |  6 ++----
 builtin/apply.c          |  9 ++++-----
 builtin/checkout-index.c |  3 +--
 builtin/checkout.c       | 11 ++++-------
 builtin/clone.c          |  7 +++----
 builtin/commit.c         | 33 ++++++++++++++-------------------
 builtin/merge.c          | 12 ++++--------
 builtin/mv.c             |  7 +++----
 builtin/read-tree.c      |  7 +++----
 builtin/reset.c          |  5 ++---
 builtin/rm.c             |  7 +++----
 builtin/update-index.c   |  3 +--
 cache-tree.c             |  3 +--
 cache.h                  |  6 ++++--
 merge-recursive.c        |  7 +++----
 merge.c                  |  7 +++----
 read-cache.c             | 28 ++++++++++++++++++++++++----
 rerere.c                 |  3 +--
 sequencer.c              | 12 +++++-------
 test-scrap-cache-tree.c  |  5 ++---
 20 files changed, 87 insertions(+), 94 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 459208a..4baf3a5 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -299,7 +299,6 @@ static int add_files(struct dir_struct *dir, int flags)
 int cmd_add(int argc, const char **argv, const char *prefix)
 {
 	int exit_status = 0;
-	int newfd;
 	struct pathspec pathspec;
 	struct dir_struct dir;
 	int flags;
@@ -345,7 +344,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only;
 	require_pathspec = !take_worktree_changes;
 
-	newfd = hold_locked_index(&lock_file, 1);
+	hold_locked_index(&lock_file, 1);
 
 	flags = ((verbose ? ADD_CACHE_VERBOSE : 0) |
 		 (show_only ? ADD_CACHE_PRETEND : 0) |
@@ -443,8 +442,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 finish:
 	if (active_cache_changed) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(&lock_file))
+		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 			die(_("Unable to write new index file"));
 	}
 
diff --git a/builtin/apply.c b/builtin/apply.c
index 87439fa..5e13444 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3644,7 +3644,7 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
 {
 	struct patch *patch;
 	struct index_state result = { NULL };
-	int fd;
+	static struct lock_file lock;
 
 	/* Once we start supporting the reverse patch, it may be
 	 * worth showing the new sha1 prefix, but until then...
@@ -3682,8 +3682,8 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
 			die ("Could not add %s to temporary index", name);
 	}
 
-	fd = open(filename, O_WRONLY | O_CREAT, 0666);
-	if (fd < 0 || write_index(&result, fd) || close(fd))
+	hold_lock_file_for_update(&lock, filename, LOCK_DIE_ON_ERROR);
+	if (write_locked_index(&result, &lock, COMMIT_LOCK))
 		die ("Could not write temporary index to %s", filename);
 
 	discard_index(&result);
@@ -4501,8 +4501,7 @@ int cmd_apply(int argc, const char **argv, const char *prefix_)
 	}
 
 	if (update_index) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(&lock_file))
+		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 			die(_("Unable to write new index file"));
 	}
 
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 61e75eb..9e49bf2 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -279,8 +279,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		checkout_all(prefix, prefix_length);
 
 	if (0 <= newfd &&
-	    (write_cache(newfd, active_cache, active_nr) ||
-	     commit_locked_index(&lock_file)))
+	    write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
 	return 0;
 }
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 07cf555..944a634 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -225,7 +225,6 @@ static int checkout_paths(const struct checkout_opts *opts,
 	int flag;
 	struct commit *head;
 	int errs = 0;
-	int newfd;
 	struct lock_file *lock_file;
 
 	if (opts->track != BRANCH_TRACK_UNSPECIFIED)
@@ -256,7 +255,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 
 	lock_file = xcalloc(1, sizeof(struct lock_file));
 
-	newfd = hold_locked_index(lock_file, 1);
+	hold_locked_index(lock_file, 1);
 	if (read_cache_preload(&opts->pathspec) < 0)
 		return error(_("corrupt index file"));
 
@@ -352,8 +351,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 		}
 	}
 
-	if (write_cache(newfd, active_cache, active_nr) ||
-	    commit_locked_index(lock_file))
+	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
 	read_ref_full("HEAD", rev, 0, &flag);
@@ -444,8 +442,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
 {
 	int ret;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
-	int newfd = hold_locked_index(lock_file, 1);
 
+	hold_locked_index(lock_file, 1);
 	if (read_cache_preload(NULL) < 0)
 		return error(_("corrupt index file"));
 
@@ -553,8 +551,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		}
 	}
 
-	if (write_cache(newfd, active_cache, active_nr) ||
-	    commit_locked_index(lock_file))
+	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
 	if (!opts->force && !opts->quiet)
diff --git a/builtin/clone.c b/builtin/clone.c
index 9b3c04d..48f91f5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -616,7 +616,7 @@ static int checkout(void)
 	struct unpack_trees_options opts;
 	struct tree *tree;
 	struct tree_desc t;
-	int err = 0, fd;
+	int err = 0;
 
 	if (option_no_checkout)
 		return 0;
@@ -640,7 +640,7 @@ static int checkout(void)
 	setup_work_tree();
 
 	lock_file = xcalloc(1, sizeof(struct lock_file));
-	fd = hold_locked_index(lock_file, 1);
+	hold_locked_index(lock_file, 1);
 
 	memset(&opts, 0, sizeof opts);
 	opts.update = 1;
@@ -656,8 +656,7 @@ static int checkout(void)
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
 
-	if (write_cache(fd, active_cache, active_nr) ||
-	    commit_locked_index(lock_file))
+	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
diff --git a/builtin/commit.c b/builtin/commit.c
index 9cfef6c..243b0c3 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -305,7 +305,6 @@ static void refresh_cache_or_die(int refresh_flags)
 static char *prepare_index(int argc, const char **argv, const char *prefix,
 			   const struct commit *current_head, int is_status)
 {
-	int fd;
 	struct string_list partial;
 	struct pathspec pathspec;
 	int refresh_flags = REFRESH_QUIET;
@@ -321,12 +320,11 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
 
 	if (interactive) {
 		char *old_index_env = NULL;
-		fd = hold_locked_index(&index_lock, 1);
+		hold_locked_index(&index_lock, 1);
 
 		refresh_cache_or_die(refresh_flags);
 
-		if (write_cache(fd, active_cache, active_nr) ||
-		    close_lock_file(&index_lock))
+		if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
 			die(_("unable to create temporary index"));
 
 		old_index_env = getenv(INDEX_ENVIRONMENT);
@@ -360,12 +358,11 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
 	 * (B) on failure, rollback the real index.
 	 */
 	if (all || (also && pathspec.nr)) {
-		fd = hold_locked_index(&index_lock, 1);
+		hold_locked_index(&index_lock, 1);
 		add_files_to_cache(also ? prefix : NULL, &pathspec, 0);
 		refresh_cache_or_die(refresh_flags);
 		update_main_cache_tree(WRITE_TREE_SILENT);
-		if (write_cache(fd, active_cache, active_nr) ||
-		    close_lock_file(&index_lock))
+		if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
 			die(_("unable to write new_index file"));
 		commit_style = COMMIT_NORMAL;
 		return index_lock.filename;
@@ -381,12 +378,12 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
 	 * We still need to refresh the index here.
 	 */
 	if (!only && !pathspec.nr) {
-		fd = hold_locked_index(&index_lock, 1);
+		hold_locked_index(&index_lock, 1);
 		refresh_cache_or_die(refresh_flags);
 		if (active_cache_changed) {
 			update_main_cache_tree(WRITE_TREE_SILENT);
-			if (write_cache(fd, active_cache, active_nr) ||
-			    commit_locked_index(&index_lock))
+			if (write_locked_index(&the_index, &index_lock,
+					       COMMIT_LOCK))
 				die(_("unable to write new_index file"));
 		} else {
 			rollback_lock_file(&index_lock);
@@ -432,24 +429,22 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
 	if (read_cache() < 0)
 		die(_("cannot read the index"));
 
-	fd = hold_locked_index(&index_lock, 1);
+	hold_locked_index(&index_lock, 1);
 	add_remove_files(&partial);
 	refresh_cache(REFRESH_QUIET);
-	if (write_cache(fd, active_cache, active_nr) ||
-	    close_lock_file(&index_lock))
+	if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
 		die(_("unable to write new_index file"));
 
-	fd = hold_lock_file_for_update(&false_lock,
-				       git_path("next-index-%"PRIuMAX,
-						(uintmax_t) getpid()),
-				       LOCK_DIE_ON_ERROR);
+	hold_lock_file_for_update(&false_lock,
+				  git_path("next-index-%"PRIuMAX,
+					   (uintmax_t) getpid()),
+				  LOCK_DIE_ON_ERROR);
 
 	create_base_index(current_head);
 	add_remove_files(&partial);
 	refresh_cache(REFRESH_QUIET);
 
-	if (write_cache(fd, active_cache, active_nr) ||
-	    close_lock_file(&false_lock))
+	if (write_locked_index(&the_index, &false_lock, CLOSE_LOCK))
 		die(_("unable to write temporary index file"));
 
 	discard_cache();
diff --git a/builtin/merge.c b/builtin/merge.c
index 66d8843..bf770b6 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -657,14 +657,12 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 			      struct commit_list *remoteheads,
 			      struct commit *head, const char *head_arg)
 {
-	int index_fd;
 	struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
 
-	index_fd = hold_locked_index(lock, 1);
+	hold_locked_index(lock, 1);
 	refresh_cache(REFRESH_QUIET);
 	if (active_cache_changed &&
-			(write_cache(index_fd, active_cache, active_nr) ||
-			 commit_locked_index(lock)))
+	    write_locked_index(&the_index, lock, COMMIT_LOCK))
 		return error(_("Unable to write index."));
 	rollback_lock_file(lock);
 
@@ -672,7 +670,6 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 		int clean, x;
 		struct commit *result;
 		struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
-		int index_fd;
 		struct commit_list *reversed = NULL;
 		struct merge_options o;
 		struct commit_list *j;
@@ -700,12 +697,11 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 		for (j = common; j; j = j->next)
 			commit_list_insert(j->item, &reversed);
 
-		index_fd = hold_locked_index(lock, 1);
+		hold_locked_index(lock, 1);
 		clean = merge_recursive(&o, head,
 				remoteheads->item, reversed, &result);
 		if (active_cache_changed &&
-				(write_cache(index_fd, active_cache, active_nr) ||
-				 commit_locked_index(lock)))
+		    write_locked_index(&the_index, lock, COMMIT_LOCK))
 			die (_("unable to write %s"), get_index_file());
 		rollback_lock_file(lock);
 		return clean ? 0 : 1;
diff --git a/builtin/mv.c b/builtin/mv.c
index 2a7243f..db40777 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -63,7 +63,7 @@ static struct lock_file lock_file;
 
 int cmd_mv(int argc, const char **argv, const char *prefix)
 {
-	int i, newfd, gitmodules_modified = 0;
+	int i, gitmodules_modified = 0;
 	int verbose = 0, show_only = 0, force = 0, ignore_errors = 0;
 	struct option builtin_mv_options[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
@@ -85,7 +85,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 	if (--argc < 1)
 		usage_with_options(builtin_mv_usage, builtin_mv_options);
 
-	newfd = hold_locked_index(&lock_file, 1);
+	hold_locked_index(&lock_file, 1);
 	if (read_cache() < 0)
 		die(_("index file corrupt"));
 
@@ -275,8 +275,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		stage_updated_gitmodules();
 
 	if (active_cache_changed) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(&lock_file))
+		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 			die(_("Unable to write new index file"));
 	}
 
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 0d7ef84..f26d90f 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -99,7 +99,7 @@ static struct lock_file lock_file;
 
 int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 {
-	int i, newfd, stage = 0;
+	int i, stage = 0;
 	unsigned char sha1[20];
 	struct tree_desc t[MAX_UNPACK_TREES];
 	struct unpack_trees_options opts;
@@ -149,7 +149,7 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 	argc = parse_options(argc, argv, unused_prefix, read_tree_options,
 			     read_tree_usage, 0);
 
-	newfd = hold_locked_index(&lock_file, 1);
+	hold_locked_index(&lock_file, 1);
 
 	prefix_set = opts.prefix ? 1 : 0;
 	if (1 < opts.merge + opts.reset + prefix_set)
@@ -233,8 +233,7 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 	if (nr_trees == 1 && !opts.prefix)
 		prime_cache_tree(&active_cache_tree, trees[0]);
 
-	if (write_cache(newfd, active_cache, active_nr) ||
-	    commit_locked_index(&lock_file))
+	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("unable to write new index file");
 	return 0;
 }
diff --git a/builtin/reset.c b/builtin/reset.c
index f4e0875..0c56d28 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -351,7 +351,7 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 
 	if (reset_type != SOFT) {
 		struct lock_file *lock = xcalloc(1, sizeof(*lock));
-		int newfd = hold_locked_index(lock, 1);
+		hold_locked_index(lock, 1);
 		if (reset_type == MIXED) {
 			int flags = quiet ? REFRESH_QUIET : REFRESH_IN_PORCELAIN;
 			if (read_from_tree(&pathspec, sha1, intent_to_add))
@@ -367,8 +367,7 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 				die(_("Could not reset index file to revision '%s'."), rev);
 		}
 
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(lock))
+		if (write_locked_index(&the_index, lock, COMMIT_LOCK))
 			die(_("Could not write new index file."));
 	}
 
diff --git a/builtin/rm.c b/builtin/rm.c
index 960634d..bc6490b 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -278,7 +278,7 @@ static struct option builtin_rm_options[] = {
 
 int cmd_rm(int argc, const char **argv, const char *prefix)
 {
-	int i, newfd;
+	int i;
 	struct pathspec pathspec;
 	char *seen;
 
@@ -293,7 +293,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (!index_only)
 		setup_work_tree();
 
-	newfd = hold_locked_index(&lock_file, 1);
+	hold_locked_index(&lock_file, 1);
 
 	if (read_cache() < 0)
 		die(_("index file corrupt"));
@@ -427,8 +427,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 	}
 
 	if (active_cache_changed) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(&lock_file))
+		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 			die(_("Unable to write new index file"));
 	}
 
diff --git a/builtin/update-index.c b/builtin/update-index.c
index ba54e19..42cbe4b 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -921,8 +921,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				exit(128);
 			unable_to_lock_index_die(get_index_file(), lock_error);
 		}
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(lock_file))
+		if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 			die("Unable to write new index file");
 	}
 
diff --git a/cache-tree.c b/cache-tree.c
index 7fa524a..52f8692 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -595,8 +595,7 @@ int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)
 				      active_nr, flags) < 0)
 			return WRITE_TREE_UNMERGED_INDEX;
 		if (0 <= newfd) {
-			if (!write_cache(newfd, active_cache, active_nr) &&
-			    !commit_lock_file(lock_file))
+			if (!write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 				newfd = -1;
 		}
 		/* Not being able to write is fine -- we are only interested
diff --git a/cache.h b/cache.h
index 107ac61..9cc2b97 100644
--- a/cache.h
+++ b/cache.h
@@ -301,7 +301,6 @@ extern void free_name_hash(struct index_state *istate);
 #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
 #define is_cache_unborn() is_index_unborn(&the_index)
 #define read_cache_unmerged() read_index_unmerged(&the_index)
-#define write_cache(newfd, cache, entries) write_index(&the_index, (newfd))
 #define discard_cache() discard_index(&the_index)
 #define unmerged_cache() unmerged_index(&the_index)
 #define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
@@ -456,12 +455,15 @@ extern int daemonize(void);
 	} while (0)
 
 /* Initialize and use the cache information */
+struct lock_file;
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
-extern int write_index(struct index_state *, int newfd);
+#define COMMIT_LOCK		(1 << 0)
+#define CLOSE_LOCK		(1 << 1)
+extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
diff --git a/merge-recursive.c b/merge-recursive.c
index 4177092..442c1ec 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1986,7 +1986,7 @@ int merge_recursive_generic(struct merge_options *o,
 			    const unsigned char **base_list,
 			    struct commit **result)
 {
-	int clean, index_fd;
+	int clean;
 	struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
 	struct commit *head_commit = get_ref(head, o->branch1);
 	struct commit *next_commit = get_ref(merge, o->branch2);
@@ -2003,12 +2003,11 @@ int merge_recursive_generic(struct merge_options *o,
 		}
 	}
 
-	index_fd = hold_locked_index(lock, 1);
+	hold_locked_index(lock, 1);
 	clean = merge_recursive(o, head_commit, next_commit, ca,
 			result);
 	if (active_cache_changed &&
-			(write_cache(index_fd, active_cache, active_nr) ||
-			 commit_locked_index(lock)))
+	    write_locked_index(&the_index, lock, COMMIT_LOCK))
 		return error(_("Unable to write index."));
 
 	return clean ? 0 : 1;
diff --git a/merge.c b/merge.c
index 70f1000..610725c 100644
--- a/merge.c
+++ b/merge.c
@@ -66,13 +66,13 @@ int checkout_fast_forward(const unsigned char *head,
 	struct tree *trees[MAX_UNPACK_TREES];
 	struct unpack_trees_options opts;
 	struct tree_desc t[MAX_UNPACK_TREES];
-	int i, fd, nr_trees = 0;
+	int i, nr_trees = 0;
 	struct dir_struct dir;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 
 	refresh_cache(REFRESH_QUIET);
 
-	fd = hold_locked_index(lock_file, 1);
+	hold_locked_index(lock_file, 1);
 
 	memset(&trees, 0, sizeof(trees));
 	memset(&opts, 0, sizeof(opts));
@@ -105,8 +105,7 @@ int checkout_fast_forward(const unsigned char *head,
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return -1;
-	if (write_cache(fd, active_cache, active_nr) ||
-		commit_locked_index(lock_file))
+	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 	return 0;
 }
diff --git a/read-cache.c b/read-cache.c
index ba13353..44d4732 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1779,13 +1779,11 @@ static int has_racy_timestamp(struct index_state *istate)
 void update_index_if_able(struct index_state *istate, struct lock_file *lockfile)
 {
 	if ((istate->cache_changed || has_racy_timestamp(istate)) &&
-	    !write_index(istate, lockfile->fd))
-		commit_locked_index(lockfile);
-	else
+	    write_locked_index(istate, lockfile, COMMIT_LOCK))
 		rollback_lock_file(lockfile);
 }
 
-int write_index(struct index_state *istate, int newfd)
+static int do_write_index(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
@@ -1877,6 +1875,28 @@ int write_index(struct index_state *istate, int newfd)
 	return 0;
 }
 
+static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
+				 unsigned flags)
+{
+	int ret = do_write_index(istate, lock->fd);
+	if (ret)
+		return ret;
+	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
+	       (COMMIT_LOCK | CLOSE_LOCK));
+	if (flags & COMMIT_LOCK)
+		return commit_locked_index(lock);
+	else if (flags & CLOSE_LOCK)
+		return close_lock_file(lock);
+	else
+		return ret;
+}
+
+int write_locked_index(struct index_state *istate, struct lock_file *lock,
+		       unsigned flags)
+{
+	return do_write_locked_index(istate, lock, flags);
+}
+
 /*
  * Read the index file that is potentially unmerged into given
  * index_state, dropping any unmerged entries.  Returns true if
diff --git a/rerere.c b/rerere.c
index d55aa8a..ffc6a5b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -492,8 +492,7 @@ static int update_paths(struct string_list *update)
 	}
 
 	if (!status && active_cache_changed) {
-		if (write_cache(fd, active_cache, active_nr) ||
-		    commit_locked_index(&index_lock))
+		if (write_locked_index(&the_index, &index_lock, COMMIT_LOCK))
 			die("Unable to write new index file");
 	} else if (fd >= 0)
 		rollback_lock_file(&index_lock);
diff --git a/sequencer.c b/sequencer.c
index 7b886a6..4fb0774 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -294,11 +294,11 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
 {
 	struct merge_options o;
 	struct tree *result, *next_tree, *base_tree, *head_tree;
-	int clean, index_fd;
+	int clean;
 	const char **xopt;
 	static struct lock_file index_lock;
 
-	index_fd = hold_locked_index(&index_lock, 1);
+	hold_locked_index(&index_lock, 1);
 
 	read_cache();
 
@@ -319,8 +319,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
 			    next_tree, base_tree, &result);
 
 	if (active_cache_changed &&
-	    (write_cache(index_fd, active_cache, active_nr) ||
-	     commit_locked_index(&index_lock)))
+	    write_locked_index(&the_index, &index_lock, COMMIT_LOCK))
 		/* TRANSLATORS: %s will be "revert" or "cherry-pick" */
 		die(_("%s: Unable to write new index file"), action_name(opts));
 	rollback_lock_file(&index_lock);
@@ -675,13 +674,12 @@ static void prepare_revs(struct replay_opts *opts)
 static void read_and_refresh_cache(struct replay_opts *opts)
 {
 	static struct lock_file index_lock;
 	int index_fd = hold_locked_index(&index_lock, 0);
 	if (read_index_preload(&the_index, NULL) < 0)
 		die(_("git %s: failed to read the index"), action_name(opts));
 	refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, NULL, NULL, NULL);
 	if (the_index.cache_changed && index_fd >= 0) {
-		if (write_index(&the_index, index_fd) ||
-		    commit_locked_index(&index_lock))
+		if (write_locked_index(&the_index, &index_lock, COMMIT_LOCK))
 			die(_("git %s: failed to refresh the index"), action_name(opts));
 	}
 	rollback_lock_file(&index_lock);
diff --git a/test-scrap-cache-tree.c b/test-scrap-cache-tree.c
index 4728013..9ebcbca 100644
--- a/test-scrap-cache-tree.c
+++ b/test-scrap-cache-tree.c
@@ -6,12 +6,11 @@ static struct lock_file index_lock;
 
 int main(int ac, char **av)
 {
-	int fd = hold_locked_index(&index_lock, 1);
+	hold_locked_index(&index_lock, 1);
 	if (read_cache() < 0)
 		die("unable to read index file");
 	active_cache_tree = NULL;
-	if (write_cache(fd, active_cache, active_nr)
-	    || commit_lock_file(&index_lock))
+	if (write_locked_index(&the_index, &index_lock, COMMIT_LOCK))
 		die("unable to write index file");
 	return 0;
 }
-- 
1.9.1.346.ga2b5940

* [PATCH 05/32] read-cache: relocate and unexport commit_locked_index()
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 04/32] read-cache: new API write_locked_index instead of write_index/write_cache Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 06/32] read-cache: store in-memory flags in the first 12 bits of ce_flags Nguyễn Thái Ngọc Duy
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  1 -
 lockfile.c   | 20 --------------------
 read-cache.c | 20 ++++++++++++++++++++
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 9cc2b97..e44048c 100644
--- a/cache.h
+++ b/cache.h
@@ -552,7 +552,6 @@ extern int commit_lock_file(struct lock_file *);
 extern void update_index_if_able(struct index_state *, struct lock_file *);
 
 extern int hold_locked_index(struct lock_file *, int);
-extern int commit_locked_index(struct lock_file *);
 extern void set_alternate_index_output(const char *);
 extern int close_lock_file(struct lock_file *);
 extern void rollback_lock_file(struct lock_file *);
diff --git a/lockfile.c b/lockfile.c
index 8fbcb6a..b706614 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -5,7 +5,6 @@
 #include "sigchain.h"
 
 static struct lock_file *lock_file_list;
-static const char *alternate_index_output;
 
 static void remove_lock_file(void)
 {
@@ -252,25 +251,6 @@ int hold_locked_index(struct lock_file *lk, int die_on_error)
 					 : 0);
 }
 
-void set_alternate_index_output(const char *name)
-{
-	alternate_index_output = name;
-}
-
-int commit_locked_index(struct lock_file *lk)
-{
-	if (alternate_index_output) {
-		if (lk->fd >= 0 && close_lock_file(lk))
-			return -1;
-		if (rename(lk->filename, alternate_index_output))
-			return -1;
-		lk->filename[0] = 0;
-		return 0;
-	}
-	else
-		return commit_lock_file(lk);
-}
-
 void rollback_lock_file(struct lock_file *lk)
 {
 	if (lk->filename[0]) {
diff --git a/read-cache.c b/read-cache.c
index 44d4732..576e506 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -36,6 +36,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 
 struct index_state the_index;
+static const char *alternate_index_output;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
@@ -1875,6 +1876,25 @@ static int do_write_index(struct index_state *istate, int newfd)
 	return 0;
 }
 
+void set_alternate_index_output(const char *name)
+{
+	alternate_index_output = name;
+}
+
+static int commit_locked_index(struct lock_file *lk)
+{
+	if (alternate_index_output) {
+		if (lk->fd >= 0 && close_lock_file(lk))
+			return -1;
+		if (rename(lk->filename, alternate_index_output))
+			return -1;
+		lk->filename[0] = 0;
+		return 0;
+	}
+	else
+		return commit_lock_file(lk);
+}
+
 static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
 				 unsigned flags)
 {
-- 
1.9.1.346.ga2b5940

* [PATCH 06/32] read-cache: store in-memory flags in the first 12 bits of ce_flags
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 05/32] read-cache: relocate and unexport commit_locked_index() Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 07/32] read-cache: be strict about "changed" in remove_marked_cache_entries() Nguyễn Thái Ngọc Duy
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to the ce_namelen field, so those
bits are free to use. Just make sure we do not accidentally write any
in-memory flags back to disk.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
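Illustration only, not part of the patch: a minimal standalone sketch
of the masking done in copy_cache_entry_to_ondisk() below. The DEMO_*
names and demo_ondisk_flags() are hypothetical stand-ins, the mask
value is assumed to match CE_NAMEMASK (0x0fff), and the htons()
conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match CE_NAMEMASK: the low 12 bits of the flags word. */
#define DEMO_NAMEMASK 0x0fffu

/* Hypothetical in-memory-only flag living inside those low 12 bits. */
#define DEMO_INMEM_FLAG (1u << 0)

/*
 * Build the 16-bit on-disk flags word: clear whatever the low 12 bits
 * hold in memory, then store the name length there, clamped to the
 * mask when the name is too long to fit.
 */
static uint16_t demo_ondisk_flags(uint32_t ce_flags, unsigned int namelen)
{
	uint32_t flags = ce_flags & ~DEMO_NAMEMASK;
	unsigned int len = namelen >= DEMO_NAMEMASK ? DEMO_NAMEMASK : namelen;

	flags |= len;
	/* The low bits now carry only the length, never in-memory flags. */
	assert((flags & DEMO_NAMEMASK) == len);
	/* The on-disk field is 16 bits wide, so higher bits are dropped. */
	return (uint16_t)flags;
}
```

Without the initial mask, an in-memory bit set in the low 12 bits
would be OR'ed into the length and written out as part of it.
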
 cache.h      | 2 +-
 read-cache.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index e44048c..57ad318 100644
--- a/cache.h
+++ b/cache.h
@@ -145,7 +145,7 @@ struct cache_entry {
 #define CE_STAGESHIFT 12
 
 /*
- * Range 0xFFFF0000 in ce_flags is divided into
+ * Range 0xFFFF0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
  * Flags in CE_EXTENDED_FLAGS will get saved on-disk
  * if you want to save a new flag, add it in
diff --git a/read-cache.c b/read-cache.c
index 576e506..5761b1f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1702,7 +1702,7 @@ static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
 	ondisk->size = htonl(ce->ce_stat_data.sd_size);
 	hashcpy(ondisk->sha1, ce->sha1);
 
-	flags = ce->ce_flags;
+	flags = ce->ce_flags & ~CE_NAMEMASK;
 	flags |= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));
 	ondisk->flags = htons(flags);
 	if (ce->ce_flags & CE_EXTENDED) {
-- 
1.9.1.346.ga2b5940

* [PATCH 07/32] read-cache: be strict about "changed" in remove_marked_cache_entries()
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 06/32] read-cache: store in-memory flags in the first 12 bits of ce_flags Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 08/32] read-cache: be specific what part of the index has changed Nguyễn Thái Ngọc Duy
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

remove_marked_cache_entries() deletes entries marked with
CE_REMOVE. But if there is no such entry, do not mark the index as
"changed" because that could trigger an index update unnecessarily.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
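Illustration only, not part of the patch: a minimal standalone sketch
of the compaction loop with the new early return. struct demo_entry,
struct demo_index, DEMO_REMOVE and demo_remove_marked() are
hypothetical reductions of the real types; freeing of removed entries
and name-hash bookkeeping are left out.

```c
#include <assert.h>

#define DEMO_REMOVE 1u	/* hypothetical stand-in for CE_REMOVE */

struct demo_entry {
	unsigned int flags;
};

struct demo_index {
	struct demo_entry **cache;
	unsigned int cache_nr;
	unsigned int cache_changed;
};

/* Compact the array in place, dropping entries marked for removal. */
static void demo_remove_marked(struct demo_index *istate)
{
	struct demo_entry **ce_array;
	unsigned int i, j;

	assert(istate);
	ce_array = istate->cache;
	for (i = j = 0; i < istate->cache_nr; i++) {
		if (ce_array[i]->flags & DEMO_REMOVE)
			continue;	/* dropped; the real code also frees it */
		ce_array[j++] = ce_array[i];
	}
	if (j == istate->cache_nr)
		return;		/* nothing was marked: not a change */
	istate->cache_changed = 1;
	istate->cache_nr = j;
}
```

With no marked entries, j ends up equal to cache_nr and cache_changed
keeps its previous value, so callers that test it will not rewrite the
index for nothing.
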
 read-cache.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 5761b1f..9819363 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -510,6 +510,8 @@ void remove_marked_cache_entries(struct index_state *istate)
 		else
 			ce_array[j++] = ce_array[i];
 	}
+	if (j == istate->cache_nr)
+		return;
 	istate->cache_changed = 1;
 	istate->cache_nr = j;
 }
-- 
1.9.1.346.ga2b5940

* [PATCH 08/32] read-cache: be specific what part of the index has changed
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 07/32] read-cache: be strict about "changed" in remove_marked_cache_entries() Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 09/32] update-index: " Nguyễn Thái Ngọc Duy
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
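Illustration only, not part of the patch: a minimal standalone sketch
of how cache_changed behaves once it is treated as a bitmask. The
three flag values are copied from the cache.h hunk below; struct
demo_state and the demo_*() helpers are hypothetical.

```c
#include <assert.h>

/* Values as defined in the cache.h hunk of this patch. */
#define CE_ENTRY_CHANGED	(1 << 0)
#define CE_ENTRY_REMOVED	(1 << 1)
#define CE_ENTRY_ADDED		(1 << 2)

/* Hypothetical reduction of struct index_state. */
struct demo_state {
	unsigned int cache_changed;
};

/* Record one kind of change without losing earlier ones. */
static void demo_mark(struct demo_state *s, unsigned int what)
{
	assert(s);
	s->cache_changed |= what;
}

/* Plain truth tests on the field keep working unchanged. */
static int demo_needs_write(const struct demo_state *s)
{
	return s->cache_changed != 0;
}

/* New ability: ask whether entries were only updated in place. */
static int demo_only_entry_updates(const struct demo_state *s)
{
	return s->cache_changed == CE_ENTRY_CHANGED;
}
```

Existing callers that only check whether cache_changed is non-zero
are unaffected, while a caller that cares can now distinguish
updates from additions and removals.
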
 cache.h      |  4 ++++
 read-cache.c | 11 ++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 57ad318..d692b74 100644
--- a/cache.h
+++ b/cache.h
@@ -268,6 +268,10 @@ static inline unsigned int canon_mode(unsigned int mode)
 
 #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
 
+#define CE_ENTRY_CHANGED	(1 << 0)
+#define CE_ENTRY_REMOVED	(1 << 1)
+#define CE_ENTRY_ADDED		(1 << 2)
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
diff --git a/read-cache.c b/read-cache.c
index 9819363..6971fc4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -51,7 +51,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 	remove_name_hash(istate, old);
 	free(old);
 	set_index_entry(istate, nr, ce);
-	istate->cache_changed = 1;
+	istate->cache_changed |= CE_ENTRY_CHANGED;
 }
 
 void rename_index_entry_at(struct index_state *istate, int nr, const char *new_name)
@@ -482,7 +482,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 	record_resolve_undo(istate, ce);
 	remove_name_hash(istate, ce);
 	free(ce);
-	istate->cache_changed = 1;
+	istate->cache_changed |= CE_ENTRY_REMOVED;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
 		return 0;
@@ -512,7 +512,7 @@ void remove_marked_cache_entries(struct index_state *istate)
 	}
 	if (j == istate->cache_nr)
 		return;
-	istate->cache_changed = 1;
+	istate->cache_changed |= CE_ENTRY_REMOVED;
 	istate->cache_nr = j;
 }
 
@@ -1002,7 +1002,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 			istate->cache + pos,
 			(istate->cache_nr - pos - 1) * sizeof(ce));
 	set_index_entry(istate, pos, ce);
-	istate->cache_changed = 1;
+	istate->cache_changed |= CE_ENTRY_ADDED;
 	return 0;
 }
 
@@ -1101,6 +1101,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	    !(ce->ce_flags & CE_VALID))
 		updated->ce_flags &= ~CE_VALID;
 
+	/* istate->cache_changed is updated in the caller */
 	return updated;
 }
 
@@ -1182,7 +1183,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 				 * means the index is not valid anymore.
 				 */
 				ce->ce_flags &= ~CE_VALID;
-				istate->cache_changed = 1;
+				istate->cache_changed |= CE_ENTRY_CHANGED;
 			}
 			if (quiet)
 				continue;
-- 
1.9.1.346.ga2b5940

* [PATCH 09/32] update-index: be specific what part of the index has changed
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (7 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 08/32] read-cache: be specific what part of the index has changed Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 10/32] resolve-undo: " Nguyễn Thái Ngọc Duy
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/update-index.c | 6 +++---
 cache.h                | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 42cbe4b..e0e881b 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -56,7 +56,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 		else
 			active_cache[pos]->ce_flags &= ~flag;
 		cache_tree_invalidate_path(active_cache_tree, path);
-		active_cache_changed = 1;
+		active_cache_changed |= CE_ENTRY_CHANGED;
 		return 0;
 	}
 	return -1;
@@ -268,7 +268,7 @@ static void chmod_path(int flip, const char *path)
 		goto fail;
 	}
 	cache_tree_invalidate_path(active_cache_tree, path);
-	active_cache_changed = 1;
+	active_cache_changed |= CE_ENTRY_CHANGED;
 	report("chmod %cx '%s'", flip, path);
 	return;
  fail:
@@ -889,7 +889,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    INDEX_FORMAT_LB, INDEX_FORMAT_UB);
 
 		if (the_index.version != preferred_index_format)
-			active_cache_changed = 1;
+			active_cache_changed |= SOMETHING_CHANGED;
 		the_index.version = preferred_index_format;
 	}
 
diff --git a/cache.h b/cache.h
index d692b74..4133797 100644
--- a/cache.h
+++ b/cache.h
@@ -271,6 +271,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CE_ENTRY_CHANGED	(1 << 0)
 #define CE_ENTRY_REMOVED	(1 << 1)
 #define CE_ENTRY_ADDED		(1 << 2)
+#define SOMETHING_CHANGED	(1 << 3) /* unclassified changes go here */
 
 struct index_state {
 	struct cache_entry **cache;
-- 
1.9.1.346.ga2b5940

* [PATCH 10/32] resolve-undo: be specific what part of the index has changed
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (8 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 09/32] update-index: " Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 11/32] unpack-trees: " Nguyễn Thái Ngọc Duy
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h        | 1 +
 resolve-undo.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 4133797..7155052 100644
--- a/cache.h
+++ b/cache.h
@@ -272,6 +272,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CE_ENTRY_REMOVED	(1 << 1)
 #define CE_ENTRY_ADDED		(1 << 2)
 #define SOMETHING_CHANGED	(1 << 3) /* unclassified changes go here */
+#define RESOLVE_UNDO_CHANGED	(1 << 4)
 
 struct index_state {
 	struct cache_entry **cache;
diff --git a/resolve-undo.c b/resolve-undo.c
index 44c697c..468a2eb 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -110,7 +110,7 @@ void resolve_undo_clear_index(struct index_state *istate)
 	string_list_clear(resolve_undo, 1);
 	free(resolve_undo);
 	istate->resolve_undo = NULL;
-	istate->cache_changed = 1;
+	istate->cache_changed |= RESOLVE_UNDO_CHANGED;
 }
 
 int unmerge_index_entry_at(struct index_state *istate, int pos)
-- 
1.9.1.346.ga2b5940

* [PATCH 11/32] unpack-trees: be specific what part of the index has changed
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (9 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 10/32] resolve-undo: " Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 12/32] cache-tree: mark istate->cache_changed on cache tree invalidation Nguyễn Thái Ngọc Duy
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 97fc995..a722685 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -246,7 +246,9 @@ static int verify_absent_sparse(const struct cache_entry *ce,
 				enum unpack_trees_error_types,
 				struct unpack_trees_options *o);
 
-static int apply_sparse_checkout(struct cache_entry *ce, struct unpack_trees_options *o)
+static int apply_sparse_checkout(struct index_state *istate,
+				 struct cache_entry *ce,
+				 struct unpack_trees_options *o)
 {
 	int was_skip_worktree = ce_skip_worktree(ce);
 
@@ -254,6 +256,8 @@ static int apply_sparse_checkout(struct cache_entry *ce, struct unpack_trees_opt
 		ce->ce_flags |= CE_SKIP_WORKTREE;
 	else
 		ce->ce_flags &= ~CE_SKIP_WORKTREE;
+	if (was_skip_worktree != ce_skip_worktree(ce))
+		istate->cache_changed |= CE_ENTRY_CHANGED;
 
 	/*
 	 * if (!was_skip_worktree && !ce_skip_worktree()) {
@@ -1131,7 +1135,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 				ret = -1;
 			}
 
-			if (apply_sparse_checkout(ce, o)) {
+			if (apply_sparse_checkout(&o->result, ce, o)) {
 				if (!o->show_all_errors)
 					goto return_failed;
 				ret = -1;
-- 
1.9.1.346.ga2b5940

* [PATCH 12/32] cache-tree: mark istate->cache_changed on cache tree invalidation
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (10 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 11/32] unpack-trees: " Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 13/32] cache-tree: mark istate->cache_changed on cache tree update Nguyễn Thái Ngọc Duy
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/blame.c        |  2 +-
 builtin/update-index.c |  4 ++--
 cache-tree.c           | 15 +++++++++++----
 cache-tree.h           |  2 +-
 cache.h                |  1 +
 read-cache.c           |  6 +++---
 unpack-trees.c         |  2 +-
 7 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 88cb799..914d919 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2126,7 +2126,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	 * right now, but someday we might optimize diff-index --cached
 	 * with cache-tree information.
 	 */
-	cache_tree_invalidate_path(active_cache_tree, path);
+	cache_tree_invalidate_path(&the_index, path);
 
 	return commit;
 }
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e0e881b..fa3c441 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -55,7 +55,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 			active_cache[pos]->ce_flags |= flag;
 		else
 			active_cache[pos]->ce_flags &= ~flag;
-		cache_tree_invalidate_path(active_cache_tree, path);
+		cache_tree_invalidate_path(&the_index, path);
 		active_cache_changed |= CE_ENTRY_CHANGED;
 		return 0;
 	}
@@ -267,7 +267,7 @@ static void chmod_path(int flip, const char *path)
 	default:
 		goto fail;
 	}
-	cache_tree_invalidate_path(active_cache_tree, path);
+	cache_tree_invalidate_path(&the_index, path);
 	active_cache_changed |= CE_ENTRY_CHANGED;
 	report("chmod %cx '%s'", flip, path);
 	return;
diff --git a/cache-tree.c b/cache-tree.c
index 52f8692..23ddc73 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -98,7 +98,7 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *it, const char *path)
 	return find_subtree(it, path, pathlen, 1);
 }
 
-void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
+static int do_invalidate_path(struct cache_tree *it, const char *path)
 {
 	/* a/b/c
 	 * ==> invalidate self
@@ -116,7 +116,7 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
 #endif
 
 	if (!it)
-		return;
+		return 0;
 	slash = strchrnul(path, '/');
 	namelen = slash - path;
 	it->entry_count = -1;
@@ -137,11 +137,18 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
 				(it->subtree_nr - pos - 1));
 			it->subtree_nr--;
 		}
-		return;
+		return 1;
 	}
 	down = find_subtree(it, path, namelen, 0);
 	if (down)
-		cache_tree_invalidate_path(down->cache_tree, slash + 1);
+		do_invalidate_path(down->cache_tree, slash + 1);
+	return 1;
+}
+
+void cache_tree_invalidate_path(struct index_state *istate, const char *path)
+{
+	if (do_invalidate_path(istate->cache_tree, path))
+		istate->cache_changed |= CACHE_TREE_CHANGED;
 }
 
 static int verify_cache(const struct cache_entry * const *cache,
diff --git a/cache-tree.h b/cache-tree.h
index f1923ad..dfbcfab 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -23,7 +23,7 @@ struct cache_tree {
 
 struct cache_tree *cache_tree(void);
 void cache_tree_free(struct cache_tree **);
-void cache_tree_invalidate_path(struct cache_tree *, const char *);
+void cache_tree_invalidate_path(struct index_state *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
diff --git a/cache.h b/cache.h
index 7155052..4c288e8 100644
--- a/cache.h
+++ b/cache.h
@@ -273,6 +273,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CE_ENTRY_ADDED		(1 << 2)
 #define SOMETHING_CHANGED	(1 << 3) /* unclassified changes go here */
 #define RESOLVE_UNDO_CHANGED	(1 << 4)
+#define CACHE_TREE_CHANGED	(1 << 5)
 
 struct index_state {
 	struct cache_entry **cache;
diff --git a/read-cache.c b/read-cache.c
index 6971fc4..f1265d4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -65,7 +65,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	new->ce_namelen = namelen;
 	memcpy(new->name, new_name, namelen + 1);
 
-	cache_tree_invalidate_path(istate->cache_tree, old->name);
+	cache_tree_invalidate_path(istate, old->name);
 	remove_index_entry_at(istate, nr);
 	add_index_entry(istate, new, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 }
@@ -521,7 +521,7 @@ int remove_file_from_index(struct index_state *istate, const char *path)
 	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0)
 		pos = -pos-1;
-	cache_tree_invalidate_path(istate->cache_tree, path);
+	cache_tree_invalidate_path(istate, path);
 	while (pos < istate->cache_nr && !strcmp(istate->cache[pos]->name, path))
 		remove_index_entry_at(istate, pos);
 	return 0;
@@ -939,7 +939,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
-	cache_tree_invalidate_path(istate->cache_tree, ce->name);
+	cache_tree_invalidate_path(istate, ce->name);
 	pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
 
 	/* existing match? Just replace it. */
diff --git a/unpack-trees.c b/unpack-trees.c
index a722685..3beff8a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1263,7 +1263,7 @@ static void invalidate_ce_path(const struct cache_entry *ce,
 			       struct unpack_trees_options *o)
 {
 	if (ce)
-		cache_tree_invalidate_path(o->src_index->cache_tree, ce->name);
+		cache_tree_invalidate_path(o->src_index, ce->name);
 }
 
 /*
-- 
1.9.1.346.ga2b5940

* [PATCH 13/32] cache-tree: mark istate->cache_changed on cache tree update
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (11 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 12/32] cache-tree: mark istate->cache_changed on cache tree invalidation Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 14/32] cache-tree: mark istate->cache_changed on prime_cache_tree() Nguyễn Thái Ngọc Duy
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c           | 25 +++++++++++--------------
 cache-tree.h           |  2 +-
 merge-recursive.c      |  4 +---
 sequencer.c            |  4 +---
 test-dump-cache-tree.c |  7 ++++---
 5 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 23ddc73..18055f1 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -151,7 +151,7 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
 		istate->cache_changed |= CACHE_TREE_CHANGED;
 }
 
-static int verify_cache(const struct cache_entry * const *cache,
+static int verify_cache(struct cache_entry **cache,
 			int entries, int flags)
 {
 	int i, funny;
@@ -236,7 +236,7 @@ int cache_tree_fully_valid(struct cache_tree *it)
 }
 
 static int update_one(struct cache_tree *it,
-		      const struct cache_entry * const *cache,
+		      struct cache_entry **cache,
 		      int entries,
 		      const char *base,
 		      int baselen,
@@ -398,18 +398,19 @@ static int update_one(struct cache_tree *it,
 	return i;
 }
 
-int cache_tree_update(struct cache_tree *it,
-		      const struct cache_entry * const *cache,
-		      int entries,
-		      int flags)
+int cache_tree_update(struct index_state *istate, int flags)
 {
-	int i, skip;
-	i = verify_cache(cache, entries, flags);
+	struct cache_tree *it = istate->cache_tree;
+	struct cache_entry **cache = istate->cache;
+	int entries = istate->cache_nr;
+	int skip, i = verify_cache(cache, entries, flags);
+
 	if (i)
 		return i;
 	i = update_one(it, cache, entries, "", 0, &skip, flags);
 	if (i < 0)
 		return i;
+	istate->cache_changed |= CACHE_TREE_CHANGED;
 	return 0;
 }
 
@@ -597,9 +598,7 @@ int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)
 
 	was_valid = cache_tree_fully_valid(active_cache_tree);
 	if (!was_valid) {
-		if (cache_tree_update(active_cache_tree,
-				      (const struct cache_entry * const *)active_cache,
-				      active_nr, flags) < 0)
+		if (cache_tree_update(&the_index, flags) < 0)
 			return WRITE_TREE_UNMERGED_INDEX;
 		if (0 <= newfd) {
 			if (!write_locked_index(&the_index, lock_file, COMMIT_LOCK))
@@ -698,7 +697,5 @@ int update_main_cache_tree(int flags)
 {
 	if (!the_index.cache_tree)
 		the_index.cache_tree = cache_tree();
-	return cache_tree_update(the_index.cache_tree,
-				 (const struct cache_entry * const *)the_index.cache,
-				 the_index.cache_nr, flags);
+	return cache_tree_update(&the_index, flags);
 }
diff --git a/cache-tree.h b/cache-tree.h
index dfbcfab..154b357 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -30,7 +30,7 @@ void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
 int cache_tree_fully_valid(struct cache_tree *);
-int cache_tree_update(struct cache_tree *, const struct cache_entry * const *, int, int);
+int cache_tree_update(struct index_state *, int);
 
 int update_main_cache_tree(int);
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 442c1ec..0b5d34d 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -265,9 +265,7 @@ struct tree *write_tree_from_memory(struct merge_options *o)
 		active_cache_tree = cache_tree();
 
 	if (!cache_tree_fully_valid(active_cache_tree) &&
-	    cache_tree_update(active_cache_tree,
-			      (const struct cache_entry * const *)active_cache,
-			      active_nr, 0) < 0)
+	    cache_tree_update(&the_index, 0) < 0)
 		die(_("error building trees"));
 
 	result = lookup_tree(active_cache_tree->sha1);
diff --git a/sequencer.c b/sequencer.c
index 4fb0774..377c877 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -371,9 +371,7 @@ static int is_index_unchanged(void)
 		active_cache_tree = cache_tree();
 
 	if (!cache_tree_fully_valid(active_cache_tree))
-		if (cache_tree_update(active_cache_tree,
-				      (const struct cache_entry * const *)active_cache,
-				      active_nr, 0))
+		if (cache_tree_update(&the_index, 0))
 			return error(_("Unable to update cache tree\n"));
 
 	return !hashcmp(active_cache_tree->sha1, head_commit->tree->object.sha1);
diff --git a/test-dump-cache-tree.c b/test-dump-cache-tree.c
index 47eab97..330ba4f 100644
--- a/test-dump-cache-tree.c
+++ b/test-dump-cache-tree.c
@@ -56,11 +56,12 @@ static int dump_cache_tree(struct cache_tree *it,
 
 int main(int ac, char **av)
 {
+	struct index_state istate;
 	struct cache_tree *another = cache_tree();
 	if (read_cache() < 0)
 		die("unable to read index file");
-	cache_tree_update(another,
-			  (const struct cache_entry * const *)active_cache,
-			  active_nr, WRITE_TREE_DRY_RUN);
+	istate = the_index;
+	istate.cache_tree = another;
+	cache_tree_update(&istate, WRITE_TREE_DRY_RUN);
 	return dump_cache_tree(active_cache_tree, another, "");
 }
-- 
1.9.1.346.ga2b5940

* [PATCH 14/32] cache-tree: mark istate->cache_changed on prime_cache_tree()
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (12 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 13/32] cache-tree: mark istate->cache_changed on cache tree update Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 15/32] entry.c: update cache_changed if refresh_cache is set in checkout_entry() Nguyễn Thái Ngọc Duy
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/read-tree.c | 2 +-
 builtin/reset.c     | 2 +-
 cache-tree.c        | 9 +++++----
 cache-tree.h        | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index f26d90f..3204c62 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -231,7 +231,7 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 	 * what came from the tree.
 	 */
 	if (nr_trees == 1 && !opts.prefix)
-		prime_cache_tree(&active_cache_tree, trees[0]);
+		prime_cache_tree(&the_index, trees[0]);
 
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("unable to write new index file");
diff --git a/builtin/reset.c b/builtin/reset.c
index 0c56d28..234b2eb 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -84,7 +84,7 @@ static int reset_index(const unsigned char *sha1, int reset_type, int quiet)
 
 	if (reset_type == MIXED || reset_type == HARD) {
 		tree = parse_tree_indirect(sha1);
-		prime_cache_tree(&active_cache_tree, tree);
+		prime_cache_tree(&the_index, tree);
 	}
 
 	return 0;
diff --git a/cache-tree.c b/cache-tree.c
index 18055f1..c53f7de 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -654,11 +654,12 @@ static void prime_cache_tree_rec(struct cache_tree *it, struct tree *tree)
 	it->entry_count = cnt;
 }
 
-void prime_cache_tree(struct cache_tree **it, struct tree *tree)
+void prime_cache_tree(struct index_state *istate, struct tree *tree)
 {
-	cache_tree_free(it);
-	*it = cache_tree();
-	prime_cache_tree_rec(*it, tree);
+	cache_tree_free(&istate->cache_tree);
+	istate->cache_tree = cache_tree();
+	prime_cache_tree_rec(istate->cache_tree, tree);
+	istate->cache_changed |= CACHE_TREE_CHANGED;
 }
 
 /*
diff --git a/cache-tree.h b/cache-tree.h
index 154b357..b47ccec 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -46,7 +46,7 @@ int update_main_cache_tree(int);
 #define WRITE_TREE_PREFIX_ERROR (-3)
 
 int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix);
-void prime_cache_tree(struct cache_tree **, struct tree *);
+void prime_cache_tree(struct index_state *, struct tree *);
 
 extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
 
-- 
1.9.1.346.ga2b5940

* [PATCH 15/32] entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (13 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 14/32] cache-tree: mark istate->cache_changed on prime_cache_tree() Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 16/32] read-cache: save index SHA-1 after reading Nguyễn Thái Ngọc Duy
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The other fill_stat_cache_info() callers operate on new entries, which
should already set CE_ENTRY_ADDED in cache_changed, so we're safe.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/apply.c          | 8 +++++---
 builtin/checkout-index.c | 1 +
 builtin/checkout.c       | 1 +
 cache.h                  | 1 +
 entry.c                  | 2 ++
 unpack-trees.c           | 1 +
 6 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 5e13444..adca035 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3084,13 +3084,15 @@ static void prepare_fn_table(struct patch *patch)
 	}
 }
 
-static int checkout_target(struct cache_entry *ce, struct stat *st)
+static int checkout_target(struct index_state *istate,
+			   struct cache_entry *ce, struct stat *st)
 {
 	struct checkout costate;
 
 	memset(&costate, 0, sizeof(costate));
 	costate.base_dir = "";
 	costate.refresh_cache = 1;
+	costate.istate = istate;
 	if (checkout_entry(ce, &costate, NULL) || lstat(ce->name, st))
 		return error(_("cannot checkout %s"), ce->name);
 	return 0;
@@ -3257,7 +3259,7 @@ static int load_current(struct image *image, struct patch *patch)
 	if (lstat(name, &st)) {
 		if (errno != ENOENT)
 			return error(_("%s: %s"), name, strerror(errno));
-		if (checkout_target(ce, &st))
+		if (checkout_target(&the_index, ce, &st))
 			return -1;
 	}
 	if (verify_index_match(ce, &st))
@@ -3411,7 +3413,7 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s
 		}
 		*ce = active_cache[pos];
 		if (stat_ret < 0) {
-			if (checkout_target(*ce, st))
+			if (checkout_target(&the_index, *ce, st))
 				return -1;
 		}
 		if (!cached && verify_index_match(*ce, st))
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 9e49bf2..05edd9e 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -135,6 +135,7 @@ static int option_parse_u(const struct option *opt,
 	int *newfd = opt->value;
 
 	state.refresh_cache = 1;
+	state.istate = &the_index;
 	if (*newfd < 0)
 		*newfd = hold_locked_index(&lock_file, 1);
 	return 0;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 944a634..146ab91 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -336,6 +336,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 	memset(&state, 0, sizeof(state));
 	state.force = 1;
 	state.refresh_cache = 1;
+	state.istate = &the_index;
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
diff --git a/cache.h b/cache.h
index 4c288e8..9bbebab 100644
--- a/cache.h
+++ b/cache.h
@@ -1063,6 +1063,7 @@ extern int split_ident_line(struct ident_split *, const char *, int);
 extern int ident_cmp(const struct ident_split *, const struct ident_split *);
 
 struct checkout {
+	struct index_state *istate;
 	const char *base_dir;
 	int base_dir_len;
 	unsigned force:1,
diff --git a/entry.c b/entry.c
index 77c6882..d913c1d 100644
--- a/entry.c
+++ b/entry.c
@@ -210,9 +210,11 @@ static int write_entry(struct cache_entry *ce,
 
 finish:
 	if (state->refresh_cache) {
+		assert(state->istate);
 		if (!fstat_done)
 			lstat(ce->name, &st);
 		fill_stat_cache_info(ce, &st);
+		state->istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
 	return 0;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 3beff8a..26f65c7 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1029,6 +1029,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	state.force = 1;
 	state.quiet = 1;
 	state.refresh_cache = 1;
+	state.istate = &o->result;
 
 	memset(&el, 0, sizeof(el));
 	if (!core_apply_sparse_checkout || !o->update)
-- 
1.9.1.346.ga2b5940

* [PATCH 16/32] read-cache: save index SHA-1 after reading
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (14 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 15/32] entry.c: update cache_changed if refresh_cache is set in checkout_entry() Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 17/32] read-cache: split-index mode Nguyễn Thái Ngọc Duy
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Also update the SHA-1 after writing. If we do not, a second
read_index() will see the "initialized" flag already set and not read
.git/index again, which is fine, except that istate->sha1 would then
hold a stale value.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
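A minimal sketch of the read-side idea, not the code in this patch:
the index file ends with a 20-byte SHA-1 trailer, so the checksum can
be copied straight out of the mapped buffer. trailing_sha1() and
SHA1_LEN are hypothetical names used only for illustration; the patch
itself does this with hashcpy() on hdr + mmap_size - 20.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_LEN 20	/* hypothetical name for the raw SHA-1 size */

/*
 * Copy the last SHA1_LEN bytes of a mapped index file into out.
 * Returns 0 on success, -1 if the buffer is too short for a trailer.
 */
static int trailing_sha1(const unsigned char *map, size_t map_size,
			 unsigned char *out)
{
	assert(map && out);
	if (map_size < SHA1_LEN)
		return -1;
	memcpy(out, map + map_size - SHA1_LEN, SHA1_LEN);
	return 0;
}
```

The write side in the patch does the mirror image: ce_flush() hands
back the freshly computed checksum so istate->sha1 never goes stale.
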
 cache.h        | 1 +
 read-cache.c   | 6 ++++--
 unpack-trees.c | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 9bbebab..0f6247c 100644
--- a/cache.h
+++ b/cache.h
@@ -286,6 +286,7 @@ struct index_state {
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
+	unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index f1265d4..723d769 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1481,6 +1481,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	if (verify_hdr(hdr, mmap_size) < 0)
 		goto unmap;
 
+	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
 	istate->version = ntohl(hdr->hdr_version);
 	istate->cache_nr = ntohl(hdr->hdr_entries);
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
@@ -1616,7 +1617,7 @@ static int write_index_ext_header(git_SHA_CTX *context, int fd,
 		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
 
-static int ce_flush(git_SHA_CTX *context, int fd)
+static int ce_flush(git_SHA_CTX *context, int fd, unsigned char *sha1)
 {
 	unsigned int left = write_buffer_len;
 
@@ -1634,6 +1635,7 @@ static int ce_flush(git_SHA_CTX *context, int fd)
 
 	/* Append the SHA1 signature at the end */
 	git_SHA1_Final(write_buffer + left, context);
+	hashcpy(sha1, write_buffer + left);
 	left += 20;
 	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
 }
@@ -1872,7 +1874,7 @@ static int do_write_index(struct index_state *istate, int newfd)
 			return -1;
 	}
 
-	if (ce_flush(&c, newfd) || fstat(newfd, &st))
+	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
diff --git a/unpack-trees.c b/unpack-trees.c
index 26f65c7..f594932 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1046,6 +1046,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->result.timestamp.sec = o->src_index->timestamp.sec;
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
 	o->result.version = o->src_index->version;
+	hashcpy(o->result.sha1, o->src_index->sha1);
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
 
-- 
1.9.1.346.ga2b5940

* [PATCH 17/32] read-cache: split-index mode
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (15 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 16/32] read-cache: save index SHA-1 after reading Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 22:46   ` Junio C Hamano
  2014-04-28 10:55 ` [PATCH 18/32] read-cache: mark new entries for split index Nguyễn Thái Ngọc Duy
                   ` (17 subsequent siblings)
  34 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt with separately.)

This mode stores index info in a pair of files, $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. The shared index is large and stays
unchanged over time, while "index" is smaller and updated often.
Format details are in index-format.txt, although not everything is
implemented in this patch.

Shared indexes are not automatically removed, because just looking at
a shared index does not tell whether any (even temporary) index still
needs it. After a while you'll collect stale shared indexes. The good
news is that one shared index stays usable for a long time, until
$GIT_DIR/index becomes so big and sluggish that a new shared index
must be created.

The safest way to clean up shared indexes is to turn off split index
mode so that all shared files become garbage, delete them all, then
turn split index mode on again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
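To illustrate the format described in index-format.txt, here is a
rough sketch of how a reader could rebuild the final index from the
shared index plus the link extension. It is not the code in this
series: plain bool arrays stand in for the ewah-encoded bitmaps,
struct entry is a toy stand-in for struct cache_entry, stages are
ignored when sorting, and apply_split_index() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct entry {
	const char *name;	/* "" in a replacement: reuse the base name */
	int data;		/* stands in for stat info, SHA-1, flags */
};

/*
 * Build the final index from a shared (base) index plus the changes
 * recorded by the link extension. Returns the number of entries
 * written to out, which must have room for nr_base + nr_added.
 */
static size_t apply_split_index(const struct entry *base, size_t nr_base,
				const bool *del, const bool *rep,
				const struct entry *replaced,
				const struct entry *added, size_t nr_added,
				struct entry *out)
{
	size_t i, r = 0, a = 0, n = 0;

	assert(out != NULL);
	for (i = 0; i < nr_base; i++) {
		struct entry e = base[i];

		/* Replace first: the bitmap refers to original positions. */
		if (rep[i]) {
			e.data = replaced[r].data;
			if (replaced[r].name[0])
				e.name = replaced[r].name;
			r++;
		}
		/* Only now drop entries marked in the delete bitmap. */
		if (del[i])
			continue;
		/* Added entries are sorted, so merge them in by name. */
		while (a < nr_added && strcmp(added[a].name, e.name) < 0)
			out[n++] = added[a++];
		out[n++] = e;
	}
	while (a < nr_added)
		out[n++] = added[a++];
	return n;
}
```

Doing replacement before deletion in a single pass keeps the bitmap
positions valid, which is the reason the format text suggests marking
entries for removal rather than deleting them right away.
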
 Documentation/gitrepository-layout.txt   |  4 ++
 Documentation/technical/index-format.txt | 35 ++++++++++++
 Makefile                                 |  1 +
 cache.h                                  |  3 +
 read-cache.c                             | 96 ++++++++++++++++++++++++++++++--
 split-index.c (new)                      | 90 ++++++++++++++++++++++++++++++
 split-index.h (new)                      | 25 +++++++++
 unpack-trees.c                           |  4 ++
 8 files changed, 253 insertions(+), 5 deletions(-)
 create mode 100644 split-index.c
 create mode 100644 split-index.h

diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt
index 17d2ea6..79653f3 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -155,6 +155,10 @@ index::
 	The current index file for the repository.  It is
 	usually not found in a bare repository.
 
+sharedindex.<SHA-1>::
+	The shared index part, to be referenced by $GIT_DIR/index and
+	other temporary index files. Only valid in split index mode.
+
 info::
 	Additional information about the repository is recorded
 	in this directory.
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index f352a9b..fe6f316 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -129,6 +129,9 @@ Git index format
   (Version 4) In version 4, the padding after the pathname does not
   exist.
 
+  Interpretation of index entries in split index mode is completely
+  different. See below for details.
+
 == Extensions
 
 === Cached tree
@@ -198,3 +201,35 @@ Git index format
   - At most three 160-bit object names of the entry in stages from 1 to 3
     (nothing is written for a missing stage).
 
+=== Split index
+
+  In split index mode, the majority of index entries could be stored
+  in a separate file. This extension records the changes to be made on
+  top of that to produce the final index.
+
+  The signature for this extension is { 'l', 'i', 'n', 'k' }.
+
+  The extension consists of:
+
+  - 160-bit SHA-1 of the shared index file. The shared index file path
+    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+    index does not require a shared index file.
+
+  - An ewah-encoded delete bitmap, each bit represents an entry in the
+    shared index. If a bit is set, its corresponding entry in the
+    shared index will be removed from the final index.  Note, because
+    a delete operation changes index entry positions, but we do need
+    original positions in replace phase, it's best to just mark
+    entries for removal, then do a mass deletion after replacement.
+
+  - An ewah-encoded replace bitmap, each bit represents an entry in
+    the shared index. If a bit is set, its corresponding entry in the
+    shared index will be replaced with an entry in this index
+    file. All replaced entries are stored in sorted order in this
+    index. The first "1" bit in the replace bitmap corresponds to the
+    first index entry, the second "1" bit to the second entry and so
+    on. Replaced entries may have empty path names to save space.
+
+  The remaining index entries after replaced ones will be added to the
+  final index. These added entries are also sorted by entry name then
+  stage.
diff --git a/Makefile b/Makefile
index 74a929b..c3957bb 100644
--- a/Makefile
+++ b/Makefile
@@ -884,6 +884,7 @@ LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
+LIB_OBJS += split-index.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
diff --git a/cache.h b/cache.h
index 0f6247c..90a5998 100644
--- a/cache.h
+++ b/cache.h
@@ -135,6 +135,7 @@ struct cache_entry {
 	unsigned int ce_mode;
 	unsigned int ce_flags;
 	unsigned int ce_namelen;
+	unsigned int index;	/* for link extension */
 	unsigned char sha1[20];
 	char name[FLEX_ARRAY]; /* more */
 };
@@ -275,12 +276,14 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define RESOLVE_UNDO_CHANGED	(1 << 4)
 #define CACHE_TREE_CHANGED	(1 << 5)
 
+struct split_index;
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
 	unsigned int cache_nr, cache_alloc, cache_changed;
 	struct string_list *resolve_undo;
 	struct cache_tree *cache_tree;
+	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
 		 initialized : 1;
diff --git a/read-cache.c b/read-cache.c
index 723d769..ff889ad 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "strbuf.h"
 #include "varint.h"
+#include "split-index.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -34,6 +35,10 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
+
+/* changes that can be kept in $GIT_DIR/index (basically all extensions) */
+#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -63,6 +68,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	copy_cache_entry(new, old);
 	new->ce_flags &= ~CE_HASHED;
 	new->ce_namelen = namelen;
+	new->index = 0;
 	memcpy(new->name, new_name, namelen + 1);
 
 	cache_tree_invalidate_path(istate, old->name);
@@ -1335,6 +1341,10 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_LINK:
+		if (read_link_extension(istate, data, sz))
+			return -1;
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1369,6 +1379,7 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
 	ce->ce_stat_data.sd_size  = get_be32(&ondisk->size);
 	ce->ce_flags = flags & ~CE_NAMEMASK;
 	ce->ce_namelen = len;
+	ce->index = 0;
 	hashcpy(ce->sha1, ondisk->sha1);
 	memcpy(ce->name, name, len);
 	ce->name[len] = '\0';
@@ -1443,7 +1454,8 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 }
 
 /* remember to discard_cache() before reading a different cache! */
-int read_index_from(struct index_state *istate, const char *path)
+static int do_read_index(struct index_state *istate, const char *path,
+			 int must_exist)
 {
 	int fd, i;
 	struct stat st;
@@ -1460,9 +1472,9 @@ int read_index_from(struct index_state *istate, const char *path)
 	istate->timestamp.nsec = 0;
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
-		if (errno == ENOENT)
+		if (!must_exist && errno == ENOENT)
 			return 0;
-		die_errno("index file open failed");
+		die_errno("%s: index file open failed", path);
 	}
 
 	if (fstat(fd, &st))
@@ -1535,6 +1547,42 @@ unmap:
 	die("index file corrupt");
 }
 
+int read_index_from(struct index_state *istate, const char *path)
+{
+	struct split_index *split_index;
+	int ret;
+
+	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
+	if (istate->initialized)
+		return istate->cache_nr;
+
+	ret = do_read_index(istate, path, 0);
+	split_index = istate->split_index;
+	if (!split_index)
+		return ret;
+
+	if (is_null_sha1(split_index->base_sha1))
+		return ret;
+	if (istate->cache_nr)
+		die("index in split-index mode must contain no entries");
+
+	if (split_index->base)
+		discard_index(split_index->base);
+	else
+		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	ret = do_read_index(split_index->base,
+			    git_path("sharedindex.%s",
+				     sha1_to_hex(split_index->base_sha1)), 1);
+	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
+		die("broken index, expect %s in %s, got %s",
+		    sha1_to_hex(split_index->base_sha1),
+		    git_path("sharedindex.%s",
+				     sha1_to_hex(split_index->base_sha1)),
+		    sha1_to_hex(split_index->base->sha1));
+	merge_base_index(istate);
+	return ret;
+}
+
 int is_index_unborn(struct index_state *istate)
 {
 	return (!istate->cache_nr && !istate->timestamp.sec);
@@ -1544,8 +1592,15 @@ int discard_index(struct index_state *istate)
 {
 	int i;
 
-	for (i = 0; i < istate->cache_nr; i++)
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (istate->cache[i]->index &&
+		    istate->split_index &&
+		    istate->split_index->base &&
+		    istate->cache[i]->index <= istate->split_index->base->cache_nr &&
+		    istate->cache[i] == istate->split_index->base->cache[istate->cache[i]->index - 1])
+			continue;
 		free(istate->cache[i]);
+	}
 	resolve_undo_clear_index(istate);
 	istate->cache_nr = 0;
 	istate->cache_changed = 0;
@@ -1557,6 +1612,7 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	discard_split_index(istate);
 	return 0;
 }
 
@@ -1852,6 +1908,17 @@ static int do_write_index(struct index_state *istate, int newfd)
 	strbuf_release(&previous_name_buf);
 
 	/* Write extension data here */
+	if (istate->split_index) {
+		struct strbuf sb = STRBUF_INIT;
+
+		err = write_link_extension(&sb, istate) < 0 ||
+			write_index_ext_header(&c, newfd, CACHE_EXT_LINK,
+					       sb.len) < 0 ||
+			ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 	if (istate->cache_tree) {
 		struct strbuf sb = STRBUF_INIT;
 
@@ -1916,10 +1983,29 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 		return ret;
 }
 
+static int write_split_index(struct index_state *istate,
+			     struct lock_file *lock,
+			     unsigned flags)
+{
+	int ret;
+	prepare_to_write_split_index(istate);
+	ret = do_write_locked_index(istate, lock, flags);
+	finish_writing_split_index(istate);
+	return ret;
+}
+
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		       unsigned flags)
 {
-	return do_write_locked_index(istate, lock, flags);
+	struct split_index *si = istate->split_index;
+
+	if (!si || (istate->cache_changed & ~EXTMASK)) {
+		if (si)
+			hashclr(si->base_sha1);
+		return do_write_locked_index(istate, lock, flags);
+	}
+
+	return write_split_index(istate, lock, flags);
 }
 
 /*
diff --git a/split-index.c b/split-index.c
new file mode 100644
index 0000000..63b52bb
--- /dev/null
+++ b/split-index.c
@@ -0,0 +1,90 @@
+#include "cache.h"
+#include "split-index.h"
+
+struct split_index *init_split_index(struct index_state *istate)
+{
+	if (!istate->split_index) {
+		istate->split_index = xcalloc(1, sizeof(*istate->split_index));
+		istate->split_index->refcount = 1;
+	}
+	return istate->split_index;
+}
+
+int read_link_extension(struct index_state *istate,
+			 const void *data_, unsigned long sz)
+{
+	const unsigned char *data = data_;
+	struct split_index *si;
+	if (sz < 20)
+		return error("corrupt link extension (too short)");
+	si = init_split_index(istate);
+	hashcpy(si->base_sha1, data);
+	data += 20;
+	sz -= 20;
+	if (sz)
+		return error("garbage at the end of link extension");
+	return 0;
+}
+
+int write_link_extension(struct strbuf *sb,
+			 struct index_state *istate)
+{
+	struct split_index *si = istate->split_index;
+	strbuf_add(sb, si->base_sha1, 20);
+	return 0;
+}
+
+static void mark_base_index_entries(struct index_state *base)
+{
+	int i;
+	/*
+	 * To keep track of the shared entries between
+	 * istate->base->cache[] and istate->cache[], base entry
+	 * position is stored in each base entry. All positions start
+	 * from 1 instead of 0, which is reserved to say "this is a new
+	 * entry".
+	 */
+	for (i = 0; i < base->cache_nr; i++)
+		base->cache[i]->index = i + 1;
+}
+
+void merge_base_index(struct index_state *istate)
+{
+	struct split_index *si = istate->split_index;
+
+	mark_base_index_entries(si->base);
+	istate->cache_nr = si->base->cache_nr;
+	ALLOC_GROW(istate->cache, istate->cache_nr, istate->cache_alloc);
+	memcpy(istate->cache, si->base->cache,
+	       sizeof(*istate->cache) * istate->cache_nr);
+}
+
+void prepare_to_write_split_index(struct index_state *istate)
+{
+	struct split_index *si = init_split_index(istate);
+	/* take cache[] out temporarily */
+	si->saved_cache_nr = istate->cache_nr;
+	istate->cache_nr = 0;
+}
+
+void finish_writing_split_index(struct index_state *istate)
+{
+	struct split_index *si = init_split_index(istate);
+	istate->cache_nr = si->saved_cache_nr;
+}
+
+void discard_split_index(struct index_state *istate)
+{
+	struct split_index *si = istate->split_index;
+	if (!si)
+		return;
+	istate->split_index = NULL;
+	si->refcount--;
+	if (si->refcount)
+		return;
+	if (si->base) {
+		discard_index(si->base);
+		free(si->base);
+	}
+	free(si);
+}
diff --git a/split-index.h b/split-index.h
new file mode 100644
index 0000000..8d74041
--- /dev/null
+++ b/split-index.h
@@ -0,0 +1,25 @@
+#ifndef SPLIT_INDEX_H
+#define SPLIT_INDEX_H
+
+struct index_state;
+struct strbuf;
+
+struct split_index {
+	unsigned char base_sha1[20];
+	struct index_state *base;
+	unsigned int saved_cache_nr;
+	int refcount;
+};
+
+struct split_index *init_split_index(struct index_state *istate);
+int read_link_extension(struct index_state *istate,
+			const void *data, unsigned long sz);
+int write_link_extension(struct strbuf *sb,
+			 struct index_state *istate);
+void move_cache_to_base_index(struct index_state *istate);
+void merge_base_index(struct index_state *istate);
+void prepare_to_write_split_index(struct index_state *istate);
+void finish_writing_split_index(struct index_state *istate);
+void discard_split_index(struct index_state *istate);
+
+#endif
diff --git a/unpack-trees.c b/unpack-trees.c
index f594932..a941f7c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -8,6 +8,7 @@
 #include "progress.h"
 #include "refs.h"
 #include "attr.h"
+#include "split-index.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -1046,6 +1047,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->result.timestamp.sec = o->src_index->timestamp.sec;
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
 	o->result.version = o->src_index->version;
+	o->result.split_index = o->src_index->split_index;
+	if (o->result.split_index)
+		o->result.split_index->refcount++;
 	hashcpy(o->result.sha1, o->src_index->sha1);
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
-- 
1.9.1.346.ga2b5940

* [PATCH 18/32] read-cache: mark new entries for split index
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (16 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 17/32] read-cache: split-index mode Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-30 20:35   ` Eric Sunshine
  2014-04-28 10:55 ` [PATCH 19/32] read-cache: save deleted entries in " Nguyễn Thái Ngọc Duy
                   ` (16 subsequent siblings)
  34 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Make sure entry addition does not lead to unifying the index. We don't
need to explicitly keep track of new entries. If ce->index is zero,
the entry is new. Otherwise it is unlikely to be new, but we'll do a
thorough check later at writing time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
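A small sketch of the decision that EXTMASK drives in
write_locked_index(), as of this patch: if any change bit outside the
mask is set, the index is written whole, otherwise it can stay split.
Only the RESOLVE_UNDO_CHANGED and CACHE_TREE_CHANGED values are taken
from cache.h as quoted in this series; the other flag values and the
helper can_write_split() are assumptions for illustration.

```c
#include <assert.h>

/* The last two values appear in cache.h; the first three are assumed. */
#define CE_ENTRY_CHANGED	(1 << 1)
#define CE_ENTRY_REMOVED	(1 << 2)
#define CE_ENTRY_ADDED		(1 << 3)
#define RESOLVE_UNDO_CHANGED	(1 << 4)
#define CACHE_TREE_CHANGED	(1 << 5)

/* Changes that can be kept in $GIT_DIR/index after this patch. */
#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
		 CE_ENTRY_ADDED)

/* Returns 1 if the index can stay split, 0 if it must be unified. */
static int can_write_split(int has_split_index, unsigned int cache_changed)
{
	if (!has_split_index)
		return 0;
	return (cache_changed & ~EXTMASK) == 0;
}
```

With this mask an added entry no longer forces a full write, while a
changed or removed entry still does until later patches extend it.
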
 read-cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index ff889ad..2f2e0c1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -38,7 +38,8 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
-#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED)
+#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
+		 CE_ENTRY_ADDED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
-- 
1.9.1.346.ga2b5940

* [PATCH 19/32] read-cache: save deleted entries in split index
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (17 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 18/32] read-cache: mark new entries for split index Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 20/32] read-cache: mark updated entries for " Nguyễn Thái Ngọc Duy
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Entries that belong to the base index should not be freed. Mark them
with CE_REMOVE instead to keep track of them.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
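The ownership test can be sketched in isolation as below. This is a
simplified model, not the code in the diff: struct entry and struct
base_index are toy stand-ins for struct cache_entry and the base
index_state, the CE_REMOVE value is assumed, and save_or_free() here
returns a status only so the behavior is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CE_REMOVE (1 << 17)	/* assumed value, for illustration only */

struct entry {
	unsigned int index;	/* 1-based position in base, 0 = new entry */
	unsigned int flags;
};

struct base_index {
	struct entry **cache;
	unsigned int nr;
};

/* An entry is shared only if base->cache[index - 1] is this pointer. */
static int is_shared_with_base(const struct base_index *base,
			       const struct entry *e)
{
	return base && e->index && e->index <= base->nr &&
	       base->cache[e->index - 1] == e;
}

/* Returns 1 if the entry was kept (marked CE_REMOVE), 0 if freed. */
static int save_or_free(const struct base_index *base, struct entry *e)
{
	if (is_shared_with_base(base, e)) {
		e->flags |= CE_REMOVE;
		return 1;
	}
	free(e);
	return 0;
}
```

Comparing pointers, not just positions, matters: a copy of a base
entry carries the same index value but is owned by the caller and
must still be freed.
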
 read-cache.c  | 14 ++++++++------
 split-index.c | 12 ++++++++++++
 split-index.h |  1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 2f2e0c1..7cdb171 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -39,7 +39,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
-		 CE_ENTRY_ADDED)
+		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -488,7 +488,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 
 	record_resolve_undo(istate, ce);
 	remove_name_hash(istate, ce);
-	free(ce);
+	save_or_free_index_entry(istate, ce);
 	istate->cache_changed |= CE_ENTRY_REMOVED;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
@@ -512,7 +512,7 @@ void remove_marked_cache_entries(struct index_state *istate)
 	for (i = j = 0; i < istate->cache_nr; i++) {
 		if (ce_array[i]->ce_flags & CE_REMOVE) {
 			remove_name_hash(istate, ce_array[i]);
-			free(ce_array[i]);
+			save_or_free_index_entry(istate, ce_array[i]);
 		}
 		else
 			ce_array[j++] = ce_array[i];
@@ -577,7 +577,9 @@ static int different_name(struct cache_entry *ce, struct cache_entry *alias)
  * So we use the CE_ADDED flag to verify that the alias was an old
  * one before we accept it as
  */
-static struct cache_entry *create_alias_ce(struct cache_entry *ce, struct cache_entry *alias)
+static struct cache_entry *create_alias_ce(struct index_state *istate,
+					   struct cache_entry *ce,
+					   struct cache_entry *alias)
 {
 	int len;
 	struct cache_entry *new;
@@ -590,7 +592,7 @@ static struct cache_entry *create_alias_ce(struct cache_entry *ce, struct cache_
 	new = xcalloc(1, cache_entry_size(len));
 	memcpy(new->name, alias->name, len);
 	copy_cache_entry(new, ce);
-	free(ce);
+	save_or_free_index_entry(istate, ce);
 	return new;
 }
 
@@ -683,7 +685,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 		set_object_name_for_intent_to_add_entry(ce);
 
 	if (ignore_case && alias && different_name(ce, alias))
-		ce = create_alias_ce(ce, alias);
+		ce = create_alias_ce(istate, ce, alias);
 	ce->ce_flags |= CE_ADDED;
 
 	/* It was suspected to be racily clean, but it turns out to be Ok */
diff --git a/split-index.c b/split-index.c
index 63b52bb..2bb5d55 100644
--- a/split-index.c
+++ b/split-index.c
@@ -88,3 +88,15 @@ void discard_split_index(struct index_state *istate)
 	}
 	free(si);
 }
+
+void save_or_free_index_entry(struct index_state *istate, struct cache_entry *ce)
+{
+	if (ce->index &&
+	    istate->split_index &&
+	    istate->split_index->base &&
+	    ce->index <= istate->split_index->base->cache_nr &&
+	    ce == istate->split_index->base->cache[ce->index - 1])
+		ce->ce_flags |= CE_REMOVE;
+	else
+		free(ce);
+}
diff --git a/split-index.h b/split-index.h
index 8d74041..5302118 100644
--- a/split-index.h
+++ b/split-index.h
@@ -12,6 +12,7 @@ struct split_index {
 };
 
 struct split_index *init_split_index(struct index_state *istate);
+void save_or_free_index_entry(struct index_state *istate, struct cache_entry *ce);
 int read_link_extension(struct index_state *istate,
 			const void *data, unsigned long sz);
 int write_link_extension(struct strbuf *sb,
-- 
1.9.1.346.ga2b5940

* [PATCH 20/32] read-cache: mark updated entries for split index
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (18 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 19/32] read-cache: save deleted entries in " Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 21/32] split-index: the writing part Nguyễn Thái Ngọc Duy
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Most of this patch just follows the places where CE_ENTRY_CHANGED is
set and marks the affected entry with CE_UPDATE_IN_BASE.
replace_index_entry() now updates split_index->base->cache[] as well,
so that base->cache[] does not reference a freed entry.
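
As a rough illustration of the dangling-pointer concern (not code from
this patch), the sketch below uses toy structures. All toy_* names are
hypothetical, and it assumes the base slot still points at the old
entry, so the extra free() done by replace_index_entry_in_base() for a
differing slot is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for git's structures; all names here are hypothetical. */
struct toy_entry {
	unsigned int index;	/* 1-based slot in the base, 0 if not shared */
	int payload;
};

struct toy_base {
	struct toy_entry **cache;
	unsigned int cache_nr;
};

/*
 * Point the base slot owned by "old" at "new_entry", so that freeing
 * "old" afterwards leaves no dangling pointer in base->cache[].
 */
static void toy_replace_in_base(struct toy_base *base,
				struct toy_entry *old,
				struct toy_entry *new_entry)
{
	assert(old && new_entry);
	if (!base || !old->index || old->index > base->cache_nr)
		return;		/* "old" never came from the base */
	new_entry->index = old->index;
	base->cache[new_entry->index - 1] = new_entry;
}

/* Replace cache[nr]: fix up the base first, then free the old entry. */
static void toy_replace(struct toy_entry **cache, struct toy_base *base,
			int nr, struct toy_entry *new_entry)
{
	struct toy_entry *old = cache[nr];

	toy_replace_in_base(base, old, new_entry);
	free(old);
	cache[nr] = new_entry;
}
```

The ordering matters: the base slot is redirected before the old entry
is freed, mirroring where the call is placed in replace_index_entry().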

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/update-index.c |  2 ++
 cache.h                |  2 ++
 entry.c                |  1 +
 read-cache.c           |  5 ++++-
 split-index.c          | 15 +++++++++++++++
 split-index.h          |  3 +++
 unpack-trees.c         |  4 +++-
 7 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index fa3c441..f7a19c4 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -55,6 +55,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 			active_cache[pos]->ce_flags |= flag;
 		else
 			active_cache[pos]->ce_flags &= ~flag;
+		active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
 		cache_tree_invalidate_path(&the_index, path);
 		active_cache_changed |= CE_ENTRY_CHANGED;
 		return 0;
@@ -268,6 +269,7 @@ static void chmod_path(int flip, const char *path)
 		goto fail;
 	}
 	cache_tree_invalidate_path(&the_index, path);
+	ce->ce_flags |= CE_UPDATE_IN_BASE;
 	active_cache_changed |= CE_ENTRY_CHANGED;
 	report("chmod %cx '%s'", flip, path);
 	return;
diff --git a/cache.h b/cache.h
index 90a5998..127804e 100644
--- a/cache.h
+++ b/cache.h
@@ -169,6 +169,8 @@ struct cache_entry {
 /* used to temporarily mark paths matched by pathspecs */
 #define CE_MATCHED           (1 << 26)
 
+#define CE_UPDATE_IN_BASE    (1 << 27)
+
 /*
  * Extended on-disk flags
  */
diff --git a/entry.c b/entry.c
index d913c1d..1eda8e9 100644
--- a/entry.c
+++ b/entry.c
@@ -214,6 +214,7 @@ finish:
 		if (!fstat_done)
 			lstat(ce->name, &st);
 		fill_stat_cache_info(ce, &st);
+		ce->ce_flags |= CE_UPDATE_IN_BASE;
 		state->istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
 	return 0;
diff --git a/read-cache.c b/read-cache.c
index 7cdb171..a717171 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -39,7 +39,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
-		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED)
+		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -54,9 +54,11 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 {
 	struct cache_entry *old = istate->cache[nr];
 
+	replace_index_entry_in_base(istate, old, ce);
 	remove_name_hash(istate, old);
 	free(old);
 	set_index_entry(istate, nr, ce);
+	ce->ce_flags |= CE_UPDATE_IN_BASE;
 	istate->cache_changed |= CE_ENTRY_CHANGED;
 }
 
@@ -1192,6 +1194,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 				 * means the index is not valid anymore.
 				 */
 				ce->ce_flags &= ~CE_VALID;
+				ce->ce_flags |= CE_UPDATE_IN_BASE;
 				istate->cache_changed |= CE_ENTRY_CHANGED;
 			}
 			if (quiet)
diff --git a/split-index.c b/split-index.c
index 2bb5d55..b36c73b 100644
--- a/split-index.c
+++ b/split-index.c
@@ -100,3 +100,18 @@ void save_or_free_index_entry(struct index_state *istate, struct cache_entry *ce
 	else
 		free(ce);
 }
+
+void replace_index_entry_in_base(struct index_state *istate,
+				 struct cache_entry *old,
+				 struct cache_entry *new)
+{
+	if (old->index &&
+	    istate->split_index &&
+	    istate->split_index->base &&
+	    old->index <= istate->split_index->base->cache_nr) {
+		new->index = old->index;
+		if (old != istate->split_index->base->cache[new->index - 1])
+			free(istate->split_index->base->cache[new->index - 1]);
+		istate->split_index->base->cache[new->index - 1] = new;
+	}
+}
diff --git a/split-index.h b/split-index.h
index 5302118..812e510 100644
--- a/split-index.h
+++ b/split-index.h
@@ -13,6 +13,9 @@ struct split_index {
 
 struct split_index *init_split_index(struct index_state *istate);
 void save_or_free_index_entry(struct index_state *istate, struct cache_entry *ce);
+void replace_index_entry_in_base(struct index_state *istate,
+				 struct cache_entry *old,
+				 struct cache_entry *new);
 int read_link_extension(struct index_state *istate,
 			const void *data, unsigned long sz);
 int write_link_extension(struct strbuf *sb,
diff --git a/unpack-trees.c b/unpack-trees.c
index a941f7c..4a9cdf2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -257,8 +257,10 @@ static int apply_sparse_checkout(struct index_state *istate,
 		ce->ce_flags |= CE_SKIP_WORKTREE;
 	else
 		ce->ce_flags &= ~CE_SKIP_WORKTREE;
-	if (was_skip_worktree != ce_skip_worktree(ce))
+	if (was_skip_worktree != ce_skip_worktree(ce)) {
+		ce->ce_flags |= CE_UPDATE_IN_BASE;
 		istate->cache_changed |= CE_ENTRY_CHANGED;
+	}
 
 	/*
 	 * if (!was_skip_worktree && !ce_skip_worktree()) {
-- 
1.9.1.346.ga2b5940

* [PATCH 21/32] split-index: the writing part
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (19 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 20/32] read-cache: mark updated entries for " Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 22/32] split-index: the reading part Nguyễn Thái Ngọc Duy
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

prepare_to_write_split_index() does the major work, classifying
deleted, updated and added entries. write_link_extension() then just
writes it down.

Note that deleting an entry and then adding it back is recorded as
"entry X is deleted, entry X is added", not "entry X is replaced".
This is simpler and costs little: a replaced entry is stored without
its path, while a new entry is stored with its path.

A note about unpack_trees() and the deduplication code inside
prepare_to_write_split_index(). Usually tracking updated/removed
entries via read-cache API is enough. unpack_trees() manipulates the
index in a different way: it throws the entire source index out,
builds up a new one, copying/duplicating entries (using dup_entry)
from the source index over if necessary, then returns the new index.

A naive solution would be to mark the entire source index "deleted"
and add the duplicates as new entries. That could bring $GIT_DIR/index
back to the original size. So we try harder and memcmp() the original
against the duplicate to see if it needs updating.

We could avoid memcmp() too, by avoiding duplicating the original
entry in dup_entry(). The performance gain this way is within noise
level and it complicates unpack-trees.c. So memcmp() is the preferred
way to deal with deduplication.
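
For illustration only, here is a minimal sketch of that comparison on a
toy structure. The toy_* names, the flag values and the field layout
are assumptions made for the example, not git's real definitions. The
idea is to mask out in-memory-only flags, memcmp() everything between
the stat data and the name, then restore the flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ONDISK_FLAGS 0x0fu	/* hypothetical: flags that reach the disk */
#define TOY_IN_MEM_FLAG  0x10u	/* hypothetical: in-memory bookkeeping only */

struct toy_entry {
	unsigned int index;		/* not compared */
	unsigned int stat_data[4];	/* comparison starts here */
	unsigned int mode;
	unsigned int flags;
	char name[16];			/* comparison stops before the name */
};

/* Return nonzero if "ce" differs from "base" in anything that is written out. */
static int toy_needs_update(struct toy_entry *ce, struct toy_entry *base)
{
	unsigned int ce_flags = ce->flags, base_flags = base->flags;
	int ret;

	assert(ce && base);
	/* only on-disk flags should take part in the comparison */
	ce->flags &= TOY_ONDISK_FLAGS;
	base->flags &= TOY_ONDISK_FLAGS;
	ret = memcmp(&ce->stat_data, &base->stat_data,
		     offsetof(struct toy_entry, name) -
		     offsetof(struct toy_entry, stat_data));
	/* restore the in-memory flags we masked out */
	ce->flags = ce_flags;
	base->flags = base_flags;
	return ret != 0;
}
```

A duplicate that differs only in an in-memory flag compares equal and
so would not be written again; changing any stat field makes it count
as updated.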

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 split-index.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 split-index.h |   4 +++
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/split-index.c b/split-index.c
index b36c73b..5708807 100644
--- a/split-index.c
+++ b/split-index.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "split-index.h"
+#include "ewah/ewok.h"
 
 struct split_index *init_split_index(struct index_state *istate)
 {
@@ -26,11 +27,22 @@ int read_link_extension(struct index_state *istate,
 	return 0;
 }
 
+static int write_strbuf(void *user_data, const void *data, size_t len)
+{
+	struct strbuf *sb = user_data;
+	strbuf_add(sb, data, len);
+	return len;
+}
+
 int write_link_extension(struct strbuf *sb,
 			 struct index_state *istate)
 {
 	struct split_index *si = istate->split_index;
 	strbuf_add(sb, si->base_sha1, 20);
+	if (!si->delete_bitmap && !si->replace_bitmap)
+		return 0;
+	ewah_serialize_to(si->delete_bitmap, write_strbuf, sb);
+	ewah_serialize_to(si->replace_bitmap, write_strbuf, sb);
 	return 0;
 }
 
@@ -62,14 +74,99 @@ void merge_base_index(struct index_state *istate)
 void prepare_to_write_split_index(struct index_state *istate)
 {
 	struct split_index *si = init_split_index(istate);
-	/* take cache[] out temporarily */
+	struct cache_entry **entries = NULL, *ce;
+	int i, nr_entries = 0, nr_alloc = 0;
+
+	si->delete_bitmap = ewah_new();
+	si->replace_bitmap = ewah_new();
+
+	if (si->base) {
+		/* Go through istate->cache[] and mark CE_MATCHED to
+		 * entry with positive index. We'll go through
+		 * base->cache[] later to delete all entries in base
+		 * that are not marked with either CE_MATCHED or
+		 * CE_UPDATE_IN_BASE. If istate->cache[i] is a
+		 * duplicate, deduplicate it.
+		 */
+		for (i = 0; i < istate->cache_nr; i++) {
+			struct cache_entry *base;
+			/* namelen is checked separately */
+			const unsigned int ondisk_flags =
+				CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
+			unsigned int ce_flags, base_flags, ret;
+			ce = istate->cache[i];
+			if (!ce->index)
+				continue;
+			if (ce->index > si->base->cache_nr) {
+				ce->index = 0;
+				continue;
+			}
+			ce->ce_flags |= CE_MATCHED; /* or "shared" */
+			base = si->base->cache[ce->index - 1];
+			if (ce == base)
+				continue;
+			if (ce->ce_namelen != base->ce_namelen ||
+			    strcmp(ce->name, base->name)) {
+				ce->index = 0;
+				continue;
+			}
+			ce_flags = ce->ce_flags;
+			base_flags = base->ce_flags;
+			/* only on-disk flags matter */
+			ce->ce_flags   &= ondisk_flags;
+			base->ce_flags &= ondisk_flags;
+			ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
+				     offsetof(struct cache_entry, name) -
+				     offsetof(struct cache_entry, ce_stat_data));
+			ce->ce_flags = ce_flags;
+			base->ce_flags = base_flags;
+			if (ret)
+				ce->ce_flags |= CE_UPDATE_IN_BASE;
+			free(base);
+			si->base->cache[ce->index - 1] = ce;
+		}
+		for (i = 0; i < si->base->cache_nr; i++) {
+			ce = si->base->cache[i];
+			if ((ce->ce_flags & CE_REMOVE) ||
+			    !(ce->ce_flags & CE_MATCHED))
+				ewah_set(si->delete_bitmap, i);
+			else if (ce->ce_flags & CE_UPDATE_IN_BASE) {
+				ewah_set(si->replace_bitmap, i);
+				ALLOC_GROW(entries, nr_entries+1, nr_alloc);
+				entries[nr_entries++] = ce;
+			}
+		}
+	}
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		ce = istate->cache[i];
+		if ((!si->base || !ce->index) && !(ce->ce_flags & CE_REMOVE)) {
+			ALLOC_GROW(entries, nr_entries+1, nr_alloc);
+			entries[nr_entries++] = ce;
+		}
+		ce->ce_flags &= ~CE_MATCHED;
+	}
+
+	/*
+	 * take cache[] out temporarily, put entries[] in its place
+	 * for writing
+	 */
+	si->saved_cache = istate->cache;
 	si->saved_cache_nr = istate->cache_nr;
-	istate->cache_nr = 0;
+	istate->cache = entries;
+	istate->cache_nr = nr_entries;
 }
 
 void finish_writing_split_index(struct index_state *istate)
 {
 	struct split_index *si = init_split_index(istate);
+
+	ewah_free(si->delete_bitmap);
+	ewah_free(si->replace_bitmap);
+	si->delete_bitmap = NULL;
+	si->replace_bitmap = NULL;
+	free(istate->cache);
+	istate->cache = si->saved_cache;
 	istate->cache_nr = si->saved_cache_nr;
 }
 
diff --git a/split-index.h b/split-index.h
index 812e510..53b778f 100644
--- a/split-index.h
+++ b/split-index.h
@@ -3,10 +3,14 @@
 
 struct index_state;
 struct strbuf;
+struct ewah_bitmap;
 
 struct split_index {
 	unsigned char base_sha1[20];
 	struct index_state *base;
+	struct ewah_bitmap *delete_bitmap;
+	struct ewah_bitmap *replace_bitmap;
+	struct cache_entry **saved_cache;
 	unsigned int saved_cache_nr;
 	int refcount;
 };
-- 
1.9.1.346.ga2b5940

* [PATCH 22/32] split-index: the reading part
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (20 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 21/32] split-index: the writing part Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 23/32] split-index: do not invalidate cache-tree at read time Nguyễn Thái Ngọc Duy
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

CE_REMOVE'd entries are removed here because only parts of the code
base (unpack_trees, in fact) test this bit when they look for the
presence of an entry. Leaving them in may confuse code that ignores
this bit and expects to see a real entry.
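
The read-side merge can be pictured with the toy model below. It is a
sketch under simplifying assumptions, not the patch's implementation:
entries are plain ints, the EWAH bitmaps are 0/1 byte arrays, and new
entries are appended instead of being inserted in sorted order as
add_index_entry() would do. toy_merge_base() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: "saved" holds what the split index file itself carried,
 * first the replacements (in base order), then the additions.
 * Returns the number of entries written to out[].
 */
static size_t toy_merge_base(const int *base, size_t base_nr,
			     const unsigned char *delete_bits,
			     const unsigned char *replace_bits,
			     const int *saved, size_t saved_nr,
			     int *out)
{
	size_t i, nr_replacements = 0, out_nr = 0;

	for (i = 0; i < base_nr; i++) {
		int value = base[i];

		if (replace_bits[i]) {
			/* an entry cannot be both replaced and deleted */
			assert(!delete_bits[i]);
			assert(nr_replacements < saved_nr);
			value = saved[nr_replacements++];
		}
		if (delete_bits[i])
			continue;	/* drop it instead of keeping a marked entry */
		out[out_nr++] = value;
	}
	/* whatever is left in saved[] is a new entry */
	for (i = nr_replacements; i < saved_nr; i++)
		out[out_nr++] = saved[i];
	return out_nr;
}
```

With a base of {10, 20, 30, 40}, slot 1 deleted, slot 2 replaced and
saved entries {31, 50}, the result is {10, 31, 40, 50}: deleted entries
never reach the final array, so callers that do not test a removal bit
only ever see real entries.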

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c  |  2 --
 split-index.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 split-index.h |  2 ++
 3 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index a717171..a5517bf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1569,8 +1569,6 @@ int read_index_from(struct index_state *istate, const char *path)
 
 	if (is_null_sha1(split_index->base_sha1))
 		return ret;
-	if (istate->cache_nr)
-		die("index in split-index mode must contain no entries");
 
 	if (split_index->base)
 		discard_index(split_index->base);
diff --git a/split-index.c b/split-index.c
index 5708807..b03a250 100644
--- a/split-index.c
+++ b/split-index.c
@@ -16,13 +16,27 @@ int read_link_extension(struct index_state *istate,
 {
 	const unsigned char *data = data_;
 	struct split_index *si;
+	int ret;
+
 	if (sz < 20)
 		return error("corrupt link extension (too short)");
 	si = init_split_index(istate);
 	hashcpy(si->base_sha1, data);
 	data += 20;
 	sz -= 20;
-	if (sz)
+	if (!sz)
+		return 0;
+	si->delete_bitmap = ewah_new();
+	ret = ewah_read_mmap(si->delete_bitmap, data, sz);
+	if (ret < 0)
+		return error("corrupt delete bitmap in link extension");
+	data += ret;
+	sz -= ret;
+	si->replace_bitmap = ewah_new();
+	ret = ewah_read_mmap(si->replace_bitmap, data, sz);
+	if (ret < 0)
+		return error("corrupt replace bitmap in link extension");
+	if (ret != sz)
 		return error("garbage at the end of link extension");
 	return 0;
 }
@@ -60,15 +74,81 @@ static void mark_base_index_entries(struct index_state *base)
 		base->cache[i]->index = i + 1;
 }
 
+static void mark_entry_for_delete(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	if (pos >= istate->cache_nr)
+		die("position for delete %d exceeds base index size %d",
+		    (int)pos, istate->cache_nr);
+	istate->cache[pos]->ce_flags |= CE_REMOVE;
+	istate->split_index->nr_deletions = 1;
+}
+
+static void replace_entry(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	struct split_index *si = istate->split_index;
+	struct cache_entry *dst, *src;
+	if (pos >= istate->cache_nr)
+		die("position for replacement %d exceeds base index size %d",
+		    (int)pos, istate->cache_nr);
+	if (si->nr_replacements >= si->saved_cache_nr)
+		die("too many replacements (%d vs %d)",
+		    si->nr_replacements, si->saved_cache_nr);
+	dst = istate->cache[pos];
+	if (dst->ce_flags & CE_REMOVE)
+		die("entry %d is marked as both replaced and deleted",
+		    (int)pos);
+	src = si->saved_cache[si->nr_replacements];
+	src->index = pos + 1;
+	src->ce_flags |= CE_UPDATE_IN_BASE;
+	free(dst);
+	dst = src;
+	si->nr_replacements++;
+}
+
 void merge_base_index(struct index_state *istate)
 {
 	struct split_index *si = istate->split_index;
+	unsigned int i;
 
 	mark_base_index_entries(si->base);
-	istate->cache_nr = si->base->cache_nr;
+
+	si->saved_cache	    = istate->cache;
+	si->saved_cache_nr  = istate->cache_nr;
+	istate->cache_nr    = si->base->cache_nr;
+	istate->cache	    = NULL;
+	istate->cache_alloc = 0;
 	ALLOC_GROW(istate->cache, istate->cache_nr, istate->cache_alloc);
 	memcpy(istate->cache, si->base->cache,
 	       sizeof(*istate->cache) * istate->cache_nr);
+
+	si->nr_deletions = 0;
+	si->nr_replacements = 0;
+	ewah_each_bit(si->replace_bitmap, replace_entry, istate);
+	ewah_each_bit(si->delete_bitmap, mark_entry_for_delete, istate);
+	if (si->nr_deletions)
+		remove_marked_cache_entries(istate);
+
+	for (i = si->nr_replacements; i < si->saved_cache_nr; i++) {
+		add_index_entry(istate, si->saved_cache[i],
+				ADD_CACHE_OK_TO_ADD |
+				/*
+				 * we may have to replay what
+				 * merge-recursive.c:update_stages()
+				 * does, which has this flag on
+				 */
+				ADD_CACHE_SKIP_DFCHECK);
+		si->saved_cache[i] = NULL;
+	}
+
+	ewah_free(si->delete_bitmap);
+	ewah_free(si->replace_bitmap);
+	free(si->saved_cache);
+	si->delete_bitmap  = NULL;
+	si->replace_bitmap = NULL;
+	si->saved_cache	   = NULL;
+	si->saved_cache_nr = 0;
 }
 
 void prepare_to_write_split_index(struct index_state *istate)
diff --git a/split-index.h b/split-index.h
index 53b778f..c1324f5 100644
--- a/split-index.h
+++ b/split-index.h
@@ -12,6 +12,8 @@ struct split_index {
 	struct ewah_bitmap *replace_bitmap;
 	struct cache_entry **saved_cache;
 	unsigned int saved_cache_nr;
+	unsigned int nr_deletions;
+	unsigned int nr_replacements;
 	int refcount;
 };
 
-- 
1.9.1.346.ga2b5940

* [PATCH 23/32] split-index: do not invalidate cache-tree at read time
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (21 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 22/32] split-index: the reading part Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 24/32] split-index: strip pathname of on-disk replaced entries Nguyễn Thái Ngọc Duy
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We are sure that after merge_base_index() is done, the cache-tree can
still be used with the final index, so don't destroy it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h       | 1 +
 read-cache.c  | 3 ++-
 split-index.c | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 127804e..be95b84 100644
--- a/cache.h
+++ b/cache.h
@@ -488,6 +488,7 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_SKIP_DFCHECK 4	/* Ok to skip DF conflict checks */
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
+#define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 extern int remove_index_entry_at(struct index_state *, int pos);
diff --git a/read-cache.c b/read-cache.c
index a5517bf..43a61d3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -950,7 +950,8 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
-	cache_tree_invalidate_path(istate, ce->name);
+	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
+		cache_tree_invalidate_path(istate, ce->name);
 	pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
 
 	/* existing match? Just replace it. */
diff --git a/split-index.c b/split-index.c
index b03a250..33c0c4b 100644
--- a/split-index.c
+++ b/split-index.c
@@ -133,6 +133,7 @@ void merge_base_index(struct index_state *istate)
 	for (i = si->nr_replacements; i < si->saved_cache_nr; i++) {
 		add_index_entry(istate, si->saved_cache[i],
 				ADD_CACHE_OK_TO_ADD |
+				ADD_CACHE_KEEP_CACHE_TREE |
 				/*
 				 * we may have to replay what
 				 * merge-recursive.c:update_stages()
-- 
1.9.1.346.ga2b5940

* [PATCH 24/32] split-index: strip pathname of on-disk replaced entries
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (22 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 23/32] split-index: do not invalidate cache-tree at read time Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-29 20:25   ` Junio C Hamano
  2014-04-28 10:55 ` [PATCH 25/32] update-index: new options to enable/disable split index mode Nguyễn Thái Ngọc Duy
                   ` (10 subsequent siblings)
  34 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We know the positions of replaced entries via the replace bitmap in
the "link" extension, so the "name" path does not have to be stored
(it's still in the shared index). With this, we also have a way to
distinguish additions from replacements at load time and can catch
broken "link" extensions.
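
The consistency check this enables can be sketched as follows. This is
a hypothetical helper on plain strings, not the patch's code, which
die()s inside replace_entry() and merge_base_index() instead of
returning an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy check: names[] lists the entries as stored in $GIT_DIR/index,
 * replacements first. Returns 0 if the layout is consistent and -1
 * if the "link" extension looks broken.
 */
static int toy_check_names(const char *const *names, size_t nr,
			   size_t nr_replacements)
{
	size_t i;

	assert(names != NULL);
	if (nr_replacements > nr)
		return -1;	/* more replacements than stored entries */
	for (i = 0; i < nr; i++) {
		int has_name = strlen(names[i]) != 0;

		if (i < nr_replacements && has_name)
			return -1;	/* a replacement must have its name stripped */
		if (i >= nr_replacements && !has_name)
			return -1;	/* an addition must carry a name */
	}
	return 0;
}
```

For example, {"", "", "new.c"} with two replacements passes, while a
named entry in a replacement slot or a nameless entry in an addition
slot is rejected.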

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h       |  1 +
 read-cache.c  | 10 ++++++++++
 split-index.c | 14 ++++++++++++--
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index be95b84..604328b 100644
--- a/cache.h
+++ b/cache.h
@@ -170,6 +170,7 @@ struct cache_entry {
 #define CE_MATCHED           (1 << 26)
 
 #define CE_UPDATE_IN_BASE    (1 << 27)
+#define CE_STRIP_NAME        (1 << 28)
 
 /*
  * Extended on-disk flags
diff --git a/read-cache.c b/read-cache.c
index 43a61d3..81835a6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1787,9 +1787,15 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce,
 {
 	int size;
 	struct ondisk_cache_entry *ondisk;
+	int saved_namelen;
 	char *name;
 	int result;
 
+	if (ce->ce_flags & CE_STRIP_NAME) {
+		saved_namelen = ce_namelen(ce);
+		ce->ce_namelen = 0;
+	}
+
 	if (!previous_name) {
 		size = ondisk_ce_size(ce);
 		ondisk = xcalloc(1, size);
@@ -1821,6 +1827,10 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce,
 		strbuf_splice(previous_name, common, to_remove,
 			      ce->name + common, ce_namelen(ce) - common);
 	}
+	if (ce->ce_flags & CE_STRIP_NAME) {
+		ce->ce_namelen = saved_namelen;
+		ce->ce_flags &= ~CE_STRIP_NAME;
+	}
 
 	result = ce_write(c, fd, ondisk, size);
 	free(ondisk);
diff --git a/split-index.c b/split-index.c
index 33c0c4b..ee3246f 100644
--- a/split-index.c
+++ b/split-index.c
@@ -89,6 +89,7 @@ static void replace_entry(size_t pos, void *data)
 	struct index_state *istate = data;
 	struct split_index *si = istate->split_index;
 	struct cache_entry *dst, *src;
+
 	if (pos >= istate->cache_nr)
 		die("position for replacement %d exceeds base index size %d",
 		    (int)pos, istate->cache_nr);
@@ -100,10 +101,14 @@ static void replace_entry(size_t pos, void *data)
 		die("entry %d is marked as both replaced and deleted",
 		    (int)pos);
 	src = si->saved_cache[si->nr_replacements];
+	if (ce_namelen(src))
+		die("corrupt link extension, entry %d should have "
+		    "zero length name", (int)pos);
 	src->index = pos + 1;
 	src->ce_flags |= CE_UPDATE_IN_BASE;
-	free(dst);
-	dst = src;
+	src->ce_namelen = dst->ce_namelen;
+	copy_cache_entry(dst, src);
+	free(src);
 	si->nr_replacements++;
 }
 
@@ -131,6 +136,9 @@ void merge_base_index(struct index_state *istate)
 		remove_marked_cache_entries(istate);
 
 	for (i = si->nr_replacements; i < si->saved_cache_nr; i++) {
+		if (!ce_namelen(si->saved_cache[i]))
+			die("corrupt link extension, entry %d should "
+			    "have non-zero length name", i);
 		add_index_entry(istate, si->saved_cache[i],
 				ADD_CACHE_OK_TO_ADD |
 				ADD_CACHE_KEEP_CACHE_TREE |
@@ -213,6 +221,7 @@ void prepare_to_write_split_index(struct index_state *istate)
 				ewah_set(si->delete_bitmap, i);
 			else if (ce->ce_flags & CE_UPDATE_IN_BASE) {
 				ewah_set(si->replace_bitmap, i);
+				ce->ce_flags |= CE_STRIP_NAME;
 				ALLOC_GROW(entries, nr_entries+1, nr_alloc);
 				entries[nr_entries++] = ce;
 			}
@@ -222,6 +231,7 @@ void prepare_to_write_split_index(struct index_state *istate)
 	for (i = 0; i < istate->cache_nr; i++) {
 		ce = istate->cache[i];
 		if ((!si->base || !ce->index) && !(ce->ce_flags & CE_REMOVE)) {
+			assert(!(ce->ce_flags & CE_STRIP_NAME));
 			ALLOC_GROW(entries, nr_entries+1, nr_alloc);
 			entries[nr_entries++] = ce;
 		}
-- 
1.9.1.346.ga2b5940

* [PATCH 25/32] update-index: new options to enable/disable split index mode
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (23 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 24/32] split-index: strip pathname of on-disk replaced entries Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 26/32] update-index --split-index: do not split if $GIT_DIR is read only Nguyễn Thái Ngọc Duy
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you switch
to branches that touch something else, $GIT_DIR/index may grow so
large that it becomes as slow as the unified index. Run --split-index
again occasionally to push all changes back to the shared index and
keep $GIT_DIR/index small.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-update-index.txt | 11 +++++++
 builtin/update-index.c             | 18 ++++++++++
 cache.h                            |  1 +
 read-cache.c                       | 67 ++++++++++++++++++++++++++++++++++----
 split-index.c                      | 23 +++++++++++++
 5 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index d6de4a0..dfc09d9 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -161,6 +161,17 @@ may not support it yet.
 	Only meaningful with `--stdin` or `--index-info`; paths are
 	separated with NUL character instead of LF.
 
+--split-index::
+--no-split-index::
+	Enable or disable split index mode. If enabled, the index is
+	split into two files, $GIT_DIR/index and $GIT_DIR/sharedindex.<SHA-1>.
+	Changes are accumulated in $GIT_DIR/index while the shared
+	index file contains all index entries and stays unchanged. If
+	split-index mode is already enabled and `--split-index` is
+	given again, all changes in $GIT_DIR/index are pushed back to
+	the shared index file. This mode is designed for very large
+	indexes that take a significant amount of time to read or write.
+
 \--::
 	Do not interpret any more arguments as options.
 
diff --git a/builtin/update-index.c b/builtin/update-index.c
index f7a19c4..b0503f4 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -13,6 +13,7 @@
 #include "parse-options.h"
 #include "pathspec.h"
 #include "dir.h"
+#include "split-index.h"
 
 /*
  * Default to not allowing changes to the list of files. The
@@ -742,6 +743,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	char set_executable_bit = 0;
 	struct refresh_params refresh_args = {0, &has_errors};
 	int lock_error = 0;
+	int split_index = -1;
 	struct lock_file *lock_file;
 	struct parse_opt_ctx_t ctx;
 	int parseopt_state = PARSE_OPT_UNKNOWN;
@@ -824,6 +826,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			resolve_undo_clear_callback},
 		OPT_INTEGER(0, "index-version", &preferred_index_format,
 			N_("write index in this format")),
+		OPT_BOOL(0, "split-index", &split_index,
+			N_("enable or disable split index")),
 		OPT_END()
 	};
 
@@ -917,6 +921,20 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	if (split_index > 0) {
+		init_split_index(&the_index);
+		the_index.cache_changed |= SPLIT_INDEX_ORDERED;
+	} else if (!split_index && the_index.split_index) {
+		/*
+		 * can't discard_split_index(&the_index); because that
+		 * will destroy split_index->base->cache[], which may
+		 * be shared with the_index.cache[]. So yeah we're
+		 * leaking a bit here.
+		 */
+		the_index.split_index = NULL;
+		the_index.cache_changed |= SOMETHING_CHANGED;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
diff --git a/cache.h b/cache.h
index 604328b..42cdfe6 100644
--- a/cache.h
+++ b/cache.h
@@ -278,6 +278,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define SOMETHING_CHANGED	(1 << 3) /* unclassified changes go here */
 #define RESOLVE_UNDO_CHANGED	(1 << 4)
 #define CACHE_TREE_CHANGED	(1 << 5)
+#define SPLIT_INDEX_ORDERED	(1 << 6)
 
 struct split_index;
 struct index_state {
diff --git a/read-cache.c b/read-cache.c
index 81835a6..a6c9407 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -15,6 +15,7 @@
 #include "strbuf.h"
 #include "varint.h"
 #include "split-index.h"
+#include "sigchain.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -39,7 +40,8 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
-		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED)
+		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
+		 SPLIT_INDEX_ORDERED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -1860,7 +1862,8 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-static int do_write_index(struct index_state *istate, int newfd)
+static int do_write_index(struct index_state *istate, int newfd,
+			  int strip_extensions)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
@@ -1923,7 +1926,7 @@ static int do_write_index(struct index_state *istate, int newfd)
 	strbuf_release(&previous_name_buf);
 
 	/* Write extension data here */
-	if (istate->split_index) {
+	if (!strip_extensions && istate->split_index) {
 		struct strbuf sb = STRBUF_INIT;
 
 		err = write_link_extension(&sb, istate) < 0 ||
@@ -1934,7 +1937,7 @@ static int do_write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
-	if (istate->cache_tree) {
+	if (!strip_extensions && istate->cache_tree) {
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
@@ -1944,7 +1947,7 @@ static int do_write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
-	if (istate->resolve_undo) {
+	if (!strip_extensions && istate->resolve_undo) {
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
@@ -1985,7 +1988,7 @@ static int commit_locked_index(struct lock_file *lk)
 static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
 				 unsigned flags)
 {
-	int ret = do_write_index(istate, lock->fd);
+	int ret = do_write_index(istate, lock->fd, 0);
 	if (ret)
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
@@ -2009,6 +2012,52 @@ static int write_split_index(struct index_state *istate,
 	return ret;
 }
 
+static char *temporary_sharedindex;
+
+static void remove_temporary_sharedindex(void)
+{
+	if (temporary_sharedindex) {
+		unlink_or_warn(temporary_sharedindex);
+		free(temporary_sharedindex);
+		temporary_sharedindex = NULL;
+	}
+}
+
+static void remove_temporary_sharedindex_on_signal(int signo)
+{
+	remove_temporary_sharedindex();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static int write_shared_index(struct index_state *istate)
+{
+	struct split_index *si = istate->split_index;
+	static int installed_handler;
+	int fd, ret;
+
+	temporary_sharedindex = git_pathdup("sharedindex_XXXXXX");
+	fd = xmkstemp(temporary_sharedindex);
+	if (!installed_handler) {
+		atexit(remove_temporary_sharedindex);
+		sigchain_push_common(remove_temporary_sharedindex_on_signal);
+	}
+	move_cache_to_base_index(istate);
+	ret = do_write_index(si->base, fd, 1);
+	close(fd);
+	if (ret) {
+		remove_temporary_sharedindex();
+		return ret;
+	}
+	ret = rename(temporary_sharedindex,
+		     git_path("sharedindex.%s", sha1_to_hex(si->base->sha1)));
+	free(temporary_sharedindex);
+	temporary_sharedindex = NULL;
+	if (!ret)
+		hashcpy(si->base_sha1, si->base->sha1);
+	return ret;
+}
+
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		       unsigned flags)
 {
@@ -2020,6 +2069,12 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		return do_write_locked_index(istate, lock, flags);
 	}
 
+	if (istate->cache_changed & SPLIT_INDEX_ORDERED) {
+		int ret = write_shared_index(istate);
+		if (ret)
+			return ret;
+	}
+
 	return write_split_index(istate, lock, flags);
 }
 
diff --git a/split-index.c b/split-index.c
index ee3246f..21485e2 100644
--- a/split-index.c
+++ b/split-index.c
@@ -74,6 +74,29 @@ static void mark_base_index_entries(struct index_state *base)
 		base->cache[i]->index = i + 1;
 }
 
+void move_cache_to_base_index(struct index_state *istate)
+{
+	struct split_index *si = istate->split_index;
+	int i;
+
+	/*
+	 * do not delete old si->base, its index entries may be shared
+	 * with istate->cache[]. Accept a bit of leaking here because
+	 * this code is only used by short-lived update-index.
+	 */
+	si->base = xcalloc(1, sizeof(*si->base));
+	si->base->version = istate->version;
+	/* zero timestamp disables racy test in ce_write_index() */
+	si->base->timestamp = istate->timestamp;
+	ALLOC_GROW(si->base->cache, istate->cache_nr, si->base->cache_alloc);
+	si->base->cache_nr = istate->cache_nr;
+	memcpy(si->base->cache, istate->cache,
+	       sizeof(*istate->cache) * istate->cache_nr);
+	mark_base_index_entries(si->base);
+	for (i = 0; i < si->base->cache_nr; i++)
+		si->base->cache[i]->ce_flags &= ~CE_UPDATE_IN_BASE;
+}
+
 static void mark_entry_for_delete(size_t pos, void *data)
 {
 	struct index_state *istate = data;
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 26/32] update-index --split-index: do not split if $GIT_DIR is read only
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (24 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 25/32] update-index: new options to enable/disable split index mode Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 27/32] rev-parse: add --shared-index-path to get shared index path Nguyễn Thái Ngọc Duy
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

If $GIT_DIR is read only, we can't write $GIT_DIR/sharedindex. This
could happen when $GIT_INDEX_FILE is set to somewhere outside
$GIT_DIR.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index a6c9407..f9fc3a5 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2030,14 +2030,21 @@ static void remove_temporary_sharedindex_on_signal(int signo)
 	raise(signo);
 }
 
-static int write_shared_index(struct index_state *istate)
+static int write_shared_index(struct index_state *istate,
+			      struct lock_file *lock, unsigned flags)
 {
 	struct split_index *si = istate->split_index;
 	static int installed_handler;
 	int fd, ret;
 
 	temporary_sharedindex = git_pathdup("sharedindex_XXXXXX");
-	fd = xmkstemp(temporary_sharedindex);
+	fd = mkstemp(temporary_sharedindex);
+	if (fd < 0) {
+		free(temporary_sharedindex);
+		temporary_sharedindex = NULL;
+		hashclr(si->base_sha1);
+		return do_write_locked_index(istate, lock, flags);
+	}
 	if (!installed_handler) {
 		atexit(remove_temporary_sharedindex);
 		sigchain_push_common(remove_temporary_sharedindex_on_signal);
@@ -2070,7 +2077,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	}
 
 	if (istate->cache_changed & SPLIT_INDEX_ORDERED) {
-		int ret = write_shared_index(istate);
+		int ret = write_shared_index(istate, lock, flags);
 		if (ret)
 			return ret;
 	}
-- 
1.9.1.346.ga2b5940
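
A minimal sketch of the fallback in this patch, outside of git. mkstemp()
reports failure by returning -1 where xmkstemp() would die, so the caller
can decide to write one complete index instead. The enum and the function
name choose_write_mode() are hypothetical and only illustrate the control
flow; they are not part of the series.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
enum write_mode {
	WRITE_SPLIT,	/* temporary shared index could be created */
	WRITE_UNIFIED	/* directory not writable: write one full index */
};

/*
 * Try to create the temporary file named by a mkstemp() template.
 * On success *fd is the open descriptor; on failure *fd is -1 and the
 * caller is expected to fall back to a non-split write.
 */
static enum write_mode choose_write_mode(char *template, int *fd)
{
	*fd = mkstemp(template);
	return *fd < 0 ? WRITE_UNIFIED : WRITE_SPLIT;
}
```

On the success path the caller still owns the descriptor and the file,
and has to close and rename (or unlink) them itself.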

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 27/32] rev-parse: add --shared-index-path to get shared index path
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (25 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 26/32] update-index --split-index: do not split if $GIT_DIR is read only Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 28/32] read-tree: force split-index mode off on --index-output Nguyễn Thái Ngọc Duy
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Normally scripts do not have to be aware of split indexes because
all shared indexes are in $GIT_DIR. A simple "mv $tmp_index
$GIT_DIR/somewhere" is enough. Scripts that generate temporary indexes
and move them across repos must be aware of split index and copy
the shared file as well. This option enables that.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-rev-parse.txt |  4 ++++
 builtin/rev-parse.c             | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index 987395d..9bd76a5 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -245,6 +245,10 @@ print a message to stderr and exit with nonzero status.
 --show-toplevel::
 	Show the absolute path of the top-level directory.
 
+--shared-index-path::
+	Show the path to the shared index file in split index mode, or
+	empty if not in split-index mode.
+
 Other Options
 ~~~~~~~~~~~~~
 
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 1a6122d..8102aaa 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -11,6 +11,7 @@
 #include "parse-options.h"
 #include "diff.h"
 #include "revision.h"
+#include "split-index.h"
 
 #define DO_REVS		1
 #define DO_NOREV	2
@@ -775,6 +776,15 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 						: "false");
 				continue;
 			}
+			if (!strcmp(arg, "--shared-index-path")) {
+				if (read_cache() < 0)
+					die(_("Could not read the index"));
+				if (the_index.split_index) {
+					const unsigned char *sha1 = the_index.split_index->base_sha1;
+					puts(git_path("sharedindex.%s", sha1_to_hex(sha1)));
+				}
+				continue;
+			}
 			if (starts_with(arg, "--since=")) {
 				show_datestring("--max-age=", arg+8);
 				continue;
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 28/32] read-tree: force split-index mode off on --index-output
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (26 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 27/32] rev-parse: add --shared-index-path to get shared index path Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 29/32] read-tree: note about dropping split-index mode or index version Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Just a (paranoid?) safety measure.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index f9fc3a5..568bc20 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2070,7 +2070,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 {
 	struct split_index *si = istate->split_index;
 
-	if (!si || (istate->cache_changed & ~EXTMASK)) {
+	if (!si || alternate_index_output ||
+	    (istate->cache_changed & ~EXTMASK)) {
 		if (si)
 			hashclr(si->base_sha1);
 		return do_write_locked_index(istate, lock, flags);
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 29/32] read-tree: note about dropping split-index mode or index version
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (27 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 28/32] read-tree: force split-index mode off on --index-output Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 30/32] read-cache: force split index mode with GIT_TEST_SPLIT_INDEX Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/read-tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 3204c62..e7e1c33 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -155,6 +155,15 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 	if (1 < opts.merge + opts.reset + prefix_set)
 		die("Which one? -m, --reset, or --prefix?");
 
+	/*
+	 * NEEDSWORK
+	 *
+	 * The old index should be read anyway even if we're going to
+	 * destroy all index entries because we still need to preserve
+	 * certain information such as index version or split-index
+	 * mode.
+	 */
+
 	if (opts.reset || opts.merge || opts.prefix) {
 		if (read_cache_unmerged() && (opts.prefix || opts.merge))
 			die("You need to resolve your current index first");
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 30/32] read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (28 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 29/32] read-tree: note about dropping split-index mode or index version Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 31/32] t2104: make sure split index mode is off for the version test Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This could be used to run the whole test suite with split
indexes. Index splitting is carried out at random. "git read-tree"
also resets the index and forces splitting at the next update.

I had a lot of headaches with the test suite, which proves it
exercises split index pretty well.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 568bc20..831b67e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1885,8 +1885,11 @@ static int do_write_index(struct index_state *istate, int newfd,
 		}
 	}
 
-	if (!istate->version)
+	if (!istate->version) {
 		istate->version = get_index_format_default();
+		if (getenv("GIT_TEST_SPLIT_INDEX"))
+			init_split_index(istate);
+	}
 
 	/* demote version 3 to version 2 when the latter suffices */
 	if (istate->version == 3 || istate->version == 2)
@@ -2077,6 +2080,11 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		return do_write_locked_index(istate, lock, flags);
 	}
 
+	if (getenv("GIT_TEST_SPLIT_INDEX")) {
+		int v = si->base_sha1[0];
+		if ((v & 15) < 6)
+			istate->cache_changed |= SPLIT_INDEX_ORDERED;
+	}
 	if (istate->cache_changed & SPLIT_INDEX_ORDERED) {
 		int ret = write_shared_index(istate, lock, flags);
 		if (ret)
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 31/32] t2104: make sure split index mode is off for the version test
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (29 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 30/32] read-cache: force split index mode with GIT_TEST_SPLIT_INDEX Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 32/32] t1700: new tests for split-index mode Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Version tests only make sense when all entries are in the same file,
so we can see if version is downgraded to 2 if 3 is not required.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t2104-update-index-skip-worktree.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t2104-update-index-skip-worktree.sh b/t/t2104-update-index-skip-worktree.sh
index 29c1fb1..cc830da 100755
--- a/t/t2104-update-index-skip-worktree.sh
+++ b/t/t2104-update-index-skip-worktree.sh
@@ -7,6 +7,8 @@ test_description='skip-worktree bit test'
 
 . ./test-lib.sh
 
+sane_unset GIT_TEST_SPLIT_INDEX
+
 test_set_index_version 3
 
 cat >expect.full <<EOF
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 32/32] t1700: new tests for split-index mode
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (30 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 31/32] t2104: make sure split index mode is off for the version test Nguyễn Thái Ngọc Duy
@ 2014-04-28 10:55 ` Nguyễn Thái Ngọc Duy
  2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore                      |   1 +
 Makefile                        |   1 +
 cache.h                         |   2 +
 read-cache.c                    |   3 +-
 t/t1700-split-index.sh (new +x) | 194 ++++++++++++++++++++++++++++++++++++++++
 test-dump-split-index.c (new)   |  34 +++++++
 6 files changed, 233 insertions(+), 2 deletions(-)
 create mode 100755 t/t1700-split-index.sh
 create mode 100644 test-dump-split-index.c

diff --git a/.gitignore b/.gitignore
index dc600f9..70992a4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -180,6 +180,7 @@
 /test-date
 /test-delta
 /test-dump-cache-tree
+/test-dump-split-index
 /test-scrap-cache-tree
 /test-genrandom
 /test-hashmap
diff --git a/Makefile b/Makefile
index c3957bb..e39a4f5 100644
--- a/Makefile
+++ b/Makefile
@@ -562,6 +562,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
diff --git a/cache.h b/cache.h
index 42cdfe6..6ad2595 100644
--- a/cache.h
+++ b/cache.h
@@ -473,6 +473,8 @@ extern int daemonize(void);
 struct lock_file;
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
+extern int do_read_index(struct index_state *istate, const char *path,
+			 int must_exist); /* for testing only! */
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
diff --git a/read-cache.c b/read-cache.c
index 831b67e..159c3e8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1463,8 +1463,7 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 }
 
 /* remember to discard_cache() before reading a different cache! */
-static int do_read_index(struct index_state *istate, const char *path,
-			 int must_exist)
+int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
 	int fd, i;
 	struct stat st;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
new file mode 100755
index 0000000..94fb473
--- /dev/null
+++ b/t/t1700-split-index.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+
+test_description='split index mode tests'
+
+. ./test-lib.sh
+
+# We need total control of index splitting here
+sane_unset GIT_TEST_SPLIT_INDEX
+
+test_expect_success 'enable split index' '
+	git update-index --split-index &&
+	test-dump-split-index .git/index >actual &&
+	cat >expect <<EOF &&
+own 8299b0bcd1ac364e5f1d7768efb62fa2da79a339
+base 39d890139ee5356c7ef572216cebcd27aa41f9df
+replacements:
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'add one file' '
+	: >one &&
+	git update-index --add one &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+base 39d890139ee5356c7ef572216cebcd27aa41f9df
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+replacements:
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'disable split index' '
+	git update-index --no-split-index &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	BASE=`test-dump-split-index .git/index | grep "^own" | sed "s/own/base/"` &&
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+not a split index
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'enable split index again, "one" now belongs to base index' '
+	git update-index --split-index &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+$BASE
+replacements:
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'modify original file, base index untouched' '
+	echo modified >one &&
+	git update-index one &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 2e0996000b7e9019eabcad29391bf0f5c7702f0b 0	one
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	q_to_tab >expect <<EOF &&
+$BASE
+100644 2e0996000b7e9019eabcad29391bf0f5c7702f0b 0Q
+replacements: 0
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'add another file, which stays in index' '
+	: >two &&
+	git update-index --add two &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 2e0996000b7e9019eabcad29391bf0f5c7702f0b 0	one
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	two
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	q_to_tab >expect <<EOF &&
+$BASE
+100644 2e0996000b7e9019eabcad29391bf0f5c7702f0b 0Q
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	two
+replacements: 0
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'remove file not in base index' '
+	git update-index --force-remove two &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 2e0996000b7e9019eabcad29391bf0f5c7702f0b 0	one
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	q_to_tab >expect <<EOF &&
+$BASE
+100644 2e0996000b7e9019eabcad29391bf0f5c7702f0b 0Q
+replacements: 0
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'remove file in base index' '
+	git update-index --force-remove one &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+$BASE
+replacements:
+deletions: 0
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'add original file back' '
+	: >one &&
+	git update-index --add one &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+$BASE
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+replacements:
+deletions: 0
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'add new file' '
+	: >two &&
+	git update-index --add two &&
+	git ls-files --stage >actual &&
+	cat >expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	two
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'unify index, two files remain' '
+	git update-index --no-split-index &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	two
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+not a split index
+EOF
+	test_cmp expect actual
+'
+
+test_done
diff --git a/test-dump-split-index.c b/test-dump-split-index.c
new file mode 100644
index 0000000..42fb4f4
--- /dev/null
+++ b/test-dump-split-index.c
@@ -0,0 +1,34 @@
+#include "cache.h"
+#include "split-index.h"
+#include "ewah/ewok.h"
+
+static void show_bit(size_t pos, void *data)
+{
+	printf(" %d", (int)pos);
+}
+
+int main(int ac, char **av)
+{
+	struct split_index *si;
+	int i;
+
+	do_read_index(&the_index, av[1], 1);
+	printf("own %s\n", sha1_to_hex(the_index.sha1));
+	si = the_index.split_index;
+	if (!si) {
+		printf("not a split index\n");
+		return 0;
+	}
+	printf("base %s\n", sha1_to_hex(si->base_sha1));
+	for (i = 0; i< the_index.cache_nr; i++) {
+		struct cache_entry *ce = the_index.cache[i];
+		printf("%06o %s %d\t%s\n", ce->ce_mode,
+		       sha1_to_hex(ce->sha1), ce_stage(ce), ce->name);
+	}
+	printf("replacements:");
+	ewah_each_bit(si->replace_bitmap, show_bit, NULL);
+	printf("\ndeletions:");
+	ewah_each_bit(si->delete_bitmap, show_bit, NULL);
+	printf("\n");
+	return 0;
+}
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (31 preceding siblings ...)
  2014-04-28 10:55 ` [PATCH 32/32] t1700: new tests for split-index mode Nguyễn Thái Ngọc Duy
@ 2014-04-28 21:18 ` Shawn Pearce
  2014-04-29  1:52   ` Duy Nguyen
                     ` (3 more replies)
  2014-04-28 22:23 ` [PATCH 00/32] Split index mode for very large indexes Junio C Hamano
  2014-04-30 20:48 ` Richard Hansen
  34 siblings, 4 replies; 76+ messages in thread
From: Shawn Pearce @ 2014-04-28 21:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Mon, Apr 28, 2014 at 3:55 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> I hinted about it earlier [1]. It now passes the test suite and with a
> design that I'm happy with (thanks to Junio for a suggestion about the
> rename problem).
>
> From the user point of view, this reduces the writable size of index
> down to the number of updated files. For example my webkit index v4 is
> 14MB. With a fresh split, I only have to update an index of 200KB.
> Every file I touch will add about 80 bytes to that. As long as I don't
> touch every single tracked file in my worktree, I should not pay
> penalty for writing 14MB index file on every operation.

This is a very welcome type of improvement.

I am however concerned about the complexity of the format employed.
Why do we need two EWAH bitmaps in the new index? Why isn't this just
a pair of sorted files that are merge-joined at read, with records in
$GIT_DIR/index taking priority over same-named records in
$GIT_DIR/sharedindex.$SHA1?  Deletes could be marked with a bit or an
"all zero" metadata record.
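
For reference, the merge-join alternative described above could look
roughly like the following sketch. It is not code from the series: the
struct, the `deleted` tombstone flag and the function name merge_join()
are hypothetical, both arrays are assumed to be sorted by name, and
stages are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record; not git's struct cache_entry. */
struct entry {
	const char *name;
	int deleted;	/* tombstone: set in the overlay to drop a base entry */
};

/*
 * Merge two name-sorted arrays into out[], which must have room for
 * nbase + nover entries. Overlay records win over same-named base
 * records; overlay tombstones remove the entry. Returns the number of
 * entries written to out[].
 */
static size_t merge_join(const struct entry *base, size_t nbase,
			 const struct entry *over, size_t nover,
			 struct entry *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < nbase || j < nover) {
		int cmp;

		if (i == nbase)
			cmp = 1;	/* base exhausted */
		else if (j == nover)
			cmp = -1;	/* overlay exhausted */
		else
			cmp = strcmp(base[i].name, over[j].name);

		if (cmp < 0) {
			out[n++] = base[i++];	/* only in base */
		} else {
			if (!over[j].deleted)
				out[n++] = over[j];	/* overlay wins */
			j++;
			if (!cmp)
				i++;	/* skip the overridden base entry */
		}
	}
	return n;
}
```

Every step of the loop compares two names, which is the cost that a
position-based format avoids.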

> The read penalty is not addressed here, so I still pay 14MB hashing
> cost. But that's an easy problem. We could cache the validated index
> in a daemon. Whenever git needs to load an index, it pokes the daemon.
> The daemon verifies that the on-disk index still has the same
> signature, then sends the in-mem index to git. When git updates the
> index, it pokes the daemon again to update in-mem index. Next time git
> reads the index, it does not have to pay I/O cost any more (actually
> it does but the cost is hidden away when you do not have to read it
> yet).

If we are going this far, maybe it is worthwhile building a mmap()
region the daemon exports to the git client that holds the "in memory"
format of the index. Clients would mmap this PROT_READ, MAP_PRIVATE
and can then quickly access the base file information without doing
further validation, or copying the large(ish) data over a pipe.
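
On the client side, the mapping suggested above might be set up as in
this sketch. It maps an ordinary file; how a daemon would actually
export the region (a shared memory object, a file in $GIT_DIR, or
something else) is left open here, and the function name map_readonly()
is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file read-only and privately. Returns the mapping and stores
 * its length in *len. Returns NULL on failure or if the file is empty.
 * The caller releases the mapping with munmap().
 */
static const void *map_readonly(const char *path, size_t *len)
{
	struct stat st;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return map;
}
```

Skipping validation is only safe if the client can trust whoever
produced the region; the sketch does not address that.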


Junio had some other great ideas for improving the index on really
large trees. Maybe I should let him comment since they are really his
ideas. Something about not even checking out most files, storing most
subtrees as just a "tree" entry in the index. E.g. if you are a bad
developer and never touch the "t/" subdirectory then that is stored as
just "t" and the SHA-1 of the "t" tree, rather than the recursively
exploded list of the test directory.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (32 preceding siblings ...)
  2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
@ 2014-04-28 22:23 ` Junio C Hamano
  2014-04-30 20:48 ` Richard Hansen
  34 siblings, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2014-04-28 22:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The read penalty is not addressed here, so I still pay 14MB hashing
> cost.

Hmm, yeah, the cost for verify_hdr() would still matter, and
presumably you would be hashing the additional 200kB to validate the
smaller "changes since the base" file to give users the same level
of protection against corruption.

> Doing this in other implementations should be easy (at least the
> reading part) and with small code change. The whole index format is
> retained. All you need is to read a new extension that contains two
> ewah-bitmaps and apply the changes to create the final index.

Why bitmaps, though?  Naïvely I would have expected you to read from
two sorted streams and have the transaction log override the base.

Intrigued to find it out...

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 17/32] read-cache: split-index mode
  2014-04-28 10:55 ` [PATCH 17/32] read-cache: split-index mode Nguyễn Thái Ngọc Duy
@ 2014-04-28 22:46   ` Junio C Hamano
  2014-04-29  1:43     ` Duy Nguyen
  0 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2014-04-28 22:46 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/cache.h b/cache.h
> index 0f6247c..90a5998 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -135,6 +135,7 @@ struct cache_entry {
>  	unsigned int ce_mode;
>  	unsigned int ce_flags;
>  	unsigned int ce_namelen;
> +	unsigned int index;	/* for link extension */
>  	unsigned char sha1[20];
>  	char name[FLEX_ARRAY]; /* more */
>  };

I am not sure if we want to keep an otherwise unused 8-byte around
per cache entry (especially for a large project where the split
index mode should matter) after we read the index.

I expected to see some code where entries in this incremental index
are used to override the entries from the base/shared index, but
merge_base_index() seems to do just memcpy() to discard the former
and replace them with the latter.  Is this step meant to work at
all, or is it a smaller step meant to be completed in later patches?

I do think it is sensible to keep two arrays of "struct cache_entry"
around (one for base and one for incremental changes) inside
index_state, and the patch seems to do so via "struct split_index"
that does have a copy of saved_cache.  If the write-out codepath
walks these two sorted arrays in parallel, shouldn't it be able to
figure out which entry is added, deleted and modified without
fattening this structure?

Maybe it is too early for me to be asking these questions and it may
be better if I read the whole series twice and wait for it to become
clear to me why this field is necessary.  I dunno.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 17/32] read-cache: split-index mode
  2014-04-28 22:46   ` Junio C Hamano
@ 2014-04-29  1:43     ` Duy Nguyen
  2014-04-29 17:23       ` Junio C Hamano
  0 siblings, 1 reply; 76+ messages in thread
From: Duy Nguyen @ 2014-04-29  1:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Tue, Apr 29, 2014 at 5:46 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> diff --git a/cache.h b/cache.h
>> index 0f6247c..90a5998 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -135,6 +135,7 @@ struct cache_entry {
>>       unsigned int ce_mode;
>>       unsigned int ce_flags;
>>       unsigned int ce_namelen;
>> +     unsigned int index;     /* for link extension */
>>       unsigned char sha1[20];
>>       char name[FLEX_ARRAY]; /* more */
>>  };
>
> I am not sure if we want to keep an otherwise unused 8-byte around
> per cache entry (especially for a large project where the split
> index mode should matter) after we read the index.
>
> I expected to see some code where entries in this incremental index
> are used to override the entries from the base/shared index, but
> merge_base_index() seems to do just memcpy() to discard the former
> and replace them with the latter.  Is this step meant to work at
> all, or is it a smaller step meant to be completed in later patches?

This field only matters at write time, not read time. It's to quickly
detect if an entry is shared, see prepare_to_write_split_index().

> I do think it is sensible to keep two arrays of "struct cache_entry"
> around (one for base and one for incremental changes) inside
> index_state, and the patch seems to do so via "struct split_index"
> that does have a copy of saved_cache.  If the write-out codepath
> walks these two sorted arrays in parallel, shouldn't it be able to
> figure out which entry is added, deleted and modified without
> fattening this structure?

So far without that "index" field I would have to resort to hashing
entries in both arrays to find the shared paths. But ideas are
welcome.
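
To illustrate what the field buys at write time: the series numbers base
entries from 1 (base->cache[i]->index = i + 1 in
mark_base_index_entries()), so 0 can mean "not in the base index". The
sketch below uses a hypothetical, stripped-down entry and a hypothetical
helper base_position(); it is not the code in
prepare_to_write_split_index().

```c
#include <stddef.h>

/* Hypothetical, simplified entry; only the field discussed here. */
struct entry {
	const char *name;
	unsigned int index;	/* 0: not in base; otherwise base position + 1 */
};

/* Number every base entry from 1, as mark_base_index_entries() does. */
static void mark_base(struct entry **base, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		base[i]->index = (unsigned int)(i + 1);
}

/*
 * Return the base position an entry came from, or -1 if it is new.
 * This is a constant-time check; no hashing or name lookup is involved.
 */
static long base_position(const struct entry *e)
{
	return e->index ? (long)e->index - 1 : -1;
}
```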
-- 
Duy

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
@ 2014-04-29  1:52   ` Duy Nguyen
  2014-05-09 10:27   ` Duy Nguyen
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: Duy Nguyen @ 2014-04-29  1:52 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Tue, Apr 29, 2014 at 4:18 AM, Shawn Pearce <spearce@spearce.org> wrote:
> On Mon, Apr 28, 2014 at 3:55 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> I hinted about it earlier [1]. It now passes the test suite and with a
>> design that I'm happy with (thanks to Junio for a suggestion about the
>> rename problem).
>>
>> From the user point of view, this reduces the writable size of index
>> down to the number of updated files. For example my webkit index v4 is
>> 14MB. With a fresh split, I only have to update an index of 200KB.
>> Every file I touch will add about 80 bytes to that. As long as I don't
>> touch every single tracked file in my worktree, I should not pay
>> penalty for writing 14MB index file on every operation.
>
> This is a very welcome type of improvement.
>
> I am however concerned about the complexity of the format employed.
> Why do we need two EWAH bitmaps in the new index? Why isn't this just
> a pair of sorted files that are merge-joined at read, with records in
> $GIT_DIR/index taking priority over same-named records in
> $GIT_DIR/sharedindex.$SHA1?  Deletes could be marked with a bit or an
> "all zero" metadata record.

With the bitmaps, I know the exact position to replace or delete an
entry. Merge sort works, but I would need to walk through all entries
in both indexes to compare entry name and stage, a bit costly in my
opinion. And if you look at the format description in patch 0017, I
store the replaced entries without their names to save a bit more
space. "EWAH" is just an implementation detail. A straightforward
bitmap should work fine (25KB for 200k entries seems reasonable).
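
To make that size estimate concrete, here is a minimal sketch of an
uncompressed bitmap with one bit per entry of the shared index. The
names (plain_bitmap, bitmap_set, ...) are made up for illustration and
are not what the series uses; the series itself stores EWAH-compressed
bitmaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical plain bitmap: one bit per entry of the shared index. */
struct plain_bitmap {
	unsigned char *bits;
	size_t nr_bits;
};

/* Bytes needed to hold nr_bits bits, rounded up to a whole byte. */
static size_t bitmap_bytes(size_t nr_bits)
{
	return (nr_bits + 7) / 8;
}

/* Allocate a zeroed bitmap; returns 0 on success, -1 on failure. */
static int bitmap_init(struct plain_bitmap *bm, size_t nr_bits)
{
	size_t n = bitmap_bytes(nr_bits);

	bm->bits = calloc(n ? n : 1, 1);
	bm->nr_bits = nr_bits;
	return bm->bits ? 0 : -1;
}

/* Mark the entry at position pos (e.g. "replaced" or "deleted"). */
static void bitmap_set(struct plain_bitmap *bm, size_t pos)
{
	assert(pos < bm->nr_bits);
	bm->bits[pos / 8] |= (unsigned char)(1u << (pos % 8));
}

/* Return 1 if the entry at position pos is marked, 0 otherwise. */
static int bitmap_get(const struct plain_bitmap *bm, size_t pos)
{
	assert(pos < bm->nr_bits);
	return (bm->bits[pos / 8] >> (pos % 8)) & 1;
}

static void bitmap_free(struct plain_bitmap *bm)
{
	free(bm->bits);
	bm->bits = NULL;
	bm->nr_bits = 0;
}
```

With 200,000 entries, bitmap_bytes() gives 25,000 bytes, which is where
the 25KB figure comes from. Looking up whether position N was replaced
or deleted is a single bit test, with no name comparison involved.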
-- 
Duy


* Re: [PATCH 17/32] read-cache: split-index mode
  2014-04-29  1:43     ` Duy Nguyen
@ 2014-04-29 17:23       ` Junio C Hamano
  2014-04-29 22:45         ` Duy Nguyen
  0 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2014-04-29 17:23 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

>> I do think it is sensible to keep two arrays of "struct cache_entry"
>> around (one for base and one for incremental changes) inside
>> index_state, and the patch seems to do so via "struct split_index"
>> that does have a copy of saved_cache.  If the write-out codepath
>> walks these two sorted arrays in parallel, shouldn't it be able to
>> figure out which entry is added, deleted and modified without
>> fattening this structure?
>
> So far, without that "index" field I would have to resort to hashing
> entries in both arrays to find the shared paths. But ideas are
> welcome.

Hmm, why do you need to hash, when both arrays are sorted?  Wouldn't
it be just the matter of walking these two arrays in parallel,
with one scanning index for each array initialized to the beginning,
comparing the elements pointed by these indices, noting the side
that comes earlier in the sort order and advancing the index on that
side (or if they compare equal then advance both), ...?
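
A minimal sketch of the parallel walk described above, under
simplifying assumptions: entries are compared by name only (real cache
entries are ordered by name and stage), and the struct and function
names here are hypothetical, not taken from the series.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, stripped-down entry; real entries also carry stage, mode, etc. */
struct entry {
	const char *name;
	const char *oid;	/* stand-in for the object name */
};

struct diff_counts {
	int added, deleted, modified, unchanged;
};

/*
 * Walk two name-sorted arrays in parallel. Whichever side sorts
 * earlier is advanced alone (a deletion if it is the base, an
 * addition if it is the current index); equal names advance both
 * sides and are classified by comparing the object names.
 */
static struct diff_counts compare_sorted(const struct entry *base, size_t nr_base,
					  const struct entry *cur, size_t nr_cur)
{
	struct diff_counts c = { 0, 0, 0, 0 };
	size_t i = 0, j = 0;

	assert(base || !nr_base);
	assert(cur || !nr_cur);

	while (i < nr_base && j < nr_cur) {
		int cmp = strcmp(base[i].name, cur[j].name);

		if (cmp < 0) {
			c.deleted++;	/* only in base */
			i++;
		} else if (cmp > 0) {
			c.added++;	/* only in current */
			j++;
		} else {
			if (strcmp(base[i].oid, cur[j].oid))
				c.modified++;
			else
				c.unchanged++;
			i++;
			j++;
		}
	}
	/* Whatever is left on either side has no counterpart. */
	c.deleted += (int)(nr_base - i);
	c.added += (int)(nr_cur - j);
	return c;
}
```

The walk needs no hashing, but it does one name comparison per step,
so its cost grows with the total number of entries in both arrays.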


* Re: [PATCH 24/32] split-index: strip pathname of on-disk replaced entries
  2014-04-28 10:55 ` [PATCH 24/32] split-index: strip pathname of on-disk replaced entries Nguyễn Thái Ngọc Duy
@ 2014-04-29 20:25   ` Junio C Hamano
  0 siblings, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2014-04-29 20:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

This triggers "saved_namelen may be used uninitialized" for me, even
though it looks clear that it is used under CE_STRIP_NAME and it is
assigned under that condition.  Sigh to a stupid compiler...


* Re: [PATCH 17/32] read-cache: split-index mode
  2014-04-29 17:23       ` Junio C Hamano
@ 2014-04-29 22:45         ` Duy Nguyen
  2014-04-30 13:57           ` Junio C Hamano
  0 siblings, 1 reply; 76+ messages in thread
From: Duy Nguyen @ 2014-04-29 22:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Wed, Apr 30, 2014 at 12:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>>> I do think it is sensible to keep two arrays of "struct cache_entry"
>>> around (one for base and one for incremental changes) inside
>>> index_state, and the patch seems to do so via "struct split_index"
>>> that does have a copy of saved_cache.  If the write-out codepath
>>> walks these two sorted arrays in parallel, shouldn't it be able to
>>> figure out which entry is added, deleted and modified without
>>> fattening this structure?
>>
>> So far, without that "index" field I would have to resort to hashing
>> entries in both arrays to find the shared paths. But ideas are
>> welcome.
>
> Hmm, why do you need to hash, when both arrays are sorted?  Wouldn't
> it be just the matter of walking these two arrays in parallel,
> with one scanning index for each array initialized to the beginning,
> comparing the elements pointed by these indices, noting the side
> that comes earlier in the sort order and advancing the index on that
> side (or if they compare equal then advance both), ...?

And compare all names and stages (especially in the unpack-trees case,
when no entry is reused). I kinda hope to avoid that. Speaking about
reusing cache_entry, we won't be able to share cache_entry because
when it's freed in replace_index_entry, or remove_index_entry_at in
the main index, we need to locate the same entry in the shared index
as well and remove that stale pointer. Without sharing, we nearly
double memory usage from the beginning.
-- 
Duy


* Re: [PATCH 17/32] read-cache: split-index mode
  2014-04-29 22:45         ` Duy Nguyen
@ 2014-04-30 13:57           ` Junio C Hamano
  0 siblings, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2014-04-30 13:57 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> when no entry is reused). I kinda hope to avoid that.

I see.

> Speaking about
> reusing cache_entry, we won't be able to share cache_entry because
> when it's freed in replace_index_entry, or remove_index_entry_at in
> the main index, we need to locate the same entry in the shared index
> as well and remove that stale pointer. Without sharing, we nearly
> double memory usage from the beginning.

Yeah, the point being to have most of the entries come from the base
one, it is expected the real one and the saved base one will be
mostly the same, so sharing is really necessary.


* Re: [PATCH 18/32] read-cache: mark new entries for split index
  2014-04-28 10:55 ` [PATCH 18/32] read-cache: mark new entries for split index Nguyễn Thái Ngọc Duy
@ 2014-04-30 20:35   ` Eric Sunshine
  0 siblings, 0 replies; 76+ messages in thread
From: Eric Sunshine @ 2014-04-30 20:35 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git List

On Mon, Apr 28, 2014 at 6:55 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Make sure entry addition does not lead to unifying the index. We don't
> need to explicitly keep track of new entries. If ce->index is zero,
> they're new. Otherwise it's unlikely that they are new, but we'll do a
> through check later at writing time.

s/through/thorough/

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  read-cache.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index ff889ad..2f2e0c1 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -38,7 +38,8 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
>  #define CACHE_EXT_LINK 0x6c696e6b        /* "link" */
>
>  /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
> -#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED)
> +#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
> +                CE_ENTRY_ADDED)
>
>  struct index_state the_index;
>  static const char *alternate_index_output;
> --
> 1.9.1.346.ga2b5940


* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
                   ` (33 preceding siblings ...)
  2014-04-28 22:23 ` [PATCH 00/32] Split index mode for very large indexes Junio C Hamano
@ 2014-04-30 20:48 ` Richard Hansen
  2014-05-01  0:09   ` Duy Nguyen
  34 siblings, 1 reply; 76+ messages in thread
From: Richard Hansen @ 2014-04-30 20:48 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git

On 2014-04-28 06:55, Nguyễn Thái Ngọc Duy wrote:
> From the user point of view, this reduces the writable size of index
> down to the number of updated files. For example my webkit index v4 is
> 14MB. With a fresh split, I only have to update an index of 200KB.
> Every file I touch will add about 80 bytes to that. As long as I don't
> touch every single tracked file in my worktree, I should not pay
> penalty for writing 14MB index file on every operation.

I played around with these changes a bit and have some questions:

  * These changes should only affect performance when the index is
    updated, right?  In other words, if I do "git status; git status"
    the second "git status" shouldn't update the index and therefore
    shouldn't have a noticeable performance improvement relative to Git
    without these patches.  Right?

  * Do you have any before/after benchmark results you can share?

  * Are there any benchmark scripts I can use to test it out in my own
    repositories?

  * Is there a debug utility I can use to examine the contents of the
    index and sharedindex.* files in a more human-readable way?

I'm asking because in my (very basic) tests I noticed that with the
following command:

    git status; time git status

the second "git status" had an unexpected ~20% performance improvement
in my repo relative to a build without your patches.  The second "git
status" in the following command also had about a ~20% performance
improvement:

    git status; touch file-in-index; time git status

So it seems like the patches did improve performance somewhat, but in
ways I wasn't expecting.  (I'm not entirely certain my benchmark method
is sound.)

Thanks,
Richard


* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-04-30 20:48 ` Richard Hansen
@ 2014-05-01  0:09   ` Duy Nguyen
  0 siblings, 0 replies; 76+ messages in thread
From: Duy Nguyen @ 2014-05-01  0:09 UTC (permalink / raw)
  To: Richard Hansen; +Cc: git

On Wed, Apr 30, 2014 at 04:48:05PM -0400, Richard Hansen wrote:
> I played around with these changes a bit and have some questions:
> 
>   * These changes should only affect performance when the index is
>     updated, right?  In other words, if I do "git status; git status"
>     the second "git status" shouldn't update the index and therefore
>     shouldn't have a noticeable performance improvement relative to Git
>     without these patches.  Right?

Yes, provided that other factors in "git status" give stable numbers
too.

>   * Do you have any before/after benchmark results you can share?

I did not pay much attention to benchmarking because the index size is
pretty much the deciding factor for write performance (of course too
much computation could add to that..). But here it is with the 14MB
webkit index, about 182k files. The lines of interest are the first
one (read_cache) and the fourth (update_index_if_able).

With normal index

pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   278.382ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   489.275ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     6.191ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.321ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    12.733ms wt_status_collect:621 wt_status_collect_changes_index(s)
    98.043ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   915.331ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.727s
user    0m0.809s
sys     0m0.915s
pclouds@lanh ~/d/webkit $ touch wscript
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   276.307ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   504.034ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
   356.122ms cmd_status:1299 update_index_if_able(&the_index, &inde
    11.870ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    10.077ms wt_status_collect:621 wt_status_collect_changes_index(s)
    96.205ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   899.425ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m2.071s
user    0m1.115s
sys     0m0.953s
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   279.424ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   484.303ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     5.288ms cmd_status:1299 update_index_if_able(&the_index, &inde
    13.927ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    12.507ms wt_status_collect:621 wt_status_collect_changes_index(s)
    98.220ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   920.985ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.731s
user    0m0.830s
sys     0m0.897s

And with split index:

pclouds@lanh ~/d/webkit $ time ~/w/git/git update-index --split-index

real    0m0.660s
user    0m0.601s
sys     0m0.058s
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   281.211ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.003ms cmd_status:1294 read_cache_preload(&s.pathspec);
   479.629ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     5.489ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.611ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    11.235ms wt_status_collect:621 wt_status_collect_changes_index(s)
    96.086ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   894.489ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.697s
user    0m0.813s
sys     0m0.881s
pclouds@lanh ~/d/webkit $ touch wscript
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   291.411ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.003ms cmd_status:1294 read_cache_preload(&s.pathspec);
   475.144ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
    24.348ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.440ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    10.400ms wt_status_collect:621 wt_status_collect_changes_index(s)
    97.147ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   907.240ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.734s
user    0m0.842s
sys     0m0.888s
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   281.119ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   479.702ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     5.061ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.220ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    11.408ms wt_status_collect:621 wt_status_collect_changes_index(s)
    95.374ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   896.931ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.700s
user    0m0.809s
sys     0m0.888s

>   * Are there any benchmark scripts I can use to test it out in my own
>     repositories?

You could use the patch I used to generate the timing above. Patch at
the bottom of this mail.

>   * Is there a debug utility I can use to examine the contents of the
>     index and sharedindex.* files in a more human-readable way?

test-dump-split-index <path-to-$GIT_DIR/index>

will show you the content of "index". Entries without names are
replaced entries. Entries with them are added. They are followed by
replace/delete bitmaps printed out. So pretty much everything stored
in $GIT_DIR/index in human-readable format.

git rev-parse --shared-index-path will give you the path to
sharedindex. ls-files --stage can be used to show that.

> I'm asking because in my (very basic) tests I noticed that with the
> following command:
> 
>     git status; time git status
> 
> the second "git status" had an unexpected ~20% performance improvement
> in my repo relative to a build without your patches.  The second "git
> status" in the following command also had about a ~20% performance
> improvement:
> 
>     git status; touch file-in-index; time git status
> 
> So it seems like the patches did improve performance somewhat, but in
> ways I wasn't expecting.  (I'm not entirely certain my benchmark method
> is sound.)

git-status is a complex operation. If you want to focus on this
writing part only, "git update-index" may be better.

And the timing patch:

-- 8< --
diff --git a/builtin/commit.c b/builtin/commit.c
index 243b0c3..d680a44 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1291,12 +1291,12 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		       PATHSPEC_PREFER_FULL,
 		       prefix, argv);
 
-	read_cache_preload(&s.pathspec);
-	refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL);
+	TIME(read_cache_preload(&s.pathspec););
+	TIME(refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL););
 
 	fd = hold_locked_index(&index_lock, 0);
 	if (0 <= fd)
-		update_index_if_able(&the_index, &index_lock);
+		TIME(update_index_if_able(&the_index, &index_lock));
 
 	s.is_initial = get_sha1(s.reference, sha1) ? 1 : 0;
 	s.ignore_submodule_arg = ignore_submodule_arg;
diff --git a/cache.h b/cache.h
index 6ad2595..99854c7 100644
--- a/cache.h
+++ b/cache.h
@@ -1480,4 +1480,21 @@ void stat_validity_update(struct stat_validity *sv, int fd);
 
 int versioncmp(const char *s1, const char *s2);
 
+#define TIME(x) {						\
+	extern int time_level__;				\
+	struct timeval	tv1, tv2;				\
+	int current_level = time_level__;		\
+	gettimeofday(&tv1, NULL);				\
+	time_level__++;						\
+	x;							\
+	time_level__--;						\
+	gettimeofday(&tv2, NULL);				\
+	fprintf(stderr, "% 10.3fms%*s%s:%d %.*s\n",		\
+		tv2.tv_sec * 1000.0 + tv2.tv_usec / 1000.0 -	\
+		tv1.tv_sec * 1000.0 - tv1.tv_usec / 1000.0,	\
+		current_level, " ",				\
+		__FUNCTION__, __LINE__, 38, #x			\
+		);						\
+	}
+
 #endif /* CACHE_H */
diff --git a/environment.c b/environment.c
index 5c4815d..5dcbc6b 100644
--- a/environment.c
+++ b/environment.c
@@ -63,6 +63,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 struct startup_info *startup_info;
 unsigned long pack_size_limit_cfg;
+int time_level__;
 
 /*
  * The character that begins a commented line in user-editable file
diff --git a/name-hash.c b/name-hash.c
index 97444d0..b9121d1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -122,7 +122,7 @@ static int cache_entry_cmp(const struct cache_entry *ce1,
 }
 
 static void lazy_init_name_hash(struct index_state *istate)
-{
+	TIME({
 	int nr;
 
 	if (istate->name_hash_initialized)
@@ -133,7 +133,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
-}
+		})
 
 void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 {
diff --git a/submodule.c b/submodule.c
index b80ecac..1947f20 100644
--- a/submodule.c
+++ b/submodule.c
@@ -195,8 +195,8 @@ void gitmodules_config(void)
 		int pos;
 		strbuf_addstr(&gitmodules_path, work_tree);
 		strbuf_addstr(&gitmodules_path, "/.gitmodules");
-		if (read_cache() < 0)
-			die("index file corrupt");
+		TIME(if (read_cache() < 0)
+			die("index file corrupt"););
 		pos = cache_name_pos(".gitmodules", 11);
 		if (pos < 0) { /* .gitmodules not found or isn't merged */
 			pos = -1 - pos;
diff --git a/wt-status.c b/wt-status.c
index ec7344e..0141856 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -613,13 +613,13 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 void wt_status_collect(struct wt_status *s)
 {
-	wt_status_collect_changes_worktree(s);
+	TIME(wt_status_collect_changes_worktree(s));
 
 	if (s->is_initial)
 		wt_status_collect_changes_initial(s);
 	else
-		wt_status_collect_changes_index(s);
-	wt_status_collect_untracked(s);
+		TIME(wt_status_collect_changes_index(s));
+	TIME(wt_status_collect_untracked(s));
 }
 
 static void wt_status_print_unmerged(struct wt_status *s)
-- 8< --


* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
  2014-04-29  1:52   ` Duy Nguyen
@ 2014-05-09 10:27   ` Duy Nguyen
  2014-05-09 17:55     ` Junio C Hamano
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
  2014-05-13 11:20   ` [PATCH 9/8] even faster loading time with index version 254 Nguyễn Thái Ngọc Duy
  3 siblings, 1 reply; 76+ messages in thread
From: Duy Nguyen @ 2014-05-09 10:27 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Mon, Apr 28, 2014 at 02:18:44PM -0700, Shawn Pearce wrote:
> > The read penalty is not addressed here, so I still pay 14MB hashing
> > cost. But that's an easy problem. We could cache the validated index
> > in a daemon. Whenever git needs to load an index, it pokes the daemon.
> > The daemon verifies that the on-disk index still has the same
> > signature, then sends the in-mem index to git. When git updates the
> > index, it pokes the daemon again to update in-mem index. Next time git
> > reads the index, it does not have to pay I/O cost any more (actually
> > it does but the cost is hidden away when you do not have to read it
> > yet).
> 
> If we are going this far, maybe it is worthwhile building a mmap()
> region the daemon exports to the git client that holds the "in memory"
> format of the index. Clients would mmap this PROT_READ, MAP_PRIVATE
> and can then quickly access the base file information without doing
> further validation, or copying the large(ish) data over a pipe.

The patch below implements such a daemon to cache the index. Loading a
25MB index takes 91ms with the daemon and 377ms without it. I use
shared memory instead of a pipe, but the format is still "on disk", not
"in memory", for simplicity. I think we're good even without the
in-memory format.
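
The patch uses the size of the shared memory object as a crude
readiness flag: the writer first makes the object one byte too long,
copies the data in, then truncates it to the real size, and a reader
ignores any object whose size does not match the index file. A minimal
sketch of that handshake follows. It works on an ordinary file
descriptor rather than one from shm_open() so that it builds without
-lrt, and publish() and is_usable() are hypothetical names, not
functions from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writer side: while the copy is in progress the object is len + 1
 * bytes long, so a reader expecting len bytes will reject it.
 * Returns 0 on success, -1 on failure.
 */
static int publish(int fd, const void *data, size_t len)
{
	assert(fd >= 0);
	if (ftruncate(fd, (off_t)len + 1))	/* "lock": size is wrong on purpose */
		return -1;
	if (pwrite(fd, data, len, 0) != (ssize_t)len)
		return -1;
	return ftruncate(fd, (off_t)len);	/* "unlock": size now matches */
}

/*
 * Reader side: trust the object only if its size equals the size of
 * the index file the reader already has in hand.
 */
static int is_usable(int fd, size_t expected_len)
{
	struct stat st;

	if (fstat(fd, &st))
		return 0;
	return st.st_size == (off_t)expected_len;
}
```

This only guards against a reader picking up a copy that is still
being written; it is not a substitute for real locking.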

The daemon should work on Windows after shm_open and unix socket are
replaced with the equivalents. Then we could cache name-hash in it
too.

With all improvements on (index v4, split index, preload index,
untracked cache, read-cache daemon), "git status" goes from 1.8s to
0.6s on webkit.git (the CPU is clocked at 800 MHz so this is close to
"poor machine" case).

Time to convince Junio it's good and push bit by bit to master :)

-- 8< --
diff --git a/.gitignore b/.gitignore
index 70992a4..07e0cb6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -110,6 +110,7 @@
 /git-pull
 /git-push
 /git-quiltimport
+/git-read-cache--daemon
 /git-read-tree
 /git-rebase
 /git-rebase--am
diff --git a/Makefile b/Makefile
index 028749b..98d22de 100644
--- a/Makefile
+++ b/Makefile
@@ -1502,6 +1502,12 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifdef HAVE_SHM
+	BASIC_CFLAGS += -DHAVE_SHM
+	EXTLIBS += -lrt
+	PROGRAM_OBJS += read-cache--daemon.o
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
diff --git a/cache.h b/cache.h
index 99854c7..5251bda 100644
--- a/cache.h
+++ b/cache.h
@@ -290,10 +290,14 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 keep_mmap : 1,
+		 poke_daemon : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
+	void *mmap;
+	size_t mmap_size;
 };
 
 extern struct index_state the_index;
diff --git a/config.mak.uname b/config.mak.uname
index 23a8803..b6a37e5 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_PATHS_H = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	HAVE_SHM = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	NO_STRLCPY = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index f6d3a46..b2116ab 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -723,4 +723,12 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
 #define gmtime_r git_gmtime_r
 #endif
 
+#ifndef HAVE_SHM
+static inline int shm_open(const char *path, int flags, int mode)
+{
+	errno = ENOSYS;
+	return -1;
+}
+#endif
+
 #endif
diff --git a/read-cache--daemon.c b/read-cache--daemon.c
new file mode 100644
index 0000000..32a5aa7
--- /dev/null
+++ b/read-cache--daemon.c
@@ -0,0 +1,168 @@
+#include "cache.h"
+#include "sigchain.h"
+#include "unix-socket.h"
+#include "split-index.h"
+#include "pkt-line.h"
+
+static char *socket_path;
+static struct strbuf shm_index = STRBUF_INIT;
+static struct strbuf shm_sharedindex = STRBUF_INIT;
+
+static void cleanup_socket(void)
+{
+	if (socket_path)
+		unlink(socket_path);
+	if (shm_index.len)
+		shm_unlink(shm_index.buf);
+	if (shm_sharedindex.len)
+		shm_unlink(shm_sharedindex.buf);
+}
+
+static void cleanup_socket_on_signal(int sig)
+{
+	cleanup_socket();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static void share_index(struct index_state *istate, struct strbuf *shm_path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *map;
+	int fd;
+
+	strbuf_addf(&sb, "/git-index-%s", sha1_to_hex(istate->sha1));
+	if (shm_path->len && strcmp(sb.buf, shm_path->buf)) {
+		shm_unlink(shm_path->buf);
+		strbuf_reset(shm_path);
+	}
+	fd = shm_open(sb.buf, O_RDWR | O_CREAT | O_TRUNC, 0700);
+	if (fd < 0)
+		return;
+	/*
+	 * We "lock" the shm in preparation by setting its size larger
+	 * than expected. The reader is supposed to check the size and
+	 * ignore the shm if its size differs from the actual file size.
+	 */
+	if (ftruncate(fd, istate->mmap_size + 1)) {
+		close(fd);
+		shm_unlink(sb.buf);
+		return;
+	}
+	strbuf_addbuf(shm_path, &sb);
+	map = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE,
+		    MAP_SHARED, fd, 0);
+	if (map == MAP_FAILED) {
+		close(fd);
+		shm_unlink(shm_path->buf);
+		return;
+	}
+	memcpy(map, istate->mmap, istate->mmap_size);
+	munmap(map, istate->mmap_size);
+	/* Now "unlock" it */
+	if (ftruncate(fd, istate->mmap_size)) {
+		close(fd);
+		shm_unlink(shm_path->buf);
+		return;
+	}
+	close(fd);
+}
+
+static void refresh(void)
+{
+	the_index.keep_mmap = 1;
+	if (read_cache() < 0)
+		die("could not read index");
+	share_index(&the_index, &shm_index);
+	if (the_index.split_index &&
+	    the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_sharedindex);
+	discard_index(&the_index);
+	fprintf(stderr, "refreshed\n");
+}
+
+static unsigned long next;
+static int serve_cache_loop(int fd)
+{
+	struct pollfd pfd;
+	unsigned long now = time(NULL);
+
+	if (now > next)
+		return 0;
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	if (poll(&pfd, 1, 1000 * (next - now)) < 0) {
+		if (errno != EINTR)
+			die_errno("poll failed");
+		return 1;
+	}
+
+	if (pfd.revents & POLLIN) {
+		int client = accept(fd, NULL, NULL);
+		if (client < 0) {
+			warning("accept failed: %s", strerror(errno));
+			return 1;
+		}
+		refresh();
+		close(client);
+		next = now + 600;
+	}
+	return 1;
+}
+
+static void serve_cache(const char *socket_path)
+{
+	int fd;
+
+	fd = unix_stream_listen(socket_path);
+	if (fd < 0)
+		die_errno("unable to bind to '%s'", socket_path);
+
+	refresh();
+
+	printf("ok\n");
+	fclose(stdout);
+
+	next = time(NULL) + 600;
+	while (serve_cache_loop(fd))
+		; /* nothing */
+
+	close(fd);
+	unlink(socket_path);
+}
+
+static void check_socket_directory(const char *path)
+{
+	struct stat st;
+	char *path_copy = xstrdup(path);
+	char *dir = dirname(path_copy);
+
+	if (!stat(dir, &st)) {
+		free(path_copy);
+		return;
+	}
+
+	/*
+	 * We must be sure to create the directory with the correct mode,
+	 * not just chmod it after the fact; otherwise, there is a race
+	 * condition in which somebody can chdir to it, sleep, then try to open
+	 * our protected socket.
+	 */
+	if (safe_create_leading_directories_const(dir) < 0)
+		die_errno("unable to create directories for '%s'", dir);
+	if (mkdir(dir, 0700) < 0)
+		die_errno("unable to mkdir '%s'", dir);
+	free(path_copy);
+}
+
+int main(int argc, const char **argv)
+{
+	setup_git_directory();
+	socket_path = git_pathdup("daemon/index");
+	check_socket_directory(socket_path);
+	atexit(cleanup_socket);
+	sigchain_push_common(cleanup_socket_on_signal);
+	serve_cache(socket_path);
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 42eac62..7ab6fb5 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -16,6 +16,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "sigchain.h"
+#include "unix-socket.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1332,6 +1333,8 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+	if (!size)
+		return 0;
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
 	git_SHA1_Final(sha1, &c);
@@ -1462,6 +1465,35 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void *try_shm(void *mmap, size_t mmap_size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *new_mmap;
+	struct stat st;
+	int fd;
+
+	if (mmap_size <= 20)
+		return mmap;
+
+	strbuf_addf(&sb, "/git-index-%s",
+		    sha1_to_hex((unsigned char *)mmap + mmap_size - 20));
+	fd = shm_open(sb.buf, O_RDONLY, 0777);
+	strbuf_release(&sb);
+	if (fd < 0)
+		return mmap;
+	if (fstat(fd, &st) || st.st_size != mmap_size) {
+		close(fd);
+		return mmap;
+	}
+	new_mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (new_mmap == MAP_FAILED)
+		return mmap;
+	munmap(mmap, mmap_size);
+	return new_mmap;
+}
+
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1469,7 +1501,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	struct stat st;
 	unsigned long src_offset;
 	struct cache_header *hdr;
-	void *mmap;
+	void *mmap, *old_mmap;
 	size_t mmap_size;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 
@@ -1495,11 +1527,23 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
+	if (istate->keep_mmap) {
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	}
 	close(fd);
 
+	old_mmap = mmap;
+	mmap = try_shm(old_mmap, mmap_size);
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
-		goto unmap;
+	if (old_mmap == mmap) {
+		if (verify_hdr(hdr, mmap_size) < 0)
+			goto unmap;
+	} else {
+		if (verify_hdr(hdr, 0) < 0)
+			goto unmap;
+		istate->poke_daemon = 1;
+	}
 
 	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
 	istate->version = ntohl(hdr->hdr_version);
@@ -1547,10 +1591,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	if (!istate->keep_mmap)
+		munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
+	istate->mmap = NULL;
 	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
@@ -1576,6 +1622,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		discard_index(split_index->base);
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	split_index->base->keep_mmap = istate->keep_mmap;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1618,6 +1665,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->keep_mmap && istate->mmap) {
+		munmap(istate->mmap, istate->mmap_size);
+		istate->mmap = NULL;
+	}
 	discard_split_index(istate);
 	return 0;
 }
@@ -2071,12 +2122,14 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		       unsigned flags)
 {
 	struct split_index *si = istate->split_index;
+	int ret;
 
 	if (!si || alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
 		if (si)
 			hashclr(si->base_sha1);
-		return do_write_locked_index(istate, lock, flags);
+		ret = do_write_locked_index(istate, lock, flags);
+		goto done;
 	}
 
 	if (getenv("GIT_TEST_SPLIT_INDEX")) {
@@ -2090,7 +2143,13 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 			return ret;
 	}
 
-	return write_split_index(istate, lock, flags);
+	ret = write_split_index(istate, lock, flags);
+done:
+	if (!ret && istate->poke_daemon) {
+		int fd = unix_stream_connect(git_path("daemon/index"));
+		close(fd);
+	}
+	return ret;
 }
 
 /*
-- 8< --

* Re: [PATCH 00/32] Split index mode for very large indexes
  2014-05-09 10:27   ` Duy Nguyen
@ 2014-05-09 17:55     ` Junio C Hamano
  0 siblings, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2014-05-09 17:55 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Shawn Pearce, git

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Apr 28, 2014 at 02:18:44PM -0700, Shawn Pearce wrote:
>> > The read penalty is not addressed here, so I still pay 14MB hashing
>> > cost. But that's an easy problem. We could cache the validated index
>> > in a daemon. Whenever git needs to load an index, it pokes the daemon.
>> > The daemon verifies that the on-disk index still has the same
>> > signature, then sends the in-mem index to git. When git updates the
>> > index, it pokes the daemon again to update in-mem index. Next time git
>> > reads the index, it does not have to pay I/O cost any more (actually
>> > it does but the cost is hidden away when you do not have to read it
>> > yet).
>> 
>> If we are going this far, maybe it is worthwhile building a mmap()
>> region the daemon exports to the git client that holds the "in memory"
>> format of the index. Clients would mmap this PROT_READ, MAP_PRIVATE
>> and can then quickly access the base file information without doing
>> further validation, or copying the large(ish) data over a pipe.
>
> The below patch implements such a daemon to cache the index. It takes
> 91ms and 377ms to load a 25MB index with and without the daemon. I use
> share memory instead of pipe, but the format is still "on disk" not
> "in memory" for simplicity. I think we're good even without in memory
> format.

Interesting ;-).

* [PATCH 0/8] Speed up cache loading time
  2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
  2014-04-29  1:52   ` Duy Nguyen
  2014-05-09 10:27   ` Duy Nguyen
@ 2014-05-13 11:15   ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 1/8] read-cache: allow to keep mmap'd memory after reading Nguyễn Thái Ngọc Duy
                       ` (10 more replies)
  2014-05-13 11:20   ` [PATCH 9/8] even faster loading time with index version 254 Nguyễn Thái Ngọc Duy
  3 siblings, 11 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

On Fri, May 9, 2014 at 5:27 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> The below patch implements such a daemon to cache the index. It takes
> 91ms and 377ms to load a 25MB index with and without the daemon. I use
> share memory instead of pipe, but the format is still "on disk" not
> "in memory" for simplicity. I think we're good even without in memory
> format.

Here is a better version (on top of split-index). I duplicated the
webkit index 8 times to get its size to 199MB (version 2), close to
what Facebook tried last time [1]. read_cache() takes 2994.861ms on
index v2 (199MB index file) and 2245.113ms on v4 (118MB). With the
daemon caching v2 and v4, it takes 663.399ms and 880.935ms
respectively. The best number is 4.5 times better than the worst.

That is clocked at 800 MHz. A repository of this size deserves a
better CPU. At 2.5 GHz we spend 183.228ms on loading the index, a
reasonable number to me. If we scale other parts of git-status as well
as this, we should be able to make "git status" finish within 1 or 2
seconds.

The tested index does not have a fully populated cache-tree, so
real-world numbers could be a bit higher.

[1] http://thread.gmane.org/gmane.comp.version-control.git/189776/focus=190156

Nguyễn Thái Ngọc Duy (8):
  read-cache: allow to keep mmap'd memory after reading
  unix-socket: stub impl. for platforms with no unix socket support
  daemonize: set a flag before exiting the main process
  Add read-cache--daemon for caching index and related stuff
  read-cache: try index data from shared memory
  read-cache--daemon: do not read index from shared memory
  read-cache: skip verifying trailing SHA-1 on cached index
  read-cache: inform the daemon that the index has been updated

 .gitignore                                     |   1 +
 Documentation/config.txt                       |   4 +
 Documentation/git-read-cache--daemon.txt (new) |  27 ++++
 Makefile                                       |   8 +
 builtin/gc.c                                   |   2 +-
 cache.h                                        |   7 +-
 config.c                                       |  12 ++
 config.mak.uname                               |   1 +
 daemon.c                                       |   2 +-
 environment.c                                  |   1 +
 read-cache--daemon.c (new)                     | 208 +++++++++++++++++++++++++
 read-cache.c                                   | 116 +++++++++++++-
 setup.c                                        |   4 +-
 submodule.c                                    |   1 +
 unix-socket.h                                  |  18 +++
 wrapper.c                                      |  14 ++
 16 files changed, 414 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/git-read-cache--daemon.txt
 create mode 100644 read-cache--daemon.c

-- 
1.9.1.346.ga2b5940

* [PATCH 1/8] read-cache: allow to keep mmap'd memory after reading
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 2/3] Add read-cache--daemon Nguyễn Thái Ngọc Duy
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  3 +++
 read-cache.c | 13 ++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index c6b7770..6549e02 100644
--- a/cache.h
+++ b/cache.h
@@ -290,10 +290,13 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 keep_mmap : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
+	void *mmap;
+	size_t mmap_size;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 342fe52..a5031f3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1495,6 +1495,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
+	if (istate->keep_mmap) {
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	}
 	close(fd);
 
 	hdr = mmap;
@@ -1547,10 +1551,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	if (!istate->keep_mmap)
+		munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
+	istate->mmap = NULL;
 	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
@@ -1576,6 +1582,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		discard_index(split_index->base);
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	split_index->base->keep_mmap = istate->keep_mmap;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1618,6 +1625,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->keep_mmap && istate->mmap) {
+		munmap(istate->mmap, istate->mmap_size);
+		istate->mmap = NULL;
+	}
 	discard_split_index(istate);
 	return 0;
 }
-- 
1.9.1.346.ga2b5940

* [PATCH 2/3] Add read-cache--daemon
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 1/8] read-cache: allow to keep mmap'd memory after reading Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:52       ` Erik Faye-Lund
  2014-05-13 11:15     ` [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support Nguyễn Thái Ngọc Duy
                       ` (8 subsequent siblings)
  10 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 .gitignore                 |   1 +
 Makefile                   |   6 ++
 config.mak.uname           |   1 +
 git-compat-util.h          |   8 +++
 read-cache--daemon.c (new) | 167 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 183 insertions(+)
 create mode 100644 read-cache--daemon.c

diff --git a/.gitignore b/.gitignore
index 70992a4..07e0cb6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -110,6 +110,7 @@
 /git-pull
 /git-push
 /git-quiltimport
+/git-read-cache--daemon
 /git-read-tree
 /git-rebase
 /git-rebase--am
diff --git a/Makefile b/Makefile
index 028749b..98d22de 100644
--- a/Makefile
+++ b/Makefile
@@ -1502,6 +1502,12 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifdef HAVE_SHM
+	BASIC_CFLAGS += -DHAVE_SHM
+	EXTLIBS += -lrt
+	PROGRAM_OBJS += read-cache--daemon.o
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
diff --git a/config.mak.uname b/config.mak.uname
index 23a8803..b6a37e5 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_PATHS_H = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	HAVE_SHM = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	NO_STRLCPY = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index f6d3a46..b2116ab 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -723,4 +723,12 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
 #define gmtime_r git_gmtime_r
 #endif
 
+#ifndef HAVE_SHM
+static inline int shm_open(const char *path, int flags, int mode)
+{
+	errno = ENOSYS;
+	return -1;
+}
+#endif
+
 #endif
diff --git a/read-cache--daemon.c b/read-cache--daemon.c
new file mode 100644
index 0000000..52b4067
--- /dev/null
+++ b/read-cache--daemon.c
@@ -0,0 +1,167 @@
+#include "cache.h"
+#include "sigchain.h"
+#include "unix-socket.h"
+#include "split-index.h"
+#include "pkt-line.h"
+
+static char *socket_path;
+static struct strbuf shm_index = STRBUF_INIT;
+static struct strbuf shm_sharedindex = STRBUF_INIT;
+
+static void cleanup_socket(void)
+{
+	if (socket_path)
+		unlink(socket_path);
+	if (shm_index.len)
+		shm_unlink(shm_index.buf);
+	if (shm_sharedindex.len)
+		shm_unlink(shm_sharedindex.buf);
+}
+
+static void cleanup_socket_on_signal(int sig)
+{
+	cleanup_socket();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static void share_index(struct index_state *istate, struct strbuf *shm_path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *map;
+	int fd;
+
+	strbuf_addf(&sb, "/git-index-%s", sha1_to_hex(istate->sha1));
+	if (shm_path->len && strcmp(sb.buf, shm_path->buf)) {
+		shm_unlink(shm_path->buf);
+		strbuf_reset(shm_path);
+	}
+	fd = shm_open(sb.buf, O_RDWR | O_CREAT | O_TRUNC, 0700);
+	if (fd < 0)
+		return;
+	/*
+	 * We "lock" the shm in preparation by set its size larger
+	 * than expected. The reader is supposed to check the size and
+	 * ignore if shm size is different than the actual file size
+	 */
+	if (ftruncate(fd, istate->mmap_size + 1)) {
+		close(fd);
+		shm_unlink(shm_path->buf);
+		return;
+	}
+	strbuf_addbuf(shm_path, &sb);
+	map = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE,
+		    MAP_SHARED, fd, 0);
+	if (map == MAP_FAILED) {
+		close(fd);
+		shm_unlink(shm_path->buf);
+		return;
+	}
+	memcpy(map, istate->mmap, istate->mmap_size);
+	munmap(map, istate->mmap_size);
+	/* Now "unlock" it */
+	if (ftruncate(fd, istate->mmap_size)) {
+		close(fd);
+		shm_unlink(shm_path->buf);
+		return;
+	}
+	close(fd);
+}
+
+static void refresh()
+{
+	the_index.keep_mmap = 1;
+	if (read_cache() < 0)
+		die("could not read index");
+	share_index(&the_index, &shm_index);
+	if (the_index.split_index &&
+	    the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_sharedindex);
+	discard_index(&the_index);
+}
+
+static unsigned long next;
+static int serve_cache_loop(int fd)
+{
+	struct pollfd pfd;
+	unsigned long now = time(NULL);
+
+	if (now > next)
+		return 0;
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	if (poll(&pfd, 1, 1000 * (next - now)) < 0) {
+		if (errno != EINTR)
+			die_errno("poll failed");
+		return 1;
+	}
+
+	if (pfd.revents & POLLIN) {
+		int client = accept(fd, NULL, NULL);
+		if (client < 0) {
+			warning("accept failed: %s", strerror(errno));
+			return 1;
+		}
+		refresh();
+		close(client);
+		next = now + 600;
+	}
+	return 1;
+}
+
+static void serve_cache(const char *socket_path)
+{
+	int fd;
+
+	fd = unix_stream_listen(socket_path);
+	if (fd < 0)
+		die_errno("unable to bind to '%s'", socket_path);
+
+	refresh();
+
+	printf("ok\n");
+	fclose(stdout);
+
+	next = time(NULL) + 600;
+	while (serve_cache_loop(fd))
+		; /* nothing */
+
+	close(fd);
+	unlink(socket_path);
+}
+
+static void check_socket_directory(const char *path)
+{
+	struct stat st;
+	char *path_copy = xstrdup(path);
+	char *dir = dirname(path_copy);
+
+	if (!stat(dir, &st)) {
+		free(path_copy);
+		return;
+	}
+
+	/*
+	 * We must be sure to create the directory with the correct mode,
+	 * not just chmod it after the fact; otherwise, there is a race
+	 * condition in which somebody can chdir to it, sleep, then try to open
+	 * our protected socket.
+	 */
+	if (safe_create_leading_directories_const(dir) < 0)
+		die_errno("unable to create directories for '%s'", dir);
+	if (mkdir(dir, 0700) < 0)
+		die_errno("unable to mkdir '%s'", dir);
+	free(path_copy);
+}
+
+int main(int argc, const char **argv)
+{
+	setup_git_directory();
+	socket_path = git_pathdup("daemon/index");
+	check_socket_directory(socket_path);
+	atexit(cleanup_socket);
+	sigchain_push_common(cleanup_socket_on_signal);
+	serve_cache(socket_path);
+	return 0;
+}
-- 
1.9.1.346.ga2b5940

* [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 1/8] read-cache: allow to keep mmap'd memory after reading Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 2/3] Add read-cache--daemon Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:59       ` Erik Faye-Lund
  2014-05-13 11:15     ` [PATCH 3/8] daemonize: set a flag before exiting the main process Nguyễn Thái Ngọc Duy
                       ` (7 subsequent siblings)
  10 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

With this we can make unix_stream_* calls without #ifdef.
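
As a rough illustration (not part of this patch), a caller can then
treat "no UNIX socket support" like any other connection failure. The
function names below are hypothetical stand-ins; only the
"errno = ENOSYS, return -1" behaviour is taken from the stub in this
patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the NO_UNIX_SOCKETS stub added by this patch. */
static int stub_unix_stream_connect(const char *path)
{
	(void)path;
	errno = ENOSYS;
	return -1;
}

/*
 * Hypothetical caller: try to reach the daemon and report whether that
 * worked. No #ifdef is needed here; on a platform without UNIX sockets
 * the stub fails and we take the ordinary error path.
 */
static int try_poke_daemon(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = stub_unix_stream_connect(path);
	if (fd < 0)
		return -1;	/* errno says why, e.g. ENOSYS */
	close(fd);
	return 0;
}
```

A caller that only pokes the daemon as an optimization can ignore the
failure and carry on.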

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile      |  2 ++
 unix-socket.h | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Makefile b/Makefile
index 028749b..d0a2b4b 100644
--- a/Makefile
+++ b/Makefile
@@ -1417,6 +1417,8 @@ ifndef NO_UNIX_SOCKETS
 	LIB_H += unix-socket.h
 	PROGRAM_OBJS += credential-cache.o
 	PROGRAM_OBJS += credential-cache--daemon.o
+else
+	BASIC_CFLAGS += -DNO_UNIX_SOCKETS
 endif
 
 ifdef NO_ICONV
diff --git a/unix-socket.h b/unix-socket.h
index e271aee..f1cba70 100644
--- a/unix-socket.h
+++ b/unix-socket.h
@@ -1,7 +1,25 @@
 #ifndef UNIX_SOCKET_H
 #define UNIX_SOCKET_H
 
+#ifndef NO_UNIX_SOCKETS
+
 int unix_stream_connect(const char *path);
 int unix_stream_listen(const char *path);
 
+#else
+
+static inline int unix_stream_connect(const char *path)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+static inline int unix_stream_listen(const char *path)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+#endif
+
 #endif /* UNIX_SOCKET_H */
-- 
1.9.1.346.ga2b5940

* [PATCH 3/8] daemonize: set a flag before exiting the main process
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (2 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 3/3] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This allows signal handlers and atexit functions to detect that the
main process is exiting only because it has daemonized, and to skip
cleanup in that case.
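
A minimal sketch of the intended use, with hypothetical names
(mark_daemonized() models only the parent branch of daemonize(), and
the counter stands in for real cleanup such as unlinking a socket):

```c
#include <stddef.h>

static int daemonized;		/* set in the parent just before it exits */
static int cleanups_done;	/* illustration only: counts real cleanups */

/* Registered with atexit() and called from signal handlers. */
static void cleanup_socket(void)
{
	if (daemonized)
		return;		/* the detached child still needs the socket */
	cleanups_done++;	/* real code would unlink() the socket here */
}

/* Models what the parent branch of daemonize(int *) does with the flag. */
static void mark_daemonized(int *flag)
{
	if (flag)
		*flag = 1;
}
```

Callers that have nothing to protect can keep passing NULL.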

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/gc.c | 2 +-
 cache.h      | 2 +-
 daemon.c     | 2 +-
 setup.c      | 4 +++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 85f5c2b..50275af 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -325,7 +325,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			 * failure to daemonize is ok, we'll continue
 			 * in foreground
 			 */
-			daemonize();
+			daemonize(NULL);
 	} else
 		add_repack_all_option();
 
diff --git a/cache.h b/cache.h
index 6549e02..d0ff11c 100644
--- a/cache.h
+++ b/cache.h
@@ -450,7 +450,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int);
 extern int init_db(const char *template_dir, unsigned int flags);
 
 extern void sanitize_stdfds(void);
-extern int daemonize(void);
+extern int daemonize(int *);
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
diff --git a/daemon.c b/daemon.c
index eba1255..2650504 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1311,7 +1311,7 @@ int main(int argc, char **argv)
 		return execute();
 
 	if (detach) {
-		if (daemonize())
+		if (daemonize(NULL))
 			die("--detach not supported on this platform");
 	} else
 		sanitize_stdfds();
diff --git a/setup.c b/setup.c
index 613e3b3..e8e129a 100644
--- a/setup.c
+++ b/setup.c
@@ -842,7 +842,7 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
-int daemonize(void)
+int daemonize(int *daemonized)
 {
 #ifdef NO_POSIX_GOODIES
 	errno = ENOSYS;
@@ -854,6 +854,8 @@ int daemonize(void)
 		case -1:
 			die_errno("fork failed");
 		default:
+			if (daemonized)
+				*daemonized = 1;
 			exit(0);
 	}
 	if (setsid() == -1)
-- 
1.9.1.346.ga2b5940

* [PATCH 3/3] read-cache: try index data from shared memory
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (3 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 3/8] daemonize: set a flag before exiting the main process Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 4/8] Add read-cache--daemon for caching index and related stuff Nguyễn Thái Ngọc Duy
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

---
 read-cache.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 9e742c7..3100a59 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1462,6 +1462,35 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void *try_shm(void *mmap, size_t mmap_size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *new_mmap;
+	struct stat st;
+	int fd;
+
+	if (mmap_size <= 20)
+		return mmap;
+
+	strbuf_addf(&sb, "/git-index-%s",
+		    sha1_to_hex((unsigned char *)mmap + mmap_size - 20));
+	fd = shm_open(sb.buf, O_RDONLY, 0777);
+	strbuf_release(&sb);
+	if (fd < 0)
+		return mmap;
+	if (fstat(fd, &st) || st.st_size != mmap_size) {
+		close(fd);
+		return mmap;
+	}
+	new_mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (new_mmap == MAP_FAILED)
+		return mmap;
+	munmap(mmap, mmap_size);
+	return new_mmap;
+}
+
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1501,6 +1530,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	}
 	close(fd);
 
+	mmap = try_shm(mmap, mmap_size);
 	hdr = mmap;
 	if (verify_hdr(hdr, mmap_size) < 0)
 		goto unmap;
-- 
1.9.1.346.ga2b5940

* [PATCH 4/8] Add read-cache--daemon for caching index and related stuff
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (4 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 3/3] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:56       ` Erik Faye-Lund
  2014-05-13 11:15     ` [PATCH 5/8] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
                       ` (4 subsequent siblings)
  10 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The name of the shared memory follows the template "/git-index-<SHA1>"
where <SHA1> is the trailing SHA-1 of the index file. If such a shared
memory exists, it contains the same index content as on disk. The
content is already validated by the daemon. Note that it does not
necessarily use the same format as the on-disk version. The content
could be in a format that can be parsed much faster, or even reused
without parsing.

While preparing the shm object, the daemon keeps the shm object
"/git-index-<SHA1>.lock". After "/git-index-<SHA1>" is ready, the
".lock" object is removed. A shared object must not be updated
afterwards. So if ".lock" does not exist, it's safe to assume that the
associated shm object is ready.
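
The naming scheme above can be sketched as follows. shm_index_name()
is a hypothetical helper, not a function from this series (the patch
itself uses strbuf_addf() with sha1_to_hex()); it only shows how the
object name and its ".lock" companion are derived from the trailing
SHA-1:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SHA1_RAWSZ 20

/*
 * Hypothetical helper: write "/git-index-<40 hex digits><suffix>" to out.
 * Pass "" for the index object itself or ".lock" for its lock object.
 */
static void shm_index_name(char *out, size_t outsz,
			   const unsigned char sha1[SHA1_RAWSZ],
			   const char *suffix)
{
	static const char hex[] = "0123456789abcdef";
	char digits[2 * SHA1_RAWSZ + 1];
	int i, n;

	for (i = 0; i < SHA1_RAWSZ; i++) {
		digits[2 * i] = hex[sha1[i] >> 4];
		digits[2 * i + 1] = hex[sha1[i] & 0xf];
	}
	digits[2 * SHA1_RAWSZ] = '\0';

	n = snprintf(out, outsz, "/git-index-%s%s", digits, suffix);
	assert(n > 0 && (size_t)n < outsz);	/* name must not be truncated */
}
```

A reader would build both names, and use the index object only when
the ".lock" one does not exist.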

Other info could also be cached if it's tied to the index. For
example, name hash could be stored in "/git-namehash-<SHA1>".

After Git writes a new index down, it may want to ask the daemon to
preload the new index so next time Git runs the index is already
validated and in memory. It does so by sending a command to a UNIX
socket in $GIT_DIR/daemon/index.
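
The daemon in this patch reads the command with packet_read_line() and
compares it against "refresh", so the command travels as a pkt-line:
four hex digits giving the total length (the digits included),
followed by the payload. A minimal sketch of that framing, using a
hypothetical helper rather than git's own pkt-line writer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: frame payload as a pkt-line in out.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the length does not fit in four hex digits or in the buffer.
 */
static int format_pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload) + 4;	/* length field counts itself */
	int n;

	if (len > 0xffff || len + 1 > outsz)
		return -1;
	n = snprintf(out, outsz, "%04x%s", (unsigned int)len, payload);
	assert(n == (int)len);
	return n;
}
```

For "refresh" this yields the 11 bytes "000brefresh", which the client
would write to the socket after connecting. How the client side
actually frames the command is an assumption here.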

Windows can use its named shared memory instead of POSIX shared memory
and probably a named pipe in place of the UNIX socket.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore                                     |   1 +
 Documentation/git-read-cache--daemon.txt (new) |  27 ++++
 Makefile                                       |   6 +
 config.mak.uname                               |   1 +
 read-cache--daemon.c (new)                     | 207 +++++++++++++++++++++++++
 wrapper.c                                      |  14 ++
 6 files changed, 256 insertions(+)
 create mode 100644 Documentation/git-read-cache--daemon.txt
 create mode 100644 read-cache--daemon.c

diff --git a/.gitignore b/.gitignore
index 70992a4..07e0cb6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -110,6 +110,7 @@
 /git-pull
 /git-push
 /git-quiltimport
+/git-read-cache--daemon
 /git-read-tree
 /git-rebase
 /git-rebase--am
diff --git a/Documentation/git-read-cache--daemon.txt b/Documentation/git-read-cache--daemon.txt
new file mode 100644
index 0000000..1b05be4
--- /dev/null
+++ b/Documentation/git-read-cache--daemon.txt
@@ -0,0 +1,27 @@
+git-read-cache--daemon(1)
+=========================
+
+NAME
+----
+git-read-cache--daemon - A simple cache server for speeding up index file access
+
+SYNOPSIS
+--------
+[verse]
+'git read-cache--daemon' [--detach]
+
+DESCRIPTION
+-----------
+Keep the index file in memory for faster access. This daemon is per
+repository. Note that core.useReadCacheDaemon must be set for Git to
+contact the daemon. This daemon is only available on POSIX systems with
+shared memory support (e.g. Linux).
+
+OPTIONS
+-------
+--detach::
+	Detach from the shell.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index d0a2b4b..a44ab0b 100644
--- a/Makefile
+++ b/Makefile
@@ -1504,6 +1504,12 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifdef HAVE_SHM
+	BASIC_CFLAGS += -DHAVE_SHM
+	EXTLIBS += -lrt
+	PROGRAM_OBJS += read-cache--daemon.o
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
diff --git a/config.mak.uname b/config.mak.uname
index 23a8803..b6a37e5 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_PATHS_H = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	HAVE_SHM = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	NO_STRLCPY = YesPlease
diff --git a/read-cache--daemon.c b/read-cache--daemon.c
new file mode 100644
index 0000000..4531978
--- /dev/null
+++ b/read-cache--daemon.c
@@ -0,0 +1,207 @@
+#include "cache.h"
+#include "sigchain.h"
+#include "unix-socket.h"
+#include "split-index.h"
+#include "pkt-line.h"
+
+static char *socket_path;
+static struct strbuf shm_index = STRBUF_INIT;
+static struct strbuf shm_sharedindex = STRBUF_INIT;
+static struct strbuf shm_lock = STRBUF_INIT;
+static int lock_fd = -1;
+static int daemonized;
+
+static void cleanup_socket(void)
+{
+	if (daemonized)
+		return;
+	if (socket_path)
+		unlink(socket_path);
+	if (shm_index.len)
+		shm_unlink(shm_index.buf);
+	if (shm_sharedindex.len)
+		shm_unlink(shm_sharedindex.buf);
+	if (lock_fd != -1)
+		close(lock_fd);
+	if (shm_lock.len)
+		shm_unlink(shm_lock.buf);
+}
+
+static void cleanup_socket_on_signal(int sig)
+{
+	cleanup_socket();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static int do_share_index(struct index_state *istate, struct strbuf *shm_path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *map;
+	int fd;
+
+	strbuf_addf(&sb, "/git-index-%s", sha1_to_hex(istate->sha1));
+	fd = shm_open(sb.buf, O_RDWR | O_CREAT | O_EXCL, 0700);
+	if (fd < 0)
+		return -1;
+	if (shm_path->len) {
+		shm_unlink(shm_path->buf);
+		strbuf_reset(shm_path);
+	}
+	if (ftruncate(fd, istate->mmap_size)) {
+		close(fd);
+		shm_unlink(sb.buf);
+		return -1;
+	}
+	strbuf_addbuf(shm_path, &sb);
+	map = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE,
+		    MAP_SHARED, fd, 0);
+	if (map == MAP_FAILED) {
+		close(fd);
+		shm_unlink(shm_path->buf);
+		return -1;
+	}
+	memcpy(map, istate->mmap, istate->mmap_size);
+	munmap(map, istate->mmap_size);
+	fchmod(fd, 0400);
+	close(fd);
+	return 0;
+}
+
+static void share_index(struct index_state *istate, struct strbuf *shm_path)
+{
+	if (shm_lock.len)
+		return;
+
+	strbuf_addf(&shm_lock, "/git-index-%s.lock", sha1_to_hex(istate->sha1));
+	lock_fd = shm_open(shm_lock.buf, O_CREAT | O_EXCL, 0700);
+	if (lock_fd < 0) {
+		strbuf_reset(&shm_lock);
+		return;
+	}
+	do_share_index(istate, shm_path);
+	close(lock_fd);
+	lock_fd = -1;
+	shm_unlink(shm_lock.buf);
+	strbuf_reset(&shm_lock);
+}
+
+static void refresh()
+{
+	the_index.keep_mmap = 1;
+	if (read_cache() < 0)
+		die("could not read index");
+	share_index(&the_index, &shm_index);
+	if (the_index.split_index &&
+	    the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_sharedindex);
+	discard_index(&the_index);
+}
+
+static void serve_one_client(int fd)
+{
+	char *buf = packet_read_line(fd, NULL);
+	if (!strcmp(buf, "refresh"))
+		refresh();
+	else
+		fprintf(stderr, "unrecognized command %s\n", buf);
+}
+
+static unsigned long next;
+static int serve_cache_loop(int fd)
+{
+	struct pollfd pfd;
+	unsigned long now = time(NULL);
+
+	if (now > next)
+		return 0;
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	if (poll(&pfd, 1, 1000 * (next - now)) < 0) {
+		if (errno != EINTR)
+			die_errno("poll failed");
+		return 1;
+	}
+
+	if (pfd.revents & POLLIN) {
+		int client = accept(fd, NULL, NULL);
+		if (client < 0) {
+			warning("accept failed: %s", strerror(errno));
+			return 1;
+		}
+		serve_one_client(client);
+		close(client);
+		next = now + 600;
+	}
+	return 1;
+}
+
+static void serve_cache(const char *socket_path, int detach)
+{
+	int fd;
+
+	fd = unix_stream_listen(socket_path);
+	if (fd < 0)
+		die_errno("unable to bind to '%s'", socket_path);
+
+	refresh();
+	if (detach && daemonize(&daemonized))
+		die_errno("unable to detach");
+
+	next = time(NULL) + 600;
+	while (serve_cache_loop(fd))
+		; /* nothing */
+
+	close(fd);
+	unlink(socket_path);
+}
+
+static void check_socket_directory(const char *path)
+{
+	struct stat st;
+	char *path_copy = xstrdup(path);
+	char *dir = dirname(path_copy);
+
+	if (!stat(dir, &st)) {
+		free(path_copy);
+		return;
+	}
+
+	/*
+	 * We must be sure to create the directory with the correct mode,
+	 * not just chmod it after the fact; otherwise, there is a race
+	 * condition in which somebody can chdir to it, sleep, then try to open
+	 * our protected socket.
+	 */
+	if (safe_create_leading_directories_const(dir) < 0)
+		die_errno("unable to create directories for '%s'", dir);
+	if (mkdir(dir, 0700) < 0)
+		die_errno("unable to mkdir '%s'", dir);
+	free(path_copy);
+}
+
+int main(int argc, const char **argv)
+{
+	int detach = 0;
+	switch (argc) {
+	case 1:
+		break;
+	case 2:
+		if (!strcmp(argv[1], "--detach"))
+			detach = 1;
+		else
+			die("unknown option %s", argv[1]);
+		break;
+	default:
+		die("unexpected argument number %d\n", argc);
+	}
+
+	setup_git_directory();
+	socket_path = git_pathdup("daemon/index");
+	check_socket_directory(socket_path);
+	atexit(cleanup_socket);
+	sigchain_push_common(cleanup_socket_on_signal);
+	serve_cache(socket_path, detach);
+	return 0;
+}
diff --git a/wrapper.c b/wrapper.c
index 0cc5636..4cd7415 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -455,3 +455,17 @@ struct passwd *xgetpwuid_self(void)
 		    errno ? strerror(errno) : _("no such user"));
 	return pw;
 }
+
+#ifndef HAVE_SHM
+int shm_open(const char *path, int flags, mode_t mode)
+{
+	errno = ENOSYS;
+	return -1;
+}
+
+int shm_unlink(const char *path)
+{
+	errno = ENOSYS;
+	return -1;
+}
+#endif
-- 
1.9.1.346.ga2b5940


* [PATCH 5/8] read-cache: try index data from shared memory
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (5 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 4/8] Add read-cache--daemon for caching index and related stuff Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 12:13       ` Erik Faye-Lund
  2014-05-13 11:15     ` [PATCH 6/8] read-cache--daemon: do not read index " Nguyễn Thái Ngọc Duy
                       ` (3 subsequent siblings)
  10 siblings, 1 reply; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt |  4 ++++
 cache.h                  |  1 +
 config.c                 | 12 ++++++++++++
 environment.c            |  1 +
 read-cache.c             | 43 +++++++++++++++++++++++++++++++++++++++++++
 submodule.c              |  1 +
 6 files changed, 62 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d8b6cc9..ccbe00b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -617,6 +617,10 @@ relatively high IO latencies.  With this set to 'true', Git will do the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.
 
+core.useReadCacheDaemon::
+	Use `git read-cache--daemon` to speed up index reading. See
+	linkgit:git-read-cache--daemon for more information.
+
 core.createObject::
 	You can set this to 'link', in which case a hardlink followed by
 	a delete of the source are used to make sure that object creation
diff --git a/cache.h b/cache.h
index d0ff11c..fb29c7e 100644
--- a/cache.h
+++ b/cache.h
@@ -603,6 +603,7 @@ extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
 extern unsigned long big_file_threshold;
 extern unsigned long pack_size_limit_cfg;
+extern int use_read_cache_daemon;
 
 /*
  * Do replace refs need to be checked this run?  This variable is
diff --git a/config.c b/config.c
index a30cb5c..5c832ad 100644
--- a/config.c
+++ b/config.c
@@ -874,6 +874,18 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+#ifdef HAVE_SHM
+	/*
+	 * Currently git-read-cache--daemon is only built when
+	 * HAVE_SHM is set. Ignore user settings if HAVE_SHM is not
+	 * defined.
+	 */
+	if (!strcmp(var, "core.usereadcachedaemon")) {
+		use_read_cache_daemon = git_config_bool(var, value);
+		return 0;
+	}
+#endif
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 5c4815d..b76a414 100644
--- a/environment.c
+++ b/environment.c
@@ -63,6 +63,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 struct startup_info *startup_info;
 unsigned long pack_size_limit_cfg;
+int use_read_cache_daemon;
 
 /*
  * The character that begins a commented line in user-editable file
diff --git a/read-cache.c b/read-cache.c
index a5031f3..0e46523 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1462,6 +1462,48 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
+static void *try_shm(void *mmap, size_t *mmap_size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	size_t old_size = *mmap_size;
+	void *new_mmap;
+	struct stat st;
+	int fd;
+
+	if (old_size <= 20 || !use_read_cache_daemon)
+		return mmap;
+
+	strbuf_addf(&sb, "/git-index-%s.lock",
+		    sha1_to_hex((unsigned char *)mmap + old_size - 20));
+	fd = shm_open(sb.buf, O_RDONLY, 0777);
+	if (fd >= 0) {
+		close(fd);
+		return mmap;
+	}
+	strbuf_setlen(&sb, sb.len - 5); /* no ".lock" */
+	fd = shm_open(sb.buf, O_RDONLY, 0777);
+	strbuf_release(&sb);
+	if (fd < 0)
+		/*
+		 * git-read-cache--daemon is probably not started yet. For
+		 * simplicity, only start it at the next index update, which
+		 * should happen often.
+		 */
+		return mmap;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return mmap;
+	}
+	new_mmap = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (new_mmap == MAP_FAILED)
+		return mmap;
+	munmap(mmap, old_size);
+	*mmap_size = st.st_size;
+	return new_mmap;
+}
+
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1501,6 +1543,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	}
 	close(fd);
 
+	mmap = try_shm(mmap, &mmap_size);
 	hdr = mmap;
 	if (verify_hdr(hdr, mmap_size) < 0)
 		goto unmap;
diff --git a/submodule.c b/submodule.c
index b80ecac..9872928 100644
--- a/submodule.c
+++ b/submodule.c
@@ -195,6 +195,7 @@ void gitmodules_config(void)
 		int pos;
 		strbuf_addstr(&gitmodules_path, work_tree);
 		strbuf_addstr(&gitmodules_path, "/.gitmodules");
+		git_config(git_default_config, NULL);
 		if (read_cache() < 0)
 			die("index file corrupt");
 		pos = cache_name_pos(".gitmodules", 11);
-- 
1.9.1.346.ga2b5940


* [PATCH 6/8] read-cache--daemon: do not read index from shared memory
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (6 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 5/8] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 7/8] read-cache: skip verifying trailing SHA-1 on cached index Nguyễn Thái Ngọc Duy
                       ` (2 subsequent siblings)
  10 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

It does not hurt doing that. But it does not help anybody either.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache--daemon.c | 1 +
 read-cache.c         | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/read-cache--daemon.c b/read-cache--daemon.c
index 4531978..bd6d84f 100644
--- a/read-cache--daemon.c
+++ b/read-cache--daemon.c
@@ -145,6 +145,7 @@ static void serve_cache(const char *socket_path, int detach)
 	if (fd < 0)
 		die_errno("unable to bind to '%s'", socket_path);
 
+	use_read_cache_daemon = -1;
 	refresh();
 	if (detach && daemonize(&daemonized))
 		die_errno("unable to detach");
diff --git a/read-cache.c b/read-cache.c
index 0e46523..4041485 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1470,7 +1470,7 @@ static void *try_shm(void *mmap, size_t *mmap_size)
 	struct stat st;
 	int fd;
 
-	if (old_size <= 20 || !use_read_cache_daemon)
+	if (old_size <= 20 || use_read_cache_daemon <= 0)
 		return mmap;
 
 	strbuf_addf(&sb, "/git-index-%s.lock",
-- 
1.9.1.346.ga2b5940


* [PATCH 7/8] read-cache: skip verifying trailing SHA-1 on cached index
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (7 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 6/8] read-cache--daemon: do not read index " Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
  2014-05-13 14:24     ` [PATCH 0/8] Speed up cache loading time Stefan Beller
  10 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The daemon is responsible for verifying the index before putting it in
shared memory, so there is no need to verify it again on the reader side.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 4041485..e98521f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1332,6 +1332,8 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+	if (!size)
+		return 0;
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
 	git_SHA1_Final(sha1, &c);
@@ -1511,7 +1513,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	struct stat st;
 	unsigned long src_offset;
 	struct cache_header *hdr;
-	void *mmap;
+	void *mmap, *old_mmap;
 	size_t mmap_size;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 
@@ -1543,9 +1545,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	}
 	close(fd);
 
+	old_mmap = mmap;
 	mmap = try_shm(mmap, &mmap_size);
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	if (verify_hdr(hdr, old_mmap != mmap ? 0 : mmap_size) < 0)
 		goto unmap;
 
 	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
-- 
1.9.1.346.ga2b5940


* [PATCH 8/8] read-cache: inform the daemon that the index has been updated
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (8 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 7/8] read-cache: skip verifying trailing SHA-1 on cached index Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:15     ` Nguyễn Thái Ngọc Duy
  2014-05-13 12:17       ` Erik Faye-Lund
  2014-05-22 16:38       ` David Turner
  2014-05-13 14:24     ` [PATCH 0/8] Speed up cache loading time Stefan Beller
  10 siblings, 2 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:15 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The daemon then immediately loads the new index into memory in the
background. The next time Git needs to read the index, everything is
ready.
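
For illustration only (this is not part of the patch): the wake-up
message is sent with packet_write(), which uses pkt-line framing, i.e.
four hex digits giving the total length including the prefix itself,
followed by the payload. A minimal standalone sketch of that framing,
assuming nothing beyond the C library; the name format_pkt_line is
hypothetical and is not a Git function:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's API): format one pkt-line into out.
 * The frame is four lowercase hex digits holding the total length
 * (the 4-byte prefix plus the payload), followed by the payload.
 * Returns the number of bytes in the frame, or -1 if the payload is
 * too long for a 4-digit length or does not fit in out (including
 * the terminating NUL that snprintf() writes).
 */
static int format_pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > 0xffff || len + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04x%s", (unsigned int)len, payload);
	return (int)len;
}
```

With the payload "refresh" (7 bytes) the frame is 11 bytes long, so
the length prefix is "000b".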

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  1 +
 read-cache.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index fb29c7e..3115b86 100644
--- a/cache.h
+++ b/cache.h
@@ -483,6 +483,7 @@ extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
+#define REFRESH_DAEMON		(1 << 2)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
diff --git a/read-cache.c b/read-cache.c
index e98521f..d5c9247 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -16,6 +16,9 @@
 #include "varint.h"
 #include "split-index.h"
 #include "sigchain.h"
+#include "unix-socket.h"
+#include "pkt-line.h"
+#include "run-command.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -2030,6 +2033,32 @@ void set_alternate_index_output(const char *name)
 	alternate_index_output = name;
 }
 
+static void refresh_daemon(struct index_state *istate)
+{
+	int fd;
+	fd = unix_stream_connect(git_path("daemon/index"));
+	if (fd < 0) {
+		struct child_process cp;
+		const char *av[] = {"read-cache--daemon", "--detach", NULL };
+		memset(&cp, 0, sizeof(cp));
+		cp.argv = av;
+		cp.git_cmd = 1;
+		cp.no_stdin = 1;
+		if (run_command(&cp))
+			warning(_("failed to start read-cache--daemon: %s"),
+				strerror(errno));
+		return;
+	}
+	/*
+	 * packet_write() could die() but unless this is from
+	 * update_index_if_able(), we're about to exit anyway,
+	 * probably ok to die (for now). Blocking mode is another
+	 * problem to deal with later.
+	 */
+	packet_write(fd, "refresh");
+	close(fd);
+}
+
 static int commit_locked_index(struct lock_file *lk)
 {
 	if (alternate_index_output) {
@@ -2052,9 +2081,22 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
 	       (COMMIT_LOCK | CLOSE_LOCK));
-	if (flags & COMMIT_LOCK)
-		return commit_locked_index(lock);
-	else if (flags & CLOSE_LOCK)
+	if (flags & COMMIT_LOCK) {
+		int ret;
+		int len = strlen(lock->filename) - 5; /* .lock */
+		if (!use_read_cache_daemon || len < 6 ||
+		    /*
+		     * do not wake the daemon when we update a temporary
+		     * index. This is not a perfect test for this, but good
+		     * enough.
+		     */
+		    strncmp(lock->filename + len - 6, "/index", 6))
+			flags &= ~REFRESH_DAEMON;
+		ret = commit_locked_index(lock);
+		if (!ret && (flags & REFRESH_DAEMON))
+			refresh_daemon(istate);
+		return ret;
+	} else if (flags & CLOSE_LOCK)
 		return close_lock_file(lock);
 	else
 		return ret;
@@ -2066,7 +2108,7 @@ static int write_split_index(struct index_state *istate,
 {
 	int ret;
 	prepare_to_write_split_index(istate);
-	ret = do_write_locked_index(istate, lock, flags);
+	ret = do_write_locked_index(istate, lock, flags | REFRESH_DAEMON);
 	finish_writing_split_index(istate);
 	return ret;
 }
@@ -2133,7 +2175,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	    (istate->cache_changed & ~EXTMASK)) {
 		if (si)
 			hashclr(si->base_sha1);
-		return do_write_locked_index(istate, lock, flags);
+		return do_write_locked_index(istate, lock,
+					     flags | REFRESH_DAEMON);
 	}
 
 	if (getenv("GIT_TEST_SPLIT_INDEX")) {
-- 
1.9.1.346.ga2b5940


* [PATCH 9/8] even faster loading time with index version 254
  2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
                     ` (2 preceding siblings ...)
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:20   ` Nguyễn Thái Ngọc Duy
  3 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-13 11:20 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This dirty (and likely buggy) patch shows one direction for lowering
load time even more. Basically the shared memory now contains a clean
memory dump that a git process can use with little preparation
(which also means it's tied to C Git; other implementations can't use
this).
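
As a rough illustration of "little preparation" (a sketch, not code
from this patch): the reader can point straight into the dump and step
from one entry to the next, as long as each entry is padded to an
8-byte boundary. The helper names below are hypothetical, and
header_size stands in for offsetof(struct cache_entry, name):

```c
#include <stddef.h>

/* Round n up to the next multiple of 8. */
static size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Hypothetical helper: number of bytes one dumped entry occupies when
 * its fixed-size part is header_size bytes and it is followed by a
 * NUL-terminated name of namelen characters, padded to 8 bytes.
 */
static size_t dumped_entry_stride(size_t header_size, size_t namelen)
{
	return round_up8(header_size + namelen + 1);
}
```

Because the stride depends on the in-memory struct layout, such a dump
is only meaningful to a reader built with the same layout.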

Memory is actually shared: git won't malloc and copy the entries over,
so even though the v254 dump is 235 MB (larger than the 199 MB v2
index), we use less memory.

With this patch, we can get as low as 256.442ms (compared to 663ms in
0/8) at 800 MHz, or 91ms at 2.5 GHz. Index load time should be a
solved problem.

But I'm not going to polish this patch and try to get it merged. I'd
rather see a real world repository of this size first to justify
messing up read-cache.c even more.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h              |   2 +
 read-cache--daemon.c |  31 +++++------
 read-cache.c         | 154 ++++++++++++++++++++++++++++++++++++++++++---------
 split-index.c        |   3 +
 4 files changed, 149 insertions(+), 41 deletions(-)

diff --git a/cache.h b/cache.h
index c246dee..7f0ef1e 100644
--- a/cache.h
+++ b/cache.h
@@ -297,6 +297,8 @@ struct index_state {
 	unsigned char sha1[20];
 	void *mmap;
 	size_t mmap_size;
+	int mmap_fd;
+	void *(*allocate_254)(struct index_state *, size_t);
 };
 
 extern struct index_state the_index;
diff --git a/read-cache--daemon.c b/read-cache--daemon.c
index bd6d84f..a44bd09 100644
--- a/read-cache--daemon.c
+++ b/read-cache--daemon.c
@@ -34,10 +34,19 @@ static void cleanup_socket_on_signal(int sig)
 	raise(sig);
 }
 
+static void *allocate_254(struct index_state *istate, unsigned long size)
+{
+	ftruncate(istate->mmap_fd, size);
+	istate->mmap_size = size;
+	istate->mmap = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE,
+			     MAP_SHARED, istate->mmap_fd, 0);
+	return istate->mmap != MAP_FAILED ? istate->mmap : NULL;
+}
+
+extern int do_write_index(struct index_state *istate, int newfd, int strip_extensions);
 static int do_share_index(struct index_state *istate, struct strbuf *shm_path)
 {
 	struct strbuf sb = STRBUF_INIT;
-	void *map;
 	int fd;
 
 	strbuf_addf(&sb, "/git-index-%s", sha1_to_hex(istate->sha1));
@@ -48,21 +57,16 @@ static int do_share_index(struct index_state *istate, struct strbuf *shm_path)
 		shm_unlink(shm_path->buf);
 		strbuf_reset(shm_path);
 	}
-	if (ftruncate(fd, istate->mmap_size)) {
-		close(fd);
-		shm_unlink(shm_path->buf);
-		return -1;
-	}
+	istate->version = 254;
+	istate->allocate_254 = allocate_254;
+	istate->mmap_fd = fd;
+	do_write_index(istate, -1, 0);
 	strbuf_addbuf(shm_path, &sb);
-	map = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE,
-		    MAP_SHARED, fd, 0);
-	if (map == MAP_FAILED) {
+	if (istate->mmap == MAP_FAILED) {
 		close(fd);
 		shm_unlink(shm_path->buf);
 		return -1;
 	}
-	memcpy(map, istate->mmap, istate->mmap_size);
-	munmap(map, istate->mmap_size);
 	fchmod(fd, 0400);
 	close(fd);
 	return 0;
@@ -88,13 +92,9 @@ static void share_index(struct index_state *istate, struct strbuf *shm_path)
 
 static void refresh()
 {
-	the_index.keep_mmap = 1;
 	if (read_cache() < 0)
 		die("could not read index");
 	share_index(&the_index, &shm_index);
-	if (the_index.split_index &&
-	    the_index.split_index->base)
-		share_index(the_index.split_index->base, &shm_sharedindex);
 	discard_index(&the_index);
 }
 
@@ -145,7 +145,6 @@ static void serve_cache(const char *socket_path, int detach)
 	if (fd < 0)
 		die_errno("unable to bind to '%s'", socket_path);
 
-	use_read_cache_daemon = -1;
 	refresh();
 	if (detach && daemonize(&daemonized))
 		die_errno("unable to detach");
diff --git a/read-cache.c b/read-cache.c
index d5c9247..4db1c30 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -61,7 +61,8 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 
 	replace_index_entry_in_base(istate, old, ce);
 	remove_name_hash(istate, old);
-	free(old);
+	if (old->index != 0xffffffff) /* special mark by v254 entry writing code */
+		free(old);
 	set_index_entry(istate, nr, ce);
 	ce->ce_flags |= CE_UPDATE_IN_BASE;
 	istate->cache_changed |= CE_ENTRY_CHANGED;
@@ -1333,9 +1334,11 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error("bad signature");
 	hdr_version = ntohl(hdr->hdr_version);
-	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
+	if (!size && hdr_version == 254)
+		fprintf(stderr, "yeah\n");		/* go on */
+	else if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
-	if (!size)
+	if (!size || hdr_version == 254)
 		return 0;
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
@@ -1499,7 +1502,8 @@ static void *try_shm(void *mmap, size_t *mmap_size)
 		close(fd);
 		return mmap;
 	}
-	new_mmap = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	new_mmap = xmmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE, fd, 0);
 	close(fd);
 	if (new_mmap == MAP_FAILED)
 		return mmap;
@@ -1519,6 +1523,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	void *mmap, *old_mmap;
 	size_t mmap_size;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	int ver_254 = 0;
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1561,7 +1566,13 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
 	istate->initialized = 1;
 
-	if (istate->version == 4)
+	if (istate->version == 254) {
+		istate->version = 4;
+		ver_254 = 1;
+		istate->keep_mmap = 1;
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	} else if (istate->version == 4)
 		previous_name = &previous_name_buf;
 	else
 		previous_name = NULL;
@@ -1573,7 +1584,14 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		unsigned long consumed;
 
 		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
-		ce = create_from_disk(disk_ce, &consumed, previous_name);
+		if (ver_254) {
+			ce = mmap + src_offset;
+			consumed =
+				offsetof(struct cache_entry, name) +
+				ce_namelen(ce) + 1;
+			consumed = (consumed + 7) & ~7;
+		} else
+			ce = create_from_disk(disk_ce, &consumed, previous_name);
 		set_index_entry(istate, i, ce);
 
 		src_offset += consumed;
@@ -1655,6 +1673,8 @@ int discard_index(struct index_state *istate)
 	int i;
 
 	for (i = 0; i < istate->cache_nr; i++) {
+		if (istate->cache[i]->index == 0xffffffff)
+			continue;
 		if (istate->cache[i]->index &&
 		    istate->split_index &&
 		    istate->split_index->base &&
@@ -1696,13 +1716,51 @@ int unmerged_index(const struct index_state *istate)
 static unsigned char write_buffer[WRITE_BUFFER_SIZE];
 static unsigned long write_buffer_len;
 
+struct file_block {
+	struct file_block *next;
+	char buf[1];
+};
+static struct file_block *start, *end;
+static unsigned long file_block_size;
+#define FB_ALLOC_SIZE 65536
+#define FB_USABLE_SIZE (FB_ALLOC_SIZE - sizeof(struct file_block *))
+
+static void fill_file_block(const unsigned char *buffer, unsigned int len)
+{
+	if (!start) {
+		start = end = xmalloc(FB_ALLOC_SIZE);
+		start->next = NULL;
+	}
+
+	while (len) {
+		unsigned long used = file_block_size % FB_USABLE_SIZE;
+		unsigned long remaining = FB_USABLE_SIZE - used;
+		if (len < remaining) {
+			memcpy(end->buf + used, buffer, len);
+			file_block_size += len;
+			return;
+		}
+		memcpy(end->buf + used, buffer, remaining);
+		file_block_size += remaining;
+		buffer		+= remaining;
+		len		-= remaining;
+		end->next	 = xmalloc(FB_ALLOC_SIZE);
+		end		 = end->next;
+		end->next	 = NULL;
+	}
+}
+
 static int ce_write_flush(git_SHA_CTX *context, int fd)
 {
 	unsigned int buffered = write_buffer_len;
 	if (buffered) {
-		git_SHA1_Update(context, write_buffer, buffered);
-		if (write_in_full(fd, write_buffer, buffered) != buffered)
-			return -1;
+		if (context) {
+			git_SHA1_Update(context, write_buffer, buffered);
+			if (write_in_full(fd, write_buffer, buffered) != buffered)
+				return -1;
+		} else {
+			fill_file_block(write_buffer, buffered);
+		}
 		write_buffer_len = 0;
 	}
 	return 0;
@@ -1745,7 +1803,8 @@ static int ce_flush(git_SHA_CTX *context, int fd, unsigned char *sha1)
 
 	if (left) {
 		write_buffer_len = 0;
-		git_SHA1_Update(context, write_buffer, left);
+		if (context)
+			git_SHA1_Update(context, write_buffer, left);
 	}
 
 	/* Flush first if not enough space for SHA1 signature */
@@ -1756,10 +1815,18 @@ static int ce_flush(git_SHA_CTX *context, int fd, unsigned char *sha1)
 	}
 
 	/* Append the SHA1 signature at the end */
-	git_SHA1_Final(write_buffer + left, context);
+	if (context)
+		git_SHA1_Final(write_buffer + left, context);
+	else
+		hashclr(write_buffer + left);
 	hashcpy(sha1, write_buffer + left);
 	left += 20;
-	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
+	if (context)
+		return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
+	else {
+		fill_file_block(write_buffer, left);
+		return 0;
+	}
 }
 
 static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
@@ -1921,10 +1988,9 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-static int do_write_index(struct index_state *istate, int newfd,
-			  int strip_extensions)
+int do_write_index(struct index_state *istate, int newfd, int strip_extensions)
 {
-	git_SHA_CTX c;
+	git_SHA_CTX c, *c_p;
 	struct cache_header hdr;
 	int i, err, removed, extended, hdr_version;
 	struct cache_entry **cache = istate->cache;
@@ -1960,8 +2026,14 @@ static int do_write_index(struct index_state *istate, int newfd,
 	hdr.hdr_version = htonl(hdr_version);
 	hdr.hdr_entries = htonl(entries - removed);
 
-	git_SHA1_Init(&c);
-	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
+	if (istate->version == 254)
+		c_p = NULL;
+	else {
+		c_p = &c;
+		git_SHA1_Init(c_p);
+	}
+
+	if (ce_write(c_p, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
 	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
@@ -1982,7 +2054,21 @@ static int do_write_index(struct index_state *istate, int newfd,
 			else
 				return error(msg, ce->name);
 		}
-		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
+		if (!c_p) {
+			static unsigned padding[8];
+			unsigned sz = offsetof(struct cache_entry, name) + ce_namelen(ce) + 1;
+			unsigned int ce_flags = ce->ce_flags;
+			struct hashmap_entry he = ce->ent;
+			ce->index = 0xffffffff;
+			memset(&ce->ent, 0, sizeof(ce->ent));
+			ce->ce_flags &= CE_VALID | CE_EXTENDED_FLAGS;
+			ce_write(NULL, 0, ce, sz);
+			ce->ce_flags = ce_flags;
+			memcpy(&ce->ent, &he, sizeof(he));
+			ce->index = 0;
+			if (sz % 8)
+				ce_write(NULL, 0, padding, 8 - (sz % 8));
+		} else if (ce_write_entry(c_p, newfd, ce, previous_name) < 0)
 			return -1;
 	}
 	strbuf_release(&previous_name_buf);
@@ -1992,9 +2078,9 @@ static int do_write_index(struct index_state *istate, int newfd,
 		struct strbuf sb = STRBUF_INIT;
 
 		err = write_link_extension(&sb, istate) < 0 ||
-			write_index_ext_header(&c, newfd, CACHE_EXT_LINK,
+			write_index_ext_header(c_p, newfd, CACHE_EXT_LINK,
 					       sb.len) < 0 ||
-			ce_write(&c, newfd, sb.buf, sb.len) < 0;
+			ce_write(c_p, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
 			return -1;
@@ -2003,8 +2089,8 @@ static int do_write_index(struct index_state *istate, int newfd,
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
-			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		err = write_index_ext_header(c_p, newfd, CACHE_EXT_TREE, sb.len) < 0
+			|| ce_write(c_p, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
 			return -1;
@@ -2013,16 +2099,34 @@ static int do_write_index(struct index_state *istate, int newfd,
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
+		err = write_index_ext_header(c_p, newfd, CACHE_EXT_RESOLVE_UNDO,
 					     sb.len) < 0
-			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+			|| ce_write(c_p, newfd, sb.buf, sb.len) < 0;
 		strbuf_release(&sb);
 		if (err)
 			return -1;
 	}
 
-	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
+	if (ce_flush(c_p, newfd, istate->sha1) || (c_p && fstat(newfd, &st)))
 		return -1;
+	if (!c_p) {
+		unsigned char *p = NULL;
+		if (istate->allocate_254)
+			p = istate->allocate_254(istate, file_block_size);
+		while (file_block_size) {
+			struct file_block *to_free = start;
+			int len = file_block_size > FB_USABLE_SIZE ? FB_USABLE_SIZE : file_block_size;
+			if (p) {
+				memcpy(p, start->buf, len);
+				p += len;
+			} else
+				write_or_die(newfd, start->buf, len);
+			file_block_size -= len;
+			start = start->next;
+			free(to_free);
+		}
+		start = end = NULL;
+	}
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	return 0;
diff --git a/split-index.c b/split-index.c
index 21485e2..a47f805 100644
--- a/split-index.c
+++ b/split-index.c
@@ -302,6 +302,9 @@ void discard_split_index(struct index_state *istate)
 
 void save_or_free_index_entry(struct index_state *istate, struct cache_entry *ce)
 {
+	if (ce->index == 0xffffffff)
+		return;
+
 	if (ce->index &&
 	    istate->split_index &&
 	    istate->split_index->base &&
-- 
1.9.1.346.ga2b5940


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 11:15     ` [PATCH 2/3] Add read-cache--daemon Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:52       ` Erik Faye-Lund
  2014-05-13 12:01         ` Duy Nguyen
  2014-05-13 13:01         ` Duy Nguyen
  0 siblings, 2 replies; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 11:52 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> diff --git a/Makefile b/Makefile
> index 028749b..98d22de 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1502,6 +1502,12 @@ ifdef HAVE_DEV_TTY
>         BASIC_CFLAGS += -DHAVE_DEV_TTY
>  endif
>
> +ifdef HAVE_SHM
> +       BASIC_CFLAGS += -DHAVE_SHM
> +       EXTLIBS += -lrt
> +       PROGRAM_OBJS += read-cache--daemon.o
> +endif
> +

I think read-cache--daemon will fail in case of NO_UNIX_SOCKETS.

But, read-cache--daemon.c only gets compiled if we have shm_open...

> diff --git a/git-compat-util.h b/git-compat-util.h
> index f6d3a46..b2116ab 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -723,4 +723,12 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
>  #define gmtime_r git_gmtime_r
>  #endif
>
> +#ifndef HAVE_SHM
> +static inline int shm_open(const char *path, int flags, int mode)
> +{
> +       errno = ENOSYS;
> +       return -1;
> +}
> +#endif


...yet, you introduce a compatibility-shim...

> diff --git a/read-cache--daemon.c b/read-cache--daemon.c
> new file mode 100644
> index 0000000..52b4067
> --- /dev/null
> +++ b/read-cache--daemon.c
> @@ -0,0 +1,167 @@
> +#include "cache.h"
> +#include "sigchain.h"
> +#include "unix-socket.h"
> +#include "split-index.h"
> +#include "pkt-line.h"
> +
> +static char *socket_path;
> +static struct strbuf shm_index = STRBUF_INIT;
> +static struct strbuf shm_sharedindex = STRBUF_INIT;
> +
> +static void cleanup_socket(void)
> +{
> +       if (socket_path)
> +               unlink(socket_path);
> +       if (shm_index.len)
> +               shm_unlink(shm_index.buf);
> +       if (shm_sharedindex.len)
> +               shm_unlink(shm_sharedindex.buf);
> +}
> +
> +static void cleanup_socket_on_signal(int sig)
> +{
> +       cleanup_socket();
> +       sigchain_pop(sig);
> +       raise(sig);
> +}
> +
> +static void share_index(struct index_state *istate, struct strbuf *shm_path)
> +{
> +       struct strbuf sb = STRBUF_INIT;
> +       void *map;
> +       int fd;
> +
> +       strbuf_addf(&sb, "/git-index-%s", sha1_to_hex(istate->sha1));
> +       if (shm_path->len && strcmp(sb.buf, shm_path->buf)) {
> +               shm_unlink(shm_path->buf);
> +               strbuf_reset(shm_path);
> +       }
> +       fd = shm_open(sb.buf, O_RDWR | O_CREAT | O_TRUNC, 0700);
> +       if (fd < 0)
> +               return;

...that only gets called from read-cache--daemon.c, which already only
gets compiled if we have shm_open?


* Re: [PATCH 4/8] Add read-cache--daemon for caching index and related stuff
  2014-05-13 11:15     ` [PATCH 4/8] Add read-cache--daemon for caching index and related stuff Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:56       ` Erik Faye-Lund
  0 siblings, 0 replies; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 11:56 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> diff --git a/Documentation/git-read-cache--daemon.txt b/Documentation/git-read-cache--daemon.txt
> new file mode 100644
> index 0000000..1b05be4
> --- /dev/null
> +++ b/Documentation/git-read-cache--daemon.txt
> @@ -0,0 +1,27 @@
> +git-read-cache--daemon(1)
> +=============
> +
> +NAME
> +----
> +git-daemon - A simple cache server for speeding up index file access
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git daemon' [--detach]
> +

Huh, "git daemon" can't be right...

> diff --git a/Makefile b/Makefile
> index d0a2b4b..a44ab0b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1504,6 +1504,12 @@ ifdef HAVE_DEV_TTY
>         BASIC_CFLAGS += -DHAVE_DEV_TTY
>  endif
>
> +ifdef HAVE_SHM
> +       BASIC_CFLAGS += -DHAVE_SHM
> +       EXTLIBS += -lrt
> +       PROGRAM_OBJS += read-cache--daemon.o
> +endif

I think this also needs to be protected against NO_UNIX_SOCKETS


* Re: [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support
  2014-05-13 11:15     ` [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support Nguyễn Thái Ngọc Duy
@ 2014-05-13 11:59       ` Erik Faye-Lund
  2014-05-13 12:03         ` Erik Faye-Lund
  0 siblings, 1 reply; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 11:59 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> With this we can make unix_stream_* calls without #ifdef.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Makefile      |  2 ++
>  unix-socket.h | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 028749b..d0a2b4b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1417,6 +1417,8 @@ ifndef NO_UNIX_SOCKETS
>         LIB_H += unix-socket.h
>         PROGRAM_OBJS += credential-cache.o
>         PROGRAM_OBJS += credential-cache--daemon.o
> +else
> +       BASIC_CFLAGS += -DNO_UNIX_SOCKETS
>  endif
>
>  ifdef NO_ICONV
> diff --git a/unix-socket.h b/unix-socket.h
> index e271aee..f1cba70 100644
> --- a/unix-socket.h
> +++ b/unix-socket.h
> @@ -1,7 +1,25 @@
>  #ifndef UNIX_SOCKET_H
>  #define UNIX_SOCKET_H
>
> +#ifndef NO_UNIX_SOCKETS
> +
>  int unix_stream_connect(const char *path);
>  int unix_stream_listen(const char *path);
>
> +#else
> +
> +static inline int unix_stream_connect(const char *path)
> +{
> +       errno = ENOSYS;
> +       return -1;
> +}
> +
> +static inline int unix_stream_listen(const char *path)
> +{
> +       errno = ENOSYS;
> +       return -1;
> +}
> +
> +#endif
> +
>  #endif /* UNIX_SOCKET_H */

OK, so I missed this before my other two comments. But still... in
what way does errno=ENOSYS make this *work*? Won't we end up compiling
lots of non-functional tools on Windows in this case?


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 11:52       ` Erik Faye-Lund
@ 2014-05-13 12:01         ` Duy Nguyen
  2014-05-13 13:01         ` Duy Nguyen
  1 sibling, 0 replies; 76+ messages in thread
From: Duy Nguyen @ 2014-05-13 12:01 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 6:52 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> diff --git a/Makefile b/Makefile
>> index 028749b..98d22de 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1502,6 +1502,12 @@ ifdef HAVE_DEV_TTY
>>         BASIC_CFLAGS += -DHAVE_DEV_TTY
>>  endif
>>
>> +ifdef HAVE_SHM
>> +       BASIC_CFLAGS += -DHAVE_SHM
>> +       EXTLIBS += -lrt
>> +       PROGRAM_OBJS += read-cache--daemon.o
>> +endif
>> +
>
> I think read-cache--daemon will fail in case of NO_UNIX_SOCKETS.
>
> But, read-cache--daemon.c only gets compiled if we have shm_open...

Portability is something to be sorted out. Ideally we should not build
this unless we have both unix socket and shared memory support. On
Windows, I'm not sure how much code can be shared, or whether it'll be a
completely different program. In that case maybe this program should
be read-cache--daemon-posix (at least the .c file name; the binary may
still be git-read-cache--daemon) or something, and the Windows version
read-cache--daemon-windows.
-- 
Duy


* Re: [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support
  2014-05-13 11:59       ` Erik Faye-Lund
@ 2014-05-13 12:03         ` Erik Faye-Lund
  0 siblings, 0 replies; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 12:03 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 1:59 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> With this we can make unix_stream_* calls without #ifdef.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>  Makefile      |  2 ++
>>  unix-socket.h | 18 ++++++++++++++++++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/Makefile b/Makefile
>> index 028749b..d0a2b4b 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1417,6 +1417,8 @@ ifndef NO_UNIX_SOCKETS
>>         LIB_H += unix-socket.h
>>         PROGRAM_OBJS += credential-cache.o
>>         PROGRAM_OBJS += credential-cache--daemon.o
>> +else
>> +       BASIC_CFLAGS += -DNO_UNIX_SOCKETS
>>  endif
>>
>>  ifdef NO_ICONV
>> diff --git a/unix-socket.h b/unix-socket.h
>> index e271aee..f1cba70 100644
>> --- a/unix-socket.h
>> +++ b/unix-socket.h
>> @@ -1,7 +1,25 @@
>>  #ifndef UNIX_SOCKET_H
>>  #define UNIX_SOCKET_H
>>
>> +#ifndef NO_UNIX_SOCKETS
>> +
>>  int unix_stream_connect(const char *path);
>>  int unix_stream_listen(const char *path);
>>
>> +#else
>> +
>> +static inline int unix_stream_connect(const char *path)
>> +{
>> +       errno = ENOSYS;
>> +       return -1;
>> +}
>> +
>> +static inline int unix_stream_listen(const char *path)
>> +{
>> +       errno = ENOSYS;
>> +       return -1;
>> +}
>> +
>> +#endif
>> +
>>  #endif /* UNIX_SOCKET_H */
>
> OK, so I missed this before my other two comments. But still... in
> what way does errno=ENOSYS make this *work*? Won't we end up compiling
> lots of non-functional tools on Windows in this case?

ENOSYS makes credential-cache.c just die with the message "unable
to start cache daemon", and credential-cache--daemon.c die with "unable
to bind to <socket_path>".
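
A minimal sketch of that failure path, assuming a caller that reports
the error instead of calling die(). The stub mirrors the
NO_UNIX_SOCKETS branch of the patch; stub_unix_stream_connect() and
connect_or_report() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the patch's stub: always fails and sets errno to ENOSYS. */
static inline int stub_unix_stream_connect(const char *path)
{
	(void)path;
	errno = ENOSYS;
	return -1;
}

/*
 * Hypothetical caller. Returns the descriptor on success, or -1 after
 * formatting an error message into buf. A real tool would die() here.
 */
static int connect_or_report(const char *path, char *buf, size_t len)
{
	int fd = stub_unix_stream_connect(path);

	if (fd < 0) {
		int saved_errno = errno; /* keep the value before other calls */
		snprintf(buf, len, "unable to connect to %s: %s",
			 path, strerror(saved_errno));
		return -1;
	}
	return fd;
}
```

With the stub in place the tool still compiles and links; it only
fails at run time, on the first attempt to use the socket.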


* Re: [PATCH 5/8] read-cache: try index data from shared memory
  2014-05-13 11:15     ` [PATCH 5/8] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
@ 2014-05-13 12:13       ` Erik Faye-Lund
  0 siblings, 0 replies; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 12:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/config.txt |  4 ++++
>  cache.h                  |  1 +
>  config.c                 | 12 ++++++++++++
>  environment.c            |  1 +
>  read-cache.c             | 43 +++++++++++++++++++++++++++++++++++++++++++
>  submodule.c              |  1 +
>  6 files changed, 62 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index d8b6cc9..ccbe00b 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -617,6 +617,10 @@ relatively high IO latencies.  With this set to 'true', Git will do the
>  index comparison to the filesystem data in parallel, allowing
>  overlapping IO's.
>
> +core.useReadCacheDaemon::
> +       Use `git read-cache--daemon` to speed up index reading. See
> +       linkgit:git-read-cache--daemon for more information.
> +
>  core.createObject::
>         You can set this to 'link', in which case a hardlink followed by
>         a delete of the source are used to make sure that object creation
> diff --git a/cache.h b/cache.h
> index d0ff11c..fb29c7e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -603,6 +603,7 @@ extern size_t packed_git_limit;
>  extern size_t delta_base_cache_limit;
>  extern unsigned long big_file_threshold;
>  extern unsigned long pack_size_limit_cfg;
> +extern int use_read_cache_daemon;
>
>  /*
>   * Do replace refs need to be checked this run?  This variable is
> diff --git a/config.c b/config.c
> index a30cb5c..5c832ad 100644
> --- a/config.c
> +++ b/config.c
> @@ -874,6 +874,18 @@ static int git_default_core_config(const char *var, const char *value)
>                 return 0;
>         }
>
> +#ifdef HAVE_SHM
> +       /*
> +        * Currently git-read-cache--daemon is only built when
> +        * HAVE_SHM is set. Ignore user settings if HAVE_SHM is not
> +        * defined.
> +        */
> +       if (!strcmp(var, "core.usereadcachedaemon")) {
> +               use_read_cache_daemon = git_config_bool(var, value);
> +               return 0;
> +       }
> +#endif
> +
>         /* Add other config variables here and to Documentation/config.txt. */
>         return 0;
>  }
> diff --git a/environment.c b/environment.c
> index 5c4815d..b76a414 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -63,6 +63,7 @@ int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  struct startup_info *startup_info;
>  unsigned long pack_size_limit_cfg;
> +int use_read_cache_daemon;
>
>  /*
>   * The character that begins a commented line in user-editable file
> diff --git a/read-cache.c b/read-cache.c
> index a5031f3..0e46523 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1462,6 +1462,48 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
>         return ce;
>  }
>
> +static void *try_shm(void *mmap, size_t *mmap_size)
> +{
> +       struct strbuf sb = STRBUF_INIT;
> +       size_t old_size = *mmap_size;
> +       void *new_mmap;
> +       struct stat st;
> +       int fd;
> +
> +       if (old_size <= 20 || !use_read_cache_daemon)
> +               return mmap;
> +
> +       strbuf_addf(&sb, "/git-index-%s.lock",
> +                   sha1_to_hex((unsigned char *)mmap + old_size - 20));
> +       fd = shm_open(sb.buf, O_RDONLY, 0777);

OK, so *here* you do an unguarded use of shm_open, but the code never
gets executed because of the HAVE_SHM guard in
git_default_core_config. Perhaps not introduce the compatibility shim
until this patch, then?
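
For reference, try_shm() derives the shared-memory object name from
the last 20 bytes of the mapped index (the trailing SHA-1). The
following is a standalone sketch of that derivation, not the patch's
code: shm_name_for_index() and SHA1_RAWSZ are hypothetical names, and
it produces the "/git-index-<hex>" form used by share_index(), without
the ".lock" suffix seen in the hunk above.

```c
#include <stddef.h>
#include <string.h>

#define SHA1_RAWSZ 20 /* hypothetical name: size of a raw SHA-1 in bytes */

/*
 * Write "/git-index-<40 hex digits>" into out, where the digits encode
 * the last SHA1_RAWSZ bytes of the mapped index. Returns 0 on success,
 * -1 if the mapping is too small or the output buffer is too short.
 */
static int shm_name_for_index(const unsigned char *map, size_t size,
			      char *out, size_t outlen)
{
	static const char hex[] = "0123456789abcdef";
	static const char prefix[] = "/git-index-";
	const size_t prefix_len = sizeof(prefix) - 1;
	const unsigned char *sha1;
	size_t i;

	if (size <= SHA1_RAWSZ)
		return -1; /* same sanity check as try_shm() */
	if (outlen < prefix_len + 2 * SHA1_RAWSZ + 1)
		return -1;

	memcpy(out, prefix, prefix_len);
	sha1 = map + size - SHA1_RAWSZ;
	for (i = 0; i < SHA1_RAWSZ; i++) {
		out[prefix_len + 2 * i] = hex[sha1[i] >> 4];
		out[prefix_len + 2 * i + 1] = hex[sha1[i] & 0x0f];
	}
	out[prefix_len + 2 * SHA1_RAWSZ] = '\0';
	return 0;
}
```

The resulting name is what would be passed to shm_open(); a reader and
the daemon agree on it only if both compute it from the same trailer.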


* Re: [PATCH 8/8] read-cache: inform the daemon that the index has been updated
  2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
@ 2014-05-13 12:17       ` Erik Faye-Lund
  2014-05-22 16:38       ` David Turner
  1 sibling, 0 replies; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 12:17 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 1:15 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> The daemon would immediately load the new index in memory in
> background. Next time Git needs to read the index again, everything is
> ready.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  cache.h      |  1 +
>  read-cache.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index fb29c7e..3115b86 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -483,6 +483,7 @@ extern int is_index_unborn(struct index_state *);
>  extern int read_index_unmerged(struct index_state *);
>  #define COMMIT_LOCK            (1 << 0)
>  #define CLOSE_LOCK             (1 << 1)
> +#define REFRESH_DAEMON         (1 << 2)
>  extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
>  extern int discard_index(struct index_state *);
>  extern int unmerged_index(const struct index_state *);
> diff --git a/read-cache.c b/read-cache.c
> index e98521f..d5c9247 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -16,6 +16,9 @@
>  #include "varint.h"
>  #include "split-index.h"
>  #include "sigchain.h"
> +#include "unix-socket.h"
> +#include "pkt-line.h"
> +#include "run-command.h"
>
>  static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
>                                                unsigned int options);
> @@ -2030,6 +2033,32 @@ void set_alternate_index_output(const char *name)
>         alternate_index_output = name;
>  }
>
> +static void refresh_daemon(struct index_state *istate)
> +{
> +       int fd;
> +       fd = unix_stream_connect(git_path("daemon/index"));
> +       if (fd < 0) {
> +               struct child_process cp;
> +               const char *av[] = {"read-cache--daemon", "--detach", NULL };
> +               memset(&cp, 0, sizeof(cp));
> +               cp.argv = av;
> +               cp.git_cmd = 1;
> +               cp.no_stdin = 1;
> +               if (run_command(&cp))
> +                       warning(_("failed to start read-cache--daemon: %s"),
> +                               strerror(errno));
> +               return;
> +       }
> +       /*
> +        * packet_write() could die() but unless this is from
> +        * update_index_if_able(), we're about to exit anyway,
> +        * probably ok to die (for now). Blocking mode is another
> +        * problem to deal with later.
> +        */
> +       packet_write(fd, "refresh");
> +       close(fd);
> +}
> +

Seems the argument 'istate' isn't used.


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 11:52       ` Erik Faye-Lund
  2014-05-13 12:01         ` Duy Nguyen
@ 2014-05-13 13:01         ` Duy Nguyen
  2014-05-13 13:37           ` Erik Faye-Lund
  1 sibling, 1 reply; 76+ messages in thread
From: Duy Nguyen @ 2014-05-13 13:01 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: GIT Mailing-list

What do you think is a good replacement for unix socket on Windows?
It's only used to refresh the cache in the daemon, no sensitive data
sent over, so security is not a problem. I'm thinking maybe just
TCP/IP server, but that's going to be a system-wide daemon.. Perhaps
the windows daemon could just monitor $GIT_DIR/index and refresh it?
-- 
Duy


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 13:01         ` Duy Nguyen
@ 2014-05-13 13:37           ` Erik Faye-Lund
  2014-05-13 13:49             ` Duy Nguyen
  0 siblings, 1 reply; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 13:37 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 3:01 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> What do you think is a good replacement for unix socket on Windows?
> It's only used to refresh the cache in the daemon, no sensitive data
> sent over, so security is not a problem. I'm thinking maybe just
> TCP/IP server, but that's going to be a system-wide daemon.. Perhaps
> the windows daemon could just monitor $GIT_DIR/index and refresh it?

Windows has support for Named Pipes, which seems like the right kind
of communication channel. However, the programming model differs quite
a bit from unix-sockets:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365594%28v=vs.85%29.aspx


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 13:37           ` Erik Faye-Lund
@ 2014-05-13 13:49             ` Duy Nguyen
  2014-05-13 14:06               ` Erik Faye-Lund
  0 siblings, 1 reply; 76+ messages in thread
From: Duy Nguyen @ 2014-05-13 13:49 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 8:37 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Tue, May 13, 2014 at 3:01 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> What do you think is a good replacement for unix socket on Windows?
>> It's only used to refresh the cache in the daemon, no sensitive data
>> sent over, so security is not a problem. I'm thinking maybe just
>> TCP/IP server, but that's going to be a system-wide daemon.. Perhaps
>> the windows daemon could just monitor $GIT_DIR/index and refresh it?
>
> Windows has support for Named Pipes, which seems like the right kind
> of communication channel. However, the programming model differs quite
> a bit from unix-sockets:
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365594%28v=vs.85%29.aspx

Yeah that was my first option, but if code cannot be shared due to
differences then we probably should go another way. The old
FindWindow/PostMessage still works with modern Windows, right? Maybe
we could create a window with a name derived from the daemon's pid and
save the name in the index, then PostMessage can signal the daemon. On
the UNIX front, we store pid and send SIGUSR1 instead. The good thing
here is the Git side will be very simple (PostMessage vs kill).
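
A rough sketch of the UNIX half of that idea, assuming the daemon's
pid is already known to the Git side (how it is stored is left out).
All function names here are hypothetical; only sigaction() and kill()
are real POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; the daemon's main loop would poll and clear it. */
static volatile sig_atomic_t refresh_requested;

static void on_refresh_signal(int sig)
{
	(void)sig;
	refresh_requested = 1; /* only async-signal-safe work here */
}

/* Daemon side: arrange for SIGUSR1 to request an index reload. */
static int install_refresh_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_refresh_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

/* Git side: the whole notification is a single kill() call. */
static int notify_daemon(pid_t daemon_pid)
{
	return kill(daemon_pid, SIGUSR1);
}
```

The signal carries no payload, so the daemon would have to re-read
$GIT_DIR/index itself to find out what changed.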
-- 
Duy


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 13:49             ` Duy Nguyen
@ 2014-05-13 14:06               ` Erik Faye-Lund
  2014-05-13 14:10                 ` Duy Nguyen
  0 siblings, 1 reply; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 14:06 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 3:49 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, May 13, 2014 at 8:37 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Tue, May 13, 2014 at 3:01 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>> What do you think is a good replacement for unix socket on Windows?
>>> It's only used to refresh the cache in the daemon, no sensitive data
>>> sent over, so security is not a problem. I'm thinking maybe just
>>> TCP/IP server, but that's going to be a system-wide daemon.. Perhaps
>>> the windows daemon could just monitor $GIT_DIR/index and refresh it?
>>
>> Windows has support for Named Pipes, which seems like the right kind
>> of communication channel. However, the programming model differs quite
>> a bit from unix-sockets:
>>
>> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365594%28v=vs.85%29.aspx
>
> Yeah that was my first option, but if code cannot be shared due to
> differences then we probably should go another way. The old
> FindWindow/PostMessage still works with modern Windows, right? Maybe
> we could create a window with a name derived from the daemon's pid and
> save the name in the index, then PostMessage can signal the daemon. On
> the UNIX front, we store pid and send SIGUSR1 instead. The good thing
> here is the Git side will be very simple (PostMessage vs kill).

Hmmm.... I'm a bit worried about having to load in USER32.DLL just to
read the cache that way. But it seems we already do that, thanks to
compat/poll/poll.c (it depends on DispatchMessage,
MsgWaitForMultipleObjects, PeekMessage and TranslateMessage, all from
that DLL).

Preferably, we should delay-load USER32.DLL in compat/poll/poll.c, but
if we start needing it for reading the index, it'll be loaded by
the vast majority of processes anyway.


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 14:06               ` Erik Faye-Lund
@ 2014-05-13 14:10                 ` Duy Nguyen
  2014-05-13 14:16                   ` Erik Faye-Lund
  0 siblings, 1 reply; 76+ messages in thread
From: Duy Nguyen @ 2014-05-13 14:10 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 9:06 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Tue, May 13, 2014 at 3:49 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Tue, May 13, 2014 at 8:37 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>>> On Tue, May 13, 2014 at 3:01 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>>> What do you think is a good replacement for unix socket on Windows?
>>>> It's only used to refresh the cache in the daemon, no sensitive data
>>>> sent over, so security is not a problem. I'm thinking maybe just
>>>> TCP/IP server, but that's going to be a system-wide daemon.. Perhaps
>>>> the windows daemon could just monitor $GIT_DIR/index and refresh it?
>>>
>>> Windows has support for Named Pipes, which seems like the right kind
>>> of communication channel. However, the programming model differs quite
>>> a bit from unix-sockets:
>>>
>>> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365594%28v=vs.85%29.aspx
>>
>> Yeah that was my first option, but if code cannot be shared due to
>> differences then we probably should go another way. The old
>> FindWindow/PostMessage still works with modern Windows, right? Maybe
>> we could create a window with a name derived from the daemon's pid and
>> save the name in the index, then PostMessage can signal the daemon. On
>> the UNIX front, we store pid and send SIGUSR1 instead. The good thing
>> here is the Git side will be very simple (PostMessage vs kill).
>
> Hmmm.... I'm a bit worried about having to load in USER32.DLL just to
> read the cache that way. But it seems we already do that, thanks to
> compat/poll/poll.c (it depends on DispatchMessage,
> MsgWaitForMultipleObjects, PeekMessage and TranslateMessage, all from
> that DLL).
>
> Preferably, we should delay-load USER32.DLL in compat/poll/poll.c, but
> if we start needing it for reading the index, it'll be loaded by
> the vast majority of processes anyway.

Thanks for the info. I'll see if we can still stick to named pipes. If
we have to load user32.dll, hopefully the gain will outweigh load time
for that DLL.
-- 
Duy


* Re: [PATCH 2/3] Add read-cache--daemon
  2014-05-13 14:10                 ` Duy Nguyen
@ 2014-05-13 14:16                   ` Erik Faye-Lund
  0 siblings, 0 replies; 76+ messages in thread
From: Erik Faye-Lund @ 2014-05-13 14:16 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: GIT Mailing-list

On Tue, May 13, 2014 at 4:10 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, May 13, 2014 at 9:06 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Tue, May 13, 2014 at 3:49 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>> On Tue, May 13, 2014 at 8:37 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>>>> On Tue, May 13, 2014 at 3:01 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>>>> What do you think is a good replacement for unix socket on Windows?
>>>>> It's only used to refresh the cache in the daemon, no sensitive data
>>>>> sent over, so security is not a problem. I'm thinking maybe just
>>>>> TCP/IP server, but that's going to be a system-wide daemon.. Perhaps
>>>>> the windows daemon could just monitor $GIT_DIR/index and refresh it?
>>>>
>>>> Windows has support for Named Pipes, which seems like the right kind
>>>> of communication channel. However, the programming model differs quite
>>>> a bit from unix-sockets:
>>>>
>>>> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365594%28v=vs.85%29.aspx
>>>
>>> Yeah that was my first option, but if code cannot be shared due to
>>> differences then we probably should go another way. The old
>>> FindWindow/PostMessage still works with modern Windows, right? Maybe
>>> we could create a window with a name derived from the daemon's pid and
>>> save the name in the index, then PostMessage can signal the daemon. On
>>> the UNIX front, we store pid and send SIGUSR1 instead. The good thing
>>> here is the Git side will be very simple (PostMessage vs kill).
>>
>> Hmmm.... I'm a bit worried about having to load in USER32.DLL just to
>> read the cache that way. But it seems we already do that, thanks to
>> compat/poll/poll.c (it depends on DispatchMessage,
>> MsgWaitForMultipleObjects, PeekMessage and TranslateMessage, all from
>> that DLL).
>>
>> Preferably, we should delay-load USER32.DLL in compat/poll/poll.c, but
>> if we start needing it for reading the index, it'll be loaded by
>> the vast majority of processes anyway.
>
> Thanks for the info. I'll see if we can still stick to named pipes. If
> we have to load user32.dll, hopefully the gain will outweigh load time
> for that DLL.

I just timed it here on my system, and omitting USER32.DLL didn't gain
anything for "git --version", so I suspect I was worrying too soon.


* Re: [PATCH 0/8] Speed up cache loading time
  2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
                       ` (9 preceding siblings ...)
  2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
@ 2014-05-13 14:24     ` Stefan Beller
  2014-05-13 14:35       ` Duy Nguyen
  10 siblings, 1 reply; 76+ messages in thread
From: Stefan Beller @ 2014-05-13 14:24 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git Mailing List

> That is clocked at 800 MHz. A repository at this size deserves a
> better CPU. At 2.5 GHz we spend 183.228ms on loading the index. A
> reasonable number to me. If we scale other parts of git-status as well
> as this, we should be able to make "git status" within 1 or 2 seconds.
>

Which hard drive do you use? Traditional or SSD?
Does hard drive loading time have a significant impact here? (just a
guess/question)


* Re: [PATCH 0/8] Speed up cache loading time
  2014-05-13 14:24     ` [PATCH 0/8] Speed up cache loading time Stefan Beller
@ 2014-05-13 14:35       ` Duy Nguyen
  0 siblings, 0 replies; 76+ messages in thread
From: Duy Nguyen @ 2014-05-13 14:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

On Tue, May 13, 2014 at 9:24 PM, Stefan Beller <stefanbeller@gmail.com> wrote:
>> That is clocked at 800 MHz. A repository at this size deserves a
>> better CPU. At 2.5 GHz we spend 183.228ms on loading the index. A
>> reasonable number to me. If we scale other parts of git-status as well
>> as this, we should be able to make "git status" within 1 or 2 seconds.
>>
>
> Which hard drive do you use? Traditional or SSD?

Traditional

> Does hard drive loading time have a significant impact here? (just a
> guess/question)

In the hot cache case, I assume the index stays in OS cache anyway so
hard drive should not impact much (the other parts of git-status like
index refresh or untracked file listing are a different story and some
may fall out of cache). My laptop has 4 GB of RAM; with my repeated tests,
I guess the index (even 200 MB) stayed in the cache (but I did not really
verify it).
-- 
Duy


* Re: [PATCH 8/8] read-cache: inform the daemon that the index has been updated
  2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
  2014-05-13 12:17       ` Erik Faye-Lund
@ 2014-05-22 16:38       ` David Turner
  1 sibling, 0 replies; 76+ messages in thread
From: David Turner @ 2014-05-22 16:38 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Tue, 2014-05-13 at 18:15 +0700, Nguyễn Thái Ngọc Duy wrote:
> +		if (run_command(&cp))
> +			warning(_("failed to start read-cache--daemon: %s"),
> +				strerror(errno));

errno is not always (ever?) set, so if read-cache--daemon is missing,
you get:
warning: failed to start read-cache--daemon: Success
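
One possible way to avoid the misleading "Success" suffix, sketched
under the assumption that the caller captures errno right after the
failing call and passes it in. format_start_failure() is a
hypothetical helper, not something from the patch; it only appends
strerror() text when an error code is actually available.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the warning text. saved_errno is whatever the caller captured
 * after the failed call; 0 means no error code was reported.
 */
static void format_start_failure(char *buf, size_t len, int saved_errno)
{
	if (saved_errno)
		snprintf(buf, len, "failed to start read-cache--daemon: %s",
			 strerror(saved_errno));
	else
		snprintf(buf, len, "failed to start read-cache--daemon");
}
```

Whether run_command() reports a usable errno for a given failure is a
separate question; the helper just refuses to print a code of 0.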


* [PATCH 18/32] read-cache: mark new entries for split index
  2014-06-13 12:19 [PATCH 00/32] Split index resend Nguyễn Thái Ngọc Duy
@ 2014-06-13 12:19 ` Nguyễn Thái Ngọc Duy
  0 siblings, 0 replies; 76+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-06-13 12:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Make sure entry addition does not lead to unifying the index. We don't
need to explicitly keep track of new entries. If ce->index is zero,
they're new. Otherwise it's unlikely that they are new, but we'll do a
thorough check later at writing time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 90a3f09..52a27b3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -38,7 +38,8 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
-#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED)
+#define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
+		 CE_ENTRY_ADDED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
-- 
1.9.1.346.ga2b5940

2014-05-13 12:13       ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 6/8] read-cache--daemon: do not read index " Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 7/8] read-cache: skip verifying trailing SHA-1 on cached index Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
2014-05-13 12:17       ` Erik Faye-Lund
2014-05-22 16:38       ` David Turner
2014-05-13 14:24     ` [PATCH 0/8] Speed up cache loading time Stefan Beller
2014-05-13 14:35       ` Duy Nguyen
2014-05-13 11:20   ` [PATCH 9/8] even faster loading time with index version 254 Nguyễn Thái Ngọc Duy
2014-04-28 22:23 ` [PATCH 00/32] Split index mode for very large indexes Junio C Hamano
2014-04-30 20:48 ` Richard Hansen
2014-05-01  0:09   ` Duy Nguyen
2014-06-13 12:19 [PATCH 00/32] Split index resend Nguyễn Thái Ngọc Duy
2014-06-13 12:19 ` [PATCH 18/32] read-cache: mark new entries for split index Nguyễn Thái Ngọc Duy
