git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Richard Hansen <rhansen@bbn.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 00/32] Split index mode for very large indexes
Date: Thu, 1 May 2014 07:09:05 +0700	[thread overview]
Message-ID: <20140501000905.GA2107@lanh> (raw)
In-Reply-To: <53616185.9000208@bbn.com>

On Wed, Apr 30, 2014 at 04:48:05PM -0400, Richard Hansen wrote:
> I played around with these changes a bit and have some questions:
> 
>   * These changes should only affect performance when the index is
>     updated, right?  In other words, if I do "git status; git status"
>     the second "git status" shouldn't update the index and therefore
>     shouldn't have a noticeable performance improvement relative to Git
>     without these patches.  Right?

Yes, provided that other factors in "git status" give stable numbers
too.

>   * Do you have any before/after benchmark results you can share?

I did not pay much attention to benchmarking because the index size is
pretty much the deciding factor to write performance (of course too
much computation could add up to that..). But here it is with the 14MB
webkit index, about 182k files. The lines of interest are the first
one (read_cache) and forth (update_index_if_able).

With normal index

pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   278.382ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   489.275ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     6.191ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.321ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    12.733ms wt_status_collect:621 wt_status_collect_changes_index(s)
    98.043ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   915.331ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.727s
user    0m0.809s
sys     0m0.915s
pclouds@lanh ~/d/webkit $ touch wscript
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   276.307ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   504.034ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
   356.122ms cmd_status:1299 update_index_if_able(&the_index, &inde
    11.870ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    10.077ms wt_status_collect:621 wt_status_collect_changes_index(s)
    96.205ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   899.425ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m2.071s
user    0m1.115s
sys     0m0.953s
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   279.424ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   484.303ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     5.288ms cmd_status:1299 update_index_if_able(&the_index, &inde
    13.927ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    12.507ms wt_status_collect:621 wt_status_collect_changes_index(s)
    98.220ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   920.985ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.731s
user    0m0.830s
sys     0m0.897s

And with split index:

pclouds@lanh ~/d/webkit $ time ~/w/git/git update-index --split-index

real    0m0.660s
user    0m0.601s
sys     0m0.058s
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   281.211ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.003ms cmd_status:1294 read_cache_preload(&s.pathspec);
   479.629ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     5.489ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.611ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    11.235ms wt_status_collect:621 wt_status_collect_changes_index(s)
    96.086ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   894.489ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.697s
user    0m0.813s
sys     0m0.881s
pclouds@lanh ~/d/webkit $ touch wscript
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   291.411ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.003ms cmd_status:1294 read_cache_preload(&s.pathspec);
   475.144ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
    24.348ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.440ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    10.400ms wt_status_collect:621 wt_status_collect_changes_index(s)
    97.147ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   907.240ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.734s
user    0m0.842s
sys     0m0.888s
pclouds@lanh ~/d/webkit $ time ~/w/git/git status >/dev/null
   281.119ms gitmodules_config:199 if (read_cache() < 0) die("index file
     0.004ms cmd_status:1294 read_cache_preload(&s.pathspec);
   479.702ms cmd_status:1295 refresh_index(&the_index, REFRESH_QUIE
     5.061ms cmd_status:1299 update_index_if_able(&the_index, &inde
    12.220ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
    11.408ms wt_status_collect:621 wt_status_collect_changes_index(s)
    95.374ms lazy_init_name_hash:136 { int nr; if (istate->name_hash_initia
   896.931ms wt_status_collect:622 wt_status_collect_untracked(s)

real    0m1.700s
user    0m0.809s
sys     0m0.888s

>   * Are there any benchmark scripts I can use to test it out in my own
>     repositories?

You could use the patch I used to generate the timing above. Patch at
the bottom of this mail.

>   * Is there a debug utility I can use to examine the contents of the
>     index and sharedindex.* files in a more human-readable way?

test-dump-split-index <path-to-$GIT_DIR/index>

will show you the content of "index". Entries without names are
replaced entries. Entries with them are added. They are followed by
replace/delete bitmaps printed out. So pretty much everything stored
in $GIT_DIR/index in human-readable format.

git rev-parse --shared-index-path will give you the path to
sharedindex. ls-files --stage can be used to show that.

> I'm asking because in my (very basic) tests I noticed that with the
> following command:
> 
>     git status; time git status
> 
> the second "git status" had an unexpected ~20% performance improvement
> in my repo relative to a build without your patches.  The second "git
> status" in the following command also had about a ~20% performance
> improvement:
> 
>     git status; touch file-in-index; time git status
> 
> So it seems like the patches did improve performance somewhat, but in
> ways I wasn't expecting.  (I'm not entirely certain my benchmark method
> is sound.)

git-status is a complex operation. If you want to focus on this
writing part only, "git update-index" may be better.

And the timing patch:

-- 8< --
diff --git a/builtin/commit.c b/builtin/commit.c
index 243b0c3..d680a44 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1291,12 +1291,12 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		       PATHSPEC_PREFER_FULL,
 		       prefix, argv);
 
-	read_cache_preload(&s.pathspec);
-	refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL);
+	TIME(read_cache_preload(&s.pathspec););
+	TIME(refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL););
 
 	fd = hold_locked_index(&index_lock, 0);
 	if (0 <= fd)
-		update_index_if_able(&the_index, &index_lock);
+		TIME(update_index_if_able(&the_index, &index_lock));
 
 	s.is_initial = get_sha1(s.reference, sha1) ? 1 : 0;
 	s.ignore_submodule_arg = ignore_submodule_arg;
diff --git a/cache.h b/cache.h
index 6ad2595..99854c7 100644
--- a/cache.h
+++ b/cache.h
@@ -1480,4 +1480,21 @@ void stat_validity_update(struct stat_validity *sv, int fd);
 
 int versioncmp(const char *s1, const char *s2);
 
+#define TIME(x) {						\
+	extern int time_level__;				\
+	struct timeval	tv1, tv2;				\
+	int current_level = time_level__;		\
+	gettimeofday(&tv1, NULL);				\
+	time_level__++;						\
+	x;							\
+	time_level__--;						\
+	gettimeofday(&tv2, NULL);				\
+	fprintf(stderr, "% 10.3fms%*s%s:%d %.*s\n",		\
+		tv2.tv_sec * 1000.0 + tv2.tv_usec / 1000.0 -	\
+		tv1.tv_sec * 1000.0 - tv1.tv_usec / 1000.0,	\
+		current_level, " ",				\
+		__FUNCTION__, __LINE__, 38, #x			\
+		);						\
+	}
+
 #endif /* CACHE_H */
diff --git a/environment.c b/environment.c
index 5c4815d..5dcbc6b 100644
--- a/environment.c
+++ b/environment.c
@@ -63,6 +63,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 struct startup_info *startup_info;
 unsigned long pack_size_limit_cfg;
+int time_level__;
 
 /*
  * The character that begins a commented line in user-editable file
diff --git a/name-hash.c b/name-hash.c
index 97444d0..b9121d1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -122,7 +122,7 @@ static int cache_entry_cmp(const struct cache_entry *ce1,
 }
 
 static void lazy_init_name_hash(struct index_state *istate)
-{
+	TIME({
 	int nr;
 
 	if (istate->name_hash_initialized)
@@ -133,7 +133,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
-}
+		})
 
 void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 {
diff --git a/submodule.c b/submodule.c
index b80ecac..1947f20 100644
--- a/submodule.c
+++ b/submodule.c
@@ -195,8 +195,8 @@ void gitmodules_config(void)
 		int pos;
 		strbuf_addstr(&gitmodules_path, work_tree);
 		strbuf_addstr(&gitmodules_path, "/.gitmodules");
-		if (read_cache() < 0)
-			die("index file corrupt");
+		TIME(if (read_cache() < 0)
+			die("index file corrupt"););
 		pos = cache_name_pos(".gitmodules", 11);
 		if (pos < 0) { /* .gitmodules not found or isn't merged */
 			pos = -1 - pos;
diff --git a/wt-status.c b/wt-status.c
index ec7344e..0141856 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -613,13 +613,13 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 void wt_status_collect(struct wt_status *s)
 {
-	wt_status_collect_changes_worktree(s);
+	TIME(wt_status_collect_changes_worktree(s));
 
 	if (s->is_initial)
 		wt_status_collect_changes_initial(s);
 	else
-		wt_status_collect_changes_index(s);
-	wt_status_collect_untracked(s);
+		TIME(wt_status_collect_changes_index(s));
+	TIME(wt_status_collect_untracked(s));
 }
 
 static void wt_status_print_unmerged(struct wt_status *s)
-- 8< --

      reply	other threads:[~2014-05-01  0:07 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 01/32] ewah: fix constness of ewah_read_mmap Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 02/32] ewah: delete unused ewah_read_mmap_native declaration Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 03/32] sequencer: do not update/refresh index if the lock cannot be held Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 04/32] read-cache: new API write_locked_index instead of write_index/write_cache Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 05/32] read-cache: relocate and unexport commit_locked_index() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 06/32] read-cache: store in-memory flags in the first 12 bits of ce_flags Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 07/32] read-cache: be strict about "changed" in remove_marked_cache_entries() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 08/32] read-cache: be specific what part of the index has changed Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 09/32] update-index: " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 10/32] resolve-undo: " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 11/32] unpack-trees: " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 12/32] cache-tree: mark istate->cache_changed on cache tree invalidation Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 13/32] cache-tree: mark istate->cache_changed on cache tree update Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 14/32] cache-tree: mark istate->cache_changed on prime_cache_tree() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 15/32] entry.c: update cache_changed if refresh_cache is set in checkout_entry() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 16/32] read-cache: save index SHA-1 after reading Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 17/32] read-cache: split-index mode Nguyễn Thái Ngọc Duy
2014-04-28 22:46   ` Junio C Hamano
2014-04-29  1:43     ` Duy Nguyen
2014-04-29 17:23       ` Junio C Hamano
2014-04-29 22:45         ` Duy Nguyen
2014-04-30 13:57           ` Junio C Hamano
2014-04-28 10:55 ` [PATCH 18/32] read-cache: mark new entries for split index Nguyễn Thái Ngọc Duy
2014-04-30 20:35   ` Eric Sunshine
2014-04-28 10:55 ` [PATCH 19/32] read-cache: save deleted entries in " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 20/32] read-cache: mark updated entries for " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 21/32] split-index: the writing part Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 22/32] split-index: the reading part Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 23/32] split-index: do not invalidate cache-tree at read time Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 24/32] split-index: strip pathname of on-disk replaced entries Nguyễn Thái Ngọc Duy
2014-04-29 20:25   ` Junio C Hamano
2014-04-28 10:55 ` [PATCH 25/32] update-index: new options to enable/disable split index mode Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 26/32] update-index --split-index: do not split if $GIT_DIR is read only Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 27/32] rev-parse: add --shared-index-path to get shared index path Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 28/32] read-tree: force split-index mode off on --index-output Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 29/32] read-tree: note about dropping split-index mode or index version Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 30/32] read-cache: force split index mode with GIT_TEST_SPLIT_INDEX Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 31/32] t2104: make sure split index mode is off for the version test Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 32/32] t1700: new tests for split-index mode Nguyễn Thái Ngọc Duy
2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
2014-04-29  1:52   ` Duy Nguyen
2014-05-09 10:27   ` Duy Nguyen
2014-05-09 17:55     ` Junio C Hamano
2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 1/8] read-cache: allow to keep mmap'd memory after reading Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 2/3] Add read-cache--daemon Nguyễn Thái Ngọc Duy
2014-05-13 11:52       ` Erik Faye-Lund
2014-05-13 12:01         ` Duy Nguyen
2014-05-13 13:01         ` Duy Nguyen
2014-05-13 13:37           ` Erik Faye-Lund
2014-05-13 13:49             ` Duy Nguyen
2014-05-13 14:06               ` Erik Faye-Lund
2014-05-13 14:10                 ` Duy Nguyen
2014-05-13 14:16                   ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support Nguyễn Thái Ngọc Duy
2014-05-13 11:59       ` Erik Faye-Lund
2014-05-13 12:03         ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 3/8] daemonize: set a flag before exiting the main process Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 3/3] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 4/8] Add read-cache--daemon for caching index and related stuff Nguyễn Thái Ngọc Duy
2014-05-13 11:56       ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 5/8] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
2014-05-13 12:13       ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 6/8] read-cache--daemon: do not read index " Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 7/8] read-cache: skip verifying trailing SHA-1 on cached index Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
2014-05-13 12:17       ` Erik Faye-Lund
2014-05-22 16:38       ` David Turner
2014-05-13 14:24     ` [PATCH 0/8] Speed up cache loading time Stefan Beller
2014-05-13 14:35       ` Duy Nguyen
2014-05-13 11:20   ` [PATCH 9/8] even faster loading time with index version 254 Nguyễn Thái Ngọc Duy
2014-04-28 22:23 ` [PATCH 00/32] Split index mode for very large indexes Junio C Hamano
2014-04-30 20:48 ` Richard Hansen
2014-05-01  0:09   ` Duy Nguyen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140501000905.GA2107@lanh \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=rhansen@bbn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).