From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, newren@gmail.com, peff@peff.net,
jrnieder@gmail.com, sunshine@sunshineco.com, pclouds@gmail.com,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 27/27] cache-tree: integrate with sparse directory entries
Date: Mon, 25 Jan 2021 17:42:13 +0000 [thread overview]
Message-ID: <05e7548b780da6b2bf2342d91d8757568df0a6b8.1611596534.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.847.git.1611596533.gitgitgadget@gmail.com>
From: Derrick Stolee <dstolee@microsoft.com>
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.
When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.
Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:
1. Use the full index. The index size is ~186 MB.
2. Use the sparse index. The index size is ~5.5 MB.
3. Use a commit where HEAD matches the populated set. The full index
size is ~5.3MB.
The third benchmark is included as a theoretical optimium for a
repository of the same object database.
First, a clean 'git status' improves from 3.1s to 240ms.
Benchmark #1: full index (git status)
Time (mean ± σ): 3.167 s ± 0.036 s [User: 2.006 s, System: 1.078 s]
Range (min … max): 3.100 s … 3.208 s 10 runs
Benchmark #2: sparse index (git status)
Time (mean ± σ): 239.5 ms ± 8.1 ms [User: 189.4 ms, System: 226.8 ms]
Range (min … max): 226.0 ms … 251.9 ms 13 runs
Benchmark #3: small tree (git status)
Time (mean ± σ): 195.3 ms ± 4.5 ms [User: 116.5 ms, System: 84.4 ms]
Range (min … max): 188.8 ms … 202.8 ms 15 runs
The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.
The performance numbers for 'git add .' are much closer to optimal:
Benchmark #1: full index (git add .)
Time (mean ± σ): 3.076 s ± 0.022 s [User: 2.065 s, System: 0.943 s]
Range (min … max): 3.044 s … 3.116 s 10 runs
Benchmark #2: sparse index (git add .)
Time (mean ± σ): 218.0 ms ± 6.6 ms [User: 195.7 ms, System: 206.6 ms]
Range (min … max): 209.8 ms … 228.2 ms 13 runs
Benchmark #3: small tree (git add .)
Time (mean ± σ): 217.6 ms ± 5.4 ms [User: 131.9 ms, System: 86.7 ms]
Range (min … max): 212.1 ms … 228.4 ms 14 runs
In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
cache-tree.c | 18 ++++++++++++++++++
sparse-index.c | 10 +++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/cache-tree.c b/cache-tree.c
index 5f07a39e501..9da6a4394e0 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it,
*skip_count = 0;
+ /*
+ * If the first entry of this region is a sparse directory
+ * entry corresponding exactly to 'base', then this cache_tree
+ * struct is a "leaf" in the data structure, pointing to the
+ * tree OID specified in the entry.
+ */
+ if (entries > 0) {
+ const struct cache_entry *ce = cache[0];
+
+ if (S_ISSPARSEDIR(ce) &&
+ ce->ce_namelen == baselen &&
+ !strncmp(ce->name, base, baselen)) {
+ it->entry_count = 1;
+ oidcpy(&it->oid, &ce->oid);
+ return 1;
+ }
+ }
+
if (0 <= it->entry_count && has_object_file(&it->oid))
return it->entry_count;
diff --git a/sparse-index.c b/sparse-index.c
index a201f3b905c..9ea3b321400 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -181,7 +181,11 @@ int convert_to_sparse(struct index_state *istate)
istate->cache_nr = convert_to_sparse_rec(istate,
0, 0, istate->cache_nr,
"", 0, istate->cache_tree);
- istate->drop_cache_tree = 1;
+
+ /* Clear and recompute the cache-tree */
+ cache_tree_free(&istate->cache_tree);
+ cache_tree_update(istate, 0);
+
istate->sparse_index = 1;
trace2_region_leave("index", "convert_to_sparse", istate->repo);
return 0;
@@ -278,6 +282,10 @@ void ensure_full_index(struct index_state *istate)
free(full);
+ /* Clear and recompute the cache-tree */
+ cache_tree_free(&istate->cache_tree);
+ cache_tree_update(istate, 0);
+
trace2_region_leave("index", "ensure_full_index", istate->repo);
}
--
gitgitgadget
next prev parent reply other threads:[~2021-01-25 17:59 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-25 17:41 [PATCH 00/27] [RFC] Sparse Index Derrick Stolee via GitGitGadget
2021-01-25 17:41 ` [PATCH 01/27] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget
2021-01-25 17:41 ` [PATCH 02/27] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget
2021-01-27 3:05 ` Elijah Newren
2021-01-27 13:43 ` Derrick Stolee
2021-01-27 16:38 ` Elijah Newren
2021-01-28 5:25 ` Junio C Hamano
2021-01-25 17:41 ` [PATCH 03/27] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget
2021-01-27 3:08 ` Elijah Newren
2021-01-27 13:30 ` Derrick Stolee
2021-01-27 16:54 ` Elijah Newren
2021-01-25 17:41 ` [PATCH 04/27] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget
2021-01-27 3:25 ` Elijah Newren
2021-01-25 17:41 ` [PATCH 05/27] test-tool: read-cache --table --no-stat Derrick Stolee via GitGitGadget
2021-01-25 17:41 ` [PATCH 06/27] test-tool: don't force full index Derrick Stolee via GitGitGadget
2021-01-25 17:41 ` [PATCH 07/27] unpack-trees: ensure " Derrick Stolee via GitGitGadget
2021-01-27 4:43 ` Elijah Newren
2021-01-25 17:41 ` [PATCH 08/27] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget
2021-01-27 17:00 ` Elijah Newren
2021-01-28 13:12 ` Derrick Stolee
2021-01-25 17:41 ` [PATCH 09/27] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget
2021-01-27 17:30 ` Elijah Newren
2021-01-25 17:41 ` [PATCH 10/27] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget
2021-01-25 17:41 ` [PATCH 11/27] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget
2021-01-27 17:36 ` Elijah Newren
2021-01-25 17:41 ` [PATCH 12/27] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget
2021-01-27 17:46 ` Elijah Newren
2021-01-25 17:41 ` [PATCH 13/27] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget
2021-01-27 18:03 ` Elijah Newren
2021-01-25 17:42 ` [PATCH 14/27] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget
2021-01-27 18:18 ` Elijah Newren
2021-01-28 15:26 ` Derrick Stolee
2021-01-25 17:42 ` [PATCH 15/27] [RFC-VERSION] *: ensure full index Derrick Stolee via GitGitGadget
2021-02-01 20:22 ` Elijah Newren
2021-02-01 21:10 ` Derrick Stolee
2021-01-25 17:42 ` [PATCH 16/27] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
2021-02-01 20:50 ` Elijah Newren
2021-02-09 17:23 ` Derrick Stolee
2021-01-25 17:42 ` [PATCH 17/27] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-02-01 22:12 ` Elijah Newren
2021-01-25 17:42 ` [PATCH 18/27] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-01-25 17:42 ` [PATCH 19/27] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-01-25 17:42 ` [PATCH 20/27] sparse-index: expand_to_path() trivial implementation Derrick Stolee via GitGitGadget
2021-01-25 17:42 ` [PATCH 21/27] sparse-index: expand_to_path no-op if path exists Derrick Stolee via GitGitGadget
2021-02-01 22:34 ` Elijah Newren
2021-01-25 17:42 ` [PATCH 22/27] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-02-01 23:08 ` Elijah Newren
2021-01-25 17:42 ` [PATCH 23/27] submodule: die_path_inside_submodule is sparse aware Derrick Stolee via GitGitGadget
2021-01-25 17:42 ` [PATCH 24/27] dir: use expand_to_path in add_patterns() Derrick Stolee via GitGitGadget
2021-02-01 23:21 ` Elijah Newren
2021-01-25 17:42 ` [PATCH 25/27] fsmonitor: disable if index is sparse Derrick Stolee via GitGitGadget
2021-01-25 17:42 ` [PATCH 26/27] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-02-01 23:24 ` Elijah Newren
2021-02-02 2:39 ` Derrick Stolee
2021-01-25 17:42 ` Derrick Stolee via GitGitGadget [this message]
2021-02-01 23:54 ` [PATCH 27/27] cache-tree: integrate with sparse directory entries Elijah Newren
2021-02-02 2:41 ` Derrick Stolee
2021-02-02 3:05 ` Elijah Newren
2021-01-25 20:10 ` [PATCH 00/27] [RFC] Sparse Index Junio C Hamano
2021-01-25 21:18 ` Derrick Stolee
2021-02-02 3:11 ` Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=05e7548b780da6b2bf2342d91d8757568df0a6b8.1611596534.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=derrickstolee@github.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jrnieder@gmail.com \
--cc=newren@gmail.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).