From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, newren@gmail.com,
Derrick Stolee <dstolee@microsoft.com>,
Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/1] sparse-checkout: respect core.ignoreCase in cone mode
Date: Fri, 13 Dec 2019 18:09:52 +0000 [thread overview]
Message-ID: <pull.488.v2.git.1576260593.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.488.git.1575920580.gitgitgadget@gmail.com>
This is the first of several cleanups to the sparse-checkout feature that
had a large overhaul in ds/sparse-cone.
We have an internal customer that plans to use sparse-checkout as a core
feature of their version control experience. They use a case-insensitive
filesystem, so as they created their directory dependency graph they did not
keep consistent casing. This data is what they plan to feed to "git
sparse-checkout set".
While I would certainly prefer that the data is cleaned up (and that process
is ongoing), I could not argue with the logic that git add does the "right
thing" when core.ignoreCase is enabled.
Update in V2:
Based on excellent feedback from Junio, we decided to shore up the case
issues from the previous implementation. Now, the sparse-checkout file may
be stored with the "wrong" case but the hash algorithm is replaced with a
case-insensitive check when core.ignoreCase is set. Because the performance
impact is much lower than I anticipated, I no longer think it is important
to add a new config option to enable/disable this logic.
Thanks, -Stolee
Derrick Stolee (1):
sparse-checkout: respect core.ignoreCase in cone mode
Documentation/git-sparse-checkout.txt | 5 +++++
builtin/sparse-checkout.c | 10 ++++++++--
dir.c | 15 ++++++++++++---
t/t1091-sparse-checkout-builtin.sh | 17 +++++++++++++++++
4 files changed, 42 insertions(+), 5 deletions(-)
base-commit: cff4e9138d8df45e3b6199171092ee781cdadaeb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-488%2Fderrickstolee%2Fsparse-case-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-488/derrickstolee/sparse-case-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/488
Range-diff vs v1:
1: 23705845ce ! 1: dc97d5a0d2 sparse-checkout: respect core.ignoreCase in cone mode
@@ -12,23 +12,42 @@
using a case-insensitive match. Do the same for the sparse-checkout
feature.
- The sanitize_cone_input() method is named to imply that other checks
- may be added. In fact, such checks are planned including looking for
- wildcards that make the paths invalid cone patterns or must be
- escaped.
-
- Specifically, if the path has a match in the index, then use that
- path instead. If there is no match, still add that path to the
- patterns, as the user may expect the directory to appear after a
- checkout to another ref. However, we have no matching path to
- correct for a case conflict, and must assume that the user provided
- the correct case.
-
- Another option would be to do case-insensitive checks while
- updating the skip-worktree bits during unpack_trees(). Outside of
- the potential performance loss on a more expensive code path, that
- also breaks compatibility with older versions of Git as using the
- same sparse-checkout file would change the paths that are included.
+ Perform case-insensitive checks while updating the skip-worktree
+ bits during unpack_trees(). This is done by changing the hash
+ algorithm and hashmap comparison methods to optionally use case-
+ insensitive methods.
+
+ When this is enabled, there is a small performance cost in the
+ hashing algorithm. To tease out the worst possible case, the
+ following was run on a repo with a deep directory structure:
+
+ git ls-tree -d -r --name-only HEAD |
+ git sparse-checkout set --stdin
+
+ The 'set' command was timed with core.ignoreCase disabled or
+ enabled. For the repo with a deep history, the numbers were
+
+ core.ignoreCase=false: 62s
+ core.ignoreCase=true: 74s (+19.3%)
+
+ For reproducibility, the equivalent test on the Linux kernel
+ repository had these numbers:
+
+ core.ignoreCase=false: 3.1s
+ core.ignoreCase=true: 3.6s (+16%)
+
+ Now, this is not an entirely fair comparison, as most users
+ will define their sparse cone using more shallow directories,
+ and the performance improvement from eb42feca97 ("unpack-trees:
+ hash less in cone mode" 2019-11-21) can remove most of the
+ hash cost. For a more realistic test, drop the "-r" from the
+ ls-tree command to store only the first-level directories.
+ In that case, the Linux kernel repository takes 0.2-0.25s in
+ each case, and the deep repository takes one second, plus or
+ minus 0.05s, in each case.
+
+ Thus, we _can_ demonstrate a cost to this change, but it is
+ unlikely to matter to any reasonable sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@@ -39,9 +58,10 @@
If the patterns do match the expected format, then Git will use faster hash-
based algorithms to compute inclusion in the sparse-checkout.
-+If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
-+correct for incorrect case when assigning patterns to the sparse-checkout
-+file.
++If `core.ignoreCase=true`, then the pattern-matching algorithm will use a
++case-insensitive check. This corrects for case mismatched filenames in the
++'git sparse-checkout set' command to reflect the expected cone in the working
++directory.
+
SEE ALSO
--------
@@ -51,71 +71,76 @@
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@
- }
- }
+ struct pattern_entry *e = xmalloc(sizeof(*e));
+ e->patternlen = path->len;
+ e->pattern = strbuf_detach(path, NULL);
+- hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
++ hashmap_entry_init(&e->ent,
++ ignore_case ?
++ strihash(e->pattern) :
++ strhash(e->pattern));
+
+ hashmap_add(&pl->recursive_hashmap, &e->ent);
-+static void sanitize_cone_input(struct strbuf *line)
-+{
-+ if (ignore_case) {
-+ struct index_state *istate = the_repository->index;
-+ const char *name = index_dir_matching_name(istate, line->buf, line->len);
-+
-+ if (name) {
-+ strbuf_setlen(line, 0);
-+ strbuf_addstr(line, name);
-+ }
-+ }
-+
-+ if (line->buf[0] != '/')
-+ strbuf_insert(line, 0, "/", 1);
-+}
-+
- static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
- {
- strbuf_trim(line);
@@
- if (!line->len)
- return;
+ e = xmalloc(sizeof(struct pattern_entry));
+ e->patternlen = newlen;
+ e->pattern = xstrndup(oldpattern, newlen);
+- hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
++ hashmap_entry_init(&e->ent,
++ ignore_case ?
++ strihash(e->pattern) :
++ strhash(e->pattern));
-- if (line->buf[0] != '/')
-- strbuf_insert(line, 0, "/", 1);
-+ sanitize_cone_input(line);
+ if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
+ hashmap_add(&pl->parent_hashmap, &e->ent);
+
+ diff --git a/dir.c b/dir.c
+ --- a/dir.c
+ +++ b/dir.c
+@@
+ ? ee1->patternlen
+ : ee2->patternlen;
- insert_recursive_pattern(pl, line);
++ if (ignore_case)
++ return strncasecmp(ee1->pattern, ee2->pattern, min_len);
+ return strncmp(ee1->pattern, ee2->pattern, min_len);
}
-
- diff --git a/cache.h b/cache.h
- --- a/cache.h
- +++ b/cache.h
+
@@
- int verify_path(const char *path, unsigned mode);
- int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
- int index_dir_exists(struct index_state *istate, const char *name, int namelen);
-+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
- void adjust_dirname_case(struct index_state *istate, char *name);
- struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
+ translated->pattern = truncated;
+ translated->patternlen = given->patternlen - 2;
+ hashmap_entry_init(&translated->ent,
+- memhash(translated->pattern, translated->patternlen));
++ ignore_case ?
++ strihash(translated->pattern) :
++ strhash(translated->pattern));
-
- diff --git a/name-hash.c b/name-hash.c
- --- a/name-hash.c
- +++ b/name-hash.c
+ if (!hashmap_get_entry(&pl->recursive_hashmap,
+ translated, ent, NULL)) {
@@
- return dir && dir->nr;
+ translated->pattern = xstrdup(given->pattern);
+ translated->patternlen = given->patternlen;
+ hashmap_entry_init(&translated->ent,
+- memhash(translated->pattern, translated->patternlen));
++ ignore_case ?
++ strihash(translated->pattern) :
++ strhash(translated->pattern));
+
+ hashmap_add(&pl->recursive_hashmap, &translated->ent);
+
+@@
+ /* Check straight mapping */
+ p.pattern = pattern->buf;
+ p.patternlen = pattern->len;
+- hashmap_entry_init(&p.ent, memhash(p.pattern, p.patternlen));
++ hashmap_entry_init(&p.ent,
++ ignore_case ?
++ strihash(p.pattern) :
++ strhash(p.pattern));
+ return !!hashmap_get_entry(map, &p, ent, NULL);
}
-+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
-+{
-+ struct dir_entry *dir;
-+
-+ lazy_init_name_hash(istate);
-+ dir = find_dir_entry(istate, name, namelen);
-+
-+ return dir ? dir->name : NULL;
-+}
-+
- void adjust_dirname_case(struct index_state *istate, char *name)
- {
- const char *startPtr = name;
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
--- a/t/t1091-sparse-checkout-builtin.sh
@@ -125,16 +150,20 @@
'
+test_expect_success 'cone mode: set with core.ignoreCase=true' '
-+ test_when_finished git -C repo config --unset core.ignoreCase &&
+ git -C repo sparse-checkout init --cone &&
-+ git -C repo config core.ignoreCase true &&
-+ git -C repo sparse-checkout set Folder1 &&
++ git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
+ cat >expect <<-EOF &&
+ /*
+ !/*/
+ /folder1/
+ EOF
-+ test_cmp expect repo/.git/info/sparse-checkout
++ test_cmp expect repo/.git/info/sparse-checkout &&
++ ls repo >dir &&
++ cat >expect <<-EOF &&
++ a
++ folder1
++ EOF
++ test_cmp expect dir
+'
+
test_done
--
gitgitgadget
next prev parent reply other threads:[~2019-12-13 20:40 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2019-12-11 18:44 ` Junio C Hamano
2019-12-11 19:11 ` Derrick Stolee
2019-12-11 20:00 ` Junio C Hamano
2019-12-11 20:29 ` Derrick Stolee
2019-12-11 21:37 ` Junio C Hamano
2019-12-12 2:51 ` Derrick Stolee
2019-12-12 20:59 ` Derrick Stolee
2019-12-13 19:05 ` Junio C Hamano
2019-12-13 19:40 ` Derrick Stolee
2019-12-13 22:02 ` Junio C Hamano
2019-12-13 18:09 ` Derrick Stolee via GitGitGadget [this message]
2019-12-13 18:09 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2019-12-13 19:58 ` [PATCH v2 0/1] " Junio C Hamano
2019-12-18 11:41 ` [PATCH " Ed Maste
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.488.v2.git.1576260593.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).