From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, newren@gmail.com,
Derrick Stolee <dstolee@microsoft.com>,
Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v2 1/1] sparse-checkout: respect core.ignoreCase in cone mode
Date: Fri, 13 Dec 2019 18:09:53 +0000 [thread overview]
Message-ID: <dc97d5a0d2a39d3888b75e1154770817e16c3112.1576260593.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.488.v2.git.1576260593.gitgitgadget@gmail.com>
From: Derrick Stolee <dstolee@microsoft.com>
When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."
If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.
Perform case-insensitive checks while updating the skip-worktree
bits during unpack_trees(). This is done by changing the hash
algorithm and hashmap comparison methods to optionally use case-
insensitive methods.
When this is enabled, there is a small performance cost in the
hashing algorithm. To tease out the worst possible case, the
following was run on a repo with a deep directory structure:
git ls-tree -d -r --name-only HEAD |
git sparse-checkout set --stdin
The 'set' command was timed with core.ignoreCase disabled or
enabled. For the repo with a deep history, the numbers were
core.ignoreCase=false: 62s
core.ignoreCase=true: 74s (+19.3%)
For reproducibility, the equivalent test on the Linux kernel
repository had these numbers:
core.ignoreCase=false: 3.1s
core.ignoreCase=true: 3.6s (+16%)
Now, this is not an entirely fair comparison, as most users
will define their sparse cone using more shallow directories,
and the performance improvement from eb42feca97 ("unpack-trees:
hash less in cone mode" 2019-11-21) can remove most of the
hash cost. For a more realistic test, drop the "-r" from the
ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in
each case, and the deep repository takes one second, plus or
minus 0.05s, in each case.
Thus, we _can_ demonstrate a cost to this change, but it is
unlikely to matter to any reasonable sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
Documentation/git-sparse-checkout.txt | 5 +++++
builtin/sparse-checkout.c | 10 ++++++++--
dir.c | 15 ++++++++++++---
t/t1091-sparse-checkout-builtin.sh | 17 +++++++++++++++++
4 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index b975285673..9c3c66cc37 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -150,6 +150,11 @@ expecting patterns of these types. Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-
based algorithms to compute inclusion in the sparse-checkout.
+If `core.ignoreCase=true`, then the pattern-matching algorithm will use a
+case-insensitive check. This corrects for case mismatched filenames in the
+'git sparse-checkout set' command to reflect the expected cone in the working
+directory.
+
SEE ALSO
--------
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index a542d617a5..5d62f7a66d 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -313,7 +313,10 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
struct pattern_entry *e = xmalloc(sizeof(*e));
e->patternlen = path->len;
e->pattern = strbuf_detach(path, NULL);
- hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
+ hashmap_entry_init(&e->ent,
+ ignore_case ?
+ strihash(e->pattern) :
+ strhash(e->pattern));
hashmap_add(&pl->recursive_hashmap, &e->ent);
@@ -329,7 +332,10 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
e = xmalloc(sizeof(struct pattern_entry));
e->patternlen = newlen;
e->pattern = xstrndup(oldpattern, newlen);
- hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
+ hashmap_entry_init(&e->ent,
+ ignore_case ?
+ strihash(e->pattern) :
+ strhash(e->pattern));
if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
hashmap_add(&pl->parent_hashmap, &e->ent);
diff --git a/dir.c b/dir.c
index 2ef92a50a0..22d08e61c2 100644
--- a/dir.c
+++ b/dir.c
@@ -625,6 +625,8 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
? ee1->patternlen
: ee2->patternlen;
+ if (ignore_case)
+ return strncasecmp(ee1->pattern, ee2->pattern, min_len);
return strncmp(ee1->pattern, ee2->pattern, min_len);
}
@@ -665,7 +667,9 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
translated->pattern = truncated;
translated->patternlen = given->patternlen - 2;
hashmap_entry_init(&translated->ent,
- memhash(translated->pattern, translated->patternlen));
+ ignore_case ?
+ strihash(translated->pattern) :
+ strhash(translated->pattern));
if (!hashmap_get_entry(&pl->recursive_hashmap,
translated, ent, NULL)) {
@@ -694,7 +698,9 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
translated->pattern = xstrdup(given->pattern);
translated->patternlen = given->patternlen;
hashmap_entry_init(&translated->ent,
- memhash(translated->pattern, translated->patternlen));
+ ignore_case ?
+ strihash(translated->pattern) :
+ strhash(translated->pattern));
hashmap_add(&pl->recursive_hashmap, &translated->ent);
@@ -724,7 +730,10 @@ static int hashmap_contains_path(struct hashmap *map,
/* Check straight mapping */
p.pattern = pattern->buf;
p.patternlen = pattern->len;
- hashmap_entry_init(&p.ent, memhash(p.pattern, p.patternlen));
+ hashmap_entry_init(&p.ent,
+ ignore_case ?
+ strihash(p.pattern) :
+ strhash(p.pattern));
return !!hashmap_get_entry(map, &p, ent, NULL);
}
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index d5e2892526..cee98a1c8a 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -304,4 +304,21 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
git -C dirty sparse-checkout disable
'
+test_expect_success 'cone mode: set with core.ignoreCase=true' '
+ git -C repo sparse-checkout init --cone &&
+ git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
+ cat >expect <<-EOF &&
+ /*
+ !/*/
+ /folder1/
+ EOF
+ test_cmp expect repo/.git/info/sparse-checkout &&
+ ls repo >dir &&
+ cat >expect <<-EOF &&
+ a
+ folder1
+ EOF
+ test_cmp expect dir
+'
+
test_done
--
gitgitgadget
next prev parent reply other threads:[~2019-12-13 20:40 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2019-12-11 18:44 ` Junio C Hamano
2019-12-11 19:11 ` Derrick Stolee
2019-12-11 20:00 ` Junio C Hamano
2019-12-11 20:29 ` Derrick Stolee
2019-12-11 21:37 ` Junio C Hamano
2019-12-12 2:51 ` Derrick Stolee
2019-12-12 20:59 ` Derrick Stolee
2019-12-13 19:05 ` Junio C Hamano
2019-12-13 19:40 ` Derrick Stolee
2019-12-13 22:02 ` Junio C Hamano
2019-12-13 18:09 ` [PATCH v2 0/1] " Derrick Stolee via GitGitGadget
2019-12-13 18:09 ` Derrick Stolee via GitGitGadget [this message]
2019-12-13 19:58 ` Junio C Hamano
2019-12-18 11:41 ` [PATCH " Ed Maste
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dc97d5a0d2a39d3888b75e1154770817e16c3112.1576260593.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).