git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, newren@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/1] sparse-checkout: respect core.ignoreCase in cone mode
Date: Fri, 13 Dec 2019 18:09:52 +0000	[thread overview]
Message-ID: <pull.488.v2.git.1576260593.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.488.git.1575920580.gitgitgadget@gmail.com>

This is the first of several cleanups to the sparse-checkout feature that
had a large overhaul in ds/sparse-cone.

We have an internal customer that plans to use sparse-checkout as a core
feature of their version control experience. They use a case-insensitive
filesystem, so as they created their directory dependency graph they did not
keep consistent casing. This data is what they plan to feed to "git
sparse-checkout set".

While I would certainly prefer that the data is cleaned up (and that process
is ongoing), I could not argue with the logic that git add does the "right
thing" when core.ignoreCase is enabled.

Update in V2:

Based on excellent feedback from Junio, we decided to shore up the case
issues from the previous implementation. Now, the sparse-checkout file may
be stored with the "wrong" case but the hash algorithm is replaced with a
case-insensitive check when core.ignoreCase is set. Because the performance
impact is much lower than I anticipated, I no longer think it is important
to add a new config option to enable/disable this logic.

Thanks, -Stolee

Derrick Stolee (1):
  sparse-checkout: respect core.ignoreCase in cone mode

 Documentation/git-sparse-checkout.txt |  5 +++++
 builtin/sparse-checkout.c             | 10 ++++++++--
 dir.c                                 | 15 ++++++++++++---
 t/t1091-sparse-checkout-builtin.sh    | 17 +++++++++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)


base-commit: cff4e9138d8df45e3b6199171092ee781cdadaeb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-488%2Fderrickstolee%2Fsparse-case-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-488/derrickstolee/sparse-case-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/488

Range-diff vs v1:

 1:  23705845ce ! 1:  dc97d5a0d2 sparse-checkout: respect core.ignoreCase in cone mode
     @@ -12,23 +12,42 @@
          using a case-insensitive match. Do the same for the sparse-checkout
          feature.
      
     -    The sanitize_cone_input() method is named to imply that other checks
     -    may be added. In fact, such checks are planned including looking for
     -    wildcards that make the paths invalid cone patterns or must be
     -    escaped.
     -
     -    Specifically, if the path has a match in the index, then use that
     -    path instead. If there is no match, still add that path to the
     -    patterns, as the user may expect the directory to appear after a
     -    checkout to another ref. However, we have no matching path to
     -    correct for a case conflict, and must assume that the user provided
     -    the correct case.
     -
     -    Another option would be to do case-insensitive checks while
     -    updating the skip-worktree bits during unpack_trees(). Outside of
     -    the potential performance loss on a more expensive code path, that
     -    also breaks compatibility with older versions of Git as using the
     -    same sparse-checkout file would change the paths that are included.
     +    Perform case-insensitive checks while updating the skip-worktree
     +    bits during unpack_trees(). This is done by changing the hash
     +    algorithm and hashmap comparison methods to optionally use case-
     +    insensitive methods.
     +
     +    When this is enabled, there is a small performance cost in the
     +    hashing algorithm. To tease out the worst possible case, the
     +    following was run on a repo with a deep directory structure:
     +
     +            git ls-tree -d -r --name-only HEAD |
     +                    git sparse-checkout set --stdin
     +
     +    The 'set' command was timed with core.ignoreCase disabled or
     +    enabled. For the repo with a deep history, the numbers were
     +
     +            core.ignoreCase=false: 62s
     +            core.ignoreCase=true:  74s (+19.3%)
     +
     +    For reproducibility, the equivalent test on the Linux kernel
     +    repository had these numbers:
     +
     +            core.ignoreCase=false: 3.1s
     +            core.ignoreCase=true:  3.6s (+16%)
     +
     +    Now, this is not an entirely fair comparison, as most users
     +    will define their sparse cone using more shallow directories,
     +    and the performance improvement from eb42feca97 ("unpack-trees:
     +    hash less in cone mode" 2019-11-21) can remove most of the
     +    hash cost. For a more realistic test, drop the "-r" from the
     +    ls-tree command to store only the first-level directories.
     +    In that case, the Linux kernel repository takes 0.2-0.25s in
     +    each case, and the deep repository takes one second, plus or
     +    minus 0.05s, in each case.
     +
     +    Thus, we _can_ demonstrate a cost to this change, but it is
     +    unlikely to matter to any reasonable sparse-checkout cone.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -39,9 +58,10 @@
       If the patterns do match the expected format, then Git will use faster hash-
       based algorithms to compute inclusion in the sparse-checkout.
       
     -+If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
     -+correct for incorrect case when assigning patterns to the sparse-checkout
     -+file.
     ++If `core.ignoreCase=true`, then the pattern-matching algorithm will use a
     ++case-insensitive check. This corrects for case mismatched filenames in the
     ++'git sparse-checkout set' command to reflect the expected cone in the working
     ++directory.
      +
       SEE ALSO
       --------
     @@ -51,71 +71,76 @@
       --- a/builtin/sparse-checkout.c
       +++ b/builtin/sparse-checkout.c
      @@
     - 	}
     - }
     + 	struct pattern_entry *e = xmalloc(sizeof(*e));
     + 	e->patternlen = path->len;
     + 	e->pattern = strbuf_detach(path, NULL);
     +-	hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
     ++	hashmap_entry_init(&e->ent,
     ++			   ignore_case ?
     ++			   strihash(e->pattern) :
     ++			   strhash(e->pattern));
     + 
     + 	hashmap_add(&pl->recursive_hashmap, &e->ent);
       
     -+static void sanitize_cone_input(struct strbuf *line)
     -+{
     -+	if (ignore_case) {
     -+		struct index_state *istate = the_repository->index;
     -+		const char *name = index_dir_matching_name(istate, line->buf, line->len);
     -+
     -+		if (name) {
     -+			strbuf_setlen(line, 0);
     -+			strbuf_addstr(line, name);
     -+		}
     -+	}
     -+
     -+	if (line->buf[0] != '/')
     -+		strbuf_insert(line, 0, "/", 1);
     -+}
     -+
     - static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
     - {
     - 	strbuf_trim(line);
      @@
     - 	if (!line->len)
     - 		return;
     + 		e = xmalloc(sizeof(struct pattern_entry));
     + 		e->patternlen = newlen;
     + 		e->pattern = xstrndup(oldpattern, newlen);
     +-		hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
     ++		hashmap_entry_init(&e->ent,
     ++				   ignore_case ?
     ++				   strihash(e->pattern) :
     ++				   strhash(e->pattern));
       
     --	if (line->buf[0] != '/')
     --		strbuf_insert(line, 0, "/", 1);
     -+	sanitize_cone_input(line);
     + 		if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
     + 			hashmap_add(&pl->parent_hashmap, &e->ent);
     +
     + diff --git a/dir.c b/dir.c
     + --- a/dir.c
     + +++ b/dir.c
     +@@
     + 			 ? ee1->patternlen
     + 			 : ee2->patternlen;
       
     - 	insert_recursive_pattern(pl, line);
     ++	if (ignore_case)
     ++		return strncasecmp(ee1->pattern, ee2->pattern, min_len);
     + 	return strncmp(ee1->pattern, ee2->pattern, min_len);
       }
     -
     - diff --git a/cache.h b/cache.h
     - --- a/cache.h
     - +++ b/cache.h
     + 
      @@
     - int verify_path(const char *path, unsigned mode);
     - int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
     - int index_dir_exists(struct index_state *istate, const char *name, int namelen);
     -+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
     - void adjust_dirname_case(struct index_state *istate, char *name);
     - struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
     + 		translated->pattern = truncated;
     + 		translated->patternlen = given->patternlen - 2;
     + 		hashmap_entry_init(&translated->ent,
     +-				   memhash(translated->pattern, translated->patternlen));
     ++				   ignore_case ?
     ++				   strihash(translated->pattern) :
     ++				   strhash(translated->pattern));
       
     -
     - diff --git a/name-hash.c b/name-hash.c
     - --- a/name-hash.c
     - +++ b/name-hash.c
     + 		if (!hashmap_get_entry(&pl->recursive_hashmap,
     + 				       translated, ent, NULL)) {
      @@
     - 	return dir && dir->nr;
     + 	translated->pattern = xstrdup(given->pattern);
     + 	translated->patternlen = given->patternlen;
     + 	hashmap_entry_init(&translated->ent,
     +-			   memhash(translated->pattern, translated->patternlen));
     ++			   ignore_case ?
     ++			   strihash(translated->pattern) :
     ++			   strhash(translated->pattern));
     + 
     + 	hashmap_add(&pl->recursive_hashmap, &translated->ent);
     + 
     +@@
     + 	/* Check straight mapping */
     + 	p.pattern = pattern->buf;
     + 	p.patternlen = pattern->len;
     +-	hashmap_entry_init(&p.ent, memhash(p.pattern, p.patternlen));
     ++	hashmap_entry_init(&p.ent,
     ++			   ignore_case ?
     ++			   strihash(p.pattern) :
     ++			   strhash(p.pattern));
     + 	return !!hashmap_get_entry(map, &p, ent, NULL);
       }
       
     -+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
     -+{
     -+	struct dir_entry *dir;
     -+
     -+	lazy_init_name_hash(istate);
     -+	dir = find_dir_entry(istate, name, namelen);
     -+
     -+	return dir ? dir->name : NULL;
     -+}
     -+
     - void adjust_dirname_case(struct index_state *istate, char *name)
     - {
     - 	const char *startPtr = name;
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
     @@ -125,16 +150,20 @@
       '
       
      +test_expect_success 'cone mode: set with core.ignoreCase=true' '
     -+	test_when_finished git -C repo config --unset core.ignoreCase &&
      +	git -C repo sparse-checkout init --cone &&
     -+	git -C repo config core.ignoreCase true &&
     -+	git -C repo sparse-checkout set Folder1 &&
     ++	git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
      +	cat >expect <<-EOF &&
      +		/*
      +		!/*/
      +		/folder1/
      +	EOF
     -+	test_cmp expect repo/.git/info/sparse-checkout
     ++	test_cmp expect repo/.git/info/sparse-checkout &&
     ++	ls repo >dir &&
     ++	cat >expect <<-EOF &&
     ++		a
     ++		folder1
     ++	EOF
     ++	test_cmp expect dir
      +'
      +
       test_done

-- 
gitgitgadget

  parent reply	other threads:[~2019-12-13 20:40 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2019-12-11 18:44   ` Junio C Hamano
2019-12-11 19:11     ` Derrick Stolee
2019-12-11 20:00       ` Junio C Hamano
2019-12-11 20:29         ` Derrick Stolee
2019-12-11 21:37           ` Junio C Hamano
2019-12-12  2:51             ` Derrick Stolee
2019-12-12 20:59   ` Derrick Stolee
2019-12-13 19:05     ` Junio C Hamano
2019-12-13 19:40       ` Derrick Stolee
2019-12-13 22:02         ` Junio C Hamano
2019-12-13 18:09 ` Derrick Stolee via GitGitGadget [this message]
2019-12-13 18:09   ` [PATCH v2 " Derrick Stolee via GitGitGadget
2019-12-13 19:58   ` [PATCH v2 0/1] " Junio C Hamano
2019-12-18 11:41 ` [PATCH " Ed Maste

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.488.v2.git.1576260593.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).