From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: newren@gmail.com, gitster@pobox.com,
	Derrick Stolee <stolee@gmail.com>,
	Matheus Tavares Bernardino <matheus.bernardino@usp.br>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v3 00/26] Sparse Index: API protections
Date: Mon, 12 Apr 2021 21:07:51 +0000	[thread overview]
Message-ID: <pull.906.v3.git.1618261697.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.906.v2.git.1617241802.gitgitgadget@gmail.com>

Here is the second patch series submission coming out of the sparse-index
RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

This is based on ds/sparse-index.

The point of this series is to insert protections for the consumers of the
in-memory index to avoid unintended behavior change when using a sparse
index versus a full one.

We mark certain regions of code as needing a full index by calling
ensure_full_index(), which expands a sparse index to a full one if
necessary. These protections are inserted file-by-file into every loop over
all cache entries. Well, "most" loops: some will be handled in the very next
series, so I leave them out here.
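As a rough illustration of what such a protection does, here is a toy model
in C. The types toy_entry and toy_index and the functions
toy_ensure_full_index() and toy_count_under() are hypothetical
simplifications, not Git's struct cache_entry, struct index_state, or
ensure_full_index(). In particular, a sparse-directory entry here carries a
list of the paths it covers instead of pointing at a tree object.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Hypothetical stand-ins for struct cache_entry / struct index_state. */
struct toy_entry {
	const char *name;            /* sparse-directory names end in '/' */
	const char *const *children; /* non-NULL only for a sparse directory:
	                                a sorted, NULL-terminated list of the
	                                paths it covers (stands in for a tree) */
};

struct toy_index {
	struct toy_entry cache[TOY_MAX_ENTRIES];
	int cache_nr;
	int sparse_index;            /* 1 while sparse-directory entries exist */
};

/* Toy expansion: replace each sparse directory by the paths it covers. */
static void toy_ensure_full_index(struct toy_index *istate)
{
	struct toy_entry full[TOY_MAX_ENTRIES];
	int nr = 0;

	if (!istate->sparse_index)
		return; /* already full: nothing to do */

	for (int i = 0; i < istate->cache_nr; i++) {
		const struct toy_entry *ce = &istate->cache[i];

		if (!ce->children) {
			assert(nr < TOY_MAX_ENTRIES);
			full[nr++] = *ce;
			continue;
		}
		for (const char *const *p = ce->children; *p; p++) {
			assert(nr < TOY_MAX_ENTRIES);
			full[nr].name = *p;
			full[nr].children = NULL;
			nr++;
		}
	}
	memcpy(istate->cache, full, (size_t)nr * sizeof(full[0]));
	istate->cache_nr = nr;
	istate->sparse_index = 0;
}

/* A loop over all cache entries, guarded the way this series guards them. */
static int toy_count_under(struct toy_index *istate, const char *prefix)
{
	size_t len = strlen(prefix);
	int count = 0;

	/* Without this call, "deep/" would count as one entry, not its files. */
	toy_ensure_full_index(istate);
	for (int i = 0; i < istate->cache_nr; i++)
		if (!strncmp(istate->cache[i].name, prefix, len))
			count++;
	return count;
}
```

For example, with entries "a.txt", "deep/" (covering "deep/a" and
"deep/b"), and "z.txt", toy_count_under(&idx, "deep/") returns 2 and leaves
a four-entry full index behind; without the guard, the loop would count only
the single "deep/" entry.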

Many callers use index_name_pos() to find a path by name. In these cases, we
can check whether the lookup lands inside a sparse-directory entry. If it
does, we expand to a full index and run the search again.
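A simplified sketch of the lookup-and-retry idea, using the same kind of toy
index (hypothetical types and names, not Git's code). Like
index_name_pos(), toy_index_name_pos() returns the position on a hit and
-(insertion point) - 1 on a miss; it uses plain strcmp() ordering and ignores
stages, which is a simplification. On a miss, only the entry just before the
insertion point can be a sparse directory containing the path, because in a
sparse index nothing else sorts between a "dir/" entry and the paths it
covers.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Hypothetical stand-ins for Git's index types. */
struct toy_entry {
	const char *name;            /* sparse-directory names end in '/' */
	const char *const *children; /* non-NULL only for sparse directories */
};

struct toy_index {
	struct toy_entry cache[TOY_MAX_ENTRIES];
	int cache_nr;
	int sparse_index;
};

/* Toy expansion: replace each sparse directory by the paths it covers. */
static void toy_ensure_full_index(struct toy_index *istate)
{
	struct toy_entry full[TOY_MAX_ENTRIES];
	int nr = 0;

	if (!istate->sparse_index)
		return;
	for (int i = 0; i < istate->cache_nr; i++) {
		const struct toy_entry *ce = &istate->cache[i];

		if (!ce->children) {
			assert(nr < TOY_MAX_ENTRIES);
			full[nr++] = *ce;
			continue;
		}
		for (const char *const *p = ce->children; *p; p++) {
			assert(nr < TOY_MAX_ENTRIES);
			full[nr].name = *p;
			full[nr].children = NULL;
			nr++;
		}
	}
	memcpy(istate->cache, full, (size_t)nr * sizeof(full[0]));
	istate->cache_nr = nr;
	istate->sparse_index = 0;
}

static int toy_index_name_pos(struct toy_index *istate, const char *name)
{
	int first = 0, last = istate->cache_nr;
	size_t namelen = strlen(name);

	while (last > first) {
		int next = first + (last - first) / 2;
		int cmp = strcmp(name, istate->cache[next].name);

		if (!cmp)
			return next;
		if (cmp < 0)
			last = next;
		else
			first = next + 1;
	}

	/*
	 * Not found. If the entry before the insertion point is a sparse
	 * directory that is an ancestor of 'name', expand and search again.
	 * This recurses at most once: afterwards the index is full.
	 */
	if (istate->sparse_index && first > 0) {
		const struct toy_entry *ce = &istate->cache[first - 1];
		size_t len = strlen(ce->name);

		if (ce->children && len < namelen &&
		    !strncmp(name, ce->name, len)) {
			toy_ensure_full_index(istate);
			return toy_index_name_pos(istate, name);
		}
	}
	return -first - 1;
}
```

With entries "a.txt", "deep/" (covering "deep/a" and "deep/b"), and "z.txt",
looking up "a.txt" returns 0 without expanding, while looking up "deep/b"
expands the index and returns 2.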

The last few patches deal with the name-hash hashtable for doing O(1)
lookups.
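The check that expand_to_path() performs, as described by the comment quoted
in the range-diff, can be sketched as follows: if a path is missing, look up
each of its parent directories with a trailing '/', since only
sparse-directory entries have such names. Here toy_names is a hypothetical
stand-in for the name-hash (a linear scan, not a hashtable), and the sketch
only decides whether to expand; the patch itself calls ensure_full_index() at
that point.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the name-hash: entry names in a plain array,
 * searched linearly. Only sparse-directory names end in '/'.
 */
struct toy_names {
	const char *const *names;
	int nr;
};

/* Exact-match lookup of the first 'len' bytes of 'name'. */
static int toy_name_exists(const struct toy_names *h, const char *name,
			   size_t len)
{
	for (int i = 0; i < h->nr; i++)
		if (strlen(h->names[i]) == len &&
		    !memcmp(h->names[i], name, len))
			return 1;
	return 0;
}

/*
 * Returns 1 when 'path' is not in the index but some parent directory is
 * present as a sparse-directory entry ("dir/"), i.e. the caller should
 * expand before trusting a negative lookup.
 */
static int toy_path_needs_expansion(const struct toy_names *h,
				    const char *path)
{
	const char *slash;

	assert(path != NULL);
	if (toy_name_exists(h, path, strlen(path)))
		return 0; /* already present: nothing to expand */

	/* Try "a/", then "a/b/", ...; each prefix keeps its trailing '/'. */
	for (slash = strchr(path, '/'); slash; slash = strchr(slash + 1, '/'))
		if (toy_name_exists(h, path, (size_t)(slash - path) + 1))
			return 1;
	return 0;
}
```

With names "a.txt", "deep/", and "z.txt", the path "deep/sub/file" needs
expansion because "deep/" is present, while "other/file" does not.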

These protections don't do much right now, since the previous series created
the_repository->settings.command_requires_full_index to guard all index
reads and writes, ensuring the in-memory copy is full for commands that have
not yet been tested with the sparse index.
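A toy model of that read-time guard, for illustration only: toy_settings,
toy_index, and toy_read_index() are hypothetical, only the flag name mirrors
Git's, and writes are not modeled. The expansions counter stands in for a
breakpoint on ensure_full_index(): once a command opts out by clearing the
flag, expansion happens only where code explicitly asks for it.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins: in Git the flag lives in
 * the_repository->settings and the sparse state in struct index_state.
 */
struct toy_settings {
	int command_requires_full_index; /* 1 until a command opts out */
};

struct toy_index {
	int sparse_index; /* 1 while sparse-directory entries are present */
	int expansions;   /* how many times an expansion actually ran */
};

static void toy_ensure_full_index(struct toy_index *istate)
{
	if (!istate->sparse_index)
		return;
	/* (a real expansion of sparse-directory entries would go here) */
	istate->sparse_index = 0;
	istate->expansions++;
}

/* Read-time guard: expand immediately unless the command is sparse-aware. */
static void toy_read_index(struct toy_index *istate,
			   const struct toy_settings *settings,
			   int on_disk_is_sparse)
{
	assert(on_disk_is_sparse == 0 || on_disk_is_sparse == 1);
	istate->sparse_index = on_disk_is_sparse;
	istate->expansions = 0;
	if (settings->command_requires_full_index)
		toy_ensure_full_index(istate);
}
```

A command with the flag set always sees a full index right after reading;
one with the flag cleared keeps the sparse index until a protected region
calls toy_ensure_full_index().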

However, once this series is complete, we will have a straightforward plan
for making commands "sparse aware" one by one:

 1. Disable settings.command_requires_full_index to allow an in-memory
    sparse-index.
 2. Run versions of that command under a debugger, breaking on
    ensure_full_index().
 3. Examine the call stack to determine the context of that expansion, then
    implement the proper behavior in those locations.
 4. Add tests to ensure we are checking this logic in the presence of sparse
    directory entries.

I will admit that writing the test cases is what takes the most time in the
conversions I've done so far.


Updates in v3
=============

 * I updated based on Elijah's feedback.
 * One new patch splits out a change that Elijah (rightfully) pointed out
   did not belong with the patch it was originally in.

I gave it time to see if any other comments came in, but it looks like
review stabilized. I probably waited a bit longer than I should have.


Updates in v2
=============

 * Rebased onto v5 of ds/sparse-index
 * Updated the technical doc to describe how these protections are guards to
   keep behavior consistent between a sparse-index and a full index. Whether
   or not that behavior is "correct" can be interrogated later.
 * Calls to ensure_full_index() are marked with a TODO comment saying these
   calls should be audited later (with tests).
 * Fixed an incorrectly squashed commit message.
 * Dropped the diff-lib.c commit because it was erroneously included in v1.
 * Dropped the merge-ort.c commit because of conflicts with work in flight
   and a quick audit that it is not needed.
 * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
   found it equivalent to what I would have done.

Thanks, -Stolee

Derrick Stolee (26):
  sparse-index: API protection strategy
  *: remove 'const' qualifier for struct index_state
  read-cache: expand on query into sparse-directory entry
  cache: move ensure_full_index() to cache.h
  add: ensure full index
  checkout-index: ensure full index
  checkout: ensure full index
  commit: ensure full index
  difftool: ensure full index
  fsck: ensure full index
  grep: ensure full index
  ls-files: ensure full index
  merge-index: ensure full index
  rm: ensure full index
  stash: ensure full index
  update-index: ensure full index
  dir: ensure full index
  entry: ensure full index
  merge-recursive: ensure full index
  pathspec: ensure full index
  read-cache: ensure full index
  resolve-undo: ensure full index
  revision: ensure full index
  name-hash: don't add directories to name_hash
  sparse-index: expand_to_path()
  name-hash: use expand_to_path()

 Documentation/technical/sparse-index.txt | 37 +++++++++++-
 attr.c                                   | 14 ++---
 attr.h                                   |  4 +-
 builtin/add.c                            |  2 +
 builtin/checkout-index.c                 |  2 +
 builtin/checkout.c                       |  5 ++
 builtin/commit.c                         |  4 ++
 builtin/difftool.c                       |  3 +
 builtin/fsck.c                           |  2 +
 builtin/grep.c                           |  2 +
 builtin/ls-files.c                       | 14 +++--
 builtin/merge-index.c                    |  5 ++
 builtin/rm.c                             |  2 +
 builtin/stash.c                          |  2 +
 builtin/update-index.c                   |  2 +
 cache.h                                  |  7 ++-
 convert.c                                | 26 ++++-----
 convert.h                                | 20 +++----
 dir.c                                    | 14 +++--
 dir.h                                    |  8 +--
 entry.c                                  |  2 +
 merge-recursive.c                        |  4 +-
 name-hash.c                              | 11 +++-
 pathspec.c                               |  8 ++-
 pathspec.h                               |  6 +-
 read-cache.c                             | 35 ++++++++++--
 resolve-undo.c                           |  4 ++
 revision.c                               |  2 +
 sparse-index.c                           | 73 ++++++++++++++++++++++++
 sparse-index.h                           | 14 ++++-
 submodule.c                              |  6 +-
 submodule.h                              |  6 +-
 32 files changed, 273 insertions(+), 73 deletions(-)


base-commit: c9e40ae8ec41c5566e5849a87c969fa81ef49fcd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/906

Range-diff vs v2:

  1:  7484e085e342 =  1:  4731f610ba6e sparse-index: API protection strategy
  2:  098b2c9ef352 =  2:  d3a92538edb6 *: remove 'const' qualifier for struct index_state
  3:  737d27e18d64 =  3:  f4b77aa18b93 read-cache: expand on query into sparse-directory entry
  4:  db5c100f3e2b =  4:  da17774a53c5 cache: move ensure_full_index() to cache.h
  5:  4a5fc2eb5a9f =  5:  b59c9f482828 add: ensure full index
  6:  11c38f7277c5 =  6:  0082855b5961 checkout-index: ensure full index
  7:  fd04adbb3f79 =  7:  e2ac527143ff checkout: ensure full index
  8:  65704f39edc9 =  8:  1a3b51fd3c4b commit: ensure full index
  9:  739f3fe9edf2 =  9:  8c61d40dfe01 difftool: ensure full index
 10:  779a86ad1ec4 = 10:  45b603379422 fsck: ensure full index
 11:  8c0d377054fa = 11:  97124e9fdc7f grep: ensure full index
 12:  beaa1467cabb = 12:  b00e214515e8 ls-files: ensure full index
 13:  73684141fcff = 13:  6497f2ce225b merge-index: ensure full index
 14:  6ea81a49c6b5 = 14:  175f3bc6b336 rm: ensure full index
 15:  49ca5ed05c8d = 15:  daa77e84e0e2 stash: ensure full index
 16:  9c4bb187c15d = 16:  8c5336964d9b update-index: ensure full index
 17:  fae4c078c3ef = 17:  08a62c23c8f7 dir: ensure full index
 18:  2b9180ee77d3 = 18:  825ebceee508 entry: ensure full index
 19:  1e3f6085a405 = 19:  3673db517235 merge-recursive: ensure full index
 20:  e62a597a9725 = 20:  4d3f6de29a63 pathspec: ensure full index
 21:  ebfffdbdd6ad = 21:  bda9cab15966 read-cache: ensure full index
 22:  495b07a87973 = 22:  38f295a41ec1 resolve-undo: ensure full index
 23:  3144114d1a75 = 23:  f928e104f0d3 revision: ensure full index
  -:  ------------ > 24:  5fd83dcf2747 name-hash: don't add directories to name_hash
 24:  d52c72b4a7b9 ! 25:  335fec3676a0 sparse-index: expand_to_path()
     @@ sparse-index.c: void ensure_full_index(struct index_state *istate)
      +				      path_mutable.len, icase)) {
      +			/*
      +			 * We found a parent directory in the name-hash
     -+			 * hashtable, which means that this entry could
     -+			 * exist within a sparse-directory entry. Expand
     -+			 * accordingly.
     ++			 * hashtable, because only sparse directory entries
     ++			 * have a trailing '/' character.  Since "path" wasn't
     ++			 * in the index, perhaps it exists within this
     ++			 * sparse-directory.  Expand accordingly.
      +			 */
      +			ensure_full_index(istate);
      +			break;
 25:  7e2d3fae9a2a ! 26:  1f3af8a886e5 name-hash: use expand_to_path()
     @@ name-hash.c
       
       struct dir_entry {
       	struct hashmap_entry ent;
     -@@ name-hash.c: static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
     - 	if (ce->ce_flags & CE_HASHED)
     - 		return;
     - 	ce->ce_flags |= CE_HASHED;
     -+
     -+	if (S_ISSPARSEDIR(ce->ce_mode)) {
     -+		add_dir_entry(istate, ce);
     -+		return;
     -+	}
     -+
     - 	hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
     - 	hashmap_add(&istate->name_hash, &ce->ent);
     - 
      @@ name-hash.c: int index_dir_exists(struct index_state *istate, const char *name, int namelen)
       	struct dir_entry *dir;
       

-- 
gitgitgadget
