Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 02/10] unpack-trees: make sparse aware
Date: Tue, 20 Apr 2021 16:00:35 -0700
Message-ID: <CABPp-BHcdhO_kEdqR23sDdGAgoSu2R-HBWv-RmzQvJ0ysWtzxg@mail.gmail.com> (raw)
In-Reply-To: <0a3892d2ec9e4acd4cba1c1d0390acc60dc6e50f.1618322497.git.gitgitgadget@gmail.com>

On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As a first step to integrate 'git status' and 'git add' with the sparse
> index, we must start integrating unpack_trees() with sparse directory
> entries. These changes are currently impossible to trigger because
> unpack_trees() calls ensure_full_index() if command_requires_full_index
> is true. This is the case for all commands at the moment. As we expand
> more commands to be sparse-aware, we might find that more changes are
> required to unpack_trees(). The current changes will suffice for
> 'status' and 'add'.
>
> unpack_trees() calls the traverse_trees() API using unpack_callback()
> to decide if we should recurse into a subtree. We must add new abilities
> to skip a subtree if it corresponds to a sparse directory entry.
>
> It is important to be careful about the trailing directory separator
> that exists in the sparse directory entries but not in the subtree
> paths.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.h           |  2 +-
>  preload-index.c |  2 ++
>  read-cache.c    |  3 +++
>  unpack-trees.c  | 24 ++++++++++++++++++++++--
>  4 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/dir.h b/dir.h
> index 51cb0e217247..9d6666f520f3 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate,
>                                 char *seen)
>  {
>         return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
> -                             S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
> +                             S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));

I'm confused why this change would be needed, or why it'd semantically
be meaningful here either.  Doesn't S_ISSPARSEDIR() being true imply
S_ISDIR() is true (and perhaps even vice versa?).

By chance, was this a leftover from your early RFC changes from a few
series ago when you had an entirely different mode for sparse
directory entries?

>  }
>
>  static inline int dir_path_match(struct index_state *istate,
> diff --git a/preload-index.c b/preload-index.c
> index e5529a586366..35e67057ca9b 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data)
>                         continue;
>                 if (S_ISGITLINK(ce->ce_mode))
>                         continue;
> +               if (S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
>                 if (ce_uptodate(ce))
>                         continue;
>                 if (ce_skip_worktree(ce))

Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)?
 Is this a duplicate check?  If so, is it still desirable for
future-proofing or code clarity, or is it strictly redundant?

> diff --git a/read-cache.c b/read-cache.c
> index 29ffa9ac5db9..6308234b4838 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>                 if (ignore_skip_worktree && ce_skip_worktree(ce))
>                         continue;
>
> +               if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
> +                       continue;
> +

I'm a bit confused about what could trigger ce_skip_worktree(ce) &&
!ignore_skip_worktree and why it'd be desirable to refresh
skip-worktree entries.  However, this is tangential to your patch and
has apparently been around since 2009 (in particular, from 56cac48c35
("ie_match_stat(): do not ignore skip-worktree bit with
CE_MATCH_IGNORE_VALID", 2009-12-14)).

>                 if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>                         filtered = 1;
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index dddf106d5bd4..9a62e823928a 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
>  {
>         ce->ce_flags |= CE_UNPACKED;
>
> +       /*
> +        * If this is a sparse directory, don't advance cache_bottom.
> +        * That will be advanced later using the cache-tree data.
> +        */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return;
> +

I don't understand cache_bottom stuff; we might want to get Junio to
look over it.  Or maybe I just need to dig a bit further and attempt
to understand it.

>         if (o->cache_bottom < o->src_index->cache_nr &&
>             o->src_index->cache[o->cache_bottom] == ce) {
>                 int bottom = o->cache_bottom;
> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce,
>         ce_len -= pathlen;
>         ce_name = ce->name + pathlen;
>
> +       /* remove directory separator if a sparse directory entry */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               ce_len--;
>         return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);

Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well?

Note the following sort order:
   foo
   foo.txt
   foo/
   foo/bar

You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is,
but df_name_compare() exists to make "foo" sort exactly where "foo/"
would when "foo" is a directory.  Will your df_name_compare() call
here result in foo.txt being placed after all the "foo/<subpath>"
entries in the index and perhaps cause other problems down the line?
(Are there issues, e.g. with cache-trees getting wrong ordering from
this, or even writing out indexes or tree objects with the wrong
ordering?  I've written out trees to disk with wrong ordering before
and git usually survives but gets really confused with diffs.)

Since at least one caller of compare_entry() takes the return result
and does a "if (cmp < 0)", this order is going to matter in some
cases.  Perhaps we need some testcases where there is a sparse
directory entry named "foo/" and a file recorded in some relevant tree
with the name "foo.txt" to be able to trigger these lines of code?

>  }
>
> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
>         if (cmp)
>                 return cmp;
>
> +       /* If ce is a sparse directory, then allow equality here. */
> +       if (S_ISSPARSEDIR(ce->ce_mode))
> +               return 0;
> +

Um...so a sparse directory compares equal to _anything_ at all?  I'm
really confused why this would be desirable.  Am I missing something
here?

>         /*
>          * Even if the beginning compared identically, the ce should
>          * compare as bigger than a directory leading up to it!
> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
>         const struct name_entry *p = names;
> +       unsigned recurse = 1;

"recurse" sent my mind off into questions about safety checks, base
cases, etc., instead of just the simple "we don't want to read in
directories corresponding to sparse entries".  I think this would be
clearer either if the variable had the sparsity concept embedded in
its name somewhere (e.g. "unsigned sparse_entry = 0", and check for
(!sparse_entry) instead of (recurse) below), or with a comment about
why there are cases where you want to avoid recursion.

>
>         /* Find first entry with a real name (we could use "mask" too) */
>         while (!p->mode)
> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                         }
>                                 }
>                                 src[0] = ce;
> +
> +                               if (S_ISSPARSEDIR(ce->ce_mode))
> +                                       recurse = 0;

Ah, the context here doesn't show it but this is in the "if (!cmp)"
block, i.e. if we found a match for the sparse directory.  This makes
sense, to me, _if_ we ignore the above question about sparse
directories matching equal to anything and everything.

>                         }
>                         break;
>                 }
>         }
>
> -       if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
> +       if (recurse &&
> +           unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
>                 return -1;
>
>         if (o->merge && src[0]) {
> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                         }
>                 }
>
> -               if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
> +               if (recurse &&
> +                   traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                              names, info) < 0)
>                         return -1;
>                 return mask;

Nice.  :-)


I think your patch was mostly about the recurse stuff, which other
than the name or a comment about it look good to me.  However, all the
other preparatory small tweaks brought up a lot of questions or
confusion for me.  I'm worried there might be a bug or two, though I
may have just misunderstood some of the code bits.

  reply index

Thread overview: 127+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-13 14:01 [PATCH 00/10] Sparse-index: integrate with status and add Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 01/10] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-04-20 21:52   ` Elijah Newren
2021-04-21 13:21     ` Derrick Stolee
2021-04-21 15:14   ` Matheus Tavares Bernardino
2021-04-23 20:12     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 02/10] unpack-trees: make sparse aware Derrick Stolee via GitGitGadget
2021-04-20 23:00   ` Elijah Newren [this message]
2021-04-21 13:41     ` Derrick Stolee
2021-04-21 16:11       ` Elijah Newren
2021-04-22  2:24         ` Matheus Tavares Bernardino
2021-04-21 17:27     ` Derrick Stolee
2021-04-21 18:55       ` Matheus Tavares Bernardino
2021-04-21 19:10         ` Elijah Newren
2021-04-21 19:51           ` Matheus Tavares Bernardino
2021-04-21 18:56       ` Elijah Newren
2021-04-23 20:16         ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 03/10] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-20 23:21   ` Elijah Newren
2021-04-21 13:47     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 04/10] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-20 23:26   ` Elijah Newren
2021-04-21 13:51     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 05/10] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-21  0:44   ` Elijah Newren
2021-04-21 13:55     ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 06/10] dir: use expand_to_path() for sparse directories Derrick Stolee via GitGitGadget
2021-04-21  0:52   ` Elijah Newren
2021-04-21  0:53     ` Elijah Newren
2021-04-21 14:03       ` Derrick Stolee
2021-04-13 14:01 ` [PATCH 07/10] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 08/10] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-04-21  0:57   ` Elijah Newren
2021-04-13 14:01 ` [PATCH 09/10] t7519: add sparse directories to FS monitor tests Derrick Stolee via GitGitGadget
2021-04-13 14:01 ` [PATCH 10/10] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-04-21  7:00   ` Elijah Newren
2021-04-13 20:45 ` [PATCH 00/10] Sparse-index: integrate with status and add Matheus Tavares Bernardino
2021-04-14 16:31   ` Derrick Stolee
2021-04-23 21:34 ` [PATCH v2 0/8] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 1/8] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-13 12:40     ` Matheus Tavares Bernardino
2021-05-14 12:27       ` Derrick Stolee
2021-04-23 21:34   ` [PATCH v2 2/8] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 3/8] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-13  3:26     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 4/8] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-13  3:31     ` Elijah Newren
2021-04-23 21:34   ` [PATCH v2 5/8] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 6/8] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 7/8] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-04-23 21:34   ` [PATCH v2 8/8] fsmonitor: test with sparse index Derrick Stolee via GitGitGadget
2021-05-13  4:12   ` [PATCH v2 0/8] Sparse-index: integrate with status Elijah Newren
2021-05-14 18:28     ` Derrick Stolee
2021-05-14 18:30   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-18  1:33       ` Elijah Newren
2021-05-18 14:57         ` Derrick Stolee
2021-05-18 17:48           ` Elijah Newren
2021-05-18 18:16             ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-18  1:49       ` Elijah Newren
2021-05-18 14:59         ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 07/12] unpack-trees: stop recursing into sparse directories Derrick Stolee via GitGitGadget
2021-05-18  2:03       ` Elijah Newren
2021-05-18  2:06         ` Elijah Newren
2021-05-18 19:20           ` Derrick Stolee
2021-05-14 18:31     ` [PATCH v3 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-14 18:31     ` [PATCH v3 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-18  2:27       ` Elijah Newren
2021-05-18 18:26         ` Derrick Stolee
2021-05-18 19:04           ` Derrick Stolee
2021-05-19  8:38             ` Elijah Newren
2021-05-14 18:31     ` [PATCH v3 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-05-21 11:59     ` [PATCH v4 00/12] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 01/12] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 02/12] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 03/12] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 04/12] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 05/12] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 06/12] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 07/12] unpack-trees: be careful around sparse directory entries Derrick Stolee via GitGitGadget
2021-05-28 11:36         ` Derrick Stolee
2021-05-21 11:59       ` [PATCH v4 08/12] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 09/12] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 10/12] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 11/12] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-05-21 11:59       ` [PATCH v4 12/12] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget
2021-06-07 12:33       ` [PATCH v5 00/14] Sparse-index: integrate with status Derrick Stolee via GitGitGadget
2021-06-07 12:33         ` [PATCH v5 01/14] sparse-index: skip indexes with unmerged entries Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 02/14] sparse-index: include EXTENDED flag when expanding Derrick Stolee via GitGitGadget
2021-06-08 18:56           ` Elijah Newren
2021-06-09 17:39             ` Derrick Stolee
2021-06-09 18:11               ` Elijah Newren
2021-06-07 12:34         ` [PATCH v5 03/14] t1092: replace incorrect 'echo' with 'cat' Derrick Stolee via GitGitGadget
2021-06-08 19:18           ` Elijah Newren
2021-06-07 12:34         ` [PATCH v5 04/14] t1092: expand repository data shape Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 05/14] t1092: add tests for status/add and sparse files Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 06/14] unpack-trees: preserve cache_bottom Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 07/14] unpack-trees: compare sparse directories correctly Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 08/14] unpack-trees: unpack sparse directory entries Derrick Stolee via GitGitGadget
2021-06-09  3:48           ` Elijah Newren
2021-06-09 20:21             ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 09/14] dir.c: accept a directory as part of cone-mode patterns Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 10/14] diff-lib: handle index diffs with sparse dirs Derrick Stolee via GitGitGadget
2021-06-07 15:26           ` Derrick Stolee
2021-06-08  1:05             ` Junio C Hamano
2021-06-08 13:00               ` Derrick Stolee
2021-06-09  5:47           ` Elijah Newren
2021-06-09  6:32             ` Junio C Hamano
2021-06-09  8:11               ` Elijah Newren
2021-06-09 20:33                 ` Derrick Stolee
2021-06-10 17:45                   ` Derrick Stolee
2021-06-10 21:31                     ` Elijah Newren
2021-06-11 12:57                       ` Derrick Stolee
2021-06-11 17:27                         ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 11/14] status: skip sparse-checkout percentage with sparse-index Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 12/14] status: use sparse-index throughout Derrick Stolee via GitGitGadget
2021-06-07 12:34         ` [PATCH v5 13/14] wt-status: expand added sparse directory entries Derrick Stolee via GitGitGadget
2021-06-09  5:27           ` Elijah Newren
2021-06-09 20:49             ` Derrick Stolee
2021-06-07 12:34         ` [PATCH v5 14/14] fsmonitor: integrate with sparse index Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BHcdhO_kEdqR23sDdGAgoSu2R-HBWv-RmzQvJ0ysWtzxg@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git