From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Victoria Dye <vdye@github.com>, Derrick Stolee <stolee@gmail.com>,
	Lessley Dennington <lessleydennington@gmail.com>
Subject: Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
Date: Wed, 16 Feb 2022 08:30:42 -0800
Message-ID: <CABPp-BEog_CBEjx3FBGdUAhjwrPPDuP54HWQssAWnGeUnr0cBg@mail.gmail.com>
In-Reply-To: <220216.86fsojup82.gmgdl@evledraar.gmail.com>

On Wed, Feb 16, 2022 at 1:37 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> > [...]
> > +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> > +                   int *dir_found)
> > +{
> > +     struct stat st;
> > +     char *newdir;
> > +     char *tmp;
> > +
> > +     /*
> > +      * If dirname corresponds to a directory that doesn't exist, and this
> > +      * path starts with dirname, then path can't exist.
> > +      */
> > +     if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> > +             return 0;
> > +
> > +     /*
> > +      * If path itself exists, return 1.
> > +      */
> > +     if (!lstat(path, &st))
> > +             return 1;
> > +
> > +     /*
> > +      * Otherwise, path does not exist so we'll return 0...but we'll first
> > +      * determine some info about its parent directory so we can avoid
> > +      * lstat calls for future cache entries.
> > +      */
> > +     newdir = strrchr(path, '/');
> > +     if (!newdir)
> > +             return 0; /* Didn't find a parent dir; just return 0 now. */
> > +
> > +     /*
> > +      * If path starts with directory (which we already lstat'ed and found),
> > +      * then no need to lstat parent directory again.
> > +      */
> > +     if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> > +             return 0;
>
> I really don't care/just asking, but there was a discussion on another
> topic about guarding calls to the mem*() family when n=0:
> https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/
>
> Is this the same sort of redundancy where we could lose the "&&
> *dirname" part, or is it still important because a "\0" dirname would
> have corresponding non-0 *dir_len?

No, dirname is a const char **, not a const char *.  I need to make sure
*dirname is non-NULL before passing it to memcmp() or we get segfaults
(and *dirname will be NULL the first time we reach this line, so the
check is critical).
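For illustration, here is a minimal sketch (not code from the patch; the
helper name starts_with_cached_dir() is hypothetical) of the prefix test
in path_found().  Because the cached pointer starts out NULL, the pointer
itself is checked before calling memcmp(): the C standard (through C17)
requires valid pointer arguments to memcmp() even when the length is 0.
The sketch also assumes a path may be shorter than the cached prefix and
checks the length first, so memcmp() never reads past the end of path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the prefix test in path_found().
 * `dirname` points at the cached directory pointer, so `*dirname`
 * is NULL until the first directory has been cached.
 * Returns 1 if `path` starts with the first `dir_len` bytes of *dirname.
 */
static int starts_with_cached_dir(const char *path, const char **dirname,
				  size_t dir_len)
{
	assert(path != NULL);

	/* NULL is not a valid memcmp() argument, even for a length of 0. */
	if (!*dirname)
		return 0;

	/* Don't let memcmp() read past the end of a shorter path. */
	if (strlen(path) < dir_len)
		return 0;

	return !memcmp(path, *dirname, dir_len);
}
```

Returning 0 while nothing is cached just means "no shortcut available";
the caller still has to lstat() the path itself.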

> More generally ... (see below)...
>
> > +
> > +     /* Free previous dirname, and cache path's dirname */
> > +     *dirname = path;
> > +     *dir_len = newdir - path + 1;
> > +
> > +     tmp = xstrndup(path, *dir_len);
> > +     *dir_found = !lstat(tmp, &st);
>
> In most other places we're a bit more careful about lstat() error handling, e.g.:
>
>     builtin/init-db.c:              if (lstat(path->buf, &st_git)) {
>     builtin/init-db.c-                      if (errno != ENOENT)
>     builtin/init-db.c-                              die_errno(_("cannot stat '%s'"), path->buf);
>     builtin/init-db.c-              }
>
> Shouldn't we do the same here and at least error() on return values of
> -1 with an accompanying errno that isn't ENOENT?

If we should do that everywhere, should we have an xlstat in wrapper.[ch]?
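As a rough sketch of what such a wrapper might look like (the name
lstat_exists() is hypothetical and is not an existing git API), the
helper below separates "path is missing" from genuine lstat() failures
and reports the latter instead of silently treating them as absence:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical wrapper (not an existing git function): like lstat(),
 * but distinguishes "path is missing" from real errors.
 * Returns 1 if path exists, 0 if it does not (ENOENT, or ENOTDIR when a
 * leading component is not a directory), and -1 after reporting any
 * other failure, such as EACCES.
 */
static int lstat_exists(const char *path, struct stat *st)
{
	assert(path && st);
	if (!lstat(path, st))
		return 1;
	if (errno == ENOENT || errno == ENOTDIR)
		return 0;
	fprintf(stderr, "error: cannot stat '%s': %s\n", path, strerror(errno));
	return -1;
}
```

Treating ENOTDIR like ENOENT is an assumption of this sketch: a file
sitting where a leading directory is expected also means the path is
absent.  Whether other errors should die() or merely warn is exactly the
open question above.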

> > +     free(tmp);
> > +
> > +     return 0;
> > +}
> > +
> >  void clear_skip_worktree_from_present_files(struct index_state *istate)
> >  {
> > +     const char *last_dirname = NULL;
> > +     size_t dir_len = 0;
> > +     int dir_found = 1;
> > +
> >       int i;
> > +
> >       if (!core_apply_sparse_checkout)
> >               return;
> >
> >  restart:
> >       for (i = 0; i < istate->cache_nr; i++) {
> >               struct cache_entry *ce = istate->cache[i];
> > -             struct stat st;
> >
> > -             if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> > +             if (ce_skip_worktree(ce) &&
> > +                 path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>
> ...(continued from above) is the "path is zero" part of this even
> reachable? I tried with this on top and ran your tests (and the rest of
> t*sparse*.sh) successfully:
>
>         diff --git a/sparse-index.c b/sparse-index.c
>         index eed170cd8f7..f89c944d8cd 100644
>         --- a/sparse-index.c
>         +++ b/sparse-index.c
>         @@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>                 for (i = 0; i < istate->cache_nr; i++) {
>                         struct cache_entry *ce = istate->cache[i];
>
>         +               assert(*ce->name);
>                         if (ce_skip_worktree(ce) &&
>                             path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>                                 if (S_ISSPARSEDIR(ce->ce_mode)) {
>
> I.e. isn't this undue paranoia about the cache API giving us zero-length
> paths?

Nope, not related at all, for two reasons: the code above was checking
for NULL pointers rather than NUL characters, and the argument I was
checking was last_dirname, not ce->name.
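The difference between the two checks can be shown with a small
standalone sketch (the function classify_name() is hypothetical):
assert(*ce->name) tests whether the first character of a string is NUL,
while the guard in path_found() tests whether the pointer *dirname is
NULL, i.e. whether any directory has been cached yet.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: 0 for a NULL pointer (no string at all),
 * 1 for an empty string (first byte is NUL), 2 otherwise.
 */
static int classify_name(const char *s)
{
	if (!s)
		return 0;	/* what the *dirname check guards against */
	if (!*s)
		return 1;	/* what assert(*ce->name) would catch */
	return 2;
}
```

Before the first directory is cached, last_dirname is NULL, so
classify_name(*dirname) returns 0 even though no cache entry has an
empty name.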

Thread overview: 30+ messages
2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
2022-01-13 23:35   ` Elijah Newren
2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
2022-02-16  8:51     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:02       ` Elijah Newren
2022-01-14 15:59   ` [PATCH v2 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:08       ` Elijah Newren
2022-02-19  1:06     ` Jonathan Nieder
2022-02-19 16:42       ` Elijah Newren
2022-02-19 18:14         ` Jonathan Nieder
2022-02-20  5:28           ` Elijah Newren
2022-02-20 16:56       ` Derrick Stolee
2022-02-22 23:17         ` Jonathan Nieder
2022-01-14 15:59   ` [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
2022-02-16  9:15     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:21       ` Elijah Newren
2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
2022-01-15  1:39     ` Victoria Dye
2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:30       ` Elijah Newren [this message]
2022-02-17  4:40         ` Elijah Newren
2022-01-15  1:51   ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Victoria Dye
