git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Utsav Shah <utsav@dropbox.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Utsav Shah via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Utsav Shah <ukshah2@illinois.edu>
Subject: Re: [PATCH v3 1/1] unpack-trees: skip stat on fsmonitor-valid files
Date: Thu, 7 Nov 2019 19:51:04 -0800	[thread overview]
Message-ID: <CAPYzU3OU45E_OQKOFFohxsqUvmMGXL9NNYNi2i27k3aRZ4oqVg@mail.gmail.com> (raw)
In-Reply-To: <CAPYzU3P6M0s365gOEz360B0QqRN6JKaUB2Zc8+-XnVei0WF6WA@mail.gmail.com>

On Wed, Nov 6, 2019 at 2:33 PM Utsav Shah <utsav@dropbox.com> wrote:
>
> On Wed, Nov 6, 2019 at 2:46 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Utsav Shah via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Utsav Shah <utsav@dropbox.com>
> > >
> > > The index might be aware that a file hasn't modified via fsmonitor, but
> > > unpack-trees did not pay attention to it and checked via ie_match_stat
> > > which can be inefficient on certain filesystems. This significantly slows
> > > down commands that run oneway_merge, like checkout and reset --hard.
> >
> > s/hasn't/& been/;
> >
> > Otherwise, well written.
> >
> > > This patch makes oneway_merge check whether a file is considered
> > > unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees
> > > also now correctly copies over fsmonitor validity state from the source
> > > index. Finally, for correctness, we force a refresh of fsmonitor state in
> > > tweak_fsmonitor.
> >
> > s/This patch makes/Make/; order the person who is updating the code
> > what to do to the codebase in imperative mood.
> >
> > Otherwise, well written.
> >
> > > After this change, commands like stash (that use reset --hard
> > > internally) go from 8s or more to ~2s on a 250k file repository on a
> > > mac.
> > >
> > > Signed-off-by: Utsav Shah <utsav@dropbox.com>
> > > ---
> > >  fsmonitor.c                 | 39 ++++++++++++++++++++++++-------------
> > >  t/t7519-status-fsmonitor.sh |  9 +++++++--
> > >  unpack-trees.c              |  6 +++++-
> > >  3 files changed, 37 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/fsmonitor.c b/fsmonitor.c
> > > index 1f4aa1b150..04d6232531 100644
> > > --- a/fsmonitor.c
> > > +++ b/fsmonitor.c
> > > @@ -18,7 +18,7 @@ static void fsmonitor_ewah_callback(size_t pos, void *is)
> > >
> > >       if (pos >= istate->cache_nr)
> > >               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
> > > -                 (uintmax_t)pos, istate->cache_nr);
> > > +                     (uintmax_t)pos, istate->cache_nr);
> >
> > Unintended whitespace change?
> >
> > > @@ -55,9 +55,10 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
> > >       }
> > >       istate->fsmonitor_dirty = fsmonitor_dirty;
> > >
> > > -     if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > -             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > > -                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > > +     if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > +             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> > > +                     (uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
> > > +
> >
> > The patch disables this sanity check under "split index" mode and it
> > must be done for good reasons, but readers (imagine, somebody found
> > a bug on this line 6 months down the road, ran "git blame" and found
> > this commit and reading it via "git show") would want to know why
> > this change was made.
> >
> > I recall seeing no mention of "split index" in the proposed log
> > message.  Is this a fix for unrelated issue that needs to be
> > explained in a separate patch, perhaps?

Yes. I think this is an important bug to fix, fsmonitor and split
indices don't seem to be interoperating well after these checks have
been added.

> >
> > The hunk also has the unintended whitespace change, it seems.
> >
> > > @@ -83,9 +84,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
> > >       uint32_t ewah_size = 0;
> > >       int fixup = 0;
> > >
> > > -     if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > -             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > > -                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > > +     if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > +             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> > > +                     (uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
> >
> > Likewise (both indentation of the second line and the unexplained
> > change to the sanity check condition we saw above).
> >
> > > @@ -189,13 +190,25 @@ void refresh_fsmonitor(struct index_state *istate)
> > >               }
> > >               if (bol < query_result.len)
> > >                       fsmonitor_refresh_callback(istate, buf + bol);
> > > +
> > > +             if (istate->untracked)
> > > +                     istate->untracked->use_fsmonitor = 1;
> >
> > Unexplained.  We used to do this in tweak_fsmonitor() but now we do
> > this here, as we are making tweak_fsmonitor() to call this function
> > anyway.  If there are other callers that call refresh_fsmonitor()
> > and they did not do this, this would be a behaviour change to them.
> > As there is no explanation why this change is done, readers cannot
> > tell if it is a good change.  If this were explained like so:
> >
> >     Any caller of refresh_fsmonitor() must set use_fsmonitor bit in
> >     istate when istate->untracked exists FOR SUCH AND SUCH REASONS.
> >     Move the code to do so in tweak_fsmonitor() to near the
> >     beginning of refresh_fsmonitor(), which would fix SUCH AND SUCH
> >     other callers that forgets to do this.
> >
> > in the proposed log message, that might help justifying the change.
> >
> > If use_fsmonitor is not set, why is the caller calling
> > refresh_fsmonitor() in the first place, by the way?
> >
> > Isn't it more like "we are told to use fsmonitor via
> > istate->untracked->use_fsmonitor bit being true, so we call
> > refresh_fsmonitor, and if the bit is false, we do not have to bother
> > with fsmonitor and no point calling refresh_fsmonitor"?
> >
> > If a careless caller makes a call to refresh_fsmonitor() when the
> > configuration tells us not to use fsmonitor, wouldn't this cause us
> > to use fsmonitor anyway?  Which sounds bad, so perhaps all callers
> > are careful to first check if use_fsmonitor is set before deciding
> > to call refresh_fsmonitor()---but if that is the case, is there a
> > point in setting it here to true?
> >
> > Puzzled by an unexplained code...

I might be misunderstanding, but wouldn't the if condition above make
sure we don't enter this codepath at all?

if (!core_fsmonitor || istate->fsmonitor_has_run_once)
        return;

istate->untracked->use_fsmonitor is only used in the untracked cache
to skip some lstats AFAICT. So callers refresh_fsmonitor should see no
difference.

I agree that this change deserves a better explanation.

> >
> > >       } else {
> > > +
> > > +             /* We only want to run the post index changed hook if we've actually changed entries, so keep track
> > > +              * if we actually changed entries or not */
> > > +             int is_cache_changed = 0;
> > >               /* Mark all entries invalid */
> > > -             for (i = 0; i < istate->cache_nr; i++)
> > > -                     istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> > > +             for (i = 0; i < istate->cache_nr; i++) {
> > > +                     if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) {
> > > +                             is_cache_changed = 1;
> > > +                             istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> > > +                     }
> > > +             }
> > >
> > >               /* If we're going to check every file, ensure we save the results */
> > > -             istate->cache_changed |= FSMONITOR_CHANGED;
> > > +             if (is_cache_changed)
> > > +                     istate->cache_changed |= FSMONITOR_CHANGED;
> >
> > This part (and a call to refresh_fsmonitor() we see blow) is the
> > "Finally, we force a refresh" explained in the proposed log message,
> > I presume.
> >
> > >               if (istate->untracked)
> > >                       istate->untracked->use_fsmonitor = 0;
> > > @@ -254,12 +267,10 @@ void tweak_fsmonitor(struct index_state *istate)
> > >                       /* Mark all previously saved entries as dirty */
> > >                       if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > >                               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > > -                                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > > +                                     (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> >
> > This shares the same indentation issue but does not disable the
> > sanity check for split index case.  Intended?  Without explanation
> > in the proposed log message, readers cannot tell.

Yes, will add the sanity check here too.

> >
> > >                       ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
> > >
> > > -                     /* Now mark the untracked cache for fsmonitor usage */
> > > -                     if (istate->untracked)
> > > -                             istate->untracked->use_fsmonitor = 1;
> > > +                     refresh_fsmonitor(istate);
> > >               }
> > >
> > >               ewah_free(istate->fsmonitor_dirty);
> > > diff --git a/unpack-trees.c b/unpack-trees.c
> > > index 33ea7810d8..fc5ceb932c 100644
> > > --- a/unpack-trees.c
> > > +++ b/unpack-trees.c
> > > @@ -1504,6 +1504,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > >       o->merge_size = len;
> > >       mark_all_ce_unused(o->src_index);
> > >
> > > +     if (o->src_index->fsmonitor_last_update)
> > > +             o->result.fsmonitor_last_update = o->src_index->fsmonitor_last_update;
> > > +
> >
> > This is the "correctly copies" part, which was well explained.
> >
> > >       /*
> > >        * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
> > >        */
> > > @@ -2384,7 +2387,8 @@ int oneway_merge(const struct cache_entry * const *src,
> > >
> > >       if (old && same(old, a)) {
> > >               int update = 0;
> > > -             if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old)) {
> > > +             if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
> > > +                     !(old->ce_flags & CE_FSMONITOR_VALID)) {
> >
> > This is the "skip when we know it is valid" part, which was well
> > explained.
> >
> > >                       struct stat st;
> > >                       if (lstat(old->name, &st) ||
> > >                           ie_match_stat(o->src_index, old, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE))
> >
> > Thanks.
>
> Thanks for the feedback. The lines with the indentation changes had
> tabs and spaces mixed up, but I'll revert those changes.

I will try to figure out how to use gitgitgadget to send a smaller
patch within this thread to fix the fsmonitor and split index
interactions. I tried writing a test today that uses
t7519/fsmonitor-watchman to simulate the bug, but it is flaky.

  reply	other threads:[~2019-11-08  3:51 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-25 15:23 [PATCH 0/1] unpack-trees: skip lstat on files based on fsmonitor Utsav Shah via GitGitGadget
2019-10-25 15:23 ` [PATCH 1/1] unpack-trees: skip lstat " Utsav Shah via GitGitGadget
2019-10-28  3:37   ` Junio C Hamano
2019-10-28  6:39     ` Utsav Shah
2019-10-28 19:23       ` Kevin Willford
2019-10-29 19:06         ` Utsav Shah
2019-10-29 20:12           ` Kevin Willford
2019-10-29 23:50             ` Utsav Shah
2019-10-30  0:21               ` Junio C Hamano
2019-10-30 16:41                 ` Utsav Shah
2019-11-04  6:02                   ` Junio C Hamano
2019-11-05 15:27 ` [PATCH v2 0/1] unpack-trees: skip stat on fsmonitor-valid files Utsav Shah via GitGitGadget
2019-11-05 15:27   ` [PATCH v2 1/1] " Utsav Shah via GitGitGadget
2019-11-05 21:40     ` Kevin Willford
2019-11-06  4:36       ` Utsav Shah
2019-11-06 17:24         ` Kevin Willford
2019-11-06  4:54   ` [PATCH v3 0/1] " Utsav Shah via GitGitGadget
2019-11-06  4:54     ` [PATCH v3 1/1] " Utsav Shah via GitGitGadget
2019-11-06 10:46       ` Junio C Hamano
2019-11-06 22:33         ` Utsav Shah
2019-11-08  3:51           ` Utsav Shah [this message]
2019-11-08  4:11             ` Junio C Hamano
2019-11-06 10:16     ` [PATCH v3 0/1] " Junio C Hamano
2019-11-20  8:32     ` [PATCH v4 " Utsav Shah via GitGitGadget
2019-11-20  8:32       ` [PATCH v4 1/1] " Utsav Shah via GitGitGadget
2019-11-21  4:15         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPYzU3OU45E_OQKOFFohxsqUvmMGXL9NNYNi2i27k3aRZ4oqVg@mail.gmail.com \
    --to=utsav@dropbox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=ukshah2@illinois.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).