git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git <git@vger.kernel.org>
Cc: Ben Peart <benpeart@microsoft.com>,
	Alex Vandiver <alexmv@dropbox.com>,
	Christian Couder <christian.couder@gmail.com>,
	David Turner <dturner@twopensource.com>
Subject: Re: Some rough edges of core.fsmonitor
Date: Tue, 30 Jan 2018 17:57:18 -0500	[thread overview]
Message-ID: <26b9d795-b2b3-905d-c67a-83bc4e976894@gmail.com> (raw)
In-Reply-To: <87efmcw3fa.fsf@evledraar.gmail.com>

While some of these issues have been discussed in other threads, I 
thought I'd summarize my thoughts here.

On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:
> I just got around to testing this since it landed, for context some
> previous poking of mine in [1].
> 
> Issues / stuff I've noticed:
> 
> 1) We end up invalidating the untracked cache because stuff in .git/
> changed. For example:
> 
>      01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>      01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>      01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>      01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>      01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
> 
> Am I missing something or should we do something like:
> 
>      diff --git a/fsmonitor.c b/fsmonitor.c
>      index 0af7c4edba..5067b89bda 100644
>      --- a/fsmonitor.c
>      +++ b/fsmonitor.c
>      @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
> 
>       static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>       {
>      -       int pos = index_name_pos(istate, name, strlen(name));
>      +       int pos;
>      +
>      +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>      +               return;
>      +
>      +       pos = index_name_pos(istate, name, strlen(name));
> 
>              if (pos >= 0) {
>                      struct cache_entry *ce = istate->cache[pos];
> 
> With that patch applied status on a large repo[2] goes from a consistent
> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC
> structure
> 

I favor making this optimization by updating 
untracked_cache_invalidate_path() so that it ignores paths under 
get_git_dir() and doesn't invalidate the untracked cache or flag the 
index as dirty.

> 2) We re-write out the index even though we know nothing changed
> 
> When you first run with core.fsmonitor it needs to
> mark_fsmonitor_clean() for every path, but is there a reason for why we
> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
> marked and we know from the hook that nothing changed? Why write out the
> index again?
> 

Writing out the index when core.fsmonitor is first turned on is 
necessary to get the index extension added with the current state of the 
dirty flags.  Given it is a one time cost, I don't think we have 
anything worth trying to optimize here.

> 3) A lot of time spend reading the index (or something..)
> 
> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
> status as a whole takes much longer. gprof reveals:
> 
>      Each sample counts as 0.01 seconds.
>        %   cumulative   self              self     total
>       time   seconds   seconds    calls  ms/call  ms/call  name
>       15.38      0.02     0.02   221690     0.00     0.00  memihash
>       15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>        7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>        7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>        7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>        7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>        7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>        7.69      0.10     0.01        1    10.00    10.00  read_one
>        7.69      0.11     0.01        1    10.00    10.00  refresh_index
>        7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>        7.69      0.13     0.01                             preload_thread
> 
> The index is 24M in this case, I guess it's unpacking it, but I wonder
> if this couldn't be much faster if we saved away the result of the last
> "status" in something that's quick to access, and then if nothing
> changed we just report that, and no need to re-write the index (or just
> write the "it was clean at this time" part).

Yes, reading the index is slow.  We've made some improvements (not 
computing the SHA, not validating the sort order, etc) and have one more 
in progress that will reduce the malloc() cost.  I haven't found any 
other easy optimizations but it would be great if you could find more! 
To make significant improvements, I'm afraid it will take more 
substantial changes to the in memory and on disk formats and updates to 
the code to take advantage of those changes.

> 
> 4) core.fsmonitor=false behaves unexpectedly
> 
> The code that reads this variable just treats it as a string, so we do a
> bunch of work for nothing (and nothing warns) if this is set and 'false'
> is executed. Any reason we couldn't do our standard boolean parsing
> here? You couldn't call your hook 0/1/true/false, but that doesn't seem
> like a big loss.
> 
> 1. https://public-inbox.org/git/CACBZZX5a6Op7dH_g9WOFBnejh2zgNK4b34ygxA8daNDqvitFVA@mail.gmail.com/
> 2. https://github.com/avar/2015-04-03-1M-git
> 

I'm torn on this one.  The core.fsmontior setting isn't a boolean value, 
its a string that is the command to run when we need file system 
changes.  It would be pretty simple to add a call to 
git_parse_maybe_bool_text() to treat "false," "no," or "off" the same as 
an empty string but that makes it look even more like a boolean when it 
isn't.

  parent reply	other threads:[~2018-01-30 22:57 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27  1:36 ` Duy Nguyen
2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
2018-01-27 11:58     ` Thomas Gummerer
2018-01-27 12:27       ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27 12:39     ` Duy Nguyen
2018-01-27 13:09       ` Duy Nguyen
2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
2018-01-30 22:41           ` Ben Peart
2018-01-29  9:40     ` Duy Nguyen
2018-01-29 23:16       ` Ben Peart
2018-02-01 10:40         ` Duy Nguyen
2018-01-28 20:44 ` Johannes Schindelin
2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
2018-01-30  1:21     ` Ben Peart
2018-01-31 10:15       ` Duy Nguyen
2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
2018-02-05 17:44           ` Ben Peart
2018-02-06 12:02             ` Duy Nguyen
2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
2018-02-07 16:59               ` Ben Peart
2018-02-13 10:00                 ` Duy Nguyen
2018-02-13 17:57                   ` Junio C Hamano
2018-02-14  1:24                     ` Duy Nguyen
2018-02-14  8:00                       ` Junio C Hamano
2018-01-30 22:57 ` Ben Peart [this message]
2018-01-30 23:16   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-31 16:12     ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=26b9d795-b2b3-905d-c67a-83bc4e976894@gmail.com \
    --to=peartben@gmail.com \
    --cc=alexmv@dropbox.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=christian.couder@gmail.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).