git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: David Turner <dturner@twopensource.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Watchman support for git
Date: Wed, 14 May 2014 17:36:40 +0700	[thread overview]
Message-ID: <CACsJy8C49EDwjtv_L2pTRy3XbPptp7+nNzT=Jp4BaH_TOZtvnQ@mail.gmail.com> (raw)
In-Reply-To: <1400024691.14179.40.camel@stross>

On Wed, May 14, 2014 at 6:44 AM, David Turner <dturner@twopensource.com> wrote:
>> I'm not so happy that git now needs to link to libjansson.so and
>> libwatchman.so. I would load libwatchman.so at runtime only when
>> core.usewatchman is on, but this is more of personal taste.
>
> I assume you mean something with dlopen? I don't really like that because
> (a) nothing else in git works that way and
> (b) you would still need the libwatchman headers to build git (or the
> structs could be copied, but that is ugly).

And because we don't link to libdl.so anyway, using dlopen means
trading libjansson.so and libwatchman.so for libdl.so. And it's
uglier. So forget what I wrote.

>>  With that in
>> mind, I think you don't need to keep a fs cache on disk at all. All
>> you need to store (in the index) is the clock value from watchman.
>> After we parse the index, we perform a "since" query to get path names
>> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit
>> on entries that are _not_ in the query result. The remaining items
>> will be lstat'd by git (see [1] and read-cache.c changes on the next
>> few patches). Assuming the number of updated files is reasonably
>> small, we won't  be punished by lookup time.
>
> I considered this, but rejected it for a few reasons:
> 1. We still need to know about the untracked files.  I guess we could
> do an index extension for just that, like your untracked cache.

Yes. But consider that the number of untracked files is usually small
(otherwise 'git status' would look very messy). And your fscache would
need to store excluded file list too, which could be a lot bigger (one
pair of  .[ch] -> one .o). _But_ yours would make 'git status
--ignored' work. I don't consider that a major use case for
optimization, but people may have different opinions.

> 2. That doesn't eliminate opendir/readdir, I think.  Those are a major
> cost. I did not thoroughly review your patches from January/February,
> but I didn't notice anything in there about this part of dir.c.
> Also the cost of open(nonexistent .gitignore) to do ignore processing.

Assuming untracked cache is in use, opendir/readdir is replaced with
lstat. And cheap lstat can be solved with watchman without storing
anything extra. I solve open(.gitignore) too by checking its stat data
with the one in index. If matches, I reuse the version in tree. This
does not necessarily make it cheaper, but it increases locality so it
might be. _Modified_ .gitignore files have to go through
open(.gitignore), but people don't update .gitignore often.

> 3. Changing a file in the index (e.g. git add) requires clearing
> the CE_VALID bit; this means that third-party libraries (e.g. jgit)
> that change the git repo need to understand this extension for correct
> operation.  Maybe that's just the nature of extensions, but it's not
> something that my present patch set requires.

I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
Only after loading and validating against watchman that I turn
CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.

I assume you won't change your mind about this. Which is fine to me. I
will still try out my approach with your libwatchman though. Just
curious about its performance and complexity, compared to your
approach.

A bit off topic, but msys fork has another "fscache" in compat/win32.
If you could come up with a different name, maybe it'll be less
confusing for them after merging. But this is not a big deal, as this
fscache won't work on windows anyway.
-- 
Duy

  reply	other threads:[~2014-05-14 10:37 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-02 23:14 Watchman support for git dturner
2014-05-02 23:14 ` [PATCH 1/3] After chdir to run grep, return to old directory dturner
2014-05-06 22:24   ` Junio C Hamano
2014-05-07  0:06     ` David Turner
2014-05-07  3:00       ` Jeff King
2014-05-07  3:33         ` David Turner
2014-05-07 17:42           ` Junio C Hamano
2014-05-07 20:57             ` David Turner
2014-05-02 23:14 ` [PATCH 3/3] Watchman support dturner
2014-05-02 23:20 ` Watchman support for git Felipe Contreras
2014-05-03  2:24   ` David Turner
2014-05-03  3:40     ` Felipe Contreras
2014-05-05 18:08       ` David Turner
2014-05-05 18:14         ` Felipe Contreras
2014-05-08 19:17       ` Sebastian Schuberth
2014-05-09  7:08         ` David Lang
2014-05-09 17:17           ` David Turner
2014-05-09 18:08             ` David Lang
2014-05-09 18:17               ` David Turner
2014-05-09 18:27                 ` David Lang
2014-05-09 18:47                   ` David Turner
2014-05-03  0:52 ` Duy Nguyen
2014-05-03  4:39   ` David Turner
2014-05-03  8:49     ` Duy Nguyen
2014-05-03 20:49       ` David Turner
2014-05-04  0:15         ` Duy Nguyen
2014-05-06  3:13           ` David Turner
2014-05-06  0:26   ` Duy Nguyen
2014-05-06  0:30     ` Duy Nguyen
2014-05-10  5:26 ` Duy Nguyen
2014-05-10 18:38   ` David Turner
2014-05-11  0:21     ` Duy Nguyen
2014-05-11 22:56       ` David Turner
2014-05-12 10:45         ` Duy Nguyen
2014-05-13 22:38           ` David Turner
2014-05-13 22:54             ` Duy Nguyen
2014-05-13 23:19               ` David Turner
2014-05-10  8:16 ` Duy Nguyen
2014-05-13 23:44   ` David Turner
2014-05-14 10:36     ` Duy Nguyen [this message]
2014-05-14 10:52       ` Duy Nguyen
2014-05-15 19:42       ` David Turner
2014-05-19 10:10         ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8C49EDwjtv_L2pTRy3XbPptp7+nNzT=Jp4BaH_TOZtvnQ@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).