From: David Turner <dturner@twopensource.com>
To: Duy Nguyen <pclouds@gmail.com>, git mailing list <git@vger.kernel.org>
Subject: Re: Watchman support for git
Date: Sat, 03 May 2014 16:49:09 -0400
Message-ID: <1399150149.5310.47.camel@stross>
In-Reply-To: <CACsJy8B1Q3WEPT+nzDDwS5f7Wx+u5CHfN9JppRHv5VEx5NTxSw@mail.gmail.com>

On Sat, 2014-05-03 at 15:49 +0700, Duy Nguyen wrote:
> On Sat, May 3, 2014 at 11:39 AM, David Turner <dturner@twopensource.com> wrote:
> >> Index v4 and split index (and the following read-cache daemon,
> >> hopefully)
> >
> > Looking at some of the archives for read-cache daemon, it seems to be
> > somewhat similar to watchman, right?  But I only saw inotify code; what
> > about Mac OS?  Or am I misunderstanding what it is?
> 
> It's mentioned in [1], the second paragraph, mostly to hide index I/O
> read cost and the SHA-1 hashing cost in the background. In theory it
> should work on all platforms that support multiple processes and
> efficient IPC. It can help load watchman file cache faster too.

Yes, that seems like a good idea.

I actually wrote some of a more-complicated, weirder version of this
idea.  In my version, there was a long-running git daemon process that
held the index, the watchman file cache, and also objects loaded from
the object database.  Other git commands would then send their
command-line and arguments over to the daemon, which would run the
commands and send stdin/out/err back.  Of course, this is complicated
because git commands are designed to run then exit, so they often rely
on variables being initialized to zero, or fail to free memory.  I used
the Boehm GC to handle the memory freeing problem.  To handle variables
that needed to be reinitialized, I used __attribute__(section...) to put
them all into one section, which I could save on daemon startup and
restore after each command.  I also replaced calls to exit() with a
function that called longjmp() so the daemon could survive commands
failing.  Had I continued, I would also have had to track open file
descriptors to avoid leaking those.
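
A minimal sketch of the save/restore and exit()-replacement ideas
described above. It is not the actual daemon code. It assumes GCC or
Clang on an ELF platform (e.g. Linux) with GNU ld or lld, which define
__start_<name>/__stop_<name> symbols for a section whose name is a
valid C identifier. Every name here (RESETTABLE, daemon_init,
daemon_run, command_exit, fake_command) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker: put per-command globals into one named section. */
#define RESETTABLE __attribute__((section("resettable")))

/* Section bounds, provided by the linker (assumption: GNU ld or lld, ELF). */
extern char __start_resettable[];
extern char __stop_resettable[];

/* Globals that a run-then-exit command expects to start at zero. */
RESETTABLE int call_count = 0;
RESETTABLE int verbose = 0;

/* Daemon-side state; deliberately NOT in the resettable section. */
static char *snapshot;
static size_t snapshot_len;
static jmp_buf command_exit_env;
static int command_exit_code;
int last_seen_count;

/* Stand-in for exit(): unwind to the daemon instead of ending the process. */
static void command_exit(int code)
{
	command_exit_code = code;
	longjmp(command_exit_env, 1);
}

/* Toy "command" that dirties its globals and may "exit" early. */
static void fake_command(int should_fail)
{
	call_count++;
	verbose = 1;
	last_seen_count = call_count;
	if (should_fail)
		command_exit(128);
}

/* Save the pristine contents of the section once, at daemon startup. */
int daemon_init(void)
{
	snapshot_len = (size_t)(__stop_resettable - __start_resettable);
	snapshot = malloc(snapshot_len);
	if (!snapshot)
		return -1;
	memcpy(snapshot, __start_resettable, snapshot_len);
	return 0;
}

/* Run one command, survive its "exit", then restore the globals. */
int daemon_run(int should_fail)
{
	assert(snapshot != NULL);	/* daemon_init() must have succeeded */
	command_exit_code = 0;
	if (setjmp(command_exit_env) == 0)
		fake_command(should_fail);
	memcpy(__start_resettable, snapshot, snapshot_len);
	/* Make the compiler reload globals it may have cached in registers. */
	__asm__ volatile("" ::: "memory");
	return command_exit_code;
}
```

Calling daemon_init() once and then daemon_run() repeatedly lets each
command see call_count start at zero again, even when the previous
command bailed out through command_exit().  The sketch does nothing
about leaked memory or open file descriptors.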

This was a giant mess that only sort of worked: it was difficult to
track down all of the variables that needed to be reinitialized.

The advantages of my method are that there was somewhat less data to
marshal over IPC, and that objects could be easily cached; the
disadvantages are complexity, massive code changes, and the fact that it
didn't fully work by the time I ran out of time.

So I'm really looking forward to trying your version!

> >> The last line could be a competition between watchman and my coming
> >> "untracked cache" series. I expect to cut the number in that line at
> >> least in half without external dependency.
> >
> > I hadn't seen the "untracked cache" work (I actually finished these
> > patches a month or so ago but have been waiting for some internal
> > reviews before sending them out).  Looks interesting.  It seems we use a
> > similar strategy for handling ignores.
> 
> Yep, mostly the same at the core, except that I exploit directory
> mtime while you use inotify. Each approach has its own pros and cons,
> I think. Both should face the same traps in caching (e.g. if you "git
> rm --cached" a file, that file could become either untracked, or
> ignored).
> 
> >> Patch 2/3 did not seem to make it to the list by the way..
> >
> > Thanks for your comments.  I just tried again to send patch 2/3.  I do
> > actually see the CC of it in my @twitter.com mailbox, but I don't see it
> > in the archives on the web.  Do you know if there is a reason the
> > mailing list would reject it?
> 
> Probably its size, 131K, which is also an indicator to split it (and
> the third patch) into smaller patches if you want to merge this
> feature in master eventually.

I would like to merge the feature into master.  It works well for me
and for some of my colleagues who have tried it out.

I can split the vmac patch into two, but one of them will remain quite
large because it contains the code for VMAC and AES, which total a bit
over 100k.  Since the list will probably reject that, I'll post a link
to a repository containing the patches.

I'm not 100% sure how to split the watchman patch up.  I could add the
fs_cache code and then separately add the watchman code that populates
the cache.  Do you think there is a need to divide it up beyond this?
