From: David Lang <david@lang.hm>
To: David Turner <dturner@twopensource.com>
Cc: Sebastian Schuberth <sschuberth@gmail.com>,
	Felipe Contreras <felipe.contreras@gmail.com>,
	git@vger.kernel.org
Subject: Re: Watchman support for git
Date: Fri, 9 May 2014 11:08:01 -0700 (PDT)
Message-ID: <alpine.DEB.2.02.1405091103200.5876@nftneq.ynat.uz>
In-Reply-To: <1399655858.11843.119.camel@stross>

On Fri, 9 May 2014, David Turner wrote:

> On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:
>> On Thu, 8 May 2014, Sebastian Schuberth wrote:
>>
>>> On 03.05.2014 05:40, Felipe Contreras wrote:
>>>
>>>>>> That's very interesting. Do you get similar improvements when doing
>>>>>> something similar in Mercurial (watchman vs. no watchman).
>>>>>
>>>>> I have not tried it.  My understanding is that this is why Facebook
>>>>> wrote Watchman and added support for it to Mercurial, so I would assume
>>>>> that the improvements are at least this good.
>>>>
>>>> Yeah, my bet is that they are actually much better (because Mercurial
>>>> can't be as optimized as Git).
>>>>
>>>> I'm interested in this number because if watchman improves Git by 30%
>>>> but improves Mercurial by 100% (a made-up number), then it makes sense
>>>> that you might want it more if you are using hg, but not so much if you
>>>> are using git.
>>>>
>>>> Also, if similar repositories with Mercurial+watchman are actually
>>>> faster than Git+watchman, that means that there's room for improvement
>>>> in your implementation. This is not a big issue at this point of the
>>>> process, just something nice to know.
>>>
>>> The article at [1] has some details, they claim "For our repository, enabling Watchman integration has made Mercurial's status command more than 5x faster than Git's status command".
>>>
>>> [1] https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
>>
>> a lot of that speed comparison is going to depend on your storage system and the
>> size of your repository.
>>
>> if you have a high-end enterprise storage system that tracks metadata very
>> differently from the file contents (I've seen some that have racks' worth of
>> SATA drives for contents and then 'small' arrays of a few dozen flash drives for
>> the metadata), and then you have very large repositories (Facebook has
>> everything in a single repo), then you have a perfect storm where something like
>> watchman that talks the proprietary protocol of the storage array can be FAR
>> faster than anything that needs to operate with the standard POSIX calls.
>>
>> That can easily account for the difference between the Facebook announcement
>> and the results presented for normal disks, which show an improvement but with
>> even stock git still faster than improved Mercurial.
>
> As I recall from Facebook's presentation[1] on this (as well as from the
> discussion on the git mailing list[2]), Facebook's test repository is
> much larger than any known git repository.  In particular, it is larger
> than WebKit.

agreed, it's huge: it's the entire codebase history of every tool that they use,
crammed together in one repo.

> These performance improvements are not for server-side
> tasks, but for client-side (e.g. git/hg status).  Facebook also made
> other improvements for the client-server communication, and for
> log/blame, but these are not relevant to watchman.

well, in their situation they have shared storage that clients use for this huge 
repo, so I don't think they have a clear client/server boundary the way you are 
thinking. Even clients have this huge repo to deal with, and they can do so 
efficiently by querying the storage device rather than trying to walk the tree 
or monitor access directly.
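
A toy Python model may make the tree-walk versus metadata-query contrast above more concrete. It is a sketch under stated assumptions, not how watchman, git, or any storage array actually works: `ChangeJournal` is a hypothetical stand-in for a change log that something like watchman or the storage system maintains, and the full scan detects changes by comparing file size and mtime, roughly as git's index caches stat data. The point is only the cost shape: the walk does one stat() per file, while the journal query touches only the entries recorded since a cursor.

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime) for every file, roughly like cached stat data."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snap[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return snap


def walk_status(root, snap):
    """Full-scan status: one stat() per file, so cost grows with repo size."""
    changed, stat_calls = [], 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            stat_calls += 1
            rel = os.path.relpath(path, root)
            if snap.get(rel) != (st.st_size, st.st_mtime_ns):
                changed.append(rel)
    return sorted(changed), stat_calls


class ChangeJournal:
    """Hypothetical stand-in for a metadata source (watchman, or a storage
    array's own log) that remembers which paths changed, in order."""

    def __init__(self):
        self._entries = []  # entry i has sequence number i + 1

    def cursor(self):
        return len(self._entries)

    def record(self, relpath):
        self._entries.append(relpath)

    def changed_since(self, cursor):
        # Only entries newer than the cursor are touched, not the whole tree.
        return sorted(set(self._entries[cursor:]))


def demo(n_files=1000, n_changed=3):
    with tempfile.TemporaryDirectory() as root:
        for i in range(n_files):
            with open(os.path.join(root, f"f{i:04d}.txt"), "w") as fh:
                fh.write("x")
        snap = snapshot(root)
        journal = ChangeJournal()
        start = journal.cursor()
        for i in range(n_changed):
            rel = f"f{i:04d}.txt"
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("changed")  # different size, so stat data differs
            journal.record(rel)
        walked, stat_calls = walk_status(root, snap)
        queried = journal.changed_since(start)
        return walked, stat_calls, queried


if __name__ == "__main__":
    walked, stat_calls, queried = demo()
    print(walked == queried, stat_calls, len(queried))
```

Both approaches find the same modified files; the difference is that the walk has to stat every file in the tree to do it, which is what dominates once the repository is as large as the one discussed here.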

> It is entirely possible that, as repo size grows, Mercurial with
> watchman is faster than git without.
>
> With my patches, git status isn't constant-time; it's merely a roughly
> constant factor faster. My initial design was to make git status
> constant-time by caching the results of the wt_status_collect calls.

This is what you would have to do with traditional storage. My understanding is 
that the real benefits of watchman show up when you have non-traditional storage 
and can take advantage of the knowledge that the storage system gathers for its 
own use.
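
David Turner's idea of making git status constant-time by caching its results can be illustrated with a similar toy. This is hypothetical throughout: `ChangeJournal`, `CachedStatus`, and the `classify` callback are invented for the sketch, the index and working tree are in-memory dicts, and nothing here mirrors the internals of git's `wt_status_collect` or watchman's query interface. One full scan seeds the cache; after that, each refresh re-examines only the paths the journal reports as changed, so its cost tracks the number of changes (plus the size of the dirty set) rather than the size of the repository.

```python
class ChangeJournal:
    """Hypothetical ordered log of changed paths (stand-in for a
    'what changed since cursor X' query)."""

    def __init__(self):
        self._entries = []

    def cursor(self):
        return len(self._entries)

    def record(self, path):
        self._entries.append(path)

    def changed_since(self, cursor):
        return set(self._entries[cursor:])


class CachedStatus:
    """Caches non-clean paths and refreshes only journaled paths.

    `classify(path)` returns "clean", "modified", "deleted" or "untracked";
    it stands in for the expensive per-file comparison."""

    def __init__(self, journal, all_paths, classify):
        self.journal = journal
        self.classify = classify
        self.calls = 0
        self.dirty = {}
        for path in all_paths:  # the one full scan
            self._update(path)
        self.cursor = journal.cursor()

    def _update(self, path):
        self.calls += 1
        state = self.classify(path)
        if state == "clean":
            self.dirty.pop(path, None)
        else:
            self.dirty[path] = state

    def refresh(self):
        # Work is proportional to journaled changes, not repository size.
        for path in self.journal.changed_since(self.cursor):
            self._update(path)
        self.cursor = self.journal.cursor()
        return dict(self.dirty)


def demo():
    index = {f"src/file{i}.c": "v1" for i in range(10_000)}
    worktree = dict(index)

    def classify(path):
        if path not in worktree:
            return "deleted"
        if path not in index:
            return "untracked"
        return "clean" if worktree[path] == index[path] else "modified"

    journal = ChangeJournal()
    cache = CachedStatus(journal, list(index), classify)
    initial_calls = cache.calls

    worktree["src/file7.c"] = "v2"
    journal.record("src/file7.c")
    del worktree["src/file8.c"]
    journal.record("src/file8.c")
    worktree["notes.txt"] = "new"
    journal.record("notes.txt")

    dirty = cache.refresh()
    after_first = cache.calls
    cache.refresh()  # nothing new in the journal: no per-file work
    return initial_calls, after_first - initial_calls, cache.calls - after_first, dirty


if __name__ == "__main__":
    print(demo())
```

A real implementation would also need to fall back to a full rescan whenever the change source cannot vouch for everything that happened since the cursor, for example after it restarts.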

David Lang

> But there were so many cases with the various options that I got a bit
> lost in the wilderness and made a big mess. Maybe I would do better if I
> tried it again today.  And maybe if I just build on top of the
> untracked-cache code, I would be able to get to constant-time; I'll have
> to try that at some point.
>
> [1] http://www.youtube.com/watch?v=Dlguc63cRXg
> [2] http://comments.gmane.org/gmane.comp.version-control.git/189776