git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	benpeart@microsoft.com,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	David.Turner@twosigma.com, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
Date: Thu, 25 May 2017 09:49:20 -0400	[thread overview]
Message-ID: <52d8376d-63ad-14f8-fbd5-3540a4926e92@gmail.com> (raw)
In-Reply-To: <CACBZZX6LENwuVuTyU-RetaXz8jZtUp1dv+gmQQ281sPx1czPnA@mail.gmail.com>



On 5/22/2017 1:28 PM, Ævar Arnfjörð Bjarmason wrote:
> On Mon, May 22, 2017 at 6:18 PM, Ben Peart <peartben@gmail.com> wrote:
>> On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> +== File System Monitor cache
>>>> +
>>>> +  The file system monitor cache tracks files for which the
>>>> query-fsmonitor
>>>> +  hook has told us about changes.  The signature for this extension is
>>>> +  { 'F', 'S', 'M', 'N' }.
>>>> +
>>>> +  The extension starts with
>>>> +
>>>> +  - 32-bit version number: the current supported version is 1.
>>>> +
>>>> +  - 64-bit time: the extension data reflects all changes through the
>>>> given
>>>> +       time which is stored as the seconds elapsed since midnight,
>>>> January 1, 1970.
>>>> +
>>>> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
>>>> +
>>>> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
>>>> +    is CE_FSMONITOR_DIRTY.
>>>
>>>
>>> We already have a uint64_t in one place in the codebase (getnanotime)
>>> which uses a 64 bit time for nanosecond accuracy, and numerous
>>> filesystems already support nanosecond timestamps (ext4, that new
>>> Apple thingy...).
>>>
>>> I don't know if any of the inotify/fsmonitor APIs support that yet,
>>> but it seems inevitable that that'll be added if not, in some
>>> pathological cases we can have a lot of files modified in 1 second, so
>>> using nanosecond accuracy means there'll be a lot less data to
>>> consider in some cases.
>>>
>>> It does mean this'll only work until the year ~2500, but that seems
>>> like an acceptable trade-off.
>>>
>>
>> I really don't think nano-second resolution is needed in this case for a few
>> reasons.
>>
>> The number of files that can change within a given second is limited by the
>> IO throughput of the underlying device. Even assuming a very fast device and
>> very small files and changes, this won't be that many files.
>>
>> Without this patch, git would have scanned all those files every time. With
>> this patch, git will only scan those files a 2nd time that are modified in
>> the same second that it did the first scan *that came before the first scan
>> started* (the "lots of files modified" section in the 1 second timeline
>> below).
>>
>> |------------------------- one second ---------------------|
>> |-lots of files modified - git status - more file modified-|
>>
>> Yes, some duplicate status checks can be made but its still a significant
>> win in any reasonable scenario. Especially when you consider that it is
>> pretty unusual to do git status/add/commit calls in the middle of making
>> lots of changes to files.
> 
> I understand that we get most of the wins either way.
> 
> I'm just wondering why not make it nanosecond-resolution now from the
> beginning since that's where major filesystems are heading already,
> and changing stuff like this can be a hassle once it's initially out
> there, whereas just dividing by 10^9 for APIs that need seconds from
> the beginning is easy & future-proof.
> 
> There are some uses of git where this would probably matter in practice.
> 
> E.g. consider a git-annex backed fileserver using ext4, currently
> git-annex comes with its own FS watcher as a daemon invoked via `git
> annex watch` to constantly add new files without killing performance
> via a status/add in a loop, with this a daemon like that could just
> run status/add in a loop, but on a busy repo the 1s interval size
> might start to matter as you're constantly inspecting larger
> intervals.
> 
> More importantly though, I can't think of any case where having it in
> nanoseconds to begin with would do any harm.

You're right, it's not hard to support nano second resolution and it 
doesn't do any harm.  I switch the index format and interface as I don't 
expect this code will still be running when the timer rolls over. 
Someone long after me will have to fix it if it is. :)

> 
>> In addition, the backing file system monitor (Watchman) supports number of
>> seconds since the unix epoch (unix time_t style).  This means any support of
>> nano seconds by git is academic until someone provides a file system watcher
>> that does support nano second granularity.
> 
> I haven't used watchman for anything non-trivial, but the
> documentation for the query API you seem to be using says you can
> request a {ctime,mtime}_ns field:
> 
> https://github.com/facebook/watchman/blob/master/website/_docs/cmd.query.markdown#user-content-available-fields
> 
> And isn't this the documentation for the "since" query you're using,
> saying you can specify any arbitrary fs timing field, such as a _ns
> time field:
> 
> https://github.com/facebook/watchman/blob/master/website/_docs/expr.since.md
> 
> ?

To keep the index extension and hook interface generic, I have not 
adopted the Watchman specific clock format.  This enables anyone to 
provide a different file system watcher that can inter-operate as nano 
seconds since epoc is easy for anyone to support.

> 
>> Finally, the fsmonitor index extension is versioned so that we can
>> seamlessly upgrade to nano second resolution later if we desire.
> 
> Aside from my nanosecond bikeshedding, let's say we change the
> semantics of any of this in the future: The index has the version, but
> there's one argument passed to the hook without a version. Is the hook
> expected to potentially be reading the version from the index to make
> sense of whether (in this case) the argument is a mtime or mtime_ns?
> 
> Wouldn't this make more sense than that on top, i.e. pass the version
> to the hook, untested (and probably whines about the sprintf
> format...):

It's a good point that the index extension is versioned and the hook 
interface is not.  I've not seen versioning in any hook interface to 
date but its never to late to start.  I'll add a separate version to the 
hook interface as well so they can be updated independently if needed.

> 
> $ git diff -U1
> diff --git a/cache.h b/cache.h
> index 6eafd34fff..3c63f179f8 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -346,2 +346,3 @@ struct index_state {
>          struct untracked_cache *untracked;
> +       uint32_t fsmonitor_version;
>          time_t fsmonitor_last_update;
> diff --git a/fsmonitor.c b/fsmonitor.c
> index f5a9f818e8..7236b538ac 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -60,2 +60,4 @@ int read_fsmonitor_extension(struct index_state
> *istate, const void *data,
>                  return error("bad fsmonitor version %d", hdr_version);
> +       else
> +               istate->fsmonitor_version = hdr_version;
> 
> @@ -137,2 +139,3 @@ static int query_fsmonitor(time_t last_update,
> struct strbuf *query_result)
>          struct child_process cp = CHILD_PROCESS_INIT;
> +       char version[1];
>          char date[64];
> @@ -143,2 +146,3 @@ static int query_fsmonitor(time_t last_update,
> struct strbuf *query_result)
> 
> +       snprintf(version, sizeof(version), "%d", istate->fsmonitor_version);
>          snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update)
> 

  reply	other threads:[~2017-05-25 13:49 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
2017-05-18 20:13 ` [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-05-19 15:33   ` Ben Peart
2017-05-20 10:41     ` Junio C Hamano
2017-05-24 12:30   ` Christian Couder
2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-05-20 16:55   ` Torsten Bögershausen
2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
2017-05-20 11:28   ` Junio C Hamano
2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
2017-05-22 16:18     ` Ben Peart
2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason
2017-05-25 13:49         ` Ben Peart [this message]
2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
2017-05-24 13:12   ` Christian Couder
2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
2017-05-26 16:02       ` Ben Peart
2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
2017-05-25 13:55   ` Ben Peart
2017-05-27  6:57     ` Christian Couder
2017-05-30 18:05       ` Ben Peart
2017-05-30 20:33         ` Christian Couder
2017-05-30 23:11           ` Ben Peart
2017-05-31  7:37             ` Christian Couder
2017-05-31  7:59     ` Christian Couder
2017-05-31 13:37       ` Ben Peart
2017-05-31 14:10         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52d8376d-63ad-14f8-fbd5-3540a4926e92@gmail.com \
    --to=peartben@gmail.com \
    --cc=David.Turner@twosigma.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).