* [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
@ 2013-03-10 20:17 Ramkumar Ramachandra
  2013-03-11 17:05 ` Heiko Voigt
  2013-03-12 23:21 ` Karsten Blees
  0 siblings, 2 replies; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-03-10 20:17 UTC (permalink / raw)
  To: Git List
  Cc: Duy Nguyen, Junio C Hamano, Torsten Bögershausen,
	Robert Zeh, Jeff King, Erik Faye-Lund, Karsten Blees,
	Drew Northup

git operations are slow on repositories with lots of files, and lots
of tiny filesystem calls like lstat(), getdents(), open() are
responsible for this.  On the linux-2.6 repository, for instance, the
numbers for "git status" look like this:

  top syscalls sorted     top syscalls sorted
  by acc. time            by number
  ----------------------------------------------
  0.401906 40950 lstat    0.401906 40950 lstat
  0.190484 5343 getdents  0.150055 5374 open
  0.150055 5374 open      0.190484 5343 getdents
  0.074843 2806 close     0.074843 2806 close
  0.003216 157 read       0.003216 157 read

To solve this problem, we propose to build a daemon which will watch
the filesystem using inotify and report batched up events over a UNIX
socket.  Since inotify is Linux-only, we have to leave open the
possibility of writing similar daemons for other platforms.
Everything will continue to work as before if there is no helper
present.

The fswatch API introduces a generic way for git.git to request
information about filesystem changes.  Different helpers (like the
inotify daemon on Linux) will be plugged into this API on different
platforms.  When no helper is available, it falls back to making the
filesystem calls itself.

The daemon will start up with the very first operation done on the git
repository, and will die after a specified period of repository
inactivity.  It is going to be a per-repo daemon and will write to a
socket in the repository: access control is managed by filesystem
permissions.

This design is inspired by the credential helper design.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Documentation/technical/api-fswatch.txt | 62 +++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 Documentation/technical/api-fswatch.txt

diff --git a/Documentation/technical/api-fswatch.txt b/Documentation/technical/api-fswatch.txt
new file mode 100644
index 0000000..9c6826a
--- /dev/null
+++ b/Documentation/technical/api-fswatch.txt
@@ -0,0 +1,62 @@
+fswatch API
+===========
+
+The fswatch API provides an abstracted way of collecting information
+about filesystem changes.  An fswatch helper is typically a daemon which
+uses inotify to watch the filesystem, and this information is used by
+git instead of making expensive system calls like lstat() and open().
+
+Typical setup
+-------------
+
+------------
++-----------------------+
+| Git code (C)          |--- requires information about fs changes
+|.......................|
+| C fswatch API         |--- system calls ---> filesystem
++-----------------------+
+     ^             |
+     | UNIX socket |
+     |             v
++-----------------------+
+| Git fswatch helper    |--- daemon inotify-watching ---> filesystem
++-----------------------+
+------------
+
+The Git code will call the C API to obtain changes in filesystem
+information.  The API will itself call a configured helper (e.g. "git
+fswatch-notify") which may report filesystem changes, if the helper
+daemon was started in a previous invocation.  If the daemon is
+not already running, it is started, and the C API will fall back to
+making expensive system calls.
+
+C API
+-----
+
+The fswatch C API is meant to be called by Git code which needs
+information about filesystem changes.  It is centered around an
+object representing the changes to the filesystem since the last
+invocation.
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+`struct fschanges`::
+
+	TODO
+
+
+Functions
+~~~~~~~~~
+
+TODO
+
+Example
+~~~~~~~
+
+TODO
+
+fswatch Helpers
+---------------
+
+TODO
-- 
1.8.1.5

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-10 20:17 [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline Ramkumar Ramachandra
@ 2013-03-11 17:05 ` Heiko Voigt
  2013-03-12  9:43   ` Ramkumar Ramachandra
  2013-03-12 23:21 ` Karsten Blees
  1 sibling, 1 reply; 17+ messages in thread
From: Heiko Voigt @ 2013-03-11 17:05 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git List, Duy Nguyen, Junio C Hamano, Torsten Bögershausen,
	Robert Zeh, Jeff King, Erik Faye-Lund, Karsten Blees,
	Drew Northup

On Mon, Mar 11, 2013 at 01:47:03AM +0530, Ramkumar Ramachandra wrote:
> git operations are slow on repositories with lots of files, and lots
> of tiny filesystem calls like lstat(), getdents(), open() are
> responsible for this.  On the linux-2.6 repository, for instance, the
> numbers for "git status" look like this:
> 
>   top syscalls sorted     top syscalls sorted
>   by acc. time            by number
>   ----------------------------------------------
>   0.401906 40950 lstat    0.401906 40950 lstat
>   0.190484 5343 getdents  0.150055 5374 open
>   0.150055 5374 open      0.190484 5343 getdents
>   0.074843 2806 close     0.074843 2806 close
>   0.003216 157 read       0.003216 157 read
> 
> To solve this problem, we propose to build a daemon which will watch
> the filesystem using inotify and report batched up events over a UNIX
> socket.  Since inotify is Linux-only, we have to leave open the
> possibility of writing similar daemons for other platforms.
> Everything will continue to work as before if there is no helper
> present.

While we are talking about platform independence: how about Windows? AFAIK
there are no file-based sockets there. How about using shared memory, which
is available, instead? It would greatly reduce the porting effort.

Since operating on a lot of files is especially expensive on Windows, it
is one of the platforms that would profit the most from such a daemon.

Cheers Heiko

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-11 17:05 ` Heiko Voigt
@ 2013-03-12  9:43   ` Ramkumar Ramachandra
  2013-03-12  9:50     ` Erik Faye-Lund
  2013-03-12  9:55     ` Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-03-12  9:43 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Git List, Duy Nguyen, Junio C Hamano, Torsten Bögershausen,
	Robert Zeh, Jeff King, Erik Faye-Lund, Karsten Blees,
	Drew Northup

Heiko Voigt wrote:
> While talking about platform independence. How about Windows? AFAIK
> there are no file based sockets. How about using shared memory, that's
> available, instead? It would greatly reduce the needed porting effort.

What about the git credential helper: it uses UNIX sockets, no?  How
does git-credential-winstore [1] work?

[1]: https://github.com/anurse/git-credential-winstore

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-12  9:43   ` Ramkumar Ramachandra
@ 2013-03-12  9:50     ` Erik Faye-Lund
  2013-03-12  9:55     ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Erik Faye-Lund @ 2013-03-12  9:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Heiko Voigt, Git List, Duy Nguyen, Junio C Hamano,
	Torsten Bögershausen, Robert Zeh, Jeff King, Karsten Blees,
	Drew Northup

On Tue, Mar 12, 2013 at 10:43 AM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
> Heiko Voigt wrote:
>> While talking about platform independence. How about Windows? AFAIK
>> there are no file based sockets. How about using shared memory, that's
>> available, instead? It would greatly reduce the needed porting effort.
>
> What about the git credential helper: it uses UNIX sockets, no?  How
> does git-credential-winstore [1] work?
>
> [1]: https://github.com/anurse/git-credential-winstore

First, we have a proper credential helper for Windows in
contrib/credential/wincred these days. As the one who wrote it, I can
say that it communicates using stdin/stdout. The credential helper
doesn't maintain state itself; the Windows Credential Manager does. I
suspect git-credential-winstore works the same way.

As for Windows support, AFAIK there is no support for Unix domain
sockets in Windows. But there is support for named pipes, which is
almost the same thing. What we have support for in compat/mingw.[ch]
is a different matter, but we can extend that if needed.
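
To make the per-platform idea concrete, here is a minimal Python sketch
(illustration only, not git code) of how a client might pick its
transport. The socket file name `fswatch.sock`, the `git-fswatch-` pipe
prefix and the function `daemon_endpoint` are all hypothetical names made
up for this example.

```python
import hashlib
import os
import sys


def daemon_endpoint(git_dir, platform=sys.platform):
    """Return (kind, address) for a hypothetical per-repo fswatch daemon."""
    if platform == "win32":
        # Named pipes live in a flat namespace, so derive a per-repo name
        # from a hash of the repository path.
        digest = hashlib.sha1(os.path.abspath(git_dir).encode("utf-8")).hexdigest()
        return ("named-pipe", r"\\.\pipe\git-fswatch-" + digest[:16])
    # Elsewhere, use a socket file inside the repository; filesystem
    # permissions then provide the access control.
    return ("unix-socket", os.path.join(git_dir, "fswatch.sock"))
```

The rest of the client would only see "an endpoint to exchange messages
with", which is roughly the abstraction being suggested.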

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-12  9:43   ` Ramkumar Ramachandra
  2013-03-12  9:50     ` Erik Faye-Lund
@ 2013-03-12  9:55     ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2013-03-12  9:55 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Heiko Voigt, Git List, Duy Nguyen, Junio C Hamano,
	Torsten Bögershausen, Robert Zeh, Erik Faye-Lund,
	Karsten Blees, Drew Northup

On Tue, Mar 12, 2013 at 03:13:39PM +0530, Ramkumar Ramachandra wrote:

> Heiko Voigt wrote:
> > While talking about platform independence. How about Windows? AFAIK
> > there are no file based sockets. How about using shared memory, that's
> > available, instead? It would greatly reduce the needed porting effort.
> 
> What about the git credential helper: it uses UNIX sockets, no?  How
> does git-credential-winstore [1] work?

No, the main credential protocol happens over pipes to a child process's
stdin/stdout. The credential-cache helper does use unix sockets (since
it needs to contact a long-running daemon that caches the credentials),
and AFAIK is not available under Windows (but that's OK, because
Windows-specific helpers that use secure storage are better anyway).

When I introduced credential-cache, I recall somebody mentioned that
there is some Windows-equivalent IPC that can be used to emulate unix
domain sockets. The calls aren't the same, but as long as your
requirements are basically "get messages to/from the daemon", you can
probably abstract away the details on a per-platform basis.

Unfortunately I can't seem to find the original message or any details
in the archive (and I know next to nothing about Windows IPC).

-Peff

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-10 20:17 [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline Ramkumar Ramachandra
  2013-03-11 17:05 ` Heiko Voigt
@ 2013-03-12 23:21 ` Karsten Blees
  2013-03-13  1:03   ` Duy Nguyen
  1 sibling, 1 reply; 17+ messages in thread
From: Karsten Blees @ 2013-03-12 23:21 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git List, Duy Nguyen, Junio C Hamano, Torsten Bögershausen,
	Robert Zeh, Jeff King, Erik Faye-Lund, Drew Northup

Am 10.03.2013 21:17, schrieb Ramkumar Ramachandra:
> git operations are slow on repositories with lots of files, and lots
> of tiny filesystem calls like lstat(), getdents(), open() are
> responsible for this.  On the linux-2.6 repository, for instance, the
> numbers for "git status" look like this:
> 
>   top syscalls sorted     top syscalls sorted
>   by acc. time            by number
>   ----------------------------------------------
>   0.401906 40950 lstat    0.401906 40950 lstat
>   0.190484 5343 getdents  0.150055 5374 open
>   0.150055 5374 open      0.190484 5343 getdents
>   0.074843 2806 close     0.074843 2806 close
>   0.003216 157 read       0.003216 157 read
> 
> To solve this problem, we propose to build a daemon which will watch
> the filesystem using inotify and report batched up events over a UNIX
> socket.

[...]

> +
> +The fswatch C API is meant to be called by Git code which needs
> +information about filesystem changes.  It is centered around an
> +object representing the changes to the filesystem since the last
> +invocation.
> +

Hmmm...I don't see how filesystem changes since last invocation can solve the problem, or am I missing something? I think what you mean to say is that the daemon should keep track of the filesystem *state* of the working copy, or alternatively the deltas/changes to some known state (such as .git/index)?

I'm also still skeptical whether a daemon will improve overall performance. In my understanding it's essentially a filesystem cache in user mode. The difference from using the OS filesystem cache directly (via lstat/readdir) is that we replace ~50k syscalls with a single IPC call (i.e. the git <--> fswatch daemon communication is less 'chatty'). However, the 'chattiness' is still there between the fswatch daemon and the OS / inotify. Consider 'git status; make; make clean; git status'...that's a *lot* of changes to process for nothing (potentially slowing down make).

Then there's the issue of stale data in the cache. Porcelain commands that modify the repository and use 'git status --porcelain' to compile their changesets will want 100% exact data. I'm not saying it's not doable, but adding another platform-specific caching daemon to the tool chain doesn't exactly simplify things...

But perhaps I'm too pessimistic (or just stigmatized by inherently slow and out-of-date TGitCache/TSvnCache on Windows :-)

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-12 23:21 ` Karsten Blees
@ 2013-03-13  1:03   ` Duy Nguyen
  2013-03-13 17:50     ` Karsten Blees
  0 siblings, 1 reply; 17+ messages in thread
From: Duy Nguyen @ 2013-03-13  1:03 UTC (permalink / raw)
  To: Karsten Blees
  Cc: Ramkumar Ramachandra, Git List, Junio C Hamano,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

On Wed, Mar 13, 2013 at 6:21 AM, Karsten Blees <karsten.blees@gmail.com> wrote:
> Hmmm...I don't see how filesystem changes since last invocation can solve the problem, or am I missing something? I think what you mean to say is that the daemon should keep track of the filesystem *state* of the working copy, or alternatively the deltas/changes to some known state (such as .git/index)?

I think the git process can keep track of the filesystem state (and save
it if necessary). But when no git process is running, the filesystem
state changes and git cannot know about it. The daemon helps fill this
gap (and basically keeps git "running", in a light form, throughout a
development session). For example, if we know only 5 files have changed
since the last refresh, we only need to re-stat those 5. The same goes
for untracked/ignored file checking.

> I'm also still skeptical whether a daemon will improve overall performance. In my understanding its essentially a filesystem cache in user-mode. The difference to using the OS filesystem cache directly (via lstat/readdir) is that we replace ~50k sys-calls with a single IPC call (i.e. the git <--> fswatch daemon communication is less 'chatty'). However, the 'chattyness' is still there between the fswatch daemon and the OS / inotify.

I think it attempts to reduce unnecessary system calls, not eliminate
them all. In the "5 changed files" above, a few IPC calls are done to
retrieve the file list, then 5 lstat will be issued (by git, not the
daemon) instead of thousands of them.
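
A minimal Python sketch of that refresh logic, for illustration only (it
is not how git's index refresh is implemented). The function `refresh`
and the shape of the cache are assumptions; `changed` stands for
whatever list of paths a daemon would report, and `None` models the case
where no daemon answered.

```python
import os


def refresh(cache, all_paths, changed=None):
    """Update cache (path -> (mtime_ns, size), or None if the path is gone).

    changed is the list of paths a (hypothetical) daemon reported since the
    last refresh; None means no daemon answered, so every path is checked.
    Returns the number of lstat() calls made.
    """
    to_check = all_paths if changed is None else changed
    calls = 0
    for path in to_check:
        calls += 1
        try:
            st = os.lstat(path)
            cache[path] = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            cache[path] = None  # deleted since the last refresh
    return calls
```

With a daemon report of 5 paths this makes 5 lstat() calls; without one
it degrades to the usual full scan.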

>Consider 'git status; make; make clean; git status'...that's a *lot* of changes to process for nothing (potentially slowing down make).

Yeah. In my opinion, the daemon should realize at some point that the
accumulated changes are so numerous that they are not worth collecting
anymore, and drop them all. Git will then do it the normal/slow way.
After that the daemon picks up again. We only optimize for the case when
few changes are made to the filesystem.
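
As a rough Python sketch of that "give up and start over" behaviour
(illustration only; the class name `ChangeLog` and the default limit are
arbitrary assumptions, not a proposed design):

```python
class ChangeLog:
    """Accumulates changed paths, giving up once there are too many."""

    def __init__(self, limit=1000):  # the limit is an arbitrary assumption
        self.limit = limit
        self.paths = set()
        self.overflowed = False

    def record(self, path):
        if self.overflowed:
            return
        self.paths.add(path)
        if len(self.paths) > self.limit:
            # Too much churn: forget everything and remember only that the
            # next consumer must do a full scan.
            self.paths.clear()
            self.overflowed = True

    def drain(self):
        """Return the changed paths, or None if a full scan is needed.

        Either way, the log starts collecting afresh afterwards.
        """
        result = None if self.overflowed else sorted(self.paths)
        self.paths.clear()
        self.overflowed = False
        return result
```

A `None` from `drain()` tells git to fall back to the slow path once;
the following call sees only the changes recorded after that.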

> Then there's the issue of stale data in the cache. Modifying porcelain commands that use 'git status --porcelain' to compile their changesets will want 100% exact data. I'm not saying its not doable, but adding another platform specific, caching daemon to the tool chain doesn't exactly simplify things...
>
> But perhaps I'm too pessimistic (or just stigmatized by inherently slow and out-of-date TGitCache/TSvnCache on Windows :-)

Thanks. I didn't know about TGitCache. Will dig it up. Maybe we can
learn something from it (or realize the daemon approach is futile
after all).
-- 
Duy

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-13  1:03   ` Duy Nguyen
@ 2013-03-13 17:50     ` Karsten Blees
  2013-03-13 19:38       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Karsten Blees @ 2013-03-13 17:50 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Ramkumar Ramachandra, Git List, Junio C Hamano,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Am 13.03.2013 02:03, schrieb Duy Nguyen:
> On Wed, Mar 13, 2013 at 6:21 AM, Karsten Blees <karsten.blees@gmail.com> wrote:
>> Hmmm...I don't see how filesystem changes since last invocation can solve the problem, or am I missing something? I think what you mean to say is that the daemon should keep track of the filesystem *state* of the working copy, or alternatively the deltas/changes to some known state (such as .git/index)?
> 
> I think git process can keep track of filesystem state (and save it
> down if necessary).
[...]
Ah, saving the state was the missing bit, thanks.

However, AFAIK inotify doesn't work recursively, so the daemon would at least have to track the directory structure to be able to register / unregister inotify handlers as directories come and go.
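
The bookkeeping this implies can be sketched in Python as follows. This is an illustration under assumptions: it rescans the tree with os.walk() rather than calling inotify, the function names are made up, and a real daemon would more likely react to IN_CREATE / IN_DELETE events on directories than rescan.

```python
import os


def directories_to_watch(root):
    """Return every directory under root (inclusive), skipping .git."""
    dirs = set()
    for dirpath, dirnames, _ in os.walk(root):
        # Prune .git in place so os.walk() does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        dirs.add(dirpath)
    return dirs


def reconcile(watched, root):
    """Compare the watched set with the tree; return (to_add, to_remove)."""
    current = directories_to_watch(root)
    return current - watched, watched - current
```

Each entry in `to_add` would need a new watch registered, and each entry in `to_remove` an old one dropped.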

>> Consider 'git status; make; make clean; git status'...that's a *lot* of changes to process for nothing (potentially slowing down make).
> 
> Yeah. In my opinion, the daemon should realize that at some point
> accumulated changes are too much that it's not worth collecting
> anymore, and drop them all. Git will do it the normal/slow way. After
> that the daemon picks up again. We only optimize for the case when
> little changes are made in filesystem.
> 

That sounds reasonable...

>> Then there's the issue of stale data in the cache. Modifying porcelain commands that use 'git status --porcelain' to compile their changesets will want 100% exact data. I'm not saying its not doable, but adding another platform specific, caching daemon to the tool chain doesn't exactly simplify things...
>>
>> But perhaps I'm too pessimistic (or just stigmatized by inherently slow and out-of-date TGitCache/TSvnCache on Windows :-)
> 
> Thanks. I didn't know about TGitCache. Will dig it up. Maybe we can
> learn something from it (or realize the daemon approach is futile
> after all).
> 

TGitCache/TSvnCache are the background processes in TortoiseGit/TortoiseSvn that keep track of filesystem state to display icon overlays in Windows Explorer.

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-13 17:50     ` Karsten Blees
@ 2013-03-13 19:38       ` Junio C Hamano
  2013-03-14 10:58         ` Duy Nguyen
                           ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Junio C Hamano @ 2013-03-13 19:38 UTC (permalink / raw)
  To: Karsten Blees
  Cc: Duy Nguyen, Ramkumar Ramachandra, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Karsten Blees <karsten.blees@gmail.com> writes:

> However, AFAIK inotify doesn't work recursively, so the daemon
> would at least have to track the directory structure to be able to
> register / unregister inotify handlers as directories come and go.

Yes, and you would need one inotify watch per directory, but you do
not have an infinite supply of outstanding inotify watches (wasn't
the limit something like 8k per uid?), so the daemon must be prepared
to say "I'll watch this, that and that directory, but the consumers
should check other directories themselves."

FWIW, I share your suspicion that an effort in the direction this
thread suggests may end up duplicating what the caching vfs layer
already does, and doing so poorly.

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-13 19:38       ` Junio C Hamano
@ 2013-03-14 10:58         ` Duy Nguyen
  2013-03-15 16:27         ` Pete Wyckoff
  2013-03-16 14:21         ` Thomas Rast
  2 siblings, 0 replies; 17+ messages in thread
From: Duy Nguyen @ 2013-03-14 10:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karsten Blees, Ramkumar Ramachandra, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

On Thu, Mar 14, 2013 at 2:38 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Karsten Blees <karsten.blees@gmail.com> writes:
>
>> However, AFAIK inotify doesn't work recursively, so the daemon
>> would at least have to track the directory structure to be able to
>> register / unregister inotify handlers as directories come and go.
>
> Yes, and you would need one inotify per directory but you do not
> have an infinite supply of outstanding inotify watch (wasn't the
> limit like 8k per a single uid or something?), so the daemon must be
> prepared to say "I'll watch this, that and that directories, but the
> consumers should check other directories themselves."

Hey I did not know that. Webkit has about 6k leaf dirs and 182k files.
Watching the top N biggest directories would cover M% of cached files:

   N     M%
  10   8.60
  20  13.28
  30  17.52
  40  20.52
  50  23.55
 200  49.70
 676  75.00
 863  80.00
1486  90.00

So it's a trade-off. We can cut off some syscall cost but we probably
need to pay some for inotify. And we definitely can't watch the full
worktree. I don't know how costly it is to watch many directories. If
it's not too costly, watching 256 or 512 dirs might be enough.
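
For anyone who wants to reproduce such a table on another repository,
the calculation can be sketched in Python like this. It is an
illustration, not the script used for the numbers above: the function
name is made up, and it assumes a file counts toward the directory that
directly contains it. The input would be a list of tracked paths, for
example the output of "git ls-files".

```python
from collections import Counter
import os


def coverage_of_top_dirs(paths, n):
    """Percent of files living in the n directories holding the most files."""
    per_dir = Counter(os.path.dirname(p) for p in paths)
    if not per_dir:
        return 0.0
    covered = sum(count for _, count in per_dir.most_common(n))
    return 100.0 * covered / len(paths)
```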

What about Windows? Does the equivalent mechanism have similar limits?

> FWIW, I share your suspicion that an effort in the direction this
> thread suggests may end up duplicating what the caching vfs layer
> already does, and doing so poorly.

I'm still curious how it works out. Maybe it's not up to the original
expectation, but hopefully it will speed things up a bit.
-- 
Duy

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-13 19:38       ` Junio C Hamano
  2013-03-14 10:58         ` Duy Nguyen
@ 2013-03-15 16:27         ` Pete Wyckoff
  2013-03-16 14:21         ` Thomas Rast
  2 siblings, 0 replies; 17+ messages in thread
From: Pete Wyckoff @ 2013-03-15 16:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karsten Blees, Duy Nguyen, Ramkumar Ramachandra, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

gitster@pobox.com wrote on Wed, 13 Mar 2013 12:38 -0700:
> Karsten Blees <karsten.blees@gmail.com> writes:
> 
> > However, AFAIK inotify doesn't work recursively, so the daemon
> > would at least have to track the directory structure to be able to
> > register / unregister inotify handlers as directories come and go.
> 
> Yes, and you would need one inotify per directory but you do not
> have an infinite supply of outstanding inotify watch (wasn't the
> limit like 8k per a single uid or something?), so the daemon must be
> prepared to say "I'll watch this, that and that directories, but the
> consumers should check other directories themselves."

fanotify is an option here too; it can watch an entire file
system.

		-- Pete

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-13 19:38       ` Junio C Hamano
  2013-03-14 10:58         ` Duy Nguyen
  2013-03-15 16:27         ` Pete Wyckoff
@ 2013-03-16 14:21         ` Thomas Rast
  2013-03-18  8:24           ` Ramkumar Ramachandra
  2 siblings, 1 reply; 17+ messages in thread
From: Thomas Rast @ 2013-03-16 14:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karsten Blees, Duy Nguyen, Ramkumar Ramachandra, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Junio C Hamano <gitster@pobox.com> writes:

> Karsten Blees <karsten.blees@gmail.com> writes:
>
>> However, AFAIK inotify doesn't work recursively, so the daemon
>> would at least have to track the directory structure to be able to
>> register / unregister inotify handlers as directories come and go.
>
> Yes, and you would need one inotify per directory but you do not
> have an infinite supply of outstanding inotify watch (wasn't the
> limit like 8k per a single uid or something?), so the daemon must be
> prepared to say "I'll watch this, that and that directories, but the
> consumers should check other directories themselves."

Those are tunable limits though.  For example I run this silly hack

  https://github.com/trast/watch

with the shell snippets to be able to quickly cd a shell to where
something recently happened.  I am able to watch most of my "working
set" even under default limits, which here (opensuse tumbleweed, kernel
3.8.x, x86_64) are

  $ cat /proc/sys/fs/inotify/max_user_watches 
  65536
  $ cat /proc/sys/fs/inotify/max_user_instances 
  128

I'm not sure if other distros impose tighter limits by default, but as
it stands you're not very likely to hit the 65k watches limit in any
given repo.  It seems more likely that you might hit the 128 instances
limit if we go with a design that uses one daemon per repo, if you run a
script that accesses many repos.  For example, in an android tree I have
lying around,

  $ repo list | wc -l
  297

That alone might indicate it would be a good idea to have one "global"
git-agent that starts on demand, rather than a per-repo daemon.
Otherwise we'd have to find a way to discover "old" daemons and tell
them to quit when we hit max_user_instances.
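
A daemon could check these limits before deciding how much to watch.
Here is a minimal Python sketch of such a check (illustration only; the
function names are made up, and treating an unreadable limit as "no
known limit" is an optimistic assumption):

```python
def read_inotify_limit(name, base="/proc/sys/fs/inotify"):
    """Return the integer stored in base/name, or None if it cannot be read."""
    try:
        with open(base + "/" + name) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def watches_fit(num_dirs, limit):
    """True if num_dirs watches fit under limit, or if the limit is unknown."""
    return limit is None or num_dirs <= limit
```

For example, `watches_fit(6000, read_inotify_limit("max_user_watches"))`
would tell the daemon whether a tree with about 6k directories can be
watched in full on this machine.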

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-16 14:21         ` Thomas Rast
@ 2013-03-18  8:24           ` Ramkumar Ramachandra
  2013-03-18 10:07             ` Thomas Rast
  0 siblings, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-03-18  8:24 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Karsten Blees, Duy Nguyen, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Junio C Hamano wrote:
> Yes, and you would need one inotify per directory but you do not
> have an infinite supply of outstanding inotify watch (wasn't the
> limit like 8k per a single uid or something?), so the daemon must be
> prepared to say "I'll watch this, that and that directories, but the
> consumers should check other directories themselves."
>
> FWIW, I share your suspicion that an effort in the direction this
> thread suggests may end up duplicating what the caching vfs layer
> already does, and doing so poorly.

Thomas Rast wrote:
>   $ cat /proc/sys/fs/inotify/max_user_watches
>   65536
>   $ cat /proc/sys/fs/inotify/max_user_instances
>   128

From Junio's and Thomas' observations, I'm inclined to think that
inotify is ill-suited for the problem we are trying to solve.  It is
designed as a per-directory watch, because VFS can quickly supply the
inodes for a directory entry.  As such, I think the ideal usecase for
inotify is to execute something immediately when a change takes place
in a directory: it's well-suited for solutions like Dropbox (which I
think is poorly designed to begin with, but that's offtopic).  It
doesn't substitute for or augment VFS caching.  I suspect the VFS cache
works by caching the inodes in a frequently used directory entry, thus
optimizing calls like lstat() on them.

The correct solution for our problem is to get VFS to recognize our
repository as a unit: the repository is not a bunch of frequently-used
directory entries, but a frequently-used unit in itself.  We need an
optimization that will work recursively on a directory entry.
However, since the repository is a special usecase, I suspect adding
an rwatch() system call (or similar) will be necessary to register the
repository with VFS.  The design of this feature should be transparent
to userland, and their filesystem calls will be optimized magically.
We certainly don't need something as fine-grained as inotify to
perform these optimizations: if the tree hash of a registered
repository changes frequently enough, we have to optimize operations
on that directory tree (recursively).

Input from btrfs/vfs hackers would be appreciated.  I'll take out
some time to look at them myself this week.

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-18  8:24           ` Ramkumar Ramachandra
@ 2013-03-18 10:07             ` Thomas Rast
  2013-03-25 10:44               ` Ramkumar Ramachandra
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Rast @ 2013-03-18 10:07 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Junio C Hamano, Karsten Blees, Duy Nguyen, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> Yes, and you would need one inotify per directory but you do not
>> have an infinite supply of outstanding inotify watch (wasn't the
>> limit like 8k per a single uid or something?), so the daemon must be
>> prepared to say "I'll watch this, that and that directories, but the
>> consumers should check other directories themselves."
>>
>> FWIW, I share your suspicion that an effort in the direction this
>> thread suggests may end up duplicating what the caching vfs layer
>> already does, and doing so poorly.
>
> Thomas Rast wrote:
>>   $ cat /proc/sys/fs/inotify/max_user_watches
>>   65536
>>   $ cat /proc/sys/fs/inotify/max_user_instances
>>   128
>
> From Junio's and Thomas' observations, I'm inclined to think that
> inotify is ill-suited for the problem we are trying to solve.  It is
> designed as a per-directory watch, because VFS can quickly supply the
> inodes for a directory entry.  As such, I think the ideal usecase for
> inotify is to execute something immediately when a change takes place
> in a directory: it's well-suited for solutions like Dropbox (which I
> think is poorly designed to begin with, but that's offtopic).  It
> doesn't substitute for or augment VFS caching.  I suspect the VFS cache
> works by caching the inodes in a frequently used directory entry, thus
> optimizing calls like lstat() on them.

I have three objections to changing the kernel to fit us, as opposed to
just using inotify:

* inotify works.  I can watch most of my $HOME with the hack I linked
  earlier[1].  Yes, it's a lot of coding around the problem that it is
  nonrecursive, but we already have a lot of code around the problem
  that we can't ask the VFS for diffs between points in time (namely,
  the whole business with an index and lstat() loops).

* inotify is here today.  Even if you got a hypothetical notifier into
  the kernel today, you'd have to wait months/years until it is
  available in distros, and years until everyone has it.

* I'll bet you a beer that the kernel folks already had the same
  discussion when they made inotify.  There has to be a reason why it's
  better than providing for recursive watches.


[1]  https://github.com/trast/watch

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-18 10:07             ` Thomas Rast
@ 2013-03-25 10:44               ` Ramkumar Ramachandra
  2013-03-25 10:59                 ` Duy Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-03-25 10:44 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Karsten Blees, Duy Nguyen, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Just a small heads-up for people using Emacs.  24.4 has inotify
support, and magit-inotify.el [1] has already started using it.  From
initial impressions, I'm quite impressed with it.

[1]: https://github.com/magit/magit/blob/master/contrib/magit-inotify.el

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-25 10:44               ` Ramkumar Ramachandra
@ 2013-03-25 10:59                 ` Duy Nguyen
  2013-03-25 11:13                   ` Ramkumar Ramachandra
  0 siblings, 1 reply; 17+ messages in thread
From: Duy Nguyen @ 2013-03-25 10:59 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Thomas Rast, Junio C Hamano, Karsten Blees, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

On Mon, Mar 25, 2013 at 5:44 PM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
> Just a small heads-up for people using Emacs.  24.4 has inotify
> support, and magit-inotify.el [1] has already started using it.  From
> initial impressions, I'm quite impressed with it.

Have you tried it? From a quick look, it seems to watch all
directories. I wonder how it performs on webkit (at least 5k dirs)

> [1]: https://github.com/magit/magit/blob/master/contrib/magit-inotify.el
-- 
Duy

* Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
  2013-03-25 10:59                 ` Duy Nguyen
@ 2013-03-25 11:13                   ` Ramkumar Ramachandra
  0 siblings, 0 replies; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-03-25 11:13 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Thomas Rast, Junio C Hamano, Karsten Blees, Git List,
	Torsten Bögershausen, Robert Zeh, Jeff King, Erik Faye-Lund,
	Drew Northup

Duy Nguyen wrote:
> On Mon, Mar 25, 2013 at 5:44 PM, Ramkumar Ramachandra
> <artagnon@gmail.com> wrote:
>> Just a small heads-up for people using Emacs.  24.4 has inotify
>> support, and magit-inotify.el [1] has already started using it.  From
>> initial impressions, I'm quite impressed with it.
>
> Have you tried it? From a quick look, it seems to watch all
> directories. I wonder how it performs on webkit (at least 5k dirs)

Yeah, but only on some small repositories.  I expect it to be
problematic on big repositories: if I'm not mistaken, Emacs will
block.
