linux-kernel.vger.kernel.org archive mirror
* Beyond inotify recursive watches
@ 2013-03-18 10:48 Ramkumar Ramachandra
  2013-04-05 15:55 ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Ramkumar Ramachandra @ 2013-03-18 10:48 UTC
  To: linux-kernel
  Cc: Junio C Hamano, Thomas Rast, Duy Nguyễn, Jeff King, Karsten Blees

Hi,

We, the Git folks, were wondering how to speed things up.  In an
strace of "git status" on linux-2.6.git, we found:

  top syscalls sorted     top syscalls sorted
  by acc. time            by number
  ----------------------------------------------
  0.401906 40950 lstat    0.401906 40950 lstat
  0.190484 5343 getdents  0.150055 5374 open
  0.150055 5374 open      0.190484 5343 getdents
  0.074843 2806 close     0.074843 2806 close
  0.003216 157 read       0.003216 157 read

Most of this happens when we try to build the index, querying for
changes in tracked files and discovering untracked files.  It was
suggested that we can use inotify to speed things up: we'll write a
user-wide daemon (like ssh_client) that will set up watches on each
directory of each git repository.  A repository-wide daemon wouldn't
work because /proc/sys/fs/inotify/max_user_instances reads 128 on
typical linux-3.8 systems, and this is problematic.
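
To get a feel for the per-directory approach, a rough sketch like the one
below could count how many watches one worktree would need, since inotify
watches are not recursive and each directory needs its own. The helper name
count_watch_dirs is hypothetical and not part of git; the result would be
compared by hand against the per-user limit in
/proc/sys/fs/inotify/max_user_watches.

```python
import os


def count_watch_dirs(root):
    """Count directories under root, root included.

    Hypothetical helper for illustration only (not part of git).
    A non-recursive watcher such as inotify needs one watch per directory.
    os.walk does not follow symlinked directories by default.
    """
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))
```

Calling count_watch_dirs() on the top of a worktree gives the number of
watches a user-wide daemon would have to hold for that repository alone.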

However, Karsten and Junio point out that our efforts might be futile
as we are trying to do what the VFS caching already does, and doing it
poorly.  Speedups, if any, would be minor and certainly not worth the
effort.

I think inotify is a poorly suited solution for our needs, as setting
up recursive watches is horribly inelegant.  I think it's a
well-suited solution for something like Dropbox, which just executes
something when there's a change in a specified directory.  Also, I
suspect VFS caching works by optimizing filesystem calls for
frequently used directory entries.  A git repository is not a
collection of frequently-used directory entries, but a frequently used
unit.  I know very little about how VFS works, but I'm wondering if we
can make any changes in VFS to make it perform better with git
repositories.  We won't need something as fine-grained as inotify: if
the tree hash of a directory entry changes frequently enough, optimize
all filesystem calls to inodes in the directory recursively.
Recursively optimizing a directory is useless in the general case, and
I would imagine something like a new rwatch() syscall for git to
register the repository with VFS.  All system calls will then be
magically optimized, and few changes need to be made to git.  The
added side-benefit is that all other version control systems can use
it too.

Thanks for reading.

Ram


* Re: Beyond inotify recursive watches
  2013-03-18 10:48 Beyond inotify recursive watches Ramkumar Ramachandra
@ 2013-04-05 15:55 ` Jan Kara
  2013-04-05 16:12   ` Al Viro
  2013-04-05 16:56   ` Ramkumar Ramachandra
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2013-04-05 15:55 UTC
  To: Ramkumar Ramachandra
  Cc: linux-kernel, Junio C Hamano, Thomas Rast, Duy Nguyễn,
	Jeff King, Karsten Blees

  Hi,

On Mon 18-03-13 16:18:11, Ramkumar Ramachandra wrote:
> We, the Git folks, were wondering how to speed things up.  In an
> strace of "git status" on linux-2.6.git, we found:
> 
>   top syscalls sorted     top syscalls sorted
>   by acc. time            by number
>   ----------------------------------------------
>   0.401906 40950 lstat    0.401906 40950 lstat
>   0.190484 5343 getdents  0.150055 5374 open
>   0.150055 5374 open      0.190484 5343 getdents
>   0.074843 2806 close     0.074843 2806 close
>   0.003216 157 read       0.003216 157 read
> 
> Most of this happens when we try to build the index, querying for
> changes in tracked files and discovering untracked files.  It was
> suggested that we can use inotify to speed things up: we'll write a
> user-wide daemon (like ssh_client) that will set up watches on each
> directory of each git repository.  A repository-wide daemon wouldn't
> work because /proc/sys/fs/inotify/max_user_instances reads 128 on
> typical linux-3.8 systems, and this is problematic.
> 
> However, Karsten and Junio point out that our efforts might be futile
> as we are trying to do what the VFS caching already does, and doing it
> poorly.  Speedups, if any, would be minor and certainly not worth the
> effort.
> 
> I think inotify is a poorly suited solution for our needs, as setting
> up recursive watches is horribly inelegant.  I think it's a
> well-suited solution for something like Dropbox, which just executes
> something when there's a change in a specified directory.  Also, I
> suspect VFS caching works by optimizing filesystem calls for
> frequently used directory entries.  A git repository is not a
> collection of frequently-used directory entries, but a frequently used
> unit.  I know very little about how VFS works, but I'm wondering if we
> can make any changes in VFS to make it perform better with git
> repositories.  We won't need something as fine-grained as inotify: if
> the tree hash of a directory entry changes frequently enough, optimize
> all filesystem calls to inodes in the directory recursively.
> Recursively optimizing a directory is useless in the general case, and
> I would imagine something like a new rwatch() syscall for git to
> register the repository with VFS.  All system calls will then be
> magically optimized, and few changes need to be made to git.  The
> added side-benefit is that all other version control systems can use
> it too.
  Hum, I have a somewhat hard time understanding what you mean by
'magically optimized syscalls'. What should happen in VFS to speed up your
load?

What your question reminds me of is the idea of a recursive modification
time stamp on directories. That is a time stamp that gets updated whenever
anything in the tree under the directory changes. Now this would be too
expensive to maintain, so there's also a trick implemented: you update
the time stamp (and continue updating recursive time stamps upwards) only
if a special flag is set on the directory, and you clear the flag at that
moment. So until someone checks the time stamp and resets the flag, no
further updates of the recursive modification time happen.

This scheme works for an arbitrary number of processes interested in
recursive time stamps (only updates of the time stamps get more frequent).
What is somewhat inconvenient is that this only tells you that something in
the directory or its subtree changed, so you still have to scan all the
directories on the path to the modified file. So I'm not sure how much use
this would be to you.
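
The flag-and-stamp trick described above can be sketched as a small
in-memory model. This is only an illustration under stated assumptions:
Dir, arm() and modified() are hypothetical names, a logical counter stands
in for real time, and none of the kernel locking or on-disk storage is
modelled.

```python
import itertools


class Dir:
    """In-memory stand-in for a directory (hypothetical, not a kernel type)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.rtime = 0      # recursive modification stamp
        self.flag = False   # set by a watcher that wants the next change


_clock = itertools.count(1)  # logical clock instead of wall-clock time


def arm(d):
    """Watcher side: read the current stamp and set the flag again."""
    d.flag = True
    return d.rtime


def modified(d):
    """Writer side: something directly inside d has changed.

    Walk upwards, updating the stamp and clearing the flag, and stop at the
    first directory whose flag is not set.
    """
    now = next(_clock)
    while d is not None and d.flag:
        d.rtime = now
        d.flag = False
        d = d.parent
```

Once a directory's flag has been cleared, later changes below it stop at
that directory, which is what keeps repeated modifications cheap until a
watcher re-arms the flag.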

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Beyond inotify recursive watches
  2013-04-05 15:55 ` Jan Kara
@ 2013-04-05 16:12   ` Al Viro
  2013-04-08  9:31     ` Jan Kara
  2013-04-05 16:56   ` Ramkumar Ramachandra
  1 sibling, 1 reply; 9+ messages in thread
From: Al Viro @ 2013-04-05 16:12 UTC
  To: Jan Kara
  Cc: Ramkumar Ramachandra, linux-kernel, Junio C Hamano, Thomas Rast,
	Duy Nguyễn, Jeff King, Karsten Blees

On Fri, Apr 05, 2013 at 05:55:34PM +0200, Jan Kara wrote:

> What your question reminds me of is the idea of a recursive modification
> time stamp on directories. That is a time stamp that gets updated whenever
> anything in the tree under the directory changes. Now this would be too
> expensive to maintain, so there's also a trick implemented: you update
> the time stamp (and continue updating recursive time stamps upwards) only
> if a special flag is set on the directory, and you clear the flag at that
> moment. So until someone checks the time stamp and resets the flag, no
> further updates of the recursive modification time happen.
> 
> This scheme works for an arbitrary number of processes interested in
> recursive time stamps (only updates of the time stamps get more frequent).
> What is somewhat inconvenient is that this only tells you that something in
> the directory or its subtree changed, so you still have to scan all the
> directories on the path to the modified file. So I'm not sure how much use
> this would be to you.

Feel free to write up the details of locking you'll need for that.  It will
*not* be fun...


* Re: Beyond inotify recursive watches
  2013-04-05 15:55 ` Jan Kara
  2013-04-05 16:12   ` Al Viro
@ 2013-04-05 16:56   ` Ramkumar Ramachandra
  1 sibling, 0 replies; 9+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-05 16:56 UTC
  To: Jan Kara
  Cc: linux-kernel, Junio C Hamano, Thomas Rast, Duy Nguyễn,
	Jeff King, Karsten Blees, Git List

Jan Kara wrote:
>   Hum, I have a somewhat hard time understanding what you mean by
> 'magically optimized syscalls'. What should happen in VFS to speed up your
> load?

In retrospect, I think this is a terrible hack to begin with.  Tuning
the filesystem specifically for git repositories is inelegant on so
many levels, I can't recall why I ever thought it would be a good
idea.  Like all software, Git has scaling issues with ultra-large
repositories.  Too many stat() calls is just one of the problems:
there will be too many objects to do any operation at reasonable
speed, and the overall UX would just suck.  Instead of growing to a
huge monolithic beast that spawns off worker threads for everything
and ultimately dying off, I've decided that git should take a
different direction: it should work well with many small
easily-composable repositories.  I've started work on this already,
and it looks very promising.

Let the filesystem people do what they do best: optimizing for all
applications uniformly.


* Re: Beyond inotify recursive watches
  2013-04-05 16:12   ` Al Viro
@ 2013-04-08  9:31     ` Jan Kara
  2013-04-10 18:36       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-04-08  9:31 UTC
  To: Al Viro
  Cc: Jan Kara, Ramkumar Ramachandra, linux-kernel, Junio C Hamano,
	Thomas Rast, Duy Nguyễn, Jeff King, Karsten Blees

On Fri 05-04-13 17:12:29, Al Viro wrote:
> On Fri, Apr 05, 2013 at 05:55:34PM +0200, Jan Kara wrote:
> 
> > What your question reminds me of is the idea of a recursive modification
> > time stamp on directories. That is a time stamp that gets updated whenever
> > anything in the tree under the directory changes. Now this would be too
> > expensive to maintain, so there's also a trick implemented: you update
> > the time stamp (and continue updating recursive time stamps upwards) only
> > if a special flag is set on the directory, and you clear the flag at that
> > moment. So until someone checks the time stamp and resets the flag, no
> > further updates of the recursive modification time happen.
> > 
> > This scheme works for an arbitrary number of processes interested in
> > recursive time stamps (only updates of the time stamps get more frequent).
> > What is somewhat inconvenient is that this only tells you that something in
> > the directory or its subtree changed, so you still have to scan all the
> > directories on the path to the modified file. So I'm not sure how much use
> > this would be to you.
> 
> Feel free to write up the details of locking you'll need for that.  It will
> *not* be fun...
  Actually, it shouldn't be too bad if we don't guarantee we walk exactly
the path used for modification. Then it is enough to do the same thing as
following .. from each directory.

And for userspace that should be enough because if timestamp update races
with renames or similar actions somewhere up in the tree then these
operations will generate modification events and update time stamps as
well. So userspace will notice there was a change.

So this part should be doable. But as I wrote before, we might need some
fs-internal index to allow efficient tracking of what has changed in one
directory anyway and locking rules / costs for that are non-obvious.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Beyond inotify recursive watches
  2013-04-08  9:31     ` Jan Kara
@ 2013-04-10 18:36       ` Ramkumar Ramachandra
  2013-04-10 20:40         ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-10 18:36 UTC
  To: Jan Kara; +Cc: Al Viro, linux-kernel

[Dropping git people from the CC, as this is not relevant to git anymore]

Okay, let me attempt to understand this.

Jan Kara wrote:
> On Fri 05-04-13 17:12:29, Al Viro wrote:
>> On Fri, Apr 05, 2013 at 05:55:34PM +0200, Jan Kara wrote:
>>
>> > What your question reminds me of is the idea of a recursive modification
>> > time stamp on directories. That is a time stamp that gets updated whenever
>> > anything in the tree under the directory changes. Now this would be too
>> > expensive to maintain, so there's also a trick implemented: you update
>> > the time stamp (and continue updating recursive time stamps upwards) only
>> > if a special flag is set on the directory, and you clear the flag at that
>> > moment. So until someone checks the time stamp and resets the flag, no
>> > further updates of the recursive modification time happen.

If I understand correctly, I'll have to set a flag on the toplevel
directory of the repository, and this recursive timestamp update magic
will apply to my entire worktree.  How exactly?  Are you going to
store that path somewhere? Whenever there's any modification on the
filesystem, you can look at the path of the inode you're modifying,
and see if it's under this path.  If it is, we'll have to keep updating
the container dentry's timestamp, and continue recursively until we
hit the toplevel dentry.  On the toplevel dentry, you'll flip a flag
in addition to modifying the timestamp.

Later, I'll have to check if the timestamp changed from what I have
remembered in git.  If there is a change, I'll look through the
timestamp of every dentry downwards until I find the modified inode:
certainly far fewer fs calls.  After updating the git index with
fresh information, I'll have to flip the flag on the toplevel
directory again.

>> > This scheme works for an arbitrary number of processes interested in
>> > recursive time stamps (only updates of the time stamps get more frequent).
>> > What is somewhat inconvenient is that this only tells you that something in
>> > the directory or its subtree changed, so you still have to scan all the
>> > directories on the path to the modified file. So I'm not sure how much use
>> > this would be to you.

I think it's a very useful feature to have in general, not just for
git or version control systems.

>> Feel free to write up the details of locking you'll need for that.  It will
>> *not* be fun...

Is this what you mean: What happens if two inodes under the toplevel
directory change nearly simultaneously?  The two propagation threads
will conflict.

>   Actually, it shouldn't be too bad if we don't guarantee we walk exactly
> the path used for modification. Then it is enough to do the same thing as
> following .. from each directory.

I have no idea what this means.

> And for userspace that should be enough because if timestamp update races
> with renames or similar actions somewhere up in the tree then these
> operations will generate modification events and update time stamps as
> well. So userspace will notice there was a change.

Do you mean: as long as updating the timestamp is atomic, it doesn't
matter that many threads race to update it (it is guaranteed that
every thread does a successful update)?

> So this part should be doable. But as I wrote before, we might need some
> fs-internal index to allow efficient tracking of what has changed in one
> directory anyway and locking rules / costs for that are non-obvious.

Why does it have to be fs-internal, and not at the VFS layer?  I don't
know what VFS looks like, but of the little I know about btrfs:
There's one global B+ tree where dentry paths are keyed by their CRC32
hashes.  The dentry contains many inodes, and you're worried about
efficiently tracking which inodes have changed.  Why does there have
to be an efficiency concern there?  I suppose multiple inodes'
timestamp changing simultaneously can spawn threads that race to
update the dentry's timestamp.  Why is this challenge different from
the recursive propagation challenge?


* Re: Beyond inotify recursive watches
  2013-04-10 18:36       ` Ramkumar Ramachandra
@ 2013-04-10 20:40         ` Jan Kara
  2013-04-11 11:59           ` Ramkumar Ramachandra
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-04-10 20:40 UTC
  To: Ramkumar Ramachandra; +Cc: Jan Kara, Al Viro, linux-kernel

On Thu 11-04-13 00:06:02, Ramkumar Ramachandra wrote:
> [Dropping git people from the CC, as this is not relevant to git anymore]
> 
> Okay, let me attempt to understand this.
> 
> Jan Kara wrote:
> > On Fri 05-04-13 17:12:29, Al Viro wrote:
> >> On Fri, Apr 05, 2013 at 05:55:34PM +0200, Jan Kara wrote:
> >>
> >> > What your question reminds me of is the idea of a recursive modification
> >> > time stamp on directories. That is a time stamp that gets updated whenever
> >> > anything in the tree under the directory changes. Now this would be too
> >> > expensive to maintain, so there's also a trick implemented: you update
> >> > the time stamp (and continue updating recursive time stamps upwards) only
> >> > if a special flag is set on the directory, and you clear the flag at that
> >> > moment. So until someone checks the time stamp and resets the flag, no
> >> > further updates of the recursive modification time happen.
> 
> If I understand correctly, I'll have to set a flag on the toplevel
> directory of the repository, and this recursive timestamp update magic
> will apply to my entire worktree.  How exactly?  Are you going to
> store that path somewhere?
  Initially, you will have to flip the flag on every directory in the
subtree. But the flag is persistently stored on disk so you have to do it
once when the directory is created and then each time you notice the
directory has changed and the flag has been cleared.

> Whenever there's any modification on the
> filesystem, you can look at the path of the inode you're modifying,
> and see if it's under this path.  If it is, we'll have to keep updating
> the container dentry's timestamp, and continue recursively until we
> hit the toplevel dentry.  On the toplevel dentry, you'll flip a flag
> in addition to modifying the timestamp.
> 
> Later, I'll have to check if the timestamp changed from what I have
> remembered in git.  If there is a change, I'll look through the
> timestamp of every dentry downwards until I find the modified inode:
> certainly far fewer fs calls.  After updating the git index with
> fresh information, I'll have to flip the flag on the toplevel
> directory again.
  Yes, that's the intended use. And yes, even as is, it has the potential
for a significant speedup if modifications are relatively rare / concentrated
in a few places in the directory tree. It would just be more useful if
changed entries in a directory could be located quickly.
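
The userspace side of that use can be sketched roughly as follows. Again
this is only an in-memory model under stated assumptions: Node and
find_changed() are hypothetical names, the "seen" dictionary plays the role
of the stamps remembered in the git index, and the race between reading a
stamp and re-arming the flag is ignored.

```python
class Node:
    """In-memory stand-in for a directory (hypothetical, for illustration)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.rtime = 0     # recursive modification stamp
        self.flag = True   # armed: the next change will bump rtime


def find_changed(node, seen, path=""):
    """Return paths of directories whose stamp differs from the one in seen.

    Subtrees with an unchanged stamp are skipped entirely. Visited
    directories are re-armed and their new stamp is remembered in seen.
    """
    here = path + "/" + node.name
    if seen.get(here) == node.rtime:
        return []           # nothing below changed, skip the whole subtree
    seen[here] = node.rtime
    node.flag = True        # re-arm so that the next change is recorded
    changed = [here]
    for child in node.children:
        changed += find_changed(child, seen, here)
    return changed
```

The first call with an empty "seen" visits every directory, which matches
the one-time setup cost. Each returned directory still has to be scanned
entry by entry to find the modified files.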

> >> > This scheme works for an arbitrary number of processes interested in
> >> > recursive time stamps (only updates of the time stamps get more frequent).
> >> > What is somewhat inconvenient is that this only tells you that something in
> >> > the directory or its subtree changed, so you still have to scan all the
> >> > directories on the path to the modified file. So I'm not sure how much use
> >> > this would be to you.
> 
> I think it's a very useful feature to have in general, not just for
> git or version control systems.
> 
> >> Feel free to write up the details of locking you'll need for that.  It will
> >> *not* be fun...
> 
> Is this what you mean: What happens if two inodes under the toplevel
> directory change nearly simultaneously?  The two propagation threads
> will conflict.
  I think Al is asking how we lock the kernel dentry cache so that we can
safely walk up the tree and update time stamps in the presence of other
modifications happening to the directory tree in parallel.

> >   Actually, it shouldn't be too bad if we don't guarantee we walk exactly
> > the path used for modification. Then it is enough to do the same thing as
> > following .. from each directory.
> 
> I have no idea what this means.
> 
> > And for userspace that should be enough because if timestamp update races
> > with renames or similar actions somewhere up in the tree then these
> > operations will generate modification events and update time stamps as
> > well. So userspace will notice there was a change.
> 
> Do you mean: as long as updating the timestamp is atomic, it doesn't
> matter that many threads race to update it (it is guaranteed that
> every thread does a successful update)?
  It's not so much timestamp updates themselves racing against each other
but rather things like moving directories in the directory tree racing with
us walking up the tree and updating time stamps. In Linux, directory
locking happens in a top-down manner (like when you look up a path), so
when you are climbing up, you have to be careful not to introduce races.
 
> > So this part should be doable. But as I wrote before, we might need some
> > fs-internal index to allow efficient tracking of what has changed in one
> > directory anyway and locking rules / costs for that are non-obvious.
> 
> Why does it have to be fs-internal, and not at the VFS layer?
  One reason why we need things to be fs-internal is that we want to store
everything permanently on disk so that e.g. if there's a reboot between
modification of a git tree and 'git add -u', you will still find what has
changed since the last time you checked (without walking the whole tree).

> I don't
> know what VFS looks like, but of the little I know about btrfs:
> There's one global B+ tree where dentry paths are keyed by their CRC32
> hashes.  The dentry contains many inodes, and you're worried about
> efficiently tracking which inodes have changed.  Why does there have
> to be an efficiency concern there?  I suppose multiple inodes'
> timestamp changing simultaneously can spawn threads that race to
> update the dentry's timestamp.  Why is this challenge different from
> the recursive propagation challenge?
  My concern is that if you have a directory tree where there are lots of
entries in each directory, then you still have to check a lot of entries
before you find what has changed because you have to scan all entries in
each directory on the modified path. If there was a way to iterate only
through entries in a directory which had the flag cleared, things could be
considerably faster.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Beyond inotify recursive watches
  2013-04-10 20:40         ` Jan Kara
@ 2013-04-11 11:59           ` Ramkumar Ramachandra
  2013-04-11 21:02             ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-11 11:59 UTC
  To: Jan Kara; +Cc: Al Viro, linux-kernel

Jan Kara wrote:
>   Initially, you will have to flip the flag on every directory in the
> subtree. But the flag is persistently stored on disk so you have to do it
> once when the directory is created and then each time you notice the
> directory has changed and the flag has been cleared.

How is this any better than setting up inotify recursively on the
directory tree?  I'll have to readdir() each directory in the tree,
looking for more directories to set the flag on.

>   I think Al is asking how we lock the kernel dentry cache so that we can
> safely walk up the tree and update time stamps in the presence of other
> modifications happening to the directory tree in parallel.

>   It's not so much timestamp updates themselves racing against each other
> but rather things like moving directories in the directory tree racing with
> us walking up the tree and updating time stamps. In Linux, directory
> locking happens in a top-down manner (like when you look up a path), so
> when you are climbing up, you have to be careful not to introduce races.

Oh.  So we need to carefully code very fine-grained locking (so that
the entire fs isn't unusable when this recursive update is happening).

>   One reason why we need things to be fs-internal is that we want to store
> everything permanently on disk so that e.g. if there's a reboot between
> modification of a git tree and 'git add -u', you will still find what has
> changed since the last time you checked (without walking the whole tree).

Makes sense.  However, doesn't this mean that we have to patch every
filesystem separately for this feature, as opposed to just patching
VFS?

>   My concern is that if you have a directory tree where there are lots of
> entries in each directory, then you still have to check a lot of entries
> before you find what has changed because you have to scan all entries in
> each directory on the modified path. If there was a way to iterate only
> through entries in a directory which had the flag cleared, things could be
> considerably faster.

What are your thoughts on introducing a version of readdir() that only
lists dentries with this flag?

Can you get deeper into the implementation, and point me to the parts
of the code to look at?  Do you have any WIP patches that I can look
at?


* Re: Beyond inotify recursive watches
  2013-04-11 11:59           ` Ramkumar Ramachandra
@ 2013-04-11 21:02             ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2013-04-11 21:02 UTC
  To: Ramkumar Ramachandra; +Cc: Jan Kara, Al Viro, linux-kernel

On Thu 11-04-13 17:29:39, Ramkumar Ramachandra wrote:
> Jan Kara wrote:
> >   Initially, you will have to flip the flag on every directory in the
> > subtree. But the flag is persistently stored on disk so you have to do it
> > once when the directory is created and then each time you notice the
> > directory has changed and the flag has been cleared.
> 
> How is this any better than setting up inotify recursively on the
> directory tree?  I'll have to readdir() each directory in the tree,
> looking for more directories to set the flag on.
  With inotify you have to do this every time your watching application
starts. With my scheme you have to do this only once in a lifetime because
flags are persistent (stored on disk) once set.

> >   I think Al is asking how we lock the kernel dentry cache so that we can
> > safely walk up the tree and update time stamps in the presence of other
> > modifications happening to the directory tree in parallel.
> 
> >   It's not so much timestamp updates themselves racing against each other
> > but rather things like moving directories in the directory tree racing with
> > us walking up the tree and updating time stamps. In Linux, directory
> > locking happens in a top-down manner (like when you look up a path), so
> > when you are climbing up, you have to be careful not to introduce races.
> 
> Oh.  So we need to carefully code very fine-grained locking (so that
> the entire fs isn't unusable when this recursive update is happening).
  Yes.

> >   One reason why we need things to be fs-internal is that we want to store
> > everything permanently on disk so that e.g. if there's a reboot between
> > modification of a git tree and 'git add -u', you will still find what has
> > changed since the last time you checked (without walking the whole tree).
> 
> Makes sense.  However, doesn't this mean that we have to patch every
> filesystem separately for this feature, as opposed to just patching
> VFS?
  True. That's the downside of the flag being persistent. But the
persistency is necessary for some use cases (e.g. if you'd like to speed up
rsync with this, or if you'd like to keep a cache of preparsed config files).
Also, when the flag is persistent, inodes / dentries are not pinned in
memory, which is another problem with using inotify for watching lots of
directories.

> >   My concern is that if you have a directory tree where there are lots of
> > entries in each directory, then you still have to check a lot of entries
> > before you find what has changed because you have to scan all entries in
> > each directory on the modified path. If there was a way to iterate only
> > through entries in a directory which had the flag cleared, things could be
> > considerably faster.
> 
> What are your thoughts on introducing a version of readdir() that only
> lists dentries with this flag?
  Yes, I was considering something like this. But again, such a list of
entries to return would have to be stored on disk so that we don't have to
look up and load from disk inodes that don't have the flag set.

> Can you get deeper into the implementation, and point me to the parts
> of the code to look at?  Do you have any WIP patches that I can look
> at?
  Well, I have patches that were working in 2.6.37 and added support for
the feature in ext3 and ext4. But then I got stuck with the questions I
mentioned... But people keep asking for some feature like this so from time
to time I think it might be actually worth it to revive the patches.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

