From: Jamie Lokier <jamie@shareable.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: jblunck@suse.de, vaurora@redhat.com, dwmw2@infradead.org,
	viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 13/35] fallthru: ext2 fallthru support
Date: Wed, 21 Apr 2010 18:36:05 +0100
Message-ID: <20100421173605.GD27575@shareable.org>
In-Reply-To: <E1O4X02-0003UN-IV@pomaz-ex.szeredi.hu>

Miklos Szeredi wrote:
> On Wed, 21 Apr 2010, Jamie Lokier wrote:
> > Sorry, no: That does not work for bind mounts.  Both layers can have
> > the same st_dev.
> 
> Okay.
> 
> >  Nor does O_NOFOLLOW stop traversal in the middle of
> > a path, there is no handy O_NOCROSSMOUNTS, and no st_mode flag or
> > d_type to say it's a bind mount.  Bind mounts are really a big pain
> > for i_nlink+inotify name counting.
> 
> I'm confused.  You are monitoring a specific file and would like to
> know if something is happening to any of its links, right?

Not quite. I'm monitoring a million files (say), so I must use
directory watches for most of them.  I need directory watches anyway,
when the semantic is "calling open on /path/to/file and reading would
return the same data", because renames and unlinks are also a way to
invalidate monitored file contents.

At a high level, what we're talking about is the ability to cache
information derived from reading files in the filesystem and to verify
its validity, in a manner which efficiently triggers invalidation only
on changes.  That means being able to answer, as quickly as possible,
"if I read this, that and the other, will I get the same results as
the last time I did those operations?", without having to actually do
them to check.  There are many applications, provided the method is
reliable.

> Why do you need to know about bind mounts for that?
> 
> Count the number of times you encounter that d_ino and if that matches
> i_nlink then every directory is monitored.  Simple as that, no?

When I see a file has i_nlink > 1, I must watch the file directly
using a file-watch (with inotify; or by polling with stat() when using
dnotify), _unless_ I have seen all the links to that file.

When I've seen all the links to a file, I know that my directory
watches on the directories containing those links are sufficient to
detect changes to the file contents.  That's because every
file change will get notified on at least one of those paths.

I learn that I've seen all the links by seeing d_ino during readdir as
you suggested, or by st_ino in the cases where I've not had reason to
readdir and I have needed to open the file or call stat.
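
The bookkeeping described above can be sketched in Python.  This is
only an illustration under stated assumptions, not the actual program:
the class name LinkTracker and its methods are hypothetical, and it
deliberately ignores bind mounts.

```python
import os
from collections import defaultdict


class LinkTracker:
    """Hypothetical helper: remember which names of each inode were seen."""

    def __init__(self):
        # (st_dev, st_ino) -> set of (directory, name) pairs seen so far
        self.seen = defaultdict(set)
        # (st_dev, st_ino) -> link count most recently reported by stat
        self.nlink = {}

    def record(self, dev, ino, nlink, directory, name):
        """Note that `name` in `directory` refers to inode (dev, ino)."""
        key = (dev, ino)
        self.seen[key].add((directory, name))
        self.nlink[key] = nlink

    def record_path(self, path):
        """Convenience wrapper that takes the numbers from lstat()."""
        st = os.lstat(path)
        directory, name = os.path.split(os.path.abspath(path))
        self.record(st.st_dev, st.st_ino, st.st_nlink, directory, name)

    def all_links_seen(self, dev, ino):
        """True once as many distinct names are known as the link count."""
        key = (dev, ino)
        return key in self.nlink and len(self.seen[key]) >= self.nlink[key]

    def needs_file_watch(self, dev, ino):
        """Directory watches alone suffice only when every link is known."""
        return not self.all_links_seen(dev, ino)
```

With inode 100 and i_nlink = 2, recording one name leaves
needs_file_watch() true, and recording the second distinct name makes
it false.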

Let's look at some bind mounts.  One where st_ino doesn't work:

    /dirA/file1  [hard link to inode 100, i_nlink = 2]
    /dirA/bound  [bind mount, has /dirA/file1 mounted on it]
    /dirB/file2  [hard link to inode 100, i_nlink = 2]

If the program is asked to open /dirA/file1 and /dirA/bound at various
times, and never asked to readdir /dirA, it will have used fstat not
readdir, seen the same (st_dev,st_ino,i_nlink = 2), and _wrongly_
concluded that it is monitoring all directories containing paths to
the file.

To avoid that problem, it parses /proc/mounts and detects that
/dirA/bound does not contribute to the link count.  This is faster
than calling readdir in all possible places that it can happen.
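
One way to sketch that detection in Python is shown below.  Note the
swap: it reads text in the /proc/self/mountinfo format rather than
/proc/mounts, because mountinfo carries a "root" field naming the part
of the filesystem that is mounted.  The function names and the sample
text are made up for illustration, and the octal escapes (such as \040
for a space) are not decoded.

```python
SAMPLE = """\
22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
40 22 8:1 /dirA/file1 /dirA/bound rw,relatime - ext4 /dev/sda1 rw
"""


def parse_mountinfo(text):
    """Return (root, mount_point, fstype) for each mountinfo-format line."""
    mounts = []
    for line in text.splitlines():
        # A lone "-" separates the variable-length optional fields from
        # the filesystem type, source and super options.
        if " - " not in line:
            continue
        left, right = line.split(" - ", 1)
        fields = left.split()
        tail = right.split()
        if len(fields) < 5 or not tail:
            continue
        mounts.append((fields[3], fields[4], tail[0]))
    return mounts


def partial_mount_points(mounts):
    """Mount points that expose only a file or subtree of a filesystem.

    A root other than "/" is typical of a bind mount of a file or
    subdirectory.  A bind mount of a whole filesystem has root "/" and
    is not detected by this check.
    """
    return {mount_point for root, mount_point, _ in mounts if root != "/"}
```

For the sample text, partial_mount_points() reports /dirA/bound, so a
name reached through it should not be counted as an extra link.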

Another one, where readdir + d_ino doesn't work anyway:

    /dirA/file1  [hard link to inode 100, i_nlink = 2]
    /dirB/dirX   [bind mount, has /dirA mounted on it]
    /dirC/file2  [hard link to inode 100, i_nlink = 2]

This time the program is asked to open /dirA/file1 and
/dirB/dirX/file1 at various times.  Suppose it aggressively calls
readdir on all of the places it goes near, and uses d_ino comparisons.

Bear in mind it can't hunt for /dirC because there may be millions of
directories; this is just an example.

Then it will see the same d_ino for /dirA/file1 and /dirB/dirX/file1,
and wrongly conclude that it is monitoring all directories containing
paths to the file.

So again, it must parse /proc/mounts to detect that everything found
under /dirB/dirX mirrors /dirA.
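
Once the mirroring is known, paths can be rewritten to the tree they
mirror before links are counted, so that both paths to file1 count as
one link.  A minimal Python sketch follows, assuming the program has
already built a mapping from bind mount point to the path it mirrors.
The name underlying_path and the bind_map argument are hypothetical,
and only one level of bind mount is resolved.

```python
import posixpath


def underlying_path(path, bind_map):
    """Replace the longest matching bind mount prefix with its source."""
    path = posixpath.normpath(path)
    normalized = {
        posixpath.normpath(mount_point): posixpath.normpath(source)
        for mount_point, source in bind_map.items()
    }
    best = None
    for mount_point in normalized:
        prefix = mount_point.rstrip("/") + "/"
        # Match the mount point itself or anything strictly below it.
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return path
    rest = path[len(best):].lstrip("/")
    source = normalized[best]
    return posixpath.join(source, rest) if rest else source
```

With bind_map = {"/dirB/dirX": "/dirA"}, both /dirA/file1 and
/dirB/dirX/file1 map to /dirA/file1, so only one of the two links to
inode 100 has been seen and a file watch is still needed.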

This is complicated further by the fact that inotify/dnotify send
events to the watched directory that is the dentry parent of the link
used to access a file, not necessarily the parent in the mounted path
space.

Although this doesn't make the bind mount problem go away, this is
where union mounts complicate the picture more:

Ideally, the program may assume that d_ino and st_ino match as long as
the file is open (on any filesystem), or that the filesystem type is
in a whitelist of ones with stable inode numbers (most local
filesystems), and it's not a mountpoint.  So when it's asked to open
at one path, and something else asks it to readdir at another path, it
could combine the information to learn when it's found all entries,
without having to use redundant readdirs and stats.
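
The assumption that d_ino and st_ino agree can at least be checked per
directory.  Here is a small Python sketch, with hypothetical function
names; it relies on os.DirEntry.inode() returning the number reported
by readdir, which is how CPython behaves on Linux as far as I know.

```python
import os


def scan_inodes(directory):
    """Yield (name, d_ino, st_ino) for each entry of a directory."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # entry.inode() is the number the directory listing gave;
            # lstat() asks about the object actually found at that name.
            yield entry.name, entry.inode(), os.lstat(entry.path).st_ino


def untrusted_names(triples):
    """Names whose d_ino and st_ino disagree, for example mount points."""
    return sorted(name for name, d_ino, st_ino in triples if d_ino != st_ino)
```

Calling untrusted_names(scan_inodes("/dirA")) would list the entries
for which readdir information must not be combined with stat
information.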

I'm thinking that I might have to detect union mounts specially in
/proc/mounts, now that they are a VFS feature, and disable a bunch of
assumptions about d_ino when seeing them.  Hopefully it is possible to
unambiguously check for union mount points in /proc/mounts?

d_ino == directory's st_ino sounds neat.  Maybe that will be enough,
as a special magical Linux rule.  When reading a directory, it's cheap
to get the directory's st_ino with fstat().  It's possible to bind
mount a directory on its _own_ child, so that st_ino == directory's
st_ino, but d_ino isn't affected so maybe that's the trick to use.

-- Jamie
