From: Wez Furlong <wez@fb.com>
To: Amir Goldstein <amir73il@gmail.com>, Jan Kara <jack@suse.cz>
Cc: Mo Re Ra <more7.rev@gmail.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: File monitor problem
Date: Wed, 11 Dec 2019 22:06:13 +0000
Message-ID: <8486261f-9cf2-e14e-c425-d9df7ba7b277@fb.com>
In-Reply-To: <CAOQ4uximwdf37JdVVfHuM_bxk=X7pz21hnT3thk01oDs_npfhw@mail.gmail.com>
On 12/10/19 12:49, Amir Goldstein wrote:
> [cc: Watchman maintainer]
Hi, I'm the Watchman creator and maintainer, and I also work on a FUSE
based virtual filesystem called EdenFS that works with the source
control systems that we use at Facebook.
I don't have much context on fanotify yet, but I do have a lot of
practical experience with Watchman on various operating systems with
very large recursive directory trees.
Amir asked me to participate in this discussion, and I think it's
probably helpful to give a little bit of context on how we deal with
some of the different watcher interfaces, and also how we see the
consumers of Watchman making use of this sort of data. There are tens
of Watchman-consuming applications in common use inside FB, and a long
tail of ad-hoc consumers that are not on my radar.
I don't want to flood you with data that may not feel relevant so I'm
going to try to summarize some key points; I'd be happy to elaborate if
you'd like more context! These are written out as numbered statements
to make it easier to reference in further discussion, and are not
intended to be taken as any kind of prescriptive manifesto!
1. Humans think in terms of filenames. Applications largely only care
about filenames. It's rare (it came up as a hypothetical for only one
integrating application at FB in the past several years) that they care
about optimizing for the various rename cases so long as they get
notified that the old name is no longer visible in the filesystem and
that a new name is now visible elsewhere in the portion of the
filesystem that they are watching.
2. Application authors don't want to deal with the complexities of file
watching; they just want to reliably know if/when a named file has
changed. Rename cookies and overflow events are too difficult for most
applications to handle correctly, if at all.
3. Overflow events are painful to deal with. In Watchman we deal with
inotify overflow by re-crawling and examining the directory structure to
re-synchronize with the filesystem state. For very large trees this can
take a very long time.
4. Partially related to 3., restarting the Watchman server is an
expensive event because we have to re-crawl everything to re-create the
directory watches with inotify. If the system provided a recursive
watch function and some kind of a change journal that told Watchman a
set of N directories to crawl (where N is smaller than M, the overall
number of directories) and we had a stable identifier for files, then we
could persist state across restarts and cheaply re-synchronize.
5. This is also related to 3. and 4. We use btrfs subvolumes in our CI
to snapshot large repos and make them available to jobs running in
different containers potentially on different hosts. If the journal
mechanism from 4. were available in this situation it would make it
super cheap to bring up Watchman in those environments.
6. A downside to recursive watches on macOS is that fseventsd has very
limited ability to add exceptions. A common pattern at FB is that the
Buck build system maintains a build artifacts directory called
`buck-out` in the repo. On Linux we can ignore change notifications for
this directory with zero cost by simply not registering it with
inotify. On macOS, the kernel interface allows for a maximum of 8
exclusions. The rest of the changes are delivered to fseventsd which
stats and records everything in a SQLite database. This is a
performance hotspot for us because the number of excluded directories in
a repo exceeds 8, and the uninteresting bulky build artifact writes then
need to transit fseventsd and into Watchman before we can decide to
ignore them.
7. Windows has a journal mechanism that could potentially be used as
suggested in 4. above, but it requires privileged access. I happen to
know from someone at MS that worked on a similar system that there is
also a way to access a subset of this data that doesn't require
privileged access, but that isn't documented. I mention this because
elsewhere in this thread is a discussion about privileged access to
similar sounding information.
8. Related to 6. and 7., if there is a privileged system daemon to act
as the interface between userspace<->kernel, care needs to be taken to
avoid the sort of performance hotspot we see on macOS with 6. above.
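
To make the journal idea in 4. concrete, here is a minimal sketch of
what a cheap re-synchronization could look like. It is not Watchman's
implementation and no such kernel journal exists in this form:
`dirty_dirs` stands in for the N directories a hypothetical journal
would report, `saved_state` for state persisted before the restart, and
the inode number is used only as an assumed stand-in for a stable file
identifier. The function names are invented for illustration.

```python
import os


def snapshot_dir(path):
    """Return {name: (inode, mtime_ns, size)} for the direct children of path."""
    entries = {}
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)
            # Assumption: the inode number plays the role of a stable identifier.
            entries[entry.name] = (st.st_ino, st.st_mtime_ns, st.st_size)
    return entries


def resync(saved_state, dirty_dirs):
    """Re-examine only dirty_dirs and return the set of paths that changed.

    saved_state: {dir_path: snapshot} persisted across a restart (hypothetical).
    dirty_dirs:  directories reported by a hypothetical change journal.
    saved_state is updated in place to reflect what is on disk now.
    """
    changed = set()
    for d in dirty_dirs:
        old = saved_state.get(d, {})
        try:
            new = snapshot_dir(d)
        except (FileNotFoundError, NotADirectoryError):
            # The directory itself is gone; every known child is now a change.
            new = {}
        for name in old.keys() | new.keys():
            if old.get(name) != new.get(name):
                changed.add(os.path.join(d, name))
        if new:
            saved_state[d] = new
        else:
            saved_state.pop(d, None)
    return changed
```

The cost is proportional to the N directories reported rather than the
M directories in the tree, which is the property 4. and 5. are asking
for. Directories not listed in `dirty_dirs` are never touched.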
OK, hopefully that doesn't feel too off the mark! I don't think
everything above needs to be handled directly at the kernel interface.
Some of these details could be handled on the userspace side, either by
a daemon (e.g. Watchman) or a suitably well-designed client library
(although that can make it difficult to consume in some programming
environments).
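
As an illustration of the kind of detail a userspace layer could absorb
(points 1. to 3.), the sketch below collapses raw watcher events into
the name-level view applications tend to want: this name is gone, that
name is present. The event tuples are a hypothetical representation
loosely modelled on inotify event names, and `summarize` is an invented
helper, not a Watchman API. Rename cookies are deliberately ignored, and
an overflow is reported as "re-crawl needed" instead of being passed on
to the application.

```python
def summarize(events):
    """Collapse raw watcher events into name-level changes.

    events: iterable of tuples such as ("CREATE", path), ("MODIFY", path),
            ("DELETE", path), ("MOVED_FROM", path, cookie),
            ("MOVED_TO", path, cookie) or ("OVERFLOW",).
    Returns (changes, needs_recrawl) where changes maps each path to
    "present" or "gone".
    """
    changes = {}
    for event in events:
        kind = event[0]
        if kind == "OVERFLOW":
            # Events were lost, so the accumulated view cannot be trusted.
            return {}, True
        path = event[1]
        if kind in ("DELETE", "MOVED_FROM"):
            # The old name is no longer visible; the cookie is not needed.
            changes[path] = "gone"
        elif kind in ("CREATE", "MODIFY", "MOVED_TO"):
            changes[path] = "present"
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return changes, False
```

A consumer of this layer never sees a rename as such: a rename of `a`
to `b` arrives as `a` being gone and `b` being present, and the last
event seen for a given name wins.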
--Wez