From: Gabriel Krisman Bertazi <>
To: Amir Goldstein <>
Cc: "Theodore Ts'o" <>, Jan Kara <>,
	Al Viro <>,
	David Howells <>,
	Dave Chinner <>,
	"Darrick J. Wong" <>, linux-fsdevel <>
Subject: Re: [RFC] Filesystem error notifications proposal
Date: Tue, 02 Feb 2021 15:51:55 -0500
Message-ID: <>
In-Reply-To: <> (Amir Goldstein's message of "Fri, 22 Jan 2021 09:36:18 +0200")

Amir Goldstein <> writes:

>> I see.  But the visibility is of a watcher who can see an object, not
>> the application that caused the error.  The fact that the error happened
>> outside the context of the containerized process should not be a problem
>> here, right?  As long as the watcher is watching a mountpoint that can
>> reach the failed inode, that inode should be accessible to the watcher
>> and it should receive a notification. No?
> No, because the mount/path is usually not available in file system
> internal context. Even in vfs, many operations have no mnt context,
> which is the reason that some fanotify event types are available for

Hi Amir, thanks for the explanation.

> I understand the use case of monitoring a fleet of machines to know
> when some machine in the fleet has a corruption.
> I don't understand why the monitoring messages need to carry all the
> debugging info of that corruption.
> For corruption detection use case, it seems more logical to configure
> machines in the fleet to errors=remount-ro and then you'd only ever
> need to signal that a corruption was detected on a filesystem and the
> monitoring agent can access that machine to get more debugging
> info from dmesg or from filesystem recorded first/last error.

The main use case, as Ted mentioned, is detecting corruption across a
fleet of machines and, while allowing them to continue operating if
possible, scheduling repair tasks and/or rebuilding the data of
specific files.  In fact, you are right: we don't need to provide full
debugging information, but the ext4 error message, for instance, would
be useful.  This is closer to my previous RFC at

There are other use cases requiring us to provide some more information, in
particular the place where the error was raised in the code and the type
of error, for pattern analysis. So just reporting corruption via sysfs,
for instance, wouldn't suffice.
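
To make the "place in the code plus type of error" requirement concrete,
here is a minimal userspace sketch of what such a record could carry.
Everything in it is hypothetical and for illustration only: the names
fs_error_type, fs_error_record, FS_ERROR_RECORD and fs_error_format are
not taken from any existing kernel interface or from the earlier RFC.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical error classes; the thread does not define a taxonomy. */
enum fs_error_type {
	FS_ERR_CORRUPTION,
	FS_ERR_WRITEBACK,
};

/* Hypothetical fixed-size record: what failed and where in the code. */
struct fs_error_record {
	enum fs_error_type type;
	const char *function;	/* code site that raised the error */
	unsigned int line;
	uint64_t inode;		/* 0 if the error is not tied to an inode */
};

/* Capture the calling function and line at the point of the error. */
#define FS_ERROR_RECORD(t, ino) \
	((struct fs_error_record){ (t), __func__, __LINE__, (ino) })

/* Format the record as one key=value line; returns snprintf()'s result. */
static int fs_error_format(const struct fs_error_record *rec,
			   char *buf, size_t len)
{
	static const char *const names[] = { "corruption", "writeback" };

	return snprintf(buf, len, "type=%s func=%s line=%u ino=%llu",
			names[rec->type], rec->function, rec->line,
			(unsigned long long)rec->inode);
}

/* Example call site: records its own name and line number. */
static struct fs_error_record example_raise(uint64_t ino)
{
	return FS_ERROR_RECORD(FS_ERR_CORRUPTION, ino);
}
```

A record like this is fixed-size, which is the property that matters for
pattern analysis across many machines: the consumer can group by
(type, function, line) without parsing free-form messages.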

> You may be able to avoid allocation in fanotify if a group keeps
> a pre-allocated "emergency" event, but you won't be able to
> avoid taking locks in fanotify. Even fsnotify takes srcu_read_lock
> and spin_lock in some cases, so you'd have to be careful with the
> context you call fsnotify from.
> If you agree with my observation that filesystem can abort itself
> on corruption and keep the details internally, then the notification
> of a corrupted state can always be made from a safe context
> sometime after the corruption was detected, regardless of the
> context in which ext4_error() was called.
> IOW, if the real world use cases you have are reporting
> writeback errors and signalling that the filesystem entered a corrupted
> state, then fanotify might be the right tool for the job and you should
> have no need for variable size detailed event info.
> If you want a netoops equivalent reporting infrastructure, then
> you should probably use a different tool.

The main reason I was looking at fanotify was the ability to watch
different mountpoints and objects without watching the entire
filesystem.  This was a requirement raised against my previous
submission linked above, which provided only a mechanism based on
watch_queue to watch the entire filesystem.  If we agree that watching
specific subtrees is no longer needed, I think it makes sense to revert
to the previous proposal and drop fanotify altogether for this use case.

Gabriel Krisman Bertazi
