* [PATCH RFC 00/15] File system wide monitoring
@ 2021-04-26 18:41 Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
` (15 more replies)
0 siblings, 16 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Hi,
In an attempt to consolidate some of the feedback from the previous
proposals, I wrote a new attempt to solve the file system error reporting
problem. Before I spend more time polishing it, I'd like to hear your
feedback if I'm going in the wrong direction, in particular with the
modifications to fsnotify.
This RFC follows up on my previous proposals which attempted to leverage
watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
to push error notifications to user space. This proposal starts by, as
suggested by Darrick, limiting the scope of what I'm trying to do to an
interface for administrators to monitor the health of a file system,
instead of a generic interface for file errors. Therefore, this doesn't
solve the problem of writeback errors or the need to watch a specific
subsystem.
* Format
The feature is implemented on top of fanotify, as a new type of fanotify
mark, FAN_ERROR, which a file system monitoring tool can register to
receive notifications. A notification is split into three parts, and only
the first is guaranteed to exist for any given error event:
- FS generic data: A file system agnostic structure that has a generic
error code and identifies the filesystem. Basically, it lets
userspace know something happened on a monitored filesystem.
- FS location data: Identifies where in the code the problem
happened. (This is important for the use case of analysing frequent
error points that we discussed earlier).
- FS specific data: A detailed error report in a filesystem specific
format that details what the error is. Ideally, a capable monitoring
tool can use the information here for error recovery. For instance,
xfs can put the xfs_scrub structures here, ext4 can send its error
reports, etc. An example of usage is done in the ext4 patch of this
series.
More details on the information in each record can be found in the
documentation introduced in patch 15.
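To make the three-part layout concrete, here is a minimal userspace
sketch of how a monitoring tool might walk the records of one
notification. Everything named toy_* is hypothetical and invented for
illustration; the real record types and layouts are the ones defined by
this series' uapi header and the documentation in patch 15. The sketch
only assumes that each record starts with a small header carrying a
type and a total length, and that the generic record must be present.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record types; the real values come from the uapi header. */
enum toy_rec_type {
	TOY_REC_GENERIC  = 1,	/* FS generic data (always present) */
	TOY_REC_LOCATION = 2,	/* FS location data (optional) */
	TOY_REC_FS_DATA  = 3,	/* FS specific data (optional) */
};

/* Hypothetical per-record header: type plus total record length. */
struct toy_rec_hdr {
	uint8_t  type;
	uint8_t  pad;
	uint16_t len;		/* header + payload, in bytes */
};

/*
 * Walk all records in buf and return a bitmask of the known record
 * types seen, or -1 if the buffer is malformed or the mandatory
 * generic record is missing. Unknown types are skipped.
 */
static int toy_scan_records(const unsigned char *buf, size_t buflen)
{
	size_t off = 0;
	int seen = 0;

	while (off < buflen) {
		struct toy_rec_hdr hdr;

		if (buflen - off < sizeof(hdr))
			return -1;	/* truncated header */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
			return -1;	/* bad or truncated record */
		if (hdr.type >= TOY_REC_GENERIC && hdr.type <= TOY_REC_FS_DATA)
			seen |= 1 << hdr.type;
		off += hdr.len;
	}

	if (!(seen & (1 << TOY_REC_GENERIC)))
		return -1;	/* only the generic part is guaranteed */
	return seen;
}
```

A tool built this way can act on the generic record alone and treat the
location and filesystem-specific records as optional extras.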
* Using fanotify
Using fanotify for this kind of thing is slightly tricky because we want
to guarantee delivery in some complicated conditions; for instance, the
file system might want to send an error while holding several locks.
Instead of working around file system constraints at the file system
level, this proposal tries to make the FAN_ERROR submission safe in
those contexts. This is done with a new mode in fsnotify that
preallocates the memory at group creation to be used for the
notification submission.
This new mode in fsnotify introduces a ring buffer to queue
notifications, which eliminates the allocation path in fsnotify. From
what I saw, the allocation is the only problem in fsnotify for
filesystems to submit errors in constrained situations.
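The idea of preallocating at group creation so that submission never
allocates can be sketched with a toy userspace ring. This is not the
code in fs/notify/ring.c: the toy_* names and the fixed slot size are
assumptions made for illustration, the model is single-threaded (the
kernel code takes the group's notification_lock), and the real ring
stores variable-sized events across pages rather than fixed slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLOT_SIZE 64	/* hypothetical fixed slot size, in bytes */

/* Hypothetical userspace model of a preallocated event ring. */
struct toy_ring {
	unsigned char *slots;	/* allocated once, up front */
	size_t nr_slots;	/* power of two, so wrap-around is a mask */
	size_t head;		/* next slot the producer fills */
	size_t tail;		/* next slot the consumer reads */
};

/* The only allocation happens here, mirroring "at group creation". */
static int toy_ring_init(struct toy_ring *r, size_t nr_slots)
{
	assert(nr_slots >= 2 && (nr_slots & (nr_slots - 1)) == 0);

	r->slots = malloc(nr_slots * TOY_SLOT_SIZE);
	if (!r->slots)
		return -1;
	r->nr_slots = nr_slots;
	r->head = 0;
	r->tail = 0;
	return 0;
}

/* Submission path: no allocation; returns NULL when the ring is full. */
static void *toy_ring_reserve(struct toy_ring *r)
{
	size_t next = (r->head + 1) & (r->nr_slots - 1);
	void *slot;

	if (next == r->tail)	/* head would catch up with tail */
		return NULL;
	slot = r->slots + r->head * TOY_SLOT_SIZE;
	r->head = next;
	return slot;
}

/* Reader path: returns NULL when the ring is empty (head == tail). */
static void *toy_ring_consume(struct toy_ring *r)
{
	void *slot;

	if (r->head == r->tail)
		return NULL;
	slot = r->slots + r->tail * TOY_SLOT_SIZE;
	r->tail = (r->tail + 1) & (r->nr_slots - 1);
	return slot;
}
```

One slot is deliberately left unused so that head == tail always means
"empty"; a full ring makes the producer fail immediately instead of
blocking or allocating, which is the property the error path needs.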
* Visibility
Since the use case is limited to a tool for whole file system monitoring,
errors are associated with the superblock and visible filesystem-wide.
It is assumed and required that userspace has CAP_SYS_ADMIN.
* Testing
This was tested with corrupted ext4 images in a few scenarios, which
caused errors to be triggered and monitored with the sample tool
provided in the next to final patch.
* patches
Patches 1-4 refactor fanotify a bit in preparation for the patches to
come. Patch 5 introduces the ring buffer interface to fsnotify, while
patch 6 enables this support in fanotify. Patches 7 and 8 wire up the
FS_ERROR event type, which will be used by filesystems. In sequence,
patches 9-12 implement the FAN_ERROR record types and create the new
event. Patch 13 is an ext4 example implementation supporting this
feature. Finally, patch 14 adds a sample userspace tool that uses this
feature, and patch 15 documents it.
I also pushed the full series to:
https://gitlab.collabora.com/krisman/linux -b fanotify-notifications
[1] https://lwn.net/Articles/839310/
[2] https://www.spinics.net/lists/linux-fsdevel/msg187075.html
Gabriel Krisman Bertazi (15):
fanotify: Fold event size calculation to its own function
fanotify: Split fsid check from other fid mode checks
fsnotify: Wire flags field on group allocation
fsnotify: Wire up group information on event initialization
fsnotify: Support event submission through ring buffer
fanotify: Support submission through ring buffer
fsnotify: Support FS_ERROR event type
fsnotify: Introduce helpers to send error_events
fanotify: Introduce generic error record
fanotify: Introduce code location record
fanotify: Introduce filesystem specific data record
fanotify: Introduce the FAN_ERROR mark
ext4: Send notifications on error
samples: Add fs error monitoring example
Documentation: Document the FAN_ERROR framework
.../admin-guide/filesystem-monitoring.rst | 103 ++++++
Documentation/admin-guide/index.rst | 1 +
fs/ext4/super.c | 60 +++-
fs/notify/Makefile | 2 +-
fs/notify/dnotify/dnotify.c | 2 +-
fs/notify/fanotify/fanotify.c | 127 +++++--
fs/notify/fanotify/fanotify.h | 35 +-
fs/notify/fanotify/fanotify_user.c | 319 ++++++++++++++----
fs/notify/fsnotify.c | 2 +-
fs/notify/group.c | 25 +-
fs/notify/inotify/inotify_fsnotify.c | 2 +-
fs/notify/inotify/inotify_user.c | 4 +-
fs/notify/notification.c | 10 +
fs/notify/ring.c | 199 +++++++++++
include/linux/fanotify.h | 12 +-
include/linux/fsnotify.h | 15 +
include/linux/fsnotify_backend.h | 63 +++-
include/uapi/linux/ext4-notify.h | 17 +
include/uapi/linux/fanotify.h | 26 ++
kernel/audit_fsnotify.c | 2 +-
kernel/audit_tree.c | 2 +-
kernel/audit_watch.c | 2 +-
samples/Kconfig | 7 +
samples/Makefile | 1 +
samples/fanotify/Makefile | 3 +
samples/fanotify/fs-monitor.c | 135 ++++++++
26 files changed, 1034 insertions(+), 142 deletions(-)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
create mode 100644 fs/notify/ring.c
create mode 100644 include/uapi/linux/ext4-notify.h
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c
--
2.31.0
* [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 4:42 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
` (14 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Every time this function is invoked, its result is immediately added to
FAN_EVENT_METADATA_LEN, since there is no use for the length of the
info records on its own. This minor cleanup folds the rest of the
calculation into the function, which now operates in terms of events,
returning the size of the entire event, including metadata.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/fanotify/fanotify_user.c | 40 +++++++++++++++++-------------
1 file changed, 23 insertions(+), 17 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9e0c1afac8bd..0332c4afeec3 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -64,17 +64,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
}
-static int fanotify_event_info_len(unsigned int fid_mode,
- struct fanotify_event *event)
+static size_t fanotify_event_len(struct fanotify_event *event,
+ unsigned int fid_mode)
{
- struct fanotify_info *info = fanotify_event_info(event);
- int dir_fh_len = fanotify_event_dir_fh_len(event);
- int fh_len = fanotify_event_object_fh_len(event);
- int info_len = 0;
+ size_t event_len = FAN_EVENT_METADATA_LEN;
+ struct fanotify_info *info;
+ int dir_fh_len;
+ int fh_len;
int dot_len = 0;
+ if (!fid_mode)
+ return event_len;
+
+ info = fanotify_event_info(event);
+ dir_fh_len = fanotify_event_dir_fh_len(event);
+ fh_len = fanotify_event_object_fh_len(event);
+
if (dir_fh_len) {
- info_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
+ event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
} else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
/*
* With group flag FAN_REPORT_NAME, if name was not recorded in
@@ -84,9 +91,9 @@ static int fanotify_event_info_len(unsigned int fid_mode,
}
if (fh_len)
- info_len += fanotify_fid_info_len(fh_len, dot_len);
+ event_len += fanotify_fid_info_len(fh_len, dot_len);
- return info_len;
+ return event_len;
}
/*
@@ -98,7 +105,8 @@ static int fanotify_event_info_len(unsigned int fid_mode,
static struct fanotify_event *get_one_event(struct fsnotify_group *group,
size_t count)
{
- size_t event_size = FAN_EVENT_METADATA_LEN;
+ size_t event_size;
+ struct fsnotify_event *fse;
struct fanotify_event *event = NULL;
unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
@@ -108,16 +116,15 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
if (fsnotify_notify_queue_is_empty(group))
goto out;
- if (fid_mode) {
- event_size += fanotify_event_info_len(fid_mode,
- FANOTIFY_E(fsnotify_peek_first_event(group)));
- }
+ fse = fsnotify_peek_first_event(group);
+ event = FANOTIFY_E(fse);
+ event_size = fanotify_event_len(event, fid_mode);
if (event_size > count) {
event = ERR_PTR(-EINVAL);
goto out;
}
- event = FANOTIFY_E(fsnotify_remove_first_event(group));
+ fsnotify_remove_queued_event(group, fse);
if (fanotify_is_perm_event(event->mask))
FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
out:
@@ -334,8 +341,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
- metadata.event_len = FAN_EVENT_METADATA_LEN +
- fanotify_event_info_len(fid_mode, event);
+ metadata.event_len = fanotify_event_len(event, fid_mode);
metadata.metadata_len = FAN_EVENT_METADATA_LEN;
metadata.vers = FANOTIFY_METADATA_VERSION;
metadata.reserved = 0;
--
2.31.0
* [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 4:53 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
` (13 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
FAN_ERROR will require fsid, but not necessarily require the filesystem
to expose a file handle. Split those checks into different functions, so
they can be used separately when creating a mark.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/fanotify/fanotify_user.c | 35 +++++++++++++++++++-----------
1 file changed, 22 insertions(+), 13 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 0332c4afeec3..e0d113e3b65c 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1055,7 +1055,23 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
}
/* Check if filesystem can encode a unique fid */
-static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
+static int fanotify_test_fid(struct path *path)
+{
+ /*
+ * We need to make sure that the file system supports at least
+ * encoding a file handle so user can use name_to_handle_at() to
+ * compare fid returned with event to the file handle of watched
+ * objects. However, name_to_handle_at() requires that the
+ * filesystem also supports decoding file handles.
+ */
+ if (!path->dentry->d_sb->s_export_op ||
+ !path->dentry->d_sb->s_export_op->fh_to_dentry)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static int fanotify_check_path_fsid(struct path *path, __kernel_fsid_t *fsid)
{
__kernel_fsid_t root_fsid;
int err;
@@ -1082,17 +1098,6 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
root_fsid.val[1] != fsid->val[1])
return -EXDEV;
- /*
- * We need to make sure that the file system supports at least
- * encoding a file handle so user can use name_to_handle_at() to
- * compare fid returned with event to the file handle of watched
- * objects. However, name_to_handle_at() requires that the
- * filesystem also supports decoding file handles.
- */
- if (!path->dentry->d_sb->s_export_op ||
- !path->dentry->d_sb->s_export_op->fh_to_dentry)
- return -EOPNOTSUPP;
-
return 0;
}
@@ -1230,7 +1235,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
}
if (fid_mode) {
- ret = fanotify_test_fid(&path, &__fsid);
+ ret = fanotify_check_path_fsid(&path, &__fsid);
+ if (ret)
+ goto path_put_and_out;
+
+ ret = fanotify_test_fid(&path);
if (ret)
goto path_put_and_out;
--
2.31.0
* [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 5:03 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization Gabriel Krisman Bertazi
` (12 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Introduce a flags field in fsnotify_group to track the mode of
submission this group has.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/dnotify/dnotify.c | 2 +-
fs/notify/fanotify/fanotify_user.c | 4 ++--
fs/notify/group.c | 13 ++++++++-----
fs/notify/inotify/inotify_user.c | 2 +-
include/linux/fsnotify_backend.h | 7 +++++--
kernel/audit_fsnotify.c | 2 +-
kernel/audit_tree.c | 2 +-
kernel/audit_watch.c | 2 +-
8 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index e85e13c50d6d..37960c8750e4 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -383,7 +383,7 @@ static int __init dnotify_init(void)
SLAB_PANIC|SLAB_ACCOUNT);
dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
- dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
+ dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0);
if (IS_ERR(dnotify_group))
panic("unable to allocate fsnotify group for dnotify\n");
return 0;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index e0d113e3b65c..f50c4ab721e3 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -929,7 +929,7 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void)
SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
{
struct fsnotify_group *group;
- int f_flags, fd;
+ int f_flags, fd, fsn_flags = 0;
struct user_struct *user;
unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
unsigned int class = flags & FANOTIFY_CLASS_BITS;
@@ -982,7 +982,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
f_flags |= O_NONBLOCK;
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
- group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
+ group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
if (IS_ERR(group)) {
free_uid(user);
return PTR_ERR(group);
diff --git a/fs/notify/group.c b/fs/notify/group.c
index ffd723ffe46d..08acb1afc0c2 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -112,7 +112,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
EXPORT_SYMBOL_GPL(fsnotify_put_group);
static struct fsnotify_group *__fsnotify_alloc_group(
- const struct fsnotify_ops *ops, gfp_t gfp)
+ const struct fsnotify_ops *ops, unsigned int flags, gfp_t gfp)
{
struct fsnotify_group *group;
@@ -134,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group(
INIT_LIST_HEAD(&group->marks_list);
group->ops = ops;
+ group->flags = flags;
return group;
}
@@ -141,18 +142,20 @@ static struct fsnotify_group *__fsnotify_alloc_group(
/*
* Create a new fsnotify_group and hold a reference for the group returned.
*/
-struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
+ unsigned int flags)
{
- return __fsnotify_alloc_group(ops, GFP_KERNEL);
+ return __fsnotify_alloc_group(ops, flags, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
/*
* Create a new fsnotify_group and hold a reference for the group returned.
*/
-struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops,
+ unsigned int flags)
{
- return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT);
+ return __fsnotify_alloc_group(ops, flags, GFP_KERNEL_ACCOUNT);
}
EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group);
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index c71be4fb7dc5..f2687267bc15 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -632,7 +632,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
struct fsnotify_group *group;
struct inotify_event_info *oevent;
- group = fsnotify_alloc_user_group(&inotify_fsnotify_ops);
+ group = fsnotify_alloc_user_group(&inotify_fsnotify_ops, 0);
if (IS_ERR(group))
return group;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e5409b83e731..ef4352563ede 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -221,6 +221,7 @@ struct fsnotify_group {
* full */
struct mem_cgroup *memcg; /* memcg to charge allocations */
+ unsigned int flags;
/* groups can define private fields here or use the void *private */
union {
@@ -469,8 +470,10 @@ static inline void fsnotify_update_flags(struct dentry *dentry)
/* called from fsnotify listeners, such as fanotify or dnotify */
/* create a new group */
-extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops);
-extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops);
+extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
+ unsigned int flags);
+extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops,
+ unsigned int flags);
/* get reference to a group */
extern void fsnotify_get_group(struct fsnotify_group *group);
/* drop reference on a group from fsnotify_alloc_group */
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 60739d5e3373..dce6a6212f8f 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -182,7 +182,7 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = {
static int __init audit_fsnotify_init(void)
{
- audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops);
+ audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, 0);
if (IS_ERR(audit_fsnotify_group)) {
audit_fsnotify_group = NULL;
audit_panic("cannot create audit fsnotify group");
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6c91902f4f45..3d045fc791f2 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -1077,7 +1077,7 @@ static int __init audit_tree_init(void)
audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
- audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
+ audit_tree_group = fsnotify_alloc_group(&audit_tree_ops, 0);
if (IS_ERR(audit_tree_group))
audit_panic("cannot initialize fsnotify group for rectree watches");
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 2acf7ca49154..80a8c14de961 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -493,7 +493,7 @@ static const struct fsnotify_ops audit_watch_fsnotify_ops = {
static int __init audit_watch_init(void)
{
- audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops);
+ audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops, 0);
if (IS_ERR(audit_watch_group)) {
audit_watch_group = NULL;
audit_panic("cannot create audit fsnotify group");
--
2.31.0
* [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (2 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
` (11 subsequent siblings)
15 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
This is used by following patches when deciding about which fields to
initialize.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/fanotify/fanotify.c | 2 +-
fs/notify/fanotify/fanotify.h | 5 +++--
fs/notify/fanotify/fanotify_user.c | 6 +++---
fs/notify/inotify/inotify_fsnotify.c | 2 +-
fs/notify/inotify/inotify_user.c | 2 +-
include/linux/fsnotify_backend.h | 3 ++-
6 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 1192c9953620..e3669d8a4a64 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -601,7 +601,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
* event queue, so event reported on parent is merged with event
* reported on child when both directory and child watches exist.
*/
- fanotify_init_event(event, (unsigned long)id, mask);
+ fanotify_init_event(group, event, (unsigned long)id, mask);
if (FAN_GROUP_FLAG(group, FAN_REPORT_TID))
event->pid = get_pid(task_pid(current));
else
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 896c819a1786..47299e3d6efd 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -144,10 +144,11 @@ struct fanotify_event {
struct pid *pid;
};
-static inline void fanotify_init_event(struct fanotify_event *event,
+static inline void fanotify_init_event(struct fsnotify_group *group,
+ struct fanotify_event *event,
unsigned long id, u32 mask)
{
- fsnotify_init_event(&event->fse, id);
+ fsnotify_init_event(group, &event->fse, id);
event->mask = mask;
event->pid = NULL;
}
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index f50c4ab721e3..fe605359af88 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -911,7 +911,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
FSNOTIFY_OBJ_TYPE_INODE, mask, flags, fsid);
}
-static struct fsnotify_event *fanotify_alloc_overflow_event(void)
+static struct fsnotify_event *fanotify_alloc_overflow_event(struct fsnotify_group *group)
{
struct fanotify_event *oevent;
@@ -919,7 +919,7 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void)
if (!oevent)
return NULL;
- fanotify_init_event(oevent, 0, FS_Q_OVERFLOW);
+ fanotify_init_event(group, oevent, 0, FS_Q_OVERFLOW);
oevent->type = FANOTIFY_EVENT_TYPE_OVERFLOW;
return &oevent->fse;
@@ -993,7 +993,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
atomic_inc(&user->fanotify_listeners);
group->memcg = get_mem_cgroup_from_mm(current->mm);
- group->overflow_event = fanotify_alloc_overflow_event();
+ group->overflow_event = fanotify_alloc_overflow_event(group);
if (unlikely(!group->overflow_event)) {
fd = -ENOMEM;
goto out_destroy_group;
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 1901d799909b..c6eceb663ac3 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -107,7 +107,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
mask &= ~IN_ISDIR;
fsn_event = &event->fse;
- fsnotify_init_event(fsn_event, 0);
+ fsnotify_init_event(group, fsn_event, 0);
event->mask = mask;
event->wd = i_mark->wd;
event->sync_cookie = cookie;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index f2687267bc15..fb9a62b988e5 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -642,7 +642,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
return ERR_PTR(-ENOMEM);
}
group->overflow_event = &oevent->fse;
- fsnotify_init_event(group->overflow_event, 0);
+ fsnotify_init_event(group, group->overflow_event, 0);
oevent->mask = FS_Q_OVERFLOW;
oevent->wd = -1;
oevent->sync_cookie = 0;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index ef4352563ede..190c6a402e98 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -579,7 +579,8 @@ extern void fsnotify_put_mark(struct fsnotify_mark *mark);
extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info);
extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
-static inline void fsnotify_init_event(struct fsnotify_event *event,
+static inline void fsnotify_init_event(struct fsnotify_group *group,
+ struct fsnotify_event *event,
unsigned long objectid)
{
INIT_LIST_HEAD(&event->list);
--
2.31.0
* [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (3 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-26 22:00 ` kernel test robot
` (2 more replies)
2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
` (10 subsequent siblings)
15 siblings, 3 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
In order to support file system health/error reporting over fanotify,
fsnotify needs to expose a submission path that doesn't allow sleeping.
The only problem I identified with the current submission path is the
need to dynamically allocate memory for the event queue.
This patch avoids the problem by introducing a new mode in fsnotify,
where a ring buffer is used to submit events for a group. Each group
has its own ring buffer, and error notifications are submitted
exclusively through it.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/Makefile | 2 +-
fs/notify/group.c | 12 +-
fs/notify/notification.c | 10 ++
fs/notify/ring.c | 199 +++++++++++++++++++++++++++++++
include/linux/fsnotify_backend.h | 37 +++++-
5 files changed, 255 insertions(+), 5 deletions(-)
create mode 100644 fs/notify/ring.c
diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index 63a4b8828df4..61dae1e90f2d 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_FSNOTIFY) += fsnotify.o notification.o group.o mark.o \
- fdinfo.o
+ fdinfo.o ring.o
obj-y += dnotify/
obj-y += inotify/
diff --git a/fs/notify/group.c b/fs/notify/group.c
index 08acb1afc0c2..b99b3de36696 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -81,7 +81,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
* notification against this group. So clearing the notification queue
* of all events is reliable now.
*/
- fsnotify_flush_notify(group);
+ if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+ fsnotify_free_ring_buffer(group);
+ else
+ fsnotify_flush_notify(group);
/*
* Destroy overflow event (we cannot use fsnotify_destroy_event() as
@@ -136,6 +139,13 @@ static struct fsnotify_group *__fsnotify_alloc_group(
group->ops = ops;
group->flags = flags;
+ if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
+ if (fsnotify_create_ring_buffer(group)) {
+ kfree(group);
+ return ERR_PTR(-ENOMEM);
+ }
+ }
+
return group;
}
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 75d79d6d3ef0..32f97e7b7a80 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -51,6 +51,10 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
{
assert_spin_locked(&group->notification_lock);
+
+ if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+ return fsnotify_ring_notify_queue_is_empty(group);
+
return list_empty(&group->notification_list) ? true : false;
}
@@ -132,6 +136,9 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group,
struct fsnotify_event *event)
{
assert_spin_locked(&group->notification_lock);
+
+ if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+ return;
/*
* We need to init list head for the case of overflow event so that
* check in fsnotify_add_event() works
@@ -166,6 +173,9 @@ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
{
assert_spin_locked(&group->notification_lock);
+ if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+ return fsnotify_ring_peek_first_event(group);
+
return list_first_entry(&group->notification_list,
struct fsnotify_event, list);
}
diff --git a/fs/notify/ring.c b/fs/notify/ring.c
new file mode 100644
index 000000000000..75e8af1f8d80
--- /dev/null
+++ b/fs/notify/ring.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/types.h>
+#include <linux/fsnotify.h>
+#include <linux/memcontrol.h>
+
+#define INVALID_RING_SLOT -1
+
+#define FSNOTIFY_RING_PAGES 16
+
+#define NEXT_SLOT(cur, len, ring_size) ((cur + len) & (ring_size-1))
+#define NEXT_PAGE(cur, ring_size) (round_up(cur, PAGE_SIZE) & (ring_size-1))
+
+bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group)
+{
+ assert_spin_locked(&group->notification_lock);
+
+ if (group->ring_buffer.tail == group->ring_buffer.head)
+ return true;
+ return false;
+}
+
+struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group)
+{
+ u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+ struct fsnotify_event *fsn;
+ char *kaddr;
+ u64 tail;
+
+ assert_spin_locked(&group->notification_lock);
+
+again:
+ tail = group->ring_buffer.tail;
+
+ if ((PAGE_SIZE - (tail & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
+ group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
+ goto again;
+ }
+
+ kaddr = kmap_atomic(group->ring_buffer.pages[tail / PAGE_SIZE]);
+ if (!kaddr)
+ return NULL;
+ fsn = (struct fsnotify_event *) (kaddr + (tail & (PAGE_SIZE-1)));
+
+ if (fsn->slot_len == INVALID_RING_SLOT) {
+ group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
+ kunmap_atomic(kaddr);
+ goto again;
+ }
+
+ /* will be unmapped when entry is consumed. */
+ return fsn;
+}
+
+void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
+ struct fsnotify_event *event)
+{
+ u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+ u64 new_tail = NEXT_SLOT(group->ring_buffer.tail, event->slot_len, ring_size);
+
+ kunmap_atomic(event);
+
+ pr_debug("%s: group=%p tail=%llx->%llx ring_size=%llu\n", __func__,
+ group, group->ring_buffer.tail, new_tail, ring_size);
+
+ WRITE_ONCE(group->ring_buffer.tail, new_tail);
+}
+
+struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
+ size_t size)
+ __acquires(&group->notification_lock)
+{
+ struct fsnotify_event *fsn;
+ u64 head, tail;
+ u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+ u64 new_head;
+ void *kaddr;
+
+ if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
+ return ERR_PTR(-EINVAL);
+
+ pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
+ ring_size, size);
+
+ spin_lock(&group->notification_lock);
+again:
+ head = group->ring_buffer.head;
+ tail = group->ring_buffer.tail;
+ new_head = NEXT_SLOT(head, size, ring_size);
+
+ /* head would catch up to tail, corrupting an entry. */
+ if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
+ fsn = ERR_PTR(-ENOMEM);
+ goto err;
+ }
+
+ /*
+ * Not even a skip message fits in the page. We can detect the
+ * lack of space. Move on to the next page.
+ */
+ if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
+ /* Start again on next page */
+ group->ring_buffer.head = NEXT_PAGE(head, ring_size);
+ goto again;
+ }
+
+ kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
+ if (!kaddr) {
+ fsn = ERR_PTR(-EFAULT);
+ goto err;
+ }
+
+ fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
+
+ if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
+ /*
+ * No room in the current page. Add a fake entry
+ * consuming the end of the page to avoid splitting event
+ * structure.
+ */
+ fsn->slot_len = INVALID_RING_SLOT;
+ kunmap_atomic(kaddr);
+ /* Start again on the next page */
+ group->ring_buffer.head = NEXT_PAGE(head, ring_size);
+
+ goto again;
+ }
+ fsn->slot_len = size;
+
+ return fsn;
+
+err:
+ spin_unlock(&group->notification_lock);
+ return fsn;
+}
+
+void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn)
+ __releases(&group->notification_lock)
+{
+ u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+ u64 head = group->ring_buffer.head;
+ u64 new_head = NEXT_SLOT(head, fsn->slot_len, ring_size);
+
+ pr_debug("%s: group=%p head=%llx->%llx ring_size=%llu\n", __func__,
+ group, head, new_head, ring_size);
+
+ kunmap_atomic(fsn);
+ group->ring_buffer.head = new_head;
+
+ spin_unlock(&group->notification_lock);
+
+ wake_up(&group->notification_waitq);
+ kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
+
+}
+
+void fsnotify_free_ring_buffer(struct fsnotify_group *group)
+{
+ int i;
+
+ for (i = 0; i < group->ring_buffer.nr_pages; i++)
+ __free_page(group->ring_buffer.pages[i]);
+ kfree(group->ring_buffer.pages);
+ group->ring_buffer.nr_pages = 0;
+}
+
+int fsnotify_create_ring_buffer(struct fsnotify_group *group)
+{
+ int nr_pages = FSNOTIFY_RING_PAGES;
+ int i;
+
+ pr_debug("%s: group=%p pages=%d\n", __func__, group, nr_pages);
+
+ group->ring_buffer.pages = kmalloc_array(nr_pages, sizeof(struct page *),
+ GFP_KERNEL);
+ if (!group->ring_buffer.pages)
+ return -ENOMEM;
+
+ group->ring_buffer.head = 0;
+ group->ring_buffer.tail = 0;
+
+ for (i = 0; i < nr_pages; i++) {
+ group->ring_buffer.pages[i] = alloc_page(GFP_KERNEL);
+ if (!group->ring_buffer.pages[i])
+ goto err_dealloc;
+ }
+
+ group->ring_buffer.nr_pages = nr_pages;
+
+ return 0;
+
+err_dealloc:
+ for (--i; i >= 0; i--)
+ __free_page(group->ring_buffer.pages[i]);
+ kfree(group->ring_buffer.pages);
+ group->ring_buffer.nr_pages = 0;
+ return -ENOMEM;
+}
+
+
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 190c6a402e98..a1a4dd69e5ed 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -74,6 +74,8 @@
#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
FS_OPEN_EXEC_PERM)
+#define FSN_SUBMISSION_RING_BUFFER 0x00000080
+
/*
* This is a list of all events that may get sent to a parent that is watching
* with flag FS_EVENT_ON_CHILD based on fs event on a child of that directory.
@@ -166,7 +168,11 @@ struct fsnotify_ops {
* listener this structure is where you need to be adding fields.
*/
struct fsnotify_event {
- struct list_head list;
+ union {
+ struct list_head list;
+ int slot_len;
+ };
+
unsigned long objectid; /* identifier for queue merges */
};
@@ -191,7 +197,21 @@ struct fsnotify_group {
/* needed to send notification to userspace */
spinlock_t notification_lock; /* protect the notification_list */
- struct list_head notification_list; /* list of event_holder this group needs to send to userspace */
+
+ union {
+ /*
+ * list of event_holder this group needs to send to
+ * userspace. Either a linked list (default), or a ring
+ * buffer(FSN_SUBMISSION_RING_BUFFER).
+ */
+ struct list_head notification_list;
+ struct {
+ struct page **pages;
+ int nr_pages;
+ u64 head;
+ u64 tail;
+ } ring_buffer;
+ };
wait_queue_head_t notification_waitq; /* read() on the notification file blocks on this waitq */
unsigned int q_len; /* events on the queue */
unsigned int max_events; /* maximum events allowed on the list */
@@ -492,6 +512,16 @@ extern int fsnotify_add_event(struct fsnotify_group *group,
struct fsnotify_event *event,
int (*merge)(struct list_head *,
struct fsnotify_event *));
+
+extern int fsnotify_create_ring_buffer(struct fsnotify_group *group);
+extern void fsnotify_free_ring_buffer(struct fsnotify_group *group);
+extern struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
+ size_t size);
+extern void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
+ struct fsnotify_event *event);
+extern bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group);
+struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group);
+extern void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn);
/* Queue overflow event to a notification group */
static inline void fsnotify_queue_overflow(struct fsnotify_group *group)
{
@@ -583,7 +613,8 @@ static inline void fsnotify_init_event(struct fsnotify_group *group,
struct fsnotify_event *event,
unsigned long objectid)
{
- INIT_LIST_HEAD(&event->list);
+ if (!(group->flags & FSN_SUBMISSION_RING_BUFFER))
+ INIT_LIST_HEAD(&event->list);
event->objectid = objectid;
}
--
2.31.0
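A note on the slot arithmetic in fs/notify/ring.c above. The sketch below is a userspace model, not kernel code, of how NEXT_SLOT and NEXT_PAGE wrap offsets and how a slot that would straddle a page boundary is detected. The model_* names, the 4 KiB page size and the use of C11 static_assert are assumptions made for illustration; only the 16-page ring and the mask arithmetic are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  4096u  /* assumption: 4 KiB pages */
#define MODEL_RING_PAGES 16u    /* mirrors FSNOTIFY_RING_PAGES */
#define MODEL_RING_SIZE  ((uint64_t)MODEL_RING_PAGES * MODEL_PAGE_SIZE)

/* The mask-based wrap below only works for power-of-two ring sizes. */
static_assert((MODEL_RING_SIZE & (MODEL_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two");

/* Advance an offset by len bytes, wrapping at the ring size (NEXT_SLOT). */
static uint64_t model_next_slot(uint64_t cur, uint64_t len)
{
	return (cur + len) & (MODEL_RING_SIZE - 1);
}

/*
 * Round an offset up to a page boundary, wrapping at the ring size
 * (NEXT_PAGE). An already aligned offset is returned unchanged.
 */
static uint64_t model_next_page(uint64_t cur)
{
	uint64_t up = (cur + MODEL_PAGE_SIZE - 1) &
		      ~((uint64_t)MODEL_PAGE_SIZE - 1);

	return up & (MODEL_RING_SIZE - 1);
}

/*
 * Mirrors the page check in the slot allocator: compare the page of the
 * current offset with the page of the advanced offset. A slot ending
 * exactly on a page boundary is also reported as crossing.
 */
static bool model_crosses_page(uint64_t off, uint64_t len)
{
	return (off / MODEL_PAGE_SIZE) !=
	       (model_next_slot(off, len) / MODEL_PAGE_SIZE);
}
```

With these definitions, model_next_slot(MODEL_RING_SIZE - 16, 32) wraps around to offset 16, and model_next_page(100) yields 4096.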
* [PATCH RFC 06/15] fanotify: Support submission through ring buffer
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (4 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 6:02 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
` (9 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
This adds support for the ring buffer mode in fanotify. It is enabled
by a new flag FAN_PREALLOC_QUEUE passed to fanotify_init. If this flag
is enabled, the group only allows marks that support the ring buffer
submission. In a following patch, FAN_ERROR will make use of this
mechanism.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
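A note on the flag handling described above. The following is a minimal userspace model of the two checks this patch adds, one in fanotify_init() and one in do_fanotify_mark(); it is a sketch, not the kernel code. The model_* names are made up, the flag values are copied from this patch (FAN_PREALLOC_QUEUE is an RFC flag, not an established uapi one), and returning -EINVAL for a rejected mark mask is an assumption about the value of ret at that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch. */
#define MODEL_FAN_UNLIMITED_QUEUE 0x00000010u
#define MODEL_FAN_PREALLOC_QUEUE  0x00000080u
/* Events allowed on a ring-buffer group; empty (0) as of this patch. */
#define MODEL_SUBMISSION_BUFFER_EVENTS 0u

static_assert((MODEL_FAN_PREALLOC_QUEUE & MODEL_FAN_UNLIMITED_QUEUE) == 0,
	      "the two queue flags must be distinct bits");

/* Model of the new fanotify_init() checks: 0 or a negative errno. */
static int model_check_init_flags(uint32_t flags, bool has_cap_sys_admin)
{
	if (flags & MODEL_FAN_PREALLOC_QUEUE) {
		/* The preallocated queue is restricted to administrators. */
		if (!has_cap_sys_admin)
			return -EPERM;
		/* A fixed-size ring cannot also be an unlimited queue. */
		if (flags & MODEL_FAN_UNLIMITED_QUEUE)
			return -EINVAL;
	}
	return 0;
}

/*
 * Model of the new do_fanotify_mark() check: a ring-buffer group rejects
 * any mask bit outside the allowed set. -EINVAL is an assumption.
 */
static int model_check_mark_mask(bool ring_group, uint64_t mask,
				 uint64_t allowed)
{
	if (ring_group && (mask & ~allowed))
		return -EINVAL;
	return 0;
}
```

Since the allowed set is 0 at this point in the series, every non-empty mask is rejected on a ring-buffer group until a later patch extends it.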
fs/notify/fanotify/fanotify.c | 77 +++++++++++++++++++---------
fs/notify/fanotify/fanotify_user.c | 81 ++++++++++++++++++------------
include/linux/fanotify.h | 5 +-
include/uapi/linux/fanotify.h | 1 +
4 files changed, 105 insertions(+), 59 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e3669d8a4a64..98591a8155a7 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -612,6 +612,26 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
return event;
}
+static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *group,
+ u32 mask, const void *data,
+ int data_type)
+{
+ size_t size = 0;
+
+ pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
+
+ return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
+}
+
+static void fanotify_ring_write_event(struct fsnotify_group *group,
+ struct fanotify_event *event, u32 mask,
+ const void *data, __kernel_fsid_t *fsid)
+{
+ fanotify_init_event(group, event, 0, mask);
+
+ event->pid = get_pid(task_tgid(current));
+}
+
/*
* Get cached fsid of the filesystem containing the object from any connector.
* All connectors are supposed to have the same fsid, but we do not verify that
@@ -701,31 +721,38 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
return 0;
}
- event = fanotify_alloc_event(group, mask, data, data_type, dir,
- file_name, &fsid);
- ret = -ENOMEM;
- if (unlikely(!event)) {
- /*
- * We don't queue overflow events for permission events as
- * there the access is denied and so no event is in fact lost.
- */
- if (!fanotify_is_perm_event(mask))
- fsnotify_queue_overflow(group);
- goto finish;
- }
-
- fsn_event = &event->fse;
- ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
- if (ret) {
- /* Permission events shouldn't be merged */
- BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
- /* Our event wasn't used in the end. Free it. */
- fsnotify_destroy_event(group, fsn_event);
-
- ret = 0;
- } else if (fanotify_is_perm_event(mask)) {
- ret = fanotify_get_response(group, FANOTIFY_PERM(event),
- iter_info);
+ if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
+ event = fanotify_ring_get_slot(group, mask, data, data_type);
+ if (IS_ERR(event))
+ return PTR_ERR(event);
+ fanotify_ring_write_event(group, event, mask, data, &fsid);
+ fsnotify_ring_commit_slot(group, &event->fse);
+ } else {
+ event = fanotify_alloc_event(group, mask, data, data_type, dir,
+ file_name, &fsid);
+ ret = -ENOMEM;
+ if (unlikely(!event)) {
+ /*
+ * We don't queue overflow events for permission events as
+ * there the access is denied and so no event is in fact lost.
+ */
+ if (!fanotify_is_perm_event(mask))
+ fsnotify_queue_overflow(group);
+ goto finish;
+ }
+ fsn_event = &event->fse;
+ ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
+ if (ret) {
+ /* Permission events shouldn't be merged */
+ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
+ /* Our event wasn't used in the end. Free it. */
+ fsnotify_destroy_event(group, fsn_event);
+
+ ret = 0;
+ } else if (fanotify_is_perm_event(mask)) {
+ ret = fanotify_get_response(group, FANOTIFY_PERM(event),
+ iter_info);
+ }
}
finish:
if (fanotify_is_perm_event(mask))
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index fe605359af88..5031198bf7db 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -521,7 +521,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
* Permission events get queued to wait for response. Other
* events can be destroyed now.
*/
- if (!fanotify_is_perm_event(event->mask)) {
+ if (group->fanotify_data.flags & FAN_PREALLOC_QUEUE) {
+ fsnotify_ring_buffer_consume_event(group, &event->fse);
+ } else if (!fanotify_is_perm_event(event->mask)) {
fsnotify_destroy_event(group, &event->fse);
} else {
if (ret <= 0) {
@@ -587,40 +589,39 @@ static int fanotify_release(struct inode *ignored, struct file *file)
*/
fsnotify_group_stop_queueing(group);
- /*
- * Process all permission events on access_list and notification queue
- * and simulate reply from userspace.
- */
- spin_lock(&group->notification_lock);
- while (!list_empty(&group->fanotify_data.access_list)) {
- struct fanotify_perm_event *event;
-
- event = list_first_entry(&group->fanotify_data.access_list,
- struct fanotify_perm_event, fae.fse.list);
- list_del_init(&event->fae.fse.list);
- finish_permission_event(group, event, FAN_ALLOW);
+ if (!(group->flags & FSN_SUBMISSION_RING_BUFFER)) {
+ /*
+ * Process all permission events on access_list and notification queue
+ * and simulate reply from userspace.
+ */
spin_lock(&group->notification_lock);
- }
-
- /*
- * Destroy all non-permission events. For permission events just
- * dequeue them and set the response. They will be freed once the
- * response is consumed and fanotify_get_response() returns.
- */
- while (!fsnotify_notify_queue_is_empty(group)) {
- struct fanotify_event *event;
-
- event = FANOTIFY_E(fsnotify_remove_first_event(group));
- if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
- spin_unlock(&group->notification_lock);
- fsnotify_destroy_event(group, &event->fse);
- } else {
- finish_permission_event(group, FANOTIFY_PERM(event),
- FAN_ALLOW);
+ while (!list_empty(&group->fanotify_data.access_list)) {
+ struct fanotify_perm_event *event;
+ event = list_first_entry(&group->fanotify_data.access_list,
+ struct fanotify_perm_event, fae.fse.list);
+ list_del_init(&event->fae.fse.list);
+ finish_permission_event(group, event, FAN_ALLOW);
+ spin_lock(&group->notification_lock);
}
- spin_lock(&group->notification_lock);
+ /*
+ * Destroy all non-permission events. For permission events just
+ * dequeue them and set the response. They will be freed once the
+ * response is consumed and fanotify_get_response() returns.
+ */
+ while (!fsnotify_notify_queue_is_empty(group)) {
+ struct fanotify_event *event;
+ event = FANOTIFY_E(fsnotify_remove_first_event(group));
+ if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
+ spin_unlock(&group->notification_lock);
+ fsnotify_destroy_event(group, &event->fse);
+ } else {
+ finish_permission_event(group, FANOTIFY_PERM(event),
+ FAN_ALLOW);
+ }
+ spin_lock(&group->notification_lock);
+ }
+ spin_unlock(&group->notification_lock);
}
- spin_unlock(&group->notification_lock);
/* Response for all permission events it set, wakeup waiters */
wake_up(&group->fanotify_data.access_waitq);
@@ -981,6 +982,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
if (flags & FAN_NONBLOCK)
f_flags |= O_NONBLOCK;
+ if (flags & FAN_PREALLOC_QUEUE) {
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (flags & FAN_UNLIMITED_QUEUE)
+ return -EINVAL;
+
+ fsn_flags = FSN_SUBMISSION_RING_BUFFER;
+ }
+
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
if (IS_ERR(group)) {
@@ -1223,6 +1234,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
goto fput_and_out;
}
+ if ((group->flags & FSN_SUBMISSION_RING_BUFFER) &&
+ (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
+ goto fput_and_out;
+
ret = fanotify_find_path(dfd, pathname, &path, flags,
(mask & ALL_FSNOTIFY_EVENTS), obj_type);
if (ret)
@@ -1327,7 +1342,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
*/
static int __init fanotify_user_setup(void)
{
- BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
+ BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 3e9c56ee651f..5a4cefb4b1c3 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -23,7 +23,8 @@
#define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
FAN_REPORT_TID | \
FAN_CLOEXEC | FAN_NONBLOCK | \
- FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
+ FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
+ FAN_PREALLOC_QUEUE)
#define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \
FAN_MARK_FILESYSTEM)
@@ -71,6 +72,8 @@
FANOTIFY_PERM_EVENTS | \
FAN_Q_OVERFLOW | FAN_ONDIR)
+#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
+
#define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \
FANOTIFY_EVENT_FLAGS)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index fbf9c5c7dd59..b283531549f1 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -49,6 +49,7 @@
#define FAN_UNLIMITED_QUEUE 0x00000010
#define FAN_UNLIMITED_MARKS 0x00000020
#define FAN_ENABLE_AUDIT 0x00000040
+#define FAN_PREALLOC_QUEUE 0x00000080
/* Flags to determine fanotify event format */
#define FAN_REPORT_TID 0x00000100 /* event->pid is thread id */
--
2.31.0
* [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (5 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 8:39 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
` (8 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Expose a new type of fsnotify event for filesystems to report errors for
userspace monitoring tools. fanotify will send this type of
notification for FAN_ERROR marks.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/fsnotify.c | 2 +-
include/linux/fsnotify_backend.h | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 30d422b8c0fc..9fff35e67b37 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -558,7 +558,7 @@ static __init int fsnotify_init(void)
{
int ret;
- BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26);
ret = init_srcu_struct(&fsnotify_mark_srcu);
if (ret)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index a1a4dd69e5ed..f850bfbe30d4 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -48,6 +48,8 @@
#define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */
#define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */
+#define FS_ERROR 0x00100000 /* Used for filesystem error reporting */
+
#define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */
/*
* Set on inode mark that cares about things that happen to its children.
@@ -74,6 +76,8 @@
#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
FS_OPEN_EXEC_PERM)
+#define ALL_FSNOTIFY_ERROR_EVENTS FS_ERROR
+
#define FSN_SUBMISSION_RING_BUFFER 0x00000080
/*
@@ -95,6 +99,7 @@
/* Events that can be reported to backends */
#define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
+ ALL_FSNOTIFY_ERROR_EVENTS | \
FS_EVENTS_POSS_ON_CHILD | \
FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \
FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED)
@@ -272,6 +277,17 @@ enum fsnotify_data_type {
FSNOTIFY_EVENT_NONE,
FSNOTIFY_EVENT_PATH,
FSNOTIFY_EVENT_INODE,
+ FSNOTIFY_EVENT_ERROR,
+};
+
+struct fs_error_report {
+ int error;
+
+ int line;
+ const char *function;
+
+ size_t fs_data_size;
+ void *fs_data;
};
static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
--
2.31.0
* [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (6 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 6:49 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
` (7 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
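A note on how the helper below might be used. This is a userspace sketch, under the assumption that a filesystem would wrap the helper in a macro that captures the call site with __func__ and __LINE__. The model_* names, the MODEL_REPORT_ERROR macro and the sample blob are hypothetical; only the field layout of the report follows struct fs_error_report from the previous patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for struct fs_error_report. */
struct model_error_report {
	int error;
	int line;
	const char *function;
	size_t fs_data_size;
	void *fs_data;
};

/* Last report seen by the stub; stands in for the fsnotify() call. */
static struct model_error_report model_last_report;

/* Packs its arguments into a report the way the helper does. */
static void model_error_event(int error, const char *function, int line,
			      void *fs_data, size_t fs_data_size)
{
	struct model_error_report report = {
		.error = error,
		.line = line,
		.function = function,
		.fs_data_size = fs_data_size,
		.fs_data = fs_data,
	};

	/* The model requires a call-site name. */
	assert(function != NULL);
	model_last_report = report;
}

/* Hypothetical wrapper that records where the error was detected. */
#define MODEL_REPORT_ERROR(err, data, size) \
	model_error_event((err), __func__, __LINE__, (data), (size))

/* Hypothetical call site inside a filesystem. */
static void model_fs_detects_corruption(void)
{
	static char blob[] = "inode 12 bad checksum";

	MODEL_REPORT_ERROR(-EIO, blob, sizeof(blob));
}
```

After model_fs_detects_corruption() runs, model_last_report.function points at the caller's name and model_last_report.line holds the line of the macro invocation.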
include/linux/fsnotify.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index f8acddcf54fb..b3ac1a9d0d4d 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -317,4 +317,19 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
fsnotify_dentry(dentry, mask);
}
+static inline void fsnotify_error_event(int error, struct inode *dir,
+ const char *function, int line,
+ void *fs_data, int fs_data_size)
+{
+ struct fs_error_report report = {
+ .error = error,
+ .line = line,
+ .function = function,
+ .fs_data_size = fs_data_size,
+ .fs_data = fs_data,
+ };
+
+ fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, dir, NULL, NULL, 0);
+}
+
#endif /* _LINUX_FS_NOTIFY_H */
--
2.31.0
* [PATCH RFC 09/15] fanotify: Introduce generic error record
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (7 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 7:01 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
` (6 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
This record describes a filesystem error in a filesystem-agnostic way.
It will be sent back to userspace in response to a FSNOTIFY_EVENT_ERROR
for groups with the FAN_ERROR mark.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
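A note on consuming the record described above. The sketch below shows how a monitoring tool might validate and decode one generic error record from a read buffer. It is a userspace model only: the model_* structs are local mirrors of the layouts proposed in this patch (they are assumptions, not an established uapi), and the type value 4 is the one defined in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fanotify_event_info_header. */
struct model_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

/* Local mirror of __kernel_fsid_t. */
struct model_fsid {
	int32_t val[2];
};

/* Local mirror of the proposed struct fanotify_event_info_error. */
struct model_info_error {
	struct model_info_header hdr;
	int32_t version;
	int32_t error;
	struct model_fsid fsid;
};

#define MODEL_INFO_TYPE_ERROR   4 /* value defined in this patch */
#define MODEL_INFO_ERROR_VERS_1 1

static_assert(sizeof(struct model_info_error) == 20,
	      "unexpected padding in the error record");

/*
 * Decode one error record from buf. Returns 0 and fills *out on success,
 * -1 if the buffer does not hold a well-formed version 1 record.
 */
static int model_parse_error_record(const void *buf, size_t avail,
				    struct model_info_error *out)
{
	struct model_info_error rec;

	if (avail < sizeof(rec))
		return -1;
	/* Copy out first so the buffer need not be aligned. */
	memcpy(&rec, buf, sizeof(rec));

	if (rec.hdr.info_type != MODEL_INFO_TYPE_ERROR)
		return -1;
	if (rec.hdr.len < sizeof(rec) || rec.hdr.len > avail)
		return -1;
	if (rec.version != MODEL_INFO_ERROR_VERS_1)
		return -1;

	*out = rec;
	return 0;
}
```

Checking hdr.len against both the fixed record size and the bytes available keeps the parser from reading past a truncated buffer.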
fs/notify/fanotify/fanotify.h | 16 ++++++++++++++++
fs/notify/fanotify/fanotify_user.c | 28 ++++++++++++++++++++++++++++
include/uapi/linux/fanotify.h | 10 ++++++++++
3 files changed, 54 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 47299e3d6efd..4cb9dd31f084 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -179,6 +179,22 @@ FANOTIFY_NE(struct fanotify_event *event)
return container_of(event, struct fanotify_name_event, fae);
}
+struct fanotify_error_event {
+ struct fanotify_event fae;
+ int error;
+ __kernel_fsid_t fsid;
+
+ int fs_data_size;
+ /* Must be the last item in the structure */
+ char fs_data[0];
+};
+
+static inline struct fanotify_error_event *
+FANOTIFY_EE(struct fanotify_event *event)
+{
+ return container_of(event, struct fanotify_error_event, fae);
+}
+
static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
{
if (event->type == FANOTIFY_EVENT_TYPE_FID)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 5031198bf7db..21162d347bd1 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -64,6 +64,11 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
}
+static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
+{
+ return sizeof(struct fanotify_event_info_error);
+}
+
static size_t fanotify_event_len(struct fanotify_event *event,
unsigned int fid_mode)
{
@@ -232,6 +237,29 @@ static int process_access_response(struct fsnotify_group *group,
return -ENOENT;
}
+static size_t copy_error_info_to_user(struct fanotify_error_event *fee,
+ char __user *buf, int count)
+{
+ struct fanotify_event_info_error info;
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
+ info.hdr.pad = 0;
+ info.hdr.len = fanotify_error_info_len(fee);
+
+ if (WARN_ON(count < info.hdr.len))
+ return -EFAULT;
+
+ info.version = FANOTIFY_EVENT_INFO_ERROR_VERS_1;
+ info.error = fee->error;
+ info.fsid = fee->fsid;
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ return info.hdr.len;
+
+}
+
static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name, size_t name_len,
char __user *buf, size_t count)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index b283531549f1..cc9a1fa80e30 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -124,6 +124,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_FID 1
#define FAN_EVENT_INFO_TYPE_DFID_NAME 2
#define FAN_EVENT_INFO_TYPE_DFID 3
+#define FAN_EVENT_INFO_TYPE_ERROR 4
/* Variable length info record following event metadata */
struct fanotify_event_info_header {
@@ -149,6 +150,15 @@ struct fanotify_event_info_fid {
unsigned char handle[0];
};
+#define FANOTIFY_EVENT_INFO_ERROR_VERS_1 1
+
+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ int version;
+ int error;
+ __kernel_fsid_t fsid;
+};
+
struct fanotify_response {
__s32 fd;
__u32 response;
--
2.31.0
* [PATCH RFC 10/15] fanotify: Introduce code location record
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (8 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 7:11 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
` (5 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
This patch introduces an optional info record that describes the
source (as in the region of the source code where an event was
initiated). This record is not produced for other types of existing
notifications, but it is optionally enabled for FAN_ERROR notifications.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
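A note on the layout of the location record described above. The sketch below is a userspace model of the length calculation and of packing the variable-length record: a fixed part followed by the NUL-terminated function name. The model_* names are hypothetical, the structs mirror the layout proposed in this patch, and type value 5 is the one defined here. Unlike the patch, the model zeroes the header's pad byte explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fanotify_event_info_header. */
struct model_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

/* Local mirror of the proposed struct fanotify_event_info_location. */
struct model_info_location {
	struct model_info_header hdr;
	int32_t line;
	char function[]; /* NUL-terminated function name follows */
};

#define MODEL_INFO_TYPE_LOCATION 5 /* value defined in this patch */

static_assert(sizeof(struct model_info_location) == 8,
	      "unexpected padding in the location record");

/* Fixed part plus the name and its NUL terminator; 0 if there is no name. */
static size_t model_location_info_len(const char *function)
{
	if (!function)
		return 0;
	return sizeof(struct model_info_location) + strlen(function) + 1;
}

/*
 * Pack a location record into buf. Returns the bytes written, 0 if there
 * is nothing to emit, or -1 if the record does not fit.
 */
static long model_pack_location(void *buf, size_t count, int line,
				const char *function)
{
	size_t len = model_location_info_len(function);
	struct model_info_header hdr = { MODEL_INFO_TYPE_LOCATION, 0, 0 };
	int32_t line32 = line;
	char *p = buf;

	if (!len)
		return 0;
	if (len > count || len > UINT16_MAX)
		return -1;

	hdr.len = (uint16_t)len;
	memcpy(p, &hdr, sizeof(hdr));
	memcpy(p + offsetof(struct model_info_location, line),
	       &line32, sizeof(line32));
	/* Copies the name together with its NUL terminator. */
	memcpy(p + offsetof(struct model_info_location, function),
	       function, len - sizeof(struct model_info_location));
	return (long)len;
}
```

For the name "ext4_error" the record is 8 + 10 + 1 = 19 bytes long.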
fs/notify/fanotify/fanotify.h | 6 +++++
fs/notify/fanotify/fanotify_user.c | 35 ++++++++++++++++++++++++++++++
include/uapi/linux/fanotify.h | 7 ++++++
3 files changed, 48 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 4cb9dd31f084..0d1b4cb8b005 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -161,6 +161,11 @@ struct fanotify_fid_event {
unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN];
};
+struct fanotify_event_location {
+ int line;
+ const char *function;
+};
+
static inline struct fanotify_fid_event *
FANOTIFY_FE(struct fanotify_event *event)
{
@@ -183,6 +188,7 @@ struct fanotify_error_event {
struct fanotify_event fae;
int error;
__kernel_fsid_t fsid;
+ struct fanotify_event_location loc;
int fs_data_size;
/* Must be the last item in the structure */
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 21162d347bd1..cb79a4a8e6fb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -69,6 +69,16 @@ static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
return sizeof(struct fanotify_event_info_error);
}
+static size_t fanotify_location_info_len(const struct fanotify_event_location *loc)
+{
+ if (!loc->function)
+ return 0;
+
+ /* Includes NUL byte at end of loc->function */
+ return (sizeof(struct fanotify_event_info_location) +
+ strlen(loc->function) + 1);
+}
+
static size_t fanotify_event_len(struct fanotify_event *event,
unsigned int fid_mode)
{
@@ -260,6 +270,31 @@ static size_t copy_error_info_to_user(struct fanotify_error_event *fee,
}
+static size_t copy_location_info_to_user(struct fanotify_event_location *location,
+ char __user *buf, int count)
+{
+ size_t len = fanotify_location_info_len(location);
+ size_t tail = len - sizeof(struct fanotify_event_info_location);
+ struct fanotify_event_info_location info;
+
+ if (!len)
+ return 0;
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_LOCATION;
+ info.hdr.len = len;
+ info.line = location->line;
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ buf += sizeof(info);
+
+ if (copy_to_user(buf, location->function, tail))
+ return -EFAULT;
+
+ return info.hdr.len;
+}
+
static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name, size_t name_len,
char __user *buf, size_t count)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index cc9a1fa80e30..67217756dac9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -125,6 +125,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_DFID_NAME 2
#define FAN_EVENT_INFO_TYPE_DFID 3
#define FAN_EVENT_INFO_TYPE_ERROR 4
+#define FAN_EVENT_INFO_TYPE_LOCATION 5
/* Variable length info record following event metadata */
struct fanotify_event_info_header {
@@ -159,6 +160,12 @@ struct fanotify_event_info_error {
__kernel_fsid_t fsid;
};
+struct fanotify_event_info_location {
+ struct fanotify_event_info_header hdr;
+ int line;
+ char function[0];
+};
+
struct fanotify_response {
__s32 fd;
__u32 response;
--
2.31.0
* [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (9 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-27 7:12 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
` (4 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Allow an FS_ERROR notification to send a filesystem-provided blob
back to userspace. This is useful for filesystems that want to provide
debug information for recovery tools.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
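A note on reading the records once the filesystem blob is added. With three record types in play, a reader has to walk the records by their hdr.len. The sketch below is a userspace model of such a walker; the model_* names are hypothetical, and the type values are the ones this patch assigns (LOCATION 4, ERROR 5, FSDATA 6).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fanotify_event_info_header. */
struct model_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

static_assert(sizeof(struct model_info_header) == 4,
	      "unexpected padding in the info header");

/* Type values as renumbered by this patch. */
enum {
	MODEL_INFO_TYPE_LOCATION = 4,
	MODEL_INFO_TYPE_ERROR = 5,
	MODEL_INFO_TYPE_FSDATA = 6,
};

struct model_record_counts {
	unsigned int error;
	unsigned int location;
	unsigned int fsdata;
	unsigned int unknown;
	size_t fsdata_bytes; /* blob payload bytes, headers excluded */
};

/*
 * Walk the info records in buf[0..len). Returns 0 when the records end
 * exactly at len, -1 if a header or record length is inconsistent.
 */
static int model_walk_records(const unsigned char *buf, size_t len,
			      struct model_record_counts *c)
{
	size_t off = 0;

	memset(c, 0, sizeof(*c));
	while (off < len) {
		struct model_info_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;

		switch (hdr.info_type) {
		case MODEL_INFO_TYPE_ERROR:
			c->error++;
			break;
		case MODEL_INFO_TYPE_LOCATION:
			c->location++;
			break;
		case MODEL_INFO_TYPE_FSDATA:
			c->fsdata++;
			c->fsdata_bytes += hdr.len - sizeof(hdr);
			break;
		default:
			/* Skip record types this reader does not know. */
			c->unknown++;
			break;
		}
		off += hdr.len;
	}
	return 0;
}
```

Skipping unknown types by length, rather than failing, lets an older reader keep working if further record types are added.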
fs/notify/fanotify/fanotify_user.c | 27 +++++++++++++++++++++++++++
include/uapi/linux/fanotify.h | 10 ++++++++--
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index cb79a4a8e6fb..e2f4599dfc25 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -69,6 +69,14 @@ static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
return sizeof(struct fanotify_event_info_error);
}
+static size_t fanotify_error_fsdata_len(struct fanotify_error_event *fee)
+{
+ if (!fee->fs_data_size)
+ return 0;
+
+ return sizeof(struct fanotify_event_info_fsdata) + fee->fs_data_size;
+}
+
static size_t fanotify_location_info_len(const struct fanotify_event_location *loc)
{
if (!loc->function)
@@ -295,6 +303,25 @@ static size_t copy_location_info_to_user(struct fanotify_event_location *locatio
return info.hdr.len;
}
+static ssize_t copy_error_fsdata_info_to_user(struct fanotify_error_event *fee,
+ char __user *buf, int count)
+{
+ struct fanotify_event_info_fsdata info;
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_FSDATA;
+ info.hdr.len = fanotify_error_fsdata_len(fee);
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ buf += sizeof(info);
+
+ if (copy_to_user(buf, fee->fs_data, fee->fs_data_size))
+ return -EFAULT;
+
+ return info.hdr.len;
+}
+
static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name, size_t name_len,
char __user *buf, size_t count)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 67217756dac9..49808c857ee1 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -124,8 +124,9 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_FID 1
#define FAN_EVENT_INFO_TYPE_DFID_NAME 2
#define FAN_EVENT_INFO_TYPE_DFID 3
-#define FAN_EVENT_INFO_TYPE_ERROR 4
-#define FAN_EVENT_INFO_TYPE_LOCATION 5
+#define FAN_EVENT_INFO_TYPE_LOCATION 4
+#define FAN_EVENT_INFO_TYPE_ERROR 5
+#define FAN_EVENT_INFO_TYPE_FSDATA 6
/* Variable length info record following event metadata */
struct fanotify_event_info_header {
@@ -166,6 +167,11 @@ struct fanotify_event_info_location {
char function[0];
};
+struct fanotify_event_info_fsdata {
+ struct fanotify_event_info_header hdr;
+ char data[0];
+};
+
struct fanotify_response {
__s32 fd;
__u32 response;
--
2.31.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (10 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-26 22:45 ` kernel test robot
2021-04-27 7:25 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
` (3 subsequent siblings)
15 siblings, 2 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
The FAN_ERROR mark is used by filesystem-wide monitoring tools to
receive notifications of type FS_ERROR_EVENT, emitted by filesystems when
a problem is detected. The error notification includes a generic error
descriptor, an optional location record and a filesystem specific blob.
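As a rough illustration of what a consumer has to do with such a
notification, the sketch below walks the info records that follow the event
metadata and reports which record types were present. It is a userspace
sketch under assumptions: info_header mirrors the fanotify info header, the
record type values are the ones used in this series, and scan_error_records
is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the fanotify info header layout, for illustration only. */
struct info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

/* Record type values as numbered in this series. */
enum {
	INFO_TYPE_LOCATION = 4,
	INFO_TYPE_ERROR = 5,
	INFO_TYPE_FSDATA = 6,
};

/*
 * Walk the records in buf[0..len) and return a bitmask with bit N set when a
 * record of type N was seen. Return -1 if a record is truncated or claims a
 * length smaller than its own header, which would otherwise loop forever.
 */
static int scan_error_records(const unsigned char *buf, size_t len)
{
	int seen = 0;
	size_t off = 0;

	while (off < len) {
		struct info_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;
		if (hdr.info_type < 8)
			seen |= 1 << hdr.info_type;
		off += hdr.len;
	}
	return seen;
}
```

A caller could then check that the bit for the generic error record is set
before looking at the optional ones.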
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/notify/fanotify/fanotify.c | 48 +++++++++++++++++++----
fs/notify/fanotify/fanotify.h | 8 ++++
fs/notify/fanotify/fanotify_user.c | 63 ++++++++++++++++++++++++++++++
include/linux/fanotify.h | 9 ++++-
include/uapi/linux/fanotify.h | 2 +
5 files changed, 120 insertions(+), 10 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 98591a8155a7..6bae23d42e5e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -240,12 +240,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
__func__, iter_info->report_mask, event_mask, data, data_type);
if (!fid_mode) {
- /* Do we have path to open a file descriptor? */
- if (!path)
- return 0;
- /* Path type events are only relevant for files and dirs */
- if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
- return 0;
+ if (!fanotify_is_error_event(event_mask)) {
+ /* Do we have path to open a file descriptor? */
+ if (!path)
+ return 0;
+ /* Path type events are only relevant for files and dirs */
+ if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
+ return 0;
+ }
} else if (!(fid_mode & FAN_REPORT_FID)) {
/* Do we have a directory inode to report? */
if (!dir && !(event_mask & FS_ISDIR))
@@ -458,6 +460,25 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
return &pevent->fae;
}
+static void fanotify_init_error_event(struct fanotify_event *fae,
+ const struct fs_error_report *report,
+ __kernel_fsid_t *fsid)
+{
+ struct fanotify_error_event *fee;
+
+ fae->type = FANOTIFY_EVENT_TYPE_ERROR;
+ fee = FANOTIFY_EE(fae);
+ fee->error = report->error;
+ fee->fsid = *fsid;
+
+ fee->loc.line = report->line;
+ fee->loc.function = report->function;
+
+ fee->fs_data_size = report->fs_data_size;
+
+ memcpy(&fee->fs_data, report->fs_data, report->fs_data_size);
+}
+
static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id,
__kernel_fsid_t *fsid,
gfp_t gfp)
@@ -618,6 +639,13 @@ static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *grou
{
size_t size = 0;
+ if (fanotify_is_error_event(mask)) {
+ const struct fs_error_report *report = data;
+ size = sizeof(struct fanotify_error_event) + report->fs_data_size;
+ } else {
+ return ERR_PTR(-EINVAL);
+ }
+
pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
@@ -629,6 +657,9 @@ static void fanotify_ring_write_event(struct fsnotify_group *group,
{
fanotify_init_event(group, event, 0, mask);
+ if (fanotify_is_error_event(mask))
+ fanotify_init_error_event(event, data, fsid);
+
event->pid = get_pid(task_tgid(current));
}
@@ -695,8 +726,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
+ BUILD_BUG_ON(FAN_ERROR != FS_ERROR);
- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
mask = fanotify_group_event_mask(group, iter_info, mask, data,
data_type, dir);
@@ -714,7 +746,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
return 0;
}
- if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
+ if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS) || mask == FAN_ERROR) {
fsid = fanotify_get_fsid(iter_info);
/* Racing with mark destruction or creation? */
if (!fsid.val[0] && !fsid.val[1])
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 0d1b4cb8b005..097667be9079 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -135,6 +135,7 @@ enum fanotify_event_type {
FANOTIFY_EVENT_TYPE_PATH,
FANOTIFY_EVENT_TYPE_PATH_PERM,
FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
+ FANOTIFY_EVENT_TYPE_ERROR,
};
struct fanotify_event {
@@ -207,6 +208,8 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
return &FANOTIFY_FE(event)->fsid;
else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
return &FANOTIFY_NE(event)->fsid;
+ else if (event->type == FANOTIFY_EVENT_TYPE_ERROR)
+ return &FANOTIFY_EE(event)->fsid;
else
return NULL;
}
@@ -292,6 +295,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
return container_of(fse, struct fanotify_event, fse);
}
+static inline bool fanotify_is_error_event(u32 mask)
+{
+ return mask & FANOTIFY_ERROR_EVENTS;
+}
+
static inline bool fanotify_event_has_path(struct fanotify_event *event)
{
return event->type == FANOTIFY_EVENT_TYPE_PATH ||
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index e2f4599dfc25..6270083bee36 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -96,6 +96,24 @@ static size_t fanotify_event_len(struct fanotify_event *event,
int fh_len;
int dot_len = 0;
+ if (fanotify_is_error_event(event->mask)) {
+ struct fanotify_error_event *fee = FANOTIFY_EE(event);
+ /*
+ * Error events (FAN_ERROR) have a different format
+ * as follows:
+ * [ event_metadata ]
+ * [ fs-generic error header ]
+ * [ error location (optional) ]
+ * [ fs-specific blob ]
+ */
+ event_len = fanotify_error_info_len(fee);
+ if (fee->loc.function)
+ event_len += fanotify_location_info_len(&fee->loc);
+ if (fee->fs_data)
+ event_len += fanotify_error_fsdata_len(fee);
+ return event_len;
+ }
+
if (!fid_mode)
return event_len;
@@ -322,6 +340,38 @@ static ssize_t copy_error_fsdata_info_to_user(struct fanotify_error_event *fee,
return info.hdr.len;
}
+static int copy_error_event_to_user(struct fanotify_event *event,
+ char __user *buf, int count)
+{
+ struct fanotify_error_event *fee = FANOTIFY_EE(event);
+ ssize_t len;
+ int original_count = count;
+
+ len = copy_error_info_to_user(fee, buf, count);
+ if (len < 0)
+ return -EFAULT;
+ buf += len;
+ count -= len;
+
+ if (fee->loc.function) {
+ len = copy_location_info_to_user(&fee->loc, buf, count);
+ if (len < 0)
+ return len;
+ buf += len;
+ count -= len;
+ }
+
+ if (fee->fs_data_size) {
+ len = copy_error_fsdata_info_to_user(fee, buf, count);
+ if (len < 0)
+ return len;
+ buf += len;
+ count -= len;
+ }
+
+ return original_count - count;
+}
+
static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name, size_t name_len,
char __user *buf, size_t count)
@@ -528,6 +578,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
count -= ret;
}
+ if (fanotify_is_error_event(event->mask))
+ ret = copy_error_event_to_user(event, buf, count);
+
return metadata.event_len;
out_close_fd:
@@ -1328,6 +1381,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
(mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
goto fput_and_out;
+ if (fanotify_is_error_event(mask) &&
+ !(group->flags & FSN_SUBMISSION_RING_BUFFER))
+ goto fput_and_out;
+
ret = fanotify_find_path(dfd, pathname, &path, flags,
(mask & ALL_FSNOTIFY_EVENTS), obj_type);
if (ret)
@@ -1350,6 +1407,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
fsid = &__fsid;
}
+ if (mask & FAN_ERROR) {
+ ret = fanotify_check_path_fsid(&path, &__fsid);
+ if (ret)
+ goto path_put_and_out;
+ fsid = &__fsid;
+ }
/* inode held in place by reference to path; group by fget on fd */
if (mark_type == FAN_MARK_INODE)
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 5a4cefb4b1c3..e08be5fae14a 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -56,9 +56,13 @@
#define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
+#define FANOTIFY_ERROR_EVENTS (FAN_ERROR)
+
/* Events that user can request to be notified on */
#define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
- FANOTIFY_INODE_EVENTS)
+ FANOTIFY_INODE_EVENTS | \
+ FANOTIFY_ERROR_EVENTS)
+
/* Events that require a permission response from user */
#define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
@@ -70,9 +74,10 @@
/* Events that may be reported to user */
#define FANOTIFY_OUTGOING_EVENTS (FANOTIFY_EVENTS | \
FANOTIFY_PERM_EVENTS | \
+ FANOTIFY_ERROR_EVENTS | \
FAN_Q_OVERFLOW | FAN_ONDIR)
-#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
+#define FANOTIFY_SUBMISSION_BUFFER_EVENTS FANOTIFY_ERROR_EVENTS
#define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \
FANOTIFY_EVENT_FLAGS)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 49808c857ee1..ee0ae8b1e20b 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -25,6 +25,8 @@
#define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
#define FAN_OPEN_EXEC_PERM 0x00040000 /* File open/exec in perm check */
+#define FAN_ERROR 0x00100000 /* Filesystem error */
+
#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
#define FAN_ONDIR 0x40000000 /* Event occurred against dir */
--
2.31.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH RFC 13/15] ext4: Send notifications on error
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (11 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
2021-04-26 23:10 ` kernel test robot
` (2 more replies)
2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
` (2 subsequent siblings)
15 siblings, 3 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Send a FS_ERROR message via fsnotify to a userspace monitoring tool
whenever an ext4 error condition is triggered. This follows the existing
error conditions in ext4, so it is hooked to the ext4_error* functions.
It also follows the format of the current dmesg reporting. The
filesystem message is composed mostly of the string that would
otherwise be printed in dmesg.
A new ext4 specific record format is exposed in the uapi, so that a
monitoring tool knows what to expect when listening for errors on an ext4
filesystem.
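To make the record format concrete, here is a hedged userspace sketch of
decoding such a report. The struct mirrors the layout proposed in
ext4-notify.h in this patch; ext4_report, decode_ext4_report,
format_ext4_report and NOT_SET are illustrative names only. It assumes the
kernel side stores -1 in the block field when no block is known, which reads
back as the all-ones value in a 64-bit field.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DESC_LEN 256        /* EXT4_FSN_DESC_LEN in the proposed header */
#define NOT_SET  UINT64_MAX /* what -1L becomes once stored in a u64 field */

/* Userspace mirror of the proposed struct ext4_error_inode_report. */
struct ext4_report {
	uint64_t inode;
	uint64_t block;
	char desc[DESC_LEN];
};

/* Copy the blob out of the event buffer; 0 on success, -1 if it is too short. */
static int decode_ext4_report(const void *payload, size_t len,
			      struct ext4_report *out)
{
	if (len < sizeof(*out))
		return -1;
	memcpy(out, payload, sizeof(*out));
	out->desc[DESC_LEN - 1] = '\0'; /* do not rely on the sender's terminator */
	return 0;
}

/* Render one line of text, leaving out the block when it was not reported. */
static int format_ext4_report(const struct ext4_report *r, char *line,
			      size_t size)
{
	if (r->block == NOT_SET)
		return snprintf(line, size, "inode=%" PRIu64 " desc=%s",
				r->inode, r->desc);
	return snprintf(line, size,
			"inode=%" PRIu64 " block=%" PRIu64 " desc=%s",
			r->inode, r->block, r->desc);
}
```

The inode field uses the same sentinel in this patch, so a fuller tool would
likely treat it the same way.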
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
fs/ext4/super.c | 60 ++++++++++++++++++++++++--------
include/uapi/linux/ext4-notify.h | 17 +++++++++
2 files changed, 62 insertions(+), 15 deletions(-)
create mode 100644 include/uapi/linux/ext4-notify.h
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..032e29e7ff6a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -46,6 +46,8 @@
#include <linux/part_stat.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
+#include <linux/fsnotify.h>
+#include <uapi/linux/ext4-notify.h>
#include "ext4.h"
#include "ext4_extents.h" /* Needed for trace points definition */
@@ -727,6 +729,22 @@ static void flush_stashed_error_work(struct work_struct *work)
ext4_commit_super(sbi->s_sb);
}
+static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
+ const char *func, int line,
+ const char *desc, struct va_format *vaf)
+{
+ struct ext4_error_inode_report report;
+
+ if (inode->i_sb->s_fsnotify_marks) {
+ report.inode = inode ? inode->i_ino : -1L;
+ report.block = block ? block : -1L;
+
+ snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
+
+ fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
+ }
+}
+
#define ext4_error_ratelimit(sb) \
___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state), \
"EXT4-fs error")
@@ -742,15 +760,18 @@ void __ext4_error(struct super_block *sb, const char *function,
return;
trace_ext4_error(sb, function, line);
+
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
if (ext4_error_ratelimit(sb)) {
- va_start(args, fmt);
- vaf.fmt = fmt;
- vaf.va = &args;
printk(KERN_CRIT
"EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
sb->s_id, function, line, current->comm, &vaf);
- va_end(args);
+
}
+ ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
+ va_end(args);
ext4_handle_error(sb, force_ro, error, 0, block, function, line);
}
@@ -765,10 +786,10 @@ void __ext4_error_inode(struct inode *inode, const char *function,
return;
trace_ext4_error(inode->i_sb, function, line);
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
if (ext4_error_ratelimit(inode->i_sb)) {
- va_start(args, fmt);
- vaf.fmt = fmt;
- vaf.va = &args;
if (block)
printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
"inode #%lu: block %llu: comm %s: %pV\n",
@@ -779,8 +800,11 @@ void __ext4_error_inode(struct inode *inode, const char *function,
"inode #%lu: comm %s: %pV\n",
inode->i_sb->s_id, function, line, inode->i_ino,
current->comm, &vaf);
- va_end(args);
}
+
+ ext4_fsnotify_error(error, inode, block, function, line, NULL, &vaf);
+ va_end(args);
+
ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
function, line);
}
@@ -798,13 +822,16 @@ void __ext4_error_file(struct file *file, const char *function,
return;
trace_ext4_error(inode->i_sb, function, line);
+
+ path = file_path(file, pathname, sizeof(pathname));
+ if (IS_ERR(path))
+ path = "(unknown)";
+
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+
if (ext4_error_ratelimit(inode->i_sb)) {
- path = file_path(file, pathname, sizeof(pathname));
- if (IS_ERR(path))
- path = "(unknown)";
- va_start(args, fmt);
- vaf.fmt = fmt;
- vaf.va = &args;
if (block)
printk(KERN_CRIT
"EXT4-fs error (device %s): %s:%d: inode #%lu: "
@@ -817,8 +844,10 @@ void __ext4_error_file(struct file *file, const char *function,
"comm %s: path %s: %pV\n",
inode->i_sb->s_id, function, line, inode->i_ino,
current->comm, path, &vaf);
- va_end(args);
}
+ ext4_fsnotify_error(EFSCORRUPTED, inode, block, function, line, NULL, &vaf);
+ va_end(args);
+
ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
function, line);
}
@@ -886,6 +915,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
sb->s_id, function, line, errstr);
}
+ ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
ext4_handle_error(sb, false, -errno, 0, 0, function, line);
}
diff --git a/include/uapi/linux/ext4-notify.h b/include/uapi/linux/ext4-notify.h
new file mode 100644
index 000000000000..31a3bbcafd13
--- /dev/null
+++ b/include/uapi/linux/ext4-notify.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * Copyright 2021, Collabora Ltd.
+ */
+
+#ifndef EXT4_NOTIFY_H
+#define EXT4_NOTIFY_H
+
+#define EXT4_FSN_DESC_LEN 256
+
+struct ext4_error_inode_report {
+ u64 inode;
+ u64 block;
+ char desc[EXT4_FSN_DESC_LEN];
+};
+
+#endif
--
2.31.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH RFC 14/15] samples: Add fs error monitoring example
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (12 preceding siblings ...)
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
@ 2021-04-26 18:42 ` Gabriel Krisman Bertazi
2021-04-26 23:10 ` kernel test robot
2021-04-26 18:42 ` [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework Gabriel Krisman Bertazi
2021-04-27 4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:42 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Introduce an example of a FAN_ERROR fanotify user to track filesystem
errors.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
samples/Kconfig | 7 ++
samples/Makefile | 1 +
samples/fanotify/Makefile | 3 +
samples/fanotify/fs-monitor.c | 135 ++++++++++++++++++++++++++++++++++
4 files changed, 146 insertions(+)
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c
diff --git a/samples/Kconfig b/samples/Kconfig
index e76cdfc50e25..a2968338517f 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -120,6 +120,13 @@ config SAMPLE_CONNECTOR
with it.
See also Documentation/driver-api/connector.rst
+config SAMPLE_FANOTIFY_ERROR
+ bool "Build fanotify error monitoring sample"
+ depends on FANOTIFY
+ help
+ When enabled, this builds example code that uses the FAN_ERROR
+ fanotify mechanism to monitor filesystem errors.
+
config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index c3392a595e4b..93e2d64bc9a7 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -5,6 +5,7 @@ subdir-$(CONFIG_SAMPLE_AUXDISPLAY) += auxdisplay
subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
+obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
obj-$(CONFIG_SAMPLE_KDB) += kdb/
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
new file mode 100644
index 000000000000..b3d5cc826e6f
--- /dev/null
+++ b/samples/fanotify/Makefile
@@ -0,0 +1,3 @@
+userprogs-always-y += fs-monitor
+
+userccflags += -I usr/include
diff --git a/samples/fanotify/fs-monitor.c b/samples/fanotify/fs-monitor.c
new file mode 100644
index 000000000000..cdece8344c20
--- /dev/null
+++ b/samples/fanotify/fs-monitor.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021, Collabora Ltd.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <err.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/fanotify.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#ifndef FAN_ERROR
+
+#define FAN_ERROR 0x00100000
+#define FAN_PREALLOC_QUEUE 0x00000080
+
+#define FAN_EVENT_INFO_TYPE_LOCATION 4
+#define FAN_EVENT_INFO_TYPE_ERROR 5
+#define FAN_EVENT_INFO_TYPE_FSDATA 6
+
+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ int version;
+ int error;
+ long long unsigned fsid;
+};
+
+struct fanotify_event_info_location {
+ struct fanotify_event_info_header hdr;
+ int line;
+ char function[0];
+};
+
+struct fanotify_event_info_fsdata {
+ struct fanotify_event_info_header hdr;
+ char data[0];
+};
+
+struct ext4_error_inode_report {
+ unsigned long long inode;
+ unsigned long long block;
+ char desc[256]; /* matches EXT4_FSN_DESC_LEN */
+};
+#endif
+
+static void handle_notifications(char *buffer, int len)
+{
+ struct fanotify_event_metadata *metadata;
+ struct fanotify_event_info_header *hdr = 0;
+ char *off, *next;
+
+ for (metadata = (struct fanotify_event_metadata *) buffer;
+ FAN_EVENT_OK(metadata, len); metadata = FAN_EVENT_NEXT(metadata, len)) {
+ next = (char*)metadata + metadata->event_len;
+ if (!(metadata->mask == FAN_ERROR)) {
+ printf("unexpected FAN MARK: %llx\n", metadata->mask);
+ continue;
+ }
+ if (metadata->fd != FAN_NOFD) {
+ printf("unexpected fd != FAN_NOFD\n");
+ continue;
+ }
+
+ printf("FAN_ERROR found len=%d\n", metadata->event_len);
+
+ for (off = (char*)(metadata+1); off < next; off = off + hdr->len) {
+ hdr = (struct fanotify_event_info_header*)(off);
+
+ if (hdr->info_type == FAN_EVENT_INFO_TYPE_ERROR) {
+ struct fanotify_event_info_error *error =
+ (struct fanotify_event_info_error*) hdr;
+
+ printf(" Generic Error Record: len=%d\n", hdr->len);
+ printf(" version: %d\n", error->version);
+ printf(" error: %d\n", error->error);
+ printf(" fsid: %llx\n", error->fsid);
+
+ } else if(hdr->info_type == FAN_EVENT_INFO_TYPE_LOCATION) {
+ struct fanotify_event_info_location *loc =
+ (struct fanotify_event_info_location*) hdr;
+
+ printf(" Location Record Size = %d\n", loc->hdr.len);
+ printf(" loc=%s:%d\n", loc->function, loc->line);
+
+ } else if(hdr->info_type == FAN_EVENT_INFO_TYPE_FSDATA) {
+ struct fanotify_event_info_fsdata *data =
+ (struct fanotify_event_info_fsdata *)hdr;
+ struct ext4_error_inode_report *fsdata =
+ (struct ext4_error_inode_report*) ((char*)data->data);
+
+ printf(" Fsdata Record: len=%d\n", hdr->len);
+ printf(" inode=%llu\n", fsdata->inode);
+ if (fsdata->block != -1L)
+ printf(" block=%llu\n", fsdata->block);
+ printf(" desc=%s\n", fsdata->desc);
+ }
+ }
+ }
+}
+
+int main(int argc, char **argv)
+{
+ int fd;
+ char buffer[BUFSIZ];
+
+ if (argc < 2) {
+ printf("Missing path argument\n");
+ return 1;
+ }
+
+ fd = fanotify_init(FAN_CLASS_NOTIF|FAN_PREALLOC_QUEUE, O_RDONLY);
+ if (fd < 0)
+ err(1, "fanotify_init"); /* err() also reports errno */
+
+ if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
+ FAN_ERROR, AT_FDCWD, argv[1])) {
+ err(1, "fanotify_mark"); /* err() also reports errno */
+ }
+
+ while (1) {
+ int n = read(fd, buffer, BUFSIZ);
+ if (n < 0)
+ err(1, "read"); /* err() also reports errno */
+
+ handle_notifications(buffer, n);
+ }
+
+ return 0;
+}
--
2.31.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (13 preceding siblings ...)
2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
@ 2021-04-26 18:42 ` Gabriel Krisman Bertazi
2021-04-27 4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
15 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:42 UTC (permalink / raw)
To: amir73il, tytso, djwong
Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
Gabriel Krisman Bertazi, kernel
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
.../admin-guide/filesystem-monitoring.rst | 103 ++++++++++++++++++
Documentation/admin-guide/index.rst | 1 +
2 files changed, 104 insertions(+)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
new file mode 100644
index 000000000000..e19bf792dd7a
--- /dev/null
+++ b/Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+fanotify supports the FAN_ERROR mark for file system-wide error
+reporting. It is meant to be used by file system health monitoring
+daemons that listen on that interface and take actions (notify sysadmin,
+start recovery) when a file system problem is detected by the kernel.
+
+By design, a FAN_ERROR notification exposes sufficient information for a
+monitoring tool to map a problem to a specific region of the file system
+or its code and trigger recovery procedures. It doesn't necessarily
+provide a user space application with semantics to verify that an IO
+operation was successfully executed. That is outside the scope of this
+feature. Instead, it is only meant as a framework for early file system
+problem detection and reporting to recovery tools.
+
+At the time of this writing, the only file system that emits
+FAN_ERROR notifications is ext4.
+
+Example code for ext4 is provided in ``samples/fanotify/fs-monitor.c``.
+
+Usage
+=====
+
+In order to guarantee notification delivery on different error
+conditions, FAN_ERROR requires the fanotify group to be created with
+FAN_PREALLOC_QUEUE. This means a group that emits FAN_ERROR
+notifications currently cannot be reused for any other kind of
+notification.
+
+To set up a group for error notification::
+
+ fanotify_init(FAN_CLASS_NOTIF | FAN_PREALLOC_QUEUE, O_RDONLY);
+
+Then, enable the FAN_ERROR mark on a specific path::
+
+ fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM, FAN_ERROR, AT_FDCWD, "/mnt");
+
+Notification structure
+======================
+
+A FAN_ERROR Notification has the following format::
+
+ [ Notification Metadata (Mandatory) ]
+ [ Generic Error Record (Mandatory) ]
+ [ Error Location Record (Optional) ]
+ [ FS-Specific Record (Optional) ]
+
+With the exception of the notification metadata and the generic
+information, all information records are optional. Each record type is
+identified by its unique ``struct fanotify_event_info_header.info_type``.
+
+Generic Error Record
+--------------------
+
+The generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+requiring any details about the problem::
+
+ struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr; /* info_type = FAN_EVENT_INFO_TYPE_ERROR */
+ int version;
+ int error;
+ __kernel_fsid_t fsid;
+ };
+
+Error Location Record
+---------------------
+
+Error location is required by some use cases to easily associate an
+error with a specific line of code. Not every use case requires it, and
+it might not be emitted by every file system.
+
+Notice this record is variable length; its size is found in ``hdr.len``::
+
+ struct fanotify_event_info_location {
+ struct fanotify_event_info_header hdr; /* info_type = FAN_EVENT_INFO_TYPE_LOCATION */
+ int line;
+ char function[0];
+ };
+
+File system specific Record
+---------------------------
+
+The file system specific record attempts to provide file system specific
+tools with enough information to uniquely identify the problem and
+hopefully recover from it.
+
+Since each file system defines its own specific data, this record is
+composed of a header followed by a data blob that is defined by each
+file system. Review the file system documentation for more information.
+
+While ``hdr.info_type`` identifies the presence of this record,
+``hdr.len`` gives the length of the whole record: the header plus the
+file system specific structure that follows it::
+
+ struct fanotify_event_info_fsdata {
+ struct fanotify_event_info_header hdr; /* info_type = FAN_EVENT_INFO_TYPE_FSDATA */
+ char data[0];
+ };
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 423116c4e787..a0d1bf76629f 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -83,6 +83,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
+ filesystem-monitoring
nfs/index
gpio/index
highuid
--
2.31.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
@ 2021-04-26 22:00 ` kernel test robot
2021-04-26 22:43 ` kernel test robot
2021-04-27 5:39 ` Amir Goldstein
2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 22:00 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 3775 bytes --]
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: microblaze-randconfig-r006-20210426 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/0b550d36bb2ec4613ad64b68b18898a72fd5af50
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
git checkout 0b550d36bb2ec4613ad64b68b18898a72fd5af50
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=microblaze
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from include/linux/kernel.h:16,
from include/linux/radix-tree.h:12,
from include/linux/idr.h:15,
from include/linux/fsnotify_backend.h:13,
from include/linux/fsnotify.h:15,
from fs/notify/ring.c:3:
fs/notify/ring.c: In function 'fsnotify_ring_alloc_event_slot':
>> include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
5 | #define KERN_SOH "\001" /* ASCII Start Of Header */
| ^~~~~~
include/linux/printk.h:140:10: note: in definition of macro 'no_printk'
140 | printk(fmt, ##__VA_ARGS__); \
| ^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
15 | #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
| ^~~~~~~~
include/linux/printk.h:430:12: note: in expansion of macro 'KERN_DEBUG'
430 | no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~~~~
fs/notify/ring.c:81:2: note: in expansion of macro 'pr_debug'
81 | pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
| ^~~~~~~~
fs/notify/ring.c:81:59: note: format string is defined here
81 | pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
| ~~^
| |
| long unsigned int
| %u
vim +5 include/linux/kern_levels.h
314ba3520e513a Joe Perches 2012-07-30 4
04d2c8c83d0e3a Joe Perches 2012-07-30 @5 #define KERN_SOH "\001" /* ASCII Start Of Header */
04d2c8c83d0e3a Joe Perches 2012-07-30 6 #define KERN_SOH_ASCII '\001'
04d2c8c83d0e3a Joe Perches 2012-07-30 7
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35433 bytes --]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
2021-04-26 22:00 ` kernel test robot
@ 2021-04-26 22:43 ` kernel test robot
2021-04-27 5:39 ` Amir Goldstein
2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 22:43 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 4824 bytes --]
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: arm-randconfig-r022-20210426 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project d941863de2becb3d8d2e00676fc7125974934c7f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/0b550d36bb2ec4613ad64b68b18898a72fd5af50
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
git checkout 0b550d36bb2ec4613ad64b68b18898a72fd5af50
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/notify/ring.c:82:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
ring_size, size);
^~~~
include/linux/printk.h:430:38: note: expanded from macro 'pr_debug'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
include/linux/printk.h:140:17: note: expanded from macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
~~~ ^~~~~~~~~~~
1 warning generated.
vim +82 fs/notify/ring.c
67
68 struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
69 size_t size)
70 __acquires(&group->notification_lock)
71 {
72 struct fsnotify_event *fsn;
73 u64 head, tail;
74 u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
75 u64 new_head;
76 void *kaddr;
77
78 if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
79 return ERR_PTR(-EINVAL);
80
81 pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
> 82 ring_size, size);
83
84 spin_lock(&group->notification_lock);
85 again:
86 head = group->ring_buffer.head;
87 tail = group->ring_buffer.tail;
88 new_head = NEXT_SLOT(head, size, ring_size);
89
90 /* head would catch up to tail, corrupting an entry. */
91 if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
92 fsn = ERR_PTR(-ENOMEM);
93 goto err;
94 }
95
96 /*
97 * Not event a skip message fits in the page. We can detect the
98 * lack of space. Move on to the next page.
99 */
100 if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
101 /* Start again on next page */
102 group->ring_buffer.head = NEXT_PAGE(head, ring_size);
103 goto again;
104 }
105
106 kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
107 if (!kaddr) {
108 fsn = ERR_PTR(-EFAULT);
109 goto err;
110 }
111
112 fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
113
114 if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
115 /*
116 * No room in the current page. Add a fake entry
117 * consuming the end the page to avoid splitting event
118 * structure.
119 */
120 fsn->slot_len = INVALID_RING_SLOT;
121 kunmap_atomic(kaddr);
122 /* Start again on the next page */
123 group->ring_buffer.head = NEXT_PAGE(head, ring_size);
124
125 goto again;
126 }
127 fsn->slot_len = size;
128
129 return fsn;
130
131 err:
132 spin_unlock(&group->notification_lock);
133 return fsn;
134 }
135
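
Both reports above (gcc on microblaze, clang on arm) flag the same pr_debug()
call: a size_t argument printed with %lu, which only matches on targets where
size_t is unsigned long. A minimal user-space sketch of the usual remedy, the
%zu length modifier, follows. format_ring_debug() is a hypothetical stand-in
for the kernel call, not code from the patch, and the group pointer is left
out to keep the output deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the pr_debug() call at
 * fs/notify/ring.c:81. The format mirrors the original, with %lu
 * replaced by %zu so the size_t argument matches on both 32-bit and
 * 64-bit targets.
 */
static int format_ring_debug(char *buf, size_t buflen, const char *func,
			     unsigned long long ring_size, size_t requested)
{
	assert(buf != NULL && buflen > 0);

	return snprintf(buf, buflen,
			"%s: start ring_size=%llu, requested=%zu\n",
			func, ring_size, requested);
}
```

The same specifier is accepted by the kernel's printk family, so the change
in the patch would presumably be limited to the format string.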
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33558 bytes --]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
@ 2021-04-26 22:45 ` kernel test robot
2021-04-27 7:25 ` Amir Goldstein
1 sibling, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 22:45 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 3869 bytes --]
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: x86_64-randconfig-a001-20210426 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project d941863de2becb3d8d2e00676fc7125974934c7f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/6179a61e1067e69a0e24e98bad3cb0eebdbfbee0
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
git checkout 6179a61e1067e69a0e24e98bad3cb0eebdbfbee0
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/notify/fanotify/fanotify_user.c:112:12: warning: address of array 'fee->fs_data' will always evaluate to 'true' [-Wpointer-bool-conversion]
if (fee->fs_data)
~~ ~~~~~^~~~~~~
1 warning generated.
vim +112 fs/notify/fanotify/fanotify_user.c
89
90 static size_t fanotify_event_len(struct fanotify_event *event,
91 unsigned int fid_mode)
92 {
93 size_t event_len = FAN_EVENT_METADATA_LEN;
94 struct fanotify_info *info;
95 int dir_fh_len;
96 int fh_len;
97 int dot_len = 0;
98
99 if (fanotify_is_error_event(event->mask)) {
100 struct fanotify_error_event *fee = FANOTIFY_EE(event);
101 /*
102 * Error events (FAN_ERROR) have a different format
103 * as follows:
104 * [ event_metadata ]
105 * [ fs-generic error header ]
106 * [ error location (optional) ]
107 * [ fs-specific blob ]
108 */
109 event_len = fanotify_error_info_len(fee);
110 if (fee->loc.function)
111 event_len += fanotify_location_info_len(&fee->loc);
> 112 if (fee->fs_data)
113 event_len += fanotify_error_fsdata_len(fee);
114 return event_len;
115 }
116
117 if (!fid_mode)
118 return event_len;
119
120 info = fanotify_event_info(event);
121 dir_fh_len = fanotify_event_dir_fh_len(event);
122 fh_len = fanotify_event_object_fh_len(event);
123
124 if (dir_fh_len) {
125 event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
126 } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
127 /*
128 * With group flag FAN_REPORT_NAME, if name was not recorded in
129 * event on a directory, we will report the name ".".
130 */
131 dot_len = 1;
132 }
133
134 if (fh_len)
135 event_len += fanotify_fid_info_len(fh_len, dot_len);
136
137 return event_len;
138 }
139
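
The clang warning above says that fee->fs_data is an array member, so its
address can never be NULL and the check at line 112 is always true. A common
way to express "is there a blob attached" is to test a length field instead.
The sketch below assumes such a field exists; struct error_event_sketch,
fs_data_len and HDR_LEN_SKETCH are hypothetical, since the real
struct fanotify_error_event is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the event structure in the patch. */
struct error_event_sketch {
	size_t fs_data_len;	/* assumed length field, 0 when no blob is attached */
	char fs_data[];		/* array member: its address is never NULL */
};

#define HDR_LEN_SKETCH 16	/* illustrative per-record header size */

/* Length of the fs-specific record, or 0 when there is no blob. */
static size_t fsdata_record_len(const struct error_event_sketch *ee)
{
	assert(ee != NULL);

	/* Test the length, not the array itself. */
	if (!ee->fs_data_len)
		return 0;

	return HDR_LEN_SKETCH + ee->fs_data_len;
}
```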
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37920 bytes --]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 13/15] ext4: Send notifications on error
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
@ 2021-04-26 23:10 ` kernel test robot
2021-04-27 4:32 ` Amir Goldstein
2021-04-29 0:57 ` Darrick J. Wong
2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 23:10 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 2734 bytes --]
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on pcmoore-audit/next]
[also build test ERROR on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: nios2-randconfig-r024-20210426 (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/151ead19fe71b5ca87e8a345ec1f640454f7ad34
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
git checkout 151ead19fe71b5ca87e8a345ec1f640454f7ad34
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=nios2
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
fs/ext4/super.c: In function 'ext4_fsnotify_error':
>> fs/ext4/super.c:738:17: error: 'struct super_block' has no member named 's_fsnotify_marks'
738 | if (inode->i_sb->s_fsnotify_marks) {
| ^~
fs/ext4/super.c: In function 'ext4_remount':
fs/ext4/super.c:5839:6: warning: variable 'enable_quota' set but not used [-Wunused-but-set-variable]
5839 | int enable_quota = 0;
| ^~~~~~~~~~~~
vim +738 fs/ext4/super.c
731
732 static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
733 const char *func, int line,
734 const char *desc, struct va_format *vaf)
735 {
736 struct ext4_error_inode_report report;
737
> 738 if (inode->i_sb->s_fsnotify_marks) {
739 report.inode = inode ? inode->i_ino : -1L;
740 report.block = block ? block : -1L;
741
742 snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
743
744 fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
745 }
746 }
747
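
Two things stand out in the listing above. The build error is consistent with
s_fsnotify_marks only being present in struct super_block when fsnotify
support is configured in; that is an assumption about this randconfig, not
something the report states. Separately, inode->i_sb is dereferenced before
the "inode ? ... : -1L" test, so the NULL check comes too late to help. The
user-space mock below sketches one way to handle both: keep the
config-dependent field behind a helper and check the pointer first. Every
*_sketch name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space mock, not kernel code. Set FSNOTIFY_SKETCH to 0 to mimic a
 * config where the marks field is compiled out of the superblock.
 */
#define FSNOTIFY_SKETCH 1
#define NO_VALUE_SKETCH (~0ULL)

struct sb_sketch {
#if FSNOTIFY_SKETCH
	void *marks;	/* stands in for s_fsnotify_marks */
#endif
	int id;
};

struct inode_sketch {
	unsigned long long ino;
	struct sb_sketch *sb;
};

struct report_sketch {
	unsigned long long inode;
	unsigned long long block;
};

/* Keep the config-dependent field access behind one helper. */
static bool sb_has_marks(const struct sb_sketch *sb)
{
#if FSNOTIFY_SKETCH
	return sb->marks != NULL;
#else
	(void)sb;
	return false;
#endif
}

/* Returns true when a report was filled in and should be sent. */
static bool build_report(const struct inode_sketch *inode,
			 unsigned long long block,
			 struct report_sketch *out)
{
	assert(out != NULL);

	/* Check the pointer before dereferencing it for the superblock. */
	if (!inode || !sb_has_marks(inode->sb))
		return false;

	out->inode = inode->ino;
	/* As in the patch, block 0 is treated as "no block". */
	out->block = block ? block : NO_VALUE_SKETCH;
	return true;
}
```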
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 25380 bytes --]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 14/15] samples: Add fs error monitoring example
2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
@ 2021-04-26 23:10 ` kernel test robot
0 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 23:10 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 6306 bytes --]
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on pcmoore-audit/next]
[also build test ERROR on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: um-allmodconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/3b25963eb28ef5e89ef34cc7b64e479205a38e9e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
git checkout 3b25963eb28ef5e89ef34cc7b64e479205a38e9e
# save the attached .config to linux build tree
make W=1 W=1 ARCH=um
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> samples/fanotify/fs-monitor.c:28:36: error: field 'hdr' has incomplete type
28 | struct fanotify_event_info_header hdr;
| ^~~
samples/fanotify/fs-monitor.c:35:36: error: field 'hdr' has incomplete type
35 | struct fanotify_event_info_header hdr;
| ^~~
samples/fanotify/fs-monitor.c:41:36: error: field 'hdr' has incomplete type
41 | struct fanotify_event_info_header hdr;
| ^~~
samples/fanotify/fs-monitor.c: In function 'handle_notifications':
>> samples/fanotify/fs-monitor.c:72:62: error: dereferencing pointer to incomplete type 'struct fanotify_event_info_header'
72 | for (off = (char*)(metadata+1); off < next; off = off + hdr->len) {
| ^~
samples/fanotify/fs-monitor.c: In function 'main':
>> samples/fanotify/fs-monitor.c:121:37: error: 'FAN_MARK_FILESYSTEM' undeclared (first use in this function); did you mean 'FAN_MARK_FLUSH'?
121 | if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
| ^~~~~~~~~~~~~~~~~~~
| FAN_MARK_FLUSH
samples/fanotify/fs-monitor.c:121:37: note: each undeclared identifier is reported only once for each function it appears in
vim +/hdr +28 samples/fanotify/fs-monitor.c
26
27 struct fanotify_event_info_error {
> 28 struct fanotify_event_info_header hdr;
29 int version;
30 int error;
31 long long unsigned fsid;
32 };
33
34 struct fanotify_event_info_location {
35 struct fanotify_event_info_header hdr;
36 int line;
37 char function[0];
38 };
39
40 struct fanotify_event_info_fsdata {
41 struct fanotify_event_info_header hdr;
42 char data[0];
43 };
44
45 struct ext4_error_inode_report {
46 unsigned long long inode;
47 unsigned long long block;
48 char desc[40];
49 };
50 #endif
51
52 static void handle_notifications(char *buffer, int len)
53 {
54 struct fanotify_event_metadata *metadata;
55 struct fanotify_event_info_header *hdr = 0;
56 char *off, *next;
57
58 for (metadata = (struct fanotify_event_metadata *) buffer;
59 FAN_EVENT_OK(metadata, len); metadata = FAN_EVENT_NEXT(metadata, len)) {
60 next = (char*)metadata + metadata->event_len;
61 if (!(metadata->mask == FAN_ERROR)) {
62 printf("unexpected FAN MARK: %llx\n", metadata->mask);
63 continue;
64 }
65 if (metadata->fd != FAN_NOFD) {
66 printf("bizar fd != FAN_NOFD\n");
67 continue;;
68 }
69
70 printf("FAN_ERROR found len=%d\n", metadata->event_len);
71
> 72 for (off = (char*)(metadata+1); off < next; off = off + hdr->len) {
73 hdr = (struct fanotify_event_info_header*)(off);
74
75 if (hdr->info_type == FAN_EVENT_INFO_TYPE_ERROR) {
76 struct fanotify_event_info_error *error =
77 (struct fanotify_event_info_error*) hdr;
78
79 printf(" Generic Error Record: len=%d\n", hdr->len);
80 printf(" version: %d\n", error->version);
81 printf(" error: %d\n", error->error);
82 printf(" fsid: %llx\n", error->fsid);
83
84 } else if(hdr->info_type == FAN_EVENT_INFO_TYPE_LOCATION) {
85 struct fanotify_event_info_location *loc =
86 (struct fanotify_event_info_location*) hdr;
87
88 printf(" Location Record Size = %d\n", loc->hdr.len);
89 printf(" loc=%s:%d\n", loc->function, loc->line);
90
91 } else if(hdr->info_type == FAN_EVENT_INFO_TYPE_FSDATA) {
92 struct fanotify_event_info_fsdata *data =
93 (struct fanotify_event_info_fsdata *)hdr;
94 struct ext4_error_inode_report *fsdata =
95 (struct ext4_error_inode_report*) ((char*)data->data);
96
97 printf(" Fsdata Record: len=%d\n", hdr->len);
98 printf(" inode=%llu\n", fsdata->inode);
99 if (fsdata->block != -1L)
100 printf(" block=%llu\n", fsdata->block);
101 printf(" desc=%s\n", fsdata->desc);
102 }
103 }
104 }
105 }
106
107 int main(int argc, char **argv)
108 {
109 int fd;
110 char buffer[BUFSIZ];
111
112 if (argc < 2) {
113 printf("Missing path argument\n");
114 return 1;
115 }
116
117 fd = fanotify_init(FAN_CLASS_NOTIF|FAN_PREALLOC_QUEUE, O_RDONLY);
118 if (fd < 0)
119 errx(1, "fanotify_init");
120
> 121 if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
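
Apart from the missing header definitions, the record loop at line 72 of the
listing trusts hdr->len: a zero length would spin forever and an oversized
one would walk past the end of the event. A defensive walk could look like
the sketch below. struct info_hdr_sketch is a local, hypothetical copy of the
header layout (type, pad, 16-bit length) so the example builds without the
uapi header, and count_info_records() is an illustrative helper, not part of
the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local copy of the info record header layout. */
struct info_hdr_sketch {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;	/* total record length, header included */
};

/*
 * Count the well-formed records in [buf, buf + len).
 * Returns -1 as soon as a truncated or inconsistent record is found.
 */
static int count_info_records(const char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL || len == 0);

	while (off < len) {
		struct info_hdr_sketch hdr;

		if (len - off < sizeof(hdr))
			return -1;	/* truncated header */

		/* Copy out the header to avoid unaligned access. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* A short length would never advance; a long one overruns. */
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;

		off += hdr.len;
		n++;
	}

	return n;
}
```

In the sample, the region to walk would be the bytes between the end of the
metadata header and metadata->event_len.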
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 24316 bytes --]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 00/15] File system wide monitoring
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
` (14 preceding siblings ...)
2021-04-26 18:42 ` [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework Gabriel Krisman Bertazi
@ 2021-04-27 4:11 ` Amir Goldstein
2021-04-27 15:44 ` Gabriel Krisman Bertazi
2021-05-11 10:43 ` Jan Kara
15 siblings, 2 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 4:11 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Hi,
>
> In an attempt to consolidate some of the feedback from the previous
> proposals, I wrote a new attempt to solve the file system error reporting
> problem. Before I spend more time polishing it, I'd like to hear your
> feedback if I'm going in the wrong direction, in particular with the
> modifications to fsnotify.
>
IMO you are going in the right direction, but you have gone a bit too far ;-)
My understanding of the requirements and my interpretation of the feedback
from filesystem maintainers is that the missing piece in the ecosystem is a
user notification that "something went wrong". The "what went wrong" part
is something that users and admins have long been able to gather from the
kernel log and from filesystem tools (e.g. last error recorded).
I do not see the need to duplicate existing functionality in fsmonitor.
Don't get me wrong, I understand why it would have been nice for fsmonitor
to be able to get all the errors nicely without looking anywhere else, but I
don't think it justifies the extra complexity.
> This RFC follows up on my previous proposals which attempted to leverage
> watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
> to push error notifications to user space. This proposal starts by, as
> suggested by Darrick, limiting the scope of what I'm trying to do to an
> interface for administrators to monitor the health of a file system,
> instead of a generic interface for file errors. Therefore, this doesn't
> solve the problem of writeback errors or the need to watch a specific
> subsystem.
>
> * Format
>
> The feature is implemented on top of fanotify, as a new type of fanotify
> mark, FAN_ERROR, which a file system monitoring tool can register to
You have a terminology mistake throughout your series.
FAN_ERROR is not a type of a mark, it is a type of an event.
A mark describes the watched object (i.e. a filesystem, mount, inode).
> receive notifications. A notification is split in three parts, and only
> the first is guaranteed to exist for any given error event:
>
> - FS generic data: A file system agnostic structure that has a generic
> error code and identifies the filesystem. Basically, it lets
> userspace know something happened on a monitored filesystem.
I think an error seq counter per fs would be a nice addition to generic data.
It does not need to be persistent (it could be if filesystem supports it).
>
> - FS location data: Identifies where in the code the problem
> happened. (This is important for the use case of analysing frequent
> error points that we discussed earlier).
>
> - FS specific data: A detailed error report in a filesystem specific
> format that details what the error is. Ideally, a capable monitoring
> tool can use the information here for error recovery. For instance,
> xfs can put the xfs_scrub structures here, ext4 can send its error
> reports, etc. An example of usage is done in the ext4 patch of this
> series.
>
> More details on the information in each record can be found on the
> documentation introduced in patch 15.
>
> * Using fanotify
>
> Using fanotify for this kind of thing is slightly tricky because we want
> to guarantee delivery in some complicated conditions, for instance, the
> file system might want to send an error while holding several locks.
>
> Instead of working around file system constraints at the file system
> level, this proposal tries to make the FAN_ERROR submission safe in
> those contexts. This is done with a new mode in fsnotify that
> preallocates the memory at group creation to be used for the
> notification submission.
>
> This new mode in fsnotify introduces a ring buffer to queue
> notifications, which eliminates the allocation path in fsnotify. From
> what I saw, the allocation is the only problem in fsnotify for
> filesystems to submit errors in constrained situations.
>
The ring buffer functionality for fsnotify is interesting and it may be
useful on its own, but IMO, it's too big of a hammer for the problem
at hand.
The question that you should be asking yourself is what is the
expected behavior in case of a flood of filesystem corruption errors.
I think it has already been expressed by filesystem maintainers on
one of your previous postings, that a flood of filesystem corruption
errors is often noise and the only interesting information is the first error.
For this reason, I think that FS_ERROR could be implemented
by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
struct fsnotify_sb_mark {
        struct fsnotify_mark fsn_mark;
        struct fsnotify_error_info info;
};
Similar to fd sampled errseq, there can be only one error report
per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
the error report can be allocated at the time of setting the filesystem mark.
With this, you will not need the added complexity of the ring buffer
and you will not need to limit FAN_ERROR reporting to a group that
is only listening for FAN_ERROR, which is an unneeded limitation IMO.
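
To make the single-slot idea concrete, here is a user-space sketch of the
semantics described above: one preallocated slot, the first unread error is
kept, later errors only bump a counter, and reading the slot re-arms it. The
struct and function names are hypothetical, the counter is just one possible
take on the per-fs error count mentioned earlier, and a kernel version would
need locking that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-mark, preallocated error slot. */
struct error_slot_sketch {
	bool pending;		/* an unread report is stored */
	int first_error;	/* error code of the first unread report */
	uint32_t err_count;	/* errors seen since the last read */
};

/* Record an error without allocating: only the first one is kept. */
static void slot_report(struct error_slot_sketch *s, int error)
{
	assert(s != NULL);

	if (!s->pending) {
		s->pending = true;
		s->first_error = error;
	}
	s->err_count++;
}

/* Hand the stored report to the reader and re-arm the slot. */
static bool slot_read(struct error_slot_sketch *s, int *error, uint32_t *count)
{
	assert(s != NULL && error != NULL && count != NULL);

	if (!s->pending)
		return false;

	*error = s->first_error;
	*count = s->err_count;
	s->pending = false;
	s->err_count = 0;
	return true;
}
```

Because the slot lives in the mark, a flood of errors on one filesystem
cannot displace the first report stored for another.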
Anyway, in case, others do like the ring buffer approach, I do have
some technical comments on the implementation.
I will comment on individual patches.
Thanks,
Amir.
> * Visibility
>
> Since the usecase is limited to a tool for whole file system monitoring,
> errors are associated with the superblock and visible filesystem-wide.
> It is assumed and required that userspace has CAP_SYS_ADMIN.
>
> * Testing
>
> This was tested with corrupted ext4 images in a few scenarios, which
> caused errors to be triggered and monitored with the sample tool
> provided in the next to final patch.
>
> * patches
>
> Patches 1-4 refactor fanotify a bit for
> the patches to come. Patch 5 introduces the ring buffer interface to
> fsnotify, while patch 6 enables this support in fanotify. Patches 7 and 8 wire
> the FS_ERROR event type, which will be used by filesystems. In
> sequence, patches 9-12 implement the FAN_ERROR record types and create
> the new event. Patch 13 is an ext4 example implementation supporting
> this feature. Finally, patches 14 and 15 document and provide examples
> of a userspace tool that uses this feature.
>
> I also pushed the full series to:
>
> https://gitlab.collabora.com/krisman/linux -b fanotify-notifications
>
> [1] https://lwn.net/Articles/839310/
> [2] https://www.spinics.net/lists/linux-fsdevel/msg187075.html
>
> Gabriel Krisman Bertazi (15):
> fanotify: Fold event size calculation to its own function
> fanotify: Split fsid check from other fid mode checks
> fsnotify: Wire flags field on group allocation
> fsnotify: Wire up group information on event initialization
> fsnotify: Support event submission through ring buffer
> fanotify: Support submission through ring buffer
> fsnotify: Support FS_ERROR event type
> fsnotify: Introduce helpers to send error_events
> fanotify: Introduce generic error record
> fanotify: Introduce code location record
> fanotify: Introduce filesystem specific data record
> fanotify: Introduce the FAN_ERROR mark
> ext4: Send notifications on error
> samples: Add fs error monitoring example
> Documentation: Document the FAN_ERROR framework
>
> .../admin-guide/filesystem-monitoring.rst | 103 ++++++
> Documentation/admin-guide/index.rst | 1 +
> fs/ext4/super.c | 60 +++-
> fs/notify/Makefile | 2 +-
> fs/notify/dnotify/dnotify.c | 2 +-
> fs/notify/fanotify/fanotify.c | 127 +++++--
> fs/notify/fanotify/fanotify.h | 35 +-
> fs/notify/fanotify/fanotify_user.c | 319 ++++++++++++++----
> fs/notify/fsnotify.c | 2 +-
> fs/notify/group.c | 25 +-
> fs/notify/inotify/inotify_fsnotify.c | 2 +-
> fs/notify/inotify/inotify_user.c | 4 +-
> fs/notify/notification.c | 10 +
> fs/notify/ring.c | 199 +++++++++++
> include/linux/fanotify.h | 12 +-
> include/linux/fsnotify.h | 15 +
> include/linux/fsnotify_backend.h | 63 +++-
> include/uapi/linux/ext4-notify.h | 17 +
> include/uapi/linux/fanotify.h | 26 ++
> kernel/audit_fsnotify.c | 2 +-
> kernel/audit_tree.c | 2 +-
> kernel/audit_watch.c | 2 +-
> samples/Kconfig | 7 +
> samples/Makefile | 1 +
> samples/fanotify/Makefile | 3 +
> samples/fanotify/fs-monitor.c | 135 ++++++++
> 26 files changed, 1034 insertions(+), 142 deletions(-)
> create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
> create mode 100644 fs/notify/ring.c
> create mode 100644 include/uapi/linux/ext4-notify.h
> create mode 100644 samples/fanotify/Makefile
> create mode 100644 samples/fanotify/fs-monitor.c
>
> --
> 2.31.0
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 13/15] ext4: Send notifications on error
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
2021-04-26 23:10 ` kernel test robot
@ 2021-04-27 4:32 ` Amir Goldstein
2021-04-29 0:57 ` Darrick J. Wong
2 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 4:32 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> whenever a ext4 error condition is triggered. This follows the existing
> error conditions in ext4, so it is hooked to the ext4_error* functions.
>
> It also follows the current dmesg reporting in the format. The
> filesystem message is composed mostly by the string that would be
> otherwise printed in dmesg.
>
> A new ext4 specific record format is exposed in the uapi, such that a
> monitoring tool knows what to expect when listening errors of an ext4
> filesystem.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/ext4/super.c | 60 ++++++++++++++++++++++++--------
> include/uapi/linux/ext4-notify.h | 17 +++++++++
> 2 files changed, 62 insertions(+), 15 deletions(-)
> create mode 100644 include/uapi/linux/ext4-notify.h
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b9693680463a..032e29e7ff6a 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -46,6 +46,8 @@
> #include <linux/part_stat.h>
> #include <linux/kthread.h>
> #include <linux/freezer.h>
> +#include <linux/fsnotify.h>
> +#include <uapi/linux/ext4-notify.h>
>
> #include "ext4.h"
> #include "ext4_extents.h" /* Needed for trace points definition */
> @@ -727,6 +729,22 @@ static void flush_stashed_error_work(struct work_struct *work)
> ext4_commit_super(sbi->s_sb);
> }
>
> +static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
> + const char *func, int line,
> + const char *desc, struct va_format *vaf)
> +{
> + struct ext4_error_inode_report report;
> +
> + if (inode->i_sb->s_fsnotify_marks) {
> + report.inode = inode ? inode->i_ino : -1L;
> + report.block = block ? block : -1L;
> +
> + snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
> +
> + fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
> + }
> +}
> +
> #define ext4_error_ratelimit(sb) \
> ___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state), \
> "EXT4-fs error")
> @@ -742,15 +760,18 @@ void __ext4_error(struct super_block *sb, const char *function,
> return;
>
> trace_ext4_error(sb, function, line);
> +
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> if (ext4_error_ratelimit(sb)) {
> - va_start(args, fmt);
> - vaf.fmt = fmt;
> - vaf.va = &args;
> printk(KERN_CRIT
> "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
> sb->s_id, function, line, current->comm, &vaf);
> - va_end(args);
> +
> }
> + ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
> + va_end(args);
> ext4_handle_error(sb, force_ro, error, 0, block, function, line);
> }
>
So error reporting to the kernel log is ratelimited, and error reporting to
fsnotify is limited by a fixed-size ring buffer which may be filled by
report floods from another filesystem, so a user can miss the first
important error report from this filesystem.
Not optimal.
With my proposal of keeping a single fsnotify_error_info in every
fsnotify_sb_mark, users will be guaranteed to get the first error
report from every filesystem and once they read that report they
will be guaranteed to also get the next report.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
@ 2021-04-29 11:31 ` Dan Carpenter
2021-04-27 7:25 ` Amir Goldstein
1 sibling, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-27 4:33 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 5856 bytes --]
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210426184201.4177978-13-krisman@collabora.com>
References: <20210426184201.4177978-13-krisman@collabora.com>
TO: Gabriel Krisman Bertazi <krisman@collabora.com>
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
:::::: branch date: 10 hours ago
:::::: commit date: 10 hours ago
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
If you fix the issue, kindly add the following tags as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/notify/fanotify/fanotify_user.c:112 fanotify_event_len() warn: this array is probably non-NULL. 'fee->fs_data'
Old smatch warnings:
fs/notify/fanotify/fanotify_user.c:1443 do_fanotify_mark() error: we previously assumed 'mnt' could be null (see line 1424)
vim +112 fs/notify/fanotify/fanotify_user.c
b5e798df944730 Gabriel Krisman Bertazi 2021-04-26 89
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 90 static size_t fanotify_event_len(struct fanotify_event *event,
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 91 unsigned int fid_mode)
5e469c830fdb5a Amir Goldstein 2019-01-10 92 {
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 93 size_t event_len = FAN_EVENT_METADATA_LEN;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 94 struct fanotify_info *info;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 95 int dir_fh_len;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 96 int fh_len;
929943b38daf81 Amir Goldstein 2020-07-16 97 int dot_len = 0;
f454fa610a69b9 Amir Goldstein 2020-07-16 98
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 99 if (fanotify_is_error_event(event->mask)) {
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 100 struct fanotify_error_event *fee = FANOTIFY_EE(event);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 101 /*
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 102 * Error events (FAN_ERROR) have a different format
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 103 * as follows:
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 104 * [ event_metadata ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 105 * [ fs-generic error header ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 106 * [ error location (optional) ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 107 * [ fs-specific blob ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 108 */
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 109 event_len = fanotify_error_info_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 110 if (fee->loc.function)
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 111 event_len += fanotify_location_info_len(&fee->loc);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 @112 if (fee->fs_data)
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 113 event_len += fanotify_error_fsdata_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 114 return event_len;
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 115 }
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 116
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 117 if (!fid_mode)
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 118 return event_len;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 119
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 120 info = fanotify_event_info(event);
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 121 dir_fh_len = fanotify_event_dir_fh_len(event);
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 122 fh_len = fanotify_event_object_fh_len(event);
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 123
929943b38daf81 Amir Goldstein 2020-07-16 124 if (dir_fh_len) {
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 125 event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
929943b38daf81 Amir Goldstein 2020-07-16 126 } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
929943b38daf81 Amir Goldstein 2020-07-16 127 /*
929943b38daf81 Amir Goldstein 2020-07-16 128 * With group flag FAN_REPORT_NAME, if name was not recorded in
929943b38daf81 Amir Goldstein 2020-07-16 129 * event on a directory, we will report the name ".".
929943b38daf81 Amir Goldstein 2020-07-16 130 */
929943b38daf81 Amir Goldstein 2020-07-16 131 dot_len = 1;
929943b38daf81 Amir Goldstein 2020-07-16 132 }
afc894c784c84c Jan Kara 2020-03-24 133
44d705b0370b1d Amir Goldstein 2020-03-19 134 if (fh_len)
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 135 event_len += fanotify_fid_info_len(fh_len, dot_len);
5e469c830fdb5a Amir Goldstein 2019-01-10 136
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 137 return event_len;
5e469c830fdb5a Amir Goldstein 2019-01-10 138 }
5e469c830fdb5a Amir Goldstein 2019-01-10 139
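The new smatch warning reads as if fs_data is an array embedded in the
error event. If so, fee->fs_data decays to the array's address and the
test at line 112 is always true. A minimal sketch of the alternative,
testing a length instead; struct err_event, fs_data_len and
SKETCH_HDR_LEN are hypothetical stand-ins, not the patch's definitions:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HDR_LEN 16u /* assumed fixed header size for the sketch */

/* Hypothetical event with an embedded blob buffer. */
struct err_event {
	size_t fs_data_len; /* 0 when no fs-specific blob is attached */
	char   fs_data[64]; /* embedded array: its address is never NULL */
};

/* Total length: the header, plus the blob only when one is present. */
static size_t err_event_len(const struct err_event *ev)
{
	size_t len = SKETCH_HDR_LEN;

	assert(ev);
	/* "if (ev->fs_data)" would always be true here; test the length. */
	if (ev->fs_data_len)
		len += ev->fs_data_len;
	return len;
}
```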
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function
2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
@ 2021-04-27 4:42 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 4:42 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Every time this function is invoked, it is immediately added to
> FAN_EVENT_METADATA_LEN, since there is no need to just calculate the
> length of info records. This minor clean up folds the rest of the
> calculation into the function, which now operates in terms of events,
> returning the size of the entire event, including metadata.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Nice
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/notify/fanotify/fanotify_user.c | 40 +++++++++++++++++-------------
> 1 file changed, 23 insertions(+), 17 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9e0c1afac8bd..0332c4afeec3 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -64,17 +64,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
> return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
> }
>
> -static int fanotify_event_info_len(unsigned int fid_mode,
> - struct fanotify_event *event)
> +static size_t fanotify_event_len(struct fanotify_event *event,
> + unsigned int fid_mode)
> {
> - struct fanotify_info *info = fanotify_event_info(event);
> - int dir_fh_len = fanotify_event_dir_fh_len(event);
> - int fh_len = fanotify_event_object_fh_len(event);
> - int info_len = 0;
> + size_t event_len = FAN_EVENT_METADATA_LEN;
> + struct fanotify_info *info;
> + int dir_fh_len;
> + int fh_len;
> int dot_len = 0;
>
> + if (!fid_mode)
> + return event_len;
> +
> + info = fanotify_event_info(event);
> + dir_fh_len = fanotify_event_dir_fh_len(event);
> + fh_len = fanotify_event_object_fh_len(event);
> +
> if (dir_fh_len) {
> - info_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
> + event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
> } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
> /*
> * With group flag FAN_REPORT_NAME, if name was not recorded in
> @@ -84,9 +91,9 @@ static int fanotify_event_info_len(unsigned int fid_mode,
> }
>
> if (fh_len)
> - info_len += fanotify_fid_info_len(fh_len, dot_len);
> + event_len += fanotify_fid_info_len(fh_len, dot_len);
>
> - return info_len;
> + return event_len;
> }
>
> /*
> @@ -98,7 +105,8 @@ static int fanotify_event_info_len(unsigned int fid_mode,
> static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> size_t count)
> {
> - size_t event_size = FAN_EVENT_METADATA_LEN;
> + size_t event_size;
> + struct fsnotify_event *fse;
> struct fanotify_event *event = NULL;
> unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
>
> @@ -108,16 +116,15 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> if (fsnotify_notify_queue_is_empty(group))
> goto out;
>
> - if (fid_mode) {
> - event_size += fanotify_event_info_len(fid_mode,
> - FANOTIFY_E(fsnotify_peek_first_event(group)));
> - }
> + fse = fsnotify_peek_first_event(group);
> + event = FANOTIFY_E(fse);
> + event_size = fanotify_event_len(event, fid_mode);
>
> if (event_size > count) {
> event = ERR_PTR(-EINVAL);
> goto out;
> }
> - event = FANOTIFY_E(fsnotify_remove_first_event(group));
> + fsnotify_remove_queued_event(group, fse);
> if (fanotify_is_perm_event(event->mask))
> FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
> out:
> @@ -334,8 +341,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>
> pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> - metadata.event_len = FAN_EVENT_METADATA_LEN +
> - fanotify_event_info_len(fid_mode, event);
> + metadata.event_len = fanotify_event_len(event, fid_mode);
> metadata.metadata_len = FAN_EVENT_METADATA_LEN;
> metadata.vers = FANOTIFY_METADATA_VERSION;
> metadata.reserved = 0;
> --
> 2.31.0
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks
2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
@ 2021-04-27 4:53 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 4:53 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> FAN_ERROR will require fsid, but not necessarily require the filesystem
> to expose a file handle. Split those checks into different functions, so
> they can be used separately when creating a mark.
Ok for the split, but...
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/fanotify/fanotify_user.c | 35 +++++++++++++++++++-----------
> 1 file changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 0332c4afeec3..e0d113e3b65c 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1055,7 +1055,23 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> }
>
> /* Check if filesystem can encode a unique fid */
> -static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
> +static int fanotify_test_fid(struct path *path)
This helper can take a dentry.
> +{
> + /*
> + * We need to make sure that the file system supports at least
> + * encoding a file handle so user can use name_to_handle_at() to
> + * compare fid returned with event to the file handle of watched
> + * objects. However, name_to_handle_at() requires that the
> + * filesystem also supports decoding file handles.
> + */
> + if (!path->dentry->d_sb->s_export_op ||
> + !path->dentry->d_sb->s_export_op->fh_to_dentry)
> + return -EOPNOTSUPP;
> +
> + return 0;
> +}
> +
> +static int fanotify_check_path_fsid(struct path *path, __kernel_fsid_t *fsid)
And so does this helper.
I certainly don't see the need for the _path_ in the helper name.
> {
> __kernel_fsid_t root_fsid;
> int err;
> @@ -1082,17 +1098,6 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
> root_fsid.val[1] != fsid->val[1])
> return -EXDEV;
>
> - /*
> - * We need to make sure that the file system supports at least
> - * encoding a file handle so user can use name_to_handle_at() to
> - * compare fid returned with event to the file handle of watched
> - * objects. However, name_to_handle_at() requires that the
> - * filesystem also supports decoding file handles.
> - */
> - if (!path->dentry->d_sb->s_export_op ||
> - !path->dentry->d_sb->s_export_op->fh_to_dentry)
> - return -EOPNOTSUPP;
> -
> return 0;
> }
>
> @@ -1230,7 +1235,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
> }
>
> if (fid_mode) {
> - ret = fanotify_test_fid(&path, &__fsid);
> + ret = fanotify_check_path_fsid(&path, &__fsid);
> + if (ret)
> + goto path_put_and_out;
> +
> + ret = fanotify_test_fid(&path);
Whether it is _test_ or _check_, please stick to one.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation
2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
@ 2021-04-27 5:03 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 5:03 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Introduce a flags field in fsnotify_group to track the mode of
> submission this group has.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/dnotify/dnotify.c | 2 +-
> fs/notify/fanotify/fanotify_user.c | 4 ++--
> fs/notify/group.c | 13 ++++++++-----
> fs/notify/inotify/inotify_user.c | 2 +-
> include/linux/fsnotify_backend.h | 7 +++++--
> kernel/audit_fsnotify.c | 2 +-
> kernel/audit_tree.c | 2 +-
> kernel/audit_watch.c | 2 +-
> 8 files changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
> index e85e13c50d6d..37960c8750e4 100644
> --- a/fs/notify/dnotify/dnotify.c
> +++ b/fs/notify/dnotify/dnotify.c
> @@ -383,7 +383,7 @@ static int __init dnotify_init(void)
> SLAB_PANIC|SLAB_ACCOUNT);
> dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
>
> - dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
> + dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0);
> if (IS_ERR(dnotify_group))
> panic("unable to allocate fsnotify group for dnotify\n");
> return 0;
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index e0d113e3b65c..f50c4ab721e3 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -929,7 +929,7 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void)
> SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> {
> struct fsnotify_group *group;
> - int f_flags, fd;
> + int f_flags, fd, fsn_flags = 0;
> struct user_struct *user;
> unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
> unsigned int class = flags & FANOTIFY_CLASS_BITS;
> @@ -982,7 +982,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> f_flags |= O_NONBLOCK;
>
> /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
> - group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
> + group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
> if (IS_ERR(group)) {
> free_uid(user);
> return PTR_ERR(group);
> diff --git a/fs/notify/group.c b/fs/notify/group.c
> index ffd723ffe46d..08acb1afc0c2 100644
> --- a/fs/notify/group.c
> +++ b/fs/notify/group.c
> @@ -112,7 +112,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
> EXPORT_SYMBOL_GPL(fsnotify_put_group);
>
> static struct fsnotify_group *__fsnotify_alloc_group(
> - const struct fsnotify_ops *ops, gfp_t gfp)
> + const struct fsnotify_ops *ops, unsigned int flags, gfp_t gfp)
> {
> struct fsnotify_group *group;
>
> @@ -134,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group(
> INIT_LIST_HEAD(&group->marks_list);
>
> group->ops = ops;
> + group->flags = flags;
>
> return group;
> }
> @@ -141,18 +142,20 @@ static struct fsnotify_group *__fsnotify_alloc_group(
> /*
> * Create a new fsnotify_group and hold a reference for the group returned.
> */
> -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
> +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
> + unsigned int flags)
> {
> - return __fsnotify_alloc_group(ops, GFP_KERNEL);
> + return __fsnotify_alloc_group(ops, flags, GFP_KERNEL);
> }
> EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
>
> /*
> * Create a new fsnotify_group and hold a reference for the group returned.
> */
> -struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops)
> +struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops,
> + unsigned int flags)
> {
> - return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT);
> + return __fsnotify_alloc_group(ops, flags, GFP_KERNEL_ACCOUNT);
> }
> EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group);
>
*IF* we go this way, note that fsnotify_alloc_group() doesn't need a
flags argument.
None of the callers of fsnotify_alloc_group() ever use the notification
list, so it would be better to pass a flag FSN_NOTIFICATION_LIST from the
backends that do use it (fanotify and inotify), for the sake of symmetry
with FSN_RING_BUFFER, and leave the other callers unchanged.
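A rough user-space sketch of that split, assuming only the user-facing
allocator takes flags. The flag values and the sketch_* names are
hypothetical and only illustrate the shape of the API:

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values; the names follow the comment above, not merged code. */
#define FSN_NOTIFICATION_LIST 0x1u
#define FSN_RING_BUFFER       0x2u

struct sketch_group {
	unsigned int flags;
};

static struct sketch_group *sketch_alloc(unsigned int flags)
{
	struct sketch_group *g;

	/* Reject flags the sketch does not know about. */
	assert((flags & ~(FSN_NOTIFICATION_LIST | FSN_RING_BUFFER)) == 0);
	g = calloc(1, sizeof(*g));
	if (g)
		g->flags = flags;
	return g;
}

/* Kernel-internal users (dnotify, audit): no queue, signature unchanged. */
static struct sketch_group *sketch_alloc_group(void)
{
	return sketch_alloc(0);
}

/* User-facing backends state which queue type they want. */
static struct sketch_group *sketch_alloc_user_group(unsigned int flags)
{
	return sketch_alloc(flags);
}
```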
Thanks,
Amir.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
2021-04-26 22:00 ` kernel test robot
2021-04-26 22:43 ` kernel test robot
@ 2021-04-27 5:39 ` Amir Goldstein
2021-04-29 18:33 ` Gabriel Krisman Bertazi
2 siblings, 1 reply; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 5:39 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> In order to support file system health/error reporting over fanotify,
> fsnotify needs to expose a submission path that doesn't allow sleeping.
> The only problem I identified with the current submission path is the
> need to dynamically allocate memory for the event queue.
>
> This patch avoids the problem by introducing a new mode in fsnotify,
> where a ring buffer is used to submit events for a group. Each group
> has its own ring buffer, and error notifications are submitted
> exclusively through it.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/Makefile | 2 +-
> fs/notify/group.c | 12 +-
> fs/notify/notification.c | 10 ++
> fs/notify/ring.c | 199 +++++++++++++++++++++++++++++++
> include/linux/fsnotify_backend.h | 37 +++++-
> 5 files changed, 255 insertions(+), 5 deletions(-)
> create mode 100644 fs/notify/ring.c
>
> diff --git a/fs/notify/Makefile b/fs/notify/Makefile
> index 63a4b8828df4..61dae1e90f2d 100644
> --- a/fs/notify/Makefile
> +++ b/fs/notify/Makefile
> @@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0
> obj-$(CONFIG_FSNOTIFY) += fsnotify.o notification.o group.o mark.o \
> - fdinfo.o
> + fdinfo.o ring.o
>
> obj-y += dnotify/
> obj-y += inotify/
> diff --git a/fs/notify/group.c b/fs/notify/group.c
> index 08acb1afc0c2..b99b3de36696 100644
> --- a/fs/notify/group.c
> +++ b/fs/notify/group.c
> @@ -81,7 +81,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
> * notification against this group. So clearing the notification queue
> * of all events is reliable now.
> */
> - fsnotify_flush_notify(group);
> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> + fsnotify_free_ring_buffer(group);
> + else
> + fsnotify_flush_notify(group);
>
> /*
> * Destroy overflow event (we cannot use fsnotify_destroy_event() as
> @@ -136,6 +139,13 @@ static struct fsnotify_group *__fsnotify_alloc_group(
> group->ops = ops;
> group->flags = flags;
>
> + if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
> + if (fsnotify_create_ring_buffer(group)) {
> + kfree(group);
> + return ERR_PTR(-ENOMEM);
> + }
> + }
> +
> return group;
> }
>
> diff --git a/fs/notify/notification.c b/fs/notify/notification.c
> index 75d79d6d3ef0..32f97e7b7a80 100644
> --- a/fs/notify/notification.c
> +++ b/fs/notify/notification.c
> @@ -51,6 +51,10 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
> bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
> {
> assert_spin_locked(&group->notification_lock);
> +
> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> + return fsnotify_ring_notify_queue_is_empty(group);
> +
> return list_empty(&group->notification_list) ? true : false;
> }
>
> @@ -132,6 +136,9 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group,
> struct fsnotify_event *event)
> {
> assert_spin_locked(&group->notification_lock);
> +
> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> + return;
> /*
> * We need to init list head for the case of overflow event so that
> * check in fsnotify_add_event() works
> @@ -166,6 +173,9 @@ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
> {
> assert_spin_locked(&group->notification_lock);
>
> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> + return fsnotify_ring_peek_first_event(group);
> +
> return list_first_entry(&group->notification_list,
> struct fsnotify_event, list);
> }
> diff --git a/fs/notify/ring.c b/fs/notify/ring.c
> new file mode 100644
> index 000000000000..75e8af1f8d80
> --- /dev/null
> +++ b/fs/notify/ring.c
> @@ -0,0 +1,199 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/types.h>
> +#include <linux/fsnotify.h>
> +#include <linux/memcontrol.h>
> +
> +#define INVALID_RING_SLOT -1
> +
> +#define FSNOTIFY_RING_PAGES 16
> +
> +#define NEXT_SLOT(cur, len, ring_size) ((cur + len) & (ring_size-1))
> +#define NEXT_PAGE(cur, ring_size) (round_up(cur, PAGE_SIZE) & (ring_size-1))
> +
> +bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group)
> +{
> + assert_spin_locked(&group->notification_lock);
> +
> + if (group->ring_buffer.tail == group->ring_buffer.head)
> + return true;
> + return false;
> +}
> +
> +struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group)
> +{
> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> + struct fsnotify_event *fsn;
> + char *kaddr;
> + u64 tail;
> +
> + assert_spin_locked(&group->notification_lock);
> +
> +again:
> + tail = group->ring_buffer.tail;
> +
> + if ((PAGE_SIZE - (tail & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
> + group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
> + goto again;
> + }
> +
> + kaddr = kmap_atomic(group->ring_buffer.pages[tail / PAGE_SIZE]);
> + if (!kaddr)
> + return NULL;
> + fsn = (struct fsnotify_event *) (kaddr + (tail & (PAGE_SIZE-1)));
> +
> + if (fsn->slot_len == INVALID_RING_SLOT) {
> + group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
> + kunmap_atomic(kaddr);
> + goto again;
> + }
> +
> + /* will be unmapped when entry is consumed. */
> + return fsn;
> +}
> +
> +void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
> + struct fsnotify_event *event)
> +{
> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> + u64 new_tail = NEXT_SLOT(group->ring_buffer.tail, event->slot_len, ring_size);
> +
> + kunmap_atomic(event);
> +
> + pr_debug("%s: group=%p tail=%llx->%llx ring_size=%llu\n", __func__,
> + group, group->ring_buffer.tail, new_tail, ring_size);
> +
> + WRITE_ONCE(group->ring_buffer.tail, new_tail);
> +}
> +
> +struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
> + size_t size)
> + __acquires(&group->notification_lock)
> +{
> + struct fsnotify_event *fsn;
> + u64 head, tail;
> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> + u64 new_head;
> + void *kaddr;
> +
> + if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
> + return ERR_PTR(-EINVAL);
> +
> + pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
> + ring_size, size);
> +
> + spin_lock(&group->notification_lock);
> +again:
> + head = group->ring_buffer.head;
> + tail = group->ring_buffer.tail;
> + new_head = NEXT_SLOT(head, size, ring_size);
> +
> + /* head would catch up to tail, corrupting an entry. */
> + if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
> + fsn = ERR_PTR(-ENOMEM);
> + goto err;
> + }
> +
> + /*
> + * Not even a skip message fits in the page. We can detect the
> + * lack of space. Move on to the next page.
> + */
> + if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
> + /* Start again on next page */
> + group->ring_buffer.head = NEXT_PAGE(head, ring_size);
> + goto again;
> + }
> +
> + kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
> + if (!kaddr) {
> + fsn = ERR_PTR(-EFAULT);
> + goto err;
> + }
> +
> + fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
> +
> + if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
> + /*
> + * No room in the current page. Add a fake entry
> + * consuming the end of the page to avoid splitting event
> + * structure.
> + */
> + fsn->slot_len = INVALID_RING_SLOT;
> + kunmap_atomic(kaddr);
> + /* Start again on the next page */
> + group->ring_buffer.head = NEXT_PAGE(head, ring_size);
> +
> + goto again;
> + }
> + fsn->slot_len = size;
> +
> + return fsn;
> +
> +err:
> + spin_unlock(&group->notification_lock);
> + return fsn;
> +}
> +
> +void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn)
> + __releases(&group->notification_lock)
> +{
> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> + u64 head = group->ring_buffer.head;
> + u64 new_head = NEXT_SLOT(head, fsn->slot_len, ring_size);
> +
> + pr_debug("%s: group=%p head=%llx->%llx ring_size=%llu\n", __func__,
> + group, head, new_head, ring_size);
> +
> + kunmap_atomic(fsn);
> + group->ring_buffer.head = new_head;
> +
> + spin_unlock(&group->notification_lock);
> +
> + wake_up(&group->notification_waitq);
> + kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
> +
> +}
> +
> +void fsnotify_free_ring_buffer(struct fsnotify_group *group)
> +{
> + int i;
> +
> + for (i = 0; i < group->ring_buffer.nr_pages; i++)
> + __free_page(group->ring_buffer.pages[i]);
> + kfree(group->ring_buffer.pages);
> + group->ring_buffer.nr_pages = 0;
> +}
> +
> +int fsnotify_create_ring_buffer(struct fsnotify_group *group)
> +{
> + int nr_pages = FSNOTIFY_RING_PAGES;
> + int i;
> +
> + pr_debug("%s: group=%p pages=%d\n", __func__, group, nr_pages);
> +
> + group->ring_buffer.pages = kmalloc_array(nr_pages, sizeof(struct pages *),
> + GFP_KERNEL);
> + if (!group->ring_buffer.pages)
> + return -ENOMEM;
> +
> + group->ring_buffer.head = 0;
> + group->ring_buffer.tail = 0;
> +
> + for (i = 0; i < nr_pages; i++) {
> + group->ring_buffer.pages[i] = alloc_pages(GFP_KERNEL, 1);
> + if (!group->ring_buffer.pages)
> + goto err_dealloc;
> + }
> +
> + group->ring_buffer.nr_pages = nr_pages;
> +
> + return 0;
> +
> +err_dealloc:
> + for (--i; i >= 0; i--)
> + __free_page(group->ring_buffer.pages[i]);
> + kfree(group->ring_buffer.pages);
> + group->ring_buffer.nr_pages = 0;
> + return -ENOMEM;
> +}
> +
> +
Nothing in this file is fsnotify specific.
Is there no kernel lib implementation for this already?
If there isn't (I'd be very surprised) please put this in lib/ and post it
for wider review including self tests.
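To illustrate how generic the index arithmetic is, here is a user-space
restatement of the NEXT_SLOT and NEXT_PAGE macros quoted above. It is a
sketch that assumes a 4096-byte page and a power-of-two ring size, and
it is not a proposal for the lib/ interface:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for the sketch */

/* Advance an offset by len bytes, wrapping at a power-of-two ring size. */
static uint64_t next_slot(uint64_t cur, uint64_t len, uint64_t ring_size)
{
	assert(ring_size && (ring_size & (ring_size - 1)) == 0);
	return (cur + len) & (ring_size - 1);
}

/*
 * Round an offset up to the next page boundary, wrapping at the ring size.
 * As with round_up(), an offset that is already aligned is returned as-is.
 */
static uint64_t next_page(uint64_t cur, uint64_t ring_size)
{
	uint64_t up = (cur + SKETCH_PAGE_SIZE - 1) &
		      ~(uint64_t)(SKETCH_PAGE_SIZE - 1);

	assert(ring_size && (ring_size & (ring_size - 1)) == 0);
	return up & (ring_size - 1);
}
```

The masking only works because the ring size is a power of two; with 16
pages of 4096 bytes that holds.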
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 190c6a402e98..a1a4dd69e5ed 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -74,6 +74,8 @@
> #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
> FS_OPEN_EXEC_PERM)
>
> +#define FSN_SUBMISSION_RING_BUFFER 0x00000080
FSNOTIFY_GROUP_FLAG_RING_BUFFER please (or FSN_GROUP_ if you must)
and please define this above struct fsnotify_group, even right above the flags
field like FSNOTIFY_CONN_FLAG_HAS_FSID
*IF* we go this way :)
Thanks,
Amir.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH RFC 06/15] fanotify: Support submission through ring buffer
2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
@ 2021-04-27 6:02 ` Amir Goldstein
2021-04-29 18:36 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 6:02 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This adds support for the ring buffer mode in fanotify. It is enabled
> by a new flag FAN_PREALLOC_QUEUE passed to fanotify_init. If this flag
> is enabled, the group only allows marks that support the ring buffer
I don't like this limitation.
I think FAN_PREALLOC_QUEUE can work with other events, why not?
In any case, if we keep the ring buffer, please use a separate
fanotify_ring_buffer_ops struct instead of spraying if/else all over the
event queue implementation.
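A small user-space sketch of the ops-table shape, assuming each queue
type fills in its own function pointers once at setup. The sketch_*
names are hypothetical and only one operation is shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_queue;

/* One table per queue implementation, chosen when the group is created. */
struct sketch_queue_ops {
	bool (*is_empty)(const struct sketch_queue *q);
};

struct sketch_queue {
	const struct sketch_queue_ops *ops;
	size_t list_len;         /* used by the list-backed queue */
	unsigned int head, tail; /* used by the ring-backed queue */
};

static bool sketch_list_is_empty(const struct sketch_queue *q)
{
	return q->list_len == 0;
}

static bool sketch_ring_is_empty(const struct sketch_queue *q)
{
	return q->head == q->tail;
}

static const struct sketch_queue_ops sketch_list_ops = {
	.is_empty = sketch_list_is_empty,
};

static const struct sketch_queue_ops sketch_ring_ops = {
	.is_empty = sketch_ring_is_empty,
};

/* Callers dispatch through the table; no per-call flag checks. */
static bool sketch_queue_is_empty(const struct sketch_queue *q)
{
	assert(q && q->ops && q->ops->is_empty);
	return q->ops->is_empty(q);
}
```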
> submission. In a following patch, FAN_ERROR will make use of this
> mechanism.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/fanotify/fanotify.c | 77 +++++++++++++++++++---------
> fs/notify/fanotify/fanotify_user.c | 81 ++++++++++++++++++------------
> include/linux/fanotify.h | 5 +-
> include/uapi/linux/fanotify.h | 1 +
> 4 files changed, 105 insertions(+), 59 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index e3669d8a4a64..98591a8155a7 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -612,6 +612,26 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
> return event;
> }
>
> +static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *group,
> + u32 mask, const void *data,
> + int data_type)
> +{
> + size_t size = 0;
> +
> + pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
> +
> + return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
> +}
> +
> +static void fanotify_ring_write_event(struct fsnotify_group *group,
> + struct fanotify_event *event, u32 mask,
> + const void *data, __kernel_fsid_t *fsid)
> +{
> + fanotify_init_event(group, event, 0, mask);
> +
> + event->pid = get_pid(task_tgid(current));
> +}
> +
> /*
> * Get cached fsid of the filesystem containing the object from any connector.
> * All connectors are supposed to have the same fsid, but we do not verify that
> @@ -701,31 +721,38 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> return 0;
> }
>
> - event = fanotify_alloc_event(group, mask, data, data_type, dir,
> - file_name, &fsid);
> - ret = -ENOMEM;
> - if (unlikely(!event)) {
> - /*
> - * We don't queue overflow events for permission events as
> - * there the access is denied and so no event is in fact lost.
> - */
> - if (!fanotify_is_perm_event(mask))
> - fsnotify_queue_overflow(group);
> - goto finish;
> - }
> -
> - fsn_event = &event->fse;
> - ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
> - if (ret) {
> - /* Permission events shouldn't be merged */
> - BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
> - /* Our event wasn't used in the end. Free it. */
> - fsnotify_destroy_event(group, fsn_event);
> -
> - ret = 0;
> - } else if (fanotify_is_perm_event(mask)) {
> - ret = fanotify_get_response(group, FANOTIFY_PERM(event),
> - iter_info);
> + if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
> + event = fanotify_ring_get_slot(group, mask, data, data_type);
> + if (IS_ERR(event))
> + return PTR_ERR(event);
So no FAN_Q_OVERFLOW with the ring buffer implementation?
This will be unexpected for fanotify users and, frankly, less useful IMO.
I also don't see a technical reason to omit the overflow event.
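To make the point concrete, here is a userspace toy model (not kernel code) of
a preallocated ring that still reports an overflow. The toy_* names are made up
for illustration; only the 0x4000 value is taken from FAN_Q_OVERFLOW in the
uapi header. When the ring is full the producer drops the event but remembers
that it did, and the consumer sees a single overflow event after draining:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SLOTS 4
#define TOY_Q_OVERFLOW 0x4000u	/* same value as FAN_Q_OVERFLOW */

/* Hypothetical stand-in for a preallocated event ring. */
struct toy_ring {
	unsigned int events[TOY_RING_SLOTS];
	size_t head;		/* next slot to write */
	size_t tail;		/* next slot to read */
	size_t used;		/* number of occupied slots */
	bool overflow_pending;	/* a producer found the ring full */
};

/* Producer side: returns false when the event had to be dropped. */
static bool toy_ring_push(struct toy_ring *r, unsigned int mask)
{
	assert(r != NULL);
	if (r->used == TOY_RING_SLOTS) {
		/* Remember the loss instead of only failing the producer. */
		r->overflow_pending = true;
		return false;
	}
	r->events[r->head] = mask;
	r->head = (r->head + 1) % TOY_RING_SLOTS;
	r->used++;
	return true;
}

/* Consumer side: drain queued events first, then report one overflow. */
static bool toy_ring_pop(struct toy_ring *r, unsigned int *mask)
{
	assert(r != NULL && mask != NULL);
	if (r->used) {
		*mask = r->events[r->tail];
		r->tail = (r->tail + 1) % TOY_RING_SLOTS;
		r->used--;
		return true;
	}
	if (r->overflow_pending) {
		r->overflow_pending = false;
		*mask = TOY_Q_OVERFLOW;
		return true;
	}
	return false;
}
```

The overflow indication needs no slot of its own, so it cannot itself fail on
a full ring.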
> + fanotify_ring_write_event(group, event, mask, data, &fsid);
> + fsnotify_ring_commit_slot(group, &event->fse);
> + } else {
> + event = fanotify_alloc_event(group, mask, data, data_type, dir,
> + file_name, &fsid);
> + ret = -ENOMEM;
> + if (unlikely(!event)) {
> + /*
> + * We don't queue overflow events for permission events as
> + * there the access is denied and so no event is in fact lost.
> + */
> + if (!fanotify_is_perm_event(mask))
> + fsnotify_queue_overflow(group);
> + goto finish;
> + }
> + fsn_event = &event->fse;
> + ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
> + if (ret) {
> + /* Permission events shouldn't be merged */
> + BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
> + /* Our event wasn't used in the end. Free it. */
> + fsnotify_destroy_event(group, fsn_event);
> +
> + ret = 0;
> + } else if (fanotify_is_perm_event(mask)) {
> + ret = fanotify_get_response(group, FANOTIFY_PERM(event),
> + iter_info);
> + }
> }
> finish:
> if (fanotify_is_perm_event(mask))
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index fe605359af88..5031198bf7db 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -521,7 +521,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> * Permission events get queued to wait for response. Other
> * events can be destroyed now.
> */
> - if (!fanotify_is_perm_event(event->mask)) {
> + if (group->fanotify_data.flags & FAN_PREALLOC_QUEUE) {
> + fsnotify_ring_buffer_consume_event(group, &event->fse);
> + } else if (!fanotify_is_perm_event(event->mask)) {
> fsnotify_destroy_event(group, &event->fse);
> } else {
> if (ret <= 0) {
> @@ -587,40 +589,39 @@ static int fanotify_release(struct inode *ignored, struct file *file)
> */
> fsnotify_group_stop_queueing(group);
>
> - /*
> - * Process all permission events on access_list and notification queue
> - * and simulate reply from userspace.
> - */
> - spin_lock(&group->notification_lock);
> - while (!list_empty(&group->fanotify_data.access_list)) {
> - struct fanotify_perm_event *event;
> -
> - event = list_first_entry(&group->fanotify_data.access_list,
> - struct fanotify_perm_event, fae.fse.list);
> - list_del_init(&event->fae.fse.list);
> - finish_permission_event(group, event, FAN_ALLOW);
> + if (!(group->flags & FSN_SUBMISSION_RING_BUFFER)) {
> + /*
> + * Process all permission events on access_list and notification queue
> + * and simulate reply from userspace.
> + */
> spin_lock(&group->notification_lock);
> - }
> -
> - /*
> - * Destroy all non-permission events. For permission events just
> - * dequeue them and set the response. They will be freed once the
> - * response is consumed and fanotify_get_response() returns.
> - */
> - while (!fsnotify_notify_queue_is_empty(group)) {
> - struct fanotify_event *event;
> -
> - event = FANOTIFY_E(fsnotify_remove_first_event(group));
> - if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
> - spin_unlock(&group->notification_lock);
> - fsnotify_destroy_event(group, &event->fse);
> - } else {
> - finish_permission_event(group, FANOTIFY_PERM(event),
> - FAN_ALLOW);
> + while (!list_empty(&group->fanotify_data.access_list)) {
> + struct fanotify_perm_event *event;
> + event = list_first_entry(&group->fanotify_data.access_list,
> + struct fanotify_perm_event, fae.fse.list);
> + list_del_init(&event->fae.fse.list);
> + finish_permission_event(group, event, FAN_ALLOW);
> + spin_lock(&group->notification_lock);
> }
> - spin_lock(&group->notification_lock);
> + /*
> + * Destroy all non-permission events. For permission events just
> + * dequeue them and set the response. They will be freed once the
> + * response is consumed and fanotify_get_response() returns.
> + */
> + while (!fsnotify_notify_queue_is_empty(group)) {
> + struct fanotify_event *event;
> + event = FANOTIFY_E(fsnotify_remove_first_event(group));
> + if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
> + spin_unlock(&group->notification_lock);
> + fsnotify_destroy_event(group, &event->fse);
> + } else {
> + finish_permission_event(group, FANOTIFY_PERM(event),
> + FAN_ALLOW);
> + }
> + spin_lock(&group->notification_lock);
> + }
> + spin_unlock(&group->notification_lock);
> }
> - spin_unlock(&group->notification_lock);
>
> /* Response for all permission events it set, wakeup waiters */
> wake_up(&group->fanotify_data.access_waitq);
> @@ -981,6 +982,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> if (flags & FAN_NONBLOCK)
> f_flags |= O_NONBLOCK;
>
> + if (flags & FAN_PREALLOC_QUEUE) {
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + if (flags & FAN_UNLIMITED_QUEUE)
> + return -EINVAL;
> +
> + fsn_flags = FSN_SUBMISSION_RING_BUFFER;
> + }
> +
> /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
> group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
> if (IS_ERR(group)) {
> @@ -1223,6 +1234,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
> goto fput_and_out;
> }
>
> + if ((group->flags & FSN_SUBMISSION_RING_BUFFER) &&
> + (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
> + goto fput_and_out;
> +
> ret = fanotify_find_path(dfd, pathname, &path, flags,
> (mask & ALL_FSNOTIFY_EVENTS), obj_type);
> if (ret)
> @@ -1327,7 +1342,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
> */
> static int __init fanotify_user_setup(void)
> {
> - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
> + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
> BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>
> fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 3e9c56ee651f..5a4cefb4b1c3 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -23,7 +23,8 @@
> #define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
> FAN_REPORT_TID | \
> FAN_CLOEXEC | FAN_NONBLOCK | \
> - FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
> + FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
> + FAN_PREALLOC_QUEUE)
>
> #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \
> FAN_MARK_FILESYSTEM)
> @@ -71,6 +72,8 @@
> FANOTIFY_PERM_EVENTS | \
> FAN_Q_OVERFLOW | FAN_ONDIR)
>
> +#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
FANOTIFY_RING_BUFFER_EVENTS? FANOTIFY_PREALLOC_EVENTS?
Please leave a comment above to state what this group means.
I *think* there is no reason to limit the set of events, only the sort of
information that is possible with FAN_PREALLOC_QUEUE.
Perhaps FAN_REPORT_FID cannot be allowed and as a result
FANOTIFY_INODE_EVENTS will not be allowed, but I am not even
sure if that limitation is needed.
Thanks,
Amir.
* Re: [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events
2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
@ 2021-04-27 6:49 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 6:49 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> include/linux/fsnotify.h | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index f8acddcf54fb..b3ac1a9d0d4d 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -317,4 +317,19 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
> fsnotify_dentry(dentry, mask);
> }
>
> +static inline void fsnotify_error_event(int error, struct inode *dir,
> + const char *function, int line,
> + void *fs_data, int fs_data_size)
> +{
> + struct fs_error_report report = {
> + .error = error,
> + .line = line,
> + .function = function,
> + .fs_data_size = fs_data_size,
> + .fs_data = fs_data,
> + };
> +
> + fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, dir, NULL, NULL, 0);
The way you use this helper from ext4_fsnotify_error() it would make more sense
to name the inode argument 'inode' and call:
fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, NULL, NULL, inode, 0);
Also, if we stick with returning ENOMEM instead of queueing an overflow event
(I don't think we should), then this helper should return the error as well.
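Roughly what I have in mind, written as a userspace mock rather than a patch.
The toy_* types and the toy_fsnotify() stub are invented so that the snippet
builds on its own; only the idea (pass the object as 'inode', leave 'dir'
NULL, and hand the result back to the caller) is taken from the comments above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
struct toy_inode {
	unsigned long i_ino;
};

struct toy_error_report {
	int error;
	int line;
	const char *function;
	int fs_data_size;
	void *fs_data;
};

/* Test knobs for the stub below. */
static int toy_fail_alloc;
static const struct toy_inode *toy_last_inode;

/* Stub standing in for fsnotify(): fails with -ENOMEM when asked to. */
static int toy_fsnotify(const struct toy_error_report *report,
			const struct toy_inode *dir,
			const struct toy_inode *inode)
{
	(void)report;
	(void)dir;
	if (toy_fail_alloc)
		return -ENOMEM;
	toy_last_inode = inode;
	return 0;
}

/* The helper takes 'inode' (not 'dir') and propagates the result. */
static int toy_fsnotify_error_event(int error, struct toy_inode *inode,
				    const char *function, int line,
				    void *fs_data, int fs_data_size)
{
	struct toy_error_report report = {
		.error = error,
		.line = line,
		.function = function,
		.fs_data_size = fs_data_size,
		.fs_data = fs_data,
	};

	assert(fs_data_size >= 0);
	return toy_fsnotify(&report, NULL, inode);
}
```

With this shape the filesystem can at least log that the notification was
lost.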
Thanks,
Amir.
* Re: [PATCH RFC 09/15] fanotify: Introduce generic error record
2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
@ 2021-04-27 7:01 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 7:01 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This record describes a fs error in a fs agnostic way. It will be send
> back to userspace in response to a FSNOTIFY_EVENT_ERROR for groups with
> the FAN_ERROR mark.
It's not a mark, it's an event, so:
"...for groups with the FAN_ERROR event in their mark mask"
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/fanotify/fanotify.h | 16 ++++++++++++++++
> fs/notify/fanotify/fanotify_user.c | 28 ++++++++++++++++++++++++++++
> include/uapi/linux/fanotify.h | 10 ++++++++++
> 3 files changed, 54 insertions(+)
>
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 47299e3d6efd..4cb9dd31f084 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -179,6 +179,22 @@ FANOTIFY_NE(struct fanotify_event *event)
> return container_of(event, struct fanotify_name_event, fae);
> }
>
> +struct fanotify_error_event {
> + struct fanotify_event fae;
> + int error;
> + __kernel_fsid_t fsid;
> +
> + int fs_data_size;
> + /* Must be the last item in the structure */
> + char fs_data[0];
> +};
> +
> +static inline struct fanotify_error_event *
> +FANOTIFY_EE(struct fanotify_event *event)
> +{
> + return container_of(event, struct fanotify_error_event, fae);
> +}
> +
> static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
> {
> if (event->type == FANOTIFY_EVENT_TYPE_FID)
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 5031198bf7db..21162d347bd1 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -64,6 +64,11 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
> return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
> }
>
> +static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
> +{
> + return sizeof(struct fanotify_event_info_error);
> +}
> +
> static size_t fanotify_event_len(struct fanotify_event *event,
> unsigned int fid_mode)
> {
> @@ -232,6 +237,29 @@ static int process_access_response(struct fsnotify_group *group,
> return -ENOENT;
> }
>
> +static size_t copy_error_info_to_user(struct fanotify_error_event *fee,
> + char __user *buf, int count)
> +{
> + struct fanotify_event_info_error info;
> +
> + info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
> + info.hdr.pad = 0;
> + info.hdr.len = fanotify_error_info_len(fee);
> +
> + if (WARN_ON(count < info.hdr.len))
> + return -EFAULT;
> +
> + info.version = FANOTIFY_EVENT_INFO_ERROR_VERS_1;
> + info.error = fee->error;
> + info.fsid = fee->fsid;
> +
> + if (copy_to_user(buf, &info, sizeof(info)))
> + return -EFAULT;
> +
> + return info.hdr.len;
> +
> +}
> +
> static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
> int info_type, const char *name, size_t name_len,
> char __user *buf, size_t count)
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index b283531549f1..cc9a1fa80e30 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -124,6 +124,7 @@ struct fanotify_event_metadata {
> #define FAN_EVENT_INFO_TYPE_FID 1
> #define FAN_EVENT_INFO_TYPE_DFID_NAME 2
> #define FAN_EVENT_INFO_TYPE_DFID 3
> +#define FAN_EVENT_INFO_TYPE_ERROR 4
>
> /* Variable length info record following event metadata */
> struct fanotify_event_info_header {
> @@ -149,6 +150,15 @@ struct fanotify_event_info_fid {
> unsigned char handle[0];
> };
>
> +#define FANOTIFY_EVENT_INFO_ERROR_VERS_1 1
Honestly, this struct is too simple to need a 'version'.
The format of this simple struct is already defined by
FAN_EVENT_INFO_TYPE_ERROR, and if we want to change
the reported info in the future, we can add
FAN_EVENT_INFO_TYPE_ERROR_V2.
In fact, I suggest naming the type
FAN_EVENT_INFO_TYPE_FS_ERROR
to differentiate it from a future
FAN_EVENT_INFO_TYPE_WB_ERROR.
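A sketch of why the info type alone is enough for userspace: a reader walks
the records by hdr.len, handles the types it knows and skips the rest, so a
new layout only needs a new type number. The header layout mirrors struct
fanotify_event_info_header; the toy_* names, the type number and the record
body are assumptions made for this example, not the proposed ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fanotify_event_info_header. */
struct toy_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

/* Hypothetical type number and record body, for illustration only. */
#define TOY_INFO_TYPE_FS_ERROR 4

struct toy_info_fs_error {
	struct toy_info_header hdr;
	int32_t error;
};

/*
 * Walk the info records in buf. Returns 1 and fills *error when an FS error
 * record is found, 0 when there is none, -1 when a record is malformed.
 */
static int toy_find_fs_error(const unsigned char *buf, size_t len,
			     int32_t *error)
{
	size_t off = 0;

	assert(buf != NULL && error != NULL);
	while (len - off >= sizeof(struct toy_info_header)) {
		struct toy_info_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;
		if (hdr.info_type == TOY_INFO_TYPE_FS_ERROR &&
		    hdr.len >= sizeof(struct toy_info_fs_error)) {
			struct toy_info_fs_error rec;

			memcpy(&rec, buf + off, sizeof(rec));
			*error = rec.error;
			return 1;
		}
		/* Unknown or newer record type: skip it using its length. */
		off += hdr.len;
	}
	return 0;
}
```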
> +
> +struct fanotify_event_info_error {
> + struct fanotify_event_info_header hdr;
> + int version;
> + int error;
> + __kernel_fsid_t fsid;
> +};
I suggest putting an error sequence counter in this struct.
The per-sb sequence counter can be provided by the filesystem
or by fsnotify.
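For illustration, this is how a monitoring tool could use such a counter to
notice lost error events. It assumes a 32-bit counter that is incremented by
one for every error on the superblock; the field width and the
toy_missed_errors() name are my assumptions, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of error events that were raised on the filesystem but never
 * reached this listener, given the previously seen sequence number and
 * the one in the event just read. Unsigned arithmetic keeps the result
 * correct across counter wraparound.
 */
static uint32_t toy_missed_errors(uint32_t last_seq, uint32_t seq)
{
	return (uint32_t)(seq - last_seq - 1u);
}
```

A non-zero result tells the tool that it should go look at the filesystem
state (or the logs) rather than trust the event stream alone.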
Thanks,
Amir.
* Re: [PATCH RFC 10/15] fanotify: Introduce code location record
2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
@ 2021-04-27 7:11 ` Amir Goldstein
2021-04-29 18:40 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 7:11 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This patch introduces an optional info record that describes the
> source (as in the region of the source-code where an event was
> initiated). This record is not produced for other type of existing
> notification, but it is optionally enabled for FAN_ERROR notifications.
>
I find this functionality controversial, because I think that the fs-provided
s_last_error*, s_first_error* is more reliable and more powerful than this
functionality.
Let's leave it for a future extension proposal, should the fanotify event reporting
proposal pass muster, shall we?
Or do you think that without this optional extension fanotify event reporting
will not be valuable enough?
Thanks,
Amir.
* Re: [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record
2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
@ 2021-04-27 7:12 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 7:12 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Allow a FS_ERROR_TYPE notification to send a filesystem provided blob
> back to userspace. This is useful for filesystems who want to provide
> debug information for recovery tools.
>
Same comment as for FAN_EVENT_INFO_TYPE_LOCATION.
Can we leave this patch out of the discussion for now?
Thanks,
Amir.
* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
2021-04-26 22:45 ` kernel test robot
@ 2021-04-27 7:25 ` Amir Goldstein
1 sibling, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 7:25 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> The FAN_ERROR mark is used by filesystem wide monitoring tools to
> receive notifications of type FS_ERROR_EVENT, emited by filesystems when
> a problem is detected. The error notification includes a generic error
> descriptor, an optional location record and a filesystem specific blob.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/fanotify/fanotify.c | 48 +++++++++++++++++++----
> fs/notify/fanotify/fanotify.h | 8 ++++
> fs/notify/fanotify/fanotify_user.c | 63 ++++++++++++++++++++++++++++++
> include/linux/fanotify.h | 9 ++++-
> include/uapi/linux/fanotify.h | 2 +
> 5 files changed, 120 insertions(+), 10 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 98591a8155a7..6bae23d42e5e 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -240,12 +240,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
> __func__, iter_info->report_mask, event_mask, data, data_type);
>
> if (!fid_mode) {
> - /* Do we have path to open a file descriptor? */
> - if (!path)
> - return 0;
> - /* Path type events are only relevant for files and dirs */
> - if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
> - return 0;
> + if (!fanotify_is_error_event(event_mask)) {
This open-coded nested condition is not nice.
If we get as far as this, I will explain what needs to be done.
We need helpers fanotify_is_reporting_fd(), fanotify_is_reporting_fid() and
fanotify_is_reporting_dir_fid().
Thanks,
Amir.
* Re: [PATCH RFC 13/15] ext4: Send notifications on error
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
@ 2021-04-29 13:19 ` Dan Carpenter
2021-04-27 4:32 ` Amir Goldstein
2021-04-29 0:57 ` Darrick J. Wong
2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-27 8:36 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 17965 bytes --]
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210426184201.4177978-14-krisman@collabora.com>
References: <20210426184201.4177978-14-krisman@collabora.com>
TO: Gabriel Krisman Bertazi <krisman@collabora.com>
Hi Gabriel,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
:::::: branch date: 14 hours ago
:::::: commit date: 14 hours ago
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/ext4/super.c:739 ext4_fsnotify_error() warn: variable dereferenced before check 'inode' (see line 738)
fs/ext4/super.c:918 __ext4_std_error() error: uninitialized symbol 'errstr'.
Old smatch warnings:
fs/ext4/super.c:3787 ext4_register_li_request() error: we previously assumed 'ext4_li_info' could be null (see line 3769)
fs/ext4/super.c:4463 ext4_fill_super() warn: bitwise AND condition is false here
vim +/inode +739 fs/ext4/super.c
ac27a0ec112a08 Dave Kleikamp 2006-10-11 731
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 732 static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 733 const char *func, int line,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 734 const char *desc, struct va_format *vaf)
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 735 {
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 736 struct ext4_error_inode_report report;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 737
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @738 if (inode->i_sb->s_fsnotify_marks) {
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @739 report.inode = inode ? inode->i_ino : -1L;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 740 report.block = block ? block : -1L;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 741
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 742 snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 743
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 744 fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 745 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 746 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 747
efbed4dc5857f8 Theodore Ts'o 2013-10-17 748 #define ext4_error_ratelimit(sb) \
efbed4dc5857f8 Theodore Ts'o 2013-10-17 749 ___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state), \
efbed4dc5857f8 Theodore Ts'o 2013-10-17 750 "EXT4-fs error")
efbed4dc5857f8 Theodore Ts'o 2013-10-17 751
12062dddda4509 Eric Sandeen 2010-02-15 752 void __ext4_error(struct super_block *sb, const char *function,
014c9caa29d3a4 Jan Kara 2020-11-27 753 unsigned int line, bool force_ro, int error, __u64 block,
54d3adbc29f0c7 Theodore Ts'o 2020-03-28 754 const char *fmt, ...)
ac27a0ec112a08 Dave Kleikamp 2006-10-11 755 {
0ff2ea7d84e311 Joe Perches 2010-12-19 756 struct va_format vaf;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 757 va_list args;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 758
0db1ff222d40f1 Theodore Ts'o 2017-02-05 759 if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
0db1ff222d40f1 Theodore Ts'o 2017-02-05 760 return;
0db1ff222d40f1 Theodore Ts'o 2017-02-05 761
ccf0f32acd436b Theodore Ts'o 2018-02-18 762 trace_ext4_error(sb, function, line);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 763
ac27a0ec112a08 Dave Kleikamp 2006-10-11 764 va_start(args, fmt);
0ff2ea7d84e311 Joe Perches 2010-12-19 765 vaf.fmt = fmt;
0ff2ea7d84e311 Joe Perches 2010-12-19 766 vaf.va = &args;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 767 if (ext4_error_ratelimit(sb)) {
efbed4dc5857f8 Theodore Ts'o 2013-10-17 768 printk(KERN_CRIT
efbed4dc5857f8 Theodore Ts'o 2013-10-17 769 "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
0ff2ea7d84e311 Joe Perches 2010-12-19 770 sb->s_id, function, line, current->comm, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 771
efbed4dc5857f8 Theodore Ts'o 2013-10-17 772 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 773 ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 774 va_end(args);
e789ca0cc1d512 Jan Kara 2020-12-16 775 ext4_handle_error(sb, force_ro, error, 0, block, function, line);
ac27a0ec112a08 Dave Kleikamp 2006-10-11 776 }
ac27a0ec112a08 Dave Kleikamp 2006-10-11 777
e7c96e8e47baf2 Joe Perches 2013-07-01 778 void __ext4_error_inode(struct inode *inode, const char *function,
54d3adbc29f0c7 Theodore Ts'o 2020-03-28 779 unsigned int line, ext4_fsblk_t block, int error,
273df556b6ee20 Frank Mayhar 2010-03-02 780 const char *fmt, ...)
273df556b6ee20 Frank Mayhar 2010-03-02 781 {
273df556b6ee20 Frank Mayhar 2010-03-02 782 va_list args;
f7c21177af0b32 Theodore Ts'o 2011-01-10 783 struct va_format vaf;
273df556b6ee20 Frank Mayhar 2010-03-02 784
0db1ff222d40f1 Theodore Ts'o 2017-02-05 785 if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
0db1ff222d40f1 Theodore Ts'o 2017-02-05 786 return;
0db1ff222d40f1 Theodore Ts'o 2017-02-05 787
ccf0f32acd436b Theodore Ts'o 2018-02-18 788 trace_ext4_error(inode->i_sb, function, line);
273df556b6ee20 Frank Mayhar 2010-03-02 789 va_start(args, fmt);
f7c21177af0b32 Theodore Ts'o 2011-01-10 790 vaf.fmt = fmt;
f7c21177af0b32 Theodore Ts'o 2011-01-10 791 vaf.va = &args;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 792 if (ext4_error_ratelimit(inode->i_sb)) {
c398eda0e43a79 Theodore Ts'o 2010-07-27 793 if (block)
d9ee81da93e86a Joe Perches 2012-03-19 794 printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
d9ee81da93e86a Joe Perches 2012-03-19 795 "inode #%lu: block %llu: comm %s: %pV\n",
d9ee81da93e86a Joe Perches 2012-03-19 796 inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches 2012-03-19 797 block, current->comm, &vaf);
d9ee81da93e86a Joe Perches 2012-03-19 798 else
d9ee81da93e86a Joe Perches 2012-03-19 799 printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
d9ee81da93e86a Joe Perches 2012-03-19 800 "inode #%lu: comm %s: %pV\n",
d9ee81da93e86a Joe Perches 2012-03-19 801 inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches 2012-03-19 802 current->comm, &vaf);
efbed4dc5857f8 Theodore Ts'o 2013-10-17 803 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 804
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 805 ext4_fsnotify_error(error, inode, block, function, line, NULL, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 806 va_end(args);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 807
e789ca0cc1d512 Jan Kara 2020-12-16 808 ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
54d3adbc29f0c7 Theodore Ts'o 2020-03-28 809 function, line);
273df556b6ee20 Frank Mayhar 2010-03-02 810 }
273df556b6ee20 Frank Mayhar 2010-03-02 811
e7c96e8e47baf2 Joe Perches 2013-07-01 812 void __ext4_error_file(struct file *file, const char *function,
f7c21177af0b32 Theodore Ts'o 2011-01-10 813 unsigned int line, ext4_fsblk_t block,
f7c21177af0b32 Theodore Ts'o 2011-01-10 814 const char *fmt, ...)
273df556b6ee20 Frank Mayhar 2010-03-02 815 {
273df556b6ee20 Frank Mayhar 2010-03-02 816 va_list args;
f7c21177af0b32 Theodore Ts'o 2011-01-10 817 struct va_format vaf;
496ad9aa8ef448 Al Viro 2013-01-23 818 struct inode *inode = file_inode(file);
273df556b6ee20 Frank Mayhar 2010-03-02 819 char pathname[80], *path;
273df556b6ee20 Frank Mayhar 2010-03-02 820
0db1ff222d40f1 Theodore Ts'o 2017-02-05 821 if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
0db1ff222d40f1 Theodore Ts'o 2017-02-05 822 return;
0db1ff222d40f1 Theodore Ts'o 2017-02-05 823
ccf0f32acd436b Theodore Ts'o 2018-02-18 824 trace_ext4_error(inode->i_sb, function, line);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 825
9bf39ab2adafd7 Miklos Szeredi 2015-06-19 826 path = file_path(file, pathname, sizeof(pathname));
f9a62d090cf47f Dan Carpenter 2011-01-10 827 if (IS_ERR(path))
273df556b6ee20 Frank Mayhar 2010-03-02 828 path = "(unknown)";
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 829
f7c21177af0b32 Theodore Ts'o 2011-01-10 830 va_start(args, fmt);
f7c21177af0b32 Theodore Ts'o 2011-01-10 831 vaf.fmt = fmt;
f7c21177af0b32 Theodore Ts'o 2011-01-10 832 vaf.va = &args;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 833
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 834 if (ext4_error_ratelimit(inode->i_sb)) {
d9ee81da93e86a Joe Perches 2012-03-19 835 if (block)
d9ee81da93e86a Joe Perches 2012-03-19 836 printk(KERN_CRIT
d9ee81da93e86a Joe Perches 2012-03-19 837 "EXT4-fs error (device %s): %s:%d: inode #%lu: "
d9ee81da93e86a Joe Perches 2012-03-19 838 "block %llu: comm %s: path %s: %pV\n",
d9ee81da93e86a Joe Perches 2012-03-19 839 inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches 2012-03-19 840 block, current->comm, path, &vaf);
d9ee81da93e86a Joe Perches 2012-03-19 841 else
d9ee81da93e86a Joe Perches 2012-03-19 842 printk(KERN_CRIT
d9ee81da93e86a Joe Perches 2012-03-19 843 "EXT4-fs error (device %s): %s:%d: inode #%lu: "
d9ee81da93e86a Joe Perches 2012-03-19 844 "comm %s: path %s: %pV\n",
d9ee81da93e86a Joe Perches 2012-03-19 845 inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches 2012-03-19 846 current->comm, path, &vaf);
efbed4dc5857f8 Theodore Ts'o 2013-10-17 847 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 848 ext4_fsnotify_error(EFSCORRUPTED, inode, block, function, line, NULL, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 849 va_end(args);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 850
e789ca0cc1d512 Jan Kara 2020-12-16 851 ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
54d3adbc29f0c7 Theodore Ts'o 2020-03-28 852 function, line);
273df556b6ee20 Frank Mayhar 2010-03-02 853 }
273df556b6ee20 Frank Mayhar 2010-03-02 854
722887ddc8982f Theodore Ts'o 2013-02-08 855 const char *ext4_decode_error(struct super_block *sb, int errno,
ac27a0ec112a08 Dave Kleikamp 2006-10-11 856 char nbuf[16])
ac27a0ec112a08 Dave Kleikamp 2006-10-11 857 {
ac27a0ec112a08 Dave Kleikamp 2006-10-11 858 char *errstr = NULL;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 859
ac27a0ec112a08 Dave Kleikamp 2006-10-11 860 switch (errno) {
6a797d27378389 Darrick J. Wong 2015-10-17 861 case -EFSCORRUPTED:
6a797d27378389 Darrick J. Wong 2015-10-17 862 errstr = "Corrupt filesystem";
6a797d27378389 Darrick J. Wong 2015-10-17 863 break;
6a797d27378389 Darrick J. Wong 2015-10-17 864 case -EFSBADCRC:
6a797d27378389 Darrick J. Wong 2015-10-17 865 errstr = "Filesystem failed CRC";
6a797d27378389 Darrick J. Wong 2015-10-17 866 break;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 867 case -EIO:
ac27a0ec112a08 Dave Kleikamp 2006-10-11 868 errstr = "IO failure";
ac27a0ec112a08 Dave Kleikamp 2006-10-11 869 break;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 870 case -ENOMEM:
ac27a0ec112a08 Dave Kleikamp 2006-10-11 871 errstr = "Out of memory";
ac27a0ec112a08 Dave Kleikamp 2006-10-11 872 break;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 873 case -EROFS:
78f1ddbb498283 Theodore Ts'o 2009-07-27 874 if (!sb || (EXT4_SB(sb)->s_journal &&
78f1ddbb498283 Theodore Ts'o 2009-07-27 875 EXT4_SB(sb)->s_journal->j_flags & JBD2_ABORT))
ac27a0ec112a08 Dave Kleikamp 2006-10-11 876 errstr = "Journal has aborted";
ac27a0ec112a08 Dave Kleikamp 2006-10-11 877 else
ac27a0ec112a08 Dave Kleikamp 2006-10-11 878 errstr = "Readonly filesystem";
ac27a0ec112a08 Dave Kleikamp 2006-10-11 879 break;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 880 default:
ac27a0ec112a08 Dave Kleikamp 2006-10-11 881 /* If the caller passed in an extra buffer for unknown
ac27a0ec112a08 Dave Kleikamp 2006-10-11 882 * errors, textualise them now. Else we just return
ac27a0ec112a08 Dave Kleikamp 2006-10-11 883 * NULL. */
ac27a0ec112a08 Dave Kleikamp 2006-10-11 884 if (nbuf) {
ac27a0ec112a08 Dave Kleikamp 2006-10-11 885 /* Check for truncated error codes... */
ac27a0ec112a08 Dave Kleikamp 2006-10-11 886 if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
ac27a0ec112a08 Dave Kleikamp 2006-10-11 887 errstr = nbuf;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 888 }
ac27a0ec112a08 Dave Kleikamp 2006-10-11 889 break;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 890 }
ac27a0ec112a08 Dave Kleikamp 2006-10-11 891
ac27a0ec112a08 Dave Kleikamp 2006-10-11 892 return errstr;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 893 }
ac27a0ec112a08 Dave Kleikamp 2006-10-11 894
617ba13b31fbf5 Mingming Cao 2006-10-11 895 /* __ext4_std_error decodes expected errors from journaling functions
ac27a0ec112a08 Dave Kleikamp 2006-10-11 896 * automatically and invokes the appropriate error response. */
ac27a0ec112a08 Dave Kleikamp 2006-10-11 897
c398eda0e43a79 Theodore Ts'o 2010-07-27 898 void __ext4_std_error(struct super_block *sb, const char *function,
c398eda0e43a79 Theodore Ts'o 2010-07-27 899 unsigned int line, int errno)
ac27a0ec112a08 Dave Kleikamp 2006-10-11 900 {
ac27a0ec112a08 Dave Kleikamp 2006-10-11 901 char nbuf[16];
ac27a0ec112a08 Dave Kleikamp 2006-10-11 902 const char *errstr;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 903
0db1ff222d40f1 Theodore Ts'o 2017-02-05 904 if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
0db1ff222d40f1 Theodore Ts'o 2017-02-05 905 return;
0db1ff222d40f1 Theodore Ts'o 2017-02-05 906
ac27a0ec112a08 Dave Kleikamp 2006-10-11 907 /* Special case: if the error is EROFS, and we're not already
ac27a0ec112a08 Dave Kleikamp 2006-10-11 908 * inside a transaction, then there's really no point in logging
ac27a0ec112a08 Dave Kleikamp 2006-10-11 909 * an error. */
bc98a42c1f7d0f David Howells 2017-07-17 910 if (errno == -EROFS && journal_current_handle() == NULL && sb_rdonly(sb))
ac27a0ec112a08 Dave Kleikamp 2006-10-11 911 return;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 912
efbed4dc5857f8 Theodore Ts'o 2013-10-17 913 if (ext4_error_ratelimit(sb)) {
617ba13b31fbf5 Mingming Cao 2006-10-11 914 errstr = ext4_decode_error(sb, errno, nbuf);
c398eda0e43a79 Theodore Ts'o 2010-07-27 915 printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
c398eda0e43a79 Theodore Ts'o 2010-07-27 916 sb->s_id, function, line, errstr);
efbed4dc5857f8 Theodore Ts'o 2013-10-17 917 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @918 ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
ac27a0ec112a08 Dave Kleikamp 2006-10-11 919
e789ca0cc1d512 Jan Kara 2020-12-16 920 ext4_handle_error(sb, false, -errno, 0, 0, function, line);
ac27a0ec112a08 Dave Kleikamp 2006-10-11 921 }
ac27a0ec112a08 Dave Kleikamp 2006-10-11 922
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
* Re: [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type
2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
@ 2021-04-27 8:39 ` Amir Goldstein
0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27 8:39 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Expose a new type of fsnotify event for filesystems to report errors for
> userspace monitoring tools. fanotify will send this type of
> notification for FAN_ERROR marks.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/notify/fsnotify.c | 2 +-
> include/linux/fsnotify_backend.h | 16 ++++++++++++++++
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 30d422b8c0fc..9fff35e67b37 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -558,7 +558,7 @@ static __init int fsnotify_init(void)
> {
> int ret;
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26);
>
> ret = init_srcu_struct(&fsnotify_mark_srcu);
> if (ret)
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index a1a4dd69e5ed..f850bfbe30d4 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -48,6 +48,8 @@
> #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */
> #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */
>
> +#define FS_ERROR 0x00100000 /* Used for filesystem error reporting */
> +
Why skip 0x00080000?
Anyway, event bits are starting to run out so I would prefer that you overload
an existing bit used only by inotify/dnotify.
FS_IN_IGNORED is completely internal to inotify and there is no need to
set it in i_fsnotify_mask at all, so if we remove the bit from the output of
inotify_arg_to_mask() no functionality will change and we will be able to
overload the event bit for FS_ERROR (see patch below).
I also kind of like that FS_ERROR is adjacent to FS_UMOUNT and
FS_Q_OVERFLOW :-)
Other FS_IN/FS_DN bits may also be reclaimed, but that takes a bit more
work. I have patches for those.
Thanks,
Amir.
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 98f61b31745a..351517bae716 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
__u32 mask;
/*
- * Everything should accept their own ignored and should receive events
- * when the inode is unmounted. All directories care about children.
+ * Everything should receive events when the inode is unmounted.
+ * All directories care about children.
*/
- mask = (FS_IN_IGNORED | FS_UNMOUNT);
+ mask = FS_UNMOUNT;
if (S_ISDIR(inode->i_mode))
mask |= FS_EVENT_ON_CHILD;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1ce66748a2d2..ecbafb3f36d7 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -42,6 +42,11 @@
#define FS_UNMOUNT 0x00002000 /* inode on umount fs */
#define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
+#define FS_ERROR 0x00008000 /* Filesystem error report */
+/*
+ * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify
+ * which does not support FS_ERROR.
+ */
#define FS_IN_IGNORED 0x00008000 /* last inotify event here */
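To illustrate the bit-overloading idea in the patch above, here is a
minimal userspace sketch in C. The flag values are copied from the diff;
hweight32() is a portable stand-in for the kernel's HWEIGHT32() macro and
inotify_base_mask() is a hypothetical helper, so neither is kernel code.
It only shows that two names sharing one bit add a single bit to the
population count, and that a mask which no longer includes FS_IN_IGNORED
cannot be mistaken for one that requests FS_ERROR.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the proposed fsnotify_backend.h change. */
#define FS_UNMOUNT     0x00002000u
#define FS_Q_OVERFLOW  0x00004000u
#define FS_ERROR       0x00008000u
#define FS_IN_IGNORED  0x00008000u	/* shares its bit with FS_ERROR */

/* Portable stand-in for the kernel's HWEIGHT32(): count the set bits. */
static unsigned int hweight32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: the base mask inotify would install once
 * FS_IN_IGNORED is dropped from inotify_arg_to_mask().
 */
static uint32_t inotify_base_mask(void)
{
	return FS_UNMOUNT;
}
```

With a dedicated bit such as 0x00100000 the count grows by one, which is
why the original patch had to bump the BUILD_BUG_ON() constant.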
* Re: [PATCH RFC 00/15] File system wide monitoring
2021-04-27 4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
@ 2021-04-27 15:44 ` Gabriel Krisman Bertazi
2021-05-11 4:45 ` Khazhy Kumykov
2021-05-11 10:43 ` Jan Kara
1 sibling, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-27 15:44 UTC (permalink / raw)
To: Amir Goldstein
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
Amir Goldstein <amir73il@gmail.com> writes:
> On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> Hi,
>>
>> In an attempt to consolidate some of the feedback from the previous
>> proposals, I wrote a new attempt to solve the file system error reporting
>> problem. Before I spend more time polishing it, I'd like to hear your
>> feedback if I'm going in the wrong direction, in particular with the
>> modifications to fsnotify.
>>
>
> IMO you are going in the right direction, but you have gone a bit too far ;-)
>
> My understanding of the requirements and my interpretation of the feedback
> from filesystem maintainers is that the missing piece in the ecosystem is a
> user notification that "something went wrong". The "what went wrong" part
> is something that users and admins have long been able to gather from the
> kernel log and from filesystem tools (e.g. last error recorded).
>
> I do not see the need to duplicate existing functionality in fsmonitor.
> Don't get me wrong, I understand why it would have been nice for fsmonitor
> to be able to get all the errors nicely without looking anywhere else, but I
> don't think it justifies the extra complexity.
Hi Amir,
Thanks for the detailed review.
The reason for the location record and the ring buffer is the use case
from Google of analysing a series of errors. I understand this is
important to them, which is why I expanded a bit on the 'what went
wrong' part and on handling multiple errors. In addition, the file
system specific blob attempts to assist online recovery tools with more
information, but it might make sense to add it in the future, when it is
needed.
>> This RFC follows up on my previous proposals which attempted to leverage
>> watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
>> to push error notifications to user space. This proposal starts by, as
>> suggested by Darrick, limiting the scope of what I'm trying to do to an
>> interface for administrators to monitor the health of a file system,
>> instead of a generic interface for file errors. Therefore, this doesn't
>> solve the problem of writeback errors or the need to watch a specific
>> subsystem.
>>
>> * Format
>>
>> The feature is implemented on top of fanotify, as a new type of fanotify
>> mark, FAN_ERROR, which a file system monitoring tool can register to
>
> You have a terminology mistake throughout your series.
> FAN_ERROR is not a type of a mark, it is a type of an event.
> A mark describes the watched object (i.e. a filesystem, mount, inode).
Right. I understand the mistake and will fix it throughout the series.
>
>> receive notifications. A notification is split in three parts, and only
>> the first is guaranteed to exist for any given error event:
>>
>> - FS generic data: A file system agnostic structure that has a generic
>> error code and identifies the filesystem. Basically, it lets
>> userspace know something happened on a monitored filesystem.
>
> I think an error seq counter per fs would be a nice addition to generic data.
> It does not need to be persistent (it could be if filesystem supports it).
Makes sense to me.
>>
>> - FS location data: Identifies where in the code the problem
>> happened. (This is important for the use case of analysing frequent
>> error points that we discussed earlier).
>>
>> - FS specific data: A detailed error report in a filesystem specific
>> format that details what the error is. Ideally, a capable monitoring
>> tool can use the information here for error recovery. For instance,
>> xfs can put the xfs_scrub structures here, ext4 can send its error
>> reports, etc. An example of usage is done in the ext4 patch of this
>> series.
>>
>> More details on the information in each record can be found on the
>> documentation introduced in patch 15.
>>
>> * Using fanotify
>>
>> Using fanotify for this kind of thing is slightly tricky because we want
>> to guarantee delivery in some complicated conditions, for instance, the
>> file system might want to send an error while holding several locks.
>>
>> Instead of working around file system constraints at the file system
>> level, this proposal tries to make the FAN_ERROR submission safe in
>> those contexts. This is done with a new mode in fsnotify that
>> preallocates the memory at group creation to be used for the
>> notification submission.
>>
>> This new mode in fsnotify introduces a ring buffer to queue
>> notifications, which eliminates the allocation path in fsnotify. From
>> what I saw, the allocation is the only problem in fsnotify for
>> filesystems to submit errors in constrained situations.
>>
>
> The ring buffer functionality for fsnotify is interesting and it may be
> useful on its own, but IMO, it's too big of a hammer for the problem
> at hand.
>
> The question that you should be asking yourself is what is the
> expected behavior in case of a flood of filesystem corruption errors.
> I think it has already been expressed by filesystem maintainers on
> one of your previous postings, that a flood of filesystem corruption
> errors is often noise and the only interesting information is the
> first error.
My idea was to provide an ioctl for the user to resize the ring
buffer when needed, to make the flood manageable. But I understand your
main point about the ring buffer. I'm not sure saving only the first
notification solves Google's use case of error monitoring and analysis,
though. Khazhy, Ted, can you weigh in?
> For this reason, I think that FS_ERROR could be implemented
> by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
>
> struct fsnotify_sb_mark {
> struct fsnotify_mark fsn_mark;
> struct fsnotify_error_info info;
> }
>
> Similar to fd sampled errseq, there can be only one error report
> per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
> the error report can be allocated at the time of setting the filesystem mark.
>
> With this, you will not need the added complexity of the ring buffer
> and you will not need to limit FAN_ERROR reporting to a group that
> is only listening for FAN_ERROR, which is an unneeded limitation IMO.
The limitation exists because I was concerned about not breaking the
semantics of FAN_ACCESS and others, with regard to merged
notifications. I believe there should be no other reason why
notifications of FAN_CLASS_NOTIF can't be sent to the ring buffer too.
That limitation could be lifted for everything but permission events, I
think.
--
Gabriel Krisman Bertazi
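For comparison with the ring buffer, the "one error report per mark"
alternative discussed above could look roughly like the following
userspace sketch in C. The thread only names fsnotify_error_info; the
fields and both helpers below are assumptions made for illustration, and
the locking a real implementation would need is left out. The slot keeps
the first unread error and counts the ones that follow, so it needs no
allocation at submission time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; not taken from any posted patch. */
struct error_info {
	int      first_error;	/* error code of the first unread error */
	uint32_t err_count;	/* errors seen since the last read */
};

/* Record an error: only the first one is stored, later ones are counted. */
static void error_info_record(struct error_info *info, int error)
{
	if (info->err_count == 0)
		info->first_error = error;
	if (info->err_count != UINT32_MAX)
		info->err_count++;	/* saturate instead of wrapping */
}

/* Hand the pending report to the reader and reset the slot. */
static bool error_info_consume(struct error_info *info, struct error_info *out)
{
	if (info->err_count == 0)
		return false;	/* nothing pending */
	*out = *info;
	info->first_error = 0;
	info->err_count = 0;
	return true;
}
```

A listener that wants every individual error, as in the analysis use case
mentioned above, would only learn how many were dropped, not what they
were. That is the trade-off being asked about.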
* Re: [PATCH RFC 13/15] ext4: Send notifications on error
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
2021-04-26 23:10 ` kernel test robot
2021-04-27 4:32 ` Amir Goldstein
@ 2021-04-29 0:57 ` Darrick J. Wong
2 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2021-04-29 0:57 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: amir73il, tytso, david, jack, dhowells, khazhy, linux-fsdevel,
linux-ext4, kernel
On Mon, Apr 26, 2021 at 02:41:59PM -0400, Gabriel Krisman Bertazi wrote:
> Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> whenever an ext4 error condition is triggered. This follows the existing
> error conditions in ext4, so it is hooked to the ext4_error* functions.
>
> It also follows the current dmesg reporting in the format. The
> filesystem message is composed mostly by the string that would be
> otherwise printed in dmesg.
>
> A new ext4 specific record format is exposed in the uapi, such that a
> monitoring tool knows what to expect when listening for errors of an ext4
> filesystem.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> fs/ext4/super.c | 60 ++++++++++++++++++++++++--------
> include/uapi/linux/ext4-notify.h | 17 +++++++++
> 2 files changed, 62 insertions(+), 15 deletions(-)
> create mode 100644 include/uapi/linux/ext4-notify.h
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b9693680463a..032e29e7ff6a 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -46,6 +46,8 @@
> #include <linux/part_stat.h>
> #include <linux/kthread.h>
> #include <linux/freezer.h>
> +#include <linux/fsnotify.h>
> +#include <uapi/linux/ext4-notify.h>
>
> #include "ext4.h"
> #include "ext4_extents.h" /* Needed for trace points definition */
> @@ -727,6 +729,22 @@ static void flush_stashed_error_work(struct work_struct *work)
> ext4_commit_super(sbi->s_sb);
> }
>
> +static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
> + const char *func, int line,
> + const char *desc, struct va_format *vaf)
> +{
> + struct ext4_error_inode_report report;
> +
> + if (inode->i_sb->s_fsnotify_marks) {
> + report.inode = inode ? inode->i_ino : -1L;
> + report.block = block ? block : -1L;
> +
> + snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
> +
> + fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
> + }
> +}
> +
> #define ext4_error_ratelimit(sb) \
> ___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state), \
> "EXT4-fs error")
> @@ -742,15 +760,18 @@ void __ext4_error(struct super_block *sb, const char *function,
> return;
>
> trace_ext4_error(sb, function, line);
> +
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> if (ext4_error_ratelimit(sb)) {
> - va_start(args, fmt);
> - vaf.fmt = fmt;
> - vaf.va = &args;
> printk(KERN_CRIT
> "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
> sb->s_id, function, line, current->comm, &vaf);
> - va_end(args);
> +
> }
> + ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
> + va_end(args);
> ext4_handle_error(sb, force_ro, error, 0, block, function, line);
> }
>
> @@ -765,10 +786,10 @@ void __ext4_error_inode(struct inode *inode, const char *function,
> return;
>
> trace_ext4_error(inode->i_sb, function, line);
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> if (ext4_error_ratelimit(inode->i_sb)) {
> - va_start(args, fmt);
> - vaf.fmt = fmt;
> - vaf.va = &args;
> if (block)
> printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
> "inode #%lu: block %llu: comm %s: %pV\n",
> @@ -779,8 +800,11 @@ void __ext4_error_inode(struct inode *inode, const char *function,
> "inode #%lu: comm %s: %pV\n",
> inode->i_sb->s_id, function, line, inode->i_ino,
> current->comm, &vaf);
> - va_end(args);
> }
> +
> + ext4_fsnotify_error(error, inode, block, function, line, NULL, &vaf);
> + va_end(args);
> +
> ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
> function, line);
> }
> @@ -798,13 +822,16 @@ void __ext4_error_file(struct file *file, const char *function,
> return;
>
> trace_ext4_error(inode->i_sb, function, line);
> +
> + path = file_path(file, pathname, sizeof(pathname));
> + if (IS_ERR(path))
> + path = "(unknown)";
> +
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> +
> if (ext4_error_ratelimit(inode->i_sb)) {
> - path = file_path(file, pathname, sizeof(pathname));
> - if (IS_ERR(path))
> - path = "(unknown)";
> - va_start(args, fmt);
> - vaf.fmt = fmt;
> - vaf.va = &args;
> if (block)
> printk(KERN_CRIT
> "EXT4-fs error (device %s): %s:%d: inode #%lu: "
> @@ -817,8 +844,10 @@ void __ext4_error_file(struct file *file, const char *function,
> "comm %s: path %s: %pV\n",
> inode->i_sb->s_id, function, line, inode->i_ino,
> current->comm, path, &vaf);
> - va_end(args);
> }
> + ext4_fsnotify_error(EFSCORRUPTED, inode, block, function, line, NULL, &vaf);
> + va_end(args);
> +
> ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
> function, line);
> }
> @@ -886,6 +915,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
> printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
> sb->s_id, function, line, errstr);
> }
> + ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
>
> ext4_handle_error(sb, false, -errno, 0, 0, function, line);
> }
> diff --git a/include/uapi/linux/ext4-notify.h b/include/uapi/linux/ext4-notify.h
> new file mode 100644
> index 000000000000..31a3bbcafd13
> --- /dev/null
> +++ b/include/uapi/linux/ext4-notify.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * Copyright 2021, Collabora Ltd.
> + */
> +
> +#ifndef EXT4_NOTIFY_H
> +#define EXT4_NOTIFY_H
> +
> +#define EXT4_FSN_DESC_LEN 256
> +
> +struct ext4_error_inode_report {
> + u64 inode;
I don't have much to contribute this time, other than suggesting that
you might want to encode the inode generation here so that forensics
tools won't waste their time if the inode has been deleted and recreated
in between when the error happens and when the fs gets pulled offline
for analysis.
(...and maybe add a u32 flags field that can remain zero for now)
> + u64 block;
...and maybe call this "lblk" (assuming this is the logical block offset
within the file?) since that's already in wide use around e2fsprogs and
fs/ext4/.
--D
> + char desc[EXT4_FSN_DESC_LEN];
> +};
> +
> +#endif
> --
> 2.31.0
>
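The review suggestions above (add the inode generation, reserve a flags
word, rename block to lblk) might translate into a record like the
following sketch in C. The struct name, the field order, and the helper
are assumptions for illustration only. It uses <stdint.h> types so that
it builds in userspace; a real uapi header would conventionally use
__u32 and __u64 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT4_FSN_DESC_LEN 256

/* Hypothetical revision of struct ext4_error_inode_report. */
struct ext4_error_inode_report_v2 {
	uint64_t inode;		/* inode number, or ~0 if unknown */
	uint32_t generation;	/* inode generation at error time */
	uint32_t flags;		/* reserved, zero for now */
	uint64_t lblk;		/* logical block within the file */
	char     desc[EXT4_FSN_DESC_LEN];
};

/* The two 32-bit fields pack together, so no padding is introduced. */
_Static_assert(offsetof(struct ext4_error_inode_report_v2, lblk) == 16,
	       "lblk should follow the two 32-bit fields directly");
_Static_assert(sizeof(struct ext4_error_inode_report_v2) ==
	       24 + EXT4_FSN_DESC_LEN, "unexpected padding in report");

/*
 * Hypothetical forensics check: does the report still describe the
 * inode found on disk, or was the inode number reused since the error?
 */
static int report_matches_inode(const struct ext4_error_inode_report_v2 *r,
				uint64_t ino, uint32_t generation)
{
	return r->inode == ino && r->generation == generation;
}
```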
* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
@ 2021-04-29 11:31 ` Dan Carpenter
0 siblings, 0 replies; 46+ messages in thread
From: Dan Carpenter @ 2021-04-29 11:31 UTC (permalink / raw)
To: kbuild-all
Hi Gabriel,
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/notify/fanotify/fanotify_user.c:112 fanotify_event_len() warn: this array is probably non-NULL. 'fee->fs_data'
vim +112 fs/notify/fanotify/fanotify_user.c
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 90 static size_t fanotify_event_len(struct fanotify_event *event,
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 91 unsigned int fid_mode)
5e469c830fdb5a Amir Goldstein 2019-01-10 92 {
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 93 size_t event_len = FAN_EVENT_METADATA_LEN;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 94 struct fanotify_info *info;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 95 int dir_fh_len;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 96 int fh_len;
929943b38daf81 Amir Goldstein 2020-07-16 97 int dot_len = 0;
f454fa610a69b9 Amir Goldstein 2020-07-16 98
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 99 if (fanotify_is_error_event(event->mask)) {
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 100 struct fanotify_error_event *fee = FANOTIFY_EE(event);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 101 /*
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 102 * Error events (FAN_ERROR) have a different format
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 103 * as follows:
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 104 * [ event_metadata ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 105 * [ fs-generic error header ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 106 * [ error location (optional) ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 107 * [ fs-specific blob ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 108 */
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 109 event_len = fanotify_error_info_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 110 if (fee->loc.function)
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 111 event_len += fanotify_location_info_len(&fee->loc);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 @112 if (fee->fs_data)
^^^^^^^^^^^^
This is a zero length array, not a pointer. It can't be NULL.
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 113 event_len += fanotify_error_fsdata_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 114 return event_len;
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 115 }
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 116
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 117 if (!fid_mode)
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26 118 return event_len;
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
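The smatch warning above comes down to the fact that a trailing array
member lives inside the struct, so its address can never be NULL. A
minimal sketch in C of one way to express "no fs-specific data", assuming
an explicit length field: struct error_event, fs_data_len and
error_event_len() are hypothetical stand-ins, not the definitions from
the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fanotify_error_event. */
struct error_event {
	size_t  fs_data_len;	/* assumed field: 0 means no fs-specific blob */
	uint8_t fs_data[];	/* flexible array member, never a NULL pointer */
};

/*
 * Total record length. Testing `ee->fs_data` would always be true, so
 * the decision has to be based on the stored length instead.
 */
static size_t error_event_len(const struct error_event *ee, size_t base_len)
{
	return base_len + ee->fs_data_len;
}
```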
* Re: [PATCH RFC 13/15] ext4: Send notifications on error
@ 2021-04-29 13:19 ` Dan Carpenter
0 siblings, 0 replies; 46+ messages in thread
From: Dan Carpenter @ 2021-04-29 13:19 UTC (permalink / raw)
To: kbuild-all
Hi Gabriel,
url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/ext4/super.c:739 ext4_fsnotify_error() warn: variable dereferenced before check 'inode' (see line 738)
fs/ext4/super.c:918 __ext4_std_error() error: uninitialized symbol 'errstr'.
vim +/inode +739 fs/ext4/super.c
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 732 static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 733 const char *func, int line,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 734 const char *desc, struct va_format *vaf)
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 735 {
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 736 struct ext4_error_inode_report report;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 737
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @738 if (inode->i_sb->s_fsnotify_marks) {
^^^^^
Dereference
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @739 report.inode = inode ? inode->i_ino : -1L;
^^^^^
Check too late.
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 740 report.block = block ? block : -1L;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 741
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 742 snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 743
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 744 fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 745 }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 746 }
[ snip ]
c398eda0e43a79 Theodore Ts'o 2010-07-27 898 void __ext4_std_error(struct super_block *sb, const char *function,
c398eda0e43a79 Theodore Ts'o 2010-07-27 899 unsigned int line, int errno)
ac27a0ec112a08 Dave Kleikamp 2006-10-11 900 {
ac27a0ec112a08 Dave Kleikamp 2006-10-11 901 char nbuf[16];
ac27a0ec112a08 Dave Kleikamp 2006-10-11 902 const char *errstr;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 903
0db1ff222d40f1 Theodore Ts'o 2017-02-05 904 if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
0db1ff222d40f1 Theodore Ts'o 2017-02-05 905 return;
0db1ff222d40f1 Theodore Ts'o 2017-02-05 906
ac27a0ec112a08 Dave Kleikamp 2006-10-11 907 /* Special case: if the error is EROFS, and we're not already
ac27a0ec112a08 Dave Kleikamp 2006-10-11 908 * inside a transaction, then there's really no point in logging
ac27a0ec112a08 Dave Kleikamp 2006-10-11 909 * an error. */
bc98a42c1f7d0f David Howells 2017-07-17 910 if (errno == -EROFS && journal_current_handle() == NULL && sb_rdonly(sb))
ac27a0ec112a08 Dave Kleikamp 2006-10-11 911 return;
ac27a0ec112a08 Dave Kleikamp 2006-10-11 912
efbed4dc5857f8 Theodore Ts'o 2013-10-17 913 if (ext4_error_ratelimit(sb)) {
617ba13b31fbf5 Mingming Cao 2006-10-11 914 errstr = ext4_decode_error(sb, errno, nbuf);
c398eda0e43a79 Theodore Ts'o 2010-07-27 915 printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
c398eda0e43a79 Theodore Ts'o 2010-07-27 916 sb->s_id, function, line, errstr);
efbed4dc5857f8 Theodore Ts'o 2013-10-17 917 }
"errstr" not set on error path.
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @918 ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
ac27a0ec112a08 Dave Kleikamp 2006-10-11 919
e789ca0cc1d512 Jan Kara 2020-12-16 920 ext4_handle_error(sb, false, -errno, 0, 0, function, line);
ac27a0ec112a08 Dave Kleikamp 2006-10-11 921 }
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
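Both warnings above could be avoided by passing the superblock
explicitly, checking the inode before any dereference, and tolerating a
missing description. The following is a userspace sketch in C under those
assumptions. All the *_stub types and fill_report() are hypothetical
stand-ins for the kernel structures and for ext4_fsnotify_error(), not a
proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for super_block, inode and the report. */
struct sb_stub     { int has_marks; };
struct inode_stub  { struct sb_stub *i_sb; uint64_t i_ino; };
struct report_stub { uint64_t inode; char desc[64]; };

/*
 * Fill a report if anyone is watching the filesystem. Returns 1 when a
 * report was produced and 0 otherwise. The superblock is a separate
 * argument, so a NULL inode is never dereferenced to reach it.
 */
static int fill_report(const struct sb_stub *sb, const struct inode_stub *inode,
		       const char *desc, struct report_stub *report)
{
	if (!sb || !sb->has_marks)
		return 0;

	/* Check the pointer before using it, not after. */
	report->inode = inode ? inode->i_ino : UINT64_MAX;

	/* A caller without a decoded error string passes NULL. */
	snprintf(report->desc, sizeof(report->desc), "%s", desc ? desc : "");
	return 1;
}
```

For the second warning, the caller side would also need errstr to have a
defined value when the ratelimit check fails, for instance by
initialising it to NULL or decoding the error before the check.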
* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
2021-04-27 5:39 ` Amir Goldstein
@ 2021-04-29 18:33 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-29 18:33 UTC (permalink / raw)
To: Amir Goldstein
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
Amir Goldstein <amir73il@gmail.com> writes:
> On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> In order to support file system health/error reporting over fanotify,
>> fsnotify needs to expose a submission path that doesn't allow sleeping.
>> The only problem I identified with the current submission path is the
>> need to dynamically allocate memory for the event queue.
>>
>> This patch avoids the problem by introducing a new mode in fsnotify,
>> where a ring buffer is used to submit events for a group. Each group
>> has its own ring buffer, and error notifications are submitted
>> exclusively through it.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>> fs/notify/Makefile | 2 +-
>> fs/notify/group.c | 12 +-
>> fs/notify/notification.c | 10 ++
>> fs/notify/ring.c | 199 +++++++++++++++++++++++++++++++
>> include/linux/fsnotify_backend.h | 37 +++++-
>> 5 files changed, 255 insertions(+), 5 deletions(-)
>> create mode 100644 fs/notify/ring.c
>>
>> diff --git a/fs/notify/Makefile b/fs/notify/Makefile
>> index 63a4b8828df4..61dae1e90f2d 100644
>> --- a/fs/notify/Makefile
>> +++ b/fs/notify/Makefile
>> @@ -1,6 +1,6 @@
>> # SPDX-License-Identifier: GPL-2.0
>> obj-$(CONFIG_FSNOTIFY) += fsnotify.o notification.o group.o mark.o \
>> - fdinfo.o
>> + fdinfo.o ring.o
>>
>> obj-y += dnotify/
>> obj-y += inotify/
>> diff --git a/fs/notify/group.c b/fs/notify/group.c
>> index 08acb1afc0c2..b99b3de36696 100644
>> --- a/fs/notify/group.c
>> +++ b/fs/notify/group.c
>> @@ -81,7 +81,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
>> * notification against this group. So clearing the notification queue
>> * of all events is reliable now.
>> */
>> - fsnotify_flush_notify(group);
>> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> + fsnotify_free_ring_buffer(group);
>> + else
>> + fsnotify_flush_notify(group);
>>
>> /*
>> * Destroy overflow event (we cannot use fsnotify_destroy_event() as
>> @@ -136,6 +139,13 @@ static struct fsnotify_group *__fsnotify_alloc_group(
>> group->ops = ops;
>> group->flags = flags;
>>
>> + if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
>> + if (fsnotify_create_ring_buffer(group)) {
>> + kfree(group);
>> + return ERR_PTR(-ENOMEM);
>> + }
>> + }
>> +
>> return group;
>> }
>>
>> diff --git a/fs/notify/notification.c b/fs/notify/notification.c
>> index 75d79d6d3ef0..32f97e7b7a80 100644
>> --- a/fs/notify/notification.c
>> +++ b/fs/notify/notification.c
>> @@ -51,6 +51,10 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
>> bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
>> {
>> assert_spin_locked(&group->notification_lock);
>> +
>> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> + return fsnotify_ring_notify_queue_is_empty(group);
>> +
>> return list_empty(&group->notification_list) ? true : false;
>> }
>>
>> @@ -132,6 +136,9 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group,
>> struct fsnotify_event *event)
>> {
>> assert_spin_locked(&group->notification_lock);
>> +
>> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> + return;
>> /*
>> * We need to init list head for the case of overflow event so that
>> * check in fsnotify_add_event() works
>> @@ -166,6 +173,9 @@ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
>> {
>> assert_spin_locked(&group->notification_lock);
>>
>> + if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> + return fsnotify_ring_peek_first_event(group);
>> +
>> return list_first_entry(&group->notification_list,
>> struct fsnotify_event, list);
>> }
>> diff --git a/fs/notify/ring.c b/fs/notify/ring.c
>> new file mode 100644
>> index 000000000000..75e8af1f8d80
>> --- /dev/null
>> +++ b/fs/notify/ring.c
>> @@ -0,0 +1,199 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/types.h>
>> +#include <linux/fsnotify.h>
>> +#include <linux/memcontrol.h>
>> +
>> +#define INVALID_RING_SLOT -1
>> +
>> +#define FSNOTIFY_RING_PAGES 16
>> +
>> +#define NEXT_SLOT(cur, len, ring_size) ((cur + len) & (ring_size-1))
>> +#define NEXT_PAGE(cur, ring_size) (round_up(cur, PAGE_SIZE) & (ring_size-1))
>> +
>> +bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group)
>> +{
>> + assert_spin_locked(&group->notification_lock);
>> +
>> + if (group->ring_buffer.tail == group->ring_buffer.head)
>> + return true;
>> + return false;
>> +}
>> +
>> +struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group)
>> +{
>> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> + struct fsnotify_event *fsn;
>> + char *kaddr;
>> + u64 tail;
>> +
>> + assert_spin_locked(&group->notification_lock);
>> +
>> +again:
>> + tail = group->ring_buffer.tail;
>> +
>> + if ((PAGE_SIZE - (tail & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
>> + group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
>> + goto again;
>> + }
>> +
>> + kaddr = kmap_atomic(group->ring_buffer.pages[tail / PAGE_SIZE]);
>> + if (!kaddr)
>> + return NULL;
>> + fsn = (struct fsnotify_event *) (kaddr + (tail & (PAGE_SIZE-1)));
>> +
>> + if (fsn->slot_len == INVALID_RING_SLOT) {
>> + group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
>> + kunmap_atomic(kaddr);
>> + goto again;
>> + }
>> +
>> + /* will be unmapped when entry is consumed. */
>> + return fsn;
>> +}
>> +
>> +void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
>> + struct fsnotify_event *event)
>> +{
>> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> + u64 new_tail = NEXT_SLOT(group->ring_buffer.tail, event->slot_len, ring_size);
>> +
>> + kunmap_atomic(event);
>> +
>> + pr_debug("%s: group=%p tail=%llx->%llx ring_size=%llu\n", __func__,
>> + group, group->ring_buffer.tail, new_tail, ring_size);
>> +
>> + WRITE_ONCE(group->ring_buffer.tail, new_tail);
>> +}
>> +
>> +struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
>> + size_t size)
>> + __acquires(&group->notification_lock)
>> +{
>> + struct fsnotify_event *fsn;
>> + u64 head, tail;
>> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> + u64 new_head;
>> + void *kaddr;
>> +
>> + if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
>> + return ERR_PTR(-EINVAL);
>> +
>> + pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
>> + ring_size, size);
>> +
>> + spin_lock(&group->notification_lock);
>> +again:
>> + head = group->ring_buffer.head;
>> + tail = group->ring_buffer.tail;
>> + new_head = NEXT_SLOT(head, size, ring_size);
>> +
>> + /* head would catch up to tail, corrupting an entry. */
>> + if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
>> + fsn = ERR_PTR(-ENOMEM);
>> + goto err;
>> + }
>> +
>> + /*
>> + * Not even a skip message fits in the page. We can detect the
>> + * lack of space. Move on to the next page.
>> + */
>> + if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
>> + /* Start again on next page */
>> + group->ring_buffer.head = NEXT_PAGE(head, ring_size);
>> + goto again;
>> + }
>> +
>> + kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
>> + if (!kaddr) {
>> + fsn = ERR_PTR(-EFAULT);
>> + goto err;
>> + }
>> +
>> + fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
>> +
>> + if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
>> + /*
>> + * No room in the current page. Add a fake entry
>> + * consuming the end of the page to avoid splitting event
>> + * structure.
>> + */
>> + fsn->slot_len = INVALID_RING_SLOT;
>> + kunmap_atomic(kaddr);
>> + /* Start again on the next page */
>> + group->ring_buffer.head = NEXT_PAGE(head, ring_size);
>> +
>> + goto again;
>> + }
>> + fsn->slot_len = size;
>> +
>> + return fsn;
>> +
>> +err:
>> + spin_unlock(&group->notification_lock);
>> + return fsn;
>> +}
>> +
>> +void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn)
>> + __releases(&group->notification_lock)
>> +{
>> + u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> + u64 head = group->ring_buffer.head;
>> + u64 new_head = NEXT_SLOT(head, fsn->slot_len, ring_size);
>> +
>> + pr_debug("%s: group=%p head=%llx->%llx ring_size=%llu\n", __func__,
>> + group, head, new_head, ring_size);
>> +
>> + kunmap_atomic(fsn);
>> + group->ring_buffer.head = new_head;
>> +
>> + spin_unlock(&group->notification_lock);
>> +
>> + wake_up(&group->notification_waitq);
>> + kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
>> +
>> +}
>> +
>> +void fsnotify_free_ring_buffer(struct fsnotify_group *group)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < group->ring_buffer.nr_pages; i++)
>> + __free_page(group->ring_buffer.pages[i]);
>> + kfree(group->ring_buffer.pages);
>> + group->ring_buffer.nr_pages = 0;
>> +}
>> +
>> +int fsnotify_create_ring_buffer(struct fsnotify_group *group)
>> +{
>> + int nr_pages = FSNOTIFY_RING_PAGES;
>> + int i;
>> +
>> + pr_debug("%s: group=%p pages=%d\n", __func__, group, nr_pages);
>> +
>> + group->ring_buffer.pages = kmalloc_array(nr_pages, sizeof(struct pages *),
>> + GFP_KERNEL);
>> + if (!group->ring_buffer.pages)
>> + return -ENOMEM;
>> +
>> + group->ring_buffer.head = 0;
>> + group->ring_buffer.tail = 0;
>> +
>> + for (i = 0; i < nr_pages; i++) {
>> + group->ring_buffer.pages[i] = alloc_pages(GFP_KERNEL, 1);
>> + if (!group->ring_buffer.pages)
>> + goto err_dealloc;
>> + }
>> +
>> + group->ring_buffer.nr_pages = nr_pages;
>> +
>> + return 0;
>> +
>> +err_dealloc:
>> + for (--i; i >= 0; i--)
>> + __free_page(group->ring_buffer.pages[i]);
>> + kfree(group->ring_buffer.pages);
>> + group->ring_buffer.nr_pages = 0;
>> + return -ENOMEM;
>> +}
>> +
>> +
>
> Nothing in this file is fsnotify specific.
> Is there no kernel lib implementation for this already?
> If there isn't (I'd be very surprised) please put this in lib/ and post it
> for wider review including self tests.
About the implementation, the only generic code I could find is
include/linux/circ_buf.h, but it doesn't do much and doesn't fit well
here. For instance, it doesn't deal well with non-contiguous pages.
There are smarter implementations elsewhere in the kernel, like the one
described in Documentation/trace/ring-buffer-design.txt, that would be
better candidates for a generic ring buffer in lib/, but I admit I
haven't checked them well enough to see if they would solve the problem
for fsnotify, which has a very simple ring buffer anyway.
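To make the point about non-contiguous pages concrete, here is a minimal
userspace sketch of the general scheme discussed above: a ring made of
separately allocated pages, where a record never straddles a page and the
unused end of a page is marked with a skip slot. It is only an illustration
under stated assumptions, not the code from the patch. All names (struct ring,
ring_put, ring_get, SKIP_SLOT), the 4 pages of 4096 bytes and the 8-byte
alignment are made up for the example, and an explicit 'used' counter stands in
for the head/tail comparison to keep the full/empty check simple.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ   4096u              /* assumed page size for this sketch */
#define NR_PAGES  4u                 /* assumed ring size, in pages */
#define RING_SZ   (PAGE_SZ * NR_PAGES)
#define SKIP_SLOT UINT32_MAX         /* "rest of this page is padding" */
#define ALIGN8(x) (((x) + 7u) & ~7u)

struct slot_hdr {
	uint32_t slot_len;           /* bytes the slot occupies, or SKIP_SLOT */
	uint32_t data_len;           /* payload bytes actually stored */
};

struct ring {
	unsigned char *pages[NR_PAGES]; /* pages need not be contiguous */
	uint32_t head;                  /* producer offset into the ring */
	uint32_t tail;                  /* consumer offset into the ring */
	uint32_t used;                  /* bytes in use, padding included */
};

static int ring_init(struct ring *r)
{
	memset(r, 0, sizeof(*r));
	for (uint32_t i = 0; i < NR_PAGES; i++) {
		r->pages[i] = malloc(PAGE_SZ);
		if (!r->pages[i]) {  /* check the element, not the array */
			while (i--)
				free(r->pages[i]);
			return -1;
		}
	}
	return 0;
}

static void ring_destroy(struct ring *r)
{
	for (uint32_t i = 0; i < NR_PAGES; i++)
		free(r->pages[i]);
	memset(r, 0, sizeof(*r));
}

/* Translate a ring offset into an address inside the right page. */
static unsigned char *slot_addr(struct ring *r, uint32_t off)
{
	return r->pages[off / PAGE_SZ] + (off % PAGE_SZ);
}

/* Returns 0 on success, -1 if the record is too big or the ring is full. */
static int ring_put(struct ring *r, const void *data, uint32_t len)
{
	struct slot_hdr hdr;
	uint32_t total, room, pad;

	if (len > PAGE_SZ - sizeof(hdr))
		return -1;
	total = ALIGN8((uint32_t)sizeof(hdr) + len);
	room = PAGE_SZ - (r->head % PAGE_SZ);
	/* Offsets stay 8-byte aligned, so a skip header always fits. */
	assert(room >= sizeof(hdr));
	pad = total > room ? room : 0;

	/* Check for space before touching anything. */
	if (pad + total > RING_SZ - r->used)
		return -1;

	if (pad) {
		/* Record does not fit in this page: mark the rest as padding. */
		hdr.slot_len = SKIP_SLOT;
		hdr.data_len = 0;
		memcpy(slot_addr(r, r->head), &hdr, sizeof(hdr));
		r->head = (r->head + pad) % RING_SZ;
		r->used += pad;
	}

	hdr.slot_len = total;
	hdr.data_len = len;
	memcpy(slot_addr(r, r->head), &hdr, sizeof(hdr));
	memcpy(slot_addr(r, r->head) + sizeof(hdr), data, len);
	r->head = (r->head + total) % RING_SZ;
	r->used += total;
	return 0;
}

/* Returns the payload length, or -1 if the ring is empty or cap is too small. */
static int ring_get(struct ring *r, void *out, uint32_t cap)
{
	struct slot_hdr hdr;

	if (r->used == 0)
		return -1;
	memcpy(&hdr, slot_addr(r, r->tail), sizeof(hdr));
	if (hdr.slot_len == SKIP_SLOT) {
		/* Padding is always followed by a record on the next page. */
		uint32_t pad = PAGE_SZ - (r->tail % PAGE_SZ);

		r->tail = (r->tail + pad) % RING_SZ;
		r->used -= pad;
		memcpy(&hdr, slot_addr(r, r->tail), sizeof(hdr));
	}
	if (hdr.data_len > cap)
		return -1;
	memcpy(out, slot_addr(r, r->tail) + sizeof(hdr), hdr.data_len);
	r->tail = (r->tail + hdr.slot_len) % RING_SZ;
	r->used -= hdr.slot_len;
	return (int)hdr.data_len;
}
```

A caller would ring_init() once, ring_put() from the producer and ring_get()
from the reader. The sketch does no locking, so a real user would have to
serialize both sides, as the patch does with the notification lock.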
>> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
>> index 190c6a402e98..a1a4dd69e5ed 100644
>> --- a/include/linux/fsnotify_backend.h
>> +++ b/include/linux/fsnotify_backend.h
>> @@ -74,6 +74,8 @@
>> #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
>> FS_OPEN_EXEC_PERM)
>>
>> +#define FSN_SUBMISSION_RING_BUFFER 0x00000080
>
> FSNOTIFY_GROUP_FLAG_RING_BUFFER please (or FSN_GROUP_ if you must)
> and please define this above struct fsnotify_group, even right above the flags
> field like FSNOTIFY_CONN_FLAG_HAS_FSID
>
> *IF* we go this way :)
>
> Thanks,
> Amir.
--
Gabriel Krisman Bertazi
* Re: [PATCH RFC 06/15] fanotify: Support submission through ring buffer
2021-04-27 6:02 ` Amir Goldstein
@ 2021-04-29 18:36 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-29 18:36 UTC (permalink / raw)
To: Amir Goldstein
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
Amir Goldstein <amir73il@gmail.com> writes:
> On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> This adds support for the ring buffer mode in fanotify. It is enabled
>> by a new flag FAN_PREALLOC_QUEUE passed to fanotify_init. If this flag
>> is enabled, the group only allows marks that support the ring buffer
>
> I don't like this limitation.
> I think FAN_PREALLOC_QUEUE can work with other events, why not?
The only complications I see are permission events and mergeable events,
which would no longer be merged. The merging problem is not big,
except that it changes existing expectations. Other than that, it should
be trivial to have every FAN_CLASS_NOTIF event in the ring buffer.
>
> In any case if we keep ring buffer, please use a different set of
> fanotify_ring_buffer_ops struct instead of spraying if/else all over the
> event queue implementation.
>
>> submission. In a following patch, FAN_ERROR will make use of this
>> mechanism.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>> fs/notify/fanotify/fanotify.c | 77 +++++++++++++++++++---------
>> fs/notify/fanotify/fanotify_user.c | 81 ++++++++++++++++++------------
>> include/linux/fanotify.h | 5 +-
>> include/uapi/linux/fanotify.h | 1 +
>> 4 files changed, 105 insertions(+), 59 deletions(-)
>>
>> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
>> index e3669d8a4a64..98591a8155a7 100644
>> --- a/fs/notify/fanotify/fanotify.c
>> +++ b/fs/notify/fanotify/fanotify.c
>> @@ -612,6 +612,26 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>> return event;
>> }
>>
>> +static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *group,
>> + u32 mask, const void *data,
>> + int data_type)
>> +{
>> + size_t size = 0;
>> +
>> + pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
>> +
>> + return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
>> +}
>> +
>> +static void fanotify_ring_write_event(struct fsnotify_group *group,
>> + struct fanotify_event *event, u32 mask,
>> + const void *data, __kernel_fsid_t *fsid)
>> +{
>> + fanotify_init_event(group, event, 0, mask);
>> +
>> + event->pid = get_pid(task_tgid(current));
>> +}
>> +
>> /*
>> * Get cached fsid of the filesystem containing the object from any connector.
>> * All connectors are supposed to have the same fsid, but we do not verify that
>> @@ -701,31 +721,38 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>> return 0;
>> }
>>
>> - event = fanotify_alloc_event(group, mask, data, data_type, dir,
>> - file_name, &fsid);
>> - ret = -ENOMEM;
>> - if (unlikely(!event)) {
>> - /*
>> - * We don't queue overflow events for permission events as
>> - * there the access is denied and so no event is in fact lost.
>> - */
>> - if (!fanotify_is_perm_event(mask))
>> - fsnotify_queue_overflow(group);
>> - goto finish;
>> - }
>> -
>> - fsn_event = &event->fse;
>> - ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
>> - if (ret) {
>> - /* Permission events shouldn't be merged */
>> - BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
>> - /* Our event wasn't used in the end. Free it. */
>> - fsnotify_destroy_event(group, fsn_event);
>> -
>> - ret = 0;
>> - } else if (fanotify_is_perm_event(mask)) {
>> - ret = fanotify_get_response(group, FANOTIFY_PERM(event),
>> - iter_info);
>> + if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
>> + event = fanotify_ring_get_slot(group, mask, data, data_type);
>> + if (IS_ERR(event))
>> + return PTR_ERR(event);
>
> So no FAN_OVERFLOW with the ring buffer implementation?
> This will be unexpected for fanotify users and frankly, less useful IMO.
> I also don't see the technical reason to omit the overflow event.
>
>> + fanotify_ring_write_event(group, event, mask, data, &fsid);
>> + fsnotify_ring_commit_slot(group, &event->fse);
>> + } else {
>> + event = fanotify_alloc_event(group, mask, data, data_type, dir,
>> + file_name, &fsid);
>> + ret = -ENOMEM;
>> + if (unlikely(!event)) {
>> + /*
>> + * We don't queue overflow events for permission events as
>> + * there the access is denied and so no event is in fact lost.
>> + */
>> + if (!fanotify_is_perm_event(mask))
>> + fsnotify_queue_overflow(group);
>> + goto finish;
>> + }
>> + fsn_event = &event->fse;
>> + ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
>> + if (ret) {
>> + /* Permission events shouldn't be merged */
>> + BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
>> + /* Our event wasn't used in the end. Free it. */
>> + fsnotify_destroy_event(group, fsn_event);
>> +
>> + ret = 0;
>> + } else if (fanotify_is_perm_event(mask)) {
>> + ret = fanotify_get_response(group, FANOTIFY_PERM(event),
>> + iter_info);
>> + }
>> }
>> finish:
>> if (fanotify_is_perm_event(mask))
>> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
>> index fe605359af88..5031198bf7db 100644
>> --- a/fs/notify/fanotify/fanotify_user.c
>> +++ b/fs/notify/fanotify/fanotify_user.c
>> @@ -521,7 +521,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
>> * Permission events get queued to wait for response. Other
>> * events can be destroyed now.
>> */
>> - if (!fanotify_is_perm_event(event->mask)) {
>> + if (group->fanotify_data.flags & FAN_PREALLOC_QUEUE) {
>> + fsnotify_ring_buffer_consume_event(group, &event->fse);
>> + } else if (!fanotify_is_perm_event(event->mask)) {
>> fsnotify_destroy_event(group, &event->fse);
>> } else {
>> if (ret <= 0) {
>> @@ -587,40 +589,39 @@ static int fanotify_release(struct inode *ignored, struct file *file)
>> */
>> fsnotify_group_stop_queueing(group);
>>
>> - /*
>> - * Process all permission events on access_list and notification queue
>> - * and simulate reply from userspace.
>> - */
>> - spin_lock(&group->notification_lock);
>> - while (!list_empty(&group->fanotify_data.access_list)) {
>> - struct fanotify_perm_event *event;
>> -
>> - event = list_first_entry(&group->fanotify_data.access_list,
>> - struct fanotify_perm_event, fae.fse.list);
>> - list_del_init(&event->fae.fse.list);
>> - finish_permission_event(group, event, FAN_ALLOW);
>> + if (!(group->flags & FSN_SUBMISSION_RING_BUFFER)) {
>> + /*
>> + * Process all permission events on access_list and notification queue
>> + * and simulate reply from userspace.
>> + */
>> spin_lock(&group->notification_lock);
>> - }
>> -
>> - /*
>> - * Destroy all non-permission events. For permission events just
>> - * dequeue them and set the response. They will be freed once the
>> - * response is consumed and fanotify_get_response() returns.
>> - */
>> - while (!fsnotify_notify_queue_is_empty(group)) {
>> - struct fanotify_event *event;
>> -
>> - event = FANOTIFY_E(fsnotify_remove_first_event(group));
>> - if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
>> - spin_unlock(&group->notification_lock);
>> - fsnotify_destroy_event(group, &event->fse);
>> - } else {
>> - finish_permission_event(group, FANOTIFY_PERM(event),
>> - FAN_ALLOW);
>> + while (!list_empty(&group->fanotify_data.access_list)) {
>> + struct fanotify_perm_event *event;
>> + event = list_first_entry(&group->fanotify_data.access_list,
>> + struct fanotify_perm_event, fae.fse.list);
>> + list_del_init(&event->fae.fse.list);
>> + finish_permission_event(group, event, FAN_ALLOW);
>> + spin_lock(&group->notification_lock);
>> }
>> - spin_lock(&group->notification_lock);
>> + /*
>> + * Destroy all non-permission events. For permission events just
>> + * dequeue them and set the response. They will be freed once the
>> + * response is consumed and fanotify_get_response() returns.
>> + */
>> + while (!fsnotify_notify_queue_is_empty(group)) {
>> + struct fanotify_event *event;
>> + event = FANOTIFY_E(fsnotify_remove_first_event(group));
>> + if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
>> + spin_unlock(&group->notification_lock);
>> + fsnotify_destroy_event(group, &event->fse);
>> + } else {
>> + finish_permission_event(group, FANOTIFY_PERM(event),
>> + FAN_ALLOW);
>> + }
>> + spin_lock(&group->notification_lock);
>> + }
>> + spin_unlock(&group->notification_lock);
>> }
>> - spin_unlock(&group->notification_lock);
>>
>> /* Response for all permission events it set, wakeup waiters */
>> wake_up(&group->fanotify_data.access_waitq);
>> @@ -981,6 +982,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>> if (flags & FAN_NONBLOCK)
>> f_flags |= O_NONBLOCK;
>>
>> + if (flags & FAN_PREALLOC_QUEUE) {
>> + if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + if (flags & FAN_UNLIMITED_QUEUE)
>> + return -EINVAL;
>> +
>> + fsn_flags = FSN_SUBMISSION_RING_BUFFER;
>> + }
>> +
>> /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
>> group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
>> if (IS_ERR(group)) {
>> @@ -1223,6 +1234,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>> goto fput_and_out;
>> }
>>
>> + if ((group->flags & FSN_SUBMISSION_RING_BUFFER) &&
>> + (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
>> + goto fput_and_out;
>> +
>> ret = fanotify_find_path(dfd, pathname, &path, flags,
>> (mask & ALL_FSNOTIFY_EVENTS), obj_type);
>> if (ret)
>> @@ -1327,7 +1342,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
>> */
>> static int __init fanotify_user_setup(void)
>> {
>> - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
>> + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
>> BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>>
>> fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
>> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
>> index 3e9c56ee651f..5a4cefb4b1c3 100644
>> --- a/include/linux/fanotify.h
>> +++ b/include/linux/fanotify.h
>> @@ -23,7 +23,8 @@
>> #define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
>> FAN_REPORT_TID | \
>> FAN_CLOEXEC | FAN_NONBLOCK | \
>> - FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
>> + FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
>> + FAN_PREALLOC_QUEUE)
>>
>> #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \
>> FAN_MARK_FILESYSTEM)
>> @@ -71,6 +72,8 @@
>> FANOTIFY_PERM_EVENTS | \
>> FAN_Q_OVERFLOW | FAN_ONDIR)
>>
>> +#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
>
> FANOTIFY_RING_BUFFER_EVENTS? FANOTIFY_PREALLOC_EVENTS?
>
> Please leave a comment above to state what this group means.
> I *think* there is no reason to limit the set of events, only the sort of
> information that is possible with FAN_PREALLOC_QUEUE.
>
> Perhaps FAN_REPORT_FID cannot be allowed and as a result
> FANOTIFY_INODE_EVENTS will not be allowed, but I am not even
> sure if that limitation is needed.
>
> Thanks,
> Amir.
--
Gabriel Krisman Bertazi
* Re: [PATCH RFC 10/15] fanotify: Introduce code location record
2021-04-27 7:11 ` Amir Goldstein
@ 2021-04-29 18:40 ` Gabriel Krisman Bertazi
2021-05-11 5:35 ` Khazhy Kumykov
0 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-29 18:40 UTC (permalink / raw)
To: Amir Goldstein
Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel
Amir Goldstein <amir73il@gmail.com> writes:
> On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> This patch introduces an optional info record that describes the
>> source (as in the region of the source-code where an event was
>> initiated). This record is not produced for other type of existing
>> notification, but it is optionally enabled for FAN_ERROR notifications.
>>
>
> I find this functionality controversial, because I think that the fs provided
> s_last_error*, s_first_error* is more reliable and more powerful than this
> functionality.
>
> Let's leave it for a future extending proposal, should fanotify event reporting
> proposal pass muster, shall we?
> Or do you think that without this optional extension fanotify event reporting
> will not be valuable enough?
I think it is valuable enough without this bit, at least at first.
I understand it would be useful for ext4 to analyse information
through this interface, but the main priority is to have a way to push
out the information that an error occurred, as you mentioned.
Also, this might be more powerful if we stick to the ring buffer instead
of a single slot, as it would allow more data to be collected than just
first/last.
>
> Thanks,
> Amir.
--
Gabriel Krisman Bertazi
* Re: [PATCH RFC 00/15] File system wide monitoring
2021-04-27 15:44 ` Gabriel Krisman Bertazi
@ 2021-05-11 4:45 ` Khazhy Kumykov
0 siblings, 0 replies; 46+ messages in thread
From: Khazhy Kumykov @ 2021-05-11 4:45 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Amir Goldstein, Theodore Tso, Darrick J. Wong, Dave Chinner,
Jan Kara, David Howells, linux-fsdevel, Ext4, kernel
On Tue, Apr 27, 2021 at 8:44 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
>
> > On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> >>
> >> Hi,
> >>
> >> In an attempt to consolidate some of the feedback from the previous
> >> proposals, I wrote a new attempt to solve the file system error reporting
> >> problem. Before I spend more time polishing it, I'd like to hear your
> >> feedback if I'm going in the wrong direction, in particular with the
> >> modifications to fsnotify.
> >>
> >
> > IMO you are going in the right direction, but you have gone a bit too far ;-)
> >
> > My understanding of the requirements and my interpretation of the feedback
> > from filesystem maintainers is that the missing piece in the ecosystem is a
> > user notification that "something went wrong". The "what went wrong" part
> > is something that users and admins have long been able to gather from the
> > kernel log and from filesystem tools (e.g. last error recorded).
> >
> > I do not see the need to duplicate existing functionality in fsmonitor.
> > Don't get me wrong, I understand why it would have been nice for fsmonitor
> > to be able to get all the errors nicely without looking anywhere else, but I
> > don't think it justifies the extra complexity.
>
> Hi Amir,
>
> Thanks for the detailed review.
>
> The reason for the location record and the ring buffer is the use case
> from Google to do analysis on a series of errors. I understand this is
> important to them, which is why I expanded a bit on the 'what went
> wrong' and multiple errors. In addition, the file system specific blob
> attempts to assist online recovery tools with more information, but it
> might make sense to do it in the future, when it is needed.
>
> >> This RFC follows up on my previous proposals which attempted to leverage
> >> watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
> >> to push error notifications to user space. This proposal starts by, as
> >> suggested by Darrick, limiting the scope of what I'm trying to do to an
> >> interface for administrators to monitor the health of a file system,
> >> instead of a generic interface for file errors. Therefore, this doesn't
> >> solve the problem of writeback errors or the need to watch a specific
> >> subsystem.
> >>
> >> * Format
> >>
> >> The feature is implemented on top of fanotify, as a new type of fanotify
> >> mark, FAN_ERROR, which a file system monitoring tool can register to
> >
> > You have a terminology mistake throughout your series.
> > FAN_ERROR is not a type of a mark, it is a type of an event.
> > A mark describes the watched object (i.e. a filesystem, mount, inode).
>
> Right. I understand the mistake and will fix it around the series.
> >
> >> receive notifications. A notification is split in three parts, and only
> >> the first is guaranteed to exist for any given error event:
> >>
> >> - FS generic data: A file system agnostic structure that has a generic
> >> error code and identifies the filesystem. Basically, it lets
> >> userspace know something happened on a monitored filesystem.
> >
> > I think an error seq counter per fs would be a nice addition to generic data.
> > It does not need to be persistent (it could be if filesystem supports it).
>
> Makes sense to me.
>
> >>
> >> - FS location data: Identifies where in the code the problem
> >> happened. (This is important for the use case of analysing frequent
> >> error points that we discussed earlier).
> >>
> >> - FS specific data: A detailed error report in a filesystem specific
> >> format that details what the error is. Ideally, a capable monitoring
> >> tool can use the information here for error recovery. For instance,
> >> xfs can put the xfs_scrub structures here, ext4 can send its error
> >> reports, etc. An example of usage is done in the ext4 patch of this
> >> series.
> >>
> >> More details on the information in each record can be found on the
> >> documentation introduced in patch 15.
> >>
> >> * Using fanotify
> >>
> >> Using fanotify for this kind of thing is slightly tricky because we want
> >> to guarantee delivery in some complicated conditions, for instance, the
> >> file system might want to send an error while holding several locks.
> >>
> >> Instead of working around file system constraints at the file system
> >> level, this proposal tries to make the FAN_ERROR submission safe in
> >> those contexts. This is done with a new mode in fsnotify that
> >> preallocates the memory at group creation to be used for the
> >> notification submission.
> >>
> >> This new mode in fsnotify introduces a ring buffer to queue
> >> notifications, which eliminates the allocation path in fsnotify. From
> >> what I saw, the allocation is the only problem in fsnotify for
> >> filesystems to submit errors in constrained situations.
> >>
> >
> > The ring buffer functionality for fsnotify is interesting and it may be
> > useful on its own, but IMO, it's too big of a hammer for the problem
> > at hand.
> >
> > The question that you should be asking yourself is what is the
> > expected behavior in case of a flood of filesystem corruption errors.
> > I think it has already been expressed by filesystem maintainers on
> > one of your previous postings that a flood of filesystem corruption
> > errors is often noise and the only interesting information is the
> > first error.
>
> My idea was to provide an ioctl for the user to resize the ring
> buffer when needed, to make the flood manageable. But I understand your
> main point about the ring buffer. I'm not sure saving only the first
> notification solves Google's use case of error monitoring and analysis,
> though. Khazhy, Ted, can you weigh in?
I think this is a good point to bring up - a flood of errors shouldn't
drown out other filesystems, and from the perspective of error
reporting, it's better to drop all but one notification from one FS
than to drop the only notification from another. In the cases I can
think of, the first error is probably enough and does simplify things
quite a bit. There is the option of setting up a ring buffer per fs,
which does seem excessive in light of the previous statement.
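As a rough illustration of the "drop all but one notification per FS" policy,
the sketch below keeps one preallocated slot per watched filesystem: the first
unread error is stored, later ones are only counted, and a flood on one
filesystem cannot displace the single error of another. This is a userspace
toy under assumed names (struct fs_error_slot, fs_error_report, fs_error_read
are hypothetical); it is not taken from any patch in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filesystem slot, allocated when the watch is set up. */
struct fs_error_slot {
	bool     pending;      /* an unread error is stored */
	int32_t  first_error;  /* code of the first unread error */
	uint32_t error_count;  /* errors seen since the last read */
};

/* Error path: no allocation, later errors only bump the counter. */
static void fs_error_report(struct fs_error_slot *s, int32_t err)
{
	if (!s->pending) {
		s->pending = true;
		s->first_error = err;
		s->error_count = 0;
	}
	s->error_count++;
}

/* Reader: returns false if nothing is pending, else hands out the record. */
static bool fs_error_read(struct fs_error_slot *s, int32_t *err,
			  uint32_t *count)
{
	if (!s->pending)
		return false;
	*err = s->first_error;
	*count = s->error_count;
	s->pending = false;
	s->error_count = 0;
	return true;
}
```

With one slot per filesystem, a thousand errors on one filesystem and a single
error on another both end up as exactly one readable record each; the count
tells the reader how much was coalesced.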
>
> > For this reason, I think that FS_ERROR could be implemented
> > by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
> >
> > struct fsnotify_sb_mark {
> > struct fsnotify_mark fsn_mark;
> > struct fsnotify_error_info info;
> > }
> >
> > Similar to fd sampled errseq, there can be only one error report
> > per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
> > the error report can be allocated at the time of setting the filesystem mark.
> >
> > With this, you will not need the added complexity of the ring buffer
> > and you will not need to limit FAN_ERROR reporting to a group that
> > is only listening for FAN_ERROR, which is an unneeded limitation IMO.
>
> The limitation exists because I was concerned about not breaking the
> semantics of FAN_ACCESS and others, with regards to merged
> notifications. I believe there should be no other reason why
> notifications of FAN_CLASS_NOTIF can't be sent to the ring buffer too.
> That limitation could be lifted for everything but permission events, I
> think.
>
> --
> Gabriel Krisman Bertazi
* Re: [PATCH RFC 10/15] fanotify: Introduce code location record
2021-04-29 18:40 ` Gabriel Krisman Bertazi
@ 2021-05-11 5:35 ` Khazhy Kumykov
0 siblings, 0 replies; 46+ messages in thread
From: Khazhy Kumykov @ 2021-05-11 5:35 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Amir Goldstein, Theodore Tso, Darrick J. Wong, Dave Chinner,
Jan Kara, David Howells, linux-fsdevel, Ext4, kernel
On Thu, Apr 29, 2021 at 11:40 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
>
> > On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> >>
> >> This patch introduces an optional info record that describes the
> >> source (as in the region of the source-code where an event was
> >> initiated). This record is not produced for other type of existing
> >> notification, but it is optionally enabled for FAN_ERROR notifications.
> >>
> >
> > I find this functionality controversial, because I think that the fs provided
> > s_last_error*, s_first_error* is more reliable and more powerful than this
> > functionality.
> >
> > Let's leave it for a future extending proposal, should fanotify event reporting
> > proposal pass muster, shall we?
> > Or do you think that without this optional extension fanotify event reporting
> > will not be valuable enough?
>
> I think it is valuable enough without this bit, at least at first.
> I understand it would be useful for ext4 to analyse information
> through this interface, but the main priority is to have a way to push
> out the information that an error occurred, as you mentioned.
Ack, if it's deemed cleaner we could look at sysfs on notification,
but having the information in the same event provides some convenience
factor, and avoids racing in the event that we're looking at an error
after the first one.
>
> Also, this might be more powerful if we stick to the ring buffer instead
> of a single slot, as it would allow more data to be collected than just
> first/last.
> >
> > Thanks,
> > Amir.
>
> --
> Gabriel Krisman Bertazi
* Re: [PATCH RFC 00/15] File system wide monitoring
2021-04-27 4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
2021-04-27 15:44 ` Gabriel Krisman Bertazi
@ 2021-05-11 10:43 ` Jan Kara
1 sibling, 0 replies; 46+ messages in thread
From: Jan Kara @ 2021-05-11 10:43 UTC (permalink / raw)
To: Amir Goldstein
Cc: Gabriel Krisman Bertazi, Theodore Tso, Darrick J. Wong,
Dave Chinner, Jan Kara, David Howells, Khazhismel Kumykov,
linux-fsdevel, Ext4, kernel
On Tue 27-04-21 07:11:49, Amir Goldstein wrote:
> The ring buffer functionality for fsnotify is interesting and it may be
> useful on its own, but IMO, it's too big of a hammer for the problem
> at hand.
>
> The question that you should be asking yourself is what is the
> expected behavior in case of a flood of filesystem corruption errors.
> I think it has already been expressed by filesystem maintainers on
> one of your previous postings that a flood of filesystem corruption
> errors is often noise and the only interesting information is the first error.
>
> For this reason, I think that FS_ERROR could be implemented
> by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
>
> struct fsnotify_sb_mark {
> struct fsnotify_mark fsn_mark;
> struct fsnotify_error_info info;
> }
>
> Similar to fd sampled errseq, there can be only one error report
> per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
> the error report can be allocated at the time of setting the filesystem mark.
>
> With this, you will not need the added complexity of the ring buffer
> and you will not need to limit FAN_ERROR reporting to a group that
> is only listening for FAN_ERROR, which is an unneeded limitation IMO.
Seeing that this 'single error per mark' idea is gathering some support, I'd
like to add my 2c: we probably don't want fsnotify_error_info attached to
every fsnotify_mark. I guess we can have:

struct fanotify_mark {
	struct fsnotify_mark fsn_mark;
	struct fanotify_error_event *event;
};

where 'event' is normally NULL. If we add FAN_ERROR to the mark's mask,
we allocate the event (which also contains the error info) to use when
generating the error event. The handling will then be somewhat similar to
how we handle overflow events.
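A minimal userspace sketch of that lifecycle, for illustration only: the event
is allocated when the error bit is added to the mark's mask, the error path
fills it in without allocating, and a second error is dropped while the first
one is still unread. The names here (struct mark, struct error_event, EV_ERROR,
mark_add_mask and friends) are stand-ins I made up, not kernel structures or
the real FAN_ERROR value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define EV_ERROR 0x1u  /* stand-in for FAN_ERROR; the value is arbitrary */

/* Hypothetical preallocated error event. */
struct error_event {
	bool    queued;  /* filled in and not yet read */
	int32_t error;   /* error info carried by the event */
};

/* Stand-in for the proposed struct fanotify_mark. */
struct mark {
	uint32_t mask;
	struct error_event *event;  /* NULL unless EV_ERROR is in mask */
};

/* Mark setup path: allocation is allowed here. */
static int mark_add_mask(struct mark *m, uint32_t mask)
{
	if ((mask & EV_ERROR) && !m->event) {
		m->event = calloc(1, sizeof(*m->event));
		if (!m->event)
			return -1;
	}
	m->mask |= mask;
	return 0;
}

/* Error path: must not allocate. Returns true if the event was queued. */
static bool mark_report_error(struct mark *m, int32_t err)
{
	if (!(m->mask & EV_ERROR) || !m->event || m->event->queued)
		return false;
	m->event->error = err;
	m->event->queued = true;
	return true;
}

/* Reader: consumes the event and makes the slot reusable. */
static bool mark_read_error(struct mark *m, int32_t *err)
{
	if (!m->event || !m->event->queued)
		return false;
	*err = m->event->error;
	m->event->queued = false;
	return true;
}

static void mark_destroy(struct mark *m)
{
	free(m->event);
	m->event = NULL;
	m->mask = 0;
}
```

Marks that never ask for the error event pay only for one NULL pointer, which
is the point of not embedding the error info in every mark.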
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Thread overview: 46+ messages
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
2021-04-27 4:42 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
2021-04-27 4:53 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
2021-04-27 5:03 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
2021-04-26 22:00 ` kernel test robot
2021-04-26 22:43 ` kernel test robot
2021-04-27 5:39 ` Amir Goldstein
2021-04-29 18:33 ` Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
2021-04-27 6:02 ` Amir Goldstein
2021-04-29 18:36 ` Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
2021-04-27 8:39 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
2021-04-27 6:49 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
2021-04-27 7:01 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
2021-04-27 7:11 ` Amir Goldstein
2021-04-29 18:40 ` Gabriel Krisman Bertazi
2021-05-11 5:35 ` Khazhy Kumykov
2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
2021-04-27 7:12 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
2021-04-26 22:45 ` kernel test robot
2021-04-27 7:25 ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
2021-04-26 23:10 ` kernel test robot
2021-04-27 4:32 ` Amir Goldstein
2021-04-29 0:57 ` Darrick J. Wong
2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
2021-04-26 23:10 ` kernel test robot
2021-04-26 18:42 ` [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework Gabriel Krisman Bertazi
2021-04-27 4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
2021-04-27 15:44 ` Gabriel Krisman Bertazi
2021-05-11 4:45 ` Khazhy Kumykov
2021-05-11 10:43 ` Jan Kara
2021-04-27 4:33 [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark kernel test robot
2021-04-29 11:31 ` Dan Carpenter
2021-04-27 8:36 [PATCH RFC 13/15] ext4: Send notifications on error kernel test robot
2021-04-29 13:19 ` Dan Carpenter