All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 00/15] File system wide monitoring
@ 2021-04-26 18:41 Gabriel Krisman Bertazi
  2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
                   ` (15 more replies)
  0 siblings, 16 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Hi,

In an attempt to consolidate some of the feedback from the previous
proposals, I wrote a new attempt to solve the file system error reporting
problem.  Before I spend more time polishing it, I'd like to hear your
feedback if I'm going in the wrong direction, in particular with the
modifications to fsnotify.

This RFC follows up on my previous proposals which attempted to leverage
watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
to push error notifications to user space.  This proposal starts by, as
suggested by Darrick, limiting the scope of what I'm trying to do to an
interface for administrators to monitor the health of a file system,
instead of a generic inteface for file errors.  Therefore, this doesn't
solve the problem of writeback errors or the need to watch a specific
subsystem.

* Format

The feature is implemented on top of fanotify, as a new type of fanotify
mark, FAN_ERROR, which a file system monitoring tool can register to
receive notifications.  A notification is split in three parts, and only
the first is guaranteed to exist for any given error event:

 - FS generic data: A file system agnostic structure that has a generic
 error code and identifies the filesystem.  Basically, it let's
 userspace know something happen on a monitored filesystem.

 - FS location data: Identifies where in the code the problem
 happened. (This is important for the use case of analysing frequent
 error points that we discussed earlier).

 - FS specific data: A detailed error report in a filesystem specific
 format that details what the error is.  Ideally, a capable monitoring
 tool can use the information here for error recovery.  For instance,
 xfs can put the xfs_scrub structures here, ext4 can send its error
 reports, etc.  An example of usage is done in the ext4 patch of this
 series.

More details on the information in each record can be found on the
documentation introduced in patch 15.

* Using fanotify

Using fanotify for this kind of thing is slightly tricky because we want
to guarantee delivery in some complicated conditions, for instance, the
file system might want to send an error while holding several locks.

Instead of working around file system constraints at the file system
level, this proposal tries to make the FAN_ERROR submission safe in
those contexts.  This is done with a new mode in fsnotify that
preallocates the memory at group creation to be used for the
notification submission.

This new mode in fsnotify introduces a ring buffer to queue
notifications, which eliminates the allocation path in fsnotify.  From
what I saw, the allocation is the only problem in fsnotify for
filesystems to submit errors in constrained situations.

* Visibility

Since the usecase is limited to a tool for whole file system monitoring,
errors are associated with the superblock and visible filesystem-wide.
It is assumed and required that userspace has CAP_SYS_ADMIN.

* Testing

This was tested with corrupted ext4 images in a few scenarios, which
caused errors to be triggered and monitored with the sample tool
provided in the next to final patch.

* patches

Patches 1-4 massage fanotify attempt to refactor fanotify a bit for
the patches to come.  Patch 5 introduce the ring buffer interface to
fsnotify, while patch 6 enable this support in fanotify.  Patch 7, 8 wire
the FS_ERROR event type, which will be used by filesystems.  In
sequennce, patches 9-12 implement the FAN_ERROR record types and create
the new event.  Patch 13 is an ext4 example implementation supporting
this feature.  Finally, patches 14 and 15 document and provide examples
of a userspace tool that uses this feature.

I also pushed the full series to:

  https://gitlab.collabora.com/krisman/linux -b fanotify-notifications

[1] https://lwn.net/Articles/839310/
[2] https://www.spinics.net/lists/linux-fsdevel/msg187075.html

Gabriel Krisman Bertazi (15):
  fanotify: Fold event size calculation to its own function
  fanotify: Split fsid check from other fid mode checks
  fsnotify: Wire flags field on group allocation
  fsnotify: Wire up group information on event initialization
  fsnotify: Support event submission through ring buffer
  fanotify: Support submission through ring buffer
  fsnotify: Support FS_ERROR event type
  fsnotify: Introduce helpers to send error_events
  fanotify: Introduce generic error record
  fanotify: Introduce code location record
  fanotify: Introduce filesystem specific data record
  fanotify: Introduce the FAN_ERROR mark
  ext4: Send notifications on error
  samples: Add fs error monitoring example
  Documentation: Document the FAN_ERROR framework

 .../admin-guide/filesystem-monitoring.rst     | 103 ++++++
 Documentation/admin-guide/index.rst           |   1 +
 fs/ext4/super.c                               |  60 +++-
 fs/notify/Makefile                            |   2 +-
 fs/notify/dnotify/dnotify.c                   |   2 +-
 fs/notify/fanotify/fanotify.c                 | 127 +++++--
 fs/notify/fanotify/fanotify.h                 |  35 +-
 fs/notify/fanotify/fanotify_user.c            | 319 ++++++++++++++----
 fs/notify/fsnotify.c                          |   2 +-
 fs/notify/group.c                             |  25 +-
 fs/notify/inotify/inotify_fsnotify.c          |   2 +-
 fs/notify/inotify/inotify_user.c              |   4 +-
 fs/notify/notification.c                      |  10 +
 fs/notify/ring.c                              | 199 +++++++++++
 include/linux/fanotify.h                      |  12 +-
 include/linux/fsnotify.h                      |  15 +
 include/linux/fsnotify_backend.h              |  63 +++-
 include/uapi/linux/ext4-notify.h              |  17 +
 include/uapi/linux/fanotify.h                 |  26 ++
 kernel/audit_fsnotify.c                       |   2 +-
 kernel/audit_tree.c                           |   2 +-
 kernel/audit_watch.c                          |   2 +-
 samples/Kconfig                               |   7 +
 samples/Makefile                              |   1 +
 samples/fanotify/Makefile                     |   3 +
 samples/fanotify/fs-monitor.c                 | 135 ++++++++
 26 files changed, 1034 insertions(+), 142 deletions(-)
 create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
 create mode 100644 fs/notify/ring.c
 create mode 100644 include/uapi/linux/ext4-notify.h
 create mode 100644 samples/fanotify/Makefile
 create mode 100644 samples/fanotify/fs-monitor.c

-- 
2.31.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  4:42   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Every time this function is invoked, it is immediately added to
FAN_EVENT_METADATA_LEN, since there is no need to just calculate the
length of info records. This minor clean up folds the rest of the
calculation into the function, which now operates in terms of events,
returning the size of the entire event, including metadata.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify_user.c | 40 +++++++++++++++++-------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9e0c1afac8bd..0332c4afeec3 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -64,17 +64,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
 	return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
 }
 
-static int fanotify_event_info_len(unsigned int fid_mode,
-				   struct fanotify_event *event)
+static size_t fanotify_event_len(struct fanotify_event *event,
+				 unsigned int fid_mode)
 {
-	struct fanotify_info *info = fanotify_event_info(event);
-	int dir_fh_len = fanotify_event_dir_fh_len(event);
-	int fh_len = fanotify_event_object_fh_len(event);
-	int info_len = 0;
+	size_t event_len = FAN_EVENT_METADATA_LEN;
+	struct fanotify_info *info;
+	int dir_fh_len;
+	int fh_len;
 	int dot_len = 0;
 
+	if (!fid_mode)
+		return event_len;
+
+	info = fanotify_event_info(event);
+	dir_fh_len = fanotify_event_dir_fh_len(event);
+	fh_len = fanotify_event_object_fh_len(event);
+
 	if (dir_fh_len) {
-		info_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
+		event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
 	} else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
 		/*
 		 * With group flag FAN_REPORT_NAME, if name was not recorded in
@@ -84,9 +91,9 @@ static int fanotify_event_info_len(unsigned int fid_mode,
 	}
 
 	if (fh_len)
-		info_len += fanotify_fid_info_len(fh_len, dot_len);
+		event_len += fanotify_fid_info_len(fh_len, dot_len);
 
-	return info_len;
+	return event_len;
 }
 
 /*
@@ -98,7 +105,8 @@ static int fanotify_event_info_len(unsigned int fid_mode,
 static struct fanotify_event *get_one_event(struct fsnotify_group *group,
 					    size_t count)
 {
-	size_t event_size = FAN_EVENT_METADATA_LEN;
+	size_t event_size;
+	struct fsnotify_event *fse;
 	struct fanotify_event *event = NULL;
 	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 
@@ -108,16 +116,15 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
 	if (fsnotify_notify_queue_is_empty(group))
 		goto out;
 
-	if (fid_mode) {
-		event_size += fanotify_event_info_len(fid_mode,
-			FANOTIFY_E(fsnotify_peek_first_event(group)));
-	}
+	fse = fsnotify_peek_first_event(group);
+	event = FANOTIFY_E(fse);
+	event_size = fanotify_event_len(event, fid_mode);
 
 	if (event_size > count) {
 		event = ERR_PTR(-EINVAL);
 		goto out;
 	}
-	event = FANOTIFY_E(fsnotify_remove_first_event(group));
+	fsnotify_remove_queued_event(group, fse);
 	if (fanotify_is_perm_event(event->mask))
 		FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
 out:
@@ -334,8 +341,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	metadata.event_len = FAN_EVENT_METADATA_LEN +
-				fanotify_event_info_len(fid_mode, event);
+	metadata.event_len = fanotify_event_len(event, fid_mode);
 	metadata.metadata_len = FAN_EVENT_METADATA_LEN;
 	metadata.vers = FANOTIFY_METADATA_VERSION;
 	metadata.reserved = 0;
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
  2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  4:53   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

FAN_ERROR will require fsid, but not necessarily require the filesystem
to expose a file handle.  Split those checks into different functions, so
they can be used separately when creating a mark.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify_user.c | 35 +++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 0332c4afeec3..e0d113e3b65c 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1055,7 +1055,23 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 }
 
 /* Check if filesystem can encode a unique fid */
-static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
+static int fanotify_test_fid(struct path *path)
+{
+	/*
+	 * We need to make sure that the file system supports at least
+	 * encoding a file handle so user can use name_to_handle_at() to
+	 * compare fid returned with event to the file handle of watched
+	 * objects. However, name_to_handle_at() requires that the
+	 * filesystem also supports decoding file handles.
+	 */
+	if (!path->dentry->d_sb->s_export_op ||
+	    !path->dentry->d_sb->s_export_op->fh_to_dentry)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static int fanotify_check_path_fsid(struct path *path, __kernel_fsid_t *fsid)
 {
 	__kernel_fsid_t root_fsid;
 	int err;
@@ -1082,17 +1098,6 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
 	    root_fsid.val[1] != fsid->val[1])
 		return -EXDEV;
 
-	/*
-	 * We need to make sure that the file system supports at least
-	 * encoding a file handle so user can use name_to_handle_at() to
-	 * compare fid returned with event to the file handle of watched
-	 * objects. However, name_to_handle_at() requires that the
-	 * filesystem also supports decoding file handles.
-	 */
-	if (!path->dentry->d_sb->s_export_op ||
-	    !path->dentry->d_sb->s_export_op->fh_to_dentry)
-		return -EOPNOTSUPP;
-
 	return 0;
 }
 
@@ -1230,7 +1235,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	}
 
 	if (fid_mode) {
-		ret = fanotify_test_fid(&path, &__fsid);
+		ret = fanotify_check_path_fsid(&path, &__fsid);
+		if (ret)
+			goto path_put_and_out;
+
+		ret = fanotify_test_fid(&path);
 		if (ret)
 			goto path_put_and_out;
 
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
  2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
  2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  5:03   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization Gabriel Krisman Bertazi
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Introduce a flags field in fsnotify_group to track the mode of
submission this group has.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/dnotify/dnotify.c        |  2 +-
 fs/notify/fanotify/fanotify_user.c |  4 ++--
 fs/notify/group.c                  | 13 ++++++++-----
 fs/notify/inotify/inotify_user.c   |  2 +-
 include/linux/fsnotify_backend.h   |  7 +++++--
 kernel/audit_fsnotify.c            |  2 +-
 kernel/audit_tree.c                |  2 +-
 kernel/audit_watch.c               |  2 +-
 8 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index e85e13c50d6d..37960c8750e4 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -383,7 +383,7 @@ static int __init dnotify_init(void)
 					  SLAB_PANIC|SLAB_ACCOUNT);
 	dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
 
-	dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
+	dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0);
 	if (IS_ERR(dnotify_group))
 		panic("unable to allocate fsnotify group for dnotify\n");
 	return 0;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index e0d113e3b65c..f50c4ab721e3 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -929,7 +929,7 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void)
 SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 {
 	struct fsnotify_group *group;
-	int f_flags, fd;
+	int f_flags, fd, fsn_flags = 0;
 	struct user_struct *user;
 	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
 	unsigned int class = flags & FANOTIFY_CLASS_BITS;
@@ -982,7 +982,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		f_flags |= O_NONBLOCK;
 
 	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
-	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
+	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
 	if (IS_ERR(group)) {
 		free_uid(user);
 		return PTR_ERR(group);
diff --git a/fs/notify/group.c b/fs/notify/group.c
index ffd723ffe46d..08acb1afc0c2 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -112,7 +112,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
 EXPORT_SYMBOL_GPL(fsnotify_put_group);
 
 static struct fsnotify_group *__fsnotify_alloc_group(
-				const struct fsnotify_ops *ops, gfp_t gfp)
+	const struct fsnotify_ops *ops, unsigned int flags, gfp_t gfp)
 {
 	struct fsnotify_group *group;
 
@@ -134,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 	INIT_LIST_HEAD(&group->marks_list);
 
 	group->ops = ops;
+	group->flags = flags;
 
 	return group;
 }
@@ -141,18 +142,20 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 /*
  * Create a new fsnotify_group and hold a reference for the group returned.
  */
-struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
+					    unsigned int flags)
 {
-	return __fsnotify_alloc_group(ops, GFP_KERNEL);
+	return __fsnotify_alloc_group(ops, flags, GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
 
 /*
  * Create a new fsnotify_group and hold a reference for the group returned.
  */
-struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops,
+						 unsigned int flags)
 {
-	return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT);
+	return __fsnotify_alloc_group(ops, flags, GFP_KERNEL_ACCOUNT);
 }
 EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group);
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index c71be4fb7dc5..f2687267bc15 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -632,7 +632,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
 	struct fsnotify_group *group;
 	struct inotify_event_info *oevent;
 
-	group = fsnotify_alloc_user_group(&inotify_fsnotify_ops);
+	group = fsnotify_alloc_user_group(&inotify_fsnotify_ops, 0);
 	if (IS_ERR(group))
 		return group;
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e5409b83e731..ef4352563ede 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -221,6 +221,7 @@ struct fsnotify_group {
 						 * full */
 
 	struct mem_cgroup *memcg;	/* memcg to charge allocations */
+	unsigned int flags;
 
 	/* groups can define private fields here or use the void *private */
 	union {
@@ -469,8 +470,10 @@ static inline void fsnotify_update_flags(struct dentry *dentry)
 /* called from fsnotify listeners, such as fanotify or dnotify */
 
 /* create a new group */
-extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops);
-extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops);
+extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
+						   unsigned int flags);
+extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops,
+							unsigned int flags);
 /* get reference to a group */
 extern void fsnotify_get_group(struct fsnotify_group *group);
 /* drop reference on a group from fsnotify_alloc_group */
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 60739d5e3373..dce6a6212f8f 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -182,7 +182,7 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = {
 
 static int __init audit_fsnotify_init(void)
 {
-	audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops);
+	audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, 0);
 	if (IS_ERR(audit_fsnotify_group)) {
 		audit_fsnotify_group = NULL;
 		audit_panic("cannot create audit fsnotify group");
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6c91902f4f45..3d045fc791f2 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -1077,7 +1077,7 @@ static int __init audit_tree_init(void)
 
 	audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
 
-	audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
+	audit_tree_group = fsnotify_alloc_group(&audit_tree_ops, 0);
 	if (IS_ERR(audit_tree_group))
 		audit_panic("cannot initialize fsnotify group for rectree watches");
 
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 2acf7ca49154..80a8c14de961 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -493,7 +493,7 @@ static const struct fsnotify_ops audit_watch_fsnotify_ops = {
 
 static int __init audit_watch_init(void)
 {
-	audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops);
+	audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops, 0);
 	if (IS_ERR(audit_watch_group)) {
 		audit_watch_group = NULL;
 		audit_panic("cannot create audit fsnotify group");
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

This is used by following patches when deciding about which fields to
initialize.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify.c        | 2 +-
 fs/notify/fanotify/fanotify.h        | 5 +++--
 fs/notify/fanotify/fanotify_user.c   | 6 +++---
 fs/notify/inotify/inotify_fsnotify.c | 2 +-
 fs/notify/inotify/inotify_user.c     | 2 +-
 include/linux/fsnotify_backend.h     | 3 ++-
 6 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 1192c9953620..e3669d8a4a64 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -601,7 +601,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	 * event queue, so event reported on parent is merged with event
 	 * reported on child when both directory and child watches exist.
 	 */
-	fanotify_init_event(event, (unsigned long)id, mask);
+	fanotify_init_event(group, event, (unsigned long)id, mask);
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_TID))
 		event->pid = get_pid(task_pid(current));
 	else
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 896c819a1786..47299e3d6efd 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -144,10 +144,11 @@ struct fanotify_event {
 	struct pid *pid;
 };
 
-static inline void fanotify_init_event(struct fanotify_event *event,
+static inline void fanotify_init_event(struct fsnotify_group *group,
+				       struct fanotify_event *event,
 				       unsigned long id, u32 mask)
 {
-	fsnotify_init_event(&event->fse, id);
+	fsnotify_init_event(group, &event->fse, id);
 	event->mask = mask;
 	event->pid = NULL;
 }
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index f50c4ab721e3..fe605359af88 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -911,7 +911,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
 				 FSNOTIFY_OBJ_TYPE_INODE, mask, flags, fsid);
 }
 
-static struct fsnotify_event *fanotify_alloc_overflow_event(void)
+static struct fsnotify_event *fanotify_alloc_overflow_event(struct fsnotify_group *group)
 {
 	struct fanotify_event *oevent;
 
@@ -919,7 +919,7 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void)
 	if (!oevent)
 		return NULL;
 
-	fanotify_init_event(oevent, 0, FS_Q_OVERFLOW);
+	fanotify_init_event(group, oevent, 0, FS_Q_OVERFLOW);
 	oevent->type = FANOTIFY_EVENT_TYPE_OVERFLOW;
 
 	return &oevent->fse;
@@ -993,7 +993,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	atomic_inc(&user->fanotify_listeners);
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
 
-	group->overflow_event = fanotify_alloc_overflow_event();
+	group->overflow_event = fanotify_alloc_overflow_event(group);
 	if (unlikely(!group->overflow_event)) {
 		fd = -ENOMEM;
 		goto out_destroy_group;
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 1901d799909b..c6eceb663ac3 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -107,7 +107,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 		mask &= ~IN_ISDIR;
 
 	fsn_event = &event->fse;
-	fsnotify_init_event(fsn_event, 0);
+	fsnotify_init_event(group, fsn_event, 0);
 	event->mask = mask;
 	event->wd = i_mark->wd;
 	event->sync_cookie = cookie;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index f2687267bc15..fb9a62b988e5 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -642,7 +642,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
 		return ERR_PTR(-ENOMEM);
 	}
 	group->overflow_event = &oevent->fse;
-	fsnotify_init_event(group->overflow_event, 0);
+	fsnotify_init_event(group, group->overflow_event, 0);
 	oevent->mask = FS_Q_OVERFLOW;
 	oevent->wd = -1;
 	oevent->sync_cookie = 0;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index ef4352563ede..190c6a402e98 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -579,7 +579,8 @@ extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info);
 extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
 
-static inline void fsnotify_init_event(struct fsnotify_event *event,
+static inline void fsnotify_init_event(struct fsnotify_group *group,
+				       struct fsnotify_event *event,
 				       unsigned long objectid)
 {
 	INIT_LIST_HEAD(&event->list);
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-26 22:00   ` kernel test robot
                     ` (2 more replies)
  2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
                   ` (10 subsequent siblings)
  15 siblings, 3 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

In order to support file system health/error reporting over fanotify,
fsnotify needs to expose a submission path that doesn't allow sleeping.
The only problem I identified with the current submission path is the
need to dynamically allocate memory for the event queue.

This patch avoids the problem by introducing a new mode in fsnotify,
where a ring buffer is used to submit events for a group.  Each group
has its own ring buffer, and error notifications are submitted
exclusively through it.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/Makefile               |   2 +-
 fs/notify/group.c                |  12 +-
 fs/notify/notification.c         |  10 ++
 fs/notify/ring.c                 | 199 +++++++++++++++++++++++++++++++
 include/linux/fsnotify_backend.h |  37 +++++-
 5 files changed, 255 insertions(+), 5 deletions(-)
 create mode 100644 fs/notify/ring.c

diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index 63a4b8828df4..61dae1e90f2d 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o mark.o \
-				   fdinfo.o
+				   fdinfo.o ring.o
 
 obj-y			+= dnotify/
 obj-y			+= inotify/
diff --git a/fs/notify/group.c b/fs/notify/group.c
index 08acb1afc0c2..b99b3de36696 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -81,7 +81,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
 	 * notification against this group. So clearing the notification queue
 	 * of all events is reliable now.
 	 */
-	fsnotify_flush_notify(group);
+	if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+		fsnotify_free_ring_buffer(group);
+	else
+		fsnotify_flush_notify(group);
 
 	/*
 	 * Destroy overflow event (we cannot use fsnotify_destroy_event() as
@@ -136,6 +139,13 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 	group->ops = ops;
 	group->flags = flags;
 
+	if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
+		if (fsnotify_create_ring_buffer(group)) {
+			kfree(group);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+
 	return group;
 }
 
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 75d79d6d3ef0..32f97e7b7a80 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -51,6 +51,10 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
 bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
 {
 	assert_spin_locked(&group->notification_lock);
+
+	if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+		return fsnotify_ring_notify_queue_is_empty(group);
+
 	return list_empty(&group->notification_list) ? true : false;
 }
 
@@ -132,6 +136,9 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group,
 				  struct fsnotify_event *event)
 {
 	assert_spin_locked(&group->notification_lock);
+
+	if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+		return;
 	/*
 	 * We need to init list head for the case of overflow event so that
 	 * check in fsnotify_add_event() works
@@ -166,6 +173,9 @@ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
 {
 	assert_spin_locked(&group->notification_lock);
 
+	if (group->flags & FSN_SUBMISSION_RING_BUFFER)
+		return fsnotify_ring_peek_first_event(group);
+
 	return list_first_entry(&group->notification_list,
 				struct fsnotify_event, list);
 }
diff --git a/fs/notify/ring.c b/fs/notify/ring.c
new file mode 100644
index 000000000000..75e8af1f8d80
--- /dev/null
+++ b/fs/notify/ring.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/types.h>
+#include <linux/fsnotify.h>
+#include <linux/memcontrol.h>
+
+#define INVALID_RING_SLOT -1
+
+#define FSNOTIFY_RING_PAGES 16
+
+#define NEXT_SLOT(cur, len, ring_size) ((cur + len) & (ring_size-1))
+#define NEXT_PAGE(cur, ring_size) (round_up(cur, PAGE_SIZE) & (ring_size-1))
+
+bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group)
+{
+	assert_spin_locked(&group->notification_lock);
+
+	if (group->ring_buffer.tail == group->ring_buffer.head)
+		return true;
+	return false;
+}
+
+struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group)
+{
+	u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+	struct fsnotify_event *fsn;
+	char *kaddr;
+	u64 tail;
+
+	assert_spin_locked(&group->notification_lock);
+
+again:
+	tail = group->ring_buffer.tail;
+
+	if ((PAGE_SIZE - (tail & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
+		group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
+		goto again;
+	}
+
+	kaddr = kmap_atomic(group->ring_buffer.pages[tail / PAGE_SIZE]);
+	if (!kaddr)
+		return NULL;
+	fsn = (struct fsnotify_event *) (kaddr + (tail & (PAGE_SIZE-1)));
+
+	if (fsn->slot_len == INVALID_RING_SLOT) {
+		group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
+		kunmap_atomic(kaddr);
+		goto again;
+	}
+
+	/* will be unmapped when entry is consumed. */
+	return fsn;
+}
+
+void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
+					struct fsnotify_event *event)
+{
+	u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+	u64 new_tail = NEXT_SLOT(group->ring_buffer.tail, event->slot_len, ring_size);
+
+	kunmap_atomic(event);
+
+	pr_debug("%s: group=%p tail=%llx->%llx ring_size=%llu\n", __func__,
+		 group, group->ring_buffer.tail, new_tail, ring_size);
+
+	WRITE_ONCE(group->ring_buffer.tail, new_tail);
+}
+
+struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
+						      size_t size)
+	__acquires(&group->notification_lock)
+{
+	struct fsnotify_event *fsn;
+	u64 head, tail;
+	u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+	u64 new_head;
+	void *kaddr;
+
+	if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
+		return ERR_PTR(-EINVAL);
+
+	pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
+		 ring_size, size);
+
+	spin_lock(&group->notification_lock);
+again:
+	head = group->ring_buffer.head;
+	tail = group->ring_buffer.tail;
+	new_head = NEXT_SLOT(head, size, ring_size);
+
+	/* head would catch up to tail, corrupting an entry. */
+	if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
+		fsn = ERR_PTR(-ENOMEM);
+		goto err;
+	}
+
+	/*
+	 * Not event a skip message fits in the page. We can detect the
+	 * lack of space. Move on to the next page.
+	 */
+	if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
+		/* Start again on next page */
+		group->ring_buffer.head = NEXT_PAGE(head, ring_size);
+		goto again;
+	}
+
+	kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
+	if (!kaddr) {
+		fsn = ERR_PTR(-EFAULT);
+		goto err;
+	}
+
+	fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
+
+	if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
+		/*
+		 * No room in the current page.  Add a fake entry
+		 * consuming the end the page to avoid splitting event
+		 * structure.
+		 */
+		fsn->slot_len = INVALID_RING_SLOT;
+		kunmap_atomic(kaddr);
+		/* Start again on the next page */
+		group->ring_buffer.head = NEXT_PAGE(head, ring_size);
+
+		goto again;
+	}
+	fsn->slot_len = size;
+
+	return fsn;
+
+err:
+	spin_unlock(&group->notification_lock);
+	return fsn;
+}
+
+void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn)
+	__releases(&group->notification_lock)
+{
+	u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
+	u64 head = group->ring_buffer.head;
+	u64 new_head = NEXT_SLOT(head, fsn->slot_len, ring_size);
+
+	pr_debug("%s: group=%p head=%llx->%llx ring_size=%llu\n", __func__,
+		 group, head, new_head, ring_size);
+
+	kunmap_atomic(fsn);
+	group->ring_buffer.head = new_head;
+
+	spin_unlock(&group->notification_lock);
+
+	wake_up(&group->notification_waitq);
+	kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
+
+}
+
+void fsnotify_free_ring_buffer(struct fsnotify_group *group)
+{
+	int i;
+
+	for (i = 0; i < group->ring_buffer.nr_pages; i++)
+		__free_page(group->ring_buffer.pages[i]);
+	kfree(group->ring_buffer.pages);
+	group->ring_buffer.nr_pages = 0;
+}
+
+int fsnotify_create_ring_buffer(struct fsnotify_group *group)
+{
+	int nr_pages = FSNOTIFY_RING_PAGES;
+	int i;
+
+	pr_debug("%s: group=%p pages=%d\n", __func__, group, nr_pages);
+
+	group->ring_buffer.pages = kmalloc_array(nr_pages, sizeof(struct pages *),
+						 GFP_KERNEL);
+	if (!group->ring_buffer.pages)
+		return -ENOMEM;
+
+	group->ring_buffer.head = 0;
+	group->ring_buffer.tail = 0;
+
+	for (i = 0; i < nr_pages; i++) {
+		group->ring_buffer.pages[i] = alloc_pages(GFP_KERNEL, 1);
+		if (!group->ring_buffer.pages)
+			goto err_dealloc;
+	}
+
+	group->ring_buffer.nr_pages = nr_pages;
+
+	return 0;
+
+err_dealloc:
+	for (--i; i >= 0; i--)
+		__free_page(group->ring_buffer.pages[i]);
+	kfree(group->ring_buffer.pages);
+	group->ring_buffer.nr_pages = 0;
+	return -ENOMEM;
+}
+
+
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 190c6a402e98..a1a4dd69e5ed 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -74,6 +74,8 @@
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)
 
+#define FSN_SUBMISSION_RING_BUFFER	0x00000080
+
 /*
  * This is a list of all events that may get sent to a parent that is watching
  * with flag FS_EVENT_ON_CHILD based on fs event on a child of that directory.
@@ -166,7 +168,11 @@ struct fsnotify_ops {
  * listener this structure is where you need to be adding fields.
  */
 struct fsnotify_event {
-	struct list_head list;
+	union {
+		struct list_head list;
+		int slot_len;
+	};
+
 	unsigned long objectid;	/* identifier for queue merges */
 };
 
@@ -191,7 +197,21 @@ struct fsnotify_group {
 
 	/* needed to send notification to userspace */
 	spinlock_t notification_lock;		/* protect the notification_list */
-	struct list_head notification_list;	/* list of event_holder this group needs to send to userspace */
+
+	union {
+		/*
+		 * list of event_holder this group needs to send to
+		 * userspace.  Either a linked list (default), or a ring
+		 * buffer(FSN_SUBMISSION_RING_BUFFER).
+		 */
+		struct list_head notification_list;
+		struct {
+			struct page **pages;
+			int nr_pages;
+			u64 head;
+			u64 tail;
+		} ring_buffer;
+	};
 	wait_queue_head_t notification_waitq;	/* read() on the notification file blocks on this waitq */
 	unsigned int q_len;			/* events on the queue */
 	unsigned int max_events;		/* maximum events allowed on the list */
@@ -492,6 +512,16 @@ extern int fsnotify_add_event(struct fsnotify_group *group,
 			      struct fsnotify_event *event,
 			      int (*merge)(struct list_head *,
 					   struct fsnotify_event *));
+
+extern int fsnotify_create_ring_buffer(struct fsnotify_group *group);
+extern void fsnotify_free_ring_buffer(struct fsnotify_group *group);
+extern struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
+							     size_t size);
+extern void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
+					       struct fsnotify_event *event);
+extern bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group);
+struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group);
+extern void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn);
 /* Queue overflow event to a notification group */
 static inline void fsnotify_queue_overflow(struct fsnotify_group *group)
 {
@@ -583,7 +613,8 @@ static inline void fsnotify_init_event(struct fsnotify_group *group,
 				       struct fsnotify_event *event,
 				       unsigned long objectid)
 {
-	INIT_LIST_HEAD(&event->list);
+	if (!(group->flags & FSN_SUBMISSION_RING_BUFFER))
+		INIT_LIST_HEAD(&event->list);
 	event->objectid = objectid;
 }
 
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 06/15] fanotify: Support submission through ring buffer
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  6:02   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

This adds support for the ring buffer mode in fanotify.  It is enabled
by a new flag FAN_PREALLOC_QUEUE passed to fanotify_init.  If this flag
is enabled, the group only allows marks that support the ring buffer
submission.  In a following patch, FAN_ERROR will make use of this
mechanism.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify.c      | 77 +++++++++++++++++++---------
 fs/notify/fanotify/fanotify_user.c | 81 ++++++++++++++++++------------
 include/linux/fanotify.h           |  5 +-
 include/uapi/linux/fanotify.h      |  1 +
 4 files changed, 105 insertions(+), 59 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e3669d8a4a64..98591a8155a7 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -612,6 +612,26 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	return event;
 }
 
+static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *group,
+						     u32 mask, const void *data,
+						     int data_type)
+{
+	size_t size = 0;
+
+	pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
+
+	return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
+}
+
+static void fanotify_ring_write_event(struct fsnotify_group *group,
+				      struct fanotify_event *event, u32 mask,
+				      const void *data, __kernel_fsid_t *fsid)
+{
+	fanotify_init_event(group, event, 0, mask);
+
+	event->pid = get_pid(task_tgid(current));
+}
+
 /*
  * Get cached fsid of the filesystem containing the object from any connector.
  * All connectors are supposed to have the same fsid, but we do not verify that
@@ -701,31 +721,38 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 			return 0;
 	}
 
-	event = fanotify_alloc_event(group, mask, data, data_type, dir,
-				     file_name, &fsid);
-	ret = -ENOMEM;
-	if (unlikely(!event)) {
-		/*
-		 * We don't queue overflow events for permission events as
-		 * there the access is denied and so no event is in fact lost.
-		 */
-		if (!fanotify_is_perm_event(mask))
-			fsnotify_queue_overflow(group);
-		goto finish;
-	}
-
-	fsn_event = &event->fse;
-	ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
-	if (ret) {
-		/* Permission events shouldn't be merged */
-		BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
-		/* Our event wasn't used in the end. Free it. */
-		fsnotify_destroy_event(group, fsn_event);
-
-		ret = 0;
-	} else if (fanotify_is_perm_event(mask)) {
-		ret = fanotify_get_response(group, FANOTIFY_PERM(event),
-					    iter_info);
+	if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
+		event = fanotify_ring_get_slot(group, mask, data, data_type);
+		if (IS_ERR(event))
+			return PTR_ERR(event);
+		fanotify_ring_write_event(group, event, mask, data, &fsid);
+		fsnotify_ring_commit_slot(group, &event->fse);
+	} else {
+		event = fanotify_alloc_event(group, mask, data, data_type, dir,
+					     file_name, &fsid);
+		ret = -ENOMEM;
+		if (unlikely(!event)) {
+			/*
+			 * We don't queue overflow events for permission events as
+			 * there the access is denied and so no event is in fact lost.
+			 */
+			if (!fanotify_is_perm_event(mask))
+				fsnotify_queue_overflow(group);
+			goto finish;
+		}
+		fsn_event = &event->fse;
+		ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
+		if (ret) {
+			/* Permission events shouldn't be merged */
+			BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
+			/* Our event wasn't used in the end. Free it. */
+			fsnotify_destroy_event(group, fsn_event);
+
+			ret = 0;
+		} else if (fanotify_is_perm_event(mask)) {
+			ret = fanotify_get_response(group, FANOTIFY_PERM(event),
+						    iter_info);
+		}
 	}
 finish:
 	if (fanotify_is_perm_event(mask))
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index fe605359af88..5031198bf7db 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -521,7 +521,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 		 * Permission events get queued to wait for response.  Other
 		 * events can be destroyed now.
 		 */
-		if (!fanotify_is_perm_event(event->mask)) {
+		if (group->fanotify_data.flags & FAN_PREALLOC_QUEUE) {
+			fsnotify_ring_buffer_consume_event(group, &event->fse);
+		} else if (!fanotify_is_perm_event(event->mask)) {
 			fsnotify_destroy_event(group, &event->fse);
 		} else {
 			if (ret <= 0) {
@@ -587,40 +589,39 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 	 */
 	fsnotify_group_stop_queueing(group);
 
-	/*
-	 * Process all permission events on access_list and notification queue
-	 * and simulate reply from userspace.
-	 */
-	spin_lock(&group->notification_lock);
-	while (!list_empty(&group->fanotify_data.access_list)) {
-		struct fanotify_perm_event *event;
-
-		event = list_first_entry(&group->fanotify_data.access_list,
-				struct fanotify_perm_event, fae.fse.list);
-		list_del_init(&event->fae.fse.list);
-		finish_permission_event(group, event, FAN_ALLOW);
+	if (!(group->flags & FSN_SUBMISSION_RING_BUFFER)) {
+		/*
+		 * Process all permission events on access_list and notification queue
+		 * and simulate reply from userspace.
+		 */
 		spin_lock(&group->notification_lock);
-	}
-
-	/*
-	 * Destroy all non-permission events. For permission events just
-	 * dequeue them and set the response. They will be freed once the
-	 * response is consumed and fanotify_get_response() returns.
-	 */
-	while (!fsnotify_notify_queue_is_empty(group)) {
-		struct fanotify_event *event;
-
-		event = FANOTIFY_E(fsnotify_remove_first_event(group));
-		if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
-			spin_unlock(&group->notification_lock);
-			fsnotify_destroy_event(group, &event->fse);
-		} else {
-			finish_permission_event(group, FANOTIFY_PERM(event),
-						FAN_ALLOW);
+		while (!list_empty(&group->fanotify_data.access_list)) {
+			struct fanotify_perm_event *event;
+			event = list_first_entry(&group->fanotify_data.access_list,
+						 struct fanotify_perm_event, fae.fse.list);
+			list_del_init(&event->fae.fse.list);
+			finish_permission_event(group, event, FAN_ALLOW);
+			spin_lock(&group->notification_lock);
 		}
-		spin_lock(&group->notification_lock);
+		/*
+		 * Destroy all non-permission events. For permission events just
+		 * dequeue them and set the response. They will be freed once the
+		 * response is consumed and fanotify_get_response() returns.
+		 */
+		while (!fsnotify_notify_queue_is_empty(group)) {
+			struct fanotify_event *event;
+			event = FANOTIFY_E(fsnotify_remove_first_event(group));
+			if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
+				spin_unlock(&group->notification_lock);
+				fsnotify_destroy_event(group, &event->fse);
+			} else {
+				finish_permission_event(group, FANOTIFY_PERM(event),
+							FAN_ALLOW);
+			}
+			spin_lock(&group->notification_lock);
+		}
+		spin_unlock(&group->notification_lock);
 	}
-	spin_unlock(&group->notification_lock);
 
 	/* Response for all permission events it set, wakeup waiters */
 	wake_up(&group->fanotify_data.access_waitq);
@@ -981,6 +982,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	if (flags & FAN_NONBLOCK)
 		f_flags |= O_NONBLOCK;
 
+	if (flags & FAN_PREALLOC_QUEUE) {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		if (flags & FAN_UNLIMITED_QUEUE)
+			return -EINVAL;
+
+		fsn_flags = FSN_SUBMISSION_RING_BUFFER;
+	}
+
 	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
 	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
 	if (IS_ERR(group)) {
@@ -1223,6 +1234,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		goto fput_and_out;
 	}
 
+	if ((group->flags & FSN_SUBMISSION_RING_BUFFER) &&
+	    (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
+		goto fput_and_out;
+
 	ret = fanotify_find_path(dfd, pathname, &path, flags,
 			(mask & ALL_FSNOTIFY_EVENTS), obj_type);
 	if (ret)
@@ -1327,7 +1342,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
  */
 static int __init fanotify_user_setup(void)
 {
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 3e9c56ee651f..5a4cefb4b1c3 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -23,7 +23,8 @@
 #define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
 				 FAN_REPORT_TID | \
 				 FAN_CLOEXEC | FAN_NONBLOCK | \
-				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
+				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
+				 FAN_PREALLOC_QUEUE)
 
 #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
 				 FAN_MARK_FILESYSTEM)
@@ -71,6 +72,8 @@
 					 FANOTIFY_PERM_EVENTS | \
 					 FAN_Q_OVERFLOW | FAN_ONDIR)
 
+#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
+
 #define ALL_FANOTIFY_EVENT_BITS		(FANOTIFY_OUTGOING_EVENTS | \
 					 FANOTIFY_EVENT_FLAGS)
 
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index fbf9c5c7dd59..b283531549f1 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -49,6 +49,7 @@
 #define FAN_UNLIMITED_QUEUE	0x00000010
 #define FAN_UNLIMITED_MARKS	0x00000020
 #define FAN_ENABLE_AUDIT	0x00000040
+#define FAN_PREALLOC_QUEUE	0x00000080
 
 /* Flags to determine fanotify event format */
 #define FAN_REPORT_TID		0x00000100	/* event->pid is thread id */
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (5 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  8:39   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Expose a new type of fsnotify event for filesystems to report errors for
userspace monitoring tools.  fanotify will send this type of
notification for FAN_ERROR marks.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fsnotify.c             |  2 +-
 include/linux/fsnotify_backend.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 30d422b8c0fc..9fff35e67b37 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -558,7 +558,7 @@ static __init int fsnotify_init(void)
 {
 	int ret;
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26);
 
 	ret = init_srcu_struct(&fsnotify_mark_srcu);
 	if (ret)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index a1a4dd69e5ed..f850bfbe30d4 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -48,6 +48,8 @@
 #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
 
+#define FS_ERROR		0x00100000	/* Used for filesystem error reporting */
+
 #define FS_EXCL_UNLINK		0x04000000	/* do not send events if object is unlinked */
 /*
  * Set on inode mark that cares about things that happen to its children.
@@ -74,6 +76,8 @@
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)
 
+#define ALL_FSNOTIFY_ERROR_EVENTS	FS_ERROR
+
 #define FSN_SUBMISSION_RING_BUFFER	0x00000080
 
 /*
@@ -95,6 +99,7 @@
 
 /* Events that can be reported to backends */
 #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
+			     ALL_FSNOTIFY_ERROR_EVENTS |  \
 			     FS_EVENTS_POSS_ON_CHILD | \
 			     FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \
 			     FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED)
@@ -272,6 +277,17 @@ enum fsnotify_data_type {
 	FSNOTIFY_EVENT_NONE,
 	FSNOTIFY_EVENT_PATH,
 	FSNOTIFY_EVENT_INODE,
+	FSNOTIFY_EVENT_ERROR,
+};
+
+struct fs_error_report {
+	int error;
+
+	int line;
+	const char *function;
+
+	size_t fs_data_size;
+	void *fs_data;
 };
 
 static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (6 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  6:49   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/fsnotify.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index f8acddcf54fb..b3ac1a9d0d4d 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -317,4 +317,19 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 		fsnotify_dentry(dentry, mask);
 }
 
+static inline void fsnotify_error_event(int error, struct inode *dir,
+					const char *function, int line,
+					void *fs_data, int fs_data_size)
+{
+	struct fs_error_report report = {
+		.error = error,
+		.line = line,
+		.function = function,
+		.fs_data_size = fs_data_size,
+		.fs_data = fs_data,
+	};
+
+	fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, dir, NULL, NULL, 0);
+}
+
 #endif	/* _LINUX_FS_NOTIFY_H */
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 09/15] fanotify: Introduce generic error record
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (7 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  7:01   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

This record describes a fs error in a fs agnostic way.  It will be send
back to userspace in response to a FSNOTIFY_EVENT_ERROR for groups with
the FAN_ERROR mark.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify.h      | 16 ++++++++++++++++
 fs/notify/fanotify/fanotify_user.c | 28 ++++++++++++++++++++++++++++
 include/uapi/linux/fanotify.h      | 10 ++++++++++
 3 files changed, 54 insertions(+)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 47299e3d6efd..4cb9dd31f084 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -179,6 +179,22 @@ FANOTIFY_NE(struct fanotify_event *event)
 	return container_of(event, struct fanotify_name_event, fae);
 }
 
+struct fanotify_error_event {
+	struct fanotify_event fae;
+	int error;
+	__kernel_fsid_t fsid;
+
+	int fs_data_size;
+	/* Must be the last item in the structure */
+	char fs_data[0];
+};
+
+static inline struct fanotify_error_event *
+FANOTIFY_EE(struct fanotify_event *event)
+{
+	return container_of(event, struct fanotify_error_event, fae);
+}
+
 static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
 {
 	if (event->type == FANOTIFY_EVENT_TYPE_FID)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 5031198bf7db..21162d347bd1 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -64,6 +64,11 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
 	return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
 }
 
+static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
+{
+	return sizeof(struct fanotify_event_info_error);
+}
+
 static size_t fanotify_event_len(struct fanotify_event *event,
 				 unsigned int fid_mode)
 {
@@ -232,6 +237,29 @@ static int process_access_response(struct fsnotify_group *group,
 	return -ENOENT;
 }
 
+static size_t copy_error_info_to_user(struct fanotify_error_event *fee,
+				      char __user *buf, int count)
+{
+	struct fanotify_event_info_error info;
+
+	info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
+	info.hdr.pad = 0;
+	info.hdr.len = fanotify_error_info_len(fee);
+
+	if (WARN_ON(count < info.hdr.len))
+		return -EFAULT;
+
+	info.version = FANOTIFY_EVENT_INFO_ERROR_VERS_1;
+	info.error = fee->error;
+	info.fsid = fee->fsid;
+
+	if (copy_to_user(buf, &info, sizeof(info)))
+		return -EFAULT;
+
+	return info.hdr.len;
+
+}
+
 static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
 			     int info_type, const char *name, size_t name_len,
 			     char __user *buf, size_t count)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index b283531549f1..cc9a1fa80e30 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -124,6 +124,7 @@ struct fanotify_event_metadata {
 #define FAN_EVENT_INFO_TYPE_FID		1
 #define FAN_EVENT_INFO_TYPE_DFID_NAME	2
 #define FAN_EVENT_INFO_TYPE_DFID	3
+#define FAN_EVENT_INFO_TYPE_ERROR	4
 
 /* Variable length info record following event metadata */
 struct fanotify_event_info_header {
@@ -149,6 +150,15 @@ struct fanotify_event_info_fid {
 	unsigned char handle[0];
 };
 
+#define FANOTIFY_EVENT_INFO_ERROR_VERS_1   1
+
+struct fanotify_event_info_error {
+	struct fanotify_event_info_header hdr;
+	int version;
+	int error;
+	__kernel_fsid_t fsid;
+};
+
 struct fanotify_response {
 	__s32 fd;
 	__u32 response;
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 10/15] fanotify: Introduce code location record
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (8 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  7:11   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

This patch introduces an optional info record that describes the
source (as in the region of the source-code where an event was
initiated).  This record is not produced for other type of existing
notification, but it is optionally enabled for FAN_ERROR notifications.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify.h      |  6 +++++
 fs/notify/fanotify/fanotify_user.c | 35 ++++++++++++++++++++++++++++++
 include/uapi/linux/fanotify.h      |  7 ++++++
 3 files changed, 48 insertions(+)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 4cb9dd31f084..0d1b4cb8b005 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -161,6 +161,11 @@ struct fanotify_fid_event {
 	unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN];
 };
 
+struct fanotify_event_location {
+	int line;
+	const char *function;
+};
+
 static inline struct fanotify_fid_event *
 FANOTIFY_FE(struct fanotify_event *event)
 {
@@ -183,6 +188,7 @@ struct fanotify_error_event {
 	struct fanotify_event fae;
 	int error;
 	__kernel_fsid_t fsid;
+	struct fanotify_event_location loc;
 
 	int fs_data_size;
 	/* Must be the last item in the structure */
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 21162d347bd1..cb79a4a8e6fb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -69,6 +69,16 @@ static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
 	return sizeof(struct fanotify_event_info_error);
 }
 
+static size_t fanotify_location_info_len(const struct fanotify_event_location *loc)
+{
+	if (!loc->function)
+		return 0;
+
+	/* Includes NULL byte at end of loc->function */
+	return (sizeof(struct fanotify_event_info_location) +
+		strlen(loc->function) + 1);
+}
+
 static size_t fanotify_event_len(struct fanotify_event *event,
 				 unsigned int fid_mode)
 {
@@ -260,6 +270,31 @@ static size_t copy_error_info_to_user(struct fanotify_error_event *fee,
 
 }
 
+static size_t copy_location_info_to_user(struct fanotify_event_location *location,
+					 char __user *buf, int count)
+{
+	size_t len = fanotify_location_info_len(location);
+	size_t tail = len - sizeof(struct fanotify_event_info_location);
+	struct fanotify_event_info_location info;
+
+	if (!len)
+		return 0;
+
+	info.hdr.info_type = FAN_EVENT_INFO_TYPE_LOCATION;
+	info.hdr.len = len;
+	info.line = location->line;
+
+	if (copy_to_user(buf, &info, sizeof(info)))
+		return -EFAULT;
+
+	buf += sizeof(info);
+
+	if (copy_to_user(buf, location->function, tail))
+		return -EFAULT;
+
+	return info.hdr.len;
+}
+
 static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
 			     int info_type, const char *name, size_t name_len,
 			     char __user *buf, size_t count)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index cc9a1fa80e30..67217756dac9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -125,6 +125,7 @@ struct fanotify_event_metadata {
 #define FAN_EVENT_INFO_TYPE_DFID_NAME	2
 #define FAN_EVENT_INFO_TYPE_DFID	3
 #define FAN_EVENT_INFO_TYPE_ERROR	4
+#define FAN_EVENT_INFO_TYPE_LOCATION	5
 
 /* Variable length info record following event metadata */
 struct fanotify_event_info_header {
@@ -159,6 +160,12 @@ struct fanotify_event_info_error {
 	__kernel_fsid_t fsid;
 };
 
+struct fanotify_event_info_location {
+	struct fanotify_event_info_header hdr;
+	int line;
+	char function[0];
+};
+
 struct fanotify_response {
 	__s32 fd;
 	__u32 response;
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (9 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-27  7:12   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Allow a FS_ERROR_TYPE notification to send a filesystem provided blob
back to userspace.  This is useful for filesystems who want to provide
debug information for recovery tools.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify_user.c | 27 +++++++++++++++++++++++++++
 include/uapi/linux/fanotify.h      | 10 ++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index cb79a4a8e6fb..e2f4599dfc25 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -69,6 +69,14 @@ static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
 	return sizeof(struct fanotify_event_info_error);
 }
 
+static size_t fanotify_error_fsdata_len(struct fanotify_error_event *fee)
+{
+	if (!fee->fs_data_size)
+		return 0;
+
+	return sizeof(struct fanotify_event_info_fsdata) + fee->fs_data_size;
+}
+
 static size_t fanotify_location_info_len(const struct fanotify_event_location *loc)
 {
 	if (!loc->function)
@@ -295,6 +303,25 @@ static size_t copy_location_info_to_user(struct fanotify_event_location *locatio
 	return info.hdr.len;
 }
 
+static ssize_t copy_error_fsdata_info_to_user(struct fanotify_error_event *fee,
+					      char __user *buf, int count)
+{
+	struct fanotify_event_info_fsdata info;
+
+	info.hdr.info_type = FAN_EVENT_INFO_TYPE_FSDATA;
+	info.hdr.len = fanotify_error_fsdata_len(fee);
+
+	if (copy_to_user(buf, &info, sizeof(info)))
+		return -EFAULT;
+
+	buf += sizeof(info);
+
+	if (copy_to_user(buf, fee->fs_data, fee->fs_data_size))
+		return -EFAULT;
+
+	return info.hdr.len;
+}
+
 static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
 			     int info_type, const char *name, size_t name_len,
 			     char __user *buf, size_t count)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 67217756dac9..49808c857ee1 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -124,8 +124,9 @@ struct fanotify_event_metadata {
 #define FAN_EVENT_INFO_TYPE_FID		1
 #define FAN_EVENT_INFO_TYPE_DFID_NAME	2
 #define FAN_EVENT_INFO_TYPE_DFID	3
-#define FAN_EVENT_INFO_TYPE_ERROR	4
-#define FAN_EVENT_INFO_TYPE_LOCATION	5
+#define FAN_EVENT_INFO_TYPE_LOCATION	4
+#define FAN_EVENT_INFO_TYPE_ERROR	5
+#define FAN_EVENT_INFO_TYPE_FSDATA	6
 
 /* Variable length info record following event metadata */
 struct fanotify_event_info_header {
@@ -166,6 +167,11 @@ struct fanotify_event_info_location {
 	char function[0];
 };
 
+struct fanotify_event_info_fsdata {
+	struct fanotify_event_info_header hdr;
+	char data[0];
+};
+
 struct fanotify_response {
 	__s32 fd;
 	__u32 response;
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (10 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-26 22:45   ` kernel test robot
  2021-04-27  7:25   ` Amir Goldstein
  2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

The FAN_ERROR mark is used by filesystem wide monitoring tools to
receive notifications of type FS_ERROR_EVENT, emited by filesystems when
a problem is detected.  The error notification includes a generic error
descriptor, an optional location record and a filesystem specific blob.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/notify/fanotify/fanotify.c      | 48 +++++++++++++++++++----
 fs/notify/fanotify/fanotify.h      |  8 ++++
 fs/notify/fanotify/fanotify_user.c | 63 ++++++++++++++++++++++++++++++
 include/linux/fanotify.h           |  9 ++++-
 include/uapi/linux/fanotify.h      |  2 +
 5 files changed, 120 insertions(+), 10 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 98591a8155a7..6bae23d42e5e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -240,12 +240,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 		 __func__, iter_info->report_mask, event_mask, data, data_type);
 
 	if (!fid_mode) {
-		/* Do we have path to open a file descriptor? */
-		if (!path)
-			return 0;
-		/* Path type events are only relevant for files and dirs */
-		if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
-			return 0;
+		if (!fanotify_is_error_event(event_mask)) {
+			/* Do we have path to open a file descriptor? */
+			if (!path)
+				return 0;
+			/* Path type events are only relevant for files and dirs */
+			if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
+				return 0;
+		}
 	} else if (!(fid_mode & FAN_REPORT_FID)) {
 		/* Do we have a directory inode to report? */
 		if (!dir && !(event_mask & FS_ISDIR))
@@ -458,6 +460,25 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
 	return &pevent->fae;
 }
 
+static void fanotify_init_error_event(struct fanotify_event *fae,
+				      const struct fs_error_report *report,
+				      __kernel_fsid_t *fsid)
+{
+	struct fanotify_error_event *fee;
+
+	fae->type = FANOTIFY_EVENT_TYPE_ERROR;
+	fee = FANOTIFY_EE(fae);
+	fee->error = report->error;
+	fee->fsid = *fsid;
+
+	fee->loc.line = report->line;
+	fee->loc.function = report->function;
+
+	fee->fs_data_size = report->fs_data_size;
+
+	memcpy(&fee->fs_data, report->fs_data, report->fs_data_size);
+}
+
 static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id,
 						       __kernel_fsid_t *fsid,
 						       gfp_t gfp)
@@ -618,6 +639,13 @@ static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *grou
 {
 	size_t size = 0;
 
+	if (fanotify_is_error_event(mask)) {
+		const struct fs_error_report *report = data;
+		size = sizeof(struct fanotify_error_event) + report->fs_data_size;
+	} else {
+		return ERR_PTR(-EINVAL);
+	}
+
 	pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
 
 	return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
@@ -629,6 +657,9 @@ static void fanotify_ring_write_event(struct fsnotify_group *group,
 {
 	fanotify_init_event(group, event, 0, mask);
 
+	if (fanotify_is_error_event(mask))
+		fanotify_init_error_event(event, data, fsid);
+
 	event->pid = get_pid(task_tgid(current));
 }
 
@@ -695,8 +726,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
 	BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
 	BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
+	BUILD_BUG_ON(FAN_ERROR != FS_ERROR);
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
 
 	mask = fanotify_group_event_mask(group, iter_info, mask, data,
 					 data_type, dir);
@@ -714,7 +746,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 			return 0;
 	}
 
-	if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
+	if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS) || mask == FAN_ERROR) {
 		fsid = fanotify_get_fsid(iter_info);
 		/* Racing with mark destruction or creation? */
 		if (!fsid.val[0] && !fsid.val[1])
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 0d1b4cb8b005..097667be9079 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -135,6 +135,7 @@ enum fanotify_event_type {
 	FANOTIFY_EVENT_TYPE_PATH,
 	FANOTIFY_EVENT_TYPE_PATH_PERM,
 	FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
+	FANOTIFY_EVENT_TYPE_ERROR,
 };
 
 struct fanotify_event {
@@ -207,6 +208,8 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
 		return &FANOTIFY_FE(event)->fsid;
 	else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
 		return &FANOTIFY_NE(event)->fsid;
+	else if (event->type == FANOTIFY_EVENT_TYPE_ERROR)
+		return &FANOTIFY_EE(event)->fsid;
 	else
 		return NULL;
 }
@@ -292,6 +295,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
 	return container_of(fse, struct fanotify_event, fse);
 }
 
+static inline bool fanotify_is_error_event(u32 mask)
+{
+	return mask & FANOTIFY_ERROR_EVENTS;
+}
+
 static inline bool fanotify_event_has_path(struct fanotify_event *event)
 {
 	return event->type == FANOTIFY_EVENT_TYPE_PATH ||
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index e2f4599dfc25..6270083bee36 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -96,6 +96,24 @@ static size_t fanotify_event_len(struct fanotify_event *event,
 	int fh_len;
 	int dot_len = 0;
 
+	if (fanotify_is_error_event(event->mask)) {
+		struct fanotify_error_event *fee = FANOTIFY_EE(event);
+		/*
+		 *  Error events (FAN_ERROR) have a different format
+		 *  as follows:
+		 * [ event_metadata            ]
+		 * [ fs-generic error header   ]
+		 * [ error location (optional) ]
+		 * [ fs-specific blob          ]
+		 */
+		event_len = fanotify_error_info_len(fee);
+		if (fee->loc.function)
+			event_len += fanotify_location_info_len(&fee->loc);
+		if (fee->fs_data)
+			event_len += fanotify_error_fsdata_len(fee);
+		return event_len;
+	}
+
 	if (!fid_mode)
 		return event_len;
 
@@ -322,6 +340,38 @@ static ssize_t copy_error_fsdata_info_to_user(struct fanotify_error_event *fee,
 	return info.hdr.len;
 }
 
+static int copy_error_event_to_user(struct fanotify_event *event,
+				    char __user *buf, int count)
+{
+	struct fanotify_error_event *fee = FANOTIFY_EE(event);
+	ssize_t len;
+	int original_count = count;
+
+	len = copy_error_info_to_user(fee, buf, count);
+	if (len < 0)
+		return -EFAULT;
+	buf += len;
+	count -= len;
+
+	if (fee->loc.function) {
+		len = copy_location_info_to_user(&fee->loc, buf, count);
+		if (len < 0)
+			return len;
+		buf += len;
+		count -= len;
+	}
+
+	if (fee->fs_data_size) {
+		len = copy_error_fsdata_info_to_user(fee, buf, count);
+		if (len < 0)
+			return len;
+		buf += len;
+		count -= len;
+	}
+
+	return original_count - count;
+}
+
 static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
 			     int info_type, const char *name, size_t name_len,
 			     char __user *buf, size_t count)
@@ -528,6 +578,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 		count -= ret;
 	}
 
+	if (fanotify_is_error_event(event->mask))
+		ret = copy_error_event_to_user(event, buf, count);
+
 	return metadata.event_len;
 
 out_close_fd:
@@ -1328,6 +1381,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	    (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
 		goto fput_and_out;
 
+	if (fanotify_is_error_event(mask) &&
+	    !(group->flags & FSN_SUBMISSION_RING_BUFFER))
+		goto fput_and_out;
+
 	ret = fanotify_find_path(dfd, pathname, &path, flags,
 			(mask & ALL_FSNOTIFY_EVENTS), obj_type);
 	if (ret)
@@ -1350,6 +1407,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 
 		fsid = &__fsid;
 	}
+	if (mask & FAN_ERROR) {
+		ret = fanotify_check_path_fsid(&path, &__fsid);
+		if (ret)
+			goto path_put_and_out;
+		fsid = &__fsid;
+	}
 
 	/* inode held in place by reference to path; group by fget on fd */
 	if (mark_type == FAN_MARK_INODE)
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 5a4cefb4b1c3..e08be5fae14a 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -56,9 +56,13 @@
 #define FANOTIFY_INODE_EVENTS	(FANOTIFY_DIRENT_EVENTS | \
 				 FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
 
+#define FANOTIFY_ERROR_EVENTS	(FAN_ERROR)
+
 /* Events that user can request to be notified on */
 #define FANOTIFY_EVENTS		(FANOTIFY_PATH_EVENTS | \
-				 FANOTIFY_INODE_EVENTS)
+				 FANOTIFY_INODE_EVENTS | \
+				 FANOTIFY_ERROR_EVENTS)
+
 
 /* Events that require a permission response from user */
 #define FANOTIFY_PERM_EVENTS	(FAN_OPEN_PERM | FAN_ACCESS_PERM | \
@@ -70,9 +74,10 @@
 /* Events that may be reported to user */
 #define FANOTIFY_OUTGOING_EVENTS	(FANOTIFY_EVENTS | \
 					 FANOTIFY_PERM_EVENTS | \
+					 FANOTIFY_ERROR_EVENTS | \
 					 FAN_Q_OVERFLOW | FAN_ONDIR)
 
-#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
+#define FANOTIFY_SUBMISSION_BUFFER_EVENTS FANOTIFY_ERROR_EVENTS
 
 #define ALL_FANOTIFY_EVENT_BITS		(FANOTIFY_OUTGOING_EVENTS | \
 					 FANOTIFY_EVENT_FLAGS)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 49808c857ee1..ee0ae8b1e20b 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -25,6 +25,8 @@
 #define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
 #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
 
+#define FAN_ERROR		0x00100000	/* Filesystem error */
+
 #define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
 
 #define FAN_ONDIR		0x40000000	/* Event occurred against dir */
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 13/15] ext4: Send notifications on error
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (11 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
@ 2021-04-26 18:41 ` Gabriel Krisman Bertazi
  2021-04-26 23:10   ` kernel test robot
                     ` (2 more replies)
  2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  15 siblings, 3 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:41 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Send a FS_ERROR message via fsnotify to a userspace monitoring tool
whenever a ext4 error condition is triggered.  This follows the existing
error conditions in ext4, so it is hooked to the ext4_error* functions.

It also follows the current dmesg reporting in the format.  The
filesystem message is composed mostly by the string that would be
otherwise printed in dmesg.

A new ext4 specific record format is exposed in the uapi, such that a
monitoring tool knows what to expect when listening errors of an ext4
filesystem.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/ext4/super.c                  | 60 ++++++++++++++++++++++++--------
 include/uapi/linux/ext4-notify.h | 17 +++++++++
 2 files changed, 62 insertions(+), 15 deletions(-)
 create mode 100644 include/uapi/linux/ext4-notify.h

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..032e29e7ff6a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -46,6 +46,8 @@
 #include <linux/part_stat.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/fsnotify.h>
+#include <uapi/linux/ext4-notify.h>
 
 #include "ext4.h"
 #include "ext4_extents.h"	/* Needed for trace points definition */
@@ -727,6 +729,22 @@ static void flush_stashed_error_work(struct work_struct *work)
 	ext4_commit_super(sbi->s_sb);
 }
 
+static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
+				const char *func, int line,
+				const char *desc, struct va_format *vaf)
+{
+	struct ext4_error_inode_report report;
+
+	if (inode->i_sb->s_fsnotify_marks) {
+		report.inode = inode ? inode->i_ino : -1L;
+		report.block = block ? block : -1L;
+
+		snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
+
+		fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
+	}
+}
+
 #define ext4_error_ratelimit(sb)					\
 		___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state),	\
 			     "EXT4-fs error")
@@ -742,15 +760,18 @@ void __ext4_error(struct super_block *sb, const char *function,
 		return;
 
 	trace_ext4_error(sb, function, line);
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
 	if (ext4_error_ratelimit(sb)) {
-		va_start(args, fmt);
-		vaf.fmt = fmt;
-		vaf.va = &args;
 		printk(KERN_CRIT
 		       "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
 		       sb->s_id, function, line, current->comm, &vaf);
-		va_end(args);
+
 	}
+	ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
+	va_end(args);
 	ext4_handle_error(sb, force_ro, error, 0, block, function, line);
 }
 
@@ -765,10 +786,10 @@ void __ext4_error_inode(struct inode *inode, const char *function,
 		return;
 
 	trace_ext4_error(inode->i_sb, function, line);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
 	if (ext4_error_ratelimit(inode->i_sb)) {
-		va_start(args, fmt);
-		vaf.fmt = fmt;
-		vaf.va = &args;
 		if (block)
 			printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
 			       "inode #%lu: block %llu: comm %s: %pV\n",
@@ -779,8 +800,11 @@ void __ext4_error_inode(struct inode *inode, const char *function,
 			       "inode #%lu: comm %s: %pV\n",
 			       inode->i_sb->s_id, function, line, inode->i_ino,
 			       current->comm, &vaf);
-		va_end(args);
 	}
+
+	ext4_fsnotify_error(error, inode, block, function, line, NULL, &vaf);
+	va_end(args);
+
 	ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
 			  function, line);
 }
@@ -798,13 +822,16 @@ void __ext4_error_file(struct file *file, const char *function,
 		return;
 
 	trace_ext4_error(inode->i_sb, function, line);
+
+	path = file_path(file, pathname, sizeof(pathname));
+	if (IS_ERR(path))
+		path = "(unknown)";
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+
 	if (ext4_error_ratelimit(inode->i_sb)) {
-		path = file_path(file, pathname, sizeof(pathname));
-		if (IS_ERR(path))
-			path = "(unknown)";
-		va_start(args, fmt);
-		vaf.fmt = fmt;
-		vaf.va = &args;
 		if (block)
 			printk(KERN_CRIT
 			       "EXT4-fs error (device %s): %s:%d: inode #%lu: "
@@ -817,8 +844,10 @@ void __ext4_error_file(struct file *file, const char *function,
 			       "comm %s: path %s: %pV\n",
 			       inode->i_sb->s_id, function, line, inode->i_ino,
 			       current->comm, path, &vaf);
-		va_end(args);
 	}
+	ext4_fsnotify_error(EFSCORRUPTED, inode, block, function, line, NULL, &vaf);
+	va_end(args);
+
 	ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
 			  function, line);
 }
@@ -886,6 +915,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
 		printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
 		       sb->s_id, function, line, errstr);
 	}
+	ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
 
 	ext4_handle_error(sb, false, -errno, 0, 0, function, line);
 }
diff --git a/include/uapi/linux/ext4-notify.h b/include/uapi/linux/ext4-notify.h
new file mode 100644
index 000000000000..31a3bbcafd13
--- /dev/null
+++ b/include/uapi/linux/ext4-notify.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * Copyright 2021, Collabora Ltd.
+ */
+
+#ifndef EXT4_NOTIFY_H
+#define EXT4_NOTIFY_H
+
+#define EXT4_FSN_DESC_LEN	256
+
+struct ext4_error_inode_report {
+	u64 inode;
+	u64 block;
+	char desc[EXT4_FSN_DESC_LEN];
+};
+
+#endif
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 14/15] samples: Add fs error monitoring example
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (12 preceding siblings ...)
  2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
@ 2021-04-26 18:42 ` Gabriel Krisman Bertazi
  2021-04-26 23:10   ` kernel test robot
  2021-04-26 18:42 ` [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework Gabriel Krisman Bertazi
  2021-04-27  4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
  15 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:42 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Introduce an example of a FAN_ERROR fanotify user to track filesystem
errors.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 samples/Kconfig               |   7 ++
 samples/Makefile              |   1 +
 samples/fanotify/Makefile     |   3 +
 samples/fanotify/fs-monitor.c | 135 ++++++++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+)
 create mode 100644 samples/fanotify/Makefile
 create mode 100644 samples/fanotify/fs-monitor.c

diff --git a/samples/Kconfig b/samples/Kconfig
index e76cdfc50e25..a2968338517f 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -120,6 +120,13 @@ config SAMPLE_CONNECTOR
 	  with it.
 	  See also Documentation/driver-api/connector.rst
 
+config SAMPLE_FANOTIFY_ERROR
+	bool "Build fanotify error monitoring sample"
+	depends on FANOTIFY
+	help
+	  When enabled, this builds an example code that uses the FAN_ERROR
+	  fanotify mechanism to monitor filesystem errors.
+
 config SAMPLE_HIDRAW
 	bool "hidraw sample"
 	depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index c3392a595e4b..93e2d64bc9a7 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -5,6 +5,7 @@ subdir-$(CONFIG_SAMPLE_AUXDISPLAY)	+= auxdisplay
 subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
 obj-$(CONFIG_SAMPLE_CONFIGFS)		+= configfs/
 obj-$(CONFIG_SAMPLE_CONNECTOR)		+= connector/
+obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR)	+= fanotify/
 subdir-$(CONFIG_SAMPLE_HIDRAW)		+= hidraw
 obj-$(CONFIG_SAMPLE_HW_BREAKPOINT)	+= hw_breakpoint/
 obj-$(CONFIG_SAMPLE_KDB)		+= kdb/
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
new file mode 100644
index 000000000000..b3d5cc826e6f
--- /dev/null
+++ b/samples/fanotify/Makefile
@@ -0,0 +1,3 @@
+userprogs-always-y += fs-monitor
+
+userccflags += -I usr/include
diff --git a/samples/fanotify/fs-monitor.c b/samples/fanotify/fs-monitor.c
new file mode 100644
index 000000000000..cdece8344c20
--- /dev/null
+++ b/samples/fanotify/fs-monitor.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021, Collabora Ltd.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <err.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/fanotify.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#ifndef FAN_ERROR
+
+#define FAN_ERROR		0x00100000
+#define FAN_PREALLOC_QUEUE      0x00000080
+
+#define FAN_EVENT_INFO_TYPE_LOCATION	4
+#define FAN_EVENT_INFO_TYPE_ERROR	5
+#define FAN_EVENT_INFO_TYPE_FSDATA	6
+
+struct fanotify_event_info_error {
+	struct fanotify_event_info_header hdr;
+	int version;
+	int error;
+	long long unsigned fsid;
+};
+
+struct fanotify_event_info_location {
+	struct fanotify_event_info_header hdr;
+	int line;
+	char function[0];
+};
+
+struct fanotify_event_info_fsdata {
+	struct fanotify_event_info_header hdr;
+	char data[0];
+};
+
+struct ext4_error_inode_report {
+	unsigned long long inode;
+	unsigned long long block;
+	char desc[40];
+};
+#endif
+
+static void handle_notifications(char *buffer, int len)
+{
+	struct fanotify_event_metadata *metadata;
+	struct fanotify_event_info_header *hdr = 0;
+	char *off, *next;
+
+	for (metadata = (struct fanotify_event_metadata *) buffer;
+	     FAN_EVENT_OK(metadata, len); metadata = FAN_EVENT_NEXT(metadata, len)) {
+		next = (char*)metadata  + metadata->event_len;
+		if (!(metadata->mask == FAN_ERROR)) {
+			printf("unexpected FAN MARK: %llx\n", metadata->mask);
+			continue;
+		}
+		if (metadata->fd != FAN_NOFD) {
+			printf("bizar fd != FAN_NOFD\n");
+			continue;;
+		}
+
+		printf("FAN_ERROR found len=%d\n", metadata->event_len);
+
+		for (off = (char*)(metadata+1); off < next; off = off + hdr->len) {
+			hdr = (struct fanotify_event_info_header*)(off);
+
+			if (hdr->info_type == FAN_EVENT_INFO_TYPE_ERROR) {
+				struct fanotify_event_info_error *error =
+					(struct fanotify_event_info_error*) hdr;
+
+				printf("  Generic Error Record: len=%d\n", hdr->len);
+				printf("      version: %d\n", error->version);
+				printf("      error: %d\n", error->error);
+				printf("      fsid: %llx\n", error->fsid);
+
+			} else if(hdr->info_type == FAN_EVENT_INFO_TYPE_LOCATION) {
+				struct fanotify_event_info_location *loc =
+					(struct fanotify_event_info_location*) hdr;
+
+				printf("  Location Record Size = %d\n", loc->hdr.len);
+				printf("      loc=%s:%d\n", loc->function, loc->line);
+
+			} else if(hdr->info_type == FAN_EVENT_INFO_TYPE_FSDATA) {
+				struct fanotify_event_info_fsdata *data =
+					(struct fanotify_event_info_fsdata *)hdr;
+				struct ext4_error_inode_report *fsdata =
+					(struct ext4_error_inode_report*) ((char*)data->data);
+
+				printf("  Fsdata Record: len=%d\n", hdr->len);
+				printf("      inode=%llu\n", fsdata->inode);
+				if (fsdata->block != -1L)
+					printf("      block=%llu\n", fsdata->block);
+				printf("      desc=%s\n", fsdata->desc);
+			}
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	int fd;
+	char buffer[BUFSIZ];
+
+	if (argc < 2) {
+		printf("Missing path argument\n");
+		return 1;
+	}
+
+	fd = fanotify_init(FAN_CLASS_NOTIF|FAN_PREALLOC_QUEUE, O_RDONLY);
+	if (fd < 0)
+		errx(1, "fanotify_init");
+
+	if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
+			  FAN_ERROR, AT_FDCWD, argv[1])) {
+		errx(1, "fanotify_mark");
+	}
+
+	while (1) {
+		int n = read(fd, buffer, BUFSIZ);
+		if (n < 0)
+			errx(1, "read");
+
+		handle_notifications(buffer, n);
+	}
+
+	return 0;
+}
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (13 preceding siblings ...)
  2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
@ 2021-04-26 18:42 ` Gabriel Krisman Bertazi
  2021-04-27  4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
  15 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-26 18:42 UTC (permalink / raw)
  To: amir73il, tytso, djwong
  Cc: david, jack, dhowells, khazhy, linux-fsdevel, linux-ext4,
	Gabriel Krisman Bertazi, kernel

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 .../admin-guide/filesystem-monitoring.rst     | 103 ++++++++++++++++++
 Documentation/admin-guide/index.rst           |   1 +
 2 files changed, 104 insertions(+)
 create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst

diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
new file mode 100644
index 000000000000..e19bf792dd7a
--- /dev/null
+++ b/Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+fanotify supports the FAN_ERROR mark for file system-wide error
+reporting.  It is meant to be used by file system health monitoring
+daemons who listen on that interface and take actions (notify sysadmin,
+start recovery) when a file system problem is detected by the kernel.
+
+By design, A FAN_ERROR notification exposes sufficient information for a
+monitoring tool to map a problem to specific region of the file system
+or its code and trigger recovery procedures.  It doesn't necessarily
+provide a user space application with semantics to verify an IO
+operation was successfully executed.  That is outside of scope of this
+feature. Instead, it is only meant as a framework for early file system
+problem detection and reporting recovery tools.
+
+At the time of this writing, the only file system that emits this
+FAN_ERROR notifications is ext4.
+
+An example code for ext4 is provided at ``samples/fanotify/fs-monitor.c``.
+
+Usage
+=====
+
+In order to guarantee notification delivery on different error
+conditions, FAN_ERROR requires the fanotify group to be created with
+FAN_PREALLOC_QUEUE.  This means a group that emits FAN_ERROR
+notifications currently cannot be reused for any other kind of
+notification.
+
+To setup a group for error notification::
+
+  fanotify_init(FAN_CLASS_NOTIF | FAN_PREALLOC_QUEUE, O_RDONLY);
+
+Then, enable the FAN_ERROR mark on a specific path::
+
+  fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM, FAN_ERROR, AT_FDCWD, "/mnt");
+
+Notification structure
+======================
+
+A FAN_ERROR Notification has the following format::
+
+  [ Notification Metadata (Mandatory) ]
+  [ Generic Error Record  (Mandatory) ]
+  [ Error Location Record (Optional)  ]
+  [ FS-Specific Record    (Optional)  ]
+
+With the exception of the notification metadata and the generic
+information, all information records are optional.  Each record type is
+identified by its unique ``struct fanotify_event_info_header.info_type``.
+
+Generic error Location
+----------------------
+
+The Generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+requiring any details about the problem.::
+
+  struct fanotify_event_info_error {
+	struct fanotify_event_info_header hdr;  /* info_type = FAN_EVENT_INFO_TYPE_ERROR */
+	int version;
+	int error;
+	__kernel_fsid_t fsid;
+  };
+
+Error Location Record
+---------------------
+
+Error location is required by some use cases to easily associate an
+error with a specific line of code.  Not every user case requires it and
+they might not be emitted for different file systems.
+
+Notice this field is variable length, but its size is found in ```hdr.len```.::
+
+  struct fanotify_event_info_location {
+	struct fanotify_event_info_header hdr; /* info_type = FAN_EVENT_INFO_TYPE_LOCATION */
+	int line;
+	char function[0];
+  };
+
+File system specific Record
+---------------------------
+
+The file system specific record attempts to provide file system specific
+tools with enough information to uniquely identify the problem and
+hopefully recover from it.
+
+Since each file system defines its own specific data, this record is
+composed by a header, followed by a data blob, that is defined by each
+file system.  Review the file system documentation for more information.
+
+While the hdr.info_type identifies the presence of this field,
+``hdr.len`` field identifies the length of the file system specific
+structure following the header.::
+
+  struct fanotify_event_info_fsdata {
+	struct fanotify_event_info_header hdr; /* info_type = FAN_EVENT_INFO_TYPE_FSDATA */
+	struct data[0];
+  };
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 423116c4e787..a0d1bf76629f 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -83,6 +83,7 @@ configure specific aspects of kernel behavior to your liking.
    edid
    efi-stub
    ext4
+   filesystem-monitoring
    nfs/index
    gpio/index
    highuid
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
  2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
@ 2021-04-26 22:00   ` kernel test robot
  2021-04-26 22:43   ` kernel test robot
  2021-04-27  5:39   ` Amir Goldstein
  2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 22:00 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3775 bytes --]

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: microblaze-randconfig-r006-20210426 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/0b550d36bb2ec4613ad64b68b18898a72fd5af50
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
        git checkout 0b550d36bb2ec4613ad64b68b18898a72fd5af50
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:16,
                    from include/linux/radix-tree.h:12,
                    from include/linux/idr.h:15,
                    from include/linux/fsnotify_backend.h:13,
                    from include/linux/fsnotify.h:15,
                    from fs/notify/ring.c:3:
   fs/notify/ring.c: In function 'fsnotify_ring_alloc_event_slot':
>> include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
       5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
         |                  ^~~~~~
   include/linux/printk.h:140:10: note: in definition of macro 'no_printk'
     140 |   printk(fmt, ##__VA_ARGS__);  \
         |          ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
      15 | #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
         |                    ^~~~~~~~
   include/linux/printk.h:430:12: note: in expansion of macro 'KERN_DEBUG'
     430 |  no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
         |            ^~~~~~~~~~
   fs/notify/ring.c:81:2: note: in expansion of macro 'pr_debug'
      81 |  pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
         |  ^~~~~~~~
   fs/notify/ring.c:81:59: note: format string is defined here
      81 |  pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
         |                                                         ~~^
         |                                                           |
         |                                                           long unsigned int
         |                                                         %u


vim +5 include/linux/kern_levels.h

314ba3520e513a Joe Perches 2012-07-30  4  
04d2c8c83d0e3a Joe Perches 2012-07-30 @5  #define KERN_SOH	"\001"		/* ASCII Start Of Header */
04d2c8c83d0e3a Joe Perches 2012-07-30  6  #define KERN_SOH_ASCII	'\001'
04d2c8c83d0e3a Joe Perches 2012-07-30  7  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35433 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
  2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
  2021-04-26 22:00   ` kernel test robot
@ 2021-04-26 22:43   ` kernel test robot
  2021-04-27  5:39   ` Amir Goldstein
  2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 22:43 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4824 bytes --]

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: arm-randconfig-r022-20210426 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project d941863de2becb3d8d2e00676fc7125974934c7f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/0b550d36bb2ec4613ad64b68b18898a72fd5af50
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
        git checkout 0b550d36bb2ec4613ad64b68b18898a72fd5af50
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/notify/ring.c:82:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
                    ring_size, size);
                               ^~~~
   include/linux/printk.h:430:38: note: expanded from macro 'pr_debug'
           no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
                                       ~~~     ^~~~~~~~~~~
   include/linux/printk.h:140:17: note: expanded from macro 'no_printk'
                   printk(fmt, ##__VA_ARGS__);             \
                          ~~~    ^~~~~~~~~~~
   1 warning generated.


vim +82 fs/notify/ring.c

    67	
    68	struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
    69							      size_t size)
    70		__acquires(&group->notification_lock)
    71	{
    72		struct fsnotify_event *fsn;
    73		u64 head, tail;
    74		u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
    75		u64 new_head;
    76		void *kaddr;
    77	
    78		if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
    79			return ERR_PTR(-EINVAL);
    80	
    81		pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
  > 82			 ring_size, size);
    83	
    84		spin_lock(&group->notification_lock);
    85	again:
    86		head = group->ring_buffer.head;
    87		tail = group->ring_buffer.tail;
    88		new_head = NEXT_SLOT(head, size, ring_size);
    89	
    90		/* head would catch up to tail, corrupting an entry. */
    91		if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
    92			fsn = ERR_PTR(-ENOMEM);
    93			goto err;
    94		}
    95	
    96		/*
    97		 * Not event a skip message fits in the page. We can detect the
    98		 * lack of space. Move on to the next page.
    99		 */
   100		if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
   101			/* Start again on next page */
   102			group->ring_buffer.head = NEXT_PAGE(head, ring_size);
   103			goto again;
   104		}
   105	
   106		kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
   107		if (!kaddr) {
   108			fsn = ERR_PTR(-EFAULT);
   109			goto err;
   110		}
   111	
   112		fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
   113	
   114		if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
   115			/*
   116			 * No room in the current page.  Add a fake entry
   117			 * consuming the end the page to avoid splitting event
   118			 * structure.
   119			 */
   120			fsn->slot_len = INVALID_RING_SLOT;
   121			kunmap_atomic(kaddr);
   122			/* Start again on the next page */
   123			group->ring_buffer.head = NEXT_PAGE(head, ring_size);
   124	
   125			goto again;
   126		}
   127		fsn->slot_len = size;
   128	
   129		return fsn;
   130	
   131	err:
   132		spin_unlock(&group->notification_lock);
   133		return fsn;
   134	}
   135	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33558 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
  2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
@ 2021-04-26 22:45   ` kernel test robot
  2021-04-27  7:25   ` Amir Goldstein
  1 sibling, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 22:45 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3869 bytes --]

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: x86_64-randconfig-a001-20210426 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project d941863de2becb3d8d2e00676fc7125974934c7f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/6179a61e1067e69a0e24e98bad3cb0eebdbfbee0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
        git checkout 6179a61e1067e69a0e24e98bad3cb0eebdbfbee0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/notify/fanotify/fanotify_user.c:112:12: warning: address of array 'fee->fs_data' will always evaluate to 'true' [-Wpointer-bool-conversion]
                   if (fee->fs_data)
                   ~~  ~~~~~^~~~~~~
   1 warning generated.


vim +112 fs/notify/fanotify/fanotify_user.c

    89	
    90	static size_t fanotify_event_len(struct fanotify_event *event,
    91					 unsigned int fid_mode)
    92	{
    93		size_t event_len = FAN_EVENT_METADATA_LEN;
    94		struct fanotify_info *info;
    95		int dir_fh_len;
    96		int fh_len;
    97		int dot_len = 0;
    98	
    99		if (fanotify_is_error_event(event->mask)) {
   100			struct fanotify_error_event *fee = FANOTIFY_EE(event);
   101			/*
   102			 *  Error events (FAN_ERROR) have a different format
   103			 *  as follows:
   104			 * [ event_metadata            ]
   105			 * [ fs-generic error header   ]
   106			 * [ error location (optional) ]
   107			 * [ fs-specific blob          ]
   108			 */
   109			event_len = fanotify_error_info_len(fee);
   110			if (fee->loc.function)
   111				event_len += fanotify_location_info_len(&fee->loc);
 > 112			if (fee->fs_data)
   113				event_len += fanotify_error_fsdata_len(fee);
   114			return event_len;
   115		}
   116	
   117		if (!fid_mode)
   118			return event_len;
   119	
   120		info = fanotify_event_info(event);
   121		dir_fh_len = fanotify_event_dir_fh_len(event);
   122		fh_len = fanotify_event_object_fh_len(event);
   123	
   124		if (dir_fh_len) {
   125			event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
   126		} else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
   127			/*
   128			 * With group flag FAN_REPORT_NAME, if name was not recorded in
   129			 * event on a directory, we will report the name ".".
   130			 */
   131			dot_len = 1;
   132		}
   133	
   134		if (fh_len)
   135			event_len += fanotify_fid_info_len(fh_len, dot_len);
   136	
   137		return event_len;
   138	}
   139	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37920 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 13/15] ext4: Send notifications on error
  2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
@ 2021-04-26 23:10   ` kernel test robot
  2021-04-27  4:32   ` Amir Goldstein
  2021-04-29  0:57   ` Darrick J. Wong
  2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 23:10 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2734 bytes --]

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on pcmoore-audit/next]
[also build test ERROR on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: nios2-randconfig-r024-20210426 (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/151ead19fe71b5ca87e8a345ec1f640454f7ad34
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
        git checkout 151ead19fe71b5ca87e8a345ec1f640454f7ad34
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/ext4/super.c: In function 'ext4_fsnotify_error':
>> fs/ext4/super.c:738:17: error: 'struct super_block' has no member named 's_fsnotify_marks'
     738 |  if (inode->i_sb->s_fsnotify_marks) {
         |                 ^~
   fs/ext4/super.c: In function 'ext4_remount':
   fs/ext4/super.c:5839:6: warning: variable 'enable_quota' set but not used [-Wunused-but-set-variable]
    5839 |  int enable_quota = 0;
         |      ^~~~~~~~~~~~


vim +738 fs/ext4/super.c

   731	
   732	static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
   733					const char *func, int line,
   734					const char *desc, struct va_format *vaf)
   735	{
   736		struct ext4_error_inode_report report;
   737	
 > 738		if (inode->i_sb->s_fsnotify_marks) {
   739			report.inode = inode ? inode->i_ino : -1L;
   740			report.block = block ? block : -1L;
   741	
   742			snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
   743	
   744			fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
   745		}
   746	}
   747	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 25380 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 14/15] samples: Add fs error monitoring example
  2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
@ 2021-04-26 23:10   ` kernel test robot
  0 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-26 23:10 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6306 bytes --]

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on pcmoore-audit/next]
[also build test ERROR on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: um-allmodconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/3b25963eb28ef5e89ef34cc7b64e479205a38e9e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
        git checkout 3b25963eb28ef5e89ef34cc7b64e479205a38e9e
        # save the attached .config to linux build tree
        make W=1 W=1 ARCH=um 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> samples/fanotify/fs-monitor.c:28:36: error: field 'hdr' has incomplete type
      28 |  struct fanotify_event_info_header hdr;
         |                                    ^~~
   samples/fanotify/fs-monitor.c:35:36: error: field 'hdr' has incomplete type
      35 |  struct fanotify_event_info_header hdr;
         |                                    ^~~
   samples/fanotify/fs-monitor.c:41:36: error: field 'hdr' has incomplete type
      41 |  struct fanotify_event_info_header hdr;
         |                                    ^~~
   samples/fanotify/fs-monitor.c: In function 'handle_notifications':
>> samples/fanotify/fs-monitor.c:72:62: error: dereferencing pointer to incomplete type 'struct fanotify_event_info_header'
      72 |   for (off = (char*)(metadata+1); off < next; off = off + hdr->len) {
         |                                                              ^~
   samples/fanotify/fs-monitor.c: In function 'main':
>> samples/fanotify/fs-monitor.c:121:37: error: 'FAN_MARK_FILESYSTEM' undeclared (first use in this function); did you mean 'FAN_MARK_FLUSH'?
     121 |  if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
         |                                     ^~~~~~~~~~~~~~~~~~~
         |                                     FAN_MARK_FLUSH
   samples/fanotify/fs-monitor.c:121:37: note: each undeclared identifier is reported only once for each function it appears in


vim +/hdr +28 samples/fanotify/fs-monitor.c

    26	
    27	struct fanotify_event_info_error {
  > 28		struct fanotify_event_info_header hdr;
    29		int version;
    30		int error;
    31		long long unsigned fsid;
    32	};
    33	
    34	struct fanotify_event_info_location {
    35		struct fanotify_event_info_header hdr;
    36		int line;
    37		char function[0];
    38	};
    39	
    40	struct fanotify_event_info_fsdata {
    41		struct fanotify_event_info_header hdr;
    42		char data[0];
    43	};
    44	
    45	struct ext4_error_inode_report {
    46		unsigned long long inode;
    47		unsigned long long block;
    48		char desc[40];
    49	};
    50	#endif
    51	
    52	static void handle_notifications(char *buffer, int len)
    53	{
    54		struct fanotify_event_metadata *metadata;
    55		struct fanotify_event_info_header *hdr = 0;
    56		char *off, *next;
    57	
    58		for (metadata = (struct fanotify_event_metadata *) buffer;
    59		     FAN_EVENT_OK(metadata, len); metadata = FAN_EVENT_NEXT(metadata, len)) {
    60			next = (char*)metadata  + metadata->event_len;
    61			if (!(metadata->mask == FAN_ERROR)) {
    62				printf("unexpected FAN MARK: %llx\n", metadata->mask);
    63				continue;
    64			}
    65			if (metadata->fd != FAN_NOFD) {
    66				printf("bizar fd != FAN_NOFD\n");
    67				continue;;
    68			}
    69	
    70			printf("FAN_ERROR found len=%d\n", metadata->event_len);
    71	
  > 72			for (off = (char*)(metadata+1); off < next; off = off + hdr->len) {
    73				hdr = (struct fanotify_event_info_header*)(off);
    74	
    75				if (hdr->info_type == FAN_EVENT_INFO_TYPE_ERROR) {
    76					struct fanotify_event_info_error *error =
    77						(struct fanotify_event_info_error*) hdr;
    78	
    79					printf("  Generic Error Record: len=%d\n", hdr->len);
    80					printf("      version: %d\n", error->version);
    81					printf("      error: %d\n", error->error);
    82					printf("      fsid: %llx\n", error->fsid);
    83	
    84				} else if(hdr->info_type == FAN_EVENT_INFO_TYPE_LOCATION) {
    85					struct fanotify_event_info_location *loc =
    86						(struct fanotify_event_info_location*) hdr;
    87	
    88					printf("  Location Record Size = %d\n", loc->hdr.len);
    89					printf("      loc=%s:%d\n", loc->function, loc->line);
    90	
    91				} else if(hdr->info_type == FAN_EVENT_INFO_TYPE_FSDATA) {
    92					struct fanotify_event_info_fsdata *data =
    93						(struct fanotify_event_info_fsdata *)hdr;
    94					struct ext4_error_inode_report *fsdata =
    95						(struct ext4_error_inode_report*) ((char*)data->data);
    96	
    97					printf("  Fsdata Record: len=%d\n", hdr->len);
    98					printf("      inode=%llu\n", fsdata->inode);
    99					if (fsdata->block != -1L)
   100						printf("      block=%llu\n", fsdata->block);
   101					printf("      desc=%s\n", fsdata->desc);
   102				}
   103			}
   104		}
   105	}
   106	
   107	int main(int argc, char **argv)
   108	{
   109		int fd;
   110		char buffer[BUFSIZ];
   111	
   112		if (argc < 2) {
   113			printf("Missing path argument\n");
   114			return 1;
   115		}
   116	
   117		fd = fanotify_init(FAN_CLASS_NOTIF|FAN_PREALLOC_QUEUE, O_RDONLY);
   118		if (fd < 0)
   119			errx(1, "fanotify_init");
   120	
 > 121		if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 24316 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 00/15] File system wide monitoring
  2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
                   ` (14 preceding siblings ...)
  2021-04-26 18:42 ` [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework Gabriel Krisman Bertazi
@ 2021-04-27  4:11 ` Amir Goldstein
  2021-04-27 15:44   ` Gabriel Krisman Bertazi
  2021-05-11 10:43   ` Jan Kara
  15 siblings, 2 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  4:11 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Hi,
>
> In an attempt to consolidate some of the feedback from the previous
> proposals, I wrote a new attempt to solve the file system error reporting
> problem.  Before I spend more time polishing it, I'd like to hear your
> feedback if I'm going in the wrong direction, in particular with the
> modifications to fsnotify.
>

IMO you are going in the right direction, but you have gone a bit too far ;-)

My understanding of the requirements and my interpretation of the feedback
from filesystem maintainers is that the missing piece in the ecosystem is a
user notification that "something went wrong". The "what went wrong" part
is something that users and admins have long been able to gather from the
kernel log and from filesystem tools (e.g. last error recorded).

I do not see the need to duplicate existing functionality in fsmonitor.
Don't get me wrong, I understand why it would have been nice for fsmonitor
to be able to get all the errors nicely without looking anywhere else, but I
don't think it justifies the extra complexity.

> This RFC follows up on my previous proposals which attempted to leverage
> watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
> to push error notifications to user space.  This proposal starts by, as
> suggested by Darrick, limiting the scope of what I'm trying to do to an
> interface for administrators to monitor the health of a file system,
> instead of a generic inteface for file errors.  Therefore, this doesn't
> solve the problem of writeback errors or the need to watch a specific
> subsystem.
>
> * Format
>
> The feature is implemented on top of fanotify, as a new type of fanotify
> mark, FAN_ERROR, which a file system monitoring tool can register to

You have a terminology mistake throughout your series.
FAN_ERROR is not a type of a mark, it is a type of an event.
A mark describes the watched object (i.e. a filesystem, mount, inode).

> receive notifications.  A notification is split in three parts, and only
> the first is guaranteed to exist for any given error event:
>
>  - FS generic data: A file system agnostic structure that has a generic
>  error code and identifies the filesystem.  Basically, it let's
>  userspace know something happen on a monitored filesystem.

I think an error seq counter per fs would be a nice addition to generic data.
It does not need to be persistent (it could be if filesystem supports it).

>
>  - FS location data: Identifies where in the code the problem
>  happened. (This is important for the use case of analysing frequent
>  error points that we discussed earlier).
>
>  - FS specific data: A detailed error report in a filesystem specific
>  format that details what the error is.  Ideally, a capable monitoring
>  tool can use the information here for error recovery.  For instance,
>  xfs can put the xfs_scrub structures here, ext4 can send its error
>  reports, etc.  An example of usage is done in the ext4 patch of this
>  series.
>
> More details on the information in each record can be found on the
> documentation introduced in patch 15.
>
> * Using fanotify
>
> Using fanotify for this kind of thing is slightly tricky because we want
> to guarantee delivery in some complicated conditions, for instance, the
> file system might want to send an error while holding several locks.
>
> Instead of working around file system constraints at the file system
> level, this proposal tries to make the FAN_ERROR submission safe in
> those contexts.  This is done with a new mode in fsnotify that
> preallocates the memory at group creation to be used for the
> notification submission.
>
> This new mode in fsnotify introduces a ring buffer to queue
> notifications, which eliminates the allocation path in fsnotify.  From
> what I saw, the allocation is the only problem in fsnotify for
> filesystems to submit errors in constrained situations.
>

The ring buffer functionality for fsnotify is interesting and it may be
useful on its own, but IMO, its too big of a hammer for the problem
at hand.

The question that you should be asking yourself is what is the
expected behavior in case of a flood of filesystem corruption errors.
I think it has already been expressed by filesystem maintainers on
one your previous postings, that a flood of filesystem corruption
errors is often noise and the only interesting information is the first error.

For this reason, I think that FS_ERROR could be implemented
by attaching an fsnotify_error_info object to an fsnotify_sb_mark:

struct fsnotify_sb_mark {
        struct fsnotify_mark fsn_mark;
        struct fsnotify_error_info info;
}

Similar to fd sampled errseq, there can be only one error report
per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
the error report can be allocated at the time of setting the filesystem mark.

With this, you will not need the added complexity of the ring buffer
and you will not need to limit FAN_ERROR reporting to a group that
is only listening for FAN_ERROR, which is an unneeded limitation IMO.

Anyway, in case, others do like the ring buffer approach, I do have
some technical comments on the implementation.
I will comment on individual patches.

Thanks,
Amir.

> * Visibility
>
> Since the usecase is limited to a tool for whole file system monitoring,
> errors are associated with the superblock and visible filesystem-wide.
> It is assumed and required that userspace has CAP_SYS_ADMIN.
>
> * Testing
>
> This was tested with corrupted ext4 images in a few scenarios, which
> caused errors to be triggered and monitored with the sample tool
> provided in the next to final patch.
>
> * patches
>
> Patches 1-4 massage fanotify attempt to refactor fanotify a bit for
> the patches to come.  Patch 5 introduce the ring buffer interface to
> fsnotify, while patch 6 enable this support in fanotify.  Patch 7, 8 wire
> the FS_ERROR event type, which will be used by filesystems.  In
> sequennce, patches 9-12 implement the FAN_ERROR record types and create
> the new event.  Patch 13 is an ext4 example implementation supporting
> this feature.  Finally, patches 14 and 15 document and provide examples
> of a userspace tool that uses this feature.
>
> I also pushed the full series to:
>
>   https://gitlab.collabora.com/krisman/linux -b fanotify-notifications
>
> [1] https://lwn.net/Articles/839310/
> [2] https://www.spinics.net/lists/linux-fsdevel/msg187075.html
>
> Gabriel Krisman Bertazi (15):
>   fanotify: Fold event size calculation to its own function
>   fanotify: Split fsid check from other fid mode checks
>   fsnotify: Wire flags field on group allocation
>   fsnotify: Wire up group information on event initialization
>   fsnotify: Support event submission through ring buffer
>   fanotify: Support submission through ring buffer
>   fsnotify: Support FS_ERROR event type
>   fsnotify: Introduce helpers to send error_events
>   fanotify: Introduce generic error record
>   fanotify: Introduce code location record
>   fanotify: Introduce filesystem specific data record
>   fanotify: Introduce the FAN_ERROR mark
>   ext4: Send notifications on error
>   samples: Add fs error monitoring example
>   Documentation: Document the FAN_ERROR framework
>
>  .../admin-guide/filesystem-monitoring.rst     | 103 ++++++
>  Documentation/admin-guide/index.rst           |   1 +
>  fs/ext4/super.c                               |  60 +++-
>  fs/notify/Makefile                            |   2 +-
>  fs/notify/dnotify/dnotify.c                   |   2 +-
>  fs/notify/fanotify/fanotify.c                 | 127 +++++--
>  fs/notify/fanotify/fanotify.h                 |  35 +-
>  fs/notify/fanotify/fanotify_user.c            | 319 ++++++++++++++----
>  fs/notify/fsnotify.c                          |   2 +-
>  fs/notify/group.c                             |  25 +-
>  fs/notify/inotify/inotify_fsnotify.c          |   2 +-
>  fs/notify/inotify/inotify_user.c              |   4 +-
>  fs/notify/notification.c                      |  10 +
>  fs/notify/ring.c                              | 199 +++++++++++
>  include/linux/fanotify.h                      |  12 +-
>  include/linux/fsnotify.h                      |  15 +
>  include/linux/fsnotify_backend.h              |  63 +++-
>  include/uapi/linux/ext4-notify.h              |  17 +
>  include/uapi/linux/fanotify.h                 |  26 ++
>  kernel/audit_fsnotify.c                       |   2 +-
>  kernel/audit_tree.c                           |   2 +-
>  kernel/audit_watch.c                          |   2 +-
>  samples/Kconfig                               |   7 +
>  samples/Makefile                              |   1 +
>  samples/fanotify/Makefile                     |   3 +
>  samples/fanotify/fs-monitor.c                 | 135 ++++++++
>  26 files changed, 1034 insertions(+), 142 deletions(-)
>  create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
>  create mode 100644 fs/notify/ring.c
>  create mode 100644 include/uapi/linux/ext4-notify.h
>  create mode 100644 samples/fanotify/Makefile
>  create mode 100644 samples/fanotify/fs-monitor.c
>
> --
> 2.31.0
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 13/15] ext4: Send notifications on error
  2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
  2021-04-26 23:10   ` kernel test robot
@ 2021-04-27  4:32   ` Amir Goldstein
  2021-04-29  0:57   ` Darrick J. Wong
  2 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  4:32 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> whenever a ext4 error condition is triggered.  This follows the existing
> error conditions in ext4, so it is hooked to the ext4_error* functions.
>
> It also follows the current dmesg reporting in the format.  The
> filesystem message is composed mostly by the string that would be
> otherwise printed in dmesg.
>
> A new ext4 specific record format is exposed in the uapi, such that a
> monitoring tool knows what to expect when listening errors of an ext4
> filesystem.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/ext4/super.c                  | 60 ++++++++++++++++++++++++--------
>  include/uapi/linux/ext4-notify.h | 17 +++++++++
>  2 files changed, 62 insertions(+), 15 deletions(-)
>  create mode 100644 include/uapi/linux/ext4-notify.h
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b9693680463a..032e29e7ff6a 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -46,6 +46,8 @@
>  #include <linux/part_stat.h>
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
> +#include <linux/fsnotify.h>
> +#include <uapi/linux/ext4-notify.h>
>
>  #include "ext4.h"
>  #include "ext4_extents.h"      /* Needed for trace points definition */
> @@ -727,6 +729,22 @@ static void flush_stashed_error_work(struct work_struct *work)
>         ext4_commit_super(sbi->s_sb);
>  }
>
> +static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
> +                               const char *func, int line,
> +                               const char *desc, struct va_format *vaf)
> +{
> +       struct ext4_error_inode_report report;
> +
> +       if (inode->i_sb->s_fsnotify_marks) {
> +               report.inode = inode ? inode->i_ino : -1L;
> +               report.block = block ? block : -1L;
> +
> +               snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
> +
> +               fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
> +       }
> +}
> +
>  #define ext4_error_ratelimit(sb)                                       \
>                 ___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state),     \
>                              "EXT4-fs error")
> @@ -742,15 +760,18 @@ void __ext4_error(struct super_block *sb, const char *function,
>                 return;
>
>         trace_ext4_error(sb, function, line);
> +
> +       va_start(args, fmt);
> +       vaf.fmt = fmt;
> +       vaf.va = &args;
>         if (ext4_error_ratelimit(sb)) {
> -               va_start(args, fmt);
> -               vaf.fmt = fmt;
> -               vaf.va = &args;
>                 printk(KERN_CRIT
>                        "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
>                        sb->s_id, function, line, current->comm, &vaf);
> -               va_end(args);
> +
>         }
> +       ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
> +       va_end(args);
>         ext4_handle_error(sb, force_ro, error, 0, block, function, line);
>  }
>

So error reporting to kernel log is ratelimited and error reporting to
fsnotify is limited by a fixed size ring buffer which may be filled by
report floods from another filesystem, so user can miss the first
important error report from this filesystem.

Not optimal.

With my proposal of keeping a single fsnotify_error_info in every
fsnotify_sb_mark, users will be guaranteed to get the first error
report from every filesystem and once they read that report they
will be guaranteed to also get the next report.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
  2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
@ 2021-04-29 11:31 ` Dan Carpenter
  2021-04-27  7:25   ` Amir Goldstein
  1 sibling, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-27  4:33 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 5856 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210426184201.4177978-13-krisman@collabora.com>
References: <20210426184201.4177978-13-krisman@collabora.com>
TO: Gabriel Krisman Bertazi <krisman@collabora.com>

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
:::::: branch date: 10 hours ago
:::::: commit date: 10 hours ago
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/notify/fanotify/fanotify_user.c:112 fanotify_event_len() warn: this array is probably non-NULL. 'fee->fs_data'

Old smatch warnings:
fs/notify/fanotify/fanotify_user.c:1443 do_fanotify_mark() error: we previously assumed 'mnt' could be null (see line 1424)

vim +112 fs/notify/fanotify/fanotify_user.c

b5e798df944730 Gabriel Krisman Bertazi 2021-04-26   89  
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   90  static size_t fanotify_event_len(struct fanotify_event *event,
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   91  				 unsigned int fid_mode)
5e469c830fdb5a Amir Goldstein          2019-01-10   92  {
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   93  	size_t event_len = FAN_EVENT_METADATA_LEN;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   94  	struct fanotify_info *info;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   95  	int dir_fh_len;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   96  	int fh_len;
929943b38daf81 Amir Goldstein          2020-07-16   97  	int dot_len = 0;
f454fa610a69b9 Amir Goldstein          2020-07-16   98  
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26   99  	if (fanotify_is_error_event(event->mask)) {
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  100  		struct fanotify_error_event *fee = FANOTIFY_EE(event);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  101  		/*
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  102  		 *  Error events (FAN_ERROR) have a different format
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  103  		 *  as follows:
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  104  		 * [ event_metadata            ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  105  		 * [ fs-generic error header   ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  106  		 * [ error location (optional) ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  107  		 * [ fs-specific blob          ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  108  		 */
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  109  		event_len = fanotify_error_info_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  110  		if (fee->loc.function)
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  111  			event_len += fanotify_location_info_len(&fee->loc);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 @112  		if (fee->fs_data)
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  113  			event_len += fanotify_error_fsdata_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  114  		return event_len;
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  115  	}
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  116  
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  117  	if (!fid_mode)
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  118  		return event_len;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  119  
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  120  	info = fanotify_event_info(event);
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  121  	dir_fh_len = fanotify_event_dir_fh_len(event);
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  122  	fh_len = fanotify_event_object_fh_len(event);
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  123  
929943b38daf81 Amir Goldstein          2020-07-16  124  	if (dir_fh_len) {
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  125  		event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
929943b38daf81 Amir Goldstein          2020-07-16  126  	} else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
929943b38daf81 Amir Goldstein          2020-07-16  127  		/*
929943b38daf81 Amir Goldstein          2020-07-16  128  		 * With group flag FAN_REPORT_NAME, if name was not recorded in
929943b38daf81 Amir Goldstein          2020-07-16  129  		 * event on a directory, we will report the name ".".
929943b38daf81 Amir Goldstein          2020-07-16  130  		 */
929943b38daf81 Amir Goldstein          2020-07-16  131  		dot_len = 1;
929943b38daf81 Amir Goldstein          2020-07-16  132  	}
afc894c784c84c Jan Kara                2020-03-24  133  
44d705b0370b1d Amir Goldstein          2020-03-19  134  	if (fh_len)
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  135  		event_len += fanotify_fid_info_len(fh_len, dot_len);
5e469c830fdb5a Amir Goldstein          2019-01-10  136  
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  137  	return event_len;
5e469c830fdb5a Amir Goldstein          2019-01-10  138  }
5e469c830fdb5a Amir Goldstein          2019-01-10  139  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 41522 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function
  2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
@ 2021-04-27  4:42   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  4:42 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Every time this function is invoked, it is immediately added to
> FAN_EVENT_METADATA_LEN, since there is no need to just calculate the
> length of info records. This minor clean up folds the rest of the
> calculation into the function, which now operates in terms of events,
> returning the size of the entire event, including metadata.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Nice

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/fanotify/fanotify_user.c | 40 +++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 17 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9e0c1afac8bd..0332c4afeec3 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -64,17 +64,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
>         return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
>  }
>
> -static int fanotify_event_info_len(unsigned int fid_mode,
> -                                  struct fanotify_event *event)
> +static size_t fanotify_event_len(struct fanotify_event *event,
> +                                unsigned int fid_mode)
>  {
> -       struct fanotify_info *info = fanotify_event_info(event);
> -       int dir_fh_len = fanotify_event_dir_fh_len(event);
> -       int fh_len = fanotify_event_object_fh_len(event);
> -       int info_len = 0;
> +       size_t event_len = FAN_EVENT_METADATA_LEN;
> +       struct fanotify_info *info;
> +       int dir_fh_len;
> +       int fh_len;
>         int dot_len = 0;
>
> +       if (!fid_mode)
> +               return event_len;
> +
> +       info = fanotify_event_info(event);
> +       dir_fh_len = fanotify_event_dir_fh_len(event);
> +       fh_len = fanotify_event_object_fh_len(event);
> +
>         if (dir_fh_len) {
> -               info_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
> +               event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
>         } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
>                 /*
>                  * With group flag FAN_REPORT_NAME, if name was not recorded in
> @@ -84,9 +91,9 @@ static int fanotify_event_info_len(unsigned int fid_mode,
>         }
>
>         if (fh_len)
> -               info_len += fanotify_fid_info_len(fh_len, dot_len);
> +               event_len += fanotify_fid_info_len(fh_len, dot_len);
>
> -       return info_len;
> +       return event_len;
>  }
>
>  /*
> @@ -98,7 +105,8 @@ static int fanotify_event_info_len(unsigned int fid_mode,
>  static struct fanotify_event *get_one_event(struct fsnotify_group *group,
>                                             size_t count)
>  {
> -       size_t event_size = FAN_EVENT_METADATA_LEN;
> +       size_t event_size;
> +       struct fsnotify_event *fse;
>         struct fanotify_event *event = NULL;
>         unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
>
> @@ -108,16 +116,15 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
>         if (fsnotify_notify_queue_is_empty(group))
>                 goto out;
>
> -       if (fid_mode) {
> -               event_size += fanotify_event_info_len(fid_mode,
> -                       FANOTIFY_E(fsnotify_peek_first_event(group)));
> -       }
> +       fse = fsnotify_peek_first_event(group);
> +       event = FANOTIFY_E(fse);
> +       event_size = fanotify_event_len(event, fid_mode);
>
>         if (event_size > count) {
>                 event = ERR_PTR(-EINVAL);
>                 goto out;
>         }
> -       event = FANOTIFY_E(fsnotify_remove_first_event(group));
> +       fsnotify_remove_queued_event(group, fse);
>         if (fanotify_is_perm_event(event->mask))
>                 FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
>  out:
> @@ -334,8 +341,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>
>         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> -       metadata.event_len = FAN_EVENT_METADATA_LEN +
> -                               fanotify_event_info_len(fid_mode, event);
> +       metadata.event_len = fanotify_event_len(event, fid_mode);
>         metadata.metadata_len = FAN_EVENT_METADATA_LEN;
>         metadata.vers = FANOTIFY_METADATA_VERSION;
>         metadata.reserved = 0;
> --
> 2.31.0
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks
  2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
@ 2021-04-27  4:53   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  4:53 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> FAN_ERROR will require fsid, but not necessarily require the filesystem
> to expose a file handle.  Split those checks into different functions, so
> they can be used separately when creating a mark.

Ok for the split, but...

>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/fanotify/fanotify_user.c | 35 +++++++++++++++++++-----------
>  1 file changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 0332c4afeec3..e0d113e3b65c 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1055,7 +1055,23 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  }
>
>  /* Check if filesystem can encode a unique fid */
> -static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
> +static int fanotify_test_fid(struct path *path)

This helper can take a dentry.

> +{
> +       /*
> +        * We need to make sure that the file system supports at least
> +        * encoding a file handle so user can use name_to_handle_at() to
> +        * compare fid returned with event to the file handle of watched
> +        * objects. However, name_to_handle_at() requires that the
> +        * filesystem also supports decoding file handles.
> +        */
> +       if (!path->dentry->d_sb->s_export_op ||
> +           !path->dentry->d_sb->s_export_op->fh_to_dentry)
> +               return -EOPNOTSUPP;
> +
> +       return 0;
> +}
> +
> +static int fanotify_check_path_fsid(struct path *path, __kernel_fsid_t *fsid)

And so does this helper.
I certainly don't see the need for the _path_ in the helper name.

>  {
>         __kernel_fsid_t root_fsid;
>         int err;
> @@ -1082,17 +1098,6 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
>             root_fsid.val[1] != fsid->val[1])
>                 return -EXDEV;
>
> -       /*
> -        * We need to make sure that the file system supports at least
> -        * encoding a file handle so user can use name_to_handle_at() to
> -        * compare fid returned with event to the file handle of watched
> -        * objects. However, name_to_handle_at() requires that the
> -        * filesystem also supports decoding file handles.
> -        */
> -       if (!path->dentry->d_sb->s_export_op ||
> -           !path->dentry->d_sb->s_export_op->fh_to_dentry)
> -               return -EOPNOTSUPP;
> -
>         return 0;
>  }
>
> @@ -1230,7 +1235,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>         }
>
>         if (fid_mode) {
> -               ret = fanotify_test_fid(&path, &__fsid);
> +               ret = fanotify_check_path_fsid(&path, &__fsid);
> +               if (ret)
> +                       goto path_put_and_out;
> +
> +               ret = fanotify_test_fid(&path);

Whether _test_ or _check_ please stick to one.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation
  2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
@ 2021-04-27  5:03   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  5:03 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Introduce a flags field in fsnotify_group to track the mode of
> submission this group has.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/dnotify/dnotify.c        |  2 +-
>  fs/notify/fanotify/fanotify_user.c |  4 ++--
>  fs/notify/group.c                  | 13 ++++++++-----
>  fs/notify/inotify/inotify_user.c   |  2 +-
>  include/linux/fsnotify_backend.h   |  7 +++++--
>  kernel/audit_fsnotify.c            |  2 +-
>  kernel/audit_tree.c                |  2 +-
>  kernel/audit_watch.c               |  2 +-
>  8 files changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
> index e85e13c50d6d..37960c8750e4 100644
> --- a/fs/notify/dnotify/dnotify.c
> +++ b/fs/notify/dnotify/dnotify.c
> @@ -383,7 +383,7 @@ static int __init dnotify_init(void)
>                                           SLAB_PANIC|SLAB_ACCOUNT);
>         dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
>
> -       dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
> +       dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0);
>         if (IS_ERR(dnotify_group))
>                 panic("unable to allocate fsnotify group for dnotify\n");
>         return 0;
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index e0d113e3b65c..f50c4ab721e3 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -929,7 +929,7 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void)
>  SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  {
>         struct fsnotify_group *group;
> -       int f_flags, fd;
> +       int f_flags, fd, fsn_flags = 0;
>         struct user_struct *user;
>         unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
>         unsigned int class = flags & FANOTIFY_CLASS_BITS;
> @@ -982,7 +982,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>                 f_flags |= O_NONBLOCK;
>
>         /* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
> -       group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
> +       group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
>         if (IS_ERR(group)) {
>                 free_uid(user);
>                 return PTR_ERR(group);
> diff --git a/fs/notify/group.c b/fs/notify/group.c
> index ffd723ffe46d..08acb1afc0c2 100644
> --- a/fs/notify/group.c
> +++ b/fs/notify/group.c
> @@ -112,7 +112,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
>  EXPORT_SYMBOL_GPL(fsnotify_put_group);
>
>  static struct fsnotify_group *__fsnotify_alloc_group(
> -                               const struct fsnotify_ops *ops, gfp_t gfp)
> +       const struct fsnotify_ops *ops, unsigned int flags, gfp_t gfp)
>  {
>         struct fsnotify_group *group;
>
> @@ -134,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group(
>         INIT_LIST_HEAD(&group->marks_list);
>
>         group->ops = ops;
> +       group->flags = flags;
>
>         return group;
>  }
> @@ -141,18 +142,20 @@ static struct fsnotify_group *__fsnotify_alloc_group(
>  /*
>   * Create a new fsnotify_group and hold a reference for the group returned.
>   */
> -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
> +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
> +                                           unsigned int flags)
>  {
> -       return __fsnotify_alloc_group(ops, GFP_KERNEL);
> +       return __fsnotify_alloc_group(ops, flags, GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
>
>  /*
>   * Create a new fsnotify_group and hold a reference for the group returned.
>   */
> -struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops)
> +struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops,
> +                                                unsigned int flags)
>  {
> -       return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT);
> +       return __fsnotify_alloc_group(ops, flags, GFP_KERNEL_ACCOUNT);
>  }
>  EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group);
>

*IF* we go this way, note that fsnotify_alloc_group() doesn't need
flags argument.
None of the callers of fsnotify_alloc_group() ever use the
notification list, so it
would be better to pass flag FSN_NOTIFICATION_LIST from the backends that
do use it (fanotify and inotify) for the sake of symmetry with FSN_RING_BUFFER
and no need to change other callers.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
  2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
  2021-04-26 22:00   ` kernel test robot
  2021-04-26 22:43   ` kernel test robot
@ 2021-04-27  5:39   ` Amir Goldstein
  2021-04-29 18:33     ` Gabriel Krisman Bertazi
  2 siblings, 1 reply; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  5:39 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> In order to support file system health/error reporting over fanotify,
> fsnotify needs to expose a submission path that doesn't allow sleeping.
> The only problem I identified with the current submission path is the
> need to dynamically allocate memory for the event queue.
>
> This patch avoids the problem by introducing a new mode in fsnotify,
> where a ring buffer is used to submit events for a group.  Each group
> has its own ring buffer, and error notifications are submitted
> exclusively through it.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/Makefile               |   2 +-
>  fs/notify/group.c                |  12 +-
>  fs/notify/notification.c         |  10 ++
>  fs/notify/ring.c                 | 199 +++++++++++++++++++++++++++++++
>  include/linux/fsnotify_backend.h |  37 +++++-
>  5 files changed, 255 insertions(+), 5 deletions(-)
>  create mode 100644 fs/notify/ring.c
>
> diff --git a/fs/notify/Makefile b/fs/notify/Makefile
> index 63a4b8828df4..61dae1e90f2d 100644
> --- a/fs/notify/Makefile
> +++ b/fs/notify/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_FSNOTIFY)         += fsnotify.o notification.o group.o mark.o \
> -                                  fdinfo.o
> +                                  fdinfo.o ring.o
>
>  obj-y                  += dnotify/
>  obj-y                  += inotify/
> diff --git a/fs/notify/group.c b/fs/notify/group.c
> index 08acb1afc0c2..b99b3de36696 100644
> --- a/fs/notify/group.c
> +++ b/fs/notify/group.c
> @@ -81,7 +81,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
>          * notification against this group. So clearing the notification queue
>          * of all events is reliable now.
>          */
> -       fsnotify_flush_notify(group);
> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> +               fsnotify_free_ring_buffer(group);
> +       else
> +               fsnotify_flush_notify(group);
>
>         /*
>          * Destroy overflow event (we cannot use fsnotify_destroy_event() as
> @@ -136,6 +139,13 @@ static struct fsnotify_group *__fsnotify_alloc_group(
>         group->ops = ops;
>         group->flags = flags;
>
> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
> +               if (fsnotify_create_ring_buffer(group)) {
> +                       kfree(group);
> +                       return ERR_PTR(-ENOMEM);
> +               }
> +       }
> +
>         return group;
>  }
>
> diff --git a/fs/notify/notification.c b/fs/notify/notification.c
> index 75d79d6d3ef0..32f97e7b7a80 100644
> --- a/fs/notify/notification.c
> +++ b/fs/notify/notification.c
> @@ -51,6 +51,10 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
>  bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
>  {
>         assert_spin_locked(&group->notification_lock);
> +
> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> +               return fsnotify_ring_notify_queue_is_empty(group);
> +
>         return list_empty(&group->notification_list) ? true : false;
>  }
>
> @@ -132,6 +136,9 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group,
>                                   struct fsnotify_event *event)
>  {
>         assert_spin_locked(&group->notification_lock);
> +
> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> +               return;
>         /*
>          * We need to init list head for the case of overflow event so that
>          * check in fsnotify_add_event() works
> @@ -166,6 +173,9 @@ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
>  {
>         assert_spin_locked(&group->notification_lock);
>
> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
> +               return fsnotify_ring_peek_first_event(group);
> +
>         return list_first_entry(&group->notification_list,
>                                 struct fsnotify_event, list);
>  }
> diff --git a/fs/notify/ring.c b/fs/notify/ring.c
> new file mode 100644
> index 000000000000..75e8af1f8d80
> --- /dev/null
> +++ b/fs/notify/ring.c
> @@ -0,0 +1,199 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/types.h>
> +#include <linux/fsnotify.h>
> +#include <linux/memcontrol.h>
> +
> +#define INVALID_RING_SLOT -1
> +
> +#define FSNOTIFY_RING_PAGES 16
> +
> +#define NEXT_SLOT(cur, len, ring_size) ((cur + len) & (ring_size-1))
> +#define NEXT_PAGE(cur, ring_size) (round_up(cur, PAGE_SIZE) & (ring_size-1))
> +
> +bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group)
> +{
> +       assert_spin_locked(&group->notification_lock);
> +
> +       if (group->ring_buffer.tail == group->ring_buffer.head)
> +               return true;
> +       return false;
> +}
> +
> +struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group)
> +{
> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> +       struct fsnotify_event *fsn;
> +       char *kaddr;
> +       u64 tail;
> +
> +       assert_spin_locked(&group->notification_lock);
> +
> +again:
> +       tail = group->ring_buffer.tail;
> +
> +       if ((PAGE_SIZE - (tail & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
> +               group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
> +               goto again;
> +       }
> +
> +       kaddr = kmap_atomic(group->ring_buffer.pages[tail / PAGE_SIZE]);
> +       if (!kaddr)
> +               return NULL;
> +       fsn = (struct fsnotify_event *) (kaddr + (tail & (PAGE_SIZE-1)));
> +
> +       if (fsn->slot_len == INVALID_RING_SLOT) {
> +               group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
> +               kunmap_atomic(kaddr);
> +               goto again;
> +       }
> +
> +       /* will be unmapped when entry is consumed. */
> +       return fsn;
> +}
> +
> +void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
> +                                       struct fsnotify_event *event)
> +{
> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> +       u64 new_tail = NEXT_SLOT(group->ring_buffer.tail, event->slot_len, ring_size);
> +
> +       kunmap_atomic(event);
> +
> +       pr_debug("%s: group=%p tail=%llx->%llx ring_size=%llu\n", __func__,
> +                group, group->ring_buffer.tail, new_tail, ring_size);
> +
> +       WRITE_ONCE(group->ring_buffer.tail, new_tail);
> +}
> +
> +struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
> +                                                     size_t size)
> +       __acquires(&group->notification_lock)
> +{
> +       struct fsnotify_event *fsn;
> +       u64 head, tail;
> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> +       u64 new_head;
> +       void *kaddr;
> +
> +       if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
> +               return ERR_PTR(-EINVAL);
> +
> +       pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
> +                ring_size, size);
> +
> +       spin_lock(&group->notification_lock);
> +again:
> +       head = group->ring_buffer.head;
> +       tail = group->ring_buffer.tail;
> +       new_head = NEXT_SLOT(head, size, ring_size);
> +
> +       /* head would catch up to tail, corrupting an entry. */
> +       if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
> +               fsn = ERR_PTR(-ENOMEM);
> +               goto err;
> +       }
> +
> +       /*
> +        * Not event a skip message fits in the page. We can detect the
> +        * lack of space. Move on to the next page.
> +        */
> +       if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
> +               /* Start again on next page */
> +               group->ring_buffer.head = NEXT_PAGE(head, ring_size);
> +               goto again;
> +       }
> +
> +       kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
> +       if (!kaddr) {
> +               fsn = ERR_PTR(-EFAULT);
> +               goto err;
> +       }
> +
> +       fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
> +
> +       if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
> +               /*
> +                * No room in the current page.  Add a fake entry
> +                * consuming the end the page to avoid splitting event
> +                * structure.
> +                */
> +               fsn->slot_len = INVALID_RING_SLOT;
> +               kunmap_atomic(kaddr);
> +               /* Start again on the next page */
> +               group->ring_buffer.head = NEXT_PAGE(head, ring_size);
> +
> +               goto again;
> +       }
> +       fsn->slot_len = size;
> +
> +       return fsn;
> +
> +err:
> +       spin_unlock(&group->notification_lock);
> +       return fsn;
> +}
> +
> +void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn)
> +       __releases(&group->notification_lock)
> +{
> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
> +       u64 head = group->ring_buffer.head;
> +       u64 new_head = NEXT_SLOT(head, fsn->slot_len, ring_size);
> +
> +       pr_debug("%s: group=%p head=%llx->%llx ring_size=%llu\n", __func__,
> +                group, head, new_head, ring_size);
> +
> +       kunmap_atomic(fsn);
> +       group->ring_buffer.head = new_head;
> +
> +       spin_unlock(&group->notification_lock);
> +
> +       wake_up(&group->notification_waitq);
> +       kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
> +
> +}
> +
> +void fsnotify_free_ring_buffer(struct fsnotify_group *group)
> +{
> +       int i;
> +
> +       for (i = 0; i < group->ring_buffer.nr_pages; i++)
> +               __free_page(group->ring_buffer.pages[i]);
> +       kfree(group->ring_buffer.pages);
> +       group->ring_buffer.nr_pages = 0;
> +}
> +
> +int fsnotify_create_ring_buffer(struct fsnotify_group *group)
> +{
> +       int nr_pages = FSNOTIFY_RING_PAGES;
> +       int i;
> +
> +       pr_debug("%s: group=%p pages=%d\n", __func__, group, nr_pages);
> +
> +       group->ring_buffer.pages = kmalloc_array(nr_pages, sizeof(struct pages *),
> +                                                GFP_KERNEL);
> +       if (!group->ring_buffer.pages)
> +               return -ENOMEM;
> +
> +       group->ring_buffer.head = 0;
> +       group->ring_buffer.tail = 0;
> +
> +       for (i = 0; i < nr_pages; i++) {
> +               group->ring_buffer.pages[i] = alloc_pages(GFP_KERNEL, 1);
> +               if (!group->ring_buffer.pages)
> +                       goto err_dealloc;
> +       }
> +
> +       group->ring_buffer.nr_pages = nr_pages;
> +
> +       return 0;
> +
> +err_dealloc:
> +       for (--i; i >= 0; i--)
> +               __free_page(group->ring_buffer.pages[i]);
> +       kfree(group->ring_buffer.pages);
> +       group->ring_buffer.nr_pages = 0;
> +       return -ENOMEM;
> +}
> +
> +

Nothing in this file is fsnotify specific.
Is there no kernel lib implementation for this already?
If there isn't (I'd be very surprised) please put this in lib/ and post it
for wider review including self tests.

> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 190c6a402e98..a1a4dd69e5ed 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -74,6 +74,8 @@
>  #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
>                                   FS_OPEN_EXEC_PERM)
>
> +#define FSN_SUBMISSION_RING_BUFFER     0x00000080

FSNOTIFY_GROUP_FLAG_RING_BUFFER please (or FSN_GROUP_ if you must)
and please define this above struct fsnotify_group, even right above the flags
field like FSNOTIFY_CONN_FLAG_HAS_FSID

*IF* we go this way :)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 06/15] fanotify: Support submission through ring buffer
  2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
@ 2021-04-27  6:02   ` Amir Goldstein
  2021-04-29 18:36     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  6:02 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This adds support for the ring buffer mode in fanotify.  It is enabled
> by a new flag FAN_PREALLOC_QUEUE passed to fanotify_init.  If this flag
> is enabled, the group only allows marks that support the ring buffer

I don't like this limitation.
I think FAN_PREALLOC_QUEUE can work with other events, why not?

In any case if we keep ring buffer, please use a different set of
fanotify_ring_buffer_ops struct instead of spraying if/else all over the
event queue implementation.

> submission.  In a following patch, FAN_ERROR will make use of this
> mechanism.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/fanotify/fanotify.c      | 77 +++++++++++++++++++---------
>  fs/notify/fanotify/fanotify_user.c | 81 ++++++++++++++++++------------
>  include/linux/fanotify.h           |  5 +-
>  include/uapi/linux/fanotify.h      |  1 +
>  4 files changed, 105 insertions(+), 59 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index e3669d8a4a64..98591a8155a7 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -612,6 +612,26 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>         return event;
>  }
>
> +static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *group,
> +                                                    u32 mask, const void *data,
> +                                                    int data_type)
> +{
> +       size_t size = 0;
> +
> +       pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
> +
> +       return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
> +}
> +
> +static void fanotify_ring_write_event(struct fsnotify_group *group,
> +                                     struct fanotify_event *event, u32 mask,
> +                                     const void *data, __kernel_fsid_t *fsid)
> +{
> +       fanotify_init_event(group, event, 0, mask);
> +
> +       event->pid = get_pid(task_tgid(current));
> +}
> +
>  /*
>   * Get cached fsid of the filesystem containing the object from any connector.
>   * All connectors are supposed to have the same fsid, but we do not verify that
> @@ -701,31 +721,38 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>                         return 0;
>         }
>
> -       event = fanotify_alloc_event(group, mask, data, data_type, dir,
> -                                    file_name, &fsid);
> -       ret = -ENOMEM;
> -       if (unlikely(!event)) {
> -               /*
> -                * We don't queue overflow events for permission events as
> -                * there the access is denied and so no event is in fact lost.
> -                */
> -               if (!fanotify_is_perm_event(mask))
> -                       fsnotify_queue_overflow(group);
> -               goto finish;
> -       }
> -
> -       fsn_event = &event->fse;
> -       ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
> -       if (ret) {
> -               /* Permission events shouldn't be merged */
> -               BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
> -               /* Our event wasn't used in the end. Free it. */
> -               fsnotify_destroy_event(group, fsn_event);
> -
> -               ret = 0;
> -       } else if (fanotify_is_perm_event(mask)) {
> -               ret = fanotify_get_response(group, FANOTIFY_PERM(event),
> -                                           iter_info);
> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
> +               event = fanotify_ring_get_slot(group, mask, data, data_type);
> +               if (IS_ERR(event))
> +                       return PTR_ERR(event);

So no FAN_OVERFLOW with the ring buffer implementation?
This will be unexpected for fanotify users and frankly, less useful IMO.
I also don't see the technical reason to omit the overflow event.

> +               fanotify_ring_write_event(group, event, mask, data, &fsid);
> +               fsnotify_ring_commit_slot(group, &event->fse);
> +       } else {
> +               event = fanotify_alloc_event(group, mask, data, data_type, dir,
> +                                            file_name, &fsid);
> +               ret = -ENOMEM;
> +               if (unlikely(!event)) {
> +                       /*
> +                        * We don't queue overflow events for permission events as
> +                        * there the access is denied and so no event is in fact lost.
> +                        */
> +                       if (!fanotify_is_perm_event(mask))
> +                               fsnotify_queue_overflow(group);
> +                       goto finish;
> +               }
> +               fsn_event = &event->fse;
> +               ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
> +               if (ret) {
> +                       /* Permission events shouldn't be merged */
> +                       BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
> +                       /* Our event wasn't used in the end. Free it. */
> +                       fsnotify_destroy_event(group, fsn_event);
> +
> +                       ret = 0;
> +               } else if (fanotify_is_perm_event(mask)) {
> +                       ret = fanotify_get_response(group, FANOTIFY_PERM(event),
> +                                                   iter_info);
> +               }
>         }
>  finish:
>         if (fanotify_is_perm_event(mask))
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index fe605359af88..5031198bf7db 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -521,7 +521,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
>                  * Permission events get queued to wait for response.  Other
>                  * events can be destroyed now.
>                  */
> -               if (!fanotify_is_perm_event(event->mask)) {
> +               if (group->fanotify_data.flags & FAN_PREALLOC_QUEUE) {
> +                       fsnotify_ring_buffer_consume_event(group, &event->fse);
> +               } else if (!fanotify_is_perm_event(event->mask)) {
>                         fsnotify_destroy_event(group, &event->fse);
>                 } else {
>                         if (ret <= 0) {
> @@ -587,40 +589,39 @@ static int fanotify_release(struct inode *ignored, struct file *file)
>          */
>         fsnotify_group_stop_queueing(group);
>
> -       /*
> -        * Process all permission events on access_list and notification queue
> -        * and simulate reply from userspace.
> -        */
> -       spin_lock(&group->notification_lock);
> -       while (!list_empty(&group->fanotify_data.access_list)) {
> -               struct fanotify_perm_event *event;
> -
> -               event = list_first_entry(&group->fanotify_data.access_list,
> -                               struct fanotify_perm_event, fae.fse.list);
> -               list_del_init(&event->fae.fse.list);
> -               finish_permission_event(group, event, FAN_ALLOW);
> +       if (!(group->flags & FSN_SUBMISSION_RING_BUFFER)) {
> +               /*
> +                * Process all permission events on access_list and notification queue
> +                * and simulate reply from userspace.
> +                */
>                 spin_lock(&group->notification_lock);
> -       }
> -
> -       /*
> -        * Destroy all non-permission events. For permission events just
> -        * dequeue them and set the response. They will be freed once the
> -        * response is consumed and fanotify_get_response() returns.
> -        */
> -       while (!fsnotify_notify_queue_is_empty(group)) {
> -               struct fanotify_event *event;
> -
> -               event = FANOTIFY_E(fsnotify_remove_first_event(group));
> -               if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
> -                       spin_unlock(&group->notification_lock);
> -                       fsnotify_destroy_event(group, &event->fse);
> -               } else {
> -                       finish_permission_event(group, FANOTIFY_PERM(event),
> -                                               FAN_ALLOW);
> +               while (!list_empty(&group->fanotify_data.access_list)) {
> +                       struct fanotify_perm_event *event;
> +                       event = list_first_entry(&group->fanotify_data.access_list,
> +                                                struct fanotify_perm_event, fae.fse.list);
> +                       list_del_init(&event->fae.fse.list);
> +                       finish_permission_event(group, event, FAN_ALLOW);
> +                       spin_lock(&group->notification_lock);
>                 }
> -               spin_lock(&group->notification_lock);
> +               /*
> +                * Destroy all non-permission events. For permission events just
> +                * dequeue them and set the response. They will be freed once the
> +                * response is consumed and fanotify_get_response() returns.
> +                */
> +               while (!fsnotify_notify_queue_is_empty(group)) {
> +                       struct fanotify_event *event;
> +                       event = FANOTIFY_E(fsnotify_remove_first_event(group));
> +                       if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
> +                               spin_unlock(&group->notification_lock);
> +                               fsnotify_destroy_event(group, &event->fse);
> +                       } else {
> +                               finish_permission_event(group, FANOTIFY_PERM(event),
> +                                                       FAN_ALLOW);
> +                       }
> +                       spin_lock(&group->notification_lock);
> +               }
> +               spin_unlock(&group->notification_lock);
>         }
> -       spin_unlock(&group->notification_lock);
>
>         /* Response for all permission events it set, wakeup waiters */
>         wake_up(&group->fanotify_data.access_waitq);
> @@ -981,6 +982,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>         if (flags & FAN_NONBLOCK)
>                 f_flags |= O_NONBLOCK;
>
> +       if (flags & FAN_PREALLOC_QUEUE) {
> +               if (!capable(CAP_SYS_ADMIN))
> +                       return -EPERM;
> +
> +               if (flags & FAN_UNLIMITED_QUEUE)
> +                       return -EINVAL;
> +
> +               fsn_flags = FSN_SUBMISSION_RING_BUFFER;
> +       }
> +
>         /* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
>         group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
>         if (IS_ERR(group)) {
> @@ -1223,6 +1234,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>                 goto fput_and_out;
>         }
>
> +       if ((group->flags & FSN_SUBMISSION_RING_BUFFER) &&
> +           (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
> +               goto fput_and_out;
> +
>         ret = fanotify_find_path(dfd, pathname, &path, flags,
>                         (mask & ALL_FSNOTIFY_EVENTS), obj_type);
>         if (ret)
> @@ -1327,7 +1342,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
>   */
>  static int __init fanotify_user_setup(void)
>  {
> -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
> +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
>         BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>
>         fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 3e9c56ee651f..5a4cefb4b1c3 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -23,7 +23,8 @@
>  #define FANOTIFY_INIT_FLAGS    (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
>                                  FAN_REPORT_TID | \
>                                  FAN_CLOEXEC | FAN_NONBLOCK | \
> -                                FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
> +                                FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
> +                                FAN_PREALLOC_QUEUE)
>
>  #define FANOTIFY_MARK_TYPE_BITS        (FAN_MARK_INODE | FAN_MARK_MOUNT | \
>                                  FAN_MARK_FILESYSTEM)
> @@ -71,6 +72,8 @@
>                                          FANOTIFY_PERM_EVENTS | \
>                                          FAN_Q_OVERFLOW | FAN_ONDIR)
>
> +#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0

FANOTIFY_RING_BUFFER_EVENTS? FANOTIFY_PREALLOC_EVENTS?

Please leave a comment above to state what this group means.
I *think* there is no reason to limit the set of events, only the sort of
information that is possible with FAN_PREALLOC_QUEUE.

Perhaps FAN_REPORT_FID cannot be allowed and as a result
FANOTIFY_INODE_EVENTS will not be allowed, but I am not even
sure if that limitation is needed.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events
  2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
@ 2021-04-27  6:49   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  6:49 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  include/linux/fsnotify.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index f8acddcf54fb..b3ac1a9d0d4d 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -317,4 +317,19 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
>                 fsnotify_dentry(dentry, mask);
>  }
>
> +static inline void fsnotify_error_event(int error, struct inode *dir,
> +                                       const char *function, int line,
> +                                       void *fs_data, int fs_data_size)
> +{
> +       struct fs_error_report report = {
> +               .error = error,
> +               .line = line,
> +               .function = function,
> +               .fs_data_size = fs_data_size,
> +               .fs_data = fs_data,
> +       };
> +
> +       fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, dir, NULL, NULL, 0);

The way you use this helper from ext4_fsnotify_error() it would make more sense
to name the inode argument 'inode' and call:

       fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, NULL, NULL, inode, 0);

Also, if we stick with returning ENOMEM instead of overflow event (I
don't think we should),
then this helper should return the error as well.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 09/15] fanotify: Introduce generic error record
  2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
@ 2021-04-27  7:01   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  7:01 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This record describes a fs error in a fs agnostic way.  It will be send
> back to userspace in response to a FSNOTIFY_EVENT_ERROR for groups with
> the FAN_ERROR mark.

It's not a mark, it's an event, so:
"...for groups with the FAN_ERROR event in their mark mask"

>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/fanotify/fanotify.h      | 16 ++++++++++++++++
>  fs/notify/fanotify/fanotify_user.c | 28 ++++++++++++++++++++++++++++
>  include/uapi/linux/fanotify.h      | 10 ++++++++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 47299e3d6efd..4cb9dd31f084 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -179,6 +179,22 @@ FANOTIFY_NE(struct fanotify_event *event)
>         return container_of(event, struct fanotify_name_event, fae);
>  }
>
> +struct fanotify_error_event {
> +       struct fanotify_event fae;
> +       int error;
> +       __kernel_fsid_t fsid;
> +
> +       int fs_data_size;
> +       /* Must be the last item in the structure */
> +       char fs_data[0];
> +};
> +
> +static inline struct fanotify_error_event *
> +FANOTIFY_EE(struct fanotify_event *event)
> +{
> +       return container_of(event, struct fanotify_error_event, fae);
> +}
> +
>  static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
>  {
>         if (event->type == FANOTIFY_EVENT_TYPE_FID)
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 5031198bf7db..21162d347bd1 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -64,6 +64,11 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
>         return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
>  }
>
> +static size_t fanotify_error_info_len(struct fanotify_error_event *fee)
> +{
> +       return sizeof(struct fanotify_event_info_error);
> +}
> +
>  static size_t fanotify_event_len(struct fanotify_event *event,
>                                  unsigned int fid_mode)
>  {
> @@ -232,6 +237,29 @@ static int process_access_response(struct fsnotify_group *group,
>         return -ENOENT;
>  }
>
> +static size_t copy_error_info_to_user(struct fanotify_error_event *fee,
> +                                     char __user *buf, int count)
> +{
> +       struct fanotify_event_info_error info;
> +
> +       info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
> +       info.hdr.pad = 0;
> +       info.hdr.len = fanotify_error_info_len(fee);
> +
> +       if (WARN_ON(count < info.hdr.len))
> +               return -EFAULT;
> +
> +       info.version = FANOTIFY_EVENT_INFO_ERROR_VERS_1;
> +       info.error = fee->error;
> +       info.fsid = fee->fsid;
> +
> +       if (copy_to_user(buf, &info, sizeof(info)))
> +               return -EFAULT;
> +
> +       return info.hdr.len;
> +
> +}
> +
>  static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
>                              int info_type, const char *name, size_t name_len,
>                              char __user *buf, size_t count)
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index b283531549f1..cc9a1fa80e30 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -124,6 +124,7 @@ struct fanotify_event_metadata {
>  #define FAN_EVENT_INFO_TYPE_FID                1
>  #define FAN_EVENT_INFO_TYPE_DFID_NAME  2
>  #define FAN_EVENT_INFO_TYPE_DFID       3
> +#define FAN_EVENT_INFO_TYPE_ERROR      4
>
>  /* Variable length info record following event metadata */
>  struct fanotify_event_info_header {
> @@ -149,6 +150,15 @@ struct fanotify_event_info_fid {
>         unsigned char handle[0];
>  };
>
> +#define FANOTIFY_EVENT_INFO_ERROR_VERS_1   1

Honestly, this struct is too simple to have a 'version'.
The format of this simple struct is already defined by
FAN_EVENT_INFO_TYPE_ERROR and if we want to change
the reported info in the future, we can use
FAN_EVENT_INFO_TYPE_ERROR_V2.
In fact, I suggest to name the type
FAN_EVENT_INFO_TYPE_FS_ERROR
to differentiate from a future
FAN_EVENT_INFO_TYPE_WB_ERROR

> +
> +struct fanotify_event_info_error {
> +       struct fanotify_event_info_header hdr;
> +       int version;
> +       int error;
> +       __kernel_fsid_t fsid;
> +};

I suggest to put an error seq counter in this struct.
The per-sb seq counter can be provided by the filesystem
or by fsnotify.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 10/15] fanotify: Introduce code location record
  2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
@ 2021-04-27  7:11   ` Amir Goldstein
  2021-04-29 18:40     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  7:11 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This patch introduces an optional info record that describes the
> source (as in the region of the source-code where an event was
> initiated).  This record is not produced for other type of existing
> notification, but it is optionally enabled for FAN_ERROR notifications.
>

I find this functionality controversial, because think that the fs provided
s_last_error*, s_first_error* is more reliable and more powerful than this
functionality.

Let's leave it for a future extending proposal, should fanotify event reporting
proposal pass muster, shall we?
Or do you think that without this optional extension fanotify event reporting
will not be valuable enough?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record
  2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
@ 2021-04-27  7:12   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  7:12 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Allow a FS_ERROR_TYPE notification to send a filesystem provided blob
> back to userspace.  This is useful for filesystems who want to provide
> debug information for recovery tools.
>

Same comment as for FAN_EVENT_INFO_TYPE_LOCATION.
Can we leave this patch out of the discussion for now?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
  2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
  2021-04-26 22:45   ` kernel test robot
@ 2021-04-27  7:25   ` Amir Goldstein
  1 sibling, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  7:25 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> The FAN_ERROR mark is used by filesystem wide monitoring tools to
> receive notifications of type FS_ERROR_EVENT, emited by filesystems when
> a problem is detected.  The error notification includes a generic error
> descriptor, an optional location record and a filesystem specific blob.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/fanotify/fanotify.c      | 48 +++++++++++++++++++----
>  fs/notify/fanotify/fanotify.h      |  8 ++++
>  fs/notify/fanotify/fanotify_user.c | 63 ++++++++++++++++++++++++++++++
>  include/linux/fanotify.h           |  9 ++++-
>  include/uapi/linux/fanotify.h      |  2 +
>  5 files changed, 120 insertions(+), 10 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 98591a8155a7..6bae23d42e5e 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -240,12 +240,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
>                  __func__, iter_info->report_mask, event_mask, data, data_type);
>
>         if (!fid_mode) {
> -               /* Do we have path to open a file descriptor? */
> -               if (!path)
> -                       return 0;
> -               /* Path type events are only relevant for files and dirs */
> -               if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
> -                       return 0;
> +               if (!fanotify_is_error_event(event_mask)) {

This open coded nested condition is not nice.
If we get as far as this, I will explain what needs to be done.
Need helpers fanotify_is_reporting_fd(), fanotify_is_reporting_fid() and
fanotify_is_reporting_dir_fid().

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 13/15] ext4: Send notifications on error
  2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
@ 2021-04-29 13:19 ` Dan Carpenter
  2021-04-27  4:32   ` Amir Goldstein
  2021-04-29  0:57   ` Darrick J. Wong
  2 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2021-04-27  8:36 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 17965 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210426184201.4177978-14-krisman@collabora.com>
References: <20210426184201.4177978-14-krisman@collabora.com>
TO: Gabriel Krisman Bertazi <krisman@collabora.com>

Hi Gabriel,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on pcmoore-audit/next]
[also build test WARNING on ext4/dev linus/master v5.12]
[cannot apply to ext3/fsnotify next-20210426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
:::::: branch date: 14 hours ago
:::::: commit date: 14 hours ago
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/ext4/super.c:739 ext4_fsnotify_error() warn: variable dereferenced before check 'inode' (see line 738)
fs/ext4/super.c:918 __ext4_std_error() error: uninitialized symbol 'errstr'.

Old smatch warnings:
fs/ext4/super.c:3787 ext4_register_li_request() error: we previously assumed 'ext4_li_info' could be null (see line 3769)
fs/ext4/super.c:4463 ext4_fill_super() warn: bitwise AND condition is false here

vim +/inode +739 fs/ext4/super.c

ac27a0ec112a08 Dave Kleikamp           2006-10-11  731  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  732  static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  733  				const char *func, int line,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  734  				const char *desc, struct va_format *vaf)
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  735  {
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  736  	struct ext4_error_inode_report report;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  737  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @738  	if (inode->i_sb->s_fsnotify_marks) {
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @739  		report.inode = inode ? inode->i_ino : -1L;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  740  		report.block = block ? block : -1L;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  741  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  742  		snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  743  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  744  		fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  745  	}
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  746  }
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  747  
efbed4dc5857f8 Theodore Ts'o           2013-10-17  748  #define ext4_error_ratelimit(sb)					\
efbed4dc5857f8 Theodore Ts'o           2013-10-17  749  		___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state),	\
efbed4dc5857f8 Theodore Ts'o           2013-10-17  750  			     "EXT4-fs error")
efbed4dc5857f8 Theodore Ts'o           2013-10-17  751  
12062dddda4509 Eric Sandeen            2010-02-15  752  void __ext4_error(struct super_block *sb, const char *function,
014c9caa29d3a4 Jan Kara                2020-11-27  753  		  unsigned int line, bool force_ro, int error, __u64 block,
54d3adbc29f0c7 Theodore Ts'o           2020-03-28  754  		  const char *fmt, ...)
ac27a0ec112a08 Dave Kleikamp           2006-10-11  755  {
0ff2ea7d84e311 Joe Perches             2010-12-19  756  	struct va_format vaf;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  757  	va_list args;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  758  
0db1ff222d40f1 Theodore Ts'o           2017-02-05  759  	if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
0db1ff222d40f1 Theodore Ts'o           2017-02-05  760  		return;
0db1ff222d40f1 Theodore Ts'o           2017-02-05  761  
ccf0f32acd436b Theodore Ts'o           2018-02-18  762  	trace_ext4_error(sb, function, line);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  763  
ac27a0ec112a08 Dave Kleikamp           2006-10-11  764  	va_start(args, fmt);
0ff2ea7d84e311 Joe Perches             2010-12-19  765  	vaf.fmt = fmt;
0ff2ea7d84e311 Joe Perches             2010-12-19  766  	vaf.va = &args;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  767  	if (ext4_error_ratelimit(sb)) {
efbed4dc5857f8 Theodore Ts'o           2013-10-17  768  		printk(KERN_CRIT
efbed4dc5857f8 Theodore Ts'o           2013-10-17  769  		       "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
0ff2ea7d84e311 Joe Perches             2010-12-19  770  		       sb->s_id, function, line, current->comm, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  771  
efbed4dc5857f8 Theodore Ts'o           2013-10-17  772  	}
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  773  	ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  774  	va_end(args);
e789ca0cc1d512 Jan Kara                2020-12-16  775  	ext4_handle_error(sb, force_ro, error, 0, block, function, line);
ac27a0ec112a08 Dave Kleikamp           2006-10-11  776  }
ac27a0ec112a08 Dave Kleikamp           2006-10-11  777  
e7c96e8e47baf2 Joe Perches             2013-07-01  778  void __ext4_error_inode(struct inode *inode, const char *function,
54d3adbc29f0c7 Theodore Ts'o           2020-03-28  779  			unsigned int line, ext4_fsblk_t block, int error,
273df556b6ee20 Frank Mayhar            2010-03-02  780  			const char *fmt, ...)
273df556b6ee20 Frank Mayhar            2010-03-02  781  {
273df556b6ee20 Frank Mayhar            2010-03-02  782  	va_list args;
f7c21177af0b32 Theodore Ts'o           2011-01-10  783  	struct va_format vaf;
273df556b6ee20 Frank Mayhar            2010-03-02  784  
0db1ff222d40f1 Theodore Ts'o           2017-02-05  785  	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
0db1ff222d40f1 Theodore Ts'o           2017-02-05  786  		return;
0db1ff222d40f1 Theodore Ts'o           2017-02-05  787  
ccf0f32acd436b Theodore Ts'o           2018-02-18  788  	trace_ext4_error(inode->i_sb, function, line);
273df556b6ee20 Frank Mayhar            2010-03-02  789  	va_start(args, fmt);
f7c21177af0b32 Theodore Ts'o           2011-01-10  790  	vaf.fmt = fmt;
f7c21177af0b32 Theodore Ts'o           2011-01-10  791  	vaf.va = &args;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  792  	if (ext4_error_ratelimit(inode->i_sb)) {
c398eda0e43a79 Theodore Ts'o           2010-07-27  793  		if (block)
d9ee81da93e86a Joe Perches             2012-03-19  794  			printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
d9ee81da93e86a Joe Perches             2012-03-19  795  			       "inode #%lu: block %llu: comm %s: %pV\n",
d9ee81da93e86a Joe Perches             2012-03-19  796  			       inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches             2012-03-19  797  			       block, current->comm, &vaf);
d9ee81da93e86a Joe Perches             2012-03-19  798  		else
d9ee81da93e86a Joe Perches             2012-03-19  799  			printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
d9ee81da93e86a Joe Perches             2012-03-19  800  			       "inode #%lu: comm %s: %pV\n",
d9ee81da93e86a Joe Perches             2012-03-19  801  			       inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches             2012-03-19  802  			       current->comm, &vaf);
efbed4dc5857f8 Theodore Ts'o           2013-10-17  803  	}
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  804  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  805  	ext4_fsnotify_error(error, inode, block, function, line, NULL, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  806  	va_end(args);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  807  
e789ca0cc1d512 Jan Kara                2020-12-16  808  	ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
54d3adbc29f0c7 Theodore Ts'o           2020-03-28  809  			  function, line);
273df556b6ee20 Frank Mayhar            2010-03-02  810  }
273df556b6ee20 Frank Mayhar            2010-03-02  811  
e7c96e8e47baf2 Joe Perches             2013-07-01  812  void __ext4_error_file(struct file *file, const char *function,
f7c21177af0b32 Theodore Ts'o           2011-01-10  813  		       unsigned int line, ext4_fsblk_t block,
f7c21177af0b32 Theodore Ts'o           2011-01-10  814  		       const char *fmt, ...)
273df556b6ee20 Frank Mayhar            2010-03-02  815  {
273df556b6ee20 Frank Mayhar            2010-03-02  816  	va_list args;
f7c21177af0b32 Theodore Ts'o           2011-01-10  817  	struct va_format vaf;
496ad9aa8ef448 Al Viro                 2013-01-23  818  	struct inode *inode = file_inode(file);
273df556b6ee20 Frank Mayhar            2010-03-02  819  	char pathname[80], *path;
273df556b6ee20 Frank Mayhar            2010-03-02  820  
0db1ff222d40f1 Theodore Ts'o           2017-02-05  821  	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
0db1ff222d40f1 Theodore Ts'o           2017-02-05  822  		return;
0db1ff222d40f1 Theodore Ts'o           2017-02-05  823  
ccf0f32acd436b Theodore Ts'o           2018-02-18  824  	trace_ext4_error(inode->i_sb, function, line);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  825  
9bf39ab2adafd7 Miklos Szeredi          2015-06-19  826  	path = file_path(file, pathname, sizeof(pathname));
f9a62d090cf47f Dan Carpenter           2011-01-10  827  	if (IS_ERR(path))
273df556b6ee20 Frank Mayhar            2010-03-02  828  		path = "(unknown)";
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  829  
f7c21177af0b32 Theodore Ts'o           2011-01-10  830  	va_start(args, fmt);
f7c21177af0b32 Theodore Ts'o           2011-01-10  831  	vaf.fmt = fmt;
f7c21177af0b32 Theodore Ts'o           2011-01-10  832  	vaf.va = &args;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  833  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  834  	if (ext4_error_ratelimit(inode->i_sb)) {
d9ee81da93e86a Joe Perches             2012-03-19  835  		if (block)
d9ee81da93e86a Joe Perches             2012-03-19  836  			printk(KERN_CRIT
d9ee81da93e86a Joe Perches             2012-03-19  837  			       "EXT4-fs error (device %s): %s:%d: inode #%lu: "
d9ee81da93e86a Joe Perches             2012-03-19  838  			       "block %llu: comm %s: path %s: %pV\n",
d9ee81da93e86a Joe Perches             2012-03-19  839  			       inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches             2012-03-19  840  			       block, current->comm, path, &vaf);
d9ee81da93e86a Joe Perches             2012-03-19  841  		else
d9ee81da93e86a Joe Perches             2012-03-19  842  			printk(KERN_CRIT
d9ee81da93e86a Joe Perches             2012-03-19  843  			       "EXT4-fs error (device %s): %s:%d: inode #%lu: "
d9ee81da93e86a Joe Perches             2012-03-19  844  			       "comm %s: path %s: %pV\n",
d9ee81da93e86a Joe Perches             2012-03-19  845  			       inode->i_sb->s_id, function, line, inode->i_ino,
d9ee81da93e86a Joe Perches             2012-03-19  846  			       current->comm, path, &vaf);
efbed4dc5857f8 Theodore Ts'o           2013-10-17  847  	}
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  848  	ext4_fsnotify_error(EFSCORRUPTED, inode, block, function, line, NULL, &vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  849  	va_end(args);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  850  
e789ca0cc1d512 Jan Kara                2020-12-16  851  	ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
54d3adbc29f0c7 Theodore Ts'o           2020-03-28  852  			  function, line);
273df556b6ee20 Frank Mayhar            2010-03-02  853  }
273df556b6ee20 Frank Mayhar            2010-03-02  854  
722887ddc8982f Theodore Ts'o           2013-02-08  855  const char *ext4_decode_error(struct super_block *sb, int errno,
ac27a0ec112a08 Dave Kleikamp           2006-10-11  856  			      char nbuf[16])
ac27a0ec112a08 Dave Kleikamp           2006-10-11  857  {
ac27a0ec112a08 Dave Kleikamp           2006-10-11  858  	char *errstr = NULL;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  859  
ac27a0ec112a08 Dave Kleikamp           2006-10-11  860  	switch (errno) {
6a797d27378389 Darrick J. Wong         2015-10-17  861  	case -EFSCORRUPTED:
6a797d27378389 Darrick J. Wong         2015-10-17  862  		errstr = "Corrupt filesystem";
6a797d27378389 Darrick J. Wong         2015-10-17  863  		break;
6a797d27378389 Darrick J. Wong         2015-10-17  864  	case -EFSBADCRC:
6a797d27378389 Darrick J. Wong         2015-10-17  865  		errstr = "Filesystem failed CRC";
6a797d27378389 Darrick J. Wong         2015-10-17  866  		break;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  867  	case -EIO:
ac27a0ec112a08 Dave Kleikamp           2006-10-11  868  		errstr = "IO failure";
ac27a0ec112a08 Dave Kleikamp           2006-10-11  869  		break;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  870  	case -ENOMEM:
ac27a0ec112a08 Dave Kleikamp           2006-10-11  871  		errstr = "Out of memory";
ac27a0ec112a08 Dave Kleikamp           2006-10-11  872  		break;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  873  	case -EROFS:
78f1ddbb498283 Theodore Ts'o           2009-07-27  874  		if (!sb || (EXT4_SB(sb)->s_journal &&
78f1ddbb498283 Theodore Ts'o           2009-07-27  875  			    EXT4_SB(sb)->s_journal->j_flags & JBD2_ABORT))
ac27a0ec112a08 Dave Kleikamp           2006-10-11  876  			errstr = "Journal has aborted";
ac27a0ec112a08 Dave Kleikamp           2006-10-11  877  		else
ac27a0ec112a08 Dave Kleikamp           2006-10-11  878  			errstr = "Readonly filesystem";
ac27a0ec112a08 Dave Kleikamp           2006-10-11  879  		break;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  880  	default:
ac27a0ec112a08 Dave Kleikamp           2006-10-11  881  		/* If the caller passed in an extra buffer for unknown
ac27a0ec112a08 Dave Kleikamp           2006-10-11  882  		 * errors, textualise them now.  Else we just return
ac27a0ec112a08 Dave Kleikamp           2006-10-11  883  		 * NULL. */
ac27a0ec112a08 Dave Kleikamp           2006-10-11  884  		if (nbuf) {
ac27a0ec112a08 Dave Kleikamp           2006-10-11  885  			/* Check for truncated error codes... */
ac27a0ec112a08 Dave Kleikamp           2006-10-11  886  			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
ac27a0ec112a08 Dave Kleikamp           2006-10-11  887  				errstr = nbuf;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  888  		}
ac27a0ec112a08 Dave Kleikamp           2006-10-11  889  		break;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  890  	}
ac27a0ec112a08 Dave Kleikamp           2006-10-11  891  
ac27a0ec112a08 Dave Kleikamp           2006-10-11  892  	return errstr;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  893  }
ac27a0ec112a08 Dave Kleikamp           2006-10-11  894  
617ba13b31fbf5 Mingming Cao            2006-10-11  895  /* __ext4_std_error decodes expected errors from journaling functions
ac27a0ec112a08 Dave Kleikamp           2006-10-11  896   * automatically and invokes the appropriate error response.  */
ac27a0ec112a08 Dave Kleikamp           2006-10-11  897  
c398eda0e43a79 Theodore Ts'o           2010-07-27  898  void __ext4_std_error(struct super_block *sb, const char *function,
c398eda0e43a79 Theodore Ts'o           2010-07-27  899  		      unsigned int line, int errno)
ac27a0ec112a08 Dave Kleikamp           2006-10-11  900  {
ac27a0ec112a08 Dave Kleikamp           2006-10-11  901  	char nbuf[16];
ac27a0ec112a08 Dave Kleikamp           2006-10-11  902  	const char *errstr;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  903  
0db1ff222d40f1 Theodore Ts'o           2017-02-05  904  	if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
0db1ff222d40f1 Theodore Ts'o           2017-02-05  905  		return;
0db1ff222d40f1 Theodore Ts'o           2017-02-05  906  
ac27a0ec112a08 Dave Kleikamp           2006-10-11  907  	/* Special case: if the error is EROFS, and we're not already
ac27a0ec112a08 Dave Kleikamp           2006-10-11  908  	 * inside a transaction, then there's really no point in logging
ac27a0ec112a08 Dave Kleikamp           2006-10-11  909  	 * an error. */
bc98a42c1f7d0f David Howells           2017-07-17  910  	if (errno == -EROFS && journal_current_handle() == NULL && sb_rdonly(sb))
ac27a0ec112a08 Dave Kleikamp           2006-10-11  911  		return;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  912  
efbed4dc5857f8 Theodore Ts'o           2013-10-17  913  	if (ext4_error_ratelimit(sb)) {
617ba13b31fbf5 Mingming Cao            2006-10-11  914  		errstr = ext4_decode_error(sb, errno, nbuf);
c398eda0e43a79 Theodore Ts'o           2010-07-27  915  		printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
c398eda0e43a79 Theodore Ts'o           2010-07-27  916  		       sb->s_id, function, line, errstr);
efbed4dc5857f8 Theodore Ts'o           2013-10-17  917  	}
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @918  	ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
ac27a0ec112a08 Dave Kleikamp           2006-10-11  919  
e789ca0cc1d512 Jan Kara                2020-12-16  920  	ext4_handle_error(sb, false, -errno, 0, 0, function, line);
ac27a0ec112a08 Dave Kleikamp           2006-10-11  921  }
ac27a0ec112a08 Dave Kleikamp           2006-10-11  922  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 41522 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type
  2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
@ 2021-04-27  8:39   ` Amir Goldstein
  0 siblings, 0 replies; 46+ messages in thread
From: Amir Goldstein @ 2021-04-27  8:39 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Expose a new type of fsnotify event for filesystems to report errors for
> userspace monitoring tools.  fanotify will send this type of
> notification for FAN_ERROR marks.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/notify/fsnotify.c             |  2 +-
>  include/linux/fsnotify_backend.h | 16 ++++++++++++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 30d422b8c0fc..9fff35e67b37 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -558,7 +558,7 @@ static __init int fsnotify_init(void)
>  {
>         int ret;
>
> -       BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
> +       BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26);
>
>         ret = init_srcu_struct(&fsnotify_mark_srcu);
>         if (ret)
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index a1a4dd69e5ed..f850bfbe30d4 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -48,6 +48,8 @@
>  #define FS_ACCESS_PERM         0x00020000      /* access event in a permissions hook */
>  #define FS_OPEN_EXEC_PERM      0x00040000      /* open/exec event in a permission hook */
>
> +#define FS_ERROR               0x00100000      /* Used for filesystem error reporting */
> +

Why skip 0x00080000?

Anyway, event bits are starting to run out so I would prefer that you overload
an existing bit used only by inotify/dnotify.

FS_IN_IGNORED is completely internal to inotify and there is no need to
set it in i_fsnotify_mask at all, so if we remove the bit from the output of
inotify_arg_to_mask() no functionality will change and we will be able to
overload the event bit for FS_ERROR (see patch below).

I also kind of like that FS_ERROR is adjacent to FS_UMOUNT and
FS_Q_OVERFLOW :-)

Other FS_IN/FS_DN bits may also be reclaimed but it takes a bit more work
I have patches for those.

Thanks,
Amir.

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 98f61b31745a..351517bae716 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct
inode *inode, u32 arg)
        __u32 mask;

        /*
-        * Everything should accept their own ignored and should receive events
-        * when the inode is unmounted.  All directories care about children.
+        * Everything should receive events when the inode is unmounted.
+        * All directories care about children.
         */
-       mask = (FS_IN_IGNORED | FS_UNMOUNT);
+       mask = FS_UNMOUNT;
        if (S_ISDIR(inode->i_mode))
                mask |= FS_EVENT_ON_CHILD;

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1ce66748a2d2..ecbafb3f36d7 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -42,6 +42,11 @@

 #define FS_UNMOUNT             0x00002000      /* inode on umount fs */
 #define FS_Q_OVERFLOW          0x00004000      /* Event queued overflowed */
+#define FS_ERROR               0x00008000      /* Filesystem error report */
+/*
+ * FS_IN_IGNORED overloads FS_ERROR.  It is only used internally by inotify
+ * which does not support FS_ERROR.
+ */
 #define FS_IN_IGNORED          0x00008000      /* last inotify event here */

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 00/15] File system wide monitoring
  2021-04-27  4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
@ 2021-04-27 15:44   ` Gabriel Krisman Bertazi
  2021-05-11  4:45     ` Khazhy Kumykov
  2021-05-11 10:43   ` Jan Kara
  1 sibling, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-27 15:44 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

Amir Goldstein <amir73il@gmail.com> writes:

> On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> Hi,
>>
>> In an attempt to consolidate some of the feedback from the previous
>> proposals, I wrote a new attempt to solve the file system error reporting
>> problem.  Before I spend more time polishing it, I'd like to hear your
>> feedback if I'm going in the wrong direction, in particular with the
>> modifications to fsnotify.
>>
>
> IMO you are going in the right direction, but you have gone a bit too far ;-)
>
> My understanding of the requirements and my interpretation of the feedback
> from filesystem maintainers is that the missing piece in the ecosystem is a
> user notification that "something went wrong". The "what went wrong" part
> is something that users and admins have long been able to gather from the
> kernel log and from filesystem tools (e.g. last error recorded).
>
> I do not see the need to duplicate existing functionality in fsmonitor.
> Don't get me wrong, I understand why it would have been nice for fsmonitor
> to be able to get all the errors nicely without looking anywhere else, but I
> don't think it justifies the extra complexity.

Hi Amir,

Thanks for the detailed review.

The reasons for the location record and the ring buffer is the use case
from Google to do analysis on a series of errors.  I understand this is
important to them, which is why I expanded a bit on the 'what went
wrong' and multiple errors.  In addition, The file system specific blob
attempts to assist online recovery tools with more information, but it
might make sense to do it in the future, when it is needed.

>> This RFC follows up on my previous proposals which attempted to leverage
>> watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
>> to push error notifications to user space.  This proposal starts by, as
>> suggested by Darrick, limiting the scope of what I'm trying to do to an
>> interface for administrators to monitor the health of a file system,
>> instead of a generic inteface for file errors.  Therefore, this doesn't
>> solve the problem of writeback errors or the need to watch a specific
>> subsystem.
>>
>> * Format
>>
>> The feature is implemented on top of fanotify, as a new type of fanotify
>> mark, FAN_ERROR, which a file system monitoring tool can register to
>
> You have a terminology mistake throughout your series.
> FAN_ERROR is not a type of a mark, it is a type of an event.
> A mark describes the watched object (i.e. a filesystem, mount, inode).

Right.  I understand the mistake and will fix it around the series.
>
>> receive notifications.  A notification is split in three parts, and only
>> the first is guaranteed to exist for any given error event:
>>
>>  - FS generic data: A file system agnostic structure that has a generic
>>  error code and identifies the filesystem.  Basically, it let's
>>  userspace know something happen on a monitored filesystem.
>
> I think an error seq counter per fs would be a nice addition to generic data.
> It does not need to be persistent (it could be if filesystem supports it).

Makes sense to me.

>>
>>  - FS location data: Identifies where in the code the problem
>>  happened. (This is important for the use case of analysing frequent
>>  error points that we discussed earlier).
>>
>>  - FS specific data: A detailed error report in a filesystem specific
>>  format that details what the error is.  Ideally, a capable monitoring
>>  tool can use the information here for error recovery.  For instance,
>>  xfs can put the xfs_scrub structures here, ext4 can send its error
>>  reports, etc.  An example of usage is done in the ext4 patch of this
>>  series.
>>
>> More details on the information in each record can be found on the
>> documentation introduced in patch 15.
>>
>> * Using fanotify
>>
>> Using fanotify for this kind of thing is slightly tricky because we want
>> to guarantee delivery in some complicated conditions, for instance, the
>> file system might want to send an error while holding several locks.
>>
>> Instead of working around file system constraints at the file system
>> level, this proposal tries to make the FAN_ERROR submission safe in
>> those contexts.  This is done with a new mode in fsnotify that
>> preallocates the memory at group creation to be used for the
>> notification submission.
>>
>> This new mode in fsnotify introduces a ring buffer to queue
>> notifications, which eliminates the allocation path in fsnotify.  From
>> what I saw, the allocation is the only problem in fsnotify for
>> filesystems to submit errors in constrained situations.
>>
>
> The ring buffer functionality for fsnotify is interesting and it may be
> useful on its own, but IMO, its too big of a hammer for the problem
> at hand.
>
> The question that you should be asking yourself is what is the
> expected behavior in case of a flood of filesystem corruption errors.
> I think it has already been expressed by filesystem maintainers on
> one your previous postings, that a flood of filesystem corruption
> errors is often noise and the only interesting information is the
> first error.

My idea was be to provide an ioctl for the user to resize the ring
buffer when needed, to make the flood manageable. But I understand your
main point about the ring buffer.  i'm not sure saving only the first
notification solves Google's use case of error monitoring and analysis,
though.  Khazhy, Ted, can you weight in?

> For this reason, I think that FS_ERROR could be implemented
> by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
>
> struct fsnotify_sb_mark {
>         struct fsnotify_mark fsn_mark;
>         struct fsnotify_error_info info;
> }
>
> Similar to fd sampled errseq, there can be only one error report
> per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
> the error report can be allocated at the time of setting the filesystem mark.
>
> With this, you will not need the added complexity of the ring buffer
> and you will not need to limit FAN_ERROR reporting to a group that
> is only listening for FAN_ERROR, which is an unneeded limitation IMO.

The limitation exists because I was concerned about not breaking the
semantics of FAN_ACCESS and others, with regards to merged
notifications.  I believe there should be no other reason why
notifications of FAN_CLASS_NOTIF can't be sent to the ring buffer too.
That limitation could be lifted for everything but permission events, I
think.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 13/15] ext4: Send notifications on error
  2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
  2021-04-26 23:10   ` kernel test robot
  2021-04-27  4:32   ` Amir Goldstein
@ 2021-04-29  0:57   ` Darrick J. Wong
  2 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2021-04-29  0:57 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: amir73il, tytso, david, jack, dhowells, khazhy, linux-fsdevel,
	linux-ext4, kernel

On Mon, Apr 26, 2021 at 02:41:59PM -0400, Gabriel Krisman Bertazi wrote:
> Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> whenever a ext4 error condition is triggered.  This follows the existing
> error conditions in ext4, so it is hooked to the ext4_error* functions.
> 
> It also follows the current dmesg reporting in the format.  The
> filesystem message is composed mostly by the string that would be
> otherwise printed in dmesg.
> 
> A new ext4 specific record format is exposed in the uapi, such that a
> monitoring tool knows what to expect when listening errors of an ext4
> filesystem.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  fs/ext4/super.c                  | 60 ++++++++++++++++++++++++--------
>  include/uapi/linux/ext4-notify.h | 17 +++++++++
>  2 files changed, 62 insertions(+), 15 deletions(-)
>  create mode 100644 include/uapi/linux/ext4-notify.h
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b9693680463a..032e29e7ff6a 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -46,6 +46,8 @@
>  #include <linux/part_stat.h>
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
> +#include <linux/fsnotify.h>
> +#include <uapi/linux/ext4-notify.h>
>  
>  #include "ext4.h"
>  #include "ext4_extents.h"	/* Needed for trace points definition */
> @@ -727,6 +729,22 @@ static void flush_stashed_error_work(struct work_struct *work)
>  	ext4_commit_super(sbi->s_sb);
>  }
>  
> +static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
> +				const char *func, int line,
> +				const char *desc, struct va_format *vaf)
> +{
> +	struct ext4_error_inode_report report;
> +
> +	if (inode->i_sb->s_fsnotify_marks) {
> +		report.inode = inode ? inode->i_ino : -1L;
> +		report.block = block ? block : -1L;
> +
> +		snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
> +
> +		fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
> +	}
> +}
> +
>  #define ext4_error_ratelimit(sb)					\
>  		___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state),	\
>  			     "EXT4-fs error")
> @@ -742,15 +760,18 @@ void __ext4_error(struct super_block *sb, const char *function,
>  		return;
>  
>  	trace_ext4_error(sb, function, line);
> +
> +	va_start(args, fmt);
> +	vaf.fmt = fmt;
> +	vaf.va = &args;
>  	if (ext4_error_ratelimit(sb)) {
> -		va_start(args, fmt);
> -		vaf.fmt = fmt;
> -		vaf.va = &args;
>  		printk(KERN_CRIT
>  		       "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
>  		       sb->s_id, function, line, current->comm, &vaf);
> -		va_end(args);
> +
>  	}
> +	ext4_fsnotify_error(error, sb->s_root->d_inode, block, function, line, NULL, &vaf);
> +	va_end(args);
>  	ext4_handle_error(sb, force_ro, error, 0, block, function, line);
>  }
>  
> @@ -765,10 +786,10 @@ void __ext4_error_inode(struct inode *inode, const char *function,
>  		return;
>  
>  	trace_ext4_error(inode->i_sb, function, line);
> +	va_start(args, fmt);
> +	vaf.fmt = fmt;
> +	vaf.va = &args;
>  	if (ext4_error_ratelimit(inode->i_sb)) {
> -		va_start(args, fmt);
> -		vaf.fmt = fmt;
> -		vaf.va = &args;
>  		if (block)
>  			printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: "
>  			       "inode #%lu: block %llu: comm %s: %pV\n",
> @@ -779,8 +800,11 @@ void __ext4_error_inode(struct inode *inode, const char *function,
>  			       "inode #%lu: comm %s: %pV\n",
>  			       inode->i_sb->s_id, function, line, inode->i_ino,
>  			       current->comm, &vaf);
> -		va_end(args);
>  	}
> +
> +	ext4_fsnotify_error(error, inode, block, function, line, NULL, &vaf);
> +	va_end(args);
> +
>  	ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
>  			  function, line);
>  }
> @@ -798,13 +822,16 @@ void __ext4_error_file(struct file *file, const char *function,
>  		return;
>  
>  	trace_ext4_error(inode->i_sb, function, line);
> +
> +	path = file_path(file, pathname, sizeof(pathname));
> +	if (IS_ERR(path))
> +		path = "(unknown)";
> +
> +	va_start(args, fmt);
> +	vaf.fmt = fmt;
> +	vaf.va = &args;
> +
>  	if (ext4_error_ratelimit(inode->i_sb)) {
> -		path = file_path(file, pathname, sizeof(pathname));
> -		if (IS_ERR(path))
> -			path = "(unknown)";
> -		va_start(args, fmt);
> -		vaf.fmt = fmt;
> -		vaf.va = &args;
>  		if (block)
>  			printk(KERN_CRIT
>  			       "EXT4-fs error (device %s): %s:%d: inode #%lu: "
> @@ -817,8 +844,10 @@ void __ext4_error_file(struct file *file, const char *function,
>  			       "comm %s: path %s: %pV\n",
>  			       inode->i_sb->s_id, function, line, inode->i_ino,
>  			       current->comm, path, &vaf);
> -		va_end(args);
>  	}
> +	ext4_fsnotify_error(EFSCORRUPTED, inode, block, function, line, NULL, &vaf);
> +	va_end(args);
> +
>  	ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
>  			  function, line);
>  }
> @@ -886,6 +915,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
>  		printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
>  		       sb->s_id, function, line, errstr);
>  	}
> +	ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
>  
>  	ext4_handle_error(sb, false, -errno, 0, 0, function, line);
>  }
> diff --git a/include/uapi/linux/ext4-notify.h b/include/uapi/linux/ext4-notify.h
> new file mode 100644
> index 000000000000..31a3bbcafd13
> --- /dev/null
> +++ b/include/uapi/linux/ext4-notify.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * Copyright 2021, Collabora Ltd.
> + */
> +
> +#ifndef EXT4_NOTIFY_H
> +#define EXT4_NOTIFY_H
> +
> +#define EXT4_FSN_DESC_LEN	256
> +
> +struct ext4_error_inode_report {
> +	u64 inode;

I don't have much to contribute this time, other than suggesting that
you might want to encode the inode generation here so that forensics
tools won't waste their time if the inode has been deleted and recreated
in between when the error happens and when the fs gets pulled offline
for analysis.

(...and maybe add a u32 flags field that can remain zero for now)

> +	u64 block;

...and maybe call this "lblk" (assuming this is the logical block offset
within the file?) since that's already in wide use around e2fsprogs and
fs/ext4/.

--D

> +	char desc[EXT4_FSN_DESC_LEN];
> +};
> +
> +#endif
> -- 
> 2.31.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark
@ 2021-04-29 11:31 ` Dan Carpenter
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Carpenter @ 2021-04-29 11:31 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3393 bytes --]

Hi Gabriel,

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/notify/fanotify/fanotify_user.c:112 fanotify_event_len() warn: this array is probably non-NULL. 'fee->fs_data'

vim +112 fs/notify/fanotify/fanotify_user.c

380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   90  static size_t fanotify_event_len(struct fanotify_event *event,
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   91  				 unsigned int fid_mode)
5e469c830fdb5a Amir Goldstein          2019-01-10   92  {
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   93  	size_t event_len = FAN_EVENT_METADATA_LEN;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   94  	struct fanotify_info *info;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   95  	int dir_fh_len;
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26   96  	int fh_len;
929943b38daf81 Amir Goldstein          2020-07-16   97  	int dot_len = 0;
f454fa610a69b9 Amir Goldstein          2020-07-16   98  
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26   99  	if (fanotify_is_error_event(event->mask)) {
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  100  		struct fanotify_error_event *fee = FANOTIFY_EE(event);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  101  		/*
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  102  		 *  Error events (FAN_ERROR) have a different format
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  103  		 *  as follows:
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  104  		 * [ event_metadata            ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  105  		 * [ fs-generic error header   ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  106  		 * [ error location (optional) ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  107  		 * [ fs-specific blob          ]
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  108  		 */
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  109  		event_len = fanotify_error_info_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  110  		if (fee->loc.function)
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  111  			event_len += fanotify_location_info_len(&fee->loc);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26 @112  		if (fee->fs_data)
                                                                            ^^^^^^^^^^^^
This is a zero length array, not a pointer.  It can't be NULL.

6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  113  			event_len += fanotify_error_fsdata_len(fee);
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  114  		return event_len;
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  115  	}
6179a61e1067e6 Gabriel Krisman Bertazi 2021-04-26  116  
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  117  	if (!fid_mode)
380d986e6a7cb0 Gabriel Krisman Bertazi 2021-04-26  118  		return event_len;

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 13/15] ext4: Send notifications on error
@ 2021-04-29 13:19 ` Dan Carpenter
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Carpenter @ 2021-04-29 13:19 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4560 bytes --]

Hi Gabriel,

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210427-024627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next
config: i386-randconfig-m021-20210426 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/ext4/super.c:739 ext4_fsnotify_error() warn: variable dereferenced before check 'inode' (see line 738)
fs/ext4/super.c:918 __ext4_std_error() error: uninitialized symbol 'errstr'.

vim +/inode +739 fs/ext4/super.c

151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  732  static void ext4_fsnotify_error(int error, struct inode *inode, __u64 block,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  733  				const char *func, int line,
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  734  				const char *desc, struct va_format *vaf)
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  735  {
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  736  	struct ext4_error_inode_report report;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  737  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @738  	if (inode->i_sb->s_fsnotify_marks) {
                                                                    ^^^^^
Dereference

151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @739  		report.inode = inode ? inode->i_ino : -1L;
                                                                                       ^^^^^
Check too late.

151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  740  		report.block = block ? block : -1L;
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  741  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  742  		snprintf(report.desc, EXT4_FSN_DESC_LEN, "%s%pV\n", desc?:"", vaf);
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  743  
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  744  		fsnotify_error_event(error, inode, func, line, &report, sizeof(report));
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  745  	}
151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26  746  }

[ snip ]

c398eda0e43a79 Theodore Ts'o           2010-07-27  898  void __ext4_std_error(struct super_block *sb, const char *function,
c398eda0e43a79 Theodore Ts'o           2010-07-27  899  		      unsigned int line, int errno)
ac27a0ec112a08 Dave Kleikamp           2006-10-11  900  {
ac27a0ec112a08 Dave Kleikamp           2006-10-11  901  	char nbuf[16];
ac27a0ec112a08 Dave Kleikamp           2006-10-11  902  	const char *errstr;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  903  
0db1ff222d40f1 Theodore Ts'o           2017-02-05  904  	if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
0db1ff222d40f1 Theodore Ts'o           2017-02-05  905  		return;
0db1ff222d40f1 Theodore Ts'o           2017-02-05  906  
ac27a0ec112a08 Dave Kleikamp           2006-10-11  907  	/* Special case: if the error is EROFS, and we're not already
ac27a0ec112a08 Dave Kleikamp           2006-10-11  908  	 * inside a transaction, then there's really no point in logging
ac27a0ec112a08 Dave Kleikamp           2006-10-11  909  	 * an error. */
bc98a42c1f7d0f David Howells           2017-07-17  910  	if (errno == -EROFS && journal_current_handle() == NULL && sb_rdonly(sb))
ac27a0ec112a08 Dave Kleikamp           2006-10-11  911  		return;
ac27a0ec112a08 Dave Kleikamp           2006-10-11  912  
efbed4dc5857f8 Theodore Ts'o           2013-10-17  913  	if (ext4_error_ratelimit(sb)) {
617ba13b31fbf5 Mingming Cao            2006-10-11  914  		errstr = ext4_decode_error(sb, errno, nbuf);
c398eda0e43a79 Theodore Ts'o           2010-07-27  915  		printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
c398eda0e43a79 Theodore Ts'o           2010-07-27  916  		       sb->s_id, function, line, errstr);
efbed4dc5857f8 Theodore Ts'o           2013-10-17  917  	}

"errstr" not set on error path.

151ead19fe71b5 Gabriel Krisman Bertazi 2021-04-26 @918  	ext4_fsnotify_error(errno, NULL, -1L, function, line, errstr, NULL);
ac27a0ec112a08 Dave Kleikamp           2006-10-11  919  
e789ca0cc1d512 Jan Kara                2020-12-16  920  	ext4_handle_error(sb, false, -errno, 0, 0, function, line);
ac27a0ec112a08 Dave Kleikamp           2006-10-11  921  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer
  2021-04-27  5:39   ` Amir Goldstein
@ 2021-04-29 18:33     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-29 18:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

Amir Goldstein <amir73il@gmail.com> writes:

> On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> In order to support file system health/error reporting over fanotify,
>> fsnotify needs to expose a submission path that doesn't allow sleeping.
>> The only problem I identified with the current submission path is the
>> need to dynamically allocate memory for the event queue.
>>
>> This patch avoids the problem by introducing a new mode in fsnotify,
>> where a ring buffer is used to submit events for a group.  Each group
>> has its own ring buffer, and error notifications are submitted
>> exclusively through it.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>>  fs/notify/Makefile               |   2 +-
>>  fs/notify/group.c                |  12 +-
>>  fs/notify/notification.c         |  10 ++
>>  fs/notify/ring.c                 | 199 +++++++++++++++++++++++++++++++
>>  include/linux/fsnotify_backend.h |  37 +++++-
>>  5 files changed, 255 insertions(+), 5 deletions(-)
>>  create mode 100644 fs/notify/ring.c
>>
>> diff --git a/fs/notify/Makefile b/fs/notify/Makefile
>> index 63a4b8828df4..61dae1e90f2d 100644
>> --- a/fs/notify/Makefile
>> +++ b/fs/notify/Makefile
>> @@ -1,6 +1,6 @@
>>  # SPDX-License-Identifier: GPL-2.0
>>  obj-$(CONFIG_FSNOTIFY)         += fsnotify.o notification.o group.o mark.o \
>> -                                  fdinfo.o
>> +                                  fdinfo.o ring.o
>>
>>  obj-y                  += dnotify/
>>  obj-y                  += inotify/
>> diff --git a/fs/notify/group.c b/fs/notify/group.c
>> index 08acb1afc0c2..b99b3de36696 100644
>> --- a/fs/notify/group.c
>> +++ b/fs/notify/group.c
>> @@ -81,7 +81,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
>>          * notification against this group. So clearing the notification queue
>>          * of all events is reliable now.
>>          */
>> -       fsnotify_flush_notify(group);
>> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> +               fsnotify_free_ring_buffer(group);
>> +       else
>> +               fsnotify_flush_notify(group);
>>
>>         /*
>>          * Destroy overflow event (we cannot use fsnotify_destroy_event() as
>> @@ -136,6 +139,13 @@ static struct fsnotify_group *__fsnotify_alloc_group(
>>         group->ops = ops;
>>         group->flags = flags;
>>
>> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
>> +               if (fsnotify_create_ring_buffer(group)) {
>> +                       kfree(group);
>> +                       return ERR_PTR(-ENOMEM);
>> +               }
>> +       }
>> +
>>         return group;
>>  }
>>
>> diff --git a/fs/notify/notification.c b/fs/notify/notification.c
>> index 75d79d6d3ef0..32f97e7b7a80 100644
>> --- a/fs/notify/notification.c
>> +++ b/fs/notify/notification.c
>> @@ -51,6 +51,10 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
>>  bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
>>  {
>>         assert_spin_locked(&group->notification_lock);
>> +
>> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> +               return fsnotify_ring_notify_queue_is_empty(group);
>> +
>>         return list_empty(&group->notification_list) ? true : false;
>>  }
>>
>> @@ -132,6 +136,9 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group,
>>                                   struct fsnotify_event *event)
>>  {
>>         assert_spin_locked(&group->notification_lock);
>> +
>> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> +               return;
>>         /*
>>          * We need to init list head for the case of overflow event so that
>>          * check in fsnotify_add_event() works
>> @@ -166,6 +173,9 @@ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
>>  {
>>         assert_spin_locked(&group->notification_lock);
>>
>> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER)
>> +               return fsnotify_ring_peek_first_event(group);
>> +
>>         return list_first_entry(&group->notification_list,
>>                                 struct fsnotify_event, list);
>>  }
>> diff --git a/fs/notify/ring.c b/fs/notify/ring.c
>> new file mode 100644
>> index 000000000000..75e8af1f8d80
>> --- /dev/null
>> +++ b/fs/notify/ring.c
>> @@ -0,0 +1,199 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/types.h>
>> +#include <linux/fsnotify.h>
>> +#include <linux/memcontrol.h>
>> +
>> +#define INVALID_RING_SLOT -1
>> +
>> +#define FSNOTIFY_RING_PAGES 16
>> +
>> +#define NEXT_SLOT(cur, len, ring_size) ((cur + len) & (ring_size-1))
>> +#define NEXT_PAGE(cur, ring_size) (round_up(cur, PAGE_SIZE) & (ring_size-1))
>> +
>> +bool fsnotify_ring_notify_queue_is_empty(struct fsnotify_group *group)
>> +{
>> +       assert_spin_locked(&group->notification_lock);
>> +
>> +       if (group->ring_buffer.tail == group->ring_buffer.head)
>> +               return true;
>> +       return false;
>> +}
>> +
>> +struct fsnotify_event *fsnotify_ring_peek_first_event(struct fsnotify_group *group)
>> +{
>> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> +       struct fsnotify_event *fsn;
>> +       char *kaddr;
>> +       u64 tail;
>> +
>> +       assert_spin_locked(&group->notification_lock);
>> +
>> +again:
>> +       tail = group->ring_buffer.tail;
>> +
>> +       if ((PAGE_SIZE - (tail & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
>> +               group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
>> +               goto again;
>> +       }
>> +
>> +       kaddr = kmap_atomic(group->ring_buffer.pages[tail / PAGE_SIZE]);
>> +       if (!kaddr)
>> +               return NULL;
>> +       fsn = (struct fsnotify_event *) (kaddr + (tail & (PAGE_SIZE-1)));
>> +
>> +       if (fsn->slot_len == INVALID_RING_SLOT) {
>> +               group->ring_buffer.tail = NEXT_PAGE(tail, ring_size);
>> +               kunmap_atomic(kaddr);
>> +               goto again;
>> +       }
>> +
>> +       /* will be unmapped when entry is consumed. */
>> +       return fsn;
>> +}
>> +
>> +void fsnotify_ring_buffer_consume_event(struct fsnotify_group *group,
>> +                                       struct fsnotify_event *event)
>> +{
>> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> +       u64 new_tail = NEXT_SLOT(group->ring_buffer.tail, event->slot_len, ring_size);
>> +
>> +       kunmap_atomic(event);
>> +
>> +       pr_debug("%s: group=%p tail=%llx->%llx ring_size=%llu\n", __func__,
>> +                group, group->ring_buffer.tail, new_tail, ring_size);
>> +
>> +       WRITE_ONCE(group->ring_buffer.tail, new_tail);
>> +}
>> +
>> +struct fsnotify_event *fsnotify_ring_alloc_event_slot(struct fsnotify_group *group,
>> +                                                     size_t size)
>> +       __acquires(&group->notification_lock)
>> +{
>> +       struct fsnotify_event *fsn;
>> +       u64 head, tail;
>> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> +       u64 new_head;
>> +       void *kaddr;
>> +
>> +       if (WARN_ON(!(group->flags & FSN_SUBMISSION_RING_BUFFER) || size > PAGE_SIZE))
>> +               return ERR_PTR(-EINVAL);
>> +
>> +       pr_debug("%s: start group=%p ring_size=%llu, requested=%lu\n", __func__, group,
>> +                ring_size, size);
>> +
>> +       spin_lock(&group->notification_lock);
>> +again:
>> +       head = group->ring_buffer.head;
>> +       tail = group->ring_buffer.tail;
>> +       new_head = NEXT_SLOT(head, size, ring_size);
>> +
>> +       /* head would catch up to tail, corrupting an entry. */
>> +       if ((head < tail && new_head > tail) || (head > new_head && new_head > tail)) {
>> +               fsn = ERR_PTR(-ENOMEM);
>> +               goto err;
>> +       }
>> +
>> +       /*
>> +        * Not event a skip message fits in the page. We can detect the
>> +        * lack of space. Move on to the next page.
>> +        */
>> +       if ((PAGE_SIZE - (head & (PAGE_SIZE-1))) < sizeof(struct fsnotify_event)) {
>> +               /* Start again on next page */
>> +               group->ring_buffer.head = NEXT_PAGE(head, ring_size);
>> +               goto again;
>> +       }
>> +
>> +       kaddr = kmap_atomic(group->ring_buffer.pages[head / PAGE_SIZE]);
>> +       if (!kaddr) {
>> +               fsn = ERR_PTR(-EFAULT);
>> +               goto err;
>> +       }
>> +
>> +       fsn = (struct fsnotify_event *) (kaddr + (head & (PAGE_SIZE-1)));
>> +
>> +       if ((head >> PAGE_SHIFT) != (new_head >> PAGE_SHIFT)) {
>> +               /*
>> +                * No room in the current page.  Add a fake entry
>> +                * consuming the end the page to avoid splitting event
>> +                * structure.
>> +                */
>> +               fsn->slot_len = INVALID_RING_SLOT;
>> +               kunmap_atomic(kaddr);
>> +               /* Start again on the next page */
>> +               group->ring_buffer.head = NEXT_PAGE(head, ring_size);
>> +
>> +               goto again;
>> +       }
>> +       fsn->slot_len = size;
>> +
>> +       return fsn;
>> +
>> +err:
>> +       spin_unlock(&group->notification_lock);
>> +       return fsn;
>> +}
>> +
>> +void fsnotify_ring_commit_slot(struct fsnotify_group *group, struct fsnotify_event *fsn)
>> +       __releases(&group->notification_lock)
>> +{
>> +       u64 ring_size = group->ring_buffer.nr_pages << PAGE_SHIFT;
>> +       u64 head = group->ring_buffer.head;
>> +       u64 new_head = NEXT_SLOT(head, fsn->slot_len, ring_size);
>> +
>> +       pr_debug("%s: group=%p head=%llx->%llx ring_size=%llu\n", __func__,
>> +                group, head, new_head, ring_size);
>> +
>> +       kunmap_atomic(fsn);
>> +       group->ring_buffer.head = new_head;
>> +
>> +       spin_unlock(&group->notification_lock);
>> +
>> +       wake_up(&group->notification_waitq);
>> +       kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
>> +
>> +}
>> +
>> +void fsnotify_free_ring_buffer(struct fsnotify_group *group)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i < group->ring_buffer.nr_pages; i++)
>> +               __free_page(group->ring_buffer.pages[i]);
>> +       kfree(group->ring_buffer.pages);
>> +       group->ring_buffer.nr_pages = 0;
>> +}
>> +
>> +int fsnotify_create_ring_buffer(struct fsnotify_group *group)
>> +{
>> +       int nr_pages = FSNOTIFY_RING_PAGES;
>> +       int i;
>> +
>> +       pr_debug("%s: group=%p pages=%d\n", __func__, group, nr_pages);
>> +
>> +       group->ring_buffer.pages = kmalloc_array(nr_pages, sizeof(struct pages *),
>> +                                                GFP_KERNEL);
>> +       if (!group->ring_buffer.pages)
>> +               return -ENOMEM;
>> +
>> +       group->ring_buffer.head = 0;
>> +       group->ring_buffer.tail = 0;
>> +
>> +       for (i = 0; i < nr_pages; i++) {
>> +               group->ring_buffer.pages[i] = alloc_pages(GFP_KERNEL, 1);
>> +               if (!group->ring_buffer.pages)
>> +                       goto err_dealloc;
>> +       }
>> +
>> +       group->ring_buffer.nr_pages = nr_pages;
>> +
>> +       return 0;
>> +
>> +err_dealloc:
>> +       for (--i; i >= 0; i--)
>> +               __free_page(group->ring_buffer.pages[i]);
>> +       kfree(group->ring_buffer.pages);
>> +       group->ring_buffer.nr_pages = 0;
>> +       return -ENOMEM;
>> +}
>> +
>> +
>
> Nothing in this file is fsnotify specific.
> Is there no kernel lib implementation for this already?
> If there isn't (I'd be very surprised) please put this in lib/ and post it
> for wider review including self tests.

About the implementation, the only generic code I could find is
include/linux/circ_buf.h, but it doesn't really do much or fit well
here.  For instance, it doesn't deal well with non-contiguous pages.

There are other smarter implementations around the kernel like
Documentation/trace/ring-buffer-design.txt, that would be a better
candidate to be a generic ring buffer in lib/, but I admit to haven't
checked them well enough to see if they would solve the problem for
fsnotify, which has a very simple ring buffer anyway.

>> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
>> index 190c6a402e98..a1a4dd69e5ed 100644
>> --- a/include/linux/fsnotify_backend.h
>> +++ b/include/linux/fsnotify_backend.h
>> @@ -74,6 +74,8 @@
>>  #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
>>                                   FS_OPEN_EXEC_PERM)
>>
>> +#define FSN_SUBMISSION_RING_BUFFER     0x00000080
>
> FSNOTIFY_GROUP_FLAG_RING_BUFFER please (or FSN_GROUP_ if you must)
> and please define this above struct fsnotify_group, even right above the flags
> field like FSNOTIFY_CONN_FLAG_HAS_FSID
>
> *IF* we go this way :)
>
> Thanks,
> Amir.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 06/15] fanotify: Support submission through ring buffer
  2021-04-27  6:02   ` Amir Goldstein
@ 2021-04-29 18:36     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-29 18:36 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

Amir Goldstein <amir73il@gmail.com> writes:

> On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> This adds support for the ring buffer mode in fanotify.  It is enabled
>> by a new flag FAN_PREALLOC_QUEUE passed to fanotify_init.  If this flag
>> is enabled, the group only allows marks that support the ring buffer
>
> I don't like this limitation.
> I think FAN_PREALLOC_QUEUE can work with other events, why not?

The only complications I see are permission events and mergeable events,
which would no longer be merged.  The merging problem is not big,
except it changes the existing expectations.  Other than that, it should
be trivial to have every FAN_CLASS_NOTIF events in the ring buffer.
>
> In any case if we keep ring buffer, please use a different set of
> fanotify_ring_buffer_ops struct instead of spraying if/else all over the
> event queue implementation.
>
>> submission.  In a following patch, FAN_ERROR will make use of this
>> mechanism.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>>  fs/notify/fanotify/fanotify.c      | 77 +++++++++++++++++++---------
>>  fs/notify/fanotify/fanotify_user.c | 81 ++++++++++++++++++------------
>>  include/linux/fanotify.h           |  5 +-
>>  include/uapi/linux/fanotify.h      |  1 +
>>  4 files changed, 105 insertions(+), 59 deletions(-)
>>
>> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
>> index e3669d8a4a64..98591a8155a7 100644
>> --- a/fs/notify/fanotify/fanotify.c
>> +++ b/fs/notify/fanotify/fanotify.c
>> @@ -612,6 +612,26 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>>         return event;
>>  }
>>
>> +static struct fanotify_event *fanotify_ring_get_slot(struct fsnotify_group *group,
>> +                                                    u32 mask, const void *data,
>> +                                                    int data_type)
>> +{
>> +       size_t size = 0;
>> +
>> +       pr_debug("%s: group=%p mask=%x size=%lu\n", __func__, group, mask, size);
>> +
>> +       return FANOTIFY_E(fsnotify_ring_alloc_event_slot(group, size));
>> +}
>> +
>> +static void fanotify_ring_write_event(struct fsnotify_group *group,
>> +                                     struct fanotify_event *event, u32 mask,
>> +                                     const void *data, __kernel_fsid_t *fsid)
>> +{
>> +       fanotify_init_event(group, event, 0, mask);
>> +
>> +       event->pid = get_pid(task_tgid(current));
>> +}
>> +
>>  /*
>>   * Get cached fsid of the filesystem containing the object from any connector.
>>   * All connectors are supposed to have the same fsid, but we do not verify that
>> @@ -701,31 +721,38 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>>                         return 0;
>>         }
>>
>> -       event = fanotify_alloc_event(group, mask, data, data_type, dir,
>> -                                    file_name, &fsid);
>> -       ret = -ENOMEM;
>> -       if (unlikely(!event)) {
>> -               /*
>> -                * We don't queue overflow events for permission events as
>> -                * there the access is denied and so no event is in fact lost.
>> -                */
>> -               if (!fanotify_is_perm_event(mask))
>> -                       fsnotify_queue_overflow(group);
>> -               goto finish;
>> -       }
>> -
>> -       fsn_event = &event->fse;
>> -       ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
>> -       if (ret) {
>> -               /* Permission events shouldn't be merged */
>> -               BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
>> -               /* Our event wasn't used in the end. Free it. */
>> -               fsnotify_destroy_event(group, fsn_event);
>> -
>> -               ret = 0;
>> -       } else if (fanotify_is_perm_event(mask)) {
>> -               ret = fanotify_get_response(group, FANOTIFY_PERM(event),
>> -                                           iter_info);
>> +       if (group->flags & FSN_SUBMISSION_RING_BUFFER) {
>> +               event = fanotify_ring_get_slot(group, mask, data, data_type);
>> +               if (IS_ERR(event))
>> +                       return PTR_ERR(event);
>
> So no FAN_OVERFLOW with the ring buffer implementation?
> This will be unexpected for fanotify users and frankly, less useful IMO.
> I also don't see the technical reason to omit the overflow event.
>
>> +               fanotify_ring_write_event(group, event, mask, data, &fsid);
>> +               fsnotify_ring_commit_slot(group, &event->fse);
>> +       } else {
>> +               event = fanotify_alloc_event(group, mask, data, data_type, dir,
>> +                                            file_name, &fsid);
>> +               ret = -ENOMEM;
>> +               if (unlikely(!event)) {
>> +                       /*
>> +                        * We don't queue overflow events for permission events as
>> +                        * there the access is denied and so no event is in fact lost.
>> +                        */
>> +                       if (!fanotify_is_perm_event(mask))
>> +                               fsnotify_queue_overflow(group);
>> +                       goto finish;
>> +               }
>> +               fsn_event = &event->fse;
>> +               ret = fsnotify_add_event(group, fsn_event, fanotify_merge);
>> +               if (ret) {
>> +                       /* Permission events shouldn't be merged */
>> +                       BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
>> +                       /* Our event wasn't used in the end. Free it. */
>> +                       fsnotify_destroy_event(group, fsn_event);
>> +
>> +                       ret = 0;
>> +               } else if (fanotify_is_perm_event(mask)) {
>> +                       ret = fanotify_get_response(group, FANOTIFY_PERM(event),
>> +                                                   iter_info);
>> +               }
>>         }
>>  finish:
>>         if (fanotify_is_perm_event(mask))
>> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
>> index fe605359af88..5031198bf7db 100644
>> --- a/fs/notify/fanotify/fanotify_user.c
>> +++ b/fs/notify/fanotify/fanotify_user.c
>> @@ -521,7 +521,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
>>                  * Permission events get queued to wait for response.  Other
>>                  * events can be destroyed now.
>>                  */
>> -               if (!fanotify_is_perm_event(event->mask)) {
>> +               if (group->fanotify_data.flags & FAN_PREALLOC_QUEUE) {
>> +                       fsnotify_ring_buffer_consume_event(group, &event->fse);
>> +               } else if (!fanotify_is_perm_event(event->mask)) {
>>                         fsnotify_destroy_event(group, &event->fse);
>>                 } else {
>>                         if (ret <= 0) {
>> @@ -587,40 +589,39 @@ static int fanotify_release(struct inode *ignored, struct file *file)
>>          */
>>         fsnotify_group_stop_queueing(group);
>>
>> -       /*
>> -        * Process all permission events on access_list and notification queue
>> -        * and simulate reply from userspace.
>> -        */
>> -       spin_lock(&group->notification_lock);
>> -       while (!list_empty(&group->fanotify_data.access_list)) {
>> -               struct fanotify_perm_event *event;
>> -
>> -               event = list_first_entry(&group->fanotify_data.access_list,
>> -                               struct fanotify_perm_event, fae.fse.list);
>> -               list_del_init(&event->fae.fse.list);
>> -               finish_permission_event(group, event, FAN_ALLOW);
>> +       if (!(group->flags & FSN_SUBMISSION_RING_BUFFER)) {
>> +               /*
>> +                * Process all permission events on access_list and notification queue
>> +                * and simulate reply from userspace.
>> +                */
>>                 spin_lock(&group->notification_lock);
>> -       }
>> -
>> -       /*
>> -        * Destroy all non-permission events. For permission events just
>> -        * dequeue them and set the response. They will be freed once the
>> -        * response is consumed and fanotify_get_response() returns.
>> -        */
>> -       while (!fsnotify_notify_queue_is_empty(group)) {
>> -               struct fanotify_event *event;
>> -
>> -               event = FANOTIFY_E(fsnotify_remove_first_event(group));
>> -               if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
>> -                       spin_unlock(&group->notification_lock);
>> -                       fsnotify_destroy_event(group, &event->fse);
>> -               } else {
>> -                       finish_permission_event(group, FANOTIFY_PERM(event),
>> -                                               FAN_ALLOW);
>> +               while (!list_empty(&group->fanotify_data.access_list)) {
>> +                       struct fanotify_perm_event *event;
>> +                       event = list_first_entry(&group->fanotify_data.access_list,
>> +                                                struct fanotify_perm_event, fae.fse.list);
>> +                       list_del_init(&event->fae.fse.list);
>> +                       finish_permission_event(group, event, FAN_ALLOW);
>> +                       spin_lock(&group->notification_lock);
>>                 }
>> -               spin_lock(&group->notification_lock);
>> +               /*
>> +                * Destroy all non-permission events. For permission events just
>> +                * dequeue them and set the response. They will be freed once the
>> +                * response is consumed and fanotify_get_response() returns.
>> +                */
>> +               while (!fsnotify_notify_queue_is_empty(group)) {
>> +                       struct fanotify_event *event;
>> +                       event = FANOTIFY_E(fsnotify_remove_first_event(group));
>> +                       if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
>> +                               spin_unlock(&group->notification_lock);
>> +                               fsnotify_destroy_event(group, &event->fse);
>> +                       } else {
>> +                               finish_permission_event(group, FANOTIFY_PERM(event),
>> +                                                       FAN_ALLOW);
>> +                       }
>> +                       spin_lock(&group->notification_lock);
>> +               }
>> +               spin_unlock(&group->notification_lock);
>>         }
>> -       spin_unlock(&group->notification_lock);
>>
>>         /* Response for all permission events it set, wakeup waiters */
>>         wake_up(&group->fanotify_data.access_waitq);
>> @@ -981,6 +982,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>>         if (flags & FAN_NONBLOCK)
>>                 f_flags |= O_NONBLOCK;
>>
>> +       if (flags & FAN_PREALLOC_QUEUE) {
>> +               if (!capable(CAP_SYS_ADMIN))
>> +                       return -EPERM;
>> +
>> +               if (flags & FAN_UNLIMITED_QUEUE)
>> +                       return -EINVAL;
>> +
>> +               fsn_flags = FSN_SUBMISSION_RING_BUFFER;
>> +       }
>> +
>>         /* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
>>         group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops, fsn_flags);
>>         if (IS_ERR(group)) {
>> @@ -1223,6 +1234,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>>                 goto fput_and_out;
>>         }
>>
>> +       if ((group->flags & FSN_SUBMISSION_RING_BUFFER) &&
>> +           (mask & ~FANOTIFY_SUBMISSION_BUFFER_EVENTS))
>> +               goto fput_and_out;
>> +
>>         ret = fanotify_find_path(dfd, pathname, &path, flags,
>>                         (mask & ALL_FSNOTIFY_EVENTS), obj_type);
>>         if (ret)
>> @@ -1327,7 +1342,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
>>   */
>>  static int __init fanotify_user_setup(void)
>>  {
>> -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
>> +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
>>         BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>>
>>         fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
>> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
>> index 3e9c56ee651f..5a4cefb4b1c3 100644
>> --- a/include/linux/fanotify.h
>> +++ b/include/linux/fanotify.h
>> @@ -23,7 +23,8 @@
>>  #define FANOTIFY_INIT_FLAGS    (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
>>                                  FAN_REPORT_TID | \
>>                                  FAN_CLOEXEC | FAN_NONBLOCK | \
>> -                                FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
>> +                                FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
>> +                                FAN_PREALLOC_QUEUE)
>>
>>  #define FANOTIFY_MARK_TYPE_BITS        (FAN_MARK_INODE | FAN_MARK_MOUNT | \
>>                                  FAN_MARK_FILESYSTEM)
>> @@ -71,6 +72,8 @@
>>                                          FANOTIFY_PERM_EVENTS | \
>>                                          FAN_Q_OVERFLOW | FAN_ONDIR)
>>
>> +#define FANOTIFY_SUBMISSION_BUFFER_EVENTS 0
>
> FANOTIFY_RING_BUFFER_EVENTS? FANOTIFY_PREALLOC_EVENTS?
>
> Please leave a comment above to state what this group means.
> I *think* there is no reason to limit the set of events, only the sort of
> information that is possible with FAN_PREALLOC_QUEUE.
>
> Perhaps FAN_REPORT_FID cannot be allowed and as a result
> FANOTIFY_INODE_EVENTS will not be allowed, but I am not even
> sure if that limitation is needed.
>
> Thanks,
> Amir.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 10/15] fanotify: Introduce code location record
  2021-04-27  7:11   ` Amir Goldstein
@ 2021-04-29 18:40     ` Gabriel Krisman Bertazi
  2021-05-11  5:35       ` Khazhy Kumykov
  0 siblings, 1 reply; 46+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-04-29 18:40 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Tso, Darrick J. Wong, Dave Chinner, Jan Kara,
	David Howells, Khazhismel Kumykov, linux-fsdevel, Ext4, kernel

Amir Goldstein <amir73il@gmail.com> writes:

> On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> This patch introduces an optional info record that describes the
>> source (as in the region of the source-code where an event was
>> initiated).  This record is not produced for other type of existing
>> notification, but it is optionally enabled for FAN_ERROR notifications.
>>
>
> I find this functionality controversial, because think that the fs provided
> s_last_error*, s_first_error* is more reliable and more powerful than this
> functionality.
>
> Let's leave it for a future extending proposal, should fanotify event reporting
> proposal pass muster, shall we?
> Or do you think that without this optional extension fanotify event reporting
> will not be valuable enough?

I think it is valuable enough without this bit, at least on a first
moment.  I understand it would be useful for ext4 to analyse information
through this interface, but the main priority is to have a way to push
out the information that an error occured, as you mentioned.

Also, this might be more powerful if we stick to the ring buffer instead
of single stlot, as it would allow more data to be collected than just
first/last.
>
> Thanks,
> Amir.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 00/15] File system wide monitoring
  2021-04-27 15:44   ` Gabriel Krisman Bertazi
@ 2021-05-11  4:45     ` Khazhy Kumykov
  0 siblings, 0 replies; 46+ messages in thread
From: Khazhy Kumykov @ 2021-05-11  4:45 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Amir Goldstein, Theodore Tso, Darrick J. Wong, Dave Chinner,
	Jan Kara, David Howells, linux-fsdevel, Ext4, kernel

[-- Attachment #1: Type: text/plain, Size: 7402 bytes --]

On Tue, Apr 27, 2021 at 8:44 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
>
> > On Mon, Apr 26, 2021 at 9:42 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> >>
> >> Hi,
> >>
> >> In an attempt to consolidate some of the feedback from the previous
> >> proposals, I wrote a new attempt to solve the file system error reporting
> >> problem.  Before I spend more time polishing it, I'd like to hear your
> >> feedback if I'm going in the wrong direction, in particular with the
> >> modifications to fsnotify.
> >>
> >
> > IMO you are going in the right direction, but you have gone a bit too far ;-)
> >
> > My understanding of the requirements and my interpretation of the feedback
> > from filesystem maintainers is that the missing piece in the ecosystem is a
> > user notification that "something went wrong". The "what went wrong" part
> > is something that users and admins have long been able to gather from the
> > kernel log and from filesystem tools (e.g. last error recorded).
> >
> > I do not see the need to duplicate existing functionality in fsmonitor.
> > Don't get me wrong, I understand why it would have been nice for fsmonitor
> > to be able to get all the errors nicely without looking anywhere else, but I
> > don't think it justifies the extra complexity.
>
> Hi Amir,
>
> Thanks for the detailed review.
>
> The reasons for the location record and the ring buffer is the use case
> from Google to do analysis on a series of errors.  I understand this is
> important to them, which is why I expanded a bit on the 'what went
> wrong' and multiple errors.  In addition, The file system specific blob
> attempts to assist online recovery tools with more information, but it
> might make sense to do it in the future, when it is needed.
>
> >> This RFC follows up on my previous proposals which attempted to leverage
> >> watch_queue[1] and fsnotify[2] to provide a mechanism for file systems
> >> to push error notifications to user space.  This proposal starts by, as
> >> suggested by Darrick, limiting the scope of what I'm trying to do to an
> >> interface for administrators to monitor the health of a file system,
> >> instead of a generic inteface for file errors.  Therefore, this doesn't
> >> solve the problem of writeback errors or the need to watch a specific
> >> subsystem.
> >>
> >> * Format
> >>
> >> The feature is implemented on top of fanotify, as a new type of fanotify
> >> mark, FAN_ERROR, which a file system monitoring tool can register to
> >
> > You have a terminology mistake throughout your series.
> > FAN_ERROR is not a type of a mark, it is a type of an event.
> > A mark describes the watched object (i.e. a filesystem, mount, inode).
>
> Right.  I understand the mistake and will fix it around the series.
> >
> >> receive notifications.  A notification is split in three parts, and only
> >> the first is guaranteed to exist for any given error event:
> >>
> >>  - FS generic data: A file system agnostic structure that has a generic
> >>  error code and identifies the filesystem.  Basically, it let's
> >>  userspace know something happen on a monitored filesystem.
> >
> > I think an error seq counter per fs would be a nice addition to generic data.
> > It does not need to be persistent (it could be if filesystem supports it).
>
> Makes sense to me.
>
> >>
> >>  - FS location data: Identifies where in the code the problem
> >>  happened. (This is important for the use case of analysing frequent
> >>  error points that we discussed earlier).
> >>
> >>  - FS specific data: A detailed error report in a filesystem specific
> >>  format that details what the error is.  Ideally, a capable monitoring
> >>  tool can use the information here for error recovery.  For instance,
> >>  xfs can put the xfs_scrub structures here, ext4 can send its error
> >>  reports, etc.  An example of usage is done in the ext4 patch of this
> >>  series.
> >>
> >> More details on the information in each record can be found on the
> >> documentation introduced in patch 15.
> >>
> >> * Using fanotify
> >>
> >> Using fanotify for this kind of thing is slightly tricky because we want
> >> to guarantee delivery in some complicated conditions, for instance, the
> >> file system might want to send an error while holding several locks.
> >>
> >> Instead of working around file system constraints at the file system
> >> level, this proposal tries to make the FAN_ERROR submission safe in
> >> those contexts.  This is done with a new mode in fsnotify that
> >> preallocates the memory at group creation to be used for the
> >> notification submission.
> >>
> >> This new mode in fsnotify introduces a ring buffer to queue
> >> notifications, which eliminates the allocation path in fsnotify.  From
> >> what I saw, the allocation is the only problem in fsnotify for
> >> filesystems to submit errors in constrained situations.
> >>
> >
> > The ring buffer functionality for fsnotify is interesting and it may be
> > useful on its own, but IMO, its too big of a hammer for the problem
> > at hand.
> >
> > The question that you should be asking yourself is what is the
> > expected behavior in case of a flood of filesystem corruption errors.
> > I think it has already been expressed by filesystem maintainers on
> > one your previous postings, that a flood of filesystem corruption
> > errors is often noise and the only interesting information is the
> > first error.
>
> My idea was be to provide an ioctl for the user to resize the ring
> buffer when needed, to make the flood manageable. But I understand your
> main point about the ring buffer.  i'm not sure saving only the first
> notification solves Google's use case of error monitoring and analysis,
> though.  Khazhy, Ted, can you weight in?

I think this is a good point to bring up - a flood of errors shouldn't
drown out other filesystems, and from the perspective of error
reporting, it's better to drop all but one notification from one FS
than to drop the only notification from another. In the cases I can
think of, the first error is probably enough and does simplify things
quite a bit. There is the option of setting up a ring buffer per fs,
which does seem excessive in light of the previous statement.

>
> > For this reason, I think that FS_ERROR could be implemented
> > by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
> >
> > struct fsnotify_sb_mark {
> >         struct fsnotify_mark fsn_mark;
> >         struct fsnotify_error_info info;
> > }
> >
> > Similar to fd sampled errseq, there can be only one error report
> > per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
> > the error report can be allocated at the time of setting the filesystem mark.
> >
> > With this, you will not need the added complexity of the ring buffer
> > and you will not need to limit FAN_ERROR reporting to a group that
> > is only listening for FAN_ERROR, which is an unneeded limitation IMO.
>
> The limitation exists because I was concerned about not breaking the
> semantics of FAN_ACCESS and others, with regards to merged
> notifications.  I believe there should be no other reason why
> notifications of FAN_CLASS_NOTIF can't be sent to the ring buffer too.
> That limitation could be lifted for everything but permission events, I
> think.
>
> --
> Gabriel Krisman Bertazi

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3996 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 10/15] fanotify: Introduce code location record
  2021-04-29 18:40     ` Gabriel Krisman Bertazi
@ 2021-05-11  5:35       ` Khazhy Kumykov
  0 siblings, 0 replies; 46+ messages in thread
From: Khazhy Kumykov @ 2021-05-11  5:35 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Amir Goldstein, Theodore Tso, Darrick J. Wong, Dave Chinner,
	Jan Kara, David Howells, linux-fsdevel, Ext4, kernel

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

On Thu, Apr 29, 2021 at 11:40 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
>
> > On Mon, Apr 26, 2021 at 9:43 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> >>
> >> This patch introduces an optional info record that describes the
> >> source (as in the region of the source-code where an event was
> >> initiated).  This record is not produced for other type of existing
> >> notification, but it is optionally enabled for FAN_ERROR notifications.
> >>
> >
> > I find this functionality controversial, because think that the fs provided
> > s_last_error*, s_first_error* is more reliable and more powerful than this
> > functionality.
> >
> > Let's leave it for a future extending proposal, should fanotify event reporting
> > proposal pass muster, shall we?
> > Or do you think that without this optional extension fanotify event reporting
> > will not be valuable enough?
>
> I think it is valuable enough without this bit, at least on a first
> moment.  I understand it would be useful for ext4 to analyse information
> through this interface, but the main priority is to have a way to push
> out the information that an error occured, as you mentioned.

Ack, if it's deemed cleaner we could look at sysfs on notification,
but having the information in the same event provides some convenience
factor, and avoids racing in the event that we're looking at an error
after the first one.

>
> Also, this might be more powerful if we stick to the ring buffer instead
> of single stlot, as it would allow more data to be collected than just
> first/last.
> >
> > Thanks,
> > Amir.
>
> --
> Gabriel Krisman Bertazi

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3996 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC 00/15] File system wide monitoring
  2021-04-27  4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
  2021-04-27 15:44   ` Gabriel Krisman Bertazi
@ 2021-05-11 10:43   ` Jan Kara
  1 sibling, 0 replies; 46+ messages in thread
From: Jan Kara @ 2021-05-11 10:43 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Gabriel Krisman Bertazi, Theodore Tso, Darrick J. Wong,
	Dave Chinner, Jan Kara, David Howells, Khazhismel Kumykov,
	linux-fsdevel, Ext4, kernel

On Tue 27-04-21 07:11:49, Amir Goldstein wrote:
> The ring buffer functionality for fsnotify is interesting and it may be
> useful on its own, but IMO, its too big of a hammer for the problem
> at hand.
> 
> The question that you should be asking yourself is what is the
> expected behavior in case of a flood of filesystem corruption errors.
> I think it has already been expressed by filesystem maintainers on
> one your previous postings, that a flood of filesystem corruption
> errors is often noise and the only interesting information is the first error.
> 
> For this reason, I think that FS_ERROR could be implemented
> by attaching an fsnotify_error_info object to an fsnotify_sb_mark:
> 
> struct fsnotify_sb_mark {
>         struct fsnotify_mark fsn_mark;
>         struct fsnotify_error_info info;
> }
> 
> Similar to fd sampled errseq, there can be only one error report
> per sb-group pair (i.e. fsnotify_sb_mark) and the memory needed to store
> the error report can be allocated at the time of setting the filesystem mark.
> 
> With this, you will not need the added complexity of the ring buffer
> and you will not need to limit FAN_ERROR reporting to a group that
> is only listening for FAN_ERROR, which is an unneeded limitation IMO.

Seeing that this 'single error per mark' idea is gathering some support I'd
like to add my 2c: Probably we don't want fsnotify_error_info attached to
every fsnotify_mark, I guess we can have:

struct fanotify_mark {
	struct fsnotify_mark fsn_mark;
	struct fanotify_error_event *event;
};

and 'event' will be normally NULL and if we add FAN_ERROR to mark's mask,
we will allocate event (also containing error info) to use when generating
error event. And then the handling will be somewhat similar to how we
handle overflow events.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2021-05-11 10:43 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-26 18:41 [PATCH RFC 00/15] File system wide monitoring Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 01/15] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
2021-04-27  4:42   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 02/15] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
2021-04-27  4:53   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 03/15] fsnotify: Wire flags field on group allocation Gabriel Krisman Bertazi
2021-04-27  5:03   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 04/15] fsnotify: Wire up group information on event initialization Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 05/15] fsnotify: Support event submission through ring buffer Gabriel Krisman Bertazi
2021-04-26 22:00   ` kernel test robot
2021-04-26 22:43   ` kernel test robot
2021-04-27  5:39   ` Amir Goldstein
2021-04-29 18:33     ` Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 06/15] fanotify: Support " Gabriel Krisman Bertazi
2021-04-27  6:02   ` Amir Goldstein
2021-04-29 18:36     ` Gabriel Krisman Bertazi
2021-04-26 18:41 ` [PATCH RFC 07/15] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
2021-04-27  8:39   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 08/15] fsnotify: Introduce helpers to send error_events Gabriel Krisman Bertazi
2021-04-27  6:49   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 09/15] fanotify: Introduce generic error record Gabriel Krisman Bertazi
2021-04-27  7:01   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 10/15] fanotify: Introduce code location record Gabriel Krisman Bertazi
2021-04-27  7:11   ` Amir Goldstein
2021-04-29 18:40     ` Gabriel Krisman Bertazi
2021-05-11  5:35       ` Khazhy Kumykov
2021-04-26 18:41 ` [PATCH RFC 11/15] fanotify: Introduce filesystem specific data record Gabriel Krisman Bertazi
2021-04-27  7:12   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark Gabriel Krisman Bertazi
2021-04-26 22:45   ` kernel test robot
2021-04-27  7:25   ` Amir Goldstein
2021-04-26 18:41 ` [PATCH RFC 13/15] ext4: Send notifications on error Gabriel Krisman Bertazi
2021-04-26 23:10   ` kernel test robot
2021-04-27  4:32   ` Amir Goldstein
2021-04-29  0:57   ` Darrick J. Wong
2021-04-26 18:42 ` [PATCH RFC 14/15] samples: Add fs error monitoring example Gabriel Krisman Bertazi
2021-04-26 23:10   ` kernel test robot
2021-04-26 18:42 ` [PATCH RFC 15/15] Documentation: Document the FAN_ERROR framework Gabriel Krisman Bertazi
2021-04-27  4:11 ` [PATCH RFC 00/15] File system wide monitoring Amir Goldstein
2021-04-27 15:44   ` Gabriel Krisman Bertazi
2021-05-11  4:45     ` Khazhy Kumykov
2021-05-11 10:43   ` Jan Kara
2021-04-27  4:33 [PATCH RFC 12/15] fanotify: Introduce the FAN_ERROR mark kernel test robot
2021-04-29 11:31 ` Dan Carpenter
2021-04-27  8:36 [PATCH RFC 13/15] ext4: Send notifications on error kernel test robot
2021-04-29 13:19 ` Dan Carpenter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.