All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Superblock Notifications
@ 2020-12-08  0:31 Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 1/8] watch_queue: Make watch_sizeof() check record size Gabriel Krisman Bertazi
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	Gabriel Krisman Bertazi, kernel

Hi,

After the two previous RFCs this is an attempt to get the ball spinning
again.

The problem I'm trying to solve is providing an interface to monitor
filesystem errors.  This patchset includes an example implementation of
ext4 error notification.  This goes obviously on top of the watch_queue
mechanism.

Regarding the buffer overrun issue that would require fsinfo or another
method to expose counters, I think they can be added at a later date
with no change to what this patch series attempts to do, therefore I'm
proposing we don't wait for fsinfo before getting this merged.

I mostly tested this with the samples program I have published in the
last patch.  In addition I will be sharing a patchset shortly with
proper documentation and some selftests for the feature.

David, can you please reply to this patchset? What do you think about
the watch_queue modifications I'm proposing?  I really don't want to
waste more time on this code if it doesn't fit the watch_queue API.  Can
I have some guidance in having this upstreamed?

In addition, I'm carrying "watch_queue: Make watch_sizeof() check record
size" on this patchset for now, but is it in anyone's tree going to
Linus any time soon?  I haven't found it.

I also shared this patchset in a branch at:

  Repo: https://gitlab.collabora.com/krisman/linux.git -b notifications

Previous RFC submissions can be found at:

RFC: https://www.spinics.net/lists/linux-ext4/msg74596.html
RFC v2:
https://lore.kernel.org/linux-fsdevel/20201111215213.4152354-1-krisman@collabora.com/

Original cover letter:
======================

Google has been using an out-of-tree mechanism for error notification in
Ext4 and we decided it is time to push for an upstream solution.  This
would surely fit on top of David's notification work.

This patchset is an attempt to restart that discussion.  It forward
ports
some code from David on top of Linus tree, adds features to
watch_queue and implements ext4 support.

The new notifications are designed after ext4 messages, so it exposes
notifications types to fit that filesystem, but it doesn't change much
to other filesystems, so it should be easily extensible.

I'm aware of the discussion around fsinfo, but I'd like to ask if there
are other missing pieces and what we could do to help that work go
upstream.  From a previous mailing list discussion, Linus complained
about lack of users as a main reason for it to not be merged, so hey! :)

In addition, I'd like to ask for feedback on the current implementation,
specifically regarding the passing of extra unformatted information at
the end of the notification and the ext4 support.

The work, as shared on this patchset can be found at:

  https://gitlab.collabora.com/krisman/linux.git -b
  ext4-error-notifications

And there is an example code at:

  https://gitlab.collabora.com/krisman/ext4-watcher

I'm Cc'ing Khazhismel Kumykov, from Google, who can provide more
information about their use case, if requested.

David Howells (3):
  watch_queue: Make watch_sizeof() check record size
  security: Add hooks to rule on setting a watch for superblock
  vfs: Add superblock notifications

Gabriel Krisman Bertazi (5):
  watch_queue: Support a text field at the end of the notification
  vfs: Include origin of the SB error notification
  fs: Add more superblock error subtypes
  ext4: Implement SB error notification through watch_sb
  samples: watch_queue: Add sample of SB notifications

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/Kconfig                             |  12 ++
 fs/ext4/super.c                        |  31 +++--
 fs/super.c                             | 127 +++++++++++++++++++++
 include/linux/fs.h                     | 150 +++++++++++++++++++++++++
 include/linux/lsm_hook_defs.h          |   1 +
 include/linux/lsm_hooks.h              |   4 +
 include/linux/security.h               |  13 +++
 include/linux/syscalls.h               |   2 +
 include/linux/watch_queue.h            |  21 +++-
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/watch_queue.h       |  54 ++++++++-
 kernel/sys_ni.c                        |   3 +
 kernel/watch_queue.c                   |  29 ++++-
 samples/watch_queue/Makefile           |   2 +-
 samples/watch_queue/watch_sb.c         | 114 +++++++++++++++++++
 security/security.c                    |   6 +
 18 files changed, 556 insertions(+), 19 deletions(-)
 create mode 100644 samples/watch_queue/watch_sb.c

-- 
2.29.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/8] watch_queue: Make watch_sizeof() check record size
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 2/8] security: Add hooks to rule on setting a watch for superblock Gabriel Krisman Bertazi
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	kernel, Miklos Szeredi, Gabriel Krisman Bertazi

From: David Howells <dhowells@redhat.com>

Make watch_sizeof() give a build error if the size of the struct won't fit
into the size field in the header.

Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
[Rebase to 5.10]
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/watch_queue.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h
index c994d1b2cdba..f1086d12cd03 100644
--- a/include/linux/watch_queue.h
+++ b/include/linux/watch_queue.h
@@ -120,7 +120,12 @@ static inline void remove_watch_list(struct watch_list *wlist, u64 id)
  * watch_sizeof - Calculate the information part of the size of a watch record,
  * given the structure size.
  */
-#define watch_sizeof(STRUCT) (sizeof(STRUCT) << WATCH_INFO_LENGTH__SHIFT)
+#define watch_sizeof(STRUCT) \
+	({								\
+		size_t max = WATCH_INFO_LENGTH >> WATCH_INFO_LENGTH__SHIFT; \
+		BUILD_BUG_ON(sizeof(STRUCT) > max);			\
+		sizeof(STRUCT) << WATCH_INFO_LENGTH__SHIFT;		\
+	})
 
 #else
 static inline int watch_queue_init(struct pipe_inode_info *pipe)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/8] security: Add hooks to rule on setting a watch for superblock
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 1/8] watch_queue: Make watch_sizeof() check record size Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification Gabriel Krisman Bertazi
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	kernel, Gabriel Krisman Bertazi

From: David Howells <dhowells@redhat.com>

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set for a supperblock.

Signed-off-by: David Howells <dhowells@redhat.com>
[Drop mount and key changes.  Rebase to mainline]
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/lsm_hooks.h     |  4 ++++
 include/linux/security.h      | 13 +++++++++++++
 security/security.c           |  6 ++++++
 4 files changed, 24 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..8fa8533598bc 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -261,6 +261,7 @@ LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
 LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
 	 const struct cred *cred, struct watch_notification *n)
+LSM_HOOK(int, 0, watch_sb, struct super_block *sb)
 #endif /* CONFIG_SECURITY && CONFIG_WATCH_QUEUE */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_KEY_NOTIFICATIONS)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7ab8afb..11197bf167d3 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1475,6 +1475,10 @@
  *	@w_cred: The credentials of the whoever set the watch.
  *	@cred: The event-triggerer's credentials
  *	@n: The notification being posted
+ * @watch_sb:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from a superblock.
+ *	@sb: The superblock to watch.
  *
  * @watch_key:
  *	Check to see if a process is allowed to watch for event notifications
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..078e11a8872a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -456,6 +456,11 @@ int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
 int security_locked_down(enum lockdown_reason what);
+
+#ifdef CONFIG_WATCH_QUEUE
+int security_watch_sb(struct super_block *sb);
+#endif /* CONFIG_WATCH_QUEUE */
+
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1304,6 +1309,14 @@ static inline int security_locked_down(enum lockdown_reason what)
 {
 	return 0;
 }
+
+#ifdef CONFIG_WATCH_QUEUE
+static inline int security_watch_sb(struct super_block *sb)
+{
+	return 0;
+}
+#endif /* CONFIG_WATCH_QUEUE */
+
 #endif	/* CONFIG_SECURITY */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index a28045dc9e7f..a23a972063cd 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2074,6 +2074,12 @@ int security_post_notification(const struct cred *w_cred,
 {
 	return call_int_hook(post_notification, 0, w_cred, cred, n);
 }
+
+int security_watch_sb(struct super_block *sb)
+{
+	return call_int_hook(watch_sb, 0, sb);
+}
+
 #endif /* CONFIG_WATCH_QUEUE */
 
 #ifdef CONFIG_KEY_NOTIFICATIONS
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/8] watch_queue: Support a text field at the end of the notification
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 1/8] watch_queue: Make watch_sizeof() check record size Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 2/8] security: Add hooks to rule on setting a watch for superblock Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	Gabriel Krisman Bertazi, kernel

This allow notifications to send text information to userspace without
having to copy it to a temporary buffer to then copy to the ring.  One
use case to pass text information in notifications is for error
reporting, where more debug information might be needed, but we don't
want to explode the number of subtypes of notifications.  For instance,
ext4 can have a single inode error notification subtype, and pass more
information on the cause of the error in this field.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/watch_queue.h | 14 ++++++++++++--
 kernel/watch_queue.c        | 29 ++++++++++++++++++++++++-----
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h
index f1086d12cd03..2f5a7446bca6 100644
--- a/include/linux/watch_queue.h
+++ b/include/linux/watch_queue.h
@@ -79,7 +79,7 @@ struct watch_list {
 extern void __post_watch_notification(struct watch_list *,
 				      struct watch_notification *,
 				      const struct cred *,
-				      u64);
+				      u64, const char*, va_list*);
 extern struct watch_queue *get_watch_queue(int);
 extern void put_watch_queue(struct watch_queue *);
 extern void init_watch(struct watch *, struct watch_queue *);
@@ -105,7 +105,17 @@ static inline void post_watch_notification(struct watch_list *wlist,
 					   u64 id)
 {
 	if (unlikely(wlist))
-		__post_watch_notification(wlist, n, cred, id);
+		__post_watch_notification(wlist, n, cred, id, NULL, NULL);
+}
+
+static inline void post_watch_notification_string(struct watch_list *wlist,
+						  struct watch_notification *n,
+						  const struct cred *cred,
+						  u64 id, const char *fmt,
+						  va_list *args)
+{
+	if (unlikely(wlist))
+		__post_watch_notification(wlist, n, cred, id, fmt, args);
 }
 
 static inline void remove_watch_list(struct watch_list *wlist, u64 id)
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 0ef8f65bd2d7..89fcf0420ce7 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -70,13 +70,15 @@ static const struct pipe_buf_operations watch_queue_pipe_buf_ops = {
  * Post a notification to a watch queue.
  */
 static bool post_one_notification(struct watch_queue *wqueue,
-				  struct watch_notification *n)
+				  struct watch_notification *n,
+				  const char *fmt, va_list *args)
 {
 	void *p;
 	struct pipe_inode_info *pipe = wqueue->pipe;
 	struct pipe_buffer *buf;
 	struct page *page;
 	unsigned int head, tail, mask, note, offset, len;
+	int wlen = 0;
 	bool done = false;
 
 	if (!pipe)
@@ -102,6 +104,23 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	get_page(page);
 	len = n->info & WATCH_INFO_LENGTH;
 	p = kmap_atomic(page);
+	/*
+	 * Write the tail description before the actual header, because
+	 * the string needs to be generated to calculate the final
+	 * notification size, that is passed in the header.
+	 */
+	if (fmt) {
+		wlen = vscnprintf(p + offset + len, WATCH_INFO_LENGTH - len,
+				  fmt, (args ? *args : NULL));
+		wlen += 1; /* vscnprintf doesn't include '\0' */
+		if (wlen > 0) {
+			n->info = n->info & ~WATCH_INFO_LENGTH;
+			n->info |= (len + wlen) & WATCH_INFO_LENGTH;
+		} else {
+			/* Drop errors when writing the extra string. */
+			wlen = 0;
+		}
+	}
 	memcpy(p + offset, n, len);
 	kunmap_atomic(p);
 
@@ -110,7 +129,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	buf->private = (unsigned long)wqueue;
 	buf->ops = &watch_queue_pipe_buf_ops;
 	buf->offset = offset;
-	buf->len = len;
+	buf->len = (len + wlen);
 	buf->flags = PIPE_BUF_FLAG_WHOLE;
 	pipe->head = head + 1;
 
@@ -175,7 +194,7 @@ static bool filter_watch_notification(const struct watch_filter *wf,
 void __post_watch_notification(struct watch_list *wlist,
 			       struct watch_notification *n,
 			       const struct cred *cred,
-			       u64 id)
+			       u64 id, const char *fmt, va_list *args)
 {
 	const struct watch_filter *wf;
 	struct watch_queue *wqueue;
@@ -202,7 +221,7 @@ void __post_watch_notification(struct watch_list *wlist,
 		if (security_post_notification(watch->cred, cred, n) < 0)
 			continue;
 
-		post_one_notification(wqueue, n);
+		post_one_notification(wqueue, n, fmt, args);
 	}
 
 	rcu_read_unlock();
@@ -522,7 +541,7 @@ int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq,
 	 * protecting *wqueue from deallocation.
 	 */
 	if (wqueue) {
-		post_one_notification(wqueue, &n.watch);
+		post_one_notification(wqueue, &n.watch, NULL, NULL);
 
 		spin_lock_bh(&wqueue->lock);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/8] vfs: Add superblock notifications
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2020-12-08  0:31 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:56   ` Darrick J. Wong
  2020-12-10 22:09   ` Dave Chinner
  2020-12-08  0:31 ` [PATCH 5/8] vfs: Include origin of the SB error notification Gabriel Krisman Bertazi
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	kernel, Gabriel Krisman Bertazi

From: David Howells <dhowells@redhat.com>

Add a superblock event notification facility whereby notifications about
superblock events, such as I/O errors (EIO), quota limits being hit
(EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
process asynchronously.  Note that this does not cover vfsmount topology
changes.  watch_mount() is used for that.

Records are of the following format:

	struct superblock_notification {
		struct watch_notification watch;
		__u64	sb_id;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_SB_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_SUPERBLOCK_READONLY.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the fifth argument to
	watch_sb(), shifted.

	n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
	NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
	R/O, and being cleared otherwise.

	n->sb_id will be the ID of the superblock, as can be retrieved with
	the fsinfo() syscall, as part of the fsinfo_sb_notifications
	attribute in the watch_id field.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.  Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com>
[Rebase to mainline.  Expose inode and block on sb_error.
 Update API and commit message]
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/Kconfig                             |  12 +++
 fs/super.c                             | 127 +++++++++++++++++++++++++
 include/linux/fs.h                     |  87 +++++++++++++++++
 include/linux/syscalls.h               |   2 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/watch_queue.h       |  34 ++++++-
 kernel/sys_ni.c                        |   3 +
 9 files changed, 269 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0d0667a9fbd7..c481ab8c4454 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -445,3 +445,4 @@
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
 440	i386	process_madvise		sys_process_madvise
+441	i386	watch_sb		sys_watch_sb
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 379819244b91..87efe2577169 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -362,6 +362,7 @@
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
 440	common	process_madvise		sys_process_madvise
+441	common	watch_sb		sys_watch_sb
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/fs/Kconfig b/fs/Kconfig
index aa4c12282301..4e96521c37a1 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -117,6 +117,18 @@ source "fs/verity/Kconfig"
 
 source "fs/notify/Kconfig"
 
+config SB_NOTIFICATIONS
+	bool "Superblock event notifications"
+	select WATCH_QUEUE
+	help
+	  This option provides support for receiving superblock event
+	  notifications.  This makes use of the watch_queue API to
+	  handle the notification buffer and provides the sb_notify()
+	  system call to enable/disable watches.
+
+	  Events can include things like changing between R/W and R/O, EIO
+	  generation, ENOSPC generation and EDQUOT generation.
+
 source "fs/quota/Kconfig"
 
 source "fs/autofs/Kconfig"
diff --git a/fs/super.c b/fs/super.c
index 98bb0629ee10..93cf037f9b41 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -37,6 +37,8 @@
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_context.h>
+#include <linux/syscalls.h>
+#include <linux/namei.h>
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
@@ -330,6 +332,10 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
+#ifdef CONFIG_SB_NOTIFICATIONS
+		if (s->s_watchers)
+			remove_watch_list(s->s_watchers, s->s_unique_id);
+#endif
 		cleancache_invalidate_fs(s);
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
@@ -969,6 +975,8 @@ int reconfigure_super(struct fs_context *fc)
 	/* Needs to be ordered wrt mnt_is_readonly() */
 	smp_wmb();
 	sb->s_readonly_remount = 0;
+	notify_sb(sb, NOTIFY_SUPERBLOCK_READONLY,
+		  remount_ro ? NOTIFY_SUPERBLOCK_IS_NOW_RO : 0);
 
 	/*
 	 * Some filesystems modify their metadata via some other path than the
@@ -1818,3 +1826,122 @@ int thaw_super(struct super_block *sb)
 	return thaw_super_locked(sb);
 }
 EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+/*
+ * Post superblock notifications.
+ */
+
+void post_sb_notification(struct super_block *s, struct superblock_notification *n,
+			  const char *fmt, va_list *args)
+{
+	post_watch_notification_string(s->s_watchers, &n->watch, current_cred(),
+				       s->s_unique_id, fmt, args);
+}
+
+/**
+ * sys_watch_sb - Watch for superblock events.
+ * @dfd: Base directory to pathwalk from or fd referring to superblock.
+ * @filename: Path to superblock to place the watch upon
+ * @at_flags: Pathwalk control flags
+ * @watch_fd: The watch queue to send notifications to.
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
+ */
+SYSCALL_DEFINE5(watch_sb,
+		int, dfd,
+		const char __user *, filename,
+		unsigned int, at_flags,
+		int, watch_fd,
+		int, watch_id)
+{
+	struct watch_queue *wqueue;
+	struct super_block *s;
+	struct watch_list *wlist = NULL;
+	struct watch *watch = NULL;
+	struct path path;
+	unsigned int lookup_flags =
+		LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	int ret;
+
+	if (watch_id < -1 || watch_id > 0xff)
+		return -EINVAL;
+	if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0)
+		return -EINVAL;
+	if (at_flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (at_flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+	ret = user_path_at(dfd, filename, at_flags, &path);
+	if (ret)
+		return ret;
+
+	ret = inode_permission(path.dentry->d_inode, MAY_EXEC);
+	if (ret)
+		goto err_path;
+
+	wqueue = get_watch_queue(watch_fd);
+	if (IS_ERR(wqueue))
+		goto err_path;
+
+	s = path.dentry->d_sb;
+	if (watch_id >= 0) {
+		ret = -ENOMEM;
+		if (!s->s_watchers) {
+			wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
+			if (!wlist)
+				goto err_wqueue;
+			init_watch_list(wlist, NULL);
+		}
+
+		watch = kzalloc(sizeof(*watch), GFP_KERNEL);
+		if (!watch)
+			goto err_wlist;
+
+		init_watch(watch, wqueue);
+		watch->id		= s->s_unique_id;
+		watch->private		= s;
+		watch->info_id		= (u32)watch_id << 24;
+
+		ret = security_watch_sb(s);
+		if (ret < 0)
+			goto err_watch;
+
+		down_write(&s->s_umount);
+		ret = -EIO;
+		if (atomic_read(&s->s_active)) {
+			if (!s->s_watchers) {
+				s->s_watchers = wlist;
+				wlist = NULL;
+			}
+
+			ret = add_watch_to_object(watch, s->s_watchers);
+			if (ret == 0) {
+				spin_lock(&sb_lock);
+				s->s_count++;
+				spin_unlock(&sb_lock);
+				watch = NULL;
+			}
+		}
+		up_write(&s->s_umount);
+	} else {
+		ret = -EBADSLT;
+		if (READ_ONCE(s->s_watchers)) {
+			down_write(&s->s_umount);
+			ret = remove_watch_from_object(s->s_watchers, wqueue,
+						       s->s_unique_id, false);
+			up_write(&s->s_umount);
+		}
+	}
+
+err_watch:
+	kfree(watch);
+err_wlist:
+	kfree(wlist);
+err_wqueue:
+	put_watch_queue(wqueue);
+err_path:
+	path_put(&path);
+	return ret;
+}
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8667d0cdc71e..df588edc0a34 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -39,6 +39,7 @@
 #include <linux/fs_types.h>
 #include <linux/build_bug.h>
 #include <linux/stddef.h>
+#include <linux/watch_queue.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1547,6 +1548,13 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	/* Superblock event notifications */
+	u64			s_unique_id;
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+	struct watch_list	*s_watchers;
+#endif
 } __randomize_layout;
 
 /* Helper functions so that in most cases filesystems will
@@ -3476,4 +3484,83 @@ static inline int inode_drain_writes(struct inode *inode)
 	return filemap_write_and_wait(inode->i_mapping);
 }
 
+extern void post_sb_notification(struct super_block *, struct superblock_notification *,
+				 const char *fmt, va_list *args);
+/**
+ * notify_sb: Post simple superblock notification.
+ * @s: The superblock the notification is about.
+ * @subtype: The type of notification.
+ * @info: WATCH_INFO_FLAG_* flags to be set in the record.
+ */
+static inline void notify_sb(struct super_block *s,
+			     enum superblock_notification_type subtype,
+			     u32 info)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_notification n = {
+			.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.watch.subtype	= subtype,
+			.watch.info	= watch_sizeof(n) | info,
+			.sb_id		= s->s_unique_id,
+		};
+
+		post_sb_notification(s, &n, NULL, NULL);
+	}
+
+#endif
+}
+
+/**
+ * notify_sb_error: Post superblock error notification.
+ * @s: The superblock the notification is about.
+ * @error: The error number to be recorded.
+ * @inode: The inode the error refers to (if available, 0 otherwise)
+ * @block: The block the error refers to (if available, 0 otherwise)
+ * @fmt: Formating string for extra information appended to the notification
+ * @args: arguments for extra information string appended to the notification
+ */
+static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
+				  u64 block, const char *fmt, va_list *args)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_error_notification n = {
+			.s.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.s.watch.subtype = NOTIFY_SUPERBLOCK_ERROR,
+			.s.watch.info	= watch_sizeof(n),
+			.s.sb_id	= s->s_unique_id,
+			.error_number	= error,
+			.error_cookie	= 0,
+			.inode		= inode,
+			.block		= block,
+		};
+
+		post_sb_notification(s, &n.s, fmt, args);
+	}
+#endif
+	return error;
+}
+
+/**
+ * notify_sb_EDQUOT: Post superblock quota overrun notification.
+ * @s: The superblock the notification is about.
+ */
+static inline int notify_sb_EQDUOT(struct super_block *s)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_notification n = {
+			.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.watch.subtype	= NOTIFY_SUPERBLOCK_EDQUOT,
+			.watch.info	= watch_sizeof(n),
+			.sb_id		= s->s_unique_id,
+		};
+
+		post_sb_notification(s, &n, NULL, NULL);
+	}
+#endif
+	return -EDQUOT;
+}
+
 #endif /* _LINUX_FS_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 37bea07c12f2..5f7b282d331d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1008,6 +1008,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_watch_sb(int dfd, const char __user *path,
+			     unsigned int at_flags, int watch_fd, int watch_id);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 2056318988f7..5eec69e2b312 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 #define __NR_process_madvise 440
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
+#define __NR_watch_sb 441
+__SYSCALL(__NR_watch_sb, sys_watch_sb)
 
 #undef __NR_syscalls
-#define __NR_syscalls 441
+#define __NR_syscalls 442
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index c3d8320b5d3a..937363d9f7b3 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -14,7 +14,8 @@
 enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
-	WATCH_TYPE__NR		= 2
+	WATCH_TYPE_SB_NOTIFY	= 2,
+	WATCH_TYPE__NR		= 3
 };
 
 enum watch_meta_notification_subtype {
@@ -101,4 +102,35 @@ struct key_notification {
 	__u32	aux;		/* Per-type auxiliary data */
 };
 
+/*
+ * Type of superblock notification.
+ */
+enum superblock_notification_type {
+	NOTIFY_SUPERBLOCK_READONLY	= 0, /* Filesystem toggled between R/O and R/W */
+	NOTIFY_SUPERBLOCK_ERROR		= 1, /* Error in filesystem or blockdev */
+	NOTIFY_SUPERBLOCK_EDQUOT	= 2, /* EDQUOT notification */
+	NOTIFY_SUPERBLOCK_NETWORK	= 3, /* Network status change */
+};
+
+#define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
+
+/*
+ * Superblock notification record.
+ * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
+ * - watch.subtype = enum superblock_notification_subtype
+ */
+struct superblock_notification {
+	struct watch_notification watch; /* WATCH_TYPE_SB_NOTIFY */
+	__u64	sb_id;			/* 64-bit superblock ID [fsinfo_ids::f_sb_id] */
+};
+
+struct superblock_error_notification {
+	struct superblock_notification s; /* subtype = notify_superblock_error */
+	__u32	error_number;
+	__u32	error_cookie;
+	__u64	inode;
+	__u64	block;
+	char	desc[0];
+};
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index f27ac94d5fa7..3e97984bc4c8 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -51,6 +51,9 @@ COND_SYSCALL_COMPAT(io_pgetevents);
 COND_SYSCALL(io_uring_setup);
 COND_SYSCALL(io_uring_enter);
 COND_SYSCALL(io_uring_register);
+COND_SYSCALL(fsinfo);
+COND_SYSCALL(watch_mount);
+COND_SYSCALL(watch_sb);
 
 /* fs/xattr.c */
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2020-12-08  0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:51   ` Darrick J. Wong
  2020-12-08  0:31 ` [PATCH 6/8] fs: Add more superblock error subtypes Gabriel Krisman Bertazi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	Gabriel Krisman Bertazi, kernel

When reporting a filesystem error, we really need to know where the
error came from, therefore, include "function:line" information in the
notification sent to userspace.  There is no current users of notify_sb
in the kernel, so there are no callers to update.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/fs.h               | 11 +++++++++--
 include/uapi/linux/watch_queue.h |  3 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index df588edc0a34..864d86fcc68c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3514,14 +3514,17 @@ static inline void notify_sb(struct super_block *s,
 /**
  * notify_sb_error: Post superblock error notification.
  * @s: The superblock the notification is about.
+ * @function: function name reported as source of the warning.
+ * @line: source code line reported as source of the warning.
  * @error: The error number to be recorded.
  * @inode: The inode the error refers to (if available, 0 otherwise)
  * @block: The block the error refers to (if available, 0 otherwise)
  * @fmt: Formating string for extra information appended to the notification
  * @args: arguments for extra information string appended to the notification
  */
-static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
-				  u64 block, const char *fmt, va_list *args)
+static inline int notify_sb_error(struct super_block *s, const char *function, int line,
+				  int error, u64 inode, u64 block,
+				  const char *fmt, va_list *args)
 {
 #ifdef CONFIG_SB_NOTIFICATIONS
 	if (unlikely(s->s_watchers)) {
@@ -3534,8 +3537,12 @@ static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
 			.error_cookie	= 0,
 			.inode		= inode,
 			.block		= block,
+			.line		= line,
 		};
 
+		memcpy(&n.function, function, SB_NOTIFICATION_FNAME_LEN);
+		n.function[SB_NOTIFICATION_FNAME_LEN-1] = '\0';
+
 		post_sb_notification(s, &n.s, fmt, args);
 	}
 #endif
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 937363d9f7b3..5fa5286c5cc7 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -114,6 +114,7 @@ enum superblock_notification_type {
 
 #define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
 
+#define SB_NOTIFICATION_FNAME_LEN 30
 /*
  * Superblock notification record.
  * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
@@ -130,6 +131,8 @@ struct superblock_error_notification {
 	__u32	error_cookie;
 	__u64	inode;
 	__u64	block;
+	char	function[SB_NOTIFICATION_FNAME_LEN];
+	__u16	line;
 	char	desc[0];
 };
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/8] fs: Add more superblock error subtypes
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2020-12-08  0:31 ` [PATCH 5/8] vfs: Include origin of the SB error notification Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 7/8] ext4: Implement SB error notification through watch_sb Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	Gabriel Krisman Bertazi, kernel

Expose new SB notification subtype for warnings, errors and general
messages.  This is modeled after the information exposed by ext4, but
should be the same for other filesystems.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/fs.h               | 56 ++++++++++++++++++++++++++++++++
 include/uapi/linux/watch_queue.h | 17 ++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 864d86fcc68c..444030391008 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3549,6 +3549,62 @@ static inline int notify_sb_error(struct super_block *s, const char *function, i
 	return error;
 }
 
+/**
+ * notify_sb_warning: Post superblock warning notification.
+ * @s: The superblock the notification is about.
+ * @function: function name reported as source of the warning.
+ * @line: source code line reported as source of the warning.
+ * @inode: The inode the error refers to (if available, 0 otherwise)
+ * @block: The block the error refers to (if available, 0 otherwise)
+ * @fmt: Formating string for extra information appended to the notification
+ * @args: arguments for extra information string appended to the notification
+ */
+static inline void notify_sb_warning(struct super_block *s, const char *function,
+				     int line, u64 inode, u64 block,
+				     const char *fmt, va_list *args)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_error_notification n = {
+			.s.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.s.watch.subtype = NOTIFY_SUPERBLOCK_WARNING,
+			.s.watch.info	= watch_sizeof(n),
+			.s.sb_id	= s->s_unique_id,
+			.inode		= inode,
+			.block		= block,
+			.line		= line,
+		};
+
+		memcpy(&n.function, function, SB_NOTIFICATION_FNAME_LEN);
+		n.function[SB_NOTIFICATION_FNAME_LEN-1] = '\0';
+
+		post_sb_notification(s, &n.s, fmt, args);
+	}
+#endif
+}
+
+/**
+ * notify_sb_msg: Post superblock message.
+ * @s: The superblock the notification is about.
+ * @fmt: Formating string for extra information appended to the notification
+ * @args: arguments for extra information string appended to the notification
+ */
+static inline void notify_sb_msg(struct super_block *s, const char *fmt, va_list *args)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_msg_notification n = {
+			.s.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.s.watch.subtype = NOTIFY_SUPERBLOCK_MSG,
+			.s.watch.info	= watch_sizeof(n),
+			.s.sb_id	= s->s_unique_id,
+		};
+
+		post_sb_notification(s, &n.s, fmt, args);
+	}
+#endif
+}
+
 /**
  * notify_sb_EDQUOT: Post superblock quota overrun notification.
  * @s: The superblock the notification is about.
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 5fa5286c5cc7..c4afd545e234 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -110,6 +110,9 @@ enum superblock_notification_type {
 	NOTIFY_SUPERBLOCK_ERROR		= 1, /* Error in filesystem or blockdev */
 	NOTIFY_SUPERBLOCK_EDQUOT	= 2, /* EDQUOT notification */
 	NOTIFY_SUPERBLOCK_NETWORK	= 3, /* Network status change */
+	NOTIFY_SUPERBLOCK_MSG		= 4, /* Filesystem message */
+	NOTIFY_SUPERBLOCK_WARNING	= 5, /* Filesystem warning */
+
 };
 
 #define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
@@ -136,4 +139,18 @@ struct superblock_error_notification {
 	char	desc[0];
 };
 
+struct superblock_msg_notification {
+	struct superblock_notification s; /* subtype = notify_superblock_msg */
+	char desc[0];
+};
+
+struct superblock_warning_notification {
+	struct superblock_notification s; /* subtype = notify_superblock_warning */
+	__u64	inode;
+	__u64	block;
+	char	function[SB_NOTIFICATION_FNAME_LEN];
+	__u16	line;
+	char desc[0];
+};
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/8] ext4: Implement SB error notification through watch_sb
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (5 preceding siblings ...)
  2020-12-08  0:31 ` [PATCH 6/8] fs: Add more superblock error subtypes Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08  0:31 ` [PATCH 8/8] samples: watch_queue: Add sample of SB notifications Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	Gabriel Krisman Bertazi, kernel

This follows the same implementation of ext4 error reporting via dmesg,
but expose that information via the new watch_queue notifications API.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 fs/ext4/super.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 94472044f4c1..f239624003fc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -713,15 +713,17 @@ void __ext4_error(struct super_block *sb, const char *function,
 		return;
 
 	trace_ext4_error(sb, function, line);
+	va_start(args, fmt);
 	if (ext4_error_ratelimit(sb)) {
-		va_start(args, fmt);
 		vaf.fmt = fmt;
 		vaf.va = &args;
 		printk(KERN_CRIT
 		       "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n",
 		       sb->s_id, function, line, current->comm, &vaf);
-		va_end(args);
 	}
+	notify_sb_error(sb, function, line, error, 0, 0, fmt, &args);
+	va_end(args);
+
 	save_error_info(sb, error, 0, block, function, line);
 	ext4_handle_error(sb);
 }
@@ -737,8 +739,8 @@ void __ext4_error_inode(struct inode *inode, const char *function,
 		return;
 
 	trace_ext4_error(inode->i_sb, function, line);
+	va_start(args, fmt);
 	if (ext4_error_ratelimit(inode->i_sb)) {
-		va_start(args, fmt);
 		vaf.fmt = fmt;
 		vaf.va = &args;
 		if (block)
@@ -751,8 +753,11 @@ void __ext4_error_inode(struct inode *inode, const char *function,
 			       "inode #%lu: comm %s: %pV\n",
 			       inode->i_sb->s_id, function, line, inode->i_ino,
 			       current->comm, &vaf);
-		va_end(args);
 	}
+	notify_sb_error(inode->i_sb, function, line, error, inode->i_ino, block,
+			fmt, &args);
+	va_end(args);
+
 	save_error_info(inode->i_sb, error, inode->i_ino, block,
 			function, line);
 	ext4_handle_error(inode->i_sb);
@@ -771,11 +776,11 @@ void __ext4_error_file(struct file *file, const char *function,
 		return;
 
 	trace_ext4_error(inode->i_sb, function, line);
+	va_start(args, fmt);
 	if (ext4_error_ratelimit(inode->i_sb)) {
 		path = file_path(file, pathname, sizeof(pathname));
 		if (IS_ERR(path))
 			path = "(unknown)";
-		va_start(args, fmt);
 		vaf.fmt = fmt;
 		vaf.va = &args;
 		if (block)
@@ -790,8 +795,11 @@ void __ext4_error_file(struct file *file, const char *function,
 			       "comm %s: path %s: %pV\n",
 			       inode->i_sb->s_id, function, line, inode->i_ino,
 			       current->comm, path, &vaf);
-		va_end(args);
 	}
+	notify_sb_error(inode->i_sb, function, line, EFSCORRUPTED,
+			inode->i_ino, block, fmt, &args);
+	va_end(args);
+
 	save_error_info(inode->i_sb, EFSCORRUPTED, inode->i_ino, block,
 			function, line);
 	ext4_handle_error(inode->i_sb);
@@ -861,6 +869,8 @@ void __ext4_std_error(struct super_block *sb, const char *function,
 		       sb->s_id, function, line, errstr);
 	}
 
+	notify_sb_error(sb, function, line, errno, 0, 0, errstr, NULL);
+
 	save_error_info(sb, -errno, 0, 0, function, line);
 	ext4_handle_error(sb);
 }
@@ -890,6 +900,7 @@ void __ext4_abort(struct super_block *sb, const char *function,
 	vaf.va = &args;
 	printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: %pV\n",
 	       sb->s_id, function, line, &vaf);
+	notify_sb_error(sb, function, line, error, 0, 0, fmt, &args);
 	va_end(args);
 
 	if (sb_rdonly(sb) == 0) {
@@ -923,6 +934,7 @@ void __ext4_msg(struct super_block *sb,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 	printk("%sEXT4-fs (%s): %pV\n", prefix, sb->s_id, &vaf);
+	notify_sb_msg(sb, fmt, &args);
 	va_end(args);
 }
 
@@ -947,6 +959,7 @@ void __ext4_warning(struct super_block *sb, const char *function,
 	vaf.va = &args;
 	printk(KERN_WARNING "EXT4-fs warning (device %s): %s:%d: %pV\n",
 	       sb->s_id, function, line, &vaf);
+	notify_sb_warning(sb, function, line, 0, 0, fmt, &args);
 	va_end(args);
 }
 
@@ -965,6 +978,7 @@ void __ext4_warning_inode(const struct inode *inode, const char *function,
 	printk(KERN_WARNING "EXT4-fs warning (device %s): %s:%d: "
 	       "inode #%lu: comm %s: %pV\n", inode->i_sb->s_id,
 	       function, line, inode->i_ino, current->comm, &vaf);
+	notify_sb_warning(inode->i_sb, function, line, inode->i_ino, 0, fmt, &args);
 	va_end(args);
 }
 
@@ -984,8 +998,8 @@ __acquires(bitlock)
 	trace_ext4_error(sb, function, line);
 	__save_error_info(sb, EFSCORRUPTED, ino, block, function, line);
 
+	va_start(args, fmt);
 	if (ext4_error_ratelimit(sb)) {
-		va_start(args, fmt);
 		vaf.fmt = fmt;
 		vaf.va = &args;
 		printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: group %u, ",
@@ -996,8 +1010,9 @@ __acquires(bitlock)
 			printk(KERN_CONT "block %llu:",
 			       (unsigned long long) block);
 		printk(KERN_CONT "%pV\n", &vaf);
-		va_end(args);
 	}
+	notify_sb_error(sb, function, line, EFSCORRUPTED, ino, block, fmt, &args);
+	va_end(args);
 
 	if (test_opt(sb, WARN_ON_ERROR))
 		WARN_ON_ONCE(1);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 8/8] samples: watch_queue: Add sample of SB notifications
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (6 preceding siblings ...)
  2020-12-08  0:31 ` [PATCH 7/8] ext4: Implement SB error notification through watch_sb Gabriel Krisman Bertazi
@ 2020-12-08  0:31 ` Gabriel Krisman Bertazi
  2020-12-08 12:51 ` [PATCH 5/8] vfs: Include origin of the SB error notification David Howells
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:31 UTC (permalink / raw)
  To: dhowells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel,
	Gabriel Krisman Bertazi, kernel

This sample demonstrates how to use the watch_sb syscall.  It exposes
notifications like the following:

  root@host:~# ./watch_sb /mnt
  read() = 93
  NOTIFY[000]: ty=000002 sy=01 i=0300005d
    SB AT ext4_remount:5636 ERROR: 16 inode=0 block=0
    description: Abort forced by user
  read() = 96
  NOTIFY[000]: ty=000002 sy=01 i=03000060
    SB AT ext4_lookup:1706 ERROR: 0 inode=13 block=0
    description: iget: bogus i_mode (45)

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 samples/watch_queue/Makefile   |   2 +-
 samples/watch_queue/watch_sb.c | 114 +++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 samples/watch_queue/watch_sb.c

diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile
index c0db3a6bc524..6067d57a5bb1 100644
--- a/samples/watch_queue/Makefile
+++ b/samples/watch_queue/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-userprogs-always-y += watch_test
+userprogs-always-y += watch_test watch_sb
 
 userccflags += -I usr/include
diff --git a/samples/watch_queue/watch_sb.c b/samples/watch_queue/watch_sb.c
new file mode 100644
index 000000000000..51b660334f6b
--- /dev/null
+++ b/samples/watch_queue/watch_sb.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Use watch_sb to watch for SB notifications.
+ *
+ * Copyright (C) 2020 Collabora Ltd.
+ * Written by Gabriel Krisman Bertazi <krisman@collabora.com>
+ *   Based on watch_test.c by David Howells (dhowells@redhat.com)
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <err.h>
+#include <string.h>
+#include<sys/ioctl.h>
+#include <linux/watch_queue.h>
+
+#ifndef __NR_watch_sb
+# define __NR_watch_sb 441
+#endif
+
+static void consumer(int fd)
+{
+	unsigned char buffer[433], *p, *end;
+	union {
+		struct watch_notification n;
+		unsigned char buf1[128];
+		struct superblock_error_notification sen;
+		struct superblock_warning_notification swn;
+		struct superblock_msg_notification smn;
+	} n;
+	ssize_t buf_len;
+
+	for (;;) {
+		buf_len = read(fd, buffer, sizeof(buffer));
+		if (buf_len == -1)
+			err(1, "read");
+
+		if (buf_len == 0) {
+			printf("-- END --\n");
+			return;
+		}
+
+		if (buf_len > sizeof(buffer)) {
+			err(1, "Read buffer overrun: %zd\n", buf_len);
+			return;
+		}
+
+		printf("read() = %zd\n", buf_len);
+
+		p = buffer;
+		end = buffer + buf_len;
+		while (p < end) {
+			size_t largest, len;
+
+			largest = end - p;
+			if (largest > 128)
+				largest = 128;
+			if (largest < sizeof(struct watch_notification))
+				err(1, "Short message header: %zu\n", largest);
+
+			memcpy(&n, p, largest);
+
+			printf("NOTIFY[%03zx]: ty=%06x sy=%02x i=%08x\n",
+			       p - buffer, n.n.type, n.n.subtype, n.n.info);
+
+			len = n.n.info & WATCH_INFO_LENGTH;
+			if (len < sizeof(n.n) || len > largest)
+				err(1, "Bad message length: %zu/%zu\n", len, largest);
+
+			switch (n.n.subtype) {
+			case NOTIFY_SUPERBLOCK_ERROR:
+				printf("\t SB AT %s:%d ERROR: %d inode=%llu block=%llu\n",
+				       n.sen.function, n.sen.line, n.sen.error_number,
+				       n.sen.inode, n.sen.block);
+				if (len > sizeof(n.sen))
+					printf("description: %s\n", n.sen.desc);
+				break;
+			case NOTIFY_SUPERBLOCK_MSG:
+				printf("\t Ext4 MSG: %s\n", n.smn.desc);
+				break;
+			case NOTIFY_SUPERBLOCK_WARNING:
+				printf("\t SB AT %s:%d WARNING inode=%llu block=%llu\n",
+				       n.swn.function, n.swn.line, n.swn.inode, n.swn.block);
+				if (len > sizeof(n.sen))
+					printf("description: %s\n", n.swn.desc);
+				break;
+			default:
+				printf("unknown subtype %c\n", n.n.subtype);
+			}
+			p += len;
+		}
+	}
+}
+
+int main (int argc, char **argv)
+{
+	int fd[2];
+
+	if (argc != 2)
+		errx(1, "Missing mount point\n");
+
+	if (syscall(293, fd, O_NOTIFICATION_PIPE) < 0)
+		err(1, "Failed to open pipe\n");
+
+	if (ioctl(fd[0], IOC_WATCH_QUEUE_SET_SIZE, 256) < 0)
+		err(1, "ioctl fail\n");
+
+	if (syscall(__NR_watch_sb, 0, argv[1], NULL, fd[0], 0x3) < 0)
+		err(1, "Failed to watch SB\n");
+
+	consumer(fd[0]);
+
+	return 0;
+}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08  0:31 ` [PATCH 5/8] vfs: Include origin of the SB error notification Gabriel Krisman Bertazi
@ 2020-12-08  0:51   ` Darrick J. Wong
  2020-12-08  0:55     ` Gabriel Krisman Bertazi
  2020-12-08 12:42     ` David Howells
  0 siblings, 2 replies; 26+ messages in thread
From: Darrick J. Wong @ 2020-12-08  0:51 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Mon, Dec 07, 2020 at 09:31:14PM -0300, Gabriel Krisman Bertazi wrote:
> When reporting a filesystem error, we really need to know where the
> error came from, therefore, include "function:line" information in the
> notification sent to userspace.  There is no current users of notify_sb
> in the kernel, so there are no callers to update.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  include/linux/fs.h               | 11 +++++++++--
>  include/uapi/linux/watch_queue.h |  3 +++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index df588edc0a34..864d86fcc68c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3514,14 +3514,17 @@ static inline void notify_sb(struct super_block *s,
>  /**
>   * notify_sb_error: Post superblock error notification.
>   * @s: The superblock the notification is about.
> + * @function: function name reported as source of the warning.
> + * @line: source code line reported as source of the warning.
>   * @error: The error number to be recorded.
>   * @inode: The inode the error refers to (if available, 0 otherwise)
>   * @block: The block the error refers to (if available, 0 otherwise)
>   * @fmt: Formating string for extra information appended to the notification
>   * @args: arguments for extra information string appended to the notification
>   */
> -static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
> -				  u64 block, const char *fmt, va_list *args)
> +static inline int notify_sb_error(struct super_block *s, const char *function, int line,
> +				  int error, u64 inode, u64 block,
> +				  const char *fmt, va_list *args)
>  {
>  #ifdef CONFIG_SB_NOTIFICATIONS
>  	if (unlikely(s->s_watchers)) {
> @@ -3534,8 +3537,12 @@ static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
>  			.error_cookie	= 0,
>  			.inode		= inode,
>  			.block		= block,
> +			.line		= line,
>  		};
>  
> +		memcpy(&n.function, function, SB_NOTIFICATION_FNAME_LEN);
> +		n.function[SB_NOTIFICATION_FNAME_LEN-1] = '\0';
> +
>  		post_sb_notification(s, &n.s, fmt, args);
>  	}
>  #endif
> diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
> index 937363d9f7b3..5fa5286c5cc7 100644
> --- a/include/uapi/linux/watch_queue.h
> +++ b/include/uapi/linux/watch_queue.h
> @@ -114,6 +114,7 @@ enum superblock_notification_type {
>  
>  #define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
>  
> +#define SB_NOTIFICATION_FNAME_LEN 30
>  /*
>   * Superblock notification record.
>   * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
> @@ -130,6 +131,8 @@ struct superblock_error_notification {
>  	__u32	error_cookie;
>  	__u64	inode;
>  	__u64	block;
> +	char	function[SB_NOTIFICATION_FNAME_LEN];
> +	__u16	line;

Er... this is enlarging a structure in the userspace ABI, right?  Which
will break userspace that latched on to the structure definition in the
previous patch, and therefore isn't expecting a function name here.

If you're gonna put a character string(?) at the end then I guess you
have to pre-pad the notification structure so that we can add things
later, or.... bump the type code every time you add a field?

(Maybe I misread that?  But this is include/uapi/...)

--D

>  	char	desc[0];
>  };
>  
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08  0:51   ` Darrick J. Wong
@ 2020-12-08  0:55     ` Gabriel Krisman Bertazi
  2020-12-08 12:42     ` David Howells
  1 sibling, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08  0:55 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

"Darrick J. Wong" <darrick.wong@oracle.com> writes:

> On Mon, Dec 07, 2020 at 09:31:14PM -0300, Gabriel Krisman Bertazi wrote:
>> When reporting a filesystem error, we really need to know where the
>> error came from, therefore, include "function:line" information in the
>> notification sent to userspace.  There is no current users of notify_sb
>> in the kernel, so there are no callers to update.
>> 
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>>  include/linux/fs.h               | 11 +++++++++--
>>  include/uapi/linux/watch_queue.h |  3 +++
>>  2 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index df588edc0a34..864d86fcc68c 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -3514,14 +3514,17 @@ static inline void notify_sb(struct super_block *s,
>>  /**
>>   * notify_sb_error: Post superblock error notification.
>>   * @s: The superblock the notification is about.
>> + * @function: function name reported as source of the warning.
>> + * @line: source code line reported as source of the warning.
>>   * @error: The error number to be recorded.
>>   * @inode: The inode the error refers to (if available, 0 otherwise)
>>   * @block: The block the error refers to (if available, 0 otherwise)
>>   * @fmt: Formating string for extra information appended to the notification
>>   * @args: arguments for extra information string appended to the notification
>>   */
>> -static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
>> -				  u64 block, const char *fmt, va_list *args)
>> +static inline int notify_sb_error(struct super_block *s, const char *function, int line,
>> +				  int error, u64 inode, u64 block,
>> +				  const char *fmt, va_list *args)
>>  {
>>  #ifdef CONFIG_SB_NOTIFICATIONS
>>  	if (unlikely(s->s_watchers)) {
>> @@ -3534,8 +3537,12 @@ static inline int notify_sb_error(struct super_block *s, int error,  u64 inode,
>>  			.error_cookie	= 0,
>>  			.inode		= inode,
>>  			.block		= block,
>> +			.line		= line,
>>  		};
>>  
>> +		memcpy(&n.function, function, SB_NOTIFICATION_FNAME_LEN);
>> +		n.function[SB_NOTIFICATION_FNAME_LEN-1] = '\0';
>> +
>>  		post_sb_notification(s, &n.s, fmt, args);
>>  	}
>>  #endif
>> diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
>> index 937363d9f7b3..5fa5286c5cc7 100644
>> --- a/include/uapi/linux/watch_queue.h
>> +++ b/include/uapi/linux/watch_queue.h
>> @@ -114,6 +114,7 @@ enum superblock_notification_type {
>>  
>>  #define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
>>  
>> +#define SB_NOTIFICATION_FNAME_LEN 30
>>  /*
>>   * Superblock notification record.
>>   * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
>> @@ -130,6 +131,8 @@ struct superblock_error_notification {
>>  	__u32	error_cookie;
>>  	__u64	inode;
>>  	__u64	block;
>> +	char	function[SB_NOTIFICATION_FNAME_LEN];
>> +	__u16	line;
>
> Er... this is enlarging a structure in the userspace ABI, right?  Which
> will break userspace that latched on to the structure definition in the
> previous patch, and therefore isn't expecting a function name here.

Hi Darrick,

Since the structure is defined in the patch immediately before, I
thought it would be ok to split the patch to preserve authorship of the
different parts.  I will fold this into patch 4 in the next iteration.

>
> If you're gonna put a character string(?) at the end then I guess you
> have to pre-pad the notification structure so that we can add things
> later, or.... bump the type code every time you add a field?
>
> (Maybe I misread that?  But this is include/uapi/...)
>
> --D
>
>>  	char	desc[0];
>>  };
>>  
>> -- 
>> 2.29.2
>> 

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] vfs: Add superblock notifications
  2020-12-08  0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
@ 2020-12-08  0:56   ` Darrick J. Wong
  2020-12-10 22:09   ` Dave Chinner
  1 sibling, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2020-12-08  0:56 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Mon, Dec 07, 2020 at 09:31:13PM -0300, Gabriel Krisman Bertazi wrote:
> From: David Howells <dhowells@redhat.com>
> 
> Add a superblock event notification facility whereby notifications about
> superblock events, such as I/O errors (EIO), quota limits being hit
> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
> process asynchronously.  Note that this does not cover vfsmount topology
> changes.  watch_mount() is used for that.

<being a lazy reviewer and skipping straight to the data format>

> diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
> index c3d8320b5d3a..937363d9f7b3 100644
> --- a/include/uapi/linux/watch_queue.h
> +++ b/include/uapi/linux/watch_queue.h
> @@ -14,7 +14,8 @@
>  enum watch_notification_type {
>  	WATCH_TYPE_META		= 0,	/* Special record */
>  	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
> -	WATCH_TYPE__NR		= 2
> +	WATCH_TYPE_SB_NOTIFY	= 2,
> +	WATCH_TYPE__NR		= 3
>  };
>  
>  enum watch_meta_notification_subtype {
> @@ -101,4 +102,35 @@ struct key_notification {
>  	__u32	aux;		/* Per-type auxiliary data */
>  };
>  
> +/*
> + * Type of superblock notification.
> + */
> +enum superblock_notification_type {
> +	NOTIFY_SUPERBLOCK_READONLY	= 0, /* Filesystem toggled between R/O and R/W */
> +	NOTIFY_SUPERBLOCK_ERROR		= 1, /* Error in filesystem or blockdev */
> +	NOTIFY_SUPERBLOCK_EDQUOT	= 2, /* EDQUOT notification */
> +	NOTIFY_SUPERBLOCK_NETWORK	= 3, /* Network status change */
> +};
> +
> +#define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
> +
> +/*
> + * Superblock notification record.
> + * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
> + * - watch.subtype = enum superblock_notification_subtype
> + */
> +struct superblock_notification {
> +	struct watch_notification watch; /* WATCH_TYPE_SB_NOTIFY */
> +	__u64	sb_id;			/* 64-bit superblock ID [fsinfo_ids::f_sb_id] */
> +};
> +
> +struct superblock_error_notification {
> +	struct superblock_notification s; /* subtype = notify_superblock_error */
> +	__u32	error_number;
> +	__u32	error_cookie;
> +	__u64	inode;
> +	__u64	block;

Is this a file offset?  In ... i_blocksize() units?  What about
filesystems that have multiple file offset mapping structures, like xfs?

IOWs can we make a structure that covers enough of the "common"ly
desired features that we don't just end up with the per-fs but then
duplicated everywhere mess that is GETFLAGS/FSGETXATTR?

> +	char	desc[0];

If the end of this is a VLA then I guess we can't add new fields by
bumping the size and hoping userspace notices.  I guess that implies the
need for some padding and a flags field that we can set bits in when we
start using that padding...

--D

> +};
> +
>  #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index f27ac94d5fa7..3e97984bc4c8 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -51,6 +51,9 @@ COND_SYSCALL_COMPAT(io_pgetevents);
>  COND_SYSCALL(io_uring_setup);
>  COND_SYSCALL(io_uring_enter);
>  COND_SYSCALL(io_uring_register);
> +COND_SYSCALL(fsinfo);
> +COND_SYSCALL(watch_mount);
> +COND_SYSCALL(watch_sb);
>  
>  /* fs/xattr.c */
>  
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08  0:51   ` Darrick J. Wong
  2020-12-08  0:55     ` Gabriel Krisman Bertazi
@ 2020-12-08 12:42     ` David Howells
  1 sibling, 0 replies; 26+ messages in thread
From: David Howells @ 2020-12-08 12:42 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, Darrick J. Wong, viro, tytso, khazhy, adilger.kernel,
	linux-ext4, linux-fsdevel, kernel

Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> Since the structure is defined in the patch immediately before, I
> thought it would be ok to split the patch to preserve authorship of the
> different parts.

Think git bisect.

David


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (7 preceding siblings ...)
  2020-12-08  0:31 ` [PATCH 8/8] samples: watch_queue: Add sample of SB notifications Gabriel Krisman Bertazi
@ 2020-12-08 12:51 ` David Howells
  2020-12-08 12:58   ` Gabriel Krisman Bertazi
  2020-12-08 12:57 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification David Howells
  2020-12-08 12:59 ` [PATCH 0/8] Superblock Notifications David Howells
  10 siblings, 1 reply; 26+ messages in thread
From: David Howells @ 2020-12-08 12:51 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> @@ -130,6 +131,8 @@ struct superblock_error_notification {
>  	__u32	error_cookie;
>  	__u64	inode;
>  	__u64	block;
> +	char	function[SB_NOTIFICATION_FNAME_LEN];
> +	__u16	line;
>  	char	desc[0];
>  };

As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
however, merge this ahead a patch).  Also, I would put the __u16 before the
char[].

That said, I'm not sure whether it's useful to include the function name and
line.  Both fields are liable to change over kernel commits, so it's not
something userspace can actually interpret.  I think you're better off dumping
those into dmesg.

Further, this reduces the capacity of desc[] significantly - I don't know if
that's a problem.

And yet further, there's no room for addition of new fields with the desc[]
buffer on the end.  Now maybe you're planning on making use of desc[] for
text-encoding?

David


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] watch_queue: Support a text field at the end of the notification
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (8 preceding siblings ...)
  2020-12-08 12:51 ` [PATCH 5/8] vfs: Include origin of the SB error notification David Howells
@ 2020-12-08 12:57 ` David Howells
  2020-12-08 12:59 ` [PATCH 0/8] Superblock Notifications David Howells
  10 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2020-12-08 12:57 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> This allow notifications to send text information to userspace without
> having to copy it to a temporary buffer to then copy to the ring.  One
> use case to pass text information in notifications is for error
> reporting, where more debug information might be needed, but we don't
> want to explode the number of subtypes of notifications.  For instance,
> ext4 can have a single inode error notification subtype, and pass more
> information on the cause of the error in this field.

I'm generally okay with this, but you really need to note that if you make use
of this for a subtype, you don't get to add more fields for that subtype
unless there's an offset to it.

Speaking of that, you need to update:

	Documentation/watch_queue.rst

since you're changing the API.

The "Watch Sources" section will also need altering to indicate that
superblock events are now a thing.

David


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08 12:51 ` [PATCH 5/8] vfs: Include origin of the SB error notification David Howells
@ 2020-12-08 12:58   ` Gabriel Krisman Bertazi
  2020-12-08 18:41     ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08 12:58 UTC (permalink / raw)
  To: David Howells
  Cc: viro, tytso, khazhy, adilger.kernel, linux-ext4, linux-fsdevel, kernel

David Howells <dhowells@redhat.com> writes:

> Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
>
>> @@ -130,6 +131,8 @@ struct superblock_error_notification {
>>  	__u32	error_cookie;
>>  	__u64	inode;
>>  	__u64	block;
>> +	char	function[SB_NOTIFICATION_FNAME_LEN];
>> +	__u16	line;
>>  	char	desc[0];
>>  };
>
> As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
> however, merge this ahead a patch).  Also, I would put the __u16 before the
> char[].
>
> That said, I'm not sure whether it's useful to include the function name and
> line.  Both fields are liable to change over kernel commits, so it's not
> something userspace can actually interpret.  I think you're better off dumping
> those into dmesg.
>
> Further, this reduces the capacity of desc[] significantly - I don't know if
> that's a problem.

Yes, that is a big problem as desc is already quite limited.  I don't
think it is a problem for them to change between kernel versions, as the
monitoring userspace can easily associate it with the running kernel.
The alternative would be generating something like unique IDs for each
error notification in the filesystem, no?

> And yet further, there's no room for addition of new fields with the desc[]
> buffer on the end.  Now maybe you're planning on making use of desc[] for
> text-encoding?

Yes.  I would like to be able to provide more details on the error,
without having a unique id.  For instance, desc would have the formatted
string below, describing the warning:

ext4_warning(inode->i_sb, "couldn't mark inode dirty (err %d)", err);


>
> David
>

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/8] Superblock Notifications
  2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
                   ` (9 preceding siblings ...)
  2020-12-08 12:57 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification David Howells
@ 2020-12-08 12:59 ` David Howells
  10 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2020-12-08 12:59 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> After the two previous RFCs this is an attempt to get the ball spinning
> again.
> 
> The problem I'm trying to solve is providing an interface to monitor
> filesystem errors.  This patchset includes an example implementation of
> ext4 error notification.  This goes obviously on top of the watch_queue
> mechanism.

Thanks for picking this up and advancing it.

> Regarding the buffer overrun issue that would require fsinfo or another
> method to expose counters, I think they can be added at a later date
> with no change to what this patch series attempts to do, therefore I'm
> proposing we don't wait for fsinfo before getting this merged.

That's fine, but anyone wanting to use this will need to be aware that
overruns are a problem.

David


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08 12:58   ` Gabriel Krisman Bertazi
@ 2020-12-08 18:41     ` Darrick J. Wong
  2020-12-08 19:29       ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2020-12-08 18:41 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: David Howells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Tue, Dec 08, 2020 at 09:58:25AM -0300, Gabriel Krisman Bertazi wrote:
> David Howells <dhowells@redhat.com> writes:
> 
> > Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> >
> >> @@ -130,6 +131,8 @@ struct superblock_error_notification {

FWIW I wonder if this really should be inode_error_notification?

If (for example) ext4 discovered an error in the blockgroup descriptor
and wanted to report it, the inode and block numbers would be
irrelevant, but the blockgroup number would be nice to have.

> >>  	__u32	error_cookie;
> >>  	__u64	inode;
> >>  	__u64	block;
> >> +	char	function[SB_NOTIFICATION_FNAME_LEN];
> >> +	__u16	line;
> >>  	char	desc[0];
> >>  };
> >
> > As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
> > however, merge this ahead a patch).  Also, I would put the __u16 before the
> > char[].
> >
> > That said, I'm not sure whether it's useful to include the function name and
> > line.  Both fields are liable to change over kernel commits, so it's not
> > something userspace can actually interpret.  I think you're better off dumping
> > those into dmesg.
> >
> > Further, this reduces the capacity of desc[] significantly - I don't know if
> > that's a problem.
> 
> Yes, that is a big problem as desc is already quite limited.  I don't

How limited?

> think it is a problem for them to change between kernel versions, as the
> monitoring userspace can easily associate it with the running kernel.

How do you make that association?  $majordistro's 4.18 kernel is not the
same as the upstream 4.18.  Wouldn't you rather the notification message
be entirely self-describing rather than depending on some external
information about the sender?

> The alternative would be generating something like unique IDs for each
> error notification in the filesystem, no?
> 
> > And yet further, there's no room for addition of new fields with the desc[]
> > buffer on the end.  Now maybe you're planning on making use of desc[] for
> > text-encoding?
> 
> Yes.  I would like to be able to provide more details on the error,
> without having a unique id.  For instance, desc would have the formatted
> string below, describing the warning:
> 
> ext4_warning(inode->i_sb, "couldn't mark inode dirty (err %d)", err);

Depending on the upper limit on the length of messages, I wonder if you
could split the superblock notification and the description string into
separate messages (with maybe the error cookie to tie them together) so
that the struct isn't limited by having a VLA on the end, and the
description can be more or less an arbitrary string?

(That said I'm not familiar with the watch queue system so I have no
idea if chained messages even make sense here, or are already
implemented in some other way, or...)

Even better if you could find a way to send the string and the arguments
separately so that whatever's on the receiving end could run it through
a localization catalogue.  Though I remember my roommates trying to
localize 2.0.35 into Latin 25 years ago and never getting very far. :)

--D

> 
> 
> >
> > David
> >
> 
> -- 
> Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08 18:41     ` Darrick J. Wong
@ 2020-12-08 19:29       ` Gabriel Krisman Bertazi
  2020-12-09  3:24         ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-08 19:29 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: David Howells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

"Darrick J. Wong" <darrick.wong@oracle.com> writes:

> On Tue, Dec 08, 2020 at 09:58:25AM -0300, Gabriel Krisman Bertazi wrote:
>> David Howells <dhowells@redhat.com> writes:
>> 
>> > Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
>> >
>> >> @@ -130,6 +131,8 @@ struct superblock_error_notification {
>
> FWIW I wonder if this really should be inode_error_notification?
>
> If (for example) ext4 discovered an error in the blockgroup descriptor
> and wanted to report it, the inode and block numbers would be
> irrelevant, but the blockgroup number would be nice to have.

A previous RFC had superblock_error_notification and
superblock_inode_error_notification split, I think we can recover that.

>
>> >>  	__u32	error_cookie;
>> >>  	__u64	inode;
>> >>  	__u64	block;
>> >> +	char	function[SB_NOTIFICATION_FNAME_LEN];
>> >> +	__u16	line;
>> >>  	char	desc[0];
>> >>  };
>> >
>> > As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
>> > however, merge this ahead a patch).  Also, I would put the __u16 before the
>> > char[].
>> >
>> > That said, I'm not sure whether it's useful to include the function name and
>> > line.  Both fields are liable to change over kernel commits, so it's not
>> > something userspace can actually interpret.  I think you're better off dumping
>> > those into dmesg.
>> >
>> > Further, this reduces the capacity of desc[] significantly - I don't know if
>> > that's a problem.
>> 
>> Yes, that is a big problem as desc is already quite limited.  I don't
>
> How limited?

The largest notification is 128 bytes, the one with the biggest header
is superblock_error_notification which leaves 56 bytes for description.

>
>> think it is a problem for them to change between kernel versions, as the
>> monitoring userspace can easily associate it with the running kernel.
>
> How do you make that association?  $majordistro's 4.18 kernel is not the
> same as the upstream 4.18.  Wouldn't you rather the notification message
> be entirely self-describing rather than depending on some external
> information about the sender?

True.  I was thinking on my use case where the customer controls their
infrastructure and would specialize their userspace tools, but that is
poor design on my part.  A self describing mechanism would be better.

>
>> The alternative would be generating something like unique IDs for each
>> error notification in the filesystem, no?
>> 
>> > And yet further, there's no room for addition of new fields with the desc[]
>> > buffer on the end.  Now maybe you're planning on making use of desc[] for
>> > text-encoding?
>> 
>> Yes.  I would like to be able to provide more details on the error,
>> without having a unique id.  For instance, desc would have the formatted
>> string below, describing the warning:
>> 
>> ext4_warning(inode->i_sb, "couldn't mark inode dirty (err %d)", err);
>
> Depending on the upper limit on the length of messages, I wonder if you
> could split the superblock notification and the description string into
> separate messages (with maybe the error cookie to tie them together) so
> that the struct isn't limited by having a VLA on the end, and the
> description can be more or less an arbitrary string?
>
> (That said I'm not familiar with the watch queue system so I have no
> idea if chained messages even make sense here, or are already
> implemented in some other way, or...)

I don't see any support for chaining messages in the current watch_queue
implementation, I'd need to extend the interface to support it.  I
considered this idea before, given the small description size, but I
thought it would be over-complicated, even though much more future
proof.  I will look into that.

What about the kernel exporting a per-filesystem table, as a build
target or in /sys/fs/<fs>/errors, that has descriptions strings for each
error?  Then the notification can have only the FS type, index to the
table and params.  This won't exactly be self-describing as you wanted
but, differently from function:line, it removes the need for the source
code, and allows localization.  The per-filesystem table would be
stable ABI, of course.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-08 19:29       ` Gabriel Krisman Bertazi
@ 2020-12-09  3:24         ` Darrick J. Wong
  2020-12-09 13:06           ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2020-12-09  3:24 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: David Howells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Tue, Dec 08, 2020 at 04:29:32PM -0300, Gabriel Krisman Bertazi wrote:
> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
> 
> > On Tue, Dec 08, 2020 at 09:58:25AM -0300, Gabriel Krisman Bertazi wrote:
> >> David Howells <dhowells@redhat.com> writes:
> >> 
> >> > Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> >> >
> >> >> @@ -130,6 +131,8 @@ struct superblock_error_notification {
> >
> > FWIW I wonder if this really should be inode_error_notification?
> >
> > If (for example) ext4 discovered an error in the blockgroup descriptor
> > and wanted to report it, the inode and block numbers would be
> > irrelevant, but the blockgroup number would be nice to have.
> 
> A previous RFC had superblock_error_notification and
> superblock_inode_error_notification split, I think we can recover that.
> 
> >
> >> >>  	__u32	error_cookie;
> >> >>  	__u64	inode;
> >> >>  	__u64	block;
> >> >> +	char	function[SB_NOTIFICATION_FNAME_LEN];
> >> >> +	__u16	line;
> >> >>  	char	desc[0];
> >> >>  };
> >> >
> >> > As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
> >> > however, merge this ahead a patch).  Also, I would put the __u16 before the
> >> > char[].
> >> >
> >> > That said, I'm not sure whether it's useful to include the function name and
> >> > line.  Both fields are liable to change over kernel commits, so it's not
> >> > something userspace can actually interpret.  I think you're better off dumping
> >> > those into dmesg.
> >> >
> >> > Further, this reduces the capacity of desc[] significantly - I don't know if
> >> > that's a problem.
> >> 
> >> Yes, that is a big problem as desc is already quite limited.  I don't
> >
> > How limited?
> 
> The largest notification is 128 bytes, the one with the biggest header
> is superblock_error_notification which leaves 56 bytes for description.
> 
> >
> >> think it is a problem for them to change between kernel versions, as the
> >> monitoring userspace can easily associate it with the running kernel.
> >
> > How do you make that association?  $majordistro's 4.18 kernel is not the
> > same as the upstream 4.18.  Wouldn't you rather the notification message
> > be entirely self-describing rather than depending on some external
> > information about the sender?
> 
> True.  I was thinking on my use case where the customer controls their
> infrastructure and would specialize their userspace tools, but that is
> poor design on my part.  A self describing mechanism would be better.
> 
> >
> >> The alternative would be generating something like unique IDs for each
> >> error notification in the filesystem, no?
> >> 
> >> > And yet further, there's no room for addition of new fields with the desc[]
> >> > buffer on the end.  Now maybe you're planning on making use of desc[] for
> >> > text-encoding?
> >> 
> >> Yes.  I would like to be able to provide more details on the error,
> >> without having a unique id.  For instance, desc would have the formatted
> >> string below, describing the warning:
> >> 
> >> ext4_warning(inode->i_sb, "couldn't mark inode dirty (err %d)", err);
> >
> > Depending on the upper limit on the length of messages, I wonder if you
> > could split the superblock notification and the description string into
> > separate messages (with maybe the error cookie to tie them together) so
> > that the struct isn't limited by having a VLA on the end, and the
> > description can be more or less an arbitrary string?
> >
> > (That said I'm not familiar with the watch queue system so I have no
> > idea if chained messages even make sense here, or are already
> > implemented in some other way, or...)
> 
> I don't see any support for chaining messages in the current watch_queue
> implementation, I'd need to extend the interface to support it.  I
> considered this idea before, given the small description size, but I
> thought it would be over-complicated, even though much more future
> proof.  I will look into that.
> 
> What about the kernel exporting a per-filesystem table, as a build
> target or in /sys/fs/<fs>/errors, that has descriptions strings for each
> error?  Then the notification can have only the FS type, index to the
> table and params.  This won't exactly be self-describing as you wanted
> but, differently from function:line, it removes the need for the source
> code, and allows localization.  The per-filesystem table would be
> stable ABI, of course.

Yikes.  I don't think people are going to be ok with a message table
where we can never remove the strings.  I bet GregKH won't like that
either (one value per sysfs file).

(Maybe I misread that and all you meant by stable ABI is the fact that
the table exists at a given path and the notification message gives you
a index into ... wherever we put it.)

--D

> 
> -- 
> Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-09  3:24         ` Darrick J. Wong
@ 2020-12-09 13:06           ` Gabriel Krisman Bertazi
  2020-12-11 22:35             ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-09 13:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: David Howells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

"Darrick J. Wong" <darrick.wong@oracle.com> writes:

> On Tue, Dec 08, 2020 at 04:29:32PM -0300, Gabriel Krisman Bertazi wrote:
>> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
>> 
>> > On Tue, Dec 08, 2020 at 09:58:25AM -0300, Gabriel Krisman Bertazi wrote:
>> >> David Howells <dhowells@redhat.com> writes:
>> >> 
>> >> > Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
>> >> >
>> >> >> @@ -130,6 +131,8 @@ struct superblock_error_notification {
>> >
>> > FWIW I wonder if this really should be inode_error_notification?
>> >
>> > If (for example) ext4 discovered an error in the blockgroup descriptor
>> > and wanted to report it, the inode and block numbers would be
>> > irrelevant, but the blockgroup number would be nice to have.
>> 
>> A previous RFC had superblock_error_notification and
>> superblock_inode_error_notification split, I think we can recover that.
>> 
>> >
>> >> >>  	__u32	error_cookie;
>> >> >>  	__u64	inode;
>> >> >>  	__u64	block;
>> >> >> +	char	function[SB_NOTIFICATION_FNAME_LEN];
>> >> >> +	__u16	line;
>> >> >>  	char	desc[0];
>> >> >>  };
>> >> >
>> >> > As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
>> >> > however, merge this ahead a patch).  Also, I would put the __u16 before the
>> >> > char[].
>> >> >
>> >> > That said, I'm not sure whether it's useful to include the function name and
>> >> > line.  Both fields are liable to change over kernel commits, so it's not
>> >> > something userspace can actually interpret.  I think you're better off dumping
>> >> > those into dmesg.
>> >> >
>> >> > Further, this reduces the capacity of desc[] significantly - I don't know if
>> >> > that's a problem.
>> >> 
>> >> Yes, that is a big problem as desc is already quite limited.  I don't
>> >
>> > How limited?
>> 
>> The largest notification is 128 bytes, the one with the biggest header
>> is superblock_error_notification which leaves 56 bytes for description.
>> 
>> >
>> >> think it is a problem for them to change between kernel versions, as the
>> >> monitoring userspace can easily associate it with the running kernel.
>> >
>> > How do you make that association?  $majordistro's 4.18 kernel is not the
>> > same as the upstream 4.18.  Wouldn't you rather the notification message
>> > be entirely self-describing rather than depending on some external
>> > information about the sender?
>> 
>> True.  I was thinking on my use case where the customer controls their
>> infrastructure and would specialize their userspace tools, but that is
>> poor design on my part.  A self describing mechanism would be better.
>> 
>> >
>> >> The alternative would be generating something like unique IDs for each
>> >> error notification in the filesystem, no?
>> >> 
>> >> > And yet further, there's no room for addition of new fields with the desc[]
>> >> > buffer on the end.  Now maybe you're planning on making use of desc[] for
>> >> > text-encoding?
>> >> 
>> >> Yes.  I would like to be able to provide more details on the error,
>> >> without having a unique id.  For instance, desc would have the formatted
>> >> string below, describing the warning:
>> >> 
>> >> ext4_warning(inode->i_sb, "couldn't mark inode dirty (err %d)", err);
>> >
>> > Depending on the upper limit on the length of messages, I wonder if you
>> > could split the superblock notification and the description string into
>> > separate messages (with maybe the error cookie to tie them together) so
>> > that the struct isn't limited by having a VLA on the end, and the
>> > description can be more or less an arbitrary string?
>> >
>> > (That said I'm not familiar with the watch queue system so I have no
>> > idea if chained messages even make sense here, or are already
>> > implemented in some other way, or...)
>> 
>> I don't see any support for chaining messages in the current watch_queue
>> implementation, I'd need to extend the interface to support it.  I
>> considered this idea before, given the small description size, but I
>> thought it would be over-complicated, even though much more future
>> proof.  I will look into that.
>> 
>> What about the kernel exporting a per-filesystem table, as a build
>> target or in /sys/fs/<fs>/errors, that has descriptions strings for each
>> error?  Then the notification can have only the FS type, index to the
>> table and params.  This won't exactly be self-describing as you wanted
>> but, differently from function:line, it removes the need for the source
>> code, and allows localization.  The per-filesystem table would be
>> stable ABI, of course.
>
> Yikes.  I don't think people are going to be ok with a message table
> where we can never remove the strings.  I bet GregKH won't like that
> either (one value per sysfs file).

Indeed, sysfs seems out of question.  In fact the string format doesn't
even need to be in the kernel, and we don't need the strings to be sent
as part of the notifications.  What if we can have a bunch of
notification types, specific for each error message, and a library in
userspace that parses the notifications and understands the parameters
passed?  The library then displays the data as they wish.

> (Maybe I misread that and all you meant by stable ABI is the fact that
> the table exists at a given path and the notification message gives you
> a index into ... wherever we put it.)

The kernel could even export the table as a build-time target, that
get's installed into X. But even that is not necessary if a library can
make sense of a notification that uniquely identifies each error and
only includes the useful debug parameters without any string formatting?

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] vfs: Add superblock notifications
  2020-12-08  0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
  2020-12-08  0:56   ` Darrick J. Wong
@ 2020-12-10 22:09   ` Dave Chinner
  2020-12-11 20:55     ` Gabriel Krisman Bertazi
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2020-12-10 22:09 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Mon, Dec 07, 2020 at 09:31:13PM -0300, Gabriel Krisman Bertazi wrote:
> From: David Howells <dhowells@redhat.com>
> 
> Add a superblock event notification facility whereby notifications about
> superblock events, such as I/O errors (EIO), quota limits being hit
> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
> process asynchronously.  Note that this does not cover vfsmount topology
> changes.  watch_mount() is used for that.

watch_mount() is not in the upstream tree, nor is it defined in this
patch set.

> Records are of the following format:
> 
> 	struct superblock_notification {
> 		struct watch_notification watch;
> 		__u64	sb_id;
> 	} *n;
> 
> Where:
> 
> 	n->watch.type will be WATCH_TYPE_SB_NOTIFY.
> 
> 	n->watch.subtype will indicate the type of event, such as
> 	NOTIFY_SUPERBLOCK_READONLY.
> 
> 	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
> 	record.
> 
> 	n->watch.info & WATCH_INFO_ID will be the fifth argument to
> 	watch_sb(), shifted.
> 
> 	n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
> 	NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
> 	R/O, and being cleared otherwise.
> 
> 	n->sb_id will be the ID of the superblock, as can be retrieved with
> 	the fsinfo() syscall, as part of the fsinfo_sb_notifications
> 	attribute in the watch_id field.
> 
> Note that it is permissible for event records to be of variable length -
> or, at least, the length may be dependent on the subtype.  Note also that
> the queue can be shared between multiple notifications of various types.

/me puts on his "We really, really, REALLY suck at APIs" hat.

This adds a new syscall that has a complex structure associated with
in. This needs a full man page specification written for it
describing the parameters, the protocol structures, behaviour, etc
before we can really review this. It really also needs full test
infrastructure for every aspect of the syscall written from the man
page (not the implementation) for fstests so that we end up with a
consistent implementation for every filesystem that implements these
watches.

Other things:

- Scoping: inode/block related information is not "superblock"
  information. What about errors in non-inode related objects?
- offets into files/devices/objects need to be in bytes, not blocks
- errors can span multiple contiguous blocks, so the notification
  needs to report the -byte range- the error corresponds to.
- superblocks can have multiple block devices under them with
  individual address ranges. Hence we need {object,dev,offset,len}
  to uniquely identify where an error occurred in a filesystem.
- userspace face structures need padding and flags/version/size
  information so we can tell what shape the structure being passed
  is. It is guaranteed that we will want to expand the structure
  definitions in future, maybe even deprecate some...
- syscall has no flags field.
- syscall is of "at" type (relative path via dfd) so probably shoudl
  be called "watch..._at()"

Fundamentally, though, I'm struggling to understand what the
difference between watch_mount() and watch_sb() is going to be.
"superblock" watches seem like the wrong abstraction for a path
based watch interface. Superblocks can be shared across multiple
disjoint paths, subvolumes and even filesystems.

The path based user API is really asking to watch a mount, not a
superblock. We don't otherwise expose superblocks to userspace at
all, so this seems like the API is somewhat exposing internal kernel
implementation behind mounts. However, there -is- a watch_mount()
syscall floating around somewhere, so it makes me wonder exactly why
we need a second syscall and interface protocol to expose
essentially the same path-based watch information to userspace.
Without having that syscall the same patchset, or a reference to
that patchset (and man page documenting the interface), I have no
idea what it does or why it is different or why it can't be used for
these error notifications....

/me wonders what applications are suppose to do if they have a watch
on a path that then gets over-mounted by a different filesystem and
so their watch for that path is effectively now stale because
operations on that path now redirect to a different mount and
superblock....

> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index f27ac94d5fa7..3e97984bc4c8 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -51,6 +51,9 @@ COND_SYSCALL_COMPAT(io_pgetevents);
>  COND_SYSCALL(io_uring_setup);
>  COND_SYSCALL(io_uring_enter);
>  COND_SYSCALL(io_uring_register);
> +COND_SYSCALL(fsinfo);
> +COND_SYSCALL(watch_mount);
> +COND_SYSCALL(watch_sb);

I think these need to be in the patches that introduce these
syscalls, not this one.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] vfs: Add superblock notifications
  2020-12-10 22:09   ` Dave Chinner
@ 2020-12-11 20:55     ` Gabriel Krisman Bertazi
  2020-12-18  1:06       ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-11 20:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel



Dave,

Thanks a lot for the very detailed review.

> On Mon, Dec 07, 2020 at 09:31:13PM -0300, Gabriel Krisman Bertazi wrote:
>> From: David Howells <dhowells@redhat.com>
>> 
>> Add a superblock event notification facility whereby notifications about
>> superblock events, such as I/O errors (EIO), quota limits being hit
>> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
>> process asynchronously.  Note that this does not cover vfsmount topology
>> changes.  watch_mount() is used for that.
>
> watch_mount() is not in the upstream tree, nor is it defined in this
> patch set.


That is my mistake, not the author's.  I picked this from a longer series that has
a watch_mount implementation, but didn't include it.  Only the commit message
is bad, not the patch absence.

>> Records are of the following format:
>> 
>> 	struct superblock_notification {
>> 		struct watch_notification watch;
>> 		__u64	sb_id;
>> 	} *n;
>> 
>> Where:
>> 
>> 	n->watch.type will be WATCH_TYPE_SB_NOTIFY.
>> 
>> 	n->watch.subtype will indicate the type of event, such as
>> 	NOTIFY_SUPERBLOCK_READONLY.
>> 
>> 	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
>> 	record.
>> 
>> 	n->watch.info & WATCH_INFO_ID will be the fifth argument to
>> 	watch_sb(), shifted.
>> 
>> 	n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
>> 	NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
>> 	R/O, and being cleared otherwise.
>> 
>> 	n->sb_id will be the ID of the superblock, as can be retrieved with
>> 	the fsinfo() syscall, as part of the fsinfo_sb_notifications
>> 	attribute in the watch_id field.
>> 
>> Note that it is permissible for event records to be of variable length -
>> or, at least, the length may be dependent on the subtype.  Note also that
>> the queue can be shared between multiple notifications of various types.
>
> /me puts on his "We really, really, REALLY suck at APIs" hat.
>
> This adds a new syscall that has a complex structure associated with
> in. This needs a full man page specification written for it
> describing the parameters, the protocol structures, behaviour, etc
> before we can really review this. It really also needs full test
> infrastructure for every aspect of the syscall written from the man
> page (not the implementation) for fstests so that we end up with a
> consistent implementation for every filesystem that implements these
> watches.

I see.  I was thinking the other way around, getting a design accepted
by you all before writing down documentation, but that makes a lot of
sense. In fact, I'm taking a step back and writing a text proposal,
without patches, such that we can agree on the main points before I
start coding.

> Other things:
>
> - Scoping: inode/block related information is not "superblock"
>   information. What about errors in non-inode related objects?

The previous RFC separated inode error notifications from other types,
but my idea was to have different notifications types for each object.


> - offets into files/devices/objects need to be in bytes, not blocks
> - errors can span multiple contiguous blocks, so the notification
>   needs to report the -byte range- the error corresponds to.
> - superblocks can have multiple block devices under them with
>   individual address ranges. Hence we need {object,dev,offset,len}
>   to uniquely identify where an error occurred in a filesystem.
> - userspace face structures need padding and flags/version/size
>   information so we can tell what shape the structure being passed
>   is. It is guaranteed that we will want to expand the structure
>   definitions in future, maybe even deprecate some...
> - syscall has no flags field.
> - syscall is of "at" type (relative path via dfd) so probably shoudl
>   be called "watch..._at()"

will do all the above.

>
> Fundamentally, though, I'm struggling to understand what the
> difference between watch_mount() and watch_sb() is going to be.
> "superblock" watches seem like the wrong abstraction for a path
> based watch interface. Superblocks can be shared across multiple
> disjoint paths, subvolumes and even filesystems.

As far as I understand the original patchset, watch_mount was designed
to monitor mountpoint operations (mount, umount,.. ) in a sub-tree,
while watch_sb monitors filesystem operations and errors.  I'm not
working with watch_mount, my current interest is in having a
notifications mechanism for filesystem errors, which seemed to fit
nicely with the watch_sb patchset for watch_queue.

> The path based user API is really asking to watch a mount, not a
> superblock. We don't otherwise expose superblocks to userspace at
> all, so this seems like the API is somewhat exposing internal kernel
> implementation behind mounts. However, there -is- a watch_mount()
> syscall floating around somewhere, so it makes me wonder exactly why
> we need a second syscall and interface protocol to expose
> essentially the same path-based watch information to userspace.

I think these are indeed different syscalls, but maybe a bit misnamed.

If not by path, how could we uniquely identify an entire filesystem?
Maybe pointing to a block device that has a valid filesystem and in the
case of fs spawning through multiple devices, consider all of them?  But
that would not work for some misc filesystems, like tmpfs.

> Without having that syscall the same patchset, or a reference to
> that patchset (and man page documenting the interface), I have no
> idea what it does or why it is different or why it can't be used for
> these error notifications....

As a short summary, My goal is an error reporting mechanism for ext4
(preferably that can also be used by other filesystems) that allows a
userspace application to monitor errors on the filesystem without losing
information and without having to parse a convoluted dmesg.  The
watch_queue API seem to expose exactly the infrastructure for this kind
of thing.  As I said, I'm gonna send a proposal with more details
because, I'd really like to have something that can be used by several
filesystems.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] vfs: Include origin of the SB error notification
  2020-12-09 13:06           ` Gabriel Krisman Bertazi
@ 2020-12-11 22:35             ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2020-12-11 22:35 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: David Howells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Wed, Dec 09, 2020 at 10:06:07AM -0300, Gabriel Krisman Bertazi wrote:
> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
> 
> > On Tue, Dec 08, 2020 at 04:29:32PM -0300, Gabriel Krisman Bertazi wrote:
> >> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
> >> 
> >> > On Tue, Dec 08, 2020 at 09:58:25AM -0300, Gabriel Krisman Bertazi wrote:
> >> >> David Howells <dhowells@redhat.com> writes:
> >> >> 
> >> >> > Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> >> >> >
> >> >> >> @@ -130,6 +131,8 @@ struct superblock_error_notification {
> >> >
> >> > FWIW I wonder if this really should be inode_error_notification?
> >> >
> >> > If (for example) ext4 discovered an error in the blockgroup descriptor
> >> > and wanted to report it, the inode and block numbers would be
> >> > irrelevant, but the blockgroup number would be nice to have.
> >> 
> >> A previous RFC had superblock_error_notification and
> >> superblock_inode_error_notification split, I think we can recover that.
> >> 
> >> >
> >> >> >>  	__u32	error_cookie;
> >> >> >>  	__u64	inode;
> >> >> >>  	__u64	block;
> >> >> >> +	char	function[SB_NOTIFICATION_FNAME_LEN];
> >> >> >> +	__u16	line;
> >> >> >>  	char	desc[0];
> >> >> >>  };
> >> >> >
> >> >> > As Darrick said, this is a UAPI breaker, so you shouldn't do this (you can,
> >> >> > however, merge this ahead a patch).  Also, I would put the __u16 before the
> >> >> > char[].
> >> >> >
> >> >> > That said, I'm not sure whether it's useful to include the function name and
> >> >> > line.  Both fields are liable to change over kernel commits, so it's not
> >> >> > something userspace can actually interpret.  I think you're better off dumping
> >> >> > those into dmesg.
> >> >> >
> >> >> > Further, this reduces the capacity of desc[] significantly - I don't know if
> >> >> > that's a problem.
> >> >> 
> >> >> Yes, that is a big problem as desc is already quite limited.  I don't
> >> >
> >> > How limited?
> >> 
> >> The largest notification is 128 bytes, the one with the biggest header
> >> is superblock_error_notification which leaves 56 bytes for description.
> >> 
> >> >
> >> >> think it is a problem for them to change between kernel versions, as the
> >> >> monitoring userspace can easily associate it with the running kernel.
> >> >
> >> > How do you make that association?  $majordistro's 4.18 kernel is not the
> >> > same as the upstream 4.18.  Wouldn't you rather the notification message
> >> > be entirely self-describing rather than depending on some external
> >> > information about the sender?
> >> 
> >> True.  I was thinking on my use case where the customer controls their
> >> infrastructure and would specialize their userspace tools, but that is
> >> poor design on my part.  A self describing mechanism would be better.
> >> 
> >> >
> >> >> The alternative would be generating something like unique IDs for each
> >> >> error notification in the filesystem, no?
> >> >> 
> >> >> > And yet further, there's no room for addition of new fields with the desc[]
> >> >> > buffer on the end.  Now maybe you're planning on making use of desc[] for
> >> >> > text-encoding?
> >> >> 
> >> >> Yes.  I would like to be able to provide more details on the error,
> >> >> without having a unique id.  For instance, desc would have the formatted
> >> >> string below, describing the warning:
> >> >> 
> >> >> ext4_warning(inode->i_sb, "couldn't mark inode dirty (err %d)", err);
> >> >
> >> > Depending on the upper limit on the length of messages, I wonder if you
> >> > could split the superblock notification and the description string into
> >> > separate messages (with maybe the error cookie to tie them together) so
> >> > that the struct isn't limited by having a VLA on the end, and the
> >> > description can be more or less an arbitrary string?
> >> >
> >> > (That said I'm not familiar with the watch queue system so I have no
> >> > idea if chained messages even make sense here, or are already
> >> > implemented in some other way, or...)
> >> 
> >> I don't see any support for chaining messages in the current watch_queue
> >> implementation, I'd need to extend the interface to support it.  I
> >> considered this idea before, given the small description size, but I
> >> thought it would be over-complicated, even though much more future
> >> proof.  I will look into that.
> >> 
> >> What about the kernel exporting a per-filesystem table, as a build
> >> target or in /sys/fs/<fs>/errors, that has descriptions strings for each
> >> error?  Then the notification can have only the FS type, index to the
> >> table and params.  This won't exactly be self-describing as you wanted
> >> but, differently from function:line, it removes the need for the source
> >> code, and allows localization.  The per-filesystem table would be
> >> stable ABI, of course.
> >
> > Yikes.  I don't think people are going to be ok with a message table
> > where we can never remove the strings.  I bet GregKH won't like that
> > either (one value per sysfs file).
> 
> Indeed, sysfs seems out of question.  In fact the string format doesn't
> even need to be in the kernel, and we don't need the strings to be sent
> as part of the notifications.  What if we can have a bunch of
> notification types, specific for each error message, and a library in
> userspace that parses the notifications and understands the parameters
> passed?  The library then displays the data as they wish.

Er... I don't think we (XFS) really are going to maintain a userspace
library to decode kernel messages.

> > (Maybe I misread that and all you meant by stable ABI is the fact that
> > the table exists at a given path and the notification message gives you
> > a index into ... wherever we put it.)
> 
> The kernel could even export the table as a build-time target, that
> get's installed into X. But even that is not necessary if a library can
> make sense of a notification that uniquely identifies each error and
> only includes the useful debug parameters without any string formatting?

/me shrugs and thinks that chaining fs notifications via cookie might be
a better way to do this, since then you could push out a stream of
notices about a filesystem event:

(0) generic vfs error telling you something happened at a inode/offset

(1) fs-specific notice giving more details about what that fs thinks is
wrong

(2) formatted message string so that you can send it to the sysadmin or
feed it to google translate or whatever :)

--D

> -- 
> Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] vfs: Add superblock notifications
  2020-12-11 20:55     ` Gabriel Krisman Bertazi
@ 2020-12-18  1:06       ` Dave Chinner
  2021-01-05 19:52         ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2020-12-18  1:06 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel

On Fri, Dec 11, 2020 at 05:55:32PM -0300, Gabriel Krisman Bertazi wrote:
> 
> 
> Dave,
> 
> Thanks a lot for the very detailed review.
> 
> > On Mon, Dec 07, 2020 at 09:31:13PM -0300, Gabriel Krisman Bertazi wrote:
> >> From: David Howells <dhowells@redhat.com>
> >> 
> >> Add a superblock event notification facility whereby notifications about
> >> superblock events, such as I/O errors (EIO), quota limits being hit
> >> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
> >> process asynchronously.  Note that this does not cover vfsmount topology
> >> changes.  watch_mount() is used for that.
> >
> > watch_mount() is not in the upstream tree, nor is it defined in this
> > patch set.
> 
> 
> That is my mistake, not the author's.  I picked this from a longer series that has
> a watch_mount implementation, but didn't include it.  Only the commit message
> is bad, not the patch absence.
> 
> >> Records are of the following format:
> >> 
> >> 	struct superblock_notification {
> >> 		struct watch_notification watch;
> >> 		__u64	sb_id;
> >> 	} *n;
> >> 
> >> Where:
> >> 
> >> 	n->watch.type will be WATCH_TYPE_SB_NOTIFY.
> >> 
> >> 	n->watch.subtype will indicate the type of event, such as
> >> 	NOTIFY_SUPERBLOCK_READONLY.
> >> 
> >> 	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
> >> 	record.
> >> 
> >> 	n->watch.info & WATCH_INFO_ID will be the fifth argument to
> >> 	watch_sb(), shifted.
> >> 
> >> 	n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
> >> 	NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
> >> 	R/O, and being cleared otherwise.
> >> 
> >> 	n->sb_id will be the ID of the superblock, as can be retrieved with
> >> 	the fsinfo() syscall, as part of the fsinfo_sb_notifications
> >> 	attribute in the watch_id field.
> >> 
> >> Note that it is permissible for event records to be of variable length -
> >> or, at least, the length may be dependent on the subtype.  Note also that
> >> the queue can be shared between multiple notifications of various types.
> >
> > /me puts on his "We really, really, REALLY suck at APIs" hat.
> >
> > This adds a new syscall that has a complex structure associated with
> > in. This needs a full man page specification written for it
> > describing the parameters, the protocol structures, behaviour, etc
> > before we can really review this. It really also needs full test
> > infrastructure for every aspect of the syscall written from the man
> > page (not the implementation) for fstests so that we end up with a
> > consistent implementation for every filesystem that implements these
> > watches.
> 
> I see.  I was thinking the other way around, getting a design accepted
> by you all before writing down documentation, but that makes a lot of
> sense. In fact, I'm taking a step back and writing a text proposal,
> without patches, such that we can agree on the main points before I
> start coding.
> 
> > Other things:
> >
> > - Scoping: inode/block related information is not "superblock"
> >   information. What about errors in non-inode related objects?
> 
> The previous RFC separated inode error notifications from other types,
> but my idea was to have different notifications types for each object.
> 
> 
> > - offets into files/devices/objects need to be in bytes, not blocks
> > - errors can span multiple contiguous blocks, so the notification
> >   needs to report the -byte range- the error corresponds to.
> > - superblocks can have multiple block devices under them with
> >   individual address ranges. Hence we need {object,dev,offset,len}
> >   to uniquely identify where an error occurred in a filesystem.
> > - userspace face structures need padding and flags/version/size
> >   information so we can tell what shape the structure being passed
> >   is. It is guaranteed that we will want to expand the structure
> >   definitions in future, maybe even deprecate some...
> > - syscall has no flags field.
> > - syscall is of "at" type (relative path via dfd) so probably shoudl
> >   be called "watch..._at()"
> 
> will do all the above.
> 
> >
> > Fundamentally, though, I'm struggling to understand what the
> > difference between watch_mount() and watch_sb() is going to be.
> > "superblock" watches seem like the wrong abstraction for a path
> > based watch interface. Superblocks can be shared across multiple
> > disjoint paths, subvolumes and even filesystems.
> 
> As far as I understand the original patchset, watch_mount was designed
> to monitor mountpoint operations (mount, umount,.. ) in a sub-tree,
> while watch_sb monitors filesystem operations and errors.  I'm not
> working with watch_mount, my current interest is in having a
> notifications mechanism for filesystem errors, which seemed to fit
> nicely with the watch_sb patchset for watch_queue.

<shrug>

The previous patches are not part of your proposal, and if they are
not likely to be merged, then we don't really care what they are
or what they did. The only thing that matters here is what your
patchset is trying to implement and whether that is appropriate or
not...

> > The path based user API is really asking to watch a mount, not a
> > superblock. We don't otherwise expose superblocks to userspace at
> > all, so this seems like the API is somewhat exposing internal kernel
> > implementation behind mounts. However, there -is- a watch_mount()
> > syscall floating around somewhere, so it makes me wonder exactly why
> > we need a second syscall and interface protocol to expose
> > essentially the same path-based watch information to userspace.
> 
> I think these are indeed different syscalls, but maybe a bit misnamed.
> 
> If not by path, how could we uniquely identify an entire filesystem?

Exactly why do we need to uniquely identify a filesystem based on
it's superblock? Surely it's already been identified by path by the
application that registered the watch?

> Maybe pointing to a block device that has a valid filesystem and in the
> case of fs spawning through multiple devices, consider all of them?  But
> that would not work for some misc filesystems, like tmpfs.

It can't be block device based at all - think NFS, CIFS, etc. We
can't use UUIDs, because not all filesystem have them, and snapshots
often have identical UUIDs.

Really, I think "superblock" notifications are extremely problematic
because the same superblock can be shared across different security
contexts. I'm not sure what the solution might be, but I really
don't like the idea of a mechanism that can report errors in objects
outside the visibility of a namespaced container to that container
just because it has access to some path inside a much bigger
filesystem that is mostly out of bounds to that container.

> > Without having that syscall the same patchset, or a reference to
> > that patchset (and man page documenting the interface), I have no
> > idea what it does or why it is different or why it can't be used for
> > these error notifications....
> 
> As a short summary, My goal is an error reporting mechanism for ext4
> (preferably that can also be used by other filesystems)

There's no "preferably" here. Either it is generic and usable by all
other filesystems, or it's not functionality that should be merged
into the VFS or exposed by a syscall.

> that allows a
> userspace application to monitor errors on the filesystem without losing
> information and without having to parse a convoluted dmesg.  The
> watch_queue API seem to expose exactly the infrastructure for this kind
> of thing.  As I said, I'm gonna send a proposal with more details
> because, I'd really like to have something that can be used by several
> filesystems.

Yes, I know what the desired functionality is, it's just that it's
not as simple as "redirect global error messages to global pipe"
such as what we do with printk and dmesg...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] vfs: Add superblock notifications
  2020-12-18  1:06       ` Dave Chinner
@ 2021-01-05 19:52         ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-01-05 19:52 UTC (permalink / raw)
  To: Dave Chinner
  Cc: dhowells, viro, tytso, khazhy, adilger.kernel, linux-ext4,
	linux-fsdevel, kernel


Dave, thanks for the feedback.

Dave Chinner <david@fromorbit.com> writes:
> On Fri, Dec 11, 2020 at 05:55:32PM -0300, Gabriel Krisman Bertazi wrote:
>> > Fundamentally, though, I'm struggling to understand what the
>> > difference between watch_mount() and watch_sb() is going to be.
>> > "superblock" watches seem like the wrong abstraction for a path
>> > based watch interface. Superblocks can be shared across multiple
>> > disjoint paths, subvolumes and even filesystems.
>> 
>> As far as I understand the original patchset, watch_mount was designed
>> to monitor mountpoint operations (mount, umount,.. ) in a sub-tree,
>> while watch_sb monitors filesystem operations and errors.  I'm not
>> working with watch_mount, my current interest is in having a
>> notifications mechanism for filesystem errors, which seemed to fit
>> nicely with the watch_sb patchset for watch_queue.
>
> <shrug>
>
> The previous patches are not part of your proposal, and if they are
> not likely to be merged, then we don't really care what they are
> or what they did. The only thing that matters here is what your
> patchset is trying to implement and whether that is appropriate or
> not...

I think the mistake was only mentioning them in the commit message, in
the first place.

>> > The path based user API is really asking to watch a mount, not a
>> > superblock. We don't otherwise expose superblocks to userspace at
>> > all, so this seems like the API is somewhat exposing internal kernel
>> > implementation behind mounts. However, there -is- a watch_mount()
>> > syscall floating around somewhere, so it makes me wonder exactly why
>> > we need a second syscall and interface protocol to expose
>> > essentially the same path-based watch information to userspace.
>> 
>> I think these are indeed different syscalls, but maybe a bit misnamed.
>> 
>> If not by path, how could we uniquely identify an entire filesystem?
>
> Exactly why do we need to uniquely identify a filesystem based on
> it's superblock? Surely it's already been identified by path by the
> application that registered the watch?

I see.  In fact, we don't, as that is an internal concept. The patch
abuses the term superblock to refer to the entire filesystem.  I should
to operate in terms of mounts.

>> Maybe pointing to a block device that has a valid filesystem and in the
>> case of fs spawning through multiple devices, consider all of them?  But
>> that would not work for some misc filesystems, like tmpfs.
>
> It can't be block device based at all - think NFS, CIFS, etc. We
> can't use UUIDs, because not all filesystem have them, and snapshots
> often have identical UUIDs.
>
> Really, I think "superblock" notifications are extremely problematic
> because the same superblock can be shared across different security
> contexts. I'm not sure what the solution might be, but I really
> don't like the idea of a mechanism that can report errors in objects
> outside the visibility of a namespaced container to that container
> just because it has access to some path inside a much bigger
> filesystem that is mostly out of bounds to that container.

I see.  To solve the container visibility problem, would it suffice to
forbid watching partial mounts of a filesystem?  For instance, either
the watched path is the root_sb or the API returns EINVAL.  This limits
the usability of the API to whoever controls the root of the filesystem,
which seems to cover the use case of the host monitoring an entire
filesystem.  Would this limitation be acceptable?

Alternatively, we want something similar to fanotify FAN_MARK_FILESYSTEM
semantics? I suppose global errors (like an ext4 fs abort) should be
reported individually for every mountpoint, while inode errors are only
reported for each mountpoint for which the object is accessible.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-05 19:53 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 1/8] watch_queue: Make watch_sizeof() check record size Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 2/8] security: Add hooks to rule on setting a watch for superblock Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
2020-12-08  0:56   ` Darrick J. Wong
2020-12-10 22:09   ` Dave Chinner
2020-12-11 20:55     ` Gabriel Krisman Bertazi
2020-12-18  1:06       ` Dave Chinner
2021-01-05 19:52         ` Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 5/8] vfs: Include origin of the SB error notification Gabriel Krisman Bertazi
2020-12-08  0:51   ` Darrick J. Wong
2020-12-08  0:55     ` Gabriel Krisman Bertazi
2020-12-08 12:42     ` David Howells
2020-12-08  0:31 ` [PATCH 6/8] fs: Add more superblock error subtypes Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 7/8] ext4: Implement SB error notification through watch_sb Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 8/8] samples: watch_queue: Add sample of SB notifications Gabriel Krisman Bertazi
2020-12-08 12:51 ` [PATCH 5/8] vfs: Include origin of the SB error notification David Howells
2020-12-08 12:58   ` Gabriel Krisman Bertazi
2020-12-08 18:41     ` Darrick J. Wong
2020-12-08 19:29       ` Gabriel Krisman Bertazi
2020-12-09  3:24         ` Darrick J. Wong
2020-12-09 13:06           ` Gabriel Krisman Bertazi
2020-12-11 22:35             ` Darrick J. Wong
2020-12-08 12:57 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification David Howells
2020-12-08 12:59 ` [PATCH 0/8] Superblock Notifications David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.