All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] inotify_user: pass directory fd to inotify_find_inode()
@ 2023-09-18 12:32 Max Kellermann
  2023-09-18 12:32 ` [PATCH 2/4] inotify_user: move code to do_inotify_add_watch() Max Kellermann
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-18 12:32 UTC (permalink / raw)
  To: jack, linux-fsdevel, linux-kernel; +Cc: amir73il, max.kellermann

Preparing for inotify_add_watch_at().

To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/notify/inotify/inotify_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 1c4bfdab008d..1853439a24f6 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -370,12 +370,12 @@ static const struct file_operations inotify_fops = {
 /*
  * find_inode - resolve a user-given path to a specific inode
  */
-static int inotify_find_inode(const char __user *dirname, struct path *path,
+static int inotify_find_inode(int dfd, const char __user *dirname, struct path *path,
 						unsigned int flags, __u64 mask)
 {
 	int error;
 
-	error = user_path_at(AT_FDCWD, dirname, flags, path);
+	error = user_path_at(dfd, dirname, flags, path);
 	if (error)
 		return error;
 	/* you can only watch an inode if you have read permissions on it */
@@ -774,7 +774,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	if (mask & IN_ONLYDIR)
 		flags |= LOOKUP_DIRECTORY;
 
-	ret = inotify_find_inode(pathname, &path, flags,
+	ret = inotify_find_inode(AT_FDCWD, pathname, &path, flags,
 			(mask & IN_ALL_EVENTS));
 	if (ret)
 		goto fput_and_out;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] inotify_user: move code to do_inotify_add_watch()
  2023-09-18 12:32 [PATCH 1/4] inotify_user: pass directory fd to inotify_find_inode() Max Kellermann
@ 2023-09-18 12:32 ` Max Kellermann
  2023-09-18 12:32 ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
  2023-09-18 12:32 ` [PATCH 4/4] arch: register inotify_add_watch_at Max Kellermann
  2 siblings, 0 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-18 12:32 UTC (permalink / raw)
  To: jack, linux-fsdevel, linux-kernel; +Cc: amir73il, max.kellermann

Preparing for inotify_add_watch_at().

To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/notify/inotify/inotify_user.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 1853439a24f6..b6e6f6ab21f8 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -727,8 +727,8 @@ SYSCALL_DEFINE0(inotify_init)
 	return do_inotify_init(0);
 }
 
-SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
-		u32, mask)
+static int do_inotify_add_watch(int fd, int dfd, const char __user *pathname,
+				u32 mask)
 {
 	struct fsnotify_group *group;
 	struct inode *inode;
@@ -774,7 +774,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	if (mask & IN_ONLYDIR)
 		flags |= LOOKUP_DIRECTORY;
 
-	ret = inotify_find_inode(AT_FDCWD, pathname, &path, flags,
+	ret = inotify_find_inode(dfd, pathname, &path, flags,
 			(mask & IN_ALL_EVENTS));
 	if (ret)
 		goto fput_and_out;
@@ -791,6 +791,12 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	return ret;
 }
 
+SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
+		u32, mask)
+{
+	return do_inotify_add_watch(fd, AT_FDCWD, pathname, mask);
+}
+
 SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 {
 	struct fsnotify_group *group;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] inotify_user: add system call inotify_add_watch_at()
  2023-09-18 12:32 [PATCH 1/4] inotify_user: pass directory fd to inotify_find_inode() Max Kellermann
  2023-09-18 12:32 ` [PATCH 2/4] inotify_user: move code to do_inotify_add_watch() Max Kellermann
@ 2023-09-18 12:32 ` Max Kellermann
  2023-09-18 12:40   ` Jan Kara
  2023-09-18 12:32 ` [PATCH 4/4] arch: register inotify_add_watch_at Max Kellermann
  2 siblings, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-18 12:32 UTC (permalink / raw)
  To: jack, linux-fsdevel, linux-kernel; +Cc: amir73il, max.kellermann

This implements a missing piece in the inotify API: referring to a
file by a directory file descriptor and a path name.  This can be
solved in userspace currently only by doing something similar to:

  int old = open(".");
  fchdir(dfd);
  inotify_add_watch(....);
  fchdir(old);

Support for LOOKUP_EMPTY is still missing.  We could add another IN_*
flag for that (which would clutter the IN_* flags list further) or
add a "flags" parameter to the new system call (which would however
duplicate features already present via special IN_* flags).

To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/notify/inotify/inotify_user.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index b6e6f6ab21f8..8a9096c5ebb1 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -797,6 +797,12 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	return do_inotify_add_watch(fd, AT_FDCWD, pathname, mask);
 }
 
+SYSCALL_DEFINE4(inotify_add_watch_at, int, fd, int, dfd, const char __user *, pathname,
+		u32, mask)
+{
+	return do_inotify_add_watch(fd, dfd, pathname, mask);
+}
+
 SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 {
 	struct fsnotify_group *group;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] arch: register inotify_add_watch_at
  2023-09-18 12:32 [PATCH 1/4] inotify_user: pass directory fd to inotify_find_inode() Max Kellermann
  2023-09-18 12:32 ` [PATCH 2/4] inotify_user: move code to do_inotify_add_watch() Max Kellermann
  2023-09-18 12:32 ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
@ 2023-09-18 12:32 ` Max Kellermann
  2023-09-18 20:24   ` kernel test robot
  2023-09-18 20:24   ` kernel test robot
  2 siblings, 2 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-18 12:32 UTC (permalink / raw)
  To: jack, linux-fsdevel, linux-kernel; +Cc: amir73il, max.kellermann

Using syscall number 454 (only different on Alpha).  This skips 453 on
most architectures; 453 is used on x86_64 for "map_shadow_stack" and
my idea is to reserve that number for the remaining architectures,
where an implementation may be added eventually.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
 include/linux/syscalls.h                    | 2 ++
 include/uapi/asm-generic/unistd.h           | 5 ++++-
 20 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index ad37569d0507..3eaf0c8ffe9c 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -492,3 +492,4 @@
 560	common	set_mempolicy_home_node		sys_ni_syscall
 561	common	cachestat			sys_cachestat
 562	common	fchmodat2			sys_fchmodat2
+563	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index c572d6c3dee0..08fc73bf211c 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -466,3 +466,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index bd77253b62e0..63a8a9c4abc1 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		453
+#define __NR_compat_syscalls		455
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 78b68311ec81..384c121dfbdf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -911,6 +911,8 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+#define __NR_inotify_add_watch_at 454
+__SYSCALL(__NR_inotify_add_watch_at, sys_inotify_add_watch_at)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 83d8609aec03..8312606fdcd7 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -373,3 +373,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 259ceb125367..51de66ce3e9b 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index a3798c2637fd..991a0ae1c4be 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -458,3 +458,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 152034b8e0a0..98a7c7c45293 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -391,3 +391,4 @@
 450	n32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n32	cachestat			sys_cachestat
 452	n32	fchmodat2			sys_fchmodat2
+454	n32	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index cb5e757f6621..8751f6de6b96 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -367,3 +367,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n64	cachestat			sys_cachestat
 452	n64	fchmodat2			sys_fchmodat2
+454	n64	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 1a646813afdc..a899807b3b64 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -440,3 +440,4 @@
 450	o32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	o32	cachestat			sys_cachestat
 452	o32	fchmodat2			sys_fchmodat2
+454	o32	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index e97c175b56f9..325885297d5d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 20e50586e8a2..132876dc7aa5 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -539,3 +539,4 @@
 450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 0122cc156952..b6d29996190c 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
+454  common	inotify_add_watch_at	sys_inotify_add_watch_at	sys_inotify_add_watch_at
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index e90d585c4d3e..337875016443 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4ed06c71c43f..9634f81be406 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -498,3 +498,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 2d0b1bd866ea..de83d6861429 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -457,3 +457,4 @@
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
 452	i386	fchmodat2		sys_fchmodat2
+454	i386	inotify_add_watch_at	sys_inotify_add_watch_at
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 1d6eee30eceb..fdf0128fffce 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -375,6 +375,7 @@
 451	common	cachestat		sys_cachestat
 452	common	fchmodat2		sys_fchmodat2
 453	64	map_shadow_stack	sys_map_shadow_stack
+454	common	inotify_add_watch_at	sys_inotify_add_watch_at
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index fc1a4f3c81d9..43cbdd8f369c 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -423,3 +423,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	inotify_add_watch_at		sys_inotify_add_watch_at
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 22bc6bc147f8..63349ee93cb3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -379,6 +379,8 @@ asmlinkage long sys_fcntl64(unsigned int fd,
 asmlinkage long sys_inotify_init1(int flags);
 asmlinkage long sys_inotify_add_watch(int fd, const char __user *path,
 					u32 mask);
+asmlinkage long sys_inotify_add_watch_at(int fd, int dfd, const char __user *path,
+					 u32 mask);
 asmlinkage long sys_inotify_rm_watch(int fd, __s32 wd);
 asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd,
 				unsigned long arg);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index abe087c53b4b..f84bdb800352 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -823,8 +823,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 
+#define __NR_inotify_add_watch_at 454
+__SYSCALL(__NR_inotify_add_watch_at, sys_inotify_add_watch_at)
+
 #undef __NR_syscalls
-#define __NR_syscalls 453
+#define __NR_syscalls 455
 
 /*
  * 32 bit systems traditionally used different
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] inotify_user: add system call inotify_add_watch_at()
  2023-09-18 12:32 ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
@ 2023-09-18 12:40   ` Jan Kara
  2023-09-18 13:57     ` Max Kellermann
  2023-09-18 19:45     ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2023-09-18 12:40 UTC (permalink / raw)
  To: Max Kellermann; +Cc: jack, linux-fsdevel, linux-kernel, amir73il

On Mon 18-09-23 14:32:16, Max Kellermann wrote:
> This implements a missing piece in the inotify API: referring to a
> file by a directory file descriptor and a path name.  This can be
> solved in userspace currently only by doing something similar to:
> 
>   int old = open(".");
>   fchdir(dfd);
>   inotify_add_watch(....);
>   fchdir(old);
> 
> Support for LOOKUP_EMPTY is still missing.  We could add another IN_*
> flag for that (which would clutter the IN_* flags list further) or
> add a "flags" parameter to the new system call (which would however
> duplicate features already present via special IN_* flags).
> 
> To: Jan Kara <jack@suse.cz>
> Cc: Amir Goldstein <amir73il@gmail.com>
> To: linux-fsdevel@vger.kernel.org
> To: linux-kernel@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>

Thanks for the patches! But generally we don't add new functionality to the
inotify API and rather steer users towards fanotify. In this particular
case fanotify_mark(2) already has the support for dirfd + name. Is there
any problem with using fanotify for you? Note that since kernel 5.13 you
don't need CAP_SYS_ADMIN capability for fanotify functionality that is
more-or-less equivalent to what inotify provides.

								Honza

> ---
>  fs/notify/inotify/inotify_user.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index b6e6f6ab21f8..8a9096c5ebb1 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -797,6 +797,12 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
>  	return do_inotify_add_watch(fd, AT_FDCWD, pathname, mask);
>  }
>  
> +SYSCALL_DEFINE4(inotify_add_watch_at, int, fd, int, dfd, const char __user *, pathname,
> +		u32, mask)
> +{
> +	return do_inotify_add_watch(fd, dfd, pathname, mask);
> +}
> +
>  SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
>  {
>  	struct fsnotify_group *group;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] inotify_user: add system call inotify_add_watch_at()
  2023-09-18 12:40   ` Jan Kara
@ 2023-09-18 13:57     ` Max Kellermann
  2023-09-18 14:23       ` Jan Kara
  2023-09-18 19:45     ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
  1 sibling, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-18 13:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-kernel, amir73il

On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> Note that since kernel 5.13 you
> don't need CAP_SYS_ADMIN capability for fanotify functionality that is
> more-or-less equivalent to what inotify provides.

Oh, I missed that change - I remember fanotify as being inaccessible
for unprivileged processes, and fanotify being designed for things
like virus scanners. Indeed I should migrate my code to fanotify.

If fanotify has now become the designated successor of inotify, that
should be hinted in the inotify manpage, and if inotify is effectively
feature-frozen, maybe that should be an extra status in the
MAINTAINERS file?

Max

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] inotify_user: add system call inotify_add_watch_at()
  2023-09-18 13:57     ` Max Kellermann
@ 2023-09-18 14:23       ` Jan Kara
  2023-09-18 15:28         ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2023-09-18 14:23 UTC (permalink / raw)
  To: Max Kellermann; +Cc: Jan Kara, linux-fsdevel, linux-kernel, amir73il

On Mon 18-09-23 15:57:43, Max Kellermann wrote:
> On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> > Note that since kernel 5.13 you
> > don't need CAP_SYS_ADMIN capability for fanotify functionality that is
> > more-or-less equivalent to what inotify provides.
> 
> Oh, I missed that change - I remember fanotify as being inaccessible
> for unprivileged processes, and fanotify being designed for things
> like virus scanners. Indeed I should migrate my code to fanotify.
> 
> If fanotify has now become the designated successor of inotify, that
> should be hinted in the inotify manpage, and if inotify is effectively
> feature-frozen, maybe that should be an extra status in the
> MAINTAINERS file?

The manpage update is a good idea. I'm not sure about the MAINTAINERS
status - we do have 'Obsolete' but I'm reluctant to mark inotify as
obsolete as it's perfectly fine for existing users, we fully maintain it
and support it but we just don't want to extend the API anymore. Amir, what
are your thoughts on this?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] inotify_user: add system call inotify_add_watch_at()
  2023-09-18 14:23       ` Jan Kara
@ 2023-09-18 15:28         ` Amir Goldstein
  2023-09-18 18:05           ` inotify maintenance status Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-18 15:28 UTC (permalink / raw)
  To: Jan Kara; +Cc: Max Kellermann, linux-fsdevel, linux-kernel

On Mon, Sep 18, 2023 at 5:23 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 18-09-23 15:57:43, Max Kellermann wrote:
> > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> > > Note that since kernel 5.13 you
> > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is
> > > more-or-less equivalent to what inotify provides.
> >
> > Oh, I missed that change - I remember fanotify as being inaccessible
> > for unprivileged processes, and fanotify being designed for things
> > like virus scanners. Indeed I should migrate my code to fanotify.
> >
> > If fanotify has now become the designated successor of inotify, that
> > should be hinted in the inotify manpage, and if inotify is effectively
> > feature-frozen, maybe that should be an extra status in the
> > MAINTAINERS file?
>
> The manpage update is a good idea. I'm not sure about the MAINTAINERS
> status - we do have 'Obsolete' but I'm reluctant to mark inotify as
> obsolete as it's perfectly fine for existing users, we fully maintain it
> and support it but we just don't want to extend the API anymore. Amir, what
> are your thoughts on this?

I think that the mention of inotify vs. fanotify features in fanotify.7 man page
is decent - if anyone wants to improve it I won't mind.
A mention of fanotify as successor in inotify.7 man page is not a bad idea -
patches welcome.

As to MAINTAINERS, I think that 'Maintained' feels right.
We may consider 'Odd Fixes' for inotify and certainly for 'dnotify',
but that sounds a bit too harsh for the level of maintenance they get.

I'd like to point out that IMO, man-page is mainly aimed for the UAPI
users and MAINTAINERS is mostly aimed at bug reporters and drive-by
developers who submit small fixes.

When developers wish to add a feature/improvement to a subsystem,
they are advised to send an RFC with their intentions to the subsystem
maintainers/list to get feedback on their design before starting to implement.
Otherwise, the feature could be NACKed for several reasons other than
"we would rather invest in the newer API".

Bottom line - I don't see a strong reason to change anything, but I also do
not object to improving man page nor to switching to 'Odd Fixes' status.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-18 15:28         ` Amir Goldstein
@ 2023-09-18 18:05           ` Amir Goldstein
  2023-09-19  7:16             ` Amir Goldstein
                               ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Amir Goldstein @ 2023-09-18 18:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christian Brauner, Max Kellermann

[Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/]

On Mon, Sep 18, 2023 at 6:28 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Sep 18, 2023 at 5:23 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Mon 18-09-23 15:57:43, Max Kellermann wrote:
> > > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> > > > Note that since kernel 5.13 you
> > > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is
> > > > more-or-less equivalent to what inotify provides.
> > >
> > > Oh, I missed that change - I remember fanotify as being inaccessible
> > > for unprivileged processes, and fanotify being designed for things
> > > like virus scanners. Indeed I should migrate my code to fanotify.
> > >
> > > If fanotify has now become the designated successor of inotify, that
> > > should be hinted in the inotify manpage, and if inotify is effectively
> > > feature-frozen, maybe that should be an extra status in the
> > > MAINTAINERS file?
> >
> > The manpage update is a good idea. I'm not sure about the MAINTAINERS
> > status - we do have 'Obsolete' but I'm reluctant to mark inotify as
> > obsolete as it's perfectly fine for existing users, we fully maintain it
> > and support it but we just don't want to extend the API anymore. Amir, what
> > are your thoughts on this?
>
> I think that the mention of inotify vs. fanotify features in fanotify.7 man page
> is decent - if anyone wants to improve it I won't mind.
> A mention of fanotify as successor in inotify.7 man page is not a bad idea -
> patches welcome.
>
> As to MAINTAINERS, I think that 'Maintained' feels right.
> We may consider 'Odd Fixes' for inotify and certainly for 'dnotify',
> but that sounds a bit too harsh for the level of maintenance they get.
>
> I'd like to point out that IMO, man-page is mainly aimed for the UAPI
> users and MAINTAINERS is mostly aimed at bug reporters and drive-by
> developers who submit small fixes.
>
> When developers wish to add a feature/improvement to a subsystem,
> they are advised to send an RFC with their intentions to the subsystem
> maintainers/list to get feedback on their design before starting to implement.
> Otherwise, the feature could be NACKed for several reasons other than
> "we would rather invest in the newer API".
>
> Bottom line - I don't see a strong reason to change anything, but I also do
> not object to improving man page nor to switching to 'Odd Fixes' status.
>

BTW, before we can really mark inotify as Obsolete and document that
inotify was superseded by fanotify, there are at least two items on the
TODO list [1]:
1. UNMOUNT/IGNORED events
2. Filesystems without fid support

MOUNT/UNMOUNT fanotify events have already been discussed
and the feature has known users.

Christian has also mentioned [1] the IN_UNMOUNT use case for
waiting for sb shutdown several times and I will not be surprised
to see systemd starting to use inotify for that use case before too long...

Regarding the second item on the TODO list, we have had this discussion
before - if we are going to continue the current policy of opting-in to fanotify
(i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd filesystems that
only support inotify and we will need to keep supporting inotify only for the
users that use inotify on those odd filesystems.

OTOH, if we implement FAN_REPORT_DINO_NAME, we could
have fanotify inode mark support for any filesystem, where the
pinned marked inode ino is the object id.

Thanks,
Amir.

[1] https://github.com/amir73il/fsnotify-utils/wiki/fsnotify-TODO
[2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] inotify_user: add system call inotify_add_watch_at()
  2023-09-18 12:40   ` Jan Kara
  2023-09-18 13:57     ` Max Kellermann
@ 2023-09-18 19:45     ` Max Kellermann
  1 sibling, 0 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-18 19:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-kernel, amir73il

On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> Is there any problem with using fanotify for you?

Turns out fanotify is unusable for me, unfortunately.
I have been using inotify to get notifications of cgroup events, but
the cgroup filesystem appears to be unsupported by fanotify: all
attempts to use fanotify_mark() on cgroup event files fail with
ENODEV. I think that comes from fanotify_test_fsid(). Filesystems
without a fsid work just fine with inotify, but fail with fanotify.

Since fanotify lacks important features, is it really a good idea to
feature-freeze inotify?

(By the way, what was not documented is that fanotify_init() can only
be used by unprivileged processes if the FAN_REPORT_FID flag was
specified. I had to read the kernel sources to figure that out - I
have no idea why this limitation exists - the code comment in the
kernel source doesn't explain it.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] arch: register inotify_add_watch_at
  2023-09-18 12:32 ` [PATCH 4/4] arch: register inotify_add_watch_at Max Kellermann
@ 2023-09-18 20:24   ` kernel test robot
  2023-09-18 20:24   ` kernel test robot
  1 sibling, 0 replies; 32+ messages in thread
From: kernel test robot @ 2023-09-18 20:24 UTC (permalink / raw)
  To: Max Kellermann, jack, linux-fsdevel, linux-kernel
  Cc: oe-kbuild-all, amir73il, max.kellermann

Hi Max,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/asm]
[also build test ERROR on linus/master v6.6-rc2]
[cannot apply to jack-fs/fsnotify arm64/for-next/core next-20230918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Max-Kellermann/inotify_user-move-code-to-do_inotify_add_watch/20230918-203410
base:   tip/x86/asm
patch link:    https://lore.kernel.org/r/20230918123217.932179-4-max.kellermann%40ionos.com
patch subject: [PATCH 4/4] arch: register inotify_add_watch_at
config: csky-allnoconfig (https://download.01.org/0day-ci/archive/20230919/202309190454.Kk70tC2W-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230919/202309190454.Kk70tC2W-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309190454.Kk70tC2W-lkp@intel.com/

All errors (new ones prefixed by >>):

>> csky-linux-ld: arch/csky/kernel/syscall_table.o:(.data..page_aligned+0x718): undefined reference to `sys_inotify_add_watch_at'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] arch: register inotify_add_watch_at
  2023-09-18 12:32 ` [PATCH 4/4] arch: register inotify_add_watch_at Max Kellermann
  2023-09-18 20:24   ` kernel test robot
@ 2023-09-18 20:24   ` kernel test robot
  1 sibling, 0 replies; 32+ messages in thread
From: kernel test robot @ 2023-09-18 20:24 UTC (permalink / raw)
  To: Max Kellermann, jack, linux-fsdevel, linux-kernel
  Cc: oe-kbuild-all, amir73il, max.kellermann

Hi Max,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/asm]
[also build test ERROR on linus/master v6.6-rc2]
[cannot apply to jack-fs/fsnotify arm64/for-next/core next-20230918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Max-Kellermann/inotify_user-move-code-to-do_inotify_add_watch/20230918-203410
base:   tip/x86/asm
patch link:    https://lore.kernel.org/r/20230918123217.932179-4-max.kellermann%40ionos.com
patch subject: [PATCH 4/4] arch: register inotify_add_watch_at
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20230919/202309190447.Md4xeYhu-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230919/202309190447.Md4xeYhu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309190447.Md4xeYhu-lkp@intel.com/

All errors (new ones prefixed by >>):

>> or1k-linux-ld: arch/openrisc/kernel/sys_call_table.o:(.data+0x718): undefined reference to `sys_inotify_add_watch_at'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-18 18:05           ` inotify maintenance status Amir Goldstein
@ 2023-09-19  7:16             ` Amir Goldstein
  2023-09-19  9:08               ` Max Kellermann
  2023-09-19  9:26             ` Christian Brauner
  2023-09-19  9:48             ` Jan Kara
  2 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19  7:16 UTC (permalink / raw)
  To: Max Kellermann
  Cc: linux-fsdevel, Christian Brauner, Jan Kara, Ivan Babrou,
	Matthew Bobrowski

On Mon, Sep 18, 2023 at 9:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/]
>
> On Mon, Sep 18, 2023 at 6:28 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Sep 18, 2023 at 5:23 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Mon 18-09-23 15:57:43, Max Kellermann wrote:
> > > > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> > > > > Note that since kernel 5.13 you
> > > > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is
> > > > > more-or-less equivalent to what inotify provides.
> > > >
> > > > Oh, I missed that change - I remember fanotify as being inaccessible
> > > > for unprivileged processes, and fanotify being designed for things
> > > > like virus scanners. Indeed I should migrate my code to fanotify.
> > > >
> > > > If fanotify has now become the designated successor of inotify, that
> > > > should be hinted in the inotify manpage, and if inotify is effectively
> > > > feature-frozen, maybe that should be an extra status in the
> > > > MAINTAINERS file?
> > >
> > > The manpage update is a good idea. I'm not sure about the MAINTAINERS
> > > status - we do have 'Obsolete' but I'm reluctant to mark inotify as
> > > obsolete as it's perfectly fine for existing users, we fully maintain it
> > > and support it but we just don't want to extend the API anymore. Amir, what
> > > are your thoughts on this?
> >
> > I think that the mention of inotify vs. fanotify features in fanotify.7 man page
> > is decent - if anyone wants to improve it I won't mind.
> > A mention of fanotify as successor in inotify.7 man page is not a bad idea -
> > patches welcome.
> >
> > As to MAINTAINERS, I think that 'Maintained' feels right.
> > We may consider 'Odd Fixes' for inotify and certainly for 'dnotify',
> > but that sounds a bit too harsh for the level of maintenance they get.
> >
> > I'd like to point out that IMO, man-page is mainly aimed for the UAPI
> > users and MAINTAINERS is mostly aimed at bug reporters and drive-by
> > developers who submit small fixes.
> >
> > When developers wish to add a feature/improvement to a subsystem,
> > they are advised to send an RFC with their intentions to the subsystem
> > maintainers/list to get feedback on their design before starting to implement.
> > Otherwise, the feature could be NACKed for several reasons other than
> > "we would rather invest in the newer API".
> >
> > Bottom line - I don't see a strong reason to change anything, but I also do
> > not object to improving man page nor to switching to 'Odd Fixes' status.
> >
>
> BTW, before we can really mark inotify as Obsolete and document that
> inotify was superseded by fanotify, there are at least two items on the
> TODO list [1]:
> 1. UNMOUNT/IGNORED events
> 2. Filesystems without fid support
>
> MOUNT/UNMOUNT fanotify events have already been discussed
> and the feature has known users.
>
> Christian has also mentioned [1] the IN_UNMOUNT use case for
> waiting for sb shutdown several times and I will not be surprised
> to see systemd starting to use inotify for that use case before too long...
>
> Regarding the second item on the TODO list, we have had this discussion
> before - if we are going to continue the current policy of opting-in to fanotify
> (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd filesystems that
> only support inotify and we will need to keep supporting inotify only for the
> users that use inotify on those odd filesystems.
>
> OTOH, if we implement FAN_REPORT_DINO_NAME, we could
> have fanotify inode mark support for any filesystem, where the
> pinned marked inode ino is the object id.
>

Hi Max,

Not sure if you have seen my email before asking your question
on the original patch review thread.
I prefer to answer it here in the wider context of inotify maintenance,
because it touches directly on the topic I raised above.

On Mon, Sep 18, 2023 at 10:45 PM Max Kellermann
<max.kellermann@ionos.com> wrote:
>
> On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> > Is there any problem with using fanotify for you?
>
> Turns out fanotify is unusable for me, unfortunately.
> I have been using inotify to get notifications of cgroup events, but
> the cgroup filesystem appears to be unsupported by fanotify: all
> attempts to use fanotify_mark() on cgroup event files fail with
> ENODEV. I think that comes from fanotify_test_fsid(). Filesystems
> without a fsid work just fine with inotify, but fail with fanotify.
>

This was just fixed by Ivan in commit:
0ce7c12e88cf ("kernfs: attach uuid for every kernfs and report it in fsid")

> Since fanotify lacks important features, is it really a good idea to
> feature-freeze inotify?

As my summary above states, it is correct that fanotify does not
yet fully supersedes inotify, but there is a plan to go in this direction,
hence, inotify is "being phased out" it is not Obsolete nor Deprecated.

However, the question to be asked is different IMO:
When both inotify and fanotify do not support the use case at hand
(as in your case), which is better? to fix/improve inotify or to fix/improve
fanotify?

For me, there should be a very strong reason to choose improving
inotify over improving fanotify.

With the case at hand, you can see that the patch to improve fanotify
to support your use case was far simpler (in LOC at least) than your
patches, not to mention, not having to add a new syscall and new
documentation for an old phased out API.

But there may be exceptions, for example, in 4.19, inotify gained
a new feature:

4d97f7d53da7 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()")

I am not sure if this patch would have been accepted nowadays, but
we need to judge every case.

>
> (By the way, what was not documented is that fanotify_init() can only
> be used by unprivileged processes if the FAN_REPORT_FID flag was

fanotify_init(2):
       Prior to Linux 5.13, calling fanotify_init() required the
CAP_SYS_ADMIN capability.
       Since Linux 5.13, users may call fanotify_init() without the
CAP_SYS_ADMIN capability
       to create  and  initialize an fanotify group with limited functionality.

       The limitations imposed on an event listener created by a user
without the
              CAP_SYS_ADMIN capability are as follows:
...
              •  The user is required to create a group that
identifies filesystem objects
                  by file handles, for example, by providing the
FAN_REPORT_FID flag.

I find this documentation that was written by Matthew very good,
but writing documentation is not my strong side and if you feel that
any part of the documentation should be improved I highly appreciate
the feedback and would appreciate man page patches even more.

When we get to the point that the missing functionality gaps between
inotify and fanotify have been closed, I will surely follow your advice
to mention that in the inotify man page and possibly also in MAINTAINERS.

> specified. I had to read the kernel sources to figure that out - I
> have no idea why this limitation exists - the code comment in the
> kernel source doesn't explain it.)

The legacy fanotify events open and report an event->fd as a way
to identify the object - that is not a safe practice for unprivileged listeners
for several reasons.

FAN_REPORT_FID is designed in a way to be almost a drop in replacement
for inotify watch descriptors as an opaque identifier of the object, except that
fsid+fhanle provide much stronger functionality than wd did.

The limitation that FAN_REPORT_FID requires that fs has fsid+fhandle is
a technicality.  It can be solved by either providing fsid and trivial
encode_fh() (*)
to the filesystem in question (as was done in 6.6-rc1 for overlayfs and kernfs)
or by introducing a new mode FAN_REPORT_INO which reports inode number
instead of fsid+fhandle and is enough for listeners that watch directories
and files on a single fs.

Thanks,
Amir.

(*) the ability for fs to support only encode_fh() was added in kernel v6.5
96b2b072ee62 ("exportfs: allow exporting non-decodeable file handles
to userspace")
and a man page patch was already posted [3].

>
> [1] https://github.com/amir73il/fsnotify-utils/wiki/fsnotify-TODO
> [2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/
[3] https://lore.kernel.org/linux-fsdevel/20230903120433.2605027-1-amir73il@gmail.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19  7:16             ` Amir Goldstein
@ 2023-09-19  9:08               ` Max Kellermann
  2023-09-19 10:01                 ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-19  9:08 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: linux-fsdevel, Christian Brauner, Jan Kara, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 9:17 AM Amir Goldstein <amir73il@gmail.com> wrote:
> This was just fixed by Ivan in commit:
> 0ce7c12e88cf ("kernfs: attach uuid for every kernfs and report it in fsid")

Indeed, nice to see this will soon be fixed.

> As my summary above states, it is correct that fanotify does not
> yet fully supersedes inotify, but there is a plan to go in this direction,
> hence, inotify is "being phased out" it is not Obsolete nor Deprecated.

I agree that if inotify is to be phased out, we should concentrate on fanotify.

I'm however somewhat disappointed with the complexity of the fanotify
API. I'm not entirely convinced that fanotify is a good successor for
inotify, or that inotify should really be replaced. The additional
features that fanotify has could have been added to inotify instead; I
don't get why this needs an entirely new API. Of course, I'm late to
complain, having just learned about (the unprivileged availability of)
fanotify many years after it has been invented.

System calls needed for one inotify event:
- read()

System calls needed for one fanotify event:
- read()
- (do some magic to look up the fsid -
https://github.com/martinpitt/fatrace/blob/master/fatrace.c implements
a lookup table, yet more complexity that doesn't exist with inotify)
- open() to get a file descriptor for the fsid
- open_by_handle_at(fsid_fd, fid.handle)
- readlink("/proc/self/fd/%d") (which adds a dependency on /proc being mounted)
- close(fd)
- close(fsid_fd)

I should mention that this workflow still needs a privileged process -
yes, I can use fanotify in an unprivileged process, but
open_by_handle_at() is a privileged system call - it requires
CAP_DAC_READ_SEARCH. Without it, I cannot obtain information on which
file was modified, can I?
There is FAN_REPORT_NAME, but it gives me only the name of the
directory entry; but since I'm watching a lot of files and all of them
are called "memory.events.local", that's of no use.

Or am I supposed to use name_to_handle_at() on all watched files to
roll my own lookup? (The name_to_hamdle_at() manpage doesn't make me
confident it's a reliable system call; it sounds like it needs
explicit support from filesystems.)

> > (By the way, what was not documented is that fanotify_init() can only
> > be used by unprivileged processes if the FAN_REPORT_FID flag was
[...]
> I find this documentation that was written by Matthew very good,

Indeed! That's my mistake, I missed this section.

> FAN_REPORT_FID is designed in a way to be almost a drop in replacement
> for inotify watch descriptors as an opaque identifier of the object, except that
> fsid+fhanle provide much stronger functionality than wd did.

How is it stronger?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-18 18:05           ` inotify maintenance status Amir Goldstein
  2023-09-19  7:16             ` Amir Goldstein
@ 2023-09-19  9:26             ` Christian Brauner
  2023-09-19  9:48             ` Jan Kara
  2 siblings, 0 replies; 32+ messages in thread
From: Christian Brauner @ 2023-09-19  9:26 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Max Kellermann

> Christian has also mentioned [1] the IN_UNMOUNT use case for
> waiting for sb shutdown several times and I will not be surprised
> to see systemd starting to use inotify for that use case before too long...

I think that having a version of IN_UMOUNT for fanotify would be great.
I've said so a couple of times indeed. It is a really good feature to
monitor superblock deactivation.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-18 18:05           ` inotify maintenance status Amir Goldstein
  2023-09-19  7:16             ` Amir Goldstein
  2023-09-19  9:26             ` Christian Brauner
@ 2023-09-19  9:48             ` Jan Kara
  2023-09-19 14:55               ` Amir Goldstein
  2 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2023-09-19  9:48 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Christian Brauner, Max Kellermann

On Mon 18-09-23 21:05:11, Amir Goldstein wrote:
> [Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/]
...
> BTW, before we can really mark inotify as Obsolete and document that
> inotify was superseded by fanotify, there are at least two items on the
> TODO list [1]:

Yeah, as I wrote in the original thread, I don't feel like inotify should
be marked as obsolete (at least for some more time) so we are on the same
page here I think.

> 1. UNMOUNT/IGNORED events
> 2. Filesystems without fid support
> 
> MOUNT/UNMOUNT fanotify events have already been discussed
> and the feature has known users.
> 
> Christian has also mentioned [1] the IN_UNMOUNT use case for
> waiting for sb shutdown several times and I will not be surprised
> to see systemd starting to use inotify for that use case before too long...

Yup, both FAN_UNMOUNT and FAN_IGNORED should be easy. Unlike inotify, I'd
just make these explicit events you can opt into and not something you
always get.

> Regarding the second item on the TODO list, we have had this discussion
> before - if we are going to continue the current policy of opting-in to
> fanotify (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd
> filesystems that only support inotify and we will need to keep supporting
> inotify only for the users that use inotify on those odd filesystems.
> 
> OTOH, if we implement FAN_REPORT_DINO_NAME, we could
> have fanotify inode mark support for any filesystem, where the
> pinned marked inode ino is the object id.

Is it a real problem after your work to allow filehandles that are not
necessarily usable for NFS export or open_by_handle()? As far as I remember
fanotify should be now able to handle anything that provides f_fsid in its
statfs(2) call. And as I'm checking filesystems not setting fsid currently are:

afs, coda, nfs - networking filesystems where inotify and fanotify have
  dubious value anyway

configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
  quite limited. But some special cases could be useful. Adding fsid support
  is the same amount of trouble as for kernfs - a few LOC. In fact, we
  could perhaps add a fstype flag to indicate that this is a filesystem
  without persistent identification and so uuid should be autogenerated on
  mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
  This way we could handle all these filesystems with trivial amount of
  effort.

freevxfs - the only real filesystem without f_fsid. Trivial to handle one
  way or the other.

So I don't think we need new uAPI additions to finish off this TODO item.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19  9:08               ` Max Kellermann
@ 2023-09-19 10:01                 ` Jan Kara
  2023-09-19 10:42                   ` Amir Goldstein
  2023-09-19 10:42                   ` Max Kellermann
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2023-09-19 10:01 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Amir Goldstein, linux-fsdevel, Christian Brauner, Jan Kara,
	Ivan Babrou, Matthew Bobrowski

On Tue 19-09-23 11:08:21, Max Kellermann wrote:
> On Tue, Sep 19, 2023 at 9:17 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > As my summary above states, it is correct that fanotify does not
> > yet fully supersedes inotify, but there is a plan to go in this direction,
> > hence, inotify is "being phased out" it is not Obsolete nor Deprecated.
> 
> I agree that if inotify is to be phased out, we should concentrate on fanotify.
> 
> I'm however somewhat disappointed with the complexity of the fanotify
> API. I'm not entirely convinced that fanotify is a good successor for
> inotify, or that inotify should really be replaced. The additional
> features that fanotify has could have been added to inotify instead; I
> don't get why this needs an entirely new API. Of course, I'm late to
> complain, having just learned about (the unprivileged availability of)
> fanotify many years after it has been invented.
> 
> System calls needed for one inotify event:
> - read()
> 
> System calls needed for one fanotify event:
> - read()
> - (do some magic to look up the fsid -
> https://github.com/martinpitt/fatrace/blob/master/fatrace.c implements
> a lookup table, yet more complexity that doesn't exist with inotify)
> - open() to get a file descriptor for the fsid
> - open_by_handle_at(fsid_fd, fid.handle)
> - readlink("/proc/self/fd/%d") (which adds a dependency on /proc being mounted)
> - close(fd)
> - close(fsid_fd)
> 
> I should mention that this workflow still needs a privileged process -
> yes, I can use fanotify in an unprivileged process, but
> open_by_handle_at() is a privileged system call - it requires
> CAP_DAC_READ_SEARCH. Without it, I cannot obtain information on which
> file was modified, can I?
> There is FAN_REPORT_NAME, but it gives me only the name of the
> directory entry; but since I'm watching a lot of files and all of them
> are called "memory.events.local", that's of no use.
> 
> Or am I supposed to use name_to_handle_at() on all watched files to
> roll my own lookup? (The name_to_hamdle_at() manpage doesn't make me
> confident it's a reliable system call; it sounds like it needs
> explicit support from filesystems.)

So with inotify event, you get back 'wd' and 'name' to identify the object
where the event happened. How is this (for your usecase) different from
getting back 'fsid + handle' and 'name' back from fanotify? In inotify case
you had to somehow track wd -> path linkage, with fanotify you need to
track 'fsid + handle' -> path linkage.

> > FAN_REPORT_FID is designed in a way to be almost a drop in replacement
> > for inotify watch descriptors as an opaque identifier of the object, except that
> > fsid+fhanle provide much stronger functionality than wd did.
> 
> How is it stronger?

For your particular usecase I don't think there's any advantage of
fsid+fhandle over plain wd. But if you want to monitor multiple filesystems
or if you have priviledged process that can open by handle, or a standard
filesystem where handles are actually persistent, then there are benefits.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 10:01                 ` Jan Kara
@ 2023-09-19 10:42                   ` Amir Goldstein
  2023-09-19 10:48                     ` Max Kellermann
  2023-09-19 10:42                   ` Max Kellermann
  1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 10:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Max Kellermann, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 1:01 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 19-09-23 11:08:21, Max Kellermann wrote:
> > On Tue, Sep 19, 2023 at 9:17 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > As my summary above states, it is correct that fanotify does not
> > > yet fully supersedes inotify, but there is a plan to go in this direction,
> > > hence, inotify is "being phased out" it is not Obsolete nor Deprecated.
> >
> > I agree that if inotify is to be phased out, we should concentrate on fanotify.
> >
> > I'm however somewhat disappointed with the complexity of the fanotify
> > API. I'm not entirely convinced that fanotify is a good successor for
> > inotify, or that inotify should really be replaced. The additional
> > features that fanotify has could have been added to inotify instead; I
> > don't get why this needs an entirely new API. Of course, I'm late to
> > complain, having just learned about (the unprivileged availability of)
> > fanotify many years after it has been invented.
> >
> > System calls needed for one inotify event:
> > - read()
> >
> > System calls needed for one fanotify event:
> > - read()
> > - (do some magic to look up the fsid -
> > https://github.com/martinpitt/fatrace/blob/master/fatrace.c implements
> > a lookup table, yet more complexity that doesn't exist with inotify)
> > - open() to get a file descriptor for the fsid
> > - open_by_handle_at(fsid_fd, fid.handle)
> > - readlink("/proc/self/fd/%d") (which adds a dependency on /proc being mounted)
> > - close(fd)
> > - close(fsid_fd)
> >
> > I should mention that this workflow still needs a privileged process -
> > yes, I can use fanotify in an unprivileged process, but
> > open_by_handle_at() is a privileged system call - it requires
> > CAP_DAC_READ_SEARCH. Without it, I cannot obtain information on which
> > file was modified, can I?
> > There is FAN_REPORT_NAME, but it gives me only the name of the
> > directory entry; but since I'm watching a lot of files and all of them
> > are called "memory.events.local", that's of no use.
> >
> > Or am I supposed to use name_to_handle_at() on all watched files to
> > roll my own lookup? (The name_to_hamdle_at() manpage doesn't make me
> > confident it's a reliable system call; it sounds like it needs
> > explicit support from filesystems.)
>
> So with inotify event, you get back 'wd' and 'name' to identify the object
> where the event happened. How is this (for your usecase) different from
> getting back 'fsid + handle' and 'name' back from fanotify? In inotify case
> you had to somehow track wd -> path linkage, with fanotify you need to
> track 'fsid + handle' -> path linkage.
>

And if you want to see an implementation of a drop-in replacement
of inotify/fanotify, you can take a look at:

https://github.com/inotify-tools/inotify-tools/pull/134

And specifically the first commit
41b2ec4 ("Index watches by fanotify fid") to understand why
fid is a drop-in replacement for wd.

> > > FAN_REPORT_FID is designed in a way to be almost a drop in replacement
> > > for inotify watch descriptors as an opaque identifier of the object, except that
> > > fsid+fhanle provide much stronger functionality than wd did.
> >
> > How is it stronger?
>
> For your particular usecase I don't think there's any advantage of
> fsid+fhandle over plain wd. But if you want to monitor multiple filesystems
> or if you have priviledged process that can open by handle, or a standard
> filesystem where handles are actually persistent, then there are benefits.
>

Those cases are demonstrated in the --filesystem functionality of the
pull request above, which handles "dynamic watches" instead of
having to setup watches recursively on all subdirs.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 10:01                 ` Jan Kara
  2023-09-19 10:42                   ` Amir Goldstein
@ 2023-09-19 10:42                   ` Max Kellermann
  2023-09-19 10:58                     ` Amir Goldstein
  1 sibling, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 10:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 12:01 PM Jan Kara <jack@suse.cz> wrote:
> So with inotify event, you get back 'wd' and 'name' to identify the object
> where the event happened. How is this (for your usecase) different from
> getting back 'fsid + handle' and 'name' back from fanotify? In inotify case
> you had to somehow track wd -> path linkage, with fanotify you need to
> track 'fsid + handle' -> path linkage.

The wd is a simple "int" which is the return value of the system call,
and it's part of "struct inotify_event". One system call for
registering it, one system call fo reading it.

From fanotify, I read a "struct fanotify_event_metadata", and then
check variable-length follow-up structs, iterable those follow-up
structs, find the one with "info_type==FAN_EVENT_INFO_TYPE_FID", now I
have a "fsid" of type "__kernel_fsid_t" (a struct containing two
32-bit integers) and a "file_handle" (a variable-length opaque BLOB).
What do I do with these?

The answer appears to be: when I registered, I should have obtained
the fsid (via statfs()) and the file_handle (via name_to_handle_at()).
That's three extra system calls. One statfs(), and twice
name_to_handle_at(), because the first one is needed to get the length
of the buffer I need to allocate for the file_handle (and hope my
filesystem supports file_handles, because apparently that's an
optional feature). Just look at the name_to_handle_at() manpage for
some horrors of its complexity.

Imagine how much more complex the data structure for looking up the
modified file is: inotify has an int as the lookup key, and fanotify
has two integers plus a variable-length BLOB.

> But if you want to monitor multiple filesystems

I can monitor multiple filesystems with inotify.

> or if you have priviledged process that can open by handle

Getting an already-opened file descriptor, or just the file_handle, is
certainly an interesting fanotify feature. But that could have easily
been added to inotify with a new "mask" flag for the
inotify_add_watch() function.

> or a standard filesystem where handles are actually persistent, then there are benefits.

Same here: that could have been an (optional) inotify feature, instead
of making the whole complexity mandatory for everybody.

Max

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 10:42                   ` Amir Goldstein
@ 2023-09-19 10:48                     ` Max Kellermann
  0 siblings, 0 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 10:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 12:42 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Those cases are demonstrated in the --filesystem functionality of the
> pull request above, which handles "dynamic watches" instead of
> having to setup watches recursively on all subdirs.

The whole-filesystem mode is certainly the most interesting fanotify
feature, and the lack of it the greatest weakness of inotify - but
API-wise, it could have been implemented as a new "mask" flag in
inotify_add_watch() with no extra API complexity. That still doesn't
justify the huge complexity of fanotify.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 10:42                   ` Max Kellermann
@ 2023-09-19 10:58                     ` Amir Goldstein
  2023-09-19 11:21                       ` Max Kellermann
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 10:58 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 1:43 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Sep 19, 2023 at 12:01 PM Jan Kara <jack@suse.cz> wrote:
> > So with inotify event, you get back 'wd' and 'name' to identify the object
> > where the event happened. How is this (for your usecase) different from
> > getting back 'fsid + handle' and 'name' back from fanotify? In inotify case
> > you had to somehow track wd -> path linkage, with fanotify you need to
> > track 'fsid + handle' -> path linkage.
>
> The wd is a simple "int" which is the return value of the system call,
> and it's part of "struct inotify_event". One system call for
> registering it, one system call fo reading it.
>
> From fanotify, I read a "struct fanotify_event_metadata", and then
> check variable-length follow-up structs, iterable those follow-up
> structs, find the one with "info_type==FAN_EVENT_INFO_TYPE_FID", now I
> have a "fsid" of type "__kernel_fsid_t" (a struct containing two
> 32-bit integers) and a "file_handle" (a variable-length opaque BLOB).
> What do I do with these?
>
> The answer appears to be: when I registered, I should have obtained
> the fsid (via statfs()) and the file_handle (via name_to_handle_at()).
> That's three extra system calls. One statfs(), and twice
> name_to_handle_at(), because the first one is needed to get the length
> of the buffer I need to allocate for the file_handle (and hope my
> filesystem supports file_handles, because apparently that's an
> optional feature). Just look at the name_to_handle_at() manpage for
> some horrors of its complexity.
>
> Imagine how much more complex the data structure for looking up the
> modified file is: inotify has an int as the lookup key, and fanotify
> has two integers plus a variable-length BLOB.
>

You are not describing a technical problem.
Any API complexity can be hidden from users with userspace
libraries. You can use the inotify-tools lib if you prefer.

I assure you that the added complexity to the API was not
done to make your life harder.
inotify API has several design flaws and fanotify API extensions
were designed to address those flaws.

> > But if you want to monitor multiple filesystems
>
> I can monitor multiple filesystems with inotify.
>
> > or if you have priviledged process that can open by handle
>
> Getting an already-opened file descriptor, or just the file_handle, is
> certainly an interesting fanotify feature. But that could have easily
> been added to inotify with a new "mask" flag for the
> inotify_add_watch() function.
>

"could have easily been added" is not a statement that I am willing
to accept.

You are saying that because you do not understand the complexity
involved and that is fine - you can ask.

The things that you are complaining about in the API are the exact
things that were needed to make the advanced features work.

Beyond that, it is a matter of API consolidation -
We prefer to maintain a single unified API that can cover all
the use cases over maintaining several overlapping APIs.

The complexity added to the API for simple use cases can
be mitigated with user libraries - it is not a good reason IMO
to keep maintaining an old limited API in parallel to a new
improved one.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 10:58                     ` Amir Goldstein
@ 2023-09-19 11:21                       ` Max Kellermann
  2023-09-19 12:21                         ` Amir Goldstein
  2023-09-19 13:28                         ` Jan Kara
  0 siblings, 2 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 11:21 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 12:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Any API complexity can be hidden from users with userspace
> libraries. You can use the inotify-tools lib if you prefer.

That doesn't convince me at all, but that's a question of taste. We'll
just keep using inotify (with a patched kernel, which we have anyway).

> > Getting an already-opened file descriptor, or just the file_handle, is
> > certainly an interesting fanotify feature. But that could have easily
> > been added to inotify with a new "mask" flag for the
> > inotify_add_watch() function.
> >
>
> "could have easily been added" is not a statement that I am willing
> to accept.

Are you willing to take a bet? I come up with a patch for implementing
this for inotify, let's say within a week, and you agree to merge it?

(I'm not interested in this feature, I won't ever use it - all I
wanted is dfd support for inotify_add_watch()).

> The things that you are complaining about in the API are the exact
> things that were needed to make the advanced features work.

Not exactly - I complain that fanotify makes the complexity mandatory,
the complexity is the baseline of the API. It would have been possible
to design an API that is simple for 99% of all users, as simple as
inotify; and only those who need the advanced features get the
complexity as an option.

I don't agree with your point that unnecessary complexity should be
mitigated by throwing more (library) code at it. That's just adding
more complexity and more overhead, the opposite of what I want.

Max

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 11:21                       ` Max Kellermann
@ 2023-09-19 12:21                         ` Amir Goldstein
  2023-09-19 12:51                           ` Max Kellermann
  2023-09-19 13:28                         ` Jan Kara
  1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 12:21 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 2:22 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Sep 19, 2023 at 12:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > Any API complexity can be hidden from users with userspace
> > libraries. You can use the inotify-tools lib if you prefer.
>
> That doesn't convince me at all, but that's a question of taste. We'll
> just keep using inotify (with a patched kernel, which we have anyway).
>

ok.

> > > Getting an already-opened file descriptor, or just the file_handle, is
> > > certainly an interesting fanotify feature. But that could have easily
> > > been added to inotify with a new "mask" flag for the
> > > inotify_add_watch() function.
> > >
> >
> > "could have easily been added" is not a statement that I am willing
> > to accept.
>
> Are you willing to take a bet? I come up with a patch for implementing
> this for inotify, let's say within a week, and you agree to merge it?
>
> (I'm not interested in this feature, I won't ever use it - all I
> wanted is dfd support for inotify_add_watch()).
>

I am not into ego fights. I have no desire to win an argument.
If you have an improvement that you want to make, you can
submit it and it will be judged technically.

If you want to improve inotify you can argue your case
and it will be judged technically.

But if you do that, I strongly advise to share the community
early in the design review of the new feature/API.
It can save you time.

> > The things that you are complaining about in the API are the exact
> > things that were needed to make the advanced features work.
>
> Not exactly - I complain that fanotify makes the complexity mandatory,
> the complexity is the baseline of the API. It would have been possible
> to design an API that is simple for 99% of all users, as simple as
> inotify; and only those who need the advanced features get the
> complexity as an option.
>
> I don't agree with your point that unnecessary complexity should be
> mitigated by throwing more (library) code at it. That's just adding
> more complexity and more overhead, the opposite of what I want.
>

Sorry, "what I want" is not a technical argument :)
"what many users want" with proof could be a start of a technical
argument.

I agree that simplicity of the kernel UAPI vs. delegating
simplicity to user libraries is a matter of taste and different subsystem
maintainers have different opinions in that regard.

And also, it is a bit late to discuss design preferences of an API
that was merged 4 years ago.
Design flaws and problems, sure, but for complexity it's a bit late.

Regarding inotify improvements, as I wrote, they will each be judged
technically, but the trend is towards phasing it out.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 12:21                         ` Amir Goldstein
@ 2023-09-19 12:51                           ` Max Kellermann
  2023-09-19 13:01                             ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 12:51 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 2:21 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Regarding inotify improvements, as I wrote, they will each be judged
> technically, but the trend is towards phasing it out.

Then please reconsider merging inotify_add_watch_at(). It is a rather
trivial patch set, only exposing a user_path_at() parameter to use
space, like many other new system calls did with other old-style
system calls. Only the last patch, the one which adds the new system
call ot all arch-specific tables, is an ugly one, but that's not a
property of the new feature but a general property of how system calls
are wired in Linux.

My proposed system call adds real value to all those who are currently
using inotify, allowing them to use inotify with a modern and safe and
race-free syscall interface, eliminating the unsafe fchdir() dance to
emulate it in userspace.

The inotify interface is widely used and will be for a long time to
come, while it is hard to find code which already uses fanotify.
GitHub code search finds 438 occurences of fanotify_init() calls, 4.6k
inotify_init1() calls and 6.9k inotify_init() calls. Given the added
complexity of fanotify and the uselessness of most of fanotify's
feature for most software (except for dfd support), it is extremely
unlikely that a noticable fraction of those thousands of projects will
ever migrate to fanotify. Even if inotify is considered a legacy API,
it should be allowed to modernize it; and adding dfd support to system
calls is really important.

Max

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 12:51                           ` Max Kellermann
@ 2023-09-19 13:01                             ` Amir Goldstein
  2023-09-19 13:11                               ` Max Kellermann
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 13:01 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 3:51 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Sep 19, 2023 at 2:21 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > Regarding inotify improvements, as I wrote, they will each be judged
> > technically, but the trend is towards phasing it out.
>
> Then please reconsider merging inotify_add_watch_at(). It is a rather
> trivial patch set, only exposing a user_path_at() parameter to use
> space, like many other new system calls did with other old-style
> system calls. Only the last patch, the one which adds the new system
> call ot all arch-specific tables, is an ugly one, but that's not a
> property of the new feature but a general property of how system calls
> are wired in Linux.
>
> My proposed system call adds real value to all those who are currently
> using inotify, allowing them to use inotify with a modern and safe and
> race-free syscall interface, eliminating the unsafe fchdir() dance to
> emulate it in userspace.
>
> The inotify interface is widely used and will be for a long time to
> come, while it is hard to find code which already uses fanotify.
> GitHub code search finds 438 occurences of fanotify_init() calls, 4.6k
> inotify_init1() calls and 6.9k inotify_init() calls. Given the added
> complexity of fanotify and the uselessness of most of fanotify's
> feature for most software (except for dfd support), it is extremely
> unlikely that a noticable fraction of those thousands of projects will
> ever migrate to fanotify. Even if inotify is considered a legacy API,
> it should be allowed to modernize it; and adding dfd support to system
> calls is really important.
>

Both Jan and I already gave an answer to this specific patch.
The answer was no.

We do not add new system calls for doing something that is already
possible with existing system calls to make the life of a programmer
easier - this has never been a valid argument for adding a new syscall.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 13:01                             ` Amir Goldstein
@ 2023-09-19 13:11                               ` Max Kellermann
  2023-09-19 13:22                                 ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 13:11 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 3:01 PM Amir Goldstein <amir73il@gmail.com> wrote:
> We do not add new system calls for doing something that is already
> possible with existing system calls to make the life of a programmer
> easier - this has never been a valid argument for adding a new syscall.

- it's not possible to safely add an inotify watch; this isn't about
making something easier, but about making something
safe/reliable/race-free in a way that wasn't possible before
- there are many precedents of new system calls just to add dfd
support (fchmodat, execveat, linkat, mkdirat, ....)
- there are also a few new system calls that were added to make the
life of a programmer easier even though the same was already possible
with existing system calls (close_range, process_madvise, pidfd_getfd,
mount_setattr, ...)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 13:11                               ` Max Kellermann
@ 2023-09-19 13:22                                 ` Amir Goldstein
  2023-09-19 13:41                                   ` Max Kellermann
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 13:22 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 4:11 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Sep 19, 2023 at 3:01 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > We do not add new system calls for doing something that is already
> > possible with existing system calls to make the life of a programmer
> > easier - this has never been a valid argument for adding a new syscall.
>
> - it's not possible to safely add an inotify watch; this isn't about
> making something easier, but about making something
> safe/reliable/race-free in a way that wasn't possible before

Yes, I meant it is possible to get the very similar functionality in
a race-free way using fanotify.
If fanotify does not meet your requirements please let us know
in what way and perhaps fanotify could be improved.
Using "inotify and not fanotity" is not a legit technical requirement.

> - there are many precedents of new system calls just to add dfd
> support (fchmodat, execveat, linkat, mkdirat, ....)
> - there are also a few new system calls that were added to make the
> life of a programmer easier even though the same was already possible
> with existing system calls (close_range, process_madvise, pidfd_getfd,
> mount_setattr, ...)

All those new syscalls add new functionality/security/performance.
If you think they were added to make the life of the programmer easier
you did not understand them.

Anyway, I've said my opinion about inotify_add_watch_at().
final call is up to Jan.

Thanks,
Amir.

P.S. you may be able to provide magic /proc/self/$fd symlinks
as path argument to inotify_add_watch() after opening them
with O_PATH to get what you want - I didn't try.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 11:21                       ` Max Kellermann
  2023-09-19 12:21                         ` Amir Goldstein
@ 2023-09-19 13:28                         ` Jan Kara
  2023-09-19 13:48                           ` Max Kellermann
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Kara @ 2023-09-19 13:28 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Amir Goldstein, Jan Kara, linux-fsdevel, Christian Brauner,
	Ivan Babrou, Matthew Bobrowski

On Tue 19-09-23 13:21:56, Max Kellermann wrote:
> On Tue, Sep 19, 2023 at 12:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > Getting an already-opened file descriptor, or just the file_handle, is
> > > certainly an interesting fanotify feature. But that could have easily
> > > been added to inotify with a new "mask" flag for the
> > > inotify_add_watch() function.
> > >
> >
> > "could have easily been added" is not a statement that I am willing
> > to accept.
> 
> Are you willing to take a bet? I come up with a patch for implementing
> this for inotify, let's say within a week, and you agree to merge it?

I guess no point in you wasting time for this. But if you'd try, I'll
really find out it isn't so easy. Inotify event is fixed length so
fsid+fhandle is completely out of the realm of "easy extension". If you
wanted to return fd instead of wd, that would be doable with some kind of a
flag in the mark mask, although it would be a bit inconsistent with the
rest of the inotify API.

> > The things that you are complaining about in the API are the exact
> > things that were needed to make the advanced features work.
> 
> Not exactly - I complain that fanotify makes the complexity mandatory,
> the complexity is the baseline of the API. It would have been possible
> to design an API that is simple for 99% of all users, as simple as
> inotify; and only those who need the advanced features get the
> complexity as an option.

Well yes, fanotify could have been designed to make basic usage easier. But
the design (some 15 years ago) was focusing more on filling in the
functional gaps inotify had for usecases such as anti-virus monitors etc.
and kind of left thinking about simple usecases for sometime later.
So we have what we have.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 13:22                                 ` Amir Goldstein
@ 2023-09-19 13:41                                   ` Max Kellermann
  2023-09-19 13:56                                     ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 13:41 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 3:22 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Yes, I meant it is possible to get the very similar functionality in
> a race-free way using fanotify.

That's not the same. We already agreed that fanotify still misses
features that have been available in inotify since forever. Going
fanotify requires a rewrite of large chunks of code. Rejecting trivial
inotify improvements because people should be using fanotify doesn't
make real-world users happy.

> If fanotify does not meet your requirements please let us know
> in what way and perhaps fanotify could be improved.

- return a watch descriptor (like inotify) as a fixed-size lookup key
- add an option so returned events contain the watch descriptor and
path relative to it (like inotify), not just the directory entry name
- allow unprivileged processes to use this new option instead of FAN_REPORT_FID

Supporting this simplified API still makes fanotify harder to use than
inotify, but retains fanotify's full power while minimizing its API
churn for the 99% of users who were already happy with inotify's
feature set.

> > - there are many precedents of new system calls just to add dfd
> > support (fchmodat, execveat, linkat, mkdirat, ....)
> > - there are also a few new system calls that were added to make the
> > life of a programmer easier even though the same was already possible
> > with existing system calls (close_range, process_madvise, pidfd_getfd,
> > mount_setattr, ...)
>
> All those new syscalls add new functionality/security/performance.

So does inotify_add_watch_at().

On the other hand, fanotify reduces performance by adding complexity
and overhead - more system calls necessary, increased lookup overhead
due to variable-length keys instead of 32-bit integers.

> If you think they were added to make the life of the programmer easier
> you did not understand them.

Oh please. Don't be so arrogant.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 13:28                         ` Jan Kara
@ 2023-09-19 13:48                           ` Max Kellermann
  0 siblings, 0 replies; 32+ messages in thread
From: Max Kellermann @ 2023-09-19 13:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

On Tue, Sep 19, 2023 at 3:28 PM Jan Kara <jack@suse.cz> wrote:
> Inotify event is fixed length so
> fsid+fhandle is completely out of the realm of "easy extension".

(Not quite true, it's variable-length. But that's not relevant if
we're talking about adding an optional feature.)

If I were to implement this, I'd add a mask bit called, say,
IN_FILE_HANDLE. If that bit is enabled on a watch, the
(variable-length) inotify_event would be followed by another
(variable-length) struct that looks just like fanotify_event_info_fid,
containining fsid and file_handle (of course, only if the same bit is
also set in inotify_event.mask, which would give the kernel a way to
omit it if it's not available for a certain file).

This is a backwards-compatible extension because it's opt-in. Only
applications which support it will set the IN_FILE_HANDLE bit. Old
applications don't set this bit and never see this trailing struct.

Max

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19 13:41                                   ` Max Kellermann
@ 2023-09-19 13:56                                     ` Amir Goldstein
  0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 13:56 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, Ivan Babrou,
	Matthew Bobrowski

> > > - there are many precedents of new system calls just to add dfd
> > > support (fchmodat, execveat, linkat, mkdirat, ....)
> > > - there are also a few new system calls that were added to make the
> > > life of a programmer easier even though the same was already possible
> > > with existing system calls (close_range, process_madvise, pidfd_getfd,
> > > mount_setattr, ...)
> >
> > All those new syscalls add new functionality/security/performance.
>
> So does inotify_add_watch_at().
>
> On the other hand, fanotify reduces performance by adding complexity
> and overhead - more system calls necessary, increased lookup overhead
> due to variable-length keys instead of 32-bit integers.
>

Technical arguments of performance need to be backed up by
performance numbers from real life workloads.
I am not inventing this stuff as I go.
This is how kernel development works.

> > If you think they were added to make the life of the programmer easier
> > you did not understand them.
>
> Oh please. Don't be so arrogant.

I will try. Please try as well to accept a different POV.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: inotify maintenance status
  2023-09-19  9:48             ` Jan Kara
@ 2023-09-19 14:55               ` Amir Goldstein
  0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2023-09-19 14:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christian Brauner, Max Kellermann

On Tue, Sep 19, 2023 at 12:48 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 18-09-23 21:05:11, Amir Goldstein wrote:
> > [Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/]
> ...
> > BTW, before we can really mark inotify as Obsolete and document that
> > inotify was superseded by fanotify, there are at least two items on the
> > TODO list [1]:
>
> Yeah, as I wrote in the original thread, I don't feel like inotify should
> be marked as obsolete (at least for some more time) so we are on the same
> page here I think.
>
> > 1. UNMOUNT/IGNORED events
> > 2. Filesystems without fid support
> >
> > MOUNT/UNMOUNT fanotify events have already been discussed
> > and the feature has known users.
> >
> > Christian has also mentioned [1] the IN_UNMOUNT use case for
> > waiting for sb shutdown several times and I will not be surprised
> > to see systemd starting to use inotify for that use case before too long...
>
> Yup, both FAN_UNMOUNT and FAN_IGNORED should be easy. Unlike inotify, I'd
> just make these explicit events you can opt into and not something you
> always get.
>
> > Regarding the second item on the TODO list, we have had this discussion
> > before - if we are going to continue the current policy of opting-in to
> > fanotify (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd
> > filesystems that only support inotify and we will need to keep supporting
> > inotify only for the users that use inotify on those odd filesystems.
> >
> > OTOH, if we implement FAN_REPORT_DINO_NAME, we could
> > have fanotify inode mark support for any filesystem, where the
> > pinned marked inode ino is the object id.
>
> Is it a real problem after your work to allow filehandles that are not
> necessarily usable for NFS export or open_by_handle()? As far as I remember
> fanotify should be now able to handle anything that provides f_fsid in its

Not exactly. We still have a requirement for non-empty
dentry->d_sb->s_export_op in fanotify_test_fid(), to align with
the same requirement for AT_HANDLE_FID support.

> statfs(2) call. And as I'm checking filesystems not setting fsid currently are:
>
> afs, coda, nfs - networking filesystems where inotify and fanotify have
>   dubious value anyway
>
> configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
>   quite limited. But some special cases could be useful. Adding fsid support
>   is the same amount of trouble as for kernfs - a few LOC. In fact, we
>   could perhaps add a fstype flag to indicate that this is a filesystem
>   without persistent identification and so uuid should be autogenerated on
>   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
>   This way we could handle all these filesystems with trivial amount of
>   effort.

This sounds good to me.
I have a vague memory of suggesting the same and I think
Christian had objections, but I may be remembering wrong.

Possibly, the same opt-in fstype flag could allow also trivial
AT_HANDLE_FID support with the default export_encode_fh()?

>
> freevxfs - the only real filesystem without f_fsid. Trivial to handle one
>   way or the other.
>
> So I don't think we need new uAPI additions to finish off this TODO item.

Yes, I'd love that.
I can try to post something if there are no objections.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-09-19 14:55 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-18 12:32 [PATCH 1/4] inotify_user: pass directory fd to inotify_find_inode() Max Kellermann
2023-09-18 12:32 ` [PATCH 2/4] inotify_user: move code to do_inotify_add_watch() Max Kellermann
2023-09-18 12:32 ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
2023-09-18 12:40   ` Jan Kara
2023-09-18 13:57     ` Max Kellermann
2023-09-18 14:23       ` Jan Kara
2023-09-18 15:28         ` Amir Goldstein
2023-09-18 18:05           ` inotify maintenance status Amir Goldstein
2023-09-19  7:16             ` Amir Goldstein
2023-09-19  9:08               ` Max Kellermann
2023-09-19 10:01                 ` Jan Kara
2023-09-19 10:42                   ` Amir Goldstein
2023-09-19 10:48                     ` Max Kellermann
2023-09-19 10:42                   ` Max Kellermann
2023-09-19 10:58                     ` Amir Goldstein
2023-09-19 11:21                       ` Max Kellermann
2023-09-19 12:21                         ` Amir Goldstein
2023-09-19 12:51                           ` Max Kellermann
2023-09-19 13:01                             ` Amir Goldstein
2023-09-19 13:11                               ` Max Kellermann
2023-09-19 13:22                                 ` Amir Goldstein
2023-09-19 13:41                                   ` Max Kellermann
2023-09-19 13:56                                     ` Amir Goldstein
2023-09-19 13:28                         ` Jan Kara
2023-09-19 13:48                           ` Max Kellermann
2023-09-19  9:26             ` Christian Brauner
2023-09-19  9:48             ` Jan Kara
2023-09-19 14:55               ` Amir Goldstein
2023-09-18 19:45     ` [PATCH 3/4] inotify_user: add system call inotify_add_watch_at() Max Kellermann
2023-09-18 12:32 ` [PATCH 4/4] arch: register inotify_add_watch_at Max Kellermann
2023-09-18 20:24   ` kernel test robot
2023-09-18 20:24   ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.