linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/2] unprivileged fanotify listener
@ 2021-01-24 18:42 Amir Goldstein
  2021-01-24 18:42 ` [RFC][PATCH 1/2] fanotify: configurable limits via sysfs Amir Goldstein
  2021-01-24 18:42 ` [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
  0 siblings, 2 replies; 16+ messages in thread
From: Amir Goldstein @ 2021-01-24 18:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, linux-api

Jan,

These patches try to implement the minimal set and least controversial
functionality that we can allow for unprivileged users as a starting
point.

I tried to be as conservative as I can with the system limits, but
I wasn't sure how to handle the per group marks limit, so I left both
per group and per user limits which looks quite confusing.

I tested unprivileged listener with Matthew's LTP tests [1].
I do not have test for the sysfs tunables yet, but I verified that
existing LTP tests fail when lowering each of the tunables to 1 and
pass after setting them back up.

I think that the sysfs tunables can be considered even without the
unprivileged listener.

Thanks,
Amir.

[1] https://github.com/amir73il/ltp/commits/fanotify_unpriv

Amir Goldstein (2):
  fanotify: configurable limits via sysfs
  fanotify: support limited functionality for unprivileged users

 fs/notify/fanotify/fanotify.c      |  14 ++-
 fs/notify/fanotify/fanotify_user.c | 155 +++++++++++++++++++++++++----
 fs/notify/fdinfo.c                 |   3 +-
 include/linux/fanotify.h           |  19 ++++
 include/linux/fsnotify_backend.h   |   2 +-
 include/linux/sched/user.h         |   3 -
 include/linux/user_namespace.h     |   4 +
 kernel/sysctl.c                    |  12 ++-
 kernel/ucount.c                    |   4 +
 9 files changed, 183 insertions(+), 33 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC][PATCH 1/2] fanotify: configurable limits via sysfs
  2021-01-24 18:42 [RFC][PATCH 0/2] unprivileged fanotify listener Amir Goldstein
@ 2021-01-24 18:42 ` Amir Goldstein
  2021-02-16 16:27   ` Jan Kara
  2021-01-24 18:42 ` [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
  1 sibling, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-01-24 18:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, linux-api

fanotify has some hardcoded limits. The only APIs to escape those limits
are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.

Allow finer grained tuning of the system limits via sysfs tunables under
/proc/sys/fs/fanotify/, similar to tunables under /proc/sys/fs/inotify,
with some minor differences.

- max_queued_events - global system tunable for group queue size limit.
  Like the inotify tunable with the same name, it defaults to 16384 and
  applies on initialization of a new group.

- max_listener_marks - global system tunable of marks limit per group.
  Defaults to 8192. inotify has no per group marks limit.

- max_user_marks - user ns tunable for marks limit per user.
  Like the inotify tunable named max_user_watches, it defaults to 1048576
  and is accounted for every containing user ns.

- max_user_listeners - user ns tunable for number of listeners per user.
  Like the inotify tunable named max_user_instances, it defaults to 128
  and is accounted for every containing user ns.

The slightly different tunable names are derived from the "listener" and
"mark" terminology used in the fanotify man pages.

max_listener_marks was kept for compatibility with legacy fanotify
behavior. Given that inotify max_user_instances was increased from 8192
to 1048576 in kernel v5.10, we may want to consider changing also the
default for max_listener_marks or remove it completely, leaving only the
per user marks limit.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      |  14 ++--
 fs/notify/fanotify/fanotify_user.c | 106 ++++++++++++++++++++++++-----
 include/linux/fanotify.h           |   3 +
 include/linux/fsnotify_backend.h   |   2 +-
 include/linux/sched/user.h         |   3 -
 include/linux/user_namespace.h     |   4 ++
 kernel/sysctl.c                    |  12 +++-
 kernel/ucount.c                    |   4 ++
 8 files changed, 121 insertions(+), 27 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 1192c9953620..aaa81ce916c3 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -736,11 +736,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 
 static void fanotify_free_group_priv(struct fsnotify_group *group)
 {
-	struct user_struct *user;
-
-	user = group->fanotify_data.user;
-	atomic_dec(&user->fanotify_listeners);
-	free_uid(user);
+	if (group->fanotify_data.ucounts)
+		dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_LISTENERS);
 }
 
 static void fanotify_free_path_event(struct fanotify_event *event)
@@ -796,6 +793,12 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
 	}
 }
 
+static void fanotify_freeing_mark(struct fsnotify_mark *mark,
+				  struct fsnotify_group *group)
+{
+	dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS);
+}
+
 static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
 {
 	kmem_cache_free(fanotify_mark_cache, fsn_mark);
@@ -805,5 +808,6 @@ const struct fsnotify_ops fanotify_fsnotify_ops = {
 	.handle_event = fanotify_handle_event,
 	.free_group_priv = fanotify_free_group_priv,
 	.free_event = fanotify_free_event,
+	.freeing_mark = fanotify_freeing_mark,
 	.free_mark = fanotify_free_mark,
 };
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b78dd1f88f7c..4ade3f9df337 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -27,8 +27,65 @@
 #include "fanotify.h"
 
 #define FANOTIFY_DEFAULT_MAX_EVENTS	16384
-#define FANOTIFY_DEFAULT_MAX_MARKS	8192
+#define FANOTIFY_OLD_DEFAULT_MAX_MARKS	8192
 #define FANOTIFY_DEFAULT_MAX_LISTENERS	128
+/*
+ * Legacy fanotify marks limits is per group and we introduced an additional
+ * limit of marks per user, similar to inotify.  Effectively, the legacy limit
+ * of fanotify marks per user is <max marks per group> * <max groups per user>.
+ * This default limit (1M) also happens to match the increased limit for inotify
+ * max_user_watches since v5.10.
+ */
+#define FANOTIFY_DEFAULT_MAX_LISTENER_MARKS \
+	FANOTIFY_OLD_DEFAULT_MAX_MARKS
+#define FANOTIFY_DEFAULT_MAX_USER_MARKS	\
+	(FANOTIFY_DEFAULT_MAX_LISTENER_MARKS * FANOTIFY_DEFAULT_MAX_LISTENERS)
+
+/* configurable via /proc/sys/fs/fanotify/ */
+static int fanotify_max_queued_events __read_mostly;
+static int fanotify_max_listener_marks __read_mostly;
+
+#ifdef CONFIG_SYSCTL
+
+#include <linux/sysctl.h>
+
+struct ctl_table fanotify_table[] = {
+	{
+		.procname	= "max_user_listeners",
+		.data		= &init_user_ns.ucount_max[UCOUNT_FANOTIFY_LISTENERS],
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "max_user_marks",
+		.data		= &init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS],
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "max_listener_marks",
+		.data		= &fanotify_max_listener_marks,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO
+	},
+	{
+		.procname	= "max_queued_events",
+		.data		= &fanotify_max_queued_events,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO
+	},
+	{ }
+};
+#endif /* CONFIG_SYSCTL */
+
 
 /*
  * All flags that may be specified in parameter event_f_flags of fanotify_init.
@@ -822,24 +879,36 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 						   unsigned int type,
 						   __kernel_fsid_t *fsid)
 {
+	struct ucounts *ucounts = group->fanotify_data.ucounts;
 	struct fsnotify_mark *mark;
 	int ret;
 
-	if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks)
+	/*
+	 * Enfore global limit of marks per group and limit of marks per user
+	 * for the user that initiated the group in all containing user ns.
+	 */
+	if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks ||
+	    !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
 		return ERR_PTR(-ENOSPC);
 
 	mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
-	if (!mark)
-		return ERR_PTR(-ENOMEM);
+	if (!mark) {
+		ret = -ENOMEM;
+		goto out_dec_ucounts;
+	}
 
 	fsnotify_init_mark(mark, group);
 	ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid);
 	if (ret) {
 		fsnotify_put_mark(mark);
-		return ERR_PTR(ret);
+		goto out_dec_ucounts;
 	}
 
 	return mark;
+
+out_dec_ucounts:
+	dec_ucount(ucounts, UCOUNT_FANOTIFY_MARKS);
+	return ERR_PTR(ret);
 }
 
 
@@ -924,7 +993,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 {
 	struct fsnotify_group *group;
 	int f_flags, fd;
-	struct user_struct *user;
 	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
 	unsigned int class = flags & FANOTIFY_CLASS_BITS;
 
@@ -963,12 +1031,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID))
 		return -EINVAL;
 
-	user = get_current_user();
-	if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) {
-		free_uid(user);
-		return -EMFILE;
-	}
-
 	f_flags = O_RDWR | FMODE_NONOTIFY;
 	if (flags & FAN_CLOEXEC)
 		f_flags |= O_CLOEXEC;
@@ -978,13 +1040,18 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
 	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
 	if (IS_ERR(group)) {
-		free_uid(user);
 		return PTR_ERR(group);
 	}
 
-	group->fanotify_data.user = user;
+	/* Enforce listeners limits per user in all containing user ns */
+	group->fanotify_data.ucounts = inc_ucount(current_user_ns(), current_euid(),
+						  UCOUNT_FANOTIFY_LISTENERS);
+	if (!group->fanotify_data.ucounts) {
+		fd = -EMFILE;
+		goto out_destroy_group;
+	}
+
 	group->fanotify_data.flags = flags;
-	atomic_inc(&user->fanotify_listeners);
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
 
 	group->overflow_event = fanotify_alloc_overflow_event();
@@ -1019,7 +1086,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 			goto out_destroy_group;
 		group->max_events = UINT_MAX;
 	} else {
-		group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS;
+		group->max_events = fanotify_max_queued_events;
 	}
 
 	if (flags & FAN_UNLIMITED_MARKS) {
@@ -1028,7 +1095,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 			goto out_destroy_group;
 		group->fanotify_data.max_marks = UINT_MAX;
 	} else {
-		group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS;
+		group->fanotify_data.max_marks = fanotify_max_listener_marks;
 	}
 
 	if (flags & FAN_ENABLE_AUDIT) {
@@ -1326,6 +1393,11 @@ static int __init fanotify_user_setup(void)
 			KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
 	}
 
+	fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS;
+	fanotify_max_listener_marks = FANOTIFY_DEFAULT_MAX_LISTENER_MARKS;
+	init_user_ns.ucount_max[UCOUNT_FANOTIFY_LISTENERS] = FANOTIFY_DEFAULT_MAX_LISTENERS;
+	init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = FANOTIFY_DEFAULT_MAX_USER_MARKS;
+
 	return 0;
 }
 device_initcall(fanotify_user_setup);
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 3e9c56ee651f..031a97d8369a 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -2,8 +2,11 @@
 #ifndef _LINUX_FANOTIFY_H
 #define _LINUX_FANOTIFY_H
 
+#include <linux/sysctl.h>
 #include <uapi/linux/fanotify.h>
 
+extern struct ctl_table fanotify_table[]; /* for sysctl */
+
 #define FAN_GROUP_FLAG(group, flag) \
 	((group)->fanotify_data.flags & (flag))
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e5409b83e731..2179f06f6e89 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -240,7 +240,7 @@ struct fsnotify_group {
 			int flags;           /* flags from fanotify_init() */
 			int f_flags; /* event_f_flags from fanotify_init() */
 			unsigned int max_marks;
-			struct user_struct *user;
+			struct ucounts *ucounts;
 		} fanotify_data;
 #endif /* CONFIG_FANOTIFY */
 	};
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index a8ec3b6093fc..3632c5d6ec55 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -14,9 +14,6 @@ struct user_struct {
 	refcount_t __count;	/* reference count */
 	atomic_t processes;	/* How many processes does this user have? */
 	atomic_t sigpending;	/* How many pending signals does this user have? */
-#ifdef CONFIG_FANOTIFY
-	atomic_t fanotify_listeners;
-#endif
 #ifdef CONFIG_EPOLL
 	atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
 #endif
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..d8e6ff5e1040 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -49,6 +49,10 @@ enum ucount_type {
 #ifdef CONFIG_INOTIFY_USER
 	UCOUNT_INOTIFY_INSTANCES,
 	UCOUNT_INOTIFY_WATCHES,
+#endif
+#ifdef CONFIG_FANOTIFY
+	UCOUNT_FANOTIFY_LISTENERS,
+	UCOUNT_FANOTIFY_MARKS,
 #endif
 	UCOUNT_COUNTS,
 };
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c9fbdd848138..46c86830b811 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -148,6 +148,9 @@ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
 #ifdef CONFIG_INOTIFY_USER
 #include <linux/inotify.h>
 #endif
+#ifdef CONFIG_FANOTIFY
+#include <linux/fanotify.h>
+#endif
 
 #ifdef CONFIG_PROC_SYSCTL
 
@@ -3258,7 +3261,14 @@ static struct ctl_table fs_table[] = {
 		.mode		= 0555,
 		.child		= inotify_table,
 	},
-#endif	
+#endif
+#ifdef CONFIG_FANOTIFY
+	{
+		.procname	= "fanotify",
+		.mode		= 0555,
+		.child		= fanotify_table,
+	},
+#endif
 #ifdef CONFIG_EPOLL
 	{
 		.procname	= "epoll",
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 11b1596e2542..edeabc5de28f 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -73,6 +73,10 @@ static struct ctl_table user_table[] = {
 #ifdef CONFIG_INOTIFY_USER
 	UCOUNT_ENTRY("max_inotify_instances"),
 	UCOUNT_ENTRY("max_inotify_watches"),
+#endif
+#ifdef CONFIG_FANOTIFY
+	UCOUNT_ENTRY("max_fanotify_listeners"),
+	UCOUNT_ENTRY("max_fanotify_marks"),
 #endif
 	{ }
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-01-24 18:42 [RFC][PATCH 0/2] unprivileged fanotify listener Amir Goldstein
  2021-01-24 18:42 ` [RFC][PATCH 1/2] fanotify: configurable limits via sysfs Amir Goldstein
@ 2021-01-24 18:42 ` Amir Goldstein
  2021-02-16 17:01   ` Jan Kara
  1 sibling, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-01-24 18:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, linux-api

Add limited support for unprivileged fanotify event listener.
An unprivileged event listener does not get an open file descriptor in
the event nor the process pid of another process.  An unprivileged event
listener cannot request permission events, cannot set mount/filesystem
marks and cannot request unlimited queue/marks.

This enables the limited functionality similar to inotify when watching a
set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
requiring SYS_CAP_ADMIN privileges.

The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
to monitor all changes inside those directories.

This typically requires that the listener keeps a map of watched directory
fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
before starting to watch for changes.

When getting an event, the reported fid of the parent should be resolved
to dirfd and fstatsat(2) with dirfd and name should be used to query the
state of the filesystem entry.

Note that even though events do not report the event creator pid,
fanotify does not merge similar events on the same object that were
generated by different processes. This is aligned with exiting behavior
when generating processes are outside of the listener pidns (which
results in reporting 0 pid to listener).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 49 +++++++++++++++++++++++++++---
 fs/notify/fdinfo.c                 |  3 +-
 include/linux/fanotify.h           | 16 ++++++++++
 3 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 4ade3f9df337..b70de273eedb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -397,9 +397,21 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	metadata.vers = FANOTIFY_METADATA_VERSION;
 	metadata.reserved = 0;
 	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
-	metadata.pid = pid_vnr(event->pid);
+	/*
+	 * An unprivileged event listener does not get an open file descriptor
+	 * in the event nor another generating process pid. If the event was
+	 * generated by the unprivileged process itself, self pid is reported.
+	 * We may relax this in the future by checking calling process access
+	 * permissions to the object.
+	 */
+	if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) ||
+	    task_tgid(current) == event->pid)
+		metadata.pid = pid_vnr(event->pid);
+	else
+		metadata.pid = 0;
 
-	if (path && path->mnt && path->dentry) {
+	if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) &&
+	    path && path->mnt && path->dentry) {
 		fd = create_fd(group, path, &f);
 		if (fd < 0)
 			return fd;
@@ -995,12 +1007,29 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	int f_flags, fd;
 	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
 	unsigned int class = flags & FANOTIFY_CLASS_BITS;
+	unsigned int internal_flags = 0;
 
 	pr_debug("%s: flags=%x event_f_flags=%x\n",
 		 __func__, flags, event_f_flags);
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	if (!capable(CAP_SYS_ADMIN)) {
+		/*
+		 * An unprivileged user can setup an unprivileged listener with
+		 * limited functionality - an unprivileged event listener cannot
+		 * request permission events, cannot set mount/filesystem marks
+		 * and cannot request unlimited queue/marks.
+		 */
+		if ((flags & ~FANOTIFY_UNPRIV_INIT_FLAGS) ||
+		    class != FAN_CLASS_NOTIF)
+			return -EPERM;
+
+		/*
+		 * We set the internal flag FANOTIFY_UNPRIV on the group, so we
+		 * know that we need to limit setting mount/filesystem marks on
+		 * this group and avoid providing pid and open fd in the event.
+		 */
+		internal_flags |= FANOTIFY_UNPRIV;
+	}
 
 #ifdef CONFIG_AUDITSYSCALL
 	if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT))
@@ -1051,7 +1080,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		goto out_destroy_group;
 	}
 
-	group->fanotify_data.flags = flags;
+	group->fanotify_data.flags = flags | internal_flags;
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
 
 	group->overflow_event = fanotify_alloc_overflow_event();
@@ -1247,6 +1276,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		goto fput_and_out;
 	group = f.file->private_data;
 
+	/*
+	 * An unprivileged event listener is not allowed to watch a mount
+	 * point nor a filesystem.
+	 */
+	ret = -EPERM;
+	if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) &&
+	    mark_type != FAN_MARK_INODE)
+		goto fput_and_out;
+
 	/*
 	 * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF.  These are not
 	 * allowed to set permissions events.
@@ -1379,6 +1417,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
  */
 static int __init fanotify_user_setup(void)
 {
+	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_FLAGS);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
 
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index f0d6b54be412..57f0d5d9f934 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -144,7 +144,8 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f)
 	struct fsnotify_group *group = f->private_data;
 
 	seq_printf(m, "fanotify flags:%x event-flags:%x\n",
-		   group->fanotify_data.flags, group->fanotify_data.f_flags);
+		   group->fanotify_data.flags & FANOTIFY_INIT_FLAGS,
+		   group->fanotify_data.f_flags);
 
 	show_fdinfo(m, f, fanotify_fdinfo);
 }
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 031a97d8369a..a573c1028c14 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -28,6 +28,22 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
 				 FAN_CLOEXEC | FAN_NONBLOCK | \
 				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
 
+/* Internal flags */
+#define FANOTIFY_UNPRIV		0x80000000
+#define FANOTIFY_INTERNAL_FLAGS	(FANOTIFY_UNPRIV)
+
+/*
+ * fanotify_init() flags allowed for unprivileged listener.
+ * FAN_CLASS_NOTIF in this mask is purely semantic because it is zero,
+ * but it is the only class we allow for unprivileged listener.
+ * Since unprivileged listener does not provide file descriptors in events,
+ * reporting file handles makes sense, but it is not a must.
+ * FAN_REPORT_TID does not make sense for unprivileged listener, which uses
+ * event->pid only to filter out events generated by listener process itself.
+ */
+#define FANOTIFY_UNPRIV_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
+					 FAN_CLASS_NOTIF | FANOTIFY_FID_BITS)
+
 #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
 				 FAN_MARK_FILESYSTEM)
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/2] fanotify: configurable limits via sysfs
  2021-01-24 18:42 ` [RFC][PATCH 1/2] fanotify: configurable limits via sysfs Amir Goldstein
@ 2021-02-16 16:27   ` Jan Kara
  2021-02-16 18:02     ` Amir Goldstein
  2021-02-18 18:57     ` Amir Goldstein
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Kara @ 2021-02-16 16:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api

Hi Amir!


I'm sorry that I've got to this only now.

On Sun 24-01-21 20:42:03, Amir Goldstein wrote:
> fanotify has some hardcoded limits. The only APIs to escape those limits
> are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.
> 
> Allow finer grained tuning of the system limits via sysfs tunables under
> /proc/sys/fs/fanotify/, similar to tunables under /proc/sys/fs/inotify,
> with some minor differences.
> 
> - max_queued_events - global system tunable for group queue size limit.
>   Like the inotify tunable with the same name, it defaults to 16384 and
>   applies on initialization of a new group.
> 
> - max_listener_marks - global system tunable of marks limit per group.
>   Defaults to 8192. inotify has no per group marks limit.
> 
> - max_user_marks - user ns tunable for marks limit per user.
>   Like the inotify tunable named max_user_watches, it defaults to 1048576
>   and is accounted for every containing user ns.
> 
> - max_user_listeners - user ns tunable for number of listeners per user.
>   Like the inotify tunable named max_user_instances, it defaults to 128
>   and is accounted for every containing user ns.

I think term 'group' is used in the manpages even more and in the code as
well. 'listener' more generally tends to refer to the application listening
to the events. So I'd rather call the limits 'max_group_marks' and
'max_user_groups'.
 
> The slightly different tunable names are derived from the "listener" and
> "mark" terminology used in the fanotify man pages.
> 
> max_listener_marks was kept for compatibility with legacy fanotify
> behavior. Given that inotify max_user_instances was increased from 8192
> to 1048576 in kernel v5.10, we may want to consider changing also the
> default for max_listener_marks or remove it completely, leaving only the
> per user marks limit.

Yes, probably I'd just drop 'max_group_marks' completely and leave just
per-user marks limit. You can always tune it in init_user_ns if you wish.
Can't you?

Also as a small style nit, please try to stay within 80 columns. Otherwise
the patch looks OK to me.

								Honza


> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/fanotify/fanotify.c      |  14 ++--
>  fs/notify/fanotify/fanotify_user.c | 106 ++++++++++++++++++++++++-----
>  include/linux/fanotify.h           |   3 +
>  include/linux/fsnotify_backend.h   |   2 +-
>  include/linux/sched/user.h         |   3 -
>  include/linux/user_namespace.h     |   4 ++
>  kernel/sysctl.c                    |  12 +++-
>  kernel/ucount.c                    |   4 ++
>  8 files changed, 121 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 1192c9953620..aaa81ce916c3 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -736,11 +736,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>  
>  static void fanotify_free_group_priv(struct fsnotify_group *group)
>  {
> -	struct user_struct *user;
> -
> -	user = group->fanotify_data.user;
> -	atomic_dec(&user->fanotify_listeners);
> -	free_uid(user);
> +	if (group->fanotify_data.ucounts)
> +		dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_LISTENERS);
>  }
>  
>  static void fanotify_free_path_event(struct fanotify_event *event)
> @@ -796,6 +793,12 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
>  	}
>  }
>  
> +static void fanotify_freeing_mark(struct fsnotify_mark *mark,
> +				  struct fsnotify_group *group)
> +{
> +	dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS);
> +}
> +
>  static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
>  {
>  	kmem_cache_free(fanotify_mark_cache, fsn_mark);
> @@ -805,5 +808,6 @@ const struct fsnotify_ops fanotify_fsnotify_ops = {
>  	.handle_event = fanotify_handle_event,
>  	.free_group_priv = fanotify_free_group_priv,
>  	.free_event = fanotify_free_event,
> +	.freeing_mark = fanotify_freeing_mark,
>  	.free_mark = fanotify_free_mark,
>  };
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index b78dd1f88f7c..4ade3f9df337 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -27,8 +27,65 @@
>  #include "fanotify.h"
>  
>  #define FANOTIFY_DEFAULT_MAX_EVENTS	16384
> -#define FANOTIFY_DEFAULT_MAX_MARKS	8192
> +#define FANOTIFY_OLD_DEFAULT_MAX_MARKS	8192
>  #define FANOTIFY_DEFAULT_MAX_LISTENERS	128
> +/*
> + * Legacy fanotify marks limits is per group and we introduced an additional
> + * limit of marks per user, similar to inotify.  Effectively, the legacy limit
> + * of fanotify marks per user is <max marks per group> * <max groups per user>.
> + * This default limit (1M) also happens to match the increased limit for inotify
> + * max_user_watches since v5.10.
> + */
> +#define FANOTIFY_DEFAULT_MAX_LISTENER_MARKS \
> +	FANOTIFY_OLD_DEFAULT_MAX_MARKS
> +#define FANOTIFY_DEFAULT_MAX_USER_MARKS	\
> +	(FANOTIFY_DEFAULT_MAX_LISTENER_MARKS * FANOTIFY_DEFAULT_MAX_LISTENERS)
> +
> +/* configurable via /proc/sys/fs/fanotify/ */
> +static int fanotify_max_queued_events __read_mostly;
> +static int fanotify_max_listener_marks __read_mostly;
> +
> +#ifdef CONFIG_SYSCTL
> +
> +#include <linux/sysctl.h>
> +
> +struct ctl_table fanotify_table[] = {
> +	{
> +		.procname	= "max_user_listeners",
> +		.data		= &init_user_ns.ucount_max[UCOUNT_FANOTIFY_LISTENERS],
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +	},
> +	{
> +		.procname	= "max_user_marks",
> +		.data		= &init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS],
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +	},
> +	{
> +		.procname	= "max_listener_marks",
> +		.data		= &fanotify_max_listener_marks,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO
> +	},
> +	{
> +		.procname	= "max_queued_events",
> +		.data		= &fanotify_max_queued_events,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO
> +	},
> +	{ }
> +};
> +#endif /* CONFIG_SYSCTL */
> +
>  
>  /*
>   * All flags that may be specified in parameter event_f_flags of fanotify_init.
> @@ -822,24 +879,36 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
>  						   unsigned int type,
>  						   __kernel_fsid_t *fsid)
>  {
> +	struct ucounts *ucounts = group->fanotify_data.ucounts;
>  	struct fsnotify_mark *mark;
>  	int ret;
>  
> -	if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks)
> +	/*
> +	 * Enfore global limit of marks per group and limit of marks per user
> +	 * for the user that initiated the group in all containing user ns.
> +	 */
> +	if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks ||
> +	    !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
>  		return ERR_PTR(-ENOSPC);
>  
>  	mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
> -	if (!mark)
> -		return ERR_PTR(-ENOMEM);
> +	if (!mark) {
> +		ret = -ENOMEM;
> +		goto out_dec_ucounts;
> +	}
>  
>  	fsnotify_init_mark(mark, group);
>  	ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid);
>  	if (ret) {
>  		fsnotify_put_mark(mark);
> -		return ERR_PTR(ret);
> +		goto out_dec_ucounts;
>  	}
>  
>  	return mark;
> +
> +out_dec_ucounts:
> +	dec_ucount(ucounts, UCOUNT_FANOTIFY_MARKS);
> +	return ERR_PTR(ret);
>  }
>  
>  
> @@ -924,7 +993,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  {
>  	struct fsnotify_group *group;
>  	int f_flags, fd;
> -	struct user_struct *user;
>  	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
>  	unsigned int class = flags & FANOTIFY_CLASS_BITS;
>  
> @@ -963,12 +1031,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  	if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID))
>  		return -EINVAL;
>  
> -	user = get_current_user();
> -	if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) {
> -		free_uid(user);
> -		return -EMFILE;
> -	}
> -
>  	f_flags = O_RDWR | FMODE_NONOTIFY;
>  	if (flags & FAN_CLOEXEC)
>  		f_flags |= O_CLOEXEC;
> @@ -978,13 +1040,18 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
>  	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
>  	if (IS_ERR(group)) {
> -		free_uid(user);
>  		return PTR_ERR(group);
>  	}
>  
> -	group->fanotify_data.user = user;
> +	/* Enforce listeners limits per user in all containing user ns */
> +	group->fanotify_data.ucounts = inc_ucount(current_user_ns(), current_euid(),
> +						  UCOUNT_FANOTIFY_LISTENERS);
> +	if (!group->fanotify_data.ucounts) {
> +		fd = -EMFILE;
> +		goto out_destroy_group;
> +	}
> +
>  	group->fanotify_data.flags = flags;
> -	atomic_inc(&user->fanotify_listeners);
>  	group->memcg = get_mem_cgroup_from_mm(current->mm);
>  
>  	group->overflow_event = fanotify_alloc_overflow_event();
> @@ -1019,7 +1086,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  			goto out_destroy_group;
>  		group->max_events = UINT_MAX;
>  	} else {
> -		group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS;
> +		group->max_events = fanotify_max_queued_events;
>  	}
>  
>  	if (flags & FAN_UNLIMITED_MARKS) {
> @@ -1028,7 +1095,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  			goto out_destroy_group;
>  		group->fanotify_data.max_marks = UINT_MAX;
>  	} else {
> -		group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS;
> +		group->fanotify_data.max_marks = fanotify_max_listener_marks;
>  	}
>  
>  	if (flags & FAN_ENABLE_AUDIT) {
> @@ -1326,6 +1393,11 @@ static int __init fanotify_user_setup(void)
>  			KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
>  	}
>  
> +	fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS;
> +	fanotify_max_listener_marks = FANOTIFY_DEFAULT_MAX_LISTENER_MARKS;
> +	init_user_ns.ucount_max[UCOUNT_FANOTIFY_LISTENERS] = FANOTIFY_DEFAULT_MAX_LISTENERS;
> +	init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = FANOTIFY_DEFAULT_MAX_USER_MARKS;
> +
>  	return 0;
>  }
>  device_initcall(fanotify_user_setup);
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 3e9c56ee651f..031a97d8369a 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -2,8 +2,11 @@
>  #ifndef _LINUX_FANOTIFY_H
>  #define _LINUX_FANOTIFY_H
>  
> +#include <linux/sysctl.h>
>  #include <uapi/linux/fanotify.h>
>  
> +extern struct ctl_table fanotify_table[]; /* for sysctl */
> +
>  #define FAN_GROUP_FLAG(group, flag) \
>  	((group)->fanotify_data.flags & (flag))
>  
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index e5409b83e731..2179f06f6e89 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -240,7 +240,7 @@ struct fsnotify_group {
>  			int flags;           /* flags from fanotify_init() */
>  			int f_flags; /* event_f_flags from fanotify_init() */
>  			unsigned int max_marks;
> -			struct user_struct *user;
> +			struct ucounts *ucounts;
>  		} fanotify_data;
>  #endif /* CONFIG_FANOTIFY */
>  	};
> diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
> index a8ec3b6093fc..3632c5d6ec55 100644
> --- a/include/linux/sched/user.h
> +++ b/include/linux/sched/user.h
> @@ -14,9 +14,6 @@ struct user_struct {
>  	refcount_t __count;	/* reference count */
>  	atomic_t processes;	/* How many processes does this user have? */
>  	atomic_t sigpending;	/* How many pending signals does this user have? */
> -#ifdef CONFIG_FANOTIFY
> -	atomic_t fanotify_listeners;
> -#endif
>  #ifdef CONFIG_EPOLL
>  	atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
>  #endif
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 64cf8ebdc4ec..d8e6ff5e1040 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -49,6 +49,10 @@ enum ucount_type {
>  #ifdef CONFIG_INOTIFY_USER
>  	UCOUNT_INOTIFY_INSTANCES,
>  	UCOUNT_INOTIFY_WATCHES,
> +#endif
> +#ifdef CONFIG_FANOTIFY
> +	UCOUNT_FANOTIFY_LISTENERS,
> +	UCOUNT_FANOTIFY_MARKS,
>  #endif
>  	UCOUNT_COUNTS,
>  };
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c9fbdd848138..46c86830b811 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -148,6 +148,9 @@ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
>  #ifdef CONFIG_INOTIFY_USER
>  #include <linux/inotify.h>
>  #endif
> +#ifdef CONFIG_FANOTIFY
> +#include <linux/fanotify.h>
> +#endif
>  
>  #ifdef CONFIG_PROC_SYSCTL
>  
> @@ -3258,7 +3261,14 @@ static struct ctl_table fs_table[] = {
>  		.mode		= 0555,
>  		.child		= inotify_table,
>  	},
> -#endif	
> +#endif
> +#ifdef CONFIG_FANOTIFY
> +	{
> +		.procname	= "fanotify",
> +		.mode		= 0555,
> +		.child		= fanotify_table,
> +	},
> +#endif
>  #ifdef CONFIG_EPOLL
>  	{
>  		.procname	= "epoll",
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 11b1596e2542..edeabc5de28f 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -73,6 +73,10 @@ static struct ctl_table user_table[] = {
>  #ifdef CONFIG_INOTIFY_USER
>  	UCOUNT_ENTRY("max_inotify_instances"),
>  	UCOUNT_ENTRY("max_inotify_watches"),
> +#endif
> +#ifdef CONFIG_FANOTIFY
> +	UCOUNT_ENTRY("max_fanotify_listeners"),
> +	UCOUNT_ENTRY("max_fanotify_marks"),
>  #endif
>  	{ }
>  };
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-01-24 18:42 ` [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
@ 2021-02-16 17:01   ` Jan Kara
  2021-02-16 18:12     ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2021-02-16 17:01 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api

On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> Add limited support for unprivileged fanotify event listener.
> An unprivileged event listener does not get an open file descriptor in
> the event nor the process pid of another process.  An unprivileged event
> listener cannot request permission events, cannot set mount/filesystem
> marks and cannot request unlimited queue/marks.
> 
> This enables the limited functionality similar to inotify when watching a
> set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> requiring SYS_CAP_ADMIN privileges.
> 
> The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> to monitor all changes inside those directories.
> 
> This typically requires that the listener keeps a map of watched directory
> fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> before starting to watch for changes.
> 
> When getting an event, the reported fid of the parent should be resolved
> to dirfd and fstatsat(2) with dirfd and name should be used to query the
> state of the filesystem entry.
> 
> Note that even though events do not report the event creator pid,
> fanotify does not merge similar events on the same object that were
> generated by different processes. This is aligned with exiting behavior
> when generating processes are outside of the listener pidns (which
> results in reporting 0 pid to listener).
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

The patch looks mostly good to me. Just two questions:

a) Remind me please, why did we decide pid isn't safe to report to
unpriviledged listeners?

b) Why did we decide returning open file descriptors isn't safe for
unpriviledged listeners? Is it about FMODE_NONOTIFY?

I'm not opposed to either but I'm wondering. Also with b) old style
fanotify events are not very useful so maybe we could just disallow all
notification groups without FID/DFID reporting? In the future if we ever
decide returning open fds is safe or how to do it, we can enable that group
type for unpriviledged users. However just starting to return open fds
later won't fly because listener has to close these fds when receiving
events.

								Honza

> ---
>  fs/notify/fanotify/fanotify_user.c | 49 +++++++++++++++++++++++++++---
>  fs/notify/fdinfo.c                 |  3 +-
>  include/linux/fanotify.h           | 16 ++++++++++
>  3 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 4ade3f9df337..b70de273eedb 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -397,9 +397,21 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	metadata.vers = FANOTIFY_METADATA_VERSION;
>  	metadata.reserved = 0;
>  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> -	metadata.pid = pid_vnr(event->pid);
> +	/*
> +	 * An unprivileged event listener does not get an open file descriptor
> +	 * in the event nor another generating process pid. If the event was
> +	 * generated by the unprivileged process itself, self pid is reported.
> +	 * We may relax this in the future by checking calling process access
> +	 * permissions to the object.
> +	 */
> +	if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) ||
> +	    task_tgid(current) == event->pid)
> +		metadata.pid = pid_vnr(event->pid);
> +	else
> +		metadata.pid = 0;
>  
> -	if (path && path->mnt && path->dentry) {
> +	if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) &&
> +	    path && path->mnt && path->dentry) {
>  		fd = create_fd(group, path, &f);
>  		if (fd < 0)
>  			return fd;
> @@ -995,12 +1007,29 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  	int f_flags, fd;
>  	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
>  	unsigned int class = flags & FANOTIFY_CLASS_BITS;
> +	unsigned int internal_flags = 0;
>  
>  	pr_debug("%s: flags=%x event_f_flags=%x\n",
>  		 __func__, flags, event_f_flags);
>  
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> +	if (!capable(CAP_SYS_ADMIN)) {
> +		/*
> +		 * An unprivileged user can setup an unprivileged listener with
> +		 * limited functionality - an unprivileged event listener cannot
> +		 * request permission events, cannot set mount/filesystem marks
> +		 * and cannot request unlimited queue/marks.
> +		 */
> +		if ((flags & ~FANOTIFY_UNPRIV_INIT_FLAGS) ||
> +		    class != FAN_CLASS_NOTIF)
> +			return -EPERM;
> +
> +		/*
> +		 * We set the internal flag FANOTIFY_UNPRIV on the group, so we
> +		 * know that we need to limit setting mount/filesystem marks on
> +		 * this group and avoid providing pid and open fd in the event.
> +		 */
> +		internal_flags |= FANOTIFY_UNPRIV;
> +	}
>  
>  #ifdef CONFIG_AUDITSYSCALL
>  	if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT))
> @@ -1051,7 +1080,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  		goto out_destroy_group;
>  	}
>  
> -	group->fanotify_data.flags = flags;
> +	group->fanotify_data.flags = flags | internal_flags;
>  	group->memcg = get_mem_cgroup_from_mm(current->mm);
>  
>  	group->overflow_event = fanotify_alloc_overflow_event();
> @@ -1247,6 +1276,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>  		goto fput_and_out;
>  	group = f.file->private_data;
>  
> +	/*
> +	 * An unprivileged event listener is not allowed to watch a mount
> +	 * point nor a filesystem.
> +	 */
> +	ret = -EPERM;
> +	if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) &&
> +	    mark_type != FAN_MARK_INODE)
> +		goto fput_and_out;
> +
>  	/*
>  	 * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF.  These are not
>  	 * allowed to set permissions events.
> @@ -1379,6 +1417,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
>   */
>  static int __init fanotify_user_setup(void)
>  {
> +	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_FLAGS);
>  	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
>  	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>  
> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> index f0d6b54be412..57f0d5d9f934 100644
> --- a/fs/notify/fdinfo.c
> +++ b/fs/notify/fdinfo.c
> @@ -144,7 +144,8 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f)
>  	struct fsnotify_group *group = f->private_data;
>  
>  	seq_printf(m, "fanotify flags:%x event-flags:%x\n",
> -		   group->fanotify_data.flags, group->fanotify_data.f_flags);
> +		   group->fanotify_data.flags & FANOTIFY_INIT_FLAGS,
> +		   group->fanotify_data.f_flags);
>  
>  	show_fdinfo(m, f, fanotify_fdinfo);
>  }
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 031a97d8369a..a573c1028c14 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -28,6 +28,22 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
>  				 FAN_CLOEXEC | FAN_NONBLOCK | \
>  				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
>  
> +/* Internal flags */
> +#define FANOTIFY_UNPRIV		0x80000000
> +#define FANOTIFY_INTERNAL_FLAGS	(FANOTIFY_UNPRIV)
> +
> +/*
> + * fanotify_init() flags allowed for unprivileged listener.
> + * FAN_CLASS_NOTIF in this mask is purely semantic because it is zero,
> + * but it is the only class we allow for unprivileged listener.
> + * Since unprivileged listener does not provide file descriptors in events,
> + * reporting file handles makes sense, but it is not a must.
> + * FAN_REPORT_TID does not make sense for unprivileged listener, which uses
> + * event->pid only to filter out events generated by listener process itself.
> + */
> +#define FANOTIFY_UNPRIV_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
> +					 FAN_CLASS_NOTIF | FANOTIFY_FID_BITS)
> +
>  #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
>  				 FAN_MARK_FILESYSTEM)
>  
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/2] fanotify: configurable limits via sysfs
  2021-02-16 16:27   ` Jan Kara
@ 2021-02-16 18:02     ` Amir Goldstein
  2021-02-17 10:25       ` Jan Kara
  2021-02-18 18:57     ` Amir Goldstein
  1 sibling, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-02-16 18:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API

On Tue, Feb 16, 2021 at 6:27 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi Amir!
>
>
> I'm sorry that I've got to this only now.
>
> On Sun 24-01-21 20:42:03, Amir Goldstein wrote:
> > fanotify has some hardcoded limits. The only APIs to escape those limits
> > are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.
> >
> > Allow finer grained tuning of the system limits via sysfs tunables under
> > /proc/sys/fs/fanotify/, similar to tunables under /proc/sys/fs/inotify,
> > with some minor differences.
> >
> > - max_queued_events - global system tunable for group queue size limit.
> >   Like the inotify tunable with the same name, it defaults to 16384 and
> >   applies on initialization of a new group.
> >
> > - max_listener_marks - global system tunable of marks limit per group.
> >   Defaults to 8192. inotify has no per group marks limit.
> >
> > - max_user_marks - user ns tunable for marks limit per user.
> >   Like the inotify tunable named max_user_watches, it defaults to 1048576
> >   and is accounted for every containing user ns.
> >
> > - max_user_listeners - user ns tunable for number of listeners per user.
> >   Like the inotify tunable named max_user_instances, it defaults to 128
> >   and is accounted for every containing user ns.
>
> I think term 'group' is used in the manpages even more and in the code as
> well. 'listener' more generally tends to refer to the application listening
> to the events. So I'd rather call the limits 'max_group_marks' and
> 'max_user_groups'.

ok.

>
> > The slightly different tunable names are derived from the "listener" and
> > "mark" terminology used in the fanotify man pages.
> >
> > max_listener_marks was kept for compatibility with legacy fanotify
> > behavior. Given that inotify max_user_instances was increased from 8192
> > to 1048576 in kernel v5.10, we may want to consider changing also the
> > default for max_listener_marks or remove it completely, leaving only the
> > per user marks limit.
>
> Yes, probably I'd just drop 'max_group_marks' completely and leave just
> per-user marks limit. You can always tune it in init_user_ns if you wish.
> Can't you?

Yes, I suppose.

>
> Also as a small style nit, please try to stay within 80 columns. Otherwise
> the patch looks OK to me.
>

Ever since discussions that led to:
bdc48fa11e46 checkpatch/coding-style: deprecate 80-column warning

I've tuned my editor to warn on 100 columns.
I still try to refrain from long lines, but breaking a ~82 columns line
in an ugly way is something that I try to avoid.

I'll try harder to stay below 80 when it does not create ugly code,
unless you absolutely want the patches to fit in 80 columns.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-16 17:01   ` Jan Kara
@ 2021-02-16 18:12     ` Amir Goldstein
  2021-02-19 16:16       ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-02-16 18:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API

On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > Add limited support for unprivileged fanotify event listener.
> > An unprivileged event listener does not get an open file descriptor in
> > the event nor the process pid of another process.  An unprivileged event
> > listener cannot request permission events, cannot set mount/filesystem
> > marks and cannot request unlimited queue/marks.
> >
> > This enables the limited functionality similar to inotify when watching a
> > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > requiring SYS_CAP_ADMIN privileges.
> >
> > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > to monitor all changes inside those directories.
> >
> > This typically requires that the listener keeps a map of watched directory
> > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > before starting to watch for changes.
> >
> > When getting an event, the reported fid of the parent should be resolved
> > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > state of the filesystem entry.
> >
> > Note that even though events do not report the event creator pid,
> > fanotify does not merge similar events on the same object that were
> > generated by different processes. This is aligned with exiting behavior
> > when generating processes are outside of the listener pidns (which
> > results in reporting 0 pid to listener).
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> The patch looks mostly good to me. Just two questions:
>
> a) Remind me please, why did we decide pid isn't safe to report to
> unpriviledged listeners?

Just because the information that process X modified file Y is not an
information that user can generally obtain without extra capabilities(?)
I can add a flag FAN_REPORT_OWN_PID to make that behavior
explicit and then we can relax reporting pids later.

>
> b) Why did we decide returning open file descriptors isn't safe for
> unpriviledged listeners? Is it about FMODE_NONOTIFY?
>

Don't remember something in particular. I feels risky.

> I'm not opposed to either but I'm wondering. Also with b) old style
> fanotify events are not very useful so maybe we could just disallow all
> notification groups without FID/DFID reporting? In the future if we ever
> decide returning open fds is safe or how to do it, we can enable that group
> type for unpriviledged users. However just starting to return open fds
> later won't fly because listener has to close these fds when receiving
> events.
>

I like this option better.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/2] fanotify: configurable limits via sysfs
  2021-02-16 18:02     ` Amir Goldstein
@ 2021-02-17 10:25       ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2021-02-17 10:25 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Tue 16-02-21 20:02:49, Amir Goldstein wrote:
> On Tue, Feb 16, 2021 at 6:27 PM Jan Kara <jack@suse.cz> wrote:
> > Also as a small style nit, please try to stay within 80 columns. Otherwise
> > the patch looks OK to me.
> >
> 
> Ever since discussions that led to:
> bdc48fa11e46 checkpatch/coding-style: deprecate 80-column warning

Yes, I know.

> I've tuned my editor to warn on 100 columns.
> I still try to refrain from long lines, but breaking a ~82 columns line
> in an ugly way is something that I try to avoid.

Well it depends what is in those two columns. I have my terminals exactly
80 columns wide so that I can fit as many of them on the screen as possible
;). So I don't see whatever is beyond column 80. Sometimes it is obvious
enough but sometimes not and if I have to scroll, it isn't ideal.
 
> I'll try harder to stay below 80 when it does not create ugly code,
> unless you absolutely want the patches to fit in 80 columns.

No, I'm not religious about 80 columns. It is really about readability.
E.g. for strings, few characters beyond 80 columns does not really matter.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/2] fanotify: configurable limits via sysfs
  2021-02-16 16:27   ` Jan Kara
  2021-02-16 18:02     ` Amir Goldstein
@ 2021-02-18 18:57     ` Amir Goldstein
  2021-02-19  9:01       ` Amir Goldstein
  1 sibling, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-02-18 18:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API

On Tue, Feb 16, 2021 at 6:27 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi Amir!
>
>
> I'm sorry that I've got to this only now.
>
> On Sun 24-01-21 20:42:03, Amir Goldstein wrote:
> > fanotify has some hardcoded limits. The only APIs to escape those limits
> > are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.
> >
> > Allow finer grained tuning of the system limits via sysfs tunables under
> > /proc/sys/fs/fanotify/, similar to tunables under /proc/sys/fs/inotify,
> > with some minor differences.
> >
> > - max_queued_events - global system tunable for group queue size limit.
> >   Like the inotify tunable with the same name, it defaults to 16384 and
> >   applies on initialization of a new group.
> >
> > - max_listener_marks - global system tunable of marks limit per group.
> >   Defaults to 8192. inotify has no per group marks limit.
> >
> > - max_user_marks - user ns tunable for marks limit per user.
> >   Like the inotify tunable named max_user_watches, it defaults to 1048576
> >   and is accounted for every containing user ns.
> >
> > - max_user_listeners - user ns tunable for number of listeners per user.
> >   Like the inotify tunable named max_user_instances, it defaults to 128
> >   and is accounted for every containing user ns.
>
> I think term 'group' is used in the manpages even more and in the code as
> well. 'listener' more generally tends to refer to the application listening
> to the events. So I'd rather call the limits 'max_group_marks' and
> 'max_user_groups'.
>
> > The slightly different tunable names are derived from the "listener" and
> > "mark" terminology used in the fanotify man pages.
> >
> > max_listener_marks was kept for compatibility with legacy fanotify
> > behavior. Given that inotify max_user_instances was increased from 8192
> > to 1048576 in kernel v5.10, we may want to consider changing also the
> > default for max_listener_marks or remove it completely, leaving only the
> > per user marks limit.
>
> Yes, probably I'd just drop 'max_group_marks' completely and leave just
> per-user marks limit. You can always tune it in init_user_ns if you wish.
> Can't you?
>

So I am fine with making this change but what about
FAN_UNLIMITED_MARKS?

What will it mean?
Should the group be able to escape ucount limits?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/2] fanotify: configurable limits via sysfs
  2021-02-18 18:57     ` Amir Goldstein
@ 2021-02-19  9:01       ` Amir Goldstein
  0 siblings, 0 replies; 16+ messages in thread
From: Amir Goldstein @ 2021-02-19  9:01 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API

On Thu, Feb 18, 2021 at 8:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Feb 16, 2021 at 6:27 PM Jan Kara <jack@suse.cz> wrote:
> >
> > Hi Amir!
> >
> >
> > I'm sorry that I've got to this only now.
> >
> > On Sun 24-01-21 20:42:03, Amir Goldstein wrote:
> > > fanotify has some hardcoded limits. The only APIs to escape those limits
> > > are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.
> > >
> > > Allow finer grained tuning of the system limits via sysfs tunables under
> > > /proc/sys/fs/fanotify/, similar to tunables under /proc/sys/fs/inotify,
> > > with some minor differences.
> > >
> > > - max_queued_events - global system tunable for group queue size limit.
> > >   Like the inotify tunable with the same name, it defaults to 16384 and
> > >   applies on initialization of a new group.
> > >
> > > - max_listener_marks - global system tunable of marks limit per group.
> > >   Defaults to 8192. inotify has no per group marks limit.
> > >
> > > - max_user_marks - user ns tunable for marks limit per user.
> > >   Like the inotify tunable named max_user_watches, it defaults to 1048576
> > >   and is accounted for every containing user ns.
> > >
> > > - max_user_listeners - user ns tunable for number of listeners per user.
> > >   Like the inotify tunable named max_user_instances, it defaults to 128
> > >   and is accounted for every containing user ns.
> >
> > I think term 'group' is used in the manpages even more and in the code as
> > well. 'listener' more generally tends to refer to the application listening
> > to the events. So I'd rather call the limits 'max_group_marks' and
> > 'max_user_groups'.
> >
> > > The slightly different tunable names are derived from the "listener" and
> > > "mark" terminology used in the fanotify man pages.
> > >
> > > max_listener_marks was kept for compatibility with legacy fanotify
> > > behavior. Given that inotify max_user_instances was increased from 8192
> > > to 1048576 in kernel v5.10, we may want to consider changing also the
> > > default for max_listener_marks or remove it completely, leaving only the
> > > per user marks limit.
> >
> > Yes, probably I'd just drop 'max_group_marks' completely and leave just
> > per-user marks limit. You can always tune it in init_user_ns if you wish.
> > Can't you?
> >
>
> So I am fine with making this change but what about
> FAN_UNLIMITED_MARKS?
>
> What will it mean?
> Should the group be able to escape ucount limits?
>

Nevermind, I figured it out on my own:

 "Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own
  marks are not accounted in the per user marks account, so in effect the
  limit of max_user_marks is only for the collection of groups that are
  not initialized with FAN_UNLIMITED_MARKS."

I've pushed the result to branch fanotify_limits.
I won't post during the merge window unless you tell me to.

Do you think we should go ahead with merging this change before
the unprivileged fanotify change?

IMO, this interface is useful even for privileged groups - I have been carrying
a private patch for max_queued_events for a long time as the default limit is
too low and I wanted to avoid unlimited memory usage without the hassle
of setting up a memcg.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-16 18:12     ` Amir Goldstein
@ 2021-02-19 16:16       ` Amir Goldstein
  2021-02-23 17:16         ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-02-19 16:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API

On Tue, Feb 16, 2021 at 8:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > > Add limited support for unprivileged fanotify event listener.
> > > An unprivileged event listener does not get an open file descriptor in
> > > the event nor the process pid of another process.  An unprivileged event
> > > listener cannot request permission events, cannot set mount/filesystem
> > > marks and cannot request unlimited queue/marks.
> > >
> > > This enables the limited functionality similar to inotify when watching a
> > > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > > requiring SYS_CAP_ADMIN privileges.
> > >
> > > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > > to monitor all changes inside those directories.
> > >
> > > This typically requires that the listener keeps a map of watched directory
> > > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > > before starting to watch for changes.
> > >
> > > When getting an event, the reported fid of the parent should be resolved
> > > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > > state of the filesystem entry.
> > >
> > > Note that even though events do not report the event creator pid,
> > > fanotify does not merge similar events on the same object that were
> > > generated by different processes. This is aligned with exiting behavior
> > > when generating processes are outside of the listener pidns (which
> > > results in reporting 0 pid to listener).
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > The patch looks mostly good to me. Just two questions:
> >
> > a) Remind me please, why did we decide pid isn't safe to report to
> > unpriviledged listeners?
>
> Just because the information that process X modified file Y is not an
> information that user can generally obtain without extra capabilities(?)
> I can add a flag FAN_REPORT_OWN_PID to make that behavior
> explicit and then we can relax reporting pids later.
>

FYI a patch for flag FAN_REPORT_SELF_PID is pushed to branch
fanotify_unpriv.

The UAPI feels a bit awkward with this flag, but that is the easiest way
to start without worrying about disclosing pids.

I guess we can require that unprivileged listener has pid 1 in its own
pid ns. The outcome is similar to FAN_REPORT_SELF_PID, except
it can also get pids of its children which is probably fine.

I am not sure if this is a reasonable option from users POV.

> >
> > b) Why did we decide returning open file descriptors isn't safe for
> > unpriviledged listeners? Is it about FMODE_NONOTIFY?
> >
>
> Don't remember something in particular. I feels risky.
>
> > I'm not opposed to either but I'm wondering. Also with b) old style
> > fanotify events are not very useful so maybe we could just disallow all
> > notification groups without FID/DFID reporting? In the future if we ever
> > decide returning open fds is safe or how to do it, we can enable that group
> > type for unpriviledged users. However just starting to return open fds
> > later won't fly because listener has to close these fds when receiving
> > events.
> >
>
> I like this option better.
>

This is also pushed to branch fanotify_unpriv.
With all the behavior specified explicitly in fanotify_init() and
fanotify_mark() flags, there is no need for the internal
FANOTIFY_UNPRIV group flag, which looks better IMO.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-19 16:16       ` Amir Goldstein
@ 2021-02-23 17:16         ` Amir Goldstein
  2021-02-24 10:52           ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2021-02-23 17:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Kees Cook

On Fri, Feb 19, 2021 at 6:16 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Feb 16, 2021 at 8:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > > > Add limited support for unprivileged fanotify event listener.
> > > > An unprivileged event listener does not get an open file descriptor in
> > > > the event nor the process pid of another process.  An unprivileged event
> > > > listener cannot request permission events, cannot set mount/filesystem
> > > > marks and cannot request unlimited queue/marks.
> > > >
> > > > This enables the limited functionality similar to inotify when watching a
> > > > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > > > requiring SYS_CAP_ADMIN privileges.
> > > >
> > > > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > > > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > > > to monitor all changes inside those directories.
> > > >
> > > > This typically requires that the listener keeps a map of watched directory
> > > > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > > > before starting to watch for changes.
> > > >
> > > > When getting an event, the reported fid of the parent should be resolved
> > > > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > > > state of the filesystem entry.
> > > >
> > > > Note that even though events do not report the event creator pid,
> > > > fanotify does not merge similar events on the same object that were
> > > > generated by different processes. This is aligned with exiting behavior
> > > > when generating processes are outside of the listener pidns (which
> > > > results in reporting 0 pid to listener).
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > The patch looks mostly good to me. Just two questions:
> > >
> > > a) Remind me please, why did we decide pid isn't safe to report to
> > > unpriviledged listeners?
> >
> > Just because the information that process X modified file Y is not an
> > information that user can generally obtain without extra capabilities(?)
> > I can add a flag FAN_REPORT_OWN_PID to make that behavior
> > explicit and then we can relax reporting pids later.
> >
>
> FYI a patch for flag FAN_REPORT_SELF_PID is pushed to branch
> fanotify_unpriv.
>
> The UAPI feels a bit awkward with this flag, but that is the easiest way
> to start without worrying about disclosing pids.
>
> I guess we can require that unprivileged listener has pid 1 in its own
> pid ns. The outcome is similar to FAN_REPORT_SELF_PID, except
> it can also get pids of its children which is probably fine.
>

Jan,

WRT your comment in github:
"So maybe we can just require that this flag is already set by userspace
instead of silently setting it? Like:

if (!(flags & FAN_REPORT_SELF_PID)) return -EPERM;

I'd say that variant is more futureproof and the difference for user
is minimal."

I started with this approach and then I wrote the tests and imagined
the man page
requiring this flag would be a bit awkward, so I changed it to auto-enable.

I am not strongly against the more implicit flag requirement, but in
favor of the
auto-enable approach I would like to argue that with current fanotify you CAN
get zero pid in event, so think about it this way:
If a listener is started in (or moved into) its own pid ns, it will
get zero pid in all
events (other than those generated by itself and its own children).

With the proposed change, the same applies also if the listener is started
without CAP_SYS_ADMIN.

As a matter of fact, we do not need the flag at all, we can determine whether
or not to report pid according to capabilities of the event reader at
event read time.
And we can check for one of:
- CAP_SYS_ADMIN
- CAP_SYS_PACCT
- CAP_SYS_PTRACE

Do you prefer this flag-less approach?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-23 17:16         ` Amir Goldstein
@ 2021-02-24 10:52           ` Jan Kara
  2021-02-24 12:58             ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2021-02-24 10:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API, Kees Cook

On Tue 23-02-21 19:16:40, Amir Goldstein wrote:
> On Fri, Feb 19, 2021 at 6:16 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 8:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > > > > Add limited support for unprivileged fanotify event listener.
> > > > > An unprivileged event listener does not get an open file descriptor in
> > > > > the event nor the process pid of another process.  An unprivileged event
> > > > > listener cannot request permission events, cannot set mount/filesystem
> > > > > marks and cannot request unlimited queue/marks.
> > > > >
> > > > > This enables the limited functionality similar to inotify when watching a
> > > > > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > > > > requiring SYS_CAP_ADMIN privileges.
> > > > >
> > > > > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > > > > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > > > > to monitor all changes inside those directories.
> > > > >
> > > > > This typically requires that the listener keeps a map of watched directory
> > > > > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > > > > before starting to watch for changes.
> > > > >
> > > > > When getting an event, the reported fid of the parent should be resolved
> > > > > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > > > > state of the filesystem entry.
> > > > >
> > > > > Note that even though events do not report the event creator pid,
> > > > > fanotify does not merge similar events on the same object that were
> > > > > generated by different processes. This is aligned with exiting behavior
> > > > > when generating processes are outside of the listener pidns (which
> > > > > results in reporting 0 pid to listener).
> > > > >
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > The patch looks mostly good to me. Just two questions:
> > > >
> > > > a) Remind me please, why did we decide pid isn't safe to report to
> > > > unpriviledged listeners?
> > >
> > > Just because the information that process X modified file Y is not an
> > > information that user can generally obtain without extra capabilities(?)
> > > I can add a flag FAN_REPORT_OWN_PID to make that behavior
> > > explicit and then we can relax reporting pids later.
> > >
> >
> > FYI a patch for flag FAN_REPORT_SELF_PID is pushed to branch
> > fanotify_unpriv.
> >
> > The UAPI feels a bit awkward with this flag, but that is the easiest way
> > to start without worrying about disclosing pids.
> >
> > I guess we can require that unprivileged listener has pid 1 in its own
> > pid ns. The outcome is similar to FAN_REPORT_SELF_PID, except
> > it can also get pids of its children which is probably fine.
> >
> 
> Jan,
> 
> WRT your comment in github:
> "So maybe we can just require that this flag is already set by userspace
> instead of silently setting it? Like:
> 
> if (!(flags & FAN_REPORT_SELF_PID)) return -EPERM;
> 
> I'd say that variant is more futureproof and the difference for user
> is minimal."
> 
> I started with this approach and then I wrote the tests and imagined
> the man page
> requiring this flag would be a bit awkward, so I changed it to auto-enable.
> 
> I am not strongly against the more implicit flag requirement, but in
> favor of the
> auto-enable approach I would like to argue that with current fanotify you CAN
> get zero pid in event, so think about it this way:
> If a listener is started in (or moved into) its own pid ns, it will
> get zero pid in all
> events (other than those generated by itself and its own children).
> 
> With the proposed change, the same applies also if the listener is started
> without CAP_SYS_ADMIN.
> 
> As a matter of fact, we do not need the flag at all, we can determine whether
> or not to report pid according to capabilities of the event reader at
> event read time.
> And we can check for one of:
> - CAP_SYS_ADMIN
> - CAP_SYS_PACCT
> - CAP_SYS_PTRACE
> 
> Do you prefer this flag-less approach?

Well, I don't have strong opinion what we should do internally either. The
flag seems OK to me. The biggest question is whether we should expose the
FAN_REPORT_SELF_PID flag to userspace or not. If we would not require
explicit flag for unpriv users, I see little reason to expose that flag at
all.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-24 10:52           ` Jan Kara
@ 2021-02-24 12:58             ` Amir Goldstein
  2021-02-24 13:37               ` Amir Goldstein
  2021-02-24 17:29               ` Jan Kara
  0 siblings, 2 replies; 16+ messages in thread
From: Amir Goldstein @ 2021-02-24 12:58 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Kees Cook

On Wed, Feb 24, 2021 at 12:52 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 23-02-21 19:16:40, Amir Goldstein wrote:
> > On Fri, Feb 19, 2021 at 6:16 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Tue, Feb 16, 2021 at 8:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > > > > > Add limited support for unprivileged fanotify event listener.
> > > > > > An unprivileged event listener does not get an open file descriptor in
> > > > > > the event nor the process pid of another process.  An unprivileged event
> > > > > > listener cannot request permission events, cannot set mount/filesystem
> > > > > > marks and cannot request unlimited queue/marks.
> > > > > >
> > > > > > This enables the limited functionality similar to inotify when watching a
> > > > > > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > > > > > requiring SYS_CAP_ADMIN privileges.
> > > > > >
> > > > > > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > > > > > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > > > > > to monitor all changes inside those directories.
> > > > > >
> > > > > > This typically requires that the listener keeps a map of watched directory
> > > > > > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > > > > > before starting to watch for changes.
> > > > > >
> > > > > > When getting an event, the reported fid of the parent should be resolved
> > > > > > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > > > > > state of the filesystem entry.
> > > > > >
> > > > > > Note that even though events do not report the event creator pid,
> > > > > > fanotify does not merge similar events on the same object that were
> > > > > > generated by different processes. This is aligned with exiting behavior
> > > > > > when generating processes are outside of the listener pidns (which
> > > > > > results in reporting 0 pid to listener).
> > > > > >
> > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > >
> > > > > The patch looks mostly good to me. Just two questions:
> > > > >
> > > > > a) Remind me please, why did we decide pid isn't safe to report to
> > > > > unpriviledged listeners?
> > > >
> > > > Just because the information that process X modified file Y is not an
> > > > information that user can generally obtain without extra capabilities(?)
> > > > I can add a flag FAN_REPORT_OWN_PID to make that behavior
> > > > explicit and then we can relax reporting pids later.
> > > >
> > >
> > > FYI a patch for flag FAN_REPORT_SELF_PID is pushed to branch
> > > fanotify_unpriv.
> > >
> > > The UAPI feels a bit awkward with this flag, but that is the easiest way
> > > to start without worrying about disclosing pids.
> > >
> > > I guess we can require that unprivileged listener has pid 1 in its own
> > > pid ns. The outcome is similar to FAN_REPORT_SELF_PID, except
> > > it can also get pids of its children which is probably fine.
> > >
> >
> > Jan,
> >
> > WRT your comment in github:
> > "So maybe we can just require that this flag is already set by userspace
> > instead of silently setting it? Like:
> >
> > if (!(flags & FAN_REPORT_SELF_PID)) return -EPERM;
> >
> > I'd say that variant is more futureproof and the difference for user
> > is minimal."
> >
> > I started with this approach and then I wrote the tests and imagined
> > the man page
> > requiring this flag would be a bit awkward, so I changed it to auto-enable.
> >
> > I am not strongly against the more implicit flag requirement, but in
> > favor of the
> > auto-enable approach I would like to argue that with current fanotify you CAN
> > get zero pid in event, so think about it this way:
> > If a listener is started in (or moved into) its own pid ns, it will
> > get zero pid in all
> > events (other than those generated by itself and its own children).
> >
> > With the proposed change, the same applies also if the listener is started
> > without CAP_SYS_ADMIN.
> >
> > As a matter of fact, we do not need the flag at all, we can determine whether
> > or not to report pid according to capabilities of the event reader at
> > event read time.
> > And we can check for one of:
> > - CAP_SYS_ADMIN
> > - CAP_SYS_PACCT
> > - CAP_SYS_PTRACE
> >
> > Do you prefer this flag-less approach?
>
> Well, I don't have strong opinion what we should do internally either. The
> flag seems OK to me. The biggest question is whether we should expose the
> FAN_REPORT_SELF_PID flag to userspace or not. If we would not require
> explicit flag for unpriv users, I see little reason to expose that flag at
> all.
>

IMO the only listeners that actually care about event->pid are permission
event listeners. I think that FAN_CLASS_NOTIF listeners do not care
about it. The only thing that *I* ever used event->pid for is to ignore events
from self pid.

My question is, do you mind if we start with this code:

@@ -419,6 +419,14 @@ static ssize_t copy_event_to_user(struct
fsnotify_group *group,
        metadata.reserved = 0;
        metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
        metadata.pid = pid_vnr(event->pid);
+
+        /*
+         * For an unprivileged listener, event->pid can be used to identify the
+         * events generated by the listener process itself, without disclosing
+         * the pids of other processes.
+         */
+        if (!capable(CAP_SYS_ADMIN) &&
+            task_tgid(current) != event->pid)
+                metadata.pid = 0;

No need for any visible or invisible user flags.
If users ask for event->pid of other processes later (I don't think they will)
and we decide that it is safe to disclose them, we will require another flag
and then the test will become:

+        if (!capable(CAP_SYS_ADMIN) ||
+            FAN_GROUP_FLAG(group, FAN_REPORT_PID))

and *if* that ever happens, we will document the FAN_REPORT_PID
flag and say that it is enabled by default for CAP_SYS_ADMIN instead of
requiring and documenting FAN_REPORT_SELF_PID now.

The way I see it, the only disadvantage with this negated approach is
that CAP_SYS_ADMIN listeners cannot turn off event->pid reporting,
but why would anybody need to do that?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-24 12:58             ` Amir Goldstein
@ 2021-02-24 13:37               ` Amir Goldstein
  2021-02-24 17:29               ` Jan Kara
  1 sibling, 0 replies; 16+ messages in thread
From: Amir Goldstein @ 2021-02-24 13:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Kees Cook

On Wed, Feb 24, 2021 at 2:58 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Feb 24, 2021 at 12:52 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 23-02-21 19:16:40, Amir Goldstein wrote:
> > > On Fri, Feb 19, 2021 at 6:16 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 16, 2021 at 8:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
> > > > > >
> > > > > > On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > > > > > > Add limited support for unprivileged fanotify event listener.
> > > > > > > An unprivileged event listener does not get an open file descriptor in
> > > > > > > the event nor the process pid of another process.  An unprivileged event
> > > > > > > listener cannot request permission events, cannot set mount/filesystem
> > > > > > > marks and cannot request unlimited queue/marks.
> > > > > > >
> > > > > > > This enables the limited functionality similar to inotify when watching a
> > > > > > > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > > > > > > requiring SYS_CAP_ADMIN privileges.
> > > > > > >
> > > > > > > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > > > > > > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > > > > > > to monitor all changes inside those directories.
> > > > > > >
> > > > > > > This typically requires that the listener keeps a map of watched directory
> > > > > > > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > > > > > > before starting to watch for changes.
> > > > > > >
> > > > > > > When getting an event, the reported fid of the parent should be resolved
> > > > > > > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > > > > > > state of the filesystem entry.
> > > > > > >
> > > > > > > Note that even though events do not report the event creator pid,
> > > > > > > fanotify does not merge similar events on the same object that were
> > > > > > > generated by different processes. This is aligned with exiting behavior
> > > > > > > when generating processes are outside of the listener pidns (which
> > > > > > > results in reporting 0 pid to listener).
> > > > > > >
> > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > >
> > > > > > The patch looks mostly good to me. Just two questions:
> > > > > >
> > > > > > a) Remind me please, why did we decide pid isn't safe to report to
> > > > > > unpriviledged listeners?
> > > > >
> > > > > Just because the information that process X modified file Y is not an
> > > > > information that user can generally obtain without extra capabilities(?)
> > > > > I can add a flag FAN_REPORT_OWN_PID to make that behavior
> > > > > explicit and then we can relax reporting pids later.
> > > > >
> > > >
> > > > FYI a patch for flag FAN_REPORT_SELF_PID is pushed to branch
> > > > fanotify_unpriv.
> > > >
> > > > The UAPI feels a bit awkward with this flag, but that is the easiest way
> > > > to start without worrying about disclosing pids.
> > > >
> > > > I guess we can require that unprivileged listener has pid 1 in its own
> > > > pid ns. The outcome is similar to FAN_REPORT_SELF_PID, except
> > > > it can also get pids of its children which is probably fine.
> > > >
> > >
> > > Jan,
> > >
> > > WRT your comment in github:
> > > "So maybe we can just require that this flag is already set by userspace
> > > instead of silently setting it? Like:
> > >
> > > if (!(flags & FAN_REPORT_SELF_PID)) return -EPERM;
> > >
> > > I'd say that variant is more futureproof and the difference for user
> > > is minimal."
> > >
> > > I started with this approach and then I wrote the tests and imagined
> > > the man page
> > > requiring this flag would be a bit awkward, so I changed it to auto-enable.
> > >
> > > I am not strongly against the more implicit flag requirement, but in
> > > favor of the
> > > auto-enable approach I would like to argue that with current fanotify you CAN
> > > get zero pid in event, so think about it this way:
> > > If a listener is started in (or moved into) its own pid ns, it will
> > > get zero pid in all
> > > events (other than those generated by itself and its own children).
> > >
> > > With the proposed change, the same applies also if the listener is started
> > > without CAP_SYS_ADMIN.
> > >
> > > As a matter of fact, we do not need the flag at all, we can determine whether
> > > or not to report pid according to capabilities of the event reader at
> > > event read time.
> > > And we can check for one of:
> > > - CAP_SYS_ADMIN
> > > - CAP_SYS_PACCT
> > > - CAP_SYS_PTRACE
> > >
> > > Do you prefer this flag-less approach?
> >
> > Well, I don't have strong opinion what we should do internally either. The
> > flag seems OK to me. The biggest question is whether we should expose the
> > FAN_REPORT_SELF_PID flag to userspace or not. If we would not require
> > explicit flag for unpriv users, I see little reason to expose that flag at
> > all.
> >
>
> IMO the only listeners that actually care about event->pid are permission
> event listeners. I think that FAN_CLASS_NOTIF listeners do not care
> about it. The only thing that *I* ever used event->pid for is to ignore events
> from self pid.
>
> My question is, do you mind if we start with this code:
>
> @@ -419,6 +419,14 @@ static ssize_t copy_event_to_user(struct
> fsnotify_group *group,
>         metadata.reserved = 0;
>         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
>         metadata.pid = pid_vnr(event->pid);
> +
> +        /*
> +         * For an unprivileged listener, event->pid can be used to identify the
> +         * events generated by the listener process itself, without disclosing
> +         * the pids of other processes.
> +         */
> +        if (!capable(CAP_SYS_ADMIN) &&
> +            task_tgid(current) != event->pid)
> +                metadata.pid = 0;
>
> No need for any visible or invisible user flags.
> If users ask for event->pid of other processes later (I don't think they will)
> and we decide that it is safe to disclose them, we will require another flag
> and then the test will become:
>
> +        if (!capable(CAP_SYS_ADMIN) ||
> +            FAN_GROUP_FLAG(group, FAN_REPORT_PID))

This condition came out wrong...

>
> and *if* that ever happens, we will document the FAN_REPORT_PID
> flag and say that it is enabled by default for CAP_SYS_ADMIN instead of
> requiring and documenting FAN_REPORT_SELF_PID now.
>
> The way I see it, the only disadvantage with this negated approach is
> that CAP_SYS_ADMIN listeners cannot turn off event->pid reporting,
> but why would anybody need to do that?
>

Anyway, I pushed an example with FAN_REPORT_PID to branch
fanotify_unpriv. It just defines a flag that indicates the existing behavior,
auto enables for CAP_SYS_ADMIN and denied for !CAP_SYS_ADMIN.

Because it is a purely semantic flag at this point, the patch that defines
the flag can be dropped now and maybe added later.

Option for future use - the reaper process (pid 1) of pid ns will be
allowed to request FAN_REPORT_PID without any other capabilities.

Thanks,
Amir.

P.S. fixes to your review comments on fanotify_merge also pushed.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users
  2021-02-24 12:58             ` Amir Goldstein
  2021-02-24 13:37               ` Amir Goldstein
@ 2021-02-24 17:29               ` Jan Kara
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Kara @ 2021-02-24 17:29 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API, Kees Cook

On Wed 24-02-21 14:58:31, Amir Goldstein wrote:
> On Wed, Feb 24, 2021 at 12:52 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 23-02-21 19:16:40, Amir Goldstein wrote:
> > > On Fri, Feb 19, 2021 at 6:16 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 16, 2021 at 8:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 16, 2021 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
> > > > > >
> > > > > > On Sun 24-01-21 20:42:04, Amir Goldstein wrote:
> > > > > > > Add limited support for unprivileged fanotify event listener.
> > > > > > > An unprivileged event listener does not get an open file descriptor in
> > > > > > > the event nor the process pid of another process.  An unprivileged event
> > > > > > > listener cannot request permission events, cannot set mount/filesystem
> > > > > > > marks and cannot request unlimited queue/marks.
> > > > > > >
> > > > > > > This enables the limited functionality similar to inotify when watching a
> > > > > > > set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
> > > > > > > requiring SYS_CAP_ADMIN privileges.
> > > > > > >
> > > > > > > The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
> > > > > > > event listener watching a set of directories (with FAN_EVENT_ON_CHILD)
> > > > > > > to monitor all changes inside those directories.
> > > > > > >
> > > > > > > This typically requires that the listener keeps a map of watched directory
> > > > > > > fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
> > > > > > > before starting to watch for changes.
> > > > > > >
> > > > > > > When getting an event, the reported fid of the parent should be resolved
> > > > > > > to dirfd and fstatsat(2) with dirfd and name should be used to query the
> > > > > > > state of the filesystem entry.
> > > > > > >
> > > > > > > Note that even though events do not report the event creator pid,
> > > > > > > fanotify does not merge similar events on the same object that were
> > > > > > > generated by different processes. This is aligned with exiting behavior
> > > > > > > when generating processes are outside of the listener pidns (which
> > > > > > > results in reporting 0 pid to listener).
> > > > > > >
> > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > >
> > > > > > The patch looks mostly good to me. Just two questions:
> > > > > >
> > > > > > a) Remind me please, why did we decide pid isn't safe to report to
> > > > > > unpriviledged listeners?
> > > > >
> > > > > Just because the information that process X modified file Y is not an
> > > > > information that user can generally obtain without extra capabilities(?)
> > > > > I can add a flag FAN_REPORT_OWN_PID to make that behavior
> > > > > explicit and then we can relax reporting pids later.
> > > > >
> > > >
> > > > FYI a patch for flag FAN_REPORT_SELF_PID is pushed to branch
> > > > fanotify_unpriv.
> > > >
> > > > The UAPI feels a bit awkward with this flag, but that is the easiest way
> > > > to start without worrying about disclosing pids.
> > > >
> > > > I guess we can require that unprivileged listener has pid 1 in its own
> > > > pid ns. The outcome is similar to FAN_REPORT_SELF_PID, except
> > > > it can also get pids of its children which is probably fine.
> > > >
> > >
> > > Jan,
> > >
> > > WRT your comment in github:
> > > "So maybe we can just require that this flag is already set by userspace
> > > instead of silently setting it? Like:
> > >
> > > if (!(flags & FAN_REPORT_SELF_PID)) return -EPERM;
> > >
> > > I'd say that variant is more futureproof and the difference for user
> > > is minimal."
> > >
> > > I started with this approach and then I wrote the tests and imagined
> > > the man page
> > > requiring this flag would be a bit awkward, so I changed it to auto-enable.
> > >
> > > I am not strongly against the more implicit flag requirement, but in
> > > favor of the
> > > auto-enable approach I would like to argue that with current fanotify you CAN
> > > get zero pid in event, so think about it this way:
> > > If a listener is started in (or moved into) its own pid ns, it will
> > > get zero pid in all
> > > events (other than those generated by itself and its own children).
> > >
> > > With the proposed change, the same applies also if the listener is started
> > > without CAP_SYS_ADMIN.
> > >
> > > As a matter of fact, we do not need the flag at all, we can determine whether
> > > or not to report pid according to capabilities of the event reader at
> > > event read time.
> > > And we can check for one of:
> > > - CAP_SYS_ADMIN
> > > - CAP_SYS_PACCT
> > > - CAP_SYS_PTRACE
> > >
> > > Do you prefer this flag-less approach?
> >
> > Well, I don't have strong opinion what we should do internally either. The
> > flag seems OK to me. The biggest question is whether we should expose the
> > FAN_REPORT_SELF_PID flag to userspace or not. If we would not require
> > explicit flag for unpriv users, I see little reason to expose that flag at
> > all.
> >
> 
> IMO the only listeners that actually care about event->pid are permission
> event listeners. I think that FAN_CLASS_NOTIF listeners do not care
> about it. The only thing that *I* ever used event->pid for is to ignore events
> from self pid.
> 
> My question is, do you mind if we start with this code:
> 
> @@ -419,6 +419,14 @@ static ssize_t copy_event_to_user(struct
> fsnotify_group *group,
>         metadata.reserved = 0;
>         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
>         metadata.pid = pid_vnr(event->pid);
> +
> +        /*
> +         * For an unprivileged listener, event->pid can be used to identify the
> +         * events generated by the listener process itself, without disclosing
> +         * the pids of other processes.
> +         */
> +        if (!capable(CAP_SYS_ADMIN) &&
> +            task_tgid(current) != event->pid)
> +                metadata.pid = 0;
> 
> No need for any visible or invisible user flags.

OK, I think this is fine.

> If users ask for event->pid of other processes later (I don't think they will)
> and we decide that it is safe to disclose them, we will require another flag
> and then the test will become:
> 
> +        if (!capable(CAP_SYS_ADMIN) ||
> +            FAN_GROUP_FLAG(group, FAN_REPORT_PID))
> 
> and *if* that ever happens, we will document the FAN_REPORT_PID
> flag and say that it is enabled by default for CAP_SYS_ADMIN instead of
> requiring and documenting FAN_REPORT_SELF_PID now.
> 
> The way I see it, the only disadvantage with this negated approach is
> that CAP_SYS_ADMIN listeners cannot turn off event->pid reporting,
> but why would anybody need to do that?

Yeah, let's leave complications for later when someone asks for it :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-02-24 17:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-24 18:42 [RFC][PATCH 0/2] unprivileged fanotify listener Amir Goldstein
2021-01-24 18:42 ` [RFC][PATCH 1/2] fanotify: configurable limits via sysfs Amir Goldstein
2021-02-16 16:27   ` Jan Kara
2021-02-16 18:02     ` Amir Goldstein
2021-02-17 10:25       ` Jan Kara
2021-02-18 18:57     ` Amir Goldstein
2021-02-19  9:01       ` Amir Goldstein
2021-01-24 18:42 ` [RFC][PATCH 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
2021-02-16 17:01   ` Jan Kara
2021-02-16 18:12     ` Amir Goldstein
2021-02-19 16:16       ` Amir Goldstein
2021-02-23 17:16         ` Amir Goldstein
2021-02-24 10:52           ` Jan Kara
2021-02-24 12:58             ` Amir Goldstein
2021-02-24 13:37               ` Amir Goldstein
2021-02-24 17:29               ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).