All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] unprivileged fanotify listener
@ 2021-03-04 11:29 Amir Goldstein
  2021-03-04 11:29 ` [PATCH v2 1/2] fanotify: configurable limits via sysfs Amir Goldstein
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-04 11:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, linux-api

Jan,

These patches try to implement a minimal set and least controversial
functionality that we can allow for unprivileged users as a starting
point.

The patches were tested on top of v5.12-rc1 and the fanotify_merge
patches using the unprivileged listener LTP tests written by Matthew
and another LTP tests I wrote to test the sysfs tunable limits [1].

Thanks,
Amir.

Changes since v1:
- Dropped marks per group limit in favor of max per user
- Rename sysfs files from 'listener' to 'group' terminology
- Dropped internal group flag FANOTIFY_UNPRIV
- Limit unprivileged listener to FAN_REPORT_FID family
- Report event->pid depending on reader capabilities

[1] https://github.com/amir73il/ltp/commits/fanotify_unpriv

Amir Goldstein (2):
  fanotify: configurable limits via sysfs
  fanotify: support limited functionality for unprivileged users

 fs/notify/fanotify/fanotify.c      |  16 ++-
 fs/notify/fanotify/fanotify_user.c | 152 ++++++++++++++++++++++++-----
 fs/notify/fdinfo.c                 |   3 +-
 fs/notify/group.c                  |   1 -
 fs/notify/mark.c                   |   4 -
 include/linux/fanotify.h           |  36 ++++++-
 include/linux/fsnotify_backend.h   |   6 +-
 include/linux/sched/user.h         |   3 -
 include/linux/user_namespace.h     |   4 +
 kernel/sysctl.c                    |  12 ++-
 kernel/ucount.c                    |   4 +
 11 files changed, 194 insertions(+), 47 deletions(-)

-- 
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 1/2] fanotify: configurable limits via sysfs
  2021-03-04 11:29 [PATCH v2 0/2] unprivileged fanotify listener Amir Goldstein
@ 2021-03-04 11:29 ` Amir Goldstein
  2021-03-04 11:29 ` [PATCH v2 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
  2021-03-16 15:55 ` [PATCH v2 0/2] unprivileged fanotify listener Jan Kara
  2 siblings, 0 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-04 11:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, linux-api

fanotify has some hardcoded limits. The only APIs to escape those limits
are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.

Allow finer grained tuning of the system limits via sysfs tunables under
/proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify,
with some minor differences.

- max_queued_events - global system tunable for group queue size limit.
  Like the inotify tunable with the same name, it defaults to 16384 and
  applies on initialization of a new group.

- max_user_marks - user ns tunable for marks limit per user.
  Like the inotify tunable named max_user_watches, on a machine with
  sufficient RAM and it defaults to 1048576 in init userns and can be
  further limited per containing user ns.

- max_user_groups - user ns tunable for number of groups per user.
  Like the inotify tunable named max_user_instances, it defaults to 128
  in init userns and can be further limited per containing user ns.

The slightly different tunable names used for fanotify are derived from
the "group" and "mark" terminology used in the fanotify man pages and
throughout the code.

Considering the fact that the default value for max_user_instances was
increased in kernel v5.10 from 8192 to 1048576, leaving the legacy
fanotify limit of 8192 marks per group in addition to the max_user_marks
limit makes little sense, so the per group marks limit has been removed.

Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own
marks are not accounted in the per user marks account, so in effect the
limit of max_user_marks is only for the collection of groups that are
not initialized with FAN_UNLIMITED_MARKS.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      |  16 ++--
 fs/notify/fanotify/fanotify_user.c | 123 ++++++++++++++++++++++++-----
 fs/notify/group.c                  |   1 -
 fs/notify/mark.c                   |   4 -
 include/linux/fanotify.h           |   3 +
 include/linux/fsnotify_backend.h   |   6 +-
 include/linux/sched/user.h         |   3 -
 include/linux/user_namespace.h     |   4 +
 kernel/sysctl.c                    |  12 ++-
 kernel/ucount.c                    |   4 +
 10 files changed, 137 insertions(+), 39 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e9384de29f6c..1dca852c2cb9 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -801,12 +801,10 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 
 static void fanotify_free_group_priv(struct fsnotify_group *group)
 {
-	struct user_struct *user;
-
 	kfree(group->fanotify_data.merge_hash);
-	user = group->fanotify_data.user;
-	atomic_dec(&user->fanotify_listeners);
-	free_uid(user);
+	if (group->fanotify_data.ucounts)
+		dec_ucount(group->fanotify_data.ucounts,
+			   UCOUNT_FANOTIFY_GROUPS);
 }
 
 static void fanotify_free_path_event(struct fanotify_event *event)
@@ -862,6 +860,13 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
 	}
 }
 
+static void fanotify_freeing_mark(struct fsnotify_mark *mark,
+				  struct fsnotify_group *group)
+{
+	if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS))
+		dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS);
+}
+
 static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
 {
 	kmem_cache_free(fanotify_mark_cache, fsn_mark);
@@ -871,5 +876,6 @@ const struct fsnotify_ops fanotify_fsnotify_ops = {
 	.handle_event = fanotify_handle_event,
 	.free_group_priv = fanotify_free_group_priv,
 	.free_event = fanotify_free_event,
+	.freeing_mark = fanotify_freeing_mark,
 	.free_mark = fanotify_free_mark,
 };
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b89f332248bd..e81848e09646 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -27,8 +27,61 @@
 #include "fanotify.h"
 
 #define FANOTIFY_DEFAULT_MAX_EVENTS	16384
-#define FANOTIFY_DEFAULT_MAX_MARKS	8192
-#define FANOTIFY_DEFAULT_MAX_LISTENERS	128
+#define FANOTIFY_OLD_DEFAULT_MAX_MARKS	8192
+#define FANOTIFY_DEFAULT_MAX_GROUPS	128
+
+/*
+ * Legacy fanotify marks limits (8192) is per group and we introduced a tunable
+ * limit of marks per user, similar to inotify.  Effectively, the legacy limit
+ * of fanotify marks per user is <max marks per group> * <max groups per user>.
+ * This default limit (1M) also happens to match the increased limit of inotify
+ * max_user_watches since v5.10.
+ */
+#define FANOTIFY_DEFAULT_MAX_USER_MARKS	\
+	(FANOTIFY_OLD_DEFAULT_MAX_MARKS * FANOTIFY_DEFAULT_MAX_GROUPS)
+
+/*
+ * Most of the memory cost of adding an inode mark is pinning the marked inode.
+ * The size of the filesystem inode struct is not uniform across filesystems,
+ * so double the size of a VFS inode is used as a conservative approximation.
+ */
+#define INODE_MARK_COST	(2 * sizeof(struct inode))
+
+/* configurable via /proc/sys/fs/fanotify/ */
+static int fanotify_max_queued_events __read_mostly;
+
+#ifdef CONFIG_SYSCTL
+
+#include <linux/sysctl.h>
+
+struct ctl_table fanotify_table[] = {
+	{
+		.procname	= "max_user_groups",
+		.data	= &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS],
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "max_user_marks",
+		.data	= &init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS],
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "max_queued_events",
+		.data		= &fanotify_max_queued_events,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO
+	},
+	{ }
+};
+#endif /* CONFIG_SYSCTL */
 
 /*
  * All flags that may be specified in parameter event_f_flags of fanotify_init.
@@ -847,24 +900,38 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 						   unsigned int type,
 						   __kernel_fsid_t *fsid)
 {
+	struct ucounts *ucounts = group->fanotify_data.ucounts;
 	struct fsnotify_mark *mark;
 	int ret;
 
-	if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks)
+	/*
+	 * Enforce per user marks limits per user in all containing user ns.
+	 * A group with FAN_UNLIMITED_MARKS does not contribute to mark count
+	 * in the limited groups account.
+	 */
+	if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS) &&
+	    !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
 		return ERR_PTR(-ENOSPC);
 
 	mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
-	if (!mark)
-		return ERR_PTR(-ENOMEM);
+	if (!mark) {
+		ret = -ENOMEM;
+		goto out_dec_ucounts;
+	}
 
 	fsnotify_init_mark(mark, group);
 	ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid);
 	if (ret) {
 		fsnotify_put_mark(mark);
-		return ERR_PTR(ret);
+		goto out_dec_ucounts;
 	}
 
 	return mark;
+
+out_dec_ucounts:
+	if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS))
+		dec_ucount(ucounts, UCOUNT_FANOTIFY_MARKS);
+	return ERR_PTR(ret);
 }
 
 
@@ -963,7 +1030,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 {
 	struct fsnotify_group *group;
 	int f_flags, fd;
-	struct user_struct *user;
 	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
 	unsigned int class = flags & FANOTIFY_CLASS_BITS;
 
@@ -1002,12 +1068,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID))
 		return -EINVAL;
 
-	user = get_current_user();
-	if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) {
-		free_uid(user);
-		return -EMFILE;
-	}
-
 	f_flags = O_RDWR | FMODE_NONOTIFY;
 	if (flags & FAN_CLOEXEC)
 		f_flags |= O_CLOEXEC;
@@ -1017,13 +1077,19 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
 	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
 	if (IS_ERR(group)) {
-		free_uid(user);
 		return PTR_ERR(group);
 	}
 
-	group->fanotify_data.user = user;
+	/* Enforce groups limits per user in all containing user ns */
+	group->fanotify_data.ucounts = inc_ucount(current_user_ns(),
+						  current_euid(),
+						  UCOUNT_FANOTIFY_GROUPS);
+	if (!group->fanotify_data.ucounts) {
+		fd = -EMFILE;
+		goto out_destroy_group;
+	}
+
 	group->fanotify_data.flags = flags;
-	atomic_inc(&user->fanotify_listeners);
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
 
 	group->fanotify_data.merge_hash = fanotify_alloc_merge_hash();
@@ -1064,16 +1130,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 			goto out_destroy_group;
 		group->max_events = UINT_MAX;
 	} else {
-		group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS;
+		group->max_events = fanotify_max_queued_events;
 	}
 
 	if (flags & FAN_UNLIMITED_MARKS) {
 		fd = -EPERM;
 		if (!capable(CAP_SYS_ADMIN))
 			goto out_destroy_group;
-		group->fanotify_data.max_marks = UINT_MAX;
-	} else {
-		group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS;
 	}
 
 	if (flags & FAN_ENABLE_AUDIT) {
@@ -1357,6 +1420,21 @@ SYSCALL32_DEFINE6(fanotify_mark,
  */
 static int __init fanotify_user_setup(void)
 {
+	struct sysinfo si;
+	int max_marks;
+
+	si_meminfo(&si);
+	/*
+	 * Allow up to 1% of addressable memory to be accounted for per user
+	 * marks limited to the range [8192, 1048576]. mount and sb marks are
+	 * a lot cheaper than inode marks, but there is no reason for a user
+	 * to have many of those, so calculate by the cost of inode marks.
+	 */
+	max_marks = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) /
+		    INODE_MARK_COST;
+	max_marks = clamp(max_marks, FANOTIFY_OLD_DEFAULT_MAX_MARKS,
+				     FANOTIFY_DEFAULT_MAX_USER_MARKS);
+
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
 
@@ -1371,6 +1449,11 @@ static int __init fanotify_user_setup(void)
 			KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
 	}
 
+	fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS;
+	init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
+					FANOTIFY_DEFAULT_MAX_GROUPS;
+	init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks;
+
 	return 0;
 }
 device_initcall(fanotify_user_setup);
diff --git a/fs/notify/group.c b/fs/notify/group.c
index ffd723ffe46d..fb89c351295d 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -122,7 +122,6 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 
 	/* set to 0 when there a no external references to this group */
 	refcount_set(&group->refcnt, 1);
-	atomic_set(&group->num_marks, 0);
 	atomic_set(&group->user_waits, 0);
 
 	spin_lock_init(&group->notification_lock);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 8387937b9d01..d32ab349db74 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -391,8 +391,6 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 	list_del_init(&mark->g_list);
 	spin_unlock(&mark->lock);
 
-	atomic_dec(&group->num_marks);
-
 	/* Drop mark reference acquired in fsnotify_add_mark_locked() */
 	fsnotify_put_mark(mark);
 }
@@ -656,7 +654,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_ATTACHED;
 
 	list_add(&mark->g_list, &group->marks_list);
-	atomic_inc(&group->num_marks);
 	fsnotify_get_mark(mark); /* for g_list */
 	spin_unlock(&mark->lock);
 
@@ -674,7 +671,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 			 FSNOTIFY_MARK_FLAG_ATTACHED);
 	list_del_init(&mark->g_list);
 	spin_unlock(&mark->lock);
-	atomic_dec(&group->num_marks);
 
 	fsnotify_put_mark(mark);
 	return ret;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 3e9c56ee651f..031a97d8369a 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -2,8 +2,11 @@
 #ifndef _LINUX_FANOTIFY_H
 #define _LINUX_FANOTIFY_H
 
+#include <linux/sysctl.h>
 #include <uapi/linux/fanotify.h>
 
+extern struct ctl_table fanotify_table[]; /* for sysctl */
+
 #define FAN_GROUP_FLAG(group, flag) \
 	((group)->fanotify_data.flags & (flag))
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 63fb766f0f3e..1ce66748a2d2 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -206,9 +206,6 @@ struct fsnotify_group {
 
 	/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */
 	struct mutex mark_mutex;	/* protect marks_list */
-	atomic_t num_marks;		/* 1 for each mark and 1 for not being
-					 * past the point of no return when freeing
-					 * a group */
 	atomic_t user_waits;		/* Number of tasks waiting for user
 					 * response */
 	struct list_head marks_list;	/* all inode marks for this group */
@@ -240,8 +237,7 @@ struct fsnotify_group {
 			wait_queue_head_t access_waitq;
 			int flags;           /* flags from fanotify_init() */
 			int f_flags; /* event_f_flags from fanotify_init() */
-			unsigned int max_marks;
-			struct user_struct *user;
+			struct ucounts *ucounts;
 		} fanotify_data;
 #endif /* CONFIG_FANOTIFY */
 	};
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index a8ec3b6093fc..3632c5d6ec55 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -14,9 +14,6 @@ struct user_struct {
 	refcount_t __count;	/* reference count */
 	atomic_t processes;	/* How many processes does this user have? */
 	atomic_t sigpending;	/* How many pending signals does this user have? */
-#ifdef CONFIG_FANOTIFY
-	atomic_t fanotify_listeners;
-#endif
 #ifdef CONFIG_EPOLL
 	atomic_long_t epoll_watches; /* The number of file descriptors currently watched */
 #endif
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..f0d961a15fba 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -49,6 +49,10 @@ enum ucount_type {
 #ifdef CONFIG_INOTIFY_USER
 	UCOUNT_INOTIFY_INSTANCES,
 	UCOUNT_INOTIFY_WATCHES,
+#endif
+#ifdef CONFIG_FANOTIFY
+	UCOUNT_FANOTIFY_GROUPS,
+	UCOUNT_FANOTIFY_MARKS,
 #endif
 	UCOUNT_COUNTS,
 };
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 62fbd09b5dc1..4b6b9de89da8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -148,6 +148,9 @@ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
 #ifdef CONFIG_INOTIFY_USER
 #include <linux/inotify.h>
 #endif
+#ifdef CONFIG_FANOTIFY
+#include <linux/fanotify.h>
+#endif
 
 #ifdef CONFIG_PROC_SYSCTL
 
@@ -3258,7 +3261,14 @@ static struct ctl_table fs_table[] = {
 		.mode		= 0555,
 		.child		= inotify_table,
 	},
-#endif	
+#endif
+#ifdef CONFIG_FANOTIFY
+	{
+		.procname	= "fanotify",
+		.mode		= 0555,
+		.child		= fanotify_table,
+	},
+#endif
 #ifdef CONFIG_EPOLL
 	{
 		.procname	= "epoll",
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 11b1596e2542..8d8874f1c35e 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -73,6 +73,10 @@ static struct ctl_table user_table[] = {
 #ifdef CONFIG_INOTIFY_USER
 	UCOUNT_ENTRY("max_inotify_instances"),
 	UCOUNT_ENTRY("max_inotify_watches"),
+#endif
+#ifdef CONFIG_FANOTIFY
+	UCOUNT_ENTRY("max_fanotify_groups"),
+	UCOUNT_ENTRY("max_fanotify_marks"),
 #endif
 	{ }
 };
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 2/2] fanotify: support limited functionality for unprivileged users
  2021-03-04 11:29 [PATCH v2 0/2] unprivileged fanotify listener Amir Goldstein
  2021-03-04 11:29 ` [PATCH v2 1/2] fanotify: configurable limits via sysfs Amir Goldstein
@ 2021-03-04 11:29 ` Amir Goldstein
  2021-03-16 15:55 ` [PATCH v2 0/2] unprivileged fanotify listener Jan Kara
  2 siblings, 0 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-04 11:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, linux-api

Add limited support for unprivileged fanotify groups.
An unprivileged users is not allowed to get an open file descriptor in
the event nor the process pid of another process.  An unprivileged user
cannot request permission events, cannot set mount/filesystem marks and
cannot request unlimited queue/marks.

This enables the limited functionality similar to inotify when watching a
set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
requiring SYS_CAP_ADMIN privileges.

The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
listener watching a set of directories (with FAN_EVENT_ON_CHILD) to monitor
all changes inside those directories.

This typically requires that the listener keeps a map of watched directory
fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
before starting to watch for changes.

When getting an event, the reported fid of the parent should be resolved
to dirfd and fstatsat(2) with dirfd and name should be used to query the
state of the filesystem entry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 29 ++++++++++++++++++++++++--
 fs/notify/fdinfo.c                 |  3 ++-
 include/linux/fanotify.h           | 33 +++++++++++++++++++++++++-----
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index e81848e09646..65142b1fa823 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -419,6 +419,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	metadata.reserved = 0;
 	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
 	metadata.pid = pid_vnr(event->pid);
+	/*
+	 * For an unprivileged listener, event->pid can be used to identify the
+	 * events generated by the listener process itself, without disclosing
+	 * the pids of other processes.
+	 */
+	if (!capable(CAP_SYS_ADMIN) &&
+	    task_tgid(current) != event->pid)
+		metadata.pid = 0;
 
 	if (path && path->mnt && path->dentry) {
 		fd = create_fd(group, path, &f);
@@ -1036,8 +1044,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	pr_debug("%s: flags=%x event_f_flags=%x\n",
 		 __func__, flags, event_f_flags);
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	if (!capable(CAP_SYS_ADMIN)) {
+		/*
+		 * An unprivileged user can setup an fanotify group with
+		 * limited functionality - an unprivileged group is limited to
+		 * notification events with file handles and it cannot use
+		 * unlimited queue/marks.
+		 */
+		if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode)
+			return -EPERM;
+	}
 
 #ifdef CONFIG_AUDITSYSCALL
 	if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT))
@@ -1288,6 +1304,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		goto fput_and_out;
 	group = f.file->private_data;
 
+	/*
+	 * An unprivileged user is not allowed to watch a mount point nor
+	 * a filesystem.
+	 */
+	ret = -EPERM;
+	if (!capable(CAP_SYS_ADMIN) &&
+	    mark_type != FAN_MARK_INODE)
+		goto fput_and_out;
+
 	/*
 	 * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF.  These are not
 	 * allowed to set permissions events.
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index f0d6b54be412..a712b2aaa9ac 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -144,7 +144,8 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f)
 	struct fsnotify_group *group = f->private_data;
 
 	seq_printf(m, "fanotify flags:%x event-flags:%x\n",
-		   group->fanotify_data.flags, group->fanotify_data.f_flags);
+		   group->fanotify_data.flags,
+		   group->fanotify_data.f_flags);
 
 	show_fdinfo(m, f, fanotify_fdinfo);
 }
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 031a97d8369a..bad41bcb25df 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -18,15 +18,38 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
  * these constant, the programs may break if re-compiled with new uapi headers
  * and then run on an old kernel.
  */
-#define FANOTIFY_CLASS_BITS	(FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \
+
+/* Group classes where permission events are allowed */
+#define FANOTIFY_PERM_CLASSES	(FAN_CLASS_CONTENT | \
 				 FAN_CLASS_PRE_CONTENT)
 
+#define FANOTIFY_CLASS_BITS	(FAN_CLASS_NOTIF | FANOTIFY_PERM_CLASSES)
+
 #define FANOTIFY_FID_BITS	(FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
 
-#define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
-				 FAN_REPORT_TID | \
-				 FAN_CLOEXEC | FAN_NONBLOCK | \
-				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
+/*
+ * fanotify_init() flags that require CAP_SYS_ADMIN.
+ * We do not allow unprivileged groups to request permission events.
+ * We do not allow unprivileged groups to get other process pid in events.
+ * We do not allow unprivileged groups to use unlimited resources.
+ */
+#define FANOTIFY_ADMIN_INIT_FLAGS	(FANOTIFY_PERM_CLASSES | \
+					 FAN_REPORT_TID | \
+					 FAN_UNLIMITED_QUEUE | \
+					 FAN_UNLIMITED_MARKS)
+
+/*
+ * fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN.
+ * FAN_CLASS_NOTIF is the only class we allow for unprivileged group.
+ * We do not allow unprivileged groups to get file descriptors in events,
+ * so one of the flags for reporting file handles is required.
+ */
+#define FANOTIFY_USER_INIT_FLAGS	(FAN_CLASS_NOTIF | \
+					 FANOTIFY_FID_BITS | \
+					 FAN_CLOEXEC | FAN_NONBLOCK)
+
+#define FANOTIFY_INIT_FLAGS	(FANOTIFY_ADMIN_INIT_FLAGS | \
+				 FANOTIFY_USER_INIT_FLAGS)
 
 #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
 				 FAN_MARK_FILESYSTEM)
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-04 11:29 [PATCH v2 0/2] unprivileged fanotify listener Amir Goldstein
  2021-03-04 11:29 ` [PATCH v2 1/2] fanotify: configurable limits via sysfs Amir Goldstein
  2021-03-04 11:29 ` [PATCH v2 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
@ 2021-03-16 15:55 ` Jan Kara
  2021-03-17 11:01   ` Amir Goldstein
  2 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2021-03-16 15:55 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api

On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> Jan,
> 
> These patches try to implement a minimal set and least controversial
> functionality that we can allow for unprivileged users as a starting
> point.
> 
> The patches were tested on top of v5.12-rc1 and the fanotify_merge
> patches using the unprivileged listener LTP tests written by Matthew
> and another LTP tests I wrote to test the sysfs tunable limits [1].

Thanks. I've added both patches to my tree.

								Honza

> 
> Thanks,
> Amir.
> 
> Changes since v1:
> - Dropped marks per group limit in favor of max per user
> - Rename sysfs files from 'listener' to 'group' terminology
> - Dropped internal group flag FANOTIFY_UNPRIV
> - Limit unprivileged listener to FAN_REPORT_FID family
> - Report event->pid depending on reader capabilities
> 
> [1] https://github.com/amir73il/ltp/commits/fanotify_unpriv
> 
> Amir Goldstein (2):
>   fanotify: configurable limits via sysfs
>   fanotify: support limited functionality for unprivileged users
> 
>  fs/notify/fanotify/fanotify.c      |  16 ++-
>  fs/notify/fanotify/fanotify_user.c | 152 ++++++++++++++++++++++++-----
>  fs/notify/fdinfo.c                 |   3 +-
>  fs/notify/group.c                  |   1 -
>  fs/notify/mark.c                   |   4 -
>  include/linux/fanotify.h           |  36 ++++++-
>  include/linux/fsnotify_backend.h   |   6 +-
>  include/linux/sched/user.h         |   3 -
>  include/linux/user_namespace.h     |   4 +
>  kernel/sysctl.c                    |  12 ++-
>  kernel/ucount.c                    |   4 +
>  11 files changed, 194 insertions(+), 47 deletions(-)
> 
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-16 15:55 ` [PATCH v2 0/2] unprivileged fanotify listener Jan Kara
@ 2021-03-17 11:01   ` Amir Goldstein
  2021-03-17 11:42     ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-17 11:01 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > Jan,
> >
> > These patches try to implement a minimal set and least controversial
> > functionality that we can allow for unprivileged users as a starting
> > point.
> >
> > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > patches using the unprivileged listener LTP tests written by Matthew
> > and another LTP tests I wrote to test the sysfs tunable limits [1].
>
> Thanks. I've added both patches to my tree.

Great!
I'll go post the LTP tests and work on the man page updates.

BTW, I noticed that you pushed the aggregating for_next branch,
but not the fsnotify topic branch.

Is this intentional?

I am asking because I am usually basing my development branches
off of your fsnotify branch, but I can base them on the unpushed branch.

Heads up. I am playing with extra privileges we may be able to
allow an ns_capable user.
For example, watching a FS_USERNS_MOUNT filesystem that the user
itself has mounted inside userns.

Another feature I am investigating is how to utilize the new idmapped
mounts to get a subtree watch functionality. This requires attaching a
userns to the group on fanotify_init().

<hand waving>
If the group's userns are the same or below the idmapped mount userns,
then all the objects accessed via that idmapped mount are accessible
to the group's userns admin. We can use that fact to filter events very
early based on their mnt_userns and the group's userns, which should be
cheaper than any subtree permission checks.
<\hand waving>

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-17 11:01   ` Amir Goldstein
@ 2021-03-17 11:42     ` Jan Kara
  2021-03-17 12:19       ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2021-03-17 11:42 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > Jan,
> > >
> > > These patches try to implement a minimal set and least controversial
> > > functionality that we can allow for unprivileged users as a starting
> > > point.
> > >
> > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > patches using the unprivileged listener LTP tests written by Matthew
> > > and another LTP tests I wrote to test the sysfs tunable limits [1].
> >
> > Thanks. I've added both patches to my tree.
> 
> Great!
> I'll go post the LTP tests and work on the man page updates.
> 
> BTW, I noticed that you pushed the aggregating for_next branch,
> but not the fsnotify topic branch.
> 
> Is this intentional?

Not really, pushed now. Thanks for reminder.

> I am asking because I am usually basing my development branches
> off of your fsnotify branch, but I can base them on the unpushed branch.
> 
> Heads up. I am playing with extra privileges we may be able to
> allow an ns_capable user.
> For example, watching a FS_USERNS_MOUNT filesystem that the user
> itself has mounted inside userns.
> 
> Another feature I am investigating is how to utilize the new idmapped
> mounts to get a subtree watch functionality. This requires attaching a
> userns to the group on fanotify_init().
> 
> <hand waving>
> If the group's userns are the same or below the idmapped mount userns,
> then all the objects accessed via that idmapped mount are accessible
> to the group's userns admin. We can use that fact to filter events very
> early based on their mnt_userns and the group's userns, which should be
> cheaper than any subtree permission checks.
> <\hand waving>

Yeah, I agree this should work. Just it seems to me the userbase for this
functionality will be (at least currently) rather limited. While full
subtree watches would be IMO interesting to much more users.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-17 11:42     ` Jan Kara
@ 2021-03-17 12:19       ` Amir Goldstein
  2021-03-17 17:45         ` Christian Brauner
  2021-03-18 15:44         ` Jan Kara
  0 siblings, 2 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-17 12:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

On Wed, Mar 17, 2021 at 1:42 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> > On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > > Jan,
> > > >
> > > > These patches try to implement a minimal set and least controversial
> > > > functionality that we can allow for unprivileged users as a starting
> > > > point.
> > > >
> > > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > > patches using the unprivileged listener LTP tests written by Matthew
> > > > and another LTP tests I wrote to test the sysfs tunable limits [1].
> > >
> > > Thanks. I've added both patches to my tree.
> >
> > Great!
> > I'll go post the LTP tests and work on the man page updates.
> >
> > BTW, I noticed that you pushed the aggregating for_next branch,
> > but not the fsnotify topic branch.
> >
> > Is this intentional?
>
> Not really, pushed now. Thanks for reminder.
>
> > I am asking because I am usually basing my development branches
> > off of your fsnotify branch, but I can base them on the unpushed branch.
> >
> > Heads up. I am playing with extra privileges we may be able to
> > allow an ns_capable user.
> > For example, watching a FS_USERNS_MOUNT filesystem that the user
> > itself has mounted inside userns.
> >
> > Another feature I am investigating is how to utilize the new idmapped
> > mounts to get a subtree watch functionality. This requires attaching a
> > userns to the group on fanotify_init().
> >
> > <hand waving>
> > If the group's userns are the same or below the idmapped mount userns,
> > then all the objects accessed via that idmapped mount are accessible
> > to the group's userns admin. We can use that fact to filter events very
> > early based on their mnt_userns and the group's userns, which should be
> > cheaper than any subtree permission checks.
> > <\hand waving>
>
> Yeah, I agree this should work. Just it seems to me the userbase for this
> functionality will be (at least currently) rather limited. While full

That may change when systemd home dirs feature starts to use
idmapped mounts.
Being able to watch the user's entire home directory is a big win
already.

> subtree watches would be IMO interesting to much more users.

Agreed.

I was looking into that as well, using the example of nfsd_acceptable()
to implement the subtree permission check.

The problem here is that even if unprivileged users cannot compromise
security, they can still cause significant CPU overhead either queueing
events or filtering events and that is something I haven't been able to
figure out a way to escape from.

BUT, if you allow userns admin to setup subtree watches (a.k.a filtered
filesystem marks) on a userns filesystem/idmapped mount, now users
can watch subtrees in their home directories and other processes will
pay CPU penalty only for file access in the users home directories.

That might be acceptable.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-17 12:19       ` Amir Goldstein
@ 2021-03-17 17:45         ` Christian Brauner
  2021-03-17 19:14           ` Amir Goldstein
  2021-03-18 15:44         ` Jan Kara
  1 sibling, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-17 17:45 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 17, 2021 at 02:19:57PM +0200, Amir Goldstein wrote:
> On Wed, Mar 17, 2021 at 1:42 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> > > On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > > > Jan,
> > > > >
> > > > > These patches try to implement a minimal set and least controversial
> > > > > functionality that we can allow for unprivileged users as a starting
> > > > > point.
> > > > >
> > > > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > > > patches using the unprivileged listener LTP tests written by Matthew
> > > > > and another LTP tests I wrote to test the sysfs tunable limits [1].
> > > >
> > > > Thanks. I've added both patches to my tree.
> > >
> > > Great!
> > > I'll go post the LTP tests and work on the man page updates.
> > >
> > > BTW, I noticed that you pushed the aggregating for_next branch,
> > > but not the fsnotify topic branch.
> > >
> > > Is this intentional?
> >
> > Not really, pushed now. Thanks for reminder.
> >
> > > I am asking because I am usually basing my development branches
> > > off of your fsnotify branch, but I can base them on the unpushed branch.
> > >
> > > Heads up. I am playing with extra privileges we may be able to
> > > allow an ns_capable user.
> > > For example, watching a FS_USERNS_MOUNT filesystem that the user
> > > itself has mounted inside userns.
> > >
> > > Another feature I am investigating is how to utilize the new idmapped
> > > mounts to get a subtree watch functionality. This requires attaching a
> > > userns to the group on fanotify_init().
> > >
> > > <hand waving>
> > > If the group's userns are the same or below the idmapped mount userns,
> > > then all the objects accessed via that idmapped mount are accessible
> > > to the group's userns admin. We can use that fact to filter events very
> > > early based on their mnt_userns and the group's userns, which should be
> > > cheaper than any subtree permission checks.
> > > <\hand waving>
> >
> > Yeah, I agree this should work. Just it seems to me the userbase for this
> > functionality will be (at least currently) rather limited. While full
> 
> That may change when systemd home dirs feature starts to use
> idmapped mounts.
> Being able to watch the user's entire home directory is a big win
> already.

Hey Amir,
Hey Jan,

I think so too.

> 
> > subtree watches would be IMO interesting to much more users.
> 
> Agreed.

We have a use-case for subtree watches: One feature for containers we
have is that users can e.g. tell us that they want the container manager
to hotplug an arbitrary unix or block device into the container whenever
the relevant device shows up on the system. For example they could
instruct the container manager to plugin some new driver device when it
shows up in /dev. That works nicely because of uevents. But users quite
often also instruct us to plugin a path once it shows up in some
directory in the filesystem hierarchy and unplug it once it is removed.
Right now we're mainting an inotify-based hand-rolled recursive watch to
make this work so we detect that add and remove event. I would be wildly
excited if we could get rid of some of that complexity by using subtree
watches. The container manager on the host will be unaffected by this
feature since it will usually have root privileges and manage
unprivileged containers.
The unprivileged (userns use-case specifically here) subtree watches
will be necessary and really good to have to make this work for
container workloads and nested containers, i.e. where the container
manager itselfs runs in a container and starts new containres. Since the
subtree feature would be interesting for systemd itself and since our
container manager (ChromeOS etc.) runs systemd inside unprivileged
containers on a large scale it would be good if subtree watches could
work in userns too.

> 
> I was looking into that as well, using the example of nfsd_acceptable()
> to implement the subtree permission check.
> 
> The problem here is that even if unprivileged users cannot compromise
> security, they can still cause significant CPU overhead either queueing
> events or filtering events and that is something I haven't been able to
> figure out a way to escape from.
> 
> BUT, if you allow userns admin to setup subtree watches (a.k.a filtered
> filesystem marks) on a userns filesystem/idmapped mount, now users

I think that sounds reasonable.
If the mount really is idmapped, it might be interesting to consider
checking for privilege in the mnt_userns in addition to the regular
permission checks that fanotify performs. My (equally handwavy) thinking
is that this might allow for a nice feature where the creator of the
mount (e.g. systemd) can block the creation of subtree watches by
attaching a mnt_userns to the mnt that the user has no privilege in.
(Just a thought.).

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-17 17:45         ` Christian Brauner
@ 2021-03-17 19:14           ` Amir Goldstein
  2021-03-18 14:31             ` Christian Brauner
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-17 19:14 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 17, 2021 at 7:45 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Wed, Mar 17, 2021 at 02:19:57PM +0200, Amir Goldstein wrote:
> > On Wed, Mar 17, 2021 at 1:42 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> > > > On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > > > > Jan,
> > > > > >
> > > > > > These patches try to implement a minimal set and least controversial
> > > > > > functionality that we can allow for unprivileged users as a starting
> > > > > > point.
> > > > > >
> > > > > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > > > > patches using the unprivileged listener LTP tests written by Matthew
> > > > > > and another LTP tests I wrote to test the sysfs tunable limits [1].
> > > > >
> > > > > Thanks. I've added both patches to my tree.
> > > >
> > > > Great!
> > > > I'll go post the LTP tests and work on the man page updates.
> > > >
> > > > BTW, I noticed that you pushed the aggregating for_next branch,
> > > > but not the fsnotify topic branch.
> > > >
> > > > Is this intentional?
> > >
> > > Not really, pushed now. Thanks for reminder.
> > >
> > > > I am asking because I am usually basing my development branches
> > > > off of your fsnotify branch, but I can base them on the unpushed branch.
> > > >
> > > > Heads up. I am playing with extra privileges we may be able to
> > > > allow an ns_capable user.
> > > > For example, watching a FS_USERNS_MOUNT filesystem that the user
> > > > itself has mounted inside userns.
> > > >
> > > > Another feature I am investigating is how to utilize the new idmapped
> > > > mounts to get a subtree watch functionality. This requires attaching a
> > > > userns to the group on fanotify_init().
> > > >
> > > > <hand waving>
> > > > If the group's userns are the same or below the idmapped mount userns,
> > > > then all the objects accessed via that idmapped mount are accessible
> > > > to the group's userns admin. We can use that fact to filter events very
> > > > early based on their mnt_userns and the group's userns, which should be
> > > > cheaper than any subtree permission checks.
> > > > <\hand waving>
> > >
> > > Yeah, I agree this should work. Just it seems to me the userbase for this
> > > functionality will be (at least currently) rather limited. While full
> >
> > That may change when systemd home dirs feature starts to use
> > idmapped mounts.
> > Being able to watch the user's entire home directory is a big win
> > already.
>
> Hey Amir,
> Hey Jan,
>
> I think so too.
>
> >
> > > subtree watches would be IMO interesting to much more users.
> >
> > Agreed.
>
> We have a use-case for subtree watches: One feature for containers we
> have is that users can e.g. tell us that they want the container manager
> to hotplug an arbitrary unix or block device into the container whenever
> the relevant device shows up on the system. For example they could
> instruct the container manager to plugin some new driver device when it
> shows up in /dev. That works nicely because of uevents. But users quite
> often also instruct us to plugin a path once it shows up in some
> directory in the filesystem hierarchy and unplug it once it is removed.
> Right now we're mainting an inotify-based hand-rolled recursive watch to
> make this work so we detect that add and remove event. I would be wildly
> excited if we could get rid of some of that complexity by using subtree
> watches. The container manager on the host will be unaffected by this
> feature since it will usually have root privileges and manage
> unprivileged containers.
> The unprivileged (userns use-case specifically here) subtree watches
> will be necessary and really good to have to make this work for
> container workloads and nested containers, i.e. where the container
> manager itselfs runs in a container and starts new containres. Since the
> subtree feature would be interesting for systemd itself and since our
> container manager (ChromeOS etc.) runs systemd inside unprivileged
> containers on a large scale it would be good if subtree watches could
> work in userns too.
>

I don't understand the subtree watch use case.
You will have to walk me through it.

What exactly is the container manager trying to detect?
That a subdir of a specific name/path was created/deleted?
It doesn't sound like a recursive watch is needed for that.
What am I missing?

As for nested container managers (and systemd), my thinking is
that if all the mounts that manager is watching for serving its containers
are idmapped to that manager's userns (is that a viable option?), then
there shouldn't be a problem to setup userns filtered watches in order to
be notified on all the events that happen via those idmapped mounts
and filtering by "subtree" is not needed.
I am clearly far from understanding the big picture.

> >
> > I was looking into that as well, using the example of nfsd_acceptable()
> > to implement the subtree permission check.
> >
> > The problem here is that even if unprivileged users cannot compromise
> > security, they can still cause significant CPU overhead either queueing
> > events or filtering events and that is something I haven't been able to
> > figure out a way to escape from.
> >
> > BUT, if you allow userns admin to setup subtree watches (a.k.a filtered
> > filesystem marks) on a userns filesystem/idmapped mount, now users
>
> I think that sounds reasonable.
> If the mount really is idmapped, it might be interesting to consider
> checking for privilege in the mnt_userns in addition to the regular
> permission checks that fanotify performs. My (equally handwavy) thinking
> is that this might allow for a nice feature where the creator of the
> mount (e.g. systemd) can block the creation of subtree watches by
> attaching a mnt_userns to the mnt that the user has no privilege in.
> (Just a thought.).
>

Currently, (upstream) only init_userns CAP_SYS_ADMIN can setup
fanotify watches.
In linux-next, unprivileged user can already setup inode watches
(i.e. like inotify).

So I am not sure what you are referring to by "block the creation of
subtree watches".

If systemd were to idmap my home dir to mnt_userns where my user
has CAP_SYS_ADMIN, then allowing my user to setup a watch for
all events on that mount should not be too hard.
If you think that is useful and you want to play with this feature I can
provide a WIP branch soon.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-17 19:14           ` Amir Goldstein
@ 2021-03-18 14:31             ` Christian Brauner
  2021-03-18 16:48               ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-18 14:31 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 17, 2021 at 09:14:06PM +0200, Amir Goldstein wrote:
> On Wed, Mar 17, 2021 at 7:45 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > On Wed, Mar 17, 2021 at 02:19:57PM +0200, Amir Goldstein wrote:
> > > On Wed, Mar 17, 2021 at 1:42 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> > > > > On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> > > > > >
> > > > > > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > > > > > Jan,
> > > > > > >
> > > > > > > These patches try to implement a minimal set and least controversial
> > > > > > > functionality that we can allow for unprivileged users as a starting
> > > > > > > point.
> > > > > > >
> > > > > > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > > > > > patches using the unprivileged listener LTP tests written by Matthew
> > > > > > > and another LTP tests I wrote to test the sysfs tunable limits [1].
> > > > > >
> > > > > > Thanks. I've added both patches to my tree.
> > > > >
> > > > > Great!
> > > > > I'll go post the LTP tests and work on the man page updates.
> > > > >
> > > > > BTW, I noticed that you pushed the aggregating for_next branch,
> > > > > but not the fsnotify topic branch.
> > > > >
> > > > > Is this intentional?
> > > >
> > > > Not really, pushed now. Thanks for reminder.
> > > >
> > > > > I am asking because I am usually basing my development branches
> > > > > off of your fsnotify branch, but I can base them on the unpushed branch.
> > > > >
> > > > > Heads up. I am playing with extra privileges we may be able to
> > > > > allow an ns_capable user.
> > > > > For example, watching a FS_USERNS_MOUNT filesystem that the user
> > > > > itself has mounted inside userns.
> > > > >
> > > > > Another feature I am investigating is how to utilize the new idmapped
> > > > > mounts to get a subtree watch functionality. This requires attaching a
> > > > > userns to the group on fanotify_init().
> > > > >
> > > > > <hand waving>
> > > > > If the group's userns are the same or below the idmapped mount userns,
> > > > > then all the objects accessed via that idmapped mount are accessible
> > > > > to the group's userns admin. We can use that fact to filter events very
> > > > > early based on their mnt_userns and the group's userns, which should be
> > > > > cheaper than any subtree permission checks.
> > > > > <\hand waving>
> > > >
> > > > Yeah, I agree this should work. Just it seems to me the userbase for this
> > > > functionality will be (at least currently) rather limited. While full
> > >
> > > That may change when systemd home dirs feature starts to use
> > > idmapped mounts.
> > > Being able to watch the user's entire home directory is a big win
> > > already.
> >
> > Hey Amir,
> > Hey Jan,
> >
> > I think so too.
> >
> > >
> > > > subtree watches would be IMO interesting to much more users.
> > >
> > > Agreed.
> >
> > We have a use-case for subtree watches: One feature for containers we
> > have is that users can e.g. tell us that they want the container manager
> > to hotplug an arbitrary unix or block device into the container whenever
> > the relevant device shows up on the system. For example they could
> > instruct the container manager to plugin some new driver device when it
> > shows up in /dev. That works nicely because of uevents. But users quite
> > often also instruct us to plugin a path once it shows up in some
> > directory in the filesystem hierarchy and unplug it once it is removed.
> > Right now we're mainting an inotify-based hand-rolled recursive watch to
> > make this work so we detect that add and remove event. I would be wildly
> > excited if we could get rid of some of that complexity by using subtree
> > watches. The container manager on the host will be unaffected by this
> > feature since it will usually have root privileges and manage
> > unprivileged containers.
> > The unprivileged (userns use-case specifically here) subtree watches
> > will be necessary and really good to have to make this work for
> > container workloads and nested containers, i.e. where the container
> > manager itselfs runs in a container and starts new containres. Since the
> > subtree feature would be interesting for systemd itself and since our
> > container manager (ChromeOS etc.) runs systemd inside unprivileged
> > containers on a large scale it would be good if subtree watches could
> > work in userns too.
> >
> 
> I don't understand the subtree watch use case.
> You will have to walk me through it.
> 
> What exactly is the container manager trying to detect?
> That a subdir of a specific name/path was created/deleted?
> It doesn't sound like a recursive watch is needed for that.
> What am I missing?

Sorry if I was unclear. For example, a user may tell the container
manager to hotplug

/home/jdoe/some/path/

into the container. Users are free to tell the container manager that
that path doesn't need exist. At that poing the container manager will
start to mirror the first part of the path that does exist. And as soon
as the full path has been created the container manager will hotplug
that path as a new mount into the container. Similarly it will
remove that mount from the container as soon as the path is deleted from
the host.

So say the user tells the container manager to inject

/home/jdoe/some/path

into the container but only

/home/jdoe

currently exists then the container manager will recursively watch:

/home/jdoe

waiting for the full path to be created.

This is all a bit nasty since we need to ensure that we notice all
events. For example, the user could create

/home/jdoe/some

but then right after that

/home/jdoe/some

could be removed again. With the inotify listener we need to constantly
add (and remove iirc) watch fds and ensure that we never miss an event
and that's brittle. I'd rather have something that allows me to mirror

/home/jdoe

recursively directly. But maybe I'm misunderstanding fanotify and it
can't really help us but I thought that subtree watches might.

One of the reason't I didn't use fanotiy when we implemented this was
that it couldn't be used inside of user namespaces, i.e. CAP_SYS_ADMIN
in the initial userns was required.
We always make very sure that users can properly nest containers and
have almost all the same features available that they have with
non-nested containers. And since fanotify currently requires
CAP_SYS_ADMIN in the init userns it means a container manager running
inside a container wanting to hotplug paths for nested containers can't
use fanotify. 

(Btw, this is part of the code I wrote to implement this logic via
inotify a long time ago
https://github.com/lxc/lxd/blob/f12f03a4ba4645892ef6cc167c24da49d1217b02/lxd/device/device_utils_inotify.go
[I'm sorry you have to see this in case you click on it...])

> 
> As for nested container managers (and systemd), my thinking is
> that if all the mounts that manager is watching for serving its containers
> are idmapped to that manager's userns (is that a viable option?), then

Yes, it is possible. We do now support AT_RECURSIVE with all mount
attributes including idmapping mounts.

> there shouldn't be a problem to setup userns filtered watches in order to
> be notified on all the events that happen via those idmapped mounts
> and filtering by "subtree" is not needed.
> I am clearly far from understanding the big picture.

I think I need to refamiliarize myself with what "subtree" watches do.
Maybe I misunderstood what they do. I'll take a look.

> 
> > >
> > > I was looking into that as well, using the example of nfsd_acceptable()
> > > to implement the subtree permission check.
> > >
> > > The problem here is that even if unprivileged users cannot compromise
> > > security, they can still cause significant CPU overhead either queueing
> > > events or filtering events and that is something I haven't been able to
> > > figure out a way to escape from.
> > >
> > > BUT, if you allow userns admin to setup subtree watches (a.k.a filtered
> > > filesystem marks) on a userns filesystem/idmapped mount, now users
> >
> > I think that sounds reasonable.
> > If the mount really is idmapped, it might be interesting to consider
> > checking for privilege in the mnt_userns in addition to the regular
> > permission checks that fanotify performs. My (equally handwavy) thinking
> > is that this might allow for a nice feature where the creator of the
> > mount (e.g. systemd) can block the creation of subtree watches by
> > attaching a mnt_userns to the mnt that the user has no privilege in.
> > (Just a thought.).
> >
> 
> Currently, (upstream) only init_userns CAP_SYS_ADMIN can setup
> fanotify watches.
> In linux-next, unprivileged user can already setup inode watches
> (i.e. like inotify).

Just to clarify: you mean "unprivileged" as in non-root users in
init_user_ns and therefore also users in non-init userns. That's what
inotify allows you. This would probably allows us to use fanotify
instead of the hand-rolled recursive notify watching we currently do and
that I linked to above.

> 
> So I am not sure what you are referring to by "block the creation of
> subtree watches".
> 
> If systemd were to idmap my home dir to mnt_userns where my user
> has CAP_SYS_ADMIN, then allowing my user to setup a watch for
> all events on that mount should not be too hard.

Right, that was essentially what my comment was about.

> If you think that is useful and you want to play with this feature I can
> provide a WIP branch soon.

I would like to first play with the support for unprivileged fanotify
but sure, it does sound useful!

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-17 12:19       ` Amir Goldstein
  2021-03-17 17:45         ` Christian Brauner
@ 2021-03-18 15:44         ` Jan Kara
  2021-03-18 17:07           ` Amir Goldstein
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Kara @ 2021-03-18 15:44 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

On Wed 17-03-21 14:19:57, Amir Goldstein wrote:
> On Wed, Mar 17, 2021 at 1:42 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> > > On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > > > Jan,
> > > > >
> > > > > These patches try to implement a minimal set and least controversial
> > > > > functionality that we can allow for unprivileged users as a starting
> > > > > point.
> > > > >
> > > > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > > > patches using the unprivileged listener LTP tests written by Matthew
> > > > > and another LTP tests I wrote to test the sysfs tunable limits [1].
> > > >
> > > > Thanks. I've added both patches to my tree.
> > >
> > > Great!
> > > I'll go post the LTP tests and work on the man page updates.
> > >
> > > BTW, I noticed that you pushed the aggregating for_next branch,
> > > but not the fsnotify topic branch.
> > >
> > > Is this intentional?
> >
> > Not really, pushed now. Thanks for reminder.
> >
> > > I am asking because I am usually basing my development branches
> > > off of your fsnotify branch, but I can base them on the unpushed branch.
> > >
> > > Heads up. I am playing with extra privileges we may be able to
> > > allow an ns_capable user.
> > > For example, watching a FS_USERNS_MOUNT filesystem that the user
> > > itself has mounted inside userns.
> > >
> > > Another feature I am investigating is how to utilize the new idmapped
> > > mounts to get a subtree watch functionality. This requires attaching a
> > > userns to the group on fanotify_init().
> > >
> > > <hand waving>
> > > If the group's userns are the same or below the idmapped mount userns,
> > > then all the objects accessed via that idmapped mount are accessible
> > > to the group's userns admin. We can use that fact to filter events very
> > > early based on their mnt_userns and the group's userns, which should be
> > > cheaper than any subtree permission checks.
> > > <\hand waving>
> >
> > Yeah, I agree this should work. Just it seems to me the userbase for this
> > functionality will be (at least currently) rather limited. While full
> 
> That may change when systemd home dirs feature starts to use idmapped
> mounts. Being able to watch the user's entire home directory is a big
> win already.

Do you mean that home directory would be an extra mount with userns in
which the user has CAP_SYS_ADMIN so he'd be able to watch subtrees on that
mount?

> > subtree watches would be IMO interesting to much more users.
> 
> Agreed.
> 
> I was looking into that as well, using the example of nfsd_acceptable()
> to implement the subtree permission check.
> 
> The problem here is that even if unprivileged users cannot compromise
> security, they can still cause significant CPU overhead either queueing
> events or filtering events and that is something I haven't been able to
> figure out a way to escape from.

WRT queueing overhead, given a user can place ~1M of directory watches, he
can cause noticable total overhead for queueing events anyway. Furthermore
the queue size is limited so unless the user spends time consuming events
as well, the load won't last long. But I agree we need to be careful not to
introduce too big latencies to operations generating events. So I think if
we could quickly detect whether a generated event has a good chance of
being relevant for some subtree watch of a group and queue it in that case
and worry about permission checks only once events are received and thus
receiver pays the cost of expensive checks, that might be fine as well.

								Honza
 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-18 14:31             ` Christian Brauner
@ 2021-03-18 16:48               ` Amir Goldstein
  2021-03-19 13:40                 ` Christian Brauner
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-18 16:48 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

[...]

I understand the use case.

> I'd rather have something that allows me to mirror
>
> /home/jdoe
>
> recursively directly. But maybe I'm misunderstanding fanotify and it
> can't really help us but I thought that subtree watches might.
>

There are no subtree watches. They are still a holy grale for fanotify...
There are filesystem and mnt watches and the latter support far fewer
events (only events for operations that carry the path argument).

With filesystem watches, you can get events for all mkdirs and you can
figure out the created path, but you'd have to do all the filtering in
userspace.

What I am trying to create is "filtered" filesystem watches and the filter needs
to be efficient enough so the watcher will not incur too big of a penalty
on all the operations in the filesystem.

Thanks to your mnt_userns changes, implementing a filter to intercept
(say) mkdir calles on a specific mnt_userns should be quite simple, but
filtering by "path" (i.e. /home/jdoe/some/path) will still need to happen in
userspace.

This narrows the problem to the nested container manager that will only
need to filter events which happened via mounts under its control.

[...]

> > there shouldn't be a problem to setup userns filtered watches in order to
> > be notified on all the events that happen via those idmapped mounts
> > and filtering by "subtree" is not needed.
> > I am clearly far from understanding the big picture.
>
> I think I need to refamiliarize myself with what "subtree" watches do.
> Maybe I misunderstood what they do. I'll take a look.
>

You will not find them :-)

[...]

> > Currently, (upstream) only init_userns CAP_SYS_ADMIN can setup
> > fanotify watches.
> > In linux-next, unprivileged user can already setup inode watches
> > (i.e. like inotify).
>
> Just to clarify: you mean "unprivileged" as in non-root users in
> init_user_ns and therefore also users in non-init userns. That's what

Correct.

> inotify allows you. This would probably allows us to use fanotify
> instead of the hand-rolled recursive notify watching we currently do and
> that I linked to above.
>

The code that sits in linux-next can give you pretty much a drop-in
replacement of inotify and nothing more. See example code:
https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid

> > If you think that is useful and you want to play with this feature I can
> > provide a WIP branch soon.
>
> I would like to first play with the support for unprivileged fanotify
> but sure, it does sound useful!

Just so you have an idea what I am talking about, this is a very early
POC branch:
https://github.com/amir73il/linux/commits/fanotify_userns

It will not be very useful to you yet I think.
Userns admin can watch all events on a tmpfs/fuse mounted
inside its userns.
Userns admin can watch open/read/write/close events on an
idmapped mount mapped to its userns.
But I think the more useful feature would be to watch all events on
an idmapped mount mapped to its userns.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-18 15:44         ` Jan Kara
@ 2021-03-18 17:07           ` Amir Goldstein
  2021-03-18 18:40             ` Christian Brauner
  2021-03-22 18:38             ` Amir Goldstein
  0 siblings, 2 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-18 17:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

> > That may change when systemd home dirs feature starts to use idmapped
> > mounts. Being able to watch the user's entire home directory is a big
> > win already.
>
> Do you mean that home directory would be an extra mount with userns in
> which the user has CAP_SYS_ADMIN so he'd be able to watch subtrees on that
> mount?
>

That is what I meant.
My understanding of the systemd-homed use case for idmapped mounts is
that the user has CAP_SYS_ADMIN is the mapped userns, but I may be wrong.

> > > subtree watches would be IMO interesting to much more users.
> >
> > Agreed.
> >
> > I was looking into that as well, using the example of nfsd_acceptable()
> > to implement the subtree permission check.
> >
> > The problem here is that even if unprivileged users cannot compromise
> > security, they can still cause significant CPU overhead either queueing
> > events or filtering events and that is something I haven't been able to
> > figure out a way to escape from.
>
> WRT queueing overhead, given a user can place ~1M of directory watches, he
> can cause noticable total overhead for queueing events anyway. Furthermore

I suppose so. But a user placing 1M dir watches at least adds this overhead
knowingly. Adding a overhead on the entire filesystem when just wanting to
watch a small subtree doesn't sound ideal. Especially in very nested setups.
So yes, we need to be careful.

> the queue size is limited so unless the user spends time consuming events
> as well, the load won't last long. But I agree we need to be careful not to
> introduce too big latencies to operations generating events. So I think if
> we could quickly detect whether a generated event has a good chance of
> being relevant for some subtree watch of a group and queue it in that case
> and worry about permission checks only once events are received and thus
> receiver pays the cost of expensive checks, that might be fine as well.
>

So far the only idea I had for "quickly detect" which I cannot find flaws in
is to filter by mnt_userms, but its power is limited.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-18 17:07           ` Amir Goldstein
@ 2021-03-18 18:40             ` Christian Brauner
  2021-03-22 18:38             ` Amir Goldstein
  1 sibling, 0 replies; 33+ messages in thread
From: Christian Brauner @ 2021-03-18 18:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Thu, Mar 18, 2021 at 07:07:00PM +0200, Amir Goldstein wrote:
> > > That may change when systemd home dirs feature starts to use idmapped
> > > mounts. Being able to watch the user's entire home directory is a big
> > > win already.
> >
> > Do you mean that home directory would be an extra mount with userns in
> > which the user has CAP_SYS_ADMIN so he'd be able to watch subtrees on that
> > mount?
> >
> 
> That is what I meant.
> My understanding of the systemd-homed use case for idmapped mounts is
> that the user has CAP_SYS_ADMIN is the mapped userns, but I may be wrong.

systemd can simply create a new userns with the uid/gid of the target
user effectively delegating it (That's independent of actually writing a
uid gid mapping for the userns which will be done with privileges.) and
then attach it to that mount for the user.
Mine and Lennart's idea there so far has been that the creation would
likely be done by the user's session at login time

brauner     1346  0.0  0.0  20956  8512 ?        Ss   Mar03   0:03 /lib/systemd/systemd --user

and systemd as root would then take care of writing the mapping to the
userns and then attaching it to the mount. (I'll see Lennart in the next
few days and see what works best and once we're ready start a discussion
somwhere on a public list, I would suggest.)

(If systemd doesn't want a user to be able to monitor a mnt it can
simply create a userns with a different uid/gid but with the relevant
mapping. This was what my earlier point was about "blocking a user from
creating a subtree watch".)

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-18 16:48               ` Amir Goldstein
@ 2021-03-19 13:40                 ` Christian Brauner
  2021-03-19 14:21                   ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-19 13:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Thu, Mar 18, 2021 at 06:48:11PM +0200, Amir Goldstein wrote:
> [...]
> 
> I understand the use case.
> 
> > I'd rather have something that allows me to mirror
> >
> > /home/jdoe
> >
> > recursively directly. But maybe I'm misunderstanding fanotify and it
> > can't really help us but I thought that subtree watches might.
> >
> 
> There are no subtree watches. They are still a holy grale for fanotify...
> There are filesystem and mnt watches and the latter support far fewer
> events (only events for operations that carry the path argument).
> 
> With filesystem watches, you can get events for all mkdirs and you can
> figure out the created path, but you'd have to do all the filtering in
> userspace.
> 
> What I am trying to create is "filtered" filesystem watches and the filter needs
> to be efficient enough so the watcher will not incur too big of a penalty
> on all the operations in the filesystem.
> 
> Thanks to your mnt_userns changes, implementing a filter to intercept
> (say) mkdir calles on a specific mnt_userns should be quite simple, but
> filtering by "path" (i.e. /home/jdoe/some/path) will still need to happen in
> userspace.
> 
> This narrows the problem to the nested container manager that will only
> need to filter events which happened via mounts under its control.
> 
> [...]
> 
> > > there shouldn't be a problem to setup userns filtered watches in order to
> > > be notified on all the events that happen via those idmapped mounts
> > > and filtering by "subtree" is not needed.
> > > I am clearly far from understanding the big picture.
> >
> > I think I need to refamiliarize myself with what "subtree" watches do.
> > Maybe I misunderstood what they do. I'll take a look.
> >
> 
> You will not find them :-)

Heh. :)

> 
> [...]
> 
> > > Currently, (upstream) only init_userns CAP_SYS_ADMIN can setup
> > > fanotify watches.
> > > In linux-next, unprivileged user can already setup inode watches
> > > (i.e. like inotify).
> >
> > Just to clarify: you mean "unprivileged" as in non-root users in
> > init_user_ns and therefore also users in non-init userns. That's what
> 
> Correct.
> 
> > inotify allows you. This would probably allows us to use fanotify
> > instead of the hand-rolled recursive notify watching we currently do and
> > that I linked to above.
> >
> 
> The code that sits in linux-next can give you pretty much a drop-in
> replacement of inotify and nothing more. See example code:
> https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid

This is really great. Thank you for doing that work this will help quite
a lot of use-cases and make things way simpler. I created a TODO to port
our path-hotplug to this once this feature lands.

> 
> > > If you think that is useful and you want to play with this feature I can
> > > provide a WIP branch soon.
> >
> > I would like to first play with the support for unprivileged fanotify
> > but sure, it does sound useful!
> 
> Just so you have an idea what I am talking about, this is a very early
> POC branch:
> https://github.com/amir73il/linux/commits/fanotify_userns

Thanks!  I'll try to pull this and take a look next week. I hope that's
ok.

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-19 13:40                 ` Christian Brauner
@ 2021-03-19 14:21                   ` Amir Goldstein
  2021-03-20 12:57                     ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-19 14:21 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Fri, Mar 19, 2021 at 3:40 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Thu, Mar 18, 2021 at 06:48:11PM +0200, Amir Goldstein wrote:
> > [...]
> >
> > I understand the use case.
> >
> > > I'd rather have something that allows me to mirror
> > >
> > > /home/jdoe
> > >
> > > recursively directly. But maybe I'm misunderstanding fanotify and it
> > > can't really help us but I thought that subtree watches might.
> > >
> >
> > There are no subtree watches. They are still a holy grale for fanotify...
> > There are filesystem and mnt watches and the latter support far fewer
> > events (only events for operations that carry the path argument).
> >
> > With filesystem watches, you can get events for all mkdirs and you can
> > figure out the created path, but you'd have to do all the filtering in
> > userspace.
> >
> > What I am trying to create is "filtered" filesystem watches and the filter needs
> > to be efficient enough so the watcher will not incur too big of a penalty
> > on all the operations in the filesystem.
> >
> > Thanks to your mnt_userns changes, implementing a filter to intercept
> > (say) mkdir calles on a specific mnt_userns should be quite simple, but
> > filtering by "path" (i.e. /home/jdoe/some/path) will still need to happen in
> > userspace.
> >
> > This narrows the problem to the nested container manager that will only
> > need to filter events which happened via mounts under its control.
> >
> > [...]
> >
> > > > there shouldn't be a problem to setup userns filtered watches in order to
> > > > be notified on all the events that happen via those idmapped mounts
> > > > and filtering by "subtree" is not needed.
> > > > I am clearly far from understanding the big picture.
> > >
> > > I think I need to refamiliarize myself with what "subtree" watches do.
> > > Maybe I misunderstood what they do. I'll take a look.
> > >
> >
> > You will not find them :-)
>
> Heh. :)
>
> >
> > [...]
> >
> > > > Currently, (upstream) only init_userns CAP_SYS_ADMIN can setup
> > > > fanotify watches.
> > > > In linux-next, unprivileged user can already setup inode watches
> > > > (i.e. like inotify).
> > >
> > > Just to clarify: you mean "unprivileged" as in non-root users in
> > > init_user_ns and therefore also users in non-init userns. That's what
> >
> > Correct.
> >
> > > inotify allows you. This would probably allows us to use fanotify
> > > instead of the hand-rolled recursive notify watching we currently do and
> > > that I linked to above.
> > >
> >
> > The code that sits in linux-next can give you pretty much a drop-in
> > replacement of inotify and nothing more. See example code:
> > https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid
>
> This is really great. Thank you for doing that work this will help quite
> a lot of use-cases and make things way simpler. I created a TODO to port
> our path-hotplug to this once this feature lands.
>

FWIW, I just tried to build this branch on Ubuntu 20.04.2 with LTS kernel
and there were some build issues, so rebased my branch on upstream
inotify-tools to fix those build issues.

I was not aware that the inotify-tools project is alive, I never intended
to upstream this demo code and never created a github pull request
but rebasing on upstream brought in some CI scripts, when I pushed the
branch to my github it triggered some tests that reported build failures on
Ubuntu 16.04 and 18.04.

Anyway, there is a pre-rebase branch 'fanotify_name' and the post rebase
branch 'fanotify_name_fid'. You can try whichever works for you.

You can look at the test script src/test_demo.sh for usage example.
Or just cd into a writable directory and run the script to see the demo.
The demo determines whether to use a recursive watch or "global"
watch by the uid of the user.

> >
> > > > If you think that is useful and you want to play with this feature I can
> > > > provide a WIP branch soon.
> > >
> > > I would like to first play with the support for unprivileged fanotify
> > > but sure, it does sound useful!
> >
> > Just so you have an idea what I am talking about, this is a very early
> > POC branch:
> > https://github.com/amir73il/linux/commits/fanotify_userns
>
> Thanks!  I'll try to pull this and take a look next week. I hope that's
> ok.
>

Fine. I'm curious to know what it does.
Did not get to test it with userns yet :)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-19 14:21                   ` Amir Goldstein
@ 2021-03-20 12:57                     ` Amir Goldstein
  2021-03-22 12:44                       ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-20 12:57 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

> > > The code that sits in linux-next can give you pretty much a drop-in
> > > replacement of inotify and nothing more. See example code:
> > > https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid
> >
> > This is really great. Thank you for doing that work this will help quite
> > a lot of use-cases and make things way simpler. I created a TODO to port
> > our path-hotplug to this once this feature lands.
> >
>
> FWIW, I just tried to build this branch on Ubuntu 20.04.2 with LTS kernel
> and there were some build issues, so rebased my branch on upstream
> inotify-tools to fix those build issues.
>
> I was not aware that the inotify-tools project is alive, I never intended
> to upstream this demo code and never created a github pull request
> but rebasing on upstream brought in some CI scripts, when I pushed the
> branch to my github it triggered some tests that reported build failures on
> Ubuntu 16.04 and 18.04.
>
> Anyway, there is a pre-rebase branch 'fanotify_name' and the post rebase
> branch 'fanotify_name_fid'. You can try whichever works for you.
>
> You can look at the test script src/test_demo.sh for usage example.
> Or just cd into a writable directory and run the script to see the demo.
> The demo determines whether to use a recursive watch or "global"
> watch by the uid of the user.
>
> > >
> > > > > If you think that is useful and you want to play with this feature I can
> > > > > provide a WIP branch soon.
> > > >
> > > > I would like to first play with the support for unprivileged fanotify
> > > > but sure, it does sound useful!
> > >
> > > Just so you have an idea what I am talking about, this is a very early
> > > POC branch:
> > > https://github.com/amir73il/linux/commits/fanotify_userns
> >
> > Thanks!  I'll try to pull this and take a look next week. I hope that's
> > ok.
> >
>
> Fine. I'm curious to know what it does.
> Did not get to test it with userns yet :)

Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
inside userns and works fine, with two wrinkles I needed to iron:

1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
    zero f_fsid (easy to fix)
2. open_by_handle_at() is not userns aware (can relax for
    FS_USERNS_MOUNT fs)

Pushed these two fixes to branch fanotify_userns.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-20 12:57                     ` Amir Goldstein
@ 2021-03-22 12:44                       ` Amir Goldstein
  2021-03-22 16:28                         ` Christian Brauner
  2021-03-24 13:57                         ` Amir Goldstein
  0 siblings, 2 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-22 12:44 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Sat, Mar 20, 2021 at 2:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > > The code that sits in linux-next can give you pretty much a drop-in
> > > > replacement of inotify and nothing more. See example code:
> > > > https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid
> > >
> > > This is really great. Thank you for doing that work this will help quite
> > > a lot of use-cases and make things way simpler. I created a TODO to port
> > > our path-hotplug to this once this feature lands.
> > >
> >
> > FWIW, I just tried to build this branch on Ubuntu 20.04.2 with LTS kernel
> > and there were some build issues, so rebased my branch on upstream
> > inotify-tools to fix those build issues.
> >
> > I was not aware that the inotify-tools project is alive, I never intended
> > to upstream this demo code and never created a github pull request
> > but rebasing on upstream brought in some CI scripts, when I pushed the
> > branch to my github it triggered some tests that reported build failures on
> > Ubuntu 16.04 and 18.04.
> >
> > Anyway, there is a pre-rebase branch 'fanotify_name' and the post rebase
> > branch 'fanotify_name_fid'. You can try whichever works for you.

FYI, fixed the CI build errors on fanotify_name_fid branch.

> >
> > You can look at the test script src/test_demo.sh for usage example.
> > Or just cd into a writable directory and run the script to see the demo.
> > The demo determines whether to use a recursive watch or "global"
> > watch by the uid of the user.
> >
> > > >
> > > > > > If you think that is useful and you want to play with this feature I can
> > > > > > provide a WIP branch soon.
> > > > >
> > > > > I would like to first play with the support for unprivileged fanotify
> > > > > but sure, it does sound useful!
> > > >
> > > > Just so you have an idea what I am talking about, this is a very early
> > > > POC branch:
> > > > https://github.com/amir73il/linux/commits/fanotify_userns
> > >
> > > Thanks!  I'll try to pull this and take a look next week. I hope that's
> > > ok.
> > >
> >
> > Fine. I'm curious to know what it does.
> > Did not get to test it with userns yet :)
>
> Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> inside userns and works fine, with two wrinkles I needed to iron:
>
> 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
>     zero f_fsid (easy to fix)
> 2. open_by_handle_at() is not userns aware (can relax for
>     FS_USERNS_MOUNT fs)
>
> Pushed these two fixes to branch fanotify_userns.

Pushed another fix to mnt refcount bug in WIP and another commit to
add the last piece that could make fanotify usable for systemd-homed
setup - a filesystem watch filtered by mnt_userns (not tested yet).

One thing I am struggling with is the language to describe user ns
and idmapped mounts related logic. I have a feeling that I am getting
the vocabulary all wrong. See my commit message text below.
Editorial corrections would be appreciated.

Thanks,
Amir.

---

    fanotify: support sb mark filtered by idmapped mount

    For a group created inside userns, adding a mount mark is allowed
    if the mount is idmapped inside the group's userns and adding a
    filesystem mark is allowed if the filesystem was mounted inside the
    group's userns.

    Allow also adding a filesystem mark on a filesystem that is mounted
    in the group's userns via an idmapped mount.

    In that case, the mark is created with an internal flag, which indicates
    that events should be sent to the group only if they happened via an
    idmapped mount inside the group's userns.

    Signed-off-by: Amir Goldstein <amir73il@gmail.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-22 12:44                       ` Amir Goldstein
@ 2021-03-22 16:28                         ` Christian Brauner
  2021-03-22 17:22                           ` Amir Goldstein
  2021-03-24 13:57                         ` Amir Goldstein
  1 sibling, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-22 16:28 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Mon, Mar 22, 2021 at 02:44:20PM +0200, Amir Goldstein wrote:
> On Sat, Mar 20, 2021 at 2:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > > > The code that sits in linux-next can give you pretty much a drop-in
> > > > > replacement of inotify and nothing more. See example code:
> > > > > https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid
> > > >
> > > > This is really great. Thank you for doing that work this will help quite
> > > > a lot of use-cases and make things way simpler. I created a TODO to port
> > > > our path-hotplug to this once this feature lands.
> > > >
> > >
> > > FWIW, I just tried to build this branch on Ubuntu 20.04.2 with LTS kernel
> > > and there were some build issues, so rebased my branch on upstream
> > > inotify-tools to fix those build issues.
> > >
> > > I was not aware that the inotify-tools project is alive, I never intended
> > > to upstream this demo code and never created a github pull request
> > > but rebasing on upstream brought in some CI scripts, when I pushed the
> > > branch to my github it triggered some tests that reported build failures on
> > > Ubuntu 16.04 and 18.04.
> > >
> > > Anyway, there is a pre-rebase branch 'fanotify_name' and the post rebase
> > > branch 'fanotify_name_fid'. You can try whichever works for you.
> 
> FYI, fixed the CI build errors on fanotify_name_fid branch.
> 
> > >
> > > You can look at the test script src/test_demo.sh for usage example.
> > > Or just cd into a writable directory and run the script to see the demo.
> > > The demo determines whether to use a recursive watch or "global"
> > > watch by the uid of the user.
> > >
> > > > >
> > > > > > > If you think that is useful and you want to play with this feature I can
> > > > > > > provide a WIP branch soon.
> > > > > >
> > > > > > I would like to first play with the support for unprivileged fanotify
> > > > > > but sure, it does sound useful!
> > > > >
> > > > > Just so you have an idea what I am talking about, this is a very early
> > > > > POC branch:
> > > > > https://github.com/amir73il/linux/commits/fanotify_userns
> > > >
> > > > Thanks!  I'll try to pull this and take a look next week. I hope that's
> > > > ok.
> > > >
> > >
> > > Fine. I'm curious to know what it does.
> > > Did not get to test it with userns yet :)
> >
> > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > inside userns and works fine, with two wrinkles I needed to iron:
> >
> > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> >     zero f_fsid (easy to fix)
> > 2. open_by_handle_at() is not userns aware (can relax for
> >     FS_USERNS_MOUNT fs)
> >
> > Pushed these two fixes to branch fanotify_userns.
> 
> Pushed another fix to mnt refcount bug in WIP and another commit to
> add the last piece that could make fanotify usable for systemd-homed
> setup - a filesystem watch filtered by mnt_userns (not tested yet).

Sounds interesting.

So I'm looking and commenting on that branch a little.
One general question, when fanotify FANOTIFY_PERM_EVENTS is set fanotify
will return a file descriptor (for relevant events) referring to the
file/directory that e.g. got created. And there are no permissions
checks other than the capable(CAP_SYS_ADMIN) check when the fanotify
instance is created, right?

> 
> One thing I am struggling with is the language to describe user ns
> and idmapped mounts related logic. I have a feeling that I am getting
> the vocabulary all wrong. See my commit message text below.

The lingo seems legit. :)

I need some time thinking about the concepts to convince myself that
this is safe. But I like the direction as this is going to be really
useful.

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-22 16:28                         ` Christian Brauner
@ 2021-03-22 17:22                           ` Amir Goldstein
  0 siblings, 0 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-22 17:22 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Mon, Mar 22, 2021 at 6:28 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Mon, Mar 22, 2021 at 02:44:20PM +0200, Amir Goldstein wrote:
> > On Sat, Mar 20, 2021 at 2:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > > > The code that sits in linux-next can give you pretty much a drop-in
> > > > > > replacement of inotify and nothing more. See example code:
> > > > > > https://github.com/amir73il/inotify-tools/commits/fanotify_name_fid
> > > > >
> > > > > This is really great. Thank you for doing that work this will help quite
> > > > > a lot of use-cases and make things way simpler. I created a TODO to port
> > > > > our path-hotplug to this once this feature lands.
> > > > >
> > > >
> > > > FWIW, I just tried to build this branch on Ubuntu 20.04.2 with LTS kernel
> > > > and there were some build issues, so rebased my branch on upstream
> > > > inotify-tools to fix those build issues.
> > > >
> > > > I was not aware that the inotify-tools project is alive, I never intended
> > > > to upstream this demo code and never created a github pull request
> > > > but rebasing on upstream brought in some CI scripts, when I pushed the
> > > > branch to my github it triggered some tests that reported build failures on
> > > > Ubuntu 16.04 and 18.04.
> > > >
> > > > Anyway, there is a pre-rebase branch 'fanotify_name' and the post rebase
> > > > branch 'fanotify_name_fid'. You can try whichever works for you.
> >
> > FYI, fixed the CI build errors on fanotify_name_fid branch.
> >
> > > >
> > > > You can look at the test script src/test_demo.sh for usage example.
> > > > Or just cd into a writable directory and run the script to see the demo.
> > > > The demo determines whether to use a recursive watch or "global"
> > > > watch by the uid of the user.
> > > >
> > > > > >
> > > > > > > > If you think that is useful and you want to play with this feature I can
> > > > > > > > provide a WIP branch soon.
> > > > > > >
> > > > > > > I would like to first play with the support for unprivileged fanotify
> > > > > > > but sure, it does sound useful!
> > > > > >
> > > > > > Just so you have an idea what I am talking about, this is a very early
> > > > > > POC branch:
> > > > > > https://github.com/amir73il/linux/commits/fanotify_userns
> > > > >
> > > > > Thanks!  I'll try to pull this and take a look next week. I hope that's
> > > > > ok.
> > > > >
> > > >
> > > > Fine. I'm curious to know what it does.
> > > > Did not get to test it with userns yet :)
> > >
> > > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > > inside userns and works fine, with two wrinkles I needed to iron:
> > >
> > > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> > >     zero f_fsid (easy to fix)
> > > 2. open_by_handle_at() is not userns aware (can relax for
> > >     FS_USERNS_MOUNT fs)
> > >
> > > Pushed these two fixes to branch fanotify_userns.
> >
> > Pushed another fix to mnt refcount bug in WIP and another commit to
> > add the last piece that could make fanotify usable for systemd-homed
> > setup - a filesystem watch filtered by mnt_userns (not tested yet).
>
> Sounds interesting.
>
> So I'm looking and commenting on that branch a little.
> One general question, when fanotify FANOTIFY_PERM_EVENTS is set fanotify
> will return a file descriptor (for relevant events) referring to the
> file/directory that e.g. got created. And there are no permissions
> checks other than the capable(CAP_SYS_ADMIN) check when the fanotify
> instance is created, right?
>

Correct.
fanotify_init() enforces that in a few maybe not so obvious steps:

1. Either CAP_SYS_ADMIN or fid_mode (no file descriptor in event):

        if (!capable(CAP_SYS_ADMIN)) {
                /*
                 * An unprivileged user can setup an fanotify group with
                 * limited functionality - an unprivileged group is limited to
                 * notification events with file handles and it cannot use
                 * unlimited queue/marks.
                 */
                if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode)
                        return -EPERM;
        }

2. fid_mode is not supported for high priority classes:

        if (fid_mode && class != FAN_CLASS_NOTIF)
                return -EINVAL;

3. Permission events are only allowed for high priority classes:
        /*
         * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF.  These are not
         * allowed to set permissions events.
         */
        ret = -EINVAL;
        if (mask & FANOTIFY_PERM_EVENTS &&
            group->priority == FS_PRIO_0)

You may want to look at the summary of all the limitations on
unprivileged listener here:
https://github.com/amir73il/man-pages/blob/fanotify_unpriv/man2/fanotify_init.2#L400

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-18 17:07           ` Amir Goldstein
  2021-03-18 18:40             ` Christian Brauner
@ 2021-03-22 18:38             ` Amir Goldstein
  2021-03-24 11:48               ` Jan Kara
  1 sibling, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-22 18:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

> > > The problem here is that even if unprivileged users cannot compromise
> > > security, they can still cause significant CPU overhead either queueing
> > > events or filtering events and that is something I haven't been able to
> > > figure out a way to escape from.
> >
> > WRT queueing overhead, given a user can place ~1M of directory watches, he
> > can cause noticable total overhead for queueing events anyway. Furthermore
>
> I suppose so. But a user placing 1M dir watches at least adds this overhead
> knowingly. Adding a overhead on the entire filesystem when just wanting to
> watch a small subtree doesn't sound ideal. Especially in very nested setups.
> So yes, we need to be careful.
>

I was thinking about this some more and I think the answer is in your example.
User can only place 1M dir watches if ucount marks limits permits it.

So whatever we allow to do with subtree or userns filtered marks should
also be limited by ucounts.

> > the queue size is limited so unless the user spends time consuming events
> > as well, the load won't last long. But I agree we need to be careful not to
> > introduce too big latencies to operations generating events. So I think if
> > we could quickly detect whether a generated event has a good chance of
> > being relevant for some subtree watch of a group and queue it in that case
> > and worry about permission checks only once events are received and thus
> > receiver pays the cost of expensive checks, that might be fine as well.
> >
>
> So far the only idea I had for "quickly detect" which I cannot find flaws in
> is to filter by mnt_userms, but its power is limited.
>

So I have implemented this idea on fanotify_userns branch and the cost
per "filtered" sb mark is quite low - its a pretty cheap check in
send_to_group()
But still, if an unbound number of users can add to the sb mark list it is
not going to end well.

<hand waving>
I think what we need here (thinking out loud) is to account the sb marks
to the user that mounted the filesystem or to the user mapped to admin using
idmapped mount, maybe to both(?), probably using a separate ucount entry
(e.g. max_fanotify_filesystem_marks).

We can set this limit by default to a small number (128?) to limit the sb list
iteration per filesystem event and container manager / systemd can further
limit this resource when creating idmapped mounts, which would otherwise
allow the mapped user to add "filtered" (*) sb marks.
</hand waving>

Thanks,
Amir.

(*) "filtered" can refer to both the userns filter I proposed and going forward
     also maybe to subtree filter

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-22 18:38             ` Amir Goldstein
@ 2021-03-24 11:48               ` Jan Kara
  2021-03-24 15:50                 ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2021-03-24 11:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

On Mon 22-03-21 20:38:32, Amir Goldstein wrote:
> > > > The problem here is that even if unprivileged users cannot compromise
> > > > security, they can still cause significant CPU overhead either queueing
> > > > events or filtering events and that is something I haven't been able to
> > > > figure out a way to escape from.
> > >
> > > WRT queueing overhead, given a user can place ~1M of directory watches, he
> > > can cause noticable total overhead for queueing events anyway. Furthermore
> >
> > I suppose so. But a user placing 1M dir watches at least adds this overhead
> > knowingly. Adding a overhead on the entire filesystem when just wanting to
> > watch a small subtree doesn't sound ideal. Especially in very nested setups.
> > So yes, we need to be careful.
> >
> 
> I was thinking about this some more and I think the answer is in your example.
> User can only place 1M dir watches if ucount marks limits permits it.
> 
> So whatever we allow to do with subtree or userns filtered marks should
> also be limited by ucounts.

Yes, agreed.

> > > the queue size is limited so unless the user spends time consuming events
> > > as well, the load won't last long. But I agree we need to be careful not to
> > > introduce too big latencies to operations generating events. So I think if
> > > we could quickly detect whether a generated event has a good chance of
> > > being relevant for some subtree watch of a group and queue it in that case
> > > and worry about permission checks only once events are received and thus
> > > receiver pays the cost of expensive checks, that might be fine as well.
> > >
> >
> > So far the only idea I had for "quickly detect" which I cannot find flaws in
> > is to filter by mnt_userms, but its power is limited.
> 
> So I have implemented this idea on fanotify_userns branch and the cost
> per "filtered" sb mark is quite low - its a pretty cheap check in
> send_to_group()
> But still, if an unbound number of users can add to the sb mark list it is
> not going to end well.

Thinking out loud: So what is the cost going to be for the side generating
events? Ideally it would of O(number of fanotify groups receiving event).
We cannot get better than that and if the constants aren't too big I think
this is acceptable overhead. Sure this can mean total work of (number of
users) * (max number of subtree marks per user) for queueing notification
event but I don't think it is practical for a DoS attack and I also don't
think that in practice users will be watching overlapping subtrees that
much.

The question is whether we can get that fast. Probably not because that
would have to attach subtree watches to directory inodes or otherwise
filter out unrelated fanotify groups in O(1). But the complexity of
O(number of groups receiving events + depth of dir where event is happening)
is probably achievable - we'd walk the tree up and have roots of watched
subtrees marked. What do you think?

Also there is a somewhat related question what is the semantics of subtree
watches in presence of mounts - do subtree watches "see through" mount
points? Probably not but then with bind mounts this can be sometimes
inconvenient / confusing - e.g. if I have /tmp bind-mounted to /var/tmp and
I'm watching subtree of /var, I would not get events for what's in
/var/tmp... Which is logical if you spell it out like this but applications
often don't care how the mount hierarchy looks like, they just care about
locally visible directory structure.

> <hand waving>
> I think what we need here (thinking out loud) is to account the sb marks
> to the user that mounted the filesystem or to the user mapped to admin using
> idmapped mount, maybe to both(?), probably using a separate ucount entry
> (e.g. max_fanotify_filesystem_marks).

I'm somewhat lost here. Are these two users different? We have /home/foo
which is a mounted filesystem. AFAIU it will be mounted in a special user
namespace for user 'foo' - let's call is 'foo-ns'. /home/foo has idmapping
attached so system [ug]ids and non-trivially mapped to on-disk [ug]ids. Now
we have a user - let's call it 'foo-usr' that has enough capabilities
(whatever they are) in 'foo-ns' to place fanotify subtree marks in
/home/foo. So these marks are naturally accounted towards 'foo-usr'. To
whom else you'd like to also account these marks and why?

> We can set this limit by default to a small number (128?) to limit the sb list
> iteration per filesystem event and container manager / systemd can further
> limit this resource when creating idmapped mounts, which would otherwise
> allow the mapped user to add "filtered" (*) sb marks.
> </hand waving>
>
> (*) "filtered" can refer to both the userns filter I proposed and going forward
>      also maybe to subtree filter

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-22 12:44                       ` Amir Goldstein
  2021-03-22 16:28                         ` Christian Brauner
@ 2021-03-24 13:57                         ` Amir Goldstein
  2021-03-24 14:32                           ` Christian Brauner
  1 sibling, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-24 13:57 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

> > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > inside userns and works fine, with two wrinkles I needed to iron:
> >
> > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> >     zero f_fsid (easy to fix)
> > 2. open_by_handle_at() is not userns aware (can relax for
> >     FS_USERNS_MOUNT fs)
> >
> > Pushed these two fixes to branch fanotify_userns.
>
> Pushed another fix to mnt refcount bug in WIP and another commit to
> add the last piece that could make fanotify usable for systemd-homed
> setup - a filesystem watch filtered by mnt_userns (not tested yet).
>

Now I used mount-idmapped (from xfstest) to test that last piece.
Found a minor bug and pushed a fix.

It is working as expected, that is filtering only the events generated via
the idmapped mount. However, because the listener I tested is capable in
the mapped userns and not in the sb userns, the listener cannot
open_ny_handle_at(), so the result is not as useful as one might hope.

I guess we will also need to make open_by_handle_at() idmapped aware
and use a variant of vfs_dentry_acceptable() that validates that the opened
path is legitimately accessible via the idmapped mount.

I think I will leave this complexity to you should you think the userns filtered
watch is something worth the effort.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 13:57                         ` Amir Goldstein
@ 2021-03-24 14:32                           ` Christian Brauner
  2021-03-24 15:05                             ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-24 14:32 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 24, 2021 at 03:57:12PM +0200, Amir Goldstein wrote:
> > > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > > inside userns and works fine, with two wrinkles I needed to iron:
> > >
> > > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> > >     zero f_fsid (easy to fix)
> > > 2. open_by_handle_at() is not userns aware (can relax for
> > >     FS_USERNS_MOUNT fs)
> > >
> > > Pushed these two fixes to branch fanotify_userns.
> >
> > Pushed another fix to mnt refcount bug in WIP and another commit to
> > add the last piece that could make fanotify usable for systemd-homed
> > setup - a filesystem watch filtered by mnt_userns (not tested yet).
> >
> 
> Now I used mount-idmapped (from xfstest) to test that last piece.
> Found a minor bug and pushed a fix.
> 
> It is working as expected, that is filtering only the events generated via
> the idmapped mount. However, because the listener I tested is capable in
> the mapped userns and not in the sb userns, the listener cannot
> open_ny_handle_at(), so the result is not as useful as one might hope.

This is another dumb question probably but in general, are you saying
that someone watching a mount or directory and does _not_ want file
descriptors from fanotify to be returned has no other way of getting to
the path they want to open other than by using open_by_handle_at()?

> 
> I guess we will also need to make open_by_handle_at() idmapped aware
> and use a variant of vfs_dentry_acceptable() that validates that the opened
> path is legitimately accessible via the idmapped mount.

So as a first step, I think there's a legitimate case to be made for
open_by_handle_at() to be made useable inside user namespaces. That's a
change worth to be made independent of fanotify. For example, nowadays
cgroups have a 64 bit identifier that can be used with open_by_handle_at
to map a cgrp id to a path and back:
https://lkml.org/lkml/2020/12/2/1126
Right now this can't be used in user namespaces because of this
restriction but it is genuinely useful to have this feature available
since cgroups are FS_USERNS_MOUNT and that identifier <-> path mapping
is very convenient.
Without looking at the code I'm not super sure how name_to_handle_at()
and open_by_handle_at() behave in the face of mount namespaces so that
would need looking into to. But it would be a genuinely useful change, I
think.

> 
> I think I will leave this complexity to you should you think the userns filtered
> watch is something worth the effort.

Fair enough!

Thanks!
Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 14:32                           ` Christian Brauner
@ 2021-03-24 15:05                             ` Amir Goldstein
  2021-03-24 16:28                               ` Christian Brauner
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-24 15:05 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 24, 2021 at 4:32 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Wed, Mar 24, 2021 at 03:57:12PM +0200, Amir Goldstein wrote:
> > > > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > > > inside userns and works fine, with two wrinkles I needed to iron:
> > > >
> > > > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> > > >     zero f_fsid (easy to fix)
> > > > 2. open_by_handle_at() is not userns aware (can relax for
> > > >     FS_USERNS_MOUNT fs)
> > > >
> > > > Pushed these two fixes to branch fanotify_userns.
> > >
> > > Pushed another fix to mnt refcount bug in WIP and another commit to
> > > add the last piece that could make fanotify usable for systemd-homed
> > > setup - a filesystem watch filtered by mnt_userns (not tested yet).
> > >
> >
> > Now I used mount-idmapped (from xfstest) to test that last piece.
> > Found a minor bug and pushed a fix.
> >
> > It is working as expected, that is filtering only the events generated via
> > the idmapped mount. However, because the listener I tested is capable in
> > the mapped userns and not in the sb userns, the listener cannot
> > open_ny_handle_at(), so the result is not as useful as one might hope.
>
> This is another dumb question probably but in general, are you saying
> that someone watching a mount or directory and does _not_ want file
> descriptors from fanotify to be returned has no other way of getting to
> the path they want to open other than by using open_by_handle_at()?
>

Well there is another way.
It is demonstrated in my demo with intoifywatch --fanotify --recursive.
It involved userspace iterating a subtree of interest to create fid->path
map.

The fanotify recursive watch is similar but not exactly the same as the
old intoify recursive watch, because with inotify recursive watch you
can miss events.

With fanotify recursive watch, the listener (if capable) can setup a
filesystem mark so events will not be missed. They will be recorded
by fid with an unknown path and the path information can be found later
by the crawler and updated in the map before the final report.

Events on fid that were not found by the crawler need not be reported.
That's essentially a subtree watch for the poor implemented in userspace.

I did not implement the combination --fanotify --global --recursive in my
demo, because it did not make sense with the current permission model
(listener that can setup a fs mark can always resolve fids to path), but it
would be quite trivial to add.


> >
> > I guess we will also need to make open_by_handle_at() idmapped aware
> > and use a variant of vfs_dentry_acceptable() that validates that the opened
> > path is legitimately accessible via the idmapped mount.
>
> So as a first step, I think there's a legitimate case to be made for
> open_by_handle_at() to be made useable inside user namespaces. That's a
> change worth to be made independent of fanotify. For example, nowadays
> cgroups have a 64 bit identifier that can be used with open_by_handle_at
> to map a cgrp id to a path and back:
> https://lkml.org/lkml/2020/12/2/1126
> Right now this can't be used in user namespaces because of this
> restriction but it is genuinely useful to have this feature available
> since cgroups are FS_USERNS_MOUNT and that identifier <-> path mapping
> is very convenient.

FS_USERNS_MOUNT is a simple case and I think it is safe.
There is already a patch for that on my fanotify_userns branch.

> Without looking at the code I'm not super sure how name_to_handle_at()
> and open_by_handle_at() behave in the face of mount namespaces so that
> would need looking into to. But it would be a genuinely useful change, I
> think.
>

name_to_handle_at()/open_by_handle_at() should be indifferent to mount ns,
because the former returns mount_id which is globally unique (I think)
and the latter takes a mount_fd argument, which means the caller needs to
prove access to the mount prior to getting an open fd in that mount.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 11:48               ` Jan Kara
@ 2021-03-24 15:50                 ` Amir Goldstein
  2021-03-25 13:49                   ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-24 15:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

> > So I have implemented this idea on fanotify_userns branch and the cost
> > per "filtered" sb mark is quite low - its a pretty cheap check in
> > send_to_group()
> > But still, if an unbound number of users can add to the sb mark list it is
> > not going to end well.
>
> Thinking out loud: So what is the cost going to be for the side generating
> events? Ideally it would of O(number of fanotify groups receiving event).
> We cannot get better than that and if the constants aren't too big I think
> this is acceptable overhead. Sure this can mean total work of (number of
> users) * (max number of subtree marks per user) for queueing notification
> event but I don't think it is practical for a DoS attack and I also don't
> think that in practice users will be watching overlapping subtrees that
> much.
>

Why overlapping?
My concern is not so much from DoS attacks.
My concern is more from innocent users adding unacceptable
accumulated overhead.

Think of a filesystem mounted at /home/ with 100K directories at
/home/user$N, every user gets its own idmapped mount from
systemd-homed and may or may not choose to run a listener to
get events generated under its own home dir (which is an idmapped
mount). Even if we limit one sb mask per user, we can still have 100K
marks list in sb.

For this reason I think we need to limit the number of marks per sb.
The simple way is a global config like max_queued_events, but I think
we can do better than that.

> The question is whether we can get that fast. Probably not because that
> would have to attach subtree watches to directory inodes or otherwise
> filter out unrelated fanotify groups in O(1). But the complexity of
> O(number of groups receiving events + depth of dir where event is happening)
> is probably achievable - we'd walk the tree up and have roots of watched
> subtrees marked. What do you think?
>

I am for that. I already posted a POC along those lines [1].
I was just not sure how to limit the potential accumulated overhead.

[1] https://github.com/amir73il/linux/commits/fanotify_subtree_mark

> Also there is a somewhat related question what is the semantics of subtree
> watches in presence of mounts - do subtree watches "see through" mount
> points? Probably not but then with bind mounts this can be sometimes
> inconvenient / confusing - e.g. if I have /tmp bind-mounted to /var/tmp and
> I'm watching subtree of /var, I would not get events for what's in
> /var/tmp... Which is logical if you spell it out like this but applications
> often don't care how the mount hierarchy looks like, they just care about
> locally visible directory structure.

Those are hard questions.
I think that userns/mountns developers needed to address them a while ago
and I think there are some helpers that help with checking visibility of paths.

>
> > <hand waving>
> > I think what we need here (thinking out loud) is to account the sb marks
> > to the user that mounted the filesystem or to the user mapped to admin using
> > idmapped mount, maybe to both(?), probably using a separate ucount entry
> > (e.g. max_fanotify_filesystem_marks).
>
> I'm somewhat lost here. Are these two users different? We have /home/foo
> which is a mounted filesystem. AFAIU it will be mounted in a special user
> namespace for user 'foo' - let's call is 'foo-ns'. /home/foo has idmapping
> attached so system [ug]ids and non-trivially mapped to on-disk [ug]ids. Now
> we have a user - let's call it 'foo-usr' that has enough capabilities
> (whatever they are) in 'foo-ns' to place fanotify subtree marks in
> /home/foo. So these marks are naturally accounted towards 'foo-usr'. To
> whom else you'd like to also account these marks and why?
>

I would like the system admin to be able to limit 100 sb marks on /home
(filtered or not) because that impacts the send_to_group iteration.
I would also like systemd to be able to grant a smaller quota of filtered
sb marks per user when creating and mapping the idmapped mounts
at /home/foo$N

I *think* we can achieve that, by accounting the sb marks to uid 0
(who mounted /home) in ucounts entry "fanotify_sb_marks".
If /home would have been a FS_USERNS_MOUNT mounted inside
some userns, then all its sb marks would be accounted to uid 0 of
that userns.

I have no ideas if this all adds up.
My head explodes even from trying to express these rules :-/

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 15:05                             ` Amir Goldstein
@ 2021-03-24 16:28                               ` Christian Brauner
  2021-03-24 17:07                                 ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-24 16:28 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 24, 2021 at 05:05:45PM +0200, Amir Goldstein wrote:
> On Wed, Mar 24, 2021 at 4:32 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > On Wed, Mar 24, 2021 at 03:57:12PM +0200, Amir Goldstein wrote:
> > > > > Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
> > > > > inside userns and works fine, with two wrinkles I needed to iron:
> > > > >
> > > > > 1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
> > > > >     zero f_fsid (easy to fix)
> > > > > 2. open_by_handle_at() is not userns aware (can relax for
> > > > >     FS_USERNS_MOUNT fs)
> > > > >
> > > > > Pushed these two fixes to branch fanotify_userns.
> > > >
> > > > Pushed another fix to mnt refcount bug in WIP and another commit to
> > > > add the last piece that could make fanotify usable for systemd-homed
> > > > setup - a filesystem watch filtered by mnt_userns (not tested yet).
> > > >
> > >
> > > Now I used mount-idmapped (from xfstest) to test that last piece.
> > > Found a minor bug and pushed a fix.
> > >
> > > It is working as expected, that is filtering only the events generated via
> > > the idmapped mount. However, because the listener I tested is capable in
> > > the mapped userns and not in the sb userns, the listener cannot
> > > open_ny_handle_at(), so the result is not as useful as one might hope.
> >
> > This is another dumb question probably but in general, are you saying
> > that someone watching a mount or directory and does _not_ want file
> > descriptors from fanotify to be returned has no other way of getting to
> > the path they want to open other than by using open_by_handle_at()?
> >
> 
> Well there is another way.
> It is demonstrated in my demo with intoifywatch --fanotify --recursive.
> It involved userspace iterating a subtree of interest to create fid->path
> map.

Ok, so this seems to be

inotifytools_filename_from_fid()
-> if (fanotify_mark_type != FAN_MARK_FILESYSTEM)
           watch_from_fid()
   -> read_path_from(/proc/self/fd/w->dirfd)

> 
> The fanotify recursive watch is similar but not exactly the same as the
> old intoify recursive watch, because with inotify recursive watch you
> can miss events.
> 
> With fanotify recursive watch, the listener (if capable) can setup a
> filesystem mark so events will not be missed. They will be recorded
> by fid with an unknown path and the path information can be found later
> by the crawler and updated in the map before the final report.
> 
> Events on fid that were not found by the crawler need not be reported.
> That's essentially a subtree watch for the poor implemented in userspace.

This is already a good improvement.
Honestly, having FAN_MARK_INODE workable unprivileged is already pretty
great. In addition having FAN_MARK_MOUNT workable with idmapped mounts
will likely get us what most users care about, afaict that is the POC
in:
https://github.com/amir73il/linux/commit/f0d5d462c5baeb82a658944c6df80704434f09a1

(I'm reading the source correctly that FAN_MARK_MOUNT works with
FAN_REPORT_FID as long as no inode event set in FANOTIFY_INODE_EVENTS is
set? I'm asking because my manpage - probably too old - seems to imply
that FAN_REPORT_FID can't be used with FAN_MARK_MOUNT although I might
just be stumbling over the phrasing.)

I think FAN_MARK_FILESYSTEM should simply stay under the s_userns_s
capable requirement. That's imho the cleanest semantics for this, i.e.
I'd drop:
https://github.com/amir73il/linux/commit/bd20e273f3c3a650805b3da32e493f01cc2a4763
This is neither an urgent use-case nor am I feeling very comfortable
with it.

> 
> I did not implement the combination --fanotify --global --recursive in my
> demo, because it did not make sense with the current permission model
> (listener that can setup a fs mark can always resolve fids to path), but it
> would be quite trivial to add.
> 
> 
> > >
> > > I guess we will also need to make open_by_handle_at() idmapped aware
> > > and use a variant of vfs_dentry_acceptable() that validates that the opened
> > > path is legitimately accessible via the idmapped mount.
> >
> > So as a first step, I think there's a legitimate case to be made for
> > open_by_handle_at() to be made useable inside user namespaces. That's a
> > change worth to be made independent of fanotify. For example, nowadays
> > cgroups have a 64 bit identifier that can be used with open_by_handle_at
> > to map a cgrp id to a path and back:
> > https://lkml.org/lkml/2020/12/2/1126
> > Right now this can't be used in user namespaces because of this
> > restriction but it is genuinely useful to have this feature available
> > since cgroups are FS_USERNS_MOUNT and that identifier <-> path mapping
> > is very convenient.
> 
> FS_USERNS_MOUNT is a simple case and I think it is safe.
> There is already a patch for that on my fanotify_userns branch.

Great!
Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 16:28                               ` Christian Brauner
@ 2021-03-24 17:07                                 ` Amir Goldstein
  2021-03-25 11:12                                   ` Christian Brauner
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-24 17:07 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

> > Well there is another way.
> > It is demonstrated in my demo with intoifywatch --fanotify --recursive.
> > It involved userspace iterating a subtree of interest to create fid->path
> > map.
>
> Ok, so this seems to be
>
> inotifytools_filename_from_fid()
> -> if (fanotify_mark_type != FAN_MARK_FILESYSTEM)
>            watch_from_fid()
>    -> read_path_from(/proc/self/fd/w->dirfd)
>

Yes.

> >
> > The fanotify recursive watch is similar but not exactly the same as the
> > old intoify recursive watch, because with inotify recursive watch you
> > can miss events.
> >
> > With fanotify recursive watch, the listener (if capable) can setup a
> > filesystem mark so events will not be missed. They will be recorded
> > by fid with an unknown path and the path information can be found later
> > by the crawler and updated in the map before the final report.
> >
> > Events on fid that were not found by the crawler need not be reported.
> > That's essentially a subtree watch for the poor implemented in userspace.
>
> This is already a good improvement.
> Honestly, having FAN_MARK_INODE workable unprivileged is already pretty

I'm not so sure why you say that, because unprivileged FAN_MARK_INODE
watches are pretty close in functionality to inotify.
There are only subtle differences.

> great. In addition having FAN_MARK_MOUNT workable with idmapped mounts
> will likely get us what most users care about, afaict that is the POC
> in:
> https://github.com/amir73il/linux/commit/f0d5d462c5baeb82a658944c6df80704434f09a1
>

Hmm, the problem is the limited set of events you can get from
FAN_MARK_MOUNT which does not include FAN_CREATE.

> (I'm reading the source correctly that FAN_MARK_MOUNT works with
> FAN_REPORT_FID as long as no inode event set in FANOTIFY_INODE_EVENTS is
> set? I'm asking because my manpage - probably too old - seems to imply
> that FAN_REPORT_FID can't be used with FAN_MARK_MOUNT although I might
> just be stumbling over the phrasing.)
>

commit d71c9b4a5c6fbc7164007b52dba1de410d018292
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Mon Apr 20 21:42:56 2020 +0300

    fanotify_mark.2: Clarification about FAN_MARK_MOUNT and FAN_REPORT_FID

    It is not true that FAN_MARK_MOUNT cannot be used with a group
    that was initialized with flag FAN_REPORT_FID.
 ...

IOW, no FAN_CREATE, FAN_DELETE, FAN_MOVE

The technical reason for that is Al's objection to pass the mnt context
into vfs helpers (and then fsnotify hooks).

> I think FAN_MARK_FILESYSTEM should simply stay under the s_userns_s
> capable requirement. That's imho the cleanest semantics for this, i.e.
> I'd drop:
> https://github.com/amir73il/linux/commit/bd20e273f3c3a650805b3da32e493f01cc2a4763
> This is neither an urgent use-case nor am I feeling very comfortable
> with it.
>

The purpose of this commit is to provide FAN_CREATE, FAN_DELETE
FAN_MOVE events filtered by (an idmapped) mount.
I don't like it so much myself, but I have not had any better idea how to
achieve that goal so far.

There is another way though.
We can create a new set of hooks outside the vfs helpers that do have
the mnt context.

I have already created such a set for another POC [1].
In this POC I introduced three new events FS_MODIFY_INTENT,
FS_NAME_INTENT, FS_MOVE_INTENT, which I had no plans of
exposing to fanotify. Nor did I need the granularity of CREATE,
DELETE, RENAME event types (all collapsed into NAME_INTENT).

But if we hit a dead end, we can resort to this strategy.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fsnotify_pre_modify

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 17:07                                 ` Amir Goldstein
@ 2021-03-25 11:12                                   ` Christian Brauner
  2021-03-25 15:31                                     ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2021-03-25 11:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

On Wed, Mar 24, 2021 at 07:07:17PM +0200, Amir Goldstein wrote:
> > > Well there is another way.
> > > It is demonstrated in my demo with intoifywatch --fanotify --recursive.
> > > It involved userspace iterating a subtree of interest to create fid->path
> > > map.
> >
> > Ok, so this seems to be
> >
> > inotifytools_filename_from_fid()
> > -> if (fanotify_mark_type != FAN_MARK_FILESYSTEM)
> >            watch_from_fid()
> >    -> read_path_from(/proc/self/fd/w->dirfd)
> >
> 
> Yes.
> 
> > >
> > > The fanotify recursive watch is similar but not exactly the same as the
> > > old intoify recursive watch, because with inotify recursive watch you
> > > can miss events.
> > >
> > > With fanotify recursive watch, the listener (if capable) can setup a
> > > filesystem mark so events will not be missed. They will be recorded
> > > by fid with an unknown path and the path information can be found later
> > > by the crawler and updated in the map before the final report.
> > >
> > > Events on fid that were not found by the crawler need not be reported.
> > > That's essentially a subtree watch for the poor implemented in userspace.
> >
> > This is already a good improvement.
> > Honestly, having FAN_MARK_INODE workable unprivileged is already pretty
> 
> I'm not so sure why you say that, because unprivileged FAN_MARK_INODE
> watches are pretty close in functionality to inotify.
> There are only subtle differences.

Simply because until now we couldn't use fanotify in any way because of
the capable() restriction.

> 
> > great. In addition having FAN_MARK_MOUNT workable with idmapped mounts
> > will likely get us what most users care about, afaict that is the POC
> > in:
> > https://github.com/amir73il/linux/commit/f0d5d462c5baeb82a658944c6df80704434f09a1
> >
> 
> Hmm, the problem is the limited set of events you can get from
> FAN_MARK_MOUNT which does not include FAN_CREATE.

Yes, that's what I gathered from perusing the code.

> 
> > (I'm reading the source correctly that FAN_MARK_MOUNT works with
> > FAN_REPORT_FID as long as no inode event set in FANOTIFY_INODE_EVENTS is
> > set? I'm asking because my manpage - probably too old - seems to imply
> > that FAN_REPORT_FID can't be used with FAN_MARK_MOUNT although I might
> > just be stumbling over the phrasing.)
> >
> 
> commit d71c9b4a5c6fbc7164007b52dba1de410d018292
> Author: Amir Goldstein <amir73il@gmail.com>
> Date:   Mon Apr 20 21:42:56 2020 +0300
> 
>     fanotify_mark.2: Clarification about FAN_MARK_MOUNT and FAN_REPORT_FID
> 
>     It is not true that FAN_MARK_MOUNT cannot be used with a group
>     that was initialized with flag FAN_REPORT_FID.

Btw, I don't see FAN_MARK_INODE in the man2 in the upstream repository.
I know it's essentially the 0 or default value but it would still be
worthwhile to mention it in the manpage.

>  ...
> 
> IOW, no FAN_CREATE, FAN_DELETE, FAN_MOVE
> 
> The technical reason for that is Al's objection to pass the mnt context
> into vfs helpers (and then fsnotify hooks).

Ah yes, I remember that.

> 
> > I think FAN_MARK_FILESYSTEM should simply stay under the s_userns_s
> > capable requirement. That's imho the cleanest semantics for this, i.e.
> > I'd drop:
> > https://github.com/amir73il/linux/commit/bd20e273f3c3a650805b3da32e493f01cc2a4763
> > This is neither an urgent use-case nor am I feeling very comfortable
> > with it.
> >
> 
> The purpose of this commit is to provide FAN_CREATE, FAN_DELETE
> FAN_MOVE events filtered by (an idmapped) mount.

I see.

I wanted to write a few words about the use-case you mention and the
need for subtree watches for this particular case:

"A common use case for of a filesystem subtree is a bind mount, not on
the root of the filesystem, such as the bind mounts used for
containers."

Which presumably means you want to point fanotify to the rootfs mount of
the container and have it watch everything under that rootfs including
any submounts in there. While this certainly might be useful I'm not
sure it's a very common use-case or really necessary to support.

Assuming a full system-like container setup like runC, systemd-nspawn,
LXD, and similar runtimes do it we end up with a few additonal mounts.
They are usually performed in the containers mount+user namespace.
The standard ones, i.e. the ones most container runtimes setup are:
- sysfs
- cgroupfs
- procfs
- mqueue
- devpts
- tmpfs on /dev (as a substitute for devtmpfs not being namespaced)
  - tmpfs on /dev/shm
  - bind mounts of a few host device files into /dev:
    - /dev/fuse
    - /dev/net/tun
    - /dev/full
    - /dev/null
    - /dev/random
    - /dev/tty
    - /dev/urandom
    - /dev/zero
    - /dev/console
I think most of these mounts aren't very interesting to monitor. It's in
general not very common to monitor full pseudo fileystems such as proc,
sysfs, or devpts afaik.
Notably, systemd - both outside and inside containers - will use some
inotify watches but it's always on specific directories and never across
mount borders.

In addition to the default mounts above - I've mentioned this before -
the container manager might choose to inject mounts into a running
container (to share data or whatever). There are different strategies on
how to do this.
I1. The first strategy is to inject the mount into the container in such
    a way that the container has full control over it, i.e. it can
    unmount and remount (with the restriction that some flags might
    become locked when moving the mount across user namespaces).
I2. The second strategy is to inject the mount in such a way that the
    container doesn't have control over it, i.e. the container can't
    umount or remount.

Most container runtimes will support I1. but systemd-nspawn uses I2.
There might be use-cases where the container manager would like to watch
those mounts too. But then the container manager will just call
fanotify_mark() and add that mount to the list of watched mounts when
injecting it.
I get that there are other use-cases that make subtree watches very
interesting but I don't think the container use-case is a particularly
pressing one.

> I don't like it so much myself, but I have not had any better idea how to
> achieve that goal so far.

The limitations of FAN_MARK_MOUNT as I now understand them are indeed
unpleasant. If we could get FAN_MARK_MOUNT with the same event support
as FAN_MARK_INODE that would be great.
I think the delegation model that makes sense to me is to allow
FAN_MARK_MOUNT when the caller is ns_capable(mnt->mnt_userns) and of
course ns_capable() in the userns they called fanotify_init() in. That
feels ok and supportable.
But I don't think anything beyond that like the sb filter patch that you
showed is the right approach.

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-24 15:50                 ` Amir Goldstein
@ 2021-03-25 13:49                   ` Jan Kara
  2021-03-25 15:05                     ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2021-03-25 13:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

On Wed 24-03-21 17:50:52, Amir Goldstein wrote:
> > > So I have implemented this idea on fanotify_userns branch and the cost
> > > per "filtered" sb mark is quite low - its a pretty cheap check in
> > > send_to_group()
> > > But still, if an unbound number of users can add to the sb mark list it is
> > > not going to end well.
> >
> > Thinking out loud: So what is the cost going to be for the side generating
> > events? Ideally it would of O(number of fanotify groups receiving event).
> > We cannot get better than that and if the constants aren't too big I think
> > this is acceptable overhead. Sure this can mean total work of (number of
> > users) * (max number of subtree marks per user) for queueing notification
> > event but I don't think it is practical for a DoS attack and I also don't
> > think that in practice users will be watching overlapping subtrees that
> > much.
> >
> 
> Why overlapping?
> My concern is not so much from DoS attacks.
> My concern is more from innocent users adding unacceptable
> accumulated overhead.
> 
> Think of a filesystem mounted at /home/ with 100K directories at
> /home/user$N, every user gets its own idmapped mount from
> systemd-homed and may or may not choose to run a listener to
> get events generated under its own home dir (which is an idmapped
> mount). Even if we limit one sb mask per user, we can still have 100K
> marks list in sb.

I see but then you'd have to have 100K users using the same filesystem on
the server at the same time? Which doesn't look likely to me? I'd presume
the home dir is watched only if the user is actually running something on
that machine... So I'm not sure how realistic this example is. But yes,
maybe we need some more efficient algorithm for selecting which subtree
watch is actually relevant for an event.

> For this reason I think we need to limit the number of marks per sb.
> The simple way is a global config like max_queued_events, but I think
> we can do better than that.

Adding a global limit on number of sb marks is OK but still I'd like the
system to scale reasonably with say tens to hundreds watches...

> > The question is whether we can get that fast. Probably not because that
> > would have to attach subtree watches to directory inodes or otherwise
> > filter out unrelated fanotify groups in O(1). But the complexity of
> > O(number of groups receiving events + depth of dir where event is happening)
> > is probably achievable - we'd walk the tree up and have roots of watched
> > subtrees marked. What do you think?
> 
> I am for that. I already posted a POC along those lines [1].
> I was just not sure how to limit the potential accumulated overhead.
> 
> [1] https://github.com/amir73il/linux/commits/fanotify_subtree_mark

Yes, but AFAICT your solution was O((number of subtree marks on sb) * depth
of dir) while I'm speaking about O(number of *groups* on sb + depth of
dir). Which is significantly less and will require more careful setup to
achieve such complexity (like placing special marks in lists of watches for
directories that are roots of watched subtrees and checking such lists when
walking up the tree).

> > Also there is a somewhat related question what is the semantics of subtree
> > watches in presence of mounts - do subtree watches "see through" mount
> > points? Probably not but then with bind mounts this can be sometimes
> > inconvenient / confusing - e.g. if I have /tmp bind-mounted to /var/tmp and
> > I'm watching subtree of /var, I would not get events for what's in
> > /var/tmp... Which is logical if you spell it out like this but applications
> > often don't care how the mount hierarchy looks like, they just care about
> > locally visible directory structure.
> 
> Those are hard questions.  I think that userns/mountns developers needed
> to address them a while ago and I think there are some helpers that help
> with checking visibility of paths.
> 
> > > <hand waving>
> > > I think what we need here (thinking out loud) is to account the sb marks
> > > to the user that mounted the filesystem or to the user mapped to admin using
> > > idmapped mount, maybe to both(?), probably using a separate ucount entry
> > > (e.g. max_fanotify_filesystem_marks).
> >
> > I'm somewhat lost here. Are these two users different? We have /home/foo
> > which is a mounted filesystem. AFAIU it will be mounted in a special user
> > namespace for user 'foo' - let's call is 'foo-ns'. /home/foo has idmapping
> > attached so system [ug]ids and non-trivially mapped to on-disk [ug]ids. Now
> > we have a user - let's call it 'foo-usr' that has enough capabilities
> > (whatever they are) in 'foo-ns' to place fanotify subtree marks in
> > /home/foo. So these marks are naturally accounted towards 'foo-usr'. To
> > whom else you'd like to also account these marks and why?
> >
> 
> I would like the system admin to be able to limit 100 sb marks on /home
> (filtered or not) because that impacts the send_to_group iteration.

OK, so per-sb limitation of sb mark number...

> I would also like systemd to be able to grant a smaller quota of filtered
> sb marks per user when creating and mapping the idmapped mounts
> at /home/foo$N

... and a ucount to go with it?

> I *think* we can achieve that, by accounting the sb marks to uid 0
> (who mounted /home) in ucounts entry "fanotify_sb_marks".

But a superblock can be mounted in multiple places, in multiple user
namespaces, potentially by different users (think of nested containers)? So
if we want a per-sb limit on sb marks, I think that accounting those per
user won't really achieve that?

> If /home would have been a FS_USERNS_MOUNT mounted inside
> some userns, then all its sb marks would be accounted to uid 0 of
> that userns.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-25 13:49                   ` Jan Kara
@ 2021-03-25 15:05                     ` Amir Goldstein
  0 siblings, 0 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-25 15:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel, Linux API, Christian Brauner

> > I would like the system admin to be able to limit 100 sb marks on /home
> > (filtered or not) because that impacts the send_to_group iteration.
>
> OK, so per-sb limitation of sb mark number...
>
> > I would also like systemd to be able to grant a smaller quota of filtered
> > sb marks per user when creating and mapping the idmapped mounts
> > at /home/foo$N
>
> ... and a ucount to go with it?
>
> > I *think* we can achieve that, by accounting the sb marks to uid 0
> > (who mounted /home) in ucounts entry "fanotify_sb_marks".
>
> But a superblock can be mounted in multiple places, in multiple user
> namespaces, potentially by different users (think of nested containers)? So
> if we want a per-sb limit on sb marks, I think that accounting those per
> user won't really achieve that?
>

I agree. It won't.
We can start with the global max_fanotify_sb_marks.
I do not have an idea how to make that workable using ucounts.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-25 11:12                                   ` Christian Brauner
@ 2021-03-25 15:31                                     ` Amir Goldstein
  2021-03-28 14:58                                       ` Amir Goldstein
  0 siblings, 1 reply; 33+ messages in thread
From: Amir Goldstein @ 2021-03-25 15:31 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

> I get that there are other use-cases that make subtree watches very
> interesting but I don't think the container use-case is a particularly
> pressing one.
>

That's what I thought.

Containers are usually "contained" by a mount and possibly by userns,
so it makes more sense and it would be more efficient to filter by those
contexts.

> > I don't like it so much myself, but I have not had any better idea how to
> > achieve that goal so far.
>
> The limitations of FAN_MARK_MOUNT as I now understand them are indeed
> unpleasant. If we could get FAN_MARK_MOUNT with the same event support
> as FAN_MARK_INODE that would be great.
> I think the delegation model that makes sense to me is to allow
> FAN_MARK_MOUNT when the caller is ns_capable(mnt->mnt_userns) and of
> course ns_capable() in the userns they called fanotify_init() in. That
> feels ok and supportable.

I present to you a demo [1][2] of FAN_MARK_MOUNT on idmapped mount that:

1. Can subscribe and receive FAN_LINK (new) events
2. Is capable of open_by_handle() if fid is under mount root

FAN_LINK (temp name) is an event that I wanted to add anyway [3] and
AFAIK it's the only event that you really need in order to detect when a dir
was created for the use case of injecting a bind mount into a container.

The kernel branch [1] intentionally excludes the controversial patch that
added support for userns filtered sb marks.

Therefore, trying to run the demo script as is on an idmapped mount
inside userns will auto-detect UID 0, try to setup an sb mark and fail.

Instead, the demo script should be run as follows to combine a
mount mark and recursive inode marks:

./test_demo.sh <idmapped-mount-path> 1

For example:
~# ./test_demo.sh /vdf 1
+ WD=/vdf
+ ID=1
...
+ inotifywatch --fanotify --recursive -w -e link --timeout -2 /vdf
Establishing watches...
...
+ mkdir -p a/dir0 a/dir1 a/dir2/subdir2
+ touch a/dir2/file2
...
[fid=ad91a2b8.81a99d43.3000081;name='dir2'] /vdf/a/dir2
[fid=ad91a2b8.81a99d43.8a;name='.'] /vdf/a/dir2/.
[fid=ad91a2b8.81a99d43.10000a6;name='.'] /vdf/a/dir2/subdir2/.
[fid=ad91a2b8.81a99d43.8a;name='file2'] /vdf/a/dir2/file2
...
total  modify  ..................................  create  link
delete  filename
1      0       0       0       0       0        0       1       0
0       /vdf/a/dir2
1      0       0       0       0       0        0       0       1
0       /vdf/a/dir2/.
1      0       0       0       0       0        0       0       1
0       /vdf/a/dir2/subdir2/.
1      0       0       0       0       0        0       0       1
0       /vdf/a/dir2/file2

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_link
[2] https://github.com/amir73il/inotify-tools/commits/fanotify_link
[3] https://lore.kernel.org/linux-fsdevel/CAOQ4uxhEsbfA5+sW4XPnUKgCkXtwoDA-BR3iRO34Nx5c4y7Nug@mail.gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/2] unprivileged fanotify listener
  2021-03-25 15:31                                     ` Amir Goldstein
@ 2021-03-28 14:58                                       ` Amir Goldstein
  0 siblings, 0 replies; 33+ messages in thread
From: Amir Goldstein @ 2021-03-28 14:58 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, Linux API

> > The limitations of FAN_MARK_MOUNT as I now understand them are indeed
> > unpleasant. If we could get FAN_MARK_MOUNT with the same event support
> > as FAN_MARK_INODE that would be great.
> > I think the delegation model that makes sense to me is to allow
> > FAN_MARK_MOUNT when the caller is ns_capable(mnt->mnt_userns) and of
> > course ns_capable() in the userns they called fanotify_init() in. That
> > feels ok and supportable.
>
> I present to you a demo [1][2] of FAN_MARK_MOUNT on idmapped mount that:
>
> 1. Can subscribe and receive FAN_LINK (new) events
> 2. Is capable of open_by_handle() if fid is under mount root
>
> FAN_LINK (temp name) is an event that I wanted to add anyway [3] and
> AFAIK it's the only event that you really need in order to detect when a dir
> was created for the use case of injecting a bind mount into a container.

Scratch that part about the new event.
I found a way to make FAN_CREATE available for FAN_MARK_MOUNT.
Will post an RFC patch.
Same demo instructions. Different branches [1][2]:

>
> The kernel branch [1] intentionally excludes the controversial patch that
> added support for userns filtered sb marks.
>
> Therefore, trying to run the demo script as is on an idmapped mount
> inside userns will auto-detect UID 0, try to setup an sb mark and fail.
>
> Instead, the demo script should be run as follows to combine a
> mount mark and recursive inode marks:
>
> ./test_demo.sh <idmapped-mount-path> 1
>
> For example:

~# ./test_demo.sh /vdf 1
+ WD=/vdf
+ ID=1
...
+ inotifywatch --fanotify --recursive -w --timeout -2 /vdf
Establishing watches...
...
+ mkdir -p a/dir0 a/dir1 a/dir2/A/B/C/
+ touch a/dir2/A/B/C/file2
...
[fid=94847cf7.d74a50ab.30000c2;name='dir2'] /mnt/a/dir2
Adding recursive watches on directory '/mnt/a/dir2/'...
[fid=94847cf7.d74a50ab.87;name='A'] /mnt/a/dir2/A
Adding recursive watches on directory '/mnt/a/dir2/A/'...
[fid=94847cf7.d74a50ab.1000087;name='B'] /mnt/a/dir2/A/B
Adding recursive watches on directory '/mnt/a/dir2/A/B/'...
[fid=94847cf7.d74a50ab.20073e5;name='C'] /mnt/a/dir2/A/B/C
Adding recursive watches on directory '/mnt/a/dir2/A/B/C/'...
[fid=94847cf7.d74a50ab.30000c9;name='file2'] /mnt/a/dir2/A/B/C/file2

Hope that helps.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_userns
[2] https://github.com/amir73il/inotify-tools/commits/fanotify_userns

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-03-28 14:58 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-04 11:29 [PATCH v2 0/2] unprivileged fanotify listener Amir Goldstein
2021-03-04 11:29 ` [PATCH v2 1/2] fanotify: configurable limits via sysfs Amir Goldstein
2021-03-04 11:29 ` [PATCH v2 2/2] fanotify: support limited functionality for unprivileged users Amir Goldstein
2021-03-16 15:55 ` [PATCH v2 0/2] unprivileged fanotify listener Jan Kara
2021-03-17 11:01   ` Amir Goldstein
2021-03-17 11:42     ` Jan Kara
2021-03-17 12:19       ` Amir Goldstein
2021-03-17 17:45         ` Christian Brauner
2021-03-17 19:14           ` Amir Goldstein
2021-03-18 14:31             ` Christian Brauner
2021-03-18 16:48               ` Amir Goldstein
2021-03-19 13:40                 ` Christian Brauner
2021-03-19 14:21                   ` Amir Goldstein
2021-03-20 12:57                     ` Amir Goldstein
2021-03-22 12:44                       ` Amir Goldstein
2021-03-22 16:28                         ` Christian Brauner
2021-03-22 17:22                           ` Amir Goldstein
2021-03-24 13:57                         ` Amir Goldstein
2021-03-24 14:32                           ` Christian Brauner
2021-03-24 15:05                             ` Amir Goldstein
2021-03-24 16:28                               ` Christian Brauner
2021-03-24 17:07                                 ` Amir Goldstein
2021-03-25 11:12                                   ` Christian Brauner
2021-03-25 15:31                                     ` Amir Goldstein
2021-03-28 14:58                                       ` Amir Goldstein
2021-03-18 15:44         ` Jan Kara
2021-03-18 17:07           ` Amir Goldstein
2021-03-18 18:40             ` Christian Brauner
2021-03-22 18:38             ` Amir Goldstein
2021-03-24 11:48               ` Jan Kara
2021-03-24 15:50                 ` Amir Goldstein
2021-03-25 13:49                   ` Jan Kara
2021-03-25 15:05                     ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.