linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Bobrowski <repnop@google.com>,
	Christian Brauner <brauner@kernel.org>,
	Lennart Poettering <lennart@poettering.net>,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: [RFC][PATCH] fanotify: support watching filesystems and mounts inside userns
Date: Sun, 16 Apr 2023 09:07:22 +0300	[thread overview]
Message-ID: <20230416060722.1912831-1-amir73il@gmail.com> (raw)

An unprivileged user is allowed to create an fanotify group and add
inode marks, but not filesystem and mount marks.

Add limited support for setting up filesystem and mount marks by an
unprivileged user under the following conditions:

1.   User has CAP_SYS_ADMIN in the user ns where the group was created
2.a. User has CAP_SYS_ADMIN in the user ns where the filesystem was
     mounted (implies FS_USERNS_MOUNT)
  OR (in case setting up a mark mount)
2.b. User has CAP_SYS_ADMIN in the user ns attached to an idmapped mount

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

This patch is from a WIP followup to unpriv fanotify two years ago [1].

I realize that I never posted it, partly because we wanted to let the
dust settle on upriv fanotify first.

Reading back the thread, it mostly revolved around trying to leverage
idmapped mounts to implement subtree watches, which didn't converge to
a workable solution.

Besides that there was the minor detail that userns admin cannot use
open_by_file_handle() to resolve the path from fids, but that's not
realy a show stopper from using fanotify sb/mount marks within userns.

If there were any other concerns regarding this patch, I did not find
them in the thread.

A recent request from Christian to watch (within user ns) when a mount
is being unmounted brought this patch back to mind, but note that this
patch is complementary to FAN_UNMOUNT patches [2] and would be useful
regardless on FAN_UNMOUNT patches.

I've only tested it manually, because LTP does not (yet) have good
helpers to setup userns with its own mount ns nor idmapped mounts.

Christian,

You can find this patch, along with FAN_UNMOUNT patches on my github [3].
Please confirm that this meets your needs for watching container mounts.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxhfx012GtvXMfiaHSk1M7+gTqkz3LsT0i_cHLnZLMk8nw@mail.gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20230414182903.1852019-1-amir73il@gmail.com/
[3] https://github.com/amir73il/linux/commits/fan_unmount

 fs/notify/fanotify/fanotify.c      |  1 +
 fs/notify/fanotify/fanotify_user.c | 58 +++++++++++++++++-------------
 include/linux/fsnotify_backend.h   |  1 +
 3 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index c204259be6cc..55f45f22b25f 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -1040,6 +1040,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 
 static void fanotify_free_group_priv(struct fsnotify_group *group)
 {
+	put_user_ns(group->user_ns);
 	kfree(group->fanotify_data.merge_hash);
 	if (group->fanotify_data.ucounts)
 		dec_ucount(group->fanotify_data.ucounts,
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index db3b79b8e901..2c3e123aee14 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1238,6 +1238,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 	 * A group with FAN_UNLIMITED_MARKS does not contribute to mark count
 	 * in the limited groups account.
 	 */
+	BUILD_BUG_ON(!(FANOTIFY_ADMIN_INIT_FLAGS & FAN_UNLIMITED_MARKS));
 	if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS) &&
 	    !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
 		return ERR_PTR(-ENOSPC);
@@ -1423,6 +1424,7 @@ static struct hlist_head *fanotify_alloc_merge_hash(void)
 /* fanotify syscalls */
 SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct fsnotify_group *group;
 	int f_flags, fd;
 	unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
@@ -1514,8 +1516,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	}
 
 	/* Enforce groups limits per user in all containing user ns */
-	group->fanotify_data.ucounts = inc_ucount(current_user_ns(),
-						  current_euid(),
+	group->fanotify_data.ucounts = inc_ucount(user_ns, current_euid(),
 						  UCOUNT_FANOTIFY_GROUPS);
 	if (!group->fanotify_data.ucounts) {
 		fd = -EMFILE;
@@ -1524,6 +1525,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 
 	group->fanotify_data.flags = flags | internal_flags;
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
+	group->user_ns = get_user_ns(user_ns);
 
 	group->fanotify_data.merge_hash = fanotify_alloc_merge_hash();
 	if (!group->fanotify_data.merge_hash) {
@@ -1557,21 +1559,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		goto out_destroy_group;
 	}
 
+	BUILD_BUG_ON(!(FANOTIFY_ADMIN_INIT_FLAGS & FAN_UNLIMITED_QUEUE));
 	if (flags & FAN_UNLIMITED_QUEUE) {
-		fd = -EPERM;
-		if (!capable(CAP_SYS_ADMIN))
-			goto out_destroy_group;
 		group->max_events = UINT_MAX;
 	} else {
 		group->max_events = fanotify_max_queued_events;
 	}
 
-	if (flags & FAN_UNLIMITED_MARKS) {
-		fd = -EPERM;
-		if (!capable(CAP_SYS_ADMIN))
-			goto out_destroy_group;
-	}
-
 	if (flags & FAN_ENABLE_AUDIT) {
 		fd = -EPERM;
 		if (!capable(CAP_AUDIT_WRITE))
@@ -1758,17 +1752,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		goto fput_and_out;
 	group = f.file->private_data;
 
-	/*
-	 * An unprivileged user is not allowed to setup mount nor filesystem
-	 * marks.  This also includes setting up such marks by a group that
-	 * was initialized by an unprivileged user.
-	 */
-	ret = -EPERM;
-	if ((!capable(CAP_SYS_ADMIN) ||
-	     FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) &&
-	    mark_type != FAN_MARK_INODE)
-		goto fput_and_out;
-
 	/*
 	 * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF.  These are not
 	 * allowed to set permissions events.
@@ -1821,6 +1804,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		goto fput_and_out;
 
 	if (mark_cmd == FAN_MARK_FLUSH) {
+		/*
+		 * A user is not allowed to flush sb/mount marks unless the
+		 * user is capable in the user ns where the group was created.
+		 */
+		ret = -EPERM;
+		if (!ns_capable(group->user_ns, CAP_SYS_ADMIN) &&
+		    mark_type != FAN_MARK_INODE)
+			goto fput_and_out;
+
 		ret = 0;
 		if (mark_type == FAN_MARK_MOUNT)
 			fsnotify_clear_vfsmount_marks_by_group(group);
@@ -1861,10 +1853,28 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	}
 
 	/* inode held in place by reference to path; group by fget on fd */
-	if (mark_type == FAN_MARK_INODE)
+	if (mark_type == FAN_MARK_INODE) {
 		inode = path.dentry->d_inode;
-	else
+	} else {
+		/*
+		 * A user is not allowed to setup sb/mount marks unless the
+		 * user is capable in the user ns where the group was created.
+		 * The user also needs to be capable either in the user ns
+		 * associated with the marked filesystem or in the user ns
+		 * associated with an idmapped mount (when marking a mount).
+		 */
+		ret = -EPERM;
+		if (!ns_capable(group->user_ns, CAP_SYS_ADMIN))
+			goto path_put_and_out;
+
 		mnt = path.mnt;
+		if (!ns_capable(mnt->mnt_sb->s_user_ns, CAP_SYS_ADMIN)) {
+			if (mark_type == FAN_MARK_FILESYSTEM ||
+			    !ns_capable(real_mount(mnt)->mnt_ns->user_ns,
+					CAP_SYS_ADMIN))
+				goto path_put_and_out;
+		}
+	}
 
 	ret = mnt ? -EINVAL : -EISDIR;
 	/* FAN_MARK_IGNORE requires SURV_MODIFY for sb/mount/dir marks */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f942b1d6326c..70df7eace7e5 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -230,6 +230,7 @@ struct fsnotify_group {
 						 * full */
 
 	struct mem_cgroup *memcg;	/* memcg to charge allocations */
+	struct user_namespace *user_ns;	/* user ns where group was created */
 
 	/* groups can define private fields here or use the void *private */
 	union {
-- 
2.34.1


             reply	other threads:[~2023-04-16  6:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-16  6:07 Amir Goldstein [this message]
2023-04-20 13:12 ` [RFC][PATCH] fanotify: support watching filesystems and mounts inside userns Jan Kara
2023-04-20 14:15   ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230416060722.1912831-1-amir73il@gmail.com \
    --to=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=lennart@poettering.net \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=repnop@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).