bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace
@ 2023-10-18 10:50 Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 01/14] device_cgroup: Implement devcgroup hooks as lsm security hooks Michael Weiß
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Introduce the flag BPF_DEVCG_ACC_MKNOD_UNS for bpf programs of type
BPF_PROG_TYPE_CGROUP_DEVICE which allows to guard access to mknod
in non-initial user namespaces.

If a container manager restricts its unprivileged (user namespaced)
children by a device cgroup, it is not necessary to deny mknod()
anymore. Thus, user space applications may map devices on different
locations in the file system by using mknod() inside the container.

A use case for this, we also use in GyroidOS, is to run virsh for
VMs inside an unprivileged container. virsh creates device nodes,
e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null" which currently fails
in a non-initial userns, even if a cgroup device white list with the
corresponding major, minor of /dev/null exists. Thus, in this case
the usual bind mounts or pre populated device nodes under /dev are
not sufficient.

To circumvent this limitation, allow mknod() by checking CAP_MKNOD
in the userns by implementing the security_inode_mknod_nscap(). The
hook implementation checks if the corresponding permission flag
BPF_DEVCG_ACC_MKNOD_UNS is set for the device in the bpf program.
To avoid to create unusable inodes in user space the hook also
checks SB_I_NODEV on the corresponding super block.

Further, the security_sb_alloc_userns() hook is implemented using
cgroup_bpf_current_enabled() to allow usage of device nodes on super
blocks mounted by a guarded task.

Patch 1 to 3 rework the current devcgroup_inode hooks as an LSM

Patch 4 to 8 rework explicit calls to devcgroup_check_permission
also as LSM hooks and finalize the conversion of the device_cgroup
subsystem to a LSM.

Patch 9 and 10 introduce new generic security hooks to be used
for the actual mknod device guard implementation.

Patch 11 wires up the security hooks in the vfs

Patch 12 and 13 provide helper functions in the bpf cgroup
subsystem.

Patch 14 finally implement the LSM hooks to grand access

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
Changes in v2:
- Integrate this as LSM (Christian, Paul)
- Switched to a device cgroup specific flag instead of a generic
  bpf program flag (Christian)
- do not ignore SB_I_NODEV in fs/namei.c but use LSM hook in
  sb_alloc_super in fs/super.c
- Link to v1: https://lore.kernel.org/r/20230814-devcg_guard-v1-0-654971ab88b1@aisec.fraunhofer.de

Michael Weiß (14):
  device_cgroup: Implement devcgroup hooks as lsm security hooks
  vfs: Remove explicit devcgroup_inode calls
  device_cgroup: Remove explicit devcgroup_inode hooks
  lsm: Add security_dev_permission() hook
  device_cgroup: Implement dev_permission() hook
  block: Switch from devcgroup_check_permission to security hook
  drm/amdkfd: Switch from devcgroup_check_permission to security hook
  device_cgroup: Hide devcgroup functionality completely in lsm
  lsm: Add security_inode_mknod_nscap() hook
  lsm: Add security_sb_alloc_userns() hook
  vfs: Wire up security hooks for lsm-based device guard in userns
  bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access
  bpf: cgroup: Introduce helper cgroup_bpf_current_enabled()
  device_cgroup: Allow mknod in non-initial userns if guarded

 block/bdev.c                                 |   9 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |   7 +-
 fs/namei.c                                   |  24 ++--
 fs/super.c                                   |   6 +-
 include/linux/bpf-cgroup.h                   |   2 +
 include/linux/device_cgroup.h                |  67 -----------
 include/linux/lsm_hook_defs.h                |   4 +
 include/linux/security.h                     |  18 +++
 include/uapi/linux/bpf.h                     |   1 +
 init/Kconfig                                 |   4 +
 kernel/bpf/cgroup.c                          |  14 +++
 security/Kconfig                             |   1 +
 security/Makefile                            |   2 +-
 security/device_cgroup/Kconfig               |   7 ++
 security/device_cgroup/Makefile              |   4 +
 security/{ => device_cgroup}/device_cgroup.c |   3 +-
 security/device_cgroup/device_cgroup.h       |  20 ++++
 security/device_cgroup/lsm.c                 | 114 +++++++++++++++++++
 security/security.c                          |  75 ++++++++++++
 19 files changed, 294 insertions(+), 88 deletions(-)
 delete mode 100644 include/linux/device_cgroup.h
 create mode 100644 security/device_cgroup/Kconfig
 create mode 100644 security/device_cgroup/Makefile
 rename security/{ => device_cgroup}/device_cgroup.c (99%)
 create mode 100644 security/device_cgroup/device_cgroup.h
 create mode 100644 security/device_cgroup/lsm.c


base-commit: 58720809f52779dc0f08e53e54b014209d13eebb
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 01/14] device_cgroup: Implement devcgroup hooks as lsm security hooks
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 02/14] vfs: Remove explicit devcgroup_inode calls Michael Weiß
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

devcgroup_inode_mknod and devcgroup_inode_permission hooks are
called at place where already the corresponding lsm hooks
security_inode_mknod and security_inode_permission are called
to govern device access. Though introduce a small LSM which
implements those two security hooks instead of the additional
explicit devcgroup calls. The explicit API will be removed when
corresponding subsystems will drop the direct call to devcgroup
hooks.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 init/Kconfig                                 |  4 +
 security/Kconfig                             |  1 +
 security/Makefile                            |  2 +-
 security/device_cgroup/Kconfig               |  7 ++
 security/device_cgroup/Makefile              |  4 +
 security/{ => device_cgroup}/device_cgroup.c |  0
 security/device_cgroup/lsm.c                 | 82 ++++++++++++++++++++
 7 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 security/device_cgroup/Kconfig
 create mode 100644 security/device_cgroup/Makefile
 rename security/{ => device_cgroup}/device_cgroup.c (100%)
 create mode 100644 security/device_cgroup/lsm.c

diff --git a/init/Kconfig b/init/Kconfig
index 6d35728b94b2..5ed28dc821f3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1111,6 +1111,8 @@ config PROC_PID_CPUSET
 
 config CGROUP_DEVICE
 	bool "Device controller"
+	select SECURITY
+	select SECURITY_DEVICE_CGROUP
 	help
 	  Provides a cgroup controller implementing whitelists for
 	  devices which a process in the cgroup can mknod or open.
@@ -1136,6 +1138,8 @@ config CGROUP_BPF
 	bool "Support for eBPF programs attached to cgroups"
 	depends on BPF_SYSCALL
 	select SOCK_CGROUP_DATA
+	select SECURITY
+	select SECURITY_DEVICE_CGROUP
 	help
 	  Allow attaching eBPF programs to a cgroup using the bpf(2)
 	  syscall command BPF_PROG_ATTACH.
diff --git a/security/Kconfig b/security/Kconfig
index 52c9af08ad35..0a0e60fc50e1 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -194,6 +194,7 @@ source "security/yama/Kconfig"
 source "security/safesetid/Kconfig"
 source "security/lockdown/Kconfig"
 source "security/landlock/Kconfig"
+source "security/device_cgroup/Kconfig"
 
 source "security/integrity/Kconfig"
 
diff --git a/security/Makefile b/security/Makefile
index 18121f8f85cd..7000cb8a69e8 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -21,7 +21,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
 obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
 obj-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown/
-obj-$(CONFIG_CGROUPS)			+= device_cgroup.o
+obj-$(CONFIG_SECURITY_DEVICE_CGROUP)	+= device_cgroup/
 obj-$(CONFIG_BPF_LSM)			+= bpf/
 obj-$(CONFIG_SECURITY_LANDLOCK)		+= landlock/
 
diff --git a/security/device_cgroup/Kconfig b/security/device_cgroup/Kconfig
new file mode 100644
index 000000000000..93934bda3b8e
--- /dev/null
+++ b/security/device_cgroup/Kconfig
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config SECURITY_DEVICE_CGROUP
+	bool "Device Cgroup Support"
+	depends on SECURITY
+	help
+	  Provides the necessary security framework integration
+	  for cgroup device controller implementations.
diff --git a/security/device_cgroup/Makefile b/security/device_cgroup/Makefile
new file mode 100644
index 000000000000..c715b2b96388
--- /dev/null
+++ b/security/device_cgroup/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_SECURITY_DEVICE_CGROUP) += devcgroup.o
+
+devcgroup-y := lsm.o device_cgroup.o
diff --git a/security/device_cgroup.c b/security/device_cgroup/device_cgroup.c
similarity index 100%
rename from security/device_cgroup.c
rename to security/device_cgroup/device_cgroup.c
diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
new file mode 100644
index 000000000000..ef30cff1f610
--- /dev/null
+++ b/security/device_cgroup/lsm.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device cgroup security module
+ *
+ * This file contains device cgroup LSM hooks.
+ *
+ * Copyright (C) 2023 Fraunhofer AISEC. All rights reserved.
+ * Based on code copied from <file:include/linux/device_cgroups.h> (which has no copyright)
+ *
+ * Authors: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
+ */
+
+#include <linux/bpf-cgroup.h>
+#include <linux/device_cgroup.h>
+#include <linux/lsm_hooks.h>
+
+static int devcg_inode_permission(struct inode *inode, int mask)
+{
+	short type, access = 0;
+
+	if (likely(!inode->i_rdev))
+		return 0;
+
+	if (S_ISBLK(inode->i_mode))
+		type = DEVCG_DEV_BLOCK;
+	else if (S_ISCHR(inode->i_mode))
+		type = DEVCG_DEV_CHAR;
+	else
+		return 0;
+
+	if (mask & MAY_WRITE)
+		access |= DEVCG_ACC_WRITE;
+	if (mask & MAY_READ)
+		access |= DEVCG_ACC_READ;
+
+	return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+					  access);
+}
+
+static int __devcg_inode_mknod(int mode, dev_t dev, short access)
+{
+	short type;
+
+	if (!S_ISBLK(mode) && !S_ISCHR(mode))
+		return 0;
+
+	if (S_ISCHR(mode) && dev == WHITEOUT_DEV)
+		return 0;
+
+	if (S_ISBLK(mode))
+		type = DEVCG_DEV_BLOCK;
+	else
+		type = DEVCG_DEV_CHAR;
+
+	return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
+					  access);
+}
+
+static int devcg_inode_mknod(struct inode *dir, struct dentry *dentry,
+				 umode_t mode, dev_t dev)
+{
+	return __devcg_inode_mknod(mode, dev, DEVCG_ACC_MKNOD);
+}
+
+static struct security_hook_list devcg_hooks[] __ro_after_init = {
+	LSM_HOOK_INIT(inode_permission, devcg_inode_permission),
+	LSM_HOOK_INIT(inode_mknod, devcg_inode_mknod),
+};
+
+static int __init devcgroup_init(void)
+{
+	security_add_hooks(devcg_hooks, ARRAY_SIZE(devcg_hooks),
+			   "devcgroup");
+	pr_info("device cgroup initialized\n");
+	return 0;
+}
+
+DEFINE_LSM(devcgroup) = {
+	.name = "devcgroup",
+	.order = LSM_ORDER_FIRST,
+	.init = devcgroup_init,
+};
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 02/14] vfs: Remove explicit devcgroup_inode calls
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 01/14] device_cgroup: Implement devcgroup hooks as lsm security hooks Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 03/14] device_cgroup: Remove explicit devcgroup_inode hooks Michael Weiß
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Since the new lsm-based cgroup device access control is settled,
the explicit calls to devcgroup_inode_permission and
devcgroup_inode_mknod in fs/namei.c are redundant and can safely
be dropped. The corresponding security_inode_permission and
security_inode_mknod hooks are taking over.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 fs/namei.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 567ee547492b..f601fcbdc4d2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -34,7 +34,6 @@
 #include <linux/capability.h>
 #include <linux/file.h>
 #include <linux/fcntl.h>
-#include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
 #include <linux/posix_acl.h>
 #include <linux/hash.h>
@@ -529,10 +528,6 @@ int inode_permission(struct mnt_idmap *idmap,
 	if (retval)
 		return retval;
 
-	retval = devcgroup_inode_permission(inode, mask);
-	if (retval)
-		return retval;
-
 	return security_inode_permission(inode, mask);
 }
 EXPORT_SYMBOL(inode_permission);
@@ -3987,9 +3982,6 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
 		return -EPERM;
 
 	mode = vfs_prepare_mode(idmap, dir, mode, mode, mode);
-	error = devcgroup_inode_mknod(mode, dev);
-	if (error)
-		return error;
 
 	error = security_inode_mknod(dir, dentry, mode, dev);
 	if (error)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 03/14] device_cgroup: Remove explicit devcgroup_inode hooks
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 01/14] device_cgroup: Implement devcgroup hooks as lsm security hooks Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 02/14] vfs: Remove explicit devcgroup_inode calls Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 04/14] lsm: Add security_dev_permission() hook Michael Weiß
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

All users (actually just fs/namei) of devcgroup_inode_mknod and
devcgroup_inode_permission are removed. Now drop the API completely.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 include/linux/device_cgroup.h | 47 -----------------------------------
 1 file changed, 47 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index d02f32b7514e..d9a62b0cff87 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -14,54 +14,7 @@
 #if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
 int devcgroup_check_permission(short type, u32 major, u32 minor,
 			       short access);
-static inline int devcgroup_inode_permission(struct inode *inode, int mask)
-{
-	short type, access = 0;
-
-	if (likely(!inode->i_rdev))
-		return 0;
-
-	if (S_ISBLK(inode->i_mode))
-		type = DEVCG_DEV_BLOCK;
-	else if (S_ISCHR(inode->i_mode))
-		type = DEVCG_DEV_CHAR;
-	else
-		return 0;
-
-	if (mask & MAY_WRITE)
-		access |= DEVCG_ACC_WRITE;
-	if (mask & MAY_READ)
-		access |= DEVCG_ACC_READ;
-
-	return devcgroup_check_permission(type, imajor(inode), iminor(inode),
-					  access);
-}
-
-static inline int devcgroup_inode_mknod(int mode, dev_t dev)
-{
-	short type;
-
-	if (!S_ISBLK(mode) && !S_ISCHR(mode))
-		return 0;
-
-	if (S_ISCHR(mode) && dev == WHITEOUT_DEV)
-		return 0;
-
-	if (S_ISBLK(mode))
-		type = DEVCG_DEV_BLOCK;
-	else
-		type = DEVCG_DEV_CHAR;
-
-	return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
-					  DEVCG_ACC_MKNOD);
-}
-
 #else
 static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
 			       short access)
-{ return 0; }
-static inline int devcgroup_inode_permission(struct inode *inode, int mask)
-{ return 0; }
-static inline int devcgroup_inode_mknod(int mode, dev_t dev)
-{ return 0; }
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 04/14] lsm: Add security_dev_permission() hook
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (2 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 03/14] device_cgroup: Remove explicit devcgroup_inode hooks Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 05/14] device_cgroup: Implement dev_permission() hook Michael Weiß
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Provide a new lsm hook which may be used to check permission on
a device by its dev_t representation only. This could be used if
an inode is not available and the security_inode_permission
check is not applicable.

A first lsm to use this will be the lately converted cgroup_device
module, to allow permission checks inside driver implementations.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/security.h      |  5 +++++
 security/security.c           | 18 ++++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ac962c4cb44b..a868982725a9 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -275,6 +275,7 @@ LSM_HOOK(int, 0, inode_notifysecctx, struct inode *inode, void *ctx, u32 ctxlen)
 LSM_HOOK(int, 0, inode_setsecctx, struct dentry *dentry, void *ctx, u32 ctxlen)
 LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
 	 u32 *ctxlen)
+LSM_HOOK(int, 0, dev_permission, umode_t mode, dev_t dev, int mask)
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
 LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
diff --git a/include/linux/security.h b/include/linux/security.h
index 5f16eecde00b..8bc6ac8816c6 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -484,6 +484,7 @@ int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
 int security_locked_down(enum lockdown_reason what);
+int security_dev_permission(umode_t mode, dev_t dev, int mask);
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1395,6 +1396,10 @@ static inline int security_locked_down(enum lockdown_reason what)
 {
 	return 0;
 }
+static inline int security_dev_permission(umode_t mode, dev_t dev, int mask)
+{
+	return 0;
+}
 #endif	/* CONFIG_SECURITY */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index 23b129d482a7..40f6787df3b1 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4016,6 +4016,24 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 }
 EXPORT_SYMBOL(security_inode_getsecctx);
 
+/**
+ * security_dev_permission() - Check if accessing a dev is allowed
+ * @mode: file mode holding device type
+ * @dev: device
+ * @mask: access mask
+ *
+ * Check permission before accessing an device by its major minor.
+ * This hook is called by drivers which may not have an inode but only
+ * the dev_t representation of a device to check permission.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_dev_permission(umode_t mode, dev_t dev, int mask)
+{
+	return call_int_hook(dev_permission, 0, mode, dev, mask);
+}
+EXPORT_SYMBOL(security_dev_permission);
+
 #ifdef CONFIG_WATCH_QUEUE
 /**
  * security_post_notification() - Check if a watch notification can be posted
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 05/14] device_cgroup: Implement dev_permission() hook
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (3 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 04/14] lsm: Add security_dev_permission() hook Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 06/14] block: Switch from devcgroup_check_permission to security hook Michael Weiß
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Wrap devcgroup_check_permission() by implementing the new security
hook dev_permission().

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 security/device_cgroup/lsm.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
index ef30cff1f610..987d2c20a577 100644
--- a/security/device_cgroup/lsm.c
+++ b/security/device_cgroup/lsm.c
@@ -14,29 +14,32 @@
 #include <linux/device_cgroup.h>
 #include <linux/lsm_hooks.h>
 
-static int devcg_inode_permission(struct inode *inode, int mask)
+static int devcg_dev_permission(umode_t mode, dev_t dev, int mask)
 {
 	short type, access = 0;
 
-	if (likely(!inode->i_rdev))
-		return 0;
-
-	if (S_ISBLK(inode->i_mode))
+	if (S_ISBLK(mode))
 		type = DEVCG_DEV_BLOCK;
-	else if (S_ISCHR(inode->i_mode))
-		type = DEVCG_DEV_CHAR;
 	else
-		return 0;
+		type = DEVCG_DEV_CHAR;
 
 	if (mask & MAY_WRITE)
 		access |= DEVCG_ACC_WRITE;
 	if (mask & MAY_READ)
 		access |= DEVCG_ACC_READ;
 
-	return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+	return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
 					  access);
 }
 
+static int devcg_inode_permission(struct inode *inode, int mask)
+{
+	if (likely(!inode->i_rdev))
+		return 0;
+
+	return devcg_dev_permission(inode->i_mode, inode->i_rdev, mask);
+}
+
 static int __devcg_inode_mknod(int mode, dev_t dev, short access)
 {
 	short type;
@@ -65,6 +68,7 @@ static int devcg_inode_mknod(struct inode *dir, struct dentry *dentry,
 static struct security_hook_list devcg_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(inode_permission, devcg_inode_permission),
 	LSM_HOOK_INIT(inode_mknod, devcg_inode_mknod),
+	LSM_HOOK_INIT(dev_permission, devcg_dev_permission),
 };
 
 static int __init devcgroup_init(void)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 06/14] block: Switch from devcgroup_check_permission to security hook
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (4 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 05/14] device_cgroup: Implement dev_permission() hook Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 07/14] drm/amdkfd: " Michael Weiß
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

The new lsm-based cgroup device access control provides an
equivalent hook to check device permission. Thus, switch to the
more generic security hook security_dev_permission() instead of
directly calling devcgroup_check_permission().

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 block/bdev.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index f3b13aa1b7d4..fc6de4e2a80b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -10,7 +10,6 @@
 #include <linux/slab.h>
 #include <linux/kmod.h>
 #include <linux/major.h>
-#include <linux/device_cgroup.h>
 #include <linux/blkdev.h>
 #include <linux/blk-integrity.h>
 #include <linux/backing-dev.h>
@@ -27,6 +26,7 @@
 #include <linux/part_stat.h>
 #include <linux/uaccess.h>
 #include <linux/stat.h>
+#include <linux/security.h>
 #include "../fs/internal.h"
 #include "blk.h"
 
@@ -757,10 +757,9 @@ struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 	struct gendisk *disk;
 	int ret;
 
-	ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
-			MAJOR(dev), MINOR(dev),
-			((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
-			((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
+	ret = security_dev_permission(S_IFBLK, dev,
+			((mode & BLK_OPEN_READ) ? MAY_READ : 0) |
+			((mode & BLK_OPEN_WRITE) ? MAY_WRITE : 0));
 	if (ret)
 		return ERR_PTR(ret);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 07/14] drm/amdkfd: Switch from devcgroup_check_permission to security hook
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (5 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 06/14] block: Switch from devcgroup_check_permission to security hook Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 08/14] device_cgroup: Hide devcgroup functionality completely in lsm Michael Weiß
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

The new lsm-based cgroup device access control provides an
equivalent hook to check device permission. Thus, switch to the
more generic security hook security_dev_permission() instead of
directly calling devcgroup_check_permission().

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index fa24e1852493..50979f332e38 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -38,7 +38,7 @@
 #include <linux/seq_file.h>
 #include <linux/kref.h>
 #include <linux/sysfs.h>
-#include <linux/device_cgroup.h>
+#include <linux/security.h>
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_device.h>
@@ -1487,9 +1487,8 @@ static inline int kfd_devcgroup_check_permission(struct kfd_node *kfd)
 #if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
 	struct drm_device *ddev = adev_to_drm(kfd->adev);
 
-	return devcgroup_check_permission(DEVCG_DEV_CHAR, DRM_MAJOR,
-					  ddev->render->index,
-					  DEVCG_ACC_WRITE | DEVCG_ACC_READ);
+	return security_dev_permission(S_IFCHR, MKDEV(DRM_MAJOR, ddev->render->index),
+				       MAY_WRITE | MAY_READ);
 #else
 	return 0;
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 08/14] device_cgroup: Hide devcgroup functionality completely in lsm
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (6 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 07/14] drm/amdkfd: " Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 09/14] lsm: Add security_inode_mknod_nscap() hook Michael Weiß
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Now since all users of devcgroup_check_permission() have been
removed, all device cgroup related functionality is covered by
security hooks. Thus, move the public device_cgroup.h header
into the subfolder of the lsm module.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 security/device_cgroup/device_cgroup.c                    | 3 ++-
 {include/linux => security/device_cgroup}/device_cgroup.h | 0
 security/device_cgroup/lsm.c                              | 3 ++-
 3 files changed, 4 insertions(+), 2 deletions(-)
 rename {include/linux => security/device_cgroup}/device_cgroup.h (100%)

diff --git a/security/device_cgroup/device_cgroup.c b/security/device_cgroup/device_cgroup.c
index dc4df7475081..1a8190929ec3 100644
--- a/security/device_cgroup/device_cgroup.c
+++ b/security/device_cgroup/device_cgroup.c
@@ -6,7 +6,6 @@
  */
 
 #include <linux/bpf-cgroup.h>
-#include <linux/device_cgroup.h>
 #include <linux/cgroup.h>
 #include <linux/ctype.h>
 #include <linux/list.h>
@@ -16,6 +15,8 @@
 #include <linux/rcupdate.h>
 #include <linux/mutex.h>
 
+#include "device_cgroup.h"
+
 #ifdef CONFIG_CGROUP_DEVICE
 
 static DEFINE_MUTEX(devcgroup_mutex);
diff --git a/include/linux/device_cgroup.h b/security/device_cgroup/device_cgroup.h
similarity index 100%
rename from include/linux/device_cgroup.h
rename to security/device_cgroup/device_cgroup.h
diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
index 987d2c20a577..a963536d0a15 100644
--- a/security/device_cgroup/lsm.c
+++ b/security/device_cgroup/lsm.c
@@ -11,9 +11,10 @@
  */
 
 #include <linux/bpf-cgroup.h>
-#include <linux/device_cgroup.h>
 #include <linux/lsm_hooks.h>
 
+#include "device_cgroup.h"
+
 static int devcg_dev_permission(umode_t mode, dev_t dev, int mask)
 {
 	short type, access = 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 09/14] lsm: Add security_inode_mknod_nscap() hook
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (7 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 08/14] device_cgroup: Hide devcgroup functionality completely in lsm Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 10/14] lsm: Add security_sb_alloc_userns() hook Michael Weiß
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Provide a new lsm hook which may be used to allow mknod in
non-initial userns. If access to the device is guarded by this
hook, access to mknod may be granted by checking cap mknod for
unprivileged user namespaces.

By default this will return -EPERM if no lsm implements the
hook. A first lsm to use this will be the lately converted
cgroup_device module.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 include/linux/lsm_hook_defs.h |  2 ++
 include/linux/security.h      |  8 ++++++++
 security/security.c           | 31 +++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index a868982725a9..f4fa01182910 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -276,6 +276,8 @@ LSM_HOOK(int, 0, inode_setsecctx, struct dentry *dentry, void *ctx, u32 ctxlen)
 LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
 	 u32 *ctxlen)
 LSM_HOOK(int, 0, dev_permission, umode_t mode, dev_t dev, int mask)
+LSM_HOOK(int, -EPERM, inode_mknod_nscap, struct inode *dir, struct dentry *dentry,
+	 umode_t mode, dev_t dev)
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
 LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
diff --git a/include/linux/security.h b/include/linux/security.h
index 8bc6ac8816c6..bad6992877f4 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -485,6 +485,8 @@ int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
 int security_locked_down(enum lockdown_reason what);
 int security_dev_permission(umode_t mode, dev_t dev, int mask);
+int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
+			       umode_t mode, dev_t dev);
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1400,6 +1402,12 @@ static inline int security_dev_permission(umode_t mode, dev_t dev, int mask)
 {
 	return 0;
 }
+static inline int security_inode_mknod_nscap(struct inode *dir,
+					     struct dentry *dentry,
+					     umode_t mode, dev_t dev);
+{
+	return -EPERM;
+}
 #endif	/* CONFIG_SECURITY */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index 40f6787df3b1..7708374b6d7e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4034,6 +4034,37 @@ int security_dev_permission(umode_t mode, dev_t dev, int mask)
 }
 EXPORT_SYMBOL(security_dev_permission);
 
+/**
+ * security_inode_mknod_nscap() - Check if device is guarded
+ * @dir: parent directory
+ * @dentry: new file
+ * @mode: new file mode
+ * @dev: device number
+ *
+ * If access to the device is guarded by this hook, access to mknod may be granted by
+ * checking cap mknod for unprivileged user namespaces.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
+			       umode_t mode, dev_t dev)
+{
+	int thisrc;
+	int rc = LSM_RET_DEFAULT(inode_mknod_nscap);
+	struct security_hook_list *hp;
+
+	hlist_for_each_entry(hp, &security_hook_heads.inode_mknod_nscap, list) {
+		thisrc = hp->hook.inode_mknod_nscap(dir, dentry, mode, dev);
+		if (thisrc != LSM_RET_DEFAULT(inode_mknod_nscap)) {
+			rc = thisrc;
+			if (thisrc != 0)
+				break;
+		}
+	}
+	return rc;
+}
+EXPORT_SYMBOL(security_inode_mknod_nscap);
+
 #ifdef CONFIG_WATCH_QUEUE
 /**
  * security_post_notification() - Check if a watch notification can be posted
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 10/14] lsm: Add security_sb_alloc_userns() hook
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (8 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 09/14] lsm: Add security_inode_mknod_nscap() hook Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 11/14] vfs: Wire up security hooks for lsm-based device guard in userns Michael Weiß
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Provide a new lsm hook which may be used to allow access to device
nodes for super blocks created in unprivileged namespaces if some
sort of device guard to control access is implemented.

By default this will return -EPERM if no lsm implements the hook.
A first lsm to use this will be the lately converted cgroup_device
module.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/security.h      |  5 +++++
 security/security.c           | 26 ++++++++++++++++++++++++++
 3 files changed, 32 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index f4fa01182910..0f734a0a5ebc 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -278,6 +278,7 @@ LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
 LSM_HOOK(int, 0, dev_permission, umode_t mode, dev_t dev, int mask)
 LSM_HOOK(int, -EPERM, inode_mknod_nscap, struct inode *dir, struct dentry *dentry,
 	 umode_t mode, dev_t dev)
+LSM_HOOK(int, -EPERM, sb_alloc_userns, struct super_block *sb)
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
 LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
diff --git a/include/linux/security.h b/include/linux/security.h
index bad6992877f4..0f66be1ed1ed 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -487,6 +487,7 @@ int security_locked_down(enum lockdown_reason what);
 int security_dev_permission(umode_t mode, dev_t dev, int mask);
 int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
 			       umode_t mode, dev_t dev);
+int security_sb_alloc_userns(struct super_block *sb);
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1408,6 +1409,10 @@ static inline int security_inode_mknod_nscap(struct inode *dir,
 {
 	return -EPERM;
 }
+static inline int security_sb_alloc_userns(struct super_block *sb)
+{
+	return -EPERM;
+}
 #endif	/* CONFIG_SECURITY */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index 7708374b6d7e..9d5d4ec28e62 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4065,6 +4065,32 @@ int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
 }
 EXPORT_SYMBOL(security_inode_mknod_nscap);
 
+/**
+ * security_sb_alloc_userns() - Grand access to device nodes on sb in userns
+ *
+ * If device access is provided elsewere, this hook will grand access to device nodes
+ * on the allocated sb for unprivileged user namespaces.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_sb_alloc_userns(struct super_block *sb)
+{
+	int thisrc;
+	int rc = LSM_RET_DEFAULT(sb_alloc_userns);
+	struct security_hook_list *hp;
+
+	hlist_for_each_entry(hp, &security_hook_heads.sb_alloc_userns, list) {
+		thisrc = hp->hook.sb_alloc_userns(sb);
+		if (thisrc != LSM_RET_DEFAULT(sb_alloc_userns)) {
+			rc = thisrc;
+			if (thisrc != 0)
+				break;
+		}
+	}
+	return rc;
+}
+EXPORT_SYMBOL(security_sb_alloc_userns);
+
 #ifdef CONFIG_WATCH_QUEUE
 /**
  * security_post_notification() - Check if a watch notification can be posted
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 11/14] vfs: Wire up security hooks for lsm-based device guard in userns
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (9 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 10/14] lsm: Add security_sb_alloc_userns() hook Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 12/14] bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access Michael Weiß
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

Wire up security_inode_mknod_capns() in fs/namei.c. If implemented
and access is granted by an lsm, check ns_capable() instead of the
global CAP_MKNOD.

Wire up security_sb_alloc_userns() in fs/super.c. If implemented
and access is granted by an lsm, the created super block will allow
access to device nodes also if it was created in a non-inital userns.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 fs/namei.c | 16 +++++++++++++++-
 fs/super.c |  6 +++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f601fcbdc4d2..1f68d160e2c0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3949,6 +3949,20 @@ inline struct dentry *user_path_create(int dfd, const char __user *pathname,
 }
 EXPORT_SYMBOL(user_path_create);
 
+static bool mknod_capable(struct inode *dir, struct dentry *dentry,
+			  umode_t mode, dev_t dev)
+{
+	/*
+	 * In case of a security hook implementation check mknod in user
+	 * namespace. Otherwise just check global capability.
+	 */
+	int error = security_inode_mknod_nscap(dir, dentry, mode, dev);
+	if (!error)
+		return ns_capable(current_user_ns(), CAP_MKNOD);
+	else
+		return capable(CAP_MKNOD);
+}
+
 /**
  * vfs_mknod - create device node or file
  * @idmap:	idmap of the mount the inode was found from
@@ -3975,7 +3989,7 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
 		return error;
 
 	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout &&
-	    !capable(CAP_MKNOD))
+	    !mknod_capable(dir, dentry, mode, dev))
 		return -EPERM;
 
 	if (!dir->i_op->mknod)
diff --git a/fs/super.c b/fs/super.c
index 2d762ce67f6e..bb01db6d9986 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -362,7 +362,11 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	}
 	s->s_bdi = &noop_backing_dev_info;
 	s->s_flags = flags;
-	if (s->s_user_ns != &init_user_ns)
+	/*
+	 * We still have to think about this here. Several concerns exist
+	 * about the security model, especially about malicious fuse.
+	 */
+	if (s->s_user_ns != &init_user_ns && security_sb_alloc_userns(s))
 		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_roots);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 12/14] bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (10 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 11/14] vfs: Wire up security hooks for lsm-based device guard in userns Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 13/14] bpf: cgroup: Introduce helper cgroup_bpf_current_enabled() Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 14/14] device_cgroup: Allow mknod in non-initial userns if guarded Michael Weiß
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

With this new flag for bpf cgroup device programs, it should be
possible to guard mknod() access in non-initial user namespaces
later on.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0448700890f7..0196b9c72d3e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6927,6 +6927,7 @@ enum {
 	BPF_DEVCG_ACC_MKNOD	= (1ULL << 0),
 	BPF_DEVCG_ACC_READ	= (1ULL << 1),
 	BPF_DEVCG_ACC_WRITE	= (1ULL << 2),
+	BPF_DEVCG_ACC_MKNOD_UNS	= (1ULL << 3),
 };
 
 enum {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 13/14] bpf: cgroup: Introduce helper cgroup_bpf_current_enabled()
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (11 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 12/14] bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  2023-10-18 10:50 ` [RFC PATCH v2 14/14] device_cgroup: Allow mknod in non-initial userns if guarded Michael Weiß
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß,
	Alexander Mikhalitsyn

This helper can be used to check if a cgroup-bpf specific program is
active for the current task.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 include/linux/bpf-cgroup.h |  2 ++
 kernel/bpf/cgroup.c        | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 8506690dbb9c..655697c2a620 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -184,6 +184,8 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 	return array != &bpf_empty_prog_array.hdr;
 }
 
+bool cgroup_bpf_current_enabled(enum cgroup_bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
 ({									      \
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 03b3d4492980..19ae3d037db7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -24,6 +24,20 @@
 DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
+bool cgroup_bpf_current_enabled(enum cgroup_bpf_attach_type type)
+{
+	struct cgroup *cgrp;
+	struct bpf_prog_array *array;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
+
+	array = rcu_access_pointer(cgrp->bpf.effective[type]);
+	return array != &bpf_empty_prog_array.hdr;
+}
+EXPORT_SYMBOL(cgroup_bpf_current_enabled);
+
 /* __always_inline is necessary to prevent indirect call through run_prog
  * function pointer.
  */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 14/14] device_cgroup: Allow mknod in non-initial userns if guarded
  2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
                   ` (12 preceding siblings ...)
  2023-10-18 10:50 ` [RFC PATCH v2 13/14] bpf: cgroup: Introduce helper cgroup_bpf_current_enabled() Michael Weiß
@ 2023-10-18 10:50 ` Michael Weiß
  13 siblings, 0 replies; 15+ messages in thread
From: Michael Weiß @ 2023-10-18 10:50 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, Christian Brauner, Alexei Starovoitov, Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, gyroidos, Michael Weiß

If a container manager restricts its unprivileged (user namespaced)
children by a device cgroup, it is not necessary to deny mknod()
anymore. Thus, user space applications may map devices on different
locations in the file system by using mknod() inside the container.

A use case for this, we also use in GyroidOS, is to run virsh for
VMs inside an unprivileged container. virsh creates device nodes,
e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null" which currently fails
in a non-initial userns, even if a cgroup device white list with the
corresponding major, minor of /dev/null exists. Thus, in this case
the usual bind mounts or pre populated device nodes under /dev are
not sufficient.

To circumvent this limitation, allow mknod() by checking CAP_MKNOD
in the userns by implementing the security_inode_mknod_nscap(). The
hook implementation checks if the corresponding permission flag
BPF_DEVCG_ACC_MKNOD_UNS is set for the device in the bpf program.
To avoid to create unusable inodes in user space the hook also checks
SB_I_NODEV on the corresponding super block.

Further, the security_sb_alloc_userns() hook is implemented using
cgroup_bpf_current_enabled() to allow usage of device nodes on super
blocks mounted by a guarded task.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 security/device_cgroup/lsm.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
index a963536d0a15..6bc984d9c9d1 100644
--- a/security/device_cgroup/lsm.c
+++ b/security/device_cgroup/lsm.c
@@ -66,10 +66,37 @@ static int devcg_inode_mknod(struct inode *dir, struct dentry *dentry,
 	return __devcg_inode_mknod(mode, dev, DEVCG_ACC_MKNOD);
 }
 
+#ifdef CONFIG_CGROUP_BPF
+static int devcg_sb_alloc_userns(struct super_block *sb)
+{
+	if (cgroup_bpf_current_enabled(CGROUP_DEVICE))
+		return 0;
+
+	return -EPERM;
+}
+
+static int devcg_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
+				       umode_t mode, dev_t dev)
+{
+	if (!cgroup_bpf_current_enabled(CGROUP_DEVICE))
+		return -EPERM;
+
+	// avoid to create unusable inodes in user space
+	if (dentry->d_sb->s_iflags & SB_I_NODEV)
+		return -EPERM;
+
+	return __devcg_inode_mknod(mode, dev, BPF_DEVCG_ACC_MKNOD_UNS);
+}
+#endif /* CONFIG_CGROUP_BPF */
+
 static struct security_hook_list devcg_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(inode_permission, devcg_inode_permission),
 	LSM_HOOK_INIT(inode_mknod, devcg_inode_mknod),
 	LSM_HOOK_INIT(dev_permission, devcg_dev_permission),
+#ifdef CONFIG_CGROUP_BPF
+	LSM_HOOK_INIT(sb_alloc_userns, devcg_sb_alloc_userns),
+	LSM_HOOK_INIT(inode_mknod_nscap, devcg_inode_mknod_nscap),
+#endif
 };
 
 static int __init devcgroup_init(void)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-10-18 10:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-18 10:50 [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 01/14] device_cgroup: Implement devcgroup hooks as lsm security hooks Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 02/14] vfs: Remove explicit devcgroup_inode calls Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 03/14] device_cgroup: Remove explicit devcgroup_inode hooks Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 04/14] lsm: Add security_dev_permission() hook Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 05/14] device_cgroup: Implement dev_permission() hook Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 06/14] block: Switch from devcgroup_check_permission to security hook Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 07/14] drm/amdkfd: " Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 08/14] device_cgroup: Hide devcgroup functionality completely in lsm Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 09/14] lsm: Add security_inode_mknod_nscap() hook Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 10/14] lsm: Add security_sb_alloc_userns() hook Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 11/14] vfs: Wire up security hooks for lsm-based device guard in userns Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 12/14] bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 13/14] bpf: cgroup: Introduce helper cgroup_bpf_current_enabled() Michael Weiß
2023-10-18 10:50 ` [RFC PATCH v2 14/14] device_cgroup: Allow mknod in non-initial userns if guarded Michael Weiß

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).