All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC
@ 2022-12-09 16:04 jeffxu
  2022-12-09 16:04 ` [PATCH v7 1/6] mm/memfd: add F_SEAL_EXEC jeffxu
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

From: Jeff Xu <jeffxu@google.com>

Since Linux introduced the memfd feature, memfd have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.

However, in a secure by default system, such as ChromeOS, (where all
executables should come from the rootfs, which is protected by Verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
process created a memfd to share the content with an external process,
however the memfd is overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP in this kind.

On the other hand, executable memfd has its legit use, runc uses memfd’s
seal and executable feature to copy the contents of the binary then
execute them, for such system, we need a solution to differentiate runc's
use of  executable memfds and an attacker's [3].

To address those above, this set of patches add following:
1> Let memfd_create() set X bit at creation time.
2> Let memfd to be sealed for modifying X bit.
3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
   X bit.For example, if a container has vm.memfd_noexec=2, then
   memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This make it possible to a new
LSM, which rejects or allows executable memfd based on its security policy.

Change history:
v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of
		global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).

v6:https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h

v5:https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c

v3:https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() to set X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().

v2:https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- address comments in V1.
- add sysctl (vm.mfd_noexec) to set the default file permissions of
  memfd_create to be non-executable.

v1:https://lwn.net/Articles/890096/

[1] https://crbug.com/1305411
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
[3] https://lwn.net/Articles/781013/

Daniel Verkamp (2):
  mm/memfd: add F_SEAL_EXEC
  selftests/memfd: add tests for F_SEAL_EXEC

Jeff Xu (4):
  mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
  selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
  mm/memfd: security hook for memfd_create

 include/linux/lsm_hook_defs.h              |   1 +
 include/linux/lsm_hooks.h                  |   4 +
 include/linux/pid_namespace.h              |  19 ++
 include/linux/security.h                   |   6 +
 include/uapi/linux/fcntl.h                 |   1 +
 include/uapi/linux/memfd.h                 |   4 +
 kernel/pid_namespace.c                     |   5 +
 kernel/pid_sysctl.h                        |  59 ++++
 mm/memfd.c                                 |  61 +++-
 mm/shmem.c                                 |   6 +
 security/security.c                        |   5 +
 tools/testing/selftests/memfd/fuse_test.c  |   1 +
 tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
 13 files changed, 510 insertions(+), 3 deletions(-)
 create mode 100644 kernel/pid_sysctl.h


base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v7 1/6] mm/memfd: add F_SEAL_EXEC
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
@ 2022-12-09 16:04 ` jeffxu
  2022-12-09 16:04 ` [PATCH v7 2/6] selftests/memfd: add tests for F_SEAL_EXEC jeffxu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

From: Daniel Verkamp <dverkamp@chromium.org>

The new F_SEAL_EXEC flag will prevent modification of the exec bits:
written as traditional octal mask, 0111, or as named flags, S_IXUSR |
S_IXGRP | S_IXOTH. Any chmod(2) or similar call that attempts to modify
any of these bits after the seal is applied will fail with errno EPERM.

This will preserve the execute bits as they are at the time of sealing,
so the memfd will become either permanently executable or permanently
un-executable.

Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Co-developed-by: Jeff Xu <jeffxu@google.com>
Signed-off-by: Jeff Xu <jeffxu@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/uapi/linux/fcntl.h | 1 +
 mm/memfd.c                 | 2 ++
 mm/shmem.c                 | 6 ++++++
 3 files changed, 9 insertions(+)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..e8c07da58c9f 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,7 @@
 #define F_SEAL_GROW	0x0004	/* prevent file from growing */
 #define F_SEAL_WRITE	0x0008	/* prevent writes */
 #define F_SEAL_FUTURE_WRITE	0x0010  /* prevent future writes while mapped */
+#define F_SEAL_EXEC	0x0020  /* prevent chmod modifying exec bits */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/memfd.c b/mm/memfd.c
index 08f5f8304746..4ebeab94aa74 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -147,6 +147,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 }
 
 #define F_ALL_SEALS (F_SEAL_SEAL | \
+		     F_SEAL_EXEC | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
 		     F_SEAL_WRITE | \
@@ -175,6 +176,7 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
 	 *   SEAL_SHRINK: Prevent the file from shrinking
 	 *   SEAL_GROW: Prevent the file from growing
 	 *   SEAL_WRITE: Prevent write access to the file
+	 *   SEAL_EXEC: Prevent modification of the exec bits in the file mode
 	 *
 	 * As we don't require any trust relationship between two parties, we
 	 * must prevent seals from being removed. Therefore, sealing a file
diff --git a/mm/shmem.c b/mm/shmem.c
index c1d8b8a1aa3b..e18a9cf9d937 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1085,6 +1085,12 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 	if (error)
 		return error;
 
+	if ((info->seals & F_SEAL_EXEC) && (attr->ia_valid & ATTR_MODE)) {
+		if ((inode->i_mode ^ attr->ia_mode) & 0111) {
+			return -EPERM;
+		}
+	}
+
 	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
 		loff_t oldsize = inode->i_size;
 		loff_t newsize = attr->ia_size;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 2/6] selftests/memfd: add tests for F_SEAL_EXEC
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
  2022-12-09 16:04 ` [PATCH v7 1/6] mm/memfd: add F_SEAL_EXEC jeffxu
@ 2022-12-09 16:04 ` jeffxu
  2022-12-14 18:52   ` Kees Cook
  2022-12-09 16:04 ` [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

From: Daniel Verkamp <dverkamp@chromium.org>

Basic tests to ensure that user/group/other execute bits cannot be
changed after applying F_SEAL_EXEC to a memfd.

Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Co-developed-by: Jeff Xu <jeffxu@google.com>
Signed-off-by: Jeff Xu <jeffxu@google.com>
---
 tools/testing/selftests/memfd/memfd_test.c | 123 ++++++++++++++++++++-
 1 file changed, 122 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 94df2692e6e4..f18a15a1f275 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -28,12 +28,38 @@
 #define MFD_DEF_SIZE 8192
 #define STACK_SIZE 65536
 
+#define F_SEAL_EXEC	0x0020
+
 /*
  * Default is not to test hugetlbfs
  */
 static size_t mfd_def_size = MFD_DEF_SIZE;
 static const char *memfd_str = MEMFD_STR;
 
+static ssize_t fd2name(int fd, char *buf, size_t bufsize)
+{
+	char buf1[PATH_MAX];
+	int size;
+	ssize_t nbytes;
+
+	size = snprintf(buf1, PATH_MAX, "/proc/self/fd/%d", fd);
+	if (size < 0) {
+		printf("snprintf(%d) failed on %m\n", fd);
+		abort();
+	}
+
+	/*
+	 * reserver one byte for string termination.
+	 */
+	nbytes = readlink(buf1, buf, bufsize-1);
+	if (nbytes == -1) {
+		printf("readlink(%s) failed %m\n", buf1);
+		abort();
+	}
+	buf[nbytes] = '\0';
+	return nbytes;
+}
+
 static int mfd_assert_new(const char *name, loff_t sz, unsigned int flags)
 {
 	int r, fd;
@@ -98,11 +124,14 @@ static unsigned int mfd_assert_get_seals(int fd)
 
 static void mfd_assert_has_seals(int fd, unsigned int seals)
 {
+	char buf[PATH_MAX];
+	int nbytes;
 	unsigned int s;
+	fd2name(fd, buf, PATH_MAX);
 
 	s = mfd_assert_get_seals(fd);
 	if (s != seals) {
-		printf("%u != %u = GET_SEALS(%d)\n", seals, s, fd);
+		printf("%u != %u = GET_SEALS(%s)\n", seals, s, buf);
 		abort();
 	}
 }
@@ -594,6 +623,64 @@ static void mfd_fail_grow_write(int fd)
 	}
 }
 
+static void mfd_assert_mode(int fd, int mode)
+{
+	struct stat st;
+	char buf[PATH_MAX];
+	int nbytes;
+
+	fd2name(fd, buf, PATH_MAX);
+
+	if (fstat(fd, &st) < 0) {
+		printf("fstat(%s) failed: %m\n", buf);
+		abort();
+	}
+
+	if ((st.st_mode & 07777) != mode) {
+		printf("fstat(%s) wrong file mode 0%04o, but expected 0%04o\n",
+		       buf, (int)st.st_mode & 07777, mode);
+		abort();
+	}
+}
+
+static void mfd_assert_chmod(int fd, int mode)
+{
+	char buf[PATH_MAX];
+	int nbytes;
+
+	fd2name(fd, buf, PATH_MAX);
+
+	if (fchmod(fd, mode) < 0) {
+		printf("fchmod(%s, 0%04o) failed: %m\n", buf, mode);
+		abort();
+	}
+
+	mfd_assert_mode(fd, mode);
+}
+
+static void mfd_fail_chmod(int fd, int mode)
+{
+	struct stat st;
+	char buf[PATH_MAX];
+	int nbytes;
+
+	fd2name(fd, buf, PATH_MAX);
+
+	if (fstat(fd, &st) < 0) {
+		printf("fstat(%s) failed: %m\n", buf);
+		abort();
+	}
+
+	if (fchmod(fd, mode) == 0) {
+		printf("fchmod(%s, 0%04o) didn't fail as expected\n",
+		       buf, mode);
+		abort();
+	}
+
+	/* verify that file mode bits did not change */
+	mfd_assert_mode(fd, st.st_mode & 07777);
+}
+
 static int idle_thread_fn(void *arg)
 {
 	sigset_t set;
@@ -880,6 +967,39 @@ static void test_seal_resize(void)
 	close(fd);
 }
 
+/*
+ * Test SEAL_EXEC
+ * Test that chmod() cannot change x bits after sealing
+ */
+static void test_seal_exec(void)
+{
+	int fd;
+
+	printf("%s SEAL-EXEC\n", memfd_str);
+
+	fd = mfd_assert_new("kern_memfd_seal_exec",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+	mfd_assert_mode(fd, 0777);
+
+	mfd_assert_chmod(fd, 0644);
+
+	mfd_assert_has_seals(fd, 0);
+	mfd_assert_add_seals(fd, F_SEAL_EXEC);
+	mfd_assert_has_seals(fd, F_SEAL_EXEC);
+
+	mfd_assert_chmod(fd, 0600);
+	mfd_fail_chmod(fd, 0777);
+	mfd_fail_chmod(fd, 0670);
+	mfd_fail_chmod(fd, 0605);
+	mfd_fail_chmod(fd, 0700);
+	mfd_fail_chmod(fd, 0100);
+	mfd_assert_chmod(fd, 0666);
+
+	close(fd);
+}
+
 /*
  * Test sharing via dup()
  * Test that seals are shared between dupped FDs and they're all equal.
@@ -1059,6 +1179,7 @@ int main(int argc, char **argv)
 	test_seal_shrink();
 	test_seal_grow();
 	test_seal_resize();
+	test_seal_exec();
 
 	test_share_dup("SHARE-DUP", "");
 	test_share_mmap("SHARE-MMAP", "");
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
  2022-12-09 16:04 ` [PATCH v7 1/6] mm/memfd: add F_SEAL_EXEC jeffxu
  2022-12-09 16:04 ` [PATCH v7 2/6] selftests/memfd: add tests for F_SEAL_EXEC jeffxu
@ 2022-12-09 16:04 ` jeffxu
  2022-12-14 18:53   ` Kees Cook
  2022-12-16 18:39     ` SeongJae Park
  2022-12-09 16:04 ` [PATCH v7 4/6] mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd jeffxu
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module, kernel test robot

From: Jeff Xu <jeffxu@google.com>

The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to
set executable bit at creation time (memfd_create).

When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
(mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
be executable (mode: 0777) after creation.

when MFD_EXEC flag is set, memfd is created with executable bit
(mode:0777), this is the same as the old behavior of memfd_create.

The new pid namespaced sysctl vm.memfd_noexec has 3 values:
0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
        MFD_EXEC was set.
1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
        MFD_NOEXEC_SEAL was set.
2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.

The sysctl allows finer control of memfd_create for old-software
that doesn't set the executable bit, for example, a container with
vm.memfd_noexec=1 means the old-software will create non-executable
memfd by default. Also, the value of memfd_noexec is passed to child
namespace at creation time. For example, if the init namespace has
vm.memfd_noexec=2, all its children namespaces will be created with 2.

Signed-off-by: Jeff Xu <jeffxu@google.com>
Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reported-by: kernel test robot <lkp@intel.com>
---
 include/linux/pid_namespace.h | 19 +++++++++++
 include/uapi/linux/memfd.h    |  4 +++
 kernel/pid_namespace.c        |  5 +++
 kernel/pid_sysctl.h           | 59 +++++++++++++++++++++++++++++++++++
 mm/memfd.c                    | 48 ++++++++++++++++++++++++++--
 5 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 kernel/pid_sysctl.h

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 07481bb87d4e..c758809d5bcf 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -16,6 +16,21 @@
 
 struct fs_pin;
 
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
+/*
+ * sysctl for vm.memfd_noexec
+ * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
+ *	acts like MFD_EXEC was set.
+ * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
+ *	acts like MFD_NOEXEC_SEAL was set.
+ * 2: memfd_create() without MFD_NOEXEC_SEAL will be
+ *	rejected.
+ */
+#define MEMFD_NOEXEC_SCOPE_EXEC			0
+#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL		1
+#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED	2
+#endif
+
 struct pid_namespace {
 	struct idr idr;
 	struct rcu_head rcu;
@@ -31,6 +46,10 @@ struct pid_namespace {
 	struct ucounts *ucounts;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
+	/* sysctl for vm.memfd_noexec */
+	int memfd_noexec_scope;
+#endif
 } __randomize_layout;
 
 extern struct pid_namespace init_pid_ns;
diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..273a4e15dfcf 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,10 @@
 #define MFD_CLOEXEC		0x0001U
 #define MFD_ALLOW_SEALING	0x0002U
 #define MFD_HUGETLB		0x0004U
+/* not executable and sealed to prevent changing to executable. */
+#define MFD_NOEXEC_SEAL		0x0008U
+/* executable */
+#define MFD_EXEC		0x0010U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index f4f8cb0435b4..8a98b1af9376 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -23,6 +23,7 @@
 #include <linux/sched/task.h>
 #include <linux/sched/signal.h>
 #include <linux/idr.h>
+#include "pid_sysctl.h"
 
 static DEFINE_MUTEX(pid_caches_mutex);
 static struct kmem_cache *pid_ns_cachep;
@@ -110,6 +111,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	ns->ucounts = ucounts;
 	ns->pid_allocated = PIDNS_ADDING;
 
+	initialize_memfd_noexec_scope(ns);
+
 	return ns;
 
 out_free_idr:
@@ -455,6 +458,8 @@ static __init int pid_namespaces_init(void)
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	register_sysctl_paths(kern_path, pid_ns_ctl_table);
 #endif
+
+	register_pid_ns_sysctl_table_vm();
 	return 0;
 }
 
diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
new file mode 100644
index 000000000000..90a93161a122
--- /dev/null
+++ b/kernel/pid_sysctl.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef LINUX_PID_SYSCTL_H
+#define LINUX_PID_SYSCTL_H
+
+#include <linux/pid_namespace.h>
+
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
+static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns)
+{
+	ns->memfd_noexec_scope =
+		task_active_pid_ns(current)->memfd_noexec_scope;
+}
+
+static int pid_mfd_noexec_dointvec_minmax(struct ctl_table *table,
+	int write, void *buf, size_t *lenp, loff_t *ppos)
+{
+	struct pid_namespace *ns = task_active_pid_ns(current);
+	struct ctl_table table_copy;
+
+	if (write && !ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	table_copy = *table;
+	if (ns != &init_pid_ns)
+		table_copy.data = &ns->memfd_noexec_scope;
+
+	/*
+	 * set minimum to current value, the effect is only bigger
+	 * value is accepted.
+	 */
+	if (*(int *)table_copy.data > *(int *)table_copy.extra1)
+		table_copy.extra1 = table_copy.data;
+
+	return proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos);
+}
+
+static struct ctl_table pid_ns_ctl_table_vm[] = {
+	{
+		.procname	= "memfd_noexec",
+		.data		= &init_pid_ns.memfd_noexec_scope,
+		.maxlen		= sizeof(init_pid_ns.memfd_noexec_scope),
+		.mode		= 0644,
+		.proc_handler	= pid_mfd_noexec_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_TWO,
+	},
+	{ }
+};
+static struct ctl_path vm_path[] = { { .procname = "vm", }, { } };
+static inline void register_pid_ns_sysctl_table_vm(void)
+{
+	register_sysctl_paths(vm_path, pid_ns_ctl_table_vm);
+}
+#else
+static inline void set_memfd_noexec_scope(struct pid_namespace *ns) {}
+static inline void register_pid_ns_ctl_table_vm(void) {}
+#endif
+
+#endif /* LINUX_PID_SYSCTL_H */
diff --git a/mm/memfd.c b/mm/memfd.c
index 4ebeab94aa74..ec70675a7069 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -18,6 +18,7 @@
 #include <linux/hugetlb.h>
 #include <linux/shmem_fs.h>
 #include <linux/memfd.h>
+#include <linux/pid_namespace.h>
 #include <uapi/linux/memfd.h>
 
 /*
@@ -263,12 +264,14 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | MFD_NOEXEC_SEAL | MFD_EXEC)
 
 SYSCALL_DEFINE2(memfd_create,
 		const char __user *, uname,
 		unsigned int, flags)
 {
+	char comm[TASK_COMM_LEN];
+	struct pid_namespace *ns;
 	unsigned int *file_seals;
 	struct file *file;
 	int fd, error;
@@ -285,6 +288,39 @@ SYSCALL_DEFINE2(memfd_create,
 			return -EINVAL;
 	}
 
+	/* Invalid if both EXEC and NOEXEC_SEAL are set.*/
+	if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC_SEAL))
+		return -EINVAL;
+
+	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
+#ifdef CONFIG_SYSCTL
+		int sysctl = MEMFD_NOEXEC_SCOPE_EXEC;
+
+		ns = task_active_pid_ns(current);
+		if (ns)
+			sysctl = ns->memfd_noexec_scope;
+
+		switch (sysctl) {
+		case MEMFD_NOEXEC_SCOPE_EXEC:
+			flags |= MFD_EXEC;
+			break;
+		case MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL:
+			flags |= MFD_NOEXEC_SEAL;
+			break;
+		default:
+			pr_warn_ratelimited(
+				"memfd_create(): MFD_NOEXEC_SEAL is enforced, pid=%d '%s'\n",
+				task_pid_nr(current), get_task_comm(comm, current));
+			return -EINVAL;
+		}
+#else
+		flags |= MFD_EXEC;
+#endif
+		pr_warn_ratelimited(
+			"memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=%d '%s'\n",
+			task_pid_nr(current), get_task_comm(comm, current));
+	}
+
 	/* length includes terminating zero */
 	len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
 	if (len <= 0)
@@ -328,7 +364,15 @@ SYSCALL_DEFINE2(memfd_create,
 	file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
 	file->f_flags |= O_LARGEFILE;
 
-	if (flags & MFD_ALLOW_SEALING) {
+	if (flags & MFD_NOEXEC_SEAL) {
+		struct inode *inode = file_inode(file);
+
+		inode->i_mode &= ~0111;
+		file_seals = memfd_file_seals_ptr(file);
+		*file_seals &= ~F_SEAL_SEAL;
+		*file_seals |= F_SEAL_EXEC;
+	} else if (flags & MFD_ALLOW_SEALING) {
+		/* MFD_EXEC and MFD_ALLOW_SEALING are set */
 		file_seals = memfd_file_seals_ptr(file);
 		*file_seals &= ~F_SEAL_SEAL;
 	}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 4/6] mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
                   ` (2 preceding siblings ...)
  2022-12-09 16:04 ` [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
@ 2022-12-09 16:04 ` jeffxu
  2022-12-09 16:04 ` [PATCH v7 5/6] selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC jeffxu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

From: Jeff Xu <jeffxu@google.com>

In order to avoid WX mappings, add F_SEAL_WRITE when apply
F_SEAL_EXEC to an executable memfd, so W^X from start.

This implys application need to fill the content of the memfd first,
after F_SEAL_EXEC is applied, application can no longer modify the
content of the memfd.

Typically, application seals the memfd right after writing to it.
For example:
1. memfd_create(MFD_EXEC).
2. write() code to the memfd.
3. fcntl(F_ADD_SEALS, F_SEAL_EXEC) to convert the memfd to W^X.
4. call exec() on the memfd.

Signed-off-by: Jeff Xu <jeffxu@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 mm/memfd.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/memfd.c b/mm/memfd.c
index ec70675a7069..92f0a5765f7c 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -222,6 +222,12 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
 		}
 	}
 
+	/*
+	 * SEAL_EXEC implys SEAL_WRITE, making W^X from the start.
+	 */
+	if (seals & F_SEAL_EXEC && inode->i_mode & 0111)
+		seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;
+
 	*file_seals |= seals;
 	error = 0;
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 5/6] selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
                   ` (3 preceding siblings ...)
  2022-12-09 16:04 ` [PATCH v7 4/6] mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd jeffxu
@ 2022-12-09 16:04 ` jeffxu
  2022-12-09 16:04 ` [PATCH v7 6/6] mm/memfd: security hook for memfd_create jeffxu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

From: Jeff Xu <jeffxu@google.com>

Tests to verify MFD_NOEXEC, MFD_EXEC and vm.memfd_noexec sysctl.

Signed-off-by: Jeff Xu <jeffxu@google.com>
Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 tools/testing/selftests/memfd/fuse_test.c  |   1 +
 tools/testing/selftests/memfd/memfd_test.c | 228 ++++++++++++++++++++-
 2 files changed, 224 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/memfd/fuse_test.c b/tools/testing/selftests/memfd/fuse_test.c
index be675002f918..93798c8c5d54 100644
--- a/tools/testing/selftests/memfd/fuse_test.c
+++ b/tools/testing/selftests/memfd/fuse_test.c
@@ -22,6 +22,7 @@
 #include <linux/falloc.h>
 #include <fcntl.h>
 #include <linux/memfd.h>
+#include <linux/types.h>
 #include <sched.h>
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index f18a15a1f275..ae71f15f790d 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -30,6 +30,14 @@
 
 #define F_SEAL_EXEC	0x0020
 
+#define F_WX_SEALS (F_SEAL_SHRINK | \
+		    F_SEAL_GROW | \
+		    F_SEAL_WRITE | \
+		    F_SEAL_FUTURE_WRITE | \
+		    F_SEAL_EXEC)
+
+#define MFD_NOEXEC_SEAL	0x0008U
+
 /*
  * Default is not to test hugetlbfs
  */
@@ -80,6 +88,37 @@ static int mfd_assert_new(const char *name, loff_t sz, unsigned int flags)
 	return fd;
 }
 
+static void sysctl_assert_write(const char *val)
+{
+	int fd = open("/proc/sys/vm/memfd_noexec", O_WRONLY | O_CLOEXEC);
+
+	if (fd < 0) {
+		printf("open sysctl failed\n");
+		abort();
+	}
+
+	if (write(fd, val, strlen(val)) < 0) {
+		printf("write sysctl failed\n");
+		abort();
+	}
+}
+
+static void sysctl_fail_write(const char *val)
+{
+	int fd = open("/proc/sys/vm/memfd_noexec", O_WRONLY | O_CLOEXEC);
+
+	if (fd < 0) {
+		printf("open sysctl failed\n");
+		abort();
+	}
+
+	if (write(fd, val, strlen(val)) >= 0) {
+		printf("write sysctl %s succeeded, but failure expected\n",
+				val);
+		abort();
+	}
+}
+
 static int mfd_assert_reopen_fd(int fd_in)
 {
 	int fd;
@@ -758,6 +797,9 @@ static void test_create(void)
 	mfd_fail_new("", ~0);
 	mfd_fail_new("", 0x80000000U);
 
+	/* verify EXEC and NOEXEC_SEAL can't both be set */
+	mfd_fail_new("", MFD_EXEC | MFD_NOEXEC_SEAL);
+
 	/* verify MFD_CLOEXEC is allowed */
 	fd = mfd_assert_new("", 0, MFD_CLOEXEC);
 	close(fd);
@@ -969,20 +1011,21 @@ static void test_seal_resize(void)
 
 /*
  * Test SEAL_EXEC
- * Test that chmod() cannot change x bits after sealing
+ * Test fd is created with exec and allow sealing.
+ * chmod() cannot change x bits after sealing.
  */
-static void test_seal_exec(void)
+static void test_exec_seal(void)
 {
 	int fd;
 
 	printf("%s SEAL-EXEC\n", memfd_str);
 
+	printf("%s	Apply SEAL_EXEC\n", memfd_str);
 	fd = mfd_assert_new("kern_memfd_seal_exec",
 			    mfd_def_size,
-			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_EXEC);
 
 	mfd_assert_mode(fd, 0777);
-
 	mfd_assert_chmod(fd, 0644);
 
 	mfd_assert_has_seals(fd, 0);
@@ -996,10 +1039,181 @@ static void test_seal_exec(void)
 	mfd_fail_chmod(fd, 0700);
 	mfd_fail_chmod(fd, 0100);
 	mfd_assert_chmod(fd, 0666);
+	mfd_assert_write(fd);
+	close(fd);
+
+	printf("%s	Apply ALL_SEALS\n", memfd_str);
+	fd = mfd_assert_new("kern_memfd_seal_exec",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_EXEC);
+
+	mfd_assert_mode(fd, 0777);
+	mfd_assert_chmod(fd, 0700);
+
+	mfd_assert_has_seals(fd, 0);
+	mfd_assert_add_seals(fd, F_SEAL_EXEC);
+	mfd_assert_has_seals(fd, F_WX_SEALS);
 
+	mfd_fail_chmod(fd, 0711);
+	mfd_fail_chmod(fd, 0600);
+	mfd_fail_write(fd);
+	close(fd);
+}
+
+/*
+ * Test EXEC_NO_SEAL
+ * Test fd is created with exec and not allow sealing.
+ */
+static void test_exec_no_seal(void)
+{
+	int fd;
+
+	printf("%s EXEC_NO_SEAL\n", memfd_str);
+
+	/* Create with EXEC but without ALLOW_SEALING */
+	fd = mfd_assert_new("kern_memfd_exec_no_sealing",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_EXEC);
+	mfd_assert_mode(fd, 0777);
+	mfd_assert_has_seals(fd, F_SEAL_SEAL);
+	mfd_assert_chmod(fd, 0666);
 	close(fd);
 }
 
+/*
+ * Test memfd_create with MFD_NOEXEC flag
+ */
+static void test_noexec_seal(void)
+{
+	int fd;
+
+	printf("%s NOEXEC_SEAL\n", memfd_str);
+
+	/* Create with NOEXEC and ALLOW_SEALING */
+	fd = mfd_assert_new("kern_memfd_noexec",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_NOEXEC_SEAL);
+	mfd_assert_mode(fd, 0666);
+	mfd_assert_has_seals(fd, F_SEAL_EXEC);
+	mfd_fail_chmod(fd, 0777);
+	close(fd);
+
+	/* Create with NOEXEC but without ALLOW_SEALING */
+	fd = mfd_assert_new("kern_memfd_noexec",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_NOEXEC_SEAL);
+	mfd_assert_mode(fd, 0666);
+	mfd_assert_has_seals(fd, F_SEAL_EXEC);
+	mfd_fail_chmod(fd, 0777);
+	close(fd);
+}
+
+static void test_sysctl_child(void)
+{
+	int fd;
+
+	printf("%s sysctl 0\n", memfd_str);
+	sysctl_assert_write("0");
+	fd = mfd_assert_new("kern_memfd_sysctl_0",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+	mfd_assert_mode(fd, 0777);
+	mfd_assert_has_seals(fd, 0);
+	mfd_assert_chmod(fd, 0644);
+	close(fd);
+
+	printf("%s sysctl 1\n", memfd_str);
+	sysctl_assert_write("1");
+	fd = mfd_assert_new("kern_memfd_sysctl_1",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+	mfd_assert_mode(fd, 0666);
+	mfd_assert_has_seals(fd, F_SEAL_EXEC);
+	mfd_fail_chmod(fd, 0777);
+	sysctl_fail_write("0");
+	close(fd);
+
+	printf("%s sysctl 2\n", memfd_str);
+	sysctl_assert_write("2");
+	mfd_fail_new("kern_memfd_sysctl_2",
+		MFD_CLOEXEC | MFD_ALLOW_SEALING);
+	sysctl_fail_write("0");
+	sysctl_fail_write("1");
+}
+
+static int newpid_thread_fn(void *arg)
+{
+	test_sysctl_child();
+	return 0;
+}
+
+static void test_sysctl_child2(void)
+{
+	int fd;
+
+	sysctl_fail_write("0");
+	fd = mfd_assert_new("kern_memfd_sysctl_1",
+			    mfd_def_size,
+			    MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+	mfd_assert_mode(fd, 0666);
+	mfd_assert_has_seals(fd, F_SEAL_EXEC);
+	mfd_fail_chmod(fd, 0777);
+	close(fd);
+}
+
+static int newpid_thread_fn2(void *arg)
+{
+	test_sysctl_child2();
+	return 0;
+}
+static pid_t spawn_newpid_thread(unsigned int flags, int (*fn)(void *))
+{
+	uint8_t *stack;
+	pid_t pid;
+
+	stack = malloc(STACK_SIZE);
+	if (!stack) {
+		printf("malloc(STACK_SIZE) failed: %m\n");
+		abort();
+	}
+
+	pid = clone(fn,
+		    stack + STACK_SIZE,
+		    SIGCHLD | flags,
+		    NULL);
+	if (pid < 0) {
+		printf("clone() failed: %m\n");
+		abort();
+	}
+
+	return pid;
+}
+
+static void join_newpid_thread(pid_t pid)
+{
+	waitpid(pid, NULL, 0);
+}
+
+/*
+ * Test sysctl
+ * A very basic sealing test to see whether setting/retrieving seals works.
+ */
+static void test_sysctl(void)
+{
+	int pid = spawn_newpid_thread(CLONE_NEWPID, newpid_thread_fn);
+
+	join_newpid_thread(pid);
+
+	printf("%s child ns\n", memfd_str);
+	sysctl_assert_write("1");
+
+	pid = spawn_newpid_thread(CLONE_NEWPID, newpid_thread_fn2);
+	join_newpid_thread(pid);
+}
+
 /*
  * Test sharing via dup()
  * Test that seals are shared between dupped FDs and they're all equal.
@@ -1173,13 +1387,15 @@ int main(int argc, char **argv)
 
 	test_create();
 	test_basic();
+	test_exec_seal();
+	test_exec_no_seal();
+	test_noexec_seal();
 
 	test_seal_write();
 	test_seal_future_write();
 	test_seal_shrink();
 	test_seal_grow();
 	test_seal_resize();
-	test_seal_exec();
 
 	test_share_dup("SHARE-DUP", "");
 	test_share_mmap("SHARE-MMAP", "");
@@ -1195,6 +1411,8 @@ int main(int argc, char **argv)
 	test_share_fork("SHARE-FORK", SHARED_FT_STR);
 	join_idle_thread(pid);
 
+	test_sysctl();
+
 	printf("memfd: DONE\n");
 
 	return 0;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
                   ` (4 preceding siblings ...)
  2022-12-09 16:04 ` [PATCH v7 5/6] selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC jeffxu
@ 2022-12-09 16:04 ` jeffxu
  2022-12-09 17:02   ` Casey Schaufler
  2022-12-09 18:29   ` Paul Moore
  2022-12-09 18:15 ` [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC Paul Moore
  2022-12-14 18:54 ` Kees Cook
  7 siblings, 2 replies; 25+ messages in thread
From: jeffxu @ 2022-12-09 16:04 UTC (permalink / raw)
  To: skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module, kernel test robot

From: Jeff Xu <jeffxu@google.com>

The new security_memfd_create allows lsm to check flags of
memfd_create.

The security by default system (such as chromeos) can use this
to implement system wide lsm to allow only non-executable memfd
being created.

Signed-off-by: Jeff Xu <jeffxu@google.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 include/linux/lsm_hook_defs.h | 1 +
 include/linux/lsm_hooks.h     | 4 ++++
 include/linux/security.h      | 6 ++++++
 mm/memfd.c                    | 5 +++++
 security/security.c           | 5 +++++
 5 files changed, 21 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ec119da1d89b..fd40840927c8 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -164,6 +164,7 @@ LSM_HOOK(int, 0, file_alloc_security, struct file *file)
 LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file)
 LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd,
 	 unsigned long arg)
+LSM_HOOK(int, 0, memfd_create, char *name, unsigned int flags)
 LSM_HOOK(int, 0, mmap_addr, unsigned long addr)
 LSM_HOOK(int, 0, mmap_file, struct file *file, unsigned long reqprot,
 	 unsigned long prot, unsigned long flags)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 4ec80b96c22e..5a18a6552278 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -543,6 +543,10 @@
  *	simple integer value.  When @arg represents a user space pointer, it
  *	should never be used by the security module.
  *	Return 0 if permission is granted.
+ * @memfd_create:
+ *	@name is the name of memfd file.
+ *	@flags is the flags used in memfd_create.
+ *	Return 0 if permission is granted.
  * @mmap_addr :
  *	Check permissions for a mmap operation at @addr.
  *	@addr contains virtual address that will be used for the operation.
diff --git a/include/linux/security.h b/include/linux/security.h
index ca1b7109c0db..5b87a780822a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -384,6 +384,7 @@ int security_file_permission(struct file *file, int mask);
 int security_file_alloc(struct file *file);
 void security_file_free(struct file *file);
 int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int security_memfd_create(char *name, unsigned int flags);
 int security_mmap_file(struct file *file, unsigned long prot,
 			unsigned long flags);
 int security_mmap_addr(unsigned long addr);
@@ -963,6 +964,11 @@ static inline int security_file_ioctl(struct file *file, unsigned int cmd,
 	return 0;
 }
 
+static inline int security_memfd_create(char *name, unsigned int flags)
+{
+	return 0;
+}
+
 static inline int security_mmap_file(struct file *file, unsigned long prot,
 				     unsigned long flags)
 {
diff --git a/mm/memfd.c b/mm/memfd.c
index 92f0a5765f7c..f04ed5f0474f 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,6 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
 		goto err_name;
 	}
 
+	/* security hook for memfd_create */
+	error = security_memfd_create(name, flags);
+	if (error)
+		return error;
+
 	if (flags & MFD_HUGETLB) {
 		file = hugetlb_file_setup(name, 0, VM_NORESERVE,
 					HUGETLB_ANONHUGE_INODE,
diff --git a/security/security.c b/security/security.c
index 79d82cb6e469..57788cf94075 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1010,6 +1010,11 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
 }
 EXPORT_SYMBOL(security_sb_clone_mnt_opts);
 
+int security_memfd_create(char *name, unsigned int flags)
+{
+	return call_int_hook(memfd_create, 0, name, flags);
+}
+
 int security_move_mount(const struct path *from_path, const struct path *to_path)
 {
 	return call_int_hook(move_mount, 0, from_path, to_path);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-09 16:04 ` [PATCH v7 6/6] mm/memfd: security hook for memfd_create jeffxu
@ 2022-12-09 17:02   ` Casey Schaufler
  2022-12-09 18:29   ` Paul Moore
  1 sibling, 0 replies; 25+ messages in thread
From: Casey Schaufler @ 2022-12-09 17:02 UTC (permalink / raw)
  To: jeffxu, skhan, keescook
  Cc: akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module, kernel test robot, casey

On 12/9/2022 8:04 AM, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@google.com>
>
> The new security_memfd_create allows lsm to check flags of
> memfd_create.
>
> The security by default system (such as chromeos) can use this
> to implement system wide lsm to allow only non-executable memfd
> being created.
>
> Signed-off-by: Jeff Xu <jeffxu@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
>  include/linux/lsm_hook_defs.h | 1 +
>  include/linux/lsm_hooks.h     | 4 ++++
>  include/linux/security.h      | 6 ++++++
>  mm/memfd.c                    | 5 +++++
>  security/security.c           | 5 +++++
>  5 files changed, 21 insertions(+)
>
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index ec119da1d89b..fd40840927c8 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -164,6 +164,7 @@ LSM_HOOK(int, 0, file_alloc_security, struct file *file)
>  LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file)
>  LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd,
>  	 unsigned long arg)
> +LSM_HOOK(int, 0, memfd_create, char *name, unsigned int flags)
>  LSM_HOOK(int, 0, mmap_addr, unsigned long addr)
>  LSM_HOOK(int, 0, mmap_file, struct file *file, unsigned long reqprot,
>  	 unsigned long prot, unsigned long flags)
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 4ec80b96c22e..5a18a6552278 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -543,6 +543,10 @@
>   *	simple integer value.  When @arg represents a user space pointer, it
>   *	should never be used by the security module.
>   *	Return 0 if permission is granted.
> + * @memfd_create:
> + *	@name is the name of memfd file.
> + *	@flags is the flags used in memfd_create.
> + *	Return 0 if permission is granted.
>   * @mmap_addr :
>   *	Check permissions for a mmap operation at @addr.
>   *	@addr contains virtual address that will be used for the operation.
> diff --git a/include/linux/security.h b/include/linux/security.h
> index ca1b7109c0db..5b87a780822a 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -384,6 +384,7 @@ int security_file_permission(struct file *file, int mask);
>  int security_file_alloc(struct file *file);
>  void security_file_free(struct file *file);
>  int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
> +int security_memfd_create(char *name, unsigned int flags);
>  int security_mmap_file(struct file *file, unsigned long prot,
>  			unsigned long flags);
>  int security_mmap_addr(unsigned long addr);
> @@ -963,6 +964,11 @@ static inline int security_file_ioctl(struct file *file, unsigned int cmd,
>  	return 0;
>  }
>  
> +static inline int security_memfd_create(char *name, unsigned int flags)
> +{
> +	return 0;
> +}
> +

Add a proper kernel doc comment for this function.

>  static inline int security_mmap_file(struct file *file, unsigned long prot,
>  				     unsigned long flags)
>  {
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 92f0a5765f7c..f04ed5f0474f 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -356,6 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
>  		goto err_name;
>  	}
>  
> +	/* security hook for memfd_create */
> +	error = security_memfd_create(name, flags);
> +	if (error)
> +		return error;
> +
>  	if (flags & MFD_HUGETLB) {
>  		file = hugetlb_file_setup(name, 0, VM_NORESERVE,
>  					HUGETLB_ANONHUGE_INODE,
> diff --git a/security/security.c b/security/security.c
> index 79d82cb6e469..57788cf94075 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1010,6 +1010,11 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
>  }
>  EXPORT_SYMBOL(security_sb_clone_mnt_opts);
>  
> +int security_memfd_create(char *name, unsigned int flags)
> +{
> +	return call_int_hook(memfd_create, 0, name, flags);
> +}
> +
>  int security_move_mount(const struct path *from_path, const struct path *to_path)
>  {
>  	return call_int_hook(move_mount, 0, from_path, to_path);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
                   ` (5 preceding siblings ...)
  2022-12-09 16:04 ` [PATCH v7 6/6] mm/memfd: security hook for memfd_create jeffxu
@ 2022-12-09 18:15 ` Paul Moore
  2022-12-14 18:54 ` Kees Cook
  7 siblings, 0 replies; 25+ messages in thread
From: Paul Moore @ 2022-12-09 18:15 UTC (permalink / raw)
  To: jeffxu
  Cc: skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module

On Fri, Dec 9, 2022 at 11:05 AM <jeffxu@chromium.org> wrote:
> From: Jeff Xu <jeffxu@google.com>
>
> Since Linux introduced the memfd feature, memfd have always had their
> execute bit set, and the memfd_create() syscall doesn't allow setting
> it differently.
>
> However, in a secure by default system, such as ChromeOS, (where all
> executables should come from the rootfs, which is protected by Verified
> boot), this executable nature of memfd opens a door for NoExec bypass
> and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
> process created a memfd to share the content with an external process,
> however the memfd is overwritten and used for executing arbitrary code
> and root escalation. [2] lists more VRP in this kind.

...

> [1] https://crbug.com/1305411

Can you make this accessible so those of us on the public lists can
view this bug?  If not, please remove it from future postings and
adjust your description accordingly.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-09 16:04 ` [PATCH v7 6/6] mm/memfd: security hook for memfd_create jeffxu
  2022-12-09 17:02   ` Casey Schaufler
@ 2022-12-09 18:29   ` Paul Moore
  2022-12-13 15:00     ` Jeff Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Paul Moore @ 2022-12-09 18:29 UTC (permalink / raw)
  To: jeffxu
  Cc: skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

On Fri, Dec 9, 2022 at 11:05 AM <jeffxu@chromium.org> wrote:
>
> From: Jeff Xu <jeffxu@google.com>
>
> The new security_memfd_create allows lsm to check flags of
> memfd_create.
>
> The security by default system (such as chromeos) can use this
> to implement system wide lsm to allow only non-executable memfd
> being created.
>
> Signed-off-by: Jeff Xu <jeffxu@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
>  include/linux/lsm_hook_defs.h | 1 +
>  include/linux/lsm_hooks.h     | 4 ++++
>  include/linux/security.h      | 6 ++++++
>  mm/memfd.c                    | 5 +++++
>  security/security.c           | 5 +++++
>  5 files changed, 21 insertions(+)

We typically require at least one in-tree LSM implementation to
accompany a new LSM hook.  Beyond simply providing proof that the hook
has value, it helps provide a functional example both for reviewers as
well as future LSM implementations.  Also, while the BPF LSM is
definitely "in-tree", its nature is such that the actual
implementation lives out-of-tree; something like SELinux, AppArmor,
Smack, etc. are much more desirable from an in-tree example
perspective.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-09 18:29   ` Paul Moore
@ 2022-12-13 15:00     ` Jeff Xu
  2022-12-13 15:37       ` Casey Schaufler
  2022-12-13 19:22       ` Paul Moore
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff Xu @ 2022-12-13 15:00 UTC (permalink / raw)
  To: Paul Moore
  Cc: jeffxu, skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

On Fri, Dec 9, 2022 at 10:29 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Fri, Dec 9, 2022 at 11:05 AM <jeffxu@chromium.org> wrote:
> >
> > From: Jeff Xu <jeffxu@google.com>
> >
> > The new security_memfd_create allows lsm to check flags of
> > memfd_create.
> >
> > The security by default system (such as chromeos) can use this
> > to implement system wide lsm to allow only non-executable memfd
> > being created.
> >
> > Signed-off-by: Jeff Xu <jeffxu@google.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > ---
> >  include/linux/lsm_hook_defs.h | 1 +
> >  include/linux/lsm_hooks.h     | 4 ++++
> >  include/linux/security.h      | 6 ++++++
> >  mm/memfd.c                    | 5 +++++
> >  security/security.c           | 5 +++++
> >  5 files changed, 21 insertions(+)
>
> We typically require at least one in-tree LSM implementation to
> accompany a new LSM hook.  Beyond simply providing proof that the hook
> has value, it helps provide a functional example both for reviewers as
> well as future LSM implementations.  Also, while the BPF LSM is
> definitely "in-tree", its nature is such that the actual
> implementation lives out-of-tree; something like SELinux, AppArmor,
> Smack, etc. are much more desirable from an in-tree example
> perspective.
>
Thanks for the comments.
Would that be OK if I add a new LSM in the kernel  to block executable
memfd creation ?
Alternatively,  it might be possible to add this into SELinux or
landlock, it will be a larger change.

Thanks

Jeff


> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-13 15:00     ` Jeff Xu
@ 2022-12-13 15:37       ` Casey Schaufler
  2022-12-13 19:22       ` Paul Moore
  1 sibling, 0 replies; 25+ messages in thread
From: Casey Schaufler @ 2022-12-13 15:37 UTC (permalink / raw)
  To: Jeff Xu, Paul Moore
  Cc: jeffxu, skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot, casey

On 12/13/2022 7:00 AM, Jeff Xu wrote:
> On Fri, Dec 9, 2022 at 10:29 AM Paul Moore <paul@paul-moore.com> wrote:
>> On Fri, Dec 9, 2022 at 11:05 AM <jeffxu@chromium.org> wrote:
>>> From: Jeff Xu <jeffxu@google.com>
>>>
>>> The new security_memfd_create allows lsm to check flags of
>>> memfd_create.
>>>
>>> The security by default system (such as chromeos) can use this
>>> to implement system wide lsm to allow only non-executable memfd
>>> being created.
>>>
>>> Signed-off-by: Jeff Xu <jeffxu@google.com>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>> ---
>>>  include/linux/lsm_hook_defs.h | 1 +
>>>  include/linux/lsm_hooks.h     | 4 ++++
>>>  include/linux/security.h      | 6 ++++++
>>>  mm/memfd.c                    | 5 +++++
>>>  security/security.c           | 5 +++++
>>>  5 files changed, 21 insertions(+)
>> We typically require at least one in-tree LSM implementation to
>> accompany a new LSM hook.  Beyond simply providing proof that the hook
>> has value, it helps provide a functional example both for reviewers as
>> well as future LSM implementations.  Also, while the BPF LSM is
>> definitely "in-tree", its nature is such that the actual
>> implementation lives out-of-tree; something like SELinux, AppArmor,
>> Smack, etc. are much more desirable from an in-tree example
>> perspective.
>>
> Thanks for the comments.
> Would that be OK if I add a new LSM in the kernel  to block executable
> memfd creation ?
> Alternatively,  it might be possible to add this into SELinux or
> landlock, it will be a larger change.

I expect you'll get other opinions, but I'd be happy with a small LSM
that does sophisticated memory fd controls. I also expect that the
SELinux crew would like to see a hook included there.

>
> Thanks
>
> Jeff
>
>
>> --
>> paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-13 15:00     ` Jeff Xu
  2022-12-13 15:37       ` Casey Schaufler
@ 2022-12-13 19:22       ` Paul Moore
  2022-12-13 23:05         ` Jeff Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Paul Moore @ 2022-12-13 19:22 UTC (permalink / raw)
  To: Jeff Xu
  Cc: jeffxu, skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

On Tue, Dec 13, 2022 at 10:00 AM Jeff Xu <jeffxu@google.com> wrote:
> On Fri, Dec 9, 2022 at 10:29 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Fri, Dec 9, 2022 at 11:05 AM <jeffxu@chromium.org> wrote:
> > >
> > > From: Jeff Xu <jeffxu@google.com>
> > >
> > > The new security_memfd_create allows lsm to check flags of
> > > memfd_create.
> > >
> > > The security by default system (such as chromeos) can use this
> > > to implement system wide lsm to allow only non-executable memfd
> > > being created.
> > >
> > > Signed-off-by: Jeff Xu <jeffxu@google.com>
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > ---
> > >  include/linux/lsm_hook_defs.h | 1 +
> > >  include/linux/lsm_hooks.h     | 4 ++++
> > >  include/linux/security.h      | 6 ++++++
> > >  mm/memfd.c                    | 5 +++++
> > >  security/security.c           | 5 +++++
> > >  5 files changed, 21 insertions(+)
> >
> > We typically require at least one in-tree LSM implementation to
> > accompany a new LSM hook.  Beyond simply providing proof that the hook
> > has value, it helps provide a functional example both for reviewers as
> > well as future LSM implementations.  Also, while the BPF LSM is
> > definitely "in-tree", its nature is such that the actual
> > implementation lives out-of-tree; something like SELinux, AppArmor,
> > Smack, etc. are much more desirable from an in-tree example
> > perspective.
>
> Thanks for the comments.
> Would that be OK if I add a new LSM in the kernel  to block executable
> memfd creation ?

If you would be proposing the LSM only to meet the requirement of
providing an in-tree LSM example, no that would definitely *not* be
okay.

Proposing a new LSM involves documenting a meaningful security model,
implementing it, developing tests, going through a (likely multi-step)
review process, and finally accepting the long term maintenance
responsibilities of this new LSM.  If you are proposing a new LSM
because you feel the current LSMs do not provide a security model
which meets your needs, then yes, proposing a new LSM might be a good
idea.  However, if you are proposing a new LSM because you don't want
to learn how to add a new hook to an existing LSM, then I suspect you
are misguided/misinformed with the amount of work involved in
submitting a new LSM.

> Alternatively,  it might be possible to add this into SELinux or
> landlock, it will be a larger change.

It will be a much smaller change than submitting a new LSM, and it
would have infinitely more value to the community than a throw-away
LSM where the only use-case is getting your code merged upstream.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 6/6] mm/memfd: security hook for memfd_create
  2022-12-13 19:22       ` Paul Moore
@ 2022-12-13 23:05         ` Jeff Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Xu @ 2022-12-13 23:05 UTC (permalink / raw)
  To: Paul Moore
  Cc: jeffxu, skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

On Tue, Dec 13, 2022 at 11:22 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, Dec 13, 2022 at 10:00 AM Jeff Xu <jeffxu@google.com> wrote:
> > On Fri, Dec 9, 2022 at 10:29 AM Paul Moore <paul@paul-moore.com> wrote:
> > > On Fri, Dec 9, 2022 at 11:05 AM <jeffxu@chromium.org> wrote:
> > > >
> > > > From: Jeff Xu <jeffxu@google.com>
> > > >
> > > > The new security_memfd_create allows lsm to check flags of
> > > > memfd_create.
> > > >
> > > > The security by default system (such as chromeos) can use this
> > > > to implement system wide lsm to allow only non-executable memfd
> > > > being created.
> > > >
> > > > Signed-off-by: Jeff Xu <jeffxu@google.com>
> > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > ---
> > > >  include/linux/lsm_hook_defs.h | 1 +
> > > >  include/linux/lsm_hooks.h     | 4 ++++
> > > >  include/linux/security.h      | 6 ++++++
> > > >  mm/memfd.c                    | 5 +++++
> > > >  security/security.c           | 5 +++++
> > > >  5 files changed, 21 insertions(+)
> > >
> > > We typically require at least one in-tree LSM implementation to
> > > accompany a new LSM hook.  Beyond simply providing proof that the hook
> > > has value, it helps provide a functional example both for reviewers as
> > > well as future LSM implementations.  Also, while the BPF LSM is
> > > definitely "in-tree", its nature is such that the actual
> > > implementation lives out-of-tree; something like SELinux, AppArmor,
> > > Smack, etc. are much more desirable from an in-tree example
> > > perspective.
> >
> > Thanks for the comments.
> > Would that be OK if I add a new LSM in the kernel  to block executable
> > memfd creation ?
>
> If you would be proposing the LSM only to meet the requirement of
> providing an in-tree LSM example, no that would definitely *not* be
> okay.
>
> Proposing a new LSM involves documenting a meaningful security model,
> implementing it, developing tests, going through a (likely multi-step)
> review process, and finally accepting the long term maintenance
> responsibilities of this new LSM.  If you are proposing a new LSM
> because you feel the current LSMs do not provide a security model
> which meets your needs, then yes, proposing a new LSM might be a good
> idea.  However, if you are proposing a new LSM because you don't want
> to learn how to add a new hook to an existing LSM, then I suspect you
> are misguided/misinformed with the amount of work involved in
> submitting a new LSM.
>
> > Alternatively,  it might be possible to add this into SELinux or
> > landlock, it will be a larger change.
>
> It will be a much smaller change than submitting a new LSM, and it
> would have infinitely more value to the community than a throw-away
> LSM where the only use-case is getting your code merged upstream.
>
Thanks, my original thought is this LSM will be used by ChromeOS,
since all of its memfd shall be non-executable. That said, I see the community
will benefit more with this in SELinux.

I will work to add this in SELinux, appreciate help while I'm learning
to add this.

Jeff

> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 2/6] selftests/memfd: add tests for F_SEAL_EXEC
  2022-12-09 16:04 ` [PATCH v7 2/6] selftests/memfd: add tests for F_SEAL_EXEC jeffxu
@ 2022-12-14 18:52   ` Kees Cook
  0 siblings, 0 replies; 25+ messages in thread
From: Kees Cook @ 2022-12-14 18:52 UTC (permalink / raw)
  To: jeffxu
  Cc: skhan, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

On Fri, Dec 09, 2022 at 04:04:49PM +0000, jeffxu@chromium.org wrote:
> From: Daniel Verkamp <dverkamp@chromium.org>
> 
> Basic tests to ensure that user/group/other execute bits cannot be
> changed after applying F_SEAL_EXEC to a memfd.
> 
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-09 16:04 ` [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
@ 2022-12-14 18:53   ` Kees Cook
  2022-12-16 18:39     ` SeongJae Park
  1 sibling, 0 replies; 25+ messages in thread
From: Kees Cook @ 2022-12-14 18:53 UTC (permalink / raw)
  To: jeffxu
  Cc: skhan, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module, kernel test robot

On Fri, Dec 09, 2022 at 04:04:50PM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@google.com>
> 
> The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to
> set executable bit at creation time (memfd_create).
> 
> When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
> (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
> be executable (mode: 0777) after creation.
> 
> when MFD_EXEC flag is set, memfd is created with executable bit
> (mode:0777), this is the same as the old behavior of memfd_create.
> 
> The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>         MFD_EXEC was set.
> 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>         MFD_NOEXEC_SEAL was set.
> 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> 
> The sysctl allows finer control of memfd_create for old-software
> that doesn't set the executable bit, for example, a container with
> vm.memfd_noexec=1 means the old-software will create non-executable
> memfd by default. Also, the value of memfd_noexec is passed to child
> namespace at creation time. For example, if the init namespace has
> vm.memfd_noexec=2, all its children namespaces will be created with 2.
> 
> Signed-off-by: Jeff Xu <jeffxu@google.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
                   ` (6 preceding siblings ...)
  2022-12-09 18:15 ` [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC Paul Moore
@ 2022-12-14 18:54 ` Kees Cook
  2022-12-14 23:32   ` Jeff Xu
  7 siblings, 1 reply; 25+ messages in thread
From: Kees Cook @ 2022-12-14 18:54 UTC (permalink / raw)
  To: jeffxu
  Cc: skhan, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

On Fri, Dec 09, 2022 at 04:04:47PM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@google.com>
> 
> Since Linux introduced the memfd feature, memfd have always had their
> execute bit set, and the memfd_create() syscall doesn't allow setting
> it differently.
> 
> However, in a secure by default system, such as ChromeOS, (where all
> executables should come from the rootfs, which is protected by Verified
> boot), this executable nature of memfd opens a door for NoExec bypass
> and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
> process created a memfd to share the content with an external process,
> however the memfd is overwritten and used for executing arbitrary code
> and root escalation. [2] lists more VRP in this kind.
> 
> On the other hand, executable memfd has its legit use, runc uses memfd’s
> seal and executable feature to copy the contents of the binary then
> execute them, for such system, we need a solution to differentiate runc's
> use of  executable memfds and an attacker's [3].
> 
> To address those above, this set of patches add following:
> 1> Let memfd_create() set X bit at creation time.
> 2> Let memfd to be sealed for modifying X bit.
> 3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
>    X bit.For example, if a container has vm.memfd_noexec=2, then
>    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> 4> A new security hook in memfd_create(). This make it possible to a new
> LSM, which rejects or allows executable memfd based on its security policy.

I think patch 1-5 look good to land. The LSM hook seems separable, and
could continue on its own. Thoughts?

(Which tree should memfd change go through?)

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-14 18:54 ` Kees Cook
@ 2022-12-14 23:32   ` Jeff Xu
  2022-12-15  0:08     ` Kees Cook
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2022-12-14 23:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: jeffxu, skhan, akpm, dmitry.torokhov, dverkamp, hughd, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

On Wed, Dec 14, 2022 at 10:54 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Dec 09, 2022 at 04:04:47PM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@google.com>
> >
> > Since Linux introduced the memfd feature, memfd have always had their
> > execute bit set, and the memfd_create() syscall doesn't allow setting
> > it differently.
> >
> > However, in a secure by default system, such as ChromeOS, (where all
> > executables should come from the rootfs, which is protected by Verified
> > boot), this executable nature of memfd opens a door for NoExec bypass
> > and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
> > process created a memfd to share the content with an external process,
> > however the memfd is overwritten and used for executing arbitrary code
> > and root escalation. [2] lists more VRP in this kind.
> >
> > On the other hand, executable memfd has its legit use, runc uses memfd’s
> > seal and executable feature to copy the contents of the binary then
> > execute them, for such system, we need a solution to differentiate runc's
> > use of  executable memfds and an attacker's [3].
> >
> > To address those above, this set of patches add following:
> > 1> Let memfd_create() set X bit at creation time.
> > 2> Let memfd to be sealed for modifying X bit.
> > 3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
> >    X bit.For example, if a container has vm.memfd_noexec=2, then
> >    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> > 4> A new security hook in memfd_create(). This make it possible to a new
> > LSM, which rejects or allows executable memfd based on its security policy.
>
> I think patch 1-5 look good to land. The LSM hook seems separable, and
> could continue on its own. Thoughts?
>
Agreed.

> (Which tree should memfd change go through?)
>
I'm not sure, is there a recommendation ?

Thanks.
Jeff

> -Kees
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-14 23:32   ` Jeff Xu
@ 2022-12-15  0:08     ` Kees Cook
  2022-12-15 16:55       ` Jeff Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Kees Cook @ 2022-12-15  0:08 UTC (permalink / raw)
  To: Jeff Xu, akpm
  Cc: jeffxu, skhan, dmitry.torokhov, dverkamp, hughd, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

On Wed, Dec 14, 2022 at 03:32:16PM -0800, Jeff Xu wrote:
> On Wed, Dec 14, 2022 at 10:54 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Dec 09, 2022 at 04:04:47PM +0000, jeffxu@chromium.org wrote:
> > > From: Jeff Xu <jeffxu@google.com>
> > >
> > > Since Linux introduced the memfd feature, memfd have always had their
> > > execute bit set, and the memfd_create() syscall doesn't allow setting
> > > it differently.
> > >
> > > However, in a secure by default system, such as ChromeOS, (where all
> > > executables should come from the rootfs, which is protected by Verified
> > > boot), this executable nature of memfd opens a door for NoExec bypass
> > > and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
> > > process created a memfd to share the content with an external process,
> > > however the memfd is overwritten and used for executing arbitrary code
> > > and root escalation. [2] lists more VRP in this kind.
> > >
> > > On the other hand, executable memfd has its legit use, runc uses memfd’s
> > > seal and executable feature to copy the contents of the binary then
> > > execute them, for such system, we need a solution to differentiate runc's
> > > use of  executable memfds and an attacker's [3].
> > >
> > > To address those above, this set of patches add following:
> > > 1> Let memfd_create() set X bit at creation time.
> > > 2> Let memfd to be sealed for modifying X bit.
> > > 3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
> > >    X bit.For example, if a container has vm.memfd_noexec=2, then
> > >    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> > > 4> A new security hook in memfd_create(). This make it possible to a new
> > > LSM, which rejects or allows executable memfd based on its security policy.
> >
> > I think patch 1-5 look good to land. The LSM hook seems separable, and
> > could continue on its own. Thoughts?
> >
> Agreed.
> 
> > (Which tree should memfd change go through?)
> >
> I'm not sure, is there a recommendation ?

It looks like it's traditionally through akpm's tree. Andrew, will you
carry patches 1-5?

Thanks!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-15  0:08     ` Kees Cook
@ 2022-12-15 16:55       ` Jeff Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Xu @ 2022-12-15 16:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: akpm, jeffxu, skhan, dmitry.torokhov, dverkamp, hughd, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module

On Wed, Dec 14, 2022 at 4:08 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Wed, Dec 14, 2022 at 03:32:16PM -0800, Jeff Xu wrote:
> > On Wed, Dec 14, 2022 at 10:54 AM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Fri, Dec 09, 2022 at 04:04:47PM +0000, jeffxu@chromium.org wrote:
> > > > From: Jeff Xu <jeffxu@google.com>
> > > >
> > > > Since Linux introduced the memfd feature, memfd have always had their
> > > > execute bit set, and the memfd_create() syscall doesn't allow setting
> > > > it differently.
> > > >
> > > > However, in a secure by default system, such as ChromeOS, (where all
> > > > executables should come from the rootfs, which is protected by Verified
> > > > boot), this executable nature of memfd opens a door for NoExec bypass
> > > > and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
> > > > process created a memfd to share the content with an external process,
> > > > however the memfd is overwritten and used for executing arbitrary code
> > > > and root escalation. [2] lists more VRP in this kind.
> > > >
> > > > On the other hand, executable memfd has its legit use, runc uses memfd’s
> > > > seal and executable feature to copy the contents of the binary then
> > > > execute them, for such system, we need a solution to differentiate runc's
> > > > use of  executable memfds and an attacker's [3].
> > > >
> > > > To address those above, this set of patches add following:
> > > > 1> Let memfd_create() set X bit at creation time.
> > > > 2> Let memfd to be sealed for modifying X bit.
> > > > 3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
> > > >    X bit.For example, if a container has vm.memfd_noexec=2, then
> > > >    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> > > > 4> A new security hook in memfd_create(). This make it possible to a new
> > > > LSM, which rejects or allows executable memfd based on its security policy.
> > >
> > > I think patch 1-5 look good to land. The LSM hook seems separable, and
> > > could continue on its own. Thoughts?
> > >
> > Agreed.
> >
> > > (Which tree should memfd change go through?)
> > >
> > I'm not sure, is there a recommendation ?
>
> It looks like it's traditionally through akpm's tree. Andrew, will you
> carry patches 1-5?
>
Hi Andrew, if you are taking this, V8 is the latest that contains patch 1-5.

Thanks
Jeff

> Thanks!
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-09 16:04 ` [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
@ 2022-12-16 18:39     ` SeongJae Park
  2022-12-16 18:39     ` SeongJae Park
  1 sibling, 0 replies; 25+ messages in thread
From: SeongJae Park @ 2022-12-16 18:39 UTC (permalink / raw)
  Cc: skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

Hi Jeff,

> From: Jeff Xu <jeffxu@google.com>
> 
> The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to
> set executable bit at creation time (memfd_create).
> 
> When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
> (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
> be executable (mode: 0777) after creation.
> 
> when MFD_EXEC flag is set, memfd is created with executable bit
> (mode:0777), this is the same as the old behavior of memfd_create.
> 
> The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>         MFD_EXEC was set.
> 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>         MFD_NOEXEC_SEAL was set.
> 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> 
> The sysctl allows finer control of memfd_create for old-software
> that doesn't set the executable bit, for example, a container with
> vm.memfd_noexec=1 means the old-software will create non-executable
> memfd by default. Also, the value of memfd_noexec is passed to child
> namespace at creation time. For example, if the init namespace has
> vm.memfd_noexec=2, all its children namespaces will be created with 2.
> 
> Signed-off-by: Jeff Xu <jeffxu@google.com>
> Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
[...]
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index f4f8cb0435b4..8a98b1af9376 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -23,6 +23,7 @@
>  #include <linux/sched/task.h>
>  #include <linux/sched/signal.h>
>  #include <linux/idr.h>
> +#include "pid_sysctl.h"
>  
>  static DEFINE_MUTEX(pid_caches_mutex);
>  static struct kmem_cache *pid_ns_cachep;
> @@ -110,6 +111,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
>  	ns->ucounts = ucounts;
>  	ns->pid_allocated = PIDNS_ADDING;
>  
> +	initialize_memfd_noexec_scope(ns);
> +
>  	return ns;
>  
>  out_free_idr:
> @@ -455,6 +458,8 @@ static __init int pid_namespaces_init(void)
>  #ifdef CONFIG_CHECKPOINT_RESTORE
>  	register_sysctl_paths(kern_path, pid_ns_ctl_table);
>  #endif
> +
> +	register_pid_ns_sysctl_table_vm();
>  	return 0;
>  }
[...]
>  
> diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
> new file mode 100644
> index 000000000000..90a93161a122
> --- /dev/null
> +++ b/kernel/pid_sysctl.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef LINUX_PID_SYSCTL_H
> +#define LINUX_PID_SYSCTL_H
> +
> +#include <linux/pid_namespace.h>
> +
> +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
> +static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns)
[...]
> +static inline void register_pid_ns_sysctl_table_vm(void)
> +{
> +	register_sysctl_paths(vm_path, pid_ns_ctl_table_vm);
> +}
> +#else
> +static inline void set_memfd_noexec_scope(struct pid_namespace *ns) {}
> +static inline void register_pid_ns_ctl_table_vm(void) {}
> +#endif
[...]

I found this patch makes build fails whne CONFIG_SYSCTL or CONFIG_MEMFD_CREATE
are not defined, as initialize_memfd_noexec_scope() and
register_pid_ns_sysctl_table_vm() are used from pid_namespace.c without the
configs protection.

I just posted a patch for that:
https://lore.kernel.org/linux-mm/20221216183314.169707-1-sj@kernel.org/

Could you please check?


Thanks,
SJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
@ 2022-12-16 18:39     ` SeongJae Park
  0 siblings, 0 replies; 25+ messages in thread
From: SeongJae Park @ 2022-12-16 18:39 UTC (permalink / raw)
  Cc: skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd, jeffxu,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

Hi Jeff,

> From: Jeff Xu <jeffxu@google.com>
> 
> The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to
> set executable bit at creation time (memfd_create).
> 
> When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
> (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
> be executable (mode: 0777) after creation.
> 
> when MFD_EXEC flag is set, memfd is created with executable bit
> (mode:0777), this is the same as the old behavior of memfd_create.
> 
> The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>         MFD_EXEC was set.
> 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>         MFD_NOEXEC_SEAL was set.
> 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> 
> The sysctl allows finer control of memfd_create for old-software
> that doesn't set the executable bit, for example, a container with
> vm.memfd_noexec=1 means the old-software will create non-executable
> memfd by default. Also, the value of memfd_noexec is passed to child
> namespace at creation time. For example, if the init namespace has
> vm.memfd_noexec=2, all its children namespaces will be created with 2.
> 
> Signed-off-by: Jeff Xu <jeffxu@google.com>
> Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
[...]
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index f4f8cb0435b4..8a98b1af9376 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -23,6 +23,7 @@
>  #include <linux/sched/task.h>
>  #include <linux/sched/signal.h>
>  #include <linux/idr.h>
> +#include "pid_sysctl.h"
>  
>  static DEFINE_MUTEX(pid_caches_mutex);
>  static struct kmem_cache *pid_ns_cachep;
> @@ -110,6 +111,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
>  	ns->ucounts = ucounts;
>  	ns->pid_allocated = PIDNS_ADDING;
>  
> +	initialize_memfd_noexec_scope(ns);
> +
>  	return ns;
>  
>  out_free_idr:
> @@ -455,6 +458,8 @@ static __init int pid_namespaces_init(void)
>  #ifdef CONFIG_CHECKPOINT_RESTORE
>  	register_sysctl_paths(kern_path, pid_ns_ctl_table);
>  #endif
> +
> +	register_pid_ns_sysctl_table_vm();
>  	return 0;
>  }
[...]
>  
> diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
> new file mode 100644
> index 000000000000..90a93161a122
> --- /dev/null
> +++ b/kernel/pid_sysctl.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef LINUX_PID_SYSCTL_H
> +#define LINUX_PID_SYSCTL_H
> +
> +#include <linux/pid_namespace.h>
> +
> +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
> +static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns)
[...]
> +static inline void register_pid_ns_sysctl_table_vm(void)
> +{
> +	register_sysctl_paths(vm_path, pid_ns_ctl_table_vm);
> +}
> +#else
> +static inline void set_memfd_noexec_scope(struct pid_namespace *ns) {}
> +static inline void register_pid_ns_ctl_table_vm(void) {}
> +#endif
[...]

I found this patch makes build fails whne CONFIG_SYSCTL or CONFIG_MEMFD_CREATE
are not defined, as initialize_memfd_noexec_scope() and
register_pid_ns_sysctl_table_vm() are used from pid_namespace.c without the
configs protection.

I just posted a patch for that:
https://lore.kernel.org/linux-mm/20221216183314.169707-1-sj@kernel.org/

Could you please check?


Thanks,
SJ


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-16 18:39     ` SeongJae Park
  (?)
@ 2022-12-16 19:03     ` Jeff Xu
  2022-12-16 19:21       ` Andrew Morton
  -1 siblings, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2022-12-16 19:03 UTC (permalink / raw)
  To: SeongJae Park
  Cc: skhan, keescook, akpm, dmitry.torokhov, dverkamp, hughd, jorgelo,
	linux-kernel, linux-kselftest, linux-mm, jannh, linux-hardening,
	linux-security-module, kernel test robot

On Fri, Dec 16, 2022 at 10:39 AM SeongJae Park <sj@kernel.org> wrote:
>
> Hi Jeff,
>
> > From: Jeff Xu <jeffxu@google.com>
> >
> > The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to
> > set executable bit at creation time (memfd_create).
> >
> > When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
> > (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
> > be executable (mode: 0777) after creation.
> >
> > when MFD_EXEC flag is set, memfd is created with executable bit
> > (mode:0777), this is the same as the old behavior of memfd_create.
> >
> > The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> > 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> >         MFD_EXEC was set.
> > 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> >         MFD_NOEXEC_SEAL was set.
> > 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> >
> > The sysctl allows finer control of memfd_create for old-software
> > that doesn't set the executable bit, for example, a container with
> > vm.memfd_noexec=1 means the old-software will create non-executable
> > memfd by default. Also, the value of memfd_noexec is passed to child
> > namespace at creation time. For example, if the init namespace has
> > vm.memfd_noexec=2, all its children namespaces will be created with 2.
> >
> > Signed-off-by: Jeff Xu <jeffxu@google.com>
> > Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
> > Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
> > Reported-by: kernel test robot <lkp@intel.com>
> > ---
> [...]
> > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> > index f4f8cb0435b4..8a98b1af9376 100644
> > --- a/kernel/pid_namespace.c
> > +++ b/kernel/pid_namespace.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/sched/task.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/idr.h>
> > +#include "pid_sysctl.h"
> >
> >  static DEFINE_MUTEX(pid_caches_mutex);
> >  static struct kmem_cache *pid_ns_cachep;
> > @@ -110,6 +111,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
> >       ns->ucounts = ucounts;
> >       ns->pid_allocated = PIDNS_ADDING;
> >
> > +     initialize_memfd_noexec_scope(ns);
> > +
> >       return ns;
> >
> >  out_free_idr:
> > @@ -455,6 +458,8 @@ static __init int pid_namespaces_init(void)
> >  #ifdef CONFIG_CHECKPOINT_RESTORE
> >       register_sysctl_paths(kern_path, pid_ns_ctl_table);
> >  #endif
> > +
> > +     register_pid_ns_sysctl_table_vm();
> >       return 0;
> >  }
> [...]
> >
> > diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
> > new file mode 100644
> > index 000000000000..90a93161a122
> > --- /dev/null
> > +++ b/kernel/pid_sysctl.h
> > @@ -0,0 +1,59 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef LINUX_PID_SYSCTL_H
> > +#define LINUX_PID_SYSCTL_H
> > +
> > +#include <linux/pid_namespace.h>
> > +
> > +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
> > +static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns)
> [...]
> > +static inline void register_pid_ns_sysctl_table_vm(void)
> > +{
> > +     register_sysctl_paths(vm_path, pid_ns_ctl_table_vm);
> > +}
> > +#else
> > +static inline void set_memfd_noexec_scope(struct pid_namespace *ns) {}
> > +static inline void register_pid_ns_ctl_table_vm(void) {}
> > +#endif
> [...]
>
> I found this patch makes build fails whne CONFIG_SYSCTL or CONFIG_MEMFD_CREATE
> are not defined, as initialize_memfd_noexec_scope() and
> register_pid_ns_sysctl_table_vm() are used from pid_namespace.c without the
> configs protection.
>
> I just posted a patch for that:
> https://lore.kernel.org/linux-mm/20221216183314.169707-1-sj@kernel.org/
>
> Could you please check?
>
Hi SeongJae,
Thanks for the patch ! I responded to the other thread.

Andrew,
From a process point of view, should I update this patch to V9 to
include the fix ?
or add a patch directly on top in the mm-unstable branch.

Thanks
Jeff

>
> Thanks,
> SJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-16 19:03     ` Jeff Xu
@ 2022-12-16 19:21       ` Andrew Morton
  2022-12-16 19:31         ` SeongJae Park
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2022-12-16 19:21 UTC (permalink / raw)
  To: Jeff Xu
  Cc: SeongJae Park, skhan, keescook, dmitry.torokhov, dverkamp, hughd,
	jorgelo, linux-kernel, linux-kselftest, linux-mm, jannh,
	linux-hardening, linux-security-module, kernel test robot

On Fri, 16 Dec 2022 11:03:06 -0800 Jeff Xu <jeffxu@google.com> wrote:

> >
> > I just posted a patch for that:
> > https://lore.kernel.org/linux-mm/20221216183314.169707-1-sj@kernel.org/
> >
> > Could you please check?
> >
> Hi SeongJae,
> Thanks for the patch ! I responded to the other thread.
> 
> Andrew,
> >From a process point of view, should I update this patch to V9 to
> include the fix ?
> or add a patch directly on top in the mm-unstable branch.

A little fixup patch wouild be preferable.

But I added the below yesterday, so I think we're all good?

--- a/kernel/pid_sysctl.h~mm-memfd-add-mfd_noexec_seal-and-mfd_exec-fix
+++ a/kernel/pid_sysctl.h
@@ -52,8 +52,10 @@ static inline void register_pid_ns_sysct
 	register_sysctl_paths(vm_path, pid_ns_ctl_table_vm);
 }
 #else
+static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns) {}
 static inline void set_memfd_noexec_scope(struct pid_namespace *ns) {}
 static inline void register_pid_ns_ctl_table_vm(void) {}
+static inline void register_pid_ns_sysctl_table_vm(void) {}
 #endif
 
 #endif /* LINUX_PID_SYSCTL_H */
_


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  2022-12-16 19:21       ` Andrew Morton
@ 2022-12-16 19:31         ` SeongJae Park
  0 siblings, 0 replies; 25+ messages in thread
From: SeongJae Park @ 2022-12-16 19:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Xu, SeongJae Park, skhan, keescook, dmitry.torokhov,
	dverkamp, hughd, jorgelo, linux-kernel, linux-kselftest,
	linux-mm, jannh, linux-hardening, linux-security-module,
	kernel test robot

Hi Jeff and Andrew,

On Fri, 16 Dec 2022 11:21:02 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 16 Dec 2022 11:03:06 -0800 Jeff Xu <jeffxu@google.com> wrote:
> 
> > >
> > > I just posted a patch for that:
> > > https://lore.kernel.org/linux-mm/20221216183314.169707-1-sj@kernel.org/
> > >
> > > Could you please check?
> > >
> > Hi SeongJae,
> > Thanks for the patch ! I responded to the other thread.

Thank you for the quick and nice response, Jeff :)

> > 
> > Andrew,
> > >From a process point of view, should I update this patch to V9 to
> > include the fix ?
> > or add a patch directly on top in the mm-unstable branch.
> 
> A little fixup patch wouild be preferable.
> 
> But I added the below yesterday, so I think we're all good?

Good, thank you.  I should be more patient until you push it, but I was unable
to resist ;)


Thanks,
SJ

> 
> --- a/kernel/pid_sysctl.h~mm-memfd-add-mfd_noexec_seal-and-mfd_exec-fix
> +++ a/kernel/pid_sysctl.h
> @@ -52,8 +52,10 @@ static inline void register_pid_ns_sysct
>  	register_sysctl_paths(vm_path, pid_ns_ctl_table_vm);
>  }
>  #else
> +static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns) {}
>  static inline void set_memfd_noexec_scope(struct pid_namespace *ns) {}
>  static inline void register_pid_ns_ctl_table_vm(void) {}
> +static inline void register_pid_ns_sysctl_table_vm(void) {}
>  #endif
>  
>  #endif /* LINUX_PID_SYSCTL_H */
> _

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-12-16 19:32 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-09 16:04 [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
2022-12-09 16:04 ` [PATCH v7 1/6] mm/memfd: add F_SEAL_EXEC jeffxu
2022-12-09 16:04 ` [PATCH v7 2/6] selftests/memfd: add tests for F_SEAL_EXEC jeffxu
2022-12-14 18:52   ` Kees Cook
2022-12-09 16:04 ` [PATCH v7 3/6] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC jeffxu
2022-12-14 18:53   ` Kees Cook
2022-12-16 18:39   ` SeongJae Park
2022-12-16 18:39     ` SeongJae Park
2022-12-16 19:03     ` Jeff Xu
2022-12-16 19:21       ` Andrew Morton
2022-12-16 19:31         ` SeongJae Park
2022-12-09 16:04 ` [PATCH v7 4/6] mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd jeffxu
2022-12-09 16:04 ` [PATCH v7 5/6] selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC jeffxu
2022-12-09 16:04 ` [PATCH v7 6/6] mm/memfd: security hook for memfd_create jeffxu
2022-12-09 17:02   ` Casey Schaufler
2022-12-09 18:29   ` Paul Moore
2022-12-13 15:00     ` Jeff Xu
2022-12-13 15:37       ` Casey Schaufler
2022-12-13 19:22       ` Paul Moore
2022-12-13 23:05         ` Jeff Xu
2022-12-09 18:15 ` [PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC Paul Moore
2022-12-14 18:54 ` Kees Cook
2022-12-14 23:32   ` Jeff Xu
2022-12-15  0:08     ` Kees Cook
2022-12-15 16:55       ` Jeff Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.