All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
@ 2022-10-26 15:04 ` Joey Gouly
  0 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

Hi all,

This is a follow up to the RFC that Catalin posted:
  https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable. Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but
subsequently changed to read-only. Therefore the filter simply rejects
any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
support (Branch Target Identification), the dynamic loader cannot change
an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
For libraries, it can resort to unmapping and re-mapping but for the
main executable it does not have a file descriptor. The original bug
report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
for libraries - [3].

This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
However unlike the BPF filter it only denies it if the mapping didn't previous
have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
which is a problem with the BPF filter.

Thanks,
Joey

[1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831

Joey Gouly (2):
  mm: Implement memory-deny-write-execute as a prctl
  kselftest: vm: add tests for memory-deny-write-execute

 include/linux/mman.h                   |  15 ++
 include/linux/sched/coredump.h         |   6 +-
 include/uapi/linux/prctl.h             |   6 +
 kernel/sys.c                           |  18 +++
 mm/mmap.c                              |   3 +
 mm/mprotect.c                          |   5 +
 tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
 7 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
@ 2022-10-26 15:04 ` Joey Gouly
  0 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

Hi all,

This is a follow up to the RFC that Catalin posted:
  https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable. Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but
subsequently changed to read-only. Therefore the filter simply rejects
any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
support (Branch Target Identification), the dynamic loader cannot change
an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
For libraries, it can resort to unmapping and re-mapping but for the
main executable it does not have a file descriptor. The original bug
report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
for libraries - [3].

This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
However unlike the BPF filter it only denies it if the mapping didn't previous
have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
which is a problem with the BPF filter.

Thanks,
Joey

[1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831

Joey Gouly (2):
  mm: Implement memory-deny-write-execute as a prctl
  kselftest: vm: add tests for memory-deny-write-execute

 include/linux/mman.h                   |  15 ++
 include/linux/sched/coredump.h         |   6 +-
 include/uapi/linux/prctl.h             |   6 +
 kernel/sys.c                           |  18 +++
 mm/mmap.c                              |   3 +
 mm/mprotect.c                          |   5 +
 tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
 7 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-10-26 15:04 ` Joey Gouly
@ 2022-10-26 15:04   ` Joey Gouly
  -1 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

The aim of such policy is to prevent a user task from creating an
executable mapping that is also writeable.

An example of mmap() returning -EACCESS if the policy is enabled:

	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);

Similarly, mprotect() would return -EACCESS below:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);

The BPF filter that systemd MDWE uses is stateless, and disallows
mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
be enabled if it was already PROT_EXEC, which allows the following case:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

where PROT_BTI enables branch tracking identification on arm64.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mman.h           | 15 +++++++++++++++
 include/linux/sched/coredump.h |  6 +++++-
 include/uapi/linux/prctl.h     |  6 ++++++
 kernel/sys.c                   | 18 ++++++++++++++++++
 mm/mmap.c                      |  3 +++
 mm/mprotect.c                  |  5 +++++
 6 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 58b3abd457a3..d84fdeab6b5e 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags)
 }
 
 unsigned long vm_commit_limit(void);
+
+static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
+{
+	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return false;
+
+	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+		return true;
+
+	if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+		return true;
+
+	return false;
+}
+
 #endif /* _LINUX_MMAN_H */
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 8270ad7ae14c..0e17ae7fbfd3 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
  * lifecycle of this mm, just for simplicity.
  */
 #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
+
+#define MMF_HAS_MDWE		28
+#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
+
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-				 MMF_DISABLE_THP_MASK)
+				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..ab9db1e86230 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -281,6 +281,12 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK		0xffff
 # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/* Memory deny write / execute */
+#define PR_SET_MDWE			65
+# define PR_MDWE_FLAG_MMAP		1
+
+#define PR_GET_MDWE			66
+
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e886..08e1dd6d2533 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 
+static inline int prctl_set_mdwe(void)
+{
+	set_bit(MMF_HAS_MDWE, &current->mm->flags);
+
+	return 0;
+}
+
+static inline int prctl_get_mdwe(void)
+{
+	return test_bit(MMF_HAS_MDWE, &current->mm->flags);
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
 		break;
 #endif
+	case PR_SET_MDWE:
+		error = prctl_set_mdwe();
+		break;
+	case PR_GET_MDWE:
+		error = prctl_get_mdwe();
+		break;
 	case PR_SET_VMA:
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
diff --git a/mm/mmap.c b/mm/mmap.c
index 099468aee4d8..42eaf6683216 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
+	if (map_deny_write_exec(NULL, vm_flags))
+		return -EACCES;
+
 	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
 	if (!IS_ERR_VALUE(addr) &&
 	    ((vm_flags & VM_LOCKED) ||
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8d770855b591..af71ef0788fd 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			break;
 		}
 
+		if (map_deny_write_exec(vma, newflags)) {
+			error = -EACCES;
+			goto out;
+		}
+
 		/* Allow architectures to sanity-check the new flags */
 		if (!arch_validate_flags(newflags)) {
 			error = -EINVAL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-10-26 15:04   ` Joey Gouly
  0 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

The aim of such policy is to prevent a user task from creating an
executable mapping that is also writeable.

An example of mmap() returning -EACCESS if the policy is enabled:

	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);

Similarly, mprotect() would return -EACCESS below:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);

The BPF filter that systemd MDWE uses is stateless, and disallows
mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
be enabled if it was already PROT_EXEC, which allows the following case:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

where PROT_BTI enables branch tracking identification on arm64.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mman.h           | 15 +++++++++++++++
 include/linux/sched/coredump.h |  6 +++++-
 include/uapi/linux/prctl.h     |  6 ++++++
 kernel/sys.c                   | 18 ++++++++++++++++++
 mm/mmap.c                      |  3 +++
 mm/mprotect.c                  |  5 +++++
 6 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 58b3abd457a3..d84fdeab6b5e 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags)
 }
 
 unsigned long vm_commit_limit(void);
+
+static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
+{
+	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return false;
+
+	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+		return true;
+
+	if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+		return true;
+
+	return false;
+}
+
 #endif /* _LINUX_MMAN_H */
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 8270ad7ae14c..0e17ae7fbfd3 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
  * lifecycle of this mm, just for simplicity.
  */
 #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
+
+#define MMF_HAS_MDWE		28
+#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
+
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-				 MMF_DISABLE_THP_MASK)
+				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..ab9db1e86230 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -281,6 +281,12 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK		0xffff
 # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/* Memory deny write / execute */
+#define PR_SET_MDWE			65
+# define PR_MDWE_FLAG_MMAP		1
+
+#define PR_GET_MDWE			66
+
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e886..08e1dd6d2533 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 
+static inline int prctl_set_mdwe(void)
+{
+	set_bit(MMF_HAS_MDWE, &current->mm->flags);
+
+	return 0;
+}
+
+static inline int prctl_get_mdwe(void)
+{
+	return test_bit(MMF_HAS_MDWE, &current->mm->flags);
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
 		break;
 #endif
+	case PR_SET_MDWE:
+		error = prctl_set_mdwe();
+		break;
+	case PR_GET_MDWE:
+		error = prctl_get_mdwe();
+		break;
 	case PR_SET_VMA:
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
diff --git a/mm/mmap.c b/mm/mmap.c
index 099468aee4d8..42eaf6683216 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
+	if (map_deny_write_exec(NULL, vm_flags))
+		return -EACCES;
+
 	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
 	if (!IS_ERR_VALUE(addr) &&
 	    ((vm_flags & VM_LOCKED) ||
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8d770855b591..af71ef0788fd 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			break;
 		}
 
+		if (map_deny_write_exec(vma, newflags)) {
+			error = -EACCES;
+			goto out;
+		}
+
 		/* Allow architectures to sanity-check the new flags */
 		if (!arch_validate_flags(newflags)) {
 			error = -EINVAL;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-26 15:04 ` Joey Gouly
@ 2022-10-26 15:04   ` Joey Gouly
  -1 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

Add some tests to cover the new PR_SET_MDWE prctl.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
---
 tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c
new file mode 100644
index 000000000000..67f3fc06d069
--- /dev/null
+++ b/tools/testing/selftests/vm/mdwe_test.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/hwcap.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <linux/prctl.h>
+
+#include "../kselftest.h"
+
+#define PR_SET_MDWE                     65
+# define PR_MDWE_FLAG_MMAP              1
+
+#define PR_GET_MDWE                     66
+
+#ifdef __aarch64__
+#define PROT_BTI      0x10            /* BTI guarded page */
+#endif
+
+#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n"
+#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n"
+#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n"
+#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n"
+
+int fork_test(int (*func)(int))
+{
+	pid_t pid;
+	int status;
+
+	pid = fork();
+	if (pid < 0) {
+		printf("fork failed\n");
+		return KSFT_FAIL;
+	}
+
+	if (pid == 0)
+		exit(func(1));
+
+	waitpid(pid, &status, 0);
+
+	if (WIFEXITED(status))
+		return WEXITSTATUS(status);
+
+	return 0;
+}
+
+static inline void test_result(int err, const char *msg)
+{
+	switch (err) {
+	case KSFT_PASS:
+		ksft_test_result_pass(msg);
+		break;
+	case KSFT_FAIL:
+		ksft_test_result_fail(msg);
+		break;
+	case KSFT_SKIP:
+		ksft_test_result_skip(msg);
+		break;
+	default:
+		ksft_test_result_error("Unknown return code %d from %s",
+				       err, msg);
+		break;
+	}
+}
+
+int test1(int mdwe_enabled)
+{
+	void *p;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	p = mmap(0, size, PROT_WRITE | PROT_EXEC, mmap_flags, 0, 0);
+
+	if (mdwe_enabled)
+		return p == MAP_FAILED ? KSFT_PASS : KSFT_FAIL;
+	else
+		return p != MAP_FAILED ? KSFT_PASS : KSFT_FAIL;
+}
+
+int test2(int mdwe_enabled)
+{
+	void *p;
+	int ret;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	p = mmap(0, size, PROT_WRITE, mmap_flags, 0, 0);
+	if (p == MAP_FAILED)
+		return 0;
+	ret = mprotect(p, size, PROT_EXEC);
+
+	if (mdwe_enabled)
+		return ret < 0 ? KSFT_PASS : KSFT_FAIL;
+	else
+		return ret == 0 ? KSFT_PASS : KSFT_FAIL;
+}
+
+int test3(int mdwe_enabled)
+{
+	void *p;
+	int ret;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	p = mmap(0, size, PROT_EXEC, mmap_flags, 0, 0);
+	if (p == MAP_FAILED)
+		return 0;
+
+	ret = mprotect(p, size, PROT_EXEC | PROT_READ);
+
+	return ret == 0 ? KSFT_PASS : KSFT_FAIL;
+}
+
+#ifdef __aarch64__
+int test4(int mdwe_enabled)
+{
+	void *p;
+	int ret;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI))
+		return KSFT_SKIP;
+
+	p = mmap(0, size, PROT_EXEC, mmap_flags, 0, 0);
+	if (p == MAP_FAILED)
+		return KSFT_FAIL;
+
+	ret = mprotect(p, size, PROT_EXEC | PROT_BTI);
+
+	return ret == 0 ? KSFT_PASS : KSFT_FAIL;
+}
+#endif
+
+int main(void)
+{
+	int ret;
+
+	ksft_print_header();
+#ifdef __aarch64__
+	ksft_set_plan(12);
+#else
+	ksft_set_plan(9);
+#endif
+
+	// First run the tests without MDWE
+	test_result(test1(0), TEST1);
+	test_result(test2(0), TEST2);
+	test_result(test3(0), TEST3);
+#ifdef __aarch64__
+	test_result(test4(0), TEST4);
+#endif
+
+	// Enable MDWE and then run the tests again.
+	ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
+	if (ret < 0) {
+		ksft_print_msg("PR_SET_MDWE failed or unsupported!\n");
+		goto exit;
+	}
+
+	ret = prctl(PR_GET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
+	if (ret == 0)
+		ksft_exit_fail_msg("PR_GET_MDWE failed!");
+
+	test_result(test1(1), "MDWE: " TEST1);
+	test_result(test2(1), "MDWE: " TEST2);
+	test_result(test3(1), "MDWE: " TEST3);
+#ifdef __aarch64__
+	test_result(test4(1), "MDWE: " TEST4);
+#endif
+
+	// Verify the MDWE setting is transferred when fork()ing
+	test_result(fork_test(test1), "MDWE+fork: " TEST1);
+	test_result(fork_test(test2), "MDWE+fork: " TEST2);
+	test_result(fork_test(test3), "MDWE+fork: " TEST3);
+#ifdef __aarch64__
+	test_result(fork_test(test4), "MDWE+fork: " TEST4);
+#endif
+
+exit:
+	ksft_finished();
+
+	return 0;
+}
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-10-26 15:04   ` Joey Gouly
  0 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

Add some tests to cover the new PR_SET_MDWE prctl.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
---
 tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c
new file mode 100644
index 000000000000..67f3fc06d069
--- /dev/null
+++ b/tools/testing/selftests/vm/mdwe_test.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/hwcap.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <linux/prctl.h>
+
+#include "../kselftest.h"
+
+#define PR_SET_MDWE                     65
+# define PR_MDWE_FLAG_MMAP              1
+
+#define PR_GET_MDWE                     66
+
+#ifdef __aarch64__
+#define PROT_BTI      0x10            /* BTI guarded page */
+#endif
+
+#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n"
+#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n"
+#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n"
+#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n"
+
+int fork_test(int (*func)(int))
+{
+	pid_t pid;
+	int status;
+
+	pid = fork();
+	if (pid < 0) {
+		printf("fork failed\n");
+		return KSFT_FAIL;
+	}
+
+	if (pid == 0)
+		exit(func(1));
+
+	waitpid(pid, &status, 0);
+
+	if (WIFEXITED(status))
+		return WEXITSTATUS(status);
+
+	return 0;
+}
+
+static inline void test_result(int err, const char *msg)
+{
+	switch (err) {
+	case KSFT_PASS:
+		ksft_test_result_pass(msg);
+		break;
+	case KSFT_FAIL:
+		ksft_test_result_fail(msg);
+		break;
+	case KSFT_SKIP:
+		ksft_test_result_skip(msg);
+		break;
+	default:
+		ksft_test_result_error("Unknown return code %d from %s",
+				       err, msg);
+		break;
+	}
+}
+
+int test1(int mdwe_enabled)
+{
+	void *p;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	p = mmap(0, size, PROT_WRITE | PROT_EXEC, mmap_flags, 0, 0);
+
+	if (mdwe_enabled)
+		return p == MAP_FAILED ? KSFT_PASS : KSFT_FAIL;
+	else
+		return p != MAP_FAILED ? KSFT_PASS : KSFT_FAIL;
+}
+
+int test2(int mdwe_enabled)
+{
+	void *p;
+	int ret;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	p = mmap(0, size, PROT_WRITE, mmap_flags, 0, 0);
+	if (p == MAP_FAILED)
+		return 0;
+	ret = mprotect(p, size, PROT_EXEC);
+
+	if (mdwe_enabled)
+		return ret < 0 ? KSFT_PASS : KSFT_FAIL;
+	else
+		return ret == 0 ? KSFT_PASS : KSFT_FAIL;
+}
+
+int test3(int mdwe_enabled)
+{
+	void *p;
+	int ret;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	p = mmap(0, size, PROT_EXEC, mmap_flags, 0, 0);
+	if (p == MAP_FAILED)
+		return 0;
+
+	ret = mprotect(p, size, PROT_EXEC | PROT_READ);
+
+	return ret == 0 ? KSFT_PASS : KSFT_FAIL;
+}
+
+#ifdef __aarch64__
+int test4(int mdwe_enabled)
+{
+	void *p;
+	int ret;
+
+	int size = getpagesize();
+	int mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
+
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI))
+		return KSFT_SKIP;
+
+	p = mmap(0, size, PROT_EXEC, mmap_flags, 0, 0);
+	if (p == MAP_FAILED)
+		return KSFT_FAIL;
+
+	ret = mprotect(p, size, PROT_EXEC | PROT_BTI);
+
+	return ret == 0 ? KSFT_PASS : KSFT_FAIL;
+}
+#endif
+
+int main(void)
+{
+	int ret;
+
+	ksft_print_header();
+#ifdef __aarch64__
+	ksft_set_plan(12);
+#else
+	ksft_set_plan(9);
+#endif
+
+	// First run the tests without MDWE
+	test_result(test1(0), TEST1);
+	test_result(test2(0), TEST2);
+	test_result(test3(0), TEST3);
+#ifdef __aarch64__
+	test_result(test4(0), TEST4);
+#endif
+
+	// Enable MDWE and then run the tests again.
+	ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
+	if (ret < 0) {
+		ksft_print_msg("PR_SET_MDWE failed or unsupported!\n");
+		goto exit;
+	}
+
+	ret = prctl(PR_GET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
+	if (ret == 0)
+		ksft_exit_fail_msg("PR_GET_MDWE failed!");
+
+	test_result(test1(1), "MDWE: " TEST1);
+	test_result(test2(1), "MDWE: " TEST2);
+	test_result(test3(1), "MDWE: " TEST3);
+#ifdef __aarch64__
+	test_result(test4(1), "MDWE: " TEST4);
+#endif
+
+	// Verify the MDWE setting is transferred when fork()ing
+	test_result(fork_test(test1), "MDWE+fork: " TEST1);
+	test_result(fork_test(test2), "MDWE+fork: " TEST2);
+	test_result(fork_test(test3), "MDWE+fork: " TEST3);
+#ifdef __aarch64__
+	test_result(fork_test(test4), "MDWE+fork: " TEST4);
+#endif
+
+exit:
+	ksft_finished();
+
+	return 0;
+}
+
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-26 15:04   ` Joey Gouly
@ 2022-10-28 17:03     ` Mark Brown
  -1 siblings, 0 replies; 34+ messages in thread
From: Mark Brown @ 2022-10-28 17:03 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:

> Add some tests to cover the new PR_SET_MDWE prctl.

Some comments below but they're all stylistic and let's not make perfect
be the enemy of the good here so

Reviewed-by: Mark Brown <broonie@kernel.org>

and we can iterate later rather than blocking anything on the testcase.

> +#ifdef __aarch64__
> +#define PROT_BTI      0x10            /* BTI guarded page */
> +#endif

We should get this from the kernel headers shouldn't we?  We generally
rely on things getting pulled in from there rather than locally
defining.

> +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n"
> +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n"
> +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n"
> +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n"

> +int test1(int mdwe_enabled)
> +{

It feels like we could usefully make an array of

	struct test {
		int (*run)(bool mdwe_enabled);
		char *name;
	}

then we'd need fewer ifdefs, things could be more usefully named and
it'd be a bit easier to add new cases.

> +#ifdef __aarch64__
> +	ksft_set_plan(12);
> +#else
> +	ksft_set_plan(9);
> +#endif

That'd just be ksft_test_plan(3 * ARRAY_SIZE(tests).

> +	// First run the tests without MDWE
> +	test_result(test1(0), TEST1);
> +	test_result(test2(0), TEST2);
> +	test_result(test3(0), TEST3);
> +#ifdef __aarch64__
> +	test_result(test4(0), TEST4);
> +#endif

and these calls to the tests would all be iterating over the array.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-10-28 17:03     ` Mark Brown
  0 siblings, 0 replies; 34+ messages in thread
From: Mark Brown @ 2022-10-28 17:03 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah


[-- Attachment #1.1: Type: text/plain, Size: 1513 bytes --]

On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:

> Add some tests to cover the new PR_SET_MDWE prctl.

Some comments below but they're all stylistic and let's not make perfect
be the enemy of the good here so

Reviewed-by: Mark Brown <broonie@kernel.org>

and we can iterate later rather than blocking anything on the testcase.

> +#ifdef __aarch64__
> +#define PROT_BTI      0x10            /* BTI guarded page */
> +#endif

We should get this from the kernel headers shouldn't we?  We generally
rely on things getting pulled in from there rather than locally
defining.

> +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n"
> +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n"
> +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n"
> +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n"

> +int test1(int mdwe_enabled)
> +{

It feels like we could usefully make an array of

	struct test {
		int (*run)(bool mdwe_enabled);
		char *name;
	}

then we'd need fewer ifdefs, things could be more usefully named and
it'd be a bit easier to add new cases.

> +#ifdef __aarch64__
> +	ksft_set_plan(12);
> +#else
> +	ksft_set_plan(9);
> +#endif

That'd just be ksft_test_plan(3 * ARRAY_SIZE(tests).

> +	// First run the tests without MDWE
> +	test_result(test1(0), TEST1);
> +	test_result(test2(0), TEST2);
> +	test_result(test3(0), TEST3);
> +#ifdef __aarch64__
> +	test_result(test4(0), TEST4);
> +#endif

and these calls to the tests would all be iterating over the array.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-26 15:04   ` Joey Gouly
@ 2022-10-28 17:45     ` Kees Cook
  -1 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 17:45 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:
> +#include "../kselftest.h"

I recommend using kselftest_harness.h instead; it provides much of the
infrastructure that is open-coded here. But yes, testing is good; thank
you! :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-10-28 17:45     ` Kees Cook
  0 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 17:45 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:
> +#include "../kselftest.h"

I recommend using kselftest_harness.h instead; it provides much of the
infrastructure that is open-coded here. But yes, testing is good; thank
you! :)

-- 
Kees Cook

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-10-26 15:04   ` Joey Gouly
@ 2022-10-28 18:51     ` Kees Cook
  -1 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 18:51 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> The aim of such policy is to prevent a user task from creating an
> executable mapping that is also writeable.
> 
> An example of mmap() returning -EACCESS if the policy is enabled:
> 
> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> 
> Similarly, mprotect() would return -EACCESS below:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> The BPF filter that systemd MDWE uses is stateless, and disallows
> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> be enabled if it was already PROT_EXEC, which allows the following case:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> where PROT_BTI enables branch tracking identification on arm64.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/mman.h           | 15 +++++++++++++++
>  include/linux/sched/coredump.h |  6 +++++-
>  include/uapi/linux/prctl.h     |  6 ++++++
>  kernel/sys.c                   | 18 ++++++++++++++++++
>  mm/mmap.c                      |  3 +++
>  mm/mprotect.c                  |  5 +++++
>  6 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..d84fdeab6b5e 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags)
>  }
>  
>  unsigned long vm_commit_limit(void);
> +
> +static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)

Traditionally, it is easier to write these in the positive instead of
needing to parse a double-negative.

static inline bool allow_write_exec(...)

> +{
> +	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
> +		return false;
> +
> +	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
> +		return true;
> +
> +	if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
> +		return true;
> +
> +	return false;
> +}

Since this is implementation "2" from the earlier discussion[1], I think
some comments in here are good to have. (e.g. to explain to people
reading this code why there is a vma test, etc.) Perhaps even explicit
repeat the implementation expectations.

Restating from that thread:

  2. "is not already PROT_EXEC":

     a)	mmap(PROT_READ|PROT_WRITE|PROT_EXEC);	// fails

     b)	mmap(PROT_READ|PROT_EXEC);
	mprotect(PROT_READ|PROT_EXEC|PROT_BTI);	// passes

     c)	mmap(PROT_READ);
	mprotect(PROT_READ|PROT_EXEC);		// fails

     d)	mmap(PROT_READ|PROT_WRITE);
	mprotect(PROT_READ);
	mprotect(PROT_READ|PROT_EXEC);		// fails

[1] https://lore.kernel.org/linux-arm-kernel/YmGjYYlcSVz38rOe@arm.com/

>  #endif /* _LINUX_MMAN_H */
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 8270ad7ae14c..0e17ae7fbfd3 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
>   * lifecycle of this mm, just for simplicity.
>   */
>  #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
> +
> +#define MMF_HAS_MDWE		28
> +#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
> +
>  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
>  
>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> -				 MMF_DISABLE_THP_MASK)
> +				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)

Good, yes, new "live forever" bit here. Perhaps bikeshedding over the
name, see below.

>  
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a5e06dcbba13..ab9db1e86230 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -281,6 +281,12 @@ struct prctl_mm_map {
>  # define PR_SME_VL_LEN_MASK		0xffff
>  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
>  
> +/* Memory deny write / execute */
> +#define PR_SET_MDWE			65
> +# define PR_MDWE_FLAG_MMAP		1
> +
> +#define PR_GET_MDWE			66
> +
>  #define PR_SET_VMA		0x53564d41
>  # define PR_SET_VMA_ANON_NAME		0
>  
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 5fd54bf0e886..08e1dd6d2533 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
>  
> +static inline int prctl_set_mdwe(void)
> +{
> +	set_bit(MMF_HAS_MDWE, &current->mm->flags);
> +
> +	return 0;
> +}
> +
> +static inline int prctl_get_mdwe(void)
> +{
> +	return test_bit(MMF_HAS_MDWE, &current->mm->flags);
> +}

These will need to change -- the aren't constructed for future expansion
at all. At the very least, all the arguments need to passed to be
checked that they are zero. e.g.:

int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
		   unsigned long arg4, unsigned long arg5)
{
	if (arg3 || arg4 || arg5)
		return -EINVAL;

	...

	return 0;
}

Otherwise, there's no way to add arguments in the future because old
userspace may have been sending arbitrary junk on the stack, etc.

And regardless, I think we'll need some explicit flag bits here, since
we can see there has been a long history of various other desired
features that may end up living in here. For now, a single bit is fine.
The intended behavior is the inability to _add_ PROT_EXEC to an existing
vma, and to deny the creating of a W+X vma to begin with, so perhaps
this bit can be named MDWE_FLAG_REFUSE_EXEC_GAIN?

Then the above "..." becomes:

	if (bits & ~(MDWE_FLAG_REFUSE_EXEC_GAIN))
		return -EINVAL;

	if (bits & MDWE_FLAG_REFUSE_EXEC_GAIN)
		set_bit(MMF_HAS_MDWE, &current->mm->flags);
	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
		return -EPERM; /* Cannot unset the flag */

And prctl_get_mdwe() becomes:

int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
		   unsigned long arg4, unsigned long arg5)
{
	if (arg2 || arg3 || arg4 || arg5)
		return -EINVAL;
	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
		MDWE_FLAG_REFUSE_EXEC_GAIN : 0;
}

> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
>  		break;
>  #endif
> +	case PR_SET_MDWE:
> +		error = prctl_set_mdwe();
> +		break;
> +	case PR_GET_MDWE:
> +		error = prctl_get_mdwe();
> +		break;
>  	case PR_SET_VMA:
>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>  		break;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 099468aee4d8..42eaf6683216 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  			vm_flags |= VM_NORESERVE;
>  	}
>  
> +	if (map_deny_write_exec(NULL, vm_flags))
> +		return -EACCES;
> +

This seems like the wrong place to do the check -- that the vma argument
is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
it live in mmap_region()? What happens with MAP_FIXED, when there is
an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
check. For example, we had "c" above:

     c)	mmap(PROT_READ);
	mprotect(PROT_READ|PROT_EXEC);		// fails

But this would allow another case:

     e)	addr = mmap(..., PROT_READ, ...);
	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes


>  	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
>  	if (!IS_ERR_VALUE(addr) &&
>  	    ((vm_flags & VM_LOCKED) ||
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 8d770855b591..af71ef0788fd 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  			break;
>  		}
>  
> +		if (map_deny_write_exec(vma, newflags)) {
> +			error = -EACCES;
> +			goto out;
> +		}
> +

This looks like the right place. Any rationale for why it's before
arch_validate_flags()?

>  		/* Allow architectures to sanity-check the new flags */
>  		if (!arch_validate_flags(newflags)) {
>  			error = -EINVAL;

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-10-28 18:51     ` Kees Cook
  0 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 18:51 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> The aim of such policy is to prevent a user task from creating an
> executable mapping that is also writeable.
> 
> An example of mmap() returning -EACCESS if the policy is enabled:
> 
> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> 
> Similarly, mprotect() would return -EACCESS below:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> The BPF filter that systemd MDWE uses is stateless, and disallows
> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> be enabled if it was already PROT_EXEC, which allows the following case:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> where PROT_BTI enables branch tracking identification on arm64.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/mman.h           | 15 +++++++++++++++
>  include/linux/sched/coredump.h |  6 +++++-
>  include/uapi/linux/prctl.h     |  6 ++++++
>  kernel/sys.c                   | 18 ++++++++++++++++++
>  mm/mmap.c                      |  3 +++
>  mm/mprotect.c                  |  5 +++++
>  6 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..d84fdeab6b5e 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags)
>  }
>  
>  unsigned long vm_commit_limit(void);
> +
> +static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)

Traditionally, it is easier to write these in the positive instead of
needing to parse a double-negative.

static inline bool allow_write_exec(...)

> +{
> +	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
> +		return false;
> +
> +	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
> +		return true;
> +
> +	if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
> +		return true;
> +
> +	return false;
> +}

Since this is implementation "2" from the earlier discussion[1], I think
some comments in here are good to have. (e.g. to explain to people
reading this code why there is a vma test, etc.) Perhaps even explicit
repeat the implementation expectations.

Restating from that thread:

  2. "is not already PROT_EXEC":

     a)	mmap(PROT_READ|PROT_WRITE|PROT_EXEC);	// fails

     b)	mmap(PROT_READ|PROT_EXEC);
	mprotect(PROT_READ|PROT_EXEC|PROT_BTI);	// passes

     c)	mmap(PROT_READ);
	mprotect(PROT_READ|PROT_EXEC);		// fails

     d)	mmap(PROT_READ|PROT_WRITE);
	mprotect(PROT_READ);
	mprotect(PROT_READ|PROT_EXEC);		// fails

[1] https://lore.kernel.org/linux-arm-kernel/YmGjYYlcSVz38rOe@arm.com/

>  #endif /* _LINUX_MMAN_H */
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 8270ad7ae14c..0e17ae7fbfd3 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
>   * lifecycle of this mm, just for simplicity.
>   */
>  #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
> +
> +#define MMF_HAS_MDWE		28
> +#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
> +
>  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
>  
>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> -				 MMF_DISABLE_THP_MASK)
> +				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)

Good, yes, new "live forever" bit here. Perhaps bikeshedding over the
name, see below.

>  
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a5e06dcbba13..ab9db1e86230 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -281,6 +281,12 @@ struct prctl_mm_map {
>  # define PR_SME_VL_LEN_MASK		0xffff
>  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
>  
> +/* Memory deny write / execute */
> +#define PR_SET_MDWE			65
> +# define PR_MDWE_FLAG_MMAP		1
> +
> +#define PR_GET_MDWE			66
> +
>  #define PR_SET_VMA		0x53564d41
>  # define PR_SET_VMA_ANON_NAME		0
>  
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 5fd54bf0e886..08e1dd6d2533 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
>  
> +static inline int prctl_set_mdwe(void)
> +{
> +	set_bit(MMF_HAS_MDWE, &current->mm->flags);
> +
> +	return 0;
> +}
> +
> +static inline int prctl_get_mdwe(void)
> +{
> +	return test_bit(MMF_HAS_MDWE, &current->mm->flags);
> +}

These will need to change -- the aren't constructed for future expansion
at all. At the very least, all the arguments need to passed to be
checked that they are zero. e.g.:

int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
		   unsigned long arg4, unsigned long arg5)
{
	if (arg3 || arg4 || arg5)
		return -EINVAL;

	...

	return 0;
}

Otherwise, there's no way to add arguments in the future because old
userspace may have been sending arbitrary junk on the stack, etc.

And regardless, I think we'll need some explicit flag bits here, since
we can see there has been a long history of various other desired
features that may end up living in here. For now, a single bit is fine.
The intended behavior is the inability to _add_ PROT_EXEC to an existing
vma, and to deny the creating of a W+X vma to begin with, so perhaps
this bit can be named MDWE_FLAG_REFUSE_EXEC_GAIN?

Then the above "..." becomes:

	if (bits & ~(MDWE_FLAG_REFUSE_EXEC_GAIN))
		return -EINVAL;

	if (bits & MDWE_FLAG_REFUSE_EXEC_GAIN)
		set_bit(MMF_HAS_MDWE, &current->mm->flags);
	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
		return -EPERM; /* Cannot unset the flag */

And prctl_get_mdwe() becomes:

int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
		   unsigned long arg4, unsigned long arg5)
{
	if (arg2 || arg3 || arg4 || arg5)
		return -EINVAL;
	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
		MDWE_FLAG_REFUSE_EXEC_GAIN : 0;
}

> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
>  		break;
>  #endif
> +	case PR_SET_MDWE:
> +		error = prctl_set_mdwe();
> +		break;
> +	case PR_GET_MDWE:
> +		error = prctl_get_mdwe();
> +		break;
>  	case PR_SET_VMA:
>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>  		break;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 099468aee4d8..42eaf6683216 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  			vm_flags |= VM_NORESERVE;
>  	}
>  
> +	if (map_deny_write_exec(NULL, vm_flags))
> +		return -EACCES;
> +

This seems like the wrong place to do the check -- that the vma argument
is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
it live in mmap_region()? What happens with MAP_FIXED, when there is
an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
check. For example, we had "c" above:

     c)	mmap(PROT_READ);
	mprotect(PROT_READ|PROT_EXEC);		// fails

But this would allow another case:

     e)	addr = mmap(..., PROT_READ, ...);
	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes


>  	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
>  	if (!IS_ERR_VALUE(addr) &&
>  	    ((vm_flags & VM_LOCKED) ||
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 8d770855b591..af71ef0788fd 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  			break;
>  		}
>  
> +		if (map_deny_write_exec(vma, newflags)) {
> +			error = -EACCES;
> +			goto out;
> +		}
> +

This looks like the right place. Any rationale for why it's before
arch_validate_flags()?

>  		/* Allow architectures to sanity-check the new flags */
>  		if (!arch_validate_flags(newflags)) {
>  			error = -EINVAL;

-Kees

-- 
Kees Cook

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-26 15:04   ` Joey Gouly
@ 2022-10-28 20:16     ` Kees Cook
  -1 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 20:16 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

Here's an alternative rewritten to use kselftest_harness.h, with tests
for the prctl() flags, and the missed Makefile addition. This should be
much easier to add more variants and tests to, I hope.

-Kees

From bc442a99ebd9852bfaa7444b521bd55fdbb4d369 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Fri, 28 Oct 2022 13:10:45 -0700
Subject: [PATCH] selftests/vm: add tests for memory-deny-write-execute

Add tests for new prctl() commands, including flag values. Add tests for
new denials based on PROT_EXEC across mmap() and mprotect() with MDWE.

Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 tools/testing/selftests/vm/Makefile    |   1 +
 tools/testing/selftests/vm/mdwe_test.c | 201 +++++++++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 163c2fde3cb3..8dd4d4910fa5 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -52,6 +52,7 @@ TEST_GEN_FILES += userfaultfd
 TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
+TEST_GEN_PROGS += mdwe_test
 
 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c
new file mode 100644
index 000000000000..d6f6b751bcd6
--- /dev/null
+++ b/tools/testing/selftests/vm/mdwe_test.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifdef __aarch64__
+#include <asm/hwcap.h>
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <linux/prctl.h>
+
+#include "../kselftest_harness.h"
+
+#define PR_SET_MDWE                     65
+# define PR_MDWE_FLAG_MMAP              1
+
+#define PR_GET_MDWE                     66
+
+#ifdef __aarch64__
+# define PROT_BTI	0x10            /* BTI guarded page */
+#else
+# define PROT_BTI	0
+#endif
+
+TEST(prctl_flags)
+{
+	EXPECT_LT(prctl(PR_SET_MDWE, 7, 0, 0, 0), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0, 7, 0, 0), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 7, 0), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 0, 7), 0);
+
+	EXPECT_LT(prctl(PR_GET_MDWE, 7, 0, 0, 0), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0, 7, 0, 0), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0, 0, 7, 0), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0, 0, 0, 7), 0);
+}
+
+FIXTURE(mdwe)
+{
+	void *p;
+	int flags;
+        size_t size;
+	pid_t pid;
+};
+
+FIXTURE_VARIANT(mdwe)
+{
+	bool enabled;
+	bool forked;
+};
+
+FIXTURE_VARIANT_ADD(mdwe, stock)
+{
+        .enabled = false,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, enabled)
+{
+        .enabled = true,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, forked)
+{
+        .enabled = true,
+	.forked = true,
+};
+
+FIXTURE_SETUP(mdwe)
+{
+	int ret, status;
+
+	self->p = NULL;
+	self->flags = MAP_SHARED | MAP_ANONYMOUS;
+	self->size = getpagesize();
+
+	if (!variant->enabled)
+		return;
+
+	ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("PR_SET_MDWE failed or unsupported");
+	}
+
+	ret = prctl(PR_GET_MDWE, 0, 0, 0, 0);
+	ASSERT_EQ(ret, 1);
+
+	if (variant->forked) {
+		self->pid = fork();
+		ASSERT_GE(self->pid, 0) {
+			TH_LOG("fork failed\n");
+		}
+
+		if (self->pid > 0) {
+			ret = waitpid(self->pid, &status, 0);
+			ASSERT_TRUE(WIFEXITED(status));
+			exit(WEXITSTATUS(status));
+		}
+	}
+}
+
+FIXTURE_TEARDOWN(mdwe)
+{
+	if (self->p && self->p != MAP_FAILED)
+		munmap(self->p, self->size);
+}
+
+TEST_F(mdwe, mmap_READ_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	EXPECT_NE(self->p, MAP_FAILED);
+}
+
+TEST_F(mdwe, mmap_WRITE_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(self->p, MAP_FAILED);
+	} else {
+		EXPECT_NE(self->p, MAP_FAILED);
+	}
+}
+
+TEST_F(mdwe, mprotect_stay_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_F(mdwe, mprotect_add_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mprotect_WRITE_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_WRITE, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mmap_FIXED)
+{
+	void *p;
+
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	p = mmap(self->p, self->size, PROT_READ | PROT_EXEC,
+		 self->flags | MAP_FIXED, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(p, MAP_FAILED);
+	} else {
+		EXPECT_EQ(p, self->p);
+	}
+}
+
+TEST_F(mdwe, arm64_BTI)
+{
+	int ret;
+
+#ifdef __aarch64__
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI))
+#endif
+		SKIP(return, "HWCAP2_BTI not supported");
+
+	self->p = mmap(NULL, self->size, PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_EXEC | PROT_BTI);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
-- 
2.34.1


-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-10-28 20:16     ` Kees Cook
  0 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 20:16 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

Here's an alternative rewritten to use kselftest_harness.h, with tests
for the prctl() flags, and the missed Makefile addition. This should be
much easier to add more variants and tests to, I hope.

-Kees

From bc442a99ebd9852bfaa7444b521bd55fdbb4d369 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Fri, 28 Oct 2022 13:10:45 -0700
Subject: [PATCH] selftests/vm: add tests for memory-deny-write-execute

Add tests for new prctl() commands, including flag values. Add tests for
new denials based on PROT_EXEC across mmap() and mprotect() with MDWE.

Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 tools/testing/selftests/vm/Makefile    |   1 +
 tools/testing/selftests/vm/mdwe_test.c | 201 +++++++++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 163c2fde3cb3..8dd4d4910fa5 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -52,6 +52,7 @@ TEST_GEN_FILES += userfaultfd
 TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
+TEST_GEN_PROGS += mdwe_test
 
 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c
new file mode 100644
index 000000000000..d6f6b751bcd6
--- /dev/null
+++ b/tools/testing/selftests/vm/mdwe_test.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifdef __aarch64__
+#include <asm/hwcap.h>
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <linux/prctl.h>
+
+#include "../kselftest_harness.h"
+
+#define PR_SET_MDWE                     65
+# define PR_MDWE_FLAG_MMAP              1
+
+#define PR_GET_MDWE                     66
+
+#ifdef __aarch64__
+# define PROT_BTI	0x10            /* BTI guarded page */
+#else
+# define PROT_BTI	0
+#endif
+
+TEST(prctl_flags)
+{
+	EXPECT_LT(prctl(PR_SET_MDWE, 7, 0, 0, 0), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0, 7, 0, 0), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 7, 0), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 0, 7), 0);
+
+	EXPECT_LT(prctl(PR_GET_MDWE, 7, 0, 0, 0), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0, 7, 0, 0), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0, 0, 7, 0), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0, 0, 0, 7), 0);
+}
+
+FIXTURE(mdwe)
+{
+	void *p;
+	int flags;
+        size_t size;
+	pid_t pid;
+};
+
+FIXTURE_VARIANT(mdwe)
+{
+	bool enabled;
+	bool forked;
+};
+
+FIXTURE_VARIANT_ADD(mdwe, stock)
+{
+        .enabled = false,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, enabled)
+{
+        .enabled = true,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, forked)
+{
+        .enabled = true,
+	.forked = true,
+};
+
+FIXTURE_SETUP(mdwe)
+{
+	int ret, status;
+
+	self->p = NULL;
+	self->flags = MAP_SHARED | MAP_ANONYMOUS;
+	self->size = getpagesize();
+
+	if (!variant->enabled)
+		return;
+
+	ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("PR_SET_MDWE failed or unsupported");
+	}
+
+	ret = prctl(PR_GET_MDWE, 0, 0, 0, 0);
+	ASSERT_EQ(ret, 1);
+
+	if (variant->forked) {
+		self->pid = fork();
+		ASSERT_GE(self->pid, 0) {
+			TH_LOG("fork failed\n");
+		}
+
+		if (self->pid > 0) {
+			ret = waitpid(self->pid, &status, 0);
+			ASSERT_TRUE(WIFEXITED(status));
+			exit(WEXITSTATUS(status));
+		}
+	}
+}
+
+FIXTURE_TEARDOWN(mdwe)
+{
+	if (self->p && self->p != MAP_FAILED)
+		munmap(self->p, self->size);
+}
+
+TEST_F(mdwe, mmap_READ_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	EXPECT_NE(self->p, MAP_FAILED);
+}
+
+TEST_F(mdwe, mmap_WRITE_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(self->p, MAP_FAILED);
+	} else {
+		EXPECT_NE(self->p, MAP_FAILED);
+	}
+}
+
+TEST_F(mdwe, mprotect_stay_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_F(mdwe, mprotect_add_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mprotect_WRITE_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_WRITE, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mmap_FIXED)
+{
+	void *p;
+
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	p = mmap(self->p, self->size, PROT_READ | PROT_EXEC,
+		 self->flags | MAP_FIXED, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(p, MAP_FAILED);
+	} else {
+		EXPECT_EQ(p, self->p);
+	}
+}
+
+TEST_F(mdwe, arm64_BTI)
+{
+	int ret;
+
+#ifdef __aarch64__
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI))
+#endif
+		SKIP(return, "HWCAP2_BTI not supported");
+
+	self->p = mmap(NULL, self->size, PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_EXEC | PROT_BTI);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
-- 
2.34.1


-- 
Kees Cook

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-26 15:04   ` Joey Gouly
@ 2022-10-28 20:19     ` Kees Cook
  -1 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 20:19 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:
> [...]
> +# define PR_MDWE_FLAG_MMAP              1
> [...]
> +	// Enable MDWE and then run the tests again.
> +	ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
> +	if (ret < 0) {
> +		ksft_print_msg("PR_SET_MDWE failed or unsupported!\n");
> +		goto exit;
> +	}
> +
> +	ret = prctl(PR_GET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
> +	if (ret == 0)
> +		ksft_exit_fail_msg("PR_GET_MDWE failed!");

This flag (PR_MDWE_FLAG_MMAP), while defined in uapi, wasn't actually
being used in the proposed prctl() api. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-10-28 20:19     ` Kees Cook
  0 siblings, 0 replies; 34+ messages in thread
From: Kees Cook @ 2022-10-28 20:19 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:
> [...]
> +# define PR_MDWE_FLAG_MMAP              1
> [...]
> +	// Enable MDWE and then run the tests again.
> +	ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
> +	if (ret < 0) {
> +		ksft_print_msg("PR_SET_MDWE failed or unsupported!\n");
> +		goto exit;
> +	}
> +
> +	ret = prctl(PR_GET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0);
> +	if (ret == 0)
> +		ksft_exit_fail_msg("PR_GET_MDWE failed!");

This flag (PR_MDWE_FLAG_MMAP), while defined in uapi, wasn't actually
being used in the proposed prctl() api. :)

-- 
Kees Cook

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
  2022-10-26 15:04 ` Joey Gouly
@ 2022-11-06 19:42   ` Topi Miettinen
  -1 siblings, 0 replies; 34+ messages in thread
From: Topi Miettinen @ 2022-11-06 19:42 UTC (permalink / raw)
  To: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, linux-mm, linux-arm-kernel, linux-kernel,
	linux-abi-devel, nd, shuah

On 26.10.2022 18.04, Joey Gouly wrote:
> Hi all,
> 
> This is a follow up to the RFC that Catalin posted:
>    https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/
> 
> The background to this is that systemd has a configuration option called
> MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
> is to prevent a user task from inadvertently creating an executable
> mapping that is (or was) writeable. Since such BPF filter is stateless,
> it cannot detect mappings that were previously writeable but
> subsequently changed to read-only. Therefore the filter simply rejects
> any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
> support (Branch Target Identification), the dynamic loader cannot change
> an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
> For libraries, it can resort to unmapping and re-mapping but for the
> main executable it does not have a file descriptor. The original bug
> report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
> for libraries - [3].
> 
> This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
> that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
> Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
> However unlike the BPF filter it only denies it if the mapping didn't previous
> have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
> which is a problem with the BPF filter.

Draft PR for systemd: https://github.com/systemd/systemd/pull/25276

-Topi

> 
> Thanks,
> Joey
> 
> [1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831
> 
> Joey Gouly (2):
>    mm: Implement memory-deny-write-execute as a prctl
>    kselftest: vm: add tests for memory-deny-write-execute
> 
>   include/linux/mman.h                   |  15 ++
>   include/linux/sched/coredump.h         |   6 +-
>   include/uapi/linux/prctl.h             |   6 +
>   kernel/sys.c                           |  18 +++
>   mm/mmap.c                              |   3 +
>   mm/mprotect.c                          |   5 +
>   tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
>   7 files changed, 246 insertions(+), 1 deletion(-)
>   create mode 100644 tools/testing/selftests/vm/mdwe_test.c
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
@ 2022-11-06 19:42   ` Topi Miettinen
  0 siblings, 0 replies; 34+ messages in thread
From: Topi Miettinen @ 2022-11-06 19:42 UTC (permalink / raw)
  To: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, linux-mm, linux-arm-kernel, linux-kernel,
	linux-abi-devel, nd, shuah

On 26.10.2022 18.04, Joey Gouly wrote:
> Hi all,
> 
> This is a follow up to the RFC that Catalin posted:
>    https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/
> 
> The background to this is that systemd has a configuration option called
> MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
> is to prevent a user task from inadvertently creating an executable
> mapping that is (or was) writeable. Since such BPF filter is stateless,
> it cannot detect mappings that were previously writeable but
> subsequently changed to read-only. Therefore the filter simply rejects
> any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
> support (Branch Target Identification), the dynamic loader cannot change
> an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
> For libraries, it can resort to unmapping and re-mapping but for the
> main executable it does not have a file descriptor. The original bug
> report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
> for libraries - [3].
> 
> This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
> that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
> Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
> However unlike the BPF filter it only denies it if the mapping didn't previous
> have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
> which is a problem with the BPF filter.

Draft PR for systemd: https://github.com/systemd/systemd/pull/25276

-Topi

> 
> Thanks,
> Joey
> 
> [1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831
> 
> Joey Gouly (2):
>    mm: Implement memory-deny-write-execute as a prctl
>    kselftest: vm: add tests for memory-deny-write-execute
> 
>   include/linux/mman.h                   |  15 ++
>   include/linux/sched/coredump.h         |   6 +-
>   include/uapi/linux/prctl.h             |   6 +
>   kernel/sys.c                           |  18 +++
>   mm/mmap.c                              |   3 +
>   mm/mprotect.c                          |   5 +
>   tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
>   7 files changed, 246 insertions(+), 1 deletion(-)
>   create mode 100644 tools/testing/selftests/vm/mdwe_test.c
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-28 20:16     ` Kees Cook
@ 2022-11-07 12:23       ` Szabolcs Nagy
  -1 siblings, 0 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2022-11-07 12:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

The 10/28/2022 13:16, Kees Cook wrote:
> +++ b/tools/testing/selftests/vm/mdwe_test.c
> @@ -0,0 +1,201 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#ifdef __aarch64__
> +#include <asm/hwcap.h>
> +#endif
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/auxv.h>
> +#include <sys/mman.h>
> +#include <sys/prctl.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +#include <linux/prctl.h>
> +
> +#include "../kselftest_harness.h"
> +
> +#define PR_SET_MDWE                     65
> +# define PR_MDWE_FLAG_MMAP              1
> +
> +#define PR_GET_MDWE                     66
> +
> +#ifdef __aarch64__
> +# define PROT_BTI	0x10            /* BTI guarded page */
> +#else
> +# define PROT_BTI	0
> +#endif
> +
> +TEST(prctl_flags)
> +{
> +	EXPECT_LT(prctl(PR_SET_MDWE, 7, 0, 0, 0), 0);
> +	EXPECT_LT(prctl(PR_SET_MDWE, 0, 7, 0, 0), 0);
> +	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 7, 0), 0);
> +	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 0, 7), 0);

note that prctl is declared as

  int prctl(int, ...);

and all 4 arguments are documented to be unsigned long in
the linux man pages (even though some are pointers: this
is already a problem for the libc as it does not know if it
should use va_arg(ap, unsigned long) or va_arg(ap, void *),
in practice the call abi rules are the same for those on
linux, so either works unless the compiler deliberately
breaks the code due to the type mismatch ub).

passing an int where an unsigned long is needed is wrong: it
breaks va_arg rules on the c language level (posix rules too)
but more importantly it breaks abi rules: on most LP64 abis
it is not required to be signextended so arbitrary top 32bits
may be passed down.

so e.g.

  prctl(option, 0, 0, 0, 0);

should be written as

  prctl(option, 0L, 0L, 0L, 0L);

or similar (int signedness does not matter according to c
rules), otherwise non-zero top bits may be passed that the
kernel has to ignore, which it currently does not always do.

ideally the kernel updated all the prctl arg macros to have
type long or unsigned long. or explicitly masked out the top
bits when it only uses an int.

see my related rant at
https://lore.kernel.org/linux-api/Y1%2FDS6uoWP7OSkmd@arm.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-11-07 12:23       ` Szabolcs Nagy
  0 siblings, 0 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2022-11-07 12:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

The 10/28/2022 13:16, Kees Cook wrote:
> +++ b/tools/testing/selftests/vm/mdwe_test.c
> @@ -0,0 +1,201 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#ifdef __aarch64__
> +#include <asm/hwcap.h>
> +#endif
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/auxv.h>
> +#include <sys/mman.h>
> +#include <sys/prctl.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +#include <linux/prctl.h>
> +
> +#include "../kselftest_harness.h"
> +
> +#define PR_SET_MDWE                     65
> +# define PR_MDWE_FLAG_MMAP              1
> +
> +#define PR_GET_MDWE                     66
> +
> +#ifdef __aarch64__
> +# define PROT_BTI	0x10            /* BTI guarded page */
> +#else
> +# define PROT_BTI	0
> +#endif
> +
> +TEST(prctl_flags)
> +{
> +	EXPECT_LT(prctl(PR_SET_MDWE, 7, 0, 0, 0), 0);
> +	EXPECT_LT(prctl(PR_SET_MDWE, 0, 7, 0, 0), 0);
> +	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 7, 0), 0);
> +	EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 0, 7), 0);

note that prctl is declared as

  int prctl(int, ...);

and all 4 arguments are documented to be unsigned long in
the linux man pages (even though some are pointers: this
is already a problem for the libc as it does not know if it
should use va_arg(ap, unsigned long) or va_arg(ap, void *),
in practice the call abi rules are the same for those on
linux, so either works unless the compiler deliberately
breaks the code due to the type mismatch ub).

passing an int where an unsigned long is needed is wrong: it
breaks va_arg rules on the c language level (posix rules too)
but more importantly it breaks abi rules: on most LP64 abis
it is not required to be signextended so arbitrary top 32bits
may be passed down.

so e.g.

  prctl(option, 0, 0, 0, 0);

should be written as

  prctl(option, 0L, 0L, 0L, 0L);

or similar (int signedness does not matter according to c
rules), otherwise non-zero top bits may be passed that the
kernel has to ignore, which it currently does not always do.

ideally the kernel updated all the prctl arg macros to have
type long or unsigned long. or explicitly masked out the top
bits when it only uses an int.

see my related rant at
https://lore.kernel.org/linux-api/Y1%2FDS6uoWP7OSkmd@arm.com/

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-10-28 17:03     ` Mark Brown
@ 2022-11-08 17:33       ` Joey Gouly
  -1 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-11-08 17:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

Hi,

On Fri, Oct 28, 2022 at 06:03:18PM +0100, Mark Brown wrote:
> On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:
> 
> > Add some tests to cover the new PR_SET_MDWE prctl.
> 
> Some comments below but they're all stylistic and let's not make perfect
> be the enemy of the good here so
> 
> Reviewed-by: Mark Brown <broonie@kernel.org>

Thanks for the review, however I won't keep your R-b tag because I'm going to
move forward with Kees' approach from:

https://lore.kernel.org/linux-arm-kernel/202210281314.C5D3414722@keescook/T/#m45ac9de6c205b560d072a65e4e67e2a7ee363588

Thanks to Kees for rewriting that.

> 
> and we can iterate later rather than blocking anything on the testcase.
> 
> > +#ifdef __aarch64__
> > +#define PROT_BTI      0x10            /* BTI guarded page */
> > +#endif
> 
> We should get this from the kernel headers shouldn't we?  We generally
> rely on things getting pulled in from there rather than locally
> defining.

I believe the mman.h included is from the toolchain, not the kernel's uapi headers.
The toolchain I was using didn't have PROT_BTI defined in its mman.h

> 
> > +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n"
> > +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n"
> > +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n"
> > +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n"
> 
> > +int test1(int mdwe_enabled)
> > +{
> 
> It feels like we could usefully make an array of
> 
> 	struct test {
> 		int (*run)(bool mdwe_enabled);
> 		char *name;
> 	}
> 
> then we'd need fewer ifdefs, things could be more usefully named and
> it'd be a bit easier to add new cases.
> 
> > +#ifdef __aarch64__
> > +	ksft_set_plan(12);
> > +#else
> > +	ksft_set_plan(9);
> > +#endif
> 
> That'd just be ksft_test_plan(3 * ARRAY_SIZE(tests).
> 
> > +	// First run the tests without MDWE
> > +	test_result(test1(0), TEST1);
> > +	test_result(test2(0), TEST2);
> > +	test_result(test3(0), TEST3);
> > +#ifdef __aarch64__
> > +	test_result(test4(0), TEST4);
> > +#endif
> 
> and these calls to the tests would all be iterating over the array.

These comments are solved by the kselftest_harness approach that Kees suggested.

Thanks,
Joey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-11-08 17:33       ` Joey Gouly
  0 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-11-08 17:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

Hi,

On Fri, Oct 28, 2022 at 06:03:18PM +0100, Mark Brown wrote:
> On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:
> 
> > Add some tests to cover the new PR_SET_MDWE prctl.
> 
> Some comments below but they're all stylistic and let's not make perfect
> be the enemy of the good here so
> 
> Reviewed-by: Mark Brown <broonie@kernel.org>

Thanks for the review, however I won't keep your R-b tag because I'm going to
move forward with Kees' approach from:

https://lore.kernel.org/linux-arm-kernel/202210281314.C5D3414722@keescook/T/#m45ac9de6c205b560d072a65e4e67e2a7ee363588

Thanks to Kees for rewriting that.

> 
> and we can iterate later rather than blocking anything on the testcase.
> 
> > +#ifdef __aarch64__
> > +#define PROT_BTI      0x10            /* BTI guarded page */
> > +#endif
> 
> We should get this from the kernel headers shouldn't we?  We generally
> rely on things getting pulled in from there rather than locally
> defining.

I believe the mman.h included is from the toolchain, not the kernel's uapi headers.
The toolchain I was using didn't have PROT_BTI defined in its mman.h

> 
> > +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n"
> > +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n"
> > +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n"
> > +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n"
> 
> > +int test1(int mdwe_enabled)
> > +{
> 
> It feels like we could usefully make an array of
> 
> 	struct test {
> 		int (*run)(bool mdwe_enabled);
> 		char *name;
> 	}
> 
> then we'd need fewer ifdefs, things could be more usefully named and
> it'd be a bit easier to add new cases.
> 
> > +#ifdef __aarch64__
> > +	ksft_set_plan(12);
> > +#else
> > +	ksft_set_plan(9);
> > +#endif
> 
> That'd just be ksft_test_plan(3 * ARRAY_SIZE(tests).
> 
> > +	// First run the tests without MDWE
> > +	test_result(test1(0), TEST1);
> > +	test_result(test2(0), TEST2);
> > +	test_result(test3(0), TEST3);
> > +#ifdef __aarch64__
> > +	test_result(test4(0), TEST4);
> > +#endif
> 
> and these calls to the tests would all be iterating over the array.

These comments are solved by the kselftest_harness approach that Kees suggested.

Thanks,
Joey

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2022-11-08 17:33       ` Joey Gouly
@ 2022-11-09 13:33         ` Mark Brown
  -1 siblings, 0 replies; 34+ messages in thread
From: Mark Brown @ 2022-11-09 13:33 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

On Tue, Nov 08, 2022 at 05:33:03PM +0000, Joey Gouly wrote:
> On Fri, Oct 28, 2022 at 06:03:18PM +0100, Mark Brown wrote:
> > On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:

> > > +#ifdef __aarch64__
> > > +#define PROT_BTI      0x10            /* BTI guarded page */
> > > +#endif

> > We should get this from the kernel headers shouldn't we?  We generally
> > rely on things getting pulled in from there rather than locally
> > defining.

> I believe the mman.h included is from the toolchain, not the kernel's uapi headers.
> The toolchain I was using didn't have PROT_BTI defined in its mman.h

I'd expect that whatever we're doing in the build process ought to be
overriding the default headers provided by the toolchain, that's kind of
the point here...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2022-11-09 13:33         ` Mark Brown
  0 siblings, 0 replies; 34+ messages in thread
From: Mark Brown @ 2022-11-09 13:33 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah


[-- Attachment #1.1: Type: text/plain, Size: 773 bytes --]

On Tue, Nov 08, 2022 at 05:33:03PM +0000, Joey Gouly wrote:
> On Fri, Oct 28, 2022 at 06:03:18PM +0100, Mark Brown wrote:
> > On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote:

> > > +#ifdef __aarch64__
> > > +#define PROT_BTI      0x10            /* BTI guarded page */
> > > +#endif

> > We should get this from the kernel headers shouldn't we?  We generally
> > rely on things getting pulled in from there rather than locally
> > defining.

> I believe the mman.h included is from the toolchain, not the kernel's uapi headers.
> The toolchain I was using didn't have PROT_BTI defined in its mman.h

I'd expect that whatever we're doing in the build process ought to be
overriding the default headers provided by the toolchain, that's kind of
the point here...

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-10-28 18:51     ` Kees Cook
@ 2022-11-10 11:27       ` Joey Gouly
  -1 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-11-10 11:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

Hi,

On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > The aim of such policy is to prevent a user task from creating an
> > executable mapping that is also writeable.
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > The BPF filter that systemd MDWE uses is stateless, and disallows
> > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> > be enabled if it was already PROT_EXEC, which allows the following case:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >  include/linux/mman.h           | 15 +++++++++++++++
> >  include/linux/sched/coredump.h |  6 +++++-
> >  include/uapi/linux/prctl.h     |  6 ++++++
> >  kernel/sys.c                   | 18 ++++++++++++++++++
> >  mm/mmap.c                      |  3 +++
> >  mm/mprotect.c                  |  5 +++++
> >  6 files changed, 52 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 58b3abd457a3..d84fdeab6b5e 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags)
> >  }
> >  
> >  unsigned long vm_commit_limit(void);
> > +
> > +static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
> 
> Traditionally, it is easier to write these in the positive instead of
> needing to parse a double-negative.
> 
> static inline bool allow_write_exec(...)

This doesn't feel like a double negative to me, and I think it would be better
to keep the name of the function similar to the name of the 'feature'.
However I'm not too fussed either way.

> 
> > +{
> > +	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
> > +		return false;
> > +
> > +	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
> > +		return true;
> > +
> > +	if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> Since this is implementation "2" from the earlier discussion[1], I think
> some comments in here are good to have. (e.g. to explain to people
> reading this code why there is a vma test, etc.) Perhaps even explicit
> repeat the implementation expectations.
> 
> Restating from that thread:
> 
>   2. "is not already PROT_EXEC":
> 
>      a)	mmap(PROT_READ|PROT_WRITE|PROT_EXEC);	// fails
> 
>      b)	mmap(PROT_READ|PROT_EXEC);
> 	mprotect(PROT_READ|PROT_EXEC|PROT_BTI);	// passes
> 
>      c)	mmap(PROT_READ);
> 	mprotect(PROT_READ|PROT_EXEC);		// fails
> 
>      d)	mmap(PROT_READ|PROT_WRITE);
> 	mprotect(PROT_READ);
> 	mprotect(PROT_READ|PROT_EXEC);		// fails

Good idea, I will add a comment.

> 
> [1] https://lore.kernel.org/linux-arm-kernel/YmGjYYlcSVz38rOe@arm.com/
> 
> >  #endif /* _LINUX_MMAN_H */
> > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> > index 8270ad7ae14c..0e17ae7fbfd3 100644
> > --- a/include/linux/sched/coredump.h
> > +++ b/include/linux/sched/coredump.h
> > @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
> >   * lifecycle of this mm, just for simplicity.
> >   */
> >  #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
> > +
> > +#define MMF_HAS_MDWE		28
> > +#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
> > +
> >  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> >  
> >  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> > -				 MMF_DISABLE_THP_MASK)
> > +				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
> 
> Good, yes, new "live forever" bit here. Perhaps bikeshedding over the
> name, see below.
> 
> >  
> >  #endif /* _LINUX_SCHED_COREDUMP_H */
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index a5e06dcbba13..ab9db1e86230 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -281,6 +281,12 @@ struct prctl_mm_map {
> >  # define PR_SME_VL_LEN_MASK		0xffff
> >  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
> >  
> > +/* Memory deny write / execute */
> > +#define PR_SET_MDWE			65
> > +# define PR_MDWE_FLAG_MMAP		1
> > +
> > +#define PR_GET_MDWE			66
> > +
> >  #define PR_SET_VMA		0x53564d41
> >  # define PR_SET_VMA_ANON_NAME		0
> >  
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 5fd54bf0e886..08e1dd6d2533 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
> >  }
> >  #endif /* CONFIG_ANON_VMA_NAME */
> >  
> > +static inline int prctl_set_mdwe(void)
> > +{
> > +	set_bit(MMF_HAS_MDWE, &current->mm->flags);
> > +
> > +	return 0;
> > +}
> > +
> > +static inline int prctl_get_mdwe(void)
> > +{
> > +	return test_bit(MMF_HAS_MDWE, &current->mm->flags);
> > +}
> 
> These will need to change -- the aren't constructed for future expansion
> at all. At the very least, all the arguments need to passed to be
> checked that they are zero. e.g.:
> 
> int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
> 		   unsigned long arg4, unsigned long arg5)
> {
> 	if (arg3 || arg4 || arg5)
> 		return -EINVAL;
> 
> 	...
> 
> 	return 0;
> }
> 
> Otherwise, there's no way to add arguments in the future because old
> userspace may have been sending arbitrary junk on the stack, etc.
> 
> And regardless, I think we'll need some explicit flag bits here, since
> we can see there has been a long history of various other desired
> features that may end up living in here. For now, a single bit is fine.
> The intended behavior is the inability to _add_ PROT_EXEC to an existing
> vma, and to deny the creating of a W+X vma to begin with, so perhaps
> this bit can be named MDWE_FLAG_REFUSE_EXEC_GAIN?
> 
> Then the above "..." becomes:
> 
> 	if (bits & ~(MDWE_FLAG_REFUSE_EXEC_GAIN))
> 		return -EINVAL;
> 
> 	if (bits & MDWE_FLAG_REFUSE_EXEC_GAIN)
> 		set_bit(MMF_HAS_MDWE, &current->mm->flags);
> 	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
> 		return -EPERM; /* Cannot unset the flag */
> 
> And prctl_get_mdwe() becomes:
> 
> int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
> 		   unsigned long arg4, unsigned long arg5)
> {
> 	if (arg2 || arg3 || arg4 || arg5)
> 		return -EINVAL;
> 	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
> 		MDWE_FLAG_REFUSE_EXEC_GAIN : 0;
> }

Thanks, makes sense, I have incorporated those changes.

> 
> > +
> >  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  		unsigned long, arg4, unsigned long, arg5)
> >  {
> > @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
> >  		break;
> >  #endif
> > +	case PR_SET_MDWE:
> > +		error = prctl_set_mdwe();
> > +		break;
> > +	case PR_GET_MDWE:
> > +		error = prctl_get_mdwe();
> > +		break;
> >  	case PR_SET_VMA:
> >  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
> >  		break;
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 099468aee4d8..42eaf6683216 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  			vm_flags |= VM_NORESERVE;
> >  	}
> >  
> > +	if (map_deny_write_exec(NULL, vm_flags))
> > +		return -EACCES;
> > +
> 
> This seems like the wrong place to do the check -- that the vma argument
> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> it live in mmap_region()? What happens with MAP_FIXED, when there is
> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> check. For example, we had "c" above:
> 
>      c)	mmap(PROT_READ);
> 	mprotect(PROT_READ|PROT_EXEC);		// fails
> 
> But this would allow another case:
> 
>      e)	addr = mmap(..., PROT_READ, ...);
> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes

I can move the check into mmap_region() but it won't fix the MAP_FIXED
example that you showed here.

mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
However the `vma` for the 'old' region is not kept around, and a new vma will
be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
will just be as good as passing NULL.

It's possible to save the vm_flags from the region that is unmapped, but Catalin
suggested it might be better if that is part of a later extension, what do you
think? 

> 
> 
> >  	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
> >  	if (!IS_ERR_VALUE(addr) &&
> >  	    ((vm_flags & VM_LOCKED) ||
> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 8d770855b591..af71ef0788fd 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  			break;
> >  		}
> >  
> > +		if (map_deny_write_exec(vma, newflags)) {
> > +			error = -EACCES;
> > +			goto out;
> > +		}
> > +
> 
> This looks like the right place. Any rationale for why it's before
> arch_validate_flags()?o

No big justification, it's just after the VM_ACCESS_FLAGS check and is more generic
than the architecture specific checks.

> 
> >  		/* Allow architectures to sanity-check the new flags */
> >  		if (!arch_validate_flags(newflags)) {
> >  			error = -EINVAL;
> 
> -Kees

Thanks for the review and for the rewritten test, I have replaced my commit with
the one that you sent.

Joey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-11-10 11:27       ` Joey Gouly
  0 siblings, 0 replies; 34+ messages in thread
From: Joey Gouly @ 2022-11-10 11:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

Hi,

On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > The aim of such policy is to prevent a user task from creating an
> > executable mapping that is also writeable.
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > The BPF filter that systemd MDWE uses is stateless, and disallows
> > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> > be enabled if it was already PROT_EXEC, which allows the following case:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >  include/linux/mman.h           | 15 +++++++++++++++
> >  include/linux/sched/coredump.h |  6 +++++-
> >  include/uapi/linux/prctl.h     |  6 ++++++
> >  kernel/sys.c                   | 18 ++++++++++++++++++
> >  mm/mmap.c                      |  3 +++
> >  mm/mprotect.c                  |  5 +++++
> >  6 files changed, 52 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 58b3abd457a3..d84fdeab6b5e 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags)
> >  }
> >  
> >  unsigned long vm_commit_limit(void);
> > +
> > +static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
> 
> Traditionally, it is easier to write these in the positive instead of
> needing to parse a double-negative.
> 
> static inline bool allow_write_exec(...)

This doesn't feel like a double negative to me, and I think it would be better
to keep the name of the function similar to the name of the 'feature'.
However I'm not too fussed either way.

> 
> > +{
> > +	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
> > +		return false;
> > +
> > +	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
> > +		return true;
> > +
> > +	if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> Since this is implementation "2" from the earlier discussion[1], I think
> some comments in here are good to have. (e.g. to explain to people
> reading this code why there is a vma test, etc.) Perhaps even explicit
> repeat the implementation expectations.
> 
> Restating from that thread:
> 
>   2. "is not already PROT_EXEC":
> 
>      a)	mmap(PROT_READ|PROT_WRITE|PROT_EXEC);	// fails
> 
>      b)	mmap(PROT_READ|PROT_EXEC);
> 	mprotect(PROT_READ|PROT_EXEC|PROT_BTI);	// passes
> 
>      c)	mmap(PROT_READ);
> 	mprotect(PROT_READ|PROT_EXEC);		// fails
> 
>      d)	mmap(PROT_READ|PROT_WRITE);
> 	mprotect(PROT_READ);
> 	mprotect(PROT_READ|PROT_EXEC);		// fails

Good idea, I will add a comment.

> 
> [1] https://lore.kernel.org/linux-arm-kernel/YmGjYYlcSVz38rOe@arm.com/
> 
> >  #endif /* _LINUX_MMAN_H */
> > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> > index 8270ad7ae14c..0e17ae7fbfd3 100644
> > --- a/include/linux/sched/coredump.h
> > +++ b/include/linux/sched/coredump.h
> > @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
> >   * lifecycle of this mm, just for simplicity.
> >   */
> >  #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
> > +
> > +#define MMF_HAS_MDWE		28
> > +#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
> > +
> >  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> >  
> >  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> > -				 MMF_DISABLE_THP_MASK)
> > +				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
> 
> Good, yes, new "live forever" bit here. Perhaps bikeshedding over the
> name, see below.
> 
> >  
> >  #endif /* _LINUX_SCHED_COREDUMP_H */
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index a5e06dcbba13..ab9db1e86230 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -281,6 +281,12 @@ struct prctl_mm_map {
> >  # define PR_SME_VL_LEN_MASK		0xffff
> >  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
> >  
> > +/* Memory deny write / execute */
> > +#define PR_SET_MDWE			65
> > +# define PR_MDWE_FLAG_MMAP		1
> > +
> > +#define PR_GET_MDWE			66
> > +
> >  #define PR_SET_VMA		0x53564d41
> >  # define PR_SET_VMA_ANON_NAME		0
> >  
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 5fd54bf0e886..08e1dd6d2533 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
> >  }
> >  #endif /* CONFIG_ANON_VMA_NAME */
> >  
> > +static inline int prctl_set_mdwe(void)
> > +{
> > +	set_bit(MMF_HAS_MDWE, &current->mm->flags);
> > +
> > +	return 0;
> > +}
> > +
> > +static inline int prctl_get_mdwe(void)
> > +{
> > +	return test_bit(MMF_HAS_MDWE, &current->mm->flags);
> > +}
> 
> These will need to change -- the aren't constructed for future expansion
> at all. At the very least, all the arguments need to passed to be
> checked that they are zero. e.g.:
> 
> int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
> 		   unsigned long arg4, unsigned long arg5)
> {
> 	if (arg3 || arg4 || arg5)
> 		return -EINVAL;
> 
> 	...
> 
> 	return 0;
> }
> 
> Otherwise, there's no way to add arguments in the future because old
> userspace may have been sending arbitrary junk on the stack, etc.
> 
> And regardless, I think we'll need some explicit flag bits here, since
> we can see there has been a long history of various other desired
> features that may end up living in here. For now, a single bit is fine.
> The intended behavior is the inability to _add_ PROT_EXEC to an existing
> vma, and to deny the creating of a W+X vma to begin with, so perhaps
> this bit can be named MDWE_FLAG_REFUSE_EXEC_GAIN?
> 
> Then the above "..." becomes:
> 
> 	if (bits & ~(MDWE_FLAG_REFUSE_EXEC_GAIN))
> 		return -EINVAL;
> 
> 	if (bits & MDWE_FLAG_REFUSE_EXEC_GAIN)
> 		set_bit(MMF_HAS_MDWE, &current->mm->flags);
> 	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
> 		return -EPERM; /* Cannot unset the flag */
> 
> And prctl_get_mdwe() becomes:
> 
> int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
> 		   unsigned long arg4, unsigned long arg5)
> {
> 	if (arg2 || arg3 || arg4 || arg5)
> 		return -EINVAL;
> 	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
> 		MDWE_FLAG_REFUSE_EXEC_GAIN : 0;
> }

Thanks, makes sense, I have incorporated those changes.

> 
> > +
> >  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  		unsigned long, arg4, unsigned long, arg5)
> >  {
> > @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
> >  		break;
> >  #endif
> > +	case PR_SET_MDWE:
> > +		error = prctl_set_mdwe();
> > +		break;
> > +	case PR_GET_MDWE:
> > +		error = prctl_get_mdwe();
> > +		break;
> >  	case PR_SET_VMA:
> >  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
> >  		break;
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 099468aee4d8..42eaf6683216 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  			vm_flags |= VM_NORESERVE;
> >  	}
> >  
> > +	if (map_deny_write_exec(NULL, vm_flags))
> > +		return -EACCES;
> > +
> 
> This seems like the wrong place to do the check -- that the vma argument
> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> it live in mmap_region()? What happens with MAP_FIXED, when there is
> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> check. For example, we had "c" above:
> 
>      c)	mmap(PROT_READ);
> 	mprotect(PROT_READ|PROT_EXEC);		// fails
> 
> But this would allow another case:
> 
>      e)	addr = mmap(..., PROT_READ, ...);
> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes

I can move the check into mmap_region() but it won't fix the MAP_FIXED
example that you showed here.

mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
However the `vma` for the 'old' region is not kept around, and a new vma will
be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
will just be as good as passing NULL.

It's possible to save the vm_flags from the region that is unmapped, but Catalin
suggested it might be better if that is part of a later extension, what do you
think? 

> 
> 
> >  	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
> >  	if (!IS_ERR_VALUE(addr) &&
> >  	    ((vm_flags & VM_LOCKED) ||
> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 8d770855b591..af71ef0788fd 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  			break;
> >  		}
> >  
> > +		if (map_deny_write_exec(vma, newflags)) {
> > +			error = -EACCES;
> > +			goto out;
> > +		}
> > +
> 
> This looks like the right place. Any rationale for why it's before
> arch_validate_flags()?o

No big justification, it's just after the VM_ACCESS_FLAGS check and is more generic
than the architecture specific checks.

> 
> >  		/* Allow architectures to sanity-check the new flags */
> >  		if (!arch_validate_flags(newflags)) {
> >  			error = -EINVAL;
> 
> -Kees

Thanks for the review and for the rewritten test, I have replaced my commit with
the one that you sent.

Joey

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-11-10 11:27       ` Joey Gouly
@ 2022-11-10 12:03         ` Catalin Marinas
  -1 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2022-11-10 12:03 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 099468aee4d8..42eaf6683216 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > >  			vm_flags |= VM_NORESERVE;
> > >  	}
> > >  
> > > +	if (map_deny_write_exec(NULL, vm_flags))
> > > +		return -EACCES;
> > > +
> > 
> > This seems like the wrong place to do the check -- that the vma argument
> > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> > it live in mmap_region()? What happens with MAP_FIXED, when there is
> > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> > check. For example, we had "c" above:
> > 
> >      c)	mmap(PROT_READ);
> > 	mprotect(PROT_READ|PROT_EXEC);		// fails
> > 
> > But this would allow another case:
> > 
> >      e)	addr = mmap(..., PROT_READ, ...);
> > 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
> 
> I can move the check into mmap_region() but it won't fix the MAP_FIXED
> example that you showed here.
> 
> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
> However the `vma` for the 'old' region is not kept around, and a new vma will
> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
> will just be as good as passing NULL.
> 
> It's possible to save the vm_flags from the region that is unmapped, but Catalin
> suggested it might be better if that is part of a later extension, what do you
> think? 

I thought initially we should keep the behaviour close to what systemd
achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
vma is already executable (i.e. check actual permission change not just
the PROT_* flags).

We could pass the old vm_flags for that region (and maybe drop the vma
pointer entirely, just check old and new vm_flags). But this feels like
tightening slightly systemd's MDWE approach. If user-space doesn't get
confused by this, I'm fine to go with it. Otherwise we can add a new
flag later for this behaviour

I guess that's more of a question for Topi on whether point tightening
point (e) is feasible/desirable.

-- 
Catalin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-11-10 12:03         ` Catalin Marinas
  0 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2022-11-10 12:03 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 099468aee4d8..42eaf6683216 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > >  			vm_flags |= VM_NORESERVE;
> > >  	}
> > >  
> > > +	if (map_deny_write_exec(NULL, vm_flags))
> > > +		return -EACCES;
> > > +
> > 
> > This seems like the wrong place to do the check -- that the vma argument
> > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> > it live in mmap_region()? What happens with MAP_FIXED, when there is
> > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> > check. For example, we had "c" above:
> > 
> >      c)	mmap(PROT_READ);
> > 	mprotect(PROT_READ|PROT_EXEC);		// fails
> > 
> > But this would allow another case:
> > 
> >      e)	addr = mmap(..., PROT_READ, ...);
> > 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
> 
> I can move the check into mmap_region() but it won't fix the MAP_FIXED
> example that you showed here.
> 
> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
> However the `vma` for the 'old' region is not kept around, and a new vma will
> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
> will just be as good as passing NULL.
> 
> It's possible to save the vm_flags from the region that is unmapped, but Catalin
> suggested it might be better if that is part of a later extension, what do you
> think? 

I thought initially we should keep the behaviour close to what systemd
achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
vma is already executable (i.e. check actual permission change not just
the PROT_* flags).

We could pass the old vm_flags for that region (and maybe drop the vma
pointer entirely, just check old and new vm_flags). But this feels like
tightening slightly systemd's MDWE approach. If user-space doesn't get
confused by this, I'm fine to go with it. Otherwise we can add a new
flag later for this behaviour

I guess that's more of a question for Topi on whether point tightening
point (e) is feasible/desirable.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-11-10 12:03         ` Catalin Marinas
@ 2022-11-12  6:11           ` Topi Miettinen
  -1 siblings, 0 replies; 34+ messages in thread
From: Topi Miettinen @ 2022-11-12  6:11 UTC (permalink / raw)
  To: Catalin Marinas, Joey Gouly
  Cc: Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On 10.11.2022 14.03, Catalin Marinas wrote:
> On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
>> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
>>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 099468aee4d8..42eaf6683216 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>>   			vm_flags |= VM_NORESERVE;
>>>>   	}
>>>>   
>>>> +	if (map_deny_write_exec(NULL, vm_flags))
>>>> +		return -EACCES;
>>>> +
>>>
>>> This seems like the wrong place to do the check -- that the vma argument
>>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
>>> it live in mmap_region()? What happens with MAP_FIXED, when there is
>>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
>>> check. For example, we had "c" above:
>>>
>>>       c)	mmap(PROT_READ);
>>> 	mprotect(PROT_READ|PROT_EXEC);		// fails
>>>
>>> But this would allow another case:
>>>
>>>       e)	addr = mmap(..., PROT_READ, ...);
>>> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
>>
>> I can move the check into mmap_region() but it won't fix the MAP_FIXED
>> example that you showed here.
>>
>> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
>> However the `vma` for the 'old' region is not kept around, and a new vma will
>> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
>> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
>> will just be as good as passing NULL.
>>
>> It's possible to save the vm_flags from the region that is unmapped, but Catalin
>> suggested it might be better if that is part of a later extension, what do you
>> think?
> 
> I thought initially we should keep the behaviour close to what systemd
> achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
> vma is already executable (i.e. check actual permission change not just
> the PROT_* flags).
> 
> We could pass the old vm_flags for that region (and maybe drop the vma
> pointer entirely, just check old and new vm_flags). But this feels like
> tightening slightly systemd's MDWE approach. If user-space doesn't get
> confused by this, I'm fine to go with it. Otherwise we can add a new
> flag later for this behaviour
> 
> I guess that's more of a question for Topi on whether point tightening
> point (e) is feasible/desirable.

I think we want 1:1 compatibility with seccomp() for the basic version, 
so MAP_FIXED shouldn't change the verdict. Later we can introduce more 
versions (perhaps even less strict, too) when it's requested by 
configuration, like MemoryDenyWriteExecute=[relaxed | strict].

-Topi


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-11-12  6:11           ` Topi Miettinen
  0 siblings, 0 replies; 34+ messages in thread
From: Topi Miettinen @ 2022-11-12  6:11 UTC (permalink / raw)
  To: Catalin Marinas, Joey Gouly
  Cc: Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On 10.11.2022 14.03, Catalin Marinas wrote:
> On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
>> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
>>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 099468aee4d8..42eaf6683216 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>>   			vm_flags |= VM_NORESERVE;
>>>>   	}
>>>>   
>>>> +	if (map_deny_write_exec(NULL, vm_flags))
>>>> +		return -EACCES;
>>>> +
>>>
>>> This seems like the wrong place to do the check -- that the vma argument
>>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
>>> it live in mmap_region()? What happens with MAP_FIXED, when there is
>>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
>>> check. For example, we had "c" above:
>>>
>>>       c)	mmap(PROT_READ);
>>> 	mprotect(PROT_READ|PROT_EXEC);		// fails
>>>
>>> But this would allow another case:
>>>
>>>       e)	addr = mmap(..., PROT_READ, ...);
>>> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
>>
>> I can move the check into mmap_region() but it won't fix the MAP_FIXED
>> example that you showed here.
>>
>> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
>> However the `vma` for the 'old' region is not kept around, and a new vma will
>> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
>> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
>> will just be as good as passing NULL.
>>
>> It's possible to save the vm_flags from the region that is unmapped, but Catalin
>> suggested it might be better if that is part of a later extension, what do you
>> think?
> 
> I thought initially we should keep the behaviour close to what systemd
> achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
> vma is already executable (i.e. check actual permission change not just
> the PROT_* flags).
> 
> We could pass the old vm_flags for that region (and maybe drop the vma
> pointer entirely, just check old and new vm_flags). But this feels like
> tightening slightly systemd's MDWE approach. If user-space doesn't get
> confused by this, I'm fine to go with it. Otherwise we can add a new
> flag later for this behaviour
> 
> I guess that's more of a question for Topi on whether point tightening
> point (e) is feasible/desirable.

I think we want 1:1 compatibility with seccomp() for the basic version, 
so MAP_FIXED shouldn't change the verdict. Later we can introduce more 
versions (perhaps even less strict, too) when it's requested by 
configuration, like MemoryDenyWriteExecute=[relaxed | strict].

-Topi


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-11-12  6:11           ` Topi Miettinen
@ 2022-11-15 15:35             ` Catalin Marinas
  -1 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2022-11-15 15:35 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Joey Gouly, Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote:
> On 10.11.2022 14.03, Catalin Marinas wrote:
> > On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
> > > On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> > > > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > index 099468aee4d8..42eaf6683216 100644
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > > >   			vm_flags |= VM_NORESERVE;
> > > > >   	}
> > > > > +	if (map_deny_write_exec(NULL, vm_flags))
> > > > > +		return -EACCES;
> > > > > +
> > > > 
> > > > This seems like the wrong place to do the check -- that the vma argument
> > > > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> > > > it live in mmap_region()? What happens with MAP_FIXED, when there is
> > > > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> > > > check. For example, we had "c" above:
> > > > 
> > > >       c)	mmap(PROT_READ);
> > > > 	mprotect(PROT_READ|PROT_EXEC);		// fails
> > > > 
> > > > But this would allow another case:
> > > > 
> > > >       e)	addr = mmap(..., PROT_READ, ...);
> > > > 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
> > > 
> > > I can move the check into mmap_region() but it won't fix the MAP_FIXED
> > > example that you showed here.
> > > 
> > > mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
> > > However the `vma` for the 'old' region is not kept around, and a new vma will
> > > be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
> > > to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
> > > will just be as good as passing NULL.
> > > 
> > > It's possible to save the vm_flags from the region that is unmapped, but Catalin
> > > suggested it might be better if that is part of a later extension, what do you
> > > think?
> > 
> > I thought initially we should keep the behaviour close to what systemd
> > achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
> > vma is already executable (i.e. check actual permission change not just
> > the PROT_* flags).
> > 
> > We could pass the old vm_flags for that region (and maybe drop the vma
> > pointer entirely, just check old and new vm_flags). But this feels like
> > tightening slightly systemd's MDWE approach. If user-space doesn't get
> > confused by this, I'm fine to go with it. Otherwise we can add a new
> > flag later for this behaviour
> > 
> > I guess that's more of a question for Topi on whether point tightening
> > point (e) is feasible/desirable.
> 
> I think we want 1:1 compatibility with seccomp() for the basic version, so
> MAP_FIXED shouldn't change the verdict. Later we can introduce more versions
> (perhaps even less strict, too) when it's requested by configuration, like
> MemoryDenyWriteExecute=[relaxed | strict].

Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is
already PROT_EXEC? Or you'd rather reject that as well?

-- 
Catalin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-11-15 15:35             ` Catalin Marinas
  0 siblings, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2022-11-15 15:35 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Joey Gouly, Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote:
> On 10.11.2022 14.03, Catalin Marinas wrote:
> > On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
> > > On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> > > > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > index 099468aee4d8..42eaf6683216 100644
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > > >   			vm_flags |= VM_NORESERVE;
> > > > >   	}
> > > > > +	if (map_deny_write_exec(NULL, vm_flags))
> > > > > +		return -EACCES;
> > > > > +
> > > > 
> > > > This seems like the wrong place to do the check -- that the vma argument
> > > > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> > > > it live in mmap_region()? What happens with MAP_FIXED, when there is
> > > > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> > > > check. For example, we had "c" above:
> > > > 
> > > >       c)	mmap(PROT_READ);
> > > > 	mprotect(PROT_READ|PROT_EXEC);		// fails
> > > > 
> > > > But this would allow another case:
> > > > 
> > > >       e)	addr = mmap(..., PROT_READ, ...);
> > > > 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
> > > 
> > > I can move the check into mmap_region() but it won't fix the MAP_FIXED
> > > example that you showed here.
> > > 
> > > mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
> > > However the `vma` for the 'old' region is not kept around, and a new vma will
> > > be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
> > > to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
> > > will just be as good as passing NULL.
> > > 
> > > It's possible to save the vm_flags from the region that is unmapped, but Catalin
> > > suggested it might be better if that is part of a later extension, what do you
> > > think?
> > 
> > I thought initially we should keep the behaviour close to what systemd
> > achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
> > vma is already executable (i.e. check actual permission change not just
> > the PROT_* flags).
> > 
> > We could pass the old vm_flags for that region (and maybe drop the vma
> > pointer entirely, just check old and new vm_flags). But this feels like
> > tightening slightly systemd's MDWE approach. If user-space doesn't get
> > confused by this, I'm fine to go with it. Otherwise we can add a new
> > flag later for this behaviour
> > 
> > I guess that's more of a question for Topi on whether point tightening
> > point (e) is feasible/desirable.
> 
> I think we want 1:1 compatibility with seccomp() for the basic version, so
> MAP_FIXED shouldn't change the verdict. Later we can introduce more versions
> (perhaps even less strict, too) when it's requested by configuration, like
> MemoryDenyWriteExecute=[relaxed | strict].

Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is
already PROT_EXEC? Or you'd rather reject that as well?

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
  2022-11-15 15:35             ` Catalin Marinas
@ 2022-11-15 19:31               ` Topi Miettinen
  -1 siblings, 0 replies; 34+ messages in thread
From: Topi Miettinen @ 2022-11-15 19:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joey Gouly, Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On 15.11.2022 17.35, Catalin Marinas wrote:
> On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote:
>> On 10.11.2022 14.03, Catalin Marinas wrote:
>>> On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
>>>> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
>>>>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>>> index 099468aee4d8..42eaf6683216 100644
>>>>>> --- a/mm/mmap.c
>>>>>> +++ b/mm/mmap.c
>>>>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>>>>    			vm_flags |= VM_NORESERVE;
>>>>>>    	}
>>>>>> +	if (map_deny_write_exec(NULL, vm_flags))
>>>>>> +		return -EACCES;
>>>>>> +
>>>>>
>>>>> This seems like the wrong place to do the check -- that the vma argument
>>>>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
>>>>> it live in mmap_region()? What happens with MAP_FIXED, when there is
>>>>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
>>>>> check. For example, we had "c" above:
>>>>>
>>>>>        c)	mmap(PROT_READ);
>>>>> 	mprotect(PROT_READ|PROT_EXEC);		// fails
>>>>>
>>>>> But this would allow another case:
>>>>>
>>>>>        e)	addr = mmap(..., PROT_READ, ...);
>>>>> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
>>>>
>>>> I can move the check into mmap_region() but it won't fix the MAP_FIXED
>>>> example that you showed here.
>>>>
>>>> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
>>>> However the `vma` for the 'old' region is not kept around, and a new vma will
>>>> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
>>>> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
>>>> will just be as good as passing NULL.
>>>>
>>>> It's possible to save the vm_flags from the region that is unmapped, but Catalin
>>>> suggested it might be better if that is part of a later extension, what do you
>>>> think?
>>>
>>> I thought initially we should keep the behaviour close to what systemd
>>> achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
>>> vma is already executable (i.e. check actual permission change not just
>>> the PROT_* flags).
>>>
>>> We could pass the old vm_flags for that region (and maybe drop the vma
>>> pointer entirely, just check old and new vm_flags). But this feels like
>>> tightening slightly systemd's MDWE approach. If user-space doesn't get
>>> confused by this, I'm fine to go with it. Otherwise we can add a new
>>> flag later for this behaviour
>>>
>>> I guess that's more of a question for Topi on whether point tightening
>>> point (e) is feasible/desirable.
>>
>> I think we want 1:1 compatibility with seccomp() for the basic version, so
>> MAP_FIXED shouldn't change the verdict. Later we can introduce more versions
>> (perhaps even less strict, too) when it's requested by configuration, like
>> MemoryDenyWriteExecute=[relaxed | strict].
> 
> Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is
> already PROT_EXEC? Or you'd rather reject that as well?
> 

I think that it's OK to allow that. It's an incompatible change, but it 
shouldn't break anything.

-Topi


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2022-11-15 19:31               ` Topi Miettinen
  0 siblings, 0 replies; 34+ messages in thread
From: Topi Miettinen @ 2022-11-15 19:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joey Gouly, Kees Cook, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy,
	Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On 15.11.2022 17.35, Catalin Marinas wrote:
> On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote:
>> On 10.11.2022 14.03, Catalin Marinas wrote:
>>> On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
>>>> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
>>>>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>>> index 099468aee4d8..42eaf6683216 100644
>>>>>> --- a/mm/mmap.c
>>>>>> +++ b/mm/mmap.c
>>>>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>>>>    			vm_flags |= VM_NORESERVE;
>>>>>>    	}
>>>>>> +	if (map_deny_write_exec(NULL, vm_flags))
>>>>>> +		return -EACCES;
>>>>>> +
>>>>>
>>>>> This seems like the wrong place to do the check -- that the vma argument
>>>>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
>>>>> it live in mmap_region()? What happens with MAP_FIXED, when there is
>>>>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
>>>>> check. For example, we had "c" above:
>>>>>
>>>>>        c)	mmap(PROT_READ);
>>>>> 	mprotect(PROT_READ|PROT_EXEC);		// fails
>>>>>
>>>>> But this would allow another case:
>>>>>
>>>>>        e)	addr = mmap(..., PROT_READ, ...);
>>>>> 	mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...);	// passes
>>>>
>>>> I can move the check into mmap_region() but it won't fix the MAP_FIXED
>>>> example that you showed here.
>>>>
>>>> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
>>>> However the `vma` for the 'old' region is not kept around, and a new vma will
>>>> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
>>>> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
>>>> will just be as good as passing NULL.
>>>>
>>>> It's possible to save the vm_flags from the region that is unmapped, but Catalin
>>>> suggested it might be better if that is part of a later extension, what do you
>>>> think?
>>>
>>> I thought initially we should keep the behaviour close to what systemd
>>> achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
>>> vma is already executable (i.e. check actual permission change not just
>>> the PROT_* flags).
>>>
>>> We could pass the old vm_flags for that region (and maybe drop the vma
>>> pointer entirely, just check old and new vm_flags). But this feels like
>>> tightening slightly systemd's MDWE approach. If user-space doesn't get
>>> confused by this, I'm fine to go with it. Otherwise we can add a new
>>> flag later for this behaviour
>>>
>>> I guess that's more of a question for Topi on whether point tightening
>>> point (e) is feasible/desirable.
>>
>> I think we want 1:1 compatibility with seccomp() for the basic version, so
>> MAP_FIXED shouldn't change the verdict. Later we can introduce more versions
>> (perhaps even less strict, too) when it's requested by configuration, like
>> MemoryDenyWriteExecute=[relaxed | strict].
> 
> Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is
> already PROT_EXEC? Or you'd rather reject that as well?
> 

I think that it's OK to allow that. It's an incompatible change, but it 
shouldn't break anything.

-Topi


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-11-15 19:32 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-26 15:04 [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Joey Gouly
2022-10-26 15:04 ` Joey Gouly
2022-10-26 15:04 ` [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly
2022-10-26 15:04   ` Joey Gouly
2022-10-28 18:51   ` Kees Cook
2022-10-28 18:51     ` Kees Cook
2022-11-10 11:27     ` Joey Gouly
2022-11-10 11:27       ` Joey Gouly
2022-11-10 12:03       ` Catalin Marinas
2022-11-10 12:03         ` Catalin Marinas
2022-11-12  6:11         ` Topi Miettinen
2022-11-12  6:11           ` Topi Miettinen
2022-11-15 15:35           ` Catalin Marinas
2022-11-15 15:35             ` Catalin Marinas
2022-11-15 19:31             ` Topi Miettinen
2022-11-15 19:31               ` Topi Miettinen
2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly
2022-10-26 15:04   ` Joey Gouly
2022-10-28 17:03   ` Mark Brown
2022-10-28 17:03     ` Mark Brown
2022-11-08 17:33     ` Joey Gouly
2022-11-08 17:33       ` Joey Gouly
2022-11-09 13:33       ` Mark Brown
2022-11-09 13:33         ` Mark Brown
2022-10-28 17:45   ` Kees Cook
2022-10-28 17:45     ` Kees Cook
2022-10-28 20:16   ` Kees Cook
2022-10-28 20:16     ` Kees Cook
2022-11-07 12:23     ` Szabolcs Nagy
2022-11-07 12:23       ` Szabolcs Nagy
2022-10-28 20:19   ` Kees Cook
2022-10-28 20:19     ` Kees Cook
2022-11-06 19:42 ` [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Topi Miettinen
2022-11-06 19:42   ` Topi Miettinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.