All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
@ 2023-01-19 16:03 ` Joey Gouly
  0 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-01-19 16:03 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

Hi all,

This is v2 of the MDWE patchset.

Changes since v1:
	- Rewritten test - thanks Kees!
	- Added comment to `map_deny_write_exec`
	- Moved flag check into mmap_region, should be no functional change
	- Rebased onto v6.2-rc4

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [2], implemented as a SECCOMP BPF filter. Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable. Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but
subsequently changed to read-only. Therefore the filter simply rejects
any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
support (Branch Target Identification), the dynamic loader cannot change
an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
For libraries, it can resort to unmapping and re-mapping but for the
main executable it does not have a file descriptor. The original bug
report in the Red Hat bugzilla - [3] - and subsequent glibc workaround
for libraries - [4].

This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
However unlike the BPF filter it only denies it if the mapping didn't previous
have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
which is a problem with the BPF filter.

Thanks,
Joey

[1] https://lore.kernel.org/linux-arm-kernel/20221026150457.36957-1-joey.gouly@arm.com/
[2] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=26831


Joey Gouly (1):
  mm: Implement memory-deny-write-execute as a prctl

Kees Cook (1):
  kselftest: vm: add tests for memory-deny-write-execute

 include/linux/mman.h                   |  34 +++++
 include/linux/sched/coredump.h         |   6 +-
 include/uapi/linux/prctl.h             |   6 +
 kernel/sys.c                           |  33 +++++
 mm/mmap.c                              |  10 ++
 mm/mprotect.c                          |   5 +
 tools/testing/selftests/vm/Makefile    |   1 +
 tools/testing/selftests/vm/mdwe_test.c | 197 +++++++++++++++++++++++++
 8 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
@ 2023-01-19 16:03 ` Joey Gouly
  0 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-01-19 16:03 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

Hi all,

This is v2 of the MDWE patchset.

Changes since v1:
	- Rewritten test - thanks Kees!
	- Added comment to `map_deny_write_exec`
	- Moved flag check into mmap_region, should be no functional change
	- Rebased onto v6.2-rc4

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [2], implemented as a SECCOMP BPF filter. Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable. Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but
subsequently changed to read-only. Therefore the filter simply rejects
any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
support (Branch Target Identification), the dynamic loader cannot change
an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
For libraries, it can resort to unmapping and re-mapping but for the
main executable it does not have a file descriptor. The original bug
report in the Red Hat bugzilla - [3] - and subsequent glibc workaround
for libraries - [4].

This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
However unlike the BPF filter it only denies it if the mapping didn't previous
have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
which is a problem with the BPF filter.

Thanks,
Joey

[1] https://lore.kernel.org/linux-arm-kernel/20221026150457.36957-1-joey.gouly@arm.com/
[2] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=26831


Joey Gouly (1):
  mm: Implement memory-deny-write-execute as a prctl

Kees Cook (1):
  kselftest: vm: add tests for memory-deny-write-execute

 include/linux/mman.h                   |  34 +++++
 include/linux/sched/coredump.h         |   6 +-
 include/uapi/linux/prctl.h             |   6 +
 kernel/sys.c                           |  33 +++++
 mm/mmap.c                              |  10 ++
 mm/mprotect.c                          |   5 +
 tools/testing/selftests/vm/Makefile    |   1 +
 tools/testing/selftests/vm/mdwe_test.c | 197 +++++++++++++++++++++++++
 8 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-19 16:03 ` Joey Gouly
@ 2023-01-19 16:03   ` Joey Gouly
  -1 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-01-19 16:03 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

The aim of such policy is to prevent a user task from creating an
executable mapping that is also writeable.

An example of mmap() returning -EACCESS if the policy is enabled:

	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);

Similarly, mprotect() would return -EACCESS below:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);

The BPF filter that systemd MDWE uses is stateless, and disallows
mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
be enabled if it was already PROT_EXEC, which allows the following case:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

where PROT_BTI enables branch tracking identification on arm64.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
 include/linux/sched/coredump.h |  6 +++++-
 include/uapi/linux/prctl.h     |  6 ++++++
 kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
 mm/mmap.c                      | 10 ++++++++++
 mm/mprotect.c                  |  5 +++++
 6 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 58b3abd457a3..cee1e4b566d8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
 }
 
 unsigned long vm_commit_limit(void);
+
+/*
+ * Denies creating a writable executable mapping or gaining executable permissions.
+ *
+ * This denies the following:
+ *
+ * 	a)	mmap(PROT_WRITE | PROT_EXEC)
+ *
+ *	b)	mmap(PROT_WRITE)
+ *		mprotect(PROT_EXEC)
+ *
+ *	c)	mmap(PROT_WRITE)
+ *		mprotect(PROT_READ)
+ *		mprotect(PROT_EXEC)
+ *
+ * But allows the following:
+ *
+ *	d)	mmap(PROT_READ | PROT_EXEC)
+ *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
+ */
+static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
+{
+	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return false;
+
+	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+		return true;
+
+	if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+		return true;
+
+	return false;
+}
+
 #endif /* _LINUX_MMAN_H */
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 8270ad7ae14c..0e17ae7fbfd3 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
  * lifecycle of this mm, just for simplicity.
  */
 #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
+
+#define MMF_HAS_MDWE		28
+#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
+
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-				 MMF_DISABLE_THP_MASK)
+				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..1312a137f7fb 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -281,6 +281,12 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK		0xffff
 # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/* Memory deny write / execute */
+#define PR_SET_MDWE			65
+# define PR_MDWE_REFUSE_EXEC_GAIN	1
+
+#define PR_GET_MDWE			66
+
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e886..b3cab94545ed 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2348,6 +2348,33 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 
+static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	if (arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
+		return -EINVAL;
+
+	if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
+		set_bit(MMF_HAS_MDWE, &current->mm->flags);
+	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return -EPERM; /* Cannot unset the flag */
+
+	return 0;
+}
+
+static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	if (arg2 || arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
+		PR_MDWE_REFUSE_EXEC_GAIN : 0;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2623,6 +2650,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
 		break;
 #endif
+	case PR_SET_MDWE:
+		error = prctl_set_mdwe(arg2, arg3, arg4, arg5);
+		break;
+	case PR_GET_MDWE:
+		error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
+		break;
 	case PR_SET_VMA:
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
diff --git a/mm/mmap.c b/mm/mmap.c
index 87d929316d57..99a4d9e2b0d8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2665,6 +2665,16 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vma_set_anonymous(vma);
 	}
 
+	if (map_deny_write_exec(vma, vma->vm_flags)) {
+		error = -EACCES;
+		if (file)
+			goto close_and_free_vma;
+		else if (vma->vm_file)
+			goto unmap_and_free_vma;
+		else
+			goto free_vma;
+	}
+
 	/* Allow architectures to sanity-check the vm_flags */
 	if (!arch_validate_flags(vma->vm_flags)) {
 		error = -EINVAL;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 908df12caa26..bc0587df042f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -762,6 +762,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			break;
 		}
 
+		if (map_deny_write_exec(vma, newflags)) {
+			error = -EACCES;
+			goto out;
+		}
+
 		/* Allow architectures to sanity-check the new flags */
 		if (!arch_validate_flags(newflags)) {
 			error = -EINVAL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-19 16:03   ` Joey Gouly
  0 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-01-19 16:03 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

The aim of such policy is to prevent a user task from creating an
executable mapping that is also writeable.

An example of mmap() returning -EACCESS if the policy is enabled:

	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);

Similarly, mprotect() would return -EACCESS below:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);

The BPF filter that systemd MDWE uses is stateless, and disallows
mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
be enabled if it was already PROT_EXEC, which allows the following case:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

where PROT_BTI enables branch tracking identification on arm64.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
 include/linux/sched/coredump.h |  6 +++++-
 include/uapi/linux/prctl.h     |  6 ++++++
 kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
 mm/mmap.c                      | 10 ++++++++++
 mm/mprotect.c                  |  5 +++++
 6 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 58b3abd457a3..cee1e4b566d8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
 }
 
 unsigned long vm_commit_limit(void);
+
+/*
+ * Denies creating a writable executable mapping or gaining executable permissions.
+ *
+ * This denies the following:
+ *
+ * 	a)	mmap(PROT_WRITE | PROT_EXEC)
+ *
+ *	b)	mmap(PROT_WRITE)
+ *		mprotect(PROT_EXEC)
+ *
+ *	c)	mmap(PROT_WRITE)
+ *		mprotect(PROT_READ)
+ *		mprotect(PROT_EXEC)
+ *
+ * But allows the following:
+ *
+ *	d)	mmap(PROT_READ | PROT_EXEC)
+ *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
+ */
+static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
+{
+	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return false;
+
+	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+		return true;
+
+	if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+		return true;
+
+	return false;
+}
+
 #endif /* _LINUX_MMAN_H */
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 8270ad7ae14c..0e17ae7fbfd3 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
  * lifecycle of this mm, just for simplicity.
  */
 #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
+
+#define MMF_HAS_MDWE		28
+#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
+
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-				 MMF_DISABLE_THP_MASK)
+				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..1312a137f7fb 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -281,6 +281,12 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK		0xffff
 # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/* Memory deny write / execute */
+#define PR_SET_MDWE			65
+# define PR_MDWE_REFUSE_EXEC_GAIN	1
+
+#define PR_GET_MDWE			66
+
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e886..b3cab94545ed 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2348,6 +2348,33 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 
+static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	if (arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
+		return -EINVAL;
+
+	if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
+		set_bit(MMF_HAS_MDWE, &current->mm->flags);
+	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return -EPERM; /* Cannot unset the flag */
+
+	return 0;
+}
+
+static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	if (arg2 || arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
+		PR_MDWE_REFUSE_EXEC_GAIN : 0;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2623,6 +2650,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
 		break;
 #endif
+	case PR_SET_MDWE:
+		error = prctl_set_mdwe(arg2, arg3, arg4, arg5);
+		break;
+	case PR_GET_MDWE:
+		error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
+		break;
 	case PR_SET_VMA:
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
diff --git a/mm/mmap.c b/mm/mmap.c
index 87d929316d57..99a4d9e2b0d8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2665,6 +2665,16 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vma_set_anonymous(vma);
 	}
 
+	if (map_deny_write_exec(vma, vma->vm_flags)) {
+		error = -EACCES;
+		if (file)
+			goto close_and_free_vma;
+		else if (vma->vm_file)
+			goto unmap_and_free_vma;
+		else
+			goto free_vma;
+	}
+
 	/* Allow architectures to sanity-check the vm_flags */
 	if (!arch_validate_flags(vma->vm_flags)) {
 		error = -EINVAL;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 908df12caa26..bc0587df042f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -762,6 +762,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			break;
 		}
 
+		if (map_deny_write_exec(vma, newflags)) {
+			error = -EACCES;
+			goto out;
+		}
+
 		/* Allow architectures to sanity-check the new flags */
 		if (!arch_validate_flags(newflags)) {
 			error = -EINVAL;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2023-01-19 16:03 ` Joey Gouly
@ 2023-01-19 16:03   ` Joey Gouly
  -1 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-01-19 16:03 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

From: Kees Cook <keescook@chromium.org>

Add some tests to cover the new PR_SET_MDWE prctl.

Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
---
 tools/testing/selftests/vm/Makefile    |   1 +
 tools/testing/selftests/vm/mdwe_test.c | 197 +++++++++++++++++++++++++
 2 files changed, 198 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 89c14e41bd43..4e1a7b1d4c52 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -56,6 +56,7 @@ TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
 TEST_GEN_PROGS += ksm_functional_tests
+TEST_GEN_PROGS += mdwe_test
 
 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c
new file mode 100644
index 000000000000..f466a099f1bf
--- /dev/null
+++ b/tools/testing/selftests/vm/mdwe_test.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifdef __aarch64__
+#include <asm/hwcap.h>
+#endif
+
+#include <linux/mman.h>
+#include <linux/prctl.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef __aarch64__
+# define PROT_BTI	0
+#endif
+
+TEST(prctl_flags)
+{
+	EXPECT_LT(prctl(PR_SET_MDWE, 7L, 0L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 7L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 7L, 0L), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 0L, 7L), 0);
+
+	EXPECT_LT(prctl(PR_GET_MDWE, 7L, 0L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 7L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 7L, 0L), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 0L, 7L), 0);
+}
+
+FIXTURE(mdwe)
+{
+	void *p;
+	int flags;
+	size_t size;
+	pid_t pid;
+};
+
+FIXTURE_VARIANT(mdwe)
+{
+	bool enabled;
+	bool forked;
+};
+
+FIXTURE_VARIANT_ADD(mdwe, stock)
+{
+        .enabled = false,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, enabled)
+{
+        .enabled = true,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, forked)
+{
+        .enabled = true,
+	.forked = true,
+};
+
+FIXTURE_SETUP(mdwe)
+{
+	int ret, status;
+
+	self->p = NULL;
+	self->flags = MAP_SHARED | MAP_ANONYMOUS;
+	self->size = getpagesize();
+
+	if (!variant->enabled)
+		return;
+
+	ret = prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("PR_SET_MDWE failed or unsupported");
+	}
+
+	ret = prctl(PR_GET_MDWE, 0L, 0L, 0L, 0L);
+	ASSERT_EQ(ret, 1);
+
+	if (variant->forked) {
+		self->pid = fork();
+		ASSERT_GE(self->pid, 0) {
+			TH_LOG("fork failed\n");
+		}
+
+		if (self->pid > 0) {
+			ret = waitpid(self->pid, &status, 0);
+			ASSERT_TRUE(WIFEXITED(status));
+			exit(WEXITSTATUS(status));
+		}
+	}
+}
+
+FIXTURE_TEARDOWN(mdwe)
+{
+	if (self->p && self->p != MAP_FAILED)
+		munmap(self->p, self->size);
+}
+
+TEST_F(mdwe, mmap_READ_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	EXPECT_NE(self->p, MAP_FAILED);
+}
+
+TEST_F(mdwe, mmap_WRITE_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(self->p, MAP_FAILED);
+	} else {
+		EXPECT_NE(self->p, MAP_FAILED);
+	}
+}
+
+TEST_F(mdwe, mprotect_stay_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_F(mdwe, mprotect_add_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mprotect_WRITE_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_WRITE, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mmap_FIXED)
+{
+	void *p, *p2;
+
+	p2 = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	p = mmap(self->p + self->size, self->size, PROT_READ | PROT_EXEC,
+		 self->flags | MAP_FIXED, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(p, MAP_FAILED);
+	} else {
+		EXPECT_EQ(p, self->p);
+	}
+}
+
+TEST_F(mdwe, arm64_BTI)
+{
+	int ret;
+
+#ifdef __aarch64__
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI))
+#endif
+		SKIP(return, "HWCAP2_BTI not supported");
+
+	self->p = mmap(NULL, self->size, PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_EXEC | PROT_BTI);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2023-01-19 16:03   ` Joey Gouly
  0 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-01-19 16:03 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, joey.gouly, shuah

From: Kees Cook <keescook@chromium.org>

Add some tests to cover the new PR_SET_MDWE prctl.

Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
---
 tools/testing/selftests/vm/Makefile    |   1 +
 tools/testing/selftests/vm/mdwe_test.c | 197 +++++++++++++++++++++++++
 2 files changed, 198 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 89c14e41bd43..4e1a7b1d4c52 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -56,6 +56,7 @@ TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
 TEST_GEN_PROGS += ksm_functional_tests
+TEST_GEN_PROGS += mdwe_test
 
 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c
new file mode 100644
index 000000000000..f466a099f1bf
--- /dev/null
+++ b/tools/testing/selftests/vm/mdwe_test.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifdef __aarch64__
+#include <asm/hwcap.h>
+#endif
+
+#include <linux/mman.h>
+#include <linux/prctl.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef __aarch64__
+# define PROT_BTI	0
+#endif
+
+TEST(prctl_flags)
+{
+	EXPECT_LT(prctl(PR_SET_MDWE, 7L, 0L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 7L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 7L, 0L), 0);
+	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 0L, 7L), 0);
+
+	EXPECT_LT(prctl(PR_GET_MDWE, 7L, 0L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 7L, 0L, 0L), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 7L, 0L), 0);
+	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 0L, 7L), 0);
+}
+
+FIXTURE(mdwe)
+{
+	void *p;
+	int flags;
+	size_t size;
+	pid_t pid;
+};
+
+FIXTURE_VARIANT(mdwe)
+{
+	bool enabled;
+	bool forked;
+};
+
+FIXTURE_VARIANT_ADD(mdwe, stock)
+{
+        .enabled = false,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, enabled)
+{
+        .enabled = true,
+	.forked = false,
+};
+
+FIXTURE_VARIANT_ADD(mdwe, forked)
+{
+        .enabled = true,
+	.forked = true,
+};
+
+FIXTURE_SETUP(mdwe)
+{
+	int ret, status;
+
+	self->p = NULL;
+	self->flags = MAP_SHARED | MAP_ANONYMOUS;
+	self->size = getpagesize();
+
+	if (!variant->enabled)
+		return;
+
+	ret = prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("PR_SET_MDWE failed or unsupported");
+	}
+
+	ret = prctl(PR_GET_MDWE, 0L, 0L, 0L, 0L);
+	ASSERT_EQ(ret, 1);
+
+	if (variant->forked) {
+		self->pid = fork();
+		ASSERT_GE(self->pid, 0) {
+			TH_LOG("fork failed\n");
+		}
+
+		if (self->pid > 0) {
+			ret = waitpid(self->pid, &status, 0);
+			ASSERT_TRUE(WIFEXITED(status));
+			exit(WEXITSTATUS(status));
+		}
+	}
+}
+
+FIXTURE_TEARDOWN(mdwe)
+{
+	if (self->p && self->p != MAP_FAILED)
+		munmap(self->p, self->size);
+}
+
+TEST_F(mdwe, mmap_READ_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	EXPECT_NE(self->p, MAP_FAILED);
+}
+
+TEST_F(mdwe, mmap_WRITE_EXEC)
+{
+	self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(self->p, MAP_FAILED);
+	} else {
+		EXPECT_NE(self->p, MAP_FAILED);
+	}
+}
+
+TEST_F(mdwe, mprotect_stay_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_F(mdwe, mprotect_add_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mprotect_WRITE_EXEC)
+{
+	int ret;
+
+	self->p = mmap(NULL, self->size, PROT_WRITE, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
+	if (variant->enabled) {
+		EXPECT_LT(ret, 0);
+	} else {
+		EXPECT_EQ(ret, 0);
+	}
+}
+
+TEST_F(mdwe, mmap_FIXED)
+{
+	void *p, *p2;
+
+	p2 = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
+	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	p = mmap(self->p + self->size, self->size, PROT_READ | PROT_EXEC,
+		 self->flags | MAP_FIXED, 0, 0);
+	if (variant->enabled) {
+		EXPECT_EQ(p, MAP_FAILED);
+	} else {
+		EXPECT_EQ(p, self->p);
+	}
+}
+
+TEST_F(mdwe, arm64_BTI)
+{
+	int ret;
+
+#ifdef __aarch64__
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI))
+#endif
+		SKIP(return, "HWCAP2_BTI not supported");
+
+	self->p = mmap(NULL, self->size, PROT_EXEC, self->flags, 0, 0);
+	ASSERT_NE(self->p, MAP_FAILED);
+
+	ret = mprotect(self->p, self->size, PROT_EXEC | PROT_BTI);
+	EXPECT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-19 16:03   ` Joey Gouly
@ 2023-01-23 11:45     ` David Hildenbrand
  -1 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2023-01-23 11:45 UTC (permalink / raw)
  To: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On 19.01.23 17:03, Joey Gouly wrote:
> The aim of such policy is to prevent a user task from creating an
> executable mapping that is also writeable.
> 
> An example of mmap() returning -EACCESS if the policy is enabled:
> 
> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> 
> Similarly, mprotect() would return -EACCESS below:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> The BPF filter that systemd MDWE uses is stateless, and disallows
> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> be enabled if it was already PROT_EXEC, which allows the following case:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> where PROT_BTI enables branch tracking identification on arm64.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>   include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>   include/linux/sched/coredump.h |  6 +++++-
>   include/uapi/linux/prctl.h     |  6 ++++++
>   kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>   mm/mmap.c                      | 10 ++++++++++
>   mm/mprotect.c                  |  5 +++++
>   6 files changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..cee1e4b566d8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>   }
>   
>   unsigned long vm_commit_limit(void);
> +
> +/*
> + * Denies creating a writable executable mapping or gaining executable permissions.
> + *
> + * This denies the following:
> + *
> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> + *
> + *	b)	mmap(PROT_WRITE)
> + *		mprotect(PROT_EXEC)
> + *
> + *	c)	mmap(PROT_WRITE)
> + *		mprotect(PROT_READ)
> + *		mprotect(PROT_EXEC)
> + *
> + * But allows the following:
> + *
> + *	d)	mmap(PROT_READ | PROT_EXEC)
> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> + */

Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set 
VM_EXEC anymore? In an ideal world, there would be no further mprotect 
changes required.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 11:45     ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2023-01-23 11:45 UTC (permalink / raw)
  To: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek
  Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
	Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
	linux-kernel, linux-abi-devel, nd, shuah

On 19.01.23 17:03, Joey Gouly wrote:
> The aim of such policy is to prevent a user task from creating an
> executable mapping that is also writeable.
> 
> An example of mmap() returning -EACCESS if the policy is enabled:
> 
> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> 
> Similarly, mprotect() would return -EACCESS below:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> The BPF filter that systemd MDWE uses is stateless, and disallows
> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> be enabled if it was already PROT_EXEC, which allows the following case:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> where PROT_BTI enables branch tracking identification on arm64.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>   include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>   include/linux/sched/coredump.h |  6 +++++-
>   include/uapi/linux/prctl.h     |  6 ++++++
>   kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>   mm/mmap.c                      | 10 ++++++++++
>   mm/mprotect.c                  |  5 +++++
>   6 files changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..cee1e4b566d8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>   }
>   
>   unsigned long vm_commit_limit(void);
> +
> +/*
> + * Denies creating a writable executable mapping or gaining executable permissions.
> + *
> + * This denies the following:
> + *
> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> + *
> + *	b)	mmap(PROT_WRITE)
> + *		mprotect(PROT_EXEC)
> + *
> + *	c)	mmap(PROT_WRITE)
> + *		mprotect(PROT_READ)
> + *		mprotect(PROT_EXEC)
> + *
> + * But allows the following:
> + *
> + *	d)	mmap(PROT_READ | PROT_EXEC)
> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> + */

Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set 
VM_EXEC anymore? In an ideal world, there would be no further mprotect 
changes required.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-23 11:45     ` David Hildenbrand
@ 2023-01-23 12:19       ` Catalin Marinas
  -1 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-01-23 12:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
> On 19.01.23 17:03, Joey Gouly wrote:
> > The aim of such policy is to prevent a user task from creating an
> > executable mapping that is also writeable.
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > The BPF filter that systemd MDWE uses is stateless, and disallows
> > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> > be enabled if it was already PROT_EXEC, which allows the following case:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >   include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
> >   include/linux/sched/coredump.h |  6 +++++-
> >   include/uapi/linux/prctl.h     |  6 ++++++
> >   kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
> >   mm/mmap.c                      | 10 ++++++++++
> >   mm/mprotect.c                  |  5 +++++
> >   6 files changed, 93 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 58b3abd457a3..cee1e4b566d8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
> >   }
> >   unsigned long vm_commit_limit(void);
> > +
> > +/*
> > + * Denies creating a writable executable mapping or gaining executable permissions.
> > + *
> > + * This denies the following:
> > + *
> > + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> > + *
> > + *	b)	mmap(PROT_WRITE)
> > + *		mprotect(PROT_EXEC)
> > + *
> > + *	c)	mmap(PROT_WRITE)
> > + *		mprotect(PROT_READ)
> > + *		mprotect(PROT_EXEC)
> > + *
> > + * But allows the following:
> > + *
> > + *	d)	mmap(PROT_READ | PROT_EXEC)
> > + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> > + */
> 
> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
> anymore? In an ideal world, there would be no further mprotect changes
> required.

I don't think it works for this scenario. We don't want to disable
PROT_EXEC entirely, only disallow it if the mapping is not already
executable. The below should be allowed:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
time.

We could clear VM_MAYEXEC if the mapping was made VM_WRITE (either by
mmap() or mprotect()) but IIRC we concluded that this should be an
additional prctl() flag. This series aims to be pretty much a drop-in
replacement for the systemd's MDWE SECCOMP feature (but allowing
PROT_BTI).

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 12:19       ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-01-23 12:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
> On 19.01.23 17:03, Joey Gouly wrote:
> > The aim of such policy is to prevent a user task from creating an
> > executable mapping that is also writeable.
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > The BPF filter that systemd MDWE uses is stateless, and disallows
> > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> > be enabled if it was already PROT_EXEC, which allows the following case:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >   include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
> >   include/linux/sched/coredump.h |  6 +++++-
> >   include/uapi/linux/prctl.h     |  6 ++++++
> >   kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
> >   mm/mmap.c                      | 10 ++++++++++
> >   mm/mprotect.c                  |  5 +++++
> >   6 files changed, 93 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 58b3abd457a3..cee1e4b566d8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
> >   }
> >   unsigned long vm_commit_limit(void);
> > +
> > +/*
> > + * Denies creating a writable executable mapping or gaining executable permissions.
> > + *
> > + * This denies the following:
> > + *
> > + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> > + *
> > + *	b)	mmap(PROT_WRITE)
> > + *		mprotect(PROT_EXEC)
> > + *
> > + *	c)	mmap(PROT_WRITE)
> > + *		mprotect(PROT_READ)
> > + *		mprotect(PROT_EXEC)
> > + *
> > + * But allows the following:
> > + *
> > + *	d)	mmap(PROT_READ | PROT_EXEC)
> > + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> > + */
> 
> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
> anymore? In an ideal world, there would be no further mprotect changes
> required.

I don't think it works for this scenario. We don't want to disable
PROT_EXEC entirely, only disallow it if the mapping is not already
executable. The below should be allowed:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
time.

We could clear VM_MAYEXEC if the mapping was made VM_WRITE (either by
mmap() or mprotect()) but IIRC we concluded that this should be an
additional prctl() flag. This series aims to be pretty much a drop-in
replacement for the systemd's MDWE SECCOMP feature (but allowing
PROT_BTI).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-23 12:19       ` Catalin Marinas
@ 2023-01-23 12:53         ` David Hildenbrand
  -1 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2023-01-23 12:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On 23.01.23 13:19, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>> On 19.01.23 17:03, Joey Gouly wrote:
>>> The aim of such policy is to prevent a user task from creating an
>>> executable mapping that is also writeable.
>>>
>>> An example of mmap() returning -EACCESS if the policy is enabled:
>>>
>>> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
>>>
>>> Similarly, mprotect() would return -EACCESS below:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
>>>
>>> The BPF filter that systemd MDWE uses is stateless, and disallows
>>> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
>>> be enabled if it was already PROT_EXEC, which allows the following case:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> where PROT_BTI enables branch tracking identification on arm64.
>>>
>>> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
>>> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> ---
>>>    include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>>>    include/linux/sched/coredump.h |  6 +++++-
>>>    include/uapi/linux/prctl.h     |  6 ++++++
>>>    kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>>>    mm/mmap.c                      | 10 ++++++++++
>>>    mm/mprotect.c                  |  5 +++++
>>>    6 files changed, 93 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>> index 58b3abd457a3..cee1e4b566d8 100644
>>> --- a/include/linux/mman.h
>>> +++ b/include/linux/mman.h
>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>    }
>>>    unsigned long vm_commit_limit(void);
>>> +
>>> +/*
>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>> + *
>>> + * This denies the following:
>>> + *
>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>> + *
>>> + *	b)	mmap(PROT_WRITE)
>>> + *		mprotect(PROT_EXEC)
>>> + *
>>> + *	c)	mmap(PROT_WRITE)
>>> + *		mprotect(PROT_READ)
>>> + *		mprotect(PROT_EXEC)
>>> + *
>>> + * But allows the following:
>>> + *
>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>> + */
>>
>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>> anymore? In an ideal world, there would be no further mprotect changes
>> required.
> 
> I don't think it works for this scenario. We don't want to disable
> PROT_EXEC entirely, only disallow it if the mapping is not already
> executable. The below should be allowed:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
> time.

Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and 
disallow VM_EXEC of course). But I guess we'd have to go one step 
further: if we allow exec access at mmap time, clear VM_MAYWRITE (and 
disallow VM_WRITE of course).

That at least would be then similar to how we handle mmaped files: if 
the file is not executable, we clear VM_MAYEXEC. If the file is not 
writable, we clear VM_MAYWRITE.


Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to 
such memory would be forbidden, which might also be what we are trying 
to achieve, or is that expected to still work? But clearing VM_MAYWRITE 
would mean that is_cow_mapping() would no longer fire for some VMAs, and 
we'd have to check if that's fine in all cases.


Having that said, this patch handles the case when the prctl is applied 
to a process after already having created some writable or executable 
mappings, to at least forbid if afterwards on these mappings. What is 
expected to happen if the process already has writable mappings that are 
executable at the time we enable the prctl?

Clarifying what the expected semantics with /proc/self/mem are would be 
nice.

> 
> We could clear VM_MAYEXEC if the mapping was made VM_WRITE (either by
> mmap() or mprotect()) but IIRC we concluded that this should be an
> additional prctl() flag.

Yes, I agree with enabling this restriction on a per-process level. My 
remarks only apply to processes with this toggle enabled.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 12:53         ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2023-01-23 12:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On 23.01.23 13:19, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>> On 19.01.23 17:03, Joey Gouly wrote:
>>> The aim of such policy is to prevent a user task from creating an
>>> executable mapping that is also writeable.
>>>
>>> An example of mmap() returning -EACCESS if the policy is enabled:
>>>
>>> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
>>>
>>> Similarly, mprotect() would return -EACCESS below:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
>>>
>>> The BPF filter that systemd MDWE uses is stateless, and disallows
>>> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
>>> be enabled if it was already PROT_EXEC, which allows the following case:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> where PROT_BTI enables branch tracking identification on arm64.
>>>
>>> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
>>> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> ---
>>>    include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>>>    include/linux/sched/coredump.h |  6 +++++-
>>>    include/uapi/linux/prctl.h     |  6 ++++++
>>>    kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>>>    mm/mmap.c                      | 10 ++++++++++
>>>    mm/mprotect.c                  |  5 +++++
>>>    6 files changed, 93 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>> index 58b3abd457a3..cee1e4b566d8 100644
>>> --- a/include/linux/mman.h
>>> +++ b/include/linux/mman.h
>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>    }
>>>    unsigned long vm_commit_limit(void);
>>> +
>>> +/*
>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>> + *
>>> + * This denies the following:
>>> + *
>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>> + *
>>> + *	b)	mmap(PROT_WRITE)
>>> + *		mprotect(PROT_EXEC)
>>> + *
>>> + *	c)	mmap(PROT_WRITE)
>>> + *		mprotect(PROT_READ)
>>> + *		mprotect(PROT_EXEC)
>>> + *
>>> + * But allows the following:
>>> + *
>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>> + */
>>
>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>> anymore? In an ideal world, there would be no further mprotect changes
>> required.
> 
> I don't think it works for this scenario. We don't want to disable
> PROT_EXEC entirely, only disallow it if the mapping is not already
> executable. The below should be allowed:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
> time.

Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and 
disallow VM_EXEC of course). But I guess we'd have to go one step 
further: if we allow exec access at mmap time, clear VM_MAYWRITE (and 
disallow VM_WRITE of course).

That at least would be then similar to how we handle mmaped files: if 
the file is not executable, we clear VM_MAYEXEC. If the file is not 
writable, we clear VM_MAYWRITE.


Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to 
such memory would be forbidden, which might also be what we are trying 
to achieve, or is that expected to still work? But clearing VM_MAYWRITE 
would mean that is_cow_mapping() would no longer fire for some VMAs, and 
we'd have to check if that's fine in all cases.


Having that said, this patch handles the case when the prctl is applied 
to a process after already having created some writable or executable 
mappings, to at least forbid if afterwards on these mappings. What is 
expected to happen if the process already has writable mappings that are 
executable at the time we enable the prctl?

Clarifying what the expected semantics with /proc/self/mem are would be 
nice.

> 
> We could clear VM_MAYEXEC if the mapping was made VM_WRITE (either by
> mmap() or mprotect()) but IIRC we concluded that this should be an
> additional prctl() flag.

Yes, I agree with enabling this restriction on a per-process level. My 
remarks only apply to processes with this toggle enabled.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-23 12:53         ` David Hildenbrand
@ 2023-01-23 16:04           ` Catalin Marinas
  -1 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-01-23 16:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
> On 23.01.23 13:19, Catalin Marinas wrote:
> > On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
> > > On 19.01.23 17:03, Joey Gouly wrote:
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 58b3abd457a3..cee1e4b566d8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
> > > >    }
> > > >    unsigned long vm_commit_limit(void);
> > > > +
> > > > +/*
> > > > + * Denies creating a writable executable mapping or gaining executable permissions.
> > > > + *
> > > > + * This denies the following:
> > > > + *
> > > > + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> > > > + *
> > > > + *	b)	mmap(PROT_WRITE)
> > > > + *		mprotect(PROT_EXEC)
> > > > + *
> > > > + *	c)	mmap(PROT_WRITE)
> > > > + *		mprotect(PROT_READ)
> > > > + *		mprotect(PROT_EXEC)
> > > > + *
> > > > + * But allows the following:
> > > > + *
> > > > + *	d)	mmap(PROT_READ | PROT_EXEC)
> > > > + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> > > > + */
> > > 
> > > Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
> > > anymore? In an ideal world, there would be no further mprotect changes
> > > required.
> > 
> > I don't think it works for this scenario. We don't want to disable
> > PROT_EXEC entirely, only disallow it if the mapping is not already
> > executable. The below should be allowed:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
> > time.
> 
> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
> VM_EXEC of course).

This should work but it doesn't fully mimic systemd's MDWE behaviour
(e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).
Topi wanted to stay close to that at least in the first incarnation of
this control (can be extended later).

> But I guess we'd have to go one step further: if we allow exec access
> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).

Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
but as additional controls a process can enable.

> That at least would be then similar to how we handle mmaped files: if the
> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
> clear VM_MAYWRITE.

We still allow VM_MAYWRITE for private mappings, though we do clear
VM_MAYEXEC if not executable.

It would be nice to use VM_MAY* flags for this logic but we can only
emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
with some weird combinations like having VM_EXEC without VM_MAYEXEC
(maybe that's fine).

> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
> memory would be forbidden, which might also be what we are trying to
> achieve, or is that expected to still work?

I think currently with systemd's MDWE it still works (I haven't tried
though), unless there's something else forcing that file read-only.

> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
> longer fire for some VMAs, and we'd have to check if that's fine in
> all cases.

This will break __access_remote_vm() AFAICT since it can't do a CoW on
read-only private mapping.

> Having that said, this patch handles the case when the prctl is applied to a
> process after already having created some writable or executable mappings,
> to at least forbid if afterwards on these mappings. What is expected to
> happen if the process already has writable mappings that are executable at
> the time we enable the prctl?

They are expected to continue to work. The prctl() is meant to be
invoked by something like systemd so that any subsequent exec() will
inherit the property.

> Clarifying what the expected semantics with /proc/self/mem are would be
> nice.

Yeah, this series doesn't handle this. Topi, do you know if systemd does
anything about /proc/self/mem? To me this option is more about catching
inadvertent write|exec mappings rather than blocking programs that
insist on doing this (they can always map a memfd file twice with
separate write and exec attributes for example).

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 16:04           ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-01-23 16:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
> On 23.01.23 13:19, Catalin Marinas wrote:
> > On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
> > > On 19.01.23 17:03, Joey Gouly wrote:
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 58b3abd457a3..cee1e4b566d8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
> > > >    }
> > > >    unsigned long vm_commit_limit(void);
> > > > +
> > > > +/*
> > > > + * Denies creating a writable executable mapping or gaining executable permissions.
> > > > + *
> > > > + * This denies the following:
> > > > + *
> > > > + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> > > > + *
> > > > + *	b)	mmap(PROT_WRITE)
> > > > + *		mprotect(PROT_EXEC)
> > > > + *
> > > > + *	c)	mmap(PROT_WRITE)
> > > > + *		mprotect(PROT_READ)
> > > > + *		mprotect(PROT_EXEC)
> > > > + *
> > > > + * But allows the following:
> > > > + *
> > > > + *	d)	mmap(PROT_READ | PROT_EXEC)
> > > > + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> > > > + */
> > > 
> > > Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
> > > anymore? In an ideal world, there would be no further mprotect changes
> > > required.
> > 
> > I don't think it works for this scenario. We don't want to disable
> > PROT_EXEC entirely, only disallow it if the mapping is not already
> > executable. The below should be allowed:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
> > time.
> 
> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
> VM_EXEC of course).

This should work but it doesn't fully mimic systemd's MDWE behaviour
(e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).
Topi wanted to stay close to that at least in the first incarnation of
this control (can be extended later).

> But I guess we'd have to go one step further: if we allow exec access
> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).

Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
but as additional controls a process can enable.

> That at least would be then similar to how we handle mmaped files: if the
> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
> clear VM_MAYWRITE.

We still allow VM_MAYWRITE for private mappings, though we do clear
VM_MAYEXEC if not executable.

It would be nice to use VM_MAY* flags for this logic but we can only
emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
with some weird combinations like having VM_EXEC without VM_MAYEXEC
(maybe that's fine).

> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
> memory would be forbidden, which might also be what we are trying to
> achieve, or is that expected to still work?

I think currently with systemd's MDWE it still works (I haven't tried
though), unless there's something else forcing that file read-only.

> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
> longer fire for some VMAs, and we'd have to check if that's fine in
> all cases.

This will break __access_remote_vm() AFAICT since it can't do a CoW on
read-only private mapping.

> Having that said, this patch handles the case when the prctl is applied to a
> process after already having created some writable or executable mappings,
> to at least forbid if afterwards on these mappings. What is expected to
> happen if the process already has writable mappings that are executable at
> the time we enable the prctl?

They are expected to continue to work. The prctl() is meant to be
invoked by something like systemd so that any subsequent exec() will
inherit the property.

> Clarifying what the expected semantics with /proc/self/mem are would be
> nice.

Yeah, this series doesn't handle this. Topi, do you know if systemd does
anything about /proc/self/mem? To me this option is more about catching
inadvertent write|exec mappings rather than blocking programs that
insist on doing this (they can always map a memfd file twice with
separate write and exec attributes for example).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-23 16:04           ` Catalin Marinas
@ 2023-01-23 16:10             ` David Hildenbrand
  -1 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2023-01-23 16:10 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On 23.01.23 17:04, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
>> On 23.01.23 13:19, Catalin Marinas wrote:
>>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>>>> On 19.01.23 17:03, Joey Gouly wrote:
>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>> index 58b3abd457a3..cee1e4b566d8 100644
>>>>> --- a/include/linux/mman.h
>>>>> +++ b/include/linux/mman.h
>>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>     }
>>>>>     unsigned long vm_commit_limit(void);
>>>>> +
>>>>> +/*
>>>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>>>> + *
>>>>> + * This denies the following:
>>>>> + *
>>>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>>>> + *
>>>>> + *	b)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + *	c)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_READ)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + * But allows the following:
>>>>> + *
>>>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>>>> + */
>>>>
>>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>>>> anymore? In an ideal world, there would be no further mprotect changes
>>>> required.
>>>
>>> I don't think it works for this scenario. We don't want to disable
>>> PROT_EXEC entirely, only disallow it if the mapping is not already
>>> executable. The below should be allowed:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
>>> time.
>>
>> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
>> VM_EXEC of course).
> 
> This should work but it doesn't fully mimic systemd's MDWE behaviour
> (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).

Interesting.

> Topi wanted to stay close to that at least in the first incarnation of
> this control (can be extended later).
> 
>> But I guess we'd have to go one step further: if we allow exec access
>> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).
> 
> Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
> but as additional controls a process can enable.
> 
>> That at least would be then similar to how we handle mmaped files: if the
>> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
>> clear VM_MAYWRITE.
> 
> We still allow VM_MAYWRITE for private mappings, though we do clear
> VM_MAYEXEC if not executable.
> 
> It would be nice to use VM_MAY* flags for this logic but we can only
> emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> with some weird combinations like having VM_EXEC without VM_MAYEXEC
> (maybe that's fine).

No, we wouldn't want VM_EXEC if VM_MAYEXEC is not set. I don't 
immediately see how that would happen.

> 
>> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
>> memory would be forbidden, which might also be what we are trying to
>> achieve, or is that expected to still work?
> 
> I think currently with systemd's MDWE it still works (I haven't tried
> though), unless there's something else forcing that file read-only.

Okay, just curious if this is an easy way to bypass the MDWE restriction.

> 
>> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
>> longer fire for some VMAs, and we'd have to check if that's fine in
>> all cases.
> 
> This will break __access_remote_vm() AFAICT since it can't do a CoW on
> read-only private mapping.

Yeah, might require some thought.

> 
>> Having that said, this patch handles the case when the prctl is applied to a
>> process after already having created some writable or executable mappings,
>> to at least forbid if afterwards on these mappings. What is expected to
>> happen if the process already has writable mappings that are executable at
>> the time we enable the prctl?
> 
> They are expected to continue to work. The prctl() is meant to be
> invoked by something like systemd so that any subsequent exec() will
> inherit the property.

Okay, thanks. So it's mostly about new processes inheriting that 
restriction.

> 
>> Clarifying what the expected semantics with /proc/self/mem are would be
>> nice.
> 
> Yeah, this series doesn't handle this. Topi, do you know if systemd does
> anything about /proc/self/mem? To me this option is more about catching
> inadvertent write|exec mappings rather than blocking programs that
> insist on doing this (they can always map a memfd file twice with
> separate write and exec attributes for example).

I remember some work regarding forbidding ececutable memfds.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 16:10             ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2023-01-23 16:10 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On 23.01.23 17:04, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
>> On 23.01.23 13:19, Catalin Marinas wrote:
>>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>>>> On 19.01.23 17:03, Joey Gouly wrote:
>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>> index 58b3abd457a3..cee1e4b566d8 100644
>>>>> --- a/include/linux/mman.h
>>>>> +++ b/include/linux/mman.h
>>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>     }
>>>>>     unsigned long vm_commit_limit(void);
>>>>> +
>>>>> +/*
>>>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>>>> + *
>>>>> + * This denies the following:
>>>>> + *
>>>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>>>> + *
>>>>> + *	b)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + *	c)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_READ)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + * But allows the following:
>>>>> + *
>>>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>>>> + */
>>>>
>>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>>>> anymore? In an ideal world, there would be no further mprotect changes
>>>> required.
>>>
>>> I don't think it works for this scenario. We don't want to disable
>>> PROT_EXEC entirely, only disallow it if the mapping is not already
>>> executable. The below should be allowed:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
>>> time.
>>
>> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
>> VM_EXEC of course).
> 
> This should work but it doesn't fully mimic systemd's MDWE behaviour
> (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).

Interesting.

> Topi wanted to stay close to that at least in the first incarnation of
> this control (can be extended later).
> 
>> But I guess we'd have to go one step further: if we allow exec access
>> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).
> 
> Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
> but as additional controls a process can enable.
> 
>> That at least would be then similar to how we handle mmaped files: if the
>> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
>> clear VM_MAYWRITE.
> 
> We still allow VM_MAYWRITE for private mappings, though we do clear
> VM_MAYEXEC if not executable.
> 
> It would be nice to use VM_MAY* flags for this logic but we can only
> emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> with some weird combinations like having VM_EXEC without VM_MAYEXEC
> (maybe that's fine).

No, we wouldn't want VM_EXEC if VM_MAYEXEC is not set. I don't 
immediately see how that would happen.

> 
>> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
>> memory would be forbidden, which might also be what we are trying to
>> achieve, or is that expected to still work?
> 
> I think currently with systemd's MDWE it still works (I haven't tried
> though), unless there's something else forcing that file read-only.

Okay, just curious if this is an easy way to bypass the MDWE restriction.

> 
>> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
>> longer fire for some VMAs, and we'd have to check if that's fine in
>> all cases.
> 
> This will break __access_remote_vm() AFAICT since it can't do a CoW on
> read-only private mapping.

Yeah, might require some thought.

> 
>> Having that said, this patch handles the case when the prctl is applied to a
>> process after already having created some writable or executable mappings,
>> to at least forbid if afterwards on these mappings. What is expected to
>> happen if the process already has writable mappings that are executable at
>> the time we enable the prctl?
> 
> They are expected to continue to work. The prctl() is meant to be
> invoked by something like systemd so that any subsequent exec() will
> inherit the property.

Okay, thanks. So it's mostly about new processes inheriting that 
restriction.

> 
>> Clarifying what the expected semantics with /proc/self/mem are would be
>> nice.
> 
> Yeah, this series doesn't handle this. Topi, do you know if systemd does
> anything about /proc/self/mem? To me this option is more about catching
> inadvertent write|exec mappings rather than blocking programs that
> insist on doing this (they can always map a memfd file twice with
> separate write and exec attributes for example).

I remember some work regarding forbidding ececutable memfds.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-23 16:10             ` David Hildenbrand
@ 2023-01-23 16:22               ` Catalin Marinas
  -1 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-01-23 16:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Mon, Jan 23, 2023 at 05:10:08PM +0100, David Hildenbrand wrote:
> On 23.01.23 17:04, Catalin Marinas wrote:
> > On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
> > > That at least would be then similar to how we handle mmaped files: if the
> > > file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
> > > clear VM_MAYWRITE.
> > 
> > We still allow VM_MAYWRITE for private mappings, though we do clear
> > VM_MAYEXEC if not executable.
> > 
> > It would be nice to use VM_MAY* flags for this logic but we can only
> > emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> > flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> > already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> > with some weird combinations like having VM_EXEC without VM_MAYEXEC
> > (maybe that's fine).
> 
> No, we wouldn't want VM_EXEC if VM_MAYEXEC is not set. I don't immediately
> see how that would happen.

You are right, this shouldn't happen. What I had in mind was the current
MDWE model where after an mmap(PROT_EXEC), any mprotect(PROT_EXEC) is
denied. But this series departs slightly from this since we want to
allow PROT_EXEC if already executable.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 16:22               ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-01-23 16:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Mon, Jan 23, 2023 at 05:10:08PM +0100, David Hildenbrand wrote:
> On 23.01.23 17:04, Catalin Marinas wrote:
> > On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
> > > That at least would be then similar to how we handle mmaped files: if the
> > > file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
> > > clear VM_MAYWRITE.
> > 
> > We still allow VM_MAYWRITE for private mappings, though we do clear
> > VM_MAYEXEC if not executable.
> > 
> > It would be nice to use VM_MAY* flags for this logic but we can only
> > emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> > flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> > already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> > with some weird combinations like having VM_EXEC without VM_MAYEXEC
> > (maybe that's fine).
> 
> No, we wouldn't want VM_EXEC if VM_MAYEXEC is not set. I don't immediately
> see how that would happen.

You are right, this shouldn't happen. What I had in mind was the current
MDWE model where after an mmap(PROT_EXEC), any mprotect(PROT_EXEC) is
denied. But this series departs slightly from this since we want to
allow PROT_EXEC if already executable.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-23 16:04           ` Catalin Marinas
@ 2023-01-23 17:48             ` Topi Miettinen
  -1 siblings, 0 replies; 28+ messages in thread
From: Topi Miettinen @ 2023-01-23 17:48 UTC (permalink / raw)
  To: Catalin Marinas, David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On 23.1.2023 18.04, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
>> On 23.01.23 13:19, Catalin Marinas wrote:
>>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>>>> On 19.01.23 17:03, Joey Gouly wrote:
>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>> index 58b3abd457a3..cee1e4b566d8 100644
>>>>> --- a/include/linux/mman.h
>>>>> +++ b/include/linux/mman.h
>>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>     }
>>>>>     unsigned long vm_commit_limit(void);
>>>>> +
>>>>> +/*
>>>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>>>> + *
>>>>> + * This denies the following:
>>>>> + *
>>>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>>>> + *
>>>>> + *	b)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + *	c)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_READ)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + * But allows the following:
>>>>> + *
>>>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>>>> + */
>>>>
>>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>>>> anymore? In an ideal world, there would be no further mprotect changes
>>>> required.
>>>
>>> I don't think it works for this scenario. We don't want to disable
>>> PROT_EXEC entirely, only disallow it if the mapping is not already
>>> executable. The below should be allowed:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
>>> time.
>>
>> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
>> VM_EXEC of course).
> 
> This should work but it doesn't fully mimic systemd's MDWE behaviour
> (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).
> Topi wanted to stay close to that at least in the first incarnation of
> this control (can be extended later).
> 
>> But I guess we'd have to go one step further: if we allow exec access
>> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).
> 
> Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
> but as additional controls a process can enable.
> 
>> That at least would be then similar to how we handle mmaped files: if the
>> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
>> clear VM_MAYWRITE.
> 
> We still allow VM_MAYWRITE for private mappings, though we do clear
> VM_MAYEXEC if not executable.
> 
> It would be nice to use VM_MAY* flags for this logic but we can only
> emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> with some weird combinations like having VM_EXEC without VM_MAYEXEC
> (maybe that's fine).
> 
>> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
>> memory would be forbidden, which might also be what we are trying to
>> achieve, or is that expected to still work?
> 
> I think currently with systemd's MDWE it still works (I haven't tried
> though), unless there's something else forcing that file read-only.
> 
>> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
>> longer fire for some VMAs, and we'd have to check if that's fine in
>> all cases.
> 
> This will break __access_remote_vm() AFAICT since it can't do a CoW on
> read-only private mapping.
> 
>> Having that said, this patch handles the case when the prctl is applied to a
>> process after already having created some writable or executable mappings,
>> to at least forbid if afterwards on these mappings. What is expected to
>> happen if the process already has writable mappings that are executable at
>> the time we enable the prctl?
> 
> They are expected to continue to work. The prctl() is meant to be
> invoked by something like systemd so that any subsequent exec() will
> inherit the property.
> 
>> Clarifying what the expected semantics with /proc/self/mem are would be
>> nice.
> 
> Yeah, this series doesn't handle this. Topi, do you know if systemd does
> anything about /proc/self/mem? To me this option is more about catching
> inadvertent write|exec mappings rather than blocking programs that
> insist on doing this (they can always map a memfd file twice with
> separate write and exec attributes for example).
> 

I don't think so. For 100% compatibility with seccomp, the same cases of 
mprotect() use should be blocked regardless of the file descriptor used. 
There could be more relaxed PR_MDWE_* controls in the future if needed.

Updated systemd PR: https://github.com/systemd/systemd/pull/25276

I wish there were highly granular access controls for /proc, including 
/proc/self and /proc/sys/*. Now the best options are to use mount 
namespaces and/or SELinux, but they aren't too good for that.

-Topi


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-01-23 17:48             ` Topi Miettinen
  0 siblings, 0 replies; 28+ messages in thread
From: Topi Miettinen @ 2023-01-23 17:48 UTC (permalink / raw)
  To: Catalin Marinas, David Hildenbrand
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, linux-mm,
	linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah

On 23.1.2023 18.04, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
>> On 23.01.23 13:19, Catalin Marinas wrote:
>>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>>>> On 19.01.23 17:03, Joey Gouly wrote:
>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>> index 58b3abd457a3..cee1e4b566d8 100644
>>>>> --- a/include/linux/mman.h
>>>>> +++ b/include/linux/mman.h
>>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>>>     }
>>>>>     unsigned long vm_commit_limit(void);
>>>>> +
>>>>> +/*
>>>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>>>> + *
>>>>> + * This denies the following:
>>>>> + *
>>>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>>>> + *
>>>>> + *	b)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + *	c)	mmap(PROT_WRITE)
>>>>> + *		mprotect(PROT_READ)
>>>>> + *		mprotect(PROT_EXEC)
>>>>> + *
>>>>> + * But allows the following:
>>>>> + *
>>>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>>>> + */
>>>>
>>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>>>> anymore? In an ideal world, there would be no further mprotect changes
>>>> required.
>>>
>>> I don't think it works for this scenario. We don't want to disable
>>> PROT_EXEC entirely, only disallow it if the mapping is not already
>>> executable. The below should be allowed:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
>>> time.
>>
>> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
>> VM_EXEC of course).
> 
> This should work but it doesn't fully mimic systemd's MDWE behaviour
> (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).
> Topi wanted to stay close to that at least in the first incarnation of
> this control (can be extended later).
> 
>> But I guess we'd have to go one step further: if we allow exec access
>> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).
> 
> Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
> but as additional controls a process can enable.
> 
>> That at least would be then similar to how we handle mmaped files: if the
>> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
>> clear VM_MAYWRITE.
> 
> We still allow VM_MAYWRITE for private mappings, though we do clear
> VM_MAYEXEC if not executable.
> 
> It would be nice to use VM_MAY* flags for this logic but we can only
> emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> with some weird combinations like having VM_EXEC without VM_MAYEXEC
> (maybe that's fine).
> 
>> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
>> memory would be forbidden, which might also be what we are trying to
>> achieve, or is that expected to still work?
> 
> I think currently with systemd's MDWE it still works (I haven't tried
> though), unless there's something else forcing that file read-only.
> 
>> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
>> longer fire for some VMAs, and we'd have to check if that's fine in
>> all cases.
> 
> This will break __access_remote_vm() AFAICT since it can't do a CoW on
> read-only private mapping.
> 
>> Having that said, this patch handles the case when the prctl is applied to a
>> process after already having created some writable or executable mappings,
>> to at least forbid if afterwards on these mappings. What is expected to
>> happen if the process already has writable mappings that are executable at
>> the time we enable the prctl?
> 
> They are expected to continue to work. The prctl() is meant to be
> invoked by something like systemd so that any subsequent exec() will
> inherit the property.
> 
>> Clarifying what the expected semantics with /proc/self/mem are would be
>> nice.
> 
> Yeah, this series doesn't handle this. Topi, do you know if systemd does
> anything about /proc/self/mem? To me this option is more about catching
> inadvertent write|exec mappings rather than blocking programs that
> insist on doing this (they can always map a memfd file twice with
> separate write and exec attributes for example).
> 

I don't think so. For 100% compatibility with seccomp, the same cases of 
mprotect() use should be blocked regardless of the file descriptor used. 
There could be more relaxed PR_MDWE_* controls in the future if needed.

Updated systemd PR: https://github.com/systemd/systemd/pull/25276

I wish there were highly granular access controls for /proc, including 
/proc/self and /proc/sys/*. Now the best options are to use mount 
namespaces and/or SELinux, but they aren't too good for that.

-Topi


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2023-01-19 16:03   ` Joey Gouly
@ 2023-03-01 16:35     ` Peter Xu
  -1 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2023-03-01 16:35 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

On Thu, Jan 19, 2023 at 04:03:44PM +0000, Joey Gouly wrote:
> From: Kees Cook <keescook@chromium.org>
> 
> Add some tests to cover the new PR_SET_MDWE prctl.
> 
> Co-developed-by: Joey Gouly <joey.gouly@arm.com>
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Cc: Shuah Khan <shuah@kernel.org>

May need to sync prctl.h into tools/include/uapi?  Otherwise selftests/mm
build fails here.  There's also one compiler report.  A fixup attached
which works for me.

Thanks,

-- 
Peter Xu

[-- Attachment #2: 0001-fixup-kselftest-vm-add-tests-for-memory-deny-write-e.patch --]
[-- Type: text/plain, Size: 1487 bytes --]

From ce8e17c244fcc743c7006316dd431c5650480756 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 1 Mar 2023 11:33:34 -0500
Subject: [PATCH] fixup! kselftest: vm: add tests for memory-deny-write-execute

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/include/uapi/linux/prctl.h       | 6 ++++++
 tools/testing/selftests/mm/mdwe_test.c | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index e4c629c1f1b0..759b3f53e53f 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -281,6 +281,12 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK		0xffff
 # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/* Memory deny write / execute */
+#define PR_SET_MDWE			65
+# define PR_MDWE_REFUSE_EXEC_GAIN	1
+
+#define PR_GET_MDWE			66
+
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index f466a099f1bf..bc91bef5d254 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -163,9 +163,8 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
 
 TEST_F(mdwe, mmap_FIXED)
 {
-	void *p, *p2;
+	void *p;
 
-	p2 = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
 	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
 	ASSERT_NE(self->p, MAP_FAILED);
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2023-03-01 16:35     ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2023-03-01 16:35 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

On Thu, Jan 19, 2023 at 04:03:44PM +0000, Joey Gouly wrote:
> From: Kees Cook <keescook@chromium.org>
> 
> Add some tests to cover the new PR_SET_MDWE prctl.
> 
> Co-developed-by: Joey Gouly <joey.gouly@arm.com>
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Cc: Shuah Khan <shuah@kernel.org>

May need to sync prctl.h into tools/include/uapi?  Otherwise selftests/mm
build fails here.  There's also one compiler report.  A fixup attached
which works for me.

Thanks,

-- 
Peter Xu

[-- Attachment #2: 0001-fixup-kselftest-vm-add-tests-for-memory-deny-write-e.patch --]
[-- Type: text/plain, Size: 1487 bytes --]

From ce8e17c244fcc743c7006316dd431c5650480756 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 1 Mar 2023 11:33:34 -0500
Subject: [PATCH] fixup! kselftest: vm: add tests for memory-deny-write-execute

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/include/uapi/linux/prctl.h       | 6 ++++++
 tools/testing/selftests/mm/mdwe_test.c | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index e4c629c1f1b0..759b3f53e53f 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -281,6 +281,12 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK		0xffff
 # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/* Memory deny write / execute */
+#define PR_SET_MDWE			65
+# define PR_MDWE_REFUSE_EXEC_GAIN	1
+
+#define PR_GET_MDWE			66
+
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index f466a099f1bf..bc91bef5d254 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -163,9 +163,8 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
 
 TEST_F(mdwe, mmap_FIXED)
 {
-	void *p, *p2;
+	void *p;
 
-	p2 = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
 	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
 	ASSERT_NE(self->p, MAP_FAILED);
 
-- 
2.39.1


[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute
  2023-03-01 16:35     ` Peter Xu
@ 2023-03-02 11:07       ` Joey Gouly
  -1 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-03-02 11:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah, Arnaldo Carvalho de Melo

Hi Peter,

On Wed, Mar 01, 2023 at 11:35:56AM -0500, Peter Xu wrote:
> On Thu, Jan 19, 2023 at 04:03:44PM +0000, Joey Gouly wrote:
> > From: Kees Cook <keescook@chromium.org>
> > 
> > Add some tests to cover the new PR_SET_MDWE prctl.
> > 
> > Co-developed-by: Joey Gouly <joey.gouly@arm.com>
> > Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> 
> May need to sync prctl.h into tools/include/uapi?  Otherwise selftests/mm
> build fails here.  There's also one compiler report.  A fixup attached
> which works for me.
> 
> Thanks,
> 
> -- 
> Peter Xu

I've CC'd Arnaldo because they seem to update the tools version of prctl.h a lot.

Sorry about the 'p2' variable in the test, was there for some experiments but seems
I accidentally included it.

Acked-by: Joey Gouly <joey.gouly@arm.com>

Thanks,
Joey

> >From ce8e17c244fcc743c7006316dd431c5650480756 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Wed, 1 Mar 2023 11:33:34 -0500
> Subject: [PATCH] fixup! kselftest: vm: add tests for memory-deny-write-execute
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tools/include/uapi/linux/prctl.h       | 6 ++++++
>  tools/testing/selftests/mm/mdwe_test.c | 3 +--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
> index e4c629c1f1b0..759b3f53e53f 100644
> --- a/tools/include/uapi/linux/prctl.h
> +++ b/tools/include/uapi/linux/prctl.h
> @@ -281,6 +281,12 @@ struct prctl_mm_map {
>  # define PR_SME_VL_LEN_MASK		0xffff
>  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
>  
> +/* Memory deny write / execute */
> +#define PR_SET_MDWE			65
> +# define PR_MDWE_REFUSE_EXEC_GAIN	1
> +
> +#define PR_GET_MDWE			66
> +
>  #define PR_SET_VMA		0x53564d41
>  # define PR_SET_VMA_ANON_NAME		0
>  
> diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
> index f466a099f1bf..bc91bef5d254 100644
> --- a/tools/testing/selftests/mm/mdwe_test.c
> +++ b/tools/testing/selftests/mm/mdwe_test.c
> @@ -163,9 +163,8 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
>  
>  TEST_F(mdwe, mmap_FIXED)
>  {
> -	void *p, *p2;
> +	void *p;
>  
> -	p2 = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
>  	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
>  	ASSERT_NE(self->p, MAP_FAILED);
>  
> -- 
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute
@ 2023-03-02 11:07       ` Joey Gouly
  0 siblings, 0 replies; 28+ messages in thread
From: Joey Gouly @ 2023-03-02 11:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah, Arnaldo Carvalho de Melo

Hi Peter,

On Wed, Mar 01, 2023 at 11:35:56AM -0500, Peter Xu wrote:
> On Thu, Jan 19, 2023 at 04:03:44PM +0000, Joey Gouly wrote:
> > From: Kees Cook <keescook@chromium.org>
> > 
> > Add some tests to cover the new PR_SET_MDWE prctl.
> > 
> > Co-developed-by: Joey Gouly <joey.gouly@arm.com>
> > Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > Cc: Shuah Khan <shuah@kernel.org>
> 
> May need to sync prctl.h into tools/include/uapi?  Otherwise selftests/mm
> build fails here.  There's also one compiler report.  A fixup attached
> which works for me.
> 
> Thanks,
> 
> -- 
> Peter Xu

I've CC'd Arnaldo because they seem to update the tools version of prctl.h a lot.

Sorry about the 'p2' variable in the test, was there for some experiments but seems
I accidentally included it.

Acked-by: Joey Gouly <joey.gouly@arm.com>

Thanks,
Joey

> >From ce8e17c244fcc743c7006316dd431c5650480756 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Wed, 1 Mar 2023 11:33:34 -0500
> Subject: [PATCH] fixup! kselftest: vm: add tests for memory-deny-write-execute
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tools/include/uapi/linux/prctl.h       | 6 ++++++
>  tools/testing/selftests/mm/mdwe_test.c | 3 +--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
> index e4c629c1f1b0..759b3f53e53f 100644
> --- a/tools/include/uapi/linux/prctl.h
> +++ b/tools/include/uapi/linux/prctl.h
> @@ -281,6 +281,12 @@ struct prctl_mm_map {
>  # define PR_SME_VL_LEN_MASK		0xffff
>  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
>  
> +/* Memory deny write / execute */
> +#define PR_SET_MDWE			65
> +# define PR_MDWE_REFUSE_EXEC_GAIN	1
> +
> +#define PR_GET_MDWE			66
> +
>  #define PR_SET_VMA		0x53564d41
>  # define PR_SET_VMA_ANON_NAME		0
>  
> diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
> index f466a099f1bf..bc91bef5d254 100644
> --- a/tools/testing/selftests/mm/mdwe_test.c
> +++ b/tools/testing/selftests/mm/mdwe_test.c
> @@ -163,9 +163,8 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
>  
>  TEST_F(mdwe, mmap_FIXED)
>  {
> -	void *p, *p2;
> +	void *p;
>  
> -	p2 = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0);
>  	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
>  	ASSERT_NE(self->p, MAP_FAILED);
>  
> -- 
> 2.39.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-01-19 16:03   ` Joey Gouly
@ 2023-03-07 13:01     ` Alexey Izbyshev
  -1 siblings, 0 replies; 28+ messages in thread
From: Alexey Izbyshev @ 2023-03-07 13:01 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On 2023-01-19 19:03, Joey Gouly wrote:
> The aim of such policy is to prevent a user task from creating an
> executable mapping that is also writeable.
> 
> An example of mmap() returning -EACCESS if the policy is enabled:
> 
> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> 
> Similarly, mprotect() would return -EACCESS below:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> The BPF filter that systemd MDWE uses is stateless, and disallows
> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC 
> to
> be enabled if it was already PROT_EXEC, which allows the following 
> case:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> where PROT_BTI enables branch tracking identification on arm64.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/sched/coredump.h |  6 +++++-
>  include/uapi/linux/prctl.h     |  6 ++++++
>  kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>  mm/mmap.c                      | 10 ++++++++++
>  mm/mprotect.c                  |  5 +++++
>  6 files changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..cee1e4b566d8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>  }
> 
>  unsigned long vm_commit_limit(void);
> +
> +/*
> + * Denies creating a writable executable mapping or gaining
> executable permissions.
> + *
> + * This denies the following:
> + *
> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> + *
> + *	b)	mmap(PROT_WRITE)
> + *		mprotect(PROT_EXEC)
> + *
> + *	c)	mmap(PROT_WRITE)
> + *		mprotect(PROT_READ)
> + *		mprotect(PROT_EXEC)
> + *
> + * But allows the following:
> + *
> + *	d)	mmap(PROT_READ | PROT_EXEC)
> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> + */
> +static inline bool map_deny_write_exec(struct vm_area_struct *vma,
> unsigned long vm_flags)
> +{
> +	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
> +		return false;
> +
> +	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
> +		return true;
> +
> +	if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
> +		return true;
> +
> +	return false;
> +}
> +
>  #endif /* _LINUX_MMAN_H */
> diff --git a/include/linux/sched/coredump.h 
> b/include/linux/sched/coredump.h
> index 8270ad7ae14c..0e17ae7fbfd3 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
>   * lifecycle of this mm, just for simplicity.
>   */
>  #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
> +
> +#define MMF_HAS_MDWE		28
> +#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
> +
>  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> 
>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> -				 MMF_DISABLE_THP_MASK)
> +				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
> 
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a5e06dcbba13..1312a137f7fb 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -281,6 +281,12 @@ struct prctl_mm_map {
>  # define PR_SME_VL_LEN_MASK		0xffff
>  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
> 
> +/* Memory deny write / execute */
> +#define PR_SET_MDWE			65
> +# define PR_MDWE_REFUSE_EXEC_GAIN	1
> +
> +#define PR_GET_MDWE			66
> +
>  #define PR_SET_VMA		0x53564d41
>  # define PR_SET_VMA_ANON_NAME		0
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 5fd54bf0e886..b3cab94545ed 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2348,6 +2348,33 @@ static int prctl_set_vma(unsigned long opt,
> unsigned long start,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
> 
> +static inline int prctl_set_mdwe(unsigned long bits, unsigned long 
> arg3,
> +				 unsigned long arg4, unsigned long arg5)
> +{
> +	if (arg3 || arg4 || arg5)
> +		return -EINVAL;
> +
> +	if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
> +		return -EINVAL;
> +
> +	if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
> +		set_bit(MMF_HAS_MDWE, &current->mm->flags);
> +	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
> +		return -EPERM; /* Cannot unset the flag */
> +
> +	return 0;
> +}
> +
> +static inline int prctl_get_mdwe(unsigned long arg2, unsigned long 
> arg3,
> +				 unsigned long arg4, unsigned long arg5)
> +{
> +	if (arg2 || arg3 || arg4 || arg5)
> +		return -EINVAL;
> +
> +	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
> +		PR_MDWE_REFUSE_EXEC_GAIN : 0;
> +}
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned 
> long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2623,6 +2650,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> long, arg2, unsigned long, arg3,
>  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
>  		break;
>  #endif
> +	case PR_SET_MDWE:
> +		error = prctl_set_mdwe(arg2, arg3, arg4, arg5);
> +		break;
> +	case PR_GET_MDWE:
> +		error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
> +		break;
>  	case PR_SET_VMA:
>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>  		break;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 87d929316d57..99a4d9e2b0d8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2665,6 +2665,16 @@ unsigned long mmap_region(struct file *file,
> unsigned long addr,
>  		vma_set_anonymous(vma);
>  	}
> 
> +	if (map_deny_write_exec(vma, vma->vm_flags)) {
> +		error = -EACCES;
> +		if (file)
> +			goto close_and_free_vma;
> +		else if (vma->vm_file)
> +			goto unmap_and_free_vma;
> +		else
> +			goto free_vma;
> +	}
> +

Why is the cleanup dispatch logic duplicated here, instead of simply 
doing "goto close_and_free_vma" (where basically the same dispatch is 
done)?

>  	/* Allow architectures to sanity-check the vm_flags */
>  	if (!arch_validate_flags(vma->vm_flags)) {
>  		error = -EINVAL;
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 908df12caa26..bc0587df042f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -762,6 +762,11 @@ static int do_mprotect_pkey(unsigned long start,
> size_t len,
>  			break;
>  		}
> 
> +		if (map_deny_write_exec(vma, newflags)) {
> +			error = -EACCES;
> +			goto out;
> +		}
> +

Why does this check use "goto out", thereby skipping post-loop cleanup, 
instead of "break" like all other checks? This looks like a bug to me.

Alexey
>  		/* Allow architectures to sanity-check the new flags */
>  		if (!arch_validate_flags(newflags)) {
>  			error = -EINVAL;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-03-07 13:01     ` Alexey Izbyshev
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Izbyshev @ 2023-03-07 13:01 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Catalin Marinas, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On 2023-01-19 19:03, Joey Gouly wrote:
> The aim of such policy is to prevent a user task from creating an
> executable mapping that is also writeable.
> 
> An example of mmap() returning -EACCESS if the policy is enabled:
> 
> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> 
> Similarly, mprotect() would return -EACCESS below:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> The BPF filter that systemd MDWE uses is stateless, and disallows
> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC 
> to
> be enabled if it was already PROT_EXEC, which allows the following 
> case:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> where PROT_BTI enables branch tracking identification on arm64.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/sched/coredump.h |  6 +++++-
>  include/uapi/linux/prctl.h     |  6 ++++++
>  kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>  mm/mmap.c                      | 10 ++++++++++
>  mm/mprotect.c                  |  5 +++++
>  6 files changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..cee1e4b566d8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>  }
> 
>  unsigned long vm_commit_limit(void);
> +
> +/*
> + * Denies creating a writable executable mapping or gaining
> executable permissions.
> + *
> + * This denies the following:
> + *
> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> + *
> + *	b)	mmap(PROT_WRITE)
> + *		mprotect(PROT_EXEC)
> + *
> + *	c)	mmap(PROT_WRITE)
> + *		mprotect(PROT_READ)
> + *		mprotect(PROT_EXEC)
> + *
> + * But allows the following:
> + *
> + *	d)	mmap(PROT_READ | PROT_EXEC)
> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> + */
> +static inline bool map_deny_write_exec(struct vm_area_struct *vma,
> unsigned long vm_flags)
> +{
> +	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
> +		return false;
> +
> +	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
> +		return true;
> +
> +	if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
> +		return true;
> +
> +	return false;
> +}
> +
>  #endif /* _LINUX_MMAN_H */
> diff --git a/include/linux/sched/coredump.h 
> b/include/linux/sched/coredump.h
> index 8270ad7ae14c..0e17ae7fbfd3 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm)
>   * lifecycle of this mm, just for simplicity.
>   */
>  #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
> +
> +#define MMF_HAS_MDWE		28
> +#define MMF_HAS_MDWE_MASK	(1 << MMF_HAS_MDWE)
> +
>  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> 
>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> -				 MMF_DISABLE_THP_MASK)
> +				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
> 
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a5e06dcbba13..1312a137f7fb 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -281,6 +281,12 @@ struct prctl_mm_map {
>  # define PR_SME_VL_LEN_MASK		0xffff
>  # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
> 
> +/* Memory deny write / execute */
> +#define PR_SET_MDWE			65
> +# define PR_MDWE_REFUSE_EXEC_GAIN	1
> +
> +#define PR_GET_MDWE			66
> +
>  #define PR_SET_VMA		0x53564d41
>  # define PR_SET_VMA_ANON_NAME		0
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 5fd54bf0e886..b3cab94545ed 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2348,6 +2348,33 @@ static int prctl_set_vma(unsigned long opt,
> unsigned long start,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
> 
> +static inline int prctl_set_mdwe(unsigned long bits, unsigned long 
> arg3,
> +				 unsigned long arg4, unsigned long arg5)
> +{
> +	if (arg3 || arg4 || arg5)
> +		return -EINVAL;
> +
> +	if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
> +		return -EINVAL;
> +
> +	if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
> +		set_bit(MMF_HAS_MDWE, &current->mm->flags);
> +	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
> +		return -EPERM; /* Cannot unset the flag */
> +
> +	return 0;
> +}
> +
> +static inline int prctl_get_mdwe(unsigned long arg2, unsigned long 
> arg3,
> +				 unsigned long arg4, unsigned long arg5)
> +{
> +	if (arg2 || arg3 || arg4 || arg5)
> +		return -EINVAL;
> +
> +	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
> +		PR_MDWE_REFUSE_EXEC_GAIN : 0;
> +}
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned 
> long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2623,6 +2650,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> long, arg2, unsigned long, arg3,
>  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
>  		break;
>  #endif
> +	case PR_SET_MDWE:
> +		error = prctl_set_mdwe(arg2, arg3, arg4, arg5);
> +		break;
> +	case PR_GET_MDWE:
> +		error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
> +		break;
>  	case PR_SET_VMA:
>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>  		break;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 87d929316d57..99a4d9e2b0d8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2665,6 +2665,16 @@ unsigned long mmap_region(struct file *file,
> unsigned long addr,
>  		vma_set_anonymous(vma);
>  	}
> 
> +	if (map_deny_write_exec(vma, vma->vm_flags)) {
> +		error = -EACCES;
> +		if (file)
> +			goto close_and_free_vma;
> +		else if (vma->vm_file)
> +			goto unmap_and_free_vma;
> +		else
> +			goto free_vma;
> +	}
> +

Why is the cleanup dispatch logic duplicated here, instead of simply 
doing "goto close_and_free_vma" (where basically the same dispatch is 
done)?

>  	/* Allow architectures to sanity-check the vm_flags */
>  	if (!arch_validate_flags(vma->vm_flags)) {
>  		error = -EINVAL;
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 908df12caa26..bc0587df042f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -762,6 +762,11 @@ static int do_mprotect_pkey(unsigned long start,
> size_t len,
>  			break;
>  		}
> 
> +		if (map_deny_write_exec(vma, newflags)) {
> +			error = -EACCES;
> +			goto out;
> +		}
> +

Why does this check use "goto out", thereby skipping post-loop cleanup, 
instead of "break" like all other checks? This looks like a bug to me.

Alexey
>  		/* Allow architectures to sanity-check the new flags */
>  		if (!arch_validate_flags(newflags)) {
>  			error = -EINVAL;

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
  2023-03-07 13:01     ` Alexey Izbyshev
@ 2023-03-08 12:36       ` Catalin Marinas
  -1 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-03-08 12:36 UTC (permalink / raw)
  To: Alexey Izbyshev
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Tue, Mar 07, 2023 at 04:01:56PM +0300, Alexey Izbyshev wrote:
> On 2023-01-19 19:03, Joey Gouly wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 87d929316d57..99a4d9e2b0d8 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2665,6 +2665,16 @@ unsigned long mmap_region(struct file *file,
> > unsigned long addr,
> >  		vma_set_anonymous(vma);
> >  	}
> > 
> > +	if (map_deny_write_exec(vma, vma->vm_flags)) {
> > +		error = -EACCES;
> > +		if (file)
> > +			goto close_and_free_vma;
> > +		else if (vma->vm_file)
> > +			goto unmap_and_free_vma;
> > +		else
> > +			goto free_vma;
> > +	}
> > +
> 
> Why is the cleanup dispatch logic duplicated here, instead of simply doing
> "goto close_and_free_vma" (where basically the same dispatch is done)?

Yes, though that's only possible after commit cc8d1b097de7 ("mmap: clean
up mmap_region() unrolling") in 6.3-rc1. It's worth adding a separate
patch to simplify this before final 6.3.

> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 908df12caa26..bc0587df042f 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -762,6 +762,11 @@ static int do_mprotect_pkey(unsigned long start,
> > size_t len,
> >  			break;
> >  		}
> > 
> > +		if (map_deny_write_exec(vma, newflags)) {
> > +			error = -EACCES;
> > +			goto out;
> > +		}
> > +
> 
> Why does this check use "goto out", thereby skipping post-loop cleanup,
> instead of "break" like all other checks? This looks like a bug to me.

Ah, good point, thanks. I think that's a left-over from my early attempt
at this series. The loop was changed in 5.19 with commit 4a18419f71cd
("mm/mprotect: use mmu_gather") but the patch not updated.

So yeah, it needs fixing. Joey, could you please send fixes for both
issues above?

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
@ 2023-03-08 12:36       ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2023-03-08 12:36 UTC (permalink / raw)
  To: Alexey Izbyshev
  Cc: Joey Gouly, Andrew Morton, Lennart Poettering,
	Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook,
	Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen,
	linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd,
	shuah

On Tue, Mar 07, 2023 at 04:01:56PM +0300, Alexey Izbyshev wrote:
> On 2023-01-19 19:03, Joey Gouly wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 87d929316d57..99a4d9e2b0d8 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2665,6 +2665,16 @@ unsigned long mmap_region(struct file *file,
> > unsigned long addr,
> >  		vma_set_anonymous(vma);
> >  	}
> > 
> > +	if (map_deny_write_exec(vma, vma->vm_flags)) {
> > +		error = -EACCES;
> > +		if (file)
> > +			goto close_and_free_vma;
> > +		else if (vma->vm_file)
> > +			goto unmap_and_free_vma;
> > +		else
> > +			goto free_vma;
> > +	}
> > +
> 
> Why is the cleanup dispatch logic duplicated here, instead of simply doing
> "goto close_and_free_vma" (where basically the same dispatch is done)?

Yes, though that's only possible after commit cc8d1b097de7 ("mmap: clean
up mmap_region() unrolling") in 6.3-rc1. It's worth adding a separate
patch to simplify this before final 6.3.

> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 908df12caa26..bc0587df042f 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -762,6 +762,11 @@ static int do_mprotect_pkey(unsigned long start,
> > size_t len,
> >  			break;
> >  		}
> > 
> > +		if (map_deny_write_exec(vma, newflags)) {
> > +			error = -EACCES;
> > +			goto out;
> > +		}
> > +
> 
> Why does this check use "goto out", thereby skipping post-loop cleanup,
> instead of "break" like all other checks? This looks like a bug to me.

Ah, good point, thanks. I think that's a left-over from my early attempt
at this series. The loop was changed in 5.19 with commit 4a18419f71cd
("mm/mprotect: use mmu_gather") but the patch not updated.

So yeah, it needs fixing. Joey, could you please send fixes for both
issues above?

Thanks.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-03-08 12:38 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-19 16:03 [PATCH v2 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Joey Gouly
2023-01-19 16:03 ` Joey Gouly
2023-01-19 16:03 ` [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly
2023-01-19 16:03   ` Joey Gouly
2023-01-23 11:45   ` David Hildenbrand
2023-01-23 11:45     ` David Hildenbrand
2023-01-23 12:19     ` Catalin Marinas
2023-01-23 12:19       ` Catalin Marinas
2023-01-23 12:53       ` David Hildenbrand
2023-01-23 12:53         ` David Hildenbrand
2023-01-23 16:04         ` Catalin Marinas
2023-01-23 16:04           ` Catalin Marinas
2023-01-23 16:10           ` David Hildenbrand
2023-01-23 16:10             ` David Hildenbrand
2023-01-23 16:22             ` Catalin Marinas
2023-01-23 16:22               ` Catalin Marinas
2023-01-23 17:48           ` Topi Miettinen
2023-01-23 17:48             ` Topi Miettinen
2023-03-07 13:01   ` Alexey Izbyshev
2023-03-07 13:01     ` Alexey Izbyshev
2023-03-08 12:36     ` Catalin Marinas
2023-03-08 12:36       ` Catalin Marinas
2023-01-19 16:03 ` [PATCH v2 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly
2023-01-19 16:03   ` Joey Gouly
2023-03-01 16:35   ` Peter Xu
2023-03-01 16:35     ` Peter Xu
2023-03-02 11:07     ` Joey Gouly
2023-03-02 11:07       ` Joey Gouly

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.