All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] MDWE without inheritance
@ 2023-05-04 17:09 Florent Revest
  2023-05-04 17:09 ` [PATCH 1/4] kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test Florent Revest
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Florent Revest @ 2023-05-04 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, catalin.marinas, anshuman.khandual, joey.gouly, mhocko,
	keescook, david, peterx, izbyshev, nd, broonie, szabolcs.nagy,
	Florent Revest

Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
current with a flag that prevents pages that were previously not executable from
becoming executable.
This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)

At Google, we've been using a somewhat similar downstream patch for a few years
now. To make the adoption of this feature easier, we've had it support a mode in
which the W^X flag does not propagate to children. For example, this is handy if
a C process which wants W^X protection suspects it could start children
processes that would use a JIT.

I'd like to align our features with the upstream prctl. This series proposes a
new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It
sets a different flag in current that is not in MMF_INIT_MASK and which does not
propagate.

As part of looking into MDWE, I also fixed a couple of things in the MDWE test.

Florent Revest (4):
  kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test
  kselftest: vm: Fix mdwe's mmap_FIXED test case
  mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl
  kselftest: vm: Add tests for no-inherit memory-deny-write-execute

 include/linux/mman.h                   |   8 +-
 include/linux/sched/coredump.h         |   1 +
 include/uapi/linux/prctl.h             |   1 +
 kernel/sys.c                           |  29 +++++--
 tools/include/uapi/linux/prctl.h       |   1 +
 tools/testing/selftests/mm/mdwe_test.c | 110 +++++++++++++++++++++----
 6 files changed, 128 insertions(+), 22 deletions(-)

-- 
2.40.1.495.gc816e09b53d-goog


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test
  2023-05-04 17:09 [PATCH 0/4] MDWE without inheritance Florent Revest
@ 2023-05-04 17:09 ` Florent Revest
  2023-05-04 17:09 ` [PATCH 2/4] kselftest: vm: Fix mdwe's mmap_FIXED test case Florent Revest
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Florent Revest @ 2023-05-04 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, catalin.marinas, anshuman.khandual, joey.gouly, mhocko,
	keescook, david, peterx, izbyshev, nd, broonie, szabolcs.nagy,
	Florent Revest

Signed-off-by: Florent Revest <revest@chromium.org>
---
 tools/testing/selftests/mm/mdwe_test.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index bc91bef5d254..d0954c657feb 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -49,19 +49,19 @@ FIXTURE_VARIANT(mdwe)
 
 FIXTURE_VARIANT_ADD(mdwe, stock)
 {
-        .enabled = false,
+	.enabled = false,
 	.forked = false,
 };
 
 FIXTURE_VARIANT_ADD(mdwe, enabled)
 {
-        .enabled = true,
+	.enabled = true,
 	.forked = false,
 };
 
 FIXTURE_VARIANT_ADD(mdwe, forked)
 {
-        .enabled = true,
+	.enabled = true,
 	.forked = true,
 };
 
-- 
2.40.1.495.gc816e09b53d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] kselftest: vm: Fix mdwe's mmap_FIXED test case
  2023-05-04 17:09 [PATCH 0/4] MDWE without inheritance Florent Revest
  2023-05-04 17:09 ` [PATCH 1/4] kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test Florent Revest
@ 2023-05-04 17:09 ` Florent Revest
  2023-05-04 17:13   ` Florent Revest
  2023-05-04 17:09 ` [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl Florent Revest
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Florent Revest @ 2023-05-04 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, catalin.marinas, anshuman.khandual, joey.gouly, mhocko,
	keescook, david, peterx, izbyshev, nd, broonie, szabolcs.nagy,
	Florent Revest

I checked with the original author, the mmap_FIXED test case wasn't
properly tested and fails. Currently, it maps two consecutive (non
overlapping) pages and expects the second mapping to be denied by MDWE
but these two pages have nothing to do with each other so MDWE is
actually out of the picture here.

What the test actually intended to do was to remap a virtual address
using MAP_FIXED. However, this operation unmaps the existing mapping and
creates a new one so the va is backed by a new page and MDWE is again
out of the picture, all remappings should succeed.

This patch keeps the test case to make it clear that this situation is
expected to work.

Signed-off-by: Florent Revest <revest@chromium.org>
---
 tools/testing/selftests/mm/mdwe_test.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index d0954c657feb..91aa9c3099e7 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -168,13 +168,10 @@ TEST_F(mdwe, mmap_FIXED)
 	self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
 	ASSERT_NE(self->p, MAP_FAILED);
 
-	p = mmap(self->p + self->size, self->size, PROT_READ | PROT_EXEC,
+	/* MAP_FIXED unmaps the existing page before mapping which is allowed */
+	p = mmap(self->p, self->size, PROT_READ | PROT_EXEC,
 		 self->flags | MAP_FIXED, 0, 0);
-	if (variant->enabled) {
-		EXPECT_EQ(p, MAP_FAILED);
-	} else {
-		EXPECT_EQ(p, self->p);
-	}
+	EXPECT_EQ(p, self->p);
 }
 
 TEST_F(mdwe, arm64_BTI)
-- 
2.40.1.495.gc816e09b53d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl
  2023-05-04 17:09 [PATCH 0/4] MDWE without inheritance Florent Revest
  2023-05-04 17:09 ` [PATCH 1/4] kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test Florent Revest
  2023-05-04 17:09 ` [PATCH 2/4] kselftest: vm: Fix mdwe's mmap_FIXED test case Florent Revest
@ 2023-05-04 17:09 ` Florent Revest
  2023-05-05 18:34   ` Catalin Marinas
  2023-05-04 17:09 ` [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute Florent Revest
  2023-05-04 20:06 ` [PATCH 0/4] MDWE without inheritance Peter Xu
  4 siblings, 1 reply; 19+ messages in thread
From: Florent Revest @ 2023-05-04 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, catalin.marinas, anshuman.khandual, joey.gouly, mhocko,
	keescook, david, peterx, izbyshev, nd, broonie, szabolcs.nagy,
	Florent Revest

This extends the current PR_SET_MDWE prctl arg with a bit to indicate
that the process doesn't want MDWE protection to propagate to children.

To implement this no-inherit mode, the tag in current->mm->flags must be
absent from MMF_INIT_MASK. This means that the encoding for "MDWE but
without inherit" is different in the prctl than in the mm flags. This
leads to a bit of bit-mangling in the prctl implementation.

Signed-off-by: Florent Revest <revest@chromium.org>
---
 include/linux/mman.h             |  8 +++++++-
 include/linux/sched/coredump.h   |  1 +
 include/uapi/linux/prctl.h       |  1 +
 kernel/sys.c                     | 29 +++++++++++++++++++++++------
 tools/include/uapi/linux/prctl.h |  1 +
 5 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index cee1e4b566d8..3d7a0b70ad2d 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -157,6 +157,12 @@ calc_vm_flag_bits(unsigned long flags)
 
 unsigned long vm_commit_limit(void);
 
+static inline bool has_mdwe_enabled(struct task_struct *task)
+{
+	return test_bit(MMF_HAS_MDWE, &task->mm->flags) ||
+	       test_bit(MMF_HAS_MDWE_NO_INHERIT, &task->mm->flags);
+}
+
 /*
  * Denies creating a writable executable mapping or gaining executable permissions.
  *
@@ -178,7 +184,7 @@ unsigned long vm_commit_limit(void);
  */
 static inline bool map_deny_write_exec(struct vm_area_struct *vma,  unsigned long vm_flags)
 {
-	if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
+	if (!has_mdwe_enabled(current))
 		return false;
 
 	if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 0ee96ea7a0e9..b2d9659ef863 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -91,4 +91,5 @@ static inline int get_dumpable(struct mm_struct *mm)
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
 
 #define MMF_VM_MERGE_ANY	29
+#define MMF_HAS_MDWE_NO_INHERIT	30
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index f23d9a16507f..31ec44728412 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -284,6 +284,7 @@ struct prctl_mm_map {
 /* Memory deny write / execute */
 #define PR_SET_MDWE			65
 # define PR_MDWE_REFUSE_EXEC_GAIN	1
+# define PR_MDWE_NO_INHERIT		2
 
 #define PR_GET_MDWE			66
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 339fee3eff6a..c864fd42ece1 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2368,12 +2368,25 @@ static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
 	if (arg3 || arg4 || arg5)
 		return -EINVAL;
 
-	if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
+	if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT))
 		return -EINVAL;
 
-	if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
-		set_bit(MMF_HAS_MDWE, &current->mm->flags);
-	else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
+	/* Cannot set NO_INHERIT without REFUSE_EXEC_GAIN */
+	if (bits & PR_MDWE_NO_INHERIT && !(bits & PR_MDWE_REFUSE_EXEC_GAIN))
+		return -EINVAL;
+
+	if (bits & PR_MDWE_REFUSE_EXEC_GAIN) {
+		if (bits & PR_MDWE_NO_INHERIT) {
+			/* Cannot go from inherit mode to no inherit */
+			if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
+				return -EPERM;
+
+			set_bit(MMF_HAS_MDWE_NO_INHERIT, &current->mm->flags);
+		} else {
+			set_bit(MMF_HAS_MDWE, &current->mm->flags);
+			clear_bit(MMF_HAS_MDWE_NO_INHERIT, &current->mm->flags);
+		}
+	} else if (has_mdwe_enabled(current))
 		return -EPERM; /* Cannot unset the flag */
 
 	return 0;
@@ -2385,8 +2398,12 @@ static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
 	if (arg2 || arg3 || arg4 || arg5)
 		return -EINVAL;
 
-	return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
-		PR_MDWE_REFUSE_EXEC_GAIN : 0;
+	if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
+		return PR_MDWE_REFUSE_EXEC_GAIN;
+	else if (test_bit(MMF_HAS_MDWE_NO_INHERIT, &current->mm->flags))
+		return PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT;
+
+	return 0;
 }
 
 static int prctl_get_auxv(void __user *addr, unsigned long len)
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 759b3f53e53f..a3424852d2d6 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -284,6 +284,7 @@ struct prctl_mm_map {
 /* Memory deny write / execute */
 #define PR_SET_MDWE			65
 # define PR_MDWE_REFUSE_EXEC_GAIN	1
+# define PR_MDWE_NO_INHERIT		2
 
 #define PR_GET_MDWE			66
 
-- 
2.40.1.495.gc816e09b53d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute
  2023-05-04 17:09 [PATCH 0/4] MDWE without inheritance Florent Revest
                   ` (2 preceding siblings ...)
  2023-05-04 17:09 ` [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl Florent Revest
@ 2023-05-04 17:09 ` Florent Revest
  2023-05-04 20:29   ` Alexey Izbyshev
  2023-05-04 20:06 ` [PATCH 0/4] MDWE without inheritance Peter Xu
  4 siblings, 1 reply; 19+ messages in thread
From: Florent Revest @ 2023-05-04 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, catalin.marinas, anshuman.khandual, joey.gouly, mhocko,
	keescook, david, peterx, izbyshev, nd, broonie, szabolcs.nagy,
	Florent Revest

Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
PR_SET_MDWE prctl.

Signed-off-by: Florent Revest <revest@chromium.org>
---
 tools/testing/selftests/mm/mdwe_test.c | 95 ++++++++++++++++++++++++--
 1 file changed, 89 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index 91aa9c3099e7..9f08ed1b99ae 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -22,6 +22,8 @@
 
 TEST(prctl_flags)
 {
+	EXPECT_LT(prctl(PR_SET_MDWE, PR_MDWE_NO_INHERIT, 0L, 0L, 7L), 0);
+
 	EXPECT_LT(prctl(PR_SET_MDWE, 7L, 0L, 0L, 0L), 0);
 	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 7L, 0L, 0L), 0);
 	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 7L, 0L), 0);
@@ -33,6 +35,66 @@ TEST(prctl_flags)
 	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 0L, 7L), 0);
 }
 
+FIXTURE(consecutive_prctl_flags) {};
+FIXTURE_SETUP(consecutive_prctl_flags) {}
+FIXTURE_TEARDOWN(consecutive_prctl_flags) {}
+
+FIXTURE_VARIANT(consecutive_prctl_flags)
+{
+	unsigned long first_flags;
+	unsigned long second_flags;
+	bool should_work;
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, same)
+{
+	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+	.second_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+	.should_work = true,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_disable_mdwe)
+{
+	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+	.second_flags = 0,
+	.should_work = false,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_disable_mdwe_no_inherit)
+{
+	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
+	.second_flags = 0,
+	.should_work = false,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, can_lower_privileges)
+{
+	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
+	.second_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+	.should_work = true,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_gain_privileges)
+{
+	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+	.second_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
+	.should_work = false,
+};
+
+TEST_F(consecutive_prctl_flags, two_prctls)
+{
+	int ret;
+
+	EXPECT_EQ(prctl(PR_SET_MDWE, variant->first_flags, 0L, 0L, 0L), 0);
+
+	ret = prctl(PR_SET_MDWE, variant->second_flags, 0L, 0L, 0L);
+	if (variant->should_work) {
+		EXPECT_EQ(ret, 0);
+	} else {
+		EXPECT_NE(ret, 0);
+	}
+}
+
 FIXTURE(mdwe)
 {
 	void *p;
@@ -45,28 +107,45 @@ FIXTURE_VARIANT(mdwe)
 {
 	bool enabled;
 	bool forked;
+	bool inherit;
 };
 
 FIXTURE_VARIANT_ADD(mdwe, stock)
 {
 	.enabled = false,
 	.forked = false,
+	.inherit = false,
 };
 
 FIXTURE_VARIANT_ADD(mdwe, enabled)
 {
 	.enabled = true,
 	.forked = false,
+	.inherit = true,
 };
 
-FIXTURE_VARIANT_ADD(mdwe, forked)
+FIXTURE_VARIANT_ADD(mdwe, inherited)
 {
 	.enabled = true,
 	.forked = true,
+	.inherit = true,
 };
 
+FIXTURE_VARIANT_ADD(mdwe, not_inherited)
+{
+	.enabled = true,
+	.forked = true,
+	.inherit = false,
+};
+
+static bool executable_map_should_fail(const FIXTURE_VARIANT(mdwe) *variant)
+{
+	return variant->enabled && (!variant->forked || variant->inherit);
+}
+
 FIXTURE_SETUP(mdwe)
 {
+	unsigned long mdwe_flags;
 	int ret, status;
 
 	self->p = NULL;
@@ -76,13 +155,17 @@ FIXTURE_SETUP(mdwe)
 	if (!variant->enabled)
 		return;
 
-	ret = prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L);
+	mdwe_flags = PR_MDWE_REFUSE_EXEC_GAIN;
+	if (!variant->inherit)
+		mdwe_flags |= PR_MDWE_NO_INHERIT;
+
+	ret = prctl(PR_SET_MDWE, mdwe_flags, 0L, 0L, 0L);
 	ASSERT_EQ(ret, 0) {
 		TH_LOG("PR_SET_MDWE failed or unsupported");
 	}
 
 	ret = prctl(PR_GET_MDWE, 0L, 0L, 0L, 0L);
-	ASSERT_EQ(ret, 1);
+	ASSERT_EQ(ret, mdwe_flags);
 
 	if (variant->forked) {
 		self->pid = fork();
@@ -113,7 +196,7 @@ TEST_F(mdwe, mmap_READ_EXEC)
 TEST_F(mdwe, mmap_WRITE_EXEC)
 {
 	self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0);
-	if (variant->enabled) {
+	if (executable_map_should_fail(variant)) {
 		EXPECT_EQ(self->p, MAP_FAILED);
 	} else {
 		EXPECT_NE(self->p, MAP_FAILED);
@@ -139,7 +222,7 @@ TEST_F(mdwe, mprotect_add_EXEC)
 	ASSERT_NE(self->p, MAP_FAILED);
 
 	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
-	if (variant->enabled) {
+	if (executable_map_should_fail(variant)) {
 		EXPECT_LT(ret, 0);
 	} else {
 		EXPECT_EQ(ret, 0);
@@ -154,7 +237,7 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
 	ASSERT_NE(self->p, MAP_FAILED);
 
 	ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
-	if (variant->enabled) {
+	if (executable_map_should_fail(variant)) {
 		EXPECT_LT(ret, 0);
 	} else {
 		EXPECT_EQ(ret, 0);
-- 
2.40.1.495.gc816e09b53d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] kselftest: vm: Fix mdwe's mmap_FIXED test case
  2023-05-04 17:09 ` [PATCH 2/4] kselftest: vm: Fix mdwe's mmap_FIXED test case Florent Revest
@ 2023-05-04 17:13   ` Florent Revest
  0 siblings, 0 replies; 19+ messages in thread
From: Florent Revest @ 2023-05-04 17:13 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, catalin.marinas, anshuman.khandual, joey.gouly, mhocko,
	keescook, david, peterx, izbyshev, nd, broonie, szabolcs.nagy

On Thu, May 4, 2023 at 7:10 PM Florent Revest <revest@chromium.org> wrote:
>
> I checked with the original author, the mmap_FIXED test case wasn't
> properly tested and fails. Currently, it maps two consecutive (non
> overlapping) pages and expects the second mapping to be denied by MDWE
> but these two pages have nothing to do with each other so MDWE is
> actually out of the picture here.
>
> What the test actually intended to do was to remap a virtual address
> using MAP_FIXED. However, this operation unmaps the existing mapping and
> creates a new one so the va is backed by a new page and MDWE is again
> out of the picture, all remappings should succeed.
>
> This patch keeps the test case to make it clear that this situation is
> expected to work.
>
> Signed-off-by: Florent Revest <revest@chromium.org>

Ah, snap, I hit send to fast and forgot to add:

Fixes: 4cf1fe34fd18 ("kselftest: vm: add tests for memory-deny-write-execute")

I will do it in the next iteration (surely there'll be other things to
fix... :))

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-04 17:09 [PATCH 0/4] MDWE without inheritance Florent Revest
                   ` (3 preceding siblings ...)
  2023-05-04 17:09 ` [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute Florent Revest
@ 2023-05-04 20:06 ` Peter Xu
  2023-05-05 16:42   ` Florent Revest
  4 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2023-05-04 20:06 UTC (permalink / raw)
  To: Florent Revest
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, izbyshev, nd, broonie,
	szabolcs.nagy

On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote:
> Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
> current with a flag that prevents pages that were previously not executable from
> becoming executable.
> This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)
> 
> At Google, we've been using a somewhat similar downstream patch for a few years
> now. To make the adoption of this feature easier, we've had it support a mode in
> which the W^X flag does not propagate to children. For example, this is handy if
> a C process which wants W^X protection suspects it could start children
> processes that would use a JIT.
> 
> I'd like to align our features with the upstream prctl. This series proposes a
> new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It
> sets a different flag in current that is not in MMF_INIT_MASK and which does not
> propagate.

I don't think I have enough context, so sorry if I'm going to ask a naive
question..

I can understand how current MDWE helps on not allowing any modifi-able
content from becoming executable.  How could NO_INHERIT help if it won't
inherit and not in MMF_INIT_MASK?

IIUC it means the restriction will only apply to the current process.  Then
I assume the process can escape from this rule simply by a fork().  If so,
what's the point to protect at all?

And, what's the difference of this comparing to disabling MDWE after being
enabled (which seems to be forbidden for now, but it seems fork() can play
a similar role of disabling it)?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute
  2023-05-04 17:09 ` [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute Florent Revest
@ 2023-05-04 20:29   ` Alexey Izbyshev
  2023-05-05 16:42     ` Florent Revest
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Izbyshev @ 2023-05-04 20:29 UTC (permalink / raw)
  To: Florent Revest
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, peterx, nd, broonie,
	szabolcs.nagy

On 2023-05-04 20:09, Florent Revest wrote:
> Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
> PR_SET_MDWE prctl.
> 
> Signed-off-by: Florent Revest <revest@chromium.org>
> ---
>  tools/testing/selftests/mm/mdwe_test.c | 95 ++++++++++++++++++++++++--
>  1 file changed, 89 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/mdwe_test.c
> b/tools/testing/selftests/mm/mdwe_test.c
> index 91aa9c3099e7..9f08ed1b99ae 100644
> --- a/tools/testing/selftests/mm/mdwe_test.c
> +++ b/tools/testing/selftests/mm/mdwe_test.c
> @@ -22,6 +22,8 @@
> 
>  TEST(prctl_flags)
>  {
> +	EXPECT_LT(prctl(PR_SET_MDWE, PR_MDWE_NO_INHERIT, 0L, 0L, 7L), 0);
> +

PR_MDWE_NO_INHERIT is defined to an int constant, so passing it to 
prctl() without a cast to long or similar may produce wrong code on 
64-bit targets (ABIs typically don't require the compiler to clear the 
upper 32 bits of a 64-bit register when passing a 32-bit argument, so 
va_arg(arg, unsigned long) in prctl() implementation might get junk).

Arguably, defining PR_MDWE_* to plain int constants is a bug, or at 
least a footgun for users of uapi headers.

Thanks,
Alexey

>  	EXPECT_LT(prctl(PR_SET_MDWE, 7L, 0L, 0L, 0L), 0);
>  	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 7L, 0L, 0L), 0);
>  	EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 7L, 0L), 0);
> @@ -33,6 +35,66 @@ TEST(prctl_flags)
>  	EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 0L, 7L), 0);
>  }
> 
> +FIXTURE(consecutive_prctl_flags) {};
> +FIXTURE_SETUP(consecutive_prctl_flags) {}
> +FIXTURE_TEARDOWN(consecutive_prctl_flags) {}
> +
> +FIXTURE_VARIANT(consecutive_prctl_flags)
> +{
> +	unsigned long first_flags;
> +	unsigned long second_flags;
> +	bool should_work;
> +};
> +
> +FIXTURE_VARIANT_ADD(consecutive_prctl_flags, same)
> +{
> +	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
> +	.second_flags = PR_MDWE_REFUSE_EXEC_GAIN,
> +	.should_work = true,
> +};
> +
> +FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_disable_mdwe)
> +{
> +	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
> +	.second_flags = 0,
> +	.should_work = false,
> +};
> +
> +FIXTURE_VARIANT_ADD(consecutive_prctl_flags, 
> cant_disable_mdwe_no_inherit)
> +{
> +	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
> +	.second_flags = 0,
> +	.should_work = false,
> +};
> +
> +FIXTURE_VARIANT_ADD(consecutive_prctl_flags, can_lower_privileges)
> +{
> +	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
> +	.second_flags = PR_MDWE_REFUSE_EXEC_GAIN,
> +	.should_work = true,
> +};
> +
> +FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_gain_privileges)
> +{
> +	.first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
> +	.second_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
> +	.should_work = false,
> +};
> +
> +TEST_F(consecutive_prctl_flags, two_prctls)
> +{
> +	int ret;
> +
> +	EXPECT_EQ(prctl(PR_SET_MDWE, variant->first_flags, 0L, 0L, 0L), 0);
> +
> +	ret = prctl(PR_SET_MDWE, variant->second_flags, 0L, 0L, 0L);
> +	if (variant->should_work) {
> +		EXPECT_EQ(ret, 0);
> +	} else {
> +		EXPECT_NE(ret, 0);
> +	}
> +}
> +
>  FIXTURE(mdwe)
>  {
>  	void *p;
> @@ -45,28 +107,45 @@ FIXTURE_VARIANT(mdwe)
>  {
>  	bool enabled;
>  	bool forked;
> +	bool inherit;
>  };
> 
>  FIXTURE_VARIANT_ADD(mdwe, stock)
>  {
>  	.enabled = false,
>  	.forked = false,
> +	.inherit = false,
>  };
> 
>  FIXTURE_VARIANT_ADD(mdwe, enabled)
>  {
>  	.enabled = true,
>  	.forked = false,
> +	.inherit = true,
>  };
> 
> -FIXTURE_VARIANT_ADD(mdwe, forked)
> +FIXTURE_VARIANT_ADD(mdwe, inherited)
>  {
>  	.enabled = true,
>  	.forked = true,
> +	.inherit = true,
>  };
> 
> +FIXTURE_VARIANT_ADD(mdwe, not_inherited)
> +{
> +	.enabled = true,
> +	.forked = true,
> +	.inherit = false,
> +};
> +
> +static bool executable_map_should_fail(const FIXTURE_VARIANT(mdwe) 
> *variant)
> +{
> +	return variant->enabled && (!variant->forked || variant->inherit);
> +}
> +
>  FIXTURE_SETUP(mdwe)
>  {
> +	unsigned long mdwe_flags;
>  	int ret, status;
> 
>  	self->p = NULL;
> @@ -76,13 +155,17 @@ FIXTURE_SETUP(mdwe)
>  	if (!variant->enabled)
>  		return;
> 
> -	ret = prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L);
> +	mdwe_flags = PR_MDWE_REFUSE_EXEC_GAIN;
> +	if (!variant->inherit)
> +		mdwe_flags |= PR_MDWE_NO_INHERIT;
> +
> +	ret = prctl(PR_SET_MDWE, mdwe_flags, 0L, 0L, 0L);
>  	ASSERT_EQ(ret, 0) {
>  		TH_LOG("PR_SET_MDWE failed or unsupported");
>  	}
> 
>  	ret = prctl(PR_GET_MDWE, 0L, 0L, 0L, 0L);
> -	ASSERT_EQ(ret, 1);
> +	ASSERT_EQ(ret, mdwe_flags);
> 
>  	if (variant->forked) {
>  		self->pid = fork();
> @@ -113,7 +196,7 @@ TEST_F(mdwe, mmap_READ_EXEC)
>  TEST_F(mdwe, mmap_WRITE_EXEC)
>  {
>  	self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 
> 0, 0);
> -	if (variant->enabled) {
> +	if (executable_map_should_fail(variant)) {
>  		EXPECT_EQ(self->p, MAP_FAILED);
>  	} else {
>  		EXPECT_NE(self->p, MAP_FAILED);
> @@ -139,7 +222,7 @@ TEST_F(mdwe, mprotect_add_EXEC)
>  	ASSERT_NE(self->p, MAP_FAILED);
> 
>  	ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
> -	if (variant->enabled) {
> +	if (executable_map_should_fail(variant)) {
>  		EXPECT_LT(ret, 0);
>  	} else {
>  		EXPECT_EQ(ret, 0);
> @@ -154,7 +237,7 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
>  	ASSERT_NE(self->p, MAP_FAILED);
> 
>  	ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
> -	if (variant->enabled) {
> +	if (executable_map_should_fail(variant)) {
>  		EXPECT_LT(ret, 0);
>  	} else {
>  		EXPECT_EQ(ret, 0);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-04 20:06 ` [PATCH 0/4] MDWE without inheritance Peter Xu
@ 2023-05-05 16:42   ` Florent Revest
  2023-05-08  1:29     ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Florent Revest @ 2023-05-05 16:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, izbyshev, nd, broonie,
	szabolcs.nagy

On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote:
> > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
> > current with a flag that prevents pages that were previously not executable from
> > becoming executable.
> > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)
> >
> > At Google, we've been using a somewhat similar downstream patch for a few years
> > now. To make the adoption of this feature easier, we've had it support a mode in
> > which the W^X flag does not propagate to children. For example, this is handy if
> > a C process which wants W^X protection suspects it could start children
> > processes that would use a JIT.
> >
> > I'd like to align our features with the upstream prctl. This series proposes a
> > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It
> > sets a different flag in current that is not in MMF_INIT_MASK and which does not
> > propagate.
>
> I don't think I have enough context, so sorry if I'm going to ask a naive
> question..

Not at all! :) You're absolutely right, it's important to address these points.

> I can understand how current MDWE helps on not allowing any modifi-able
> content from becoming executable.  How could NO_INHERIT help if it won't
> inherit and not in MMF_INIT_MASK?

The way I see it, enabling MDWE is just a small step towards hardening
a binary anyway. It can possibly make exploitation a bit harder in the
case where the attacker has _just_: a write primitive they can use to
write a shellcode somewhere and a primitive to make that page
executable later. It's a fairly narrow protection already and I think
it only really helps as part of a broader "defense in depth" strategy.

> IIUC it means the restriction will only apply to the current process.  Then
> I assume the process can escape from this rule simply by a fork().  If so,
> what's the point to protect at all?

If we assume enough control from the attacker, then MDWE is already
useless since it can be bypassed by writing to a file and then
mmapping that file with PROT_EXEC. I think that's a good example of
how "perfect can be the enemy of good" in security hardening. MDWE
isn't a silver-bullet but it's a cheap trick and it makes a small dent
in reducing the attack surface so it seems worth having anyway ?

But indeed, to address your question, if you choose to use this
NO_INHERIT flag: you're no longer protected if the attacker can fork()
as part of their exploitation. I think it's been a useful trade-off
for our internal users since, on the other hand, it also makes
adoption a lot easier: our C++ services developers can trivially opt
into a potpourri of hardening features without having to think too
much about how they work under-the-hood. The default behavior has been
to use a NO_INHERIT strategy so users don't get bad surprises the day
when they try to spawn a JITted subcommand. In the meantime, their C++
service still gets a little bit of extra protection.

> And, what's the difference of this comparing to disabling MDWE after being
> enabled (which seems to be forbidden for now, but it seems fork() can play
> a similar role of disabling it)?

That would be functionally somewhat similar, yes. I think it mostly
comes down to ease of adoption. I imagine that users who would opt
into NO_INHERIT are those who are interested in MDWE for the binary
they are writing but aren't 100% confident in what subprocesses they
will run and so they don't have to think about disabling it after
every fork.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute
  2023-05-04 20:29   ` Alexey Izbyshev
@ 2023-05-05 16:42     ` Florent Revest
  2023-05-05 21:26       ` Alexey Izbyshev
  0 siblings, 1 reply; 19+ messages in thread
From: Florent Revest @ 2023-05-05 16:42 UTC (permalink / raw)
  To: Alexey Izbyshev
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, peterx, nd, broonie,
	szabolcs.nagy

On Thu, May 4, 2023 at 10:30 PM Alexey Izbyshev <izbyshev@ispras.ru> wrote:
>
> On 2023-05-04 20:09, Florent Revest wrote:
> > Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
> > PR_SET_MDWE prctl.
> >
> > Signed-off-by: Florent Revest <revest@chromium.org>
> > ---
> >  tools/testing/selftests/mm/mdwe_test.c | 95 ++++++++++++++++++++++++--
> >  1 file changed, 89 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/mm/mdwe_test.c
> > b/tools/testing/selftests/mm/mdwe_test.c
> > index 91aa9c3099e7..9f08ed1b99ae 100644
> > --- a/tools/testing/selftests/mm/mdwe_test.c
> > +++ b/tools/testing/selftests/mm/mdwe_test.c
> > @@ -22,6 +22,8 @@
> >
> >  TEST(prctl_flags)
> >  {
> > +     EXPECT_LT(prctl(PR_SET_MDWE, PR_MDWE_NO_INHERIT, 0L, 0L, 7L), 0);
> > +
>
> PR_MDWE_NO_INHERIT is defined to an int constant, so passing it to
> prctl() without a cast to long or similar may produce wrong code on
> 64-bit targets (ABIs typically don't require the compiler to clear the
> upper 32 bits of a 64-bit register when passing a 32-bit argument, so
> va_arg(arg, unsigned long) in prctl() implementation might get junk).

Ah, good catch Alexey! :)

> Arguably, defining PR_MDWE_* to plain int constants is a bug, or at
> least a footgun for users of uapi headers.

As part of the next version of this series, I'm happy to:
1- change the existing PR_MDWE_REFUSE_EXEC_GAIN to 1UL
2- introduce PR_MDWE_NO_INHERIT as 2UL

But I'm surprised that most of the macros in
include/uapi/linux/prctl.h are the same sort of footguns already ?
Hasn't it been an issue for other prctls yet ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl
  2023-05-04 17:09 ` [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl Florent Revest
@ 2023-05-05 18:34   ` Catalin Marinas
  2023-05-08 12:11     ` Florent Revest
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2023-05-05 18:34 UTC (permalink / raw)
  To: Florent Revest
  Cc: linux-kernel, linux-mm, akpm, anshuman.khandual, joey.gouly,
	mhocko, keescook, david, peterx, izbyshev, nd, broonie,
	szabolcs.nagy

On Thu, May 04, 2023 at 07:09:41PM +0200, Florent Revest wrote:
> This extends the current PR_SET_MDWE prctl arg with a bit to indicate
> that the process doesn't want MDWE protection to propagate to children.
> 
> To implement this no-inherit mode, the tag in current->mm->flags must be
> absent from MMF_INIT_MASK. This means that the encoding for "MDWE but
> without inherit" is different in the prctl than in the mm flags. This
> leads to a bit of bit-mangling in the prctl implementation.

That bit mangling is not that bad but it complicates the code a bit,
especially if we'll add new bits in the future. We also need to check
both the original and the no-inherit bits for each feature.

Another question is whether we want to support more fine-grained
inheriting or just a big knob that disables inheriting for all the
(future) MDWE flags.

I think a somewhat simpler way would be to clear the flags on fork(),
either based on a big MMF_HAS_MDWE_NO_INHERIT knob or individual ones.
Something like below (completely untested):

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 0ee96ea7a0e9..ca83a0c8d19c 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -91,4 +91,12 @@ static inline int get_dumpable(struct mm_struct *mm)
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)

 #define MMF_VM_MERGE_ANY	29
+
+#define MMF_INIT_FLAGS(flags)	({				\
+	unsigned long new_flags = flags;			\
+	if (new_flags & (1UL << MMF_HAS_MDWE_NO_INHERIT))	\
+		new_flags &= ~(1UL << MMF_HAS_MDWE_MASK);	\
+	new_flags & MMF_INIT_MASK;				\
+})
+
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index ed4e01daccaa..53f0b68a5451 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1288,7 +1288,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	hugetlb_count_init(mm);

 	if (current->mm) {
-		mm->flags = current->mm->flags & MMF_INIT_MASK;
+		mm->flags = MMF_INIT_FLAGS(current->mm->flags);
 		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
 	} else {
 		mm->flags = default_dump_filter;

The checks in MMF_INIT_FLAGS() can grow in time if we add more bits in
there but we still only keep a single flag that determines whether the
feature is enabled (maybe that's more like bikeshedding at this moment
when we have a single bit).


(fun remark: I see you cc'ed nd@arm.com'; that's not a real person, it's
what our IT folk asked us to add on cc so that the Exchange server
doesn't append the legal disclaimer; most lists are covered already
without such cc but I guess people feel safer to add it, just in case)

-- 
Catalin

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute
  2023-05-05 16:42     ` Florent Revest
@ 2023-05-05 21:26       ` Alexey Izbyshev
  2023-05-08 12:12         ` Florent Revest
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Izbyshev @ 2023-05-05 21:26 UTC (permalink / raw)
  To: Florent Revest
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, peterx, nd, broonie,
	szabolcs.nagy

On 2023-05-05 19:42, Florent Revest wrote:
> On Thu, May 4, 2023 at 10:30 PM Alexey Izbyshev <izbyshev@ispras.ru> 
> wrote:
>> 
>> On 2023-05-04 20:09, Florent Revest wrote:
>> > Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
>> > PR_SET_MDWE prctl.
>> >
>> > Signed-off-by: Florent Revest <revest@chromium.org>
>> > ---
>> >  tools/testing/selftests/mm/mdwe_test.c | 95 ++++++++++++++++++++++++--
>> >  1 file changed, 89 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/tools/testing/selftests/mm/mdwe_test.c
>> > b/tools/testing/selftests/mm/mdwe_test.c
>> > index 91aa9c3099e7..9f08ed1b99ae 100644
>> > --- a/tools/testing/selftests/mm/mdwe_test.c
>> > +++ b/tools/testing/selftests/mm/mdwe_test.c
>> > @@ -22,6 +22,8 @@
>> >
>> >  TEST(prctl_flags)
>> >  {
>> > +     EXPECT_LT(prctl(PR_SET_MDWE, PR_MDWE_NO_INHERIT, 0L, 0L, 7L), 0);
>> > +
>> 
>> PR_MDWE_NO_INHERIT is defined to an int constant, so passing it to
>> prctl() without a cast to long or similar may produce wrong code on
>> 64-bit targets (ABIs typically don't require the compiler to clear the
>> upper 32 bits of a 64-bit register when passing a 32-bit argument, so
>> va_arg(arg, unsigned long) in prctl() implementation might get junk).
> 
> Ah, good catch Alexey! :)
> 
>> Arguably, defining PR_MDWE_* to plain int constants is a bug, or at
>> least a footgun for users of uapi headers.
> 
> As part of the next version of this series, I'm happy to:
> 1- change the existing PR_MDWE_REFUSE_EXEC_GAIN to 1UL
> 2- introduce PR_MDWE_NO_INHERIT as 2UL
> 
Yes, I think it's the right thing to do. I suggest to spell them as (1UL 
<< 0), etc. for consistency with all other unsigned long flags in the 
header.

> But I'm surprised that most of the macros in
> include/uapi/linux/prctl.h are the same sort of footguns already ?
> Hasn't it been an issue for other prctls yet ?

Yes, they are. I'm not aware of a public discussion of this specific 
issue, but note that at least for some prctl() options the kernel 
doesn't care about upper bits because arguments are truncated before 
doing anything else with them (e.g. for PR_SCHED_CORE raw prctl() 
arguments are implicitly converted to what sched_core_share_pid() 
expects). Also, actually getting junk in the upper bits might not always 
be easy (e.g. on x86-64 all or almost all instructions with r32 
destination operand clear the upper bits). Unfortunately, I don't have a 
better answer than this.

Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-05 16:42   ` Florent Revest
@ 2023-05-08  1:29     ` Peter Xu
  2023-05-08 12:12       ` Florent Revest
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2023-05-08  1:29 UTC (permalink / raw)
  To: Florent Revest
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, izbyshev, nd, broonie,
	szabolcs.nagy

On Fri, May 05, 2023 at 06:42:08PM +0200, Florent Revest wrote:
> On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote:
> > > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
> > > current with a flag that prevents pages that were previously not executable from
> > > becoming executable.
> > > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)
> > >
> > > At Google, we've been using a somewhat similar downstream patch for a few years
> > > now. To make the adoption of this feature easier, we've had it support a mode in
> > > which the W^X flag does not propagate to children. For example, this is handy if
> > > a C process which wants W^X protection suspects it could start children
> > > processes that would use a JIT.
> > >
> > > I'd like to align our features with the upstream prctl. This series proposes a
> > > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It
> > > sets a different flag in current that is not in MMF_INIT_MASK and which does not
> > > propagate.
> >
> > I don't think I have enough context, so sorry if I'm going to ask a naive
> > question..
> 
> Not at all! :) You're absolutely right, it's important to address these points.
> 
> > I can understand how current MDWE helps on not allowing any modifi-able
> > content from becoming executable.  How could NO_INHERIT help if it won't
> > inherit and not in MMF_INIT_MASK?
> 
> The way I see it, enabling MDWE is just a small step towards hardening
> a binary anyway. It can possibly make exploitation a bit harder in the
> case where the attacker has _just_: a write primitive they can use to
> write a shellcode somewhere and a primitive to make that page
> executable later. It's a fairly narrow protection already and I think
> it only really helps as part of a broader "defense in depth" strategy.
> 
> > IIUC it means the restriction will only apply to the current process.  Then
> > I assume the process can escape from this rule simply by a fork().  If so,
> > what's the point to protect at all?
> 
> If we assume enough control from the attacker, then MDWE is already
> useless since it can be bypassed by writing to a file and then
> mmapping that file with PROT_EXEC. I think that's a good example of
> how "perfect can be the enemy of good" in security hardening. MDWE
> isn't a silver-bullet but it's a cheap trick and it makes a small dent
> in reducing the attack surface so it seems worth having anyway ?
> 
> But indeed, to address your question, if you choose to use this
> NO_INHERIT flag: you're no longer protected if the attacker can fork()
> as part of their exploitation. I think it's been a useful trade-off
> for our internal users since, on the other hand, it also makes
> adoption a lot easier: our C++ services developers can trivially opt
> into a potpourri of hardening features without having to think too
> much about how they work under-the-hood. The default behavior has been
> to use a NO_INHERIT strategy so users don't get bad surprises the day
> when they try to spawn a JITted subcommand. In the meantime, their C++
> service still gets a little bit of extra protection.
> 
> > And, what's the difference of this comparing to disabling MDWE after being
> > enabled (which seems to be forbidden for now, but it seems fork() can play
> > a similar role of disabling it)?
> 
> That would be functionally somewhat similar, yes. I think it mostly
> comes down to ease of adoption. I imagine that users who would opt
> into NO_INHERIT are those who are interested in MDWE for the binary
> they are writing but aren't 100% confident in what subprocesses they
> will run and so they don't have to think about disabling it after
> every fork.

Okay, that makes sense to me.  Thanks.

Since the original MDWE was for systemd, I'm wondering what will happen if
some program like what you said is invoked by systemd and with MDWE enabled
already.

Currently in your patch IIUC MDWE_NO_INHERIT will fail directly on MDWE
enabled process, but then it makes me think whether it makes more sense to
allow converting MDWE->MDWE_NO_INHERIT in this case.  It seems to provide a
most broad coverage on system daemons using MDWE starting from systemd
initial process, meanwhile allows specific daemons to fork into anything
like a JIT process so it can make itself NO_INHERIT.  Attackers won't
leverage this because MDWE_NO_INHERIT also means MDWE enabled.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl
  2023-05-05 18:34   ` Catalin Marinas
@ 2023-05-08 12:11     ` Florent Revest
  0 siblings, 0 replies; 19+ messages in thread
From: Florent Revest @ 2023-05-08 12:11 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-kernel, linux-mm, akpm, anshuman.khandual, joey.gouly,
	mhocko, keescook, david, peterx, izbyshev, nd, broonie,
	szabolcs.nagy

On Fri, May 5, 2023 at 8:34 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, May 04, 2023 at 07:09:41PM +0200, Florent Revest wrote:
> > This extends the current PR_SET_MDWE prctl arg with a bit to indicate
> > that the process doesn't want MDWE protection to propagate to children.
> >
> > To implement this no-inherit mode, the tag in current->mm->flags must be
> > absent from MMF_INIT_MASK. This means that the encoding for "MDWE but
> > without inherit" is different in the prctl than in the mm flags. This
> > leads to a bit of bit-mangling in the prctl implementation.
>
> That bit mangling is not that bad but it complicates the code a bit,
> especially if we'll add new bits in the future. We also need to check
> both the original and the no-inherit bits for each feature.

I agree :)

> Another question is whether we want to support more fine-grained
> inheriting or just a big knob that disables inheriting for all the
> (future) MDWE flags.

Yep, I can't think of a more fine-grained inheritance model off the
top of my head but it would be good to have a sane base for when the
need arises.

> I think a somewhat simpler way would be to clear the flags on fork(),
> either based on a big MMF_HAS_MDWE_NO_INHERIT knob or individual ones.
> Something like below (completely untested):
>
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 0ee96ea7a0e9..ca83a0c8d19c 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -91,4 +91,12 @@ static inline int get_dumpable(struct mm_struct *mm)
>                                  MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
>
>  #define MMF_VM_MERGE_ANY       29
> +
> +#define MMF_INIT_FLAGS(flags)  ({                              \
> +       unsigned long new_flags = flags;                        \
> +       if (new_flags & (1UL << MMF_HAS_MDWE_NO_INHERIT))       \
> +               new_flags &= ~(1UL << MMF_HAS_MDWE_MASK);       \
> +       new_flags & MMF_INIT_MASK;                              \
> +})
> +
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index ed4e01daccaa..53f0b68a5451 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1288,7 +1288,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>         hugetlb_count_init(mm);
>
>         if (current->mm) {
> -               mm->flags = current->mm->flags & MMF_INIT_MASK;
> +               mm->flags = MMF_INIT_FLAGS(current->mm->flags);
>                 mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
>         } else {
>                 mm->flags = default_dump_filter;
>
> The checks in MMF_INIT_FLAGS() can grow in time if we add more bits in
> there but we still only keep a single flag that determines whether the
> feature is enabled (maybe that's more like bikeshedding at this moment
> when we have a single bit).

Sounds good!

I had considered something like this but I was afraid I'd spill too
much logic into fork.c... I didn't think of making it a neat macro in
coredump.h. That's a good point, I'll do this in v2.

> (fun remark: I see you cc'ed nd@arm.com'; that's not a real person, it's
> what our IT folk asked us to add on cc so that the Exchange server
> doesn't append the legal disclaimer; most lists are covered already
> without such cc but I guess people feel safer to add it, just in case)

Ahah! I mostly just copied the cc list from Joey's series. I remember
wondering why I couldn't find any patch sent by this mysterious ND but
I thought that if they got such a cool username, surely they must have
been at ARM since the early days and have some important role... :)
Then... mister nd won't get to see my v2! Thanks for the heads up.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute
  2023-05-05 21:26       ` Alexey Izbyshev
@ 2023-05-08 12:12         ` Florent Revest
  0 siblings, 0 replies; 19+ messages in thread
From: Florent Revest @ 2023-05-08 12:12 UTC (permalink / raw)
  To: Alexey Izbyshev
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, peterx, nd, broonie,
	szabolcs.nagy

On Fri, May 5, 2023 at 11:26 PM Alexey Izbyshev <izbyshev@ispras.ru> wrote:
>
> On 2023-05-05 19:42, Florent Revest wrote:
> > On Thu, May 4, 2023 at 10:30 PM Alexey Izbyshev <izbyshev@ispras.ru>
> > wrote:
> >>
> >> On 2023-05-04 20:09, Florent Revest wrote:
> >> > Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
> >> > PR_SET_MDWE prctl.
> >> >
> >> > Signed-off-by: Florent Revest <revest@chromium.org>
> >> > ---
> >> >  tools/testing/selftests/mm/mdwe_test.c | 95 ++++++++++++++++++++++++--
> >> >  1 file changed, 89 insertions(+), 6 deletions(-)
> >> >
> >> > diff --git a/tools/testing/selftests/mm/mdwe_test.c
> >> > b/tools/testing/selftests/mm/mdwe_test.c
> >> > index 91aa9c3099e7..9f08ed1b99ae 100644
> >> > --- a/tools/testing/selftests/mm/mdwe_test.c
> >> > +++ b/tools/testing/selftests/mm/mdwe_test.c
> >> > @@ -22,6 +22,8 @@
> >> >
> >> >  TEST(prctl_flags)
> >> >  {
> >> > +     EXPECT_LT(prctl(PR_SET_MDWE, PR_MDWE_NO_INHERIT, 0L, 0L, 7L), 0);
> >> > +
> >>
> >> PR_MDWE_NO_INHERIT is defined to an int constant, so passing it to
> >> prctl() without a cast to long or similar may produce wrong code on
> >> 64-bit targets (ABIs typically don't require the compiler to clear the
> >> upper 32 bits of a 64-bit register when passing a 32-bit argument, so
> >> va_arg(arg, unsigned long) in prctl() implementation might get junk).
> >
> > Ah, good catch Alexey! :)
> >
> >> Arguably, defining PR_MDWE_* to plain int constants is a bug, or at
> >> least a footgun for users of uapi headers.
> >
> > As part of the next version of this series, I'm happy to:
> > 1- change the existing PR_MDWE_REFUSE_EXEC_GAIN to 1UL
> > 2- introduce PR_MDWE_NO_INHERIT as 2UL
> >
> Yes, I think it's the right thing to do. I suggest to spell them as (1UL
> << 0), etc. for consistency with all other unsigned long flags in the
> header.

Ah yeah, absolutely! Good tip too, thank you :)

> > But I'm surprised that most of the macros in
> > include/uapi/linux/prctl.h are the same sort of footguns already ?
> > Hasn't it been an issue for other prctls yet ?
>
> Yes, they are. I'm not aware of a public discussion of this specific
> issue, but note that at least for some prctl() options the kernel
> doesn't care about upper bits because arguments are truncated before
> doing anything else with them (e.g. for PR_SCHED_CORE raw prctl()

That makes sense

> arguments are implicitly converted to what sched_core_share_pid()
> expects). Also, actually getting junk in the upper bits might not always
> be easy (e.g. on x86-64 all or almost all instructions with r32
> destination operand clear the upper bits). Unfortunately, I don't have a
> better answer than this.

Okay, I was just curious, that's good to know

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-08  1:29     ` Peter Xu
@ 2023-05-08 12:12       ` Florent Revest
  2023-05-08 14:10         ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Florent Revest @ 2023-05-08 12:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, akpm, catalin.marinas, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, izbyshev, nd, broonie,
	szabolcs.nagy, toiwoton, lennart

On Mon, May 8, 2023 at 3:29 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, May 05, 2023 at 06:42:08PM +0200, Florent Revest wrote:
> > On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote:
> > > > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
> > > > current with a flag that prevents pages that were previously not executable from
> > > > becoming executable.
> > > > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)
> > > >
> > > > At Google, we've been using a somewhat similar downstream patch for a few years
> > > > now. To make the adoption of this feature easier, we've had it support a mode in
> > > > which the W^X flag does not propagate to children. For example, this is handy if
> > > > a C process which wants W^X protection suspects it could start children
> > > > processes that would use a JIT.
> > > >
> > > > I'd like to align our features with the upstream prctl. This series proposes a
> > > > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It
> > > > sets a different flag in current that is not in MMF_INIT_MASK and which does not
> > > > propagate.
> > >
> > > I don't think I have enough context, so sorry if I'm going to ask a naive
> > > question..
> >
> > Not at all! :) You're absolutely right, it's important to address these points.
> >
> > > I can understand how current MDWE helps on not allowing any modifi-able
> > > content from becoming executable.  How could NO_INHERIT help if it won't
> > > inherit and not in MMF_INIT_MASK?
> >
> > The way I see it, enabling MDWE is just a small step towards hardening
> > a binary anyway. It can possibly make exploitation a bit harder in the
> > case where the attacker has _just_: a write primitive they can use to
> > write a shellcode somewhere and a primitive to make that page
> > executable later. It's a fairly narrow protection already and I think
> > it only really helps as part of a broader "defense in depth" strategy.
> >
> > > IIUC it means the restriction will only apply to the current process.  Then
> > > I assume the process can escape from this rule simply by a fork().  If so,
> > > what's the point to protect at all?
> >
> > If we assume enough control from the attacker, then MDWE is already
> > useless since it can be bypassed by writing to a file and then
> > mmapping that file with PROT_EXEC. I think that's a good example of
> > how "perfect can be the enemy of good" in security hardening. MDWE
> > isn't a silver-bullet but it's a cheap trick and it makes a small dent
> > in reducing the attack surface so it seems worth having anyway ?
> >
> > But indeed, to address your question, if you choose to use this
> > NO_INHERIT flag: you're no longer protected if the attacker can fork()
> > as part of their exploitation. I think it's been a useful trade-off
> > for our internal users since, on the other hand, it also makes
> > adoption a lot easier: our C++ services developers can trivially opt
> > into a potpourri of hardening features without having to think too
> > much about how they work under-the-hood. The default behavior has been
> > to use a NO_INHERIT strategy so users don't get bad surprises the day
> > when they try to spawn a JITted subcommand. In the meantime, their C++
> > service still gets a little bit of extra protection.
> >
> > > And, what's the difference of this comparing to disabling MDWE after being
> > > enabled (which seems to be forbidden for now, but it seems fork() can play
> > > a similar role of disabling it)?
> >
> > That would be functionally somewhat similar, yes. I think it mostly
> > comes down to ease of adoption. I imagine that users who would opt
> > into NO_INHERIT are those who are interested in MDWE for the binary
> > they are writing but aren't 100% confident in what subprocesses they
> > will run and so they don't have to think about disabling it after
> > every fork.
>
> Okay, that makes sense to me.  Thanks.
>
> Since the original MDWE was for systemd, I'm wondering what will happen if
> some program like what you said is invoked by systemd and with MDWE enabled
> already.

Good question

> Currently in your patch IIUC MDWE_NO_INHERIT will fail directly on MDWE
> enabled process,

Yes, I tried to stay close to the spirit of the existing logic (which
doesn't allow any sort of privilege gains) but this is not
particularly a requirement on our side so I'm quite flexible here.
Maybe Joey has an input here ?

> but then it makes me think whether it makes more sense to
> allow converting MDWE->MDWE_NO_INHERIT in this case.  It seems to provide a
> most broad coverage on system daemons using MDWE starting from systemd
> initial process, meanwhile allows specific daemons to fork into anything
> like a JIT process so it can make itself NO_INHERIT.  Attackers won't
> leverage this because MDWE_NO_INHERIT also means MDWE enabled.

I should have cc-ed systemd folks who could have opinions on this. I
will for v2.

+ cc Topi & Lennart

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-08 12:12       ` Florent Revest
@ 2023-05-08 14:10         ` Catalin Marinas
  2023-05-08 17:21           ` Topi Miettinen
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2023-05-08 14:10 UTC (permalink / raw)
  To: Florent Revest
  Cc: Peter Xu, linux-kernel, linux-mm, akpm, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, izbyshev, nd, broonie,
	szabolcs.nagy, toiwoton, lennart

On Mon, May 08, 2023 at 02:12:21PM +0200, Florent Revest wrote:
> On Mon, May 8, 2023 at 3:29 AM Peter Xu <peterx@redhat.com> wrote:
> > On Fri, May 05, 2023 at 06:42:08PM +0200, Florent Revest wrote:
> > > On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote:
> > > > And, what's the difference of this comparing to disabling MDWE after being
> > > > enabled (which seems to be forbidden for now, but it seems fork() can play
> > > > a similar role of disabling it)?
> > >
> > > That would be functionally somewhat similar, yes. I think it mostly
> > > comes down to ease of adoption. I imagine that users who would opt
> > > into NO_INHERIT are those who are interested in MDWE for the binary
> > > they are writing but aren't 100% confident in what subprocesses they
> > > will run and so they don't have to think about disabling it after
> > > every fork.
> >
> > Okay, that makes sense to me.  Thanks.
> >
> > Since the original MDWE was for systemd, I'm wondering what will happen if
> > some program like what you said is invoked by systemd and with MDWE enabled
> > already.
> 
> Good question

I think JITs work around this by creating two separate mappings of the
same pages, one RX and the other RW (rather than toggling the permission
with mprotect()). I had the impression Chromium can use memfd to do
something similar but I never checked.

> > Currently in your patch IIUC MDWE_NO_INHERIT will fail directly on MDWE
> > enabled process,
> 
> Yes, I tried to stay close to the spirit of the existing logic (which
> doesn't allow any sort of privilege gains) but this is not
> particularly a requirement on our side so I'm quite flexible here.

I think we should keep the original behaviour of systemd here, otherwise
they won't transition to the new interface and keep using the SECCOMP
BPF approach (which, in addition, prevents glibc from setting PROT_BTI
on an already executable mapping).

To me MDWE is not about preventing JITs but rather ensuring buggy
programs don't end up with WX mappings. We ended up this way because of
the SECCOMP BPF limitations (just guessing, I haven't been involved in
its design). With a no-inherit MDWE, one can introduce an additional
policy for systemd. It would be a sysadmin decision which one to enable
and maybe current (inherit) MDWE will disappear in time.

x86 has protection keys and arm64 will soon have permission overlays
that allow user-space to toggle between RX and RW (Joey is looking at
the arm64 support). I'm not sure how we'll end up implemented this on
arm64 (and haven't looked at x86) but I have a suspicion MDWE will get
in the way as the base page table permission will probably need
PROT_WRITE|PROT_EXEC.

-- 
Catalin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-08 14:10         ` Catalin Marinas
@ 2023-05-08 17:21           ` Topi Miettinen
  2023-05-09 10:04             ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Topi Miettinen @ 2023-05-08 17:21 UTC (permalink / raw)
  To: Catalin Marinas, Florent Revest
  Cc: Peter Xu, linux-kernel, linux-mm, akpm, anshuman.khandual,
	joey.gouly, mhocko, keescook, david, izbyshev, nd, broonie,
	szabolcs.nagy, lennart

On 8.5.2023 17.10, Catalin Marinas wrote:
> I think we should keep the original behaviour of systemd here, otherwise
> they won't transition to the new interface and keep using the SECCOMP
> BPF approach (which, in addition, prevents glibc from setting PROT_BTI
> on an already executable mapping).

Systemd has transitioned to prctl(PR_SET_MDWE) method since release of 
v253, so the original behaviour definitely should be kept.

> To me MDWE is not about preventing JITs but rather ensuring buggy
> programs don't end up with WX mappings. We ended up this way because of
> the SECCOMP BPF limitations (just guessing, I haven't been involved in
> its design). With a no-inherit MDWE, one can introduce an additional
> policy for systemd. It would be a sysadmin decision which one to enable
> and maybe current (inherit) MDWE will disappear in time.

There could be a new setting for this, like 
MemoryDenyWriteExecute=no-inherit. I'd only use it for those special 
cases where MemoryDenyWriteExecute=yes can't be used.

> x86 has protection keys and arm64 will soon have permission overlays
> that allow user-space to toggle between RX and RW (Joey is looking at
> the arm64 support). I'm not sure how we'll end up implemented this on
> arm64 (and haven't looked at x86) but I have a suspicion MDWE will get
> in the way as the base page table permission will probably need
> PROT_WRITE|PROT_EXEC.

Wouldn't those features defeat any gains from MDWE? The features 
probably should be forbidden with MemoryDenyWriteExecute=yes.

-Topi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] MDWE without inheritance
  2023-05-08 17:21           ` Topi Miettinen
@ 2023-05-09 10:04             ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2023-05-09 10:04 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Florent Revest, Peter Xu, linux-kernel, linux-mm, akpm,
	anshuman.khandual, joey.gouly, mhocko, keescook, david, izbyshev,
	nd, broonie, szabolcs.nagy, lennart

On Mon, May 08, 2023 at 08:21:16PM +0300, Topi Miettinen wrote:
> On 8.5.2023 17.10, Catalin Marinas wrote:
> > I think we should keep the original behaviour of systemd here, otherwise
> > they won't transition to the new interface and keep using the SECCOMP
> > BPF approach (which, in addition, prevents glibc from setting PROT_BTI
> > on an already executable mapping).
> 
> Systemd has transitioned to prctl(PR_SET_MDWE) method since release of v253,
> so the original behaviour definitely should be kept.

That's great. So yes, no ABI changes allowed anymore.

> > x86 has protection keys and arm64 will soon have permission overlays
> > that allow user-space to toggle between RX and RW (Joey is looking at
> > the arm64 support). I'm not sure how we'll end up implemented this on
> > arm64 (and haven't looked at x86) but I have a suspicion MDWE will get
> > in the way as the base page table permission will probably need
> > PROT_WRITE|PROT_EXEC.
> 
> Wouldn't those features defeat any gains from MDWE? The features probably
> should be forbidden with MemoryDenyWriteExecute=yes.

The permission overlays (controlled by the user) can only further
restrict the mmap() permissions. So MDWE would still work as expected.
If one wants to toggle between RW and RX with overlays, the overall
mmap() needs to be RWX and it won't work if MDWE=yes. No need to
explicitly disable the overlays feature.

On arm64 at least, with the introduction of permission overlays we also
have the notion of "Read, Execute if not Write". This permission
automatically disables Exec if the mapping becomes writable (overlays
can disable writable, allowing exec). We could have a new MDWE policy
which allows this, though I'm not that keen on using it in Linux since
background permission changes done by the kernel can lead to an
unexpected executable permission (e.g. marking a page read-only for
clean/dirty tracking or in preparation for CoW after fork()).

-- 
Catalin

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-09 10:04 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-04 17:09 [PATCH 0/4] MDWE without inheritance Florent Revest
2023-05-04 17:09 ` [PATCH 1/4] kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test Florent Revest
2023-05-04 17:09 ` [PATCH 2/4] kselftest: vm: Fix mdwe's mmap_FIXED test case Florent Revest
2023-05-04 17:13   ` Florent Revest
2023-05-04 17:09 ` [PATCH 3/4] mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl Florent Revest
2023-05-05 18:34   ` Catalin Marinas
2023-05-08 12:11     ` Florent Revest
2023-05-04 17:09 ` [PATCH 4/4] kselftest: vm: Add tests for no-inherit memory-deny-write-execute Florent Revest
2023-05-04 20:29   ` Alexey Izbyshev
2023-05-05 16:42     ` Florent Revest
2023-05-05 21:26       ` Alexey Izbyshev
2023-05-08 12:12         ` Florent Revest
2023-05-04 20:06 ` [PATCH 0/4] MDWE without inheritance Peter Xu
2023-05-05 16:42   ` Florent Revest
2023-05-08  1:29     ` Peter Xu
2023-05-08 12:12       ` Florent Revest
2023-05-08 14:10         ` Catalin Marinas
2023-05-08 17:21           ` Topi Miettinen
2023-05-09 10:04             ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.