* [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
@ 2023-12-08  9:06 Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 1/5] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-08  9:06 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

Background
==========

In our containerized environment, we've identified unexpected OOM events
where the OOM-killer terminates tasks even though the system has ample free
memory. This anomaly is traced back to tasks within a container using
mbind(2) to bind memory to a specific NUMA node. When the memory on this
node is exhausted, the OOM-killer, which prioritizes tasks based on
oom_score, kills tasks indiscriminately.
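
For illustration, the kind of binding described above can be sketched in
user space as follows. This is a minimal sketch, not code from the
patchset: the helper names single_node_mask() and map_bound_to_node() are
hypothetical, SKETCH_MPOL_BIND is a local copy of the MPOL_BIND value, and
the raw __NR_mbind syscall is used to avoid a libnuma dependency. The
maxnode argument is given in bits here.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* MPOL_BIND is 2 in <linux/mempolicy.h>; copied locally (assumption of
 * this sketch) so it builds without the kernel UAPI headers. */
#define SKETCH_MPOL_BIND 2

/* One-word nodemask with only `node` set, or 0 if `node` does not fit. */
unsigned long single_node_mask(unsigned int node)
{
	if (node >= sizeof(unsigned long) * 8)
		return 0;
	return 1UL << node;
}

/*
 * Map `len` bytes of anonymous memory and bind them to `node`.
 * Returns the mapping, or NULL if the mask, mmap() or mbind() fails.
 */
void *map_bound_to_node(size_t len, unsigned int node)
{
	unsigned long mask = single_node_mask(node);
	void *addr;

	if (!mask)
		return NULL;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* Raw syscall: mbind(addr, len, mode, nodemask, maxnode, flags). */
	if (syscall(__NR_mbind, addr, len, SKETCH_MPOL_BIND, &mask,
		    sizeof(mask) * 8, 0) != 0) {
		munmap(addr, len);
		return NULL;
	}
	return addr;
}
```

Pages faulted into a region bound this way must come from the chosen node,
so once that node is exhausted the allocation cannot fall back elsewhere.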

The Challenge 
=============
In a containerized environment, independent memory binding by a user can
lead to unexpected system issues or disrupt tasks being run by other users
on the same server. If a user genuinely requires memory binding, we will
allocate dedicated servers to them by leveraging kubelet deployment.

Currently, users possess the ability to autonomously bind their memory to
specific nodes without explicit agreement or authorization from our end.
It's imperative that we establish a method to prevent this behavior.

Proposed Solution
=================

- Capability
  Currently, any task can perform MPOL_BIND without specific capabilities.
  Enforcing CAP_SYS_RESOURCE or CAP_SYS_NICE could be an option, but this
  may have unintended consequences. Capabilities, being broad, might grant
  unnecessary privileges. We should explore alternatives to prevent
  unexpected side effects.

- LSM 
  Introduce LSM hooks for syscalls such as mbind(2) and set_mempolicy(2)
  to disable MPOL_BIND. This approach offers more flexibility and allows for
  fine-grained control without unintended consequences. A sample LSM BPF
  program is included, demonstrating practical implementation in a
  production environment.

- seccomp
  seccomp is relatively heavyweight, making it less suitable for
  enabling in our production environment:
  - Both kubelet and containers need adaptation to support it.
  - Dynamically altering security policies for individual containers
    without interrupting their operations isn't straightforward.

Future Considerations
=====================

In addition, there's room for enhancement in the OOM-killer for cases
involving CONSTRAINT_MEMORY_POLICY. It would be more beneficial to
prioritize selecting a victim that has allocated memory on the same NUMA
node. A search on lore led me to a proposal [0] related to this
matter, although consensus seems elusive at this point. Nevertheless,
delving into this specific topic is beyond the scope of the current
patchset.

[0]. https://lore.kernel.org/lkml/20220512044634.63586-1-ligang.bdlg@bytedance.com/
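
The victim-selection idea mentioned above can be modelled in a few lines.
This is only a user-space sketch under stated assumptions, not the linked
proposal and not kernel code: struct task_info, SKETCH_MAX_NODES and
pick_victim_on_node() are hypothetical names, and the per-node page counts
are assumed to be available to the caller.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 4	/* hypothetical node count for the model */

/* Hypothetical per-task summary used only by this sketch. */
struct task_info {
	int pid;
	long oom_score;		/* higher means more likely to be killed */
	unsigned long node_pages[SKETCH_MAX_NODES]; /* pages per NUMA node */
};

/*
 * Prefer the task holding the most pages on `node`; ties are broken by
 * the higher oom_score. Tasks with no pages on `node` are never chosen.
 * Returns NULL if no task qualifies or `node` is out of range.
 */
const struct task_info *pick_victim_on_node(const struct task_info *tasks,
					    size_t n, unsigned int node)
{
	const struct task_info *best = NULL;
	size_t i;

	if (node >= SKETCH_MAX_NODES)
		return NULL;

	for (i = 0; i < n; i++) {
		unsigned long pages = tasks[i].node_pages[node];

		if (pages == 0)
			continue;
		if (!best || pages > best->node_pages[node] ||
		    (pages == best->node_pages[node] &&
		     tasks[i].oom_score > best->oom_score))
			best = &tasks[i];
	}
	return best;
}
```

In this model a task with a high oom_score but no memory on the exhausted
node is skipped, since killing it would not relieve that node.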

Changes:
- v3 -> v4: 
  - Drop the changes around security_task_movememory (Serge) 
- RFC v2 -> v3: https://lwn.net/Articles/953526/
  - Add MPOL_F_NUMA_BALANCING man-page (Ying)
  - Fix bpf selftests error reported by bot+bpf-ci
- RFC v1 -> RFC v2: https://lwn.net/Articles/952339/
  - Refine the commit log to avoid misleading readers
  - Use one common lsm hook instead and add comment for it
  - Add selinux implementation
  - Other improvements in mempolicy
- RFC v1: https://lwn.net/Articles/951188/

Yafang Shao (5):
  mm, doc: Add doc for MPOL_F_NUMA_BALANCING
  mm: mempolicy: Revise comment regarding mempolicy mode flags
  mm, security: Add lsm hook for memory policy adjustment
  security: selinux: Implement set_mempolicy hook
  selftests/bpf: Add selftests for set_mempolicy with a lsm prog

 .../admin-guide/mm/numa_memory_policy.rst          | 27 ++++++++
 include/linux/lsm_hook_defs.h                      |  3 +
 include/linux/security.h                           |  9 +++
 include/uapi/linux/mempolicy.h                     |  2 +-
 mm/mempolicy.c                                     |  8 +++
 security/security.c                                | 13 ++++
 security/selinux/hooks.c                           |  8 +++
 security/selinux/include/classmap.h                |  2 +-
 .../selftests/bpf/prog_tests/set_mempolicy.c       | 81 ++++++++++++++++++++++
 .../selftests/bpf/progs/test_set_mempolicy.c       | 28 ++++++++
 10 files changed, 179 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c

-- 
1.8.3.1



* [PATCH v4 1/5] mm, doc: Add doc for MPOL_F_NUMA_BALANCING
  2023-12-08  9:06 [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
@ 2023-12-08  9:06 ` Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 2/5] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-08  9:06 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

The MPOL_F_NUMA_BALANCING document was inadvertently omitted from the
initial commit bda420b98505 ("numa balancing: migrate on fault among
multiple bound nodes").

Let's ensure its inclusion.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
 .../admin-guide/mm/numa_memory_policy.rst          | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index eca38fa..19071b71 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -332,6 +332,33 @@ MPOL_F_RELATIVE_NODES
 	MPOL_PREFERRED policies that were created with an empty nodemask
 	(local allocation).
 
+MPOL_F_NUMA_BALANCING (since Linux 5.12)
+        When operating in MPOL_BIND mode, enables NUMA balancing for tasks,
+        contingent upon kernel support. This feature optimizes page
+        placement within the confines of the specified memory binding
+        policy. The addition of the MPOL_F_NUMA_BALANCING flag augments the
+        control mechanism for NUMA balancing:
+
+        - The sysctl knob numa_balancing governs global activation or
+          deactivation of NUMA balancing.
+
+        - Even if sysctl numa_balancing is enabled, NUMA balancing remains
+          disabled by default for memory areas or applications utilizing
+          explicit memory policies.
+
+        - The MPOL_F_NUMA_BALANCING flag facilitates NUMA balancing
+          activation for applications employing explicit memory policies
+          (MPOL_BIND).
+
+        This flag enables various optimizations for page placement through
+        NUMA balancing. For instance, when an application's memory is bound
+        to multiple nodes (MPOL_BIND), the hint page fault handler attempts
+        to migrate accessed pages to reduce cross-node access if the
+        accessing node aligns with the policy nodemask.
+
+        If the flag isn't supported by the kernel, or is used with a mode
+        other than MPOL_BIND, -1 is returned and errno is set to EINVAL.
+
 Memory Policy Reference Counting
 ================================
 
-- 
1.8.3.1



* [PATCH v4 2/5] mm: mempolicy: Revise comment regarding mempolicy mode flags
  2023-12-08  9:06 [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 1/5] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
@ 2023-12-08  9:06 ` Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-08  9:06 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao,
	Eric Dumazet

MPOL_F_STATIC_NODES, MPOL_F_RELATIVE_NODES, and MPOL_F_NUMA_BALANCING are
mode flags applicable to both set_mempolicy(2) and mbind(2) system calls.
It's worth noting that MPOL_F_NUMA_BALANCING was initially introduced in
commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
nodes") exclusively for set_mempolicy(2). However, it was later made a
shared flag for both set_mempolicy(2) and mbind(2) following
commit 6d2aec9e123b ("mm/mempolicy: do not allow illegal
MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()").

This revised version aims to clarify the details regarding the mode flags.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/mempolicy.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index a8963f7..afed4a4 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -26,7 +26,7 @@ enum {
 	MPOL_MAX,	/* always last member of enum */
 };
 
-/* Flags for set_mempolicy */
+/* Flags for set_mempolicy() or mbind() */
 #define MPOL_F_STATIC_NODES	(1 << 15)
 #define MPOL_F_RELATIVE_NODES	(1 << 14)
 #define MPOL_F_NUMA_BALANCING	(1 << 13) /* Optimize with NUMA balancing if possible */
-- 
1.8.3.1
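
As a reading aid for the flags discussed in this patch: user space ORs the
mode flags into the mode argument of set_mempolicy(2) or mbind(2), and the
two are separated again before the policy is applied. The sketch below
models that split in plain C. It is not the kernel's code: split_mode() is
a hypothetical name, the SK_ constants are local copies of the values shown
in the diff above, and the two validity rules are the ones I understand the
kernel to enforce (static and relative nodes are mutually exclusive, and
MPOL_F_NUMA_BALANCING requires MPOL_BIND).

```c
#include <errno.h>

/* Local copies of UAPI values (assumption: unchanged from the diff). */
#define SK_MPOL_BIND			2
#define SK_MPOL_F_STATIC_NODES		(1 << 15)
#define SK_MPOL_F_RELATIVE_NODES	(1 << 14)
#define SK_MPOL_F_NUMA_BALANCING	(1 << 13)
#define SK_MPOL_MODE_FLAGS	(SK_MPOL_F_STATIC_NODES | \
				 SK_MPOL_F_RELATIVE_NODES | \
				 SK_MPOL_F_NUMA_BALANCING)

/*
 * Split the combined syscall argument into mode and mode flags.
 * Returns 0 on success or -EINVAL for an invalid flag combination;
 * the outputs are written only on success.
 */
int split_mode(int arg, int *mode, unsigned short *mode_flags)
{
	unsigned short flags = arg & SK_MPOL_MODE_FLAGS;
	int m = arg & ~SK_MPOL_MODE_FLAGS;

	/* Static and relative nodemasks cannot be combined. */
	if ((flags & SK_MPOL_F_STATIC_NODES) &&
	    (flags & SK_MPOL_F_RELATIVE_NODES))
		return -EINVAL;

	/* NUMA balancing is only meaningful for MPOL_BIND. */
	if ((flags & SK_MPOL_F_NUMA_BALANCING) && m != SK_MPOL_BIND)
		return -EINVAL;

	*mode = m;
	*mode_flags = flags;
	return 0;
}
```

For example, passing SK_MPOL_BIND | SK_MPOL_F_NUMA_BALANCING yields mode 2
with the balancing flag set, while the same flag with mode 0 is rejected.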



* [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment
  2023-12-08  9:06 [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 1/5] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 2/5] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
@ 2023-12-08  9:06 ` Yafang Shao
  2023-12-08 17:30   ` Casey Schaufler
  2023-12-08  9:06 ` [PATCH v4 4/5] security: selinux: Implement set_mempolicy hook Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
  4 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2023-12-08  9:06 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

In a containerized environment, independent memory binding by a user can
lead to unexpected system issues or disrupt tasks being run by other users
on the same server. If a user genuinely requires memory binding, we will
allocate dedicated servers to them by leveraging kubelet deployment.

At present, users have the capability to bind their memory to a specific
node without explicit agreement or authorization from us. Consequently, a
new LSM hook is introduced to mitigate this. This implementation allows us
to exercise fine-grained control over memory policy adjustments within our
container environment.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/lsm_hook_defs.h |  3 +++
 include/linux/security.h      |  9 +++++++++
 mm/mempolicy.c                |  8 ++++++++
 security/security.c           | 13 +++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ff217a5..5580127 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -419,3 +419,6 @@
 LSM_HOOK(int, 0, uring_sqpoll, void)
 LSM_HOOK(int, 0, uring_cmd, struct io_uring_cmd *ioucmd)
 #endif /* CONFIG_IO_URING */
+
+LSM_HOOK(int, 0, set_mempolicy, unsigned long mode, unsigned short mode_flags,
+	 nodemask_t *nmask, unsigned int flags)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1d1df326..cc4a19a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -484,6 +484,8 @@ int security_setprocattr(const char *lsm, const char *name, void *value,
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
 int security_locked_down(enum lockdown_reason what);
+int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+			   nodemask_t *nmask, unsigned int flags);
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1395,6 +1397,13 @@ static inline int security_locked_down(enum lockdown_reason what)
 {
 	return 0;
 }
+
+static inline int
+security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+		       nodemask_t *nmask, unsigned int flags)
+{
+	return 0;
+}
 #endif	/* CONFIG_SECURITY */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590e..9535d9e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1483,6 +1483,10 @@ static long kernel_mbind(unsigned long start, unsigned long len,
 	if (err)
 		return err;
 
+	err = security_set_mempolicy(lmode, mode_flags, &nodes, flags);
+	if (err)
+		return err;
+
 	return do_mbind(start, len, lmode, mode_flags, &nodes, flags);
 }
 
@@ -1577,6 +1581,10 @@ static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
 	if (err)
 		return err;
 
+	err = security_set_mempolicy(lmode, mode_flags, &nodes, 0);
+	if (err)
+		return err;
+
 	return do_set_mempolicy(lmode, mode_flags, &nodes);
 }
 
diff --git a/security/security.c b/security/security.c
index dcb3e70..685ad79 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5337,3 +5337,16 @@ int security_uring_cmd(struct io_uring_cmd *ioucmd)
 	return call_int_hook(uring_cmd, 0, ioucmd);
 }
 #endif /* CONFIG_IO_URING */
+
+/**
+ * security_set_mempolicy() - Check if memory policy can be adjusted
+ * @mode: The memory policy mode to be set
+ * @mode_flags: optional mode flags
+ * @nmask: nodemask to which the mode applies
+ * @flags: mbind(2) flags (MPOL_MF_*); always 0 for set_mempolicy(2)
+ */
+int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+			   nodemask_t *nmask, unsigned int flags)
+{
+	return call_int_hook(set_mempolicy, 0, mode, mode_flags, nmask, flags);
+}
-- 
1.8.3.1
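
The new hook is dispatched through call_int_hook(), which, as I understand
it, walks the registered LSMs and stops at the first one that returns a
non-zero value. The following user-space model sketches that behaviour. It
is a simplification, not the kernel macro: the function-pointer type, the
chain array and the two example hooks are hypothetical, and the nodemask
argument is left out for brevity.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified hook signature (nodemask omitted in this sketch). */
typedef int (*set_mempolicy_hook_t)(unsigned long mode,
				    unsigned short mode_flags,
				    unsigned int flags);

/* Model of the hook walk: the first non-zero return value wins. */
int run_set_mempolicy_hooks(const set_mempolicy_hook_t *hooks, size_t n,
			    unsigned long mode, unsigned short mode_flags,
			    unsigned int flags)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = hooks[i](mode, mode_flags, flags);

		if (rc != 0)
			return rc;
	}
	return 0;	/* default: no module objected */
}

/* Example hook that never objects. */
int allow_all(unsigned long mode, unsigned short mode_flags,
	      unsigned int flags)
{
	(void)mode; (void)mode_flags; (void)flags;
	return 0;
}

/* Example hook that rejects MPOL_BIND (value 2) and allows the rest. */
int deny_bind(unsigned long mode, unsigned short mode_flags,
	      unsigned int flags)
{
	(void)mode_flags; (void)flags;
	return mode == 2 ? -EPERM : 0;
}
```

With both example hooks registered, a MPOL_BIND request is rejected with
-EPERM, whereas an empty chain behaves like the !CONFIG_SECURITY stub and
returns 0.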



* [PATCH v4 4/5] security: selinux: Implement set_mempolicy hook
  2023-12-08  9:06 [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
                   ` (2 preceding siblings ...)
  2023-12-08  9:06 ` [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
@ 2023-12-08  9:06 ` Yafang Shao
  2023-12-08  9:06 ` [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
  4 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-08  9:06 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

Add a SELinux access control for the newly introduced set_mempolicy lsm
hook. A new permission "setmempolicy" is defined under the "process" class
for it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 security/selinux/hooks.c            | 8 ++++++++
 security/selinux/include/classmap.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index feda711..1528d4d 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4238,6 +4238,13 @@ static int selinux_userns_create(const struct cred *cred)
 			USER_NAMESPACE__CREATE, NULL);
 }
 
+static int selinux_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+				 nodemask_t *nmask, unsigned int flags)
+{
+	return avc_has_perm(current_sid(), task_sid_obj(current), SECCLASS_PROCESS,
+			    PROCESS__SETMEMPOLICY, NULL);
+}
+
 /* Returns error only if unable to parse addresses */
 static int selinux_parse_skb_ipv4(struct sk_buff *skb,
 			struct common_audit_data *ad, u8 *proto)
@@ -7072,6 +7079,7 @@ static int selinux_uring_cmd(struct io_uring_cmd *ioucmd)
 	LSM_HOOK_INIT(task_kill, selinux_task_kill),
 	LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
 	LSM_HOOK_INIT(userns_create, selinux_userns_create),
+	LSM_HOOK_INIT(set_mempolicy, selinux_set_mempolicy),
 
 	LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
 	LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index a3c3807..c280d92 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -51,7 +51,7 @@
 	    "getattr", "setexec", "setfscreate", "noatsecure", "siginh",
 	    "setrlimit", "rlimitinh", "dyntransition", "setcurrent",
 	    "execmem", "execstack", "execheap", "setkeycreate",
-	    "setsockcreate", "getrlimit", NULL } },
+	    "setsockcreate", "getrlimit", "setmempolicy", NULL } },
 	{ "process2",
 	  { "nnp_transition", "nosuid_transition", NULL } },
 	{ "system",
-- 
1.8.3.1



* [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog
  2023-12-08  9:06 [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
                   ` (3 preceding siblings ...)
  2023-12-08  9:06 ` [PATCH v4 4/5] security: selinux: Implement set_mempolicy hook Yafang Shao
@ 2023-12-08  9:06 ` Yafang Shao
  2023-12-12 19:22   ` KP Singh
  4 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2023-12-08  9:06 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

The result is as follows:
  #263/1   set_mempolicy/MPOL_BIND_without_lsm:OK
  #263/2   set_mempolicy/MPOL_DEFAULT_without_lsm:OK
  #263/3   set_mempolicy/MPOL_BIND_with_lsm:OK
  #263/4   set_mempolicy/MPOL_DEFAULT_with_lsm:OK
  #263     set_mempolicy:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 .../selftests/bpf/prog_tests/set_mempolicy.c       | 81 ++++++++++++++++++++++
 .../selftests/bpf/progs/test_set_mempolicy.c       | 28 ++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c

diff --git a/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
new file mode 100644
index 0000000..736b5e3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <linux/mempolicy.h>
+#include <test_progs.h>
+#include "test_set_mempolicy.skel.h"
+
+#define SIZE 4096
+
+static void mempolicy_bind(bool success)
+{
+	unsigned long mask = 1;
+	char *addr;
+	int err;
+
+	addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (!ASSERT_OK_PTR(addr, "mmap"))
+		return;
+
+	/* -lnuma is required by mbind(2), so use __NR_mbind to avoid the dependency. */
+	err = syscall(__NR_mbind, addr, SIZE, MPOL_BIND, &mask, sizeof(mask), 0);
+	if (success)
+		ASSERT_OK(err, "mbind_success");
+	else
+		ASSERT_ERR(err, "mbind_fail");
+
+	munmap(addr, SIZE);
+}
+
+static void mempolicy_default(void)
+{
+	char *addr;
+	int err;
+
+	addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (!ASSERT_OK_PTR(addr, "mmap"))
+		return;
+
+	err = syscall(__NR_mbind, addr, SIZE, MPOL_DEFAULT, NULL, 0, 0);
+	ASSERT_OK(err, "mbind_success");
+
+	munmap(addr, SIZE);
+}
+
+void test_set_mempolicy(void)
+{
+	struct test_set_mempolicy *skel;
+	int err;
+
+	skel = test_set_mempolicy__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	skel->bss->target_pid = getpid();
+
+	err = test_set_mempolicy__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto destroy;
+
+	if (test__start_subtest("MPOL_BIND_without_lsm"))
+		mempolicy_bind(true);
+	if (test__start_subtest("MPOL_DEFAULT_without_lsm"))
+		mempolicy_default();
+
+	/* Attach LSM prog first */
+	err = test_set_mempolicy__attach(skel);
+	if (!ASSERT_OK(err, "attach"))
+		goto destroy;
+
+	/* syscall to adjust memory policy */
+	if (test__start_subtest("MPOL_BIND_with_lsm"))
+		mempolicy_bind(false);
+	if (test__start_subtest("MPOL_DEFAULT_with_lsm"))
+		mempolicy_default();
+
+destroy:
+	test_set_mempolicy__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_set_mempolicy.c b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
new file mode 100644
index 0000000..b5356d5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int target_pid;
+
+static int mem_policy_adjustment(u64 mode)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+
+	if (task->pid != target_pid)
+		return 0;
+
+	if (mode != MPOL_BIND)
+		return 0;
+	return -1;
+}
+
+SEC("lsm/set_mempolicy")
+int BPF_PROG(setmempolicy, u64 mode, u16 mode_flags, nodemask_t *nmask, u32 flags)
+{
+	return mem_policy_adjustment(mode);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
1.8.3.1
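
The four subtests in this patch reduce to a small decision table: mbind()
should fail only when the LSM program is attached, the caller is the target
process, and the mode is MPOL_BIND. The plain-C model below restates that
table. It is a sketch for reading the test, not part of the selftest: the
function names and the SK_ constants are hypothetical, and the program's
return value of -1 is treated as -EPERM, which is what -1 equals on Linux.

```c
#include <stdbool.h>

/* Local copies of the UAPI mode values (assumption of this sketch). */
#define SK_MPOL_DEFAULT	0
#define SK_MPOL_BIND	2

/* Mirrors mem_policy_adjustment(): the verdict of the LSM program. */
int lsm_verdict(int caller_pid, int target_pid, unsigned long mode)
{
	if (caller_pid != target_pid)
		return 0;	/* other processes are not affected */
	if (mode != SK_MPOL_BIND)
		return 0;	/* only MPOL_BIND is denied */
	return -1;		/* same value as -EPERM on Linux */
}

/* Whether mbind() is expected to succeed in a given subtest. */
bool mbind_should_succeed(bool attached, int caller_pid, int target_pid,
			  unsigned long mode)
{
	if (!attached)
		return true;	/* program loaded but not yet attached */
	return lsm_verdict(caller_pid, target_pid, mode) == 0;
}
```

Only the attached, same-pid, MPOL_BIND combination is expected to fail,
which corresponds to the MPOL_BIND_with_lsm subtest.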



* Re: [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment
  2023-12-08  9:06 ` [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
@ 2023-12-08 17:30   ` Casey Schaufler
  2023-12-10  2:54     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Casey Schaufler @ 2023-12-08 17:30 UTC (permalink / raw)
  To: Yafang Shao, akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Casey Schaufler

On 12/8/2023 1:06 AM, Yafang Shao wrote:
> In a containerized environment, independent memory binding by a user can
> lead to unexpected system issues or disrupt tasks being run by other users
> on the same server. If a user genuinely requires memory binding, we will
> allocate dedicated servers to them by leveraging kubelet deployment.
>
> At present, users have the capability to bind their memory to a specific
> node without explicit agreement or authorization from us. Consequently, a
> new LSM hook is introduced to mitigate this. This implementation allows us
> to exercise fine-grained control over memory policy adjustments within our
> container environment

I wonder if security_vm_enough_memory() ought to be reimplemented as
an option to security_set_mempolicy(). I'm not convinced either way,
but I can argue both. 

> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/lsm_hook_defs.h |  3 +++
>  include/linux/security.h      |  9 +++++++++
>  mm/mempolicy.c                |  8 ++++++++
>  security/security.c           | 13 +++++++++++++
>  4 files changed, 33 insertions(+)
>
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index ff217a5..5580127 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -419,3 +419,6 @@
>  LSM_HOOK(int, 0, uring_sqpoll, void)
>  LSM_HOOK(int, 0, uring_cmd, struct io_uring_cmd *ioucmd)
>  #endif /* CONFIG_IO_URING */
> +
> +LSM_HOOK(int, 0, set_mempolicy, unsigned long mode, unsigned short mode_flags,
> +	 nodemask_t *nmask, unsigned int flags)
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 1d1df326..cc4a19a 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -484,6 +484,8 @@ int security_setprocattr(const char *lsm, const char *name, void *value,
>  int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
>  int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
>  int security_locked_down(enum lockdown_reason what);
> +int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
> +			   nodemask_t *nmask, unsigned int flags);
>  #else /* CONFIG_SECURITY */
>  
>  static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
> @@ -1395,6 +1397,13 @@ static inline int security_locked_down(enum lockdown_reason what)
>  {
>  	return 0;
>  }
> +
> +static inline int
> +security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
> +		       nodemask_t *nmask, unsigned int flags)
> +{
> +	return 0;
> +}
>  #endif	/* CONFIG_SECURITY */
>  
>  #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 10a590e..9535d9e 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1483,6 +1483,10 @@ static long kernel_mbind(unsigned long start, unsigned long len,
>  	if (err)
>  		return err;
>  
> +	err = security_set_mempolicy(lmode, mode_flags, &nodes, flags);
> +	if (err)
> +		return err;
> +
>  	return do_mbind(start, len, lmode, mode_flags, &nodes, flags);
>  }
>  
> @@ -1577,6 +1581,10 @@ static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
>  	if (err)
>  		return err;
>  
> +	err = security_set_mempolicy(lmode, mode_flags, &nodes, 0);
> +	if (err)
> +		return err;
> +
>  	return do_set_mempolicy(lmode, mode_flags, &nodes);
>  }
>  
> diff --git a/security/security.c b/security/security.c
> index dcb3e70..685ad79 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -5337,3 +5337,16 @@ int security_uring_cmd(struct io_uring_cmd *ioucmd)
>  	return call_int_hook(uring_cmd, 0, ioucmd);
>  }
>  #endif /* CONFIG_IO_URING */
> +
> +/**
> + * security_set_mempolicy() - Check if memory policy can be adjusted
> + * @mode: The memory policy mode to be set
> + * @mode_flags: optional mode flags
> + * @nmask: modemask to which the mode applies
> + * @flags: mode flags for mbind(2) only
> + */
> +int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
> +			   nodemask_t *nmask, unsigned int flags)
> +{
> +	return call_int_hook(set_mempolicy, 0, mode, mode_flags, nmask, flags);
> +}


* Re: [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment
  2023-12-08 17:30   ` Casey Schaufler
@ 2023-12-10  2:54     ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-10  2:54 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang,
	linux-mm, linux-security-module, bpf, ligang.bdlg

On Sat, Dec 9, 2023 at 1:30 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 12/8/2023 1:06 AM, Yafang Shao wrote:
> > In a containerized environment, independent memory binding by a user can
> > lead to unexpected system issues or disrupt tasks being run by other users
> > on the same server. If a user genuinely requires memory binding, we will
> > allocate dedicated servers to them by leveraging kubelet deployment.
> >
> > At present, users have the capability to bind their memory to a specific
> > node without explicit agreement or authorization from us. Consequently, a
> > new LSM hook is introduced to mitigate this. This implementation allows us
> > to exercise fine-grained control over memory policy adjustments within our
> > container environment
>
> I wonder if security_vm_enough_memory() ought to be reimplemented as
> an option to security_set_mempolicy(). I'm not convinced either way,
> but I can argue both.

The function security_vm_enough_memory() verifies whether a new memory
mapping is permissible, while security_set_mempolicy() comes into play
after the mapping has been allocated. Expanding
security_vm_enough_memory() to include memory policy checks might
lead to regressions. Therefore, I would prefer to introduce a new
function, security_set_mempolicy(), to handle these checks separately.

-- 
Regards
Yafang


* Re: [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog
  2023-12-08  9:06 ` [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
@ 2023-12-12 19:22   ` KP Singh
  2023-12-13  3:08     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: KP Singh @ 2023-12-12 19:22 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang,
	linux-mm, linux-security-module, bpf, ligang.bdlg

On Fri, Dec 8, 2023 at 10:06 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> The result as follows,
>   #263/1   set_mempolicy/MPOL_BIND_without_lsm:OK
>   #263/2   set_mempolicy/MPOL_DEFAULT_without_lsm:OK
>   #263/3   set_mempolicy/MPOL_BIND_with_lsm:OK
>   #263/4   set_mempolicy/MPOL_DEFAULT_with_lsm:OK
>   #263     set_mempolicy:OK
>   Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

Please write a commit description of what the test actually does. Even
something simple would do, e.g. mention that a BPF LSM program denies all
mbind calls with the mode MPOL_BIND and that the test checks whether the
corresponding syscall is denied when the program is loaded.


>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/set_mempolicy.c       | 81 ++++++++++++++++++++++
>  .../selftests/bpf/progs/test_set_mempolicy.c       | 28 ++++++++
>  2 files changed, 109 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
> new file mode 100644
> index 0000000..736b5e3
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
> @@ -0,0 +1,81 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
> +
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <linux/mempolicy.h>
> +#include <test_progs.h>
> +#include "test_set_mempolicy.skel.h"
> +
> +#define SIZE 4096
> +
> +static void mempolicy_bind(bool success)
> +{
> +       unsigned long mask = 1;
> +       char *addr;
> +       int err;
> +
> +       addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> +       if (!ASSERT_OK_PTR(addr, "mmap"))
> +               return;
> +
> +       /* -lnuma is required by mbind(2), so use __NR_mbind to avoid the dependency. */
> +       err = syscall(__NR_mbind, addr, SIZE, MPOL_BIND, &mask, sizeof(mask), 0);
> +       if (success)
> +               ASSERT_OK(err, "mbind_success");
> +       else
> +               ASSERT_ERR(err, "mbind_fail");
> +
> +       munmap(addr, SIZE);
> +}
> +
> +static void mempolicy_default(void)
> +{
> +       char *addr;
> +       int err;
> +
> +       addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> +       if (!ASSERT_OK_PTR(addr, "mmap"))
> +               return;
> +
> +       err = syscall(__NR_mbind, addr, SIZE, MPOL_DEFAULT, NULL, 0, 0);
> +       ASSERT_OK(err, "mbind_success");
> +
> +       munmap(addr, SIZE);
> +}
> +
> +void test_set_mempolicy(void)
> +{
> +       struct test_set_mempolicy *skel;
> +       int err;
> +
> +       skel = test_set_mempolicy__open();
> +       if (!ASSERT_OK_PTR(skel, "open"))
> +               return;
> +
> +       skel->bss->target_pid = getpid();
> +
> +       err = test_set_mempolicy__load(skel);
> +       if (!ASSERT_OK(err, "load"))
> +               goto destroy;
> +
> +       if (test__start_subtest("MPOL_BIND_without_lsm"))
> +               mempolicy_bind(true);
> +       if (test__start_subtest("MPOL_DEFAULT_without_lsm"))
> +               mempolicy_default();
> +
> +       /* Attach LSM prog first */
> +       err = test_set_mempolicy__attach(skel);
> +       if (!ASSERT_OK(err, "attach"))
> +               goto destroy;
> +
> +       /* syscall to adjust memory policy */
> +       if (test__start_subtest("MPOL_BIND_with_lsm"))
> +               mempolicy_bind(false);
> +       if (test__start_subtest("MPOL_DEFAULT_with_lsm"))
> +               mempolicy_default();
> +
> +destroy:
> +       test_set_mempolicy__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_set_mempolicy.c b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
> new file mode 100644
> index 0000000..b5356d5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +int target_pid;
> +
> +static int mem_policy_adjustment(u64 mode)
> +{
> +       struct task_struct *task = bpf_get_current_task_btf();
> +
> +       if (task->pid != target_pid)
> +               return 0;
> +
> +       if (mode != MPOL_BIND)
> +               return 0;
> +       return -1;
> +}
> +
> +SEC("lsm/set_mempolicy")
> +int BPF_PROG(setmempolicy, u64 mode, u16 mode_flags, nodemask_t *nmask, u32 flags)
> +{
> +       return mem_policy_adjustment(mode);
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 1.8.3.1
>
>

* Re: [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog
  2023-12-12 19:22   ` KP Singh
@ 2023-12-13  3:08     ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-13  3:08 UTC (permalink / raw)
  To: KP Singh
  Cc: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang,
	linux-mm, linux-security-module, bpf, ligang.bdlg

On Wed, Dec 13, 2023 at 3:22 AM KP Singh <kpsingh@kernel.org> wrote:
>
> On Fri, Dec 8, 2023 at 10:06 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > The result as follows,
> >   #263/1   set_mempolicy/MPOL_BIND_without_lsm:OK
> >   #263/2   set_mempolicy/MPOL_DEFAULT_without_lsm:OK
> >   #263/3   set_mempolicy/MPOL_BIND_with_lsm:OK
> >   #263/4   set_mempolicy/MPOL_DEFAULT_with_lsm:OK
> >   #263     set_mempolicy:OK
> >   Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
>
> Please write a commit description on what the test actually does. I

Will do.

> even think of something simple that mentions a BPF LSM program that
> denies all mbind with the mode MPOL_BIND and checks whether the
> corresponding syscall is denied when the program is loaded.

It does. Additionally, it verifies that mbind with other modes, such as
MPOL_DEFAULT, is still allowed while the program is attached.
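
For reference, a minimal sketch of the decision the BPF program makes,
written as plain userspace C so it compiles without libbpf. The names
lsm_model_decide, lsm_model_self_check, MODEL_MPOL_DEFAULT and
MODEL_MPOL_BIND are hypothetical stand-ins and not part of the patch; the
numeric values mirror MPOL_DEFAULT and MPOL_BIND in <linux/mempolicy.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins (assumption): same values as MPOL_DEFAULT and MPOL_BIND. */
enum {
	MODEL_MPOL_DEFAULT = 0,
	MODEL_MPOL_BIND = 2,
};

/*
 * Model of mem_policy_adjustment() from the selftest: return 0 to allow
 * the request, -1 to deny it. Only MPOL_BIND issued by the target task
 * is denied; every other task and every other mode passes through.
 */
int lsm_model_decide(int current_pid, int target_pid, uint64_t mode)
{
	if (current_pid != target_pid)
		return 0;
	if (mode != MODEL_MPOL_BIND)
		return 0;
	return -1;
}

/* The cases exercised once the program is attached, plus an unrelated task. */
void lsm_model_self_check(void)
{
	assert(lsm_model_decide(100, 100, MODEL_MPOL_BIND) == -1);   /* denied */
	assert(lsm_model_decide(100, 100, MODEL_MPOL_DEFAULT) == 0); /* allowed */
	assert(lsm_model_decide(200, 100, MODEL_MPOL_BIND) == 0);    /* other task */
}
```

The first two checks correspond to the MPOL_BIND_with_lsm and
MPOL_DEFAULT_with_lsm subtests; the third reflects the target_pid filter,
which keeps the program from affecting other tasks on the test machine.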

-- 
Regards
Yafang

Thread overview: 10+ messages
2023-12-08  9:06 [PATCH v4 0/5] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
2023-12-08  9:06 ` [PATCH v4 1/5] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
2023-12-08  9:06 ` [PATCH v4 2/5] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
2023-12-08  9:06 ` [PATCH v4 3/5] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
2023-12-08 17:30   ` Casey Schaufler
2023-12-10  2:54     ` Yafang Shao
2023-12-08  9:06 ` [PATCH v4 4/5] security: selinux: Implement set_mempolicy hook Yafang Shao
2023-12-08  9:06 ` [PATCH v4 5/5] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
2023-12-12 19:22   ` KP Singh
2023-12-13  3:08     ` Yafang Shao