linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/4] Introduce security_create_user_ns()
@ 2022-08-15 16:20 Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 1/4] security, lsm: " Frederick Lawler
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Frederick Lawler @ 2022-08-15 16:20 UTC
  To: kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris, serge, paul,
	stephen.smalley.work, eparis, shuah, brauner, casey, ebiederm,
	bpf, linux-security-module, selinux, linux-kselftest
  Cc: linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz,
	Frederick Lawler

While user namespaces do not by themselves make the kernel more vulnerable,
they are used to initiate exploits. Some users do not want to block namespace
creation for the entire system, which is the only control some distributions
provide. Instead, we needed a way to block some applications and allow others.
This is not possible with those tools. Managing hierarchies also did not fit
our case because we're determining which tasks are allowed based on their
attributes.

While exploring a solution, we first leveraged the LSM cred_prepare hook
because that is the closest hook to prevent a call to create_user_ns().

The calls look something like this:

    cred = prepare_creds()
        security_prepare_creds()
            call_int_hook(cred_prepare, ...
    if (cred)
        create_user_ns(cred)

We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.

The discussion notes that security_prepare_creds() is not appropriate for
MAC policies, and instead the hook is meant for LSM authors to prepare
credentials for mutation. [2]

Additionally, cred_prepare hook is not without problems. Handling the clone3
case is a bit more tricky due to the user space pointer passed to it. This
makes checking the syscall subject to a possible TOCTTOU attack.
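
To illustrate the concern, here is a minimal user space model of the
double fetch. It is not kernel code and not part of this series:
struct clone_args_model, check_then_use() and the racer callback are
hypothetical names, and the callback stands in for a second thread
rewriting the buffer between the policy check and the syscall's own copy.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_CLONE_NEWUSER 0x10000000ULL /* same value as CLONE_NEWUSER */

/* Hypothetical stand-in for the user space buffer passed to clone3(). */
struct clone_args_model {
	uint64_t flags;
};

/* Stand-in for another thread that rewrites the buffer mid-syscall. */
typedef void (*racer_fn)(struct clone_args_model *uargs);

/*
 * Returns 1 if the check saw no CLONE_NEWUSER but the later use did,
 * i.e. the check was bypassed; returns 0 otherwise.
 */
static int check_then_use(struct clone_args_model *uargs, racer_fn racer)
{
	/* First fetch: the policy check reads the flags. */
	int checked_newuser = (uargs->flags & MODEL_CLONE_NEWUSER) != 0;
	int used_newuser;

	if (racer)
		racer(uargs);

	/* Second fetch: the syscall reads the arguments for real. */
	used_newuser = (uargs->flags & MODEL_CLONE_NEWUSER) != 0;

	return !checked_newuser && used_newuser;
}

/* A racer that flips the flag after the check has already passed. */
static void set_newuser(struct clone_args_model *uargs)
{
	uargs->flags |= MODEL_CLONE_NEWUSER;
}
```

A hook that runs inside create_user_ns() avoids this because it is only
reached once the kernel has already committed to the user namespace path.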

Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors. [3]

This patch set first introduces a new security_create_user_ns() function
and userns_create LSM hook, then marks the hook as sleepable in BPF. The
following patches after include a BPF test and a patch for an SELinux
implementation.

We want to encourage use of user namespaces, and also cater to the needs
of users/administrators to observe and/or control access. There is no
expectation of an impact on user space applications because access control 
is opt-in, and users wishing to observe within a LSM context 


Links:
1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare.com/

Past discussions:
V4: https://lore.kernel.org/all/20220801180146.1157914-1-fred@cloudflare.com/
V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/
V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/
V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/

Changes since v4:
- Update commit description
- Update cover letter
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given
  permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
  struct cred
- Move security_create_user_ns() call after id mapping check in
  create_user_ns()
- Update documentation to reflect changes

Frederick Lawler (4):
  security, lsm: Introduce security_create_user_ns()
  bpf-lsm: Make bpf_lsm_userns_create() sleepable
  selftests/bpf: Add tests verifying bpf lsm userns_create hook
  selinux: Implement userns_create hook

 include/linux/lsm_hook_defs.h                 |   1 +
 include/linux/lsm_hooks.h                     |   4 +
 include/linux/security.h                      |   6 ++
 kernel/bpf/bpf_lsm.c                          |   1 +
 kernel/user_namespace.c                       |   5 +
 security/security.c                           |   5 +
 security/selinux/hooks.c                      |   9 ++
 security/selinux/include/classmap.h           |   2 +
 .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
 .../selftests/bpf/progs/test_deny_namespace.c |  33 ++++++
 10 files changed, 168 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c

-- 
2.30.2



* [PATCH v5 1/4] security, lsm: Introduce security_create_user_ns()
  2022-08-15 16:20 [PATCH v5 0/4] Introduce security_create_user_ns() Frederick Lawler
@ 2022-08-15 16:20 ` Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 2/4] bpf-lsm: Make bpf_lsm_userns_create() sleepable Frederick Lawler
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Frederick Lawler @ 2022-08-15 16:20 UTC
  To: kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris, serge, paul,
	stephen.smalley.work, eparis, shuah, brauner, casey, ebiederm,
	bpf, linux-security-module, selinux, linux-kselftest
  Cc: linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz,
	Frederick Lawler

User namespaces are an effective tool to allow programs to run with
privileges without needing to run as root. User namespaces may also be
used as a sandboxing technique. However, attackers sometimes leverage
user namespaces as an initial attack vector to perform some exploit.
[1,2,3]

While it is not the unprivileged user namespace functionality itself that
causes the kernel to be exploitable, users/administrators might want to
limit more granularly, or at least monitor, how various processes use
this functionality while vulnerable kernel subsystems are being patched.

Preventing user namespace creation already comes in a few forms, in
order of granularity:

        1. /proc/sys/user/max_user_namespaces sysctl
        2. Distro specific patch(es)
        3. CONFIG_USER_NS

To block a task based on its attributes, the LSM hook cred_prepare is a
decent candidate for use because it provides more granular control, and
it is called before create_user_ns():

        cred = prepare_creds()
                security_prepare_creds()
                        call_int_hook(cred_prepare, ...
        if (cred)
                create_user_ns(cred)

Since security_prepare_creds() is meant for LSMs to copy and prepare
credentials, access control is an unintended use of the hook. [4]
Further, security_prepare_creds() will always return -ENOMEM if the
hook returns any non-zero error code.
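
As a rough user space model of why the error code is lost (not kernel
code; struct cred_model and both *_model() functions are hypothetical
names): the credential is handed back as a pointer, so any hook failure
collapses into NULL, and the caller can only report -ENOMEM.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cred. */
struct cred_model {
	int unused;
};

static struct cred_model model_cred;

/* Models prepare_creds(): any hook failure becomes a NULL pointer. */
static struct cred_model *prepare_creds_model(int hook_ret)
{
	if (hook_ret < 0)
		return NULL;	/* the hook's error code is dropped here */
	return &model_cred;
}

/* Models the caller: a NULL cred can only be reported as -ENOMEM. */
static int unshare_userns_model(int hook_ret)
{
	struct cred_model *cred = prepare_creds_model(hook_ret);

	if (!cred)
		return -ENOMEM;
	return 0;
}
```

In this model a hook returning -EPERM or -EACCES is indistinguishable
from an allocation failure by the time user space sees the result.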

This hook also does not handle the clone3 case, which requires us to
access a user space pointer to know if we're in the CLONE_NEWUSER
call path, and that access may be subject to a TOCTTOU attack.

Lastly, cred_prepare is called in many call paths, and a targeted hook
further limits the frequency of calls which is a beneficial outcome.
Therefore introduce a new function security_create_user_ns() with an
accompanying userns_create LSM hook.

With the new userns_create hook, users will have more control over
observing and restricting user namespace creation. Users should expect
user namespaces to behave as usual, and to be impacted only when
controls are implemented by users or administrators.

This hook takes the prepared creds for LSM authors to write policy
against. On success, the new namespace is applied to credentials,
otherwise an error is returned.
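
The intended behaviour can be sketched with a simplified user space
model. This is an illustration under assumptions, not the kernel
implementation: struct cred_model, the two example policies and the
*_model() functions are hypothetical names, and the loop only
approximates how registered callbacks are walked until one of them
returns non-zero.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the prepared creds passed to the hook. */
struct cred_model {
	int has_cap_sys_admin;
};

typedef int (*userns_create_fn)(const struct cred_model *cred);

/* Hypothetical policy: always allow. */
static int allow_all(const struct cred_model *cred)
{
	(void)cred;
	return 0;
}

/* Hypothetical policy: deny tasks without CAP_SYS_ADMIN. */
static int deny_unprivileged(const struct cred_model *cred)
{
	return cred->has_cap_sys_admin ? 0 : -EPERM;
}

/*
 * Models security_create_user_ns(): start from the hook default of 0
 * and stop at the first registered callback that returns non-zero.
 */
static int security_create_user_ns_model(const userns_create_fn *hooks,
					 size_t nr_hooks,
					 const struct cred_model *cred)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < nr_hooks; i++) {
		ret = hooks[i](cred);
		if (ret != 0)
			break;
	}
	return ret;
}

/* Models the call site in create_user_ns(): bail out on an error. */
static int create_user_ns_model(const userns_create_fn *hooks,
				size_t nr_hooks,
				const struct cred_model *cred)
{
	int ret = security_create_user_ns_model(hooks, nr_hooks, cred);

	if (ret < 0)
		return ret;
	return 0;	/* namespace setup would continue here */
}
```

With no callbacks registered the model returns 0, which matches the
expectation that nothing changes unless a policy is loaded.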

Links:
1. https://nvd.nist.gov/vuln/detail/CVE-2022-0492
2. https://nvd.nist.gov/vuln/detail/CVE-2022-25636
3. https://nvd.nist.gov/vuln/detail/CVE-2022-34918
4. https://lore.kernel.org/all/1c4b1c0d-12f6-6e9e-a6a3-cdce7418110c@schaufler-ca.com/

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Frederick Lawler <fred@cloudflare.com>

---
Changes since v4:
- Update commit description
Changes since v3:
- No changes
Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Changed commit wording
- Moved execution to be after id mapping check
- Changed signature to only accept a const struct cred *
---
 include/linux/lsm_hook_defs.h | 1 +
 include/linux/lsm_hooks.h     | 4 ++++
 include/linux/security.h      | 6 ++++++
 kernel/user_namespace.c       | 5 +++++
 security/security.c           | 5 +++++
 5 files changed, 21 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 806448173033..aa7272e83626 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -224,6 +224,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
 	 unsigned long arg3, unsigned long arg4, unsigned long arg5)
 LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
 	 struct inode *inode)
+LSM_HOOK(int, 0, userns_create, const struct cred *cred)
 LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
 LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp,
 	 u32 *secid)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 84a0d7e02176..2e11a2a22ed1 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -806,6 +806,10 @@
  *	security attributes, e.g. for /proc/pid inodes.
  *	@p contains the task_struct for the task.
  *	@inode contains the inode structure for the inode.
+ * @userns_create:
+ *	Check permission prior to creating a new user namespace.
+ *	@cred points to prepared creds.
+ *	Return 0 if successful, otherwise < 0 error code.
  *
  * Security hooks for Netlink messaging.
  *
diff --git a/include/linux/security.h b/include/linux/security.h
index 1bc362cb413f..767802fe9bfa 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -437,6 +437,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
 int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			unsigned long arg4, unsigned long arg5);
 void security_task_to_inode(struct task_struct *p, struct inode *inode);
+int security_create_user_ns(const struct cred *cred);
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
 void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid);
 int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1194,6 +1195,11 @@ static inline int security_task_prctl(int option, unsigned long arg2,
 static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
 { }
 
+static inline int security_create_user_ns(const struct cred *cred)
+{
+	return 0;
+}
+
 static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
 					  short flag)
 {
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 5481ba44a8d6..3f464bbda0e9 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -9,6 +9,7 @@
 #include <linux/highuid.h>
 #include <linux/cred.h>
 #include <linux/securebits.h>
+#include <linux/security.h>
 #include <linux/keyctl.h>
 #include <linux/key-type.h>
 #include <keys/user-type.h>
@@ -113,6 +114,10 @@ int create_user_ns(struct cred *new)
 	    !kgid_has_mapping(parent_ns, group))
 		goto fail_dec;
 
+	ret = security_create_user_ns(new);
+	if (ret < 0)
+		goto fail_dec;
+
 	ret = -ENOMEM;
 	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
diff --git a/security/security.c b/security/security.c
index 14d30fec8a00..1e60c4b570ec 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1909,6 +1909,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode)
 	call_void_hook(task_to_inode, p, inode);
 }
 
+int security_create_user_ns(const struct cred *cred)
+{
+	return call_int_hook(userns_create, 0, cred);
+}
+
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag)
 {
 	return call_int_hook(ipc_permission, 0, ipcp, flag);
-- 
2.30.2



* [PATCH v5 2/4] bpf-lsm: Make bpf_lsm_userns_create() sleepable
  2022-08-15 16:20 [PATCH v5 0/4] Introduce security_create_user_ns() Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 1/4] security, lsm: " Frederick Lawler
@ 2022-08-15 16:20 ` Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 3/4] selftests/bpf: Add tests verifying bpf lsm userns_create hook Frederick Lawler
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Frederick Lawler @ 2022-08-15 16:20 UTC
  To: kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris, serge, paul,
	stephen.smalley.work, eparis, shuah, brauner, casey, ebiederm,
	bpf, linux-security-module, selinux, linux-kselftest
  Cc: linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz,
	Frederick Lawler

Users may want to audit calls to security_create_user_ns() and access
user space memory. Also create_user_ns() runs without
pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable
for mandatory access control policies.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Frederick Lawler <fred@cloudflare.com>

---
Changes since v4:
- None
Changes since v3:
- None
Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- None
---
 kernel/bpf/bpf_lsm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index fa71d58b7ded..761998fda762 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -335,6 +335,7 @@ BTF_ID(func, bpf_lsm_task_getsecid_obj)
 BTF_ID(func, bpf_lsm_task_prctl)
 BTF_ID(func, bpf_lsm_task_setscheduler)
 BTF_ID(func, bpf_lsm_task_to_inode)
+BTF_ID(func, bpf_lsm_userns_create)
 BTF_SET_END(sleepable_lsm_hooks)
 
 bool bpf_lsm_is_sleepable_hook(u32 btf_id)
-- 
2.30.2



* [PATCH v5 3/4] selftests/bpf: Add tests verifying bpf lsm userns_create hook
  2022-08-15 16:20 [PATCH v5 0/4] Introduce security_create_user_ns() Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 1/4] security, lsm: " Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 2/4] bpf-lsm: Make bpf_lsm_userns_create() sleepable Frederick Lawler
@ 2022-08-15 16:20 ` Frederick Lawler
  2022-08-15 16:20 ` [PATCH v5 4/4] selinux: Implement " Frederick Lawler
  2022-08-16 21:51 ` [PATCH v5 0/4] Introduce security_create_user_ns() Paul Moore
  4 siblings, 0 replies; 35+ messages in thread
From: Frederick Lawler @ 2022-08-15 16:20 UTC
  To: kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris, serge, paul,
	stephen.smalley.work, eparis, shuah, brauner, casey, ebiederm,
	bpf, linux-security-module, selinux, linux-kselftest
  Cc: linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz,
	Frederick Lawler

The LSM hook userns_create was introduced to provide LSMs an
opportunity to block or allow unprivileged user namespace creation. This
test serves two purposes: it provides a test eBPF implementation, and
tests that the hook successfully blocks or allows user namespace creation.

This tests 3 cases:

        1. Unattached bpf program does not block unpriv user namespace
           creation.
        2. Attached bpf program allows user namespace creation given
           CAP_SYS_ADMIN privileges.
        3. Attached bpf program denies user namespace creation for a
           user without CAP_SYS_ADMIN.
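
The decision behind cases 2 and 3 boils down to a single bit test on the
effective capability set. A plain C sketch of that check follows; the
MODEL_* macros, struct cap_set_model and userns_create_policy_model()
are hypothetical names that mirror the uapi definitions so the snippet
builds without kernel headers.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_CAP_SYS_ADMIN	21	/* value of CAP_SYS_ADMIN */
#define MODEL_CAP_U32S		2	/* as _LINUX_CAPABILITY_U32S_3 */
#define MODEL_CAP_TO_INDEX(x)	((x) >> 5)
#define MODEL_CAP_TO_MASK(x)	(1U << ((x) & 31))

/* Hypothetical stand-in for the effective capability set in the cred. */
struct cap_set_model {
	uint32_t cap[MODEL_CAP_U32S];
};

/* Returns 0 when CAP_SYS_ADMIN is effective, -EPERM otherwise. */
static int userns_create_policy_model(const struct cap_set_model *effective)
{
	int idx = MODEL_CAP_TO_INDEX(MODEL_CAP_SYS_ADMIN);
	uint32_t mask = MODEL_CAP_TO_MASK(MODEL_CAP_SYS_ADMIN);

	if (effective->cap[idx] & mask)
		return 0;
	return -EPERM;
}
```

CAP_SYS_ADMIN lands in word 0 at bit 21, so a set with only that bit
raised is allowed and an empty set is denied.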

Acked-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Frederick Lawler <fred@cloudflare.com>

---
The generic deny_namespace file name is used for future namespace
expansion. I didn't want to limit these files to just the create_user_ns
hook.
Changes since v4:
- None
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given
  permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Introduce this patch
---
 .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
 .../selftests/bpf/progs/test_deny_namespace.c |  33 ++++++
 2 files changed, 135 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c

diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c
new file mode 100644
index 000000000000..1bc6241b755b
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include "test_deny_namespace.skel.h"
+#include <sched.h>
+#include "cap_helpers.h"
+#include <stdio.h>
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+/* negative return value -> some internal error
+ * positive return value -> userns creation failed
+ * 0                     -> userns creation succeeded
+ */
+static int create_user_ns(void)
+{
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0)
+		return -1;
+
+	if (pid == 0) {
+		if (unshare(CLONE_NEWUSER))
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
+	return wait_for_pid(pid);
+}
+
+static void test_userns_create_bpf(void)
+{
+	__u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
+	__u64 old_caps = 0;
+
+	cap_enable_effective(cap_mask, &old_caps);
+
+	ASSERT_OK(create_user_ns(), "priv new user ns");
+
+	cap_disable_effective(cap_mask, &old_caps);
+
+	ASSERT_EQ(create_user_ns(), EPERM, "unpriv new user ns");
+
+	if (cap_mask & old_caps)
+		cap_enable_effective(cap_mask, NULL);
+}
+
+static void test_unpriv_userns_create_no_bpf(void)
+{
+	__u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
+	__u64 old_caps = 0;
+
+	cap_disable_effective(cap_mask, &old_caps);
+
+	ASSERT_OK(create_user_ns(), "no-bpf unpriv new user ns");
+
+	if (cap_mask & old_caps)
+		cap_enable_effective(cap_mask, NULL);
+}
+
+void test_deny_namespace(void)
+{
+	struct test_deny_namespace *skel = NULL;
+	int err;
+
+	if (test__start_subtest("unpriv_userns_create_no_bpf"))
+		test_unpriv_userns_create_no_bpf();
+
+	skel = test_deny_namespace__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel load"))
+		goto close_prog;
+
+	err = test_deny_namespace__attach(skel);
+	if (!ASSERT_OK(err, "attach"))
+		goto close_prog;
+
+	if (test__start_subtest("userns_create_bpf"))
+		test_userns_create_bpf();
+
+	test_deny_namespace__detach(skel);
+
+close_prog:
+	test_deny_namespace__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c
new file mode 100644
index 000000000000..09ad5a4ebd1f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+#include <linux/capability.h>
+
+struct kernel_cap_struct {
+	__u32 cap[_LINUX_CAPABILITY_U32S_3];
+} __attribute__((preserve_access_index));
+
+struct cred {
+	struct kernel_cap_struct cap_effective;
+} __attribute__((preserve_access_index));
+
+char _license[] SEC("license") = "GPL";
+
+SEC("lsm.s/userns_create")
+int BPF_PROG(test_userns_create, const struct cred *cred, int ret)
+{
+	struct kernel_cap_struct caps = cred->cap_effective;
+	int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN);
+	__u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN);
+
+	if (ret)
+		return 0;
+
+	ret = -EPERM;
+	if (caps.cap[cap_index] & cap_mask)
+		return 0;
+
+	return -EPERM;
+}
-- 
2.30.2



* [PATCH v5 4/4] selinux: Implement userns_create hook
  2022-08-15 16:20 [PATCH v5 0/4] Introduce security_create_user_ns() Frederick Lawler
                   ` (2 preceding siblings ...)
  2022-08-15 16:20 ` [PATCH v5 3/4] selftests/bpf: Add tests verifying bpf lsm userns_create hook Frederick Lawler
@ 2022-08-15 16:20 ` Frederick Lawler
  2022-08-16 21:51 ` [PATCH v5 0/4] Introduce security_create_user_ns() Paul Moore
  4 siblings, 0 replies; 35+ messages in thread
From: Frederick Lawler @ 2022-08-15 16:20 UTC
  To: kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris, serge, paul,
	stephen.smalley.work, eparis, shuah, brauner, casey, ebiederm,
	bpf, linux-security-module, selinux, linux-kselftest
  Cc: linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz,
	Frederick Lawler

Unprivileged user namespace creation is an intended feature to enable
sandboxing; however, this feature is often used as an initial step to
perform a privilege escalation attack.

This patch implements a new user_namespace { create } access control
permission to control which domains may create user namespaces. This is
necessary for system administrators to quickly protect their systems
while waiting for vulnerability patches to be applied.

This permission can be used in the following way:

        allow domA_t domA_t : user_namespace { create };
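
A toy user space model of how such a rule is evaluated (an assumption
for illustration only, not how the AVC is implemented): the check uses
the task's own domain as both source and target, and succeeds only if a
matching allow rule exists. struct allow_rule_model, avc_check_model()
and userns_create_check_model() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened form of one "allow" rule. */
struct allow_rule_model {
	const char *source;
	const char *target;
	const char *tclass;
	const char *perm;
};

/* Returns 0 if some rule grants the access, -EACCES otherwise. */
static int avc_check_model(const struct allow_rule_model *rules, size_t nr,
			   const char *source, const char *target,
			   const char *tclass, const char *perm)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(rules[i].source, source) &&
		    !strcmp(rules[i].target, target) &&
		    !strcmp(rules[i].tclass, tclass) &&
		    !strcmp(rules[i].perm, perm))
			return 0;
	}
	return -EACCES;
}

/* Models the hook: the task's domain is both source and target. */
static int userns_create_check_model(const struct allow_rule_model *rules,
				     size_t nr, const char *domain)
{
	return avc_check_model(rules, nr, domain, domain,
			       "user_namespace", "create");
}
```

With only the rule above loaded, domA_t passes the check and any other
domain is denied.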

Signed-off-by: Frederick Lawler <fred@cloudflare.com>

---
Changes since v4:
- None
Changes since v3:
- None
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Introduce this patch
---
 security/selinux/hooks.c            | 9 +++++++++
 security/selinux/include/classmap.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 79573504783b..b9f1078450b3 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4221,6 +4221,14 @@ static void selinux_task_to_inode(struct task_struct *p,
 	spin_unlock(&isec->lock);
 }
 
+static int selinux_userns_create(const struct cred *cred)
+{
+	u32 sid = current_sid();
+
+	return avc_has_perm(&selinux_state, sid, sid, SECCLASS_USER_NAMESPACE,
+						USER_NAMESPACE__CREATE, NULL);
+}
+
 /* Returns error only if unable to parse addresses */
 static int selinux_parse_skb_ipv4(struct sk_buff *skb,
 			struct common_audit_data *ad, u8 *proto)
@@ -7111,6 +7119,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(task_movememory, selinux_task_movememory),
 	LSM_HOOK_INIT(task_kill, selinux_task_kill),
 	LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
+	LSM_HOOK_INIT(userns_create, selinux_userns_create),
 
 	LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
 	LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index ff757ae5f253..0bff55bb9cde 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -254,6 +254,8 @@ const struct security_class_mapping secclass_map[] = {
 	  { COMMON_FILE_PERMS, NULL } },
 	{ "io_uring",
 	  { "override_creds", "sqpoll", NULL } },
+	{ "user_namespace",
+	  { "create", NULL } },
 	{ NULL }
   };
 
-- 
2.30.2



* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-15 16:20 [PATCH v5 0/4] Introduce security_create_user_ns() Frederick Lawler
                   ` (3 preceding siblings ...)
  2022-08-15 16:20 ` [PATCH v5 4/4] selinux: Implement " Frederick Lawler
@ 2022-08-16 21:51 ` Paul Moore
  2022-08-17 15:07   ` Eric W. Biederman
  4 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-16 21:51 UTC
  To: Frederick Lawler
  Cc: kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris, serge,
	stephen.smalley.work, eparis, shuah, brauner, casey, ebiederm,
	bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

On Mon, Aug 15, 2022 at 12:20 PM Frederick Lawler <fred@cloudflare.com> wrote:
>
> While user namespaces do not by themselves make the kernel more vulnerable,
> they are used to initiate exploits. Some users do not want to block namespace
> creation for the entire system, which is the only control some distributions
> provide. Instead, we needed a way to block some applications and allow others.
> This is not possible with those tools. Managing hierarchies also did not fit
> our case because we're determining which tasks are allowed based on their
> attributes.
>
> While exploring a solution, we first leveraged the LSM cred_prepare hook
> because that is the closest hook to prevent a call to create_user_ns().
>
> The calls look something like this:
>
>     cred = prepare_creds()
>         security_prepare_creds()
>             call_int_hook(cred_prepare, ...
>     if (cred)
>         create_user_ns(cred)
>
> We noticed that error codes were not propagated from this hook and
> introduced a patch [1] to propagate those errors.
>
> The discussion notes that security_prepare_creds() is not appropriate for
> MAC policies, and instead the hook is meant for LSM authors to prepare
> credentials for mutation. [2]
>
> Additionally, cred_prepare hook is not without problems. Handling the clone3
> case is a bit more tricky due to the user space pointer passed to it. This
> makes checking the syscall subject to a possible TOCTTOU attack.
>
> Ultimately, we concluded that a better course of action is to introduce
> a new security hook for LSM authors. [3]
>
> This patch set first introduces a new security_create_user_ns() function
> and userns_create LSM hook, then marks the hook as sleepable in BPF. The
> following patches after include a BPF test and a patch for an SELinux
> implementation.
>
> We want to encourage use of user namespaces, and also cater to the needs
> of users/administrators to observe and/or control access. There is no
> expectation of an impact on user space applications because access control
> is opt-in, and users wishing to observe within a LSM context
>
>
> Links:
> 1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
> 2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
> 3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare.com/
>
> Past discussions:
> V4: https://lore.kernel.org/all/20220801180146.1157914-1-fred@cloudflare.com/
> V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/
> V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/
> V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
>
> Changes since v4:
> - Update commit description
> - Update cover letter
> Changes since v3:
> - Explicitly set CAP_SYS_ADMIN to test namespace is created given
>   permission
> - Simplify BPF test to use sleepable hook only
> - Prefer unshare() over clone() for tests
> Changes since v2:
> - Rename create_user_ns hook to userns_create
> - Use user_namespace as an object opposed to a generic namespace object
> - s/domB_t/domA_t in commit message
> Changes since v1:
> - Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
> - Add selinux: Implement create_user_ns hook patch
> - Change function signature of security_create_user_ns() to only take
>   struct cred
> - Move security_create_user_ns() call after id mapping check in
>   create_user_ns()
> - Update documentation to reflect changes
>
> Frederick Lawler (4):
>   security, lsm: Introduce security_create_user_ns()
>   bpf-lsm: Make bpf_lsm_userns_create() sleepable
>   selftests/bpf: Add tests verifying bpf lsm userns_create hook
>   selinux: Implement userns_create hook
>
>  include/linux/lsm_hook_defs.h                 |   1 +
>  include/linux/lsm_hooks.h                     |   4 +
>  include/linux/security.h                      |   6 ++
>  kernel/bpf/bpf_lsm.c                          |   1 +
>  kernel/user_namespace.c                       |   5 +
>  security/security.c                           |   5 +
>  security/selinux/hooks.c                      |   9 ++
>  security/selinux/include/classmap.h           |   2 +
>  .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
>  .../selftests/bpf/progs/test_deny_namespace.c |  33 ++++++
>  10 files changed, 168 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c

I just merged this into the lsm/next tree, thanks for seeing this
through Frederick, and thank you to everyone who took the time to
review the patches and add their tags.

  git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git next

-- 
paul-moore.com


* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-16 21:51 ` [PATCH v5 0/4] Introduce security_create_user_ns() Paul Moore
@ 2022-08-17 15:07   ` Eric W. Biederman
  2022-08-17 16:01     ` Paul Moore
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2022-08-17 15:07 UTC
  To: Linus Torvalds
  Cc: Frederick Lawler, kpsingh, revest, jackmanb, ast, daniel, andrii,
	kafai, songliubraving, yhs, john.fastabend, jmorris, serge,
	stephen.smalley.work, eparis, shuah, brauner, casey, bpf,
	linux-security-module, selinux, linux-kselftest, linux-kernel,
	netdev, kernel-team, cgzones, karl, tixxdz, Paul Moore

>
> I just merged this into the lsm/next tree, thanks for seeing this
> through Frederick, and thank you to everyone who took the time to
> review the patches and add their tags.
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git next

Paul, Frederick

I repeat my NACK, in part because I am being ignored and in part
because the hook does not make technical sense.


Linus I want you to know that this has been put in the lsm tree against
my explicit and clear objections.

My request to talk about the actual problems that are being addressed has
been completely ignored.

I have been a bit slow in dealing with this conversation because I am
very much sick and not on top of my game, but that is no excuse to
steamroll over me instead of addressing my concerns.


This is an irresponsible way of adding an access control to user
namespace creation.  This is a linux-api and manpages level kind of
change, as this is a semantic change visible to userspace.  Instead that
concern has been brushed off as different return code to userspace.

For observability this is a terrible LSM interface because there is no
pair with user namespace destruction, nor is there any ability for the
LSM to allocate any state to track the user namespace.  As there is no
patch actually calling audit or anything else, observability does not
appear to be a driving factor of this new interface.




The common scenarios I am aware of for using the user namespace are:
- Creating a container.
- Using the user namespace to sandbox your application like chrome does.
- Running an exploit.

Returning an error code in the first 2 scenarios will create a userspace
regression as either userspace will run less securely or it won't work
at all.

Returning an error code in the third scenario when someone is trying to
exploit your machine is equally foolish as you are giving the exploit
the chance to continue running.  The application should be killed
instead.


Further adding a random failure mode to user namespace creation if it is
used at all will just encourage userspace to use a setuid application to
perform the namespace creation instead.  Creating a less secure system
overall.
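
To make the failure mode above concrete, here is a minimal userspace
sketch (not part of the patchset) of how a refused user namespace
creation surfaces to a caller.  The helper names probe_userns() and
describe_outcome() are hypothetical, and which errno a given LSM policy
would return is an assumption left open here; the sketch only reports
whatever unshare(CLONE_NEWUSER) returns on the running system.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try to create a user namespace in a throwaway child process.
 * Returns 0 if unshare(CLONE_NEWUSER) succeeded, the child's errno
 * (a positive value) if it was refused, or -1 if the probe itself
 * could not be completed.
 */
int probe_userns(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: report the outcome through the exit status. */
		if (unshare(CLONE_NEWUSER) == 0)
			_exit(0);
		_exit(errno & 0xff);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Hypothetical caller policy: the choices a sandboxing program faces. */
const char *describe_outcome(int rc)
{
	if (rc == 0)
		return "user namespace available: sandbox as usual";
	if (rc < 0)
		return "probe failed: outcome unknown";
	return "creation refused: run unsandboxed, use a setuid helper, or abort";
}
```

A caller would run describe_outcome(probe_userns()) once at startup and
pick its fallback from the result; the third branch is where the
regression risk discussed above shows up.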

If the concern is to reduce the attack surface everything this
proposed hook can do is already possible with the security_capable
security hook.

So Paul, Frederick please drop this.  I can't see what this new hook is
good for except creating regressions in existing userspace code.  I am
not willing to support such a hook in code that I maintain.

Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 15:07   ` Eric W. Biederman
@ 2022-08-17 16:01     ` Paul Moore
  2022-08-17 19:57       ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-17 16:01 UTC (permalink / raw)
  To: Eric W. Biederman, Linus Torvalds
  Cc: Frederick Lawler, kpsingh, revest, jackmanb, ast, daniel, andrii,
	kafai, songliubraving, yhs, john.fastabend, jmorris, serge,
	stephen.smalley.work, eparis, shuah, brauner, casey, bpf,
	linux-security-module, selinux, linux-kselftest, linux-kernel,
	netdev, kernel-team, cgzones, karl, tixxdz

On Wed, Aug 17, 2022 at 11:08 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
> > I just merged this into the lsm/next tree, thanks for seeing this
> > through Frederick, and thank you to everyone who took the time to
> > review the patches and add their tags.
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git next
>
> Paul, Frederick
>
> I repeat my NACK, in part because I am being ignored and in part
> because the hook does not make technical sense.
>
> Linus I want you to know that this has been put in the lsm tree against
> my explicit and clear objections.

Eric, we are disagreeing with you, not ignoring you; that's an
important distinction.  This is the fifth iteration of the patchset,
or the sixth (?) if you count Frederick's earlier attempts using the
credential hooks, and with each revision multiple people have tried to
work with you to find a mutually agreeable solution to the use cases
presented by Frederick and others.  In the end of the v4 discussion it
was my opinion that you kept moving the goalposts in an effort to
prevent any additional hooks/controls/etc. to the user namespace code
which is why I made the decision to merge the code into the lsm/next
branch against your wishes.  Multiple people have come out in support
of this functionality, and you remain the only one opposed to the
change; normally a maintainer's objection would be enough to block the
change, but it is my opinion that Eric is acting in bad faith.

At the end of the v4 patchset I suggested merging this into lsm/next
so it could get a full -rc cycle in linux-next, assuming no issues
were uncovered during testing I was planning to send it to Linus
during the next merge window with commentary on the contentiousness of
the patchset, including Eric's NACK.  I'm personally very disappointed
that it has come to this, but I'm at a loss of how to work with you
(Eric) to find a solution; this is the only path forward that I can
see at this point.  Others have expressed their agreement with this
approach, both on-list and privately.

If anyone other than Eric or myself has a different view of the
situation, *please* add your comments now.  I believe I've done a fair
job of summarizing things, but everyone has a bias and I'm definitely
no exception.

Finally, I'm going to refrain from rehashing the same arguments over
again in this revision of the patchset, instead I'll just provide
links to the previous drafts in case anyone wants to spend an hour or
two:

Revision v1
https://lore.kernel.org/linux-security-module/20220621233939.993579-1-fred@cloudflare.com/

Revision v2
https://lore.kernel.org/linux-security-module/20220707223228.1940249-1-fred@cloudflare.com/

Revision v3
https://lore.kernel.org/linux-security-module/20220721172808.585539-1-fred@cloudflare.com/

Revision v4
https://lore.kernel.org/linux-security-module/20220801180146.1157914-1-fred@cloudflare.com/

--
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 16:01     ` Paul Moore
@ 2022-08-17 19:57       ` Eric W. Biederman
  2022-08-17 20:13         ` Paul Moore
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2022-08-17 19:57 UTC (permalink / raw)
  To: Paul Moore
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

Paul Moore <paul@paul-moore.com> writes:

> At the end of the v4 patchset I suggested merging this into lsm/next
> so it could get a full -rc cycle in linux-next, assuming no issues
> were uncovered during testing

What in the world can be uncovered in linux-next for code that has no in
tree users.

That is one of my largest problems.  I want to talk about the users and
the use cases and I don't get dialog.  Nor do I get a "hey, look back
there, you missed it".

Since you don't want to rehash this, I will just repeat my conclusion
that the patchset appears to introduce an ineffective defense that will
achieve nothing in the defense of the kernel, and so all it will achieve
is a code maintenance burden and the occasional breakage of legitimate
users of the user namespace.

Further the process is broken.  You are changing the semantics of an
operation with the introduction of a security hook.  That needs a
man-page and discussion on linux-api; in general, all of the scrutiny
we give to new and changed system calls, as this change fundamentally
changes the semantics of creating a user namespace.

Skipping that part of the process is not simply disagreeing; that is
being irresponsible.

Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 19:57       ` Eric W. Biederman
@ 2022-08-17 20:13         ` Paul Moore
  2022-08-17 20:56           ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-17 20:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
>
> > At the end of the v4 patchset I suggested merging this into lsm/next
> > so it could get a full -rc cycle in linux-next, assuming no issues
> > were uncovered during testing
>
> What in the world can be uncovered in linux-next for code that has no in
> tree users.

The patchset provides both BPF LSM and SELinux implementations of the
hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
If no one beats me to it, I plan to work on adding a test to the
selinux-testsuite as soon as I'm done dealing with other urgent
LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
run these tests multiple times a week (multiple times a day sometimes)
against the -rcX kernels with the lsm/next, selinux/next, and
audit/next branches applied on top.  I know others do similar things.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 20:13         ` Paul Moore
@ 2022-08-17 20:56           ` Eric W. Biederman
  2022-08-17 21:09             ` Paul Moore
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2022-08-17 20:56 UTC (permalink / raw)
  To: Paul Moore
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

Paul Moore <paul@paul-moore.com> writes:

> On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Paul Moore <paul@paul-moore.com> writes:
>>
>> > At the end of the v4 patchset I suggested merging this into lsm/next
>> > so it could get a full -rc cycle in linux-next, assuming no issues
>> > were uncovered during testing
>>
>> What in the world can be uncovered in linux-next for code that has no in
>> tree users.
>
> The patchset provides both BPF LSM and SELinux implementations of the
> hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> If no one beats me to it, I plan to work on adding a test to the
> selinux-testsuite as soon as I'm done dealing with other urgent
> LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> run these tests multiple times a week (multiple times a day sometimes)
> against the -rcX kernels with the lsm/next, selinux/next, and
> audit/next branches applied on top.  I know others do similar things.

A layer of hooks that leaves all of the logic to userspace is not an
in-tree user for purposes of understanding the logic of the code.


The reason why I implemented user namespaces is so that all of linux's
neat features could be exposed to non-root userspace processes, in
a way that doesn't break suid root processes.


The access control you are adding to user namespaces looks to take that
away.  It looks to remove the whole point of user namespaces.


So without any mention of how people intend to use this feature, without
any code that uses this hook to implement semantics.  Without any talk
about how this semantic change is reasonable.  I strenuously object.

Eric


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 20:56           ` Eric W. Biederman
@ 2022-08-17 21:09             ` Paul Moore
  2022-08-17 21:24               ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-17 21:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Paul Moore <paul@paul-moore.com> writes:
> >>
> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> >> > were uncovered during testing
> >>
> >> What in the world can be uncovered in linux-next for code that has no in
> >> tree users.
> >
> > The patchset provides both BPF LSM and SELinux implementations of the
> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> > If no one beats me to it, I plan to work on adding a test to the
> > selinux-testsuite as soon as I'm done dealing with other urgent
> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> > run these tests multiple times a week (multiple times a day sometimes)
> > against the -rcX kernels with the lsm/next, selinux/next, and
> > audit/next branches applied on top.  I know others do similar things.
>
> A layer of hooks that leaves all of the logic to userspace is not an
> in-tree user for purposes of understanding the logic of the code.

The BPF LSM selftests which are part of this patchset live in-tree.
The SELinux hook implementation is completely in-tree with the
subject/verb/object relationship clearly described by the code itself.
After all, the selinux_userns_create() function consists of only two
lines, one of which is an assignment.  Yes, it is true that the
SELinux policy lives outside the kernel, but that is because there is
no singular SELinux policy for everyone.  From a practical
perspective, the SELinux policy is really just a configuration file
used to setup the kernel at runtime; it is not significantly different
than an iptables script, /etc/sysctl.conf, or any of the other myriad
of configuration files used to configure the kernel during boot.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 21:09             ` Paul Moore
@ 2022-08-17 21:24               ` Eric W. Biederman
  2022-08-17 21:50                 ` Paul Moore
  2022-08-18 14:05                 ` Serge E. Hallyn
  0 siblings, 2 replies; 35+ messages in thread
From: Eric W. Biederman @ 2022-08-17 21:24 UTC (permalink / raw)
  To: Paul Moore
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

Paul Moore <paul@paul-moore.com> writes:

> On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Paul Moore <paul@paul-moore.com> writes:
>> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> Paul Moore <paul@paul-moore.com> writes:
>> >>
>> >> > At the end of the v4 patchset I suggested merging this into lsm/next
>> >> > so it could get a full -rc cycle in linux-next, assuming no issues
>> >> > were uncovered during testing
>> >>
>> >> What in the world can be uncovered in linux-next for code that has no in
>> >> tree users.
>> >
>> > The patchset provides both BPF LSM and SELinux implementations of the
>> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
>> > If no one beats me to it, I plan to work on adding a test to the
>> > selinux-testsuite as soon as I'm done dealing with other urgent
>> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
>> > run these tests multiple times a week (multiple times a day sometimes)
>> > against the -rcX kernels with the lsm/next, selinux/next, and
>> > audit/next branches applied on top.  I know others do similar things.
>>
>> A layer of hooks that leaves all of the logic to userspace is not an
>> in-tree user for purposes of understanding the logic of the code.
>
> The BPF LSM selftests which are part of this patchset live in-tree.
> The SELinux hook implementation is completely in-tree with the
> subject/verb/object relationship clearly described by the code itself.
> After all, the selinux_userns_create() function consists of only two
> lines, one of which is an assignment.  Yes, it is true that the
> SELinux policy lives outside the kernel, but that is because there is
> no singular SELinux policy for everyone.  From a practical
> perspective, the SELinux policy is really just a configuration file
> used to setup the kernel at runtime; it is not significantly different
> than an iptables script, /etc/sysctl.conf, or any of the other myriad
> of configuration files used to configure the kernel during boot.

I object to adding the new system configuration knob.

Especially when I don't see people explaining why such a knob is a good
idea.  What is userspace going to do with this new feature that makes it
worth maintaining in the kernel?

That is always the conversation we have when adding new features, and
that is exactly the conversation that has not happened here.

Adding a layer of indirection should not exempt a new feature from
needing to justify itself.

Eric


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 21:24               ` Eric W. Biederman
@ 2022-08-17 21:50                 ` Paul Moore
  2022-08-18  0:35                   ` Jonathan Chapman-Moore
  2022-08-18 14:05                 ` Serge E. Hallyn
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-17 21:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

On Wed, Aug 17, 2022 at 5:24 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> I object to adding the new system configuration knob.
>
> Especially when I don't see people explaining why such a knob is a good
> idea.  What is userspace going to do with this new feature that makes it
> worth maintaining in the kernel?

From https://lore.kernel.org/all/CAEiveUdPhEPAk7Y0ZXjPsD=Vb5hn453CHzS9aG-tkyRa8bf_eg@mail.gmail.com/

 "We have valid use cases not specifically related to the
  attack surface, but go into the middle from bpf observability
  to enforcement. As we want to track namespace creation, changes,
  nesting and per task creds context depending on the nature of
  the workload."
 -Djalal Harouni

From https://lore.kernel.org/linux-security-module/CALrw=nGT0kcHh4wyBwUF-Q8+v8DgnyEJM55vfmABwfU67EQn=g@mail.gmail.com/

 "[W]e do want to embrace user namespaces in our code and some of
  our workloads already depend on it. Hence we didn't agree to
  Debian's approach of just having a global sysctl. But there is
  "our code" and there is "third party" code, which might not even
  be open source due to various reasons. And while the path exists
  for that code to do something bad - we want to block it."
 -Ignat Korchagin

From https://lore.kernel.org/linux-security-module/CAHC9VhSKmqn5wxF3BZ67Z+-CV7sZzdnO+JODq48rZJ4WAe8ULA@mail.gmail.com/

 "I've heard you talk about bugs being the only reason why people
  would want to ever block user namespaces, but I think we've all
  seen use cases now where it goes beyond that.  However, even if
  it didn't, the need to build high confidence/assurance systems
  where big chunks of functionality can be disabled based on a
  security policy is a very real use case, and this patchset would
  help enable that."
 -Paul Moore (with apologies for self-quoting)

From https://lore.kernel.org/linux-security-module/CAHC9VhRSCXCM51xpOT95G_WVi=UQ44gNV=uvvG23p8wn16uYSA@mail.gmail.com/

 "One of the selling points of the BPF LSM is that it allows for
  various different ways of reporting and logging beyond audit.
  However, even if it was limited to just audit I believe that
  provides some useful justification as auditing fork()/clone()
  isn't quite the same and could be difficult to do at scale in
  some configurations."
 -Paul Moore (my apologies again)

From https://lore.kernel.org/linux-security-module/20220722082159.jgvw7jgds3qwfyqk@wittgenstein/

 "Nice and straightforward."
 -Christian Brauner

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 21:50                 ` Paul Moore
@ 2022-08-18  0:35                   ` Jonathan Chapman-Moore
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Chapman-Moore @ 2022-08-18  0:35 UTC (permalink / raw)
  To: Paul Moore, Eric W. Biederman
  Cc: Linus Torvalds, Frederick Lawler, kpsingh, revest, jackmanb, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

Hi,

Please remove me from this list and stop harassing me.

Jonathan Moore

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-17 21:24               ` Eric W. Biederman
  2022-08-17 21:50                 ` Paul Moore
@ 2022-08-18 14:05                 ` Serge E. Hallyn
  2022-08-18 15:11                   ` Paul Moore
  1 sibling, 1 reply; 35+ messages in thread
From: Serge E. Hallyn @ 2022-08-18 14:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul Moore, Linus Torvalds, Frederick Lawler, kpsingh, revest,
	jackmanb, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, jmorris, serge, stephen.smalley.work, eparis,
	shuah, brauner, casey, bpf, linux-security-module, selinux,
	linux-kselftest, linux-kernel, netdev, kernel-team, cgzones,
	karl, tixxdz

On Wed, Aug 17, 2022 at 04:24:28PM -0500, Eric W. Biederman wrote:
> Paul Moore <paul@paul-moore.com> writes:
> 
> > On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Paul Moore <paul@paul-moore.com> writes:
> >> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >> Paul Moore <paul@paul-moore.com> writes:
> >> >>
> >> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> >> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> >> >> > were uncovered during testing
> >> >>
> >> >> What in the world can be uncovered in linux-next for code that has no in
> >> >> tree users.
> >> >
> >> > The patchset provides both BPF LSM and SELinux implementations of the
> >> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> >> > If no one beats me to it, I plan to work on adding a test to the
> >> > selinux-testsuite as soon as I'm done dealing with other urgent
> >> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> >> > run these tests multiple times a week (multiple times a day sometimes)
> >> > against the -rcX kernels with the lsm/next, selinux/next, and
> >> > audit/next branches applied on top.  I know others do similar things.
> >>
> >> A layer of hooks that leaves all of the logic to userspace is not an
> >> in-tree user for purposes of understanding the logic of the code.
> >
> > The BPF LSM selftests which are part of this patchset live in-tree.
> > The SELinux hook implementation is completely in-tree with the
> > subject/verb/object relationship clearly described by the code itself.
> > After all, the selinux_userns_create() function consists of only two
> > lines, one of which is an assignment.  Yes, it is true that the
> > SELinux policy lives outside the kernel, but that is because there is
> > no singular SELinux policy for everyone.  From a practical
> > perspective, the SELinux policy is really just a configuration file
> > used to setup the kernel at runtime; it is not significantly different
> > than an iptables script, /etc/sysctl.conf, or any of the other myriad
> > of configuration files used to configure the kernel during boot.
> 
> I object to adding the new system configuration knob.

I do strongly sympathize with Eric's points.  It will be very easy, once
user namespace creation has been further restricted in some distros, to
say "well see this stuff is silly" and go back to simply requiring root
to create all containers and namespaces, which is generally quite a bit
easier anyway.  And then, of course, give everyone root so they can
start containers.

As Eric said,

 | Further adding a random failure mode to user namespace creation if it is
 | used at all will just encourage userspace to use a setuid application to
 | perform the namespace creation instead.  Creating a less secure system
 | overall.

However, I'm also looking at e.g. CVE-2022-2588 and CVE-2022-2586, and
yes there are two issues which do require discussion (three if you
count reportability, which is mainly a tool in guarding against the others).

The first is, indeed, configuration knobs.  There are tools, including
chrome, which use user namespaces to make things better.  The hope is
that more and more tools will do so.

The second is damage control.  When an 0day has been announced, things
change.  You can say "well the bug was there all along", but it is
different when every lazy ne'erdowell can pick an exploit off a mailing
list and use it against a product for which spinning a new version with
a new kernel and getting customers to update is probably a months-long
endeavor.  Some of these products do in fact require namespaces (user
and otherwise) as part of their function.  And - to my chagrin - I suspect
most of them create user namespaces as the root user, before possibly
processing untrusted user input, so unprivileged_userns_clone isn't a good fit.

SELinux (and LSMs in general) do in fact seem like a useful place to
add some configuration, because they tend to assign different domains
to tasks with different purposes and trust levels.  But another such
place is the init system / service manager.  And in most cases these
days, this will use cgroups to collect tasks of certain types.  So I
wonder (this is ALMOST ENTIRELY thinking out loud, not thought through
sufficiently) whether we should be setting a cgroup.nslock or
somesuch.

Of course, kernel livepatch is another potentially useful mitigation.
Currently that's not possible for everyone.

Maybe there is a more fundamental way we can approach this.  Part of me
still likes the idea of splitting the id mapping and capability-in-userns
parts, but that's not sufficient.  Maybe looking over all the relevant
CVEs would give a better hint.

Eric, you said

 | If the concern is to reduce the attack surface everything this
 | proposed hook can do is already possible with the security_capable
 | security hook.

I suppose I could envision an LSM which gets activated when we find
out there was a net-ns-exacerbated 0-day, which refuses CAP_NET_ADMIN
for a task not in init_user_ns?  Ideally it would be more flexible
than that.
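
As a rough illustration of that idea, here is a userspace model of the
decision such an LSM might make.  It is a sketch under stated
assumptions, not kernel code: struct ns_model, netns_lockdown and
model_capable() are hypothetical names, and a real capable hook would
receive the credentials, the target user namespace and the capability
number rather than this simplified struct.

```c
#include <errno.h>
#include <stdbool.h>
#include <linux/capability.h>

/* Hypothetical stand-in for the namespace a capability is checked against. */
struct ns_model {
	bool is_init_user_ns;
};

/* Hypothetical emergency switch, flipped once a net-ns-related 0-day is known. */
bool netns_lockdown = false;

/*
 * Model of a capable-style check: return 0 to defer to the normal
 * capability rules, or -EPERM to refuse outright.
 */
int model_capable(const struct ns_model *ns, int cap)
{
	/* Only CAP_NET_ADMIN outside the initial user namespace is refused. */
	if (netns_lockdown && cap == CAP_NET_ADMIN && !ns->is_init_user_ns)
		return -EPERM;
	return 0;
}
```

With netns_lockdown set, CAP_NET_ADMIN is refused only for checks
against a non-initial user namespace; every other capability, and every
check against the initial namespace, falls through to the usual rules.
That is the "not very flexible" variant; a more flexible design would
need a richer policy input than a single switch.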

> idea.  What is userspace going to do with this new feature that makes it
> worth maintaining in the kernel?
> 
> That is always the conversation we have when adding new features, and
> that is exactly the conversation that has not happened here.

Eric and Paul, I wonder, will you - or some people you'd like to represent
you - be at plumbers in September?  Should there be a BOF session there?  (I
won't be there, but could join over video)  I think a brainstorming session 
for solutions to the above problems would be good.

> Adding a layer of indirection should not exempt a new feature from
> needing to justify itself.
> 
> Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-18 14:05                 ` Serge E. Hallyn
@ 2022-08-18 15:11                   ` Paul Moore
  2022-08-19 14:45                     ` Serge E. Hallyn
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-18 15:11 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Linus Torvalds, Frederick Lawler, kpsingh,
	revest, jackmanb, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, jmorris, stephen.smalley.work, eparis,
	shuah, brauner, casey, bpf, linux-security-module, selinux,
	linux-kselftest, linux-kernel, netdev, kernel-team, cgzones,
	karl, tixxdz

On Thu, Aug 18, 2022 at 10:05 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Wed, Aug 17, 2022 at 04:24:28PM -0500, Eric W. Biederman wrote:
> > Paul Moore <paul@paul-moore.com> writes:
> > > On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > >> Paul Moore <paul@paul-moore.com> writes:
> > >> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > >> >> Paul Moore <paul@paul-moore.com> writes:
> > >> >>
> > >> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> > >> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> > >> >> > were uncovered during testing
> > >> >>
> > >> >> What in the world can be uncovered in linux-next for code that has no in
> > >> >> tree users.
> > >> >
> > >> > The patchset provides both BPF LSM and SELinux implementations of the
> > >> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> > >> > If no one beats me to it, I plan to work on adding a test to the
> > >> > selinux-testsuite as soon as I'm done dealing with other urgent
> > >> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> > >> > run these tests multiple times a week (multiple times a day sometimes)
> > >> > against the -rcX kernels with the lsm/next, selinux/next, and
> > >> > audit/next branches applied on top.  I know others do similar things.
> > >>
> > >> A layer of hooks that leaves all of the logic to userspace is not an
> > >> in-tree user for purposes of understanding the logic of the code.
> > >
> > > The BPF LSM selftests which are part of this patchset live in-tree.
> > > The SELinux hook implementation is completely in-tree with the
> > > subject/verb/object relationship clearly described by the code itself.
> > > After all, the selinux_userns_create() function consists of only two
> > > lines, one of which is an assignment.  Yes, it is true that the
> > > SELinux policy lives outside the kernel, but that is because there is
> > > no singular SELinux policy for everyone.  From a practical
> > > perspective, the SELinux policy is really just a configuration file
> > > used to setup the kernel at runtime; it is not significantly different
> > > than an iptables script, /etc/sysctl.conf, or any of the other myriad
> > > of configuration files used to configure the kernel during boot.
> >
> > I object to adding the new system configuration knob.
>
> I do strongly sympathize with Eric's points.  It will be very easy, once
> user namespace creation has been further restricted in some distros, to
> say "well see this stuff is silly" and go back to simply requiring root
> to create all containers and namespaces, which is generally quite a bit
> easier anyway.  And then, of course, give everyone root so they can
> start containers.

That's assuming a lot.  Many years have passed since namespaces were
first introduced, and awareness of good security practices has
improved, perhaps not as much as any of us would like, but to say that
distros, system builders, and even users are the same as they were so
many years ago is a bit of a stretch in my opinion.

However, even ignoring that for a moment, do we really want to go to a
place where we dictate how users compose and secure their systems?
Linux "took over the world" because it offered a level of flexibility
that wasn't really possible before, and it has flourished because it
has kept that mentality.  The Linux Kernel can be shoehorned onto most
hardware that you can get your hands on these days, with driver
support for most anything you can think to plug into the system.  Do
you want a single-user environment with no per-user separation?  We
can do that.  Do you want a traditional DAC based system that leans
heavy on ACLs and capabilities?  We can do that.  Do you want a
container host that allows you to carve up the system with a high
degree of granularity thanks to the different namespaces?  We can do
that.  How about a system that leverages the LSM to enforce a least
privilege ideal, even on the most privileged root user?  We can do
that too.  This patchset is about giving distro, system builders, and
users another choice in how they build their system.  We've seen both
in this patchset and in previously failed attempts that there is a
definite want from a user perspective for functionality such as this,
and I think it's time we deliver it in the upstream kernel so they
don't have to keep patching their own systems with out-of-tree
patches.

> Eric and Paul, I wonder, will you - or some people you'd like to represent
> you - be at plumbers in September?  Should there be a BOF session there?  (I
> won't be there, but could join over video)  I think a brainstorming session
> for solutions to the above problems would be good.

Regardless of if Eric or I will be at LPC, it is doubtful that all of
the people who have participated in this discussion will be able to
attend, and I think it's important that the users who are asking for
this patchset have a chance to be heard in each forum where this is
discussed.  While conferences are definitely nice - I definitely
missed them over the past couple of years - we can't use them as a
crutch to help us reach a conclusion on this issue; we've debated much
more difficult things over the mailing lists, I see no reason why this
would be any different.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-18 15:11                   ` Paul Moore
@ 2022-08-19 14:45                     ` Serge E. Hallyn
  2022-08-19 21:10                       ` Paul Moore
  0 siblings, 1 reply; 35+ messages in thread
From: Serge E. Hallyn @ 2022-08-19 14:45 UTC (permalink / raw)
  To: Paul Moore
  Cc: Serge E. Hallyn, Eric W. Biederman, Linus Torvalds,
	Frederick Lawler, kpsingh, revest, jackmanb, ast, daniel, andrii,
	kafai, songliubraving, yhs, john.fastabend, jmorris,
	stephen.smalley.work, eparis, shuah, brauner, casey, bpf,
	linux-security-module, selinux, linux-kselftest, linux-kernel,
	netdev, kernel-team, cgzones, karl, tixxdz

On Thu, Aug 18, 2022 at 11:11:06AM -0400, Paul Moore wrote:
> On Thu, Aug 18, 2022 at 10:05 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > On Wed, Aug 17, 2022 at 04:24:28PM -0500, Eric W. Biederman wrote:
> > > Paul Moore <paul@paul-moore.com> writes:
> > > > On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > >> Paul Moore <paul@paul-moore.com> writes:
> > > >> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > >> >> Paul Moore <paul@paul-moore.com> writes:
> > > >> >>
> > > >> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> > > >> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> > > >> >> > were uncovered during testing
> > > >> >>
> > > >> >> What in the world can be uncovered in linux-next for code that has no in
> > > >> >> tree users.
> > > >> >
> > > >> > The patchset provides both BPF LSM and SELinux implementations of the
> > > >> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> > > >> > If no one beats me to it, I plan to work on adding a test to the
> > > >> > selinux-testsuite as soon as I'm done dealing with other urgent
> > > >> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> > > >> > run these tests multiple times a week (multiple times a day sometimes)
> > > >> > against the -rcX kernels with the lsm/next, selinux/next, and
> > > >> > audit/next branches applied on top.  I know others do similar things.
> > > >>
> > > >> A layer of hooks that leaves all of the logic to userspace is not an
> > > >> in-tree user for purposes of understanding the logic of the code.
> > > >
> > > > The BPF LSM selftests which are part of this patchset live in-tree.
> > > > The SELinux hook implementation is completely in-tree with the
> > > > subject/verb/object relationship clearly described by the code itself.
> > > > After all, the selinux_userns_create() function consists of only two
> > > > lines, one of which is an assignment.  Yes, it is true that the
> > > > SELinux policy lives outside the kernel, but that is because there is
> > > > no singular SELinux policy for everyone.  From a practical
> > > > perspective, the SELinux policy is really just a configuration file
> > > > used to setup the kernel at runtime; it is not significantly different
> > > > than an iptables script, /etc/sysctl.conf, or any of the other myriad
> > > > of configuration files used to configure the kernel during boot.
> > >
> > > I object to adding the new system configuration knob.
> >
> > I do strongly sympathize with Eric's points.  It will be very easy, once
> > user namespace creation has been further restricted in some distros, to
> > say "well see this stuff is silly" and go back to simply requiring root
> > to create all containers and namespaces, which is generally quite a bit
> > easier anyway.  And then, of course, give everyone root so they can
> > start containers.
> 
> That's assuming a lot.  Many years have passed since namespaces were
> first introduced, and awareness of good security practices has
> improved, perhaps not as much as any of us would like, but to say that
> distros, system builders, and even users are the same as they were so
> many years ago is a bit of a stretch in my opinion.

Maybe.  But I do get a bit worried based on some of what I've been
reading in mailing lists lately.  Kernel dev definitely moves like
fashion - remember when every api should have its own filesystem?
That was not a different group of people.

> However, even ignoring that for a moment, do we really want to go to a
> place where we dictate how users compose and secure their systems?
> Linux "took over the world" because it offered a level of flexibility
> that wasn't really possible before, and it has flourished because it
> has kept that mentality.  The Linux Kernel can be shoehorned onto most
> hardware that you can get your hands on these days, with driver
> support for most anything you can think to plug into the system.  Do
> you want a single-user environment with no per-user separation?  We
> can do that.  Do you want a traditional DAC based system that leans
> heavy on ACLs and capabilities?  We can do that.  Do you want a
> container host that allows you to carve up the system with a high
> degree of granularity thanks to the different namespaces?  We can do
> that.  How about a system that leverages the LSM to enforce a least
> privilege ideal, even on the most privileged root user?  We can do
> that too.  This patchset is about giving distro, system builders, and
> users another choice in how they build their system.  We've seen both

Oh, you misunderstand.  Whereas I do feel there are important concerns in
Eric's objections, and whereas I don't feel this set sufficiently
addresses the problems that I see and outlined above, I do see value in
this set, and was not aiming to deter it.  We need better ways to
mitigate a certain class of 0-days without completely disallowing use of
user namespaces, and this may help.

> in this patchset and in previously failed attempts that there is a
> definite want from a user perspective for functionality such as this,
> and I think it's time we deliver it in the upstream kernel so they
> don't have to keep patching their own systems with out-of-tree
> patches.
> 
> > Eric and Paul, I wonder, will you - or some people you'd like to represent
> > you - be at plumbers in September?  Should there be a BOF session there?  (I
> > won't be there, but could join over video)  I think a brainstorming session
> > for solutions to the above problems would be good.
> 
> Regardless of if Eric or I will be at LPC, it is doubtful that all of
> the people who have participated in this discussion will be able to
> attend, and I think it's important that the users who are asking for
> this patchset have a chance to be heard in each forum where this is
> discussed.  While conferences are definitely nice - I definitely
> missed them over the past couple of years - we can't use them as a
> crutch to help us reach a conclusion on this issue; we've debated much

No I wasn't thinking we would use LPC to decide on this patchset.  As far
as I can see, the patchset is merged.  I am hoping we can come up with
"something better" to address people's needs, make everyone happy, and
bring forth world peace.  Which would stack just fine with what's here
for defense in depth.

You may well not be interested in further work, and that's fine.  I need
to set aside a few days to think on this.

> more difficult things over the mailing lists, I see no reason why this
> would be any different.
> 
> -- 
> paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-19 14:45                     ` Serge E. Hallyn
@ 2022-08-19 21:10                       ` Paul Moore
  2022-08-25 18:15                         ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-19 21:10 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Linus Torvalds, Frederick Lawler, kpsingh,
	revest, jackmanb, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, jmorris, stephen.smalley.work, eparis,
	shuah, brauner, casey, bpf, linux-security-module, selinux,
	linux-kselftest, linux-kernel, netdev, kernel-team, cgzones,
	karl, tixxdz

On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Thu, Aug 18, 2022 at 11:11:06AM -0400, Paul Moore wrote:
> > On Thu, Aug 18, 2022 at 10:05 AM Serge E. Hallyn <serge@hallyn.com> wrote:

...

> > > I do strongly sympathize with Eric's points.  It will be very easy, once
> > > user namespace creation has been further restricted in some distros, to
> > > say "well see this stuff is silly" and go back to simply requiring root
> > > to create all containers and namespaces, which is generally quite a bit
> > > easier anyway.  And then, of course, give everyone root so they can
> > > start containers.
> >
> > That's assuming a lot.  Many years have passed since namespaces were
> > first introduced, and awareness of good security practices has
> > improved, perhaps not as much as any of us would like, but to say that
> > distros, system builders, and even users are the same as they were so
> > many years ago is a bit of a stretch in my opinion.
>
> Maybe.  But I do get a bit worried based on some of what I've been
> reading in mailing lists lately.  Kernel dev definitely moves like
> fashion - remember when every api should have its own filesystem?
> That was not a different group of people.

I'm not going to argue against the idea that kernel development is
subject to fads, I just don't agree that adding a LSM control point
for user namespace creation is going to be the end of user namespaces.

> > However, even ignoring that for a moment, do we really want to go to a
> > place where we dictate how users compose and secure their systems?
> > Linux "took over the world" because it offered a level of flexibility
> > that wasn't really possible before, and it has flourished because it
> > has kept that mentality.  The Linux Kernel can be shoehorned onto most
> > hardware that you can get your hands on these days, with driver
> > support for most anything you can think to plug into the system.  Do
> > you want a single-user environment with no per-user separation?  We
> > can do that.  Do you want a traditional DAC based system that leans
> > heavy on ACLs and capabilities?  We can do that.  Do you want a
> > container host that allows you to carve up the system with a high
> > degree of granularity thanks to the different namespaces?  We can do
> > that.  How about a system that leverages the LSM to enforce a least
> > privilege ideal, even on the most privileged root user?  We can do
> > that too.  This patchset is about giving distro, system builders, and
> > users another choice in how they build their system.  We've seen both
>
> Oh, you misunderstand.  Whereas I do feel there are important concerns in
> Eric's objections, and whereas I don't feel this set sufficiently
> addresses the problems that I see and outlined above, I do see value in
> this set, and was not aiming to deter it.  We need better ways to
> mitigate a certain class of 0-days without completely disallowing use of
> user namespaces, and this may help.

Ah, thanks for the explanation, I missed that (obviously) in your
previous email.  If I'm perfectly honest, I suppose the protracted
debate with Eric has also left me a little overly sensitive to any
perceived arguments against this patchset.

> > in this patchset and in previously failed attempts that there is a
> > definite want from a user perspective for functionality such as this,
> > and I think it's time we deliver it in the upstream kernel so they
> > don't have to keep patching their own systems with out-of-tree
> > patches.
> >
> > > Eric and Paul, I wonder, will you - or some people you'd like to represent
> > > you - be at plumbers in September?  Should there be a BOF session there?  (I
> > > won't be there, but could join over video)  I think a brainstorming session
> > > for solutions to the above problems would be good.
> >
> > Regardless of if Eric or I will be at LPC, it is doubtful that all of
> > the people who have participated in this discussion will be able to
> > attend, and I think it's important that the users who are asking for
> > this patchset have a chance to be heard in each forum where this is
> > discussed.  While conferences are definitely nice - I definitely
> > missed them over the past couple of years - we can't use them as a
> > crutch to help us reach a conclusion on this issue; we've debated much
>
> No I wasn't thinking we would use LPC to decide on this patchset.  As far
> as I can see, the patchset is merged.

While I maintain that Frederick's patches are a good thing, I'm not
going to consider them "merged" until I see them in Linus' tree or
Linus decides to voice his support on the lists.  These patches do
have Eric's NACK, and a maintainer's NACK isn't something to take
lightly.  I certainly don't.

>  I am hoping we can come up with
> "something better" to address people's needs, make everyone happy, and
> bring forth world peace.  Which would stack just fine with what's here
> for defense in depth.
>
> You may well not be interested in further work, and that's fine.  I need
> to set aside a few days to think on this.

I'm happy to continue the discussion as long as it's constructive; I
think we all are.  My gut feeling is that Frederick's approach falls
closest to the sweet spot of "workable without being overly offensive"
(*cough*), but if you've got an additional approach in mind, or an
alternative approach that solves the same use case problems, I think
we'd all love to hear about it.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-19 21:10                       ` Paul Moore
@ 2022-08-25 18:15                         ` Eric W. Biederman
  2022-08-25 19:19                           ` Paul Moore
  2022-08-26 15:23                           ` Serge E. Hallyn
  0 siblings, 2 replies; 35+ messages in thread
From: Eric W. Biederman @ 2022-08-25 18:15 UTC (permalink / raw)
  To: Paul Moore
  Cc: Serge E. Hallyn, Linus Torvalds, Frederick Lawler, kpsingh,
	revest, jackmanb, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, jmorris, stephen.smalley.work, eparis,
	shuah, brauner, casey, bpf, linux-security-module, selinux,
	linux-kselftest, linux-kernel, netdev, kernel-team, cgzones,
	karl, tixxdz

Paul Moore <paul@paul-moore.com> writes:

> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>>  I am hoping we can come up with
>> "something better" to address people's needs, make everyone happy, and
>> bring forth world peace.  Which would stack just fine with what's here
>> for defense in depth.
>>
>> You may well not be interested in further work, and that's fine.  I need
>> to set aside a few days to think on this.
>
> I'm happy to continue the discussion as long as it's constructive; I
> think we all are.  My gut feeling is that Frederick's approach falls
> closest to the sweet spot of "workable without being overly offensive"
> (*cough*), but if you've got an additional approach in mind, or an
> alternative approach that solves the same use case problems, I think
> we'd all love to hear about it.

I would love to actually hear the problems people are trying to solve so
that we can have a sensible conversation about the trade offs.

As best I can tell without more information people want to use
the creation of a user namespace as a signal that the code is
attempting an exploit.

As such let me propose instead of returning an error code which will let
the exploit continue, have the security hook return a bool.  With true
meaning the code can continue and on false it will trigger using SIGSYS
to terminate the program like seccomp does.

I am not super fond of that idea, but it means that userspace code is
not expected to deal with the situation, and the only conversation a
userspace application developer needs to enter into with a system
administrator or security policy developer is one to prove they are not
exploit code.  Plus it makes much more sense to kill an exploit
immediately instead of letting it run.


In general when addressing code coverage concerns I think it makes more
sense to use the security hooks to implement some variety of the principle
of least privilege and only give applications access to the kernel
facilities they are known to use.

As far as I can tell creating a user namespace does not increase the
attack surface.  It is the creation of the other namespaces from a user
namespace that begins to do that.  So in general I would think
restrictions should be in places they matter.

Just like the bugs that have exploits that involve the user namespace
are not user namespace bugs, but instead they are bugs in other
subsystems that just happen to go through the user namespace as the
easiest path to the buggy code, not the only path to the buggy code.

Eric


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 18:15                         ` Eric W. Biederman
@ 2022-08-25 19:19                           ` Paul Moore
  2022-08-25 21:58                             ` Song Liu
  2022-08-26  9:10                             ` Ignat Korchagin
  2022-08-26 15:23                           ` Serge E. Hallyn
  1 sibling, 2 replies; 35+ messages in thread
From: Paul Moore @ 2022-08-25 19:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Linus Torvalds, Frederick Lawler, kpsingh,
	revest, jackmanb, ast, daniel, andrii, kafai, songliubraving,
	yhs, john.fastabend, jmorris, stephen.smalley.work, eparis,
	shuah, brauner, casey, bpf, linux-security-module, selinux,
	linux-kselftest, linux-kernel, netdev, kernel-team, cgzones,
	karl, tixxdz

On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> > On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>  I am hoping we can come up with
> >> "something better" to address people's needs, make everyone happy, and
> >> bring forth world peace.  Which would stack just fine with what's here
> >> for defense in depth.
> >>
> >> You may well not be interested in further work, and that's fine.  I need
> >> to set aside a few days to think on this.
> >
> > I'm happy to continue the discussion as long as it's constructive; I
> > think we all are.  My gut feeling is that Frederick's approach falls
> > closest to the sweet spot of "workable without being overly offensive"
> > (*cough*), but if you've got an additional approach in mind, or an
> > alternative approach that solves the same use case problems, I think
> > we'd all love to hear about it.
>
> I would love to actually hear the problems people are trying to solve so
> that we can have a sensible conversation about the trade offs.

Here are several taken from the previous threads, it's surely not a
complete list, but it should give you a good idea:

https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/

> As best I can tell without more information people want to use
> the creation of a user namespace as a signal that the code is
> attempting an exploit.

Some use cases are like that, there are several other use cases that
go beyond this; see all of our previous discussions on this
topic/patchset.  As has been mentioned before, there are use cases
that require improved observability, access control, or both.

> As such let me propose instead of returning an error code which will let
> the exploit continue, have the security hook return a bool.  With true
> meaning the code can continue and on false it will trigger using SIGSYS
> to terminate the program like seccomp does.

Having the kernel forcibly exit the process isn't something that most
LSMs would likely want.  I suppose we could modify the hook/caller so
that *if* an LSM wanted to return SIGSYS the system would kill the
process, but I would want that to be something in addition to
returning an error code like LSMs normally do (e.g. EACCES).

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 19:19                           ` Paul Moore
@ 2022-08-25 21:58                             ` Song Liu
  2022-08-25 22:10                               ` Paul Moore
  2022-08-26 15:24                               ` Serge E. Hallyn
  2022-08-26  9:10                             ` Ignat Korchagin
  1 sibling, 2 replies; 35+ messages in thread
From: Song Liu @ 2022-08-25 21:58 UTC (permalink / raw)
  To: Paul Moore
  Cc: Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz



> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
> 
> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Paul Moore <paul@paul-moore.com> writes:
>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>>>> I am hoping we can come up with
>>>> "something better" to address people's needs, make everyone happy, and
>>>> bring forth world peace.  Which would stack just fine with what's here
>>>> for defense in depth.
>>>> 
>>>> You may well not be interested in further work, and that's fine.  I need
>>>> to set aside a few days to think on this.
>>> 
>>> I'm happy to continue the discussion as long as it's constructive; I
>>> think we all are.  My gut feeling is that Frederick's approach falls
>>> closest to the sweet spot of "workable without being overly offensive"
>>> (*cough*), but if you've got an additional approach in mind, or an
>>> alternative approach that solves the same use case problems, I think
>>> we'd all love to hear about it.
>> 
>> I would love to actually hear the problems people are trying to solve so
>> that we can have a sensible conversation about the trade offs.
> 
> Here are several taken from the previous threads, it's surely not a
> complete list, but it should give you a good idea:
> 
> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> 
>> As best I can tell without more information people want to use
>> the creation of a user namespace as a signal that the code is
>> attempting an exploit.
> 
> Some use cases are like that, there are several other use cases that
> go beyond this; see all of our previous discussions on this
> topic/patchset.  As has been mentioned before, there are use cases
> that require improved observability, access control, or both.
> 
>> As such let me propose instead of returning an error code which will let
>> the exploit continue, have the security hook return a bool.  With true
>> meaning the code can continue and on false it will trigger using SIGSYS
>> to terminate the program like seccomp does.
> 
> Having the kernel forcibly exit the process isn't something that most
> LSMs would likely want.  I suppose we could modify the hook/caller so
> that *if* an LSM wanted to return SIGSYS the system would kill the
> process, but I would want that to be something in addition to
> returning an error code like LSMs normally do (e.g. EACCES).

I am new to user_namespace and security work, so please pardon me if
anything below is very wrong. 

IIUC, user_namespace is a tool that enables trusted userspace code to 
control the behavior of untrusted (or less trusted) userspace code. 
Failing create_user_ns() doesn't make the system more reliable. 
Specifically, we call create_user_ns() via two paths: fork/clone and 
unshare. For both paths, we need the userspace to use user_namespace, 
and to honor failed create_user_ns(). 
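
As a rough illustration of what "honoring" a failure could look like on the
unshare path, here is a minimal userspace sketch.  It is an assumption of
mine, not something from the patchset: probe_userns() and choose_mode() are
hypothetical names, and the fallback policy is only one possible choice.
The probe runs in a short-lived child so the caller's own namespaces are
not changed.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000	/* value from the kernel UAPI headers */
#endif

/* Hypothetical run modes a service might pick between. */
enum run_mode { RUN_SANDBOXED, RUN_DEGRADED };

/*
 * Probe whether this task may create a user namespace.
 * Returns 0 if creation succeeded, or a negative errno value if not.
 */
static int probe_userns(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -errno;
	if (pid == 0) {
		/* Raw syscall: avoids relying on _GNU_SOURCE for unshare(). */
		if (syscall(SYS_unshare, CLONE_NEWUSER) == 0)
			_exit(0);
		_exit(errno);	/* errno values fit in an exit status */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -errno;
	if (!WIFEXITED(status))
		return -ECHILD;	/* child was killed rather than denied */
	return -WEXITSTATUS(status);
}

/* Assumed policy: sandbox when possible, otherwise run degraded. */
static enum run_mode choose_mode(int probe_result)
{
	return probe_result == 0 ? RUN_SANDBOXED : RUN_DEGRADED;
}
```

A caller would do something like choose_mode(probe_userns()) and log the
errno when it ends up degraded; whether degrading is acceptable at all is
exactly the policy question being discussed here.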

On the other hand, I would echo that killing the process is not 
practical in some use cases. Specifically, allowing the application to 
run in a less secure environment for a short period of time might be 
much better than killing it and taking down the whole service. Of 
course, there are other cases where security is more important, and
taking down the whole service is the better choice. 

I guess the ultimate solution is a way to enforce using user_namespace
in the kernel (if it ever makes sense...). But I don't know how that
is going to work. Before we have such a solution, maybe we only need a
void hook for observability (or just a tracepoint, coming from BPF
background). 

Thanks,
Song


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 21:58                             ` Song Liu
@ 2022-08-25 22:10                               ` Paul Moore
  2022-08-25 22:42                                 ` Song Liu
  2022-08-26 15:24                               ` Serge E. Hallyn
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-25 22:10 UTC (permalink / raw)
  To: Song Liu
  Cc: Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz

On Thu, Aug 25, 2022 at 5:58 PM Song Liu <songliubraving@fb.com> wrote:
> > On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Paul Moore <paul@paul-moore.com> writes:
> >>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>> I am hoping we can come up with
> >>>> "something better" to address people's needs, make everyone happy, and
> >>>> bring forth world peace.  Which would stack just fine with what's here
> >>>> for defense in depth.
> >>>>
> >>>> You may well not be interested in further work, and that's fine.  I need
> >>>> to set aside a few days to think on this.
> >>>
> >>> I'm happy to continue the discussion as long as it's constructive; I
> >>> think we all are.  My gut feeling is that Frederick's approach falls
> >>> closest to the sweet spot of "workable without being overly offensive"
> >>> (*cough*), but if you've got an additional approach in mind, or an
> >>> alternative approach that solves the same use case problems, I think
> >>> we'd all love to hear about it.
> >>
> >> I would love to actually hear the problems people are trying to solve so
> >> that we can have a sensible conversation about the trade offs.
> >
> > Here are several taken from the previous threads, it's surely not a
> > complete list, but it should give you a good idea:
> >
> > https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> >
> >> As best I can tell without more information people want to use
> >> the creation of a user namespace as a signal that the code is
> >> attempting an exploit.
> >
> > Some use cases are like that, there are several other use cases that
> > go beyond this; see all of our previous discussions on this
> > topic/patchset.  As has been mentioned before, there are use cases
> > that require improved observability, access control, or both.
> >
> >> As such let me propose instead of returning an error code which will let
> >> the exploit continue, have the security hook return a bool.  With true
> >> meaning the code can continue and on false it will trigger using SIGSYS
> >> to terminate the program like seccomp does.
> >
> > Having the kernel forcibly exit the process isn't something that most
> > LSMs would likely want.  I suppose we could modify the hook/caller so
> > that *if* an LSM wanted to return SIGSYS the system would kill the
> > process, but I would want that to be something in addition to
> > returning an error code like LSMs normally do (e.g. EACCES).
>
> I am new to user_namespace and security work, so please pardon me if
> anything below is very wrong.
>
> IIUC, user_namespace is a tool that enables trusted userspace code to
> control the behavior of untrusted (or less trusted) userspace code.
> Failing create_user_ns() doesn't make the system more reliable.
> Specifically, we call create_user_ns() via two paths: fork/clone and
> unshare. For both paths, we need the userspace to use user_namespace,
> and to honor failed create_user_ns().
>
> On the other hand, I would echo that killing the process is not
> practical in some use cases. Specifically, allowing the application to
> run in a less secure environment for a short period of time might be
> much better than killing it and taking down the whole service. Of
> course, there are other cases that security is more important, and
> taking down the whole service is the better choice.
>
> I guess the ultimate solution is a way to enforce using user_namespace
> in the kernel (if it ever makes sense...).

The LSM framework, and the BPF and SELinux LSM implementations in this
patchset, provide a mechanism to do just that: kernel enforced access
controls using flexible security policies which can be tailored by the
distro, solution provider, or end user to meet the specific needs of
their use case.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 22:10                               ` Paul Moore
@ 2022-08-25 22:42                                 ` Song Liu
  2022-08-26 15:02                                   ` Paul Moore
  0 siblings, 1 reply; 35+ messages in thread
From: Song Liu @ 2022-08-25 22:42 UTC (permalink / raw)
  To: Paul Moore
  Cc: Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz



> On Aug 25, 2022, at 3:10 PM, Paul Moore <paul@paul-moore.com> wrote:
> 
> On Thu, Aug 25, 2022 at 5:58 PM Song Liu <songliubraving@fb.com> wrote:
>>> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
>>> 
>>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>> Paul Moore <paul@paul-moore.com> writes:
>>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>>>>>> I am hoping we can come up with
>>>>>> "something better" to address people's needs, make everyone happy, and
>>>>>> bring forth world peace.  Which would stack just fine with what's here
>>>>>> for defense in depth.
>>>>>> 
>>>>>> You may well not be interested in further work, and that's fine.  I need
>>>>>> to set aside a few days to think on this.
>>>>> 
>>>>> I'm happy to continue the discussion as long as it's constructive; I
>>>>> think we all are.  My gut feeling is that Frederick's approach falls
>>>>> closest to the sweet spot of "workable without being overly offensive"
>>>>> (*cough*), but if you've got an additional approach in mind, or an
>>>>> alternative approach that solves the same use case problems, I think
>>>>> we'd all love to hear about it.
>>>> 
>>>> I would love to actually hear the problems people are trying to solve so
>>>> that we can have a sensible conversation about the trade offs.
>>> 
>>> Here are several taken from the previous threads, it's surely not a
>>> complete list, but it should give you a good idea:
>>> 
>>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
>>> 
>>>> As best I can tell without more information people want to use
>>>> the creation of a user namespace as a signal that the code is
>>>> attempting an exploit.
>>> 
>>> Some use cases are like that, there are several other use cases that
>>> go beyond this; see all of our previous discussions on this
>>> topic/patchset.  As has been mentioned before, there are use cases
>>> that require improved observability, access control, or both.
>>> 
>>>> As such let me propose instead of returning an error code which will let
>>>> the exploit continue, have the security hook return a bool.  With true
>>>> meaning the code can continue and on false it will trigger using SIGSYS
>>>> to terminate the program like seccomp does.
>>> 
>>> Having the kernel forcibly exit the process isn't something that most
>>> LSMs would likely want.  I suppose we could modify the hook/caller so
>>> that *if* an LSM wanted to return SIGSYS the system would kill the
>>> process, but I would want that to be something in addition to
>>> returning an error code like LSMs normally do (e.g. EACCES).
>> 
>> I am new to user_namespace and security work, so please pardon me if
>> anything below is very wrong.
>> 
>> IIUC, user_namespace is a tool that enables trusted userspace code to
>> control the behavior of untrusted (or less trusted) userspace code.
>> Failing create_user_ns() doesn't make the system more reliable.
>> Specifically, we call create_user_ns() via two paths: fork/clone and
>> unshare. For both paths, we need the userspace to use user_namespace,
>> and to honor failed create_user_ns().
>> 
>> On the other hand, I would echo that killing the process is not
>> practical in some use cases. Specifically, allowing the application to
>> run in a less secure environment for a short period of time might be
>> much better than killing it and taking down the whole service. Of
>> course, there are other cases that security is more important, and
>> taking down the whole service is the better choice.
>> 
>> I guess the ultimate solution is a way to enforce using user_namespace
>> in the kernel (if it ever makes sense...).
> 
> The LSM framework, and the BPF and SELinux LSM implementations in this
> patchset, provide a mechanism to do just that: kernel enforced access
> controls using flexible security policies which can be tailored by the
> distro, solution provider, or end user to meet the specific needs of
> their use case.

In this case, I wouldn't say the kernel is enforcing access control.
(I might be wrong). There are 3 components here: kernel, LSM, and 
trusted userspace (whoever calls unshare). AFAICT, kernel simply passes
the decision made by LSM (BPF or SELinux) to the trusted userspace. It 
is up to the trusted userspace to honor the return value of unshare(). 
If the userspace simply ignores unshare failures, or does not call
unshare(CLONE_NEWUSER), kernel and LSM cannot do much about it, right?

This might still be useful in some cases. (I am far from an expert on
these). I just feel this is not the typical solution to enforce 
something.

Thanks,
Song

PS: If I said something very stupid, I would not feel offended if someone
pointed it out loud. :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 19:19                           ` Paul Moore
  2022-08-25 21:58                             ` Song Liu
@ 2022-08-26  9:10                             ` Ignat Korchagin
  2022-08-26 15:12                               ` Paul Moore
  1 sibling, 1 reply; 35+ messages in thread
From: Ignat Korchagin @ 2022-08-26  9:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Paul Moore, Linus Torvalds, Frederick Lawler,
	kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris,
	stephen.smalley.work, eparis, shuah, Christian Brauner, casey,
	bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

On Thu, Aug 25, 2022 at 8:19 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Paul Moore <paul@paul-moore.com> writes:
> > > On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > >>  I am hoping we can come up with
> > >> "something better" to address people's needs, make everyone happy, and
> > >> bring forth world peace.  Which would stack just fine with what's here
> > >> for defense in depth.
> > >>
> > >> You may well not be interested in further work, and that's fine.  I need
> > >> to set aside a few days to think on this.
> > >
> > > I'm happy to continue the discussion as long as it's constructive; I
> > > think we all are.  My gut feeling is that Frederick's approach falls
> > > closest to the sweet spot of "workable without being overly offensive"
> > > (*cough*), but if you've got an additional approach in mind, or an
> > > alternative approach that solves the same use case problems, I think
> > > we'd all love to hear about it.
> >
> > I would love to actually hear the problems people are trying to solve so
> > that we can have a sensible conversation about the trade offs.
>
> Here are several taken from the previous threads, it's surely not a
> complete list, but it should give you a good idea:
>
> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
>
> > As best I can tell without more information people want to use
> > the creation of a user namespace as a signal that the code is
> > attempting an exploit.
>
> Some use cases are like that, there are several other use cases that
> go beyond this; see all of our previous discussions on this
> topic/patchset.  As has been mentioned before, there are use cases
> that require improved observability, access control, or both.
>
> > As such let me propose instead of returning an error code which will let
> > the exploit continue, have the security hook return a bool.  With true
> > meaning the code can continue and on false it will trigger using SIGSYS
> > to terminate the program like seccomp does.
>
> Having the kernel forcibly exit the process isn't something that most
> LSMs would likely want.  I suppose we could modify the hook/caller so
> that *if* an LSM wanted to return SIGSYS the system would kill the
> process, but I would want that to be something in addition to
> returning an error code like LSMs normally do (e.g. EACCES).

I would also add here that seccomp allows more flexibility than just
delivering SIGSYS to a violating application. We can program seccomp
bpf to:
  * deliver a signal
  * return a CUSTOM error code (and BTW somehow this does not trigger
any requirements to change userapi or document in manpages: in my toy
example in [1] I'm delivering ENETDOWN from a uname(2) system call,
which is not documented in the man pages, but totally valid from a
seccomp usage perspective)
  * do-nothing, but log the action

So I would say the seccomp reference supports the current approach
more than the alternative of always delivering SIGSYS: technically, an
LSM implementation of the hook (at least an in-kernel one) can choose
to deliver a signal to a task via the kernel API, while BPF-LSM (and
others) can deliver custom error codes and log the actions as well.
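
For readers unfamiliar with the "custom error code" case, here is a
minimal sketch of such a filter.  It is not the code from [1]; the
helper names errno_action() and deny_uname_with() are made up for
illustration, and a production filter would also validate
seccomp_data.arch before trusting the syscall number.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>     /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>    /* SECCOMP_RET_*, struct seccomp_data */
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Hypothetical helper: the filter return value that makes the filtered
 * syscall fail with errno `err` instead of killing the task. */
unsigned int errno_action(int err)
{
	return SECCOMP_RET_ERRNO | ((unsigned int)err & SECCOMP_RET_DATA);
}

/* Hypothetical helper: install a filter under which uname(2) fails with
 * `err` and every other syscall is allowed.  Sketch only: the arch
 * check is deliberately omitted to keep the example short. */
int deny_uname_with(int err)
{
	struct sock_filter insns[] = {
		/* Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* If it is uname, fall through; otherwise skip one insn. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_uname, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, errno_action(err)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insns) / sizeof(insns[0]),
		.filter = insns,
	};

	/* Required for unprivileged tasks before loading a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After deny_uname_with(ENETDOWN) succeeds, a later uname(2) in the same
task should return -1 with errno set to ENETDOWN.  Swapping the action
for SECCOMP_RET_LOG or SECCOMP_RET_KILL_PROCESS gives the "log only"
and "deliver a signal" behaviours listed above.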

Ignat

> --
> paul-moore.com

[1]: https://blog.cloudflare.com/sandboxing-in-linux-with-zero-lines-of-code/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 22:42                                 ` Song Liu
@ 2022-08-26 15:02                                   ` Paul Moore
  2022-08-26 16:57                                     ` Song Liu
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2022-08-26 15:02 UTC (permalink / raw)
  To: Song Liu
  Cc: Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz

On Thu, Aug 25, 2022 at 6:42 PM Song Liu <songliubraving@fb.com> wrote:
> > On Aug 25, 2022, at 3:10 PM, Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Aug 25, 2022 at 5:58 PM Song Liu <songliubraving@fb.com> wrote:

...

> >> I am new to user_namespace and security work, so please pardon me if
> >> anything below is very wrong.
> >>
> >> IIUC, user_namespace is a tool that enables trusted userspace code to
> >> control the behavior of untrusted (or less trusted) userspace code.
> >> Failing create_user_ns() doesn't make the system more reliable.
> >> Specifically, we call create_user_ns() via two paths: fork/clone and
> >> unshare. For both paths, we need the userspace to use user_namespace,
> >> and to honor failed create_user_ns().
> >>
> >> On the other hand, I would echo that killing the process is not
> >> practical in some use cases. Specifically, allowing the application to
> >> run in a less secure environment for a short period of time might be
> >> much better than killing it and taking down the whole service. Of
> >> course, there are other cases that security is more important, and
> >> taking down the whole service is the better choice.
> >>
> >> I guess the ultimate solution is a way to enforce using user_namespace
> >> in the kernel (if it ever makes sense...).
> >
> > The LSM framework, and the BPF and SELinux LSM implementations in this
> > patchset, provide a mechanism to do just that: kernel enforced access
> > controls using flexible security policies which can be tailored by the
> > distro, solution provider, or end user to meet the specific needs of
> > their use case.
>
> In this case, I wouldn't say the kernel is enforcing access control.
> (I might be wrong). There are 3 components here: kernel, LSM, and
> trusted userspace (whoever calls unshare).

The LSM layer, and the LSMs themselves are part of the kernel; look at
the changes in this patchset to see the LSM, BPF LSM, and SELinux
kernel changes.  Explaining how the different LSMs work is quite a bit
beyond the scope of this discussion, but there is plenty of
information available online that should be able to serve as an
introduction, not to mention the kernel source itself.  However, in
very broad terms you can think of the individual LSMs as somewhat
analogous to filesystem drivers, e.g. ext4, and the LSM itself as the
VFS layer.

> AFAICT, kernel simply passes
> the decision made by LSM (BPF or SELinux) to the trusted userspace. It
> is up to the trusted userspace to honor the return value of unshare().

With a LSM enabled and enforcing a security policy on user namespace
creation, which appears to be the case of most concern, the kernel
would make a decision on the namespace creation based on various
factors (e.g. for SELinux this would be the calling process' security
domain and the domain's permission set as determined by the configured
security policy) and, if the policy denied the operation, an error code
would be returned to userspace and no namespace created.  It is the exact
same thing as what would happen if the calling process is chrooted or
doesn't have a proper UID/GID mapping.  Don't forget that the
create_user_ns() function already enforces a security policy and
returns errors to userspace; this patchset doesn't add anything new in
that regard, it just allows for a richer and more flexible security
policy to be built on top of the existing constraints.
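
Seen from userspace, a denial is just a failed syscall.  A rough
sketch, assuming a helper name probe_userns() that is purely
illustrative: a child attempts unshare(CLONE_NEWUSER) through the raw
syscall and reports the errno it got, so the caller's own namespaces
stay untouched.  Which errno a given LSM policy returns is up to that
LSM; the code does not assume one.

```c
#include <errno.h>
#include <linux/sched.h>      /* CLONE_NEWUSER */
#include <sys/syscall.h>      /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 0 if a new user namespace could be
 * created, the (positive) errno if the kernel refused, or -1 if the
 * probe itself could not be run. */
int probe_userns(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the refusal may come from the existing checks in
		 * create_user_ns() or from an LSM; userspace cannot tell
		 * which, it only sees the error code. */
		if (syscall(SYS_unshare, CLONE_NEWUSER) == 0)
			_exit(0);
		_exit(errno);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Whether the caller checks this result or ignores it, no user namespace
exists after a refusal, which is the sense in which the decision is
enforced by the kernel rather than by the application.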

> If the userspace simply ignores unshare failures, or does not call
> unshare(CLONE_NEWUSER), kernel and LSM cannot do much about it, right?

The process is still subject to any security policies that are active
and being enforced by the kernel.  A malicious or misconfigured
application can still be constrained by the kernel using both the
kernel's legacy Discretionary Access Controls (DAC) as well as the
more comprehensive Mandatory Access Controls (MAC) provided by many of
the LSMs.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-26  9:10                             ` Ignat Korchagin
@ 2022-08-26 15:12                               ` Paul Moore
  0 siblings, 0 replies; 35+ messages in thread
From: Paul Moore @ 2022-08-26 15:12 UTC (permalink / raw)
  To: Ignat Korchagin
  Cc: Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, kpsingh, revest, jackmanb, ast, daniel, andrii,
	kafai, songliubraving, yhs, john.fastabend, jmorris,
	stephen.smalley.work, eparis, shuah, Christian Brauner, casey,
	bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl, tixxdz

On Fri, Aug 26, 2022 at 5:11 AM Ignat Korchagin <ignat@cloudflare.com> wrote:
> I would also add here that seccomp allows more flexibility than just
> delivering SIGSYS to a violating application. We can program seccomp
> bpf to:
>   * deliver a signal
>   * return a CUSTOM error code (and BTW somehow this does not trigger
> any requirements to change userapi or document in manpages: in my toy
> example in [1] I'm delivering ENETDOWN from a uname(2) system call,
> which is not documented in the man pages, but totally valid from a
> seccomp usage perspective)
>   * do-nothing, but log the action
>
> So I would say the seccomp reference supports the current approach
> more than the alternative of always delivering SIGSYS: technically, an
> LSM implementation of the hook (at least an in-kernel one) can choose
> to deliver a signal to a task via the kernel API, while BPF-LSM (and
> others) can deliver custom error codes and log the actions as well.

I agree that seccomp mode 2 allows for more flexibility than was
mentioned earlier, however seccomp filtering has some limitations in
this particular case which can be an issue for some.  The first, and
perhaps most important, is that some of the information that a seccomp
filter might want to inspect is effectively hidden with the clone3(2)
syscall due to the clone_args struct; this would make it difficult for
a seccomp filter to identify namespace related operations.  The second
issue is that a seccomp mode 2 based approach requires the
applications themselves to "Do The Right Thing" and ensure that the
proper seccomp filter is loaded into the kernel before the target
fork()/clone()/unshare() call is executed; a LSM which implements a
proper mandatory access control mechanism does not rely on the
application, it enforces the system's security policy regardless of
what actions userspace performs.
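
To make the clone3(2) limitation concrete, here is a small sketch of
what a seccomp filter is handed in each case.  The helper names
view_of_clone() and view_of_clone3() are invented for illustration,
the arch and instruction_pointer fields are left zeroed, and the
clone(2) argument order shown is the x86-64/arm64 one (it differs on
some other architectures).

```c
#include <stdint.h>
#include <string.h>
#include <linux/sched.h>      /* struct clone_args, CLONE_NEWUSER */
#include <linux/seccomp.h>    /* struct seccomp_data */
#include <sys/syscall.h>      /* SYS_clone, SYS_clone3 */

/* Hypothetical helper: the syscall data a filter sees for clone(2).
 * The flags are passed by value, so a classic BPF filter can test
 * args[0] for CLONE_NEWUSER directly. */
struct seccomp_data view_of_clone(uint64_t flags)
{
	struct seccomp_data sd;

	memset(&sd, 0, sizeof(sd));
	sd.nr = SYS_clone;
	sd.args[0] = flags;
	return sd;
}

/* Hypothetical helper: the syscall data a filter sees for clone3(2).
 * Only the address and size of struct clone_args are visible; the
 * flags live in user memory that a seccomp filter cannot dereference. */
struct seccomp_data view_of_clone3(const struct clone_args *ca)
{
	struct seccomp_data sd;

	memset(&sd, 0, sizeof(sd));
	sd.nr = SYS_clone3;
	sd.args[0] = (uint64_t)(uintptr_t)ca;
	sd.args[1] = sizeof(*ca);
	return sd;
}
```

With clone(2) the CLONE_NEWUSER bit is present in args[0]; with
clone3(2) args[0] is just a pointer value, so a filter can only allow
or deny clone3 as a whole.  An LSM hook runs after the kernel has
copied the arguments, and therefore does not have this blind spot.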

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 18:15                         ` Eric W. Biederman
  2022-08-25 19:19                           ` Paul Moore
@ 2022-08-26 15:23                           ` Serge E. Hallyn
  1 sibling, 0 replies; 35+ messages in thread
From: Serge E. Hallyn @ 2022-08-26 15:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul Moore, Serge E. Hallyn, Linus Torvalds, Frederick Lawler,
	kpsingh, revest, jackmanb, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, jmorris,
	stephen.smalley.work, eparis, shuah, brauner, casey, bpf,
	linux-security-module, selinux, linux-kselftest, linux-kernel,
	netdev, kernel-team, cgzones, karl, tixxdz

On Thu, Aug 25, 2022 at 01:15:46PM -0500, Eric W. Biederman wrote:
> Paul Moore <paul@paul-moore.com> writes:
> 
> > On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>  I am hoping we can come up with
> >> "something better" to address people's needs, make everyone happy, and
> >> bring forth world peace.  Which would stack just fine with what's here
> >> for defense in depth.
> >>
> >> You may well not be interested in further work, and that's fine.  I need
> >> to set aside a few days to think on this.
> >
> > I'm happy to continue the discussion as long as it's constructive; I
> > think we all are.  My gut feeling is that Frederick's approach falls
> > closest to the sweet spot of "workable without being overly offensive"
> > (*cough*), but if you've got an additional approach in mind, or an
> > alternative approach that solves the same use case problems, I think
> > we'd all love to hear about it.
> 
> I would love to actually hear the problems people are trying to solve so
> that we can have a sensible conversation about the trade offs.
> 
> As best I can tell without more information people want to use
> the creation of a user namespace as a signal that the code is
> attempting an exploit.

I don't think that's it at all.  I think the problem is that it seems
you can pretty reliably get a root shell at some point in the future
by creating a user namespace, leaving it open for a bit, and waiting
for a new announcement of the latest netfilter or whatever exploit
that requires root in a user namespace.  Then go back to your userns
shell and run the exploit.

So I was hoping we could do something more targeted.  Be it splitting
off the ability to run code under capable_ns code from uid mapping (to
an extent), or maybe some limited-livepatch type of thing where
certain parts of code become inaccessible to code in a non-init userns
after some sysctl has been toggled, or something cooler that I've
failed to think of.

-serge

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-25 21:58                             ` Song Liu
  2022-08-25 22:10                               ` Paul Moore
@ 2022-08-26 15:24                               ` Serge E. Hallyn
  2022-08-26 17:00                                 ` Song Liu
  1 sibling, 1 reply; 35+ messages in thread
From: Serge E. Hallyn @ 2022-08-26 15:24 UTC (permalink / raw)
  To: Song Liu
  Cc: Paul Moore, Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz

On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
> 
> 
> > On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
> > 
> > On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Paul Moore <paul@paul-moore.com> writes:
> >>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>> I am hoping we can come up with
> >>>> "something better" to address people's needs, make everyone happy, and
> >>>> bring forth world peace.  Which would stack just fine with what's here
> >>>> for defense in depth.
> >>>> 
> >>>> You may well not be interested in further work, and that's fine.  I need
> >>>> to set aside a few days to think on this.
> >>> 
> >>> I'm happy to continue the discussion as long as it's constructive; I
> >>> think we all are.  My gut feeling is that Frederick's approach falls
> >>> closest to the sweet spot of "workable without being overly offensive"
> >>> (*cough*), but if you've got an additional approach in mind, or an
> >>> alternative approach that solves the same use case problems, I think
> >>> we'd all love to hear about it.
> >> 
> >> I would love to actually hear the problems people are trying to solve so
> >> that we can have a sensible conversation about the trade offs.
> > 
> > Here are several taken from the previous threads, it's surely not a
> > complete list, but it should give you a good idea:
> > 
> > https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> > 
> >> As best I can tell without more information people want to use
> >> the creation of a user namespace as a signal that the code is
> >> attempting an exploit.
> > 
> > Some use cases are like that, there are several other use cases that
> > go beyond this; see all of our previous discussions on this
> > topic/patchset.  As has been mentioned before, there are use cases
> > that require improved observability, access control, or both.
> > 
> >> As such let me propose instead of returning an error code which will let
> >> the exploit continue, have the security hook return a bool.  With true
> >> meaning the code can continue and on false it will trigger using SIGSYS
> >> to terminate the program like seccomp does.
> > 
> > Having the kernel forcibly exit the process isn't something that most
> > LSMs would likely want.  I suppose we could modify the hook/caller so
> > that *if* an LSM wanted to return SIGSYS the system would kill the
> > process, but I would want that to be something in addition to
> > returning an error code like LSMs normally do (e.g. EACCES).
> 
> I am new to user_namespace and security work, so please pardon me if
> anything below is very wrong. 
> 
> IIUC, user_namespace is a tool that enables trusted userspace code to 
> control the behavior of untrusted (or less trusted) userspace code. 

No.  user namespaces are not a way for more trusted code to control the
behavior of less trusted code.

> Failing create_user_ns() doesn't make the system more reliable. 
> Specifically, we call create_user_ns() via two paths: fork/clone and 
> unshare. For both paths, we need the userspace to use user_namespace, 
> and to honor failed create_user_ns(). 
> 
> On the other hand, I would echo that killing the process is not 
> practical in some use cases. Specifically, allowing the application to 
> run in a less secure environment for a short period of time might be 
> much better than killing it and taking down the whole service. Of 
> course, there are other cases that security is more important, and 
> taking down the whole service is the better choice. 
> 
> I guess the ultimate solution is a way to enforce using user_namespace
> in the kernel (if it ever makes sense...). But I don't know how that
> is going to work. Until we have such a solution, maybe we only need a
> void hook for observability (or just a tracepoint; I come from a BPF
> background).
> 
> Thanks,
> Song

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-26 15:02                                   ` Paul Moore
@ 2022-08-26 16:57                                     ` Song Liu
  0 siblings, 0 replies; 35+ messages in thread
From: Song Liu @ 2022-08-26 16:57 UTC (permalink / raw)
  To: Paul Moore
  Cc: Eric W. Biederman, Serge E. Hallyn, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz



> On Aug 26, 2022, at 8:02 AM, Paul Moore <paul@paul-moore.com> wrote:
> 
> On Thu, Aug 25, 2022 at 6:42 PM Song Liu <songliubraving@fb.com> wrote:
>>> On Aug 25, 2022, at 3:10 PM, Paul Moore <paul@paul-moore.com> wrote:
>>> On Thu, Aug 25, 2022 at 5:58 PM Song Liu <songliubraving@fb.com> wrote:
> 
> ...
> 
>>>> I am new to user_namespace and security work, so please pardon me if
>>>> anything below is very wrong.
>>>> 
>>>> IIUC, user_namespace is a tool that enables trusted userspace code to
>>>> control the behavior of untrusted (or less trusted) userspace code.
>>>> Failing create_user_ns() doesn't make the system more reliable.
>>>> Specifically, we call create_user_ns() via two paths: fork/clone and
>>>> unshare. For both paths, we need the userspace to use user_namespace,
>>>> and to honor failed create_user_ns().
>>>> 
>>>> On the other hand, I would echo that killing the process is not
>>>> practical in some use cases. Specifically, allowing the application to
>>>> run in a less secure environment for a short period of time might be
>>>> much better than killing it and taking down the whole service. Of
>>>> course, there are other cases that security is more important, and
>>>> taking down the whole service is the better choice.
>>>> 
>>>> I guess the ultimate solution is a way to enforce using user_namespace
>>>> in the kernel (if it ever makes sense...).
>>> 
>>> The LSM framework, and the BPF and SELinux LSM implementations in this
>>> patchset, provide a mechanism to do just that: kernel enforced access
>>> controls using flexible security policies which can be tailored by the
>>> distro, solution provider, or end user to meet the specific needs of
>>> their use case.
>> 
>> In this case, I wouldn't say the kernel is enforcing access control.
>> (I might be wrong). There are 3 components here: kernel, LSM, and
>> trusted userspace (whoever calls unshare).
> 
> The LSM layer, and the LSMs themselves are part of the kernel; look at
> the changes in this patchset to see the LSM, BPF LSM, and SELinux
> kernel changes.  Explaining how the different LSMs work is quite a bit
> beyond the scope of this discussion, but there is plenty of
> information available online that should be able to serve as an
> introduction, not to mention the kernel source itself.  However, in
> very broad terms you can think of the individual LSMs as somewhat
> analogous to filesystem drivers, e.g. ext4, and the LSM itself as the
> VFS layer.

Thanks for the explanation. This matches my understanding of LSM.

> 
>> AFAICT, kernel simply passes
>> the decision made by LSM (BPF or SELinux) to the trusted userspace. It
>> is up to the trusted userspace to honor the return value of unshare().
> 
> With a LSM enabled and enforcing a security policy on user namespace
> creation, which appears to be the case of most concern, the kernel
> would make a decision on the namespace creation based on various
> factors (e.g. for SELinux this would be the calling process' security
> domain and the domain's permission set as determined by the configured
> security policy) and, if the policy denied the operation, an error code
> would be returned to userspace and no namespace created.  It is the exact
> same thing as what would happen if the calling process is chrooted or
> doesn't have a proper UID/GID mapping.  Don't forget that the
> create_user_ns() function already enforces a security policy and
> returns errors to userspace; this patchset doesn't add anything new in
> that regard, it just allows for a richer and more flexible security
> policy to be built on top of the existing constraints.

I don't think I understand user namespaces well enough to agree or
disagree here. I guess I should read more.

Thanks,
Song

> 
>> If the userspace simply ignores unshare failures, or does not call
>> unshare(CLONE_NEWUSER), kernel and LSM cannot do much about it, right?
> 
> The process is still subject to any security policies that are active
> and being enforced by the kernel.  A malicious or misconfigured
> application can still be constrained by the kernel using both the
> kernel's legacy Discretionary Access Controls (DAC) as well as the
> more comprehensive Mandatory Access Controls (MAC) provided by many of
> the LSMs.
> 
> -- 
> paul-moore.com



* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-26 15:24                               ` Serge E. Hallyn
@ 2022-08-26 17:00                                 ` Song Liu
  2022-08-26 21:00                                   ` Serge E. Hallyn
  0 siblings, 1 reply; 35+ messages in thread
From: Song Liu @ 2022-08-26 17:00 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Paul Moore, Eric W. Biederman, Linus Torvalds, Frederick Lawler,
	KP Singh, revest, jackmanb, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Yonghong Song, John Fastabend,
	James Morris, stephen.smalley.work, eparis, Shuah Khan, brauner,
	Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz



> On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> 
> On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
>> 
>> 
>>> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
>>> 
>>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>> Paul Moore <paul@paul-moore.com> writes:
>>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>>>>>> I am hoping we can come up with
>>>>>> "something better" to address people's needs, make everyone happy, and
>>>>>> bring forth world peace.  Which would stack just fine with what's here
>>>>>> for defense in depth.
>>>>>> 
>>>>>> You may well not be interested in further work, and that's fine.  I need
>>>>>> to set aside a few days to think on this.
>>>>> 
>>>>> I'm happy to continue the discussion as long as it's constructive; I
>>>>> think we all are.  My gut feeling is that Frederick's approach falls
>>>>> closest to the sweet spot of "workable without being overly offensive"
>>>>> (*cough*), but if you've got an additional approach in mind, or an
>>>>> alternative approach that solves the same use case problems, I think
>>>>> we'd all love to hear about it.
>>>> 
>>>> I would love to actually hear the problems people are trying to solve so
>>>> that we can have a sensible conversation about the trade offs.
>>> 
>>> Here are several taken from the previous threads, it's surely not a
>>> complete list, but it should give you a good idea:
>>> 
>>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
>>> 
>>>> As best I can tell without more information people want to use
>>>> the creation of a user namespace as a signal that the code is
>>>> attempting an exploit.
>>> 
>>> Some use cases are like that, there are several other use cases that
>>> go beyond this; see all of our previous discussions on this
>>> topic/patchset.  As has been mentioned before, there are use cases
>>> that require improved observability, access control, or both.
>>> 
>>>> As such let me propose instead of returning an error code which will let
>>>> the exploit continue, have the security hook return a bool.  With true
>>>> meaning the code can continue and on false it will trigger using SIGSYS
>>>> to terminate the program like seccomp does.
>>> 
>>> Having the kernel forcibly exit the process isn't something that most
>>> LSMs would likely want.  I suppose we could modify the hook/caller so
>>> that *if* an LSM wanted to return SIGSYS the system would kill the
>>> process, but I would want that to be something in addition to
>>> returning an error code like LSMs normally do (e.g. EACCES).
>> 
>> I am new to user_namespace and security work, so please pardon me if
>> anything below is very wrong. 
>> 
>> IIUC, user_namespace is a tool that enables trusted userspace code to 
>> control the behavior of untrusted (or less trusted) userspace code. 
> 
> No.  user namespaces are not a way for more trusted code to control the
> behavior of less trusted code.

Hmm.. In this case, I think I really need to learn more. 

Thanks for pointing out my misunderstanding.

Song

> 
>> Failing create_user_ns() doesn't make the system more reliable. 
>> Specifically, we call create_user_ns() via two paths: fork/clone and 
>> unshare. For both paths, we need the userspace to use user_namespace, 
>> and to honor failed create_user_ns(). 
>> 
>> On the other hand, I would echo that killing the process is not 
>> practical in some use cases. Specifically, allowing the application to 
>> run in a less secure environment for a short period of time might be 
>> much better than killing it and taking down the whole service. Of 
>> course, there are other cases that security is more important, and 
>> taking down the whole service is the better choice. 
>> 
>> I guess the ultimate solution is a way to enforce using user_namespace
>> in the kernel (if it ever makes sense...). But I don't know how that
>> is going to work. Before we have such a solution, maybe we only need a
>> void hook for observability (or just a tracepoint, coming from a BPF
>> background).
>> 
>> Thanks,
>> Song



* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-26 17:00                                 ` Song Liu
@ 2022-08-26 21:00                                   ` Serge E. Hallyn
  2022-08-26 22:34                                     ` Song Liu
  2022-08-29 15:33                                     ` Christian Brauner
  0 siblings, 2 replies; 35+ messages in thread
From: Serge E. Hallyn @ 2022-08-26 21:00 UTC (permalink / raw)
  To: Song Liu
  Cc: Serge E. Hallyn, Paul Moore, Eric W. Biederman, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, brauner, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz

On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
> 
> 
> > On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > 
> > On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
> >> 
> >> 
> >>> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
> >>> 
> >>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>>> Paul Moore <paul@paul-moore.com> writes:
> >>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>>>> I am hoping we can come up with
> >>>>>> "something better" to address people's needs, make everyone happy, and
> >>>>>> bring forth world peace.  Which would stack just fine with what's here
> >>>>>> for defense in depth.
> >>>>>> 
> >>>>>> You may well not be interested in further work, and that's fine.  I need
> >>>>>> to set aside a few days to think on this.
> >>>>> 
> >>>>> I'm happy to continue the discussion as long as it's constructive; I
> >>>>> think we all are.  My gut feeling is that Frederick's approach falls
> >>>>> closest to the sweet spot of "workable without being overly offensive"
> >>>>> (*cough*), but if you've got an additional approach in mind, or an
> >>>>> alternative approach that solves the same use case problems, I think
> >>>>> we'd all love to hear about it.
> >>>> 
> >>>> I would love to actually hear the problems people are trying to solve so
> >>>> that we can have a sensible conversation about the trade offs.
> >>> 
> >>> Here are several taken from the previous threads, it's surely not a
> >>> complete list, but it should give you a good idea:
> >>> 
> >>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> >>> 
> >>>> As best I can tell without more information people want to use
> >>>> the creation of a user namespace as a signal that the code is
> >>>> attempting an exploit.
> >>> 
> >>> Some use cases are like that, there are several other use cases that
> >>> go beyond this; see all of our previous discussions on this
> >>> topic/patchset.  As has been mentioned before, there are use cases
> >>> that require improved observability, access control, or both.
> >>> 
> >>>> As such let me propose instead of returning an error code which will let
> >>>> the exploit continue, have the security hook return a bool.  With true
> >>>> meaning the code can continue and on false it will trigger using SIGSYS
> >>>> to terminate the program like seccomp does.
> >>> 
> >>> Having the kernel forcibly exit the process isn't something that most
> >>> LSMs would likely want.  I suppose we could modify the hook/caller so
> >>> that *if* an LSM wanted to return SIGSYS the system would kill the
> >>> process, but I would want that to be something in addition to
> >>> returning an error code like LSMs normally do (e.g. EACCES).
> >> 
> >> I am new to user_namespace and security work, so please pardon me if
> >> anything below is very wrong. 
> >> 
> >> IIUC, user_namespace is a tool that enables trusted userspace code to 
> >> control the behavior of untrusted (or less trusted) userspace code. 
> > 
> > No.  user namespaces are not a way for more trusted code to control the
> > behavior of less trusted code.
> 
> Hmm.. In this case, I think I really need to learn more. 
> 
> Thanks for pointing out my misunderstanding.

(I thought maybe Eric would chime in with a better explanation, but I'll
fill it in for now :)

One of the main goals of user namespaces is to allow unprivileged users
to do things like chroot and mount, which are very useful development
tools, without needing admin privileges.  So it's almost the opposite
of what you said: rather than to enable trusted userspace code to control
the behavior of less trusted code, it's to allow less privileged code to
do things which do not affect other users, without having to assume *more*
privilege.

To be precise, the goals were:

1. uid mapping - allow two users to both "use uid 500" without conflicting
2. provide (unprivileged) users privilege over their own resources
3. absolutely no extra privilege over other resources
4. be able to nest

While (3) was technically achieved, the problem we have is that
(2) provides unprivileged users the ability to exercise kernel code
which they previously could not.

-serge
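
Goal (1) above can be illustrated with a small userspace sketch. This is
not kernel code and not part of the patchset: the struct, the function
name, and the example uids (1000 and 1001 on the host) are hypothetical,
chosen only to mirror the "inside outside count" layout of a line in
/proc/<pid>/uid_map. It shows how two users can both "use uid 500" in
their own namespaces while remaining distinct uids outside.

```c
#include <stddef.h>
#include <stdint.h>

/* One extent of a uid map, laid out like a line of /proc/<pid>/uid_map:
 * "<first id inside the namespace> <first id outside> <count>".
 * Hypothetical type for illustration only. */
struct id_extent {
    uint32_t inside_first;
    uint32_t outside_first;
    uint32_t count;
};

/* Value returned when an id has no mapping; same bit pattern as (uid_t)-1. */
#define ID_UNMAPPED UINT32_MAX

/* Translate an id as seen inside a namespace to the id in its parent. */
static uint32_t map_id_to_parent(const struct id_extent *map, size_t n,
                                 uint32_t inside)
{
    for (size_t i = 0; i < n; i++) {
        const struct id_extent *e = &map[i];

        /* The id is covered if it falls in [inside_first, inside_first + count). */
        if (inside >= e->inside_first && inside - e->inside_first < e->count)
            return e->outside_first + (inside - e->inside_first);
    }
    return ID_UNMAPPED;
}

/* Assumed example: two unprivileged users each map their own host uid
 * to uid 500 inside their own namespace. */
static const struct id_extent alice_map[] = { { 500, 1000, 1 } };
static const struct id_extent bob_map[]   = { { 500, 1001, 1 } };
```

With these example maps, map_id_to_parent(alice_map, 1, 500) yields 1000
and map_id_to_parent(bob_map, 1, 500) yields 1001, so the two "uid 500"
users do not conflict. Any id without an extent comes back as ID_UNMAPPED.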


* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-26 21:00                                   ` Serge E. Hallyn
@ 2022-08-26 22:34                                     ` Song Liu
  2022-08-29 15:33                                     ` Christian Brauner
  1 sibling, 0 replies; 35+ messages in thread
From: Song Liu @ 2022-08-26 22:34 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Paul Moore, Eric W. Biederman, Linus Torvalds, Frederick Lawler,
	KP Singh, revest, jackmanb, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Yonghong Song, John Fastabend,
	James Morris, stephen.smalley.work, eparis, Shuah Khan, brauner,
	Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz



> On Aug 26, 2022, at 2:00 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> 
> On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
>> 
>> 
>>> On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>>> 
>>> On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
>>>> 
>>>> 
>>>>> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
>>>>> 
>>>>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>>> Paul Moore <paul@paul-moore.com> writes:
>>>>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>>>>>>>> I am hoping we can come up with
>>>>>>>> "something better" to address people's needs, make everyone happy, and
>>>>>>>> bring forth world peace.  Which would stack just fine with what's here
>>>>>>>> for defense in depth.
>>>>>>>> 
>>>>>>>> You may well not be interested in further work, and that's fine.  I need
>>>>>>>> to set aside a few days to think on this.
>>>>>>> 
>>>>>>> I'm happy to continue the discussion as long as it's constructive; I
>>>>>>> think we all are.  My gut feeling is that Frederick's approach falls
>>>>>>> closest to the sweet spot of "workable without being overly offensive"
>>>>>>> (*cough*), but if you've got an additional approach in mind, or an
>>>>>>> alternative approach that solves the same use case problems, I think
>>>>>>> we'd all love to hear about it.
>>>>>> 
>>>>>> I would love to actually hear the problems people are trying to solve so
>>>>>> that we can have a sensible conversation about the trade offs.
>>>>> 
>>>>> Here are several taken from the previous threads, it's surely not a
>>>>> complete list, but it should give you a good idea:
>>>>> 
>>>>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
>>>>> 
>>>>>> As best I can tell without more information people want to use
>>>>>> the creation of a user namespace as a signal that the code is
>>>>>> attempting an exploit.
>>>>> 
>>>>> Some use cases are like that, there are several other use cases that
>>>>> go beyond this; see all of our previous discussions on this
>>>>> topic/patchset.  As has been mentioned before, there are use cases
>>>>> that require improved observability, access control, or both.
>>>>> 
>>>>>> As such let me propose instead of returning an error code which will let
>>>>>> the exploit continue, have the security hook return a bool.  With true
>>>>>> meaning the code can continue and on false it will trigger using SIGSYS
>>>>>> to terminate the program like seccomp does.
>>>>> 
>>>>> Having the kernel forcibly exit the process isn't something that most
>>>>> LSMs would likely want.  I suppose we could modify the hook/caller so
>>>>> that *if* an LSM wanted to return SIGSYS the system would kill the
>>>>> process, but I would want that to be something in addition to
>>>>> returning an error code like LSMs normally do (e.g. EACCES).
>>>> 
>>>> I am new to user_namespace and security work, so please pardon me if
>>>> anything below is very wrong. 
>>>> 
>>>> IIUC, user_namespace is a tool that enables trusted userspace code to 
>>>> control the behavior of untrusted (or less trusted) userspace code. 
>>> 
>>> No.  user namespaces are not a way for more trusted code to control the
>>> behavior of less trusted code.
>> 
>> Hmm.. In this case, I think I really need to learn more. 
>> 
>> Thanks for pointing out my misunderstanding.
> 
> (I thought maybe Eric would chime in with a better explanation, but I'll
> fill it in for now :)
> 
> One of the main goals of user namespaces is to allow unprivileged users
> to do things like chroot and mount, which are very useful development
> tools, without needing admin privileges.  So it's almost the opposite
> of what you said: rather than to enable trusted userspace code to control
> the behavior of less trusted code, it's to allow less privileged code to
> do things which do not affect other users, without having to assume *more*
> privilege.

Thanks for the explanation! 

> 
> To be precise, the goals were:
> 
> 1. uid mapping - allow two users to both "use uid 500" without conflicting
> 2. provide (unprivileged) users privilege over their own resources
> 3. absolutely no extra privilege over other resources
> 4. be able to nest

Now I have a better idea about the "what", but I am not quite sure about
the "how". I will do more homework, and probably come back with more
questions. :)

> 
> While (3) was technically achieved, the problem we have is that
> (2) provides unprivileged users the ability to exercise kernel code
> which they previously could not.

Do you mean this one?

"""
I think the problem is that it seems
you can pretty reliably get a root shell at some point in the future
by creating a user namespace, leaving it open for a bit, and waiting
for a new announcement of the latest netfilter or whatever exploit
that requires root in a user namespace.  Then go back to your userns
shell and run the exploit.
"""

Please don't share how to do it yet. I want to use it as a test for my study. :)

Thanks again!

Song


* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-26 21:00                                   ` Serge E. Hallyn
  2022-08-26 22:34                                     ` Song Liu
@ 2022-08-29 15:33                                     ` Christian Brauner
  2022-09-03  3:58                                       ` Serge E. Hallyn
  1 sibling, 1 reply; 35+ messages in thread
From: Christian Brauner @ 2022-08-29 15:33 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Song Liu, Paul Moore, Eric W. Biederman, Linus Torvalds,
	Frederick Lawler, KP Singh, revest, jackmanb, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin Lau, Yonghong Song,
	John Fastabend, James Morris, stephen.smalley.work, eparis,
	Shuah Khan, Casey Schaufler, bpf, LSM List, selinux,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, Networking,
	kernel-team, cgzones, karl, tixxdz

On Fri, Aug 26, 2022 at 04:00:39PM -0500, Serge Hallyn wrote:
> On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
> > 
> > 
> > > On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > > 
> > > On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
> > >> 
> > >> 
> > >>> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
> > >>> 
> > >>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > >>>> Paul Moore <paul@paul-moore.com> writes:
> > >>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > >>>>>> I am hoping we can come up with
> > >>>>>> "something better" to address people's needs, make everyone happy, and
> > >>>>>> bring forth world peace.  Which would stack just fine with what's here
> > >>>>>> for defense in depth.
> > >>>>>> 
> > >>>>>> You may well not be interested in further work, and that's fine.  I need
> > >>>>>> to set aside a few days to think on this.
> > >>>>> 
> > >>>>> I'm happy to continue the discussion as long as it's constructive; I
> > >>>>> think we all are.  My gut feeling is that Frederick's approach falls
> > >>>>> closest to the sweet spot of "workable without being overly offensive"
> > >>>>> (*cough*), but if you've got an additional approach in mind, or an
> > >>>>> alternative approach that solves the same use case problems, I think
> > >>>>> we'd all love to hear about it.
> > >>>> 
> > >>>> I would love to actually hear the problems people are trying to solve so
> > >>>> that we can have a sensible conversation about the trade offs.
> > >>> 
> > >>> Here are several taken from the previous threads, it's surely not a
> > >>> complete list, but it should give you a good idea:
> > >>> 
> > >>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> > >>> 
> > >>>> As best I can tell without more information people want to use
> > >>>> the creation of a user namespace as a signal that the code is
> > >>>> attempting an exploit.
> > >>> 
> > >>> Some use cases are like that, there are several other use cases that
> > >>> go beyond this; see all of our previous discussions on this
> > >>> topic/patchset.  As has been mentioned before, there are use cases
> > >>> that require improved observability, access control, or both.
> > >>> 
> > >>>> As such let me propose instead of returning an error code which will let
> > >>>> the exploit continue, have the security hook return a bool.  With true
> > >>>> meaning the code can continue and on false it will trigger using SIGSYS
> > >>>> to terminate the program like seccomp does.
> > >>> 
> > >>> Having the kernel forcibly exit the process isn't something that most
> > >>> LSMs would likely want.  I suppose we could modify the hook/caller so
> > >>> that *if* an LSM wanted to return SIGSYS the system would kill the
> > >>> process, but I would want that to be something in addition to
> > >>> returning an error code like LSMs normally do (e.g. EACCES).
> > >> 
> > >> I am new to user_namespace and security work, so please pardon me if
> > >> anything below is very wrong. 
> > >> 
> > >> IIUC, user_namespace is a tool that enables trusted userspace code to 
> > >> control the behavior of untrusted (or less trusted) userspace code. 
> > > 
> > > No.  user namespaces are not a way for more trusted code to control the
> > > behavior of less trusted code.
> > 
> > Hmm.. In this case, I think I really need to learn more. 
> > 
> > Thanks for pointing out my misunderstanding.
> 
> (I thought maybe Eric would chime in with a better explanation, but I'll
> fill it in for now :)
> 
> One of the main goals of user namespaces is to allow unprivileged users
> to do things like chroot and mount, which are very useful development
> tools, without needing admin privileges.  So it's almost the opposite
> of what you said: rather than to enable trusted userspace code to control
> the behavior of less trusted code, it's to allow less privileged code to
> do things which do not affect other users, without having to assume *more*
> privilege.
> 
> To be precise, the goals were:
> 
> 1. uid mapping - allow two users to both "use uid 500" without conflicting
> 2. provide (unprivileged) users privilege over their own resources
> 3. absolutely no extra privilege over other resources
> 4. be able to nest
> 
> While (3) was technically achieved, the problem we have is that
> (2) provides unprivileged users the ability to exercise kernel code
> which they previously could not.

The consequence of the refusal to give users any way to control whether
or not user namespaces are available to unprivileged users is that a
non-negligible number of distros have carried the same patch for about
10 years now, one that adds an unprivileged_userns_clone sysctl to restrict
them to privileged users. That includes current Debian and Archlinux btw.

The LSM hook is a simple way to allow administrators to control this and
will allow user namespaces to be enabled in scenarios where they
would otherwise not be accepted precisely because they are available to
unprivileged users.

I fully understand the motivation and usefulness in unprivileged
scenarios but it's an unfounded fear that giving users the ability to
control user namespace creation via an LSM hook will cause proliferation
of setuid binaries (Ignoring for a moment that any fully unprivileged
container with useful idmappings has to rely on the new{g,u}idmap setuid
binaries to set up useful mappings anyway.) or decrease system safety let
alone cause regressions (Which I don't think is an applicable term here
at all.). Distros that have unprivileged user namespaces turned on by
default are extremely unlikely to switch to an LSM profile that turns
them off and distros that already turn them off will continue to turn
them off whether or not that LSM hook is available.

It's much more likely that workloads that want to minimize their attack
surface while still getting the benefits of user namespaces for e.g.
service isolation will feel comfortable enabling them for the first time
since they can control them via an LSM profile.
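
For reference, the distro sysctl mentioned above can be inspected from
userspace roughly as follows. This is a minimal sketch and not part of the
patchset: the helper name is hypothetical, and the path
/proc/sys/kernel/unprivileged_userns_clone is an assumption that only
holds on kernels carrying the out-of-tree patch. On other kernels the
file does not exist and the helper reports that.

```c
#include <stdio.h>

/* Read a boolean-style sysctl file.
 * Returns 1 if the value is non-zero, 0 if it is zero, and -1 if the
 * file is missing or does not start with an integer.
 * Hypothetical helper for illustration only. */
static int read_userns_clone_sysctl(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (!f)
        return -1;              /* knob not present on this kernel */

    if (fscanf(f, "%d", &value) != 1)
        value = -1;             /* unexpected contents */
    else
        value = (value != 0);   /* normalise to 0 or 1 */

    fclose(f);
    return value;
}
```

Calling read_userns_clone_sysctl("/proc/sys/kernel/unprivileged_userns_clone")
distinguishes the three cases: the knob is absent, unprivileged creation
is allowed, or it is restricted to privileged users. An LSM hook, by
contrast, can make that decision per task rather than system-wide.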


* Re: [PATCH v5 0/4] Introduce security_create_user_ns()
  2022-08-29 15:33                                     ` Christian Brauner
@ 2022-09-03  3:58                                       ` Serge E. Hallyn
  0 siblings, 0 replies; 35+ messages in thread
From: Serge E. Hallyn @ 2022-09-03  3:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Serge E. Hallyn, Song Liu, Paul Moore, Eric W. Biederman,
	Linus Torvalds, Frederick Lawler, KP Singh, revest, jackmanb,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Yonghong Song, John Fastabend, James Morris,
	stephen.smalley.work, eparis, Shuah Khan, Casey Schaufler, bpf,
	LSM List, selinux, open list:KERNEL SELFTEST FRAMEWORK, LKML,
	Networking, kernel-team, cgzones, karl, tixxdz

On Mon, Aug 29, 2022 at 05:33:04PM +0200, Christian Brauner wrote:
> On Fri, Aug 26, 2022 at 04:00:39PM -0500, Serge Hallyn wrote:
> > On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
> > > 
> > > 
> > > > On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > > > 
> > > > On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
> > > >> 
> > > >> 
> > > >>> On Aug 25, 2022, at 12:19 PM, Paul Moore <paul@paul-moore.com> wrote:
> > > >>> 
> > > >>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > >>>> Paul Moore <paul@paul-moore.com> writes:
> > > >>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > > >>>>>> I am hoping we can come up with
> > > >>>>>> "something better" to address people's needs, make everyone happy, and
> > > >>>>>> bring forth world peace.  Which would stack just fine with what's here
> > > >>>>>> for defense in depth.
> > > >>>>>> 
> > > >>>>>> You may well not be interested in further work, and that's fine.  I need
> > > >>>>>> to set aside a few days to think on this.
> > > >>>>> 
> > > >>>>> I'm happy to continue the discussion as long as it's constructive; I
> > > >>>>> think we all are.  My gut feeling is that Frederick's approach falls
> > > >>>>> closest to the sweet spot of "workable without being overly offensive"
> > > >>>>> (*cough*), but if you've got an additional approach in mind, or an
> > > >>>>> alternative approach that solves the same use case problems, I think
> > > >>>>> we'd all love to hear about it.
> > > >>>> 
> > > >>>> I would love to actually hear the problems people are trying to solve so
> > > >>>> that we can have a sensible conversation about the trade offs.
> > > >>> 
> > > >>> Here are several taken from the previous threads, it's surely not a
> > > >>> complete list, but it should give you a good idea:
> > > >>> 
> > > >>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> > > >>> 
> > > >>>> As best I can tell without more information people want to use
> > > >>>> the creation of a user namespace as a signal that the code is
> > > >>>> attempting an exploit.
> > > >>> 
> > > >>> Some use cases are like that, there are several other use cases that
> > > >>> go beyond this; see all of our previous discussions on this
> > > >>> topic/patchset.  As has been mentioned before, there are use cases
> > > >>> that require improved observability, access control, or both.
> > > >>> 
> > > >>>> As such let me propose instead of returning an error code which will let
> > > >>>> the exploit continue, have the security hook return a bool.  With true
> > > >>>> meaning the code can continue and on false it will trigger using SIGSYS
> > > >>>> to terminate the program like seccomp does.
> > > >>> 
> > > >>> Having the kernel forcibly exit the process isn't something that most
> > > >>> LSMs would likely want.  I suppose we could modify the hook/caller so
> > > >>> that *if* an LSM wanted to return SIGSYS the system would kill the
> > > >>> process, but I would want that to be something in addition to
> > > >>> returning an error code like LSMs normally do (e.g. EACCES).
> > > >> 
> > > >> I am new to user_namespace and security work, so please pardon me if
> > > >> anything below is very wrong. 
> > > >> 
> > > >> IIUC, user_namespace is a tool that enables trusted userspace code to 
> > > >> control the behavior of untrusted (or less trusted) userspace code. 
> > > > 
> > > > No.  user namespaces are not a way for more trusted code to control the
> > > > behavior of less trusted code.
> > > 
> > > Hmm.. In this case, I think I really need to learn more. 
> > > 
> > > Thanks for pointing out my misunderstanding.
> > 
> > (I thought maybe Eric would chime in with a better explanation, but I'll
> > fill it in for now :)
> > 
> > One of the main goals of user namespaces is to allow unprivileged users
> > to do things like chroot and mount, which are very useful development
> > tools, without needing admin privileges.  So it's almost the opposite
> > of what you said: rather than to enable trusted userspace code to control
> > the behavior of less trusted code, it's to allow less privileged code to
> > do things which do not affect other users, without having to assume *more*
> > privilege.
> > 
> > To be precise, the goals were:
> > 
> > 1. uid mapping - allow two users to both "use uid 500" without conflicting
> > 2. provide (unprivileged) users privilege over their own resources
> > 3. absolutely no extra privilege over other resources
> > 4. be able to nest
> > 
> > While (3) was technically achieved, the problem we have is that
> > (2) provides unprivileged users the ability to exercise kernel code
> > which they previously could not.
> 
> The consequence of the refusal to give users any way to control whether
> or not user namespaces are available to unprivileged users is that a
> non-negligible number of distros have carried the same patch for about
> 10 years now, one that adds an unprivileged_userns_clone sysctl to restrict
> them to privileged users. That includes current Debian and Archlinux btw.

Hi Christian,

I'm wondering about your placement of this argument in the thread, and whether
you interpreted what I said above as an argument against this patchset, or
whether you're just expanding on what I said.

> The LSM hook is a simple way to allow administrators to control this and

(I think the "control" here is suboptimal, but I've not seen - nor
conceived of - anything better as of yet)

> will allow user namespaces to be enabled in scenarios where they
> would otherwise not be accepted precisely because they are available to
> unprivileged users.
> 
> I fully understand the motivation and usefulness in unprivileged
> scenarios but it's an unfounded fear that giving users the ability to
> control user namespace creation via an LSM hook will cause proliferation
> of setuid binaries (Ignoring for a moment that any fully unprivileged
> container with useful idmappings has to rely on the new{g,u}idmap setuid
> binaries to set up useful mappings anyway.) or decrease system safety let
> alone cause regressions (Which I don't think is an applicable term here
> at all.). Distros that have unprivileged user namespaces turned on by
> default are extremely unlikely to switch to an LSM profile that turns
> them off and distros that already turn them off will continue to turn
> them off whether or not that LSM hook is available.
> 
> It's much more likely that workloads that want to minimize their attack
> surface while still getting the benefits of user namespaces for e.g.
> service isolation will feel comfortable enabling them for the first time
> since they can control them via an LSM profile.


end of thread, other threads:[~2022-09-03  3:58 UTC | newest]

Thread overview: 35+ messages
2022-08-15 16:20 [PATCH v5 0/4] Introduce security_create_user_ns() Frederick Lawler
2022-08-15 16:20 ` [PATCH v5 1/4] security, lsm: " Frederick Lawler
2022-08-15 16:20 ` [PATCH v5 2/4] bpf-lsm: Make bpf_lsm_userns_create() sleepable Frederick Lawler
2022-08-15 16:20 ` [PATCH v5 3/4] selftests/bpf: Add tests verifying bpf lsm userns_create hook Frederick Lawler
2022-08-15 16:20 ` [PATCH v5 4/4] selinux: Implement " Frederick Lawler
2022-08-16 21:51 ` [PATCH v5 0/4] Introduce security_create_user_ns() Paul Moore
2022-08-17 15:07   ` Eric W. Biederman
2022-08-17 16:01     ` Paul Moore
2022-08-17 19:57       ` Eric W. Biederman
2022-08-17 20:13         ` Paul Moore
2022-08-17 20:56           ` Eric W. Biederman
2022-08-17 21:09             ` Paul Moore
2022-08-17 21:24               ` Eric W. Biederman
2022-08-17 21:50                 ` Paul Moore
2022-08-18  0:35                   ` Jonathan Chapman-Moore
2022-08-18 14:05                 ` Serge E. Hallyn
2022-08-18 15:11                   ` Paul Moore
2022-08-19 14:45                     ` Serge E. Hallyn
2022-08-19 21:10                       ` Paul Moore
2022-08-25 18:15                         ` Eric W. Biederman
2022-08-25 19:19                           ` Paul Moore
2022-08-25 21:58                             ` Song Liu
2022-08-25 22:10                               ` Paul Moore
2022-08-25 22:42                                 ` Song Liu
2022-08-26 15:02                                   ` Paul Moore
2022-08-26 16:57                                     ` Song Liu
2022-08-26 15:24                               ` Serge E. Hallyn
2022-08-26 17:00                                 ` Song Liu
2022-08-26 21:00                                   ` Serge E. Hallyn
2022-08-26 22:34                                     ` Song Liu
2022-08-29 15:33                                     ` Christian Brauner
2022-09-03  3:58                                       ` Serge E. Hallyn
2022-08-26  9:10                             ` Ignat Korchagin
2022-08-26 15:12                               ` Paul Moore
2022-08-26 15:23                           ` Serge E. Hallyn
