stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression
       [not found] <87o83e2mbu.fsf@email.froward.int.ebiederm.org>
@ 2022-02-11  2:13 ` Eric W. Biederman
  2022-02-14 18:37   ` Michal Koutný
  2022-02-11  2:13 ` [PATCH 2/8] ucounts: Fix set_cred_ucounts Eric W. Biederman
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-11  2:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexey Gladkov, Kees Cook, Shuah Khan, Christian Brauner,
	Solar Designer, Ran Xiaokai, containers, Michal Koutný,
	Eric W. Biederman, stable

Michal Koutný <mkoutny@suse.com> wrote:

> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is consequence of the commit 21d1c5e386bc ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.

This can be fixed either by fixing the test or by moving the increment
to be before the test.  Fix it my moving copy_creds which contains
the increment before is_ucounts_overlimit.

This is necessary so that in the case of CLONE_NEWUSER the new credential
with the new ucounts is available of is_ucounts_overlimit to test.

Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts.  The change of the test in
fork was bad because it was before the increment.  The test in
set_user was wrong and the change to ucounts fixed it.  So this
fix is only restore the old behavior in one lcatio not two.

Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com
Cc: stable@vger.kernel.org
Reported-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/fork.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..17d8a8c85e3b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2021,18 +2021,18 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_PROVE_LOCKING
 	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif
+	retval = copy_creds(p, clone_flags);
+	if (retval < 0)
+		goto bad_fork_free;
+
 	retval = -EAGAIN;
 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
 		if (p->real_cred->user != INIT_USER &&
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
-			goto bad_fork_free;
+			goto bad_fork_cleanup_count;
 	}
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
-	retval = copy_creds(p, clone_flags);
-	if (retval < 0)
-		goto bad_fork_free;
-
 	/*
 	 * If multiple threads are within copy_process(), then this check
 	 * triggers too late. This doesn't hurt, the check is only there
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/8] ucounts: Fix set_cred_ucounts
       [not found] <87o83e2mbu.fsf@email.froward.int.ebiederm.org>
  2022-02-11  2:13 ` [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression Eric W. Biederman
@ 2022-02-11  2:13 ` Eric W. Biederman
  2022-02-15 11:10   ` Michal Koutný
  2022-02-11  2:13 ` [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve Eric W. Biederman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-11  2:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexey Gladkov, Kees Cook, Shuah Khan, Christian Brauner,
	Solar Designer, Ran Xiaokai, containers, Michal Koutný,
	Eric W. Biederman, stable

Michal Koutný <mkoutny@suse.com> wrote:
> Tasks are associated to multiple users at once. Historically and as per
> setrlimit(2) RLIMIT_NPROC is enforce based on real user ID.
>
> The commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
> made the accounting structure "indexed" by euid and hence potentially
> account tasks differently.
>
> The effective user ID may be different e.g. for setuid programs but
> those are exec'd into already existing task (i.e. below limit), so
> different accounting is moot.
>
> Some special setresuid(2) users may notice the difference, justifying
> this fix.

I looked at the cred->ucount is only used for rlimit operations that
were previously stored in cred->user.  Making the fact cred->ucount
can refer to a different user from cred->user a bug working will all
rlimits not just RLIMIT_NPROC.

So fix set_cred_ucounts to always use the real uid not the effective uid.

Further simplify set_cred_ucounts by noticing that set_cred_ucounts
somehow retained a draft version of the check to see if alloc_ucounts
was needed that checks the new->user and new->user_ns against the
current_real_cred(), when nothing matters for setting the ucounts
field of a struct cred except the other fields in that same struct
cred.

So delete the confusing and wrong check against the
current_real_cred(), and all of it's intermediate variables.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-4-mkoutny@suse.com
Reported-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/cred.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 473d17c431f3..933155c96922 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -665,21 +665,16 @@ EXPORT_SYMBOL(cred_fscmp);
 
 int set_cred_ucounts(struct cred *new)
 {
-	struct task_struct *task = current;
-	const struct cred *old = task->real_cred;
 	struct ucounts *new_ucounts, *old_ucounts = new->ucounts;
 
-	if (new->user == old->user && new->user_ns == old->user_ns)
-		return 0;
-
 	/*
 	 * This optimization is needed because alloc_ucounts() uses locks
 	 * for table lookups.
 	 */
-	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->euid))
+	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->uid))
 		return 0;
 
-	if (!(new_ucounts = alloc_ucounts(new->user_ns, new->euid)))
+	if (!(new_ucounts = alloc_ucounts(new->user_ns, new->uid)))
 		return -EAGAIN;
 
 	new->ucounts = new_ucounts;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve
       [not found] <87o83e2mbu.fsf@email.froward.int.ebiederm.org>
  2022-02-11  2:13 ` [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression Eric W. Biederman
  2022-02-11  2:13 ` [PATCH 2/8] ucounts: Fix set_cred_ucounts Eric W. Biederman
@ 2022-02-11  2:13 ` Eric W. Biederman
  2022-02-12 23:17   ` Solar Designer
  2022-02-11  2:13 ` [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC Eric W. Biederman
  2022-02-11  2:13 ` [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit Eric W. Biederman
  4 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-11  2:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexey Gladkov, Kees Cook, Shuah Khan, Christian Brauner,
	Solar Designer, Ran Xiaokai, containers, Michal Koutný,
	Eric W. Biederman, stable

Michal Koutný <mkoutny@suse.com> wrote:
> The check is currently against the current->cred but since those are
> going to change and we want to check RLIMIT_NPROC condition after the
> switch, supply the capability check with the new cred.
> But since we're checking new_user being INIT_USER any new cred's
> capability-based allowance may be redundant when the check fails and the
> alternative solution would be revert of the commit 2863643fb8b9
> ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")

As of commit 2863643fb8b9 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
should check RLIMIT_NPROC is buggy, as it tests the capabilites from
before the credential change and not aftwards.

As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
ucounts") examining the rlimit is buggy as cred->ucounts has not yet
been properly set in the new credential.

Make the code correct and more robust moving the test to see if
execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
of the rest of the logic into execve() itself.

As the flag only indicateds that RLIMIT_NPROC should be checked
in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@suse.com
Reported-by: Michal Koutný <mkoutny@suse.com>
Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c             | 15 ++++++++-------
 include/linux/sched.h |  2 +-
 kernel/cred.c         | 13 +++++++++----
 kernel/fork.c         |  2 +-
 kernel/sys.c          | 14 --------------
 5 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 79f2c9483302..1e7f757cbc2c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1875,20 +1875,21 @@ static int do_execveat_common(int fd, struct filename *filename,
 		return PTR_ERR(filename);
 
 	/*
-	 * We move the actual failure in case of RLIMIT_NPROC excess from
-	 * set*uid() to execve() because too many poorly written programs
-	 * don't check setuid() return code.  Here we additionally recheck
-	 * whether NPROC limit is still exceeded.
+	 * After calling set*uid() is RLIMT_NPROC exceeded?
+	 * This can not be checked in set*uid() because too many programs don't
+	 * check the setuid() return code.
 	 */
-	if ((current->flags & PF_NPROC_EXCEEDED) &&
-	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
+	if ((current->flags & PF_NPROC_CHECK) &&
+	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
+	    (current_user() != INIT_USER) &&
+	    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
 		retval = -EAGAIN;
 		goto out_ret;
 	}
 
 	/* We're below the limit (still or again), so we don't want to make
 	 * further execve() calls fail. */
-	current->flags &= ~PF_NPROC_EXCEEDED;
+	current->flags &= ~PF_NPROC_CHECK;
 
 	bprm = alloc_bprm(fd, filename);
 	if (IS_ERR(bprm)) {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..6605a262a6be 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1678,7 +1678,7 @@ extern struct pid *cad_pid;
 #define PF_DUMPCORE		0x00000200	/* Dumped core */
 #define PF_SIGNALED		0x00000400	/* Killed by a signal */
 #define PF_MEMALLOC		0x00000800	/* Allocating memory */
-#define PF_NPROC_EXCEEDED	0x00001000	/* set_user() noticed that RLIMIT_NPROC was exceeded */
+#define PF_NPROC_CHECK		0x00001000	/* Check in execve if RLIMIT_NPROC was exceeded */
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
 #define PF_FROZEN		0x00010000	/* Frozen for system suspend */
diff --git a/kernel/cred.c b/kernel/cred.c
index 933155c96922..229cff081167 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -490,13 +490,18 @@ int commit_creds(struct cred *new)
 	if (!gid_eq(new->fsgid, old->fsgid))
 		key_fsgid_changed(new);
 
-	/* do it
-	 * RLIMIT_NPROC limits on user->processes have already been checked
-	 * in set_user().
+	/*
+	 * Remember if the NPROC limit may be exceeded.  The set*uid() functions
+	 * can not fail if the NPROC limit is exceeded as too many programs
+	 * don't check the return code.  Instead enforce the NPROC limit for
+	 * programs doing set*uid()+execve by harmlessly defering the failure
+	 * to the execve() stage.
 	 */
 	alter_cred_subscribers(new, 2);
-	if (new->user != old->user || new->user_ns != old->user_ns)
+	if (new->user != old->user || new->user_ns != old->user_ns) {
 		inc_rlimit_ucounts(new->ucounts, UCOUNT_RLIMIT_NPROC, 1);
+		task->flags |= PF_NPROC_CHECK;
+	}
 	rcu_assign_pointer(task->real_cred, new);
 	rcu_assign_pointer(task->cred, new);
 	if (new->user != old->user || new->user_ns != old->user_ns)
diff --git a/kernel/fork.c b/kernel/fork.c
index 17d8a8c85e3b..2b6a28a86325 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2031,7 +2031,7 @@ static __latent_entropy struct task_struct *copy_process(
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
 			goto bad_fork_cleanup_count;
 	}
-	current->flags &= ~PF_NPROC_EXCEEDED;
+	current->flags &= ~PF_NPROC_CHECK;
 
 	/*
 	 * If multiple threads are within copy_process(), then this check
diff --git a/kernel/sys.c b/kernel/sys.c
index ecc4cf019242..b1ed21d79f3b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -472,20 +472,6 @@ static int set_user(struct cred *new)
 	if (!new_user)
 		return -EAGAIN;
 
-	/*
-	 * We don't fail in case of NPROC limit excess here because too many
-	 * poorly written programs don't check set*uid() return code, assuming
-	 * it never fails if called by root.  We may still enforce NPROC limit
-	 * for programs doing set*uid()+execve() by harmlessly deferring the
-	 * failure to the execve() stage.
-	 */
-	if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
-			new_user != INIT_USER &&
-			!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
-		current->flags |= PF_NPROC_EXCEEDED;
-	else
-		current->flags &= ~PF_NPROC_EXCEEDED;
-
 	free_uid(new->user);
 	new->user = new_user;
 	return 0;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC
       [not found] <87o83e2mbu.fsf@email.froward.int.ebiederm.org>
                   ` (2 preceding siblings ...)
  2022-02-11  2:13 ` [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve Eric W. Biederman
@ 2022-02-11  2:13 ` Eric W. Biederman
  2022-02-15 10:54   ` Michal Koutný
  2022-02-11  2:13 ` [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit Eric W. Biederman
  4 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-11  2:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexey Gladkov, Kees Cook, Shuah Khan, Christian Brauner,
	Solar Designer, Ran Xiaokai, containers, Michal Koutný,
	Eric W. Biederman, stable

In [1] Michal Koutný <mkoutny@suse.com> reported that it does not make
sense to unconditionally exempt the INIT_USER during fork and exec
from RLIMIT_NPROC and then to impose a limit on that same user with
is_ucounts_overlimit.  So I looked into why the exeception was added.

commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits (RLIMIT_NPROC)") says:
>  If a setuid process swaps it's real and effective uids and then
>  forks, the fork fails if the new realuid has more processes than
>  the original process was limited to.  This is particularly a
>  problem if a user with a process limit (e.g. 256) runs a
>  setuid-root program which does setuid() + fork() (e.g. lprng) while
>  root already has more than 256 process (which is quite possible).
>
>  The root problem here is that a limit which should be a per-user
>  limit is being implemented as a per-process limit with per-process
>  (e.g. CAP_SYS_RESOURCE) controls.  Being a per-user limit, it
>  should be that the root-user can over-ride it, not just some
>  process with CAP_SYS_RESOURCE.
>
>  This patch adds a test to ignore process limits if the real user is root.

With the moving of the limits from per-user to per-user-per-user_ns it
is clear that testing a user_struct is no longer the proper test and
the test should be a test against the ucounts struct to match the
rest of the logic change that was made.

With RLIMIT_NPROC not being enforced for the global root user anywhere
else should it be enforced in is_ucounts_overlimit for a user
namespace created by the global root user?

Since this is limited to just the global root user, and RLIMIT_NPROC
is already ignored for that user I am going to vote no.

This change does imply that it becomes possible to limit all users in
a user namespace but to not enforce the rlimits on the root user or
anyone with CAP_SYS_ADMIN and CAP_SYS_RESOURCE in the user namespace.
It is not clear to me why any of those exceptions exist so I figure
we should until this is actually a problem for someone before
we relax the permission checks here.

Cc: stable@vger.kernel.org
Reported-by: Michal Koutný <mkoutny@suse.com>
[1] https://lkml.kernel.org/r/20220207121800.5079-5-mkoutny@suse.com
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c               | 2 +-
 kernel/fork.c           | 2 +-
 kernel/user_namespace.c | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 1e7f757cbc2c..01c8c7bae9b4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	 */
 	if ((current->flags & PF_NPROC_CHECK) &&
 	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
-	    (current_user() != INIT_USER) &&
+	    (current_ucounts() != &init_ucounts) &&
 	    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
 		retval = -EAGAIN;
 		goto out_ret;
diff --git a/kernel/fork.c b/kernel/fork.c
index 2b6a28a86325..6f62d37f3650 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
 
 	retval = -EAGAIN;
 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
-		if (p->real_cred->user != INIT_USER &&
+		if ((task_ucounts(p) != &init_ucounts) &&
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
 			goto bad_fork_cleanup_count;
 	}
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6b2e3ca7ee99..f0c04073403d 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
 		ns->ucount_max[i] = INT_MAX;
 	}
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
+	if (new->ucounts == &init_ucounts)
+		set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit
       [not found] <87o83e2mbu.fsf@email.froward.int.ebiederm.org>
                   ` (3 preceding siblings ...)
  2022-02-11  2:13 ` [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC Eric W. Biederman
@ 2022-02-11  2:13 ` Eric W. Biederman
  2022-02-12 22:36   ` Solar Designer
  4 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-11  2:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexey Gladkov, Kees Cook, Shuah Khan, Christian Brauner,
	Solar Designer, Ran Xiaokai, containers, Michal Koutný,
	Eric W. Biederman, stable

While examining is_ucounts_overlimit and reading the various messages
I realized that is_ucounts_overlimit fails to deal with counts that
may have wrapped.

Being wrapped should be a transitory state for counts and they should
never be wrapped for long, but it can happen so handle it.

Cc: stable@vger.kernel.org
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ucount.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/ucount.c b/kernel/ucount.c
index 65b597431c86..06ea04d44685 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -350,7 +350,8 @@ bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsign
 	if (rlimit > LONG_MAX)
 		max = LONG_MAX;
 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
-		if (get_ucounts_value(iter, type) > max)
+		long val = get_ucounts_value(iter, type);
+		if (val < 0 || val > max)
 			return true;
 		max = READ_ONCE(iter->ns->ucount_max[type]);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit
  2022-02-11  2:13 ` [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit Eric W. Biederman
@ 2022-02-12 22:36   ` Solar Designer
  2022-02-14 15:23     ` Eric W. Biederman
  2022-02-14 17:16     ` David Laight
  0 siblings, 2 replies; 20+ messages in thread
From: Solar Designer @ 2022-02-12 22:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

On Thu, Feb 10, 2022 at 08:13:21PM -0600, Eric W. Biederman wrote:
> While examining is_ucounts_overlimit and reading the various messages
> I realized that is_ucounts_overlimit fails to deal with counts that
> may have wrapped.
> 
> Being wrapped should be a transitory state for counts and they should
> never be wrapped for long, but it can happen so handle it.
> 
> Cc: stable@vger.kernel.org
> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/ucount.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 65b597431c86..06ea04d44685 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -350,7 +350,8 @@ bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsign
>  	if (rlimit > LONG_MAX)
>  		max = LONG_MAX;
>  	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> -		if (get_ucounts_value(iter, type) > max)
> +		long val = get_ucounts_value(iter, type);
> +		if (val < 0 || val > max)
>  			return true;
>  		max = READ_ONCE(iter->ns->ucount_max[type]);
>  	}

You probably deliberately assume "gcc -fwrapv", but otherwise:

As you're probably aware, a signed integer wrapping is undefined
behavior in C.  In the function above, "val" having wrapped to negative
assumes we had occurred UB elsewhere.  Further, there's an instance of
UB in the function itself:

bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long rlimit)
{
	struct ucounts *iter;
	long max = rlimit;
	if (rlimit > LONG_MAX)
		max = LONG_MAX;

The assignment on "long max = rlimit;" would have already been UB if
"rlimit > LONG_MAX", which is only checked afterwards.  I think the
above would be better written as:

	if (rlimit > LONG_MAX)
		rlimit = LONG_MAX;
	long max = rlimit;

considering that "rlimit" is never used further in that function.

And to more likely avoid wraparound of "val", perhaps have the limit at
a value significantly lower than LONG_MAX, like half that?  So:

	if (rlimit > LONG_MAX / 2)
		rlimit = LONG_MAX / 2;
	long max = rlimit;

And sure, also keep the "val < 0" check as defensive programming, or you
can do:

	if (rlimit > LONG_MAX / 2)
		rlimit = LONG_MAX / 2;
[...]
		if ((unsigned long)get_ucounts_value(iter, type) > rlimit)
			return true;

and drop both "val" and "max".  However, this also assumes the return
type of get_ucounts_value() doesn't become larger than "unsigned long".

I assume that once is_ucounts_overlimit() returned true, it is expected
the value would almost not grow further (except a little due to races).

I also assume there's some reason a signed type is used there.

Alexander

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve
  2022-02-11  2:13 ` [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve Eric W. Biederman
@ 2022-02-12 23:17   ` Solar Designer
  2022-02-14 15:10     ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Solar Designer @ 2022-02-12 23:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

On Thu, Feb 10, 2022 at 08:13:19PM -0600, Eric W. Biederman wrote:
> As of commit 2863643fb8b9 ("set_user: add capability check when
> rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
> should check RLIMIT_NPROC is buggy, as it tests the capabilites from
> before the credential change and not aftwards.
> 
> As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
> ucounts") examining the rlimit is buggy as cred->ucounts has not yet
> been properly set in the new credential.
> 
> Make the code correct and more robust moving the test to see if
> execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
> of the rest of the logic into execve() itself.
> 
> As the flag only indicateds that RLIMIT_NPROC should be checked
> in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.
> 
> Cc: stable@vger.kernel.org
> Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@suse.com
> Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@suse.com
> Reported-by: Michal Koutn?? <mkoutny@suse.com>
> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

On one hand, this looks good.

On the other, you asked about the Apache httpd suexec scenario in the
other thread, and here's what this means for it (per my code review):

In that scenario, we have two execve(): first from httpd to suexec, then
from suexec to the CGI script.  Previously, the limit check only
occurred on the setuid() call by suexec, and its effect was deferred
until execve() of the script.  Now wouldn't it occur on both execve()
calls, because commit_creds() is also called on execve() (such as in
case the program is SUID, which suexec actually is)?  Since the check is
kind of against real uid (not the euid=0 that suexec gains), it'd apply
the limit against httpd pseudo-user's process count.  While it could be
a reasonable kernel policy to impose this limit in more places, this is
a change of behavior for Apache httpd, and is not the intended behavior
there.  However, I think the answer to my question earlier in this
paragraph is actually a "no", the check wouldn't occur on the execve()
of suexec, because "new->user != old->user" would be false.  Right?

As an alternative, you could keep setting the (renamed and reused) flag
in set_user().  That would avoid the (non-)issue I described above - but
again, your patch is probably fine as-is.

I do see it's logical to have these two lines next to each other:

>  		inc_rlimit_ucounts(new->ucounts, UCOUNT_RLIMIT_NPROC, 1);
> +		task->flags |= PF_NPROC_CHECK;

Of course, someone would need to actually test this.

Alexander

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve
  2022-02-12 23:17   ` Solar Designer
@ 2022-02-14 15:10     ` Eric W. Biederman
  2022-02-14 17:43       ` Eric W. Biederman
  2022-02-15 10:25       ` Michal Koutný
  0 siblings, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-14 15:10 UTC (permalink / raw)
  To: Solar Designer
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

Solar Designer <solar@openwall.com> writes:

> On Thu, Feb 10, 2022 at 08:13:19PM -0600, Eric W. Biederman wrote:
>> As of commit 2863643fb8b9 ("set_user: add capability check when
>> rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
>> should check RLIMIT_NPROC is buggy, as it tests the capabilites from
>> before the credential change and not aftwards.
>> 
>> As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
>> ucounts") examining the rlimit is buggy as cred->ucounts has not yet
>> been properly set in the new credential.
>> 
>> Make the code correct and more robust moving the test to see if
>> execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
>> of the rest of the logic into execve() itself.
>> 
>> As the flag only indicateds that RLIMIT_NPROC should be checked
>> in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.
>> 
>> Cc: stable@vger.kernel.org
>> Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@suse.com
>> Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@suse.com
>> Reported-by: Michal Koutn?? <mkoutny@suse.com>
>> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> On one hand, this looks good.
>
> On the other, you asked about the Apache httpd suexec scenario in the
> other thread, and here's what this means for it (per my code review):
>
> In that scenario, we have two execve(): first from httpd to suexec, then
> from suexec to the CGI script.  Previously, the limit check only
> occurred on the setuid() call by suexec, and its effect was deferred
> until execve() of the script.  Now wouldn't it occur on both execve()
> calls, because commit_creds() is also called on execve() (such as in
> case the program is SUID, which suexec actually is)?

Yes.  Moving the check into commit_creds means that the exec after a
suid exec will perform an RLIMIT_NPROC check and could possibly fail.  I
would call that a bug.  Anything happening in execve should be checked
and handled in execve as execve can fail.

It also points out that our permission checks for increasing
RLIMIT_NPROC are highly inconsistent.

One set of permissions in fork().
Another set of permissions in set*id() and delayed until execve.
No permission checks for the uid change in execve.

Every time I look into the previous behavior of RLIMIT_NPROC I seem
to find issues.  Currently I am planning a posting to linux-api
so sorting out what when RLIMIT_NPROC should be enforced and how
RLIMIT_NPROC gets accounted receives review.  I am also planning a
feature branch to deal with the historical goofiness.

I really like how cleanly this patch seems to be.  Unfortunately it is
wrong.

> Since the check is
> kind of against real uid (not the euid=0 that suexec gains), it'd apply
> the limit against httpd pseudo-user's process count.  While it could be
> a reasonable kernel policy to impose this limit in more places, this is
> a change of behavior for Apache httpd, and is not the intended behavior
> there.  However, I think the answer to my question earlier in this
> paragraph is actually a "no", the check wouldn't occur on the execve()
> of suexec, because "new->user != old->user" would be false.  Right?
>
> As an alternative, you could keep setting the (renamed and reused) flag
> in set_user().  That would avoid the (non-)issue I described above - but
> again, your patch is probably fine as-is.
>
> I do see it's logical to have these two lines next to each other:
>
>>  		inc_rlimit_ucounts(new->ucounts, UCOUNT_RLIMIT_NPROC, 1);
>> +		task->flags |= PF_NPROC_CHECK;
>
> Of course, someone would need to actually test this.

That too.

I am increasingly of the opinion that the process accounting should not
be in cred.c at all.  That we just remember the who was charged with the
process when we created it, and then at exec time we can update that
charge, and verify that the new user is solid.  At exit time we can look
up who was charged with the process and decrement the count.

Of course at this point my opinion may change after I implement that.

Eric


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit
  2022-02-12 22:36   ` Solar Designer
@ 2022-02-14 15:23     ` Eric W. Biederman
  2022-02-14 15:23       ` Eric W. Biederman
  2022-02-15 11:25       ` Michal Koutný
  2022-02-14 17:16     ` David Laight
  1 sibling, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-14 15:23 UTC (permalink / raw)
  To: Solar Designer
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

Solar Designer <solar@openwall.com> writes:

> On Thu, Feb 10, 2022 at 08:13:21PM -0600, Eric W. Biederman wrote:
>> While examining is_ucounts_overlimit and reading the various messages
>> I realized that is_ucounts_overlimit fails to deal with counts that
>> may have wrapped.
>> 
>> Being wrapped should be a transitory state for counts and they should
>> never be wrapped for long, but it can happen so handle it.
>> 
>> Cc: stable@vger.kernel.org
>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  kernel/ucount.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 65b597431c86..06ea04d44685 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -350,7 +350,8 @@ bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsign
>>  	if (rlimit > LONG_MAX)
>>  		max = LONG_MAX;
>>  	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> -		if (get_ucounts_value(iter, type) > max)
>> +		long val = get_ucounts_value(iter, type);
>> +		if (val < 0 || val > max)
>>  			return true;
>>  		max = READ_ONCE(iter->ns->ucount_max[type]);
>>  	}
>
> You probably deliberately assume "gcc -fwrapv", but otherwise:
>
> As you're probably aware, a signed integer wrapping is undefined
> behavior in C.  In the function above, "val" having wrapped to negative
> assumes we had occurred UB elsewhere.  Further, there's an instance of
> UB in the function itself:

While in cases like this we pass the value in a long, the operations on
the value occur in an atomic_long_t.  As atomic_long_t is implemented in
assembly we do escape the problems of undefined behavior.


> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long rlimit)
> {
> 	struct ucounts *iter;
> 	long max = rlimit;
> 	if (rlimit > LONG_MAX)
> 		max = LONG_MAX;
>
> The assignment on "long max = rlimit;" would have already been UB if
> "rlimit > LONG_MAX", which is only checked afterwards.  I think the
> above would be better written as:
>
> 	if (rlimit > LONG_MAX)
> 		rlimit = LONG_MAX;
> 	long max = rlimit;
>
> considering that "rlimit" is never used further in that function.

Thank you for spotting that.  That looks like a good idea.  Even if it
works in this case it is better to establish patterns that are not
problematic if copy and pasted elsewhere.

> And to more likely avoid wraparound of "val", perhaps have the limit at
> a value significantly lower than LONG_MAX, like half that?  So:

For the case of RLIMIT_NPROC the real world limit is PID_MAX_LIMIT
which is 2^22.

Beyond that the code deliberately uses all values with the high bit/sign
bit set to flag that things went too high.  So the code already reserves
half of the values.

> I assume that once is_ucounts_overlimit() returned true, it is expected
> the value would almost not grow further (except a little due to races).

Pretty much. The function essentially only exists so that we can
handle the weirdness of RLIMIT_NPROC.  Now that I have discovered the
weirdness of RLIMIT_NPROC is old historical sloppiness I expect the
proper solution is to rework how RLIMIT_NPROC operates and to remove
is_ucounts_overlimit all together.   I have to figure out what a proper
RLIMIT_NPROC check looks like in proc.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit
  2022-02-14 15:23     ` Eric W. Biederman
@ 2022-02-14 15:23       ` Eric W. Biederman
  2022-02-15 11:25       ` Michal Koutný
  1 sibling, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-14 15:23 UTC (permalink / raw)
  To: Solar Designer
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Solar Designer <solar@openwall.com> writes:
>
>> On Thu, Feb 10, 2022 at 08:13:21PM -0600, Eric W. Biederman wrote:
>>> While examining is_ucounts_overlimit and reading the various messages
>>> I realized that is_ucounts_overlimit fails to deal with counts that
>>> may have wrapped.
>>> 
>>> Being wrapped should be a transitory state for counts and they should
>>> never be wrapped for long, but it can happen so handle it.
>>> 
>>> Cc: stable@vger.kernel.org
>>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  kernel/ucount.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>>> index 65b597431c86..06ea04d44685 100644
>>> --- a/kernel/ucount.c
>>> +++ b/kernel/ucount.c
>>> @@ -350,7 +350,8 @@ bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsign
>>>  	if (rlimit > LONG_MAX)
>>>  		max = LONG_MAX;
>>>  	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>>> -		if (get_ucounts_value(iter, type) > max)
>>> +		long val = get_ucounts_value(iter, type);
>>> +		if (val < 0 || val > max)
>>>  			return true;
>>>  		max = READ_ONCE(iter->ns->ucount_max[type]);
>>>  	}
>>
>> You probably deliberately assume "gcc -fwrapv", but otherwise:
>>
>> As you're probably aware, a signed integer wrapping is undefined
>> behavior in C.  In the function above, "val" having wrapped to negative
>> assumes we had occurred UB elsewhere.  Further, there's an instance of
>> UB in the function itself:
>
> While in cases like this we pass the value in a long, the operations on
> the value occur in an atomic_long_t.  As atomic_long_t is implemented in
> assembly we do escape the problems of undefined behavior.
>
>
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long rlimit)
>> {
>> 	struct ucounts *iter;
>> 	long max = rlimit;
>> 	if (rlimit > LONG_MAX)
>> 		max = LONG_MAX;
>>
>> The assignment on "long max = rlimit;" would have already been UB if
>> "rlimit > LONG_MAX", which is only checked afterwards.  I think the
>> above would be better written as:
>>
>> 	if (rlimit > LONG_MAX)
>> 		rlimit = LONG_MAX;
>> 	long max = rlimit;
>>
>> considering that "rlimit" is never used further in that function.
>
> Thank you for spotting that.  That looks like a good idea.  Even if it
> works in this case it is better to establish patterns that are not
> problematic if copy and pasted elsewhere.
>
>> And to more likely avoid wraparound of "val", perhaps have the limit at
>> a value significantly lower than LONG_MAX, like half that?  So:
>
> For the case of RLIMIT_NPROC the real world limit is PID_MAX_LIMIT
> which is 2^22.
>
> Beyond that the code deliberately uses all values with the high bit/sign
> bit set to flag that things went too high.  So the code already reserves
> half of the values.
>
>> I assume that once is_ucounts_overlimit() returned true, it is expected
>> the value would almost not grow further (except a little due to races).
>
> Pretty much. The function essentially only exists so that we can
> handle the weirdness of RLIMIT_NPROC.  Now that I have discovered the
> weirdness of RLIMIT_NPROC is old historical sloppiness I expect the
> proper solution is to rework how RLIMIT_NPROC operates and to remove
> is_ucounts_overlimit all together.   I have to figure out what a proper
> RLIMIT_NPROC check looks like in proc.
                                   ^^^^ execve

Eric


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit
  2022-02-12 22:36   ` Solar Designer
  2022-02-14 15:23     ` Eric W. Biederman
@ 2022-02-14 17:16     ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2022-02-14 17:16 UTC (permalink / raw)
  To: 'Solar Designer', Eric W. Biederman
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

From: Solar Designer
> Sent: 12 February 2022 22:37
...
> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long rlimit)
> {
> 	struct ucounts *iter;
> 	long max = rlimit;
> 	if (rlimit > LONG_MAX)
> 		max = LONG_MAX;
> 
> The assignment on "long max = rlimit;" would have already been UB if
> "rlimit > LONG_MAX", which is only checked afterwards.  I think the
> above would be better written as:

I'm pretty sure assignments and casts of negative values to unsigned
types are actually well defined.
Although the actual value may differ for ones-compliment and
sign-overpunch systems.
But I suspect Linux requires twos-compliment negative numbers.

(In much the same way as it requires that NULL be the all zero
bit pattern - although a load of annoying compiler warnings are only
relevant if that isn't the case.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve
  2022-02-14 15:10     ` Eric W. Biederman
@ 2022-02-14 17:43       ` Eric W. Biederman
  2022-02-15 10:25       ` Michal Koutný
  1 sibling, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-14 17:43 UTC (permalink / raw)
  To: Solar Designer
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Ran Xiaokai, Michal Koutn??,
	stable

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Solar Designer <solar@openwall.com> writes:
>
>> On Thu, Feb 10, 2022 at 08:13:19PM -0600, Eric W. Biederman wrote:
>>> As of commit 2863643fb8b9 ("set_user: add capability check when
>>> rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
>>> should check RLIMIT_NPROC is buggy, as it tests the capabilites from
>>> before the credential change and not aftwards.
>>> 
>>> As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
>>> ucounts") examining the rlimit is buggy as cred->ucounts has not yet
>>> been properly set in the new credential.
>>> 
>>> Make the code correct and more robust moving the test to see if
>>> execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
>>> of the rest of the logic into execve() itself.
>>> 
>>> As the flag only indicateds that RLIMIT_NPROC should be checked
>>> in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.
>>> 
>>> Cc: stable@vger.kernel.org
>>> Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@suse.com
>>> Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@suse.com
>>> Reported-by: Michal Koutn?? <mkoutny@suse.com>
>>> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
>>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> On one hand, this looks good.
>>
>> On the other, you asked about the Apache httpd suexec scenario in the
>> other thread, and here's what this means for it (per my code review):
>>
>> In that scenario, we have two execve(): first from httpd to suexec, then
>> from suexec to the CGI script.  Previously, the limit check only
>> occurred on the setuid() call by suexec, and its effect was deferred
>> until execve() of the script.  Now wouldn't it occur on both execve()
>> calls, because commit_creds() is also called on execve() (such as in
>> case the program is SUID, which suexec actually is)?
>
> Yes.  Moving the check into commit_creds means that the exec after a
> suid exec will perform an RLIMIT_NPROC check and could possibly fail.  I
> would call that a bug.  Anything happening in execve should be checked
> and handled in execve as execve can fail.
>
> It also points out that our permission checks for increasing
> RLIMIT_NPROC are highly inconsistent.
>
> One set of permissions in fork().
> Another set of permissions in set*id() and delayed until execve.
> No permission checks for the uid change in execve.
>
> Every time I look into the previous behavior of RLIMIT_NPROC I seem
> to find issues.  Currently I am planning a posting to linux-api
> so sorting out what when RLIMIT_NPROC should be enforced and how
> RLIMIT_NPROC gets accounted receives review.  I am also planning a
> feature branch to deal with the historical goofiness.
>
> I really like how cleanly this patch seems to be.  Unfortunately it is
> wrong.

Hmm.  Maybe not as wrong as I thought.  An suid execve does not change
the real user.

Still a bit wrong from a conservative change point of view because the
user namespace can change in setns and CLONE_NEWUSER which will change
the accounting now.  Which with the ucount rlimit stuff changes where
things should be accounted.

I am playing with the idea of changing accounting aka (cred->ucounts &
cred->user) to only change in fork (aka clone without CLONE_THREAD) and
exec.  I think that would make maintenance and  cleaning all of this up
easier.

That would also remove serious complications from RLIMIT_SIGPENDING as
well.

I thought SIGPENDING was only a multi-threaded process issue but from
one signal to the next the set*id() family functions can be called.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression
  2022-02-11  2:13 ` [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression Eric W. Biederman
@ 2022-02-14 18:37   ` Michal Koutný
  2022-02-16 15:22     ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Koutný @ 2022-02-14 18:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Solar Designer, Ran Xiaokai, containers,
	stable

On Thu, Feb 10, 2022 at 08:13:17PM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> This can be fixed either by fixing the test or by moving the increment
> to be before the test.  Fix it my moving copy_creds which contains
> the increment before is_ucounts_overlimit.

This is simpler than my approach and I find it correct too.

> Both the test in fork and the test in set_user were semantically
> changed when the code moved to ucounts.  The change of the test in
> fork was bad because it was before the increment.
>
> The test in set_user was wrong and the change to ucounts fixed it.  So
> this fix is only restore the old behavior in one lcatio not two.

Whom should be the task accounted to in the case of set*uid? (The change
to ucounts made the check against the pre-switch user's ucounts.)

> ---
>  kernel/fork.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Michal Koutný <mkoutny@suse.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve
  2022-02-14 15:10     ` Eric W. Biederman
  2022-02-14 17:43       ` Eric W. Biederman
@ 2022-02-15 10:25       ` Michal Koutný
  2022-02-16 15:35         ` Eric W. Biederman
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Koutný @ 2022-02-15 10:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Solar Designer, linux-kernel, Alexey Gladkov, Kees Cook,
	Shuah Khan, Christian Brauner, Ran Xiaokai, stable

On Mon, Feb 14, 2022 at 09:10:49AM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> I really like how cleanly this patch seems to be.  Unfortunately it is
> wrong.

It seems [1] so:

setuid()		// RLIMIT_NPROC is fine at this moment
...		fork()
		...
...		fork()
execve()		// eh, oh

This "punishes" the exec'ing task although the cause is elsewhere.

Michal

[1] The decoupled setuid()+execve() check can be interpretted both ways.
I understood historically the excess at the setuid() moment is relevant.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC
  2022-02-11  2:13 ` [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC Eric W. Biederman
@ 2022-02-15 10:54   ` Michal Koutný
  2022-02-16 15:41     ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Koutný @ 2022-02-15 10:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Solar Designer, Ran Xiaokai, containers,
	stable

On Thu, Feb 10, 2022 at 08:13:20PM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> @@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
[...]
> -	    (current_user() != INIT_USER) &&
> +	    (current_ucounts() != &init_ucounts) &&
[...]
> @@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
[...]
> -		if (p->real_cred->user != INIT_USER &&
> +		if ((task_ucounts(p) != &init_ucounts) &&

These substitutions make sense to me.

>  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>  			goto bad_fork_cleanup_count;
>  	}
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 6b2e3ca7ee99..f0c04073403d 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
>  		ns->ucount_max[i] = INT_MAX;
>  	}
>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
> +	if (new->ucounts == &init_ucounts)
> +		set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));

First, I wanted to object this double fork_init() but I realized it's
relevant for newly created user_ns.

Second, I think new->ucounts would be correct at this point and the
check should be

> if (ucounts == &init_ucounts)

i.e. before set_cred_ucounts() new->ucounts may not be correct.

I'd suggest also a comment in the create_user_ns() explaining the
reason is to exempt global root from RLIMINT_NRPOC also indirectly via
descendant user_nss.

Thanks,
Michal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/8] ucounts: Fix set_cred_ucounts
  2022-02-11  2:13 ` [PATCH 2/8] ucounts: Fix set_cred_ucounts Eric W. Biederman
@ 2022-02-15 11:10   ` Michal Koutný
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Koutný @ 2022-02-15 11:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Solar Designer, Ran Xiaokai, containers,
	stable

On Thu, Feb 10, 2022 at 08:13:18PM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 473d17c431f3..933155c96922 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -665,21 +665,16 @@ EXPORT_SYMBOL(cred_fscmp);
>  
>  int set_cred_ucounts(struct cred *new)
>  {
> -	struct task_struct *task = current;
> -	const struct cred *old = task->real_cred;
>  	struct ucounts *new_ucounts, *old_ucounts = new->ucounts;
>  
> -	if (new->user == old->user && new->user_ns == old->user_ns)
> -		return 0;
> -
>  	/*
>  	 * This optimization is needed because alloc_ucounts() uses locks
>  	 * for table lookups.
>  	 */
> -	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->euid))
> +	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->uid))
>  		return 0;
>  
> -	if (!(new_ucounts = alloc_ucounts(new->user_ns, new->euid)))
> +	if (!(new_ucounts = alloc_ucounts(new->user_ns, new->uid)))
>  		return -EAGAIN;
>  
>  	new->ucounts = new_ucounts;

Reviewed-by: Michal Koutný <mkoutny@suse.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit
  2022-02-14 15:23     ` Eric W. Biederman
  2022-02-14 15:23       ` Eric W. Biederman
@ 2022-02-15 11:25       ` Michal Koutný
  1 sibling, 0 replies; 20+ messages in thread
From: Michal Koutný @ 2022-02-15 11:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Solar Designer, linux-kernel, Alexey Gladkov, Kees Cook,
	Shuah Khan, Christian Brauner, Ran Xiaokai, stable

On Mon, Feb 14, 2022 at 09:23:09AM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Pretty much. The function essentially only exists so that we can
> handle the weirdness of RLIMIT_NPROC.
> Now that I have discovered the
> weirdness of RLIMIT_NPROC is old historical sloppiness I expect the
> proper solution is to rework how RLIMIT_NPROC operates and to remove
> is_ucounts_overlimit all together.

The fork path could make do with some kind of atomic add+check (similar
to inc_ucounts) and overflows would be sanitized by that. (Seems to
apply to other former RLIMIT_* per-user counters too.)

The is_ucounts_overlimit() and overflowable increment indeed appears
necessary only to satisfy the set*uid+execve pair.

For the sake of bug-fixing, both the patches 5/8 and 6/8 can have 
Reviewed-by: Michal Koutný <mkoutny@suse.com>


Michal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression
  2022-02-14 18:37   ` Michal Koutný
@ 2022-02-16 15:22     ` Eric W. Biederman
  0 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-16 15:22 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Solar Designer, Ran Xiaokai, containers,
	stable

Michal Koutný <mkoutny@suse.com> writes:

> On Thu, Feb 10, 2022 at 08:13:17PM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> This can be fixed either by fixing the test or by moving the increment
>> to be before the test.  Fix it my moving copy_creds which contains
>> the increment before is_ucounts_overlimit.
>
> This is simpler than my approach and I find it correct too.
>
>> Both the test in fork and the test in set_user were semantically
>> changed when the code moved to ucounts.  The change of the test in
>> fork was bad because it was before the increment.
>>
>> The test in set_user was wrong and the change to ucounts fixed it.  So
>> this fix is only restore the old behavior in one lcatio not two.
>
> Whom should be the task accounted to in the case of set*uid? (The change
> to ucounts made the check against the pre-switch user's ucounts.)

It needs to be post-switch in the case of set*id().

I have that fixed in the next version of my patchset.

>> ---
>>  kernel/fork.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve
  2022-02-15 10:25       ` Michal Koutný
@ 2022-02-16 15:35         ` Eric W. Biederman
  0 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-16 15:35 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Solar Designer, linux-kernel, Alexey Gladkov, Kees Cook,
	Shuah Khan, Christian Brauner, Ran Xiaokai, stable

Michal Koutný <mkoutny@suse.com> writes:

> On Mon, Feb 14, 2022 at 09:10:49AM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> I really like how cleanly this patch seems to be.  Unfortunately it is
>> wrong.
>
> It seems [1] so:
>
> setuid()		// RLIMIT_NPROC is fine at this moment
> ...		fork()
> 		...
> ...		fork()
> execve()		// eh, oh
>
> This "punishes" the exec'ing task although the cause is elsewhere.
>
> Michal
>
> [1] The decoupled setuid()+execve() check can be interpretted both ways.
> I understood historically the excess at the setuid() moment is
> relevant.

I have been digging into this to understand why we are doing the strange
things we are doing.

Ordinarily for rlimits we are fine with letting things go over limit
until we reach a case where we need the limit (which would be fork in
the RLIMIT_NPROC case).  So things like setrlimit do not check your
counts to see if you will be over the limit.

The practical problem with fork in the unix model is that you can not
change limits or do anything for the new process until it is created
(with clone/fork).  Making it impossible to set the rlimits and change
the user before the new process is created.

The result is that in applications like apache when they run cgi scripts
(as a different user than the apache process) RLIMIT_NPROC did not work
until a check was placed into set*id() as well.  As the typical cgi
script did not fork it just did it's work and exited.

That it was discovered that allowing set*id() to fail was a footgun for
privileged processes.  And we have the existing system.


Which leads me to the starting point that set*id() checking rlimits is
a necessary but fundamentally a special case.

As long as the original use case works I think there is some latitude in
the implementation.  Maybe we set a flag and perform all of the checks
in exec.  Maybe we just send SIGKILL.  Maybe we just say it is an ugly
wart but it is our ugly wart and comment it and leave it alone.
I am leaving  that decision to a clean-up patchset.










^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC
  2022-02-15 10:54   ` Michal Koutný
@ 2022-02-16 15:41     ` Eric W. Biederman
  0 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2022-02-16 15:41 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-kernel, Alexey Gladkov, Kees Cook, Shuah Khan,
	Christian Brauner, Solar Designer, Ran Xiaokai, containers,
	stable

Michal Koutný <mkoutny@suse.com> writes:

> On Thu, Feb 10, 2022 at 08:13:20PM -0600, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> @@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
> [...]
>> -	    (current_user() != INIT_USER) &&
>> +	    (current_ucounts() != &init_ucounts) &&
> [...]
>> @@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
> [...]
>> -		if (p->real_cred->user != INIT_USER &&
>> +		if ((task_ucounts(p) != &init_ucounts) &&
>
> These substitutions make sense to me.
>
>>  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>>  			goto bad_fork_cleanup_count;
>>  	}
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 6b2e3ca7ee99..f0c04073403d 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
>>  		ns->ucount_max[i] = INT_MAX;
>>  	}
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
>> +	if (new->ucounts == &init_ucounts)
>> +		set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
>
> First, I wanted to object this double fork_init() but I realized it's
> relevant for newly created user_ns.
>
> Second, I think new->ucounts would be correct at this point and the
> check should be
>
>> if (ucounts == &init_ucounts)
>
> i.e. before set_cred_ucounts() new->ucounts may not be correct.
>
> I'd suggest also a comment in the create_user_ns() explaining the
> reason is to exempt global root from RLIMINT_NRPOC also indirectly via
> descendant user_nss.

Yes.

This one got culled from my next version of the patchset as it is not
conservative enough.  I think it is probably the right general
direction.

On further reflection I am not convinced that it makes sense to test
user or ucounts.  They are really not fields designed to support
permission checks.

I think if we want to exempt the root user's children from the root
users rlimit using the second set_rlimit_ucount_max is the way to go.

Someone filed a bug that strongly suggests that we want the second
set_rlimit_ucount_max:
https://bugzilla.kernel.org/show_bug.cgi?id=215596

I am still trying to understand that case. 

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-02-16 15:41 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <87o83e2mbu.fsf@email.froward.int.ebiederm.org>
2022-02-11  2:13 ` [PATCH 1/8] ucounts: Fix RLIMIT_NPROC regression Eric W. Biederman
2022-02-14 18:37   ` Michal Koutný
2022-02-16 15:22     ` Eric W. Biederman
2022-02-11  2:13 ` [PATCH 2/8] ucounts: Fix set_cred_ucounts Eric W. Biederman
2022-02-15 11:10   ` Michal Koutný
2022-02-11  2:13 ` [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve Eric W. Biederman
2022-02-12 23:17   ` Solar Designer
2022-02-14 15:10     ` Eric W. Biederman
2022-02-14 17:43       ` Eric W. Biederman
2022-02-15 10:25       ` Michal Koutný
2022-02-16 15:35         ` Eric W. Biederman
2022-02-11  2:13 ` [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC Eric W. Biederman
2022-02-15 10:54   ` Michal Koutný
2022-02-16 15:41     ` Eric W. Biederman
2022-02-11  2:13 ` [PATCH 5/8] ucounts: Handle wrapping in is_ucounts_overlimit Eric W. Biederman
2022-02-12 22:36   ` Solar Designer
2022-02-14 15:23     ` Eric W. Biederman
2022-02-14 15:23       ` Eric W. Biederman
2022-02-15 11:25       ` Michal Koutný
2022-02-14 17:16     ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).