linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/1] CAP_SYS_NICE inside user namespace
@ 2019-11-01 18:18 Prakash Sangappa
  2019-11-01 18:18 ` [RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces Prakash Sangappa
  0 siblings, 1 reply; 2+ messages in thread
From: Prakash Sangappa @ 2019-11-01 18:18 UTC
  To: linux-kernel; +Cc: ebiederm, tglx, peterz, serge, prakash.sangappa

Some of the capabilities(7) that affect system-wide resources are ineffective
inside user namespaces. This restriction applies even to the root user (uid 0)
of the init namespace when mapped into a user namespace. One such capability
is CAP_SYS_NICE, which is required to raise process priority. As a result,
the root user cannot increase a process's priority with a negative nice value
or set an RT priority on processes inside the user namespace.
A workaround is to rely on a process or daemon running outside the user
namespace to change process priority, which is inconvenient.
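
To make the restriction above concrete, here is a minimal user-space sketch
that probes whether the caller may lower its own nice value. The helper names
read_nice() and try_set_nice() are illustrative and not part of this patch;
only getpriority(2) and setpriority(2) are real interfaces.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Read the caller's nice value.
 * Returns 0 and stores the value in *nice, or returns an errno code.
 */
static int read_nice(int *nice)
{
	int value;

	errno = 0;	/* getpriority() can legitimately return -1 */
	value = getpriority(PRIO_PROCESS, 0);
	if (value == -1 && errno != 0)
		return errno;
	*nice = value;
	return 0;
}

/*
 * Try to set the caller's nice value.
 * Returns 0 on success or the errno reported by setpriority().
 */
static int try_set_nice(int nice)
{
	return setpriority(PRIO_PROCESS, 0, nice) == -1 ? errno : 0;
}
```

Per setpriority(2), lowering the nice value without the required privilege is
reported as EACCES, so a call such as try_set_nice(-5) made by the mapped root
user inside a user namespace would be expected to return that code on an
unpatched kernel.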

We could allow these restricted capabilities to take effect only for the root
user of the init namespace mapped into a user namespace, and limit the effect
with cgroups. It seems reasonable to address each of these restricted
capabilities on a case-by-case basis. This patch concerns CAP_SYS_NICE. The
proposal is to selectively allow CAP_SYS_NICE to take effect inside a user
namespace only for a root user mapped from the init namespace.
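
The proposed rule can be sketched as a small user-space model of the
permission decision. This is not the kernel code: struct caller and
can_nice_model() are hypothetical names, and the three booleans stand in for
capable(), ns_capable() and the uid_eq() test against GLOBAL_ROOT_UID.
nice_to_rlimit() follows the kernel helper of the same name.

```c
#include <stdbool.h>

#define MAX_NICE 19

/* Same mapping as the kernel helper: nice 19..-20 becomes rlimit 1..40. */
static long nice_to_rlimit(long nice)
{
	return MAX_NICE - nice + 1;
}

/* Hypothetical summary of the caller's privileges (assumption). */
struct caller {
	bool capable_init_ns;		/* CAP_SYS_NICE in the init user namespace */
	bool capable_target_ns;		/* CAP_SYS_NICE in the target's user namespace */
	bool euid_is_global_root;	/* effective uid is uid 0 of the init namespace */
};

/*
 * Model of the proposed check: the request is allowed if RLIMIT_NICE
 * permits it, or the caller holds the capability in the target's user
 * namespace and is the init namespace's root, or the caller holds the
 * capability in the init namespace.
 */
static bool can_nice_model(const struct caller *c, long rlimit_nice, long nice)
{
	return nice_to_rlimit(nice) <= rlimit_nice ||
	       (c->capable_target_ns && c->euid_is_global_root) ||
	       c->capable_init_ns;
}
```

In this model, namespace-local CAP_SYS_NICE alone is still not enough; the
extra condition on the effective uid is what restricts the relaxation to the
mapped init-namespace root.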

Which user ID may map the root user (uid 0) of the init namespace into its
user namespaces is authorized through /etc/subuid and /etc/subgid entries. Only
the system administrator (root) can add these entries, so an ordinary user
cannot simply map the root user (uid 0) into the user namespaces it creates.
cgroup CPU bandwidth control can be used to limit CPU usage for such user
namespaces.
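
As an illustration of the authorization step, the sketch below checks whether
a single subuid(5)-style line of the form "name:start:count" covers a given
host uid. It is a simplification made for this example: the function name
subid_line_covers() and the user name "alice" are hypothetical, and real
tooling performs additional checks beyond this range test.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if `line` ("name:start:count") belongs to `user` and its
 * range [start, start + count) contains host uid `uid`.
 */
static bool subid_line_covers(const char *line, const char *user,
			      unsigned long uid)
{
	char name[64];
	unsigned long start, count;

	/* Expect exactly three colon-separated fields. */
	if (sscanf(line, "%63[^:]:%lu:%lu", name, &start, &count) != 3)
		return false;
	if (strcmp(name, user) != 0)
		return false;
	return count > 0 && uid >= start && uid - start < count;
}
```

Under these assumptions, an entry such as "alice:0:1" would cover uid 0 for
alice, while a typical range such as "alice:100000:65536" would not.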

The capabilities(7) manpage lists all the operations and system calls that are
subject to the CAP_SYS_NICE check. This patch currently allows
CAP_SYS_NICE to take effect inside a user namespace only for system calls
affecting process priority. For completeness' sake, should the memory
operations (migrate_pages(2), move_pages(2), mbind(2)) mentioned in the
manpage also be permitted? There are no cgroup controls to limit the effect
of these memory operations.

Looking for feedback on this approach.

Prakash Sangappa (1):
  Selectively allow CAP_SYS_NICE capability inside user namespaces

 kernel/sched/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.7.4



* [RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-01 18:18 [RFC PATCH 0/1] CAP_SYS_NICE inside user namespace Prakash Sangappa
@ 2019-11-01 18:18 ` Prakash Sangappa
  0 siblings, 0 replies; 2+ messages in thread
From: Prakash Sangappa @ 2019-11-01 18:18 UTC
  To: linux-kernel; +Cc: ebiederm, tglx, peterz, serge, prakash.sangappa

Allow CAP_SYS_NICE to take effect inside a user namespace for processes whose
effective uid is that of the init namespace's root user.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
---
 kernel/sched/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f..628bd46 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
 	int nice_rlim = nice_to_rlimit(nice);
 
 	return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
+		(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
+		uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
 		capable(CAP_SYS_NICE));
 }
 
@@ -4784,7 +4786,9 @@ static int __sched_setscheduler(struct task_struct *p,
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
 	 */
-	if (user && !capable(CAP_SYS_NICE)) {
+	if (user && !(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
+		uid_eq(current_euid(), GLOBAL_ROOT_UID)) &&
+		!capable(CAP_SYS_NICE)) {
 		if (fair_policy(policy)) {
 			if (attr->sched_nice < task_nice(p) &&
 			    !can_nice(p, attr->sched_nice))
-- 
2.7.4

