linux-kernel.vger.kernel.org archive mirror
* [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace
@ 2019-11-18 17:01 Prakash Sangappa
  2019-11-18 17:01 ` [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces Prakash Sangappa
  2019-11-18 19:36 ` [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace Jann Horn
  0 siblings, 2 replies; 11+ messages in thread
From: Prakash Sangappa @ 2019-11-18 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-api; +Cc: ebiederm, tglx, peterz, serge, prakash.sangappa

Some of the capabilities(7) which affect system-wide resources are
ineffective inside user namespaces. This restriction applies even to the
root user (uid 0) from the init namespace mapped into the user namespace.
One such capability is CAP_SYS_NICE, which is required to change process
priority. As a result, the root user cannot perform operations like
increasing a process's priority with a negative nice value or setting RT
priority on processes inside the user namespace. A workaround for this
restriction is to enlist a process / daemon running outside the user
namespace to change process priority, which is an inconvenience.
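To make the restriction concrete, here is a minimal user-space sketch of the two operations mentioned above (the helper names are hypothetical): lowering the nice value with setpriority(2) and requesting SCHED_FIFO with sched_setscheduler(2). Inside a user namespace, even as the namespace's uid 0, both are expected to fail (EACCES and EPERM respectively) unless RLIMIT_NICE / RLIMIT_RTPRIO allow the request, because the kernel checks CAP_SYS_NICE against the initial user namespace.

```c
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Try to set the calling process's nice value to `nice` (negative means
 * higher priority). Returns 0 on success or -errno; EACCES is returned
 * when neither RLIMIT_NICE nor CAP_SYS_NICE permits the change. */
static int try_set_nice(int nice)
{
    if (setpriority(PRIO_PROCESS, 0, nice) == -1)
        return -errno;
    return 0;
}

/* Try to switch the calling thread to SCHED_FIFO at priority `prio`.
 * Returns 0 on success or -errno; EPERM is returned when RLIMIT_RTPRIO
 * is too low and CAP_SYS_NICE is not effective. */
static int try_set_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return -errno;
    return 0;
}
```

Returning -errno rather than printing keeps the helpers easy to call from a test harness running inside and outside a user namespace.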

We could allow these restricted capabilities to take effect only for the
root user from the init namespace mapped inside a user namespace, and limit
their effect with cgroups. It would seem reasonable to address each of these
restricted capabilities on a case-by-case basis. This patch concerns the
CAP_SYS_NICE capability. The proposal here is to selectively allow
CAP_SYS_NICE to take effect inside a user namespace only for a root user
mapped from the init namespace.

Which user id gets to map the root user (uid 0) from the init namespace
inside its user namespaces is authorized through /etc/subuid & /etc/subgid
entries. Only the system admin / root user can add these entries.
Therefore an ordinary user cannot simply map the root user (uid 0) into
the user namespaces it creates. Cgroup bandwidth control can be used to
limit CPU usage for such user namespaces.
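As an illustration of the /etc/subuid authorization described above, the sketch below checks whether a single subuid(5) entry of the form `name:start:count` gives a user a subordinate range that contains host uid 0, which is what would let that user map GLOBAL_ROOT_UID into its namespaces via newuidmap(1). The helper name is hypothetical, and it compares the first field as a literal string (numeric-uid entries are not resolved).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if `line` is a subuid entry for `user` whose range
 * [start, start + count) includes host uid 0. */
static bool subuid_line_maps_root(const char *line, const char *user)
{
    char name[256];
    unsigned long start, count;

    /* %255[^:] reads the login name up to the first ':' */
    if (sscanf(line, "%255[^:]:%lu:%lu", name, &start, &count) != 3)
        return false;               /* malformed entry */
    if (strcmp(name, user) != 0)
        return false;               /* entry belongs to another user */
    return start == 0 && count > 0; /* unsigned range starting at uid 0 */
}
```

For example, an admin-added line `alice:0:65536` would authorize alice to map host root, while the more usual `alice:100000:65536` would not.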

The capabilities(7) manpage lists all the operations / system calls that
are subject to the CAP_SYS_NICE capability check. This patch currently
allows CAP_SYS_NICE to take effect inside a user namespace only for system
calls affecting process priority. For completeness' sake, should the memory
operations mentioned in the manpage (migrate_pages(2), move_pages(2),
mbind(2)) also be permitted? There are no cgroup controls to limit the
effect of these memory operations.

Looking for feedback on this approach.

Prakash Sangappa (1):
  Selectively allow CAP_SYS_NICE capability inside user namespaces

 kernel/sched/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.7.4



* [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-18 17:01 [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace Prakash Sangappa
@ 2019-11-18 17:01 ` Prakash Sangappa
  2019-11-18 19:30   ` Jann Horn
  2019-11-21 21:27   ` Eric W. Biederman
  2019-11-18 19:36 ` [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace Jann Horn
  1 sibling, 2 replies; 11+ messages in thread
From: Prakash Sangappa @ 2019-11-18 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-api; +Cc: ebiederm, tglx, peterz, serge, prakash.sangappa

Allow CAP_SYS_NICE to take effect for processes having effective uid of a
root user from init namespace.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
---
 kernel/sched/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f..628bd46 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
 	int nice_rlim = nice_to_rlimit(nice);
 
 	return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
+		(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
+		uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
 		capable(CAP_SYS_NICE));
 }
 
@@ -4784,7 +4786,9 @@ static int __sched_setscheduler(struct task_struct *p,
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
 	 */
-	if (user && !capable(CAP_SYS_NICE)) {
+	if (user && !(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
+		uid_eq(current_euid(), GLOBAL_ROOT_UID)) &&
+		!capable(CAP_SYS_NICE)) {
 		if (fair_policy(policy)) {
 			if (attr->sched_nice < task_nice(p) &&
 			    !can_nice(p, attr->sched_nice))
-- 
2.7.4
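For readers following the first hunk, here is a user-space model of the patched can_nice() logic. nice_to_rlimit() uses the same formula as the kernel helper (MAX_NICE - nice + 1, mapping nice 19..-20 onto 1..40); the three boolean parameters are hypothetical stand-ins for the ns_capable() call, the GLOBAL_ROOT_UID comparison and capable().

```c
#include <stdbool.h>

#define MAX_NICE 19   /* same value as include/linux/sched/prio.h */

/* Same formula as the kernel's nice_to_rlimit(). */
static long nice_to_rlimit(long nice)
{
    return MAX_NICE - nice + 1;
}

/* Model of can_nice() after the patch:
 *   ns_cap_in_target_ns - ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)
 *   euid_is_global_root - uid_eq(current_euid(), GLOBAL_ROOT_UID)
 *   init_ns_cap         - capable(CAP_SYS_NICE) */
static bool model_can_nice(long nice, long rlimit_nice,
                           bool ns_cap_in_target_ns,
                           bool euid_is_global_root,
                           bool init_ns_cap)
{
    return nice_to_rlimit(nice) <= rlimit_nice ||
           (ns_cap_in_target_ns && euid_is_global_root) || /* new clause */
           init_ns_cap;
}
```

With RLIMIT_NICE at 0, model_can_nice(-5, 0, false, false, false) is false, while the new clause makes the result true whenever both namespace-related flags are set, independent of the rlimit.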



* Re: [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-18 17:01 ` [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces Prakash Sangappa
@ 2019-11-18 19:30   ` Jann Horn
  2019-11-19  0:46     ` prakash.sangappa
  2019-11-21 21:27   ` Eric W. Biederman
  1 sibling, 1 reply; 11+ messages in thread
From: Jann Horn @ 2019-11-18 19:30 UTC (permalink / raw)
  To: Prakash Sangappa
  Cc: kernel list, Linux API, Eric W. Biederman, Thomas Gleixner,
	Peter Zijlstra, Serge E. Hallyn, Christian Brauner

On Mon, Nov 18, 2019 at 6:04 PM Prakash Sangappa
<prakash.sangappa@oracle.com> wrote:
> Allow CAP_SYS_NICE to take effect for processes having effective uid of a
> root user from init namespace.
[...]
> @@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
>         int nice_rlim = nice_to_rlimit(nice);
>
>         return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
> +               (ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
> +               uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
>                 capable(CAP_SYS_NICE));

I very strongly dislike tying such a feature to GLOBAL_ROOT_UID.
Wouldn't it be better to control this through procfs, similar to
uid_map and gid_map? If you really need an escape hatch to become
privileged outside a user namespace, then I'd much prefer a file
"cap_map" that lets someone with appropriate capabilities in the outer
namespace write a bitmask of capabilities that should have effect
outside the container, or something like that. And limit that to bits
where that's sane, like CAP_SYS_NICE.
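A minimal sketch of how writes to such a file might be validated, assuming a hypothetical /proc/<pid>/cap_map that accepts a hexadecimal capability bitmask. This is not an existing kernel interface; the allowlist containing only CAP_SYS_NICE (bit 23 in <linux/capability.h>) reflects the "limit that to bits where that's sane" suggestion.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define CAP_SYS_NICE 23   /* value from <linux/capability.h> */

/* Hypothetical allowlist of capabilities that may be mapped outward. */
static const uint64_t CAP_MAP_ALLOWED = UINT64_C(1) << CAP_SYS_NICE;

/* Parse a hex mask as written to the hypothetical cap_map file.
 * Returns 0 and stores the mask on success, -EINVAL otherwise. */
static int parse_cap_map(const char *buf, uint64_t *mask_out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(buf, &end, 16);      /* accepts an optional 0x prefix */
    if (errno != 0 || end == buf)
        return -EINVAL;               /* overflow or no digits */
    if (*end == '\n')
        end++;                        /* tolerate the newline from echo */
    if (*end != '\0')
        return -EINVAL;               /* trailing garbage */
    if (v & ~CAP_MAP_ALLOWED)
        return -EINVAL;               /* bit outside the allowlist */
    *mask_out = (uint64_t)v;
    return 0;
}
```

Rejecting the whole write when any disallowed bit is set, rather than silently masking it, makes a misconfiguration visible to the host admin.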

If we tie features like this to GLOBAL_ROOT_UID, more people are going
to run their containers with GLOBAL_ROOT_UID. Which is a terrible,
terrible idea. GLOBAL_ROOT_UID gives you privilege over all sorts of
files that you shouldn't be able to access, and only things like mount
namespaces and possibly LSMs prevent you from exercising that
privilege. GLOBAL_ROOT_UID should only ever be given to processes that
you trust completely.


* Re: [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace
  2019-11-18 17:01 [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace Prakash Sangappa
  2019-11-18 17:01 ` [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces Prakash Sangappa
@ 2019-11-18 19:36 ` Jann Horn
  2019-11-18 20:34   ` Prakash Sangappa
  1 sibling, 1 reply; 11+ messages in thread
From: Jann Horn @ 2019-11-18 19:36 UTC (permalink / raw)
  To: Prakash Sangappa
  Cc: kernel list, Linux API, Eric W. Biederman, Thomas Gleixner,
	Peter Zijlstra, Serge E. Hallyn, Christian Brauner

On Mon, Nov 18, 2019 at 6:04 PM Prakash Sangappa
<prakash.sangappa@oracle.com> wrote:
> Some of the capabilities(7) which affect system wide resources, are ineffective
> inside user namespaces. This restriction applies even to root user( uid 0)
> from init namespace mapped into the user namespace. One such capability
> is CAP_SYS_NICE which is required to change process priority. As a result of
> which the root user cannot perform operations like increase a process priority
> using -ve nice value or set RT priority on processes inside the user namespace.
> A workaround to deal with this restriction is to use the help of a process /
> daemon running outside the user namespace to change process priority, which is
> an inconvenience.

What is the goal here, in the big picture? Is your goal to allow
container admins to control the priorities of their tasks *relative to
each other*, or do you actually explicitly want container A to be able
to decide that its current workload is more timing-sensitive than
container B's?


* Re: [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace
  2019-11-18 19:36 ` [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace Jann Horn
@ 2019-11-18 20:34   ` Prakash Sangappa
  2019-11-21 18:33     ` Enrico Weigelt, metux IT consult
  0 siblings, 1 reply; 11+ messages in thread
From: Prakash Sangappa @ 2019-11-18 20:34 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Linux API, Eric W. Biederman, Thomas Gleixner,
	Peter Zijlstra, Serge E. Hallyn, Christian Brauner



On 11/18/19 11:36 AM, Jann Horn wrote:
> On Mon, Nov 18, 2019 at 6:04 PM Prakash Sangappa
> <prakash.sangappa@oracle.com> wrote:
>> Some of the capabilities(7) which affect system wide resources, are ineffective
>> inside user namespaces. This restriction applies even to root user( uid 0)
>> from init namespace mapped into the user namespace. One such capability
>> is CAP_SYS_NICE which is required to change process priority. As a result of
>> which the root user cannot perform operations like increase a process priority
>> using -ve nice value or set RT priority on processes inside the user namespace.
>> A workaround to deal with this restriction is to use the help of a process /
>> daemon running outside the user namespace to change process priority, which is
>> an inconvenience.
> What is the goal here, in the big picture? Is your goal to allow
> container admins to control the priorities of their tasks *relative to
> each other*, or do you actually explicitly want container A to be able
> to decide that its current workload is more timing-sensitive than
> container B's?

It is more the latter. The admin should be able to explicitly decide that
container A's workload is to be given priority over other containers.


* Re: [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-18 19:30   ` Jann Horn
@ 2019-11-19  0:46     ` prakash.sangappa
  0 siblings, 0 replies; 11+ messages in thread
From: prakash.sangappa @ 2019-11-19  0:46 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Linux API, Eric W. Biederman, Thomas Gleixner,
	Peter Zijlstra, Serge E. Hallyn, Christian Brauner



On 11/18/2019 11:30 AM, Jann Horn wrote:
> On Mon, Nov 18, 2019 at 6:04 PM Prakash Sangappa
> <prakash.sangappa@oracle.com> wrote:
>> Allow CAP_SYS_NICE to take effect for processes having effective uid of a
>> root user from init namespace.
> [...]
>> @@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
>>          int nice_rlim = nice_to_rlimit(nice);
>>
>>          return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
>> +               (ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
>> +               uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
>>                  capable(CAP_SYS_NICE));
> I very strongly dislike tying such a feature to GLOBAL_ROOT_UID.
> Wouldn't it be better to control this through procfs, similar to
> uid_map and gid_map? If you really need an escape hatch to become
> privileged outside a user namespace, then I'd much prefer a file
> "cap_map" that lets someone with appropriate capabilities in the outer
> namespace write a bitmask of capabilities that should have effect
> outside the container, or something like that. And limit that to bits
> where that's sane, like CAP_SYS_NICE.

Sounds reasonable. Adding a 'cap_map' file to the user namespace would give
more control. We could allow a capability in 'cap_map' to take effect only
if the corresponding capability is also enabled for the user inside the user
namespace, e.g. uid 0. Should we start with support for CAP_SYS_NICE?
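The rule proposed here (a capability takes effect outside the namespace only if it is set in cap_map and the caller also holds it inside its own namespace) amounts to a bitwise AND of two masks. A hypothetical sketch, where cap_map and the function name are assumptions rather than existing kernel interfaces:

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_SYS_NICE 23   /* value from <linux/capability.h> */

/* True only if `cap` (0..63) is granted by the host admin in cap_map AND
 * present in the caller's effective set inside the user namespace. */
static bool cap_effective_outside(uint64_t cap_map, uint64_t ns_effective,
                                  int cap)
{
    uint64_t bit = UINT64_C(1) << cap;

    return (cap_map & ns_effective & bit) != 0;
}
```

Requiring both bits means a process that has dropped CAP_SYS_NICE inside the namespace cannot regain its outward effect through cap_map.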


>
> If we tie features like this to GLOBAL_ROOT_UID, more people are going
> to run their containers with GLOBAL_ROOT_UID. Which is a terrible,
> terrible idea. GLOBAL_ROOT_UID gives you privilege over all sorts of
> files that you shouldn't be able to access, and only things like mount
> namespaces and possibly LSMs prevent you from exercising that
> privilege. GLOBAL_ROOT_UID should only ever be given to processes that
> you trust completely.

Agreed.



* Re: [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace
  2019-11-18 20:34   ` Prakash Sangappa
@ 2019-11-21 18:33     ` Enrico Weigelt, metux IT consult
  2019-11-22  1:54       ` Prakash Sangappa
  0 siblings, 1 reply; 11+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2019-11-21 18:33 UTC (permalink / raw)
  To: Prakash Sangappa, Jann Horn
  Cc: kernel list, Linux API, Eric W. Biederman, Thomas Gleixner,
	Peter Zijlstra, Serge E. Hallyn, Christian Brauner

On 18.11.19 21:34, Prakash Sangappa wrote:

> It is more the latter. The admin should be able to explicitly decide that
> container A's workload is to be given priority over other containers.

I guess you're talking about the host's admin, correct?

Shouldn't this already be possible by tweaking the container's cgroups?
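Under cgroup v2, relative priority between sibling containers is normally expressed with cpu.weight (range 1..10000, default 100): when all siblings are CPU-bound, each receives CPU time roughly in proportion to its weight. A small sketch of that arithmetic (the function name is illustrative):

```c
#include <stddef.h>

/* Approximate CPU share of sibling cgroup i when all n siblings are busy:
 * weights[i] / sum(weights). Returns 0.0 for an out-of-range index or
 * an all-zero weight list. */
static double cpu_weight_share(const unsigned int *weights, size_t n,
                               size_t i)
{
    unsigned long long total = 0;

    for (size_t k = 0; k < n; k++)
        total += weights[k];
    if (total == 0 || i >= n)
        return 0.0;
    return (double)weights[i] / (double)total;
}
```

With container A at cpu.weight 300 and container B at 100, A would get about 75% of the CPU under contention; this is set by the host admin and does not involve raising priorities from inside the container.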


--mtx

-- 
Urgent notice: due to the existential threat posed by "Emotet", you should
*never* accept/open MS Office documents via e-mail, even if they appear to
come from supposedly trustworthy senders. Otherwise total loss is imminent.
---
Note: unencrypted e-mails can easily be intercepted and manipulated! For
confidential communication, please send me your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287


* Re: [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-18 17:01 ` [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces Prakash Sangappa
  2019-11-18 19:30   ` Jann Horn
@ 2019-11-21 21:27   ` Eric W. Biederman
  2019-11-22  1:45     ` Prakash Sangappa
  1 sibling, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2019-11-21 21:27 UTC (permalink / raw)
  To: Prakash Sangappa; +Cc: linux-kernel, linux-api, tglx, peterz, serge

Prakash Sangappa <prakash.sangappa@oracle.com> writes:

> Allow CAP_SYS_NICE to take effect for processes having effective uid of a
> root user from init namespace.
>
> Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
> ---
>  kernel/sched/core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f..628bd46 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
>  	int nice_rlim = nice_to_rlimit(nice);
>  
>  	return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
> +		(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
> +		uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
>  		capable(CAP_SYS_NICE));
>  }
>  
> @@ -4784,7 +4786,9 @@ static int __sched_setscheduler(struct task_struct *p,
>  	/*
>  	 * Allow unprivileged RT tasks to decrease priority:
>  	 */
> -	if (user && !capable(CAP_SYS_NICE)) {
> +	if (user && !(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
> +		uid_eq(current_euid(), GLOBAL_ROOT_UID)) &&
> +		!capable(CAP_SYS_NICE)) {
>  		if (fair_policy(policy)) {
>  			if (attr->sched_nice < task_nice(p) &&
>  			    !can_nice(p, attr->sched_nice))


I remember looking at this before.  I don't remember if I commented.

1) Having GLOBAL_ROOT_UID in a user namespace is A Bad Idea™.
   Definitely not something we should make special case for.
   That configuration is almost certainly a privilege escalation waiting
   to happen.

2) If I read the other thread correctly there was talk about setting the
   nice levels of processes in other containers.  Ouch!

   The only thing I can think that makes any sense at all is to allow
   setting the nice levels of the processes in your own container.

   I can totally see having a test to see if a process's credentials are
   in the caller's user namespace or a child of caller's user namespace
   and allowing admin level access if the caller has the appropriate
   caps in their user namespace.

   But in this case I don't see anything preventing the admin in a
   container from using the ordinary nice levels on a task.  You are
   unlocking the nice levels reserved for the system administrator
   for special occasions. I don't see how that makes any sense
   to do from inside a container.
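The test described in point 2 (is the target's user namespace the caller's own or one nested below it?) can be sketched as a walk up the parent chain. The struct below is a minimal, hypothetical stand-in for struct user_namespace, and the loop is modeled on the one in the kernel's in_userns() helper; may_adjust_priority() is an illustrative policy, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct user_namespace: parent link and depth. */
struct userns {
    const struct userns *parent;   /* NULL for the initial namespace */
    int level;                     /* 0 for the initial namespace */
};

/* True if `child` is `ancestor` itself or nested somewhere below it. */
static bool userns_within(const struct userns *ancestor,
                          const struct userns *child)
{
    const struct userns *ns = child;

    while (ns->level > ancestor->level)
        ns = ns->parent;
    return ns == ancestor;
}

/* Hypothetical policy: allow admin-level access only if the caller holds
 * the capability in its own namespace and the target's credentials live
 * in that namespace or a descendant of it. */
static bool may_adjust_priority(const struct userns *caller_ns,
                                bool caller_has_cap_in_own_ns,
                                const struct userns *target_ns)
{
    return caller_has_cap_in_own_ns && userns_within(caller_ns, target_ns);
}
```

Such a check only scopes which tasks can be touched; as noted above, it does not by itself justify unlocking the negative nice values reserved for the system administrator.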

The design goal of user namespaces (assuming a non-buggy kernel) is to
ensure user namespaces give a user no more privileges than the user had
before creating a user namespace.  In this case you are granting a user
who creates a user namespace the ability to change nice levels on all
processes in the system (limited to users whose uid happens to be
GLOBAL_ROOT_UID).  But still this is effectively a way to get
CAP_SYS_NICE back if it was dropped.

As a violation of security policy this change simply can not be allowed.
The entire idiom:  "ns_capable(__task_cred(p)->user_ns, ...)" is a check
that provides no security.

Eric

* Re: [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-21 21:27   ` Eric W. Biederman
@ 2019-11-22  1:45     ` Prakash Sangappa
  2020-01-08 21:23       ` prakash.sangappa
  0 siblings, 1 reply; 11+ messages in thread
From: Prakash Sangappa @ 2019-11-22  1:45 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-api, tglx, peterz, serge



On 11/21/19 1:27 PM, ebiederm@xmission.com wrote:
> Prakash Sangappa <prakash.sangappa@oracle.com> writes:
>
>> Allow CAP_SYS_NICE to take effect for processes having effective uid of a
>> root user from init namespace.
>>
>> Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
>> ---
>>   kernel/sched/core.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 7880f4f..628bd46 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
>>   	int nice_rlim = nice_to_rlimit(nice);
>>   
>>   	return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
>> +		(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
>> +		uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
>>   		capable(CAP_SYS_NICE));
>>   }
>>   
>> @@ -4784,7 +4786,9 @@ static int __sched_setscheduler(struct task_struct *p,
>>   	/*
>>   	 * Allow unprivileged RT tasks to decrease priority:
>>   	 */
>> -	if (user && !capable(CAP_SYS_NICE)) {
>> +	if (user && !(ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
>> +		uid_eq(current_euid(), GLOBAL_ROOT_UID)) &&
>> +		!capable(CAP_SYS_NICE)) {
>>   		if (fair_policy(policy)) {
>>   			if (attr->sched_nice < task_nice(p) &&
>>   			    !can_nice(p, attr->sched_nice))
>
> I remember looking at this before.  I don't remember if I commented.

Thanks for looking at this.

>
> 1) Having GLOBAL_ROOT_UID in a user namespace is A Bad Idea™.
>     Definitely not something we should make special case for.
>     That configuration is almost certainly a privilege escalation waiting
>     to happen.

Mapping root uid 0 (GLOBAL_ROOT_UID) from the init namespace into a user
namespace is allowed right now, so the proposal was to extend this by
allowing capabilities like CAP_SYS_NICE, which currently have no effect
there, to take effect.

I understand that encouraging the use of GLOBAL_ROOT_UID for this purpose
may not be a good idea.

We could look at other means of granting such capabilities to a user
namespace, such as a per-process /proc file like 'cap_map', as suggested
in the other thread. What do you think about this approach?

Only a privileged user in the init namespace would get to add an entry to
this file. We need to define whether this gets inherited by any nested
user namespaces that get created.
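One possible answer to the inheritance question, sketched under the assumption of a hypothetical per-namespace cap_map: a nested namespace never receives more than its ancestors were granted, so its effective map is the intersection of the maps along its parent chain. The struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-namespace record; the initial namespace has no parent. */
struct userns_capmap {
    const struct userns_capmap *parent;   /* NULL for the initial namespace */
    uint64_t cap_map;                     /* bits granted to this namespace */
};

/* Intersect cap_map values from `ns` up to, but not including, the initial
 * namespace. For the initial namespace itself the loop does not run and
 * all bits are returned, since its capabilities are real ones. */
static uint64_t effective_cap_map(const struct userns_capmap *ns)
{
    uint64_t eff = ~UINT64_C(0);

    for (; ns->parent != NULL; ns = ns->parent)
        eff &= ns->cap_map;
    return eff;
}
```

Intersection keeps the property that creating a further nested namespace cannot widen what the host admin granted.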



> 2) If I read the other thread correctly there was talk about setting the
>     nice levels of processes in other containers.  Ouch!

No, not in other containers. Only on processes within the container
which has this capability. The use case is to use it in a container with
a user namespace and a pid namespace, so no processes from other
containers should be visible. Should the necessary checks be added?


>
>     The only thing I can think that makes any sense at all is to allow
>     setting the nice levels of the processes in your own container.

Yes that is the intended use.

>
>     I can totally see having a test to see if a process's credentials are
>     in the caller's user namespace or a child of caller's user namespace
>     and allowing admin level access if the caller has the appropriate
>     caps in their user namespace.

Ok

>     But in this case I don't see anything preventing the admin in a
>     container from using the ordinary nice levels on a task.  You are
>     unlocking the nice levels reserved for the system administrator
>     for special occasions. I don't see how that makes any sense
>     to do from inside a container.

But this is what seems to be lacking. A container could have some 
critical processes running which need to run at a higher priority.

>
> The design goal of user namespaces (assuming a non-buggy kernel) is to
> ensure user namespaces give a user no more privileges than the user had
> before creating a user namespace.  In this case you are granting a user
> who creates a user namespace the ability to change nice levels on all
> processes in the system (limited to users whose uid happens to be
> GLOBAL_ROOT_UID).  But still this is effectively a way to get
> CAP_SYS_NICE back if it was dropped.

The privileges would be given only to users with the init namespace's root
uid (GLOBAL_ROOT_UID) inside the user namespace or, when GLOBAL_ROOT_UID is
not used, granted through the /proc mechanism mentioned above.

>
> As a violation of security policy this change simply can not be allowed.
> The entire idiom:  "ns_capable(__task_cred(p)->user_ns, ...)" is a check
> that provides no security.

If the effect of allowing such privileges inside a user namespace could be
controlled with cgroups, would it still be a concern?
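For reference, CPU bandwidth control in cgroup v2 is configured through cpu.max, whose value is "$MAX $PERIOD" in microseconds, with $MAX possibly the literal "max". The sketch below (a hypothetical helper) converts it to a CPU ceiling. One caveat worth checking before relying on it here: cpu.max throttles tasks in the fair (SCHED_OTHER) class, while SCHED_FIFO/SCHED_RR tasks are governed by the separate RT throttling settings.

```c
#include <stdio.h>
#include <string.h>

/* Convert a cpu.max value to a ceiling in CPUs, e.g. "50000 100000" -> 0.5.
 * Returns -1.0 for "max" (no limit) and -2.0 on parse errors. */
static double cpu_max_to_cpus(const char *s)
{
    char quota[32];
    unsigned long long q, period;

    if (sscanf(s, "%31s %llu", quota, &period) != 2 || period == 0)
        return -2.0;
    if (strcmp(quota, "max") == 0)
        return -1.0;
    if (sscanf(quota, "%llu", &q) != 1)
        return -2.0;
    return (double)q / (double)period;
}
```

For example, "200000 100000" allows the group up to two CPUs' worth of time per period.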

-Prakash
> Eric

* Re: [RESEND RFC PATCH 0/1] CAP_SYS_NICE inside user namespace
  2019-11-21 18:33     ` Enrico Weigelt, metux IT consult
@ 2019-11-22  1:54       ` Prakash Sangappa
  0 siblings, 0 replies; 11+ messages in thread
From: Prakash Sangappa @ 2019-11-22  1:54 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult, Jann Horn
  Cc: kernel list, Linux API, Eric W. Biederman, Thomas Gleixner,
	Peter Zijlstra, Serge E. Hallyn, Christian Brauner



On 11/21/19 10:33 AM, Enrico Weigelt, metux IT consult wrote:
> On 18.11.19 21:34, Prakash Sangappa wrote:
>
>> It is more the latter. The admin should be able to explicitly decide that
>> container A's workload is to be given priority over other containers.
> I guess you're talking about the host's admin, correct?

Yes. Specifically, the host's admin decides which container gets the
privilege to increase the priority of processes inside that container.

>
> Shouldn't this already be possible by tweaking the container's cgroups?

I don't think so. The use case is that the admin/user inside the
container needs to be able to increase the priority of some of the critical
processes running in the container.

>
>
> --mtx
>



* Re: [RESEND RFC PATCH 1/1] Selectively allow CAP_SYS_NICE capability inside user namespaces
  2019-11-22  1:45     ` Prakash Sangappa
@ 2020-01-08 21:23       ` prakash.sangappa
  0 siblings, 0 replies; 11+ messages in thread
From: prakash.sangappa @ 2020-01-08 21:23 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-api, tglx, peterz, serge



On 11/21/2019 05:45 PM, Prakash Sangappa wrote:
>
>
> On 11/21/19 1:27 PM, ebiederm@xmission.com wrote:
>> Prakash Sangappa <prakash.sangappa@oracle.com> writes:
<..>
>> 2) If I read the other thread correctly there was talk about setting the
>>     nice levels of processes in other containers.  Ouch!
>
> No, not in other containers. Only on processes within the container
> which has this capability. The use case is to use it in a container
> with a user namespace and a pid namespace, so no processes from other
> containers should be visible. Should the necessary checks be added?
>
>
>>
>>     The only thing I can think that makes any sense at all is to allow
>>     setting the nice levels of the processes in your own container.
>
> Yes that is the intended use.
>
>>
>>     I can totally see having a test to see if a process's credentials are
>>     in the caller's user namespace or a child of caller's user namespace
>>     and allowing admin level access if the caller has the appropriate
>>     caps in their user namespace.
>
> Ok
>
>>     But in this case I don't see anything preventing the admin in a
>>     container from using the ordinary nice levels on a task. You are
>>     unlocking the nice levels reserved for the system administrator
>>     for special occasions. I don't see how that makes any sense
>>     to do from inside a container.
>
> But this is what seems to be lacking. A container could have some 
> critical processes running which need to run at a higher priority.

Any comments about this? What would be the recommended way to deal
with such a requirement?




end of thread, other threads:[~2020-01-08 21:25 UTC | newest]
