linux-kernel.vger.kernel.org archive mirror
* [RFC] memcg v1: provide read access to memory.pressure_level
@ 2023-03-22 14:25 Florian Schmidt
  2023-03-22 15:57 ` Michal Hocko
  2023-03-24 15:03 ` Michal Koutný
  0 siblings, 2 replies; 7+ messages in thread
From: Florian Schmidt @ 2023-03-22 14:25 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton
  Cc: Florian Schmidt, cgroups, linux-mm, linux-kernel

cgroups v1 has a unique way of setting up memory pressure notifications:
the user opens "memory.pressure_level" of the cgroup they want to
monitor for pressure, then opens "cgroup.event_control" and writes the
fd (among other things) to that file. memory.pressure_level has no other
use; specifically, it does not support any read or write operations.
Consequently, no handlers are provided, and the file ends up with
permissions 000. However, to actually use the mechanism, the subscribing
user must have read access to the file and open the fd for reading, see
memcg_write_event_control().

This is all fine as long as the subscribing process runs as root and is
otherwise unconfined by further restrictions. However, if you add strict
access controls such as SELinux, the permission bits will be enforced,
and opening memory.pressure_level for reading will fail, preventing the
process from subscribing, even as root.

There are several ways around this issue, but adding a dummy read
handler seems like the least invasive to me. I'd be interested to hear:
(a) do you think there is a less invasive way? Alternatively, we could
    add a flag in cftype in include/linux/cgroup-defs.h, but that seems
    more invasive for what is a legacy interface.
(b) would you be interested to take this patch, or is it too niche a fix
    for a legacy subsystem?
---
 mm/memcontrol.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..e48c749d9724 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 	}
 }
 
+/*
+ * This function doesn't do anything useful. Its only job is to provide a read
+ * handler so that the file gets read permissions when it's created.
+ */
+static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
+				     __always_unused void *v)
+{
+	return -EINVAL;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
@@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
 	},
 	{
 		.name = "pressure_level",
+		.seq_show = mem_cgroup_dummy_seq_show,
 	},
 #ifdef CONFIG_NUMA
 	{
-- 
2.32.0



* Re: [RFC] memcg v1: provide read access to memory.pressure_level
  2023-03-22 14:25 [RFC] memcg v1: provide read access to memory.pressure_level Florian Schmidt
@ 2023-03-22 15:57 ` Michal Hocko
  2023-03-22 16:00   ` Florian Schmidt
  2023-03-24 15:03 ` Michal Koutný
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2023-03-22 15:57 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, cgroups, linux-mm, linux-kernel

On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
> the user opens "memory.pressure_level" of the cgroup they want to
> monitor for pressure, then opens "cgroup.event_control" and writes the
> fd (among other things) to that file. memory.pressure_level has no other
> use; specifically, it does not support any read or write operations.
> Consequently, no handlers are provided, and the file ends up with
> permissions 000. However, to actually use the mechanism, the subscribing
> user must have read access to the file and open the fd for reading, see
> memcg_write_event_control().
> 
> This is all fine as long as the subscribing process runs as root and is
> otherwise unconfined by further restrictions. However, if you add strict
> access controls such as SELinux, the permission bits will be enforced,
> and opening memory.pressure_level for reading will fail, preventing the
> process from subscribing, even as root.
>
> 
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me.

I was struggling to see how that addresses the problem because all you
need is a read permission. But then I've looked into cgroup code and
learned that permissions are constructed based on available callbacks
(cgroup_file_mode). This would have made the review easier ;)
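
To make that connection concrete, here is a simplified userspace model of
the idea, not the kernel function itself. The struct and function names
(model_cftype, model_file_mode) are invented for this sketch, and the real
cgroup_file_mode() works on struct cftype and handles more cases than are
shown here. The point is only that a file with no read callback gets no
read bits, which is why a dummy seq_show changes the mode.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, pared-down stand-in for the kernel's struct cftype. */
struct model_cftype {
	bool has_read_handler;	/* models a read callback such as seq_show */
	bool has_write_handler;	/* models a write callback */
};

/* Derive the file mode purely from which callbacks are present. */
mode_t model_file_mode(const struct model_cftype *cft)
{
	mode_t mode = 0;

	if (cft->has_read_handler)
		mode |= S_IRUSR | S_IRGRP | S_IROTH;	/* 0444 */

	if (cft->has_write_handler)
		mode |= S_IWUSR;			/* 0200 */

	return mode;
}
```

In this model, pressure_level before the patch corresponds to both fields
being false (mode 000), and after the patch to has_read_handler being true
(mode 0444).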

I have no issue with the patch. It would be great to hear from cgroup
maintainers whether a concept of default permissions is something that
would be useful also for other files.

> I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
>     add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>     more invasive for what is a legacy interface.
> (b) would you be interested to take this patch, or is it too niche a fix
>     for a legacy subsystem?

After you add your s-o-b, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

If cgroup people find a concept of default permissions for a cgroup file
sound then this could be replaced by that approach but this is really an
easy workaround.
> ---
>  mm/memcontrol.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..e48c749d9724 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>  	}
>  }
>  
> +/*
> + * This function doesn't do anything useful. Its only job is to provide a read
> + * handler so that the file gets read permissions when it's created.

I would just reference cgroup_file_mode() in the comment to make our
lives easier and the comment more helpful.

> + */
> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
> +				     __always_unused void *v)
> +{
> +	return -EINVAL;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>  {
> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>  	},
>  	{
>  		.name = "pressure_level",
> +		.seq_show = mem_cgroup_dummy_seq_show,
>  	},
>  #ifdef CONFIG_NUMA
>  	{
> -- 
> 2.32.0

-- 
Michal Hocko
SUSE Labs


* Re: [RFC] memcg v1: provide read access to memory.pressure_level
  2023-03-22 15:57 ` Michal Hocko
@ 2023-03-22 16:00   ` Florian Schmidt
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Schmidt @ 2023-03-22 16:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, cgroups, linux-mm, linux-kernel



On 22/03/2023 15:57, Michal Hocko wrote:
> On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
>> the user opens "memory.pressure_level" of the cgroup they want to
>> monitor for pressure, then opens "cgroup.event_control" and writes the
>> fd (among other things) to that file. memory.pressure_level has no other
>> use; specifically, it does not support any read or write operations.
>> Consequently, no handlers are provided, and the file ends up with
>> permissions 000. However, to actually use the mechanism, the subscribing
>> user must have read access to the file and open the fd for reading, see
>> memcg_write_event_control().
>>
>> This is all fine as long as the subscribing process runs as root and is
>> otherwise unconfined by further restrictions. However, if you add strict
>> access controls such as SELinux, the permission bits will be enforced,
>> and opening memory.pressure_level for reading will fail, preventing the
>> process from subscribing, even as root.
>>
>>
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me.
> 
> I was struggling to see how that addresses the problem because all you
> need is a read permission. But then I've looked into cgroup code and
> learned that permissions are constructed based on available callbacks
> (cgroup_file_mode). This would have made the review easier ;)

Oh, sorry, I forgot to mention that salient detail!
I didn't check whether that was a common pattern or not...


> 
> I have no issue with the patch. It would be great to hear from cgroup
> maintainers whether a concept of default permissions is something that
> would be useful also for other files.
> 
>> I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>>      add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>>      more invasive for what is a legacy interface.
>> (b) would you be interested to take this patch, or is it too niche a fix
>>      for a legacy subsystem?
> 
> After you add your s-o-b, feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> If cgroup people find a concept of default permissions for a cgroup file
> sound then this could be replaced by that approach but this is really an
> easy workaround.

Will do. Once I know the path forward and have constructed a proper
commit message, I'll add the s-o-b and ack.

>> ---
>>   mm/memcontrol.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 5abffe6f8389..e48c749d9724 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>>   	}
>>   }
>>   
>> +/*
>> + * This function doesn't do anything useful. Its only job is to provide a read
>> + * handler so that the file gets read permissions when it's created.
> 
> I would just reference cgroup_file_mode() in the comment to make our
> lives easier and the comment more helpful.

Ack.


> 
>> + */
>> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
>> +				     __always_unused void *v)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>>   #ifdef CONFIG_MEMCG_KMEM
>>   static int memcg_online_kmem(struct mem_cgroup *memcg)
>>   {
>> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>>   	},
>>   	{
>>   		.name = "pressure_level",
>> +		.seq_show = mem_cgroup_dummy_seq_show,
>>   	},
>>   #ifdef CONFIG_NUMA
>>   	{
>> -- 
>> 2.32.0
> 


* Re: [RFC] memcg v1: provide read access to memory.pressure_level
  2023-03-22 14:25 [RFC] memcg v1: provide read access to memory.pressure_level Florian Schmidt
  2023-03-22 15:57 ` Michal Hocko
@ 2023-03-24 15:03 ` Michal Koutný
  2023-03-27 13:59   ` Florian Schmidt
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2023-03-24 15:03 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel


Hello.

On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
...
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me. I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
>     add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>     more invasive for what is a legacy interface.

You can (as a privileged user) modify file perms in userspace first (e.g.
chmod o+r memory.pressure_level) and then it can be used by non-privileged
users. (Or do LSMs prevent you from that too?)
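
For a setup tool written in C rather than shell, the chmod suggestion
could be sketched as below. This is only an illustration: add_other_read()
is a made-up helper name, and the privileged process calling it still needs
whatever rights the active LSM policy requires to change the mode.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: add read permission for "other" to path while
 * keeping the existing permission bits, like `chmod o+r path`.
 * Returns 0 on success, -1 on error (errno is set by stat or chmod).
 */
int add_other_read(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	return chmod(path, (st.st_mode & 07777) | S_IROTH);
}
```

It would be called once per cgroup, for example on the
memory.pressure_level file of each cgroup to be monitored, before the
unprivileged subscriber starts.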

> (b) would you be interested to take this patch, or is it too niche a fix
>     for a legacy subsystem?

I'd rather not extend this "unique way" with additionally unique dummy
helpers.

My 0.02 €,
Michal



* Re: [RFC] memcg v1: provide read access to memory.pressure_level
  2023-03-24 15:03 ` Michal Koutný
@ 2023-03-27 13:59   ` Florian Schmidt
  2023-03-27 20:40     ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Schmidt @ 2023-03-27 13:59 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel

Hi Michal,

On 24/03/2023 15:03, Michal Koutný wrote:
> On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
> ...
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me. I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>>      add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>>      more invasive for what is a legacy interface.
> 
> You can (as privileged user) modify file perms in userspace first (e.g.
> chmod o+r memory.pressure_level) and then it can be used by non-privileged
> users. (Or do LSMs prevent you from that too?)

That's true, we can work around this in userspace (though it means you 
need to give the process additional permissions, to change file 
permissions on top of just reading and writing).

Though considering that the memcg_write_event_control() explicitly 
checks whether the caller has read permissions on pressure_level, it 
felt sensible to me that the file would be created with read permissions 
in the first place, just like all the other files are created with 
permissions that are suitable for their immediate use without having to 
manually change permissions. The current implementation feels 
inconsistent in that way.


>> (b) would you be interested to take this patch, or is it too niche a fix
>>      for a legacy subsystem?
> 
> I'd rather not extend this "unique way" with additionally unique dummy
> helpers.

I understand that this is all code that has no modern user any more, 
which is why I tried to keep the fix as self-contained as possible.
Another option would be to have a special handler in cgroup_file_mode(), 
but that feels a lot klunkier to me, and leaks a v1-specific behaviour 
into the shared cgroup code.


Cheers,
Florian


* Re: [RFC] memcg v1: provide read access to memory.pressure_level
  2023-03-27 13:59   ` Florian Schmidt
@ 2023-03-27 20:40     ` Michal Hocko
  2023-04-04  8:44       ` Florian Schmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2023-03-27 20:40 UTC (permalink / raw)
  To: Florian Schmidt
  Cc: Michal Koutný,
	Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, cgroups, linux-mm, linux-kernel

On Mon 27-03-23 14:59:37, Florian Schmidt wrote:
> Hi Michal,
> 
> On 24/03/2023 15:03, Michal Koutný wrote:
> > On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
[...]
> > > (b) would you be interested to take this patch, or is it too niche a fix
> > >      for a legacy subsystem?
> > 
> > I'd rather not extend this "unique way" with additionally unique dummy
> > helpers.
> 
> I understand that this is all code that has no modern user any more, which
> is why I tried to keep the fix as self-contained as possible.
> Another option would be to have a special handler in cgroup_file_mode(), but
> that feels a lot klunkier to me, and leaks a v1-specific behaviour into the
> shared cgroup code.

Yes, this is effectively a deprecated interface, but I do agree that we
shouldn't make users' lives more complicated than necessary. If the
simplest solution to address this is to provide an empty callback, then
so be it. I am not sure, but I do not think there are other cgroup
interfaces that would warrant a more generic solution.

-- 
Michal Hocko
SUSE Labs


* Re: [RFC] memcg v1: provide read access to memory.pressure_level
  2023-03-27 20:40     ` Michal Hocko
@ 2023-04-04  8:44       ` Florian Schmidt
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Schmidt @ 2023-04-04  8:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michal Koutný,
	Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, cgroups, linux-mm, linux-kernel

Hi all,

to summarise, I've heard generally positive feedback from Michal H and 
some more reserved, but not fundamentally opposed feedback from Michal 
K. Thanks to both of you.

Since there's been no other feedback for the last few days, I'll raise a 
proper patch, and any potential further discussion can then be done on that.

Cheers,
Florian


