* [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-05 12:13 ` Ganesan Rajagopal
  0 siblings, 0 replies; 25+ messages in thread
From: Ganesan Rajagopal @ 2022-05-05 12:13 UTC (permalink / raw)
  To: hannes, mhocko, roman.gushchin, shakeelb; +Cc: cgroups, linux-mm, rganesan

v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
sysfs. This is missing for v2 memcg though "memory.current" is exported.
There is no other easy way of getting this information in Linux.
getrusage() returns ru_maxrss but that's the max RSS of a single process
instead of the aggregated max RSS of all the processes. Hence, expose
memcg->watermark as "memory.watermark" for v2 memcg.

Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
---
 mm/memcontrol.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 725f76723220..57ed07deff3e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6098,6 +6098,14 @@ static u64 memory_current_read(struct cgroup_subsys_state *css,
 	return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE;
 }
 
+static u64 memory_watermark_read(struct cgroup_subsys_state *css,
+				 struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	return (u64)memcg->memory.watermark * PAGE_SIZE;
+}
+
 static int memory_min_show(struct seq_file *m, void *v)
 {
 	return seq_puts_memcg_tunable(m,
@@ -6361,6 +6369,11 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_current_read,
 	},
+	{
+		.name = "watermark",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = memory_watermark_read,
+	},
 	{
 		.name = "min",
 		.flags = CFTYPE_NOT_ON_ROOT,
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-05 16:12   ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2022-05-05 16:12 UTC (permalink / raw)
  To: Ganesan Rajagopal
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Thu, May 5, 2022 at 5:13 AM Ganesan Rajagopal <rganesan@arista.com> wrote:
>
> v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in

*max_usage_in_bytes

> sysfs. This is missing for v2 memcg though "memory.current" is exported.
> There is no other easy way of getting this information in Linux.
> getrusage() returns ru_maxrss but that's the max RSS of a single process
> instead of the aggregated max RSS of all the processes. Hence, expose
> memcg->watermark as "memory.watermark" for v2 memcg.
>
> Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>

Can you please explain the use-case for which you need this metric?
Also note that this is not really an aggregated RSS of all the
processes in the cgroup. So, do you want max RSS or max charge and for
what use-case?


^ permalink raw reply	[flat|nested] 25+ messages in thread



* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-05 17:27     ` Ganesan Rajagopal
  0 siblings, 0 replies; 25+ messages in thread
From: Ganesan Rajagopal @ 2022-05-05 17:27 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Thu, May 5, 2022 at 9:42 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, May 5, 2022 at 5:13 AM Ganesan Rajagopal <rganesan@arista.com> wrote:
> >
> > v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
>
> *max_usage_in_bytes

Oops, thanks for the correction.

> > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > There is no other easy way of getting this information in Linux.
> > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > instead of the aggregated max RSS of all the processes. Hence, expose
> > memcg->watermark as "memory.watermark" for v2 memcg.
> >
> > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
>
> Can you please explain the use-case for which you need this metric?
> Also note that this is not really an aggregated RSS of all the
> processes in the cgroup. So, do you want max RSS or max charge and for
> what use-case?

We run a lot of automated tests when building our software and used to
run into OOM scenarios when the tests run unbounded. We use this metric
to heuristically limit how many tests can run in parallel using per test
historical data.

I understand this isn't really aggregated RSS, max charge works. We just
need some metric to account for the peak memory usage.  We don't need
it to be super accurate because there's significant variance between test
runs anyway. We conservatively use the historical max to limit parallelism.

Since this metric is not exposed in v2 memcg, the only alternative is to
poll "memory.current" which would be quite inefficient and grossly
inaccurate.

Ganesan


^ permalink raw reply	[flat|nested] 25+ messages in thread
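[Editor's note: a minimal sketch of the consumer side Ganesan describes. A test harness would read the watermark once, after the cgroup's processes exit, instead of polling "memory.current". The path and file name are illustrative; the interface name was still under discussion in this thread.]

```c
#include <inttypes.h>
#include <stdio.h>

/* Read a single u64 from a cgroup control file such as the proposed
 * /sys/fs/cgroup/<group>/memory.watermark.
 * Returns 0 on success, -1 on error. */
int read_cgroup_u64(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```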


* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 14:56   ` Johannes Weiner
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Weiner @ 2022-05-06 14:56 UTC (permalink / raw)
  To: Ganesan Rajagopal; +Cc: mhocko, roman.gushchin, shakeelb, cgroups, linux-mm

On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
> sysfs. This is missing for v2 memcg though "memory.current" is exported.
> There is no other easy way of getting this information in Linux.
> getrusage() returns ru_maxrss but that's the max RSS of a single process
> instead of the aggregated max RSS of all the processes. Hence, expose
> memcg->watermark as "memory.watermark" for v2 memcg.
> 
> Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>

This wasn't initially added to cgroup2 because its usefulness is very
specific: it's (mostly) useless on limited cgroups, on long-running
cgroups, and on cgroups that are recycled for multiple jobs. And I
expect these categories apply to the majority of cgroup usecases.

However, for the situation where you want to measure the footprint of
a short-lived, unlimited one-off cgroup, there really is no good
alternative. And it's a legitimate usecase. It doesn't cost much to
maintain this info. So I think we should go ahead with this patch.

But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.


^ permalink raw reply	[flat|nested] 25+ messages in thread


* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 15:04     ` Ganesan Rajagopal
  0 siblings, 0 replies; 25+ messages in thread
From: Ganesan Rajagopal @ 2022-05-06 15:04 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: mhocko, roman.gushchin, shakeelb, cgroups, linux-mm

On Fri, May 6, 2022 at 8:27 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
> > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > There is no other easy way of getting this information in Linux.
> > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > instead of the aggregated max RSS of all the processes. Hence, expose
> > memcg->watermark as "memory.watermark" for v2 memcg.
> >
> > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
>
> This wasn't initially added to cgroup2 because its usefulness is very
> specific: it's (mostly) useless on limited cgroups, on long-running
> cgroups, and on cgroups that are recycled for multiple jobs. And I
> expect these categories apply to the majority of cgroup usecases.
>
> However, for the situation where you want to measure the footprint of
> a short-lived, unlimited one-off cgroup, there really is no good
> alternative. And it's a legitimate usecase. It doesn't cost much to
> maintain this info. So I think we should go ahead with this patch.
>
> But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.

Thank you for the review. I'll refresh the patch with the documentation blurb.

Ganesan


^ permalink raw reply	[flat|nested] 25+ messages in thread


* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 15:56     ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2022-05-06 15:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Ganesan Rajagopal, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Fri, May 6, 2022 at 7:57 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
> > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > There is no other easy way of getting this information in Linux.
> > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > instead of the aggregated max RSS of all the processes. Hence, expose
> > memcg->watermark as "memory.watermark" for v2 memcg.
> >
> > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
>
> This wasn't initially added to cgroup2 because its usefulness is very
> specific: it's (mostly) useless on limited cgroups, on long-running
> cgroups, and on cgroups that are recycled for multiple jobs. And I
> expect these categories apply to the majority of cgroup usecases.
>
> However, for the situation where you want to measure the footprint of
> a short-lived, unlimited one-off cgroup, there really is no good
> alternative. And it's a legitimate usecase. It doesn't cost much to
> maintain this info. So I think we should go ahead with this patch.
>
> But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.

No objection from me. I do have two points: (1) watermark is not a
good name for this interface, maybe max_usage or something. (2) a way
to reset (i.e. write to it, reset it).


^ permalink raw reply	[flat|nested] 25+ messages in thread


* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 15:58       ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2022-05-06 15:58 UTC (permalink / raw)
  To: Ganesan Rajagopal
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Thu, May 5, 2022 at 10:27 AM Ganesan Rajagopal <rganesan@arista.com> wrote:
>
> On Thu, May 5, 2022 at 9:42 PM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Thu, May 5, 2022 at 5:13 AM Ganesan Rajagopal <rganesan@arista.com> wrote:
> > >
> > > v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
> >
> > *max_usage_in_bytes
>
> Oops, thanks for the correction.
>
> > > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > > There is no other easy way of getting this information in Linux.
> > > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > > instead of the aggregated max RSS of all the processes. Hence, expose
> > > memcg->watermark as "memory.watermark" for v2 memcg.
> > >
> > > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
> >
> > Can you please explain the use-case for which you need this metric?
> > Also note that this is not really an aggregated RSS of all the
> > processes in the cgroup. So, do you want max RSS or max charge and for
> > what use-case?
>
> We run a lot of automated tests when building our software and used to
> run into OOM scenarios when the tests run unbounded. We use this metric
> to heuristically limit how many tests can run in parallel using per test
> historical data.
>
> I understand this isn't really aggregated RSS, max charge works. We just
> need some metric to account for the peak memory usage.  We don't need
> it to be super accurate because there's significant variance between test
> runs anyway. We conservatively use the historical max to limit parallelism.
>
> Since this metric is not exposed in v2 memcg, the only alternative is to
> poll "memory.current" which would be quite inefficient and grossly
> inaccurate.
>

Oh also include the details you explained here in your commit message.


^ permalink raw reply	[flat|nested] 25+ messages in thread


* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 16:43       ` Ganesan Rajagopal
  0 siblings, 0 replies; 25+ messages in thread
From: Ganesan Rajagopal @ 2022-05-06 16:43 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Fri, May 6, 2022 at 9:26 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, May 6, 2022 at 7:57 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > > v1 memcg exports memcg->watermark as "memory.mem_usage_in_bytes" in
> > > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > > There is no other easy way of getting this information in Linux.
> > > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > > instead of the aggregated max RSS of all the processes. Hence, expose
> > > memcg->watermark as "memory.watermark" for v2 memcg.
> > >
> > > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
> >
> > This wasn't initially added to cgroup2 because its usefulness is very
> > specific: it's (mostly) useless on limited cgroups, on long-running
> > cgroups, and on cgroups that are recycled for multiple jobs. And I
> > expect these categories apply to the majority of cgroup usecases.
> >
> > However, for the situation where you want to measure the footprint of
> > a short-lived, unlimited one-off cgroup, there really is no good
> > alternative. And it's a legitimate usecase. It doesn't cost much to
> > maintain this info. So I think we should go ahead with this patch.
> >
> > But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.
>
> No objection from me. I do have two points: (1) watermark is not a
> good name for this interface, maybe max_usage or something. (2) a way
> to reset (i.e. write to it, reset it).

Thanks for the feedback. I'll post an updated patch with your suggestions
(including updating the description). For resetting the value I assume
setting it to memory.current would be logical?

Ganesan


^ permalink raw reply	[flat|nested] 25+ messages in thread
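[Editor's note: the reset semantics Ganesan proposes ("setting it to memory.current") can be sketched with a toy userspace model. This is illustrative only; the kernel's page_counter tracks pages with atomics, and v1's max_usage_in_bytes reset path behaves analogously via page_counter_reset_watermark().]

```c
#include <stdint.h>

/* Toy model of a charge counter with a high-watermark, where a reset
 * snaps the watermark back to current usage. Hypothetical names. */
struct toy_counter {
	uint64_t usage;
	uint64_t watermark;
};

static void toy_charge(struct toy_counter *c, uint64_t nbytes)
{
	c->usage += nbytes;
	if (c->usage > c->watermark)
		c->watermark = c->usage;
}

static void toy_uncharge(struct toy_counter *c, uint64_t nbytes)
{
	c->usage -= nbytes;
}

/* Reset: forget the old peak; start tracking from current usage. */
static void toy_reset_watermark(struct toy_counter *c)
{
	c->watermark = c->usage;
}
```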


* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 16:45         ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2022-05-06 16:45 UTC (permalink / raw)
  To: Ganesan Rajagopal
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Fri, May 6, 2022 at 9:44 AM Ganesan Rajagopal <rganesan@arista.com> wrote:
>
> On Fri, May 6, 2022 at 9:26 PM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Fri, May 6, 2022 at 7:57 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > > > v1 memcg exports memcg->watermark as "memory.max_usage_in_bytes" in
> > > > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > > > There is no other easy way of getting this information in Linux.
> > > > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > > > instead of the aggregated max RSS of all the processes. Hence, expose
> > > > memcg->watermark as "memory.watermark" for v2 memcg.
> > > >
> > > > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
> > >
> > > This wasn't initially added to cgroup2 because its usefulness is very
> > > specific: it's (mostly) useless on limited cgroups, on long-running
> > > cgroups, and on cgroups that are recycled for multiple jobs. And I
> > > expect these categories apply to the majority of cgroup usecases.
> > >
> > > However, for the situation where you want to measure the footprint of
> > > a short-lived, unlimited one-off cgroup, there really is no good
> > > alternative. And it's a legitimate usecase. It doesn't cost much to
> > > maintain this info. So I think we should go ahead with this patch.
> > >
> > > But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.
> >
> > No objection from me. I do have two points: (1) watermark is not a
> > good name for this interface, maybe max_usage or something. (2) a way
> > to reset (i.e. write to it, reset it).
>
> Thanks for the feedback. I'll post an updated patch with your suggestions
> (including updating the description). For resetting the value I assume
> setting it to memory.current would be logical?
>

Yes, that is what v1 does, so no need to do anything different.




* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 16:58           ` Ganesan Rajagopal
  0 siblings, 0 replies; 25+ messages in thread
From: Ganesan Rajagopal @ 2022-05-06 16:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Fri, May 6, 2022 at 10:15 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, May 6, 2022 at 9:44 AM Ganesan Rajagopal <rganesan@arista.com> wrote:
> >
> > On Fri, May 6, 2022 at 9:26 PM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Fri, May 6, 2022 at 7:57 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > >
> > > > On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > > > > v1 memcg exports memcg->watermark as "memory.max_usage_in_bytes" in
> > > > > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > > > > There is no other easy way of getting this information in Linux.
> > > > > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > > > > instead of the aggregated max RSS of all the processes. Hence, expose
> > > > > memcg->watermark as "memory.watermark" for v2 memcg.
> > > > >
> > > > > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
> > > >
> > > > This wasn't initially added to cgroup2 because its usefulness is very
> > > > specific: it's (mostly) useless on limited cgroups, on long-running
> > > > cgroups, and on cgroups that are recycled for multiple jobs. And I
> > > > expect these categories apply to the majority of cgroup usecases.
> > > >
> > > > However, for the situation where you want to measure the footprint of
> > > > a short-lived, unlimited one-off cgroup, there really is no good
> > > > alternative. And it's a legitimate usecase. It doesn't cost much to
> > > > maintain this info. So I think we should go ahead with this patch.
> > > >
> > > > But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.
> > >
> > > No objection from me. I do have two points: (1) watermark is not a
> > > good name for this interface, maybe max_usage or something. (2) a way
> > > to reset (i.e. write to it, reset it).
> >
> > Thanks for the feedback. I'll post an updated patch with your suggestions
> > (including updating the description). For resetting the value I assume
> > setting it to memory.current would be logical?
> >
>
> Yes, that is what v1 does, so no need to do anything different.

Sounds good. Thanks.

Ganesan




* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 20:49       ` Johannes Weiner
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Weiner @ 2022-05-06 20:49 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Ganesan Rajagopal, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Fri, May 06, 2022 at 08:56:31AM -0700, Shakeel Butt wrote:
> On Fri, May 6, 2022 at 7:57 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > > v1 memcg exports memcg->watermark as "memory.max_usage_in_bytes" in
> > > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > > There is no other easy way of getting this information in Linux.
> > > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > > instead of the aggregated max RSS of all the processes. Hence, expose
> > > memcg->watermark as "memory.watermark" for v2 memcg.
> > >
> > > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
> >
> > This wasn't initially added to cgroup2 because its usefulness is very
> > specific: it's (mostly) useless on limited cgroups, on long-running
> > cgroups, and on cgroups that are recycled for multiple jobs. And I
> > expect these categories apply to the majority of cgroup usecases.
> >
> > However, for the situation where you want to measure the footprint of
> > a short-lived, unlimited one-off cgroup, there really is no good
> > alternative. And it's a legitimate usecase. It doesn't cost much to
> > maintain this info. So I think we should go ahead with this patch.
> >
> > But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.
> 
> No objection from me. I do have two points: (1) watermark is not a
> good name for this interface, maybe max_usage or something.

How about memory.peak? It'd be nice to avoid underscores.

> (2) a way to reset (i.e. write to it, reset it).

We used to have that with cgroup1, but it gets weird in modern cgroup
environments when there can be multiple consumers. One of them resets
the stat for their own purpose, now the others have no idea what
sample time frame they are looking at.

It'd be more robust if we just said "peak usage since birth of
cgroup". If you want to sample new work, create a new cgroup.




* Re: [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg
@ 2022-05-06 21:04         ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2022-05-06 21:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Ganesan Rajagopal, Michal Hocko, Roman Gushchin, Cgroups, Linux MM

On Fri, May 6, 2022 at 1:50 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, May 06, 2022 at 08:56:31AM -0700, Shakeel Butt wrote:
> > On Fri, May 6, 2022 at 7:57 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Thu, May 05, 2022 at 05:13:30AM -0700, Ganesan Rajagopal wrote:
> > > > v1 memcg exports memcg->watermark as "memory.max_usage_in_bytes" in
> > > > sysfs. This is missing for v2 memcg though "memory.current" is exported.
> > > > There is no other easy way of getting this information in Linux.
> > > > getrusage() returns ru_maxrss but that's the max RSS of a single process
> > > > instead of the aggregated max RSS of all the processes. Hence, expose
> > > > memcg->watermark as "memory.watermark" for v2 memcg.
> > > >
> > > > Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
> > >
> > > This wasn't initially added to cgroup2 because its usefulness is very
> > > specific: it's (mostly) useless on limited cgroups, on long-running
> > > cgroups, and on cgroups that are recycled for multiple jobs. And I
> > > expect these categories apply to the majority of cgroup usecases.
> > >
> > > However, for the situation where you want to measure the footprint of
> > > a short-lived, unlimited one-off cgroup, there really is no good
> > > alternative. And it's a legitimate usecase. It doesn't cost much to
> > > maintain this info. So I think we should go ahead with this patch.
> > >
> > > But please add a blurb to Documentation/admin-guide/cgroup-v2.rst.
> >
> > No objection from me. I do have two points: (1) watermark is not a
> > good name for this interface, maybe max_usage or something.
>
> How about memory.peak? It'd be nice to avoid underscores.
>
> > (2) a way to reset (i.e. write to it, reset it).
>
> We used to have that with cgroup1, but it gets weird in modern cgroup
> environments when there can be multiple consumers. One of them resets
> the stat for their own purpose, now the others have no idea what
> sample time frame they are looking at.
>
> It'd be more robust if we just said "peak usage since birth of
> cgroup". If you want to sample new work, create a new cgroup.

SGTM for both.




Thread overview (latest: 2022-05-06 21:04 UTC):
2022-05-05 12:13 [PATCH] mm/memcontrol: Export memcg->watermark via sysfs for v2 memcg Ganesan Rajagopal
2022-05-05 16:12 ` Shakeel Butt
2022-05-05 17:21   ` Ganesan Rajagopal
2022-05-05 17:27   ` Ganesan Rajagopal
2022-05-06 15:58     ` Shakeel Butt
2022-05-06 14:56 ` Johannes Weiner
2022-05-06 15:04   ` Ganesan Rajagopal
2022-05-06 15:56   ` Shakeel Butt
2022-05-06 16:43     ` Ganesan Rajagopal
2022-05-06 16:45       ` Shakeel Butt
2022-05-06 16:58         ` Ganesan Rajagopal
2022-05-06 20:49     ` Johannes Weiner
2022-05-06 21:04       ` Shakeel Butt
