linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/memcontrol: add memory.peak in cgroup root
@ 2023-02-21 14:34 Matthew Chae
  2023-02-21 15:07 ` Michal Hocko
  2023-02-23 14:20 ` Michal Koutný
  0 siblings, 2 replies; 8+ messages in thread
From: Matthew Chae @ 2023-02-21 14:34 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton
  Cc: kernel, christopher.wong, Matthew Chae, Muchun Song, cgroups,
	linux-mm, linux-kernel

The kernel currently provides no way to read the recorded peak memory
usage of the overall system. Only each slice's recorded peak is exposed
through its own memory.peak file, and the cgroup root has no memory.peak
at all.

Each slice may reach its peak memory usage at a different time, and that
peak is recorded in the slice's own memory.peak. Summing every memory.peak
therefore does not give the recorded system-wide peak: the system-wide
maximum can occur at a point in time where no individual slice is at its
own peak.

       time |  slice1  |  slice2  |   sum
      =======================================
        t1  |    50    |   200    |   250
      ---------------------------------------
        t2  |   150    |   150    |   300
      ---------------------------------------
        t3  |   180    |    20    |   200
      ---------------------------------------
        t4  |    80    |    20    |   100

Here memory.peak of slice1 is 180 and memory.peak of slice2 is 200. Only
these per-slice values are exposed; the overall system's peak memory usage
is not. Their sum, 380, does not represent the real peak memory usage of
the system either. The value we actually want is the 300 reached at t2, a
point at which neither slice is at its individual peak. Therefore a proper
way to expose the recorded system-wide peak memory usage is needed.

Hence, expose memory.peak in the cgroup root to allow this.
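
As a usage illustration (not part of this patch), the recorded peak of
the whole hierarchy could then be read back from userspace with a minimal
sketch like the one below, assuming cgroup2 is mounted at /sys/fs/cgroup:

  /* Minimal sketch: read back the root cgroup's recorded peak.
   * Assumes cgroup2 is mounted at /sys/fs/cgroup.
   */
  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/sys/fs/cgroup/memory.peak", "r");
          unsigned long long peak;

          if (!f || fscanf(f, "%llu", &peak) != 1) {
                  perror("memory.peak");
                  return 1;
          }
          fclose(f);
          printf("root memory.peak: %llu\n", peak);
          return 0;
  }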

Co-developed-by: Christopher Wong <christopher.wong@axis.com>
Signed-off-by: Christopher Wong <christopher.wong@axis.com>
Signed-off-by: Matthew Chae <matthew.chae@axis.com>
---
 mm/memcontrol.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 73afff8062f9..974fc044a7e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6646,7 +6646,6 @@ static struct cftype memory_files[] = {
 	},
 	{
 		.name = "peak",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_peak_read,
 	},
 	{
-- 
2.20.1



* Re: [PATCH] mm/memcontrol: add memory.peak in cgroup root
  2023-02-21 14:34 [PATCH] mm/memcontrol: add memory.peak in cgroup root Matthew Chae
@ 2023-02-21 15:07 ` Michal Hocko
       [not found]   ` <DB4PR02MB93344BAA949FA7E25E298C90FEA59@DB4PR02MB9334.eurprd02.prod.outlook.com>
  2023-02-23 14:20 ` Michal Koutný
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2023-02-21 15:07 UTC (permalink / raw)
  To: Matthew Chae
  Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Andrew Morton,
	kernel, christopher.wong, Muchun Song, cgroups, linux-mm,
	linux-kernel

On Tue 21-02-23 15:34:20, Matthew Chae wrote:
> The kernel currently provides no way to read the recorded peak memory
> usage of the overall system. Only each slice's recorded peak is exposed
> through its own memory.peak file, and the cgroup root has no memory.peak
> at all.
> 
> Each slice may reach its peak memory usage at a different time, and that
> peak is recorded in the slice's own memory.peak. Summing every memory.peak
> therefore does not give the recorded system-wide peak: the system-wide
> maximum can occur at a point in time where no individual slice is at its
> own peak.
> 
>        time |  slice1  |  slice2  |   sum
>       =======================================
>         t1  |    50    |   200    |   250
>       ---------------------------------------
>         t2  |   150    |   150    |   300
>       ---------------------------------------
>         t3  |   180    |    20    |   200
>       ---------------------------------------
>         t4  |    80    |    20    |   100
> 
> Here memory.peak of slice1 is 180 and memory.peak of slice2 is 200. Only
> these per-slice values are exposed; the overall system's peak memory usage
> is not. Their sum, 380, does not represent the real peak memory usage of
> the system either. The value we actually want is the 300 reached at t2, a
> point at which neither slice is at its individual peak. Therefore a proper
> way to expose the recorded system-wide peak memory usage is needed.

The problem I can see is that the root's peak value doesn't really
represent the system peak memory usage because it only reflects memcg
accounted memory. So there is plenty of memory consumption which is not
covered. On top of that a lot of memory contributed to the root memcg is
not accounted at all (see try_charge and its callers) so the cumulative
hierarchical value is incomplete and I believe misleading as well.
-- 
Michal Hocko
SUSE Labs


* Re: Sv: [PATCH] mm/memcontrol: add memory.peak in cgroup root
       [not found]   ` <DB4PR02MB93344BAA949FA7E25E298C90FEA59@DB4PR02MB9334.eurprd02.prod.outlook.com>
@ 2023-02-21 16:16     ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2023-02-21 16:16 UTC (permalink / raw)
  To: Christopher Wong
  Cc: Matthew Chae, Johannes Weiner, Roman Gushchin, Shakeel Butt,
	Andrew Morton, kernel, Muchun Song, cgroups, linux-mm,
	linux-kernel

On Tue 21-02-23 16:13:14, Christopher Wong wrote:
> Hi Michal,
> 
> Thanks for the quick response! I think we are just trying to
> get the same value that was available to us in cgroup v1's
> memory.max_usage_in_bytes. I guess this value is also incomplete
> for representing the system memory usage.

Correct.

> Is it due to this incompleteness that memory.peak has been left
> out of the root of cgroup v2?

I think so, but I do not remember 100%. You might want to look into the
email archives.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/memcontrol: add memory.peak in cgroup root
  2023-02-21 14:34 [PATCH] mm/memcontrol: add memory.peak in cgroup root Matthew Chae
  2023-02-21 15:07 ` Michal Hocko
@ 2023-02-23 14:20 ` Michal Koutný
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2023-02-23 14:20 UTC (permalink / raw)
  To: Matthew Chae
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Andrew Morton, kernel, christopher.wong, Muchun Song, cgroups,
	linux-mm, linux-kernel


Hello Matthew.

On Tue, Feb 21, 2023 at 03:34:20PM +0100, Matthew Chae <matthew.chae@axis.com> wrote:
> The kernel currently provides no way to read the recorded peak memory
> usage of the overall system. Only each slice's recorded peak is exposed
> through its own memory.peak file, and the cgroup root has no memory.peak
> at all.

The memory.peak value is useful as a calibration insight when you want
to configure a memcg limit.
But there is no global (memcg) limit on memory. So what would this (not
clearly defined) value be good for? Or how would it be better than
userspace sampling of a chosen available metric?

Thanks,
Michal



* Re: [PATCH] mm/memcontrol: add memory.peak in cgroup root
       [not found]       ` <df9cc4e1-befc-af10-c353-733bec54baf1@axis.com>
@ 2023-02-24 15:02         ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2023-02-24 15:02 UTC (permalink / raw)
  To: Matthew Chae
  Cc: Matthew Chae, Roman Gushchin, Michal Koutný,
	Johannes Weiner, Shakeel Butt, Andrew Morton, kernel,
	Christopher Wong, Muchun Song, cgroups, linux-mm, linux-kernel

On Fri 24-02-23 15:18:49, Matthew Chae wrote:
> Hi Michal
> 
> Thank you for helping me gain full insight.
> It looks like there is no proper way to get the recorded peak memory
> usage without adding overhead to the system and to all users. But I
> fully understand what you kindly explained. Basically, having little
> free memory left doesn't mean the system is in a bad situation, so
> checking the peak memory doesn't mean a lot and is not necessary.

You might find https://www.pdl.cmu.edu/ftp/NVM/tmo_asplos22.pdf
interesting and helpful
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/memcontrol: add memory.peak in cgroup root
       [not found]   ` <AM5PR0202MB2516BD45CFBC033F9EA3B0A4E1AB9@AM5PR0202MB2516.eurprd02.prod.outlook.com>
  2023-02-23 22:00     ` Roman Gushchin
@ 2023-02-24  8:23     ` Michal Hocko
       [not found]       ` <df9cc4e1-befc-af10-c353-733bec54baf1@axis.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2023-02-24  8:23 UTC (permalink / raw)
  To: Matthew Chae
  Cc: Roman Gushchin, Michal Koutný,
	Johannes Weiner, Shakeel Butt, Andrew Morton, kernel,
	Christopher Wong, Muchun Song, cgroups, linux-mm, linux-kernel

On Thu 23-02-23 19:00:57, Matthew Chae wrote:
> Hi Roman,
> 
> I'd like to get the peak memory usage recorded over the whole time, rather than at a certain point in time.

Sampling /proc/vmstat should have minimal overhead and you will get
not only a single value but also a breakdown into broad categories of
users (LRU, slab, page tables, etc.). Unfortunately this doesn't cover
all users (e.g. direct users of the page allocator are not accounted to
any specific counter) but it should give you a reasonable idea of how
memory is utilized. The specific metrics really depend on what you are
interested in.
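
A minimal userspace sketch of such sampling could look like the
following; the summed counters are only an illustrative subset of the
page-based /proc/vmstat counters, not a complete accounting:

  /* Rough sketch: sum a few page-based /proc/vmstat counters once per
   * second and keep the largest sum seen.  The counter selection is
   * illustrative only.
   */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  static const char *keys[] = {
          "nr_active_anon", "nr_inactive_anon",
          "nr_active_file", "nr_inactive_file",
          "nr_slab_reclaimable", "nr_slab_unreclaimable",
  };

  static unsigned long long sample_pages(void)
  {
          FILE *f = fopen("/proc/vmstat", "r");
          char name[64];
          unsigned long long val, sum = 0;
          size_t i;

          if (!f)
                  return 0;
          while (fscanf(f, "%63s %llu", name, &val) == 2)
                  for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
                          if (!strcmp(name, keys[i]))
                                  sum += val;
          fclose(f);
          return sum;
  }

  int main(void)
  {
          unsigned long long cur, peak = 0;
          long page_size = sysconf(_SC_PAGESIZE);

          for (;;) {
                  cur = sample_pages();
                  if (cur > peak)
                          peak = cur;
                  printf("now %llu bytes, peak %llu bytes\n",
                         cur * page_size, peak * page_size);
                  sleep(1);
          }
  }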

Another approach that might give you a different angle on memory
consumption is to watch the PSI metrics. This will not tell you the peak
memory usage but it will give you a useful cost model for memory usage.
Being low on free memory is not itself a bad thing, i.e. you are paying
for the amount of memory, so it would be rather sub-optimal not to use
all of it, right? If memory can be reclaimed easily (e.g. by dropping
idle caches) then the overhead of high memory utilization should be
reasonably low and the overall price of the reclaim is worth it. On the
other hand, an over-utilized system with a working set larger than the
available memory would spend a lot of time reclaiming, so performance
would drop.
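
For reference, the memory PSI numbers are exposed in
/proc/pressure/memory on kernels built with CONFIG_PSI; a minimal sketch
that reads the "some" avg10 value:

  /* Minimal sketch: report the "some" avg10 value from the memory PSI
   * file.  Requires a kernel built with CONFIG_PSI.
   */
  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/pressure/memory", "r");
          float avg10, avg60, avg300;
          unsigned long long total;

          if (!f || fscanf(f, "some avg10=%f avg60=%f avg300=%f total=%llu",
                           &avg10, &avg60, &avg300, &total) != 4) {
                  perror("/proc/pressure/memory");
                  return 1;
          }
          fclose(f);
          printf("memory pressure (some, avg10): %.2f%%\n", avg10);
          return 0;
  }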

All that being said, the primary question is what your use case is.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/memcontrol: add memory.peak in cgroup root
       [not found]   ` <AM5PR0202MB2516BD45CFBC033F9EA3B0A4E1AB9@AM5PR0202MB2516.eurprd02.prod.outlook.com>
@ 2023-02-23 22:00     ` Roman Gushchin
  2023-02-24  8:23     ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2023-02-23 22:00 UTC (permalink / raw)
  To: Matthew Chae
  Cc: Michal Koutný,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	kernel, Christopher Wong, Muchun Song, cgroups, linux-mm,
	linux-kernel

On Thu, Feb 23, 2023 at 07:00:57PM +0000, Matthew Chae wrote:
> Hi Roman,
> 
> I'd like to get the peak memory usage recorded over the whole time, rather than at a certain point in time.
> Plus, I expect that a systematic way might have better performance than userspace sampling.

I'm not necessarily saying to do this in userspace; you can try adding a new system-wide counter
(a new /proc/vmstat entry). Obviously, it might be easier to do this in userspace.
My point is to do it at the system level rather than the cgroup level, and to record the bottom
of free memory rather than the peak of used memory.
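
A minimal userspace sketch of that idea, using MemAvailable from
/proc/meminfo as an illustrative choice of "free" metric:

  /* Sketch of the "record the bottom of free memory" idea: sample
   * MemAvailable from /proc/meminfo once per second and remember the
   * lowest value ever seen.
   */
  #include <limits.h>
  #include <stdio.h>
  #include <unistd.h>

  static long long mem_available_kb(void)
  {
          FILE *f = fopen("/proc/meminfo", "r");
          char line[128];
          long long kb = -1;

          if (!f)
                  return -1;
          while (fgets(line, sizeof(line), f))
                  if (sscanf(line, "MemAvailable: %lld kB", &kb) == 1)
                          break;
          fclose(f);
          return kb;
  }

  int main(void)
  {
          long long cur, low = LLONG_MAX;

          for (;;) {
                  cur = mem_available_kb();
                  if (cur >= 0 && cur < low)
                          low = cur;
                  printf("MemAvailable now %lld kB, lowest seen %lld kB\n",
                         cur, low);
                  sleep(1);
          }
  }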

> If I understand correctly, recording the bottom of available free memory might not be helpful for this.
> Am I missing something?

Why?


* Re: [PATCH] mm/memcontrol: add memory.peak in cgroup root
       [not found] <AM5PR0202MB25167BFBBE892630A2EE3B7DE1AB9@AM5PR0202MB2516.eurprd02.prod.outlook.com>
@ 2023-02-23 17:30 ` Roman Gushchin
       [not found]   ` <AM5PR0202MB2516BD45CFBC033F9EA3B0A4E1AB9@AM5PR0202MB2516.eurprd02.prod.outlook.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2023-02-23 17:30 UTC (permalink / raw)
  To: Matthew Chae
  Cc: Michal Koutný,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	kernel, Christopher Wong, Muchun Song, cgroups, linux-mm,
	linux-kernel

On Thu, Feb 23, 2023 at 04:22:33PM +0000, Matthew Chae wrote:
> Hi Michal,
> 
> First off, thank you for sharing your opinion.
> I'd like to monitor the recorded peak memory usage of the overall system, or at least of cgroup-accounted memory, through memory.peak.
> But it looks like this is not relevant to what I wanted.
> It might be good to have some proper way of checking the system's recorded peak memory usage.

I guess you might want to do the opposite: instead of tracking the peak usage,
you can record the bottom of available free memory.

Thanks!


Thread overview: 8+ messages
2023-02-21 14:34 [PATCH] mm/memcontrol: add memory.peak in cgroup root Matthew Chae
2023-02-21 15:07 ` Michal Hocko
     [not found]   ` <DB4PR02MB93344BAA949FA7E25E298C90FEA59@DB4PR02MB9334.eurprd02.prod.outlook.com>
2023-02-21 16:16     ` Sv: " Michal Hocko
2023-02-23 14:20 ` Michal Koutný
     [not found] <AM5PR0202MB25167BFBBE892630A2EE3B7DE1AB9@AM5PR0202MB2516.eurprd02.prod.outlook.com>
2023-02-23 17:30 ` Roman Gushchin
     [not found]   ` <AM5PR0202MB2516BD45CFBC033F9EA3B0A4E1AB9@AM5PR0202MB2516.eurprd02.prod.outlook.com>
2023-02-23 22:00     ` Roman Gushchin
2023-02-24  8:23     ` Michal Hocko
     [not found]       ` <df9cc4e1-befc-af10-c353-733bec54baf1@axis.com>
2023-02-24 15:02         ` Michal Hocko
