linux-doc.vger.kernel.org archive mirror
* [PATCH] Documentation: Clarify usage of memory limits
@ 2023-06-01 18:38 Dan Schatzberg
  2023-06-01 19:15 ` Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Dan Schatzberg @ 2023-06-01 18:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Down, Zefan Li, Johannes Weiner, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

The existing documentation refers to memory.high as the "main mechanism
to control memory usage." This seems incorrect to me - memory.high can
result in reclaim pressure which simply leads to stalls unless some
external component observes and actions on it (e.g. systemd-oomd can be
used for this purpose). While this is feasible, users are unaware of
this interaction and are led to believe that memory.high alone is an
effective mechanism for limiting memory.

The documentation should recommend the use of memory.max as the
effective way to enforce memory limits - it triggers reclaim and results
in OOM kills by itself.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..e592a9364473 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1213,23 +1213,25 @@ PAGE_SIZE multiple when read back.
 	A read-write single value file which exists on non-root
 	cgroups.  The default is "max".
 
-	Memory usage throttle limit.  This is the main mechanism to
-	control memory usage of a cgroup.  If a cgroup's usage goes
+	Memory usage throttle limit.  If a cgroup's usage goes
 	over the high boundary, the processes of the cgroup are
 	throttled and put under heavy reclaim pressure.
 
 	Going over the high limit never invokes the OOM killer and
-	under extreme conditions the limit may be breached.
+	under extreme conditions the limit may be breached. The high
+	limit should be used in scenarios where an external process
+	monitors the limited cgroup to alleviate heavy reclaim
+	pressure.
 
   memory.max
 	A read-write single value file which exists on non-root
 	cgroups.  The default is "max".
 
-	Memory usage hard limit.  This is the final protection
-	mechanism.  If a cgroup's memory usage reaches this limit and
-	can't be reduced, the OOM killer is invoked in the cgroup.
-	Under certain circumstances, the usage may go over the limit
-	temporarily.
+	Memory usage hard limit.  This is the main mechanism to limit
+	memory usage of a cgroup.  If a cgroup's memory usage reaches
+	this limit and can't be reduced, the OOM killer is invoked in
+	the cgroup. Under certain circumstances, the usage may go
+	over the limit temporarily.
 
 	In default configuration regular 0-order allocations always
 	succeed unless OOM killer chooses current task as a victim.
@@ -1238,10 +1240,6 @@ PAGE_SIZE multiple when read back.
 	Caller could retry them differently, return into userspace
 	as -ENOMEM or silently ignore in cases like disk readahead.
 
-	This is the ultimate protection mechanism.  As long as the
-	high limit is used and monitored properly, this limit's
-	utility is limited to providing the final safety net.
-
   memory.reclaim
 	A write-only nested-keyed file which exists for all cgroups.
 
-- 
2.34.1
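The patch above makes memory.max the self-sufficient limit and reserves memory.high for setups where something watches the cgroup. As a minimal Python sketch, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a hypothetical cgroup named `example.slice`, the following writes a limit and reads the `memory.events` counters. The `oom_kill` counter there shows whether the kernel has enforced memory.max by killing a task. The helper names are illustrative and are not part of any existing tool.

```python
from pathlib import Path
from typing import Dict, Union

# Only the two limits discussed in this patch are accepted here.
LIMIT_FILES = ("memory.max", "memory.high")


def set_memory_limit(cgroup: Path, knob: str, limit: Union[int, str]) -> None:
    """Write a byte count, or the literal "max" (no limit), to memory.max/high."""
    if knob not in LIMIT_FILES:
        raise ValueError(f"unsupported limit file: {knob}")
    value = "max" if limit == "max" else str(int(limit))
    # The kernel rounds byte values to a PAGE_SIZE multiple when read back.
    (cgroup / knob).write_text(value + "\n")


def read_memory_events(cgroup: Path) -> Dict[str, int]:
    """Parse the flat-keyed memory.events file ("key count" per line)."""
    events: Dict[str, int] = {}
    for line in (cgroup / "memory.events").read_text().splitlines():
        key, _, count = line.partition(" ")
        if key:
            events[key] = int(count)
    return events
```

For example, `set_memory_limit(Path("/sys/fs/cgroup/example.slice"), "memory.max", 512 * 1024 * 1024)` sets a 512 MiB hard limit, and a later `read_memory_events(...)["oom_kill"]` reports how many tasks the kernel has killed to enforce it. Setting memory.high in addition only makes sense, per the patch, when an external process monitors the cgroup.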



* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-01 18:38 [PATCH] Documentation: Clarify usage of memory limits Dan Schatzberg
@ 2023-06-01 19:15 ` Waiman Long
  2023-06-01 19:53   ` Johannes Weiner
  2023-06-01 19:36 ` Johannes Weiner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2023-06-01 19:15 UTC (permalink / raw)
  To: Dan Schatzberg, Tejun Heo
  Cc: Chris Down, Zefan Li, Johannes Weiner, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

On 6/1/23 14:38, Dan Schatzberg wrote:
> The existing documentation refers to memory.high as the "main mechanism
> to control memory usage." This seems incorrect to me - memory.high can
> result in reclaim pressure which simply leads to stalls unless some
> external component observes and actions on it (e.g. systemd-oomd can be
> used for this purpose). While this is feasible, users are unaware of
> this interaction and are led to believe that memory.high alone is an
> effective mechanism for limiting memory.
>
> The documentation should recommend the use of memory.max as the
> effective way to enforce memory limits - it triggers reclaim and results
> in OOM kills by itself.

That does not match my understanding of how memory.high works. When memory
usage goes past memory.high, memory reclaim is initiated to bring usage back
down. Stalls happen when memory usage keeps increasing, e.g. when memory is
consumed faster than reclaim can recover it. When memory.max is reached, the
OOM killer then kills off the tasks.

IOW, memory consumption should not go past memory.high in normal usage
scenarios. I believe what you describe here isn't quite correct.

Cheers,
Longman

> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> ---
>   Documentation/admin-guide/cgroup-v2.rst | 22 ++++++++++------------
>   1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index f67c0829350b..e592a9364473 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1213,23 +1213,25 @@ PAGE_SIZE multiple when read back.
>   	A read-write single value file which exists on non-root
>   	cgroups.  The default is "max".
>   
> -	Memory usage throttle limit.  This is the main mechanism to
> -	control memory usage of a cgroup.  If a cgroup's usage goes
> +	Memory usage throttle limit.  If a cgroup's usage goes
>   	over the high boundary, the processes of the cgroup are
>   	throttled and put under heavy reclaim pressure.
>   
>   	Going over the high limit never invokes the OOM killer and
> -	under extreme conditions the limit may be breached.
> +	under extreme conditions the limit may be breached. The high
> +	limit should be used in scenarios where an external process
> +	monitors the limited cgroup to alleviate heavy reclaim
> +	pressure.
>   
>     memory.max
>   	A read-write single value file which exists on non-root
>   	cgroups.  The default is "max".
>   
> -	Memory usage hard limit.  This is the final protection
> -	mechanism.  If a cgroup's memory usage reaches this limit and
> -	can't be reduced, the OOM killer is invoked in the cgroup.
> -	Under certain circumstances, the usage may go over the limit
> -	temporarily.
> +	Memory usage hard limit.  This is the main mechanism to limit
> +	memory usage of a cgroup.  If a cgroup's memory usage reaches
> +	this limit and can't be reduced, the OOM killer is invoked in
> +	the cgroup. Under certain circumstances, the usage may go
> +	over the limit temporarily.
>   
>   	In default configuration regular 0-order allocations always
>   	succeed unless OOM killer chooses current task as a victim.
> @@ -1238,10 +1240,6 @@ PAGE_SIZE multiple when read back.
>   	Caller could retry them differently, return into userspace
>   	as -ENOMEM or silently ignore in cases like disk readahead.
>   
> -	This is the ultimate protection mechanism.  As long as the
> -	high limit is used and monitored properly, this limit's
> -	utility is limited to providing the final safety net.
> -
>     memory.reclaim
>   	A write-only nested-keyed file which exists for all cgroups.
>   



* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-01 18:38 [PATCH] Documentation: Clarify usage of memory limits Dan Schatzberg
  2023-06-01 19:15 ` Waiman Long
@ 2023-06-01 19:36 ` Johannes Weiner
  2023-06-03 21:33 ` Chris Down
  2023-06-06  0:09 ` Tejun Heo
  3 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2023-06-01 19:36 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Tejun Heo, Chris Down, Zefan Li, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

On Thu, Jun 01, 2023 at 11:38:19AM -0700, Dan Schatzberg wrote:
> The existing documentation refers to memory.high as the "main mechanism
> to control memory usage." This seems incorrect to me - memory.high can
> result in reclaim pressure which simply leads to stalls unless some
> external component observes and actions on it (e.g. systemd-oomd can be
> used for this purpose). While this is feasible, users are unaware of
> this interaction and are led to believe that memory.high alone is an
> effective mechanism for limiting memory.
> 
> The documentation should recommend the use of memory.max as the
> effective way to enforce memory limits - it triggers reclaim and results
> in OOM kills by itself.
> 
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

Yeah, this is quite stale. How this ended up working in practice is a
bit different from how we initially conceived it.

Thanks for updating it.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-01 19:15 ` Waiman Long
@ 2023-06-01 19:53   ` Johannes Weiner
  2023-06-02  0:09     ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2023-06-01 19:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Dan Schatzberg, Tejun Heo, Chris Down, Zefan Li, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

On Thu, Jun 01, 2023 at 03:15:28PM -0400, Waiman Long wrote:
> On 6/1/23 14:38, Dan Schatzberg wrote:
> > The existing documentation refers to memory.high as the "main mechanism
> > to control memory usage." This seems incorrect to me - memory.high can
> > result in reclaim pressure which simply leads to stalls unless some
> > external component observes and actions on it (e.g. systemd-oomd can be
> > used for this purpose). While this is feasible, users are unaware of
> > this interaction and are led to believe that memory.high alone is an
> > effective mechanism for limiting memory.
> > 
> > The documentation should recommend the use of memory.max as the
> > effective way to enforce memory limits - it triggers reclaim and results
> > in OOM kills by itself.
> 
> That does not match my understanding of how memory.high works. When memory
> usage goes past memory.high, memory reclaim is initiated to bring usage back
> down. Stalls happen when memory usage keeps increasing, e.g. when memory is
> consumed faster than reclaim can recover it. When memory.max is reached, the
> OOM killer then kills off the tasks.

This was the initial plan indeed: Slow down the workload and thus slow
the growth; hope that the workload recovers with voluntary frees; set
memory.max as a safety if it keeps going beyond.

This never panned out. Once workloads are stuck, they might not back
down on their own. By increasingly slowing growth, it becomes harder
and harder for them to reach the memory.max intervention point.

It's a very brittle configuration strategy. Unless you very carefully
calibrate memory.high and memory.max together with awareness of the
throttling algorithm, workloads that hit memory.high will just go to
sleep indefinitely. They require outside intervention that either
adjusts limits or implements kill policies based on observed sleeps
(they're reported as pressure via psi).

So the common use cases today end up being that memory.max is for
enforcing kernel OOM kills, and memory.high is a tool to implement
userspace OOM killing policies.

Dan is right to point out the additional expectations for userspace
management when memory.high is in use. And memory.max is still the
primary, works-out-of-the-box method of memory containment.
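The userspace side described here (memory.high throttles, PSI reports the resulting sleeps, a daemon decides whether to kill) can be sketched as a single polling step. This is a minimal illustration, not how oomd or systemd-oomd actually decide. It assumes the cgroup v2 `memory.pressure` file format (`some`/`full` lines with `avg10`, `avg60`, `avg300` and `total` fields) and the `cgroup.kill` file available since Linux 5.14. The 40% `full avg10` threshold and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_pressure(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI lines such as 'full avg10=12.50 avg60=3.00 avg300=0.80 total=123'."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {}
        for field in fields:
            name, _, value = field.partition("=")
            result[kind][name] = float(value)
    return result


def should_kill(pressure: Dict[str, Dict[str, float]], threshold: float = 40.0) -> bool:
    """Hypothetical policy: all non-idle tasks stalled >= threshold% of the last 10s."""
    return pressure.get("full", {}).get("avg10", 0.0) >= threshold


def check_and_kill(cgroup: Path, threshold: float = 40.0) -> bool:
    """One polling step; a real daemon would loop and add hysteresis."""
    pressure = parse_pressure((cgroup / "memory.pressure").read_text())
    if should_kill(pressure, threshold):
        # Writing "1" to cgroup.kill sends SIGKILL to every process in the cgroup.
        (cgroup / "cgroup.kill").write_text("1\n")
        return True
    return False
```

The kill decision lives entirely in this loop because the kernel never invokes the OOM killer at memory.high. Without such a monitor, throttled tasks can simply keep sleeping.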


* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-01 19:53   ` Johannes Weiner
@ 2023-06-02  0:09     ` Waiman Long
  0 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2023-06-02  0:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Dan Schatzberg, Tejun Heo, Chris Down, Zefan Li, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

On 6/1/23 15:53, Johannes Weiner wrote:
> On Thu, Jun 01, 2023 at 03:15:28PM -0400, Waiman Long wrote:
>> On 6/1/23 14:38, Dan Schatzberg wrote:
>>> The existing documentation refers to memory.high as the "main mechanism
>>> to control memory usage." This seems incorrect to me - memory.high can
>>> result in reclaim pressure which simply leads to stalls unless some
>>> external component observes and actions on it (e.g. systemd-oomd can be
>>> used for this purpose). While this is feasible, users are unaware of
>>> this interaction and are led to believe that memory.high alone is an
>>> effective mechanism for limiting memory.
>>>
>>> The documentation should recommend the use of memory.max as the
>>> effective way to enforce memory limits - it triggers reclaim and results
>>> in OOM kills by itself.
>> That does not match my understanding of how memory.high works. When memory
>> usage goes past memory.high, memory reclaim is initiated to bring usage back
>> down. Stalls happen when memory usage keeps increasing, e.g. when memory is
>> consumed faster than reclaim can recover it. When memory.max is reached, the
>> OOM killer then kills off the tasks.
> This was the initial plan indeed: Slow down the workload and thus slow
> the growth; hope that the workload recovers with voluntary frees; set
> memory.max as a safety if it keeps going beyond.
>
> This never panned out. Once workloads are stuck, they might not back
> down on their own. By increasingly slowing growth, it becomes harder
> and harder for them to reach the memory.max intervention point.
>
> It's a very brittle configuration strategy. Unless you very carefully
> calibrate memory.high and memory.max together with awareness of the
> throttling algorithm, workloads that hit memory.high will just go to
> sleep indefinitely. They require outside intervention that either
> adjusts limits or implements kill policies based on observed sleeps
> (they're reported as pressure via psi).
>
> So the common use cases today end up being that memory.max is for
> enforcing kernel OOM kills, and memory.high is a tool to implement
> userspace OOM killing policies.
>
> Dan is right to point out the additional expectations for userspace
> management when memory.high is in use. And memory.max is still the
> primary, works-out-of-the-box method of memory containment.

Thanks for the clarification. I need to correct my false assumption.

Cheers,
Longman



* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-01 18:38 [PATCH] Documentation: Clarify usage of memory limits Dan Schatzberg
  2023-06-01 19:15 ` Waiman Long
  2023-06-01 19:36 ` Johannes Weiner
@ 2023-06-03 21:33 ` Chris Down
  2023-06-06  0:09 ` Tejun Heo
  3 siblings, 0 replies; 8+ messages in thread
From: Chris Down @ 2023-06-03 21:33 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

Dan Schatzberg writes:
>The existing documentation refers to memory.high as the "main mechanism
>to control memory usage." This seems incorrect to me - memory.high can
>result in reclaim pressure which simply leads to stalls unless some
>external component observes and actions on it (e.g. systemd-oomd can be
>used for this purpose). While this is feasible, users are unaware of
>this interaction and are led to believe that memory.high alone is an
>effective mechanism for limiting memory.
>
>The documentation should recommend the use of memory.max as the
>effective way to enforce memory limits - it triggers reclaim and results
>in OOM kills by itself.
>
>Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

Oof, the documentation is very out of date indeed -- no wonder people were 
confused by other advice to only use memory.high with something external 
monitoring the cgroup.

Thanks!

Acked-by: Chris Down <chris@chrisdown.name>

>---
> Documentation/admin-guide/cgroup-v2.rst | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
>diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>index f67c0829350b..e592a9364473 100644
>--- a/Documentation/admin-guide/cgroup-v2.rst
>+++ b/Documentation/admin-guide/cgroup-v2.rst
>@@ -1213,23 +1213,25 @@ PAGE_SIZE multiple when read back.
> 	A read-write single value file which exists on non-root
> 	cgroups.  The default is "max".
>
>-	Memory usage throttle limit.  This is the main mechanism to
>-	control memory usage of a cgroup.  If a cgroup's usage goes
>+	Memory usage throttle limit.  If a cgroup's usage goes
> 	over the high boundary, the processes of the cgroup are
> 	throttled and put under heavy reclaim pressure.
>
> 	Going over the high limit never invokes the OOM killer and
>-	under extreme conditions the limit may be breached.
>+	under extreme conditions the limit may be breached. The high
>+	limit should be used in scenarios where an external process
>+	monitors the limited cgroup to alleviate heavy reclaim
>+	pressure.
>
>   memory.max
> 	A read-write single value file which exists on non-root
> 	cgroups.  The default is "max".
>
>-	Memory usage hard limit.  This is the final protection
>-	mechanism.  If a cgroup's memory usage reaches this limit and
>-	can't be reduced, the OOM killer is invoked in the cgroup.
>-	Under certain circumstances, the usage may go over the limit
>-	temporarily.
>+	Memory usage hard limit.  This is the main mechanism to limit
>+	memory usage of a cgroup.  If a cgroup's memory usage reaches
>+	this limit and can't be reduced, the OOM killer is invoked in
>+	the cgroup. Under certain circumstances, the usage may go
>+	over the limit temporarily.
>
> 	In default configuration regular 0-order allocations always
> 	succeed unless OOM killer chooses current task as a victim.
>@@ -1238,10 +1240,6 @@ PAGE_SIZE multiple when read back.
> 	Caller could retry them differently, return into userspace
> 	as -ENOMEM or silently ignore in cases like disk readahead.
>
>-	This is the ultimate protection mechanism.  As long as the
>-	high limit is used and monitored properly, this limit's
>-	utility is limited to providing the final safety net.
>-
>   memory.reclaim
> 	A write-only nested-keyed file which exists for all cgroups.
>
>-- 
>2.34.1
>


* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-01 18:38 [PATCH] Documentation: Clarify usage of memory limits Dan Schatzberg
                   ` (2 preceding siblings ...)
  2023-06-03 21:33 ` Chris Down
@ 2023-06-06  0:09 ` Tejun Heo
  2023-06-06 13:09   ` Dan Schatzberg
  3 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2023-06-06  0:09 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Chris Down, Zefan Li, Johannes Weiner, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

Hello,

On Thu, Jun 01, 2023 at 11:38:19AM -0700, Dan Schatzberg wrote:
> The existing documentation refers to memory.high as the "main mechanism
> to control memory usage." This seems incorrect to me - memory.high can
> result in reclaim pressure which simply leads to stalls unless some
> external component observes and actions on it (e.g. systemd-oomd can be
> used for this purpose). While this is feasible, users are unaware of
> this interaction and are led to believe that memory.high alone is an
> effective mechanism for limiting memory.
> 
> The documentation should recommend the use of memory.max as the
> effective way to enforce memory limits - it triggers reclaim and results
> in OOM kills by itself.
> 
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

Applied to cgroup/for-6.4-fixes. Please see below for a comment tho.

> @@ -1213,23 +1213,25 @@ PAGE_SIZE multiple when read back.
>  	A read-write single value file which exists on non-root
>  	cgroups.  The default is "max".
>  
> -	Memory usage throttle limit.  This is the main mechanism to
> -	control memory usage of a cgroup.  If a cgroup's usage goes
> +	Memory usage throttle limit.  If a cgroup's usage goes
>  	over the high boundary, the processes of the cgroup are
>  	throttled and put under heavy reclaim pressure.
>  
>  	Going over the high limit never invokes the OOM killer and
> -	under extreme conditions the limit may be breached.
> +	under extreme conditions the limit may be breached. The high
> +	limit should be used in scenarios where an external process
> +	monitors the limited cgroup to alleviate heavy reclaim
> +	pressure.

I think it'd be helpful to provide pointers to oomd and systemd's
implementation of it here.

Thanks.

-- 
tejun


* Re: [PATCH] Documentation: Clarify usage of memory limits
  2023-06-06  0:09 ` Tejun Heo
@ 2023-06-06 13:09   ` Dan Schatzberg
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Schatzberg @ 2023-06-06 13:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Down, Zefan Li, Johannes Weiner, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list

On Mon, Jun 05, 2023 at 02:09:15PM -1000, Tejun Heo wrote:
> Hello,
>
> ...
> 
> I think it'd be helpful to provide pointers to oomd and systemd's
> implementation of it here.

Yeah, I considered that but didn't see any other external links in
this doc, so it felt out of place. I don't feel strongly, but feel
free to add to the patch and link to oomd
(https://github.com/facebookincubator/oomd) and systemd-oomd
(https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html)

