linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: add config option to select the initial overcommit mode
@ 2016-05-10 11:56 Sebastian Frias
  2016-05-10 12:00 ` Fwd: " Sebastian Frias
                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-10 11:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Michal Hocko, Linus Torvalds; +Cc: LKML, mason

Currently the initial value of the overcommit mode is OVERCOMMIT_GUESS.
However, on embedded systems it is usually better to disable overcommit
to avoid waking up the OOM-killer and its well known undesirable
side-effects.

This config option allows to setup the initial overcommit mode to any of
the 3 available values, OVERCOMMIT_GUESS (which remains as default),
OVERCOMMIT_ALWAYS and OVERCOMMIT_NEVER.
The overcommit mode can still be changed thru sysctl after the system
boots up.

This config option depends on CONFIG_EXPERT.
This patch does not introduces functional changes.

Signed-off-by: Sebastian Frias <sf84@laposte.net>
---

NOTE: I understand that the overcommit mode can be changed dynamically thru
sysctl, but on embedded systems, where we know in advance that overcommit
will be disabled, there's no reason to postpone such setting.

I would also be interested in knowing if you guys think this option should
disable sysctl access for overcommit mode, essentially hardcoding the
overcommit mode when this option is used.

NOTE2: I tried to track down the history of overcommit but back then there
were no single patches apparently and the patch that appears to have
introduced the first overcommit mode (OVERCOMMIT_ALWAYS) is commit
9334eab8a36f ("Import 2.1.27"). OVERCOMMIT_NEVER was introduced with commit
502bff0685b2 ("[PATCH] strict overcommit").
My understanding is that prior to commit 9334eab8a36f ("Import 2.1.27")
there was no overcommit, is that correct?

NOTE3: checkpatch.pl is warning about missing description for the config
symbols ("please write a paragraph that describes the config symbol fully")
but my understanding is that that is a false positive (or the warning message
not clear enough for me to understand it) considering that I have added
'help' sections for each 'config' section.
---
 mm/Kconfig | 32 ++++++++++++++++++++++++++++++++
 mm/util.c  |  8 +++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index abb7dcf..6dad57d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -439,6 +439,38 @@ choice
 	  benefit.
 endchoice
 
+choice
+	prompt "Overcommit Mode"
+	default OVERCOMMIT_GUESS
+	depends on EXPERT
+	help
+	  Selects the initial value for Overcommit mode.
+
+	  NOTE: The overcommit mode can be changed dynamically through sysctl.
+
+	config OVERCOMMIT_GUESS
+		bool "Guess"
+	help
+	  Selecting this option forces the initial value of overcommit mode to
+	  "Guess" overcommits. This is the default value.
+	  See Documentation/vm/overcommit-accounting for more information.
+
+	config OVERCOMMIT_ALWAYS
+		bool "Always"
+	help
+	  Selecting this option forces the initial value of overcommit mode to
+	  "Always" overcommit.
+	  See Documentation/vm/overcommit-accounting for more information.
+
+	config OVERCOMMIT_NEVER
+		bool "Never"
+	help
+	  Selecting this option forces the initial value of overcommit mode to
+	  "Never" overcommit.
+	  See Documentation/vm/overcommit-accounting for more information.
+
+endchoice
+
 #
 # UP and nommu archs use km based percpu allocator
 #
diff --git a/mm/util.c b/mm/util.c
index 917e0e3..fd098bb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -418,7 +418,13 @@ int __page_mapcount(struct page *page)
 }
 EXPORT_SYMBOL_GPL(__page_mapcount);
 
-int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
+#if defined(CONFIG_OVERCOMMIT_NEVER)
+int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_NEVER;
+#elif defined(CONFIG_OVERCOMMIT_ALWAYS)
+int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_ALWAYS;
+#else
+int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
+#endif
 int sysctl_overcommit_ratio __read_mostly = 50;
 unsigned long sysctl_overcommit_kbytes __read_mostly;
 int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Fwd: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-10 11:56 [PATCH] mm: add config option to select the initial overcommit mode Sebastian Frias
@ 2016-05-10 12:00 ` Sebastian Frias
  2016-05-10 12:39   ` Andy Whitcroft
  2016-05-13  8:04 ` Michal Hocko
  2016-05-17  9:03 ` Mason
  2 siblings, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-10 12:00 UTC (permalink / raw)
  To: Andy Whitcroft, Joe Perches; +Cc: mason, LKML

[-- Attachment #1: Type: text/plain, Size: 568 bytes --]

Hi,

Using checkpatch.pl on the forwarded patch results in:

WARNING: please write a paragraph that describes the config symbol fully
#57: FILE: mm/Kconfig:451:
+       config OVERCOMMIT_GUESS

WARNING: please write a paragraph that describes the config symbol fully
#64: FILE: mm/Kconfig:458:
+       config OVERCOMMIT_ALWAYS

but there is a 'help' section for those 'config' sections.
NOTE: I followed the same indentation than the code laying just above the place where I inserted mine.

I think it is a false positive, what do you think?

Best regards,

Sebastian

[-- Attachment #2: [PATCH] mm: add config option to select the initial overcommit mode.eml --]
[-- Type: message/rfc822, Size: 4407 bytes --]

From: Sebastian Frias <sf84@laposte.net>
To: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,  Michal Hocko <mhocko@suse.com>, Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>, mason <slash.tmp@free.fr>
Subject: [PATCH] mm: add config option to select the initial overcommit mode
Date: Tue, 10 May 2016 13:56:30 +0200
Message-ID: <5731CC6E.3080807@laposte.net>

Currently the initial value of the overcommit mode is OVERCOMMIT_GUESS.
However, on embedded systems it is usually better to disable overcommit
to avoid waking up the OOM-killer and its well known undesirable
side-effects.

This config option allows to setup the initial overcommit mode to any of
the 3 available values, OVERCOMMIT_GUESS (which remains as default),
OVERCOMMIT_ALWAYS and OVERCOMMIT_NEVER.
The overcommit mode can still be changed thru sysctl after the system
boots up.

This config option depends on CONFIG_EXPERT.
This patch does not introduces functional changes.

Signed-off-by: Sebastian Frias <sf84@laposte.net>
---

NOTE: I understand that the overcommit mode can be changed dynamically thru
sysctl, but on embedded systems, where we know in advance that overcommit
will be disabled, there's no reason to postpone such setting.

I would also be interested in knowing if you guys think this option should
disable sysctl access for overcommit mode, essentially hardcoding the
overcommit mode when this option is used.

NOTE2: I tried to track down the history of overcommit but back then there
were no single patches apparently and the patch that appears to have
introduced the first overcommit mode (OVERCOMMIT_ALWAYS) is commit
9334eab8a36f ("Import 2.1.27"). OVERCOMMIT_NEVER was introduced with commit
502bff0685b2 ("[PATCH] strict overcommit").
My understanding is that prior to commit 9334eab8a36f ("Import 2.1.27")
there was no overcommit, is that correct?

NOTE3: checkpatch.pl is warning about missing description for the config
symbols ("please write a paragraph that describes the config symbol fully")
but my understanding is that that is a false positive (or the warning message
not clear enough for me to understand it) considering that I have added
'help' sections for each 'config' section.
---
 mm/Kconfig | 32 ++++++++++++++++++++++++++++++++
 mm/util.c  |  8 +++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index abb7dcf..6dad57d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -439,6 +439,38 @@ choice
 	  benefit.
 endchoice
 
+choice
+	prompt "Overcommit Mode"
+	default OVERCOMMIT_GUESS
+	depends on EXPERT
+	help
+	  Selects the initial value for Overcommit mode.
+
+	  NOTE: The overcommit mode can be changed dynamically through sysctl.
+
+	config OVERCOMMIT_GUESS
+		bool "Guess"
+	help
+	  Selecting this option forces the initial value of overcommit mode to
+	  "Guess" overcommits. This is the default value.
+	  See Documentation/vm/overcommit-accounting for more information.
+
+	config OVERCOMMIT_ALWAYS
+		bool "Always"
+	help
+	  Selecting this option forces the initial value of overcommit mode to
+	  "Always" overcommit.
+	  See Documentation/vm/overcommit-accounting for more information.
+
+	config OVERCOMMIT_NEVER
+		bool "Never"
+	help
+	  Selecting this option forces the initial value of overcommit mode to
+	  "Never" overcommit.
+	  See Documentation/vm/overcommit-accounting for more information.
+
+endchoice
+
 #
 # UP and nommu archs use km based percpu allocator
 #
diff --git a/mm/util.c b/mm/util.c
index 917e0e3..fd098bb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -418,7 +418,13 @@ int __page_mapcount(struct page *page)
 }
 EXPORT_SYMBOL_GPL(__page_mapcount);
 
-int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
+#if defined(CONFIG_OVERCOMMIT_NEVER)
+int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_NEVER;
+#elif defined(CONFIG_OVERCOMMIT_ALWAYS)
+int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_ALWAYS;
+#else
+int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
+#endif
 int sysctl_overcommit_ratio __read_mostly = 50;
 unsigned long sysctl_overcommit_kbytes __read_mostly;
 int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: Fwd: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-10 12:00 ` Fwd: " Sebastian Frias
@ 2016-05-10 12:39   ` Andy Whitcroft
  2016-05-10 13:02     ` Sebastian Frias
  0 siblings, 1 reply; 52+ messages in thread
From: Andy Whitcroft @ 2016-05-10 12:39 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Joe Perches, mason, LKML

On Tue, May 10, 2016 at 02:00:47PM +0200, Sebastian Frias wrote:
> Hi,
> 
> Using checkpatch.pl on the forwarded patch results in:
> 
> WARNING: please write a paragraph that describes the config symbol fully
> #57: FILE: mm/Kconfig:451:
> +       config OVERCOMMIT_GUESS
> 
> WARNING: please write a paragraph that describes the config symbol fully
> #64: FILE: mm/Kconfig:458:
> +       config OVERCOMMIT_ALWAYS
> 
> but there is a 'help' section for those 'config' sections.
> NOTE: I followed the same indentation than the code laying just above the place where I inserted mine.
> 
> I think it is a false positive, what do you think?
> 
> Best regards,
> 
> Sebastian

Well, I am expecting the issue to be that the per option help is not
indented within the option like I am expecting ...

> Date: Tue, 10 May 2016 13:56:30 +0200
> From: Sebastian Frias <sf84@laposte.net>
> To: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>, Michal
>  Hocko <mhocko@suse.com>, Linus Torvalds <torvalds@linux-foundation.org>
> CC: LKML <linux-kernel@vger.kernel.org>, mason <slash.tmp@free.fr>
> Subject: [PATCH] mm: add config option to select the initial overcommit mode
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101
>  Thunderbird/31.2.0
> 
> Currently the initial value of the overcommit mode is OVERCOMMIT_GUESS.
> However, on embedded systems it is usually better to disable overcommit
> to avoid waking up the OOM-killer and its well known undesirable
> side-effects.
> 
> This config option allows to setup the initial overcommit mode to any of
> the 3 available values, OVERCOMMIT_GUESS (which remains as default),
> OVERCOMMIT_ALWAYS and OVERCOMMIT_NEVER.
> The overcommit mode can still be changed thru sysctl after the system
> boots up.
> 
> This config option depends on CONFIG_EXPERT.
> This patch does not introduces functional changes.
> 
> Signed-off-by: Sebastian Frias <sf84@laposte.net>
> ---
> 
> NOTE: I understand that the overcommit mode can be changed dynamically thru
> sysctl, but on embedded systems, where we know in advance that overcommit
> will be disabled, there's no reason to postpone such setting.
> 
> I would also be interested in knowing if you guys think this option should
> disable sysctl access for overcommit mode, essentially hardcoding the
> overcommit mode when this option is used.
> 
> NOTE2: I tried to track down the history of overcommit but back then there
> were no single patches apparently and the patch that appears to have
> introduced the first overcommit mode (OVERCOMMIT_ALWAYS) is commit
> 9334eab8a36f ("Import 2.1.27"). OVERCOMMIT_NEVER was introduced with commit
> 502bff0685b2 ("[PATCH] strict overcommit").
> My understanding is that prior to commit 9334eab8a36f ("Import 2.1.27")
> there was no overcommit, is that correct?
> 
> NOTE3: checkpatch.pl is warning about missing description for the config
> symbols ("please write a paragraph that describes the config symbol fully")
> but my understanding is that that is a false positive (or the warning message
> not clear enough for me to understand it) considering that I have added
> 'help' sections for each 'config' section.
> ---
>  mm/Kconfig | 32 ++++++++++++++++++++++++++++++++
>  mm/util.c  |  8 +++++++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index abb7dcf..6dad57d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -439,6 +439,38 @@ choice
>  	  benefit.
>  endchoice
>  
> +choice
> +	prompt "Overcommit Mode"
> +	default OVERCOMMIT_GUESS
> +	depends on EXPERT
> +	help
> +	  Selects the initial value for Overcommit mode.
> +
> +	  NOTE: The overcommit mode can be changed dynamically through sysctl.
> +
> +	config OVERCOMMIT_GUESS
> +		bool "Guess"

I am expecting the help below to be indented at the same level as the
bool above.  As you have done with the help for the choice itself.  I am
pretty sure checkpatch is assuming the "contents" of the config item are
all intented more than it is.

> +	help
> +	  Selecting this option forces the initial value of overcommit mode to
> +	  "Guess" overcommits. This is the default value.
> +	  See Documentation/vm/overcommit-accounting for more information.
[...]

-apw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Fwd: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-10 12:39   ` Andy Whitcroft
@ 2016-05-10 13:02     ` Sebastian Frias
  0 siblings, 0 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-10 13:02 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, mason, LKML

Hi Andy,

On 05/10/2016 02:39 PM, Andy Whitcroft wrote:
> On Tue, May 10, 2016 at 02:00:47PM +0200, Sebastian Frias wrote:
>> Hi,
>>
>> Using checkpatch.pl on the forwarded patch results in:
>>
>> WARNING: please write a paragraph that describes the config symbol fully
>> #57: FILE: mm/Kconfig:451:
>> +       config OVERCOMMIT_GUESS
>>
>> WARNING: please write a paragraph that describes the config symbol fully
>> #64: FILE: mm/Kconfig:458:
>> +       config OVERCOMMIT_ALWAYS
>>
>> but there is a 'help' section for those 'config' sections.
>> NOTE: I followed the same indentation than the code laying just above the place where I inserted mine.
>>
>> I think it is a false positive, what do you think?
>>
>> Best regards,
>>
>> Sebastian
> 
> Well, I am expecting the issue to be that the per option help is not
> indented within the option like I am expecting ...

Changing the indentation does not solve the issue.
Marc (in CC) just told me he had had the same issue and it was related to having less than 4 lines of 'help'.
If I add a dummy line to the 'help' section the warning goes indeed away.

Also notice that 'config OVERCOMMIT_NEVER' is not giving warnings even if its 'help' section has also 3 lines.

> 
>> Date: Tue, 10 May 2016 13:56:30 +0200
>> From: Sebastian Frias <sf84@laposte.net>
>> To: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>, Michal
>>  Hocko <mhocko@suse.com>, Linus Torvalds <torvalds@linux-foundation.org>
>> CC: LKML <linux-kernel@vger.kernel.org>, mason <slash.tmp@free.fr>
>> Subject: [PATCH] mm: add config option to select the initial overcommit mode
>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101
>>  Thunderbird/31.2.0
>>
>> Currently the initial value of the overcommit mode is OVERCOMMIT_GUESS.
>> However, on embedded systems it is usually better to disable overcommit
>> to avoid waking up the OOM-killer and its well known undesirable
>> side-effects.
>>
>> This config option allows to setup the initial overcommit mode to any of
>> the 3 available values, OVERCOMMIT_GUESS (which remains as default),
>> OVERCOMMIT_ALWAYS and OVERCOMMIT_NEVER.
>> The overcommit mode can still be changed thru sysctl after the system
>> boots up.
>>
>> This config option depends on CONFIG_EXPERT.
>> This patch does not introduces functional changes.
>>
>> Signed-off-by: Sebastian Frias <sf84@laposte.net>
>> ---
>>
>> NOTE: I understand that the overcommit mode can be changed dynamically thru
>> sysctl, but on embedded systems, where we know in advance that overcommit
>> will be disabled, there's no reason to postpone such setting.
>>
>> I would also be interested in knowing if you guys think this option should
>> disable sysctl access for overcommit mode, essentially hardcoding the
>> overcommit mode when this option is used.
>>
>> NOTE2: I tried to track down the history of overcommit but back then there
>> were no single patches apparently and the patch that appears to have
>> introduced the first overcommit mode (OVERCOMMIT_ALWAYS) is commit
>> 9334eab8a36f ("Import 2.1.27"). OVERCOMMIT_NEVER was introduced with commit
>> 502bff0685b2 ("[PATCH] strict overcommit").
>> My understanding is that prior to commit 9334eab8a36f ("Import 2.1.27")
>> there was no overcommit, is that correct?
>>
>> NOTE3: checkpatch.pl is warning about missing description for the config
>> symbols ("please write a paragraph that describes the config symbol fully")
>> but my understanding is that that is a false positive (or the warning message
>> not clear enough for me to understand it) considering that I have added
>> 'help' sections for each 'config' section.
>> ---
>>  mm/Kconfig | 32 ++++++++++++++++++++++++++++++++
>>  mm/util.c  |  8 +++++++-
>>  2 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index abb7dcf..6dad57d 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -439,6 +439,38 @@ choice
>>  	  benefit.
>>  endchoice
>>  
>> +choice
>> +	prompt "Overcommit Mode"
>> +	default OVERCOMMIT_GUESS
>> +	depends on EXPERT
>> +	help
>> +	  Selects the initial value for Overcommit mode.
>> +
>> +	  NOTE: The overcommit mode can be changed dynamically through sysctl.
>> +
>> +	config OVERCOMMIT_GUESS
>> +		bool "Guess"
> 
> I am expecting the help below to be indented at the same level as the
> bool above.  As you have done with the help for the choice itself.  I am
> pretty sure checkpatch is assuming the "contents" of the config item are
> all intented more than it is.
> 
>> +	help
>> +	  Selecting this option forces the initial value of overcommit mode to
>> +	  "Guess" overcommits. This is the default value.
>> +	  See Documentation/vm/overcommit-accounting for more information.
> [...]
> 
> -apw
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-10 11:56 [PATCH] mm: add config option to select the initial overcommit mode Sebastian Frias
  2016-05-10 12:00 ` Fwd: " Sebastian Frias
@ 2016-05-13  8:04 ` Michal Hocko
  2016-05-13  8:44   ` Mason
  2016-05-17  9:03 ` Mason
  2 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-13  8:04 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: linux-mm, Andrew Morton, Linus Torvalds, LKML, mason

On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
[...]
> NOTE: I understand that the overcommit mode can be changed dynamically thru
> sysctl, but on embedded systems, where we know in advance that overcommit
> will be disabled, there's no reason to postpone such setting.

To be honest I am not particularly happy about yet another config
option. At least not without a strong reason (the one above doesn't
sound that way). The config space is really large already.
So why a later initialization matters at all? Early userspace shouldn't
consume too much address space to blow up later, no?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13  8:04 ` Michal Hocko
@ 2016-05-13  8:44   ` Mason
  2016-05-13  9:52     ` Michal Hocko
  2016-05-13  9:52     ` Sebastian Frias
  0 siblings, 2 replies; 52+ messages in thread
From: Mason @ 2016-05-13  8:44 UTC (permalink / raw)
  To: Michal Hocko, Sebastian Frias
  Cc: linux-mm, Andrew Morton, Linus Torvalds, LKML

On 13/05/2016 10:04, Michal Hocko wrote:

> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
> [...]
>> NOTE: I understand that the overcommit mode can be changed dynamically thru
>> sysctl, but on embedded systems, where we know in advance that overcommit
>> will be disabled, there's no reason to postpone such setting.
> 
> To be honest I am not particularly happy about yet another config
> option. At least not without a strong reason (the one above doesn't
> sound that way). The config space is really large already.
> So why a later initialization matters at all? Early userspace shouldn't
> consume too much address space to blow up later, no?

One thing I'm not quite clear on is: why was the default set
to over-commit on?

I suppose the biggest use-case is when a "large" process forks
only to exec microseconds later into a "small" process, it would
be silly to refuse that fork. But isn't that what the COW
optimization addresses, without the need for over-commit?

Another issue with overcommit=on is that some programmers seem
to take for granted that "allocations will never fail" and so
neglect to handle malloc == NULL conditions gracefully.

I tried to run LTP with overcommit off, and I vaguely recall that
I had more failures than with overcommit on. (Perhaps only those
tests that tickle the dreaded OOM assassin.)

Regards.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13  8:44   ` Mason
@ 2016-05-13  9:52     ` Michal Hocko
  2016-05-13 10:18       ` Mason
  2016-05-13  9:52     ` Sebastian Frias
  1 sibling, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-13  9:52 UTC (permalink / raw)
  To: Mason; +Cc: Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 10:44:30, Mason wrote:
> On 13/05/2016 10:04, Michal Hocko wrote:
> 
> > On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
> > [...]
> >> NOTE: I understand that the overcommit mode can be changed dynamically thru
> >> sysctl, but on embedded systems, where we know in advance that overcommit
> >> will be disabled, there's no reason to postpone such setting.
> > 
> > To be honest I am not particularly happy about yet another config
> > option. At least not without a strong reason (the one above doesn't
> > sound that way). The config space is really large already.
> > So why a later initialization matters at all? Early userspace shouldn't
> > consume too much address space to blow up later, no?
> 
> One thing I'm not quite clear on is: why was the default set
> to over-commit on?

Because many applications simply rely on large and sparsely used address
space, I guess. That's why the default is GUESS where we ignore the
cumulative charges and simply check the current state and blow up only
when the current request is way too large.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13  8:44   ` Mason
  2016-05-13  9:52     ` Michal Hocko
@ 2016-05-13  9:52     ` Sebastian Frias
  2016-05-13 12:00       ` Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13  9:52 UTC (permalink / raw)
  To: Mason, Michal Hocko; +Cc: linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi,

On 05/13/2016 10:44 AM, Mason wrote:
> On 13/05/2016 10:04, Michal Hocko wrote:
> 
>> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
>> [...]
>>> NOTE: I understand that the overcommit mode can be changed dynamically thru
>>> sysctl, but on embedded systems, where we know in advance that overcommit
>>> will be disabled, there's no reason to postpone such setting.
>>
>> To be honest I am not particularly happy about yet another config
>> option. At least not without a strong reason (the one above doesn't
>> sound that way). The config space is really large already.
>> So why a later initialization matters at all? Early userspace shouldn't
>> consume too much address space to blow up later, no?

By the way, do you know what's the rationale to allow this setting to be controlled by the userspace dynamically?
Was it for testing only?

> 
> One thing I'm not quite clear on is: why was the default set
> to over-commit on?

Indeed, I was hoping we could throw some light into that.
My patch had another note:

   "NOTE2: I tried to track down the history of overcommit but back then there
were no single patches apparently and the patch that appears to have
introduced the first overcommit mode (OVERCOMMIT_ALWAYS) is commit
9334eab8a36f ("Import 2.1.27"). OVERCOMMIT_NEVER was introduced with commit
502bff0685b2 ("[PATCH] strict overcommit").
My understanding is that prior to commit 9334eab8a36f ("Import 2.1.27")
there was no overcommit, is that correct?"

It'd be nice to know more about why was overcommit introduced.
Furthermore, it looks like allowing overcommit and the introduction of the OOM-killer has given rise to lots of other options to try to tame the OOM-killer.
Without context, that may seem like a form of "feature creep" around it.
Moreover, it makes Linux behave differently from let's say Solaris.

   https://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6

Hopefully this discussion could clear some of this up and maybe result in more documentation around this subject.

> 
> I suppose the biggest use-case is when a "large" process forks
> only to exec microseconds later into a "small" process, it would
> be silly to refuse that fork. But isn't that what the COW
> optimization addresses, without the need for over-commit?
> 
> Another issue with overcommit=on is that some programmers seem
> to take for granted that "allocations will never fail" and so
> neglect to handle malloc == NULL conditions gracefully.
> 
> I tried to run LTP with overcommit off, and I vaguely recall that
> I had more failures than with overcommit on. (Perhaps only those
> tests that tickle the dreaded OOM assassin.)

>From what I remember, one of the LTP maintainers said that it is highly unlikely people test (or run LTP for that matter) with different settings for overcommit.

Years ago, while using MacOS X, a long running process apparently took all the memory over night.
The next day when I checked the computer I saw a dialog that said something like (I don't remember the exact wording) "process X has been paused due to lack of memory (or is requesting too much memory, I don't remember). If you think this is not normal you can terminate process X, otherwise you can terminate other processes to free memory and unpause process X to continue" and then some options to proceed.

If left unattended (thus the dialog unanswered), the computer would still work, all other processes were left intact and only the "offending" process was paused.
Arguably, if the "offending" process is just left paused, it takes the memory away from other processes, and if it was a server, maybe it wouldn't have enough memory to reply to requests.
On the server world I can thus understand that some setting could indicate that when the situation arises, the "dialog" is automatically dismissed with some default action, like "terminate the offending process".

To me it seems really strange for the "OOM-killer" to exist.
It has happened to me that it kills my terminals or editors, how can people deal with random processes being killed?
Doesn't it bother anybody?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13  9:52     ` Michal Hocko
@ 2016-05-13 10:18       ` Mason
  2016-05-13 10:42         ` Sebastian Frias
                           ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Mason @ 2016-05-13 10:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 13/05/2016 11:52, Michal Hocko wrote:
> On Fri 13-05-16 10:44:30, Mason wrote:
>> On 13/05/2016 10:04, Michal Hocko wrote:
>>
>>> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
>>> [...]
>>>> NOTE: I understand that the overcommit mode can be changed dynamically thru
>>>> sysctl, but on embedded systems, where we know in advance that overcommit
>>>> will be disabled, there's no reason to postpone such setting.
>>>
>>> To be honest I am not particularly happy about yet another config
>>> option. At least not without a strong reason (the one above doesn't
>>> sound that way). The config space is really large already.
>>> So why a later initialization matters at all? Early userspace shouldn't
>>> consume too much address space to blow up later, no?
>>
>> One thing I'm not quite clear on is: why was the default set
>> to over-commit on?
> 
> Because many applications simply rely on large and sparsely used address
> space, I guess.

What kind of applications are we talking about here?

Server apps? Client apps? Supercomputer apps?

I heard some HPC software use large sparse matrices, but is it a common
idiom to request large allocations, only to use a fraction of it?

If you'll excuse the slight trolling, I'm sure many applications don't
expect being randomly zapped by the OOM killer ;-)

> That's why the default is GUESS where we ignore the cumulative
> charges and simply check the current state and blow up only when
> the current request is way too large.

I wouldn't call denying a request "blowing up". Application will
receive NULL, and is supposed to handle it gracefully.

"Blowing up" is receiving SIGKILL because another process happened
to allocate too much memory.

Regards.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 10:18       ` Mason
@ 2016-05-13 10:42         ` Sebastian Frias
  2016-05-13 11:44         ` Michal Hocko
  2016-05-13 13:27         ` Austin S. Hemmelgarn
  2 siblings, 0 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 10:42 UTC (permalink / raw)
  To: Mason, Michal Hocko; +Cc: linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi,

On 05/13/2016 12:18 PM, Mason wrote:
> On 13/05/2016 11:52, Michal Hocko wrote:
>> On Fri 13-05-16 10:44:30, Mason wrote:
>>> On 13/05/2016 10:04, Michal Hocko wrote:
>>>
>>>> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
>>>> [...]
>>>>> NOTE: I understand that the overcommit mode can be changed dynamically thru
>>>>> sysctl, but on embedded systems, where we know in advance that overcommit
>>>>> will be disabled, there's no reason to postpone such setting.
>>>>
>>>> To be honest I am not particularly happy about yet another config
>>>> option. At least not without a strong reason (the one above doesn't
>>>> sound that way). The config space is really large already.
>>>> So why a later initialization matters at all? Early userspace shouldn't
>>>> consume too much address space to blow up later, no?
>>>
>>> One thing I'm not quite clear on is: why was the default set
>>> to over-commit on?
>>
>> Because many applications simply rely on large and sparsely used address
>> space, I guess.
> 
> What kind of applications are we talking about here?
> 
> Server apps? Client apps? Supercomputer apps?
> 
> I heard some HPC software use large sparse matrices, but is it a common
> idiom to request large allocations, only to use a fraction of it?
> 

Let's say there are specific applications that require overcommit.
Shouldn't overcommit be changed for those specific circumstances?
In other words, why is overcommit=GUESS default for everybody?

> If you'll excuse the slight trolling, I'm sure many applications don't
> expect being randomly zapped by the OOM killer ;-)
> 
>> That's why the default is GUESS where we ignore the cumulative
>> charges and simply check the current state and blow up only when
>> the current request is way too large.
> 
> I wouldn't call denying a request "blowing up". Application will
> receive NULL, and is supposed to handle it gracefully.
> 
> "Blowing up" is receiving SIGKILL because another process happened
> to allocate too much memory.

I agree.
Furthermore, the "blow up when the current request is too large" is more complex than that due to delay between the allocation and the time when the system realises it cannot honour the promise, there must be lots of code/heuristics involved there.
Anyway, it'd be nice to understand the real history behind overcommit (as I stated earlier, my understanding of the history is that in the early days there was no overcommit) and why it is there by default if only specific applications would require it.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 10:18       ` Mason
  2016-05-13 10:42         ` Sebastian Frias
@ 2016-05-13 11:44         ` Michal Hocko
  2016-05-13 12:15           ` Mason
  2016-05-13 13:27         ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 11:44 UTC (permalink / raw)
  To: Mason; +Cc: Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 12:18:54, Mason wrote:
> On 13/05/2016 11:52, Michal Hocko wrote:
> > On Fri 13-05-16 10:44:30, Mason wrote:
> >> On 13/05/2016 10:04, Michal Hocko wrote:
> >>
> >>> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
> >>> [...]
> >>>> NOTE: I understand that the overcommit mode can be changed dynamically thru
> >>>> sysctl, but on embedded systems, where we know in advance that overcommit
> >>>> will be disabled, there's no reason to postpone such setting.
> >>>
> >>> To be honest I am not particularly happy about yet another config
> >>> option. At least not without a strong reason (the one above doesn't
> >>> sound that way). The config space is really large already.
> >>> So why a later initialization matters at all? Early userspace shouldn't
> >>> consume too much address space to blow up later, no?
> >>
> >> One thing I'm not quite clear on is: why was the default set
> >> to over-commit on?
> > 
> > Because many applications simply rely on large and sparsely used address
> > space, I guess.
> 
> What kind of applications are we talking about here?
> 
> Server apps? Client apps? Supercomputer apps?

It is all over the place. But some are worse than others (e.g. try to
run some larger java application).

Anyway, this is my laptop where I do not run anything really special
(xfce, browser, few consoles, git, mutt):
$ grep Commit /proc/meminfo 
CommitLimit:     3497288 kB
Committed_AS:    3560804 kB

I am running with the default overcommit setup so I do not care about
the limit but the Committed_AS will tell you how much is actually
committed. I am definitelly not out of memory:
$ free
              total        used        free      shared  buff/cache   available
Mem:        3922584     1724120      217336      105264     1981128     2036164
Swap:       1535996      386364     1149632

If you check the rss/vsize ratio of your processes (which is not precise
but give at least some clue) then you will see that I am quite below 10% on
my system in average:
$ ps -ao vsize,rss -ax | awk '{if ($1+0>0) printf "%d\n", $2*100/$1 }' | calc_min_max.awk 
min: 0.00 max: 44.00 avg: 6.16 std: 7.85 nr: 120

> I heard some HPC software use large sparse matrices, but is it a common
> idiom to request large allocations, only to use a fraction of it?
> 
> If you'll excuse the slight trolling, I'm sure many applications don't
> expect being randomly zapped by the OOM killer ;-)

No, neither banks (and their customers) are prepared for a default
aren't they ;).

But more seriously. Overcommit is simply a reality these days. It would
be quite naive to think that enabling the overcommit protection would
guarantee that no OOM will trigger. The kernel can consume a lot of
memory as well which might be unreclaimable.
 
> > That's why the default is GUESS where we ignore the cumulative
> > charges and simply check the current state and blow up only when
> > the current request is way too large.
> 
> I wouldn't call denying a request "blowing up". Application will
> receive NULL, and is supposed to handle it gracefully.

Sure they will handle ENOMEM (in better case) but in reality it would
basically mean that they will fail eventually because there is hardly a
fallback. And it really sucks to fail with "Not enough memory" when you
check and your memory is mostly free/reclaimable (see the example above
from my running system).

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13  9:52     ` Sebastian Frias
@ 2016-05-13 12:00       ` Michal Hocko
  2016-05-13 12:39         ` Sebastian Frias
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 12:00 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 11:52:30, Sebastian Frias wrote:
> Hi,
> 
> On 05/13/2016 10:44 AM, Mason wrote:
> > On 13/05/2016 10:04, Michal Hocko wrote:
> > 
> >> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
> >> [...]
> >>> NOTE: I understand that the overcommit mode can be changed dynamically thru
> >>> sysctl, but on embedded systems, where we know in advance that overcommit
> >>> will be disabled, there's no reason to postpone such setting.
> >>
> >> To be honest I am not particularly happy about yet another config
> >> option. At least not without a strong reason (the one above doesn't
> >> sound that way). The config space is really large already.
> >> So why a later initialization matters at all? Early userspace shouldn't
> >> consume too much address space to blow up later, no?
> 
> By the way, do you know what's the rationale to allow this setting to
> be controlled by the userspace dynamically?  Was it for testing only?

Dunno, but I guess the default might be just too benevolent for some
specific workloads which are not so wasteful to their address space
and the strict overcommit is really helpful for them.

OVERCOMMIT_ALWAYS is certainly useful for testing.

> > One thing I'm not quite clear on is: why was the default set
> > to over-commit on?
> 
> Indeed, I was hoping we could throw some light into that.
> My patch had another note:

I cannot really tell because this was way before my time but I guess the
reason was that userspace is usually very address space hungry while the
actual memory consumption is not that bad. See my other email.

>    "NOTE2: I tried to track down the history of overcommit but back then there
> were no single patches apparently and the patch that appears to have
> introduced the first overcommit mode (OVERCOMMIT_ALWAYS) is commit
> 9334eab8a36f ("Import 2.1.27"). OVERCOMMIT_NEVER was introduced with commit
> 502bff0685b2 ("[PATCH] strict overcommit").
> My understanding is that prior to commit 9334eab8a36f ("Import 2.1.27")
> there was no overcommit, is that correct?"
> 
> It'd be nice to know more about why was overcommit introduced.
> Furthermore, it looks like allowing overcommit and the introduction of the OOM-killer has given rise to lots of other options to try to tame the OOM-killer.
> Without context, that may seem like a form of "feature creep" around it.
> Moreover, it makes Linux behave differently from let's say Solaris.
> 
>    https://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6

Well, those are some really strong statements which do not really
reflect the reality of the linux userspace. I am not going to argue with
those points because it doesn't make much sense. Yes in an ideal world
everybody consumes only so much he needs. Well the real life is a bit
different...

> Hopefully this discussion could clear some of this up and maybe result
> in more documentation around this subject.

What kind of documentation would help?
Documentation/vm/overcommit-accounting seems to be pretty much extensive
about all available modes including things to be aware of.
 
> > I suppose the biggest use-case is when a "large" process forks
> > only to exec microseconds later into a "small" process, it would
> > be silly to refuse that fork. But isn't that what the COW
> > optimization addresses, without the need for over-commit?
> > 
> > Another issue with overcommit=on is that some programmers seem
> > to take for granted that "allocations will never fail" and so
> > neglect to handle malloc == NULL conditions gracefully.
> > 
> > I tried to run LTP with overcommit off, and I vaguely recall that
> > I had more failures than with overcommit on. (Perhaps only those
> > tests that tickle the dreaded OOM assassin.)
> 
> From what I remember, one of the LTP maintainers said that it is
> highly unlikely people test (or run LTP for that matter) with
> different settings for overcommit.

Yes this is sad and the result of a excessive configuration space.
That's why I was pushing back to adding yet another one without having
really good reasons...

> Years ago, while using MacOS X, a long running process apparently took
> all the memory over night.  The next day when I checked the computer
> I saw a dialog that said something like (I don't remember the exact
> wording) "process X has been paused due to lack of memory (or is
> requesting too much memory, I don't remember). If you think this is
> not normal you can terminate process X, otherwise you can terminate
> other processes to free memory and unpause process X to continue" and
> then some options to proceed.
>
> If left unattended (thus the dialog unanswered), the computer would
> still work, all other processes were left intact and only the
> "offending" process was paused.  Arguably, if the "offending" process
> is just left paused, it takes the memory away from other processes,
> and if it was a server, maybe it wouldn't have enough memory to reply
> to requests.  On the server world I can thus understand that some
> setting could indicate that when the situation arises, the "dialog" is
> automatically dismissed with some default action, like "terminate the
> offending process".

Not sure what you are trying to tell here but it seems like killing such
a leaking task is a better option as the memory can be reused for others
rather than keep it blocked for an unbounded amount of time.

> To me it seems really strange for the "OOM-killer" to exist.  It has
> happened to me that it kills my terminals or editors, how can people
> deal with random processes being killed?  Doesn't it bother anybody?

Killing random tasks is definitely a misbehavior and it happened a lot
in the past when heuristics were based on multiple metrics (including
the run time etc.). Things have changed considerably since then and
seeing random tasks being selected shouldn't happen all that often and
if it happens it should be reported, understood and fixed.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 11:44         ` Michal Hocko
@ 2016-05-13 12:15           ` Mason
  2016-05-13 14:01             ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Mason @ 2016-05-13 12:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 13/05/2016 13:44, Michal Hocko wrote:

> Anyway, this is my laptop where I do not run anything really special
> (xfce, browser, few consoles, git, mutt):
> $ grep Commit /proc/meminfo
> CommitLimit:     3497288 kB
> Committed_AS:    3560804 kB
> 
> I am running with the default overcommit setup so I do not care about
> the limit but the Committed_AS will tell you how much is actually
> committed. I am definitelly not out of memory:
> $ free
>               total        used        free      shared  buff/cache   available
> Mem:        3922584     1724120      217336      105264     1981128     2036164
> Swap:       1535996      386364     1149632

I see. Thanks for the data point.

I had a different type of system in mind.
256 to 512 MB of RAM, no swap.
Perhaps Sebastian's choice could be made to depend on CONFIG_EMBEDDED,
rather than CONFIG_EXPERT?

Regards.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 12:00       ` Michal Hocko
@ 2016-05-13 12:39         ` Sebastian Frias
  2016-05-13 13:11           ` Austin S. Hemmelgarn
  2016-05-13 14:51           ` Michal Hocko
  0 siblings, 2 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 12:39 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Michal,

On 05/13/2016 02:00 PM, Michal Hocko wrote:
> On Fri 13-05-16 11:52:30, Sebastian Frias wrote:
>>
>> By the way, do you know what's the rationale to allow this setting to
>> be controlled by the userspace dynamically?  Was it for testing only?
> 
> Dunno, but I guess the default might be just too benevolent for some
> specific workloads which are not so wasteful to their address space
> and the strict overcommit is really helpful for them.
> 

Exactly. That's why I was wondering what is the history behind enabling it by default.

> OVERCOMMIT_ALWAYS is certainly useful for testing.
> 
>>> One thing I'm not quite clear on is: why was the default set
>>> to over-commit on?
>>
>> Indeed, I was hoping we could throw some light into that.
>> My patch had another note:
> 
> I cannot really tell because this was way before my time but I guess the
> reason was that userspace is usually very address space hungry while the
> actual memory consumption is not that bad. See my other email.

Yes, I saw that, thanks for the example.
It's just that it feels like the default value is there to deal with (what it should be?) very specific cases, right?

>> It'd be nice to know more about why was overcommit introduced.
>> Furthermore, it looks like allowing overcommit and the introduction of the OOM-killer has given rise to lots of other options to try to tame the OOM-killer.
>> Without context, that may seem like a form of "feature creep" around it.
>> Moreover, it makes Linux behave differently from let's say Solaris.
>>
>>    https://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6
> 
> Well, those are some really strong statements which do not really
> reflect the reality of the linux userspace. I am not going to argue with
> those points because it doesn't make much sense. Yes in an ideal world
> everybody consumes only so much he needs. Well the real life is a bit
> different...

:-)
I see, so basically it is a sort of workaround.

Anyway, in the embedded world the memory and system requirements are usually controlled.

Would you agree to the option if it was dependent on CONFIG_EMBEDDED? Or if it was a hidden option?
(I understand though that it wouldn't affect the size of config space)

> 
>> Hopefully this discussion could clear some of this up and maybe result
>> in more documentation around this subject.
> 
> What kind of documentation would help?

Well, mostly the history of this setting, why it was introduced, etc. more or less what we are discussing here.
Because honestly, killing random processes does not seems like a straightforward idea, ie: it is not obvious.
Like I was saying, without context, such behaviour looks a bit crazy.

>>
>> From what I remember, one of the LTP maintainers said that it is
>> highly unlikely people test (or run LTP for that matter) with
>> different settings for overcommit.
> 
> Yes this is sad and the result of a excessive configuration space.
> That's why I was pushing back to adding yet another one without having
> really good reasons...

Well, a more urgent problem would be that in that case overcommit=never is not really well tested.

> 
>> Years ago, while using MacOS X, a long running process apparently took
>> all the memory over night.  The next day when I checked the computer
>> I saw a dialog that said something like (I don't remember the exact
>> wording) "process X has been paused due to lack of memory (or is
>> requesting too much memory, I don't remember). If you think this is
>> not normal you can terminate process X, otherwise you can terminate
>> other processes to free memory and unpause process X to continue" and
>> then some options to proceed.
>>
>> If left unattended (thus the dialog unanswered), the computer would
>> still work, all other processes were left intact and only the
>> "offending" process was paused.  Arguably, if the "offending" process
>> is just left paused, it takes the memory away from other processes,
>> and if it was a server, maybe it wouldn't have enough memory to reply
>> to requests.  On the server world I can thus understand that some
>> setting could indicate that when the situation arises, the "dialog" is
>> automatically dismissed with some default action, like "terminate the
>> offending process".
> 
> Not sure what you are trying to tell here but it seems like killing such
> a leaking task is a better option as the memory can be reused for others
> rather than keep it blocked for an unbounded amount of time.

My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.

> 
>> To me it seems really strange for the "OOM-killer" to exist.  It has
>> happened to me that it kills my terminals or editors, how can people
>> deal with random processes being killed?  Doesn't it bother anybody?
> 
> Killing random tasks is definitely a misbehavior and it happened a lot
> in the past when heuristics were based on multiple metrics (including
> the run time etc.). Things have changed considerably since then and
> seeing random tasks being selected shouldn't happen all that often and
> if it happens it should be reported, understood and fixed.
> 

Well, it's hard to report, since it is essentially the result of a dynamic system.
I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
Actually when it happened, I just turned overcommit off. I just checked and is on again on my desktop, probably forgot to make it a permanent setting.

In the end, no processes is a good candidate for termination.
What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
I mean, all running processes are supposedly there and running for a reason.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 12:39         ` Sebastian Frias
@ 2016-05-13 13:11           ` Austin S. Hemmelgarn
  2016-05-13 13:32             ` Sebastian Frias
  2016-05-13 13:34             ` Sebastian Frias
  2016-05-13 14:51           ` Michal Hocko
  1 sibling, 2 replies; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 13:11 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 08:39, Sebastian Frias wrote:
> On 05/13/2016 02:00 PM, Michal Hocko wrote:
>> On Fri 13-05-16 11:52:30, Sebastian Frias wrote:
>>>
>>> From what I remember, one of the LTP maintainers said that it is
>>> highly unlikely people test (or run LTP for that matter) with
>>> different settings for overcommit.
>>
>> Yes this is sad and the result of a excessive configuration space.
>> That's why I was pushing back to adding yet another one without having
>> really good reasons...
>
> Well, a more urgent problem would be that in that case overcommit=never is not really well tested.
I know more people who use overcommit=never than overcommit=always.  I 
use it myself on all my personal systems, but I also allocate 
significant amounts of swap space (usually 64G, but I also have a big 
disks in my systems and don't often hit swap), don't use Java, and 
generally don't use a lot of the more wasteful programs either (many of 
them on desktop systems tend to be stuff like office software).  I know 
a number of people who use overcommit=never on their servers and give 
them a decent amount of swap space (and again, don't use Java).
>
>>
>>> Years ago, while using MacOS X, a long running process apparently took
>>> all the memory over night.  The next day when I checked the computer
>>> I saw a dialog that said something like (I don't remember the exact
>>> wording) "process X has been paused due to lack of memory (or is
>>> requesting too much memory, I don't remember). If you think this is
>>> not normal you can terminate process X, otherwise you can terminate
>>> other processes to free memory and unpause process X to continue" and
>>> then some options to proceed.
>>>
>>> If left unattended (thus the dialog unanswered), the computer would
>>> still work, all other processes were left intact and only the
>>> "offending" process was paused.  Arguably, if the "offending" process
>>> is just left paused, it takes the memory away from other processes,
>>> and if it was a server, maybe it wouldn't have enough memory to reply
>>> to requests.  On the server world I can thus understand that some
>>> setting could indicate that when the situation arises, the "dialog" is
>>> automatically dismissed with some default action, like "terminate the
>>> offending process".
>>
>> Not sure what you are trying to tell here but it seems like killing such
>> a leaking task is a better option as the memory can be reused for others
>> rather than keep it blocked for an unbounded amount of time.
>
> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
There's an option for the OOM-killer to just kill the allocating task 
instead of using the scoring heuristic.  This is about as deterministic 
as things can get though.
>
>>
>>> To me it seems really strange for the "OOM-killer" to exist.  It has
>>> happened to me that it kills my terminals or editors, how can people
>>> deal with random processes being killed?  Doesn't it bother anybody?
>>
>> Killing random tasks is definitely a misbehavior and it happened a lot
>> in the past when heuristics were based on multiple metrics (including
>> the run time etc.). Things have changed considerably since then and
>> seeing random tasks being selected shouldn't happen all that often and
>> if it happens it should be reported, understood and fixed.
>>
>
> Well, it's hard to report, since it is essentially the result of a dynamic system.
> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
> Actually when it happened, I just turned overcommit off. I just checked and is on again on my desktop, probably forgot to make it a permanent setting.
>
> In the end, no processes is a good candidate for termination.
> What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
> I mean, all running processes are supposedly there and running for a reason.
OTOH, just because something is there for a reason doesn't mean it's 
doing what it's supposed to be.  Bugs happen, including memory leaks, 
and if something is misbehaving enough that it impacts the rest of the 
system, it really should be dealt with.

This brings to mind a complex bug involving Tor and GCC whereby building 
certain (old) versions of Tor with certain (old) versions of GCC with 
-Os would cause an infinite loop in GCC.  You obviously have GCC running 
for a reason, but that doesn't mean that it's doing what it should be.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 10:18       ` Mason
  2016-05-13 10:42         ` Sebastian Frias
  2016-05-13 11:44         ` Michal Hocko
@ 2016-05-13 13:27         ` Austin S. Hemmelgarn
  2 siblings, 0 replies; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 13:27 UTC (permalink / raw)
  To: Mason, Michal Hocko
  Cc: Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 06:18, Mason wrote:
> On 13/05/2016 11:52, Michal Hocko wrote:
>> On Fri 13-05-16 10:44:30, Mason wrote:
>>> On 13/05/2016 10:04, Michal Hocko wrote:
>>>
>>>> On Tue 10-05-16 13:56:30, Sebastian Frias wrote:
>>>> [...]
>>>>> NOTE: I understand that the overcommit mode can be changed dynamically thru
>>>>> sysctl, but on embedded systems, where we know in advance that overcommit
>>>>> will be disabled, there's no reason to postpone such setting.
>>>>
>>>> To be honest I am not particularly happy about yet another config
>>>> option. At least not without a strong reason (the one above doesn't
>>>> sound that way). The config space is really large already.
>>>> So why a later initialization matters at all? Early userspace shouldn't
>>>> consume too much address space to blow up later, no?
>>>
>>> One thing I'm not quite clear on is: why was the default set
>>> to over-commit on?
>>
>> Because many applications simply rely on large and sparsely used address
>> space, I guess.
>
> What kind of applications are we talking about here?
>
> Server apps? Client apps? Supercomputer apps?
>
> I heard some HPC software use large sparse matrices, but is it a common
> idiom to request large allocations, only to use a fraction of it?
Just looking at my laptop right now, I count the number of processes 
which have a RSS which is more than 25% of their allocated memory to be 
about 15-20 out of ~170 processes and ~360 threads.  Somewhat 
unsurprisingly, most of the ones that fit this are highly purpose 
specific (cachefilesd, syslogd, etc), and the only ones whose RSS is 
within 1% of their allocated memory are BOINC applications (distributed 
and/or scientific computing apps tend to be really good about efficient 
usage of memory, even when they use sparse matrices).  There are in fact 
a lot of 'normal' daemons that do this (sshd on my system for example 
has 460k resident and 28.5M allocated, atd has 122k resident and 12.6M 
allocated, acpid has 120k resident and 4.2M allocated).

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 13:11           ` Austin S. Hemmelgarn
@ 2016-05-13 13:32             ` Sebastian Frias
  2016-05-13 13:51               ` Austin S. Hemmelgarn
  2016-05-13 13:34             ` Sebastian Frias
  1 sibling, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 13:32 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Austin,

On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
> On 2016-05-13 08:39, Sebastian Frias wrote:
>> Well, a more urgent problem would be that in that case overcommit=never is not really well tested.
> I know more people who use overcommit=never than overcommit=always.  I use it myself on all my personal systems, but I also allocate significant amounts of swap space (usually 64G, but I also have a big disks in my systems and don't often hit swap), don't use Java, and generally don't use a lot of the more wasteful programs either (many of them on desktop systems tend to be stuff like office software).  I know a number of people who use overcommit=never on their servers and give them a decent amount of swap space (and again, don't use Java).

Then I'll look into LTP and the issues it has when overcommit=never.

>>
>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.

I didn't see that in Documentation/vm/overcommit-accounting or am I looking in the wrong place?

>>
>> Well, it's hard to report, since it is essentially the result of a dynamic system.
>> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
>> Actually when it happened, I just turned overcommit off. I just checked and is on again on my desktop, probably forgot to make it a permanent setting.
>>
>> In the end, no processes is a good candidate for termination.
>> What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
>> I mean, all running processes are supposedly there and running for a reason.
> OTOH, just because something is there for a reason doesn't mean it's doing what it's supposed to be.  Bugs happen, including memory leaks, and if something is misbehaving enough that it impacts the rest of the system, it really should be dealt with.

Exactly, it's just that in this case, the system is deciding how to deal with the situation by itself.

> 
> This brings to mind a complex bug involving Tor and GCC whereby building certain (old) versions of Tor with certain (old) versions of GCC with -Os would cause an infinite loop in GCC.  You obviously have GCC running for a reason, but that doesn't mean that it's doing what it should be.

I'm not sure if I followed the analogy/example, but are you saying that the OOM-killer killed GCC in your example?
This seems an odd example though, I mean, shouldn't the guy in front of the computer notice the loop and kill GCC by himself?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 13:11           ` Austin S. Hemmelgarn
  2016-05-13 13:32             ` Sebastian Frias
@ 2016-05-13 13:34             ` Sebastian Frias
  2016-05-13 14:14               ` Austin S. Hemmelgarn
  2016-05-13 15:01               ` One Thousand Gnomes
  1 sibling, 2 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 13:34 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Austin,

On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
> On 2016-05-13 08:39, Sebastian Frias wrote:
>>
>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.

By the way, why does it has to "kill" anything in that case?
I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 13:32             ` Sebastian Frias
@ 2016-05-13 13:51               ` Austin S. Hemmelgarn
  2016-05-13 14:35                 ` Sebastian Frias
  0 siblings, 1 reply; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 13:51 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 09:32, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 08:39, Sebastian Frias wrote:
>>> Well, a more urgent problem would be that in that case overcommit=never is not really well tested.
>> I know more people who use overcommit=never than overcommit=always.  I use it myself on all my personal systems, but I also allocate significant amounts of swap space (usually 64G, but I also have a big disks in my systems and don't often hit swap), don't use Java, and generally don't use a lot of the more wasteful programs either (many of them on desktop systems tend to be stuff like office software).  I know a number of people who use overcommit=never on their servers and give them a decent amount of swap space (and again, don't use Java).
>
> Then I'll look into LTP and the issues it has when overcommit=never.
>
>>>
>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.
>
> I didn't see that in Documentation/vm/overcommit-accounting or am I looking in the wrong place?
It's controlled by a sysctl value, so it's listed in 
Documentation/sysctl/vm.txt
The relevant sysctl is vm.oom_kill_allocating_task
>
>>>
>>> Well, it's hard to report, since it is essentially the result of a dynamic system.
>>> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
>>> Actually when it happened, I just turned overcommit off. I just checked and is on again on my desktop, probably forgot to make it a permanent setting.
>>>
>>> In the end, no processes is a good candidate for termination.
>>> What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
>>> I mean, all running processes are supposedly there and running for a reason.
>> OTOH, just because something is there for a reason doesn't mean it's doing what it's supposed to be.  Bugs happen, including memory leaks, and if something is misbehaving enough that it impacts the rest of the system, it really should be dealt with.
>
> Exactly, it's just that in this case, the system is deciding how to deal with the situation by itself.
On a busy server where uptime is critical, you can't wait for someone to 
notice and handle it manually, you need the issue resolved ASAP.  Now, 
this won't always kill the correct thing, but if it's due to a memory 
leak, it often will work like it should.
>
>>
>> This brings to mind a complex bug involving Tor and GCC whereby building certain (old) versions of Tor with certain (old) versions of GCC with -Os would cause an infinite loop in GCC.  You obviously have GCC running for a reason, but that doesn't mean that it's doing what it should be.
>
> I'm not sure if I followed the analogy/example, but are you saying that the OOM-killer killed GCC in your example?
> This seems an odd example though, I mean, shouldn't the guy in front of the computer notice the loop and kill GCC by himself?
No, I didn't mean as an example of the OOM killer, I just meant as an 
example of software not doing what it should.  It's not as easy to find 
an example for the OOM killer, so I don't really have a good example. 
The general concept is the same though, the only difference is there 
isn't a kernel protection against infinite loops (because they aren't 
always bugs, while memory leaks and similar are).

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 12:15           ` Mason
@ 2016-05-13 14:01             ` Michal Hocko
  2016-05-13 14:15               ` Sebastian Frias
  2016-05-13 15:04               ` One Thousand Gnomes
  0 siblings, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 14:01 UTC (permalink / raw)
  To: Mason; +Cc: Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 14:15:35, Mason wrote:
> On 13/05/2016 13:44, Michal Hocko wrote:
> 
> > Anyway, this is my laptop where I do not run anything really special
> > (xfce, browser, few consoles, git, mutt):
> > $ grep Commit /proc/meminfo
> > CommitLimit:     3497288 kB
> > Committed_AS:    3560804 kB
> > 
> > I am running with the default overcommit setup so I do not care about
> > the limit but the Committed_AS will tell you how much is actually
> > committed. I am definitelly not out of memory:
> > $ free
> >               total        used        free      shared  buff/cache   available
> > Mem:        3922584     1724120      217336      105264     1981128     2036164
> > Swap:       1535996      386364     1149632
> 
> I see. Thanks for the data point.
> 
> I had a different type of system in mind.
> 256 to 512 MB of RAM, no swap.
> Perhaps Sebastian's choice could be made to depend on CONFIG_EMBEDDED,
> rather than CONFIG_EXPERT?

Even if the overcommit behavior is different on those systems the
primary question hasn't been answered yet. Why cannot this be done from
the userspace? In other words what wouldn't work properly?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 13:34             ` Sebastian Frias
@ 2016-05-13 14:14               ` Austin S. Hemmelgarn
  2016-05-13 14:23                 ` Sebastian Frias
  2016-05-13 15:01               ` One Thousand Gnomes
  1 sibling, 1 reply; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 14:14 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 09:34, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 08:39, Sebastian Frias wrote:
>>>
>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.
>
> By the way, why does it has to "kill" anything in that case?
> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
In theory, that's a great idea.  In practice though, it only works if:
1. The allocating task correctly handles malloc() (or whatever other 
function it uses) returning NULL, which a number of programs don't.
2. The task actually has fallback options for memory limits.  Many 
programs that do handle getting a NULL pointer from malloc() handle it 
by exiting anyway, so there's not as much value in this case.
3. There isn't a memory leak somewhere on the system.  Killing the 
allocating task doesn't help much if this is the case of course.

You have to keep in mind though, that on a properly provisioned system, 
the only situations where the OOM killer should be invoked are when 
there's a memory leak, or when someone is intentionally trying to DoS 
the system through memory exhaustion.  If you're hitting the OOM killer 
for any other reason than those or a kernel bug, then you just need more 
memory or more swap space.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:01             ` Michal Hocko
@ 2016-05-13 14:15               ` Sebastian Frias
  2016-05-13 15:04               ` One Thousand Gnomes
  1 sibling, 0 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 14:15 UTC (permalink / raw)
  To: Michal Hocko, Mason; +Cc: linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Michal,

On 05/13/2016 04:01 PM, Michal Hocko wrote:
> On Fri 13-05-16 14:15:35, Mason wrote:
>> On 13/05/2016 13:44, Michal Hocko wrote:
>>
>>> Anyway, this is my laptop where I do not run anything really special
>>> (xfce, browser, few consoles, git, mutt):
>>> $ grep Commit /proc/meminfo
>>> CommitLimit:     3497288 kB
>>> Committed_AS:    3560804 kB
>>>
>>> I am running with the default overcommit setup so I do not care about
>>> the limit but the Committed_AS will tell you how much is actually
>>> committed. I am definitelly not out of memory:
>>> $ free
>>>               total        used        free      shared  buff/cache   available
>>> Mem:        3922584     1724120      217336      105264     1981128     2036164
>>> Swap:       1535996      386364     1149632
>>
>> I see. Thanks for the data point.
>>
>> I had a different type of system in mind.
>> 256 to 512 MB of RAM, no swap.
>> Perhaps Sebastian's choice could be made to depend on CONFIG_EMBEDDED,
>> rather than CONFIG_EXPERT?
> 
> Even if the overcommit behavior is different on those systems the
> primary question hasn't been answered yet. Why cannot this be done from
> the userspace? In other words what wouldn't work properly?
> 

You are right, and I said that since the beginning, nothing prevents the userspace from doing it.

But it'd be interesting to know the history of this option, for example, why it is left for userspace.
Are there systems that dynamically change this setting?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:14               ` Austin S. Hemmelgarn
@ 2016-05-13 14:23                 ` Sebastian Frias
  2016-05-13 15:02                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 14:23 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Austin,

On 05/13/2016 04:14 PM, Austin S. Hemmelgarn wrote:
> On 2016-05-13 09:34, Sebastian Frias wrote:
>> Hi Austin,
>>
>> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>>> On 2016-05-13 08:39, Sebastian Frias wrote:
>>>>
>>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
>>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.
>>
>> By the way, why does it has to "kill" anything in that case?
>> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
> In theory, that's a great idea.  In practice though, it only works if:
> 1. The allocating task correctly handles malloc() (or whatever other function it uses) returning NULL, which a number of programs don't.
> 2. The task actually has fallback options for memory limits.  Many programs that do handle getting a NULL pointer from malloc() handle it by exiting anyway, so there's not as much value in this case.
> 3. There isn't a memory leak somewhere on the system.  Killing the allocating task doesn't help much if this is the case of course.

Well, the thing is that the current behaviour, i.e.: overcommiting, does not improves the quality of those programs.
I mean, what incentive do they have to properly handle situations 1, 2?

Also, if there's a memory leak, the termination of any task, whether it is the allocating task or something random, does not help either, the system will eventually go down, right?

> 
> You have to keep in mind though, that on a properly provisioned system, the only situations where the OOM killer should be invoked are when there's a memory leak, or when someone is intentionally trying to DoS the system through memory exhaustion. 

Exactly, the DoS attack is another reason why the OOM-killer does not seem a good idea, at least compared to just letting malloc return NULL and let the program fail.

>If you're hitting the OOM killer for any other reason than those or a kernel bug, then you just need more memory or more swap space.
> 

Indeed.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 13:51               ` Austin S. Hemmelgarn
@ 2016-05-13 14:35                 ` Sebastian Frias
  2016-05-13 14:54                   ` Michal Hocko
  2016-05-13 15:15                   ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 14:35 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Austin,

On 05/13/2016 03:51 PM, Austin S. Hemmelgarn wrote:
> On 2016-05-13 09:32, Sebastian Frias wrote:
>> I didn't see that in Documentation/vm/overcommit-accounting or am I looking in the wrong place?
> It's controlled by a sysctl value, so it's listed in Documentation/sysctl/vm.txt
> The relevant sysctl is vm.oom_kill_allocating_task

Thanks, I just read that.
Does not look like a replacement for overcommit=never though.

>>
>>>>
>>>> Well, it's hard to report, since it is essentially the result of a dynamic system.
>>>> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
>>>> Actually when it happened, I just turned overcommit off. I just checked and is on again on my desktop, probably forgot to make it a permanent setting.
>>>>
>>>> In the end, no processes is a good candidate for termination.
>>>> What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
>>>> I mean, all running processes are supposedly there and running for a reason.
>>> OTOH, just because something is there for a reason doesn't mean it's doing what it's supposed to be.  Bugs happen, including memory leaks, and if something is misbehaving enough that it impacts the rest of the system, it really should be dealt with.
>>
>> Exactly, it's just that in this case, the system is deciding how to deal with the situation by itself.
> On a busy server where uptime is critical, you can't wait for someone to notice and handle it manually, you need the issue resolved ASAP.  Now, this won't always kill the correct thing, but if it's due to a memory leak, it often will work like it should.

The keyword is "'often' will work as expected".
So you are saying that it will kill a program leaking memory in what, like 90% of the cases?
I'm not sure if I would setup a server with critical uptime to have the OOM-killer enabled, do you think that'd be a good idea?

Anyway, as a side note, I just want to say thank you guys for having this discussion.
I think it is an interesting thread and hopefully it will advance the "knowledge" about this setting.

>>
>>>
>>> This brings to mind a complex bug involving Tor and GCC whereby building certain (old) versions of Tor with certain (old) versions of GCC with -Os would cause an infinite loop in GCC.  You obviously have GCC running for a reason, but that doesn't mean that it's doing what it should be.
>>
>> I'm not sure if I followed the analogy/example, but are you saying that the OOM-killer killed GCC in your example?
>> This seems an odd example though, I mean, shouldn't the guy in front of the computer notice the loop and kill GCC by himself?
> No, I didn't mean as an example of the OOM killer, I just meant as an example of software not doing what it should.  It's not as easy to find an example for the OOM killer, so I don't really have a good example. The general concept is the same though, the only difference is there isn't a kernel protection against infinite loops (because they aren't always bugs, while memory leaks and similar are).

So how does the kernel knows that a process is "leaking memory" as opposed to just "using lots of memory"? (wouldn't that be comparable to answering how does the kernel knows the difference between an infinite loop and one that is not?)

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 12:39         ` Sebastian Frias
  2016-05-13 13:11           ` Austin S. Hemmelgarn
@ 2016-05-13 14:51           ` Michal Hocko
  2016-05-13 14:59             ` Mason
  2016-05-13 15:10             ` Sebastian Frias
  1 sibling, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 14:51 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 14:39:01, Sebastian Frias wrote:
> Hi Michal,
> 
> On 05/13/2016 02:00 PM, Michal Hocko wrote:
> > On Fri 13-05-16 11:52:30, Sebastian Frias wrote:
[...]
> >> Indeed, I was hoping we could throw some light into that.
> >> My patch had another note:
> > 
> > I cannot really tell because this was way before my time but I guess the
> > reason was that userspace is usually very address space hungry while the
> > actual memory consumption is not that bad. See my other email.
> 
> Yes, I saw that, thanks for the example.
> It's just that it feels like the default value is there to deal with
> (what it should be?) very specific cases, right?

The default should cover the most use cases. If you can prove that the
vast majority of embeded systems are different and would _benefit_ from
a different default I wouldn't be opposed to change the default there.

> >> It'd be nice to know more about why was overcommit introduced.
> >> Furthermore, it looks like allowing overcommit and the introduction
> >> of the OOM-killer has given rise to lots of other options to try to
> >> tame the OOM-killer.
> >> Without context, that may seem like a form of "feature creep" around it.
> >> Moreover, it makes Linux behave differently from let's say Solaris.
> >>
> >>    https://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6
> > 
> > Well, those are some really strong statements which do not really
> > reflect the reality of the linux userspace. I am not going to argue with
> > those points because it doesn't make much sense. Yes in an ideal world
> > everybody consumes only so much he needs. Well the real life is a bit
> > different...
> 
> :-)
> I see, so basically it is a sort of workaround.

No it is not a workaround. It is just serving the purpose of the
operating system. The allow using the HW as much as possible to the
existing userspace. You cannot expect userspace will change just because
we do not like the overcommiting the memory with all the fallouts.

> Anyway, in the embedded world the memory and system requirements are
> usually controlled.

OK, but even when it is controlled does it suffer in any way just
because of the default setting? Do you see OOM killer invocation
when the overcommit would prevent from that?

> Would you agree to the option if it was dependent on
> CONFIG_EMBEDDED? Or if it was a hidden option?
> (I understand though that it wouldn't affect the size of config space)

It could be done in the code and make the default depending on the
existing config. But first try to think about what would be an advantage
of such a change.
 
> >> Hopefully this discussion could clear some of this up and maybe result
> >> in more documentation around this subject.
> > 
> > What kind of documentation would help?
> 
> Well, mostly the history of this setting, why it was introduced, etc.
> more or less what we are discussing here.  Because honestly, killing
> random processes does not seems like a straightforward idea, ie: it is
> not obvious.  Like I was saying, without context, such behaviour looks
> a bit crazy.

But we are not killing a random process. The semantic is quite clear. We
are trying to kill the biggest memory hog and if it has some children
try to sacrifice them to save as much work as possible.

> >> From what I remember, one of the LTP maintainers said that it is
> >> highly unlikely people test (or run LTP for that matter) with
> >> different settings for overcommit.
> > 
> > Yes this is sad and the result of a excessive configuration space.
> > That's why I was pushing back to adding yet another one without having
> > really good reasons...
> 
> Well, a more urgent problem would be that in that case
> overcommit=never is not really well tested.

This is a problem of the userspace and am really skeptical that a change
in default would make any existing bugs going away. It is more likely we
will see reports that ENOMEM has been returned even though there is
pletny of memory available.

[...]

> > Killing random tasks is definitely a misbehavior and it happened a lot
> > in the past when heuristics were based on multiple metrics (including
> > the run time etc.). Things have changed considerably since then and
> > seeing random tasks being selected shouldn't happen all that often and
> > if it happens it should be reported, understood and fixed.
> > 
> 
> Well, it's hard to report, since it is essentially the result of a
> dynamic system.

Each oom killer invocation will provide a detailed report which will
help MM developers to debug what went wrong and why.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:35                 ` Sebastian Frias
@ 2016-05-13 14:54                   ` Michal Hocko
  2016-05-13 15:15                   ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 14:54 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: Austin S. Hemmelgarn, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML

On Fri 13-05-16 16:35:20, Sebastian Frias wrote:
> Hi Austin,
> 
> On 05/13/2016 03:51 PM, Austin S. Hemmelgarn wrote:
> > On 2016-05-13 09:32, Sebastian Frias wrote:
> >> I didn't see that in Documentation/vm/overcommit-accounting or am I looking in the wrong place?
> > It's controlled by a sysctl value, so it's listed in Documentation/sysctl/vm.txt
> > The relevant sysctl is vm.oom_kill_allocating_task
> 
> Thanks, I just read that.
> Does not look like a replacement for overcommit=never though.

No this is just an OOM strategy. I wouldn't recommend it though because
the behavior might be really time dependant - unlike the regular OOM
killer strategy to select the largest memory consumer.

And again, overcommit=never doesn't imply no-OOM. It just makes it less
likely. The kernel can consume quite some unreclaimable memory as well.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:51           ` Michal Hocko
@ 2016-05-13 14:59             ` Mason
  2016-05-13 15:11               ` One Thousand Gnomes
  2016-05-13 15:10             ` Sebastian Frias
  1 sibling, 1 reply; 52+ messages in thread
From: Mason @ 2016-05-13 14:59 UTC (permalink / raw)
  To: Michal Hocko, Sebastian Frias
  Cc: linux-mm, Andrew Morton, Linus Torvalds, LKML

On 13/05/2016 16:51, Michal Hocko wrote:

> The default should cover the most use cases. If you can prove that the
> vast majority of embedded systems are different and would _benefit_ from
> a different default I wouldn't be opposed to change the default there.

It seems important to point out that Sebastian's patch does NOT change
the default behavior. It merely creates a knob allowing one to override
the default via Kconfig.

+choice
+	prompt "Overcommit Mode"
+	default OVERCOMMIT_GUESS
+	depends on EXPERT

Regards.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 13:34             ` Sebastian Frias
  2016-05-13 14:14               ` Austin S. Hemmelgarn
@ 2016-05-13 15:01               ` One Thousand Gnomes
  2016-05-13 15:15                 ` Sebastian Frias
  1 sibling, 1 reply; 52+ messages in thread
From: One Thousand Gnomes @ 2016-05-13 15:01 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: Austin S. Hemmelgarn, Michal Hocko, Mason, linux-mm,
	Andrew Morton, Linus Torvalds, LKML

On Fri, 13 May 2016 15:34:52 +0200
Sebastian Frias <sf84@laposte.net> wrote:

> Hi Austin,
> 
> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
> > On 2016-05-13 08:39, Sebastian Frias wrote:  
> >>
> >> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.  
> > There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.  
> 
> By the way, why does it has to "kill" anything in that case?
> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?

Just turn off overcommit and it will do that. With overcommit disabled
the kernel will not hand out address space in excess of memory plus swap.

Alan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:23                 ` Sebastian Frias
@ 2016-05-13 15:02                   ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 15:02 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 10:23, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/13/2016 04:14 PM, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 09:34, Sebastian Frias wrote:
>>> Hi Austin,
>>>
>>> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>>>> On 2016-05-13 08:39, Sebastian Frias wrote:
>>>>>
>>>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
>>>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.
>>>
>>> By the way, why does it has to "kill" anything in that case?
>>> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
>> In theory, that's a great idea.  In practice though, it only works if:
>> 1. The allocating task correctly handles malloc() (or whatever other function it uses) returning NULL, which a number of programs don't.
>> 2. The task actually has fallback options for memory limits.  Many programs that do handle getting a NULL pointer from malloc() handle it by exiting anyway, so there's not as much value in this case.
>> 3. There isn't a memory leak somewhere on the system.  Killing the allocating task doesn't help much if this is the case of course.
>
> Well, the thing is that the current behaviour, i.e.: overcommiting, does not improves the quality of those programs.
> I mean, what incentive do they have to properly handle situations 1, 2?
Overcommit got introduced because of these, not the other way around. 
It's not forcing them to change, but it's also a core concept in any 
modern virtual memory based OS, and that's not ever going to change either.

You also have to keep in mind that most apps aren't doing this 
intentionally.  There are three general reasons they do this:
1. They don't know how much memory they will need, so they guess high 
because malloc() is computationally expensive.  This is technically 
intentional, but it's also something that can't be avoided in some cases 
  Dropbox is a perfect example of this taken way too far (they also take 
the concept of a thread pool too far).
2. The program has a lot of code that isn't frequently run.  It makes no 
sense to keep code that isn't used in RAM, so it gets either dropped (if 
it's unmodified), or it gets swapped out.  Most of the programs that I 
see on my system fall into this category (acpid  for example just sleeps 
until an ACPI event happens, so it usually won't have most of it's code 
in memory on a busy system).
3. The application wants to do it's own memory management.  This is 
common on a lot of HPC apps and some high performance server software.
>
> Also, if there's a memory leak, the termination of any task, whether it is the allocating task or something random, does not help either, the system will eventually go down, right?
If the memory leak is in the kernel, then yes, the OOM killer won't 
help, period.  But if the memory leak is in userspace, and the OOM 
killer kills the task with the leak (which it usually will if you don't 
have it set to kill the allocating task), then it may have just saved 
the system from crashing completely.  Yes some user may lose some 
unsaved work, but they would lose that data anyway if the system 
crashes, and they can probably still use the rest of the system.
>> You have to keep in mind though, that on a properly provisioned system, the only situations where the OOM killer should be invoked are when there's a memory leak, or when someone is intentionally trying to DoS the system through memory exhaustion.
>
> Exactly, the DoS attack is another reason why the OOM-killer does not seem a good idea, at least compared to just letting malloc return NULL and let the program fail.
Because of overcommit, it's possible for the allocation to succeed, but 
the subsequent access to fail.  At that point, you're way past malloc() 
returning, and you have to do something.

Also, returning NULL on a failed malloc() provides zero protection 
against all but the most brain-dead memory exhaustion based DoS attacks. 
  The general core of a memory exhaustion DoS against a local system 
follows a simple three step procedure:
     1. Try to allocate a small chunk of memory (less than or equal to 
page size)
     2. If the allocation succeeded, write to the first byte of that 
chunk of memory, forcing actual allocation
     3. Repeat indefinitely from step 1
Step 2 is the crucial part here, if you don't write to the memory, it 
will only eat up your own virtual address space.  If you don't check for 
a NULL pointer and skip writing, you get a segfault.  If the OOM killer 
isn't invoked in such a situation, then this will just eat up all the 
free system memory, and then _keep running_ and eat up all the other 
memory as it's freed by other things exiting due to lack of memory.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:01             ` Michal Hocko
  2016-05-13 14:15               ` Sebastian Frias
@ 2016-05-13 15:04               ` One Thousand Gnomes
  2016-05-13 15:37                 ` Sebastian Frias
  1 sibling, 1 reply; 52+ messages in thread
From: One Thousand Gnomes @ 2016-05-13 15:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mason, Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

> > Perhaps Sebastian's choice could be made to depend on CONFIG_EMBEDDED,
> > rather than CONFIG_EXPERT?  
> 
> Even if the overcommit behavior is different on those systems the
> primary question hasn't been answered yet. Why cannot this be done from
> the userspace? In other words what wouldn't work properly?

Most allocations in C have no mechanism to report failure.

Stakc expansion failure is not reportable. Copy on write failure is not
reportable and so on.

Alan
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:51           ` Michal Hocko
  2016-05-13 14:59             ` Mason
@ 2016-05-13 15:10             ` Sebastian Frias
  2016-05-13 15:41               ` One Thousand Gnomes
  1 sibling, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 15:10 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Michal,

On 05/13/2016 04:51 PM, Michal Hocko wrote:
> 
> The default should cover the most use cases. If you can prove that the
> vast majority of embeded systems are different and would _benefit_ from
> a different default I wouldn't be opposed to change the default there.

I'm unsure of a way to prove that.
I mean, what was the way used to prove that "the most use cases" is ok with overcommit=guess? It seems it was an empirical thing.

Also note that this is not changing any default.
It is merely adding the option to change the initial mode without relying on the userspace.

>> :-)
>> I see, so basically it is a sort of workaround.
> 
> No it is not a workaround. It is just serving the purpose of the
> operating system. The allow using the HW as much as possible to the
> existing userspace. You cannot expect userspace will change just because
> we do not like the overcommiting the memory with all the fallouts.

I agree, but that is one of the things that is fuzzy.
My understanding is that there was a time when there was no overcommit at all.
If that's the case, understanding why overcommit was introduced would be helpful.

>> Anyway, in the embedded world the memory and system requirements are
>> usually controlled.
> 
> OK, but even when it is controlled does it suffer in any way just
> because of the default setting? Do you see OOM killer invocation
> when the overcommit would prevent from that?

I'll have to check those LTP tests again, I'll come back to this question later then.

>> Would you agree to the option if it was dependent on
>> CONFIG_EMBEDDED? Or if it was a hidden option?
>> (I understand though that it wouldn't affect the size of config space)
> 
> It could be done in the code and make the default depending on the
> existing config. But first try to think about what would be an advantage
> of such a change.

:) Well, right now I'm just trying to understand the history of this setting, because it is not obvious why it is good.

>>
>> Well, mostly the history of this setting, why it was introduced, etc.
>> more or less what we are discussing here.  Because honestly, killing
>> random processes does not seems like a straightforward idea, ie: it is
>> not obvious.  Like I was saying, without context, such behaviour looks
>> a bit crazy.
> 
> But we are not killing a random process. The semantic is quite clear. We
> are trying to kill the biggest memory hog and if it has some children
> try to sacrifice them to save as much work as possible.

Ok.
That's not the impression I have considering in my case it killed terminals and editors, but I'll try to get some examples.

>>
>> Well, a more urgent problem would be that in that case
>> overcommit=never is not really well tested.
> 
> This is a problem of the userspace and am really skeptical that a change
> in default would make any existing bugs going away. It is more likely we
> will see reports that ENOMEM has been returned even though there is
> pletny of memory available.
> 

Again, I did not propose to change the default.
The idea was just to allow setting the initial overcommit mode in the kernel without relying on the userspace.
(also beause it is still not yet clear why it is left to the userspace)

>>
>> Well, it's hard to report, since it is essentially the result of a
>> dynamic system.
> 
> Each oom killer invocation will provide a detailed report which will
> help MM developers to debug what went wrong and why.
> 

Ok.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:59             ` Mason
@ 2016-05-13 15:11               ` One Thousand Gnomes
  2016-05-13 15:26                 ` Michal Hocko
  2016-05-13 15:32                 ` Sebastian Frias
  0 siblings, 2 replies; 52+ messages in thread
From: One Thousand Gnomes @ 2016-05-13 15:11 UTC (permalink / raw)
  To: Mason
  Cc: Michal Hocko, Sebastian Frias, linux-mm, Andrew Morton,
	Linus Torvalds, LKML

> It seems important to point out that Sebastian's patch does NOT change
> the default behavior. It merely creates a knob allowing one to override
> the default via Kconfig.
> 
> +choice
> +	prompt "Overcommit Mode"
> +	default OVERCOMMIT_GUESS
> +	depends on EXPERT

Which is still completely pointless given that its a single sysctl value
set at early userspace time and most distributions ship with things like
sysctl and /etc/sysctl.conf

We have a million other such knobs, putting them in kconfig just gets
silly.

Alan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 14:35                 ` Sebastian Frias
  2016-05-13 14:54                   ` Michal Hocko
@ 2016-05-13 15:15                   ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 15:15 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 10:35, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/13/2016 03:51 PM, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 09:32, Sebastian Frias wrote:
>>>
>>>>>
>>>>> Well, it's hard to report, since it is essentially the result of a dynamic system.
>>>>> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
>>>>> Actually when it happened, I just turned overcommit off. I just checked and is on again on my desktop, probably forgot to make it a permanent setting.
>>>>>
>>>>> In the end, no processes is a good candidate for termination.
>>>>> What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
>>>>> I mean, all running processes are supposedly there and running for a reason.
>>>> OTOH, just because something is there for a reason doesn't mean it's doing what it's supposed to be.  Bugs happen, including memory leaks, and if something is misbehaving enough that it impacts the rest of the system, it really should be dealt with.
>>>
>>> Exactly, it's just that in this case, the system is deciding how to deal with the situation by itself.
>> On a busy server where uptime is critical, you can't wait for someone to notice and handle it manually, you need the issue resolved ASAP.  Now, this won't always kill the correct thing, but if it's due to a memory leak, it often will work like it should.
>
> The keyword is "'often' will work as expected".
> So you are saying that it will kill a program leaking memory in what, like 90% of the cases?
If the program leaking memory has the highest memory consumption, it 
will be the one that gets killed.  If not, then it will eventually be 
the one with the highest memory consumption and be killed (usually 
pretty quickly if it's leaking memory fast).
> I'm not sure if I would setup a server with critical uptime to have the OOM-killer enabled, do you think that'd be a good idea?
It really depends.  If you've got a setup with a bunch of web-servers 
behind a couple of load balancers which are set up in a HA 
configuration, I absolutely would run with the OOM killer enabled on 
everything.  There are in fact very few cases I wouldn't run with it 
enabled, as it's almost always better on a server to be able to actually 
log in to see what's wrong than to have to deal with resource exhaustion.

Most of the servers where I work are set to panic on OOM instead of 
killing something, because if we hit an OOM condition it's either a bug 
or a DoS attack, and either case needs to be noticed immediately, and 
taking out the entire system is the most reliable way to make sure it 
gets noticed.
>
> Anyway, as a side note, I just want to say thank you guys for having this discussion.
> I think it is an interesting thread and hopefully it will advance the "knowledge" about this setting.
>
>>>
>>>>
>>>> This brings to mind a complex bug involving Tor and GCC whereby building certain (old) versions of Tor with certain (old) versions of GCC with -Os would cause an infinite loop in GCC.  You obviously have GCC running for a reason, but that doesn't mean that it's doing what it should be.
>>>
>>> I'm not sure if I followed the analogy/example, but are you saying that the OOM-killer killed GCC in your example?
>>> This seems an odd example though, I mean, shouldn't the guy in front of the computer notice the loop and kill GCC by himself?
>> No, I didn't mean as an example of the OOM killer, I just meant as an example of software not doing what it should.  It's not as easy to find an example for the OOM killer, so I don't really have a good example. The general concept is the same though, the only difference is there isn't a kernel protection against infinite loops (because they aren't always bugs, while memory leaks and similar are).
>
> So how does the kernel knows that a process is "leaking memory" as opposed to just "using lots of memory"? (wouldn't that be comparable to answering how does the kernel knows the difference between an infinite loop and one that is not?)
It doesn't, it sees who's using the most RAM and kills that first.  If 
something is leaking memory, it will eventually kill that and you should 
have a working system again if you have process supervision.

There are three cases where it won't kill the task with the largest 
memory consumption:
1. You have /proc/sys/vm/panic_on_oom set to 1, which will cause the 
kernel to panic instead of killing a single task.
2. You have /proc/sys/vm/oom_kill_allocating_task set to 1, in which 
case it will kill whatever triggered the fault that caused the OOM 
condition.
3. You have adjusted the OOM score for tasks via /proc.  The score 
normally scales with memory usage, but it's possible to set it higher 
for specific tasks.  Many of the public distributed computing platforms 
(like BOINC) use this to cause their applications to be the first target 
for the OOM killer.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:01               ` One Thousand Gnomes
@ 2016-05-13 15:15                 ` Sebastian Frias
  2016-05-13 15:25                   ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 15:15 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Austin S. Hemmelgarn, Michal Hocko, Mason, linux-mm,
	Andrew Morton, Linus Torvalds, LKML

Hi Alan,

On 05/13/2016 05:01 PM, One Thousand Gnomes wrote:
> On Fri, 13 May 2016 15:34:52 +0200
> Sebastian Frias <sf84@laposte.net> wrote:
> 
>> Hi Austin,
>>
>> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>>> On 2016-05-13 08:39, Sebastian Frias wrote:  
>>>>
>>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.  
>>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.  
>>
>> By the way, why does it has to "kill" anything in that case?
>> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
> 
> Just turn off overcommit and it will do that. With overcommit disabled
> the kernel will not hand out address space in excess of memory plus swap.

I think I'm confused.
Michal just said:

   "And again, overcommit=never doesn't imply no-OOM. It just makes it less
likely. The kernel can consume quite some unreclaimable memory as well."

which I understand as the OOM-killer will still lurk around and could still wake up.

Will overcommit=never totally disable the OOM-Killer or not?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:15                 ` Sebastian Frias
@ 2016-05-13 15:25                   ` Michal Hocko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 15:25 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: One Thousand Gnomes, Austin S. Hemmelgarn, Mason, linux-mm,
	Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 17:15:26, Sebastian Frias wrote:
> Hi Alan,
> 
> On 05/13/2016 05:01 PM, One Thousand Gnomes wrote:
> > On Fri, 13 May 2016 15:34:52 +0200
> > Sebastian Frias <sf84@laposte.net> wrote:
> > 
> >> Hi Austin,
> >>
> >> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
> >>> On 2016-05-13 08:39, Sebastian Frias wrote:  
> >>>>
> >>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.  
> >>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.  
> >>
> >> By the way, why does it has to "kill" anything in that case?
> >> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
> > 
> > Just turn off overcommit and it will do that. With overcommit disabled
> > the kernel will not hand out address space in excess of memory plus swap.
> 
> I think I'm confused.
> Michal just said:
> 
>    "And again, overcommit=never doesn't imply no-OOM. It just makes it less
> likely. The kernel can consume quite some unreclaimable memory as well."
> 
> which I understand as the OOM-killer will still lurk around and could still wake up.
> 
> Will overcommit=never totally disable the OOM-Killer or not?

Please have a look at __vm_enough_memory and which allocations are
accounted. There are lots of those in kernel which are not accounted so
the OOM killer still might be invoked if there is an excessive in kernel
unreclaimable memory consumer.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:11               ` One Thousand Gnomes
@ 2016-05-13 15:26                 ` Michal Hocko
  2016-05-13 15:32                 ` Sebastian Frias
  1 sibling, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-13 15:26 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Mason, Sebastian Frias, linux-mm, Andrew Morton, Linus Torvalds, LKML

On Fri 13-05-16 16:11:04, One Thousand Gnomes wrote:
> > It seems important to point out that Sebastian's patch does NOT change
> > the default behavior. It merely creates a knob allowing one to override
> > the default via Kconfig.
> > 
> > +choice
> > +	prompt "Overcommit Mode"
> > +	default OVERCOMMIT_GUESS
> > +	depends on EXPERT
> 
> Which is still completely pointless given that its a single sysctl value
> set at early userspace time and most distributions ship with things like
> sysctl and /etc/sysctl.conf
> 
> We have a million other such knobs, putting them in kconfig just gets
> silly.

Exactly my point from the very begining. Thanks for being so direct
here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:11               ` One Thousand Gnomes
  2016-05-13 15:26                 ` Michal Hocko
@ 2016-05-13 15:32                 ` Sebastian Frias
  1 sibling, 0 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 15:32 UTC (permalink / raw)
  To: One Thousand Gnomes, Mason
  Cc: Michal Hocko, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Alan,

On 05/13/2016 05:11 PM, One Thousand Gnomes wrote:
>> It seems important to point out that Sebastian's patch does NOT change
>> the default behavior. It merely creates a knob allowing one to override
>> the default via Kconfig.
>>
>> +choice
>> +	prompt "Overcommit Mode"
>> +	default OVERCOMMIT_GUESS
>> +	depends on EXPERT
> 
> Which is still completely pointless given that its a single sysctl value
> set at early userspace time and most distributions ship with things like
> sysctl and /etc/sysctl.conf
> 

You are right, and I said that when the thread started, but I think most people here are looking at this from a server/desktop perspective.
Also, we wanted to have more background on this setting, its history, etc. thus this discussion.
It would be interesting in know what other people working on embedded systems think about this subject, because most examples given are for much bigger systems.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:04               ` One Thousand Gnomes
@ 2016-05-13 15:37                 ` Sebastian Frias
  2016-05-13 15:43                   ` One Thousand Gnomes
  2016-05-13 17:01                   ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-13 15:37 UTC (permalink / raw)
  To: One Thousand Gnomes, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Alan,

On 05/13/2016 05:04 PM, One Thousand Gnomes wrote:
>>> Perhaps Sebastian's choice could be made to depend on CONFIG_EMBEDDED,
>>> rather than CONFIG_EXPERT?  
>>
>> Even if the overcommit behavior is different on those systems the
>> primary question hasn't been answered yet. Why cannot this be done from
>> the userspace? In other words what wouldn't work properly?
> 
> Most allocations in C have no mechanism to report failure.
> 
> Stakc expansion failure is not reportable. Copy on write failure is not
> reportable and so on.

But wouldn't those affect a given process at at time?
Does that means that the OOM-killer is woken up to kill process X when those situations arise on process Y?

Also, under what conditions would copy-on-write fail?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:10             ` Sebastian Frias
@ 2016-05-13 15:41               ` One Thousand Gnomes
  2016-05-23 13:11                 ` Sebastian Frias
  0 siblings, 1 reply; 52+ messages in thread
From: One Thousand Gnomes @ 2016-05-13 15:41 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: Michal Hocko, Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

> My understanding is that there was a time when there was no overcommit at all.
> If that's the case, understanding why overcommit was introduced would be helpful.

Linux always had overcommit.

The origin of overcommit is virtual memory for the most part. In a
classic swapping system without VM the meaning of brk() and thus malloc()
is that it allocates memory (or swap). Likewise this is true of fork()
and stack extension.

In a virtual memory system these allocate _address space_. It does not
become populated except by page faulting, copy on write and the like. It
turns out that for most use cases on a virtual memory system we get huge
amounts of page sharing or untouched space.

Historically Linux did guess based overcommit and I added no overcommit
support way back when, along with 'anything is allowed' support for
certain HPC use cases.

The beancounter patches combined with this made the entire setup
completely robust but the beancounters never hit upstream although years
later they became part of the basis of the cgroups.

You can sort of set a current Linux up for definitely no overcommit using
cgroups and no overcommit settings. It works for most stuff although last
I checked most graphics drivers were terminally broken (and not just to
no overcommit but to the point you can remote DoS Linux boxes with a
suitably constructed web page and chrome browser)

Alan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:37                 ` Sebastian Frias
@ 2016-05-13 15:43                   ` One Thousand Gnomes
  2016-05-17  8:24                     ` Sebastian Frias
  2016-05-13 17:01                   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 52+ messages in thread
From: One Thousand Gnomes @ 2016-05-13 15:43 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: Michal Hocko, Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

> But wouldn't those affect a given process at at time?
> Does that means that the OOM-killer is woken up to kill process X when those situations arise on process Y?

Not sure I understand the question.

> Also, under what conditions would copy-on-write fail?

When you have no memory or swap pages free and you touch a COW page that
is currently shared. At that point there is no resource to back to the
copy so something must die - either the process doing the copy or
something else.

Alan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:37                 ` Sebastian Frias
  2016-05-13 15:43                   ` One Thousand Gnomes
@ 2016-05-13 17:01                   ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 17:01 UTC (permalink / raw)
  To: Sebastian Frias, One Thousand Gnomes, Michal Hocko
  Cc: Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

On 2016-05-13 11:37, Sebastian Frias wrote:
> Hi Alan,
>
> On 05/13/2016 05:04 PM, One Thousand Gnomes wrote:
>>>> Perhaps Sebastian's choice could be made to depend on CONFIG_EMBEDDED,
>>>> rather than CONFIG_EXPERT?
>>>
>>> Even if the overcommit behavior is different on those systems the
>>> primary question hasn't been answered yet. Why cannot this be done from
>>> the userspace? In other words what wouldn't work properly?
>>
>> Most allocations in C have no mechanism to report failure.
>>
>> Stakc expansion failure is not reportable. Copy on write failure is not
>> reportable and so on.
>
> But wouldn't those affect a given process at at time?
> Does that means that the OOM-killer is woken up to kill process X when those situations arise on process Y?
Barring memory cgroups, if you have hit an OOM condition, it impacts the 
entire system.  Some process other than the one which first hit the 
failure may get killed, but every process will fail allocations until 
the situation is resolved.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:43                   ` One Thousand Gnomes
@ 2016-05-17  8:24                     ` Sebastian Frias
  2016-05-17  8:57                       ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-17  8:24 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Michal Hocko, Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Alan,

On 05/13/2016 05:43 PM, One Thousand Gnomes wrote:
>> But wouldn't those affect a given process at at time?
>> Does that means that the OOM-killer is woken up to kill process X when those situations arise on process Y?
> 
> Not sure I understand the question.

I'm sorry for the "at at time" typo.
What I meant was that situations you described "Stakc expansion failure is not reportable. Copy on write failure is not reportable and so on.", should affect one process at the time, in that case:
1) either process X with the COW failure happens could die
2) either random process Y dies so that COW failure on process X can be handled.

Do you know why was 2) chosen over 1)?

> 
>> Also, under what conditions would copy-on-write fail?
> 
> When you have no memory or swap pages free and you touch a COW page that
> is currently shared. At that point there is no resource to back to the
> copy so something must die - either the process doing the copy or
> something else.

Exactly, and why does "killing something else" makes more sense (or was chosen over) "killing the process doing the copy"?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-17  8:24                     ` Sebastian Frias
@ 2016-05-17  8:57                       ` Michal Hocko
  2016-05-17 16:16                         ` Sebastian Frias
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-17  8:57 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML

On Tue 17-05-16 10:24:20, Sebastian Frias wrote:
[...]
> >> Also, under what conditions would copy-on-write fail?
> > 
> > When you have no memory or swap pages free and you touch a COW page that
> > is currently shared. At that point there is no resource to back to the
> > copy so something must die - either the process doing the copy or
> > something else.
> 
> Exactly, and why does "killing something else" makes more sense (or
> was chosen over) "killing the process doing the copy"?

Because that "something else" is usually a memory hog and so chances are
that the out of memory situation will get resolved. If you kill "process
doing the copy" then you might end up just not getting any memory back
because that might be a little forked process which doesn't own all that
much memory on its own. That would leave you in the oom situation for a
long time until somebody actually sitting on some memory happens to ask
for CoW... See the difference?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-10 11:56 [PATCH] mm: add config option to select the initial overcommit mode Sebastian Frias
  2016-05-10 12:00 ` Fwd: " Sebastian Frias
  2016-05-13  8:04 ` Michal Hocko
@ 2016-05-17  9:03 ` Mason
  2 siblings, 0 replies; 52+ messages in thread
From: Mason @ 2016-05-17  9:03 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: linux-mm, LKML

On 10/05/2016 13:56, Sebastian Frias wrote:

> Currently the initial value of the overcommit mode is OVERCOMMIT_GUESS.
> However, on embedded systems it is usually better to disable overcommit
> to avoid waking up the OOM-killer and its well known undesirable
> side-effects.

There is an interesting article on LWN:

Toward more predictable and reliable out-of-memory handling
https://lwn.net/Articles/668126/

Regards.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-17  8:57                       ` Michal Hocko
@ 2016-05-17 16:16                         ` Sebastian Frias
  2016-05-17 17:29                           ` Austin S. Hemmelgarn
  2016-05-17 20:16                           ` Michal Hocko
  0 siblings, 2 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-17 16:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

Hi Michal,

On 05/17/2016 10:57 AM, Michal Hocko wrote:
> On Tue 17-05-16 10:24:20, Sebastian Frias wrote:
> [...]
>>>> Also, under what conditions would copy-on-write fail?
>>>
>>> When you have no memory or swap pages free and you touch a COW page that
>>> is currently shared. At that point there is no resource to back to the
>>> copy so something must die - either the process doing the copy or
>>> something else.
>>
>> Exactly, and why does "killing something else" makes more sense (or
>> was chosen over) "killing the process doing the copy"?
> 
> Because that "something else" is usually a memory hog and so chances are
> that the out of memory situation will get resolved. If you kill "process
> doing the copy" then you might end up just not getting any memory back
> because that might be a little forked process which doesn't own all that
> much memory on its own. That would leave you in the oom situation for a
> long time until somebody actually sitting on some memory happens to ask
> for CoW... See the difference?
> 

I see the difference, your answer seems a bit like the one from Austin, basically:
- killing a process is a sort of kernel protection attempting to deal "automatically" with some situation, like deciding what is a 'memory hog', or what is 'in infinite loop', "usually" in a correct way.
It seems there's people who think its better to avoid having to take such decisions and/or they should be decided by the user, because "usually" != "always".
And people who see that as a nice thing but complex thing to do.
In this thread we've tried to explain why this heuristic (and/or OOM-killer) is/was needed and/or its history, which has been very enlightening by the way.

>From reading Documentation/cgroup-v1/memory.txt (and from a few replies here talking about cgroups), it looks like the OOM-killer is still being actively discussed, well, there's also "cgroup-v2".
My understanding is that cgroup's memory control will pause processes in a given cgroup until the OOM situation is solved for that cgroup, right?
If that is right, it means that there is indeed a way to deal with an OOM situation (stack expansion, COW failure, 'memory hog', etc.) in a better way than the OOM-killer, right?
In which case, do you guys know if there is a way to make the whole system behave as if it was inside a cgroup? (*)

Best regards,

Sebastian


(*): I tried setting up a simple test but failed, so I think I need more reading :-)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-17 16:16                         ` Sebastian Frias
@ 2016-05-17 17:29                           ` Austin S. Hemmelgarn
  2016-05-18 15:19                             ` Sebastian Frias
  2016-05-17 20:16                           ` Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-17 17:29 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

On 2016-05-17 12:16, Sebastian Frias wrote:
> Hi Michal,
>
> On 05/17/2016 10:57 AM, Michal Hocko wrote:
>> On Tue 17-05-16 10:24:20, Sebastian Frias wrote:
>> [...]
>>>>> Also, under what conditions would copy-on-write fail?
>>>>
>>>> When you have no memory or swap pages free and you touch a COW page that
>>>> is currently shared. At that point there is no resource to back to the
>>>> copy so something must die - either the process doing the copy or
>>>> something else.
>>>
>>> Exactly, and why does "killing something else" makes more sense (or
>>> was chosen over) "killing the process doing the copy"?
>>
>> Because that "something else" is usually a memory hog and so chances are
>> that the out of memory situation will get resolved. If you kill "process
>> doing the copy" then you might end up just not getting any memory back
>> because that might be a little forked process which doesn't own all that
>> much memory on its own. That would leave you in the oom situation for a
>> long time until somebody actually sitting on some memory happens to ask
>> for CoW... See the difference?
>>
>
> I see the difference, your answer seems a bit like the one from Austin, basically:
> - killing a process is a sort of kernel protection attempting to deal "automatically" with some situation, like deciding what is a 'memory hog', or what is 'in infinite loop', "usually" in a correct way.
> It seems there's people who think its better to avoid having to take such decisions and/or they should be decided by the user, because "usually" != "always".
FWIW, it's really easy to see what's using a lot of memory, it's 
impossible to tell if something is stuck in an infinite loop without 
looking deep into the process state and possibly even at the source code 
(and even then it can be almost impossible to be certain).  This is why 
we have a OOM-Killer, and not a infinite-loop-killer.

Again I reiterate, if a system is properly provisioned (that is, if you 
have put in enough RAM and possibly swap space to do what you want to 
use it for), the only reason the OOM-killer should be invoked is due to 
a bug.  The non-default overcommit options still have the same issues 
they just change how and when they happen (overcommit=never will fire 
sooner, overcommit=always will fire later), and also can impact memory 
allocation performance (I have numbers somewhere that I can't find right 
now that demonstrated that overcommit=never gave more deterministic and 
(on average) marginally better malloc() performance, and simple logic 
would suggest that overcommit=always would make malloc() perform better 
too).
> And people who see that as a nice thing but complex thing to do.
> In this thread we've tried to explain why this heuristic (and/or OOM-killer) is/was needed and/or its history, which has been very enlightening by the way.
>
> From reading Documentation/cgroup-v1/memory.txt (and from a few replies here talking about cgroups), it looks like the OOM-killer is still being actively discussed, well, there's also "cgroup-v2".
> My understanding is that cgroup's memory control will pause processes in a given cgroup until the OOM situation is solved for that cgroup, right?
> If that is right, it means that there is indeed a way to deal with an OOM situation (stack expansion, COW failure, 'memory hog', etc.) in a better way than the OOM-killer, right?
> In which case, do you guys know if there is a way to make the whole system behave as if it was inside a cgroup? (*)
No, not with the process freeze behavior, because getting the group 
running again requires input from an external part of the system, which 
by definition doesn't exist if the group is the entire system; and, 
because our GUI isn't built into the kernel, we can't pause things and 
pop up a little dialog asking the user what to do to resolve the issue.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-17 16:16                         ` Sebastian Frias
  2016-05-17 17:29                           ` Austin S. Hemmelgarn
@ 2016-05-17 20:16                           ` Michal Hocko
  2016-05-18 15:18                             ` Sebastian Frias
  1 sibling, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-17 20:16 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
[...]
> From reading Documentation/cgroup-v1/memory.txt (and from a few
> replies here talking about cgroups), it looks like the OOM-killer is
> still being actively discussed, well, there's also "cgroup-v2".
> My understanding is that cgroup's memory control will pause processes
> in a given cgroup until the OOM situation is solved for that cgroup,
> right?

It will be blocked waiting either for some external action which would
result in OOM codition going away or any other charge release. You have
to configure memcg for that though. The default behavior is to invoke
the same OOM killer algorithm which is just reduced to tasks from the
memcg (hierarchy).

> If that is right, it means that there is indeed a way to deal
> with an OOM situation (stack expansion, COW failure, 'memory hog',
> etc.) in a better way than the OOM-killer, right?
> In which case, do you guys know if there is a way to make the whole
> system behave as if it was inside a cgroup? (*)

No it is not. You have to realize that the system wide and the memcg OOM
situations are quite different. There is usually quite some memory free
when you hit the memcg OOM so the administrator can actually do
something. The global OOM means there is _no_ memory at all. Many kernel
operations will need some memory to do something useful. Let's say you
would want to do an educated guess about who to kill - most proc APIs
will need to allocate. And this is just a beginning. Things are getting
really nasty when you get deeper and deeper. E.g. the OOM killer has to
give the oom victim access to memory reserves so that the task can exit
because that path needs to allocate as well. So even if you wanted to
give userspace some chance to resolve the OOM situation you would either
need some special API to tell "this process is really special and it can
access memory reserves and it has an absolute priority etc." or have a
in kernel fallback to do something or your system could lockup really
easily.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-17 20:16                           ` Michal Hocko
@ 2016-05-18 15:18                             ` Sebastian Frias
  2016-05-19  7:14                               ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-18 15:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

Hi Michal,

On 05/17/2016 10:16 PM, Michal Hocko wrote:
> On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
> [...]
>> From reading Documentation/cgroup-v1/memory.txt (and from a few
>> replies here talking about cgroups), it looks like the OOM-killer is
>> still being actively discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes
>> in a given cgroup until the OOM situation is solved for that cgroup,
>> right?
> 
> It will be blocked waiting either for some external action which would
> result in OOM codition going away or any other charge release. You have
> to configure memcg for that though. The default behavior is to invoke
> the same OOM killer algorithm which is just reduced to tasks from the
> memcg (hierarchy).

Ok, I see, thanks!

> 
>> If that is right, it means that there is indeed a way to deal
>> with an OOM situation (stack expansion, COW failure, 'memory hog',
>> etc.) in a better way than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole
>> system behave as if it was inside a cgroup? (*)
> 
> No it is not. You have to realize that the system wide and the memcg OOM
> situations are quite different. There is usually quite some memory free
> when you hit the memcg OOM so the administrator can actually do
> something. 

Ok, so it works like the 5% reserved for 'root' on filesystems?

>The global OOM means there is _no_ memory at all. Many kernel
> operations will need some memory to do something useful. Let's say you
> would want to do an educated guess about who to kill - most proc APIs
> will need to allocate. And this is just a beginning. Things are getting
> really nasty when you get deeper and deeper. E.g. the OOM killer has to
> give the oom victim access to memory reserves so that the task can exit
> because that path needs to allocate as well. 

Really? I would have thought that once that SIGKILL is sent, the victim process is not expected to do anything else and thus its memory could be claimed immediately.
Or the OOM-killer is more of a OOM-terminator? (i.e.: sends SIGTERM)

>So even if you wanted to
> give userspace some chance to resolve the OOM situation you would either
> need some special API to tell "this process is really special and it can
> access memory reserves and it has an absolute priority etc." or have a
> in kernel fallback to do something or your system could lockup really
> easily.
> 

I see, so basically at least two cgroups would be needed, one reserved for handling the OOM situation through some API and another for the "rest of the system".
Basically just like the 5% reserved for 'root' on filesystems.
Do you think that would work?

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-17 17:29                           ` Austin S. Hemmelgarn
@ 2016-05-18 15:19                             ` Sebastian Frias
  2016-05-18 16:28                               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 52+ messages in thread
From: Sebastian Frias @ 2016-05-18 15:19 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Michal Hocko
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

Hi Austin,

On 05/17/2016 07:29 PM, Austin S. Hemmelgarn wrote:
>> I see the difference, your answer seems a bit like the one from Austin, basically:
>> - killing a process is a sort of kernel protection attempting to deal "automatically" with some situation, like deciding what is a 'memory hog', or what is 'in infinite loop', "usually" in a correct way.
>> It seems there's people who think its better to avoid having to take such decisions and/or they should be decided by the user, because "usually" != "always".
> FWIW, it's really easy to see what's using a lot of memory, it's impossible to tell if something is stuck in an infinite loop without looking deep into the process state and possibly even at the source code (and even then it can be almost impossible to be certain).  This is why we have a OOM-Killer, and not a infinite-loop-killer.
> 
> Again I reiterate, if a system is properly provisioned (that is, if you have put in enough RAM and possibly swap space to do what you want to use it for), the only reason the OOM-killer should be invoked is due to a bug. 

Are you sure that's the only possible reason?
I mean, what if somebody keeps opening tabs on Firefox?
If malloc() returned NULL maybe Firefox could say "hey, you have too many tabs open, please close some to free memory".

> The non-default overcommit options still have the same issues they just change how and when they happen (overcommit=never will fire sooner, overcommit=always will fire later), and also can impact memory allocation performance (I have numbers somewhere that I can't find right now that demonstrated that overcommit=never gave more deterministic and (on average) marginally better malloc() performance, and simple logic would suggest that overcommit=always would make malloc() perform better too).
>> And people who see that as a nice thing but complex thing to do.
>> In this thread we've tried to explain why this heuristic (and/or OOM-killer) is/was needed and/or its history, which has been very enlightening by the way.
>>
>> From reading Documentation/cgroup-v1/memory.txt (and from a few replies here talking about cgroups), it looks like the OOM-killer is still being actively discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes in a given cgroup until the OOM situation is solved for that cgroup, right?
>> If that is right, it means that there is indeed a way to deal with an OOM situation (stack expansion, COW failure, 'memory hog', etc.) in a better way than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole system behave as if it was inside a cgroup? (*)
> No, not with the process freeze behavior, because getting the group running again requires input from an external part of the system, which by definition doesn't exist if the group is the entire system; 

Do you mean that it pauses all processes in the cgroup?
I thought it would pause on a case-by-case basis, like first process to reach the limit gets paused, and so on.

Honestly I thought it would work a bit like the filesystems, where 'root' usually has 5% reserved, so that a process (or processes) filling the disk does not disrupt the system to the point of preventing 'root' from performing administrative actions.

That makes me think, why is disk space handled differently than memory in this case? I mean, why is disk space exhaustion handled differently than memory exhaustion?
We could imagine that both resources are required for proper system and process operation, so if OOM-killer is there to attempt to keep the system working at all costs (even if that means sacrificing processes), why isn't there an OOFS-killer (out-of-free-space killer)?

>and, because our GUI isn't built into the kernel, we can't pause things and pop up a little dialog asking the user what to do to resolve the issue.

:-) Yeah, I was thinking that could be handled with the cgroups' notification system + the reserved space (like on filesystems)
Maybe I was too optimistic (naive or just plain ignorant) about this.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-18 15:19                             ` Sebastian Frias
@ 2016-05-18 16:28                               ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 52+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-18 16:28 UTC (permalink / raw)
  To: Sebastian Frias, Michal Hocko
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

On 2016-05-18 11:19, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/17/2016 07:29 PM, Austin S. Hemmelgarn wrote:
>>> I see the difference, your answer seems a bit like the one from Austin, basically:
>>> - killing a process is a sort of kernel protection attempting to deal "automatically" with some situation, like deciding what is a 'memory hog', or what is 'in infinite loop', "usually" in a correct way.
>>> It seems there's people who think its better to avoid having to take such decisions and/or they should be decided by the user, because "usually" != "always".
>> FWIW, it's really easy to see what's using a lot of memory, it's impossible to tell if something is stuck in an infinite loop without looking deep into the process state and possibly even at the source code (and even then it can be almost impossible to be certain).  This is why we have a OOM-Killer, and not a infinite-loop-killer.
>>
>> Again I reiterate, if a system is properly provisioned (that is, if you have put in enough RAM and possibly swap space to do what you want to use it for), the only reason the OOM-killer should be invoked is due to a bug.
>
> Are you sure that's the only possible reason?
> I mean, what if somebody keeps opening tabs on Firefox?
> If malloc() returned NULL maybe Firefox could say "hey, you have too many tabs open, please close some to free memory".
That's an application issue, and I'm pretty sure that most browsers do 
mention this.  That also falls within normal usage for a desktop system 
(somewhat, if you're opening more than a few dozen tabs, you're asking 
for trouble for other reasons too).
>
>> The non-default overcommit options still have the same issues they just change how and when they happen (overcommit=never will fire sooner, overcommit=always will fire later), and also can impact memory allocation performance (I have numbers somewhere that I can't find right now that demonstrated that overcommit=never gave more deterministic and (on average) marginally better malloc() performance, and simple logic would suggest that overcommit=always would make malloc() perform better too).
>>> And people who see that as a nice thing but complex thing to do.
>>> In this thread we've tried to explain why this heuristic (and/or OOM-killer) is/was needed and/or its history, which has been very enlightening by the way.
>>>
>>> From reading Documentation/cgroup-v1/memory.txt (and from a few replies here talking about cgroups), it looks like the OOM-killer is still being actively discussed, well, there's also "cgroup-v2".
>>> My understanding is that cgroup's memory control will pause processes in a given cgroup until the OOM situation is solved for that cgroup, right?
>>> If that is right, it means that there is indeed a way to deal with an OOM situation (stack expansion, COW failure, 'memory hog', etc.) in a better way than the OOM-killer, right?
>>> In which case, do you guys know if there is a way to make the whole system behave as if it was inside a cgroup? (*)
>> No, not with the process freeze behavior, because getting the group running again requires input from an external part of the system, which by definition doesn't exist if the group is the entire system;
>
> Do you mean that it pauses all processes in the cgroup?
> I thought it would pause on a case-by-case basis, like first process to reach the limit gets paused, and so on.
>
> Honestly I thought it would work a bit like the filesystems, where 'root' usually has 5% reserved, so that a process (or processes) filling the disk does not disrupt the system to the point of preventing 'root' from performing administrative actions.
>
> That makes me think, why is disk space handled differently than memory in this case? I mean, why is disk space exhaustion handled differently than memory exhaustion?
> We could imagine that both resources are required for proper system and process operation, so if OOM-killer is there to attempt to keep the system working at all costs (even if that means sacrificing processes), why isn't there an OOFS-killer (out-of-free-space killer)?
There are actually sysctl's for this, vm/{admin,user}_reserve_kbytes. 
The admin one is system-wide and provides a reserve for users with 
CAP_SYS_ADMIN.  The user one is per-process and prevents a process from 
allocating beyond a specific point, and is intended for overcommit=never 
mode.

That said, there are a couple of reasons that disk space and memory are 
handled differently:
1. The kernel needs RAM to function, it does not need disk space to 
function.  In other words, if we have no free RAM, the system is 
guaranteed to be unusable, but if we have no disk space, the system may 
or may not still be usable.
2. Freeing disk space is usually an easy decision for the user, figuring 
out what to kill to free RAM is not.
3. Most end users have at least a basic understanding of disk space 
being finite, while they don't necessarily have a similar understanding 
of memory being finite (note that I'm not talking about sysadmins and 
similar, I"m talking about people's grandmothers, and people who have no 
low-level background with computers, and people like some of my friends 
who still have trouble understanding the difference between memory and 
persistent storage)
>
>> and, because our GUI isn't built into the kernel, we can't pause things and pop up a little dialog asking the user what to do to resolve the issue.
>
> :-) Yeah, I was thinking that could be handled with the cgroups' notification system + the reserved space (like on filesystems)
> Maybe I was too optimistic (naive or just plain ignorant) about this.
Ideally, we would have something that could check against some watermark 
and notify like Windows does when virtual memory is getting low (most 
people never see this, because they let windows manage the page file, 
which means it just gleefully allocates whatever it needs on disk).  I 
don't know of a way to do that right now without polling though, and 
that level of inefficiency should ideally be avoided.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-18 15:18                             ` Sebastian Frias
@ 2016-05-19  7:14                               ` Michal Hocko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-19  7:14 UTC (permalink / raw)
  To: Sebastian Frias
  Cc: One Thousand Gnomes, Mason, linux-mm, Andrew Morton,
	Linus Torvalds, LKML, bsingharora

On Wed 18-05-16 17:18:45, Sebastian Frias wrote:
> Hi Michal,
> 
> On 05/17/2016 10:16 PM, Michal Hocko wrote:
> > On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
[...]
> > The global OOM means there is _no_ memory at all. Many kernel
> > operations will need some memory to do something useful. Let's say you
> > would want to do an educated guess about who to kill - most proc APIs
> > will need to allocate. And this is just a beginning. Things are getting
> > really nasty when you get deeper and deeper. E.g. the OOM killer has to
> > give the oom victim access to memory reserves so that the task can exit
> > because that path needs to allocate as well. 
> 
> Really? I would have thought that once that SIGKILL is sent, the
> victim process is not expected to do anything else and thus its
> memory could be claimed immediately.  Or the OOM-killer is more of a
> OOM-terminator? (i.e.: sends SIGTERM)

Well, the path to exit is not exactly trivial. Resources have to be
released and that requires memory sometimes. E.g. exit_robust_list
needs to access the futex and that in turn means a page fault if the
memory was swapped out...
 
> >So even if you wanted to
> > give userspace some chance to resolve the OOM situation you would either
> > need some special API to tell "this process is really special and it can
> > access memory reserves and it has an absolute priority etc." or have a
> > in kernel fallback to do something or your system could lockup really
> > easily.
> > 
> 
> I see, so basically at least two cgroups would be needed, one reserved
> for handling the OOM situation through some API and another for the
> "rest of the system".  Basically just like the 5% reserved for 'root'
> on filesystems.

If you want to handle memcg OOM then you can use memory.oom_control (see
Documentation/cgroup-v1/memory.txt for more information) and have the
oom handler outside of that memcg.

> Do you think that would work?

But handling the _global_ oom from userspace is just insane with the
current kernel implementation. It just cannot work reliably.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] mm: add config option to select the initial overcommit mode
  2016-05-13 15:41               ` One Thousand Gnomes
@ 2016-05-23 13:11                 ` Sebastian Frias
  0 siblings, 0 replies; 52+ messages in thread
From: Sebastian Frias @ 2016-05-23 13:11 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Michal Hocko, Mason, linux-mm, Andrew Morton, Linus Torvalds, LKML

Hi Alan,

On 05/13/2016 05:41 PM, One Thousand Gnomes wrote:
>> My understanding is that there was a time when there was no overcommit at all.
>> If that's the case, understanding why overcommit was introduced would be helpful.
> 
> Linux always had overcommit.
> 
> The origin of overcommit is virtual memory for the most part. In a
> classic swapping system without VM the meaning of brk() and thus malloc()
> is that it allocates memory (or swap). Likewise this is true of fork()
> and stack extension.
> 
> In a virtual memory system these allocate _address space_. It does not
> become populated except by page faulting, copy on write and the like. It
> turns out that for most use cases on a virtual memory system we get huge
> amounts of page sharing or untouched space.
> 
> Historically Linux did guess based overcommit and I added no overcommit
> support way back when, along with 'anything is allowed' support for
> certain HPC use cases.
> 
> The beancounter patches combined with this made the entire setup
> completely robust but the beancounters never hit upstream although years
> later they became part of the basis of the cgroups.
> 
> You can sort of set a current Linux up for definitely no overcommit using
> cgroups and no overcommit settings. It works for most stuff although last
> I checked most graphics drivers were terminally broken (and not just to
> no overcommit but to the point you can remote DoS Linux boxes with a
> suitably constructed web page and chrome browser)
> 
> Alan
> 

Thanks for your comment, it certainly provides more clues and provided some history about the "overcommit" setting.
I will see if we can do what we want with cgroups.

Best regards,

Sebastian

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-05-23 13:11 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-10 11:56 [PATCH] mm: add config option to select the initial overcommit mode Sebastian Frias
2016-05-10 12:00 ` Fwd: " Sebastian Frias
2016-05-10 12:39   ` Andy Whitcroft
2016-05-10 13:02     ` Sebastian Frias
2016-05-13  8:04 ` Michal Hocko
2016-05-13  8:44   ` Mason
2016-05-13  9:52     ` Michal Hocko
2016-05-13 10:18       ` Mason
2016-05-13 10:42         ` Sebastian Frias
2016-05-13 11:44         ` Michal Hocko
2016-05-13 12:15           ` Mason
2016-05-13 14:01             ` Michal Hocko
2016-05-13 14:15               ` Sebastian Frias
2016-05-13 15:04               ` One Thousand Gnomes
2016-05-13 15:37                 ` Sebastian Frias
2016-05-13 15:43                   ` One Thousand Gnomes
2016-05-17  8:24                     ` Sebastian Frias
2016-05-17  8:57                       ` Michal Hocko
2016-05-17 16:16                         ` Sebastian Frias
2016-05-17 17:29                           ` Austin S. Hemmelgarn
2016-05-18 15:19                             ` Sebastian Frias
2016-05-18 16:28                               ` Austin S. Hemmelgarn
2016-05-17 20:16                           ` Michal Hocko
2016-05-18 15:18                             ` Sebastian Frias
2016-05-19  7:14                               ` Michal Hocko
2016-05-13 17:01                   ` Austin S. Hemmelgarn
2016-05-13 13:27         ` Austin S. Hemmelgarn
2016-05-13  9:52     ` Sebastian Frias
2016-05-13 12:00       ` Michal Hocko
2016-05-13 12:39         ` Sebastian Frias
2016-05-13 13:11           ` Austin S. Hemmelgarn
2016-05-13 13:32             ` Sebastian Frias
2016-05-13 13:51               ` Austin S. Hemmelgarn
2016-05-13 14:35                 ` Sebastian Frias
2016-05-13 14:54                   ` Michal Hocko
2016-05-13 15:15                   ` Austin S. Hemmelgarn
2016-05-13 13:34             ` Sebastian Frias
2016-05-13 14:14               ` Austin S. Hemmelgarn
2016-05-13 14:23                 ` Sebastian Frias
2016-05-13 15:02                   ` Austin S. Hemmelgarn
2016-05-13 15:01               ` One Thousand Gnomes
2016-05-13 15:15                 ` Sebastian Frias
2016-05-13 15:25                   ` Michal Hocko
2016-05-13 14:51           ` Michal Hocko
2016-05-13 14:59             ` Mason
2016-05-13 15:11               ` One Thousand Gnomes
2016-05-13 15:26                 ` Michal Hocko
2016-05-13 15:32                 ` Sebastian Frias
2016-05-13 15:10             ` Sebastian Frias
2016-05-13 15:41               ` One Thousand Gnomes
2016-05-23 13:11                 ` Sebastian Frias
2016-05-17  9:03 ` Mason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).