Re: [PATCH] mm: add config option to select the initial overcommit mode

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Sebastian Frias <sf84@laposte.net>, Michal Hocko <mhocko@kernel.org>
Cc: Mason <slash.tmp@free.fr>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: add config option to select the initial overcommit mode
Date: Fri, 13 May 2016 11:02:23 -0400	[thread overview]
Message-ID: <a3fd524c-be46-8848-830d-259b544a0ab6@gmail.com> (raw)
In-Reply-To: <5735E372.1090609@laposte.net>

On 2016-05-13 10:23, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/13/2016 04:14 PM, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 09:34, Sebastian Frias wrote:
>>> Hi Austin,
>>>
>>> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>>>> On 2016-05-13 08:39, Sebastian Frias wrote:
>>>>>
>>>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, ie: a way that is less random and less abrupt.
>>>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic.  This is about as deterministic as things can get though.
>>>
>>> By the way, why does it has to "kill" anything in that case?
>>> I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
>> In theory, that's a great idea.  In practice though, it only works if:
>> 1. The allocating task correctly handles malloc() (or whatever other function it uses) returning NULL, which a number of programs don't.
>> 2. The task actually has fallback options for memory limits.  Many programs that do handle getting a NULL pointer from malloc() handle it by exiting anyway, so there's not as much value in this case.
>> 3. There isn't a memory leak somewhere on the system.  Killing the allocating task doesn't help much if this is the case of course.
>
> Well, the thing is that the current behaviour, i.e.: overcommiting, does not improves the quality of those programs.
> I mean, what incentive do they have to properly handle situations 1, 2?
Overcommit got introduced because of these, not the other way around. 
It's not forcing them to change, but it's also a core concept in any 
modern virtual memory based OS, and that's not ever going to change either.

You also have to keep in mind that most apps aren't doing this 
intentionally.  There are three general reasons they do this:
1. They don't know how much memory they will need, so they guess high 
because malloc() is computationally expensive.  This is technically 
intentional, but it's also something that can't be avoided in some cases 
  Dropbox is a perfect example of this taken way too far (they also take 
the concept of a thread pool too far).
2. The program has a lot of code that isn't frequently run.  It makes no 
sense to keep code that isn't used in RAM, so it gets either dropped (if 
it's unmodified), or it gets swapped out.  Most of the programs that I 
see on my system fall into this category (acpid  for example just sleeps 
until an ACPI event happens, so it usually won't have most of it's code 
in memory on a busy system).
3. The application wants to do it's own memory management.  This is 
common on a lot of HPC apps and some high performance server software.
>
> Also, if there's a memory leak, the termination of any task, whether it is the allocating task or something random, does not help either, the system will eventually go down, right?
If the memory leak is in the kernel, then yes, the OOM killer won't 
help, period.  But if the memory leak is in userspace, and the OOM 
killer kills the task with the leak (which it usually will if you don't 
have it set to kill the allocating task), then it may have just saved 
the system from crashing completely.  Yes some user may lose some 
unsaved work, but they would lose that data anyway if the system 
crashes, and they can probably still use the rest of the system.
>> You have to keep in mind though, that on a properly provisioned system, the only situations where the OOM killer should be invoked are when there's a memory leak, or when someone is intentionally trying to DoS the system through memory exhaustion.
>
> Exactly, the DoS attack is another reason why the OOM-killer does not seem a good idea, at least compared to just letting malloc return NULL and let the program fail.
Because of overcommit, it's possible for the allocation to succeed, but 
the subsequent access to fail.  At that point, you're way past malloc() 
returning, and you have to do something.

Also, returning NULL on a failed malloc() provides zero protection 
against all but the most brain-dead memory exhaustion based DoS attacks. 
  The general core of a memory exhaustion DoS against a local system 
follows a simple three step procedure:
     1. Try to allocate a small chunk of memory (less than or equal to 
page size)
     2. If the allocation succeeded, write to the first byte of that 
chunk of memory, forcing actual allocation
     3. Repeat indefinitely from step 1
Step 2 is the crucial part here, if you don't write to the memory, it 
will only eat up your own virtual address space.  If you don't check for 
a NULL pointer and skip writing, you get a segfault.  If the OOM killer 
isn't invoked in such a situation, then this will just eat up all the 
free system memory, and then _keep running_ and eat up all the other 
memory as it's freed by other things exiting due to lack of memory.