Thomas Gleixner wrote:
> On Tue, 2004-11-23 at 08:41 +0100, Martin MOKREJŠ wrote: 
> 
>>>One big problem when killing the requesting process or just sending
>>>ENOMEM to the requesting process is, that exactly this process might be
>>>an ssh login, when you try to log into the machine after some application
>>>went crazy and ate up most of the memory. The result is that you
>>>_cannot_ log into the machine, because the login is either killed or
>>>cannot start because it receives ENOMEM.
>>
>>I believe the application is the _first_ one to get ENOMEM. It must be
>>terrible luck that it would ask exactly for the size of remaining free
>>memory. Most probably, it will ask for less or more. "Less" is not
>>a problem in this case, so consider it asks for more. Then, OOM killer
>>might well expect the application asking for memory is most probably
>>exactly the application which caused the trouble.
> 
> 
> For a single application which eats up all memory, the 2.4 ENOMEM
> behaviour works.
> 
> The scenario which made one of my boxes unusable under 2.4 is a forking
> server, which gets out of control. The last fork gets ENOMEM and does
> not happen, but the other forked processes are still there and consuming
> memory. The server application does the correct thing. It receives
> ENOMEM on fork() and cancels the connection request. On the next request
> the game starts again. Somebody notices that the box is not responding
> anymore and tries to log in via ssh. Guess what happens. ssh login cannot
> fork due to ENOMEM. The same will happen on 2.6 if we make it behave
> like 2.4. 
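
A minimal sketch of the forking-server scenario above, in Python for brevity. The names accept_request and failing_fork are hypothetical, and the fork function is passed in only so that the ENOMEM path can be exercised without actually exhausting memory; a real server would simply call fork() directly.

```python
import errno
import os


def accept_request(fork=os.fork):
    """Try to fork a worker for one connection request.

    Returns "accepted" in the parent when the fork succeeded and
    "rejected" when fork failed with ENOMEM. Other errors propagate.
    """
    try:
        pid = fork()
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            # The server does the "correct" thing and drops this request.
            # Workers forked earlier keep running and keep their memory.
            return "rejected"
        raise
    if pid == 0:
        # Child: a real worker would serve the connection here.
        os._exit(0)
    os.waitpid(pid, 0)  # parent reaps the finished worker
    return "accepted"


def failing_fork():
    # Stand-in for fork() on a box that has run out of memory.
    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))


print(accept_request(failing_fork))
```

Rejecting the request frees nothing, so every later request takes the same path, and so does an ssh login that needs to fork.
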
> 
> We have TWO problems in oom handling:
> 
> 1. When do we trigger the out of memory killer
> 
> As far as my test cases go, 2.6.10-rc2-mm3 no longer triggers the
> oom killer without reason.
> 
> 2. Which process do we select to kill
> 
> The decision has been screwed up ever since the oom killer was
> introduced. Also the reentrancy problem and some of the mechanisms in
> the out_of_memory function have to be modified to make it work.
> That's what my patch is addressing.
> 
> 
>>>Hard coded decisions like "prefer sshd, xyz,..." or "don't kill
>>>a, b, c" are out of the question.
>>
>>I'd go for it at least nowadays.
> 
> 
> Sure, you can do so on your box, but can you accept that we _CANNOT_
> hard code a list of do-not-kill apps, except init, into the kernel? I
> don't want to see the mail thread on LKML where the list of precious
> applications is discussed.
> 
> 
>>> 
>>>The proposed ideas of letting a process set a "don't kill me" or
>>>"yes, I'm a candidate" flag are likely to be the way to go in future.
>>>But at the moment we have no way to make this work in current userlands.
>>
>>Do you think login or sshd will ever use flag "yes, I'm a candidate"?
>>I think we get exactly the same behaviour right now with those hard coded
>>decisions you mention above. Otherwise the hard coded decision is programmed
>>into every sshd, init instance anyway. I think it's not necessary to put
>>login and shells on this ban list, the user will log in again. ;)
> 
> 
> Having a generic interface to make this configurable is the only way to
> go. So users can decide what is important in their environment. There is
> more than a desktop PC environment and a lot of embedded boxes need to
> protect special applications.
> 
> 
>>>I refined the decision, so it no longer kills the parent if there
>>>are forked child processes available to kill. So it should now keep
>>>your bash alive.
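
The refinement described above can be pictured roughly as follows. This is only a Python sketch of the idea, not the patch itself: the Proc record, its badness score and pick_victim are made up for illustration, and the real selection in out_of_memory works on kernel task structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proc:
    name: str
    badness: int  # hypothetical score, higher means a better kill candidate
    children: List["Proc"] = field(default_factory=list)


def pick_victim(candidate: Proc) -> Proc:
    """Return the process to kill once `candidate` has been selected.

    If the candidate has forked children, the worst child is killed
    instead of the parent; only a childless candidate is killed itself.
    """
    if candidate.children:
        return max(candidate.children, key=lambda p: p.badness)
    return candidate


# A shell that started one memory hog and one harmless command.
bash = Proc("bash", 10, [Proc("hog", 900), Proc("ls", 1)])
print(pick_victim(bash).name)
```

With these made-up scores the hog is chosen and the parent shell survives.
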
>>
>>Yes, it doesn't kill parent bash. I don't understand the _doubled_ output
>>in syslog, but maybe you do. Is that related to hyperthreading? ;)
>>Tested on 2.6.10-rc2-mm2.
> 
> 
>>oom-killer: gfp_mask=0xd2
>>Free pages:        3924kB (112kB HighMem)
> 
> 
>>oom-killer: gfp_mask=0x1d2
>>Free pages:        3924kB (112kB HighMem)
> 
> 
> No, it's not related to hyperthreading. It's on the way out. 
> 
> I put an additional check into the page allocator. Does this help ?

The application got killed. But take a look at the stacktrace yourself ... ;)