* oom killer in 2.4.23 @ 2003-12-04 16:12 Peter Bergmann 2003-12-04 17:02 ` Maciej Zenczykowski 0 siblings, 1 reply; 19+ messages in thread From: Peter Bergmann @ 2003-12-04 16:12 UTC (permalink / raw) To: linux-kernel

I would appreciate it if someone could answer the following question:

Apparently the oom killer has been removed from 2.4.23. The result is _bad_ in my special environment (the X server gets killed instead of the application).

I'm sure you had very good reasons for removing the oom killer. Nevertheless, I've seen that oom_kill.c is still in mm/ but disabled with #if 0, and that out_of_memory() is no longer called from vmscan.c.

Is it sufficient to remove the #if 0 from oom_kill.c, call out_of_memory() from vmscan.c again, and add PF_MEMDIE to sched.h in order to get the "old" behaviour? Or will this result in - I don't know - something horrible?

Thanks for your help!

cheers,
pet

(Please cc me as I'm not (yet) subscribed.)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 16:12 oom killer in 2.4.23 Peter Bergmann @ 2003-12-04 17:02 ` Maciej Zenczykowski 2003-12-04 17:20 ` Guillermo Menguez Alvarez 2003-12-04 18:33 ` Peter Bergmann 0 siblings, 2 replies; 19+ messages in thread From: Maciej Zenczykowski @ 2003-12-04 17:02 UTC (permalink / raw) To: Peter Bergmann; +Cc: linux-kernel Yes, and as a side question, couldn't oom killer be made into a config option? Cheers, MaZe. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 17:02 ` Maciej Zenczykowski @ 2003-12-04 17:20 ` Guillermo Menguez Alvarez 2003-12-04 23:52 ` Andrea Arcangeli 2003-12-04 18:33 ` Peter Bergmann 1 sibling, 1 reply; 19+ messages in thread From: Guillermo Menguez Alvarez @ 2003-12-04 17:20 UTC (permalink / raw) To: Maciej Zenczykowski; +Cc: linux-kernel > Yes, and as a side question, couldn't oom killer be made into a config > option? As I see in the ChangeLog: aa VM merge: page reclaiming logic changes: Kills oom killer OOM Killer has been removed due to AA VM changes, so maybe it can't be cleanly enabled again. Regards, Guillermo. -- Usuario Linux #212057 - Maquinas Linux #98894, #130864 y #168988 Proyecto LONIX: http://lonix.sourceforge.net Lagrimas en la Lluvia: http://www.lagrimasenlalluvia.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 17:20 ` Guillermo Menguez Alvarez @ 2003-12-04 23:52 ` Andrea Arcangeli 0 siblings, 0 replies; 19+ messages in thread From: Andrea Arcangeli @ 2003-12-04 23:52 UTC (permalink / raw) To: Guillermo Menguez Alvarez; +Cc: Maciej Zenczykowski, linux-kernel

On Thu, Dec 04, 2003 at 06:20:16PM +0100, Guillermo Menguez Alvarez wrote:
> > Yes, and as a side question, couldn't oom killer be made into a config
> > option?
>
> As I see in the ChangeLog:
>
> aa VM merge: page reclaiming logic changes: Kills oom killer
>
> OOM Killer has been removed due to AA VM changes, so maybe it can't be
> cleanly enabled again.

it can be re-enabled without too much pain if you can accept the desktop behaviour of 2.4.22 and earlier, which is not suitable for servers.

the oom killer had deadlocks and it was relying on very inaccurate accounting, so it had a number of corner cases where it was killing tasks by mistake (it's fooled by shm/mlock/noswap etc..); read also the bug reports for 2.4.22 with tasks being killed because there was no swap in the box (or just try to run your machine w/o swap; swap is not a must, it's a wish).

Fixing those in 2.4 sounds too complicated, and now it's too late to even hope to make a proper oom killer for 2.4. For the record, 2.2 was capable of checking iopl to defer the killing of the X server a few times; that wasn't forward ported to 2.4.

For 2.6 we can do something better than all the past oom killers at least. 2.6 gets fooled by mlock too btw, randomly kills tasks etc.. so it's not much better than 2.4.22 was in the oom killing respect.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 17:02 ` Maciej Zenczykowski 2003-12-04 17:20 ` Guillermo Menguez Alvarez @ 2003-12-04 18:33 ` Peter Bergmann 2003-12-04 18:42 ` Jens Axboe 1 sibling, 1 reply; 19+ messages in thread From: Peter Bergmann @ 2003-12-04 18:33 UTC (permalink / raw) To: Maciej Zenczykowski; +Cc: linux-kernel

>
> Yes, and as a side question, couldn't oom killer be made into a config
> option?
>
> Cheers,
> MaZe.

I just tried it and - no, it does not work. At least not with the following changes:

added #define PF_MEMDIE 0x00001000 to sched.h

replaced oom_kill.c with the 2.4.22 version

added out_of_memory() to the end of try_to_free_pages_zone()

replaced
  if (current->flags & PF_MEMALLOC && !in_interrupt()) {
with
  if ((current->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
in page_alloc.c

The effect is still unchanged: processes get killed by the VM and not by oom_kill.c.

Any hints?

^ permalink raw reply [flat|nested] 19+ messages in thread
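[Editorial sketch] The page_alloc.c condition described in the message above can be modelled in user space. This is only an illustration under stated assumptions: the two flag values are the ones quoted in this thread, while flag_check_passes() and its in_irq parameter are hypothetical stand-ins for the kernel's current->flags test and in_interrupt(). It says nothing about what the kernel does once the condition holds.

```c
#include <assert.h>

/* Flag values as quoted in this thread (include/linux/sched.h). */
#define PF_MEMALLOC 0x00000800UL /* Allocating memory */
#define PF_MEMDIE   0x00001000UL /* Killed for out-of-memory */

/*
 * Hypothetical user-space model of the modified condition:
 *   (current->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()
 * `flags` stands in for current->flags, `in_irq` for in_interrupt().
 * Returns 1 when the condition holds, 0 otherwise.
 */
int flag_check_passes(unsigned long flags, int in_irq)
{
	assert(in_irq == 0 || in_irq == 1);
	return (flags & (PF_MEMALLOC | PF_MEMDIE)) != 0 && !in_irq;
}
```

With the extra PF_MEMDIE bit in the mask, a task that carries only PF_MEMDIE now satisfies the test, which is the point of the change to the mask.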
* Re: oom killer in 2.4.23 2003-12-04 18:33 ` Peter Bergmann @ 2003-12-04 18:42 ` Jens Axboe 2003-12-04 20:38 ` Peter Bergmann 0 siblings, 1 reply; 19+ messages in thread From: Jens Axboe @ 2003-12-04 18:42 UTC (permalink / raw) To: Peter Bergmann; +Cc: Maciej Zenczykowski, linux-kernel On Thu, Dec 04 2003, Peter Bergmann wrote: > > > > Yes, and as a side question, couldn't oom killer be made into a config > > option? > > > > Cheers, > > MaZe. > > I just tried it and - no it does not work. > At least not with the following changes: > > added #define PF_MEMDIE 0x00001000 to sched.h > > replaced oom_kill.c with the 2.4.22 version > > added out_of_memory() to the end of try_to_free_pages_zone() > > replaced if (current->flags & PF_MEMALLOC && !in_interrupt()) { > with > replaced if ((current->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt() > ) { > in page_alloc.c > > > effect is still unchanged. > processes get killed by VM and not oom_kikll.c > > any hints ?? You probably want to look at the change to vmscan.c:try_to_free_pages_zone(). -- Jens Axboe ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 18:42 ` Jens Axboe @ 2003-12-04 20:38 ` Peter Bergmann 2003-12-04 20:28 ` Szakacsits Szabolcs 2003-12-04 23:58 ` Andrea Arcangeli 0 siblings, 2 replies; 19+ messages in thread From: Peter Bergmann @ 2003-12-04 20:38 UTC (permalink / raw) To: Jens Axboe; +Cc: maze, linux-kernel

> > effect is still unchanged.
> > processes get killed by VM and not oom_kill.c
> >
> > any hints ??
>
> You probably want to look at the change to
> vmscan.c:try_to_free_pages_zone().
>
> --
> Jens Axboe

I did, but my VM knowledge is rather limited. I don't really know _where_ to place out_of_memory() in the new try_to_free_pages_zone(), or what other changes would be necessary in vmscan.c.

My trial-and-error approach did not succeed.

I would be really glad if someone (aa maybe :) could provide the information on where/how to place the call for a custom (or the old) oom killer - if it's really that simple ...

cheers,
pet

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 20:38 ` Peter Bergmann @ 2003-12-04 20:28 ` Szakacsits Szabolcs 2003-12-04 23:58 ` Andrea Arcangeli 1 sibling, 0 replies; 19+ messages in thread From: Szakacsits Szabolcs @ 2003-12-04 20:28 UTC (permalink / raw) To: Peter Bergmann; +Cc: Jens Axboe, maze, linux-kernel On Thu, 4 Dec 2003, Peter Bergmann wrote: > I would be really glad if someone (aa may be :) could > provide the information where/how to place the call for a custom > (or the old) oom killer - if it's really that simple ... In the 2.2 backport I called it from the page fault handler, http://mlf.linux.rulez.org/mlf/ezaz/reserved_root_memory.html It worked fine for 2.2 but I don't know the current 2.4 VM state (without the oom killer it's just running amok, as you experienced). Szaka ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-04 20:38 ` Peter Bergmann 2003-12-04 20:28 ` Szakacsits Szabolcs @ 2003-12-04 23:58 ` Andrea Arcangeli 1 sibling, 0 replies; 19+ messages in thread From: Andrea Arcangeli @ 2003-12-04 23:58 UTC (permalink / raw) To: Peter Bergmann; +Cc: Jens Axboe, maze, linux-kernel On Thu, Dec 04, 2003 at 09:38:28PM +0100, Peter Bergmann wrote: > > > effect is still unchanged. > > > processes get killed by VM and not oom_kikll.c > > > > > > any hints ?? > > > > You probably want to look at the change to > > vmscan.c:try_to_free_pages_zone(). > > > > -- > > Jens Axboe > > I did, but my vm knolege is rather limited. > I don't really know really know _where_ to place > out_of_memory() in the new try_to_free_pages_zone()... > and what other changes would be necessary in vmscan.c. > > My try & error approach did not succeed. > > I would be really glad if someone (aa may be :) could > provide the information where/how to place the call for a custom > (or the old) oom killer - if it's really that simple ... it's that simple to reenable it in 2.4.22 status, so if you're ok to deadlock. 2.4.23 can't deadlock, it can live lock if you're unlucky with timings yes (think if you add 32G of swap and your ram runs at 1k/sec instead of 1G/sec), but not deadlock and it won't random kill tasks even if it shouldn't to. deadlock is a bug, killing task despite there's ram free is a bug, livelock is something you can avoid by dropping all swap. if you drop all swap with 2.4.22 it'll go nuts killing tasks (see the bugreports). Since doing it right wasn't possible in 2.4, I dropped it years ago, -aa users are w/o an oom killer for years and I never heard a single complain. somebody asked why yes, but they were happy afterwards. I don't think I asked Marcelo to merge it, I explained why I dropped it, people sent him bugreports about the oom killer going nuts, and he agreed my solution was the best short term w/o adding lots of effort to make the oom killer right. 
Note the oom killer goes nuts in 2.6 too, nobody did it right yet, that's why I don't think it's a 2.4 issue. Marcelo asked me to make it configurable at runtime so you could go into the deadlock-prone state of 2.4.22 on demand, but I'm not going to add more features to 2.4 today unless they're blocker bugs (even if that would be simple to implement); actually it's not even my choice, so don't ask me for that, sorry.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 [not found] ` <Zbyn-23P-29@gated-at.bofh.it> @ 2003-12-05 13:05 ` Kristian Peters 2003-12-05 13:56 ` Robert L. Harris ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Kristian Peters @ 2003-12-05 13:05 UTC (permalink / raw) To: Andrea Arcangeli, lkml Andrea Arcangeli <andrea@suse.de> schrieb: > Marcelo asked me to to make it configurable at runtime so you could go > in the deadlock prone stautus of 2.4.22 on demand, but I'm not going to > add more features to 2.4 today unless they're blocker bugs (even if that > would be simple to implement), actually it's not even my choice so don't > ask me for that sorry. Andrea, your vm does not work correctly in any cases. It's so simple. I've tried to fill up my memory with that crappy khexedit that comes with kde2. You'll see how my memory fills with the contents of the whole file I load. When I have started 2 or 3 instances of khexedit my memory was nearly completely filled. Than I tried to start another khexedit (with a file that should nearly fit into memory), and the pain began. 
See:

Dec 5 13:33:52 adlib kernel: __alloc_pages: 2-order allocation failed (gfp=0x1f0/0)
Dec 5 13:33:52 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:33:59 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:33:59 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:34:00 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 5 13:34:01 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:34:02 adlib last message repeated 3 times

-------> kernel killed wmfire without saying

Dec 5 13:34:02 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:34:03 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:34:18 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:34:18 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:34:22 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:34:22 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:34:28 adlib last message repeated 3 times

-------> kernel killed xosview without mentioning

Dec 5 13:34:29 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:34:29 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:34:32 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:34:41 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 5 13:34:41 adlib kernel: VM: killing process khexedit

-------> that was intended

Ok.
That still is acceptable but when I tried to start mozilla, it got even worse:

Dec 5 13:37:26 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:37:27 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 5 13:37:27 adlib kernel: VM: killing process mozilla-bin
Dec 5 13:37:27 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:37:27 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:37:28 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:37:30 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:37:30 adlib last message repeated 2 times
Dec 5 13:37:30 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:37:40 adlib last message repeated 3 times
Dec 5 13:37:56 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 5 13:37:56 adlib kernel: VM: killing process mozilla-bin

-------> that was intended too
-------> but not the killing of another xosview and aterm

Dec 5 13:37:57 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:37:57 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 5 13:37:58 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 5 13:40:32 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 5 13:40:32 adlib kernel: VM: killing process XFree86

-------> ouch ...

Ok. Could you please describe what your vm really does here in my specific case ?
Rick's old vm worked better. It'd have killed the task that had last allocated memory.

PS: If you need more details it should be no problem to do this again.

*Kristian

  _o)
  /\\
 _\_V

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-05 13:05 ` Kristian Peters @ 2003-12-05 13:56 ` Robert L. Harris 2003-12-05 19:58 ` Andrea Arcangeli 2003-12-05 22:38 ` Mike Fedyk 2 siblings, 0 replies; 19+ messages in thread From: Robert L. Harris @ 2003-12-05 13:56 UTC (permalink / raw) To: Kristian Peters; +Cc: Andrea Arcangeli, lkml [-- Attachment #1: Type: text/plain, Size: 2511 bytes --] Thus spake Kristian Peters (kristian.peters@korseby.net): > Andrea Arcangeli <andrea@suse.de> schrieb: > > Marcelo asked me to to make it configurable at runtime so you could go > > in the deadlock prone stautus of 2.4.22 on demand, but I'm not going to > > add more features to 2.4 today unless they're blocker bugs (even if that > > would be simple to implement), actually it's not even my choice so don't > > ask me for that sorry. > > Andrea, your vm does not work correctly in any cases. > > It's so simple. I've tried to fill up my memory with that crappy khexedit that comes with kde2. You'll see how my memory fills with the contents of the whole file I load. When I have started 2 or 3 instances of khexedit my memory was nearly completely filled. Than I tried to start another khexedit (with a file that should nearly fit into memory), and the pain began. > > See: > > Dec 5 13:33:52 adlib kernel: __alloc_pages: 2-order allocation failed (gfp=0x1f0/0) > Dec 5 13:33:52 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0) > Dec 5 13:33:59 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0) > Dec 5 13:33:59 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0) <snip> > Dec 5 13:37:57 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0) > Dec 5 13:37:58 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0) > Dec 5 13:40:32 adlib kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) > Dec 5 13:40:32 adlib kernel: VM: killing process XFree86 > > -------> ouch ... > > > Ok. 
Could you please describe what your vm really does here in my specific case ? > Rick's old vm worked better. It'd have killed the task that had last allocated memory. > > PS: If you need more details it should be no problem to do this again. > > I'm see'ing similar except it's killing random apps such as CRON, named and some others. How far do I have to roll back to get the previous oomkiller? Trying to roll 2.4.23 out. :wq! --------------------------------------------------------------------------- Robert L. Harris | GPG Key ID: E344DA3B @ x-hkp://pgp.mit.edu DISCLAIMER: These are MY OPINIONS ALONE. I speak for no-one else. Life is not a destination, it's a journey. Microsoft produces 15 car pileups on the highway. Don't stop traffic to stand and gawk at the tragedy. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-05 13:05 ` Kristian Peters 2003-12-05 13:56 ` Robert L. Harris @ 2003-12-05 19:58 ` Andrea Arcangeli 2003-12-06 9:31 ` Kristian Peters 2003-12-05 22:38 ` Mike Fedyk 2 siblings, 1 reply; 19+ messages in thread From: Andrea Arcangeli @ 2003-12-05 19:58 UTC (permalink / raw) To: Kristian Peters; +Cc: lkml

On Fri, Dec 05, 2003 at 02:05:20PM +0100, Kristian Peters wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> > Marcelo asked me to to make it configurable at runtime so you could go
> > in the deadlock prone stautus of 2.4.22 on demand, but I'm not going to
> > add more features to 2.4 today unless they're blocker bugs (even if that
> > would be simple to implement), actually it's not even my choice so don't
> > ask me for that sorry.
>
> Andrea, your vm does not work correctly in any cases.

what you're complaining about is the 'selection of the task to be killed'. That's not solvable. the kernel can't read your brain, period. Only if the kernel could read the brain of the administrator would you be happy; there is no way the kernel can know which task you really want to have killed first.

> It's so simple. I've tried to fill up my memory with that crappy
> khexedit that comes with kde2. You'll see how my memory fills with the
> contents of the whole file I load. When I have started 2 or 3
> instances of khexedit my memory was nearly completely filled. Than I
> tried to start another khexedit (with a file that should nearly fit
> into memory), and the pain began.

The kernel can't know what is a pain for you and what is a pain for other people. Measuring the page fault rate seems to be the closest heuristic that may not be a pain for most people. all current oom killers are a pain for somebody: desktop users were pretty much fine with 2.4.22, server users had pain with 2.4.22 and should have less pain with 2.4.23. There is no way to make everybody happy in 2.4.

> Rick's old vm worked better.
It'd have killed the task that had last allocated memory.

it was the biggest one, that's why. the old oom killer always gets your desktop scenario right, true (as long as your biggest task doesn't get stuck on nfs etc..)

> PS: If you need more details it should be no problem to do this again.

I'm aware of those issues; that's a feature, not a bug.

^ permalink raw reply [flat|nested] 19+ messages in thread
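[Editorial sketch] To make "it was the biggest one" concrete, here is a toy victim-selection heuristic in user-space C. It is not the code from mm/oom_kill.c: the struct, the scoring rule (size in pages, discounted for long runtime, doubled for niced tasks) and the sample numbers are all assumptions invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task summary; not a kernel structure. */
struct toy_task {
	const char *name;
	unsigned long vm_pages;  /* size of the task in pages */
	unsigned long runtime_s; /* how long it has been running, in seconds */
	int nice;                /* nice level, > 0 means the task is niced */
};

/* Made-up score: bigger is worse, long runtime helps, niced tasks count double. */
unsigned long toy_badness(const struct toy_task *t)
{
	unsigned long points, hours;

	assert(t != NULL);
	points = t->vm_pages;
	hours = t->runtime_s / 3600;
	if (hours > 8)
		hours = 8;       /* cap the runtime discount */
	points >>= hours;        /* halve once per full hour of runtime */
	if (t->nice > 0)
		points *= 2;
	return points;
}

/* Return the task with the highest score, or NULL for an empty table. */
const struct toy_task *toy_pick_victim(const struct toy_task *tasks, size_t n)
{
	const struct toy_task *worst = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (worst == NULL || toy_badness(&tasks[i]) > toy_badness(worst))
			worst = &tasks[i];
	return worst;
}

/* Invented numbers, loosely echoing the desktop scenario in this thread. */
static const struct toy_task demo_tasks[] = {
	{ "XFree86",  30000, 7200, 0 }, /* 30000 >> 2 = 7500 */
	{ "khexedit", 50000,   60, 0 }, /* 50000, no discount yet */
	{ "xosview",    800, 7200, 0 }, /* 800 >> 2 = 200 */
};
```

With these made-up inputs the freshly started, largest task is selected, which matches the desktop outcome described above. The same rule would just as readily pick a large, recently restarted service on a server.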
* Re: oom killer in 2.4.23 2003-12-05 19:58 ` Andrea Arcangeli @ 2003-12-06 9:31 ` Kristian Peters 2003-12-09 14:21 ` Andrea Arcangeli 0 siblings, 1 reply; 19+ messages in thread From: Kristian Peters @ 2003-12-06 9:31 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Robert L. Harris Andrea Arcangeli <andrea@suse.de> schrieb: > what you're complaining is the 'selection of the task to be killed'. > That's not solvable. the kernel can't read your brain period. Only if > the kernel could read the brain of the adminstrator then you would be > happy, there is no way the kernel can know which is the task you really > want to have killed first. I agree. On a server the most likely application to be killed would be the service with the most pages in memory. And those services tend to be the important ones. However, for a simple desktop-linux that statistical approach seems to be wrong. Your vm has even killed /sbin/getty sometimes, so that I can't login via a simple console. Re-enabling the oom-killer gives a good result for me: Dec 6 09:46:19 adlib kernel: Out of Memory: Killed process 643 (khexedit). Dec 6 09:48:42 adlib kernel: Out of Memory: Killed process 645 (khexedit). What I complain is that your vm kills some processes without mentioning in the logs. How can I determine what processes the kernel has killed ? I hope that fairly simple patch does things right for all people that want the old behaviour. It's already a year ago I last hacked on the kernel. 
diff -rauN linux-2.4.23/include/linux/sched.h linux-2.4.23-kp1/include/linux/sched.h
--- linux-2.4.23/include/linux/sched.h Fri Nov 28 19:26:21 2003
+++ linux-2.4.23-kp1/include/linux/sched.h Sat Dec 6 09:57:04 2003
@@ -429,6 +429,7 @@
 #define PF_DUMPCORE 0x00000200 /* dumped core */
 #define PF_SIGNALED 0x00000400 /* killed by a signal */
 #define PF_MEMALLOC 0x00000800 /* Allocating memory */
+#define PF_MEMDIE 0x00001000 /* Killed for out-of-memory */
 #define PF_FREE_PAGES 0x00002000 /* per process page freeing */
 #define PF_NOIO 0x00004000 /* avoid generating further I/O */

diff -rauN linux-2.4.23/mm/oom_kill.c linux-2.4.23-kp1/mm/oom_kill.c
--- linux-2.4.23/mm/oom_kill.c Fri Nov 28 19:26:21 2003
+++ linux-2.4.23-kp1/mm/oom_kill.c Fri Dec 5 20:31:39 2003
@@ -21,8 +21,6 @@
 #include <linux/swapctl.h>
 #include <linux/timex.h>

-#if 0 /* Nothing in this file is used */
-
 /* #define DEBUG */

 /**
@@ -257,5 +255,3 @@
 	first = now;
 	count = 0;
 }
-
-#endif /* Unused file */
diff -rauN linux-2.4.23/mm/vmscan.c linux-2.4.23-kp1/mm/vmscan.c
--- linux-2.4.23/mm/vmscan.c Fri Nov 28 19:26:21 2003
+++ linux-2.4.23-kp1/mm/vmscan.c Sat Dec 6 10:21:55 2003
@@ -649,13 +649,7 @@
 			failed_swapout = !swap_out(classzone);
 		} while (--tries);

-		if (likely(current->pid != 1))
-			break;
-		if (!check_classzone_need_balance(classzone))
-			break;
-
-		__set_current_state(TASK_RUNNING);
-		yield();
+		out_of_memory();
 	}

 	return 0;

*Kristian

  _o)
  /\\
 _\_V

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23 2003-12-06 9:31 ` Kristian Peters @ 2003-12-09 14:21 ` Andrea Arcangeli 2003-12-09 14:52 ` Richard B. Johnson 0 siblings, 1 reply; 19+ messages in thread From: Andrea Arcangeli @ 2003-12-09 14:21 UTC (permalink / raw) To: Kristian Peters; +Cc: linux-kernel, Robert L. Harris On Sat, Dec 06, 2003 at 10:31:43AM +0100, Kristian Peters wrote: > Andrea Arcangeli <andrea@suse.de> schrieb: > > what you're complaining is the 'selection of the task to be killed'. > > That's not solvable. the kernel can't read your brain period. Only if > > the kernel could read the brain of the adminstrator then you would be > > happy, there is no way the kernel can know which is the task you really > > want to have killed first. > > I agree. On a server the most likely application to be killed would be the service with the most pages in memory. And those services tend to be the important ones. > > However, for a simple desktop-linux that statistical approach seems to > be wrong. Your vm has even killed /sbin/getty sometimes, so that I > can't login via a simple console. I see this problem, for most desktop users the biggest task plus some other random bit like the nicelevel and length of runtime etc.. actually result in a reasonable estimate of the best task to kill. > Re-enabling the oom-killer gives a good result for me: > > Dec 6 09:46:19 adlib kernel: Out of Memory: Killed process 643 (khexedit). > Dec 6 09:48:42 adlib kernel: Out of Memory: Killed process 645 (khexedit). > > What I complain is that your vm kills some processes without > mentioning in the logs. How can I determine what processes the kernel > has killed ? no this is is not the case. It is always mentioned in the logs. Grep for VM and you'll find all of them, no difference from before. 
I agree it would be very bad if the admin couldn't know what was killed exactly to restart everything properly (with the bad app under rlimit the next time ;) > I hope that fairly simple patch does things right for all people that > want the old behaviour. It's already a year ago I last hacked on the > kernel. > > diff -rauN linux-2.4.23/include/linux/sched.h linux-2.4.23-kp1/include/linux/sched.h > --- linux-2.4.23/include/linux/sched.h Fri Nov 28 19:26:21 2003 > +++ linux-2.4.23-kp1/include/linux/sched.h Sat Dec 6 09:57:04 2003 > @@ -429,6 +429,7 @@ > #define PF_DUMPCORE 0x00000200 /* dumped core */ > #define PF_SIGNALED 0x00000400 /* killed by a signal */ > #define PF_MEMALLOC 0x00000800 /* Allocating memory */ > +#define PF_MEMDIE 0x00001000 /* Killed for out-of-memory */ > #define PF_FREE_PAGES 0x00002000 /* per process page freeing */ > #define PF_NOIO 0x00004000 /* avoid generating further I/O */ > > diff -rauN linux-2.4.23/mm/oom_kill.c linux-2.4.23-kp1/mm/oom_kill.c > --- linux-2.4.23/mm/oom_kill.c Fri Nov 28 19:26:21 2003 > +++ linux-2.4.23-kp1/mm/oom_kill.c Fri Dec 5 20:31:39 2003 > @@ -21,8 +21,6 @@ > #include <linux/swapctl.h> > #include <linux/timex.h> > > -#if 0 /* Nothing in this file is used */ > - > /* #define DEBUG */ > > /** > @@ -257,5 +255,3 @@ > first = now; > count = 0; > } > - > -#endif /* Unused file */ > diff -rauN linux-2.4.23/mm/vmscan.c linux-2.4.23-kp1/mm/vmscan.c > --- linux-2.4.23/mm/vmscan.c Fri Nov 28 19:26:21 2003 > +++ linux-2.4.23-kp1/mm/vmscan.c Sat Dec 6 10:21:55 2003 > @@ -649,13 +649,7 @@ > failed_swapout = !swap_out(classzone); > } while (--tries); > > - if (likely(current->pid != 1)) > - break; > - if (!check_classzone_need_balance(classzone)) > - break; > - > - __set_current_state(TASK_RUNNING); > - yield(); > + out_of_memory(); > } > > return 0; this should go back to the 2.4.22 deadlock prone oom killer yes. 
you can also leave the yield() in the main loop (after out_of_memory()); __set_current_state(TASK_RUNNING) isn't needed these days, a plain yield() is safe enough.

^ permalink raw reply [flat|nested] 19+ messages in thread
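[Editorial sketch] The message above mentions restarting "the bad app under rlimit". One way to do that from a small launcher is the standard getrlimit()/setrlimit() pair with RLIMIT_AS. The helper names cap_address_space() and soft_address_space_limit() are hypothetical, and whether an address-space cap is the right limit for a given application is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Lower the soft address-space limit of the calling process to at most
 * `bytes`. Returns 0 on success, -1 on error (errno is set by libc).
 */
int cap_address_space(rlim_t bytes)
{
	struct rlimit rl;

	assert(bytes > 0);
	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return -1;
	/* The soft limit may never exceed the hard limit, so clamp it. */
	if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
		bytes = rl.rlim_max;
	rl.rlim_cur = bytes;
	return setrlimit(RLIMIT_AS, &rl);
}

/* Read back the current soft limit; RLIM_INFINITY if it cannot be read. */
rlim_t soft_address_space_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return RLIM_INFINITY;
	return rl.rlim_cur;
}
```

Resource limits are inherited across fork() and exec, so a launcher would call cap_address_space() and then exec the application. Allocations beyond the cap then fail inside that application instead of pushing the whole machine into oom.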
* Re: oom killer in 2.4.23 2003-12-09 14:21 ` Andrea Arcangeli @ 2003-12-09 14:52 ` Richard B. Johnson 2003-12-09 17:06 ` Andrea Arcangeli 0 siblings, 1 reply; 19+ messages in thread From: Richard B. Johnson @ 2003-12-09 14:52 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Kristian Peters, linux-kernel, Robert L. Harris On Tue, 9 Dec 2003, Andrea Arcangeli wrote: > On Sat, Dec 06, 2003 at 10:31:43AM +0100, Kristian Peters wrote: > > Andrea Arcangeli <andrea@suse.de> schrieb: > > > what you're complaining is the 'selection of the task to be killed'. > > > That's not solvable. the kernel can't read your brain period. Only if > > > the kernel could read the brain of the adminstrator then you would be > > > happy, there is no way the kernel can know which is the task you really > > > want to have killed first. > > > > I agree. On a server the most likely application to be killed would be the service with the most pages in memory. And those services tend to be the important ones. > > > > However, for a simple desktop-linux that statistical approach seems to > > be wrong. Your vm has even killed /sbin/getty sometimes, so that I > > can't login via a simple console. > Not true! Killing a getty or agetty or any other login program cannot prevent a login! Init will just start another one if there is enough RAM to fork()/exec(). If there isn't, it has nothing to do with killing the getties and everything to do with being completely out of RAM. In that case, you need to enable quotas and not let fork-bombs or other deliberate crashers be executed by root. 
Script started on Tue Dec 9 09:42:06 2003
# ps
  PID TTY STAT  TIME COMMAND
  478   3 S     0:00 /sbin/agetty 38400 tty3
  479   4 S     0:00 /sbin/agetty 38400 tty4
  480   5 S     0:00 /sbin/agetty 38400 tty5
  481   6 S     0:00 /sbin/agetty 38400 tty6
 4404   1 S     0:00 -bash
 6102   2 S     0:00 -bash
 8772   1 S     0:00 pine
 8778   2 S     0:00 script
 8779   2 S     0:00 script
 8780  p0 S     0:00 bash -i
 8784  p0 R     0:00 ps
# killall -9 agetty
# ps
  PID TTY STAT  TIME COMMAND
 4404   1 S     0:00 -bash
 6102   2 S     0:00 -bash
 8772   1 S     0:00 pine
 8778   2 S     0:00 script
 8779   2 S     0:00 script
 8780  p0 S     0:00 bash -i
 8787   3 S     0:00 /sbin/agetty 38400 tty3
 8788   4 S     0:00 /sbin/agetty 38400 tty4
 8789   5 S     0:00 /sbin/agetty 38400 tty5
 8790   6 S     0:00 /sbin/agetty 38400 tty6
 8791  p0 R     0:00 ps
# exit
exit
Script done on Tue Dec 9 09:42:41 2003

The above shows 4 getty's being killed. They are immediately restarted by init. These getties become new shell-tasks in the following order: exec /sbin/login, exec /bin/bash. There are no additional forks so, in principle, you have nearly all the resources needed for a shell as soon as a getty is started.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

^ permalink raw reply [flat|nested] 19+ messages in thread
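[Editorial sketch] The respawn behaviour shown in the transcript above can be imitated with fork() and waitpid(). This is a simplified model, not init's actual code: fake_getty() is a hypothetical child that exits at once (standing in for a getty that was just killed), and respawn_loop() is an invented name. A failing fork() corresponds to the "not enough RAM to fork()/exec()" case mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a getty that gets killed: the child just exits. */
static void fake_getty(void)
{
	_exit(0);
}

/*
 * Start a child, wait until it dies, then start a replacement, up to
 * max_starts times. Returns the number of children started, or -1 if
 * fork() or waitpid() fails.
 */
int respawn_loop(int max_starts)
{
	int started = 0;

	assert(max_starts >= 0);
	while (started < max_starts) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;      /* could not create a new process */
		if (pid == 0)
			fake_getty();   /* child: never returns */
		started++;
		if (waitpid(pid, NULL, 0) < 0)
			return -1;
	}
	return started;
}
```

A real supervisor would exec the login program in the child and loop for as long as the system is up; the bounded loop here only keeps the sketch finite.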
* Re: oom killer in 2.4.23 2003-12-09 14:52 ` Richard B. Johnson @ 2003-12-09 17:06 ` Andrea Arcangeli 2003-12-09 18:50 ` Kristian Peters 0 siblings, 1 reply; 19+ messages in thread From: Andrea Arcangeli @ 2003-12-09 17:06 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Kristian Peters, linux-kernel, Robert L. Harris On Tue, Dec 09, 2003 at 09:52:13AM -0500, Richard B. Johnson wrote: > On Tue, 9 Dec 2003, Andrea Arcangeli wrote: > > > On Sat, Dec 06, 2003 at 10:31:43AM +0100, Kristian Peters wrote: > > > Andrea Arcangeli <andrea@suse.de> schrieb: > > > > what you're complaining is the 'selection of the task to be killed'. > > > > That's not solvable. the kernel can't read your brain period. Only if > > > > the kernel could read the brain of the adminstrator then you would be > > > > happy, there is no way the kernel can know which is the task you really > > > > want to have killed first. > > > > > > I agree. On a server the most likely application to be killed would be the service with the most pages in memory. And those services tend to be the important ones. > > > > > > However, for a simple desktop-linux that statistical approach seems to > > > be wrong. Your vm has even killed /sbin/getty sometimes, so that I > > > can't login via a simple console. > > > > Not true! Killing a getty or agetty or any other login > program cannot prevent a login! Init will just start another killing getty is a very very lucky scenario indeed. the way I read it was that agetty can hardly be a mem eater, and in turn the same way agetty was killed, it could have been ssh or X to be killed. That's true, but the same issues on the desktop will happen if you have no swap if you enable the old oom killer, the vm will go nuts even if you don't run oom, and there will be all other sort of troubles mentioned a few times already. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: oom killer in 2.4.23
From: Kristian Peters @ 2003-12-09 18:50 UTC
To: Andrea Arcangeli; +Cc: root, linux-kernel, Robert.L.Harris

Andrea Arcangeli <andrea@suse.de> wrote:
> killing getty is a very, very lucky scenario indeed. the way I read it,
> agetty can hardly be a mem eater, and in the same way agetty was
> killed, it could have been ssh or X that got killed instead. That's
> true, but the same issues will happen on the desktop with the old oom
> killer enabled if you have no swap: the vm will go nuts even if you
> don't run out of memory, and there will be all the other sorts of
> trouble mentioned a few times already.

I think getty was respawned, but the console was screwed up somehow.
The screen was all black, as with a broken framebuffer, and I couldn't
type anything (not even the "blind" way). Only X was there.

Thanks for all your answers.

*Kristian
* Re: oom killer in 2.4.23
From: Mike Fedyk @ 2003-12-05 22:38 UTC
To: Kristian Peters; +Cc: Andrea Arcangeli, lkml

On Fri, Dec 05, 2003 at 02:05:20PM +0100, Kristian Peters wrote:
> Dec 5 13:34:41 adlib kernel: VM: killing process khexedit
> Dec 5 13:37:27 adlib kernel: VM: killing process mozilla-bin
> Dec 5 13:37:56 adlib kernel: VM: killing process mozilla-bin
> Dec 5 13:40:32 adlib kernel: VM: killing process XFree86

This is with 2.4.23?

Why is the VM killing anything if the oom-killer is removed?
* Re: oom killer in 2.4.23
From: Andrea Arcangeli @ 2003-12-05 22:56 UTC
To: Kristian Peters, lkml

On Fri, Dec 05, 2003 at 02:38:25PM -0800, Mike Fedyk wrote:
> On Fri, Dec 05, 2003 at 02:05:20PM +0100, Kristian Peters wrote:
> > Dec 5 13:34:41 adlib kernel: VM: killing process khexedit
> > Dec 5 13:37:27 adlib kernel: VM: killing process mozilla-bin
> > Dec 5 13:37:56 adlib kernel: VM: killing process mozilla-bin
> > Dec 5 13:40:32 adlib kernel: VM: killing process XFree86
>
> This is with 2.4.23?
>
> Why is the VM killing anything if the oom-killer is removed?

the 2.4.23 kernel will kill the task that triggered the oom condition.
it has to kill something, of course, and the task that triggered the oom
during the page fault is the only one we can easily kill synchronously,
which in turn guarantees that the machine won't deadlock in oom.

the oom killer normally means the heuristic that chooses a particular
task to kill instead of the one that triggered the oom condition. But
choosing a different task, and not the one that triggered the oom in the
page fault, isn't mathematically safe w.r.t. deadlocks.
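The difference between the two victim-selection policies described above can be shown with a toy model. This is only an illustrative sketch, not kernel code: the `Task` class, the page counts, and both function names are made up, and "largest resident set" is a deliberately simplified stand-in for the old heuristic, not the scoring that mm/oom_kill.c actually used.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    rss_pages: int  # resident pages; hypothetical numbers for illustration


def victim_faulting_task(tasks, faulting):
    """Policy described for 2.4.23: kill whichever task hit the oom."""
    return faulting


def victim_heuristic(tasks, faulting):
    """Toy heuristic (an assumption, not the real scoring): largest RSS."""
    return max(tasks, key=lambda t: t.rss_pages)


tasks = [
    Task("XFree86", 12000),
    Task("mozilla-bin", 30000),
    Task("agetty", 150),
]
# Assume the small task happens to be the one whose page fault fails.
faulting = tasks[2]
```

With these made-up numbers, `victim_faulting_task(tasks, faulting)` picks agetty because it was the task that faulted, while `victim_heuristic(tasks, faulting)` picks mozilla-bin. The first policy needs no scan of other tasks and acts on the current task only, which is the property the message above relies on for deadlock safety. The second picks a more plausible culprit but has to act on a task other than the one currently faulting.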