<prelude>
I have a large multithreaded program that has a habit of using too
much memory, and as a safeguard, I want to kill it before it makes
the system unstable. The OOM killer often guesses wrong, and RLIMIT_AS
kills too soon because of the address space used up by the many thread
stacks.
So I'd like an RLIMIT_RSS that just kills the fat process.

There have been a couple patches to implement RLIMIT_RSS, e.g.
Peter Chubb's
http://marc.theaimsgroup.com/?l=linux-kernel&m=97892951101598&w=2
and the two from Rik, all of which are too complex for my
needs (and which only swap out instead of kill), so
I guess I have to roll my own.
</prelude>

Rik's patch checks rss in handle_mm_fault(); Peter's checked it in
do_swap_page() and do_anonymous_page(). As a kernel newbie, I don't
have a feel for how those calls relate to each other. Is there a tool
somewhere that will take a set of function names and list all the
kernel call chains that start in one of the functions and end in another?

- Dan

On Wed, 2002-09-25 at 05:36, Dan Kegel wrote:
> <prelude>
> I have a large multithreaded program that has a habit of using too
> much memory, and as a safeguard, I want to kill it before it makes
> the system unstable. The OOM killer often guesses wrong, and RLIMIT_AS
> kills too soon because of the address space used up by the many thread
> stacks.
> So I'd like an RLIMIT_RSS that just kills the fat process.
The RSS limit isn't a "kill" limit in Unix; it's a residency limit. It
prevents the obese process from getting more than a certain amount of
RAM, as opposed to swap.
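
To make the distinction concrete: RLIMIT_AS caps virtual address space, which thread stacks reserve whether or not they are ever touched, while RSS counts only resident pages. The following is a minimal userspace sketch, not anything from the thread; the function names and the thread count and stack size used in the note below are illustrative assumptions.

```c
#include <sys/resource.h>

/* Bytes of address space reserved by thread stacks alone.
 * Both arguments are illustrative inputs, not values from the thread. */
unsigned long long stack_reservation(unsigned int nthreads,
                                     unsigned long long stack_bytes)
{
    return (unsigned long long)nthreads * stack_bytes;
}

/* Read the soft limit for `resource` (e.g. RLIMIT_AS or RLIMIT_RSS).
 * Returns 0 on success and stores the limit in *out, -1 on failure. */
int soft_limit(int resource, rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Nonzero if the stack reservation alone would exceed the soft
 * RLIMIT_AS, regardless of how little of it is resident. */
int stacks_exceed_as_limit(unsigned int nthreads,
                           unsigned long long stack_bytes)
{
    rlim_t as_limit;

    if (soft_limit(RLIMIT_AS, &as_limit) != 0)
        return 0;               /* could not read the limit */
    if (as_limit == RLIM_INFINITY)
        return 0;               /* no cap configured */
    return stack_reservation(nthreads, stack_bytes) >
           (unsigned long long)as_limit;
}
```

For example, a hypothetical 500 threads with 2 MiB stacks reserve 500 * 2 MiB = 1000 MiB of address space before the program allocates anything itself, which is why an RLIMIT_AS tight enough to be a useful safeguard can trip long before resident memory becomes a problem.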

Alan Cox wrote:
>
> On Wed, 2002-09-25 at 05:36, Dan Kegel wrote:
> > <prelude>
> > I have a large multithreaded program that has a habit of using too
> > much memory, and as a safeguard, I want to kill it before it makes
> > the system unstable. The OOM killer often guesses wrong, and RLIMIT_AS
> > kills too soon because of the address space used up by the many thread
> > stacks.
> > So I'd like an RLIMIT_RSS that just kills the fat process.
>
> The RSS limit isn't a "kill" limit in Unix; it's a residency limit. It
> prevents the obese process from getting more than a certain amount of
> RAM, as opposed to swap.

Yeah. RLIMIT_RSS seemed like something I could hijack for the purpose,
though. And the code change was really small
( http://marc.theaimsgroup.com/?l=linux-kernel&m=103299570928378 ).

If only the darn program didn't have so many threads, RLIMIT_AS
or the no-overcommit patch would be perfect. I unfortunately can't
get rid of the threads, so I'm stuck trying to figure out some way
to kill the right program when the system gets low on memory.

Maybe I should look at giving the OOM killer hints?

- Dan

On Thursday 26 September 2002 12:17 pm, Dan Kegel wrote:

> If only the darn program didn't have so many threads, RLIMIT_AS
> or the no-overcommit patch would be perfect. I unfortunately can't
> get rid of the threads, so I'm stuck trying to figure out some way
> to kill the right program when the system gets low on memory.
>
> Maybe I should look at giving the OOM killer hints?

The OOM killer should certainly know about threads and thread groups. If you
kill one thread, you generally have to kill the whole group because there's
no way of knowing if that thread was holding a futex or otherwise custodian
of critical data and thus you just threw the program into la-la land.

> - Dan

Rob

Rob Landley wrote:
>
> On Thursday 26 September 2002 12:17 pm, Dan Kegel wrote:
>
> > If only the darn program didn't have so many threads, RLIMIT_AS
> > or the no-overcommit patch would be perfect. I unfortunately can't
> > get rid of the threads, so I'm stuck trying to figure out some way
> > to kill the right program when the system gets low on memory.
> >
> > Maybe I should look at giving the OOM killer hints?
>
> The OOM killer should certainly know about threads and thread groups. If you
> kill one thread, you generally have to kill the whole group because there's
> no way of knowing if that thread was holding a futex or otherwise custodian
> of critical data and thus you just threw the program into la-la land.
The OOM killer gets that part right; it kills all threads that share the
same mm. Where it screws up is in picking the process to kill.
This is understandable, since it's a tough problem.
Hey, how about this: I could teach the OOM killer to look at
RLIMIT_RSS. Processes which were at or nearly at their RLIMIT_RSS
would be killed first. That would be more generally useful than
my hacky little patch, and it would be even tinier. Like this, say:

--- oom_kill.c.orig	Thu Sep 26 17:31:12 2002
+++ oom_kill.c	Thu Sep 26 17:36:44 2002
@@ -86,6 +86,15 @@
 		points *= 2;
 
 	/*
+	 * Processes at or near their RSS or AS limits are probably causing
+	 * trouble, so double their badness points.
+	 */
+	if (((3 * p->mm->rss) / 4) >= (p->rlim[RLIMIT_RSS].rlim_max >> PAGE_SHIFT))
+		points *= 2;
+	if (((3 * p->mm->total_vm) / 4) >= (p->rlim[RLIMIT_AS].rlim_max >> PAGE_SHIFT))
+		points *= 2;
+
+	/*
 	 * Superuser processes are usually more important, so we make it
 	 * less likely that we kill those.
 	 */

How's that look?
- Dan
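
The heuristic proposed above can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `SKETCH_PAGE_SHIFT`, `near_limit()` and `adjust_points()` are hypothetical names, and a 4 KiB page size is assumed. Note that the sketch scales the limit by three quarters, which is one reading of "at or near their limits"; the posted diff scales the usage side instead, so the two conditions are not equivalent.

```c
/* Assumption: 4 KiB pages. The kernel uses PAGE_SHIFT for this. */
#define SKETCH_PAGE_SHIFT 12

/* Nonzero when used_pages is at least three quarters of limit_bytes
 * (converted to pages). Written as limit - limit/4 so that a huge
 * "unlimited" value cannot overflow. */
static int near_limit(unsigned long used_pages, unsigned long limit_bytes)
{
    unsigned long limit_pages = limit_bytes >> SKETCH_PAGE_SHIFT;

    return used_pages >= limit_pages - limit_pages / 4;
}

/* Double the badness points once for each limit the process is near.
 * rss_pages and total_vm_pages stand in for mm->rss and mm->total_vm;
 * the two byte limits stand in for the RLIMIT_RSS and RLIMIT_AS values. */
unsigned long adjust_points(unsigned long points,
                            unsigned long rss_pages,
                            unsigned long total_vm_pages,
                            unsigned long rss_limit_bytes,
                            unsigned long as_limit_bytes)
{
    if (near_limit(rss_pages, rss_limit_bytes))
        points *= 2;
    if (near_limit(total_vm_pages, as_limit_bytes))
        points *= 2;
    return points;
}
```

With a 100-page RSS limit and a 1000-page AS limit, a process at 80 resident pages and 10 mapped pages has its points doubled once, while one at 50 resident pages is left alone. A process with both limits at the maximum unsigned value is never near either limit, so the default case of no configured limit is unaffected.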