linux-kernel.vger.kernel.org archive mirror
* Terminate process that fails on a constrained allocation
@ 2006-02-08 18:05 Christoph Lameter
  2006-02-08 18:13 ` Andi Kleen
  2006-02-08 18:33 ` Paul Jackson
  0 siblings, 2 replies; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 18:05 UTC (permalink / raw)
  To: akpm; +Cc: pj, ak, linux-kernel

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular going postal and more or less randomly shooting at processes
is not likely to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is similar
to what we do when we run out of hugepages.

This patch adds a check before the out of memory killer is invoked. At that
point performance considerations do not matter much so we just scan the zonelist
and reconstruct a list of nodes. If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should not call
the OOM killer.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 09:55:21.000000000 -0800
@@ -1011,6 +1011,28 @@ rebalance:
 		if (page)
 			goto got_pg;
 
+#ifdef CONFIG_NUMA
+		/*
+		 * In the NUMA case we may have gotten here because the
+		 * memory policies or cpusets have restricted the allocation.
+		 */
+		{
+			nodemask_t nodes;
+
+			nodes_clear(nodes);
+			for (z = zonelist->zones; *z; z++)
+				if (cpuset_zone_allowed(*z, gfp_mask))
+					node_set((*z)->zone_pgdat->node_id,
+							nodes);
+			/*
+			 * If we were only allowed to allocate from
+			 * a subset of nodes then terminate the process.
+			 */
+			if (!nodes_subset(node_online_map, nodes))
+				return NULL;
+		}
+#endif
+
 		out_of_memory(gfp_mask, order);
 		goto restart;
 	}
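
The subset test the hunk adds can be modeled in plain userspace C, with a
64-bit mask standing in for nodemask_t (an illustrative sketch only -- the
kernel's nodemask primitives and the zonelist walk differ):

```c
#include <stdint.h>

/* Each bit models one NUMA node; a set bit means the node is usable. */
static int constrained_alloc(uint64_t online_map, uint64_t allowed_map)
{
	/*
	 * Mirrors !nodes_subset(node_online_map, nodes): the allocation is
	 * constrained if some online node is missing from the set of nodes
	 * the zonelist/cpuset allowed us to allocate from.
	 */
	return (online_map & ~allowed_map) != 0;
}
```

With all online nodes allowed the allocator still falls through to
out_of_memory(); with a restricted mask (e.g. an mbind to one node) it
returns NULL to the caller instead.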

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:05 Terminate process that fails on a constrained allocation Christoph Lameter
@ 2006-02-08 18:13 ` Andi Kleen
  2006-02-08 18:34   ` Paul Jackson
  2006-02-08 18:33 ` Paul Jackson
  1 sibling, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 18:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, pj, linux-kernel

On Wednesday 08 February 2006 19:05, Christoph Lameter wrote:

> This patch adds a check before the out of memory killer is invoked. At that
> point performance considerations do not matter much so we just scan the zonelist
> and reconstruct a list of nodes. If the list of nodes does not contain all
> online nodes then this is a constrained allocation and we should not call
> the OOM killer.

Looks good.

OK, I would have used a noinline function instead of putting the code inline,
to prevent register pressure etc. and make it more readable.

-Andi


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:05 Terminate process that fails on a constrained allocation Christoph Lameter
  2006-02-08 18:13 ` Andi Kleen
@ 2006-02-08 18:33 ` Paul Jackson
  2006-02-08 18:42   ` Christoph Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Jackson @ 2006-02-08 18:33 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, ak, linux-kernel

A couple of comments ...

It took me an extra couple of passes to understand this code.

I wonder if the following, essentially equivalent version (if I didn't
break something; never tested this) is easier to understand:

#ifdef CONFIG_NUMA
		/*
		 * In the NUMA case we may have gotten here because the
		 * memory policies or cpusets have restricted the allocation.
		 */
		{
			nodemask_t nodes;	/* compute nodes not allowed */

			nodes = node_online_map;
			for (z = zonelist->zones; *z; z++)
				if (cpuset_zone_allowed(*z, gfp_mask))
					node_clear((*z)->zone_pgdat->node_id,
							nodes);
			/*
			 * If there are any nodes left set in 'nodes', these
			 * are nodes the cpuset or mempolicy settings aren't
			 * letting us use.  In that case, return NULL to the
			 * current task, rather than invoking out_of_memory()
			 * on the system.
			 */
			if (!nodes_empty(nodes))
				return NULL;
		}
#endif

Second point - I thought I had already throttled the oom_killer
to some degree, with these lines in mm/oom_kill.c select_bad_process():

                /* If p's nodes don't overlap ours, it won't help to kill p. */
                if (!cpuset_excl_nodes_overlap(p))
                        continue;

What your patch is doing effectively disables the oom_killer for
big numa systems, rather than having it operate within the set
of tasks using overlapping resources.

Do we need this more radical constraint on the oom_killer?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:13 ` Andi Kleen
@ 2006-02-08 18:34   ` Paul Jackson
  2006-02-08 18:54     ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Jackson @ 2006-02-08 18:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: clameter, akpm, linux-kernel

Andi wrote:
> OK, I would have used a noinline function instead of putting the code inline,
> to prevent register pressure etc. and make it more readable.

good idea.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:33 ` Paul Jackson
@ 2006-02-08 18:42   ` Christoph Lameter
  2006-02-08 18:57     ` Paul Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 18:42 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, ak, linux-kernel

On Wed, 8 Feb 2006, Paul Jackson wrote:

> What your patch is doing effectively disables the oom_killer for
> big numa systems, rather than having it operate within the set
> of tasks using overlapping resources.

No, it only disables the oom killer for constrained allocations. If a big
numa system uses a cpuset that limits the number of nodes, then allocations
occurring within that cpuset are constrained allocations, which will then
lead to the killing of the application that allocated too much memory. I
think this is much more consistent than trying to tame the OOM killer
enough to stay within a certain cpuset.

E.g. a sysadmin may mistakenly start a process allocating too much memory
in a cpuset. The OOM killer will then start randomly shooting other
processes, one of which may be a critical process. Ouch.

> Do we need this more radical constraint on the oom_killer?

Yes. One can make the OOM killer go postal if one specifies a memory 
policy that contains only one node and then allocates enough memory.

Actually a good denial-of-service attack that can be mounted with cpusets
and memory policies.


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:34   ` Paul Jackson
@ 2006-02-08 18:54     ` Christoph Lameter
  2006-02-08 19:01       ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 18:54 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Andi Kleen, akpm, linux-kernel

On Wed, 8 Feb 2006, Paul Jackson wrote:

> good idea.

Ok. Here is V2:

Terminate process that fails on a constrained allocation

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular going postal and more or less randomly shooting at processes
is not likely to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is similar
to what we do when running out of hugepages.

This patch adds a check before the out of memory killer is invoked. At that
point performance considerations do not matter much so we just scan the zonelist
and reconstruct a list of nodes. If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should not call
the OOM killer.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 10:53:08.000000000 -0800
@@ -817,6 +818,27 @@ failed:
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
 
 /*
+ * Check if a given zonelist allows allocation over all the nodes
+ * in the system.
+ */
+int zonelist_incomplete(struct zonelist *zonelist, gfp_t gfp_mask)
+{
+#ifdef CONFIG_NUMA
+	struct zone **z;
+	nodemask_t nodes;
+
+	nodes = node_online_map;
+	for (z = zonelist->zones; *z; z++)
+		if (cpuset_zone_allowed(*z, gfp_mask))
+			node_clear((*z)->zone_pgdat->node_id,
+					nodes);
+	return !nodes_empty(nodes);
+#else
+	return 0;
+#endif
+}
+
+/*
  * Return 1 if free pages are above 'mark'. This takes into account the order
  * of the allocation.
  */
@@ -1011,6 +1033,9 @@ rebalance:
 		if (page)
 			goto got_pg;
 
+		if (zonelist_incomplete(zonelist, gfp_mask))
+			return NULL;
+
 		out_of_memory(gfp_mask, order);
 		goto restart;
 	}


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:42   ` Christoph Lameter
@ 2006-02-08 18:57     ` Paul Jackson
  2006-02-08 19:02       ` Christoph Lameter
  2006-02-08 19:05       ` Andi Kleen
  0 siblings, 2 replies; 28+ messages in thread
From: Paul Jackson @ 2006-02-08 18:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, ak, linux-kernel

> No, it only disables the oom killer for constrained allocations.

But on many big numa systems, the way they are administered,
that effectively disables the oom killer.

> F.e. a sysadmin may mistakenly start a process allocating too much memory
> in a cpuset. The OOM killer will then start randomly shooting other 
> processes one of which may be a critical process.. Ouch.

That exact same claim could be made to justify a patch that
removed mm/oom_kill.c entirely.  And the same claim could be
made, dropping the "in a cpuset" phrase, on a UMA desktop PC.

The basic premise of the oom killer is that, instead of blowing
off the caller, who might innocently have asked for one page
too many after some other hog used up all the available memory,
we try to pick the worst offender.

Get the worst offender, not just whoever finally hit the limit.
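
Paul's premise can be sketched as a toy selection loop (a hypothetical
userspace model; the kernel's actual badness() heuristic also weights CPU
time, nice level, and capabilities):

```c
#include <stddef.h>

struct task {
	const char *name;
	unsigned long rss_pages;	/* stand-in for memory footprint */
};

/* Pick the biggest consumer, not whoever happened to fault last. */
static const struct task *pick_worst(const struct task *tasks, size_t n)
{
	const struct task *worst = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!worst || tasks[i].rss_pages > worst->rss_pages)
			worst = &tasks[i];
	return worst;
}
```

The innocent caller that asked for "one page too many" is spared as long
as some other task has a larger footprint.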

> Actually a good Denial of service attack than can be used with cpusets 
> and memory policies.

Nothing special about cpusets there.  The same DoS opportunity
exists on simple one-cpu, one-node systems.

...

Granted, I am objecting to this patch with mixed feelings.

I've yet to be convinced that the oom killer is our friend,
and half of me (not seriously) is almost wishing it were
gone.

Would another option be to continue to fine tune the heuristics
that the oom killer uses to pick its next victim?

What situation did you hit that motivated this change?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:54     ` Christoph Lameter
@ 2006-02-08 19:01       ` Andi Kleen
  2006-02-08 19:15         ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 19:01 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Jackson, akpm, linux-kernel

On Wednesday 08 February 2006 19:54, Christoph Lameter wrote:

> Index: linux-2.6.16-rc2/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
> +++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 10:53:08.000000000 -0800
> @@ -817,6 +818,27 @@ failed:
>  #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
>  
>  /*
> + * check if a given zonelist allows allocation over all the nodes
> + * in the system.
> + */
> +int zonelist_incomplete(struct zonelist *zonelist, gfp_t gfp_mask)

static noinline

-Andi



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:57     ` Paul Jackson
@ 2006-02-08 19:02       ` Christoph Lameter
  2006-02-08 19:05       ` Andi Kleen
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 19:02 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, ak, linux-kernel

On Wed, 8 Feb 2006, Paul Jackson wrote:

> > E.g. a sysadmin may mistakenly start a process allocating too much memory
> > in a cpuset. The OOM killer will then start randomly shooting other
> > processes, one of which may be a critical process. Ouch.
> 
> That same exact claim could be made to justify a patch that
> removed mm/oom_kill.c entirely.  And the same exact claim
> could be made, dropping the "in a cpuset" phrase, on an UMA
> desktop PC.

Well, the justification for the OOM killer was that it keeps the system
as a whole alive... cpusets are not vital to the system. Usually the
folks using cpusets are very aware of the memory usage of their processes
and I think they would hate to have a random process killed.

The obvious case is that someone starts a process that allocates a
large amount of memory while other processes are already running.

> The basic premise of the oom killer is that, instead of blowing
> off the caller, who might innocently have asked for one page
> too many after some other hog used up all the available memory,
> we try to pick the worst offender.
> 
> Get the worst offender, not just who ever finally hit the limit.

Typically the worst offender is whoever just asked for the page.

> I've yet to be convinced that the oom killer is our friend,
> and half of me (not seriously) is almost wishing it were
> gone.

I definitely agree with you. Let's at least keep it from doing harm as much
as possible.
 
> Would another option be to continue to fine tune the heuristics
> that the oom killer uses to pick its next victim?
> 
> What situation did you hit that motivated this change?

mbind to one node. Allocate on that node until you trigger the OOM killer.


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 18:57     ` Paul Jackson
  2006-02-08 19:02       ` Christoph Lameter
@ 2006-02-08 19:05       ` Andi Kleen
  2006-02-08 20:22         ` Paul Jackson
  1 sibling, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 19:05 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Christoph Lameter, akpm, linux-kernel

On Wednesday 08 February 2006 19:57, Paul Jackson wrote:
> > No it only disables the oom killer for constrained allocations.
> 
> But on many big numa systems, the way they are administered,
> that effectively disables the oom killer.

I guess it won't matter, because such systems are administered with
appropriate ulimits. And to be honest the OOM killer never really worked
all that well, so it's not a big loss.

[still often wish we had that "global virtual memory ulimit for uid"]


> I've yet to be convinced that the oom killer is our friend,
> and half of me (not seriously) is almost wishing it were
> gone.

It's more than half of me near seriously agreeing with you.

> Would another option be to continue to fine tune the heuristics
> that the oom killer uses to pick its next victim?
> 
> What situation did you hit that motivated this change?

It's a long-known design bug of the NUMA policy, but it recently
came up again with some test program.


-Andi


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 19:01       ` Andi Kleen
@ 2006-02-08 19:15         ` Christoph Lameter
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 19:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Paul Jackson, akpm, linux-kernel

On Wed, 8 Feb 2006, Andi Kleen wrote:

> static noinline

Next rev:

Terminate process that fails on a constrained allocation

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular going postal and more or less randomly shooting at processes
is not likely to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is similar
to what we do when running out of hugepages.

This patch adds a check before the out of memory killer is invoked. At that
point performance considerations do not matter much so we just scan the zonelist
and reconstruct a list of nodes. If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should not call
the OOM killer.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 11:14:32.000000000 -0800
@@ -612,6 +612,7 @@ void drain_remote_pages(void)
 	}
 	local_irq_restore(flags);
 }
+
 #endif
 
 #if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU)
@@ -817,6 +818,27 @@ failed:
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
 
 /*
+ * Check if a given zonelist is complete and allows allocation over all 
+ * the nodes in the system.
+ */
+static noinline int zonelist_incomplete(struct zonelist *zonelist, gfp_t gfp_mask)
+{
+#ifdef CONFIG_NUMA
+	struct zone **z;
+	nodemask_t nodes;
+
+	nodes = node_online_map;
+	for (z = zonelist->zones; *z; z++)
+		if (cpuset_zone_allowed(*z, gfp_mask))
+			node_clear((*z)->zone_pgdat->node_id,
+					nodes);
+	return !nodes_empty(nodes);
+#else
+	return 0;
+#endif
+}
+
+/*
  * Return 1 if free pages are above 'mark'. This takes into account the order
  * of the allocation.
  */
@@ -1011,6 +1033,15 @@ rebalance:
 		if (page)
 			goto got_pg;
 
+		/*
+		 * If we allocated using a zonelist that does not include
+		 * all nodes in the system then the OOM situation may be
+		 * due to configured limitations. Just abort the application
+		 * instead of calling the OOM killer.
+		 */
+		if (zonelist_incomplete(zonelist, gfp_mask))
+			return NULL;
+
 		out_of_memory(gfp_mask, order);
 		goto restart;
 	}


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 19:05       ` Andi Kleen
@ 2006-02-08 20:22         ` Paul Jackson
  2006-02-08 20:36           ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Jackson @ 2006-02-08 20:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: clameter, akpm, linux-kernel

If the oom killer is of any use on a numa system using set_mempolicy,
mbind and/or cpusets, then it would be of use in the bootcpuset -- that
cpuset which is often set up to hold the classic Unix daemons.  I'm not
sure we want to do that.

If we changed the mm/oom_kill.c select_bad_process() constraint from

                /* If p's nodes don't overlap ours, it won't help to kill p. */
                if (!cpuset_excl_nodes_overlap(p))
                        continue;

which allows killing any task whose nodes overlap those of the current task:

 1) to the much tighter limit of only killing tasks that are on the same or
    a subset of the same nodes, and

 2) to look at -both- the cpuset and MPOL_BIND constraints,

then the oom killer would work in a bootcpuset much as it does now on
simple UMA systems.

The new logic would be -- only kill tasks that are constrained to
the same or a subset of the same nodes as the current task.

With this, the problem you describe with the oom killer and a task that
has used mbind or cpuset to just allow a single node would be fixed --
only tasks constrained to just that node, no more, would be considered
for killing.
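
The tighter victim rule Paul proposes is again a subset test; a minimal
sketch with per-node bits (hypothetical helper, not the existing
cpuset_excl_nodes_overlap() API):

```c
#include <stdint.h>

/*
 * A task is an eligible victim only if it is confined to the same nodes
 * as the current task, or to a subset of them: killing a task that can
 * also allocate elsewhere frees no memory where we actually need it.
 */
static int eligible_victim(uint64_t victim_nodes, uint64_t current_nodes)
{
	return (victim_nodes & ~current_nodes) == 0;
}
```

Under this rule a task bound to node 0 could only take down tasks likewise
confined to node 0, never a daemon allowed to allocate anywhere.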

At the same time, a typical bootcpuset would have oom killer behaviour
resembling what people have become accustomed to.

If we are going to neuter or eliminate the oom killer, it seems like
we should do it across the board, not just on numa boxes using
some form of memory constraint (mempolicy or cpuset).

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 20:22         ` Paul Jackson
@ 2006-02-08 20:36           ` Christoph Lameter
  2006-02-08 20:55             ` Paul Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 20:36 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Andi Kleen, akpm, linux-kernel

On Wed, 8 Feb 2006, Paul Jackson wrote:

> The new logic would be -- only kill tasks that are constrained to
> the same or a subset of the same nodes as the current task.

If a task has restricted its memory allocation to one node and does
excessive allocations then that process needs to die, not other processes
that are harmlessly running on the node and may not even be allocating
memory at the time.

> At the same time, a typical bootcpuset would have oom killer behaviour
> resembling what people have become accustomed to.

People are accustomed to having random processes killed? <shudder>

> If we are going to neuter or eliminate the oom killer, it seems like
> we should do it across the board, not just on numa boxes using
> some form of memory constraint (mempolicy or cpuset).

We are terminating processes that perform restricted allocations. Restricted
allocations are only possible on NUMA boxes, so the phrase "numa boxes
using some form of memory constraint" is a tautology. OOM killing makes
sense for global allocations if the system is really tight on memory and
survival is the main goal, even if we have to cannibalize processes.
However, cannibalism is still a taboo.



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 20:36           ` Christoph Lameter
@ 2006-02-08 20:55             ` Paul Jackson
  2006-02-08 21:01               ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Jackson @ 2006-02-08 20:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: ak, akpm, linux-kernel

Can anyone give us a good reason why we shouldn't just remove the oom
killer entirely?

Christoph wrote:
> If a task has restricted its memory allocation to one node and does 
> excessive allocations then that process needs to die not other processes 
> that are harmlessly running on the node and that may not be allocating 
> memory at the time.

That _exact_ same argument applies to a system that only has one node.

If we want to remove the oom killer, let's just remove the oom killer.


> People are accustomed of having random processes killed? <shudder>

That's what the oom killer does ... well, it makes an honest effort
not to be random.

So, yes, since it has been there a long time, people are used to
it.  Maybe they don't like it, maybe with good reason.  But it
is there.


> OOM killing makes 
> sense for global allocations if the system is really tight on memory and 
> survival is the main goal

If that argument justifies OOM killing on a simple UMA system, then
surely, for -some- critical tasks, it justifies it on a big NUMA system.

Either OOM is useful in some cases or it is not.


-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 20:55             ` Paul Jackson
@ 2006-02-08 21:01               ` Andi Kleen
  2006-02-08 21:03                 ` Paul Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 21:01 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Christoph Lameter, akpm, linux-kernel

On Wednesday 08 February 2006 21:55, Paul Jackson wrote:

> If that argument justifies OOM killing on a simple UMA system, then
> surely, for -some- critical tasks, it justifies it on a big NUMA system.
> 
> Either OOM is useful in some cases or it is not.


I don't think you really want to open a full-scale "is the oom killer needed"
thread. Check the archives - there have been some going on for months.

But I think we can agree that together with mbind the oom killer is pretty
useless, can't we?


-Andi (who is definitely in favour of Christoph's latest patch) 


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 21:01               ` Andi Kleen
@ 2006-02-08 21:03                 ` Paul Jackson
  2006-02-08 21:21                   ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Jackson @ 2006-02-08 21:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: clameter, akpm, linux-kernel

> I don't think you really want to open a  full scale "is the oom killer needed"
> thread. Check the archives - there have been some going on for months.
> 
> But I think we can agree that together with mbind the oom killer is pretty
> useless, can't we?

Excellent points.

I approve this patch.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 21:03                 ` Paul Jackson
@ 2006-02-08 21:21                   ` Andi Kleen
  2006-02-08 21:39                     ` Andrew Morton
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 21:21 UTC (permalink / raw)
  To: Paul Jackson; +Cc: clameter, akpm, linux-kernel

On Wednesday 08 February 2006 22:03, Paul Jackson wrote:
> > I don't think you really want to open a  full scale "is the oom killer needed"
> > thread. Check the archives - there have been some going on for months.
> > 
> > But I think we can agree that together with mbind the oom killer is pretty
> > useless, can't we?
> 
> Excellent points.
> 
> I approve this patch.

I think it should be put into 2.6.16. Andrew?

I had a small objection about adding static noinline, but it's really not
important and the patch can be used as is.

-Andi




* Re: Terminate process that fails on a constrained allocation
  2006-02-08 21:21                   ` Andi Kleen
@ 2006-02-08 21:39                     ` Andrew Morton
  2006-02-08 22:11                       ` Christoph Lameter
  2006-02-08 23:28                       ` Christoph Lameter
  0 siblings, 2 replies; 28+ messages in thread
From: Andrew Morton @ 2006-02-08 21:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: pj, clameter, linux-kernel

Andi Kleen <ak@suse.de> wrote:
>
> On Wednesday 08 February 2006 22:03, Paul Jackson wrote:
> > > I don't think you really want to open a  full scale "is the oom killer needed"
> > > thread. Check the archives - there have been some going on for months.
> > > 
> > > But I think we can agree that together with mbind the oom killer is pretty
> > > useless, can't we?
> > 
> > Excellent points.
> > 
> > I approve this patch.
> 
> I think it should be put into 2.6.16. Andrew?

Does every single caller of __alloc_pages(__GFP_FS) correctly handle a NULL
return?  I doubt it, in which case this patch will cause oopses and hangs.
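
The audit Andrew is asking for is the classic unchecked-allocation pattern;
a userspace analogue of what each __GFP_FS call site would now have to do
(illustrative only, not kernel code):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Before the patch this path "could not fail": out_of_memory() would free
 * something and the allocation loop would restart. Once the allocator may
 * return NULL here, every caller needs a check like the one below.
 */
static char *dup_string(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (!p)			/* the check an unaudited caller may lack */
		return NULL;	/* propagate failure instead of oopsing */
	return strcpy(p, s);
}
```

A caller that dereferences the result without checking is exactly the oops
Andrew predicts.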



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 21:39                     ` Andrew Morton
@ 2006-02-08 22:11                       ` Christoph Lameter
  2006-02-08 22:41                         ` Andi Kleen
  2006-02-08 22:48                         ` Christoph Lameter
  2006-02-08 23:28                       ` Christoph Lameter
  1 sibling, 2 replies; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 22:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, pj, linux-kernel

On Wed, 8 Feb 2006, Andrew Morton wrote:

> > I think it should be put into 2.6.16. Andrew?
> 
> Does every single caller of __alloc_pages(__GFP_FS) correctly handle a NULL
> return?  I doubt it, in which case this patch will cause oopses and hangs.

I sent you a patch with static noinline... But I am having second thoughts
about this patch. Paul is partially right. Maybe we can move the logic
into the out_of_memory handler for now? That would allow us to implement
more sophisticated things later (for example, page migration would allow us
to move memory of processes that can also allocate on other nodes off the
nodes where we lack memory) and Paul may put something in there to
address his concerns.

---

Terminate process that fails on a constrained allocation

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular going postal and more or less randomly shooting at processes
is not likely to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is similar
to what we do when running out of hugepages.

This patch adds a check before the out of memory killer is invoked. At that
point performance considerations do not matter much, so we just scan the
zonelist and reconstruct a list of nodes. If the list of nodes does not
contain all online nodes then this is a constrained allocation and we should
not call the OOM killer.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 14:06:12.000000000 -0800
@@ -1011,8 +1012,10 @@ rebalance:
 		if (page)
 			goto got_pg;
 
-		out_of_memory(gfp_mask, order);
-		goto restart;
+		if (out_of_memory(zonelist, gfp_mask, order))
+			goto restart;
+
+		return NULL;
 	}
 
 	/*
Index: linux-2.6.16-rc2/include/linux/swap.h
===================================================================
--- linux-2.6.16-rc2.orig/include/linux/swap.h	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/include/linux/swap.h	2006-02-08 14:10:02.000000000 -0800
@@ -147,7 +147,7 @@ struct swap_list_t {
 #define vm_swap_full() (nr_swap_pages*2 < total_swap_pages)
 
 /* linux/mm/oom_kill.c */
-extern void out_of_memory(gfp_t gfp_mask, int order);
+extern int out_of_memory(struct zonelist *zl, gfp_t gfp_mask, int order);
 
 /* linux/mm/memory.c */
 extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
Index: linux-2.6.16-rc2/mm/oom_kill.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/oom_kill.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/oom_kill.c	2006-02-08 14:09:43.000000000 -0800
@@ -131,6 +131,27 @@ unsigned long badness(struct task_struct
 }
 
 /*
+ * check if a given zonelist allows allocation over all the nodes
+ * in the system.
+ */
+static noinline int zonelist_incomplete(struct zonelist *zonelist, gfp_t gfp_mask)
+{
+#ifdef CONFIG_NUMA
+	struct zone **z;
+	nodemask_t nodes;
+
+	nodes = node_online_map;
+	for (z = zonelist->zones; *z; z++)
+		if (cpuset_zone_allowed(*z, gfp_mask))
+			node_clear((*z)->zone_pgdat->node_id,
+					nodes);
+	return !nodes_empty(nodes);
+#else
+	return 0;
+#endif
+}
+
+/*
  * Simple selection loop. We chose the process with the highest
  * number of 'points'. We expect the caller will lock the tasklist.
  *
@@ -262,12 +283,23 @@ static struct mm_struct *oom_kill_proces
  * killing a random task (bad), letting the system crash (worse)
  * OR try to be smart about which process to kill. Note that we
  * don't have to be perfect here, we just have to be good.
+ *
+ * Function returns 1 if the allocation should be retried,
+ * zero if the allocation should fail.
  */
-void out_of_memory(gfp_t gfp_mask, int order)
+int out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order)
 {
 	struct mm_struct *mm = NULL;
 	task_t * p;
 
+	/*
+	 * Simply fail an allocation that does not allow
+	 * allocation on all nodes. We may want to have
+	 * more sophisticated logic here in the future.
+	 */
+	if (zonelist_incomplete(zonelist, gfp_mask))
+		return 0;
+
 	if (printk_ratelimit()) {
 		printk("oom-killer: gfp_mask=0x%x, order=%d\n",
 			gfp_mask, order);
@@ -306,4 +338,5 @@ retry:
 	 */
 	if (!test_thread_flag(TIF_MEMDIE))
 		schedule_timeout_interruptible(1);
+	return 1;
 }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Terminate process that fails on a constrained allocation
  2006-02-08 22:11                       ` Christoph Lameter
@ 2006-02-08 22:41                         ` Andi Kleen
  2006-02-08 23:29                           ` Christoph Lameter
  2006-02-08 23:35                           ` Andrew Morton
  2006-02-08 22:48                         ` Christoph Lameter
  1 sibling, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 22:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, pj, linux-kernel

On Wednesday 08 February 2006 23:11, Christoph Lameter wrote:
> On Wed, 8 Feb 2006, Andrew Morton wrote:
> 
> > > I think it should be put into 2.6.16. Andrew?
> > 
> > Does every single caller of __alloc_pages(__GFP_FS) correctly handle a NULL
> > return?  I doubt it, in which case this patch will cause oopses and hangs.
> 
> I sent you a patch with static inline.....

noinline 

> But I am having second thoughts  
> about this patch. Paul is partially right. Maybe we can move the logic 
> into the out_of_memory handler for now? That would allow us to implement 
> more sophisticated things later 

I have my doubts that's really worth it, but ok.

> (for example page migration would allow us 
> to move memory of processes that can also allocate on other nodes from the 
> nodes where we lack memory) and Paul may put something in there to 
> address his concerns.
> 
> ---
> 
> Terminate process that fails on a constrained allocation

Patch looks good to me too.  Thanks.

Unfortunately Andrew's point with the GFP_NOFS still applies :/
But I would consider any caller of this not handling NULL to be broken.
Andrew do you have any stronger evidence it's a real problem?

Another way would be to force a default non strict policy with GFP_NOFS, but
that would be somewhat ugly again and impact the fast paths.

-Andi


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 22:11                       ` Christoph Lameter
  2006-02-08 22:41                         ` Andi Kleen
@ 2006-02-08 22:48                         ` Christoph Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 22:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, pj, linux-kernel

Cleaned up and tested version. Better comments and I forgot to modify 
drivers/char/sysrq.c:


Terminate process that fails on a constrained allocation

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular, going postal and more or less randomly shooting at processes
is not likely to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is similar
to what we do when running out of hugepages.

This patch adds a check before we kill processes. At that point performance
considerations do not matter much, so we just scan the zonelist and
reconstruct a list of nodes. If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should not kill
processes but fail the allocation.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 14:28:50.000000000 -0800
@@ -1011,8 +1011,10 @@ rebalance:
 		if (page)
 			goto got_pg;
 
-		out_of_memory(gfp_mask, order);
-		goto restart;
+		if (out_of_memory(zonelist, gfp_mask, order))
+			goto restart;
+
+		return NULL;
 	}
 
 	/*
Index: linux-2.6.16-rc2/include/linux/swap.h
===================================================================
--- linux-2.6.16-rc2.orig/include/linux/swap.h	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/include/linux/swap.h	2006-02-08 14:28:50.000000000 -0800
@@ -147,7 +147,7 @@ struct swap_list_t {
 #define vm_swap_full() (nr_swap_pages*2 < total_swap_pages)
 
 /* linux/mm/oom_kill.c */
-extern void out_of_memory(gfp_t gfp_mask, int order);
+extern int out_of_memory(struct zonelist *zl, gfp_t gfp_mask, int order);
 
 /* linux/mm/memory.c */
 extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
Index: linux-2.6.16-rc2/mm/oom_kill.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/oom_kill.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/oom_kill.c	2006-02-08 14:36:27.000000000 -0800
@@ -131,6 +131,27 @@ unsigned long badness(struct task_struct
 }
 
 /*
+ * check if a given zonelist allows allocation over all the nodes
+ * in the system.
+ */
+static noinline int zonelist_incomplete(struct zonelist *zonelist, gfp_t gfp_mask)
+{
+#ifdef CONFIG_NUMA
+	struct zone **z;
+	nodemask_t nodes;
+
+	nodes = node_online_map;
+	for (z = zonelist->zones; *z; z++)
+		if (cpuset_zone_allowed(*z, gfp_mask))
+			node_clear((*z)->zone_pgdat->node_id,
+					nodes);
+	return !nodes_empty(nodes);
+#else
+	return 0;
+#endif
+}
+
+/*
  * Simple selection loop. We chose the process with the highest
  * number of 'points'. We expect the caller will lock the tasklist.
  *
@@ -256,18 +277,33 @@ static struct mm_struct *oom_kill_proces
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * out_of_memory - deal with out of memory situations
+ *
+ * If we are out of memory then check if this was due to the allocation
+ * being restricted to only a part of system memory. If so then
+ * fail the allocation.
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
  * OR try to be smart about which process to kill. Note that we
  * don't have to be perfect here, we just have to be good.
+ *
+ * Returns 1 if the allocation should be retried.
+ *         0 if the allocation should fail.
  */
-void out_of_memory(gfp_t gfp_mask, int order)
+int out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order)
 {
 	struct mm_struct *mm = NULL;
 	task_t * p;
 
+	/*
+	 * Simply fail an allocation that does not allow
+	 * allocation on all nodes. We may want to have
+	 * more sophisticated logic here in the future.
+	 */
+	if (zonelist_incomplete(zonelist, gfp_mask))
+		return 0;
+
 	if (printk_ratelimit()) {
 		printk("oom-killer: gfp_mask=0x%x, order=%d\n",
 			gfp_mask, order);
@@ -306,4 +342,5 @@ retry:
 	 */
 	if (!test_thread_flag(TIF_MEMDIE))
 		schedule_timeout_interruptible(1);
+	return 1;
 }
Index: linux-2.6.16-rc2/drivers/char/sysrq.c
===================================================================
--- linux-2.6.16-rc2.orig/drivers/char/sysrq.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/drivers/char/sysrq.c	2006-02-08 14:29:17.000000000 -0800
@@ -243,7 +243,7 @@ static struct sysrq_key_op sysrq_term_op
 
 static void moom_callback(void *ignored)
 {
-	out_of_memory(GFP_KERNEL, 0);
+	out_of_memory(&NODE_DATA(0)->node_zonelists[ZONE_NORMAL], GFP_KERNEL, 0);
 }
 
 static DECLARE_WORK(moom_work, moom_callback, NULL);


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 21:39                     ` Andrew Morton
  2006-02-08 22:11                       ` Christoph Lameter
@ 2006-02-08 23:28                       ` Christoph Lameter
  2006-02-08 23:43                         ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 23:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, pj, linux-kernel

On Wed, 8 Feb 2006, Andrew Morton wrote:

> > I think it should be put into 2.6.16. Andrew?
> 
> Does every single caller of __alloc_pages(__GFP_FS) correctly handle a NULL
> return?  I doubt it, in which case this patch will cause oopses and hangs.

If a caller cannot handle NULL then __GFP_NOFAIL has to be set, right?

So we will never get to the piece of code under discussion.



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 22:41                         ` Andi Kleen
@ 2006-02-08 23:29                           ` Christoph Lameter
  2006-02-08 23:35                           ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 23:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, pj, linux-kernel

On Wed, 8 Feb 2006, Andi Kleen wrote:

> Unfortunately Andrew's point with the GFP_NOFS still applies :/
> But I would consider any caller of this not handling NULL to be broken.
> Andrew do you have any stronger evidence it's a real problem?

I would think that a caller will have to set __GFP_NOFAIL if a NULL is 
unacceptable.


* Re: Terminate process that fails on a constrained allocation
  2006-02-08 22:41                         ` Andi Kleen
  2006-02-08 23:29                           ` Christoph Lameter
@ 2006-02-08 23:35                           ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2006-02-08 23:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: clameter, pj, linux-kernel

Andi Kleen <ak@suse.de> wrote:
>
> 
> Unfortunately Andrew's point with the GFP_NOFS still applies :/
> But I would consider any caller of this not handling NULL to be broken.

Sure.

> Andrew do you have any stronger evidence it's a real problem?

No.  About a million years ago I had a make-alloc_pages-fail-at-1%-rate
debug patch.  Doing that again would be an interesting exercise.

Another option is to only fail userspace allocations (__GFP_HIGHMEM is set,
or a new flag in GFP_HIGHUSER).  For the workloads which the NUMA guys care
about I expect this is a 99% solution and the amount of code which it puts
at risk is vastly less.



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 23:28                       ` Christoph Lameter
@ 2006-02-08 23:43                         ` Andrew Morton
  2006-02-08 23:54                           ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2006-02-08 23:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: ak, pj, linux-kernel

Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Wed, 8 Feb 2006, Andrew Morton wrote:
> 
> > > I think it should be put into 2.6.16. Andrew?
> > 
> > Does every single caller of __alloc_pages(__GFP_FS) correctly handle a NULL
> > return?  I doubt it, in which case this patch will cause oopses and hangs.
> 
> If a caller cannot handle NULL then __GFP_NOFAIL has to be set, right?

That would assume non-buggy code.  I'm talking about the exercising of
hitherto-unused codepaths.  We've fixed many, many pieces of code which
simply assumed that kmalloc(GFP_KERNEL) succeeds.  I doubt if many such
simple bugs still exist, but there will be more subtle ones in there.



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 23:43                         ` Andrew Morton
@ 2006-02-08 23:54                           ` Christoph Lameter
  2006-02-08 23:57                             ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2006-02-08 23:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ak, pj, linux-kernel

On Wed, 8 Feb 2006, Andrew Morton wrote:

> > If a caller cannot handle NULL then __GFP_NOFAIL has to be set, right?
> 
> That would assume non-buggy code.  I'm talking about the exercising of
> hitherto-unused codepaths.  We've fixed many, many pieces of code which
> simply assumed that kmalloc(GFP_KERNEL) succeeds.  I doubt if many such
> simple bugs still exist, but there will be more subtle ones in there.

We could add __GFP_NOFAIL to kmem_getpages in slab.c to ensure that 
kmalloc waits rather than returning NULL. Also too drastic a measure, right?



* Re: Terminate process that fails on a constrained allocation
  2006-02-08 23:54                           ` Christoph Lameter
@ 2006-02-08 23:57                             ` Andi Kleen
  0 siblings, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2006-02-08 23:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, pj, linux-kernel

On Thursday 09 February 2006 00:54, Christoph Lameter wrote:
> On Wed, 8 Feb 2006, Andrew Morton wrote:
> 
> > > If a caller cannot handle NULL then __GFP_NOFAIL has to be set, right?
> > 
> > That would assume non-buggy code.  I'm talking about the exercising of
> > hitherto-unused codepaths.  We've fixed many, many pieces of code which
> > simply assumed that kmalloc(GFP_KERNEL) succeeds.  I doubt if many such
> > simple bugs still exist, but there will be more subtle ones in there.
> 
> We could add __GFP_NOFAIL to kmem_getpages in slab.c to ensure that 
> kmalloc waits rather than returning NULL. Also too drastic a measure, right?

Definitely too drastic. I bet that would cause deadlocks under quite a few workloads.

-Andi



* Re: Terminate process that fails on a constrained allocation
@ 2006-02-08 20:14 linux
  0 siblings, 0 replies; 28+ messages in thread
From: linux @ 2006-02-08 20:14 UTC (permalink / raw)
  To: clameter; +Cc: linux-kernel

Perhaps a less drastic solution, one that at least supports the common
partitioned-system configuration, would be to limit the oom killer to
processes whose nodes are a *subset* of ours?

That way, a limited process won't kill any unlimited processes, but it
can fight with other processes with the same limits.  However, an
unlimited process discovering oom can kill anything on the system if
necessary.

(This requires a modified version of cpuset_excl_nodes_overlap that
calls nodes_subset() instead of nodes_intersects(), but it's pretty
straightforward.)


end of thread, other threads:[~2006-02-08 23:57 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-08 18:05 Terminate process that fails on a constrained allocation Christoph Lameter
2006-02-08 18:13 ` Andi Kleen
2006-02-08 18:34   ` Paul Jackson
2006-02-08 18:54     ` Christoph Lameter
2006-02-08 19:01       ` Andi Kleen
2006-02-08 19:15         ` Christoph Lameter
2006-02-08 18:33 ` Paul Jackson
2006-02-08 18:42   ` Christoph Lameter
2006-02-08 18:57     ` Paul Jackson
2006-02-08 19:02       ` Christoph Lameter
2006-02-08 19:05       ` Andi Kleen
2006-02-08 20:22         ` Paul Jackson
2006-02-08 20:36           ` Christoph Lameter
2006-02-08 20:55             ` Paul Jackson
2006-02-08 21:01               ` Andi Kleen
2006-02-08 21:03                 ` Paul Jackson
2006-02-08 21:21                   ` Andi Kleen
2006-02-08 21:39                     ` Andrew Morton
2006-02-08 22:11                       ` Christoph Lameter
2006-02-08 22:41                         ` Andi Kleen
2006-02-08 23:29                           ` Christoph Lameter
2006-02-08 23:35                           ` Andrew Morton
2006-02-08 22:48                         ` Christoph Lameter
2006-02-08 23:28                       ` Christoph Lameter
2006-02-08 23:43                         ` Andrew Morton
2006-02-08 23:54                           ` Christoph Lameter
2006-02-08 23:57                             ` Andi Kleen
2006-02-08 20:14 linux

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).