All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-20  9:51 ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-02-20  9:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Mel Gorman, Nishanth Aravamudan, linux-mm, LKML

We had a report about strange OOM killer strikes on a PPC machine
although there was a lot of swap free and a tons of anonymous memory
which could be swapped out. In the end it turned out that the OOM was
a side effect of zone reclaim which wasn't doesn't unmap and swapp out
and so the system was pushed to the OOM. Although this sounds like a bug
somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
numactl on the said hardware suggests that the zone reclaim should
have been set in the first place:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node   0   2
0:  10  40
2:  40  10

So all the CPUs are associated with Node0 which doesn't have any memory
while Node2 contains all the available memory. Node distances cause an
automatic zone_reclaim_mode enabling.

Zone reclaim is intended to keep the allocations local but this doesn't
make any sense on the memory less nodes. So let's exclude such nodes
for init_zone_allows_reclaim which evaluates zone reclaim behavior and
suitable reclaim_nodes.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e953f07edb0..fafb9e24e87f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
 {
 	int i;
 
-	for_each_online_node(i)
+	for_each_node_state(i, N_MEMORY)
 		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
 			node_set(i, NODE_DATA(nid)->reclaim_nodes);
 		else
@@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
-	init_zone_allows_reclaim(nid);
+	if (node_state(nid, N_MEMORY))
+		init_zone_allows_reclaim(nid);
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 #endif
-- 
1.9.0.rc3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-20  9:51 ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-02-20  9:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Mel Gorman, Nishanth Aravamudan, linux-mm, LKML

We had a report about strange OOM killer strikes on a PPC machine
although there was a lot of swap free and a tons of anonymous memory
which could be swapped out. In the end it turned out that the OOM was
a side effect of zone reclaim which wasn't doesn't unmap and swapp out
and so the system was pushed to the OOM. Although this sounds like a bug
somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
numactl on the said hardware suggests that the zone reclaim should
have been set in the first place:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node   0   2
0:  10  40
2:  40  10

So all the CPUs are associated with Node0 which doesn't have any memory
while Node2 contains all the available memory. Node distances cause an
automatic zone_reclaim_mode enabling.

Zone reclaim is intended to keep the allocations local but this doesn't
make any sense on the memory less nodes. So let's exclude such nodes
for init_zone_allows_reclaim which evaluates zone reclaim behavior and
suitable reclaim_nodes.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e953f07edb0..fafb9e24e87f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
 {
 	int i;
 
-	for_each_online_node(i)
+	for_each_node_state(i, N_MEMORY)
 		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
 			node_set(i, NODE_DATA(nid)->reclaim_nodes);
 		else
@@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
-	init_zone_allows_reclaim(nid);
+	if (node_state(nid, N_MEMORY))
+		init_zone_allows_reclaim(nid);
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 #endif
-- 
1.9.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
  2014-02-20  9:51 ` Michal Hocko
@ 2014-02-21 15:04   ` Michal Hocko
  -1 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-02-21 15:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Mel Gorman, Nishanth Aravamudan, linux-mm, LKML

On Thu 20-02-14 10:51:44, Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should

Hmm, not somehow got lost... It should read "suggests that the zone
reclaim should not have been set in the first place"

> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node   0   2
> 0:  10  40
> 2:  40  10
> 
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
> 
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exclude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> ---
>  mm/page_alloc.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e953f07edb0..fafb9e24e87f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
>  {
>  	int i;
>  
> -	for_each_online_node(i)
> +	for_each_node_state(i, N_MEMORY)
>  		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
>  			node_set(i, NODE_DATA(nid)->reclaim_nodes);
>  		else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  
>  	pgdat->node_id = nid;
>  	pgdat->node_start_pfn = node_start_pfn;
> -	init_zone_allows_reclaim(nid);
> +	if (node_state(nid, N_MEMORY))
> +		init_zone_allows_reclaim(nid);
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>  #endif
> -- 
> 1.9.0.rc3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-21 15:04   ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-02-21 15:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Mel Gorman, Nishanth Aravamudan, linux-mm, LKML

On Thu 20-02-14 10:51:44, Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should

Hmm, not somehow got lost... It should read "suggests that the zone
reclaim should not have been set in the first place"

> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node   0   2
> 0:  10  40
> 2:  40  10
> 
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
> 
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exclude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> ---
>  mm/page_alloc.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e953f07edb0..fafb9e24e87f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
>  {
>  	int i;
>  
> -	for_each_online_node(i)
> +	for_each_node_state(i, N_MEMORY)
>  		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
>  			node_set(i, NODE_DATA(nid)->reclaim_nodes);
>  		else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  
>  	pgdat->node_id = nid;
>  	pgdat->node_start_pfn = node_start_pfn;
> -	init_zone_allows_reclaim(nid);
> +	if (node_state(nid, N_MEMORY))
> +		init_zone_allows_reclaim(nid);
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>  #endif
> -- 
> 1.9.0.rc3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
  2014-02-20  9:51 ` Michal Hocko
@ 2014-02-21 22:07   ` Andrew Morton
  -1 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2014-02-21 22:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Mel Gorman, Nishanth Aravamudan, linux-mm, LKML

On Thu, 20 Feb 2014 10:51:44 +0100 Michal Hocko <mhocko@suse.cz> wrote:

> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node   0   2
> 0:  10  40
> 2:  40  10
> 
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
> 
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exclude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
>  {
>  	int i;
>  
> -	for_each_online_node(i)
> +	for_each_node_state(i, N_MEMORY)
>  		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
>  			node_set(i, NODE_DATA(nid)->reclaim_nodes);
>  		else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  
>  	pgdat->node_id = nid;
>  	pgdat->node_start_pfn = node_start_pfn;
> -	init_zone_allows_reclaim(nid);
> +	if (node_state(nid, N_MEMORY))
> +		init_zone_allows_reclaim(nid);
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>  #endif

What happens if someone later hot-adds some memory to that node?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-21 22:07   ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2014-02-21 22:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Mel Gorman, Nishanth Aravamudan, linux-mm, LKML

On Thu, 20 Feb 2014 10:51:44 +0100 Michal Hocko <mhocko@suse.cz> wrote:

> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node   0   2
> 0:  10  40
> 2:  40  10
> 
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
> 
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exclude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
>  {
>  	int i;
>  
> -	for_each_online_node(i)
> +	for_each_node_state(i, N_MEMORY)
>  		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
>  			node_set(i, NODE_DATA(nid)->reclaim_nodes);
>  		else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  
>  	pgdat->node_id = nid;
>  	pgdat->node_start_pfn = node_start_pfn;
> -	init_zone_allows_reclaim(nid);
> +	if (node_state(nid, N_MEMORY))
> +		init_zone_allows_reclaim(nid);
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>  #endif

What happens if someone later hot-adds some memory to that node?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
  2014-02-21 22:07   ` Andrew Morton
  (?)
@ 2014-02-21 23:57   ` Nishanth Aravamudan
  -1 siblings, 0 replies; 9+ messages in thread
From: Nishanth Aravamudan @ 2014-02-21 23:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michal Hocko, David Rientjes, Mel Gorman, linux-mm, LKML

On 21.02.2014 [14:07:35 -0800], Andrew Morton wrote:
> On Thu, 20 Feb 2014 10:51:44 +0100 Michal Hocko <mhocko@suse.cz> wrote:
> 
> > We had a report about strange OOM killer strikes on a PPC machine
> > although there was a lot of swap free and a tons of anonymous memory
> > which could be swapped out. In the end it turned out that the OOM was
> > a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> > and so the system was pushed to the OOM. Although this sounds like a bug
> > somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> > numactl on the said hardware suggests that the zone reclaim should
> > have been set in the first place:
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > node 0 size: 0 MB
> > node 0 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7168 MB
> > node 2 free: 6019 MB
> > node distances:
> > node   0   2
> > 0:  10  40
> > 2:  40  10
> > 
> > So all the CPUs are associated with Node0 which doesn't have any memory
> > while Node2 contains all the available memory. Node distances cause an
> > automatic zone_reclaim_mode enabling.
> > 
> > Zone reclaim is intended to keep the allocations local but this doesn't
> > make any sense on the memory less nodes. So let's exclude such nodes
> > for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> > suitable reclaim_nodes.
> > 
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> >  {
> >  	int i;
> >  
> > -	for_each_online_node(i)
> > +	for_each_node_state(i, N_MEMORY)
> >  		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> >  			node_set(i, NODE_DATA(nid)->reclaim_nodes);
> >  		else
> > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> >  
> >  	pgdat->node_id = nid;
> >  	pgdat->node_start_pfn = node_start_pfn;
> > -	init_zone_allows_reclaim(nid);
> > +	if (node_state(nid, N_MEMORY))
> > +		init_zone_allows_reclaim(nid);
> >  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> >  	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> >  #endif
> 
> What happens if someone later hot-adds some memory to that node?

This probably isn't a very good answer, but I think the question of how
to support a node that starts off memoryless and then gets memory
hot-added later is still open. But this at least gets us further and
would be needed anyways, I think.

I'm going to try and look at the hot-add component after we get this
base stuff in, if that's ok, but it's definitely on my todo list.

Thanks,
Nish


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
  2014-02-20  9:51 ` Michal Hocko
@ 2014-02-25 13:06   ` Mel Gorman
  -1 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2014-02-25 13:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, David Rientjes, Nishanth Aravamudan, linux-mm, LKML

On Thu, Feb 20, 2014 at 10:51:44AM +0100, Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node   0   2
> 0:  10  40
> 2:  40  10
> 
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
> 
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exclude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

Acked-by: Mel Gorman <mgorman@suse.de>

FWIW, I do expect that memory hot-adding memory later will cause problems
because the node is only initialised if it was previously considered offline
but Nishanth is making changes in that area and it would be easier to fix
it up in that context.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-25 13:06   ` Mel Gorman
  0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2014-02-25 13:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, David Rientjes, Nishanth Aravamudan, linux-mm, LKML

On Thu, Feb 20, 2014 at 10:51:44AM +0100, Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node   0   2
> 0:  10  40
> 2:  40  10
> 
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
> 
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exclude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

Acked-by: Mel Gorman <mgorman@suse.de>

FWIW, I do expect that memory hot-adding memory later will cause problems
because the node is only initialised if it was previously considered offline
but Nishanth is making changes in that area and it would be easier to fix
it up in that context.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-02-25 13:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-20  9:51 [PATCH] mm: exclude memory less nodes from zone_reclaim Michal Hocko
2014-02-20  9:51 ` Michal Hocko
2014-02-21 15:04 ` Michal Hocko
2014-02-21 15:04   ` Michal Hocko
2014-02-21 22:07 ` Andrew Morton
2014-02-21 22:07   ` Andrew Morton
2014-02-21 23:57   ` Nishanth Aravamudan
2014-02-25 13:06 ` Mel Gorman
2014-02-25 13:06   ` Mel Gorman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.