All of lore.kernel.org
* [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
@ 2011-09-27  7:23 Shaohua Li
  2011-09-27 11:28 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-09-27  7:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michal Hocko, mel, Rik van Riel, linux-mm

has_under_min_watermark_zone is used to detect whether there is a risk of
GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
it has its min watermark ok, we have no risk. But the current logic flags a
risk as soon as any zone has its min watermark not ok. This looks wrong to me.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 mm/vmscan.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux/mm/vmscan.c
===================================================================
--- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
+++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
@@ -2463,7 +2463,7 @@ loop_again:
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		unsigned long lru_pages = 0;
-		int has_under_min_watermark_zone = 0;
+		int has_under_min_watermark_zone = 1;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2594,9 +2594,10 @@ loop_again:
 				 * means that we have a GFP_ATOMIC allocation
 				 * failure risk. Hurry up!
 				 */
-				if (!zone_watermark_ok_safe(zone, order,
+				if (has_under_min_watermark_zone &&
+					    zone_watermark_ok_safe(zone, order,
 					    min_wmark_pages(zone), end_zone, 0))
-					has_under_min_watermark_zone = 1;
+					has_under_min_watermark_zone = 0;
 			} else {
 				/*
 				 * If a zone reaches its high watermark,


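To make the semantic change concrete, here is a hypothetical stand-alone model
of the two detection policies. The risk_old/risk_new helpers and the wmark_ok[]
array are made up for illustration; this is not the actual mm/vmscan.c code.
Each element of wmark_ok[] says whether one zone at or below end_zone currently
meets its min watermark.

```c
#include <assert.h>
#include <stdbool.h>

/* Old logic: flag a GFP_ATOMIC failure risk if ANY zone is below its
 * min watermark. */
static bool risk_old(const bool *wmark_ok, int nr_zones)
{
	int i;

	for (i = 0; i < nr_zones; i++)
		if (!wmark_ok[i])
			return true;	/* one starved zone => risk */
	return false;
}

/* New logic: flag a risk only if ALL zones are below their min
 * watermark, since a GFP_ATOMIC allocation can fall back to any zone
 * that is still ok. */
static bool risk_new(const bool *wmark_ok, int nr_zones)
{
	int i;

	for (i = 0; i < nr_zones; i++)
		if (wmark_ok[i])
			return false;	/* one healthy zone => no risk */
	return true;
}
```

With one healthy zone and two starved ones, the old policy reports a risk while
the new one does not; only when every zone is starved do both agree.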
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-09-27  7:23 [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure Shaohua Li
@ 2011-09-27 11:28 ` Michal Hocko
  2011-09-28  0:48   ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-09-27 11:28 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Andrew Morton, mel, Rik van Riel, linux-mm

On Tue 27-09-11 15:23:07, Shaohua Li wrote:
> has_under_min_watermark_zone is used to detect whether there is a risk of
> GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> it has its min watermark ok, we have no risk. But the current logic flags a
> risk as soon as any zone has its min watermark not ok. This looks wrong to me.

This, however, means that we skip congestion_wait more often, as ZONE_DMA
tends to be mostly balanced, right? That would mean kswapd could hog the
CPU more.
Does this fix any particular problem you are seeing?

> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> ---
>  mm/vmscan.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Index: linux/mm/vmscan.c
> ===================================================================
> --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> @@ -2463,7 +2463,7 @@ loop_again:
>  
>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>  		unsigned long lru_pages = 0;
> -		int has_under_min_watermark_zone = 0;
> +		int has_under_min_watermark_zone = 1;
>  
>  		/* The swap token gets in the way of swapout... */
>  		if (!priority)
> @@ -2594,9 +2594,10 @@ loop_again:
>  				 * means that we have a GFP_ATOMIC allocation
>  				 * failure risk. Hurry up!
>  				 */
> -				if (!zone_watermark_ok_safe(zone, order,
> +				if (has_under_min_watermark_zone &&
> +					    zone_watermark_ok_safe(zone, order,
>  					    min_wmark_pages(zone), end_zone, 0))
> -					has_under_min_watermark_zone = 1;
> +					has_under_min_watermark_zone = 0;
>  			} else {
>  				/*
>  				 * If a zone reaches its high watermark,
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic



* Re: [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-09-27 11:28 ` Michal Hocko
@ 2011-09-28  0:48   ` Shaohua Li
  2011-09-28  9:27     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-09-28  0:48 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, mel, Rik van Riel, linux-mm

On Tue, 2011-09-27 at 19:28 +0800, Michal Hocko wrote:
> On Tue 27-09-11 15:23:07, Shaohua Li wrote:
> > has_under_min_watermark_zone is used to detect whether there is a risk of
> > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > it has its min watermark ok, we have no risk. But the current logic flags a
> > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> 
> This, however, means that we skip congestion_wait more often, as ZONE_DMA
> tends to be mostly balanced, right? That would mean kswapd could hog the
> CPU more.
We might actually get more congestion_wait: now, if any zone meets its
min watermark, has_under_min_watermark_zone is not set, so we do call
congestion_wait.

> Does this fix any particular problem you are seeing?
No, I just thought the logic was wrong.

Thanks,
Shaohua



* Re: [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-09-28  0:48   ` Shaohua Li
@ 2011-09-28  9:27     ` Michal Hocko
  2011-10-08  3:14       ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-09-28  9:27 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Andrew Morton, mel, Rik van Riel, linux-mm

On Wed 28-09-11 08:48:53, Shaohua Li wrote:
> On Tue, 2011-09-27 at 19:28 +0800, Michal Hocko wrote:
> > On Tue 27-09-11 15:23:07, Shaohua Li wrote:
> > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > > it has its min watermark ok, we have no risk. But the current logic flags a
> > > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > 
> > This, however, means that we skip congestion_wait more often, as ZONE_DMA
> > tends to be mostly balanced, right? That would mean kswapd could hog the
> > CPU more.
> We might actually get more congestion_wait: now, if any zone meets its
> min watermark, has_under_min_watermark_zone is not set, so we do call
> congestion_wait.

Ahh, sorry, got confused.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic



* Re: [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-09-28  9:27     ` Michal Hocko
@ 2011-10-08  3:14       ` Shaohua Li
  2011-10-08  3:19         ` David Rientjes
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-08  3:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mel, Rik van Riel, Michal Hocko, Minchan Kim, linux-mm

On Wed, 2011-09-28 at 17:27 +0800, Michal Hocko wrote:
> On Wed 28-09-11 08:48:53, Shaohua Li wrote:
> > On Tue, 2011-09-27 at 19:28 +0800, Michal Hocko wrote:
> > > On Tue 27-09-11 15:23:07, Shaohua Li wrote:
> > > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > > > it has its min watermark ok, we have no risk. But the current logic flags a
> > > > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > > 
> > > This, however, means that we skip congestion_wait more often, as ZONE_DMA
> > > tends to be mostly balanced, right? That would mean kswapd could hog the
> > > CPU more.
> > We might actually get more congestion_wait: now, if any zone meets its
> > min watermark, has_under_min_watermark_zone is not set, so we do call
> > congestion_wait.
> 
> Ahh, sorry, got confused.
Resending the patch to the correct email address for akpm.

Subject: vmscan: correctly detect GFP_ATOMIC allocation failure

has_under_min_watermark_zone is used to detect whether there is a risk of
GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
it has its min watermark ok, we have no risk. But the current logic flags a
risk as soon as any zone has its min watermark not ok. This looks wrong to me.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 mm/vmscan.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux/mm/vmscan.c
===================================================================
--- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
+++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
@@ -2463,7 +2463,7 @@ loop_again:
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		unsigned long lru_pages = 0;
-		int has_under_min_watermark_zone = 0;
+		int has_under_min_watermark_zone = 1;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2594,9 +2594,10 @@ loop_again:
 				 * means that we have a GFP_ATOMIC allocation
 				 * failure risk. Hurry up!
 				 */
-				if (!zone_watermark_ok_safe(zone, order,
+				if (has_under_min_watermark_zone &&
+					    zone_watermark_ok_safe(zone, order,
 					    min_wmark_pages(zone), end_zone, 0))
-					has_under_min_watermark_zone = 1;
+					has_under_min_watermark_zone = 0;
 			} else {
 				/*
 				 * If a zone reaches its high watermark,




* Re: [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-08  3:14       ` Shaohua Li
@ 2011-10-08  3:19         ` David Rientjes
  2011-10-08  3:35           ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-10-08  3:19 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Andrew Morton, mel, Rik van Riel, Michal Hocko, Minchan Kim, linux-mm

On Sat, 8 Oct 2011, Shaohua Li wrote:

> has_under_min_watermark_zone is used to detect whether there is a risk of
> GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> it has its min watermark ok, we have no risk. But the current logic flags a
> risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> ---
>  mm/vmscan.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Index: linux/mm/vmscan.c
> ===================================================================
> --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> @@ -2463,7 +2463,7 @@ loop_again:
>  
>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>  		unsigned long lru_pages = 0;
> -		int has_under_min_watermark_zone = 0;
> +		int has_under_min_watermark_zone = 1;

bool

>  
>  		/* The swap token gets in the way of swapout... */
>  		if (!priority)
> @@ -2594,9 +2594,10 @@ loop_again:
>  				 * means that we have a GFP_ATOMIC allocation
>  				 * failure risk. Hurry up!
>  				 */
> -				if (!zone_watermark_ok_safe(zone, order,
> +				if (has_under_min_watermark_zone &&
> +					    zone_watermark_ok_safe(zone, order,
>  					    min_wmark_pages(zone), end_zone, 0))
> -					has_under_min_watermark_zone = 1;
> +					has_under_min_watermark_zone = 0;
>  			} else {
>  				/*
>  				 * If a zone reaches its high watermark,

Ignore the min watermark check for a moment and consider the case where all
zones are above the high watermark (a situation where kswapd does not need
to do aggressive reclaim): has_under_min_watermark_zone never gets cleared,
so kswapd never actually stalls in congestion_wait().  Notice this is
congestion_wait() and not wait_iff_congested(), so clearing ZONE_CONGESTED
doesn't prevent this.

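For illustration, this scenario can be sketched with a hypothetical
simplification of the v1 flag handling in the balance_pgdat() priority loop.
The zone_state struct and the helper are made up for this sketch and are not
the actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct zone_state {
	bool high_wmark_ok;	/* zone at or above its high watermark */
	bool min_wmark_ok;	/* zone at or above its min watermark  */
};

/* Returns whether one priority iteration would call congestion_wait()
 * under the v1 patch: the flag starts set and is cleared only in the
 * unbalanced branch, when such a zone still meets its min watermark. */
static bool v1_would_congestion_wait(const struct zone_state *zs, int n)
{
	bool has_under_min_watermark_zone = true;
	int i;

	for (i = 0; i < n; i++) {
		if (!zs[i].high_wmark_ok) {
			/* unbalanced zone with a healthy min watermark
			 * clears the flag */
			if (has_under_min_watermark_zone &&
			    zs[i].min_wmark_ok)
				has_under_min_watermark_zone = false;
		}
		/* balanced zone: v1 never touches the flag */
	}
	/* kswapd stalls only when no GFP_ATOMIC failure risk was seen */
	return !has_under_min_watermark_zone;
}
```

With every zone balanced, the loop never enters the unbalanced branch, the flag
stays set, and the wait is skipped on every iteration, which is the concern
raised above.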


* Re: [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-08  3:19         ` David Rientjes
@ 2011-10-08  3:35           ` Shaohua Li
  2011-10-08  5:56             ` [patch v2]vmscan: " Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-08  3:35 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, mel, Rik van Riel, Michal Hocko, Minchan Kim, linux-mm

On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> On Sat, 8 Oct 2011, Shaohua Li wrote:
> 
> > has_under_min_watermark_zone is used to detect whether there is a risk of
> > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > it has its min watermark ok, we have no risk. But the current logic flags a
> > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > 
> > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > ---
> >  mm/vmscan.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > Index: linux/mm/vmscan.c
> > ===================================================================
> > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > @@ -2463,7 +2463,7 @@ loop_again:
> >  
> >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> >  		unsigned long lru_pages = 0;
> > -		int has_under_min_watermark_zone = 0;
> > +		int has_under_min_watermark_zone = 1;
> 
> bool
> 
> >  
> >  		/* The swap token gets in the way of swapout... */
> >  		if (!priority)
> > @@ -2594,9 +2594,10 @@ loop_again:
> >  				 * means that we have a GFP_ATOMIC allocation
> >  				 * failure risk. Hurry up!
> >  				 */
> > -				if (!zone_watermark_ok_safe(zone, order,
> > +				if (has_under_min_watermark_zone &&
> > +					    zone_watermark_ok_safe(zone, order,
> >  					    min_wmark_pages(zone), end_zone, 0))
> > -					has_under_min_watermark_zone = 1;
> > +					has_under_min_watermark_zone = 0;
> >  			} else {
> >  				/*
> >  				 * If a zone reaches its high watermark,
> 
> Ignore the min watermark check for a moment and consider the case where all
> zones are above the high watermark (a situation where kswapd does not need
> to do aggressive reclaim): has_under_min_watermark_zone never gets cleared,
> so kswapd never actually stalls in congestion_wait().  Notice this is
> congestion_wait() and not wait_iff_congested(), so clearing ZONE_CONGESTED
> doesn't prevent this.
If all zones are above the high watermark, we will have i < 0 when
detecting the highest imbalanced zone, and the whole loop will end
without running into congestion_wait().
Or I can add code to clear has_under_min_watermark_zone in the else
block, to be safe.



* [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-08  3:35           ` Shaohua Li
@ 2011-10-08  5:56             ` Shaohua Li
  2011-10-08 10:25               ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-08  5:56 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: mel, Rik van Riel, Michal Hocko, Minchan Kim, linux-mm

On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > 
> > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > > it has its min watermark ok, we have no risk. But the current logic flags a
> > > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > > 
> > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > ---
> > >  mm/vmscan.c |    7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > Index: linux/mm/vmscan.c
> > > ===================================================================
> > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > @@ -2463,7 +2463,7 @@ loop_again:
> > >  
> > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > >  		unsigned long lru_pages = 0;
> > > -		int has_under_min_watermark_zone = 0;
> > > +		int has_under_min_watermark_zone = 1;
> > 
> > bool
> > 
> > >  
> > >  		/* The swap token gets in the way of swapout... */
> > >  		if (!priority)
> > > @@ -2594,9 +2594,10 @@ loop_again:
> > >  				 * means that we have a GFP_ATOMIC allocation
> > >  				 * failure risk. Hurry up!
> > >  				 */
> > > -				if (!zone_watermark_ok_safe(zone, order,
> > > +				if (has_under_min_watermark_zone &&
> > > +					    zone_watermark_ok_safe(zone, order,
> > >  					    min_wmark_pages(zone), end_zone, 0))
> > > -					has_under_min_watermark_zone = 1;
> > > +					has_under_min_watermark_zone = 0;
> > >  			} else {
> > >  				/*
> > >  				 * If a zone reaches its high watermark,
> > 
> > Ignore the min watermark check for a moment and consider the case where all
> > zones are above the high watermark (a situation where kswapd does not need
> > to do aggressive reclaim): has_under_min_watermark_zone never gets cleared,
> > so kswapd never actually stalls in congestion_wait().  Notice this is
> > congestion_wait() and not wait_iff_congested(), so clearing ZONE_CONGESTED
> > doesn't prevent this.
> If all zones are above the high watermark, we will have i < 0 when
> detecting the highest imbalanced zone, and the whole loop will end
> without running into congestion_wait().
> Or I can add code to clear has_under_min_watermark_zone in the else
> block, to be safe.
Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2

has_under_min_watermark_zone is used to detect whether there is a risk of
GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
it has its min watermark ok, we have no risk. But the current logic flags a
risk as soon as any zone has its min watermark not ok. This looks wrong to me.

v2: use bool, and clear has_under_min_watermark_zone for a zone whose
watermark is ok, as suggested by David Rientjes.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 mm/vmscan.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux/mm/vmscan.c
===================================================================
--- linux.orig/mm/vmscan.c	2011-10-08 13:31:49.000000000 +0800
+++ linux/mm/vmscan.c	2011-10-08 13:32:27.000000000 +0800
@@ -2463,7 +2463,7 @@ loop_again:
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		unsigned long lru_pages = 0;
-		int has_under_min_watermark_zone = 0;
+		bool has_under_min_watermark_zone = true;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2594,9 +2594,10 @@ loop_again:
 				 * means that we have a GFP_ATOMIC allocation
 				 * failure risk. Hurry up!
 				 */
-				if (!zone_watermark_ok_safe(zone, order,
+				if (has_under_min_watermark_zone &&
+					    zone_watermark_ok_safe(zone, order,
 					    min_wmark_pages(zone), end_zone, 0))
-					has_under_min_watermark_zone = 1;
+					has_under_min_watermark_zone = false;
 			} else {
 				/*
 				 * If a zone reaches its high watermark,
@@ -2608,6 +2609,7 @@ loop_again:
 				zone_clear_flag(zone, ZONE_CONGESTED);
 				if (i <= *classzone_idx)
 					balanced += zone->present_pages;
+				has_under_min_watermark_zone = false;
 			}
 
 		}


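A hypothetical model of the v2 behaviour (made-up struct and helper, not the
actual kernel code) shows the effect of the extra clear in the balanced-zone
branch: a fully balanced node no longer leaves the flag set forever.

```c
#include <assert.h>
#include <stdbool.h>

struct zone_state {
	bool high_wmark_ok;	/* zone at or above its high watermark */
	bool min_wmark_ok;	/* zone at or above its min watermark  */
};

/* Returns whether one priority iteration would call congestion_wait()
 * under v2: the flag starts true and is cleared either by an unbalanced
 * zone that still meets its min watermark, or by any balanced zone. */
static bool v2_would_congestion_wait(const struct zone_state *zs, int n)
{
	bool has_under_min_watermark_zone = true;
	int i;

	for (i = 0; i < n; i++) {
		if (!zs[i].high_wmark_ok) {
			if (has_under_min_watermark_zone &&
			    zs[i].min_wmark_ok)
				has_under_min_watermark_zone = false;
		} else {
			/* v2 addition: a balanced zone also rules out
			 * the GFP_ATOMIC failure risk */
			has_under_min_watermark_zone = false;
		}
	}
	return !has_under_min_watermark_zone;
}
```

Only when every zone is below its min watermark does the flag survive the loop
and the wait get skipped, so kswapd hurries only when there is a real risk.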



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-08  5:56             ` [patch v2]vmscan: " Shaohua Li
@ 2011-10-08 10:25               ` Minchan Kim
  2011-10-09  5:53                 ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-08 10:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > 
> > > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > > > it has its min watermark ok, we have no risk. But the current logic flags a
> > > > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > > > 
> > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > ---
> > > >  mm/vmscan.c |    7 ++++---
> > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > 
> > > > Index: linux/mm/vmscan.c
> > > > ===================================================================
> > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > >  
> > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > >  		unsigned long lru_pages = 0;
> > > > -		int has_under_min_watermark_zone = 0;
> > > > +		int has_under_min_watermark_zone = 1;
> > > 
> > > bool
> > > 
> > > >  
> > > >  		/* The swap token gets in the way of swapout... */
> > > >  		if (!priority)
> > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > >  				 * means that we have a GFP_ATOMIC allocation
> > > >  				 * failure risk. Hurry up!
> > > >  				 */
> > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > +				if (has_under_min_watermark_zone &&
> > > > +					    zone_watermark_ok_safe(zone, order,
> > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > -					has_under_min_watermark_zone = 1;
> > > > +					has_under_min_watermark_zone = 0;
> > > >  			} else {
> > > >  				/*
> > > >  				 * If a zone reaches its high watermark,
> > > 
> > > Ignore the min watermark check for a moment and consider the case where all
> > > zones are above the high watermark (a situation where kswapd does not need
> > > to do aggressive reclaim): has_under_min_watermark_zone never gets cleared,
> > > so kswapd never actually stalls in congestion_wait().  Notice this is
> > > congestion_wait() and not wait_iff_congested(), so clearing ZONE_CONGESTED
> > > doesn't prevent this.
> > If all zones are above the high watermark, we will have i < 0 when
> > detecting the highest imbalanced zone, and the whole loop will end
> > without running into congestion_wait().
> > Or I can add code to clear has_under_min_watermark_zone in the else
> > block, to be safe.
> Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> 
> has_under_min_watermark_zone is used to detect whether there is a risk of
> GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> it has its min watermark ok, we have no risk. But the current logic flags a
> risk as soon as any zone has its min watermark not ok. This looks wrong to me.

I think it's not a right-or-wrong problem but a policy question.
If we only start busy reclaiming for atomic allocations after we see that
all lower zones' min watermark pages are already consumed, it could cause
long latency and is likely to fail a stream of atomic allocations.
(There is nothing direct reclaim can do about an atomic allocation
failure, so kswapd should always make its best effort for it.)

I don't mean you are wrong, but we should be very careful about this and
at least need some experiments with an atomic allocation stream, I think.

-- 
Kind regards,
Minchan Kim



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-08 10:25               ` Minchan Kim
@ 2011-10-09  5:53                 ` Shaohua Li
  2011-10-09  8:01                   ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-09  5:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > 
> > > > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > > > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > > > > it has its min watermark ok, we have no risk. But the current logic flags a
> > > > > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > > > > 
> > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > ---
> > > > >  mm/vmscan.c |    7 ++++---
> > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > 
> > > > > Index: linux/mm/vmscan.c
> > > > > ===================================================================
> > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > >  
> > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > >  		unsigned long lru_pages = 0;
> > > > > -		int has_under_min_watermark_zone = 0;
> > > > > +		int has_under_min_watermark_zone = 1;
> > > > 
> > > > bool
> > > > 
> > > > >  
> > > > >  		/* The swap token gets in the way of swapout... */
> > > > >  		if (!priority)
> > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > >  				 * failure risk. Hurry up!
> > > > >  				 */
> > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > +				if (has_under_min_watermark_zone &&
> > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > -					has_under_min_watermark_zone = 1;
> > > > > +					has_under_min_watermark_zone = 0;
> > > > >  			} else {
> > > > >  				/*
> > > > >  				 * If a zone reaches its high watermark,
> > > > 
> > > > Ignore the min watermark check for a moment and consider the case where all
> > > > zones are above the high watermark (a situation where kswapd does not need
> > > > to do aggressive reclaim): has_under_min_watermark_zone never gets cleared,
> > > > so kswapd never actually stalls in congestion_wait().  Notice this is
> > > > congestion_wait() and not wait_iff_congested(), so clearing ZONE_CONGESTED
> > > > doesn't prevent this.
> > > If all zones are above the high watermark, we will have i < 0 when
> > > detecting the highest imbalanced zone, and the whole loop will end
> > > without running into congestion_wait().
> > > Or I can add code to clear has_under_min_watermark_zone in the else
> > > block, to be safe.
> > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > 
> > has_under_min_watermark_zone is used to detect whether there is a risk of
> > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > it has its min watermark ok, we have no risk. But the current logic flags a
> > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> 
> I think it's not a right-or-wrong problem but a policy question.
> If we only start busy reclaiming for atomic allocations after we see that
> all lower zones' min watermark pages are already consumed, it could cause
> long latency and is likely to fail a stream of atomic allocations.
> (There is nothing direct reclaim can do about an atomic allocation
> failure, so kswapd should always make its best effort for it.)
> 
> I don't mean you are wrong, but we should be very careful about this and
> at least need some experiments with an atomic allocation stream, I think.
Yes, this is a policy problem. I just don't want kswapd to keep running
even when there is no immediate risk of atomic allocation failure.
One problem here is that end_zone could be high while a low zone never
meets its min watermark, so kswapd keeps running without a wait and
builds up a big priority.

Thanks,
Shaohua



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-09  5:53                 ` Shaohua Li
@ 2011-10-09  8:01                   ` Minchan Kim
  2011-10-09  8:17                     ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-09  8:01 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > 
> > > > > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > > > > GFP_ATOMIC allocation failure. For a given end_zone, if any zone at or below
> > > > > > it has its min watermark ok, we have no risk. But the current logic flags a
> > > > > > risk as soon as any zone has its min watermark not ok. This looks wrong to me.
> > > > > > 
> > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > ---
> > > > > >  mm/vmscan.c |    7 ++++---
> > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > Index: linux/mm/vmscan.c
> > > > > > ===================================================================
> > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > >  
> > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > >  		unsigned long lru_pages = 0;
> > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > 
> > > > > bool
> > > > > 
> > > > > >  
> > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > >  		if (!priority)
> > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > >  				 * failure risk. Hurry up!
> > > > > >  				 */
> > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > +					has_under_min_watermark_zone = 0;
> > > > > >  			} else {
> > > > > >  				/*
> > > > > >  				 * If a zone reaches its high watermark,
> > > > > 
> > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > ZONE_CONGESTED doesn't prevent this.
> > > > if all zones are above the high watermark, we will have i < 0 when
> > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > without run into congestion_wait().
> > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > to be safe.
> > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > 
> > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > not ok, then we have risk. This is wrong to me.
> > 
> > I think it's not a right or wrong problem but a policy stuff.
> > If we are going to start busy reclaiming for atomic allocation
> > after we see all lower zones' min water mark pages are already consumed
> > It could make you go through long latency and is likely to fail atomic allocation
> > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > so kswapd should always do best effort for it)
> > 
> > I don't mean you are wrong but we are very careful about this
> > and at least need some experiments with atomic allocaion stream, I think.
> yes. this is a policy problem. I just don't want the kswapd keep running
> even there is no immediate risk of atomic allocation fail.
> One problem here is end_zone could be high, and low zone always doesn't
> meet min watermark. So kswapd keeps running without a wait and builds
> big priority.

It could happen, but I think it's an admin mistake to run such a rare system.
Couldn't the admin lower the reserved pages for highmem?

> 
> Thanks,
> Shaohua
> 

-- 
Kind regards,
Minchan Kim



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-09  8:01                   ` Minchan Kim
@ 2011-10-09  8:17                     ` Shaohua Li
  2011-10-09 15:10                       ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-09  8:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > 
> > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > 
> > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > ---
> > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > Index: linux/mm/vmscan.c
> > > > > > > ===================================================================
> > > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > >  
> > > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > >  		unsigned long lru_pages = 0;
> > > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > > 
> > > > > > bool
> > > > > > 
> > > > > > >  
> > > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > > >  		if (!priority)
> > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > > >  				 * failure risk. Hurry up!
> > > > > > >  				 */
> > > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > > +					has_under_min_watermark_zone = 0;
> > > > > > >  			} else {
> > > > > > >  				/*
> > > > > > >  				 * If a zone reaches its high watermark,
> > > > > > 
> > > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > without run into congestion_wait().
> > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > to be safe.
> > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > 
> > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > not ok, then we have risk. This is wrong to me.
> > > 
> > > I think it's not a right or wrong problem but a policy stuff.
> > > If we are going to start busy reclaiming for atomic allocation
> > > after we see all lower zones' min water mark pages are already consumed
> > > It could make you go through long latency and is likely to fail atomic allocation
> > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > so kswapd should always do best effort for it)
> > > 
> > > I don't mean you are wrong but we are very careful about this
> > > and at least need some experiments with atomic allocaion stream, I think.
> > yes. this is a policy problem. I just don't want the kswapd keep running
> > even there is no immediate risk of atomic allocation fail.
> > One problem here is end_zone could be high, and low zone always doesn't
> > meet min watermark. So kswapd keeps running without a wait and builds
> > big priority.
> 
> It could be but I think it's a mistake of admin if he handles such rare system.
> Couldn't he lower the reserved pages for highmem?
It's not because the admin changed the reserved pages; even with the
defaults we still have the zone->lowmem_reserve[] issue in
zone_watermark_ok here.

Thanks,
Shaohua



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-09  8:17                     ` Shaohua Li
@ 2011-10-09 15:10                       ` Minchan Kim
  2011-10-10  7:28                         ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-09 15:10 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Sun, Oct 09, 2011 at 04:17:51PM +0800, Shaohua Li wrote:
> On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> > On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > > 
> > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > 
> > > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > > ---
> > > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > Index: linux/mm/vmscan.c
> > > > > > > > ===================================================================
> > > > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > > >  
> > > > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > > >  		unsigned long lru_pages = 0;
> > > > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > > > 
> > > > > > > bool
> > > > > > > 
> > > > > > > >  
> > > > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > > > >  		if (!priority)
> > > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > > > >  				 * failure risk. Hurry up!
> > > > > > > >  				 */
> > > > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > > > +					has_under_min_watermark_zone = 0;
> > > > > > > >  			} else {
> > > > > > > >  				/*
> > > > > > > >  				 * If a zone reaches its high watermark,
> > > > > > > 
> > > > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > > without run into congestion_wait().
> > > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > > to be safe.
> > > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > > 
> > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > not ok, then we have risk. This is wrong to me.
> > > > 
> > > > I think it's not a right or wrong problem but a policy stuff.
> > > > If we are going to start busy reclaiming for atomic allocation
> > > > after we see all lower zones' min water mark pages are already consumed
> > > > It could make you go through long latency and is likely to fail atomic allocation
> > > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > > so kswapd should always do best effort for it)
> > > > 
> > > > I don't mean you are wrong but we are very careful about this
> > > > and at least need some experiments with atomic allocaion stream, I think.
> > > yes. this is a policy problem. I just don't want the kswapd keep running
> > > even there is no immediate risk of atomic allocation fail.
> > > One problem here is end_zone could be high, and low zone always doesn't
> > > meet min watermark. So kswapd keeps running without a wait and builds
> > > big priority.
> > 
> > It could be but I think it's a mistake of admin if he handles such rare system.
> > Couldn't he lower the reserved pages for highmem?
> not because admin changes reserved pages. we still have the
> zone->lowmem_reserve[] issue for zone_watermark_ok here.

Sorry, I couldn't understand your point.
I mean that if the min watermark is too high, you could lower min_free_kbytes,
and if the reserved pages are too high, you could tune lowmem_reserve_ratio.
Could we solve the problem with those knobs?

In addition, kswapd could easily set all_unreclaimable on the zone in your example.
Then kswapd would just peek at the zone once in a while as long as it stays
all_unreclaimable, so there should be no problem in terms of CPU overhead.

-- 
Kind regards,
Minchan Kim



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-09 15:10                       ` Minchan Kim
@ 2011-10-10  7:28                         ` Shaohua Li
  2011-10-10 15:42                           ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-10  7:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Sun, 2011-10-09 at 23:10 +0800, Minchan Kim wrote:
> On Sun, Oct 09, 2011 at 04:17:51PM +0800, Shaohua Li wrote:
> > On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> > > On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > > > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > > > 
> > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > > > ---
> > > > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > 
> > > > > > > > > Index: linux/mm/vmscan.c
> > > > > > > > > ===================================================================
> > > > > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > > > >  
> > > > > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > > > >  		unsigned long lru_pages = 0;
> > > > > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > > > > 
> > > > > > > > bool
> > > > > > > > 
> > > > > > > > >  
> > > > > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > > > > >  		if (!priority)
> > > > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > > > > >  				 * failure risk. Hurry up!
> > > > > > > > >  				 */
> > > > > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > > > > +					has_under_min_watermark_zone = 0;
> > > > > > > > >  			} else {
> > > > > > > > >  				/*
> > > > > > > > >  				 * If a zone reaches its high watermark,
> > > > > > > > 
> > > > > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > > > without run into congestion_wait().
> > > > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > > > to be safe.
> > > > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > > > 
> > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > not ok, then we have risk. This is wrong to me.
> > > > > 
> > > > > I think it's not a right or wrong problem but a policy stuff.
> > > > > If we are going to start busy reclaiming for atomic allocation
> > > > > after we see all lower zones' min water mark pages are already consumed
> > > > > It could make you go through long latency and is likely to fail atomic allocation
> > > > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > > > so kswapd should always do best effort for it)
> > > > > 
> > > > > I don't mean you are wrong but we are very careful about this
> > > > > and at least need some experiments with atomic allocaion stream, I think.
> > > > yes. this is a policy problem. I just don't want the kswapd keep running
> > > > even there is no immediate risk of atomic allocation fail.
> > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > meet min watermark. So kswapd keeps running without a wait and builds
> > > > big priority.
> > > 
> > > It could be but I think it's a mistake of admin if he handles such rare system.
> > > Couldn't he lower the reserved pages for highmem?
> > not because admin changes reserved pages. we still have the
> > zone->lowmem_reserve[] issue for zone_watermark_ok here.
> 
> Sorry I couldn't understand your point.
> I mean if min watermark is too high, you could lower min_free_kbytes.
> If reserved pages is too high, you could handle lowmem_reserve_ratio.
> Could we solve the problem with those knobs?
I mean a system with the default settings. Changing the knobs might solve
the problem, but few people know about them, so that is not the right
solution.

> In addition, kswapd could easily set all_unreclaimable of the zone in your example.
> Then, kswapd should just peek the zone once in a while if the zone is all_unreclaimable.
> It should be no problem in CPU overhead.
Even if the low zone has all_unreclaimable set, the high zone will be
scanned again at low priority (because its high watermark might not be ok
even though its min watermark is), without a congestion_wait. This causes a
lot of unnecessary page scanning; with a wait, some pages might be freed in
the meantime.
The DMA zone is a special case, but I think this can happen in other low
zones too. Such low zones might still have lru pages, so all_unreclaimable
isn't set for them.

Thanks,
Shaohua



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-10  7:28                         ` Shaohua Li
@ 2011-10-10 15:42                           ` Minchan Kim
  2011-10-11  5:30                             ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-10 15:42 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Mon, Oct 10, 2011 at 03:28:13PM +0800, Shaohua Li wrote:
> On Sun, 2011-10-09 at 23:10 +0800, Minchan Kim wrote:
> > On Sun, Oct 09, 2011 at 04:17:51PM +0800, Shaohua Li wrote:
> > > On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> > > > On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > > > > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > > > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > > > > 
> > > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > > > > ---
> > > > > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > Index: linux/mm/vmscan.c
> > > > > > > > > > ===================================================================
> > > > > > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > > > > >  
> > > > > > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > > > > >  		unsigned long lru_pages = 0;
> > > > > > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > > > > > 
> > > > > > > > > bool
> > > > > > > > > 
> > > > > > > > > >  
> > > > > > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > > > > > >  		if (!priority)
> > > > > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > > > > > >  				 * failure risk. Hurry up!
> > > > > > > > > >  				 */
> > > > > > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > > > > > +					has_under_min_watermark_zone = 0;
> > > > > > > > > >  			} else {
> > > > > > > > > >  				/*
> > > > > > > > > >  				 * If a zone reaches its high watermark,
> > > > > > > > > 
> > > > > > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > > > > without run into congestion_wait().
> > > > > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > > > > to be safe.
> > > > > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > > > > 
> > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > 
> > > > > > I think it's not a right or wrong problem but a policy stuff.
> > > > > > If we are going to start busy reclaiming for atomic allocation
> > > > > > after we see all lower zones' min water mark pages are already consumed
> > > > > > It could make you go through long latency and is likely to fail atomic allocation
> > > > > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > > > > so kswapd should always do best effort for it)
> > > > > > 
> > > > > > I don't mean you are wrong but we are very careful about this
> > > > > > and at least need some experiments with atomic allocaion stream, I think.
> > > > > yes. this is a policy problem. I just don't want the kswapd keep running
> > > > > even there is no immediate risk of atomic allocation fail.
> > > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > > meet min watermark. So kswapd keeps running without a wait and builds
> > > > > big priority.
> > > > 
> > > > It could be but I think it's a mistake of admin if he handles such rare system.
> > > > Couldn't he lower the reserved pages for highmem?
> > > not because admin changes reserved pages. we still have the
> > > zone->lowmem_reserve[] issue for zone_watermark_ok here.
> > 
> > Sorry I couldn't understand your point.
> > I mean if min watermark is too high, you could lower min_free_kbytes.
> > If reserved pages is too high, you could handle lowmem_reserve_ratio.
> > Could we solve the problem with those knobs?
> I mean a system with default setting. Changing the knobs might solve the
> problem, but few people know it, so not a right solution.

We can't cover every case, so we should focus on the common case;
that's why we have the knobs.
Do you think your case is a general one we should handle by default?
If you really do, we have to fix the watermark setting logic rather than kswapd.

> 
> > In addition, kswapd could easily set all_unreclaimable of the zone in your example.
> > Then, kswapd should just peek the zone once in a while if the zone is all_unreclaimable.
> > It should be no problem in CPU overhead.
> even the low zone has all_unreclaimable, the high zone will be scanned
> again with low priority (because its high watermark might not be ok, but
> min watermark is ok), and does not do congestion_wait. This will make a

As you said, the high zone is below its high watermark, so scanning it is
natural; that's one of kswapd's goals. The reason it doesn't sleep is that
one zone of the zonelist is below its min watermark, which is also natural
under the current policy.

> lot of unnecessary page scan. With wait, some pages might be freed.

Hmm, my opinion is different. The high zone hasn't been filled with enough
free pages to satisfy its high watermark even though the priority is already
high (ie, priority < DEF_PRIORITY - 2). That means we should extend the
window size used to isolate pages from the LRU.

Having said that, I understand your point.
If it weren't for this, we could sleep in congestion_wait so that scans per
second would stay low. But the situation is an emergency, because the min
watermark is a very important mark for keeping the system working well; it
is related to PF_MEMALLOC. So let's sacrifice CPU overhead for system
stability.

> DMA zone is a special case, and I thought this can happen in other low
> zones too. such low zones might have lru pages. so all_unreclaimable
> isn't set in such zones.
> 
> Thanks,
> Shaohua
> 

-- 
Kind regards,
Minchan Kim



* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-10 15:42                           ` Minchan Kim
@ 2011-10-11  5:30                             ` Shaohua Li
  2011-10-11  6:54                               ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-11  5:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Mon, 2011-10-10 at 23:42 +0800, Minchan Kim wrote:
> On Mon, Oct 10, 2011 at 03:28:13PM +0800, Shaohua Li wrote:
> > On Sun, 2011-10-09 at 23:10 +0800, Minchan Kim wrote:
> > > On Sun, Oct 09, 2011 at 04:17:51PM +0800, Shaohua Li wrote:
> > > > On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> > > > > On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > > > > > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > > > > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > > > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > > > > > 
> > > > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > Index: linux/mm/vmscan.c
> > > > > > > > > > > ===================================================================
> > > > > > > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > > > > > >  
> > > > > > > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > > > > > >  		unsigned long lru_pages = 0;
> > > > > > > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > > > > > > 
> > > > > > > > > > bool
> > > > > > > > > > 
> > > > > > > > > > >  
> > > > > > > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > > > > > > >  		if (!priority)
> > > > > > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > > > > > > >  				 * failure risk. Hurry up!
> > > > > > > > > > >  				 */
> > > > > > > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > > > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > > > > > > +					has_under_min_watermark_zone = 0;
> > > > > > > > > > >  			} else {
> > > > > > > > > > >  				/*
> > > > > > > > > > >  				 * If a zone reaches its high watermark,
> > > > > > > > > > 
> > > > > > > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > > > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > > > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > > > > > without run into congestion_wait().
> > > > > > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > > > > > to be safe.
> > > > > > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > > > > > 
> > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > 
> > > > > > > I think it's not a right or wrong problem but a policy stuff.
> > > > > > > If we are going to start busy reclaiming for atomic allocation
> > > > > > > after we see all lower zones' min water mark pages are already consumed
> > > > > > > It could make you go through long latency and is likely to fail atomic allocation
> > > > > > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > > > > > so kswapd should always do best effort for it)
> > > > > > > 
> > > > > > > I don't mean you are wrong but we are very careful about this
> > > > > > > and at least need some experiments with atomic allocaion stream, I think.
> > > > > > yes. this is a policy problem. I just don't want the kswapd keep running
> > > > > > even there is no immediate risk of atomic allocation fail.
> > > > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > > > meet min watermark. So kswapd keeps running without a wait and builds
> > > > > > big priority.
> > > > > 
> > > > > It could be but I think it's a mistake of admin if he handles such rare system.
> > > > > Couldn't he lower the reserved pages for highmem?
> > > > not because admin changes reserved pages. we still have the
> > > > zone->lowmem_reserve[] issue for zone_watermark_ok here.
> > > 
> > > Sorry I couldn't understand your point.
> > > I mean if min watermark is too high, you could lower min_free_kbytes.
> > > If reserved pages is too high, you could handle lowmem_reserve_ratio.
> > > Could we solve the problem with those knobs?
> > I mean a system with default setting. Changing the knobs might solve the
> > problem, but few people know it, so not a right solution.
> 
> We couldn't cover whole cases so that we should focus on common case.
> That's why we need knobs.
> Do you think your case is general one we should handle it by default?
> If you really think, we have to fix watermark setting logic rathe than kswapd.
it's pretty common these days for a system to have a big high zone and a small
low zone. Why is the current watermark setting wrong?
 
> > > In addition, kswapd could easily set all_unreclaimable of the zone in your example.
> > > Then, kswapd should just peek the zone once in a while if the zone is all_unreclaimable.
> > > It should be no problem in CPU overhead.
> > even the low zone has all_unreclaimable, the high zone will be scanned
> > again with low priority (because its high watermark might not be ok, but
> > min watermark is ok), and does not do congestion_wait. This will make a
> 
> As you said, high zone is below high watermark so it's natural since it's
> one of kswapd goal. Why it doesn't sleep is that one zone of zonelists is
> below min_watmark. So it's natural in current policy, too.
there are two cases in which a zone is below min_watermark:
1. the zone is below min_watermark for allocations within the zone itself. In
this case we need to hurry up.
2. the zone is below min_watermark for allocations from a higher zone. We
don't really need to hurry up if another zone is above min_watermark.
Since low zones need to reserve pages for high zones, the second case
could be common.
Yes, keeping kswapd running in this case can reduce the chance of
GFP_ATOMIC failure. But my patch will not cause immediate failures,
because there are still some zones which are above min_watermark and can
satisfy the GFP_ATOMIC allocation. And keeping kswapd running has some
drawbacks:
1. CPU overhead
2. it extends the isolation window size, and so trashes the working set.
Considering the DMA zone, its min_watermark is almost never ok for any
allocation from a high zone. So we will always have such
drawbacks.

Or is something like the below better? We can avoid accounting the big
reserved pages into min_wmark_pages for the low zone. If a high zone is under
min_wmark, kswapd will still not sleep.
                               if (!zone_watermark_ok_safe(zone, order,
-                                            min_wmark_pages(zone), end_zone, 0))
+                                            min_wmark_pages(zone), 0, 0))
                                        has_under_min_watermark_zone = 1;

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-11  5:30                             ` Shaohua Li
@ 2011-10-11  6:54                               ` Minchan Kim
  2011-10-12  2:48                                 ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-11  6:54 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Tue, Oct 11, 2011 at 01:30:10PM +0800, Shaohua Li wrote:
> On Mon, 2011-10-10 at 23:42 +0800, Minchan Kim wrote:
> > On Mon, Oct 10, 2011 at 03:28:13PM +0800, Shaohua Li wrote:
> > > On Sun, 2011-10-09 at 23:10 +0800, Minchan Kim wrote:
> > > > On Sun, Oct 09, 2011 at 04:17:51PM +0800, Shaohua Li wrote:
> > > > > On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> > > > > > On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > > > > > > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > > > > > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > > > > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > > > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > > 
> > > > > > > > > > > > Index: linux/mm/vmscan.c
> > > > > > > > > > > > ===================================================================
> > > > > > > > > > > > --- linux.orig/mm/vmscan.c	2011-09-27 15:09:29.000000000 +0800
> > > > > > > > > > > > +++ linux/mm/vmscan.c	2011-09-27 15:14:45.000000000 +0800
> > > > > > > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > > > > > > >  
> > > > > > > > > > > >  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > > > > > > >  		unsigned long lru_pages = 0;
> > > > > > > > > > > > -		int has_under_min_watermark_zone = 0;
> > > > > > > > > > > > +		int has_under_min_watermark_zone = 1;
> > > > > > > > > > > 
> > > > > > > > > > > bool
> > > > > > > > > > > 
> > > > > > > > > > > >  
> > > > > > > > > > > >  		/* The swap token gets in the way of swapout... */
> > > > > > > > > > > >  		if (!priority)
> > > > > > > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > > > > > > >  				 * means that we have a GFP_ATOMIC allocation
> > > > > > > > > > > >  				 * failure risk. Hurry up!
> > > > > > > > > > > >  				 */
> > > > > > > > > > > > -				if (!zone_watermark_ok_safe(zone, order,
> > > > > > > > > > > > +				if (has_under_min_watermark_zone &&
> > > > > > > > > > > > +					    zone_watermark_ok_safe(zone, order,
> > > > > > > > > > > >  					    min_wmark_pages(zone), end_zone, 0))
> > > > > > > > > > > > -					has_under_min_watermark_zone = 1;
> > > > > > > > > > > > +					has_under_min_watermark_zone = 0;
> > > > > > > > > > > >  			} else {
> > > > > > > > > > > >  				/*
> > > > > > > > > > > >  				 * If a zone reaches its high watermark,
> > > > > > > > > > > 
> > > > > > > > > > > Ignore checking the min watermark for a moment and consider if all zones 
> > > > > > > > > > > are above the high watermark (a situation where kswapd does not need to 
> > > > > > > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get 
> > > > > > > > > > > cleared and never actually stalls on congestion_wait().  Notice this is 
> > > > > > > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of 
> > > > > > > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > > > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > > > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > > > > > > without run into congestion_wait().
> > > > > > > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > > > > > > to be safe.
> > > > > > > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > > > > > > 
> > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > 
> > > > > > > > I think it's not a right or wrong problem but a policy stuff.
> > > > > > > > If we are going to start busy reclaiming for atomic allocation
> > > > > > > > after we see all lower zones' min water mark pages are already consumed
> > > > > > > > It could make you go through long latency and is likely to fail atomic allocation
> > > > > > > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > > > > > > so kswapd should always do best effort for it)
> > > > > > > > 
> > > > > > > > I don't mean you are wrong but we are very careful about this
> > > > > > > > and at least need some experiments with atomic allocaion stream, I think.
> > > > > > > yes. this is a policy problem. I just don't want the kswapd keep running
> > > > > > > even there is no immediate risk of atomic allocation fail.
> > > > > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > > > > meet min watermark. So kswapd keeps running without a wait and builds
> > > > > > > big priority.
> > > > > > 
> > > > > > It could be but I think it's a mistake of admin if he handles such rare system.
> > > > > > Couldn't he lower the reserved pages for highmem?
> > > > > not because admin changes reserved pages. we still have the
> > > > > zone->lowmem_reserve[] issue for zone_watermark_ok here.
> > > > 
> > > > Sorry I couldn't understand your point.
> > > > I mean if min watermark is too high, you could lower min_free_kbytes.
> > > > If reserved pages is too high, you could handle lowmem_reserve_ratio.
> > > > Could we solve the problem with those knobs?
> > > I mean a system with default setting. Changing the knobs might solve the
> > > problem, but few people know it, so not a right solution.
> > 
> > We couldn't cover whole cases so that we should focus on common case.
> > That's why we need knobs.
> > Do you think your case is general one we should handle it by default?
> > If you really think, we have to fix watermark setting logic rathe than kswapd.
> it's pretty common a system has big high zone and small low zone these
> days. Why current watermark setting is wrong?

You said the following:

> > > > > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > > > > meet min watermark. So kswapd keeps running without a wait and builds

For clarity, you should have said not "meet min watermark" but "meet min_watermark + reserved pages[HIGH_ZONE]".
Sorry for misreading your statement. Now I realize your intention.

>  
> > > > In addition, kswapd could easily set all_unreclaimable of the zone in your example.
> > > > Then, kswapd should just peek the zone once in a while if the zone is all_unreclaimable.
> > > > It should be no problem in CPU overhead.
> > > even the low zone has all_unreclaimable, the high zone will be scanned
> > > again with low priority (because its high watermark might not be ok, but
> > > min watermark is ok), and does not do congestion_wait. This will make a
> > 
> > As you said, high zone is below high watermark so it's natural since it's
> > one of kswapd goal. Why it doesn't sleep is that one zone of zonelists is
> > below min_watmark. So it's natural in current policy, too.
> there are two cases one zone is below min_watermark.
> 1. the zone is below min_watermark for allocation in the zone. in this
> case we need hurry up.
> 2. the zone is below min_watermark for allocation from high zone. we
> don't really need hurry up if other zone is above min_watermark.
> Since low zone need to reserve pages for high zone, the second case
> could be common.

You mean "lowmem_reserve"?
It means the opposite. It is a mechanism to defend lowmem pages against high-zone allocations,
because it could be fatal to allow process pages to be allocated from a low zone.
Also, we can set a per-zone ratio for the reserved pages.
How could we make sure lower zones have enough free pages for a higher zone?

> Yes, keeping kswapd running in this case can reduce the chance
> GFP_ATOMIC failure. But my patch will not cause immediate failure
> because there is still some zones which are above min_watermark and can
> meet the GFP_ATOMIC allocation. And keeping kswapd running has some

True. That is why I said "I don't mean you are wrong, but we have to be very careful about this."
Normally it could cope, but it might fail on a sudden peak of an atomic allocation stream.
Recently, we have seen many more reports of GFP_ATOMIC allocation failures than before.
So I would like to be very careful, and that's why I suggest we need at least some experiments.
Based on those, Andrew could make the final call.

> drawbacks:
> 1. cpu overhead
> 2. extend isolate window size, so trash working set.
> Considering DMA zone, we almost always have DMA zone min_watermark not
> ok for any allocation from high zone. So we will always have such
> drawbacks.

I agree with you that it's a problem.
I think the real solution is to remove the zone from the allocation fallback list in such a case,
because the lower zone could never satisfy any allocation for the higher zone.
But that makes the code rather complicated, as we have to consider an admin who can change
the reserved pages at any time.
So how about this?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8913374..f71ed2f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2693,8 +2693,16 @@ loop_again:
                                 * failure risk. Hurry up!
                                 */
                                if (!zone_watermark_ok_safe(zone, order,
-                                           min_wmark_pages(zone), end_zone, 0))
-                                       has_under_min_watermark_zone = 1;
+                                           min_wmark_pages(zone), end_zone, 0)) {
+                                       /*
+                                        * In the case of a big reserved page count for a higher zone,
+                                        * it is pointless to try reclaiming pages quickly
+                                        * because it could never meet the requirement.
+                                        */
+                                       if (zone->present_pages >
+                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
+                                               has_under_min_watermark_zone = 1;
+                               }
                        } else {
                                /*
                                 * If a zone reaches its high watermark,

We could even apply this at the start of the loop so that we avoid unnecessary scanning at the beginning.
In that case, we have to apply zone->lowmem_reserve[end_zone] only, because we have to consider the NO_WATERMARK alloc case.

> 
> Or is something below better? we can avoid the big reserved pages
> accounting to the min_wmark_pages for low zone. if high zone is under
> min_wmark, kswapd will not sleep.
>                                if (!zone_watermark_ok_safe(zone, order,
> -                                            min_wmark_pages(zone), end_zone, 0))
> +                                            min_wmark_pages(zone), 0, 0))
>                                         has_under_min_watermark_zone = 1;

I think it's not a good idea, since the page allocator always considers classzone_idx.
So although we would fix the kswapd issue through your change, the page allocator still can't
allocate memory and wakes up kswapd again.

> 
> Thanks,
> Shaohua
> 

-- 
Kind regards,
Minchan Kim


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-11  6:54                               ` Minchan Kim
@ 2011-10-12  2:48                                 ` Shaohua Li
  2011-10-12  7:59                                   ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-12  2:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Tue, 2011-10-11 at 14:54 +0800, Minchan Kim wrote:
> On Tue, Oct 11, 2011 at 01:30:10PM +0800, Shaohua Li wrote:
> > On Mon, 2011-10-10 at 23:42 +0800, Minchan Kim wrote:
> > > On Mon, Oct 10, 2011 at 03:28:13PM +0800, Shaohua Li wrote:
> > > > On Sun, 2011-10-09 at 23:10 +0800, Minchan Kim wrote:
> > > > > On Sun, Oct 09, 2011 at 04:17:51PM +0800, Shaohua Li wrote:
> > > > > > On Sun, 2011-10-09 at 16:01 +0800, Minchan Kim wrote:
> > > > > > > On Sun, Oct 09, 2011 at 01:53:11PM +0800, Shaohua Li wrote:
> > > > > > > > On Sat, 2011-10-08 at 18:25 +0800, Minchan Kim wrote:
> > > > > > > > > On Sat, Oct 08, 2011 at 01:56:52PM +0800, Shaohua Li wrote:
> > > > > > > > > > On Sat, 2011-10-08 at 11:35 +0800, Shaohua Li wrote:
> > > > > > > > > > > On Sat, 2011-10-08 at 11:19 +0800, David Rientjes wrote:
> > > > > > > > > > > > On Sat, 8 Oct 2011, Shaohua Li wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  mm/vmscan.c |    7 ++++---
> > > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Index: linux/mm/vmscan.c
> > > > > > > > > > > > > ===================================================================
> > > > > > > > > > > > > --- linux.orig/mm/vmscan.c      2011-09-27 15:09:29.000000000 +0800
> > > > > > > > > > > > > +++ linux/mm/vmscan.c   2011-09-27 15:14:45.000000000 +0800
> > > > > > > > > > > > > @@ -2463,7 +2463,7 @@ loop_again:
> > > > > > > > > > > > >
> > > > > > > > > > > > >         for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > > > > > > > > > > > >                 unsigned long lru_pages = 0;
> > > > > > > > > > > > > -               int has_under_min_watermark_zone = 0;
> > > > > > > > > > > > > +               int has_under_min_watermark_zone = 1;
> > > > > > > > > > > >
> > > > > > > > > > > > bool
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >                 /* The swap token gets in the way of swapout... */
> > > > > > > > > > > > >                 if (!priority)
> > > > > > > > > > > > > @@ -2594,9 +2594,10 @@ loop_again:
> > > > > > > > > > > > >                                  * means that we have a GFP_ATOMIC allocation
> > > > > > > > > > > > >                                  * failure risk. Hurry up!
> > > > > > > > > > > > >                                  */
> > > > > > > > > > > > > -                               if (!zone_watermark_ok_safe(zone, order,
> > > > > > > > > > > > > +                               if (has_under_min_watermark_zone &&
> > > > > > > > > > > > > +                                           zone_watermark_ok_safe(zone, order,
> > > > > > > > > > > > >                                             min_wmark_pages(zone), end_zone, 0))
> > > > > > > > > > > > > -                                       has_under_min_watermark_zone = 1;
> > > > > > > > > > > > > +                                       has_under_min_watermark_zone = 0;
> > > > > > > > > > > > >                         } else {
> > > > > > > > > > > > >                                 /*
> > > > > > > > > > > > >                                  * If a zone reaches its high watermark,
> > > > > > > > > > > >
> > > > > > > > > > > > Ignore checking the min watermark for a moment and consider if all zones
> > > > > > > > > > > > are above the high watermark (a situation where kswapd does not need to
> > > > > > > > > > > > do aggressive reclaim), then has_under_min_watermark_zone doesn't get
> > > > > > > > > > > > cleared and never actually stalls on congestion_wait().  Notice this is
> > > > > > > > > > > > congestion_wait() and not wait_iff_congested(), so the clearing of
> > > > > > > > > > > > ZONE_CONGESTED doesn't prevent this.
> > > > > > > > > > > if all zones are above the high watermark, we will have i < 0 when
> > > > > > > > > > > detecting the highest imbalanced zone, and the whole loop will end
> > > > > > > > > > > without run into congestion_wait().
> > > > > > > > > > > or I can add a clearing has_under_min_watermark_zone in the else block
> > > > > > > > > > > to be safe.
> > > > > > > > > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v2
> > > > > > > > > >
> > > > > > > > > > has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> > > > > > > > > > failure risk. For a high end_zone, if any zone below or equal to it has min
> > > > > > > > > > matermark ok, we have no risk. But current logic is any zone has min watermark
> > > > > > > > > > not ok, then we have risk. This is wrong to me.
> > > > > > > > >
> > > > > > > > > I think it's not a right or wrong problem but a policy stuff.
> > > > > > > > > If we are going to start busy reclaiming for atomic allocation
> > > > > > > > > after we see all lower zones' min water mark pages are already consumed
> > > > > > > > > It could make you go through long latency and is likely to fail atomic allocation
> > > > > > > > > stream(Because, there is nothing to do for aotmic allocation fail in direct reclaim
> > > > > > > > > so kswapd should always do best effort for it)
> > > > > > > > >
> > > > > > > > > I don't mean you are wrong but we are very careful about this
> > > > > > > > > and at least need some experiments with atomic allocaion stream, I think.
> > > > > > > > yes. this is a policy problem. I just don't want the kswapd keep running
> > > > > > > > even there is no immediate risk of atomic allocation fail.
> > > > > > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > > > > > meet min watermark. So kswapd keeps running without a wait and builds
> > > > > > > > big priority.
> > > > > > >
> > > > > > > It could be but I think it's a mistake of admin if he handles such rare system.
> > > > > > > Couldn't he lower the reserved pages for highmem?
> > > > > > not because admin changes reserved pages. we still have the
> > > > > > zone->lowmem_reserve[] issue for zone_watermark_ok here.
> > > > >
> > > > > Sorry I couldn't understand your point.
> > > > > I mean if min watermark is too high, you could lower min_free_kbytes.
> > > > > If reserved pages is too high, you could handle lowmem_reserve_ratio.
> > > > > Could we solve the problem with those knobs?
> > > > I mean a system with default setting. Changing the knobs might solve the
> > > > problem, but few people know it, so not a right solution.
> > >
> > > We couldn't cover whole cases so that we should focus on common case.
> > > That's why we need knobs.
> > > Do you think your case is general one we should handle it by default?
> > > If you really think, we have to fix watermark setting logic rathe than kswapd.
> > it's pretty common a system has big high zone and small low zone these
> > days. Why current watermark setting is wrong?
> 
> You said as follows,
> 
> > > > > > > > One problem here is end_zone could be high, and low zone always doesn't
> > > > > > > > meet min watermark. So kswapd keeps running without a wait and builds
> 
> You should have said not "meet min watermark" but "meet min_watermak + reserved pages[HIGH_ZONE]"
> for the clearness. Sorry for misleading your statement. Now I got realized your intention.
I should have made it clear, sorry.

> > > > > In addition, kswapd could easily set all_unreclaimable of the zone in your example.
> > > > > Then, kswapd should just peek the zone once in a while if the zone is all_unreclaimable.
> > > > > It should be no problem in CPU overhead.
> > > > even the low zone has all_unreclaimable, the high zone will be scanned
> > > > again with low priority (because its high watermark might not be ok, but
> > > > min watermark is ok), and does not do congestion_wait. This will make a
> > >
> > > As you said, high zone is below high watermark so it's natural since it's
> > > one of kswapd goal. Why it doesn't sleep is that one zone of zonelists is
> > > below min_watmark. So it's natural in current policy, too.
> > there are two cases one zone is below min_watermark.
> > 1. the zone is below min_watermark for allocation in the zone. in this
> > case we need hurry up.
> > 2. the zone is below min_watermark for allocation from high zone. we
> > don't really need hurry up if other zone is above min_watermark.
> > Since low zone need to reserve pages for high zone, the second case
> > could be common.
> 
> You mean "lowmem_reserve"?
> It means opposite. It is a mechanism to defend using of lowmem pages from high zone allocation
> because it could be fatal to allow process pages to be allocated from low zone.
> Also, We could set each ratio for reserved pages of zones.
> How could we make sure lower zones have enough free pages for higher zone?
lowmem_reserve causes the problem, but it's not lowmem_reserve's fault. I'm
thinking about changing it.

> > Yes, keeping kswapd running in this case can reduce the chance
> > GFP_ATOMIC failure. But my patch will not cause immediate failure
> > because there is still some zones which are above min_watermark and can
> > meet the GFP_ATOMIC allocation. And keeping kswapd running has some
> 
> True. It was why I said "I don't mean you are wrong but we are very careful about this."
> Normally, it could handle but might fail on sudden peak of atomic allocation stream.
> Recently, we have suffered from many reporting of GFP_AOTMIC allocation than olded.
> So I would like to be very careful and that's why I suggest we need at least some experiment.
> Through it, Andrew could make final call.
sure.

> > drawbacks:
> > 1. cpu overhead
> > 2. extend isolate window size, so trash working set.
> > Considering DMA zone, we almost always have DMA zone min_watermark not
> > ok for any allocation from high zone. So we will always have such
> > drawbacks.
> 
> I agree with you in that it's a problem.
> I think the real solution is to remove the zone from allocation fallback list in such case
> because the lower zone could never meet any allocation for the higher zone.
> But it makes code rather complicated as we have to consider admin who can change
> reserved pages anytime.
Not worth the complication.

> So how about this?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8913374..f71ed2f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2693,8 +2693,16 @@ loop_again:
>                                  * failure risk. Hurry up!
>                                  */
>                                 if (!zone_watermark_ok_safe(zone, order,
> -                                           min_wmark_pages(zone), end_zone, 0))
> -                                       has_under_min_watermark_zone = 1;
> +                                           min_wmark_pages(zone), end_zone, 0)) {
> +                                       /*
> +                                        * In case of big reserved page for higher zone,
> +                                        * it is pointless to try reclaimaing pages quickly
> +                                        * because it could never meet the requirement.
> +                                        */
> +                                       if (zone->present_pages >
> +                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
> +                                               has_under_min_watermark_zone = 1;
> +                               }
>                         } else {
>                                 /*
>                                  * If a zone reaches its high watermark,
This looks like a workaround just for the DMA zone. present_pages could be
bigger than min_wmark+lowmem_reserve, and we would still suffer from the
issue, for example with a DMA32 zone that has some pages allocated for DMA, or a
zone which has some LRU pages but is still much smaller than the high zone.

> Even, we could apply this at starting of the loop so that we can avoid unnecessary scanning st the beginning.
> In that case, we have to apply zone->lowmem_reserve[end_zone] only because we have to consider NO_WATERMARK alloc case.
yes, we can do this to avoid the unnecessary scan, but the DMA zone has no LRU
pages, so I'm not sure how big the benefit is here.

> > Or is something below better? we can avoid the big reserved pages
> > accounting to the min_wmark_pages for low zone. if high zone is under
> > min_wmark, kswapd will not sleep.
> >                                if (!zone_watermark_ok_safe(zone, order,
> > -                                            min_wmark_pages(zone), end_zone, 0))
> > +                                            min_wmark_pages(zone), 0, 0))
> >                                         has_under_min_watermark_zone = 1;
> 
> I think it's not a good idea since page allocator always considers classzone_idx.
> So although we fix kswapd issue through your changes, page allocator still can't allocate memory
> and wakes up kswapd, again.
why would kswapd be woken up again? The high zone itself still has
min_wmark+lowmem_reserve ok for the allocation (classzone_idx 0 means
checking the lowmem_reserve for allocation from the zone itself), so the
allocation can be met.
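To make the classzone_idx point concrete, here is a simplified sketch — not the exact kernel code (the real __zone_watermark_ok() also walks the per-order free areas, which is omitted here) — of how the reserve selected by classzone_idx changes the threshold a zone must pass:

```c
#include <stdbool.h>

/*
 * Simplified model of the kernel's watermark check: a zone passes if
 * its free pages stay at or above "mark + lowmem_reserve[classzone_idx]".
 * With classzone_idx == 0 the reserve kept for higher zones is excluded,
 * so the zone is only checked for allocations from the zone itself.
 */
static bool watermark_ok(unsigned long free_pages, unsigned long mark,
			 const unsigned long *lowmem_reserve,
			 int classzone_idx)
{
	return free_pages >= mark + lowmem_reserve[classzone_idx];
}
```

With purely illustrative numbers — free_pages = 3000, min_wmark = 25, lowmem_reserve[] = {0, 1000, 7800} — the check fails for classzone_idx = end_zone (2) but passes for classzone_idx = 0, which is exactly why the changed test lets kswapd sleep while the low zone can still satisfy allocations targeted at itself.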

Thanks,
Shaohua

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch v2]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-12  2:48                                 ` Shaohua Li
@ 2011-10-12  7:59                                   ` Minchan Kim
  2011-10-18  2:13                                     ` [patch v3]vmscan: " Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-12  7:59 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Wed, Oct 12, 2011 at 10:48:59AM +0800, Shaohua Li wrote:
 > > there are two cases one zone is below min_watermark.
> > > 1. the zone is below min_watermark for allocation in the zone. in this
> > > case we need hurry up.
> > > 2. the zone is below min_watermark for allocation from high zone. we
> > > don't really need hurry up if other zone is above min_watermark.
> > > Since low zone need to reserve pages for high zone, the second case
> > > could be common.
> > 
> > You mean "lowmem_reserve"?
> > It means opposite. It is a mechanism to defend using of lowmem pages from high zone allocation
> > because it could be fatal to allow process pages to be allocated from low zone.
> > Also, We could set each ratio for reserved pages of zones.
> > How could we make sure lower zones have enough free pages for higher zone?
> lowmem_reserve causes the problem, but it's not a fault of
> lowmem_reserve. I'm thinking changing it.
> 
> > > Yes, keeping kswapd running in this case can reduce the chance
> > > GFP_ATOMIC failure. But my patch will not cause immediate failure
> > > because there is still some zones which are above min_watermark and can
> > > meet the GFP_ATOMIC allocation. And keeping kswapd running has some
> > 
> > True. It was why I said "I don't mean you are wrong but we are very careful about this."
> > Normally, it could handle but might fail on sudden peak of atomic allocation stream.
> > Recently, we have suffered from many reporting of GFP_AOTMIC allocation than olded.
> > So I would like to be very careful and that's why I suggest we need at least some experiment.
> > Through it, Andrew could make final call.
> sure.
> 
> > > drawbacks:
> > > 1. cpu overhead
> > > 2. extend isolate window size, so trash working set.
> > > Considering DMA zone, we almost always have DMA zone min_watermark not
> > > ok for any allocation from high zone. So we will always have such
> > > drawbacks.
> > 
> > I agree with you in that it's a problem.
> > I think the real solution is to remove the zone from allocation fallback list in such case
> > because the lower zone could never meet any allocation for the higher zone.
> > But it makes code rather complicated as we have to consider admin who can change
> > reserved pages anytime.
> Not worthy the complication.
> 
> > So how about this?
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 8913374..f71ed2f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2693,8 +2693,16 @@ loop_again:
> >                                  * failure risk. Hurry up!
> >                                  */
> >                                 if (!zone_watermark_ok_safe(zone, order,
> > -                                           min_wmark_pages(zone), end_zone, 0))
> > -                                       has_under_min_watermark_zone = 1;
> > +                                           min_wmark_pages(zone), end_zone, 0)) {
> > +                                       /*
> > +                                        * In case of big reserved page for higher zone,
> > +                                        * it is pointless to try reclaimaing pages quickly
> > +                                        * because it could never meet the requirement.
> > +                                        */
> > +                                       if (zone->present_pages >
> > +                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
> > +                                               has_under_min_watermark_zone = 1;
> > +                               }
> >                         } else {
> >                                 /*
> >                                  * If a zone reaches its high watermark,
> This looks like a workaround just for DMA zone. present_pages could be
> bigger than min_mwark+lowmem_reserve. And We still suffered from the
> issue, for example, a DMA32 zone with some pages allocated for DMA, or a
> zone which has some lru pages, but still much smaller than high zone.

Right. I thought about it but couldn't come up with a good idea for it. :(

> 
> > Even, we could apply this at starting of the loop so that we can avoid unnecessary scanning st the beginning.
> > In that case, we have to apply zone->lowmem_reserve[end_zone] only because we have to consider NO_WATERMARK alloc case.
> yes, we can do this to avoid unnecessary scan. but DMA zone hasn't lru
> pages, so not sure how big the benefit is here.

At least, we can prevent has_under_min_watermark_zone from being set.
But it still has the problem you pointed out earlier.

> 
> > > Or is something below better? we can avoid the big reserved pages
> > > accounting to the min_wmark_pages for low zone. if high zone is under
> > > min_wmark, kswapd will not sleep.
> > >                                if (!zone_watermark_ok_safe(zone, order,
> > > -                                            min_wmark_pages(zone), end_zone, 0))
> > > +                                            min_wmark_pages(zone), 0, 0))
> > >                                         has_under_min_watermark_zone = 1;
> > 
> > I think it's not a good idea since page allocator always considers classzone_idx.
> > So although we fix kswapd issue through your changes, page allocator still can't allocate memory
> > and wakes up kswapd, again.
> why kswapd will be waked up again? The high zone itself still has
> min_wark+low_reserve ok for the allocation(classzone_idx 0 means
> checking the low_reserve for allocation from the zone itself), so the
> allocation can be met.

You're absolutely right.
I got confused. Sorry about that.

I like this better than your old version.
That's because it can still hurry up if one zone of the zonelist is consumed to below min_watermark.
It should mitigate GFP_ATOMIC failures compared to your old version, but the rate would still be higher than now.
So we need the numbers.

Could you repost this as a formal patch with a good comment and numbers?
Personally, I like a description based on a scenario, with kind step-by-step detail.
Feel free to use the description from my patch if you want.

Thanks for the patient discussion in spite of my irregular replies, Shaohua.

> 
> Thanks,
> Shaohua
> 

-- 
Kind regards,
Minchan Kim



* [patch v3]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-12  7:59                                   ` Minchan Kim
@ 2011-10-18  2:13                                     ` Shaohua Li
  2011-10-27 22:50                                       ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-18  2:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Wed, 2011-10-12 at 15:59 +0800, Minchan Kim wrote:
> On Wed, Oct 12, 2011 at 10:48:59AM +0800, Shaohua Li wrote:
>  > > there are two cases one zone is below min_watermark.
> > > > 1. the zone is below min_watermark for allocation in the zone. in this
> > > > case we need hurry up.
> > > > 2. the zone is below min_watermark for allocation from high zone. we
> > > > don't really need hurry up if other zone is above min_watermark.
> > > > Since low zone need to reserve pages for high zone, the second case
> > > > could be common.
> > > 
> > > You mean "lowmem_reserve"?
> > > It means opposite. It is a mechanism to defend using of lowmem pages from high zone allocation
> > > because it could be fatal to allow process pages to be allocated from low zone.
> > > Also, We could set each ratio for reserved pages of zones.
> > > How could we make sure lower zones have enough free pages for higher zone?
> > lowmem_reserve causes the problem, but it's not a fault of
> > lowmem_reserve. I'm thinking changing it.
> > 
> > > > Yes, keeping kswapd running in this case can reduce the chance
> > > > GFP_ATOMIC failure. But my patch will not cause immediate failure
> > > > because there is still some zones which are above min_watermark and can
> > > > meet the GFP_ATOMIC allocation. And keeping kswapd running has some
> > > 
> > > True. It was why I said "I don't mean you are wrong but we are very careful about this."
> > > Normally, it could handle but might fail on sudden peak of atomic allocation stream.
> > > Recently, we have suffered from many reporting of GFP_AOTMIC allocation than olded.
> > > So I would like to be very careful and that's why I suggest we need at least some experiment.
> > > Through it, Andrew could make final call.
> > sure.
> > 
> > > > drawbacks:
> > > > 1. cpu overhead
> > > > 2. extend isolate window size, so trash working set.
> > > > Considering DMA zone, we almost always have DMA zone min_watermark not
> > > > ok for any allocation from high zone. So we will always have such
> > > > drawbacks.
> > > 
> > > I agree with you in that it's a problem.
> > > I think the real solution is to remove the zone from allocation fallback list in such case
> > > because the lower zone could never meet any allocation for the higher zone.
> > > But it makes code rather complicated as we have to consider admin who can change
> > > reserved pages anytime.
> > Not worthy the complication.
> > 
> > > So how about this?
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 8913374..f71ed2f 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2693,8 +2693,16 @@ loop_again:
> > >                                  * failure risk. Hurry up!
> > >                                  */
> > >                                 if (!zone_watermark_ok_safe(zone, order,
> > > -                                           min_wmark_pages(zone), end_zone, 0))
> > > -                                       has_under_min_watermark_zone = 1;
> > > +                                           min_wmark_pages(zone), end_zone, 0)) {
> > > +                                       /*
> > > +                                        * In case of big reserved page for higher zone,
> > > +                                        * it is pointless to try reclaimaing pages quickly
> > > +                                        * because it could never meet the requirement.
> > > +                                        */
> > > +                                       if (zone->present_pages >
> > > +                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
> > > +                                               has_under_min_watermark_zone = 1;
> > > +                               }
> > >                         } else {
> > >                                 /*
> > >                                  * If a zone reaches its high watermark,
> > This looks like a workaround just for DMA zone. present_pages could be
> > bigger than min_mwark+lowmem_reserve. And We still suffered from the
> > issue, for example, a DMA32 zone with some pages allocated for DMA, or a
> > zone which has some lru pages, but still much smaller than high zone.
> 
> Right. I thought about it but couldn't have a good idea for it. :(
> 
> > 
> > > Even, we could apply this at starting of the loop so that we can avoid unnecessary scanning st the beginning.
> > > In that case, we have to apply zone->lowmem_reserve[end_zone] only because we have to consider NO_WATERMARK alloc case.
> > yes, we can do this to avoid unnecessary scan. but DMA zone hasn't lru
> > pages, so not sure how big the benefit is here.
> 
> At least, we can prevent has_under_min_watermark_zone from set.
> But it still have a problem you pointed out earlier.
> 
> > 
> > > > Or is something below better? we can avoid the big reserved pages
> > > > accounting to the min_wmark_pages for low zone. if high zone is under
> > > > min_wmark, kswapd will not sleep.
> > > >                                if (!zone_watermark_ok_safe(zone, order,
> > > > -                                            min_wmark_pages(zone), end_zone, 0))
> > > > +                                            min_wmark_pages(zone), 0, 0))
> > > >                                         has_under_min_watermark_zone = 1;
> > > 
> > > I think it's not a good idea since page allocator always considers classzone_idx.
> > > So although we fix kswapd issue through your changes, page allocator still can't allocate memory
> > > and wakes up kswapd, again.
> > why kswapd will be waked up again? The high zone itself still has
> > min_wark+low_reserve ok for the allocation(classzone_idx 0 means
> > checking the low_reserve for allocation from the zone itself), so the
> > allocation can be met.
> 
> You're absolutely right.
> I got confused. Sorry about that.
> 
> I like this than your old version.
> That's because it could rush if one of zonelist is consumed as below min_watermak.
> It could mitigate GFP_ALLOC fail than yours old version but still would be higher than now.
> So, we need the number.
> 
> Could you repost this as formal patch with good comment and number?
> Personally, I like description based on scenario with kind step-by-step.
> Feel free to use my description in my patch if you want.
> 
> Thanks for patient discussion in spite of my irregular reply, Shaohua.
Thanks for your time. Here is the patch. I'm trying to get some numbers, but
haven't found a proper workload to demonstrate the atomic allocation
failure. Any suggestions for a workload?

Thanks,
Shaohua


Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v3

has_under_min_watermark_zone is used to detect whether there is a GFP_ATOMIC
allocation failure risk. The current logic is: if any zone has its min watermark
not ok, we have risk.

A low zone needs to reserve memory to avoid fallback from a high zone. The
reserved memory is zone->lowmem_reserve[]. If the high zone is big, the low
zone's min_wmark + lowmem_reserve[] is usually big too. Sometimes min_wmark +
lowmem_reserve[] can even be higher than zone->present_pages; an example is the
DMA zone. Other low zones can have similarly high reserved memory, though they
might still have a margin between reserved pages and present pages. So in the
kswapd loop, if end_zone is a high zone, has_under_min_watermark_zone can easily
be set, and is always set because of DMA.
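As a purely illustrative back-of-the-envelope calculation (the numbers below are made up, not measured from any system): with the default lowmem_reserve_ratio of 256, a ~16MB DMA zone sitting below an ~8GB higher zone gets a reserve larger than the zone itself, so the old end_zone-relative min watermark check could never pass:

```c
/*
 * Illustrative helper only: the threshold the old check effectively
 * used for a low zone while reclaiming on behalf of end_zone, i.e.
 * min_wmark + lowmem_reserve[end_zone], where the reserve is roughly
 * high_zone_pages / lowmem_reserve_ratio.
 */
static unsigned long old_check_threshold(unsigned long min_wmark,
					 unsigned long high_zone_pages,
					 unsigned long ratio)
{
	return min_wmark + high_zone_pages / ratio;
}
```

With min_wmark = 25 pages, an ~8GB higher zone (2,000,000 4KB pages), and ratio 256, the threshold is 25 + 7812 = 7837 pages — about twice the entire ~16MB DMA zone (~3984 pages), so has_under_min_watermark_zone was effectively always set whenever end_zone was high.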

Let's consider the case where end_zone is a high zone whose high_wmark is not ok
but whose min_wmark is ok. A DMA zone always has fewer present_pages than
reserved pages, so has_under_min_watermark_zone is always set. While kswapd is
running, this has some drawbacks:
1. kswapd can keep running unnecessarily, without congestion_wait, even though
the high zone can already meet GFP_ATOMIC allocations. The running wastes CPU.
2. kswapd can scan many more pages and trash the working set. congestion_wait
can slow the scan down when kswapd is in trouble; with congestion_wait skipped,
kswapd keeps scanning unnecessary pages.

So since the DMA zone always sets has_under_min_watermark_zone, the current
logic effectively makes kswapd keep running without congestion_wait, whenever it
is in trouble, until the high zone has its high wmark ok. This is not intended.

In this patch, we test the min_wmark against the zone itself. This doesn't
change the behavior of the high zone. For a low zone, we now exclude the
lowmem_reserve for the high zone, to avoid the unnecessary running.

Note: with this patch, we could have a potentially higher risk of GFP_ATOMIC failure.

v3: Uses a less intrusive method to determine if has_under_min_watermark_zone
should be set after discussion with Minchan.
v2: use bool and clear has_under_min_watermark_zone for zone with watermark ok
as suggested by David Rientjes.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 mm/vmscan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/mm/vmscan.c
===================================================================
--- linux.orig/mm/vmscan.c	2011-10-17 16:02:30.000000000 +0800
+++ linux/mm/vmscan.c	2011-10-18 09:32:23.000000000 +0800
@@ -2463,7 +2463,7 @@ loop_again:
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		unsigned long lru_pages = 0;
-		int has_under_min_watermark_zone = 0;
+		bool has_under_min_watermark_zone = false;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2593,8 +2593,8 @@ loop_again:
 				 * failure risk. Hurry up!
 				 */
 				if (!zone_watermark_ok_safe(zone, order,
-					    min_wmark_pages(zone), end_zone, 0))
-					has_under_min_watermark_zone = 1;
+					    min_wmark_pages(zone), 0, 0))
+					has_under_min_watermark_zone = true;
 			} else {
 				/*
 				 * If a zone reaches its high watermark,




* Re: [patch v3]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-18  2:13                                     ` [patch v3]vmscan: " Shaohua Li
@ 2011-10-27 22:50                                       ` Minchan Kim
  2011-10-28  5:15                                         ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2011-10-27 22:50 UTC (permalink / raw)
  To: Shaohua Li
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Tue, Oct 18, 2011 at 10:13:52AM +0800, Shaohua Li wrote:
> On Wed, 2011-10-12 at 15:59 +0800, Minchan Kim wrote:
> > On Wed, Oct 12, 2011 at 10:48:59AM +0800, Shaohua Li wrote:
> >  > > there are two cases one zone is below min_watermark.
> > > > > 1. the zone is below min_watermark for allocation in the zone. in this
> > > > > case we need hurry up.
> > > > > 2. the zone is below min_watermark for allocation from high zone. we
> > > > > don't really need hurry up if other zone is above min_watermark.
> > > > > Since low zone need to reserve pages for high zone, the second case
> > > > > could be common.
> > > > 
> > > > You mean "lowmem_reserve"?
> > > > It means opposite. It is a mechanism to defend using of lowmem pages from high zone allocation
> > > > because it could be fatal to allow process pages to be allocated from low zone.
> > > > Also, We could set each ratio for reserved pages of zones.
> > > > How could we make sure lower zones have enough free pages for higher zone?
> > > lowmem_reserve causes the problem, but it's not a fault of
> > > lowmem_reserve. I'm thinking changing it.
> > > 
> > > > > Yes, keeping kswapd running in this case can reduce the chance
> > > > > GFP_ATOMIC failure. But my patch will not cause immediate failure
> > > > > because there is still some zones which are above min_watermark and can
> > > > > meet the GFP_ATOMIC allocation. And keeping kswapd running has some
> > > > 
> > > > True. It was why I said "I don't mean you are wrong but we are very careful about this."
> > > > Normally, it could handle but might fail on sudden peak of atomic allocation stream.
> > > > Recently, we have suffered from many reporting of GFP_AOTMIC allocation than olded.
> > > > So I would like to be very careful and that's why I suggest we need at least some experiment.
> > > > Through it, Andrew could make final call.
> > > sure.
> > > 
> > > > > drawbacks:
> > > > > 1. cpu overhead
> > > > > 2. extend isolate window size, so trash working set.
> > > > > Considering DMA zone, we almost always have DMA zone min_watermark not
> > > > > ok for any allocation from high zone. So we will always have such
> > > > > drawbacks.
> > > > 
> > > > I agree with you in that it's a problem.
> > > > I think the real solution is to remove the zone from allocation fallback list in such case
> > > > because the lower zone could never meet any allocation for the higher zone.
> > > > But it makes code rather complicated as we have to consider admin who can change
> > > > reserved pages anytime.
> > > Not worthy the complication.
> > > 
> > > > So how about this?
> > > > 
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 8913374..f71ed2f 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2693,8 +2693,16 @@ loop_again:
> > > >                                  * failure risk. Hurry up!
> > > >                                  */
> > > >                                 if (!zone_watermark_ok_safe(zone, order,
> > > > -                                           min_wmark_pages(zone), end_zone, 0))
> > > > -                                       has_under_min_watermark_zone = 1;
> > > > +                                           min_wmark_pages(zone), end_zone, 0)) {
> > > > +                                       /*
> > > > +                                        * In case of big reserved page for higher zone,
> > > > +                                        * it is pointless to try reclaimaing pages quickly
> > > > +                                        * because it could never meet the requirement.
> > > > +                                        */
> > > > +                                       if (zone->present_pages >
> > > > +                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
> > > > +                                               has_under_min_watermark_zone = 1;
> > > > +                               }
> > > >                         } else {
> > > >                                 /*
> > > >                                  * If a zone reaches its high watermark,
> > > This looks like a workaround just for DMA zone. present_pages could be
> > > bigger than min_mwark+lowmem_reserve. And We still suffered from the
> > > issue, for example, a DMA32 zone with some pages allocated for DMA, or a
> > > zone which has some lru pages, but still much smaller than high zone.
> > 
> > Right. I thought about it but couldn't have a good idea for it. :(
> > 
> > > 
> > > > Even, we could apply this at starting of the loop so that we can avoid unnecessary scanning st the beginning.
> > > > In that case, we have to apply zone->lowmem_reserve[end_zone] only because we have to consider NO_WATERMARK alloc case.
> > > yes, we can do this to avoid unnecessary scan. but DMA zone hasn't lru
> > > pages, so not sure how big the benefit is here.
> > 
> > At least, we can prevent has_under_min_watermark_zone from set.
> > But it still have a problem you pointed out earlier.
> > 
> > > 
> > > > > Or is something below better? we can avoid the big reserved pages
> > > > > accounting to the min_wmark_pages for low zone. if high zone is under
> > > > > min_wmark, kswapd will not sleep.
> > > > >                                if (!zone_watermark_ok_safe(zone, order,
> > > > > -                                            min_wmark_pages(zone), end_zone, 0))
> > > > > +                                            min_wmark_pages(zone), 0, 0))
> > > > >                                         has_under_min_watermark_zone = 1;
> > > > 
> > > > I think it's not a good idea since page allocator always considers classzone_idx.
> > > > So although we fix kswapd issue through your changes, page allocator still can't allocate memory
> > > > and wakes up kswapd, again.
> > > why kswapd will be waked up again? The high zone itself still has
> > > min_wark+low_reserve ok for the allocation(classzone_idx 0 means
> > > checking the low_reserve for allocation from the zone itself), so the
> > > allocation can be met.
> > 
> > You're absolutely right.
> > I got confused. Sorry about that.
> > 
> > I like this than your old version.
> > That's because it could rush if one of zonelist is consumed as below min_watermak.
> > It could mitigate GFP_ALLOC fail than yours old version but still would be higher than now.
> > So, we need the number.
> > 
> > Could you repost this as formal patch with good comment and number?
> > Personally, I like description based on scenario with kind step-by-step.
> > Feel free to use my description in my patch if you want.
> > 
> > Thanks for patient discussion in spite of my irregular reply, Shaohua.
> Thanks for your time. Here is patch. I'm trying to get some number, but
> didn't find a proper workload to demonstrate the atomic allocation
> failure. Any suggestion for the workload?

Sorry for the really, really late response.

Hmm, I don't have such a test.
But it seems we recently had such a report again:
https://lkml.org/lkml/2011/10/18/131
If you want, you could ask him.

> 
> Thanks,
> Shaohua
> 
> 
> Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v3
> 
> has_under_min_watermark_zone is used to detect if there is GFP_ATOMIC allocation
> failure risk. Current logic is if any zone has min watermark not ok, we have
> risk.
> 
> Low zone needs reserve memory to avoid fallback from high zone. The reserved
> memory is zone->lowmem_reserve[]. If high zone is big, low zone's
> min_wmark + lowmem_reserve[] usually is big. Sometimes min_wmark +
> lowmem_reserve[] could even be higher than zone->present_pages. An example is
> DMA zone. Other low zones could have the similar high reserved memory though
> might still have margins between reserved pages and present pages. So in kswapd
> loop, if end_zone is a high zone, has_under_min_watermark_zone could be easily
> set or always set for DMA.
> 
> Let's consider end_zone is a high zone and it has high_mwark not ok, but
> min_mwark ok. A DMA zone always has present_pages less than reserved pages, so
> has_under_min_watermark_zone is always set. When kswapd is running, there are
> some drawbacks:
> 1. kswapd can keep unnecessary running without congestion_wait. high zone
> already can meet GFP_ATOMIC. The running will waste some CPU.
> 2. kswapd can scan much more pages to trash working set. congestion_wait can
> slow down scan if kswapd has trouble. Now congestion_wait is skipped, kswapd
> will keep scanning unnecessary pages.
> 
> So since DMA zone always set has_under_min_watermark_zone, current logic actually
> equals to that kswapd keeps running without congestion_wait till high zone has
> high wmark ok when it has trouble. This is not intended.
> 
> In this path, we test the min_mwark against the zone itself. This doesn't change
> the behavior of high zone. For low zone, we now exclude lowmem_reserve for high
> zone to avoid unnecessary running.
> 
> Note: With this patch, we could have potential higher risk of GFP_ATOMIC failure.
> 
> v3: Uses a less intrusive method to determine if has_under_min_watermark_zone
> should be set after discussion with Minchan.
> v2: use bool and clear has_under_min_watermark_zone for zone with watermark ok
> as suggested by David Rientjes.
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

As you know, I give my Reviewed-by as a token of "the code looks good to me".
But still, we have seen atomic allocation failure messages recently.
So you need to get an Acked-by, a token of "I support this idea", from
others (i.e., Mel/Rik are the right people).

It would be better to write down in the description that this is a real
problem on your system and that the patch solves it.

Thanks!

> ---
>  mm/vmscan.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Index: linux/mm/vmscan.c
> ===================================================================
> --- linux.orig/mm/vmscan.c	2011-10-17 16:02:30.000000000 +0800
> +++ linux/mm/vmscan.c	2011-10-18 09:32:23.000000000 +0800
> @@ -2463,7 +2463,7 @@ loop_again:
>  
>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>  		unsigned long lru_pages = 0;
> -		int has_under_min_watermark_zone = 0;
> +		bool has_under_min_watermark_zone = false;
>  
>  		/* The swap token gets in the way of swapout... */
>  		if (!priority)
> @@ -2593,8 +2593,8 @@ loop_again:
>  				 * failure risk. Hurry up!
>  				 */
>  				if (!zone_watermark_ok_safe(zone, order,
> -					    min_wmark_pages(zone), end_zone, 0))
> -					has_under_min_watermark_zone = 1;
> +					    min_wmark_pages(zone), 0, 0))
> +					has_under_min_watermark_zone = true;
>  			} else {
>  				/*
>  				 * If a zone reaches its high watermark,
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch v3]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-27 22:50                                       ` Minchan Kim
@ 2011-10-28  5:15                                         ` Shaohua Li
  2011-11-07  5:15                                           ` Shaohua Li
  0 siblings, 1 reply; 23+ messages in thread
From: Shaohua Li @ 2011-10-28  5:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Fri, 2011-10-28 at 06:50 +0800, Minchan Kim wrote:
> On Tue, Oct 18, 2011 at 10:13:52AM +0800, Shaohua Li wrote:
> > On Wed, 2011-10-12 at 15:59 +0800, Minchan Kim wrote:
> > > On Wed, Oct 12, 2011 at 10:48:59AM +0800, Shaohua Li wrote:
> > > > > > there are two cases where a zone is below min_watermark.
> > > > > > 1. The zone is below min_watermark for allocations within the zone
> > > > > > itself; in this case we need to hurry up.
> > > > > > 2. The zone is below min_watermark for allocations from a high zone;
> > > > > > we don't really need to hurry up if another zone is above min_watermark.
> > > > > > Since a low zone needs to reserve pages for high zones, the second
> > > > > > case could be common.
> > > > > 
> > > > > You mean "lowmem_reserve"?
> > > > > It means the opposite. It is a mechanism to defend lowmem pages against
> > > > > high zone allocations, because it could be fatal to let process pages
> > > > > be allocated from a low zone.
> > > > > Also, we can set a per-zone ratio for the reserved pages.
> > > > > How can we make sure lower zones have enough free pages for a higher zone?
> > > > lowmem_reserve causes the problem, but it's not lowmem_reserve's fault.
> > > > I'm thinking of changing it.
> > > > 
> > > > > > Yes, keeping kswapd running in this case can reduce the chance of
> > > > > > GFP_ATOMIC failure. But my patch will not cause immediate failure,
> > > > > > because there are still some zones above min_watermark that can meet
> > > > > > the GFP_ATOMIC allocation. And keeping kswapd running has some
> > > > > 
> > > > > True. That is why I said "I don't mean you are wrong, but we should be very careful about this."
> > > > > Normally it could cope, but it might fail on a sudden peak in the atomic allocation stream.
> > > > > Recently we have seen more reports of GFP_ATOMIC allocation failures than before.
> > > > > So I would like to be very careful, and that's why I suggest we need at least some experiments.
> > > > > Through them, Andrew can make the final call.
> > > > sure.
> > > > 
> > > > > > drawbacks:
> > > > > > 1. CPU overhead.
> > > > > > 2. it extends the isolation window size, so it trashes the working set.
> > > > > > Considering the DMA zone, its min_watermark is almost always not ok
> > > > > > for any allocation from a high zone. So we will always have these
> > > > > > drawbacks.
> > > > > 
> > > > > I agree with you that it's a problem.
> > > > > I think the real solution is to remove the zone from the allocation
> > > > > fallback list in such a case, because the lower zone can never meet any
> > > > > allocation for the higher zone. But it makes the code rather complicated,
> > > > > as we have to consider an admin who can change the reserved pages anytime.
> > > > Not worth the complication.
> > > > 
> > > > > So how about this?
> > > > > 
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 8913374..f71ed2f 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -2693,8 +2693,16 @@ loop_again:
> > > > >                                  * failure risk. Hurry up!
> > > > >                                  */
> > > > >                                 if (!zone_watermark_ok_safe(zone, order,
> > > > > -                                           min_wmark_pages(zone), end_zone, 0))
> > > > > -                                       has_under_min_watermark_zone = 1;
> > > > > +                                           min_wmark_pages(zone), end_zone, 0)) {
> > > > > +                                       /*
> > > > > +                                        * In case of big reserved page for higher zone,
> > > > > +                                        * it is pointless to try reclaiming pages quickly
> > > > > +                                        * because it could never meet the requirement.
> > > > > +                                        */
> > > > > +                                       if (zone->present_pages >
> > > > > +                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
> > > > > +                                               has_under_min_watermark_zone = 1;
> > > > > +                               }
> > > > >                         } else {
> > > > >                                 /*
> > > > >                                  * If a zone reaches its high watermark,
> > > > This looks like a workaround just for the DMA zone. present_pages can be
> > > > bigger than min_wmark + lowmem_reserve, and we would still suffer from
> > > > the issue: for example, a DMA32 zone with some pages allocated for DMA,
> > > > or a zone which has some lru pages but is still much smaller than the high zone.
> > > 
> > > Right. I thought about it but couldn't come up with a good idea for it. :(
> > > 
> > > > 
> > > > > We could even apply this at the start of the loop so that we avoid
> > > > > unnecessary scanning at the beginning. In that case, we have to apply
> > > > > only zone->lowmem_reserve[end_zone], because we have to consider the
> > > > > NO_WATERMARK alloc case.
> > > > Yes, we can do this to avoid unnecessary scanning, but the DMA zone has
> > > > no lru pages, so I'm not sure how big the benefit is here.
> > > 
> > > At least we can prevent has_under_min_watermark_zone from being set.
> > > But it still has the problem you pointed out earlier.
> > > 
> > > > 
> > > > > > Or is something like the below better? We can avoid accounting the
> > > > > > big page reserve into min_wmark_pages for a low zone. If the high
> > > > > > zone is under its min_wmark, kswapd will not sleep.
> > > > > >                                if (!zone_watermark_ok_safe(zone, order,
> > > > > > -                                            min_wmark_pages(zone), end_zone, 0))
> > > > > > +                                            min_wmark_pages(zone), 0, 0))
> > > > > >                                         has_under_min_watermark_zone = 1;
> > > > > 
> > > > > I think it's not a good idea, since the page allocator always considers
> > > > > classzone_idx. So although we fix the kswapd issue with your changes,
> > > > > the page allocator still can't allocate memory and wakes kswapd up again.
> > > > Why would kswapd be woken up again? The high zone itself still has
> > > > min_wmark + lowmem_reserve ok for the allocation (classzone_idx 0 means
> > > > checking the lowmem_reserve for allocation from the zone itself), so the
> > > > allocation can be met.
> > > 
> > > You're absolutely right.
> > > I got confused. Sorry about that.
> > > 
> > > I like this better than your old version, because it can still rush if
> > > any zone in the zonelist is consumed below its own min watermark. It would
> > > mitigate GFP_ATOMIC failures better than your old version, though the risk
> > > would still be higher than now. So we need the numbers.
> > > 
> > > Could you repost this as a formal patch with good comments and numbers?
> > > Personally, I like a description based on a scenario, with kind
> > > step-by-step detail. Feel free to use the description from my patch if you want.
> > > 
> > > Thanks for the patient discussion in spite of my irregular replies, Shaohua.
> > Thanks for your time. Here is the patch. I'm trying to get some numbers,
> > but I haven't found a proper workload to demonstrate the atomic allocation
> > failure. Any suggestions for a workload?
> 
> Sorry for really really late response.
> 
> Hmm.. I don't have such a test.
> But it seems we recently had such a report again:
> https://lkml.org/lkml/2011/10/18/131
> You could ask him if you want.
I'll ask.

> > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v3
> > 
> > has_under_min_watermark_zone is used to detect whether there is a risk of
> > GFP_ATOMIC allocation failure. The current logic is: if any zone has its min
> > watermark not ok, we have risk.
> > 
> > A low zone needs to reserve memory to limit fallback from a high zone. The
> > reserved memory is zone->lowmem_reserve[]. If the high zone is big, the low
> > zone's min_wmark + lowmem_reserve[] is usually big too; sometimes min_wmark +
> > lowmem_reserve[] can even exceed zone->present_pages. The DMA zone is one
> > example. Other low zones can have similarly high reserves, though they might
> > still have a margin between reserved pages and present pages. So in the
> > kswapd loop, if end_zone is a high zone, has_under_min_watermark_zone is
> > easily set, and always set for DMA.
> > 
> > Let's consider the case where end_zone is a high zone with high_wmark not ok
> > but min_wmark ok. A DMA zone always has present_pages less than its reserved
> > pages, so has_under_min_watermark_zone is always set. While kswapd is
> > running, this has some drawbacks:
> > 1. kswapd keeps running unnecessarily without congestion_wait; the high zone
> > can already meet GFP_ATOMIC allocations. The running wastes CPU.
> > 2. kswapd can scan many more pages and trash the working set. congestion_wait
> > can slow the scan down when kswapd is in trouble; with congestion_wait
> > skipped, kswapd keeps scanning pages unnecessarily.
> > 
> > Since the DMA zone always sets has_under_min_watermark_zone, the current
> > logic effectively means that when kswapd is in trouble it keeps running
> > without congestion_wait until the high zone has its high wmark ok. This is
> > not intended.
> > 
> > In this patch, we test min_wmark against the zone itself. This doesn't
> > change the behavior of the high zone. For a low zone, we now exclude the
> > lowmem_reserve for the high zone, to avoid unnecessary running.
> > 
> > Note: With this patch, we could have potential higher risk of GFP_ATOMIC failure.
> > 
> > v3: Uses a less intrusive method to determine if has_under_min_watermark_zone
> > should be set after discussion with Minchan.
> > v2: use bool and clear has_under_min_watermark_zone for zone with watermark ok
> > as suggested by David Rientjes.
> > 
> > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> As you know, I give my Reviewed-by as a token of "the code looks good to me".
> But we still see atomic allocation failure messages recently, so you need to
> get an Acked-by, a token of "I support this idea", from others (i.e., Mel/Rik
> are the right people).
> 
> It would be better to write down in the description that this is a real
> problem on your system and that the patch solves it.
Thanks. I don't have a test case exposing this issue; I just thought the logic
looked buggy.



* Re: [patch v3]vmscan: correctly detect GFP_ATOMIC allocation failure
  2011-10-28  5:15                                         ` Shaohua Li
@ 2011-11-07  5:15                                           ` Shaohua Li
  0 siblings, 0 replies; 23+ messages in thread
From: Shaohua Li @ 2011-11-07  5:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, mel, Rik van Riel, Michal Hocko, linux-mm

On Fri, 2011-10-28 at 13:15 +0800, Shaohua Li wrote:
> On Fri, 2011-10-28 at 06:50 +0800, Minchan Kim wrote:
> > On Tue, Oct 18, 2011 at 10:13:52AM +0800, Shaohua Li wrote:
> > > On Wed, 2011-10-12 at 15:59 +0800, Minchan Kim wrote:
> > > > On Wed, Oct 12, 2011 at 10:48:59AM +0800, Shaohua Li wrote:
> > > > > > > there are two cases where a zone is below min_watermark.
> > > > > > > 1. The zone is below min_watermark for allocations within the zone
> > > > > > > itself; in this case we need to hurry up.
> > > > > > > 2. The zone is below min_watermark for allocations from a high zone;
> > > > > > > we don't really need to hurry up if another zone is above min_watermark.
> > > > > > > Since a low zone needs to reserve pages for high zones, the second
> > > > > > > case could be common.
> > > > > > 
> > > > > > You mean "lowmem_reserve"?
> > > > > > It means the opposite. It is a mechanism to defend lowmem pages
> > > > > > against high zone allocations, because it could be fatal to let
> > > > > > process pages be allocated from a low zone.
> > > > > > Also, we can set a per-zone ratio for the reserved pages.
> > > > > > How can we make sure lower zones have enough free pages for a higher zone?
> > > > > lowmem_reserve causes the problem, but it's not lowmem_reserve's fault.
> > > > > I'm thinking of changing it.
> > > > > 
> > > > > > > Yes, keeping kswapd running in this case can reduce the chance of
> > > > > > > GFP_ATOMIC failure. But my patch will not cause immediate failure,
> > > > > > > because there are still some zones above min_watermark that can
> > > > > > > meet the GFP_ATOMIC allocation. And keeping kswapd running has some
> > > > > > 
> > > > > > True. That is why I said "I don't mean you are wrong, but we should be very careful about this."
> > > > > > Normally it could cope, but it might fail on a sudden peak in the atomic allocation stream.
> > > > > > Recently we have seen more reports of GFP_ATOMIC allocation failures than before.
> > > > > > So I would like to be very careful, and that's why I suggest we need at least some experiments.
> > > > > > Through them, Andrew can make the final call.
> > > > > sure.
> > > > > 
> > > > > > > drawbacks:
> > > > > > > 1. CPU overhead.
> > > > > > > 2. it extends the isolation window size, so it trashes the working set.
> > > > > > > Considering the DMA zone, its min_watermark is almost always not ok
> > > > > > > for any allocation from a high zone. So we will always have these
> > > > > > > drawbacks.
> > > > > > 
> > > > > > I agree with you that it's a problem.
> > > > > > I think the real solution is to remove the zone from the allocation
> > > > > > fallback list in such a case, because the lower zone can never meet
> > > > > > any allocation for the higher zone. But it makes the code rather
> > > > > > complicated, as we have to consider an admin who can change the
> > > > > > reserved pages anytime.
> > > > > Not worth the complication.
> > > > > 
> > > > > > So how about this?
> > > > > > 
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 8913374..f71ed2f 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -2693,8 +2693,16 @@ loop_again:
> > > > > >                                  * failure risk. Hurry up!
> > > > > >                                  */
> > > > > >                                 if (!zone_watermark_ok_safe(zone, order,
> > > > > > -                                           min_wmark_pages(zone), end_zone, 0))
> > > > > > -                                       has_under_min_watermark_zone = 1;
> > > > > > +                                           min_wmark_pages(zone), end_zone, 0)) {
> > > > > > +                                       /*
> > > > > > +                                        * In case of big reserved page for higher zone,
> > > > > > +                                        * it is pointless to try reclaiming pages quickly
> > > > > > +                                        * because it could never meet the requirement.
> > > > > > +                                        */
> > > > > > +                                       if (zone->present_pages >
> > > > > > +                                               min_wmark_pages(zone) + zone->lowmem_reserve[end_zone])
> > > > > > +                                               has_under_min_watermark_zone = 1;
> > > > > > +                               }
> > > > > >                         } else {
> > > > > >                                 /*
> > > > > >                                  * If a zone reaches its high watermark,
> > > > > This looks like a workaround just for the DMA zone. present_pages can
> > > > > be bigger than min_wmark + lowmem_reserve, and we would still suffer
> > > > > from the issue: for example, a DMA32 zone with some pages allocated for
> > > > > DMA, or a zone which has some lru pages but is still much smaller than
> > > > > the high zone.
> > > > 
> > > > Right. I thought about it but couldn't come up with a good idea for it. :(
> > > > 
> > > > > 
> > > > > > We could even apply this at the start of the loop so that we avoid
> > > > > > unnecessary scanning at the beginning. In that case, we have to apply
> > > > > > only zone->lowmem_reserve[end_zone], because we have to consider the
> > > > > > NO_WATERMARK alloc case.
> > > > > Yes, we can do this to avoid unnecessary scanning, but the DMA zone
> > > > > has no lru pages, so I'm not sure how big the benefit is here.
> > > > 
> > > > At least we can prevent has_under_min_watermark_zone from being set.
> > > > But it still has the problem you pointed out earlier.
> > > > 
> > > > > 
> > > > > > > Or is something like the below better? We can avoid accounting the
> > > > > > > big page reserve into min_wmark_pages for a low zone. If the high
> > > > > > > zone is under its min_wmark, kswapd will not sleep.
> > > > > > >                                if (!zone_watermark_ok_safe(zone, order,
> > > > > > > -                                            min_wmark_pages(zone), end_zone, 0))
> > > > > > > +                                            min_wmark_pages(zone), 0, 0))
> > > > > > >                                         has_under_min_watermark_zone = 1;
> > > > > > 
> > > > > > I think it's not a good idea, since the page allocator always
> > > > > > considers classzone_idx. So although we fix the kswapd issue with
> > > > > > your changes, the page allocator still can't allocate memory and
> > > > > > wakes kswapd up again.
> > > > > Why would kswapd be woken up again? The high zone itself still has
> > > > > min_wmark + lowmem_reserve ok for the allocation (classzone_idx 0 means
> > > > > checking the lowmem_reserve for allocation from the zone itself), so
> > > > > the allocation can be met.
> > > > 
> > > > You're absolutely right.
> > > > I got confused. Sorry about that.
> > > > 
> > > > I like this better than your old version, because it can still rush if
> > > > any zone in the zonelist is consumed below its own min watermark. It
> > > > would mitigate GFP_ATOMIC failures better than your old version, though
> > > > the risk would still be higher than now. So we need the numbers.
> > > > 
> > > > Could you repost this as a formal patch with good comments and numbers?
> > > > Personally, I like a description based on a scenario, with kind
> > > > step-by-step detail. Feel free to use the description from my patch if you want.
> > > > 
> > > > Thanks for the patient discussion in spite of my irregular replies, Shaohua.
> > > Thanks for your time. Here is the patch. I'm trying to get some numbers,
> > > but I haven't found a proper workload to demonstrate the atomic allocation
> > > failure. Any suggestions for a workload?
> > 
> > Sorry for really really late response.
> > 
> > Hmm.. I don't have such a test.
> > But it seems we recently had such a report again:
> > https://lkml.org/lkml/2011/10/18/131
> > You could ask him if you want.
> I'll ask.
This issue isn't related to GFP_ATOMIC.
I ran a lot of fio/ffsb/tcp/udp tests on 2 machines (one two-socket and the
other four-socket) and didn't find any problem. Of course this doesn't prove
there is no problem, but I hope somebody can look at the patch.

Thanks,
Shaohua

> > > Subject: vmscan: correctly detect GFP_ATOMIC allocation failure -v3
> > > 
> > > has_under_min_watermark_zone is used to detect whether there is a risk of
> > > GFP_ATOMIC allocation failure. The current logic is: if any zone has its
> > > min watermark not ok, we have risk.
> > > 
> > > A low zone needs to reserve memory to limit fallback from a high zone. The
> > > reserved memory is zone->lowmem_reserve[]. If the high zone is big, the low
> > > zone's min_wmark + lowmem_reserve[] is usually big too; sometimes min_wmark +
> > > lowmem_reserve[] can even exceed zone->present_pages. The DMA zone is one
> > > example. Other low zones can have similarly high reserves, though they
> > > might still have a margin between reserved pages and present pages. So in
> > > the kswapd loop, if end_zone is a high zone, has_under_min_watermark_zone
> > > is easily set, and always set for DMA.
> > > 
> > > Let's consider the case where end_zone is a high zone with high_wmark not
> > > ok but min_wmark ok. A DMA zone always has present_pages less than its
> > > reserved pages, so has_under_min_watermark_zone is always set. While kswapd
> > > is running, this has some drawbacks:
> > > 1. kswapd keeps running unnecessarily without congestion_wait; the high
> > > zone can already meet GFP_ATOMIC allocations. The running wastes CPU.
> > > 2. kswapd can scan many more pages and trash the working set.
> > > congestion_wait can slow the scan down when kswapd is in trouble; with
> > > congestion_wait skipped, kswapd keeps scanning pages unnecessarily.
> > > 
> > > Since the DMA zone always sets has_under_min_watermark_zone, the current
> > > logic effectively means that when kswapd is in trouble it keeps running
> > > without congestion_wait until the high zone has its high wmark ok. This is
> > > not intended.
> > > 
> > > In this patch, we test min_wmark against the zone itself. This doesn't
> > > change the behavior of the high zone. For a low zone, we now exclude the
> > > lowmem_reserve for the high zone, to avoid unnecessary running.
> > > 
> > > Note: With this patch, we could have potential higher risk of GFP_ATOMIC failure.
> > > 
> > > v3: Uses a less intrusive method to determine if has_under_min_watermark_zone
> > > should be set after discussion with Minchan.
> > > v2: use bool and clear has_under_min_watermark_zone for zone with watermark ok
> > > as suggested by David Rientjes.
> > > 
> > > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > 
> > As you know, I give my Reviewed-by as a token of "the code looks good to me".
> > But we still see atomic allocation failure messages recently, so you need to
> > get an Acked-by, a token of "I support this idea", from others (i.e.,
> > Mel/Rik are the right people).
> > 
> > It would be better to write down in the description that this is a real
> > problem on your system and that the patch solves it.
> Thanks. I don't have a test case exposing this issue; I just thought the logic
> looked buggy.




end of thread, other threads:[~2011-11-07  5:07 UTC | newest]

Thread overview: 23+ messages
2011-09-27  7:23 [patch 2/2]vmscan: correctly detect GFP_ATOMIC allocation failure Shaohua Li
2011-09-27 11:28 ` Michal Hocko
2011-09-28  0:48   ` Shaohua Li
2011-09-28  9:27     ` Michal Hocko
2011-10-08  3:14       ` Shaohua Li
2011-10-08  3:19         ` David Rientjes
2011-10-08  3:35           ` Shaohua Li
2011-10-08  5:56             ` [patch v2]vmscan: " Shaohua Li
2011-10-08 10:25               ` Minchan Kim
2011-10-09  5:53                 ` Shaohua Li
2011-10-09  8:01                   ` Minchan Kim
2011-10-09  8:17                     ` Shaohua Li
2011-10-09 15:10                       ` Minchan Kim
2011-10-10  7:28                         ` Shaohua Li
2011-10-10 15:42                           ` Minchan Kim
2011-10-11  5:30                             ` Shaohua Li
2011-10-11  6:54                               ` Minchan Kim
2011-10-12  2:48                                 ` Shaohua Li
2011-10-12  7:59                                   ` Minchan Kim
2011-10-18  2:13                                     ` [patch v3]vmscan: " Shaohua Li
2011-10-27 22:50                                       ` Minchan Kim
2011-10-28  5:15                                         ` Shaohua Li
2011-11-07  5:15                                           ` Shaohua Li
