From: Andrew Morton <akpm@linux-foundation.org>
To: Vinayak Menon <vinmenon@codeaurora.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hannes@cmpxchg.org, vdavydov@parallels.com, mhocko@suse.cz,
	mgorman@suse.de, minchan@kernel.org
Subject: Re: [PATCH v2] mm: vmscan: fix the page state calculation in too_many_isolated
Date: Thu, 15 Jan 2015 17:17:28 -0800
Message-ID: <20150115171728.ebc77a48.akpm@linux-foundation.org>
In-Reply-To: <1421235419-30736-1-git-send-email-vinmenon@codeaurora.org>

On Wed, 14 Jan 2015 17:06:59 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:

> It is observed that multiple tasks sometimes get blocked for a long
> time in the congestion_wait loop below, in shrink_inactive_list,
> because the vm_stat values have not been synced yet.
> 
> (__schedule) from [<c0a03328>]
> (schedule_timeout) from [<c0a04940>]
> (io_schedule_timeout) from [<c01d585c>]
> (congestion_wait) from [<c01cc9d8>]
> (shrink_inactive_list) from [<c01cd034>]
> (shrink_zone) from [<c01cdd08>]
> (try_to_free_pages) from [<c01c442c>]
> (__alloc_pages_nodemask) from [<c01f1884>]
> (new_slab) from [<c09fcf60>]
> (__slab_alloc) from [<c01f1a6c>]
> 
> In one such instance, zone_page_state(zone, NR_ISOLATED_FILE)
> had returned 14, zone_page_state(zone, NR_INACTIVE_FILE)
> returned 92, and GFP_IOFS was set, and this resulted
> in too_many_isolated returning true. But one of the CPU's
> pageset vm_stat_diff had NR_ISOLATED_FILE as "-14". So the
> actual isolated count was zero. As there weren't any more
> updates to NR_ISOLATED_FILE and the vmstat_update deferred work
> had not been scheduled yet, 7 tasks were spinning in the
> congestion wait loop for around 4 seconds, in the direct
> reclaim path.
> 
> This patch uses zone_page_state_snapshot instead, but restricts
> its usage to avoid performance penalty.

Seems reasonable.
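
For context: with GFP_IOFS set, too_many_isolated() compares the 14
"isolated" pages against 92 >> 3 = 11 inactive pages, so it returns true
even though the pending per-CPU delta of -14 means the real isolated count
is zero.  zone_page_state() reads only the zone-wide counter, while
zone_page_state_snapshot() also folds in each online CPU's unflushed
vm_stat_diff[] delta.  Roughly (a paraphrase of the helpers in
include/linux/vmstat.h of this era, not a verbatim quote):

static inline unsigned long zone_page_state(struct zone *zone,
					enum zone_stat_item item)
{
	/* Fast path: only the (possibly stale) zone-wide counter. */
	long x = atomic_long_read(&zone->vm_stat[item]);
#ifdef CONFIG_SMP
	if (x < 0)
		x = 0;
#endif
	return x;
}

static inline unsigned long zone_page_state_snapshot(struct zone *zone,
					enum zone_stat_item item)
{
	long x = atomic_long_read(&zone->vm_stat[item]);
#ifdef CONFIG_SMP
	int cpu;

	/* Add every CPU's not-yet-flushed delta, e.g. the "-14" above. */
	for_each_online_cpu(cpu)
		x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];

	if (x < 0)
		x = 0;
#endif
	return x;
}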

>
> ...
>
> @@ -1516,15 +1531,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	unsigned long nr_immediate = 0;
>  	isolate_mode_t isolate_mode = 0;
>  	int file = is_file_lru(lru);
> +	int safe = 0;
>  	struct zone *zone = lruvec_zone(lruvec);
>  	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>  
> -	while (unlikely(too_many_isolated(zone, file, sc))) {
> +	while (unlikely(too_many_isolated(zone, file, sc, safe))) {
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
>  
>  		/* We are about to die and free our memory. Return now. */
>  		if (fatal_signal_pending(current))
>  			return SWAP_CLUSTER_MAX;
> +
> +		safe = 1;
>  	}

But here, under the circumstances you describe, we will still call
congestion_wait() a single time before safe gets set.  That wait
shouldn't happen at all, since nothing is actually isolated.

So how about we put the fallback logic into too_many_isolated() itself?



From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix

Move the zone_page_state_snapshot() fallback logic into
too_many_isolated(), so shrink_inactive_list() doesn't incorrectly call
congestion_wait().

Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff -puN mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix
+++ a/mm/vmscan.c
@@ -1402,7 +1402,7 @@ int isolate_lru_page(struct page *page)
 }
 
 static int __too_many_isolated(struct zone *zone, int file,
-	struct scan_control *sc, int safe)
+			       struct scan_control *sc, int safe)
 {
 	unsigned long inactive, isolated;
 
@@ -1435,7 +1435,7 @@ static int __too_many_isolated(struct zo
  * unnecessary swapping, thrashing and OOM.
  */
 static int too_many_isolated(struct zone *zone, int file,
-		struct scan_control *sc, int safe)
+			     struct scan_control *sc)
 {
 	if (current_is_kswapd())
 		return 0;
@@ -1443,12 +1443,14 @@ static int too_many_isolated(struct zone
 	if (!global_reclaim(sc))
 		return 0;
 
-	if (unlikely(__too_many_isolated(zone, file, sc, 0))) {
-		if (safe)
-			return __too_many_isolated(zone, file, sc, safe);
-		else
-			return 1;
-	}
+	/*
+	 * __too_many_isolated(safe=0) is fast but inaccurate, because it
+	 * doesn't account for the vm_stat_diff[] counters.  So if it looks
+	 * like too_many_isolated() is about to return true, fall back to the
+	 * slower, more accurate zone_page_state_snapshot().
+	 */
+	if (unlikely(__too_many_isolated(zone, file, sc, 0)))
+		return __too_many_isolated(zone, file, sc, 1);
 
 	return 0;
 }
@@ -1540,18 +1542,15 @@ shrink_inactive_list(unsigned long nr_to
 	unsigned long nr_immediate = 0;
 	isolate_mode_t isolate_mode = 0;
 	int file = is_file_lru(lru);
-	int safe = 0;
 	struct zone *zone = lruvec_zone(lruvec);
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 
-	while (unlikely(too_many_isolated(zone, file, sc, safe))) {
+	while (unlikely(too_many_isolated(zone, file, sc))) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
 			return SWAP_CLUSTER_MAX;
-
-		safe = 1;
 	}
 
 	lru_add_drain();
_
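
For completeness, here is roughly what the two functions end up looking
like with the above applied.  This is a sketch, not the exact committed
code: it assumes the __too_many_isolated() helper from the v2 patch
(which takes a "safe" flag selecting zone_page_state_snapshot()), and the
small lru_state() helper exists only to keep the sketch short.

static unsigned long lru_state(struct zone *zone, enum zone_stat_item item,
			       int safe)
{
	/* Hypothetical helper: pick the fast or the snapshot-accurate reader. */
	return safe ? zone_page_state_snapshot(zone, item)
		    : zone_page_state(zone, item);
}

static int __too_many_isolated(struct zone *zone, int file,
			       struct scan_control *sc, int safe)
{
	unsigned long inactive, isolated;

	if (file) {
		inactive = lru_state(zone, NR_INACTIVE_FILE, safe);
		isolated = lru_state(zone, NR_ISOLATED_FILE, safe);
	} else {
		inactive = lru_state(zone, NR_INACTIVE_ANON, safe);
		isolated = lru_state(zone, NR_ISOLATED_ANON, safe);
	}

	/*
	 * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages so
	 * that they won't get blocked by normal direct-reclaimers, forming
	 * a circular deadlock.
	 */
	if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
		inactive >>= 3;

	return isolated > inactive;
}

static int too_many_isolated(struct zone *zone, int file,
			     struct scan_control *sc)
{
	if (current_is_kswapd())
		return 0;

	if (!global_reclaim(sc))
		return 0;

	/* Cheap, possibly stale check first... */
	if (unlikely(__too_many_isolated(zone, file, sc, 0)))
		/* ...and only pay for the accurate snapshot if it trips. */
		return __too_many_isolated(zone, file, sc, 1);

	return 0;
}

The point of this shape is that the cheap check runs on every direct
reclaim iteration, while the for_each_online_cpu() walk in the snapshot
variant is only paid in the rare case where the cheap check claims we are
over the limit.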

