From: Michal Hocko <mhocko@kernel.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jia He <hejianet@gmail.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: fix condition for throttle_direct_reclaim
Date: Tue, 14 Mar 2017 09:16:03 +0100
Message-ID: <20170314081602.GA7772@dhcp22.suse.cz>
In-Reply-To: <20170313221920.7881-1-shakeelb@google.com>

On Mon 13-03-17 15:19:20, Shakeel Butt wrote:
> Recently kswapd has been modified to give up after MAX_RECLAIM_RETRIES

s@Recently@Since "mm: fix 100% CPU kswapd busyloop on unreclaimable nodes"@

> number of unsuccessful iterations. Before going to sleep, kswapd thread
> will unconditionally wake up all threads sleeping on pfmemalloc_wait.
> However the awoken threads will recheck the watermarks and wake the
> kswapd thread and sleep again on pfmemalloc_wait. There is a chance
> of continuous back and forth between kswapd and direct reclaiming
> threads if kswapd keeps failing, thus defeating the purpose of
> adding a backoff mechanism to kswapd.

I would probably be more explicit about this being a livelock which
prevents the machine from reclaiming anything or going OOM, because _all_
direct reclaimers might end up in throttle_direct_reclaim, so there is
nobody to make forward progress.
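
To make the livelock concrete, here is a rough userspace model of the
check, not the kernel code itself. It compares the old condition
(watermark only) with the condition from this patch (watermark or
kswapd_failures >= MAX_RECLAIM_RETRIES). The struct node_model, the
helper rounds_until_progress() and the page counts are made up for
illustration, and the value 16 for MAX_RECLAIM_RETRIES is an assumption
about the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; the real constant is defined by the kernel. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for the few pg_data_t fields that matter here. */
struct node_model {
	unsigned long free_pages;
	unsigned long pfmemalloc_reserve;
	int kswapd_failures;
};

/* Old condition: only the pfmemalloc watermark is considered. */
static bool watermark_ok(const struct node_model *n)
{
	return n->free_pages > n->pfmemalloc_reserve / 2;
}

/* Patched condition: stop throttling once kswapd has given up. */
static bool allow_direct_reclaim(const struct node_model *n)
{
	if (n->kswapd_failures >= MAX_RECLAIM_RETRIES)
		return true;

	return watermark_ok(n);
}

/*
 * Model a direct reclaimer on a node where kswapd never reclaims a page.
 * Returns the number of throttle rounds before the reclaimer is let
 * through, or -1 if it is still throttled after max_rounds.
 */
static int rounds_until_progress(bool (*check)(const struct node_model *),
				 int max_rounds)
{
	struct node_model n = {
		.free_pages = 10,		/* well below reserve / 2 */
		.pfmemalloc_reserve = 100,
		.kswapd_failures = 0,
	};

	assert(max_rounds > 0);

	for (int round = 0; round < max_rounds; round++) {
		if (check(&n))
			return round;

		/* Throttled: kswapd is woken, fails again, wakes us back. */
		if (n.kswapd_failures < MAX_RECLAIM_RETRIES)
			n.kswapd_failures++;
	}

	return -1;
}
```

In this model rounds_until_progress(watermark_ok, 1000) never gets
through, while rounds_until_progress(allow_direct_reclaim, 1000) is
released as soon as the failure counter reaches MAX_RECLAIM_RETRIES, so
at least one task can go on to reclaim or trigger the OOM killer.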

> So, add a kswapd_failures check
> to the throttle_direct_reclaim condition.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>

OK, seems like the simplest way forward. But we definitely have to do
something about throttle_direct_reclaim long term.

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
> v2:
> Instead of a separate helper function for checking kswapd_failures,
> added the check into pfmemalloc_watermark_ok() and renamed that
> function.
> 
>  mm/vmscan.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bae698484e8e..afa5b20ab6d8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2783,7 +2783,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>  	return 0;
>  }
>  
> -static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
> +static bool allow_direct_reclaim(pg_data_t *pgdat)
>  {
>  	struct zone *zone;
>  	unsigned long pfmemalloc_reserve = 0;
> @@ -2791,6 +2791,9 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
>  	int i;
>  	bool wmark_ok;
>  
> +	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
> +		return true;
> +
>  	for (i = 0; i <= ZONE_NORMAL; i++) {
>  		zone = &pgdat->node_zones[i];
>  		if (!managed_zone(zone))
> @@ -2873,7 +2876,7 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>  
>  		/* Throttle based on the first usable node */
>  		pgdat = zone->zone_pgdat;
> -		if (pfmemalloc_watermark_ok(pgdat))
> +		if (allow_direct_reclaim(pgdat))
>  			goto out;
>  		break;
>  	}
> @@ -2895,14 +2898,14 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>  	 */
>  	if (!(gfp_mask & __GFP_FS)) {
>  		wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
> -			pfmemalloc_watermark_ok(pgdat), HZ);
> +			allow_direct_reclaim(pgdat), HZ);
>  
>  		goto check_pending;
>  	}
>  
>  	/* Throttle until kswapd wakes the process */
>  	wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
> -		pfmemalloc_watermark_ok(pgdat));
> +		allow_direct_reclaim(pgdat));
>  
>  check_pending:
>  	if (fatal_signal_pending(current))
> @@ -3102,7 +3105,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
>  {
>  	/*
>  	 * The throttled processes are normally woken up in balance_pgdat() as
> -	 * soon as pfmemalloc_watermark_ok() is true. But there is a potential
> +	 * soon as allow_direct_reclaim() is true. But there is a potential
>  	 * race between when kswapd checks the watermarks and a process gets
>  	 * throttled. There is also a potential race if processes get
>  	 * throttled, kswapd wakes, a large process exits thereby balancing the
> @@ -3271,7 +3274,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
>  		 * able to safely make forward progress. Wake them
>  		 */
>  		if (waitqueue_active(&pgdat->pfmemalloc_wait) &&
> -				pfmemalloc_watermark_ok(pgdat))
> +				allow_direct_reclaim(pgdat))
>  			wake_up_all(&pgdat->pfmemalloc_wait);
>  
>  		/* Check if kswapd should be suspending */
> -- 
> 2.12.0.246.ga2ecc84866-goog
> 

-- 
Michal Hocko
SUSE Labs


