linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
@ 2015-01-05  8:56 Vlastimil Babka
  2015-01-05  8:56 ` [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Vlastimil Babka @ 2015-01-05  8:56 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Vladimir Davydov, stable,
	Mel Gorman, Johannes Weiner, Michal Hocko, Rik van Riel

Charles Shirron and Paul Cassella from Cray Inc have reported kswapd stuck
in a busy loop with nothing left to balance, but kswapd_try_to_sleep() failing
to sleep. Their analysis found the cause to be a combination of several
factors:

1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait

2. The process has been killed (by OOM in this case), but has not yet been
   scheduled to remove itself from the waitqueue and die.

3. kswapd checks for throttled processes in prepare_kswapd_sleep():

        if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
                wake_up(&pgdat->pfmemalloc_wait);
		return false; // kswapd will not go to sleep
	}

   However, for a process that was already killed, wake_up() does not remove
   the process from the waitqueue, since try_to_wake_up() checks its state
   first and returns false when the process is no longer waiting.

4. kswapd is running on the same CPU as the only CPU that the process is
   allowed to run on (through cpus_allowed, or possibly single-cpu system).

5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
   encounters no voluntary preemption points and repeatedly fails
   prepare_kswapd_sleep(), blocking the process from running and removing
   itself from the waitqueue, which would let kswapd sleep.

So, the source of the problem is that we prevent kswapd from going to sleep
until there are processes waiting on the pfmemalloc_wait queue, and a process
waiting on a queue is guaranteed to be removed from the queue only when it
gets scheduled. This was done to make sure that no process is left sleeping
on pfmemalloc_wait when kswapd itself goes to sleep.

However, it isn't necessary to postpone kswapd sleep until the pfmemalloc_wait
queue actually empties. To prevent processes from being left sleeping, it's
actually enough to guarantee that all processes waiting on pfmemalloc_wait
queue have been woken up by the time we put kswapd to sleep.

This patch therefore fixes this issue by substituting 'wake_up' with
'wake_up_all' and removing 'return false' in the code snippet from
prepare_kswapd_sleep() above. Note that if any process puts itself in the
queue after this waitqueue_active() check, or after the wake up itself, it
means that the process will also wake up kswapd - and since we are under
prepare_to_wait(), the wake up won't be missed. Also we update the comment
prepare_kswapd_sleep() to hopefully more clearly describe the races it is
preventing.

Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves
                      are low and swap is backed by network storage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org>   # v3.6+
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
---
Changes in v3 (v2 was sent by Vladimir Davydov, thanks for his new solution):

- split to two patches again, as I (and Michal Hocko) think it's more correct
- some rewording in changelog
- change the code comment again as in v1 with small updates (v2 dropped this
  part), since it has been clearly a source of confusion so far

 mm/vmscan.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd9a72b..ab2505c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2921,18 +2921,20 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
 		return false;
 
 	/*
-	 * There is a potential race between when kswapd checks its watermarks
-	 * and a process gets throttled. There is also a potential race if
-	 * processes get throttled, kswapd wakes, a large process exits therby
-	 * balancing the zones that causes kswapd to miss a wakeup. If kswapd
-	 * is going to sleep, no process should be sleeping on pfmemalloc_wait
-	 * so wake them now if necessary. If necessary, processes will wake
-	 * kswapd and get throttled again
+	 * The throttled processes are normally woken up in balance_pgdat() as
+	 * soon as pfmemalloc_watermark_ok() is true. But there is a potential
+	 * race between when kswapd checks the watermarks and a process gets
+	 * throttled. There is also a potential race if processes get
+	 * throttled, kswapd wakes, a large process exits thereby balancing the
+	 * zones, which causes kswapd to exit balance_pgdat() before reaching
+	 * the wake up checks. If kswapd is going to sleep, no process should
+	 * be sleeping on pfmemalloc_wait, so wake them now if necessary. If
+	 * the wake up is premature, processes will wake kswapd and get
+	 * throttled again. The difference from wake ups in balance_pgdat() is
+	 * that here we are under prepare_to_wait().
 	 */
-	if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
-		wake_up(&pgdat->pfmemalloc_wait);
-		return false;
-	}
+	if (waitqueue_active(&pgdat->pfmemalloc_wait))
+		wake_up_all(&pgdat->pfmemalloc_wait);
 
 	return pgdat_balanced(pgdat, order, classzone_idx);
 }
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once
  2015-01-05  8:56 [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Vlastimil Babka
@ 2015-01-05  8:56 ` Vlastimil Babka
  2015-01-06  2:58   ` Rik van Riel
  2015-01-05  9:12 ` [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Michal Hocko
  2015-01-06  2:45 ` Rik van Riel
  2 siblings, 1 reply; 5+ messages in thread
From: Vlastimil Babka @ 2015-01-05  8:56 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Mel Gorman, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Rik van Riel, stable

Kswapd in balance_pgdate() currently uses wake_up() on processes waiting in
throttle_direct_reclaim(), which only wakes up a single process. This might
leave processes waiting for longer than necessary, until the check is reached
in the next loop iteration. Processes might also be left waiting if zone was
fully balanced in single iteration. Note that the comment in balance_pgdat()
also says "Wake them", so waking up a single process does not seem intentional.

Thus, replace wake_up() with wake_up_all().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ab2505c..f73afa3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3175,7 +3175,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 		 */
 		if (waitqueue_active(&pgdat->pfmemalloc_wait) &&
 				pfmemalloc_watermark_ok(pgdat))
-			wake_up(&pgdat->pfmemalloc_wait);
+			wake_up_all(&pgdat->pfmemalloc_wait);
 
 		/*
 		 * Fragmentation may mean that the system cannot be rebalanced
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  2015-01-05  8:56 [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Vlastimil Babka
  2015-01-05  8:56 ` [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once Vlastimil Babka
@ 2015-01-05  9:12 ` Michal Hocko
  2015-01-06  2:45 ` Rik van Riel
  2 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2015-01-05  9:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Vladimir Davydov, stable,
	Mel Gorman, Johannes Weiner, Rik van Riel

On Mon 05-01-15 09:56:42, Vlastimil Babka wrote:
> Charles Shirron and Paul Cassella from Cray Inc have reported kswapd stuck
> in a busy loop with nothing left to balance, but kswapd_try_to_sleep() failing
> to sleep. Their analysis found the cause to be a combination of several
> factors:
> 
> 1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
> 
> 2. The process has been killed (by OOM in this case), but has not yet been
>    scheduled to remove itself from the waitqueue and die.
> 
> 3. kswapd checks for throttled processes in prepare_kswapd_sleep():
> 
>         if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
>                 wake_up(&pgdat->pfmemalloc_wait);
> 		return false; // kswapd will not go to sleep
> 	}
> 
>    However, for a process that was already killed, wake_up() does not remove
>    the process from the waitqueue, since try_to_wake_up() checks its state
>    first and returns false when the process is no longer waiting.
> 
> 4. kswapd is running on the same CPU as the only CPU that the process is
>    allowed to run on (through cpus_allowed, or possibly single-cpu system).
> 
> 5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
>    encounters no voluntary preemption points and repeatedly fails
>    prepare_kswapd_sleep(), blocking the process from running and removing
>    itself from the waitqueue, which would let kswapd sleep.
> 
> So, the source of the problem is that we prevent kswapd from going to sleep
> until there are processes waiting on the pfmemalloc_wait queue, and a process
> waiting on a queue is guaranteed to be removed from the queue only when it
> gets scheduled. This was done to make sure that no process is left sleeping
> on pfmemalloc_wait when kswapd itself goes to sleep.
> 
> However, it isn't necessary to postpone kswapd sleep until the pfmemalloc_wait
> queue actually empties. To prevent processes from being left sleeping, it's
> actually enough to guarantee that all processes waiting on pfmemalloc_wait
> queue have been woken up by the time we put kswapd to sleep.
> 
> This patch therefore fixes this issue by substituting 'wake_up' with
> 'wake_up_all' and removing 'return false' in the code snippet from
> prepare_kswapd_sleep() above. Note that if any process puts itself in the
> queue after this waitqueue_active() check, or after the wake up itself, it
> means that the process will also wake up kswapd - and since we are under
> prepare_to_wait(), the wake up won't be missed. Also we update the comment
> prepare_kswapd_sleep() to hopefully more clearly describe the races it is
> preventing.
> 
> Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves
>                       are low and swap is backed by network storage")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Cc: <stable@vger.kernel.org>   # v3.6+
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Rik van Riel <riel@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks!

> ---
> Changes in v3 (v2 was sent by Vladimir Davydov, thanks for his new solution):
> 
> - split to two patches again, as I (and Michal Hocko) think it's more correct
> - some rewording in changelog
> - change the code comment again as in v1 with small updates (v2 dropped this
>   part), since it has been clearly a source of confusion so far
> 
>  mm/vmscan.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd9a72b..ab2505c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2921,18 +2921,20 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
>  		return false;
>  
>  	/*
> -	 * There is a potential race between when kswapd checks its watermarks
> -	 * and a process gets throttled. There is also a potential race if
> -	 * processes get throttled, kswapd wakes, a large process exits therby
> -	 * balancing the zones that causes kswapd to miss a wakeup. If kswapd
> -	 * is going to sleep, no process should be sleeping on pfmemalloc_wait
> -	 * so wake them now if necessary. If necessary, processes will wake
> -	 * kswapd and get throttled again
> +	 * The throttled processes are normally woken up in balance_pgdat() as
> +	 * soon as pfmemalloc_watermark_ok() is true. But there is a potential
> +	 * race between when kswapd checks the watermarks and a process gets
> +	 * throttled. There is also a potential race if processes get
> +	 * throttled, kswapd wakes, a large process exits thereby balancing the
> +	 * zones, which causes kswapd to exit balance_pgdat() before reaching
> +	 * the wake up checks. If kswapd is going to sleep, no process should
> +	 * be sleeping on pfmemalloc_wait, so wake them now if necessary. If
> +	 * the wake up is premature, processes will wake kswapd and get
> +	 * throttled again. The difference from wake ups in balance_pgdat() is
> +	 * that here we are under prepare_to_wait().
>  	 */
> -	if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
> -		wake_up(&pgdat->pfmemalloc_wait);
> -		return false;
> -	}
> +	if (waitqueue_active(&pgdat->pfmemalloc_wait))
> +		wake_up_all(&pgdat->pfmemalloc_wait);
>  
>  	return pgdat_balanced(pgdat, order, classzone_idx);
>  }
> -- 
> 2.1.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  2015-01-05  8:56 [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Vlastimil Babka
  2015-01-05  8:56 ` [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once Vlastimil Babka
  2015-01-05  9:12 ` [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Michal Hocko
@ 2015-01-06  2:45 ` Rik van Riel
  2 siblings, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2015-01-06  2:45 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Vladimir Davydov, stable, Mel Gorman,
	Johannes Weiner, Michal Hocko

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/05/2015 03:56 AM, Vlastimil Babka wrote:
> Charles Shirron and Paul Cassella from Cray Inc have reported
> kswapd stuck in a busy loop with nothing left to balance, but
> kswapd_try_to_sleep() failing to sleep. Their analysis found the
> cause to be a combination of several factors:

> Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC
> reserves are low and swap is backed by network storage") 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
> Vladimir Davydov <vdavydov@parallels.com> Cc:
> <stable@vger.kernel.org>   # v3.6+ Cc: Mel Gorman
> <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc:
> Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUq0xLAAoJEM553pKExN6D8ZgH/3AjsMKmA/JytTf5WOjpO0+W
ozN7ttYjYfIAFytYymPYkbm5/8pLBhy9HM9szMijRa+vhLyiGr5CSB0jTTU+1ImP
BynATfT4NbAHssosMRCHqRep/NmGkaV0qkiL4ndEaCwiGz/x481jgvHXP3GdHxbu
kIt8gYrP1gK3yM+yGk06nhsM9QU0C0P/ngzAvMTrch/nKr679Uu+chyLAskms7hZ
+x0OZkkN1gWGDCn55swLPURbIPnWedml+HB19e30fehLtpCcPoTzc1Zr46fPcI4O
BhJJX8QCWhpYdXqE1TZCid3zV4wPTJKAe1MskzBCgo3ELYE7ll1gM24QcfrpdgA=
=0cv0
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once
  2015-01-05  8:56 ` [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once Vlastimil Babka
@ 2015-01-06  2:58   ` Rik van Riel
  0 siblings, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2015-01-06  2:58 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Mel Gorman, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, stable

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/05/2015 03:56 AM, Vlastimil Babka wrote:
> Kswapd in balance_pgdate() currently uses wake_up() on processes
> waiting in throttle_direct_reclaim(), which only wakes up a single
> process. This might leave processes waiting for longer than
> necessary, until the check is reached in the next loop iteration.
> Processes might also be left waiting if zone was fully balanced in
> single iteration. Note that the comment in balance_pgdat() also
> says "Wake them", so waking up a single process does not seem
> intentional.
> 
> Thus, replace wake_up() with wake_up_all().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman
> <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc:
> Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov
> <vdavydov@parallels.com> Cc: Rik van Riel <riel@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUq08+AAoJEM553pKExN6D9bEIAKV916k0OguiyAGIXrgjPrba
3W2eQ2FFWIm6KwumPas7t6jBzR11yFhpAIACf6nNt12EmH73UwaW2z6vdYque4c+
HU79DXTNTQ91xJfXI8XfNsZW/s8zpCZ+Sm08N7/O7k6c76yKR+owXmoSnjb2Q4N9
O0F0db3Jkd52sH/2l+LCUKrTeI9fRBrKnpAv7FAhZ5go8N7tdtIjYv8hhpIZ/83F
WSRsORt0VKOx0an+JO09e5f6R+RF7RAqiU6yUdDAk52CJzyRqECksLxDOvw81OHD
QUT5TuMVNoqXNJxFpQvyI8Dn82d1CiisX2Wztcic4OlxPJQ6gK04SmeWMXw8Ijw=
=pnD6
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-01-06  2:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-05  8:56 [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Vlastimil Babka
2015-01-05  8:56 ` [PATCH V3 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once Vlastimil Babka
2015-01-06  2:58   ` Rik van Riel
2015-01-05  9:12 ` [PATCH V3 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Michal Hocko
2015-01-06  2:45 ` Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).