linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock
@ 2023-05-19 11:13 Johannes Weiner
  2023-05-24  9:21 ` Vlastimil Babka
  2023-05-25  8:29 ` Mel Gorman
  0 siblings, 2 replies; 4+ messages in thread
From: Johannes Weiner @ 2023-05-19 11:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

During stress testing with higher-order allocations, a deadlock
scenario was observed in compaction: One GFP_NOFS allocation was
sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in
the system were busy with compactors spinning on buffer locks held by
the sleeping GFP_NOFS allocation.

Reclaim is susceptible to this same deadlock; we fixed it by granting
GFP_NOFS allocations additional LRU isolation headroom, to ensure it
makes forward progress while holding fs locks that other reclaimers
might acquire. Do the same here.

This code has been like this since compaction was initially merged,
and I only managed to trigger this with out-of-tree patches that
dramatically increase the contexts that do GFP_NOFS compaction. While
the issue is real, it seems theoretical in nature given existing
allocation sites. Worth fixing now, but no Fixes tag or stable CC.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

v2:
- clarify too_many_isolated() comment (Mel)
- split isolation deadlock from no-contiguous-anon lockups as that's
  a different scenario and deserves its own patch

diff --git a/mm/compaction.c b/mm/compaction.c
index c8bcdea15f5f..c9a4b6dffcf2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
-static bool too_many_isolated(pg_data_t *pgdat)
+static bool too_many_isolated(struct compact_control *cc)
 {
+	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	bool too_many;
 
 	unsigned long active, inactive, isolated;
@@ -758,6 +759,17 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 			node_page_state(pgdat, NR_ISOLATED_ANON);
 
+	/*
+	 * Allow GFP_NOFS to isolate past the limit set for regular
+	 * compaction runs. This prevents an ABBA deadlock when other
+	 * compactors have already isolated to the limit, but are
+	 * blocked on filesystem locks held by the GFP_NOFS thread.
+	 */
+	if (cc->gfp_mask & __GFP_FS) {
+		inactive >>= 3;
+		active >>= 3;
+	}
+
 	too_many = isolated > (inactive + active) / 2;
 	if (!too_many)
 		wake_throttle_isolated(pgdat);
@@ -806,7 +818,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * list by either parallel reclaimers or compaction. If there are,
 	 * delay for some time until fewer pages are isolated
 	 */
-	while (unlikely(too_many_isolated(pgdat))) {
+	while (unlikely(too_many_isolated(cc))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
 			return -EAGAIN;
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock
  2023-05-19 11:13 [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock Johannes Weiner
@ 2023-05-24  9:21 ` Vlastimil Babka
  2023-05-24 15:58   ` Johannes Weiner
  2023-05-25  8:29 ` Mel Gorman
  1 sibling, 1 reply; 4+ messages in thread
From: Vlastimil Babka @ 2023-05-24  9:21 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Mel Gorman, Michal Hocko, linux-mm, linux-kernel, kernel-team

On 5/19/23 13:13, Johannes Weiner wrote:
> During stress testing with higher-order allocations, a deadlock
> scenario was observed in compaction: One GFP_NOFS allocation was
> sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in
> the system were busy with compactors spinning on buffer locks held by
> the sleeping GFP_NOFS allocation.
> 
> Reclaim is susceptible to this same deadlock; we fixed it by granting
> GFP_NOFS allocations additional LRU isolation headroom, to ensure it
> makes forward progress while holding fs locks that other reclaimers
> might acquire. Do the same here.
> 
> This code has been like this since compaction was initially merged,
> and I only managed to trigger this with out-of-tree patches that
> dramatically increase the contexts that do GFP_NOFS compaction. While
> the issue is real, it seems theoretical in nature given existing
> allocation sites. Worth fixing now, but no Fixes tag or stable CC.

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

So IIUC the change is done by not giving GFP_NOFS extra headroom, but
instead restricting the headroom of __GFP_FS allocations. But the original
one was probably too generous anyway so it should be fine?

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> v2:
> - clarify too_many_isolated() comment (Mel)
> - split isolation deadlock from no-contiguous-anon lockups as that's
>   a different scenario and deserves its own patch
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c8bcdea15f5f..c9a4b6dffcf2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
>  }
>  
>  /* Similar to reclaim, but different enough that they don't share logic */
> -static bool too_many_isolated(pg_data_t *pgdat)
> +static bool too_many_isolated(struct compact_control *cc)
>  {
> +	pg_data_t *pgdat = cc->zone->zone_pgdat;
>  	bool too_many;
>  
>  	unsigned long active, inactive, isolated;
> @@ -758,6 +759,17 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
>  			node_page_state(pgdat, NR_ISOLATED_ANON);
>  
> +	/*
> +	 * Allow GFP_NOFS to isolate past the limit set for regular
> +	 * compaction runs. This prevents an ABBA deadlock when other
> +	 * compactors have already isolated to the limit, but are
> +	 * blocked on filesystem locks held by the GFP_NOFS thread.
> +	 */
> +	if (cc->gfp_mask & __GFP_FS) {
> +		inactive >>= 3;
> +		active >>= 3;
> +	}
> +
>  	too_many = isolated > (inactive + active) / 2;
>  	if (!too_many)
>  		wake_throttle_isolated(pgdat);
> @@ -806,7 +818,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	 * list by either parallel reclaimers or compaction. If there are,
>  	 * delay for some time until fewer pages are isolated
>  	 */
> -	while (unlikely(too_many_isolated(pgdat))) {
> +	while (unlikely(too_many_isolated(cc))) {
>  		/* stop isolation if there are still pages not migrated */
>  		if (cc->nr_migratepages)
>  			return -EAGAIN;



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock
  2023-05-24  9:21 ` Vlastimil Babka
@ 2023-05-24 15:58   ` Johannes Weiner
  0 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2023-05-24 15:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, Michal Hocko, linux-mm, linux-kernel,
	kernel-team

On Wed, May 24, 2023 at 11:21:43AM +0200, Vlastimil Babka wrote:
> On 5/19/23 13:13, Johannes Weiner wrote:
> > During stress testing with higher-order allocations, a deadlock
> > scenario was observed in compaction: One GFP_NOFS allocation was
> > sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in
> > the system were busy with compactors spinning on buffer locks held by
> > the sleeping GFP_NOFS allocation.
> > 
> > Reclaim is susceptible to this same deadlock; we fixed it by granting
> > GFP_NOFS allocations additional LRU isolation headroom, to ensure it
> > makes forward progress while holding fs locks that other reclaimers
> > might acquire. Do the same here.
> > 
> > This code has been like this since compaction was initially merged,
> > and I only managed to trigger this with out-of-tree patches that
> > dramatically increase the contexts that do GFP_NOFS compaction. While
> > the issue is real, it seems theoretical in nature given existing
> > allocation sites. Worth fixing now, but no Fixes tag or stable CC.
> 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> So IIUC the change is done by not giving GFP_NOFS extra headroom, but
> instead restricting the headroom of __GFP_FS allocations. But the original
> one was probably too generous anyway so it should be fine?

Yes, the original limit is generally half the LRU, which is quite high.

The new limit is 1/16th of the LRU for regular compactors and half for
GFP_NOFS ones. Note that I didn't make these up; they're stolen from
too_many_isolated() in vmscan.c. I figured those are proven values and
no sense in deviating from them until we have a reason to do so.

> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock
  2023-05-19 11:13 [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock Johannes Weiner
  2023-05-24  9:21 ` Vlastimil Babka
@ 2023-05-25  8:29 ` Mel Gorman
  1 sibling, 0 replies; 4+ messages in thread
From: Mel Gorman @ 2023-05-25  8:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

On Fri, May 19, 2023 at 01:13:59PM +0200, Johannes Weiner wrote:
> During stress testing with higher-order allocations, a deadlock
> scenario was observed in compaction: One GFP_NOFS allocation was
> sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in
> the system were busy with compactors spinning on buffer locks held by
> the sleeping GFP_NOFS allocation.
> 
> Reclaim is susceptible to this same deadlock; we fixed it by granting
> GFP_NOFS allocations additional LRU isolation headroom, to ensure it
> makes forward progress while holding fs locks that other reclaimers
> might acquire. Do the same here.
> 
> This code has been like this since compaction was initially merged,
> and I only managed to trigger this with out-of-tree patches that
> dramatically increase the contexts that do GFP_NOFS compaction. While
> the issue is real, it seems theoretical in nature given existing
> allocation sites. Worth fixing now, but no Fixes tag or stable CC.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

> ---
>  mm/compaction.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> v2:
> - clarify too_many_isolated() comment (Mel)
> - split isolation deadlock from no-contiguous-anon lockups as that's
>   a different scenario and deserves its own patch
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c8bcdea15f5f..c9a4b6dffcf2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
>  }
>  
>  /* Similar to reclaim, but different enough that they don't share logic */
> -static bool too_many_isolated(pg_data_t *pgdat)
> +static bool too_many_isolated(struct compact_control *cc)
>  {
> +	pg_data_t *pgdat = cc->zone->zone_pgdat;
>  	bool too_many;
>  
>  	unsigned long active, inactive, isolated;
> @@ -758,6 +759,17 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
>  			node_page_state(pgdat, NR_ISOLATED_ANON);
>  
> +	/*
> +	 * Allow GFP_NOFS to isolate past the limit set for regular
> +	 * compaction runs. This prevents an ABBA deadlock when other
> +	 * compactors have already isolated to the limit, but are
> +	 * blocked on filesystem locks held by the GFP_NOFS thread.
> +	 */
> +	if (cc->gfp_mask & __GFP_FS) {
> +		inactive >>= 3;
> +		active >>= 3;
> +	}
> +
>  	too_many = isolated > (inactive + active) / 2;
>  	if (!too_many)
>  		wake_throttle_isolated(pgdat);
> @@ -806,7 +818,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	 * list by either parallel reclaimers or compaction. If there are,
>  	 * delay for some time until fewer pages are isolated
>  	 */
> -	while (unlikely(too_many_isolated(pgdat))) {
> +	while (unlikely(too_many_isolated(cc))) {
>  		/* stop isolation if there are still pages not migrated */
>  		if (cc->nr_migratepages)
>  			return -EAGAIN;
> -- 
> 2.40.0
> 

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-05-25  8:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19 11:13 [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock Johannes Weiner
2023-05-24  9:21 ` Vlastimil Babka
2023-05-24 15:58   ` Johannes Weiner
2023-05-25  8:29 ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).