linux-xfs.vger.kernel.org archive mirror
* [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks
@ 2022-09-29 21:54 Dave Chinner
  2022-09-29 22:20 ` Dave Chinner
  2022-09-30  0:33 ` Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Dave Chinner @ 2022-09-29 21:54 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, linux-xfs

From: Dave Chinner <dchinner@redhat.com>

This should be more obvious, but gfpflags_allow_blocking() is not
the same thing as a GFP_KERNEL reclaim context. The former checks
__GFP_DIRECT_RECLAIM, which only tells us whether direct reclaim is
allowed at all. The latter (GFP_KERNEL) allows reclaim to block on
anything, including filesystem and IO structures.
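
For reference, a paraphrased sketch of the relevant gfp definitions
(approximate and trimmed for illustration; see the gfp headers for
the authoritative versions):

    /* paraphrased, not verbatim kernel source */
    #define __GFP_RECLAIM  (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

    #define GFP_NOFS       (__GFP_RECLAIM | __GFP_IO)            /* no __GFP_FS */
    #define GFP_KERNEL     (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

    /* only checks that direct reclaim is allowed at all... */
    static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
    {
            return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
    }

    /* ...so a GFP_NOFS allocation passes this check even though
     * reclaim must not touch filesystem state (__GFP_FS is clear). */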

However, filesystems do lots of memory allocation, including page
cache folio allocation, while under GFP_NOFS constraints. Hence if
direct reclaim in a GFP_NOFS context waits on filesystem progress
(e.g. waits on folio writeback) then memory reclaim can deadlock.
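
For reference, a hypothetical sketch (illustrative only, not part of
this patch) of two common ways a filesystem's allocations end up in
GFP_NOFS context:

    /* Hypothetical helpers for illustration; needs linux/pagemap.h
     * and linux/sched/mm.h. */

    /* 1. Clear __GFP_FS from the inode's mapping mask at inode setup
     *    time, so later page cache allocations for this mapping are
     *    done without __GFP_FS. */
    static void example_setup_inode(struct inode *inode)
    {
            mapping_set_gfp_mask(inode->i_mapping,
                            mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);
    }

    /* 2. Enter a scoped NOFS section around work that must not
     *    recurse into the filesystem; allocations inside the scope
     *    are implicitly treated as GFP_NOFS by reclaim. */
    static void example_nofs_scope(void)
    {
            unsigned int nofs_flags = memalloc_nofs_save();
            /* ... allocations and transaction work ... */
            memalloc_nofs_restore(nofs_flags);
    }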

e.g. a page cache allocation (which is a GFP_NOFS context) gets stuck
waiting on folio writeback like so:

[   75.943494] task:test_write      state:D stack:12560 pid: 3728 ppid:  3613 flags:0x00004002
[   75.944788] Call Trace:
[   75.945183]  <TASK>
[   75.945543]  __schedule+0x2f9/0xa30
[   75.946118]  ? __mod_memcg_lruvec_state+0x41/0x90
[   75.946895]  schedule+0x5a/0xc0
[   75.947397]  io_schedule+0x42/0x70
[   75.947992]  folio_wait_bit_common+0x159/0x3d0
[   75.948732]  ? dio_warn_stale_pagecache.part.0+0x50/0x50
[   75.949505]  folio_wait_writeback+0x28/0x80
[   75.950163]  shrink_page_list+0x96e/0xc30
[   75.950843]  shrink_lruvec+0x558/0xb80
[   75.951440]  shrink_node+0x2c6/0x700
[   75.952059]  do_try_to_free_pages+0xd5/0x570
[   75.952771]  try_to_free_mem_cgroup_pages+0x105/0x220
[   75.953548]  reclaim_high.constprop.0+0xa3/0xf0
[   75.954209]  mem_cgroup_handle_over_high+0x8f/0x280
[   75.955025]  ? kmem_cache_alloc_lru+0x1c6/0x3f0
[   75.955781]  try_charge_memcg+0x6c3/0x820
[   75.956436]  ? __mem_cgroup_threshold+0x16/0x150
[   75.957204]  charge_memcg+0x76/0xf0
[   75.957810]  __mem_cgroup_charge+0x29/0x80
[   75.958464]  __filemap_add_folio+0x225/0x590
[   75.959112]  ? scan_shadow_nodes+0x30/0x30
[   75.959794]  filemap_add_folio+0x37/0xa0
[   75.960432]  __filemap_get_folio+0x1fd/0x340
[   75.961141]  ? xas_load+0x5/0xa0
[   75.961712]  iomap_write_begin+0x103/0x6a0
[   75.962390]  ? filemap_dirty_folio+0x5c/0x80
[   75.963106]  ? iomap_write_end+0xa2/0x2b0
[   75.963744]  iomap_file_buffered_write+0x17c/0x380
[   75.964546]  xfs_file_buffered_write+0xb1/0x2e0
[   75.965286]  ? xfs_file_buffered_write+0x2b2/0x2e0
[   75.966097]  vfs_write+0x2ca/0x3d0
[   75.966702]  __x64_sys_pwrite64+0x8c/0xc0
[   75.967349]  do_syscall_64+0x35/0x80

At this point, the system has 58 pending XFS IO completions that are
stuck waiting for workqueue progress:

[ 1664.460579] workqueue xfs-conv/dm-0: flags=0x4c
[ 1664.461332]   pwq 48: cpus=24 node=3 flags=0x0 nice=0 active=58/256 refcnt=59
[ 1664.461335]     pending: xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io

and nothing is making progress. Exactly why progress is not being
made is not clear from what I can gather from the steaming corpse,
but it is clear that the memcg reclaim code should not be blocking
on filesystem-related objects in GFP_NOFS allocation contexts.

We have the caller's reclaim context (the gfp mask) right there when
we call mem_cgroup_handle_over_high(), so pass it down the stack so
that memcg reclaim respects the caller's constraints. This makes the
reclaim deadlocks in the test I've been running go away.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/linux/memcontrol.h       | 4 ++--
 include/linux/resume_user_mode.h | 2 +-
 mm/memcontrol.c                  | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6257867fbf95..575bb8cfc810 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -919,7 +919,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
 	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
 }
 
-void mem_cgroup_handle_over_high(void);
+void mem_cgroup_handle_over_high(gfp_t gfp_mask);
 
 unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);
 
@@ -1433,7 +1433,7 @@ static inline void folio_memcg_unlock(struct folio *folio)
 {
 }
 
-static inline void mem_cgroup_handle_over_high(void)
+static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask)
 {
 }
 
diff --git a/include/linux/resume_user_mode.h b/include/linux/resume_user_mode.h
index 285189454449..f8f3e958e9cf 100644
--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -55,7 +55,7 @@ static inline void resume_user_mode_work(struct pt_regs *regs)
 	}
 #endif
 
-	mem_cgroup_handle_over_high();
+	mem_cgroup_handle_over_high(GFP_KERNEL);
 	blkcg_maybe_throttle_current();
 
 	rseq_handle_notify_resume(NULL, regs);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b69979c9ced5..09fbebff9796 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2491,7 +2491,7 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
  * Scheduled by try_charge() to be executed from the userland return path
  * and reclaims memory over the high limit.
  */
-void mem_cgroup_handle_over_high(void)
+void mem_cgroup_handle_over_high(gfp_t gfp_mask)
 {
 	unsigned long penalty_jiffies;
 	unsigned long pflags;
@@ -2519,7 +2519,7 @@ void mem_cgroup_handle_over_high(void)
 	 */
 	nr_reclaimed = reclaim_high(memcg,
 				    in_retry ? SWAP_CLUSTER_MAX : nr_pages,
-				    GFP_KERNEL);
+				    gfp_mask);
 
 	/*
 	 * memory.high is breached and reclaim is unable to keep up. Throttle
@@ -2755,7 +2755,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
 	    !(current->flags & PF_MEMALLOC) &&
 	    gfpflags_allow_blocking(gfp_mask)) {
-		mem_cgroup_handle_over_high();
+		mem_cgroup_handle_over_high(gfp_mask);
 	}
 	return 0;
 }
-- 
2.37.2



* Re: [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks
  2022-09-29 21:54 [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks Dave Chinner
@ 2022-09-29 22:20 ` Dave Chinner
  2022-09-30 12:18   ` Michal Koutný
  2022-09-30  0:33 ` Dave Chinner
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2022-09-29 22:20 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, linux-xfs

On Fri, Sep 30, 2022 at 07:54:40AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This should be more obvious, but gfpflags_allow_blocking() is not
> the same thing as a GFP_KERNEL reclaim context. The former checks
> __GFP_DIRECT_RECLAIM, which only tells us whether direct reclaim is
> allowed at all. The latter (GFP_KERNEL) allows reclaim to block on
> anything, including filesystem and IO structures.
> 
> However, filesystems do lots of memory allocation, including page
> cache folio allocation, while under GFP_NOFS constraints. Hence if
> direct reclaim in a GFP_NOFS context waits on filesystem progress
> (e.g. waits on folio writeback) then memory reclaim can deadlock.
> 
> e.g. a page cache allocation (which is a GFP_NOFS context) gets stuck
> waiting on folio writeback like so:
> 
> [   75.943494] task:test_write      state:D stack:12560 pid: 3728 ppid:  3613 flags:0x00004002
> [   75.944788] Call Trace:
> [   75.945183]  <TASK>
> [   75.945543]  __schedule+0x2f9/0xa30
> [   75.946118]  ? __mod_memcg_lruvec_state+0x41/0x90
> [   75.946895]  schedule+0x5a/0xc0
> [   75.947397]  io_schedule+0x42/0x70
> [   75.947992]  folio_wait_bit_common+0x159/0x3d0
> [   75.948732]  ? dio_warn_stale_pagecache.part.0+0x50/0x50
> [   75.949505]  folio_wait_writeback+0x28/0x80
> [   75.950163]  shrink_page_list+0x96e/0xc30
> [   75.950843]  shrink_lruvec+0x558/0xb80
> [   75.951440]  shrink_node+0x2c6/0x700
> [   75.952059]  do_try_to_free_pages+0xd5/0x570
> [   75.952771]  try_to_free_mem_cgroup_pages+0x105/0x220
> [   75.953548]  reclaim_high.constprop.0+0xa3/0xf0
> [   75.954209]  mem_cgroup_handle_over_high+0x8f/0x280
> [   75.955025]  ? kmem_cache_alloc_lru+0x1c6/0x3f0
> [   75.955781]  try_charge_memcg+0x6c3/0x820
> [   75.956436]  ? __mem_cgroup_threshold+0x16/0x150
> [   75.957204]  charge_memcg+0x76/0xf0
> [   75.957810]  __mem_cgroup_charge+0x29/0x80
> [   75.958464]  __filemap_add_folio+0x225/0x590
> [   75.959112]  ? scan_shadow_nodes+0x30/0x30
> [   75.959794]  filemap_add_folio+0x37/0xa0
> [   75.960432]  __filemap_get_folio+0x1fd/0x340
> [   75.961141]  ? xas_load+0x5/0xa0
> [   75.961712]  iomap_write_begin+0x103/0x6a0
> [   75.962390]  ? filemap_dirty_folio+0x5c/0x80
> [   75.963106]  ? iomap_write_end+0xa2/0x2b0
> [   75.963744]  iomap_file_buffered_write+0x17c/0x380
> [   75.964546]  xfs_file_buffered_write+0xb1/0x2e0
> [   75.965286]  ? xfs_file_buffered_write+0x2b2/0x2e0
> [   75.966097]  vfs_write+0x2ca/0x3d0
> [   75.966702]  __x64_sys_pwrite64+0x8c/0xc0
> [   75.967349]  do_syscall_64+0x35/0x80
> 
> At this point, the system has 58 pending XFS IO completions that are
> stuck waiting for workqueue progress:
> 
> [ 1664.460579] workqueue xfs-conv/dm-0: flags=0x4c
> [ 1664.461332]   pwq 48: cpus=24 node=3 flags=0x0 nice=0 active=58/256 refcnt=59
> [ 1664.461335]     pending: xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io
> 
> and nothing is making progress. Exactly why progress is not being
> made is not clear from what I can gather from the steaming corpse,
> but it is clear that the memcg reclaim code should not be blocking
> on filesystem-related objects in GFP_NOFS allocation contexts.
> 
> We have the caller's reclaim context (the gfp mask) right there when
> we call mem_cgroup_handle_over_high(), so pass it down the stack so
> that memcg reclaim respects the caller's constraints. This makes the
> reclaim deadlocks in the test I've been running go away.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

FWIW:

Fixes: b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling")

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks
  2022-09-29 21:54 [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks Dave Chinner
  2022-09-29 22:20 ` Dave Chinner
@ 2022-09-30  0:33 ` Dave Chinner
  1 sibling, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2022-09-30  0:33 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, linux-xfs

[oops, cc should have been linux-mm@kvack.org]

On Fri, Sep 30, 2022 at 07:54:40AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This should be more obvious, but gfpflags_allow_blocking() is not
> the same thing as a GFP_KERNEL reclaim context. The former checks
> __GFP_DIRECT_RECLAIM, which only tells us whether direct reclaim is
> allowed at all. The latter (GFP_KERNEL) allows reclaim to block on
> anything, including filesystem and IO structures.
> 
> However, filesystems do lots of memory allocation, including page
> cache folio allocation, while under GFP_NOFS constraints. Hence if
> direct reclaim in a GFP_NOFS context waits on filesystem progress
> (e.g. waits on folio writeback) then memory reclaim can deadlock.
> 
> e.g. a page cache allocation (which is a GFP_NOFS context) gets stuck
> waiting on folio writeback like so:
> 
> [   75.943494] task:test_write      state:D stack:12560 pid: 3728 ppid:  3613 flags:0x00004002
> [   75.944788] Call Trace:
> [   75.945183]  <TASK>
> [   75.945543]  __schedule+0x2f9/0xa30
> [   75.946118]  ? __mod_memcg_lruvec_state+0x41/0x90
> [   75.946895]  schedule+0x5a/0xc0
> [   75.947397]  io_schedule+0x42/0x70
> [   75.947992]  folio_wait_bit_common+0x159/0x3d0
> [   75.948732]  ? dio_warn_stale_pagecache.part.0+0x50/0x50
> [   75.949505]  folio_wait_writeback+0x28/0x80
> [   75.950163]  shrink_page_list+0x96e/0xc30
> [   75.950843]  shrink_lruvec+0x558/0xb80
> [   75.951440]  shrink_node+0x2c6/0x700
> [   75.952059]  do_try_to_free_pages+0xd5/0x570
> [   75.952771]  try_to_free_mem_cgroup_pages+0x105/0x220
> [   75.953548]  reclaim_high.constprop.0+0xa3/0xf0
> [   75.954209]  mem_cgroup_handle_over_high+0x8f/0x280
> [   75.955025]  ? kmem_cache_alloc_lru+0x1c6/0x3f0
> [   75.955781]  try_charge_memcg+0x6c3/0x820
> [   75.956436]  ? __mem_cgroup_threshold+0x16/0x150
> [   75.957204]  charge_memcg+0x76/0xf0
> [   75.957810]  __mem_cgroup_charge+0x29/0x80
> [   75.958464]  __filemap_add_folio+0x225/0x590
> [   75.959112]  ? scan_shadow_nodes+0x30/0x30
> [   75.959794]  filemap_add_folio+0x37/0xa0
> [   75.960432]  __filemap_get_folio+0x1fd/0x340
> [   75.961141]  ? xas_load+0x5/0xa0
> [   75.961712]  iomap_write_begin+0x103/0x6a0
> [   75.962390]  ? filemap_dirty_folio+0x5c/0x80
> [   75.963106]  ? iomap_write_end+0xa2/0x2b0
> [   75.963744]  iomap_file_buffered_write+0x17c/0x380
> [   75.964546]  xfs_file_buffered_write+0xb1/0x2e0
> [   75.965286]  ? xfs_file_buffered_write+0x2b2/0x2e0
> [   75.966097]  vfs_write+0x2ca/0x3d0
> [   75.966702]  __x64_sys_pwrite64+0x8c/0xc0
> [   75.967349]  do_syscall_64+0x35/0x80
> 
> At this point, the system has 58 pending XFS IO completions that are
> stuck waiting for workqueue progress:
> 
> [ 1664.460579] workqueue xfs-conv/dm-0: flags=0x4c
> [ 1664.461332]   pwq 48: cpus=24 node=3 flags=0x0 nice=0 active=58/256 refcnt=59
> [ 1664.461335]     pending: xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io, xfs_end_io
> 
> and nothing is making progress. Exactly why progress is not being
> made is not clear from what I can gather from the steaming corpse,
> but it is clear that the memcg reclaim code should not be blocking
> on filesystem-related objects in GFP_NOFS allocation contexts.
> 
> We have the caller's reclaim context (the gfp mask) right there when
> we call mem_cgroup_handle_over_high(), so pass it down the stack so
> that memcg reclaim respects the caller's constraints. This makes the
> reclaim deadlocks in the test I've been running go away.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  include/linux/memcontrol.h       | 4 ++--
>  include/linux/resume_user_mode.h | 2 +-
>  mm/memcontrol.c                  | 6 +++---
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 6257867fbf95..575bb8cfc810 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -919,7 +919,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
>  	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
>  }
>  
> -void mem_cgroup_handle_over_high(void);
> +void mem_cgroup_handle_over_high(gfp_t gfp_mask);
>  
>  unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);
>  
> @@ -1433,7 +1433,7 @@ static inline void folio_memcg_unlock(struct folio *folio)
>  {
>  }
>  
> -static inline void mem_cgroup_handle_over_high(void)
> +static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask)
>  {
>  }
>  
> diff --git a/include/linux/resume_user_mode.h b/include/linux/resume_user_mode.h
> index 285189454449..f8f3e958e9cf 100644
> --- a/include/linux/resume_user_mode.h
> +++ b/include/linux/resume_user_mode.h
> @@ -55,7 +55,7 @@ static inline void resume_user_mode_work(struct pt_regs *regs)
>  	}
>  #endif
>  
> -	mem_cgroup_handle_over_high();
> +	mem_cgroup_handle_over_high(GFP_KERNEL);
>  	blkcg_maybe_throttle_current();
>  
>  	rseq_handle_notify_resume(NULL, regs);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b69979c9ced5..09fbebff9796 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2491,7 +2491,7 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
>   * Scheduled by try_charge() to be executed from the userland return path
>   * and reclaims memory over the high limit.
>   */
> -void mem_cgroup_handle_over_high(void)
> +void mem_cgroup_handle_over_high(gfp_t gfp_mask)
>  {
>  	unsigned long penalty_jiffies;
>  	unsigned long pflags;
> @@ -2519,7 +2519,7 @@ void mem_cgroup_handle_over_high(void)
>  	 */
>  	nr_reclaimed = reclaim_high(memcg,
>  				    in_retry ? SWAP_CLUSTER_MAX : nr_pages,
> -				    GFP_KERNEL);
> +				    gfp_mask);
>  
>  	/*
>  	 * memory.high is breached and reclaim is unable to keep up. Throttle
> @@ -2755,7 +2755,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
>  	    !(current->flags & PF_MEMALLOC) &&
>  	    gfpflags_allow_blocking(gfp_mask)) {
> -		mem_cgroup_handle_over_high();
> +		mem_cgroup_handle_over_high(gfp_mask);
>  	}
>  	return 0;
>  }
> -- 
> 2.37.2
> 
> 

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks
  2022-09-29 22:20 ` Dave Chinner
@ 2022-09-30 12:18   ` Michal Koutný
  2022-09-30 22:08     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Koutný @ 2022-09-30 12:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: cgroups, linux-mm, linux-xfs

On Fri, Sep 30, 2022 at 08:20:06AM +1000, Dave Chinner <david@fromorbit.com> wrote:
> Fixes: b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling")

Perhaps you meant this instead?

Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")

Thanks,
Michal


* Re: [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks
  2022-09-30 12:18   ` Michal Koutný
@ 2022-09-30 22:08     ` Dave Chinner
  2022-10-03 15:08       ` Michal Koutný
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2022-09-30 22:08 UTC (permalink / raw)
  To: Michal Koutný; +Cc: cgroups, linux-mm, linux-xfs

On Fri, Sep 30, 2022 at 02:18:56PM +0200, Michal Koutný wrote:
> On Fri, Sep 30, 2022 at 08:20:06AM +1000, Dave Chinner <david@fromorbit.com> wrote:
> > Fixes: b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling")
> 
> Perhaps you meant this instead?
> 
> Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")

You might be right in that c9afe31ec443 exposed the issue, but it's
not the root cause. I think c9afe31ec443 is just a case of a new
caller of mem_cgroup_handle_over_high() stepping on the landmine left
by b3ff92916af3, which added an unconditional GFP_KERNEL direct
reclaim deep in the guts of the memcg code.

IOWs, if b3ff92916af3 did things the right way to begin with, then
c9afe31ec443 would not have caused any problems. So what's the real
root cause of the issue - the commit that stepped on the landmine,
or the commit that placed the landmine?

Either way, if anyone backports b3ff92916af3 or has a kernel with
b3ff92916af3 and not c9afe31ec443, they still need to know
about the landmine in b3ff92916af3....

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks
  2022-09-30 22:08     ` Dave Chinner
@ 2022-10-03 15:08       ` Michal Koutný
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Koutný @ 2022-10-03 15:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: cgroups, linux-mm, linux-xfs

On Sat, Oct 01, 2022 at 08:08:34AM +1000, Dave Chinner <david@fromorbit.com> wrote:
> You might be right in that c9afe31ec443 exposed the issue, but it's
> not the root cause. I think c9afe31ec443 is just a case of a new
> caller of mem_cgroup_handle_over_high() stepping on the landmine left
> by b3ff92916af3, which added an unconditional GFP_KERNEL direct
> reclaim deep in the guts of the memcg code.

It's specific to memory.high-induced reclaim that it happens outside
of the sensitive paths (as with the exit-to-usermode or workqueue
deferral), so there'd be no explicit allocation flags to pass
through, hence the unconditional GFP_KERNEL.
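
(Roughly, for illustration - a paraphrased sketch of the existing
deferral in try_charge_memcg(); by the time the reclaim actually runs
on the return-to-userspace path, the gfp mask of the allocation that
breached memory.high is long gone:)

    /* paraphrased sketch, not verbatim kernel source */
    if (page_counter_read(&memcg->memory) > READ_ONCE(memcg->memory.high)) {
            if (in_task()) {
                    /* remember the overrun and reclaim on the next
                     * return to userspace... */
                    current->memcg_nr_pages_over_high += batch;
                    set_notify_resume(current);
            } else {
                    /* ...or punt it to a workqueue entirely */
                    schedule_work(&memcg->high_work);
            }
    }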

> So what's the real root cause of the issue - the commit that stepped
> on the landmine, or the commit that placed the landmine?

My preference here is slightly towards the newer commit, but feel
free to reference both.

> Either way, if anyone backports b3ff92916af3 or has a kernel with
> b3ff92916af3 and not c9afe31ec443, they still need to know
> about the landmine in b3ff92916af3....

To be on the same page -- having just b3ff92916af3 won't lead to the
described cycle when FS code reclaims without GFP_NOFS? (IOW, what
would the fix look like without c9afe31ec443?)

Thanks,
Michal


end of thread, other threads:[~2022-10-03 15:08 UTC | newest]

Thread overview: 6+ messages
2022-09-29 21:54 [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks Dave Chinner
2022-09-29 22:20 ` Dave Chinner
2022-09-30 12:18   ` Michal Koutný
2022-09-30 22:08     ` Dave Chinner
2022-10-03 15:08       ` Michal Koutný
2022-09-30  0:33 ` Dave Chinner
