linux-kernel.vger.kernel.org archive mirror
* [PATCH] f2fs: avoid deadlock in gc thread under low memory
@ 2022-04-13  8:44 Rokudo Yan
  2022-04-13  9:46 ` Michal Hocko
  2022-04-13 17:00 ` Jaegeuk Kim
  0 siblings, 2 replies; 9+ messages in thread
From: Rokudo Yan @ 2022-04-13  8:44 UTC (permalink / raw)
  To: jaegeuk; +Cc: linux-f2fs-devel, linux-kernel, tang.ding, wu-yan

There is a potential deadlock in the gc thread that may happen
under low memory, as below:

gc_thread_func
 -f2fs_gc
  -do_garbage_collect
   -gc_data_segment
    -move_data_block
     -set_page_writeback(fio.encrypted_page);
     -f2fs_submit_page_write
As f2fs_submit_page_write tries to merge IO when possible, the
encrypted_page is marked PG_writeback but may not be submitted to the
block layer immediately. If the system is under low memory when the gc
thread tries to move the next data block, it may do direct reclaim and
enter the fs layer as below:
   -move_data_block
    -f2fs_grab_cache_page(index=?, for_write=false)
     -grab_cache_page
      -find_or_create_page
       -pagecache_get_page
        -__page_cache_alloc --  __GFP_FS is set
         -alloc_pages_node
          -__alloc_pages
           -__alloc_pages_slowpath
            -__alloc_pages_direct_reclaim
             -__perform_reclaim
              -try_to_free_pages
               -do_try_to_free_pages
                -shrink_zones
                 -mem_cgroup_soft_limit_reclaim
                  -mem_cgroup_soft_reclaim
                   -mem_cgroup_shrink_node
                    -shrink_node_memcg
                     -shrink_list
                      -shrink_inactive_list
                       -shrink_page_list
                        -wait_on_page_writeback -- the page is marked
                       writeback during previous move_data_block call

The gc thread waits for the encrypted_page writeback to complete,
but as the gc thread holds sbi->gc_lock, the writeback & sync threads
may be blocked waiting for sbi->gc_lock, so the bio containing the
encrypted_page may never be submitted to the block layer to complete
the writeback, which causes a deadlock. To avoid this deadlock
condition, we mark the gc thread with the PF_MEMALLOC_NOFS flag, so it
will never enter the fs layer when trying to allocate a cache page
during move_data_block.
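For reference, the fix relies on the kernel's scoped-allocation API from
include/linux/sched/mm.h: memalloc_nofs_save() sets PF_MEMALLOC_NOFS on the
task, and the page allocator then strips __GFP_FS from every allocation made
in that scope. A rough user-space model of the behavior (not kernel code --
the task struct and helpers here are stand-ins, though the flag values match
mainline; the real API operates on current implicitly):

```c
#include <assert.h>

#define __GFP_FS         0x80u       /* allocation may recurse into fs code */
#define PF_MEMALLOC_NOFS 0x00040000u /* task-wide "implicit GFP_NOFS" flag */

struct task { unsigned int flags; }; /* stand-in for the kernel task_struct */

/* Model of memalloc_nofs_save(): set the flag and return the old value so
 * nested scopes can be restored correctly. */
static unsigned int memalloc_nofs_save(struct task *t)
{
    unsigned int old = t->flags & PF_MEMALLOC_NOFS;
    t->flags |= PF_MEMALLOC_NOFS;
    return old;
}

static void memalloc_nofs_restore(struct task *t, unsigned int old)
{
    t->flags = (t->flags & ~PF_MEMALLOC_NOFS) | old;
}

/* Model of current_gfp_context(): inside a NOFS scope the effective mask
 * loses __GFP_FS, so direct reclaim cannot re-enter the filesystem. */
static unsigned int current_gfp_context(const struct task *t, unsigned int gfp)
{
    return (t->flags & PF_MEMALLOC_NOFS) ? (gfp & ~__GFP_FS) : gfp;
}
```

With the gc thread inside such a scope, the f2fs_grab_cache_page() allocation
in move_data_block() can no longer reach the wait_on_page_writeback() path
shown above.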

Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
---
 fs/f2fs/gc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e020804f7b07..cc71f77b98c8 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
 
 	wait_ms = gc_th->min_sleep_time;
 
+	/*
+	 * Make sure that no allocations from gc thread will ever
+	 * recurse to the fs layer to avoid deadlock as it will
+	 * hold sbi->gc_lock during garbage collection
+	 */
+	memalloc_nofs_save();
 	set_freezable();
 	do {
 		bool sync_mode, foreground = false;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-13  8:44 [PATCH] f2fs: avoid deadlock in gc thread under low memory Rokudo Yan
@ 2022-04-13  9:46 ` Michal Hocko
  2022-04-13 11:20   ` Wu Yan
  2022-04-13 17:00 ` Jaegeuk Kim
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2022-04-13  9:46 UTC (permalink / raw)
  To: Rokudo Yan; +Cc: jaegeuk, linux-f2fs-devel, linux-kernel, tang.ding

On Wed 13-04-22 16:44:32, Rokudo Yan wrote:
> There is a potential deadlock in the gc thread that may happen
> under low memory, as below:
> 
> gc_thread_func
>  -f2fs_gc
>   -do_garbage_collect
>    -gc_data_segment
>     -move_data_block
>      -set_page_writeback(fio.encrypted_page);
>      -f2fs_submit_page_write
> As f2fs_submit_page_write tries to merge IO when possible, the
> encrypted_page is marked PG_writeback but may not be submitted to the
> block layer immediately. If the system is under low memory when the gc
> thread tries to move the next data block, it may do direct reclaim and
> enter the fs layer as below:
>    -move_data_block
>     -f2fs_grab_cache_page(index=?, for_write=false)
>      -grab_cache_page
>       -find_or_create_page
>        -pagecache_get_page
>         -__page_cache_alloc --  __GFP_FS is set
>          -alloc_pages_node
>           -__alloc_pages
>            -__alloc_pages_slowpath
>             -__alloc_pages_direct_reclaim
>              -__perform_reclaim
>               -try_to_free_pages
>                -do_try_to_free_pages
>                 -shrink_zones
>                  -mem_cgroup_soft_limit_reclaim
>                   -mem_cgroup_soft_reclaim
>                    -mem_cgroup_shrink_node
>                     -shrink_node_memcg
>                      -shrink_list
>                       -shrink_inactive_list
>                        -shrink_page_list
>                         -wait_on_page_writeback -- the page is marked
>                        writeback during previous move_data_block call

This is a memcg reclaim path and you would have to have __GFP_ACCOUNT in
the gfp mask to hit it from the page allocator. I am not really familiar
with f2fs but I doubt it is using this flag.

On the other hand the memory is charged to a memcg when the newly
allocated page is added to the page cache. That wouldn't trigger the
soft reclaim path but that is not really necessary because even the
regular memcg reclaim would trigger wait_on_page_writeback for cgroup
v1.

Also, are you sure that the mapping's gfp mask has __GFP_FS set for this
allocation? f2fs_iget uses a GFP_NOFS-like mask for some inode types.

All that being said, you will need to change the above call chain, but it
would be worth double-checking that the deadlock is real.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-13  9:46 ` Michal Hocko
@ 2022-04-13 11:20   ` Wu Yan
  2022-04-13 11:35     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Wu Yan @ 2022-04-13 11:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: jaegeuk, linux-f2fs-devel, linux-kernel, tang.ding

On 4/13/22 17:46, Michal Hocko wrote:
> On Wed 13-04-22 16:44:32, Rokudo Yan wrote:
>> There is a potential deadlock in the gc thread that may happen
>> under low memory, as below:
>>
>> gc_thread_func
>>   -f2fs_gc
>>    -do_garbage_collect
>>     -gc_data_segment
>>      -move_data_block
>>       -set_page_writeback(fio.encrypted_page);
>>       -f2fs_submit_page_write
>> As f2fs_submit_page_write tries to merge IO when possible, the
>> encrypted_page is marked PG_writeback but may not be submitted to the
>> block layer immediately. If the system is under low memory when the gc
>> thread tries to move the next data block, it may do direct reclaim and
>> enter the fs layer as below:
>>     -move_data_block
>>      -f2fs_grab_cache_page(index=?, for_write=false)
>>       -grab_cache_page
>>        -find_or_create_page
>>         -pagecache_get_page
>>          -__page_cache_alloc --  __GFP_FS is set
>>           -alloc_pages_node
>>            -__alloc_pages
>>             -__alloc_pages_slowpath
>>              -__alloc_pages_direct_reclaim
>>               -__perform_reclaim
>>                -try_to_free_pages
>>                 -do_try_to_free_pages
>>                  -shrink_zones
>>                   -mem_cgroup_soft_limit_reclaim
>>                    -mem_cgroup_soft_reclaim
>>                     -mem_cgroup_shrink_node
>>                      -shrink_node_memcg
>>                       -shrink_list
>>                        -shrink_inactive_list
>>                         -shrink_page_list
>>                          -wait_on_page_writeback -- the page is marked
>>                         writeback during previous move_data_block call
> 
> This is a memcg reclaim path and you would have to have __GFP_ACCOUNT in
> the gfp mask to hit it from the page allocator. I am not really familiar
> with f2fs but I doubt it is using this flag.
> 
> On the other hand the memory is charged to a memcg when the newly
> allocated page is added to the page cache. That wouldn't trigger the
> soft reclaim path but that is not really necessary because even the
> regular memcg reclaim would trigger wait_on_page_writeback for cgroup
> v1.
> 
> Also are you sure that the mapping's gfp mask has __GFP_FS set for this
> allocation? f2fs_iget uses GFP_NOFS like mask for some inode types.
> 
> All that being said, you will need to change the above call chain but it
> would be worth double checking the dead lock is real.

Hi, Michal

1. The issue occurred during monkey testing on an Android device with
4GB RAM + 3GB zram, and memory cgroup v1 enabled.

2. A full memory dump was captured when the issue occurred, and the
deadlock has been confirmed from the dump. We can see that
mapping->gfp_mask is 0x14200ca, so both __GFP_ACCOUNT(0x1000000) and
__GFP_FS(0x80) are set:

crash-arm64> struct inode.i_mapping 0xFFFFFFDFD578EEA0
   i_mapping = 0xffffffdfd578f028,
crash-arm64> struct address_space.host,gfp_mask -x 0xffffffdfd578f028
   host = 0xffffffdfd578eea0,
   gfp_mask = 0x14200ca,
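For what it's worth, the dumped mask can be checked mechanically. A small
sketch using only the two flag values quoted in this thread (illustrative
helper, not a kernel function):

```c
#include <assert.h>

#define __GFP_FS      0x80u       /* value quoted in this thread */
#define __GFP_ACCOUNT 0x1000000u  /* value quoted in this thread */

/* Direct reclaim may only re-enter filesystem code (and e.g. wait on a
 * PG_writeback page) when the allocation carries __GFP_FS. */
static int gfp_allows_fs_recursion(unsigned int gfp_mask)
{
    return (gfp_mask & __GFP_FS) != 0;
}
```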

Thanks
yanwu


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-13 11:20   ` Wu Yan
@ 2022-04-13 11:35     ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2022-04-13 11:35 UTC (permalink / raw)
  To: Wu Yan; +Cc: jaegeuk, linux-f2fs-devel, linux-kernel, tang.ding

On Wed 13-04-22 19:20:06, Wu Yan wrote:
> On 4/13/22 17:46, Michal Hocko wrote:
> > On Wed 13-04-22 16:44:32, Rokudo Yan wrote:
> > > There is a potential deadlock in the gc thread that may happen
> > > under low memory, as below:
> > > 
> > > gc_thread_func
> > >   -f2fs_gc
> > >    -do_garbage_collect
> > >     -gc_data_segment
> > >      -move_data_block
> > >       -set_page_writeback(fio.encrypted_page);
> > >       -f2fs_submit_page_write
> > > As f2fs_submit_page_write tries to merge IO when possible, the
> > > encrypted_page is marked PG_writeback but may not be submitted to the
> > > block layer immediately. If the system is under low memory when the gc
> > > thread tries to move the next data block, it may do direct reclaim and
> > > enter the fs layer as below:
> > >     -move_data_block
> > >      -f2fs_grab_cache_page(index=?, for_write=false)
> > >       -grab_cache_page
> > >        -find_or_create_page
> > >         -pagecache_get_page
> > >          -__page_cache_alloc --  __GFP_FS is set
> > >           -alloc_pages_node
> > >            -__alloc_pages
> > >             -__alloc_pages_slowpath
> > >              -__alloc_pages_direct_reclaim
> > >               -__perform_reclaim
> > >                -try_to_free_pages
> > >                 -do_try_to_free_pages
> > >                  -shrink_zones
> > >                   -mem_cgroup_soft_limit_reclaim
> > >                    -mem_cgroup_soft_reclaim
> > >                     -mem_cgroup_shrink_node
> > >                      -shrink_node_memcg
> > >                       -shrink_list
> > >                        -shrink_inactive_list
> > >                         -shrink_page_list
> > >                          -wait_on_page_writeback -- the page is marked
> > >                         writeback during previous move_data_block call
> > 
> > This is a memcg reclaim path and you would have to have __GFP_ACCOUNT in
> > the gfp mask to hit it from the page allocator. I am not really familiar
> > with f2fs but I doubt it is using this flag.
> > 
> > On the other hand the memory is charged to a memcg when the newly
> > allocated page is added to the page cache. That wouldn't trigger the
> > soft reclaim path but that is not really necessary because even the
> > regular memcg reclaim would trigger wait_on_page_writeback for cgroup
> > v1.
> > 
> > Also are you sure that the mapping's gfp mask has __GFP_FS set for this
> > allocation? f2fs_iget uses GFP_NOFS like mask for some inode types.
> > 
> > All that being said, you will need to change the above call chain but it
> > would be worth double checking the dead lock is real.
> 
> Hi, Michal
> 
> 1. The issue occurred during monkey testing on an Android device with
> 4GB RAM + 3GB zram, and memory cgroup v1 enabled.
> 
> 2. A full memory dump was captured when the issue occurred, and the
> deadlock has been confirmed from the dump. We can see that
> mapping->gfp_mask is 0x14200ca, so both __GFP_ACCOUNT(0x1000000) and
> __GFP_FS(0x80) are set

This is rather surprising, I have to say because page cache is charged
explicitly (__filemap_add_folio). Are you testing with the upstream
kernel or could this be a non-upstream change possibly?

> crash-arm64> struct inode.i_mapping 0xFFFFFFDFD578EEA0
>   i_mapping = 0xffffffdfd578f028,
> crash-arm64> struct address_space.host,gfp_mask -x 0xffffffdfd578f028
>   host = 0xffffffdfd578eea0,
>   gfp_mask = 0x14200ca,

Anyway, if the __GFP_FS is set then the deadlock is possible even
without __GFP_ACCOUNT.
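Michal's point -- that __GFP_FS alone opens the window -- can be sketched as a
simplified model of the shrink_page_list() decision (hypothetical helper; the
real gating in mm/vmscan.c also involves may_enter_fs, swapcache and
sane_reclaim() details omitted here):

```c
#include <assert.h>
#include <stdbool.h>

#define __GFP_FS 0x80u

struct page_model { bool writeback; };   /* is PG_writeback set? */
struct scan_control_model {
    unsigned int gfp_mask;
    bool cgroup_v1;                      /* legacy memcg hierarchy */
};

/* Under cgroup v1 reclaim, a page under writeback makes reclaim block in
 * wait_on_page_writeback() whenever the allocation allowed fs recursion;
 * __GFP_ACCOUNT is not required for this, only __GFP_FS. */
static bool reclaim_may_wait_on_writeback(const struct page_model *p,
                                          const struct scan_control_model *sc)
{
    return p->writeback && sc->cgroup_v1 && (sc->gfp_mask & __GFP_FS);
}
```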
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-13  8:44 [PATCH] f2fs: avoid deadlock in gc thread under low memory Rokudo Yan
  2022-04-13  9:46 ` Michal Hocko
@ 2022-04-13 17:00 ` Jaegeuk Kim
  2022-04-14  1:54   ` Wu Yan
  1 sibling, 1 reply; 9+ messages in thread
From: Jaegeuk Kim @ 2022-04-13 17:00 UTC (permalink / raw)
  To: Rokudo Yan; +Cc: linux-f2fs-devel, linux-kernel, tang.ding

On 04/13, Rokudo Yan wrote:
> There is a potential deadlock in the gc thread that may happen
> under low memory, as below:
> 
> gc_thread_func
>  -f2fs_gc
>   -do_garbage_collect
>    -gc_data_segment
>     -move_data_block
>      -set_page_writeback(fio.encrypted_page);
>      -f2fs_submit_page_write
> As f2fs_submit_page_write tries to merge IO when possible, the
> encrypted_page is marked PG_writeback but may not be submitted to the
> block layer immediately. If the system is under low memory when the gc
> thread tries to move the next data block, it may do direct reclaim and
> enter the fs layer as below:
>    -move_data_block
>     -f2fs_grab_cache_page(index=?, for_write=false)
>      -grab_cache_page
>       -find_or_create_page
>        -pagecache_get_page
>         -__page_cache_alloc --  __GFP_FS is set
>          -alloc_pages_node
>           -__alloc_pages
>            -__alloc_pages_slowpath
>             -__alloc_pages_direct_reclaim
>              -__perform_reclaim
>               -try_to_free_pages
>                -do_try_to_free_pages
>                 -shrink_zones
>                  -mem_cgroup_soft_limit_reclaim
>                   -mem_cgroup_soft_reclaim
>                    -mem_cgroup_shrink_node
>                     -shrink_node_memcg
>                      -shrink_list
>                       -shrink_inactive_list
>                        -shrink_page_list
>                         -wait_on_page_writeback -- the page is marked
>                        writeback during previous move_data_block call
> 
> The gc thread waits for the encrypted_page writeback to complete,
> but as the gc thread holds sbi->gc_lock, the writeback & sync threads
> may be blocked waiting for sbi->gc_lock, so the bio containing the
> encrypted_page may never be submitted to the block layer to complete
> the writeback, which causes a deadlock. To avoid this deadlock
> condition, we mark the gc thread with the PF_MEMALLOC_NOFS flag, so it
> will never enter the fs layer when trying to allocate a cache page
> during move_data_block.
> 
> Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
> ---
>  fs/f2fs/gc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index e020804f7b07..cc71f77b98c8 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
>  
>  	wait_ms = gc_th->min_sleep_time;
>  
> +	/*
> +	 * Make sure that no allocations from gc thread will ever
> +	 * recurse to the fs layer to avoid deadlock as it will
> +	 * hold sbi->gc_lock during garbage collection
> +	 */
> +	memalloc_nofs_save();

I think this cannot cover all the f2fs_gc() call cases. Can we just avoid it by:

--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
                                CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;

        /* do not read out */
-       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
+       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
        if (!page)
                return -ENOMEM;
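For context on why for_write=true sidesteps the recursion: in kernels of this
era, f2fs_grab_cache_page() with for_write set goes through
grab_cache_page_write_begin(..., AOP_FLAG_NOFS), which drops __GFP_FS from the
mapping's gfp mask before allocating, while for_write=false uses the mapping's
mask unmodified. A hedged sketch of just that effect (only the __GFP_FS
handling is modeled, everything else is elided):

```c
#include <assert.h>

#define __GFP_FS 0x80u

/* Sketch of the gfp mask f2fs_grab_cache_page() ends up allocating with:
 * the for_write path is constrained to NOFS, the read path is not. */
static unsigned int f2fs_grab_cache_page_gfp(unsigned int mapping_gfp,
                                             int for_write)
{
    return for_write ? (mapping_gfp & ~__GFP_FS) : mapping_gfp;
}
```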

Thanks,

>  	set_freezable();
>  	do {
>  		bool sync_mode, foreground = false;
> -- 
> 2.25.1


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-13 17:00 ` Jaegeuk Kim
@ 2022-04-14  1:54   ` Wu Yan
  2022-04-14  2:18     ` Jaegeuk Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Wu Yan @ 2022-04-14  1:54 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel, tang.ding

On 4/14/22 01:00, Jaegeuk Kim wrote:
> On 04/13, Rokudo Yan wrote:
>> There is a potential deadlock in the gc thread that may happen
>> under low memory, as below:
>>
>> gc_thread_func
>>   -f2fs_gc
>>    -do_garbage_collect
>>     -gc_data_segment
>>      -move_data_block
>>       -set_page_writeback(fio.encrypted_page);
>>       -f2fs_submit_page_write
>> As f2fs_submit_page_write tries to merge IO when possible, the
>> encrypted_page is marked PG_writeback but may not be submitted to the
>> block layer immediately. If the system is under low memory when the gc
>> thread tries to move the next data block, it may do direct reclaim and
>> enter the fs layer as below:
>>     -move_data_block
>>      -f2fs_grab_cache_page(index=?, for_write=false)
>>       -grab_cache_page
>>        -find_or_create_page
>>         -pagecache_get_page
>>          -__page_cache_alloc --  __GFP_FS is set
>>           -alloc_pages_node
>>            -__alloc_pages
>>             -__alloc_pages_slowpath
>>              -__alloc_pages_direct_reclaim
>>               -__perform_reclaim
>>                -try_to_free_pages
>>                 -do_try_to_free_pages
>>                  -shrink_zones
>>                   -mem_cgroup_soft_limit_reclaim
>>                    -mem_cgroup_soft_reclaim
>>                     -mem_cgroup_shrink_node
>>                      -shrink_node_memcg
>>                       -shrink_list
>>                        -shrink_inactive_list
>>                         -shrink_page_list
>>                          -wait_on_page_writeback -- the page is marked
>>                         writeback during previous move_data_block call
>>
>> The gc thread waits for the encrypted_page writeback to complete,
>> but as the gc thread holds sbi->gc_lock, the writeback & sync threads
>> may be blocked waiting for sbi->gc_lock, so the bio containing the
>> encrypted_page may never be submitted to the block layer to complete
>> the writeback, which causes a deadlock. To avoid this deadlock
>> condition, we mark the gc thread with the PF_MEMALLOC_NOFS flag, so it
>> will never enter the fs layer when trying to allocate a cache page
>> during move_data_block.
>>
>> Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
>> ---
>>   fs/f2fs/gc.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index e020804f7b07..cc71f77b98c8 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
>>   
>>   	wait_ms = gc_th->min_sleep_time;
>>   
>> +	/*
>> +	 * Make sure that no allocations from gc thread will ever
>> +	 * recurse to the fs layer to avoid deadlock as it will
>> +	 * hold sbi->gc_lock during garbage collection
>> +	 */
>> +	memalloc_nofs_save();
> 
> I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
> 
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
>                                  CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
> 
>          /* do not read out */
> -       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> +       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
>          if (!page)
>                  return -ENOMEM;
> 
> Thanks,
> 
>>   	set_freezable();
>>   	do {
>>   		bool sync_mode, foreground = false;
>> -- 
>> 2.25.1

Hi, Jaegeuk

I'm not sure if any other case may trigger the issue, but the stack 
traces I have caught so far are all the same as below:

f2fs_gc-253:12  D 226966.808196 572 302561 150976 0x1200840 0x0 572 
237207473347056
<ffffff889d88668c> __switch_to+0x134/0x150
<ffffff889e764b6c> __schedule+0xd5c/0x1100
<ffffff889e76554c> io_schedule+0x90/0xc0
<ffffff889d9fb880> wait_on_page_bit+0x194/0x208
<ffffff889da167b4> shrink_page_list+0x62c/0xe74
<ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
<ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
<ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
<ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
<ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
<ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
<ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
<ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec
<ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
<ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
<ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
<ffffff889d8de9b0> kthread+0x17c/0x18c
<ffffff889d88567c> ret_from_fork+0x10/0x18

Thanks
yanwu


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-14  1:54   ` Wu Yan
@ 2022-04-14  2:18     ` Jaegeuk Kim
  2022-04-14  2:27       ` Wu Yan
  0 siblings, 1 reply; 9+ messages in thread
From: Jaegeuk Kim @ 2022-04-14  2:18 UTC (permalink / raw)
  To: Wu Yan; +Cc: linux-f2fs-devel, linux-kernel, tang.ding

On 04/14, Wu Yan wrote:
> On 4/14/22 01:00, Jaegeuk Kim wrote:
> > On 04/13, Rokudo Yan wrote:
> > > There is a potential deadlock in the gc thread that may happen
> > > under low memory, as below:
> > > 
> > > gc_thread_func
> > >   -f2fs_gc
> > >    -do_garbage_collect
> > >     -gc_data_segment
> > >      -move_data_block
> > >       -set_page_writeback(fio.encrypted_page);
> > >       -f2fs_submit_page_write
> > > As f2fs_submit_page_write tries to merge IO when possible, the
> > > encrypted_page is marked PG_writeback but may not be submitted to the
> > > block layer immediately. If the system is under low memory when the gc
> > > thread tries to move the next data block, it may do direct reclaim and
> > > enter the fs layer as below:
> > >     -move_data_block
> > >      -f2fs_grab_cache_page(index=?, for_write=false)
> > >       -grab_cache_page
> > >        -find_or_create_page
> > >         -pagecache_get_page
> > >          -__page_cache_alloc --  __GFP_FS is set
> > >           -alloc_pages_node
> > >            -__alloc_pages
> > >             -__alloc_pages_slowpath
> > >              -__alloc_pages_direct_reclaim
> > >               -__perform_reclaim
> > >                -try_to_free_pages
> > >                 -do_try_to_free_pages
> > >                  -shrink_zones
> > >                   -mem_cgroup_soft_limit_reclaim
> > >                    -mem_cgroup_soft_reclaim
> > >                     -mem_cgroup_shrink_node
> > >                      -shrink_node_memcg
> > >                       -shrink_list
> > >                        -shrink_inactive_list
> > >                         -shrink_page_list
> > >                          -wait_on_page_writeback -- the page is marked
> > >                         writeback during previous move_data_block call
> > > 
> > > The gc thread waits for the encrypted_page writeback to complete,
> > > but as the gc thread holds sbi->gc_lock, the writeback & sync threads
> > > may be blocked waiting for sbi->gc_lock, so the bio containing the
> > > encrypted_page may never be submitted to the block layer to complete
> > > the writeback, which causes a deadlock. To avoid this deadlock
> > > condition, we mark the gc thread with the PF_MEMALLOC_NOFS flag, so it
> > > will never enter the fs layer when trying to allocate a cache page
> > > during move_data_block.
> > > 
> > > Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
> > > ---
> > >   fs/f2fs/gc.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > index e020804f7b07..cc71f77b98c8 100644
> > > --- a/fs/f2fs/gc.c
> > > +++ b/fs/f2fs/gc.c
> > > @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
> > >   	wait_ms = gc_th->min_sleep_time;
> > > +	/*
> > > +	 * Make sure that no allocations from gc thread will ever
> > > +	 * recurse to the fs layer to avoid deadlock as it will
> > > +	 * hold sbi->gc_lock during garbage collection
> > > +	 */
> > > +	memalloc_nofs_save();
> > 
> > I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
> > 
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
> >                                  CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
> > 
> >          /* do not read out */
> > -       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> > +       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
> >          if (!page)
> >                  return -ENOMEM;
> > 
> > Thanks,
> > 
> > >   	set_freezable();
> > >   	do {
> > >   		bool sync_mode, foreground = false;
> > > -- 
> > > 2.25.1
> 
> Hi, Jaegeuk
> 
> I'm not sure if any other case may trigger the issue, but the stack traces I
> have caught so far are all the same as below:
> 
> f2fs_gc-253:12  D 226966.808196 572 302561 150976 0x1200840 0x0 572
> 237207473347056
> <ffffff889d88668c> __switch_to+0x134/0x150
> <ffffff889e764b6c> __schedule+0xd5c/0x1100
> <ffffff889e76554c> io_schedule+0x90/0xc0
> <ffffff889d9fb880> wait_on_page_bit+0x194/0x208
> <ffffff889da167b4> shrink_page_list+0x62c/0xe74
> <ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
> <ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
> <ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
> <ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
> <ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
> <ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
> <ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
> <ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec

Is this deadlock trying to grab a lock, instead of waiting for writeback?
Could you share all the backtraces of the tasks?

For the writeback above, looking at the code, f2fs_gc uses three mappings:
meta, node, and data. The meta/node inodes apply a GFP_NOFS-like mask in
f2fs_iget(), while the data inode does not. So the above
f2fs_grab_cache_page() in move_data_block() is actually called w/o NOFS.
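The three-mapping split above can be sketched as follows (illustrative model,
not the kernel code: the starting mask is the one from the dump upthread, and
only the __GFP_FS bit is modeled):

```c
#include <assert.h>

#define __GFP_FS 0x80u

enum f2fs_ino_kind { META_INO, NODE_INO, DATA_INO };

/* f2fs_iget() applies a GFP_NOFS-like constraint to the meta and node
 * inode mappings, so GC's meta/node page allocations are already safe;
 * the data inode mapping keeps __GFP_FS, which is the window hit here. */
static unsigned int f2fs_mapping_gfp(enum f2fs_ino_kind kind)
{
    unsigned int gfp = 0x14200cau;   /* mask seen in the memory dump */
    if (kind == META_INO || kind == NODE_INO)
        gfp &= ~__GFP_FS;
    return gfp;
}
```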

> <ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
> <ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
> <ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
> <ffffff889d8de9b0> kthread+0x17c/0x18c
> <ffffff889d88567c> ret_from_fork+0x10/0x18
> 
> Thanks
> yanwu


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-14  2:18     ` Jaegeuk Kim
@ 2022-04-14  2:27       ` Wu Yan
  2022-04-14  2:42         ` Jaegeuk Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Wu Yan @ 2022-04-14  2:27 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel, tang.ding

On 4/14/22 10:18, Jaegeuk Kim wrote:
> On 04/14, Wu Yan wrote:
>> On 4/14/22 01:00, Jaegeuk Kim wrote:
>>> On 04/13, Rokudo Yan wrote:
>>>> There is a potential deadlock in the gc thread that may happen
>>>> under low memory, as below:
>>>>
>>>> gc_thread_func
>>>>    -f2fs_gc
>>>>     -do_garbage_collect
>>>>      -gc_data_segment
>>>>       -move_data_block
>>>>        -set_page_writeback(fio.encrypted_page);
>>>>        -f2fs_submit_page_write
>>>> As f2fs_submit_page_write tries to merge IO when possible, the
>>>> encrypted_page is marked PG_writeback but may not be submitted to the
>>>> block layer immediately. If the system is under low memory when the gc
>>>> thread tries to move the next data block, it may do direct reclaim and
>>>> enter the fs layer as below:
>>>>      -move_data_block
>>>>       -f2fs_grab_cache_page(index=?, for_write=false)
>>>>        -grab_cache_page
>>>>         -find_or_create_page
>>>>          -pagecache_get_page
>>>>           -__page_cache_alloc --  __GFP_FS is set
>>>>            -alloc_pages_node
>>>>             -__alloc_pages
>>>>              -__alloc_pages_slowpath
>>>>               -__alloc_pages_direct_reclaim
>>>>                -__perform_reclaim
>>>>                 -try_to_free_pages
>>>>                  -do_try_to_free_pages
>>>>                   -shrink_zones
>>>>                    -mem_cgroup_soft_limit_reclaim
>>>>                     -mem_cgroup_soft_reclaim
>>>>                      -mem_cgroup_shrink_node
>>>>                       -shrink_node_memcg
>>>>                        -shrink_list
>>>>                         -shrink_inactive_list
>>>>                          -shrink_page_list
>>>>                           -wait_on_page_writeback -- the page is marked
>>>>                          writeback during previous move_data_block call
>>>>
>>>> The gc thread waits for the encrypted_page writeback to complete,
>>>> but as the gc thread holds sbi->gc_lock, the writeback & sync threads
>>>> may be blocked waiting for sbi->gc_lock, so the bio containing the
>>>> encrypted_page may never be submitted to the block layer to complete
>>>> the writeback, which causes a deadlock. To avoid this deadlock
>>>> condition, we mark the gc thread with the PF_MEMALLOC_NOFS flag, so it
>>>> will never enter the fs layer when trying to allocate a cache page
>>>> during move_data_block.
>>>>
>>>> Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
>>>> ---
>>>>    fs/f2fs/gc.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>> index e020804f7b07..cc71f77b98c8 100644
>>>> --- a/fs/f2fs/gc.c
>>>> +++ b/fs/f2fs/gc.c
>>>> @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
>>>>    	wait_ms = gc_th->min_sleep_time;
>>>> +	/*
>>>> +	 * Make sure that no allocations from gc thread will ever
>>>> +	 * recurse to the fs layer to avoid deadlock as it will
>>>> +	 * hold sbi->gc_lock during garbage collection
>>>> +	 */
>>>> +	memalloc_nofs_save();
>>>
>>> I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
>>>
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
>>>                                   CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
>>>
>>>           /* do not read out */
>>> -       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
>>> +       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
>>>           if (!page)
>>>                   return -ENOMEM;
>>>
>>> Thanks,
>>>
>>>>    	set_freezable();
>>>>    	do {
>>>>    		bool sync_mode, foreground = false;
>>>> -- 
>>>> 2.25.1
>>
>> Hi, Jaegeuk
>>
>> I'm not sure if any other case may trigger the issue, but the stack traces I
>> have caught so far are all the same as below:
>>
>> f2fs_gc-253:12  D 226966.808196 572 302561 150976 0x1200840 0x0 572
>> 237207473347056
>> <ffffff889d88668c> __switch_to+0x134/0x150
>> <ffffff889e764b6c> __schedule+0xd5c/0x1100
>> <ffffff889e76554c> io_schedule+0x90/0xc0
>> <ffffff889d9fb880> wait_on_page_bit+0x194/0x208
>> <ffffff889da167b4> shrink_page_list+0x62c/0xe74
>> <ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
>> <ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
>> <ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
>> <ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
>> <ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
>> <ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
>> <ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
>> <ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec
> 
> Is this deadlock trying to grab a lock, instead of waiting for writeback?
> Could you share all the backtraces of the tasks?
> 
> For the writeback above, looking at the code, f2fs_gc uses three mappings,
> meta, node, and data. The meta/node inodes mask in GFP_NOFS in f2fs_iget(),
> while the data inode does not. So, the above f2fs_grab_cache_page() in
> move_data_block() actually allocates w/o NOFS.
> 
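The per-mapping masking described above can be modeled in a few lines. This is a hedged sketch, not the kernel's code: the flag values are illustrative, and the two `*_mapping_gfp()` helpers are hypothetical stand-ins for what f2fs_iget() sets up on each inode's mapping.

```c
#include <assert.h>

/* Model of the per-mapping masking described above: f2fs_iget() gives the
 * meta and node inode mappings a GFP_NOFS mask, while the data inode
 * mapping keeps GFP_KERNEL, so its page-cache allocations may enter fs
 * reclaim. Flag values are illustrative, not the kernel's bit layout. */
#define __GFP_IO   0x1u
#define __GFP_FS   0x2u
#define GFP_KERNEL (__GFP_IO | __GFP_FS)
#define GFP_NOFS   (GFP_KERNEL & ~__GFP_FS)

/* Hypothetical helpers standing in for the mapping gfp masks set in
 * f2fs_iget() */
static unsigned int meta_or_node_mapping_gfp(void) { return GFP_NOFS; }
static unsigned int data_mapping_gfp(void)         { return GFP_KERNEL; }

/* pagecache_get_page() allocates with the mapping's mask, so only a
 * data-mapping allocation can end up in direct reclaim with __GFP_FS set */
static int alloc_may_enter_fs(unsigned int gfp)
{
	return (gfp & __GFP_FS) != 0;
}
```

Under this model, only the data-inode path can recurse into fs reclaim, which matches the trace below, where the stall happens while grabbing a data inode page.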
>> <ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
>> <ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
>> <ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
>> <ffffff889d8de9b0> kthread+0x17c/0x18c
>> <ffffff889d88567c> ret_from_fork+0x10/0x18
>>
>> Thanks
>> yanwu

Hi, Jaegeuk

The gc thread is blocked in wait_on_page_writeback() on an encrypted page
submitted earlier, while it tries to grab a data inode page. The parsed
stack trace is below:

ppid=572 pid=572 D cpu=1 prio=120 wait=378s f2fs_gc-253:12
    Native callstack:
	vmlinux  wait_on_page_bit_common(page=0xFFFFFFBF7D2CD700, state=2, lock=false) + 304	<mm/filemap.c:1035>
	vmlinux  wait_on_page_bit(page=0xFFFFFFBF7D2CD700, bit_nr=15) + 400	<mm/filemap.c:1074>
	vmlinux  wait_on_page_writeback(page=0xFFFFFFBF7D2CD700) + 36	<include/linux/pagemap.h:557>
	vmlinux  shrink_page_list(page_list=0xFFFFFF8011E83418, pgdat=contig_page_data, sc=0xFFFFFF8011E835B8, ttu_flags=0, stat=0xFFFFFF8011E833F0, force_reclaim=false) + 1576	<mm/vmscan.c:1171>
	vmlinux  shrink_inactive_list(lruvec=0xFFFFFFE003C304C0, sc=0xFFFFFF8011E835B8, lru=LRU_INACTIVE_FILE) + 700	<mm/vmscan.c:1966>
	vmlinux  shrink_list(lru=LRU_INACTIVE_FILE, lruvec=0xFFFFFF8011E834B8, sc=0xFFFFFF8011E835B8) + 128	<mm/vmscan.c:2350>
	vmlinux  shrink_node_memcg(pgdat=contig_page_data, memcg=0xFFFFFFE003C1A300, sc=0xFFFFFF8011E835B8, lru_pages=0xFFFFFF8011E835B0) + 984	<mm/vmscan.c:2726>
	vmlinux  mem_cgroup_shrink_node(memcg=0xFFFFFFE003C1A300, gfp_mask=21102794, noswap=false, pgdat=contig_page_data, nr_scanned=0xFFFFFF8011E836A0) + 320	<mm/vmscan.c:3416>
	vmlinux  mem_cgroup_soft_reclaim(root_memcg=0xFFFFFFE003C1A300, pgdat=contig_page_data) + 164	<mm/memcontrol.c:1643>
	vmlinux  mem_cgroup_soft_limit_reclaim(pgdat=contig_page_data, order=0, gfp_mask=21102794, total_scanned=0xFFFFFF8011E83720) + 388	<mm/memcontrol.c:2913>
	vmlinux  shrink_zones(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 352	<mm/vmscan.c:3094>
	vmlinux  do_try_to_free_pages(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 512	<mm/vmscan.c:3164>
	vmlinux  try_to_free_pages(zonelist=contig_page_data + 14784, order=0, gfp_mask=21102794, nodemask=0) + 856	<mm/vmscan.c:3370>
	vmlinux  __perform_reclaim(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 60	<mm/page_alloc.c:3831>
	vmlinux  __alloc_pages_direct_reclaim(gfp_mask=300431548, order=0, alloc_flags=300431604, ac=0xFFFFFF8011E83900) + 60	<mm/page_alloc.c:3853>
	vmlinux  __alloc_pages_slowpath(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 1244	<mm/page_alloc.c:4240>
	vmlinux  __alloc_pages_nodemask() + 1952	<mm/page_alloc.c:4463>
	vmlinux  __alloc_pages(gfp_mask=21102794, order=0, preferred_nid=0) + 16	<include/linux/gfp.h:515>
	vmlinux  __alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16	<include/linux/gfp.h:528>
	vmlinux  alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16	<include/linux/gfp.h:542>
	vmlinux  __page_cache_alloc(gfp=21102794) + 16	<include/linux/pagemap.h:226>
	vmlinux  pagecache_get_page() + 384	<mm/filemap.c:1520>
	vmlinux  find_or_create_page(offset=209) + 112	<include/linux/pagemap.h:333>
	vmlinux  grab_cache_page(index=209) + 112	<include/linux/pagemap.h:399>
	vmlinux  f2fs_grab_cache_page(index=209, for_write=false) + 112	<fs/f2fs/f2fs.h:2429>
	vmlinux  move_data_block(inode=0xFFFFFFDFD578EEA0, gc_type=300432152, segno=21904, off=145) + 3584	<fs/f2fs/gc.c:1119>
	vmlinux  gc_data_segment(sbi=0xFFFFFFE007C03000, sum=0xFFFFFF8011E83B10, gc_list=0xFFFFFF8011E83AB8, segno=21904, gc_type=300432152) + 3644	<fs/f2fs/gc.c:1475>
	vmlinux  do_garbage_collect(sbi=0xFFFFFFE007C03000, start_segno=21904, gc_list=0xFFFFFF8011E83CF0, gc_type=0) + 4060	<fs/f2fs/gc.c:1592>
	vmlinux  f2fs_gc(sbi=0xFFFFFFE007C03000, background=true, segno=4294967295) + 1180	<fs/f2fs/gc.c:1684>
	vmlinux  gc_thread_func(data=0xFFFFFFE007C03000) + 572	<fs/f2fs/gc.c:118>
	vmlinux  kthread() + 376	<kernel/kthread.c:232>
	vmlinux  ret_from_fork() +

Thanks
yanwu
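For reference, the PF_MEMALLOC_NOFS mechanism the patch proposes works roughly as follows. This is a minimal userspace model of memalloc_nofs_save()/memalloc_nofs_restore() and current_gfp_context(), an assumption-laden sketch rather than the kernel's implementation; flag values and the `task_flags` variable are illustrative stand-ins for `current->flags`.

```c
#include <assert.h>

/* Minimal model of the scoped-NOFS API the patch uses. While
 * PF_MEMALLOC_NOFS is set on the task, the allocator strips __GFP_FS from
 * every request (cf. current_gfp_context()), so direct reclaim from the gc
 * thread cannot re-enter the filesystem and wait on pages the fs itself
 * must write back. Values are illustrative, not the kernel's. */
#define __GFP_FS          0x2u
#define PF_MEMALLOC_NOFS  0x1u

static unsigned int task_flags;  /* stand-in for current->flags */

/* Set the flag, returning the previous state so scopes can nest */
static unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

static void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* The allocator applies this to every gfp mask before allocating */
static unsigned int current_gfp_context(unsigned int gfp)
{
	if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}
```

Because the masking is task-wide, a single memalloc_nofs_save() at the top of gc_thread_func() covers every allocation the gc thread makes — which is also why Jaegeuk's objection stands: f2fs_gc() called from other tasks would not be covered.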


* Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
  2022-04-14  2:27       ` Wu Yan
@ 2022-04-14  2:42         ` Jaegeuk Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Jaegeuk Kim @ 2022-04-14  2:42 UTC (permalink / raw)
  To: Wu Yan; +Cc: linux-f2fs-devel, linux-kernel, tang.ding

On 04/14, Wu Yan wrote:
> On 4/14/22 10:18, Jaegeuk Kim wrote:
> > On 04/14, Wu Yan wrote:
> > > On 4/14/22 01:00, Jaegeuk Kim wrote:
> > > > On 04/13, Rokudo Yan wrote:
> > > > > There is a potential deadlock in the gc thread that may happen
> > > > > under low memory, as below:
> > > > > 
> > > > > gc_thread_func
> > > > >    -f2fs_gc
> > > > >     -do_garbage_collect
> > > > >      -gc_data_segment
> > > > >       -move_data_block
> > > > >        -set_page_writeback(fio.encrypted_page);
> > > > >        -f2fs_submit_page_write
> > > > > as f2fs_submit_page_write tries to merge io when possible, the
> > > > > encrypted_page is marked PG_writeback but may not be submitted to
> > > > > the block layer immediately. If the system enters low memory while
> > > > > the gc thread tries to move the next data block, it may do direct
> > > > > reclaim and enter the fs layer as below:
> > > > >      -move_data_block
> > > > >       -f2fs_grab_cache_page(index=?, for_write=false)
> > > > >        -grab_cache_page
> > > > >         -find_or_create_page
> > > > >          -pagecache_get_page
> > > > >           -__page_cache_alloc --  __GFP_FS is set
> > > > >            -alloc_pages_node
> > > > >             -__alloc_pages
> > > > >              -__alloc_pages_slowpath
> > > > >               -__alloc_pages_direct_reclaim
> > > > >                -__perform_reclaim
> > > > >                 -try_to_free_pages
> > > > >                  -do_try_to_free_pages
> > > > >                   -shrink_zones
> > > > >                    -mem_cgroup_soft_limit_reclaim
> > > > >                     -mem_cgroup_soft_reclaim
> > > > >                      -mem_cgroup_shrink_node
> > > > >                       -shrink_node_memcg
> > > > >                        -shrink_list
> > > > >                         -shrink_inactive_list
> > > > >                          -shrink_page_list
> > > > >                           -wait_on_page_writeback -- the page is marked
> > > > >                          writeback during previous move_data_block call
> > > > > 
> > > > > the gc thread waits for the encrypted_page writeback to complete,
> > > > > but as the gc thread holds sbi->gc_lock, the writeback & sync
> > > > > threads may block waiting for sbi->gc_lock, so the bio containing
> > > > > the encrypted_page may never be submitted to the block layer and
> > > > > the writeback never completes, which causes a deadlock. To avoid
> > > > > this deadlock condition, we mark the gc thread with the
> > > > > PF_MEMALLOC_NOFS flag, so it will never enter the fs layer when
> > > > > trying to allocate a cache page during move_data_block.
> > > > > 
> > > > > Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
> > > > > ---
> > > > >    fs/f2fs/gc.c | 6 ++++++
> > > > >    1 file changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > > > index e020804f7b07..cc71f77b98c8 100644
> > > > > --- a/fs/f2fs/gc.c
> > > > > +++ b/fs/f2fs/gc.c
> > > > > @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
> > > > >    	wait_ms = gc_th->min_sleep_time;
> > > > > +	/*
> > > > > +	 * Make sure that no allocations from gc thread will ever
> > > > > +	 * recurse to the fs layer to avoid deadlock as it will
> > > > > +	 * hold sbi->gc_lock during garbage collection
> > > > > +	 */
> > > > > +	memalloc_nofs_save();
> > > > 
> > > > I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
> > > > 
> > > > --- a/fs/f2fs/gc.c
> > > > +++ b/fs/f2fs/gc.c
> > > > @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
> > > >                                   CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
> > > > 
> > > >           /* do not read out */
> > > > -       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> > > > +       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
> > > >           if (!page)
> > > >                   return -ENOMEM;
> > > > 
> > > > Thanks,
> > > > 
> > > > >    	set_freezable();
> > > > >    	do {
> > > > >    		bool sync_mode, foreground = false;
> > > > > -- 
> > > > > 2.25.1
> > > 
> > > Hi, Jaegeuk
> > > 
> > > I'm not sure if any other case may trigger the issue, but the stack traces I
> > > have caught so far are all the same as below:
> > > 
> > > f2fs_gc-253:12  D 226966.808196 572 302561 150976 0x1200840 0x0 572
> > > 237207473347056
> > > <ffffff889d88668c> __switch_to+0x134/0x150
> > > <ffffff889e764b6c> __schedule+0xd5c/0x1100
> > > <ffffff889e76554c> io_schedule+0x90/0xc0
> > > <ffffff889d9fb880> wait_on_page_bit+0x194/0x208
> > > <ffffff889da167b4> shrink_page_list+0x62c/0xe74
> > > <ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
> > > <ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
> > > <ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
> > > <ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
> > > <ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
> > > <ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
> > > <ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
> > > <ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec
> > 
> > Is this deadlock trying to grab a lock, instead of waiting for writeback?
> > Could you share all the backtraces of the tasks?
> > 
> > For the writeback above, looking at the code, f2fs_gc uses three
> > mappings, meta, node, and data. The meta/node inodes mask in GFP_NOFS in
> > f2fs_iget(), while the data inode does not. So, the above
> > f2fs_grab_cache_page() in move_data_block() actually allocates w/o NOFS.
> > 
> > > <ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
> > > <ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
> > > <ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
> > > <ffffff889d8de9b0> kthread+0x17c/0x18c
> > > <ffffff889d88567c> ret_from_fork+0x10/0x18
> > > 
> > > Thanks
> > > yanwu
> 
> Hi, Jaegeuk
> 
> The gc thread is blocked in wait_on_page_writeback() on an encrypted page
> submitted earlier, while it tries to grab a data inode page. The parsed
> stack trace is below:
> 
> ppid=572 pid=572 D cpu=1 prio=120 wait=378s f2fs_gc-253:12
>    Native callstack:
> 	vmlinux  wait_on_page_bit_common(page=0xFFFFFFBF7D2CD700, state=2, lock=false) + 304	<mm/filemap.c:1035>
> 	vmlinux  wait_on_page_bit(page=0xFFFFFFBF7D2CD700, bit_nr=15) + 400	<mm/filemap.c:1074>
> 	vmlinux  wait_on_page_writeback(page=0xFFFFFFBF7D2CD700) + 36	<include/linux/pagemap.h:557>
> 	vmlinux  shrink_page_list(page_list=0xFFFFFF8011E83418, pgdat=contig_page_data, sc=0xFFFFFF8011E835B8, ttu_flags=0, stat=0xFFFFFF8011E833F0, force_reclaim=false) + 1576	<mm/vmscan.c:1171>
> 	vmlinux  shrink_inactive_list(lruvec=0xFFFFFFE003C304C0, sc=0xFFFFFF8011E835B8, lru=LRU_INACTIVE_FILE) + 700	<mm/vmscan.c:1966>
> 	vmlinux  shrink_list(lru=LRU_INACTIVE_FILE, lruvec=0xFFFFFF8011E834B8, sc=0xFFFFFF8011E835B8) + 128	<mm/vmscan.c:2350>
> 	vmlinux  shrink_node_memcg(pgdat=contig_page_data, memcg=0xFFFFFFE003C1A300, sc=0xFFFFFF8011E835B8, lru_pages=0xFFFFFF8011E835B0) + 984	<mm/vmscan.c:2726>
> 	vmlinux  mem_cgroup_shrink_node(memcg=0xFFFFFFE003C1A300, gfp_mask=21102794, noswap=false, pgdat=contig_page_data, nr_scanned=0xFFFFFF8011E836A0) + 320	<mm/vmscan.c:3416>
> 	vmlinux  mem_cgroup_soft_reclaim(root_memcg=0xFFFFFFE003C1A300, pgdat=contig_page_data) + 164	<mm/memcontrol.c:1643>
> 	vmlinux  mem_cgroup_soft_limit_reclaim(pgdat=contig_page_data, order=0, gfp_mask=21102794, total_scanned=0xFFFFFF8011E83720) + 388	<mm/memcontrol.c:2913>
> 	vmlinux  shrink_zones(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 352	<mm/vmscan.c:3094>
> 	vmlinux  do_try_to_free_pages(zonelist=contig_page_data + 14784, sc=0xFFFFFF8011E83790) + 512	<mm/vmscan.c:3164>
> 	vmlinux  try_to_free_pages(zonelist=contig_page_data + 14784, order=0, gfp_mask=21102794, nodemask=0) + 856	<mm/vmscan.c:3370>
> 	vmlinux  __perform_reclaim(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 60	<mm/page_alloc.c:3831>
> 	vmlinux  __alloc_pages_direct_reclaim(gfp_mask=300431548, order=0, alloc_flags=300431604, ac=0xFFFFFF8011E83900) + 60	<mm/page_alloc.c:3853>
> 	vmlinux  __alloc_pages_slowpath(gfp_mask=300431548, order=0, ac=0xFFFFFF8011E83900) + 1244	<mm/page_alloc.c:4240>
> 	vmlinux  __alloc_pages_nodemask() + 1952	<mm/page_alloc.c:4463>
> 	vmlinux  __alloc_pages(gfp_mask=21102794, order=0, preferred_nid=0) + 16	<include/linux/gfp.h:515>
> 	vmlinux  __alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16	<include/linux/gfp.h:528>
> 	vmlinux  alloc_pages_node(nid=0, gfp_mask=21102794, order=0) + 16	<include/linux/gfp.h:542>
> 	vmlinux  __page_cache_alloc(gfp=21102794) + 16	<include/linux/pagemap.h:226>
> 	vmlinux  pagecache_get_page() + 384	<mm/filemap.c:1520>
> 	vmlinux  find_or_create_page(offset=209) + 112	<include/linux/pagemap.h:333>
> 	vmlinux  grab_cache_page(index=209) + 112	<include/linux/pagemap.h:399>
> 	vmlinux  f2fs_grab_cache_page(index=209, for_write=false) + 112

Ok, I think this should be enough.

--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
                                  CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;

          /* do not read out */
-       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
+       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
          if (!page)
                  return -ENOMEM;
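A note on why this one-character change can help: with for_write=true, f2fs_grab_cache_page() takes the write-begin path, which in kernels of this vintage passes AOP_FLAG_NOFS and so clears __GFP_FS for that single allocation. The sketch below models that per-call masking; the `grab_gfp()` helper and the flag values are hypothetical simplifications, not the kernel's actual code.

```c
#include <assert.h>

/* Sketch of per-call NOFS masking, as done by grab_cache_page_write_begin()
 * with AOP_FLAG_NOFS. Flag values are illustrative, not the kernel's. */
#define __GFP_IO       0x1u
#define __GFP_FS       0x2u
#define GFP_KERNEL     (__GFP_IO | __GFP_FS)
#define AOP_FLAG_NOFS  0x1u

/* Hypothetical helper modeling the gfp mask a page-cache grab ends up
 * using: the write path requests AOP_FLAG_NOFS and drops __GFP_FS, so the
 * allocation cannot recurse into fs reclaim; the read path keeps the
 * mapping's mask unchanged. */
static unsigned int grab_gfp(unsigned int mapping_gfp, int for_write)
{
	unsigned int aop_flags = for_write ? AOP_FLAG_NOFS : 0;

	if (aop_flags & AOP_FLAG_NOFS)
		mapping_gfp &= ~__GFP_FS;
	return mapping_gfp;
}
```

Unlike the task-wide PF_MEMALLOC_NOFS approach in the original patch, this masks only the one allocation in move_data_block(), so it covers every caller of f2fs_gc(), not just the gc kthread.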

> 
>                             <fs/f2fs/f2fs.h:2429>
> 	vmlinux  move_data_block(inode=0xFFFFFFDFD578EEA0, gc_type=300432152, segno=21904, off=145) + 3584	<fs/f2fs/gc.c:1119>
> 	vmlinux  gc_data_segment(sbi=0xFFFFFFE007C03000, sum=0xFFFFFF8011E83B10, gc_list=0xFFFFFF8011E83AB8, segno=21904, gc_type=300432152) + 3644	<fs/f2fs/gc.c:1475>
> 	vmlinux  do_garbage_collect(sbi=0xFFFFFFE007C03000, start_segno=21904, gc_list=0xFFFFFF8011E83CF0, gc_type=0) + 4060	<fs/f2fs/gc.c:1592>
> 	vmlinux  f2fs_gc(sbi=0xFFFFFFE007C03000, background=true, segno=4294967295) + 1180	<fs/f2fs/gc.c:1684>
> 	vmlinux  gc_thread_func(data=0xFFFFFFE007C03000) + 572	<fs/f2fs/gc.c:118>
> 	vmlinux  kthread() + 376	<kernel/kthread.c:232>
> 	vmlinux  ret_from_fork() +
> 
> Thanks
> yanwu


end of thread, other threads:[~2022-04-14  2:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13  8:44 [PATCH] f2fs: avoid deadlock in gc thread under low memory Rokudo Yan
2022-04-13  9:46 ` Michal Hocko
2022-04-13 11:20   ` Wu Yan
2022-04-13 11:35     ` Michal Hocko
2022-04-13 17:00 ` Jaegeuk Kim
2022-04-14  1:54   ` Wu Yan
2022-04-14  2:18     ` Jaegeuk Kim
2022-04-14  2:27       ` Wu Yan
2022-04-14  2:42         ` Jaegeuk Kim
