linux-kernel.vger.kernel.org archive mirror
* [PATCH] fuse: avoid deadlock when write fuse inode
@ 2021-02-02  4:08 Huang Jianan
  2021-02-02  4:11 ` Huang Jianan
  2021-03-24 15:28 ` Miklos Szeredi
  0 siblings, 2 replies; 6+ messages in thread
From: Huang Jianan @ 2021-02-02  4:08 UTC
  To: fuse-devel; +Cc: huangjianan, guoweichao, zhangshiming, linux-kernel

We found the following deadlock in low-memory scenarios:
Thread A                         Thread B
- __writeback_single_inode
 - fuse_write_inode
  - fuse_simple_request
   - __fuse_request_send
    - request_wait_answer
                                 - fuse_dev_splice_read
                                  - fuse_copy_fill
                                   - __alloc_pages_direct_reclaim
                                    - do_shrink_slab
                                     - super_cache_scan
                                      - shrink_dentry_list
                                       - dentry_unlink_inode
                                        - iput_final
                                         - inode_wait_for_writeback

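The request and inode handled by Threads A and B are the same: Thread A
is writing back the inode and waits for the fuse daemon to answer its
request, while Thread B, the daemon thread serving that request, enters
direct reclaim and tries to evict the same inode, so it waits for Thread
A's writeback to finish. This causes a deadlock. To avoid it, we clear
the __GFP_FS flag when allocating memory in fuse_copy_fill(), so that
super_cache_scan() does not reclaim dentries and inodes during this
allocation.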

Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 588f8d1240aa..e580b9d04c25 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -721,7 +721,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 			if (cs->nr_segs >= cs->pipe->max_usage)
 				return -EIO;
 
-			page = alloc_page(GFP_HIGHUSER);
+			page = alloc_page(GFP_HIGHUSER & ~__GFP_FS);
 			if (!page)
 				return -ENOMEM;
 
-- 
2.25.1



* Re: [PATCH] fuse: avoid deadlock when write fuse inode
  2021-02-02  4:08 [PATCH] fuse: avoid deadlock when write fuse inode Huang Jianan
@ 2021-02-02  4:11 ` Huang Jianan
  2021-02-07  1:47   ` [fuse-devel] " Huang Jianan
  2021-03-24 15:28 ` Miklos Szeredi
  1 sibling, 1 reply; 6+ messages in thread
From: Huang Jianan @ 2021-02-02  4:11 UTC
  To: fuse-devel; +Cc: guoweichao, zhangshiming, linux-kernel

Hi all,

This patch works well in our product, but I am not sure it is the correct
way to solve this problem. I think inode->i_count should not drop to zero
after the iput() in dentry_unlink_inode(); if it did not, the inode would
not be evicted and eviction would not wait for writeback. But I haven't
found where an iget is missing.

Thanks,
Jianan



* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
  2021-02-02  4:11 ` Huang Jianan
@ 2021-02-07  1:47   ` Huang Jianan
  0 siblings, 0 replies; 6+ messages in thread
From: Huang Jianan @ 2021-02-07  1:47 UTC
  To: fuse-devel, miklos; +Cc: guoweichao, zhangshiming, linux-kernel, linux-fsdevel

friendly ping ... 😁



* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
  2021-02-02  4:08 [PATCH] fuse: avoid deadlock when write fuse inode Huang Jianan
  2021-02-02  4:11 ` Huang Jianan
@ 2021-03-24 15:28 ` Miklos Szeredi
  2022-03-10 11:10   ` Rokudo Yan
  1 sibling, 1 reply; 6+ messages in thread
From: Miklos Szeredi @ 2021-03-24 15:28 UTC
  To: Huang Jianan
  Cc: linux-kernel, guoweichao, zhangshiming, linux-fsdevel, linux-mm,
	Ed Tsai (蔡宗軒)

On Tue, Feb 2, 2021 at 5:41 AM Huang Jianan via fuse-devel
<fuse-devel@lists.sourceforge.net> wrote:
>
> We found the following deadlock situations in low memory scenarios:
> Thread A                         Thread B
> - __writeback_single_inode
>  - fuse_write_inode
>   - fuse_simple_request
>    - __fuse_request_send
>     - request_wait_answer
>                                  - fuse_dev_splice_read
>                                   - fuse_copy_fill
>                                    - __alloc_pages_direct_reclaim
>                                     - do_shrink_slab
>                                      - super_cache_scan
>                                       - shrink_dentry_list
>                                        - dentry_unlink_inode
>                                         - iput_final
>                                          - inode_wait_for_writeback

On what kernel are you seeing this?

I don't see how it can happen on upstream kernels, since there's a
"write_inode_now(inode, 1)" call in fuse_release() and nothing can
dirty the inode after the file has been released.

Thanks,
Miklos


* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
  2021-03-24 15:28 ` Miklos Szeredi
@ 2022-03-10 11:10   ` Rokudo Yan
  2022-04-25 13:15     ` Miklos Szeredi
  0 siblings, 1 reply; 6+ messages in thread
From: Rokudo Yan @ 2022-03-10 11:10 UTC
  To: miklos
  Cc: Ed.Tsai, guoweichao, huangjianan, linux-fsdevel, linux-kernel,
	linux-mm, zhangshiming

Hi Miklos,

A similar issue occurs on our Android device (4 GB RAM + 3 GB zram, 8 ARM cores, kernel 4.14).
Under the monkey test, kswapd and a fuse daemon thread deadlocked when free pages were extremely low
(less than half of the min watermark). The backtraces of the two threads are below. kswapd
tries to evict an inode to free memory and blocks in inode_wait_for_writeback(), while the fuse daemon
thread, which is handling the fuse inode write request, is throttled while doing direct reclaim in the
page allocation slow path (blocked in throttle_direct_reclaim()). Because __GFP_FS is set, the thread is
throttled until kswapd frees enough pages for the watermark check in allow_direct_reclaim() to pass, and
kswapd cannot free them while it is stuck waiting for a writeback that in turn waits for the daemon's reply.
This causes the deadlock. Although the kernel version is 4.14, the same issue exists in the upstream kernel too.

kswapd0         D 26485194.538158 157 1287917 23577482 0x1a20840 0x0 157 438599862461462
<ffffff8beec866b4> __switch_to+0x134/0x150
<ffffff8befb838cc> __schedule+0xd5c/0x1100
<ffffff8befb83ce0> schedule+0x70/0x90
<ffffff8befb849b4> bit_wait+0x14/0x54
<ffffff8befb84350> __wait_on_bit+0x74/0xe0
<ffffff8beeeae0b4> inode_wait_for_writeback+0xa0/0xe4
<ffffff8beee9b95c> evict+0xa4/0x284
<ffffff8beee99b58> iput+0x25c/0x2ac
<ffffff8beee9602c> dentry_unlink_inode+0xd8/0xe4
<ffffff8beee93274> __dentry_kill+0xe8/0x22c
<ffffff8beee9374c> shrink_dentry_list+0x19c/0x3b0
<ffffff8beee9340c> prune_dcache_sb+0x54/0x80
<ffffff8beee79c50> super_cache_scan+0x114/0x164
<ffffff8beee16504> shrink_slab+0x454/0x528
<ffffff8beee1b81c> shrink_node+0x144/0x318
<ffffff8beee1a100> kswapd+0x830/0x9e0
<ffffff8beecde9f0> kthread+0x17c/0x18c
<ffffff8beec856a4> ret_from_fork+0x10/0x18
<ffffffffffffffff> 0xffffffffffffffff

Thread-19       D 7542.719029 2888 24823 5064 0x1404840 0x1000008 24235 438599754021693
<ffffff8beec866b4> __switch_to+0x134/0x150
<ffffff8befb838cc> __schedule+0xd5c/0x1100
<ffffff8befb83ce0> schedule+0x70/0x90
<ffffff8beee18258> try_to_free_pages+0x264/0x4b0
<ffffff8beee06978> __alloc_pages_nodemask+0x7a4/0x10d0
<ffffff8beefac784> fuse_copy_fill+0x15c/0x210
<ffffff8beefabbcc> fuse_dev_do_read+0x434/0xc24
<ffffff8beefab56c> fuse_dev_splice_read+0x84/0x1d8
<ffffff8beeeb5788> SyS_splice+0x67c/0x8bc
<ffffff8beec83fc0> el0_svc_naked+0x34/0x38
<ffffffffffffffff> 0xffffffffffffffff

Code snippet from throttle_direct_reclaim() in mm/vmscan.c:
static bool throttle_direct_reclaim(...)
{
...
	/*
	 * If the caller cannot enter the filesystem, it's possible that it
	 * is due to the caller holding an FS lock or performing a journal
	 * transaction in the case of a filesystem like ext[3|4]. In this case,
	 * it is not safe to block on pfmemalloc_wait as kswapd could be
	 * blocked waiting on the same lock. Instead, throttle for up to a
	 * second before continuing.
	 */
	if (!(gfp_mask & __GFP_FS)) {
		wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
			allow_direct_reclaim(pgdat), HZ);

		goto check_pending;
	}

	/* Throttle until kswapd wakes the process */
	wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
		allow_direct_reclaim(pgdat));
...
}

Thanks,
yanwu





* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
  2022-03-10 11:10   ` Rokudo Yan
@ 2022-04-25 13:15     ` Miklos Szeredi
  0 siblings, 0 replies; 6+ messages in thread
From: Miklos Szeredi @ 2022-04-25 13:15 UTC
  To: Rokudo Yan
  Cc: Ed Tsai (蔡宗軒),
	guoweichao, Huang Jianan, linux-fsdevel, linux-kernel, linux-mm,
	zhangshiming

On Thu, 10 Mar 2022 at 12:11, Rokudo Yan <wu-yan@tcl.com> wrote:
>
> Hi, Miklos
>
> The similar issue occurs in our Android device(4G RAM + 3G zram + 8 arm cores + kernel-4.14) too.
> Under the monkey test, kswapd and fuse daemon thread deadlocked when free pages is extreme low
> (less than 1/2 of the min watermark), the backtrace of the 2 threads is as follows. kswapd
> try to evict inode to free some memory(blocked at inode_wait_for_writeback), and fuse daemon thread
> handle the fuse inode write request, which is throttled when do direct reclaim in page allocation
> slow path(blocked at throttle_direct_reclaim). As the __GFP_FS is set, the thread is throttled until
> kswapd free enough pages until watermark ok(check allow_direct_reclaim), which cause the deadlock.
> Although the kernel version is 4.14, the same issue exists in the upstream kernel too.
>
> kswapd0         D 26485194.538158 157 1287917 23577482 0x1a20840 0x0 157 438599862461462
> <ffffff8beec866b4> __switch_to+0x134/0x150
> <ffffff8befb838cc> __schedule+0xd5c/0x1100
> <ffffff8befb83ce0> schedule+0x70/0x90
> <ffffff8befb849b4> bit_wait+0x14/0x54
> <ffffff8befb84350> __wait_on_bit+0x74/0xe0
> <ffffff8beeeae0b4> inode_wait_for_writeback+0xa0/0xe4

This is the one I don't understand.  Fuse inodes must never be dirty
on eviction for the reason stated in my previous reply:

> > I don't see how it can happen on upstream kernels, since there's a
> >"write_inode_now(inode, 1)" call in fuse_release() and nothing can
> > dirty the inode after the file has been released.

If you could trace the source of this dirtiness, I think that would
explain this deadlock.

Thanks,
Miklos


Thread overview: 6+ messages
2021-02-02  4:08 [PATCH] fuse: avoid deadlock when write fuse inode Huang Jianan
2021-02-02  4:11 ` Huang Jianan
2021-02-07  1:47   ` [fuse-devel] " Huang Jianan
2021-03-24 15:28 ` Miklos Szeredi
2022-03-10 11:10   ` Rokudo Yan
2022-04-25 13:15     ` Miklos Szeredi
