* [PATCH] fuse: avoid deadlock when write fuse inode
From: Huang Jianan @ 2021-02-02  4:08 UTC
  To: fuse-devel; +Cc: huangjianan, guoweichao, zhangshiming, linux-kernel

We found the following deadlock situation in low memory scenarios:

Thread A                          Thread B
- __writeback_single_inode
 - fuse_write_inode
  - fuse_simple_request
   - __fuse_request_send
    - request_wait_answer
                                  - fuse_dev_splice_read
                                   - fuse_copy_fill
                                    - __alloc_pages_direct_reclaim
                                     - do_shrink_slab
                                      - super_cache_scan
                                       - shrink_dentry_list
                                        - dentry_unlink_inode
                                         - iput_final
                                          - inode_wait_for_writeback

The request and inode processed by Thread A and B are the same, which
causes a deadlock. To avoid this, we remove the __GFP_FS flag when
allocating memory in fuse_copy_fill, so there will be no memory
reclamation in super_cache_scan.

Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 588f8d1240aa..e580b9d04c25 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -721,7 +721,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 			if (cs->nr_segs >= cs->pipe->max_usage)
 				return -EIO;
 
-			page = alloc_page(GFP_HIGHUSER);
+			page = alloc_page(GFP_HIGHUSER & ~__GFP_FS);
 			if (!page)
 				return -ENOMEM;
 
-- 
2.25.1
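The effect of the one-line change on the allocation mask can be sketched in userspace C. This is only an illustration under stated assumptions: the SK_* names and the bit values are made up for this sketch (the kernel's real values live in its gfp headers and differ between versions), and sketch_may_scan_fs_caches() is a hypothetical stand-in for the early return that the commit message relies on in super_cache_scan when __GFP_FS is clear. Only the way the composite mask is built from its parts is meant to mirror the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real values. */
typedef unsigned int sketch_gfp_t;

#define SK_GFP_HIGHMEM        0x02u
#define SK_GFP_IO             0x40u
#define SK_GFP_FS             0x80u
#define SK_GFP_DIRECT_RECLAIM 0x400u
#define SK_GFP_KSWAPD_RECLAIM 0x800u
#define SK_GFP_HARDWALL       0x100000u

/* Composite masks, built the same way the kernel builds GFP_USER/GFP_HIGHUSER. */
#define SK_GFP_RECLAIM  (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_USER     (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS | SK_GFP_HARDWALL)
#define SK_GFP_HIGHUSER (SK_GFP_USER | SK_GFP_HIGHMEM)

/* Hypothetical model: a filesystem shrinker only runs when FS re-entry is allowed. */
static inline bool sketch_may_scan_fs_caches(sketch_gfp_t mask)
{
	return (mask & SK_GFP_FS) != 0;
}

/* The mask the patch passes to alloc_page(): everything except the FS bit. */
static inline sketch_gfp_t sketch_patched_mask(void)
{
	return SK_GFP_HIGHUSER & ~SK_GFP_FS;
}

/* Small self-check covering the unpatched and patched masks. */
static inline void sketch_self_check(void)
{
	/* Unpatched: reclaim may recurse into filesystem caches. */
	assert(sketch_may_scan_fs_caches(SK_GFP_HIGHUSER));
	/* Patched: the FS bit is gone, direct reclaim itself is still allowed. */
	assert(!sketch_may_scan_fs_caches(sketch_patched_mask()));
	assert(sketch_patched_mask() & SK_GFP_DIRECT_RECLAIM);
}
```

Calling sketch_self_check() from a main() exercises both masks. Note that clearing the FS bit leaves every other bit untouched, so the allocation can still enter direct reclaim; it just may not re-enter filesystem code while doing so.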
* Re: [PATCH] fuse: avoid deadlock when write fuse inode
From: Huang Jianan @ 2021-02-02  4:11 UTC
  To: fuse-devel; +Cc: guoweichao, zhangshiming, linux-kernel

Hi all,

This patch works well in our product, but I am not sure this is the
correct way to solve this problem. I think inode->i_count shouldn't
drop to zero after iput is executed in dentry_unlink_inode; then the
inode wouldn't be written back. But I haven't found where an iget is
missing.

Thanks,
Jianan

On 2021/2/2 12:08, Huang Jianan wrote:
> We found the following deadlock situation in low memory scenarios:
>
> Thread A                          Thread B
> - __writeback_single_inode
>  - fuse_write_inode
>   - fuse_simple_request
>    - __fuse_request_send
>     - request_wait_answer
>                                   - fuse_dev_splice_read
>                                    - fuse_copy_fill
>                                     - __alloc_pages_direct_reclaim
>                                      - do_shrink_slab
>                                       - super_cache_scan
>                                        - shrink_dentry_list
>                                         - dentry_unlink_inode
>                                          - iput_final
>                                           - inode_wait_for_writeback
>
> The request and inode processed by Thread A and B are the same, which
> causes a deadlock. To avoid this, we remove the __GFP_FS flag when
> allocating memory in fuse_copy_fill, so there will be no memory
> reclamation in super_cache_scan.
>
> Signed-off-by: Huang Jianan <huangjianan@oppo.com>
> Signed-off-by: Guo Weichao <guoweichao@oppo.com>
> ---
>  fs/fuse/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 588f8d1240aa..e580b9d04c25 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -721,7 +721,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
>  			if (cs->nr_segs >= cs->pipe->max_usage)
>  				return -EIO;
>
> -			page = alloc_page(GFP_HIGHUSER);
> +			page = alloc_page(GFP_HIGHUSER & ~__GFP_FS);
>  			if (!page)
>  				return -ENOMEM;
* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
From: Huang Jianan @ 2021-02-07  1:47 UTC
  To: fuse-devel, miklos; +Cc: guoweichao, zhangshiming, linux-kernel, linux-fsdevel

friendly ping ... 😁

On 2021/2/2 12:11, Huang Jianan via fuse-devel wrote:
> Hi all,
>
> This patch works well in our product, but I am not sure this is the
> correct way to solve this problem. I think inode->i_count shouldn't
> drop to zero after iput is executed in dentry_unlink_inode; then the
> inode wouldn't be written back. But I haven't found where an iget is
> missing.
>
> Thanks,
> Jianan
>
> On 2021/2/2 12:08, Huang Jianan wrote:
>> We found the following deadlock situation in low memory scenarios:
>>
>> Thread A                          Thread B
>> - __writeback_single_inode
>>  - fuse_write_inode
>>   - fuse_simple_request
>>    - __fuse_request_send
>>     - request_wait_answer
>>                                   - fuse_dev_splice_read
>>                                    - fuse_copy_fill
>>                                     - __alloc_pages_direct_reclaim
>>                                      - do_shrink_slab
>>                                       - super_cache_scan
>>                                        - shrink_dentry_list
>>                                         - dentry_unlink_inode
>>                                          - iput_final
>>                                           - inode_wait_for_writeback
>>
>> The request and inode processed by Thread A and B are the same, which
>> causes a deadlock. To avoid this, we remove the __GFP_FS flag when
>> allocating memory in fuse_copy_fill, so there will be no memory
>> reclamation in super_cache_scan.
>>
>> Signed-off-by: Huang Jianan <huangjianan@oppo.com>
>> Signed-off-by: Guo Weichao <guoweichao@oppo.com>
>> ---
>>  fs/fuse/dev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 588f8d1240aa..e580b9d04c25 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -721,7 +721,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
>>  			if (cs->nr_segs >= cs->pipe->max_usage)
>>  				return -EIO;
>>
>> -			page = alloc_page(GFP_HIGHUSER);
>> +			page = alloc_page(GFP_HIGHUSER & ~__GFP_FS);
>>  			if (!page)
>>  				return -ENOMEM;
* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
From: Miklos Szeredi @ 2021-03-24 15:28 UTC
  To: Huang Jianan
  Cc: linux-kernel, guoweichao, zhangshiming, linux-fsdevel, linux-mm, Ed Tsai (蔡宗軒)

On Tue, Feb 2, 2021 at 5:41 AM Huang Jianan via fuse-devel
<fuse-devel@lists.sourceforge.net> wrote:
>
> We found the following deadlock situations in low memory scenarios:
>
> Thread A                          Thread B
> - __writeback_single_inode
>  - fuse_write_inode
>   - fuse_simple_request
>    - __fuse_request_send
>     - request_wait_answer
>                                   - fuse_dev_splice_read
>                                    - fuse_copy_fill
>                                     - __alloc_pages_direct_reclaim
>                                      - do_shrink_slab
>                                       - super_cache_scan
>                                        - shrink_dentry_list
>                                         - dentry_unlink_inode
>                                          - iput_final
>                                           - inode_wait_for_writeback

On what kernel are you seeing this?

I don't see how it can happen on upstream kernels, since there's a
"write_inode_now(inode, 1)" call in fuse_release() and nothing can
dirty the inode after the file has been released.

Thanks,
Miklos
* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
From: Rokudo Yan @ 2022-03-10 11:10 UTC
  To: miklos
  Cc: Ed.Tsai, guoweichao, huangjianan, linux-fsdevel, linux-kernel, linux-mm, zhangshiming

Hi Miklos,

A similar issue occurs on our Android device (4G RAM + 3G zram + 8 ARM
cores + kernel 4.14) too. Under the monkey test, kswapd and the fuse
daemon thread deadlocked when free pages were extremely low (less than
1/2 of the min watermark). The backtraces of the two threads are below.

kswapd tries to evict an inode to free some memory (blocked at
inode_wait_for_writeback), while the fuse daemon thread handles the
fuse inode write request and is throttled when it does direct reclaim
in the page allocation slow path (blocked at throttle_direct_reclaim).
As __GFP_FS is set, the thread is throttled until kswapd frees enough
pages for the watermark check to pass (see allow_direct_reclaim), which
causes the deadlock. Although our kernel version is 4.14, the same
issue exists in the upstream kernel too.

kswapd0         D 26485194.538158   157 1287917 23577482 0x1a20840 0x0 157 438599862461462
<ffffff8beec866b4> __switch_to+0x134/0x150
<ffffff8befb838cc> __schedule+0xd5c/0x1100
<ffffff8befb83ce0> schedule+0x70/0x90
<ffffff8befb849b4> bit_wait+0x14/0x54
<ffffff8befb84350> __wait_on_bit+0x74/0xe0
<ffffff8beeeae0b4> inode_wait_for_writeback+0xa0/0xe4
<ffffff8beee9b95c> evict+0xa4/0x284
<ffffff8beee99b58> iput+0x25c/0x2ac
<ffffff8beee9602c> dentry_unlink_inode+0xd8/0xe4
<ffffff8beee93274> __dentry_kill+0xe8/0x22c
<ffffff8beee9374c> shrink_dentry_list+0x19c/0x3b0
<ffffff8beee9340c> prune_dcache_sb+0x54/0x80
<ffffff8beee79c50> super_cache_scan+0x114/0x164
<ffffff8beee16504> shrink_slab+0x454/0x528
<ffffff8beee1b81c> shrink_node+0x144/0x318
<ffffff8beee1a100> kswapd+0x830/0x9e0
<ffffff8beecde9f0> kthread+0x17c/0x18c
<ffffff8beec856a4> ret_from_fork+0x10/0x18
<ffffffffffffffff> 0xffffffffffffffff

Thread-19       D  7542.719029  2888  24823   5064 0x1404840 0x1000008 24235 438599754021693
<ffffff8beec866b4> __switch_to+0x134/0x150
<ffffff8befb838cc> __schedule+0xd5c/0x1100
<ffffff8befb83ce0> schedule+0x70/0x90
<ffffff8beee18258> try_to_free_pages+0x264/0x4b0
<ffffff8beee06978> __alloc_pages_nodemask+0x7a4/0x10d0
<ffffff8beefac784> fuse_copy_fill+0x15c/0x210
<ffffff8beefabbcc> fuse_dev_do_read+0x434/0xc24
<ffffff8beefab56c> fuse_dev_splice_read+0x84/0x1d8
<ffffff8beeeb5788> SyS_splice+0x67c/0x8bc
<ffffff8beec83fc0> el0_svc_naked+0x34/0x38
<ffffffffffffffff> 0xffffffffffffffff

code snippet:

static bool throttle_direct_reclaim(...)
{
	...
	/*
	 * If the caller cannot enter the filesystem, it's possible that it
	 * is due to the caller holding an FS lock or performing a journal
	 * transaction in the case of a filesystem like ext[3|4]. In this case,
	 * it is not safe to block on pfmemalloc_wait as kswapd could be
	 * blocked waiting on the same lock. Instead, throttle for up to a
	 * second before continuing.
	 */
	if (!(gfp_mask & __GFP_FS)) {
		wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
			allow_direct_reclaim(pgdat), HZ);

		goto check_pending;
	}

	/* Throttle until kswapd wakes the process */
	wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
		allow_direct_reclaim(pgdat));
	...
}

Thanks,
yanwu

On Wed, 24 Mar 2021 16:28:35 +0100 Miklos Szeredi via <miklos@szeredi.hu> wrote:

> On what kernel are you seeing this?
>
> I don't see how it can happen on upstream kernels, since there's a
> "write_inode_now(inode, 1)" call in fuse_release() and nothing can
> dirty the inode after the file has been released.
>
> Thanks,
> Miklos
>
> On Tue, Feb 2, 2021 at 5:41 AM Huang Jianan via fuse-devel
> <fuse-devel@lists.sourceforge.net> wrote:
>>
>> We found the following deadlock situations in low memory scenarios:
>>
>> Thread A                          Thread B
>> - __writeback_single_inode
>>  - fuse_write_inode
>>   - fuse_simple_request
>>    - __fuse_request_send
>>     - request_wait_answer
>>                                   - fuse_dev_splice_read
>>                                    - fuse_copy_fill
>>                                     - __alloc_pages_direct_reclaim
>>                                      - do_shrink_slab
>>                                       - super_cache_scan
>>                                        - shrink_dentry_list
>>                                         - dentry_unlink_inode
>>                                          - iput_final
>>                                           - inode_wait_for_writeback
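The branch in the throttle_direct_reclaim excerpt from the message above can be modelled as a small decision function. This is a rough userspace sketch, not kernel code: the SK_* names, the enum, and both helper functions are hypothetical, the bit value is illustrative, and the reserves_ok parameter is an assumption standing in for the result of allow_direct_reclaim(), which the excerpt elides.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_GFP_FS 0x80u /* illustrative bit value, not the kernel's */

/* The three outcomes a direct reclaimer can see in this simplified model. */
enum sketch_throttle {
	SK_NO_THROTTLE,     /* reserves are fine, allocation proceeds */
	SK_BOUNDED_WAIT,    /* no FS bit: wait up to about a second, then go on */
	SK_WAIT_FOR_KSWAPD, /* FS bit set: sleep until kswapd makes progress */
};

/* Mirrors the if/else shape of the quoted excerpt. */
static inline enum sketch_throttle
sketch_throttle_kind(unsigned int gfp_mask, bool reserves_ok)
{
	if (reserves_ok)
		return SK_NO_THROTTLE;
	if (!(gfp_mask & SK_GFP_FS))
		return SK_BOUNDED_WAIT;
	return SK_WAIT_FOR_KSWAPD;
}

/*
 * The reported hang needs both halves: the allocating thread sleeps with no
 * timeout, and kswapd is itself waiting on work only that thread can finish.
 */
static inline bool
sketch_can_deadlock(enum sketch_throttle kind, bool kswapd_waits_on_caller)
{
	return kind == SK_WAIT_FOR_KSWAPD && kswapd_waits_on_caller;
}

/* Small self-check of the two cases discussed in the thread. */
static inline void sketch_self_check(void)
{
	/* GFP_HIGHUSER-like mask (FS bit set) under memory pressure. */
	assert(sketch_can_deadlock(sketch_throttle_kind(SK_GFP_FS, false), true));
	/* Same pressure with the FS bit cleared, as in the proposed patch. */
	assert(!sketch_can_deadlock(sketch_throttle_kind(0u, false), true));
}
```

In this model, clearing the FS bit turns the unbounded sleep into a bounded one, which is why the proposed patch makes the hang go away in the reporter's scenario. It says nothing about why the inode is dirty at eviction time, which is the open question in the thread.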
* Re: [fuse-devel] [PATCH] fuse: avoid deadlock when write fuse inode
From: Miklos Szeredi @ 2022-04-25 13:15 UTC
  To: Rokudo Yan
  Cc: Ed Tsai (蔡宗軒), guoweichao, Huang Jianan, linux-fsdevel, linux-kernel, linux-mm, zhangshiming

On Thu, 10 Mar 2022 at 12:11, Rokudo Yan <wu-yan@tcl.com> wrote:
>
> Hi, Miklos
>
> The similar issue occurs in our Android device(4G RAM + 3G zram + 8 arm cores + kernel-4.14) too.
> Under the monkey test, kswapd and fuse daemon thread deadlocked when free pages is extreme low
> (less than 1/2 of the min watermark), the backtrace of the 2 threads is as follows. kswapd
> try to evict inode to free some memory(blocked at inode_wait_for_writeback), and fuse daemon thread
> handle the fuse inode write request, which is throttled when do direct reclaim in page allocation
> slow path(blocked at throttle_direct_reclaim). As the __GFP_FS is set, the thread is throttled until
> kswapd free enough pages until watermark ok(check allow_direct_reclaim), which cause the deadlock.
> Although the kernel version is 4.14, the same issue exists in the upstream kernel too.
>
> kswapd0         D 26485194.538158   157 1287917 23577482 0x1a20840 0x0 157 438599862461462
> <ffffff8beec866b4> __switch_to+0x134/0x150
> <ffffff8befb838cc> __schedule+0xd5c/0x1100
> <ffffff8befb83ce0> schedule+0x70/0x90
> <ffffff8befb849b4> bit_wait+0x14/0x54
> <ffffff8befb84350> __wait_on_bit+0x74/0xe0
> <ffffff8beeeae0b4> inode_wait_for_writeback+0xa0/0xe4

This is the one I don't understand. Fuse inodes must never be dirty
on eviction for the reason stated in my previous reply:

> > I don't see how it can happen on upstream kernels, since there's a
> > "write_inode_now(inode, 1)" call in fuse_release() and nothing can
> > dirty the inode after the file has been released.

If you could trace the source of this dirtiness, I think that would
explain this deadlock.

Thanks,
Miklos
Thread overview: 6+ messages

2021-02-02  4:08 [PATCH] fuse: avoid deadlock when write fuse inode Huang Jianan
2021-02-02  4:11 ` Huang Jianan
2021-02-07  1:47   ` [fuse-devel] " Huang Jianan
2021-03-24 15:28 ` Miklos Szeredi
2022-03-10 11:10   ` Rokudo Yan
2022-04-25 13:15     ` Miklos Szeredi