* [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
@ 2019-11-08 11:03 Sahitya Tummala
  2019-11-11  2:51 ` Chao Yu
  0 siblings, 1 reply; 6+ messages in thread
From: Sahitya Tummala @ 2019-11-08 11:03 UTC (permalink / raw)
To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: linux-kernel

There could be a potential deadlock when the storage capacity is almost
full and there aren't enough free segments available, due to which FG_GC
is needed in the atomic commit ioctl, as shown in the below callstack -

schedule_timeout
io_schedule_timeout
congestion_wait
f2fs_drop_inmem_pages_all
f2fs_gc
f2fs_balance_fs
__write_node_page
f2fs_fsync_node_pages
f2fs_do_sync_file
f2fs_ioctl

If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set, then it
waits forever in f2fs_drop_inmem_pages_all() for this atomic inode to be
dropped. And the rest of the system is stuck waiting for the
sbi->gc_mutex lock, which is acquired by f2fs_balance_fs() in the stack
above.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
 fs/f2fs/segment.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index da830fc..335ec09 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -300,7 +300,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)

 	if (inode) {
 		if (gc_failure) {
-			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
+			if (fi->i_gc_failures[GC_FAILURE_ATOMIC] ||
+				F2FS_I(inode)->inmem_task == current)
 				goto drop;
 			goto skip;
 		}
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
* Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
  2019-11-08 11:03 [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition Sahitya Tummala
@ 2019-11-11  2:51 ` Chao Yu
  2019-11-11  3:40   ` Sahitya Tummala
  0 siblings, 1 reply; 6+ messages in thread
From: Chao Yu @ 2019-11-11 2:51 UTC (permalink / raw)
To: Sahitya Tummala, Jaegeuk Kim, linux-f2fs-devel; +Cc: linux-kernel

On 2019/11/8 19:03, Sahitya Tummala wrote:
> There could be a potential deadlock when the storage capacity
> is almost full and there aren't enough free segments available, due
> to which FG_GC is needed in the atomic commit ioctl as shown in
> the below callstack -
>
> schedule_timeout
> io_schedule_timeout
> congestion_wait
> f2fs_drop_inmem_pages_all
> f2fs_gc
> f2fs_balance_fs
> __write_node_page
> f2fs_fsync_node_pages
> f2fs_do_sync_file
> f2fs_ioctl
>
> If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
> then it waits forever in f2fs_drop_inmem_pages_all(), for this
> atomic inode to be dropped. And the rest of the system is stuck
> waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
> in the stack above.

I think the root cause of this issue is a potential infinite loop in
f2fs_drop_inmem_pages_all() for the case where gc_failure is true: once
the first inode in the inode_list[ATOMIC_FILE] list didn't suffer a GC
failure, we skip dropping its in-memory cache and calling iput(), and
then traverse the list again, most likely finding the same inode at the
head of that list.
Could you please check below fix:

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7bf7b0194944..8a3a35b42a37 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
 	unsigned int gc_mode;			/* current GC state */
 	unsigned int next_victim_seg[2];	/* next segment in victim section */
 	/* for skip statistic */
+	unsigned int atomic_files;		/* # of opened atomic file */
 	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
 	unsigned long long skipped_gc_rwsem;		/* FG_GC only */

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ecd063239642..79f4b348951a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
 	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
 	if (list_empty(&fi->inmem_ilist))
 		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
+	sbi->atomic_files++;
 	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

 	/* add inode in inmem_list first and set atomic_file */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8b977bbd6822..6aa0bb693697 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
 	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
 	struct inode *inode;
 	struct f2fs_inode_info *fi;
+	unsigned int count = sbi->atomic_files;
+	unsigned int looped = 0;
 next:
 	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
 	if (list_empty(head)) {
@@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
 	}
 	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
 	inode = igrab(&fi->vfs_inode);
+	if (inode)
+		list_move_tail(&fi->inmem_ilist, head);
 	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

 	if (inode) {
 		if (gc_failure) {
-			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
-				goto drop;
-			goto skip;
+			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
+				goto skip;
 		}
-drop:
 		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
 		f2fs_drop_inmem_pages(inode);
+skip:
 		iput(inode);
 	}
-skip:
+
 	congestion_wait(BLK_RW_ASYNC, HZ/50);
 	cond_resched();
+
+	if (gc_failure) {
+		if (++looped >= count)
+			return;
+	}
+
 	goto next;
 }

@@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
 	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
 	if (!list_empty(&fi->inmem_ilist))
 		list_del_init(&fi->inmem_ilist);
+	sbi->atomic_files--;
 	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
 }

Thanks,
* Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
  2019-11-11  2:51 ` Chao Yu
@ 2019-11-11  3:40   ` Sahitya Tummala
  2019-11-11  6:28     ` Chao Yu
  0 siblings, 1 reply; 6+ messages in thread
From: Sahitya Tummala @ 2019-11-11 3:40 UTC (permalink / raw)
To: Chao Yu; +Cc: Jaegeuk Kim, linux-kernel, linux-f2fs-devel

Hi Chao,

On Mon, Nov 11, 2019 at 10:51:10AM +0800, Chao Yu wrote:
> On 2019/11/8 19:03, Sahitya Tummala wrote:
> > There could be a potential deadlock when the storage capacity
> > is almost full and there aren't enough free segments available, due
> > to which FG_GC is needed in the atomic commit ioctl as shown in
> > the below callstack -
> >
> > schedule_timeout
> > io_schedule_timeout
> > congestion_wait
> > f2fs_drop_inmem_pages_all
> > f2fs_gc
> > f2fs_balance_fs
> > __write_node_page
> > f2fs_fsync_node_pages
> > f2fs_do_sync_file
> > f2fs_ioctl
> >
> > If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
> > then it waits forever in f2fs_drop_inmem_pages_all(), for this
> > atomic inode to be dropped. And the rest of the system is stuck
> > waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
> > in the stack above.
>
> I think the root cause of this issue is a potential infinite loop in
> f2fs_drop_inmem_pages_all() for the case where gc_failure is true: once
> the first inode in the inode_list[ATOMIC_FILE] list didn't suffer a GC
> failure, we skip dropping its in-memory cache and calling iput(), and
> then traverse the list again, most likely finding the same inode at the
> head of that list.
>

I thought we were expecting those atomic updates (without any GC failures)
to be committed by doing congestion_wait() and thus retrying again. Hence,
I fixed only the case where we end up waiting for the commit to happen in
the atomic commit path itself, which will be a deadlock.
> Could you please check below fix:
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 7bf7b0194944..8a3a35b42a37 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
>  	unsigned int gc_mode;			/* current GC state */
>  	unsigned int next_victim_seg[2];	/* next segment in victim section */
>  	/* for skip statistic */
> +	unsigned int atomic_files;		/* # of opened atomic file */
>  	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
>  	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index ecd063239642..79f4b348951a 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>  	if (list_empty(&fi->inmem_ilist))
>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
> +	sbi->atomic_files++;
>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>
>  	/* add inode in inmem_list first and set atomic_file */
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 8b977bbd6822..6aa0bb693697 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
>  	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
>  	struct inode *inode;
>  	struct f2fs_inode_info *fi;
> +	unsigned int count = sbi->atomic_files;

If sbi->atomic_files decrements just after this, then the exit condition
below may not work. In that case, looped will never be >= count.

> +	unsigned int looped = 0;
> next:
>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>  	if (list_empty(head)) {
> @@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
>  	}
>  	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
>  	inode = igrab(&fi->vfs_inode);
> +	if (inode)
> +		list_move_tail(&fi->inmem_ilist, head);
>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>
>  	if (inode) {
>  		if (gc_failure) {
> -			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
> -				goto drop;
> -			goto skip;
> +			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
> +				goto skip;
>  		}
> -drop:
>  		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
>  		f2fs_drop_inmem_pages(inode);
> +skip:
>  		iput(inode);

Does this result in f2fs_evict_inode() being called in this context for
this inode?

thanks,

>  	}
> -skip:
> +
>  	congestion_wait(BLK_RW_ASYNC, HZ/50);
>  	cond_resched();
> +
> +	if (gc_failure) {
> +		if (++looped >= count)
> +			return;
> +	}
> +
>  	goto next;
> }
>
> @@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>  	if (!list_empty(&fi->inmem_ilist))
>  		list_del_init(&fi->inmem_ilist);
> +	sbi->atomic_files--;
>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> }
>
> Thanks,

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
* Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
  2019-11-11  3:40 ` Sahitya Tummala
@ 2019-11-11  6:28   ` Chao Yu
  2019-11-11  6:44     ` Sahitya Tummala
  0 siblings, 1 reply; 6+ messages in thread
From: Chao Yu @ 2019-11-11 6:28 UTC (permalink / raw)
To: Sahitya Tummala; +Cc: Jaegeuk Kim, linux-kernel, linux-f2fs-devel

Hi Sahitya,

On 2019/11/11 11:40, Sahitya Tummala wrote:
> Hi Chao,
>
> On Mon, Nov 11, 2019 at 10:51:10AM +0800, Chao Yu wrote:
>> On 2019/11/8 19:03, Sahitya Tummala wrote:
>>> There could be a potential deadlock when the storage capacity
>>> is almost full and there aren't enough free segments available, due
>>> to which FG_GC is needed in the atomic commit ioctl as shown in
>>> the below callstack -
>>>
>>> schedule_timeout
>>> io_schedule_timeout
>>> congestion_wait
>>> f2fs_drop_inmem_pages_all
>>> f2fs_gc
>>> f2fs_balance_fs
>>> __write_node_page
>>> f2fs_fsync_node_pages
>>> f2fs_do_sync_file
>>> f2fs_ioctl
>>>
>>> If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
>>> then it waits forever in f2fs_drop_inmem_pages_all(), for this
>>> atomic inode to be dropped. And the rest of the system is stuck
>>> waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
>>> in the stack above.
>>
>> I think the root cause of this issue is a potential infinite loop in
>> f2fs_drop_inmem_pages_all() for the case where gc_failure is true: once
>> the first inode in the inode_list[ATOMIC_FILE] list didn't suffer a GC
>> failure, we skip dropping its in-memory cache and calling iput(), and
>> then traverse the list again, most likely finding the same inode at the
>> head of that list.
>>
>
> I thought we were expecting those atomic updates (without any GC failures)
> to be committed by doing congestion_wait() and thus retrying again. Hence,

Nope, we only need to drop inodes which encounter GC failures, and keep
the rest of the inodes.
> I fixed only the case where we end up waiting for the commit to happen in
> the atomic commit path itself, which will be a deadlock.

Looking into the call stack you provided, I don't think it's correct to
drop such an inode, as its dirty pages should be committed before
f2fs_fsync_node_pages(); so calling f2fs_drop_inmem_pages() won't release
any inmem pages, and won't help the looped GC caused by skipping due to
inmem pages.

And then I figured out the below fix...

>
>> Could you please check below fix:
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 7bf7b0194944..8a3a35b42a37 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
>>  	unsigned int gc_mode;			/* current GC state */
>>  	unsigned int next_victim_seg[2];	/* next segment in victim section */
>>  	/* for skip statistic */
>> +	unsigned int atomic_files;		/* # of opened atomic file */
>>  	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
>>  	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
>>
>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>> index ecd063239642..79f4b348951a 100644
>> --- a/fs/f2fs/file.c
>> +++ b/fs/f2fs/file.c
>> @@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>>  	if (list_empty(&fi->inmem_ilist))
>>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
>> +	sbi->atomic_files++;
>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>
>>  	/* add inode in inmem_list first and set atomic_file */
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 8b977bbd6822..6aa0bb693697 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
>>  	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
>>  	struct inode *inode;
>>  	struct f2fs_inode_info *fi;
>> +	unsigned int count = sbi->atomic_files;
>
> If sbi->atomic_files decrements just after this, then the exit condition
> below may not work. In that case, looped will never be >= count.
>
>> +	unsigned int looped = 0;
>> next:
>>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>>  	if (list_empty(head)) {
>> @@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
>>  	}
>>  	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
>>  	inode = igrab(&fi->vfs_inode);
>> +	if (inode)
>> +		list_move_tail(&fi->inmem_ilist, head);
>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>
>>  	if (inode) {
>>  		if (gc_failure) {
>> -			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
>> -				goto drop;
>> -			goto skip;
>> +			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
>> +				goto skip;
>>  		}
>> -drop:
>>  		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
>>  		f2fs_drop_inmem_pages(inode);
>> +skip:
>>  		iput(inode);
>
> Does this result in f2fs_evict_inode() being called in this context for
> this inode?

Yup, we need to call igrab()/iput() in pairs in f2fs_drop_inmem_pages_all()
anyway.

Previously, we may have had an .i_count leak...

Thanks,

>
> thanks,
>
>>  	}
>> -skip:
>> +
>>  	congestion_wait(BLK_RW_ASYNC, HZ/50);
>>  	cond_resched();
>> +
>> +	if (gc_failure) {
>> +		if (++looped >= count)
>> +			return;
>> +	}
>> +
>>  	goto next;
>> }
>>
>> @@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>>  	if (!list_empty(&fi->inmem_ilist))
>>  		list_del_init(&fi->inmem_ilist);
>> +	sbi->atomic_files--;
>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>> }
>>
>> Thanks,
* Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
  2019-11-11  6:28 ` Chao Yu
@ 2019-11-11  6:44   ` Sahitya Tummala
  2019-11-11  7:18     ` Chao Yu
  0 siblings, 1 reply; 6+ messages in thread
From: Sahitya Tummala @ 2019-11-11 6:44 UTC (permalink / raw)
To: Chao Yu; +Cc: Jaegeuk Kim, linux-kernel, linux-f2fs-devel

Hi Chao,

On Mon, Nov 11, 2019 at 02:28:47PM +0800, Chao Yu wrote:
> Hi Sahitya,
>
> On 2019/11/11 11:40, Sahitya Tummala wrote:
> > Hi Chao,
> >
> > On Mon, Nov 11, 2019 at 10:51:10AM +0800, Chao Yu wrote:
> >> On 2019/11/8 19:03, Sahitya Tummala wrote:
> >>> There could be a potential deadlock when the storage capacity
> >>> is almost full and there aren't enough free segments available, due
> >>> to which FG_GC is needed in the atomic commit ioctl as shown in
> >>> the below callstack -
> >>>
> >>> schedule_timeout
> >>> io_schedule_timeout
> >>> congestion_wait
> >>> f2fs_drop_inmem_pages_all
> >>> f2fs_gc
> >>> f2fs_balance_fs
> >>> __write_node_page
> >>> f2fs_fsync_node_pages
> >>> f2fs_do_sync_file
> >>> f2fs_ioctl
> >>>
> >>> If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
> >>> then it waits forever in f2fs_drop_inmem_pages_all(), for this
> >>> atomic inode to be dropped. And the rest of the system is stuck
> >>> waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
> >>> in the stack above.
> >>
> >> I think the root cause of this issue is a potential infinite loop in
> >> f2fs_drop_inmem_pages_all() for the case where gc_failure is true: once
> >> the first inode in the inode_list[ATOMIC_FILE] list didn't suffer a GC
> >> failure, we skip dropping its in-memory cache and calling iput(), and
> >> then traverse the list again, most likely finding the same inode at the
> >> head of that list.
> >>
> >
> > I thought we were expecting those atomic updates (without any GC failures)
> > to be committed by doing congestion_wait() and thus retrying again. Hence,
>
> Nope, we only need to drop inodes which encounter GC failures, and keep
> the rest of the inodes.
>
> > I fixed only the case where we end up waiting for the commit to happen in
> > the atomic commit path itself, which will be a deadlock.
>
> Looking into the call stack you provided, I don't think it's correct to
> drop such an inode, as its dirty pages should be committed before
> f2fs_fsync_node_pages(); so calling f2fs_drop_inmem_pages() won't release
> any inmem pages, and won't help the looped GC caused by skipping due to
> inmem pages.
>
> And then I figured out the below fix...
>

Thanks for the explanation.
The fix below looks good to me.

Thanks,
Sahitya.

> >
> >> Could you please check below fix:
> >>
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index 7bf7b0194944..8a3a35b42a37 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
> >>  	unsigned int gc_mode;			/* current GC state */
> >>  	unsigned int next_victim_seg[2];	/* next segment in victim section */
> >>  	/* for skip statistic */
> >> +	unsigned int atomic_files;		/* # of opened atomic file */
> >>  	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
> >>  	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
> >>
> >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >> index ecd063239642..79f4b348951a 100644
> >> --- a/fs/f2fs/file.c
> >> +++ b/fs/f2fs/file.c
> >> @@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (list_empty(&fi->inmem_ilist))
> >>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
> >> +	sbi->atomic_files++;
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>
> >>  	/* add inode in inmem_list first and set atomic_file */
> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >> index 8b977bbd6822..6aa0bb693697 100644
> >> --- a/fs/f2fs/segment.c
> >> +++ b/fs/f2fs/segment.c
> >> @@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
> >>  	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
> >>  	struct inode *inode;
> >>  	struct f2fs_inode_info *fi;
> >> +	unsigned int count = sbi->atomic_files;
> >
> > If sbi->atomic_files decrements just after this, then the exit condition
> > below may not work. In that case, looped will never be >= count.
> >
> >> +	unsigned int looped = 0;
> >> next:
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (list_empty(head)) {
> >> @@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
> >>  	}
> >>  	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
> >>  	inode = igrab(&fi->vfs_inode);
> >> +	if (inode)
> >> +		list_move_tail(&fi->inmem_ilist, head);
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>
> >>  	if (inode) {
> >>  		if (gc_failure) {
> >> -			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
> >> -				goto drop;
> >> -			goto skip;
> >> +			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
> >> +				goto skip;
> >>  		}
> >> -drop:
> >>  		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> >>  		f2fs_drop_inmem_pages(inode);
> >> +skip:
> >>  		iput(inode);
> >
> > Does this result in f2fs_evict_inode() being called in this context for
> > this inode?
>
> Yup, we need to call igrab()/iput() in pairs in f2fs_drop_inmem_pages_all()
> anyway.
>
> Previously, we may have had an .i_count leak...
>
> Thanks,
>
> >
> > thanks,
> >
> >>  	}
> >> -skip:
> >> +
> >>  	congestion_wait(BLK_RW_ASYNC, HZ/50);
> >>  	cond_resched();
> >> +
> >> +	if (gc_failure) {
> >> +		if (++looped >= count)
> >> +			return;
> >> +	}
> >> +
> >>  	goto next;
> >> }
> >>
> >> @@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (!list_empty(&fi->inmem_ilist))
> >>  		list_del_init(&fi->inmem_ilist);
> >> +	sbi->atomic_files--;
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >> }
> >>
> >> Thanks,
> >

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
* Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
  2019-11-11  6:44 ` Sahitya Tummala
@ 2019-11-11  7:18   ` Chao Yu
  0 siblings, 0 replies; 6+ messages in thread
From: Chao Yu @ 2019-11-11 7:18 UTC (permalink / raw)
To: Sahitya Tummala; +Cc: Jaegeuk Kim, linux-kernel, linux-f2fs-devel

Hi Sahitya,

On 2019/11/11 14:44, Sahitya Tummala wrote:
> Hi Chao,
>
> On Mon, Nov 11, 2019 at 02:28:47PM +0800, Chao Yu wrote:
>> Hi Sahitya,
>>
>> On 2019/11/11 11:40, Sahitya Tummala wrote:
>>> Hi Chao,
>>>
>>> On Mon, Nov 11, 2019 at 10:51:10AM +0800, Chao Yu wrote:
>>>> On 2019/11/8 19:03, Sahitya Tummala wrote:
>>>>> There could be a potential deadlock when the storage capacity
>>>>> is almost full and there aren't enough free segments available, due
>>>>> to which FG_GC is needed in the atomic commit ioctl as shown in
>>>>> the below callstack -
>>>>>
>>>>> schedule_timeout
>>>>> io_schedule_timeout
>>>>> congestion_wait
>>>>> f2fs_drop_inmem_pages_all
>>>>> f2fs_gc
>>>>> f2fs_balance_fs
>>>>> __write_node_page
>>>>> f2fs_fsync_node_pages
>>>>> f2fs_do_sync_file
>>>>> f2fs_ioctl
>>>>>
>>>>> If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
>>>>> then it waits forever in f2fs_drop_inmem_pages_all(), for this
>>>>> atomic inode to be dropped. And the rest of the system is stuck
>>>>> waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
>>>>> in the stack above.
>>>>
>>>> I think the root cause of this issue is a potential infinite loop in
>>>> f2fs_drop_inmem_pages_all() for the case where gc_failure is true: once
>>>> the first inode in the inode_list[ATOMIC_FILE] list didn't suffer a GC
>>>> failure, we skip dropping its in-memory cache and calling iput(), and
>>>> then traverse the list again, most likely finding the same inode at the
>>>> head of that list.
>>>>
>>>
>>> I thought we were expecting those atomic updates (without any GC failures)
>>> to be committed by doing congestion_wait() and thus retrying again. Hence,
>>
>> Nope, we only need to drop inodes which encounter GC failures, and keep
>> the rest of the inodes.
>>
>>> I fixed only the case where we end up waiting for the commit to happen in
>>> the atomic commit path itself, which will be a deadlock.
>>
>> Looking into the call stack you provided, I don't think it's correct to
>> drop such an inode, as its dirty pages should be committed before
>> f2fs_fsync_node_pages(); so calling f2fs_drop_inmem_pages() won't release
>> any inmem pages, and won't help the looped GC caused by skipping due to
>> inmem pages.
>>
>> And then I figured out the below fix...
>>
>
> Thanks for the explanation.
> The fix below looks good to me.

So could you submit v2? :)

Thanks,

>
> Thanks,
> Sahitya.
>
>>>
>>>> Could you please check below fix:
>>>>
>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>> index 7bf7b0194944..8a3a35b42a37 100644
>>>> --- a/fs/f2fs/f2fs.h
>>>> +++ b/fs/f2fs/f2fs.h
>>>> @@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
>>>>  	unsigned int gc_mode;			/* current GC state */
>>>>  	unsigned int next_victim_seg[2];	/* next segment in victim section */
>>>>  	/* for skip statistic */
>>>> +	unsigned int atomic_files;		/* # of opened atomic file */
>>>>  	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
>>>>  	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
>>>>
>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>> index ecd063239642..79f4b348951a 100644
>>>> --- a/fs/f2fs/file.c
>>>> +++ b/fs/f2fs/file.c
>>>> @@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>>>>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>  	if (list_empty(&fi->inmem_ilist))
>>>>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
>>>> +	sbi->atomic_files++;
>>>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>
>>>>  	/* add inode in inmem_list first and set atomic_file */
>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>> index 8b977bbd6822..6aa0bb693697 100644
>>>> --- a/fs/f2fs/segment.c
>>>> +++ b/fs/f2fs/segment.c
>>>> @@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
>>>>  	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
>>>>  	struct inode *inode;
>>>>  	struct f2fs_inode_info *fi;
>>>> +	unsigned int count = sbi->atomic_files;
>>>
>>> If sbi->atomic_files decrements just after this, then the exit condition
>>> below may not work. In that case, looped will never be >= count.
>>>
>>>> +	unsigned int looped = 0;
>>>> next:
>>>>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>  	if (list_empty(head)) {
>>>> @@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
>>>>  	}
>>>>  	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
>>>>  	inode = igrab(&fi->vfs_inode);
>>>> +	if (inode)
>>>> +		list_move_tail(&fi->inmem_ilist, head);
>>>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>
>>>>  	if (inode) {
>>>>  		if (gc_failure) {
>>>> -			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
>>>> -				goto drop;
>>>> -			goto skip;
>>>> +			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
>>>> +				goto skip;
>>>>  		}
>>>> -drop:
>>>>  		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
>>>>  		f2fs_drop_inmem_pages(inode);
>>>> +skip:
>>>>  		iput(inode);
>>>
>>> Does this result in f2fs_evict_inode() being called in this context for
>>> this inode?
>>
>> Yup, we need to call igrab()/iput() in pairs in f2fs_drop_inmem_pages_all()
>> anyway.
>>
>> Previously, we may have had an .i_count leak...
>>
>> Thanks,
>>
>>>
>>> thanks,
>>>
>>>>  	}
>>>> -skip:
>>>> +
>>>>  	congestion_wait(BLK_RW_ASYNC, HZ/50);
>>>>  	cond_resched();
>>>> +
>>>> +	if (gc_failure) {
>>>> +		if (++looped >= count)
>>>> +			return;
>>>> +	}
>>>> +
>>>>  	goto next;
>>>> }
>>>>
>>>> @@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>>>>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
>>>>  	if (!list_empty(&fi->inmem_ilist))
>>>>  		list_del_init(&fi->inmem_ilist);
>>>> +	sbi->atomic_files--;
>>>>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>>>> }
>>>>
>>>> Thanks,
>>>