* [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint @ 2020-04-30 10:58 Sayali Lokhande 2020-05-06 6:21 ` Chao Yu 2020-05-08 16:10 ` Jaegeuk Kim 0 siblings, 2 replies; 10+ messages in thread From: Sayali Lokhande @ 2020-04-30 10:58 UTC (permalink / raw) To: jaegeuk, yuchao0, linux-f2fs-devel; +Cc: linux-kernel There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> --- fs/f2fs/checkpoint.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 5ba649e..97b6378 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -1219,21 +1219,19 @@ static int block_operations(struct f2fs_sb_info *sbi) goto retry_flush_quotas; } -retry_flush_nodes: down_write(&sbi->node_write); if (get_pages(sbi, F2FS_DIRTY_NODES)) { up_write(&sbi->node_write); + up_write(&sbi->node_change); + f2fs_unlock_all(sbi); atomic_inc(&sbi->wb_sync_req[NODE]); err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO); atomic_dec(&sbi->wb_sync_req[NODE]); - if (err) { - up_write(&sbi->node_change); - f2fs_unlock_all(sbi); + if (err) goto out; - } cond_resched(); - goto retry_flush_nodes; + goto retry_flush_quotas; } /* -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-04-30 10:58 [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint Sayali Lokhande @ 2020-05-06 6:21 ` Chao Yu 2020-05-08 16:10 ` Jaegeuk Kim 1 sibling, 0 replies; 10+ messages in thread From: Chao Yu @ 2020-05-06 6:21 UTC (permalink / raw) To: Sayali Lokhande, jaegeuk, linux-f2fs-devel; +Cc: linux-kernel On 2020/4/30 18:58, Sayali Lokhande wrote: > There could be a scenario where f2fs_sync_node_pages gets > called during checkpoint, which in turn tries to flush > inline data and calls iput(). This results in deadlock as > iput() tries to hold cp_rwsem, which is already held at the > beginning by checkpoint->block_operations(). > > Call stack : > > Thread A Thread B > f2fs_write_checkpoint() > - block_operations(sbi) > - f2fs_lock_all(sbi); > - down_write(&sbi->cp_rwsem); > > - open() > - igrab() > - write() write inline data > - unlink() > - f2fs_sync_node_pages() > - if (is_inline_node(page)) > - flush_inline_data() > - ilookup() > page = f2fs_pagecache_get_page() > if (!page) > goto iput_out; > iput_out: > -close() > -iput() > iput(inode); > - f2fs_evict_inode() > - f2fs_truncate_blocks() > - f2fs_lock_op() > - down_read(&sbi->cp_rwsem); > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Thanks, _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-04-30 10:58 [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint Sayali Lokhande 2020-05-06 6:21 ` Chao Yu @ 2020-05-08 16:10 ` Jaegeuk Kim 2020-05-09 3:02 ` Chao Yu 1 sibling, 1 reply; 10+ messages in thread From: Jaegeuk Kim @ 2020-05-08 16:10 UTC (permalink / raw) To: Sayali Lokhande; +Cc: linux-kernel, linux-f2fs-devel Hi Sayali, In order to address the perf regression, how about this? From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim <jaegeuk@kernel.org> Date: Fri, 8 May 2020 09:08:37 -0700 Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> --- fs/f2fs/node.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 1db8cabf727ef..626d7daca09de 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, goto continue_unlock; } - /* flush inline_data */ - if (is_inline_node(page)) { + /* flush inline_data, if it's not sync path. */ + if (do_balance && is_inline_node(page)) { clear_inline_node(page); unlock_page(page); flush_inline_data(sbi, ino_of_node(page)); -- 2.26.2.645.ge9eca65c58-goog _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-08 16:10 ` Jaegeuk Kim @ 2020-05-09 3:02 ` Chao Yu 2020-05-09 19:03 ` Jaegeuk Kim 0 siblings, 1 reply; 10+ messages in thread From: Chao Yu @ 2020-05-09 3:02 UTC (permalink / raw) To: Jaegeuk Kim, Sayali Lokhande; +Cc: linux-kernel, linux-f2fs-devel On 2020/5/9 0:10, Jaegeuk Kim wrote: > Hi Sayali, > > In order to address the perf regression, how about this? > >>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 > From: Jaegeuk Kim <jaegeuk@kernel.org> > Date: Fri, 8 May 2020 09:08:37 -0700 > Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint > > There could be a scenario where f2fs_sync_node_pages gets > called during checkpoint, which in turn tries to flush > inline data and calls iput(). This results in deadlock as > iput() tries to hold cp_rwsem, which is already held at the > beginning by checkpoint->block_operations(). > > Call stack : > > Thread A Thread B > f2fs_write_checkpoint() > - block_operations(sbi) > - f2fs_lock_all(sbi); > - down_write(&sbi->cp_rwsem); > > - open() > - igrab() > - write() write inline data > - unlink() > - f2fs_sync_node_pages() > - if (is_inline_node(page)) > - flush_inline_data() > - ilookup() > page = f2fs_pagecache_get_page() > if (!page) > goto iput_out; > iput_out: > -close() > -iput() > iput(inode); > - f2fs_evict_inode() > - f2fs_truncate_blocks() > - f2fs_lock_op() > - down_read(&sbi->cp_rwsem); > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > --- > fs/f2fs/node.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 1db8cabf727ef..626d7daca09de 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > goto continue_unlock; > } > > - /* flush inline_data */ > - if (is_inline_node(page)) { > + /* flush inline_data, if it's not sync path. */ > + if (do_balance && is_inline_node(page)) { IIRC, this flow was designed to avoid running out of free space issue during checkpoint: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly I guess this may cause failing one case of fstest. Since block_operations->f2fs_sync_inode_meta has synced inode cache to inode page, so in block_operations->f2fs_sync_node_pages, could we check nlink before flush_inline_data(): if (is_inline_node(page)) { if (IS_INODE(page) && raw_inode_page->i_links) { flush_inline_data() } } > clear_inline_node(page); > unlock_page(page); > flush_inline_data(sbi, ino_of_node(page)); > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-09 3:02 ` Chao Yu @ 2020-05-09 19:03 ` Jaegeuk Kim 2020-05-11 1:28 ` Chao Yu 0 siblings, 1 reply; 10+ messages in thread From: Jaegeuk Kim @ 2020-05-09 19:03 UTC (permalink / raw) To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel On 05/09, Chao Yu wrote: > On 2020/5/9 0:10, Jaegeuk Kim wrote: > > Hi Sayali, > > > > In order to address the perf regression, how about this? > > > >>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 > > From: Jaegeuk Kim <jaegeuk@kernel.org> > > Date: Fri, 8 May 2020 09:08:37 -0700 > > Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint > > > > There could be a scenario where f2fs_sync_node_pages gets > > called during checkpoint, which in turn tries to flush > > inline data and calls iput(). This results in deadlock as > > iput() tries to hold cp_rwsem, which is already held at the > > beginning by checkpoint->block_operations(). > > > > Call stack : > > > > Thread A Thread B > > f2fs_write_checkpoint() > > - block_operations(sbi) > > - f2fs_lock_all(sbi); > > - down_write(&sbi->cp_rwsem); > > > > - open() > > - igrab() > > - write() write inline data > > - unlink() > > - f2fs_sync_node_pages() > > - if (is_inline_node(page)) > > - flush_inline_data() > > - ilookup() > > page = f2fs_pagecache_get_page() > > if (!page) > > goto iput_out; > > iput_out: > > -close() > > -iput() > > iput(inode); > > - f2fs_evict_inode() > > - f2fs_truncate_blocks() > > - f2fs_lock_op() > > - down_read(&sbi->cp_rwsem); > > > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > --- > > fs/f2fs/node.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index 1db8cabf727ef..626d7daca09de 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > > goto continue_unlock; > > } > > > > - /* flush inline_data */ > > - if (is_inline_node(page)) { > > + /* flush inline_data, if it's not sync path. */ > > + if (do_balance && is_inline_node(page)) { > > IIRC, this flow was designed to avoid running out of free space issue > during checkpoint: > > 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > > The sceanrio is: > 1. create fully node blocks > 2. flush node blocks > 3. write inline_data for all the node blocks again > 4. flush node blocks redundantly > > I guess this may cause failing one case of fstest. Yeah, actually I was hitting 204 failure, and thus, revised like this. Now, I don't see any regression in fstest. From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 From: Sayali Lokhande <sayalil@codeaurora.org> Date: Thu, 30 Apr 2020 16:28:29 +0530 Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> --- fs/f2fs/checkpoint.c | 9 ++++++++- fs/f2fs/f2fs.h | 4 ++-- fs/f2fs/node.c | 10 +++++----- 3 files changed, 15 insertions(+), 8 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index d49f7a01d8a26..928aea4ff663d 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) }; int err = 0, cnt = 0; + /* + * Let's flush node pages first to flush inline_data. + * We'll actually guarantee everything below under f2fs_lock_all. + */ + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); + retry_flush_quotas: f2fs_lock_all(sbi); if (__need_flush_quota(sbi)) { @@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi) if (get_pages(sbi, F2FS_DIRTY_NODES)) { up_write(&sbi->node_write); atomic_inc(&sbi->wb_sync_req[NODE]); - err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO); + err = f2fs_sync_node_pages(sbi, &wbc, false, + true, FS_CP_NODE_IO); atomic_dec(&sbi->wb_sync_req[NODE]); if (err) { up_write(&sbi->node_change); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d916540f12813..ac6ae42b9dd4e 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, struct writeback_control *wbc, bool atomic, unsigned int *seq_id); int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, - struct writeback_control *wbc, - bool do_balance, enum iostat_type io_type); + struct writeback_control *wbc, + bool do_balance, bool sync, enum iostat_type io_type); int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount); bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid); void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 1db8cabf727ef..fd00a8c119088 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page) } int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, - struct writeback_control *wbc, - bool do_balance, enum iostat_type io_type) + struct writeback_control *wbc, + bool do_balance, bool sync, enum iostat_type io_type) { pgoff_t index; struct pagevec pvec; @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, goto continue_unlock; } - /* flush inline_data */ - if (is_inline_node(page)) { + /* flush inline_data, if it's async context. */ + if (!sync && is_inline_node(page)) { clear_inline_node(page); unlock_page(page); flush_inline_data(sbi, ino_of_node(page)); @@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space *mapping, diff = nr_pages_to_write(sbi, NODE, wbc); blk_start_plug(&plug); - f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO); + f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO); blk_finish_plug(&plug); wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff); -- 2.26.2.645.ge9eca65c58-goog _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-09 19:03 ` Jaegeuk Kim @ 2020-05-11 1:28 ` Chao Yu 2020-05-11 22:11 ` Jaegeuk Kim 0 siblings, 1 reply; 10+ messages in thread From: Chao Yu @ 2020-05-11 1:28 UTC (permalink / raw) To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel On 2020/5/10 3:03, Jaegeuk Kim wrote: > On 05/09, Chao Yu wrote: >> On 2020/5/9 0:10, Jaegeuk Kim wrote: >>> Hi Sayali, >>> >>> In order to address the perf regression, how about this? >>> >>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 >>> From: Jaegeuk Kim <jaegeuk@kernel.org> >>> Date: Fri, 8 May 2020 09:08:37 -0700 >>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint >>> >>> There could be a scenario where f2fs_sync_node_pages gets >>> called during checkpoint, which in turn tries to flush >>> inline data and calls iput(). This results in deadlock as >>> iput() tries to hold cp_rwsem, which is already held at the >>> beginning by checkpoint->block_operations(). >>> >>> Call stack : >>> >>> Thread A Thread B >>> f2fs_write_checkpoint() >>> - block_operations(sbi) >>> - f2fs_lock_all(sbi); >>> - down_write(&sbi->cp_rwsem); >>> >>> - open() >>> - igrab() >>> - write() write inline data >>> - unlink() >>> - f2fs_sync_node_pages() >>> - if (is_inline_node(page)) >>> - flush_inline_data() >>> - ilookup() >>> page = f2fs_pagecache_get_page() >>> if (!page) >>> goto iput_out; >>> iput_out: >>> -close() >>> -iput() >>> iput(inode); >>> - f2fs_evict_inode() >>> - f2fs_truncate_blocks() >>> - f2fs_lock_op() >>> - down_read(&sbi->cp_rwsem); >>> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>> --- >>> fs/f2fs/node.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c >>> index 1db8cabf727ef..626d7daca09de 100644 >>> --- a/fs/f2fs/node.c >>> +++ b/fs/f2fs/node.c >>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, >>> goto continue_unlock; >>> } >>> >>> - /* flush inline_data */ >>> - if (is_inline_node(page)) { >>> + /* flush inline_data, if it's not sync path. */ >>> + if (do_balance && is_inline_node(page)) { >> >> IIRC, this flow was designed to avoid running out of free space issue >> during checkpoint: >> >> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >> >> The sceanrio is: >> 1. create fully node blocks >> 2. flush node blocks >> 3. write inline_data for all the node blocks again >> 4. flush node blocks redundantly >> >> I guess this may cause failing one case of fstest. > > Yeah, actually I was hitting 204 failure, and thus, revised like this. > Now, I don't see any regression in fstest. > >>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 > From: Sayali Lokhande <sayalil@codeaurora.org> > Date: Thu, 30 Apr 2020 16:28:29 +0530 > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint > > There could be a scenario where f2fs_sync_node_pages gets > called during checkpoint, which in turn tries to flush > inline data and calls iput(). This results in deadlock as > iput() tries to hold cp_rwsem, which is already held at the > beginning by checkpoint->block_operations(). > > Call stack : > > Thread A Thread B > f2fs_write_checkpoint() > - block_operations(sbi) > - f2fs_lock_all(sbi); > - down_write(&sbi->cp_rwsem); > > - open() > - igrab() > - write() write inline data > - unlink() > - f2fs_sync_node_pages() > - if (is_inline_node(page)) > - flush_inline_data() > - ilookup() > page = f2fs_pagecache_get_page() > if (!page) > goto iput_out; > iput_out: > -close() > -iput() > iput(inode); > - f2fs_evict_inode() > - f2fs_truncate_blocks() > - f2fs_lock_op() > - down_read(&sbi->cp_rwsem); > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > --- > fs/f2fs/checkpoint.c | 9 ++++++++- > fs/f2fs/f2fs.h | 4 ++-- > fs/f2fs/node.c | 10 +++++----- > 3 files changed, 15 insertions(+), 8 deletions(-) > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > index d49f7a01d8a26..928aea4ff663d 100644 > --- a/fs/f2fs/checkpoint.c > +++ b/fs/f2fs/checkpoint.c > @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) > }; > int err = 0, cnt = 0; > > + /* > + * Let's flush node pages first to flush inline_data. > + * We'll actually guarantee everything below under f2fs_lock_all. > + */ > + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); It is possible that user write a large number of inline data in between f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in race condition. Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will flush inode page twice which is unneeded. f2fs_sync_node_pages() --- flush dirty inode page f2fs_lock_all() ... f2fs_sync_inode_meta() --- update dirty inode page f2fs_sync_node_pages() --- flush dirty inode page again. Thanks, > + > retry_flush_quotas: > f2fs_lock_all(sbi); > if (__need_flush_quota(sbi)) { > @@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi) > if (get_pages(sbi, F2FS_DIRTY_NODES)) { > up_write(&sbi->node_write); > atomic_inc(&sbi->wb_sync_req[NODE]); > - err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO); > + err = f2fs_sync_node_pages(sbi, &wbc, false, > + true, FS_CP_NODE_IO); > atomic_dec(&sbi->wb_sync_req[NODE]); > if (err) { > up_write(&sbi->node_change); > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index d916540f12813..ac6ae42b9dd4e 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, > struct writeback_control *wbc, bool atomic, > unsigned int *seq_id); > int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > - struct writeback_control *wbc, > - bool do_balance, enum iostat_type io_type); > + struct writeback_control *wbc, > + bool do_balance, bool sync, enum iostat_type io_type); > int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount); > bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid); > void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid); > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 1db8cabf727ef..fd00a8c119088 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page) > } > > int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > - struct writeback_control *wbc, > - bool do_balance, enum iostat_type io_type) > + struct writeback_control *wbc, > + bool do_balance, bool sync, enum iostat_type io_type) > { > pgoff_t index; > struct pagevec pvec; > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > goto continue_unlock; > } > > - /* flush inline_data */ > - if (is_inline_node(page)) { > + /* flush inline_data, if it's async context. */ > + if (!sync && is_inline_node(page)) { > clear_inline_node(page); > unlock_page(page); > flush_inline_data(sbi, ino_of_node(page)); > @@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space *mapping, > > diff = nr_pages_to_write(sbi, NODE, wbc); > blk_start_plug(&plug); > - f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO); > + f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO); > blk_finish_plug(&plug); > wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff); > > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-11 1:28 ` Chao Yu @ 2020-05-11 22:11 ` Jaegeuk Kim 2020-05-12 1:57 ` Chao Yu 0 siblings, 1 reply; 10+ messages in thread From: Jaegeuk Kim @ 2020-05-11 22:11 UTC (permalink / raw) To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel On 05/11, Chao Yu wrote: > On 2020/5/10 3:03, Jaegeuk Kim wrote: > > On 05/09, Chao Yu wrote: > >> On 2020/5/9 0:10, Jaegeuk Kim wrote: > >>> Hi Sayali, > >>> > >>> In order to address the perf regression, how about this? > >>> > >>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 > >>> From: Jaegeuk Kim <jaegeuk@kernel.org> > >>> Date: Fri, 8 May 2020 09:08:37 -0700 > >>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint > >>> > >>> There could be a scenario where f2fs_sync_node_pages gets > >>> called during checkpoint, which in turn tries to flush > >>> inline data and calls iput(). This results in deadlock as > >>> iput() tries to hold cp_rwsem, which is already held at the > >>> beginning by checkpoint->block_operations(). > >>> > >>> Call stack : > >>> > >>> Thread A Thread B > >>> f2fs_write_checkpoint() > >>> - block_operations(sbi) > >>> - f2fs_lock_all(sbi); > >>> - down_write(&sbi->cp_rwsem); > >>> > >>> - open() > >>> - igrab() > >>> - write() write inline data > >>> - unlink() > >>> - f2fs_sync_node_pages() > >>> - if (is_inline_node(page)) > >>> - flush_inline_data() > >>> - ilookup() > >>> page = f2fs_pagecache_get_page() > >>> if (!page) > >>> goto iput_out; > >>> iput_out: > >>> -close() > >>> -iput() > >>> iput(inode); > >>> - f2fs_evict_inode() > >>> - f2fs_truncate_blocks() > >>> - f2fs_lock_op() > >>> - down_read(&sbi->cp_rwsem); > >>> > >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > >>> --- > >>> fs/f2fs/node.c | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > >>> index 1db8cabf727ef..626d7daca09de 100644 > >>> --- a/fs/f2fs/node.c > >>> +++ b/fs/f2fs/node.c > >>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > >>> goto continue_unlock; > >>> } > >>> > >>> - /* flush inline_data */ > >>> - if (is_inline_node(page)) { > >>> + /* flush inline_data, if it's not sync path. */ > >>> + if (do_balance && is_inline_node(page)) { > >> > >> IIRC, this flow was designed to avoid running out of free space issue > >> during checkpoint: > >> > >> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > >> > >> The sceanrio is: > >> 1. create fully node blocks > >> 2. flush node blocks > >> 3. write inline_data for all the node blocks again > >> 4. flush node blocks redundantly > >> > >> I guess this may cause failing one case of fstest. > > > > Yeah, actually I was hitting 204 failure, and thus, revised like this. > > Now, I don't see any regression in fstest. > > > >>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 > > From: Sayali Lokhande <sayalil@codeaurora.org> > > Date: Thu, 30 Apr 2020 16:28:29 +0530 > > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint > > > > There could be a scenario where f2fs_sync_node_pages gets > > called during checkpoint, which in turn tries to flush > > inline data and calls iput(). This results in deadlock as > > iput() tries to hold cp_rwsem, which is already held at the > > beginning by checkpoint->block_operations(). > > > > Call stack : > > > > Thread A Thread B > > f2fs_write_checkpoint() > > - block_operations(sbi) > > - f2fs_lock_all(sbi); > > - down_write(&sbi->cp_rwsem); > > > > - open() > > - igrab() > > - write() write inline data > > - unlink() > > - f2fs_sync_node_pages() > > - if (is_inline_node(page)) > > - flush_inline_data() > > - ilookup() > > page = f2fs_pagecache_get_page() > > if (!page) > > goto iput_out; > > iput_out: > > -close() > > -iput() > > iput(inode); > > - f2fs_evict_inode() > > - f2fs_truncate_blocks() > > - f2fs_lock_op() > > - down_read(&sbi->cp_rwsem); > > > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > --- > > fs/f2fs/checkpoint.c | 9 ++++++++- > > fs/f2fs/f2fs.h | 4 ++-- > > fs/f2fs/node.c | 10 +++++----- > > 3 files changed, 15 insertions(+), 8 deletions(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index d49f7a01d8a26..928aea4ff663d 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) > > }; > > int err = 0, cnt = 0; > > > > + /* > > + * Let's flush node pages first to flush inline_data. > > + * We'll actually guarantee everything below under f2fs_lock_all. > > + */ > > + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); > > It is possible that user write a large number of inline data in between > f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in > race condition. > > Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will > flush inode page twice which is unneeded. > > f2fs_sync_node_pages() --- flush dirty inode page > f2fs_lock_all() > ... > f2fs_sync_inode_meta() --- update dirty inode page > f2fs_sync_node_pages() --- flush dirty inode page again. > Another version: From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001 From: Sayali Lokhande <sayalil@codeaurora.org> Date: Thu, 30 Apr 2020 16:28:29 +0530 Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> --- fs/f2fs/checkpoint.c | 5 +++++ fs/f2fs/f2fs.h | 1 + fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index d49f7a01d8a26..79e605f38f4fa 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi) }; int err = 0, cnt = 0; + /* + * Let's flush inline_data in dirty node pages. + */ + f2fs_flush_inline_data(sbi); + retry_flush_quotas: f2fs_lock_all(sbi); if (__need_flush_quota(sbi)) { diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2a8ea81c52a15..7f3d259e7e376 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); struct page *f2fs_get_node_page_ra(struct page *parent, int start); int f2fs_move_node_page(struct page *node_page, int gc_type); +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi); int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, struct writeback_control *wbc, bool atomic, unsigned int *seq_id); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 1db8cabf727ef..e632de10aedab 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page) return true; } +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi) +{ + pgoff_t index = 0; + struct pagevec pvec; + int nr_pages; + int ret = 0; + + pagevec_init(&pvec); + + while ((nr_pages = pagevec_lookup_tag(&pvec, + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { + int i; + + for (i = 0; i < nr_pages; i++) { + struct page *page = pvec.pages[i]; + + if (!IS_DNODE(page)) + continue; + + lock_page(page); + + if (unlikely(page->mapping != NODE_MAPPING(sbi))) { +continue_unlock: + unlock_page(page); + continue; + } + + if (!PageDirty(page)) { + /* someone wrote it for us */ + goto continue_unlock; + } + + /* flush inline_data, if it's async context. */ + if (is_inline_node(page)) { + clear_inline_node(page); + unlock_page(page); + flush_inline_data(sbi, ino_of_node(page)); + continue; + } + unlock_page(page); + } + pagevec_release(&pvec); + cond_resched(); + } + return ret; +} + int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc, bool do_balance, enum iostat_type io_type) @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, goto continue_unlock; } - /* flush inline_data */ - if (is_inline_node(page)) { + /* flush inline_data, if it's async context. */ + if (do_balance && is_inline_node(page)) { clear_inline_node(page); unlock_page(page); flush_inline_data(sbi, ino_of_node(page)); -- 2.26.2.645.ge9eca65c58-goog _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-11 22:11 ` Jaegeuk Kim @ 2020-05-12 1:57 ` Chao Yu 2020-05-12 3:24 ` Jaegeuk Kim 0 siblings, 1 reply; 10+ messages in thread From: Chao Yu @ 2020-05-12 1:57 UTC (permalink / raw) To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel On 2020/5/12 6:11, Jaegeuk Kim wrote: > On 05/11, Chao Yu wrote: >> On 2020/5/10 3:03, Jaegeuk Kim wrote: >>> On 05/09, Chao Yu wrote: >>>> On 2020/5/9 0:10, Jaegeuk Kim wrote: >>>>> Hi Sayali, >>>>> >>>>> In order to address the perf regression, how about this? >>>>> >>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 >>>>> From: Jaegeuk Kim <jaegeuk@kernel.org> >>>>> Date: Fri, 8 May 2020 09:08:37 -0700 >>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint >>>>> >>>>> There could be a scenario where f2fs_sync_node_pages gets >>>>> called during checkpoint, which in turn tries to flush >>>>> inline data and calls iput(). This results in deadlock as >>>>> iput() tries to hold cp_rwsem, which is already held at the >>>>> beginning by checkpoint->block_operations(). >>>>> >>>>> Call stack : >>>>> >>>>> Thread A Thread B >>>>> f2fs_write_checkpoint() >>>>> - block_operations(sbi) >>>>> - f2fs_lock_all(sbi); >>>>> - down_write(&sbi->cp_rwsem); >>>>> >>>>> - open() >>>>> - igrab() >>>>> - write() write inline data >>>>> - unlink() >>>>> - f2fs_sync_node_pages() >>>>> - if (is_inline_node(page)) >>>>> - flush_inline_data() >>>>> - ilookup() >>>>> page = f2fs_pagecache_get_page() >>>>> if (!page) >>>>> goto iput_out; >>>>> iput_out: >>>>> -close() >>>>> -iput() >>>>> iput(inode); >>>>> - f2fs_evict_inode() >>>>> - f2fs_truncate_blocks() >>>>> - f2fs_lock_op() >>>>> - down_read(&sbi->cp_rwsem); >>>>> >>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> >>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>>>> --- >>>>> fs/f2fs/node.c | 4 ++-- >>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c >>>>> index 1db8cabf727ef..626d7daca09de 100644 >>>>> --- a/fs/f2fs/node.c >>>>> +++ b/fs/f2fs/node.c >>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, >>>>> goto continue_unlock; >>>>> } >>>>> >>>>> - /* flush inline_data */ >>>>> - if (is_inline_node(page)) { >>>>> + /* flush inline_data, if it's not sync path. */ >>>>> + if (do_balance && is_inline_node(page)) { >>>> >>>> IIRC, this flow was designed to avoid running out of free space issue >>>> during checkpoint: >>>> >>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>>> >>>> The sceanrio is: >>>> 1. create fully node blocks >>>> 2. flush node blocks >>>> 3. write inline_data for all the node blocks again >>>> 4. flush node blocks redundantly >>>> >>>> I guess this may cause failing one case of fstest. >>> >>> Yeah, actually I was hitting 204 failure, and thus, revised like this. >>> Now, I don't see any regression in fstest. >>> >>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 >>> From: Sayali Lokhande <sayalil@codeaurora.org> >>> Date: Thu, 30 Apr 2020 16:28:29 +0530 >>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint >>> >>> There could be a scenario where f2fs_sync_node_pages gets >>> called during checkpoint, which in turn tries to flush >>> inline data and calls iput(). This results in deadlock as >>> iput() tries to hold cp_rwsem, which is already held at the >>> beginning by checkpoint->block_operations(). >>> >>> Call stack : >>> >>> Thread A Thread B >>> f2fs_write_checkpoint() >>> - block_operations(sbi) >>> - f2fs_lock_all(sbi); >>> - down_write(&sbi->cp_rwsem); >>> >>> - open() >>> - igrab() >>> - write() write inline data >>> - unlink() >>> - f2fs_sync_node_pages() >>> - if (is_inline_node(page)) >>> - flush_inline_data() >>> - ilookup() >>> page = f2fs_pagecache_get_page() >>> if (!page) >>> goto iput_out; >>> iput_out: >>> -close() >>> -iput() >>> iput(inode); >>> - f2fs_evict_inode() >>> - f2fs_truncate_blocks() >>> - f2fs_lock_op() >>> - down_read(&sbi->cp_rwsem); >>> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>> --- >>> fs/f2fs/checkpoint.c | 9 ++++++++- >>> fs/f2fs/f2fs.h | 4 ++-- >>> fs/f2fs/node.c | 10 +++++----- >>> 3 files changed, 15 insertions(+), 8 deletions(-) >>> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c >>> index d49f7a01d8a26..928aea4ff663d 100644 >>> --- a/fs/f2fs/checkpoint.c >>> +++ b/fs/f2fs/checkpoint.c >>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) >>> }; >>> int err = 0, cnt = 0; >>> >>> + /* >>> + * Let's flush node pages first to flush inline_data. >>> + * We'll actually guarantee everything below under f2fs_lock_all. >>> + */ >>> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); >> >> It is possible that user write a large number of inline data in between >> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in >> race condition. >> >> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will >> flush inode page twice which is unneeded. >> >> f2fs_sync_node_pages() --- flush dirty inode page >> f2fs_lock_all() >> ... >> f2fs_sync_inode_meta() --- update dirty inode page >> f2fs_sync_node_pages() --- flush dirty inode page again. >> > > Another version: > >>From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001 > From: Sayali Lokhande <sayalil@codeaurora.org> > Date: Thu, 30 Apr 2020 16:28:29 +0530 > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint > > There could be a scenario where f2fs_sync_node_pages gets > called during checkpoint, which in turn tries to flush > inline data and calls iput(). This results in deadlock as > iput() tries to hold cp_rwsem, which is already held at the > beginning by checkpoint->block_operations(). > > Call stack : > > Thread A Thread B > f2fs_write_checkpoint() > - block_operations(sbi) > - f2fs_lock_all(sbi); > - down_write(&sbi->cp_rwsem); > > - open() > - igrab() > - write() write inline data > - unlink() > - f2fs_sync_node_pages() > - if (is_inline_node(page)) > - flush_inline_data() > - ilookup() > page = f2fs_pagecache_get_page() > if (!page) > goto iput_out; > iput_out: > -close() > -iput() > iput(inode); > - f2fs_evict_inode() > - f2fs_truncate_blocks() > - f2fs_lock_op() > - down_read(&sbi->cp_rwsem); > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > --- > fs/f2fs/checkpoint.c | 5 +++++ > fs/f2fs/f2fs.h | 1 + > fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 55 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > index d49f7a01d8a26..79e605f38f4fa 100644 > --- a/fs/f2fs/checkpoint.c > +++ b/fs/f2fs/checkpoint.c > @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi) > }; > int err = 0, cnt = 0; > > + /* > + * Let's flush inline_data in dirty node pages. > + */ > + f2fs_flush_inline_data(sbi); Still there is a gap, user can write a large number of inline data here... Would that be enough? I doubt we can suffer this issue in below pathes as well: - block_operations - f2fs_sync_dirty_inodes - iput - f2fs_sync_inode_meta - iput Thanks, > + > retry_flush_quotas: > f2fs_lock_all(sbi); > if (__need_flush_quota(sbi)) { > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 2a8ea81c52a15..7f3d259e7e376 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); > struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); > struct page *f2fs_get_node_page_ra(struct page *parent, int start); > int f2fs_move_node_page(struct page *node_page, int gc_type); > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi); > int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, > struct writeback_control *wbc, bool atomic, > unsigned int *seq_id); > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 1db8cabf727ef..e632de10aedab 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page) > return true; > } > > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi) > +{ > + pgoff_t index = 0; > + struct pagevec pvec; > + int nr_pages; > + int ret = 0; > + > + pagevec_init(&pvec); > + > + while ((nr_pages = pagevec_lookup_tag(&pvec, > + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { > + int i; > + > + for (i = 0; i < nr_pages; i++) { > + struct page *page = pvec.pages[i]; > + > + if (!IS_DNODE(page)) > + continue; > + > + lock_page(page); > + > + if (unlikely(page->mapping != NODE_MAPPING(sbi))) { > +continue_unlock: > + unlock_page(page); > + continue; > + } > + > + if (!PageDirty(page)) { > + /* someone wrote it for us */ > + goto continue_unlock; > + } > + > + /* flush inline_data, if it's async context. */ > + if (is_inline_node(page)) { > + clear_inline_node(page); > + unlock_page(page); > + flush_inline_data(sbi, ino_of_node(page)); > + continue; > + } > + unlock_page(page); > + } > + pagevec_release(&pvec); > + cond_resched(); > + } > + return ret; > +} > + > int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > struct writeback_control *wbc, > bool do_balance, enum iostat_type io_type) > @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > goto continue_unlock; > } > > - /* flush inline_data */ > - if (is_inline_node(page)) { > + /* flush inline_data, if it's async context. */ > + if (do_balance && is_inline_node(page)) { > clear_inline_node(page); > unlock_page(page); > flush_inline_data(sbi, ino_of_node(page)); > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-12 1:57 ` Chao Yu @ 2020-05-12 3:24 ` Jaegeuk Kim 2020-05-12 6:49 ` Chao Yu 0 siblings, 1 reply; 10+ messages in thread From: Jaegeuk Kim @ 2020-05-12 3:24 UTC (permalink / raw) To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel On 05/12, Chao Yu wrote: > On 2020/5/12 6:11, Jaegeuk Kim wrote: > > On 05/11, Chao Yu wrote: > >> On 2020/5/10 3:03, Jaegeuk Kim wrote: > >>> On 05/09, Chao Yu wrote: > >>>> On 2020/5/9 0:10, Jaegeuk Kim wrote: > >>>>> Hi Sayali, > >>>>> > >>>>> In order to address the perf regression, how about this? > >>>>> > >>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 > >>>>> From: Jaegeuk Kim <jaegeuk@kernel.org> > >>>>> Date: Fri, 8 May 2020 09:08:37 -0700 > >>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint > >>>>> > >>>>> There could be a scenario where f2fs_sync_node_pages gets > >>>>> called during checkpoint, which in turn tries to flush > >>>>> inline data and calls iput(). This results in deadlock as > >>>>> iput() tries to hold cp_rwsem, which is already held at the > >>>>> beginning by checkpoint->block_operations(). > >>>>> > >>>>> Call stack : > >>>>> > >>>>> Thread A Thread B > >>>>> f2fs_write_checkpoint() > >>>>> - block_operations(sbi) > >>>>> - f2fs_lock_all(sbi); > >>>>> - down_write(&sbi->cp_rwsem); > >>>>> > >>>>> - open() > >>>>> - igrab() > >>>>> - write() write inline data > >>>>> - unlink() > >>>>> - f2fs_sync_node_pages() > >>>>> - if (is_inline_node(page)) > >>>>> - flush_inline_data() > >>>>> - ilookup() > >>>>> page = f2fs_pagecache_get_page() > >>>>> if (!page) > >>>>> goto iput_out; > >>>>> iput_out: > >>>>> -close() > >>>>> -iput() > >>>>> iput(inode); > >>>>> - f2fs_evict_inode() > >>>>> - f2fs_truncate_blocks() > >>>>> - f2fs_lock_op() > >>>>> - down_read(&sbi->cp_rwsem); > >>>>> > >>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > >>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > >>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > >>>>> --- > >>>>> fs/f2fs/node.c | 4 ++-- > >>>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>>>> > >>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > >>>>> index 1db8cabf727ef..626d7daca09de 100644 > >>>>> --- a/fs/f2fs/node.c > >>>>> +++ b/fs/f2fs/node.c > >>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > >>>>> goto continue_unlock; > >>>>> } > >>>>> > >>>>> - /* flush inline_data */ > >>>>> - if (is_inline_node(page)) { > >>>>> + /* flush inline_data, if it's not sync path. */ > >>>>> + if (do_balance && is_inline_node(page)) { > >>>> > >>>> IIRC, this flow was designed to avoid running out of free space issue > >>>> during checkpoint: > >>>> > >>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > >>>> > >>>> The sceanrio is: > >>>> 1. create fully node blocks > >>>> 2. flush node blocks > >>>> 3. write inline_data for all the node blocks again > >>>> 4. flush node blocks redundantly > >>>> > >>>> I guess this may cause failing one case of fstest. > >>> > >>> Yeah, actually I was hitting 204 failure, and thus, revised like this. > >>> Now, I don't see any regression in fstest. > >>> > >>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 > >>> From: Sayali Lokhande <sayalil@codeaurora.org> > >>> Date: Thu, 30 Apr 2020 16:28:29 +0530 > >>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint > >>> > >>> There could be a scenario where f2fs_sync_node_pages gets > >>> called during checkpoint, which in turn tries to flush > >>> inline data and calls iput(). This results in deadlock as > >>> iput() tries to hold cp_rwsem, which is already held at the > >>> beginning by checkpoint->block_operations(). > >>> > >>> Call stack : > >>> > >>> Thread A Thread B > >>> f2fs_write_checkpoint() > >>> - block_operations(sbi) > >>> - f2fs_lock_all(sbi); > >>> - down_write(&sbi->cp_rwsem); > >>> > >>> - open() > >>> - igrab() > >>> - write() write inline data > >>> - unlink() > >>> - f2fs_sync_node_pages() > >>> - if (is_inline_node(page)) > >>> - flush_inline_data() > >>> - ilookup() > >>> page = f2fs_pagecache_get_page() > >>> if (!page) > >>> goto iput_out; > >>> iput_out: > >>> -close() > >>> -iput() > >>> iput(inode); > >>> - f2fs_evict_inode() > >>> - f2fs_truncate_blocks() > >>> - f2fs_lock_op() > >>> - down_read(&sbi->cp_rwsem); > >>> > >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > >>> --- > >>> fs/f2fs/checkpoint.c | 9 ++++++++- > >>> fs/f2fs/f2fs.h | 4 ++-- > >>> fs/f2fs/node.c | 10 +++++----- > >>> 3 files changed, 15 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > >>> index d49f7a01d8a26..928aea4ff663d 100644 > >>> --- a/fs/f2fs/checkpoint.c > >>> +++ b/fs/f2fs/checkpoint.c > >>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) > >>> }; > >>> int err = 0, cnt = 0; > >>> > >>> + /* > >>> + * Let's flush node pages first to flush inline_data. > >>> + * We'll actually guarantee everything below under f2fs_lock_all. > >>> + */ > >>> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); > >> > >> It is possible that user write a large number of inline data in between > >> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in > >> race condition. > >> > >> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will > >> flush inode page twice which is unneeded. > >> > >> f2fs_sync_node_pages() --- flush dirty inode page > >> f2fs_lock_all() > >> ... > >> f2fs_sync_inode_meta() --- update dirty inode page > >> f2fs_sync_node_pages() --- flush dirty inode page again. > >> > > > > Another version: > > > >>From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001 > > From: Sayali Lokhande <sayalil@codeaurora.org> > > Date: Thu, 30 Apr 2020 16:28:29 +0530 > > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint > > > > There could be a scenario where f2fs_sync_node_pages gets > > called during checkpoint, which in turn tries to flush > > inline data and calls iput(). This results in deadlock as > > iput() tries to hold cp_rwsem, which is already held at the > > beginning by checkpoint->block_operations(). > > > > Call stack : > > > > Thread A Thread B > > f2fs_write_checkpoint() > > - block_operations(sbi) > > - f2fs_lock_all(sbi); > > - down_write(&sbi->cp_rwsem); > > > > - open() > > - igrab() > > - write() write inline data > > - unlink() > > - f2fs_sync_node_pages() > > - if (is_inline_node(page)) > > - flush_inline_data() > > - ilookup() > > page = f2fs_pagecache_get_page() > > if (!page) > > goto iput_out; > > iput_out: > > -close() > > -iput() > > iput(inode); > > - f2fs_evict_inode() > > - f2fs_truncate_blocks() > > - f2fs_lock_op() > > - down_read(&sbi->cp_rwsem); > > > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > --- > > fs/f2fs/checkpoint.c | 5 +++++ > > fs/f2fs/f2fs.h | 1 + > > fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++-- > > 3 files changed, 55 insertions(+), 2 deletions(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index d49f7a01d8a26..79e605f38f4fa 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi) > > }; > > int err = 0, cnt = 0; > > > > + /* > > + * Let's flush inline_data in dirty node pages. > > + */ > > + f2fs_flush_inline_data(sbi); > > Still there is a gap, user can write a large number of inline data here... I think generic/204 is the case, and I don't hit a panic with this patch. > > Would that be enough? I doubt we can suffer this issue in below pathes > as well: I don't think so, since iput is called after f2fs_unlock_all(). > > - block_operations > - f2fs_sync_dirty_inodes > - iput > - f2fs_sync_inode_meta > - iput > > Thanks, > > > + > > retry_flush_quotas: > > f2fs_lock_all(sbi); > > if (__need_flush_quota(sbi)) { > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index 2a8ea81c52a15..7f3d259e7e376 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); > > struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); > > struct page *f2fs_get_node_page_ra(struct page *parent, int start); > > int f2fs_move_node_page(struct page *node_page, int gc_type); > > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi); > > int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, > > struct writeback_control *wbc, bool atomic, > > unsigned int *seq_id); > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index 1db8cabf727ef..e632de10aedab 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page) > > return true; > > } > > > > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi) > > +{ > > + pgoff_t index = 0; > > + struct pagevec pvec; > > + int nr_pages; > > + int ret = 0; > > + > > + pagevec_init(&pvec); > > + > > + while ((nr_pages = pagevec_lookup_tag(&pvec, > > + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { > > + int i; > > + > > + for (i = 0; i < nr_pages; i++) { > > + struct page *page = pvec.pages[i]; > > + > > + if (!IS_DNODE(page)) > > + continue; > > + > > + lock_page(page); > > + > > + if (unlikely(page->mapping != NODE_MAPPING(sbi))) { > > +continue_unlock: > > + unlock_page(page); > > + continue; > > + } > > + > > + if (!PageDirty(page)) { > > + /* someone wrote it for us */ > > + goto continue_unlock; > > + } > > + > > + /* flush inline_data, if it's async context. */ > > + if (is_inline_node(page)) { > > + clear_inline_node(page); > > + unlock_page(page); > > + flush_inline_data(sbi, ino_of_node(page)); > > + continue; > > + } > > + unlock_page(page); > > + } > > + pagevec_release(&pvec); > > + cond_resched(); > > + } > > + return ret; > > +} > > + > > int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > > struct writeback_control *wbc, > > bool do_balance, enum iostat_type io_type) > > @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > > goto continue_unlock; > > } > > > > - /* flush inline_data */ > > - if (is_inline_node(page)) { > > + /* flush inline_data, if it's async context. */ > > + if (do_balance && is_inline_node(page)) { > > clear_inline_node(page); > > unlock_page(page); > > flush_inline_data(sbi, ino_of_node(page)); > > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint 2020-05-12 3:24 ` Jaegeuk Kim @ 2020-05-12 6:49 ` Chao Yu 0 siblings, 0 replies; 10+ messages in thread From: Chao Yu @ 2020-05-12 6:49 UTC (permalink / raw) To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel On 2020/5/12 11:24, Jaegeuk Kim wrote: > On 05/12, Chao Yu wrote: >> On 2020/5/12 6:11, Jaegeuk Kim wrote: >>> On 05/11, Chao Yu wrote: >>>> On 2020/5/10 3:03, Jaegeuk Kim wrote: >>>>> On 05/09, Chao Yu wrote: >>>>>> On 2020/5/9 0:10, Jaegeuk Kim wrote: >>>>>>> Hi Sayali, >>>>>>> >>>>>>> In order to address the perf regression, how about this? >>>>>>> >>>>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 >>>>>>> From: Jaegeuk Kim <jaegeuk@kernel.org> >>>>>>> Date: Fri, 8 May 2020 09:08:37 -0700 >>>>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint >>>>>>> >>>>>>> There could be a scenario where f2fs_sync_node_pages gets >>>>>>> called during checkpoint, which in turn tries to flush >>>>>>> inline data and calls iput(). This results in deadlock as >>>>>>> iput() tries to hold cp_rwsem, which is already held at the >>>>>>> beginning by checkpoint->block_operations(). >>>>>>> >>>>>>> Call stack : >>>>>>> >>>>>>> Thread A Thread B >>>>>>> f2fs_write_checkpoint() >>>>>>> - block_operations(sbi) >>>>>>> - f2fs_lock_all(sbi); >>>>>>> - down_write(&sbi->cp_rwsem); >>>>>>> >>>>>>> - open() >>>>>>> - igrab() >>>>>>> - write() write inline data >>>>>>> - unlink() >>>>>>> - f2fs_sync_node_pages() >>>>>>> - if (is_inline_node(page)) >>>>>>> - flush_inline_data() >>>>>>> - ilookup() >>>>>>> page = f2fs_pagecache_get_page() >>>>>>> if (!page) >>>>>>> goto iput_out; >>>>>>> iput_out: >>>>>>> -close() >>>>>>> -iput() >>>>>>> iput(inode); >>>>>>> - f2fs_evict_inode() >>>>>>> - f2fs_truncate_blocks() >>>>>>> - f2fs_lock_op() >>>>>>> - down_read(&sbi->cp_rwsem); >>>>>>> >>>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>>>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> >>>>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>>>>>> --- >>>>>>> fs/f2fs/node.c | 4 ++-- >>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c >>>>>>> index 1db8cabf727ef..626d7daca09de 100644 >>>>>>> --- a/fs/f2fs/node.c >>>>>>> +++ b/fs/f2fs/node.c >>>>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, >>>>>>> goto continue_unlock; >>>>>>> } >>>>>>> >>>>>>> - /* flush inline_data */ >>>>>>> - if (is_inline_node(page)) { >>>>>>> + /* flush inline_data, if it's not sync path. */ >>>>>>> + if (do_balance && is_inline_node(page)) { >>>>>> >>>>>> IIRC, this flow was designed to avoid running out of free space issue >>>>>> during checkpoint: >>>>>> >>>>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>>>>> >>>>>> The sceanrio is: >>>>>> 1. create fully node blocks >>>>>> 2. flush node blocks >>>>>> 3. write inline_data for all the node blocks again >>>>>> 4. flush node blocks redundantly >>>>>> >>>>>> I guess this may cause failing one case of fstest. >>>>> >>>>> Yeah, actually I was hitting 204 failure, and thus, revised like this. >>>>> Now, I don't see any regression in fstest. >>>>> >>>>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 >>>>> From: Sayali Lokhande <sayalil@codeaurora.org> >>>>> Date: Thu, 30 Apr 2020 16:28:29 +0530 >>>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint >>>>> >>>>> There could be a scenario where f2fs_sync_node_pages gets >>>>> called during checkpoint, which in turn tries to flush >>>>> inline data and calls iput(). This results in deadlock as >>>>> iput() tries to hold cp_rwsem, which is already held at the >>>>> beginning by checkpoint->block_operations(). >>>>> >>>>> Call stack : >>>>> >>>>> Thread A Thread B >>>>> f2fs_write_checkpoint() >>>>> - block_operations(sbi) >>>>> - f2fs_lock_all(sbi); >>>>> - down_write(&sbi->cp_rwsem); >>>>> >>>>> - open() >>>>> - igrab() >>>>> - write() write inline data >>>>> - unlink() >>>>> - f2fs_sync_node_pages() >>>>> - if (is_inline_node(page)) >>>>> - flush_inline_data() >>>>> - ilookup() >>>>> page = f2fs_pagecache_get_page() >>>>> if (!page) >>>>> goto iput_out; >>>>> iput_out: >>>>> -close() >>>>> -iput() >>>>> iput(inode); >>>>> - f2fs_evict_inode() >>>>> - f2fs_truncate_blocks() >>>>> - f2fs_lock_op() >>>>> - down_read(&sbi->cp_rwsem); >>>>> >>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> >>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>>>> --- >>>>> fs/f2fs/checkpoint.c | 9 ++++++++- >>>>> fs/f2fs/f2fs.h | 4 ++-- >>>>> fs/f2fs/node.c | 10 +++++----- >>>>> 3 files changed, 15 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c >>>>> index d49f7a01d8a26..928aea4ff663d 100644 >>>>> --- a/fs/f2fs/checkpoint.c >>>>> +++ b/fs/f2fs/checkpoint.c >>>>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) >>>>> }; >>>>> int err = 0, cnt = 0; >>>>> >>>>> + /* >>>>> + * Let's flush node pages first to flush inline_data. >>>>> + * We'll actually guarantee everything below under f2fs_lock_all. >>>>> + */ >>>>> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); >>>> >>>> It is possible that user write a large number of inline data in between >>>> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in >>>> race condition. >>>> >>>> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will >>>> flush inode page twice which is unneeded. >>>> >>>> f2fs_sync_node_pages() --- flush dirty inode page >>>> f2fs_lock_all() >>>> ... >>>> f2fs_sync_inode_meta() --- update dirty inode page >>>> f2fs_sync_node_pages() --- flush dirty inode page again. >>>> >>> >>> Another version: >>> >>> >From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001 >>> From: Sayali Lokhande <sayalil@codeaurora.org> >>> Date: Thu, 30 Apr 2020 16:28:29 +0530 >>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint >>> >>> There could be a scenario where f2fs_sync_node_pages gets >>> called during checkpoint, which in turn tries to flush >>> inline data and calls iput(). This results in deadlock as >>> iput() tries to hold cp_rwsem, which is already held at the >>> beginning by checkpoint->block_operations(). >>> >>> Call stack : >>> >>> Thread A Thread B >>> f2fs_write_checkpoint() >>> - block_operations(sbi) >>> - f2fs_lock_all(sbi); >>> - down_write(&sbi->cp_rwsem); >>> >>> - open() >>> - igrab() >>> - write() write inline data >>> - unlink() >>> - f2fs_sync_node_pages() >>> - if (is_inline_node(page)) >>> - flush_inline_data() >>> - ilookup() >>> page = f2fs_pagecache_get_page() >>> if (!page) >>> goto iput_out; >>> iput_out: >>> -close() >>> -iput() >>> iput(inode); >>> - f2fs_evict_inode() >>> - f2fs_truncate_blocks() >>> - f2fs_lock_op() >>> - down_read(&sbi->cp_rwsem); >>> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>> --- >>> fs/f2fs/checkpoint.c | 5 +++++ >>> fs/f2fs/f2fs.h | 1 + >>> fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++-- >>> 3 files changed, 55 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c >>> index d49f7a01d8a26..79e605f38f4fa 100644 >>> --- a/fs/f2fs/checkpoint.c >>> +++ b/fs/f2fs/checkpoint.c >>> @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi) >>> }; >>> int err = 0, cnt = 0; >>> >>> + /* >>> + * Let's flush inline_data in dirty node pages. >>> + */ >>> + f2fs_flush_inline_data(sbi); >> >> Still there is a gap, user can write a large number of inline data here... > > I think generic/204 is the case, and I don't hit a panic with this patch. Yes, may be triggered by a variant of generic/204 which is not exist yet, and it's a rare corner case. > >> >> Would that be enough? I doubt we can suffer this issue in below pathes >> as well: > > I don't think so, since iput is called after f2fs_unlock_all(). That's correct. Thanks, > >> >> - block_operations >> - f2fs_sync_dirty_inodes >> - iput >> - f2fs_sync_inode_meta >> - iput >> >> Thanks, >> >>> + >>> retry_flush_quotas: >>> f2fs_lock_all(sbi); >>> if (__need_flush_quota(sbi)) { >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h >>> index 2a8ea81c52a15..7f3d259e7e376 100644 >>> --- a/fs/f2fs/f2fs.h >>> +++ b/fs/f2fs/f2fs.h >>> @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); >>> struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); >>> struct page *f2fs_get_node_page_ra(struct page *parent, int start); >>> int f2fs_move_node_page(struct page *node_page, int gc_type); >>> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi); >>> int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, >>> struct writeback_control *wbc, bool atomic, >>> unsigned int *seq_id); >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c >>> index 1db8cabf727ef..e632de10aedab 100644 >>> --- a/fs/f2fs/node.c >>> +++ b/fs/f2fs/node.c >>> @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page) >>> return true; >>> } >>> >>> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi) >>> +{ >>> + pgoff_t index = 0; >>> + struct pagevec pvec; >>> + int nr_pages; >>> + int ret = 0; >>> + >>> + pagevec_init(&pvec); >>> + >>> + while ((nr_pages = pagevec_lookup_tag(&pvec, >>> + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { >>> + int i; >>> + >>> + for (i = 0; i < nr_pages; i++) { >>> + struct page *page = pvec.pages[i]; >>> + >>> + if (!IS_DNODE(page)) >>> + continue; >>> + >>> + lock_page(page); >>> + >>> + if (unlikely(page->mapping != NODE_MAPPING(sbi))) { >>> +continue_unlock: >>> + unlock_page(page); >>> + continue; >>> + } >>> + >>> + if (!PageDirty(page)) { >>> + /* someone wrote it for us */ >>> + goto continue_unlock; >>> + } >>> + >>> + /* flush inline_data, if it's async context. */ >>> + if (is_inline_node(page)) { >>> + clear_inline_node(page); >>> + unlock_page(page); >>> + flush_inline_data(sbi, ino_of_node(page)); >>> + continue; >>> + } >>> + unlock_page(page); >>> + } >>> + pagevec_release(&pvec); >>> + cond_resched(); >>> + } >>> + return ret; >>> +} >>> + >>> int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, >>> struct writeback_control *wbc, >>> bool do_balance, enum iostat_type io_type) >>> @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, >>> goto continue_unlock; >>> } >>> >>> - /* flush inline_data */ >>> - if (is_inline_node(page)) { >>> + /* flush inline_data, if it's async context. */ >>> + if (do_balance && is_inline_node(page)) { >>> clear_inline_node(page); >>> unlock_page(page); >>> flush_inline_data(sbi, ino_of_node(page)); >>> > . > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-05-12 6:50 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-30 10:58 [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint Sayali Lokhande 2020-05-06 6:21 ` Chao Yu 2020-05-08 16:10 ` Jaegeuk Kim 2020-05-09 3:02 ` Chao Yu 2020-05-09 19:03 ` Jaegeuk Kim 2020-05-11 1:28 ` Chao Yu 2020-05-11 22:11 ` Jaegeuk Kim 2020-05-12 1:57 ` Chao Yu 2020-05-12 3:24 ` Jaegeuk Kim 2020-05-12 6:49 ` Chao Yu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).