All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sahitya Tummala <stummala@codeaurora.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] f2fs: fix the panic in do_checkpoint()
Date: Mon, 17 Feb 2020 08:39:37 +0530	[thread overview]
Message-ID: <20200217030937.GC20234@codeaurora.org> (raw)
In-Reply-To: <7fb14366-0ae3-e2d9-49ff-b7257054002c@huawei.com>

On Fri, Feb 14, 2020 at 03:16:59PM +0800, Chao Yu wrote:
> Hi Sahitya,
> 
> On 2020/2/14 11:54, Sahitya Tummala wrote:
> > Hi Chao,
> > 
> > On Fri, Feb 14, 2020 at 10:26:18AM +0800, Chao Yu wrote:
> >> Hi Sahitya,
> >>
> >> On 2020/2/12 18:34, Sahitya Tummala wrote:
> >>> There could be a scenario where f2fs_sync_meta_pages() will not
> >>> ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
> >>> resulting in the below panic in do_checkpoint() -
> >>>
> >>> f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
> >>> 				!f2fs_cp_error(sbi));
> >>>
> >>> This can happen in a low-memory condition, where shrinker could
> >>> also be doing the writepage operation (stack shown below)
> >>> at the same time when checkpoint is running on another core.
> >>>
> >>> schedule
> >>> down_write
> >>> f2fs_submit_page_write -> by this time, this page in page cache is tagged
> >>> 			as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
> >>> 			is cleared, due to which f2fs_sync_meta_pages()
> >>> 			cannot sync this page in do_checkpoint() path.
> >>> f2fs_do_write_meta_page
> >>> __f2fs_write_meta_page
> >>> f2fs_write_meta_page
> >>> shrink_page_list
> >>> shrink_inactive_list
> >>> shrink_node_memcg
> >>> shrink_node
> >>> kswapd
> >>
> >> IMO, there may be one more simple fix here:
> >>
> >> -	f2fs_do_write_meta_page(sbi, page, io_type);
> >> 	dec_page_count(sbi, F2FS_DIRTY_META);
> >>
> >> +	f2fs_do_write_meta_page(sbi, page, io_type);
> >>
> >> If we can remove F2FS_DIRTY_META reference count before we clear
> >> PAGECACHE_TAG_DIRTY, we can avoid this race condition.
> >>
> >> - dec_page_count(sbi, F2FS_DIRTY_META);
> >> - f2fs_do_write_meta_page
> >>  - set_page_writeback
> >>   - __test_set_page_writeback
> >>    - xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
> >>
> >> Thoughts?
> > 
> > I believe it will be a problem because let's say the shrinker path is 
> > still in the below stack -
> > 
> > f2fs_submit_page_write();
> > ->down_write(&io->io_rwsem);
> > 
> > Then, the current f2fs_wait_on_all_pages_writeback() will not wait for
> > that page as it is not yet tagged as F2FS_WB_CP_DATA. Hence, the checkpoint
> > may proceed without waiting for this meta page to be written to disk.
> 
> Oh, you're right.
> 
> It looks the race can happen for data/node pages? like in fsync() procedure.
> 
Yes, it can happen. I see that you have already put a patch to fix that :).

> > 
> >>
> >>>
> >>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >>> ---
> >>>  fs/f2fs/checkpoint.c | 16 ++++++++--------
> >>>  fs/f2fs/f2fs.h       |  2 +-
> >>>  fs/f2fs/super.c      |  2 +-
> >>>  3 files changed, 10 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index ffdaba0..2b651a3 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -1250,14 +1250,14 @@ static void unblock_operations(struct f2fs_sb_info *sbi)
> >>>  	f2fs_unlock_all(sbi);
> >>>  }
> >>>  
> >>> -void f2fs_wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
> >>> +void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type)
> >>>  {
> >>>  	DEFINE_WAIT(wait);
> >>>  
> >>>  	for (;;) {
> >>>  		prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
> >>>  
> >>> -		if (!get_pages(sbi, F2FS_WB_CP_DATA))
> >>> +		if (!get_pages(sbi, type))
> >>>  			break;
> >>>  
> >>>  		if (unlikely(f2fs_cp_error(sbi)))
> >>> @@ -1384,8 +1384,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >>>  
> >>>  	/* Flush all the NAT/SIT pages */
> >>>  	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
> >>> -	f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
> >>> -					!f2fs_cp_error(sbi));
> >>> +	/* Wait for all dirty meta pages to be submitted for IO */
> >>> +	f2fs_wait_on_all_pages(sbi, F2FS_DIRTY_META);
> >>
> >> I'm afraid calling f2fs_wait_on_all_pages() after we call
> >> f2fs_sync_meta_pages() is low efficient, as we only want to write out
> >> dirty meta pages instead of wait for writebacking them to device cache.
> >>
> > 
> > I have modified the existing function f2fs_wait_on_all_pages_writeback() to
> > a generic one f2fs_wait_on_all_pages(), where it will wait according to the
> > requested type. In this case, it will only wait for dirty F2FS_DIRTY_META pages
> > but not for the writeback to device cache.
> 
> Oh, I see, as last dirty reference count decreaser won't wake up waiter, we need
> to wait for timeout, right? Can we decrease timeout count, maybe HZ/50 as we did
> for congestion.

Yes, I agree that it is better to decrease the timeout to HZ/50.

Thanks,

> 
> io_schedule_timeout(5*HZ);
> 
> Thanks,
> 
> > 
> > Thanks,
> > 
> >> Thanks,
> >>
> >>>  
> >>>  	/*
> >>>  	 * modify checkpoint
> >>> @@ -1493,11 +1493,11 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >>>  
> >>>  	/* Here, we have one bio having CP pack except cp pack 2 page */
> >>>  	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
> >>> -	f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
> >>> -					!f2fs_cp_error(sbi));
> >>> +	/* Wait for all dirty meta pages to be submitted for IO */
> >>> +	f2fs_wait_on_all_pages(sbi, F2FS_DIRTY_META);
> >>>  
> >>>  	/* wait for previous submitted meta pages writeback */
> >>> -	f2fs_wait_on_all_pages_writeback(sbi);
> >>> +	f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA);
> >>>  
> >>>  	/* flush all device cache */
> >>>  	err = f2fs_flush_device_cache(sbi);
> >>> @@ -1506,7 +1506,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >>>  
> >>>  	/* barrier and flush checkpoint cp pack 2 page if it can */
> >>>  	commit_checkpoint(sbi, ckpt, start_blk);
> >>> -	f2fs_wait_on_all_pages_writeback(sbi);
> >>> +	f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA);
> >>>  
> >>>  	/*
> >>>  	 * invalidate intermediate page cache borrowed from meta inode
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 5a888a0..b0e0535 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -3196,7 +3196,7 @@ bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
> >>>  void f2fs_update_dirty_page(struct inode *inode, struct page *page);
> >>>  void f2fs_remove_dirty_inode(struct inode *inode);
> >>>  int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type);
> >>> -void f2fs_wait_on_all_pages_writeback(struct f2fs_sb_info *sbi);
> >>> +void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type);
> >>>  int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >>>  void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi);
> >>>  int __init f2fs_create_checkpoint_caches(void);
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index 5111e1f..084633b 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -1105,7 +1105,7 @@ static void f2fs_put_super(struct super_block *sb)
> >>>  	/* our cp_error case, we can wait for any writeback page */
> >>>  	f2fs_flush_merged_writes(sbi);
> >>>  
> >>> -	f2fs_wait_on_all_pages_writeback(sbi);
> >>> +	f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA);
> >>>  
> >>>  	f2fs_bug_on(sbi, sbi->fsync_node_num);
> >>>  
> >>>
> > 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

      parent reply	other threads:[~2020-02-17  3:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-12 10:34 [PATCH] f2fs: fix the panic in do_checkpoint() Sahitya Tummala
2020-02-12 10:34 ` [f2fs-dev] " Sahitya Tummala
2020-02-14  2:26 ` Chao Yu
2020-02-14  2:26   ` [f2fs-dev] " Chao Yu
2020-02-14  3:54   ` Sahitya Tummala
2020-02-14  7:16     ` Chao Yu
2020-02-14  7:16       ` [f2fs-dev] " Chao Yu
2020-02-14  7:35       ` Chao Yu
2020-02-14  7:35         ` Chao Yu
2020-02-17  3:09       ` Sahitya Tummala [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200217030937.GC20234@codeaurora.org \
    --to=stummala@codeaurora.org \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.