Subject: Re: [PATCH v5 1/2] f2fs: fix to avoid broken of dnode block list
To: Jaegeuk Kim
Cc: Chao Yu, linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org
References: <20180728013613.91304-1-yuchao0@huawei.com> <20180729013351.GE83620@jaegeuk-macbookpro.roam.corp.google.com> <3b94966d-5dc8-2bdf-2633-bdbb99fba74a@kernel.org> <20180729024917.GA94739@jaegeuk-macbookpro.roam.corp.google.com>
From: Chao Yu
Message-ID: <5b5e0f89-090b-b939-2243-68f64b5ead87@kernel.org>
Date: Sun, 29 Jul 2018 10:55:12 +0800
In-Reply-To: <20180729024917.GA94739@jaegeuk-macbookpro.roam.corp.google.com>

On 2018/7/29 10:49, Jaegeuk Kim wrote:
> On 07/29, Chao Yu wrote:
>> On 2018/7/29 9:33, Jaegeuk Kim wrote:
>>> On 07/28, Chao Yu wrote:
>>>> f2fs recovery flow is relying on dnode block link list, it means fsynced
>>>> file recovery depends on previous dnode's persistence in the list, so
>>>> during fsync() we should wait on all regular inode's dnode writebacked
>>>> before issuing flush.
>>>>
>>>> By this way, we can avoid dnode block list being broken by out-of-order
>>>> IO submission due to IO scheduler or driver.
>>>>
>>>> Sheng Yong helps to do the test with this patch:
>>>>
>>>> Target:/data (f2fs, -)
>>>> 64MB / 32768KB / 4KB / 8
>>>>
>>>> 1 / PERSIST / Index
>>>>
>>>> Base:
>>>>       SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
>>>> 1     867.82        204.15        41440.03      41370.54      680.8        1025.94      1031.08
>>>> 2     871.87        205.87        41370.3       40275.2       791.14       1065.84      1101.7
>>>> 3     866.52        205.69        41795.67      40596.16      694.69       1037.16      1031.48
>>>> Avg   868.7366667   205.2366667   41535.33333   40747.3       722.21       1042.98      1054.753333
>>>>
>>>> After:
>>>>       SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
>>>> 1     798.81        202.5         41143         40613.87      602.71       838.08       913.83
>>>> 2     805.79        206.47        40297.2       41291.46      604.44       840.75       924.27
>>>> 3     814.83        206.17        41209.57      40453.62      602.85       834.66       927.91
>>>> Avg   806.4766667   205.0466667   40883.25667   40786.31667   603.3333333  837.83       922.0033333
>>>>
>>>> Patched/Original:
>>>>       0.928332713   0.999074239   0.984300676   1.000957528   0.835398753  0.803303994  0.874141189
>>>
>>> I expect Sheng will provide more test results tho, at least it seems SEQ-RD
>>> in Base shows better than two After results, even if it doesn't matter with
>>> the issue. Please confirm it first in order for anybody to say there is no
>>> regression.
>>
>> Agreed.
>
> Hmm, this patch breaks fault injection test where gives a panic in put_super
> having fsync_node_num.

Let me do the test and fix it.

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>> It looks like atomic write will suffer performance regression.
>>>>
>>>> I suspect that the criminal is that we forcing to wait all dnode being in
>>>> storage cache before we issue PREFLUSH+FUA.
>>>>
>>>> BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
>>>> cause the problem: we will lose data of last transaction after SPO, even if
>>>> atomic write return no error:
>>>>
>>>> - atomic_open();
>>>> - write() P1, P2, P3;
>>>> - atomic_commit();
>>>> - writeback data: P1, P2, P3;
>>>> - writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is
>>>> writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
>>>> last transaction.
>>>> - preflush + fua;
>>>> - power-cut
>>>>
>>>> If we don't wait dnode writeback for atomic_write:
>>>>
>>>>       SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
>>>> 1     779.91        206.03        41621.5       40333.16      716.9        1038.21      1034.85
>>>> 2     848.51        204.35        40082.44      39486.17      791.83       1119.96      1083.77
>>>> 3     772.12        206.27        41335.25      41599.65      723.29       1055.07      971.92
>>>> Avg   800.18        205.55        41013.06333   40472.99333   744.0066667  1071.08      1030.18
>>>>
>>>> Patched/Original:
>>>>       0.92108464    1.001526693   0.987425886   0.993268102   1.030180511  1.026942031  0.976702294
>>>>
>>>> SQLite's performance recovers.
>>>>
>>>> Jaegeuk:
>>>> "Practically, I don't see db corruption because of this. We can excuse to lose
>>>> the last transaction."
>>>>
>>>> Finally, we decide to keep original implementation of atomic write interface
>>>> semantics that we don't wait all dnode writeback before preflush+fua submission.
>>>>
>>>> Tested-by: Sheng Yong
>>>> Signed-off-by: Chao Yu
>>>> ---
>>>> v5:
>>>> - add missing Tested-by.
>>>> - fix f2fs_reset_fsync_node_info() to reset sbi->fsync_seg_id correctly.
>>>>  fs/f2fs/checkpoint.c |   2 +
>>>>  fs/f2fs/data.c       |   2 +
>>>>  fs/f2fs/f2fs.h       |  21 ++++++-
>>>>  fs/f2fs/file.c       |   5 +-
>>>>  fs/f2fs/node.c       | 144 +++++++++++++++++++++++++++++++++++--------
>>>>  fs/f2fs/super.c      |   4 ++
>>>>  6 files changed, 150 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>> index 581710760ba6..2136430f9f0d 100644
>>>> --- a/fs/f2fs/checkpoint.c
>>>> +++ b/fs/f2fs/checkpoint.c
>>>> @@ -1410,6 +1410,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>
>>>>  	f2fs_release_ino_entry(sbi, false);
>>>>
>>>> +	f2fs_reset_fsync_node_info(sbi);
>>>> +
>>>>  	clear_sbi_flag(sbi, SBI_IS_DIRTY);
>>>>  	clear_sbi_flag(sbi, SBI_NEED_CP);
>>>>  	__set_cp_next_pack(sbi);
>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>> index 6b8ca5011bfd..572c91e43337 100644
>>>> --- a/fs/f2fs/data.c
>>>> +++ b/fs/f2fs/data.c
>>>> @@ -177,6 +177,8 @@ static void f2fs_write_end_io(struct bio *bio)
>>>>  					page->index != nid_of_node(page));
>>>>
>>>>  		dec_page_count(sbi, type);
>>>> +		if (f2fs_in_warm_node_list(sbi, page))
>>>> +			f2fs_del_fsync_node_entry(sbi, page);
>>>>  		clear_cold_data(page);
>>>>  		end_page_writeback(page);
>>>>  	}
>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>> index 0374f069520c..6627fef9ae38 100644
>>>> --- a/fs/f2fs/f2fs.h
>>>> +++ b/fs/f2fs/f2fs.h
>>>> @@ -228,6 +228,12 @@ struct inode_entry {
>>>>  	struct inode *inode;	/* vfs inode pointer */
>>>>  };
>>>>
>>>> +struct fsync_node_entry {
>>>> +	struct list_head list;	/* list head */
>>>> +	struct page *page;	/* warm node page pointer */
>>>> +	unsigned int seq_id;	/* sequence id */
>>>> +};
>>>> +
>>>>  /* for the bitmap indicate blocks to be discarded */
>>>>  struct discard_entry {
>>>>  	struct list_head list;	/* list head */
>>>> @@ -1156,6 +1162,11 @@ struct f2fs_sb_info {
>>>>
>>>>  	struct inode_management im[MAX_INO_ENTRY];	/* manage inode cache */
>>>>
>>>> +	spinlock_t fsync_node_lock;		/* for node entry lock */
>>>> +	struct list_head fsync_node_list;	/* node list head */
>>>> +	unsigned int fsync_seg_id;		/* sequence id */
>>>> +	unsigned int fsync_node_num;		/* number of node entries */
>>>> +
>>>>  	/* for orphan inode, use 0'th array */
>>>>  	unsigned int max_orphans;		/* max orphan inodes */
>>>>
>>>> @@ -2822,6 +2833,10 @@ struct node_info;
>>>>
>>>>  int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid);
>>>>  bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type);
>>>> +bool f2fs_in_warm_node_list(struct f2fs_sb_info *sbi, struct page *page);
>>>> +void f2fs_init_fsync_node_info(struct f2fs_sb_info *sbi);
>>>> +void f2fs_del_fsync_node_entry(struct f2fs_sb_info *sbi, struct page *page);
>>>> +void f2fs_reset_fsync_node_info(struct f2fs_sb_info *sbi);
>>>>  int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid);
>>>>  bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid);
>>>>  bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino);
>>>> @@ -2831,7 +2846,8 @@ pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs);
>>>>  int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode);
>>>>  int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from);
>>>>  int f2fs_truncate_xattr_node(struct inode *inode);
>>>> -int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino);
>>>> +int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi,
>>>> +					unsigned int seq_id);
>>>>  int f2fs_remove_inode_page(struct inode *inode);
>>>>  struct page *f2fs_new_inode_page(struct inode *inode);
>>>>  struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs);
>>>> @@ -2840,7 +2856,8 @@ struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
>>>>  struct page *f2fs_get_node_page_ra(struct page *parent, int start);
>>>>  void f2fs_move_node_page(struct page *node_page, int gc_type);
>>>>  int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>>> -			struct writeback_control *wbc, bool atomic);
>>>> +			struct writeback_control *wbc, bool atomic,
>>>> +			unsigned int *seq_id);
>>>>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>  			struct writeback_control *wbc,
>>>>  			bool do_balance, enum iostat_type io_type);
>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>> index 8f9ab66858ca..7bd2412a8c37 100644
>>>> --- a/fs/f2fs/file.c
>>>> +++ b/fs/f2fs/file.c
>>>> @@ -213,6 +213,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>  		.nr_to_write = LONG_MAX,
>>>>  		.for_reclaim = 0,
>>>>  	};
>>>> +	unsigned int seq_id = 0;
>>>>
>>>>  	if (unlikely(f2fs_readonly(inode->i_sb)))
>>>>  		return 0;
>>>> @@ -275,7 +276,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>  	}
>>>>  sync_nodes:
>>>>  	atomic_inc(&sbi->wb_sync_req[NODE]);
>>>> -	ret = f2fs_fsync_node_pages(sbi, inode, &wbc, atomic);
>>>> +	ret = f2fs_fsync_node_pages(sbi, inode, &wbc, atomic, &seq_id);
>>>>  	atomic_dec(&sbi->wb_sync_req[NODE]);
>>>>  	if (ret)
>>>>  		goto out;
>>>> @@ -301,7 +302,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>  	 * given fsync mark.
>>>>  	 */
>>>>  	if (!atomic) {
>>>> -		ret = f2fs_wait_on_node_pages_writeback(sbi, ino);
>>>> +		ret = f2fs_wait_on_node_pages_writeback(sbi, seq_id);
>>>>  		if (ret)
>>>>  			goto out;
>>>>  	}
>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>> index 9d9f4c9750c4..e109c671cd84 100644
>>>> --- a/fs/f2fs/node.c
>>>> +++ b/fs/f2fs/node.c
>>>> @@ -28,6 +28,7 @@
>>>>  static struct kmem_cache *nat_entry_slab;
>>>>  static struct kmem_cache *free_nid_slab;
>>>>  static struct kmem_cache *nat_entry_set_slab;
>>>> +static struct kmem_cache *fsync_node_entry_slab;
>>>>
>>>>  /*
>>>>   * Check whether the given nid is within node id range.
>>>> @@ -264,6 +265,72 @@ static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
>>>>  							start, nr);
>>>>  }
>>>>
>>>> +bool f2fs_in_warm_node_list(struct f2fs_sb_info *sbi, struct page *page)
>>>> +{
>>>> +	return NODE_MAPPING(sbi) == page->mapping &&
>>>> +			IS_DNODE(page) && is_cold_node(page);
>>>> +}
>>>> +
>>>> +void f2fs_init_fsync_node_info(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +	spin_lock_init(&sbi->fsync_node_lock);
>>>> +	INIT_LIST_HEAD(&sbi->fsync_node_list);
>>>> +	sbi->fsync_seg_id = 0;
>>>> +	sbi->fsync_node_num = 0;
>>>> +}
>>>> +
>>>> +static unsigned int f2fs_add_fsync_node_entry(struct f2fs_sb_info *sbi,
>>>> +							struct page *page)
>>>> +{
>>>> +	struct fsync_node_entry *fn;
>>>> +	unsigned long flags;
>>>> +	unsigned int seq_id;
>>>> +
>>>> +	fn = f2fs_kmem_cache_alloc(fsync_node_entry_slab, GFP_NOFS);
>>>> +
>>>> +	get_page(page);
>>>> +	fn->page = page;
>>>> +	INIT_LIST_HEAD(&fn->list);
>>>> +
>>>> +	spin_lock_irqsave(&sbi->fsync_node_lock, flags);
>>>> +	list_add_tail(&fn->list, &sbi->fsync_node_list);
>>>> +	fn->seq_id = sbi->fsync_seg_id++;
>>>> +	seq_id = fn->seq_id;
>>>> +	sbi->fsync_node_num++;
>>>> +	spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>> +
>>>> +	return seq_id;
>>>> +}
>>>> +
>>>> +void f2fs_del_fsync_node_entry(struct f2fs_sb_info *sbi, struct page *page)
>>>> +{
>>>> +	struct fsync_node_entry *fn;
>>>> +	unsigned long flags;
>>>> +
>>>> +	spin_lock_irqsave(&sbi->fsync_node_lock, flags);
>>>> +	list_for_each_entry(fn, &sbi->fsync_node_list, list) {
>>>> +		if (fn->page == page) {
>>>> +			list_del(&fn->list);
>>>> +			sbi->fsync_node_num--;
>>>> +			spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>> +			kmem_cache_free(fsync_node_entry_slab, fn);
>>>> +			put_page(page);
>>>> +			return;
>>>> +		}
>>>> +	}
>>>> +	spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>> +	f2fs_bug_on(sbi, 1);
>>>> +}
>>>> +
>>>> +void f2fs_reset_fsync_node_info(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +	unsigned long flags;
>>>> +
>>>> +	spin_lock_irqsave(&sbi->fsync_node_lock, flags);
>>>> +	sbi->fsync_seg_id = 0;
>>>> +	spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>> +}
>>>> +
>>>>  int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid)
>>>>  {
>>>>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
>>>> @@ -1384,7 +1451,7 @@ static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino)
>>>>
>>>>  static int __write_node_page(struct page *page, bool atomic, bool *submitted,
>>>>  				struct writeback_control *wbc, bool do_balance,
>>>> -				enum iostat_type io_type)
>>>> +				enum iostat_type io_type, unsigned int *seq_id)
>>>>  {
>>>>  	struct f2fs_sb_info *sbi = F2FS_P_SB(page);
>>>>  	nid_t nid;
>>>> @@ -1401,6 +1468,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted,
>>>>  		.io_type = io_type,
>>>>  		.io_wbc = wbc,
>>>>  	};
>>>> +	unsigned int seq;
>>>>
>>>>  	trace_f2fs_writepage(page, NODE);
>>>>
>>>> @@ -1442,6 +1510,13 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted,
>>>>
>>>>  	set_page_writeback(page);
>>>>  	ClearPageError(page);
>>>> +
>>>> +	if (f2fs_in_warm_node_list(sbi, page)) {
>>>> +		seq = f2fs_add_fsync_node_entry(sbi, page);
>>>> +		if (seq_id)
>>>> +			*seq_id = seq;
>>>> +	}
>>>> +
>>>>  	fio.old_blkaddr = ni.blk_addr;
>>>>  	f2fs_do_write_node_page(nid, &fio);
>>>>  	set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page));
>>>> @@ -1489,7 +1564,7 @@ void f2fs_move_node_page(struct page *node_page, int gc_type)
>>>>  			goto out_page;
>>>>
>>>>  		if (__write_node_page(node_page, false, NULL,
>>>> -					&wbc, false, FS_GC_NODE_IO))
>>>> +					&wbc, false, FS_GC_NODE_IO, NULL))
>>>>  			unlock_page(node_page);
>>>>  		goto release_page;
>>>>  	} else {
>>>> @@ -1506,11 +1581,13 @@ void f2fs_move_node_page(struct page *node_page, int gc_type)
>>>>  static int f2fs_write_node_page(struct page *page,
>>>>  				struct writeback_control *wbc)
>>>>  {
>>>> -	return __write_node_page(page, false, NULL, wbc, false, FS_NODE_IO);
>>>> +	return __write_node_page(page, false, NULL, wbc, false,
>>>> +						FS_NODE_IO, NULL);
>>>>  }
>>>>
>>>>  int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>>> -			struct writeback_control *wbc, bool atomic)
>>>> +			struct writeback_control *wbc, bool atomic,
>>>> +			unsigned int *seq_id)
>>>>  {
>>>>  	pgoff_t index;
>>>>  	pgoff_t last_idx = ULONG_MAX;
>>>> @@ -1591,7 +1668,7 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>>>  			ret = __write_node_page(page, atomic &&
>>>>  						page == last_page,
>>>>  						&submitted, wbc, true,
>>>> -						FS_NODE_IO);
>>>> +						FS_NODE_IO, seq_id);
>>>>  			if (ret) {
>>>>  				unlock_page(page);
>>>>  				f2fs_put_page(last_page, 0);
>>>> @@ -1708,7 +1785,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>  			set_dentry_mark(page, 0);
>>>>
>>>>  			ret = __write_node_page(page, false, &submitted,
>>>> -						wbc, do_balance, io_type);
>>>> +						wbc, do_balance, io_type, NULL);
>>>>  			if (ret)
>>>>  				unlock_page(page);
>>>>  			else if (submitted)
>>>> @@ -1739,35 +1816,46 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>  	return ret;
>>>>  }
>>>>
>>>> -int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
>>>> +int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi,
>>>> +						unsigned int seq_id)
>>>>  {
>>>> -	pgoff_t index = 0;
>>>> -	struct pagevec pvec;
>>>> +	struct fsync_node_entry *fn;
>>>> +	struct page *page;
>>>> +	struct list_head *head = &sbi->fsync_node_list;
>>>> +	unsigned long flags;
>>>> +	unsigned int cur_seq_id = 0;
>>>>  	int ret2, ret = 0;
>>>> -	int nr_pages;
>>>>
>>>> -	pagevec_init(&pvec);
>>>> +	while (seq_id && cur_seq_id < seq_id) {
>>>> +		spin_lock_irqsave(&sbi->fsync_node_lock, flags);
>>>> +		if (list_empty(head)) {
>>>> +			spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>> +			break;
>>>> +		}
>>>> +		fn = list_first_entry(head, struct fsync_node_entry, list);
>>>> +		if (fn->seq_id > seq_id) {
>>>> +			spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>> +			break;
>>>> +		}
>>>> +		cur_seq_id = fn->seq_id;
>>>> +		page = fn->page;
>>>> +		get_page(page);
>>>> +		spin_unlock_irqrestore(&sbi->fsync_node_lock, flags);
>>>>
>>>> -	while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
>>>> -				PAGECACHE_TAG_WRITEBACK))) {
>>>> -		int i;
>>>> +		f2fs_wait_on_page_writeback(page, NODE, true);
>>>> +		if (TestClearPageError(page))
>>>> +			ret = -EIO;
>>>>
>>>> -		for (i = 0; i < nr_pages; i++) {
>>>> -			struct page *page = pvec.pages[i];
>>>> +		put_page(page);
>>>>
>>>> -			if (ino && ino_of_node(page) == ino) {
>>>> -				f2fs_wait_on_page_writeback(page, NODE, true);
>>>> -				if (TestClearPageError(page))
>>>> -					ret = -EIO;
>>>> -			}
>>>> -		}
>>>> -		pagevec_release(&pvec);
>>>> -		cond_resched();
>>>> +		if (ret)
>>>> +			break;
>>>>  	}
>>>>
>>>>  	ret2 = filemap_check_errors(NODE_MAPPING(sbi));
>>>>  	if (!ret)
>>>>  		ret = ret2;
>>>> +
>>>>  	return ret;
>>>>  }
>>>>
>>>> @@ -2982,8 +3070,15 @@ int __init f2fs_create_node_manager_caches(void)
>>>>  			sizeof(struct nat_entry_set));
>>>>  	if (!nat_entry_set_slab)
>>>>  		goto destroy_free_nid;
>>>> +
>>>> +	fsync_node_entry_slab = f2fs_kmem_cache_create("fsync_node_entry",
>>>> +			sizeof(struct fsync_node_entry));
>>>> +	if (!fsync_node_entry_slab)
>>>> +		goto destroy_nat_entry_set;
>>>>  	return 0;
>>>>
>>>> +destroy_nat_entry_set:
>>>> +	kmem_cache_destroy(nat_entry_set_slab);
>>>>  destroy_free_nid:
>>>>  	kmem_cache_destroy(free_nid_slab);
>>>>  destroy_nat_entry:
>>>> @@ -2994,6 +3089,7 @@ int __init f2fs_create_node_manager_caches(void)
>>>>
>>>>  void f2fs_destroy_node_manager_caches(void)
>>>>  {
>>>> +	kmem_cache_destroy(fsync_node_entry_slab);
>>>>  	kmem_cache_destroy(nat_entry_set_slab);
>>>>  	kmem_cache_destroy(free_nid_slab);
>>>>  	kmem_cache_destroy(nat_entry_slab);
>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>> index 0eff4637fe55..cb1ba9f18353 100644
>>>> --- a/fs/f2fs/super.c
>>>> +++ b/fs/f2fs/super.c
>>>> @@ -1030,6 +1030,8 @@ static void f2fs_put_super(struct super_block *sb)
>>>>  	 */
>>>>  	f2fs_release_ino_entry(sbi, true);
>>>>
>>>> +	f2fs_bug_on(sbi, sbi->fsync_node_num);
>>>> +
>>>>  	f2fs_leave_shrinker(sbi);
>>>>  	mutex_unlock(&sbi->umount_mutex);
>>>>
>>>> @@ -2923,6 +2925,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>
>>>>  	f2fs_init_ino_entry_info(sbi);
>>>>
>>>> +	f2fs_init_fsync_node_info(sbi);
>>>> +
>>>>  	/* setup f2fs internal modules */
>>>>  	err = f2fs_build_segment_manager(sbi);
>>>>  	if (err) {
>>>> --
>>>> 2.18.0.rc1
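
For anyone following the ordering argument, below is a minimal userspace sketch of the
sequence-id bookkeeping used by the quoted patch. It is not kernel code and only a model
under stated assumptions: the names model_entry, model_list, model_add, model_wait and
model_demo are hypothetical, the list is a fixed-size array, and "waiting" is simulated
by completing and dropping the head entry (in the patch the entry is removed from
f2fs_write_end_io() instead). It only illustrates that a waiter holding seq_id N walks
the list oldest-first and stops once it has reached N.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CAP 16	/* arbitrary capacity for this sketch */

/* Hypothetical userspace stand-in for struct fsync_node_entry. */
struct model_entry {
	unsigned int seq_id;	/* sequence id assigned at submission */
	int under_writeback;	/* 1 until the simulated IO completes */
};

struct model_list {
	struct model_entry entries[MODEL_CAP];
	size_t head;		/* index of the oldest in-flight entry */
	size_t tail;		/* index one past the newest entry */
	unsigned int next_seq;	/* plays the role of sbi->fsync_seg_id */
};

/* Submit one dnode: append at the tail and hand out the next sequence id. */
static unsigned int model_add(struct model_list *l)
{
	struct model_entry *e;

	assert(l->tail < MODEL_CAP);
	e = &l->entries[l->tail++];
	e->seq_id = l->next_seq++;
	e->under_writeback = 1;
	return e->seq_id;
}

/*
 * Wait for every entry up to and including seq_id, oldest first.
 * The loop condition mirrors the one in the patch. Returns the number
 * of entries that were waited on.
 */
static unsigned int model_wait(struct model_list *l, unsigned int seq_id)
{
	unsigned int cur_seq_id = 0, waited = 0;

	while (seq_id && cur_seq_id < seq_id) {
		struct model_entry *e;

		if (l->head == l->tail)		/* list is empty */
			break;
		e = &l->entries[l->head];
		if (e->seq_id > seq_id)		/* only newer entries remain */
			break;
		cur_seq_id = e->seq_id;
		e->under_writeback = 0;		/* simulated IO completion */
		l->head++;
		waited++;
	}
	return waited;
}

/* Three dnodes are submitted; a waiter holding the second id waits on two. */
static int model_demo(void)
{
	struct model_list l = { .head = 0, .tail = 0, .next_seq = 0 };
	unsigned int first = model_add(&l);
	unsigned int second = model_add(&l);
	unsigned int third = model_add(&l);
	unsigned int waited = model_wait(&l, second);

	if (first != 0 || second != 1 || third != 2)
		return -1;
	if (waited != 2)
		return -1;
	/* The newest entry was not waited on and is still in flight. */
	return l.entries[2].under_writeback == 1 ? 0 : -1;
}
```

Calling model_demo() from a small main() is enough to exercise the sketch; it returns 0
when the waiter covered the two oldest entries and left the third one in flight.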