linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 01/10] f2fs: fix negative value for lseek offset
@ 2014-09-14 22:14 Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 02/10] f2fs: remove lengthy inode->i_ino Jaegeuk Kim
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

If application throws negative value of lseek with SEEK_DATA|SEEK_HOLE,
previous f2fs went into BUG_ON in get_dnode_of_data, which was reported
by Tommi Rantala.

He could make a simple code to detect this having:
	lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.

Reported-by: Tommi Rentala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9f0ea3d..5cde363 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -353,6 +353,8 @@ static loff_t f2fs_llseek(struct file *file, loff_t offset, int whence)
 						maxbytes, i_size_read(inode));
 	case SEEK_DATA:
 	case SEEK_HOLE:
+		if (offset < 0)
+			return -ENXIO;
 		return f2fs_seek_block(file, offset, whence);
 	}
 
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 02/10] f2fs: remove lengthy inode->i_ino
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 03/10] f2fs: expand counting dirty pages in the inode page cache Jaegeuk Kim
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

This patch is to remove lengthy name by adding a new variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5cde363..77426c7 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -139,6 +139,7 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_mapping->host;
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	nid_t ino = inode->i_ino;
 	int ret = 0;
 	bool need_cp = false;
 	struct writeback_control wbc = {
@@ -168,9 +169,9 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * if there is no written data, don't waste time to write recovery info.
 	 */
 	if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
-		!exist_written_data(sbi, inode->i_ino, APPEND_INO)) {
+			!exist_written_data(sbi, ino, APPEND_INO)) {
 		if (is_inode_flag_set(fi, FI_UPDATE_WRITE) ||
-			exist_written_data(sbi, inode->i_ino, UPDATE_INO))
+				exist_written_data(sbi, ino, UPDATE_INO))
 			goto flush_out;
 		goto out;
 	}
@@ -208,23 +209,23 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		}
 	} else {
 		/* if there is no written node page, write its inode page */
-		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
-			if (fsync_mark_done(sbi, inode->i_ino))
+		while (!sync_node_pages(sbi, ino, &wbc)) {
+			if (fsync_mark_done(sbi, ino))
 				goto out;
 			mark_inode_dirty_sync(inode);
 			ret = f2fs_write_inode(inode, NULL);
 			if (ret)
 				goto out;
 		}
-		ret = wait_on_node_pages_writeback(sbi, inode->i_ino);
+		ret = wait_on_node_pages_writeback(sbi, ino);
 		if (ret)
 			goto out;
 
 		/* once recovery info is written, don't need to tack this */
-		remove_dirty_inode(sbi, inode->i_ino, APPEND_INO);
+		remove_dirty_inode(sbi, ino, APPEND_INO);
 		clear_inode_flag(fi, FI_APPEND_WRITE);
 flush_out:
-		remove_dirty_inode(sbi, inode->i_ino, UPDATE_INO);
+		remove_dirty_inode(sbi, ino, UPDATE_INO);
 		clear_inode_flag(fi, FI_UPDATE_WRITE);
 		ret = f2fs_issue_flush(F2FS_I_SB(inode));
 	}
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 03/10] f2fs: expand counting dirty pages in the inode page cache
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 02/10] f2fs: remove lengthy inode->i_ino Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users Jaegeuk Kim
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

Previously f2fs only counts dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names on the management of dirty pages and to count
dirty pages in each inode info as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/checkpoint.c | 10 ++++++----
 fs/f2fs/data.c       | 10 +++++-----
 fs/f2fs/dir.c        |  2 +-
 fs/f2fs/f2fs.h       | 24 ++++++++++++------------
 fs/f2fs/gc.c         |  2 +-
 fs/f2fs/inode.c      |  2 +-
 fs/f2fs/super.c      |  2 +-
 7 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 935a56e..7262d99 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -627,14 +627,16 @@ static int __add_dirty_inode(struct inode *inode, struct dir_inode_entry *new)
 	return 0;
 }
 
-void set_dirty_dir_page(struct inode *inode, struct page *page)
+void update_dirty_page(struct inode *inode, struct page *page)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct dir_inode_entry *new;
 	int ret = 0;
 
-	if (!S_ISDIR(inode->i_mode))
+	if (!S_ISDIR(inode->i_mode)) {
+		inode_inc_dirty_pages(inode);
 		return;
+	}
 
 	new = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
 	new->inode = inode;
@@ -642,7 +644,7 @@ void set_dirty_dir_page(struct inode *inode, struct page *page)
 
 	spin_lock(&sbi->dir_inode_lock);
 	ret = __add_dirty_inode(inode, new);
-	inode_inc_dirty_dents(inode);
+	inode_inc_dirty_pages(inode);
 	SetPagePrivate(page);
 	spin_unlock(&sbi->dir_inode_lock);
 
@@ -677,7 +679,7 @@ void remove_dirty_dir_inode(struct inode *inode)
 		return;
 
 	spin_lock(&sbi->dir_inode_lock);
-	if (get_dirty_dents(inode) ||
+	if (get_dirty_pages(inode) ||
 			!is_inode_flag_set(F2FS_I(inode), FI_DIRTY_DIR)) {
 		spin_unlock(&sbi->dir_inode_lock);
 		return;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 64d8550..6637741 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -863,7 +863,7 @@ done:
 
 	clear_cold_data(page);
 out:
-	inode_dec_dirty_dents(inode);
+	inode_dec_dirty_pages(inode);
 	unlock_page(page);
 	if (need_balance_fs)
 		f2fs_balance_fs(sbi);
@@ -901,7 +901,7 @@ static int f2fs_write_data_pages(struct address_space *mapping,
 		return 0;
 
 	if (S_ISDIR(inode->i_mode) && wbc->sync_mode == WB_SYNC_NONE &&
-			get_dirty_dents(inode) < nr_pages_to_skip(sbi, DATA) &&
+			get_dirty_pages(inode) < nr_pages_to_skip(sbi, DATA) &&
 			available_free_memory(sbi, DIRTY_DENTS))
 		goto skip_write;
 
@@ -923,7 +923,7 @@ static int f2fs_write_data_pages(struct address_space *mapping,
 	return ret;
 
 skip_write:
-	wbc->pages_skipped += get_dirty_dents(inode);
+	wbc->pages_skipped += get_dirty_pages(inode);
 	return 0;
 }
 
@@ -1108,7 +1108,7 @@ static void f2fs_invalidate_data_page(struct page *page, unsigned int offset,
 {
 	struct inode *inode = page->mapping->host;
 	if (PageDirty(page))
-		inode_dec_dirty_dents(inode);
+		inode_dec_dirty_pages(inode);
 	ClearPagePrivate(page);
 }
 
@@ -1130,7 +1130,7 @@ static int f2fs_set_data_page_dirty(struct page *page)
 
 	if (!PageDirty(page)) {
 		__set_page_dirty_nobuffers(page);
-		set_dirty_dir_page(inode, page);
+		update_dirty_page(inode, page);
 		return 1;
 	}
 	return 0;
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index f1ceeb2..b54f871 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -618,7 +618,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
 		truncate_hole(dir, page->index, page->index + 1);
 		clear_page_dirty_for_io(page);
 		ClearPageUptodate(page);
-		inode_dec_dirty_dents(dir);
+		inode_dec_dirty_pages(dir);
 	}
 	f2fs_put_page(page, 1);
 }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1def9ee..2756c16 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -237,7 +237,7 @@ struct f2fs_inode_info {
 	/* Use below internally in f2fs*/
 	unsigned long flags;		/* use to pass per-file flags */
 	struct rw_semaphore i_sem;	/* protect fi info */
-	atomic_t dirty_dents;		/* # of dirty dentry pages */
+	atomic_t dirty_pages;		/* # of dirty pages */
 	f2fs_hash_t chash;		/* hash value of given file name */
 	unsigned int clevel;		/* maximum level of given file name */
 	nid_t i_xattr_nid;		/* node id that contains xattrs */
@@ -747,10 +747,11 @@ static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
 	F2FS_SET_SB_DIRT(sbi);
 }
 
-static inline void inode_inc_dirty_dents(struct inode *inode)
+static inline void inode_inc_dirty_pages(struct inode *inode)
 {
-	inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_DENTS);
-	atomic_inc(&F2FS_I(inode)->dirty_dents);
+	atomic_inc(&F2FS_I(inode)->dirty_pages);
+	if (S_ISDIR(inode->i_mode))
+		inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_DENTS);
 }
 
 static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
@@ -758,13 +759,12 @@ static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
 	atomic_dec(&sbi->nr_pages[count_type]);
 }
 
-static inline void inode_dec_dirty_dents(struct inode *inode)
+static inline void inode_dec_dirty_pages(struct inode *inode)
 {
-	if (!S_ISDIR(inode->i_mode))
-		return;
+	atomic_dec(&F2FS_I(inode)->dirty_pages);
 
-	dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_DENTS);
-	atomic_dec(&F2FS_I(inode)->dirty_dents);
+	if (S_ISDIR(inode->i_mode))
+		dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_DENTS);
 }
 
 static inline int get_pages(struct f2fs_sb_info *sbi, int count_type)
@@ -772,9 +772,9 @@ static inline int get_pages(struct f2fs_sb_info *sbi, int count_type)
 	return atomic_read(&sbi->nr_pages[count_type]);
 }
 
-static inline int get_dirty_dents(struct inode *inode)
+static inline int get_dirty_pages(struct inode *inode)
 {
-	return atomic_read(&F2FS_I(inode)->dirty_dents);
+	return atomic_read(&F2FS_I(inode)->dirty_pages);
 }
 
 static inline int get_blocktype_secs(struct f2fs_sb_info *sbi, int block_type)
@@ -1302,7 +1302,7 @@ void add_orphan_inode(struct f2fs_sb_info *, nid_t);
 void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
 void recover_orphan_inodes(struct f2fs_sb_info *);
 int get_valid_checkpoint(struct f2fs_sb_info *);
-void set_dirty_dir_page(struct inode *, struct page *);
+void update_dirty_page(struct inode *, struct page *);
 void add_dirty_dir_inode(struct inode *);
 void remove_dirty_dir_inode(struct inode *);
 void sync_dirty_dir_inodes(struct f2fs_sb_info *);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 075ea1e..dca7818 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -537,7 +537,7 @@ static void move_data_page(struct inode *inode, struct page *page, int gc_type)
 		f2fs_wait_on_page_writeback(page, DATA);
 
 		if (clear_page_dirty_for_io(page))
-			inode_dec_dirty_dents(inode);
+			inode_dec_dirty_pages(inode);
 		set_cold_data(page);
 		do_write_data_page(page, &fio);
 		clear_cold_data(page);
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 95c0bc2..ff95547 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -276,7 +276,7 @@ void f2fs_evict_inode(struct inode *inode)
 			inode->i_ino == F2FS_META_INO(sbi))
 		goto out_clear;
 
-	f2fs_bug_on(sbi, get_dirty_dents(inode));
+	f2fs_bug_on(sbi, get_dirty_pages(inode));
 	remove_dirty_dir_inode(inode);
 
 	if (inode->i_nlink || is_bad_inode(inode))
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 3275e73..b5af9be 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -366,7 +366,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
 
 	/* Initialize f2fs-specific inode info */
 	fi->vfs_inode.i_version = 1;
-	atomic_set(&fi->dirty_dents, 0);
+	atomic_set(&fi->dirty_pages, 0);
 	fi->i_current_depth = 1;
 	fi->i_advise = 0;
 	rwlock_init(&fi->ext.ext_lock);
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 02/10] f2fs: remove lengthy inode->i_ino Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 03/10] f2fs: expand counting dirty pages in the inode page cache Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-15  2:13   ` [f2fs-dev] " Changman Lee
  2014-09-14 22:14 ` [PATCH 05/10] f2fs: fix roll-forward missing scenarios Jaegeuk Kim
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 Documentation/ABI/testing/sysfs-fs-f2fs |  7 +++++++
 Documentation/filesystems/f2fs.txt      |  9 ++++++++-
 fs/f2fs/f2fs.h                          |  1 +
 fs/f2fs/file.c                          |  7 +++----
 fs/f2fs/segment.c                       |  3 ++-
 fs/f2fs/segment.h                       | 14 ++++++++++----
 fs/f2fs/super.c                         |  2 ++
 7 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 62dd725..6f9157f 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -44,6 +44,13 @@ Description:
 		 Controls the FS utilization condition for the in-place-update
 		 policies.
 
+What:		/sys/fs/f2fs/<disk>/min_fsync_blocks
+Date:		September 2014
+Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
+Description:
+		 Controls the dirty page count condition for the in-place-update
+		 policies.
+
 What:		/sys/fs/f2fs/<disk>/max_small_discards
 Date:		November 2013
 Contact:	"Jaegeuk Kim" <jaegeuk.kim@samsung.com>
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index a2046a7..d010da8 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname>
                               updates in f2fs. There are five policies:
                                0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR,
                                2: F2FS_IPU_UTIL,  3: F2FS_IPU_SSR_UTIL,
-                               4: F2FS_IPU_DISABLE.
+                               4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE.
 
  min_ipu_util                 This parameter controls the threshold to trigger
                               in-place-updates. The number indicates percentage
                               of the filesystem utilization, and used by
                               F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
 
+ min_fsync_blocks             This parameter controls the threshold to trigger
+                              in-place-updates when F2FS_IPU_FSYNC mode is set.
+			      The number indicates the number of dirty pages
+			      when fsync needs to flush on its call path. If
+			      the number is less than this value, it triggers
+			      in-place-updates.
+
  max_victim_search	      This parameter controls the number of trials to
 			      find a victim segment when conducting SSR and
 			      cleaning operations. The default value is 4096
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2756c16..4f84d2a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -386,6 +386,7 @@ struct f2fs_sm_info {
 
 	unsigned int ipu_policy;	/* in-place-update policy */
 	unsigned int min_ipu_util;	/* in-place-update threshold */
+	unsigned int min_fsync_blocks;	/* threshold for fsync */
 
 	/* for flush command control */
 	struct flush_cmd_control *cmd_control_info;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 77426c7..af06e22 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	trace_f2fs_sync_file_enter(inode);
 
 	/* if fdatasync is triggered, let's do in-place-update */
-	if (datasync)
+	if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
 		set_inode_flag(fi, FI_NEED_IPU);
-
 	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-	if (datasync)
-		clear_inode_flag(fi, FI_NEED_IPU);
+	clear_inode_flag(fi, FI_NEED_IPU);
+
 	if (ret) {
 		trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
 		return ret;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e158d63..c6f627b 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1928,8 +1928,9 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
 	sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
 	sm_info->rec_prefree_segments = sm_info->main_segments *
 					DEF_RECLAIM_PREFREE_SEGMENTS / 100;
-	sm_info->ipu_policy = F2FS_IPU_DISABLE;
+	sm_info->ipu_policy = F2FS_IPU_FSYNC;
 	sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
+	sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
 
 	INIT_LIST_HEAD(&sm_info->discard_list);
 	sm_info->nr_discards = 0;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index bed0dc9..013aed2 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -472,15 +472,20 @@ static inline int utilization(struct f2fs_sb_info *sbi)
  * F2FS_IPU_UTIL - if FS utilization is over threashold,
  * F2FS_IPU_SSR_UTIL - if SSR mode is activated and FS utilization is over
  *                     threashold,
+ * F2FS_IPU_FSYNC - activated in fsync path only for high performance flash
+ *                     storages. IPU will be triggered only if the # of dirty
+ *                     pages over min_fsync_blocks.
  * F2FS_IPUT_DISABLE - disable IPU. (=default option)
  */
 #define DEF_MIN_IPU_UTIL	70
+#define DEF_MIN_FSYNC_BLOCKS	8
 
 enum {
 	F2FS_IPU_FORCE,
 	F2FS_IPU_SSR,
 	F2FS_IPU_UTIL,
 	F2FS_IPU_SSR_UTIL,
+	F2FS_IPU_FSYNC,
 	F2FS_IPU_DISABLE,
 };
 
@@ -492,10 +497,6 @@ static inline bool need_inplace_update(struct inode *inode)
 	if (S_ISDIR(inode->i_mode))
 		return false;
 
-	/* this is only set during fdatasync */
-	if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
-		return true;
-
 	switch (SM_I(sbi)->ipu_policy) {
 	case F2FS_IPU_FORCE:
 		return true;
@@ -511,6 +512,11 @@ static inline bool need_inplace_update(struct inode *inode)
 		if (need_SSR(sbi) && utilization(sbi) > SM_I(sbi)->min_ipu_util)
 			return true;
 		break;
+	case F2FS_IPU_FSYNC:
+		/* this is only set during fdatasync */
+		if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
+			return true;
+		break;
 	case F2FS_IPU_DISABLE:
 		break;
 	}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b5af9be..ed4095e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -190,6 +190,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, reclaim_segments, rec_prefree_segments);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, max_small_discards, max_discards);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
+F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
 F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_victim_search, max_victim_search);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, dir_level, dir_level);
@@ -204,6 +205,7 @@ static struct attribute *f2fs_attrs[] = {
 	ATTR_LIST(max_small_discards),
 	ATTR_LIST(ipu_policy),
 	ATTR_LIST(min_ipu_util),
+	ATTR_LIST(min_fsync_blocks),
 	ATTR_LIST(max_victim_search),
 	ATTR_LIST(dir_level),
 	ATTR_LIST(ram_thresh),
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 05/10] f2fs: fix roll-forward missing scenarios
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
                   ` (2 preceding siblings ...)
  2014-09-14 22:14 ` [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 06/10] f2fs: do not skip latest inode information Jaegeuk Kim
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c     | 20 ++++++++++++----
 fs/f2fs/node.c     | 11 ++++++++-
 fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index af06e22..c814cd2 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -207,15 +207,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 			up_write(&fi->i_sem);
 		}
 	} else {
-		/* if there is no written node page, write its inode page */
-		while (!sync_node_pages(sbi, ino, &wbc)) {
-			if (fsync_mark_done(sbi, ino))
-				goto out;
+sync_nodes:
+		sync_node_pages(sbi, ino, &wbc);
+
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 *  -> ok
+		 * inode(x) | CP | dnode(F) | inode(x)
+		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
+		 * CP | inode(x) | dnode(F)
+		 *  -> CP | inode(x) | dnode(F) | inode(DF)
+		 * CP | dnode(F) | inode(x)
+		 *  -> CP | dnode(F) | inode(x) | inode(DF)
+		 */
+		if (!fsync_mark_done(sbi, ino)) {
 			mark_inode_dirty_sync(inode);
 			ret = f2fs_write_inode(inode, NULL);
 			if (ret)
 				goto out;
+			goto sync_nodes;
 		}
+
 		ret = wait_on_node_pages_writeback(sbi, ino);
 		if (ret)
 			goto out;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b32eb56..653aa71 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -248,8 +248,17 @@ retry:
 
 	/* update fsync_mark if its inode nat entry is still alive */
 	e = __lookup_nat_cache(nm_i, ni->ino);
-	if (e)
+	if (e) {
+		/*
+		 * CP | inode(x) | dnode(F)
+		 *  -> CP | inode(x) | dnode(F) | inode(DF)
+		 */
+		if (!e->checkpointed && !e->fsync_done &&
+				ni->ino != ni->nid && fsync_done)
+			goto skip;
 		e->fsync_done = fsync_done;
+	}
+skip:
 	write_unlock(&nm_i->nat_tree_lock);
 }
 
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6c5a74a..3736728 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -14,6 +14,36 @@
 #include "node.h"
 #include "segment.h"
 
+/*
+ * Roll forward recovery scenarios.
+ *
+ * [Term] F: fsync_mark, D: dentry_mark
+ *
+ * 1. inode(x) | CP | inode(x) | dnode(F)
+ * -> Update the latest inode(x).
+ *
+ * 2. inode(x) | CP | inode(F) | dnode(F)
+ * -> No problem.
+ *
+ * 3. inode(x) | CP | dnode(F) | inode(x)
+ * -> Recover to the latest dnode(F), and drop the last inode(x)
+ *
+ * 4. inode(x) | CP | dnode(F) | inode(F)
+ * -> No problem.
+ *
+ * 5. CP | inode(x) | dnode(F)
+ * -> The inode(DF) was missing. Should drop this dnode(F).
+ *
+ * 6. CP | inode(DF) | dnode(F)
+ * -> No problem.
+ *
+ * 7. CP | dnode(F) | inode(DF)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *
+ * 8. CP | dnode(F) | inode(x)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *    But it will fail due to no inode(DF).
+ */
 static struct kmem_cache *fsync_entry_slab;
 
 bool space_for_roll_forward(struct f2fs_sb_info *sbi)
@@ -110,27 +140,32 @@ out:
 	return err;
 }
 
-static int recover_inode(struct inode *inode, struct page *node_page)
+static void __recover_inode(struct inode *inode, struct page *page)
 {
-	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
+	struct f2fs_inode *raw = F2FS_INODE(page);
+
+	inode->i_mode = le16_to_cpu(raw->i_mode);
+	i_size_write(inode, le64_to_cpu(raw->i_size));
+	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
+	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
+	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+}
 
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
 	if (!IS_INODE(node_page))
 		return 0;
 
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
-	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
-	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
-	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+	__recover_inode(inode, node_page);
 
 	if (is_dent_dnode(node_page))
 		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), raw_inode->i_name);
+			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
 	return 0;
 }
 
@@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 				break;
 			}
 
+			/*
+			 * CP | dnode(F) | inode(DF)
+			 * For this case, we should not give up now.
+			 */
 			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (err == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
@@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 * In this case, we can lose the latest inode(x).
+		 * So, call __recover_inode for the inode update.
+		 */
+		if (IS_INODE(page))
+			__recover_inode(entry->inode, page);
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 06/10] f2fs: do not skip latest inode information
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
                   ` (3 preceding siblings ...)
  2014-09-14 22:14 ` [PATCH 05/10] f2fs: fix roll-forward missing scenarios Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed Jaegeuk Kim
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

In f2fs_sync_file, if there is no written appended writes, it skips
to write its node blocks.
But, if there is up-to-date inode page, we should write it to update
its metadata during the roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index c814cd2..fb96256 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -169,12 +169,21 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
 			!exist_written_data(sbi, ino, APPEND_INO)) {
+		struct page *i = find_get_page(NODE_MAPPING(sbi), ino);
+
+		/* But we need to avoid that there are some inode updates */
+		if ((i && PageDirty(i)) || !fsync_mark_done(sbi, ino)) {
+			f2fs_put_page(i, 0);
+			goto go_write;
+		}
+		f2fs_put_page(i, 0);
+
 		if (is_inode_flag_set(fi, FI_UPDATE_WRITE) ||
 				exist_written_data(sbi, ino, UPDATE_INO))
 			goto flush_out;
 		goto out;
 	}
-
+go_write:
 	/* guarantee free sections for fsync */
 	f2fs_balance_fs(sbi);
 
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
                   ` (4 preceding siblings ...)
  2014-09-14 22:14 ` [PATCH 06/10] f2fs: do not skip latest inode information Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-22  2:36   ` [f2fs-dev] " Chao Yu
  2014-09-14 22:14 ` [PATCH 08/10] f2fs: remove redundant operation during roll-forward recovery Jaegeuk Kim
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

Previously, all the dnode pages should be read during the roll-forward recovery.
Even worsely, whole the chain was traversed twice.
This patch removes that redundant and costly read operations by using page cache
of meta_inode and readahead function as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/checkpoint.c | 11 ++++++++--
 fs/f2fs/f2fs.h       |  5 +++--
 fs/f2fs/recovery.c   | 59 +++++++++++++++++++++++++---------------------------
 fs/f2fs/segment.h    |  5 +++--
 4 files changed, 43 insertions(+), 37 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7262d99..d1ed889 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -82,6 +82,8 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
 	case META_SSA:
 	case META_CP:
 		return 0;
+	case META_POR:
+		return SM_I(sbi)->main_blkaddr + sbi->user_block_count;
 	default:
 		BUG();
 	}
@@ -90,11 +92,11 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
 /*
  * Readahead CP/NAT/SIT/SSA pages
  */
-int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
+int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type)
 {
 	block_t prev_blk_addr = 0;
 	struct page *page;
-	int blkno = start;
+	block_t blkno = start;
 	int max_blks = get_max_meta_blks(sbi, type);
 
 	struct f2fs_io_info fio = {
@@ -128,6 +130,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
 			/* get ssa/cp block addr */
 			blk_addr = blkno;
 			break;
+		case META_POR:
+			if (unlikely(blkno >= max_blks))
+				goto out;
+			blk_addr = blkno;
+			break;
 		default:
 			BUG();
 		}
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4f84d2a..48d7d46 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -103,7 +103,8 @@ enum {
 	META_CP,
 	META_NAT,
 	META_SIT,
-	META_SSA
+	META_SSA,
+	META_POR,
 };
 
 /* for the list of ino */
@@ -1291,7 +1292,7 @@ void destroy_segment_manager_caches(void);
  */
 struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
 struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
-int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
+int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int);
 long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
 void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
 void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 3736728..6f7fbfa 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -173,7 +173,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 {
 	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
 	struct curseg_info *curseg;
-	struct page *page;
+	struct page *page = NULL;
 	block_t blkaddr;
 	int err = 0;
 
@@ -181,20 +181,19 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
 	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
 
-	/* read node page */
-	page = alloc_page(GFP_F2FS_ZERO);
-	if (!page)
-		return -ENOMEM;
-	lock_page(page);
-
 	while (1) {
 		struct fsync_inode_entry *entry;
 
-		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
-		if (err)
-			return err;
+		if (blkaddr < SM_I(sbi)->main_blkaddr ||
+					blkaddr > TOTAL_BLKS(sbi))
+			break;
 
-		lock_page(page);
+		page = get_meta_page(sbi, blkaddr);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		ra_meta_pages(sbi, next_blkaddr_of_node(page),
+				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
 
 		if (cp_ver != cpver_of_node(page))
 			break;
@@ -243,11 +242,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 next:
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
+		f2fs_put_page(page, 1);
 	}
-
-	unlock_page(page);
-	__free_pages(page, 0);
-
+	f2fs_put_page(page, 1);
 	return err;
 }
 
@@ -427,7 +424,7 @@ static int recover_data(struct f2fs_sb_info *sbi,
 {
 	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
 	struct curseg_info *curseg;
-	struct page *page;
+	struct page *page = NULL;
 	int err = 0;
 	block_t blkaddr;
 
@@ -435,21 +432,19 @@ static int recover_data(struct f2fs_sb_info *sbi,
 	curseg = CURSEG_I(sbi, type);
 	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
 
-	/* read node page */
-	page = alloc_page(GFP_F2FS_ZERO);
-	if (!page)
-		return -ENOMEM;
-
-	lock_page(page);
-
 	while (1) {
 		struct fsync_inode_entry *entry;
 
-		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
-		if (err)
-			return err;
+		if (blkaddr < SM_I(sbi)->main_blkaddr ||
+					blkaddr > TOTAL_BLKS(sbi))
+			break;
 
-		lock_page(page);
+		page = get_meta_page(sbi, blkaddr);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		ra_meta_pages(sbi, next_blkaddr_of_node(page),
+				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
 
 		if (cp_ver != cpver_of_node(page))
 			break;
@@ -477,11 +472,9 @@ static int recover_data(struct f2fs_sb_info *sbi,
 next:
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
+		f2fs_put_page(page, 1);
 	}
-
-	unlock_page(page);
-	__free_pages(page, 0);
-
+	f2fs_put_page(page, 1);
 	if (!err)
 		allocate_new_segments(sbi);
 	return err;
@@ -527,6 +520,10 @@ out:
 	destroy_fsync_dnodes(&inode_list);
 	kmem_cache_destroy(fsync_entry_slab);
 
+	/* truncate meta pages to be used by the recovery */
+	truncate_inode_pages_range(META_MAPPING(sbi),
+		SM_I(sbi)->main_blkaddr << PAGE_CACHE_SHIFT, -1);
+
 	if (err) {
 		truncate_inode_pages_final(NODE_MAPPING(sbi));
 		truncate_inode_pages_final(META_MAPPING(sbi));
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 013aed2..c6f37f2 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -87,6 +87,7 @@
 	(BITS_TO_LONGS(nr) * sizeof(unsigned long))
 #define TOTAL_SEGS(sbi)	(SM_I(sbi)->main_segments)
 #define TOTAL_SECS(sbi)	(sbi->total_sections)
+#define TOTAL_BLKS(sbi)	(SM_I(sbi)->segment_count << sbi->log_blocks_per_seg)
 
 #define SECTOR_FROM_BLOCK(sbi, blk_addr)				\
 	(((sector_t)blk_addr) << (sbi)->log_sectors_per_block)
@@ -553,7 +554,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
 static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
 {
 	struct f2fs_sm_info *sm_info = SM_I(sbi);
-	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
+	block_t total_blks = TOTAL_BLKS(sbi);
 	block_t start_addr = sm_info->seg0_blkaddr;
 	block_t end_addr = start_addr + total_blks - 1;
 	BUG_ON(blk_addr < start_addr);
@@ -606,7 +607,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
 static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
 {
 	struct f2fs_sm_info *sm_info = SM_I(sbi);
-	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
+	block_t total_blks = TOTAL_BLKS(sbi);
 	block_t start_addr = sm_info->seg0_blkaddr;
 	block_t end_addr = start_addr + total_blks - 1;
 
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 08/10] f2fs: remove redundant operation during roll-forward recovery
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
                   ` (5 preceding siblings ...)
  2014-09-14 22:14 ` [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi) Jaegeuk Kim
  2014-09-14 22:14 ` [PATCH 10/10] f2fs: fix double lock for inode page during roll-foward recovery Jaegeuk Kim
  8 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

If same data is updated multiple times, we don't need to redo whole the
operations.
Let's just update the lastest one.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/f2fs.h     |  4 +++-
 fs/f2fs/recovery.c | 41 +++++++++++++++++------------------------
 2 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 48d7d46..74dde99 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -137,7 +137,9 @@ struct discard_entry {
 struct fsync_inode_entry {
 	struct list_head list;	/* list head */
 	struct inode *inode;	/* vfs inode pointer */
-	block_t blkaddr;	/* block address locating the last inode */
+	block_t blkaddr;	/* block address locating the last fsync */
+	block_t last_dentry;	/* block address locating the last dentry */
+	block_t last_inode;	/* block address locating the last inode */
 };
 
 #define nats_in_cursum(sum)		(le16_to_cpu(sum->n_nats))
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6f7fbfa..95d9dc9 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -66,7 +66,7 @@ static struct fsync_inode_entry *get_fsync_inode(struct list_head *head,
 	return NULL;
 }
 
-static int recover_dentry(struct page *ipage, struct inode *inode)
+static int recover_dentry(struct inode *inode, struct page *ipage)
 {
 	struct f2fs_inode *raw_inode = F2FS_INODE(ipage);
 	nid_t pino = le32_to_cpu(raw_inode->i_pino);
@@ -140,7 +140,7 @@ out:
 	return err;
 }
 
-static void __recover_inode(struct inode *inode, struct page *page)
+static void recover_inode(struct inode *inode, struct page *page)
 {
 	struct f2fs_inode *raw = F2FS_INODE(page);
 
@@ -152,21 +152,9 @@ static void __recover_inode(struct inode *inode, struct page *page)
 	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
 	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
 	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
-}
-
-static int recover_inode(struct inode *inode, struct page *node_page)
-{
-	if (!IS_INODE(node_page))
-		return 0;
-
-	__recover_inode(inode, node_page);
-
-	if (is_dent_dnode(node_page))
-		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
-	return 0;
+			ino_of_node(page), F2FS_INODE(page)->i_name);
 }
 
 static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
@@ -214,12 +202,11 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 			}
 
 			/* add this fsync inode to the list */
-			entry = kmem_cache_alloc(fsync_entry_slab, GFP_NOFS);
+			entry = kmem_cache_alloc(fsync_entry_slab, GFP_F2FS_ZERO);
 			if (!entry) {
 				err = -ENOMEM;
 				break;
 			}
-
 			/*
 			 * CP | dnode(F) | inode(DF)
 			 * For this case, we should not give up now.
@@ -236,9 +223,11 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 		}
 		entry->blkaddr = blkaddr;
 
-		err = recover_inode(entry->inode, page);
-		if (err && err != -ENOENT)
-			break;
+		if (IS_INODE(page)) {
+			entry->last_inode = blkaddr;
+			if (is_dent_dnode(page))
+				entry->last_dentry = blkaddr;
+		}
 next:
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
@@ -455,11 +444,15 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		/*
 		 * inode(x) | CP | inode(x) | dnode(F)
 		 * In this case, we can lose the latest inode(x).
-		 * So, call __recover_inode for the inode update.
+		 * So, call recover_inode for the inode update.
 		 */
-		if (IS_INODE(page))
-			__recover_inode(entry->inode, page);
-
+		if (entry->last_inode == blkaddr)
+			recover_inode(entry->inode, page);
+		if (entry->last_dentry == blkaddr) {
+			err = recover_dentry(entry->inode, page);
+			if (err)
+				break;
+		}
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
 			break;
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi)
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
                   ` (6 preceding siblings ...)
  2014-09-14 22:14 ` [PATCH 08/10] f2fs: remove redundant operation during roll-forward recovery Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  2014-09-14 22:34   ` Joe Perches
  2014-09-14 22:14 ` [PATCH 10/10] f2fs: fix double lock for inode page during roll-foward recovery Jaegeuk Kim
  8 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

This patch cleans up a simple macro.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/data.c     | 2 +-
 fs/f2fs/node.c     | 2 +-
 fs/f2fs/recovery.c | 4 ++--
 fs/f2fs/segment.c  | 2 +-
 fs/f2fs/segment.h  | 8 ++++----
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6637741..e800d71 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -193,7 +193,7 @@ void f2fs_submit_page_mbio(struct f2fs_sb_info *sbi, struct page *page,
 		__submit_merged_bio(io);
 alloc_new:
 	if (io->bio == NULL) {
-		int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+		int bio_blocks = MAX_BIO_BLOCKS(sbi);
 
 		io->bio = __bio_alloc(sbi, blk_addr, bio_blocks, is_read);
 		io->fio = *fio;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 653aa71..e38343d 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1692,7 +1692,7 @@ int restore_node_summary(struct f2fs_sb_info *sbi,
 	struct f2fs_summary *sum_entry;
 	struct inode *inode = sbi->sb->s_bdev->bd_inode;
 	block_t addr;
-	int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+	int bio_blocks = MAX_BIO_BLOCKS(sbi);
 	struct page *pages[bio_blocks];
 	int i, idx, last_offset, nrpages, err = 0;
 
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 95d9dc9..dbeb32d 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -181,7 +181,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 			return PTR_ERR(page);
 
 		ra_meta_pages(sbi, next_blkaddr_of_node(page),
-				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
+					MAX_BIO_BLOCKS(sbi), META_POR);
 
 		if (cp_ver != cpver_of_node(page))
 			break;
@@ -433,7 +433,7 @@ static int recover_data(struct f2fs_sb_info *sbi,
 			return PTR_ERR(page);
 
 		ra_meta_pages(sbi, next_blkaddr_of_node(page),
-				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
+					MAX_BIO_BLOCKS(sbi), META_POR);
 
 		if (cp_ver != cpver_of_node(page))
 			break;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c6f627b..4ea53aa 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1757,7 +1757,7 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
 	int sit_blk_cnt = SIT_BLK_CNT(sbi);
 	unsigned int i, start, end;
 	unsigned int readed, start_blk = 0;
-	int nrpages = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+	int nrpages = MAX_BIO_BLOCKS(sbi);
 
 	do {
 		readed = ra_meta_pages(sbi, start_blk, nrpages, META_SIT);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index c6f37f2..1ae6c35 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -93,8 +93,8 @@
 	(((sector_t)blk_addr) << (sbi)->log_sectors_per_block)
 #define SECTOR_TO_BLOCK(sbi, sectors)					\
 	(sectors >> (sbi)->log_sectors_per_block)
-#define MAX_BIO_BLOCKS(max_hw_blocks)					\
-	(min((int)max_hw_blocks, BIO_MAX_PAGES))
+#define MAX_BIO_BLOCKS(sbi)					\
+	(min((int)max_hw_blocks(sbi), BIO_MAX_PAGES))
 
 /*
  * indicate a block allocation direction: RIGHT and LEFT.
@@ -728,7 +728,7 @@ static inline int nr_pages_to_skip(struct f2fs_sb_info *sbi, int type)
 	else if (type == NODE)
 		return 3 * sbi->blocks_per_seg;
 	else if (type == META)
-		return MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+		return MAX_BIO_BLOCKS(sbi);
 	else
 		return 0;
 }
@@ -751,7 +751,7 @@ static inline long nr_pages_to_write(struct f2fs_sb_info *sbi, int type,
 	else if (type == NODE)
 		desired = 3 * max_hw_blocks(sbi);
 	else
-		desired = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+		desired = MAX_BIO_BLOCKS(sbi);
 
 	wbc->nr_to_write = desired;
 	return desired - nr_to_write;
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 10/10] f2fs: fix double lock for inode page during roll-foward recovery
  2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
                   ` (7 preceding siblings ...)
  2014-09-14 22:14 ` [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi) Jaegeuk Kim
@ 2014-09-14 22:14 ` Jaegeuk Kim
  8 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-14 22:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-f2fs-devel; +Cc: Jaegeuk Kim

If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.

Error case is like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/recovery.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index dbeb32d..5638c05 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -306,16 +306,30 @@ got_it:
 	ino = ino_of_node(node_page);
 	f2fs_put_page(node_page, 1);
 
-	/* Deallocate previous index in the node page */
-	inode = f2fs_iget(sbi->sb, ino);
-	if (IS_ERR(inode))
-		return PTR_ERR(inode);
+	if (ino != dn->inode->i_ino) {
+		/* Deallocate previous index in the node page */
+		inode = f2fs_iget(sbi->sb, ino);
+		if (IS_ERR(inode))
+			return PTR_ERR(inode);
+	} else {
+		inode = dn->inode;
+	}
 
 	bidx = start_bidx_of_node(offset, F2FS_I(inode)) +
-					le16_to_cpu(sum.ofs_in_node);
+			le16_to_cpu(sum.ofs_in_node);
 
-	truncate_hole(inode, bidx, bidx + 1);
-	iput(inode);
+	if (ino != dn->inode->i_ino) {
+		truncate_hole(inode, bidx, bidx + 1);
+		iput(inode);
+	} else {
+		struct dnode_of_data tdn;
+		set_new_dnode(&tdn, inode, dn->inode_page, NULL, 0);
+		if (get_dnode_of_data(&tdn, bidx, LOOKUP_NODE))
+			return 0;
+		if (tdn.data_blkaddr != NULL_ADDR)
+			truncate_data_blocks_range(&tdn, 1);
+		f2fs_put_page(tdn.node_page, 1);
+	}
 	return 0;
 }
 
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi)
  2014-09-14 22:14 ` [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi) Jaegeuk Kim
@ 2014-09-14 22:34   ` Joe Perches
  2014-09-18  5:37     ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Joe Perches @ 2014-09-14 22:34 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

On Sun, 2014-09-14 at 15:14 -0700, Jaegeuk Kim wrote:
> This patch cleans up a simple macro.

There seems to be many different types used here.

	MAX_BIO_BLOCKS returns unsigned int
	bio_blocks is int,
	blocks_per_seg is unsigned int,
	ra_meta_pages(,,int,)
	nr_pages_to_skip uses int
	nr_pages_to_write returns a long

Perhaps it'd be nicer to standardize the type
so there'd be fewer promotions/implicit casts.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
  2014-09-14 22:14 ` [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users Jaegeuk Kim
@ 2014-09-15  2:13   ` Changman Lee
  2014-09-18  5:39     ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Changman Lee @ 2014-09-15  2:13 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

Hi JK,

I think it' nicer if this can be used as 'OR' with other policy
together. If so, we can also cover the weakness in high utilization.

Regard,
Changman

On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote:
> If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
> only starts to try in-place-updates.
> And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
> keeps out-of-order manner. Otherwise, it triggers in-place-updates.
> 
> This may be used by storage showing very high random write performance.
> 
> For example, it can be used when,
> 
> Seq. writes (Data) + wait + Seq. writes (Node)
> 
> is pretty much slower than,
> 
> Rand. writes (Data)
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  Documentation/ABI/testing/sysfs-fs-f2fs |  7 +++++++
>  Documentation/filesystems/f2fs.txt      |  9 ++++++++-
>  fs/f2fs/f2fs.h                          |  1 +
>  fs/f2fs/file.c                          |  7 +++----
>  fs/f2fs/segment.c                       |  3 ++-
>  fs/f2fs/segment.h                       | 14 ++++++++++----
>  fs/f2fs/super.c                         |  2 ++
>  7 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> index 62dd725..6f9157f 100644
> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> @@ -44,6 +44,13 @@ Description:
>  		 Controls the FS utilization condition for the in-place-update
>  		 policies.
>  
> +What:		/sys/fs/f2fs/<disk>/min_fsync_blocks
> +Date:		September 2014
> +Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> +Description:
> +		 Controls the dirty page count condition for the in-place-update
> +		 policies.
> +
>  What:		/sys/fs/f2fs/<disk>/max_small_discards
>  Date:		November 2013
>  Contact:	"Jaegeuk Kim" <jaegeuk.kim@samsung.com>
> diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> index a2046a7..d010da8 100644
> --- a/Documentation/filesystems/f2fs.txt
> +++ b/Documentation/filesystems/f2fs.txt
> @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname>
>                                updates in f2fs. There are five policies:
>                                 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR,
>                                 2: F2FS_IPU_UTIL,  3: F2FS_IPU_SSR_UTIL,
> -                               4: F2FS_IPU_DISABLE.
> +                               4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE.
>  
>   min_ipu_util                 This parameter controls the threshold to trigger
>                                in-place-updates. The number indicates percentage
>                                of the filesystem utilization, and used by
>                                F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
>  
> + min_fsync_blocks             This parameter controls the threshold to trigger
> +                              in-place-updates when F2FS_IPU_FSYNC mode is set.
> +			      The number indicates the number of dirty pages
> +			      when fsync needs to flush on its call path. If
> +			      the number is less than this value, it triggers
> +			      in-place-updates.
> +
>   max_victim_search	      This parameter controls the number of trials to
>  			      find a victim segment when conducting SSR and
>  			      cleaning operations. The default value is 4096
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2756c16..4f84d2a 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -386,6 +386,7 @@ struct f2fs_sm_info {
>  
>  	unsigned int ipu_policy;	/* in-place-update policy */
>  	unsigned int min_ipu_util;	/* in-place-update threshold */
> +	unsigned int min_fsync_blocks;	/* threshold for fsync */
>  
>  	/* for flush command control */
>  	struct flush_cmd_control *cmd_control_info;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 77426c7..af06e22 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	trace_f2fs_sync_file_enter(inode);
>  
>  	/* if fdatasync is triggered, let's do in-place-update */
> -	if (datasync)
> +	if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
>  		set_inode_flag(fi, FI_NEED_IPU);
> -
>  	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
> -	if (datasync)
> -		clear_inode_flag(fi, FI_NEED_IPU);
> +	clear_inode_flag(fi, FI_NEED_IPU);
> +
>  	if (ret) {
>  		trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
>  		return ret;
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index e158d63..c6f627b 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1928,8 +1928,9 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
>  	sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
>  	sm_info->rec_prefree_segments = sm_info->main_segments *
>  					DEF_RECLAIM_PREFREE_SEGMENTS / 100;
> -	sm_info->ipu_policy = F2FS_IPU_DISABLE;
> +	sm_info->ipu_policy = F2FS_IPU_FSYNC;
>  	sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
> +	sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
>  
>  	INIT_LIST_HEAD(&sm_info->discard_list);
>  	sm_info->nr_discards = 0;
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index bed0dc9..013aed2 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -472,15 +472,20 @@ static inline int utilization(struct f2fs_sb_info *sbi)
>   * F2FS_IPU_UTIL - if FS utilization is over threashold,
>   * F2FS_IPU_SSR_UTIL - if SSR mode is activated and FS utilization is over
>   *                     threashold,
> + * F2FS_IPU_FSYNC - activated in fsync path only for high performance flash
> + *                     storages. IPU will be triggered only if the # of dirty
> + *                     pages over min_fsync_blocks.
>   * F2FS_IPUT_DISABLE - disable IPU. (=default option)
>   */
>  #define DEF_MIN_IPU_UTIL	70
> +#define DEF_MIN_FSYNC_BLOCKS	8
>  
>  enum {
>  	F2FS_IPU_FORCE,
>  	F2FS_IPU_SSR,
>  	F2FS_IPU_UTIL,
>  	F2FS_IPU_SSR_UTIL,
> +	F2FS_IPU_FSYNC,
>  	F2FS_IPU_DISABLE,
>  };
>  
> @@ -492,10 +497,6 @@ static inline bool need_inplace_update(struct inode *inode)
>  	if (S_ISDIR(inode->i_mode))
>  		return false;
>  
> -	/* this is only set during fdatasync */
> -	if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
> -		return true;
> -
>  	switch (SM_I(sbi)->ipu_policy) {
>  	case F2FS_IPU_FORCE:
>  		return true;
> @@ -511,6 +512,11 @@ static inline bool need_inplace_update(struct inode *inode)
>  		if (need_SSR(sbi) && utilization(sbi) > SM_I(sbi)->min_ipu_util)
>  			return true;
>  		break;
> +	case F2FS_IPU_FSYNC:
> +		/* this is only set during fdatasync */
> +		if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
> +			return true;
> +		break;
>  	case F2FS_IPU_DISABLE:
>  		break;
>  	}
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index b5af9be..ed4095e 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -190,6 +190,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, reclaim_segments, rec_prefree_segments);
>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, max_small_discards, max_discards);
>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
> +F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_victim_search, max_victim_search);
>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, dir_level, dir_level);
> @@ -204,6 +205,7 @@ static struct attribute *f2fs_attrs[] = {
>  	ATTR_LIST(max_small_discards),
>  	ATTR_LIST(ipu_policy),
>  	ATTR_LIST(min_ipu_util),
> +	ATTR_LIST(min_fsync_blocks),
>  	ATTR_LIST(max_victim_search),
>  	ATTR_LIST(dir_level),
>  	ATTR_LIST(ram_thresh),
> -- 
> 1.8.5.2 (Apple Git-48)
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi)
  2014-09-14 22:34   ` Joe Perches
@ 2014-09-18  5:37     ` Jaegeuk Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-18  5:37 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

Hi,

Thank you for the review.
I changed the return value of MAX_BIO_BLOCKS to int.
IMO, it's the best way that I can do for now.

Thanks,

>From f3bcd1d658d1c4aa8178ddc2d4e6a7e45d8405cd Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Thu, 11 Sep 2014 14:37:35 -0700
Subject: [PATCH] f2fs: use MAX_BIO_BLOCKS(sbi)

Change log from v1:
 o change the return value of MAX_BIO_BLOCKS to int type.

This patch cleans up a simple macro.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/data.c     | 2 +-
 fs/f2fs/node.c     | 2 +-
 fs/f2fs/recovery.c | 4 ++--
 fs/f2fs/segment.c  | 2 +-
 fs/f2fs/segment.h  | 8 ++++----
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index fdc3dbe..7749f30 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -193,7 +193,7 @@ void f2fs_submit_page_mbio(struct f2fs_sb_info *sbi, struct page *page,
 		__submit_merged_bio(io);
 alloc_new:
 	if (io->bio == NULL) {
-		int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+		int bio_blocks = MAX_BIO_BLOCKS(sbi);
 
 		io->bio = __bio_alloc(sbi, blk_addr, bio_blocks, is_read);
 		io->fio = *fio;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 7a2d9c9..21ed91b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1697,7 +1697,7 @@ int restore_node_summary(struct f2fs_sb_info *sbi,
 	struct f2fs_summary *sum_entry;
 	struct inode *inode = sbi->sb->s_bdev->bd_inode;
 	block_t addr;
-	int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+	int bio_blocks = MAX_BIO_BLOCKS(sbi);
 	struct page *pages[bio_blocks];
 	int i, idx, last_offset, nrpages, err = 0;
 
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 78ef10c..1ccee36 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -180,7 +180,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 		page = get_meta_page(sbi, blkaddr);
 
 		ra_meta_pages(sbi, next_blkaddr_of_node(page),
-				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
+					MAX_BIO_BLOCKS(sbi), META_POR);
 
 		if (cp_ver != cpver_of_node(page))
 			break;
@@ -444,7 +444,7 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		page = get_meta_page(sbi, blkaddr);
 
 		ra_meta_pages(sbi, next_blkaddr_of_node(page),
-				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
+					MAX_BIO_BLOCKS(sbi), META_POR);
 
 		if (cp_ver != cpver_of_node(page)) {
 			f2fs_put_page(page, 1);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c6f627b..4ea53aa 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1757,7 +1757,7 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
 	int sit_blk_cnt = SIT_BLK_CNT(sbi);
 	unsigned int i, start, end;
 	unsigned int readed, start_blk = 0;
-	int nrpages = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+	int nrpages = MAX_BIO_BLOCKS(sbi);
 
 	do {
 		readed = ra_meta_pages(sbi, start_blk, nrpages, META_SIT);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index c6f37f2..633e822 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -93,8 +93,8 @@
 	(((sector_t)blk_addr) << (sbi)->log_sectors_per_block)
 #define SECTOR_TO_BLOCK(sbi, sectors)					\
 	(sectors >> (sbi)->log_sectors_per_block)
-#define MAX_BIO_BLOCKS(max_hw_blocks)					\
-	(min((int)max_hw_blocks, BIO_MAX_PAGES))
+#define MAX_BIO_BLOCKS(sbi)						\
+	((int)min(max_hw_blocks(sbi), BIO_MAX_PAGES))
 
 /*
  * indicate a block allocation direction: RIGHT and LEFT.
@@ -728,7 +728,7 @@ static inline int nr_pages_to_skip(struct f2fs_sb_info *sbi, int type)
 	else if (type == NODE)
 		return 3 * sbi->blocks_per_seg;
 	else if (type == META)
-		return MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+		return MAX_BIO_BLOCKS(sbi);
 	else
 		return 0;
 }
@@ -751,7 +751,7 @@ static inline long nr_pages_to_write(struct f2fs_sb_info *sbi, int type,
 	else if (type == NODE)
 		desired = 3 * max_hw_blocks(sbi);
 	else
-		desired = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+		desired = MAX_BIO_BLOCKS(sbi);
 
 	wbc->nr_to_write = desired;
 	return desired - nr_to_write;
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
  2014-09-15  2:13   ` [f2fs-dev] " Changman Lee
@ 2014-09-18  5:39     ` Jaegeuk Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-18  5:39 UTC (permalink / raw)
  To: Changman Lee; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

On Mon, Sep 15, 2014 at 11:13:15AM +0900, Changman Lee wrote:
> Hi JK,
> 
> I think it' nicer if this can be used as 'OR' with other policy
> together. If so, we can also cover the weakness in high utilization.

Agreed.
I'll send another patch for that.
Thanks,

> 
> Regard,
> Changman
> 
> On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote:
> > If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
> > only starts to try in-place-updates.
> > And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
> > keeps out-of-order manner. Otherwise, it triggers in-place-updates.
> > 
> > This may be used by storage showing very high random write performance.
> > 
> > For example, it can be used when,
> > 
> > Seq. writes (Data) + wait + Seq. writes (Node)
> > 
> > is pretty much slower than,
> > 
> > Rand. writes (Data)
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  Documentation/ABI/testing/sysfs-fs-f2fs |  7 +++++++
> >  Documentation/filesystems/f2fs.txt      |  9 ++++++++-
> >  fs/f2fs/f2fs.h                          |  1 +
> >  fs/f2fs/file.c                          |  7 +++----
> >  fs/f2fs/segment.c                       |  3 ++-
> >  fs/f2fs/segment.h                       | 14 ++++++++++----
> >  fs/f2fs/super.c                         |  2 ++
> >  7 files changed, 33 insertions(+), 10 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> > index 62dd725..6f9157f 100644
> > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > @@ -44,6 +44,13 @@ Description:
> >  		 Controls the FS utilization condition for the in-place-update
> >  		 policies.
> >  
> > +What:		/sys/fs/f2fs/<disk>/min_fsync_blocks
> > +Date:		September 2014
> > +Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> > +Description:
> > +		 Controls the dirty page count condition for the in-place-update
> > +		 policies.
> > +
> >  What:		/sys/fs/f2fs/<disk>/max_small_discards
> >  Date:		November 2013
> >  Contact:	"Jaegeuk Kim" <jaegeuk.kim@samsung.com>
> > diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> > index a2046a7..d010da8 100644
> > --- a/Documentation/filesystems/f2fs.txt
> > +++ b/Documentation/filesystems/f2fs.txt
> > @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname>
> >                                updates in f2fs. There are five policies:
> >                                 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR,
> >                                 2: F2FS_IPU_UTIL,  3: F2FS_IPU_SSR_UTIL,
> > -                               4: F2FS_IPU_DISABLE.
> > +                               4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE.
> >  
> >   min_ipu_util                 This parameter controls the threshold to trigger
> >                                in-place-updates. The number indicates percentage
> >                                of the filesystem utilization, and used by
> >                                F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
> >  
> > + min_fsync_blocks             This parameter controls the threshold to trigger
> > +                              in-place-updates when F2FS_IPU_FSYNC mode is set.
> > +			      The number indicates the number of dirty pages
> > +			      when fsync needs to flush on its call path. If
> > +			      the number is less than this value, it triggers
> > +			      in-place-updates.
> > +
> >   max_victim_search	      This parameter controls the number of trials to
> >  			      find a victim segment when conducting SSR and
> >  			      cleaning operations. The default value is 4096
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 2756c16..4f84d2a 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -386,6 +386,7 @@ struct f2fs_sm_info {
> >  
> >  	unsigned int ipu_policy;	/* in-place-update policy */
> >  	unsigned int min_ipu_util;	/* in-place-update threshold */
> > +	unsigned int min_fsync_blocks;	/* threshold for fsync */
> >  
> >  	/* for flush command control */
> >  	struct flush_cmd_control *cmd_control_info;
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 77426c7..af06e22 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  	trace_f2fs_sync_file_enter(inode);
> >  
> >  	/* if fdatasync is triggered, let's do in-place-update */
> > -	if (datasync)
> > +	if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
> >  		set_inode_flag(fi, FI_NEED_IPU);
> > -
> >  	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
> > -	if (datasync)
> > -		clear_inode_flag(fi, FI_NEED_IPU);
> > +	clear_inode_flag(fi, FI_NEED_IPU);
> > +
> >  	if (ret) {
> >  		trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
> >  		return ret;
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index e158d63..c6f627b 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -1928,8 +1928,9 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
> >  	sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
> >  	sm_info->rec_prefree_segments = sm_info->main_segments *
> >  					DEF_RECLAIM_PREFREE_SEGMENTS / 100;
> > -	sm_info->ipu_policy = F2FS_IPU_DISABLE;
> > +	sm_info->ipu_policy = F2FS_IPU_FSYNC;
> >  	sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
> > +	sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
> >  
> >  	INIT_LIST_HEAD(&sm_info->discard_list);
> >  	sm_info->nr_discards = 0;
> > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> > index bed0dc9..013aed2 100644
> > --- a/fs/f2fs/segment.h
> > +++ b/fs/f2fs/segment.h
> > @@ -472,15 +472,20 @@ static inline int utilization(struct f2fs_sb_info *sbi)
> >   * F2FS_IPU_UTIL - if FS utilization is over threashold,
> >   * F2FS_IPU_SSR_UTIL - if SSR mode is activated and FS utilization is over
> >   *                     threashold,
> > + * F2FS_IPU_FSYNC - activated in fsync path only for high performance flash
> > + *                     storages. IPU will be triggered only if the # of dirty
> > + *                     pages over min_fsync_blocks.
> >   * F2FS_IPUT_DISABLE - disable IPU. (=default option)
> >   */
> >  #define DEF_MIN_IPU_UTIL	70
> > +#define DEF_MIN_FSYNC_BLOCKS	8
> >  
> >  enum {
> >  	F2FS_IPU_FORCE,
> >  	F2FS_IPU_SSR,
> >  	F2FS_IPU_UTIL,
> >  	F2FS_IPU_SSR_UTIL,
> > +	F2FS_IPU_FSYNC,
> >  	F2FS_IPU_DISABLE,
> >  };
> >  
> > @@ -492,10 +497,6 @@ static inline bool need_inplace_update(struct inode *inode)
> >  	if (S_ISDIR(inode->i_mode))
> >  		return false;
> >  
> > -	/* this is only set during fdatasync */
> > -	if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
> > -		return true;
> > -
> >  	switch (SM_I(sbi)->ipu_policy) {
> >  	case F2FS_IPU_FORCE:
> >  		return true;
> > @@ -511,6 +512,11 @@ static inline bool need_inplace_update(struct inode *inode)
> >  		if (need_SSR(sbi) && utilization(sbi) > SM_I(sbi)->min_ipu_util)
> >  			return true;
> >  		break;
> > +	case F2FS_IPU_FSYNC:
> > +		/* this is only set during fdatasync */
> > +		if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
> > +			return true;
> > +		break;
> >  	case F2FS_IPU_DISABLE:
> >  		break;
> >  	}
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index b5af9be..ed4095e 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -190,6 +190,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, reclaim_segments, rec_prefree_segments);
> >  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, max_small_discards, max_discards);
> >  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
> >  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
> > +F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
> >  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
> >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_victim_search, max_victim_search);
> >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, dir_level, dir_level);
> > @@ -204,6 +205,7 @@ static struct attribute *f2fs_attrs[] = {
> >  	ATTR_LIST(max_small_discards),
> >  	ATTR_LIST(ipu_policy),
> >  	ATTR_LIST(min_ipu_util),
> > +	ATTR_LIST(min_fsync_blocks),
> >  	ATTR_LIST(max_victim_search),
> >  	ATTR_LIST(dir_level),
> >  	ATTR_LIST(ram_thresh),
> > -- 
> > 1.8.5.2 (Apple Git-48)
> > 
> > 
> > ------------------------------------------------------------------------------
> > Want excitement?
> > Manually upgrade your production database.
> > When you want reliability, choose Perforce
> > Perforce version control. Predictably reliable.
> > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
  2014-09-14 22:14 ` [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed Jaegeuk Kim
@ 2014-09-22  2:36   ` Chao Yu
  2014-09-23  4:46     ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2014-09-22  2:36 UTC (permalink / raw)
  To: 'Jaegeuk Kim'; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Monday, September 15, 2014 6:14 AM
> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Cc: Jaegeuk Kim
> Subject: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
> 
> Previously, all the dnode pages should be read during the roll-forward recovery.
> Even worsely, whole the chain was traversed twice.
> This patch removes that redundant and costly read operations by using page cache
> of meta_inode and readahead function as well.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/checkpoint.c | 11 ++++++++--
>  fs/f2fs/f2fs.h       |  5 +++--
>  fs/f2fs/recovery.c   | 59 +++++++++++++++++++++++++---------------------------
>  fs/f2fs/segment.h    |  5 +++--
>  4 files changed, 43 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 7262d99..d1ed889 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -82,6 +82,8 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
>  	case META_SSA:
>  	case META_CP:
>  		return 0;
> +	case META_POR:
> +		return SM_I(sbi)->main_blkaddr + sbi->user_block_count;

Here we will skip virtual over-provision segments, so better to use TOTAL_BLKS(sbi).

>  	default:
>  		BUG();
>  	}
> @@ -90,11 +92,11 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
>  /*
>   * Readahead CP/NAT/SIT/SSA pages
>   */
> -int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
> +int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type)
>  {
>  	block_t prev_blk_addr = 0;
>  	struct page *page;
> -	int blkno = start;
> +	block_t blkno = start;
>  	int max_blks = get_max_meta_blks(sbi, type);
> 
>  	struct f2fs_io_info fio = {
> @@ -128,6 +130,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int
> type)
>  			/* get ssa/cp block addr */
>  			blk_addr = blkno;
>  			break;
> +		case META_POR:
> +			if (unlikely(blkno >= max_blks))
> +				goto out;
> +			blk_addr = blkno;
> +			break;

The real modification in patch which is merged to dev of f2fs is as following:

- /* get ssa/cp block addr */
+ case META_POR:
+ if (blkno >= max_blks || blkno < min_blks)
+ goto out;

IMHO, it's better to verify boundary separately for META_{SSA,CP,POR} with unlikely.
How do you think?

>  		default:
>  			BUG();
>  		}
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4f84d2a..48d7d46 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -103,7 +103,8 @@ enum {
>  	META_CP,
>  	META_NAT,
>  	META_SIT,
> -	META_SSA
> +	META_SSA,
> +	META_POR,
>  };
> 
>  /* for the list of ino */
> @@ -1291,7 +1292,7 @@ void destroy_segment_manager_caches(void);
>   */
>  struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
>  struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
> -int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
> +int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int);
>  long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
>  void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
>  void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 3736728..6f7fbfa 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -173,7 +173,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  {
>  	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
>  	struct curseg_info *curseg;
> -	struct page *page;
> +	struct page *page = NULL;
>  	block_t blkaddr;
>  	int err = 0;
> 
> @@ -181,20 +181,19 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
>  	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> 
> -	/* read node page */
> -	page = alloc_page(GFP_F2FS_ZERO);
> -	if (!page)
> -		return -ENOMEM;
> -	lock_page(page);
> -
>  	while (1) {
>  		struct fsync_inode_entry *entry;
> 
> -		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
> -		if (err)
> -			return err;
> +		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> +					blkaddr > TOTAL_BLKS(sbi))

blkaddr >= TOTAL_BLKS(sbi) ?

> +			break;
> 
> -		lock_page(page);
> +		page = get_meta_page(sbi, blkaddr);
> +		if (IS_ERR(page))
> +			return PTR_ERR(page);
> +
> +		ra_meta_pages(sbi, next_blkaddr_of_node(page),
> +				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);

In first round we readahead [0,n] pages, then in second round we readahead [1,n+1] pages,
but as [1,n] pages is already uptodate, so we will readahead only one page, the same for the
following rounds. It's low efficiency.

So how about modify in find_fsync_dnodes/recover_data like this:

	block_t blkaddr, ra_start_blk;
	int err = 0;

	/* get node pages in the current segment */
	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
	ra_start_blk = blkaddr;

	while (1) {
		struct fsync_inode_entry *entry;

		if (blkaddr < SM_I(sbi)->main_blkaddr ||
					blkaddr > TOTAL_BLKS(sbi))
			return 0;

		if (ra_start_blk == blkaddr) {
			ra_meta_pages(sbi, ra_start_blk, MAX_BIO_BLOCKS(sbi),
								META_POR);
			ra_start_blk += MAX_BIO_BLOCKS(sbi);
		}

		page = get_meta_page(sbi, blkaddr);

Thanks,
Yu

> 
>  		if (cp_ver != cpver_of_node(page))
>  			break;
> @@ -243,11 +242,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  next:
>  		/* check next segment */
>  		blkaddr = next_blkaddr_of_node(page);
> +		f2fs_put_page(page, 1);
>  	}
> -
> -	unlock_page(page);
> -	__free_pages(page, 0);
> -
> +	f2fs_put_page(page, 1);
>  	return err;
>  }
> 
> @@ -427,7 +424,7 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  {
>  	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
>  	struct curseg_info *curseg;
> -	struct page *page;
> +	struct page *page = NULL;
>  	int err = 0;
>  	block_t blkaddr;
> 
> @@ -435,21 +432,19 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  	curseg = CURSEG_I(sbi, type);
>  	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> 
> -	/* read node page */
> -	page = alloc_page(GFP_F2FS_ZERO);
> -	if (!page)
> -		return -ENOMEM;
> -
> -	lock_page(page);
> -
>  	while (1) {
>  		struct fsync_inode_entry *entry;
> 
> -		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
> -		if (err)
> -			return err;
> +		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> +					blkaddr > TOTAL_BLKS(sbi))
> +			break;
> 
> -		lock_page(page);
> +		page = get_meta_page(sbi, blkaddr);
> +		if (IS_ERR(page))
> +			return PTR_ERR(page);
> +
> +		ra_meta_pages(sbi, next_blkaddr_of_node(page),
> +				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
> 
>  		if (cp_ver != cpver_of_node(page))
>  			break;
> @@ -477,11 +472,9 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  next:
>  		/* check next segment */
>  		blkaddr = next_blkaddr_of_node(page);
> +		f2fs_put_page(page, 1);
>  	}
> -
> -	unlock_page(page);
> -	__free_pages(page, 0);
> -
> +	f2fs_put_page(page, 1);
>  	if (!err)
>  		allocate_new_segments(sbi);
>  	return err;
> @@ -527,6 +520,10 @@ out:
>  	destroy_fsync_dnodes(&inode_list);
>  	kmem_cache_destroy(fsync_entry_slab);
> 
> +	/* truncate meta pages to be used by the recovery */
> +	truncate_inode_pages_range(META_MAPPING(sbi),
> +		SM_I(sbi)->main_blkaddr << PAGE_CACHE_SHIFT, -1);
> +
>  	if (err) {
>  		truncate_inode_pages_final(NODE_MAPPING(sbi));
>  		truncate_inode_pages_final(META_MAPPING(sbi));
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index 013aed2..c6f37f2 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -87,6 +87,7 @@
>  	(BITS_TO_LONGS(nr) * sizeof(unsigned long))
>  #define TOTAL_SEGS(sbi)	(SM_I(sbi)->main_segments)
>  #define TOTAL_SECS(sbi)	(sbi->total_sections)
> +#define TOTAL_BLKS(sbi)	(SM_I(sbi)->segment_count << sbi->log_blocks_per_seg)
> 
>  #define SECTOR_FROM_BLOCK(sbi, blk_addr)				\
>  	(((sector_t)blk_addr) << (sbi)->log_sectors_per_block)
> @@ -553,7 +554,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int
> segno)
>  static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
>  {
>  	struct f2fs_sm_info *sm_info = SM_I(sbi);
> -	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
> +	block_t total_blks = TOTAL_BLKS(sbi);
>  	block_t start_addr = sm_info->seg0_blkaddr;
>  	block_t end_addr = start_addr + total_blks - 1;
>  	BUG_ON(blk_addr < start_addr);
> @@ -606,7 +607,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int
> segno)
>  static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
>  {
>  	struct f2fs_sm_info *sm_info = SM_I(sbi);
> -	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
> +	block_t total_blks = TOTAL_BLKS(sbi);
>  	block_t start_addr = sm_info->seg0_blkaddr;
>  	block_t end_addr = start_addr + total_blks - 1;
> 
> --
> 1.8.5.2 (Apple Git-48)
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
  2014-09-22  2:36   ` [f2fs-dev] " Chao Yu
@ 2014-09-23  4:46     ` Jaegeuk Kim
  2014-09-23  6:54       ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2014-09-23  4:46 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

Hi Chao,

On Mon, Sep 22, 2014 at 10:36:25AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Monday, September 15, 2014 6:14 AM
> > To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> > linux-f2fs-devel@lists.sourceforge.net
> > Cc: Jaegeuk Kim
> > Subject: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
> > 
> > Previously, all the dnode pages should be read during the roll-forward recovery.
> > Even worsely, whole the chain was traversed twice.
> > This patch removes that redundant and costly read operations by using page cache
> > of meta_inode and readahead function as well.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/checkpoint.c | 11 ++++++++--
> >  fs/f2fs/f2fs.h       |  5 +++--
> >  fs/f2fs/recovery.c   | 59 +++++++++++++++++++++++++---------------------------
> >  fs/f2fs/segment.h    |  5 +++--
> >  4 files changed, 43 insertions(+), 37 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 7262d99..d1ed889 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -82,6 +82,8 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
> >  	case META_SSA:
> >  	case META_CP:
> >  		return 0;
> > +	case META_POR:
> > +		return SM_I(sbi)->main_blkaddr + sbi->user_block_count;
> 
> Here we will skip virtual over-provision segments, so better to use TOTAL_BLKS(sbi).
> 
> >  	default:
> >  		BUG();
> >  	}
> > @@ -90,11 +92,11 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
> >  /*
> >   * Readahead CP/NAT/SIT/SSA pages
> >   */
> > -int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
> > +int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type)
> >  {
> >  	block_t prev_blk_addr = 0;
> >  	struct page *page;
> > -	int blkno = start;
> > +	block_t blkno = start;
> >  	int max_blks = get_max_meta_blks(sbi, type);
> > 
> >  	struct f2fs_io_info fio = {
> > @@ -128,6 +130,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int
> > type)
> >  			/* get ssa/cp block addr */
> >  			blk_addr = blkno;
> >  			break;
> > +		case META_POR:
> > +			if (unlikely(blkno >= max_blks))
> > +				goto out;
> > +			blk_addr = blkno;
> > +			break;
> 
> The real modification in patch which is merged to dev of f2fs is as following:
> 
> - /* get ssa/cp block addr */
> + case META_POR:
> + if (blkno >= max_blks || blkno < min_blks)
> + goto out;
> 
> IMHO, it's better to verify boundary separately for META_{SSA,CP,POR} with unlikely.
> How do you think?

Not bad.
Could you check the v2 below?

> 
> >  		default:
> >  			BUG();
> >  		}
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4f84d2a..48d7d46 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -103,7 +103,8 @@ enum {
> >  	META_CP,
> >  	META_NAT,
> >  	META_SIT,
> > -	META_SSA
> > +	META_SSA,
> > +	META_POR,
> >  };
> > 
> >  /* for the list of ino */
> > @@ -1291,7 +1292,7 @@ void destroy_segment_manager_caches(void);
> >   */
> >  struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
> >  struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
> > -int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
> > +int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int);
> >  long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
> >  void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
> >  void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 3736728..6f7fbfa 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -173,7 +173,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> > *head)
> >  {
> >  	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
> >  	struct curseg_info *curseg;
> > -	struct page *page;
> > +	struct page *page = NULL;
> >  	block_t blkaddr;
> >  	int err = 0;
> > 
> > @@ -181,20 +181,19 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> > *head)
> >  	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> >  	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> > 
> > -	/* read node page */
> > -	page = alloc_page(GFP_F2FS_ZERO);
> > -	if (!page)
> > -		return -ENOMEM;
> > -	lock_page(page);
> > -
> >  	while (1) {
> >  		struct fsync_inode_entry *entry;
> > 
> > -		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
> > -		if (err)
> > -			return err;
> > +		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> > +					blkaddr > TOTAL_BLKS(sbi))
> 
> blkaddr >= TOTAL_BLKS(sbi) ?

Right.

> 
> > +			break;
> > 
> > -		lock_page(page);
> > +		page = get_meta_page(sbi, blkaddr);
> > +		if (IS_ERR(page))
> > +			return PTR_ERR(page);
> > +
> > +		ra_meta_pages(sbi, next_blkaddr_of_node(page),
> > +				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
> 
> In first round we readahead [0,n] pages, then in second round we readahead [1,n+1] pages,
> but as [1,n] pages is already uptodate, so we will readahead only one page, the same for the
> following rounds. It's low efficiency.
> 
> So how about modify in find_fsync_dnodes/recover_data like this:
> 
> 	block_t blkaddr, ra_start_blk;
> 	int err = 0;
> 
> 	/* get node pages in the current segment */
> 	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> 	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> 	ra_start_blk = blkaddr;
> 
> 	while (1) {
> 		struct fsync_inode_entry *entry;
> 
> 		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> 					blkaddr > TOTAL_BLKS(sbi))
> 			return 0;
> 
> 		if (ra_start_blk == blkaddr) {
> 			ra_meta_pages(sbi, ra_start_blk, MAX_BIO_BLOCKS(sbi),
> 								META_POR);
> 			ra_start_blk += MAX_BIO_BLOCKS(sbi);
> 		}
> 
> 		page = get_meta_page(sbi, blkaddr);

Well. the blkaddr would not be alway able to be matched.
How about this?

commit 1cdbd53437f32046af8d94f551505754446c9720
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Thu Sep 11 13:49:55 2014 -0700

    f2fs: use meta_inode cache to improve roll-forward speed
    
    Previously, all the dnode pages should be read during the roll-forward recovery.
    Even worsely, whole the chain was traversed twice.
    This patch removes that redundant and costly read operations by using page cache
    of meta_inode and readahead function as well.
    
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d44d287..700869c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -72,7 +72,23 @@ out:
 	return page;
 }
 
-static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
+struct page *get_meta_page_ra(struct f2fs_sb_info *sbi, pgoff_t index)
+{
+	bool readahead = false;
+	struct page *page;
+
+	page = find_get_page(META_MAPPING(sbi), index);
+	if (!page || (page && !PageUptodate(page)))
+		readahead = true;
+	f2fs_put_page(page, 0);
+
+	if (readahead)
+		ra_meta_pages(sbi, index,
+				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
+	return get_meta_page(sbi, index);
+}
+
+static inline block_t get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
 {
 	switch (type) {
 	case META_NAT:
@@ -82,6 +98,8 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
 	case META_SSA:
 	case META_CP:
 		return 0;
+	case META_POR:
+		return SM_I(sbi)->main_blkaddr + TOTAL_BLKS(sbi);
 	default:
 		BUG();
 	}
@@ -90,12 +108,13 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
 /*
  * Readahead CP/NAT/SIT/SSA pages
  */
-int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
+int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type)
 {
 	block_t prev_blk_addr = 0;
 	struct page *page;
-	int blkno = start;
-	int max_blks = get_max_meta_blks(sbi, type);
+	block_t blkno = start;
+	block_t max_blks = get_max_meta_blks(sbi, type);
+	block_t min_blks = SM_I(sbi)->seg0_blkaddr;
 
 	struct f2fs_io_info fio = {
 		.type = META,
@@ -125,7 +144,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
 			break;
 		case META_SSA:
 		case META_CP:
-			/* get ssa/cp block addr */
+		case META_POR:
+			if (unlikely(blkno >= max_blks))
+				goto out;
+			if (unlikely(blkno < min_blks))
+				goto out;
 			blk_addr = blkno;
 			break;
 		default:
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8efa170..b6439c3 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -103,7 +103,8 @@ enum {
 	META_CP,
 	META_NAT,
 	META_SIT,
-	META_SSA
+	META_SSA,
+	META_POR,
 };
 
 /* for the list of ino */
@@ -1294,7 +1295,8 @@ void destroy_segment_manager_caches(void);
  */
 struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
 struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
-int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
+struct page *get_meta_page_ra(struct f2fs_sb_info *, pgoff_t);
+int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int);
 long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
 void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
 void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index e587ee1..3a259a9 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -138,7 +138,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 {
 	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
 	struct curseg_info *curseg;
-	struct page *page;
+	struct page *page = NULL;
 	block_t blkaddr;
 	int err = 0;
 
@@ -146,20 +146,14 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
 	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
 
-	/* read node page */
-	page = alloc_page(GFP_F2FS_ZERO);
-	if (!page)
-		return -ENOMEM;
-	lock_page(page);
-
 	while (1) {
 		struct fsync_inode_entry *entry;
 
-		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
-		if (err)
-			return err;
+		if (blkaddr < SM_I(sbi)->main_blkaddr ||
+					blkaddr >= TOTAL_BLKS(sbi))
+			return 0;
 
-		lock_page(page);
+		page = get_meta_page_ra(sbi, blkaddr);
 
 		if (cp_ver != cpver_of_node(page))
 			break;
@@ -202,11 +196,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 next:
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
+		f2fs_put_page(page, 1);
 	}
-
-	unlock_page(page);
-	__free_pages(page, 0);
-
+	f2fs_put_page(page, 1);
 	return err;
 }
 
@@ -400,7 +392,7 @@ static int recover_data(struct f2fs_sb_info *sbi,
 {
 	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
 	struct curseg_info *curseg;
-	struct page *page;
+	struct page *page = NULL;
 	int err = 0;
 	block_t blkaddr;
 
@@ -408,32 +400,29 @@ static int recover_data(struct f2fs_sb_info *sbi,
 	curseg = CURSEG_I(sbi, type);
 	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
 
-	/* read node page */
-	page = alloc_page(GFP_F2FS_ZERO);
-	if (!page)
-		return -ENOMEM;
-
-	lock_page(page);
-
 	while (1) {
 		struct fsync_inode_entry *entry;
 
-		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
-		if (err)
-			return err;
+		if (blkaddr < SM_I(sbi)->main_blkaddr ||
+					blkaddr > TOTAL_BLKS(sbi))
+			break;
 
-		lock_page(page);
+		page = get_meta_page_ra(sbi, blkaddr);
 
-		if (cp_ver != cpver_of_node(page))
+		if (cp_ver != cpver_of_node(page)) {
+			f2fs_put_page(page, 1);
 			break;
+		}
 
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
-		if (err)
+		if (err) {
+			f2fs_put_page(page, 1);
 			break;
+		}
 
 		if (entry->blkaddr == blkaddr) {
 			iput(entry->inode);
@@ -443,11 +432,8 @@ static int recover_data(struct f2fs_sb_info *sbi,
 next:
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
+		f2fs_put_page(page, 1);
 	}
-
-	unlock_page(page);
-	__free_pages(page, 0);
-
 	if (!err)
 		allocate_new_segments(sbi);
 	return err;
@@ -493,6 +479,10 @@ out:
 	destroy_fsync_dnodes(&inode_list);
 	kmem_cache_destroy(fsync_entry_slab);
 
+	/* truncate meta pages to be used by the recovery */
+	truncate_inode_pages_range(META_MAPPING(sbi),
+		SM_I(sbi)->main_blkaddr << PAGE_CACHE_SHIFT, -1);
+
 	if (err) {
 		truncate_inode_pages_final(NODE_MAPPING(sbi));
 		truncate_inode_pages_final(META_MAPPING(sbi));
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 013aed2..c6f37f2 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -87,6 +87,7 @@
 	(BITS_TO_LONGS(nr) * sizeof(unsigned long))
 #define TOTAL_SEGS(sbi)	(SM_I(sbi)->main_segments)
 #define TOTAL_SECS(sbi)	(sbi->total_sections)
+#define TOTAL_BLKS(sbi)	(SM_I(sbi)->segment_count << sbi->log_blocks_per_seg)
 
 #define SECTOR_FROM_BLOCK(sbi, blk_addr)				\
 	(((sector_t)blk_addr) << (sbi)->log_sectors_per_block)
@@ -553,7 +554,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
 static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
 {
 	struct f2fs_sm_info *sm_info = SM_I(sbi);
-	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
+	block_t total_blks = TOTAL_BLKS(sbi);
 	block_t start_addr = sm_info->seg0_blkaddr;
 	block_t end_addr = start_addr + total_blks - 1;
 	BUG_ON(blk_addr < start_addr);
@@ -606,7 +607,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
 static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
 {
 	struct f2fs_sm_info *sm_info = SM_I(sbi);
-	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
+	block_t total_blks = TOTAL_BLKS(sbi);
 	block_t start_addr = sm_info->seg0_blkaddr;
 	block_t end_addr = start_addr + total_blks - 1;
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
  2014-09-23  4:46     ` Jaegeuk Kim
@ 2014-09-23  6:54       ` Chao Yu
  0 siblings, 0 replies; 17+ messages in thread
From: Chao Yu @ 2014-09-23  6:54 UTC (permalink / raw)
  To: 'Jaegeuk Kim'; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Tuesday, September 23, 2014 12:47 PM
> To: Chao Yu
> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
> 
> Hi Chao,
> 
> On Mon, Sep 22, 2014 at 10:36:25AM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> > > -----Original Message-----
> > > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > > Sent: Monday, September 15, 2014 6:14 AM
> > > To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> > > linux-f2fs-devel@lists.sourceforge.net
> > > Cc: Jaegeuk Kim
> > > Subject: [f2fs-dev] [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed
> > >
> > > Previously, all the dnode pages should be read during the roll-forward recovery.
> > > Even worsely, whole the chain was traversed twice.
> > > This patch removes that redundant and costly read operations by using page cache
> > > of meta_inode and readahead function as well.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/checkpoint.c | 11 ++++++++--
> > >  fs/f2fs/f2fs.h       |  5 +++--
> > >  fs/f2fs/recovery.c   | 59 +++++++++++++++++++++++++---------------------------
> > >  fs/f2fs/segment.h    |  5 +++--
> > >  4 files changed, 43 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > > index 7262d99..d1ed889 100644
> > > --- a/fs/f2fs/checkpoint.c
> > > +++ b/fs/f2fs/checkpoint.c
> > > @@ -82,6 +82,8 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
> > >  	case META_SSA:
> > >  	case META_CP:
> > >  		return 0;
> > > +	case META_POR:
> > > +		return SM_I(sbi)->main_blkaddr + sbi->user_block_count;
> >
> > Here we will skip virtual over-provision segments, so better to use TOTAL_BLKS(sbi).
> >
> > >  	default:
> > >  		BUG();
> > >  	}
> > > @@ -90,11 +92,11 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
> > >  /*
> > >   * Readahead CP/NAT/SIT/SSA pages
> > >   */
> > > -int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
> > > +int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type)
> > >  {
> > >  	block_t prev_blk_addr = 0;
> > >  	struct page *page;
> > > -	int blkno = start;
> > > +	block_t blkno = start;
> > >  	int max_blks = get_max_meta_blks(sbi, type);
> > >
> > >  	struct f2fs_io_info fio = {
> > > @@ -128,6 +130,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages,
> int
> > > type)
> > >  			/* get ssa/cp block addr */
> > >  			blk_addr = blkno;
> > >  			break;
> > > +		case META_POR:
> > > +			if (unlikely(blkno >= max_blks))
> > > +				goto out;
> > > +			blk_addr = blkno;
> > > +			break;
> >
> > The real modification in patch which is merged to dev of f2fs is as following:
> >
> > - /* get ssa/cp block addr */
> > + case META_POR:
> > + if (blkno >= max_blks || blkno < min_blks)
> > + goto out;
> >
> > IMHO, it's better to verify boundary separately for META_{SSA,CP,POR} with unlikely.
> > How do you think?
> 
> Not bad.
> Could you check the v2 below?
> 
> >
> > >  		default:
> > >  			BUG();
> > >  		}
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index 4f84d2a..48d7d46 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -103,7 +103,8 @@ enum {
> > >  	META_CP,
> > >  	META_NAT,
> > >  	META_SIT,
> > > -	META_SSA
> > > +	META_SSA,
> > > +	META_POR,
> > >  };
> > >
> > >  /* for the list of ino */
> > > @@ -1291,7 +1292,7 @@ void destroy_segment_manager_caches(void);
> > >   */
> > >  struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
> > >  struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
> > > -int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
> > > +int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int);
> > >  long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
> > >  void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
> > >  void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
> > > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > > index 3736728..6f7fbfa 100644
> > > --- a/fs/f2fs/recovery.c
> > > +++ b/fs/f2fs/recovery.c
> > > @@ -173,7 +173,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> > > *head)
> > >  {
> > >  	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
> > >  	struct curseg_info *curseg;
> > > -	struct page *page;
> > > +	struct page *page = NULL;
> > >  	block_t blkaddr;
> > >  	int err = 0;
> > >
> > > @@ -181,20 +181,19 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> > > *head)
> > >  	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> > >  	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> > >
> > > -	/* read node page */
> > > -	page = alloc_page(GFP_F2FS_ZERO);
> > > -	if (!page)
> > > -		return -ENOMEM;
> > > -	lock_page(page);
> > > -
> > >  	while (1) {
> > >  		struct fsync_inode_entry *entry;
> > >
> > > -		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
> > > -		if (err)
> > > -			return err;
> > > +		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> > > +					blkaddr > TOTAL_BLKS(sbi))
> >
> > blkaddr >= TOTAL_BLKS(sbi) ?
> 
> Right.
> 
> >
> > > +			break;
> > >
> > > -		lock_page(page);
> > > +		page = get_meta_page(sbi, blkaddr);
> > > +		if (IS_ERR(page))
> > > +			return PTR_ERR(page);
> > > +
> > > +		ra_meta_pages(sbi, next_blkaddr_of_node(page),
> > > +				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
> >
> > In first round we readahead [0,n] pages, then in second round we readahead [1,n+1] pages,
> > but as [1,n] pages is already uptodate, so we will readahead only one page, the same for the
> > following rounds. It's low efficiency.
> >
> > So how about modify in find_fsync_dnodes/recover_data like this:
> >
> > 	block_t blkaddr, ra_start_blk;
> > 	int err = 0;
> >
> > 	/* get node pages in the current segment */
> > 	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
> > 	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> > 	ra_start_blk = blkaddr;
> >
> > 	while (1) {
> > 		struct fsync_inode_entry *entry;
> >
> > 		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> > 					blkaddr > TOTAL_BLKS(sbi))
> > 			return 0;
> >
> > 		if (ra_start_blk == blkaddr) {
> > 			ra_meta_pages(sbi, ra_start_blk, MAX_BIO_BLOCKS(sbi),
> > 								META_POR);
> > 			ra_start_blk += MAX_BIO_BLOCKS(sbi);
> > 		}
> >
> > 		page = get_meta_page(sbi, blkaddr);
> 
> Well. the blkaddr would not be alway able to be matched.
> How about this?
> 
> commit 1cdbd53437f32046af8d94f551505754446c9720
> Author: Jaegeuk Kim <jaegeuk@kernel.org>
> Date:   Thu Sep 11 13:49:55 2014 -0700
> 
>     f2fs: use meta_inode cache to improve roll-forward speed
> 
>     Previously, all the dnode pages should be read during the roll-forward recovery.
>     Even worsely, whole the chain was traversed twice.
>     This patch removes that redundant and costly read operations by using page cache
>     of meta_inode and readahead function as well.
> 
>     Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Reviewed-by: Chao Yu <chao2.yu@samsung.com>

> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d44d287..700869c 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -72,7 +72,23 @@ out:
>  	return page;
>  }
> 
> -static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
> +struct page *get_meta_page_ra(struct f2fs_sb_info *sbi, pgoff_t index)
> +{
> +	bool readahead = false;
> +	struct page *page;
> +
> +	page = find_get_page(META_MAPPING(sbi), index);
> +	if (!page || (page && !PageUptodate(page)))
> +		readahead = true;
> +	f2fs_put_page(page, 0);
> +
> +	if (readahead)
> +		ra_meta_pages(sbi, index,
> +				MAX_BIO_BLOCKS(max_hw_blocks(sbi)), META_POR);
> +	return get_meta_page(sbi, index);
> +}
> +
> +static inline block_t get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
>  {
>  	switch (type) {
>  	case META_NAT:
> @@ -82,6 +98,8 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
>  	case META_SSA:
>  	case META_CP:
>  		return 0;
> +	case META_POR:
> +		return SM_I(sbi)->main_blkaddr + TOTAL_BLKS(sbi);

return SM_I(sbi)->seg0_blkaddr + TOTAL_BLKS(sbi);

>  	default:
>  		BUG();
>  	}
> @@ -90,12 +108,13 @@ static inline int get_max_meta_blks(struct f2fs_sb_info *sbi, int type)
>  /*
>   * Readahead CP/NAT/SIT/SSA pages
>   */
> -int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
> +int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type)
>  {
>  	block_t prev_blk_addr = 0;
>  	struct page *page;
> -	int blkno = start;
> -	int max_blks = get_max_meta_blks(sbi, type);
> +	block_t blkno = start;
> +	block_t max_blks = get_max_meta_blks(sbi, type);
> +	block_t min_blks = SM_I(sbi)->seg0_blkaddr;
> 
>  	struct f2fs_io_info fio = {
>  		.type = META,
> @@ -125,7 +144,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int
> type)
>  			break;
>  		case META_SSA:
>  		case META_CP:
> -			/* get ssa/cp block addr */
> +		case META_POR:
> +			if (unlikely(blkno >= max_blks))
> +				goto out;
> +			if (unlikely(blkno < min_blks))
> +				goto out;
>  			blk_addr = blkno;
>  			break;
>  		default:
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 8efa170..b6439c3 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -103,7 +103,8 @@ enum {
>  	META_CP,
>  	META_NAT,
>  	META_SIT,
> -	META_SSA
> +	META_SSA,
> +	META_POR,
>  };
> 
>  /* for the list of ino */
> @@ -1294,7 +1295,8 @@ void destroy_segment_manager_caches(void);
>   */
>  struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
>  struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
> -int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
> +struct page *get_meta_page_ra(struct f2fs_sb_info *, pgoff_t);
> +int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int);
>  long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
>  void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
>  void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index e587ee1..3a259a9 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -138,7 +138,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  {
>  	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
>  	struct curseg_info *curseg;
> -	struct page *page;
> +	struct page *page = NULL;
>  	block_t blkaddr;
>  	int err = 0;
> 
> @@ -146,20 +146,14 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
>  	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> 
> -	/* read node page */
> -	page = alloc_page(GFP_F2FS_ZERO);
> -	if (!page)
> -		return -ENOMEM;
> -	lock_page(page);
> -
>  	while (1) {
>  		struct fsync_inode_entry *entry;
> 
> -		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
> -		if (err)
> -			return err;
> +		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> +					blkaddr >= TOTAL_BLKS(sbi))

Here, upper boundary should be the same as value in get_max_meta_blks().

blkaddr >= SM_I(sbi)->seg0_blkaddr + TOTAL_BLKS(sbi)

> +			return 0;
> 
> -		lock_page(page);
> +		page = get_meta_page_ra(sbi, blkaddr);
> 
>  		if (cp_ver != cpver_of_node(page))
>  			break;
> @@ -202,11 +196,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  next:
>  		/* check next segment */
>  		blkaddr = next_blkaddr_of_node(page);
> +		f2fs_put_page(page, 1);
>  	}
> -
> -	unlock_page(page);
> -	__free_pages(page, 0);
> -
> +	f2fs_put_page(page, 1);
>  	return err;
>  }
> 
> @@ -400,7 +392,7 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  {
>  	unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
>  	struct curseg_info *curseg;
> -	struct page *page;
> +	struct page *page = NULL;
>  	int err = 0;
>  	block_t blkaddr;
> 
> @@ -408,32 +400,29 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  	curseg = CURSEG_I(sbi, type);
>  	blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
> 
> -	/* read node page */
> -	page = alloc_page(GFP_F2FS_ZERO);
> -	if (!page)
> -		return -ENOMEM;
> -
> -	lock_page(page);
> -
>  	while (1) {
>  		struct fsync_inode_entry *entry;
> 
> -		err = f2fs_submit_page_bio(sbi, page, blkaddr, READ_SYNC);
> -		if (err)
> -			return err;
> +		if (blkaddr < SM_I(sbi)->main_blkaddr ||
> +					blkaddr > TOTAL_BLKS(sbi))

blkaddr >= SM_I(sbi)->seg0_blkaddr + TOTAL_BLKS(sbi)

It's my mistaken that I do not reminder you to fix this in recover_data.

Thanks,
Yu

> +			break;
> 
> -		lock_page(page);
> +		page = get_meta_page_ra(sbi, blkaddr);
> 
> -		if (cp_ver != cpver_of_node(page))
> +		if (cp_ver != cpver_of_node(page)) {
> +			f2fs_put_page(page, 1);
>  			break;
> +		}
> 
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> 
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> -		if (err)
> +		if (err) {
> +			f2fs_put_page(page, 1);
>  			break;
> +		}
> 
>  		if (entry->blkaddr == blkaddr) {
>  			iput(entry->inode);
> @@ -443,11 +432,8 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  next:
>  		/* check next segment */
>  		blkaddr = next_blkaddr_of_node(page);
> +		f2fs_put_page(page, 1);
>  	}
> -
> -	unlock_page(page);
> -	__free_pages(page, 0);
> -
>  	if (!err)
>  		allocate_new_segments(sbi);
>  	return err;
> @@ -493,6 +479,10 @@ out:
>  	destroy_fsync_dnodes(&inode_list);
>  	kmem_cache_destroy(fsync_entry_slab);
> 
> +	/* truncate meta pages to be used by the recovery */
> +	truncate_inode_pages_range(META_MAPPING(sbi),
> +		SM_I(sbi)->main_blkaddr << PAGE_CACHE_SHIFT, -1);
> +
>  	if (err) {
>  		truncate_inode_pages_final(NODE_MAPPING(sbi));
>  		truncate_inode_pages_final(META_MAPPING(sbi));
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index 013aed2..c6f37f2 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -87,6 +87,7 @@
>  	(BITS_TO_LONGS(nr) * sizeof(unsigned long))
>  #define TOTAL_SEGS(sbi)	(SM_I(sbi)->main_segments)
>  #define TOTAL_SECS(sbi)	(sbi->total_sections)
> +#define TOTAL_BLKS(sbi)	(SM_I(sbi)->segment_count << sbi->log_blocks_per_seg)
> 
>  #define SECTOR_FROM_BLOCK(sbi, blk_addr)				\
>  	(((sector_t)blk_addr) << (sbi)->log_sectors_per_block)
> @@ -553,7 +554,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int
> segno)
>  static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
>  {
>  	struct f2fs_sm_info *sm_info = SM_I(sbi);
> -	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
> +	block_t total_blks = TOTAL_BLKS(sbi);
>  	block_t start_addr = sm_info->seg0_blkaddr;
>  	block_t end_addr = start_addr + total_blks - 1;
>  	BUG_ON(blk_addr < start_addr);
> @@ -606,7 +607,7 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int
> segno)
>  static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
>  {
>  	struct f2fs_sm_info *sm_info = SM_I(sbi);
> -	block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
> +	block_t total_blks = TOTAL_BLKS(sbi);
>  	block_t start_addr = sm_info->seg0_blkaddr;
>  	block_t end_addr = start_addr + total_blks - 1;
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-09-23  6:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-14 22:14 [PATCH 01/10] f2fs: fix negative value for lseek offset Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 02/10] f2fs: remove lengthy inode->i_ino Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 03/10] f2fs: expand counting dirty pages in the inode page cache Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users Jaegeuk Kim
2014-09-15  2:13   ` [f2fs-dev] " Changman Lee
2014-09-18  5:39     ` Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 05/10] f2fs: fix roll-forward missing scenarios Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 06/10] f2fs: do not skip latest inode information Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 07/10] f2fs: use meta_inode cache to improve roll-forward speed Jaegeuk Kim
2014-09-22  2:36   ` [f2fs-dev] " Chao Yu
2014-09-23  4:46     ` Jaegeuk Kim
2014-09-23  6:54       ` Chao Yu
2014-09-14 22:14 ` [PATCH 08/10] f2fs: remove redundant operation during roll-forward recovery Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 09/10] f2fs: use MAX_BIO_BLOCKS(sbi) Jaegeuk Kim
2014-09-14 22:34   ` Joe Perches
2014-09-18  5:37     ` Jaegeuk Kim
2014-09-14 22:14 ` [PATCH 10/10] f2fs: fix double lock for inode page during roll-foward recovery Jaegeuk Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).