* [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp
@ 2017-05-20 12:24 Hou Pengyang
  2017-05-20 12:24 ` [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing Hou Pengyang
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-05-20 12:24 UTC (permalink / raw)
  To: jaegeuk, yuchao0, chao, miaoxie; +Cc: linux-f2fs-devel

This patch set is designed to optimize the NAT/SIT flushing procedure:

patch 1) -- patch 3):

during flush_nat_entries, we do:
1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
    write to the journal as much as possible and avoid unnecessary IO.

These patches optimize the lookup & sort algorithm by introducing an array:

    f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
    (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1]) 

dirty_set[1] links all nat_entry_set whose entry_cnt is 1
dirty_set[2] links all nat_entry_set whose entry_cnt is 2
......
dirty_set[N] links all nat_entry_set whose entry_cnt is N

as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.

So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.

Update:
    we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.

Flush:
    when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
gang_lookup and sort can be avoided.
    
footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
                              = 8*455 = 3.6k bytes
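
For illustration, the whole scheme boils down to a few list operations (a
simplified sketch distilled from the diffs in this series; declarations,
locking and error handling omitted):

    /* init: one bucket per possible entry_cnt, 0..NAT_ENTRY_PER_BLOCK */
    for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++)
        INIT_LIST_HEAD(&nm_i->dirty_set[i]);

    /* update: the set's entry_cnt grows by one, so it hops to the next bucket */
    head->entry_cnt++;
    list_move_tail(&head->set_list, &nm_i->dirty_set[head->entry_cnt]);

    /*
     * flush: walking the buckets in index order visits the sets in
     * ascending entry_cnt order, i.e. already sorted for the journal check
     */
    for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++)
        list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list)
            __flush_nat_entry_set(sbi, set, cpc);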

Same for SIT.

In patch 5), we use an array to manage sit_entry_sets instead of a list, which
avoids the list traversal, and there is no need to alloc/free slab memory.
Its memory footprint is low.

Hou Pengyang (5):
  f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
  f2fs: reconstruct sit flushing code
  f2fs: use bucket sort to avoid list sort when flush sit entries
  f2fs: avoid unnecessary to_journal checking
  f2fs: use an array to record sit_entry_set info to make sort fast

 fs/f2fs/f2fs.h    |   5 +-
 fs/f2fs/node.c    |  79 ++++++++-----------
 fs/f2fs/segment.c | 227 ++++++++++++++++++++++++++----------------------------
 fs/f2fs/segment.h |   2 +-
 4 files changed, 147 insertions(+), 166 deletions(-)

-- 
2.10.1



* [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
  2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
@ 2017-05-20 12:24 ` Hou Pengyang
  2017-06-07  9:44   ` Chao Yu
  2017-06-09 13:51   ` Hou Pengyang
  2017-05-20 12:24 ` [PATCH 2/5] f2fs: reconstruct sit flushing code Hou Pengyang
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-05-20 12:24 UTC (permalink / raw)
  To: jaegeuk, yuchao0, chao, miaoxie; +Cc: linux-f2fs-devel

during flush_nat_entries, we do:
1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
    write to the journal as much as possible and avoid unnecessary IO.

This patch optimizes the lookup & sort algorithm by introducing an array:

    f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
    (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1]) 

dirty_set[1] links all nat_entry_set whose entry_cnt is 1
dirty_set[2] links all nat_entry_set whose entry_cnt is 2
......
dirty_set[N] links all nat_entry_set whose entry_cnt is N

as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.

So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.

Update:
    we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.

Flush:
    when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
gang_lookup and sort can be avoided.
    
footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
                              = 8*455 = 3.6k bytes

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 fs/f2fs/f2fs.h |  2 ++
 fs/f2fs/node.c | 56 ++++++++++++++++++++++----------------------------------
 2 files changed, 24 insertions(+), 34 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 093d68a..88a96bb 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -628,6 +628,8 @@ struct f2fs_nm_info {
 
 	/* for checkpoint */
 	char *nat_bitmap;		/* NAT bitmap pointer */
+	/* dirty nat_entry_set lists, bucketed by entry_cnt */
+	struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1];
 
 	unsigned int nat_bits_blocks;	/* # of nat bits blocks */
 	unsigned char *nat_bits;	/* NAT bits blocks */
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d22db8c..340c33d 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -174,13 +174,14 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 	list_move_tail(&ne->list, &head->entry_list);
 	nm_i->dirty_nat_cnt++;
 	head->entry_cnt++;
+	list_move_tail(&head->set_list, &(nm_i->dirty_set[head->entry_cnt]));
 	set_nat_flag(ne, IS_DIRTY, true);
 }
 
 static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 		struct nat_entry_set *set, struct nat_entry *ne)
 {
-	list_move_tail(&ne->list, &nm_i->nat_entries);
+	list_move_tail(&ne->list, &nm_i->nat_entries); /* to be shrunk */
 	set_nat_flag(ne, IS_DIRTY, false);
 	set->entry_cnt--;
 	nm_i->dirty_nat_cnt--;
@@ -2340,24 +2341,6 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
 	up_write(&curseg->journal_rwsem);
 }
 
-static void __adjust_nat_entry_set(struct nat_entry_set *nes,
-						struct list_head *head, int max)
-{
-	struct nat_entry_set *cur;
-
-	if (nes->entry_cnt >= max)
-		goto add_out;
-
-	list_for_each_entry(cur, head, set_list) {
-		if (cur->entry_cnt >= nes->entry_cnt) {
-			list_add(&nes->set_list, cur->set_list.prev);
-			return;
-		}
-	}
-add_out:
-	list_add_tail(&nes->set_list, head);
-}
-
 static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
 						struct page *page)
 {
@@ -2460,9 +2443,11 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
 
 	/* Allow dirty nats by node block allocation in write_begin */
 	if (!set->entry_cnt) {
+		list_del(&set->set_list);
 		radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set);
 		kmem_cache_free(nat_entry_set_slab, set);
-	}
+	} else
+		f2fs_bug_on(sbi, 1);
 }
 
 /*
@@ -2473,10 +2458,8 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	struct f2fs_nm_info *nm_i = NM_I(sbi);
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
 	struct f2fs_journal *journal = curseg->journal;
-	struct nat_entry_set *setvec[SETVEC_SIZE];
 	struct nat_entry_set *set, *tmp;
-	unsigned int found;
-	nid_t set_idx = 0;
+	int i;
 	LIST_HEAD(sets);
 
 	if (!nm_i->dirty_nat_cnt)
@@ -2493,19 +2476,12 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 		!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
 		remove_nats_in_journal(sbi);
 
-	while ((found = __gang_lookup_nat_set(nm_i,
-					set_idx, SETVEC_SIZE, setvec))) {
-		unsigned idx;
-		set_idx = setvec[found - 1]->set + 1;
-		for (idx = 0; idx < found; idx++)
-			__adjust_nat_entry_set(setvec[idx], &sets,
-						MAX_NAT_JENTRIES(journal));
+	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++) {
+		list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list)
+			__flush_nat_entry_set(sbi, set, cpc);
+		f2fs_bug_on(sbi, !list_empty(&nm_i->dirty_set[i]));
 	}
 
-	/* flush dirty nats in nat entry set */
-	list_for_each_entry_safe(set, tmp, &sets, set_list)
-		__flush_nat_entry_set(sbi, set, cpc);
-
 	up_write(&nm_i->nat_tree_lock);
 	/* Allow dirty nats by node block allocation in write_begin */
 }
@@ -2668,6 +2644,15 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi)
 	return 0;
 }
 
+static void init_dirty_set_list(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_nm_info *nm_i = NM_I(sbi);
+	int i;
+
+	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++)
+		INIT_LIST_HEAD(&nm_i->dirty_set[i]);
+}
+
 int build_node_manager(struct f2fs_sb_info *sbi)
 {
 	int err;
@@ -2687,6 +2672,9 @@ int build_node_manager(struct f2fs_sb_info *sbi)
 	/* load free nid status from nat_bits table */
 	load_free_nid_bitmap(sbi);
 
+	/* init dirty set list */
+	init_dirty_set_list(sbi);
+
 	build_free_nids(sbi, true, true);
 	return 0;
 }
-- 
2.10.1



* [PATCH 2/5] f2fs: reconstruct sit flushing code
  2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
  2017-05-20 12:24 ` [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing Hou Pengyang
@ 2017-05-20 12:24 ` Hou Pengyang
  2017-05-20 12:24 ` [PATCH 3/5] f2fs: use bucket sort to avoid list sort when flush sit entries Hou Pengyang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-05-20 12:24 UTC (permalink / raw)
  To: jaegeuk, yuchao0, chao, miaoxie; +Cc: linux-f2fs-devel

This patch reconstructs the sit flushing code for the following patches, with the logic unchanged.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 fs/f2fs/segment.c | 119 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 64 insertions(+), 55 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 2d87aff..2c99fec 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2724,6 +2724,69 @@ static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
 	up_write(&curseg->journal_rwsem);
 }
 
+static void __flush_sit_entry_set(struct f2fs_sb_info *sbi,
+							struct sit_entry_set *ses, struct cp_control *cpc,
+							bool to_journal)
+{
+	struct sit_info *sit_i = SIT_I(sbi);
+	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
+	struct f2fs_journal *journal = curseg->journal;
+	unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
+	struct page *page = NULL;
+	struct f2fs_sit_block *raw_sit = NULL;
+	unsigned int start_segno = ses->start_segno;
+	unsigned int end = min(start_segno + SIT_ENTRY_PER_BLOCK,
+			(unsigned long)MAIN_SEGS(sbi));
+	unsigned int segno = start_segno;
+	struct seg_entry *se;
+
+	if (to_journal) {
+		down_write(&curseg->journal_rwsem);
+	} else {
+		page = get_next_sit_page(sbi, start_segno);
+		raw_sit = page_address(page);
+	}
+
+	/* flush dirty sit entries in region of current sit set */
+	for_each_set_bit_from(segno, bitmap, end) {
+		int offset, sit_offset;
+
+		se = get_seg_entry(sbi, segno);
+
+		/* add discard candidates */
+		if (!(cpc->reason & CP_DISCARD)) {
+			cpc->trim_start = segno;
+			add_discard_addrs(sbi, cpc, false);
+		}
+
+		if (to_journal) {
+			offset = lookup_journal_in_cursum(journal,
+				SIT_JOURNAL, segno, 1);
+			f2fs_bug_on(sbi, offset < 0);
+			segno_in_journal(journal, offset) =
+				cpu_to_le32(segno);
+			seg_info_to_raw_sit(se,
+				&sit_in_journal(journal, offset));
+		} else {
+			sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
+			seg_info_to_raw_sit(se,
+				&raw_sit->entries[sit_offset]);
+		}
+
+		__clear_bit(segno, bitmap);
+		sit_i->dirty_sentries--;
+		ses->entry_cnt--;
+	}
+	if (to_journal)
+		up_write(&curseg->journal_rwsem);
+	else
+		f2fs_put_page(page, 1);
+
+	f2fs_bug_on(sbi, ses->entry_cnt);
+	release_sit_entry_set(ses);
+
+}
+
 /*
  * CP calls this function, which flushes SIT entries including sit_journal,
  * and moves prefree segs to free segs.
@@ -2731,13 +2794,11 @@ static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
 void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 {
 	struct sit_info *sit_i = SIT_I(sbi);
-	unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
 	struct f2fs_journal *journal = curseg->journal;
 	struct sit_entry_set *ses, *tmp;
 	struct list_head *head = &SM_I(sbi)->sit_entry_set;
 	bool to_journal = true;
-	struct seg_entry *se;
 
 	mutex_lock(&sit_i->sentry_lock);
 
@@ -2764,62 +2825,10 @@ void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	 * #2, flush sit entries to sit page.
 	 */
 	list_for_each_entry_safe(ses, tmp, head, set_list) {
-		struct page *page = NULL;
-		struct f2fs_sit_block *raw_sit = NULL;
-		unsigned int start_segno = ses->start_segno;
-		unsigned int end = min(start_segno + SIT_ENTRY_PER_BLOCK,
-						(unsigned long)MAIN_SEGS(sbi));
-		unsigned int segno = start_segno;
-
 		if (to_journal &&
 			!__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL))
 			to_journal = false;
-
-		if (to_journal) {
-			down_write(&curseg->journal_rwsem);
-		} else {
-			page = get_next_sit_page(sbi, start_segno);
-			raw_sit = page_address(page);
-		}
-
-		/* flush dirty sit entries in region of current sit set */
-		for_each_set_bit_from(segno, bitmap, end) {
-			int offset, sit_offset;
-
-			se = get_seg_entry(sbi, segno);
-
-			/* add discard candidates */
-			if (!(cpc->reason & CP_DISCARD)) {
-				cpc->trim_start = segno;
-				add_discard_addrs(sbi, cpc, false);
-			}
-
-			if (to_journal) {
-				offset = lookup_journal_in_cursum(journal,
-							SIT_JOURNAL, segno, 1);
-				f2fs_bug_on(sbi, offset < 0);
-				segno_in_journal(journal, offset) =
-							cpu_to_le32(segno);
-				seg_info_to_raw_sit(se,
-					&sit_in_journal(journal, offset));
-			} else {
-				sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
-				seg_info_to_raw_sit(se,
-						&raw_sit->entries[sit_offset]);
-			}
-
-			__clear_bit(segno, bitmap);
-			sit_i->dirty_sentries--;
-			ses->entry_cnt--;
-		}
-
-		if (to_journal)
-			up_write(&curseg->journal_rwsem);
-		else
-			f2fs_put_page(page, 1);
-
-		f2fs_bug_on(sbi, ses->entry_cnt);
-		release_sit_entry_set(ses);
+		__flush_sit_entry_set(sbi, ses, cpc, to_journal);
 	}
 
 	f2fs_bug_on(sbi, !list_empty(head));
-- 
2.10.1



* [PATCH 3/5] f2fs: use bucket sort to avoid list sort when flush sit entries
  2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
  2017-05-20 12:24 ` [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing Hou Pengyang
  2017-05-20 12:24 ` [PATCH 2/5] f2fs: reconstruct sit flushing code Hou Pengyang
@ 2017-05-20 12:24 ` Hou Pengyang
  2017-05-20 12:24 ` [PATCH 4/5] f2fs: avoid unnecessary to_journal checking Hou Pengyang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-05-20 12:24 UTC (permalink / raw)
  To: jaegeuk, yuchao0, chao, miaoxie; +Cc: linux-f2fs-devel

As we optimized nat flushing with bucket sort, in this patch we introduce a bucket
array f2fs_sm_info->dirty_set to link all the sit_entry_sets to be flushed, indexed by their entry_cnt.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 fs/f2fs/f2fs.h    |  1 +
 fs/f2fs/segment.c | 53 +++++++++++++++++++++++++++++------------------------
 fs/f2fs/segment.h |  3 ++-
 3 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 88a96bb..198da30 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -733,6 +733,7 @@ struct f2fs_sm_info {
 	unsigned int trim_sections;		/* # of sections to trim */
 
 	struct list_head sit_entry_set;	/* sit entry set list */
+	struct list_head dirty_set[SIT_ENTRY_PER_BLOCK + 1];
 
 	unsigned int ipu_policy;	/* in-place-update policy */
 	unsigned int min_ipu_util;	/* in-place-update threshold */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 2c99fec..af470d3 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2648,39 +2648,28 @@ static struct sit_entry_set *grab_sit_entry_set(void)
 
 	ses->entry_cnt = 0;
 	INIT_LIST_HEAD(&ses->set_list);
+	INIT_LIST_HEAD(&ses->cnt_list);
 	return ses;
 }
 
 static void release_sit_entry_set(struct sit_entry_set *ses)
 {
 	list_del(&ses->set_list);
+	list_del(&ses->cnt_list);
 	kmem_cache_free(sit_entry_set_slab, ses);
 }
 
-static void adjust_sit_entry_set(struct sit_entry_set *ses,
-						struct list_head *head)
-{
-	struct sit_entry_set *next = ses;
-
-	if (list_is_last(&ses->set_list, head))
-		return;
-
-	list_for_each_entry_continue(next, head, set_list)
-		if (ses->entry_cnt <= next->entry_cnt)
-			break;
-
-	list_move_tail(&ses->set_list, &next->set_list);
-}
-
-static void add_sit_entry(unsigned int segno, struct list_head *head)
+static void add_sit_entry(struct f2fs_sb_info *sbi,
+					unsigned int segno, struct list_head *head)
 {
 	struct sit_entry_set *ses;
+	struct f2fs_sm_info *sm_i = SM_I(sbi);
 	unsigned int start_segno = START_SEGNO(segno);
 
 	list_for_each_entry(ses, head, set_list) {
 		if (ses->start_segno == start_segno) {
 			ses->entry_cnt++;
-			adjust_sit_entry_set(ses, head);
+			list_move_tail(&ses->cnt_list, &sm_i->dirty_set[ses->entry_cnt]);
 			return;
 		}
 	}
@@ -2689,6 +2678,7 @@ static void add_sit_entry(unsigned int segno, struct list_head *head)
 
 	ses->start_segno = start_segno;
 	ses->entry_cnt++;
+	list_move_tail(&ses->cnt_list, &sm_i->dirty_set[ses->entry_cnt]);
 	list_add(&ses->set_list, head);
 }
 
@@ -2700,7 +2690,7 @@ static void add_sits_in_set(struct f2fs_sb_info *sbi)
 	unsigned int segno;
 
 	for_each_set_bit(segno, bitmap, MAIN_SEGS(sbi))
-		add_sit_entry(segno, set_list);
+		add_sit_entry(sbi, segno, set_list);
 }
 
 static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
@@ -2718,7 +2708,7 @@ static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
 		dirtied = __mark_sit_entry_dirty(sbi, segno);
 
 		if (!dirtied)
-			add_sit_entry(segno, &SM_I(sbi)->sit_entry_set);
+			add_sit_entry(sbi, segno, &SM_I(sbi)->sit_entry_set);
 	}
 	update_sits_in_cursum(journal, -i);
 	up_write(&curseg->journal_rwsem);
@@ -2794,11 +2784,13 @@ static void __flush_sit_entry_set(struct f2fs_sb_info *sbi,
 void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 {
 	struct sit_info *sit_i = SIT_I(sbi);
+	struct f2fs_sm_info *sm_i = SM_I(sbi);
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
 	struct f2fs_journal *journal = curseg->journal;
 	struct sit_entry_set *ses, *tmp;
 	struct list_head *head = &SM_I(sbi)->sit_entry_set;
 	bool to_journal = true;
+	int i;
 
 	mutex_lock(&sit_i->sentry_lock);
 
@@ -2824,11 +2816,14 @@ void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	 * #1, flush sit entries to journal in current cold data summary block.
 	 * #2, flush sit entries to sit page.
 	 */
-	list_for_each_entry_safe(ses, tmp, head, set_list) {
-		if (to_journal &&
-			!__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL))
-			to_journal = false;
-		__flush_sit_entry_set(sbi, ses, cpc, to_journal);
+	for (i = 0; i <= SIT_ENTRY_PER_BLOCK; i++) {
+		list_for_each_entry_safe(ses, tmp, &sm_i->dirty_set[i], cnt_list) {
+			if (to_journal &&
+				!__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL))
+				to_journal = false;
+			__flush_sit_entry_set(sbi, ses, cpc, to_journal);
+		}
+		f2fs_bug_on(sbi, !list_empty(&sm_i->dirty_set[i]));
 	}
 
 	f2fs_bug_on(sbi, !list_empty(head));
@@ -3196,6 +3191,15 @@ static void init_min_max_mtime(struct f2fs_sb_info *sbi)
 	mutex_unlock(&sit_i->sentry_lock);
 }
 
+static void init_sit_dirty_set_list(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_sm_info *sm_i = SM_I(sbi);
+	int i;
+
+	for (i = 0; i <= SIT_ENTRY_PER_BLOCK; i++)
+		INIT_LIST_HEAD(&sm_i->dirty_set[i]);
+}
+
 int build_segment_manager(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
@@ -3260,6 +3264,7 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
 		return err;
 
 	init_min_max_mtime(sbi);
+	init_sit_dirty_set_list(sbi);
 	return 0;
 }
 
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index e9ba1f1..3fca58e 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -294,7 +294,8 @@ struct curseg_info {
 };
 
 struct sit_entry_set {
-	struct list_head set_list;	/* link with all sit sets */
+	struct list_head set_list;	/* link with all sit sets globally */
+	struct list_head cnt_list;  /* link to sets with same # of dirty sit entries */
 	unsigned int start_segno;	/* start segno of sits in set */
 	unsigned int entry_cnt;		/* the # of sit entries in set */
 };
-- 
2.10.1



* [PATCH 4/5] f2fs: avoid unnecessary to_journal checking
  2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
                   ` (2 preceding siblings ...)
  2017-05-20 12:24 ` [PATCH 3/5] f2fs: use bucket sort to avoid list sort when flush sit entries Hou Pengyang
@ 2017-05-20 12:24 ` Hou Pengyang
  2017-05-20 12:24 ` [PATCH 5/5] f2fs: use an array to record sit_entry_set info to make sort fast Hou Pengyang
  2017-05-24  4:14 ` [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Jaegeuk Kim
  5 siblings, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-05-20 12:24 UTC (permalink / raw)
  To: jaegeuk, yuchao0, chao, miaoxie; +Cc: linux-f2fs-devel

There is no need to do the __has_cursum_space check every time; this patch fixes that.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 fs/f2fs/node.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 340c33d..58b4f7a 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2373,25 +2373,15 @@ static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
 }
 
 static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
-		struct nat_entry_set *set, struct cp_control *cpc)
+		struct nat_entry_set *set, struct cp_control *cpc, bool to_journal)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
 	struct f2fs_journal *journal = curseg->journal;
 	nid_t start_nid = set->set * NAT_ENTRY_PER_BLOCK;
-	bool to_journal = true;
 	struct f2fs_nat_block *nat_blk;
 	struct nat_entry *ne, *cur;
 	struct page *page = NULL;
 
-	/*
-	 * there are two steps to flush nat entries:
-	 * #1, flush nat entries to journal in current hot data summary block.
-	 * #2, flush nat entries to nat page.
-	 */
-	if (enabled_nat_bits(sbi, cpc) ||
-		!__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL))
-		to_journal = false;
-
 	if (to_journal) {
 		down_write(&curseg->journal_rwsem);
 	} else {
@@ -2459,6 +2449,7 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
 	struct f2fs_journal *journal = curseg->journal;
 	struct nat_entry_set *set, *tmp;
+	bool to_journal = true;
 	int i;
 	LIST_HEAD(sets);
 
@@ -2476,9 +2467,19 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 		!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
 		remove_nats_in_journal(sbi);
 
+	/*
+	 * there are two steps to flush nat entries:
+	 * #1, flush nat entries to journal in current hot data summary block.
+	 * #2, flush nat entries to nat page.
+	 */
 	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++) {
-		list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list)
-			__flush_nat_entry_set(sbi, set, cpc);
+		list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list) {
+			if (to_journal && (enabled_nat_bits(sbi, cpc) ||
+				!__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL)))
+				to_journal = false;
+
+			__flush_nat_entry_set(sbi, set, cpc, to_journal);
+		}
 		f2fs_bug_on(sbi, !list_empty(&nm_i->dirty_set[i]));
 	}
 
-- 
2.10.1



* [PATCH 5/5] f2fs: use an array to record sit_entry_set info to make sort fast
  2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
                   ` (3 preceding siblings ...)
  2017-05-20 12:24 ` [PATCH 4/5] f2fs: avoid unnecessary to_journal checking Hou Pengyang
@ 2017-05-20 12:24 ` Hou Pengyang
  2017-05-24  4:14 ` [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Jaegeuk Kim
  5 siblings, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-05-20 12:24 UTC (permalink / raw)
  To: jaegeuk, yuchao0, chao, miaoxie; +Cc: linux-f2fs-devel

Currently, we dynamically allocate sit_entry_sets from a slab, and link all the
sit_entry_sets by sit_entry_set->set_list.

It is inefficient, since in add_sit_entry we may need to traverse the whole list
to find the target sit_entry_set.

This patch fixes this by introducing a static array:

	f2fs_sm_info->sit_sets
	(struct sit_entry_set *sit_sets)

so that we can find the target sit_entry_set in O(1) time, and then update the
sit_entry_set info in place.

What's more, there is no need to alloc/free slab memory.
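
For illustration, the lookup then collapses to plain array indexing (a short
sketch drawn from the diff below):

    /* O(1): map the segno to its sit set, then hop to the next bucket */
    unsigned int set = SIT_BLOCK_OFFSET(segno);
    struct sit_entry_set *ses = &SM_I(sbi)->sit_sets[set];

    ses->entry_cnt++;
    list_move_tail(&ses->cnt_list, &SM_I(sbi)->dirty_set[ses->entry_cnt]);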

footprint (64-bit machine):

	1T   :  sizeof(struct sit_entry_set) * 1T/(2M*55) = 0.87M
	128G :  sizeof(struct sit_entry_set) * 128G/(2M*55) = 1.16k
	64G  :  0.55k
	32G  :  0.27k
	

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 fs/f2fs/f2fs.h    |  2 +-
 fs/f2fs/segment.c | 75 +++++++++++++++++++------------------------------------
 fs/f2fs/segment.h |  1 -
 3 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 198da30..429d33b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -732,7 +732,7 @@ struct f2fs_sm_info {
 	/* for batched trimming */
 	unsigned int trim_sections;		/* # of sections to trim */
 
-	struct list_head sit_entry_set;	/* sit entry set list */
+	struct sit_entry_set *sit_sets;	/* array of sit entry sets, one per SIT block */
 	struct list_head dirty_set[SIT_ENTRY_PER_BLOCK + 1];
 
 	unsigned int ipu_policy;	/* in-place-update policy */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index af470d3..04d6329 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -28,7 +28,6 @@
 
 static struct kmem_cache *discard_entry_slab;
 static struct kmem_cache *discard_cmd_slab;
-static struct kmem_cache *sit_entry_set_slab;
 static struct kmem_cache *inmem_entry_slab;
 
 static unsigned long __reverse_ulong(unsigned char *str)
@@ -2641,56 +2640,30 @@ static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
 	return dst_page;
 }
 
-static struct sit_entry_set *grab_sit_entry_set(void)
-{
-	struct sit_entry_set *ses =
-			f2fs_kmem_cache_alloc(sit_entry_set_slab, GFP_NOFS);
-
-	ses->entry_cnt = 0;
-	INIT_LIST_HEAD(&ses->set_list);
-	INIT_LIST_HEAD(&ses->cnt_list);
-	return ses;
-}
-
 static void release_sit_entry_set(struct sit_entry_set *ses)
 {
-	list_del(&ses->set_list);
 	list_del(&ses->cnt_list);
-	kmem_cache_free(sit_entry_set_slab, ses);
+	INIT_LIST_HEAD(&ses->cnt_list);
 }
 
 static void add_sit_entry(struct f2fs_sb_info *sbi,
-					unsigned int segno, struct list_head *head)
+					unsigned int segno)
 {
-	struct sit_entry_set *ses;
 	struct f2fs_sm_info *sm_i = SM_I(sbi);
-	unsigned int start_segno = START_SEGNO(segno);
+	unsigned int set = SIT_BLOCK_OFFSET(segno);
+	struct sit_entry_set *ses = &sm_i->sit_sets[set];
 
-	list_for_each_entry(ses, head, set_list) {
-		if (ses->start_segno == start_segno) {
-			ses->entry_cnt++;
-			list_move_tail(&ses->cnt_list, &sm_i->dirty_set[ses->entry_cnt]);
-			return;
-		}
-	}
-
-	ses = grab_sit_entry_set();
-
-	ses->start_segno = start_segno;
 	ses->entry_cnt++;
 	list_move_tail(&ses->cnt_list, &sm_i->dirty_set[ses->entry_cnt]);
-	list_add(&ses->set_list, head);
 }
 
 static void add_sits_in_set(struct f2fs_sb_info *sbi)
 {
-	struct f2fs_sm_info *sm_info = SM_I(sbi);
-	struct list_head *set_list = &sm_info->sit_entry_set;
 	unsigned long *bitmap = SIT_I(sbi)->dirty_sentries_bitmap;
 	unsigned int segno;
 
 	for_each_set_bit(segno, bitmap, MAIN_SEGS(sbi))
-		add_sit_entry(sbi, segno, set_list);
+		add_sit_entry(sbi, segno);
 }
 
 static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
@@ -2708,7 +2681,7 @@ static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
 		dirtied = __mark_sit_entry_dirty(sbi, segno);
 
 		if (!dirtied)
-			add_sit_entry(sbi, segno, &SM_I(sbi)->sit_entry_set);
+			add_sit_entry(sbi, segno);
 	}
 	update_sits_in_cursum(journal, -i);
 	up_write(&curseg->journal_rwsem);
@@ -2788,7 +2761,6 @@ void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
 	struct f2fs_journal *journal = curseg->journal;
 	struct sit_entry_set *ses, *tmp;
-	struct list_head *head = &SM_I(sbi)->sit_entry_set;
 	bool to_journal = true;
 	int i;
 
@@ -2826,7 +2798,6 @@ void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 		f2fs_bug_on(sbi, !list_empty(&sm_i->dirty_set[i]));
 	}
 
-	f2fs_bug_on(sbi, !list_empty(head));
 	f2fs_bug_on(sbi, sit_i->dirty_sentries);
 out:
 	if (cpc->reason & CP_DISCARD) {
@@ -3191,13 +3162,26 @@ static void init_min_max_mtime(struct f2fs_sb_info *sbi)
 	mutex_unlock(&sit_i->sentry_lock);
 }
 
-static void init_sit_dirty_set_list(struct f2fs_sb_info *sbi)
+static int init_sit_dirty_set_list(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_sm_info *sm_i = SM_I(sbi);
-	int i;
+	int i, entries;
+
+	entries = (MAIN_SEGS(sbi) + SIT_ENTRY_PER_BLOCK - 1) / SIT_ENTRY_PER_BLOCK;
+	sm_i->sit_sets = f2fs_kvzalloc(entries *
+							sizeof(struct sit_entry_set), GFP_KERNEL);
+	if (!sm_i->sit_sets)
+		return -ENOMEM;
+
+	/* kvzalloc, so no need to init sit_entry_set->entry_cnt */
+	for (i = 0; i < entries; i++) {
+		INIT_LIST_HEAD(&sm_i->sit_sets[i].cnt_list);
+		sm_i->sit_sets[i].start_segno = i * SIT_ENTRY_PER_BLOCK;
+	}
 
 	for (i = 0; i <= SIT_ENTRY_PER_BLOCK; i++)
 		INIT_LIST_HEAD(&sm_i->dirty_set[i]);
+	return 0;
 }
 
 int build_segment_manager(struct f2fs_sb_info *sbi)
@@ -3233,8 +3217,6 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
 
 	sm_info->trim_sections = DEF_BATCHED_TRIM_SECTIONS;
 
-	INIT_LIST_HEAD(&sm_info->sit_entry_set);
-
 	if (test_opt(sbi, FLUSH_MERGE) && !f2fs_readonly(sbi->sb)) {
 		err = create_flush_cmd_control(sbi);
 		if (err)
@@ -3264,8 +3246,8 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
 		return err;
 
 	init_min_max_mtime(sbi);
-	init_sit_dirty_set_list(sbi);
-	return 0;
+	err = init_sit_dirty_set_list(sbi);
+	return err;
 }
 
 static void discard_dirty_segmap(struct f2fs_sb_info *sbi,
@@ -3372,6 +3354,7 @@ void destroy_segment_manager(struct f2fs_sb_info *sbi)
 	destroy_curseg(sbi);
 	destroy_free_segmap(sbi);
 	destroy_sit_info(sbi);
+	kvfree(sm_info->sit_sets);
 	sbi->sm_info = NULL;
 	kfree(sm_info);
 }
@@ -3388,19 +3371,12 @@ int __init create_segment_manager_caches(void)
 	if (!discard_cmd_slab)
 		goto destroy_discard_entry;
 
-	sit_entry_set_slab = f2fs_kmem_cache_create("sit_entry_set",
-			sizeof(struct sit_entry_set));
-	if (!sit_entry_set_slab)
-		goto destroy_discard_cmd;
-
 	inmem_entry_slab = f2fs_kmem_cache_create("inmem_page_entry",
 			sizeof(struct inmem_pages));
 	if (!inmem_entry_slab)
-		goto destroy_sit_entry_set;
+		goto destroy_discard_cmd;
 	return 0;
 
-destroy_sit_entry_set:
-	kmem_cache_destroy(sit_entry_set_slab);
 destroy_discard_cmd:
 	kmem_cache_destroy(discard_cmd_slab);
 destroy_discard_entry:
@@ -3411,7 +3387,6 @@ int __init create_segment_manager_caches(void)
 
 void destroy_segment_manager_caches(void)
 {
-	kmem_cache_destroy(sit_entry_set_slab);
 	kmem_cache_destroy(discard_cmd_slab);
 	kmem_cache_destroy(discard_entry_slab);
 	kmem_cache_destroy(inmem_entry_slab);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 3fca58e..dfb6ed5 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -294,7 +294,6 @@ struct curseg_info {
 };
 
 struct sit_entry_set {
-	struct list_head set_list;	/* link with all sit sets globally */
 	struct list_head cnt_list;  /* link to sets with same # of dirty sit entries */
 	unsigned int start_segno;	/* start segno of sits in set */
 	unsigned int entry_cnt;		/* the # of sit entries in set */
-- 
2.10.1



* Re: [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp
  2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
                   ` (4 preceding siblings ...)
  2017-05-20 12:24 ` [PATCH 5/5] f2fs: use an array to record sit_entry_set info to make sort fast Hou Pengyang
@ 2017-05-24  4:14 ` Jaegeuk Kim
  2017-06-09 13:48   ` Hou Pengyang
  5 siblings, 1 reply; 11+ messages in thread
From: Jaegeuk Kim @ 2017-05-24  4:14 UTC (permalink / raw)
  To: Hou Pengyang; +Cc: chao, linux-f2fs-devel

Hi Pengyang,

Could you please shed some light on the performance gains from this?

Thanks,

On 05/20, Hou Pengyang wrote:
> This patch set is designed to optimize the NAT/SIT flushing procedure:
> 
> patch 1) -- patch 3):
> 
> during flush_nat_entries, we do:
> 1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
> 2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
>     write to the journal as much as possible and avoid unnecessary IO.
> 
> These patches optimize the lookup & sort algorithm by introducing an array:
> 
>     f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
>     (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1]) 
> 
> dirty_set[1] links all nat_entry_set whose entry_cnt is 1
> dirty_set[2] links all nat_entry_set whose entry_cnt is 2
> ......
> dirty_set[N] links all nat_entry_set whose entry_cnt is N
> 
> as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
> be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.
> 
> So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.
> 
> Update:
>     we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
> 1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
> entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.
> 
> Flush:
>     when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
> f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
> gang_lookup and sort can be avoided.
>     
> footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
>                               = 8*455 = 3.6k bytes
> 
> Same for SIT.
> 
> In patch 5), we use an array to manage sit_entry_sets instead of a list, which
> avoids the list traversal, and there is no need to alloc/free slab memory.
> Its memory footprint is low.
> 
> Hou Pengyang (5):
>   f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
>   f2fs: reconstruct sit flushing code
>   f2fs: use bucket sort to avoid list sort when flush sit entries
>   f2fs: avoid unnecessary to_journal checking
>   f2fs: use an array to record sit_entry_set info to make sort fast
> 
>  fs/f2fs/f2fs.h    |   5 +-
>  fs/f2fs/node.c    |  79 ++++++++-----------
>  fs/f2fs/segment.c | 227 ++++++++++++++++++++++++++----------------------------
>  fs/f2fs/segment.h |   2 +-
>  4 files changed, 147 insertions(+), 166 deletions(-)
> 
> -- 
> 2.10.1
> 
> 

* Re: [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
  2017-05-20 12:24 ` [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing Hou Pengyang
@ 2017-06-07  9:44   ` Chao Yu
  2017-06-09 13:48     ` Hou Pengyang
  2017-06-09 13:51   ` Hou Pengyang
  1 sibling, 1 reply; 11+ messages in thread
From: Chao Yu @ 2017-06-07  9:44 UTC (permalink / raw)
  To: Hou Pengyang, jaegeuk, chao, miaoxie; +Cc: linux-f2fs-devel

On 2017/5/20 20:24, Hou Pengyang wrote:
> during flush_nat_entries, we do:
> 1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
> 2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
>     write to the journal as much as possible and avoid unnecessary IO.
> 
> This patch optimizes the lookup & sort algorithm by introducing an array:
> 
>     f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
>     (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1]) 
> 
> dirty_set[1] links all nat_entry_set whose entry_cnt is 1
> dirty_set[2] links all nat_entry_set whose entry_cnt is 2
> ......
> dirty_set[N] links all nat_entry_set whose entry_cnt is N
> 
> as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
> be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.
> 
> So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.
> 
> Update:
>     we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
> 1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
> entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.
> 
> Flush:
>     when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
> f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
> gang_lookup and sort can be avoided.
>     
> footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
>                               = 8*455 = 3.6k bytes
> 
> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
> ---
>  fs/f2fs/f2fs.h |  2 ++
>  fs/f2fs/node.c | 56 ++++++++++++++++++++++----------------------------------
>  2 files changed, 24 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 093d68a..88a96bb 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -628,6 +628,8 @@ struct f2fs_nm_info {
>  
>  	/* for checkpoint */
>  	char *nat_bitmap;		/* NAT bitmap pointer */
> +	/* dirty nat_entry_set lists, bucketed by entry_cnt */
> +	struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1];
>  
>  	unsigned int nat_bits_blocks;	/* # of nat bits blocks */
>  	unsigned char *nat_bits;	/* NAT bits blocks */
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index d22db8c..340c33d 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -174,13 +174,14 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
>  	list_move_tail(&ne->list, &head->entry_list);
>  	nm_i->dirty_nat_cnt++;
>  	head->entry_cnt++;
> +	list_move_tail(&head->set_list, &(nm_i->dirty_set[head->entry_cnt]));
>  	set_nat_flag(ne, IS_DIRTY, true);
>  }
>  
>  static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
>  		struct nat_entry_set *set, struct nat_entry *ne)
>  {
> -	list_move_tail(&ne->list, &nm_i->nat_entries);
> +	list_move_tail(&ne->list, &nm_i->nat_entries); /* to be shrunk */
>  	set_nat_flag(ne, IS_DIRTY, false);
>  	set->entry_cnt--;
>  	nm_i->dirty_nat_cnt--;
> @@ -2340,24 +2341,6 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
>  	up_write(&curseg->journal_rwsem);
>  }
>  
> -static void __adjust_nat_entry_set(struct nat_entry_set *nes,
> -						struct list_head *head, int max)
> -{
> -	struct nat_entry_set *cur;
> -
> -	if (nes->entry_cnt >= max)
> -		goto add_out;
> -
> -	list_for_each_entry(cur, head, set_list) {
> -		if (cur->entry_cnt >= nes->entry_cnt) {
> -			list_add(&nes->set_list, cur->set_list.prev);
> -			return;
> -		}
> -	}
> -add_out:
> -	list_add_tail(&nes->set_list, head);
> -}
> -
>  static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
>  						struct page *page)
>  {
> @@ -2460,9 +2443,11 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
>  
>  	/* Allow dirty nats by node block allocation in write_begin */
>  	if (!set->entry_cnt) {
> +		list_del(&set->set_list);
>  		radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set);
>  		kmem_cache_free(nat_entry_set_slab, set);
> -	}
> +	} else
> +		f2fs_bug_on(sbi, 1);

We should remove this bug_on, because new nat entries may be allocated
during flush_nat_entries, since we allow buffered writes during checkpoint.

Thanks,

>  }
>  
>  /*
> @@ -2473,10 +2458,8 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
>  	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>  	struct f2fs_journal *journal = curseg->journal;
> -	struct nat_entry_set *setvec[SETVEC_SIZE];
>  	struct nat_entry_set *set, *tmp;
> -	unsigned int found;
> -	nid_t set_idx = 0;
> +	int i;
>  	LIST_HEAD(sets);
>  
>  	if (!nm_i->dirty_nat_cnt)
> @@ -2493,19 +2476,12 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>  		!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
>  		remove_nats_in_journal(sbi);
>  
> -	while ((found = __gang_lookup_nat_set(nm_i,
> -					set_idx, SETVEC_SIZE, setvec))) {
> -		unsigned idx;
> -		set_idx = setvec[found - 1]->set + 1;
> -		for (idx = 0; idx < found; idx++)
> -			__adjust_nat_entry_set(setvec[idx], &sets,
> -						MAX_NAT_JENTRIES(journal));
> +	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++) {
> +		list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list)
> +			__flush_nat_entry_set(sbi, set, cpc);
> +		f2fs_bug_on(sbi, !list_empty(&nm_i->dirty_set[i]));
>  	}
>  
> -	/* flush dirty nats in nat entry set */
> -	list_for_each_entry_safe(set, tmp, &sets, set_list)
> -		__flush_nat_entry_set(sbi, set, cpc);
> -
>  	up_write(&nm_i->nat_tree_lock);
>  	/* Allow dirty nats by node block allocation in write_begin */
>  }
> @@ -2668,6 +2644,15 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi)
>  	return 0;
>  }
>  
> +static void init_dirty_set_list(struct f2fs_sb_info *sbi)
> +{
> +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> +	int i;
> +
> +	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++)
> +		INIT_LIST_HEAD(&nm_i->dirty_set[i]);
> +}
> +
>  int build_node_manager(struct f2fs_sb_info *sbi)
>  {
>  	int err;
> @@ -2687,6 +2672,9 @@ int build_node_manager(struct f2fs_sb_info *sbi)
>  	/* load free nid status from nat_bits table */
>  	load_free_nid_bitmap(sbi);
>  
> +	/* init dirty set list */
> +	init_dirty_set_list(sbi);
> +
>  	build_free_nids(sbi, true, true);
>  	return 0;
>  }
> 



* Re: [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
  2017-06-07  9:44   ` Chao Yu
@ 2017-06-09 13:48     ` Hou Pengyang
  0 siblings, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-06-09 13:48 UTC (permalink / raw)
  To: Chao Yu; +Cc: chao, linux-f2fs-devel, jaegeuk

On 2017/6/7 17:44, Chao Yu wrote:
> On 2017/5/20 20:24, Hou Pengyang wrote:
>> during flush_nat_entries, we do:
>> 1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
>> 2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
>>     write to the journal as much as possible and avoid unnecessary IO.
>>
>> This patch optimizes the lookup & sort algorithm by introducing an array:
>>
>>      f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
>>      (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1])
>>
>> dirty_set[1] links all nat_entry_set whose entry_cnt is 1
>> dirty_set[2] links all nat_entry_set whose entry_cnt is 2
>> ......
>> dirty_set[N] links all nat_entry_set whose entry_cnt is N
>>
>> as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
>> be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.
>>
>> So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.
>>
>> Update:
>>     we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
>> 1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
>> entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.
>>
>> Flush:
>>     when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
>> f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
>> gang_lookup and sort can be avoided.
>>
>> footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
>>                                = 8*455 = 3.6k bytes
>>
>> Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
>> ---
>>   fs/f2fs/f2fs.h |  2 ++
>>   fs/f2fs/node.c | 56 ++++++++++++++++++++++----------------------------------
>>   2 files changed, 24 insertions(+), 34 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 093d68a..88a96bb 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -628,6 +628,8 @@ struct f2fs_nm_info {
>>
>>   	/* for checkpoint */
>>   	char *nat_bitmap;		/* NAT bitmap pointer */
>> +	/* dirty nat_entry_set lists, bucketed by entry_cnt */
>> +	struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1];
>>
>>   	unsigned int nat_bits_blocks;	/* # of nat bits blocks */
>>   	unsigned char *nat_bits;	/* NAT bits blocks */
>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>> index d22db8c..340c33d 100644
>> --- a/fs/f2fs/node.c
>> +++ b/fs/f2fs/node.c
>> @@ -174,13 +174,14 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
>>   	list_move_tail(&ne->list, &head->entry_list);
>>   	nm_i->dirty_nat_cnt++;
>>   	head->entry_cnt++;
>> +	list_move_tail(&head->set_list, &(nm_i->dirty_set[head->entry_cnt]));
>>   	set_nat_flag(ne, IS_DIRTY, true);
>>   }
>>
>>   static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
>>   		struct nat_entry_set *set, struct nat_entry *ne)
>>   {
>> -	list_move_tail(&ne->list, &nm_i->nat_entries);
>> +	list_move_tail(&ne->list, &nm_i->nat_entries); /* to be shrunk */
>>   	set_nat_flag(ne, IS_DIRTY, false);
>>   	set->entry_cnt--;
>>   	nm_i->dirty_nat_cnt--;
>> @@ -2340,24 +2341,6 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
>>   	up_write(&curseg->journal_rwsem);
>>   }
>>
>> -static void __adjust_nat_entry_set(struct nat_entry_set *nes,
>> -						struct list_head *head, int max)
>> -{
>> -	struct nat_entry_set *cur;
>> -
>> -	if (nes->entry_cnt >= max)
>> -		goto add_out;
>> -
>> -	list_for_each_entry(cur, head, set_list) {
>> -		if (cur->entry_cnt >= nes->entry_cnt) {
>> -			list_add(&nes->set_list, cur->set_list.prev);
>> -			return;
>> -		}
>> -	}
>> -add_out:
>> -	list_add_tail(&nes->set_list, head);
>> -}
>> -
>>   static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
>>   						struct page *page)
>>   {
>> @@ -2460,9 +2443,11 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
>>
>>   	/* Allow dirty nats by node block allocation in write_begin */
>>   	if (!set->entry_cnt) {
>> +		list_del(&set->set_list);
>>   		radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set);
>>   		kmem_cache_free(nat_entry_set_slab, set);
>> -	}
>> +	} else
>> +		f2fs_bug_on(sbi, 1);
>
> We should remove this bug_on, because new nat entries may be allocated
> during flush_nat_entries, since we allow buffered writes during checkpoint.
>
> Thanks,

Hi Chao, got it. I will send a new version of this patch.
Thanks,
>
>>   }
>>
>>   /*
>> @@ -2473,10 +2458,8 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>   	struct f2fs_nm_info *nm_i = NM_I(sbi);
>>   	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>>   	struct f2fs_journal *journal = curseg->journal;
>> -	struct nat_entry_set *setvec[SETVEC_SIZE];
>>   	struct nat_entry_set *set, *tmp;
>> -	unsigned int found;
>> -	nid_t set_idx = 0;
>> +	int i;
>>   	LIST_HEAD(sets);
>>
>>   	if (!nm_i->dirty_nat_cnt)
>> @@ -2493,19 +2476,12 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>   		!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
>>   		remove_nats_in_journal(sbi);
>>
>> -	while ((found = __gang_lookup_nat_set(nm_i,
>> -					set_idx, SETVEC_SIZE, setvec))) {
>> -		unsigned idx;
>> -		set_idx = setvec[found - 1]->set + 1;
>> -		for (idx = 0; idx < found; idx++)
>> -			__adjust_nat_entry_set(setvec[idx], &sets,
>> -						MAX_NAT_JENTRIES(journal));
>> +	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++) {
>> +		list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list)
>> +			__flush_nat_entry_set(sbi, set, cpc);
>> +		f2fs_bug_on(sbi, !list_empty(&nm_i->dirty_set[i]));
>>   	}
>>
>> -	/* flush dirty nats in nat entry set */
>> -	list_for_each_entry_safe(set, tmp, &sets, set_list)
>> -		__flush_nat_entry_set(sbi, set, cpc);
>> -
>>   	up_write(&nm_i->nat_tree_lock);
>>   	/* Allow dirty nats by node block allocation in write_begin */
>>   }
>> @@ -2668,6 +2644,15 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi)
>>   	return 0;
>>   }
>>
>> +static void init_dirty_set_list(struct f2fs_sb_info *sbi)
>> +{
>> +	struct f2fs_nm_info *nm_i = NM_I(sbi);
>> +	int i;
>> +
>> +	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++)
>> +		INIT_LIST_HEAD(&nm_i->dirty_set[i]);
>> +}
>> +
>>   int build_node_manager(struct f2fs_sb_info *sbi)
>>   {
>>   	int err;
>> @@ -2687,6 +2672,9 @@ int build_node_manager(struct f2fs_sb_info *sbi)
>>   	/* load free nid status from nat_bits table */
>>   	load_free_nid_bitmap(sbi);
>>
>> +	/* init dirty set list */
>> +	init_dirty_set_list(sbi);
>> +
>>   	build_free_nids(sbi, true, true);
>>   	return 0;
>>   }
>>
>
>
> .
>




* Re: [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp
  2017-05-24  4:14 ` [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Jaegeuk Kim
@ 2017-06-09 13:48   ` Hou Pengyang
  0 siblings, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-06-09 13:48 UTC (permalink / raw)
  To: Jaegeuk Kim, chao; +Cc: linux-f2fs-devel

On 2017/5/24 12:14, Jaegeuk Kim wrote:
> Hi Pengyang,
>
> Could you please shed a light on some performance gains by this?

Hi, Jaegeuk,
Sorry for replying so late.

I've measured the duration of the nat radix-tree traversal and the list sort of
nat entries; it is about *100us* on average under fragmentation (arm64 mobile).

100us seems negligible for a CP, but I still think it makes sense for the
scenario where massive numbers of dirty nat_entries exist in the system
(a corner case, hard to reproduce).

Thanks,
>
> Thanks,
>
> On 05/20, Hou Pengyang wrote:
>> This patch set is designed to optimize the NAT/SIT flushing procedure:
>>
>> patch 1) -- patch 3):
>>
>> during flush_nat_entries, we do:
>> 1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
>> 2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
>>     write to the journal as much as possible and avoid unnecessary IO.
>>
>> These patches optimize the lookup & sort algorithm by introducing an array:
>>
>>      f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
>>      (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1])
>>
>> dirty_set[1] links all nat_entry_set whose entry_cnt is 1
>> dirty_set[2] links all nat_entry_set whose entry_cnt is 2
>> ......
>> dirty_set[N] links all nat_entry_set whose entry_cnt is N
>>
>> as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
>> be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.
>>
>> So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.
>>
>> Update:
>>     we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
>> 1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
>> entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.
>>
>> Flush:
>>     when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
>> f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
>> gang_lookup and sort can be avoided.
>>
>> footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
>>                                = 8*455 = 3.6k bytes
>>
>> Same for SIT.
>>
>> In patch 5), we use an array to manage sit_entry_sets instead of a list, which
>> avoids the list traversal, and there is no need to alloc/free slab memory.
>> Its memory footprint is low.
>>
>> Hou Pengyang (5):
>>    f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
>>    f2fs: reconstruct sit flushing code
>>    f2fs: use bucket sort to avoid list sort when flush sit entries
>>    f2fs: avoid unnecessary to_journal checking
>>    f2fs: use an array to record sit_entry_set info to make sort fast
>>
>>   fs/f2fs/f2fs.h    |   5 +-
>>   fs/f2fs/node.c    |  79 ++++++++-----------
>>   fs/f2fs/segment.c | 227 ++++++++++++++++++++++++++----------------------------
>>   fs/f2fs/segment.h |   2 +-
>>   4 files changed, 147 insertions(+), 166 deletions(-)
>>
>> --
>> 2.10.1
>>
>>
>
> .
>




* [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing
  2017-05-20 12:24 ` [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing Hou Pengyang
  2017-06-07  9:44   ` Chao Yu
@ 2017-06-09 13:51   ` Hou Pengyang
  1 sibling, 0 replies; 11+ messages in thread
From: Hou Pengyang @ 2017-06-09 13:51 UTC (permalink / raw)
  To: jaegeuk, chao, yuchao0, miaoxie; +Cc: linux-f2fs-devel

during flush_nat_entries, we do:
1) gang_lookup a radix tree to find all the dirty nat_entry_sets;
2) sort the nat_entry_sets by nat_entry_set->entry_cnt, in order to
    write to the journal as much as possible and avoid unnecessary IO.

This patch optimizes the lookup & sort algorithm by introducing an array:

    f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK+1];
    (struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1]) 

dirty_set[1] links all nat_entry_set whose entry_cnt is 1
dirty_set[2] links all nat_entry_set whose entry_cnt is 2
......
dirty_set[N] links all nat_entry_set whose entry_cnt is N

as one NAT block has NAT_ENTRY_PER_BLOCK entries at MOST, there should NOT
be any nat_entry_set whose entry_cnt is larger than NAT_ENTRY_PER_BLOCK.

So it is enough for f2fs_nm_info->dirty_set to link all nat_entry_sets in the system.

Update:
    we update the buckets in real time, e.g. originally a nat_entry_set's entry_cnt is
1 and the set is linked by dirty_set[1]; when __set_nat_cache_dirty increases the set's
entry_cnt, we move this nat_entry_set to dirty_set[2]'s tail, with no need to compare.

Flush:
    when flushing dirty nat_entry_sets, we just flush the nat_entry_sets from
f2fs_nm_info->dirty_set[0] to f2fs_nm_info->dirty_set[NAT_ENTRY_PER_BLOCK], so the
gang_lookup and sort can be avoided.
    
footprint of this algorithm:  sizeof(struct list_head)*NAT_ENTRY_PER_BLOCK
                              = 8*455 = 3.6k bytes

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
---
 fs/f2fs/f2fs.h |  2 ++
 fs/f2fs/node.c | 53 ++++++++++++++++++++---------------------------------
 2 files changed, 22 insertions(+), 33 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 093d68a..1cf35c4 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -629,6 +629,8 @@ struct f2fs_nm_info {
 	/* for checkpoint */
 	char *nat_bitmap;		/* NAT bitmap pointer */
 
+	struct list_head dirty_set[NAT_ENTRY_PER_BLOCK + 1];
+
 	unsigned int nat_bits_blocks;	/* # of nat bits blocks */
 	unsigned char *nat_bits;	/* NAT bits blocks */
 	unsigned char *full_nat_bits;	/* full NAT pages */
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d22db8c..454034f 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -174,13 +174,14 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 	list_move_tail(&ne->list, &head->entry_list);
 	nm_i->dirty_nat_cnt++;
 	head->entry_cnt++;
+	list_move_tail(&head->set_list, &(nm_i->dirty_set[head->entry_cnt]));
 	set_nat_flag(ne, IS_DIRTY, true);
 }
 
 static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 		struct nat_entry_set *set, struct nat_entry *ne)
 {
-	list_move_tail(&ne->list, &nm_i->nat_entries);
+	list_move_tail(&ne->list, &nm_i->nat_entries); /* to be shrunk */
 	set_nat_flag(ne, IS_DIRTY, false);
 	set->entry_cnt--;
 	nm_i->dirty_nat_cnt--;
@@ -2340,24 +2341,6 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
 	up_write(&curseg->journal_rwsem);
 }
 
-static void __adjust_nat_entry_set(struct nat_entry_set *nes,
-						struct list_head *head, int max)
-{
-	struct nat_entry_set *cur;
-
-	if (nes->entry_cnt >= max)
-		goto add_out;
-
-	list_for_each_entry(cur, head, set_list) {
-		if (cur->entry_cnt >= nes->entry_cnt) {
-			list_add(&nes->set_list, cur->set_list.prev);
-			return;
-		}
-	}
-add_out:
-	list_add_tail(&nes->set_list, head);
-}
-
 static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
 						struct page *page)
 {
@@ -2460,6 +2443,7 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
 
 	/* Allow dirty nats by node block allocation in write_begin */
 	if (!set->entry_cnt) {
+		list_del(&set->set_list);
 		radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set);
 		kmem_cache_free(nat_entry_set_slab, set);
 	}
@@ -2473,10 +2457,8 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	struct f2fs_nm_info *nm_i = NM_I(sbi);
 	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
 	struct f2fs_journal *journal = curseg->journal;
-	struct nat_entry_set *setvec[SETVEC_SIZE];
 	struct nat_entry_set *set, *tmp;
-	unsigned int found;
-	nid_t set_idx = 0;
+	int i;
 	LIST_HEAD(sets);
 
 	if (!nm_i->dirty_nat_cnt)
@@ -2493,19 +2475,12 @@ void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 		!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
 		remove_nats_in_journal(sbi);
 
-	while ((found = __gang_lookup_nat_set(nm_i,
-					set_idx, SETVEC_SIZE, setvec))) {
-		unsigned idx;
-		set_idx = setvec[found - 1]->set + 1;
-		for (idx = 0; idx < found; idx++)
-			__adjust_nat_entry_set(setvec[idx], &sets,
-						MAX_NAT_JENTRIES(journal));
+	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++) {
+		list_for_each_entry_safe(set, tmp, &nm_i->dirty_set[i], set_list)
+			__flush_nat_entry_set(sbi, set, cpc);
+		f2fs_bug_on(sbi, !list_empty(&nm_i->dirty_set[i]));
 	}
 
-	/* flush dirty nats in nat entry set */
-	list_for_each_entry_safe(set, tmp, &sets, set_list)
-		__flush_nat_entry_set(sbi, set, cpc);
-
 	up_write(&nm_i->nat_tree_lock);
 	/* Allow dirty nats by node block allocation in write_begin */
 }
@@ -2668,6 +2643,15 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi)
 	return 0;
 }
 
+static void init_dirty_set_list(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_nm_info *nm_i = NM_I(sbi);
+	int i;
+
+	for (i = 0; i <= NAT_ENTRY_PER_BLOCK; i++)
+		INIT_LIST_HEAD(&nm_i->dirty_set[i]);
+}
+
 int build_node_manager(struct f2fs_sb_info *sbi)
 {
 	int err;
@@ -2687,6 +2671,9 @@ int build_node_manager(struct f2fs_sb_info *sbi)
 	/* load free nid status from nat_bits table */
 	load_free_nid_bitmap(sbi);
 
+	/* init dirty set list */
+	init_dirty_set_list(sbi);
+
 	build_free_nids(sbi, true, true);
 	return 0;
 }
-- 
2.10.1



end of thread, other threads:[~2017-06-09 13:52 UTC | newest]

Thread overview: 11+ messages
-- links below jump to the message on this page --
2017-05-20 12:24 [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Hou Pengyang
2017-05-20 12:24 ` [PATCH 1/5] f2fs: use bucket sort to avoid tree lookup and list sort when nat flushing Hou Pengyang
2017-06-07  9:44   ` Chao Yu
2017-06-09 13:48     ` Hou Pengyang
2017-06-09 13:51   ` Hou Pengyang
2017-05-20 12:24 ` [PATCH 2/5] f2fs: reconstruct sit flushing code Hou Pengyang
2017-05-20 12:24 ` [PATCH 3/5] f2fs: use bucket sort to avoid list sort when flush sit entries Hou Pengyang
2017-05-20 12:24 ` [PATCH 4/5] f2fs: avoid unnecessary to_journal checking Hou Pengyang
2017-05-20 12:24 ` [PATCH 5/5] f2fs: use an array to record sit_entry_set info to make sort fast Hou Pengyang
2017-05-24  4:14 ` [PATCH v2 0/5] f2fs: improve performance of NAT/SIT flushing during cp Jaegeuk Kim
2017-06-09 13:48   ` Hou Pengyang
