All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Gaoxiang (OS)" <gaoxiang25@huawei.com>
To: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: "linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Duwei (Device OS)" <weidu.du@huawei.com>,
	"chao@kernel.org" <chao@kernel.org>,
	"Yuchao (T)" <yuchao0@huawei.com>
Subject: Re: [f2fs-dev] [PATCH RFC] f2fs: clean up build_sit_entries
Date: Thu, 21 Dec 2017 01:31:56 +0000	[thread overview]
Message-ID: <9047C53C18267742AB12E43B65C7F9F70BCB7DD8@dggemi505-mbs.china.huawei.com> (raw)
In-Reply-To: <20171220203903.GB52542@jaegeuk-macbookpro.roam.corp.google.com>

Hi Jaegeuk,

On 2017/12/21 4:39, Jaegeuk Kim wrote:
> On 12/20, Gaoxiang (OS) wrote:
>> Hi Jaegeuk,
>>
>> On 2017/12/20 10:31, Jaegeuk Kim wrote:
>>> On 12/20, Gaoxiang (OS) wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> On 2017/12/20 7:40, Jaegeuk Kim wrote:
>>>>> On 12/17, gaoxiang (P) wrote:
>>>>>> clean up some redundant repeat code in build_sit_entries
>>>>>> and seperate some logic into new functions:
>>>>>>     - build_discard_entries
>>>>>>     - build_sec_entries
>>>>>
>>>>> This patch makes another big loop redundantly?
>>>>>
>>>> "condition & jump" is the fundamental of "if" and loop.
>>>> In the worst case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > 1
>>>> are true, and I think, we have:
>>>> - the original approach:
>>>> 	MAIN_SEGS   <-- loop condition
>>>> 	MAIN_SEGS   <-- f2fs_discard_en(sbi)
>>>> 	MAIN_SEGS   <-- sbi->segs_per_sec > 1
>>>>
>>>> 	sits_in_cursum(journal) <-- loop_condition
>>>> 	sits_in_cursum(journal) <-- f2fs_discard_en(sbi)
>>>> 	sits_in_cursum(journal) <-- sbi->segs_per_sec > 1
>>>>       it takes about 3*(MAIN_SEGS + sits_in_cursum(journal)) jumps effort.
>>>> - this patch:
>>>> 	MAIN_SEGS   <-- loop condition
>>>> 	sits_in_cursum(journal) <-- loop_condition
>>>>
>>>> 	MAIN_SEGS   <-- f2fs_discard_en(sbi)
>>>> 	MAIN_SEGS   <-- sbi->segs_per_sec > 1
>>>>       it takes abourt 3*MAIN_SEGS + sits_in_cursum(journal) jumps effort.
>>>>
>>>> and in the best case, both f2fs_discard_en(sbi) and sbi->segs_per_sec >
>>>> 1 are false, and we have:
>>>> - this patch
>>>> 	MAIN_SEGS   <-- loop condition
>>>>
>>>> 	sits_in_cursum(journal) <-- loop_condition
>>>> See
>>>> https://stackoverflow.com/questions/1462710/can-c-compilers-optimize-if-statements-inside-for-loops
>>>
>>> Interesting!
>>>
>>> What I can think of the worst case looks like:
>>>
>>> In current code,
>>> 	for (N) {
>>> 		do_something;
>>> 		if (f2fs_discard_en())
>>> 			do _something;
>>> 		if (sbi->segs_per_sec > 1)
>>> 			do_something;
>>> 	}
>>> 	for (sits_in_cursum()) {
>>> 		do_something;
>>> 		if (f2fs_discard_en())
>>> 			do _something;
>>> 		if (sbi->segs_per_sec > 1)
>>> 			do_something;
>>> 	}
>>> => O(N + sits_in_cursum())
>>
>> In the worst case Its 3(1 -- exit condition < MAIN_SEG, 2 --  f2fs_discard_en(), 3 --
>> sbi->segs_per_sec > 1) *N + 3*sits_in_cursum() if condition & jump
>> instuctions at runtime.
>>
>> If we use Big O notion, yes I think O(N + sits_in_cursum())
>>
>>>
>>> Once compiler unrolled the loop, we can expect most of jumps could be hit by
>>> branch predictor, since the loop does not have many branches.
>>
>> The current code, if the compiler decides to unroll the loop, it could unroll it into, I guess
>>    	for (N) {
>>    		do_something;
>> 	}
>> 	if (f2fs_discard_en()) {
>> 		for(N) {
>>    			do _something;
>> 		}
>> 	}
>> 	if (sbi->segs_per_sec > 1) {
>> 		for(N) {
>> 			do_something;
>> 		}
>> 	}
>> 	...
>> 	for (sits_in_cursum()) {
>> 		do_something;
>> 	}
>> 	if (f2fs_discard_en()) {
>> 		for (N) {
>> 			do _something;
>> 		}
>> 	}
>> 	if (sbi->segs_per_sec > 1) {
>> 		for (N) {
>> 			do_something;
>> 		}
>>          	}
> 
> Unrolled loop would be something like:
> 	for (N/n) {
> 		do_something #1;
> 		if (f2fs_discard_en())
> 			do_something;
> 		if (sbi->segs_per_sec > 1)
> 			do_something;
> 		...
> 		do_something #n;
> 		if (f2fs_discard_en())
> 			do_something;
> 		if (sbi->segs_per_sec > 1)
> 			do_something;
> 	}
> 
> And, branch predictor can avoid three branch conditions by BTB.
> 
> do_something #1;
>    if (f2fs_discard_en())
>      do _something;
>        if (sbi->segs_per_sec > 1)
>          do_something;
>            do_something #2;
>              if (f2fs_discard_en())
>                do _something;
>                  if (sbi->segs_per_sec > 1)
>                    do_something;
>                      ...
>                        do_something #N;
>                          if (f2fs_discard_en())
>                            do _something;
>                              if (sbi->segs_per_sec > 1)
>                                do_something;
> 
> Anyway, I don't see huge benefit regarding to performance and code readability,
> but do worry about potential patch conflicts when backporting this.
> 
ok, I think it depends on the kind and complexity of BTB the target processor 
or architecture uses and how the target architecture enhances parallelization.

I am new to f2fs, curious about the f2fs detailed implementation.
I'm more focused on the code readability since some code with jumps are 
a little bit hacky. :)

Thanks,

>> Taking branch predictor into consideration, I think the unrolled one is more easier to predict
>> than the current rolled one. (as you say, the current has more conditions in the loop
>> and a exit condition to predict)
>>
>>>
>>> In the patch,
>>> 	for (N)
>>> 		do_something;
>>> 	for (sits_in_cursum())
>>> 		do_something;
>>>
>>> 	if (f2fs_discard_en()) {
>>> 		for (N)
>>> 			do_something;
>>> 	}
>>> 	if (sbi->segs_per_sec > 1) {
>>> 		for (N)
>>> 			do_something;
>>> 	}
>>> => O(3*N + sits_in_cursum())
>>
>> If we use Big O notion, I think O(N + sits_in_cursum()) too.
>>>
>>> BTW, build_discard_entries(struct f2fs_sb_info *sbi) is more likely to split
>>> the loops, as you describe.
>>>
>>> In build_discard_entries(),
>>> 	for (N) {
>>> 		if (CP_TRIMMED_FLAG)
>>> 			do_something;
>>> 		else
>>> 			do_semething_else;
>>> 	}
>>> Thanks,
>>>
>>
>> I think so, yet CP_TRIMMED_FLAG comes from a function argument, so I think compiler
>> cannot decide whether it is constant.
>>
>> I think if we count the runtime condition & jump instruction at runtime.
>> the unrolls above are beneficial, I think.
>>
>> Thanks.
>>
>>>>
>>>> In addtion, this approach is easier to read than the original
>>>> one, so splitting the loop would be beneficial.
>>>>
>>>> Thanks
>>>>
>>>>>>
>>>>>> Note that "if (f2fs_discard_en(sbi))" and "if (sbi->segs_per_sec > 1)"
>>>>>> are all taken out of the loops they are unchangable.
>>>>>>
>>>>>> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
>>>>>> ---
>>>>>>     fs/f2fs/segment.c | 80 +++++++++++++++++++++++++------------------------------
>>>>>>     1 file changed, 37 insertions(+), 43 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 40e1d20..bcd8a40 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -3476,12 +3476,39 @@ static int build_curseg(struct f2fs_sb_info *sbi)
>>>>>>     	return restore_curseg_summaries(sbi);
>>>>>>     }
>>>>>>     
>>>>>> +static void build_discard_entries(struct f2fs_sb_info *sbi)
>>>>>> +{
>>>>>> +	unsigned segno;
>>>>>> +
>>>>>> +	for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) {
>>>>>> +		struct seg_entry *se = get_seg_entry(sbi, segno);
>>>>>> +
>>>>>> +		if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG))
>>>>>> +			memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
>>>>>> +		else {
>>>>>> +			memcpy(se->discard_map, se->cur_valid_map,
>>>>>> +				SIT_VBLOCK_MAP_SIZE);
>>>>>> +
>>>>>> +			sbi->discard_blks += sbi->blocks_per_seg -
>>>>>> +				se->valid_blocks;
>>>>>> +		}
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +static void build_sec_entries(struct f2fs_sb_info *sbi)
>>>>>> +{
>>>>>> +	unsigned segno;
>>>>>> +
>>>>>> +	for (segno = 0; segno < MAIN_SEGS(sbi); ++segno)
>>>>>> +		get_sec_entry(sbi, segno)->valid_blocks +=
>>>>>> +			get_seg_entry(sbi, segno)->valid_blocks;
>>>>>> +}
>>>>>> +
>>>>>>     static void build_sit_entries(struct f2fs_sb_info *sbi)
>>>>>>     {
>>>>>>     	struct sit_info *sit_i = SIT_I(sbi);
>>>>>>     	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
>>>>>>     	struct f2fs_journal *journal = curseg->journal;
>>>>>> -	struct seg_entry *se;
>>>>>>     	struct f2fs_sit_entry sit;
>>>>>>     	int sit_blk_cnt = SIT_BLK_CNT(sbi);
>>>>>>     	unsigned int i, start, end;
>>>>>> @@ -3498,67 +3525,34 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
>>>>>>     			struct f2fs_sit_block *sit_blk;
>>>>>>     			struct page *page;
>>>>>>     
>>>>>> -			se = &sit_i->sentries[start];
>>>>>>     			page = get_current_sit_page(sbi, start);
>>>>>>     			sit_blk = (struct f2fs_sit_block *)page_address(page);
>>>>>>     			sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
>>>>>>     			f2fs_put_page(page, 1);
>>>>>>     
>>>>>>     			check_block_count(sbi, start, &sit);
>>>>>> -			seg_info_from_raw_sit(se, &sit);
>>>>>> -
>>>>>> -			/* build discard map only one time */
>>>>>> -			if (f2fs_discard_en(sbi)) {
>>>>>> -				if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
>>>>>> -					memset(se->discard_map, 0xff,
>>>>>> -						SIT_VBLOCK_MAP_SIZE);
>>>>>> -				} else {
>>>>>> -					memcpy(se->discard_map,
>>>>>> -						se->cur_valid_map,
>>>>>> -						SIT_VBLOCK_MAP_SIZE);
>>>>>> -					sbi->discard_blks +=
>>>>>> -						sbi->blocks_per_seg -
>>>>>> -						se->valid_blocks;
>>>>>> -				}
>>>>>> -			}
>>>>>>     
>>>>>> -			if (sbi->segs_per_sec > 1)
>>>>>> -				get_sec_entry(sbi, start)->valid_blocks +=
>>>>>> -							se->valid_blocks;
>>>>>> +			seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
>>>>>>     		}
>>>>>>     		start_blk += readed;
>>>>>>     	} while (start_blk < sit_blk_cnt);
>>>>>>     
>>>>>>     	down_read(&curseg->journal_rwsem);
>>>>>>     	for (i = 0; i < sits_in_cursum(journal); i++) {
>>>>>> -		unsigned int old_valid_blocks;
>>>>>> -
>>>>>>     		start = le32_to_cpu(segno_in_journal(journal, i));
>>>>>> -		se = &sit_i->sentries[start];
>>>>>>     		sit = sit_in_journal(journal, i);
>>>>>>     
>>>>>> -		old_valid_blocks = se->valid_blocks;
>>>>>> -
>>>>>>     		check_block_count(sbi, start, &sit);
>>>>>> -		seg_info_from_raw_sit(se, &sit);
>>>>>> -
>>>>>> -		if (f2fs_discard_en(sbi)) {
>>>>>> -			if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
>>>>>> -				memset(se->discard_map, 0xff,
>>>>>> -							SIT_VBLOCK_MAP_SIZE);
>>>>>> -			} else {
>>>>>> -				memcpy(se->discard_map, se->cur_valid_map,
>>>>>> -							SIT_VBLOCK_MAP_SIZE);
>>>>>> -				sbi->discard_blks += old_valid_blocks -
>>>>>> -							se->valid_blocks;
>>>>>> -			}
>>>>>> -		}
>>>>>> -
>>>>>> -		if (sbi->segs_per_sec > 1)
>>>>>> -			get_sec_entry(sbi, start)->valid_blocks +=
>>>>>> -				se->valid_blocks - old_valid_blocks;
>>>>>> +		seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
>>>>>>     	}
>>>>>>     	up_read(&curseg->journal_rwsem);
>>>>>> +
>>>>>> +	/* build discard map only one time */
>>>>>> +	if (f2fs_discard_en(sbi))
>>>>>> +		build_discard_entries(sbi);
>>>>>> +
>>>>>> +	if (sbi->segs_per_sec > 1)
>>>>>> +		build_sec_entries(sbi);
>>>>>>     }
>>>>>>     
>>>>>>     static void init_free_segmap(struct f2fs_sb_info *sbi)
>>>>>> -- 
>>>>>> 2.1.4

  reply	other threads:[~2017-12-21  1:32 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-17 12:07 [f2fs-dev] [PATCH RFC] f2fs: clean up build_sit_entries gaoxiang (P)
2017-12-17 12:07 ` gaoxiang (P)
2017-12-19 23:40 ` [f2fs-dev] " Jaegeuk Kim
2017-12-19 23:40   ` Jaegeuk Kim
2017-12-20  1:36   ` [f2fs-dev] " Gaoxiang (OS)
2017-12-20  2:31     ` Jaegeuk Kim
2017-12-20  2:31       ` Jaegeuk Kim
2017-12-20  3:07       ` [f2fs-dev] " Gaoxiang (OS)
2017-12-20  3:12       ` Gaoxiang (OS)
2017-12-20  3:12         ` Gaoxiang (OS)
2017-12-20 20:39         ` [f2fs-dev] " Jaegeuk Kim
2017-12-21  1:31           ` Gaoxiang (OS) [this message]
2017-12-20  5:46       ` Gaoxiang (OS)
  -- strict thread matches above, loose matches on Subject: below --
2017-12-17 11:59 gaoxiang (P)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9047C53C18267742AB12E43B65C7F9F70BCB7DD8@dggemi505-mbs.china.huawei.com \
    --to=gaoxiang25@huawei.com \
    --cc=chao@kernel.org \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=weidu.du@huawei.com \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.