All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chao Yu <chao2.yu@samsung.com>
To: "'Jaegeuk Kim'" <jaegeuk@kernel.org>, '이창만' <cm224.lee@samsung.com>
Cc: 'Gu Zheng' <guz.fnst@cn.fujitsu.com>,
	'f2fs' <linux-f2fs-devel@lists.sourceforge.net>,
	'fsdevel' <linux-fsdevel@vger.kernel.org>
Subject: RE: [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block
Date: Sat, 05 Jul 2014 19:15:58 +0800	[thread overview]
Message-ID: <004501cf9842$9c400ae0$d4c020a0$@samsung.com> (raw)
In-Reply-To: <20140705062535.GA35486@jmac.local>

Hi,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Saturday, July 05, 2014 2:26 PM
> To: '이창만'
> Cc: Gu Zheng; Chao Yu; 'f2fs'; 'fsdevel'
> Subject: Re: [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block
> 
> To Changman,
> 
> Just for sure, can you reproduce this issue in the x86 machine with proper
> benchmarks? (i.e., test_bit_le vs. find_next_bit_le)
> 
> To all,
> 
> I cautiously suspect that the performances might be different when processing
> f2fs_find_entry, since L1/L2 cache misses due to the intermediate routines like
> matching strings can make some effect on it.
> 
> But, IMO, it is still worth to investigate this issue and contemplate how to
> detect all ones or not.
> 
> Ah, one solution may be using 2 bytes from the reserved space, total 3, to
> indicate how many valid dentries are stored in the dentry block.
> 
> Any ideas?

That's good one!
IMO, one more byte could be maintained additionally to indicate last one in
bitmap, we save the searching time for exist target name by skipping testing area
behind the last one.

Thanks,
Yu

> 
> Thanks,
> 
> On Fri, Jul 04, 2014 at 04:04:09PM +0800, Gu Zheng wrote:
> > Hi Yu,
> > Thanks.
> > On 07/04/2014 02:21 PM, Chao Yu wrote:
> >
> > > Hi Jaegeuk, Gu, Changman
> > >
> > >> -----Original Message-----
> > >> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > >> Sent: Friday, July 04, 2014 1:36 PM
> > >> To: Gu Zheng
> > >> Cc: f2fs; fsdevel; 이창만; 俞
> > >> Subject: Re: [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in,
> find_in_block
> > >>
> > >> Well, how about testing with many ones in the bit streams?
> > >> Thanks,
> > >>
> > >> On Thu, Jul 03, 2014 at 06:14:02PM +0800, Gu Zheng wrote:
> > >>> Hi Jaegeuk, Changman
> > >>>
> > >>> Just a simple test, not very sure it can address
> > >>> our qualm.
> > >>>
> > >>> Bitmap size:216(the same as f2fs dentry_bits).
> > >>> CPU: Intel i5 x86_64.
> > >>>
> > >>> Time counting based on tsc(the less the fast).
> > >>> [Index of 1]	find_next_bit_le	test_bit_le
> > >>> 0		20			117
> > >>> 1		20			114
> > >>> 2		20			113
> > >>> 3		20			139
> > >>> 4		22			121
> > >>> 5		22			118
> > >>> 6		22			115
> > >>> 8		22			112
> > >>> 9		22			106
> > >>> 10		22			105
> > >>> 11		22			100
> > >>> 16		22			98
> > >>> 48		22			97
> > >>> 80		27			95
> > >>> 104		27			92
> > >>> 136		32			95
> > >>> 160		32			92
> > >>> 184		32			90
> > >>> 200		27			87
> > >>> 208		35			84
> > >>>
> > >>> According to the result, find_next_bit_le is always
> > >>> better than test_bit_le, though there may be some
> > >>> noise, but I think the result is clear.
> > >>> Hope it can help us.:)
> > >>> ps.The sample is attached too.
> > >>>
> > >>> Thanks,
> > >>> Gu
> > >
> > > I hope this could provide some help for this patch.
> > >
> > > I modify Gu's code like this, and add few test case:
> > >
> > > static void test_bit_search_speed(void)
> > > {
> > > 	unsigned long flags;
> > > 	uint64_t tsc_s_b1, tsc_s_e1, tsc_s_b2, tsc_s_e2;
> > > 	int i, j, pos;
> > > 	const void *bit_addr;
> > >
> > > 	local_irq_save(flags);
> > > 	preempt_disable();
> > >
> > > 	printk("find_next_bit	test_bit_le\n");
> > >
> > > 	for (i = 0; i < 24; i++) {
> > >
> > > 		bit_addr = &bitmaps[i];
> > >
> > > 		tsc_s_b1 = rdtsc();
> > >
> > > 		for (j = 0, pos = 0; j < 1000; j++, pos = 0)
> > > 			while (pos < 216)
> > > 				pos = find_next_bit_le(bit_addr, 216, pos) + 1;
> > >
> > > 		mb();
> > > 		tsc_s_e1 = rdtsc();
> > > 		tsc_s_e1 -= tsc_s_b1;
> > > 		do_div(tsc_s_e1, 1000);
> > >
> > > 		tsc_s_b2 = rdtsc();
> > >
> > > 		for (j = 0, pos = 0; j < 1000; j++, pos = 0)
> > > 			while (pos < 216)
> > > 				test_bit_le(pos++, bit_addr);
> > >
> > > 		mb();
> > > 		tsc_s_e2 = rdtsc();
> > > 		tsc_s_e2 -= tsc_s_b2;
> > > 		do_div(tsc_s_e2, 1000);
> > >
> > > 		printk("%-16llu %-16llu\n", tsc_s_e1, tsc_s_e2);
> > > 	}
> > >
> > > 	preempt_enable();
> > > 	local_irq_restore(flags);
> > > }
> > > case: 11111111 11111111
> > > 		{255, 255, 255, 255, 255, 255, 255,
> > > 		 255, 255, 255, 255, 255, 255, 255,
> > > 		 255, 255, 255, 255, 255, 255, 255,
> > > 		 255, 255, 255, 255, 255, 255},
> > > case: 10101010 10101010
> > > 		{170, 170, 170, 170, 170, 170, 170,
> > > 		 170, 170, 170, 170, 170, 170, 170,
> > > 		 170, 170, 170, 170, 170, 170, 170,
> > > 		 170, 170, 170, 170, 170, 170},
> > > case: 11111111 00000000
> > > 		{255, 0, 255, 0, 255, 0, 255,
> > > 		 0, 255, 0, 255, 0, 255, 0,
> > > 		 255, 0, 255, 0, 255, 0, 255,
> > > 		 0, 255, 0, 255, 0, 255},
> > > case: 00001111 00001111
> > > 		 {15, 15, 15, 15, 15, 15, 15,
> > > 		 15, 15, 15, 15, 15, 15, 15,
> > > 		 15, 15, 15, 15, 15, 15, 15,
> > > 		 15, 15, 15, 15, 15, 15}
> > >
> > > and here are test result in my env. (Ubuntu vm, 768MB, i3-3220)
> > > It seems find_next_bit works not so bad as I thought.
> > >
> > > find_next_bit    test_bit_le
> > > 73               4209
> > > 62               1271
> > > 69               1585
> > > 50               2031
> > > 67               2255
> > > 82               2261
> > > 52               4007
> > > 79               2159
> > > 50               2043
> > > 55               2215
> > > 53               2393
> > > 72               3784
> > > 76               1879
> > > 61               2562
> > > 70               2702
> > > 62               2489
> > > 56               2307
> > > 54               2063
> > > 51               2258
> > > 69               2712
> > > 4133             3989  -- case: 11111111 11111111
> > > 2370             3024  -- case: 10101010 10101010
> > > 2608             2413  -- case: 11111111 00000000
> > > 2457             2506  -- case: 00001111 00001111
> >
> > The time cost of test_bit_le shakes violently, it should be
> > very smooth according to the test case, maybe it has relation
> > to your vm env.
> >
> > To Jaegeuk,
> > Following test result is walking a bitmap via find_next_bit_le
> > and test_bit_le.
> > (Front 7 are random bitmaps, last four are cases from Yu, see
> > attached sample for detail):
> >
> > find_next_bit_le    test_bit_le
> >
> >      3615                3492
> >      2640                3492
> >      2431                3492
> >      1957                3494
> >      2160                3492
> >      1933                3495
> >      1096                3492
> >
> >      8078                3492
> >      3732                3492
> >      4237                3493
> >      3824                3492
> >
> > Thanks,
> > Gu
> >
> > >
> > > .
> > >
> >
> >
> 
> > #include <linux/module.h>
> >
> > __u8 bitmaps[11][27] = {
> > 		{1, 123, 2, 3, 127, 200, 100,
> > 		 123, 23, 13, 78, 42, 123, 50,
> > 		 123, 56, 198, 0, 58, 22, 19,
> > 		 123, 23, 13, 78, 42, 123},
> > 		{2, 0, 0, 0, 0, 0, 0,
> > 		 123, 23, 13, 78, 42, 123, 50,
> > 		 123, 56, 198, 0, 58, 22, 19,
> > 		 23, 123, 2, 34, 123, 12},
> > 		{4, 0, 0, 0, 0, 0, 0,
> > 		 123, 56, 198, 0, 58, 22, 19,
> > 		 123, 23, 13, 78, 42, 123,
> > 		 0, 0, 0, 34, 45, 29},
> > 		{8, 0, 0, 0, 0, 0, 0,
> > 		 1, 123, 2, 3, 127, 200, 100,
> > 		 123, 23, 13, 78, 42, 123, 50,
> > 		 0, 0, 0, 0, 0, 0},
> > 		{16, 0, 0, 0, 0, 0, 0,
> > 		 8, 78, 0, 0, 23, 0, 213,
> > 		 0, 12, 0, 45, 0, 109, 111,
> > 		 231, 11, 11, 88, 77, 99},
> > 		{32, 0, 0, 0, 0, 0, 0,
> > 		 8, 78, 0, 0, 23, 0, 213,
> > 		 0, 12, 0, 45, 0, 109, 111,
> > 		 231, 11, 11, 88, 77, 99},
> > 		{64, 0, 0, 0, 0, 0, 0,
> > 		 8, 78, 0, 0, 23, 0, 213,
> > 		 0, 12, 0, 45, 0, 109, 111,
> > 		 0, 0, 0, 0, 0, 0},
> > 		{255, 255, 255, 255, 255, 255, 255,
> > 		 255, 255, 255, 255, 255, 255, 255,
> > 		 255, 255, 255, 255, 255, 255, 255,
> > 		 255, 255, 255, 255, 255, 255},
> > 		{170, 170, 170, 170, 170, 170, 170,
> > 		 170, 170, 170, 170, 170, 170, 170,
> > 		 170, 170, 170, 170, 170, 170, 170,
> > 		 170, 170, 170, 170, 170, 170},
> > 		{255, 0, 255, 0, 255, 0, 255,
> > 		 0, 255, 0, 255, 0, 255, 0,
> > 		 255, 0, 255, 0, 255, 0, 255,
> > 		 0, 255, 0, 255, 0, 255},
> > 		 {15, 15, 15, 15, 15, 15, 15,
> > 		 15, 15, 15, 15, 15, 15, 15,
> > 		 15, 15, 15, 15, 15, 15, 15,
> > 		 15, 15, 15, 15, 15, 15}
> > 		};
> >
> > uint64_t rdtsc(void) {
> > uint32_t lo, hi;
> > __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
> > return (uint64_t)hi << 32 | lo;
> > }
> >
> > static void test_bit_search_speed(void)
> > {
> > 	unsigned long flags;
> > 	uint64_t tsc_s, tsc_diff;
> > 	int i, j, pos;
> > 	const void *bit_addr;
> >
> > 	local_irq_save(flags);
> > 	preempt_disable();
> >
> > 	printk("find_next_bit_le    test_bit_le\n");
> >
> > 	for (i = 0; i < 11; i++) {
> > 		bit_addr = &bitmaps[i];
> >
> > 		pos = 0;
> > 		tsc_s = rdtsc();
> >
> > 		for (j = 0; j < 1000; j++, pos = 0)
> > 			while (pos < 216)
> > 				pos = find_next_bit_le(bit_addr, 216, pos) + 1;
> > 		mb();
> > 		tsc_diff = (rdtsc() - tsc_s)/1000;
> >
> > 		printk("%8llu", tsc_diff);
> >
> > 		pos = 0;
> > 		tsc_s = rdtsc();
> >
> > 		for (j = 0; j < 1000; j++, pos = 0)
> > 			while (pos < 216)
> > 				test_bit_le(pos++, bit_addr);
> >
> > 		mb();
> > 		tsc_diff = (rdtsc() - tsc_s)/1000;
> >
> > 		printk("%20llu \n", tsc_diff);
> > 	}
> >
> > 	preempt_enable();
> > 	local_irq_restore(flags);
> > }
> >
> > static int __init start_test(void) {
> > 	test_bit_search_speed();
> > 	return 0;
> > }
> >
> > static void __exit exit_test(void) {
> >
> > }
> >
> > module_init(start_test);
> > module_exit(exit_test);
> >
> > MODULE_LICENSE("GPL");
> > MODULE_AUTHOR("Gu Zheng");
> 
> 
> --
> Jaegeuk Kim

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2014-07-05 11:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-24 10:20 [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block Gu Zheng
2014-06-25  2:30 ` [f2fs-dev] " Chao Yu
2014-06-25  2:53   ` Gu Zheng
2014-06-27  9:58 ` [PATCH V2 3/4] f2fs: use find_next_bit_le rather than test_bit_le in find_in_block Gu Zheng
2014-07-02  7:49 ` [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block Changman Lee
2014-07-03  1:05   ` Gu Zheng
2014-07-02 10:30 ` Jaegeuk Kim
2014-07-03  1:13   ` Gu Zheng
2014-07-03 10:14   ` Gu Zheng
2014-07-04  5:36     ` Jaegeuk Kim
2014-07-04  6:21       ` Chao Yu
2014-07-04  8:04         ` Gu Zheng
2014-07-05  6:25           ` Jaegeuk Kim
2014-07-05 11:15             ` Chao Yu [this message]
2014-07-07  1:45             ` Changman Lee
2014-07-07  2:13               ` Gu Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='004501cf9842$9c400ae0$d4c020a0$@samsung.com' \
    --to=chao2.yu@samsung.com \
    --cc=cm224.lee@samsung.com \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.