* [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster
@ 2010-04-28 16:48 Wengang Wang
  2010-04-28 17:14 ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-04-28 16:48 UTC (permalink / raw)
  To: ocfs2-devel

Lockres lookup is done under dlm->spinlock, so we had better finish the lookup
as fast as possible, especially on machines with many CPUs.

The existing lookup compares characters starting at a non-aligned address, which
takes more time. This patch improves performance mainly by comparing from an
aligned address instead. It also makes all lockres names the same length, so
comparing the lengths is no longer needed, and neither is the extra comparison
of the first character.

This patch changes the recovery lockres name length from 9 to 31. This change
should do little harm.

In my user-space test of the comparison loop, this change is at most 15.7%
faster.

Questions:
1. Are there other special lockres names whose length is not 31?
2. If all lockres name lengths are changed to 32 (including the trailing '\0'), it
gets at most 19% faster, but adds 1 byte of network transfer to every request.
I don't know whether this is worthwhile.

Drawbacks:
1. It changes the locking version, which makes rolling upgrades impossible.

# I have not tested the patch yet.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlm/dlmcommon.h    |    5 +++--
 fs/ocfs2/dlm/dlmdomain.c    |    6 +-----
 fs/ocfs2/ocfs2_lockingver.h |    5 ++++-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 0102be3..c41ebf5 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -91,8 +91,9 @@ enum dlm_ast_type {
 			 LKM_CANCEL | LKM_INVVALBLK | LKM_FORCE | \
 			 LKM_RECOVERY | LKM_LOCAL | LKM_NOQUEUE)
 
-#define DLM_RECOVERY_LOCK_NAME       "$RECOVERY"
-#define DLM_RECOVERY_LOCK_NAME_LEN   9
+#define DLM_RECOVERY_LOCK_NAME       "$RECOVERY0000000000000000000000"
+/* make the length of recovery lock name the same as normal ones */
+#define DLM_RECOVERY_LOCK_NAME_LEN   31
 
 static inline int dlm_is_recovery_lock(const char *lock_name, int name_len)
 {
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 988c905..7062d48 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -191,11 +191,7 @@ struct dlm_lock_resource * __dlm_lookup_lockres_full(struct dlm_ctxt *dlm,
 	hlist_for_each(list, bucket) {
 		struct dlm_lock_resource *res = hlist_entry(list,
 			struct dlm_lock_resource, hash_node);
-		if (res->lockname.name[0] != name[0])
-			continue;
-		if (unlikely(res->lockname.len != len))
-			continue;
-		if (memcmp(res->lockname.name + 1, name + 1, len - 1))
+		if (memcmp(res->lockname.name, name, len))
 			continue;
 		dlm_lockres_get(res);
 		return res;
diff --git a/fs/ocfs2/ocfs2_lockingver.h b/fs/ocfs2/ocfs2_lockingver.h
index 2e45c8d..7fd9260 100644
--- a/fs/ocfs2/ocfs2_lockingver.h
+++ b/fs/ocfs2/ocfs2_lockingver.h
@@ -25,8 +25,11 @@
  * more details.
  *
  * 1.0 - Initial locking version from ocfs2 1.4.
+ * 1.1 - Change recovery lock name from "$recovery" to
+ *       "$recovery00000000000000000000000". --make it 31 bytes long, same as
+ *       the length of normal lock name.
  */
 #define OCFS2_LOCKING_PROTOCOL_MAJOR 1
-#define OCFS2_LOCKING_PROTOCOL_MINOR 0
+#define OCFS2_LOCKING_PROTOCOL_MINOR 1
 
 #endif  /* OCFS2_LOCKINGVER_H */
-- 
1.6.6.1


* [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster
  2010-04-28 16:48 [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster Wengang Wang
@ 2010-04-28 17:14 ` Sunil Mushran
  2010-04-29  9:31   ` Wengang Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2010-04-28 17:14 UTC (permalink / raw)
  To: ocfs2-devel

The dlm interface allows different sized locknames. And the locknames can be
binary. That we use mostly ascii is just coincidental. Yes, mostly. The dentry
lock is partially binary. Also, $RECOVERY is used only during recovery.

So the only interesting bit from my pov would be:

-		if (memcmp(res->lockname.name + 1, name + 1, len - 1))
+		if (memcmp(res->lockname.name, name, len))

Will just this change improve performance? How long a hash list would need
to be for us to see an appreciable improvement?
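
To illustrate the point about variable-length and possibly binary names,
here is a minimal user-space sketch. It is not the dlm code, and
lockname_equal() is a hypothetical helper. It shows why the length check
cannot simply be dropped: without it, memcmp() would treat a short name as
equal to any longer name that begins with the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the dlm code: compare two lock names
 * that may have different lengths and may contain binary bytes,
 * including embedded NULs.
 */
static int lockname_equal(const void *a, size_t alen,
			  const void *b, size_t blen)
{
	assert(a != NULL && b != NULL);

	/* Different lengths can never match; this also bounds memcmp(). */
	if (alen != blen)
		return 0;
	return memcmp(a, b, alen) == 0;
}
```

Because the helper takes explicit lengths and never relies on a terminating
NUL, it behaves the same for ascii and binary names.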

Sunil




* [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster
  2010-04-28 17:14 ` Sunil Mushran
@ 2010-04-29  9:31   ` Wengang Wang
  2010-04-30  2:39     ` Wengang Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-04-29  9:31 UTC (permalink / raw)
  To: ocfs2-devel

Hi Sunil,

On 10-04-28 10:14, Sunil Mushran wrote:
> The dlm interface allows different sized locknames. And the locknames can be
> binary. That we use mostly ascii is just coincidental. Yes, mostly. The dentry
> lock is partially binary. Also, $RECOVERY is used only during recovery.
> 
> So the only interesting bit from my pov would be:
> 
> -		if (memcmp(res->lockname.name + 1, name + 1, len - 1))
> +		if (memcmp(res->lockname.name, name, len))

Yes, then it's the only bit. 
> Will just this change improve performance? How long a hash list would need
> to be for us to see an appreciable improvement?
I didn't test this bit alone, only the whole change.
For that test, the C files were compiled with no optimization.

I have just now tested this bit alone with -O2 optimization, and I can _not_ see
any improvement even for 1999999 x 99999 loops of comparison. So please
ignore this patch.

When compiled with no optimization, is the comparison done on each
character one by one? It's funny.

regards,
wengang.



* [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster
  2010-04-29  9:31   ` Wengang Wang
@ 2010-04-30  2:39     ` Wengang Wang
  2010-04-30  7:30       ` Wengang Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-04-30  2:39 UTC (permalink / raw)
  To: ocfs2-devel

updates:

The test C file was not well written, so the comparison was removed
by the optimizer. I retested and got the following result:

[wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 199999 9999
1234567890123456789012345678901 31
1234567890123456789012345678902 31
loops 199999 x 9999
loops 9999 199999
orig cost 122s
loops 9999 199999
new cost 124s

That is, after the change it becomes slower, which I did not expect :-(.
It is also slower with 32-character strings.
[wwg@cool src]$ ./a.out "12345678901234567890123456789010" "12345678901234567890123456789020" 29999 9999
12345678901234567890123456789010 32
12345678901234567890123456789020 32
loops 29999 x 9999
loops 9999 29999
orig cost 18s
loops 9999 29999
new cost 19s

The test C file is attached.
It was compiled with gcc -O2 2.c
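
As a side note on the comparison being removed by the optimizer, the
following is one possible way to keep it alive at -O2. This is only a
sketch under my own assumptions: sink and time_memcmp() are hypothetical
names, and the timing uses the standard clock() call (CPU time) rather
than gettimeofday().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Written on every iteration so the compiler must keep each result. */
static volatile int sink;

/*
 * Hypothetical harness: run `iters` comparisons of two buffers of `len`
 * bytes and return the elapsed CPU time in seconds.
 */
static double time_memcmp(const char *a, const char *b, size_t len,
			  unsigned long iters)
{
	/*
	 * The pointers themselves are volatile, so they are reloaded on
	 * each pass and the comparison cannot be hoisted out of the loop.
	 */
	const char *volatile pa = a;
	const char *volatile pb = b;
	clock_t start, end;
	unsigned long i;

	assert(a != NULL && b != NULL);

	start = clock();
	for (i = 0; i < iters; i++)
		sink = memcmp(pa, pb, len);
	end = clock();

	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it once with (name1, name2, len) and once with
(name1 + 1, name2 + 1, len - 1) would compare the two memcmp() variants
under the same conditions.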

regards,
wengang.

-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <string.h>

#define LOOPCNT 199999
#define LOOPCNT2 9999
/* Defaults only; main() overwrites all of these from the command line. */
const char *name1 = "12345678901234567890123456789012";
const char *name2 = "12345678901234567890123456789012";
unsigned int len1 = 32, len2 = 32;
unsigned int loop1 = LOOPCNT, loop2 = LOOPCNT2;

/* Original lookup: check first byte and length, then memcmp() from offset 1. */
int func1(unsigned int loop1, unsigned int loop2)
{
	int i,j;

	for (j = 0; j < loop1; j++) {
		for (i = 0; i < loop2; i++) {
			if (name1[0] != name2[0])
				continue;
			if (len1 != len2)
				continue;
			if (memcmp(name1 + 1, name2 + 1, len1 - 1))
				continue;
			break;
		}
	}
	printf("loops %d %d\n", i,j);
	return i + j;
}
/* Changed lookup: same early checks, but memcmp() covers the whole name. */
int func2(unsigned int loop1, unsigned int loop2)
{
	int i,j;

	for (j = 0; j < loop1; j++) {
		for (i = 0; i < loop2; i++) {
			if (name1[0] != name2[0])
				continue;
			if (len1 != len2)
				continue;
			if (memcmp(name1, name2, len1))
				continue;
			break;
		}
	}
	printf("loops %d %d\n", i,j);
	return i + j;
}

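int main(int argc, char **argv)
{
	int a;
	struct timeval timev1, timev2;

	/* All four arguments are required; argv[1..4] are used below. */
	if (argc != 5) {
		fprintf(stderr, "usage: %s name1 name2 loop1 loop2\n", argv[0]);
		return 1;
	}

	name1 = argv[1];
	name2 = argv[2];
	loop1 = atoi(argv[3]);
	loop2 = atoi(argv[4]);
	len1 = strlen(name1);
	len2 = strlen(name2);

	printf("%s %u\n", name1, len1);
	printf("%s %u\n", name2, len2);
	printf("loops %u x %u\n", loop1, loop2);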

	gettimeofday(&timev1, NULL);
	a = func1(loop1, loop2);
	gettimeofday(&timev2, NULL);
	printf("orig cost %lds\n", timev2.tv_sec - timev1.tv_sec);
	gettimeofday(&timev1, NULL);
	a += func2(loop1, loop2);
	gettimeofday(&timev2, NULL);
	printf("new cost %lds\n", timev2.tv_sec - timev1.tv_sec);
	return a;
}


* [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster
  2010-04-30  2:39     ` Wengang Wang
@ 2010-04-30  7:30       ` Wengang Wang
  2010-05-04  0:14         ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-04-30  7:30 UTC (permalink / raw)
  To: ocfs2-devel

updates:

I checked the asm code: it repeatedly executes cmpsb, which is a byte
operation, instead of cmpsw, which is a word operation. So N more cmpsb
iterations mean N more CPU clocks.

I replaced memcmp with strncmp, and it gives us at most a 50% improvement.

[wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 10000 10000
0x8049a40 1234567890123456789012345678901 31
0x8049a60 1234567890123456789012345678902 31
loops 10000 x 10000
orig: 6s
fixed: 3s
[wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 20000 10000
0x8049a40 1234567890123456789012345678901 31
0x8049a60 1234567890123456789012345678902 31
loops 20000 x 10000
orig: 12s
fixed: 6s
[wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 40000 10000
0x8049a40 1234567890123456789012345678901 31
0x8049a60 1234567890123456789012345678902 31
loops 40000 x 10000
orig: 24s
fixed: 12s

So it saves at most 3s for 100,000,000 comparisons, or 3ms for 100,000,
or 3us for 100, on my machine with an Intel(R) Core(TM)2 Duo CPU E8400 @3.00GHz.
I have no idea whether this is much or little :P

So it seems the user space strncmp() makes use of cmpsw where possible.
I checked the kernel versions of memcmp/strcmp/strncmp, and they just use _byte_
operations.
So why are the kernel versions of these functions not optimized as in the user
space libs? We can't use strncmp instead of memcmp directly, though.
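
To make the byte versus word difference concrete, here is a rough sketch
of an equality test that compares one machine word per step instead of one
byte. It is an illustration only, not the kernel or glibc implementation,
and bytes_equal_wordwise() is a hypothetical name. It answers only
"equal or not", which is all the hash lookup needs, and it works on binary
data because it takes an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of a word-at-a-time equality test. Real
 * implementations are architecture specific; this one is portable C.
 */
static int bytes_equal_wordwise(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned long wa, wb;

	assert(a != NULL && b != NULL);

	/*
	 * Compare sizeof(unsigned long) bytes per step. memcpy() keeps
	 * the loads legal even when the addresses are not aligned.
	 */
	while (len >= sizeof(wa)) {
		memcpy(&wa, pa, sizeof(wa));
		memcpy(&wb, pb, sizeof(wb));
		if (wa != wb)
			return 0;
		pa += sizeof(wa);
		pb += sizeof(wb);
		len -= sizeof(wa);
	}

	/* Compare the remaining tail one byte at a time. */
	while (len--) {
		if (*pa++ != *pb++)
			return 0;
	}
	return 1;
}
```

For a 31-byte name on a machine with 8-byte words this takes 3 word
compares plus 7 byte compares, instead of 31 byte compares.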

regards,
wengang.


* [Ocfs2-devel] [PATCH] ocfs2: make lockres lookup faster
  2010-04-30  7:30       ` Wengang Wang
@ 2010-05-04  0:14         ` Sunil Mushran
  0 siblings, 0 replies; 6+ messages in thread
From: Sunil Mushran @ 2010-05-04  0:14 UTC (permalink / raw)
  To: ocfs2-devel

On 04/30/2010 12:30 AM, Wengang Wang wrote:
> updates:
>
> I checked the asm code: it repeatedly executes cmpsb, which is a byte
> operation, instead of cmpsw, which is a word operation. So N more cmpsb
> iterations mean N more CPU clocks.
>
> I replaced memcmp with strncmp, and it gives us at most a 50% improvement.
>
> [wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 10000 10000
> 0x8049a40 1234567890123456789012345678901 31
> 0x8049a60 1234567890123456789012345678902 31
> loops 10000 x 10000
> orig: 6s
> fixed: 3s
> [wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 20000 10000
> 0x8049a40 1234567890123456789012345678901 31
> 0x8049a60 1234567890123456789012345678902 31
> loops 20000 x 10000
> orig: 12s
> fixed: 6s
> [wwg@cool src]$ ./a.out "1234567890123456789012345678901" "1234567890123456789012345678902" 40000 10000
> 0x8049a40 1234567890123456789012345678901 31
> 0x8049a60 1234567890123456789012345678902 31
> loops 40000 x 10000
> orig: 24s
> fixed: 12s
>
> So it saves at most 3s for 100,000,000 comparisons, or 3ms for 100,000,
> or 3us for 100, on my machine with an Intel(R) Core(TM)2 Duo CPU E8400 @3.00GHz.
> I have no idea whether this is much or little :P
>
>    

We have 16K hash buckets. 100 each means there are 1.6 million lock resources.
From what I have seen, users have 0.5 to 1 million active lock resources. Now
consider the fact that a message round trip on a gige takes something around
100-150us. Saving 3us is not going to get us much.
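
The arithmetic behind these numbers can be written out as a small sketch.
The helper names are hypothetical, and "16K" is assumed here to mean
16384 buckets.

```c
#include <assert.h>

#define HASH_BUCKETS	16384UL	/* assumption: "16K hash buckets" */

/* Lock resources needed for every bucket to hold chain_len entries. */
static unsigned long lockres_for_chain_len(unsigned long chain_len)
{
	return HASH_BUCKETS * chain_len;
}

/* Saving as a percentage of one round trip; both in microseconds. */
static double saving_percent(double saved_us, double round_trip_us)
{
	assert(round_trip_us > 0.0);
	return 100.0 * saved_us / round_trip_us;
}
```

With the figures from this thread, lockres_for_chain_len(100) is 1,638,400
(about 1.6 million), and saving 3us out of a 100us or 150us round trip is
3% or 2% respectively.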
