All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dcache: faster dentry_cmp()
@ 2012-02-14 22:45 Alexey Dobriyan
  2012-02-14 22:58 ` Al Viro
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Alexey Dobriyan @ 2012-02-14 22:45 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, npiggin

1) consistently use "unsigned int" for dentry name length,
2) reuse subtraction result for return value, exact value doesn't matter
   because function is only used in boolean context,
3) use *p++ idiom for even better code.

All of this results in performance speedup of "git diff"
which is way out of statistical error (0.4% vs 0.15% of 3 sigma):

$ PAGER= perf stat -r 256 git-diff

 Performance counter stats for 'git-diff' (256 runs):

        115.033582 task-clock                #    0.993 CPUs utilized            ( +-  0.06% )
                 0 context-switches          #    0.000 M/sec                    ( +- 17.95% )
                 0 CPU-migrations            #    0.000 M/sec                    ( +- 19.47% )
             2,321 page-faults               #    0.020 M/sec                    ( +-  0.00% )
       384,540,991 cycles                    #    3.343 GHz                      ( +-  0.05% )
       121,833,562 stalled-cycles-frontend   #   31.68% frontend cycles idle     ( +-  0.16% )
        51,731,784 stalled-cycles-backend    #   13.45% backend  cycles idle     ( +-  0.37% )
       586,327,441 instructions              #    1.52  insns per cycle
                                             #    0.21  stalled cycles per insn  ( +-  0.00% )
       155,449,246 branches                  # 1351.338 M/sec                    ( +-  0.00% )
           542,511 branch-misses             #    0.35% of all branches          ( +-  0.07% )

       0.115856505 seconds time elapsed                                          ( +-  0.06% )

----------------------------------
after

 Performance counter stats for 'git-diff' (256 runs):

        114.486145 task-clock                #    0.993 CPUs utilized            ( +-  0.05% )
                 0 context-switches          #    0.000 M/sec                    ( +- 15.46% )
                 0 CPU-migrations            #    0.000 M/sec                    ( +- 20.06% )
             2,282 page-faults               #    0.020 M/sec                    ( +-  0.00% )
       382,725,382 cycles                    #    3.343 GHz                      ( +-  0.05% )
       119,808,563 stalled-cycles-frontend   #   31.30% frontend cycles idle     ( +-  0.15% )
        51,780,030 stalled-cycles-backend    #   13.53% backend  cycles idle     ( +-  0.33% )
       585,114,727 instructions              #    1.53  insns per cycle
                                             #    0.20  stalled cycles per insn  ( +-  0.00% )
       155,146,262 branches                  # 1355.153 M/sec                    ( +-  0.00% )
           526,739 branch-misses             #    0.34% of all branches          ( +-  0.12% )

       0.115315823 seconds time elapsed                                          ( +-  0.05% )

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 fs/dcache.c            |    6 +++---
 include/linux/dcache.h |   14 +++++++-------
 2 files changed, 10 insertions(+), 10 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1368,7 +1368,7 @@ static struct dentry *__d_instantiate_unique(struct dentry *entry,
 					     struct inode *inode)
 {
 	struct dentry *alias;
-	int len = entry->d_name.len;
+	unsigned int len = entry->d_name.len;
 	const char *name = entry->d_name.name;
 	unsigned int hash = entry->d_name.hash;
 
@@ -1750,7 +1750,7 @@ struct dentry *__d_lookup_rcu(struct dentry *parent, struct qstr *name,
 	hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
 		struct inode *i;
 		const char *tname;
-		int tlen;
+		unsigned int tlen;
 
 		if (dentry->d_name.hash != hash)
 			continue;
@@ -1869,7 +1869,7 @@ struct dentry *__d_lookup(struct dentry *parent, struct qstr *name)
 	
 	hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
 		const char *tname;
-		int tlen;
+		unsigned int tlen;
 
 		if (dentry->d_name.hash != hash)
 			continue;
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -51,18 +51,18 @@ extern struct dentry_stat_t dentry_stat;
  * Compare 2 name strings, return 0 if they match, otherwise non-zero.
  * The strings are both count bytes long, and count is non-zero.
  */
-static inline int dentry_cmp(const unsigned char *cs, size_t scount,
-				const unsigned char *ct, size_t tcount)
+static inline int dentry_cmp(const unsigned char *cs, unsigned int scount,
+			     const unsigned char *ct, unsigned int tcount)
 {
 	int ret;
-	if (scount != tcount)
-		return 1;
+
+	ret = scount - tcount;
+	if (ret)
+		return ret;
 	do {
-		ret = (*cs != *ct);
+		ret = *cs++ - *ct++;
 		if (ret)
 			break;
-		cs++;
-		ct++;
 		tcount--;
 	} while (tcount);
 	return ret;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-14 22:45 [PATCH] dcache: faster dentry_cmp() Alexey Dobriyan
@ 2012-02-14 22:58 ` Al Viro
  2012-02-15  8:14   ` Alexey Dobriyan
  2012-02-15  1:41 ` Matthew Wilcox
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Al Viro @ 2012-02-14 22:58 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: linux-fsdevel, npiggin

On Wed, Feb 15, 2012 at 01:45:27AM +0300, Alexey Dobriyan wrote:
> 1) consistently use "unsigned int" for dentry name length,
> 2) reuse subtraction result for return value, exact value doesn't matter
>    because function is only used in boolean context,
> 3) use *p++ idiom for even better code.
> 
> All of this results in performance speedup of "git diff"
> which is way out of statistical error (0.4% vs 0.15% of 3 sigma):

> -	if (scount != tcount)
> -		return 1;
> +
> +	ret = scount - tcount;
> +	if (ret)
> +		return ret;
>  	do {
> -		ret = (*cs != *ct);
> +		ret = *cs++ - *ct++;
>  		if (ret)
>  			break;
> -		cs++;
> -		ct++;
>  		tcount--;
>  	} while (tcount);
>  	return ret;

I wonder what'll happen if you simply replace that loop with
	return memcmp(cs, ct, tcount);
which is what it really is...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-14 22:45 [PATCH] dcache: faster dentry_cmp() Alexey Dobriyan
  2012-02-14 22:58 ` Al Viro
@ 2012-02-15  1:41 ` Matthew Wilcox
  2012-02-15  8:14   ` Alexey Dobriyan
  2012-02-15  1:46 ` Andi Kleen
  2012-04-20 12:06 ` Nick Piggin
  3 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2012-02-15  1:41 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: viro, linux-fsdevel, npiggin

On Wed, Feb 15, 2012 at 01:45:27AM +0300, Alexey Dobriyan wrote:
> 1) consistently use "unsigned int" for dentry name length,
> 2) reuse subtraction result for return value, exact value doesn't matter
>    because function is only used in boolean context,
> 3) use *p++ idiom for even better code.
> 
> All of this results in performance speedup of "git diff"
> which is way out of statistical error (0.4% vs 0.15% of 3 sigma):

On what machine, out of interest?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-14 22:45 [PATCH] dcache: faster dentry_cmp() Alexey Dobriyan
  2012-02-14 22:58 ` Al Viro
  2012-02-15  1:41 ` Matthew Wilcox
@ 2012-02-15  1:46 ` Andi Kleen
  2012-02-15  8:16   ` Alexey Dobriyan
  2012-04-20 12:06 ` Nick Piggin
  3 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2012-02-15  1:46 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: viro, linux-fsdevel, npiggin

Alexey Dobriyan <adobriyan@gmail.com> writes:

> 1) consistently use "unsigned int" for dentry name length,
> 2) reuse subtraction result for return value, exact value doesn't matter
>    because function is only used in boolean context,
> 3) use *p++ idiom for even better code.

I'm sure there are even better ways to do the comparision using longer
words,  especially as we know the boundaries and alignment.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-14 22:58 ` Al Viro
@ 2012-02-15  8:14   ` Alexey Dobriyan
  0 siblings, 0 replies; 10+ messages in thread
From: Alexey Dobriyan @ 2012-02-15  8:14 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, npiggin

On Tue, Feb 14, 2012 at 10:58:39PM +0000, Al Viro wrote:
> On Wed, Feb 15, 2012 at 01:45:27AM +0300, Alexey Dobriyan wrote:
> > 1) consistently use "unsigned int" for dentry name length,
> > 2) reuse subtraction result for return value, exact value doesn't matter
> >    because function is only used in boolean context,
> > 3) use *p++ idiom for even better code.
> > 
> > All of this results in performance speedup of "git diff"
> > which is way out of statistical error (0.4% vs 0.15% of 3 sigma):
> 
> > -	if (scount != tcount)
> > -		return 1;
> > +
> > +	ret = scount - tcount;
> > +	if (ret)
> > +		return ret;
> >  	do {
> > -		ret = (*cs != *ct);
> > +		ret = *cs++ - *ct++;
> >  		if (ret)
> >  			break;
> > -		cs++;
> > -		ct++;
> >  		tcount--;
> >  	} while (tcount);
> >  	return ret;
> 
> I wonder what'll happen if you simply replace that loop with
> 	return memcmp(cs, ct, tcount);
> which is what it really is...

It was memcmp() at some point and Nick changed it and it became faster.
See 9d55c369bb5e695e629bc35cba2ef607755b3bee
"fs: implement faster dentry memcmp".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-15  1:41 ` Matthew Wilcox
@ 2012-02-15  8:14   ` Alexey Dobriyan
  0 siblings, 0 replies; 10+ messages in thread
From: Alexey Dobriyan @ 2012-02-15  8:14 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: viro, linux-fsdevel, npiggin

On Tue, Feb 14, 2012 at 06:41:02PM -0700, Matthew Wilcox wrote:
> On Wed, Feb 15, 2012 at 01:45:27AM +0300, Alexey Dobriyan wrote:
> > 1) consistently use "unsigned int" for dentry name length,
> > 2) reuse subtraction result for return value, exact value doesn't matter
> >    because function is only used in boolean context,
> > 3) use *p++ idiom for even better code.
> > 
> > All of this results in performance speedup of "git diff"
> > which is way out of statistical error (0.4% vs 0.15% of 3 sigma):
> 
> On what machine, out of interest?

Core i5 760 (x86_64)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-15  1:46 ` Andi Kleen
@ 2012-02-15  8:16   ` Alexey Dobriyan
  2012-02-15 11:19     ` Alexey Dobriyan
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Dobriyan @ 2012-02-15  8:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: viro, linux-fsdevel, npiggin

On Tue, Feb 14, 2012 at 05:46:24PM -0800, Andi Kleen wrote:
> Alexey Dobriyan <adobriyan@gmail.com> writes:
> 
> > 1) consistently use "unsigned int" for dentry name length,
> > 2) reuse subtraction result for return value, exact value doesn't matter
> >    because function is only used in boolean context,
> > 3) use *p++ idiom for even better code.
> 
> I'm sure there are even better ways to do the comparision using longer
> words,  especially as we know the boundaries and alignment.

This requires terminating dentry names with 1-3 NULs.
Al, can we afford such beauty?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-15  8:16   ` Alexey Dobriyan
@ 2012-02-15 11:19     ` Alexey Dobriyan
  2012-02-15 20:20       ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Dobriyan @ 2012-02-15 11:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: viro, linux-fsdevel, npiggin

On Wed, Feb 15, 2012 at 11:16 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On Tue, Feb 14, 2012 at 05:46:24PM -0800, Andi Kleen wrote:
>> Alexey Dobriyan <adobriyan@gmail.com> writes:
>>
>> > 1) consistently use "unsigned int" for dentry name length,
>> > 2) reuse subtraction result for return value, exact value doesn't matter
>> >    because function is only used in boolean context,
>> > 3) use *p++ idiom for even better code.
>>
>> I'm sure there are even better ways to do the comparision using longer
>> words,  especially as we know the boundaries and alignment.
>
> This requires terminating dentry names with 1-3 NULs.
> Al, can we afford such beauty?

Hmm.

At least, in link_path_walk(), there is no room to insert NULs.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-15 11:19     ` Alexey Dobriyan
@ 2012-02-15 20:20       ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2012-02-15 20:20 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Andi Kleen, viro, linux-fsdevel, npiggin

> > This requires terminating dentry names with 1-3 NULs.
> > Al, can we afford such beauty?
> 
> Hmm.
> 
> At least, in link_path_walk(), there is no room to insert NULs.

We don't necessarily need to pad. It's enough to use the word compare
as a fast path.

Of course memcpy should do it already, but it may be inefficient for
small strings.

-Andi
> 

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dcache: faster dentry_cmp()
  2012-02-14 22:45 [PATCH] dcache: faster dentry_cmp() Alexey Dobriyan
                   ` (2 preceding siblings ...)
  2012-02-15  1:46 ` Andi Kleen
@ 2012-04-20 12:06 ` Nick Piggin
  3 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2012-04-20 12:06 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: viro, linux-fsdevel, npiggin

On Wed, Feb 15, 2012 at 01:45:27AM +0300, Alexey Dobriyan wrote:
> 1) consistently use "unsigned int" for dentry name length,
> 2) reuse subtraction result for return value, exact value doesn't matter
>    because function is only used in boolean context,
> 3) use *p++ idiom for even better code.
> 
> All of this results in performance speedup of "git diff"
> which is way out of statistical error (0.4% vs 0.15% of 3 sigma):

Would you have any interest to port this patch on top of
Linus' latest changes there? 0.4% increase is nothing to
sneeze at. There may be some improvements with the
DCACHE_WORD_ACCESS code too.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-20 12:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-14 22:45 [PATCH] dcache: faster dentry_cmp() Alexey Dobriyan
2012-02-14 22:58 ` Al Viro
2012-02-15  8:14   ` Alexey Dobriyan
2012-02-15  1:41 ` Matthew Wilcox
2012-02-15  8:14   ` Alexey Dobriyan
2012-02-15  1:46 ` Andi Kleen
2012-02-15  8:16   ` Alexey Dobriyan
2012-02-15 11:19     ` Alexey Dobriyan
2012-02-15 20:20       ` Andi Kleen
2012-04-20 12:06 ` Nick Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.