[PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
@ 2022-10-25  1:46 Nathan Moinvaziri
  2022-10-25  8:00 ` Andy Shevchenko
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Moinvaziri @ 2022-10-25  1:46 UTC (permalink / raw)
  To: linux-kernel, Andy Shevchenko

From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001
From: Nathan Moinvaziri <nathan@nathanm.com>
Date: Mon, 24 Oct 2022 16:37:59 -0700
Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if
 chars match.

With strings where many characters match exactly each character is needlessly
converted to lowercase before comparing. This patch improves the comparison
by only converting to lowercase after checking that the characters don't match.

The more characters that match exactly the better performance we expect versus
the old function.

When running tests using Quick Benchmark with two matching 256 character
strings these changes result in anywhere between ~6-9x speed improvement.

* We use unsigned char instead of int similar to strncasecmp.
* We only subtract c1 - c2 when they are not equal.

Reviewed-by: Sergey Markelov <sergio_nsk@yahoo.de>
Reviewed-by: Steve Tucker <steven.r.tucker@gmail.com>
---
 lib/string.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 3371d26a0e39..51ad56db1b5d 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -64,13 +64,20 @@ EXPORT_SYMBOL(strncasecmp);
 #ifndef __HAVE_ARCH_STRCASECMP
 int strcasecmp(const char *s1, const char *s2)
 {
-	int c1, c2;
+	/* Yes, Virginia, it had better be unsigned */
+	unsigned char c1, c2;

 	do {
-		c1 = tolower(*s1++);
-		c2 = tolower(*s2++);
-	} while (c1 == c2 && c1 != 0);
-	return c1 - c2;
+		c1 = *s1++;
+		c2 = *s2++;
+		if (c1 != c2) {
+			c1 = tolower(c1);
+			c2 = tolower(c2);
+			if (c1 != c2)
+				return (int)c1 - (int)c2;
+		}
+	} while (c1 != 0);
+	return 0;
 }
 EXPORT_SYMBOL(strcasecmp);
 #endif
-- 
2.37.2.windows.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25  1:46 [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match Nathan Moinvaziri
@ 2022-10-25  8:00 ` Andy Shevchenko
  2022-10-25  9:03   ` Andy Shevchenko
  2022-10-25 19:55   ` Rasmus Villemoes
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Shevchenko @ 2022-10-25  8:00 UTC (permalink / raw)
  To: Nathan Moinvaziri; +Cc: linux-kernel, Andy Shevchenko

On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:
>
> From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001
> From: Nathan Moinvaziri <nathan@nathanm.com>
> Date: Mon, 24 Oct 2022 16:37:59 -0700
> Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if
>  chars match.

Why is the above in the commit message?

> With strings where many characters match exactly each character is needlessly
> converted to lowercase before comparing. This patch improves the comparison
> by only converting to lowercase after checking that the characters don't match.
>
> The more characters that match exactly the better performance we expect versus
> the old function.
>
> When running tests using Quick Benchmark with two matching 256 character
> strings these changes result in anywhere between ~6-9x speed improvement.
>
> * We use unsigned char instead of int similar to strncasecmp.
> * We only subtract c1 - c2 when they are not equal.

Nobody can take it. Please, read Submitting Patches documentation and
fix your contribution accordingly.

...

>         do {
> -               c1 = tolower(*s1++);
> -               c2 = tolower(*s2++);
> -       } while (c1 == c2 && c1 != 0);
> -       return c1 - c2;
> +               c1 = *s1++;
> +               c2 = *s2++;
> +               if (c1 != c2) {
> +                       c1 = tolower(c1);
> +                       c2 = tolower(c2);
> +                       if (c1 != c2)
> +                               return (int)c1 - (int)c2;
> +               }
> +       } while (c1 != 0);

You tell us that this is more preformant, but have not provided the
numbers. Can we see those, please?

Note, that you basically trash CPU cache lines when characters are not
equal, and before doing that you have a branching. I'm unsure that
your way is more performant than the original one.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25  8:00 ` Andy Shevchenko
@ 2022-10-25  9:03   ` Andy Shevchenko
  2022-10-25 17:53     ` Nathan Moinvaziri
  2022-10-25 19:55   ` Rasmus Villemoes
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Shevchenko @ 2022-10-25  9:03 UTC (permalink / raw)
  To: Nathan Moinvaziri; +Cc: linux-kernel

On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote:
> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:

...

> > When running tests using Quick Benchmark with two matching 256 character
> > strings these changes result in anywhere between ~6-9x speed improvement.
> >
> > * We use unsigned char instead of int similar to strncasecmp.
> > * We only subtract c1 - c2 when they are not equal.

...

> You tell us that this is more preformant, but have not provided the
> numbers. Can we see those, please?

So, I have read carefully and see the reference to some QuickBenchmark I have
no idea about. What I meant here is to have numbers provided by an (open
source) tool (maybe even in-kernel test case) that anybody can test on their
machines. You also missed details about how you run, what the data set has been
used, etc.

> Note, that you basically trash CPU cache lines when characters are not
> equal, and before doing that you have a branching. I'm unsure that
> your way is more performant than the original one.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25  9:03   ` Andy Shevchenko
@ 2022-10-25 17:53     ` Nathan Moinvaziri
  2022-10-25 19:19       ` Andy Shevchenko
  2022-10-25 19:32       ` Christophe JAILLET
  0 siblings, 2 replies; 11+ messages in thread
From: Nathan Moinvaziri @ 2022-10-25 17:53 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: linux-kernel

Hi Andy,

I appreciate your quick feedback!

I have done as you suggested and published my results this time using Google benchmark:
https://github.com/nmoinvaz/strcasecmp

After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. 

Thanks again,
Nathan

-----Original Message-----
From: Andy Shevchenko <andy@kernel.org> 
Sent: Tuesday, October 25, 2022 2:04 AM
To: Nathan Moinvaziri <nathan@nathanm.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match

On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote:
> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:

...

> > When running tests using Quick Benchmark with two matching 256 
> > character strings these changes result in anywhere between ~6-9x speed improvement.
> >
> > * We use unsigned char instead of int similar to strncasecmp.
> > * We only subtract c1 - c2 when they are not equal.

...

> You tell us that this is more preformant, but have not provided the 
> numbers. Can we see those, please?

So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open
source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc.

> Note, that you basically trash CPU cache lines when characters are not 
> equal, and before doing that you have a branching. I'm unsure that 
> your way is more performant than the original one.

--
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25 17:53     ` Nathan Moinvaziri
@ 2022-10-25 19:19       ` Andy Shevchenko
  2022-10-27  3:29         ` Nathan Moinvaziri
  2022-10-25 19:32       ` Christophe JAILLET
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Shevchenko @ 2022-10-25 19:19 UTC (permalink / raw)
  To: Nathan Moinvaziri; +Cc: Andy Shevchenko, linux-kernel

On Tue, Oct 25, 2022 at 8:53 PM Nathan Moinvaziri <nathan@nathanm.com> wrote:
>
> Hi Andy,
>
> I appreciate your quick feedback!
>
> I have done as you suggested and published my results this time using Google benchmark:
> https://github.com/nmoinvaz/strcasecmp

Thank you for sharing!

Looks promising, but may I suggest a few things:
1) have you considered the word-at-a-time use (like strscpy() does)?
2) instead of using tolower() on both sides,  have you considered
(with the above in mind) to use XOR over words and if they are not 0,
check if the result is one of possible combinations of 0x20 and then
by excluding the non-letters from the range you may find the
difference?

So, I think it's a good exercise for the twiddling of bits.

> After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand.

P.S. Avoid top-posting in the Linux kernel mailing lists!

> -----Original Message-----
> From: Andy Shevchenko <andy@kernel.org>
> Sent: Tuesday, October 25, 2022 2:04 AM
> To: Nathan Moinvaziri <nathan@nathanm.com>
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
>
> On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:
>
> ...
>
> > > When running tests using Quick Benchmark with two matching 256
> > > character strings these changes result in anywhere between ~6-9x speed improvement.
> > >
> > > * We use unsigned char instead of int similar to strncasecmp.
> > > * We only subtract c1 - c2 when they are not equal.
>
> ...
>
> > You tell us that this is more preformant, but have not provided the
> > numbers. Can we see those, please?
>
> So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open
> source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc.
>
> > Note, that you basically trash CPU cache lines when characters are not
> > equal, and before doing that you have a branching. I'm unsure that
> > your way is more performant than the original one.


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25 19:19       ` Andy Shevchenko
@ 2022-10-27  3:29         ` Nathan Moinvaziri
  2022-10-27  6:31           ` Andy Shevchenko
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Moinvaziri @ 2022-10-27  3:29 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Andy Shevchenko, linux-kernel

On 10/25/2022 12:19 PM, Andy Shevchenko wrote:
> Looks promising, but may I suggest a few things:
> 1) have you considered the word-at-a-time use (like strscpy() does)?

Only briefly at the beginning of the function to check for an identical 
comparison and the added check hurt performance for strings that were 
not identical.

On 10/25/2022 12:19 PM, Andy Shevchenko wrote:

> 2) instead of using tolower() on both sides,  have you considered
> (with the above in mind) to use XOR over words and if they are not 0,
> check if the result is one of possible combinations of 0x20 and then
> by excluding the non-letters from the range you may find the
> difference?

I'm not sure what you mean about the possible combinations of the space 
character. I have not investigated this method.

...

According to my previous findings the check for c1 != c2 does perform 
better for strings that are at least 25% or more the same. I was able to 
get even more performance out of it by changing tolower() to use a 
different hash table than the one used for the is*() functions. By using 
a pre-generated hash table for both islower() and isupper() it is 
possible to remove the branch where ever those functions are used, 
including in strcasecmp. This method I've seen employed in the Android 
code base and also in cURL. Using it would add additional 2x256 bytes to 
the code size for the tables.

I've put together a Quick Benchmark that shows the comparison between 
the different methods:

https://quick-bench.com/q/l5DkYQO-CcMxQUu5MjZiqZ8M-Y0

Nathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-27  3:29         ` Nathan Moinvaziri
@ 2022-10-27  6:31           ` Andy Shevchenko
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Shevchenko @ 2022-10-27  6:31 UTC (permalink / raw)
  To: Nathan Moinvaziri; +Cc: Andy Shevchenko, linux-kernel

On Thu, Oct 27, 2022 at 6:29 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:
> On 10/25/2022 12:19 PM, Andy Shevchenko wrote:
> > Looks promising, but may I suggest a few things:
> > 1) have you considered the word-at-a-time use (like strscpy() does)?
>
> Only briefly at the beginning of the function to check for an identical
> comparison and the added check hurt performance for strings that were
> not identical.
>
> On 10/25/2022 12:19 PM, Andy Shevchenko wrote:
>
> > 2) instead of using tolower() on both sides,  have you considered
> > (with the above in mind) to use XOR over words and if they are not 0,
> > check if the result is one of possible combinations of 0x20 and then
> > by excluding the non-letters from the range you may find the
> > difference?
>
> I'm not sure what you mean about the possible combinations of the space
> character. I have not investigated this method.

'a' xor 'A' == 0x20 (same for all the letters.
That's why we have a specific _tolower() in vsprintf.c.

> According to my previous findings the check for c1 != c2 does perform
> better for strings that are at least 25% or more the same. I was able to
> get even more performance out of it by changing tolower() to use a
> different hash table than the one used for the is*() functions. By using
> a pre-generated hash table for both islower() and isupper() it is
> possible to remove the branch where ever those functions are used,
> including in strcasecmp. This method I've seen employed in the Android
> code base and also in cURL. Using it would add additional 2x256 bytes to
> the code size for the tables.

Rasmus raised a good question, where do we actually need the
performant strcasecmp()?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RE: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25 17:53     ` Nathan Moinvaziri
  2022-10-25 19:19       ` Andy Shevchenko
@ 2022-10-25 19:32       ` Christophe JAILLET
  2022-10-25 23:27         ` Nathan Moinvaziri
  1 sibling, 1 reply; 11+ messages in thread
From: Christophe JAILLET @ 2022-10-25 19:32 UTC (permalink / raw)
  To: Nathan Moinvaziri, Andy Shevchenko; +Cc: linux-kernel

Le 25/10/2022 à 19:53, Nathan Moinvaziri a écrit :
> Hi Andy,
> 
> I appreciate your quick feedback!
> 
> I have done as you suggested and published my results this time using Google benchmark:
> https://github.com/nmoinvaz/strcasecmp

Hi,
the algorithm on github is not the same as the one posted here.

IIUC, the one on github is wrong. If you compare 2 strings that are the 
same, they will have the same length, and "if (c1 == c2) continue;" will 
go one past the end of the strings. And the result will be <0 or 0 or >0 
depending the the char *after* the trailing \0.

On the other side, the results of the benchmark on github are likely not 
accurate with the algorithm posted here, because there is one more test 
in each loop ("while (c1 != 0)") as long as the 2 strings are the same.
On github this test is skipped because you will go through the "continue"

CJ

> 
> After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand.
> 
> Thanks again,
> Nathan
> 
> -----Original Message-----
> From: Andy Shevchenko <andy@kernel.org>
> Sent: Tuesday, October 25, 2022 2:04 AM
> To: Nathan Moinvaziri <nathan@nathanm.com>
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
> 
> On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote:
>> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:
> 
> ...
> 
>>> When running tests using Quick Benchmark with two matching 256
>>> character strings these changes result in anywhere between ~6-9x speed improvement.
>>>
>>> * We use unsigned char instead of int similar to strncasecmp.
>>> * We only subtract c1 - c2 when they are not equal.
> 
> ...
> 
>> You tell us that this is more preformant, but have not provided the
>> numbers. Can we see those, please?
> 
> So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open
> source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc.
> 
>> Note, that you basically trash CPU cache lines when characters are not
>> equal, and before doing that you have a branching. I'm unsure that
>> your way is more performant than the original one.
> 
> --
> With Best Regards,
> Andy Shevchenko
> 
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25 19:32       ` Christophe JAILLET
@ 2022-10-25 23:27         ` Nathan Moinvaziri
  0 siblings, 0 replies; 11+ messages in thread
From: Nathan Moinvaziri @ 2022-10-25 23:27 UTC (permalink / raw)
  To: Christophe JAILLET; +Cc: linux-kernel, Andy Shevchenko

On 10/25/2022 12:32 PM, Christophe JAILLET wrote:
> Hi,
> the algorithm on github is not the same as the one posted here.
>
> IIUC, the one on github is wrong. If you compare 2 strings that are 
> the same, they will have the same length, and "if (c1 == c2) 
> continue;" will go one past the end of the strings. And the result 
> will be <0 or 0 or >0 depending the the char *after* the trailing \0.
>
> On the other side, the results of the benchmark on github are likely 
> not accurate with the algorithm posted here, because there is one more 
> test in each loop ("while (c1 != 0)") as long as the 2 strings are the 
> same.
> On github this test is skipped because you will go through the "continue"
>
> CJ 

Hi CJ,

Thanks for catching that, I had changed it at the last second. I have 
updated the code and the benchmarks to what I initially proposed in the 
patch. Results are about +/-1% from previously.

Nathan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25  8:00 ` Andy Shevchenko
  2022-10-25  9:03   ` Andy Shevchenko
@ 2022-10-25 19:55   ` Rasmus Villemoes
  2022-10-25 22:37     ` Nathan Moinvaziri
  1 sibling, 1 reply; 11+ messages in thread
From: Rasmus Villemoes @ 2022-10-25 19:55 UTC (permalink / raw)
  To: Andy Shevchenko, Nathan Moinvaziri; +Cc: linux-kernel, Andy Shevchenko

On 25/10/2022 10.00, Andy Shevchenko wrote:
> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote:
>>
>> From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001
>> From: Nathan Moinvaziri <nathan@nathanm.com>
>> Date: Mon, 24 Oct 2022 16:37:59 -0700
>> Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if
>>  chars match.
> 
> Why is the above in the commit message?
> 
>> With strings where many characters match exactly each character is needlessly
>> converted to lowercase before comparing. This patch improves the comparison
>> by only converting to lowercase after checking that the characters don't match.
>>
>> The more characters that match exactly the better performance we expect versus
>> the old function.

> You tell us that this is more preformant, but have not provided the
> numbers. Can we see those, please?
> 
> Note, that you basically trash CPU cache lines when characters are not
> equal, and before doing that you have a branching. I'm unsure that
> your way is more performant than the original one.
> 

Are there any code paths in the kernel where strcasecmp performance
matters? strcmp, sure, but strcasecmp or strncasecmp? I don't think so.
If anything, we should nuke the complication in strncasecmp(), and then
make strcasecmp() simply do strncasecmp(a, b, SIZE_MAX).

Rasmus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
  2022-10-25 19:55   ` Rasmus Villemoes
@ 2022-10-25 22:37     ` Nathan Moinvaziri
  0 siblings, 0 replies; 11+ messages in thread
From: Nathan Moinvaziri @ 2022-10-25 22:37 UTC (permalink / raw)
  To: Rasmus Villemoes, Andy Shevchenko; +Cc: linux-kernel, Andy Shevchenko

On 10/25/2022 12:55 PM, Rasmus Villemoes wrote:
> Are there any code paths in the kernel where strcasecmp performance
> matters? strcmp, sure, but strcasecmp or strncasecmp? I don't think so.
> If anything, we should nuke the complication in strncasecmp(), and then
> make strcasecmp() simply do strncasecmp(a, b, SIZE_MAX).

It looks like several of the string functions could be collapsed in that 
way.

Nathan


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-10-27  6:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-25  1:46 [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match Nathan Moinvaziri
2022-10-25  8:00 ` Andy Shevchenko
2022-10-25  9:03   ` Andy Shevchenko
2022-10-25 17:53     ` Nathan Moinvaziri
2022-10-25 19:19       ` Andy Shevchenko
2022-10-27  3:29         ` Nathan Moinvaziri
2022-10-27  6:31           ` Andy Shevchenko
2022-10-25 19:32       ` Christophe JAILLET
2022-10-25 23:27         ` Nathan Moinvaziri
2022-10-25 19:55   ` Rasmus Villemoes
2022-10-25 22:37     ` Nathan Moinvaziri

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.