* [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match @ 2022-10-25 1:46 Nathan Moinvaziri 2022-10-25 8:00 ` Andy Shevchenko 0 siblings, 1 reply; 11+ messages in thread From: Nathan Moinvaziri @ 2022-10-25 1:46 UTC (permalink / raw) To: linux-kernel, Andy Shevchenko From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001 From: Nathan Moinvaziri <nathan@nathanm.com> Date: Mon, 24 Oct 2022 16:37:59 -0700 Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match. With strings where many characters match exactly each character is needlessly converted to lowercase before comparing. This patch improves the comparison by only converting to lowercase after checking that the characters don't match. The more characters that match exactly the better performance we expect versus the old function. When running tests using Quick Benchmark with two matching 256 character strings these changes result in anywhere between ~6-9x speed improvement. * We use unsigned char instead of int similar to strncasecmp. * We only subtract c1 - c2 when they are not equal. Reviewed-by: Sergey Markelov <sergio_nsk@yahoo.de> Reviewed-by: Steve Tucker <steven.r.tucker@gmail.com> --- lib/string.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/lib/string.c b/lib/string.c index 3371d26a0e39..51ad56db1b5d 100644 --- a/lib/string.c +++ b/lib/string.c @@ -64,13 +64,20 @@ EXPORT_SYMBOL(strncasecmp); #ifndef __HAVE_ARCH_STRCASECMP int strcasecmp(const char *s1, const char *s2) { - int c1, c2; + /* Yes, Virginia, it had better be unsigned */ + unsigned char c1, c2; do { - c1 = tolower(*s1++); - c2 = tolower(*s2++); - } while (c1 == c2 && c1 != 0); - return c1 - c2; + c1 = *s1++; + c2 = *s2++; + if (c1 != c2) { + c1 = tolower(c1); + c2 = tolower(c2); + if (c1 != c2) + return (int)c1 - (int)c2; + } + } while (c1 != 0); + return 0; } EXPORT_SYMBOL(strcasecmp); #endif -- 2.37.2.windows.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 1:46 [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match Nathan Moinvaziri @ 2022-10-25 8:00 ` Andy Shevchenko 2022-10-25 9:03 ` Andy Shevchenko 2022-10-25 19:55 ` Rasmus Villemoes 0 siblings, 2 replies; 11+ messages in thread From: Andy Shevchenko @ 2022-10-25 8:00 UTC (permalink / raw) To: Nathan Moinvaziri; +Cc: linux-kernel, Andy Shevchenko On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001 > From: Nathan Moinvaziri <nathan@nathanm.com> > Date: Mon, 24 Oct 2022 16:37:59 -0700 > Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if > chars match. Why is the above in the commit message? > With strings where many characters match exactly each character is needlessly > converted to lowercase before comparing. This patch improves the comparison > by only converting to lowercase after checking that the characters don't match. > > The more characters that match exactly the better performance we expect versus > the old function. > > When running tests using Quick Benchmark with two matching 256 character > strings these changes result in anywhere between ~6-9x speed improvement. > > * We use unsigned char instead of int similar to strncasecmp. > * We only subtract c1 - c2 when they are not equal. Nobody can take it. Please, read Submitting Patches documentation and fix your contribution accordingly. ... > do { > - c1 = tolower(*s1++); > - c2 = tolower(*s2++); > - } while (c1 == c2 && c1 != 0); > - return c1 - c2; > + c1 = *s1++; > + c2 = *s2++; > + if (c1 != c2) { > + c1 = tolower(c1); > + c2 = tolower(c2); > + if (c1 != c2) > + return (int)c1 - (int)c2; > + } > + } while (c1 != 0); You tell us that this is more preformant, but have not provided the numbers. Can we see those, please? Note, that you basically trash CPU cache lines when characters are not equal, and before doing that you have a branching. I'm unsure that your way is more performant than the original one. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 8:00 ` Andy Shevchenko @ 2022-10-25 9:03 ` Andy Shevchenko 2022-10-25 17:53 ` Nathan Moinvaziri 2022-10-25 19:55 ` Rasmus Villemoes 1 sibling, 1 reply; 11+ messages in thread From: Andy Shevchenko @ 2022-10-25 9:03 UTC (permalink / raw) To: Nathan Moinvaziri; +Cc: linux-kernel On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: ... > > When running tests using Quick Benchmark with two matching 256 character > > strings these changes result in anywhere between ~6-9x speed improvement. > > > > * We use unsigned char instead of int similar to strncasecmp. > > * We only subtract c1 - c2 when they are not equal. ... > You tell us that this is more preformant, but have not provided the > numbers. Can we see those, please? So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > Note, that you basically trash CPU cache lines when characters are not > equal, and before doing that you have a branching. I'm unsure that > your way is more performant than the original one. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 9:03 ` Andy Shevchenko @ 2022-10-25 17:53 ` Nathan Moinvaziri 2022-10-25 19:19 ` Andy Shevchenko 2022-10-25 19:32 ` Christophe JAILLET 0 siblings, 2 replies; 11+ messages in thread From: Nathan Moinvaziri @ 2022-10-25 17:53 UTC (permalink / raw) To: Andy Shevchenko; +Cc: linux-kernel Hi Andy, I appreciate your quick feedback! I have done as you suggested and published my results this time using Google benchmark: https://github.com/nmoinvaz/strcasecmp After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. Thanks again, Nathan -----Original Message----- From: Andy Shevchenko <andy@kernel.org> Sent: Tuesday, October 25, 2022 2:04 AM To: Nathan Moinvaziri <nathan@nathanm.com> Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: ... > > When running tests using Quick Benchmark with two matching 256 > > character strings these changes result in anywhere between ~6-9x speed improvement. > > > > * We use unsigned char instead of int similar to strncasecmp. > > * We only subtract c1 - c2 when they are not equal. ... > You tell us that this is more preformant, but have not provided the > numbers. Can we see those, please? So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > Note, that you basically trash CPU cache lines when characters are not > equal, and before doing that you have a branching. I'm unsure that > your way is more performant than the original one. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 17:53 ` Nathan Moinvaziri @ 2022-10-25 19:19 ` Andy Shevchenko 2022-10-27 3:29 ` Nathan Moinvaziri 2022-10-25 19:32 ` Christophe JAILLET 1 sibling, 1 reply; 11+ messages in thread From: Andy Shevchenko @ 2022-10-25 19:19 UTC (permalink / raw) To: Nathan Moinvaziri; +Cc: Andy Shevchenko, linux-kernel On Tue, Oct 25, 2022 at 8:53 PM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > Hi Andy, > > I appreciate your quick feedback! > > I have done as you suggested and published my results this time using Google benchmark: > https://github.com/nmoinvaz/strcasecmp Thank you for sharing! Looks promising, but may I suggest a few things: 1) have you considered the word-at-a-time use (like strscpy() does)? 2) instead of using tolower() on both sides, have you considered (with the above in mind) to use XOR over words and if they are not 0, check if the result is one of possible combinations of 0x20 and then by excluding the non-letters from the range you may find the difference? So, I think it's a good exercise for the twiddling of bits. > After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. P.S. Avoid top-posting in the Linux kernel mailing lists! > -----Original Message----- > From: Andy Shevchenko <andy@kernel.org> > Sent: Tuesday, October 25, 2022 2:04 AM > To: Nathan Moinvaziri <nathan@nathanm.com> > Cc: linux-kernel@vger.kernel.org > Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match > > On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: > > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > ... > > > > When running tests using Quick Benchmark with two matching 256 > > > character strings these changes result in anywhere between ~6-9x speed improvement. > > > > > > * We use unsigned char instead of int similar to strncasecmp. > > > * We only subtract c1 - c2 when they are not equal. > > ... > > > You tell us that this is more preformant, but have not provided the > > numbers. Can we see those, please? > > So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open > source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > > > Note, that you basically trash CPU cache lines when characters are not > > equal, and before doing that you have a branching. I'm unsure that > > your way is more performant than the original one. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 19:19 ` Andy Shevchenko @ 2022-10-27 3:29 ` Nathan Moinvaziri 2022-10-27 6:31 ` Andy Shevchenko 0 siblings, 1 reply; 11+ messages in thread From: Nathan Moinvaziri @ 2022-10-27 3:29 UTC (permalink / raw) To: Andy Shevchenko; +Cc: Andy Shevchenko, linux-kernel On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > Looks promising, but may I suggest a few things: > 1) have you considered the word-at-a-time use (like strscpy() does)? Only briefly at the beginning of the function to check for an identical comparison and the added check hurt performance for strings that were not identical. On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > 2) instead of using tolower() on both sides, have you considered > (with the above in mind) to use XOR over words and if they are not 0, > check if the result is one of possible combinations of 0x20 and then > by excluding the non-letters from the range you may find the > difference? I'm not sure what you mean about the possible combinations of the space character. I have not investigated this method. ... According to my previous findings the check for c1 != c2 does perform better for strings that are at least 25% or more the same. I was able to get even more performance out of it by changing tolower() to use a different hash table than the one used for the is*() functions. By using a pre-generated hash table for both islower() and isupper() it is possible to remove the branch where ever those functions are used, including in strcasecmp. This method I've seen employed in the Android code base and also in cURL. Using it would add additional 2x256 bytes to the code size for the tables. I've put together a Quick Benchmark that shows the comparison between the different methods: https://quick-bench.com/q/l5DkYQO-CcMxQUu5MjZiqZ8M-Y0 Nathan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-27 3:29 ` Nathan Moinvaziri @ 2022-10-27 6:31 ` Andy Shevchenko 0 siblings, 0 replies; 11+ messages in thread From: Andy Shevchenko @ 2022-10-27 6:31 UTC (permalink / raw) To: Nathan Moinvaziri; +Cc: Andy Shevchenko, linux-kernel On Thu, Oct 27, 2022 at 6:29 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > > Looks promising, but may I suggest a few things: > > 1) have you considered the word-at-a-time use (like strscpy() does)? > > Only briefly at the beginning of the function to check for an identical > comparison and the added check hurt performance for strings that were > not identical. > > On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > > > 2) instead of using tolower() on both sides, have you considered > > (with the above in mind) to use XOR over words and if they are not 0, > > check if the result is one of possible combinations of 0x20 and then > > by excluding the non-letters from the range you may find the > > difference? > > I'm not sure what you mean about the possible combinations of the space > character. I have not investigated this method. 'a' xor 'A' == 0x20 (same for all the letters. That's why we have a specific _tolower() in vsprintf.c. > According to my previous findings the check for c1 != c2 does perform > better for strings that are at least 25% or more the same. I was able to > get even more performance out of it by changing tolower() to use a > different hash table than the one used for the is*() functions. By using > a pre-generated hash table for both islower() and isupper() it is > possible to remove the branch where ever those functions are used, > including in strcasecmp. This method I've seen employed in the Android > code base and also in cURL. Using it would add additional 2x256 bytes to > the code size for the tables. Rasmus raised a good question, where do we actually need the performant strcasecmp()? -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: RE: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 17:53 ` Nathan Moinvaziri 2022-10-25 19:19 ` Andy Shevchenko @ 2022-10-25 19:32 ` Christophe JAILLET 2022-10-25 23:27 ` Nathan Moinvaziri 1 sibling, 1 reply; 11+ messages in thread From: Christophe JAILLET @ 2022-10-25 19:32 UTC (permalink / raw) To: Nathan Moinvaziri, Andy Shevchenko; +Cc: linux-kernel Le 25/10/2022 à 19:53, Nathan Moinvaziri a écrit : > Hi Andy, > > I appreciate your quick feedback! > > I have done as you suggested and published my results this time using Google benchmark: > https://github.com/nmoinvaz/strcasecmp Hi, the algorithm on github is not the same as the one posted here. IIUC, the one on github is wrong. If you compare 2 strings that are the same, they will have the same length, and "if (c1 == c2) continue;" will go one past the end of the strings. And the result will be <0 or 0 or >0 depending the the char *after* the trailing \0. On the other side, the results of the benchmark on github are likely not accurate with the algorithm posted here, because there is one more test in each loop ("while (c1 != 0)") as long as the 2 strings are the same. On github this test is skipped because you will go through the "continue" CJ > > After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. > > Thanks again, > Nathan > > -----Original Message----- > From: Andy Shevchenko <andy@kernel.org> > Sent: Tuesday, October 25, 2022 2:04 AM > To: Nathan Moinvaziri <nathan@nathanm.com> > Cc: linux-kernel@vger.kernel.org > Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match > > On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: >> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > ... > >>> When running tests using Quick Benchmark with two matching 256 >>> character strings these changes result in anywhere between ~6-9x speed improvement. >>> >>> * We use unsigned char instead of int similar to strncasecmp. >>> * We only subtract c1 - c2 when they are not equal. > > ... > >> You tell us that this is more preformant, but have not provided the >> numbers. Can we see those, please? > > So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open > source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > >> Note, that you basically trash CPU cache lines when characters are not >> equal, and before doing that you have a branching. I'm unsure that >> your way is more performant than the original one. > > -- > With Best Regards, > Andy Shevchenko > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 19:32 ` Christophe JAILLET @ 2022-10-25 23:27 ` Nathan Moinvaziri 0 siblings, 0 replies; 11+ messages in thread From: Nathan Moinvaziri @ 2022-10-25 23:27 UTC (permalink / raw) To: Christophe JAILLET; +Cc: linux-kernel, Andy Shevchenko On 10/25/2022 12:32 PM, Christophe JAILLET wrote: > Hi, > the algorithm on github is not the same as the one posted here. > > IIUC, the one on github is wrong. If you compare 2 strings that are > the same, they will have the same length, and "if (c1 == c2) > continue;" will go one past the end of the strings. And the result > will be <0 or 0 or >0 depending the the char *after* the trailing \0. > > On the other side, the results of the benchmark on github are likely > not accurate with the algorithm posted here, because there is one more > test in each loop ("while (c1 != 0)") as long as the 2 strings are the > same. > On github this test is skipped because you will go through the "continue" > > CJ Hi CJ, Thanks for catching that, I had changed it at the last second. I have updated the code and the benchmarks to what I initially proposed in the patch. Results are about +/-1% from previously. Nathan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 8:00 ` Andy Shevchenko 2022-10-25 9:03 ` Andy Shevchenko @ 2022-10-25 19:55 ` Rasmus Villemoes 2022-10-25 22:37 ` Nathan Moinvaziri 1 sibling, 1 reply; 11+ messages in thread From: Rasmus Villemoes @ 2022-10-25 19:55 UTC (permalink / raw) To: Andy Shevchenko, Nathan Moinvaziri; +Cc: linux-kernel, Andy Shevchenko On 25/10/2022 10.00, Andy Shevchenko wrote: > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: >> >> From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001 >> From: Nathan Moinvaziri <nathan@nathanm.com> >> Date: Mon, 24 Oct 2022 16:37:59 -0700 >> Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if >> chars match. > > Why is the above in the commit message? > >> With strings where many characters match exactly each character is needlessly >> converted to lowercase before comparing. This patch improves the comparison >> by only converting to lowercase after checking that the characters don't match. >> >> The more characters that match exactly the better performance we expect versus >> the old function. > You tell us that this is more preformant, but have not provided the > numbers. Can we see those, please? > > Note, that you basically trash CPU cache lines when characters are not > equal, and before doing that you have a branching. I'm unsure that > your way is more performant than the original one. > Are there any code paths in the kernel where strcasecmp performance matters? strcmp, sure, but strcasecmp or strncasecmp? I don't think so. If anything, we should nuke the complication in strncasecmp(), and then make strcasecmp() simply do strncasecmp(a, b, SIZE_MAX). Rasmus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match 2022-10-25 19:55 ` Rasmus Villemoes @ 2022-10-25 22:37 ` Nathan Moinvaziri 0 siblings, 0 replies; 11+ messages in thread From: Nathan Moinvaziri @ 2022-10-25 22:37 UTC (permalink / raw) To: Rasmus Villemoes, Andy Shevchenko; +Cc: linux-kernel, Andy Shevchenko On 10/25/2022 12:55 PM, Rasmus Villemoes wrote: > Are there any code paths in the kernel where strcasecmp performance > matters? strcmp, sure, but strcasecmp or strncasecmp? I don't think so. > If anything, we should nuke the complication in strncasecmp(), and then > make strcasecmp() simply do strncasecmp(a, b, SIZE_MAX). It looks like several of the string functions could be collapsed in that way. Nathan ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-10-27 6:32 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-10-25 1:46 [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match Nathan Moinvaziri 2022-10-25 8:00 ` Andy Shevchenko 2022-10-25 9:03 ` Andy Shevchenko 2022-10-25 17:53 ` Nathan Moinvaziri 2022-10-25 19:19 ` Andy Shevchenko 2022-10-27 3:29 ` Nathan Moinvaziri 2022-10-27 6:31 ` Andy Shevchenko 2022-10-25 19:32 ` Christophe JAILLET 2022-10-25 23:27 ` Nathan Moinvaziri 2022-10-25 19:55 ` Rasmus Villemoes 2022-10-25 22:37 ` Nathan Moinvaziri
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.