* [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
@ 2015-02-18 18:48 Alexander Kuleshov
2015-02-18 19:25 ` OGAWA Hirofumi
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Kuleshov @ 2015-02-18 18:48 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: linux-kernel, Alexander Kuleshov
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
---
fs/fat/fat.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 64e295e..1a5080b 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -207,12 +207,12 @@ static inline void fat_save_attrs(struct inode *inode, u8 attrs)
static inline unsigned char fat_checksum(const __u8 *name)
{
+ u8 i;
unsigned char s = name[0];
- s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
- s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
- s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
- s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
- s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
+
+ for (i = 1; i < 11; i++)
+ s = (s << 7) + (s >> 1) + name[i];
+
return s;
}
--
2.3.0.80.g18d0fec
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
2015-02-18 18:48 [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating Alexander Kuleshov
@ 2015-02-18 19:25 ` OGAWA Hirofumi
2015-02-18 19:46 ` Alexander Kuleshov
0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2015-02-18 19:25 UTC (permalink / raw)
To: Alexander Kuleshov; +Cc: linux-kernel
Alexander Kuleshov <kuleshovmail@gmail.com> writes:
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> + u8 i;
> unsigned char s = name[0];
> - s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
> - s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
> - s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
> - s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
> - s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
> +
> + for (i = 1; i < 11; i++)
> + s = (s << 7) + (s >> 1) + name[i];
> +
> return s;
> }
When I wrote this, IIRC, there was measurable performance
difference. All major gcc versions are enough smart now to optimize this?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
2015-02-18 19:25 ` OGAWA Hirofumi
@ 2015-02-18 19:46 ` Alexander Kuleshov
2015-02-18 20:32 ` OGAWA Hirofumi
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Kuleshov @ 2015-02-18 19:46 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: linux-kernel
Hello,
I tested two simple program with this (gcc-4.9.1)
Results are following:
time ./test_with_loop
real 0m0.001s
user 0m0.000s
sys 0m0.001s
And
time ./test_direct_calculation:
real 0m0.002s
user 0m0.000s
sys 0m0.001s
As you can see result almsot the same. Also i compared assembly output
from two these programras and there are two notes:
1. Assembly output of the program with loop is half as much than
program with directly rotation, but ofcourse we can't take it for rule
here. Current version prepares stack and than just does:
addq $1, %rax
movzbl (%rax), %eax
addl %edx, %eax
movb %al, -1(%rbp)
movzbl -1(%rbp), %eax
rorb %al
movl %eax, %edx
movq -24(%rbp), %rax
11 times, but version with loop does the same but with cmp/jump.
Thank you.
2015-02-19 1:25 GMT+06:00 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Alexander Kuleshov <kuleshovmail@gmail.com> writes:
>
>> static inline unsigned char fat_checksum(const __u8 *name)
>> {
>> + u8 i;
>> unsigned char s = name[0];
>> - s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
>> - s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
>> - s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
>> - s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
>> - s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
>> +
>> + for (i = 1; i < 11; i++)
>> + s = (s << 7) + (s >> 1) + name[i];
>> +
>> return s;
>> }
>
> When I wrote this, IIRC, there was measurable performance
> difference. All major gcc versions are enough smart now to optimize this?
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
2015-02-18 19:46 ` Alexander Kuleshov
@ 2015-02-18 20:32 ` OGAWA Hirofumi
2015-02-23 20:40 ` Heinrich Schuchardt
0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2015-02-18 20:32 UTC (permalink / raw)
To: Alexander Kuleshov; +Cc: linux-kernel
Alexander Kuleshov <kuleshovmail@gmail.com> writes:
> time ./test_with_loop
>
> real 0m0.001s
> user 0m0.000s
> sys 0m0.001s
>
> And
>
> time ./test_direct_calculation:
>
> real 0m0.002s
> user 0m0.000s
> sys 0m0.001s
Hm,
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
typedef unsigned char __u8;
typedef unsigned char u8;
#if 1
static inline unsigned char fat_checksum(const __u8 *name)
{
unsigned char s = name[0];
s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
return s;
}
#else
static inline unsigned char fat_checksum(const __u8 *name)
{
unsigned char s = name[0];
u8 i;
for (i = 1; i < 11; i++)
s = (s << 7) + (s >> 1) + name[i];
return s;
}
#endif
static __attribute__ ((noinline)) int test(unsigned char *name)
{
long i;
for (i = 0; i < 100000000L; i++)
name[i % 11] = fat_checksum(name);
return name[0];
}
int main(int argc, char *argv[])
{
printf("%u\n", test((unsigned char *)argv[1]));
return 0;
}
$ gcc --version
gcc (Debian 4.9.1-19) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -O2 -o c.inline c.c
# change #if 1 => #if 0
$ gcc -O2 -o c.loop c.c
$ time ./c.inline aaaaaaaaaaa
14
real 0m0.550s
user 0m0.548s
sys 0m0.000s
$ time ./c.loop aaaaaaaaaaa
14
real 0m0.901s
user 0m0.896s
sys 0m0.004s
This is my environment only? (gcc (Debian 4.9.1-19) 4.9.1)
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
2015-02-18 20:32 ` OGAWA Hirofumi
@ 2015-02-23 20:40 ` Heinrich Schuchardt
2015-02-24 2:42 ` OGAWA Hirofumi
0 siblings, 1 reply; 7+ messages in thread
From: Heinrich Schuchardt @ 2015-02-23 20:40 UTC (permalink / raw)
To: OGAWA Hirofumi, Alexander Kuleshov; +Cc: linux-kernel
On 18.02.2015 21:32, OGAWA Hirofumi wrote:
> Alexander Kuleshov <kuleshovmail@gmail.com> writes:
>
>> time ./test_with_loop
>>
>> real 0m0.001s
>> user 0m0.000s
>> sys 0m0.001s
>>
>> And
>>
>> time ./test_direct_calculation:
>>
>> real 0m0.002s
>> user 0m0.000s
>> sys 0m0.001s
>
> Hm,
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> typedef unsigned char __u8;
> typedef unsigned char u8;
>
> #if 1
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> unsigned char s = name[0];
> s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
> s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
> s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
> s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
> s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
> return s;
> }
> #else
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> unsigned char s = name[0];
> u8 i;
> for (i = 1; i < 11; i++)
> s = (s << 7) + (s >> 1) + name[i];
> return s;
> }
> #endif
>
> static __attribute__ ((noinline)) int test(unsigned char *name)
You have to put
__attribute__((optimize("unroll-loops")))
here to unroll the loop inside the function:
static __attribute__ ((noinline))
__attribute__((optimize("unroll-loops")))
int test(unsigned char *name)
> {
> long i;
> for (i = 0; i < 100000000L; i++)
> name[i % 11] = fat_checksum(name);
> return name[0];
> }
>
> int main(int argc, char *argv[])
> {
> printf("%u\n", test((unsigned char *)argv[1]));
> return 0;
> }
>
>
> $ gcc --version
> gcc (Debian 4.9.1-19) 4.9.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> $ gcc -O2 -o c.inline c.c
> # change #if 1 => #if 0
> $ gcc -O2 -o c.loop c.c
>
> $ time ./c.inline aaaaaaaaaaa
> 14
>
> real 0m0.550s
> user 0m0.548s
> sys 0m0.000s
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real 0m0.901s
> user 0m0.896s
> sys 0m0.004s
>
> This is my environment only? (gcc (Debian 4.9.1-19) 4.9.1)
>
$ time ./c.inline aaaaaaaaaaa
14
real 0m0.743s
user 0m0.740s
sys 0m0.000s
Without __attribute__((optimize("unroll-loops"))) :
$ time ./c.loop aaaaaaaaaaa
14
real 0m1.482s
user 0m1.472s
sys 0m0.004s
With __attribute__((optimize("unroll-loops"))) :
$ time ./c.loop aaaaaaaaaaa
14
real 0m0.742s
user 0m0.740s
sys 0m0.000s
Best regards
Heinrich Schuchardt
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
2015-02-23 20:40 ` Heinrich Schuchardt
@ 2015-02-24 2:42 ` OGAWA Hirofumi
2015-02-24 19:22 ` Heinrich Schuchardt
0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2015-02-24 2:42 UTC (permalink / raw)
To: Heinrich Schuchardt; +Cc: Alexander Kuleshov, linux-kernel
Heinrich Schuchardt <xypron.glpk@gmx.de> writes:
> You have to put
> __attribute__((optimize("unroll-loops")))
> here to unroll the loop inside the function:
>
>
> static __attribute__ ((noinline))
> __attribute__((optimize("unroll-loops")))
> int test(unsigned char *name)
>
> $ time ./c.inline aaaaaaaaaaa
> 14
>
> real 0m0.743s
> user 0m0.740s
> sys 0m0.000s
>
> Without __attribute__((optimize("unroll-loops"))) :
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real 0m1.482s
> user 0m1.472s
> sys 0m0.004s
>
> With __attribute__((optimize("unroll-loops"))) :
>
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real 0m0.742s
> user 0m0.740s
> sys 0m0.000s
This attribute has to be added to the caller, not fat_checksum()? I.e.,
we has to add it to all callers of fat_checksum()?
Well, this is interesting gcc optimize option though, maybe not worth to
introduce this to kernel only for fatfs.
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating
2015-02-24 2:42 ` OGAWA Hirofumi
@ 2015-02-24 19:22 ` Heinrich Schuchardt
0 siblings, 0 replies; 7+ messages in thread
From: Heinrich Schuchardt @ 2015-02-24 19:22 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: Alexander Kuleshov, linux-kernel
On 24.02.2015 03:42, OGAWA Hirofumi wrote:
> Heinrich Schuchardt <xypron.glpk@gmx.de> writes:
>
>> You have to put
>> __attribute__((optimize("unroll-loops")))
>> here to unroll the loop inside the function:
>>
>>
>> static __attribute__ ((noinline))
>> __attribute__((optimize("unroll-loops")))
>> int test(unsigned char *name)
>>
>> $ time ./c.inline aaaaaaaaaaa
>> 14
>>
>> real 0m0.743s
>> user 0m0.740s
>> sys 0m0.000s
>>
>> Without __attribute__((optimize("unroll-loops"))) :
>> $ time ./c.loop aaaaaaaaaaa
>> 14
>>
>> real 0m1.482s
>> user 0m1.472s
>> sys 0m0.004s
>>
>> With __attribute__((optimize("unroll-loops"))) :
>>
>> $ time ./c.loop aaaaaaaaaaa
>> 14
>>
>> real 0m0.742s
>> user 0m0.740s
>> sys 0m0.000s
>
> This attribute has to be added to the caller, not fat_checksum()? I.e.,
> we has to add it to all callers of fat_checksum()?
The attribute is applied to the declaration of the function in which you
need loop unrolling.
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Attribute-Syntax.html
>
> Well, this is interesting gcc optimize option though, maybe not worth to
> introduce this to kernel only for fatfs.
There is an effort to compile the Linux kernel with Clang. We should
avoid new GCC specific items.
Best regards
Heinrich
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-02-24 19:22 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-18 18:48 [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating Alexander Kuleshov
2015-02-18 19:25 ` OGAWA Hirofumi
2015-02-18 19:46 ` Alexander Kuleshov
2015-02-18 20:32 ` OGAWA Hirofumi
2015-02-23 20:40 ` Heinrich Schuchardt
2015-02-24 2:42 ` OGAWA Hirofumi
2015-02-24 19:22 ` Heinrich Schuchardt
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.