* [GIT PULL] Update LZO compression
@ 2012-08-13 23:44 Markus F.X.J. Oberhumer
2012-08-14 3:15 ` Andi Kleen
2012-08-14 12:39 ` Johannes Stezenbach
0 siblings, 2 replies; 18+ messages in thread
From: Markus F.X.J. Oberhumer @ 2012-08-13 23:44 UTC (permalink / raw)
To: linux-kernel
Cc: Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger
Hi all,
as suggested on the mailing list I have converted the updated LZO
code into git, so please pull my "lzo-update" branch from
git://github.com/markus-oberhumer/linux.git lzo-update
You can browse the branch at
https://github.com/markus-oberhumer/linux/compare/lzo-update
I'd ask some official kernel maintainer for review and to push this into
linux-next so that it will hopefully land in the 3.7 release.
Share and enjoy,
Markus
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
> Hi all,
>
> I finally have prepared a small package that updates the LZO version
> in the Linux kernel. Please get it from:
>
> http://www.oberhumer.com/opensource/lzo/download/Testing/linux-kernel-lzo-20120716.tar.gz
>
> As stated in the README this version is significantly faster (typically more
> than 2 times faster!) than the current version, has been thoroughly tested on
> x86_64/i386/powerpc platforms and is intended to get included into the
> official Linux 3.6 or 3.7 release.
>
> I encourage all compression users to test and benchmark this new version,
> and I also would ask some official LZO maintainer to convert the updated
> source files into a GIT commit and possibly push it to Linus or linux-next.
>
> Share and enjoy,
> Markus
>
> Signed-off-by: Markus F.X.J. Oberhumer" <markus@oberhumer.com>
--
Markus Oberhumer, <markus@oberhumer.com>, http://www.oberhumer.com/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-13 23:44 [GIT PULL] Update LZO compression Markus F.X.J. Oberhumer
@ 2012-08-14 3:15 ` Andi Kleen
2012-08-14 10:10 ` Markus F.X.J. Oberhumer
2012-08-14 12:39 ` Johannes Stezenbach
1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2012-08-14 3:15 UTC (permalink / raw)
To: Markus F.X.J. Oberhumer
Cc: linux-kernel, Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger
On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
> Hi all,
>
> as suggested on the mailing list I have converted the updated LZO
> code into git, so please pull my "lzo-update" branch from
>
> git://github.com/markus-oberhumer/linux.git lzo-update
>
> You can browse the branch at
>
> https://github.com/markus-oberhumer/linux/compare/lzo-update
Looks ok to me from a quick look.
Since kernel lzo is security relevant, I assume the new version
has been fuzz tested?
I couldn't tell from the github view, but I assume you follow
standard coding style.
-Andi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-14 3:15 ` Andi Kleen
@ 2012-08-14 10:10 ` Markus F.X.J. Oberhumer
0 siblings, 0 replies; 18+ messages in thread
From: Markus F.X.J. Oberhumer @ 2012-08-14 10:10 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger
On 2012-08-14 05:15, Andi Kleen wrote:
> On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
>> Hi all,
>>
>> as suggested on the mailing list I have converted the updated LZO
>> code into git, so please pull my "lzo-update" branch from
>>
>> git://github.com/markus-oberhumer/linux.git lzo-update
>>
>> You can browse the branch at
>>
>> https://github.com/markus-oberhumer/linux/compare/lzo-update
>
> Looks ok to me from a quick look.
>
> Since kernel lzo is security relevant, I assume the new version
> has been fuzz tested?
Of course!
> I couldn't tell from the github view, but I assume you follow
> standard coding style.
I've tried my best to follow the style guidelines.
Cheers,
Markus
>
> -Andi
>
--
Markus Oberhumer, <markus@oberhumer.com>, http://www.oberhumer.com/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-13 23:44 [GIT PULL] Update LZO compression Markus F.X.J. Oberhumer
2012-08-14 3:15 ` Andi Kleen
@ 2012-08-14 12:39 ` Johannes Stezenbach
2012-08-15 12:02 ` Markus F.X.J. Oberhumer
1 sibling, 1 reply; 18+ messages in thread
From: Johannes Stezenbach @ 2012-08-14 12:39 UTC (permalink / raw)
To: Markus F.X.J. Oberhumer
Cc: linux-kernel, Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
> >
> > As stated in the README this version is significantly faster (typically more
> > than 2 times faster!) than the current version, has been thoroughly tested on
> > x86_64/i386/powerpc platforms and is intended to get included into the
> > official Linux 3.6 or 3.7 release.
> >
> > I encourage all compression users to test and benchmark this new version,
> > and I also would ask some official LZO maintainer to convert the updated
> > source files into a GIT commit and possibly push it to Linus or linux-next.
Sorry for not reporting earlier, but I didn't have time to do real
benchmarks, just a quick test on ARM926EJ-S using barebox,
and found in the new version decompression is slower:
http://lists.infradead.org/pipermail/barebox/2012-July/008268.html
BTW, do you have userspace code matching the old and new
lzo versions? It would be easier to benchmark.
Unfortunately I cannot claim high confidence in my benchmark results
due to missing time to do it properly, it would be useful if
someone else could do some benchmarks on ARM before merging this.
Johannes
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-14 12:39 ` Johannes Stezenbach
@ 2012-08-15 12:02 ` Markus F.X.J. Oberhumer
2012-08-15 14:45 ` Johannes Stezenbach
0 siblings, 1 reply; 18+ messages in thread
From: Markus F.X.J. Oberhumer @ 2012-08-15 12:02 UTC (permalink / raw)
To: Johannes Stezenbach
Cc: linux-kernel, Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
Hi Johannes,
On 2012-08-14 14:39, Johannes Stezenbach wrote:
> On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
>> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
>>>
>>> As stated in the README this version is significantly faster (typically more
>>> than 2 times faster!) than the current version, has been thoroughly tested on
>>> x86_64/i386/powerpc platforms and is intended to get included into the
>>> official Linux 3.6 or 3.7 release.
>>>
>>> I encourage all compression users to test and benchmark this new version,
>>> and I also would ask some official LZO maintainer to convert the updated
>>> source files into a GIT commit and possibly push it to Linus or linux-next.
>
> Sorry for not reporting earlier, but I didn't have time to do real
> benchmarks, just a quick test on ARM926EJ-S using barebox,
> and found in the new version decompression is slower:
> http://lists.infradead.org/pipermail/barebox/2012-July/008268.html
I can only guess, but maybe your ARM cpu does not have an efficient
implementation of {get,put}_unaligned().
Could you please try the following patch and test if you can see
any significant speed difference?
Thanks,
Markus
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index ddc8db5..efc5714 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -12,8 +12,15 @@
*/
+#if defined(__arm__)
+#define COPY4(dst, src) \
+ (dst)[0] = (src)[0]; (dst)[1] = (src)[1]; \
+ (dst)[2] = (src)[2]; (dst)[3] = (src)[3]
+#endif
+#ifndef COPY4
#define COPY4(dst, src) \
put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#endif
#if defined(__x86_64__)
#define COPY8(dst, src) \
put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
>
> BTW, do you have userspace code matching the old and new
> lzo versions? It would be easier to benchmark.
>
> Unfortunately I cannot claim high confidence in my benchmark results
> due to missing time to do it properly, it would be useful if
> someone else could do some benchmarks on ARM before merging this.
>
>
> Johannes
--
Markus Oberhumer, <markus@oberhumer.com>, http://www.oberhumer.com/
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-15 12:02 ` Markus F.X.J. Oberhumer
@ 2012-08-15 14:45 ` Johannes Stezenbach
2012-08-16 6:27 ` Markus F.X.J. Oberhumer
0 siblings, 1 reply; 18+ messages in thread
From: Johannes Stezenbach @ 2012-08-15 14:45 UTC (permalink / raw)
To: Markus F.X.J. Oberhumer
Cc: linux-kernel, Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]
Hi Marcus,
On Wed, Aug 15, 2012 at 02:02:43PM +0200, Markus F.X.J. Oberhumer wrote:
> On 2012-08-14 14:39, Johannes Stezenbach wrote:
> > On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
> >> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
> >>>
> >>> As stated in the README this version is significantly faster (typically more
> >>> than 2 times faster!) than the current version, has been thoroughly tested on
> >>> x86_64/i386/powerpc platforms and is intended to get included into the
> >>> official Linux 3.6 or 3.7 release.
> >>>
> >>> I encourage all compression users to test and benchmark this new version,
> >>> and I also would ask some official LZO maintainer to convert the updated
> >>> source files into a GIT commit and possibly push it to Linus or linux-next.
> >
> > Sorry for not reporting earlier, but I didn't have time to do real
> > benchmarks, just a quick test on ARM926EJ-S using barebox,
> > and found in the new version decompression is slower:
> > http://lists.infradead.org/pipermail/barebox/2012-July/008268.html
>
> I can only guess, but maybe your ARM cpu does not have an efficient
> implementation of {get,put}_unaligned().
Yes, ARMv5 cannot do unaligned access. ARMv6+ could, but
I think the Linux kernel normally traps it for debug,
all ARM seem to use generic {get,put}_unaligned() implementation
which use byte access and shift.
> Could you please try the following patch and test if you can see
> any significant speed difference?
It isn't. I made the attached quick hack userspace code
using ARM kernel headers and barebox unlzop code.
(new == your new code, old == linux-3.5 git, test == new + your suggested change)
(sorry I had no time to clean it up)
I compressed a Linux Image with lzop (lzop <arch/arm/boot/Image >lzoimage)
and timed uncompression:
# time ./unlzopold <lzoimage >/dev/null
real 0m 0.29s
user 0m 0.19s
sys 0m 0.10s
# time ./unlzopold <lzoimage >/dev/null
real 0m 0.29s
user 0m 0.20s
sys 0m 0.09s
# time ./unlzopnew <lzoimage >/dev/null
real 0m 0.41s
user 0m 0.30s
sys 0m 0.10s
# time ./unlzopnew <lzoimage >/dev/null
real 0m 0.40s
user 0m 0.30s
sys 0m 0.10s
# time ./unlzopnew <lzoimage >/dev/null
real 0m 0.40s
user 0m 0.29s
sys 0m 0.11s
# time ./unlzoptest <lzoimage >/dev/null
real 0m 0.39s
user 0m 0.28s
sys 0m 0.11s
# time ./unlzoptest <lzoimage >/dev/null
real 0m 0.39s
user 0m 0.27s
sys 0m 0.11s
# time ./unlzoptest <lzoimage >/dev/null
real 0m 0.39s
user 0m 0.27s
sys 0m 0.11s
FWIW I also checked the sha1sum to confirm the Image uncompressed OK.
Johannes
[-- Attachment #2: lzo-bench.tar.gz --]
[-- Type: application/octet-stream, Size: 54224 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-15 14:45 ` Johannes Stezenbach
@ 2012-08-16 6:27 ` Markus F.X.J. Oberhumer
2012-08-16 15:06 ` Johannes Stezenbach
2012-08-16 15:21 ` Jeff Garzik
0 siblings, 2 replies; 18+ messages in thread
From: Markus F.X.J. Oberhumer @ 2012-08-16 6:27 UTC (permalink / raw)
To: Johannes Stezenbach
Cc: linux-kernel, Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
On 2012-08-15 16:45, Johannes Stezenbach wrote:
> On Wed, Aug 15, 2012 at 02:02:43PM +0200, Markus F.X.J. Oberhumer wrote:
>> On 2012-08-14 14:39, Johannes Stezenbach wrote:
>>> On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
>>>> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
>>>>>
>>>>> As stated in the README this version is significantly faster (typically more
>>>>> than 2 times faster!) than the current version, has been thoroughly tested on
>>>>> x86_64/i386/powerpc platforms and is intended to get included into the
>>>>> official Linux 3.6 or 3.7 release.
>>>>>
>>>>> I encourage all compression users to test and benchmark this new version,
>>>>> and I also would ask some official LZO maintainer to convert the updated
>>>>> source files into a GIT commit and possibly push it to Linus or linux-next.
>>>
>>> Sorry for not reporting earlier, but I didn't have time to do real
>>> benchmarks, just a quick test on ARM926EJ-S using barebox,
>>> and found in the new version decompression is slower:
>>> http://lists.infradead.org/pipermail/barebox/2012-July/008268.html
>>
>> I can only guess, but maybe your ARM cpu does not have an efficient
>> implementation of {get,put}_unaligned().
>
> Yes, ARMv5 cannot do unaligned access. ARMv6+ could, but
> I think the Linux kernel normally traps it for debug,
> all ARM seem to use generic {get,put}_unaligned() implementation
> which use byte access and shift.
Hmm - I could imagine that we're wasting a lot of possible speed gain
by not exploiting that feature on ARMv6+.
>> Could you please try the following patch and test if you can see
>> any significant speed difference?
>
> It isn't. I made the attached quick hack userspace code
> using ARM kernel headers and barebox unlzop code.
> (new == your new code, old == linux-3.5 git, test == new + your suggested change)
> (sorry I had no time to clean it up)
My suggested COPY4 replacement probably has a lot of load stalls - maybe some
ARM expert could have a look and suggest a more efficient implementation.
In any case, I still would like to see the new code in linux-next because
of the huge improvements on other modern CPUs.
Cheers,
Markus
>
> I compressed a Linux Image with lzop (lzop <arch/arm/boot/Image >lzoimage)
> and timed uncompression:
>
> # time ./unlzopold <lzoimage >/dev/null
> real 0m 0.29s
> user 0m 0.19s
> sys 0m 0.10s
> # time ./unlzopold <lzoimage >/dev/null
> real 0m 0.29s
> user 0m 0.20s
> sys 0m 0.09s
> # time ./unlzopnew <lzoimage >/dev/null
> real 0m 0.41s
> user 0m 0.30s
> sys 0m 0.10s
> # time ./unlzopnew <lzoimage >/dev/null
> real 0m 0.40s
> user 0m 0.30s
> sys 0m 0.10s
> # time ./unlzopnew <lzoimage >/dev/null
> real 0m 0.40s
> user 0m 0.29s
> sys 0m 0.11s
> # time ./unlzoptest <lzoimage >/dev/null
> real 0m 0.39s
> user 0m 0.28s
> sys 0m 0.11s
> # time ./unlzoptest <lzoimage >/dev/null
> real 0m 0.39s
> user 0m 0.27s
> sys 0m 0.11s
> # time ./unlzoptest <lzoimage >/dev/null
> real 0m 0.39s
> user 0m 0.27s
> sys 0m 0.11s
>
> FWIW I also checked the sha1sum to confirm the Image uncompressed OK.
>
>
> Johannes
--
Markus Oberhumer, <markus@oberhumer.com>, http://www.oberhumer.com/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 6:27 ` Markus F.X.J. Oberhumer
@ 2012-08-16 15:06 ` Johannes Stezenbach
2012-08-16 17:25 ` Roman Mamedov
2012-08-16 15:21 ` Jeff Garzik
1 sibling, 1 reply; 18+ messages in thread
From: Johannes Stezenbach @ 2012-08-16 15:06 UTC (permalink / raw)
To: Markus F.X.J. Oberhumer
Cc: linux-kernel, Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
On Thu, Aug 16, 2012 at 08:27:15AM +0200, Markus F.X.J. Oberhumer wrote:
> On 2012-08-15 16:45, Johannes Stezenbach wrote:
> >
> > I made the attached quick hack userspace code
> > using ARM kernel headers and barebox unlzop code.
> > (new == your new code, old == linux-3.5 git, test == new + your suggested change)
> > (sorry I had no time to clean it up)
>
> My suggested COPY4 replacement probably has a lot of load stalls - maybe some
> ARM expert could have a look and suggest a more efficient implementation.
>
> In any case, I still would like to see the new code in linux-next because
> of the huge improvements on other modern CPUs.
Well, ~2x speedup on x86 is certainly a good achievement, but there
are more ARM based devices than there are PCs, and I guess many
embedded devices use lzo compressed kernels and file systems
while I'm not convinced many PCs rely on lzo in the kernel.
I know everyone's either busy or on vacation, but it would
be so cool if someone could test on a more modern ARM core,
with the userspace test code I posted it should be easy to do.
Johannes
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 6:27 ` Markus F.X.J. Oberhumer
2012-08-16 15:06 ` Johannes Stezenbach
@ 2012-08-16 15:21 ` Jeff Garzik
2012-08-16 16:20 ` Andi Kleen
1 sibling, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2012-08-16 15:21 UTC (permalink / raw)
To: Markus F.X.J. Oberhumer
Cc: Johannes Stezenbach, linux-kernel, Andi Kleen, chris.mason,
linux-btrfs, Nitin Gupta, Richard Purdie,
richard -rw- weinberger, linux-arm-kernel
On 08/16/2012 02:27 AM, Markus F.X.J. Oberhumer wrote:
> On 2012-08-15 16:45, Johannes Stezenbach wrote:
>> On Wed, Aug 15, 2012 at 02:02:43PM +0200, Markus F.X.J. Oberhumer wrote:
>>> On 2012-08-14 14:39, Johannes Stezenbach wrote:
>>>> On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
>>>>> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
>>>>>>
>>>>>> As stated in the README this version is significantly faster (typically more
>>>>>> than 2 times faster!) than the current version, has been thoroughly tested on
>>>>>> x86_64/i386/powerpc platforms and is intended to get included into the
>>>>>> official Linux 3.6 or 3.7 release.
>>>>>>
>>>>>> I encourage all compression users to test and benchmark this new version,
>>>>>> and I also would ask some official LZO maintainer to convert the updated
>>>>>> source files into a GIT commit and possibly push it to Linus or linux-next.
>>>>
>>>> Sorry for not reporting earlier, but I didn't have time to do real
>>>> benchmarks, just a quick test on ARM926EJ-S using barebox,
>>>> and found in the new version decompression is slower:
>>>> http://lists.infradead.org/pipermail/barebox/2012-July/008268.html
>>>
>>> I can only guess, but maybe your ARM cpu does not have an efficient
>>> implementation of {get,put}_unaligned().
>>
>> Yes, ARMv5 cannot do unaligned access. ARMv6+ could, but
>> I think the Linux kernel normally traps it for debug,
>> all ARM seem to use generic {get,put}_unaligned() implementation
>> which use byte access and shift.
>
> Hmm - I could imagine that we're wasting a lot of possible speed gain
> by not exploiting that feature on ARMv6+.
Or you could just realize that unaligned accesses are slow in the best
case, and are simply not supported on some processors.
If you think a little bit, I bet you could come up with a solution that
operates at cacheline-aligned granularity, something that would be _even
faster_ than simply fixing the code to do aligned accesses.
Jeff
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 15:21 ` Jeff Garzik
@ 2012-08-16 16:20 ` Andi Kleen
2012-08-16 16:48 ` Jeff Garzik
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2012-08-16 16:20 UTC (permalink / raw)
To: Jeff Garzik
Cc: Markus F.X.J. Oberhumer, Johannes Stezenbach, linux-kernel,
Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
> If you think a little bit, I bet you could come up with a solution that
> operates at cacheline-aligned granularity, something that would be _even
> faster_ than simply fixing the code to do aligned accesses.
Cache aligned compression is unlikely to compress anything at all.
Compression algorithms are usually by definition unaligned.
-Andi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 16:20 ` Andi Kleen
@ 2012-08-16 16:48 ` Jeff Garzik
2012-08-16 17:22 ` Johannes Stezenbach
0 siblings, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2012-08-16 16:48 UTC (permalink / raw)
To: Andi Kleen
Cc: Markus F.X.J. Oberhumer, Johannes Stezenbach, linux-kernel,
chris.mason, linux-btrfs, Nitin Gupta, Richard Purdie,
richard -rw- weinberger, linux-arm-kernel
On 08/16/2012 12:20 PM, Andi Kleen wrote:
>> If you think a little bit, I bet you could come up with a solution that
>> operates at cacheline-aligned granularity, something that would be _even
>> faster_ than simply fixing the code to do aligned accesses.
>
> Cache aligned compression is unlikely to compress anything at all.
> Compression algorithms are usually by definition unaligned.
Sure it's a bitstream, but that does not imply the impossibility of
reading data in in an word-aligned manner.
Maybe cache-aligned is ambitious, because of resultant code bloat, but
machine-int-aligned is doable and reasonable.
Jeff
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 16:48 ` Jeff Garzik
@ 2012-08-16 17:22 ` Johannes Stezenbach
0 siblings, 0 replies; 18+ messages in thread
From: Johannes Stezenbach @ 2012-08-16 17:22 UTC (permalink / raw)
To: Jeff Garzik
Cc: Andi Kleen, Markus F.X.J. Oberhumer, linux-kernel, chris.mason,
linux-btrfs, Nitin Gupta, Richard Purdie,
richard -rw- weinberger, linux-arm-kernel
On Thu, Aug 16, 2012 at 12:48:49PM -0400, Jeff Garzik wrote:
> On 08/16/2012 12:20 PM, Andi Kleen wrote:
> >>If you think a little bit, I bet you could come up with a solution that
> >>operates at cacheline-aligned granularity, something that would be _even
> >>faster_ than simply fixing the code to do aligned accesses.
> >
> >Cache aligned compression is unlikely to compress anything at all.
> >Compression algorithms are usually by definition unaligned.
>
> Sure it's a bitstream, but that does not imply the impossibility of
> reading data in in an word-aligned manner.
>
> Maybe cache-aligned is ambitious, because of resultant code bloat,
> but machine-int-aligned is doable and reasonable.
Well, I for one would be content if the old and new lzo versions
could be merged based on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
(Assuming that the slowdown on ARM is due to unaligned access,
since the old version also uses get/put_unaligned, is the new
version actually using more unaligned accesses?)
Johannes
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 15:06 ` Johannes Stezenbach
@ 2012-08-16 17:25 ` Roman Mamedov
2012-08-16 17:52 ` Andi Kleen
0 siblings, 1 reply; 18+ messages in thread
From: Roman Mamedov @ 2012-08-16 17:25 UTC (permalink / raw)
To: Johannes Stezenbach
Cc: Markus F.X.J. Oberhumer, linux-kernel, Andi Kleen, chris.mason,
linux-btrfs, Nitin Gupta, Richard Purdie,
richard -rw- weinberger, linux-arm-kernel
[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]
On Thu, 16 Aug 2012 17:06:47 +0200
Johannes Stezenbach <js@sig21.net> wrote:
> Well, ~2x speedup on x86 is certainly a good achievement, but there
> are more ARM based devices than there are PCs, and I guess many
> embedded devices use lzo compressed kernels and file systems
> while I'm not convinced many PCs rely on lzo in the kernel.
Keep in mind that a major user of LZO is the BTRFS filesystem, and I believe it
is used much more often on larger machines than on ARM, in fact it had problems
operating on ARM at all, until quite recently.
> I know everyone's either busy or on vacation, but it would
> be so cool if someone could test on a more modern ARM core,
> with the userspace test code I posted it should be easy to do.
I have locked the Allwinner A10 CPU in my Mele A2000 to 60 MHz using cpufreq-set,
and ran your test. rnd.lzo is a 9 MB file from /dev/urandom compressed with lzo.
There doesn't seem to be a significant difference between all three variants.
# time for i in {1..20}; do old/unlzop < rnd.lzo >/dev/null ; done
real 0m11.353s
user 0m3.060s
sys 0m8.170s
# time for i in {1..20}; do new/unlzop < rnd.lzo >/dev/null ; done
real 0m11.416s
user 0m3.030s
sys 0m8.200s
# time for i in {1..20}; do test/unlzop < rnd.lzo >/dev/null ; done
real 0m11.310s
user 0m3.100s
sys 0m8.150s
--
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 17:25 ` Roman Mamedov
@ 2012-08-16 17:52 ` Andi Kleen
2012-08-16 18:18 ` Geert Uytterhoeven
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2012-08-16 17:52 UTC (permalink / raw)
To: Roman Mamedov
Cc: Johannes Stezenbach, Markus F.X.J. Oberhumer, linux-kernel,
Andi Kleen, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
> I have locked the Allwinner A10 CPU in my Mele A2000 to 60 MHz using cpufreq-set,
> and ran your test. rnd.lzo is a 9 MB file from /dev/urandom compressed with lzo.
> There doesn't seem to be a significant difference between all three variants.
I found that in compression benchmarks it depends a lot on the data
compressed.
urandom (which should be essentially incompressible) will be handled
by different code paths in the compressor than other more compressible data.
It becomes a complicated memcpy then.
But then there are IO benchmarks which also only do zeroes, which
also gives an unrealistic picture.
Usually it's best to use some corpus with different data types, from very
compressible to less so; and look at the aggregate.
For my snappy work I usually had at least large executables (medium) and some
pdfs (already compressed; low) and then uncompressed source code tars (high
compression)
-Andi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 17:52 ` Andi Kleen
@ 2012-08-16 18:18 ` Geert Uytterhoeven
[not found] ` <CAPkEcwgv6f5B+Mrw7kzrHcZ5JZ+Yr0o=bLdoFqCB-e4CLiVx5A@mail.gmail.com>
0 siblings, 1 reply; 18+ messages in thread
From: Geert Uytterhoeven @ 2012-08-16 18:18 UTC (permalink / raw)
To: Andi Kleen
Cc: Roman Mamedov, Johannes Stezenbach, Markus F.X.J. Oberhumer,
linux-kernel, chris.mason, linux-btrfs, Nitin Gupta,
Richard Purdie, richard -rw- weinberger, linux-arm-kernel
On Thu, Aug 16, 2012 at 7:52 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> I have locked the Allwinner A10 CPU in my Mele A2000 to 60 MHz using cpufreq-set,
>> and ran your test. rnd.lzo is a 9 MB file from /dev/urandom compressed with lzo.
>> There doesn't seem to be a significant difference between all three variants.
>
> I found that in compression benchmarks it depends a lot on the data
> compressed.
>
> urandom (which should be essentially incompressible) will be handled
> by different code paths in the compressor than other more compressible data.
> It becomes a complicated memcpy then.
In addition, locking the CPU to 60 MHz may improve the memory access penalty,
as a cache miss may cost much less cycles.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
[not found] ` <CAPkEcwgv6f5B+Mrw7kzrHcZ5JZ+Yr0o=bLdoFqCB-e4CLiVx5A@mail.gmail.com>
@ 2012-08-16 22:17 ` Andi Kleen
2012-08-17 1:23 ` Mitch Harder
2012-09-07 21:31 ` Andi Kleen
1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2012-08-16 22:17 UTC (permalink / raw)
To: james northrup
Cc: Geert Uytterhoeven, Andi Kleen, Roman Mamedov,
Johannes Stezenbach, Markus F.X.J. Oberhumer, linux-kernel,
chris.mason, linux-btrfs, Nitin Gupta, Richard Purdie,
richard -rw- weinberger, linux-arm-kernel
On Thu, Aug 16, 2012 at 11:55:06AM -0700, james northrup wrote:
> looks like ARM results are inconclusive from a lot of folks without
> bandwidth to do a write-up, what about just plain STAGING status for ARM so
> the android tweakers can beat on it for a while?
Staging only really works for new drivers, not for updating existing
library functions like this.
I suppose you could keep both and have the architecture select with a
CONFIG.
-Andi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
2012-08-16 22:17 ` Andi Kleen
@ 2012-08-17 1:23 ` Mitch Harder
0 siblings, 0 replies; 18+ messages in thread
From: Mitch Harder @ 2012-08-17 1:23 UTC (permalink / raw)
To: Andi Kleen
Cc: james northrup, Geert Uytterhoeven, Roman Mamedov,
Johannes Stezenbach, Markus F.X.J. Oberhumer, linux-kernel,
chris.mason, linux-btrfs, Nitin Gupta, Richard Purdie,
richard -rw- weinberger, linux-arm-kernel
On Thu, Aug 16, 2012 at 5:17 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Thu, Aug 16, 2012 at 11:55:06AM -0700, james northrup wrote:
>> looks like ARM results are inconclusive from a lot of folks without
>> bandwidth to do a write-up, what about just plain STAGING status for ARM so
>> the android tweakers can beat on it for a while?
>
> Staging only really works for new drivers, not for updating existing
> library functions like this.
>
> I suppose you could keep both and have the architecture select with a
> CONFIG.
>
I've been doing some rough benchmarking with the updated LZO in btrfs.
My tests primarily consist of timing some typical copying, git
manipulating, and running rsync using a set of kernel git sources.
Git sources are typically about 50% pack files which won't compress
very well, with the remainder being mostly highly compressible source
files.
Of course, any underlying speed improvement attributable only to LZO
is not shown by test like this. But I thought it would be interesting
to see the impact in some typical real-world btrfs operations.
I was seeing between 3-9% improvement in speed with the new LZO.
Copying several directories of git sources showed the most
improvement, ~9%. Typical git operations, such as a git checkout or
git status where only showing 3-5% improvement, which is close to the
noise level of my tests. Running multiple rsync processes showed a 5%
improvement.
With only 10 trials (5 with each LZO), I can't say I would
statistically hang my hat on these numbers.
Given all the other stuff that is going on in my rough benchmarks, a
3-9% improvement from a single change is probably pretty good.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL] Update LZO compression
[not found] ` <CAPkEcwgv6f5B+Mrw7kzrHcZ5JZ+Yr0o=bLdoFqCB-e4CLiVx5A@mail.gmail.com>
2012-08-16 22:17 ` Andi Kleen
@ 2012-09-07 21:31 ` Andi Kleen
1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2012-09-07 21:31 UTC (permalink / raw)
To: james northrup
Cc: Geert Uytterhoeven, linux-arm-kernel, Johannes Stezenbach,
richard -rw- weinberger, Markus F.X.J. Oberhumer, linux-kernel,
Richard Purdie, Roman Mamedov, chris.mason, Nitin Gupta,
linux-btrfs
james northrup <northrup.james@gmail.com> writes:
> looks like ARM results are inconclusive from a lot of folks without bandwidth
> to do a write-up, what about just plain STAGING status for ARM so the android
> tweakers can beat on it for a while?
Again staging doesn't really work for a core library, it's more
for separate drivers.
FWIW we did some tests and the new library seems to be generally faster
on x86.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2012-09-07 21:31 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-13 23:44 [GIT PULL] Update LZO compression Markus F.X.J. Oberhumer
2012-08-14 3:15 ` Andi Kleen
2012-08-14 10:10 ` Markus F.X.J. Oberhumer
2012-08-14 12:39 ` Johannes Stezenbach
2012-08-15 12:02 ` Markus F.X.J. Oberhumer
2012-08-15 14:45 ` Johannes Stezenbach
2012-08-16 6:27 ` Markus F.X.J. Oberhumer
2012-08-16 15:06 ` Johannes Stezenbach
2012-08-16 17:25 ` Roman Mamedov
2012-08-16 17:52 ` Andi Kleen
2012-08-16 18:18 ` Geert Uytterhoeven
[not found] ` <CAPkEcwgv6f5B+Mrw7kzrHcZ5JZ+Yr0o=bLdoFqCB-e4CLiVx5A@mail.gmail.com>
2012-08-16 22:17 ` Andi Kleen
2012-08-17 1:23 ` Mitch Harder
2012-09-07 21:31 ` Andi Kleen
2012-08-16 15:21 ` Jeff Garzik
2012-08-16 16:20 ` Andi Kleen
2012-08-16 16:48 ` Jeff Garzik
2012-08-16 17:22 ` Johannes Stezenbach
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).