From: Nick Terrell <terrelln@fb.com>
To: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Nick Terrell <nickrterrell@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>, Kernel Team <Kernel-team@fb.com>,
	Yann Collet <yann.collet.73@gmail.com>,
	Gao Xiang <gaoxiang25@huawei.com>,
	Sven Schmidt <4sschmid@informatik.uni-hamburg.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] lz4: Fix kernel decompression speed
Date: Tue, 4 Aug 2020 02:57:50 +0000	[thread overview]
Message-ID: <0C67BA74-E014-449B-9E22-E0B1DDB930BF@fb.com> (raw)
In-Reply-To: <20200804015654.GA1943218@rani.riverdale.lan>



> On Aug 3, 2020, at 6:56 PM, Arvind Sankar <nivedita@alum.mit.edu> wrote:
> 
> On Mon, Aug 03, 2020 at 10:55:01PM +0000, Nick Terrell wrote:
>> 
>> 
>>> On Aug 3, 2020, at 2:57 PM, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>>> 
>>> On Mon, Aug 03, 2020 at 12:40:22PM -0700, Nick Terrell wrote:
>>>> From: Nick Terrell <terrelln@fb.com>
>>>> 
>>>> This patch replaces all memcpy() calls with LZ4_memcpy() which calls
>>>> __builtin_memcpy() so the compiler can inline it.
>>>> 
>>>> LZ4 relies heavily on memcpy() with a constant size being inlined. In
>>>> x86 and i386 pre-boot environments memcpy() cannot be inlined because
>>>> memcpy() doesn't get defined as __builtin_memcpy().
>>>> 
>>>> An equivalent patch has been applied upstream so that the next import
>>>> won't lose this change [1].
>>>> 
>>>> I've measured the kernel decompression speed using QEMU before and after
>>>> this patch for the x86_64 and i386 architectures. The speed-up is about
>>>> 10x as shown below.
>>>> 
>>>> Code	Arch	Kernel Size	Time	Speed
>>>> v5.8	x86_64	11504832 B	148 ms	 79 MB/s
>>>> patch	x86_64	11503872 B	 13 ms	885 MB/s
>>>> v5.8	i386	 9621216 B	 91 ms	106 MB/s
>>>> patch	i386	 9620224 B	 10 ms	962 MB/s
>>>> 
>>>> I also measured the time to decompress the initramfs on x86_64, i386,
>>>> and arm. All three show the same decompression speed before and after,
>>>> as expected.
>>>> 
>>>> [1] https://github.com/lz4/lz4/pull/890
>>>> 
>>> 
>>> Hi Nick, would you be able to test the below patch's performance to
>>> verify it gives the same speedup? It removes the #undef in misc.c which
>>> causes the decompressors to not use the builtin version. It should be
>>> equivalent to yours except for applying it to all the decompressors.
>>> 
>>> Thanks.
>> 
>> I will measure it. I would expect it to provide the same speed-up. It would be great to fix
>> the problem for x86/i386 in general.
> 
> Thanks. I tried using RDTSC to get some timings under QEMU, and I get
> a similar speedup to yours for LZ4, and around 15-20% or so for ZSTD
> (on 64-bit)

By the way, I was using this script for performance testing [0].

> -- I see that ZSTD_copy8 is already using __builtin_memcpy,
> but there must be more that can be optimized? There are a couple of
> 1- and 2-byte-sized copies in huf_decompress.c.

Oh wow, I totally missed that. I guess I stopped looking once performance
was about what I expected. Nice find!

I suspect it is mostly the memcpy() inside HUF_decodeSymbolX4(), since
that should be the only hot one [1].

Do you want to put up the patch to fix the memcpy() calls in zstd's Huffman decoder, or should I?

I will be submitting a patch upstream to migrate all of zstd's memcpy() calls to
__builtin_memcpy(), since I plan on updating the kernel's zstd to upstream zstd
in the next few months. I was waiting until the compressed-kernel patch set
landed so as not to distract from it.

[0] https://gist.github.com/terrelln/9bd53321a669f62683c608af8944fbc2
[1] https://github.com/torvalds/linux/blob/master/lib/zstd/huf_decompress.c#L598

Best,
Nick



Thread overview: 14+ messages
2020-08-03 19:40 [PATCH] lz4: Fix kernel decompression speed Nick Terrell
2020-08-03 21:57 ` Arvind Sankar
2020-08-03 22:55   ` Nick Terrell
2020-08-04  1:56     ` Arvind Sankar
2020-08-04  2:57       ` Nick Terrell [this message]
2020-08-04 15:19         ` Arvind Sankar
2020-08-04 17:59       ` Nick Terrell
2020-08-04 23:48         ` [PATCH 0/1] x86/boot/compressed: Use builtin mem functions for decompressor Arvind Sankar
2020-08-04 23:48           ` [PATCH 1/1] " Arvind Sankar
2020-08-19 18:14             ` Kees Cook
2020-08-19 18:22               ` Linus Torvalds
2020-08-04  8:32     ` [PATCH] lz4: Fix kernel decompression speed Pavel Machek
2020-08-04 15:16       ` Arvind Sankar
2020-08-04 17:18         ` Nick Terrell
