From: Arvind Sankar <nivedita@alum.mit.edu>
To: Nick Terrell <nickrterrell@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	Kernel Team <Kernel-team@fb.com>, Nick Terrell <terrelln@fb.com>,
	Yann Collet <yann.collet.73@gmail.com>,
	Gao Xiang <gaoxiang25@huawei.com>,
	Sven Schmidt <4sschmid@informatik.uni-hamburg.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] lz4: Fix kernel decompression speed
Date: Mon, 3 Aug 2020 17:57:47 -0400
Message-ID: <20200803215747.GA1644409@rani.riverdale.lan>
In-Reply-To: <20200803194022.2966806-1-nickrterrell@gmail.com>

On Mon, Aug 03, 2020 at 12:40:22PM -0700, Nick Terrell wrote:
> From: Nick Terrell <terrelln@fb.com>
> 
> This patch replaces all memcpy() calls with LZ4_memcpy() which calls
> __builtin_memcpy() so the compiler can inline it.
> 
> LZ4 relies heavily on memcpy() with a constant size being inlined. In
> x86 and i386 pre-boot environments memcpy() cannot be inlined because
> memcpy() doesn't get defined as __builtin_memcpy().
> 
> An equivalent patch has been applied upstream so that the next import
> won't lose this change [1].
> 
> I've measured the kernel decompression speed using QEMU before and after
> this patch for the x86_64 and i386 architectures. The speed-up is about
> 10x as shown below.
> 
> Code	Arch	Kernel Size	Time	Speed
> v5.8	x86_64	11504832 B	148 ms	 79 MB/s
> patch	x86_64	11503872 B	 13 ms	885 MB/s
> v5.8	i386	 9621216 B	 91 ms	106 MB/s
> patch	i386	 9620224 B	 10 ms	962 MB/s
> 
> I also measured the time to decompress the initramfs on x86_64, i386,
> and arm. All three show the same decompression speed before and after,
> as expected.
> 
> [1] https://github.com/lz4/lz4/pull/890
> 

Hi Nick, would you be able to test the performance of the patch below and
verify that it gives the same speedup? It removes the #undef's in misc.c,
which are what prevent the decompressors from using the builtin versions.
It should be equivalent to yours, except that it applies to all the
decompressors rather than just LZ4.
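
To illustrate the mechanism, here is a minimal standalone sketch. It is not
the kernel code: boot_memcpy, load32_builtin and load32_outofline are
hypothetical names, and __builtin_memcpy assumes GCC or Clang. While the
function-like macro is in effect, a constant-size copy goes to the builtin
and the compiler is free to turn it into a plain load. After the #undef, the
same source line becomes a call to the out-of-line function.

```c
#include <stddef.h>

/* Hypothetical stand-in for the out-of-line memcpy() in boot/string.c. */
void *boot_memcpy(void *dst, const void *src, size_t len);

/* Same pattern as boot/string.h: route calls to the compiler builtin. */
#define boot_memcpy(d, s, l) __builtin_memcpy(d, s, l)

/* Expands to __builtin_memcpy() with a constant size, so the compiler
 * may emit a single 4-byte load instead of a function call. */
unsigned int load32_builtin(const void *p)
{
	unsigned int v;

	boot_memcpy(&v, p, sizeof(v));
	return v;
}

/* This mirrors what misc.c did before the patch. */
#undef boot_memcpy

/* The macro is gone, so this refers to the out-of-line function. */
unsigned int load32_outofline(const void *p)
{
	unsigned int v;

	boot_memcpy(&v, p, sizeof(v));
	return v;
}

/* Generic byte-at-a-time copy: correct, but not tuned for small sizes. */
void *boot_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}
```

Both helpers return the same value; only the generated code differs. In
this sketch everything lives in one translation unit, so an optimizing
compiler may still inline the out-of-line copy. In the preboot build the
function is in a separate object file, so that cannot happen without LTO.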

Thanks.

From 10f8d939fc367e3127e2d72ba099678debcae422 Mon Sep 17 00:00:00 2001
From: Arvind Sankar <nivedita@alum.mit.edu>
Date: Mon, 3 Aug 2020 17:07:37 -0400
Subject: [PATCH] x86/boot/compressed: Use builtin mem functions for decompressor

Since commits

  c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions")
  fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c")

the decompressor stub has been using the compiler's builtin memcpy,
memset and memcmp functions, _except_ where it would likely have the
largest impact, in the decompression code itself.

Remove the #undef's of memcpy and memset in misc.c so that the
decompressor code also uses the compiler builtins.

The rationale given in the comment doesn't really apply: just because
some functions use the out-of-line version is no reason to not use the
builtin version in the rest.

Replace the comment with an explanation of why memzero and memmove are
being #define'd.

Drop the suggestion to #undef in boot/string.h as well: the out-of-line
versions are not really optimized versions, they're generic code that's
good enough for the preboot environment. The compiler will likely
generate better code for constant-size memcpy/memset/memcmp if it is
allowed to.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 arch/x86/boot/compressed/misc.c | 7 ++-----
 arch/x86/boot/string.h          | 5 +----
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 9652d5c2afda..0c74a6e526b6 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -30,12 +30,9 @@
 #define STATIC		static
 
 /*
- * Use normal definitions of mem*() from string.c. There are already
- * included header files which expect a definition of memset() and by
- * the time we define memset macro, it is too late.
+ * Provide definitions of memzero and memmove as some of the decompressors will
+ * try to define their own functions if these are not defined as macros.
  */
-#undef memcpy
-#undef memset
 #define memzero(s, n)	memset((s), 0, (n))
 #define memmove		memmove
 
diff --git a/arch/x86/boot/string.h b/arch/x86/boot/string.h
index 995f7b7ad512..a232da487cd2 100644
--- a/arch/x86/boot/string.h
+++ b/arch/x86/boot/string.h
@@ -11,10 +11,7 @@ void *memcpy(void *dst, const void *src, size_t len);
 void *memset(void *dst, int c, size_t len);
 int memcmp(const void *s1, const void *s2, size_t len);
 
-/*
- * Access builtin version by default. If one needs to use optimized version,
- * do "undef memcpy" in .c file and link against right string.c
- */
+/* Access builtin version by default. */
 #define memcpy(d,s,l) __builtin_memcpy(d,s,l)
 #define memset(d,c,l) __builtin_memset(d,c,l)
 #define memcmp	__builtin_memcmp
-- 
2.26.2
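
As a side note on the new misc.c comment, the "#define memmove memmove"
idiom can be sketched as follows. This is an illustration under assumptions,
not code from any decompressor: demo_memmove, DEMO_USED_FALLBACK and
demo_used_fallback are hypothetical names. A self-referential macro expands
to itself, so calls still reach the real function, but an #ifndef test in
later code sees the name as defined and skips its own fallback.

```c
#include <stddef.h>

/* Environment side (the role misc.c plays): declare the out-of-line
 * function, then define a macro of the same name so #ifndef can see it. */
void *demo_memmove(void *dst, const void *src, size_t n);
#define demo_memmove demo_memmove

/* Decompressor side: supply a fallback only if nothing was provided. */
#ifndef demo_memmove
#define DEMO_USED_FALLBACK 1
void *demo_memmove(void *dst, const void *src, size_t n)
{
	/* Simplified fallback: forward copy only. */
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}
#else
#define DEMO_USED_FALLBACK 0
#endif

/* Out-of-line implementation supplied by the environment. It handles
 * overlap by copying backwards when dst is above src. */
void *demo_memmove(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (d < s) {
		while (n--)
			*d++ = *s++;
	} else {
		d += n;
		s += n;
		while (n--)
			*--d = *--s;
	}
	return dst;
}

/* Reports which branch the preprocessor took. */
int demo_used_fallback(void)
{
	return DEMO_USED_FALLBACK;
}
```

Because the macro is defined, the fallback branch is never compiled and the
single definition below it is the one that gets used.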


