All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Terrell <nickrterrell@gmail.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Kernel Team <Kernel-team@fb.com>,
	Nick Terrell <nickrterrell@gmail.com>,
	Nick Terrell <terrelln@fb.com>,
	Yann Collet <yann.collet.73@gmail.com>,
	Gao Xiang <gaoxiang25@huawei.com>,
	Sven Schmidt <4sschmid@informatik.uni-hamburg.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH] lz4: Fix kernel decompression speed
Date: Mon,  3 Aug 2020 12:40:22 -0700	[thread overview]
Message-ID: <20200803194022.2966806-1-nickrterrell@gmail.com> (raw)

From: Nick Terrell <terrelln@fb.com>

This patch replaces all memcpy() calls with LZ4_memcpy() which calls
__builtin_memcpy() so the compiler can inline it.

LZ4 relies heavily on memcpy() with a constant size being inlined. In
x86 and i386 pre-boot environments memcpy() cannot be inlined because
memcpy() doesn't get defined as __builtin_memcpy().

An equivalent patch has been applied upstream so that the next import
won't lose this change [1].

I've measured the kernel decompression speed using QEMU before and after
this patch for the x86_64 and i386 architectures. The speed-up is about
10x as shown below.

Code	Arch	Kernel Size	Time	Speed
v5.8	x86_64	11504832 B	148 ms	 79 MB/s
patch	x86_64	11503872 B	 13 ms	885 MB/s
v5.8	i386	 9621216 B	 91 ms	106 MB/s
patch	i386	 9620224 B	 10 ms	962 MB/s

I also measured the time to decompress the initramfs on x86_64, i386,
and arm. All three show the same decompression speed before and after,
as expected.

[1] https://github.com/lz4/lz4/pull/890

Signed-off-by: Nick Terrell <terrelln@fb.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 lib/lz4/lz4_compress.c   |  4 ++--
 lib/lz4/lz4_decompress.c | 18 +++++++++---------
 lib/lz4/lz4defs.h        | 10 ++++++++++
 lib/lz4/lz4hc_compress.c |  2 +-
 4 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
index cc7b6d4cc7c7..90bb67994688 100644
--- a/lib/lz4/lz4_compress.c
+++ b/lib/lz4/lz4_compress.c
@@ -446,7 +446,7 @@ static FORCE_INLINE int LZ4_compress_generic(
 			*op++ = (BYTE)(lastRun << ML_BITS);
 		}
 
-		memcpy(op, anchor, lastRun);
+		LZ4_memcpy(op, anchor, lastRun);
 
 		op += lastRun;
 	}
@@ -708,7 +708,7 @@ static int LZ4_compress_destSize_generic(
 		} else {
 			*op++ = (BYTE)(lastRunSize<<ML_BITS);
 		}
-		memcpy(op, anchor, lastRunSize);
+		LZ4_memcpy(op, anchor, lastRunSize);
 		op += lastRunSize;
 	}
 
diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
index 5371dab6b481..00cb0d0b73e1 100644
--- a/lib/lz4/lz4_decompress.c
+++ b/lib/lz4/lz4_decompress.c
@@ -153,7 +153,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
 		   && likely((endOnInput ? ip < shortiend : 1) &
 			     (op <= shortoend))) {
 			/* Copy the literals */
-			memcpy(op, ip, endOnInput ? 16 : 8);
+			LZ4_memcpy(op, ip, endOnInput ? 16 : 8);
 			op += length; ip += length;
 
 			/*
@@ -172,9 +172,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
 			    (offset >= 8) &&
 			    (dict == withPrefix64k || match >= lowPrefix)) {
 				/* Copy the match. */
-				memcpy(op + 0, match + 0, 8);
-				memcpy(op + 8, match + 8, 8);
-				memcpy(op + 16, match + 16, 2);
+				LZ4_memcpy(op + 0, match + 0, 8);
+				LZ4_memcpy(op + 8, match + 8, 8);
+				LZ4_memcpy(op + 16, match + 16, 2);
 				op += length + MINMATCH;
 				/* Both stages worked, load the next token. */
 				continue;
@@ -263,7 +263,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
 				}
 			}
 
-			memcpy(op, ip, length);
+			LZ4_memcpy(op, ip, length);
 			ip += length;
 			op += length;
 
@@ -350,7 +350,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
 				size_t const copySize = (size_t)(lowPrefix - match);
 				size_t const restSize = length - copySize;
 
-				memcpy(op, dictEnd - copySize, copySize);
+				LZ4_memcpy(op, dictEnd - copySize, copySize);
 				op += copySize;
 				if (restSize > (size_t)(op - lowPrefix)) {
 					/* overlap copy */
@@ -360,7 +360,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
 					while (op < endOfMatch)
 						*op++ = *copyFrom++;
 				} else {
-					memcpy(op, lowPrefix, restSize);
+					LZ4_memcpy(op, lowPrefix, restSize);
 					op += restSize;
 				}
 			}
@@ -386,7 +386,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
 				while (op < copyEnd)
 					*op++ = *match++;
 			} else {
-				memcpy(op, match, mlen);
+				LZ4_memcpy(op, match, mlen);
 			}
 			op = copyEnd;
 			if (op == oend)
@@ -400,7 +400,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
 			op[2] = match[2];
 			op[3] = match[3];
 			match += inc32table[offset];
-			memcpy(op + 4, match, 4);
+			LZ4_memcpy(op + 4, match, 4);
 			match -= dec64table[offset];
 		} else {
 			LZ4_copy8(op, match);
diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
index 1a7fa9d9170f..c91dd96ef629 100644
--- a/lib/lz4/lz4defs.h
+++ b/lib/lz4/lz4defs.h
@@ -137,6 +137,16 @@ static FORCE_INLINE void LZ4_writeLE16(void *memPtr, U16 value)
 	return put_unaligned_le16(value, memPtr);
 }
 
+/*
+ * LZ4 relies on memcpy with a constant size being inlined. In freestanding
+ * environments, the compiler can't assume the implementation of memcpy() is
+ * standard compliant, so apply its specialized memcpy() inlining logic. When
+ * possible, use __builtin_memcpy() to tell the compiler to analyze memcpy()
+ * as-if it were standard compliant, so it can inline it in freestanding
+ * environments. This is needed when decompressing the Linux Kernel, for example.
+ */
+#define LZ4_memcpy(dst, src, size) __builtin_memcpy(dst, src, size)
+
 static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
 {
 #if LZ4_ARCH64
diff --git a/lib/lz4/lz4hc_compress.c b/lib/lz4/lz4hc_compress.c
index 1b61d874e337..e7ac8694b797 100644
--- a/lib/lz4/lz4hc_compress.c
+++ b/lib/lz4/lz4hc_compress.c
@@ -570,7 +570,7 @@ static int LZ4HC_compress_generic(
 			*op++ = (BYTE) lastRun;
 		} else
 			*op++ = (BYTE)(lastRun<<ML_BITS);
-		memcpy(op, anchor, iend - anchor);
+		LZ4_memcpy(op, anchor, iend - anchor);
 		op += iend - anchor;
 	}
 
-- 
2.28.0


             reply	other threads:[~2020-08-03 19:43 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-03 19:40 Nick Terrell [this message]
2020-08-03 21:57 ` [PATCH] lz4: Fix kernel decompression speed Arvind Sankar
2020-08-03 22:55   ` Nick Terrell
2020-08-04  1:56     ` Arvind Sankar
2020-08-04  2:57       ` Nick Terrell
2020-08-04 15:19         ` Arvind Sankar
2020-08-04 17:59       ` Nick Terrell
2020-08-04 23:48         ` [PATCH 0/1] x86/boot/compressed: Use builtin mem functions for decompressor Arvind Sankar
2020-08-04 23:48           ` [PATCH 1/1] " Arvind Sankar
2020-08-19 18:14             ` Kees Cook
2020-08-19 18:22               ` Linus Torvalds
2020-08-04  8:32     ` [PATCH] lz4: Fix kernel decompression speed Pavel Machek
2020-08-04 15:16       ` Arvind Sankar
2020-08-04 17:18         ` Nick Terrell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200803194022.2966806-1-nickrterrell@gmail.com \
    --to=nickrterrell@gmail.com \
    --cc=4sschmid@informatik.uni-hamburg.de \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=gaoxiang25@huawei.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=terrelln@fb.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=yann.collet.73@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.