linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Rodgman <dave.rodgman@arm.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: "herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
	"davem@davemloft.net" <davem@davemloft.net>,
	Matt Sealey <Matt.Sealey@arm.com>,
	"nitingupta910@gmail.com" <nitingupta910@gmail.com>,
	"markus@oberhumer.com" <markus@oberhumer.com>,
	"minchan@kernel.org" <minchan@kernel.org>,
	"sergey.senozhatsky.work@gmail.com" 
	<sergey.senozhatsky.work@gmail.com>,
	"sonnyrao@google.com" <sonnyrao@google.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	nd <nd@arm.com>, "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
Subject: [PATCH 2/8] lib/lzo: clean-up by introducing COPY16
Date: Fri, 30 Nov 2018 14:26:23 +0000	[thread overview]
Message-ID: <20181130142600.13782-3-dave.rodgman@arm.com> (raw)
In-Reply-To: <20181130142600.13782-1-dave.rodgman@arm.com>

From: Matt Sealey <matt.sealey@arm.com>

Most compilers should be able to merge adjacent loads/stores of sizes
which are less than but effect a multiple of a machine word size (in
effect a memcpy() of a constant amount). However the semantics of the
macro are that it just does the copy, the pointer increment is in the
code, hence we see

    *a = *b
    a += 8
    b += 8
    *a = *b
    a += 8
    b += 8

This introduces a dependency between the two groups of statements which
seems to defeat said compiler optimizers and generate some very strange
sequences of addition and subtraction of address offsets (i.e. it is
overcomplicated).

Since COPY8 is only ever used to copy amounts of 16 bytes (in pairs),
just define COPY16 as COPY8,COPY8. We leave the definition to preserve
the need to do unaligned accesses to machine-sized words per the
original code intent, we just don't use it in the code proper.

COPY16 then gives us code like:

    *a = *b
    *(a+8) = *(b+8)
    a += 16
    b += 16

This seems to allow compilers to generate much better code by using
base register writeback or simply positively incrementing offsets which
seems to positively affect performance. It is, at least, fewer
instructions to do the same job.

Link: http://lkml.kernel.org/r/20181127161913.23863-3-dave.rodgman@arm.com
Signed-off-by: Matt Sealey <matt.sealey@arm.com>
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <nitingupta910@gmail.com>
Cc: Richard Purdie <rpurdie@openedhand.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Sonny Rao <sonnyrao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 lib/lzo/lzo1x_compress.c        |  9 +++------
 lib/lzo/lzo1x_decompress_safe.c | 18 ++++++------------
 lib/lzo/lzodefs.h               |  3 +++
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/lib/lzo/lzo1x_compress.c b/lib/lzo/lzo1x_compress.c
index 236eb21167b5..82fb5571ce5e 100644
--- a/lib/lzo/lzo1x_compress.c
+++ b/lib/lzo/lzo1x_compress.c
@@ -60,8 +60,7 @@ lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
 				op += t;
 			} else if (t <= 16) {
 				*op++ = (t - 3);
-				COPY8(op, ii);
-				COPY8(op + 8, ii + 8);
+				COPY16(op, ii);
 				op += t;
 			} else {
 				if (t <= 18) {
@@ -76,8 +75,7 @@ lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
 					*op++ = tt;
 				}
 				do {
-					COPY8(op, ii);
-					COPY8(op + 8, ii + 8);
+					COPY16(op, ii);
 					op += 16;
 					ii += 16;
 					t -= 16;
@@ -255,8 +253,7 @@ int lzo1x_1_compress(const unsigned char *in, size_t in_len,
 			*op++ = tt;
 		}
 		if (t >= 16) do {
-			COPY8(op, ii);
-			COPY8(op + 8, ii + 8);
+			COPY16(op, ii);
 			op += 16;
 			ii += 16;
 			t -= 16;
diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index a1c387f6afba..aa95d3066b7d 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -86,12 +86,9 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
 					const unsigned char *ie = ip + t;
 					unsigned char *oe = op + t;
 					do {
-						COPY8(op, ip);
-						op += 8;
-						ip += 8;
-						COPY8(op, ip);
-						op += 8;
-						ip += 8;
+						COPY16(op, ip);
+						op += 16;
+						ip += 16;
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -187,12 +184,9 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
 			unsigned char *oe = op + t;
 			if (likely(HAVE_OP(t + 15))) {
 				do {
-					COPY8(op, m_pos);
-					op += 8;
-					m_pos += 8;
-					COPY8(op, m_pos);
-					op += 8;
-					m_pos += 8;
+					COPY16(op, m_pos);
+					op += 16;
+					m_pos += 16;
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 497f9c9f03a8..e1b3cf6459a9 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -23,6 +23,9 @@
 		COPY4(dst, src); COPY4((dst) + 4, (src) + 4)
 #endif
 
+#define COPY16(dst, src) \
+	do { COPY8(dst, src); COPY8((dst) + 8, (src) + 8); } while (0)
+
 #if defined(__BIG_ENDIAN) && defined(__LITTLE_ENDIAN)
 #error "conflicting endian definitions"
 #elif defined(CONFIG_X86_64)
-- 
2.17.1


  parent reply	other threads:[~2018-11-30 14:26 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-30 14:26 [PATCH v4 0/7] lib/lzo: performance improvements Dave Rodgman
2018-11-30 14:26 ` [PATCH 1/8] lib/lzo: tidy-up ifdefs Dave Rodgman
2018-12-06 15:49   ` Markus F.X.J. Oberhumer
2018-11-30 14:26 ` Dave Rodgman [this message]
2018-12-06 15:56   ` [PATCH 2/8] lib/lzo: clean-up by introducing COPY16 Markus F.X.J. Oberhumer
2018-11-30 14:26 ` [PATCH 3/8] lib/lzo: enable 64-bit CTZ on Arm Dave Rodgman
2018-12-06 15:51   ` Markus F.X.J. Oberhumer
2018-11-30 14:26 ` [PATCH 4/8] lib/lzo: 64-bit CTZ on arm64 Dave Rodgman
2018-12-06 15:52   ` Markus F.X.J. Oberhumer
2018-11-30 14:26 ` [PATCH 5/8] lib/lzo: fast 8-byte copy " Dave Rodgman
2018-12-06 15:53   ` Markus F.X.J. Oberhumer
2018-11-30 14:26 ` [PATCH 6/8] lib/lzo: implement run-length encoding Dave Rodgman
2018-12-06 15:56   ` Markus F.X.J. Oberhumer
2018-11-30 14:26 ` [PATCH 7/8] lib/lzo: separate lzo-rle from lzo Dave Rodgman
2018-12-01  8:48   ` Herbert Xu
2018-11-30 14:26 ` [PATCH 8/8] zram: default to lzo-rle instead of lzo Dave Rodgman
2018-12-05  7:30 ` [PATCH v4 0/7] lib/lzo: performance improvements Sergey Senozhatsky
2018-12-05  9:50   ` Dave Rodgman
2018-12-05 10:20     ` Sergey Senozhatsky
2018-12-06 15:47 ` Markus F.X.J. Oberhumer
2018-12-06 16:22   ` Matt Sealey
2018-12-07 15:54   ` Dave Rodgman
2019-01-07 15:35     ` Dave Rodgman
  -- strict thread matches above, loose matches on Subject: below --
2018-11-30 11:47 [PATCH v3 " Dave Rodgman
2018-11-30 11:47 ` [PATCH 2/8] lib/lzo: clean-up by introducing COPY16 Dave Rodgman
2018-11-30 12:27   ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181130142600.13782-3-dave.rodgman@arm.com \
    --to=dave.rodgman@arm.com \
    --cc=Matt.Sealey@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=markus@oberhumer.com \
    --cc=minchan@kernel.org \
    --cc=nd@arm.com \
    --cc=nitingupta910@gmail.com \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sfr@canb.auug.org.au \
    --cc=sonnyrao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).