* [PATCH] x86-64: handle byte-wise tail copying in memcpy() without a loop
@ 2012-01-26 15:55 Jan Beulich
  2012-01-26 21:38 ` [tip:x86/asm] x86-64: Handle " tip-bot for Jan Beulich
  0 siblings, 1 reply; 2+ messages in thread
From: Jan Beulich @ 2012-01-26 15:55 UTC (permalink / raw)
  To: mingo, tglx, hpa; +Cc: linux-kernel

While the effect is hard to measure, reducing the number of possibly
or likely mispredicted branches can generally be expected to perform
slightly better.

Contrary to what may appear at first glance, this also doesn't grow the
function size (the alignment gap to the next function just gets
smaller).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

---
 arch/x86/lib/memcpy_64.S |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

--- 3.3-rc1/arch/x86/lib/memcpy_64.S
+++ 3.3-rc1-x86_64-memcpy-tail/arch/x86/lib/memcpy_64.S
@@ -169,18 +169,19 @@ ENTRY(memcpy)
 	retq
 	.p2align 4
 .Lless_3bytes:
-	cmpl $0, %edx
-	je .Lend
+	subl $1, %edx
+	jb .Lend
 	/*
 	 * Move data from 1 bytes to 3 bytes.
 	 */
-.Lloop_1:
-	movb (%rsi), %r8b
-	movb %r8b, (%rdi)
-	incq %rdi
-	incq %rsi
-	decl %edx
-	jnz .Lloop_1
+	movzbl (%rsi), %ecx
+	jz .Lstore_1byte
+	movzbq 1(%rsi), %r8
+	movzbq (%rsi, %rdx), %r9
+	movb %r8b, 1(%rdi)
+	movb %r9b, (%rdi, %rdx)
+.Lstore_1byte:
+	movb %cl, (%rdi)
 
 .Lend:
 	retq
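
A note on the flag handling that makes the new sequence work: subl $1,
%edx sets the carry flag when %edx was zero (caught by jb .Lend) and
the zero flag when it was one; movzbl does not modify flags, so the
later jz still tests the result of the subl. Because %rdx holds n-1
after the subl, (%rsi,%rdx) addresses the last source byte. In C terms
the new tail copy amounts to roughly the sketch below (illustrative
only; copy_tail_3 and its signature are made up, not kernel code).
For a 2-byte copy the "second" and "last" byte are the same byte,
which is simply loaded and stored twice instead of branched around.

#include <stddef.h>

/* Illustrative sketch of the .Lless_3bytes path; n is at most 3. */
static void copy_tail_3(unsigned char *dst, const unsigned char *src,
			size_t n)
{
	unsigned char first, second, last;

	if (n == 0)			/* subl $1, %edx; jb .Lend */
		return;
	first = src[0];			/* movzbl (%rsi), %ecx */
	if (n > 1) {			/* jz .Lstore_1byte (ZF from subl) */
		second = src[1];	/* movzbq 1(%rsi), %r8 */
		last = src[n - 1];	/* movzbq (%rsi,%rdx), %r9; %rdx == n-1 */
		dst[1] = second;	/* movb %r8b, 1(%rdi) */
		dst[n - 1] = last;	/* movb %r9b, (%rdi,%rdx) */
	}
	dst[0] = first;			/* .Lstore_1byte: movb %cl, (%rdi) */
}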





* [tip:x86/asm] x86-64: Handle byte-wise tail copying in memcpy() without a loop
  2012-01-26 15:55 [PATCH] x86-64: handle byte-wise tail copying in memcpy() without a loop Jan Beulich
@ 2012-01-26 21:38 ` tip-bot for Jan Beulich
  0 siblings, 0 replies; 2+ messages in thread
From: tip-bot for Jan Beulich @ 2012-01-26 21:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, jbeulich, JBeulich, tglx, mingo

Commit-ID:  9d8e22777e66f420e46490e9fc6f8cb7e0e2222b
Gitweb:     http://git.kernel.org/tip/9d8e22777e66f420e46490e9fc6f8cb7e0e2222b
Author:     Jan Beulich <JBeulich@suse.com>
AuthorDate: Thu, 26 Jan 2012 15:55:32 +0000
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 26 Jan 2012 21:19:20 +0100

x86-64: Handle byte-wise tail copying in memcpy() without a loop

While the effect is hard to measure, reducing the number of
possibly or likely mispredicted branches can generally be
expected to perform slightly better.

Contrary to what may appear at first glance, this also doesn't
grow the function size (the alignment gap to the next function
just gets smaller).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F218584020000780006F422@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/lib/memcpy_64.S |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 1235b04..1c273be 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -164,18 +164,19 @@ ENTRY(memcpy)
 	retq
 	.p2align 4
 .Lless_3bytes:
-	cmpl $0, %edx
-	je .Lend
+	subl $1, %edx
+	jb .Lend
 	/*
 	 * Move data from 1 bytes to 3 bytes.
 	 */
-.Lloop_1:
-	movb (%rsi), %r8b
-	movb %r8b, (%rdi)
-	incq %rdi
-	incq %rsi
-	decl %edx
-	jnz .Lloop_1
+	movzbl (%rsi), %ecx
+	jz .Lstore_1byte
+	movzbq 1(%rsi), %r8
+	movzbq (%rsi, %rdx), %r9
+	movb %r8b, 1(%rdi)
+	movb %r9b, (%rdi, %rdx)
+.Lstore_1byte:
+	movb %cl, (%rdi)
 
 .Lend:
 	retq
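
As a quick user-space sanity check of the four lengths this path must
handle, one could run something like the following (a hypothetical
test using the libc memcpy as a stand-in for the kernel routine;
exercising the kernel's own memcpy_64.S would require a kernel build):

#include <assert.h>
#include <string.h>

int main(void)
{
	const unsigned char src[3] = { 0xaa, 0xbb, 0xcc };
	unsigned char dst[3];
	size_t n;

	for (n = 0; n <= 3; n++) {
		memset(dst, 0, sizeof(dst));
		memcpy(dst, src, n);	/* lengths 0..3 hit .Lless_3bytes */
		assert(memcmp(dst, src, n) == 0);
	}
	return 0;
}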

