From: ling.ma@intel.com
To: mingo@elte.hu
Cc: hpa@zytor.com, tglx@linutronix.de, linux-kernel@vger.kernel.org,
Ma Ling <ling.ma@intel.com>
Subject: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.
Date: Fri, 6 Nov 2009 17:41:22 +0800
Message-ID: <1257500482-16182-1-git-send-email-ling.ma@intel.com>
From: Ma Ling <ling.ma@intel.com>
Hi All,
Intel Nehalem improves the performance of REP string instructions
significantly over previous microarchitectures in several ways:
1. Startup overhead has been reduced in most cases.
2. Data transfer throughput is improved.
3. REP string instructions can operate in "fast string" mode even if
the address is not aligned to 16 bytes.
According to our experiments, when the copy size is large enough,
movsq can reach almost 16 bytes of throughput per cycle, which
approaches that of the SSE instruction set. The patch uses
this optimization when the copy size is at least 1024 bytes.
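For readers who prefer C, the dispatch added by the patch can be sketched roughly as below. This is only an illustration, not the kernel code: the names FAST_STRING_THRESHOLD, uses_fast_string, copy_qwords_then_bytes, copy_bytes and memcpy_sketch are hypothetical, the quadword loop stands in for "rep movsq" followed by "rep movsb", and the plain byte loop stands in for the existing unrolled 64-byte loop. The cutoff follows the patch's "cmpl $0x400 / jae" check, so lengths of 1024 bytes and above take the new path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; mirrors the 0x400 cutoff compared against in the patch. */
#define FAST_STRING_THRESHOLD 1024u

/* Returns nonzero when the length would take the new path (jae: len >= 1024). */
static int uses_fast_string(size_t len)
{
	return len >= FAST_STRING_THRESHOLD;
}

/*
 * Stand-in for "rep movsq; rep movsb": copy len / 8 quadwords,
 * then the remaining len % 8 bytes.
 */
static void *copy_qwords_then_bytes(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t qwords = len >> 3;	/* shrl $3, %ecx */
	size_t tail = len & 7;		/* andl $7, %edx */
	size_t i;

	for (i = 0; i < qwords; i++) {
		uint64_t q;

		/* memcpy keeps the 8-byte load/store safe for any alignment. */
		memcpy(&q, s, sizeof(q));
		memcpy(d, &q, sizeof(q));
		s += sizeof(q);
		d += sizeof(q);
	}
	for (i = 0; i < tail; i++)
		d[i] = s[i];

	return dst;
}

/* Stand-in for the existing path (the real code uses an unrolled loop). */
static void *copy_bytes(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < len; i++)
		d[i] = s[i];

	return dst;
}

/* Size-based dispatch; returns dst, as memcpy does. */
void *memcpy_sketch(void *dst, const void *src, size_t len)
{
	if (uses_fast_string(len))
		return copy_qwords_then_bytes(dst, src, len);
	return copy_bytes(dst, src, len);
}
```

The sketch only models which path a given length takes and how the length is split into quadwords and tail bytes; it says nothing about the throughput of the real instructions.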
Measured speedup on a Nehalem platform:
Len alignment Speedup
1024, 0/ 0: 1.04x
2048, 0/ 0: 1.36x
3072, 0/ 0: 1.51x
4096, 0/ 0: 1.60x
5120, 0/ 0: 1.70x
6144, 0/ 0: 1.74x
7168, 0/ 0: 1.77x
8192, 0/ 0: 1.80x
9216, 0/ 0: 1.82x
10240, 0/ 0: 1.83x
11264, 0/ 0: 1.85x
12288, 0/ 0: 1.86x
13312, 0/ 0: 1.92x
14336, 0/ 0: 1.84x
15360, 0/ 0: 1.74x
Output of 'perf stat --repeat 10 ./static_orig' before the patch:
Performance counter stats for './static_orig' (10 runs):
2835.650105 task-clock-msecs # 0.999 CPUs ( +- 0.051% )
3 context-switches # 0.000 M/sec ( +- 6.503% )
0 CPU-migrations # 0.000 M/sec ( +- nan% )
4429 page-faults # 0.002 M/sec ( +- 0.003% )
7941098692 cycles # 2800.451 M/sec ( +- 0.051% )
10848100323 instructions # 1.366 IPC ( +- 0.000% )
322808 cache-references # 0.114 M/sec ( +- 1.467% )
280716 cache-misses # 0.099 M/sec ( +- 0.618% )
2.838006377 seconds time elapsed ( +- 0.051% )
Output of 'perf stat --repeat 10 ./static_new' after the patch:
Performance counter stats for './static_new' (10 runs):
7401.423466 task-clock-msecs # 0.999 CPUs ( +- 0.108% )
10 context-switches # 0.000 M/sec ( +- 2.797% )
0 CPU-migrations # 0.000 M/sec ( +- nan% )
4428 page-faults # 0.001 M/sec ( +- 0.003% )
20727280183 cycles # 2800.445 M/sec ( +- 0.107% )
1472673654 instructions # 0.071 IPC ( +- 0.013% )
1092221 cache-references # 0.148 M/sec ( +- 12.414% )
290550 cache-misses # 0.039 M/sec ( +- 1.577% )
7.407006046 seconds time elapsed ( +- 0.108% )
I would appreciate your comments.
Thanks
Ma Ling
---
arch/x86/lib/memcpy_64.S | 17 +++++++++++++++++
1 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index ad5441e..2ea3561 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -50,6 +50,12 @@ ENTRY(memcpy)
movl %edx, %ecx
shrl $6, %ecx
jz .Lhandle_tail
+ /*
+ * If length is at least 1024 bytes we choose the optimized MOVSQ,
+ * which has higher throughput.
+ */
+ cmpl $0x400, %edx
+ jae .Lmore_0x400
.p2align 4
.Lloop_64:
@@ -119,6 +125,17 @@ ENTRY(memcpy)
.Lend:
ret
+
+ .p2align 4
+.Lmore_0x400:
+ movq %rdi, %rax
+ movl %edx, %ecx
+ shrl $3, %ecx
+ andl $7, %edx
+ rep movsq
+ movl %edx, %ecx
+ rep movsb
+ ret
CFI_ENDPROC
ENDPROC(memcpy)
ENDPROC(__memcpy)
--
1.6.2.5