From: "Ma, Ling" <ling.ma@intel.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: "mingo@elte.hu" <mingo@elte.hu>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.
Date: Mon, 9 Nov 2009 15:24:03 +0800
Message-ID: <8FED46E8A9CA574792FC7AACAC38FE7714FCF772C9@PDSMSX501.ccr.corp.intel.com>
In-Reply-To: <4AF4784C.5090800@zytor.com>
Hi all,
Today we ran our benchmark on Core2 and Sandy Bridge:
1. Results on Core2
Speedup on Core2
Len Alignment Speedup
1024, 0/ 0: 0.95x
2048, 0/ 0: 1.03x
3072, 0/ 0: 1.02x
4096, 0/ 0: 1.09x
5120, 0/ 0: 1.13x
6144, 0/ 0: 1.13x
7168, 0/ 0: 1.14x
8192, 0/ 0: 1.13x
9216, 0/ 0: 1.14x
10240, 0/ 0: 0.99x
11264, 0/ 0: 1.14x
12288, 0/ 0: 1.14x
13312, 0/ 0: 1.10x
14336, 0/ 0: 1.10x
15360, 0/ 0: 1.13x
Application run under perf:
for (i = 1024; i < 1024 * 16; i = i + 64)
    do_memcpy(0, 0, i);
Run the application with 'perf stat --repeat 10 ./static_orig' and 'perf stat --repeat 10 ./static_new'.
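For readers who want to reproduce the sweep, here is a minimal user-space sketch of the loop above. The buffers, the do_memcpy() body, and the meaning of its arguments (source offset, destination offset, length) are assumptions; the real helper is not shown in this mail, and the real harness presumably repeats the sweep many times to reach the run times reported below.

```c
#include <stddef.h>
#include <string.h>

/* Assumed test buffers, large enough for the longest copy in the sweep. */
static char src_buf[16 * 1024 + 64];
static char dst_buf[16 * 1024 + 64];

/* Hypothetical stand-in for do_memcpy(); assumed argument order is
 * (source offset, destination offset, length). */
static void do_memcpy(size_t src_off, size_t dst_off, size_t len)
{
    memcpy(dst_buf + dst_off, src_buf + src_off, len);
}

/* Copy every length from 1024 up to (but not including) 16 KiB in
 * 64-byte steps and return how many copies were issued. */
static size_t run_sweep(void)
{
    size_t calls = 0;

    for (size_t i = 1024; i < 1024 * 16; i += 64) {
        do_memcpy(0, 0, i);
        calls++;
    }
    return calls;
}
```

One sweep issues 240 copies, so a harness built on this sketch would call run_sweep() in an outer repeat loop before being measured with perf stat.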
Before the patch:
Performance counter stats for './static_orig' (10 runs):
3323.041832 task-clock-msecs # 0.998 CPUs ( +- 0.016% )
22 context-switches # 0.000 M/sec ( +- 31.913% )
0 CPU-migrations # 0.000 M/sec ( +- nan% )
4428 page-faults # 0.001 M/sec ( +- 0.003% )
9921549804 cycles # 2985.683 M/sec ( +- 0.016% )
10863809359 instructions # 1.095 IPC ( +- 0.000% )
972283451 cache-references # 292.588 M/sec ( +- 0.018% )
17703 cache-misses # 0.005 M/sec ( +- 4.304% )
3.330714469 seconds time elapsed ( +- 0.021% )
After the patch:
Performance counter stats for './static_new' (10 runs):
3392.902871 task-clock-msecs # 0.998 CPUs ( +- 0.226% )
21 context-switches # 0.000 M/sec ( +- 30.982% )
0 CPU-migrations # 0.000 M/sec ( +- nan% )
4428 page-faults # 0.001 M/sec ( +- 0.003% )
10130188030 cycles # 2985.699 M/sec ( +- 0.227% )
391981414 instructions # 0.039 IPC ( +- 0.013% )
874161826 cache-references # 257.644 M/sec ( +- 3.034% )
17628 cache-misses # 0.005 M/sec ( +- 4.577% )
3.400681174 seconds time elapsed ( +- 0.219% )
2. Results on Sandy Bridge
Speedup on Sandy Bridge
Len Alignment Speedup
1024, 0/ 0: 1.08x
2048, 0/ 0: 1.42x
3072, 0/ 0: 1.51x
4096, 0/ 0: 1.63x
5120, 0/ 0: 1.67x
6144, 0/ 0: 1.72x
7168, 0/ 0: 1.75x
8192, 0/ 0: 1.77x
9216, 0/ 0: 1.80x
10240, 0/ 0: 1.80x
11264, 0/ 0: 1.82x
12288, 0/ 0: 1.85x
13312, 0/ 0: 1.85x
14336, 0/ 0: 1.88x
15360, 0/ 0: 1.88x
Application run under perf:
for (i = 1024; i < 1024 * 16; i = i + 64)
    do_memcpy(0, 0, i);
Run the application with 'perf stat --repeat 10 ./static_orig' and 'perf stat --repeat 10 ./static_new'.
Before the patch:
Performance counter stats for './static_orig' (10 runs):
3787.441240 task-clock-msecs # 0.995 CPUs ( +- 0.140% )
8 context-switches # 0.000 M/sec ( +- 22.602% )
0 CPU-migrations # 0.000 M/sec ( +- nan% )
4428 page-faults # 0.001 M/sec ( +- 0.003% )
6053487926 cycles # 1598.305 M/sec ( +- 0.140% )
10861025194 instructions # 1.794 IPC ( +- 0.001% )
2823963 cache-references # 0.746 M/sec ( +- 69.345% )
266000 cache-misses # 0.070 M/sec ( +- 0.980% )
3.805400837 seconds time elapsed ( +- 0.139% )
After the patch:
Performance counter stats for './static_new' (10 runs):
2879.424879 task-clock-msecs # 0.995 CPUs ( +- 0.076% )
10 context-switches # 0.000 M/sec ( +- 24.761% )
0 CPU-migrations # 0.000 M/sec ( +- nan% )
4428 page-faults # 0.002 M/sec ( +- 0.003% )
4602155158 cycles # 1598.290 M/sec ( +- 0.076% )
386146993 instructions # 0.084 IPC ( +- 0.005% )
520008 cache-references # 0.181 M/sec ( +- 8.077% )
267345 cache-misses # 0.093 M/sec ( +- 0.792% )
2.893813235 seconds time elapsed ( +- 0.085% )
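The perf counters above can be reduced to one whole-sweep figure per CPU by dividing the cycle count before the patch by the cycle count after it. The sketch below does that arithmetic with the numbers copied from the perf output; the struct and function names are made up for illustration. It gives roughly 0.98x on Core2 and roughly 1.32x on Sandy Bridge, with about 28 times fewer instructions retired on both.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Cycle and instruction counts copied from the perf stat output above. */
struct perf_sample {
    const char *cpu;
    uint64_t cycles_before, cycles_after;
    uint64_t insns_before, insns_after;
};

static const struct perf_sample samples[] = {
    { "Core2",        9921549804ULL, 10130188030ULL, 10863809359ULL, 391981414ULL },
    { "Sandy Bridge", 6053487926ULL,  4602155158ULL, 10861025194ULL, 386146993ULL },
};

/* Ratio > 1 means the patched binary needed fewer of the counted events. */
static double ratio(uint64_t before, uint64_t after)
{
    return (double)before / (double)after;
}

/* Print one summary line per CPU. */
static void print_summary(void)
{
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("%-12s cycles %.3fx, instructions %.1fx\n",
               samples[i].cpu,
               ratio(samples[i].cycles_before, samples[i].cycles_after),
               ratio(samples[i].insns_before, samples[i].insns_after));
}
```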
Thanks
Ling
>-----Original Message-----
>From: H. Peter Anvin [mailto:hpa@zytor.com]
>Sent: November 7, 2009 3:26
>To: Ma, Ling
>Cc: mingo@elte.hu; tglx@linutronix.de; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast
>string.
>
>On 11/06/2009 09:07 AM, H. Peter Anvin wrote:
>>
>> Where did the 1024 byte threshold come from? It seems a bit high to me,
>> and is at the very best a CPU-specific tuning factor.
>>
>> Andi is of course correct that older CPUs might suffer (sadly enough),
>> which is why we'd at the very least need some idea of what the
>> performance impact on those older CPUs would look like -- at that point
>> we can make a decision to just unconditionally do the rep movs or
>> consider some system where we point at different implementations for
>> different processors -- memcpy is probably one of the very few
>> operations for which something like that would make sense.
>>
>
>To be explicit: Ling, would you be willing to run some benchmarks across
>processors to see how this performs on non-Nehalem CPUs?
>
> -hpa
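As a rough illustration of the "different implementations for different processors" idea in the quoted text, the selection could be sketched in user space with a function pointer chosen once at startup. Everything here is hypothetical: the two variants are simple stand-ins (a byte loop and the library memcpy), and the cpu_has_fast_string flag is an assumed input, not a real kernel or CPUID interface.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical variant for CPUs where string instructions are slow;
 * a plain byte loop stands in for an unrolled copy. */
static void *memcpy_unrolled(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical variant for CPUs with fast string operations; the
 * library memcpy stands in for a rep movs based copy. */
static void *memcpy_fast_string(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}

/* Pick an implementation once, based on an assumed per-CPU flag. */
static memcpy_fn select_memcpy(int cpu_has_fast_string)
{
    return cpu_has_fast_string ? memcpy_fast_string : memcpy_unrolled;
}
```

A caller would store the result of select_memcpy() at initialization and route every copy through that pointer, so the per-call cost is one indirect call rather than a feature check.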