From: Ingo Molnar <mingo@elte.hu>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Ma, Ling" <ling.ma@intel.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.
Date: Mon, 9 Nov 2009 09:08:30 +0100
Message-ID: <20091109080830.GI453@elte.hu>
In-Reply-To: <4AF7C66C.6000009@zytor.com>


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 11/08/2009 11:24 PM, Ma, Ling wrote:
> > Hi All
> > 
> > Today we run our benchmark on Core2 and Sandy Bridge:
> > 
> 
> Hi Ling,
> 
> Thanks for doing that.  Do you also have access to any older CPUs?  I 
> suspect that the CPUs that Andi is worried about are older CPUs like 
> P4, K8 or Pentium M/Core 1.  (Andi: please do clarify if you have 
> additional information.)
> 
> My personal opinion is that if we can show no significant slowdown on 
> P4, K8, P-M/Core 1, Core 2, and Nehalem then we can simply use this 
> code unconditionally.  If one of them is radically worse than 
> baseline, then we have to do something conditional, which is a lot 
> more complicated.
> 
> [Ingo, Thomas: do you agree?]

Yeah. IIRC the worst case was the old P2s, which had really slow, 
microcode-based string ops. (Some of them even had errata in early 
prototypes, although we can certainly ignore those, as string ops get 
relied on quite frequently.)

IIRC the original PPro core came up with some nifty, hardwired string 
ops, but those had to be dumbed down and emulated in microcode due to 
SMP bugs - making them an inferior choice in the end.

But that should be ancient history and i'd suggest we ignore the P4 
dead-end too, unless it's some really big slowdown (which i doubt). If 
anyone cares then some optional assembly implementations could be added 
back.

Ling, if you are interested, could you send a user-space test app to 
this thread that everyone could just compile and run on various older 
boxes, to gather a performance profile of hand-coded copies versus 
string ops?

( And i think we can make a judgement based on cache-hot performance
  alone - if anything, the string ops will perform comparatively better
  in cache-cold scenarios, so the cache-hot numbers would be a
  conservative estimate. )

	Ingo
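
A minimal sketch of the kind of user-space test app requested above
might look like the following. It is not taken from the thread: the
function names (copy_unrolled, copy_string_ops, time_copy,
bench_report), the buffer sizes, and the iteration count are all
assumptions chosen for illustration. It times a hand-coded unrolled
copy against "rep movsq"/"rep movsb" on small buffers that stay
cache-hot. The inline assembly assumes GCC or Clang on x86-64; other
targets fall back to memcpy() so the file still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hand-coded style: four 8-byte loads, four 8-byte stores, byte tail. */
void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint64_t a, b, c, e;

    while (n >= 32) {
        memcpy(&a, s, 8);      memcpy(&b, s + 8, 8);
        memcpy(&c, s + 16, 8); memcpy(&e, s + 24, 8);
        memcpy(d, &a, 8);      memcpy(d + 8, &b, 8);
        memcpy(d + 16, &c, 8); memcpy(d + 24, &e, 8);
        s += 32; d += 32; n -= 32;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* String ops: rep movsq for whole quadwords, rep movsb for the tail. */
void *copy_string_ops(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    size_t qwords = n >> 3, tail = n & 7;

    /* The ABI guarantees the direction flag is clear on function entry. */
    __asm__ volatile("rep movsq"
                     : "+D"(d), "+S"(src), "+c"(qwords) : : "memory");
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(tail) : : "memory");
    return dst;
#else
    return memcpy(dst, src, n); /* assumption: non-x86-64 fallback */
#endif
}

/* CPU seconds spent on iters copies, after one warm-up copy. */
double time_copy(copy_fn fn, void *dst, const void *src,
                 size_t n, long iters)
{
    clock_t t0;
    long i;

    fn(dst, src, n); /* warm-up: pull both buffers into cache */
    t0 = clock();
    for (i = 0; i < iters; i++)
        fn(dst, src, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Print cache-hot timings for a few illustrative sizes. */
int bench_report(void)
{
    static unsigned char src[4096], dst[4096];
    static const size_t sizes[] = { 64, 256, 1024, 4096 };
    const long iters = 200000; /* illustrative; tune per machine */
    size_t i;

    for (i = 0; i < sizeof(src); i++)
        src[i] = (unsigned char)i;

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        size_t n = sizes[i];
        double tu = time_copy(copy_unrolled, dst, src, n, iters);
        double ts = time_copy(copy_string_ops, dst, src, n, iters);

        assert(memcmp(dst, src, n) == 0); /* sanity-check the copy */
        printf("%5zu bytes: unrolled %.4fs  string ops %.4fs\n",
               n, tu, ts);
    }
    return 0;
}
```

To try it, call bench_report() from a one-line main() and build with
optimization enabled. An optimizing compiler may rewrite the unrolled
loop (possibly into a call to memcpy), so it is worth checking the
generated assembly before trusting the numbers.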
