From: Andrew Morton <akpm@digeo.com>
To: Mala Anand <manand@us.ibm.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Bill Hartner <bhartner@us.ibm.com>
Subject: Re: 2.5.40-mm1
Date: Wed, 09 Oct 2002 16:32:11 -0700
Message-ID: <3DA4BC7B.EC8D65A3@digeo.com>
In-Reply-To: OF13BF2DC5.95D8249D-ON87256C4C.00509A83@boulder.ibm.com

Mala Anand wrote:
> 
> ...
> P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
> Src is aligned and dst is misaligned as follows:
> 
>  Dst      2.5.40       2.5.40+patch     2.5.40+patch++
> Align    throughput     throughput      throughput
> (bytes)   KB/sec          KB/sec        KB/sec
>   0       1360071         1314783        912359
>   1       323674           340447
>   2       329202           336425
>   4       512955           693170
>   8       523223           615097        506641
>  12       517184           558701        553700
>  16       966598           872080        932736
>  32       846937           838514        845178

Note the tremendous slowdown which the P4 suffers when you're not
cacheline aligned.  Even 32-byte-aligned is down a lot.
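
For anyone who wants to reproduce this kind of alignment sweep, here is a
minimal user-space sketch in C. It is not the kernel copy routine and not
the benchmark Mala ran: it times plain memcpy() at the same destination
offsets as the table above. The names misalignment() and copy_kb_per_sec(),
the 64-byte CACHELINE value, the 4096-byte copy length and the iteration
count are all assumptions made for illustration, and the empty asm
statement is a GCC/Clang-specific compiler barrier.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define CACHELINE 64    /* assumption: cacheline size differs between CPUs */
#define COPY_LEN  4096  /* assumption: bytes copied per iteration */

/* Distance of p from the previous align-byte boundary.
 * align must be a non-zero power of two. */
static size_t misalignment(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)((uintptr_t)p & (align - 1));
}

/* Copy len bytes from src to dst iters times; return throughput in KB/sec. */
static double copy_kb_per_sec(char *dst, const char *src, size_t len, long iters)
{
    struct timespec t0, t1;
    double secs;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier (GCC/Clang) so the copies are not elided. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9;    /* guard against a zero-length interval */
    return (double)len * (double)iters / 1024.0 / secs;
}

#ifdef BENCH_MAIN
int main(void)
{
    static const size_t offsets[] = { 0, 1, 2, 4, 8, 12, 16, 32 };
    void *s, *d;
    char *src, *dst;

    /* Both buffers start cacheline aligned; dst has room for the offset. */
    if (posix_memalign(&s, CACHELINE, COPY_LEN) != 0 ||
        posix_memalign(&d, CACHELINE, COPY_LEN + CACHELINE) != 0)
        return 1;
    src = s;
    dst = d;
    memset(src, 0xa5, COPY_LEN);
    memset(dst, 0, COPY_LEN + CACHELINE);

    for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
        char *p = dst + offsets[i];
        printf("%3zu  %12.0f KB/sec\n", misalignment(p, CACHELINE),
               copy_kb_per_sec(p, src, COPY_LEN, 1000000));
    }
    free(s);
    free(d);
    return 0;
}
#endif
```

Build with something like `cc -O2 -DBENCH_MAIN bench.c` to get the
standalone driver. Absolute numbers from a libc memcpy() on current
hardware will not be comparable to the 2.5.40 figures; only the shape of
the curve across offsets is of interest, and as noted in the quoted mail,
several runs per offset are advisable because of variance.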

 
> I see too much variance in the test results so I ran
> each test 3 times. I tried increasing the iterations
> but it did not reduce the variance.
> 
> Dst is aligned and src is misaligned as follows:
> 
>  Dst      2.5.40       2.5.40+patch
> Align    throughput     throughput
> (bytes)   KB/sec          KB/sec
>   0       1275372       1029815
>   1        529907        511815
>   2        534811        530850
>   4        643196        627013
>   8        568000        626676
>  12        574468        658793
>  16        631707        635979
>  32        741485        592938

This differs a little from my P4 testing - the rep;movsl approach
seemed OK for 8,16,32 alignment.

But still, that's something we can tune later.
 
> 
> However I have seen using floating point registers instead of integer
> registers on Pentium IV improves performance to a greater extent on
> some alignments. I need to do more testing and then I will create a
> patch for Pentium IV.

I believe there are "issues" using those registers in-kernel. Related
to the need to save/restore them, or errata; not too sure about that.
