From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, tytso@mit.edu
Cc: hpa@linux.intel.com, linux-kernel@vger.kernel.org,
	mingo@kernel.org, price@mit.edu
Subject: Re: random: Benchmarking fast_mix2
Date: 12 Jun 2014 20:23:04 -0400
Message-ID: <20140613002304.17318.qmail@ns.horizon.com>
In-Reply-To: <20140612204622.GB3112@thunk.org>

> So I just tried your modified 32-bit mixing function where you moved the
> rotation to the middle step instead of the last step.  With the
> usleep(), it doesn't make any difference:
> 
> # schedtool -R -p 1 -e /tmp/fast_mix2_48
> fast_mix: 212  fast_mix2: 400  fast_mix3: 400
> fast_mix: 208  fast_mix2: 408  fast_mix3: 388
> fast_mix: 208  fast_mix2: 396  fast_mix3: 404
> fast_mix: 224  fast_mix2: 408  fast_mix3: 392
> fast_mix: 200  fast_mix2: 404  fast_mix3: 404
> fast_mix: 208  fast_mix2: 412  fast_mix3: 396
> fast_mix: 208  fast_mix2: 392  fast_mix3: 392
> fast_mix: 212  fast_mix2: 408  fast_mix3: 388
> fast_mix: 200  fast_mix2: 716  fast_mix3: 773
> fast_mix: 426  fast_mix2: 717  fast_mix3: 728

> And here is my testing using your 64-bit variant:
> 
> # schedtool -R -p 1 -e /tmp/fast_mix2_49
> fast_mix: 294  fast_mix2: 476  fast_mix4: 442
> fast_mix: 286  fast_mix2: 1058 fast_mix4: 448
> fast_mix: 958  fast_mix2: 460  fast_mix4: 1002
> fast_mix: 940  fast_mix2: 1176 fast_mix4: 826
> fast_mix: 476  fast_mix2: 840  fast_mix4: 826
> fast_mix: 462  fast_mix2: 840  fast_mix4: 826
> fast_mix: 462  fast_mix2: 826  fast_mix4: 826
> fast_mix: 462  fast_mix2: 826  fast_mix4: 826
> fast_mix: 462  fast_mix2: 826  fast_mix4: 826
> fast_mix: 462  fast_mix2: 840  fast_mix4: 826

> The bottom line is that what we are primarily measuring here is all
> different cache effects.  And these are going to be quite different on
> different microarchitectures.

So adding fast_mix4 doubled the time taken by fast_mix.
Yeah, that's trustworthy timing! :-)

Still, you do seem to observe a pretty consistent factor of about 2x
difference, which confuses me because I can't reproduce it.

But it's hard to reach definite conclusions with this much measurement noise.

Another cache we might be hitting is the branch predictor.  Could you try
unrolling fast_mix2 and fast_mix4 and see what difference that makes?
(I'd send you a patch, but you could probably do it by hand faster than
applying one.)
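
To make "unrolling" concrete, here is a minimal sketch of the same two
rounds written rolled and unrolled.  The round function is a made-up
add-rotate-xor step chosen only for illustration; it is NOT the
fast_mix2 or fast_mix4 code from this thread, and the rotation
constants and the names mix_rolled, mix_unrolled and
mix_variants_agree are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rotate left; s must be in 1..31. */
static inline uint32_t rol32(uint32_t w, unsigned s)
{
	return (w << s) | (w >> (32 - s));
}

/* Hypothetical add-rotate-xor round, for illustration only. */
#define MIX_ROUND(a, b, c, d, r) do {	\
	a += b;           c += d;		\
	b = rol32(b, r);  d = rol32(d, r);	\
	b ^= a;           d ^= c;		\
} while (0)

/* Rolled: loop counter, rotation-table lookup and a loop branch. */
void mix_rolled(uint32_t p[4])
{
	static const unsigned rot[2] = { 7, 13 };
	unsigned i;

	for (i = 0; i < 2; i++) {
		uint32_t t;

		MIX_ROUND(p[0], p[1], p[2], p[3], rot[i]);
		/* Swap so the next round pairs the words differently. */
		t = p[1]; p[1] = p[3]; p[3] = t;
	}
}

/* Unrolled: the same two rounds as straight-line code.  The swaps
 * turn into a renaming of the macro arguments, so no branch, no
 * table lookup and no data movement remain. */
void mix_unrolled(uint32_t p[4])
{
	uint32_t a = p[0], b = p[1], c = p[2], d = p[3];

	MIX_ROUND(a, b, c, d, 7);
	MIX_ROUND(a, d, c, b, 13);
	p[0] = a; p[1] = b; p[2] = c; p[3] = d;
}

/* Returns 1 if both variants leave the pool in the same state. */
int mix_variants_agree(const uint32_t in[4])
{
	uint32_t x[4], y[4];

	memcpy(x, in, sizeof(x));
	memcpy(y, in, sizeof(y));
	mix_rolled(x);
	mix_unrolled(y);
	return memcmp(x, y, sizeof(x)) == 0;
}
```

Calling mix_variants_agree() on a few inputs is a cheap way to confirm
that a hand-unrolled version still computes the same function before
timing it.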

It only makes a slight difference on my high-end Intel box, but almost
doubles the speed on the Phenom:

Rolled (64-bit core, 2 rounds):
fast_mix: 293   fast_mix2: 205
fast_mix: 257   fast_mix2: 162
fast_mix: 170   fast_mix2: 137
fast_mix: 283   fast_mix2: 218
fast_mix: 270   fast_mix2: 185
fast_mix: 288   fast_mix2: 199
fast_mix: 423   fast_mix2: 131
fast_mix: 286   fast_mix2: 218
fast_mix: 681   fast_mix2: 165
fast_mix: 268   fast_mix2: 190

Unrolled (64-bit core, 2 rounds):
fast_mix: 394   fast_mix2: 108
fast_mix: 145   fast_mix2: 80
fast_mix: 270   fast_mix2: 112
fast_mix: 145   fast_mix2: 81
fast_mix: 145   fast_mix2: 79
fast_mix: 662   fast_mix2: 107
fast_mix: 145   fast_mix2: 78
fast_mix: 140   fast_mix2: 127
fast_mix: 164   fast_mix2: 182
fast_mix: 205   fast_mix2: 79

Since the original fast_mix is already unrolled, a branch-prediction
penalty wouldn't hit it.
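
With this much run-to-run noise, a robust summary helps when comparing
the columns.  The sketch below takes the median of the fast_mix2
figures copied from the two tables above.  Using the median is my
choice for illustration, not how the numbers were compared in this
thread, and the names median_cycles, fast_mix2_rolled and
fast_mix2_unrolled are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* fast_mix2 columns copied from the "Rolled" and "Unrolled" tables. */
unsigned fast_mix2_rolled[10] =
	{ 205, 162, 137, 218, 185, 199, 131, 218, 165, 190 };
unsigned fast_mix2_unrolled[10] =
	{ 108, 80, 112, 81, 79, 107, 78, 127, 182, 79 };

/* qsort comparator for unsigned values, ascending. */
static int cmp_unsigned(const void *a, const void *b)
{
	unsigned x = *(const unsigned *)a;
	unsigned y = *(const unsigned *)b;

	return (x > y) - (x < y);
}

/* Sorts v in place and returns its median.  For even n this is the
 * mean of the two middle samples, truncated to an integer.
 * n must be at least 1. */
unsigned median_cycles(unsigned *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_unsigned);
	if (n & 1)
		return v[n / 2];
	return (v[n / 2 - 1] + v[n / 2]) / 2;
}
```

On these samples median_cycles() gives 187 for the rolled column and
94 for the unrolled one, which matches the "almost doubles" reading
while ignoring the occasional outlier.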

> That being said, I wouldn't be at all surprised if there are some
> CPUs where the extra memory dereference to the twist_table[] would
> definitely hurt, since Intel's amazing cache architecture(tm) is no
> doubt covering a lot of sins.  I wouldn't be at all surprised if some
> of these new mixing functions would fare much better if we tried
> benchmarking them on an 32-bit ARM processor, for example....

Yes, Intel's D-caches are quite impressive.
