From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, tytso@mit.edu
Cc: hpa@linux.intel.com, linux-kernel@vger.kernel.org,
mingo@kernel.org, price@mit.edu
Subject: Re: random: Benchmarking fast_mix2
Date: 12 Jun 2014 20:23:04 -0400
Message-ID: <20140613002304.17318.qmail@ns.horizon.com>
In-Reply-To: <20140612204622.GB3112@thunk.org>

> So I just tried your modified 32-bit mixing function where you moved
> the rotation to the middle step instead of the last step. With the
> usleep(), it doesn't make any difference:
>
> # schedtool -R -p 1 -e /tmp/fast_mix2_48
> fast_mix: 212 fast_mix2: 400 fast_mix3: 400
> fast_mix: 208 fast_mix2: 408 fast_mix3: 388
> fast_mix: 208 fast_mix2: 396 fast_mix3: 404
> fast_mix: 224 fast_mix2: 408 fast_mix3: 392
> fast_mix: 200 fast_mix2: 404 fast_mix3: 404
> fast_mix: 208 fast_mix2: 412 fast_mix3: 396
> fast_mix: 208 fast_mix2: 392 fast_mix3: 392
> fast_mix: 212 fast_mix2: 408 fast_mix3: 388
> fast_mix: 200 fast_mix2: 716 fast_mix3: 773
> fast_mix: 426 fast_mix2: 717 fast_mix3: 728
>
> And here is my testing using your 64-bit variant:
>
> # schedtool -R -p 1 -e /tmp/fast_mix2_49
> fast_mix: 294 fast_mix2: 476 fast_mix4: 442
> fast_mix: 286 fast_mix2: 1058 fast_mix4: 448
> fast_mix: 958 fast_mix2: 460 fast_mix4: 1002
> fast_mix: 940 fast_mix2: 1176 fast_mix4: 826
> fast_mix: 476 fast_mix2: 840 fast_mix4: 826
> fast_mix: 462 fast_mix2: 840 fast_mix4: 826
> fast_mix: 462 fast_mix2: 826 fast_mix4: 826
> fast_mix: 462 fast_mix2: 826 fast_mix4: 826
> fast_mix: 462 fast_mix2: 826 fast_mix4: 826
> fast_mix: 462 fast_mix2: 840 fast_mix4: 826
>
> The bottom line is that what we are primarily measuring here is all
> different cache effects. And these are going to be quite different on
> different microarchitectures.

So adding fast_mix4 doubled the time taken by fast_mix.
Yeah, that's trustworthy timing! :-)

Still, you do seem to observe a pretty consistent factor of about 2x
difference, which confuses me because I can't reproduce it.
But it's hard to reach definite conclusions with this much measurement noise.

Another cache we might be hitting is the branch predictor. Could you try
unrolling fast_mix2 and fast_mix4 and see what difference that makes?
(I'd send you a patch but you could probably do it by hand faster than
applying one.)
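
For readers unfamiliar with the rolled/unrolled distinction, here is a minimal sketch. It is not the fast_mix2 or fast_mix4 code from this thread: the add-rotate-xor round, the rotation constants, and the names `ROUNDS`, `mix_rolled`, and `mix_unrolled` are all placeholders chosen for illustration.

```c
#include <stdint.h>

#define ROUNDS 2	/* placeholder round count for this sketch */

/* Rotate a 32-bit word left by k bits; valid for 0 < k < 32. */
static inline uint32_t rol32(uint32_t w, unsigned int k)
{
	return (w << k) | (w >> (32 - k));
}

/* Rolled: the round body appears once, inside a counted loop. */
static void mix_rolled(uint32_t pool[4], const uint32_t in[4])
{
	uint32_t a = pool[0] ^ in[0], b = pool[1] ^ in[1];
	uint32_t c = pool[2] ^ in[2], d = pool[3] ^ in[3];
	int i;

	for (i = 0; i < ROUNDS; i++) {
		a += b;			c += d;
		b = rol32(b, 6);	d = rol32(d, 27);
		d ^= a;			b ^= c;
	}
	pool[0] = a; pool[1] = b; pool[2] = c; pool[3] = d;
}

/* Unrolled: the same two rounds written out, with no loop branch. */
static void mix_unrolled(uint32_t pool[4], const uint32_t in[4])
{
	uint32_t a = pool[0] ^ in[0], b = pool[1] ^ in[1];
	uint32_t c = pool[2] ^ in[2], d = pool[3] ^ in[3];

	/* round 1 */
	a += b;			c += d;
	b = rol32(b, 6);	d = rol32(d, 27);
	d ^= a;			b ^= c;

	/* round 2 */
	a += b;			c += d;
	b = rol32(b, 6);	d = rol32(d, 27);
	d ^= a;			b ^= c;

	pool[0] = a; pool[1] = b; pool[2] = c; pool[3] = d;
}
```

Both functions compute the same result; only the control flow differs. An optimizing compiler may unroll such a short loop on its own, so the two can end up as identical machine code depending on compiler and flags.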

It only makes a slight difference on my high-end Intel box, but almost
doubles the speed on the Phenom:

Rolled (64-bit core, 2 rounds):
fast_mix: 293 fast_mix2: 205
fast_mix: 257 fast_mix2: 162
fast_mix: 170 fast_mix2: 137
fast_mix: 283 fast_mix2: 218
fast_mix: 270 fast_mix2: 185
fast_mix: 288 fast_mix2: 199
fast_mix: 423 fast_mix2: 131
fast_mix: 286 fast_mix2: 218
fast_mix: 681 fast_mix2: 165
fast_mix: 268 fast_mix2: 190

Unrolled (64-bit core, 2 rounds):
fast_mix: 394 fast_mix2: 108
fast_mix: 145 fast_mix2: 80
fast_mix: 270 fast_mix2: 112
fast_mix: 145 fast_mix2: 81
fast_mix: 145 fast_mix2: 79
fast_mix: 662 fast_mix2: 107
fast_mix: 145 fast_mix2: 78
fast_mix: 140 fast_mix2: 127
fast_mix: 164 fast_mix2: 182
fast_mix: 205 fast_mix2: 79

Since the original fast_mix is already unrolled, a branch-prediction
penalty wouldn't hit it.
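
As a rough check on the "almost doubles" reading, the fast_mix2 columns of the two Phenom tables can be reduced to medians, which are less sensitive to outliers such as the 182 sample. A minimal sketch: the helper names `cmp_int` and `median` and the array names are made up for this example, and only the sample values come from the tables.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Three-way integer comparison for qsort(). */
static int cmp_int(const void *x, const void *y)
{
	int a = *(const int *)x, b = *(const int *)y;

	return (a > b) - (a < b);
}

/* Median of n samples (1 <= n <= 16); the input array is not modified. */
static double median(const int *samples, size_t n)
{
	int tmp[16];

	memcpy(tmp, samples, n * sizeof(*tmp));
	qsort(tmp, n, sizeof(*tmp), cmp_int);
	return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

/* fast_mix2 samples copied from the rolled and unrolled tables. */
static const int fast_mix2_rolled[10] = {
	205, 162, 137, 218, 185, 199, 131, 218, 165, 190
};
static const int fast_mix2_unrolled[10] = {
	108, 80, 112, 81, 79, 107, 78, 127, 182, 79
};
```

Calling `median()` on the two arrays gives 187.5 for the rolled samples and 94 for the unrolled ones, a ratio of about 2.0.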

> That being said, I wouldn't be at all surprised if there are some
> CPUs where the extra memory dereference to the twist_table[] would
> definitely hurt, since Intel's amazing cache architecture(tm) is no
> doubt covering a lot of sins. I wouldn't be at all surprised if some
> of these new mixing functions would fare much better if we tried
> benchmarking them on an 32-bit ARM processor, for example....

Yes, Intel's D-caches are quite impressive.