From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, tytso@mit.edu
Cc: hpa@linux.intel.com, linux-kernel@vger.kernel.org,
mingo@kernel.org, price@mit.edu
Subject: random: Benchmarking fast_mix2
Date: 12 Jun 2014 00:13:18 -0400
Message-ID: <20140612041318.11805.qmail@ns.horizon.com>
In-Reply-To: <20140612032248.GA2437@thunk.org>
> I redid my numbers, and I can no longer reproduce the 7x slowdown. I
> do see that if you compile w/o -O2, fast_mix2 is twice as slow. But
> it's not 7x slower.
For my single-round version, I needed to drop to 2 loops rather than 3
to match the speed. That's in the source I posted, but I didn't point
it out.
(It wasn't an attempt to be deceptive; that's just how I happened
to have left the file when I was experimenting with various options.
I figured if we were looking for 7x, 1.5x wasn't all that important.)
That explains some of the residual difference between our figures.
When developing, I was using a many-iteration benchmark, and I suspect it
fitted in the Ivy Bridge uop cache, which let it saturate the execution
resources.
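To make the distinction concrete, here is a rough sketch of the two
timing styles. It is not either of the programs discussed in this
thread: toy_mix(), cycles(), time_hot() and time_once() are made-up
names, and toy_mix() is only a placeholder, not fast_mix or fast_mix2.
The hot-loop version averages over many back-to-back calls, so the code
stays warm in the caches; the single-shot version also pays for the
counter reads and whatever state the caches happen to be in.

```c
#include <stdint.h>
#include <time.h>

/* Cycle counter: rdtsc on x86.  Elsewhere fall back to clock() ticks
 * (an assumption so the sketch still builds; the units then differ). */
static inline uint64_t cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
#else
	return (uint64_t)clock();
#endif
}

/* Hypothetical stand-in for the function under test. */
static void toy_mix(uint32_t pool[4], uint32_t in)
{
	pool[0] ^= in;
	pool[1] += pool[0];
	pool[2] ^= (pool[1] << 7) | (pool[1] >> 25);	/* rotate left by 7 */
	pool[3] += pool[2];
}

/* Hot-loop style: average cost over many consecutive calls. */
static uint64_t time_hot(uint32_t pool[4], unsigned iters)
{
	uint64_t start;
	unsigned i;

	if (iters == 0)
		return 0;
	start = cycles();
	for (i = 0; i < iters; i++)
		toy_mix(pool, i);
	return (cycles() - start) / iters;
}

/* Single-shot style: cost of one call, counter overhead included. */
static uint64_t time_once(uint32_t pool[4])
{
	uint64_t start = cycles();

	toy_mix(pool, 1);
	return cycles() - start;
}
```

The two numbers are not expected to agree; the per-call figure from
time_hot() will typically be the smaller one.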
Sorry for the premature alarm; I'll go back to work and find something
better.
I still get comparable speed for 2 loops and -O2:
$ cc -W -Wall -m32 -O2 -march=native random.c -o random32
# ./perftest ../spooky/random32
pool 1 = 85670974 e96b1f8f 51244abf 5863283f
pool 2 = 03564c6c eba81d03 55c77fa1 760374a7
0: 148 124 (-24)
1: 48 36 (-12)
2: 40 36 (-4)
3: 44 40 (-4)
4: 44 40 (-4)
5: 36 36 (+0)
6: 52 36 (-16)
7: 44 32 (-12)
8: 44 36 (-8)
9: 48 36 (-12)
$ cc -W -Wall -m64 -O2 -march=native random.c -o random64
# ./perftest ../spooky/random64
pool 1 = 85670974 e96b1f8f 51244abf 5863283f
pool 2 = 03564c6c eba81d03 55c77fa1 760374a7
0: 132 104 (-28)
1: 40 40 (+0)
2: 36 44 (+8)
3: 32 40 (+8)
4: 40 36 (-4)
5: 32 40 (+8)
6: 36 44 (+8)
7: 40 40 (+0)
8: 36 44 (+8)
9: 40 36 (-4)
$ cc -W -Wall -m32 -O3 -march=native random.c -o random32
# ./perftest ./random32
pool 1 = 85670974 e96b1f8f 51244abf 5863283f
pool 2 = 03564c6c eba81d03 55c77fa1 760374a7
0: 88 48 (-40)
1: 36 40 (+4)
2: 36 44 (+8)
3: 32 40 (+8)
4: 36 40 (+4)
5: 96 40 (-56)
6: 40 40 (+0)
7: 36 40 (+4)
8: 28 48 (+20)
9: 28 40 (+12)
$ cc -W -Wall -m64 -O3 -march=native random.c -o random64
# ./perftest ./random64
pool 1 = 85670974 e96b1f8f 51244abf 5863283f
pool 2 = 03564c6c eba81d03 55c77fa1 760374a7
0: 72 80 (+8)
1: 36 52 (+16)
2: 32 36 (+4)
3: 32 36 (+4)
4: 28 40 (+12)
5: 32 40 (+8)
6: 32 40 (+8)
7: 32 36 (+4)
8: 28 44 (+16)
9: 36 36 (+0)
$ cc -W -Wall -m32 -Os -march=native random.c -o random32
# ./perftest ./random32
pool 1 = 85670974 e96b1f8f 51244abf 5863283f
pool 2 = 03564c6c eba81d03 55c77fa1 760374a7
0: 108 132 (+24)
1: 44 44 (+0)
2: 76 40 (-36)
3: 44 48 (+4)
4: 36 40 (+4)
5: 32 44 (+12)
6: 40 56 (+16)
7: 44 36 (-8)
8: 44 40 (-4)
9: 32 40 (+8)
$ cc -W -Wall -m64 -Os -march=native random.c -o random64
# ./perftest ./random64
pool 1 = 85670974 e96b1f8f 51244abf 5863283f
pool 2 = 03564c6c eba81d03 55c77fa1 760374a7
0: 96 108 (+12)
1: 44 52 (+8)
2: 40 40 (+0)
3: 40 36 (-4)
4: 40 32 (-8)
5: 36 36 (+0)
6: 44 32 (-12)
7: 36 36 (+0)
8: 40 36 (-4)
9: 40 36 (-4)
Yours looks much more careful about the timing.
A few GCC warnings I ended up fixing:
1) "volatile" on rdtsc is meaningless and ignored (with a warning)
2) fast_mix2() needs a void return type; it defaults to int.
3) int main() needs a "return 0"
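For anyone following along, a minimal sketch of what the first two
fixes look like. This is not the actual test program: the inline-asm
rdtsc() wrapper and the body of fast_mix2() below are assumptions made
up for illustration only. main() is left out to keep the fragment
short; the third fix is simply ending it with "return 0;", which
matters when compiling as C89/gnu89, where falling off the end of
main() is not an implicit return 0.

```c
#include <stdint.h>

/* 1) The return type is a plain uint64_t.  A "volatile" qualifier on a
 *    function's return type has no effect, and gcc -W warns that it is
 *    ignored.  The asm statement itself stays volatile so the compiler
 *    does not drop or merge the counter reads. */
static inline uint64_t rdtsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
#else
	return 0;	/* assumption: dummy counter on non-x86 builds */
#endif
}

/* 2) Explicit void return type instead of the implicit int.
 *    The body is a placeholder, NOT the real fast_mix2. */
static void fast_mix2(uint32_t pool[4], uint32_t input)
{
	pool[0] ^= input;
	pool[1] += pool[0];
}

/* Time one call; returns the elapsed counter ticks. */
static uint64_t time_one_call(uint32_t pool[4], uint32_t input)
{
	uint64_t start = rdtsc();

	fast_mix2(pool, input);
	return rdtsc() - start;
}
```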
Here's what I got running *your* program, unmodified except
for the above (meaning 3 inner loop iterations).
Compiled with GCC 4.9.0 (Debian 4.9.0-6), -O2.
i7-4940K# ./perftest ./ted32
fast_mix: 430 fast_mix2: 431
fast_mix: 442 fast_mix2: 464
fast_mix: 442 fast_mix2: 465
fast_mix: 442 fast_mix2: 431
fast_mix: 442 fast_mix2: 465
fast_mix: 431 fast_mix2: 430
fast_mix: 442 fast_mix2: 431
fast_mix: 431 fast_mix2: 465
fast_mix: 431 fast_mix2: 465
fast_mix: 431 fast_mix2: 431
i7-4940K# ./perftest ./ted64
fast_mix: 454 fast_mix2: 465
fast_mix: 453 fast_mix2: 465
fast_mix: 442 fast_mix2: 464
fast_mix: 453 fast_mix2: 464
fast_mix: 454 fast_mix2: 465
fast_mix: 453 fast_mix2: 465
fast_mix: 442 fast_mix2: 464
fast_mix: 453 fast_mix2: 464
fast_mix: 453 fast_mix2: 464
fast_mix: 453 fast_mix2: 465
In other words, pretty damn near the same
speed (with 3 loops).
So we still have some discrepancy to track down.
Results from a few other machines:
i5-3330$ /tmp/ted32
fast_mix: 226 fast_mix2: 277
fast_mix: 561 fast_mix2: 429
fast_mix: 156 fast_mix2: 406
fast_mix: 504 fast_mix2: 534
fast_mix: 579 fast_mix2: 270
fast_mix: 240 fast_mix2: 270
fast_mix: 494 fast_mix2: 270
fast_mix: 240 fast_mix2: 138
fast_mix: 750 fast_mix2: 277
fast_mix: 124 fast_mix2: 270
i5-3330$ /tmp/ted64
fast_mix: 224 fast_mix2: 277
fast_mix: 226 fast_mix2: 312
fast_mix: 646 fast_mix2: 276
fast_mix: 233 fast_mix2: 456
fast_mix: 591 fast_mix2: 570
fast_mix: 413 fast_mix2: 563
fast_mix: 584 fast_mix2: 270
fast_mix: 231 fast_mix2: 261
fast_mix: 233 fast_mix2: 459
fast_mix: 528 fast_mix2: 277
Pentium4$ /tmp/ted32
fast_mix: 912 fast_mix2: 396
fast_mix: 792 fast_mix2: 160
fast_mix: 524 fast_mix2: 160
fast_mix: 1460 fast_mix2: 440
fast_mix: 496 fast_mix2: 160
fast_mix: 672 fast_mix2: 160
fast_mix: 700 fast_mix2: 160
fast_mix: 336 fast_mix2: 540
fast_mix: 896 fast_mix2: 160
fast_mix: 1052 fast_mix2: 156
Phenom9850$ /tmp/ted32
fast_mix: 463 fast_mix2: 158
fast_mix: 276 fast_mix2: 174
fast_mix: 194 fast_mix2: 135
fast_mix: 620 fast_mix2: 424
fast_mix: 584 fast_mix2: 424
fast_mix: 610 fast_mix2: 418
fast_mix: 651 fast_mix2: 1107
fast_mix: 634 fast_mix2: 439
fast_mix: 632 fast_mix2: 456
fast_mix: 534 fast_mix2: 205
Phenom9850$ /tmp/ted64
fast_mix: 783 fast_mix2: 185
fast_mix: 903 fast_mix2: 144
fast_mix: 955 fast_mix2: 178
fast_mix: 515 fast_mix2: 437
fast_mix: 642 fast_mix2: 580
fast_mix: 610 fast_mix2: 525
fast_mix: 523 fast_mix2: 119
fast_mix: 180 fast_mix2: 315
fast_mix: 596 fast_mix2: 570
fast_mix: 598 fast_mix2: 775
AthlonXP$ /tmp/ted32
fast_mix: 119 fast_mix2: 113
fast_mix: 139 fast_mix2: 109
fast_mix: 155 fast_mix2: 123
fast_mix: 134 fast_mix2: 140
fast_mix: 126 fast_mix2: 154
fast_mix: 134 fast_mix2: 113
fast_mix: 176 fast_mix2: 140
fast_mix: 145 fast_mix2: 113
fast_mix: 134 fast_mix2: 144
fast_mix: 155 fast_mix2: 112
So I'm still a bit confused. Would any bystanders like to
chip in? Ted, shall I send you some binaries?