* Slowdown due to threads bouncing between HT cores
@ 2014-10-03 19:44 Steinar H. Gunderson
From: Steinar H. Gunderson @ 2014-10-03 19:44 UTC (permalink / raw)
  To: linux-kernel

Hi,

I did a chess benchmark of my new machine (2x E5-2650v3, so 20x2.3GHz
Haswell-EP), and it performed a bit worse than comparable Windows setups.
It looks like the scheduler somehow doesn't perform as well with
hyperthreading; HT is on in the BIOS, but I'm only using 20 threads
(chess scales sublinearly, so using all 40 usually isn't a good idea),
so really, the threads should just get one core each and that's it.
It looks like they are bouncing between cores, reducing overall performance
by ~20% for some reason. (The machine is otherwise generally idle.)

First some details to reproduce more easily. Kernel version is 3.16.3, 64-bit
x86, Debian stable (so gcc 4.7.2). The benchmark binary is a chess engine
known as Stockfish; this is the compile I used (because that's what everyone
else is benchmarking with):

  http://abrok.eu/stockfish/builds/dbd6156fceaf9bec8e9ff14f99c325c36b284079/linux64modernsse/stockfish_13111907_x64_modern_sse42

Stockfish is GPL, so the source is readily available if you should need it.

The benchmark is run by just starting the binary, then giving it these
commands one by one:

uci
setoption name Threads value 20
setoption name Hash value 1024
position fen rnbq1rk1/pppnbppp/4p3/3pP1B1/3P3P/2N5/PPP2PP1/R2QKBNR w KQ - 0 7
go wtime 7200000 winc 30000 btime 7200000 binc 30000

After ~3 minutes, it will output “bestmove d1g4 ponder f8e8”. A few lines
above that, you'll see a line with something similar to “nps 13266463”.
That's nodes per second, and you want it to be higher.
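
For anyone who would rather script the run than type the commands by hand,
here is a minimal sketch in Python. It assumes the engine accepts all the
commands piped to stdin up front and prints standard UCI "info ... nps N"
lines; the helper names (nps_from_line, run_benchmark) and the way the binary
path is passed in are made up for illustration, not part of Stockfish.

```python
import subprocess
import sys

# The same commands as listed above; the en passant field of the FEN is a
# plain ASCII "-".
COMMANDS = [
    "uci",
    "setoption name Threads value 20",
    "setoption name Hash value 1024",
    "position fen rnbq1rk1/pppnbppp/4p3/3pP1B1/3P3P/2N5/PPP2PP1/R2QKBNR w KQ - 0 7",
    "go wtime 7200000 winc 30000 btime 7200000 binc 30000",
]


def nps_from_line(line):
    """Return the integer after the 'nps' token of a UCI info line, else None."""
    tokens = line.split()
    if tokens and tokens[0] == "info" and "nps" in tokens:
        i = tokens.index("nps")
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            return int(tokens[i + 1])
    return None


def run_benchmark(binary):
    """Feed COMMANDS to the engine and return the last nps seen before bestmove."""
    proc = subprocess.Popen(
        [binary], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    for command in COMMANDS:
        proc.stdin.write(command + "\n")
    proc.stdin.flush()

    last_nps = None
    for line in proc.stdout:
        nps = nps_from_line(line)
        if nps is not None:
            last_nps = nps
        if line.startswith("bestmove"):
            break

    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return last_nps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python bench.py ./stockfish_13111907_x64_modern_sse42
    print(run_benchmark(sys.argv[1]))
```

Running it under taskset (or after changing the cpufreq governor) should give
numbers comparable to the manual procedure, give or take the usual noise.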

So, benchmark:

 - Default: 13266 kN/sec
 - Change from ondemand to performance on all cores: 14600 kN/sec
 - taskset -c 0-19 (locking affinity to only one set of hyperthreads):
   17512 kN/sec
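
To put the three numbers on a common scale, here is a small Python sketch that
expresses each configuration relative to the default run and relative to the
pinned run. It only uses the figures listed above; the labels are just names
for this example.

```python
# Throughput in kN/sec, copied from the list above.
results = {
    "default": 13266,
    "performance governor": 14600,
    "taskset -c 0-19": 17512,
}

baseline = results["default"]
best = results["taskset -c 0-19"]


def percent_gain(value, reference):
    """Percentage by which value exceeds reference."""
    return (value / reference - 1.0) * 100.0


def percent_loss(value, reference):
    """Percentage by which value falls short of reference."""
    return (1.0 - value / reference) * 100.0


for name, knps in results.items():
    print(
        f"{name:22s} {knps:6d} kN/sec  "
        f"{percent_gain(knps, baseline):+5.1f}% vs default  "
        f"{-percent_loss(knps, best):+5.1f}% vs pinned"
    )
```

By this arithmetic the pinned run is roughly a third faster than the default,
and the default is a bit under a quarter slower than the pinned run.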

There is some local variation, but it's typically within a few percent.
Does anyone know what's going on? I have CONFIG_SCHED_SMT=y and
CONFIG_SCHED_MC=y.
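
The taskset result assumes that CPUs 0-19 are all distinct physical cores,
with their hyperthread siblings numbered 20-39. A rough Python sketch for
checking that on a given machine, and for picking one logical CPU per core
regardless of numbering, could look like the following. It reads the
thread_siblings_list files under sysfs; the function names are hypothetical,
and pin_to_one_thread_per_core() is only an illustration of what taskset does.

```python
import glob
import os
import re

SIBLINGS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"


def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0,20' or '0-1' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def one_cpu_per_core(siblings_by_cpu):
    """Pick the lowest-numbered logical CPU from each group of HT siblings."""
    chosen = set()
    covered = set()
    for cpu in sorted(siblings_by_cpu):
        if cpu in covered:
            continue
        chosen.add(cpu)
        covered.add(cpu)
        covered.update(siblings_by_cpu[cpu])
    return chosen


def read_siblings():
    """Map each logical CPU number to its set of hyperthread siblings (Linux only)."""
    siblings = {}
    for path in glob.glob(SIBLINGS_GLOB):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            siblings[cpu] = parse_cpulist(f.read())
    return siblings


def pin_to_one_thread_per_core():
    """Restrict the calling process (and future children) to one CPU per core."""
    cpus = one_cpu_per_core(read_siblings())
    if cpus:
        os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    # Only prints the selection; call pin_to_one_thread_per_core() to apply it.
    print(sorted(one_cpu_per_core(read_siblings())))
```

If the printed list is not 0 through 19 on this box, the taskset mask above
would need adjusting accordingly.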

/* Steinar */
-- 
Homepage: http://www.sesse.net/


Thread overview: 24+ messages
2014-10-03 19:44 Slowdown due to threads bouncing between HT cores Steinar H. Gunderson
2014-10-03 21:11 ` Marc Burkhardt
2014-10-03 21:14   ` Steinar H. Gunderson
2014-10-04  9:22     ` Marc Burkhardt
2014-10-04 13:41 ` Andi Kleen
2014-10-04 14:12   ` Steinar H. Gunderson
2014-10-04 14:50 ` Chuck Ebbert
2014-10-05 11:19   ` Steinar H. Gunderson
2014-10-08 15:37 ` bisected: futex regression >= 3.14 - was - " Mike Galbraith
2014-10-08 16:14   ` Thomas Gleixner
2014-10-08 16:45     ` Steinar H. Gunderson
2014-10-08 17:52       ` Mike Galbraith
2014-10-08 16:23   ` Steinar H. Gunderson
2014-10-08 17:04   ` Linus Torvalds
2014-10-08 17:05     ` Steinar H. Gunderson
2014-10-08 17:59     ` Mike Galbraith
2014-10-24 15:25       ` Thomas Gleixner
2014-10-24 16:38         ` Mike Galbraith
2014-10-26 10:39           ` Steinar H. Gunderson
2014-10-26 13:16             ` Mike Galbraith
2014-10-26 13:58               ` Mike Galbraith
2014-10-26 14:11                 ` Steinar H. Gunderson
2014-10-26 14:41                   ` Mike Galbraith
2014-10-27 10:05                   ` Mike Galbraith
