[MODERATED] Some microperf tests

* [MODERATED] Some microperf tests
@ 2019-02-23 18:26 Andrew Cooper
  2019-02-23 19:30 ` [MODERATED] " Linus Torvalds
  2019-03-07 14:26 ` [MODERATED] Updated " Andrew Cooper
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Cooper @ 2019-02-23 18:26 UTC (permalink / raw)
  To: speck


Hello,

So I've finally got my Coffee Lake system and alpha microcode working.

All numbers are the deltas between two RDTSCP instructions, with only the
single instruction under test between them, plus just enough
compiler-inserted MOVs to preserve the output of the first RDTSCP for the
later calculation.

(Insert some disclaimer about these not being statistically rigorous,
but they do at least give a rough ballpark.)
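A minimal sketch of this style of measurement, for readers who want to reproduce it. It assumes x86-64, GCC or Clang (for `__rdtscp()` from `<x86intrin.h>`) and user mode, where VERW is permitted. The helper names (`read_ds`, `time_verw_once`, `time_verw`) are hypothetical and not taken from the original test. The sketch uses the memory-operand form of VERW; whether the original test used a memory or a register operand is not stated. MSR_FLUSH_CMD is not covered because WRMSR is privileged and needs ring 0.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp(): GCC/Clang, x86 only */

/* Smallest and largest delta seen over a number of runs. */
struct cycle_range {
    uint64_t lo;
    uint64_t hi;
};

/* Read the current %ds selector (allowed from user mode). */
static uint16_t read_ds(void)
{
    uint16_t sel;
    __asm__ volatile("mov %%ds, %0" : "=r"(sel));
    return sel;
}

/*
 * Time one memory-operand VERW of `sel` between two RDTSCP reads.
 * Note: RDTSCP counts TSC reference ticks, which equal core cycles
 * only when the core runs at the nominal TSC frequency.
 */
static uint64_t time_verw_once(uint16_t sel)
{
    unsigned int aux;                 /* receives IA32_TSC_AUX; unused */
    uint64_t start = __rdtscp(&aux);
    __asm__ volatile("verw %0" : : "m"(sel) : "cc");  /* VERW writes ZF */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* Repeat the measurement and keep the smallest and largest deltas. */
static struct cycle_range time_verw(uint16_t sel, int runs)
{
    struct cycle_range r = { UINT64_MAX, 0 };
    for (int i = 0; i < runs; i++) {
        uint64_t d = time_verw_once(sel);
        if (d < r.lo)
            r.lo = d;
        if (d > r.hi)
            r.hi = d;
    }
    return r;
}
```

For example, `time_verw(0, 1000)` times VERW of the NUL selector and `time_verw(read_ds(), 1000)` times VERW of the current %ds; the `lo`/`hi` pair is the same kind of range quoted in the results.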

Pre microcode:
* VERW of NUL   => 65-69 cycles
* VERW of %ds   => 33-37 cycles
* MSR_FLUSH_CMD => 925-980 cycles

Post microcode:
* VERW of NUL   => 512-520 cycles
* VERW of %ds   => 520-540 cycles
* MSR_FLUSH_CMD => 1300-1500 cycles

So MSR_FLUSH_CMD has got longer, but not by as much as VERW has.  Pre
microcode, the "use %ds" advice is clearly a win, but post microcode it
appears to be fractionally worse.
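To put the comparison in numbers, the sketch below takes the midpoint of each reported range and computes the cycles added and the slowdown factor. The inputs are exactly the ranges listed above; the struct and function names are hypothetical, and midpoints are only a rough summary given the caveat about statistical rigour.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the results above, in cycles, as reported. */
struct measurement {
    const char *name;
    double pre_lo, pre_hi;    /* pre microcode range */
    double post_lo, post_hi;  /* post microcode range */
};

static double mid(double lo, double hi) { return (lo + hi) / 2.0; }

/* Absolute increase of the midpoint, in cycles. */
static double added_cycles(const struct measurement *m)
{
    return mid(m->post_lo, m->post_hi) - mid(m->pre_lo, m->pre_hi);
}

/* Relative slowdown of the midpoint (post / pre). */
static double slowdown(const struct measurement *m)
{
    return mid(m->post_lo, m->post_hi) / mid(m->pre_lo, m->pre_hi);
}

static const struct measurement results[] = {
    { "VERW of NUL",    65,  69,  512,  520 },
    { "VERW of %ds",    33,  37,  520,  540 },
    { "MSR_FLUSH_CMD", 925, 980, 1300, 1500 },
};

/* Print cycles added and slowdown factor for each row. */
static void print_summary(void)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
        const struct measurement *m = &results[i];
        printf("%-14s +%.1f cycles, x%.2f\n",
               m->name, added_cycles(m), slowdown(m));
    }
}
```

On these inputs the midpoints give +449 cycles (about 7.7x) for VERW of NUL, +495 cycles (about 15.1x) for VERW of %ds and +447.5 cycles (about 1.5x) for MSR_FLUSH_CMD. The same midpoints put VERW of %ds 32 cycles faster than NUL pre microcode and 14 cycles slower post microcode.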

I've raised the selector question with Intel; it's possible that this is a
side effect of this alpha microcode being an early prototype, or that
this particular system differs from most older parts.

Thanks,

~Andrew
