From: Andy Lutomirski <luto@MIT.EDU>
To: x86@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org, Andy Lutomirski <luto@MIT.EDU>
Subject: [RFT/PATCH v2 0/6] Micro-optimize vclock_gettime
Date: Wed,  6 Apr 2011 22:03:57 -0400
Message-ID: <cover.1302137785.git.luto@mit.edu>

This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost 30%
(tested on Sandy Bridge).

I'm hoping someone can test this with Ingo's time-warp-test, which
you can get here:

http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

You'll need to change TEST_CLOCK to 1.  I'm especially interested
in Core 2, Pentium D, Westmere, or AMD systems with usable TSCs.
(I've already tested on Sandy Bridge and Bloomfield (Xeon W3520)).

The changes and timings (fastest of 20 trials of 100M iters on Sandy
Bridge) are:

CLOCK_MONOTONIC: 22.09ns -> 15.66ns
CLOCK_REALTIME_COARSE: 4.23ns -> 3.44ns
CLOCK_MONOTONIC_COARSE: 5.65ns -> 4.23ns

x86-64: Clean up vdso/kernel shared variables

Because vsyscall_gtod_data's address isn't known until load time, the
code contains unnecessary address calculations.  The code is also
rather complicated.  Clean it up and use addresses that are known at
compile time.

x86-64: Optimize vread_tsc's barriers

This replaces lfence;rdtsc;lfence with a faster sequence with similar
ordering guarantees.
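
For reference, the sequence being replaced looks roughly like this
when written in userspace C.  This is only a sketch of the old
lfence;rdtsc;lfence pattern, not the new sequence and not the kernel
code (the kernel goes through rdtsc_barrier(), which uses
alternatives).  The name rdtsc_fenced is hypothetical, and the
intrinsics assume GCC or Clang on x86-64.

```c
#include <stdint.h>
#include <x86intrin.h>

/* Read the TSC with an lfence on each side (x86-64 only). */
uint64_t rdtsc_fenced(void)
{
	uint64_t tsc;

	_mm_lfence();		/* let earlier instructions finish first */
	tsc = __rdtsc();	/* read the time stamp counter */
	_mm_lfence();		/* hold back later instructions until the read is done */
	return tsc;
}
```

Each lfence keeps the TSC read from being reordered with the
surrounding instructions, which is where the cost comes from.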

x86-64: Don't generate cmov in vread_tsc

GCC likes to generate a cmov on a branch that's almost completely
predictable.  Force it to generate a real branch instead.
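
One way to discourage the cmov, sketched here under the assumption of
a GCC-compatible compiler, is to put an empty asm statement on the
unlikely path so that the two paths can no longer be merged into a
conditional move.  The function name clamp_to_last is hypothetical.
Whether a branch is actually emitted depends on the compiler version
and flags, so check the disassembly.

```c
#include <stdint.h>

/* Return now, but never a value below last. */
uint64_t clamp_to_last(uint64_t now, uint64_t last)
{
	/* Hint that this is the overwhelmingly common case. */
	if (__builtin_expect(now >= last, 1))
		return now;

	/*
	 * Empty asm with no operands: it emits no instructions, but
	 * the compiler must keep it on this path only, which makes a
	 * cmov-style merge of the two returns unattractive.
	 */
	__asm__ volatile ("");
	return last;
}
```

The result is the same as a plain max(now, last); only the generated
code is meant to differ.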

x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0

vset_normalize_timespec was more general than necessary.  Open-code
the appropriate normalization loops.  This is a big win for
CLOCK_MONOTONIC_COARSE.
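
To illustrate the idea, here is a sketch of a normalization loop that
relies on nsec never being negative, so only the overflow direction
needs handling.  It is an illustration rather than the code from the
patch, and normalize_nonneg is a hypothetical name.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Store sec/nsec into *ts, carrying whole seconds out of nsec.
 * Assumes nsec >= 0, so there is no loop for the negative case.
 */
void normalize_nonneg(struct timespec *ts, time_t sec, int64_t nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		sec++;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = (long)nsec;
}
```

For example, sec = 1 and nsec = 2500000000 normalize to tv_sec = 3 and
tv_nsec = 500000000.  When nsec is already below one second, the loop
body never runs.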

x86-64: Move vread_tsc into a new file with sensible options

This way vread_tsc doesn't have a frame pointer, which saves about
0.3ns.  I guess that the CPU's stack frame optimizations aren't quite
as good as I thought.

x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

We're building the vDSO with optimizations disabled that were meant
for kernel code.  Override that, except for -fno-omit-frame-pointer,
which might make userspace debugging harder.

Changes from v1:
 - Redo the vsyscall_gtod_data address patch to make the code
   cleaner instead of uglier and to make it work for all the
   vsyscall variables.
 - Improve the comments for clarity and formatting.
 - Fix up the changelog for the nsec < 0 tweak (the normalization
   code can't be inline because the two callers are different).
 - Move vread_tsc into its own file, removing a GCC version
   dependence and making it more maintainable.

Ingo, I looked at moving vread_tsc into a .S file, but I think
it would be less maintainable for a few reasons:
 - rdtsc_barrier() would need an assembly version.  It uses
   alternatives.
 - The code needs access to the VVAR magic, which would need an
   assembly-callable version.  (This wouldn't be so bad, but
   it's more code.)
 - It needs to know the offset of cycles_last.  This would
   involve adding an extra asm offset.
 - I don't think it's that bad in C, and the code it generates
   looks good.

Andy Lutomirski (6):
  x86-64: Clean up vdso/kernel shared variables
  x86-64: Optimize vread_tsc's barriers
  x86-64: Don't generate cmov in vread_tsc
  x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
  x86-64: Move vread_tsc into a new file with sensible options
  x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

 arch/x86/include/asm/tsc.h      |    4 +++
 arch/x86/include/asm/vdso.h     |   14 ----------
 arch/x86/include/asm/vgtod.h    |    2 -
 arch/x86/include/asm/vsyscall.h |   12 +-------
 arch/x86/include/asm/vvar.h     |   52 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile        |    8 +++--
 arch/x86/kernel/time.c          |    2 +-
 arch/x86/kernel/tsc.c           |   19 -------------
 arch/x86/kernel/vmlinux.lds.S   |   34 ++++++++----------------
 arch/x86/kernel/vread_tsc_64.c  |   55 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/vsyscall_64.c   |   46 ++++++++++++++------------------
 arch/x86/vdso/Makefile          |   17 ++++++++++-
 arch/x86/vdso/vclock_gettime.c  |   43 ++++++++++++++++--------------
 arch/x86/vdso/vdso.lds.S        |    7 -----
 arch/x86/vdso/vextern.h         |   16 -----------
 arch/x86/vdso/vgetcpu.c         |    3 +-
 arch/x86/vdso/vma.c             |   27 -------------------
 arch/x86/vdso/vvar.c            |   12 --------
 18 files changed, 189 insertions(+), 184 deletions(-)
 create mode 100644 arch/x86/include/asm/vvar.h
 create mode 100644 arch/x86/kernel/vread_tsc_64.c
 delete mode 100644 arch/x86/vdso/vextern.h
 delete mode 100644 arch/x86/vdso/vvar.c

-- 
1.7.4

Thread overview: 27+ messages
2011-04-07  2:03 Andy Lutomirski [this message]
2011-04-07  2:03 ` [RFT/PATCH v2 1/6] x86-64: Clean up vdso/kernel shared variables Andy Lutomirski
2011-04-07  8:08   ` Ingo Molnar
2011-04-07  2:03 ` [RFT/PATCH v2 2/6] x86-64: Optimize vread_tsc's barriers Andy Lutomirski
2011-04-07  8:25   ` Ingo Molnar
2011-04-07 11:44     ` Andrew Lutomirski
2011-04-07 15:23     ` Andi Kleen
2011-04-07 17:28       ` Ingo Molnar
2011-04-07 16:18   ` Linus Torvalds
2011-04-07 16:42     ` Andi Kleen
2011-04-07 17:20       ` Linus Torvalds
2011-04-07 18:15         ` Andi Kleen
2011-04-07 18:30           ` Linus Torvalds
2011-04-07 21:26             ` Andrew Lutomirski
2011-04-08 17:59               ` Andrew Lutomirski
2011-04-09 11:51                 ` Ingo Molnar
2011-04-07 21:43         ` Raghavendra D Prabhu
2011-04-07 22:52           ` Andi Kleen
2011-04-07  2:04 ` [RFT/PATCH v2 3/6] x86-64: Don't generate cmov in vread_tsc Andy Lutomirski
2011-04-07  7:54   ` Ingo Molnar
2011-04-07 11:25     ` Andrew Lutomirski
2011-04-07  2:04 ` [RFT/PATCH v2 4/6] x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 Andy Lutomirski
2011-04-07  7:57   ` Ingo Molnar
2011-04-07 11:27     ` Andrew Lutomirski
2011-04-07  2:04 ` [RFT/PATCH v2 5/6] x86-64: Move vread_tsc into a new file with sensible options Andy Lutomirski
2011-04-07  2:04 ` [RFT/PATCH v2 6/6] x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO Andy Lutomirski
2011-04-07  8:03   ` Ingo Molnar
