From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] x86: Use asm-goto to implement mutex fast path on x86-64
Date: Mon, 1 Jul 2013 13:11:22 +0200
Message-ID: <20130701111122.GA18772@gmail.com>
In-Reply-To: <20130701102306.GC23515@pd.tnic>


* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Jul 01, 2013 at 09:50:46AM +0200, Ingo Molnar wrote:
> > Not sure - the main thing we want to know is whether it gets faster.
> > The _amount_ will depend on things like precise usage patterns,
> > caching, etc. - but rarely does a real workload turn a win like this
> > into a loss.
> 
> Yep, and it does get faster by a whopping 6 seconds!
> 
> Almost all standard counters go down a bit.
> 
> Interestingly, branch misses get a slight increase; the asm goto
> version does actually jump to fail_fn from within the asm, so maybe
> this puzzles the branch predictor a bit. Although the instructions
> look the same and both jumps are forward.
> 
> Oh well, we don't know where those additional misses happened, so it
> could be somewhere else entirely, or it could simply be noise.
> 
> In any case, we're getting faster, so not worth investigating I guess.
> 
> 
> plain 3.10
> ==========
> 
>  Performance counter stats for '../build-kernel.sh' (5 runs):
> 
>     1312558.712266 task-clock                #    5.961 CPUs utilized            ( +-  0.02% )
>          1,036,629 context-switches          #    0.790 K/sec                    ( +-  0.24% )
>             55,118 cpu-migrations            #    0.042 K/sec                    ( +-  0.25% )
>         46,505,184 page-faults               #    0.035 M/sec                    ( +-  0.00% )
>  4,768,420,289,997 cycles                    #    3.633 GHz                      ( +-  0.02% ) [83.79%]
>  3,424,161,066,397 stalled-cycles-frontend   #   71.81% frontend cycles idle     ( +-  0.02% ) [83.78%]
>  2,483,143,574,419 stalled-cycles-backend    #   52.07% backend  cycles idle     ( +-  0.04% ) [67.40%]
>  3,091,612,061,933 instructions              #    0.65  insns per cycle
>                                              #    1.11  stalled cycles per insn  ( +-  0.01% ) [83.93%]
>    677,787,215,988 branches                  #  516.386 M/sec                    ( +-  0.01% ) [83.77%]
>     25,438,736,368 branch-misses             #    3.75% of all branches          ( +-  0.02% ) [83.78%]
> 
>      220.191740778 seconds time elapsed                                          ( +-  0.32% )
> 
>  + patch
> ========
> 
>  Performance counter stats for '../build-kernel.sh' (5 runs):
> 
>     1309995.427337 task-clock                #    6.106 CPUs utilized            ( +-  0.09% )
>          1,033,446 context-switches          #    0.789 K/sec                    ( +-  0.23% )
>             55,228 cpu-migrations            #    0.042 K/sec                    ( +-  0.28% )
>         46,484,992 page-faults               #    0.035 M/sec                    ( +-  0.00% )
>  4,759,631,961,013 cycles                    #    3.633 GHz                      ( +-  0.09% ) [83.78%]
>  3,415,933,806,156 stalled-cycles-frontend   #   71.77% frontend cycles idle     ( +-  0.12% ) [83.78%]
>  2,476,066,765,933 stalled-cycles-backend    #   52.02% backend  cycles idle     ( +-  0.10% ) [67.38%]
>  3,089,317,073,397 instructions              #    0.65  insns per cycle
>                                              #    1.11  stalled cycles per insn  ( +-  0.02% ) [83.95%]
>    677,623,252,827 branches                  #  517.271 M/sec                    ( +-  0.01% ) [83.79%]
>     25,444,376,740 branch-misses             #    3.75% of all branches          ( +-  0.02% ) [83.79%]
> 
>      214.533868029 seconds time elapsed                                          ( +-  0.36% )

Hm, a 6-second win (220.19s -> 214.53s, i.e. ~2.6% of wall-clock time)
looks _way_ too much - we don't execute that much mutex code, let alone
a portion of it.
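
For reference, the fast path in question is tiny - roughly this shape
(a paraphrased sketch of the asm-goto variant in the patch, assuming
the usual kernel definitions of LOCK_PREFIX and atomic_t; not a
verbatim copy):

  static inline void __mutex_fastpath_lock(atomic_t *v,
                                           void (*fail_fn)(atomic_t *))
  {
          /*
           * Locked decrement: if the count stays non-negative we own
           * the mutex and jump straight past the slow-path call.
           */
          asm volatile goto(LOCK_PREFIX "   decl %0\n"
                            "   jns %l[exit]\n"
                            : : "m" (v->counter)
                            : "memory", "cc"
                            : exit);
          fail_fn(v);     /* count went negative: contended slow path */
  exit:
          return;
  }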

This could perhaps be a systematic bootup-to-bootup cache-layout jitter
artifact, which isn't captured by the stddev observations?

Doing something like this with a relatively fresh version of perf:

  perf stat --repeat 10 -a --sync \
   --pre 'make -s O=defconfig-build/ clean; echo 1 > /proc/sys/vm/drop_caches' \
   make -s -j64 O=defconfig-build/ bzImage

... might do the trick (untested!). (Also note the use of -a: this should 
run on an otherwise quiescent system.)

As a side note, we could add this as a convenience feature, triggered via:

   perf stat --flush-vm-caches

... or so, in addition to the already existing --sync option.
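
A minimal sketch of what such an option's pre-run hook might do, in the
spirit of the --pre line above (hypothetical helper, untested; needs
root, and "1" drops the pagecache only - see Documentation/sysctl/vm.txt
for the other values):

  #include <errno.h>
  #include <fcntl.h>
  #include <unistd.h>

  /*
   * Hypothetical pre-run hook for a --flush-vm-caches option: write
   * back dirty data, then ask the kernel to drop the pagecache before
   * each measured run.
   */
  static int flush_vm_caches(void)
  {
          int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

          if (fd < 0)
                  return -errno;
          sync();                 /* flush dirty pages first */
          if (write(fd, "1\n", 2) != 2) {
                  int err = errno;
                  close(fd);
                  return -err;
          }
          close(fd);
          return 0;
  }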

Thanks,

	Ingo
