From: Ingo Molnar <mingo@kernel.org>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>, x86-ml <x86@kernel.org>,
	Yves Dionne <yves.dionne@gmail.com>,
	Brice Goglin <Brice.Goglin@inria.fr>,
	Peter Zijlstra <peterz@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] x86/CPU/AMD: Bring back Compute Unit ID
Date: Thu, 2 Feb 2017 17:29:01 +0100
Message-ID: <20170202162901.GB12498@gmail.com>
In-Reply-To: <CY4PR12MB164014100E42E4970B846B18F84C0@CY4PR12MB1640.namprd12.prod.outlook.com>


* Ghannam, Yazen <Yazen.Ghannam@amd.com> wrote:

> Here are my results on a 32C Bulldozer system with an SSD. Also, I use ccache so 
> I added "ccache -C" in the pre-build script so the cache gets cleared.
> 
> Before:
> Performance counter stats for 'make -s -j65 bzImage' (3 runs):
> 
>     2375752.777479      task-clock (msec)         #   23.589 CPUs utilized            ( +-  0.35% )
>          1,198,979      context-switches          #    0.505 K/sec                    ( +-  0.34% )
>      8,964,671,259      cache-misses                                                  ( +-  0.44% )
>             79,399      cpu-migrations            #    0.033 K/sec                    ( +-  1.92% )
>         37,840,875      page-faults               #    0.016 M/sec                    ( +-  0.20% )
>  5,425,612,846,538      cycles                    #    2.284 GHz                      ( +-  0.36% )
>  3,367,750,745,825      instructions              #    0.62  insn per cycle                                              ( +-  0.11% )
>    750,591,286,261      branches                  #  315.938 M/sec                    ( +-  0.11% )
>     43,544,059,077      branch-misses             #    5.80% of all branches          ( +-  0.08% )
> 
>      100.716043494 seconds time elapsed                                          ( +-  1.97% )
> 
> After:
> Performance counter stats for 'make -s -j65 bzImage' (3 runs):
> 
>     1736720.488346      task-clock (msec)         #   23.529 CPUs utilized            ( +-  0.16% )
>          1,144,737      context-switches          #    0.659 K/sec                    ( +-  0.20% )
>      8,570,352,975      cache-misses                                                  ( +-  0.33% )
>             91,817      cpu-migrations            #    0.053 K/sec                    ( +-  1.67% )
>         37,688,118      page-faults               #    0.022 M/sec                    ( +-  0.03% )
>  5,547,082,899,245      cycles                    #    3.194 GHz                      ( +-  0.19% )
>  3,363,365,420,405      instructions              #    0.61  insn per cycle                                              ( +-  0.00% )
>    749,676,420,820      branches                  #  431.662 M/sec                    ( +-  0.00% )
>     43,243,046,270      branch-misses             #    5.77% of all branches          ( +-  0.01% )
> 
>       73.810517234 seconds time elapsed                                          ( +-  0.02% )

That's pretty impressive: ~35% difference in wall clock performance of this 
workload.

And that is while both the cycles and the instructions counts are within 2.5% of 
each other. The only stat that differs beyond the level of noise is cache-misses:

      8,964,671,259      cache-misses                                                  ( +-  0.44% )
      8,570,352,975      cache-misses                                                  ( +-  0.33% )

which is about 4.5%, but I have trouble believing that just 4.5% more cache misses 
can have such a massive effect on performance.

So unless +4.5% cache misses can cause a 35% difference in performance, this is a 
really weird result. Where did the extra performance come from - was the 'good' 
workload perhaps running at higher CPU frequencies for some reason?
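
As a rough sanity check of the ratios discussed above, here is a small Python
sketch that recomputes them from the quoted 'perf stat' output. It is not part
of the original measurements; the dictionary keys and helper names (pct_diff,
effective_ghz) are made up for illustration, and the GHz figure assumes perf
derives that column as cycles divided by task-clock.

```python
# Values copied from the quoted 'perf stat' output (before / after the patch).
before = {
    "elapsed_s": 100.716043494,
    "task_clock_ms": 2375752.777479,
    "cycles": 5_425_612_846_538,
    "instructions": 3_367_750_745_825,
    "cache_misses": 8_964_671_259,
}
after = {
    "elapsed_s": 73.810517234,
    "task_clock_ms": 1736720.488346,
    "cycles": 5_547_082_899_245,
    "instructions": 3_363_365_420_405,
    "cache_misses": 8_570_352_975,
}


def pct_diff(a, b):
    """Relative difference of a versus b, in percent of b."""
    return (a - b) / b * 100.0


def effective_ghz(stats):
    """Cycles per nanosecond of task-clock (assumed to match perf's GHz column)."""
    return stats["cycles"] / (stats["task_clock_ms"] * 1e6)


# 'before' relative to 'after', per metric, in percent.
report = {key: pct_diff(before[key], after[key]) for key in before}

for key, value in report.items():
    print(f"{key:>14}: before is {value:+6.2f}% relative to after")
print(f"GHz before: {effective_ghz(before):.3f}, after: {effective_ghz(after):.3f}")
```

With these inputs the elapsed time and cache-misses come out in line with the
rough figures above, while the derived clock rate differs noticeably between
the two runs, which is consistent with the frequency question.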

Thanks,

	Ingo
