From: Don Zickus <dzickus@redhat.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@kernel.org>, Paul Mackerras <paulus@samba.org>,
	Namhyung Kim <namhyung.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>,
	David Ahern <dsahern@gmail.com>, Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCHSET 00/17] perf report: Add -F option for specifying output fields (v4)
Date: Tue, 29 Apr 2014 13:27:35 -0400
Message-ID: <20140429172735.GD32375@redhat.com>
In-Reply-To: <878uqpos3k.fsf@sejong.aot.lge.com>

On Tue, Apr 29, 2014 at 10:13:35AM +0900, Namhyung Kim wrote:
> > Our original concept for the c2c tool was to sort hist entries into
> > cachelines, filter in only the HITMs and stores and re-sort based on
> > cachelines with the most weight.
> >
> > So using today's perf with a new sort key called 'cacheline' to achieve
> > this (copy-n-pasted):
> 
> Maybe 'd'cacheline is a more appropriate name IMHO.

Fair enough :-)

> 
> >
> > ----
> > #define CACHE_LINESIZE       64
> > #define CLINE_OFFSET_MSK     (CACHE_LINESIZE - 1)
> > #define CLADRS(a)            ((a) & ~(CLINE_OFFSET_MSK))
> > #define CLOFFSET(a)          (int)((a) &  (CLINE_OFFSET_MSK))
> >
> > static int64_t
> > sort__cacheline_cmp(struct hist_entry *left, struct hist_entry *right)
> > {
> >        u64 l, r;
> >        struct map *l_map, *r_map;
> >
> >        if (!left->mem_info)  return -1;
> >        if (!right->mem_info) return 1;
> >
> >        /* group event types together */
> >        if (left->cpumode > right->cpumode) return -1;
> >        if (left->cpumode < right->cpumode) return 1;
> >
> >        l_map = left->mem_info->daddr.map;
> >        r_map = right->mem_info->daddr.map;
> >
> >        /* properly sort NULL maps to help combine them */
> >        if (!l_map && !r_map)
> >                goto addr;
> >
> >        if (!l_map) return -1;
> >        if (!r_map) return 1;
> >
> >        if (l_map->maj > r_map->maj) return -1;
> >        if (l_map->maj < r_map->maj) return 1;
> >
> >        if (l_map->min > r_map->min) return -1;
> >        if (l_map->min < r_map->min) return 1;
> >
> >        if (l_map->ino > r_map->ino) return -1;
> >        if (l_map->ino < r_map->ino) return 1;
> >
> >        if (l_map->ino_generation > r_map->ino_generation) return -1;
> >        if (l_map->ino_generation < r_map->ino_generation) return 1;
> >
> >        /*
> >         * Addresses with no major/minor numbers are assumed to be
> >         * anonymous in userspace.  Sort those on pid then address.
> >         *
> >         * The kernel and non-zero major/minor mapped areas are
> >         * assumed to be unity mapped.  Sort those on address.
> >         */
> >
> >        if ((left->cpumode != PERF_RECORD_MISC_KERNEL) &&
> >            !l_map->maj && !l_map->min && !l_map->ino &&
> >            !l_map->ino_generation) {
> >                /* userspace anonymous */
> >
> >                if (left->thread->pid_ > right->thread->pid_) return -1;
> >                if (left->thread->pid_ < right->thread->pid_) return 1;
> 
> Isn't it necessary to check whether the address is in a same map in case
> of anon pages?  I mean the daddr.al_addr is a map-relative offset so it
> might have same value for different maps.

That's why I sort on pids here: the anon address might have the same
value for different maps.  The thought was to group all of a pid's
addresses together to keep things separated.

Do you see a different way to solve the problem?  I am not sure al_addr
vs. addr will make much difference here.

> 
> 
> >        }
> >
> > addr:
> >        /* al_addr does all the right addr - start + offset calculations */
> >        l = CLADRS(left->mem_info->daddr.al_addr);
> >        r = CLADRS(right->mem_info->daddr.al_addr);
> >
> >        if (l > r) return -1;
> >        if (l < r) return 1;
> >
> >        return 0;
> > }
> >
> > ----
> >
> > I can get the following 'perf mem report' outputs
> >
> > I used a special program called hitm_test3 which purposely generates
> > HITMs either locally or remotely based on cpu input.  It does this by
> > having processA grab lockX from cacheline1, release lockY from cacheline2,
> > then processB grabs lockY from cacheline2 and releases lockX from
> > cacheline1 (IOW ping pong two locks across two cachelines), found here
> >
> > http://people.redhat.com/dzickus/hitm_test/
> >
> > [ perf mem record -a hitm_test -s1,19 -c1000000 -t]
> >
> > (where -s is the cpus to bind to, -c is loop count, -t disables internal
> > perf tracking)
> >
> > (using 'perf mem' to auto generate correct record/report options for
> > cachelines)
> > (the hitm counts should be higher, but sampling is a crapshoot.  Using
> > ld_lat=30 would probably filter most of the L1 hits)
> >
> > Table 1: normal perf
> > #perf mem report --stdio -s cacheline,pid
> >
> >
> > # Overhead       Samples  Cacheline                       Command:  Pid
> > # ........  ............  .......................  ....................
> > #
> >     47.61%         42257  [.] 0x0000000000000080      hitm_test3:146344
> >     46.14%         42596  [.] 0000000000000000        hitm_test3:146343
> >      2.16%          2074  [.] 0x0000000000003340      hitm_test3:146344
> >      1.88%          1796  [.] 0x0000000000003340      hitm_test3:146343
> >      0.20%           140  [.] 0x00007ffff291ce00      hitm_test3:146344
> >      0.18%           126  [.] 0x00007ffff291ce00      hitm_test3:146343
> >      0.10%             1  [k] 0xffff88042f071500         swapper:     0
> >      0.07%             1  [k] 0xffff88042ef747c0     watchdog/11:    62
> > ...
> >
> > Ok, now I know the hottest cachelines. Not too bad.  However, in order to
> > determine cacheline contention, it would be nice to know the offsets into
> > the cacheline to see if there is contention or not. Unfortunately, the way
> > the sorting works here, all the hist_entry data was combined into each
> > cacheline, so I lose my granularity...
> >
> > I can do:
> >
> > Table 2: normal perf
> > #perf mem report --stdio -s cacheline,pid,dso_daddr,mem
> >
> >
> > # Overhead       Samples  Cacheline                       Command:  Pid
> > # Data Object             Memory access
> > # ........  ............  .......................  ....................
> > # ..............................  ........................
> > #
> >     45.24%         42581  [.] 0000000000000000        hitm_test3:146343 SYSV00000000 (deleted)          L1 hit
> >     44.43%         42231  [.] 0x0000000000000080      hitm_test3:146344 SYSV00000000 (deleted)          L1 hit
> >      2.19%            13  [.] 0x0000000000000080      hitm_test3:146344 SYSV00000000 (deleted)          Local RAM hit
> >      2.16%          2074  [.] 0x0000000000003340      hitm_test3:146344 hitm_test3                      L1 hit
> >      1.88%          1796  [.] 0x0000000000003340      hitm_test3:146343 hitm_test3                      L1 hit
> >      1.00%            13  [.] 0x0000000000000080      hitm_test3:146344 SYSV00000000 (deleted)          Remote Cache (1 hop) hit
> >      0.91%            15  [.] 0000000000000000        hitm_test3:146343 SYSV00000000 (deleted)          Remote Cache (1 hop) hit
> >      0.20%           140  [.] 0x00007ffff291ce00      hitm_test3:146344 [stack]                         L1 hit
> >      0.18%           126  [.] 0x00007ffff291ce00      hitm_test3:146343 [stack]                         L1 hit
> >
> > Now I have some granularity (though the program keeps hitting the same
> > offset in the cacheline) and some different levels of memory operations.
> > Seems like a step forward.  However, the cacheline is broken up a little
> > bit (note how 0x0000000000000080 is split three ways).
> >
> > I can now see where the cache contention is but I don't know how prevalent
> > it is (what percentage of the cacheline is under contention).  No need to
> > waste time with cachelines that have little or no contention.
> >
> > Hmm, what if I used the -F option to group all the cachelines and their
> > offsets together.
> >
> > Table 3: perf with -F
> > #perf mem report --stdio -s cacheline,pid,dso_daddr,mem  -i don.data -F cacheline,pid,dso_daddr,mem,overhead,sample|grep 0000000000000
> >   [k] 0000000000000000           swapper:     0  [kernel.kallsyms] Uncached hit                 0.00%             1
> >   [k] 0000000000000000            kipmi0:  1500  [kernel.kallsyms] Uncached hit                 0.02%             1
> >   [.] 0000000000000000        hitm_test3:146343  SYSV00000000 (deleted) L1 hit                      45.24%         42581
> >   [.] 0000000000000000        hitm_test3:146343  SYSV00000000 (deleted) Remote Cache (1 hop) hit     0.91%            15
> >   [.] 0x0000000000000080      hitm_test3:146344  SYSV00000000 (deleted) L1 hit                      44.43%         42231
> >   [.] 0x0000000000000080      hitm_test3:146344  SYSV00000000 (deleted) Local RAM hit                2.19%            13
> >   [.] 0x0000000000000080      hitm_test3:146344  SYSV00000000 (deleted) Remote Cache (1 hop) hit     1.00%            13
> >
> > Now I have the ability to see the whole cacheline easily and can probably
> > roughly calculate the contention in my head.  Of course there was some
> > pre-determined knowledge that was needed to get this info (like which
> > cacheline is interesting from Table 1).
> >
> > Of course, our c2c tool was trying to make the output more readable and
> > more obvious such that the user didn't have to know what to look for.
> >
> > Internally our tool sorts similarly to Table 2, but then resorts onto a new
> > rbtree with a struct c2c_hit based on the hottest cachelines.  Based on
> > this new rbtree we can print our analysis easily.
> >
> > This new rbtree is slightly different from the -F output in that we
> > 'group' cacheline entries together and re-sort that group.  The -F option
> > just resorts the sorted hist_entry list and has no concept of grouping.
> >
> >
> >
> >
> > We would prefer to have a 'group' sorting concept as we believe that is
> > the easiest way to organize the data.  But I don't know if that can be
> > incorporated into the 'perf' tool itself, or just keep that concept local
> > to our flavor of the perf subcommand.
> >
> > I am hoping this semi-concocted example gives a better picture of the
> > problem I am trying to wrestle with.
> 
> Yep, I understand your problem.
> 
> And I think it would be good to have the group sorting concept in perf
> tools for general use.  But it conflicts with the proposed change to the
> -F option when non-sort keys are used for -s or -F.  So it needs more
> thought..
> 
> Unfortunately I'll be busy until the end of next week.  So I'll be able to
> discuss and work on it after that.

Ok.  I am glad you are able to understand my problem.  It will be nice to
have others looking at a solution too. :-)

Cheers,
Don
