All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Jiri Olsa <jolsa@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Adrian Hunter <adrian.hunter@intel.com>,
	Corey Ashford <cjashfor@linux.vnet.ibm.com>,
	David Ahern <dsahern@gmail.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Jean Pihet <jean.pihet@linaro.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Paul Mackerras <paulus@samba.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 17/19] perf tools: Always force PERF_RECORD_FINISHED_ROUND event
Date: Thu, 24 Jul 2014 18:34:51 -0300	[thread overview]
Message-ID: <20140724213451.GH7831@kernel.org> (raw)
In-Reply-To: <1405893363-21967-18-git-send-email-jolsa@kernel.org>

Em Sun, Jul 20, 2014 at 11:56:01PM +0200, Jiri Olsa escreveu:
> The PERF_RECORD_FINISHED_ROUND governs queue flushing in
> reporting, so it needs to be stored for any kind of event.
> 
> Forcing the PERF_RECORD_FINISHED_ROUND event to be stored any
> time we finish the round and wrote at least one event.

This is not just one change, it does two things and it is not clearly
detailed here.

The existing code would write a PERF_RECORD_FINISHED_ROUND per call to
record__mmap_read() for tracepoints were present.

I.e. if tracepoints and any other type of event type was present, then
this syntetic PERF_RECORD_FINISHED_ROUND thing is inserted which will be
handled at report time.

Now you changed it to not check if tracepoints are present, i.e. it will
always insert that after processing N events in
PERF_RECORD_FINISHED_ROUND. (change #1)

The thing is, before it didn't checked if N was zero, needlessly
inserting PERF_RECORD_FINISHED_ROUND events in some corner cases.

Now you, in addition to change #1 you also check that the number of
bytes written before that event processing loop in record__mmap_read()
is the same after the loop, i.e. if some events were written to the
output perf.data file, only inserting the PERF_RECORD_FINISHED_ROUND
events if so. (change #2)

I think both changes are OK, but should be split in different patches,
and while testing comparing having the patch applied versus patch
applied + ignore the PERF_RECORD_FINISHED_ROUND events in the resulting
perf.data file, I get:

[root@zoo /]# perf stat -r 5 perf report --no-ordered-samples > /dev/null
<SNIP symbol stuff>
 Performance counter stats for 'perf report --no-ordered-samples' (5 runs):

      30263.033852      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.48% )
               346      context-switches          #    0.011 K/sec                    ( +- 54.65% )
               182      cpu-migrations            #    0.006 K/sec                    ( +- 29.94% )
            89,951      page-faults               #    0.003 M/sec                    ( +-  0.05% )
    92,342,939,311      cycles                    #    3.051 GHz                      ( +-  0.46% )
    50,605,250,178      stalled-cycles-frontend   #   54.80% frontend cycles idle     ( +-  0.88% )
   <not supported>      stalled-cycles-backend   
   101,171,572,553      instructions              #    1.10  insns per cycle        
                                                  #    0.50  stalled cycles per insn  ( +-  0.01% )
    22,643,469,804      branches                  #  748.222 M/sec                    ( +-  0.01% )
       284,395,273      branch-misses             #    1.26% of all branches          ( +-  0.45% )

      30.249514999 seconds time elapsed                                          ( +-  0.48% )

[root@zoo /]# perf stat -r 5 perf report --ordered-samples > /dev/null
<SNIP symbol stuff>
 Performance counter stats for 'perf report --ordered-samples' (5 runs):

      32665.828429      task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.41% )
               304      context-switches          #    0.009 K/sec                    ( +- 23.32% )
               102      cpu-migrations            #    0.003 K/sec                    ( +- 21.70% )
            79,405      page-faults               #    0.002 M/sec                    ( +-  0.02% )
   101,761,091,322      cycles                    #    3.115 GHz                      ( +-  0.40% )
    57,627,138,326      stalled-cycles-frontend   #   56.63% frontend cycles idle     ( +-  0.65% )
   <not supported>      stalled-cycles-backend   
   105,982,144,263      instructions              #    1.04  insns per cycle        
                                                  #    0.54  stalled cycles per insn  ( +-  0.01% )
    23,493,670,235      branches                  #  719.212 M/sec                    ( +-  0.01% )
       319,060,575      branch-misses             #    1.36% of all branches          ( +-  0.19% )

      32.636483981 seconds time elapsed                                          ( +-  0.41% )

[root@zoo /]#

The patch used to enable/disable processing of PERF_RECORD_FINISHED_ROUND is this one:

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 21d830bafff3..8b5410313005 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -675,6 +675,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
 		    "Disable symbol demangling"),
 	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
+	OPT_BOOLEAN(0, "ordered-samples", &report.tool.ordered_samples,
+		    "Make sure samples are ordered by time"),
 	OPT_CALLBACK(0, "percent-limit", &report, "percent",
 		     "Don't show entries under that percent", parse_percent_limit),
 	OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",

---

The workload:

[root@zoo /]# perf report --header-only
# ========
# captured on: Thu Jul 24 17:10:16 2014
# hostname : zoo.ghostprotocols.net
# os release : 3.15.4-200.fc20.x86_64
# perf version : 3.16.rc2.g2b5b8b
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
# cpuid : GenuineIntel,6,58,9
# total memory : 8081004 kB
# cmdline : /home/acme/bin/perf record -F 7000 -a -e cache-misses,instructions,cycles sleep 5m 
# event : name = cache-misses, type = 0, config = 0x3, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 0, attr_mmap  = 1, attr_mmap_data = 0
# event : name = instructions, type = 0, config = 0x1, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 0, attr_mmap  = 0, attr_mmap_data = 0
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 0, attr_mmap  = 0, attr_mmap_data = 0, id =
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, power = 9, uncore_imc = 8, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7, breakpoint = 5
# ========
#

[root@zoo /]# perf evlist -v
cache-misses: sample_freq=7000, config: 3, size: 96, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
instructions: sample_freq=7000, config: 1, size: 96, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
cycles: sample_freq=7000, size: 96, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
[root@zoo /]#

- Arnaldo

> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Jean Pihet <jean.pihet@linaro.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/builtin-record.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 378b85b731a7..4869050e7194 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -238,6 +238,7 @@ static struct perf_event_header finished_round_event = {
>  
>  static int record__mmap_read_all(struct record *rec)
>  {
> +	u64 bytes_written = rec->bytes_written;
>  	int i;
>  	int rc = 0;
>  
> @@ -250,7 +251,11 @@ static int record__mmap_read_all(struct record *rec)
>  		}
>  	}
>  
> -	if (perf_header__has_feat(&rec->session->header, HEADER_TRACING_DATA))
> +	/*
> +	 * Mark the round finished in case we wrote
> +	 * at least one event.
> +	 */
> +	if (bytes_written != rec->bytes_written)
>  		rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
>  
>  out:
> -- 
> 1.8.3.1

  reply	other threads:[~2014-07-24 21:34 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-20 21:55 [PATCHv3 00/19] perf tools: Factor ordered samples queue Jiri Olsa
2014-07-20 21:55 ` [PATCH 01/19] perf tools: Fix accounting of " Jiri Olsa
2014-07-28  8:27   ` [tip:perf/core] perf session: " tip-bot for Jiri Olsa
2014-07-20 21:55 ` [PATCH 02/19] perf tools: Rename ordered_samples bool to ordered_events Jiri Olsa
2014-07-20 21:55 ` [PATCH 03/19] perf tools: Rename ordered_samples struct " Jiri Olsa
2014-07-20 21:55 ` [PATCH 04/19] perf tools: Rename ordered_events members Jiri Olsa
2014-07-20 21:55 ` [PATCH 05/19] perf tools: Add ordered_events_(new|delete) interface Jiri Olsa
2014-07-20 21:55 ` [PATCH 06/19] perf tools: Factor ordered_events_flush to be more generic Jiri Olsa
2014-07-20 21:55 ` [PATCH 07/19] perf tools: Limit ordered events queue size Jiri Olsa
2014-07-20 21:55 ` [PATCH 08/19] perf tools: Flush ordered events in case of allocation failure Jiri Olsa
2014-07-20 21:55 ` [PATCH 09/19] perf tools: Make perf_session_deliver_event global Jiri Olsa
2014-07-20 21:55 ` [PATCH 10/19] perf tools: Create ordered-events object Jiri Olsa
2014-07-20 21:55 ` [PATCH 11/19] perf tools: Use list_move in ordered_events_delete function Jiri Olsa
2014-07-20 21:55 ` [PATCH 12/19] perf tools: Add ordered_events_init function Jiri Olsa
2014-07-20 21:55 ` [PATCH 13/19] perf tools: Add ordered_events_free function Jiri Olsa
2014-07-20 21:55 ` [PATCH 14/19] perf tools: Add perf_config_u64 function Jiri Olsa
2014-07-20 21:55 ` [PATCH 15/19] perf tools: Add report.queue-size config file option Jiri Olsa
2014-07-20 21:56 ` [PATCH 16/19] perf tools: Add debug prints for ordered events queue Jiri Olsa
2014-07-20 21:56 ` [PATCH 17/19] perf tools: Always force PERF_RECORD_FINISHED_ROUND event Jiri Olsa
2014-07-24 21:34   ` Arnaldo Carvalho de Melo [this message]
2014-07-25 11:34     ` Jiri Olsa
2014-07-25 14:14       ` Arnaldo Carvalho de Melo
2014-07-25 15:45         ` Frederic Weisbecker
2014-07-25 16:12           ` Peter Zijlstra
2014-07-20 21:56 ` [PATCH 18/19] perf tools: Limit the ordered events queue by default to 100MB Jiri Olsa
2014-07-20 21:56 ` [PATCH 19/19] perf tools: Allow out of order messages in forced flush Jiri Olsa
2014-07-21  6:43 ` [PATCHv3 00/19] perf tools: Factor ordered samples queue Adrian Hunter
2014-07-21  8:02   ` Jiri Olsa
2014-07-21  8:47     ` Adrian Hunter
2014-07-21  9:54       ` Jiri Olsa
2014-07-21 12:09         ` Adrian Hunter
2014-07-21 12:35           ` Jiri Olsa
2014-07-21 12:58             ` Adrian Hunter
2014-07-21 16:31         ` Andi Kleen
2014-07-21 18:23           ` Adrian Hunter
2014-07-21 18:36             ` David Ahern
2014-07-21 18:44               ` Adrian Hunter
2014-07-24 14:19               ` Arnaldo Carvalho de Melo
2014-07-24 14:56                 ` Jiri Olsa
2014-07-24 15:10                   ` Arnaldo Carvalho de Melo
2014-07-24 15:20                     ` Jiri Olsa
2014-07-24 15:51                     ` David Ahern
2014-07-24 18:01                 ` Adrian Hunter
2014-07-21 19:39             ` Andi Kleen
2014-07-25 14:55 [PATCHv4 " Jiri Olsa
2014-07-25 14:56 ` [PATCH 17/19] perf tools: Always force PERF_RECORD_FINISHED_ROUND event Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140724213451.GH7831@kernel.org \
    --to=acme@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=adrian.hunter@intel.com \
    --cc=cjashfor@linux.vnet.ibm.com \
    --cc=dsahern@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=jean.pihet@linaro.org \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.