linux-kernel.vger.kernel.org archive mirror
From: Jeremy Linton <jeremy.linton@arm.com>
To: Tan Xiaojun <tanxiaojun@huawei.com>,
	peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com,
	namhyung@kernel.org, ak@linux.intel.com, adrian.hunter@intel.com,
	yao.jin@linux.intel.com, tmricht@linux.ibm.com,
	brueckner@linux.ibm.com, songliubraving@fb.com,
	gregkh@linuxfoundation.org, Kim Phillips <Kim.Phillips@amd.com>
Cc: gengdongjiu@huawei.com, wxf.wang@hisilicon.com,
	liwei391@huawei.com, huawei.libin@huawei.com,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
Date: Thu, 8 Aug 2019 16:00:12 -0500
Message-ID: <0ac06995-273c-034d-52a3-921ea0337be2@arm.com>
In-Reply-To: <1564738813-10944-3-git-send-email-tanxiaojun@huawei.com>

Hi,

First, thanks for posting this!

I ran this on our DAWN platform and it does what it says. It's a pretty
reasonable start, but I get -1's in the command column rather than "dd"
(or similar), and this also results in [unknown] for the shared object
and most userspace addresses. This is quite possibly something I'm not
doing right, but I didn't spend a lot of time testing/debugging it.

I also took a quick look at the code and had a couple of comments,
although I'm not a perf tool expert.


On 8/2/19 4:40 AM, Tan Xiaojun wrote:
> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
> Profiling Extensions (SPE) support") is merged, "perf record" and
> "perf report --dump-raw-trace" have been supported. However, the
> raw data that is dumped cannot be used without parsing.
> 
> This patch is to improve the "perf report" support for spe, and
> further process the data. Currently, support for the three events
> of llc-miss, tlb-miss, and branch-miss is added.
> 
> Example usage:
> 
> --------------------------------------------------------------------
> ...
>      37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>      16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>       5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>       5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>       5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>       5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>       5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>       2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>       2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>       2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>       2.70%     2.70%  dd       dd                 [.] 0x000000000000d9d8
>       2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>       2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>       2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
> 
>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>      12.50%    12.50%  dd       dd                 [.] 0x000000000000d9d8
>      12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>      12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>      12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
> 
>      16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
>       8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>       8.33%     8.33%  dd       ld-2.28.so         [.] check_match
>       8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
>       8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
>       8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
>       8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data
> 
> --------------------------------------------------------------------
> 
> After that, more analysis and processing of the raw data of spe
> will be done.
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> ---
>   tools/perf/builtin-report.c                        |   5 +
>   tools/perf/util/arm-spe-decoder/Build              |   2 +-
>   tools/perf/util/arm-spe-decoder/arm-spe-decoder.c  | 214 ++++++
>   tools/perf/util/arm-spe-decoder/arm-spe-decoder.h  |  51 ++
>   .../util/arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
>   tools/perf/util/arm-spe.c                          | 715 ++++++++++++++++++++-
>   tools/perf/util/auxtrace.c                         |  45 ++
>   tools/perf/util/auxtrace.h                         |  27 +
>   tools/perf/util/session.h                          |   2 +
>   9 files changed, 1028 insertions(+), 35 deletions(-)
>   create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>   create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index abf0b9b..fadc8eb 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>   {
>   	struct perf_session *session;
>   	struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
> +	struct arm_spe_synth_opts arm_spe_synth_opts;
>   	struct stat st;
>   	bool has_br_stack = false;
>   	int branch_mode = -1;
> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
>   	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
>   			    "Instruction Tracing options\n" ITRACE_HELP,
>   			    itrace_parse_synth_opts),
> +	OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
> +			    "ARM SPE Tracing options",
> +			    arm_spe_parse_synth_opts),
>   	OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
>   			"Show full source file name path for source lines"),
>   	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
>   	}
>   
>   	session->itrace_synth_opts = &itrace_synth_opts;
> +	session->arm_spe_synth_opts = &arm_spe_synth_opts;
>   
>   	report.session = session;
>   
> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
> index 16efbc2..f8dae13 100644
> --- a/tools/perf/util/arm-spe-decoder/Build
> +++ b/tools/perf/util/arm-spe-decoder/Build
> @@ -1 +1 @@
> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> new file mode 100644
> index 0000000..8008375
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> @@ -0,0 +1,214 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm_spe_decoder.c: ARM SPE support
> + */
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <linux/compiler.h>
> +#include <linux/zalloc.h>
> +
> +#include "../util.h"
> +#include "../auxtrace.h"
> +
> +#include "arm-spe-pkt-decoder.h"
> +#include "arm-spe-decoder.h"
> +
> +struct arm_spe_decoder {
> +	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
> +	void *data;
> +	struct arm_spe_state state;
> +	const unsigned char *buf;
> +	size_t len;
> +	uint64_t pos;
> +	struct arm_spe_pkt packet;
> +	int pkt_step;
> +	int pkt_len;
> +	int last_packet_type;
> +
> +	uint64_t last_ip;
> +	uint64_t ip;
> +	uint64_t timestamp;
> +	uint64_t sample_timestamp;
> +	const unsigned char *next_buf;
> +	size_t next_len;
> +	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
> +};
> +
> +static uint64_t arm_spe_calc_ip(uint64_t payload)
> +{
> +	uint64_t ip = (payload & ~(0xffULL << 56));
> +
> +	/* fill high 8 bits for kernel virtual address */
> +	if (ip & 0x1000000000000ULL)

It might be better to use VA_START here if possible.
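
As far as I know VA_START is a kernel-side arm64 macro that the perf
tool can't simply include, so at minimum the two literals could get
names. A rough, untested sketch of what I mean is below; the macro and
function names are made up for illustration and are not in the patch,
and the behavior is intended to be identical to the code as posted:

```c
#include <stdint.h>

/* Hypothetical names; neither macro exists in the patch or the kernel. */
#define SPE_ADDR_TOP_BYTE_MASK	(0xffULL << 56)	/* bits 63:56 of the payload */
#define SPE_ADDR_KERNEL_BIT	(1ULL << 48)	/* same bit the patch tests */

/* Same logic as arm_spe_calc_ip() in the patch, with the literals named. */
static uint64_t spe_calc_ip(uint64_t payload)
{
	/* Drop the top byte of the payload to get the address bits. */
	uint64_t ip = payload & ~SPE_ADDR_TOP_BYTE_MASK;

	/* Fill the top byte again for what looks like a kernel address. */
	if (ip & SPE_ADDR_KERNEL_BIT)
		ip |= SPE_ADDR_TOP_BYTE_MASK;

	return ip;
}
```

Whether the test bit should instead be derived from the configured VA
size is a separate question; the sketch only removes the magic numbers.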

> +		ip |= (uint64_t)0xff00000000000000ULL;
> +
> +	return ip;
> +}
> +
> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
> +{
> +	struct arm_spe_decoder *decoder;
> +
> +	if (!params->get_trace)
> +		return NULL;
> +
> +	decoder = zalloc(sizeof(struct arm_spe_decoder));
> +	if (!decoder)
> +		return NULL;
> +
> +	decoder->get_trace          = params->get_trace;
> +	decoder->data               = params->data;
> +
> +	return decoder;
> +}
> +
> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
> +{
> +	free(decoder);
> +}
> +
> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
> +{
> +	decoder->pkt_len = 1;
> +	decoder->pkt_step = 1;
> +	pr_debug("ERROR: Bad packet\n");
> +
> +	return -EBADMSG;
> +}
> +
> +
> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
> +{
> +	struct arm_spe_buffer buffer = { .buf = 0, };
> +	int ret;
> +
> +	decoder->pkt_step = 0;
> +
> +	pr_debug("Getting more data\n");
> +	ret = decoder->get_trace(&buffer, decoder->data);
> +	if (ret)
> +		return ret;
> +
> +	decoder->buf = buffer.buf;
> +	decoder->len = buffer.len;
> +	if (!decoder->len) {
> +		pr_debug("No more data\n");
> +		return -ENODATA;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
> +{
> +	return arm_spe_get_data(decoder);
> +}
> +
> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
> +{
> +	int ret;
> +
> +	decoder->last_packet_type = decoder->packet.type;
> +
> +	do {
> +		decoder->pos += decoder->pkt_step;
> +		decoder->buf += decoder->pkt_step;
> +		decoder->len -= decoder->pkt_step;
> +
> +
> +		if (!decoder->len) {
> +			ret = arm_spe_get_next_data(decoder);
> +			if (ret)
> +				return ret;
> +		}
> +
> +		ret = arm_spe_get_packet(decoder->buf, decoder->len,
> +				&decoder->packet);
> +		if (ret <= 0)
> +			return arm_spe_bad_packet(decoder);
> +
> +		decoder->pkt_len = ret;
> +		decoder->pkt_step = ret;
> +	} while (decoder->packet.type == ARM_SPE_PAD);
> +
> +	return 0;
> +}
> +
> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
> +{
> +	int err;
> +	int idx;
> +	uint64_t payload;
> +
> +	while (1) {
> +		err = arm_spe_get_next_packet(decoder);
> +		if (err)
> +			return err;
> +
> +		idx = decoder->packet.index;
> +		payload = decoder->packet.payload;
> +
> +		switch (decoder->packet.type) {
> +		case ARM_SPE_TIMESTAMP:
> +			decoder->sample_timestamp = payload;
> +			return 0;
> +		case ARM_SPE_END:
> +			decoder->sample_timestamp = 0;
> +			return 0;
> +		case ARM_SPE_ADDRESS:
> +			decoder->ip = arm_spe_calc_ip(payload);
> +			if (idx == 0)
> +				decoder->state.from_ip = decoder->ip;
> +			else if (idx == 1)
> +				decoder->state.to_ip = decoder->ip;
> +			break;
> +		case ARM_SPE_COUNTER:
> +			break;
> +		case ARM_SPE_CONTEXT:
> +			break;
> +		case ARM_SPE_OP_TYPE:
> +			break;
> +		case ARM_SPE_EVENTS:
> +			if (payload & 0x20)
> +				decoder->state.type |= ARM_SPE_TLB_MISS;
> +			if (payload & 0x80)
> +				decoder->state.type |= ARM_SPE_BRANCH_MISS;
> +			if (idx > 1 && (payload & 0x200))
> +				decoder->state.type |= ARM_SPE_LLC_MISS;
> +
> +			break;
> +		case ARM_SPE_DATA_SOURCE:
> +			break;
> +		case ARM_SPE_BAD:
> +			break;
> +		case ARM_SPE_PAD:
> +			break;
> +		default:
> +			pr_err("Get Packet Error!\n");
> +			return -ENOSYS;
> +		}
> +	}
> +}

This code looks very similar to arm_spe_pkt_desc(), and I can't help but
think the two should be consolidated in some way. If nothing else, the
magic ARM_SPE_EVENTS values (0x20, 0x80, etc.) should be defined
somewhere and shared.
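
To illustrate the second point, here is an untested sketch of the event
bits pulled out into named masks that both decoders could include. All
of the names are hypothetical, and the enum is only a stand-in for the
patch's ARM_SPE_*_MISS values (I haven't checked what those are defined
as). The idx > 1 condition is copied from the patch as-is:

```c
#include <stdint.h>

/* Hypothetical names for the literals used in the ARM_SPE_EVENTS case. */
#define SPE_EV_TLB_MISS_BIT	(1ULL << 5)	/* 0x20 in the patch */
#define SPE_EV_BRANCH_MISS_BIT	(1ULL << 7)	/* 0x80 in the patch */
#define SPE_EV_LLC_MISS_BIT	(1ULL << 9)	/* 0x200 in the patch */

/* Stand-ins for ARM_SPE_*_MISS; the actual values are assumed here. */
enum spe_sample_type {
	SPE_SAMPLE_LLC_MISS	= 1 << 0,
	SPE_SAMPLE_TLB_MISS	= 1 << 1,
	SPE_SAMPLE_BRANCH_MISS	= 1 << 2,
};

/* Map an events packet payload to the sample types the patch reports. */
static unsigned int spe_events_to_type(uint64_t payload, int idx)
{
	unsigned int type = 0;

	if (payload & SPE_EV_TLB_MISS_BIT)
		type |= SPE_SAMPLE_TLB_MISS;
	if (payload & SPE_EV_BRANCH_MISS_BIT)
		type |= SPE_SAMPLE_BRANCH_MISS;
	/* The patch only looks at this bit when the packet index is above 1. */
	if (idx > 1 && (payload & SPE_EV_LLC_MISS_BIT))
		type |= SPE_SAMPLE_LLC_MISS;

	return type;
}
```

With something like this in a shared header, the switch in
arm_spe_walk_trace() would reduce to a single
"state.type |= spe_events_to_type(payload, idx)" and the description
code could test the same masks.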


> +
> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
> +{
> +	int err;
> +
> +	decoder->state.type = 0;
> +
> +	err = arm_spe_walk_trace(decoder);
> +	if (err)
> +		decoder->state.err = err;
> +
> +	decoder->state.timestamp = decoder->sample_timestamp;
> +
> +	return &decoder->state;

(trimming remainder)

